Meta-Modeling: when to start a new model?

  • 28 June 2019
  • 3 replies

We’re almost two years into our Looker journey at Automattic, and it’s been quite a ride!

We’re at the point now where we’ve started to zoom out a bit and spend some time working ON our Looker instance rather than solely working IN it. One question I’d love to crowdsource is how to think about model files:

  • How do you decide what goes in one versus another?

  • Do you create models per data source? Per business unit? Some other bucket?

  • How do you know when you ought to be creating a new model?

I’d especially love to hear from folks at Warby Parker (@Ryan_Tuck, I believe?) and Weight Watchers (@Carl_Anderson, right?) on this topic.


Userlevel 3

@Simon_Ouderkirk I’m not sure I have a great answer here. “It depends,” and it is far more a business decision than a technical one.

A model around a single data source makes sense only if that source is fully self-contained, meaning all reasonable questions can be answered from that data alone. Typically that is not the case: you want to join the dots to many other sources via a user ID, product ID, or some other entity.

It might make more sense to build a model around a business domain. For instance, we have one around data quality that pulls in many other sources and metrics, and another that covers our core data warehouse.

However, we also have some models that serve particular teams’ needs, because those teams need access to data that is normally locked away from general analytics. One example is finance, whose members can see actual dollar amounts in certain tables that are masked for everyone else.

Userlevel 7

There are two constraints you cannot get away from: a model can only have one connection, and you cannot set permissions on individual explores, only on the whole model.

We have very few centralised projects, so users or teams own their own projects and models. We explain how and when to use new models, but we would rather people build first and then we help with cleaning up and tuning afterwards. What we usually say is:

Keep the number of model files to a minimum.

  • Why: It cuts down on permission administration and on potentially unneeded files piling up in a project.

  • How: A model file has a 1-to-1 relationship with a connection and a 1-to-many relationship with explores. Make generic model files for each connection but have many explores in each model file.

  • Exceptions: You may sometimes want to disregard this, for example to split explores into different files along logical lines.
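To make the 1-to-1 / 1-to-many rule concrete, here is a minimal Python sketch that groups explores into one generic model per connection. `Model`, `consolidate`, and the connection and explore names are hypothetical illustrations, not Looker or LookML APIs; in a real project the result would be a handful of `.model.lkml` files, each declaring a single `connection:`.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Model:
    """Hypothetical stand-in for a LookML model file.

    A model has exactly one connection (1-to-1) and any number of explores (1-to-many).
    """
    name: str
    connection: str
    explores: list = field(default_factory=list)


def consolidate(explore_connections):
    """Group explores into one generic model per connection.

    explore_connections maps each explore name to the connection it queries.
    """
    by_connection = defaultdict(list)
    for explore, connection in explore_connections.items():
        by_connection[connection].append(explore)
    # One model per connection keeps the number of model files (and permission setups) minimal.
    return [
        Model(name=f"{connection}_model", connection=connection, explores=sorted(explores))
        for connection, explores in sorted(by_connection.items())
    ]


models = consolidate({
    "orders": "warehouse",
    "users": "warehouse",
    "data_quality_checks": "warehouse",
    "tickets": "support_db",
})
for m in models:
    print(m.name, m.explores)
```

The sketch only covers the default case; the exception above would simply mean deliberately keeping more than one model for the same connection.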

Userlevel 3

Yeah, I generally agree with the previous replies.

I’d generally err on the side of each model containing a set of explores related to a single business domain. You can define the Model Set that any given Role can access, so if your Looker Roles map closely to actual business roles, that gives a fairly intuitive mapping.
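The Role-to-Model-Set mapping can be sketched the same way. This is a minimal Python illustration, assuming each role carries exactly one model set (Looker roles pair one permission set with one model set) and that, in this sketch, a user's accessible models are the union across all their roles; permission sets are left out, and every role, model-set, and model name is hypothetical. Because access is resolved per model, it also shows why this mechanism cannot grant access to a single explore.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    """Hypothetical role: in this sketch only its model set matters."""
    name: str
    model_set: frozenset


# Hypothetical model sets named after business domains.
FINANCE_MODELS = frozenset({"core", "finance"})
ANALYST_MODELS = frozenset({"core", "data_quality"})

finance = Role("finance", FINANCE_MODELS)
analyst = Role("analyst", ANALYST_MODELS)


def accessible_models(roles):
    """Union of the model sets of all the user's roles (an assumption of this sketch)."""
    models = set()
    for role in roles:
        models |= role.model_set
    return models


def can_access(roles, model):
    # Access is decided per model; there is no per-explore check at this level.
    return model in accessible_models(roles)


print(can_access([analyst], "finance"))           # False
print(can_access([analyst, finance], "finance"))  # True
```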

Keep in mind that finer-grained access control at the row or field level can (and, as far as I can tell, should) be implemented with access_grant and access_filter rather than model-based rules. Model-based rules are still a decent first pass at defining what data an individual generally needs to work with.
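For the finer-grained controls, a rough Python simulation of the semantics may help: an access_filter keeps only rows matching a user attribute, and an access_grant hides a field unless a user attribute takes one of the grant's allowed values (in LookML, fields opt in via `required_access_grants`, and all listed grants must be satisfied). This only mimics the behaviour; the grant, attribute, and field names and the data are hypothetical, and Looker applies these rules in the SQL it generates rather than in code like this.

```python
# Hypothetical grant definitions: grant name -> (user attribute, allowed values).
ACCESS_GRANTS = {
    "can_see_dollars": ("department", {"finance"}),
}

# Hypothetical field definitions: field name -> grants the field requires.
FIELDS = {
    "orders.region": [],
    "orders.count": [],
    "orders.total_amount": ["can_see_dollars"],
}


def has_grant(user_attrs, grant_name):
    """True if the user's attribute value is among the grant's allowed values."""
    attribute, allowed = ACCESS_GRANTS[grant_name]
    return user_attrs.get(attribute) in allowed


def visible_fields(user_attrs):
    """Field-level control: a field is visible only if every required grant is satisfied."""
    return sorted(
        name for name, grants in FIELDS.items()
        if all(has_grant(user_attrs, g) for g in grants)
    )


def apply_access_filter(rows, field, user_attrs, attribute):
    """Row-level control: keep only rows whose field value equals the user's attribute."""
    return [row for row in rows if row[field] == user_attrs.get(attribute)]


rows = [
    {"orders.region": "EMEA", "orders.total_amount": 120},
    {"orders.region": "APAC", "orders.total_amount": 80},
]
analyst = {"department": "analytics", "region": "EMEA"}

print(visible_fields(analyst))  # ['orders.count', 'orders.region']
print(apply_access_filter(rows, "orders.region", analyst, "region"))
```

With rules like these, the finance case described earlier could in principle share a model with everyone else, with the dollar fields gated by a grant instead of living in a separate model.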