Uncertain on Best Practices for Aggregations

Hi all,

I’m new to Looker and come from a background of mostly using Tableau, where my team created a ton of aggregation tables and connected them to Tableau to display on dashboards for stakeholders.

What I’m trying to understand with Looker is whether it’s efficient to continue this practice of creating aggregate tables in my database for pretty much any view we’d like to report on and then having Looker pick them up, OR, because of LookML, is it more efficient to bring in the raw tables and define the aggregations in Looker?

Solved
1 ACCEPTED SOLUTION

kuopaz
New Member

I had the same question when moving to Looker, and I think it’s one of those ‘it depends’ issues - perhaps not that helpful!

I would say, ideally, let Looker use the raw tables, because that gives users the flexibility to aggregate measures over all the dimensions in the Explore (including day, week, month, quarter, year, etc.). The downside is the risk of unacceptably slow performance. A while back Looker introduced what it calls Aggregate Awareness: with a little LookML, it will pre-aggregate and store results in a scratch table in your database. This improves performance for common user queries, but it won’t cover all the flexibility mentioned above.
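For reference, here is a minimal sketch of what an Aggregate Awareness definition looks like in LookML. The explore, field, and datagroup names are hypothetical - substitute your own model's names:

```lookml
# Hypothetical explore with one aggregate table. When a user query only
# needs the fields listed below, Looker serves it from the pre-built
# rollup instead of scanning the raw table.
explore: orders {
  aggregate_table: daily_revenue {
    query: {
      dimensions: [orders.created_date]
      measures: [orders.total_revenue]
    }
    materialization: {
      # Rebuild the scratch table whenever this datagroup triggers
      datagroup_trigger: nightly_etl
    }
  }
}
```

Queries that group by the listed dimensions (or, for time dimensions, a coarser grain Looker can roll up to, like month or year) are typically rewritten to hit the aggregate table; anything else falls back to the raw table, which is why it doesn't give you the full flexibility of the raw Explore.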

As all Looker queries generate SQL, performance will vary with database design, indexing, quantity of data, etc. in the usual way.

I still have a small number of cases where I pre-aggregate in the database for performance. Actually, I probably do more work outside of Looker to pre-calculate column values when it would be too slow for Looker to compute them on the fly, or to ‘tidy up’ raw tables that are far from what you would expect in a nice clean data warehouse.

Hope this helps, would be interested in the thoughts of others.

