Aggregate tables and symmetric aggregation

kuopaz · 03-17-2021 01:08 PM

https://docs.looker.com/data-modeling/learning-lookml/aggregate_awareness

I have an aggregate table that is built successfully, but isn’t being used in a query when I would expect it to be.

This is the message in the SQL: “The query contains the following measures that cannot roll up.”

I think this is the issue: “Also, if your Explore has joins, verify that your measures aren’t converted to distinct measures (symmetric aggregates) through fanned out joins..”

The section “Symmetric aggregates for Explores with joins” isn’t that clear to me - but implies that the aggregate table won’t be used if the measure is on the one-end table and has used symmetric aggregation to create count distinct, sum distinct or average distinct.

For my case:

If the query results table contains *all* the dimensions in the aggregate table, it will be used and the generated SQL is only a SELECT from the aggregate table.

If the query results table contains only *some* of the dimensions in the aggregate table, it will not be used. In this case the generated SQL would need to include a GROUP BY , as I have seen on other explores and aggregate tables I have.

After all that(!), my questions are:

Am I correct in the above?

If yes, isn’t this quite a big restriction on aggregate awareness as symmetric aggregation on fanouts are commonplace?

Naomi_Johnson

Hi kupoaz,

This may seem like a big restriction but it is needed to ensure rows aren’t counted more than once in the aggregation.

As an example, let’s say you have an orders view which is joined to a customer view (many-to-one), and you have an aggregate table at order month and store level.
If you wanted to see the number of customers (count distinct) by month and used this aggregate table, you could end up counting customers more than once, if they bought from more than one store in that month.

The only way to correctly count the number of customers is to either have an aggregate table to the exact level you want (in the example above, at month level), or by using the most granular table (which Looker does).

There are a couple of suggestions Looker makes:

Create the exact aggregate table https://docs.looker.com/data-modeling/learning-lookml/aggregate_awareness#exact_match
Use approximates, if your SQL dialect allows https://docs.looker.com/reference/field-params/allow_approximate_optimization

Hope this helps,

Naomi

kuopaz

Thanks Naomi. Makes sense, and I can see the problem with count measures. Wouldn’t it work OK with sum measures, wouldn’t they be additive?

Naomi_Johnson

No problem.

It’s the same with sum measures. In the example above, if there was a column in the customers table that you sum, you’d end up summing the row more than once if the customer ordered from more than one store in a month.

Looker is really powerful and ensures that rows that are only counted once in aggregations where a fanout occurs.

This article explains symmetric aggregates well: https://help.looker.com/hc/en-us/articles/360023722974-A-Simple-Explanation-of-Symmetric-Aggregates-...

Hope this helps,

Naomi

kuopaz

OK, will need to get my head around that for a sum measure!

For my use case, I will have to create multiple aggregate tables, as you suggested as one of the work arounds.

It is an inevitable restriction as explained - perhaps should be given a higher profile. Aggregate tables are a great addition to Looker, and have solved a major problem for me - no longer need to pre-aggregate in the database to get adequate performance.

Naomi_Johnson

Agrgegate tables are definitely a great addition.

I’d recommend reading the links I sent; they should help get your head around this too.

kuopaz

I’ve ended-up turning off symmetric aggregates and redesigning the explore. Full aggregate awareness (i.e. performing a GROUP BY on the aggregate table) is too important to lose for my use case.