TLDR: We need symmetric aggregates for a small number of records in our db, but they’re causing performance issues for the records that don’t need them. Any suggestions?
At Zearn, we use Looker to report on usage of our web application. We’re an educational nonprofit, so our customers/users are students, teachers, and administrators (principals etc.) at elementary schools and school districts.
One quirk of our data model is that some users can belong to multiple schools (our “customers”). Specifically, we have a specific user role (“school administrator”) who can belong to multiple schools. School admins who actually belong to multiple schools make up probably <1% of our userbase.
Our main usage Explore looks something like:
explore: fact_daily_usage {
join: dim_zearn_users {
relationship: one_to_one
sql_on: ${fact_daily_usage.user_id} = ${dim_zearn_users.id}
}
join: dim_schools {
relationship: many_to_many #THE PROBLEM
sql_on: ${dim_zearn_users.school_id} = ${dim_schools.id}
}
# ... etc
}
Because the join to schools is many-to-many for these multi-school admins, it can cause fanouts, which Looker helpfully solves with symmetric aggregates. However, we have some Looks from this explore that only deal with other types of users and don’t need symmetric aggregates, but are slowed down a ton (e.g. 2 minutes runtime becomes 40 mins) by Looker inserting symmetric aggregates (after all, we told Looker it was many_to_many!).
Does anyone have ideas of how to solve? One thing we could do is make a separate explore just for admin usage, but I’m worried that would be confusing to our business users.