Post Context: This content is adapted from our internal SME (Subject Matter Expert) curricula. Similar to knowledge drops, this content may be highly Looker specific and may become out of date in the future. The goal is to convey a high level understanding of a common SQL query issue and convey next steps for troubleshooting and resolving.
Scenario: User reports that there's no performance improvement after partitioning a table(s), or that the 'estimated size' of the query in Looker seems off.
A few common problems:
Tables should be partitioned by a field that creates a meaningful grouping of rows. Given that partitions work by reducing the number of rows the DB has to query over to fetch query results, the partition_key needs to be a field that has few(ish) distinct values. To test for this, try running select partition_key, count(*)... order by 2 desc
. The output should be something like this:
partition_key | count |
key_1 | 23500 |
key_2 | 13500 |
key_3 | 3500 |
NOT this:
partition_key | count |
key_1 | 3 |
key_2 | 10 |
key_3 | 100 |
Should be self explanatory, however the resolution might require a bit of creative thinking about how Looker manipulates the where clause. Usually, something like a sql_always_where
or an access_filter
that ensures the explore always generates a WHERE partition_key_field = foo does the trick.
Tables won't share partition_keys (although it's possible to define partition_keys that will logically share values across all tables). Thusly, if we add a join, we need to filter on that joined table's partition_key to take full advantage of partitioning's performance benefits. Eg:
# Good use of partitioning
select *
from table1
join table2 on table1.id = table2.id
where table1.partition_key = foo and table2.partition_key = foo
# Bad use of partitioning
select *
from table1
join table2 on table1.id = table2.id
where table1.partition_key = foo