DB Concepts Series: Troubleshooting Partitions

jamesnestler · 05-28-2021 11:19 AM

DB Concepts Series: Troubleshooting Partitions

Post Context: This content is adapted from our internal SME (Subject Matter Expert) curricula. Similar to knowledge drops, this content may be highly Looker specific and may become out of date in the future. The goal is to convey a high level understanding of a common SQL query issue and convey next steps for troubleshooting and resolving.

Scenario: User reports that there's no performance improvement after partitioning a table(s), or that the 'estimated size' of the query in Looker seems off.

A few common problems:

Problem 1: The partition_key is not appropriately defined

Tables should be partitioned by a field that creates a meaningful grouping of rows. Given that partitions work by reducing the number of rows the DB has to query over to fetch query results, the partition_key needs to be a field that has few(ish) distinct values. To test for this, try running select partition_key, count(*)... order by 2 desc . The output should be something like this:

partition_key	count
key_1	23500
key_2	13500
key_3	3500

NOT this:

partition_key	count
key_1	3
key_2	10
key_3	100

Problem 2: The partition_key is not used in the WHERE clause

Should be self explanatory, however the resolution might require a bit of creative thinking about how Looker manipulates the where clause. Usually, something like a sql_always_where or an access_filter that ensures the explore always generates a WHERE partition_key_field = foo does the trick.

Problem 3: The partition_keys are not being used for all the joined tables (and possibly not all tables have a partition key)

Tables won't share partition_keys (although it's possible to define partition_keys that will logically share values across all tables). Thusly, if we add a join, we need to filter on that joined table's partition_key to take full advantage of partitioning's performance benefits. Eg:

# Good use of partitioning


select *

from table1

join table2 on table1.id = table2.id

where table1.partition_key = foo and table2.partition_key = foo


# Bad use of partitioning


select *

from table1

join table2 on table1.id = table2.id

where table1.partition_key = foo

Database Partitions: Troubleshooting

DB Concepts Series: Troubleshooting Partitions