DB Concepts Series: What partitions are, and how they're used
Post Context: This content is adapted from our internal SME (Subject Matter Expert) curricula. Similar to knowledge drops, this content may be highly Looker specific and may become out of date in the future. The goal is to convey a high level understanding of a common SQL query issue and convey next steps for troubleshooting and resolving.
Partitions essentially work by creating a reduced subset of rows that the Database can quickly scan to reduce the number of rows it has to evaluate to fetch query results.
To restate this a different way, and in the context of getting Looker to work with them:
- We create a partition key to partition by. We don't need to do this in Looker, and a PDT is not required to make use of partitioning.
- The key essentially "slices" up rows of your dataset, grouping them so that the DB can do a scan over less rows
- We evaluate a condition against that Partition Key in the WHERE clause. the DB performs a scan over the partition key column and it gets to do all that work over less rows.
For example, using the following sample data:
|ID||Animal Name||Danger Level (1-5)||Has Hair (y/n)|
If we were to run
select * from table where Animal = butterfly , the DB has to scan over all 8 rows to fetch you your result at row 7.
However if you partition by ID, and adjust the query to reference the appropriate partition like so;
select * from table where Animal = butterfly and ID = 3 the following occurs:
- The DB scans the partitions in the partition key (ID). There are 3 unique values, so this means only 3 rows to scan
- It then scans the rows in the partition, there are 2 rows in partition 3, so that's only 2 rows to scan.
- Returns results after only having to scan 5 rows, instead of 8.
While in the example above, the performance gain is negligible, in large tables (hundreds of thousands or millions of rows), the performance gain can be exponential.