Last tested: Oct 23, 2018
In August of 2018, Google introduced a beta feature called "Clustering," which allows you to organize, or sort, the contents of your tables by one or more columns. This optimizes performance when filtering or aggregating by those columns. This allows you to optimize performance on a string, because the partitioning can only be done on dates.
When to use clustering (for user managing tables in BQ)
Currently, BigQuery supports clustering over a partitioned table. Use clustering over a partitioned table when:
- Your data is already partitioned on a date or timestamp column.
- You commonly use filters or aggregation against particular columns in your queries.
Using with PDTs
There are some downsides that come with clustering discussed in this blog, because the data is only clustered or sorted at the time of the table creation. This means any new data coming into the table will not be sorted or clustered. However, because PDTs need to be consistently rebuilt this does not affect PDTs and gives us even more tuning options for PDTs when we need to use them on BigQuery.
More info here: https://cloud.google.com/bigquery/docs/clustered-tables
Good Blog: https://medium.com/mark-rittman/date-partitioning-and-table-clustering-in-google-bigquery-and-looker-pdts-2bab9ec3be19