BIGQUERY (Concept): BigQuery Clustering

Knowledge Drop

Last tested: Oct 23, 2018
 

In August of 2018, Google introduced a beta feature called "Clustering," which allows you to organize, or sort, the contents of your tables by one or more columns. This optimizes performance when filtering or aggregating by those columns. This allows you to optimize performance on a string, because the partitioning can only be done on dates.

When to use clustering (for user managing tables in BQ)

Currently, BigQuery supports clustering over a partitioned table. Use clustering over a partitioned table when:

  • Your data is already partitioned on a date or timestamp column.
  • You commonly use filters or aggregation against particular columns in your queries.

Using with PDTs

There are some downsides that come with clustering discussed in this blog, because the data is only clustered or sorted at the time of the table creation. This means any new data coming into the table will not be sorted or clustered. However, because PDTs need to be consistently rebuilt this does not affect PDTs and gives us even more tuning options for PDTs when we need to use them on BigQuery.

Links

More info here: https://cloud.google.com/bigquery/docs/clustered-tables

Good Blog: https://medium.com/mark-rittman/date-partitioning-and-table-clustering-in-google-bigquery-and-looker...

This content is subject to limited support.                


 

Version history
Last update:
‎05-07-2021 09:04 AM
Updated by: