BigQuery tables without Primary Keys

lloydtabb · 10-02-2018 03:59 PM

Primary keys are essential for Looker’s symmetric aggregates to work. Sometimes you end up with a table without a primary key.

The folks at BigQuery just announced a new function GENERATE_UUID() that will make one for you. If you have a table without a primary key and you need one, you can simply make a derived table.

For example, Let’s take the names data, it that doesn’t have a primary key.

explore: names2 {}
view: names2 {
  sql_table_name: `fh-bigquery.popular_names.usa_1910_2013` ;;
  dimension: name {}
  dimension: state {}
  dimension: year {}
  dimension: number {type:number}
  measure: total_number {
    type: sum
    sql: ${number} ;;
  }
  measure: count {type:count}
}

You can add a primary key by making it a derived table. Let’s also assume that the table is clustered by year (its not, but let’s assume it is). The pattern would be to create a derived table, generate a uuid as primary key and add a WHERE condition to make sure we pull the data efficiently.

view: names2 {
  derived_table: {
    sql:
      SELECT 
        GENERATE_UUID() as uuid, *
      FROM `fh-bigquery.popular_names.usa_1910_2013`
      WHERE
        {%condition year %}year{%endcondition%}
      ;;
  }
  
  dimension: uuid {primary_key:yes}
  dimension: name {}
  dimension: state {}
  dimension: year {}
  dimension: number {type:number}
  measure: total_number {
    type: sum
    sql: ${number} ;;
  }
  measure: count {type:count}
}

eliott

How would that work with a large table which has multiple nested records? Would this trick also work? Would we have to create NDTs for each outer nested tables as well or simply generating uuids in a dimension suffice?
Thanks!

fabio1

Hi @eliott

If you’re trying to assign a primary key to nested records (starting from a table which has its own primary key), what you usually want is UNNEST WITH OFFSET and then you concatenate the original primary key and the array offset.

https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays#flattening_arrays

Another pattern that can be helpful when joining multiple one_to_many joins (including unnested arrays) is join paths