Primary keys are essential for Looker’s symmetric aggregates to work. Sometimes you end up with a table without a primary key.
The folks at BigQuery just announced a new function GENERATE_UUID()
that will make one for you. If you have a table without a primary key and you need one, you can simply make a derived table.
For example, Let’s take the names data, it that doesn’t have a primary key.
explore: names2 {}
view: names2 {
sql_table_name: `fh-bigquery.popular_names.usa_1910_2013` ;;
dimension: name {}
dimension: state {}
dimension: year {}
dimension: number {type:number}
measure: total_number {
type: sum
sql: ${number} ;;
}
measure: count {type:count}
}
You can add a primary key by making it a derived table. Let’s also assume that the table is clustered by year (its not, but let’s assume it is). The pattern would be to create a derived table, generate a uuid as primary key and add a WHERE
condition to make sure we pull the data efficiently.
view: names2 {
derived_table: {
sql:
SELECT
GENERATE_UUID() as uuid, *
FROM `fh-bigquery.popular_names.usa_1910_2013`
WHERE
{%condition year %}year{%endcondition%}
;;
}
dimension: uuid {primary_key:yes}
dimension: name {}
dimension: state {}
dimension: year {}
dimension: number {type:number}
measure: total_number {
type: sum
sql: ${number} ;;
}
measure: count {type:count}
}
How would that work with a large table which has multiple nested records? Would this trick also work? Would we have to create NDTs for each outer nested tables as well or simply generating uuids in a dimension suffice?
Thanks!
Hi @eliott
If you’re trying to assign a primary key to nested records (starting from a table which has its own primary key), what you usually want is UNNEST WITH OFFSET
and then you concatenate the original primary key and the array offset.
https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays#flattening_arrays
Another pattern that can be helpful when joining multiple one_to_many joins (including unnested arrays) is join paths