With the release of Big Query’s Standard SQL dialect, it’s now much easier to model nested data sets in LookML.
For a quick primer on how nested and repeated files work in BigQuery, and why they’re valuable, take a look at Daniel’s post.
You can also head here for a complete Looker Block for Google Analytics Premium and GA 360 data, which puts the concepts described below into a plug-and-play LookML model.
These examples are based on the Google Analytics Premium (GAP) Big Query export data set, which makes heavy use of nested repeated fields.
Join-based querying of nested-repeated (i.e. arrays of STRUCT) fields, replacing the need for FLATTENS
The Google Analytics Premium schema has one table for session-level data, with individual hit (i.e. event) records nested within each session. To query hit-level and session level data at the same time (e.g. counting total number of sessions and total number of hits over time), there is a new join-based syntax for referencing those two different levels of nesting ([documented here] (https://cloud.google.com/bigquery/docs/reference/standard-sql/migrating-from-legacy-sql#removing_repetition_with_flatten)). Mirroring this syntax in Looker, you’d define separate view files for session and hit fields in your model, and join them together in an explore:
view: session {
sql_table_name: `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_*` ;;
# Define all session-related dimensions and measures in this view file
}
view: hits {
#note that there is no sql_table_name here because we've defined it in the sql in the explore below
# Define all hit-related dimensions and measures in this view file
}
explore: session {
join: hits {
sql: ,UNNEST(visits.hits) as hits ;;
relationship: one_to_many
}
}
You can also select individual elements, or aggregate across multiple elements, from a nested-repeated columns, and use those unnested columns as dimensions in GROUP by statements.
This is particularly valuable for handling custom hit attributes in the GAP export schema:
dimension: custom_attribute_1_value {
type: string
sql: (SELECT value FROM UNNEST(${TABLE}.customdimensions) where index = 1) ;;
}
You can even aggregate across several nested repeated records for a given event. This too can be used in a GROUP BY, or even further aggregated as a measure:
dimension: number_of_custom_attributes {
type: number
sql: (SELECT count(*) FROM UNNEST(${TABLE}.customdimensions)) ;;
}
measure: average_number_of_custom_attributes {
type: average
sql: ${number_of_custom_attributes} ;;
}
Also of note, replacing table_date_range() and table_query()
This discourse post describes how to model date-separated tables, like the ga_sessions tables in the GAP schema, using bigquery’s new partitioning tools.