Modeling nested data in Big Query Standard SQL

  • 29 November 2016
  • 1 reply

Userlevel 3

With the release of Big Query’s Standard SQL dialect, it’s now much easier to model nested data sets in LookML.


For a quick primer on how nested and repeated files work in BigQuery, and why they’re valuable, take a look at Daniel’s post.



You can also head here for a complete Looker Block for Google Analytics Premium and GA 360 data, which puts the concepts described below into a plug-and-play LookML model.



These examples are based on the Google Analytics Premium (GAP) Big Query export data set, which makes heavy use of nested repeated fields.



Join-based querying of nested-repeated (i.e. arrays of STRUCT) fields, replacing the need for FLATTENS


The Google Analytics Premium schema has one table for session-level data, with individual hit (i.e. event) records nested within each session. To query hit-level and session level data at the same time (e.g. counting total number of sessions and total number of hits over time), there is a new join-based syntax for referencing those two different levels of nesting ([documented here] ( Mirroring this syntax in Looker, you’d define separate view files for session and hit fields in your model, and join them together in an explore:



view: session {

sql_table_name: `*` ;;

# Define all session-related dimensions and measures in this view file


view: hits {

#note that there is no sql_table_name here because we've defined it in the sql in the explore below

# Define all hit-related dimensions and measures in this view file


explore: session {

join: hits {

sql: ,UNNEST(visits.hits) as hits ;;

relationship: one_to_many




You can also select individual elements, or aggregate across multiple elements, from a nested-repeated columns, and use those unnested columns as dimensions in GROUP by statements.


This is particularly valuable for handling custom hit attributes in the GAP export schema:



  dimension: custom_attribute_1_value {

type: string

sql: (SELECT value FROM UNNEST(${TABLE}.customdimensions) where index = 1) ;;



You can even aggregate across several nested repeated records for a given event. This too can be used in a GROUP BY, or even further aggregated as a measure:



  dimension: number_of_custom_attributes {

type: number

sql: (SELECT count(*) FROM UNNEST(${TABLE}.customdimensions)) ;;


measure: average_number_of_custom_attributes {

type: average

sql: ${number_of_custom_attributes} ;;



Also of note, replacing table_date_range() and table_query()


This discourse post describes how to model date-separated tables, like the ga_sessions tables in the GAP schema, using bigquery’s new partitioning tools.


1 reply

Userlevel 3

Here are a few even more complex examples of dimensions you can build using subqueries:

dimension: number_of_long_pauses {
/* Find the number of times that X time elapses between events */
( select count(*) -- 2. Count the hits
from (
/* -- 1: For every hit (i.e. event), get it's timestamp, and the timestamp of the prior hit */
select time,
lag(time) over(order by time) as last_event_time
from unnest(hits)
where time - last_event_time > 1000 /* 3. Where the hit was more than 1000 miliseconds after the prior hit */
) ;;

dimension: event_type_with_longest_pause {

(select last_value(event_category) over(order by time_spent asc)
eventInfo.eventCategory as event_category,
lead(time) over(order by time) as time_spent
from unnest(hits))
limit 1