Question

Create a dimension that is lagged by one row

  • 21 August 2017
  • 5 replies
  • 2799 views

I am working with google analytics dataset in looker. I need to access the “hits.time” (start time of the page view). In order to get the “end time” of the page view, I need to get the “hits.time” (i.e. start time) of the next page view. These are off by a row. Any suggestions how I can create a dimension called “time_next” that is essentially the hits.time from the next page view in the sequence of page views.


In bigquery, I am able to do this by using this function:

LEAD(hits.time, 1) over(

PARTITION BY

fullVisitorId,

visitId

ORDER BY

hits.time ASC)


5 replies

Userlevel 4
Badge

Hi @Vivek_Sinha,


We generally recommend building window functions in to derived tables (see this discourse article).


An example of that would be the following,


view: test {
derived_table: {
sql:
SELECT
column_A,
LEAD(column_A) OVER (ORDER BY column_A ASC) AS column_A_lead
FROM table
;;
}

dimension: column_A {
type: string
sql: ${TABLE}.column_A ;;
}

measure: column_A_lead {
type: string
sql: ${TABLE}.column_A_lead ;;
}
}
Userlevel 6
Badge

Actually, there might be an easy solution. Normally, when joining in hits, it simply unnests the array.


You should be able to replace the code:


  join: hits {
view_label: "Session: Hits"
sql: LEFT JOIN UNNEST(${ga_sessions.hits}) as hits ;;
relationship: one_to_many
}

We can rework the the hits subtable to include the endtime. If you replace the code above

in the block with the code below, it will create an additional dimension: endtime that can be added to the hits view in the model.


    join: hits {
view_label: "Session: Hits"
sql:
LEFT JOIN UNNEST(ARRAY(
(SELECT STRUCT(
hitNumber
, time
, LEAD(time, 1) OVER(ORDER BY hitNumber) as endtime
, hour
, minute
, issecure
, isinteraction
, referer
, page
, transaction
, item
, contentinfo
)
FROM UNNEST(ga_sessions.hits)
)
))
as hits ;;
relationship: one_to_many
}

It is unfortunate that you can’t reference * in a STRUCT, but I just tested this and it should work.

This works, Thank you!! But it limited my use cases quite a bit because of the inability to reference * in the struct. If there is a better way to actually get all the fields from hits + some additional lead fields, then this would be golden!


For example: All my filters related to eventInfo are now resulting in error. It worked fine when i was using a simple join like


join: hits {
view_label: "Session: Hits"
sql: LEFT JOIN UNNEST(${ga_sessions.hits}) as hits ;;
relationship: one_to_many
}

This is the join I am using, adapted from the suggestions above:


join: hits {
view_label: "Session: Hits"
sql: LEFT JOIN UNNEST(ARRAY(
(SELECT STRUCT(
fullVisitorId
, visitId
, hitNumber
, LEAD(hitNumber, 1) OVER(PARTITION BY fullVisitorId,visitId, hitNumber ORDER BY hitNumber) as hitNumber_next
, max(hitNumber) OVER(ORDER BY hitNumber) as hitNumber_max
, time
, LEAD(time, 1) OVER(ORDER BY time) as time_end
, 1.0*(((LEAD(time, 1) OVER(ORDER BY time)) - time)/1000) as time_on_page
, hour
, minute
, issecure
, isinteraction
, referer
, page
, transaction
, item
, contentinfo
, type
)
FROM UNNEST(ga_sessions.hits)
)
)) as hits ;;
relationship: one_to_many
}

Would really appreciate some suggestions on how to not disturb the existing filters that are based on hits.

Userlevel 6
Badge

Looks like we are missing eventInfo from hits. Try adding:


join: hits {
view_label: "Session: Hits"
sql: LEFT JOIN UNNEST(ARRAY(
(SELECT STRUCT(
fullVisitorId
, visitId
, hitNumber
, LEAD(hitNumber, 1) OVER(PARTITION BY fullVisitorId,visitId, hitNumber ORDER BY hitNumber) as hitNumber_next
, max(hitNumber) OVER(ORDER BY hitNumber) as hitNumber_max
, time
, LEAD(time, 1) OVER(ORDER BY time) as time_end
, 1.0*(((LEAD(time, 1) OVER(ORDER BY time)) - time)/1000) as time_on_page
, hour
, minute
, issecure
, isinteraction
, referer
, page
, transaction
, item
, contentinfo
, type
, eventInfo ### <- ADD THIS
)
FROM UNNEST(ga_sessions.hits)
)
)) as hits ;;
relationship: one_to_many
}
Userlevel 6
Badge

So it turns out there is a way to easily do this. BigQuery has a SELECT AS STRUCT that makes this better.


Thanks to Elliot over at Google for the help (https://issuetracker.google.com/issues/64929067)


  join: hits {
view_label: "Session: Hits"
sql:
LEFT JOIN UNNEST(ARRAY(
(SELECT AS STRUCT
*
, LEAD(time, 1) OVER(ORDER BY hitNumber) as endtime
FROM UNNEST(ga_sessions.hits)
)
))

Reply