Question

Create a dimension that is lagged by one row

  • 21 August 2017
  • 5 replies
  • 3064 views

I am working with google analytics dataset in looker. I need to access the “hits.time” (start time of the page view). In order to get the “end time” of the page view, I need to get the “hits.time” (i.e. start time) of the next page view. These are off by a row. Any suggestions how I can create a dimension called “time_next” that is essentially the hits.time from the next page view in the sequence of page views.



In bigquery, I am able to do this by using this function:


LEAD(hits.time, 1) over(


PARTITION BY


fullVisitorId,


visitId


ORDER BY


hits.time ASC)


5 replies

Userlevel 6
Badge

Actually, there might be an easy solution. Normally, when joining in hits, it simply unnests the array.



You should be able to replace the code:



  join: hits {

view_label: "Session: Hits"

sql: LEFT JOIN UNNEST(${ga_sessions.hits}) as hits ;;

relationship: one_to_many

}



We can rework the the hits subtable to include the endtime. If you replace the code above


in the block with the code below, it will create an additional dimension: endtime that can be added to the hits view in the model.



    join: hits {

view_label: "Session: Hits"

sql:

LEFT JOIN UNNEST(ARRAY(

(SELECT STRUCT(

hitNumber

, time

, LEAD(time, 1) OVER(ORDER BY hitNumber) as endtime

, hour

, minute

, issecure

, isinteraction

, referer

, page

, transaction

, item

, contentinfo

)

FROM UNNEST(ga_sessions.hits)

)

))

as hits ;;

relationship: one_to_many

}



It is unfortunate that you can’t reference * in a STRUCT, but I just tested this and it should work.

Userlevel 6
Badge

So it turns out there is a way to easily do this. BigQuery has a SELECT AS STRUCT that makes this better.



Thanks to Elliot over at Google for the help (https://issuetracker.google.com/issues/64929067)



  join: hits {

view_label: "Session: Hits"

sql:

LEFT JOIN UNNEST(ARRAY(

(SELECT AS STRUCT

*

, LEAD(time, 1) OVER(ORDER BY hitNumber) as endtime

FROM UNNEST(ga_sessions.hits)

)

))

Userlevel 4
Badge

Hi @Vivek_Sinha,



We generally recommend building window functions in to derived tables (see this discourse article).



An example of that would be the following,



view: test {

derived_table: {

sql:

SELECT

column_A,

LEAD(column_A) OVER (ORDER BY column_A ASC) AS column_A_lead

FROM table

;;

}



dimension: column_A {

type: string

sql: ${TABLE}.column_A ;;

}



measure: column_A_lead {

type: string

sql: ${TABLE}.column_A_lead ;;

}

}

Userlevel 6
Badge

Looks like we are missing eventInfo from hits. Try adding:



join: hits {

view_label: "Session: Hits"

sql: LEFT JOIN UNNEST(ARRAY(

(SELECT STRUCT(

fullVisitorId

, visitId

, hitNumber

, LEAD(hitNumber, 1) OVER(PARTITION BY fullVisitorId,visitId, hitNumber ORDER BY hitNumber) as hitNumber_next

, max(hitNumber) OVER(ORDER BY hitNumber) as hitNumber_max

, time

, LEAD(time, 1) OVER(ORDER BY time) as time_end

, 1.0*(((LEAD(time, 1) OVER(ORDER BY time)) - time)/1000) as time_on_page

, hour

, minute

, issecure

, isinteraction

, referer

, page

, transaction

, item

, contentinfo

, type

, eventInfo ### <- ADD THIS

)

FROM UNNEST(ga_sessions.hits)

)

)) as hits ;;

relationship: one_to_many

}

This works, Thank you!! But it limited my use cases quite a bit because of the inability to reference * in the struct. If there is a better way to actually get all the fields from hits + some additional lead fields, then this would be golden!



For example: All my filters related to eventInfo are now resulting in error. It worked fine when i was using a simple join like



join: hits {

view_label: "Session: Hits"

sql: LEFT JOIN UNNEST(${ga_sessions.hits}) as hits ;;

relationship: one_to_many

}



This is the join I am using, adapted from the suggestions above:



join: hits {

view_label: "Session: Hits"

sql: LEFT JOIN UNNEST(ARRAY(

(SELECT STRUCT(

fullVisitorId

, visitId

, hitNumber

, LEAD(hitNumber, 1) OVER(PARTITION BY fullVisitorId,visitId, hitNumber ORDER BY hitNumber) as hitNumber_next

, max(hitNumber) OVER(ORDER BY hitNumber) as hitNumber_max

, time

, LEAD(time, 1) OVER(ORDER BY time) as time_end

, 1.0*(((LEAD(time, 1) OVER(ORDER BY time)) - time)/1000) as time_on_page

, hour

, minute

, issecure

, isinteraction

, referer

, page

, transaction

, item

, contentinfo

, type

)

FROM UNNEST(ga_sessions.hits)

)

)) as hits ;;

relationship: one_to_many

}



Would really appreciate some suggestions on how to not disturb the existing filters that are based on hits.

Reply