[Analytic Block] Sessionization


  • Anonymous
  • 0 replies

Hi anicia,

I’m trying to implement this block, but I found some problems. On your explore,

isn’t it supposed to be like this? I don’t think people will have an unique_session_id on their events table.


- join: sessions
relationship: many_to_one
type: left_outer
#sql_on: ${events.unique_session_id} = ${sessions.unique_session_id}
sql_on: ${event_mapping.unique_session_id} = ${sessions.unique_session_id}

Aside from that, I’m facing a lot of other errors when generating the derived tables. event_mapping was complaining that I couldn’t define created_at as a sortkey if it wasn’t being selected. After adding that, I got "events_with_session_info.created_at" must appear in the GROUP BY clause or be used in an aggregate function;

If I try to aggregate anything else on session_facts I get another error.


Is there an updated block for sessionalization?


Thank you


12 replies

Userlevel 3

Hey @angelim,


The unique session id is generated in the outer SELECT and is not apart of the dataset:


, ROW_NUMBER () OVER (ORDER BY find_idle_time.created_at) AS unique_session_id


The error you are seeing seems to be a missing field in the projection, thanks for catching that! We’ll update the example SQL soon.


Mike

I’d really appreciate if you guys would send me a working example of this block. There’s a lot of stuff I can’t figure out apart from what I’ve already mentioned. For instance, on session_facts you’re grouping by 5 columns, but columns number 7 and 8 are also dimensions, right? Won’t Redshift complain about that?

Hi anicia,


It got it working based on that very same example on learnbeta. Got help all the way from London 🙂

Unfortunately I forgot the analyst’s name, but he gave me access to learnbeta and pointed me in the right direction.


Thank you once again for the excellent service.


Regards

I"m having the same issue with generating the session_fact table. Redshift is saying that we need to group the date field (created_at) . Mine is called etl_tstamp.


Let me know if you can help.


Sonny

Userlevel 3

Hey @srivera,


It could be that in the window function the ORDER BY clause is still using “created_at”, this can be swapped out to etl_tstamp to match the fields in the underlying event table.

Hi anicia,


Thanks for sharing this, really useful. I implemented it in our Looker instance - as a quick check, I compared the number of unique sessions to Google Analytics sessions and see a vast difference (Segment has about 30% of GA sessions), even though Segment feeds data to GA as well. Any ideas on why there would be such a discrepancy? The session definition remains the same (<30 mins),

Userlevel 4
Badge

@akshay524 There was a bug in the old Segment block that was assuming all users had an id. We’ve since adjusted the LookML to adhere to the possibility of an anonymous_id. You’ll have to reach out to your Account Analyst, or help.looker.com in order to get the updated block

Is there a reason why this WHERE clause specifies both a >= and a < date?


WHERE (date_created) >= (DATEADD(day,-59, DATE_TRUNC('day',GETDATE()) )) 
AND (log.created_at) < (DATEADD(day,60, DATEADD(day,-59, DATE_TRUNC('day',GETDATE()) ) )))

Seems like just the first part is sufficient to limit the dataset to 60 days. But I can’t tell if I’m missing something or if this makes the query more performant.

Userlevel 7
Badge +1

Looks like in the first place that WHERE appears in the code, there’s a comment that says


 -- optional limit of events table to only past 60 days

Which seems to answer that question 🙂 Guess it’s optional and only if you do want to limit the events. I bet you’re right, though, and it’d make things faster.

Hi, thanks for sharing this! We have been using this block, but are in the midst of migrating to Snowflake. Do you have anything you can share that would allow this to run in Snowflake as it’s currently not able to do so.

Userlevel 5
Badge

Do you have specific error messages you can share?


At a quick glance, I think you’ll need to:




  • replace GETDATE() by current_date




  • replace DATE_ADD by date_add




  • Remove distribution & sortkeys parameters



Thanks, Cyril! For the most part those actually don’t seem to be causing issues. The initial error I’m seeing is below, which seems to be caused by this line that’s parsing out user_agent. I should preface that I did not initially create this so could be missing some context or knowledge of any functions that may have been run.


SQL : , public.base_web_user_agent(pages.context_user_agent) as user_agent


Error : SQL compilation error: Unknown user-defined functions PUBLIC.BASE_WEB_USER_AGENT, PUBLIC.USER_SESSION_PARTITION_KEY

Reply