[Analytic Block] Flexible Period-over-Period Analysis



Looker has created this block to make analyzing data easier and more efficient. This Data Block is made available on an “as is” basis, meaning there will not be updates moving forward.




This is an advanced analytic block that assumes an advanced understanding of LookML, Liquid and SQL.



What Is This Block and What Does It Tell Me?


Year-over-year reporting (and, more generally, period-over-period analysis) can be complex. Data that isn’t joined properly can cause fanout and/or nested loop joins, both of which can potentially cause performance issues. The risk of fanout is often addressed by writing several variant SQL queries that group the underlying data by the right date granularity. This works, but it requires writing and maintaining several sets of nearly identical logic.


Fear not! With a bit of wizardry and by using parameters, we can create one Explore to rule them all!


The example LookML code for this solution is in the code block below; but first, let’s take a look at what the end result is and how it all fits together.


When a user first arrives at the resulting period-over-period Explore, they will see this:



By leveraging the always_filter Explore parameter, the user is always presented with the three filters below, in which they must input values. However, the fields that make up these filters ([PoP] 1. Date Range, [PoP] 2. Break down date range by, [PoP] 3. Compare over) are hidden to avoid cluttering up the left-hand field picker. The first of these filters, [PoP] 1. Date Range, functions much like a typical date filter. However, it is also used to pick corresponding data in previous periods, by applying date transformations to the start and end of the filter range in the back-end LookML. The other two filters leverage LookML case parameters to control the values the user can choose from. With these two filters, we are asking the user to pick the granularity they want for their date aggregates, along with the time period over which they want to compare the data.
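For comparison, LookML also has a parameter field type whose allowed_value entries give a similar fixed dropdown. This is a swapped-in alternative, not what this block uses; the field name within_period_type_param is hypothetical, and the fragment would have to be merged into the pop view and referenced consistently. It assumes a string-typed parameter is inserted as a quoted literal such as 'week'.

```lookml
# Hypothetical alternative to a case-based dimension (not part of the original block)
parameter: within_period_type_param {
  label: "2. Break down date range by"
  type: string
  default_value: "week"
  allowed_value: { label: "quarter" value: "quarter" }
  allowed_value: { label: "month"   value: "month" }
  allowed_value: { label: "week"    value: "week" }
  allowed_value: { label: "day"     value: "day" }
  allowed_value: { label: "hour"    value: "hour" }
}

# Referenced in SQL the same way the block references its case dimension:
#   DATE_TRUNC({% parameter pop.within_period_type_param %}, order_items.created_at)
```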


That all sounds very abstract, so let’s list a few examples. A user might ask to see this data:



  • (1) The current month to date, (2) broken down by day, and (3) compared to the past month:




  • (1) The current year to date, (2) broken down by week, and (3) compared to last year:




  • (1) From Oct-1 to Mar-31, (2) broken down by month, and (3) compared to the prior year:




  • (1) The past 2 days, (2) broken down by hour, and (3) compared to the prior week:




  • A user can also change how many past periods to compare to, by bringing in an additional optional filter:




  • (1) The past 7 days, (2) broken down by day, and (3) compared to the 8 prior weeks:




  • (1) The past 7 days, (2) broken down by day, and (3) compared to the week 52 weeks ago:



Since this code block produces efficient joins, users are also able to select unrelated aggregates in an Explore for a given date range to compare them side by side, without causing prohibitive fanout in the SQL query:



Implementation


The logic in this block requires dialect-specific date functions. Looker does not automatically translate from dialect to dialect. As a result, you may need to adapt the syntax to your specific dialect. The code below is for Redshift.
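As an illustration of what such an adaptation might involve, here is an untested sketch of one fact-table join rewritten for PostgreSQL, which has no DATEADD function. It is not part of the original block. It assumes {% parameter ... %} renders as a quoted string such as 'week' (as the Redshift code appears to rely on), and it does not handle 'quarter', which to my knowledge PostgreSQL does not accept as an interval unit. The DATEDIFF call in the within_periods join would need its own rewrite.

```lookml
# Untested PostgreSQL sketch (assumption, not from the original block)
join: pop_order_items_created {
  type: left_outer
  relationship: many_to_one
  # Same order of operations as the Redshift version:
  # step back n "within" periods, then m "over" periods, then truncate.
  sql_on: pop_order_items_created.join_date = DATE_TRUNC({% parameter pop.within_period_type %},
      {% date_end pop.date_filter %}
      - ${within_periods.n} * ('1 ' || {% parameter pop.within_period_type %})::interval
      - ${over_periods.n} * ('1 ' || {% parameter pop.over_period_type %})::interval
    ) ;;
}
```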


Beyond dialect adaptation, in order to customize the block for your data, you will need to perform these steps:



  1. Change the connection name to your desired connection.

  2. Adapt (one or more copies of) the pop_* views in the example. These views define:

    • The underlying table

    • Which date field the table will be joined on

    • The aggregation to apply in the sub-queries




  3. Copy the joins in the pop_explore to point to as many pop_* views as you want to include.


The Code



Starting in Looker 7.4, the filters subparameter syntax has changed. See the always_filter parameter documentation page to view the new syntax.
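Under the newer syntax, the always_filter section near the end of the Explore in the code below would, as far as the documentation describes it, collapse into a single bracketed list. A sketch using the same fields and default values as this block:

```lookml
# Looker 7.4+ style: one filters list instead of repeated filters blocks
always_filter: {
  filters: [
    pop.date_filter: "last 12 weeks",
    pop.within_period_type: "week",
    pop.over_period_type: "year"
  ]
}
```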




connection: "your_connection_name"

# explore (pop_explore) is defined below the pop views

view: pop_order_items_created {
  view_label: "Order Items (By created)"

  # These are FYI - Currently, Looker does not substitute in sql_table_name, so you must do these by hand
  dimension: SQL_TABLE_NAME { sql: order_items ;; hidden: yes }
  dimension: date_field { sql: order_items.created_at ;; hidden: yes }
  dimension: join_date { sql: DATE_TRUNC({% parameter pop.within_period_type %},${date_field}) ;; hidden: yes }

  # Do these substitutions by hand
  # sql_table_name: (SELECT
  #     ${join_date} as join_date,
  #     ${agg_1_inner} as agg_1
  #   FROM ${SQL_TABLE_NAME}
  #   GROUP BY ${join_date}
  # ) ;;
  sql_table_name: (SELECT
      DATE_TRUNC({% parameter pop.within_period_type %},order_items.created_at) as join_date,
      COUNT(*) as agg_1,
      SUM(order_items.sale_price) as agg_2
    FROM order_items
    WHERE {% condition pop_order_items_created.sale_price %}order_items.sale_price{% endcondition %}
    GROUP BY 1
  ) ;;

  # OPTIONAL: Filter inner query on min/max dates (since query optimizer probably won't)

  # You can put aggregates and filters directly in the view file
  # Or, you can extend them from another view like this example, so you can join the same view on different
  # date fields
  extends: [pop_order_items_base]
}

view: pop_order_items_delivered {
  view_label: "Order Items (By delivered)"

  # These are FYI - Currently, Looker does not substitute in sql_table_name, so you must do these by hand
  dimension: SQL_TABLE_NAME { sql: order_items ;; hidden: yes }
  dimension: date_field { sql: order_items.shipped_at ;; hidden: yes }
  dimension: join_date { sql: DATE_TRUNC({% parameter pop.within_period_type %},${date_field}) ;; hidden: yes }

  # Do these substitutions by hand
  # sql_table_name: (SELECT
  #     ${join_date} as join_date,
  #     ${agg_1_inner} as agg_1
  #   FROM ${SQL_TABLE_NAME}
  #   GROUP BY ${join_date}
  # ) ;;
  sql_table_name: (SELECT
      DATE_TRUNC({% parameter pop.within_period_type %},order_items.shipped_at) as join_date,
      COUNT(*) as agg_1,
      SUM(order_items.sale_price) as agg_2
    FROM order_items
    WHERE {% condition pop_order_items_delivered.sale_price %}order_items.sale_price{% endcondition %}
    GROUP BY 1
  ) ;;

  # OPTIONAL: Filter inner query on min/max dates (since query optimizer probably won't)

  # You can put aggregates and filters directly in the view file
  # Or, you can extend them from another view like this example, so you can join the same view on different
  # date fields
  extends: [pop_order_items_base]
}

view: pop_order_items_base {
  extension: required

  filter: sale_price {
    type: number
  }

  # Do these substitutions by hand in sql_table_name
  # measure: agg_1_inner {
  #   hidden: yes
  #   sql: COUNT(*) ;;
  # }
  measure: agg_1 {
    type: number
    label: "Count"
    sql: SUM(${TABLE}.agg_1) ;;
  }

  # Do these substitutions by hand in sql_table_name
  # measure: agg_2_inner {
  #   hidden: yes
  #   sql: SUM(${SQL_TABLE_NAME}.sale_price) ;;
  # }
  measure: agg_2 {
    type: number
    label: "Total Amount"
    sql: SUM(${TABLE}.agg_2) ;;
  }
}

explore: pop_explore {
  from: pop
  view_name: pop

  join: within_periods { # No editing needed
    from: numbers
    type: left_outer
    relationship: one_to_many
    fields: []
    sql_on: ${within_periods.n}
      <= DATEDIFF( {% parameter pop.within_period_type %},{% date_start pop.date_filter %},{% date_end pop.date_filter %} )
      * CASE WHEN {% parameter pop.within_period_type %} = 'hour' THEN 24 ELSE 1 END ;;
  }

  join: over_periods { # No editing needed
    from: numbers
    view_label: "[PoP]"
    type: left_outer
    relationship: one_to_many
    sql_on:
      CASE WHEN {% condition pop.over_how_many_past_periods %} NULL {% endcondition %}
      THEN
        ${over_periods.n} <= 1
      ELSE
        {% condition pop.over_how_many_past_periods %} ${over_periods.n} {% endcondition %}
      END ;;
  }

  # Rename (& optionally repeat) below join to match your pop view(s)
  join: pop_order_items_created {
    type: left_outer
    relationship: many_to_one
    # Apply join name below in sql_on
    sql_on: pop_order_items_created.join_date = DATE_TRUNC({% parameter pop.within_period_type %},
        DATEADD({% parameter pop.over_period_type %}, 0 - ${over_periods.n},
          DATEADD({% parameter pop.within_period_type %}, 0 - ${within_periods.n},
            {% date_end pop.date_filter %}
          )
        )
      ) ;;
  }

  join: pop_order_items_delivered {
    type: left_outer
    relationship: many_to_one
    # Apply join name below in sql_on
    sql_on: pop_order_items_delivered.join_date = DATE_TRUNC({% parameter pop.within_period_type %},
        DATEADD({% parameter pop.over_period_type %}, 0 - ${over_periods.n},
          DATEADD({% parameter pop.within_period_type %}, 0 - ${within_periods.n},
            {% date_end pop.date_filter %}
          )
        )
      ) ;;
  }

  # No editing needed below
  always_join: [pop,within_periods,over_periods]
  always_filter: {
    filters: {
      field: pop.date_filter
      value: "last 12 weeks"
    }
    filters: {
      field: pop.within_period_type
      value: "week"
    }
    filters: {
      field: pop.over_period_type
      value: "year"
    }
  }
}

# The below views should not need editing (unless you want to add more than 52 periods)

view: numbers {
  sql_table_name: (
    SELECT 00 as n UNION ALL SELECT 01 UNION ALL SELECT 02 UNION ALL
    SELECT 03 UNION ALL SELECT 04 UNION ALL SELECT 05 UNION ALL
    SELECT 06 UNION ALL SELECT 07 UNION ALL SELECT 08 UNION ALL
    SELECT 09 UNION ALL SELECT 10 UNION ALL SELECT 11 UNION ALL
    SELECT 12 UNION ALL SELECT 13 UNION ALL SELECT 14 UNION ALL
    SELECT 15 UNION ALL SELECT 16 UNION ALL SELECT 17 UNION ALL
    SELECT 18 UNION ALL SELECT 19 UNION ALL SELECT 20 UNION ALL
    SELECT 21 UNION ALL SELECT 22 UNION ALL SELECT 23 UNION ALL
    SELECT 24 UNION ALL SELECT 25 UNION ALL SELECT 26 UNION ALL
    SELECT 27 UNION ALL SELECT 28 UNION ALL SELECT 29 UNION ALL
    SELECT 30 UNION ALL SELECT 31 UNION ALL SELECT 32 UNION ALL
    SELECT 33 UNION ALL SELECT 34 UNION ALL SELECT 35 UNION ALL
    SELECT 36 UNION ALL SELECT 37 UNION ALL SELECT 38 UNION ALL
    SELECT 39 UNION ALL SELECT 40 UNION ALL SELECT 41 UNION ALL
    SELECT 42 UNION ALL SELECT 43 UNION ALL SELECT 44 UNION ALL
    SELECT 45 UNION ALL SELECT 46 UNION ALL SELECT 47 UNION ALL
    SELECT 48 UNION ALL SELECT 49 UNION ALL SELECT 50 UNION ALL
    SELECT 51 UNION ALL SELECT 52 )
  ;;

  dimension: n {
    type: number
    hidden: yes
    sql: ${TABLE}.n ;;
  }
}

view: pop {
  sql_table_name: (SELECT NULL) ;;
  view_label: "[PoP]"

  dimension: reference_date_formatted {
    type: string
    order_by_field: reference_date
    label: "Reference date"
    sql: TO_CHAR(
        ${reference_date},
        CASE {% parameter pop.within_period_type %}
          WHEN 'year' THEN 'YYYY'
          WHEN 'month' THEN 'MON YY'
          WHEN 'quarter' THEN 'YYYY"Q"Q'
          WHEN 'week' THEN 'MM/DD/YY' --or 'YYYY"W"WW' or 'YY-MM"W"W'
          WHEN 'day' THEN 'MM/DD/YY'
          WHEN 'hour' THEN 'MM/DD HHam'
          ELSE 'MM/DD/YY'
        END)
    ;;
  }

  dimension: reference_date {
    hidden: yes
    # type: date_time <-- too aggressive with choosing your string formatting for you
    # type: date <-- too aggressive with truncating the time part
    # convert_tz: no
    # type: nothing <-- just right
    sql: DATE_TRUNC({% parameter pop.within_period_type %},DATEADD({% parameter pop.within_period_type %},0 - ${within_periods.n},{% date_end pop.date_filter %})) ;;
  }

  filter: date_filter {
    label: "1. Date Range"
    hidden: yes
    type: date
    convert_tz: no
  }

  dimension: over_period_type {
    label: "3. Compare over"
    hidden: yes
    type: string
    # Using case just to get friendlier UI experience in filters. Otherwise, could have a no-sql filter field
    case: {
      when: {
        sql: {% parameter pop.over_period_type %}='year' ;;
        label: "year"
      }
      when: {
        sql: {% parameter pop.over_period_type %}='quarter' ;;
        label: "quarter"
      }
      when: {
        sql: {% parameter pop.over_period_type %}='month' ;;
        label: "month"
      }
      when: {
        sql: {% parameter pop.over_period_type %}='week' ;;
        label: "week"
      }
      when: {
        sql: {% parameter pop.over_period_type %}='day' ;;
        label: "day"
      }
    }
  }

  dimension: within_period_type {
    label: "2. Break down date range by"
    hidden: yes
    type: string
    # Using case just to get friendlier UI experience in filters. Otherwise, could have a no-sql filter field
    case: {
      when: {
        sql: {% parameter pop.within_period_type %}='quarter' ;;
        label: "quarter"
      }
      when: {
        sql: {% parameter pop.within_period_type %}='month' ;;
        label: "month"
      }
      when: {
        sql: {% parameter pop.within_period_type %}='week' ;;
        label: "week"
      }
      when: {
        sql: {% parameter pop.within_period_type %}='day' ;;
        label: "day"
      }
      when: {
        sql: {% parameter pop.within_period_type %}='hour' ;;
        label: "hour"
      }
    }
  }

  filter: over_how_many_past_periods {
    label: "Override past periods"
    description: "Apply this filter to change which past periods to compare to (from the default of current vs 1 period ago)"
    type: number
    default_value: "<=1"
  }

  dimension: over_periods_ago {
    label: "Prior Periods"
    description: "Pivot me!"
    sql: CASE ${over_periods.n}
        WHEN 0 THEN 'Current '||{% parameter pop.over_period_type %}
        WHEN 1 THEN ${over_periods.n}||' '||{% parameter pop.over_period_type %} || ' prior'
        ELSE ${over_periods.n}||' '||{% parameter pop.over_period_type %} || 's prior'
      END ;;
    order_by_field: over_periods.n
  }
}


For further reading on period-over-period modeling, also see the following Help Center articles:



38 replies

I used the period over period structure above and combined it with the join on false approach here to do something similar on Google BigQuery.


Goal: Compare any two arbitrary periods from the same table with varying levels of granularity (e.g. broken up by day, month, quarter, year, etc.)


Examples below are based on weather data but the approach can be adapted to any table that has date and a measure of interest.


Example 1: Comparing two months (July, 2018 and March, 2018) at the day level



Example 2: Comparing two quarters (Q1, 2018 and Q3, 2018) at the month level



LookML Highlights:



  1. sql_on: FALSE - when using an outer join it accomplishes the equivalent of a wide union, think of a diagonal table with lots of nulls

  2. extracting info from Looker-generated predicate after it translates liquid condition - allows us to determine when both of the arbitrary periods start

  3. using a parameter to label a dimension - shows Week/Day/etc. in the viz based on user’s input


Gist to LookML


Hope others find it useful!

Hey guys,


I’m trying to implement this code in our MS SQL Server environment and I get the error

“always_join: Unknown Join pop”


Not sure what causes this and how I can fix it…


That error says to me that there’s something iffy with the way you’ve defined your views. In the code in the top level post here, at the very end, there’s a view defined called pop that’s joined into some explores:


view: pop {
sql_table_name: (SELECT NULL) ;;
view_label: "[PoP]"
etc etc

Is that also present in your LookML? If you’ve renamed it, then that would also break the always_join reference to it.

Hey Miguel, I like your solution but could you please elaborate on your example model with the weather_raw measure and weather_date dimension? I tried to create the weather_raw as an average temperature and create a weather_date dimension but I received an error “Field references an aggregate but is specified as a “dimension”. If you want to use aggregations such as sum, average, count, use a measure type instead.”

@fabio: What does this piece of code actually do? I’m used to using SQL_TABLE_NAME as a reference to a derived table as in ${some_pdt.SQL_TABLE_NAME}



Thanks,


That declaration, and the two that follow it are 100% informational for the person implementing the code as to how to do the substitutions in the following code block.




I named it like the existing SQL_TABLE_NAME gesture because it is doing the same thing, providing the name of the table that should be referenced.


Final note, this was necessary back in the day to work around the fact that Looker would not do substitutions inside of the view>sql_table_name parameter. There are surely cleaner ways of implementing this now, but anyway I usually suggest to people not to use this pattern anymore and instead to rely on pivots + built-in datepart dimensions, like day_of_month (along with a solution like “outer join on false” or “join paths” if they need to combine datasets without fanout).

@fabio Would you be willing to post the “cleaner” way to implementing this code now? I would greatly appreciate it.


I mostly recommend customers away from this approach nowadays and instead suggest using Looker’s default dateparts together with pivoting for a better user experience, and then using something like my join paths approach if there is a need to combine multiple fact tables. This allows for less manually written SQL and better drill downs.


If you are set on using the “PoP” approach in this article, the thing I was alluding to before is that I believe you should now be able to use, for example, ${order_items.SQL_TABLE_NAME} inside of the view>sql_table_name parameter
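A minimal sketch of what that substitution might look like, assuming a regular view named order_items exists in the same model (a hypothetical name here) and that your Looker version resolves the reference inside sql_table_name. It has not been tested against the block:

```lookml
view: pop_order_items_created {
  # ${order_items.SQL_TABLE_NAME} stands in for the hard-coded table name;
  # order_items is an assumed, pre-existing view in the model.
  sql_table_name: (SELECT
      DATE_TRUNC({% parameter pop.within_period_type %}, order_items.created_at) AS join_date,
      COUNT(*) AS agg_1,
      SUM(order_items.sale_price) AS agg_2
    FROM ${order_items.SQL_TABLE_NAME} AS order_items
    GROUP BY 1
  ) ;;

  extends: [pop_order_items_base]
}
```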

Hi fabio - thanks very much for this. I was wondering if you could expand on your statement, “… suggest using Looker’s default dateparts together with pivoting…” or point me to the documentation that discusses this as a way to do PoP analysis, please?


For example, you can select “month of year”/“monthname” as a dimension and “year” as a pivot.


When doing this, I find it helps to put the two classes of date parts (i.e., period, and within-period) into two separate view labels.
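As an illustration of the two-view-label idea, here is a sketch with hypothetical view, field and label names (not taken from any real model). It relies only on the built-in timeframes of a time-typed dimension_group:

```lookml
view: order_items {
  # Period: pivot on one of these (e.g. Created Period Year)
  dimension_group: created_period {
    view_label: "Order Items: Period"
    type: time
    timeframes: [year, quarter, month]
    sql: ${TABLE}.created_at ;;
  }

  # Within period: use one of these as the row dimension
  dimension_group: created_within {
    view_label: "Order Items: Within Period"
    type: time
    timeframes: [month_name, month_num, day_of_month, day_of_year]
    sql: ${TABLE}.created_at ;;
  }

  measure: count {
    type: count
  }
}
```

In an Explore built on such a view, selecting Created Within Month Name as the dimension, pivoting on Created Period Year and adding Count would give a year-over-year comparison by month.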


It can also help to create YTD, MTD, etc. filters. Here are a couple of examples of that:





Hi, I’m trying to adapt this for PostgreSQL but I’m stuck!

It would be great if you could post the equivalent code, if it exists.

Thanks

I have used the same logic mentioned above for the Redshift dialect, and I am seeing 7 months when I choose “is in the past: 6 months”. The same thing happens when I select complete months: I am able to see May 2021. The same is the case for complete quarters as well. Has anyone faced the same issue?

 

Thanks for sharing, @fabio. Can you help me with the scenario below?

I need help getting the past 4 quarters of data based on the filter selection on a dashboard.

In the source table, we have data at the quarter level (Year and Quarter columns, along with other measures).

Example:
Year  Quarter Orders
2019  1        100
2019  2        200
2019  3        50
2019  4        90
2020  1        300


On the dashboard, there is a filter in which the user selects the quarter (the filter has values like 2019Q1, 2019Q2 and so on, and uses a dimension built from Year and Quarter).
Once the user selects the quarter, the visualization should show only the past 4 quarters of data.

I’m new to Looker, so any help here would be appreciated.
