Use Refinements to add Period over Period functionality to existing explores

  • 9 July 2020
  • 4 replies
  • 1662 views

Userlevel 2

Using Refinements to add Period over Period functionality to existing explores

 

In this example, we’ll use refinements to implement a commonly requested explore feature: Period over Period analysis.

 

 














 
Background on Refinements in general Background on Period over Period
With refinements, feature logic can be kept fully separate from core logic, yet can still be added to the original objects (where desired). For example, with this pattern you will be adding Period over Period capability seamlessly onto existing date fields in your existing explores, without complicating your core objects.

For additional explanation and background, see the lookml refinements docs page and Fabio’s Organize Lookml with Refinements.
Period over Period is a oft-requested capability, and several other Looker methods have been described in other articles such as Flexible Period-over-Period Analysis, and An explore filter based approach, among others.

This approach should be simple to implement, and will not need special maintenance if your explore changes. For users, it is flexible without being overly complex.

 

What Will This Period over Period Interface Look Like?

 

The user will pivot on a special new dimension, and will have the option to customize the period length and # of periods to show

 

 

 

 

basic pop example

 


Expand to see two other examples including optional parameters

Usage

 

 

 

Usage

 

 

 

Benefits of this approach

 

 



  • Includes similar benefits to the most comprehensive methods

    • Allows many periods
    • Allows several choices of period sizes
    • Support multiple variations of Pop such as different base date fields, even within the same explore.

       

     
  • Additional benefits of this method:

    • Layers on top of existing explores and reporting, rather than requiring new explores or many new fields. Existing drills, etc, should play nice
    • Very simple to maintain & scale: No derived table sql that needs to change when new fields are added, and measures do not need to be redefined anywhere for them to work with POP
    • Intuitive for Users: Enable smart period length default
    • Uses Looker automatic date function syntax generation so you should be able to avoid dialect specific syntax changes you may require with other methods.

       


     

 

 

Conceptual Explanation of this approach for Period over Period

 

 

Presume we know the period size and the number of periods the user wants for comparison. We will then:

 

 


  1. Make duplicates of the original dataset (conceptually) - one for each period.
  2. Offset the duplicated data according to how many periods ago it needs to represent and the period size.

     

 

 

Result: Each prior period’s data is correctly re-aligned to the corresponding date in the Current Period.

 

 

With these steps completed, everything else your in your explore can work smoothly even with PoP applied because the fanned out periods will fall naturally into corresponding pivot columns, and the grain of your results will be otherwise completely unchanged. Everything else will group and calculate just as before.

 

 

The same steps represented in an image:
 

 

Conceptual Daigram

 

 

Same steps in sql/query-ish steps:

 

 


  1.  

    Join a pop_support view

     

    • (one row for each selected period, effectively duplicating data for each row)

       

  2.  

    Refined_Date_Field = Original_Data_Date + ( [pop_support.number_periods_ago] * [Selected Period Length] )

     

    • Achieved by wrapping the date offset functions around your date field’s ${EXTENDED} keyword (it’s original sql).

       


     

 

 

Actual Implementation Steps

 

 

Prep) Paste some Period over Period Support LookML in a new file.

 

 

Note: This is generic code which you will be able to re-use with any number of pop enabled fields.

 


Period over Period Support LookML

#You should not need to modify the code below. Save this code in a file and include that file wherever needed (i.e. in your refinement that leverages this pop support logic) view: pop_support { derived_table: { sql: select periods_ago from ( select 0 as periods_ago {% if periods_ago._in_query%}{%comment%}extra backstop to prevent unnecessary fannouts if this view gets joined for any reason but periods_ago isn't actually used.{%endcomment%} {% for i in (1..52)%} union all select {{i}}{%endfor%}{%comment%}Up to 52 weeks. Number can be set higher, no real problem except poor selections will cause a pivot so large that rendering will get bogged down{%endcomment%} {%endif%} ) possible_periods where {%condition periods_ago_to_include%}periods_ago{%endcondition%} {% if periods_ago_to_include._is_filtered == false%}and periods_ago <=1{%endif%}{%comment%}default to only one prior period{%endcomment%} ;; } dimension: periods_ago {hidden:yes type:number} filter: periods_ago_to_include { label: "PoP Periods Ago To Include" description: "Apply this filter to specify which past periods to compare to. Default: 0 or 1 (meaning 1 period ago and 0 periods ago(current)). You can also use numeric filtration like Less Than or Equal To 12, etc" type: number default_value: "0,1" } parameter: period_size { label: "PoP Period Size" description: "The defaults should work intuitively (should align with the selected dimension, i.e. the grain of the rows), but you can use this if you need to specify a different offset amount. For example, you might want to see daily results, but compare to 52 WEEKS prior" type: unquoted allowed_value: {value:"Day"} allowed_value: {value:"Month"} allowed_value: {value:"Year"} # allowed_value: {value:"Week"} # allowed_value: {value:"Quarter"} # other timeframes could be handled with some adjustments, but may not be universally supported for each dialect and may be superfluous to users allowed_value: {value:"Default" label:"Default Based on Selection"} default_value: "Default" } dimension: now_sql { type: date_raw expression: now();; } dimension: now_converted_to_date_with_tz_sql { hidden: yes type: date expression: now();; } dimension: pop_sql_years_using_now {type: date_raw expression: add_years(${periods_ago},${now_sql});;}#use looker expressions to get dialect specific sql for date add functions dimension: pop_sql_months_using_now {type: date_raw expression: add_months(${periods_ago},${now_sql});;} dimension: pop_sql_days_using_now {type: date_raw expression: add_days(${periods_ago},${now_sql});;} dimension: period_label_sql { hidden:yes expression: if(${pop_support.periods_ago}=0," Current" , concat( ${pop_support.periods_ago}," REPLACE_WITH_PERIOD" ,if(${pop_support.periods_ago}>1,"s","") ," Prior" ) );; } }

 

 

Step 1) Paste the generic pop_support join in an explore you want to add pop to.

 


Explore/Join LookML

In our example, we will be refining an existing order_items view, and order_items is the base view of our explore. However, note that your PoP refinement could be similarly added to any existing explore that includes the refined view, regardless of what other joins or features are in the explore.

 



connection: "your_connection"

explore: order_items {

#your other joins, etc, would typically go here

#To enable pop, you'll paste this join to your explore definition

join: pop_support {

view_label: "PoP Support - Overrides and Tools" #(Optionally) Update view label for use in this explore here, rather than in pop_support view. You might choose to align this to your POP date's view label.

relationship:one_to_one #we are intentionally fanning out, so this should stay one_to_one

sql:{% if pop_support.periods_ago._in_query%}LEFT JOIN pop_support on 1=1{%endif%};;#join and fannout data for each prior_period included **if and only if** lynchpin pivot field (periods_ago) is selected. This safety measure ensures we dont fire any fannout join if the user selected PoP parameters from pop support but didn't actually select a pop pivot field.

}

#(Optionally): Update this always filter to your base date field to encourage a filter. Without any filter, 'future' periods will be shown when POP is used (because, for example: today's data is/will be technically 'last year' for next year)

#always_filter: {filters: [order_items.created_date: "before 0 minutes ago"]}

}

include: "/your_refinement_file" #!Update this to match the file name you used in step 2



 

 

Step 2) Refine your view that has the base date field

 

 

Paste the refinement template below and then update a few references (to match your existing view name and date field name), after which the prepared PoP feature logic will be applied to your existing date field.

 


See Refinement Template

Note that, in this code block, the lines where you need to update references to match to your existing objects are left aligned.

 



include: "/pop_support__template" #include the helper fields that are core to the PoP implementation - meaning: include the file in which you pasted the code as described in Step 1 above.

include: "/your_original_view_file.view" #!Include the file that defines your base view here so you can refine it



view: +order_items {#!Update to point to your view name (with the '+' making it a refinement). That view's file must be included here, and then THIS file must be included in the explore



#Refine YOUR date field by simply updating the dimension group name to match your base date field

dimension_group: created {

convert_tz: no #we need to inject the conversion before the date manipulation

datatype: datetime

sql:{% assign now_converted_to_date_with_timezone_sql = "${pop_support.now_converted_to_date_with_tz_sql::date}" %}{% assign now_unconverted_sql = pop_support.now_sql._sql %}{%comment%}pulling in logic from pop support template, within which we'll inject the original sql. Use $ {::date} when we want to get looker to do conversions, but _sql to extract raw sql {%endcomment%}
{% assign selected_period_size = selected_period_size._sql | strip %}
{%if selected_period_size == 'Day'%}{% assign pop_sql_using_now = "${pop_support.pop_sql_days_using_now}" %}{%elsif selected_period_size == 'Month'%}{% assign pop_sql_using_now = "${pop_support.pop_sql_months_using_now}" %}{%else%}{% assign pop_sql_using_now = "${pop_support.pop_sql_years_using_now}" %}{%endif%}
{% assign my_date_converted = now_converted_to_date_with_timezone_sql | replace:now_unconverted_sql,"${EXTENDED}" %}
{% if pop_support.periods_ago._in_query %}{{ pop_sql_using_now | replace: now_unconverted_sql, my_date_converted }}
{%else%}{{my_date_converted}}
{%endif%};;#wraps your original sql (i.e. ${EXTENDED}) inside custom pop logic, leveraging the parameterized selected-period-size-or-smart-default (defined below)

}



#Selected Period Size sets up Default Period Lengths to use for each of your timeframes, if the user doesn't adjust the PoP period size parameter

#If you only wanted YOY to be available, simply hard code this to year and hide the timeframes parameter in pop support

dimension: selected_period_size {

hidden: yes

sql:{%if pop_support.period_size._parameter_value != 'Default'%}{{pop_support.period_size._parameter_value}}

{% else %}

{% if

created_date._is_selected %}Day

{% elsif

created_month._is_selected %}Month

{% else %}Year

{% endif %}

{% endif %};;#!Update the liquid that mentions created_date and created_month to point to your timeframes, and potentially add more checks for other timeframes, and to consider other pop refined date fields within this view (if any)

}



dimension: created_date_periods_ago_pivot {#!Update to match your base field name. This is generic sql logic (so you might expect it to be in pop_support template), but it is helpful to manifest this lynchpin pivot field here so we can create a dedicated pivot field in this specific date dimension's group label.

label: "{% if _field._in_query%}Pop Period (Created {{selected_period_size._sql}}){%else%} Pivot for Period Over Period{%endif%}"#makes the 'PIVOT ME' instruction clear in the field picker, but uses a dynamic output label based on the period size selected

group_label: "Created Date" #!Update this group label if necessary to make it fall in your date field's group_label

order_by_field: pop_support.periods_ago #sort numerically/chronologically.

sql:{% assign period_label_sql = "${pop_support.period_label_sql}" %}{% assign selected_period_size = selected_period_size._sql | strip%}{% assign label_using_selected_period_size = period_label_sql | replace: 'REPLACE_WITH_PERIOD',selected_period_size%}{{label_using_selected_period_size}};;#makes intuitive period labels

}



# Optional Validation Support field. If there's ever any confusion with the results of PoP, it's helpful to see the exact min and max times of your raw data flowing through.

# measure: pop_validation {

# view_label: "PoP - VALIDATION - TO BE HIDDEN"

# label: "Range of Raw Dates Included"

# description: "Note: does not reflect timezone conversion"

#sql:{%assign base_sql = '${TABLE}.created_at'%}concat(concat(min({{base_sql}}),' to '),max({{base_sql}}));;#!Paste the sql parameter value from the original date fields as the variable value for base_sql

# }

}

 

 

Wrapup) Validate and ensure includes statements are updated as necessary based on your specific file names.

 

 

Known Challenges

 

 


  • If your base date field’s sql include double quotes, this can impact the liquid based formula construction. Double quotes should be avoidable in most dialects, by using [ ] instead of quotes or renaming the physical column without reserved characters.
  • We have used lookml_field_type_references ::date and ::datetime to pass sql logic between fields with converting dates to strings.  These references have surfaced datatype errors in certain dialects with certain base data types. Depending on your dialect and raw date field’s datatype, you may need to switch ::datetime to ::date or in some cases you can remove the ::datetime or ::date entirely.

     

 

 


Expand for additional complexities and considerations


- This process will show ‘Prior data for future periods’ which have not yet come to pass. Technically this is an accurate representation of the data but may be distracting to users. You may choose to apply the optional always filter or some other filtration to suppres this.




- Timezone conversion needs to happen BEFORE date manipulation in order to maintain correct groupings, so we apply timezone conversion with liquid instead of letting looker do convert_tz:yes. This adds to the complexity of the refinement logic, and may be removeable if you don't use timezone conversions, though it should do no harm.




- PoP functionality can be added on additional date fields, though it requires some care to avoid name collisions on pop pivot dimensions you create.




- Watch out for Database syntax differences... Although we have used looker expressions to leverage looker's built in dialect specific date function handling, some dialects may have other limitations not yet identified such as complaints about the fannout join (you may try type:cross instead), etc.

 

Closing

 

 

I hope this pattern helps your team quickly implement Period over Period functionality without complicating your base code or adding new explores just for PoP.

 

 

Additionally, I hope this inspires you to use refinements for better code organization and feature management.

 

 

Let us know how you used this pop approach or refinements in your case!

 


4 replies

We’ve been trying this and it looks really useful. However we’ve been struggling to get it to work if BigQuery when the table has Require partition filter turned on (which most of our tables do). For partition pruning BQ does not like it when, in the where clause, the dt is pruned using a non-constant expression. In the where clause we get something like this:

DATE_ADD(dt , INTERVAL pop_support.periods_ago MONTH)

which Bigquery can’t use to prune it’s partitions as the pop_support.periods_ago isn’t constant.
https://cloud.google.com/bigquery/docs/querying-partitioned-tables#pruning_limiting_partitions .

For tables that don’t have the require partition filter option the query results in a full table scan as BQ can prune any partitions.

One way we’re looking to get round this is to create a derived date table, which we create using Liquid, that has the absolute value of the periods_ago rather than being a reference, and then inner joining that to our fact table.

 

Not sure if anyone else has attempted this or has a work around/suggestions for BQ?

Userlevel 2

We’ve been trying this and it looks really useful. However we’ve been struggling to get it to work if BigQuery when the table has Require partition filter turned on (which most of our tables do). For partition pruning BQ does not like it when, in the where clause, the dt is pruned using a non-constant expression. In the where clause we get something like this:

DATE_ADD(dt , INTERVAL pop_support.periods_ago MONTH)

which Bigquery can’t use to prune it’s partitions as the pop_support.periods_ago isn’t constant.
https://cloud.google.com/bigquery/docs/querying-partitioned-tables#pruning_limiting_partitions .

For tables that don’t have the require partition filter option the query results in a full table scan as BQ can prune any partitions.

One way we’re looking to get round this is to create a derived date table, which we create using Liquid, that has the absolute value of the periods_ago rather than being a reference, and then inner joining that to our fact table.

 

Not sure if anyone else has attempted this or has a work around/suggestions for BQ?

Hello Mike. 

As you’ve mentioned, this solution uses date_add sql functions as a core component. Date add functions are supported by a variety of sql dialects, and allows us to rely on the database to handle the complexities, things like adding Months when there’s leap day involved, etc.  Another benefit of this approach is that normal Looker filters set on the refined date field automatically work as you would expect: limiting raw data to the selected period and the ‘duplicates of the original dataset for prior periods’ to the corresponding records.

That said, for your specific scenario, it appears that we would need filters applied directly to the partitioned field, and so we will need filters to be different and specific for every ‘duplicate of the original dataset for prior period’. 


Full disclosure, I have not implemented this approach, but I do see a path forward. The path involves replacing the cross-join component of this solution with a series of unions, each filtered on the partition field.  You would update the source of the view that the date field is in, using liquid to generates a union statement for each prior period (not dissimilar to how the cross join’s source uses a liquid for loop). 

Unfortunately, the challenge then will be to set the correct filter range for each union statement (without relying date_add in the filter) - instead of offsetting the raw dates and then filtering, we need to offset the start_date and end_date of the user’s filter, in the sql generation itself, also using liquid (see https://docs.looker.com/reference/liquid-variables).  There is no date_add equivalent in liquid which allows you specify date-parts - so I believe you’ll need to either

  1. decompose, manipulate, and recompose components of the date (which is likely prone to edge case problems like you’ll offset leap day to a prior year where 2/29 does not exist), or
  2. limit support to timeframes that can be defined in terms of a consistent number of seconds, like days, weeks and ‘52 week years’.  For this you will use use the %s liquid date format, add seconds corresponding to the period length, and then reset to filterable standard sql date format.

I hope these ideas help!  Let us know how it goes!

Hi Kevin,

This is a great help, I really like how it incorporates the Pop elements into the original date dimension group. 
I’m having issues implementing this, particularly using “${EXTENDED}” within Liquid. I’m getting a LookML Depreciation warning:

“This liquid does not conform to the Liquid language spec and will produce warnings or errors in a future version of Looker. Parse error was: Error parsing liquid: Liquid parse exception: parser error "extraneous input 'TRANSACTION_DATE' expecting {TagEnd, '|'}"
(with TRANSACTION_DATE being my original dimension group)

Furthermore, using only the original date field without a pivot in Explore fails. I can see in the SQL compilation error that it’s failing to properly passing the Extended sql from the original.  Any ideas on how ${EXTENDED} should now be formatted within liquid ?

 

Userlevel 2

Hello Dave,

I’m not 100% sure but this sounds like it relates to this issue below (copied from known issues section).  

  • If your base date field’s sql include double quotes, this can impact the liquid based formula construction. Double quotes should be avoidable in most dialects, by using [ ] instead of quotes or renaming the physical column without reserved characters.

Please let us know whether this helps resolve the issue

Reply