How to create a relative timeframe from column data and do a YoY comparison with it

  • 18 June 2020
  • 4 replies
  • 823 views

Userlevel 3

I recently created a Look that seems that it might not be common knowledge but is a possibly common scenario. The ability to set a relative date range that starts with a date within the database and ends some number of days, weeks, or months after. And use that range to do a year-over-year comparison.


Background


We have data that has unique dates per grouping. When creating a Look, the user should be able to set a timeframe window using this date as a “starting date” and the “end date” relative to a certain number of days/weeks/months. Any events that happen within that range would be counted. To take it one step further, we also wanted to do a year-over-year analysis of that range to compare before and after we launched.


Scenario


To make it easier, let’s pretend we are a service that makes riding buses easier. Each the cities already have existing state-run bus services and meaningful ridership data that they’d be willing to share. We are working with the state to launching to new cities. After some time, we’ve rolled out across multiple cities at different times and we want to report back to the state. Let’s limit it to just comparing the total number of riders before and after we launched across each city.


We’d likely create a look that filters for a single state and columns that are as follows: State, City, Launch Date, Total Riders. We wouldn’t be able to filter for a hard set date because it’s all relative to a single row’s date.


Solution


We can use parameters to allow the Look user to set the relative range and apply that directly to the SQL with liquid syntax. It should be noted that in this example, everything SQL is in RedShift.


Setting up the Parameters


parameter: timeframe_rage_number {
label: "Comparison: Unit Number"
description: "Number of date units (e.g. 90) that will be added to the start date and create an end date"
type: number
default_value: "90"
}
parameter: timeframe_rage_units {
label: "Comparison: Unit Type"
description: "Type of date units (e.g. days) that will be added to the start date and create an end date"
type: string
allowed_value: { label:"Day(s)" value: "day" }
allowed_value: { label:"Week(s)" value: "week" }
allowed_value: { label:"Month(s)" value: "month" }
allowed_value: { label:"Quarter(s)" value: "quarter" }
allowed_value: { label:"Year(s)" value: "year" }
default_value: "day"
}

By default, the timeframe_rage_number parameter is set to 90 and the timeframe_rage_units parameter is set to day. This will be inputted into the SQL to add 90 days to a date range. Within the Look, the Filter section will show these options and allow them to be changed on the fly.




2 - Setup Range & Check if an event is within it


dimension: has_before_events {
hidden: yes
type: yesno
sql: ${event_date} between dateadd(year, -1, ${launch_date}) and dateadd(year, -1, dateadd({% parameter timeframe_rage_units %}, {% parameter timeframe_rage_number %}, ${launch_date}));;
}
dimension: has_after_events {
hidden: yes
type: yesno
sql: ${event_date} between ${launch_date} and dateadd({% parameter timeframe_rage_units %}, {% parameter timeframe_rage_number %}, ${launch_date});;
}

After we’ve set the date ranges, we need to find the rows that have events date that fit within the range for both the before launch range and the after. Also, so it’s noted, I hide these parameters because they’re not used outside of other measures.


Also let’s breakdown the dateadd() function, which is Redshift (SQL) specific.


dateadd(
{% parameter timeframe_rage_units %}, # default: 90
{% parameter timeframe_rage_number %}, # default: 'day'
${launch_date}
)

Assuming default values, in SQL this will output as dateadd('day', 90, table.launch_date). While it says that a parameter requires a string, Looker is smart enough to convert to a number when appropriate (at least in this scenario).


Additionally, the dateadd(year, -1, ...) will remove a year from the date. So if the date was 2020-06-2019, this function will change it to 2019-06-2019




3 - Outputting a Measure


measure: before_total_rides {
label: "Before – Total Riders"
description: "Total number of riders who road within the set timeframe for before launch"
type: sum
sql: ${daily_rides} ;;
filters: [has_before_events: "yes"]
}
measure: after_total_rides {
label: "After – Total Riders"
description: "Total number of riders who road within the set timeframe for after launch"
type: sum
sql: ${daily_rides} ;;
filters: [has_after_events: "yes"]
}

Lastly, we want to take the numbers that fit within the range and roll it up. If you’re counting rows, then you’d instead set type: to count or count_dist. In this example, we assume that each row has a number of daily riders already calculated and we’re simply adding them up where data exists from our previous yesno filtering.


Output


+---------+--------------+-------------+-----------------------+----------------------+
| State | City | Launch Date | Before – Total Riders | After – Total Riders |
+---------+--------------+-------------+-----------------------+----------------------+
| Florida | Miami | 2019-02-11 | 153 | 234 |
| Florida | Orlando | 2019-02-18 | 101 | 150 |
| Florida | Jacksonville | 2019-04-01 | 89 | 100 |
+---------+--------------+-------------+-----------------------+----------------------+

I figured I’d be easier to make the numbers up in an ASCII table (sorry). For completeness, here’s what an output might look like with the above LookML.


Conclusion


I initially thought the best way would be to create a filter, but if a filter is not wrapped in a condition that is true, it seems it doesn’t return a value. I then thought a parameter with parameter_name._parameter_value setup, but it seems this only works within liquid syntax (like if statements) and doesn’t output in the SQL. And of course, after some testing, {% parameter parameter_name %} held the key.


I was also successful at setting up a simple user can set number for equations. So if we know that bus rides are typically 80% full and we had an average maximum capacity for the buses. We could to set that up as a parameter that defaults to 80 and set up a measure that has SQL something like this


measure: avg_occupacity {
type: number
sql: ({avg_bus_capacitiy} * {% parameter set_default_occupacity_pct %})/100.00;;
value_format_name: percent_0
}

Where set_default_occupacity_pct is a parameter set up the same as timeframe_rage_number. It’s important to note that you have to multiply by 100.00 not just 100 or you won’t see floating-point numbers.


4 replies

@mprogano this is great! Almost exactly what I’m looking for in terms of comparing relative date ranges for website launch dates and site traffic in a relative date after that. 
 

Did you consider creating a “days since launch” dimension or a similar surface to be able to compare daily measures across the relative date parameter? E.g. using the above if you wanted to see the daily rides per city for the first month after launch?
If so I’m struggling to figure out exactly how to do that next step and am curious if you went further with this and could help out?

Userlevel 3

@mprogano this is great! Almost exactly what I’m looking for in terms of comparing relative date ranges for website launch dates and site traffic in a relative date after that. 
 

Did you consider creating a “days since launch” dimension or a similar surface to be able to compare daily measures across the relative date parameter? E.g. using the above if you wanted to see the daily rides per city for the first month after launch?
If so I’m struggling to figure out exactly how to do that next step and am curious if you went further with this and could help out?

I’m not sure I follow. If you set the `timeframe_rage_units` parameter to 30 and `timeframe_rage_number` it will give you a time period from the launch date to 30 days into the future. This time range will then be compared to the year prior based the calculation that exists in `has_before_events`, more specifically the areas that reference `dateadd(year, -1, ...)`. If for some reason you wanted to compared a time period to the month before or X days before you can just modify this dimension.

 

Let me know if that helps!

@mprogano sorry for the delay on the response and confusing post.  I was trying to figure out a way to compare each day after launch between two cities for x number of days. For example if i wanted to see how ridership grew over the first 5 days since launch for Dallas and Austin (who launched on different days) I’d want a table like the following:

  Day 1 after launch day 2 after launch day 3 day 4 day 5 day n
Austin 10 15 20 40 50 x
Dallas 2 6 9 15 25 y


We ended up being able to create a derived table that uses a datediff between date of the rider and the launch date to create a table that has each city, launch date, and a count of days since launch that is working out. 

 

Does this help clarify what I was asking? Appreciate the post and response so thank you!

Userlevel 3

Does this help clarify what I was asking? Appreciate the post and response so thank you!

 

I think so, this sounds like a validate use-case for the OP (original post)

Reply