Lookml/schema for historical data

  • 4 April 2017
  • 5 replies

Userlevel 1

We have a lot of data that we would like to know the state of at a particular time. For instance, model A has dimensions X and Y, and I would like to know what X,Y were yesterday, two days ago, now, etc. Has anyone built a schema/lookml model that works well for these types of queries? I’d love to hear what strategies have been used.

5 replies

Userlevel 4

Hey @Kallin_Nagelberg - great question. I’d encourage you to check out this block to deal with state data? Hopefully this helps answer some questions.

Userlevel 1

Thanks @Dillon_Morrison, that’s pretty interesting! Have you tried accomplishing the same thing using only one date column? What if you had just a ‘state_changed_time’ column, and then you only have to filter on the one value instead of filtering both start and end time on each query?

Userlevel 4

[edit:deleted in favor of Kevin’s comment below]

Userlevel 3

Include Start and End date for dimension row states in your dimension tables table (standard Type 2 SCD approach) - use a required As At filter to bring back the dimension in the state at that time


You need both the “start” and “end” timestamps for two purposes:

  1. Filtering down the historical data to a specific point in time. Say I want to look at the state of things on Jan 1, 2017 at 12:00 AM. For any given record, I can only know it was effective at that moment if it started before it and ended after.

  2. Creating time-series data by joining against a time table. The state data table, on its own, does not allow you to answer questions like, “How many subscriptions were active day over day?” To accomplish this, you need to join the state data table to a time table, say a table of calendar dates, like so:

, count(*)
from state_data
inner join dates
on between state_data.start and state_data.end
where state_data.status = 'active'
group by 1
order by 1

Hope that helps! This pattern is incredibly powerful and is also a lot of fun 🙂