We have a lot of data that we would like to know the state of at a particular time. For instance, model A has dimensions X and Y, and I would like to know what X,Y were yesterday, two days ago, now, etc. Has anyone built a schema/lookml model that works well for these types of queries? I’d love to hear what strategies have been used.
Hey
Thanks
[edit:deleted in favor of Kevin’s comment below]
Include Start and End date for dimension row states in your dimension tables table (standard Type 2 SCD approach) - use a required As At
filter to bring back the dimension in the state at that time
Kallin,
You need both the “start” and “end” timestamps for two purposes:
- Filtering down the historical data to a specific point in time. Say I want to look at the state of things on Jan 1, 2017 at 12:00 AM. For any given record, I can only know it was effective at that moment if it started before it and ended after.
- Creating time-series data by joining against a time table. The state data table, on its own, does not allow you to answer questions like, “How many subscriptions were active day over day?” To accomplish this, you need to join the state data table to a time table, say a table of calendar dates, like so:
select dates.date
, count(*)
from state_data
inner join dates
on dates.date between state_data.start and state_data.end
where state_data.status = 'active'
group by 1
order by 1
Hope that helps! This pattern is incredibly powerful and is also a lot of fun 🙂
Reply
Enter your username or e-mail address. We'll send you an e-mail with instructions to reset your password.