Rolling average using offset_list in table calcula...

lindsey1 · 12-23-2015 12:21 PM

As of Looker 3.36, we have introduced a offset_list function. This function allows you to create a list from a group of rows in a column, and then aggregate over that list. Read more about how lists work here.

The offset function in table calculations allows you to reference a single value in a previous or following row. For example, you can calculate percent of previous using offset.

By comparison, offset_list allows you to reference a whole group of preceding or following rows in one step, and then perform functions on that group. This is particularly useful for something like a rolling average.

Let’s say I have this table that shows me the number of orders per day:

I want to calculate a rolling 7 day average of the number of orders per day. I can do this by using offset_list to grab the last 7 days of orders.

offset_list takes three arguments: the column you want to grab the offset values from, how far from the current row you want to start the offset, and how long you want the offset list to be. So in this case, I want to take the offset of ${orders.count}, I want it to start at the current row 0, and I want it to go for 7 rows.

Then I can take the average of that list for my rolling average.

mean(offset_list(${orders.count}, 0, 7))

This will give me a rolling average like so:

msh210

Note that if your rows are reversed and you want the rolling average to be for the rows preceding the current one, the second argument to offset_list will need to be negative. In the above example, that would be:

mean(offset_list(${orders.count}, -6, 7))

(-6 because you want the row that’s six rows before the present one. 7 because you want seven rows, from there to the present one inclusive.)

lindsey1

Great point Michael! Thanks for that. The direction of the offset does indeed depend on the sort order of your rows.

weitzenfeld

Any way to hack this so that if there’s a day with 0 orders - and therefore no row in the query results - the rolling 7 day average is still accurate?

Rick_Saporta

@weitzenfeld We have played around with creating a small table enumerating all of the days of interest and then doing a full-join.

The draw back here is in those cases where the look is filtered to a subset of the data, but is NOT filtered by date (ie there is no mindate). In this case, we get a bunch of empty rows at the top (or bottom) of our look.

lloydtabb

It is definitely more work, but this pattern also works for rolling averages and it handles the empty date problem.

Morgan1

I wanted to throw a note on here that dimension fill should solve the missing date problem!

dgroman1988

I love the table calculations, don’t get me wrong but does Looker plan on supporting measures of this type as to push the work into the database as opposed to the post query table calculations? I know of derived tables, but I’d be curious to know what types of complex aggregations are on the road map or if there are any.

lloydtabb

@dgroman1988 We’ve talked about how to map some of the table calculations to be calculated in the database. It becomes much more difficult with pivoted data. The are many calculations that aren’t provided by most of the databases. We’re still talking about it.

It is a different approach, but you can model this using native derived tables with SQL window functions. It is more cumbersome, but you won’t hit any limits. Look for window functions in the article below.

On complex aggregates, if you are using BigQuery, you can code your own complex aggregates and use them in Looker. Below are articles for a median function and a Top-N aggregate function in BigQuery. We generally support whatever is possible in the database.

TopN is a measure returns the top N results as strings.

dgroman1988

Thanks for much for the reply! You can never underestimate the amount of knowledge here on Discourse.

We’ve begun to combine window functions with liquid references and find that it has been a huge problem solver. The fact that we can use liquid to update our calculations to partition or group by different dimensions on the fly is so powerful as we only have to write and maintain a single derived view. I’m going to have to read the entire ‘Using Native Derived Tables’ article though to make sure I’m not missing out on anything else.

I did not know we could include BigQuery custom functions with the sql_preamble, so thank you for that! We have tinkered with these before and it’s great to know it’s available in Looker. I’m going to dig deep into the last two links you posted as I think they will allow us creative ways for tackling some of our use cases.

writchings

One method I’ve found to be useful and consistent, is to incorporate the row() function into your argument. In the example I’ve provided, I am using Table Calculations to provide a rolling monthly average. Instead of denoting rows by using values like zero ( 0 ), or seven ( 7 ), I use -row()+1 and row() to tell Looker to start the function at the first row, and calculate down to the current row. The formatting needs to be correct, but it is quick and easy to match your average to the frequency you are counting.

So the entire function I am using is mean(offset_list(${opportunities.count}, -row()+1, row)))

This will give me a rolling average like so:

Enjoy!

Rolling average using offset_list in table calculations (3.36+)