Solved

Creating a window function inside a table calculation/custom measure


Hey all,


I’ve searched everywhere and couldn’t find an answer


i would like to emulate a window functions in one of my looks



as you see i have 2 dimensions: created_at and merchant_id

and one measure: URLs which is a STRING_AGG function


im trying to add a second measure that counts per each created_at_time the number of merchant_id’s per this timestamp


so if it was plain SQL id write: count(distinct merchant_id) over(partition by created_at)


i’ve seen 3 posts where @izzy says @chris.seymour says its possible but he never replied 😉


Thanks!

icon

Best answer by chris.seymour 2 April 2020, 18:04

Hey @Eran_Sagi,



It is indeed possible! Here’s some example code you can use in a table calculation:



group_start_row:


match(${orders.id}, ${orders.id})



next_group_start_row:


count(${orders.id}) - match(${orders.id}, offset(${orders.id}, count(${orders.id}) - row()*2 + 1)) + 2



You can replace orders.id with the dimension you want to group by (in this case the Created At Time dimension). From there, you can get the grouped count like this:



${next_group_start_row} - ${group_start_row}



Breaking this down, the match function gets the row number of the first row that contains the current row’s value. So if rows 7-10 contain 2020-02-20 16:40:38, the match function will return 7 for each of those rows, which gives us the group_start_row calculation.



The next_group_start_row calculation reverses the list and applies the match function to find the last row for each group, making a few adjustments in the arithmetic to get to the next_group_start_row from there. In this case it should return 11 for rows 7-10, since that is the first appearance of the next timestamp, 2020-02-20 16:40:37.



Once you have the starting rows of the current group and the next group, you can simply subtract the two to get the grouped count.



Let me know if you have any questions!

View original

11 replies

Userlevel 2

Hey @Eran_Sagi,


It is indeed possible! Here’s some example code you can use in a table calculation:


group_start_row:

match(${orders.id}, ${orders.id})


next_group_start_row:

count(${orders.id}) - match(${orders.id}, offset(${orders.id}, count(${orders.id}) - row()*2 + 1)) + 2


You can replace orders.id with the dimension you want to group by (in this case the Created At Time dimension). From there, you can get the grouped count like this:


${next_group_start_row} - ${group_start_row}


Breaking this down, the match function gets the row number of the first row that contains the current row’s value. So if rows 7-10 contain 2020-02-20 16:40:38, the match function will return 7 for each of those rows, which gives us the group_start_row calculation.


The next_group_start_row calculation reverses the list and applies the match function to find the last row for each group, making a few adjustments in the arithmetic to get to the next_group_start_row from there. In this case it should return 11 for rows 7-10, since that is the first appearance of the next timestamp, 2020-02-20 16:40:37.


Once you have the starting rows of the current group and the next group, you can simply subtract the two to get the grouped count.


Let me know if you have any questions!

Userlevel 7
Badge +1

😲

The dark secrets revealed!

It works!!! 🤩

Thank you @chris.seymour and @izzy

I highly recommend adding this to the official documentation!


BTW can i use this for other measures? ill use the above data as an example, if i want to smear the latest created_at_time per merchant_id max(created_at) over (partition by merchant_id)

can i use this logic?


Only thing left is to overcome the download limit when using Table Calculations 😥

image

Userlevel 7
Badge +1

I’m not sure about that specific scenario, but I know you can use it for other kinds of functions once you have the building blocks.


Grouped count:
${next_group_start_row} - ${group_start_row}

Grouped sum:
sum(offset_list(${products.retail_price}, -1 * (row() - ${group_start_row}), ${next_group_start_row} - ${group_start_row}))

Grouped running total:
sum(offset_list(${products.retail_price}, -1 * (row() - ${group_start_row}), row() - ${group_start_row} + 1))

Max date in Group (date must be sorted desc):
index(${products.date}, ${group_start_row})
Userlevel 3

Theres also this whole set of calcs to get the max numeric value in a group:


row_offset_till_group_start:
if(row()=${group_start_row},0,0-1*(${group_start_row}-row()))

offset_number:
${next_group_start_row}-row()+1

Grouped_list:
offset_list(${field},-${row_offset_till_group_start},${offset_number})

max_in_group:
max(${grouped_list})

Hi Guys,

I'm really new to the looker environment. I would like to know how I can insert this solution above in this screen that I sent the print? That is my environment.

 


This solution is exactly something that I really need, but I didn't understand where it is possible to apply these lines of my code.

I thank you for all your help.

I'm really new to the looker environment. I would like to know how I can insert this solution above in this screen that I sent the print? That is my environment.

 

Hey @vik_geiling , welcome

You can implement this piece of code on table calculations. Would be something like this:

 

Userlevel 6
Badge +1

Just want to mention that if your data has to be ordered by the measure, the groups, unfortunately, won’t make sense. Also if you have more than one dimension.

 

I  really can’t wait for window functions in table calculations. I usually have two dimensions  and one measure and would like to have % of total, % of total of the first dimension, and % of total of the second dimension..

Theres also this whole set of calcs to get the max numeric value in a group:

 

row_offset_till_group_start:

if(row()=${group_start_row},0,0-1*(${group_start_row}-row()))



offset_number:

${next_group_start_row}-row()+1



Grouped_list:

offset_list(${field},-${row_offset_till_group_start},${offset_number})



max_in_group:

max(${grouped_list})

Hi James, thanks for posting this. I was referring to the explore shared by Chris. If we are applying this to there, is there any way where we can find out which item is giving us maximum revenue

 

Theres also this whole set of calcs to get the max numeric value in a group:

 

row_offset_till_group_start:

if(row()=${group_start_row},0,0-1*(${group_start_row}-row()))



offset_number:

${next_group_start_row}-row()+1



Grouped_list:

offset_list(${field},-${row_offset_till_group_start},${offset_number})



max_in_group:

max(${grouped_list})

 

 

Edit: Ignore this, I realise I was mixing up the function at the end, it now works as intended

Hi Team,

Is there anyway we can calculate Rank dense in Looker, the default Rank function is Skip, I mean if I have same values Looker skips and generated next rank .instead on continuous Rank.


Looker Default : 1,1,3,3,5,6

Required : 1,1,2,2,3,4

Reply