Solved

Creating a window function inside a table calculation/custom measure

  • 30 March 2020
  • 7 replies
  • 3642 views

Hey all,


I’ve searched everywhere and couldn’t find an answer


i would like to emulate a window functions in one of my looks



as you see i have 2 dimensions: created_at and merchant_id

and one measure: URLs which is a STRING_AGG function


im trying to add a second measure that counts per each created_at_time the number of merchant_id’s per this timestamp


so if it was plain SQL id write: count(distinct merchant_id) over(partition by created_at)


i’ve seen 3 posts where @izzy says @chris.seymour says its possible but he never replied 😉


Thanks!

icon

Best answer by chris.seymour 2 April 2020, 18:04

Hey @Eran_Sagi,



It is indeed possible! Here’s some example code you can use in a table calculation:



group_start_row:


match(${orders.id}, ${orders.id})



next_group_start_row:


count(${orders.id}) - match(${orders.id}, offset(${orders.id}, count(${orders.id}) - row()*2 + 1)) + 2



You can replace orders.id with the dimension you want to group by (in this case the Created At Time dimension). From there, you can get the grouped count like this:



${next_group_start_row} - ${group_start_row}



Breaking this down, the match function gets the row number of the first row that contains the current row’s value. So if rows 7-10 contain 2020-02-20 16:40:38, the match function will return 7 for each of those rows, which gives us the group_start_row calculation.



The next_group_start_row calculation reverses the list and applies the match function to find the last row for each group, making a few adjustments in the arithmetic to get to the next_group_start_row from there. In this case it should return 11 for rows 7-10, since that is the first appearance of the next timestamp, 2020-02-20 16:40:37.



Once you have the starting rows of the current group and the next group, you can simply subtract the two to get the grouped count.



Let me know if you have any questions!

View original

7 replies

Userlevel 2

Hey @Eran_Sagi,


It is indeed possible! Here’s some example code you can use in a table calculation:


group_start_row:

match(${orders.id}, ${orders.id})


next_group_start_row:

count(${orders.id}) - match(${orders.id}, offset(${orders.id}, count(${orders.id}) - row()*2 + 1)) + 2


You can replace orders.id with the dimension you want to group by (in this case the Created At Time dimension). From there, you can get the grouped count like this:


${next_group_start_row} - ${group_start_row}


Breaking this down, the match function gets the row number of the first row that contains the current row’s value. So if rows 7-10 contain 2020-02-20 16:40:38, the match function will return 7 for each of those rows, which gives us the group_start_row calculation.


The next_group_start_row calculation reverses the list and applies the match function to find the last row for each group, making a few adjustments in the arithmetic to get to the next_group_start_row from there. In this case it should return 11 for rows 7-10, since that is the first appearance of the next timestamp, 2020-02-20 16:40:37.


Once you have the starting rows of the current group and the next group, you can simply subtract the two to get the grouped count.


Let me know if you have any questions!

Userlevel 7
Badge

😲

The dark secrets revealed!

It works!!! 🤩

Thank you @chris.seymour and @izzy

I highly recommend adding this to the official documentation!


BTW can i use this for other measures? ill use the above data as an example, if i want to smear the latest created_at_time per merchant_id max(created_at) over (partition by merchant_id)

can i use this logic?


Only thing left is to overcome the download limit when using Table Calculations 😥

image

Userlevel 7
Badge

I’m not sure about that specific scenario, but I know you can use it for other kinds of functions once you have the building blocks.


Grouped count:
${next_group_start_row} - ${group_start_row}

Grouped sum:
sum(offset_list(${products.retail_price}, -1 * (row() - ${group_start_row}), ${next_group_start_row} - ${group_start_row}))

Grouped running total:
sum(offset_list(${products.retail_price}, -1 * (row() - ${group_start_row}), row() - ${group_start_row} + 1))

Max date in Group (date must be sorted desc):
index(${products.date}, ${group_start_row})
Userlevel 2

Theres also this whole set of calcs to get the max numeric value in a group:


row_offset_till_group_start:
if(row()=${group_start_row},0,0-1*(${group_start_row}-row()))

offset_number:
${next_group_start_row}-row()+1

Grouped_list:
offset_list(${field},-${row_offset_till_group_start},${offset_number})

max_in_group:
max(${grouped_list})

Hi Guys,

I'm really new to the looker environment. I would like to know how I can insert this solution above in this screen that I sent the print? That is my environment.

 


This solution is exactly something that I really need, but I didn't understand where it is possible to apply these lines of my code.

I thank you for all your help.

I'm really new to the looker environment. I would like to know how I can insert this solution above in this screen that I sent the print? That is my environment.

 

Hey @vik_geiling , welcome

You can implement this piece of code on table calculations. Would be something like this:

 

Reply