Solved: Creating a window function inside a table calculat...

Eran_Sagi · 03-30-2020 09:08 AM

Hey all,

I’ve searched everywhere and couldn’t find an answer

i would like to emulate a window functions in one of my looks

as you see i have 2 dimensions: created_at and merchant_id
and one measure: URLs which is a STRING_AGG function

im trying to add a second measure that counts per each created_at_time the number of merchant_id’s per this timestamp

so if it was plain SQL id write: count(distinct merchant_id) over(partition by created_at)

i’ve seen 3 posts where @izzymiller says @chris_seymour1 says its possible but he never replied 😉

Thanks!

chris_seymour1

Hey @Eran_Sagi,

It is indeed possible! Here’s some example code you can use in a table calculation:

group_start_row:
match(${orders.id}, ${orders.id})

next_group_start_row:
count(${orders.id}) - match(${orders.id}, offset(${orders.id}, count(${orders.id}) - row()*2 + 1)) + 2

You can replace orders.id with the dimension you want to group by (in this case the Created At Time dimension). From there, you can get the grouped count like this:

${next_group_start_row} - ${group_start_row}

Breaking this down, the match function gets the row number of the first row that contains the current row’s value. So if rows 7-10 contain 2020-02-20 16:40:38, the match function will return 7 for each of those rows, which gives us the group_start_row calculation.

The next_group_start_row calculation reverses the list and applies the match function to find the last row for each group, making a few adjustments in the arithmetic to get to the next_group_start_row from there. In this case it should return 11 for rows 7-10, since that is the first appearance of the next timestamp, 2020-02-20 16:40:37.

Once you have the starting rows of the current group and the next group, you can simply subtract the two to get the grouped count.

Let me know if you have any questions!

View solution in original post

chris_seymour1

Hey @Eran_Sagi,

It is indeed possible! Here’s some example code you can use in a table calculation:

group_start_row:
match(${orders.id}, ${orders.id})

next_group_start_row:
count(${orders.id}) - match(${orders.id}, offset(${orders.id}, count(${orders.id}) - row()*2 + 1)) + 2

You can replace orders.id with the dimension you want to group by (in this case the Created At Time dimension). From there, you can get the grouped count like this:

${next_group_start_row} - ${group_start_row}

Breaking this down, the match function gets the row number of the first row that contains the current row’s value. So if rows 7-10 contain 2020-02-20 16:40:38, the match function will return 7 for each of those rows, which gives us the group_start_row calculation.

The next_group_start_row calculation reverses the list and applies the match function to find the last row for each group, making a few adjustments in the arithmetic to get to the next_group_start_row from there. In this case it should return 11 for rows 7-10, since that is the first appearance of the next timestamp, 2020-02-20 16:40:37.

Once you have the starting rows of the current group and the next group, you can simply subtract the two to get the grouped count.

Let me know if you have any questions!

linnali-1635751

Hi Chris, I am curious of the code

this expression should used to get the next value of the current value I guess:
offset(${orders.id}, count(${orders.id}) - row()*2 + 1)

how did you get that need to use "count(${orders.id}) - row()*2 + 1" ?

izzymiller

😲
The dark secrets revealed!

Eran_Sagi

It works!!! 🤩
Thank you @chris_seymour1 and @izzymiller
I highly recommend adding this to the official documentation!

BTW can i use this for other measures? ill use the above data as an example, if i want to smear the latest created_at_time per merchant_id max(created_at) over (partition by merchant_id)
can i use this logic?

Only thing left is to overcome the download limit when using Table Calculations 😥

izzymiller

I’m not sure about that specific scenario, but I know you can use it for other kinds of functions once you have the building blocks.

Grouped count:
${next_group_start_row} - ${group_start_row}

Grouped sum:
sum(offset_list(${products.retail_price}, -1 * (row() - ${group_start_row}), ${next_group_start_row} - ${group_start_row}))

Grouped running total:
sum(offset_list(${products.retail_price}, -1 * (row() - ${group_start_row}), row() - ${group_start_row} + 1))

Max date in Group (date must be sorted desc):
index(${products.date}, ${group_start_row})

jamesnestler

Theres also this whole set of calcs to get the max numeric value in a group:

row_offset_till_group_start:
if(row()=${group_start_row},0,0-1*(${group_start_row}-row()))

offset_number:
${next_group_start_row}-row()+1

Grouped_list:
offset_list(${field},-${row_offset_till_group_start},${offset_number})

max_in_group:
max(${grouped_list})

vik_geiling

Hi Guys,

I'm really new to the looker environment. I would like to know how I can insert this solution above in this screen that I sent the print? That is my environment.

This solution is exactly something that I really need, but I didn't understand where it is possible to apply these lines of my code.

I thank you for all your help.

Andre_Forte_Sep

I'm really new to the looker environment. I would like to know how I can insert this solution above in this screen that I sent the print? That is my environment.

Hey @vik_geiling , welcome

You can implement this piece of code on table calculations. Would be something like this:

Dawid

Just want to mention that if your data has to be ordered by the measure, the groups, unfortunately, won’t make sense. Also if you have more than one dimension.

I really can’t wait for window functions in table calculations. I usually have two dimensions and one measure and would like to have % of total, % of total of the first dimension, and % of total of the second dimension..

Mihir_Kulkarni

Theres also this whole set of calcs to get the max numeric value in a group:

row_offset_till_group_start:

if(row()=${group_start_row},0,0-1*(${group_start_row}-row()))



offset_number:

${next_group_start_row}-row()+1



Grouped_list:

offset_list(${field},-${row_offset_till_group_start},${offset_number})



max_in_group:

max(${grouped_list})

Hi James, thanks for posting this. I was referring to the explore shared by Chris. If we are applying this to there, is there any way where we can find out which item is giving us maximum revenue

dboyd

Theres also this whole set of calcs to get the max numeric value in a group:

row_offset_till_group_start:

if(row()=${group_start_row},0,0-1*(${group_start_row}-row()))



offset_number:

${next_group_start_row}-row()+1



Grouped_list:

offset_list(${field},-${row_offset_till_group_start},${offset_number})



max_in_group:

max(${grouped_list})

Edit: Ignore this, I realise I was mixing up the function at the end, it now works as intended

Lokesh1

Hi Team,

Is there anyway we can calculate Rank dense in Looker, the default Rank function is Skip, I mean if I have same values Looker skips and generated next rank .instead on continuous Rank.

Looker Default : 1,1,3,3,5,6

Required : 1,1,2,2,3,4

chris_seymour1

Hey @Lokesh1, there are some tips on rank functions in table calculations (including dense rank) in this post:

https://community.looker.com/explores-36/ranks-with-table-calculations-8122?postid=55072#post55072

Creating a window function inside a table calculation/custom measure