Question

Sequencing Transactions With Window Functions

  • 17 January 2015
  • 5 replies
  • 1035 views

Userlevel 6
Badge

Sequencing Transactions is a very common analytical pattern. In the example below, we will compute a sequence number for each order indicating if the order was the customer’s first, second, third, or Nth order.


This requires the use of persistent derived tables and window functions. MySQL databases do not have window functions, and thus should use a correlated subquery.


How it’s done.


A persistent derived table is created with two columns.


order_id, sequence_num

For each order in the orders table, a row is created in the order_user_sequence table. The table is then indexed and joined into orders.


Example model:


- explore: orders
joins:
- join: order_user_sequence
foreign_key: id

- view: orders
fields:
- dimension: id
type: int
primary_key: true
sql: ${TABLE}.id

- measure: count
type: count
detail: [id, created_time]

# Compute the order's sequence over the users lifetime. Is this the first, second, third, etc.

- view: order_user_sequence
derived_table:
sql: |
SELECT
id AS order_id
, ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY created_at) as user_order_sequence_number
FROM orders

fields:
- dimension: order_id
primary_key: true
sql: ${TABLE}.order_id

- dimension: user_order_sequence_number
type: int
sql: ${TABLE}.user_order_sequence_number

Orders by Order Sequence Number, the last 10 days.



Explore Data


5 replies

Great Article, here’s the new LookML:


explore: orders {
join: order_user_sequence {
foreign_key: id
}
}
view: orders {
dimension: id {
type: int
primary_key: yes
sql: ${TABLE}.id ;;
}
measure: count {
type: count
drill_fields: [id, created_time]
}
}
# Compute the order's sequence over the users lifetime. Is this the first, second, third, etc.
view: order_user_sequence {
derived_table: {
sql: SELECT
id AS order_id
, ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY created_at) as user_order_sequence_number
FROM orders
;;
}
dimension: order_id {
primary_key: yes
sql: ${TABLE}.order_id ;;
}
dimension: user_order_sequence_number {
type: int
sql: ${TABLE}.user_order_sequence_number ;;
}
}

Hey mplooker, thanks for your question.


Generally in Looker, a derived table is treated as a static table, so injecting user inputted values into the SQL of the derived table isn’t possible without setting up Templated Filters for the fields you want .


With Templated Filters you could insert a token into the Derived Table’s WHERE clause that filters before the sequencing took place.


To do this with the example you described (order_total >= $100) you might have to calculate that number in a subquery in order to filter on it in the WHERE clause.


Reach out to help.looker.com for more help with your specific situation.


Hey mplooker,


Ok gotcha. You should be able to accomplish this using a Templated Filter in the pdt0 table.


The derived table would look like this:


derived_table: {

sql: select orderid, userid, ordertotal from orderstable

WHERE

{% condition foo %} ordertotal {% end_condition %}

;;

}


When you put the templated filter on the front end you’ll get filter options for the ordertotal similar to if you had a filter on ordertotal from the Explore.


There may be other approaches to this but this is what I was thinking.


One last note is that this view or others using will not be able to be persisted into the database because the Templated Filter requires a value.


Reach out on chat if you have questions implementing this.


Thanks,


Spencer

Userlevel 2

is there an approach you can recommend that would allow the sequencing PDT to be aware of what the user is specifying in the orders explore?


this example above assumes that you want the user’s order sequence for all of the user’s orders regardless of exploration filters or other contextual predicates (e.g. i want to compute the user’s order sequence after I filter for only orders where the order_total is >= $100 or something like that).


we can sort-of get around this by using NDTs and applying bind filters for specific fields but it can’t reasonably be a solution to bind all of the fields one by one in a tofilter/fromfilter pattern… even something like a cross-apply where the predicate of the subquery can incorporate the result set of the explore but I havent found a way to implement using LookML yet.


any help would be greatly appreciated.

Userlevel 2

thanks, spencer. i can reach out with more specifics…


it would be great if something like the following could be supported:


assume i have 2 PDTs defined like so…



view: pdt0 {

derived_table: {

sql: select orderid, userid, ordertotal from orderstable;;

}

dimension: orderid {}

dimension: userid {}

dimension: ordertotal {}

}


view: pdt1 {

derived_table: {

sql:

select distinct userid, row_number() over (partition by userid order by orderid asc) ordersequence

from ${pdt0.SQL_TABLE_NAME} t

}

dimension: userid { hidden:yes }

dimension: ordersequence {}

}


explore: bothpdts {

from: pdt0

join: pdt1 {

sql_on: ${pdt0.userid} = ${pdt1.userid};;

type: left_outer

}

}



currently the default behavior is for the sql generator to build PDT0, then build PDT1 but this is before the context resulting from the explore definition (filtering orderids in PDT0 by ordertotal >= $100).


it would be great to facilitate an ability to define PDT1 as a correlated subquery in the join so that something like this could work:



view: pdt1 {

derived_table: {

sql:

select distinct userid, row_number() over (partition by userid order by orderid asc) ordersequence

from ${pdt0.SQL_TABLE_NAME} t

where t.orderid = pdt0.orderid

}

dimension: userid { hidden:yes }

dimension: ordersequence {}

}



with a resulting sql statement that looks something like:


with pdt0 as 
( select orderid, userid, ordertotal from orderstable)

select pdt0.orderid, pdt0.userid, pdt1.ordersequence
from pdt0
left outer join (
select distinct
userid, row_number() over (partition by userid order by orderid asc) ordersequence
from pdt0 pdt0SQLTABLENAME
where pdt0SQLTABLENAME.orderid = pdt0.orderid) pdt1
on pdt0.userid = pdt1.userid
-- and allowing for the resulting calc in the subquery to be aware of context specified in explore
where pdt0.ordertotal >= 100

and while i know this example specifically uses an order sequence calculation that is dynamically defined based on the explore context, the broader utility of this type of functionality for me is dynamic percent of total calculations (and derived weighted average measures) defined/centralized in the LookML model.

Reply