Sequencing Transactions With Window Functions

Sequencing Transactions is a very common analytical pattern. In the example below, we will compute a sequence number for each order indicating if the order was the customer’s first, second, third, or Nth order.

This requires the use of persistent derived tables and window functions. MySQL databases do not have window functions, and thus should use a correlated subquery.

How it’s done.

A persistent derived table is created with two columns.

order_id, sequence_num

For each order in the orders table, a row is created in the order_user_sequence table. The table is then indexed and joined into orders.

Example model:

- explore: orders
  joins:
  - join: order_user_sequence
    foreign_key: id

- view: orders
  fields:
  - dimension: id
    type: int
    primary_key: true
    sql: ${TABLE}.id

  - measure: count
    type: count
    detail: [id, created_time]

# Compute the order's sequence over the users lifetime.  Is this the first, second, third, etc.

- view: order_user_sequence
  derived_table:
    sql: |
      SELECT
        id AS order_id
        , ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY created_at) as user_order_sequence_number
      FROM orders
    
  fields:
  - dimension: order_id
    primary_key: true
    sql: ${TABLE}.order_id

  - dimension: user_order_sequence_number
    type: int
    sql: ${TABLE}.user_order_sequence_number

Orders by Order Sequence Number, the last 10 days.

Explore Data

0 5 1,849
5 REPLIES 5

Great Article, here’s the new LookML:

explore: orders {
  join: order_user_sequence {
    foreign_key: id
  }
}
view: orders {
  dimension: id {
    type: int
    primary_key: yes
    sql: ${TABLE}.id ;;
  }
  measure: count {
    type: count
    drill_fields: [id, created_time]
  }
}
# Compute the order's sequence over the users lifetime.  Is this the first, second, third, etc.
view: order_user_sequence {
  derived_table: {
    sql: SELECT
        id AS order_id
        , ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY created_at) as user_order_sequence_number
      FROM orders
       ;;
  }
  dimension: order_id {
    primary_key: yes
    sql: ${TABLE}.order_id ;;
  }
  dimension: user_order_sequence_number {
    type: int
    sql: ${TABLE}.user_order_sequence_number ;;
  }
}

mplooker
Participant IV

is there an approach you can recommend that would allow the sequencing PDT to be aware of what the user is specifying in the orders explore?

this example above assumes that you want the user’s order sequence for all of the user’s orders regardless of exploration filters or other contextual predicates (e.g. i want to compute the user’s order sequence after I filter for only orders where the order_total is >= $100 or something like that).

we can sort-of get around this by using NDTs and applying bind filters for specific fields but it can’t reasonably be a solution to bind all of the fields one by one in a tofilter/fromfilter pattern… even something like a cross-apply where the predicate of the subquery can incorporate the result set of the explore but I havent found a way to implement using LookML yet.

any help would be greatly appreciated.

Hey mplooker, thanks for your question.

Generally in Looker, a derived table is treated as a static table, so injecting user inputted values into the SQL of the derived table isn’t possible without setting up Templated Filters for the fields you want .

With Templated Filters you could insert a token into the Derived Table’s WHERE clause that filters before the sequencing took place.

To do this with the example you described (order_total >= $100) you might have to calculate that number in a subquery in order to filter on it in the WHERE clause.

Reach out to help.looker.com for more help with your specific situation.

mplooker
Participant IV

thanks, spencer. i can reach out with more specifics…

it would be great if something like the following could be supported:

assume i have 2 PDTs defined like so…

view: pdt0 {
derived_table: {
sql: select orderid, userid, ordertotal from orderstable;;
}
dimension: orderid {}
dimension: userid {}
dimension: ordertotal {}
}

view: pdt1 {
derived_table: {
sql:
select distinct userid, row_number() over (partition by userid order by orderid asc) ordersequence
from ${pdt0.SQL_TABLE_NAME} t
}
dimension: userid { hidden:yes }
dimension: ordersequence {}
}

explore: bothpdts {
from: pdt0
join: pdt1 {
sql_on: ${pdt0.userid} = ${pdt1.userid};;
type: left_outer
}
}

currently the default behavior is for the sql generator to build PDT0, then build PDT1 but this is before the context resulting from the explore definition (filtering orderids in PDT0 by ordertotal >= $100).

it would be great to facilitate an ability to define PDT1 as a correlated subquery in the join so that something like this could work:

view: pdt1 {
derived_table: {
sql:
select distinct userid, row_number() over (partition by userid order by orderid asc) ordersequence
from ${pdt0.SQL_TABLE_NAME} t
where t.orderid = pdt0.orderid
}
dimension: userid { hidden:yes }
dimension: ordersequence {}
}

with a resulting sql statement that looks something like:

with pdt0 as 
( select orderid, userid, ordertotal from orderstable)

select pdt0.orderid, pdt0.userid, pdt1.ordersequence
from pdt0
left outer join (
 select distinct 
 userid, row_number() over (partition by userid order by orderid asc) ordersequence
 from pdt0 pdt0SQLTABLENAME
 where pdt0SQLTABLENAME.orderid = pdt0.orderid) pdt1
on pdt0.userid = pdt1.userid
-- and allowing for the resulting calc in the subquery to be aware of context specified in explore
where pdt0.ordertotal >= 100

and while i know this example specifically uses an order sequence calculation that is dynamically defined based on the explore context, the broader utility of this type of functionality for me is dynamic percent of total calculations (and derived weighted average measures) defined/centralized in the LookML model.

Hey mplooker,

Ok gotcha. You should be able to accomplish this using a Templated Filter in the pdt0 table.

The derived table would look like this:

derived_table: {
sql: select orderid, userid, ordertotal from orderstable
WHERE
{% condition foo %} ordertotal {% end_condition %}
;;
}

When you put the templated filter on the front end you’ll get filter options for the ordertotal similar to if you had a filter on ordertotal from the Explore.

There may be other approaches to this but this is what I was thinking.

One last note is that this view or others using will not be able to be persisted into the database because the Templated Filter requires a value.

Reach out on chat if you have questions implementing this.

Thanks,

Spencer

Top Labels in this Space
Top Solution Authors