Entity/Transaction Fact Tables

  • 10 February 2015
  • 2 replies
  • 559 views

Userlevel 5
Badge

Each business generally has a main transaction. In a store, the main transaction is probably orders; in a lab, the main transaction is probably tests; and in a bank, it’s deposits and withdrawals. Usually, there is an individual interacting with the entity, typically a user or a customer.




This page uses a derived table to realize a fact table. You can see an example derived table on learn.looker.com. Refer to Derived Table Reference.




As an example, we’ll use users as the person doing the transaction and orders as the transaction. All the following examples will use the following set of minimal tables. In the real world, these tables would be much more robust, but for illustration we’ll use these:




####Users






####Orders






Entity table: users.lookml







- view: users


fields:





# DIMENSIONS #





- dimension: id


type: int


primary_key: true


sql: ${TABLE}.id





- dimension_group: created


type: time


timeframes: [time, date, week, month, year]


sql: ${TABLE}.created_at





- dimension: name


sql: CONCAT(${TABLE}.first_name,' ', ${TABLE}.last_name)





# MEASURES #





- measure: count


type: count





Transaction table: orders.lookml




  - view: orders


fields:





# DIMENSIONS #





- dimension: id


primary_key: true


type: int


sql: ${TABLE}.id





- dimension_group: created


type: time


timeframes: [time, date, week, month, month_num, year, dow_num]


sql: ${TABLE}.created_at





- dimension: status


sql: ${TABLE}.status





- dimension: user_id


type: int


hidden: true


sql: ${TABLE}.user_id





- dimension: order_amount


type: number


sql: ${TABLE}.order_amount





# MEASURES #





- measure: count


type: count_distinct


sql: ${TABLE}.id





Characterizing users by buying patterns




There are a number of dimensions we create to characterize users by how often they buy. We will create a derived table (users_orders_facts.lookml) that generates the following values:







  • User lifetime orders



  • User first and last order date



  • User days as a customer



  • User days since purchase



  • User number of months in which a purchase was made






Fact table: users_orders_facts.lookml




  - view: users_orders_facts


derived_table:


sql: |


SELECT


orders.user_id as user_id


, COUNT(*) as lifetime_orders


, MIN(NULLIF(orders.created_at,0)) as first_order


, MAX(NULLIF(orders.created_at,0)) as latest_order


, DATEDIFF(MAX(NULLIF(orders.created_at,0)),


MIN(NULLIF(orders.created_at,0))) as days_as_customer


, DATEDIFF(CURDATE(),


MAX(NULLIF(orders.created_at,0))) as days_since_purchase


, COUNT(DISTINCT MONTH(NULLIF(orders.created_at,0)))


as number_of_distinct_months_with_orders


FROM orders


GROUP BY user_id


indexes: [user_id]


persist_for: 4 hours





fields:


- dimension: user_id


primary_key: true


hidden: true





- dimension: lifetime_orders


type: number





- dimension: lifetime_number_of_orders_tier


type: tier


tiers: [0,1,2,3,5,10]


sql: ${lifetime_orders}





- dimension: repeat_customer


type: yesno


sql: ${lifetime_orders} > 1





- dimension_group: first_order


type: time


timeframes: [date, week, month]


sql: ${TABLE}.first_order





- dimension: latest_order


type: time


timeframes: [date, week, month, year]


sql: ${TABLE}.latest_order





- dimension: days_as_customer


type: int





- dimension: days_as_customer_tiered


type: tier


tiers: [0,1,2,3,4,5,6,7,30,60,90,180]


sql: ${days_as_customer}





- dimension: days_since_purchase


type: int





- dimension: number_of_distinct_months_with_orders


type: int





Characterizing products by order patterns




There are a number of dimensions we create to characterize products by how often they are purchased. We create a derived table (products_orders_facts.lookml) that generates the following values:







  • Product lifetime items sold



  • Product items sold in last 7 days



  • Product items sold in last 30 days






Fact table: products_orders_facts.lookml




  - view: products_orders_facts


derived_table:


sql: |


SELECT p.id


, SUM(oi.quantity) lifetime_items_sold


, SUM(CASE WHEN oi.order_date > now() - INTERVAL 7 DAY


THEN oi.quantity ELSE 0 END) items_sold_last_7_days


, SUM(CASE WHEN oi.order_date > now() - INTERVAL 30 DAY


THEN oi.quantity ELSE 0 END) items_sold_last_30_days


FROM order_item oi


LEFT JOIN products p on p.id = oi.product_id


GROUP BY 1





- dimension: lifetime_items_sold


type: int





- dimension: items_sold_last_7_days


type: int





- dimension: items_sold_last_30_days


type: int





Characterizing orders




There are a number of dimensions we create to classify orders. We create a derived table (orders_sequence.lookml) that generates the following values:







  • Order sequence



  • Order is first purchase



  • Order is second purchase



  • Order is repeat purchase






Fact table: orders_sequence.lookml




  - view: orders_sequence


derived_table:


sql: |


SELECT id


, order_sequence


FROM


(SELECT id


, created_at


, user_id


, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at)


as order_sequence


FROM orders)


persist_for: 24 hours


distkey: id


sortkeys: [id]





fields:





- dimension: id


type: int


primary_key: true


hidden: true





- dimension: order_sequence


type: int





- dimension: order_sequence_tier


type: tier


tiers: [0,1,2,5,10,20,40,60]


sql: ${order_sequence}





- dimension: is_first_purchase


type: yesno


sql: ${order_sequence} = 1





- dimension: is_second_purchase


type: yesno


sql: ${order_sequence} = 2





- dimension: is_a_repeat_purchase


type: yesno


sql: ${order_sequence} >= 2





Note: ROW_NUMBER() does not work in MySQL and is only available in Postgres dialects (most MPPs like Redshift and Vertica)


2 replies

In the orders_sequence.lookml



I may have missed it, but did order_item get defined somewhere or is just an example of how that characterization would work on a hypothetical order_item table. Or was it meant to be based on the example orders table?

Userlevel 3
Badge

Hi Rahul,



The reference to the order_item table is indeed theoretical- we did not define it in this article but you can assume that it is a table that contains the following columns: product_id, quantity, and order_date.



Cheers!

Reply