Entity/Transaction Fact Tables

  • 10 February 2015
  • 2 replies
  • 580 views

Userlevel 5
Badge

Each business generally has a main transaction. In a store, the main transaction is probably orders; in a lab, the main transaction is probably tests; and in a bank, it’s deposits and withdrawals. Usually, there is an individual interacting with the entity, typically a user or a customer.



This page uses a derived table to realize a fact table. You can see an example derived table on learn.looker.com. Refer to Derived Table Reference.



As an example, we’ll use users as the person doing the transaction and orders as the transaction. All the following examples will use the following set of minimal tables. In the real world, these tables would be much more robust, but for illustration we’ll use these:



####Users




####Orders




Entity table: users.lookml





- view: users

fields:



# DIMENSIONS #



- dimension: id

type: int

primary_key: true

sql: ${TABLE}.id



- dimension_group: created

type: time

timeframes: [time, date, week, month, year]

sql: ${TABLE}.created_at



- dimension: name

sql: CONCAT(${TABLE}.first_name,' ', ${TABLE}.last_name)



# MEASURES #



- measure: count

type: count



Transaction table: orders.lookml



  - view: orders

fields:



# DIMENSIONS #



- dimension: id

primary_key: true

type: int

sql: ${TABLE}.id



- dimension_group: created

type: time

timeframes: [time, date, week, month, month_num, year, dow_num]

sql: ${TABLE}.created_at



- dimension: status

sql: ${TABLE}.status



- dimension: user_id

type: int

hidden: true

sql: ${TABLE}.user_id



- dimension: order_amount

type: number

sql: ${TABLE}.order_amount



# MEASURES #



- measure: count

type: count_distinct

sql: ${TABLE}.id



Characterizing users by buying patterns



There are a number of dimensions we create to characterize users by how often they buy. We will create a derived table (users_orders_facts.lookml) that generates the following values:





  • User lifetime orders


  • User first and last order date


  • User days as a customer


  • User days since purchase


  • User number of months in which a purchase was made




Fact table: users_orders_facts.lookml



  - view: users_orders_facts

derived_table:

sql: |

SELECT

orders.user_id as user_id

, COUNT(*) as lifetime_orders

, MIN(NULLIF(orders.created_at,0)) as first_order

, MAX(NULLIF(orders.created_at,0)) as latest_order

, DATEDIFF(MAX(NULLIF(orders.created_at,0)),

MIN(NULLIF(orders.created_at,0))) as days_as_customer

, DATEDIFF(CURDATE(),

MAX(NULLIF(orders.created_at,0))) as days_since_purchase

, COUNT(DISTINCT MONTH(NULLIF(orders.created_at,0)))

as number_of_distinct_months_with_orders

FROM orders

GROUP BY user_id

indexes: [user_id]

persist_for: 4 hours



fields:

- dimension: user_id

primary_key: true

hidden: true



- dimension: lifetime_orders

type: number



- dimension: lifetime_number_of_orders_tier

type: tier

tiers: [0,1,2,3,5,10]

sql: ${lifetime_orders}



- dimension: repeat_customer

type: yesno

sql: ${lifetime_orders} > 1



- dimension_group: first_order

type: time

timeframes: [date, week, month]

sql: ${TABLE}.first_order



- dimension: latest_order

type: time

timeframes: [date, week, month, year]

sql: ${TABLE}.latest_order



- dimension: days_as_customer

type: int



- dimension: days_as_customer_tiered

type: tier

tiers: [0,1,2,3,4,5,6,7,30,60,90,180]

sql: ${days_as_customer}



- dimension: days_since_purchase

type: int



- dimension: number_of_distinct_months_with_orders

type: int



Characterizing products by order patterns



There are a number of dimensions we create to characterize products by how often they are purchased. We create a derived table (products_orders_facts.lookml) that generates the following values:





  • Product lifetime items sold


  • Product items sold in last 7 days


  • Product items sold in last 30 days




Fact table: products_orders_facts.lookml



  - view: products_orders_facts

derived_table:

sql: |

SELECT p.id

, SUM(oi.quantity) lifetime_items_sold

, SUM(CASE WHEN oi.order_date > now() - INTERVAL 7 DAY

THEN oi.quantity ELSE 0 END) items_sold_last_7_days

, SUM(CASE WHEN oi.order_date > now() - INTERVAL 30 DAY

THEN oi.quantity ELSE 0 END) items_sold_last_30_days

FROM order_item oi

LEFT JOIN products p on p.id = oi.product_id

GROUP BY 1



- dimension: lifetime_items_sold

type: int



- dimension: items_sold_last_7_days

type: int



- dimension: items_sold_last_30_days

type: int



Characterizing orders



There are a number of dimensions we create to classify orders. We create a derived table (orders_sequence.lookml) that generates the following values:





  • Order sequence


  • Order is first purchase


  • Order is second purchase


  • Order is repeat purchase




Fact table: orders_sequence.lookml



  - view: orders_sequence

derived_table:

sql: |

SELECT id

, order_sequence

FROM

(SELECT id

, created_at

, user_id

, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at)

as order_sequence

FROM orders)

persist_for: 24 hours

distkey: id

sortkeys: [id]



fields:



- dimension: id

type: int

primary_key: true

hidden: true



- dimension: order_sequence

type: int



- dimension: order_sequence_tier

type: tier

tiers: [0,1,2,5,10,20,40,60]

sql: ${order_sequence}



- dimension: is_first_purchase

type: yesno

sql: ${order_sequence} = 1



- dimension: is_second_purchase

type: yesno

sql: ${order_sequence} = 2



- dimension: is_a_repeat_purchase

type: yesno

sql: ${order_sequence} >= 2



Note: ROW_NUMBER() does not work in MySQL and is only available in Postgres dialects (most MPPs like Redshift and Vertica)


2 replies

In the orders_sequence.lookml


I may have missed it, but did order_item get defined somewhere or is just an example of how that characterization would work on a hypothetical order_item table. Or was it meant to be based on the example orders table?

Userlevel 3
Badge

Hi Rahul,


The reference to the order_item table is indeed theoretical- we did not define it in this article but you can assume that it is a table that contains the following columns: product_id, quantity, and order_date.


Cheers!

Reply