Customer retention and value are some of the most analyzed metrics here at Looker and with our customers. There is a wealth of articles out there on how to analyze customer retention (Retention Analysis, Gaming Retention), the aim of this article is to build upon the linked articles and extend them to include customer lifetime value.
Expected Output
The output of this analysis should provide us with a method for quickly analyzing a customers lifetime value (LTV) and their survival curve (the point at which customers stop purchasing/engaging or fails to be retained).
The code will allow users to look at lifetime value and retention and quickly split it out by specific cohorts and help you identify areas of your business where customer retention and value are potentially over or under performing.
The Math - Kaplan-Meier & The Cumulative Distribution Curve
We will use a very simplistic but well studied, documented and employed statistical technique for this analysis. The Kaplan-Meier estimator is a statistic used to calculate the survival function from lifetime data. In medical research, it is often used to measure the fraction of patients living for a certain amount of time after treatment or in marketing it can be used to analyze the amount of customers still purchasing after a defined period of time. Once we have calculated the survival model of customer lifetime, we use it to estimate lifetime value? Thanks to the the cumulative distribution we can use our survival/retention model to calculate the lifetime value of a customer.
Data Types and Technical Info
The pattern below has been designed to work with e-commerce customer purchasing data set, however it could be augmented to be used with a subscription based SaaS customer business or any other business interested in monitoring customer retention and value. From a data perspective we simply require a transactional table (order_items) that contains the purchasing information for each user and the users registration date and demographic information for cohorting (users).
The Code
view: kaplan_meier {
derived_table: {
sql:
--create the inital data set containing pivot, user and value information
with data as(SELECT
CASE WHEN NOT({% condition pivot_age %} 'xxxxxxxxxx' {% endcondition %}) then users.age::text
WHEN NOT({% condition pivot_gender %} 'xxxxxxxxxx' {% endcondition %}) then users.gender::text
WHEN NOT({% condition pivot_traffic_source %} 'xxxxxxxxxx' {% endcondition %}) then users.traffic_source::text
ELSE users.country END AS pivot,
DATEDIFF('month',users.created_at,order_items.created_at) AS "months_since_signup",
COUNT(DISTINCT users.id ) AS "userscount",
sum(DISTINCT sale_price ) AS "total_value"
FROM order_items AS order_items
LEFT JOIN users AS users ON order_items.user_id = users.id
WHERE
((DATEDIFF('month',users.created_at,order_items.created_at) >= 0
AND DATEDIFF('month',users.created_at,order_items.created_at) <= 12))
and {% condition country %} users.country {% endcondition %}
and {% condition age %} users.age {% endcondition %}
GROUP BY 1,2 ORDER BY 1,2
) ,
--use lag functions to calculate monthly % retention
y as(
select *,lag(userscount) over(partition by pivot order by months_since_signup ) as lags, case
when ((lag(userscount) over(partition by pivot order by months_since_signup ) -userscount)*1.0/ lag(userscount) over(partition by pivot order by months_since_signup )) >= 0 then ((lag(userscount) over(partition by pivot order by months_since_signup ) -userscount)*1.0/ lag(userscount) over(partition by pivot order by months_since_signup ))
when ((lag(userscount) over(partition by pivot order by months_since_signup ) -userscount)*1.0/ lag(userscount) over(partition by pivot order by months_since_signup )) < 0 then ((lag(userscount) over(partition by pivot order by months_since_signup ) -userscount)*1.0/ lag(userscount) over(partition by pivot order by months_since_signup )) *-1 +1
else 1 end as retention_that_month
from data
order by months_since_signup asc)
--create output table containg survival rate
select *,
case when lags is null
then 1
else retention_that_month * lag(retention_that_month) over(partition by pivot order by months_since_signup ) end as survival_rate from y
order by months_since_signup asc
;;
}
####Templated Filters####
filter:
age {
label: "Age"
description: "Users' Age"
view_label: "Filter Fields"
type: number
}
filter: country {
label: "Country"
description: "Users'Country"
view_label: "Filter Fields"
type: string
}
####Pivot Age####
filter: pivot_age {
label: "age"
view_label: "Pivot Fields"
type: string
default_value: "DONOTCHANGE"
}
filter: pivot_gender {
label: "gender"
view_label: "Pivot Fields"
type: string
default_value: "DONOTCHANGE"
}
filter: pivot_traffic_source {
label: "traffic source"
view_label: "Pivot Fields"
type: string
default_value: "DONOTCHANGE"
}
####Pivot Dimension####
dimension: pivot {
description: "MUST BE SELECTED. Defaults to users country"
type: string
sql: ${TABLE}.pivot ;;
}
measure: count {
type: count
drill_fields: [detail*]
}
dimension: months_since_signup {
type: number
sql: ${TABLE}.months_since_signup ;;
}
dimension: userscount {
type: number
sql: ${TABLE}.userscount ;;
}
dimension: total_value {
type: number
sql: ${TABLE}.total_value ;;
}
dimension: lags {
type: number
sql: ${TABLE}.lags ;;
}
dimension: retention_that_month1 {
hidden: yes
type: number
sql: ${TABLE}.retention_that_month ;;
}
dimension: survival_rate1 {
hidden: yes
type: number
sql: ${TABLE}.survival_rate ;;
}
measure: avg_value {
type: sum
sql:${survival_rate1}*(${total_value}/${userscount}) ;;
}
measure: LTV_Curve {
type: running_total
value_format_name: usd
sql: ${avg_value} ;;
direction: "column"
}
measure: survival_rate{
value_format_name: percent_2
type: sum
sql: ${survival_rate1} ;;
}
measure: retention_that_month{
value_format_name: percent_2
type: sum
sql: ${retention_that_month1} ;;
}
set: detail {
fields: [months_since_signup, userscount, lags, retention_that_month, survival_rate]
}
}
view: users {
sql_table_name: public.users ;;
dimension: id {
primary_key: yes
type: number
sql: ${TABLE}.id ;;
}
dimension: age {
type: number
sql: ${TABLE}.age ;;
}
dimension: country {
type: string
map_layer_name: countries
sql: ${TABLE}.country ;;
}
dimension_group: created {
type: time
sql: ${TABLE}.created_at ;;
}
measure: count {
type: count
drill_fields: [id, first_name, last_name, events.count, order_items.count]
}
}
view: order_items {
sql_table_name: public.order_items ;;
dimension: id {
primary_key: yes
type: number
sql: ${TABLE}.id ;;
}
dimension_group: created {
type: time
timeframes: [
raw,
time,
date,
week,
month,
quarter,
year
]
sql: ${TABLE}.created_at ;;
}
dimension: order_id {
type: number
sql: ${TABLE}.order_id ;;
}
dimension: sale_price {
type: number
sql: ${TABLE}.sale_price ;;
}
dimension: user_id {
type: number
# hidden: yes
sql: ${TABLE}.user_id ;;
}
measure: count {
type: count
}
}
Getting an error on the explore
@benpm Thanks for bringing this to our attention. We’re investigating now!
This has been fixed - thanks for the call out.