[Analytics Block] Customer Retention and Lifetime Value - Survival Style

  • 31 August 2017
  • 3 replies
  • 962 views

Userlevel 4

Customer retention and value are some of the most analyzed metrics here at Looker and with our customers. There is a wealth of articles out there on how to analyze customer retention (Retention Analysis, Gaming Retention), the aim of this article is to build upon the linked articles and extend them to include customer lifetime value.



Expected Output


The output of this analysis should provide us with a method for quickly analyzing a customers lifetime value (LTV) and their survival curve (the point at which customers stop purchasing/engaging or fails to be retained).



The code will allow users to look at lifetime value and retention and quickly split it out by specific cohorts and help you identify areas of your business where customer retention and value are potentially over or under performing.





Explore Data in Full Screen



The Math - Kaplan-Meier & The Cumulative Distribution Curve


We will use a very simplistic but well studied, documented and employed statistical technique for this analysis. The Kaplan-Meier estimator is a statistic used to calculate the survival function from lifetime data. In medical research, it is often used to measure the fraction of patients living for a certain amount of time after treatment or in marketing it can be used to analyze the amount of customers still purchasing after a defined period of time. Once we have calculated the survival model of customer lifetime, we use it to estimate lifetime value? Thanks to the the cumulative distribution we can use our survival/retention model to calculate the lifetime value of a customer.



Data Types and Technical Info


The pattern below has been designed to work with e-commerce customer purchasing data set, however it could be augmented to be used with a subscription based SaaS customer business or any other business interested in monitoring customer retention and value. From a data perspective we simply require a transactional table (order_items) that contains the purchasing information for each user and the users registration date and demographic information for cohorting (users).







The Code


The Code



view: kaplan_meier {

derived_table: {

sql:

--create the inital data set containing pivot, user and value information

with data as(SELECT

CASE WHEN NOT({% condition pivot_age %} 'xxxxxxxxxx' {% endcondition %}) then users.age::text

WHEN NOT({% condition pivot_gender %} 'xxxxxxxxxx' {% endcondition %}) then users.gender::text

WHEN NOT({% condition pivot_traffic_source %} 'xxxxxxxxxx' {% endcondition %}) then users.traffic_source::text

ELSE users.country END AS pivot,

DATEDIFF('month',users.created_at,order_items.created_at) AS "months_since_signup",

COUNT(DISTINCT users.id ) AS "userscount",

sum(DISTINCT sale_price ) AS "total_value"

FROM order_items AS order_items

LEFT JOIN users AS users ON order_items.user_id = users.id

WHERE

((DATEDIFF('month',users.created_at,order_items.created_at) >= 0

AND DATEDIFF('month',users.created_at,order_items.created_at) <= 12))

and {% condition country %} users.country {% endcondition %}

and {% condition age %} users.age {% endcondition %}

GROUP BY 1,2 ORDER BY 1,2

) ,

--use lag functions to calculate monthly % retention

y as(

select *,lag(userscount) over(partition by pivot order by months_since_signup ) as lags, case

when ((lag(userscount) over(partition by pivot order by months_since_signup ) -userscount)*1.0/ lag(userscount) over(partition by pivot order by months_since_signup )) >= 0 then ((lag(userscount) over(partition by pivot order by months_since_signup ) -userscount)*1.0/ lag(userscount) over(partition by pivot order by months_since_signup ))

when ((lag(userscount) over(partition by pivot order by months_since_signup ) -userscount)*1.0/ lag(userscount) over(partition by pivot order by months_since_signup )) < 0 then ((lag(userscount) over(partition by pivot order by months_since_signup ) -userscount)*1.0/ lag(userscount) over(partition by pivot order by months_since_signup )) *-1 +1

else 1 end as retention_that_month

from data

order by months_since_signup asc)

--create output table containg survival rate

select *,

case when lags is null

then 1

else retention_that_month * lag(retention_that_month) over(partition by pivot order by months_since_signup ) end as survival_rate from y

order by months_since_signup asc

;;

}





####Templated Filters####

filter:

age {

label: "Age"

description: "Users' Age"

view_label: "Filter Fields"

type: number

}



filter: country {

label: "Country"

description: "Users'Country"

view_label: "Filter Fields"

type: string

}



####Pivot Age####

filter: pivot_age {

label: "age"

view_label: "Pivot Fields"

type: string

default_value: "DONOTCHANGE"

}



filter: pivot_gender {

label: "gender"

view_label: "Pivot Fields"

type: string

default_value: "DONOTCHANGE"

}



filter: pivot_traffic_source {

label: "traffic source"

view_label: "Pivot Fields"

type: string

default_value: "DONOTCHANGE"

}



####Pivot Dimension####

dimension: pivot {

description: "MUST BE SELECTED. Defaults to users country"

type: string

sql: ${TABLE}.pivot ;;

}







measure: count {

type: count

drill_fields: [detail*]

}



dimension: months_since_signup {

type: number

sql: ${TABLE}.months_since_signup ;;

}



dimension: userscount {

type: number

sql: ${TABLE}.userscount ;;

}



dimension: total_value {

type: number

sql: ${TABLE}.total_value ;;

}



dimension: lags {

type: number

sql: ${TABLE}.lags ;;

}



dimension: retention_that_month1 {

hidden: yes

type: number

sql: ${TABLE}.retention_that_month ;;

}



dimension: survival_rate1 {

hidden: yes

type: number

sql: ${TABLE}.survival_rate ;;

}



measure: avg_value {

type: sum

sql:${survival_rate1}*(${total_value}/${userscount}) ;;

}



measure: LTV_Curve {

type: running_total

value_format_name: usd

sql: ${avg_value} ;;

direction: "column"

}



measure: survival_rate{

value_format_name: percent_2

type: sum

sql: ${survival_rate1} ;;

}



measure: retention_that_month{

value_format_name: percent_2

type: sum

sql: ${retention_that_month1} ;;

}





set: detail {

fields: [months_since_signup, userscount, lags, retention_that_month, survival_rate]

}

}



view: users {

sql_table_name: public.users ;;



dimension: id {

primary_key: yes

type: number

sql: ${TABLE}.id ;;

}



dimension: age {

type: number

sql: ${TABLE}.age ;;

}



dimension: country {

type: string

map_layer_name: countries

sql: ${TABLE}.country ;;

}



dimension_group: created {

type: time

sql: ${TABLE}.created_at ;;

}



measure: count {

type: count

drill_fields: [id, first_name, last_name, events.count, order_items.count]

}

}



view: order_items {

sql_table_name: public.order_items ;;



dimension: id {

primary_key: yes

type: number

sql: ${TABLE}.id ;;

}



dimension_group: created {

type: time

timeframes: [

raw,

time,

date,

week,

month,

quarter,

year

]

sql: ${TABLE}.created_at ;;

}



dimension: order_id {

type: number

sql: ${TABLE}.order_id ;;

}



dimension: sale_price {

type: number

sql: ${TABLE}.sale_price ;;

}



dimension: user_id {

type: number

# hidden: yes

sql: ${TABLE}.user_id ;;

}



measure: count {

type: count

}



}




3 replies

Getting an error on the explore



@benpm Thanks for bringing this to our attention. We’re investigating now!

Userlevel 4

This has been fixed - thanks for the call out.

Reply