[Analytics Block] Customer Retention and Lifetime Value - Survival Style

Customer retention and value are some of the most analyzed metrics here at Looker and with our customers. There is a wealth of articles out there on how to analyze customer retention (Retention Analysis, Gaming Retention), the aim of this article is to build upon the linked articles and extend them to include customer lifetime value.

Expected Output
The output of this analysis should provide us with a method for quickly analyzing a customers lifetime value (LTV) and their survival curve (the point at which customers stop purchasing/engaging or fails to be retained).

The code will allow users to look at lifetime value and retention and quickly split it out by specific cohorts and help you identify areas of your business where customer retention and value are potentially over or under performing.

Explore Data in Full Screen

The Math - Kaplan-Meier & The Cumulative Distribution Curve
We will use a very simplistic but well studied, documented and employed statistical technique for this analysis. The Kaplan-Meier estimator is a statistic used to calculate the survival function from lifetime data. In medical research, it is often used to measure the fraction of patients living for a certain amount of time after treatment or in marketing it can be used to analyze the amount of customers still purchasing after a defined period of time. Once we have calculated the survival model of customer lifetime, we use it to estimate lifetime value? Thanks to the the cumulative distribution we can use our survival/retention model to calculate the lifetime value of a customer.

Data Types and Technical Info
The pattern below has been designed to work with e-commerce customer purchasing data set, however it could be augmented to be used with a subscription based SaaS customer business or any other business interested in monitoring customer retention and value. From a data perspective we simply require a transactional table (order_items) that contains the purchasing information for each user and the users registration date and demographic information for cohorting (users).

The Code

The Code

view: kaplan_meier {
derived_table: {
sql:
--create the inital data set containing pivot, user and value information
with data as(SELECT
                   CASE WHEN NOT({% condition pivot_age %} 'xxxxxxxxxx' {% endcondition %}) then users.age::text
                         WHEN NOT({% condition pivot_gender %} 'xxxxxxxxxx' {% endcondition %}) then users.gender::text
                         WHEN NOT({% condition pivot_traffic_source %} 'xxxxxxxxxx' {% endcondition %}) then users.traffic_source::text
                         ELSE users.country END AS pivot,
                  DATEDIFF('month',users.created_at,order_items.created_at)  AS "months_since_signup",
                  COUNT(DISTINCT users.id ) AS "userscount",
                  sum(DISTINCT sale_price ) AS "total_value"
                FROM order_items  AS order_items
                LEFT JOIN users  AS users ON order_items.user_id = users.id
                WHERE
                ((DATEDIFF('month',users.created_at,order_items.created_at)  >= 0
                AND DATEDIFF('month',users.created_at,order_items.created_at)  <= 12))
                and {% condition country %} users.country {% endcondition %}
                and {% condition age %} users.age {% endcondition %}
              GROUP BY 1,2  ORDER BY 1,2
              ) ,
--use lag functions to calculate monthly % retention
              y as(
                    select *,lag(userscount) over(partition by pivot order by months_since_signup ) as lags, case
                    when ((lag(userscount) over(partition by pivot order by months_since_signup ) -userscount)*1.0/ lag(userscount) over(partition by pivot order by months_since_signup )) >= 0 then ((lag(userscount) over(partition by pivot order by months_since_signup ) -userscount)*1.0/ lag(userscount) over(partition by pivot order by months_since_signup ))
                    when ((lag(userscount) over(partition by pivot order by months_since_signup ) -userscount)*1.0/ lag(userscount) over(partition by pivot order by months_since_signup )) < 0 then ((lag(userscount) over(partition by pivot order by months_since_signup ) -userscount)*1.0/ lag(userscount) over(partition by pivot order by months_since_signup )) *-1 +1
                    else 1 end as retention_that_month
                    from data
                    order by months_since_signup asc)
--create output table containg survival rate
  select *,
  case when lags is null
  then 1
  else  retention_that_month * lag(retention_that_month) over(partition by pivot order by months_since_signup ) end as survival_rate  from y
  order by months_since_signup asc
   ;;
}


####Templated Filters####
filter: 
age {
label: "Age"
description: "Users' Age"
view_label: "Filter Fields"
type: number
}

filter: country {
label: "Country"
description: "Users'Country"
view_label: "Filter Fields"
type: string
}

####Pivot Age####
  filter: pivot_age {
label: "age"
view_label: "Pivot Fields"
type: string
default_value: "DONOTCHANGE"
}

filter: pivot_gender {
label: "gender"
view_label: "Pivot Fields"
type: string
default_value: "DONOTCHANGE"
}

filter: pivot_traffic_source {
label: "traffic source"
view_label: "Pivot Fields"
type: string
default_value: "DONOTCHANGE"
}

####Pivot Dimension####
dimension: pivot {
description: "MUST BE SELECTED. Defaults to users country"
type: string
sql: ${TABLE}.pivot ;;
}



measure: count {
type: count
drill_fields: [detail*]
}

dimension: months_since_signup {
type: number
sql: ${TABLE}.months_since_signup ;;
}

dimension: userscount {
type: number
sql: ${TABLE}.userscount ;;
}

dimension: total_value {
type: number
sql: ${TABLE}.total_value ;;
}

dimension: lags {
type: number
sql: ${TABLE}.lags ;;
}

dimension: retention_that_month1 {
hidden: yes
type: number
sql: ${TABLE}.retention_that_month ;;
}

dimension: survival_rate1 {
hidden: yes
type: number
sql: ${TABLE}.survival_rate ;;
}

measure: avg_value {
type: sum
sql:${survival_rate1}*(${total_value}/${userscount})  ;;
}

measure: LTV_Curve {
type: running_total
value_format_name: usd
sql: ${avg_value} ;;
direction: "column"
}

measure: survival_rate{
value_format_name: percent_2
type: sum
sql: ${survival_rate1} ;;
}

measure: retention_that_month{
value_format_name: percent_2
type: sum
sql: ${retention_that_month1} ;;
}


set: detail {
fields: [months_since_signup, userscount, lags, retention_that_month, survival_rate]
}
}

view: users {
  sql_table_name: public.users ;;
  
  dimension: id {
    primary_key: yes
    type: number
    sql: ${TABLE}.id ;;
  }
  
  dimension: age {
    type: number
    sql: ${TABLE}.age ;;
  }
  
  dimension: country {
    type: string
    map_layer_name: countries
    sql: ${TABLE}.country ;;
  }
  
  dimension_group: created {
    type: time
    sql: ${TABLE}.created_at ;;
  }
  
  measure: count {
    type: count
    drill_fields: [id, first_name, last_name, events.count, order_items.count]
  }
}

view: order_items {
  sql_table_name: public.order_items ;;
  
  dimension: id {
    primary_key: yes
    type: number
    sql: ${TABLE}.id ;;
  }
  
  dimension_group: created {
    type: time
    timeframes: [
      raw,
      time,
      date,
      week,
      month,
      quarter,
      year
    ]
    sql: ${TABLE}.created_at ;;
  }
  
  dimension: order_id {
    type: number
    sql: ${TABLE}.order_id ;;
  }
  
  dimension: sale_price {
    type: number
    sql: ${TABLE}.sale_price ;;
  }
 
  dimension: user_id {
    type: number
    # hidden: yes
    sql: ${TABLE}.user_id ;;
  }
  
  measure: count {
    type: count
  }
 
  }
4 3 1,329
3 REPLIES 3

Getting an error on the explore

dave5
Participant III

@benpm Thanks for bringing this to our attention. We’re investigating now!

This has been fixed - thanks for the call out.

Top Labels in this Space
Top Solution Authors