Question

Calculating Cohort Size for Cohorting

  • 29 March 2016
  • 1 reply
  • 129 views

Userlevel 1

Note: This article is a companion to Cohort Exploration Normalized by Cohort Size.



Often, the size of a cohort can be determined using Totals or Row Totals (as in the example here), but other times one might wish to calculate this in a separate table–for example, if not every member of the cohort in question has participated in the action being measured.


This is a derived table pattern that can be used to calculate the size of a cohort, which is useful when considering cohort retention curves, as described in Cohort Exploration Normalized by Cohort Size. The specific SQL below is for Redshift–the easiest way to figure out how this timestamp transformation should use for your database is to grab the month timeframe and see the SQL Looker generates.



Old LookML
```
- view: monthly_user_signup_cohort_size
derived_table:
sql: |
SELECT
TO_CHAR(CONVERT_TIMEZONE('UTC', 'America/Los_Angeles', users.created_at), 'YYYY-MM') AS cohort_month
, COUNT(DISTINCT users.id) AS cohort_size
FROM users
WHERE {% condition cohort_month %} orders.created_at {% endcondition %}
GROUP BY 1

fields:

- dimension: cohort_month

type: date_month

sql: ${TABLE}.cohort_month


- dimension: cohort_size
type: number
sql: ${TABLE}.cohort_size

</details>

<details>
<summary>New LookML</summary>

view: monthly_user_signup_cohort_size {

derived_table: {

sql: SELECT

TO_CHAR(CONVERT_TIMEZONE(‘UTC’, ‘America/Los_Angeles’, users.created_at), ‘YYYY-MM’) AS cohort_month

, COUNT(DISTINCT users.id) AS cohort_size

FROM users

WHERE {% condition cohort_month %} orders.created_at {% endcondition %}

GROUP BY 1

;;

}


dimension: cohort_month {

type: date_month

sql: ${TABLE}.cohort_month ;;

}


dimension: cohort_size {

type: number

sql: ${TABLE}.cohort_size ;;

}

}


</details>

The `{% condition cohort_month %} orders.created_at {% endcondition %}` portion ensures the table only builds over the months you're considering (see [Templated Filters and Derived Tables](https://discourse.looker.com/t/templated-filters-and-derived-tables/1749)), which will help with performance.

We can now join this derived table back to where we're building out the cohort exploration:

<details open>
<summary>Old LookML</summary>



  • explore: orders

    joins:




    • join: users

      relationship: many_to_one

      type: left_outer

      sql_on: ${orders.user_id} = ${users.id}




    • join: monthly_user_signup_cohort_size

      relationship: many_to_one

      type: left_outer

      sql_on: ${monthly_user_signup_cohort_size.cohort_month} = ${users.created_month}






</details>

<details>
<summary>New LookML</summary>

explore: orders {

join: users {

relationship: many_to_one

type: left_outer

sql_on: ${orders.user_id} = ${users.id} ;;

}


join: monthly_user_signup_cohort_size {

relationship: many_to_one

type: left_outer

sql_on: ${monthly_user_signup_cohort_size.cohort_month} = ${users.created_month} ;;

}

}


</details>

1 reply

For each old LookML code block in this article and its comments, we just added the New LookML equivalent code.

Reply