This content, written by Lloyd Tabb, was initially posted in Looker Blog on Jan 15, 2016. The content is subject to limited support.
A cohort is a group of people banded together and treated as a group. In this model, we are going to treat all people of a given gender born in a particular year in a particular state as a cohort.
Cohort table: the code
We're going to use to create a cohort table that takes the raw data and computes the number of people in each cohort. We'll declare dimensions for the cohorts so we can join this table to our usual names table. We'll make the year, state, and gender dimensions hidden so they do not appear in the field picker.
We'll also declare a measure cohort_size, so we can combine smaller cohorts into larger ones. This computation is very difficult to do in other systems but correctly handles this computation.
view: cohorts {
derived_table: {
sql: SELECT
year, state, gender,
SUM(number) as number,
row_number() OVER () as id
FROM names
GROUP BY 1,2,3
;;
}
dimension: id {
primary_key: yes
}
dimension: year {
hidden: yes
}
dimension: state {
hidden: yes
}
dimension: gender {
hidden: yes
}
dimension: number {
#X# Invalid LookML inside "dimension": {"type":null}
}
measure: cohort_size {
type: sum
sql: ${number} ;;
}
}
Code: the join
We're going to join each name to the cohort that has the same gender, year, and state:
# build a percent of cohort to normalize data
explore: names_step_4 {
join: cohorts {
sql_on: ${state} = ${cohorts.state}
AND ${year} = ${cohorts.year}
AND ${gender} = ${cohorts.gender}
;;
relationship: many_to_one
}
}
The view
And finally, we simply add a single measure, percentage_of_cohort. This measure will tell us what percentage this particular name or names were of the total cohort. Looker correctly computes this measure no matter what filters we add or which dimensions we group by.
# build out cohort analysis to normalize data by cohort
# so you can compare names popularity between states or years
#
view: names_step_4 {
extends: [names_step_3]
measure: percentage_of_cohort {
type: number
sql: ${total_population} / ${cohorts.cohort_size} ;;
value_format_name: percent_2
}
}
Explore!
All of the queries below can be created by . Select dimensions and measures, add filters and bonk the 'Run Button'.
So far in this analysis, we've been using population to tell us the popularity of a given name. However, looking at this cohort graph we can see that we've been wrong. One line shows that the population of people with the name James has risen and fallen. The other line shows that from 1920 to 1950 fully 3% of all people were named 'James' but since then the popularity of the name has been on decline. Using cohorts, we can normalize the data to the population size and see actual trends in popularity.
The cohorts work regardless of how we group the data. Here is a graph of popularity of the name 'James' by state. Percentages are against all the male people born in that state for all time.
We can see the changes in popularity of the name James over time by region. The popularity calculation works regardless of how the data is grouped or filtered.
Here's a list of all of the steps: