Flexible, agile, 'near-infinite' cohort analysis

  • 28 March 2022

This content, written by Joshua Moskovitz, was originally posted on the Looker Blog on Nov 3, 2015. The content is subject to limited support.

What is cohort analysis?

Cohorting is the practice of grouping subjects, typically users, who share a specific facet, event, or attribute, in order to compare their behavior over time. The shared attribute within a cohort, such as the date a user first visited your website, the application version they are using, or the number of lifetime purchases they've made, is known as the commonality. Cohort analysis answers questions about a group's behavior over time, such as time spent on a website, length of a subscription, or lifetime spend.

Dynamic, flexible, custom cohorts allow self-service data discovery

At Looker we are often asked whether we support cohort analysis. The short answer is yes. The longer answer is that, because of Looker's speed-of-thought data modeling and exploration capabilities, cohort analysis is quick and customizable: Looker allows for extremely agile, dynamic, and custom cohorting based on every available facet of your business. To understand why Looker makes cohorting by practically anything possible, we must first understand what cohorting is and how it helps determine relationships and correlations in your data. Truly data-driven organizations enable everyone to manipulate, test, and iterate on different cohorts, using business experience to inform the analysis and the data to infer and drive real business conclusions.

How can cohort analysis help build successful businesses?

Looker customers leverage cohorts to determine the effect of products, operations, and initiatives on key performance indicators (KPIs), the needles businesses are trying to move. Cohorting allows us to measure trends more scientifically, in an apples-to-apples way. Creating isolated user groups allows us to measure the rate of change of specific metrics based on the attributes of a cohort. Businesses can then make decisions aimed at driving positive changes or mitigating negative ones, testing and iterating toward measurable goals.

Cohort analysis can be leveraged to answer strategic questions about numerous aspects of a business, including but definitely not limited to:

  • Product Recommendations
  • A/B Testing
  • Gaming Retention
  • Lead Scoring
  • Cart Check-Outs Optimizations
  • Social Network and Application Activity
  • ROI and Breakeven Analysis
  • Looker's Customer Health Tiers
  • Funnel Iteration and Analysis
  • Feature Utilization

By defining cohorts and specific KPIs to monitor across them, businesses can create processes, programs, and initiatives to help drive changes in the behavior of cohorts. Upworthy created an entirely new way to think about engagement with content: attention minutes. At Looker, it means , investigating issues they face, and finding ways to help.

What makes Looker so great at cohorting?

We're not the only tool that helps companies create cohorts. Some tools fulfill very specific cohorting purposes: Kissmetrics for events, RJ Metrics for e-commerce behavior, etc. But none allows for the range of possibilities, flexibility, and agility in defining and exploring cohorts that Looker does. This is made possible for a few reasons.

First, companies have the capability to collect and store an incredible wealth of data on users and events, each providing novel ways to bucket our data. The advance of simple webhooks, JavaScript snippets, and even point-and-click event trackers allows for low-friction data collection. We’re storing so much that exploring the data for meaningful insights has opened up an entirely new world of data discovery platforms, like Looker, which attempt to find value by grouping users into cohorts and measuring behavior within each cohort. New external systems, third-party trackers, and data sources continue to emerge, each with its own capabilities for capturing information on existing or potential users, their eyeballs and wallets.

Secondly, databases are getting more flexible, much faster, and capable of storing massive amounts of data. New technologies, such as Spark on Hadoop, have opened up the ability to explore massive data sets faster than ever before. We're now seeing technology emerge that allows exploring massive data sets with realistic query performance.

Finally, third-party systems and vendors have emerged that easily allow for the capture and migration of data. We often need to move data from another database or system into a performant, centralized warehouse, and ETL (Extract, Transform, and Load) vendors have emerged to help reconcile disparate data sources into one place. This means we can store rich information on our users, as well as incredible data on the actions they take, and then combine that with our operational data systems, allowing us to build incredible 360-degree views of our businesses.

When all of this is combined with transactional data (actions or events you measure within orders, events, etc.), you've got a 'near-infinite' number of ways to slice and dice your population. Looker was designed specifically for this purpose: to allow exploration of your entire data set, defining cohorts, filtering, pivoting, and measuring the changes in custom-built KPIs along the way, enabling agile, flexible, and fast data exploration.

To make it easier for our customers to get started, we've designed , LookML design patterns that customers can implement on their data to cohort their users in different ways, delivering new and useful analysis in the process. We're excited to see what our customers build as they continue to plug and play. We even have a Looker Block dedicated specifically to .

Dimensions create the cohorts

In Looker there are the concepts of dimensions and measures. A dimension can represent a column in a database table, or be created from another dimension or calculation. Each distinct value of a dimension becomes a grouping, cohort, or bucket. Including any dimension, or combination of dimensions, in a query creates a distinct cohort for each combination of dimension values. For example, consider a user information table with the following dimensions:

Dimension            Values
user_created_month   january, february, march, april, ...
salary_segment       low, medium, high
source               facebook, twitter, linked_in
app_version          3.0, 3.5, 4.0
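To make the idea concrete outside of Looker, here is a minimal Python sketch (not LookML, and not how Looker generates its SQL) of what including dimensions in a query does: each distinct combination of values becomes one cohort, much like a SQL GROUP BY over those columns. The sample rows and the cohorts() helper are hypothetical; only the dimension names come from the table above.

```python
from collections import defaultdict

# Hypothetical user rows; only the dimension names come from the table above.
users = [
    {"id": 1, "user_created_month": "january", "salary_segment": "low",
     "source": "facebook", "app_version": "3.0"},
    {"id": 2, "user_created_month": "january", "salary_segment": "low",
     "source": "facebook", "app_version": "3.0"},
    {"id": 3, "user_created_month": "january", "salary_segment": "high",
     "source": "twitter", "app_version": "3.5"},
    {"id": 4, "user_created_month": "february", "salary_segment": "medium",
     "source": "linked_in", "app_version": "4.0"},
]


def cohorts(rows, dimensions):
    """Group row ids by their values for the chosen dimensions.

    Each distinct tuple of values is one cohort, similar to GROUP BY on those columns.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        groups[key].append(row["id"])
    return dict(groups)


# One dimension: one cohort per creation month.
print(cohorts(users, ["user_created_month"]))
# {('january',): [1, 2, 3], ('february',): [4]}

# Two dimensions: one cohort per (month, source) combination present in the data.
print(cohorts(users, ["user_created_month", "source"]))
# {('january', 'facebook'): [1, 2], ('january', 'twitter'): [3], ('february', 'linked_in'): [4]}
```

Only combinations that actually occur in the data produce a cohort. With, say, 12 months, 3 salary segments, 3 sources, and 3 app versions, these four dimensions alone allow up to 12 × 3 × 3 × 3 = 324 cohorts, which hints at how quickly the number of possible cohorts grows as dimensions are added.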

Here's what an explore looks like if we cohort by the month a user was created, pivot by the months in which each cohort placed orders, and then measure average order value. Each row is a cohort of users.
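The explore described above can be approximated in plain Python as a sketch: rows are cohorts keyed by the month a user was created, columns are the months in which that cohort placed orders, and each cell is average order value. The data, the "YYYY-MM" month format, and the cohort_aov_pivot() name are assumptions for illustration. Here average order value is computed per order (a cohort's revenue in a month divided by its order count in that month), which is one reasonable reading of the measure.

```python
from collections import defaultdict

# Hypothetical inputs: the month each user was created, and orders as
# (user_id, order_month, amount) tuples. Months are "YYYY-MM" strings.
user_created_month = {1: "2015-01", 2: "2015-01", 3: "2015-02"}
orders = [
    (1, "2015-01", 40.0),
    (2, "2015-01", 60.0),
    (1, "2015-02", 30.0),
    (3, "2015-02", 90.0),
    (3, "2015-03", 10.0),
]


def cohort_aov_pivot(user_created_month, orders):
    """Return {cohort_month: {order_month: average order value}}.

    Rows are cohorts (user creation month); columns are the months in which
    that cohort placed orders; each cell is revenue / number of orders.
    """
    revenue = defaultdict(lambda: defaultdict(float))
    order_count = defaultdict(lambda: defaultdict(int))
    for user_id, order_month, amount in orders:
        cohort = user_created_month[user_id]
        revenue[cohort][order_month] += amount
        order_count[cohort][order_month] += 1
    return {
        cohort: {
            month: revenue[cohort][month] / order_count[cohort][month]
            for month in sorted(revenue[cohort])
        }
        for cohort in sorted(revenue)
    }


for cohort, row in cohort_aov_pivot(user_created_month, orders).items():
    print(cohort, row)
# 2015-01 {'2015-01': 50.0, '2015-02': 30.0}
# 2015-02 {'2015-02': 90.0, '2015-03': 10.0}
```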

Measures track KPIs, telling the story along the way

Measures are the values we count, the totals we sum, the averages we compute, or the rates of change we're interested in tracking over time. They are typically derived, and include traditional metrics such as average order value, total session length, and lifetime number of purchases. They can also be completely unique, such as attention minutes, a string of events within a short time period, or a computed health score based on feature usage.

Here's an example of how we track a customer's health at Looker. The line in the visualization measures customer health, a measure that is completely unique to Looker: it calculates a score based on a range of customer-specific usage metrics. We can track an individual client's progress through the various client health cohorts (red, yellow, and green). This particular customer is consistently in our healthiest customer cohort, green. This unique measure gives us an entirely new, customized way to group our customers, which in turn allows us to build programs, communications, and operations around them. With this in place, we have a framework to easily conduct cohort analysis by grouping all customers that were healthy during the same time period. We can then track their scores over time in an attempt to determine which engagement factors influence health.

Cohort all the things!

Analysis gets even more powerful with LookML, which provides an easy, quick mechanism for defining new dimensions and derived tables that compute facts, in the process enabling new ways to cohort users or other entities. A common pattern we suggest in Blocks is fact tables, which summarize key dimensions about entities in your dataset, such as your users and their purchase history.

In e-commerce, this is often a user order facts table, which allows for cohorting users by number of purchases, first purchase month, latest purchase month, and many other facts computed about a user and their order history. Event data may include session, visit, or visitor fact tables that allow for cohorting by session length, lifetime number of visits, landing page, and so on.
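A user order facts table is essentially a per-user aggregation of the orders table. The following Python sketch assumes a hypothetical list of (user_id, order_month) rows with months as "YYYY-MM" strings (so min and max order them chronologically), builds one facts row per user, and then cohorts users by a hypothetical bucketing of lifetime orders. In Looker this would typically be a derived table defined in LookML; the Python here only illustrates the logic, and the UserOrderFacts, build_user_order_facts, and lifetime_orders_tier names are made up for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UserOrderFacts:
    user_id: int
    lifetime_orders: int
    first_order_month: str   # "YYYY-MM"
    latest_order_month: str  # "YYYY-MM"


def build_user_order_facts(orders):
    """Collapse (user_id, order_month) rows into one facts row per user."""
    months_by_user = defaultdict(list)
    for user_id, order_month in orders:
        months_by_user[user_id].append(order_month)
    return {
        user_id: UserOrderFacts(user_id, len(months), min(months), max(months))
        for user_id, months in months_by_user.items()
    }


def lifetime_orders_tier(n):
    """Hypothetical buckets for cohorting users by lifetime purchases."""
    if n == 1:
        return "1"
    if n <= 4:
        return "2-4"
    return "5+"


orders = [(1, "2015-01"), (1, "2015-02"), (1, "2015-03"),
          (2, "2015-02"), (3, "2015-01"), (3, "2015-06")]
facts = build_user_order_facts(orders)

# Cohort users by the lifetime-orders fact.
users_by_tier = defaultdict(list)
for f in facts.values():
    users_by_tier[lifetime_orders_tier(f.lifetime_orders)].append(f.user_id)

print(facts[1])
# UserOrderFacts(user_id=1, lifetime_orders=3, first_order_month='2015-01', latest_order_month='2015-03')
print(dict(users_by_tier))
# {'2-4': [1, 3], '1': [2]}
```

The same facts rows could just as easily be cohorted by first_order_month or latest_order_month; the point of the pattern is to compute each fact once per user so any of them can become a dimension.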

When combining transactional and event data, you can start to explore some very interesting cohorts, asking robust questions along the way. We've built examples of this in our own Looker demo, and it creates an incredible view of user behavior. We can begin to ask rich and interesting questions of the data.

Like, "What's the average session duration of users cohorted by their days as a customer broken out by landing page category?"
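Answering that question means joining event data (sessions) with a fact derived from transactional data (when each user became a customer). Here is a minimal Python sketch, assuming hypothetical session tuples, a customer_since mapping built from each user's first purchase date, and made-up day buckets; none of these names come from Looker.

```python
from collections import defaultdict
from datetime import date


def days_as_customer_tier(days):
    """Hypothetical buckets for 'days as a customer'."""
    if days < 30:
        return "0-29"
    if days < 90:
        return "30-89"
    return "90+"


def avg_session_duration(sessions, customer_since):
    """Average session minutes by (days-as-customer tier, landing page category).

    sessions: iterable of (user_id, session_date, landing_page_category, minutes)
    customer_since: {user_id: date of the user's first purchase}
    """
    total_minutes = defaultdict(float)
    session_count = defaultdict(int)
    for user_id, session_date, category, minutes in sessions:
        first_purchase = customer_since.get(user_id)
        if first_purchase is None:
            continue  # never purchased, so not a customer
        days = (session_date - first_purchase).days
        if days < 0:
            continue  # session happened before the user became a customer
        key = (days_as_customer_tier(days), category)
        total_minutes[key] += minutes
        session_count[key] += 1
    return {key: total_minutes[key] / session_count[key] for key in total_minutes}


customer_since = {1: date(2015, 1, 1), 2: date(2015, 3, 1)}
sessions = [
    (1, date(2015, 1, 10), "shoes", 4.0),   # 9 days in
    (1, date(2015, 1, 20), "shoes", 6.0),   # 19 days in
    (1, date(2015, 5, 1), "hats", 10.0),    # 120 days in
    (2, date(2015, 3, 15), "shoes", 8.0),   # 14 days in
    (3, date(2015, 3, 15), "shoes", 99.0),  # user 3 never purchased: skipped
]
print(avg_session_duration(sessions, customer_since))
# {('0-29', 'shoes'): 6.0, ('90+', 'hats'): 10.0}
```

Sessions from users who never purchased, or from before their first purchase, are skipped because "days as a customer" is undefined for them; that is a design choice for this sketch rather than the only option.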

This is what we're so excited about at Looker. We continue to see our customers architecting , wrangling and merging data sources in the process, for it all to eventually land in a centralized warehouse. From there, they're able to ask and answer questions that are completely unique to the business. When certain metrics they'd like to see don't exist, they define them. The data model begins to grow as quickly as their businesses, evolving to answer every strategic business question along the way.

To learn more about how companies are using Looker for cohort analysis, read our white paper.
