Parse.ly and Looker: instant in-house audience analytics

  • 28 March 2022

This content, written by Andrew Montalenti, was initially posted on the Looker Blog on Oct 4, 2016. The content is subject to limited support. Parse.ly recently introduced its Data Pipeline product. Building on the analytics infrastructure expertise it has developed by processing over 50 billion user events per month, Parse.ly is now making its fully-managed pipeline available as a service for developers. Specifically, its open-source recipes for streaming data into Amazon Redshift and Google BigQuery provide an easy way to get started with Looker. Andrew Montalenti, CTO of Parse.ly, details the joint solution.

Making analytics work for content

Parse.ly is an analytics platform designed to make it easy for anyone to access and understand their digital audiences. Whether you need real-time or historical insights about your content, Parse.ly's dashboards and APIs help teams monitor, promote, and improve based on data.

Thousands of typically data-averse editors, marketers, and content creators have adopted Parse.ly's intuitive dashboards because we've removed the complexity and jargon around digital analytics. Plus, we've helped product teams rapidly develop features that drive higher on-site engagement through a simple API.

Above: the real-time overview screens within Parse.ly's web and mobile dashboards. Parse.ly has over 170 customers, including media companies like TechCrunch, Slate, and Mashable, and brands like Artsy, Ben & Jerry's, and more.

Questions no vendor dashboard can answer

With our dashboards covering the basic questions for the entire organization, our customers became more sophisticated and started asking questions specific to their business. So, we introduced the Data Pipeline: a new service from Parse.ly that provides clean, enriched raw event data collected from your sites and apps via a fully-managed service.

We handle data collection on infrastructure that has already been scaled for over 700 top-traffic websites and 500 million monthly unique visitors. We also take care of enriching the raw events with useful information like geolocation and device categorization, leaving you with something immediately ready for analysis.
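As an illustration of what that enrichment step does, the sketch below turns a raw event into an analysis-ready one. The field names and lookup tables here are hypothetical stand-ins, not the actual Parse.ly schema or its real GeoIP/device databases:

```python
# Illustrative only: a sketch of the kind of server-side enrichment the
# pipeline performs. Field names are hypothetical, not Parse.ly's schema.
import re

# Toy lookup tables standing in for real GeoIP and device databases.
GEO_BY_IP_PREFIX = {"203.0.": "AU", "198.51.": "US"}
MOBILE_UA = re.compile(r"iPhone|Android", re.IGNORECASE)

def enrich(event: dict) -> dict:
    """Return a copy of a raw event with geo and device fields added."""
    enriched = dict(event)
    prefix = ".".join(event["ip"].split(".")[:2]) + "."
    enriched["country"] = GEO_BY_IP_PREFIX.get(prefix, "unknown")
    enriched["device_category"] = (
        "mobile" if MOBILE_UA.search(event["user_agent"]) else "desktop"
    )
    return enriched

raw = {"url": "https://example.com/post", "ip": "203.0.113.9",
       "user_agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 9_0)"}
print(enrich(raw)["country"])  # AU
```

The point is that these derived fields arrive precomputed in every record, so your warehouse queries never have to join against geo or user-agent databases themselves.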

When Parse.ly decided to branch out from our core offering to provide the Data Pipeline, we knew we wanted to partner with a business intelligence platform that shared our company's philosophy of "analytics for everyone". Looker completely embraces this philosophy and was an easy choice as a launch partner for the Data Pipeline.

  • Are you a media company or digital publisher?

If you answered "yes", you now have your ideal data source for Looker: a clean, enriched raw data source for metrics like unique visitors, pageviews, sessions, time spent, video starts, video watch time, and more. Just integrate via our standard JavaScript tracker and SDKs, and the data will simply flow.

  • Are you a B2B or B2C marketer?

As more marketers invest in content marketing, getting data on content/audience engagement has become key to understanding and improving their strategy and proving their value.

If you're a B2B or B2C company that has been investing heavily in your website, knowledge base, online resources, public documentation, and blog, and you want to unify audience data from all of these sources to get a complete picture of your content marketing efforts, Parse.ly can help. Simply follow our standard integration instructions, and data will flow.

Getting the data pipeline to work with Looker

Event data is delivered to you in raw form via a fully-managed AWS Kinesis stream (for real-time data) and an AWS S3 bucket (for historical data). From there, you have two great options to load it into a Looker-compatible SQL database, while fully controlling your extract, transform, and load (ETL) process:

  • Near-real-time bulk loads: Run a cron job or similar to issue a Redshift COPY command from your S3 bucket or a BigQuery load command from the same. This will get data into your warehouse with latencies as low as 15 minutes from the time of data arrival.

  • Real-time streaming writes: Spin up a long-lived process in your favorite language (we recommend Python) that consumes data from your Kinesis stream and does streaming writes to either a Kinesis Firehose Stream configured to point to your Redshift instance or to BigQuery's streaming write API. This provides the fastest latencies possible; for Google, this can be sub-minute latency.
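As a sketch of the second option, the snippet below relays records from a Kinesis stream to a Firehose delivery stream pointed at Redshift. The stream names are placeholders, and checkpointing, batching, and error handling are omitted for brevity:

```python
# A minimal sketch of the streaming-writes option: relay records from
# the Kinesis stream to a Firehose delivery stream. Stream names are
# placeholders; checkpointing, batching, and retries are omitted.
import json
import time

def to_firehose_record(raw: bytes) -> dict:
    """Decode a Kinesis record and wrap it for Firehose's PutRecord API.
    Payloads are newline-delimited so Redshift COPY can split them."""
    event = json.loads(raw)
    return {"Data": (json.dumps(event) + "\n").encode("utf-8")}

def run(stream_name: str = "my-pipeline-stream",
        delivery_stream: str = "my-firehose-to-redshift") -> None:
    import boto3  # imported here so the helper above stays AWS-free
    kinesis = boto3.client("kinesis")
    firehose = boto3.client("firehose")
    shard = kinesis.describe_stream(
        StreamName=stream_name)["StreamDescription"]["Shards"][0]
    it = kinesis.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard["ShardId"],
        ShardIteratorType="LATEST",
    )["ShardIterator"]
    while True:
        resp = kinesis.get_records(ShardIterator=it, Limit=500)
        for rec in resp["Records"]:
            firehose.put_record(
                DeliveryStreamName=delivery_stream,
                Record=to_firehose_record(rec["Data"]),
            )
        it = resp["NextShardIterator"]
        time.sleep(1)  # simple poll interval

if __name__ == "__main__":
    run()
```

A production consumer would use the Kinesis Client Library (or similar) for multi-shard checkpointing, but the shape of the loop is the same.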

Our product has a standard schema for our event records, and we have that schema defined using Redshift and BigQuery DDL as well. Even better, our partnership with Looker means we've built a Looker Block atop this standard schema, so most of the basic LookML modeling work is already done for you. You don't have to spend time deciphering attributes or learning how to properly structure a query to count unique visitors or sessions; you can just explore your data and start getting answers.
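To make that concrete, once the event schema is fixed, a metric like daily unique visitors reduces to a simple aggregate; that is the kind of query the Block generates for you. The table and column names below are hypothetical stand-ins, not the actual schema:

```python
# Hypothetical illustration: with a fixed event schema, common metrics
# become simple SQL aggregates. Table and column names are stand-ins,
# not the actual Parse.ly schema (the Looker Block ships the real model).
def daily_uniques_sql(table: str = "parsely_raw_events") -> str:
    return (
        "SELECT DATE(event_timestamp) AS day,\n"
        "       COUNT(DISTINCT visitor_id) AS unique_visitors,\n"
        "       COUNT(DISTINCT session_id) AS sessions\n"
        f"FROM {table}\n"
        "WHERE action = 'pageview'\n"
        "GROUP BY 1\n"
        "ORDER BY 1"
    )

print(daily_uniques_sql())
```

With the Block installed, these definitions live in LookML once, so every explore and dashboard counts visitors and sessions the same way.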

Above is an example Looker dashboard built from Parse.ly's Data Pipeline and the Looker Block for Parse.ly. The customer receives the streaming data (via Kinesis/S3) and loads it into Amazon Redshift or Google BigQuery, while maintaining full control over the ETL process. Using the Block, Looker queries the standard column names and types that are common to Parse.ly's raw events. This provides an easy starting point for your exploration, from which the LookML model can be further developed to support analyses unique to your company.

Want to get started with Parse.ly and Looker?

Refer to the Looker site for more details on the Block, or get access to it by reaching out to your assigned Looker analyst or signing up for a trial.

To learn more and start using the Data Pipeline, sign up for an account today.
