This content, written by Jonathan Tao, was initially posted in Looker Blog on Apr 14, 2016. The content is subject to limited support.
At Looker we combine data from many disparate sources, including our Salesforce data, to drive internal operations and audit our sales workflow against multiple streams of information. However, the potential for data discovery is dictated directly by the quality of data; and incomplete information makes it difficult or even impossible to gain valid answers to questions. While Looker is great for taking a closer look into your data, it's also great for identifying what's missing. In this post, we’ll explore how we leverage Looker to audit our Salesforce records, specifically regarding the free trials we run for our prospects.
Setting the stage
Before diving in, let’s take a step back and get a sense of the ETL processes and the flow of the data involved in moving data from Salesforce to a SQL database. At the start, we have the Salesforce data source, where trial records are created with some fundamental fields filled in when a trial is requested. Every few hours we move this data from Salesforce to a Redshift database. There’s many simple ways to get this done, via APIs or ETL vendors or other third-party tools. At Looker, we use an “Extract and Load” software called to easily centralize and maintain our data. There is a corresponding trial table that gets populated or updated depending on the state of the Salesforce trial record.
With this data flow in place, we now need to determine which trial record fields are important to track, and how we're defining potential errors in these fields. At what point should we require a field to be filled in? Are there other dependencies that would allow for certain field states but not others? What questions are we trying to answer, and what fields are needed to answer these questions? By evaluating the intent of our analytics, we can develop error checks to enforce the validity, accuracy, and completeness of trial record data.
Implementing audit logic in LookML
The easiest way to build out this logic in LookML is to create a dimension for every error condition we want to check. Below is an example of an error we want to check; as you can see, it is a simple CASE WHEN statement.
Notice how I’m writing a string with a “1” followed by the error description. This facilitates two derived dimensions that sum up the error count per trial, and also generate a text list of the error descriptions respectively. By delivering both pieces of information in a single string, I can extract the relevant portions with a simple SUBSTRING operator for each derived dimension. You can see how both work in tandem in the below report.
Notice how I’m leveraging html links in this report. I can click on the blue Salesforce cloud and be taken to that Salesforce trial record on demand. I can also click on the sparkline within a trial’s error report to get a better understanding of the errors being returned.
The dashboard and final thoughts
We can not only self-audit and correct Salesforce trial records with the above-mentioned report, but also gain further insights through dashboard-level metrics and KPIs. At Looker, we have a Salesforce Auditor dashboard that displays the following information, to name a few:
- Total error count per error
- Count of trials with errors
- Chart displaying the most delinquent analysts with incomplete trial records
As we watch the error count change over time, not only are we able to leverage this data for proper analytics, we also learn how to tune the trial record entry and error handling processes.
Data quality and integrity are vital for analytics. Through careful thought and implementation, we are able to indicate areas of data incompleteness while creating action items to correct these issues. Auditing Salesforce trial records is just one way that we’re using Looker to help Looker.