LookML Data Tests: Recommendations and Best Practices

  • 11 December 2020
  • 1 reply
  • 2305 views

Userlevel 2

tl;dr: Do you use data tests? If so, what are some practices that you like to employ, whether they’re similar to or different from mine? If not, consider making it a New Year’s resolution to give them a try!

 

Overview

LookML data tests were introduced in Looker version 6.20, and can be very helpful in everyday LookML development and unit testing. By creating a list of queries to run high-level checks against the database, the LookML developer can easily verify:

  1. Whether ETL (extract, transform, load) into the database was successful: Was data loaded or refreshed on schedule? Are the expected records still there, or did they disappear or duplicate due to some pipeline issue?

  2. The validity of LookML code changes: Is the new join logic correct, or does it now cause 0, null, or fanned-out records to appear on one side? Do derived tables contain data, or are they empty due to a syntactical or logical error?

 

Types of Tests

Start with basic checks that will catch if something is outright wrong with the database, such as a recently failed ETL or accidental database user permissions removal. Work your way up to more sophisticated validations as needed for your use cases.

Tests for ETL validation:

  1. Data exists: count measure is greater than 0

  2. Data is sufficient: count measure is greater than or equal to…

  3. Data is not redundant: count of primary key field is equal to 1 [the Docs page shows an example]

  4. All data is available: count or sum measure is equal to…for some filters [picking a fixed timeframe in the past works best]

  5. Data is recent: max of some timestamp is greater than or equal to… [You can do something like DATE_ADD(current_date(), interval -1 day) so you are always getting “yesterday” without hard-coding a date here]

Tests for LookML model validation:

  1. Joins are valid: count or sum measure in each joined-in view is greater than 0

  2. Joins are correct: count or sum measure in each joined-in view is equal to…for some filters [picking a fixed timeframe in the past works best]

 

What Warrants a Test?

The most common question people ask is, “Do I need a data test for every table in my database? Should I check every primary key?”

Ask yourself, “From a business perspective, what would be important for us to catch as soon as possible?” Is there a critical dashboard that should never be allowed to break? If you were to push some code that broke something, what would be the worst-case scenario? What would totally ruin your day as a LookML developer or administrator?

Consider making at least data tests 1–5 (see above) for every business-critical table. Data tests 6 and 7 are more useful if you are fiddling around with existing Explores—less so if you don’t plan on modifying any existing logic.

 

Creating Tests

The Docs page states, “You can create data tests in model files, view files, or in separate, dedicated data test files.” Most people find it most convenient to centralize all their tests in dedicated data test files.

When planning where you want to write your tests, consider this: You cannot pick one test to run by itself. You must run all the tests in a single LookML file, or run all the tests in your entire project. Thus, if you place all your tests in the same file, then you have to run all of them every time, which may not be efficient if you only care about validating the ETL or recent model changes (but not both together). However, if you place each view’s relevant data tests in each view file, then you have to open each file individually to run that batch of tests—which may not be efficient, either.

To create a “dedicated data test file,” simply click the Add file or folder option (with the + icon) in the LookML project, and select Create Generic LookML File in the drop-down. You can also create data tests in a view file. Functionally, both will work and there is no difference in behavior. Aesthetically, you may find it easier to locate or differentiate your data test files if they all have the .lkml extension instead of .view.

If you have many (more than a dozen) data tests, split them into two or more LookML files. One file can contain only the tests for ETL validation, and the other can hold only the tests for model validation. You can split the tests further by data source and/or model.

If you have IDE folders enabled, which most people do nowadays, you can place all your data test files in the same project folder for ease of access.

 

Running Tests

Establish a process around the data tests. One simple way to make it part of your development routine:

  1. Run the ETL-related data tests at the start of your workday to minimize time wasted on something that broke overnight.

  2. Run the model-related tests before committing any code to verify your changes aren’t breaking anything.

  3. If one or more checks fail, follow the escalation path.*

You can also run data tests via API. Thus, you could create a simple script to run the tests automatically every day on a schedule.

* What is your escalation path if these fundamental checks fail?

  • If there appears to be an issue with the data in the database, you should contact a data engineer or database administrator to resolve the issue.

  • If your code changes appear to break something, is there a senior developer or team lead you can ask for help determining the right logic?

  • You can put this information in a Looker playbook or wiki if you have one, write it in a project Markdown file, or write it as a comment in the data tests LookML file.

 

Additional Notes

  1. Instead of writing the explore_source: ... { ... } code manually, you can open the Explore interface and pull in the desired fields and filters. Then click on the gear icon in the upper-right corner, and click on Get Derived Table LookML… in the drop-down menu.

  2. The explore_source: ... { ... } code in the data test file will have dotted lines under the Explore name and each field reference, with an i warning to the side. You can make these warnings go away by entering an include parameter and specifying the file that houses the definition of the corresponding Explore (typically the model file).

  3. When using a dedicated file to house your data tests, remember to include the data test file in any model file or view file where you want to run your data tests. This is in the Docs page, but worth reiterating because people commonly ask, “Why don’t I see the Run LookML Tests option in the file drop-down menu?”
    6sal99-KYZTPmkR-JR4zyfJxdxzaAI59KJ-NubRSvr-ZSgGsR6fXE_98etQthkX54bq0cEGw_S50jTpuGBL9JvMCUpsMzf-oEAmIZpNC6hcesfWp5wuVRN-4dLPdRBRnaD1bSYnw

  1. When asserting tests for specific filter criteria (e.g., “the number of active users in Q1 2019 was 5,482”), specify the timezone parameter in the explore_source. If the timezone is not specified, the explore_source query will use the database timezone by default, which may not match the results you were expecting based on your own Explore query.
    On the other hand, if you had determined your expected results by writing literal SQL queries directly against your database, then you
    don’t need the timezone parameter.

Do you use data tests? If so, what are some practices that you like to employ, whether they’re similar to or different from mine? If not, consider making it a New Year’s resolution to give them a try!


This topic has been closed for comments

1 reply

Userlevel 4

Rebecca, this is an amazing guide to Looker data tests. We’re finding that Looker customers still aren’t using this feature to its full potential, so it’s really helpful to have these ideas to point to.

In addition to running data tests from the API or manually in the IDE, you can also run data tests in continuous integration with Spectacles. Spectacles has an open-source CLI and is also an integrated, hosted web application. With the CLI, it’s as simple as running “spectacles assert”, which is a lot easier than rolling your own API script!