How to: Tuning your Looker Project for speed using...

craig_juergens · 06-22-2023 04:25 PM

This post will explain how you can begin to optimize your Looker instance by tuning your Model file (no view-level changes required!). This post assumes that your project is set up to read tables directly from the database (rather than using derived tables, which will be reviewed in a future post). The comments below correspond to the code sample below.

explore: orders_history_header {
  view_label: " eCom Order Header" -- leading space controls alphabetization of Views
  description: "A rolling 2 years of eCom order history."
  always_filter: {
    filters: [orders_history_header.order_create_date: "this month"]
  } -- (comment 1)
  sql_always_where: ${orders_history_header.partition_date} 
    > date_sub(current_date(), interval 25 month) -- (comment 2)
    AND ${orders_history_header.is_test_order} = false ;; -- (comment 3)

  join: orders_history_detail {
    view_label: "eCom Order Detail"
    type: left_outer
    relationship: many_to_one
    sql_on: ${orders_history_header.order_id} = ${orders_history_detail.order_id} 
      AND ${orders_history_detail.partition_date} 
      > date_sub(current_date(), interval 25 month) ;; -- (comment 4)
  }

  join: item_master {
    type: left_outer
    relationship: many_to_one
    sql_on: ${orders_history_header.item_id} = ${item_master.item_id}
      AND ${item_master.item_create_date} 
      > date_sub(current_date(), interval 37 month) ;; -- (comment 5)
    fields: [item_dimensions*] -- (comment 6)
  }
}

1. Whenever possible, 'nudge' your users to use a universally applicable filter. The example above pushes the user to use a date filter so they are less inclined to pull an 'all time' history of your orders (which would be costly to the database and not insightful to the end user). You can require multiple filters, give default filter answers, or leave filters blank. You can read more about this here: always_filter

2. Optimize the end-user's query as much as possible behind the scenes. In this example, we have implemented a rolling 2 year history constraint. The sql_always_where is pointed at the partition field of the underlying database table. This conveniently ensures that the partition field is included in every query generated out of this Explore. The database will use this partition field to scan only part of the overall table, thereby decreasing query run time as well as database cost.

3. This Explore excludes all test orders. Although these could easily be filtered out in ETL further upstream, we have chosen to exclude them in the Looker layer. This allows savvy BI Analysts to query the underlying database easily for testing purposes AND ensure that end-users are only presented with genuine business performance in their reporting.

4. The rolling 25 month constraint is included again in this join to optimize queries as they pass through each fact table. Imagine if you joined a rolling 25 months' of orders_history_header to the entire orders_history_detail table! This would be very inefficient.

5. Joining out to a look-up table can be optimized by again implementing a rolling date window. Note this date window may need to be longer than your fact tables (assumption: items are created months BEFORE they are sold).

6. Be considerate to your fellow LookML developers (and your future self!) by using field_sets in your Model file. This prevents your Model file from containing long and possible repetitive lists of fields.

Please let me know what tricks you use to make your Project faster in the comments below.

How to: Tuning your Looker Project for speed using only your Model file.