This content, written by Donal Tobin, was initially posted on the Looker Blog on Sep 30, 2020. The content is subject to limited support.
Many Looker users find that Redshift is an excellent, high-performance database that can be used to power sophisticated dashboards. However, given Redshift’s architecture, as the number of tiles in a dashboard grows, response times can slow and queries can take longer to complete than they initially did.
In this piece, we’ll explore how to fine-tune your Redshift cluster so you can better match your Looker workloads to your Amazon Redshift configuration and render dashboards quickly and efficiently.
How data bottlenecks happen
For many organizations, standing up an analytics stack can initially be a bit of an experiment. It often starts with an identified data need, followed by somebody spinning up an Amazon Redshift cluster, building a few data pipelines, and then connecting an analytics platform, like Looker, to visualize the results. As the data-hungry workforce grows, however, more and more people request to be set up with dashboards and data access, which creates a new challenge — a data bottleneck.
The key to solving data bottlenecks lies in balancing your Looker workloads with your Redshift setup. To better understand this, let’s first discuss how Amazon Redshift processes queries before looking closer at how Looker generates workloads.
Amazon Redshift workload management and query queues
Amazon Redshift operates in a queuing model, and offers a key feature in the form of the Workload Management (WLM) console. The WLM console allows you to set up different query queues, and then assign a specific group of queries to each queue.
For example, you can assign data loads to one queue, and your ad-hoc queries to another. By separating these workloads, you ensure that they don’t block each other. With the WLM, you can also assign the right amount of concurrency, or slot count, to each queue, which enables you to balance your workloads to allow ad hoc queries and scheduled loads to complete in a reasonable amount of time. As query volumes grow and you run more concurrent queries, your queries will start to get stuck in the queue as they wait for others to finish. When this happens, the result can be slower-loading Looker dashboards.
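As a rough sketch, a WLM setup like the one just described is expressed as a JSON document (Redshift's `wlm_json_configuration` parameter), with one entry per queue. The user-group names, slot counts, and memory percentages below are illustrative assumptions, not recommendations:

```python
# Illustrative WLM configuration: three queues keyed by user group.
# Field names mirror Redshift's wlm_json_configuration parameter;
# slot counts and memory shares here are example values only.
wlm_config = [
    {"user_group": ["load_users"],       # scheduled data loads
     "query_concurrency": 3,
     "memory_percent_to_use": 30},
    {"user_group": ["transform_users"],  # batch transforms
     "query_concurrency": 2,
     "memory_percent_to_use": 30},
    {"user_group": ["looker_users"],     # ad hoc Looker queries
     "query_concurrency": 10,
     "memory_percent_to_use": 40},
]

# Sanity check: the queues' memory shares should add up to at most 100%.
total_memory = sum(q["memory_percent_to_use"] for q in wlm_config)
print(total_memory)  # 100
```

Because each queue has its own slots, a burst of ad hoc Looker queries can fill the `looker_users` queue without delaying the load or transform queues.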
Understanding Looker workloads: LookML and PDTs
There are two components of the Looker platform that generate high query volumes with heavy workloads: LookML and Persistent Derived Tables (PDTs).
LookML is Looker’s data modeling language that separates query structure from content. With LookML, analysts only have to define the data structure once in a LookML project, which then enables business users to run queries of their own without having to write SQL. Looker then uses the LookML project to generate ad-hoc queries on the fly for both technical and nontechnical users.
Persistent Derived Tables (PDTs)
Some reports, or Looks, create complex queries that require the creation of temporary tables to do things like store the intermediate results of a query. These tables are ephemeral, and the queries needed to create them run every time a user requests the data. It’s for this reason that, when not performing well, derived tables can put extra strain on a cluster.
In instances when a query does take a while to run, creating a persistent derived table (PDT) can be the better option. Looker writes PDTs into a scratch schema in your database and refreshes the PDT on a set schedule. And because the PDT has already been created when a user requests data from it, query time and database load are significantly reduced.
The impact of LookML and PDTs on query volume
To understand the impact LookML and PDTs can have on query volume, let’s consider a scenario where we have a single-node Amazon Redshift cluster, five business users, and a single LookML project.
Each of our business users has 10 dashboards with 20 Looks on each dashboard, and behind each Look is a single query. To refresh all of these dashboards, the resulting query load maps out to:
5 (users) x 10 (dashboards) x 20 (looks) = 1,000 queries
With a single-node Redshift cluster and a default WLM setup, you’ll only be able to process five queries at a time, meaning you’d need 200 cycles total to process all of these queries. And if each query takes 15 seconds to run, that would mean the last 15-second query will finish running after 50 minutes.
Even if you were to add a node now and double the amount of queries you could process, you’d only be cutting that wait time in half — which means you’d still be waiting 25 minutes for all the queries to run.
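The arithmetic above can be checked with a short calculation (the 15-second query time and default concurrency of five are the figures assumed in the text):

```python
import math

users, dashboards, looks = 5, 10, 20
queries = users * dashboards * looks   # 1,000 queries per full refresh
query_seconds = 15                     # assumed runtime per query

def total_minutes(slot_count):
    """Minutes until the last query finishes, with FIFO queueing."""
    cycles = math.ceil(queries / slot_count)  # batches of concurrent queries
    return cycles * query_seconds / 60

print(queries)            # 1000
print(total_minutes(5))   # 50.0 -> single node, default WLM
print(total_minutes(10))  # 25.0 -> doubling concurrency halves the wait
```

Note that adding nodes scales the wait down only linearly; the model also assumes query runtime stays constant, which in practice degrades further once queries start spilling to disk.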
At this point, let’s say we add PDTs into the mix. Our PDTs will generate more workloads, often with complex, memory-intensive and long-running queries, meaning the PDTs will compete with our already slow ad-hoc queries.
There are a few different avenues we could take to help solve for this challenge. We could throttle the number of per-user queries, reduce the total row limit for queries, or allow fewer data points within each query. However, since the point of leveraging Looker is to derive meaningful insights from mass amounts of data, imposing these limitations may not make the most sense long term.
3 steps for improving query speeds and dashboard load-time
To solve the challenges posed by increased use of Looker, below are three steps to help you optimize your query speeds, node count, and overall Redshift spend.
Step 1) Optimize your Redshift WLM for Looker workloads
Optimizing your Redshift WLM for better workload scalability can be done in four steps:
- Set up individual logins for each user.
- Define the workload type for each user (e.g. load, transform, ad-hoc).
- Group users by their workload type.
- Create one queue per workload type, and define the appropriate slot count and memory percentage for each queue.
Using this same logic, you can take these steps to optimize your Looker queries as well. By having your Looker queries run in a queue that’s separate from your loads and transforms, you’ll be able to define the right concurrency and memory configuration for that queue. The result? Each Looker query will be optimized to run with enough memory to complete the query without spilling to disk, thus minimizing the volume of disk-based queries.
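One way to reason about "enough memory to complete the query without spilling to disk": the memory available to each query is the queue's memory share divided by its slot count. The cluster memory figure and queue settings below are illustrative assumptions:

```python
# Illustrative figures: a queue's per-slot memory is its share of cluster
# memory divided by its concurrency (slot count). Values are assumptions.
cluster_memory_mb = 15000   # assumed usable memory on the cluster
queue_memory_percent = 40   # share assigned to the Looker queue
slot_count = 10             # concurrency of the Looker queue

memory_per_slot_mb = cluster_memory_mb * queue_memory_percent / 100 / slot_count
print(memory_per_slot_mb)   # 600.0 MB available per concurrent Looker query

def spills_to_disk(query_memory_mb):
    """A query that needs more than its slot provides becomes disk-based."""
    return query_memory_mb > memory_per_slot_mb
```

This is why raising a queue's slot count is a trade-off: more concurrency means less memory per slot, so an overly generous slot count can turn fast in-memory queries into slow disk-based ones.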
Additionally, during peak times of use, Amazon Redshift’s Concurrency Scaling feature gives clusters additional capacity to handle bursts in query load, routing queries based on their WLM configuration and rules. With an intermix.io dashboard, you can see the high-watermark (peak) concurrency for your Looker queries along with how much memory they consume, giving you a better idea of what memory percentage you should assign to each slot.
Step 2) Optimize your Looker workloads
Optimizing your Looker workloads is as simple as reducing the number of recurring redundant workloads. A redundant workload is a query that’s running but doesn’t need to be — like when users refresh dashboards more frequently than they need. By reducing the refresh rate, your Redshift cluster won’t be tasked with as many queries, resulting in more overall throughput and faster dashboard loads.
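To put a number on refresh-rate savings (the dashboard counts carry over from the earlier example; the refresh intervals are illustrative assumptions):

```python
looks_per_dashboard = 20
dashboards = 50  # 5 users x 10 dashboards, as in the earlier example
queries_per_refresh = dashboards * looks_per_dashboard  # 1,000 queries

def queries_per_hour(refresh_minutes):
    """Query volume generated by auto-refreshing every dashboard."""
    return queries_per_refresh * (60 // refresh_minutes)

print(queries_per_hour(5))   # 12000 queries/hour with a 5-minute refresh
print(queries_per_hour(30))  # 2000 queries/hour with a 30-minute refresh
```

Stretching an unnecessarily aggressive 5-minute refresh to 30 minutes removes five out of every six queries from the cluster without changing what users see in practice.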
intermix.io allows you to easily identify which users are running more redundant queries than others, so you can adjust accordingly and free up query volume.
Step 3) Optimize your Redshift node count
Once you’ve squeezed all the juice out of your WLM, it’s time to adjust your Redshift node count. If you’re still encountering concurrency issues or disk-based queries, it may be time to add more nodes. In most cases, though, there’s an opportunity to reduce node count and save on your Redshift spend — some teams have done exactly that, reducing their Amazon Redshift spend by 25 percent.
There you have it! By understanding the various impacts Redshift WLM settings have on query performance, you can optimize your Redshift cluster and continue feeding your data-hungry workforce with fast, high-performing dashboards.
Want to learn how intermix.io can help you run your Looker dashboards at top efficiency? Reach out to our intermix team to get started on the platform.