The PDT Regenerator Process

adina_katz · ‎04-05-2021

Knowledge Drop

Last Tested: Nov 4, 2020

Purpose:
The regenerator manages the building of PDTs in the scratch schema. Its major tasks are to check datagroup triggers, build new PDTs that have been pushed to production, and rebuild existing production PDTs whose trigger values have changed.

Number of Regenerator Threads:
Every connection has one regenerator thread. Because the regenerator is a single thread, it can perform only one operation at a time: it can check a single trigger or build/rebuild a single PDT. The exception is a connection with parallel PDT builds enabled, which can build multiple PDTs at once.

There is a maximum of 25 regenerator threads per instance, and still no more than one thread per connection. This means that 25 PDTs could theoretically build simultaneously only if there were 25 different connections with PDTs enabled on the Looker instance. If an instance has more than 25 connections with PDTs enabled, some connections will share a regenerator thread.
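The thread limits above can be modeled roughly as follows. This is a sketch, not Looker's implementation: the round-robin spreading of connections across threads is purely an assumption made for illustration, and `max_simultaneous_builds` ignores parallel PDT builds (one build per thread).

```python
from collections import defaultdict
from typing import Dict, List

# Cap described above: at most 25 regenerator threads per instance.
MAX_REGENERATOR_THREADS = 25


def assign_threads(connections: List[str], max_threads: int = MAX_REGENERATOR_THREADS) -> Dict[int, List[str]]:
    """Map regenerator thread index -> connections served by that thread.

    Assumption: connections are spread round-robin over the available threads.
    Looker's real assignment strategy is not described in this article.
    """
    assignment: Dict[int, List[str]] = defaultdict(list)
    for i, conn in enumerate(connections):
        assignment[i % max_threads].append(conn)
    return dict(assignment)


def max_simultaneous_builds(num_connections: int, max_threads: int = MAX_REGENERATOR_THREADS) -> int:
    """Upper bound on PDTs building at once, assuming one build per thread (no parallel builds)."""
    return min(num_connections, max_threads)
```

With 30 PDT-enabled connections, this model yields 25 threads, five of which serve two connections each, and still at most 25 simultaneous builds.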

Regenerator Schedule:
The regenerator runs on the schedule set in the `PDT And Datagroup Maintenance Schedule` section of the connection settings. The schedule is set using a cron expression. A cron expression is a string comprising five or six fields separated by white space that represents a set of times. The default value is every 5 minutes. More on cron expressions here.
Note: The `PDT And Datagroup Maintenance Schedule` setting will accept a cron string that runs more often than every 5 minutes; however, the regenerator will run at most once every 5 minutes.
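To make the 5-minute floor concrete, here is a minimal sketch that computes the effective interval for simple "every N minutes" cron strings, such as `*/5 * * * *` for the 5-minute default. The function name is hypothetical, and it handles only the five-field `* * * * *` and `*/N * * * *` forms; any other expression is rejected rather than guessed at.

```python
# Floor described above: the regenerator runs at most every 5 minutes.
MIN_INTERVAL_MINUTES = 5


def effective_interval_minutes(cron: str) -> int:
    """Return the effective minutes between regenerator runs for an "every N minutes" cron.

    Only "* * * * *" and "*/N * * * *" are supported; anything else raises ValueError.
    For N that does not divide 60, cron restarts the step at the top of each hour,
    so the gap there is shorter than N.
    """
    fields = cron.split()
    if len(fields) != 5 or fields[1:] != ["*", "*", "*", "*"]:
        raise ValueError(f"unsupported cron expression: {cron!r}")
    minute = fields[0]
    if minute == "*":
        requested = 1
    elif minute.startswith("*/") and minute[2:].isdigit() and int(minute[2:]) > 0:
        requested = int(minute[2:])
    else:
        raise ValueError(f"unsupported minute field: {minute!r}")
    # A more frequent schedule is accepted, but runs no more often than the floor.
    return max(requested, MIN_INTERVAL_MINUTES)
```

For example, `*/2 * * * *` is accepted but behaves like the 5-minute default, while `*/15 * * * *` runs every 15 minutes.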

Regenerator process:
1. Checking Datagroup Triggers
The regenerator first checks all datagroup triggers; that is, it runs the SQL of each datagroup's sql_trigger on the database. If the trigger value has changed from the stored value, it updates the stored value and marks the datagroup as ‘triggered’. After all datagroup triggers have been checked, the regenerator moves on to building/rebuilding PDTs.
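The trigger check described in step 1 can be sketched as a single loop. `run_sql` is a hypothetical stand-in for executing SQL on the connection, and treating a datagroup with no stored value as "record only, not triggered" is an assumption; the article does not say how a first-ever check is handled.

```python
from typing import Callable, Dict, Set


def check_datagroup_triggers(
    datagroups: Dict[str, str],         # datagroup name -> sql_trigger SQL
    stored_values: Dict[str, object],   # last stored trigger value per datagroup (updated in place)
    run_sql: Callable[[str], object],   # hypothetical: runs SQL, returns the single trigger value
) -> Set[str]:
    """Run every datagroup's sql_trigger and return the names of datagroups marked triggered."""
    triggered: Set[str] = set()
    for name, trigger_sql in datagroups.items():
        value = run_sql(trigger_sql)  # execute the datagroup's sql_trigger
        if name not in stored_values:
            # Assumption: a first-ever check only records the value.
            stored_values[name] = value
        elif stored_values[name] != value:
            stored_values[name] = value  # update the stored trigger value
            triggered.add(name)          # mark the datagroup as 'triggered'
    return triggered
```

Only after this loop finishes for every datagroup would the PDT phase begin, matching the ordering described above.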

2. Building PDTs
For PDTs persisted with sql_trigger_value, the regenerator first runs the trigger SQL on the database. If the result of the trigger SQL has changed since the previous run, the regenerator rebuilds the PDT. For PDTs persisted with a datagroup, the regenerator checks whether that datagroup has been marked triggered; if it has, the regenerator rebuilds the PDT.
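The two persistence paths can be expressed as one decision function. This is a sketch under stated assumptions: the `PDT` record and `run_sql` callable are hypothetical, `triggered_datagroups` is the set produced by the datagroup check, and rebuilding a `sql_trigger_value` PDT that has no stored value yet is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set


@dataclass
class PDT:
    name: str
    sql_trigger_value: Optional[str] = None  # trigger SQL when persisted with sql_trigger_value
    datagroup: Optional[str] = None          # datagroup name when persisted with a datagroup


def needs_rebuild(
    pdt: PDT,
    triggered_datagroups: Set[str],
    stored_trigger_values: Dict[str, object],
    run_sql: Callable[[str], object],        # hypothetical: runs SQL, returns the trigger value
) -> bool:
    """Decide whether the regenerator should rebuild this PDT on the current cycle."""
    if pdt.sql_trigger_value is not None:
        value = run_sql(pdt.sql_trigger_value)  # run the PDT's own trigger SQL
        # Assumption: no stored value yet means the PDT still needs a build.
        changed = pdt.name not in stored_trigger_values or stored_trigger_values[pdt.name] != value
        stored_trigger_values[pdt.name] = value
        return changed
    if pdt.datagroup is not None:
        # No SQL here: just consult the result of the earlier datagroup check.
        return pdt.datagroup in triggered_datagroups
    return False
```

Note that only the `sql_trigger_value` path issues SQL during the PDT phase; the datagroup path reuses work already done in step 1.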

The order in which the regenerator builds/rebuilds PDTs is random, except that a PDT's dependencies are built before the PDT itself.
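"Random except dependencies first" is a topological order with random tie-breaking. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) and shuffles each batch of PDTs whose dependencies are already done; the PDT names are made up, and this is an illustration of the ordering rule rather than Looker's actual scheduler.

```python
import random
from graphlib import TopologicalSorter
from typing import Dict, List, Optional, Set


def build_order(dependencies: Dict[str, Set[str]], rng: Optional[random.Random] = None) -> List[str]:
    """Return a sequential build order where every PDT follows the PDTs it depends on.

    dependencies maps PDT name -> names of PDTs it depends on (its predecessors).
    Within each batch of ready PDTs the order is shuffled.
    """
    if rng is None:
        rng = random.Random()
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies form a cycle
    order: List[str] = []
    while sorter.is_active():
        ready = list(sorter.get_ready())  # PDTs whose dependencies are all built
        rng.shuffle(ready)                # random order among the ready PDTs
        for name in ready:
            order.append(name)            # single-threaded regenerator: one build at a time
            sorter.done(name)
    return order
```

For example, with `dashboard_pdt` depending on `orders_rollup` and `user_facts`, both of those (and their own dependencies) always appear before `dashboard_pdt`, while the relative order of independent PDTs varies between runs.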

See The PDT Build Process for information on the build process.

This content is subject to limited support.

mruth · ‎08-18-2021

Does this affect Looker performance, in terms of query response time? In other words, can more frequent ‘regenerator’ processes degrade performance?

thanks.

adina_katz · ‎12-08-2021

Hey @mruth ,

Great question!

In terms of Looker server performance, the regenerator is not a resource-intensive process, so a more frequent regenerator cycle won’t affect Looker sending queries to the db or pulling query results from cache.

In terms of database performance, if the regenerator is building a particularly resource-intensive table, this could affect other queries running on the database and their response times. This is caused more by the PDT itself than by the frequency of the Regenerator process. If a table like this builds frequently, that can be mitigated by decreasing the regenerator frequency, but this is not the recommended fix: running the regenerator less often reduces how frequently every PDT on the connection can rebuild, not just the one that is consuming resources. My recommendations in this case would be to 1. improve the performance of the PDT if possible, and 2. ensure the PDT's trigger only fires when the underlying data in the table has changed. Here are a couple of docs on building performant PDTs:
https://help.looker.com/hc/en-us/articles/360023742593-Identifying-and-Building-PDTs-for-Performance...
https://help.looker.com/hc/en-us/articles/360023726114-Improving-PDT-Performance

Another thing that could impact database performance/query runtime is too many PDTs building at the same time. By default, the Regenerator will only build 1 PDT at a time, unless otherwise specified in the Max PDT Builder Connections setting. It’s important to configure this setting based on available database resources.
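As a rough illustration of what that setting controls, the sketch below caps concurrent builds with a worker pool whose size plays the role of Max PDT Builder Connections (default 1). `build_pdt` is a hypothetical callable that builds one PDT, the `PeakTracker` helper exists only to show the cap being respected, and dependency ordering is ignored for brevity.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def build_all(pdt_names: List[str], build_pdt: Callable[[str], str], max_builders: int = 1) -> List[str]:
    """Build PDTs with at most max_builders running at once (stand-in for the connection setting)."""
    with ThreadPoolExecutor(max_workers=max_builders) as pool:
        return list(pool.map(build_pdt, pdt_names))  # results come back in input order


class PeakTracker:
    """Hypothetical build function that records the peak number of concurrent builds."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.running = 0
        self.peak = 0

    def build(self, name: str) -> str:
        with self._lock:
            self.running += 1
            self.peak = max(self.peak, self.running)
        time.sleep(0.01)  # stand-in for the database doing the build work
        with self._lock:
            self.running -= 1
        return name
```

Raising `max_builders` lets more builds overlap, which is exactly why the value should reflect how much concurrent load the database can absorb.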

Cheers,

Adina