Knowledge Drop

Parallel PDTs (Concurrent PDT Builds)


Userlevel 2

Last tested: Feb 15, 2021

As of Looker 7.8, you can configure Looker to build multiple PDTs concurrently on each connection. Before 7.8, the regenerator could build only one PDT at a time per connection.
Note: Parallel PDT builds apply only to builds initiated by the regenerator. Builds of persist_for PDTs, Development Mode PDTs, and PDTs built via "Rebuild Derived Tables & Run" are all initiated by an Explore thread, so they are built consecutively, not concurrently.

How to enable parallel PDTs

Admins can set the number of concurrent builds allowed for a connection with the Max PDT Builder Connections field in the connection settings. This setting defaults to 1 and cannot exceed the connection's Max Connections value. See the "How to optimize" section below for guidance on choosing a Max PDT Builder Connections value.
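To make the two rules above concrete (a default of 1, and a ceiling equal to Max Connections), here is a hypothetical validation helper in Python. This is not Looker code or a Looker API: the function name is illustrative, and the minimum of 1 is an assumption the article does not state.

```python
from typing import Optional

DEFAULT_MAX_PDT_BUILDER_CONNECTIONS = 1  # default stated in the article


def resolve_pdt_builder_connections(requested: Optional[int], max_connections: int) -> int:
    """Hypothetical helper: return the effective Max PDT Builder Connections value.

    Falls back to the documented default of 1 when nothing is set, assumes a
    minimum of 1 (not stated in the article), and enforces the documented rule
    that the value cannot exceed the connection's Max Connections.
    """
    if requested is None:
        return DEFAULT_MAX_PDT_BUILDER_CONNECTIONS
    if requested < 1:
        raise ValueError("Max PDT Builder Connections must be at least 1 (assumed minimum)")
    if requested > max_connections:
        raise ValueError(
            f"Max PDT Builder Connections ({requested}) cannot exceed "
            f"Max Connections ({max_connections})"
        )
    return requested
```

For example, resolve_pdt_builder_connections(None, 10) gives the default of 1, while asking for 11 on a connection with Max Connections set to 10 is rejected.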


Note: Users should confirm their database can handle the increased load before increasing the Max PDT Builder Connections setting.

How it works

Under the hood, Looker builds a graph of PDT dependencies. When the regenerator runs for a connection, it starts multiple PDT builds at once, beginning with the PDTs at the lowest level of the graph so that each PDT's dependencies are built before the PDT itself.
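The scheduling idea can be modeled in a few lines of Python. This is a simplified sketch, not Looker's regenerator: regenerate, deps, build, and max_builders are hypothetical names, and max_builders stands in for the Max PDT Builder Connections setting. The sketch starts a PDT as soon as all of its dependencies have finished; whether Looker instead waits for an entire level of the graph to finish is not stated here.

```python
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from typing import Callable, Dict, List, Set


def regenerate(deps: Dict[str, Set[str]], build: Callable[[str], None], max_builders: int) -> List[str]:
    """Hypothetical model: build every PDT in `deps`, dependencies first.

    `deps` maps each PDT name to the set of PDT names it depends on.
    At most `max_builders` builds run at once. Returns the completion order.
    """
    remaining = {pdt: set(parents) for pdt, parents in deps.items()}
    running: Dict[Future, str] = {}
    finished: List[str] = []
    with ThreadPoolExecutor(max_workers=max_builders) as pool:
        while remaining or running:
            # Start PDTs whose dependencies are all built, up to the concurrency cap.
            ready = [pdt for pdt, parents in remaining.items() if not parents]
            for pdt in ready:
                if len(running) >= max_builders:
                    break
                del remaining[pdt]
                running[pool.submit(build, pdt)] = pdt
            if not running:
                # Nothing can start and nothing is running: a cycle or unknown dependency.
                raise ValueError(f"unbuildable PDTs: {sorted(remaining)}")
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                pdt = running.pop(fut)
                fut.result()  # re-raise a failed build
                finished.append(pdt)
                # This PDT is built, so it no longer blocks the PDTs that depend on it.
                for parents in remaining.values():
                    parents.discard(pdt)
    return finished
```

With deps = {"a": set(), "b": set(), "c": {"a", "b"}, "d": {"c"}} and max_builders=2, a and b build side by side, c starts only after both finish, and d is always last. The relative completion order of a and b depends on timing.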

The benefits

  • Increasing the number of concurrent PDT builds lets the regenerator finish more quickly, so end users get up-to-date data sooner.
    Note: Overall regeneration time decreases, but the build time of any individual table does not.
  • Shorter regeneration cycles allow more frequent datagroup trigger checks.
  • Concurrent builds make it less likely that a PDT with a long build time delays the rebuild of other PDTs that have been triggered.
  • Concurrent builds take advantage of the database's ability to build tables concurrently.
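A small simulation shows why total regeneration time drops while each table's build time stays the same. It is a sketch under simplifying assumptions: the PDTs are independent, the durations (in minutes) are made up, and concurrency is assumed not to slow individual builds, which real databases under extra load may not honor.

```python
import heapq
from typing import List, Tuple


def simulate(durations: List[int], builders: int) -> Tuple[List[int], int]:
    """Greedy scheduling of independent builds with at most `builders` running at once.

    Returns each build's finish time and the overall (wall-clock) regeneration time.
    """
    free_at = [0] * builders  # min-heap of times at which each builder slot frees up
    heapq.heapify(free_at)
    finish_times = []
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available builder slot
        end = start + duration          # each build still takes its full duration
        finish_times.append(end)
        heapq.heappush(free_at, end)
    return finish_times, max(finish_times, default=0)
```

With durations [30, 10, 10, 10] and one builder, the three 10-minute PDTs wait behind the 30-minute one and regeneration takes 60 minutes. With two builders it takes 30 minutes and the short PDTs finish at 10, 20, and 30, yet every PDT still takes exactly its own duration.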

How to optimize

If you know roughly how many concurrent builds your database performs during its ETL process, you can use that number as a starting point for the Max PDT Builder Connections setting and adjust as needed. If you're not sure how many concurrent builds your database can handle, we recommend starting low: set Max PDT Builder Connections to 2 and test. Once you've verified that the database can handle the increased load, repeat the process, increasing Max PDT Builder Connections by only 1 each time.
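The incremental approach above can be written as a loop. This is a hypothetical sketch: handles_load stands for whatever check you perform yourself (for example, watching warehouse load during a regeneration with the setting at n); it is not a Looker feature. The start parameter can be set to your ETL concurrency if you know it. If the starting value fails, this sketch simply falls back to the default of 1 rather than searching downward.

```python
from typing import Callable


def tune_pdt_builders(max_connections: int, handles_load: Callable[[int], bool], start: int = 2) -> int:
    """Hypothetical tuning loop: raise the setting by 1 while the database copes.

    Never tests a value above max_connections, since the setting cannot exceed
    Max Connections. Returns the highest value that passed, or 1 (the default).
    """
    best = 1
    n = max(start, 2)  # begin at 2 (or at a known ETL concurrency)
    while n <= max_connections and handles_load(n):
        best = n
        n += 1  # increase by only 1 each time
    return best
```

For instance, if the database copes with up to 4 concurrent builds, tune_pdt_builders(10, lambda n: n <= 4) settles on 4.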

Q&A

Q: With concurrent builds, will Looker build PDTs in the same order every time?
A: No. Looker always begins at the bottom level of the dependency graph, building dependencies first; beyond that, the order in which builds are kicked off is random.

Q: Does this setting apply to datagroups as well?
A: No. Datagroups will still be checked consecutively, not concurrently.

Q: I have the PDT concurrency set to 1, but I see three tables being built by the Regenerator at the same time. How does that work?
A: In a clustered instance, the limit applies to each node. So even with a concurrency limit of 1, a cluster with three nodes could rebuild three PDTs on one connection at the same time.
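The arithmetic behind this answer, as a tiny hypothetical helper. It assumes, as the answer describes, that each node enforces the limit independently, and it counts only regenerator-initiated builds.

```python
def max_concurrent_regenerator_builds(per_node_limit: int, nodes: int) -> int:
    """Upper bound on simultaneous regenerator PDT builds on one connection.

    Assumes every node in the cluster enforces Max PDT Builder Connections on its own.
    """
    return per_node_limit * nodes
```

With a limit of 1 on a three-node cluster, up to 1 × 3 = 3 PDTs can rebuild at once.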

This content is subject to limited support.