New (Looker 7.0+) PDT Build Process

Knowledge Drop

Last tested: Dec 17, 2019
 

Intro

As of Looker 7.0, the PDT build/rebuild process has changed significantly. The intent behind these changes is to harden the build pipeline and prevent some of the harder-to-diagnose PDT build failures.

Process

The old drop/rename procedure:

  1. A build request for a PDT comes in (sql_trigger value changes, a user requests a rebuild derived tables & run, the persist_for expires, etc).
  2. The PDT manager builds the new version of the PDT with a temporary name.
  3. Once the new PDT has built successfully, the PDT manager drops the old PDT and renames the new PDT to the old PDT's name.

The new unique-name procedure:

  1. A build request for a PDT comes in
  2. The PDT manager builds the new version of the PDT with an incremented table name
  3. Once the new PDT has built successfully, the PDT manager points Looker to use this new version of the PDT (via changes to the active_derived_table table in the internal database).

Why?

PDTs can fail to rebuild for a variety of reasons, most of which are extremely difficult to diagnose. One known culprit of PDT rebuild failures is that long-running queries on PDTs can cause table locks that prevent the PDT manager from successfully executing the DROP statement needed in the old rebuild process.

The new rebuild process enables Looker to rebuild a PDT regardless of any database locks as Looker will no longer execute DROP and RENAME statements during the rebuild process.

FAQ

Q. if a PDT rebuild is triggered while a query is running against the now "old" version of that PDT, what happens to that query?
A. the query will complete its run on the old version of the PDT. It will not be interrupted. The next time the query is run against the database, Looker will use the new rebuilt version of the PDT.

Q. Will updating to 7.0 impact existing PDTs? Will all PDTs need to be rebuilt in order to interact with the new rebuild process?
A. Looker will continue to use existing PDTs unless they would organically be marked for rebuild by their triggers. i.e. if some PDT from the old system relies on a SELECT 1 sql_trigger, it will not rebuild upon upgrade; Looker knows to use the old version and will not force rebuilds.

Q. What happens to tables in the database once they are replaced by new ones?
A. Unreferenced, retired tables will be dropped by the reaper process, which runs every 5 minutes. Looker marks unused tables as retired when it detects that there are no queries acting on the table through Looker models.

Q. What if you want to reference a PDT outside of Looker? i.e. you have been relying on semi-stable PDT names, which is no longer possible because the table name for a PDT changes every time
A. You can use the stable views feature. Adding publish_as_db_view: yes to a PDT definition will cause a stably named database view to be created as an additional step in the PDT rebuild process. The name of this view will be in the format [registration_key][model_name][view_name].

This content is subject to limited support.                

Version history
Last update:
‎05-07-2021 09:48 AM
Updated by: