We’re excited to share that we now have three new API endpoints available for PDT management starting with Looker version 22.2. These new API endpoints provide finer-grained control over PDT management and pave the way for powerful integrations with workflow orchestration tools.
With these new endpoints, developers can use the API (or API Explorer) to:
Force a full rebuild of an incremental PDT, either on-demand or on a schedule
Start a PDT build without running a query against it or waiting for the Regenerator to kick off a build the next time the table is triggered
Build a PDT that takes longer than an hour to materialize in development mode (PDTs that take longer than an hour to build cannot be built in an Explore)
This new functionality is gated behind the “develop” permission, so a user would need to have this permission in order to use the API for PDT management.
In order to manage PDTs via the API, a toggle called Enable PDT API Control needs to be enabled in the connection settings. This is a new checkbox that will appear at the bottom of the settings page for any existing or new database connection starting in Looker version 22.2.
To provide more robust PDT scalability, we now have three new API endpoints that can be used for orchestrating PDT transformation processes outside of Looker.
The start endpoint can be used to start building a specified PDT based on the model name and view name that you input. It will also generate a materialization_id for the PDT build job.
GET /api/3.x/derived_table/:model_name/:view_name/start
OR
GET /api/4.0/derived_table/:model_name/:view_name/start
We know many customers have been looking for a way to rebuild incremental PDTs in full. The force_full_incremental parameter can be used for this by simply setting this parameter to true. When this parameter is set to false, which is the default behavior, an incremental PDT will rebuild incrementally based on the definition set in the LookML.
The stop endpoint can be used to cancel the materialization of a PDT. This requires input of a materialization_id to ensure cancellation of the appropriate build job (which can be retrieved when starting a PDT build via the API).
GET /api/3.x/derived_table/:materialization_id/stop
OR
GET /api/4.0/derived_table/:materialization_id/stop
The check status endpoint can be used to check materialization status for a given PDT.
GET /api/3.x/derived_table/:materialization_id/status
OR
GET /api/4.0/derived_table/:materialization_id/status
Because a materialization ID is required as an input for the stop and check status API calls, only PDTs that have been started via API can be monitored or canceled via API at this time.
The behavior of PDT builds that are triggered using the API will be consistent with builds triggered manually using the “Rebuild Derived Tables & Run” option in the Explore dropdown menu.
When a PDT is rebuilt via the API, the last triggered time will be updated to the API build time. This means that:
The clock for PDTs using persist_for or interval_trigger
will start (or restart) based on the API rebuild time
PDTs with a sql_trigger_value
or datagroup_trigger
will consider the API materialization to be a forced rebuild and will compute a new trigger value at the start of the build.
When you use the “Rebuild Derived Tables & Run” functionality to manually rebuild a PDT with dependencies, all upstream PDTs that it depends on will be rebuilt automatically. Rebuilding using “Rebuild Derived Tables & Run” will not force rebuilds of downstream PDTs.
This rebuild behavior is slightly different when forcing a rebuild via the new API functionality. A PDT directly requested via the API will be rebuilt, but any upstream PDTs that it depends on will not be rebuilt automatically. In order to rebuild a PDT and its dependencies, the force_rebuild
parameter must be set to true when the build request is made. Note that neither using "Rebuild Derived Tables & Run" from within Looker nor rebuilding via the API with the force_rebuild
parameter will force rebuilds of downstream PDTs.
In order to complete a full rebuild of an incremental PDT and force rebuilds of any PDT dependencies, you would need to set the following parameters:
force_full_incremental=true
force_rebuild=true
In this situation, the target PDT will be rebuilt in full and any upstream PDTs will be rebuilt according to their LookML definitions (i.e. upstream incremental PDTs will be triggered to rebuild a new increment but would not fully rebuild).
For derived tables that will be managed primarily outside of Looker, implementing a strategy that persists the table without triggering the regenerator from periodically rebuilding it is recommended. There are a few ways in which this could be accomplished:
Using single-firing SQL trigger statements, such as SELECT 1
Setting a long persistence time period using persist_for
or interval_trigger
Using a SQL trigger like SELECT 1 will ensure that the internal regenerator never kicks in and builds the table, but it will still ping the database regularly. Using persist_for
or interval_trigger
would avoid regular database pings, but if the set timeframe is reached, then the table would be dropped or rebuilt.
The Looker team is currently working on adding a new PDT parameter to indicate if a PDT is externally managed. Once available, adding this new parameter to a derived table will enable you to persist that table without setting a persistence strategy inside of Looker.
An easy way to get started with these new endpoints is by using the API Explorer. Check out the parameters available and even try using the API to trigger a rebuild! You’ll be able to see build status in-product in the Queries section of the Admin Panel. You will also see information for each table in the Persistent Derived Tables section of the Admin Panel.
And these new endpoints are just the beginning! We have new integrations with Apache Airflow and Cloud Composer coming your way soon to provide even more scalable mechanisms for monitoring and managing PDTs. Stay tuned!
Please note: these API endpoints will work with persisted derived tables defined using derived_table LookML syntax, but they will not work with aggregate tables defined using the aggregate_table LookML syntax.