The PDT Reaper

Knowledge Drop

Last tested: Dec 12, 2019

The reaper is the Looker process responsible for dropping inactive persistent derived tables (PDTs) from the scratch schema.

Schedule:


Unlike the regenerator, the reaper has only one thread in total for all connections (in a clustered deployment, this thread runs on the master node). The reaper runs at most once an hour, according to the cron expression specified in the PDT Maintenance Schedule of the connection settings.
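One possible reading of "at most every hour" is a simple rate limit: each time the maintenance cron fires, the reaper runs only if at least an hour has passed since its last run. The sketch below illustrates that reading only. The function name, the one-hour constant, and the five-minute cron in the example are assumptions made for illustration, not Looker internals.

```python
from datetime import datetime, timedelta

# Assumed from "at most every hour"; not a documented Looker setting.
MIN_REAP_INTERVAL = timedelta(hours=1)


def reaper_may_run(now, last_run):
    """Return True if a maintenance tick at `now` may start a reaper pass."""
    return last_run is None or now - last_run >= MIN_REAP_INTERVAL


# Hypothetical maintenance cron that fires every 5 minutes for two hours.
start = datetime(2019, 12, 12, 0, 0)
ticks = [start + timedelta(minutes=5 * i) for i in range(25)]

last_run = None
reap_times = []
for tick in ticks:
    if reaper_may_run(tick, last_run):
        reap_times.append(tick)
        last_run = tick
```

Under this reading, a cron that fires more often than hourly still yields only one reaper pass per hour.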

The process:


The reaper uses the active_derived_tables table in the internal database to determine which tables it should drop from the scratch schema. Before the reaper drops any tables, it will update the active_derived_tables table.

Updating the active_derived_tables table:

First, the reaper builds a list of all derived tables used in production-built LookML and deletes rows from the active_derived_tables table for any tables not in that list.
Next, the reaper gathers a list of all tables that currently exist on the scratch schema and deletes rows from the active_derived_tables table for any tables not in that list.
Next, the reaper removes rows from the active_derived_tables table for any persist_for PDTs that have expired.
After this, the active_derived_tables table accurately reflects which active tables exist on the scratch schema.

In its updated state, the active_derived_tables table contains a list of all active tables that exist on the scratch schema. Note that this table does not include tables that should exist on the scratch schema but for some reason do not (e.g. a table that hasn't been built yet, or one that was dropped by a user or by error).
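The three pruning passes can be pictured as set operations. The following is a minimal sketch under stated assumptions, not Looker's implementation: the function name, the parameter names, and the pdt_* table names are hypothetical, and each argument is assumed to be a plain set of table names.

```python
def refresh_active_tables(active_rows, lookml_tables, scratch_tables,
                          expired_persist_for):
    """Return the active_derived_tables rows that survive all three passes."""
    # Pass 1: keep only tables still used in production-built LookML.
    rows = {t for t in active_rows if t in lookml_tables}
    # Pass 2: keep only tables that currently exist on the scratch schema.
    rows = {t for t in rows if t in scratch_tables}
    # Pass 3: remove persist_for PDTs that have expired.
    rows = {t for t in rows if t not in expired_persist_for}
    return rows


# Hypothetical state before the reaper runs.
active_rows = {"pdt_a", "pdt_b", "pdt_c", "pdt_d"}
lookml_tables = {"pdt_a", "pdt_b", "pdt_c", "pdt_e"}  # pdt_d left LookML; pdt_e not built yet
scratch_tables = {"pdt_a", "pdt_b", "pdt_d"}          # pdt_c is missing from the schema
expired_persist_for = {"pdt_b"}

remaining = refresh_active_tables(
    active_rows, lookml_tables, scratch_tables, expired_persist_for
)
```

In this example only pdt_a remains. pdt_e never appears in the result because it has not been built, which matches the note that tables missing from the scratch schema are not listed.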

Now that the active_derived_tables table is up to date, the reaper is ready to begin dropping tables.

Checking reg_key and dropping tables:

For every table on the scratch schema, the reaper first checks whether the reg_key is valid. The reg_key is the two letters that follow LX$ in the table name, where X is C or R. The connection_reg_r3 table on the scratch schema contains a list of all valid reg_keys.
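As an illustration of the naming rule above, the following sketch pulls the reg_key out of a table name with a regular expression. It assumes the name starts with L, then C or R, then $, and that the next two characters are the reg_key. The function name and the sample table names are hypothetical.

```python
import re

# Assumed layout: "L", then "C" or "R", then "$", then a two-character reg_key.
PDT_PREFIX = re.compile(r"^L([CR])\$(.{2})")


def extract_reg_key(table_name):
    """Return the two-character reg_key, or None if the name does not match."""
    match = PDT_PREFIX.match(table_name)
    return match.group(2) if match else None


# Hypothetical table names, for illustration only.
key_r = extract_reg_key("LR$ABexample_table")
key_c = extract_reg_key("LC$XYexample_table")
key_none = extract_reg_key("orders")
```

Here key_r is "AB", key_c is "XY", and key_none is None because "orders" does not follow the PDT naming pattern.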

If the reg_key is valid:

  • If there is a row for this table in the active_derived_tables table, the table will not be dropped.
  • If there is not a row for this table in the active_derived_tables table, the table will be dropped.

If the reg_key is not valid, the reaper checks the connection_reg_r3 table to see if this reg_key exists in that table.

  • If there is a row in the connection_reg_r3 table corresponding to this table's reg_key, the table will not be dropped.
  • If there is not a row in the connection_reg_r3 table corresponding to this table's reg_key, the table will be dropped.
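The two branches above reduce to a small decision function. This is a sketch of the rules as written here, not Looker's code: the function and parameter names are hypothetical, and the three lookups (reg_key validity, the active_derived_tables row, the connection_reg_r3 row) are assumed to have been resolved to booleans by the caller.

```python
def should_drop(reg_key_is_valid, has_active_row, reg_key_in_connection_reg):
    """Return True if the reaper would drop the table under the rules above."""
    if reg_key_is_valid:
        # Valid reg_key: keep the table only if active_derived_tables lists it.
        return not has_active_row
    # Invalid reg_key: keep the table only if connection_reg_r3 has its reg_key.
    return not reg_key_in_connection_reg


# Valid reg_key with no active row: the table is dropped.
drop_inactive = should_drop(True, has_active_row=False,
                            reg_key_in_connection_reg=True)
# Valid reg_key with an active row: the table is kept.
drop_active = should_drop(True, has_active_row=True,
                          reg_key_in_connection_reg=True)
```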

Enemy Reaping


The reaper typically reaps tables only for its own Looker instance; it identifies the instance a PDT belongs to from the instance hash in the PDT table name.
Under very specific circumstances, two instances can have the same instance hash and the reaper for one instance will reap PDTs that are active on another instance. This is known as enemy reaping and can cause problems with the reaping process.

This content is subject to limited support.

Comments
zdenek_hanzal-1
Bronze 1

Hello,

Can we have more details on this part?
“The reaper operates at most every hour”
Suppose the cron expression is set to run every hour and the SQL trigger is aligned to the same time (its value changes when the hour changes).
When would the reaper remove the expired PDTs from the database: a few minutes after the full hour, or up to one hour after the regenerator has run and the tables have become expired?

Thank you

Version history
Last update: 05-07-2021 09:10 AM