When to use persistence (PDT)

Knowledge Drop

Last tested: Feb 21, 2020
 

At a high level, persisting a derived table is useful for reducing database load and increasing query speed - querying a table that exists in the database schema can be much faster than composing a table in a CTE or temporary table and then using it in a query.

For example, suppose we want to report on customers by cohort based on when they made their first order. This query might be expensive in real time, but by using a PDT the query can be calculated once and reused.

Additionally, the PDT functionality makes it very simple to test different indexing, distributions, and other options without needing a large amount of support from your DBA or ETL developers.

 

When NOT to use persistence:

There are several situations where a PDT might not be the best choice.

  • When a derived table is being extended - an extended PDT will create a new copy in the scratch schema for each extension
  • When templated filters or Liquid parameters are being used - there are potentially infinite numbers of possible user inputs with these features, so the number of PDTs built in the database could become unmanageable.
  • When the underlying data is changing frequently and a “snapshot in time” is not sufficient. If the data is getting fed from a transactional system on a nightly basis, then persisting might make sense, but if data is being fed continuously, a PDT will quickly become out of date.
  • If the cost and time involved in building PDTs often enough to be up-to-date is too high. At some rebuild frequency, the database could end up spending more time rebuilding the PDTs rather than serving useful queries. In these cases, customers should instead consider having the ETL processes create this table using an incremental update strategy.

Source: Docs

This content is subject to limited support.                

Version history
Last update:
‎05-07-2021 09:49 AM
Updated by: