Rebuilding only a portion of a PDT

Hi all,

Is it possible to rebuild only a portion of a PDT based on a key? What I need differs from an incremental rebuild in that I don't just need to append data; I need to overwrite an entire section.

I have college enrollment data where each row is an application. The application gets added as soon as it's created, but it keeps getting updated in the data over a period of time (different for each semester). My question: is there any way to rebuild a PDT and update the scratch schema ONLY for the terms that have an end date after the current date (current and subsequent semesters)? We have years of historical data that no longer needs to be rebuilt, but with our existing architecture it gets overwritten daily, which is extremely computationally expensive.

As a side note, we have multiple cascading PDTs, and the intended program start term is one consistent variable throughout all the tables, so it is ideal to rebuild all the PDTs based on that term variable (or any associated dates - such as term end date), not on when the application was started or modified. 

Any help or resources would be appreciated!


Looker's "incremental" PDT feature is actually implemented as a MERGE or a DELETE+INSERT query, depending on the dialect. So in principle I believe you should be able to use the feature for this purpose, as long as there is a timestamp column associated with those keys that Looker can use to determine where the cutoff is. (In your case, the updated date seems to fit this logic.)

https://cloud.google.com/looker/docs/incremental-pdts
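To make the DELETE+INSERT idea concrete, here is a minimal sketch in plain SQL run through Python's `sqlite3`. It is not the SQL Looker generates. The table names (`applications`, `applications_pdt`) and columns are hypothetical, and it assumes every row carries a stable `term_end_date` stored as an ISO date string, so the cutoff can be expressed on the term rather than on the updated date.

```python
import sqlite3


def refresh_open_terms(conn: sqlite3.Connection, today: str) -> None:
    """Overwrite only rows whose term ends on or after `today` (ISO date string).

    Hypothetical tables: `applications` is the source, `applications_pdt`
    is the persisted copy. Rows for terms that already ended are not touched.
    """
    with conn:  # single transaction: the delete and insert commit together
        conn.execute(
            "DELETE FROM applications_pdt WHERE term_end_date >= ?", (today,)
        )
        conn.execute(
            "INSERT INTO applications_pdt (app_id, term, term_end_date, status) "
            "SELECT app_id, term, term_end_date, status FROM applications "
            "WHERE term_end_date >= ?",
            (today,),
        )


def demo() -> list:
    conn = sqlite3.connect(":memory:")
    cols = "(app_id INTEGER, term TEXT, term_end_date TEXT, status TEXT)"
    conn.execute(f"CREATE TABLE applications {cols}")
    conn.execute(f"CREATE TABLE applications_pdt {cols}")
    # Persisted copy left over from an earlier build.
    conn.executemany(
        "INSERT INTO applications_pdt VALUES (?, ?, ?, ?)",
        [(1, "2023FA", "2023-12-15", "enrolled"),
         (2, "2025SP", "2025-05-15", "started")],
    )
    # Current source: app 2 was updated, app 3 is new, app 1 is historical.
    conn.executemany(
        "INSERT INTO applications VALUES (?, ?, ?, ?)",
        [(1, "2023FA", "2023-12-15", "enrolled"),
         (2, "2025SP", "2025-05-15", "submitted"),
         (3, "2025SP", "2025-05-15", "started")],
    )
    refresh_open_terms(conn, today="2025-01-10")
    return conn.execute(
        "SELECT app_id, term, status FROM applications_pdt ORDER BY app_id"
    ).fetchall()
```

Calling `demo()` rewrites only the 2025SP rows; the 2023FA row is neither deleted nor re-inserted. Whether Looker's incremental settings can express the same term-based cutoff is something to check against the documentation linked above.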

@fabio1 thank you for your reply, it makes sense. My only concern is that if I base the rebuild on the updated date, I would not be able to rebuild the records for the entire term, only a set range of dates from the updated date. Because of the cascading PDTs that are aggregated by term, I need all of these PDTs rebuilt for a specific term. So I can potentially set the base PDT to be updated based on the new data ingestion date, but the PDT that aggregates the data needs to reaggregate the records for the entire term. I am struggling to understand how to make it do that only for current and future terms. Since the records for previous terms do not get updated, there is no need to rebuild that portion of the table (which is most of our data). I am not sure how to associate a date with each term's records that changes every day, to use as the increment key and indicate which terms need to be updated. I am not even sure if it's possible, but I feel like I am just missing something crucial and there should be a way to make it work.
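For reference, the behavior being asked for in the aggregating PDT can be sketched outside Looker as plain SQL run through Python's `sqlite3`. This is only an illustration of the desired result, not a Looker feature or configuration. The tables `applications_pdt` and `term_summary` and their columns are hypothetical, and `term_end_date` is assumed to be an ISO date string present on every row.

```python
import sqlite3


def reaggregate_open_terms(conn: sqlite3.Connection, today: str) -> None:
    """Recompute per-term counts only for terms ending on or after `today`.

    Summary rows for terms that already ended are left exactly as they are.
    """
    with conn:  # single transaction: the delete and insert commit together
        conn.execute(
            "DELETE FROM term_summary WHERE term_end_date >= ?", (today,)
        )
        conn.execute(
            "INSERT INTO term_summary (term, term_end_date, n_applications) "
            "SELECT term, term_end_date, COUNT(*) FROM applications_pdt "
            "WHERE term_end_date >= ? GROUP BY term, term_end_date",
            (today,),
        )


def demo_summary() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE applications_pdt "
        "(app_id INTEGER, term TEXT, term_end_date TEXT)"
    )
    conn.execute(
        "CREATE TABLE term_summary "
        "(term TEXT, term_end_date TEXT, n_applications INTEGER)"
    )
    conn.executemany(
        "INSERT INTO applications_pdt VALUES (?, ?, ?)",
        [(1, "2023FA", "2023-12-15"),
         (2, "2025SP", "2025-05-15"),
         (3, "2025SP", "2025-05-15"),
         (4, "2025FA", "2025-12-15")],
    )
    # Summary from an earlier build: the 2025SP count is stale, 2025FA is missing.
    conn.executemany(
        "INSERT INTO term_summary VALUES (?, ?, ?)",
        [("2023FA", "2023-12-15", 1),
         ("2025SP", "2025-05-15", 1)],
    )
    reaggregate_open_terms(conn, today="2025-01-10")
    return conn.execute(
        "SELECT term, n_applications FROM term_summary ORDER BY term_end_date"
    ).fetchall()
```

The cutoff here is the term's end date compared with today's date, so no per-row date has to change every day; only the comparison value does.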
