Question

Incremental PDTs - rebuild entire table nightly

  • 31 January 2022
  • 11 replies
  • 211 views

Hi All! Does anyone have thoughts on the following use case for incremental PDTs?

I have an incremental PDT that refreshes every 30 minutes with an increment_offset of 5 days. Is it possible to set a trigger so that this PDT rebuilds in its entirety once per day, from the beginning of time and ignoring the increment offset, just like it did on the initial build?

The use case is that customers want to see very recent results during the day, but every night there should be a 'true-up' for late-arriving data - and it's OK that this nightly build takes 5x longer.
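For context, a rough LookML sketch of the setup described above, using hypothetical table, column, and datagroup names (analytics.orders, activity_date, orders_30_min_datagroup), might look like this:

    datagroup: orders_30_min_datagroup {
      # Trigger value changes every 30 minutes, so dependent PDTs are checked/rebuilt on that cadence.
      sql_trigger: SELECT FLOOR(EXTRACT(EPOCH FROM GETDATE()) / 1800) ;;
    }

    view: order_activity {
      derived_table: {
        datagroup_trigger: orders_30_min_datagroup
        increment_key: "activity_date"   # incremental on the activity date
        increment_offset: 5              # re-process the trailing 5 days on every build
        sql:
          SELECT
            activity_date,
            location_id,
            SUM(revenue) AS revenue
          FROM analytics.orders
          GROUP BY 1, 2 ;;
      }
    }

The question is how to keep the fast incremental builds during the day while still getting one full, from-scratch build every night.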




You could:

Increment the PDT on the load date rather than the activity date, so that it only appends newly loaded data (you can then cut it down to a 1-day increment load rather than 5). You still include the activity date for analysis in the PDT. This only works if you have additive measures across days (see the sketch after this reply).

 

or

 

Delete the PDT from the database, forcing Looker to rebuild it in full.
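For the first option, a sketch of what the load-date increment could look like (loaded_at is a hypothetical audit column recording when a row landed in the warehouse; as noted above, this only works if the measures are additive across days):

    view: order_activity_by_load {
      derived_table: {
        datagroup_trigger: orders_30_min_datagroup
        # Increment on when rows landed in the warehouse rather than on the activity date,
        # so a 1-day offset is enough to pick up late-arriving rows.
        increment_key: "loaded_date"
        increment_offset: 1
        sql:
          SELECT
            DATE_TRUNC('day', loaded_at) AS loaded_date,  -- hypothetical load-time audit column
            activity_date,                                -- still available for analysis
            location_id,
            SUM(revenue) AS revenue
          FROM analytics.orders
          GROUP BY 1, 2, 3 ;;
      }
    }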

@IanT Thanks for the response! I don’t think the first option would work because we have data that occasionally changes going back 10+ days - which is why we would want to do an entire rebuild every night.

The second option is interesting. The PDT in the database would need to have a stable name in order for it to be deleted. I could set up a script in Redshift to do the nightly delete. I wonder if it would have unintended effects on Looker if a PDT has simply disappeared from the database...
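A rough sketch of what that nightly Redshift script could do, assuming the PDTs are built into a looker_scratch schema and that the generated table name ends with the view name (Looker prefixes PDT names with a hash, so the full name is not stable):

    -- 1. Find the physical table(s) Looker generated for the view in the scratch schema.
    SELECT schemaname, tablename
    FROM pg_tables
    WHERE schemaname = 'looker_scratch'          -- assumption: your PDT scratch schema
      AND tablename LIKE '%order_activity';      -- assumption: generated names end with the view name

    -- 2. Drop whatever step 1 returns (template this into the nightly job).
    DROP TABLE IF EXISTS looker_scratch.lr_abc123_order_activity;  -- hypothetical generated name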


Does the data actually change, or is it in event format and just comes in “late”?

 

You should test option 2, but I believe it will rebuild the table the next time the 30-minute trigger is hit. If any queries run before then, Looker will either fall back to the original Explore (if it's an aggregate table) or rebuild the PDT and then run your analysis (so the first query is slow if it lands within that 30-minute window).

The underlying data would change. For example, a ‘location name’ could be changed by a client, and that change should be applied to all historical data. I’ll test option 2 and will update the thread with the results!


Good catch DataChico

I would also like to see a function to recreate the PDT from time to time.

The current solution is fine, but as described here, it is not good enough if you have historical changes.
We currently refresh once a weekend (a manual process where we add a "1 = 1" to the datagroup, or comment it out again).
This leads to a complete refresh...
Only a workaround, not a solution...
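For anyone following along, the toggle moebe describes lives on the datagroup's trigger SQL; a sketch with made-up names (orders_datagroup, analytics.orders, loaded_at):

    datagroup: orders_datagroup {
      # Normal form of the trigger:
      #   sql_trigger: SELECT MAX(loaded_at) FROM analytics.orders ;;
      # Weekend "true-up" form: per moebe's workaround, editing the trigger SQL
      # (adding or removing the dummy predicate) and redeploying causes the PDTs
      # that use this datagroup to rebuild in full.
      sql_trigger: SELECT MAX(loaded_at) FROM analytics.orders WHERE 1 = 1 ;;
      max_cache_age: "24 hours"
    }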

 

 

@moebe Thanks for sharing your previous post - looks like you’ve spent quite a bit of time with this problem. I wonder if there is any automated way to add the 1 = 1 into the datagroup? There may be a way to do it using the API, but again, a workaround and not a real solution. This should be added to the product roadmap for Looker...

I have an issue where a bug in my data source means I need to rebuild the whole incremental PDT, and I can't find a way to do that without changing the query, pushing to production, changing it back, and pushing to production again (which I will do, since I don't see a better option). Alternatively, I could change the offset to reach back to the day the bug happened (which in my case is a month ago, so still better than rebuilding the whole thing). We face a similar issue to the one above, and we were thinking about a different approach:

Issue: some columns in our data source can change, e.g. an is_customer_alive boolean :)

We could put an update_time on the row and use it as the increment key, but would that duplicate the rows? → I actually don't know and haven't tried it yet.

Solution considered → build the incremental PDT from the whole table except the columns that can change. Those columns would go into a new view that is incremental on update_time, holding just a foreign key and the values needed. Then do a sanity check on how Looker appends the data: if it duplicates the foreign key, add another query on top to make the FK unique (if needed). That layer might need a full, non-incremental PDT, or maybe no PDT at all, depending on the size of your data and how you use it.
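A rough LookML sketch of that second view, with made-up names (analytics.customers, customer_id, is_customer_alive, update_time):

    view: customer_status_changes {
      derived_table: {
        datagroup_trigger: orders_datagroup
        increment_key: "updated_date"
        increment_offset: 1
        sql:
          SELECT
            customer_id,                                   -- foreign key back to the main PDT
            is_customer_alive,                             -- the column that can change
            DATE_TRUNC('day', update_time) AS updated_date
          FROM analytics.customers ;;
      }
      # Appending on update_time can leave more than one row per customer_id, so the
      # dedup / "make the FK unique" step described above may still be needed on top.
    }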

We are about to make those changes and test them, so I can't confirm yet that it is a good solution, but hopefully it helps you.

Just to add one more piece of info: we could rebuild the full PDT by adding a space to the query… but I couldn't figure out how to fully rebuild a view that has incremental PDTs without doing a pull request.

Would it be possible to split the table in two?

The archive table would rebuild completely each night, with a live (derived) table pulling in only records updated in the current day. A subsequent derived_table could be written to pull only the latest record for the primary key across the two “feeder” tables.

This eliminates incremental scheduling; however, depending on the volume, structure, and database, it could slow things down.
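A sketch of that final dedup layer, assuming two hypothetical feeder PDTs (orders_archive, rebuilt nightly, and orders_today, holding only rows touched today) that share the same columns plus an updated_at timestamp:

    view: orders_current {
      derived_table: {
        datagroup_trigger: orders_datagroup
        sql:
          SELECT *
          FROM (
            SELECT
              u.*,
              ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY updated_at DESC) AS rn
            FROM (
              SELECT * FROM ${orders_archive.SQL_TABLE_NAME}   -- full nightly rebuild
              UNION ALL
              SELECT * FROM ${orders_today.SQL_TABLE_NAME}     -- only rows updated today
            ) AS u
          ) AS ranked
          WHERE rn = 1 ;;
      }
    }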

The current solution is fine, but as described here, it is not good enough if you have historical changes.

 

Thank you for the article link.

Would it be possible to split the table in two?

The archive table would rebuild completely each night, with a live (derived) table pulling in only records updated in the current day. A subsequent derived_table could be written to pull only the latest record for the primary key across the two “feeder” tables.

This eliminates incremental scheduling; however, depending on the volume, structure, and database, it could slow things down.

Yep. I would do an incremental PDT with all the fields that don't change, a full-rebuild PDT with the columns that do change, and join them at the Explore level.
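A minimal sketch of that pattern, with made-up names throughout (customer_facts for the fields that never change, customer_status for the ones that do):

    # Incremental PDT: history that never changes after it lands.
    view: customer_facts {
      derived_table: {
        datagroup_trigger: orders_datagroup
        increment_key: "created_date"
        increment_offset: 1
        sql: SELECT customer_id, created_date, first_order_value FROM analytics.customers ;;
      }
      dimension: customer_id {
        primary_key: yes
        sql: ${TABLE}.customer_id ;;
      }
    }

    # Small full-rebuild PDT: only the columns that can change, so it stays cheap to rebuild nightly.
    view: customer_status {
      derived_table: {
        datagroup_trigger: orders_datagroup
        sql: SELECT customer_id, is_customer_alive FROM analytics.customers ;;
      }
      dimension: customer_id {
        primary_key: yes
        sql: ${TABLE}.customer_id ;;
      }
      dimension: is_customer_alive {
        type: yesno
        sql: ${TABLE}.is_customer_alive ;;
      }
    }

    explore: customer_facts {
      join: customer_status {
        type: left_outer
        relationship: one_to_one
        sql_on: ${customer_facts.customer_id} = ${customer_status.customer_id} ;;
      }
    }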