We load data into certain Snowflake tables daily. I want to update a PDT only when both tables have been loaded for the day. To update the PDT when either table has been loaded, I use
select count(*) c from TABLE1
select count(*) from TABLE2
(And other solutions exist.) But how do I update the PDT when both tables have been loaded? Assume that at least one of the tables has no column whose value can be used for this purpose (like a timestamp).
Good question! There are at least two ways that we could get a PDT trigger that only triggers after two different tables in the DB have been updated.
Alternatively, we could a setup like below if Table load times are not available to us:
To break this down, the master_table represents your main PDT (i.e. the PDT you’d like to trigger only after two subordinate tables have been updated). The idea is that if pdt_one and pdt_two both become larger than prime_one_prime and pdt_two_prime, which are something like “snapshot” tables in this case, then we force a rebuild of the master_table. Note that pdt_one and pdt_two will rebuild at the rate you specify and in this example, they’re rebuilt everyday using
SELECT current_date. The master table being rebuilt will then trigger to the prime tables being rebuilt, which then gets our loop started back up.
Hope that help provides some options regarding getting this PDT triggered accordingly!