Is anybody else pulling PDTs dynamically? I am running into this problem because I need to be able to reference PDT content from within database views, which in turn are re-used for both Looker and the ETL process.
Since Looker no longer differentiates between prod and dev copies of a PDT with separate hash combinations, it is not trivial to pull the latest production copy dynamically (sorry, everyone at Looker: I know you don't want to encourage this, but there is a strong use case).
So far I have the following approaches:
Pull the most stable/oldest record:
AND creation_time = (SELECT MIN(creation_time)
The problem with this is that someone might have built a table in dev a long time ago and never pushed it to production. Under certain circumstances, that stale dev table would be the one getting pulled.
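For reference, a fuller version of that query against BigQuery's __TABLES__ metadata might look like the following. Note the project, dataset, and LR_ table-name pattern are placeholders for your own scratch schema and PDT naming scheme, not something Looker guarantees:

```sql
-- Hypothetical names: my-project.looker_scratch and the LR_ prefix
-- stand in for your actual scratch dataset and PDT hash pattern.
SELECT table_id
FROM `my-project.looker_scratch.__TABLES__`
WHERE table_id LIKE 'LR_%my_pdt_name'
  AND creation_time = (SELECT MIN(creation_time)
                       FROM `my-project.looker_scratch.__TABLES__`
                       WHERE table_id LIKE 'LR_%my_pdt_name')
```

(Swapping MIN for MAX gives the "newest copy" variant.)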
Pulling the newest record instead (MAX(creation_time)) has the opposite problem: a daily-triggered dev table will be written over again the next day if it is triggered in dev mode (assuming no changes happen to it), so its timestamp can still beat the production copy's.
Yet another idea is to combine the above with some logic about the time between build start and end times; ideally, production copies would have the longest build duration in most cases.
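A sketch of what that could look like, assuming you have PDT build start/end times logged somewhere queryable (the table and column names below are entirely hypothetical; BigQuery's __TABLES__ metadata does not expose build durations, so you would have to capture them yourself):

```sql
-- Sketch only: pdt_build_log, build_start and build_end are
-- made-up names for a build log you would populate yourself.
SELECT table_id
FROM `my-project.etl_meta.pdt_build_log`
WHERE table_id LIKE 'LR_%my_pdt_name'
ORDER BY TIMESTAMP_DIFF(build_end, build_start, SECOND) DESC
LIMIT 1
```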
Caura | Data Musings
The best idea I came up with when we were looking at this was to set up a script that pulls the table names in via the API from a dummy report. We were planning to have the script run at frequent intervals and update database views to always point to the correct table.
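In case it helps anyone, the "update database views" step of that script can be as simple as issuing a statement like the one below once the current table name has been resolved via the API (all names here are illustrative, and LR_abc123_my_pdt stands in for whatever table the API call returned):

```sql
-- Repoint the stable view at the PDT table the API resolved.
CREATE OR REPLACE VIEW `my-project.analytics.my_pdt` AS
SELECT * FROM `my-project.looker_scratch.LR_abc123_my_pdt`;
```

Both Looker and the ETL process can then reference the view, which never changes name.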
This might be the only suitable approach for Standard SQL, since the behavior of _TABLE_SUFFIX is not consistent with what Google's documentation suggests.
For example, according to Google’s documentation, you would expect something like this to work to pull all tables:
But it does not. BigQuery gets confused about the schema even for just one table matching the pattern. This is despite the following working perfectly well:
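To illustrate the kind of thing I mean (reconstructed from memory, not the exact queries, and with illustrative names throughout):

```sql
-- What the wildcard-table docs suggest should work for an
-- arbitrary prefix: match the rest of the name via _TABLE_SUFFIX.
SELECT * FROM `my-project.looker_scratch.LR_*`
WHERE _TABLE_SUFFIX LIKE '%my_pdt_name';

-- ...whereas spelling out most of the name in the wildcard prefix
-- and matching only the tail does work:
SELECT * FROM `my-project.looker_scratch.LR_abc123_my_pdt_*`
WHERE _TABLE_SUFFIX = 'name';
```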
Bottom line: it appears _TABLE_SUFFIX was designed with date suffixes in mind, so all the other use cases have surprisingly inconsistent behavior.