Question

Developer and Production PDTs in Looker

  • 17 October 2016
  • 2 replies
  • 128 views

Userlevel 3

Is anybody else pulling PDTs dynamically? I am running into this problem because I need to be able to reference PDT content from within database views, which in turn are re-used for both Looker and the ETL process.


Since Looker no longer differentiates between prod. and dev. copies of a PDT with separate hash combinations, it is not trivial how to pull the latest production copy dynamically (Sorry everyone@Looker - I know you don’t want to encourage that, but there is a strong use case).


So far I have the following approaches:


Pull the most stable/oldest record:


SELECT
*
FROM
TABLE_QUERY([project:looker_scratch],
'REGEXP_MATCH(table_id, r"^LR.*table_name$")
AND creation_time = (SELECT MIN(creation_time)
FROM [project:looker_scratch.__TABLES__]
where
REGEXP_MATCH(table_id, r"^LR.*table_name$")
)')


The problem with this is that someone might have built a table in dev. long time ago and never pushed it to production. Under certain circumstances, this table would be the one getting pulled.


@Peter_Krengel suggested an alternative approach: most frequently updated table, or the one that was created at a different time from when it was modified.


The problem with that approach is that a daily-triggered dev table will be written over the next day if it is triggered in dev mode (assuming no changes happen to it). So the timestamps will still be different.


Yet another idea is to use the above + some logic about the time in-between start and end times–ideally, production copies would have the largest in-between duration in most cases.


Others?


Segah Meer

Caura |​ Data Musings


2 replies

Best idea I came up when we were looking at this was to set-up a script which pulls the table names in via the API from a dummy report. We were planning on having the script run at frequent intervals and update database views to always point to the correct table.

Userlevel 3

This might be the only suitable approach for StandardSQL, since the behavior of _TABLE_SUFFIX is not consistent with TABLE_QUERY.


For example, according to Google’s documentation, you would expect something like this to work to pull all tables:


SELECT
*
FROM `project.looker_scratch.*`
WHERE
REGEXP_CONTAINS(_TABLE_SUFFIX,r"^LR.*table_name$")

But it does not. BigQuery gets confused about the schema even for just one table matching the pattern. This is despite the following working perfectly well:


SELECT
COUNT(*)
FROM `project.looker_scratch.*`
WHERE
REGEXP_CONTAINS(_TABLE_SUFFIX,r"^LR.*table_name$")

Bottom line, it appears _TABLE_SUFFIX was designed for date suffix in mind, so all the other use cases have surprising inconsistent behavior.

Reply