PDTs in Looker - Useless outside of Looker?!


Userlevel 1

PDTs are great for Looker users, but our heavy data users/early company employees/founders are married to their own SQL queries where they can do all kinds of crazy ad-hoc reporting.


Since PDTs are stored with a hashed names in our database, those users can’t reference Looker’s PDTs. The same people are worried that building out PDTs in Looker instead of our database makes us more dependent on Looker. Therefore, the executive team has mandated that we either



  1. do not hash the PDT names anymore or

  2. move all PDTs out of Looker and into our database.


Maintaining PDTs in Looker is easier for me. Can anyone help me figure out how I can go with solution 1?


Thanks,

Lucas


35 replies

Userlevel 2

I think this has been shared elsewhere, but for folks looking in this thread, here’s a workaround we use right now:


view: your_pdt_name {
derived_table: {
create_process: {
sql_step: DROP TABLE if exists looker_scratch.your_pdt_name CASCADE ;;
sql_step:
CREATE TABLE looker_scratch.your_pdt_name AS
SELECT
... ;;
sql_step: CREATE VIEW ${SQL_TABLE_NAME} AS
SELECT * FROM looker_scratch.your_pdt_name ;;
}

sql_trigger_value: SELECT date_trunc('week', now() at time zone 'America/New_York') ;;
}
}
Userlevel 7
Badge +1

The upshot of the work is that PDT table names will be stable and readable, instead of including a long hash that changes often. Thus, they’ll be able to be referenced reliably outside of Looker.

Userlevel 7
Badge +1

Ohh that’s amazing. I’ve been using Looker’s PDTs in standard SQL every now and then. That will make my life much easier!

Hi all!


Great news! Looker will be officially supporting this use case in 7.0 (January), via a new LookML parameter called publish_as_db_view: yes. Enabling this on a PDT will cause Looker to create a view table (in addition to the PDT) with a consistent name in your database’s scratch schema, so that you can reliably query it via SQL from outside of Looker.


The view table will be automatically defined as SELECT * FROM pdtname, where “pdtname” is the table name of the actual, materialized PDT. As a user you won’t have to worry about managing this; Looker will automatically update the FROM clause whenever the underlying PDT table name changes.


I should also point out that, with this new feature, we’ve also upgraded the way that PDTs are built so that they no longer need DROP and RENAME. In this upgrade, any direct, hard-coded references to PDTs will break. We strongly encourage you to start using the new view tables instead!


Feel free to DM me with any questions. Full documentation will be published in January when the feature releases.


Regards,

Kevin Marr, Product Manager @ Looker

It’s great that stable PDT names is now released, but unfortunately the table names are pretty unwieldy. The data team looks pretty foolish when have to tell people that the source of truth for flights data can be found in “tmp.LB_faa_e_flights_pdt”!


If you’re looking for a more robust solution for creating and managing datasets that can then be shared with the rest of the organisation, check out Dataform.

Userlevel 7
Badge +1

Dan, would be cool to see you share some of the ways you all integrate Dataform and Looker! Do you use PDT’s at all, or entirely dataform-driven? Something for another thread, maybe 🙂 .

We typically use Dataform to manage all of the batch data processing, and then point Looker at the outputs of that batch processing pipeline. Because Looker and Dataform both read/write from the warehouse, integrating the two comes “for free”.


I wrote this blog post to explain the process of moving from Looker PDTs to the Dataform framework, happy to answer any questions if you have any: dan@dataform.co.



Hi,

Good question, I had exactly the same question in my project.

From my point of view, PDT are great for Looker usages. If you need a PDT for other usage than Looker, the PDT must be transformed in a ‘’real’’ view directly in the table. Looker is great, but for me you can not create PDT / views from Looker for other applications / usages.

Best regards,

Userlevel 2

Update:

The publish_as_db_view parameter allows you to flag a PDT for querying outside of Looker. For PDTs with publish_as_db_view set to yes, Looker creates a stable database view on the database for the PDT. The stable database view is created on the database itself, so that it can be queried outside of Looker. Check out our docs for more info:
https://docs.looker.com/reference/view-params/publish_as_db_view?version=21.8

someone brought this up a few years ago, but i havent seen any updates on this topic so pasting below:

Is it possible to reorder this PDT naming convention?

This naming convention is inconvenient for querying outside of the Looker environment. For example, in BigQuery it is possible to query tables using a wildcard, however, the wildcard can only be used at the end of the table name:

Works: SELECT * FROM <project_id>.<dataset>.<pdt_name>_LR_*

Doesn’t Work: SELECT * FROM <project_id>.<dataset>.LR_*_<pdt_name>

publishing the pdt as a db view does not work in the case where the pdt is generated using “persist for” in the derived table lookml

Reply