Postgres (P)DT Performance Tip

simonuid
Participant I

I’m currently working with a client where Postgres is used as the data warehouse DB. One of Postgres’ features is that CTEs in queries, i.e. WITH subquery AS ( SELECT ... ) act as ‘optimisation fences’, meaning that, if you reference subquery later in your SQL and include WHERE predicates, e.g.

SELECT * FROM subquery WHERE date > '2019-01-01' AND date < '2020-04-01'  

…the date predicates will not be pushed down into the CTE - Postgres will run the complete CTE regardless of dates, and only apply the outer query to these results afterwards…

Unfortunately, since Looker uses CTEs for §DTs, Postgres’ query performance suffers a lot from this, since your PDT SQL with always be in a CTE and any filters you apply through Looker will not be pushed down.

My workaround is to use the sql_table_name: parameter. Whatever SQL you put in here will be inserted into queries where the table reference would normally go, i.e. an inline subquery, where Postgres will do predicate push-downs.

You need to wrap your DT SQL in brackets so that the query will look like

SELECT * FROM ( SELECT... ) AS subquery

and be syntactically correct.

This works fine for ephemeral (non-persisted) DTs which is all we’re using with this client. It should also work for PDTs, albeit with some extra code, by using a second view for the persistence, referencing the first with ${viewname.SQL_TABLE_NAME} but I haven’t verified it. A similar idea is noted here in the discussion about avoiding CTEs in (P)DTs as they cause nested CTEs and can be brittle: https://docs.looker.com/data-modeling/learning-lookml/derived-tables#monitoring_and_troubleshooting_...

3 0 125
0 REPLIES 0
Top Labels in this Space
Top Solution Authors