BigQuery UDFs in Derived Tables (or sql_preamble in Native Derived Tables)

axjia
Participant I

I’ve developed some BigQuery UDFs (to remove non-business time between two timestamps) that we’re currently leveraging through sql_preamble in our explores, but I can’t seem to find a way to use them in a PDT – or a derived table of any sort for that matter.

Use case is we need to pre-aggregate timestamp differences with non-business time removed so that we can join them back in.

  1. Tried Native Derived Tables, but sql_preamble doesn’t appear in the query, which prevents us from using the UDFs.

  2. Tried copy-pasting the UDFs directly into a Derived Table, but since derived tables are generated as subqueries, the UDFs end up inside of the subquery…triggering a “Syntax error: Expected “(” or keyword SELECT or keyword WITH but got keyword CREATE”

Is there a better solution? Not much documentation for sql_preamble out there, so not sure if there’s a workaround I’m unaware of.

Thanks!

1 7 2,704
7 REPLIES 7

axjia
Participant I

Okay, did more experimenting:

  1. Got the 2nd solution (copy-pasting UDFs directly into a derived table) to work by adding persistence – when materializing the view, it isn’t put into a subquery. Which is nice.

  2. Also got it working with sql_create – nothing fancy, so basically the same thing as a PDT at that point.

What I still don’t have a good answer for is how to reuse BigQuery UDFs defined in sql_preamble (or any other standardized util file for that matter) across both explores and different PDTs. CTRL-C+V is an acceptable short-term solution, but not scalable in the long-term if we’re trying to stay DRY!

Hi Alex!

My first thought on reading this is to create a view called UDF_empty where you define your UDF (copy and paste into a derived table, as you mentioned). You can then create an explore based on that view with the parameter extension: required where you can define your sql_preamble.

Then for each explore where the UDF is req’d, either extend the base explore or join the view file with sql_on = 1=1.

Hopefully this is a more scalable solution.

You mentioned defining the UDF direcly in the preamble. In order to test if this is possible please email your code to help.looker.com and we can help figure this out.

Many thanks,
Bernard Kavanagh

axjia
Participant I

Hi Bernard,

Just to clarify – I’ve already followed the process outlined in this thread ([Analytic Block] Weighted Medians in BigQuery with UDFs) to define UDFs and leverage them in defining dimensions / measures!

The use case here is not leveraging UDFs in an explore – it’s leveraging UDFs in PDTs. As far as I know, we can’t extend an explore in a view, nor can we use sql_on inside of the derived_table clause of a PDT.

With this in mind, should we still be going down the sql_preamble in a util explore (which is what we have) or a util view that’s joined on 1=1?

It is ugly, but you can do this by putting all the functions in a view and using ${foo.SQL_TABLE_NAME}. Here is a working code example.

The sql_preamble has been really difficult, especially when referencing derived tables from other derived tables and making native derived tables work. Sorry this is so complex.

connection: "bigquery_publicdata_standard_sql"

view: library {
    sql_table_name:  
    CREATE TEMP FUNCTION MEDIAN(a_num ARRAY<FLOAT64>)
    RETURNS FLOAT64 AS ((
       SELECT
          AVG(num)
        FROM (
          SELECT
            row_number() OVER (ORDER BY num) -1 as rn
            , num
          FROM UNNEST(a_num) num
        )
        WHERE
          rn = TRUNC(ARRAY_LENGTH(a_num)/2)
            OR (
             MOD(ARRAY_LENGTH(a_num), 2) = 0 AND
              rn = TRUNC(ARRAY_LENGTH(a_num)/2)-1 )
    ));
    ;;
}

explore: median_test {}
view: median_test {
  derived_table: {
    persist_for: "24 hours"
    sql_create: 
      ${library.SQL_TABLE_NAME}
      
      CREATE TABLE ${SQL_TABLE_NAME} AS 
        SELECT MEDIAN([1.0,2,3,4,5,9,10]) as median

    ;;
  }
  dimension: median {type:number}
}

axjia
Participant I

Oh man, that’s a heck of a workaround!

I gave it a shot – it works great for derived tables, but doesn’t work in the sql_preamble for some reason. All in all though, maintaining the util functions in two places (util view and util explore with preamble) rather than x places is a win in my book. Thanks 🙂

Just for reference, are there any plans or thoughts to either expand sql_preamble or deprecate it?

The need for sql_preamble is really a BigQuery only feature and a pretty advanced one at that. Most other data abases just let you install UDFs directly on the server. Just curious how many folks are suffering this? sql_preamble works reasonably well for explores and native derived table (even persistent ones). SQL persistent derived tables seems to be the only spot that they fall down.

All that said, I’d love not to see the functions everytime I looked at the SQL for a query (but then again, it is really nice to be able to cut and paste into the sql_runner/bigquery console and have the complete query).

Good news! You can just create the function once and it will persist! You could probably even run the create statement as the Looker BQ user via the SQL runner and it should just work. So, no more sql_preamble necessary!

What’s happening in BigQuery: New persistent user-defined functions,...

The latest releases from BigQuery include new persistent user-defined functions (UDFs), increased concurrency limits, new GIS and encryption functions, and more.

Top Labels in this Space
Top Solution Authors