Question

Help for Looker generated UDF approx percentile distinct disc

  • 17 May 2017
  • 3 replies
  • 286 views

Userlevel 2

Hi, I am using Looker 4.12 and Google BigQuery standard SQL. Initially I am seeing just looker_array_sum UDF created by Looker when there is many-to-many join or 1-many join to get unique records but somehow now it is showing one new UDF (approx_percentile_distinct_disc) not sure what it does as not seeing it in final select statement of the look. I serached online but did not find any documents related to this. Can you please help me and is there a way to avoid this UDF to comming up in the SQL?


Following is the syntax cooming up in Look SQL


CREATE TEMP FUNCTION approx_percentile_distinct_disc(a_num ARRAY, fraction FLOAT64)

RETURNS FLOAT64 AS ((

SELECT

MAX(num1)

FROM (

SELECT

row_number() OVER (ORDER BY CAST(REGEXP_EXTRACT(num, ‘\|\|(\-?\d+(?:.\d+)?)$’) AS FLOAT64)) - 1 as rn

, CAST(REGEXP_EXTRACT(num, ‘\|\|(\-?\d+(?:.\d+)?)$’) AS FLOAT64) as num1

FROM UNNEST(a_num) num

)

WHERE

rn = CEIL(ARRAY_LENGTH(a_num) * fraction) - 1

));


3 replies

Userlevel 6
Badge

Hey there.


Right now, if you use symmetric aggregates we emit both symmetric functions and aren’t very smart about which ones are necessary, so no, there isn’t any way to eliminate it.


We could probably do better here and only include the function when it is being used.


We use lots of BigQuery UDFs internally (using sql_preamble: here is an article on bigquery UDFs and Looker ), so we don’t really notice the additional function very much. I can see how you might want it not to exist when not in use.

Userlevel 2

Hi Lloyd,


Any update on this? Also we are planning to move away from UDF as that is decreasing performance of the SQL so wanted to see how to deal with this symmetric aggregate without Looker generated looker_array_sum.


Also can we see the logic behind the looker_array_sum and other looker generated UDFs to see how it is working?


Thanks,

Kunal

Userlevel 6
Badge

In BigQuery all UDFs appear before the query. the looker_array_sum expression should appear on queries that use symmetric aggregates before the query, for example:



CREATE TEMPORARY FUNCTION looker_array_sum(ar ARRAY) AS ((SELECT SUM(CAST(REGEXP_EXTRACT(val, ‘\|\|(\-?\d+(?:.\d+)?)$’) AS FLOAT64)) FROM UNNEST(ar) as val));



Creates the definition for looker_array_sum


Does this make sense?

Reply