Question

Concise SQL

  • 29 June 2017
  • 1 reply
  • 70 views

A brief perception / feature request:


Building explicit SQL queries is one of the awesome features of Looker - we can understand the mechanics of the query, and use the generated query elsewhere. No magic / black box.


But some of the BigQuery SQL generated is becoming gnarly + difficult to parse as a human; is this really necessary:


CREATE TEMPORARY FUNCTION looker_array_sum(ar ARRAY<STRING>) AS
((SELECT SUM(CAST(REGEXP_EXTRACT(val, '\\|\\|(\\-?\\d+(?:.\\d+)?)$') AS FLOAT64)) FROM UNNEST(ar) as val));
CREATE TEMP FUNCTION approx_percentile_distinct_disc(a_num ARRAY<STRING>, fraction FLOAT64)
RETURNS FLOAT64 AS ((
SELECT
MAX(num1)
FROM (
SELECT
row_number() OVER (ORDER BY CAST(REGEXP_EXTRACT(num, '\\|\\|(\\-?\\d+(?:.\\d+)?)$') AS FLOAT64)) - 1 as rn
, CAST(REGEXP_EXTRACT(num, '\\|\\|(\\-?\\d+(?:.\\d+)?)$') AS FLOAT64) as num1, COUNT(*) OVER () AS total
FROM UNNEST(a_num) num
WHERE num IS NOT NULL
)
WHERE
rn = CEIL(total * fraction) - 1
));
-- [a reasonable 8 line query]

WHERE ((((cast(Equity__ts.date as timestamp) ) >= ((TIMESTAMP(CONCAT(CAST(DATE_ADD(CAST(TIMESTAMP_TRUNC(CAST(TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), DAY) AS TIMESTAMP), YEAR) AS DATE), INTERVAL -4 YEAR) AS STRING), ' ', CAST(TIME(CAST(TIMESTAMP_TRUNC(CAST(TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), DAY) AS TIMESTAMP), YEAR) AS TIMESTAMP)) AS STRING))))) AND (cast(Equity__ts.date as timestamp) ) < ((TIMESTAMP(CONCAT(CAST(DATE_ADD(CAST(TIMESTAMP(CONCAT(CAST(DATE_ADD(CAST(TIMESTAMP_TRUNC(CAST(TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), DAY) AS TIMESTAMP), YEAR) AS DATE), INTERVAL -4 YEAR) AS STRING), ' ', CAST(TIME(CAST(TIMESTAMP_TRUNC(CAST(TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), DAY) AS TIMESTAMP), YEAR) AS TIMESTAMP)) AS STRING))) AS DATE), INTERVAL 5 YEAR) AS STRING), ' ', CAST(TIME(CAST(TIMESTAMP(CONCAT(CAST(DATE_ADD(CAST(TIMESTAMP_TRUNC(CAST(TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), DAY) AS TIMESTAMP), YEAR) AS DATE), INTERVAL -4 YEAR) AS STRING), ' ', CAST(TIME(CAST(TIMESTAMP_TRUNC(CAST(TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), DAY) AS TIMESTAMP), YEAR) AS TIMESTAMP)) AS STRING))) AS TIMESTAMP)) AS STRING)))))))) AND ((((Equity__latest.date ) >= ((TIMESTAMP(CONCAT(CAST(DATE_ADD(CAST(TIMESTAMP_TRUNC(CAST(TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), DAY) AS TIMESTAMP), YEAR) AS DATE), INTERVAL -4 YEAR) AS STRING), ' ', CAST(TIME(CAST(TIMESTAMP_TRUNC(CAST(TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), DAY) AS TIMESTAMP), YEAR) AS TIMESTAMP)) AS STRING))))) AND (Equity__latest.date ) < ((TIMESTAMP(CONCAT(CAST(DATE_ADD(CAST(TIMESTAMP(CONCAT(CAST(DATE_ADD(CAST(TIMESTAMP_TRUNC(CAST(TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), DAY) AS TIMESTAMP), YEAR) AS DATE), INTERVAL -4 YEAR) AS STRING), ' ', CAST(TIME(CAST(TIMESTAMP_TRUNC(CAST(TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), DAY) AS TIMESTAMP), YEAR) AS TIMESTAMP)) AS STRING))) AS DATE), INTERVAL 5 YEAR) AS STRING), ' ', CAST(TIME(CAST(TIMESTAMP(CONCAT(CAST(DATE_ADD(CAST(TIMESTAMP_TRUNC(CAST(TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), DAY) AS TIMESTAMP), YEAR) AS DATE), INTERVAL -4 YEAR) AS STRING), ' ', CAST(TIME(CAST(TIMESTAMP_TRUNC(CAST(TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), DAY) AS TIMESTAMP), YEAR) AS TIMESTAMP)) AS STRING))) AS TIMESTAMP)) AS STRING))))))))
GROUP BY 1
ORDER BY 1
LIMIT 5000

One of the functions at the top isn’t used, I don’t know why the other is needed to sum some floats. And the WHERE clause feels monstrous to ask whether a date was in the past five years. When I’m getting the same data by writing a SQL query by hand, it’s ~80% smaller.


Of course, if there’s something that we’re doing that could be causing the generator to spew, let us know and we can adjust


1 reply

Userlevel 6
Badge

Yes, I agree, we should do better here. Filing some bugs.

Reply