Variable to use in a Query

Hi,

I would like to create or define a environment variable/s to add in a query , this variables should be get from a table. 

I mean:

variable last_day_dt_Madrid= (select max(day_dt) from table_raw where city=’Madrid’ )

variable last_day_dt_Barcelona=(select max(day_dt) from table_raw where city=’Barcelona’ )

select city,  count(*)

from table_raw

where  ( city = ‘Madrid’ and date_dt = ${last_day_dt_Madrid} ) 

or         ( city = ‘Barcelona’ and date_dt = ${last_day_dt_Barcelona} ) 

group by 1

table_raw is a very huge table partitioned by date_dt in BQ.

I am tested to include the max date as a join of PDT table with the max date_dt values by city but the performance it’s not good because in this way it don’t use the partition field (date_dt).

Anyone can help me?

0 9 1,280
9 REPLIES 9

I think you would have to use derived_table to return MAX for each city and then use it in your JOINs

or alternatively you can use CTEs to first get the MAXs and add it to each raw in your output table

Thanks Dawid.

I have tested use a derived_table to return MAX for each city and use it JOINs, but the performance it’s not good.

The output table is a very big huge table it’s not possible to add a new column.

Is there any option to use Looker variables to catch values from a table?

How did you test it? You still need to return the MAX somewhere in order to use it

I created a PDTs with the MAX load_date for each city. Then I use this PDT as a JOIN in the huge table. , in this way the performance is not good (full scan against the fact table).

The best solution should be first to get the MAX values in variables (not store in a table), and then compound dynamically the SQL with these variables, in this way the performance will be good because will use the partition field.

But in order to store it in a variable you still need to retrieve it. Looker doesn’t have a concept of variables. The derived table and a dimension basically acts like a variable through the JOIN. The performance could take a while upon first query, then you could set caching that would be much much quicker but that depends on your data and how often the MAX would change

MAX values should change frequently, each week.

Using PDTs is as this:

select city,  count(*)

from table_raw -- *** partition by date_dt

left  outer join pdt

on table_raw.city = pdt.city and table_raw.date_dt = pdt.max_date_dt

group by 1

The performance is not good:thumbsdown_tone3: , make a full scan of the fact huge table, it’s spend a lot of time.

select city,  count(*)

from table_raw

where  ( city = ‘Madrid’ and date_dt = ‘2020-03-01’ ) 

or         ( city = ‘Barcelona’ and date_dt = ‘2020-03-03’ ) 

group by 1

In this way the performace will be good:ok_hand_tone3: because use the partition (date_dt)

The values 2020-03-01 and 2020-03-03 could be calculated previously in any way? parameter? variable? constant? another option to compound this filter dinamically in this way? 

I meant to do PDT only to store CITY + MAX and use it as a different view file, that has a cache of 7 days, and that joins to your huge table (either in SQL or as a Looker JOIN in the explore definition)

That’s i did:

view: pdt_last_date_city{   derived_table: {     datagroup_trigger: seven_days     sql: select city, max(date_dt) max_date_dt from table_raw group by city

       ;;   }

This view its used to JOIN against the fact table:

    join: pdt_last_date_city{       relationship: many_to_one       sql_on: ${table_raw.city} = ${pdt_last_date_city.city}

      and  ${table_raw.date_dt } = ${pdt_last_date_city.max_date_dt } ;;     }

The SQL that executes is like this

select city,  count(*)

from table_raw

left  outer join pdt

on table_raw.city = pdt.city and table_raw.date_dt = pdt.max_date_dt

group by 1

the performance in this way is not good, make a full scan against the huge fact table, don’t use the partition field (date_dt).

I would like to compound a SQL like this (it is the best performance):

select city,  count(*)

from table_raw

where  ( city = ‘Madrid’ and date_dt = ‘2020-03-01’ ) 

or         ( city = ‘Barcelona’ and date_dt = ‘2020-03-03’ ) 

group by 1

Hi @JMJ1,

Were you able to find a solution to your problem?


Regards

Top Labels in this Space
Top Solution Authors