Hi. I read the articles "Outer Join on False - or: How I learned to stop fanning-out and love the NULL", "[Analytic Block] - Pivot by Top X", and "Introducing bind_all_filters: yes". First of all, thanks for the great articles.
I’m trying to make a model / Explore which:
- Combines multiple (4) existing PDTs in a way like the first article describes, where the data is not fanned out by performing "normal" joins between the 4 tables
- Uses non-persistent derived tables (probably native, to use `bind_all_filters`), which get pre-filtered before being merged into the final user-facing table. This is for performance reasons: our source tables are > 25 TB each, and I'd rather join these tables after they are relatively small from already being filtered.
- Creates a derived table that finds common time bins (a time interval defined by a Liquid parameter) across the 4 tables. Do this by taking the filtered tables, keeping only the calculated time-bin column, and taking a union of these 4 sub-tables. Then merge the 4 tables back into this derived table using left joins. This essentially performs a fuzzy `AND` filter. I may give users the ability to make this an `OR` for certain searches.
- Creates a final table that left joins the 4 filtered tables using the common time bins from the table above. Do this by including 2 columns in the common-time-bin table: the time-bin values that are common to all 4 tables, plus a column with the name of each table (one of 4 string literals).
- Returns results to the user without re-applying the user's filters on the final table above. I think I can do this with a `sql_always_where` clause of "OR TRUE" (without parentheses) on the final table. Still have to test this.
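To make the second bullet concrete, here is roughly what I have in mind for one of the 4 pre-filtered NDTs. This is only a sketch: every view, explore, and field name here is a hypothetical placeholder, not our real model.

```lookml
# Sketch only - view/explore/field names are hypothetical placeholders.
view: filtered_table_a {
  derived_table: {
    explore_source: combined_explore {
      bind_all_filters: yes
      column: event_id   { field: table_a.event_id }
      column: event_time { field: table_a.event_raw }
    }
  }
  dimension: event_id { type: string }
  dimension_group: event {
    type: time
    timeframes: [raw, hour, date]
    sql: ${TABLE}.event_time ;;
  }
}
```

The idea is that each of the 4 NDTs would pick up the user's filters via `bind_all_filters` and shrink its source table before any cross-table joining happens.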
I'm pretty sure I can do something similar to the above using Liquid variables for every single filterable field (there are about 3 dozen), but I'd rather try doing it with `bind_all_filters`, if I can do it in a performance-friendly way.
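For reference, the Liquid-variable fallback I mean looks roughly like this, repeated once per filterable field (schema and field names are hypothetical, and the real version would have ~36 `{% condition %}` blocks):

```lookml
# Sketch only - schema and field names are hypothetical placeholders.
view: filtered_table_a_liquid {
  derived_table: {
    sql:
      SELECT event_id, event_time
      FROM source_schema.table_a
      WHERE {% condition combined_explore.event_type %} event_type {% endcondition %}
        AND {% condition combined_explore.region %} region {% endcondition %} ;;
  }
}
```

It works, but maintaining 3 dozen of those blocks per table is exactly what I'm hoping `bind_all_filters` lets me avoid.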
I'm a bit confused by the `bind_all_filters` feature, though. The article states:
Note: A native derived table with `bind_all_filters` is required to be joined in the same explore it came from; i.e.,
This is confusing me. At what "stage" in the source Explore does the NDT receive its input table? My explore was going to include:
- the 4 filtered tables above,
- maybe the 4 PDTs, if the noted NDTs have to use `explore_source` targets contained in the explore (not arbitrary tables),
- the above table containing the timestamp bins common to all 4 filtered tables.
But if the [4+1] NDTs all get their tables from the single Explore source, and those tables are included in the Explore source, this seems cyclical. Do the NDTs get the "full" table obtained after doing all the joining in the Explore, and then just pick the columns they need? If so, wouldn't this cause a possible performance loss (tons of superfluous rows per NDT) by joining those tables into a massive table before applying my custom logic?
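To be explicit about the shape that looks cyclical to me (names again hypothetical): the explore joins a view whose derived table names that same explore as its `explore_source`:

```lookml
# Sketch of the apparent cycle - all names are hypothetical placeholders.
# filtered_table_a is the NDT whose explore_source is combined_explore,
# yet it is joined back into combined_explore here.
explore: combined_explore {
  from: table_a
  join: filtered_table_a {
    sql_on: ${combined_explore.event_raw} = ${filtered_table_a.event_raw} ;;
    relationship: many_to_one
  }
}
```

It's this back-reference from `filtered_table_a` into `combined_explore` that I can't map to a clear order of operations.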
- Can anybody clarify the order of operations performed when an Explore contains a view, and that view uses the Explore for its `explore_source`? I've been meaning to just inspect the SQL Looker generates for such an Explore, but I haven't been able to dedicate enough time.
- Am I over-worried about doing my join (after which I calculate time bins, group by time bins, and then define this as a temporary table) before filtering the rather large individual tables?
- Should I just stick with Liquid variables for this task if my main goal is performance, or does anybody have ideas for how I could accomplish a high-performance version of this using `bind_all_filters` instead of Liquid variables?
Thanks so much!!