How do you optimise filters?

Here is the situation:

  • We have a very large model because the customer would like to join basically all existing tables together.
  • As with all models, there is a root table that all other tables join to through a tree of connections.

The problem comes when I set up a filter for a dimension in a table that isn't the root of this Explore. In order to fetch the distinct list of values for the dimension in question, Looker generates SQL that joins from the root table all the way down to the table in question. This is very inefficient.

I can think of a solution where you would set up an Explore for each and every table in the instance and use those for Filters, but that seems very tedious to set up and to maintain.
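For what it's worth, if I went that route, I believe the LookML mechanism for pointing a filter's suggestions at a separate Explore is the `suggest_explore` / `suggest_dimension` parameters on the dimension. A hypothetical sketch (explore and field names are made up):

```lookml
# Hypothetical: a minimal explore that exposes the users table directly
explore: users_only {
  from: users
}

# On the slow dimension, route filter suggestions to that small explore
dimension: user_name {
  sql: ${TABLE}.name ;;
  suggest_explore: users_only
  suggest_dimension: users_only.user_name
}
```

That still means one extra explore per filtered table, which is the maintenance burden I'd like to avoid.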

Has anyone else struggled with this issue and has a clever solution or any ideas at all?

Thanks, Blue


Hey @blue1 !

There is a way to cache the filter's suggestion results for a specific amount of time (you can align it with your data refresh schedule).
It's not a full solution to your issue, but it can still improve the waiting time, and it will reduce the number of distinct-value queries fired when somebody clicks on the filter.
I don't know if you have considered this option, but I hope it helps.
Please share your thoughts, and specifically your concerns about why it's inefficient in your case.

Regards, Desi


Hi @dsimeonova , thanks for your suggestion. Caching happens automatically; the problem is the slow initial load of the filter when it runs for the first time, which can be a frustrating UX.

The inefficiency comes from the fact that the SQL query for the filter doesn't access the table directly, but has to go through the entire model structure to reach it.

For example, say you have a "Product" Explore, where product is the root table, orders joins to product, and users joins to orders. If I add a User Name filter to the dashboard, the SQL for the filter will start from product, join orders, and then join users to get the distinct User Names. Instead, the SQL should simply access users directly and select the distinct User Names.
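Concretely, the two queries would look roughly like this (table and column names are illustrative, not the actual generated SQL):

```sql
-- What the suggestion query looks like today: walks the join path from the root
SELECT DISTINCT users.name AS user_name
FROM product
LEFT JOIN orders ON orders.product_id = product.id
LEFT JOIN users  ON users.id = orders.user_id
ORDER BY 1;

-- What would suffice: hit the filtered table directly
SELECT DISTINCT users.name AS user_name
FROM users
ORDER BY 1;
```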

This is a simple example, of course, but you can imagine a similar case where the filter's table is 8 joins deep in the model tree. That can make it slow for the user to get results.

I am not sure what you mean when you say "caching happens automatically".
I had in mind this parameter inside of dimensions: suggest_persist_for  |  Looker  |  Google Cloud
This parameter makes some of the filter query runs shorter. So it's not a caching policy over the usual queries, but only over the filter values.
I don't know if you have tried this one to see the impact.
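As a sketch, on the dimension whose filter is slow it would look like this (the duration is an example; I believe the default is "6 hours"):

```lookml
# Hypothetical: keep the filter's suggestion values cached for a day,
# aligned with a daily data refresh
dimension: user_name {
  sql: ${TABLE}.name ;;
  suggest_persist_for: "24 hours"
}
```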

I see, I did not know about this parameter, thank you! It might help for the second or third load; however, it will not solve the slow initial run.