Solved: How to avoid unnecessary join

shekhar0413 · 05-12-2021 05:14 PM

Hi, I have an explore in which I join a view, say view1 to another view, say view2 like so:

explore: view1{

join : view2{
type: inner
relationship: many_to_many
sql_on: ${view1.id} = ${view2.id} ;;
}
}

Now, when I am in Explore, both view1 and view2 are visible to explore. If I select dimensions and measures only from view1, then the sql generated has no joins which is what I expected. But, if I select dimensions and measures only from view2, the sql generated has a join. I was expecting that since no measure and dimension has been selected from view1 there would have been no join in the resulting sql. Why is a join required when nothing from view1 has been selected, is there a reason looker does this? And are there ways I can define the lookMl to avoid this join?

Thanks for the inputs

IanT

It makes sense to me but maybe just cos I have never had your use case. An explore is usually centred around 1 thing, eg. users, orders, dates…..so having this at the centre is all good.

Also as soon as you add more views you will need to have an order of how these views are joined in otherwise it will start getting complicated. eg. x joins to y unless y not there so it joins to z on different fields.

View solution in original post

IanT

Every explore has a starting table which will be included in every query. You might just have to create another explore for just v2.

shekhar0413

Thanks, however, it seems straightforward enough to figure out if a query includes all measures and dimensions just from a view? Does anyone with in-depth knowledge know why looker does the join, or it’s simply that’s the way it’s implemented. From an end user perspective it doesn’t quite make a lot of sense to do that. If a user sees a view in the joined explore, they’ll naturally want to explore directly from there, instead of having to go to another similar looking explore.

IanT

It makes sense to me but maybe just cos I have never had your use case. An explore is usually centred around 1 thing, eg. users, orders, dates…..so having this at the centre is all good.

Also as soon as you add more views you will need to have an order of how these views are joined in otherwise it will start getting complicated. eg. x joins to y unless y not there so it joins to z on different fields.

jamesnestler

To expand on @IanT ‘s comment, you have to keep in mind how Looker generates SQL in the context of an explore. I often find it helpful to use the framework outlined here since this is very predictable.

Essentially though, think of each individual explore definition as the FROM and JOIN clauses in your SQL. Just like you can join a table to another and not select any fields from the joined table, Looker will let you write LookML that generates equivalent SQL. In an explore definition, the base view (either the view that corresponds with the explore name or the view referenced with `view_name` or `from`) will be the base table (eg the FROM). Views specified in `join: … {}`s are the JOINs.

Using a simple example to illustrate, say we have the following explore definition:

explore: my_explore {
   from: my_explore
   join: second_table {
      ...
      sql_on: ${my_explore.id} = ${second_table.id} ;;
     }
 }

No matter which fields are selected in our front end, Looker will always generate a FROM clause that references `my_explore`. If we select a field from `second_table`, Looker will generate a FROM my_explore JOIN second_table…

If we were to define our explore instead as `explore: second_table` and `join: my_explore`, second_table would always be the from.

shekhar0413

Thank, yes, I understand now that that is what looker does, it will always join regardless, what I was trying to see is if there was any way around it. For the example above, to the end business user, my_explore and second_table appear as two things they can select dimensions etc from. They don’t care, should not care, whether my_explore joins second_table or vice versa. So, they could very well explore my_explore and not have any joins performed, but if they explored second_table there would be joins. So, they would need to go to a separate explore to do that. It’s a minor thing but does not seem intuitive for our use case to say to the business user that if you select something from the first table it’s fine, but if you want to select things purely from the 2nd table go to this or that explore, but that’s what it is. From what I understand, based on the selected dimensions and measures and the lookMl, looker generates the SQL dynamically. When all the columns are from a single view, other than the base view, looker will join with some obvious performance issues, and if there was a fan-out, there could be additional performance costs.