Conditions in Join Clauses (3.20+)


Userlevel 6
Badge

Release 3.20 (still weeks away when writing this) will come with the ability to add conditions in Join clauses.



The problem



Tables in many MPP database have a single sort key, that key is often time. Joining to these tables presents a challenge.



The common solution is to add a date condition in the join predicate. Until now that has been difficult in Looker.



For example, you might wish to join orders and emails for a given user over time and compute a conversion rate. Both orders and emails might be indexed on time and otherwise difficult or slow to join.



In SQL



You might write the following query



SELECT 

users.id

, COUNT(DISTINCT order.id)

FROM users

LEFT JOIN orders

ON orders.user_id=users.id

AND orders.created_time BETWEEN '2015-01-01 00:00:00' AND '2015-01-31 00:00:00'

WHERE

users.created_at BETWEEN '2015-01-01 00:00:00' AND '2015-01-07 00:00:00'

GROUP BY 1



In this case, we are looking at the number of orders that happened in january by the users created in the first week of january.



Since the orders table only has an index on time, we need to add a timeframe in order to avoid scanning the entire orders table. Adding:



 AND orders.created_time  BETWEEN '2015-01-01 00:00:00' AND '2015-01-31 00:00:00'



helps the query optimizer figure out how to pull out a subset of the data and join it.



In LookML



In release 3.20, these kinds of query can now be simply expressed in LookML.



The following example starts from the user base view then joins orders. In the explore, you must set two filters, the timeframe of the user creation and the order timeframe you wish to examine. In a non-MPP world, the SQL optimizer could probably use user_id as a key to limit scanning the orders table, but in an MPP world, you would need to set this manually.



- connection: red_look



- scoping: true # for backward compatibility



- explore: users

always_filter:

users.created_date: 30 days

orders.order_date_filter: 30 days

joins:

- join: orders

relationship: one_to_many

sql_on: |

${orders.user_id}=${users.id}

AND {% condition orders.order_date_filter %} orders.created_at {% endcondition %}



- view: orders

fields:

- dimension: id

primary_key: true



- dimension: user_id



- filter: order_date_filter

type: date



# You can still group by whatever time frame you like.

- dimension_group: created

type: time

timeframes: [time, date, week, month]

sql: ${TABLE}.created_at



- measure: count

type: count



- view: users

fields:

- dimension: id

primary_key: true



- dimension_group: created

type: time

timeframes: [time, date, week, month]

sql: ${TABLE}.created_at



- measure: count

type: count



Running the following query:





Yields the following SQL



SELECT 

COUNT(*) AS "users.count",

COUNT(DISTINCT orders.id) AS "orders.count"

FROM users

LEFT JOIN orders ON orders.user_id=users.id

AND (( orders.created_at ) >= (CONVERT_TIMEZONE('America/Los_Angeles', 'UTC', timestamp '2015-03-01')) AND ( orders.created_at ) < (CONVERT_TIMEZONE('America/Los_Angeles', 'UTC', timestamp '2015-04-01')))





WHERE

((users.created_at) >= (CONVERT_TIMEZONE('America/Los_Angeles', 'UTC', timestamp '2015-03-01')) AND (users.created_at) < (CONVERT_TIMEZONE('America/Los_Angeles', 'UTC', timestamp '2015-03-07')))

ORDER BY 2 DESC

LIMIT 500



This is just a simple example, there are many ways to use this feature to optimize queries.


10 replies

Is there a way if we use templated filter it dosen’t appear in where clause along with join clause. I want to put condition in Join only and do not want same condition in  where clause.

 

SELECT
    test.cust_id  AS test_cust_id
FROM  test
INNER JOIN  test_1 ON test.cust_id=test_1.cust_id
LEFT JOIN test_history
     ON test.cust_id=test_history.cust_id AND
    (test_history.country = ‘US)

WHERE  (test_history.country = ‘US)
GROUP BY
    1
ORDER BY
    1
LIMIT 500

 

looking for similar solution, @Manish 183 please let us know if you were able to achieve this

Is there a way if we use templated filter it dosen’t appear in where clause along with join clause. I want to put condition in Join only and do not want same condition in  where clause.

 

SELECT
    test.cust_id  AS test_cust_id
FROM  test
INNER JOIN  test_1 ON test.cust_id=test_1.cust_id
LEFT JOIN test_history
     ON test.cust_id=test_history.cust_id AND
    (test_history.country = ‘US)

WHERE  (test_history.country = ‘US)
GROUP BY
    1
ORDER BY
    1
LIMIT 500

 

Userlevel 6
Badge

This is an unusual pattern, probably difficult for business users to understand at any rate. Suppose you have a users table and you want to know users that never did something. There is a transaction table with a type string field that they might have done.


Almost all SQL dialects support EXISTS and that coupled with {% condition %} blocks can give you most of what you need.


The following snippit will filter the users table on users that never had a transaction of type of filtered by ‘user_did_not’. You can make other filters and include it here.


The pattern is a little odd, I’d recommend building a special purpose explore.


- view: users
fields:
- dimension: id

- filter: user_did_not
sql: |
NOT EXISTS (
SELECT * FROM user_transactions
WHERE ${user.id} = user_transctions.user_id
AND {% condition user_did_not %} user_transactions.type {% endcondition %} )

Is there any way to make this apply for all fields in a table? (Move all statements in the WHERE to the ON part of a join)


We have a situation where we’re doing a FULL OUTER JOIN on two tables, and are interested in the results from both sides, but are also interested in the NULLs. (Think a usage log and customer lookup, and wanting to see customers who have NOT used 😵.


Right now, I’m thinking I have to write a condition for each filterable field on both tables.

Any other ways to accomplish this?

Userlevel 6
Badge

Yep! It should though I haven’t tried. Play with it on http://learnbeta.looker.com . You should have an account.

This is very exciting. Will {% parameter } also work in this case in addition to {% condition } because I have a few use cases where parameter works much better (one date filter against multiple date columns)?

Userlevel 6
Badge

Yes, It should.

Userlevel 3

Will this work in the “sql:” join parameter as well?

This will change our lives in a few tough modeling cases.

Userlevel 3

If you’re not familiar with the {% condition orders.order_date_filter %} orders.created_at {% endcondition %} syntax, it is called a “templated filter”. You can read about them here.

Reply