Sort by Pivots, Rank with Other Bucket, and Percentile Tail Analysis

  • 22 June 2022
  • 0 replies
  • 862 views

Userlevel 5

Looker will not be updating this content, nor guarantees that everything is up-to-date. 

The Problem


A common issue users run into while modeling is sorting by pivots. For example, I might want to pivot my top 10 brands by total sales.

The most elegant solution I've found happens to work well for several other use cases. In this article I'll show how we can satisfy each in turn, by progressively adding to an ephemeral derived table.
 

The Solution


In this example we'll use a simple e-commerce model. Here I've pulled my top brands by revenue:


Now, if I'd like to see how these top 10 perform over time, a simple pivot won't help. It will sort alphanumerically by default:


I would have to filter my brand by each of the brand names that were in the top 10. This can work for a one-off analysis, but won't scale effectively for my team.
 

Sort by Pivots


To accomplish this, we can have Looker rank our brands for us, using an ephemeral derived table.

The idea is to dynamically calculate statistics at a brand level and then join them back into our model.

The first step is to create the query we'll use for the ranking. I'll often write these by hand, but a quick shortcut is to assemble the query in an Explore, then punch it out to SQL Runner, make any needed modifications, and then bring it into my project as a derived table:

SELECT
products.brand AS "brand",
COALESCE(SUM(order_items.sale_price ), 0) AS "total_revenue", # Included for clarity
RANK() OVER (ORDER BY COALESCE(SUM(order_items.sale_price ), 0) DESC) AS "RNK"
FROM
public.order_items AS order_items
LEFT JOIN public.orders AS orders ON ( orders.id = order_items.order_id)
LEFT JOIN public.inventory_items AS inventory_items ON ( order_items.inventory_item_id = inventory_items.id )
LEFT JOIN public.products AS products ON ( inventory_items.product_id = products.id )
WHERE
1=1
GROUP BY 1
ORDER BY 2 DESC
LIMIT 20

The query above groups by the dimension by which we wanted to pivot/sort, and establishes a rank function on total revenue. Now, we can bring it into our model as a derived table. You can run this in SQL Runner and then have Looker automatically generate the LookML.

Looker prompted me for a view name and I called our new view brand_rank_by_sales:

view: brand_rank_by_sales {
derived_table: {
sql:
SELECT
products.brand AS "brand",
COALESCE(SUM(order_items.sale_price ), 0) AS "total_revenue",
RANK() OVER (ORDER BY COALESCE(SUM(order_items.sale_price ), 0) DESC) AS "RNK"
FROM
public.order_items AS order_items
LEFT JOIN public.orders AS orders ON orders.id = order_items.order_id
LEFT JOIN public.inventory_items AS inventory_items ON order_items.inventory_item_id = inventory_items.id
LEFT JOIN public.products AS products ON inventory_items.product_id = products.id
WHERE
1=1

# This is a "templated filter" - any filter I place on my orders created date will be mirrored here

AND {% condition orders.created_date %} orders.created_at {% endcondition %}
GROUP BY 1
ORDER BY 2 DESC
;;
}

measure: count {
type: count
drill_fields: [detail*]
}

dimension: brand {
type: string
sql: ${TABLE}.brand ;;
}

dimension: total_revenue {
type: number
sql: ${TABLE}.total_revenue ;;
}

dimension: rnk {
ype: number
sql: ${TABLE}.rnk ;;
}

set: detail {
fields: [brand, total_revenue, rnk]
}
}

Now, we can join it into the rest of our e-commerce Explore, as a many-to-one relationship joined on brand:

explore: order_items {
    join: orders {
        relationship: many_to_one
        sql_on: ${orders.id} = ${order_items.order_id} ;;
    }

    join: users {
        relationship: many_to_one
        sql_on: ${users.id} = ${orders.user_id} ;;
    }

    join: inventory_items {
        type: left_outer
        sql_on: ${order_items.inventory_item_id} = ${inventory_items.id} ;;
        relationship: one_to_one
    }

    join: products {
        type: left_outer
        sql_on: ${inventory_items.product_id} = ${products.id} ;;
        relationship: many_to_one
    }

    join: brand_rank_by_sales {
        type: left_outer
        sql_on: ${products.brand} = ${brand_rank_by_sales.brand} ;;
        relationship: many_to_one
    }
}


Now, we have the tools we need to look at the performance of our top 10 brands' revenue over time.
 


I can now pivot by my brand and simply filter the brand_rank_by_sales.rnk field, on the condition of less than or equal to 10.

To allow clear sorting by our pivot field, we can combine the rank and the brand name by adding a new dimension that will support being sorted alphanumerically. The CASE WHEN statement will simply prepend a 0 to ranks less than 10, for proper sorting.

dimension: ranked_brand {
type: string
sql:
CASE
WHEN
${rnk} < 10 THEN '0'|| ${rnk} || ') ' || ${brand}
ELSE ${rnk} || ') ' || ${brand}
END ;;
}

Now, we can pivot by our brand_rank_by_sales.ranked_brand field and filter for the top 10 rank:


 

Group Our Long Tail into a Single "Other" Bucket


Adding on to the pattern above, I may want to take everything below a certain rank and place it into a long tail or "other" bucket. This can be used for time-series analysis, but is especially helpful in pie charts. You can find more information on creating "other" buckets using table calculationsor using charts.

I'll simply add the following two fields:

filter: other_bucket_threshold {
type: number
}

dimension: ranked_brand_with_tail {
type: string
sql:
CASE
WHEN {% condition other_bucket_threshold %} ${rnk} {% endcondition %}
THEN ${ranked_brand}
ELSE 'x) Other'
END ;;
}

The benefit of this approach is that the end-user can dynamically adjust the tail threshold, which may vary or need to be tuned for the subset under consideration.
 


 

Drill into Your Outliers Using Percentile Rank


Add the following field to your derived table:

,PERCENT_RANK () OVER (ORDER BY COALESCE(SUM(order_items.sale_price ), 0)) AS "percent_rank"

And the following dimensions to the view:

dimension: percent_rank {
type: number
value_format_name: percent_2
sql: ${TABLE}.percent_rank ;;
}

dimension: percentile_tiers {
type: tier
tiers: [0.25,0.5,0.75,0.9,0.95]
sql: ${percent_rank} ;;
}

Now, we can drill into our outliers:

 


0 replies

Be the first to reply!

Reply