# Sort by Pivots, Rank with Other Bucket, and Percentile Tail Analysis

• 0 replies
• 862 views

Userlevel 5
• Moderator
• 0 replies

Looker will not be updating this content, nor guarantees that everything is up-to-date.

## The Problem

A common issue users run into while modeling is sorting by pivots. For example, I might want to pivot my top 10 brands by total sales.

The most elegant solution I've found happens to work well for several other use cases. In this article I'll show how we can satisfy each in turn, by progressively adding to an ephemeral derived table.

## The Solution

In this example we'll use a simple e-commerce model. Here I've pulled my top brands by revenue:

Now, if I'd like to see how these top 10 perform over time, a simple pivot won't help. It will sort alphanumerically by default:

I would have to filter my brand by each of the brand names that were in the top 10. This can work for a one-off analysis, but won't scale effectively for my team.

## Sort by Pivots

To accomplish this, we can have Looker rank our brands for us, using an ephemeral derived table.

The idea is to dynamically calculate statistics at a brand level and then join them back into our model.

The first step is to create the query we'll use for the ranking. I'll often write these by hand, but a quick shortcut is to assemble the query in an Explore, then punch it out to SQL Runner, make any needed modifications, and then bring it into my project as a derived table:

``SELECT    products.brand AS "brand",    COALESCE(SUM(order_items.sale_price ), 0) AS "total_revenue",   # Included for clarity    RANK() OVER (ORDER BY COALESCE(SUM(order_items.sale_price ), 0) DESC) AS "RNK"FROM    public.order_items AS order_items    LEFT JOIN public.orders AS orders ON ( orders.id = order_items.order_id)    LEFT JOIN public.inventory_items AS inventory_items ON ( order_items.inventory_item_id = inventory_items.id )    LEFT JOIN public.products AS products ON ( inventory_items.product_id = products.id )WHERE    1=1GROUP BY 1ORDER BY 2 DESCLIMIT 20``

The query above groups by the dimension by which we wanted to pivot/sort, and establishes a rank function on total revenue. Now, we can bring it into our model as a derived table. You can run this in SQL Runner and then have Looker automatically generate the LookML.

Looker prompted me for a view name and I called our new view `brand_rank_by_sales`:

``view: brand_rank_by_sales {    derived_table: {        sql:        SELECT         products.brand AS "brand",         COALESCE(SUM(order_items.sale_price ), 0) AS "total_revenue",         RANK() OVER (ORDER BY COALESCE(SUM(order_items.sale_price ), 0) DESC) AS "RNK"        FROM         public.order_items AS order_items         LEFT JOIN public.orders AS orders ON orders.id = order_items.order_id         LEFT JOIN public.inventory_items AS inventory_items ON order_items.inventory_item_id = inventory_items.id         LEFT JOIN public.products AS products ON inventory_items.product_id = products.id         WHERE         1=1        # This is a "templated filter" - any filter I place on my orders created date will be mirrored here         AND {% condition orders.created_date %} orders.created_at {% endcondition %}         GROUP BY 1         ORDER BY 2 DESC        ;;        }    measure: count {        type: count        drill_fields: [detail*]    }    dimension: brand {        type: string        sql: \${TABLE}.brand ;;    }    dimension: total_revenue {        type: number        sql: \${TABLE}.total_revenue ;;    }    dimension: rnk {        ype: number        sql: \${TABLE}.rnk ;;    }    set: detail {        fields: [brand, total_revenue, rnk]    }}``

Now, we can join it into the rest of our e-commerce Explore, as a many-to-one relationship joined on `brand`:

``explore: order_items {    join: orders {        relationship: many_to_one        sql_on: \${orders.id} = \${order_items.order_id} ;;    }    join: users {        relationship: many_to_one        sql_on: \${users.id} = \${orders.user_id} ;;    }    join: inventory_items {        type: left_outer        sql_on: \${order_items.inventory_item_id} = \${inventory_items.id} ;;        relationship: one_to_one    }    join: products {        type: left_outer        sql_on: \${inventory_items.product_id} = \${products.id} ;;        relationship: many_to_one    }    join: brand_rank_by_sales {        type: left_outer        sql_on: \${products.brand} = \${brand_rank_by_sales.brand} ;;        relationship: many_to_one    }}``

Now, we have the tools we need to look at the performance of our top 10 brands' revenue over time.

I can now pivot by my brand and simply filter the `brand_rank_by_sales.rnk` field, on the condition of less than or equal to 10.

To allow clear sorting by our pivot field, we can combine the rank and the brand name by adding a new dimension that will support being sorted alphanumerically. The `CASE WHEN` statement will simply prepend a 0 to ranks less than 10, for proper sorting.

``dimension: ranked_brand {    type: string    sql:        CASE        WHEN            \${rnk} < 10 THEN '0'|| \${rnk} || ') ' || \${brand}            ELSE \${rnk} || ') ' || \${brand}        END ;;}``

Now, we can pivot by our `brand_rank_by_sales.ranked_brand` field and filter for the top 10 rank:

## Group Our Long Tail into a Single "Other" Bucket

Adding on to the pattern above, I may want to take everything below a certain rank and place it into a long tail or "other" bucket. This can be used for time-series analysis, but is especially helpful in pie charts. You can find more information on creating "other" buckets using table calculationsor using charts.

I'll simply add the following two fields:

``filter: other_bucket_threshold {    type: number}dimension: ranked_brand_with_tail {    type: string    sql:     CASE        WHEN {% condition other_bucket_threshold %} \${rnk} {% endcondition %}        THEN \${ranked_brand}        ELSE 'x) Other'    END ;;}``

The benefit of this approach is that the end-user can dynamically adjust the tail threshold, which may vary or need to be tuned for the subset under consideration.

## Drill into Your Outliers Using Percentile Rank

Add the following field to your derived table:

``,PERCENT_RANK () OVER (ORDER BY COALESCE(SUM(order_items.sale_price ), 0)) AS "percent_rank"``

And the following dimensions to the view:

``dimension: percent_rank {    type: number    value_format_name: percent_2    sql: \${TABLE}.percent_rank ;;}dimension: percentile_tiers {    type: tier    tiers: [0.25,0.5,0.75,0.9,0.95]    sql: \${percent_rank} ;;}``

Now, we can drill into our outliers:

### 0 replies

Be the first to reply!