Using parameters to find the top N items versus the rest of the population

  • 18 July 2022
  • 0 replies
  • 109 views

  • Looker Staff
  • 10 replies

Looker will not be updating this content, nor guarantees that everything is up-to-date. 

The Looker parameter field lets you inject a value directly into the SQL.

Various Looker pages describe how to use this feature:

  • The Templated filters and parameters documentation page, which discusses creating the field and applying the user input
  • The parameter reference page for this field type and its allowed_value subparameter
  • The Field parameters page, which includes the parameter field type and shows which LookML parameters can be subparameters of the parameter field
  • The Liquid variable reference documentation page, which mentions the {% parameter %} Liquid variable
  • The Dimension, filter, and parameter types documentation page, which indicates the LookML type parameter values that can be used with parameter fields

In this article, we discuss how to use parameter to let users find the top N items versus the rest of the population in Explores.
 

The problem


Users often want to find out the top n performing products and compare those with other products.

Without a parameter, you would have to hard code certain buckets (for example, top 5, 10, 20, and so on). However, with a parameter, we can let the user determine n, and we can label the remainder as Other.
 

The solution


In the context of an e-commerce store, where we want to find out how well the top n products are doing, we would start with a derived table where we rank each product based on the number of sales:

view: brand_rankings {
derived_table: {
sql:
SELECT brand, count(*) as count, RANK() OVER(ORDER BY COUNT(*) DESC) as rank
FROM order_items LEFT JOIN inventory_items on order_items.inventory_item_id=inventory_items.id
LEFT JOIN products ON inventory_items.product_id = products.id
GROUP BY 1 ;;
}

dimension: brand {
type: string
primary_key: yes
sql: ${TABLE}.brand ;;
}

dimension: rank_raw {
type: number
sql: ${TABLE}.rank ;;
}
}

Using a parameter, we let users specify how many of the top brands they're interested in:

parameter: max_brands {
type: number
}

By referencing the parameter inside the sql parameter of our rank dimension, we can then create the Other bucket on the fly:

dimension: rank {
type: string
sql: CASE WHEN ${rank_raw} <= {% parameter max_brands %} THEN RIGHT('00' + CAST(${rank_raw} AS VARCHAR), 2) ELSE 'Other' END;;
}

dimension: rank_and_brand {
type: string
sql: CASE WHEN ${rank} = 'Other' THEN 'Other' ELSE ${rank} || '-' || ${brand} END;;
}

This results in the user's top n items having their own row and the rest being grouped together as Other:

Additionally, we can create a ranking of countries we have sold to, set that ranking based on a metric not shown in the visualization, and select a date range different from what is shown in the results.

To do this, create a view like this:

view: top_5_countries {
label: "Countries Ranking"
derived_table: {
sql:
select
country_id as country_code,
{% parameter country_name_criteria %} as country_rank
from
(
select country_code,
rank() over(order by count(*) desc) as visitorCount,
rank() over(order by sum(orders.gross_revenue) desc) as totalGrossRevenue,
rank() over(order by avg(orders.gross_revenue) desc) as averageGrossRevenue
FROM orders
WHERE
{% condition rank_date_range %}created_at {% endcondition %}
group by country_code
) AS country_summary ;;
}

filter: rank_date_range {
type: date
description: "Select a range within which you are ranking the ordering of countries by metric selection. E.g. the rank of countries who had the top 10 highest revenue in May"
}

dimension: country_code {
primary_key: yes
hidden: yes
type: string
sql: ${TABLE}.country_code ;;
}

parameter: country_name_criteria {
label: "Ranking Criteria"
description: "Specify which metric to order the ranking by"
type: unquoted
default_value: "totalGrossRevenue"
allowed_value: {
label: "Total Gross Revenue"
value: "totalGrossRevenue"
}
allowed_value: {
label: "Average Gross Revenue"
value: "averageGrossRevenue"
}
allowed_value: {
label: "Visitor Count"
value: "visitorCount"
}
}

#### This parameter will allow a user to select a Top N ranking limit for bucketing the countries, almost like parameterizing the Row Limit in the UI
parameter: country_rank_limit {
label: "Rank Limit"
description: "Specify the cutoff for overall rank"
type: unquoted
default_value: "5"
allowed_value: {
label: "Top 5"
value: "5"
}
allowed_value: {
label: "Top 10"
value: "10"
}
allowed_value: {
label: "Top 20"
value: "20"
}
allowed_value: {
label: "Top 50"
value: "50"
}
}

dimension: country_rank_top_N {
hidden: yes
description: "Rank within the range selected and list of countries based on metric selected. Useful for sorting visualization based on ranking."
label_from_parameter: country_name_criteria
label: "Country Code"
type: number
sql: case when ${TABLE}.country_rank<={% parameter country_rank_limit %} then ${TABLE}.country_rank else null end ;;
}

dimension: country_name_top_N {
description: "Name of the Country within the rank selection."
label: "Country Name (Top N)"
type: string
sql: case when ${TABLE}.country_rank<={% parameter country_rank_limit %} then ${TABLE}.country_code else 'other' end ;;
}
}

And then join it to the base Explore:

explore: orders {
join: top_5_countries {
type: inner
sql_on: ${top_5_countries.country_name_top_N} = ${orders.country_code} ;;
relationship: many_to_one
}
}

This allows us to answer such edge-case questions as "Show me the top 10 countries and their Gross Revenue for the last 7 days, based on how many unique visitors those countries had in the 7 days prior, and display them in order of that visitor count."


0 replies

Be the first to reply!

Reply