Question

Limiting a table to the max value per group

  • 2 January 2018
  • 3 replies
  • 3236 views

Hey there, happy 2018.


I have a table that includes:

state, type and count


There are 50 states and each state has multiple types and each type has a count.


It looks generally like this


state type count

ca a 50

ca b 3

ca c 5

ca d 12

wa d 13

wa b 14

wa e 11

wa a 28

ny c 39

ny a 12

ny b 52

ny d 11


I’d like to have a list of states and just the type with the highest count per state. Then, I’d like to get a percentage that shows how many of the records in that state are of that particular type.


So it looks like this –


state type count percent

ca a 50 71%

wa a 28 42%

ny c 52 46%


How do I do this in looker? Thanks!


3 replies

Userlevel 3

Hey @sebastian, it looks like we’ll want to rank the count amount for each state. Depending on what your dialect supports we will need either a correlated subquery or a window function. This is possible with some creative SQL one-off coding… but what about re-use? What if you want the 2nd highest count per state, or least? Looker gives us the flexibility to do this!


Using your general table from above we define table stc from schema SCHEMA as a view file in Looker:


view: stc {
sql_table_name: SCHEMA.stc ;;

dimension: pkey {
type: string
primary_key: yes # This primary key added in order for our aggregate
hidden: yes # functions to work properly through joins
sql: CONCAT(stc.State,stc.Type) ;;
}

dimension: state {
type: string
sql: ${TABLE}.State ;;
}

dimension: typecount { # this is your "count" DIMENSION, not to be confused
type: number # with the general "count" MEASURE Looker generates
sql: ${TABLE}.count ;;
}

dimension: type {
type: string
sql: ${TABLE}.Type ;;
}

measure: type_sum { # We will use this measure to total the count dimension per state
type: sum # and use it for the percent calculation
sql: ${typecount} ;;
}
}

Using SQL Runner we can create another view derived_window from a SQL query that uses a window function to rank the types by highest count per state.


view: derived_window {
derived_table: {
sql: SELECT State AS ST,
Type AS TY,
count AS CO,
RANK() OVER (PARTITION BY State ORDER BY count DESC) AS rank
FROM SCHEMA.stc
GROUP BY 1,2,3
;;
}

dimension: state {
type: string
sql: ${TABLE}.ST ;;
}

dimension: type {
type: string
sql: ${TABLE}.TY ;;
}

dimension: maxcount {
type: number
sql: ${TABLE}.CO ;;
}

dimension: rank {
type: number
sql: ${TABLE}.rank ;;
}
}

Which gives us results like this:


We can achieve identical results with a correlated subquery, in which case we would replace our derived table sql: clause with the following:


view: derived_correlated_sub_query {
derived_table: {
sql: SELECT a.State ST, a.Type TY, a.count CO, COUNT(*) as rank
FROM SCHEMA.stc a JOIN SCHEMA.stc b
ON a.count <= b.count AND a.State = b.State
GROUP BY 1,2,3;;
}
... # same dimensions as above
}

Either way we will want to join the derived table view to the original stc view in our model:


explore: stc {
join: derived_window {
relationship: many_to_many
sql_on: ${stc.state} = ${derived_window.state} ;;
}

When we go to the explore, we can select our desired fields (State, Type, maxcount from the derived table, and Type_Sum from stc). Then we can filter on Rank = 1 or 2 or none at all if we want to see percentages for all of the counts. From there we have all the parts we need for the percentage calculation. Which can be done with a percent formatted table calc using ${derived_window.maxcount}/${stc.type_sum}


End result:

This is really great, thank you Jeffrey!

Userlevel 7
Badge

Hey @sebastian - I also have an article about this subject here:



Reply