Question

Limiting a table to the max value per group

  • 2 January 2018
  • 3 replies
  • 3092 views

Hey there, happy 2018.



I have a table that includes:


state, type and count



There are 50 states and each state has multiple types and each type has a count.



It looks generally like this



state type count


ca a 50


ca b 3


ca c 5


ca d 12


wa d 13


wa b 14


wa e 11


wa a 28


ny c 39


ny a 12


ny b 52


ny d 11



I’d like to have a list of states and just the type with the highest count per state. Then, I’d like to get a percentage that shows how many of the records in that state are of that particular type.



So it looks like this –



state type count percent


ca a 50 71%


wa a 28 42%


ny c 52 46%



How do I do this in looker? Thanks!


3 replies

Userlevel 3

Hey @sebastian, it looks like we’ll want to rank the count amount for each state. Depending on what your dialect supports we will need either a correlated subquery or a window function. This is possible with some creative SQL one-off coding… but what about re-use? What if you want the 2nd highest count per state, or least? Looker gives us the flexibility to do this!



Using your general table from above we define table stc from schema SCHEMA as a view file in Looker:



view: stc {

sql_table_name: SCHEMA.stc ;;



dimension: pkey {

type: string

primary_key: yes # This primary key added in order for our aggregate

hidden: yes # functions to work properly through joins

sql: CONCAT(stc.State,stc.Type) ;;

}



dimension: state {

type: string

sql: ${TABLE}.State ;;

}



dimension: typecount { # this is your "count" DIMENSION, not to be confused

type: number # with the general "count" MEASURE Looker generates

sql: ${TABLE}.count ;;

}



dimension: type {

type: string

sql: ${TABLE}.Type ;;

}



measure: type_sum { # We will use this measure to total the count dimension per state

type: sum # and use it for the percent calculation

sql: ${typecount} ;;

}

}



Using SQL Runner we can create another view derived_window from a SQL query that uses a window function to rank the types by highest count per state.



view: derived_window {

derived_table: {

sql: SELECT State AS ST,

Type AS TY,

count AS CO,

RANK() OVER (PARTITION BY State ORDER BY count DESC) AS rank

FROM SCHEMA.stc

GROUP BY 1,2,3

;;

}



dimension: state {

type: string

sql: ${TABLE}.ST ;;

}



dimension: type {

type: string

sql: ${TABLE}.TY ;;

}



dimension: maxcount {

type: number

sql: ${TABLE}.CO ;;

}



dimension: rank {

type: number

sql: ${TABLE}.rank ;;

}

}



Which gives us results like this:




We can achieve identical results with a correlated subquery, in which case we would replace our derived table sql: clause with the following:



view: derived_correlated_sub_query {

derived_table: {

sql: SELECT a.State ST, a.Type TY, a.count CO, COUNT(*) as rank

FROM SCHEMA.stc a JOIN SCHEMA.stc b

ON a.count <= b.count AND a.State = b.State

GROUP BY 1,2,3;;

}

... # same dimensions as above

}



Either way we will want to join the derived table view to the original stc view in our model:



explore: stc {

join: derived_window {

relationship: many_to_many

sql_on: ${stc.state} = ${derived_window.state} ;;

}



When we go to the explore, we can select our desired fields (State, Type, maxcount from the derived table, and Type_Sum from stc). Then we can filter on Rank = 1 or 2 or none at all if we want to see percentages for all of the counts. From there we have all the parts we need for the percentage calculation. Which can be done with a percent formatted table calc using ${derived_window.maxcount}/${stc.type_sum}



End result:


This is really great, thank you Jeffrey!

Userlevel 7
Badge

Hey @sebastian - I also have an article about this subject here:





Reply