Question

Running Total of Distinct Values

  • 3 March 2019
  • 4 replies
  • 127 views

Hello,


I’m looking to create a cumulative line graph of distinct values. I’m aware of the Running_total() function as well as Count_distinct() function.


Example:

My table have 4 days


Day 1 - A,B,C

Day 2 - A,B,D

Day 3 - B,C

Day 4 - E


My results should be


Day 1: 3

Day 2: 4

Day 3: 4

Day 4: 5


I’ve nested Count_distinct within running_total but its returning:


Day 1: 3

Day 2: 6

Day 3: 8

Day 4: 9


The Table Calculation: running_total(id_count})

The Measure:

measure: id_count {

type: count_distinct

sql: ${TABLE}.“ID” ;;

}


Any suggestions? Much appreciated!


Cheers,

Andrew


4 replies

Userlevel 7
Badge

If you have 5,000 distinct values or less in your dataset of interest, and your model has a MIN(date) measure, you can do this in the explore (otherwise you’ll need to use similar logic, but in the SQL of a derived table)


First, dimension by your entities / distinct values, measure them by their min (earliest) date, and sort by the min date. Using your example data, you’ll get a table like:



Then add two table calcs, one for a running total, and one to only show the last row of each Min Date value.




  • Running Count: row() + if(is_null(${min_date}),0,0)

    ^ We add the +if(...) to reference the measure and ensure this goes to the right of it, for visualization purposes


  • Show row - ${min_date} != offset(${min_date},1)

    ^ We’ll show only the last row (running count) within each value of min date



Finally, hide the first column, and use “Hide No’s” on the last column, and pick a line vis.

Thank you Fabio, this approach makes sense. I think for our use case we’ll have to do this on the SQL side given the size of the data set. Much Appreciated!

Userlevel 7
Badge

If you want to pre-calculate some of this in SQL and then allow users to pick their own timeframes, it will help to create a roll-up table with dimensions “date”, “prior date”, and measure “count”. Start by using a window function to partition by entity, order by date, and add a ‘lag’ function:



Then group and aggregate that to get this rollup table, which you would persist:



Then to get any cumulative distinct count, you can just group by date (or some function thereof, like week or month), filter on last date being outside of the range for which you want the measure to be distinct (because the ones where last date is within in that same range are not unique), and sum.

 

 

Reply