Hi!
I have a table with three columns. I want to be able to count all the unique values across the 3 columns. Basically, in the end, I would like to have a column with each ID found in the 3 previous ones, but only once (thus, only have unique values) - a column with the total number of unique values, from the 3 initial ones.
Any ideas on how to do that or whether it’s possible to do it?
Thanks a lot!
Hi Miruna,
If you simply want to count the number of unique items in the columns I would use the count_distinct function in the calculation to do something like:
count_distinct(column 1) + count_distinct(column 2) + count_distinct(column 3)
Alternatively, you could possibly create a separate calculation for each column (if you want to see the count of unique values per column) and then add these together using a fourth table calculation.
Thanks Adam!
However, if I do a count_distinct per each column, then I would end up counting the same values several times.
In the 3 columns that I have, the same values can be found across several columns (for example, column 1 and column 2 can include the same value on several rows). I need a way to do an aggregation of the 3 columns first and then do a count_distinct on everything.
In that case I think a way would be, if this is something you want to use quite frequently, to maybe build a custom dimension, and then use the UNION function in the SQL line to combine the three columns (in theory, although I haven’t got chance to test that at the moment!). There may be a way using table calcs but after a quick internet search I haven’t been able to find a function.
Great! Thanks!
Do you have any other information about how to use the Union function?
Unfortunately I could find information about this function in Looker only here: UNION & UNION ALL in Looker.
Thanks again for your help! 🙂
I know this is a bit old but it still ranks highly on google
if you dialect supports it
measure: my_distinct_count {
type: count_distinct
sql: ${col_1}, ${col_2}, ${col_3} ;;
}
if not try to concatenate the columns with explicit sql
measure: my_distinct_count {
type: number
sql: (SELECT COUNT(DISTINCT col_1 || col_2 || col_3)) ;;
}
To echo the above, this still ranks highly on google. My own problem was similar, and my solution syntax was a little different. just adding it here in case someone else (or me again) ends up needing to revisit. For context, our dialect was redshift.
measure: count_distinct_col_ {
label: "Count of distinct ids"
type: count_distinct
sql: CONCAT(${col_2},${col_1}) ;;
}