Question

How to Use ThetaSketch for COUNT DISTINCT on Druid?

  • 15 February 2020

When we try to run a COUNT DISTINCT query, it fails with the following error:


Remote driver error: QueryInterruptedException: Incompatible type for metric[id], expected a HyperUnique, got a class org.apache.druid.query.aggregation.datasketches.theta.SketchHolder
-> QueryInterruptedException: Incompatible type for metric[id], expected a HyperUnique, got a class org.apache.druid.query.aggregation.datasketches.theta.SketchHolder

We want to use ThetaSketch instead of HyperUnique. How can we change that?


3 replies

bump, is there any update here? We’re trying to do the same.

Yes. Our team developed a UDF called APPROX_COUNT_DISTINCT_DS_THETA to do this. Please try it out; it's open source.
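
For anyone landing here, a minimal sketch of how a query using APPROX_COUNT_DISTINCT_DS_THETA might be sent to Druid's SQL endpoint. It assumes the druid-datasketches extension is loaded. The router URL and the datasource name "events" are hypothetical placeholders; the column "id" is taken from the error message above.

```python
import json
import urllib.request

# Hypothetical values: replace with your own router/broker URL and datasource.
DRUID_SQL_URL = "http://localhost:8888/druid/v2/sql"
DATASOURCE = "events"


def build_theta_count_query(datasource: str, column: str) -> dict:
    """Build a Druid SQL request body that counts distinct values via a Theta sketch."""
    sql = (
        f'SELECT APPROX_COUNT_DISTINCT_DS_THETA("{column}") AS distinct_ids '
        f'FROM "{datasource}"'
    )
    return {"query": sql}


def run_query(payload: dict, url: str = DRUID_SQL_URL) -> list:
    """POST the payload to the Druid SQL endpoint and return the decoded rows."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the request body; call run_query(payload) against a live cluster.
    payload = build_theta_count_query(DATASOURCE, "id")
    print(json.dumps(payload, indent=2))
```

Because the column already holds Theta sketches (the SketchHolder in the error), the function should merge them rather than try to read them as HyperUnique values. If the tool generating the SQL hard-codes plain COUNT(DISTINCT ...), the expression has to be swapped in wherever that measure is defined.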


@pandog Very cool, thanks! On that note, has your team used Druid SQL to compile down to TopN queries?
