How to Use ThetaSketch for COUNT DISTINCT on Druid?

When we run a COUNT DISTINCT query, it fails with:

Remote driver error: QueryInterruptedException: Incompatible type for metric[id], expected a HyperUnique, got a class org.apache.druid.query.aggregation.datasketches.theta.SketchHolder
-> QueryInterruptedException: Incompatible type for metric[id], expected a HyperUnique, got a class org.apache.druid.query.aggregation.datasketches.theta.SketchHolder

We do want to use ThetaSketch instead of HyperUnique. How can we change that?


bump, is there any update here? We’re trying to do the same.

Yes. Our team developed a UDF called APPROX_COUNT_DISTINCT_DS_THETA to do this. Please try it out; it is open source.
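
For anyone landing here, a minimal sketch of how the function might be called through Druid SQL. It assumes the druid-datasketches extension is loaded and that the metric column (here "id", as in the error above) was ingested as a theta sketch. The datasource name "my_datasource" and the helper build_theta_count_query are hypothetical, used only for illustration. The snippet only builds the JSON body you would POST to the Druid SQL endpoint (/druid/v2/sql); it does not send anything.

```python
import json


def build_theta_count_query(datasource: str, column: str) -> dict:
    """Build a Druid SQL request body that counts distinct values
    with a theta sketch instead of the default HyperUnique/HLL path.

    Both arguments are placeholders: substitute your own datasource
    and sketch column names.
    """
    sql = (
        f'SELECT APPROX_COUNT_DISTINCT_DS_THETA("{column}") AS distinct_count '
        f'FROM "{datasource}"'
    )
    # The Druid SQL HTTP API expects a JSON object with a "query" field.
    return {"query": sql}


# "my_datasource" is a hypothetical name; "id" is the metric from the error message.
payload = build_theta_count_query("my_datasource", "id")
print(json.dumps(payload, indent=2))
```

The idea is that a plain COUNT(DISTINCT id) is planned against the HyperUnique aggregator, which cannot read a SketchHolder, while the theta function reads the sketch column directly.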

@pandog Very cool, thanks! On that note, has your team used Druid SQL to compile down to TopN queries?
