Subtotal not displaying correctly

jon_at2 · 01-07-2022 12:05 PM

Using an Apache Spark 3+ connection, I am trying to display simple subtotal results. Instead, the results are only showing the underlying rows. This is the test data I am using:

CREATE TABLE test.test AS
SELECT t.category, t.subcategory, t.amount
FROM
(
        SELECT 'A' AS category, 'A-1' AS subcategory, 1 AS amount
        UNION ALL
        SELECT 'A' AS category, 'A-2' AS subcategory, 1 AS amount
        UNION ALL
        SELECT 'A' AS category, 'A-3' AS subcategory, 1 AS amount
        UNION ALL
        SELECT 'B' AS category, 'B-1' AS subcategory, 1 AS amount
        UNION ALL
        SELECT 'B' AS category, 'B-2' AS subcategory, 1 AS amount
        UNION ALL
        SELECT 'B' AS category, 'B-3' AS subcategory, 1 AS amount
        UNION ALL
        SELECT 'C' AS category, 'C-1' AS subcategory, 1 AS amount
        UNION ALL
        SELECT 'C' AS category, 'C-2' AS subcategory, 1 AS amount
        UNION ALL
        SELECT 'C' AS category, 'C-3' AS subcategory, 1 AS amount
) t

My view looks like:

view: DirectTest {
  label: "Test"
  sql_table_name: `test`.`test`;;
  dimension: category {
    label: "category"
    type: string
    sql: ${TABLE}.`category`;;
  }

  dimension: subcategory {
    label: "subcategory"
    type: string
    sql: ${TABLE}.`subcategory`;;
  }


  measure: amount {
    label: "amount"
    group_label: "Stuff"
    type: sum
    sql: ${TABLE}.`amount`;;
  }
}

When I explore this view, I add category, subcategory, and amount to the results table, click the Subtotals checkbox, and press Run. This is the SQL query generated:

SELECT
    `directtest_category_1`,
    `directtest_subcategory_1`,
    `directtest_amount_1`,
    `z___grouping`
FROM
    (SELECT
            DirectTest.`category` `directtest_category_1`,
            DirectTest.`subcategory` `directtest_subcategory_1`,
            COALESCE(SUM(DirectTest.`amount`), 0) `directtest_amount_1`,
                (CAST(FLOOR(LOG2(GROUPING_ID() * 2 + 1)) AS INT)) `z___grouping`
        FROM `test`.`test` AS DirectTest
        GROUP BY
            DirectTest.`category`, DirectTest.`subcategory` WITH ROLLUP
        ORDER BY
            CASE WHEN `z___grouping` >= 2 THEN `z___grouping` ELSE 0 END,
            `directtest_category_1`,
            `z___grouping`
        LIMIT 1000) `t4`

I get back 9 results, which is just every row in the underlying table. When I run the query directly in Spark, I get back 13 rows, which includes the 3 subtotals and the grand total. I have tried the same test case in Hive using the Apache Hive 2 dialect and I get different results. This is the query:

SELECT
  DirectTest.category AS directtest_category_1,
  DirectTest.subcategory AS directtest_subcategory_1,
  SUM(test.amount) AS directtest_amount_1,
  (
    2 - CAST(FLOOR(LOG2(GROUPING__ID * 2 + 1)) AS INT)
  ) as z___grouping
  , GROUPING__ID
FROM `test`.`test` AS DirectTest
GROUP BY
  DirectTest.category,
  DirectTest.subcategory WITH ROLLUP
ORDER BY
  CASE WHEN z___grouping >= 2 THEN z___grouping ELSE 0 END,
  directtest_category_1,
  directtest_amount_1 DESC,
  z___grouping
LIMIT
  1000;

This only returns the grand total row. When I run the query directly in Hive, I get 13 rows back, but the ordering is different than Spark.

Is there some sort of issue with the view definition?