Question

Subtotal not displaying correctly

  • 7 January 2022
  • 0 replies
  • 71 views

Using an Apache Spark 3+ connection, I am trying to display simple subtotal results.  Instead, the results are only showing the underlying rows.  This is the test data I am using:

 

CREATE TABLE test.test AS
SELECT t.category, t.subcategory, t.amount
FROM
(
SELECT 'A' AS category, 'A-1' AS subcategory, 1 AS amount
UNION ALL
SELECT 'A' AS category, 'A-2' AS subcategory, 1 AS amount
UNION ALL
SELECT 'A' AS category, 'A-3' AS subcategory, 1 AS amount
UNION ALL
SELECT 'B' AS category, 'B-1' AS subcategory, 1 AS amount
UNION ALL
SELECT 'B' AS category, 'B-2' AS subcategory, 1 AS amount
UNION ALL
SELECT 'B' AS category, 'B-3' AS subcategory, 1 AS amount
UNION ALL
SELECT 'C' AS category, 'C-1' AS subcategory, 1 AS amount
UNION ALL
SELECT 'C' AS category, 'C-2' AS subcategory, 1 AS amount
UNION ALL
SELECT 'C' AS category, 'C-3' AS subcategory, 1 AS amount
) t

My view looks like:

view: DirectTest {
label: "Test"
sql_table_name: `test`.`test`;;
dimension: category {
label: "category"
type: string
sql: ${TABLE}.`category`;;
}

dimension: subcategory {
label: "subcategory"
type: string
sql: ${TABLE}.`subcategory`;;
}


measure: amount {
label: "amount"
group_label: "Stuff"
type: sum
sql: ${TABLE}.`amount`;;
}
}

When I explore this view, I add category, subcategory, and amount to the results table, click the Subtotals checkbox, and press Run.  This is the SQL query generated:

SELECT
`directtest_category_1`,
`directtest_subcategory_1`,
`directtest_amount_1`,
`z___grouping`
FROM
(SELECT
DirectTest.`category` `directtest_category_1`,
DirectTest.`subcategory` `directtest_subcategory_1`,
COALESCE(SUM(DirectTest.`amount`), 0) `directtest_amount_1`,
(CAST(FLOOR(LOG2(GROUPING_ID() * 2 + 1)) AS INT)) `z___grouping`
FROM `test`.`test` AS DirectTest
GROUP BY
DirectTest.`category`, DirectTest.`subcategory` WITH ROLLUP
ORDER BY
CASE WHEN `z___grouping` >= 2 THEN `z___grouping` ELSE 0 END,
`directtest_category_1`,
`z___grouping`
LIMIT 1000) `t4`

I get back 9 results, which is just every row in the underlying table.  When I run the query directly in Spark, I get back 13 rows, which includes the 3 subtotals and the grand total.  I have tried the same test case in Hive using the Apache Hive 2 dialect and I get different results.   This is the query:

SELECT
DirectTest.category AS directtest_category_1,
DirectTest.subcategory AS directtest_subcategory_1,
SUM(test.amount) AS directtest_amount_1,
(
2 - CAST(FLOOR(LOG2(GROUPING__ID * 2 + 1)) AS INT)
) as z___grouping
, GROUPING__ID
FROM `test`.`test` AS DirectTest
GROUP BY
DirectTest.category,
DirectTest.subcategory WITH ROLLUP
ORDER BY
CASE WHEN z___grouping >= 2 THEN z___grouping ELSE 0 END,
directtest_category_1,
directtest_amount_1 DESC,
z___grouping
LIMIT
1000;

This only returns the grand total row.  When I run the query directly in Hive, I get 13 rows back, but the ordering is different than Spark.

Is there some sort of issue with the view definition?  


0 replies

Be the first to reply!

Reply