The Problem

sam8 · ‎07-07-2021

Knowledge Drop

Last tested: Mar 25, 2019

The Problem

In MySQL, you can write a measure which applies arithmetic between a measure and dimension.

measure: bad_measure {

type: number

sql: ${measure} * ${dimension} ;;

}

In Redshift (and most other databases), the same measure throws an error: ERROR: column must appear in the GROUP BY clause or be used in an aggregate function . Why isn't every dialect as cool as MySQL?

The Reasoning

MySQL is not the cool one here. MySQL makes some really weird assumptions about grouping. Take for example this query which selects all the "years" in my video game database. Year isn't a primary key, and we aren't grouping by year, so we see several duplicates. This seems expected.

The problem arises when we also select a COUNT(*) without any grouping. As expected, in this query we get back one row with the full count of records. But we also see the "year" here as well, with a random value chosen. Most databases would have errored here instead of showing us a year.

While this may seem purely cosmetic, this can lead to bad data fast once you start doing arithmetic between the aggregated count and unaggregated year. Since you don't know which year will be used, your result in this query is probably unexpected and incorrect.

This content is subject to limited support.

MySQL and grouping

The Problem

The Reasoning