Knowledge Drop

MySQL and grouping

  • 7 July 2021
  • 0 replies
  • 16 views

Userlevel 5
Badge
  • Looker Staff
  • 171 replies

Last tested: Mar 25, 2019
 

The Problem

In MySQL, you can write a measure which applies arithmetic between a measure and dimension.

measure: bad_measure {

type: number

sql: ${measure} * ${dimension} ;;

}

 

In Redshift (and most other databases), the same measure throws an error: ERROR: column must appear in the GROUP BY clause or be used in an aggregate function . Why isn't every dialect as cool as MySQL?

The Reasoning

MySQL is not the cool one here. MySQL makes some really weird assumptions about grouping. Take for example this query which selects all the "years" in my video game database. Year isn't a primary key, and we aren't grouping by year, so we see several duplicates. This seems expected.

1.png

The problem arises when we also select a COUNT(*) without any grouping. As expected, in this query we get back one row with the full count of records. But we also see the "year" here as well, with a random value chosen. Most databases would have errored here instead of showing us a year.

2.png

While this may seem purely cosmetic, this can lead to bad data fast once you start doing arithmetic between the aggregated count and unaggregated year. Since you don't know which year will be used, your result in this query is probably unexpected and incorrect.

3.png

 

This content is subject to limited support.                

 

 


0 replies

Be the first to reply!

Reply