Something very weird has been discovered by one of my colleagues. They were trying to plot some data with day of the month as dimension and few months as pivot.
Our database is set to UTC and the Query time zone to Europe/Madrid.
They noticed that, for example, June had no data on 30th but on 31st, and July didn’t have any data on 31st. Surprisingly August, which has only just begun, also had data on 31st.
I compared the code generated in SQL tab when we use Table Calculations vs timeframe day of month
WITH data AS (
CAST('2021-05-31 23:50:00' AS TIMESTAMP) AS created_at
TIMESTAMP_TRUNC(created_at , DAY),
TIMESTAMP_TRUNC(created_at, HOUR, 'Europe/Madrid'),``
TIMESTAMP_TRUNC(created_at , DAY, 'Europe/Madrid'),
-- Code generated when using timeframe "month"
(FORMAT_TIMESTAMP('%Y-%m', created_at , 'Europe/Madrid')),
-- Code generated when using timeframe "day_of_month"
(EXTRACT(DAY FROM created_at AT TIME ZONE 'Europe/Madrid'),
-- Code generated when using "extract_days" in Table Calculation
EXTRACT(DAY FROM (TIMESTAMP_TRUNC(created_at , DAY, 'Europe/Madrid'))),
If you run the above you will notice the difference between the last three columns.
If the original timestamp is 23:50 in UTC, then the two timeframes are correct - in our time zone it’s already June but the extract_days() function’s behaviour is incorrect.
The problem is that we need to specify time zone in both functions used to generate it: TIMESTAMP_TRUNC and EXTRACT
EXTRACT(DAY FROM (TIMESTAMP_TRUNC(created_at , DAY, 'Europe/Madrid')) AT TIME ZONE 'Europe/Madrid')
This will give us correct day of month.