Grouping by date types instead of strings

jwp · 06-14-2016 06:50 PM

For queries that group by a date, keeping the column as a date is much faster than converting it to a string. I wouldn’t be surprised if this is true for all databases; it is definitely true for Snowflake. Here’s an example:

select date_trunc('month',d), count(1) from foo group by 1
vs
select to_char(date_trunc('month',d),'YYYY-MM'), count(1) from foo group by 1

Looker generates the second version when selecting a month timeframe from a dimension_group for d. But the first version runs about 20x faster!

Ideally Looker would generate sql like the first query, but in lieu of that, what are people’s favorite workarounds?

Rich1000

I’ve also noticed it convert timestamps to strings when there’s no need:

SELECT 
  TO_CHAR(orders.created_at, 'YYYY-MM-DD HH24:MI:SS') AS "orders.created_time",
  COUNT(*) AS "orders.count"
FROM orders
GROUP BY 1
LIMIT 500;

(i.e. orders.created_at is already a timestamp)

lloydtabb

The reason Looker does this is consistent display between dialects (and so the strings can be re-applied as filters in a consistent way when drilling).

Old LookML

``` - dimension: fast_month sql: date_trunc('month',d) ```

New LookML

``` dimension: fast_month { sql: date_trunc('month',d) ;; } ```

The problem is the month is displayed like:

@jwp, I’m surprised at the speed differences you are seeing.

jwp

Thanks, I thought of the string version too : )! But as you pointed out it’s not great for filtering and drilling…

For more context, the data is laid out ordered by the date column with

create table foo_ordered as (select * from foo order by d)

The snowflake team encouraged this, since it indexes along many (all?) columns. I’m handwaving/guessing here, but maybe in other databases, if the col isn’t indexed, the cost of switching to strings and string comparisons isn’t as high.

tig

For each old LookML code block in this article and its comments, we just added the New LookML equivalent code.

jpiersol

Hello,

We have noticed the same thing (and, we use Snowflake). Has anybody else experienced this and come up with a clever work-around?

Thanks!
Jake

Alex_Hancock

we’ve definitely seen differences on Redshift too, not of this magnitude but it’s definitely slower!

TimothyBurke

I’m using Redshift under the hood. I haven’t detected performance implications, but I would love if there was some way to configure Looker to select truncated dates instead of character conversions.

For a given query aggregating data into monthly buckets, SQL could be generated one of two ways which would return one of two values for the same date10/20/2017, :

date_trunc('month',date) returns 10/1/2017 (as a timstamp)
to_char(date_trunc('month',d),'YYYY-MM') returns '2017-10' (as string) - Looker’s current behavior

The former is preferable to the latter in most instances in my experience, especially when using Looker to produce or refresh a dataset which is then used in Excel for adhoc calculations, forecasting exercises, etc., since it’s recognized as a date, and month/year date logic in Excel can be applied. I often find myself translating Looker month data (formatted as YYYY-MM) back to date format in Excel with:
=DATE(VALUE(LEFT(A1,4)),VALUE(RIGHT(A1,2)),1)

I realize that you could define a new dimension, so where I have a purchase_date and purchase_month I could define purchase_first_of_month or something like that and group to appear with other fields, but with hundreds of date fields defined across all of the views in our various models, defining this dimension by dimension would be pretty tedious to implement and maintain going forward, and clutter the LookML.

fabio1

Just FYI, we have been dropping the to_char from our grouping clause since 4.20:

Looker 4.20 Release Notes Release Notes

Anticipated Deployment Dates Release Rollout Begins: August 13, 2017 Release Final & Download Available: August 28, 2017 All Release Notes Legacy Features End-of-life Schedule Release Highlights In addition to general tweaks and enhancements, this release comes with new and improved features in the following categories. Read on for more detail. Introduced the ability to schedule Looks and Dashboards when an ETL completes. Added a scheduling configuration option to remove links back to …