Question

Grouping by date types instead of strings

  • 15 June 2016
  • 8 replies
  • 1292 views

Userlevel 2

For queries that group by a date, keeping the column as a date is much faster than converting it to a string. I wouldn’t be surprised if this is true for all databases; it is definitely true for Snowflake. Here’s an example:


select date_trunc('month',d), count(1) from foo group by 1

vs

select to_char(date_trunc('month',d),'YYYY-MM'), count(1) from foo group by 1


Looker generates the second version when selecting a month timeframe from a dimension_group for d. But the first version runs about 20x faster!



Ideally Looker would generate sql like the first query, but in lieu of that, what are people’s favorite workarounds?


8 replies

Userlevel 3

I’ve also noticed it convert timestamps to strings when there’s no need:


SELECT 
TO_CHAR(orders.created_at, 'YYYY-MM-DD HH24:MI:SS') AS "orders.created_time",
COUNT(*) AS "orders.count"
FROM orders
GROUP BY 1
LIMIT 500;

(i.e. orders.created_at is already a timestamp)

Userlevel 6
Badge

The reason Looker does this is consistent display between dialects (and so the strings can be re-applied as filters in a consistent way when drilling).



Old LookML
```
- dimension: fast_month
sql: date_trunc('month',d)
```


New LookML
```
dimension: fast_month {
sql: date_trunc('month',d) ;;
}
```

The problem is the month is displayed like:



@jwp, I’m surprised at the speed differences you are seeing.

Userlevel 2

Thanks, I thought of the string version too : )! But as you pointed out it’s not great for filtering and drilling…


For more context, the data is laid out ordered by the date column with


create table foo_ordered as (select * from foo order by d)


The snowflake team encouraged this, since it indexes along many (all?) columns. I’m handwaving/guessing here, but maybe in other databases, if the col isn’t indexed, the cost of switching to strings and string comparisons isn’t as high.

For each old LookML code block in this article and its comments, we just added the New LookML equivalent code.

Hello,


We have noticed the same thing (and, we use Snowflake). Has anybody else experienced this and come up with a clever work-around?


Thanks!

Jake

Userlevel 3

we’ve definitely seen differences on Redshift too, not of this magnitude but it’s definitely slower!

I’m using Redshift under the hood. I haven’t detected performance implications, but I would love if there was some way to configure Looker to select truncated dates instead of character conversions.


For a given query aggregating data into monthly buckets, SQL could be generated one of two ways which would return one of two values for the same date10/20/2017, :




  • date_trunc('month',date) returns 10/1/2017 (as a timstamp)


  • to_char(date_trunc('month',d),'YYYY-MM') returns '2017-10' (as string) - Looker’s current behavior


The former is preferable to the latter in most instances in my experience, especially when using Looker to produce or refresh a dataset which is then used in Excel for adhoc calculations, forecasting exercises, etc., since it’s recognized as a date, and month/year date logic in Excel can be applied. I often find myself translating Looker month data (formatted as YYYY-MM) back to date format in Excel with:

=DATE(VALUE(LEFT(A1,4)),VALUE(RIGHT(A1,2)),1)


I realize that you could define a new dimension, so where I have a purchase_date and purchase_month I could define purchase_first_of_month or something like that and group to appear with other fields, but with hundreds of date fields defined across all of the views in our various models, defining this dimension by dimension would be pretty tedious to implement and maintain going forward, and clutter the LookML.

Userlevel 6
Badge

Just FYI, we have been dropping the to_char from our grouping clause since 4.20:


Reply