Question

How to zero fill dates in Looker

  • 14 February 2015
  • 5 replies
  • 1975 views

Userlevel 5
Badge

A daily_usage table can be useful for summarizing usage per day. But what if there is a day without any usage information? Having missing dates when there was no usage could make analysis misleading or confusing. Thus in this case, we will want to have one row for each date, regardless of whether there is usage information on that date.



This can be done in a derived table using a generated date series as the base table. Lets create a simple daily_usage derived table that counts total events per day:





Old LookML



- view: daily_active_users_new

derived_table:

sql: |

SELECT general_date.date AS general_date

, COUNT(*) AS event_count

FROM

events









New LookML



view: daily_active_users_new {

derived_table: {

sql: SELECT general_date.date AS general_date

, COUNT(*) AS event_count

FROM

events

;;

}

}





However, this table may be missing rows for dates without events. To ensure that all dates are included in this daily_usage table, we can put a generated date series in the FROM clause, and then join events to that date series:



Note: the generated date series will vary based on the dialect of your database.



PostgreSQL



- view: daily_active_users_new

derived_table:

sql: |

SELECT general_date.date AS date

, COUNT(*) AS event_count

FROM (SELECT GENERATE_SERIES('2012-10-12', CURRENT_DATE, '1 day')::DATE AS date) AS general_date

LEFT JOIN events

ON events.created_date = general_date.date



fields:



- dimension_group: event

type: time

timeframes: [time, date, month, year]

sql: ${TABLE}.date



- dimension: event_count

type: number

sql: ${TABLE}.event_count



MySQL and Redshift



- view: daily_active_users_new

derived_table:

sql: |

SELECT general_date.date AS date

, COUNT(*) AS event_count

FROM (

SELECT

DATE_ADD('2001-01-01', INTERVAL numbers.number DAY) AS date

--DATEADD(day,number::int,'2001-01-01') AS date ##redshift syntax

FROM

(SELECT

p0.n

+ p1.n*2

+ p2.n * POWER(2,2)

+ p3.n * POWER(2,3)

+ p4.n * POWER(2,4)

+ p5.n * POWER(2,5)

+ p6.n * POWER(2,6)

+ p7.n * POWER(2,7)

as number

FROM

(SELECT 0 as n UNION SELECT 1) p0,

(SELECT 0 as n UNION SELECT 1) p1,

(SELECT 0 as n UNION SELECT 1) p2,

(SELECT 0 as n UNION SELECT 1) p3,

(SELECT 0 as n UNION SELECT 1) p4,

(SELECT 0 as n UNION SELECT 1) p5,

(SELECT 0 as n UNION SELECT 1) p6,

(SELECT 0 as n UNION SELECT 1) p7) as numbers) as general_date

LEFT JOIN events

ON events.created_date = general_date.date



fields:



- dimension_group: event

type: time

timeframes: [time, date, month, year]

sql: ${TABLE}.date



- dimension: event_count

type: number

sql: ${TABLE}.event_count



Alternative MySQL Approach



- view: daily_active_users_new

derived_table:

sql: |

SELECT general_date.date AS date

, COUNT(*) AS event_count

FROM (

SELECT date

FROM (

SELECT curdate() - interval (a.a + (10 * b.a) + (100 * c.a) + (1000 * d.a) + (10000 * e.a)) day as date

FROM (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as a

CROSS JOIN (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as b

CROSS JOIN (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as c

CROSS JOIN (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as d

CROSS JOIN (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as e

) dates

WHERE date >= '2012-01-01'

ORDER BY date) as general_date

LEFT JOIN events

ON events.created_date = general_date.date



fields:



- dimension_group: event

type: time

timeframes: [time, date, month, year]

sql: ${TABLE}.date



- dimension: event_count

type: number

sql: ${TABLE}.event_count


5 replies

There are many data dimension generators out there. Look for the the Kimball Group Generator as one example. These date and time dimension generators allow for adding columns for more complex time groupings than what looker provides out of the box.

I just remembered about this generator which is very handy. Remember that one of the things about a date dimension is that it give you the power to do exotic things with grouping. Things like adding a column for third of a day or same day of week last quarter without complicated sql. I highly recommend you read up on the Data Warehouse Toolkit’s Date Dimension sections in Kimbal’s Book. They are the standard.

Just wanted to drop a quick reminder that now Looker can fill in gaps in dates without a date dimension. Thanks @lindsey!

Userlevel 2

Hi @Rex,



Correct, with Looker 4.0+, we have introduced Dimension Fill that will do this for you. More on that here. With Looker 4.2+, Dimension Fill now be a parameter allow_fill for relevant dimensions. Learn more about that here.



Cheers,


Vince

Userlevel 3

If you’re doing this using BigQuery, this article from SO may be helpful to create a range of dates.



Skipping straight to the dessert, to get a list of date values between two dates (2015-06-1 until the current date in this example):





SELECT day FROM

UNNEST(GENERATE_DATE_ARRAY(DATE('2015-06-01'), CURRENT_DATE(), INTERVAL 1 DAY)) AS day



Reply