This content, written by Scott Hoover, was initially posted in Looker Blog on Mar 27, 2014. The content is subject to limited support.
Recently, the Looker Analytics team was posed with a unique challenge by a client: come up with a way to exclude weekends from an analysis that calculates the duration between two times.
The data in question captures the rate at which jobs move from various stages in production. Each row of the table of interest (hereinafter velocity) represents a single job — which, along with other attributes, has an approved and a completed time.
velocity table
1  20130731 04:00:00  20130801 11:00:00  20130804 13:00:00  ...  1  1 
2  20130307 07:00:00  20130307 10:00:00  20130312 14:00:00  ...  4  1 
...  ...  ...  ...  ...  ...  ... 
n  20130507 09:00:00  20130508 02:00:00  20130516 12:00:00  ...  3  2 
The first pass at addressing approvedtocomplete velocity relied on a simple date difference:
SELECT
job_id
,
approved_at
,
published_at
,
EXTRACT
(
'EPOCH'
FROM
(
completed_at

approved_at
))
/
3600
AS
velocity_in_hours
FROM
velocity
The problem with this approach is that some jobs start in one week, extend over one or more weekends, and are completed in some subsequent week; however, work is not done on weekends, and the velocity for jobs that extend over one or more weekends is penalized in a sense. This is why Looker was asked to come up with a solution to exclude weekends from the analysis.
Solution
A few approaches were posed, but the final solution relied on a lookup table containing three columns: a date_time field, a day_of_week field, and an integer, hour_seq, indicating the number of hours since the first hour in this table (hereinafter sequence). The table would be used to get a corresponding integer (again, the number of hours since the beginning of the sequence table) for the approved_at and completed_at for each job and then subtract the two, yielding the number of hours between the two times.
The key to the solution was that the integers in the sequence table would not increment on weekend hours. The resulting table would look like this:
sequence table
20080101 00:00:00  2  1 
20080101 01:00:00  2  2 
20080101 02:00:00  2  3 
...  ...  ... 
20080104 21:00:00  5  93 
20080104 22:00:00  5  94 
20080104 23:00:00  5  95 
20080105 00:00:00  6  95 
20080105 01:00:00  6  95 
20080105 02:00:00  6  95 
...  ...  ... 
20080106 23:00:00  0  95 
20080107 00:00:00  1  96 
20080107 01:00:00  1  97 
To achieve this, we needed to start with a table of hours from the beginning of time (or 2008, when the business started) going well into the future. We relied on recursive common table expressions (CTEs) in PostgreSQL to generate a sequence of numbers and then add hours accordingly, but this could achieved in a number of ways for dialects without recursive CTEs.
WITH
RECURSIVE
seq
(
n
)
AS
(
SELECT
1
UNION
ALL
SELECT
n
+
1
FROM
seq
)
generate a sequence of numbers
,
hours
AS
(
SELECT
TIMESTAMP
'20080101 00:00:00'
+
n
*
INTERVAL
'1 hour'
AS
date_time
FROM
seq
LIMIT
87600
)
generate ten years of hours from 20080101 00:00:00 into the future
SELECT
date_time
,
EXTRACT
(
dow
FROM
date_time
)
AS
day_of_week
,
SUM
(
CASE
WHEN
EXTRACT
(
dow
FROM
date_time
)
NOT
IN
(
6
,
0
)
THEN
1
ELSE
NULL
END
)
OVER
(
ORDER
BY
date_time
)
AS
hour_seq
FROM
hours
The penultimate step was to twice join the sequence table into velocity and map each created_at and completed_at entry to its corresponding hour_seq. The final velocity table looked something like this:
velocity table
1  20130731 04:00:00  20130801 11:00:00 
34979  20130804 13:00:00 
35015  ...  1  1 
2  20130307 07:00:00 
20130307 10:00:00 
32458  20130312 14:00:00 
32534  ...  4  1 
...  ...  ...  ...  ...  ...  ...  ...  ... 
n  20130507 09:00:00 
20130508 02:00:00 
33506  20130516 12:00:00 
33660  ...  3  2 
With a velocity table that had corresponding hour sequence numbers for each approved and completed time, we could simply difference these two values to see the time spent in production, in hours. This could be easily extended to capture any timeframe.
SELECT
approved_at
,
completed_at
,
completed_int

approved_int
AS
duration_in_hours
FROM
velocity
— fin —