Question

Generating a sequence of numbers or dates

  • 15 February 2015
  • 2 replies
  • 46 views

Userlevel 3

Sometimes it is useful to generate a sequential list of numbers or dates. This list can be used, for example, to produce records that indicate there is no data for a particular key. Consider the following example…


There is an orders table that includes an order date. For a particular range of days, the desired output is a record that shows the total count and value of the orders for each day. The first attempt for the SQL that expresses this is as follows…


SELECT order_date, count(id), sum(value)
FROM orders
WHERE order_date BETWEEN start_date AND end_date
GROUP BY order_date

This works well enough for a big table with a variety of data. We expect to get a single row representing each day in the date range. If the orders table is sparse, or if there are additional filters the output can be sparse. We might query on a 7 day range, but only get 5 rows of data back. That is because this query only returns a row for each day with at least one order. It won’t return a row with a zero order count.


If we want to get a row for every day in the range, regardless of presence or absence of a row for that day, we need to join it with a table with the complete list of days.


SELECT dates.date AS order_date, count(orders.id), sum(orders.value)
FROM dates LEFT OUTER JOIN orders
ON dates.date = orders.order_date
WHERE dates.date BETWEEN start_date AND end_date
GROUP BY dates.date

This query now generates a row for every date in the range, even if there is now associated order.


If the dates table exists, that is great. Often there will be no such table. In those cases, we might create just such a list on the fly. This query will generate the last 1,000 days.


WITH digit AS (
SELECT 0 AS d UNION ALL
SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL
SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL
SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
),
seq AS (
SELECT a.d + 10*b.d + 100*c.d + 1000*d.d AS num
FROM digit a
CROSS JOIN
digit b
CROSS JOIN
digit c
CROSS JOIN
digit d
ORDER BY 1
)
SELECT current_date - seq.num AS "Date"
FROM seq

The way this works is that first a pseudo-table called digits is defined. This pseudo-table - more properly called a “Common Table Expression” or CTE - has 10 rows, and each row has one value - a digit between 0 and 9.


SELECT 0 AS d UNION ALL
SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL
SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL
SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9

Next a CTE called seq is defined by CROSS JOINing four copies of the digits CTE. This will yield 101010*10 = 10,000 rows - all the permutations of the digits 0-9 in four places. By multiply the digits by 1, 10, 100, and 1000 and adding them together, all the values 0 - 9,999 are obtained.


SELECT a.d + 10*b.d + 100*c.d + 1000*d.d AS num
FROM digit a
CROSS JOIN
digit b
CROSS JOIN
digit c
CROSS JOIN
digit d
ORDER BY 1

Now the final part of the query gets the current date and subtracts each value in the seq (0,1,2,3,…,9997,9998,9999) from it.


SELECT current_date - seq.num AS "Date"
FROM seq

This now provides us with the an expression that we can use instead of a dates table. Our query would look like this.


WITH digit AS (
SELECT 0 AS d UNION ALL
SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL
SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL
SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
),
seq AS (
SELECT a.d + 10*b.d + 100*c.d + 1000*d.d AS num
FROM digit a
CROSS JOIN
digit b
CROSS JOIN
digit c
CROSS JOIN
digit d
ORDER BY 1
),
dates AS (
SELECT current_date - seq.num AS "Date"
FROM seq
)
SELECT dates.Date AS order_date, count(orders.id), sum(orders.value)
FROM dates LEFT OUTER JOIN orders
ON dates.Date = orders.order_date
WHERE dates.Date BETWEEN start_date AND end_date
-- This is redundant
--AND orders.order_date BETWEEN start_date AND end_date
-- but it might make the query more efficient
-- on some databases
GROUP BY dates.Date

The same technique is use by @anika in How to generate a date series in MySQL but MySQL does not support factoring out CTEs with the WITH clause of a SELECT statement.


2 replies

Postgres has a generate_series() function that simplifies this.


SELECT

generate_series(‘2014-01-01’::timestamp, ‘2014-06-01’, ‘1 month’)


generate_series


2014-01-01 00:00:00

2014-02-01 00:00:00

2014-03-01 00:00:00

2014-04-01 00:00:00

2014-05-01 00:00:00

2014-06-01 00:00:00

Userlevel 6
Badge

Closing in favor of Generating a Numbers Table in MySQL and Redshift

Reply