Generating a sequence of numbers or dates

  • 15 February 2015
  • 2 replies
  • 777 views

Userlevel 3

Sometimes it is useful to generate a sequential list of numbers or dates. This list can be used, for example, to produce records that indicate there is no data for a particular key. Consider the following example…



There is an orders table that includes an order date. For a particular range of days, the desired output is a record that shows the total count and value of the orders for each day. The first attempt for the SQL that expresses this is as follows…



SELECT order_date, count(id), sum(value)

FROM orders

WHERE order_date BETWEEN start_date AND end_date

GROUP BY order_date



This works well enough for a big table with a variety of data. We expect to get a single row representing each day in the date range. If the orders table is sparse, or if there are additional filters the output can be sparse. We might query on a 7 day range, but only get 5 rows of data back. That is because this query only returns a row for each day with at least one order. It won’t return a row with a zero order count.



If we want to get a row for every day in the range, regardless of presence or absence of a row for that day, we need to join it with a table with the complete list of days.



SELECT dates.date AS order_date, count(orders.id), sum(orders.value)

FROM dates LEFT OUTER JOIN orders

ON dates.date = orders.order_date

WHERE dates.date BETWEEN start_date AND end_date

GROUP BY dates.date



This query now generates a row for every date in the range, even if there is now associated order.



If the dates table exists, that is great. Often there will be no such table. In those cases, we might create just such a list on the fly. This query will generate the last 1,000 days.



WITH digit AS (

SELECT 0 AS d UNION ALL

SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL

SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL

SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9

),

seq AS (

SELECT a.d + 10*b.d + 100*c.d + 1000*d.d AS num

FROM digit a

CROSS JOIN

digit b

CROSS JOIN

digit c

CROSS JOIN

digit d

ORDER BY 1

)

SELECT current_date - seq.num AS "Date"

FROM seq



The way this works is that first a pseudo-table called digits is defined. This pseudo-table - more properly called a “Common Table Expression” or CTE - has 10 rows, and each row has one value - a digit between 0 and 9.



SELECT 0 AS d UNION ALL

SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL

SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL

SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9



Next a CTE called seq is defined by CROSS JOINing four copies of the digits CTE. This will yield 101010*10 = 10,000 rows - all the permutations of the digits 0-9 in four places. By multiply the digits by 1, 10, 100, and 1000 and adding them together, all the values 0 - 9,999 are obtained.



SELECT a.d + 10*b.d + 100*c.d + 1000*d.d AS num

FROM digit a

CROSS JOIN

digit b

CROSS JOIN

digit c

CROSS JOIN

digit d

ORDER BY 1



Now the final part of the query gets the current date and subtracts each value in the seq (0,1,2,3,…,9997,9998,9999) from it.



SELECT current_date - seq.num AS "Date"

FROM seq



This now provides us with the an expression that we can use instead of a dates table. Our query would look like this.



WITH digit AS (

SELECT 0 AS d UNION ALL

SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL

SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL

SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9

),

seq AS (

SELECT a.d + 10*b.d + 100*c.d + 1000*d.d AS num

FROM digit a

CROSS JOIN

digit b

CROSS JOIN

digit c

CROSS JOIN

digit d

ORDER BY 1

),

dates AS (

SELECT current_date - seq.num AS "Date"

FROM seq

)

SELECT dates.Date AS order_date, count(orders.id), sum(orders.value)

FROM dates LEFT OUTER JOIN orders

ON dates.Date = orders.order_date

WHERE dates.Date BETWEEN start_date AND end_date

-- This is redundant

--AND orders.order_date BETWEEN start_date AND end_date

-- but it might make the query more efficient

-- on some databases

GROUP BY dates.Date



The same technique is use by @anika in How to generate a date series in MySQL but MySQL does not support factoring out CTEs with the WITH clause of a SELECT statement.


2 replies

Postgres has a generate_series() function that simplifies this.


SELECT

generate_series(‘2014-01-01’::timestamp, ‘2014-06-01’, ‘1 month’)


generate_series


2014-01-01 00:00:00

2014-02-01 00:00:00

2014-03-01 00:00:00

2014-04-01 00:00:00

2014-05-01 00:00:00

2014-06-01 00:00:00

Userlevel 6
Badge

Closing in favor of Generating a Numbers Table in MySQL and Redshift

Reply