Say your data is stored in a relational store using child to parent relationship (e.g. one row per unique pair of child and its direct parent). Imagine now that this is a generic N-ary Tree and it has indefinite number of levels. An example of such a tree can be observed in the category hierarchy of Foursquare.
Traversing this tree presents a problem. What if you need to match a child with its top-most parent? Or what if you need to match some category with all of its children (direct and non-direct)?
Below is a solution that returns the top parent for every category.
WITH RECURSIVE parent_category(parent_foursquare_id, parent_id, orig_foursquare_id ) AS (
, foursquare_id AS parent_id
, foursquare_id AS orig_foursquare_id
-- next iteration's id
-- this iteration's id
-- carry forward the original category id to connect indirect parent/child nodes in one row
, categories c
-- recursive step
WHERE c.foursquare_id = pc.parent_foursquare_id
parent_id AS top_most_parent_id
, orig_foursquare_id AS category_id
-- no more parents for this top_most_parent
-- without this line, category_id will be matched with all of its direct and indirect parents
parent_foursquare_id IS NULL
GROUP BY 1, 2
Just a note on this. Many databases - not just postgres derived - will support this use of CTEs, and the recursive keyword is not always required. For example, this can be done with SqlServer too.
RedShift is currently an important exception. This does not work for RedShift.
Oracle can do similar things using the non-standard CONNECT BY … START WITH syntax.