An indented representation of a parent-child hierarchy

2014-03-30 — 1 Comment

When you’re designing reports, they can often be based on hiearchies represented by “nodes” in a parent-child setup. To the end-user, the parent-child representation doesn’t provide very much readability, so you need to output this information in a human-readable form, for instance in a table where the names/titles are indented.

A good example of data that is organized in a hierarchy is a balance sheet. Used in financial reporting, the balance sheet represents different types of assets and liabilities in a company, divided and subdivided into multiple levels of detail. Because this dimension has a different number of levels depending on where in the dimension we are (a so-called “ragged” dimension), it’s ideally modelled as a parent-child dimension.

In a parent-child dimension, a “parent” column provides each member with a pointer to its parent member. The top-level members use a NULL value as the parent member ID.

Example balance sheet

Here’s an example of a balance sheet in a parent-child dimension that we can use:

DECLARE @balanceSheet TABLE (
    rowID             int NOT NULL,
    parentRowID       int NULL,
    rowDescription    varchar(100) NOT NULL,
    PRIMARY KEY CLUSTERED (rowID)
);

--- Balance sheet example from http://en.wikipedia.org/wiki/Balance_sheet.
--- Slightly modified to readability.
INSERT INTO @balanceSheet (rowID, parentRowID, rowDescription)
VALUES (1,  NULL, 'ASSETS'),
       (2,     1, 'Current Assets'),
       (3,     2, 'Cash and Cash Equivalents'),
       (4,     2, 'Accounts Receivable (Debtors)'),
       (5,     4, 'Less : Allowances for Doubtful Accounts'),
       (6,     2, 'Inventories'),
       (7,     2, 'Prepaid Expenses'),
       (8,     2, 'Investment Securities'),
       (9,     2, 'Other Current Assets'),
       (10,    1, 'Non-Current Assets (Fixed Assets)'),
       (11,   10, 'Property, Plant and Equipment'),
       (12,   11, 'Less : Accumulated Depreciation'),
       (13,   10, 'Investment Securities'),
       (14,   10, 'Investments in Associates'),
       (15,   10, 'Intangible Assets'),
       (16,   15, 'Less : Accumulated Amortization'),
       (17,   10, 'Goodwill'),
       (18,   10, 'Other Non-Current Assets'),
       (19, NULL, ''),
       (20, NULL, 'LIABILITIES and SHAREHOLDERS'' EQUITY'),
       (21,   20, 'LIABILITIES'),
       (22,   21, 'Current Liabilities (due within one year)'),
       (23,   22, 'Accounts Payable'),
       (24,   22, 'Current Income Tax Payable'),
       (25,   22, 'Current portion of Loans Payable'),
       (26,   22, 'Short-term Provisions'),
       (27,   22, 'Other Current Liabilities'),
       (28,   21, ''),
       (29,   21, 'Non-Current Liabilities (due after more than one year)'),
       (30,   29, 'Loans Payable'),
       (31,   29, 'Issued Debt Securities'),
       (32,   29, 'Deferred Tax Liabilities'),
       (33,   29, 'Provisions, e.g. Pension Obligations'),
       (34,   29, 'Other Non-Current Liabilities'),
       (35,   20, 'SHAREHOLDERS'' EQUITY'),
       (36,   35, 'Paid-in Capital'),
       (37,   35, 'Share Capital'),
       (38,   35, 'Share Premium'),
       (39,   38, 'Less: Treasury Shares'),
       (40,   35, 'Retained Earnings'),
       (41,   35, 'Revaluation Reserve'),
       (42,   35, 'Accumulated Other Comprehensive Income');

Using a recursive CTE

There are two main challenges to deal with: We need to indent the dimension members in accordance to how “deep” they are in the hierarchy, and we need to sort the output correctly. In this case, the sort order is by ID by each parent ID. So the first thing you notice is that this is probably going to be a recursive operation of some kind.

A recursive common table expression is built in two parts, the anchor (your starting point), and the recursive part, both unioned together. In this case, the anchor consists of all the “root” nodes, i.e. the nodes that don’t have a parent node ID.

--- Root nodes:
SELECT CAST(0 AS tinyint) AS [level], rowID, parentRowID, rowDescription
FROM @balanceSheet
WHERE parentRowID IS NULL;

The recursive part is where we derive the children of these nodes.

--- Recursion:
SELECT CAST(cte.[level]+1 AS tinyint) AS [level], bs.rowID,
       bs.parentRowID, bs.rowDescription
FROM @balanceSheet AS bs
INNER JOIN cte ON bs.parentRowID=cte.rowID;

Put them together in a recursive common table expression, and you end up with something like this:

WITH cte ([level], rowID, parentRowID, rowDescription)
AS (
    --- Root nodes:
    SELECT CAST(0 AS tinyint) AS [level], rowID, parentRowID, rowDescription
    FROM @balanceSheet
    WHERE parentRowID IS NULL

    UNION ALL

    --- Recursion:
    SELECT CAST(cte.[level]+1 AS tinyint) AS [level], bs.rowID,
           bs.parentRowID, bs.rowDescription
    FROM @balanceSheet AS bs
    INNER JOIN cte ON bs.parentRowID=cte.rowID)

SELECT *
FROM cte;

.. which returns the following result set:

level rowID parentRowID rowDescription
----- ----- ----------- --------------------------------------------------------
0     1     NULL        ASSETS
0     19    NULL        
0     20    NULL        LIABILITIES and SHAREHOLDERS' EQUITY
1     21    20          LIABILITIES
1     35    20          SHAREHOLDERS' EQUITY
2     36    35          Paid-in Capital
2     37    35          Share Capital
2     38    35          Share Premium
2     40    35          Retained Earnings
2     41    35          Revaluation Reserve
2     42    35          Accumulated Other Comprehensive Income
3     39    38          Less: Treasury Shares
2     22    21          Current Liabilities (due within one year)
2     28    21          
2     29    21          Non-Current Liabilities (due after more than one year)
3     30    29          Loans Payable
3     31    29          Issued Debt Securities
3     32    29          Deferred Tax Liabilities
3     33    29          Provisions, e.g. Pension Obligations
3     34    29          Other Non-Current Liabilities
3     23    22          Accounts Payable
3     24    22          Current Income Tax Payable
3     25    22          Current portion of Loans Payable
3     26    22          Short-term Provisions
3     27    22          Other Current Liabilities
1     2     1           Current Assets
1     10    1           Non-Current Assets (Fixed Assets)
2     11    10          Property, Plant and Equipment
2     13    10          Investment Securities
2     14    10          Investments in Associates
2     15    10          Intangible Assets
2     17    10          Goodwill
2     18    10          Other Non-Current Assets
3     16    15          Less : Accumulated Amortization
3     12    11          Less : Accumulated Depreciation
2     3     2           Cash and Cash Equivalents
2     4     2           Accounts Receivable (Debtors)
2     6     2           Inventories
2     7     2           Prepaid Expenses
2     8     2           Investment Securities
2     9     2           Other Current Assets
3     5     4           Less : Allowances for Doubtful Accounts

Note how the “level” column is incremented by 1 for each recursion, so we can use it to calculate the indentation for each row, like this:

...
SELECT REPLICATE('   ', [level])+rowDescription AS rowDescription
FROM cte;

Now, all that remains is to order the rows correctly. In this case, we want to sort the rows by rowID, but individually for each parent node. For this purpose, we’ll construct a “sort column” using ROW_NUMBER(), like this:

ROW_NUMBER() OVER (
            PARTITION BY parentRowID
            ORDER BY rowID) AS sortColumn

Then we’ll pad the leading side of the return value to a constant length string, so we can build a “tree” of sorts. The first row for each parent will get sortColumn=”   1″, and so on. Converting a numeric value to a string and padding it at the same time is conveniently provided by the STR() function. The result looks like this:

CAST(STR(ROW_NUMBER() OVER (
            PARTITION BY parentRowID
            ORDER BY rowID), 4, 0) AS varchar(1024)) AS sortColumn

Note that because the recursive common table expression requires the anchor and the recursive query to have identical datatype definitions, we’ll explicitly CAST() the sortColumn to varchar(1024). Once built into the CTE, here’s the entire statement:

WITH cte ([level], rowID, parentRowID, rowDescription, sortColumn)
AS (
    --- Root nodes:
    SELECT CAST(0 AS tinyint) AS [level],
        rowID,
        parentRowID,
        rowDescription,
        CAST(STR(ROW_NUMBER() OVER (
            PARTITION BY parentRowID
            ORDER BY rowID), 4, 0) AS varchar(1024)) AS sortColumn
    FROM @balanceSheet
    WHERE parentRowID IS NULL

    UNION ALL

    --- Recursion:
    SELECT CAST(cte.[level]+1 AS tinyint) AS [level],
        bs.rowID,
        bs.parentRowID,
        bs.rowDescription,
        CAST(cte.sortColumn+STR(ROW_NUMBER() OVER (
            PARTITION BY bs.parentRowID
            ORDER BY bs.rowID), 4, 0) AS varchar(1024)) AS sortColumn
    FROM @balanceSheet AS bs
    INNER JOIN cte ON bs.parentRowID=cte.rowID)

SELECT sortColumn, rowID,
       --- Indented rowDescription column:
       REPLICATE('   ', [level])+rowDescription AS rowDescription
FROM cte
ORDER BY sortColumn;

To help you understand how the sortColumn works, here’s the final result of the query:

sortColumn        rowID rowDescription
----------------- ----- ------------------------------------------------------------
   1              1     ASSETS
   1   1          2        Current Assets
   1   1   1      3           Cash and Cash Equivalents
   1   1   2      4           Accounts Receivable (Debtors)
   1   1   2   1  5              Less : Allowances for Doubtful Accounts
   1   1   3      6           Inventories
   1   1   4      7           Prepaid Expenses
   1   1   5      8           Investment Securities
   1   1   6      9           Other Current Assets
   1   2          10       Non-Current Assets (Fixed Assets)
   1   2   1      11          Property, Plant and Equipment
   1   2   1   1  12             Less : Accumulated Depreciation
   1   2   2      13          Investment Securities
   1   2   3      14          Investments in Associates
   1   2   4      15          Intangible Assets
   1   2   4   1  16             Less : Accumulated Amortization
   1   2   5      17          Goodwill
   1   2   6      18          Other Non-Current Assets
   2              19    
   3              20    LIABILITIES and SHAREHOLDERS' EQUITY
   3   1          21       LIABILITIES
   3   1   1      22          Current Liabilities (due within one year)
   3   1   1   1  23             Accounts Payable
   3   1   1   2  24             Current Income Tax Payable
   3   1   1   3  25             Current portion of Loans Payable
   3   1   1   4  26             Short-term Provisions
   3   1   1   5  27             Other Current Liabilities
   3   1   2      28          
   3   1   3      29          Non-Current Liabilities (due after more than one year)
   3   1   3   1  30             Loans Payable
   3   1   3   2  31             Issued Debt Securities
   3   1   3   3  32             Deferred Tax Liabilities
   3   1   3   4  33             Provisions, e.g. Pension Obligations
   3   1   3   5  34             Other Non-Current Liabilities
   3   2          35       SHAREHOLDERS' EQUITY
   3   2   1      36          Paid-in Capital
   3   2   2      37          Share Capital
   3   2   3      38          Share Premium
   3   2   3   1  39             Less: Treasury Shares
   3   2   4      40          Retained Earnings
   3   2   5      41          Revaluation Reserve
   3   2   6      42          Accumulated Other Comprehensive Income

See how sortColumn adds a new ROW_COUNT() for each level of recursion, creating a string representation of a “tree”, which in turn can be sorted?

Now, all that remains is to connect the facts, and you’re done!

SELECT REPLICATE('   ', dim.[level])+dim.rowDescription AS rowDescription,
       fact.amount
FROM cte AS dim
LEFT JOIN (
    SELECT rowID, SUM(amount) AS amount
    FROM dbo.balanceSheet
    GROUP BY rowID
    ) AS fact ON
        dim.rowID=fact.rowID
ORDER BY dim.sortColumn;

Check back for a future post on how to calculate totals and subtotals in a parent-child dimension like this. Meanwhile, don’t forget to like the Facebook page to get updates on new posts!

Trackbacks and Pingbacks:

  1. Accumulating values in a parent-child hierarchy « Sunday morning T-SQL - April 6, 2014

    […] week, we looked at how to construct a visual representation of a hierarchy stored as a parent-child table. The obvious next step is to accumulate values stored on those nodes […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s