When you’re designing reports, they can often be based on hiearchies represented by “nodes” in a parent-child setup. To the end-user, the parent-child representation doesn’t provide very much readability, so you need to output this information in a human-readable form, for instance in a table where the names/titles are indented.
A good example of data that is organized in a hierarchy is a balance sheet. Used in financial reporting, the balance sheet represents different types of assets and liabilities in a company, divided and subdivided into multiple levels of detail. Because this dimension has a different number of levels depending on where in the dimension we are (a so-called “ragged” dimension), it’s ideally modelled as a parent-child dimension.
In a parent-child dimension, a “parent” column provides each member with a pointer to its parent member. The top-level members use a NULL value as the parent member ID.
Example balance sheet
Here’s an example of a balance sheet in a parent-child dimension that we can use:
DECLARE @balanceSheet TABLE ( rowID int NOT NULL, parentRowID int NULL, rowDescription varchar(100) NOT NULL, PRIMARY KEY CLUSTERED (rowID) ); --- Balance sheet example from http://en.wikipedia.org/wiki/Balance_sheet. --- Slightly modified to readability. INSERT INTO @balanceSheet (rowID, parentRowID, rowDescription) VALUES (1, NULL, 'ASSETS'), (2, 1, 'Current Assets'), (3, 2, 'Cash and Cash Equivalents'), (4, 2, 'Accounts Receivable (Debtors)'), (5, 4, 'Less : Allowances for Doubtful Accounts'), (6, 2, 'Inventories'), (7, 2, 'Prepaid Expenses'), (8, 2, 'Investment Securities'), (9, 2, 'Other Current Assets'), (10, 1, 'Non-Current Assets (Fixed Assets)'), (11, 10, 'Property, Plant and Equipment'), (12, 11, 'Less : Accumulated Depreciation'), (13, 10, 'Investment Securities'), (14, 10, 'Investments in Associates'), (15, 10, 'Intangible Assets'), (16, 15, 'Less : Accumulated Amortization'), (17, 10, 'Goodwill'), (18, 10, 'Other Non-Current Assets'), (19, NULL, ''), (20, NULL, 'LIABILITIES and SHAREHOLDERS'' EQUITY'), (21, 20, 'LIABILITIES'), (22, 21, 'Current Liabilities (due within one year)'), (23, 22, 'Accounts Payable'), (24, 22, 'Current Income Tax Payable'), (25, 22, 'Current portion of Loans Payable'), (26, 22, 'Short-term Provisions'), (27, 22, 'Other Current Liabilities'), (28, 21, ''), (29, 21, 'Non-Current Liabilities (due after more than one year)'), (30, 29, 'Loans Payable'), (31, 29, 'Issued Debt Securities'), (32, 29, 'Deferred Tax Liabilities'), (33, 29, 'Provisions, e.g. Pension Obligations'), (34, 29, 'Other Non-Current Liabilities'), (35, 20, 'SHAREHOLDERS'' EQUITY'), (36, 35, 'Paid-in Capital'), (37, 35, 'Share Capital'), (38, 35, 'Share Premium'), (39, 38, 'Less: Treasury Shares'), (40, 35, 'Retained Earnings'), (41, 35, 'Revaluation Reserve'), (42, 35, 'Accumulated Other Comprehensive Income');
Using a recursive CTE
There are two main challenges to deal with: We need to indent the dimension members in accordance to how “deep” they are in the hierarchy, and we need to sort the output correctly. In this case, the sort order is by ID by each parent ID. So the first thing you notice is that this is probably going to be a recursive operation of some kind.
A recursive common table expression is built in two parts, the anchor (your starting point), and the recursive part, both unioned together. In this case, the anchor consists of all the “root” nodes, i.e. the nodes that don’t have a parent node ID.
--- Root nodes:
SELECT CAST(0 AS tinyint) AS [level], rowID, parentRowID, rowDescription
FROM @balanceSheet
WHERE parentRowID IS NULL;
The recursive part is where we derive the children of these nodes.
--- Recursion:
SELECT CAST(cte.[level]+1 AS tinyint) AS [level], bs.rowID,
bs.parentRowID, bs.rowDescription
FROM @balanceSheet AS bs
INNER JOIN cte ON bs.parentRowID=cte.rowID;
Put them together in a recursive common table expression, and you end up with something like this:
WITH cte ([level], rowID, parentRowID, rowDescription) AS ( --- Root nodes: SELECT CAST(0 AS tinyint) AS [level], rowID, parentRowID, rowDescription FROM @balanceSheet WHERE parentRowID IS NULL UNION ALL --- Recursion: SELECT CAST(cte.[level]+1 AS tinyint) AS [level], bs.rowID, bs.parentRowID, bs.rowDescription FROM @balanceSheet AS bs INNER JOIN cte ON bs.parentRowID=cte.rowID) SELECT * FROM cte;
.. which returns the following result set:
level rowID parentRowID rowDescription ----- ----- ----------- -------------------------------------------------------- 0 1 NULL ASSETS 0 19 NULL 0 20 NULL LIABILITIES and SHAREHOLDERS' EQUITY 1 21 20 LIABILITIES 1 35 20 SHAREHOLDERS' EQUITY 2 36 35 Paid-in Capital 2 37 35 Share Capital 2 38 35 Share Premium 2 40 35 Retained Earnings 2 41 35 Revaluation Reserve 2 42 35 Accumulated Other Comprehensive Income 3 39 38 Less: Treasury Shares 2 22 21 Current Liabilities (due within one year) 2 28 21 2 29 21 Non-Current Liabilities (due after more than one year) 3 30 29 Loans Payable 3 31 29 Issued Debt Securities 3 32 29 Deferred Tax Liabilities 3 33 29 Provisions, e.g. Pension Obligations 3 34 29 Other Non-Current Liabilities 3 23 22 Accounts Payable 3 24 22 Current Income Tax Payable 3 25 22 Current portion of Loans Payable 3 26 22 Short-term Provisions 3 27 22 Other Current Liabilities 1 2 1 Current Assets 1 10 1 Non-Current Assets (Fixed Assets) 2 11 10 Property, Plant and Equipment 2 13 10 Investment Securities 2 14 10 Investments in Associates 2 15 10 Intangible Assets 2 17 10 Goodwill 2 18 10 Other Non-Current Assets 3 16 15 Less : Accumulated Amortization 3 12 11 Less : Accumulated Depreciation 2 3 2 Cash and Cash Equivalents 2 4 2 Accounts Receivable (Debtors) 2 6 2 Inventories 2 7 2 Prepaid Expenses 2 8 2 Investment Securities 2 9 2 Other Current Assets 3 5 4 Less : Allowances for Doubtful Accounts
Note how the “level” column is incremented by 1 for each recursion, so we can use it to calculate the indentation for each row, like this:
...
SELECT REPLICATE(' ', [level])+rowDescription AS rowDescription
FROM cte;
Now, all that remains is to order the rows correctly. In this case, we want to sort the rows by rowID, but individually for each parent node. For this purpose, we’ll construct a “sort column” using ROW_NUMBER(), like this:
ROW_NUMBER() OVER ( PARTITION BY parentRowID ORDER BY rowID) AS sortColumn
Then we’ll pad the leading side of the return value to a constant length string, so we can build a “tree” of sorts. The first row for each parent will get sortColumn=” 1″, and so on. Converting a numeric value to a string and padding it at the same time is conveniently provided by the STR() function. The result looks like this:
CAST(STR(ROW_NUMBER() OVER ( PARTITION BY parentRowID ORDER BY rowID), 4, 0) AS varchar(1024)) AS sortColumn
Note that because the recursive common table expression requires the anchor and the recursive query to have identical datatype definitions, we’ll explicitly CAST() the sortColumn to varchar(1024). Once built into the CTE, here’s the entire statement:
WITH cte ([level], rowID, parentRowID, rowDescription, sortColumn) AS ( --- Root nodes: SELECT CAST(0 AS tinyint) AS [level], rowID, parentRowID, rowDescription, CAST(STR(ROW_NUMBER() OVER ( PARTITION BY parentRowID ORDER BY rowID), 4, 0) AS varchar(1024)) AS sortColumn FROM @balanceSheet WHERE parentRowID IS NULL UNION ALL --- Recursion: SELECT CAST(cte.[level]+1 AS tinyint) AS [level], bs.rowID, bs.parentRowID, bs.rowDescription, CAST(cte.sortColumn+STR(ROW_NUMBER() OVER ( PARTITION BY bs.parentRowID ORDER BY bs.rowID), 4, 0) AS varchar(1024)) AS sortColumn FROM @balanceSheet AS bs INNER JOIN cte ON bs.parentRowID=cte.rowID) SELECT sortColumn, rowID, --- Indented rowDescription column: REPLICATE(' ', [level])+rowDescription AS rowDescription FROM cte ORDER BY sortColumn;
To help you understand how the sortColumn works, here’s the final result of the query:
sortColumn rowID rowDescription ----------------- ----- ------------------------------------------------------------ 1 1 ASSETS 1 1 2 Current Assets 1 1 1 3 Cash and Cash Equivalents 1 1 2 4 Accounts Receivable (Debtors) 1 1 2 1 5 Less : Allowances for Doubtful Accounts 1 1 3 6 Inventories 1 1 4 7 Prepaid Expenses 1 1 5 8 Investment Securities 1 1 6 9 Other Current Assets 1 2 10 Non-Current Assets (Fixed Assets) 1 2 1 11 Property, Plant and Equipment 1 2 1 1 12 Less : Accumulated Depreciation 1 2 2 13 Investment Securities 1 2 3 14 Investments in Associates 1 2 4 15 Intangible Assets 1 2 4 1 16 Less : Accumulated Amortization 1 2 5 17 Goodwill 1 2 6 18 Other Non-Current Assets 2 19 3 20 LIABILITIES and SHAREHOLDERS' EQUITY 3 1 21 LIABILITIES 3 1 1 22 Current Liabilities (due within one year) 3 1 1 1 23 Accounts Payable 3 1 1 2 24 Current Income Tax Payable 3 1 1 3 25 Current portion of Loans Payable 3 1 1 4 26 Short-term Provisions 3 1 1 5 27 Other Current Liabilities 3 1 2 28 3 1 3 29 Non-Current Liabilities (due after more than one year) 3 1 3 1 30 Loans Payable 3 1 3 2 31 Issued Debt Securities 3 1 3 3 32 Deferred Tax Liabilities 3 1 3 4 33 Provisions, e.g. Pension Obligations 3 1 3 5 34 Other Non-Current Liabilities 3 2 35 SHAREHOLDERS' EQUITY 3 2 1 36 Paid-in Capital 3 2 2 37 Share Capital 3 2 3 38 Share Premium 3 2 3 1 39 Less: Treasury Shares 3 2 4 40 Retained Earnings 3 2 5 41 Revaluation Reserve 3 2 6 42 Accumulated Other Comprehensive Income
See how sortColumn adds a new ROW_COUNT() for each level of recursion, creating a string representation of a “tree”, which in turn can be sorted?
Now, all that remains is to connect the facts, and you’re done!
SELECT REPLICATE(' ', dim.[level])+dim.rowDescription AS rowDescription, fact.amount FROM cte AS dim LEFT JOIN ( SELECT rowID, SUM(amount) AS amount FROM dbo.balanceSheet GROUP BY rowID ) AS fact ON dim.rowID=fact.rowID ORDER BY dim.sortColumn;
Check back for a future post on how to calculate totals and subtotals in a parent-child dimension like this. Meanwhile, don’t forget to like the Facebook page to get updates on new posts!
1 comment