Parsing parenthesis expressions

Today’s post illustrates a pretty cool application of SQL Server’s built-in XML and XQuery support, used to parse parenthesis-delimited expressions. You may want to get your reading glasses on for this one..

The challenge here is to turn this expression:

123*a-(90/b+(c+8)+(d+(x+y)*9)-(e+9))(f*(g*h+1))

… into a result set that looks like this:

id  parent expression
--- ------ ------------------------------------------------
0   NULL   123*a-(90/b+(c+8)+(d+(x+y)*9)-(e+9))(f*(g*h+1))
1   0      90/b+(c+8)+(d+(x+y)*9)-(e+9)
2   1      c+8
3   1      d+(x+y)*9
4   3      x+y
5   1      e+9
6   0      f*(g*h+1)
7   6      g*h+1

Let me be the first one to admit that this may not be a clear-cut T-SQL task, but as this is a T-SQL programming blog, here goes.

Turning the string expression into XML

Any expression with balanced parentheses is basically a hierarchy of elements, just like an XML document. So we’ll approach this task by turning the string into an XML document.

By replacing the “(” and “)” characters with XML elements, you can turn the plaintext string into a hierarchy of XML elements.

DECLARE @string varchar(max)=
        '123*a-(90/b+(c+8)+(d+(x+y)*9)-(e+9))(f*(g*h+1))';

SET @string='<expr>'+
    REPLACE(REPLACE(@string,
        '(', '<expr>('),
        ')', ')</expr>')+
    '</expr>';

PRINT @string;

The new value of the @string variable is:

<expr>
    123*a-<expr>
        (90/b+<expr>(c+8)</expr>+<expr>
            (d+<expr>(x+y)</expr>*9)
        </expr>-<expr>(e+9)</expr>)
    </expr><expr>
        (f*<expr>(g*h+1)</expr>)
    </expr>
</expr>

Because we’re turning the plaintext string into XML, we’ll also have to encode the special characters that are used in XML documents, by translating them using the REPLACE() function.

DECLARE @string varchar(max)=
        '123*a-(90/b+(c+8)+(d+(x+y)*9)-(e+9))(f*(g*h+1))';

SET @string='<expr>'+
    REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@string,
        '<', '&lt;'),
        '>', '&gt;'),
        '"', '&quot;'),
        '(', '<expr>('),
        ')', ')</expr>')+
    '</expr>';

PRINT @string;

Assigning id attributes

We want to assign id number attributes to each “expr” element in the XML document, so we can output each element’s id and parent element id in the output table.

To do this, we’ll create the “id” attribute on each “expr” element, with the default value of “*”. After that, we’ll loop through all these attributes and replace the “*” value with an actual id from a counter variable, @id, that we increment by 1 for each loop.

DECLARE @string varchar(max)=
        '123*a-(90/b+(c+8)+(d+(x+y)*9)-(e+9))(f*(g*h+1))',
    @id int=1;

SET @string='<expr id="0">('+
        REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@string,
            '<', '&lt;'),
            '>', '&gt;'),
            '"', '&quot;'),
            '(', '<expr id="*">('),
            ')', ')</expr>')
        +')</expr>';

--- Loop through the id attributes and replace them with actual numbers.
WHILE (@string LIKE '%<expr id="*">%') BEGIN;
    SET @string=STUFF(
        @string,
        CHARINDEX('<expr id="*">', @string)+10,
        1,
        CAST(@id AS varchar(max)));
    SET @id=@id+1;
END;

PRINT @string;

The XML should now look like this:

<expr id="0">
    (123*a-<expr id="1">
        (90/b+<expr id="2">(c+8)</expr>+<expr id="3">
            (d+<expr id="4">(x+y)</expr>*9)
        </expr>-<expr id="5">(e+9)</expr>)
    </expr><expr id="6">
        (f*<expr id="7">(g*h+1)</expr>)
    </expr>)
</expr>

Parsing the XML

At this point, we can parse the XML string. First off, we’ll convert it to a native XML format, and then use an XQuery to recurse through all of its elements, through all levels of the document.

The XQuery used to return all elements is “//*”. The SELECT statement returns three columns/values from the document elements: the “id” attribute of the element, the “id” attribute of the parent element and the plaintext expression of the element.

DECLARE @xml xml=CAST(@string AS xml);

WITH nodes ([id], parent, expression)
AS (
    SELECT n.expression.value('@id[1]', 'int') AS [id],
           n.expression.value('../@id[1]', 'int') AS parent,
           n.expression.value('.', 'varchar(max)') AS expression
    FROM @xml.nodes('//*') AS n(expression))

SELECT [id], parent, expression
FROM nodes;

The output now looks a lot like the final product that we’re aiming for, except for the fact that the expression column still includes the leading and trailing parenthesis characters.

The end product

We can turn these three blocks of code – turning the string into XML, assigning an id attribute to the elements and parsing the resulting XML – into a table value function. The input parameters of this function are @string, the plaintext string, and @startChar and @endChar, representing the opening and closing parenthesis characters respectively.

CREATE FUNCTION dbo.fn_parseParensXML(
    @string        varchar(max),
    @startChar     char(1)='(',
    @endChar       char(1)=')'
)
RETURNS @tbl TABLE (
    [id]           int NOT NULL,
    parent         int NULL,
    expression     varchar(max) NULL,
    PRIMARY KEY CLUSTERED ([id])
)
AS

BEGIN;
    DECLARE @xml xml, @id int=1;

    --- "XML'ifying" the string by replacing the start character
    --- with the <expr> element and the end character with the
    --- corresponding </expr>. Because we're going to turn all
    --- this into XML, we also need to encode special characters
    --- used in the XML document, such as quotes, "<" and ">".
    --- Finally, we're adding an ID attribute to each <expr>
    --- element. This attribute is the same id that will be
    --- returned in the output table.
    SET @string='<expr id="0">'+@startChar+
            REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@string,
                '<', '&lt;'),
                '>', '&gt;'),
                '"', '&quot;'),
                @startChar, '<expr id="*">'+@startChar),
                @endChar, @endChar+'</expr>')
            +@endChar+'</expr>';

    --- Loop through the id attributes and replace them with
    --- actual numbers.
    WHILE (@string LIKE '%<expr id="*">%') BEGIN;
        SET @string=STUFF(
            @string,
            CHARINDEX('<expr id="*">', @string)+10,
            1,
            CAST(@id AS varchar(max)));
        SET @id=@id+1;
    END;

    --- Convert @string into a proper XML document.
    SET @xml=CAST(@string AS xml);

    --- Parse this XML document into the output table, @tbl:
    --- * The id of each element goes into the "id" column.
    --- * the id of the element's parent element goes into the
    ---   "parent" column.
    --- * The plaintext expression itself goes into the
    ---   "expression" column.
    WITH nodes ([id], parent, expression)
    AS (
        SELECT n.expression.value('@id[1]', 'int') AS [id],
               n.expression.value('../@id[1]', 'int') AS parent,
               n.expression.value('.', 'varchar(max)') AS expression
        FROM @xml.nodes('//*') AS n(expression))

    INSERT INTO @tbl ([id], parent, expression)
    SELECT [id], parent,
        --- The SUBSTRING() is used to trim away the leading
        --- and trailing @startChar and @endChar characters
        --- from the expression.
        SUBSTRING(expression, 2, LEN(expression)-2)
    FROM nodes;

    RETURN;
END;

GO

And here’s an example of a call to the function:

SELECT *
FROM dbo.fn_parseParensXML(
    '123*a-(90/b+(c+8)+(d+(x+y)*9)-(e+9))(f*(g*h+1))',
    '(', ')');

Let me know if you find this function useful, and check back next week for more.

Let me hear your thoughts!

This site uses Akismet to reduce spam. Learn how your comment data is processed.