Parsing XML data

You’ll be hard-pressed not to stumble over XML data in your daily work. Good thing then, SQL Server contains built-in XML parsing logic – there’s a native xml datatype, built-in XPath support and all the tools you need to store and transform data from or to XML data. However, this is an extensive topic, so this article will cover the basics of parsing XML data into a recordset, just to wet your apetite.

The XML datatype

The xml datatype can be used to declare variables or columns in a table. You could store XML data in a regular varchar(max) column, but the xml datatype is a lot better suited for the purpose, with built-in validation, optional schema control and dedicated XML indexes.

XML indexes

A really sweet feature that is really easy to overlook is the XML index. An XML index is placed on a single xml column in a table, not the entire table. Much like an un-indexed table (a heap), a lot of XML data without an index has to be parsed every time you want to access some of it, whereas if you have an index on your XML column, access will be radically faster.

As with regular table indexes, there are two types of indexes. In the context of XML indexes, they’re called primary and secondary – roughly equivalent to clustered indexes and non-clustered indexes on tables. And just like with indexed views, the first index on an XML column must be a primary xml index.

Other than that, the syntax is very similar to a regular index.

CREATE PRIMARY XML INDEX PXML_ProductModel_CatalogDescription
    ON Production.ProductModel (CatalogDescription);

Look for a separate article on XML indexes in the future.

Transforming an XML set to a tabular set

The nodes() and value() functions in SQL Server provide the means to extract XML data from a variable or a column. Here’s an example that uses both those functions:

DECLARE @x xml;
SET @x='
<addressbook name="Name of my addressbook">
    <contact>
        <company>AdventureWorks</company>
        <lastname>Smith</lastname>
        <firstname>A.</firstname>
    </contact>
    <contact>
        <company>AdventureWorks</company>
        <lastname>Anderson</lastname>
        <firstname>Mr.</firstname>
    </contact>
</addressbook>';

SELECT
    tbl.contacts.value('company[1]', 'varchar(100)') AS company,
    tbl.contacts.value('lastname[1]', 'varchar(100)') AS lastname,
    tbl.contacts.value('firstname[1]', 'varchar(100)') AS firstname
FROM @x.nodes('/addressbook/contact') AS tbl(contacts);

First off, in the nodes() function in the FROM clause, note how we define the “base XPath” of our recordset. In this case, we want to create one record for every /addressbook/contact. So in this case, we want the recordset to contain two rows. Depending on the XPath you provide to the nodes() function, you will get different numbers of records in your result set.

The alias of the table in the FROM clause, tbl(contacts), tells us that the nodes() function returns a table with one XML column. The table alias is tbl and the alias of the single column is contacts. So, in the SELECT clause of this query, you’ll write your query for an XML column called contacts in a table called tbl.

The next step is to extract individual columns from tbl.contacts. This is done with the value() function, which accepts two parameters:

1. An XPath, which is a relative path in relation to /addressbook/contact. This XPath points to where the element or attribute is that you want in the output column.
2. A datatype declaration for the output column.

Using CROSS APPLY with XML transformations

As you may have noticed, the nodes() function actually works like a table value function, so if you want to join a new nodes() recordset to an existing query, you have to use CROSS APPLY instead of a regular join.

Here’s an example where we derive a subset of an XML column (book.addrbook) into another XML column (tbl.contacts) using CROSS APPLY with a nodes() function:

SELECT
    book.addrbook.value('@name', 'varchar(100)') AS name1,
    tbl.contacts.value('../@name', 'varchar(100)') AS name2,
    tbl.contacts.value('company[1]', 'varchar(100)') AS company,
    tbl.contacts.value('lastname[1]', 'varchar(100)') AS lastname,
    tbl.contacts.value('firstname[1]', 'varchar(100)') AS firstname
FROM @x.nodes('/addressbook') AS book(addrbook)
CROSS APPLY book.addrbook.nodes('contact') AS tbl(contacts);

As a bonus, note how the columns name1 and name2 are really the same value, but they are two different paths of extracting the same data.

But there’s so much more.

I’ve only scratched the surface of all the XML stuff you can do in SQL Server. There are XSD schemas, advanced XPath queries both for extracting and for modifying XML variables, and lots more. If you want all the details, make sure to check out the MSDN page on XML in SQL Server.

2 thoughts on “Parsing XML data

  1. Pingback: Using DDL triggers to set up version control | Sunday morning T-SQL

  2. Pingback: Aggregate string concatenations « Sunday morning T-SQL

Let me hear your thoughts!

This site uses Akismet to reduce spam. Learn how your comment data is processed.