XML indexes

XML documents are, by design, at best like an entire relational database, stored in a single column. Worst-case, they don’t even come with a schema. Small wonder then, that XML queries take a good deal of CPU and I/O performance to complete. The solution to this problem is to index them, pretty much like you would index regular tables.

A short primer on indexing

Indexing is a central concept in all kinds data management, be it relational tables or even files in a tree of directories (a file system actually resembles an index in many ways). With extremely few exceptions, an indexed and sorted dataset will yield an exceptional performance boost, at very little cost in relation. If you want to get up to scratch on indexing, check out the Indexing Basics article.

Traditionally, database indexes would only cover columns in tables and indexed views. With XML indexes in SQL Server, you apply the same indexing benefits to all that data that resides in XML columns as well.

XML indexes apply from SQL Server 2005 and forward. Selective XML indexes can be used as of SQL Server 2012.

Primary XML indexes

A primary XML index indexes and stores all parts of the XML data from the chosen column, and in that sense, you could say it works a bit like a clustered index does on a table. And just like a table can only have one clustered index, an XML column can have only one primary XML index.

An XML index requires you to have a clustered index on your table, so the XML index can tie in with the table rows (i.e. the clustered index leaf nodes).

Creating a primary XML index is just as straight-forward as creating a regular single-column index on a table.

CREATE PRIMARY XML INDEX
    PXML_ProductModel_CatalogDescription
        ON Production.ProductModel (CatalogDescription);

Secondary XML indexes

Because the primary XML index contains all of the data in the XML column, it is fairly large in size. To provide an additional performance boost to XML queries, you can add specialized secondary XML indexes. The secondary XML index still covers the same data as the underlying primary index (i.e. the entire XML object), but you could say that it creates a more specific index, based on the primary index.

The T-SQL syntax for setting up a secondary XML index is very similar to the primary index, with the addition of the “USING XML INDEX” clause, which tells SQL Server that this isn’t a primary index, but is based on one.

CREATE XML INDEX IXML_ProductModel_CatalogDescription_Path 
    ON Production.ProductModel (CatalogDescription)
        USING XML INDEX PXML_ProductModel_CatalogDescription
        FOR PATH;

There are three different types of secondary indexes (specified with the FOR keyword at the end of the statement). The difference between them is what they index and how this information is arranged in the database, so you’ll need to choose a type that suits your query needs.

PATH secondary indexes

The PATH secondary XML index is optimally designed for looking up paths, for instance when you’re using the exist() method in your queries. The index key of this type of index is the node path of all nodes (elements, attributes, values) in the XML document.

VALUE secondary indexes

The index key of the VALUE secondary XML index is the node value and the node path, so you may want to consider this type if you’re looking for a specific value, but don’t know the exact path. For instance, if you want to look for a certain “LastName” property, but you don’t know if you’re looking for an employee, a customer or some other type of contact.

PROPERTY secondary indexes

The PROPERTY secondary XML index’s key is the base table’s primary key, node path and node value. This provides the means to perform queries that search specific table records and specific paths in those records’ XML documents, all in a single index scan/seek.

If you’re having a hard time wrapping your head around which type of secondary XML index to choose, this is perhaps the one you might want to start off with.

Selective XML indexes

A selective XML index works a bit like the XML index equivalent of a filtered index – because the index does not cover the entire XML structure, it’s considerably smaller, bringing a performance benefit with it. An ordinary primary XML index may take a lot of space and CPU time to build if your XML blobs are large or if your table has a lot of rows, whereas you may only be interested in specific, small parts of all this data.

Like ordinary XML indexes, selective indexes can be created as primary (one per column) and secondary (which cover a subset of the primary selective index). When you create secondary selective indexes, they obviously only cover the same data that the primary selective XML index does.

Because the selective index requires you to specify one or several paths, the CREATE INDEX syntax is slightly more complicated:

CREATE SELECTIVE XML INDEX sxi_index
ON Tbl(xmlcol)

FOR (
    pathab   = '/a/b' as XQUERY 'node()'
    pathabc  = '/a/b/c' as XQUERY 'xs:double', 
    pathdtext = '/a/b/d/text()' as XQUERY
            'xs:string' MAXLENGTH(200) SINGLETON
    pathabe = '/a/b/e' as SQL NVARCHAR(100)
);

You can add or remove paths from the selective XML index later on, using the ALTER INDEX statement. For more on selective XML indexes, see this MSDN article.

2 comments

  1. Hi, what is the selective index for optimising retrieval of an attribute within a tag?


  2. Adding the primary XML index decreased the query duration to 4 minutes. Before i cancelled the query after an hour on a 100mb xml file. Thank you!

Leave a comment

Your email address will not be published. Required fields are marked *