PIVOT, UNPIVOT and performance

Few things deserve the attention of a long rant as much as unneccessarily complicated syntaxes. When you want to achieve something that is clearly defined and supported, but you have to look up the the syntax. PIVOT and UNPIVOT are examples of such features, and in this case, I’ll even show you a more well-performing alternative.

A more detailed description of what pivoting is and when it’s applicable is beyond the scope of this post, but if in doubt, ask any Excel user you may know.

Setting up

We’re going to need some sample data to play around with, preferably enough to cause SQL Server to parallellize the query plan if possible. This script will create a temp table with about two million rows.

CREATE TABLE #unpivoted (
    ColName        char(1) NOT NULL,
    RowName        char(1) NOT NULL,
    Amount         numeric(10, 2) NOT NULL

CREATE CLUSTERED INDEX #unpivoted_ix ON #unpivoted (ColName, RowName);

--- A single row..
INSERT INTO #unpivoted (ColName, RowName, Amount)
        ROUND(1000.0*RAND(CHECKSUM(NEWID())), 2);

--- ... which doubles over and over:
WHILE (@@ROWCOUNT<1000000)
    INSERT INTO #unpivoted (ColName, RowName, Amount)
           ROUND(1000.0*RAND(CHECKSUM(NEWID())), 2)
    FROM #unpivoted;

PIVOT out of the box

Here’s how the PIVOT command looks.

SELECT p.RowName, p.A, p.B, p.C, p.D
FROM #unpivoted AS x
       FOR x.ColName IN (A, B, C, D)
       ) AS p
ORDER BY p.RowName;

And the query plan looks like this:

PIVOT plan

Right-to-left, the query performs a scan of the table, aggregates this data with a Hash Match aggregate, and sorts the stream in order to be suitable for a Stream Aggregate. The scan and the hash match are the expensive operations here. On my system, this query consistently completes in about 1.7 seconds.

Do-it-yourself PIVOT

However, anytime I need to pivot anything myself, I tend to prefer the following approach mostly because I feel I have more control over it, the syntax is easier to remember (I know, very subjective), and it’s about as easy to read as the original.

       SUM((CASE WHEN ColName='A' THEN Amount ELSE 0.0 END)) AS A,
       SUM((CASE WHEN ColName='B' THEN Amount ELSE 0.0 END)) AS B,
       SUM((CASE WHEN ColName='C' THEN Amount ELSE 0.0 END)) AS C,
       SUM((CASE WHEN ColName='D' THEN Amount ELSE 0.0 END)) AS D
FROM #unpivoted

You can already tell from the code that the query plan should look very similar indeed. And it does. See if you can spot the difference:

Manual PIVOT plan

Did you notice the Compute Scalar operator that moved from the serial zone to the parallel? The operator itself hardly carries any cost at all, but anything in the serial zone will adversely affect the performance of the entire query, because at that point, all those parallel streams have been merged together into a single stream.

As a rule of thumb, the smaller that last serial zone is, the better your query is going to perform overall.

Speaking of performance, this “manual pivot” query performs about 25% better than the traditional PIVOT. The I/O reads and the estimated cost are both exactly the same, so I’ll chalk it up to the smaller serial zone.


Let’s set up some test data to try out UNPIVOT. This should give us some 300 000 pivoted rows to play with:

       SUM((CASE WHEN ColName='A' THEN Amount ELSE 0.0 END)) AS A,
       SUM((CASE WHEN ColName='B' THEN Amount ELSE 0.0 END)) AS B,
       SUM((CASE WHEN ColName='C' THEN Amount ELSE 0.0 END)) AS C,
       SUM((CASE WHEN ColName='D' THEN Amount ELSE 0.0 END)) AS D
INTO #pivoted
FROM #unpivoted

CREATE CLUSTERED INDEX #pivoted_ix ON #pivoted (RowName);

INSERT INTO #pivoted (RowName, A, B, C, D)
SELECT RowName, A, B, C, D
FROM #pivoted;

    INSERT INTO #pivoted (RowName, A, B, C, D)
    SELECT RowName, A, B, C, D
    FROM #pivoted;

To UNPIVOT these rows, we would use something like this:

SELECT u.ColName, u.RowName, u.Amount
FROM #pivoted AS p
         FOR ColName IN (p.A, p.B, p.C, p.D)
        ) AS u;

This query produces 1.3 million rows in about 1.5 seconds. Here’s the query plan:


Do-it-yourself UNPIVOT

Yes, of course you can do your own UNPIVOT, and it’s a really neat solution as well, based on the use of CROSS APPLY.

SELECT u.ColName, p.RowName, u.Amount
FROM #pivoted AS p
    VALUES ('A', p.A),
           ('B', p.B),
           ('C', p.C),
           ('D', p.D)
    ) AS u(ColName, Amount);

CROSS APPLY creates a nested loop join, the SQL Server-equivalent of a for-each loop. For each row in #pivoted, we create four rows in the CROSS APPLY, and because those rows are entirely constructed from constants and values from #pivoted, the result is just a constant scan, and thus very efficient. Here’s the query plan:

Manual UNPIVOT plan

This plan excludes the Compute Scalar and Filter operators that we saw in the UNPIVOT plan, and it also parallellizes very neatly (no serial zone at all). The estimated cost is slightly lower, while both queries perform a single scan of the table. When we ramp up the data volumes, the UNPIVOT query also parallelizes, but the performance difference becomes more visible: there’s about a 10% performance gain in the home-grown version over UNPIVOT.

Bottom line

I personally really dislike the PIVOT and UNPIVOT constructs, because their syntaxes are unintuitive and hard to remember. Just for fun, I thought I’d build my own equivalents, and it turns out that SQL Server is actually better at optimizing those than the original PIVOT and UNPIVOT, which is a positive surprise to me.

Let me hear your thoughts!

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s