How to run your CTE just once, and re-use the output

You may already know that common table expressions, like views, don’t behave like regular tables. They’re a way to make your query more readable by allowing you to write a complex SQL expression just once, rather than repeating it all over your statement or view. This makes reading, understanding and future refactoring of your code a little less painful.

But they’re no magic bullet, and you may end up with some unexpected execution plans.

Straight to demo

The example queries in this post run in the StackOverflow database, which Brent Ozar has made available for some time now. If you want to follow along, I would also add the following index for good measure:

CREATE INDEX IX_Posts_Type_Owner
    ON dbo.Posts
        (PostTypeId, OwnerUserId)
    INCLUDE (ViewCount, CommentCount, FavoriteCount);

The problem

If you’re a long-time reader of the blog, you may already know that this query actually executes the code in the CTE four times.

WITH cte AS (
    SELECT u.DisplayName, u.Reputation,
           SUM(p.ViewCount) AS ViewCount,
           SUM(p.CommentCount) AS CommentCount,
           SUM(p.FavoriteCount) AS FavoriteCount
    FROM dbo.Users AS u
    LEFT JOIN dbo.Posts AS p ON p.OwnerUserId=u.Id AND p.PostTypeId IN (1, 2)
    GROUP BY u.DisplayName, u.Reputation)

--- 1
SELECT DisplayName, 'Reputation' AS Metric, Reputation AS [Value]
FROM cte
WHERE Reputation>0

UNION ALL

--- 2
SELECT DisplayName, 'Views' AS Metric, ViewCount AS [Value]
FROM cte
WHERE ViewCount>0

UNION ALL

--- 3
SELECT DisplayName, 'Comments' AS Metric, CommentCount AS [Value]
FROM cte
WHERE CommentCount>0

UNION ALL

--- 4
SELECT DisplayName, 'Favorited' AS Metric, FavoriteCount AS [Value]
FROM cte
WHERE FavoriteCount>0;

Under the hood, SQL Server “expands” the common table expression, so the query becomes something like this:

--- 1
SELECT DisplayName, 'Reputation' AS Metric, Reputation AS [Value]
FROM (
    SELECT u.DisplayName, u.Reputation
    FROM dbo.Users AS u
    GROUP BY u.DisplayName, u.Reputation) AS cte
WHERE Reputation>0

UNION ALL

--- 2
SELECT DisplayName, 'Views' AS Metric, ViewCount AS [Value]
FROM (
    SELECT u.DisplayName, SUM(p.ViewCount) AS ViewCount
    FROM dbo.Users AS u
    LEFT JOIN dbo.Posts AS p ON p.OwnerUserId=u.Id AND p.PostTypeId IN (1, 2)
    GROUP BY u.DisplayName) AS cte
WHERE ViewCount>0

UNION ALL

--- 3
SELECT DisplayName, 'Comments' AS Metric, CommentCount AS [Value]
FROM (
    SELECT u.DisplayName, SUM(p.CommentCount) AS CommentCount
    FROM dbo.Users AS u
    LEFT JOIN dbo.Posts AS p ON p.OwnerUserId=u.Id AND p.PostTypeId IN (1, 2)
    GROUP BY u.DisplayName) AS cte
WHERE CommentCount>0

UNION ALL

--- 4
SELECT DisplayName, 'Favorited' AS Metric, FavoriteCount AS [Value]
FROM (
    SELECT u.DisplayName, SUM(p.FavoriteCount) AS FavoriteCount
    FROM dbo.Users AS u
    LEFT JOIN dbo.Posts AS p ON p.OwnerUserId=u.Id AND p.PostTypeId IN (1, 2)
    GROUP BY u.DisplayName) AS cte
WHERE FavoriteCount>0;

These two queries generate identical execution plans, but the one with the CTE is much, much more readable.

The CTE actually runs once for each UNION ALL

You can tell by the plan why this is an inefficient query: the SQL expression in the common table expression is executed once for every time that it’s referenced in the code.

Better living through CROSS APPLY

You could store the results of the CTE in a temp table, but where’s the fun in that? Instead, why not use the CTE once, and then return four rows for each row that the CTE spits out? That’s exactly what CROSS APPLY does.

WITH cte AS (
    SELECT u.DisplayName, u.Reputation,
           SUM(p.ViewCount) AS ViewCount,
           SUM(p.CommentCount) AS CommentCount,
           SUM(p.FavoriteCount) AS FavoriteCount
    FROM dbo.Users AS u
    LEFT JOIN dbo.Posts AS p ON p.OwnerUserId=u.Id AND p.PostTypeId IN (1, 2)
    GROUP BY u.DisplayName, u.Reputation)

SELECT DisplayName, x.Metric, x.[Value]
FROM cte
CROSS APPLY (
    --- 1
    SELECT 'Reputation' AS Metric, cte.Reputation AS [Value] WHERE cte.Reputation>0

    UNION ALL

    --- 2
    SELECT 'Views' AS Metric, cte.ViewCount AS [Value] WHERE cte.ViewCount>0

    UNION ALL

    --- 3
    SELECT 'Comments' AS Metric, cte.CommentCount AS [Value] WHERE cte.CommentCount>0

    UNION ALL

    --- 4
    SELECT 'Favorited' AS Metric, cte.FavoriteCount AS [Value] WHERE cte.FavoriteCount>0
    ) AS x;

Think of CROSS APPLY (and it’s baby sibling OUTER APPLY) as a way to do a “for-each” construct, but in T-SQL. “For each row generated above, execute and join this.

Now, you can now see that the execution plan only scans the Posts and Users tables once each, instead of once for every UNION ALL set.

The CTE runs only once. The Nested Loops operator splits every row into four.

And the statistics add up: The original query performed around 82,000 logical reads on dbo.Posts and 178,000 logical reads on dbo.Users.

Our new-and-improved CROSS APPLY query only needs 27,200 logical reads on dbo.Posts and 44,500 reads on dbo.Users to accomplish the same results. These are single-thread results, but they scale when you run parallel threads.

Just one more little tweak

We can simplify the query just a little bit more. By moving the WHERE clauses out of the CROSS APPLY, we can turn the SELECT … UNION ALL.. pattern into a VALUES construct:

WITH cte AS (
    SELECT u.DisplayName, u.Reputation,
           SUM(p.ViewCount) AS ViewCount,
           SUM(p.CommentCount) AS CommentCount,
           SUM(p.FavoriteCount) AS FavoriteCount
    FROM dbo.Users AS u
    LEFT JOIN dbo.Posts AS p ON p.OwnerUserId=u.Id AND p.PostTypeId IN (1, 2)
    GROUP BY u.DisplayName, u.Reputation)

SELECT DisplayName, x.Metric, x.[Value]
FROM cte
CROSS APPLY (
    VALUES ('Reputation', cte.Reputation),    --- 1
           ('Views',      cte.ViewCount),     --- 2
           ('Comments',   cte.CommentCount),  --- 3
           ('Favorited',  cte.FavoriteCount)  --- 4
    ) AS x(Metric, [Value])
WHERE x.[Value]>0;

This reduces the amount of work the Nested Loops operator needs to do, and dramatically simplifies the plan:

When four become one.

The IO statistics are about the same for this query, but the CPU time is about 10% less, because we’ve eliminated a few Compute Scalar and Filter operators from the plan.

16 thoughts on “How to run your CTE just once, and re-use the output

  1. Excellent example.
    The Cross Apply method also gives you the ability to chose row-based resultset or column-based resultset.

    CROSS APPLY (
    VALUES (‘Reputation’, cte.Reputation, ‘Views’, cte.ViewCount, ‘Comments’, cte.CommentCount,’Favorited’, cte.FavoriteCount);

    because in some cases that is what is needed.

  2. This is a great re-write example, but I think the title is a bit misleading. Your final version references the CTE only once, so you’ve not really “re-used” any output. You no longer need to re-use any output, thanks to the re-write. Nothing wrong with that at all – it’s a good technique. But based on the title I was expecting to see a spool or some similar rewindable temporary object, and the CTE referenced multiple times but only evaluated once.

  3. Pingback: RE: How to run your CTE just once, and re-use the output – Protiguous Software

  4. For the sake of completeness: you can do it with UNPIVOT too (even if I always have to check the exact syntax, when I use it). At least on my PC and the smallest StackOverflow database it used in average 3 seconds fewer CPU than the version with the VALUES() syntax (reads are equal).

    WITH cte AS (
    SELECT u.DisplayName, u.Reputation,
    SUM(p.ViewCount) AS ViewCount,
    SUM(p.CommentCount) AS CommentCount,
    SUM(p.FavoriteCount) AS FavoriteCount
    FROM dbo.Users AS u
    LEFT JOIN dbo.Posts AS p ON p.OwnerUserId=u.Id AND p.PostTypeId IN (1, 2)
    GROUP BY u.DisplayName, u.Reputation)
    SELECT upv.DisplayName
    , upv.Reputation
    , upv.Metric
    , upv.Value
    FROM cte
    UNPIVOT (Value FOR Metric IN (ViewCount, CommentCount, FavoriteCount)) AS upv
    ;

Let me hear your thoughts!

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.