How to run your CTE just once, and re-use the output

You may already know that common table expressions, like views, don’t behave like regular tables. They’re a way to make your query more readable by allowing you to write a complex SQL expression just once, rather than repeating it all over your statement or view. This makes reading, understanding and future refactoring of your code a little less painful.

But they’re no magic bullet, and you may end up with some unexpected execution plans.

Straight to demo

The example queries in this post run in the StackOverflow database, which Brent Ozar has made available for some time now. If you want to follow along, I would also add the following index for good measure:

CREATE INDEX IX_Posts_Type_Owner
    ON dbo.Posts
        (PostTypeId, OwnerUserId)
    INCLUDE (ViewCount, CommentCount, FavoriteCount);

The problem

If you’re a long-time reader of the blog, you may already know that this query actually executes the code in the CTE four times.

WITH cte AS (
    SELECT u.DisplayName, u.Reputation,
           SUM(p.ViewCount) AS ViewCount,
           SUM(p.CommentCount) AS CommentCount,
           SUM(p.FavoriteCount) AS FavoriteCount
    FROM dbo.Users AS u
    LEFT JOIN dbo.Posts AS p ON p.OwnerUserId=u.Id AND p.PostTypeId IN (1, 2)
    GROUP BY u.DisplayName, u.Reputation)

--- 1
SELECT DisplayName, 'Reputation' AS Metric, Reputation AS [Value]
FROM cte
WHERE Reputation>0

UNION ALL

--- 2
SELECT DisplayName, 'Views' AS Metric, ViewCount AS [Value]
FROM cte
WHERE ViewCount>0

UNION ALL

--- 3
SELECT DisplayName, 'Comments' AS Metric, CommentCount AS [Value]
FROM cte
WHERE CommentCount>0

UNION ALL

--- 4
SELECT DisplayName, 'Favorited' AS Metric, FavoriteCount AS [Value]
FROM cte
WHERE FavoriteCount>0;

Under the hood, SQL Server “expands” the common table expression, so the query becomes something like this:

--- 1
SELECT DisplayName, 'Reputation' AS Metric, Reputation AS [Value]
FROM (
    SELECT u.DisplayName, u.Reputation
    FROM dbo.Users AS u
    GROUP BY u.DisplayName, u.Reputation) AS cte
WHERE Reputation>0

UNION ALL

--- 2
SELECT DisplayName, 'Views' AS Metric, ViewCount AS [Value]
FROM (
    SELECT u.DisplayName, SUM(p.ViewCount) AS ViewCount
    FROM dbo.Users AS u
    LEFT JOIN dbo.Posts AS p ON p.OwnerUserId=u.Id AND p.PostTypeId IN (1, 2)
    GROUP BY u.DisplayName) AS cte
WHERE ViewCount>0

UNION ALL

--- 3
SELECT DisplayName, 'Comments' AS Metric, CommentCount AS [Value]
FROM (
    SELECT u.DisplayName, SUM(p.CommentCount) AS CommentCount
    FROM dbo.Users AS u
    LEFT JOIN dbo.Posts AS p ON p.OwnerUserId=u.Id AND p.PostTypeId IN (1, 2)
    GROUP BY u.DisplayName) AS cte
WHERE CommentCount>0

UNION ALL

--- 4
SELECT DisplayName, 'Favorited' AS Metric, FavoriteCount AS [Value]
FROM (
    SELECT u.DisplayName, SUM(p.FavoriteCount) AS FavoriteCount
    FROM dbo.Users AS u
    LEFT JOIN dbo.Posts AS p ON p.OwnerUserId=u.Id AND p.PostTypeId IN (1, 2)
    GROUP BY u.DisplayName) AS cte
WHERE FavoriteCount>0;

These two queries generate identical execution plans, but the one with the CTE is much, much more readable.

The CTE actually runs once for each UNION ALL

You can tell by the plan why this is an inefficient query: the SQL expression in the common table expression is executed once for every time that it’s referenced in the code.

Better living through CROSS APPLY

You could store the results of the CTE in a temp table, but where’s the fun in that? Instead, why not use the CTE once, and then return four rows for each row that the CTE spits out? That’s exactly what CROSS APPLY does.

WITH cte AS (
    SELECT u.DisplayName, u.Reputation,
           SUM(p.ViewCount) AS ViewCount,
           SUM(p.CommentCount) AS CommentCount,
           SUM(p.FavoriteCount) AS FavoriteCount
    FROM dbo.Users AS u
    LEFT JOIN dbo.Posts AS p ON p.OwnerUserId=u.Id AND p.PostTypeId IN (1, 2)
    GROUP BY u.DisplayName, u.Reputation)

SELECT DisplayName, x.Metric, x.[Value]
FROM cte
CROSS APPLY (
    --- 1
    SELECT 'Reputation' AS Metric, cte.Reputation AS [Value] WHERE cte.Reputation>0

    UNION ALL

    --- 2
    SELECT 'Views' AS Metric, cte.ViewCount AS [Value] WHERE cte.ViewCount>0

    UNION ALL

    --- 3
    SELECT 'Comments' AS Metric, cte.CommentCount AS [Value] WHERE cte.CommentCount>0

    UNION ALL

    --- 4
    SELECT 'Favorited' AS Metric, cte.FavoriteCount AS [Value] WHERE cte.FavoriteCount>0
    ) AS x;

Think of CROSS APPLY (and it’s baby sibling OUTER APPLY) as a way to do a “for-each” construct, but in T-SQL. “For each row generated above, execute and join this.“

Now, you can now see that the execution plan only scans the Posts and Users tables once each, instead of once for every UNION ALL set.

The CTE runs only once. The Nested Loops operator splits every row into four.

And the statistics add up: The original query performed around 82,000 logical reads on dbo.Posts and 178,000 logical reads on dbo.Users.

Our new-and-improved CROSS APPLY query only needs 27,200 logical reads on dbo.Posts and 44,500 reads on dbo.Users to accomplish the same results. These are single-thread results, but they scale when you run parallel threads.

Just one more little tweak

We can simplify the query just a little bit more. By moving the WHERE clauses out of the CROSS APPLY, we can turn the SELECT … UNION ALL.. pattern into a VALUES construct:

WITH cte AS (
    SELECT u.DisplayName, u.Reputation,
           SUM(p.ViewCount) AS ViewCount,
           SUM(p.CommentCount) AS CommentCount,
           SUM(p.FavoriteCount) AS FavoriteCount
    FROM dbo.Users AS u
    LEFT JOIN dbo.Posts AS p ON p.OwnerUserId=u.Id AND p.PostTypeId IN (1, 2)
    GROUP BY u.DisplayName, u.Reputation)

SELECT DisplayName, x.Metric, x.[Value]
FROM cte
CROSS APPLY (
    VALUES ('Reputation', cte.Reputation),    --- 1
           ('Views',      cte.ViewCount),     --- 2
           ('Comments',   cte.CommentCount),  --- 3
           ('Favorited',  cte.FavoriteCount)  --- 4
    ) AS x(Metric, [Value])
WHERE x.[Value]>0;

This reduces the amount of work the Nested Loops operator needs to do, and dramatically simplifies the plan:

The IO statistics are about the same for this query, but the CPU time is about 10% less, because we’ve eliminated a few Compute Scalar and Filter operators from the plan.

32 comments

Peter Larsson says:

2020-12-01 at 10:19

Excellent example.
The Cross Apply method also gives you the ability to chose row-based resultset or column-based resultset.

CROSS APPLY (
VALUES (‘Reputation’, cte.Reputation, ‘Views’, cte.ViewCount, ‘Comments’, cte.CommentCount,’Favorited’, cte.FavoriteCount);

because in some cases that is what is needed.

Reply
Adam Machanic says:

2020-12-01 at 17:46

This is a great re-write example, but I think the title is a bit misleading. Your final version references the CTE only once, so you’ve not really “re-used” any output. You no longer need to re-use any output, thanks to the re-write. Nothing wrong with that at all – it’s a good technique. But based on the title I was expecting to see a spool or some similar rewindable temporary object, and the CTE referenced multiple times but only evaluated once.

Reply
1. Daniel Hutmacher says:
  
  2020-12-01 at 17:48
  
  Fair enough. Post titles are tricky. 🙂
  
  Reply
Pingback: RE: How to run your CTE just once, and re-use the output – Protiguous Software
Román says:

2020-12-02 at 01:06

This is simply amazing. What a creative way to reuse CTE.

Reply
Roman Peralta says:

2020-12-02 at 01:12

I really like it. I use CTEs a lot. Great technique!

Reply
Khoa.Nguyen says:

2020-12-02 at 05:07

this is helpful

Reply
Sree Kandula says:

2020-12-02 at 12:43

Very nice

Reply
Barry Weikle says:

2020-12-07 at 17:31

Thank you for giving me an excellent way to explain CROSS APPLY in one sentence!

Reply
1. Daniel Hutmacher says:
  
  2020-12-07 at 17:32
  
  Thanks! Glad you liked it!
  
  Reply
thiagotimm says:

2020-12-07 at 20:36

Awesome! Really nice post!

Reply
1. Daniel Hutmacher says:
  
  2020-12-07 at 20:36
  
  Thank you!
  
  Reply
Thomas Franz says:

2020-12-08 at 15:03

For the sake of completeness: you can do it with UNPIVOT too (even if I always have to check the exact syntax, when I use it). At least on my PC and the smallest StackOverflow database it used in average 3 seconds fewer CPU than the version with the VALUES() syntax (reads are equal).

WITH cte AS (
SELECT u.DisplayName, u.Reputation,
SUM(p.ViewCount) AS ViewCount,
SUM(p.CommentCount) AS CommentCount,
SUM(p.FavoriteCount) AS FavoriteCount
FROM dbo.Users AS u
LEFT JOIN dbo.Posts AS p ON p.OwnerUserId=u.Id AND p.PostTypeId IN (1, 2)
GROUP BY u.DisplayName, u.Reputation)
SELECT upv.DisplayName
, upv.Reputation
, upv.Metric
, upv.Value
FROM cte
UNPIVOT (Value FOR Metric IN (ViewCount, CommentCount, FavoriteCount)) AS upv
;

Reply
1. Daniel Hutmacher says:
  
  2020-12-08 at 15:04
  
  That’s correct; UNPIVOT is essentially a CROSS APPLY, but the CROSS APPLY gives you vastly more flexibility.
  
  Reply
Alex says:

2021-06-04 at 20:35

What tool do you use to view execution plan?

Reply
1. Daniel Hutmacher says:
  
  2021-06-04 at 20:41
  
  That’s SentryOne Plan Explorer. It makes for some amazing screenshots, and it has a couple of cool features, too.
  
  Reply
Jeff Moden says:

2021-09-17 at 02:04

Really nicely done and great order of revelation in your article, Daniel. Thanks for the effort you put into this. It’s definitely gone into “favorites” under “SQL Performance”.

Reply
1. Daniel Hutmacher says:
  
  2021-09-17 at 07:35
  
  Hey, thanks!
  
  Reply
johnatpheasantville says:

2022-02-06 at 16:56

Good illustration of how CTE’s are interpreted by the SQL Engine – similar to in-line function.

However, my basic observations of SQL Server is that it appears to have “intelligence” in that it detects that the exact same statement has been requested a second time and as nothing has changed it can reuse the same results from a cache. This is purely a guess on my part but otherwise I find it hard to explain the time difference when executing a complex query (not involving table maintenance) the second and multiple times thereafter, done in quick succession. It almost always comes back instantly whereas could get several seconds for the first request.

Reply
1. Daniel Hutmacher says:
  
  2022-02-06 at 18:23
  
  What you’re seeing is probably a combination of a cached execution plan and cached data pages. I believe Oracle caches the actual query results to some extent, but SQL Server does not.
  
  Reply
  1. johnatpheasantville says:
    
    2022-02-09 at 16:19
    
    As a follow-up, would it be possible, if MS-SQL is re-evaluating the CTE independently for each call, if an update had been done in the instance between queries, would it be possible to return a different data set each time or is that managed by locks. If it is managed by locks, it would seem that as SQL could guarantee the same results then it would make sense to optimise it. The actual execution plan mirrors the steps for each call identically and so that information may be used within the SQL engine. Also, you don’t happen to remember the Microsoft page that details the fact that SQL Server treats them independently as that was the reason for my initial investigation that brought me to your article.
    
    Reply
    1. Daniel Hutmacher says:
      
      2022-02-09 at 19:40
      
      I don’t know about the documentation, but it’s pretty simple to demonstrate experimentally. As for data consistency, I’d expect it to be managed with locks, as you mention.
      
      Reply
  2. johnatpheasantville says:
    
    2022-02-09 at 16:21
    
    PS: Wouldn’t it be nice if MS-SQL introduced another keyword (e.g. REMEMBER) that could be substituted for the WITH that instructed the SQL Engine to use the same result set for each call to the functions defined within.
    
    Reply
    1. Daniel Hutmacher says:
      
      2022-02-09 at 19:42
      
      I mean.. it would probably require a massive rewrite of the execution engine. 😉
      You could mimic the same behaviour by caching your working data in a temp table if that’s what you’re after. As an added benefit, you’d get to index it just the way you need it, too.
      
      Reply
      1. johnatpheasantville says:
        
        2022-02-10 at 17:02
        
        In the meantime, I have done some further research and came across, what looks like a comprehensive analysis of all the various methods. URL: https://www.sommarskog.se/share_data.html
        
        So, I have now reconfigured my nested SQL In-line functions to use (a) Multi-statement function and (b) Stored procedure so I could pass the results of the initial function into a declared table. Without going into the mechanics of it, which is probably pushing the boundaries of the original post, I got a greatly increased CPU time and logical reads, even though I had applied a UNIQUE CLUSTERED INDEX to the table and the number of rows being stored was only ~150. Breaking down the analysis and running the SQL Extended Events on the statements, I was a factor of 10x the CPU of the In-line function, even though the execution plan of the In-line function clearly showed 3 identical tree structures representing the 3 calls to the in-line function. The CPU was 296,000 to 31,000 and logical reads 14,104 to 8517. In particular, I couldn’t justify the greatly increased number of reads as I dissected both the function call in isolation and the wrapper call referencing the MSF / Stored Procedure. The index on the declared table was being used and all the pipes were single figures or ~150 mark representing the rows in the declared table. The CPU difference mainly appeared to be the cost of inserting rows into the declared table, rather than the select statements which were a small fraction of the time in comparison. I also came across the restriction of INSERT .. EXEC when dealing with the stored procedure. I find this technique perfect for database maintenance (e.g. a financial journal posting) but not good for queries as their result set can’t be used within a SELECT statement.
        
        Anyway, as a result of all that analysis and coding the three different approaches, I will obviously stick with the in-line function for this particular call and take my hat off to the MS-SQL guys for how impressed I am with the speed of execution of what was a pretty complex nested set of SELECT statements including recursion and contains() over a fairly large SQL table.
        
        I am pretty old-school, coming from a Pascal based Network database and still getting used to trusting SQL to do its job without trying to interfere and be ‘helpful’ – e.g. by forcing it to use an intermediate table!
    2. Thomas Franz says:
      
      2022-02-09 at 20:58
      
      or you just create a #temp table which contains the result of your CTE and simply join it…. Not really hard and you could even add an fitting index onto it (if you need to read it multiple times)
      
      Reply
      1. johnatpheasantville says:
        
        2022-02-10 at 17:12
        
        Thanks for the suggestion. See my experience of utilising a temporary table in the post above on 2022-02-10 at 17:02
István Heckl says:

2022-02-10 at 11:57

It is a very nice example. But I am not sure if it is applicable for my problem. I have a Transaction and an Account table. The accounts represents a tree structure, so a full name (e.g.: mainAccount2.subAccount23.subSubAccount11) can be calculated with a recursive cte. I would like to run this cte only once. A transaction row has 4 separate accountId columns, and I want the full name for each of them. I use this construct.
, CapitalGainsTaxNameCte AS(
SELECT tr.*
, va1.FullName as CapitalGainsTaxName
, va2.FullName as InterestOrDividendName
, va3.FullName as TaxForInterestOrDividendName
, va4.FullName as FeeName
from AccountDateCte as tr
left join AccountFullNameCte as va1 on tr.CapitalGainsTaxId = va1.Id
left join AccountFullNameCte as va2 on tr.InterestOrDividendId = va2.Id
left join AccountFullNameCte as va3 on tr.TaxForInterestOrDividendId = va3.Id
left join AccountFullNameCte as va4 on tr.FeeId = va4.Id
)
But the AccountFullNameCte is executed 4 times. Can I use Cross apply in this case?
Now I see only the save the result of the cte into tempdb possibility.

Reply
johnatpheasantville says:

2022-02-10 at 17:19

Can I use Cross apply in this case? I believe (although no expert) that in order to use CROSS APPLY in your case, you would first need to convert the 4 individual fields defined for the tr table into individual rows (by using UNPIVOT). You can then CROSS APPLY to the AccountFullNameCTE passing the id column representing each account. You would need to add an (@id INT) parameter on to the AccountFullNameCte for this to work, but I have found that CROSS APPLY is very effective and neat way of integrating In-line functions and the performance is excellent (See my post on 2022-02-10 at 17:02 above for my experience when incorporating a temporary table).

Reply
Pingback: WITH-clause not pre-calculated, lineal resource wasting, why?
Nitish says:

2023-07-19 at 09:22

I tried using cte in snowflake and noticed it’s evaluated only once in the execution plan. Does it means it depends on execution engine also for that respective database how it’s evaluating cte

Reply
1. Daniel Hutmacher says:
  
  2023-07-19 at 09:51
  
  Absolutely, SQL is a declarative language. How it is actually executed under the hood depends completely on the engine.
  
  This blog post is written with SQL Server in mind.
  
  Reply

Straight to demo

The problem

Better living through CROSS APPLY

Just one more little tweak

32 comments

Leave a comment Cancel reply