A short tutorial on common table expressions (CTE)

What’s a common table expression?

Common table expressions are a great way to clean up your code and make it more readable. A CTE resembles a subquery, but unlike subqueries, you can re-use the same data set over and over again in your code. Sometimes, you’ll even find that the query optimizer does a better job when you use a common table expression, resulting in a faster-running, more efficient query.

Example

In the AdventureWorks database, there’s a table called Sales.SalesPersonQuotaHistory that contains historic sales quotas per employee (called “BusinessEntityID” for some reason). It looks something like this:

SELECT BusinessEntityID, QuotaDate, SalesQuota
FROM Sales.SalesPersonQuotaHistory
ORDER BY BusinessEntityID, QuotaDate;

The QuotaDate is the starting date of each new quota, as it’s continuously adjusted. Say we want to present this table with date intervals (the starting and ending date) of each quota for each sales person. We could use a common table expression based on the query above.

WITH quota_cte (BusinessEntityID, QuotaDate, SalesQuota, QuotaVersion)
AS (
   SELECT BusinessEntityID, QuotaDate, SalesQuota,
   ROW_NUMBER() OVER (
      PARTITION BY BusinessEntityID
      ORDER BY QuotaDate) AS QuotaVersion
   FROM Sales.SalesPersonQuotaHistory)

SELECT *
FROM quota_cte
ORDER BY BusinessEntityID, QuotaVersion;

Note how we’ve defined a common table expression “quota_cte”, that we use just like any ordinary table or view in the query. The magic happens in the WITH clause, where we define the CTE. I’ve added an integer column to the CTE called QuotaVersion which is basically a counter. Each sales person’s first quota has QuotaVersion=1, the next one has QuotaVersion=2, and so on. We’re going to need this information in the next step when we join the common table expression quota_cte with itself.

WITH quota_cte (BusinessEntityID, QuotaDate, SalesQuota, QuotaVersion)
AS (
   SELECT BusinessEntityID, QuotaDate, SalesQuota,
   ROW_NUMBER() OVER (
      PARTITION BY BusinessEntityID
      ORDER BY QuotaDate) AS QuotaVersion
   FROM Sales.SalesPersonQuotaHistory)

SELECT a.BusinessEntityID, a.QuotaDate AS QuotaStartDate,
   b.QuotaDate AS QuotaEndDate, a.SalesQuota
FROM quota_cte AS a
LEFT JOIN quota_cte AS b ON
   a.BusinessEntityID=b.BusinessEntityID AND
   a.QuotaVersion+1=b.QuotaVersion
ORDER BY a.BusinessEntityID, a.QuotaVersion;

In this query we’ve used quota_cte twice, but we only had to define it once, in the WITH clause! The alias “a” represents the “current” QuotaDate, and the alias “b” is the next in chronological order (if any).

Multiple CTEs in the same query

You can define multiple common table expressions in one WITH clause, separated by commas. They can even reference each other.

WITH one (x)
AS (SELECT 1 AS x),

     two (y)
AS (SELECT x+1 AS y FROM one),

     three (z)
AS (SELECT y+1 AS z FROM two)

SELECT one.x, two.y, three.z
FROM one
CROSS JOIN two
CROSS JOIN three;

Magic.

The semicolon

Because common table expressions were introduced long after SQL Server supported various forms of WITH clauses (table and index hints and lots of other examples), you need to specifically tell SQL Server that this is a common table expression, and not for instance an index hint. This is done by prefixing every CTE with a semicolon.

In the future, all T-SQL statements will have to be terminated with semicolons, so this may actually be a good time to start.

Recursive common table expressions

CTEs can be made recursive, so they can iterate counters, aggregates, etc. But that is the subject of a separate blog post.

6 comments

Leave a comment

Your email address will not be published. Required fields are marked *