This article will show you how to create a “top 10” list of data, as well as providing an aggregate on the remainder of the data. There are a few different approaches to this.
The example queries in this post are all built on the AdventureWorks database, but you can easily adapt them to pretty much any other table you like.
The simple TOP 10 list
The simplest query is a regular SELECT query with a TOP keyword, but it won’t provide the “remainder” aggregate that we’re looking for:
SELECT TOP 10 SalesOrderID, SalesOrderNumber, SubTotal FROM Sales.SalesOrderHeader ORDER BY SubTotal DESC;
Using a common table expression and windowed functions
If you use the above query within a common table expression, you can write a UNION of two queries; one with the top 10 list, and the other with the aggregate of the remaining rows.
The ordinal number (1st place, 2nd place, etc) in the common table expression is calculated using a ROW_NUMBER() windowed function. If you’re using GROUP BY in the CTE, you might have to change ROW_NUMBER() to DENSE_RANK().
WITH topHeaders (SalesOrderNumber, SubTotal, Ordinal) AS (--- Give each row an ordinal number SELECT SalesOrderNumber, SubTotal, ROW_NUMBER() OVER (ORDER BY SubTotal DESC) AS Ordinal FROM Sales.SalesOrderHeader) --- The top 10 rows: SELECT SalesOrderNumber, SubTotal, Ordinal FROM topHeaders WHERE Ordinal<=10 UNION ALL --- .. and the remaining aggregate SELECT 'Others', SUM(SubTotal), NULL FROM topHeaders WHERE Ordinal>10;
This query might not be optimal if your source data contains a lot of rows, because it will enumerate the entire table with ordinal numbers. There are, however, few simple solutions to this performance problem.
Using a temp table to optimize performance
With large quantities of data, using a temp table could actually lead to a query plan that is more optimized for performance. This is not a given, because it may vary from case to case. As usual, use the estimated query plan to see which solution is best for you.
--- This is the temp table we're going to be using: CREATE TABLE #top10 ( SalesOrderNumber nvarchar(50) NOT NULL, SubTotal money NOT NULL, Ordinal int NOT NULL, PRIMARY KEY CLUSTERED (Ordinal) ); --- Adding the top 10 rows into the temp table: INSERT INTO #top10 (SalesOrderNumber, SubTotal, Ordinal) SELECT TOP 10 SalesOrderNumber, SubTotal, ROW_NUMBER() OVER (ORDER BY SubTotal DESC) AS Ordinal FROM Sales.SalesOrderHeader ORDER BY SubTotal DESC; --- Rows from the #top10 temp table: SELECT SalesOrderNumber, SubTotal, Ordinal FROM #top10 UNION ALL --- .. and the remainder: SELECT NULL, --- The remainder is the grand total of the base table, --- minus the sum of the top 10. (SELECT SUM(SubTotal) FROM Sales.SalesOrderHeader)- (SELECT SUM(SubTotal) FROM #top10) AS SubTotal, NULL; DROP TABLE #top10;
The main difference between this query and the previous one is that when populating the temp table, we’re using the TOP operator along with ORDER BY to limit the number of rows that we calculate the ROW_NUMBER() on. This should, in theory, reduce the workload for the SQL Server when there is a large volume of rows in the table.
What about naming the remaining “Other” and ordering at an “11” vs having it pull as null in both fields not sales?