Using TOP n WITH TIES

Here’s an interesting feature to that may greatly simplifie a few queries. TOP n WITH TIES works pretty much like TOP, except it orders the top records “densely” to include tied values.

A regular TOP query, without WITH TIES

The following query will return two rows:

WITH work (a, b)
AS (SELECT 1, 'One' UNION ALL
    SELECT 2, 'Two' UNION ALL
    SELECT 2, 'Two again' UNION ALL
    SELECT 3, 'Three' UNION ALL
    SELECT 4, 'Four')

SELECT TOP 2 a, b
FROM work
ORDER BY a;

However, because there are two values where a=2, which of the two rows (“Two” or “Two again”) that appears as row number two is arbitrary, because the ordering clause is not unique for the table.

TOP 2 WITH TIES

If you add “WITH TIES” to the query above, the returned recordset will include three rows, because there is one row with a=1 and two rows with a=2.

WITH work (a, b)
AS (SELECT 1, 'One' UNION ALL
    SELECT 2, 'Two' UNION ALL
    SELECT 2, 'Two again' UNION ALL
    SELECT 3, 'Three' UNION ALL
    SELECT 4, 'Four')

SELECT TOP 2 WITH TIES a, b
FROM work
ORDER BY a;

Using TOP n WITH TIES with windowed aggregates

Suppose you have a table with server uptimes for a few days, and you want to look at the day-to-day figures for the worst-performing server. Here’s the base data:

WITH uptimes (serverName, [date], uptime)
AS (SELECT 'SQL1', {d '2013-10-01'}, 100.0 UNION ALL
    SELECT 'SQL1', {d '2013-10-02'}, 100.0 UNION ALL
    SELECT 'SQL1', {d '2013-10-03'}, 100.0 UNION ALL
    SELECT 'SQL2', {d '2013-10-01'},  96.8 UNION ALL
    SELECT 'SQL2', {d '2013-10-02'}, 100.0 UNION ALL
    SELECT 'SQL2', {d '2013-10-03'},  99.9 UNION ALL
    SELECT 'SQL3', {d '2013-10-01'},  97.9 UNION ALL
    SELECT 'SQL3', {d '2013-10-02'},  98.7 UNION ALL
    SELECT 'SQL3', {d '2013-10-03'}, 100.0)

SELECT *
FROM uptimes;

If you just use “TOP 1 .. ORDER BY uptime”, you will get only one record. That record will point to the single worst uptime for one day, which is SQL2 on october 1. What we’re looking for, however, is the worst average for a server over the three days, which should be SQL3.

You could use GROUP BY and add an aggregate function, but that means you would lose the details because you aggregate all the values. So, instead we’ll use a windowed function, like this:

WITH uptimes (serverName, [date], uptime)
AS (SELECT 'SQL1', {d '2013-10-01'}, 100.0 UNION ALL
    SELECT 'SQL1', {d '2013-10-02'}, 100.0 UNION ALL
    SELECT 'SQL1', {d '2013-10-03'}, 100.0 UNION ALL
    SELECT 'SQL2', {d '2013-10-01'}, 96.8 UNION ALL
    SELECT 'SQL2', {d '2013-10-02'}, 100.0 UNION ALL
    SELECT 'SQL2', {d '2013-10-03'}, 99.9 UNION ALL
    SELECT 'SQL3', {d '2013-10-01'}, 97.9 UNION ALL
    SELECT 'SQL3', {d '2013-10-02'}, 98.7 UNION ALL
    SELECT 'SQL3', {d '2013-10-03'}, 100.0)

SELECT *, AVG(uptime) OVER (PARTITION BY serverName)
FROM uptimes;

Notice how the right-most column returns the average of the uptime for each server, as specified in the PARTITION BY clause of the windowed function. Now, all we have to do is move the AVG(update) column from the SELECT block to ORDER BY and add TOP 1 WITH TIES.

WITH uptimes (serverName, [date], uptime)
AS (SELECT 'SQL1', {d '2013-10-01'}, 100.0 UNION ALL
    SELECT 'SQL1', {d '2013-10-02'}, 100.0 UNION ALL
    SELECT 'SQL1', {d '2013-10-03'}, 100.0 UNION ALL
    SELECT 'SQL2', {d '2013-10-01'}, 96.8 UNION ALL
    SELECT 'SQL2', {d '2013-10-02'}, 100.0 UNION ALL
    SELECT 'SQL2', {d '2013-10-03'}, 99.9 UNION ALL
    SELECT 'SQL3', {d '2013-10-01'}, 97.9 UNION ALL
    SELECT 'SQL3', {d '2013-10-02'}, 98.7 UNION ALL
    SELECT 'SQL3', {d '2013-10-03'}, 100.0)

SELECT TOP 1 WITH TIES *
FROM uptimes
ORDER BY AVG(uptime) OVER (PARTITION BY serverName);

This way we get all the rows for the top 1 worst server average.

Credit

Thanks to my colleague Tobias Lindh for the tip!

Leave a comment

Your email address will not be published. Required fields are marked *