Turn your list into human-readable intervals

If you’ve worked with reporting, you’ve probably come across the following problem. You have a list of values, say “A, B, C, D, K, L, M, N, R, S, T, U, Z” that you want to display in a more user-friendly, condensed manner, “A-D, K-N, R-U, Z”.

Today, we’re going to look at how you can accomplish this in T-SQL, and what this has to do with window functions and gaps and islands.

Setting up some demo data

Imagine that we have some kind of business software. Your organization is subdivided into a number of “profit centers”, represented by three-digit identifiers. Your users are assigned permissions to one or more of these profit centers.

Here’s a quick-and-dirty script to create some demo data in two tables, dbo.ProfitCenters and dbo.UserContext.

CREATE TABLE dbo.ProfitCenters (
    PC          varchar(10) NOT NULL,
    CONSTRAINT PK_ProfitCenters PRIMARY KEY CLUSTERED (PC)
);

INSERT INTO dbo.ProfitCenters (PC)
SELECT TOP (200) REPLACE(STR(5*ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-5, 3, 0), ' ', '0') AS PC
FROM sys.columns;

CREATE TABLE dbo.UserContext (
    UserID      varchar(100) NOT NULL,
    PC          varchar(10) NOT NULL,
    CONSTRAINT PK_UserContext PRIMARY KEY CLUSTERED (UserID, PC)
);

INSERT INTO dbo.UserContext (UserID, PC)
SELECT u.UserID, pc.PC
FROM (
    VALUES ('Alice'), ('Bob'), ('Charles'), ('Deborah'), ('Erik'), ('Felicia')
    ) AS u(UserID)
INNER JOIN dbo.ProfitCenters AS pc ON
    (ABS(CHECKSUM(u.UserID))%512) & CAST(pc.PC AS int) & (16+32+64+128) > 100;

Building out the logic

First thing you want to do is to order the data alphabetically. This intuitively shows us which items are in contiguous groups, and where these groups start and end.

DECLARE @UserId varchar(10)='Erik';

SELECT pc.PC, cx.PC
FROM dbo.ProfitCenters AS pc
LEFT JOIN dbo.UserContext AS cx ON pc.PC=cx.PC AND cx.UserID=@UserId
ORDER BY pc.PC;

Using a LEFT JOIN, we can see all the profit centers in the dbo.ProfitCenter table, as well as which profit centers are assigned to Erik.

Erik has profit centers 115-125, 140

Now, using a little gaps-and-islands logic, we can create a computed value that indicates the start or finish of a group.

DECLARE @UserId varchar(10)='Erik';

SELECT pc.PC, cx.PC,
       (CASE --- Previous not matched, this one is.
             WHEN LAG(cx.PC, 1) OVER (ORDER BY pc.PC) IS NULL AND cx.PC IS NOT NULL THEN 1
             --- Previous matched, this one isn't.
             WHEN LAG(cx.PC, 1) OVER (ORDER BY pc.PC) IS NOT NULL AND cx.PC IS NULL THEN 1
             --- Same
             ELSE 0 END) AS _group_increment
FROM dbo.ProfitCenters AS pc
LEFT JOIN dbo.UserContext AS cx ON pc.PC=cx.PC AND cx.UserID=@UserId
ORDER BY pc.PC;

What we’re doing in the CASE block is comparing the previous row to the current row. If the previous row was not part of the user’s permissions, and the current row is, or vice versa, we set a 1 in the “_group_increment” column.

A new group starts at 115 (matched), 130 (unmatched),
140 (matched again), and 145 (unmatched again)

Now we can add up a running total of the _group_increment column:

DECLARE @UserId varchar(10)='Erik';

WITH compare AS (
    SELECT pc.PC, cx.PC AS matched_PC,
           (CASE --- Previous not matched, this one is.
                 WHEN LAG(cx.PC, 1) OVER (ORDER BY pc.PC) IS NULL AND cx.PC IS NOT NULL THEN 1
                 --- Previous matched, this one isn't.
                 WHEN LAG(cx.PC, 1) OVER (ORDER BY pc.PC) IS NOT NULL AND cx.PC IS NULL THEN 1
                 --- Same
                 ELSE 0 END) AS _group_increment
    FROM dbo.ProfitCenters AS pc
    LEFT JOIN dbo.UserContext AS cx ON pc.PC=cx.PC AND cx.UserID=@UserId)

SELECT matched_PC, SUM(_group_increment) OVER (ORDER BY PC ROWS UNBOUNDED PRECEDING) AS _group
FROM compare
ORDER BY PC;

.. and that running total is our group number:

The running total is our group number.

Filter this list by non-NULL values and aggregate the groups, and we can extract the minimum and maximum members from each group:

DECLARE @UserId varchar(10)='Erik';

WITH compare AS (
    SELECT pc.PC, cx.PC AS matched_PC,
           (CASE --- Previous not matched, this one is.
                 WHEN LAG(cx.PC, 1) OVER (ORDER BY pc.PC) IS NULL AND cx.PC IS NOT NULL THEN 1
                 --- Previous matched, this one isn't.
                 WHEN LAG(cx.PC, 1) OVER (ORDER BY pc.PC) IS NOT NULL AND cx.PC IS NULL THEN 1
                 --- Same
                 ELSE 0 END) AS _group_increment
    FROM dbo.ProfitCenters AS pc
    LEFT JOIN dbo.UserContext AS cx ON pc.PC=cx.PC AND cx.UserID=@UserId),

groups AS (
    SELECT matched_PC, SUM(_group_increment) OVER (
        ORDER BY PC ROWS UNBOUNDED PRECEDING) AS _group
    FROM compare)

SELECT MIN(matched_PC)+'-'+MAX(matched_PC)
FROM groups
WHERE matched_PC IS NOT NULL
GROUP BY _group
ORDER BY MIN(matched_PC);

Now all we have to do is clean up intervals with just a single member (like “970-970”) and then package them together in a comma-delimited string. We can use NULLIF() to eliminate two values that are the same.

SELECT MIN(matched_PC)+ISNULL('-'+NULLIF(MAX(matched_PC), MIN(matched_PC)), '')
FROM groups
WHERE matched_PC IS NOT NULL
GROUP BY _group
ORDER BY MIN(matched_PC);

With NULLIF() we’re saying that if the first and second argument are the same, return NULL. Then we surround the “-” and the NULLIF() expression with an ISNULL().

From here on, STRING_AGG will make all of this a single line:

DECLARE @UserId varchar(10)='Erik';

WITH compare AS (
    SELECT pc.PC, cx.PC AS matched_PC,
           (CASE --- Previous not matched, this one is.
                 WHEN LAG(cx.PC, 1) OVER (ORDER BY pc.PC) IS NULL AND cx.PC IS NOT NULL THEN 1
                 --- Previous matched, this one isn't.
                 WHEN LAG(cx.PC, 1) OVER (ORDER BY pc.PC) IS NOT NULL AND cx.PC IS NULL THEN 1
                 --- Same
                 ELSE 0 END) AS _group_increment
    FROM dbo.ProfitCenters AS pc
    LEFT JOIN dbo.UserContext AS cx ON pc.PC=cx.PC AND cx.UserID=@UserId),

groups AS (
    SELECT matched_PC, SUM(_group_increment) OVER (ORDER BY PC ROWS UNBOUNDED PRECEDING) AS _group
    FROM compare),

ranges AS (
    SELECT MIN(matched_PC)+ISNULL('-'+NULLIF(MAX(matched_PC), MIN(matched_PC)), '') AS _range
    FROM groups
    WHERE matched_PC IS NOT NULL
    GROUP BY _group)

SELECT STRING_AGG(_range, ',') WITHIN GROUP (ORDER BY _range)
FROM ranges;

Final touches: One row for every user

We don’t want to run this query once for every user, so with just a little modification, we can return all of the users in a single query.

WITH compare AS (
    SELECT u.UserID, pc.PC, cx.PC AS matched_PC,
           (CASE --- Previous not matched, this one is.
                 WHEN LAG(cx.PC, 1) OVER (
                    PARTITION BY u.UserID
                    ORDER BY pc.PC) IS NULL AND cx.PC IS NOT NULL THEN 1
                 --- Previous matched, this one isn't.
                 WHEN LAG(cx.PC, 1) OVER (
                    PARTITION BY u.UserID
                    ORDER BY pc.PC) IS NOT NULL AND cx.PC IS NULL THEN 1
                 --- Same
                 ELSE 0 END) AS _group_increment
    FROM dbo.ProfitCenters AS pc
    CROSS JOIN (SELECT DISTINCT UserID FROM dbo.UserContext) AS u
    LEFT JOIN dbo.UserContext AS cx ON pc.PC=cx.PC AND cx.UserID=u.UserID),

groups AS (
    SELECT UserID, matched_PC,
           SUM(_group_increment) OVER (
                PARTITION BY UserID
                ORDER BY PC ROWS UNBOUNDED PRECEDING) AS _group
    FROM compare),

ranges AS (
    SELECT UserID, MIN(matched_PC)+ISNULL('-'+NULLIF(MAX(matched_PC), MIN(matched_PC)), '') AS _range
    FROM groups
    WHERE matched_PC IS NOT NULL
    GROUP BY UserID, _group)

SELECT UserID, STRING_AGG(_range, ',') WITHIN GROUP (ORDER BY _range)
FROM ranges
GROUP BY UserID;

We want to reset the window functions for each new user, and that’s what the PARTITION BY clause is used for.

1 comment

Leave a comment

Your email address will not be published. Required fields are marked *