Splitting a range string into a table

This week’s post is a requirement that I see very regularly as a developer. You get a plaintext string containing one or more ranges. Each range is comma delimited, and the start and end values of the range are separated by a dash. The string could look something like this, for example: 100-120,121-499,510,520,790-999.

Wouldn’t it be practical if we could construct a table value function that returns one row for each range, with columns for the start and end of each range?

The first challenge is understanding how to split a string using a delimiter of some sort. In our case, there are two delimiters: the comma (between ranges) and the dash (between each range’s start and end value). For this purpose, I find the CHARINDEX() function the most useful.

DECLARE @string varchar(100)='abcdef,ghijkl';

SELECT CHARINDEX(',', @string);

If your delimiter is a wildcard, you’d use the PATINDEX() function instead of CHARINDEX(), but the rest is the same.

The above query returns the integer value 7, because the 7th character in the string is the first comma. Now, if we combine this with the LEFT() and the SUBSTRING() functions, we can retrieve anything to the left of the comma, and anything after it.

DECLARE @string varchar(100)='abcdef,ghijkl';

SELECT
    --- The first 6 characters:
    LEFT(@string, CHARINDEX(',', @string)-1),
    --- Remaining characters, starting at offset 8:
    SUBSTRING(@string, CHARINDEX(',', @string)+1, LEN(@string));

But if the string doesn’t contain a comma, the CHARINDEX() function will return 0, in which case the LEFT() call will result in an error, because you can’t pass -1 as the second argument to the LEFT() function.

DECLARE @string varchar(100)='abcdefghijkl';

SELECT
    --- Characters before first comma:
    LEFT(@string, CHARINDEX(',', @string)-1),
    --- Characters after first comma:
    SUBSTRING(@string, CHARINDEX(',', @string)+1, LEN(@string));

This query will return

Msg 537, Level 16, State 2, Line 3
Invalid length parameter passed to the LEFT or SUBSTRING function.

To remedy this possibility, we’re going to add a comma of our own to the string in the CHARINDEX() function call, making sure it will always return a working value.

DECLARE @string varchar(100)='abcdefghijkl';

SELECT
    --- Characters before first comma:
    LEFT(@string, CHARINDEX(',', @string+',')-1),
    --- Characters after first comma:
    SUBSTRING(@string, CHARINDEX(',', @string+',')+1, LEN(@string));

Using this logic, we can now iterate through a comma-delimited string using a recursive common table expression:

DECLARE @string varchar(100)='100-120,121-499,510,520,790-999';

WITH ranges ([range], remain)
AS (
    --- This is the anchor of the recursive CTE:
    SELECT
        CAST(NULL AS varchar(max)) AS [range],
        @string AS remain
    UNION ALL
    SELECT
        --- This is the first remaining range:
        CAST(LEFT(remain, CHARINDEX(',', remain+',')-1) AS varchar(max)) AS [range],
        --- This is the remainder of the string:
        SUBSTRING(remain, CHARINDEX(',', remain+',')+1, LEN(remain)) AS remain
    FROM ranges
    WHERE remain!='')

SELECT *
FROM ranges;

This query will now return the following result set:

range     remain
NULL      100-120,121-499,510,520,790-999
100-120   121-499,510,520,790-999
121-499   510,520,790-999
510       520,790-999
520       790-999
790-999

Now all we have to do is split the “range” column once more, this time into a start and an end value, using the same logic that we applied to the comma separated list above, except we’re using the dash as the separator character, and we can safely assume that there will never be more than one dash in one range.

SELECT
    --- Anything to the left of the first dash (if any)
    LEFT([range], CHARINDEX('-', [range]+'-')-1),
    --- .. and anything after:
    SUBSTRING([range], CHARINDEX('-', [range])+1, LEN([range]))
FROM ranges
WHERE [range]!='';

Finally, we’ll package all of this code into an easy-to-use table value function:

CREATE FUNCTION dbo.fn_parseRanges(@range varchar(max))
RETURNS @tbl TABLE (
    rangeStart    varchar(max) NOT NULL,
    rangeEnd      varchar(max) NOT NULL,
    PRIMARY KEY CLUSTERED (rangeStart, rangeEnd)
)
AS

BEGIN;

    WITH ranges ([range], remain)
    AS (
        --- This is the anchor of the recursive CTE:
        SELECT
            CAST(NULL AS varchar(max)) AS [range],
            @range AS remain
        UNION ALL
        SELECT
            --- This is the first remaining range:
            CAST(LEFT(remain, CHARINDEX(',', remain+',')-1)
                AS varchar(max)) AS [range],
            --- This is the remainder of the string:
            SUBSTRING(remain, CHARINDEX(',', remain+',')+1,
                LEN(remain)) AS remain
        FROM ranges
        WHERE remain!='')

    --- Output each range into @tbl:
    INSERT INTO @tbl (rangeStart, rangeEnd)
    SELECT DISTINCT
        --- The start of the range:
        LEFT([range], CHARINDEX('-', [range]+'-')-1),
        --- ... and the end of the range:
        SUBSTRING([range], CHARINDEX('-', [range])+1,
            LEN([range]))
    FROM ranges
    WHERE [range]!='';

    RETURN;

END;

… which can be called like this:

SELECT *
FROM dbo.fn_parseRanges('100-120,121-499,510,520,790-999');

That’s it for this week. Let me know if there’s anything I’ve left out, and don’t forget to like the Facebook page to receive an update whenever there’s a new post!

Leave a comment

Your email address will not be published. Required fields are marked *