Basic string parsing

A lot of times, T-SQL queries aren’t just for compiling huge sets of aggregate calculations or advanced joins and indexing strategies. Sometimes, you need the skills and tools to do some basic string parsing. Here, you’ll learn the most common basic string parsing and functions found in T-SQL.

Basic T-SQL string functions

Some of the most commonly used string functions are the LEFT() and RIGHT() functions along with SUBSTRING(). LEFT(a, n) returns the first “n” characters of “a”, just like you’d expect in, say, Basic or many other programming languages. RIGHT(a, n) works in the same way, but the returned string is “n” characters from the end. SUBSTRING(a, offset, length) corresponds to the VisualBasic Mid() function – it returns “length” characters from “a”, starting at “offset”. LEN() returns the length of a string.

DECLARE @str varchar(100);
SET @str='Another brown fox jumps over something';

SELECT LEFT(@str, 7),
       SUBSTRING(@str, 15, 3),
       RIGHT(@str, 9),
       LEN(@str);

Finding a character or a wildcard

There are basically two different functions for finding a character string or a wildcard in T-SQL, CHARINDEX() and PATINDEX(). They work pretty much the same way, but the difference is that CHARINDEX(s, a) returns the first offset of the character string “s” in “a”, whereas PATINDEX(s, a) returns the first offset where the wildcard “s” is true in “a”. Wildcards use the same syntax as you would apply in a LIKE clause. For both functions, the return value 0 means that there was no match.

DECLARE @str varchar(100);
SET @str='Another brown fox jumps over something';

SELECT CHARINDEX('fox', @str),
       PATINDEX('%jump%', @str);

Splitting a string value

That’s basically all you need to split a string into, say, words.

DECLARE @str varchar(100), @offset int;
SET @str='Another brown fox jumps over something';

WHILE (@str!='') BEGIN;
    SET @offset=CHARINDEX(' ', @str+' ');
    SELECT LEFT(@str, @offset-1);
    SET @str=SUBSTRING(@str, @offset+1, LEN(@str));
END;

What the above query does is:

  • Loop until @str is blank.
  • Find the next space (note the @str+’ ‘. This so we’ll find the last word, which doesn’t have a trailing space!)
  • SELECT everything to the left of @offset (we’re using @offset-1, so we don’t include the space itself)
  • Truncating the beginning of @str with @offset+1 (to include the space) characters.
  • When we’ve truncated the last word from @str, the variable will be blank and the loop will stop.

… or with a recursive CTE

If you want extra points for finesse, use a recursive common table expression.

DECLARE @str varchar(100), @offset int;
SET @str='Another brown fox jumps over something';

WITH words (word, str_remain)
AS (
    --- Anchor part
    SELECT CAST(NULL AS varchar(100)) AS word,
           @str AS str_remain

    UNION ALL

    --- Recursion
    SELECT LEFT(str_remain,
                CHARINDEX(' ', str_remain+' ')-1) AS word,
        SUBSTRING(str_remain,
                CHARINDEX(' ', str_remain+' ')+1,
                LEN(str_remain)) AS str_remain
    FROM words
    WHERE str_remain!='')

SELECT word
FROM words
WHERE word IS NOT NULL;

Recursive common table expressions are the topic of a different post.

Let me hear your thoughts!

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s