Indexing for substring search

A question from a client got me thinking. Relational databases (at least the ones I know and love) can’t really index for queries that use LIKE queries for a substring of a column value. If you want to search for strings beginning with a given string, a regular rowstore index will have you covered. If you’re looking for entire words or sentences, a full text index might be a good call. But because of the very way indexes work, you’ll never get great performance searching for just arbitrary parts of a string.

So today I’ll put on my lab coat and do a little rocket surgery, just to prove to the world that it can be done.

Professional driver on closed roads, always wear protection. Your mileage may vary.

Collations and case sensitive wildcards

The other day, I ran across the following issue: I was looking to split a value using upper case and lower case characters. Here’s an example query:

SELECT PATINDEX('%[a-z][A-Z]%',
    'testValue' COLLATE Finnish_Swedish_CS_AS) AS offset

I expected this query to return the value 4, because at that offset in the string, there is a lower case character (t) followed directly by an upper case character (V). However, in practice, this query returns the value 1.