A quick look at SQL Server UTF-8 collations

A client asked me about SQL Server collations, and if they should consider the new UTF8 collations (new since SQL Server 2019). I tried to hide my blank stare of ignorance, and promised them I’d look it up and get back to them.

Not gonna lie, I think UTF and Unicode can be pretty confusing at times, so I did some googling and some testing, and here’s what I found.

Collations and case sensitive wildcards

The other day, I ran across the following issue: I was looking to split a value using upper case and lower case characters. Here’s an example query:

SELECT PATINDEX('%[a-z][A-Z]%',
    'testValue' COLLATE Finnish_Swedish_CS_AS) AS offset

I expected this query to return the value 4, because at that offset in the string, there is a lower case character (t) followed directly by an upper case character (V). However, in practice, this query returns the value 1.