Implicit conversions in SQL Server follow a specific, predictable order, called data type precedence. This means that if you compare or add/concatenate two values, a and b, with different data types, you can predict which one will be implicitly converted to the data type of the other one in order to be able to complete the operation.
I stumbled on an interesting exception to this rule the other day.
Consider the following example table of integers:
CREATE TABLE #integers ( i int NOT NULL, CONSTRAINT PK PRIMARY KEY CLUSTERED (i) ); INSERT INTO #integers (i) SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM sys.messages;
Let’s look at an example:
DECLARE @s varchar(100)='123'; SELECT * FROM #integers WHERE [email protected];
In this query, data type precedence determines that @s will be converted to an int value, because int (the data type of the “i” column) has higher precedence than varchar. You can see the results of this conversion when we look at the resulting execution plan:
Now, because we’re looking for an int value in an indexed int column, SQL Server can use a Clustered Index Seek on that column. So what would happen if we change the WHERE clause from an equality condition to a wildcard?
DECLARE @s varchar(100)='123'; SELECT * FROM #integers WHERE i LIKE @s;
Turns out that with a LIKE comparison, both sides need to be string values, so in this case SQL Server will convert the “i” column to a varchar. The effect of this, however, is that this search predicate is no longer sargable, which means we need to scan through the entire table rather than seeking to a specific point in the index.
How about dates?
Let’s try something similar, but instead of integers we’ll look at date columns:
CREATE TABLE #dates ( [Date] date NOT NULL, [DateTime] datetime2(7) NOT NULL, CONSTRAINT UQ_Date UNIQUE ([Date]), CONSTRAINT UQ_Datetime PRIMARY KEY CLUSTERED ([DateTime]) ); INSERT INTO #dates ([Date], [DateTime]) SELECT TOP (1000) DATEADD(day, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), '2000-01-01'), DATEADD(minute, 123*ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), '2000-01-01') FROM sys.messages;
Will an equality condition on a date column work the same way it did with an integer?
DECLARE @s varchar(100)='2000-01-02'; SELECT * FROM #dates WHERE [Date][email protected];
Yes. With an equality condition, data type precedence applies, and the varchar parameter is converted to date. We get a sargable search condition that allows us to use a Seek on the index. The same thing applies if we use a different date format in @s:
DECLARE @s varchar(100)='2000.01.02'; SELECT * FROM #dates WHERE [Date][email protected];
The reason for this is that SQL Server does not make any assumptions with regards to the format of the date in the @s variable: Note the third argument of CONVERT_IMPLICIT, which is “0”. This argument sets the string format for the conversion. For instance, 121 is the ISO standard yyyy-mm-dd, 110 is the US mm-dd-yyyy, and so on. A zero means “best guess”.
So what happens if we try a wildcard on a date or datetime column?
DECLARE @s varchar(100)='2000-01-02'; SELECT * FROM #dates WHERE [DateTime] LIKE @s+'%';
Turns out, this behaviour is consistent across data types.
Just like when we tried a wildcard condition with the int column, the datetime2 column is converted to a varchar, and then tested for the LIKE condition.
Note that when SQL Server implicitly converts a date or datetime column to varchar, it does so using the 121 format for the conversion, which means that the date is represented as yyyy-mm-dd hh:mi:ss.mmm. This becomes important when you apply a string comparison. The following comparisons will work:
WHERE [DateTime] LIKE '2021-%' -- all 2021 transactions WHERE [DateTime] LIKE '2021-02%' -- all February, 2021 transactions WHERE [DateTime] LIKE '2021-02-03%' -- all February 3rd, 2021 transactions
The following will run, but won’t return any rows:
WHERE [DateTime] LIKE '%/%/2021%' -- all 2021 transactions WHERE [DateTime] LIKE '2/%/2021%' -- all February, 2021 transactions WHERE [DateTime] LIKE '2/03/2021%' -- all February 3rd, 2021 transactions
The reason being that the implicitly converted datetime string is in ISO format, so the strings won’t match the wildcard.
But please don’t implicit conversions.
Needless to say, even if you can get something like this to run and return correct results, please don’t. The better strategy is to avoid implicit conversions and create a sargable query.
WHERE [DateTime]>='2021-01-01' AND [DateTime]<'2022-01-01' -- all 2021 transactions WHERE [DateTime]>='2021-02-01' AND [DateTime]<'2022-03-01' -- all February, 2021 transactions WHERE [DateTime]>='2021-02-03' AND [DateTime]<'2022-02-04' -- all February 3rd, 2021 transactions
All three of these queries are sargable, will benefit from column statistics, and will result in lightning-fast Seek queries, rather than scanning the entire table and converting every date to a string before making a text comparison.
But wait, there’s more!
Turns out, as Dirk Hondong (b|t) pointed out in the comments, that there’s a difference between how SQL Server converts different date/time data types to strings. The newer date and datetime2 types will convert to the ISO “2021-02-03” and “2021-02-03 00:00:00.000” formats, respectively, while the legacy datetime type will convert to the traditional “Feb 3 2021 12:00AM”. I suppose this is for backward compatibility. Needless to say, the datetime data type is not recommended for new development.
7 thoughts on “About wildcards and data type precedence”
Interesting, and it makes sense really, I mean how would LIKE compare two numbers anyway. Also interesting is that once the comparison is between two strings of different data types, they do get converted according to the precedence (e.g. varchar to nvarchar).
I’m a little old-school myself. I’d like a LIKE comparison to fail on non-string data types. But that’s just me. :)
I’m actually glad things like (pun intended) this work. You just can’t tell when an “undiscovered use” might pop up. To me, SQL is a programming language, not a GUI, and I don’t actually want to be protected every step of the way.
I can’t think of a recent example but when I had to do some work in Oracle a very long time ago, it wouldn’t let me update a single variable using an overlay by a single SELECT like we can in SQL Server as a “pseudo cursor” instead of writing a WHILE Loop.
And, sorry, I don’t remember exactly what I was trying to calculate that required such an action.
Can such a thing allow people to make a mistake? Absolutely… and it happens in programming languages all the time. One person’s fault in a programming language is another person’s feature. The way UPDATE works in SQL Server compared to Oracle is another example that I’ve used (Data cleanup of international telephone numbers using a right to left lookup on the single column that contained the Country Code and City Code)
I just tried your example today.
What I figured out this morning in one of my AdminDBs, where I capture for example wait stats: the like condition did not work and it took about 20 minutes to figure out why.
I just have a datetime column, not datetime2.
So the answer to your init question can also be 0 rows if you just have datetime and not datetime2.
(or I have a complete error in thinking today…) :-)
Interesting! I tried date and datetime2 columns, but not specifically datetime. Will investigate! :)
Pingback: Wildcards and Data Type Precedence – Curated SQL
In your new addition to the article, you state “Needless to say, the datetime data type is not recommended for new development.”.
I frequently recommend the exact opposite. The DATETIME datatype actually follows ISO standards everywhere else except the default output format, which I don’t trust on any of the temporal datatypes anyway. The newer data types (everything other than DATETIME and SMALLDATETIME) are NOT actually ISO compliant..
The ISO standards state that temporal data should allow for the following formula and all related “mutations” of the formula to work.
Periiod = EndDateTime – StartDateTime
Try that or any of the mutations of the formula on any of the newer datatypes and you’ll get a failure in return.
And, yeah, we ARE talking about “Direct Date/Time Math”, which a whole lot of people inappropriately poo-poo. They say it’s a “Best Practice” to not do direct date math but that supposed “Best Practice” has no actual standing and causes much more seriously complicated date math to be created by the uninitiated.
A good example of that is in the original formula cited above. Another name for “Period” is “Duration”. Using a start and end date and time column in a table, calculate of all the durations and express the answer in the form of DD:HH:MI:SS for each row. Do that using the newer datatypes and then read the following article.