Although it’s rotten practice, values like dates, times, numbers, etc are often stored in columns with the wrong data type. Sometimes, it’s because the data model needs to allow for multiple types of data in the same column (sometimes even on the same row in the form of a formula or text syntax), but sometimes, it’s just plain and simple lazy.
With this type of setup, it’s only a matter of time before you run into conversion errors like this one:
Msg 245, Level 16, State 1, Line 29 Conversion failed when converting the varchar value 'xyz' to data type int.
To make things worse, these errors can be tricky to pinpoint, because they can appear to come and go without any real recognizable pattern. In this post, I’ll take a look at how this happens, how to look for these errors, and ultimately how to fix them if you can’t change the database schema.
Some sample data
Let’s begin by setting up a kind of EAV-model (entity-attribute-value) table. I’ve simplified the example – we’re not discussing EAVs themselves, but rather conversion issues with incorrect data types:
CREATE TABLE dbo.BunchOfStrings ( ID int IDENTITY(1, 1) NOT NULL, [Type] varchar(10) NOT NULL, StringValue varchar(100) NOT NULL, CONSTRAINT PK_BunchOfStrings PRIMARY KEY CLUSTERED (ID) ); --- A single text value, "xyz" INSERT INTO dbo.BunchOfStrings ([Type], StringValue) VALUES ('String', 'xyz'); --- and bunch of numeric and date values, all stored in the varchar column: WHILE (@@ROWCOUNT<1000) INSERT INTO dbo.BunchOfStrings ([Type], stringValue) SELECT 'Number', CAST(CAST(1000*RAND(CHECKSUM(NEWID())) AS int) AS varchar(100)) FROM dbo.BunchOfStrings UNION ALL SELECT 'Date', CONVERT(varchar(100), DATEADD(hh, 365*24*RAND(CHECKSUM(NEWID())), {d '2016-01-01'}), 121) FROM dbo.BunchOfStrings;
Converting the number values
Right off the bat, let’s see if we can isolate a few numeric values, convert them to int and display them:
SELECT CAST(StringValue AS int) AS NumberValue FROM dbo.BunchOfStrings WHERE [Type]='Number';
Sure enough, the query works, and this is what the query plan looks like:
As you can see, the index really doesn’t do us any good, since it just orders the table by its identity column, resulting in a clustered index scan. The Filter operator isolates only rows where Type=’Number’, and finally Compute Scalar performs the actual data type conversion, turning the string value of each row into an integer value.
If you hover over the Filter operator, you’ll see that its predicate (the filter condition) is:
[BunchOfStrings].[Type]='Number'
We’ll get back to the predicate in a second. This query works great because every value that we’ve stored with Type=’Number’ is indeed numeric and returns a valid integer upon conversion.
Let’s complicate things, though, and add a second criteria. Take a look at this query:
SELECT CAST(StringValue AS int) AS NumberValue FROM dbo.BunchOfStrings WHERE [Type]='Number' AND CAST(StringValue AS int)<500;
The query plan still contains the same operators:
But the predicate of the Filter operator has changed:
[BunchOfStrings].[Type]=[@1] AND CONVERT(int, [BunchOfStrings].[StringValue], 0)<CONVERT_IMPLICIT(int, [@2], 0)
You can’t see the difference in the graphical plan, but the predicate of the Filter operator now also filters on StringValue<500. Also, we now have a parameterized plan – our hard-coded values, “Number” and 500, have been replaced by parameters in the query. The idea is that this query will fit similar future queries as well, say, if we want rows where StringValue<100, or whatever. What’s important to note here is also the order of the conditions: The query plan filters by Type first, then by StringValue.
Query plans and caching
Query plans aren’t neccessarily generated from scratch every time you run a query – they’re cached so they can be reused if the same (or a very similar) query runs again. This is particularly important if you’re running an OLTP system with a lot of similar queries, where saving a few milliseconds for each and every query can vastly improve the performance of your server as a whole.
But cached plans expire, for any of a number of reasons: for instance, if you haven’t run the query for a long time, if the relevant statistics on a table change, if you change or rebuild a table or index used by the query, or if you restart the server.
Now, let’s simulate a server restart or a few days gone by. We can provide a query hint, WITH (RECOMPILE), which will force SQL Server to generate a new query plan from scratch. Everything else about the query is unchanged.
SELECT CAST(StringValue AS int) AS NumberValue FROM dbo.BunchOfStrings WHERE [Type]='Number' AND CAST(StringValue AS int)<500 OPTION (RECOMPILE);
Now, let’s look at the plan:
Still the same operators and execution order, but when we look at the predicate of the Filter, can you spot the difference?
CONVERT(int, [BunchOfStrings].[StringValue], 0)<(500) AND [BunchOfStrings].[Type]='Number'
And, voilà, this is what happens when you try to convert StringValue to an integer before you’ve filtered the Type column:
Msg 245, Level 16, State 1, Line 29 Conversion failed when converting the varchar value 'xyz' to data type int.
Remember that in our test data, “xyz” is the StringValue that we created first, with Type=’String’. Before we’ve filtered on Type, we’re trying to convert ‘xyz’ to an integer so we can compare it to 500.
On implicit conversions
When comparing a value of data type A to one of data type B, there is a deterministic order in which the implicit conversion happens. This is known as data type precedence, and here are of some of the more common data types by precedence, from high to low:
- datetime
- date
- time
- decimal
- int
- text
- nvarchar (including nvarchar(max) )
- varchar (including varchar(max) )
- char
- varbinary (including varbinary(max) )
- binary (lowest)
So, for instance, when comparing a datetime column to a date column, the date column (lower priority) will be implicitly converted to datetime (higher priority).
Strategies to avoid conversion errors
Use the correct data types from the start
This one goes first and is a no-brainer. Store dates in dates, numbers, etc in their respective correct data types.
Use TRY_CAST() or TRY_CONVERT()
This is a lazy but relatively elegant and easy-to-read strategy that works if you’re on SQL Server 2012 or newer. Whereas CAST() will convert a value explicitly and crash on conversion errors, TRY_CAST() does exactly the sames, but when there’s a conversion error, it will simply return a NULL value.
To fix the query above, you could simple exchange CAST() with TRY_CAST(), leaving everything else unchanged:
SELECT TRY_CAST(StringValue AS int) AS NumberValue FROM dbo.BunchOfStrings WHERE [Type]='Number' AND TRY_CAST(StringValue AS int)<500 WITH (RECOMPILE);
Note that you’ll have to design your code to handle NULL values, even though the expression you’re converting may not itself be nullable. In our example, the WHERE clause takes care of that – any value that results in TRY_CAST()=NULL will be removed because NULL compared to anything is always false.
Explicitly convert everything to varchar first
When comparing values of two data types, SQL Server needs to implictly convert one of the values to the data type of the other. For instance, when comparing 123 < ‘456’, the text value is converted to an integer, then the two are compared. This is what happened in the first example of this post. Instead, you could explicitly convert the integer value to a string value and perform the comparison.
The conversion ‘123’ < ‘456’ is perfectly valid, and so is ‘xyz’ < ‘456’. However, string comparisons work differently than text comparisons, so you need to stay on top of what types of values you’ll be comparing. Consider for instance that ‘100’ is less than ’20’, because the comparison is alphabetical. ‘020’, however, is less than ‘100’. So for int values, padding any value with zeroes is one way to go:
SELECT CAST(StringValue AS int) AS NumberValue FROM dbo.BunchOfStrings WHERE [Type]='Number' AND RIGHT('00000000'+StringValue, 8)<'00000500' WITH (RECOMPILE);
The query plan is the same, but predicate for the Filter operator will now look like this:
right([BunchOfStrings].[StringValue], (8))<'00000500' AND [BunchOfStrings].[StringValue]='Number'
1 comment