I’m trying a new type of blog post, and if it works out, I would be happy to post more of the same going forward. The format is a real-world troubleshooting mystery, and I’ll clue you in to the details along the way.
How quickly can you crack it?
Here’s a mock-up of something that happened to me today: We have two tables with a key column, ID, that joins them. The ID column has a clustered primary key on it.
CREATE TABLE #a ( ID nvarchar(4) NOT NULL, PRIMARY KEY CLUSTERED (ID) ); CREATE TABLE #b ( ID nvarchar(8) NOT NULL, PRIMARY KEY CLUSTERED (ID) );
We’re going to fill the tables with some data:
INSERT INTO #a VALUES ('1000'), ('1001'), ('1002'), ('1003'), ('1004'), ('1005'), ('1006'), ('1007'), ('1008'); --- Table #b has roughly the same data as #a. INSERT INTO #b SELECT ID FROM #a; --- ... plus one little troublemaker: INSERT INTO #b VALUES ('1001x');
10 points: first look at the problem
I was comparing data from the two tables (out of scope for this post), using a FULL JOIN, when I came upon a duplicate that I couldn’t immediately understand.
SELECT ISNULL(a.ID, b.ID) --- ... and a lot of other columns FROM #a AS a FULL JOIN #b AS b ON a.ID=b.ID;
This query returned two rows with the ID value 1001. Which is strange, as the ID column is a primary key in both tables.
For a full 10-point score, why do you think this happened?
8 points: trying to isolate the problem
We could see that the 1001 key was the troublesome row, so we’ll try to isolate it. Remember that this is a FULL JOIN, so we’ll have to be careful when we’re writing our WHERE clause, because either side of the join could be NULL:
SELECT ISNULL(a.ID, b.ID) FROM #a AS a FULL JOIN #b AS b ON a.ID=b.ID WHERE ISNULL(a.ID, b.ID)=N'1001';
Still two rows. How about if we change the WHERE clause around a little?
SELECT ISNULL(a.ID, b.ID) FROM #a AS a FULL JOIN #b AS b ON a.ID=b.ID WHERE '1001' IN (a.ID, b.ID);
Now we get a single row.
For a very respectable 8 points, have you figured out what’s going on yet?
6 points: show me the data
Let’s look at all the data. Maybe there’s something we’re missing.
SELECT a.ID, ISNULL(a.ID, b.ID), b.ID FROM #a AS a FULL JOIN #b AS b ON a.ID=b.ID WHERE ISNULL(a.ID, b.ID)=N'1001';
Here’s the output:
ID (No column name) ID ---- ---------------- ----- 1001 1001 1001 NULL 1001 1001x
For 6 points, here’s your chance to crack it before we really start touching on the problem.
4 points: another take on ISNULL()
In sheer desperation, we might try to swap ISNULL() for COALESCE(), which essentially does the same thing.
SELECT a.ID, ISNULL(a.ID, b.ID), COALESCE(a.ID, b.ID), b.ID FROM #a AS a FULL JOIN #b AS b ON a.ID=b.ID WHERE ISNULL(a.ID, b.ID)=N'1001';
And now, we’re getting somewhere:
ID (No column name) (No column name) ID ---- ---------------- ---------------- ----- 1001 1001 1001 1001 NULL 1001 1001x 1001x
… so if we just…
SELECT COALESCE(a.ID, b.ID) FROM #a AS a FULL JOIN #b AS b ON a.ID=b.ID WHERE COALESCE(a.ID, b.ID)=N'1001';
… then the duplicates are gone and the output looks the way we expected it.
For 4 points, what happened when we switched ISNULL() for COALESCE()?
2 points: everyone gets a prize!
Seriously, if make it to the end of the post, you deserve at least two points.
The secret lies in how ISNULL() and COALESCE() handle datatypes, i.e. how the implicit datatype conversions happen. You’ll notice that the ID column is an nvarchar(4) in #a, while it’s an nvarchar(8) in #b, so when you compute ISNULL(a.ID, b.ID), some kind of implicit datatype conversion needs to happen – I mean, the output column has to have some kind of datatype, so one or both of the input values will have to be implicitly converted.
You’ll find that the documentation for ISNULL says that the ISNULL() returns the same datatype as the first parameter passed to it. Meanwhile the documentation for COALESCE returns the value with the highest datatype precedence. For instance, nvarchar(4) is implicitly converted to nvarchar(8), which is also what happened in this example.
If you want more meat, Aaron Bertrand goes into plenty more detail in a very well-written post on ISNULL vs COALESCE and datatype precedence.
How did you do?
Let me know how you did! Want more? Let me know in the comments.