I recently ran into a curious deadlock issue. I have a process that performs a lot of updates in a “state” table using multiple, concurrent connections. The business logic in the application guarantees that two connections won’t try to update the same item, so we shouldn’t ever run into any locking issues. And yet, we keep getting deadlocks.
What’s going on here? Hint: it has to do with isolation levels and range locks.
A single statement that does everything.
Let’s set up a very simple test table and give it some test data:
CREATE TABLE dbo.ApplicationState ( id int NOT NULL, val int NOT NULL, PRIMARY KEY CLUSTERED (id) ); INSERT INTO dbo.ApplicationState (id, val) VALUES (100, 0), (10000, 0);
We have an app that very frequently and highly concurrently updates the ApplicationState table with a MERGE statement. The great thing with the MERGE statement is that you don’t have to check if the row exists. With a single statement, SQL Server updates or inserts a row depending on if it already exists or not:
MERGE INTO dbo.ApplicationState WITH (HOLDLOCK) AS t USING ( VALUES (@id, @val) ) AS s(id, val) ON t.id=s.id WHEN NOT MATCHED THEN INSERT (id, val) VALUES (s.id, s.val) WHEN MATCHED THEN UPDATE SET t.val=s.val;
You’ll notice that there’s a HOLDLOCK hint on the target table. We’ll need to talk a little about this.
MERGE has some issues
Years ago, Aaron Bertrand compiled an excellent blog post with all the reported issues with the MERGE statement (there are many), and a surprising amount of them are marked as “Won’t fix” by Microsoft – either because it’s “By design” or because it’s just plain-old not important enough for Microsoft to fix them.
My own takeaway from this post is that if you’re on an older SQL Server version, or if you do funky things like filtered indexes or indexed views, you should carefully review the list if there are issues that might impact you. For a very simple use-case like the one we’re working on here, we’ll probably be fine. One of the generally accepted recommendations to avoid most of these issues is to add a WITH (HOLDLOCK) hint to the target table.
The HOLDLOCK hint tells SQL Server to use the Serializable transaction isolation level. I have a whole post on isolation levels, but you can think of isolation levels as the server’s “SLA”, a promise if you will, to the client application.
- The default “Read Committed” isolation level, basically means: “I promise that what I showed you was accurate and committed at the time.”
- You may have played around with “Read Uncommitted” (the NOLOCK hint). It roughly translates to “To somebody, this may have been accurate at the time, but we don’t know if that transaction will even be committed.”
- The “Serializable” isolation level is the strictest of them all. “I promise that what you saw will still be true if and when you look again.”
What HOLDLOCK does to a MERGE statement
To maintain an isolation level’s promise, and certainly that of the Serializable isolation level, SQL Server needs to place locks. With Read Committed, locks are placed on objects that you modify, in order to prevent other processes from reading something that you may or may not choose to commit later on.
But Serializable places locks even on things you look at – because if you’ve looked at it, SQL Server makes sure that it looks the same if you come back and look at it later on.
Here’s what our MERGE statement looks like in the execution plan. This explains pretty well what the MERGE statement does:
From right to left, the first order of business is to establish if the row exists or not. That’s the Clustered Index Seek in our case. A tiny fraction of a millisecond later, the outcome of that Seek determines if we perform an UPDATE or an INSERT in the Clustered Index Merge operator to the left.
In a database with high transaction throughput, there’s always the risk that the table could change in the short time after we perform the Seek, but before we execute the Merge. If that were to happen, we would get an unexpected outcome, like update a row that no longer exists, or inserting a row that someone else has already created.
Key lock vs. Range lock
If the Seek hits an existing row, SQL Server will place a key lock on that specific row. This prevents other processes from changing this exact row until we’re finished with it. Remember, because we’re in Serializable, we expect the row we saw to be there next time we need it, and to look just the way we found it.
But if we didn’t find a row, there’s nothing specific to lock, because locks can only be placed on keys (rows), pages, partitions, tables and schemas. SQL Server cannot lock an arbitrary search criteria like “WHERE id=123”, because that would create a computational nightmare. To solve this, SQL Server will place a range lock, which holds not a specific key, but everything between two existing keys.
As you can see, our single-point lookup just got a little more invasive.
If we have a table with two rows, id=100 and id=10000, a Seek on id=123 would place a Range lock on 100>id≥1000.
Reconstructing the deadlock
Because a single-row update in a MERGE statement happens in a fraction of a millisecond, and because it’s essentially a two-phase operation, we can rewrite the logic of the MERGE like this:
DECLARE @id int=500, @val int=1234; --- THIS IS SPID 93 BEGIN TRANSACTION; IF (EXISTS (SELECT NULL FROM dbo.ApplicationState WITH (HOLDLOCK) WHERE id=@id) ) BEGIN; WAITFOR DELAY '00:00:10'; UPDATE dbo.ApplicationState SET val=@val WHERE id=@id; END ELSE BEGIN; WAITFOR DELAY '00:00:10'; INSERT INTO dbo.ApplicationState (id, val) VALUES (@id, @val); END; COMMIT TRANSACTION;
I’ve added the 10-second delay to artificially slow down the query. Immediately after starting this in one connection, let’s update an entirely different row in another connection. It doesn’t matter if we use the regular MERGE statement here:
DECLARE @id int=1000, @val int=5678; --- THIS IS SPID 94 MERGE INTO dbo.ApplicationState WITH (HOLDLOCK) AS t USING ( VALUES (@id, @val) ) AS s(id, val) ON t.id=s.id WHEN NOT MATCHED THEN INSERT (id, val) VALUES (s.id, s.val) WHEN MATCHED THEN UPDATE SET t.val=s.val;
Here’s what happens in chronological order:
- SPID 93 checks dbo.ApplicationState for a row with id=500. There’s no such row, but because of the HOLDLOCK hint, it places a shared range lock for [100>id≥1000]. Shared means it’s readable to other processes, but nobody else can change what’s locked.
- SPID 94 checks dbo.ApplicationState for a row with id=1000. Again, no such row exists, so this process also places a range lock on [100>id≥1000]
- SPID 93 wants to INSERT a row with id=500, but it needs to wait on SPID 94’s range lock before proceeding.
- SPID 94 wants to INSERT a row with id=1000, and ends up waiting for SPID 93’s range lock.
- Both SPIDs are waiting on each other, and that’s the deadlock. SQL Server will (after a few seconds) terminate the process that has performed the least work by rolling back its transaction.
Msg 1205, Level 13, State 48, Line 4 Transaction (Process ID 94) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
The simplest fix
The easiest fix is not to fix the problem. Your application should retry failed transactions, and that could be the end of this.
This may be fine if you can wait for the deadlock to be resolved. But remember that there is a background process in SQL Server that looks for and kills deadlocks, and it runs every few seconds, so you could end up waiting 5-7 seconds (worst case) for your chance to retry the transaction.
The simple fix
The root cause of our deadlock was the HOLDLOCK hint. You could remove it, if you’re sure that you don’t need an isolation level this strict – say, if your application keeps track of changes, so you know that it will never try to update the same row from two connections at the same time.
This could, for instance, apply if you have one connection per user and every user has logically segmented keys in the table (user A uses id range 100-199, user B uses 200-299, etc).
Rewriting the MERGE
In our “upsert” scenario, you can rewrite the MERGE statement in one of two ways. The choice of strategy depends on if you expect mostly updates or inserts.
Expecting mainly updates
BEGIN TRANSACTION; UPDATE dbo.ApplicationState SET val=@val WHERE id=@id; IF (@@ROWCOUNT=0) INSERT INTO dbo.ApplicationState (id, val) VALUES (@id, @val); COMMIT TRANSACTION;
We’ll try to update the row first, and if we don’t hit any rows, we immediately do an INSERT. Even if you place HOLDLOCK hint on the UPDATE, this pattern will not cause deadlocks like the MERGE example did!
Expecting mainly inserts
If, on the other hand, expect to have mainly insert transactions with the occasional update, we can try the other way around:
BEGIN TRANSACTION; BEGIN TRY; INSERT INTO dbo.ApplicationState (id, val) VALUES (@id, @val); END TRY BEGIN CATCH; UPDATE dbo.ApplicationState SET val=@val WHERE id=@id; END CATCH; COMMIT TRANSACTION;
We’ll simply try to insert the row, and if that doesn’t work, try an UPDATE.
The performance aspect
For my client workload, using SQLQueryStress, I found that rewriting a MERGE statement to an UPDATE/INSERT or INSERT/UPDATE statement both added about 20% to the execution time. That may be relevant to your workload, or it may mean absolutely nothing compared to not having to deal with deadlock issues.