As of SQL Server 2008, there’s a new powerful consolidation statement in the DML toolbox: MERGE. Using MERGE, you can perform so-called “upserts”, i.e. one statement that performs an insert, delete and/or update in a single statement. And, more importantly, with just a single join.
The MERGE syntax
The MERGE syntax basically comprises the target (the table you want to update), the source, where the updates come from, and a join condition between the two. Then you can arbitrarily add conditions describing the relation between the source and the target – like if there is any matching source or destination data, or more specific T-SQL conditions on columns in the source or destination table.
Here’s a test scenario with two tables, #src and #tgt:
CREATE TABLE #src ( a int NOT NULL, b numeric(28, 8) NOT NULL, PRIMARY KEY (a) ); INSERT INTO #src VALUES (1, 1234.00), (3, 1340.00); CREATE TABLE #tgt ( a int NOT NULL, b numeric(28, 8) NOT NULL PRIMARY KEY (a) ); INSERT INTO #tgt VALUES (2, 1239.00), (3, 1345.00); SELECT * FROM #src AS src FULL JOIN #tgt AS tgt ON src.a=tgt.a;
As you can see, if we want to merge #src into #tgt, we’ll need to
- Add a record for a=1
- Delete the record with a=2
- Update the b column of the record with a=3
Here’s the MERGE statement that does all of that:
--- Which table we're merging into.. MERGE INTO #tgt AS t --- .. and what data we're using: USING #src AS s ON t.a=s.a --- When the record exists in target, but there's not --- matching record in source: WHEN NOT MATCHED BY SOURCE THEN DELETE --- When there's a record in source, but no matching --- record in target: WHEN NOT MATCHED BY TARGET THEN INSERT (a, b) VALUES (a, b) --- When there are matching records in the source and --- target tables: WHEN MATCHED THEN UPDATE SET t.b=s.b;
You can further expand the WHEN clauses, with standard T-SQL conditions like AND/OR, etc. For instance, maybe you only want to update #tgt if #src.b is less than #tgt.b:
WHEN MATCHED AND t.b>s.b THEN UPDATE SET t.b=s.b;
MERGE with a common table expression
As with any other DML statement, you can use a common table expression to construct the source (in USING). All you need to do is use WITH to construct the CTE before you use the MERGE keyword.
--- Defining the CTE.. WITH s (a, b) AS (SELECT a, b FROM #src) MERGE INTO #tgt AS t --- .. and using it in the merge statement: USING s ON t.a=s.a WHEN NOT MATCHED BY SOURCE AND t.b!=0 THEN DELETE WHEN NOT MATCHED BY TARGET THEN INSERT (a, b) VALUES (a, b) WHEN MATCHED THEN UPDATE SET t.b=s.b;
Keeping track of your changes with MERGE using OUTPUT
With MERGE as with any other DML statement, you can output the results of the operation using the OUTPUT keyword. In the MERGE statement, OUTPUT is placed last. Let’s say we have a log table that will receive a running change log of everything with do with this MERGE statement:
CREATE TABLE #log ( a int NOT NULL, dt datetime NOT NULL, what varchar(1000) NOT NULL, PRIMARY KEY (dt, a) )
Here’s the MERGE statement with the OUTPUT clause:
MERGE INTO #tgt AS t
USING #src AS s ON
t.a=s.a
WHEN NOT MATCHED BY SOURCE AND t.b!=0 THEN
DELETE
WHEN NOT MATCHED BY TARGET THEN
INSERT (a, b)
VALUES (a, b)
WHEN MATCHED THEN
UPDATE SET t.b=s.b
OUTPUT ISNULL(inserted.a, deleted.a),
GETDATE() AS dt,
SUSER_SNAME()+' '+$action+'d.' AS what
INTO #log (a, dt, what);
Notice the $action variable? It’s a special variable that you’ll only see in the OUTPUT clause of a MERGE statement – it can assume one of three values, “INSERT”, “UPDATE” or “DELETE”, depending on what the merge does with each affected row.
Semi-colons
As with common table expressions, the MERGE statement requires you to use semi-colons after the statement to separate it from the following statements. Again, semi-colons are the future of T-SQL, so you might as well start using them anyway, or you’ll spend time in a future version of SQL Server cleaning up your code.
MERGE and performance
Most of the time, you’ll find that the main performance issue with performing a combined INSERT, UPDATE and DELETE operation is in the JOIN between the source and the destination tables. This means that by traditional means, you’d have to perform this JOIN three times, once for the INSERT (source LEFT JOIN target), once for the UPDATE (source INNER JOIN target) and finally once for the DELETE (source RIGHT JOIN target). So if you can replace these three statements with a single MERGE statement, it will only perform a single JOIN, followed by a merge operator that does all the work in a single operation.
But check the query plan carefully and compare the two solutions – depending on your specific database and tables, the old-school three-part way of doing it with an INSERT, UPDATE and DELETE may still be faster.
But it’ll certainly never be as pretty to look at..
4 comments