Using MERGE to insert, delete and update all-in-one

As of SQL Server 2008, there’s a new powerful consolidation statement in the DML toolbox: MERGE. Using MERGE, you can perform so-called “upserts”, i.e. one statement that performs an insert, delete and/or update in a single statement. And, more importantly, with just a single join.

The MERGE syntax

The MERGE syntax basically comprises the target (the table you want to update), the source, where the updates come from, and a join condition between the two. Then you can arbitrarily add conditions describing the relation between the source and the target – like if there is any matching source or destination data, or more specific T-SQL conditions on columns in the source or destination table.

Here’s a test scenario with two tables, #src and #tgt:

CREATE TABLE #src (
    a    int NOT NULL,
    b    numeric(28, 8) NOT NULL,
    PRIMARY KEY (a)
);

INSERT INTO #src VALUES (1, 1234.00), (3, 1340.00);

CREATE TABLE #tgt (
    a    int NOT NULL,
    b    numeric(28, 8) NOT NULL
    PRIMARY KEY (a)
);

INSERT INTO #tgt VALUES (2, 1239.00), (3, 1345.00);

SELECT *
FROM #src AS src
FULL JOIN #tgt AS tgt ON
    src.a=tgt.a;

As you can see, if we want to merge #src into #tgt, we’ll need to

  • Add a record for a=1
  • Delete the record with a=2
  • Update the b column of the record with a=3

Here’s the MERGE statement that does all of that:

--- Which table we're merging into..
MERGE INTO #tgt AS t
--- .. and what data we're using:
USING #src AS s ON
    t.a=s.a

--- When the record exists in target, but there's not
--- matching record in source:
WHEN NOT MATCHED BY SOURCE THEN
    DELETE

--- When there's a record in source, but no matching
--- record in target:
WHEN NOT MATCHED BY TARGET THEN
    INSERT (a, b)
    VALUES (a, b)

--- When there are matching records in the source and
--- target tables:
WHEN MATCHED THEN
    UPDATE SET t.b=s.b;

You can further expand the WHEN clauses, with standard T-SQL conditions like AND/OR, etc. For instance, maybe you only want to update #tgt if #src.b is less than #tgt.b:

WHEN MATCHED AND t.b>s.b THEN
    UPDATE SET t.b=s.b;

MERGE with a common table expression

As with any other DML statement, you can use a common table expression to construct the source (in USING). All you need to do is use WITH to construct the CTE before you use the MERGE keyword.

--- Defining the CTE..
WITH s (a, b)
AS (SELECT a, b FROM #src)

MERGE INTO #tgt AS t
--- .. and using it in the merge statement:
USING s ON
    t.a=s.a

WHEN NOT MATCHED BY SOURCE AND t.b!=0 THEN
    DELETE

WHEN NOT MATCHED BY TARGET THEN
    INSERT (a, b)
    VALUES (a, b)

WHEN MATCHED THEN
    UPDATE SET t.b=s.b;

Keeping track of your changes with MERGE using OUTPUT

With MERGE as with any other DML statement, you can output the results of the operation using the OUTPUT keyword. In the MERGE statement, OUTPUT is placed last. Let’s say we have a log table that will receive a running change log of everything with do with this MERGE statement:

CREATE TABLE #log (
    a     int NOT NULL,
    dt    datetime NOT NULL,
    what  varchar(1000) NOT NULL,
    PRIMARY KEY (dt, a)
)

Here’s the MERGE statement with the OUTPUT clause:

MERGE INTO #tgt AS t
USING #src AS s ON
    t.a=s.a

WHEN NOT MATCHED BY SOURCE AND t.b!=0 THEN
    DELETE

WHEN NOT MATCHED BY TARGET THEN
    INSERT (a, b)
    VALUES (a, b)

WHEN MATCHED THEN
    UPDATE SET t.b=s.b

OUTPUT ISNULL(inserted.a, deleted.a),
       GETDATE() AS dt,
       SUSER_SNAME()+' '+$action+'d.' AS what
INTO #log (a, dt, what);

Notice the $action variable? It’s a special variable that you’ll only see in the OUTPUT clause of a MERGE statement – it can assume one of three values, “INSERT”, “UPDATE” or “DELETE”, depending on what the merge does with each affected row.

Semi-colons

As with common table expressions, the MERGE statement requires you to use semi-colons after the statement to separate it from the following statements. Again, semi-colons are the future of T-SQL, so you might as well start using them anyway, or you’ll spend time in a future version of SQL Server cleaning up your code.

MERGE and performance

Most of the time, you’ll find that the main performance issue with performing a combined INSERT, UPDATE and DELETE operation is in the JOIN between the source and the destination tables. This means that by traditional means, you’d have to perform this JOIN three times, once for the INSERT (source LEFT JOIN target), once for the UPDATE (source INNER JOIN target) and finally once for the DELETE (source RIGHT JOIN target). So if you can replace these three statements with a single MERGE statement, it will only perform a single JOIN, followed by a merge operator that does all the work in a single operation.

But check the query plan carefully and compare the two solutions – depending on your specific database and tables, the old-school three-part way of doing it with an INSERT, UPDATE and DELETE may still be faster.

But it’ll certainly never be as pretty to look at..

4 comments

Leave a comment

Your email address will not be published. Required fields are marked *