Inline variable assignment in UPDATE statements

Ok, here’s a brain-twister. Not only can you assign values to a column in an UPDATE statement using variables, but you can assign values to variables as well. It’s really not as complicated as it may sound, but there are a few trapdoors to avoid.

Inline variable assignment in SELECT

But first, the basics: With the SELECT statement, you can choose to either return data to the caller, or you can assign the results of the query to one or more variables.

DECLARE @something int;

SELECT @something=other
FROM dbo.someTable;

You’d be forgiven for asking what happens if there’s more than one row in the output. The answer is that this variable assignment is performed over and over, once for every row in the output. For instance, consider the following example:

DECLARE @counter int=0;

SELECT @counter=@counter+1
FROM dbo.someTable;

This would add 1 to @counter for each row returned from the SELECT query, in effect a row counter. But you can also use it for more practical appliances, like string concatenations:

DECLARE @tbl TABLE (
    ordinal     int NOT NULL,
    word        varchar(100) NOT NULL,
    PRIMARY KEY CLUSTERED (ordinal)
);

INSERT INTO @tbl (ordinal, word)
VALUES (1, 'This'),   (2, 'is'),
       (3, 'a'),      (4, 'test'),
       (5, 'of'),     (6, 'string'),
       (7, 'concatenation');

DECLARE @output varchar(1000)='';

--- For each row in @tbl, add this row's word to @output.
--- Note that the output order is important here, so we're
--- using ORDER BY at the end.
SELECT @output=@output+' '+word
FROM @tbl
ORDER BY ordinal;

PRINT LTRIM(@output);

On a side-note: We looked at a more advanced string concatenation example, using XML, in a previous post.

Variable assignment in UPDATE

Just like the SELECT statement, the UPDATE statement executes row-by-row on the table being updated. This means that the SET clause of the UPDATE statement is evaluated once for every row in the updated table.

With the SELECT statement, you can’t assign variables and return an output in the same statement. UPDATE, however, does allow you to update a table column and assign values to variables in the same statement, which proves a very useful feature as you’ll see further on.

DECLARE @counter int=0;

UPDATE dbo.someTable
SET amount=1.1*amount,
    @counter=@counter+1
WHERE someCriteria=1;

In the example above, we’re doing just this. We update a table column by increasing the amount column by 10% for a selection of rows, and we assign a value to a variable, by incrementing @counter by 1 for every row. After the UPDATE, we can view the value of the @counter variable to see how many rows we actually modified.

Variable assignments, just like column assignments, can be made using the full range of T-SQL expressions that you could apply to regular columns in an UPDATE, allowing you to also write more complex expressions. For instance, here’s an example using CASE:

DECLARE @maxBatchNumber int=0;

UPDATE dbo.someTable
SET --- 1. multiply the amount
    amount=1.1*amount,
    --- 2. set @maxBatchNumber
    @maxBatchNumber=(CASE
        WHEN batchNumber>@maxBatchNumber THEN batchNumber
        ELSE @maxBatchNumber END)
WHERE someCriteria=1;

Again, because the UPDATE statement loops through all of the rows individually, the variable assignment is evaluated once for each row. In this example, we want to know the highest batch number of the rows we’ve updated. For each row, we use the CASE expression to check if the row’s batch number is higher than @maxBatchNumber. If it is, we’ll update @maxBatchNumber to the current row’s batch number, otherwise, @maxBatchNumber stays the same.

Assignment order

To complicate things a bit, you might want to set a new batch number on rows that you’ve updated, and still keep a record of the highest previous batch numbers before the update.

DECLARE @maxBatchNumber int=0,
        @newBatchNumber int=1234;

UPDATE dbo.someTable
SET --- 1. multiply the amount
    amount=1.1*amount,
    --- 2. set @maxBatchNumber
    @maxBatchNumber=(CASE
        WHEN batchNumber>@maxBatchNumber THEN batchNumber
        ELSE @maxBatchNumber END),
    --- 3. now, update table column
    batchNumber=@newBatchNumber
WHERE someCriteria=1;

With this change, the importance of the update order becomes apparent: SQL Server will perform the column/variable assignments in the order that they’re written, i.e. first the amount column, then the @maxBatchNumber variable, and finally the batchNumber column. If you switch the order of the last two, the query won’t work the way we want to.

This also applies when you look at sequential assignments. These two syntaxes generate different results:

--- Set both @variable and amount to (amount*1.1)
SET @variable=amount=amount*1.1

--- Set @variable to old amount, then set amount to (amount*1.1)
SET @variable=amount, amount=amount*1.1

The first case will set both @variable and amount to amount*1.1. The second example will set @variable to amount, then amount to amount*1.1. These are both valid syntaxes and have their own specific uses.

Calculating aggregates

Obviously, the use of variables isn’t simply limited to just counting the rows updated in a table. You can use them to calculate pretty much any aggregate you can think of. How about a grand total of how much we’ve added to the amount column?

DECLARE @amountTotal numeric(10, 2)=0;

UPDATE dbo.someTable
SET --- How much we've added to the amount column:
    @amountTotal=@amountTotal+0.1*amount,
    --- .. and perform the actual update of amount:
    amount=amount*1.1
WHERE someCriteria=1;

Other types of aggregates may take more than one variable to compute, but are also possible. Here’s the average of all the changes to the amount column.

DECLARE @amountTotal numeric(10, 2)=0, @count int=0, @amountAverage numeric(10, 2);

UPDATE dbo.someTable
SET --- How much we've added to the amount column:
    @amountTotal=@amountTotal+0.1*amount,
    --- Add one to the counter:
    @counter=@counter+1,
    --- .. and perform the actual update of amount:
    amount=amount*1.1
WHERE someCriteria=1;

IF (@count>0)
    --- The average is the total divided by the count:
    SET @amountAverage=@amountTotal/@count;

To make the code as readable and understandable as possible, we’re using two variables: One that adds up the total of all changes, and one that counts the number of rows updated. The average is the total divided by the number of changes (rows). Note: Don’t divide by @count if you haven’t actually updated any rows, or you’ll get a division by zero error.

Calculating (ordered) inline running totals

So far, we’ve worked only with unordered sets. Some calculations, like running totals, require that the data is ordered properly to function correctly.

If you’re careful and know what you’re doing, you can use inline variable assignments to calculate not just aggregates but running aggregates in a table, like you might otherwise using window functions. This is particularly useful if you haven’t made the move to SQL Server 2012 or 2014 yet, since these types of ordered window functions aren’t available in older versions and require some potentially cumbersome workarounds to achieve.

Say you have a table with accounts and transactions:

CREATE TABLE #transactions (
    account       int NOT NULL,
    [row]         int NOT NULL,
    amount        numeric(10, 2) NOT NULL,
    balance       numeric(10, 2) NULL,
    PRIMARY KEY CLUSTERED (account, [row])
);

INSERT INTO #transactions (account, [row], amount)
VALUES (1, 1, 100.0),
       (1, 2, 101.0),
       (1, 3, 102.0),
       (1, 4, 103.0),
       (2, 1, 201.0),
       (2, 2, 202.0),
       (2, 3, 203.0),
       (3, 1, 301.0),
       (3, 2, 302.0),
       (4, 1, 401.0);

What we want to do is update the balance column of the table with the running total of the amount column. But every time we start over with a new account, the running total is zeroed out. The resulting query requires a variable to keep track of the “current” account and one to accumulate the running account balance.

DECLARE @account int=0, @balance numeric(10, 2)=0;

UPDATE t
SET --- Update both the @balance variable and the
    --- balance column:
    @balance=balance=(CASE
        --- Still the same account? Add to @balance:
        WHEN @account=account THEN @balance+amount
        --- Different account? Reset @balance:
        ELSE amount END),
    --- Update @account with "current" account
    @account=account
FROM #transactions AS t;

SELECT *
FROM #transactions
ORDER BY account, [row];

This example will probably work as designed if you try it. But with different indexes, you might get some unexpected results.

Using the right index

The important thing to remember here is that you can’t explicitly specify an ORDER BY clause in an UPDATE statement, and yet you still need the data to be properly sorted. Which means that sorting will have to be done by using an index that is suitable for the operation. In our example, the table has a clustered index that is sorted first by account, then by row, just like we need it to be.

But with a different index, the query optimizer might choose a different index and, with that, a different sort order which will ruin your results. For instance, if we create the following index:

CREATE UNIQUE INDEX #transactions_ix2 ON #transactions ([row], account) INCLUDE (amount);

.. the optimizer may find that using this index is preferrable for some reason. But by doing this, the records will be ordered by row, then account, which means that the account changes for every row.

The solution is to force the query to use a specific index by specifying an index hint:

FROM #transactions AS t WITH (INDEX=1)

Writing index hints, you can specify the ordinal of the index or the name of the index. Index 1 is always the clustered index, if there is any.

Parallellisation

If the running totals query were to be parallellised, the order of the records would probably be changed. Just as with an improperly chosen index, this would lead to incorrect running totals. However, I haven’t managed to get SQL Server to run this type of query using parallellisation, so it appears that the query optimizer detects this type of operation and avoids using parallel execution plans.

Either way, as a precaution, you may be wise to ensure that no parallellisation happens by adding the MAXDOP query hint at the end of the update:

OPTION (MAXDOP 1)

As usual, let me know if there’s anything I’ve left out. And like the Facebook page to receive regular news updates.

6 comments

  1. Cool post! When you say “SQL Server will perform the column/variable assignments in the order that they’re written,” is that documented (and is it ANSI standard?), or does that just happen to be what happens?

    Also, just FYI on calculating running totals – it happens to be like the one scenario where a CURSOR is actually must FASTER than a set-based update on a large dataset.

    1. It’s not something I could find documented, so this is probably one of those things that you should test extensively before putting in production. 🙂

  2. Assigment order
    I try simple example:
    declare @a int
    create table ##t (i int, a char)
    insert into ##t values(1, ‘a’), (2, ‘b’), (3, ‘c’)
    select @a = max(i)+1 from ##t
    select @a
    select * from ##t
    update ##t set i = @a, @a = @a +1
    select * from ##t
    drop table ##t

    no mater of order of setting. Every time first assign variable, then column – so final table looks like:

    i a
    ———
    5 a
    6 b
    7 c

    not
    i a
    ——–
    4 a
    5 b
    6 c

    as I expected..

    Tested on MSSQL v12.0.4237.0

    1. try this
      declare @a int
      create table ##t (i int, a char)
      insert into ##t values(1, ‘a’), (2, ‘b’), (3, ‘c’)
      select @a = max(i) from ##t
      select @a
      select * from ##t
      update ##t set @a = @a +1,i = @a
      select * from ##t
      drop table ##t

  3. update table set @variable = column, column = expression did not work after updating database to 2017(140) … any ideas ?

Leave a comment

Your email address will not be published. Required fields are marked *