Ok, here’s a brain-twister. Not only can you assign values to a column in an UPDATE statement using variables, but you can assign values to variables as well. It’s really not as complicated as it may sound, but there are a few trapdoors to avoid.
Inline variable assignment in SELECT
But first, the basics: With the SELECT statement, you can choose to either return data to the caller, or you can assign the results of the query to one or more variables.
DECLARE @something int; SELECT @something=other FROM dbo.someTable;
You’d be forgiven for asking what happens if there’s more than one row in the output. The answer is that this variable assignment is performed over and over, once for every row in the output. For instance, consider the following example:
DECLARE @counter int=0; SELECT @[email protected]+1 FROM dbo.someTable;
This would add 1 to @counter for each row returned from the SELECT query, in effect a row counter. But you can also use it for more practical appliances, like string concatenations:
DECLARE @tbl TABLE ( ordinal int NOT NULL, word varchar(100) NOT NULL, PRIMARY KEY CLUSTERED (ordinal) ); INSERT INTO @tbl (ordinal, word) VALUES (1, 'This'), (2, 'is'), (3, 'a'), (4, 'test'), (5, 'of'), (6, 'string'), (7, 'concatenation'); DECLARE @output varchar(1000)=''; --- For each row in @tbl, add this row's word to @output. --- Note that the output order is important here, so we're --- using ORDER BY at the end. SELECT @[email protected]+' '+word FROM @tbl ORDER BY ordinal; PRINT LTRIM(@output);
On a side-note: We looked at a more advanced string concatenation example, using XML, in a previous post.
Variable assignment in UPDATE
Just like the SELECT statement, the UPDATE statement executes row-by-row on the table being updated. This means that the SET clause of the UPDATE statement is evaluated once for every row in the updated table.
With the SELECT statement, you can’t assign variables and return an output in the same statement. UPDATE, however, does allow you to update a table column and assign values to variables in the same statement, which proves a very useful feature as you’ll see further on.
DECLARE @counter int=0; UPDATE dbo.someTable SET amount=1.1*amount, @[email protected]+1 WHERE someCriteria=1;
In the example above, we’re doing just this. We update a table column by increasing the amount column by 10% for a selection of rows, and we assign a value to a variable, by incrementing @counter by 1 for every row. After the UPDATE, we can view the value of the @counter variable to see how many rows we actually modified.
Variable assignments, just like column assignments, can be made using the full range of T-SQL expressions that you could apply to regular columns in an UPDATE, allowing you to also write more complex expressions. For instance, here’s an example using CASE:
DECLARE @maxBatchNumber int=0; UPDATE dbo.someTable SET --- 1. multiply the amount amount=1.1*amount, --- 2. set @maxBatchNumber @maxBatchNumber=(CASE WHEN batchNumber>@maxBatchNumber THEN batchNumber ELSE @maxBatchNumber END) WHERE someCriteria=1;
Again, because the UPDATE statement loops through all of the rows individually, the variable assignment is evaluated once for each row. In this example, we want to know the highest batch number of the rows we’ve updated. For each row, we use the CASE expression to check if the row’s batch number is higher than @maxBatchNumber. If it is, we’ll update @maxBatchNumber to the current row’s batch number, otherwise, @maxBatchNumber stays the same.
To complicate things a bit, you might want to set a new batch number on rows that you’ve updated, and still keep a record of the highest previous batch numbers before the update.
DECLARE @maxBatchNumber int=0, @newBatchNumber int=1234; UPDATE dbo.someTable SET --- 1. multiply the amount amount=1.1*amount, --- 2. set @maxBatchNumber @maxBatchNumber=(CASE WHEN batchNumber>@maxBatchNumber THEN batchNumber ELSE @maxBatchNumber END), --- 3. now, update table column [email protected] WHERE someCriteria=1;
With this change, the importance of the update order becomes apparent: SQL Server will perform the column/variable assignments in the order that they’re written, i.e. first the amount column, then the @maxBatchNumber variable, and finally the batchNumber column. If you switch the order of the last two, the query won’t work the way we want to.
This also applies when you look at sequential assignments. These two syntaxes generate different results:
--- Set both @variable and amount to (amount*1.1) SET @variable=amount=amount*1.1 --- Set @variable to old amount, then set amount to (amount*1.1) SET @variable=amount, amount=amount*1.1
The first case will set both @variable and amount to amount*1.1. The second example will set @variable to amount, then amount to amount*1.1. These are both valid syntaxes and have their own specific uses.
Obviously, the use of variables isn’t simply limited to just counting the rows updated in a table. You can use them to calculate pretty much any aggregate you can think of. How about a grand total of how much we’ve added to the amount column?
DECLARE @amountTotal numeric(10, 2)=0; UPDATE dbo.someTable SET --- How much we've added to the amount column: @[email protected]+0.1*amount, --- .. and perform the actual update of amount: amount=amount*1.1 WHERE someCriteria=1;
Other types of aggregates may take more than one variable to compute, but are also possible. Here’s the average of all the changes to the amount column.
DECLARE @amountTotal numeric(10, 2)=0, @count int=0, @amountAverage numeric(10, 2); UPDATE dbo.someTable SET --- How much we've added to the amount column: @[email protected]+0.1*amount, --- Add one to the counter: @[email protected]+1, --- .. and perform the actual update of amount: amount=amount*1.1 WHERE someCriteria=1; IF (@count>0) --- The average is the total divided by the count: SET @[email protected]/@count;
To make the code as readable and understandable as possible, we’re using two variables: One that adds up the total of all changes, and one that counts the number of rows updated. The average is the total divided by the number of changes (rows). Note: Don’t divide by @count if you haven’t actually updated any rows, or you’ll get a division by zero error.
Calculating (ordered) inline running totals
So far, we’ve worked only with unordered sets. Some calculations, like running totals, require that the data is ordered properly to function correctly.
If you’re careful and know what you’re doing, you can use inline variable assignments to calculate not just aggregates but running aggregates in a table, like you might otherwise using window functions. This is particularly useful if you haven’t made the move to SQL Server 2012 or 2014 yet, since these types of ordered window functions aren’t available in older versions and require some potentially cumbersome workarounds to achieve.
Say you have a table with accounts and transactions:
CREATE TABLE #transactions ( account int NOT NULL, [row] int NOT NULL, amount numeric(10, 2) NOT NULL, balance numeric(10, 2) NULL, PRIMARY KEY CLUSTERED (account, [row]) ); INSERT INTO #transactions (account, [row], amount) VALUES (1, 1, 100.0), (1, 2, 101.0), (1, 3, 102.0), (1, 4, 103.0), (2, 1, 201.0), (2, 2, 202.0), (2, 3, 203.0), (3, 1, 301.0), (3, 2, 302.0), (4, 1, 401.0);
What we want to do is update the balance column of the table with the running total of the amount column. But every time we start over with a new account, the running total is zeroed out. The resulting query requires a variable to keep track of the “current” account and one to accumulate the running account balance.
DECLARE @account int=0, @balance numeric(10, 2)=0; UPDATE t SET --- Update both the @balance variable and the --- balance column: @balance=balance=(CASE --- Still the same account? Add to @balance: WHEN @account=account THEN @balance+amount --- Different account? Reset @balance: ELSE amount END), --- Update @account with "current" account @account=account FROM #transactions AS t; SELECT * FROM #transactions ORDER BY account, [row];
This example will probably work as designed if you try it. But with different indexes, you might get some unexpected results.
Using the right index
The important thing to remember here is that you can’t explicitly specify an ORDER BY clause in an UPDATE statement, and yet you still need the data to be properly sorted. Which means that sorting will have to be done by using an index that is suitable for the operation. In our example, the table has a clustered index that is sorted first by account, then by row, just like we need it to be.
But with a different index, the query optimizer might choose a different index and, with that, a different sort order which will ruin your results. For instance, if we create the following index:
CREATE UNIQUE INDEX #transactions_ix2 ON #transactions ([row], account) INCLUDE (amount);
.. the optimizer may find that using this index is preferrable for some reason. But by doing this, the records will be ordered by row, then account, which means that the account changes for every row.
The solution is to force the query to use a specific index by specifying an index hint:
FROM #transactions AS t WITH (INDEX=1)
Writing index hints, you can specify the ordinal of the index or the name of the index. Index 1 is always the clustered index, if there is any.
If the running totals query were to be parallellised, the order of the records would probably be changed. Just as with an improperly chosen index, this would lead to incorrect running totals. However, I haven’t managed to get SQL Server to run this type of query using parallellisation, so it appears that the query optimizer detects this type of operation and avoids using parallel execution plans.
Either way, as a precaution, you may be wise to ensure that no parallellisation happens by adding the MAXDOP query hint at the end of the update:
OPTION (MAXDOP 1)
As usual, let me know if there’s anything I’ve left out. And like the Facebook page to receive regular news updates.