This insanely cool new performance-related update is one of nicest features in SQL Server 2019, and certainly one I’ve been looking forward to for a long time.
If you’ve done any work around performance tuning and user-defined scalar functions, I’m pretty sure you’ll love this.
Some example data
Let’s set up some example data. We’ll be working with a generic financial transaction table with some 100 000 rows. Each row has a currency and a currency amount:
CREATE TABLE dbo.Transactions (
TransactionID int NOT NULL,
TransactionDate date NOT NULL,
CurrencyID tinyint NOT NULL,
[Description] nvarchar(2048) NOT NULL,
CurrencyAmount numeric(12, 2) NOT NULL,
PRIMARY KEY CLUSTERED (TransactionID)
);
TRUNCATE TABLE dbo.Transactions;
--- 100 000 transaction rows
INSERT INTO dbo.Transactions (TransactionID, TransactionDate, CurrencyID, [Description], CurrencyAmount)
SELECT TOP (100000)
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS TransactionID,
CAST(DATEADD(day, message_id%365, {d '2019-01-01'}) AS date) AS TransactionDate,
message_id%8 AS CurrencyID,
AS [Description],
10000.*RAND(CHECKSUM()) AS LocalAmount
FROM sys.messages;
We’re also going to need a table containing some exchange rates:
CREATE TABLE dbo.ExchangeRates (
CurrencyID tinyint NOT NULL,
[Date] date NOT NULL,
ExchangeRate numeric(12, 8) NOT NULL,
PRIMARY KEY CLUSTERED (CurrencyID, [Date] DESC)
);
TRUNCATE TABLE dbo.ExchangeRates;
INSERT INTO dbo.ExchangeRates (CurrencyID, [Date], ExchangeRate)
SELECT TOP (250)
message_id%8 AS CurrencyID,
CAST(DATEADD(day, message_id%365, {d '2019-01-01'}) AS date) AS [Date],
0.4*(message_id%8)+1.5*RAND(CHECKSUM()) AS ExchangeRate
FROM sys.messages;
Note that in this example, like many real-world applications, you can’t be sure that you’ll have an exchange rate for every combination of currency and date, so what you’ll need to do is get the most recent currency rate for each given currency and date combination.
And here’s the user-defined scalar function that does just that. I wrote it from scratch for this blog post, but I’m sure you’ve seen others just like it a million times in the real-world:
CREATE OR ALTER FUNCTION dbo.ExchangeRate(@CurrencyID tinyint, @Date date)
RETURNS numeric(12, 8)
AS
BEGIN;
--- Anything to declare?
DECLARE @ExchangeRate numeric(12, 8);
--- Fetch the most recent exchange rate for @CurrencyID up until @Date
SELECT TOP (1) @ExchangeRate=ExchangeRate
FROM dbo.ExchangeRates
WHERE CurrencyID=@CurrencyID
AND [Date]<=@Date
ORDER BY [Date] DESC;
--- Give it back:
RETURN(@ExchangeRate);
END;
Scalar functions and performance
User-defined scalar functions, as a rule, kill performance dead.
SQL Server performs best when the workload comes as an uninterrupted stream of work for one or more worker threads. However, when you use a UDF, SQL Server essentially turns your pretty set-based query into one large cursor. It fetches a single row, then runs the function for those values, fetches the next row, calls the same function again with the values from that row, and so on.
Here’s what that looks like in a query plan.
SELECT *, CurrencyAmount*dbo.ExchangeRate(CurrencyID, TransactionDate)
FROM dbo.Transactions AS t;
… and for each row from the transaction table, SQL Server calls the user-defined scalar function:
This example query runs for about 11 500 ms in my small container.
Now, if we run the same query on SQL Server 2019 CTP 2.4 (the most recent version at the time of publishing this post), here’s what the query plan looks like:
This query completes in 2 200 ms, about five times as fast, even though it does not go parallel, and it still performs 100 000 index seeks to dbo.ExchangeRate. So the only difference between the two is the fact that the scalar UDF has been inlined.
What does “inlining” mean?
Basically, SQL Server 2019 interprets the code from the scalar function, and integrates it into the main query plan to form a single plan. So instead of looping function calls, it performs the function logic inline with the rest of the query.
Note that not all scalar functions can be inlined like this – there are a number of constructs that will prevent a UDF from being inlineable. To check if yours is eligible, you can take a peek inside the sys.sql_modules DMV:
SELECT is_inlineable
FROM sys.sql_modules
WHERE [object_id]=OBJECT_ID('dbo.ExchangeRate');
For instance, a scalar function that references a non-deterministic function, like GETDATE() won’t be inlineable.
Other performance benefits
Scalar functions give rise to other performance problems, too. For instance, SQL Server won’t make a very good job of calculating the estimated cost of what happens within the function, and it will effectively block the use of parallel plans, which can be a real problem with large workloads like business intelligence ETL jobs.
One more thing
Hey, I have a newsletter. So if you like newsletters, you should definitely subscribe to this one!
2 comments