Scalar functions can be a real headache when you’re performance tuning. For one, they don’t parallelize. In fact, if you use a scalar function in a computed column, it will prevent any query that uses that table from going parallel – even if you don’t reference that column at all!
Let’s set up a table:
CREATE TABLE dbo.lots_of_rows_computed ( a int NOT NULL, b int NOT NULL, c AS 2*a, PRIMARY KEY CLUSTERED (a) );
Note that we have a computed column that doubles the value of “a”. We’ll use UPDATE STATISTICS to force SQL Server to think that this is a really large table, hoping that this might persuade the optimizer to choose a parallel plan.
UPDATE STATISTICS dbo.lots_of_rows_computed WITH ROWCOUNT=1000000000;
Now, let’s try an identical table, except the computed column is calculated using a scalar function. The calculation and the data, however, should be identical. First, set up the function:
CREATE FUNCTION dbo.[double] (@int int) RETURNS int AS BEGIN; -- Simple enough. RETURN(@int*2); END; GO
.. and the table with some fake statistics:
CREATE TABLE dbo.lots_of_rows_function ( a int NOT NULL, b int NOT NULL, c AS dbo.[double](a), PRIMARY KEY CLUSTERED (a) ); UPDATE STATISTICS dbo.lots_of_rows_function WITH ROWCOUNT=1000000000;
Now, let’s see what it does with a heavy query (that doesn’t reference any computed column).
SELECT SUM(b) FROM dbo.lots_of_rows_computed; SELECT SUM(b) FROM dbo.lots_of_rows_function;
Because both tables have a billion rows each according to their statistics, this should be an easy sell for parallel processing. So how did we do?
The computed column (a*2) parallelizes nicely. The one with scalar function, however..
How to find them
Rounding off this quick post, here’s a handy query to find out which tables contain computed columns using scalar functions.
SELECT s.[name]+'.'+o.[name] AS [table], p.[rows], cc.[name] AS [computed column], cc.[definition], sref.[name]+'.'+ref.[name] AS referenced_object, ref.[type_desc] AS referenced_type FROM sys.schemas AS s INNER JOIN sys.objects AS o ON s.[schema_id]=o.[schema_id] INNER JOIN ( SELECT [object_id], SUM([rows]) AS [rows] FROM sys.partitions WHERE index_id IN (0, 1) GROUP BY [object_id] ) AS p ON o.[object_id]=p.[object_id] INNER JOIN sys.computed_columns AS cc ON o.[object_id]=cc.[object_id] INNER JOIN sys.sql_expression_dependencies AS dep ON cc.[object_id]=dep.referencing_id AND cc.column_id=dep.referencing_minor_id INNER JOIN sys.objects AS ref ON dep.referenced_id=ref.[object_id] AND ref.[type]='FN' -- Scalar function INNER JOIN sys.schemas AS sref ON sref.[schema_id]=ref.[schema_id] WHERE dep.referenced_class=1 -- OBJECT_OR_COLUMN ORDER BY s.[name], o.[name], sref.[name], ref.[name];