Scalar functions can be a real headache when you’re performance tuning. For one, they don’t parallelize. In fact, if you use a scalar function in a computed column, it will prevent any query that uses that table from going parallel – even if you don’t reference that column at all!
Tag: performance
Prioritizing rows in a union
I just remembered a pretty common data challenge the other day. Suppose you have a number of tables, all with similar information in them. You want to union their contents, but you need to prioritize them, so you want to choose all the rows from table A, then rows from table B that are not included in A, then rows from C that are not included in A or B, and so on.
This is a pretty common use case in data cleansing or data warehousing applications. There are a few different ways to go about this, some more obvious than others.
The compelling case for using heaps
I’m an outspoken advocate of always using a clustered index on each and every table you create as a matter of best practice. But even I will agree that there’s a case for using the odd heap now and then.
Not giving a shit about performance is tech-debt
For practically every piece of code you develop, there will be trade-offs. Sometimes, you can combine the best of two worlds, other times it comes down to some hard choices. For T-SQL developers, it typically boils down to a few key questions:
- How much time can you spend perfecting code instead of just shipping?
- Can we just fix it when it becomes a problem?
- Is buying more hardware cheaper than paying for developers to tune their code?
- Is better code harder to read, and will a junior developer be able to work with it?
Video: three SQL Server join operators in three minutes
In an attempt to try a different approach, here’s a three-minute video explanation of how the different physical join operators in SQL Server work and why you would choose one over the other.
More reading
I’ve written a few blog posts on join operators befores, so if this video wet your appetite, here’s some recommended reading:
I’d love to hear what you think of the short video format! Please leave feedback in the comments below or on Twitter.
Selectively disable “Include actual execution plan”
The “include actual execution plan” feature in SQL Server Management Studio is an invaluable tool for performance tuning. It returns the actual execution plan used for each statement, including actual row counts, tempdb spills and a lot of other information you need to do performance tuning.
But sometimes you want to run a series of statements or procedures where you only want the execution plan for some of the statements. Here’s how:
Last row per group
A very common challenge in T-SQL development is filtering a result so it only shows the last row in each group (partition, in this context). Typically, you’ll see these types of queries for SCD 2 dimension tables, where you only want the most recent version for each dimension member. With the introduction of windowed functions in SQL Server, there are a number of ways to do this, and you’ll see that performance can vary considerably.
Different query plans for “OR” type queries
The SQL Server query optimizer can find interesting ways to tackle seemingly simple operations that can be hard to optimize. Consider the following query on a table with two indexes, one on (a), the other on (b):
SELECT a, b FROM #data WHERE a<=10 OR b<=10000;
The basic problem is that we would really want to use both indexes in a single query.
In this post, we’re going to take a look at a few examples of how this type of query would be optimized, as well as how statistics can affect the query plan, and finally, we’ll take a look at a slightly rare plan operator called “Merge Join (Concatenation)”.
Using UPDATE STATISTICS to fake row counts
Here’s a quick tip: When you’re evaluating query strategies, you may want to consider how your query will scale when the volume in your database goes up. This does not neccessarily mean that you have to start filling your tables with gigabytes on gigabytes of data.
Median and percentile in T-SQL
Back in 2014 I wrote a blog post on how to calculate a median value using the NTILE window function. It’s far from the best performing solution there is, but it worked on SQL Server 2008, before the introduction of OFFSET-FETCH i SQL Server. In this post, I’m going to look at creating a generalized function that calculates the median (or any percentile) of a series of values.