Prioritizing rows in a union

I just remembered a pretty common data challenge the other day. Suppose you have a number of tables, all with similar information in them. You want to union their contents, but you need to prioritize them, so you want to choose all the rows from table A, then rows from table B that are not included in A, then rows from C that are not included in A or B, and so on.

This is a pretty common use case in data cleansing or data warehousing applications. There are a few different ways to go about this, some more obvious than others.

Continue reading

Not giving a shit about performance is tech-debt

For practically every piece of code you develop, there will be trade-offs. Sometimes, you can combine the best of two worlds, other times it comes down to some hard choices. For T-SQL developers, it typically boils down to a few key questions:

  • How much time can you spend perfecting code instead of just shipping?
  • Can we just fix it when it becomes a problem?
  • Is buying more hardware cheaper than paying for developers to tune their code?
  • Is better code harder to read, and will a junior developer be able to work with it?

Continue reading

Video: three SQL Server join operators in three minutes

In an attempt to try a different approach, here’s a three-minute video explanation of how the different physical join operators in SQL Server work and why you would choose one over the other.

More reading

I’ve written a few blog posts on join operators befores, so if this video wet your appetite, here’s some recommended reading:

I’d love to hear what you think of the short video format! Please leave feedback in the comments below or on Twitter.

Selectively disable “Include actual execution plan”

The “include actual execution plan” feature in SQL Server Management Studio is an invaluable tool for performance tuning. It returns the actual execution plan used for each statement, including actual row counts, tempdb spills and a lot of other information you need to do performance tuning.

But sometimes you want to run a series of statements or procedures where you only want the execution plan for some of the statements. Here’s how:

Continue reading

Last row per group

A very common challenge in T-SQL development is filtering a result so it only shows the last row in each group (partition, in this context). Typically, you’ll see these types of queries for SCD 2 dimension tables, where you only want the most recent version for each dimension member. With the introduction of windowed functions in SQL Server, there are a number of ways to do this, and you’ll see that performance can vary considerably.

Continue reading

Different query plans for “OR” type queries

The SQL Server query optimizer can find interesting ways to tackle seemingly simple operations that can be hard to optimize. Consider the following query on a table with two indexes, one on (a), the other on (b):

SELECT a, b
FROM #data
WHERE a<=10 OR b<=10000;

The basic problem is that we would really want to use both indexes in a single query.

In this post, we’re going to take a look at a few examples of how this type of query would be optimized, as well as how statistics can affect the query plan, and finally, we’ll take a look at a slightly rare plan operator called “Merge Join (Concatenation)”.

Continue reading

The 2015 Swedish SQL Server usergroup challenge

It’s been absolutely ages since I last wrote a blog post, mostly because I’ve been busy getting my shiny new own consultancy up to speed, but I’ll admit that writer’s block has also been a factor.

But here’s something to write home about. This year’s annual Swedish SQL Server usergroup challenge was as interesting as ever, and it marks my third stab at this prestigious competition. In this post, I’ll go through my contribution, highlighting some of the techniques that I’ve applied to make it go really fast.

Continue reading