I just remembered a pretty common data challenge the other day. Suppose you have a number of tables, all with similar information in them. You want to union their contents, but you need to prioritize them, so you want to choose all the rows from table A, then rows from table B that are not included in A, then rows from C that are not included in A or B, and so on.
This is a pretty common use case in data cleansing or data warehousing applications. There are a few different ways to go about this, some more obvious than others.
I’m an outspoken advocate of always using a clustered index on each and every table you create as a matter of best practice. But even I will agree that there’s a case for using the odd heap now and then.
For practically every piece of code you develop, there will be trade-offs. Sometimes, you can combine the best of two worlds, other times it comes down to some hard choices. For T-SQL developers, it typically boils down to a few key questions:
- How much time can you spend perfecting code instead of just shipping?
- Can we just fix it when it becomes a problem?
- Is buying more hardware cheaper than paying for developers to tune their code?
- Is better code harder to read, and will a junior developer be able to work with it?
In an attempt to try a different approach, here’s a three-minute video explanation of how the different physical join operators in SQL Server work and why you would choose one over the other.
I’ve written a few blog posts on join operators befores, so if this video wet your appetite, here’s some recommended reading:
I’d love to hear what you think of the short video format! Please leave feedback in the comments below or on Twitter.
The “include actual execution plan” feature in SQL Server Management Studio is an invaluable tool for performance tuning. It returns the actual execution plan used for each statement, including actual row counts, tempdb spills and a lot of other information you need to do performance tuning.
But sometimes you want to run a series of statements or procedures where you only want the execution plan for some of the statements. Here’s how:
A very common challenge in T-SQL development is filtering a result so it only shows the last row in each group (partition, in this context). Typically, you’ll see these types of queries for SCD 2 dimension tables, where you only want the most recent version for each dimension member. With the introduction of windowed functions in SQL Server, there are a number of ways to do this, and you’ll see that performance can vary considerably.
The SQL Server query optimizer can find interesting ways to tackle seemingly simple operations that can be hard to optimize. Consider the following query on a table with two indexes, one on (a), the other on (b):
SELECT a, b
WHERE a<=10 OR b<=10000;
The basic problem is that we would really want to use both indexes in a single query.
In this post, we’re going to take a look at a few examples of how this type of query would be optimized, as well as how statistics can affect the query plan, and finally, we’ll take a look at a slightly rare plan operator called “Merge Join (Concatenation)”.
Here’s a quick tip: When you’re evaluating query strategies, you may want to consider how your query will scale when the volume in your database goes up. This does not neccessarily mean that you have to start filling your tables with gigabytes on gigabytes of data.
Back in 2014 I wrote a blog post on how to calculate a median value using the NTILE window function. It’s far from the best performing solution there is, but it worked on SQL Server 2008, before the introduction of OFFSET-FETCH i SQL Server. In this post, I’m going to look at creating a generalized function that calculates the median (or any percentile) of a series of values.
It’s been absolutely ages since I last wrote a blog post, mostly because I’ve been busy getting my shiny new own consultancy up to speed, but I’ll admit that writer’s block has also been a factor.
But here’s something to write home about. This year’s annual Swedish SQL Server usergroup challenge was as interesting as ever, and it marks my third stab at this prestigious competition. In this post, I’ll go through my contribution, highlighting some of the techniques that I’ve applied to make it go really fast.