A popular form of organizing dimensions is in parent-child structures, also known as “unbalanced” or “ragged” dimensions, because any branch can have an arbitrary number of child levels. There are many advantages to this type of representation, but their recursive nature also brings some challenges. In this post, we’re going to look at circular references, and how you can trap them before they run out of control.
Category: Advanced
Median and percentile in T-SQL
Back in 2014 I wrote a blog post on how to calculate a median value using the NTILE window function. It’s far from the best performing solution there is, but it worked on SQL Server 2008, before the introduction of OFFSET-FETCH i SQL Server. In this post, I’m going to look at creating a generalized function that calculates the median (or any percentile) of a series of values.
Segment and Sequence Project
For windowed functions, SQL Server introduces two new operators in the execution plan; Segment and Sequence Project. If you’ve tried looking them up in the documentation, you’ll know that it’s not exactly perfectly obvious how they work. Here’s my stab at clarifying what they actually do.
Joining two SCD2 tables
A number of OLTP systems store dimension data in SCD2-like tables in order to retain all the revisions whenever the dimension information changes. In certain situations, you may come across a need to join two or more SCD tables, while keeping all the versions information intact. Sound tricky? Not really.
Parallel execution, part 3
In the two previous parts of this series, we’ve looked at how parallelism works, how you can control it, and how it affects your query (and server) performance in different environments. In this, the third part, we’re going to take a more technical look at how the different Parallelism operators work.
Parallel execution, part 2
Continuing on last week’s post on parallelism, here’s part two, where we take a closer look at when parallel plans are considered and what you can do to either force or prevent a query from running parallel as well as things you want to avoid if you’re trying to achieve a parallel query plan.
Partitioned views over table partitioning
Today, we’re going to be looking at a kind of poor-mans’s-partitioning, using a view to union records from multiple tables. We’ll also take a look at when you would want this type of solution, some benefits and drawbacks, as well as ways to make things go faster.
Inline variable assignment in UPDATE statements
Ok, here’s a brain-twister. Not only can you assign values to a column in an UPDATE statement using variables, but you can assign values to variables as well. It’s really not as complicated as it may sound, but there are a few trapdoors to avoid.
Blocking/non-blocking aggregate operators
Some database operations can be performed in distinctly different manners, with different impacts on query performance. One important example of such an operation is calculating an aggregate. In this article, we’ll take a look at how aggregates can be “blocking” or “non-blocking”, how it affects memory allocation, and ultimately, what impact this has on your query.
HASH JOIN deep-dive
Among the three different types of join operators used by SQL Server, the HASH JOIN does some of the hardest work. It scales pretty well and is very suitable for parallel processing. As such, it can be very powerful in many applications, but hash joins can potentially consume quite a bit of memory, so seeing on in your query plan could be an indicator of a performance tuning issue in your query or data.