I recently worked with a large set of accounting transactions. I needed to split those rows into multiple logical batches, but each batch had to be logically consistent – among other things, those batches had to be properly balanced, because accounting people are kind of fussy like that.
So I designed a little T-SQL logic that would split all of those transactions into evenly sized batches, without violating their logical groupings.
Safety glasses on. Let’s dive in.
I’ve solved this puzzle a number of times, but I’ve never really been quite happy with the result. It was either too slow, too much code, too hard to understand. So here’s a fresh take at computing the time-to-payment on a large amount of invoices, with multiple, overlapping, partial payments.
It’s not entirely uncommon to want to group by a computed expression in an aggregation query. The trouble is, whenever you group by a computed expression, SQL Server considers the ordering of the data to be lost, and this will turn your buttery-smooth Stream Aggregate operation into a Hash Match (aggregate) or create a corrective Sort operation, both of which are blocking.
Is there anything we can do about this? Yes, sometimes, like when those computed expressions are YEAR() and MONTH(), there is. But you should probably get your nerd on for this one.
This is a real-world problem that I came across the other day. In a reporting scenario, I wanted to output a number of values in an easy, human-readable way for a report. But just making a long, comma-separated string of numbers doesn’t really make it very readable. This is particularly true when there are hundreds of values.
So here’s a powerful pattern to solve that task.
You may have discovered that the use of DISTINCT is not supported in windowed functions. A query that uses a distinct aggregate in a windowed function,
SELECT COUNT(DISTINCT something) OVER (PARTITION BY other)
will generate the following error message:
Msg 10759, Level 15, State 1, Line 1
Use of DISTINCT is not allowed with the OVER clause.
There are, however, a few relatively simple workarounds that are suprisingly efficient.
A very common challenge in T-SQL development is filtering a result so it only shows the last row in each group (partition, in this context). Typically, you’ll see these types of queries for SCD 2 dimension tables, where you only want the most recent version for each dimension member. With the introduction of windowed functions in SQL Server, there are a number of ways to do this, and you’ll see that performance can vary considerably.
For windowed functions, SQL Server introduces two new operators in the execution plan; Segment and Sequence Project. If you’ve tried looking them up in the documentation, you’ll know that it’s not exactly perfectly obvious how they work. Here’s my stab at clarifying what they actually do.
A while back, I was shown an absolutely gorgeous median calculation, using the new OFFSET/FETCH functionality introduced in SQL Server 2012. But this got me thinking. How can you calculate the median in an older SQL Server? Here’s an idea of how to do it using the NTILE() function.
Here’s an interesting feature to that may greatly simplifie a few queries. TOP n WITH TIES works pretty much like TOP, except it orders the top records “densely” to include tied values.
Windowed functions are a powerful feature of T-SQL, allowing you to perform advanced aggregates. They provide a very efficient way of doing this as soon as you just get the hang of the OVER() clause.