Consolidating grouped transactions into evenly sized batches

I recently worked with a large set of accounting transactions. I needed to split those rows into multiple logical batches, but each batch had to be logically consistent – among other things, those batches had to be properly balanced, because accounting people are kind of fussy like that.

So I designed a little T-SQL logic that would split all of those transactions into evenly sized batches, without violating their logical groupings.

Safety glasses on. Let’s dive in.

Continue reading

Grouping dates without blocking operators

It’s not entirely uncommon to want to group by a computed expression in an aggregation query. The trouble is, whenever you group by a computed expression, SQL Server considers the ordering of the data to be lost, and this will turn your buttery-smooth Stream Aggregate operation into a Hash Match (aggregate) or create a corrective Sort operation, both of which are blocking.

Is there anything we can do about this? Yes, sometimes, like when those computed expressions are YEAR() and MONTH(), there is. But you should probably get your nerd on for this one.

Continue reading

Human-readable ranges of integers or dates

This is a real-world problem that I came across the other day. In a reporting scenario, I wanted to output a number of values in an easy, human-readable way for a report. But just making a long, comma-separated string of numbers doesn’t really make it very readable. This is particularly true when there are hundreds of values.

So here’s a powerful pattern to solve that task.

Continue reading

Windowed DISTINCT aggregates

You may have discovered that the use of DISTINCT is not supported in windowed functions. A query that uses a distinct aggregate in a windowed function,

SELECT COUNT(DISTINCT something) OVER (PARTITION BY other)
FROM somewhere;

will generate the following error message:

Msg 10759, Level 15, State 1, Line 1
Use of DISTINCT is not allowed with the OVER clause.

There are, however, a few relatively simple workarounds that are suprisingly efficient.

Continue reading

Last row per group

A very common challenge in T-SQL development is filtering a result so it only shows the last row in each group (partition, in this context). Typically, you’ll see these types of queries for SCD 2 dimension tables, where you only want the most recent version for each dimension member. With the introduction of windowed functions in SQL Server, there are a number of ways to do this, and you’ll see that performance can vary considerably.

Continue reading