Windowed DISTINCT aggregates

You may have discovered that the use of DISTINCT is not supported in windowed functions. A query that uses a distinct aggregate in a windowed function,

SELECT COUNT(DISTINCT something) OVER (PARTITION BY other)
FROM somewhere;

will generate the following error message:

Msg 10759, Level 15, State 1, Line 1
Use of DISTINCT is not allowed with the OVER clause.

There are, however, a few relatively simple workarounds that are suprisingly efficient.

Continue reading

Last row per group

A very common challenge in T-SQL development is filtering a result so it only shows the last row in each group (partition, in this context). Typically, you’ll see these types of queries for SCD 2 dimension tables, where you only want the most recent version for each dimension member. With the introduction of windowed functions in SQL Server, there are a number of ways to do this, and you’ll see that performance can vary considerably.

Continue reading

Efficient data, part 6: Versioning changes

This installment in the series on efficient data is on versioning changes in a table. The article is a re-post of a post I wrote in september on compressing slowly changing dimensions, although the concept does not only apply to dimensions – it can be used pretty much on any data that changes over time.

The idea is to “compress” a versioned table, so instead of just adding a date column for each version, you can compress multiple, sequential versions into a single row with a “from” date and a “to” date. This can significantly compress the size of the table.

Continue reading