You may have discovered that the use of DISTINCT is not supported in windowed functions. A query that uses a distinct aggregate in a windowed function,
SELECT COUNT(DISTINCT something) OVER (PARTITION BY other)
will generate the following error message:
Msg 10759, Level 15, State 1, Line 1
Use of DISTINCT is not allowed with the OVER clause.
There are, however, a few relatively simple workarounds that are suprisingly efficient.
A very common challenge in T-SQL development is filtering a result so it only shows the last row in each group (partition, in this context). Typically, you’ll see these types of queries for SCD 2 dimension tables, where you only want the most recent version for each dimension member. With the introduction of windowed functions in SQL Server, there are a number of ways to do this, and you’ll see that performance can vary considerably.
For windowed functions, SQL Server introduces two new operators in the execution plan; Segment and Sequence Project. If you’ve tried looking them up in the documentation, you’ll know that it’s not exactly perfectly obvious how they work. Here’s my stab at clarifying what they actually do.
When you’re designing reports, they can often be based on hiearchies represented by “nodes” in a parent-child setup. To the end-user, the parent-child representation doesn’t provide very much readability, so you need to output this information in a human-readable form, for instance in a table where the names/titles are indented.
This installment in the series on efficient data is on versioning changes in a table. The article is a re-post of a post I wrote in september on compressing slowly changing dimensions, although the concept does not only apply to dimensions – it can be used pretty much on any data that changes over time.
The idea is to “compress” a versioned table, so instead of just adding a date column for each version, you can compress multiple, sequential versions into a single row with a “from” date and a “to” date. This can significantly compress the size of the table.
Here’s an interesting feature I found in the code of a colleague the other day. A common task in T-SQL is eliminating duplicate records. This became a lot easier with the introduction of windowed functions way back in SQL Server 2005, such as ROW_NUMBER(), but it turns out, I’ve still been missing out on a really simple and cool solution.
In this second installment of the Slowly Changing Dimensions series (see part one here), we’ll take a look at how to practically create a slowly changing dimension table using T-SQL.
Check out this interesting article from SQL Server superstar
Windowed functions are a powerful feature of T-SQL, allowing you to perform advanced aggregates. They provide a very efficient way of doing this as soon as you just get the hang of the OVER() clause.