Watch out for Merge Interval with date range Index Seeks

In my last post, I found that DATEDIFF, DATEADD and the other date functions in SQL Server are not as datatype-agnostic as the documentation would have you believe. These functions perform an implicit datatype conversion to either datetimeoffset or datetime (!), which can noticeably affect the CPU time of a query.

Well, today I was building a query on an indexed date range, and the execution plan contained a Merge Interval operator. Turns out, this operator has a few surprises in store for your query performance. The good news is that the fix is relatively simple.
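To sketch the kind of pattern I’m talking about (the table and column names below are made up for the example): when the range boundaries have a different datatype than the indexed date column, the optimizer builds a dynamic seek (Constant Scan, Compute Scalar and, with two boundaries, a Merge Interval) instead of a plain Index Seek.

-- Hypothetical table dbo.Transactions with an index on the date column TransactionDate.
-- Boundaries typed as datetime against a date column can produce the dynamic
-- seek with the Merge Interval operator:
DECLARE @from datetime = '2021-01-01',
        @to   datetime = '2021-02-01';

SELECT COUNT(*)
FROM dbo.Transactions
WHERE TransactionDate >= @from
  AND TransactionDate < @to;

-- Declaring the boundaries with the same datatype as the column typically
-- gets you a plain Index Seek instead:
DECLARE @fromDate date = '2021-01-01',
        @toDate   date = '2021-02-01';

SELECT COUNT(*)
FROM dbo.Transactions
WHERE TransactionDate >= @fromDate
  AND TransactionDate < @toDate;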

DATEDIFF performs implicit conversions

As I was performance-tuning a query, I found that a number of date calculation functions in SQL Server appear to force a conversion of their date parameters to a specific datatype, adding computational work to any query that uses them. In programming terms, it seems that these functions do not have “overloads”, i.e. different code paths depending on the incoming datatype.

So let’s take a closer look at how this manifests itself.
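One rough way to see this for yourself, assuming a hypothetical dbo.Orders table where the same timestamps are stored both as date columns and as datetimeoffset columns, is to compare the CPU time that SET STATISTICS TIME reports for the two variants:

SET STATISTICS TIME ON;

-- DATEDIFF on plain date columns; per the above, the inputs are converted internally:
SELECT MAX(DATEDIFF(DAY, OrderDate, ShipDate))
FROM dbo.Orders;

-- The same calculation on columns that are already datetimeoffset:
SELECT MAX(DATEDIFF(DAY, OrderDateDto, ShipDateDto))
FROM dbo.Orders;

SET STATISTICS TIME OFF;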

Calculating invoice settlements with window functions

I’ve solved this puzzle a number of times, but I’ve never really been quite happy with the result. It was either too slow, too much code, or too hard to understand. So here’s a fresh take on computing the time-to-payment on a large number of invoices, with multiple, overlapping, partial payments.
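Here’s just a sketch of the running-total idea, with a made-up schema (dbo.Invoices and dbo.Payments keyed by customer), not the exact solution from the post: if payments are applied to a customer’s invoices oldest-first, an invoice is settled by the first payment whose running total reaches the running total of invoiced amounts up to and including that invoice.

WITH inv AS (
    SELECT CustomerID, InvoiceID, InvoiceDate, Amount,
           SUM(Amount) OVER (PARTITION BY CustomerID
                             ORDER BY InvoiceDate, InvoiceID
                             ROWS UNBOUNDED PRECEDING) AS RunningInvoiced
    FROM dbo.Invoices
),
pay AS (
    SELECT CustomerID, PaymentDate, Amount,
           SUM(Amount) OVER (PARTITION BY CustomerID
                             ORDER BY PaymentDate, PaymentID
                             ROWS UNBOUNDED PRECEDING) AS RunningPaid
    FROM dbo.Payments
)
SELECT inv.InvoiceID,
       inv.InvoiceDate,
       MIN(pay.PaymentDate) AS SettledOn,
       DATEDIFF(DAY, inv.InvoiceDate, MIN(pay.PaymentDate)) AS DaysToPayment
FROM inv
INNER JOIN pay ON
       pay.CustomerID = inv.CustomerID AND
       pay.RunningPaid >= inv.RunningInvoiced
GROUP BY inv.InvoiceID, inv.InvoiceDate;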

Calculating the number of weekdays between two dates

I wish the DATEDIFF() function would count the number of working days (Mondays through Fridays) between two dates for me, but until that happens, I’ve had to roll my own scalar function. I tried to think of a smart way, perhaps involving a modulus calculation, but I quickly succumbed to a more down-to-earth approach.
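A rough sketch of such a function (not necessarily the exact implementation from the post): walk from the start date to the end date one day at a time and count the days that fall on a Monday through Friday. The DATEDIFF trick keeps the weekday test independent of DATEFIRST and language settings.

CREATE FUNCTION dbo.fn_weekday_count (@from date, @to date)
RETURNS int
AS
BEGIN
    DECLARE @days int = 0;
    WHILE (@from <= @to)
    BEGIN
        -- DATEDIFF(DAY, 0, ...) % 7 yields 0 for Monday through 6 for Sunday,
        -- because day 0 (1900-01-01) was a Monday:
        IF (DATEDIFF(DAY, 0, @from) % 7 < 5)
            SET @days += 1;
        SET @from = DATEADD(DAY, 1, @from);
    END;
    RETURN @days;
END;
GO

SELECT dbo.fn_weekday_count('2021-01-01', '2021-01-31');   -- 21 weekdays in January 2021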

Efficient data, part 6: Versioning changes

This installment in the series on efficient data is on versioning changes in a table. The article is a re-post of a post I wrote in September on compressing slowly changing dimensions, although the concept does not apply only to dimensions; it can be used on pretty much any data that changes over time.

The idea is to “compress” a versioned table: instead of storing a separate row with a single date column for every version, you collapse multiple sequential versions into a single row with a “from” date and a “to” date. This can significantly reduce the size of the table.
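As a rough sketch of what that can look like, assuming a hypothetical dbo.CustomerVersions table with one row per version (CustomerID, CustomerName, VersionDate): the row-number difference trick groups consecutive identical versions, and LEAD() then derives each group’s “to” date from the start of the next group.

WITH grouped AS (
    SELECT CustomerID, CustomerName, VersionDate,
           ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY VersionDate)
         - ROW_NUMBER() OVER (PARTITION BY CustomerID, CustomerName ORDER BY VersionDate) AS grp
    FROM dbo.CustomerVersions
),
islands AS (
    -- One row per run of consecutive versions with the same attribute values:
    SELECT CustomerID, CustomerName, MIN(VersionDate) AS ValidFrom
    FROM grouped
    GROUP BY CustomerID, CustomerName, grp
)
SELECT CustomerID, CustomerName, ValidFrom,
       LEAD(ValidFrom) OVER (PARTITION BY CustomerID ORDER BY ValidFrom) AS ValidTo   -- NULL means current version
FROM islands;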

Efficient data, part 5: Datatypes

This week’s installment in the series on efficient data is all about data types. Be smart about choosing them; the right type for the right job will not only store your data in less physical space and enforce data integrity better, but the smaller indexes and tables will also improve overall query performance.
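A contrived example of the kind of choice this is about (the tables and columns are made up; the comments show approximate per-column storage):

-- “Generous” datatype choices:
CREATE TABLE dbo.Readings_wide (
    ReadingID    bigint        NOT NULL,   -- 8 bytes
    ReadingDate  datetime      NOT NULL,   -- 8 bytes, even if the time part is never used
    Unit         nvarchar(50)  NOT NULL,   -- 2 bytes per character
    Temperature  float         NOT NULL    -- 8 bytes
);

-- Right-sized datatypes for the same information:
CREATE TABLE dbo.Readings_narrow (
    ReadingID    int           NOT NULL,   -- 4 bytes
    ReadingDate  date          NOT NULL,   -- 3 bytes
    Unit         varchar(10)   NOT NULL,   -- 1 byte per character
    Temperature  decimal(5, 2) NOT NULL    -- 5 bytes
);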

Efficient data, part 3: Compression

This is the third part in a series of articles on optimizing large volumes of data. This part is all about using compression, and since I wrote an article on that a few months ago, I’m not going to reinvent the wheel. So this article is a repost of my July article on compression.
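For reference, this is roughly what enabling page compression looks like (the object names are made up), along with the stored procedure that estimates the savings before you commit to a rebuild:

-- Rebuild the table and an index with page compression:
ALTER TABLE dbo.TransactionDetails
    REBUILD WITH (DATA_COMPRESSION = PAGE);

ALTER INDEX IX_TransactionDetails_Date
    ON dbo.TransactionDetails
    REBUILD WITH (DATA_COMPRESSION = PAGE);

-- Estimate the space savings first:
EXEC sys.sp_estimate_data_compression_savings
    @schema_name      = 'dbo',
    @object_name      = 'TransactionDetails',
    @index_id         = NULL,
    @partition_number = NULL,
    @data_compression = 'PAGE';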

Efficient data, part 2: Partitioning

This is the second part in a series on storing and modelling data efficiently. A great way to boost the performance of large volumes of data is to partition it. As the name implies, partitioning splits a table or index into multiple partitions, so the data can be stored across multiple physical files and drives. Partitioning is a feature of SQL Server Enterprise Edition, but if you’re running Enterprise, you’re in luck!
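Here’s a minimal sketch of the moving parts (the boundary values and names are made up, and everything maps to PRIMARY for brevity; in practice you would spread the partitions across filegroups):

-- A partition function defines the boundary values...
CREATE PARTITION FUNCTION pf_monthly (date)
    AS RANGE RIGHT FOR VALUES ('2021-01-01', '2021-02-01', '2021-03-01');

-- ...a partition scheme maps the resulting partitions to filegroups...
CREATE PARTITION SCHEME ps_monthly
    AS PARTITION pf_monthly ALL TO ([PRIMARY]);

-- ...and the table (or index) is created on the scheme, on the partitioning column:
CREATE TABLE dbo.Transactions (
    TransactionID   int            NOT NULL,
    TransactionDate date           NOT NULL,
    Amount          numeric(12, 2) NOT NULL,
    CONSTRAINT PK_Transactions PRIMARY KEY CLUSTERED (TransactionID, TransactionDate)
) ON ps_monthly (TransactionDate);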

Efficient data, part 1: Normalization

We’ve talked a lot about optimizing queries and query performance, but we haven’t really touched much on the storage and data modelling aspects. In this series of posts, I’ll run through some basic tips on how you can model and store your data more efficiently. This comes in particularly handy when you’re working with large databases and large transaction volumes, but a lot of it also makes good design sense in smaller databases.

In this first article, we’ll cover the normalized data model.
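As a tiny, made-up example of the difference: a denormalized table repeats the customer attributes on every order row, while a normalized model stores each customer once and references it by key.

-- Denormalized: customer attributes repeated on every order
CREATE TABLE dbo.Orders_denormalized (
    OrderID       int          NOT NULL PRIMARY KEY,
    OrderDate     date         NOT NULL,
    CustomerName  varchar(100) NOT NULL,
    CustomerCity  varchar(100) NOT NULL
);

-- Normalized: each customer stored once, orders reference it by key
CREATE TABLE dbo.Customers (
    CustomerID    int          NOT NULL PRIMARY KEY,
    CustomerName  varchar(100) NOT NULL,
    CustomerCity  varchar(100) NOT NULL
);

CREATE TABLE dbo.Orders (
    OrderID       int          NOT NULL PRIMARY KEY,
    OrderDate     date         NOT NULL,
    CustomerID    int          NOT NULL
        CONSTRAINT FK_Orders_Customers REFERENCES dbo.Customers (CustomerID)
);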