Watch out for Merge Interval with date range Index Seeks

In my last post, I found that DATEDIFF, DATEADD and the other date functions in SQL Server are not as datatype agnostic as the documentation would have you believe. Those functions would perform an implicit datatype conversion to either datetimeoffset or datetime (!), which would noticeably affect the CPU time of a query.

Well, today I was building a query on an indexed date range, and the execution plan contained a Merge Interval operator. Turns out, this operator brings a few unexpected surprises to your query performance. The good news is, it’s a relatively simple fix.

DATEDIFF performs implicit conversions

As I was performance tuning a query, I found that a number of date calculation functions in SQL Server appear to be forcing a conversion of their date parameters to a specific datatype, adding computational work to a query that uses them. In programming terms, it seems that these functions do not have “overloads”, i.e. different code paths depending on the incoming datatype.

So let’s take a closer look at how this manifests itself.

About wildcards and data type precedence

Implicit conversions in SQL Server follow a specific, predictable order, called data type precedence. This means that if you compare or add/concatenate two values, a and b, with different data types, you can predict which one will be implicitly converted to the data type of the other one in order to be able to complete the operation.

I stumbled on an interesting exception to this rule the other day.

The SQL Server Calendar project

I’m the type of developer that invents wheels. Yes, every wheel I design is unique in its own way, and hand-crafted for a specific purpose. And so it has also been with calendar dimensions (typically when I do data warehousing work).

This got me thinking – why not design the mother of all calendar dimensions? One that includes every conceivable calendar and property that I and others could use and re-use. One that could save me a ton of coding, and lessen the burden of having to validate it each and every time?

And that’s how I got started designing my one calendar script to rule them all.

Grouping dates without blocking operators

It’s not entirely uncommon to want to group by a computed expression in an aggregation query. The trouble is, whenever you group by a computed expression, SQL Server considers the ordering of the data to be lost, and this will turn your buttery-smooth Stream Aggregate operation into a Hash Match (aggregate) or create a corrective Sort operation, both of which are blocking.

Is there anything we can do about this? Yes, sometimes, like when those computed expressions are YEAR() and MONTH(), there is. But you should probably get your nerd on for this one.

Human-readable ranges of integers or dates

This is a real-world problem that I came across the other day. In a reporting scenario, I wanted to output a number of values in an easy, human-readable way for a report. But just making a long, comma-separated string of numbers doesn’t really make it very readable. This is particularly true when there are hundreds of values.

So here’s a powerful pattern to solve that task.

Calculating the number of weekdays between two dates

I wish the DATEDIFF() function would count the number of working days (mondays through fridays) between two dates for me, but until that happens, I’ve had to roll my own scalar function. I tried to think of a smart way involving perhaps a modulus calculation, but I quickly succumbed to a more down-to-earth approach.

Joining two SCD2 tables

A number of OLTP systems store dimension data in SCD2-like tables in order to retain all the revisions whenever the dimension information changes. In certain situations, you may come across a need to join two or more SCD tables, while keeping all the versions information intact. Sound tricky? Not really.

Calculating the date of easter sunday

We’ve previously looked at how to calculate recurring public holidays. However, calculating the date of easter sunday is not as simple as you might think, because it involves calculations of lunar phases. This short post contains a T-SQL translation of the popularly used Meeus-Jones-Butcher formula.

Efficient data, part 5: Datatypes

This week’s installment in the series on efficient data is all about data types. Be smart about choosing them; The right type for the right job will not only store your data in less physical space and enforce data integrity better, but smaller indexes and tables will also improve overall query performance.