PIVOT, UNPIVOT and performance

Few things deserve the attention of a long rant as much as unneccessarily complicated syntaxes. When you want to achieve something that is clearly defined and supported, but you have to look up the the syntax. PIVOT and UNPIVOT are examples of such features, and in this case, I’ll even show you a more well-performing alternative.

Parallel execution, part 1

The subject of parallel execution of SQL Server queries is at times somewhat shrouded in mystery and uncertainty. Since the concept of parallel execution is such a significant (and indispensable tool) for performance tuning, it’s good to have a fair idea of how it works. In this first post in a series on parallelization, I’m going to try to sort out the apples from the pears, or the serial from the parallel if you will.

Using memory optimized tables on VirtualBox

There are a handful of wonderful software packages out there that I wouldn’t want to work without, including the free, open-source virtualization platform VirtualBox, which was acquired by Sun a few years ago, which in turn was bought by Oracle.

The other day, I was setting up a new virtual machine to run SQL Server 2014 with memory-optimized tables, which incidentally is one of the great reasons to update to 2014. Memory-optimized tables are tables that are stored in the RAM memory of the server. Some of the great advantages of working with “in-memory OLTP”, as it’s also known, include a greatly reduced number of latches and locks (accomplished with a form of row-versioning), which allows for a much larger number of concurrent users to work with the same data. With a few limitations, working with memory-optimized tables is transparent, so you’re using regular T-SQL DML commands.

Turns out, however, that SQL Server didn’t want to run memory-optimized code right out of the box on my VM. Instead, I got this:

Msg 41342, Level 15, State 2, Line 3
The model of the processor on the system does not support creating MEMORY_OPTIMIZED=ON. This error typically occurs with older processors. See SQL Server Books Online for information on supported models.

This message stems from the fact that the processor needs to support the CMPXCHG16B command (I have no idea what that does), which is available on pretty much any modern 64-bit processor. My physical processor is fairly new, so the issue is obviously with the VM software. In this case, it was just a matter of enabling a setting in VirtualBox, which can be done with the following command:

VBoxManage setextradata "Your VM" VBoxInternal/CPUM/CMPXCHG16B 1

I have no idea why this feature isn’t turned on by default, but once you’ve enabled it, it works like a charm. So, expect to see more blog posts about memory-optimized tables in the near future. 🙂

Distributing values with overlapping ranges

Today, we’re going to look at how to distribute values with what I call “overlapping ranges”. What I mean by that is that the value that you need to distribute has a given range and the objects you want to distribute the values to are also defined as ranges. Sound complicated? It doesn’t have to.

Manual transaction management

Transactions are great. They keep your data together atomically, so you’re not in for any nasty surprises. But even a novice knows better than to leave transactions open, waiting for user interaction. If you do, lock waits and probably deadlocks will pile up in no time.

So how do you book a flight without blocking all the other users or losing your seat to somebody else while you make up your mind?

How to build a histogram in T-SQL

Talk to SQL Server developers or DBAs about histograms, and they’ll inevitably think of index statistics. However, a task you may encounter some day is to calculate the distribution of numbers in a table. And although there’s no quick built-in function to do this, it’s not as difficult as you may think.

Reloading fact tables with zero downtime

If you’re working with data warehousing or reporting, you’ll recognize this problem as a recurring headache whenever you’re designing an ETL process for fact tables: If you want to completely reload all the rows of a fact table, you would typically start by emptying (or truncating) the fact table, and then load new data into it. But during the loading process, depending on what your job does, there won’t be any data in the table, or worse, it will be half-filled and incorrect. Worst-case: If your ETL job crashes, the table will remain empty. Now, if your ETL job takes an hour to run, that’s a problem.