A “shock absorber” pattern for high-performance data ingestion

It’s almost like a myth – one that I’ve heard people talk about, but never actually seen myself. The “shock absorber” is a pretty clever data flow design pattern to ingest data where a regular ETL process would choke on the throughput or spikes. The idea is to use a buffer table to capture incoming data, and then run an asynchronous process that loads that data in batches from the buffer into its intended target table.

While I’ve seen whitepapers and blog posts mention the concept loosely along with claims of “7x or 10x performance”, none of them go into technical detail on how it’s done, so I decided to try my hand at it.

I’ve compiled my findings, along with some pre-baked framework code if you want to try building something yourself. Professional driver on closed roads. It’s gonna get pretty technical.

Using memory optimized tables on VirtualBox

There are a handful of wonderful software packages out there that I wouldn’t want to work without, including the free, open-source virtualization platform VirtualBox, which was acquired by Sun a few years ago, which in turn was bought by Oracle.

The other day, I was setting up a new virtual machine to run SQL Server 2014 with memory-optimized tables, which incidentally is one of the great reasons to update to 2014. Memory-optimized tables are tables that are stored in the RAM memory of the server. Some of the great advantages of working with “in-memory OLTP”, as it’s also known, include a greatly reduced number of latches and locks (accomplished with a form of row-versioning), which allows for a much larger number of concurrent users to work with the same data. With a few limitations, working with memory-optimized tables is transparent, so you’re using regular T-SQL DML commands.

Turns out, however, that SQL Server didn’t want to run memory-optimized code right out of the box on my VM. Instead, I got this:

Msg 41342, Level 15, State 2, Line 3
The model of the processor on the system does not support creating MEMORY_OPTIMIZED=ON. This error typically occurs with older processors. See SQL Server Books Online for information on supported models.

This message stems from the fact that the processor needs to support the CMPXCHG16B command (I have no idea what that does), which is available on pretty much any modern 64-bit processor. My physical processor is fairly new, so the issue is obviously with the VM software. In this case, it was just a matter of enabling a setting in VirtualBox, which can be done with the following command:

VBoxManage setextradata "Your VM" VBoxInternal/CPUM/CMPXCHG16B 1

I have no idea why this feature isn’t turned on by default, but once you’ve enabled it, it works like a charm. So, expect to see more blog posts about memory-optimized tables in the near future. 🙂