It’s almost like a myth – one that I’ve heard people talk about, but never actually seen myself. The “shock absorber” is a pretty clever data flow design pattern to ingest data where a regular ETL process would choke on the throughput or spikes. The idea is to use a buffer table to capture incoming data, and then run an asynchronous process that loads that data in batches from the buffer into its intended target table.
While I’ve seen whitepapers and blog posts mention the concept loosely along with claims of “7x or 10x performance”, none of them go into technical detail on how it’s done, so I decided to try my hand at it.
I’ve compiled my findings, along with some pre-baked framework code if you want to try building something yourself. Professional driver on closed roads. It’s gonna get pretty technical.
I recently ran into a curious deadlock issue. I have a process that performs a lot of updates in a “state” table using multiple, concurrent connections. The business logic in the application guarantees that two connections won’t try to update the same item, so we shouldn’t ever run into any locking issues. And yet, we keep getting deadlocks.
What’s going on here? Hint: it has to do with isolation levels and range locks.
or: How I learned to stop worrying, and love all-caps domain names.
I’m a complete beginner at Linux, so I should preface this post with the fact that these are my humble notes after hours of pulling my hair. It’s not really a fully-fledged how-to article, and there are lot of things I’m not covering. But I figured it may help someone out there at some point.
Whenever SQL Server needs to sort a data stream, it will use the Sort operator to reorder the rows of the stream. Sorting data is an expensive operation because it entails loading part or all of the data into memory and shifting that data back and forth a couple of times. The only time SQL Server doesn’t sort the data is when it already knows the data to be ordered correctly, like when it has already passed a Sort operator or it’s reading from an appropriately sorted index.
But what happens if the data is ordered correctly, but SQL Server doesn’t know about it? Let’s find out.
A number of business processes require you to distribute a value over a date range. However, if your distribution keys and values don’t add up perfectly or aren’t perfectly divisible, it’s very easy to get rounding errors in your distributions.
Watching Brent Ozar’s 2017 PASS Summit session on Youtube the other day, I learned that the Top N Sort operation in SQL Server behaves dramatically differently, depending on how many rows you want from the TOP.
You may already know that common table expressions, like views, don’t behave like regular tables. They’re a way to make your query more readable by allowing you to write a complex SQL expression just once, rather than repeating it all over your statement or view. This makes reading, understanding and future refactoring of your code a little less painful.
But they’re no magic bullet, and you may end up with some unexpected execution plans.
I recently worked with a large set of accounting transactions. I needed to split those rows into multiple logical batches, but each batch had to be logically consistent – among other things, those batches had to be properly balanced, because accounting people are kind of fussy like that.
So I designed a little T-SQL logic that would split all of those transactions into evenly sized batches, without violating their logical groupings.