Optimizing a string split and search

We’re no strangers to doing things in T-SQL that would perhaps be more efficient in a procedural language. Love it or hate it, a T-SQL solution is easier in some situations, like my sp_ctrl3 procedure that I use as a drop-in replacement for the standard sp_help procedure to display object information in a way that simplifies copying and pasting.

One of the things that sp_ctrl3 does is plaintext database search. If you pass a string to the procedure that does not match an existing object, it’ll just perform a plaintext search of all SQL modules (procedure, views, triggers, etc) for that string. The search result includes line numbers for each result, so it needs to split each module into lines.

I’ve found that this takes a very long time to run in a database with large stored procedures, so here’s how I tuned it to run faster.

Continue reading

A “shock absorber” pattern for high-performance data ingestion

It’s almost like a myth – one that I’ve heard people talk about, but never actually seen myself. The “shock absorber” is a pretty clever data flow design pattern to ingest data where a regular ETL process would choke on the throughput or spikes. The idea is to use a buffer table to capture incoming data, and then run an asynchronous process that loads that data in batches from the buffer into its intended target table.

While I’ve seen whitepapers and blog posts mention the concept loosely along with claims of “7x or 10x performance”, none of them go into technical detail on how it’s done, so I decided to try my hand at it.

I’ve compiled my findings, along with some pre-baked framework code if you want to try building something yourself. Professional driver on closed roads. It’s gonna get pretty technical.

Continue reading

Consolidating grouped transactions into evenly sized batches

I recently worked with a large set of accounting transactions. I needed to split those rows into multiple logical batches, but each batch had to be logically consistent – among other things, those batches had to be properly balanced, because accounting people are kind of fussy like that.

So I designed a little T-SQL logic that would split all of those transactions into evenly sized batches, without violating their logical groupings.

Safety glasses on. Let’s dive in.

Continue reading

Grouping dates without blocking operators

It’s not entirely uncommon to want to group by a computed expression in an aggregation query. The trouble is, whenever you group by a computed expression, SQL Server considers the ordering of the data to be lost, and this will turn your buttery-smooth Stream Aggregate operation into a Hash Match (aggregate) or create a corrective Sort operation, both of which are blocking.

Is there anything we can do about this? Yes, sometimes, like when those computed expressions are YEAR() and MONTH(), there is. But you should probably get your nerd on for this one.

Continue reading

Encrypting SQL Server connections with Let’s Encrypt certificates

Encrypting your SQL Server’s TDS connections should be high on your list of things to do if you’re concerned with the privacy of your data. This often boils down to one big problem: can you get a valid certificate without paying a ton of money, and will it work with SQL Server?

So follow me down the rabbit hole, as we work out the steps to using Let’s Encrypt to create (and auto-renew!) a certificate for SQL Server. This is going to get technical.

Continue reading

Key Lookup without an output column?

Performance tuning the other day, I was stumped by a query plan I was looking at. Even though I had constructed a covering index, I was still getting a Key Lookup operator in my query plan. What I usually do when that happens is to check the operator’s properties to see what its output columns are, so I can include those columns in my covering index.

Here’s the interesting thing: there weren’t any output columns. What happened?

Continue reading

Catching circular references in parent-child structures

A popular form of organizing dimensions is in parent-child structures, also known as “unbalanced” or “ragged” dimensions, because any branch can have an arbitrary number of child levels. There are many advantages to this type of representation, but their recursive nature also brings some challenges. In this post, we’re going to look at circular references, and how you can trap them before they run out of control.

Continue reading