Just for the heck of it, I scratched together a template parser for T-SQL . The usage of this function is similar to the STRING_SPLIT() function, except instead of splitting a string by a delimiter character, we want to split a string according to a defined template.
Notice how the “%” wildcard character denotes how the string is split. Unlike the fancy stuff you can do with regular expressions, T-SQL wildcards don’t allow you to define capture groups, so this function is unfortunately constrained to just using “%”. I hope it will still come in handy to someone out there.
We’re no strangers to doing things in T-SQL that would perhaps be more efficient in a procedural language. Love it or hate it, a T-SQL solution is easier in some situations, like my sp_ctrl3 procedure that I use as a drop-in replacement for the standard sp_help procedure to display object information in a way that simplifies copying and pasting.
One of the things that sp_ctrl3 does is plaintext database search. If you pass a string to the procedure that does not match an existing object, it’ll just perform a plaintext search of all SQL modules (procedure, views, triggers, etc) for that string. The search result includes line numbers for each result, so it needs to split each module into lines.
I’ve found that this takes a very long time to run in a database with large stored procedures, so here’s how I tuned it to run faster.
It’s almost like a myth – one that I’ve heard people talk about, but never actually seen myself. The “shock absorber” is a pretty clever data flow design pattern to ingest data where a regular ETL process would choke on the throughput or spikes. The idea is to use a buffer table to capture incoming data, and then run an asynchronous process that loads that data in batches from the buffer into its intended target table.
While I’ve seen whitepapers and blog posts mention the concept loosely along with claims of “7x or 10x performance”, none of them go into technical detail on how it’s done, so I decided to try my hand at it.
I’ve compiled my findings, along with some pre-baked framework code if you want to try building something yourself. Professional driver on closed roads. It’s gonna get pretty technical.
I recently worked with a large set of accounting transactions. I needed to split those rows into multiple logical batches, but each batch had to be logically consistent – among other things, those batches had to be properly balanced, because accounting people are kind of fussy like that.
So I designed a little T-SQL logic that would split all of those transactions into evenly sized batches, without violating their logical groupings.
I’ve solved this puzzle a number of times, but I’ve never really been quite happy with the result. It was either too slow, too much code, too hard to understand. So here’s a fresh take at computing the time-to-payment on a large amount of invoices, with multiple, overlapping, partial payments.
Some operations in SQL Server will turn your entire query plan serial (single-threaded), others will just reserve a so-called “serial zone”. I read up on this stuff a number of years ago (including a great post by Paul White), and thinking that some things must have changed since, I decided to go see for myself.
It’s not entirely uncommon to want to group by a computed expression in an aggregation query. The trouble is, whenever you group by a computed expression, SQL Server considers the ordering of the data to be lost, and this will turn your buttery-smooth Stream Aggregate operation into a Hash Match (aggregate) or create a corrective Sort operation, both of which are blocking.
Is there anything we can do about this? Yes, sometimes, like when those computed expressions are YEAR() and MONTH(), there is. But you should probably get your nerd on for this one.
Encrypting your SQL Server’s TDS connections should be high on your list of things to do if you’re concerned with the privacy of your data. This often boils down to one big problem: can you get a valid certificate without paying a ton of money, and will it work with SQL Server?
So follow me down the rabbit hole, as we work out the steps to using Let’s Encrypt to create (and auto-renew!) a certificate for SQL Server. This is going to get technical.
Performance tuning the other day, I was stumped by a query plan I was looking at. Even though I had constructed a covering index, I was still getting a Key Lookup operator in my query plan. What I usually do when that happens is to check the operator’s properties to see what its output columns are, so I can include those columns in my covering index.
Here’s the interesting thing: there weren’t any output columns. What happened?
A popular form of organizing dimensions is in parent-child structures, also known as “unbalanced” or “ragged” dimensions, because any branch can have an arbitrary number of child levels. There are many advantages to this type of representation, but their recursive nature also brings some challenges. In this post, we’re going to look at circular references, and how you can trap them before they run out of control.