Optimizing a string split and search

We’re no strangers to doing things in T-SQL that would perhaps be more efficient in a procedural language. Love it or hate it, a T-SQL solution is easier in some situations, like my sp_ctrl3 procedure that I use as a drop-in replacement for the standard sp_help procedure to display object information in a way that simplifies copying and pasting.

One of the things that sp_ctrl3 does is plaintext database search. If you pass a string to the procedure that does not match an existing object, it’ll just perform a plaintext search of all SQL modules (procedure, views, triggers, etc) for that string. The search result includes line numbers for each result, so it needs to split each module into lines.

I’ve found that this takes a very long time to run in a database with large stored procedures, so here’s how I tuned it to run faster.

How to run your CTE just once, and re-use the output

You may already know that common table expressions, like views, don’t behave like regular tables. They’re a way to make your query more readable by allowing you to write a complex SQL expression just once, rather than repeating it all over your statement or view. This makes reading, understanding and future refactoring of your code a little less painful.

But they’re no magic bullet, and you may end up with some unexpected execution plans.

Joining two SCD2 tables

A number of OLTP systems store dimension data in SCD2-like tables in order to retain all the revisions whenever the dimension information changes. In certain situations, you may come across a need to join two or more SCD tables, while keeping all the versions information intact. Sound tricky? Not really.

Splitting a range string into a table

This week’s post is a requirement that I see very regularly as a developer. You get a plaintext string containing one or more ranges. Each range is comma delimited, and the start and end values of the range are separated by a dash. The string could look something like this, for example: 100-120,121-499,510,520,790-999.

Wouldn’t it be practical if we could construct a table value function that returns one row for each range, with columns for the start and end of each range?

Calculating business days and holidays

A common scenario you may have encountered is the need to calculate the first business day after a given date. There are quite a few ugly ways to solve this, including cursors, but there are also some pretty neat ways to approach the problem, and as a bonus, you’ll learn about recursion and the new LEAD(), LAG() functions and accumulation in T-SQL.

Working with dependencies

Working with dependencies, particularly recursive dependencies, may not always be entirely intuitive, but it could be critical knowledge in your database development work. This article focuses primarily on different ways of visualizing dependencies and how to loop through them using recursive common table expressions.

Using DELETE on subqueries and common table expressions

Here’s an interesting feature I found in the code of a colleague the other day. A common task in T-SQL is eliminating duplicate records. This became a lot easier with the introduction of windowed functions way back in SQL Server 2005, such as ROW_NUMBER(), but it turns out, I’ve still been missing out on a really simple and cool solution.

Execution order of non-deterministic functions

Here’s a strange insight that I gained when building a test case where I needed some randomized values. In order to generate random values, you can use the NEWID() function, which creates a uniqueidentifier value for each row. But NEWID() comes with a strange behaviour, that some (including me) will consider a bug, while others (including the SQL Server development team) consider it to be “by design”.