Just for laughs, we’re going to take a look at a slightly-too-complex yet elegant solution to a seemingly simple task: creating a list of 100 fictional company names for a demo database.
I just remembered a pretty common data challenge the other day. Suppose you have a number of tables, all with similar information in them. You want to union their contents, but you need to prioritize them, so you want to choose all the rows from table A, then rows from table B that are not included in A, then rows from C that are not included in A or B, and so on.
This is a pretty common use case in data cleansing or data warehousing applications. There are a few different ways to go about this, some more obvious than others.
With good naming and datatyping conventions, an automated script can help you with the process of creating foreign key constraints across your database, or actually, suggest table relations where you’ve forgotten to implement them.
Probably one of the most common challenges I see when I do ETL and business intelligence work is analyzing a table (or a file) for possible primary keys. And while a bit of domain knowledge, along with a quick eye and some experience will get you really far, sometimes you may need some computational help just to be sure.
Here are some handy tricks to get you started!
If all you have is a hammer, everything will eventually start looking like a nail. This is generally known as Maslow’s hammer and refers to the fact that you use the tools you know to solve any problem, regardless if that’s what the problem actually needs. With that said, I frequently need a way to visualize the load distribution of scheduled jobs over a day or week, but I could never be bothered to set up a web server, learn a procedural programming language or build custom visualizations in PowerBI.
So here’s how to do that without leaving Management Studio.
AlwaysOn Availability Groups are a reasonably simple way to set up disaster recovery (DR) for your SQL Server environment, and with fairly little effort, you can get a bit of high availability (HA) from it as well. But there are a few gotchas, the most obvious of them being that Availability Groups only synchronize specific user-databases, not the entire server setup.
Things that are not included in AGs include logins, SQL Server Agent jobs, SSIS packages stored in SQL Server, linked servers and server settings. You could sync these manually (as is often the case), but wouldn’t you just love to have an automated process do all this for you?
This is the first post in a series on synchronizing stuff between Availability Groups, and in this installment, we’ll look at logins. For the sake of simplicity, I’ll assume that you have a primary replica with a single AG and any number of secondary replicas. The logic holds true if you have multiple AGs, it just gets trickier.
This is a real-world problem that I came across the other day. In a reporting scenario, I wanted to output a number of values in an easy, human-readable way for a report. But just making a long, comma-separated string of numbers doesn’t really make it very readable. This is particularly true when there are hundreds of values.
So here’s a powerful pattern to solve that task.
In an attempt to try a different approach, here’s a three-minute video explanation of how the different physical join operators in SQL Server work and why you would choose one over the other.
I’ve written a few blog posts on join operators befores, so if this video wet your appetite, here’s some recommended reading:
I’d love to hear what you think of the short video format! Please leave feedback in the comments below or on Twitter.
Do you ever compare the values of a lot of columns in two tables? Sure you do. Like, for instance, in a cross update, when you need to figure out which rows you should actually update. But it gets worse if the columns are nullable. The fact that any value could potentially be NULL vastly complicates the comparison and might wreak havoc not only on your code but also on your query performance.
But there’s hope.
I wish the DATEDIFF() function would count the number of working days (mondays through fridays) between two dates for me, but until that happens, I’ve had to roll my own scalar function. I tried to think of a smart way involving perhaps a modulus calculation, but I quickly succumbed to a more down-to-earth approach.