Everyone has a script, a hack or a checklist they can’t function without. In this edition of T-SQL Tuesday, Bert Wagner challenged us to write about our favorite scripts. This is my take.
It’s not entirely uncommon to want to group by a computed expression in an aggregation query. The trouble is, whenever you group by a computed expression, SQL Server considers the ordering of the data to be lost, and this will turn your buttery-smooth Stream Aggregate operation into a Hash Match (aggregate) or create a corrective Sort operation, both of which are blocking.
Is there anything we can do about this? Yes, sometimes, like when those computed expressions are YEAR() and MONTH(), there is. But you should probably get your nerd on for this one.
I’m trying a new type of blog post, and if it works out, I would be happy to post more of the same going forward. The format is a real-world troubleshooting mystery, and I’ll clue you in to the details along the way.
How quickly can you crack it?
Download and print this nifty little PDF with all of the INNER, LEFT, RIGHT, FULL and CROSS JOINs visualized! It’ll look great on your office wall or cubicle. Your coworkers and your interior decorator will love you for it.
How it works: For each join example, there are two tables, the left and the right table, shown as two columns. For the sake of simplicity, these tables are called “a” and “b” respectively in the code.
You’ll notice that the sheet uses a kind of pseudo-code when it comes to table names and column names.
Dynamic data masking is a neat new feature in recent SQL Server versions that allows you to protect sensitive information from non-privileged users by masking it. But using a brute-force guessing attack, even a non-privileged user can guess the contents of a masked column. And if you’re on SQL Server 2014 or earlier, you won’t have the option of using data masking at all.
I just remembered a pretty common data challenge the other day. Suppose you have a number of tables, all with similar information in them. You want to union their contents, but you need to prioritize them, so you want to choose all the rows from table A, then rows from table B that are not included in A, then rows from C that are not included in A or B, and so on.
This is a pretty common use case in data cleansing or data warehousing applications. There are a few different ways to go about this, some more obvious than others.
With good naming and datatyping conventions, an automated script can help you with the process of creating foreign key constraints across your database, or actually, suggest table relations where you’ve forgotten to implement them.
Probably one of the most common challenges I see when I do ETL and business intelligence work is analyzing a table (or a file) for possible primary keys. And while a bit of domain knowledge, along with a quick eye and some experience will get you really far, sometimes you may need some computational help just to be sure.
Here are some handy tricks to get you started!
The binary datatype of SQL Server is one of those features most developers don’t really use that often, but it turns out there’s more to binary values than just storing large, non-relational blobs.