As part of an improvement process (any process, not just database development), you can start by asking yourself “How could I make this process as bad as possible?” In fact, when you think about how to royally screw something up, all you have to do is flip the answer around, and there you have a good starting point for improving your process, your work, whatever.
Here are a few T-SQL “worst practices” to get you started on improving your database design and development practices. I’ve personally seen all of these in the field, and I think you might recognize some of them, too.
10. Using DISTINCT to eliminate duplicate records
If you’ve missed a join condition, or worse, the data model isn’t good, you may end up using DISTINCT to eliminate those last few duplicates in the output. But remember that there’s a reason you’re still seeing duplicates: look at all the tables and their primary keys, as well as your joins, and you’ll probably find something you’ve missed, either in the data model or in your query.
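Here’s a minimal sketch of the pattern, using hypothetical Orders and OrderLines tables: the join fans out to one row per order line, DISTINCT papers over the symptom, and rewriting the query with EXISTS removes the need for DISTINCT altogether.

    -- The join fans out to one row per order line; DISTINCT hides the symptom.
    SELECT DISTINCT o.OrderID, o.CustomerID
    FROM dbo.Orders AS o
    JOIN dbo.OrderLines AS ol ON ol.OrderID = o.OrderID;

    -- Fix the query instead: test for existence, and the duplicates never appear.
    SELECT o.OrderID, o.CustomerID
    FROM dbo.Orders AS o
    WHERE EXISTS (SELECT 1 FROM dbo.OrderLines AS ol WHERE ol.OrderID = o.OrderID);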
9. Making all users sysadmin or db_owner
Users and roles are there for a reason, so if you’ve spent hours and days on modelling and building your database, you might as well spend a few minutes setting up adequate security as well. Just how nitty-gritty you want to get is your choice. But you should at least make an effort.
You may know and trust all your users, but everyone makes mistakes. So, for instance, making sure that your developers are members of the db_datareader role instead of sysadmin on your production machine may save your bacon on a rainy Thursday morning when a developer isn’t paying attention.
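As a sketch (assuming SQL Server 2012 or later and a made-up login name), read-only access is just two statements:

    -- Map the login into the database and grant read-only access,
    -- instead of handing out sysadmin or db_owner.
    CREATE USER [DOMAIN\developer] FOR LOGIN [DOMAIN\developer];
    ALTER ROLE db_datareader ADD MEMBER [DOMAIN\developer];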
8. Not using primary keys
A primary key is not just a unique index (which is good in itself); it also tells anyone else working with your data which column(s) are unique, and that helps a lot when you’re trying to build a good query. Primary keys help developers understand your database, and they are a requirement for building foreign keys.
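For illustration, on a hypothetical Orders/OrderLines pair, declaring the key is one statement, and it immediately makes the foreign key possible as well:

    ALTER TABLE dbo.Orders
        ADD CONSTRAINT PK_Orders PRIMARY KEY (OrderID);

    -- With the primary key in place, child tables can reference it:
    ALTER TABLE dbo.OrderLines
        ADD CONSTRAINT FK_OrderLines_Orders
        FOREIGN KEY (OrderID) REFERENCES dbo.Orders (OrderID);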
7. Storing everything, including dates and numbers, in nullable varchar(max) columns
The obvious problem here is that numbers and dates stored in varchar columns are prone to conversion errors and invalid values. But there are also the issues of sorting, indexing, searching and aggregating the values. And the fact that a varchar value takes up one byte for every character plus two bytes of overhead, so the value 123456.8 takes up 10 bytes, whereas a numeric(9, 1) takes up just 5 bytes. That adds up when you have a few columns in a table with 10 million rows.
This is really a no-brainer: Define columns to hold the types of values you’re going to be working with. And only allow NULL values on columns that are really nullable.
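As a quick sketch with a made-up Payments table, compare the anti-pattern with properly typed columns:

    -- The anti-pattern: everything is nullable text.
    CREATE TABLE dbo.Payments_bad (
        PaymentID    varchar(max) NULL,
        PaymentDate  varchar(max) NULL,
        Amount       varchar(max) NULL
    );

    -- Columns typed for the values they actually hold, nullable only where it makes sense.
    CREATE TABLE dbo.Payments (
        PaymentID    int           NOT NULL,
        PaymentDate  date          NOT NULL,
        Amount       numeric(9, 1) NOT NULL,
        Comment      nvarchar(200) NULL
    );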
6. Placing the Windows swap file on the same physical disk as the database or log files
If Windows is swapping its memory to the same drive that you’re using for your database, needless to say, they’re both going to take a potentially severe performance penalty.
Bear in mind that this problem may not be as apparent in virtual environments, where you may think that you’re using two different disks, but they in fact map to the same physical drive in the end.
This is less of an issue on SANs and other fast RAID sets, but it certainly needs to be managed.
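A simple starting point for checking this: list where your data and log files actually live, and compare that against the drive holding the Windows page file.

    -- Where do the data and log files live, physically?
    SELECT DB_NAME(database_id) AS database_name,
           type_desc,
           physical_name
    FROM sys.master_files
    ORDER BY physical_name;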
5. Using dirty reads (NOLOCK) everywhere
Using the query hint WITH (NOLOCK), or the equivalent READ UNCOMMITTED isolation level, performs “dirty reads” on your data, which means that if someone else is modifying the data, or is just about to, your process ignores this and reads data that may not be there a few milliseconds from now.
This does not mean that there is no place for dirty reads; just always make sure that the data you’re working with has a very low chance of being changed, like, for instance, dimension tables in a data warehouse.
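Both forms below are the same dirty read, sketched against a hypothetical dimension table; keep them confined to data that rarely changes.

    -- The table hint...
    SELECT d.ProductKey, d.ProductName
    FROM dbo.DimProduct AS d WITH (NOLOCK);

    -- ...and the session-level setting are equivalent dirty reads.
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
    SELECT d.ProductKey, d.ProductName
    FROM dbo.DimProduct AS d;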
4. Changing (turning off) the default ANSI flags
If you start changing the default ANSI (and other) connection flags, you may end up chasing ghosts such as strange query behaviour, non-optimal query plans, NULL values that don’t behave the way you expect, etc.
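One classic ghost, sketched here with a throwaway table variable: with the default ANSI_NULLS setting, an “= NULL” comparison never matches, but flip the flag off and the same predicate suddenly returns rows.

    DECLARE @Employees TABLE (EmployeeID int NOT NULL, ManagerID int NULL);
    INSERT INTO @Employees (EmployeeID, ManagerID) VALUES (1, NULL), (2, 1);

    SET ANSI_NULLS ON;    -- the default
    SELECT EmployeeID FROM @Employees WHERE ManagerID = NULL;   -- returns no rows

    SET ANSI_NULLS OFF;
    SELECT EmployeeID FROM @Employees WHERE ManagerID = NULL;   -- now returns employee 1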
3. Cursors
A cursor is a row-based approach to a set-based problem. In a database, odds are excellent that there is a good set-based solution to your problem that, apart from being a lot more readable to another developer, will give you a massive performance boost over a cursor-based solution.
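For example (tables and columns are made up here), the cursor version and the set-based version below do the same work; the single UPDATE is both easier to read and far faster on any real volume of data.

    -- Hypothetical row-by-row version: loop over an aggregate with a cursor.
    DECLARE @CustomerID int, @Total numeric(18, 2);

    DECLARE order_cur CURSOR FOR
        SELECT CustomerID, SUM(Amount) FROM dbo.Orders GROUP BY CustomerID;

    OPEN order_cur;
    FETCH NEXT FROM order_cur INTO @CustomerID, @Total;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        UPDATE dbo.Customers SET TotalPurchases = @Total WHERE CustomerID = @CustomerID;
        FETCH NEXT FROM order_cur INTO @CustomerID, @Total;
    END;
    CLOSE order_cur;
    DEALLOCATE order_cur;

    -- The same work as one set-based statement.
    UPDATE c
    SET c.TotalPurchases = o.Total
    FROM dbo.Customers AS c
    JOIN (SELECT CustomerID, SUM(Amount) AS Total
          FROM dbo.Orders
          GROUP BY CustomerID) AS o
        ON o.CustomerID = c.CustomerID;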
2. Using AUTO_SHRINK on your database files
I’ve seen many cases where databases are set to auto-shrink in order to keep them from taking up disk space. Commonly, I would find this in databases with nightly or weekly ETL jobs that would grow the transaction log and/or tempdb database.
The downside of shrinking a database is that the space you free up has to be allocated all over again. The next time your database auto-grows, the files may very well end up fragmented on the physical disk, which means that read performance will decline over time: the head on the hard drive has to skip back and forth across the disk when you’re reading or scanning large chunks of what should be contiguous data.
Design your database code so it won’t grow the files out of proportion, or make sure you have enough disk space in the first place.
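Checking for, and turning off, the setting is straightforward (the database name below is a placeholder):

    -- Which databases have auto-shrink enabled?
    SELECT name FROM sys.databases WHERE is_auto_shrink_on = 1;

    -- Turn it off for a given database.
    ALTER DATABASE YourDatabase SET AUTO_SHRINK OFF;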
1. Using heaps
A clustered index vastly improves query performance on your table, because without one (and without any other useful index), every query performs a table scan, i.e. it reads the entire table (the heap), no matter how little data you’re looking for.
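Turning a heap into a clustered table can be as simple as one statement; this sketch assumes a hypothetical log table with an ever-increasing date column:

    -- Pick a narrow, ever-increasing, unique(ish) key and cluster on it.
    CREATE CLUSTERED INDEX IX_EventLog_EventDate
        ON dbo.EventLog (EventDate, EventID);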