etl – sqlsunday.com

The compelling case for using heaps

I’m an outspoken advocate of always using a clustered index on each and every table you create as a matter of best practice. But even I will agree that there’s a case for using the odd heap now and then.

Copying data with foreign keys and/or identity columns

In a sense, you could call me lazy. If there’s a script that will perform a task for me, I’d rather use that script than reinvent another wheel. Then again, if needs be, I’d rather spend a day writing such a script than spend ten minutes just getting the job done.

Somehow, that makes me a happier developer.

Reloading fact tables with zero downtime

If you’re working with data warehousing or reporting, you’ll recognize this as a recurring headache whenever you design an ETL process for fact tables. To completely reload all the rows of a fact table, you would typically start by emptying (or truncating) the table, and then load new data into it. But while the load is running, depending on what your job does, the table will either contain no data at all or, worse, be half-filled and incorrect. Worst case, if your ETL job crashes, the table stays empty. Now, if your ETL job takes an hour to run, that’s a problem.
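
To make the problem concrete, here is a minimal sketch of the naive truncate-and-load pattern. It uses Python's built-in `sqlite3` module and an in-memory database rather than SQL Server, purely so it runs anywhere. The table `fact_sales`, its columns, the sample rows and the `naive_reload` helper are all made up for illustration. The function records the row count a reader would see after each committed step.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?)",
    [(1, 10.0), (2, 20.0), (3, 30.0)],
)
conn.commit()


def row_count(conn):
    """Return the number of rows currently in the fact table."""
    return conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]


def naive_reload(conn, new_rows, batch_size=2):
    """Empty the table, then load new rows in committed batches.

    Returns the row count visible after each committed step, which is
    what a report querying the table at that moment would see.
    """
    seen = []
    # SQLite has no TRUNCATE TABLE, so DELETE stands in for it here.
    conn.execute("DELETE FROM fact_sales")
    conn.commit()
    seen.append(row_count(conn))  # the table is empty at this point

    for start in range(0, len(new_rows), batch_size):
        batch = new_rows[start:start + batch_size]
        conn.executemany("INSERT INTO fact_sales VALUES (?, ?)", batch)
        conn.commit()
        seen.append(row_count(conn))  # partially loaded until the last batch
    return seen


new_rows = [(1, 11.0), (2, 21.0), (3, 31.0), (4, 41.0)]
observed = naive_reload(conn, new_rows)
print(observed)
```

Every value in `observed` except the last one is a state in which a report would return missing or incomplete data. If the job failed right after the delete, the table would simply stay empty.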