Reloading fact tables with zero downtime

If you’re working with data warehousing or reporting, you’ll recognize this problem as a recurring headache whenever you’re designing an ETL process for fact tables: If you want to completely reload all the rows of a fact table, you would typically start by emptying (or truncating) the fact table, and then load new data into it. But during the loading process, depending on what your job does, there won’t be any data in the table, or worse, it will be half-filled and incorrect. Worst-case: If your ETL job crashes, the table will remain empty. Now, if your ETL job takes an hour to run, that’s a problem.

There are a few different approaches to solving this problem, and your choice may vary depending on factors such as load frequency, available disk space and whether or not you have an Enterprise Edition of SQL Server.

Partition switching

Partitioning is a very powerful feature of SQL Server that allows you to store table rows in multiple “partitions”. These partitions resemble a bunch of separate tables with identical column and index definitions, except the partitioning is completely transparent, so you only see one table. One of the greatest advantages of partitioning is the ability to distribute different partitions on different physical filegroups, which allows you to split the I/O workload of the table on different physical storage units. But arguably, one of the most useful and beautiful features of partitioning is partition switching.

Think of partitions as slices of bread if you will. Each slice can be different in content (rye, wheat) and can have different thickness, but the column definitions are obviously always the same. When you query the table, SQL Server will try to find specific partitions that contain the information you’re looking for in a process called partition elimination. Probably the most common application of partitioning is for date-derived expressions; a partition can for instance contain a single month, quarter or year of data.

Partition switching is when you swap (switch) the data in two table partitions with each other. Because this process doesn’t actually move any data (the partitions/tables have to be located on the same physical filegroup), this is a lightning-fast process. Here’s how it works:

ALTER TABLE dbo.aPartitionedTable
    SWITCH PARTITION 3 TO dbo.aNonPartitionedTable;

The example query above takes all records from partition 3 in a partitioned table and switches them into the non-partitioned table, which has to be empty in order for this to work. To switch data back into the partition, you can use the following:

ALTER TABLE dbo.aNonPartitionedTable
    SWITCH TO dbo.aPartitionedTable PARTITION 3;

You can even move data between two partitioned tables:

ALTER TABLE dbo.aPartitionedTable
    SWITCH PARTITION 3 TO dbo.anotherPartitionedTable PARTITION 12;

In the case of loading a fact table, you can use a staging table to load your data into and then switch that staging table into the actual fact table (one partition at a time). There are a bunch of other requirements associated with partition switching, but they’re outside the scope of this post.

Here’s an idea on how you could implement partition switching to load a single partition of fact data while keeping the fact table online and populated the entire time:

--- Populate staging table - this is the part of the ETL job that
--- will take up pretty much all of the time.
TRUNCATE TABLE STAGE.sales;

INSERT INTO STAGE.sales (...)
SELECT (...) FROM (...);

--- Clearing the partition by switching it to a dummy table. This is
--- required because the target of a partition switch must be empty.
--- You could DELETE these rows, but this is much, much, faster.
ALTER TABLE FACT.sales PARTITION 3
    SWITCH TO STAGE.sales_empty;

--- .. then clear the dummy table:
TRUNCATE TABLE STAGE.sales_empty;

--- Finally, switch the new staged data into the fact table:
ALTER TABLE STAGE.sales
    SWITCH TO FACT.sales PARTITION 3;

This method has another obvious advantage over the traditional method of just emptying the fact data and re-populating it: If the ETL job fails, you will still have the old fact data intact and untouched, giving you a bit of peace to try to sort out what went wrong with the ETL task, without the screaming users and all that comes with those.

Switching without partitioning

Granted, partitioning isn’t for everyone. For starters, you’ll need to really open up that wallet for an Enterprise Edition license. Secondly, it can add quite a deal of complexity to your solution. Some of the issues are obvious and easily managed, others are not.

But did you know that the SWITCH command can still be used on all editions of SQL Server?

While you won’t be able to partition a table and then reload/switch those individual partitions, you can still switch entire tables! Here’s the partition switching example, but without the partitioning bit:

--- Populate staging table - this is the part of the ETL job that
--- will take up pretty much all of the time.
TRUNCATE TABLE STAGE.sales;

INSERT INTO STAGE.sales (...)
SELECT (...) FROM (...);

--- Clearing the fact table. This is required because the target
--- of a partition switch must be empty.
TRUNCATE TABLE FACT.sales;

--- Switch the new staged data into the fact table:
ALTER TABLE STAGE.sales
    SWITCH TO FACT.sales;

As with partition switching, the source and target of the switch operation still need to have identical column and index definitions. And apart from having a partitioned fact table, you’ll still enjoy the other advantages, such as still having a populated fact table if the ETL job crashes half-way. The downside is that you need to populate the entire fact table, which requires time and disk space.

When switching won’t work

Switching tables or partitions won’t work in a number of scenarios, including the following:

when you don’t have enough space to build the stage table on the same file group – because switching always has to happen within the same filegroup.
when one of the switched tables is replicated.

There’s a whole host of other things to look out for. See this article on TechNet for all the details.

Other methods

I’ve seen a few other methods to accomplish the type of switching we’ve looked at in this article, but considering the flexibility of switching and the fact that it’s available on Standard Edition as long as you’re not working with partitions, I would stick to the general idea in the examples above.

As usual, thanks for reading. Please let me know if there’s anything I’ve left unclear. See you next week!

8 thoughts on “Reloading fact tables with zero downtime”

Johan Idstam (@idstam)

2014-08-27 at 21:49

Is there any advantage to using switch (without partitioning) instead of just renamng the tables?

- Daniel Hutmacher
  
  2014-08-27 at 23:31
  
  Yes, there is. Renaming objects can be hazardous if you have dependencies or cached query plans that might break or function incorrectly after a rename.
  
Geovanny Hernandez

2020-01-05 at 22:14

Very interesting post Daniel, I arrived to this post by accident, however, I would like to congrat you and it should be useful to add at the end of this post some references to other websites.

- Daniel Hutmacher
  
  2020-01-05 at 22:15
  
  Thanks!
  
Pruthvi Tatikonda

2021-07-07 at 19:51

Very beautiful and excellent post. Thank you Daniel.
I have one question/requirement.
Can two partitions in the table, hold the same values? Practically it is not possible but I want to check with you if it is possible.

- Daniel Hutmacher
  
  2021-07-07 at 20:32
  
  Thanks!
  As you’ve already concluded, because the value in the partitioning column dictates which partition the value ends up with, two duplicate values would by definition always end up in the same partition.
  
Alberto Martinez

2022-03-08 at 00:55

Daniel, we use an approach where we have two physical tables (table1 and table2) and a view in front of them. We then fill table2 while the view points to table1, then we switch the view to point to table2 when it is ready. Can you speak to the advantages of the partition approach over this?

- Daniel Hutmacher
  
  2022-03-08 at 07:37
  
  Because the view changes, SQL Server will need to recompile a bunch of dependent execution plans. And you’ll also need to maintain that code that rebuilds the view, which adds complexity.
  Otherwise, the solution is equivalent in terms of query performance and downtime, in my opinion. Just make sure the users never query the physical tables by accident. :)