If you’re working with data warehousing or reporting, you’ll recognize this problem as a recurring headache whenever you’re designing an ETL process for fact tables: If you want to completely reload all the rows of a fact table, you would typically start by emptying (or truncating) the fact table, and then load new data into it. But during the loading process, depending on what your job does, there won’t be any data in the table, or worse, it will be half-filled and incorrect. Worst-case: If your ETL job crashes, the table will remain empty. Now, if your ETL job takes an hour to run, that’s a problem.
In the third installment of the series on slowly changing dimensions, we’re going to tackle the question of how to manage accumulated fact aggregates in a solution that uses SCD 2 dimensions. While SCD 2 dimensions solve a lot of problems with slowly changing dimensions, accumulated values can still make a mess of the aggregate data.