Datawarehouse modelling: Inmon vs Kimball

If you’re into business intelligence, data warehousing and analytics, you will have heard an endless number of references to Bill Inmon and Ralph Kimball. These two figureheads in datawarehousing architecture have produced an immense number of books, articles, training seminars, etc. While many of their strategies and modelling approaches are similar, they have near-opposite views on other aspects.

Inmon: The top-down approach

By and large, you could summarize Bill Inmon’s datawarehouse model as “top-down”, in the sense that you start off at “the top”, with a single, large datawarehouse (typically a snowflake-schema database). On this large database you can, optionally, build isolated star schemas or cubes for specific reporting purposes. All of your business’ source systems integrate into this enterprise database, where fact and dimension data is cleaned and integrated.

Inmon defines a datawarehouse as:

  • subject-oriented – everything is connected,
  • time-variant – the data is versioned using timestamps (i.e. snapshots) or start/end dates,
  • non-volatile – what’s written to the datawarehouse doesn’t change, it’s just versioned,
  • integrated – the datawarehouse contains “everything” of value.

As you can tell, the pros of this approach is that you can fairly easily add or replace source systems in the existing solution. Adding new star schemas, cubes and reports is also fairly straight-forward, because all the data is already there. The cons, however, are that the construction of the “base” datawarehouse may take quite some work to complete, and unless you build data marts or cubes on it, relevant data can be hard to identify or extract for the end-user.

In essence, the Inmon approach focuses on the enterprise, not the individual process – so, in summary, you could view it as a strategic, long-term investment. It’s durable, powerful, but perhaps not that user-friendly.

Kimball: The bottom-up approach

Where Inmon’s view is a tactical enterprise-level investment, Kimball’s view on datawarehousing is rather a tactical venture.

Referred to as “dimensional modelling”, the Kimball model focuses on a specific business process, by modelling data marts that are specific for given scenarios – a defined set of reporting or analytics needs. As such, modelling start at the “bottom”, i.e. with the reporting/analysis requirements. Kimball views the datawarehouse as actually just the aggregate, or union, of all of its data marts.

He calls this approach the “enterprise data bus”, because each data mart is fed individually, i.e. with a bus-like flow structure. Like Inmon, Kimball also relies on a staging area, although the staging area isn’t really as tightly integrated as the Inmon datawarehouse. Building this type of staging area is a lot faster, because the scope is limited, and the data doesn’t neccessarily need to be cleaned and integrated to the same extent as in an enterprise datawarehouse.

In summary, the pros are that a Kimball datawarehouse is faster to build and deploy initially. The cons that it comes across as less integrated. A new source data system or report may not only require new star schemas or cubes, you may very likely have to build an entire ETL chain with staging, cleaning and denormalization. This also means that changing or adding a source data system may require extensive re-working of the datawarehouse.

The Kimball model is all about accessibility. It’s simple to use, easy to understand, at the cost of limiting its usefulness outside the specific task for which is was designed.

A word about trends and gurus

Some of this goes without saying.

Every field of knowledge has their gurus. Whether you’re into fine arts and antiquities, medicine, interior decoration, taxation law or data warehousing, there are trends and generally accepted “truths”. More often than not, discussions on data modelling can resemble the tiresome debates on Windows vs. OS X or Nikon vs. Canon cameras.

Sometimes, these things are a matter of preference, sometimes the circumstances are in favor of one solution over the other. But, please, always make a habit of constructively arguing pros and cons of your choice in data models instead of just citing Kimball or Inmon like scripture. :)

Let me hear your thoughts!

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s