Data Warehouses

Data Warehouse

A data warehouse is a relational database that is designed to facilitate reporting and analysis. It usually contains historical data derived from transaction data and other sources. It separates analysis workload from transaction workload and enables an organization to consolidate data from several source systems. Typically the Data warehouse environment also includes business intelligence tools, ETL tools and tools to manage and retrieve metadata.

There are two leading approaches to storing data in a data warehouse — the dimensional approach and the normalized approach.

In a dimensional approach, transaction data are partitioned into either "facts", which are generally numeric transaction data, or "dimensions", which are the reference information that gives context to the facts. For example, a sales transaction can be broken up into facts such as the number of products ordered and the price paid for the products, and into dimensions such as order date, customer name and product number. A key advantage of a dimensional approach is that the data warehouse is easier for the user to understand and to use. Also, the retrieval of data from the data warehouse tends to operate very quickly.

In the normalized approach, the data in the data warehouse are stored using database normalization rules. Tables are grouped together by subject areas that reflect general data categories (e.g., data on customers, products, finance, etc.). The main advantage of this approach is that it is straightforward to add information into the database. A disadvantage of this approach is that, because of the number of tables involved, it can be difficult for users to join data into meaningful information without a precise understanding of the data sources and of the data structure of the data warehouse.

Two leading authorities on data warehouse design are Ralph Kimball and Bill Inmon; however they strongly disagree on which is the best approach to implement a data warehouse!

Ralph Kimball is a proponent of an approach to data warehouse design frequently considered as bottom-up. In the so-called bottom-up approach data marts are first created to provide reporting and analytical capabilities for specific business processes. Data marts contain atomic data and, if necessary, summarized data. These data marts can eventually be unioned together to create a comprehensive data warehouse. The combination of data marts is managed through the implementation of what Kimball calls "a data warehouse bus architecture".

Bill Inmon, has defined a data warehouse as a centralized repository for the entire enterprise. Inmon is one of the leading proponents of the top-down approach to data warehouse design, in which the data warehouse is designed using a normalized enterprise data model. "Atomic" data is stored in the data warehouse. Dimensional data marts containing data needed for specific business processes or specific departments are created from the data warehouse. In the Inmon vision the data warehouse is at the center of the "Corporate Information Factory" (CIF), which provides a logical framework for delivering business intelligence.

Over time it has become apparent to proponents of bottom-up and top-down data warehouse design that both methodologies have benefits and risks. Hybrid methodologies have evolved to take advantage of the fast turn-around time of bottom-up design and the enterprise-wide data consistency of top-down design.