Data Warehousing and Data Mining Overview

Posted on Sep 7 2008 - 6:30pm by Raj

Data warehousing and data mining provide the tools to bring data out of its silos and put it to use. Traditionally, enterprise data has been kept in information silos that are physically separate from other data repositories and serve specialized functions. Enterprise-wide reporting was difficult at best, requiring multiple data extracts and reformatting. All this data manipulation exacted a high cost in accuracy and timeliness. Fortunately, the technology sector has responded with new data warehousing and mining tools.

Data warehousing

Data warehouses offer organizations the ability to gather and store enterprise information in a single conceptual enterprise repository. Basic data modeling techniques are applied to define the relationships between individual data elements or groups of data elements. These associations, or "models," often take the form of entity relationship diagrams (ERDs). More advanced techniques include the star schema and the snowflake schema. Regardless of the technique chosen, the goal is to build a metadata model that conceptually represents how information is used and related within the organization.
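As a rough illustration of the star schema idea, the sketch below builds one fact table surrounded by two dimension tables in an in-memory SQLite database. The table and column names (fact_sales, dim_office, dim_date) are invented for this example and are not taken from any particular warehouse product.

```python
import sqlite3

# Hypothetical star schema: a central fact table that references
# two dimension tables. All names here are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_office (
    office_id INTEGER PRIMARY KEY,
    city      TEXT NOT NULL
);
CREATE TABLE dim_date (
    date_id INTEGER PRIMARY KEY,
    year    INTEGER NOT NULL,
    month   INTEGER NOT NULL
);
CREATE TABLE fact_sales (
    sale_id   INTEGER PRIMARY KEY,
    office_id INTEGER NOT NULL REFERENCES dim_office(office_id),
    date_id   INTEGER NOT NULL REFERENCES dim_date(date_id),
    revenue   REAL NOT NULL
);
""")

# List the tables that were created.
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```

A snowflake schema would go one step further and split the dimension tables themselves into related sub-tables, for example moving city into its own table referenced by dim_office.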

Using the metadata model, enterprise users can then apply elementary data analysis techniques to gather business knowledge. For example, ad hoc queries can be run against the data warehouse to extract enterprise-level information. These queries supply information that was impractical or impossible to obtain from the legacy system of disparate information silos.
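An ad hoc query of this kind might look like the following sketch. It assumes a single consolidated sales table loaded from two formerly separate systems; the table, the source_system values, and every figure are made up for illustration.

```python
import sqlite3

# Hypothetical consolidated table: rows that used to live in separate
# silos now sit side by side. All names and numbers are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (source_system TEXT, year INTEGER, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("billing", 2006, 100.0),
        ("retail_pos", 2006, 250.0),
        ("billing", 2007, 120.0),
        ("retail_pos", 2007, 300.0),
    ],
)

def revenue_by_year(connection):
    """Ad hoc query: enterprise-wide revenue per year across all sources."""
    rows = connection.execute(
        "SELECT year, SUM(revenue) FROM sales GROUP BY year ORDER BY year"
    )
    return dict(rows.fetchall())

yearly = revenue_by_year(conn)
print(yearly)
```

With the data in separate silos, the same answer would require one extract per system plus a manual merge.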

More advanced data warehouse toolsets incorporate the concept of multidimensional data, or data cubes. This data structure indexes information along several dimensions at once, which allows rapid drill-down on data attributes. Data cubes are often used to run what-if scenarios over the chosen dimensions. For example, suppose Company X sells jewelry and has offices in Detroit, Pittsburgh, and Atlanta. If the proper attributes were chosen as indices, a user could ask questions such as the following.

* What was the enterprise's total revenue for 2006?

* What was Atlanta's revenue in November?

* If there were a 30 percent increase in orders during the first quarter of 2007, what would the year-end revenue be for Pittsburgh?

* If the Detroit office were closed, what would the impact be on the bottom line?
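The four questions above can be sketched against a toy cube held in a plain Python dictionary keyed by (office, year, month). Every figure is invented, the helper names are hypothetical, and the what-if step assumes revenue scales linearly with orders, which a real model might not.

```python
# Toy data cube: (office, year, month) -> revenue. All figures are invented.
cube = {
    ("Detroit", 2006, 11): 100.0,
    ("Pittsburgh", 2006, 11): 200.0,
    ("Atlanta", 2006, 11): 300.0,
    ("Atlanta", 2006, 12): 400.0,
    ("Pittsburgh", 2007, 2): 100.0,
    ("Pittsburgh", 2007, 8): 50.0,
}

def total(cells, office=None, year=None, month=None):
    """Sum revenue over cells matching every dimension that is not None."""
    return sum(
        revenue
        for (o, y, m), revenue in cells.items()
        if (office is None or o == office)
        and (year is None or y == year)
        and (month is None or m == month)
    )

def scale_cells(cells, factor, office, year, months):
    """Return a copy of the cube with the selected cells multiplied by factor."""
    return {
        (o, y, m): revenue * factor
        if (o == office and y == year and m in months)
        else revenue
        for (o, y, m), revenue in cells.items()
    }

def without_office(cells, office):
    """Return a copy of the cube with one office's cells removed."""
    return {key: revenue for key, revenue in cells.items() if key[0] != office}

# Enterprise total revenue for 2006.
total_2006 = total(cube, year=2006)

# Atlanta's revenue in November 2006.
atlanta_november = total(cube, office="Atlanta", year=2006, month=11)

# What-if: 30 percent more first-quarter 2007 orders in Pittsburgh,
# assuming revenue rises in proportion to orders.
boosted = scale_cells(cube, 1.30, "Pittsburgh", 2007, {1, 2, 3})
pittsburgh_2007 = total(boosted, office="Pittsburgh", year=2007)

# What-if: revenue lost if the Detroit office were closed.
detroit_impact = total(cube) - total(without_office(cube, "Detroit"))

print(total_2006, atlanta_november, pittsburgh_2007, detroit_impact)
```

Both what-if helpers return modified copies, so each scenario is evaluated without disturbing the original data. A production OLAP engine would typically precompute aggregates rather than scan every cell on each question.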

This multidimensional analysis of multiple business views is called Online Analytical Processing (OLAP). The primary function of OLAP systems is to let users manually explore and analyze both summary and detailed enterprise information. It is important to understand that OLAP requires the user to know what information he or she is searching for. OLAP techniques do not search enterprise data for hidden or unknown intelligence.
