Tag: Data Warehouse, Data Mart, Data Mining, and Decision Support
A ‘data warehouse’ is a repository of an organization’s electronically stored data. Data warehouses are designed to facilitate reporting and analysis. Data Warehouse Software Testing is not that different than any other testing project. The basic system analysis and testing process still applies. Let’s review a few of these steps and how they fit within a data warehouse context.
Analyze source documentation
As with many other projects, when testing a data warehouse implementation, there is typically a requirements document of some sort. These documents can be useful for basic test strategy development, but often lack the details to support test development and execution. Many times there are other documents, known as source-to-target mappings, which provide much of the detailed technical specifications. These source-to-target documents specify where the data is coming from, what should be done to the data, and where it should get loaded. If you have it available, additional system-design documentation can also serve to guide the test strategy.
Develop strategy and test plans
As you analyze the various pieces of source documentation, you’ll want to start to develop your test strategy. I’ve found that from a lifecycle and quality perspective it’s often best to seek an incremental testing approach when testing a data warehouse. This essentially means that the development teams will deliver small pieces of functionality to the test team earlier in the process. The primary benefit of this approach is that it avoids an overwhelming "big bang" type of delivery and enables early defect detection and simplified debugging. In addition, this approach serves to set up the detailed processes involved in development and testing cycles. Specific to data warehouse testing this means testing of acquisition staging tables, then incremental tables, then base historical tables, BI views and so forth.
Another key data warehouse test strategy decision is the analysis-based test approach versus the query-based test approach. The pure analysis-based approach would put test analysts in the position of mentally calculating the expected result by analyzing the target data and related specifications. The query-based approach involves the same basic analysis but goes further to codify the expected result in the form of a SQL query. This offers the benefit of setting up a future regression process with minimal effort. If the testing effort is a one time effort, then it may be sufficient to take the analysis-based path since that is typically faster. Conversely, if the organization will have an ongoing need for regression testing, then a query-based approach may be appropriate.
Test development and execution
Depending on the stability of the upstream requirements and analysis process it may or may not make sense to do test development in advance of the test execution process. If the situation is highly dynamic, then any early tests developed may largely become obsolete. In this situation, an integrated test development and test execution process that occurs in real time can usually yield better results. In any case, it is helpful to frame the test development and execution process with guiding test categories. For example, a few data warehouse test categories might be:
* record counts (expected vs. actual)
* duplicate checks
* reference data validity
* referential integrity
* error and exception logic
* incremental and historical process
* control column values and default values
In addition to these categories, defect taxonomies like Larry Greenfield’s may also be helpful.
An often forgotten aspect of execution is an accurate status reporting process. Making sure the rest of the team understands your approach, the test categories and testing progress will ensure that the rest of the team is clear on the testing status. With some careful planning, follow-through and communication, a data warehouse testing process can be set up to guide the project team to a successful release.
Some Related terms:
A data mart is a subset of an organizational data store, usually oriented to a specific purpose or major data subject, that may be distributed to support business needs. Data marts are analytical data stores designed to focus on specific business functions for a specific community within an organization. Data marts are often derived from subsets of data in a data warehouse though in the bottom-up data warehouse design methodology the data warehouse is created from the union of organizational data marts.
Data mining is the process of sorting through large amounts of data and picking out relevant information. It is usually used by business intelligence organizations, and financial analysts, but it is increasingly used in the sciences to extract information from the enormous data sets generated by modern experimental and observational methods.