Data Warehousing and Data Mining Overview
Data mining
Enter the concept of data mining. During the mid- to late 1990s, commercial vendors began exploring the feasibility of applying traditional statistical and artificial intelligence analysis techniques to large databases for the purpose of discovering hidden data attributes, trends, and patterns. This exploration evolved into formal data-mining toolsets based on a wide collection of statistical analysis techniques.
For a commercial business, the discovery of previously unknown statistical patterns or trends can provide valuable insight into the function and environment of the organization. Data-mining techniques allow businesses to make predictions about future events, whereas OLAP only provides analysis of past facts. Data-mining techniques can generally be grouped into three categories: clustering, classification, and prediction.
Clustering techniques group information based on a set of input patterns using an unsupervised (undirected) algorithm, meaning that no predefined groups are supplied. For example, a business could cluster its consumers to discover previously unknown groupings. The input would be well-defined consumer attributes over which the algorithm searches.
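The article does not name a specific clustering algorithm. As one common example, k-means repeatedly assigns each record to its nearest cluster center and then moves each center to the mean of its members. The sketch below is a minimal pure-Python illustration under stated assumptions: the consumer attributes (annual income in thousands, purchases per year), the data values, and the function name `kmeans` are all hypothetical, and centers are seeded with the first k records so the result is deterministic.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iterations: int = 100) -> Tuple[List[Point], List[int]]:
    """Group points into k clusters with Lloyd's k-means algorithm (hypothetical helper)."""
    # Seed the centers with the first k points so repeated runs give the same answer.
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each center to the mean of the points assigned to it.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's center in place
        # Stop once neither the assignments nor the centers change.
        if new_labels == labels and new_centroids == centroids:
            break
        labels, centroids = new_labels, new_centroids
    return centroids, labels


# Hypothetical consumers: (annual income in thousands, purchases per year).
consumers = [(25, 2), (27, 3), (30, 2), (85, 20), (90, 22), (88, 18)]
centroids, labels = kmeans(consumers, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

No group names are supplied up front; the algorithm discovers the two groupings (lower-income occasional buyers and higher-income frequent buyers) from the attributes alone, which is what makes it unsupervised.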
Classifying techniques group or assign objects to predetermined groupings based on well-defined attributes. The groupings are often clusters discovered using the above techniques. An example would be assigning a consumer to a particular sales cluster based on their income level.
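As a minimal sketch of the income example, one simple classifier assigns a consumer to whichever predefined cluster has the closest center. The cluster names, their center incomes, and the function `assign_cluster` are hypothetical; in practice the centers might come from a prior clustering run such as the one described above.

```python
from typing import Dict


def assign_cluster(income: float, cluster_centers: Dict[str, float]) -> str:
    """Return the name of the predefined cluster whose center income is closest (hypothetical helper)."""
    return min(cluster_centers, key=lambda name: abs(income - cluster_centers[name]))


# Hypothetical sales clusters with center incomes in thousands.
sales_clusters = {"budget": 28.0, "mid-market": 55.0, "premium": 88.0}

print(assign_cluster(31.5, sales_clusters))  # budget
print(assign_cluster(75.0, sales_clusters))  # premium
```

Unlike clustering, the groupings here are fixed in advance; the classifier only decides which existing group a new object belongs to.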
Predictive techniques take as input known attributes regarding a particular object or category and apply those attributes to another similar group to identify expected behavior or outcomes. For example, if a group of individuals wearing helmets and shoulder pads is known to be a football team, we can expect another group of individuals with helmets and pads to be a football team as well.
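The football example can be sketched with k-nearest neighbors, one possible predictive technique (the article does not prescribe it). The expected outcome for a new observation is taken from the outcomes of the most similar known observations. The binary attributes (helmet, shoulder pads, bat), the sample data, and the functions `hamming` and `predict` are hypothetical illustrations.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the attributes on which two observations differ."""
    return sum(x != y for x, y in zip(a, b))


def predict(known: List[Tuple[Sequence[int], str]], query: Sequence[int], k: int = 3) -> str:
    """Predict an outcome by majority vote among the k most similar known observations."""
    nearest = sorted(known, key=lambda item: hamming(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical observations: (wears_helmet, wears_shoulder_pads, carries_bat) -> known group.
known_groups = [
    ((1, 1, 0), "football team"),
    ((1, 1, 0), "football team"),
    ((1, 0, 1), "baseball team"),
    ((0, 0, 1), "baseball team"),
    ((1, 0, 0), "cycling team"),
]

print(predict(known_groups, (1, 1, 0)))  # football team
```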
Data-mining techniques
The following list describes many data-mining techniques in use today. Each of these techniques exists in several variations and can be applied to one or more of the categories above.
* Regression modeling: This technique fits a statistical model of the relationship between a target (dependent) variable and one or more explanatory variables; the fitted model can then be used to support or reject a hypothesis. One example is linear regression, in which the target variable is modeled as a linear function of the explanatory variables, such as sales measured against time. A second example is logistic regression, in which the probability of an event is predicted from known values, using a model fitted to the observed outcomes of prior similar events.
* Visualization: This technique builds multidimensional graphs that allow a data analyst to decipher trends, patterns, or relationships.
* Correlation: This technique identifies relationships between two or more variables in a data group and measures their strength.
* Variance analysis: This statistical technique (analysis of variance, or ANOVA) tests whether the mean values of a target variable differ across groups defined by one or more independent variables.
* Discriminant analysis: This is a classification technique used to identify, or "discriminate," the factors that determine membership in a grouping.
* Forecasting: Forecasting techniques predict variable outcomes based on the known outcomes of past events.
* Cluster analysis: This technique reduces data instances to cluster groupings and then analyzes the attributes displayed by each group.
* Decision trees: Decision trees separate data using sets of rules that can be expressed in "if-then-else" language.
* Neural networks: Neural networks are data models loosely inspired by the way biological neurons process information. They "learn" by adjusting their internal parameters with each iteration through the data, which gives them flexibility in discovering patterns and trends.
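Three of the techniques listed above, regression, correlation, and forecasting, can be illustrated together on a small data set. The sketch below fits a line by ordinary least squares, computes Pearson's correlation coefficient, and makes a naive forecast by extrapolating the fitted line one step ahead. The monthly sales figures and all function names are hypothetical, and straight-line extrapolation is only the simplest possible forecast; dedicated time-series methods are usually preferred in practice.

```python
import math
from typing import Sequence, Tuple


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def linear_regression(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Fit y = slope * x + intercept by ordinary least squares."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson_correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's r: strength of the linear relationship, from -1 to 1."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


months = [1, 2, 3, 4, 5]
sales = [10, 13, 14, 17, 18]  # hypothetical units sold per month

slope, intercept = linear_regression(months, sales)
r = pearson_correlation(months, sales)
forecast = slope * 6 + intercept  # naive forecast: extend the fitted line to month 6

print(round(slope, 3), round(intercept, 3))  # 2.0 8.4
print(round(r, 3))                           # 0.985
print(round(forecast, 3))                    # 20.4
```

A correlation near 1 indicates that the linear model fits these points closely, which is what makes extrapolating the line a plausible, though still rough, forecast.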
Conclusion
Organizations today are under tremendous pressure to compete in an environment of tight deadlines and reduced profits. Legacy business processes that require data to be extracted and manipulated before use will no longer be acceptable. Instead, enterprises need rapid decision support based on the analysis of past data and the forecasting of future behavior. Data-warehousing and data-mining techniques provide this capability.

