Introduction to Data mining

Posted on Jun 1 2008 - 11:32pm by Raj

In today’s information society, electronic information is growing explosively. Many industrial sectors, such as banking, power and bio-informatics, operate huge data warehouses that collect many different types of relevant information. This necessarily involves handling voluminous data, and data mining is the set of techniques for extracting useful knowledge from it.

Business Intelligence (BI) refers to the ability to collect and analyze high volumes of data pertaining to customers, vendors, markets, internal processes and the business environment. BI tools are software applications that facilitate analysis and decision-making. BI includes a range of functions such as query and reporting, business graphics, online analytical processing (OLAP), statistical analysis, forecasting and data mining.

A data warehouse is the cornerstone of an enterprise-wide business intelligence solution; various analytical and data mining tools are used to turn the data in the warehouse into actionable information. Data mining involves numerous techniques, and choosing the right one strongly affects the quality of the results. The choice of a particular technique, or a combination of them, depends on the nature of the data, the domain, and the type of relationships or hidden patterns being sought. The information and knowledge gained this way can then support effective decision-making.

Why data mining?

Data explosion is a real problem in today’s world. Automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses and other information repositories. We are drowning in data, but starving for knowledge! The solution to this problem is data warehousing and data mining.

Evolution of Database Technology:
1960s:
Data collection, database creation, IMS and network DBMS
1970s:
Relational data model, relational DBMS implementation
1980s:
RDBMS, advanced data models (extended-relational, OO, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc.)
1990s-2000s:
Data mining and data warehousing, multimedia databases, and Web databases

What Is Data Mining?
Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases.

What is not data mining?
(Deductive) query processing.  
Expert systems or small ML/statistical programs
Look up phone number in phone directory
Query a Web search engine for information about Amazon

Potential Applications of Data mining:

Market analysis and management: Data mining can help companies in the following areas:

Data sources for analysis: credit card transactions, loyalty cards, discount coupons, customer complaint calls, plus (public) lifestyle studies
Target marketing: find clusters of model customers who share the same characteristics (interest, income level, spending habits, etc.), and determine customer purchasing patterns over time, e.g., the conversion of a single bank account to a joint account after marriage
Cross-market analysis: find associations/correlations between product sales, and make predictions based on this association information
Customer profiling: data mining can tell you what types of customers buy what products (clustering or classification)
Identifying customer requirements: identify the best products for different customers, and use prediction to find what factors will attract new customers
Provides summary information: various multidimensional summary reports and statistical summary information (data central tendency and variation)
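Cross-market analysis usually rests on association rules, which are scored by support (how often items appear together) and confidence (how often the consequent appears given the antecedent). The sketch below computes both for single-item rules on a small, made-up set of transactions. The item names, the thresholds `min_support` and `min_confidence`, and the brute-force pair enumeration are all assumptions for illustration; practical tools use algorithms such as Apriori or FP-growth to avoid checking every itemset.

```python
from itertools import combinations

# Hypothetical market-basket data: each set is one customer's transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return support_count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Share of transactions with the antecedent that also contain the consequent."""
    base = support_count(antecedent, transactions)
    if base == 0:
        return 0.0
    both = support_count(set(antecedent) | set(consequent), transactions)
    return both / base


def pairwise_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Brute-force all one-item -> one-item rules that pass both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # pair is too rare to be interesting
        for lhs, rhs in ((a, b), (b, a)):
            conf = confidence({lhs}, {rhs}, transactions)
            if conf >= min_confidence:
                rules.append((lhs, rhs, pair_support, conf))
    return rules


for lhs, rhs, sup, conf in pairwise_rules(transactions):
    print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={conf:.2f}")
```

On this toy data every basket containing beer also contains diapers (confidence 1.0), while only three of the four diaper baskets contain beer (confidence 0.75), which shows why a rule and its reverse are scored separately.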
Corporate Analysis and Risk Management: This is another big area of data mining applications.
Finance planning and asset evaluation: cash flow analysis and prediction; contingent claim analysis to evaluate assets; cross-sectional and time series analysis (financial-ratio, trend analysis, etc.)
Resource planning: summarize and compare the resources and spending
Competition: monitor competitors and market directions; group customers into classes and apply class-based pricing; set pricing strategy in a highly competitive market.
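Trend analysis of financial time series can be as simple as fitting a straight line through past values and extending it forward. The sketch below fits an ordinary least-squares line to a series indexed 0, 1, ..., n-1 and projects it a few periods ahead. The cash-flow figures and the function names `linear_trend` and `forecast` are hypothetical, and a linear trend is only the most basic form of time series analysis; real financial forecasting would also account for seasonality and noise.

```python
def linear_trend(values):
    """Fit y = intercept + slope * t by least squares, with t = 0, 1, ..., n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    # Covariance of (t, y) and variance of t, both unnormalized.
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope


def forecast(values, steps_ahead):
    """Extend the fitted line to the next `steps_ahead` periods."""
    intercept, slope = linear_trend(values)
    n = len(values)
    return [intercept + slope * (n + k) for k in range(steps_ahead)]


# Hypothetical monthly net cash flow, in thousands.
cash_flow = [120, 125, 123, 130, 134, 138]
a, b = linear_trend(cash_flow)
print(f"trend: {b:+.2f} per month, intercept {a:.2f}")
print("next 3 months:", [round(x, 1) for x in forecast(cash_flow, 3)])
```

The sign and size of the slope summarize the direction of the trend, which is the kind of figure a planner would compare across business units or periods.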

Fraud Detection and Management: Fraud, including hacking, is a big problem in today’s world, and data mining can help reduce it.
Fraud detection is widely used in health care, retail, credit card services, telecommunications (phone card fraud), etc.
Approach: use historical data to build models of fraudulent behavior, and use data mining to help identify similar instances
Examples
auto insurance: detect a group of people who stage accidents to collect on insurance
money laundering: detect suspicious money transactions (US Treasury’s Financial Crimes Enforcement Network)
medical insurance: detect professional patients, rings of doctors and rings of references
Detecting inappropriate medical treatment
Australian Health Insurance Commission identified that in many cases blanket screening tests were requested (saving Australian $1m/yr).
Detecting telephone fraud
Telephone call model: destination of the call, duration, time of day or week.  Analyze patterns that deviate from an expected norm.
British Telecom identified discrete groups of callers with frequent intra-group calls, especially mobile phones, and broke a multimillion-dollar fraud.
Retail: Analysts estimate that 38% of retail shrink is due to dishonest employees.
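The telephone call model above flags calls that deviate from an expected norm. One simple way to sketch this is to build a per-customer baseline from historical call durations and flag new calls whose z-score (distance from the mean in standard deviations) exceeds a threshold. This uses only one of the features mentioned (duration), and the data, the threshold of 3.0 and the function names are assumptions for illustration; a real system would also model destination and time of day.

```python
import statistics


def build_profile(history):
    """Summarize a customer's past call durations (minutes) as mean and std dev."""
    # statistics.stdev is the sample standard deviation; it needs >= 2 values.
    return statistics.mean(history), statistics.stdev(history)


def flag_deviations(history, new_calls, threshold=3.0):
    """Return new calls more than `threshold` std devs away from the historical mean."""
    mean, stdev = build_profile(history)
    if stdev == 0:
        # No variation in history: any different duration counts as a deviation.
        return [d for d in new_calls if d != mean]
    return [d for d in new_calls if abs(d - mean) / stdev > threshold]


# Hypothetical call history for one customer (durations in minutes).
history = [3, 5, 4, 6, 5, 4, 3, 5]
new_calls = [4, 7, 95]
print(flag_deviations(history, new_calls))
```

Here the 95-minute call stands far outside this customer's usual range and is flagged, while the 7-minute call stays under the threshold. Flagged calls would then go to an analyst rather than being treated as proven fraud.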

Other Applications of Data mining:
Sports
IBM Advanced Scout analyzed NBA game statistics (shots blocked, assists, and fouls) to gain competitive advantage for New York Knicks and Miami Heat
Astronomy
JPL and the Palomar Observatory discovered 22 quasars with the help of data mining
Internet Web Surf-Aid
IBM Surf-Aid applies data mining algorithms to Web access logs for market-related pages to discover customer preferences and behavior, analyze the effectiveness of Web marketing, improve Web site organization, etc.
