to many other disciplines, such as machine learning, databases, statistics,
data analytics, operational research, decision support, information systems,
information retrieval and so on. For example, from the viewpoint of the data
itself, data mining is a variant discipline of database systems, following
research directions such as data warehousing (storage and retrieval) and
clustering (data coherence and performance). In terms of methodologies
and tools, data mining can be considered a sub-stream of machine
learning and statistics, revealing the statistical characteristics of data
occurrences and distributions via computational or artificial intelligence
paradigms.
Thus data mining is defined as the process of using one or more
computational learning techniques to analyze and extract useful knowledge
from data in databases. The aim of data mining is to reveal trends and
patterns hidden in data. From this viewpoint, the procedure is closely
related to Pattern Recognition, a traditional and active
topic in Artificial Intelligence. The emergence of data mining is closely tied
to research advances in database systems in computer science, especially
the evolution and organization of databases, and to the later incorporation of
computational learning approaches. Basic database operations
such as query and reporting can be seen as the earliest stage of data mining.
Query and reporting are functional tools that help us locate and identify
the requested data records within the database at various levels of granularity,
and present more informative characteristics of the identified data, such
as statistical summaries. These operations can be performed locally or remotely:
the former are executed on the end user's own machine, while the latter run over
a distributed network environment, such as an intranet or the Internet. Data
retrieval, like data mining, extracts the needed data and information
from databases. To filter the needed data out of the whole
data repository, database administrators or end users must define
beforehand a set of constraints or filters to be applied at a later
stage. A typical example is a marketing investigation of the customers
who have bought two products consecutively, in which the “and”
operator joins the two conditions into a single filter that identifies the
specific customer group. This
is one of the simplest business tools in a marketing campaign. Evidently,
the database itself offers only fairly superficial methods for data analysis and
business intelligence, which fall far short of real business requirements such as
customer behavioral modeling and product targeting.
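The filter described above can be illustrated with a minimal Python sketch. The purchase records, customer names, product names and helper functions are all hypothetical, introduced only for illustration. The sketch also ignores the order of purchases and checks only that both products were bought. Its point is that the filter is fixed in advance by the analyst, and the "and" of the two conditions corresponds to intersecting the two customer sets.

```python
# Hypothetical purchase records as (customer, product) pairs; illustrative data only.
purchases = [
    ("alice", "laptop"),
    ("alice", "mouse"),
    ("bob", "laptop"),
    ("carol", "mouse"),
    ("carol", "laptop"),
    ("dave", "keyboard"),
]


def customers_who_bought(records, product):
    """Return the set of customers with at least one purchase of `product`."""
    return {customer for customer, item in records if item == product}


def bought_both(records, first, second):
    """Apply a predefined filter: bought `first` AND bought `second`."""
    # The "and" of the two conditions is the intersection of the two customer sets.
    both = customers_who_bought(records, first) & customers_who_bought(records, second)
    return sorted(both)


target_group = bought_both(purchases, "laptop", "mouse")
print(target_group)  # ['alice', 'carol']
```

Such a filter returns only what the analyst already knew to ask for. It does not reveal which product pairs are worth asking about in the first place.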
Data mining differs from data query and retrieval because it drills
down into the in-depth associations and coherences among the data occurrences
within the repository, which cannot be known beforehand or uncovered
through basic data manipulation. Instead of query and retrieval operations,
data mining usually employs more sophisticated and intelligent data analysis
approaches, which are “borrowed” from the relevant research domains