Acat, where Aquan1 and Aquan2 are tests on quantitative attribute ranges
(where the ranges are dynamically determined), and Acat assigns a class
label for a categorical attribute from the given training data. Association
rules are plotted on a 2-D grid. The algorithm scans the grid, searching for
rectangular clusters of rules. In this way, adjacent ranges of the quantitative
attributes occurring within a rule cluster may be combined. The clustered
association rules generated by ARCS were empirically found to be slightly
more accurate than C4.5 when there are outliers in the data. The accuracy of
ARCS is related to the degree of discretization used. In terms of scalability,
ARCS requires “a constant amount of memory”, regardless of the database
size. C4.5 has exponentially higher execution times than ARCS, requiring the
entire database, multiplied by some factor, to fit entirely in main memory.
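The grid scan described above can be sketched in Python. The bins, the grid contents, and the simplification of merging only horizontal runs of adjacent cells (rather than full rectangles) are illustrative assumptions, not the actual ARCS implementation:

```python
# Sketch of ARCS-style clustering: rules are plotted on a 2-D grid indexed
# by bins of two quantitative attributes, and adjacent cells predicting the
# same class are merged so that their ranges combine into one rule.
age_bins = ["20-29", "30-39", "40-49"]    # ranges of attribute 1 (ordered)
income_bins = ["low", "medium", "high"]   # ranges of attribute 2 (ordered)

# grid[(i, j)] = class label of the rule:
#   age in age_bins[i] AND income in income_bins[j] => label
grid = {
    (0, 0): "no", (1, 0): "no",
    (1, 1): "yes", (2, 1): "yes",
    (2, 2): "yes",
}

def merge_runs():
    """Merge horizontal runs of adjacent age bins with the same class."""
    clustered = []
    for j in range(len(income_bins)):
        i = 0
        while i < len(age_bins):
            label = grid.get((i, j))
            if label is None:          # empty cell: no rule here
                i += 1
                continue
            start = i                  # extend the run while the class matches
            while i + 1 < len(age_bins) and grid.get((i + 1, j)) == label:
                i += 1
            clustered.append((age_bins[start], age_bins[i],
                              income_bins[j], label))
            i += 1
    return clustered

for lo, hi, income, label in merge_runs():
    print(f"age in {lo}..{hi} AND income = {income} => {label}")
```

The two adjacent "no" cells in the low-income row, for example, combine into the single rule age in 20-29..30-39 AND income = low ⇒ no.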
The second method is referred to as associative classification. It mines
rules of the form condset ⇒ y, where condset is a set of items (or attribute-
value pairs) and y is a class label. Rules that satisfy a prespecified minimum
support are frequent, where a rule has support s if s% of the samples in
the given data set contain condset and belong to class y. A rule satisfying
minimum confidence is called accurate, where a rule has confidence c if c%
of the samples in the given data set that contain condset belong to class y. If
a set of rules has the same condset, then the rule with the highest confidence
is selected as the possible rule (PR) to represent the set.
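These definitions of support, confidence, and PR selection can be sketched directly. The tiny data set and attribute names below are made-up illustrations, not part of any particular implementation:

```python
# Sketch: support/confidence of rules condset => y, and PR selection.
# Each sample is (attribute-value pairs, class label); the data is invented.
data = [
    ({"age": "<=30", "student": "no"},   "no"),
    ({"age": "<=30", "student": "yes"},  "yes"),
    ({"age": "31..40", "student": "no"}, "yes"),
    ({"age": "<=30", "student": "no"},   "no"),
]

def support(condset, y):
    """Fraction of all samples that contain condset AND belong to class y."""
    hits = sum(1 for attrs, label in data
               if condset.items() <= attrs.items() and label == y)
    return hits / len(data)

def confidence(condset, y):
    """Of the samples containing condset, the fraction belonging to class y."""
    covered = [label for attrs, label in data
               if condset.items() <= attrs.items()]
    return sum(1 for label in covered if label == y) / len(covered)

# Among rules sharing a condset, the highest-confidence rule is the PR.
condset = {"age": "<=30", "student": "no"}
pr_class = max(["yes", "no"], key=lambda y: confidence(condset, y))
print(support(condset, pr_class))     # 0.5  (2 of 4 samples)
print(confidence(condset, pr_class))  # 1.0  (both covered samples are "no")
```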
The associative classification method consists of two steps. The first
step finds the set of all PRs that are both frequent and accurate. It uses an
iterative approach, where prior knowledge is used to prune the rule search.
The second step uses a heuristic method to construct the classifier, where the
discovered rules are organized according to decreasing precedence based
on their confidence and support. The algorithm may require several passes
over the data set, depending on the length of the longest rule found.
When classifying a new sample, the first rule satisfying the sample is
used to classify it. The classifier also contains a default rule, having lowest
precedence, which specifies a default class for any new sample that is
not satisfied by any other rule in the classifier. In general, the associative
classification method was empirically found to be more accurate than C4.5
on several data sets. Each of the above two steps was shown to have linear
scale-up.
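The precedence ordering and first-matching-rule classification can be sketched as follows. The rule list, its confidence and support figures, and the default class are invented for illustration:

```python
# Sketch: rules ordered by decreasing precedence (confidence, then support),
# classification by the first matching rule, with a lowest-precedence default.
rules = [
    # (condset, class label, confidence, support) -- made-up values
    ({"age": "<=30", "student": "yes"}, "yes", 0.95, 0.20),
    ({"age": "<=30"},                   "no",  0.70, 0.30),
    ({"income": "high"},                "yes", 0.70, 0.40),
]
DEFAULT_CLASS = "no"  # default rule: fires only when nothing else matches

# Decreasing precedence: higher confidence first, ties broken by support.
rules.sort(key=lambda r: (r[2], r[3]), reverse=True)

def classify(sample):
    """Return the class of the first rule whose condset the sample satisfies."""
    for condset, label, _conf, _sup in rules:
        if condset.items() <= sample.items():
            return label
    return DEFAULT_CLASS  # no rule satisfied the sample

print(classify({"age": "<=30", "student": "yes"}))  # yes
print(classify({"income": "high", "age": "40"}))    # yes
print(classify({"age": "50"}))                      # no (default rule)
```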
The third method, CAEP (classification by aggregating emerging
patterns), uses the notion of itemset supports to mine emerging patterns
(EPs), which are used to construct a classifier. Roughly speaking, an EP is
an itemset (or set of items) whose support increases significantly from one
class of data to another. The ratio of the two supports is called the growth
rate of the EP. For example, suppose that we have a data set of customers
with the classes buys computer = "yes", or C1, and buys computer = "no",
or C2. The itemset {age = "≤30", student = "no"} is a typical EP, whose support