6 Learning Concepts

Keywords

concept

concept learning

simple concept

multiple concept

example

positive example

negative example

induction

background knowledge

generalization

specialization

partial order of generalization

MSC generalization

version space

conceptual clustering

attribute complex

This chapter discusses algorithms for learning concepts from examples. Concept learning is the most basic kind of learning on a computer, so it needs to be understood thoroughly. We first define the notion of a concept and present general methods and algorithms for learning concepts; we then give algorithms for the version space method and for conceptual clustering.

6.1 Definition of a Concept

(a) Extensional and intensional meanings

Words such as “a triangle,” “a coffee cup,” and “a hammer” indicate a concept. An individual triangle is considered to be an example of the concept called a triangle. It is important to know how various concepts and their examples are represented in a computer when we formulate algorithms for pattern recognition and learning. In previous chapters, we represented a concept and its examples using representation techniques like class and instance; however, we never precisely defined terms such as concept and example. So, let us first define the notion of a concept.

There are two different ways of defining the meaning of a concept: extensional meaning and intensional meaning.

Definition 1

(extensional meaning) Various examples exist first and the meaning of a concept is defined by the set of these examples. This is the extensional meaning of a concept.

Definition 2

(intensional meaning) Various attributes exist first and the meaning of a concept is defined by a set of these attributes. This is the intensional meaning of a concept.

Suppose we collect together all the triangles we have seen or imagined and form the set of them. The extensional meaning of the concept “triangle” is this set. The intensional meaning of “triangle” is the set of its properties, such as being bounded by three line segments or being a plane figure. The extensional meaning and the intensional meaning of the same object are not necessarily the same. For example, “the brightest star in the evening (the evening star)” and “the brightest star in the morning (the morning star)” mean the same thing, Venus, in their extensional meaning, but they are different in their intensional meaning.

When we try to define a concept using extensional meaning, we need to list all its examples (impossible in practice, but possible conceptually). Using intensional meaning, we can define a concept as a list of attributes; but, once again, in real life it is impossible to list all the possible attributes of a concept. In this book, we determine a set of attributes first and regard objects that satisfy such a set as examples of the concept. This type of definition can be considered an approximation, at once extensional and intensional, of the meaning of a concept.
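The distinction can be made concrete in code. In the following minimal Python sketch, an extensional definition enumerates members, while an intensional one tests attributes; the attribute names are our own illustration, not part of the text.

```python
# Extensional definition: the concept IS the set of its examples.
triangles_seen = {"triangle_1", "triangle_2", "triangle_3"}

def is_triangle_extensional(x):
    # Membership is decided purely by enumeration.
    return x in triangles_seen

# Intensional definition: the concept IS a set of required attributes.
def is_triangle_intensional(figure):
    # 'figure' is a dict of attribute-value pairs describing an object.
    return figure.get("sides") == 3 and figure.get("planar") is True

print(is_triangle_extensional("triangle_2"))                  # True
print(is_triangle_intensional({"sides": 3, "planar": True}))  # True
print(is_triangle_intensional({"sides": 4, "planar": True}))  # False
```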

The word “concept,” like “knowledge” and “learning,” has been used in philosophy and psychology. Although it is important to know the meaning of “concept” in these fields, we limit ourselves to the above approximate definition when we are dealing with learning on a computer.

(b) Representing a concept

A typical method for representing the meaning of a concept is to use a list containing the name of the concept followed by elements that are attribute-value pairs:

image

For example, the concept c “something big, red, and round,” can be expressed as follows using the three attributes, size, color, shape:

image

where size, color, and shape are the attributes representing a size, a color, and a shape, respectively, and large, red, and round are the values representing big, red, and round, respectively.

We can also represent the meaning of a concept using first-order predicate logic. In this case, each attribute corresponds to a predicate symbol, and the name of the concept and the value correspond to arguments of the predicate. Using the conjunction operator, a concept can generally be represented as a logical conjunction:

image

When a variable symbol is included in the above well-formed formula, the variable symbol is considered to be bound by the universal quantifier, ∀. If we omit the conjunction operators for simplicity, the definition of a concept can be represented as follows:

image

We also assume that the variable symbols in this set are bound by universal quantifiers.

As an example, let us consider the concept c, “something big, red, and round.” If the predicate symbols for size, color, and shape are size, color, and shape, c can be represented as follows:

image

where large, red, and round are the constant symbols for big, red, and round, respectively. If we consider c, which we used as the name of the concept, to be a variable and use a variable symbol X, then the concept “something big, red, and round” can be represented by the well-formed formula

image

In this chapter, we will use a predicate logic representation for concepts unless otherwise noted.

The definition of a concept in predicate logic can vary depending on which logical operators we choose to combine predicates. We often use literals instead of general predicates to define a concept:

Definition 3

(a simple concept) Suppose p1, …, pm are literals. Then p = p1 ∧ … ∧ pm, in other words, a concept defined by the logical conjunction of literals, is called a simple concept or a conjunctive concept. A simple concept is sometimes represented as the set of its literals, p = {p1, …, pm}.

Definition 4

(a multiple concept) Suppose q1, …, qn are simple concepts. Then q = q1 ∨ … ∨ qn, in other words, a concept defined by the logical disjunction of simple concepts, is called a multiple concept or a disjunctive concept.

The predicate logic representation of “something big, red, and round” is an example of a simple concept; “something big and red or small and blue” is a multiple concept and can be represented as

image

Besides these two kinds of concepts, there are also exclusive multiple concepts, which combine simple concepts using the exclusive disjunction operator, and implicative concepts, which combine concepts by an implication symbol. These concepts are not used often, since their logical content can be rewritten in terms of conjunction, disjunction, and negation. In this book, unless otherwise noted, we will mean a simple concept when we say a concept.
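As a concrete data-structure sketch of Definitions 3 and 4, a literal can be encoded in Python as a (predicate, arguments) tuple, a simple concept as a set of literals read as their conjunction, and a multiple concept as a list of simple concepts read as their disjunction. The encoding is our own convention, not the book's notation.

```python
# Literal: (predicate, (arg1, arg2, ...)). By convention here, argument
# strings starting with an uppercase letter are variables, others constants.

# Simple (conjunctive) concept: a set of literals, read as their conjunction.
big_red_round = frozenset({
    ("size",  ("c", "large")),
    ("color", ("c", "red")),
    ("shape", ("c", "round")),
})

# Multiple (disjunctive) concept: a list of simple concepts.
big_red_or_small_blue = [
    frozenset({("size", ("c", "large")), ("color", ("c", "red"))}),
    frozenset({("size", ("c", "small")), ("color", ("c", "blue"))}),
]

print(("color", ("c", "red")) in big_red_round)  # True
```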

(c) A concept and its instances

We can generate various instances of a concept by substituting constants for variable symbols in the definition of the concept. Let us look at the relationship between a concept and its instances. We will define an instance later in Definition 5.

First, let us look at the figures shown in Figure 6.1. Suppose we have six literals

image

Figure 6.1 Examples of a simple concept.

image

where the domain of each variable is

image

The conjunction of the literals

image

is the simple concept “two things with some kind of shape, size, and position.” Each of the three figures shown in Figure 6.1 can be regarded as representing an instance of the concept p. This is because each figure (a), (b), (c) corresponds to one of the logical expressions

image

that are obtained from p by applying the substitutions

image

Let us look at another example. Suppose, of the above six literals, we take only p2 and p3. Their conjunction is

image

The expression

image

that can be obtained by applying the substitution

image

to p′ is the concept meaning “something big is in a lower position.” Now, consider the substitutions image and image. Then pθ1 and pθ2 are special cases of the logical expressions

image

respectively.

On the other hand,

image

contradicts pξ1.

A conjunction of literals obtained by applying a substitution to a concept is called a positive instance of that concept. A conjunction of literals that is not a positive instance is called a negative instance of the concept. pθ1, pθ2, and pξ1 are positive instances of the concept p, and p′ητ1, p′ητ2, and p′ητ3 are positive instances of p′η. pθ1 and pθ2 are positive instances of p′, but pξ1 is a negative instance of p′. Also, size(a, large) ∧ position(a, upper) is a negative instance of p′. We will define positive and negative instances more precisely in Definition 5 below.

Next, let us pay attention to the fact that there are generally many concepts that have some particular conjunction of literals as a positive or negative instance. For example, for the substitution

image

p′η and pζ are both concepts that have pθ1 and pθ2 as positive instances and have pξ1 as a negative instance.

Although a positive (negative) instance of pζ is always a positive (negative) instance of p′η, the reverse is not true; this is because, when we view the concepts p and p′ as sets of literals, we have p′ ⊂ p, and for the substitutions we have η ⊂ ζ. In other words, the concepts associated with a particular instance divide into more general ones and more specific ones depending on how we form the conjunction and the substitution.

To summarize what we have explained so far, let us now define an instance of a concept.

Definition 5

(positive instance and negative instance) We represent a concept, p, as the set, {p1, …, pn}, of the literals constituting it. Suppose q is a finite set of ground literals. If pθ ⊆ q holds for some substitution θ that replaces all the variable symbols included in p with constant symbols, we say that q is a positive instance of p. On the other hand, when r is a finite set of ground literals, if pξ ⊆ r holds for no substitution ξ that replaces all the variable symbols with constant symbols, we say that r is a negative instance of p. Positive and negative instances are collectively called instances. Also, when the above conditions pθ ⊆ q and pξ ⊈ r are replaced by q ⊨ pθ and r ⊭ pξ, respectively, we sometimes still call q a positive instance and r a negative instance of p.
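Definition 5 suggests a direct, if brute-force, test: enumerate substitutions that map the variables of p to constants occurring in q and check the subset condition. Below is a minimal Python sketch using the tuple encoding of literals introduced earlier; it is a simplification, since real systems use unification rather than enumeration.

```python
from itertools import product

def is_variable(t):
    return isinstance(t, str) and t[:1].isupper()

def apply_sub(concept, theta):
    """Apply substitution theta (variable -> constant) to a set of literals."""
    return {(pred, tuple(theta.get(a, a) for a in args)) for pred, args in concept}

def is_positive_instance(q, p):
    """q (ground literals) is a positive instance of concept p if p*theta is a
    subset of q for some substitution grounding every variable of p; otherwise
    q counts as a negative instance of p (Definition 5)."""
    variables = sorted({a for _, args in p for a in args if is_variable(a)})
    constants = {a for _, args in q for a in args}
    for values in product(constants, repeat=len(variables)):
        theta = dict(zip(variables, values))
        if apply_sub(p, theta) <= q:
            return True
    return False

p = {("size", ("X", "large")), ("position", ("X", "lower"))}
q = {("size", ("a", "large")), ("position", ("a", "lower")), ("shape", ("a", "round"))}
print(is_positive_instance(q, p))  # True, with theta = {X/a}
```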

Definition 6

(generalization) Suppose we regard each of the given positive instances p1, …, pn as a set of literals. If there exists a set of literals, p, such that for every i = 1, …, n, p ⊆ piθi holds for some substitution θi that replaces constant symbols with variable symbols, we call p a generalization of the positive instances p1, …, pn (based on θ1, …, θn). If we want to emphasize that p is represented by the conjunction of its elements, we specifically say that p is a conjunctive generalization. We also speak of a generalization if we replace the conditions p ⊆ piθi by piθi ⊨ p (i = 1, …, n).

We sometimes identify a concept whose positive instances are {p1,…, pn} with their generalization. There are usually many generalizations for a given set of positive instances of a concept. Each of those generalizations is one concept. When there are two generalizations pθ1 and pθ2 for the same set of positive instances, we determine which is more specific or more general using the following definition.

Definition 7

(specialization) We call pθ2 a specialization of pθ1 when pθ1 = pθ2ξ holds for some substitution ξ that replaces constant symbols with variable symbols. In this case we also say that pθ1 is a generalization of pθ2.

We will return to the generalization and specialization of concepts in Section 6.3.

6.2 Methods for Concept Learning

Here we will explain some methods for learning a concept from a given set of instances. These methods use algorithms that take a set of instances as input and output a concept. We need to be careful because there are generally many concepts that can be derived from the same set of well-formed formulas making up a set of instances, so an algorithm for learning concepts cannot help but include heuristic criteria for choosing among them. Here, as the basic method of concept learning, we describe an application of heuristic search from problem solving.

(a) Information necessary for learning concepts

In this section we consider what is necessary in an algorithm for learning concepts. First, let us focus on what we mean by learning a concept.

Definition 8

(learning a concept) Suppose we have some positive instances p1, …, pm and negative instances q1, …, qn (m ≥ 0, n ≥ 0). We say we have learned a concept for the instances p1, …, pm, q1, …, qn if we have found a well-formed formula, r, for which there are substitutions θi (i = 1, …, m) that replace constant symbols with variable symbols such that r ⊆ piθi, and for which there is no substitution ξj (j = 1, …, n) that replaces constant symbols with variable symbols such that r ⊆ qjξj.

We say that r is a concept for the instances p1,…, pm, q1,…, qn. Learning a concept for a set of instances is called induction of the concept. Concept learning is learning by induction.

We need to pay attention to the following four things in order to carry out concept learning efficiently on a computer.

(1) Algorithms for learning concepts.

    We will explain in subsection (b) algorithms for learning a concept from a given set of instances.

(2) Background knowledge and the introduction of new literals.

    We need background knowledge to choose a specific concept for a given set of instances. We will explain in subsection (c) how we make use of such knowledge and how we make use of new literals when literals not included in the original set of instances appear in background knowledge.

(3) A set of instances and how they are given.

    There are many ways in which a set of instances can be given to a system trying to learn a concept. For example, there are cases where only a small number of instances can be obtained or, to the contrary, where a large number of instances is given. There are cases where only positive instances are given. There are also cases where instances are obtained all at once or where they are obtained one by one.

    A method that inputs all given instances at once into a concept learning algorithm is called simultaneous presentation; a method that inputs instances one by one, incrementally, is called incremental presentation. Simultaneous presentation is used, for example, in predicting weather from a large amount of given weather data; an example of this method is explained in Chapter 9. In an intelligent robot capable of learning, by contrast, data for learning are usually obtained incrementally in real time; this is an example of incremental presentation.

(4) Representation of an instance and a concept.

We need to consider what representation language we can use to describe an instance and a concept. In this chapter, we use first-order predicate logic.

(b) Algorithms for learning concepts

Learning a concept from a given set of instances is generally done by replacing a constant included in the instance representation with a variable or by removing some literals from the instance. These transformations can be formalized as a problem solving process that involves generating and transforming predicates using some heuristic rules to obtain an appropriate well-formed formula. Let us define learning a concept as a problem solving process.

Definition 9

(problem solving) Suppose we have a set of positive instances, ET, a set of negative instances, EF, and a set of rules, O, for transforming one well-formed formula into another. In the state space, S, where a set of well-formed formulas corresponds to a state, a rule o ∈ O can be considered a state transition rule on S. When we move through S to a state s ∈ S using a rule o ∈ O, the conditions that, for every formula p ∈ s and every e ∈ ET, pθ ⊆ e holds for some substitution θ, and that, for every e ∈ EF, pξ ⊆ e holds for no substitution ξ, can be used as a constraint G, where θ and ξ are substitutions replacing variable symbols with constant symbols. If we let c be the concept we are trying to find, then solving the problem P = 〈S, O, G, ET, {c}〉 is called learning the concept c.

Although the concept c itself is used in the above definition, the structure of c is not usually known beforehand. Also, not all the elements of the state space S are known. So, in order to solve a problem P, we need to start from an initial state ET, apply a state transition rule to the current state, and generate a new state. By repeating this operation, if we find among the generated states a concept c for the original set of instances, we stop the search. In the problem solving framework, this can be done using an algorithm for heuristic search.

As an example, let us look at the three instances shown in Figure 6.2:

image

where p1 and p2 are positive instances and p3 is a negative instance.

image

Figure 6.2 The concept of “a tower.”

We regard ET = {p1,p2} as the initial state and O = {o1, o2} as the set of state transition rules:

o1(a rule for changing a constant into a variable)

    Let p be one of the well-formed formulas included in the current state sS and p′ be the new well-formed formula obtained by replacing a constant symbol (or constant symbols) in p with a variable symbol (or variable symbols). In this case, the new state s′ ∈ S generated from s is s′ = s ∪ {p′} − {p}.

o2(a rule for removing literals)

    When q is one of the well-formed formulas included in the current state sS, we can obtain a new well-formed formula q′ by removing one of the literals in q. In this case, the new state s′ ∈ S generated from s is s′ = s ∪ {q′} − {q}.
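To make the search space concrete, the following Python sketch generates the successor states produced by o1 and o2, with a state encoded as a frozenset of formulas and each formula as a frozenset of (predicate, arguments) literals. The encoding and variable-naming scheme are our own, and o1 is implemented as replacing all occurrences of one chosen constant.

```python
def is_constant(t):
    return isinstance(t, str) and t[:1].islower()

def o1_successors(state):
    """Rule o1: pick a formula p in the state and a constant c occurring in it,
    and replace c with a fresh variable, giving p'; successor = state - {p} + {p'}."""
    succ = []
    for i, p in enumerate(state):
        constants = sorted({a for _, args in p for a in args if is_constant(a)})
        for n, c in enumerate(constants):
            var = f"V{i}_{n}"  # fresh variable name (our own convention)
            p2 = frozenset((pred, tuple(var if a == c else a for a in args))
                           for pred, args in p)
            succ.append((state - {p}) | {p2})
    return succ

def o2_successors(state):
    """Rule o2: pick a formula q in the state and drop one of its literals."""
    return [(state - {q}) | {q - {lit}} for q in state for lit in q]

s0 = frozenset({frozenset({("on", ("a", "b")), ("cleartop", ("a",))})})
print(len(o1_successors(s0)), len(o2_successors(s0)))  # 2 2
```

A search procedure would keep only those successor states that still satisfy the constraint G, for example by testing each formula against ET and EF with an instance test like the one sketched in Section 6.1(c).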

First, let us apply the rule o1 to the state ET. For example, if we apply the substitution θ1 = {X/a} to p1 ∈ ET, we get the following state s1:

image

Now, for state s1, we have image, so the state transition from ET to s1 satisfies the constraint. Next, we apply rule o1 to s1. For example, if we apply the substitution θ2 = {X/e} to image ∈ s1, we obtain the following state, s2:

image

This state also satisfies the constraint. Now, let us apply the rule o2 to the state s2. For example, if we remove the literal on(X, d) included in image, we obtain the following state that satisfies our constraint:

image

Furthermore, if we apply the rules in the order shown in Table 6.1 and continue with state transitions that satisfy our constraint, we finally obtain image cleartop(Y). This c is, according to Definition 9, a concept for the instances p1, p2, and p3.

Table 6.1

Generating a state using a state transition rule

image

This example shows heuristic search in only one direction. Generally, more than one concept can be found by induction from a given set of instances, but this example does not cover the problem of how to choose one of the multiple concepts found by search. We will discuss this problem in Section 6.3.

(c) Background knowledge and introduction of new literals

There are cases when we can use background knowledge that includes literals not contained in instances. For example, suppose we have the background knowledge M

image

about the sample problem shown in Figure 6.2. Now if, from p1 and p2, we conclude

image

then {p1} ∪ M image image and {p2} ∪ M image image become true. So, if we repeatedly apply rules o2 and o1 to states, starting from the initial state {image, image} and paying attention to the negative instance p3, we can obtain tower(Y, X) as a concept.

Background knowledge is often useful for improving the efficiency of search; however, background knowledge can include literals that are not part of the original instances, as in the above example. Learning in which a new literal that does not appear in the original set of instances is included in the generated concept is called constructive learning of a concept. Learning that does not involve new literals is called selective learning of a concept.

6.3 Generalization of Well-Formed Formulas

The transition rules, o1 and o2, for well-formed formulas described in Section 6.2(b) can be considered rules that generalize a well-formed formula. In this section, we will discuss the generalization of well-formed formulas, which is one of the most important problems in concept learning.

(a) MSC generalization

Let us look at the example in Figure 6.3. The three instances in Figure 6.3 can be represented as follows:

image

Now, let us consider four concepts

image

Figure 6.3 Sample problems of generalization of a predicate.

image

for which p1, p2, and p3 are positive instances. In c1 and c2, the number of literals is the same, but the number of variables in c2 is larger. The number of literals in c3 is smaller than the number of literals in c1. As you can see, there are differences in degree in the generalization and specialization of a concept for some given set of instances. We have already defined generalization and specialization in Definitions 6 and 7. Usually, we are interested in the most specific concepts for a given set of instances. We will explain what “the most specific” generalization is.

Let us look at generalization in the sense of Definition 6. Using the example in Figure 6.3, any one of c1,…, c4 is a generalization of p1, p2, and p3. For example, for c2, if we use the substitutions

image

the following is true:

image

Next, based on the definition of generalization given in Definition 6, we can define “the most specific” generalization.

Definition 10

(MSC generalization) Let p be the generalization obtained by applying substitutions θ1, …, θn to a given set of positive instances {p1, …, pn}. Then, if there do not exist substitutions ξi (i = 1, …, n) that replace constant symbols with variable symbols and a generalization p′ of {p1, …, pn} such that image, we call p an MSC generalization (maximally specific conjunctive generalization) of p1, …, pn.

For example, in the previous example, if we consider c1, …, c4 and p1, p2, p3 each as a set of literals, then for the substitutions θi (i = 1, 2, 3) shown below there do not exist substitutions ξi (i = 1, 2, 3) and a generalization c′ satisfying the above condition:

image

Therefore, c1 is an MSC generalization of p1, p2, and p3. On the other hand, c2, c3, and c4 are not MSC generalizations of p1, p2, and p3. Even though an MSC generalization can be regarded as a most specific generalization, it is not always uniquely determined. However, since learning a concept often requires finding a concept that generalizes the set of instances as specifically as possible (for example, with as few variables as possible), we usually choose one of the MSC generalizations from among all the generalizations.
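One standard way to compute a specific conjunctive generalization is Plotkin-style least general generalization (lgg): literals with the same predicate are paired, equal terms are kept, and each distinct pair of terms is replaced by a shared variable. The Python sketch below is our own illustration; it may retain redundant literals and is not guaranteed to return an MSC generalization in the sense of Definition 10.

```python
def lgg_literal(l1, l2, var_table):
    """lgg of two literals with the same predicate; var_table makes the same
    pair of differing terms map to the same variable everywhere."""
    (pred1, args1), (pred2, args2) = l1, l2
    if pred1 != pred2 or len(args1) != len(args2):
        return None
    out = []
    for t1, t2 in zip(args1, args2):
        if t1 == t2:
            out.append(t1)
        else:
            var_table.setdefault((t1, t2), f"V{len(var_table)}")
            out.append(var_table[(t1, t2)])
    return (pred1, tuple(out))

def conjunctive_generalization(p1, p2):
    """All pairwise lggs of two ground conjunctions (sets of literals)."""
    var_table, gen = {}, set()
    for l1 in p1:
        for l2 in p2:
            g = lgg_literal(l1, l2, var_table)
            if g is not None:
                gen.add(g)
    return gen

e1 = {("size", ("a", "large")), ("color", ("a", "red"))}
e2 = {("size", ("b", "large")), ("color", ("b", "blue"))}
print(conjunctive_generalization(e1, e2))
# e.g. {('size', ('V0', 'large')), ('color', ('V0', 'V1'))}
# (the variable numbering may vary from run to run)
```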

(b) Methods of generalization

In this section we provide methods for generalizing well-formed formulas. Rules o1 and o2 described in Section 6.2(b) can be considered special cases of the methods described below.

(1) Changing a constant into a variable

This method replaces a constant symbol in a well-formed formula with a variable symbol. For example, if we consider the substitutions

image

for the two well-formed formulas

image

we will obtain the following generalization of q1 and q2:

image

(2) Generalization of a constant in a given structure

If relations among constants are given by a semantic network with inheritance of properties, we can make a generalization by replacing constants with a constant corresponding to a node that lies above the original constants' nodes in the semantic network. For example, if the relations among square, triangle, and polygon are given by the semantic network shown in Figure 6.4, then for

image

Figure 6.4 A semantic network for “figures.”

image

by replacing the nodes square and triangle with their parent node polygon, we can obtain the generalization

image
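A sketch of this method in Python: the network is stored as parent links and two constants are generalized to their nearest common ancestor. The node names follow Figure 6.4, whose exact links are assumed here.

```python
# Parent links of the is-a hierarchy of Figure 6.4 (assumed structure).
PARENT = {"square": "polygon", "triangle": "polygon", "polygon": "figure"}

def ancestors(node):
    """The node itself plus everything above it in the hierarchy."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def generalize_constants(a, b):
    """Nearest node that is at or above both constants, if any."""
    up_a = set(ancestors(a))
    for n in ancestors(b):
        if n in up_a:
            return n
    return None

# shape(x, square) and shape(y, triangle) generalize to shape(Z, polygon):
print(generalize_constants("square", "triangle"))  # polygon
```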

(3) Removing a literal included in a well-formed formula

When a well-formed formula is a conjunction of more than one literal, remove one or some number of literals. For example, let

image

Then the following is a generalization of p:

image

(4) Deductive transformation of part of a well-formed formula

Replace some of the literals in a well-formed formula with a literal that can be deduced from those literals and, if available, background knowledge. For example, if, as background knowledge, we can use the formula

image

we can generalize the expression

image

to the following:

image

(Here we use the fact that we can prove q from p and the background knowledge.)

(5) Generalization using resolution

Simplifying a well-formed formula by using the inference rule image, for any well-formed formulas p, q1, and q2, is called resolution. (The notion of resolution we described in Section 3.7 is an example of this kind of resolution.) Using resolution, we can generalize a given well-formed formula. For example, by extending the above rule a little, we obtain (p ∧ q1 → r) ∧ (∼p ∧ q2 → r) → (q1 ∧ q2 → r) for any well-formed formulas p, q1, q2, r. Using this rule, for example, on the well-formed formula

image

produces the following generalization:

image

This is an example of a multiple concept rather than a simple concept.
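The soundness of this extended rule can be checked mechanically. The following Python sketch enumerates all truth assignments and confirms that the rule is a propositional tautology:

```python
from itertools import product

def implies(a, b):
    return (not a) or b

# (p & q1 -> r) & (~p & q2 -> r)  entails  (q1 & q2 -> r)
valid = all(
    implies(implies(p and q1, r) and implies((not p) and q2, r),
            implies(q1 and q2, r))
    for p, q1, q2, r in product([False, True], repeat=4)
)
print(valid)  # True
```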

By using these methods as state transition rules for learning concepts as described in Section 6.2(b), we can mechanically obtain a concept for a given set of instances. The above methods, however, are heuristic and do not guarantee that we can always find an MSC generalization.

6.4 Version Space

As we described above, there may be many concepts for a given set of instances. To find these concepts, we can use heuristic search. Here, as an effective way of improving the efficiency of heuristic search, we will explain the version space method, a search method that exploits the structure of the state space.

(a) An example of version space

Suppose we have four predicates representing the attributes of an object u:

image

We consider the following concepts, represented by a set of these predicates:

image

Now we leave the variable U in P1, P2, P3 as is and either substitute the constant symbols triangle, small, white, fine for W, X, Y, Z or leave W, X, Y, Z as is. We let Q be the set of concepts generated from P1, P2, P3 in this way. For example, the well-formed formula

image

that can be obtained by replacing W and X of P2 with triangle and small, respectively, is an element of Q. In this case, the number of elements in Q is 2⁴ = 16. Now, we represent the elements of Q using a bit array of four 0s and 1s and let each bit correspond to one of the predicates shape, size, color, and texture. If a constant symbol has been substituted for one of the variables W, X, Y, Z, we make the corresponding bit 1; if it stays a variable symbol, we make it 0. For example, the formula p2 is represented by 1101.

Using this representation, the relations among the elements of Q can be represented by the graph shown in Figure 6.5, where each edge in the graph represents a substitution, the top node n1 represents the most “specific” concept, and the lowest node n3 represents the most “general” concept.

image

Figure 6.5 An example of version space.

Edges and paths in the graph shown in Figure 6.5 represent substitutions. More concretely, the graph corresponds to generalization of a well-formed formula by turning constants into variables. For example, n3 is a generalization of n1, and n2 is a generalization of n1. A version space, intuitively, is a set of well-formed formulas that are related to each other by generalization, as exemplified by the graph in Figure 6.5.

(b) The partial order of generalizations and version space

To explain concept learning using version space, let us give a more exact definition to version space. First, notice the fact that, as we noted in Definition 6, a generalization orders two well-formed formulas. We can describe this more generally as follows:

Definition 11

(partial order of generalizations) Let Q be a set of well-formed formulas each defined by a conjunction of literals. For qi and qjQ, we write qiqj if and only if qi is a generalization of qj. Then ≤ is a partial order on Q.* We call ≤ the partial order of generalizations.

In Definition 11, an MSC generalization corresponds to a maximal element in the partial order of generalizations.

Next, let us define version space using the partial order of generalizations.

Definition 12

(version space) Suppose the partial order of generalizations, ≤, is defined on a set of concepts, Q, and Q is a complete lattice** under ≤. In particular, there are an element q ∈ Q (the minimum element) such that q ≤ qi for all qi ∈ Q and an element q′ ∈ Q (the maximum element) such that qi ≤ q′ for all qi ∈ Q. We assume that the maximum element of the complete lattice Q corresponds to a given positive instance p. We call Q the version space for the positive instance p.

(c) Concept learning using the version space method

In a version space Q, relations among elements represent generalization relations among concepts. So we can narrow the search for a goal concept using the following rules:

(1) When a new positive example is given, only a concept obtained from that example or a concept that is more general than the example can be a goal concept.

(2) When a new negative example is given, no concept more general than the example can be the goal concept.

(3) Starting from the maximum element (the most specific case; in Figure 6.5, the top node) and the minimum element (the most general element; in Figure 6.5, the lowest node) of Q, we can narrow the possible subset including the goal concept using (1) and (2) together.

Finding a concept for a given set of examples using these rules is called concept learning by the version space method.

Let us use the version space method for the example in Figure 6.5. The concept

image

that is, “a white, fine-textured triangle,” is given as a new positive instance. Node n4 in Figure 6.5 corresponds to this concept. So the possible concepts that can be found by the version space method are n4 and the part of the graph connected by paths under n4. In Figure 6.5, this part is shown with bold lines.

Next, suppose

image

in other words, “a white triangle,” is given as the concept corresponding to a negative instance. The node for this concept is n5 in Figure 6.5. Then the concept we are looking for does not exist in the part of the graph that includes n5 and that is connected by paths under n5. That is, the nodes surrounded by broken lines cannot correspond to the concept we are looking for. Now, if

image

that is, “a fine-textured triangle” (node n6 in Figure 6.5), is given as the concept for a negative instance, then neither n6 nor n2 can be the concept we are looking for. So the concept we want is narrowed down to just one:

image

This concept corresponds to node n7 in Figure 6.5.
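The narrowing just traced can be reproduced mechanically. In the Python sketch below, each node of Figure 6.5 is encoded as the set of attributes whose values it fixes, so that "more general" is simply the subset relation; the encoding is our own.

```python
from itertools import combinations

ATTRS = ("shape", "size", "color", "texture")

# All 16 concepts of the version space: every subset of fixed attributes.
space = {frozenset(s) for n in range(len(ATTRS) + 1)
         for s in combinations(ATTRS, n)}

def observe_positive(candidates, example):
    """Rule (1): keep the example itself and everything more general."""
    return {c for c in candidates if c <= example}

def observe_negative(candidates, example):
    """Rule (2): discard the example and everything more general than it."""
    return {c for c in candidates if not c <= example}

cands = space
cands = observe_positive(cands, frozenset({"shape", "color", "texture"}))  # n4
cands = observe_negative(cands, frozenset({"shape", "color"}))             # n5
cands = observe_negative(cands, frozenset({"shape", "texture"}))           # n6
print(sorted(map(sorted, cands)))
# [['color', 'shape', 'texture'], ['color', 'texture']]: node n7 survives,
# together with the positive instance itself, which these two rules never
# eliminate; n7 is the one remaining proper generalization, as in the text.
```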

This example uses a small graph, but the version space method can be applied to graphs with many more nodes. The method can be used even when the elements of the space, in other words, the well-formed formulas that are candidates for the goal concept, are not known beforehand. This is true since a node of a graph can be generated every time an instance is given. In summary, concept learning using the version space method has the following features:

(1) If the maximum element and the minimum element of a version space and a method of generalization are given, we do not need to know the structure of the space beforehand.

(2) Learning a concept is possible not only when instances are input simultaneously but also when they are given one by one.

(3) When a variable is included in each element of a version space, the amount of computation can be very large.

(4) It does not always work for noisy or defective data.

Of these features, (1) and (2) are the most important in real use. For (3), notice that many concept relations can be formulated using a version space that does not involve variables at all.

6.5 Conceptual Clustering

So far we have described methods for concept learning from a given set of instances. On the other hand, classifying instances by finding some relation among them and separating them into several clusters so that each cluster corresponds to a concept is also an important problem in concept learning. Classifying a given set of instances into clusters so that each cluster corresponds to a different concept is called conceptual clustering.

(a) Conceptual clustering using attributes

Let us consider a conceptual clustering method that classifies instances based on attributes. Suppose that there are n objects o1,…, on, and each has some attributes represented by m variables x1,…, xm. For example, there are eight automobiles o1,…, o8, and each has the values shown in Table 6.2 for the three attributes, speed x1 ∈ {high, medium, low}, color x2 ∈ {red, blue, green}, and price x3 ∈ {high, relatively-high, relatively-low, low}. Below, using this example problem, we will explain conceptual clustering.

Table 6.2

Data for the example of conceptual clustering using attributes

image

We now try to classify the n given objects into several clusters according to whether their attribute-value pairs are similar. Suppose that the number of clusters k is given beforehand and that we have an evaluation function f that tells us how close objects are (of course, this depends on the method of classification). Conceptual clustering by attributes can then be characterized as classifying the n objects into k different clusters so that the value of f is minimum. Table 6.2 can be rewritten, attribute by attribute, as the matrix shown in Figure 6.6. Now, let us define the evaluation function f as follows:

image

Figure 6.6 A matrix for objects o1,…, o8 and their attributes.

First, to delimit an area of attribute-value pairs, we can specify part of the matrix shown in Figure 6.6. For example, image specifies the gray area in Figure 6.6. An area of attribute-value pairs determined in this way is called an attribute complex. The number of matrix elements that are included in an attribute complex α but do not contain an object is called the sparseness of α and is written s(α). For example, the sparseness of the attribute complex corresponding to the gray part of Figure 6.6 is s(α) = 4. The higher the sparseness of an attribute complex, the larger and looser it is as a cluster for the given objects. Suppose we have found a set of attribute complexes, {α1, …, αk}, that classifies the given objects into k clusters, where αi (i = 1, …, k) is the attribute complex that separates the objects in the ith cluster from the other objects. We now define f as

image

Also, we agree that f is minimized over sets of attribute complexes that classify the objects into k clusters. For example, in the problem in Figure 6.6, if we assume k = 2, conceptual clustering by attributes becomes the problem of finding two attribute complexes α1, α2 that divide o1, …, o8 into different clusters and that make f = s(α1) + s(α2) minimum.
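A sketch of the sparseness computation in Python: an attribute complex is encoded as a mapping from attributes to allowed values, and s(α) counts the cells of the attribute space that the complex covers but no object occupies. The attribute names follow Table 6.2; since the table's values appear only as an image above, the two sample objects are invented for illustration.

```python
from itertools import product

DOMAINS = {"speed": ["high", "medium", "low"],
           "color": ["red", "blue", "green"],
           "price": ["h", "rh", "rl", "l"]}

def sparseness(cx, objects):
    """s(alpha): cells covered by complex cx that no object occupies.
    Attributes cx does not mention are unrestricted."""
    ranges = [sorted(cx.get(a, DOMAINS[a])) for a in DOMAINS]
    covered = set(product(*ranges))
    occupied = {tuple(o[a] for a in DOMAINS) for o in objects}
    return len(covered - occupied)

def f(complexes, objects):
    """The evaluation function: total sparseness of the chosen complexes."""
    return sum(sparseness(cx, objects) for cx in complexes)

objs = [{"speed": "high",   "color": "red", "price": "h"},
        {"speed": "medium", "color": "red", "price": "rh"}]
alpha = {"speed": {"high", "medium"}, "price": {"h", "rh"}}
print(sparseness(alpha, objs))  # 2*3*2 = 12 cells covered, 2 occupied -> 10
```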

Below, we will describe an algorithm for solving this kind of problem. At the heart of this algorithm is a clustering step for dividing a set of objects into k clusters; by applying it recursively, we can generate a representation in which clusters are hierarchically structured. Below we show the clustering algorithm together with an algorithm for generating a cluster hierarchy. By a cover of a set of objects O, we mean a set of sets α = {Q1, …, Qm} such that O ⊆ Q1 ∪ … ∪ Qm. Also, if O1 ∪ … ∪ Om = O, Oi ⊆ Qi, and Oi ∩ Oj = ∅ (i ≠ j) hold for a cover α = {Q1, …, Qm} of the set O, then {O1, …, Om} is called a partition of O by the cover α.

Algorithm 6.1

(conceptual clustering by attributes)

[I] Clustering

[1] Let I be the upper limit of the number of iterations. Set iteration = 1 and f0 = ∞. (Setting parameter values)

[2] Extract k objects o1,…, ok from the given objects. (Extracting cluster seeds)

[3] For each i (i = 1, …, k), obtain a set of attribute complexes {αi1, …, αini} that distinguish oi from o1, …, oi−1, oi+1, …, ok.

[4] For the attribute complexes obtained in [3]:

image

find fmin = min f(α), where α ranges over only those sets that consist of one attribute complex for each of o1, …, ok and are covers of the given objects. If fmin < f0, set f0 = fmin, pick one of the α that give fmin, call it α̂, and go to [5]. If fmin ≥ f0, go to [6]. (Selecting attribute complexes that minimize the evaluation function)

[5] If iteration ≥ I, go to [6]. Otherwise, go to [7]. (Checking the number of iterations)

[6] Stop. The partition of the given objects by the cover α̂ is a solution to the clustering. (Termination condition)

[7] Select one object from each of the k attribute complexes α̂i (i = 1, …, k) in α̂, and call it oi (i = 1, …, k). These objects are selected by the following substeps. (Selecting new cluster seeds)

[7.1] Do the following for each α̂i (i = 1, …, k). Represent each object o covered by α̂i by the vector p of its attribute values. For two such vectors, p and q, define the distance between p and q as the number of positions at which they differ (the Hamming distance). Obtain, as follows, the median vector g(α̂i) for the vectors pj corresponding to the objects included in α̂i:

    For each dimension of the vectors pj, pick the value that appears most often. Then let g(α̂i) be the vector of those values.

[7.2] For each i (i = 1, …, k), pick, from the vectors pj corresponding to the objects in α̂i, the vector that has the smallest Hamming distance to g(α̂i). Let the object corresponding to this vector be the object selected from α̂i.

    When all k objects have been selected by these substeps, set iteration = iteration + 1 and go to [3].

[II] Generating the hierarchy

[1] Divide the given objects into k clusters using the clustering algorithm. (Application of clustering)

[2] If the hierarchy of clusters reaches a given depth, stop. Otherwise, go to [3]. (Termination condition)

[3] Generate subclusters by applying [1] to each cluster generated in [1]. This increments the depth of the cluster hierarchy. Go to [2]. (Recursive application)
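To make the control flow of part [I] concrete, here is a simplified Python sketch for k = 2. Table 6.2's values appear only as an image above, so the eight objects below are invented stand-ins, and the complex-generation step [3] is reduced to a naive one-attribute rule; a faithful implementation would search a much richer space of complexes.

```python
from itertools import product

DOMAINS = {"speed": ["high", "medium", "low"],
           "color": ["red", "blue", "green"],
           "price": ["h", "rh", "rl", "l"]}
OBJECTS = {  # invented stand-in data, NOT the actual Table 6.2
    "o1": {"speed": "high",   "color": "red",   "price": "h"},
    "o2": {"speed": "low",    "color": "blue",  "price": "l"},
    "o3": {"speed": "low",    "color": "green", "price": "rl"},
    "o4": {"speed": "high",   "color": "green", "price": "rh"},
    "o5": {"speed": "medium", "color": "blue",  "price": "rl"},
    "o6": {"speed": "low",    "color": "red",   "price": "l"},
    "o7": {"speed": "high",   "color": "blue",  "price": "h"},
    "o8": {"speed": "medium", "color": "red",   "price": "rh"},
}

def covers(cx, obj):
    return all(obj[a] in vals for a, vals in cx.items())

def sparseness(cx):
    """s(alpha): covered but unoccupied cells of the attribute space."""
    ranges = [sorted(cx.get(a, DOMAINS[a])) for a in DOMAINS]
    occupied = {tuple(o[a] for a in DOMAINS) for o in OBJECTS.values()}
    return len(set(product(*ranges)) - occupied)

def candidate_complexes(seed, others):
    """Step [3], naively: for each attribute on which the seed differs from
    every other seed, allow exactly the values the other seeds do not use."""
    out = []
    for a in DOMAINS:
        taken = {OBJECTS[s][a] for s in others}
        if OBJECTS[seed][a] not in taken:
            out.append({a: set(DOMAINS[a]) - taken})
    return out

def cluster(k=2, max_iter=2):
    seeds = list(OBJECTS)[:k]                               # step [2]
    f0, best = float("inf"), None                           # step [1]
    for _ in range(max_iter):                               # steps [3]-[7]
        per_seed = [candidate_complexes(s, [t for t in seeds if t != s])
                    for s in seeds]
        fmin, amin = float("inf"), None
        for combo in product(*per_seed):                    # step [4]
            if all(any(covers(c, OBJECTS[o]) for c in combo) for o in OBJECTS):
                val = sum(sparseness(c) for c in combo)
                if val < fmin:
                    fmin, amin = val, combo
        if amin is None or fmin >= f0:                      # no improvement
            break
        f0, best = fmin, amin
        new_seeds = []                                      # step [7]: re-seed
        for c in best:
            members = [o for o in OBJECTS if covers(c, OBJECTS[o])]
            mode = {a: max(DOMAINS[a], key=lambda v, a=a: sum(
                        OBJECTS[o][a] == v for o in members)) for a in DOMAINS}
            new_seeds.append(min(members, key=lambda o: sum(
                OBJECTS[o][a] != mode[a] for a in DOMAINS)))
        seeds = new_seeds
    return best, f0

best, f0 = cluster()
print(f0, [sorted(o for o in OBJECTS if covers(c, OBJECTS[o])) for c in best])
```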

Let us apply the clustering algorithm to the example in Figure 6.6, where the upper limit of the number of iterations is set to 2.

[1] Let I = 2, iteration = 1, and f0 = ∞.

[2] Pick o1 and o2 from o1, …, o8 (which objects are chosen influences the efficiency of the computation).

[3] Suppose we found the following attribute complexes that include o1 but do not include o2:

image

In this case, α11 covers {o1, o4, o5, o7, o8} and α12 covers {o1, o3, o4, o6, o7}. Suppose we found the following attribute complexes that include o2 but do not include o1:

image

    In this case, α21 covers {o2, o3, o6} and α22 covers {o2, o5, o8}. (There are many other attribute complexes that satisfy the conditions, but we limit ourselves here to these.)

[4] If we let αA = {α11, α21}, αB = {α11, α22}, αC = {α12, α21}, and αD = {α12, α22}, only αA and αD are covers of {o1, …, o8}. For these, using

image

the minimum value of f is fmin = 22, and fmin < f0. The cover that gives fmin is αA, so let α̂ = αA.

[5] Since iteration < 2, go to [7].

[7] The objects included in α11, namely o1, o4, o5, o7, o8, can be represented by the following three-dimensional vectors of values of the attributes x1, x2, x3.

image

    Then we can pick image as the median vector. In this case, o7 is the object closest to g(α11). Similarly, if we pick image as the median vector for α21, we find that o6 is the object closest to it. Set iteration = 2.

[3] Suppose we found the following attribute complexes that include o7 but do not include o6:

image

we have α11 covering {o1, o3, o4, o7, o8} and α12 covering {o1, o4, o7, o8}. Also, suppose we found the following attribute complex that includes o6 but does not include o7:

image

    we have α21 covering {o2, o5, o6}.

[4] For αA = {α11, α21} and αB = {α12, α21}, only αA is a cover of {o1, …, o8}. For αA we have

image

and we set f0 = fmin since fmin < f0. Go to [5].

[5] Since iteration = 2, go to [6].

[6] Stop. The partition {o1, o3, o4, o7, o8}, {o2, o5, o6} by the cover α̂ = {α11, α21} is a solution of the clustering.

The two clusters found in the above example turn out to be the cluster of objects whose value of the attribute x3 is h or rh and the cluster of objects whose value of x3 is rl or l.

The algorithm for conceptual clustering described here is a heuristic method and does not always find a clustering that minimizes f, the sum of the sparsenesses. However, not many conceptual clustering methods have been proposed for the case where the clusters are unknown. Besides algorithms using attribute values, as described here, there are other conceptual clustering methods in which the data are input one by one.

(b) Conceptual clustering and cluster analysis

Cluster analysis is a clustering method similar to conceptual clustering. Conceptual clustering differs from cluster analysis in that it forms clusters according to the semantic structures of concepts. Cluster analysis usually cannot take global attributes of the data set into consideration, since it represents the value of an attribute of the data as a single number. Conceptual clustering, on the other hand, finds clusters based on the kinds of concepts it is given. Therefore, even if the distance between two objects is small in the sense of cluster analysis, conceptual clustering can still separate them when the two objects are examples of different concepts.

When we think about classification based on concepts, the goal of the classification and background knowledge are necessary elements in the classification. Although we have not defined background knowledge, we can understand its importance. For example, even variables that are not included in the attribute descriptions of a given object can be used for clustering, as long as those variables and the given attributes are related by background knowledge.

Summary

In this chapter, we defined the notion of concept and its instances. We also described algorithms for learning concepts.

6.1. (Simple) concepts can be defined by the conjunction of literals, and their instances can be obtained as conjunctions of ground literals in which constant symbols are substituted for the variable symbols in a concept.

6.2. To learn concepts, we need to decide on an algorithm, background knowledge and its representation, a method for introducing new literals, a set of instances and a way to input them, and a method of representing instances and concepts.

6.3. Learning concepts can be done by generalizing well-formed formulas. Generalization of well-formed formulas is based on the notion of MSC generalization. Generalization can be done by heuristically applying methods like replacing constant symbols with variable symbols and applying deductive transformation.

6.4. Concept learning can be done based on an algorithm for heuristic search in problem solving. A popular method of this kind is the version space method.

6.5. One algorithm for conceptual clustering is to classify a set of instances into exclusive clusters, each corresponding to a different concept.

Exercises

6.1. Enumerate decisions that you have to make when learning concepts on a computer.

6.2. For the example problem (Figure 6.1) in Section 6.1(c), consider the conjunction of literals

image

and the substitution

image

    Then show that p″ and n″ are concepts for the positive instances image, image and the negative instance 1.

6.3. Find an MSC generalization of positive instances:

image

    Also, give an example of a generalization that is not an MSC generalization. Furthermore, check if the found MSC generalization is an overgeneralization according to our knowledge. If so, devise a method for improving it.

6.4. Write a learning algorithm for integration by parts of indefinite integrals using the version space method.


*For a given set A and a binary relation ≤ on the elements of A, ≤ is a partial order on A if and only if the reflexive law (ai ≤ ai), the antisymmetry law (if ai ≤ aj and aj ≤ ai, then ai = aj), and the transitive law (if ai ≤ aj and aj ≤ ak, then ai ≤ ak) hold for all ai, aj, ak ∈ A. An element aj ∈ A is called a maximal element of the partial order ≤ if ai = aj for every ai ∈ A with aj ≤ ai; similarly, ai ∈ A is a minimal element if aj = ai for every aj ∈ A with aj ≤ ai.

**If, for each subset D of a partially ordered set Q, there exist q and q′ such that qqi and qiq′ for all qiD, then we say that Q is a complete lattice. We call q and q′ a lower bound and an upper bound of D, respectively. When D = Q, we call them the minimum and maximum elements of Q.
