Chapter 5

Symbolic Learning

5.1 Introduction

The preceding chapters have discussed ways of representing knowledge and drawing inferences. It was assumed that the knowledge itself was readily available and could be expressed explicitly. However, there are many circumstances where this is not the case, such as those listed here, adapted from Alpaydin (2010).

  • The software engineer may not possess the domain expertise. The knowledge would need to be obtained from a domain expert through a process of knowledge acquisition (O’Leary 1998) or, conversely, the domain expert would need to acquire software engineering skills. An attractive alternative is for the system to be designed to learn for itself.
  • The rules that describe a particular domain may not be completely understood, as is the case for recognizing and interpreting spoken language.
  • Even though a domain may be well understood, it may not be expressible explicitly in terms of rules, facts, or relationships. This category includes skills, such as welding or painting.
  • The problem may change over time. For example, the optimal routing of packets on a computer network varies according to network loading patterns.

One way around these difficulties is to have the system learn for itself from a set of example solutions. Two approaches can be broadly recognized—symbolic learning and numerical learning. Symbolic learning describes systems that formulate and modify rules, facts, and relationships, explicitly expressed in words and symbols. In other words, they create and modify their own knowledge base. Numerical learning refers to systems that use numerical models; learning in this context refers to techniques for optimizing the numerical parameters. Numerical learning includes artificial neural networks (Chapter 8) and a variety of optimization algorithms such as genetic algorithms (Chapter 7) and simulated annealing (Chapter 6).

A learning system is normally given some feedback on its performance. The source of this feedback is called the teacher or the oracle. Often the teacher role is fulfilled by the environment within which the knowledge-based system is working, that is, the reaction of the environment to a decision is sufficient to indicate whether the decision was right or wrong. Learning with a teacher is sometimes called supervised learning. Learning can be classified as follows, where each category involves a different level of supervision:

  1. Rote learning.

    The system receives confirmation of correct decisions. When it produces an incorrect decision it is “spoon-fed” with the correct rule or relationship that it should have used.

  2. Learning from advice.

    Rather than being given a specific rule that should apply in a given circumstance, the system is given a piece of general advice, such as “gas is more likely to escape from a valve than from a pipe.” The system must sort out for itself how to move from this high-level abstract advice to an immediately usable rule.

  3. Learning by induction.

    The system is presented with sets of example data and is told the correct conclusions that it should draw from each. The system continually refines its rules and relations so as to correctly handle each new example.

  4. Learning by analogy.

    The system is told the correct response to a similar, but not identical, task. The system must adapt the previous response to generate a new rule applicable to the new circumstances.

  5. Explanation-based learning (EBL).

    The system analyzes a set of example solutions and their outcomes to determine why each one was successful or otherwise. Explanations are generated, which are used to guide future problem solving. EBL is incorporated into PRODIGY, a general-purpose problem-solver (Minton et al. 1989).

  6. Case-based reasoning (CBR).

    Any case about which the system has reasoned is filed away, together with the outcome, whether it be successful or otherwise. Whenever a new case is encountered, the system adapts its stored behavior to fit the new circumstances. Case-based reasoning is discussed in further detail below.

  7. Explorative or unsupervised learning.

    Rather than having an explicit goal, an explorative system continuously searches for patterns and relationships in the input data, perhaps marking some patterns as interesting and warranting further investigation. Examples of the use of unsupervised learning include:

    • Data mining, where patterns are sought among large or complex data sets.
    • Identifying clusters, possibly for compressing the data.
    • Learning to recognize fundamental features, such as edges, from pixel images.
    • Designing products, where innovation is a desirable characteristic.

In rote learning and learning from advice, the sophistication lies in the ability of the teacher rather than the learning system. If the teacher is a human expert, these two techniques can provide an interactive means of eliciting the expert’s knowledge in a suitable form for addition to the knowledge base. However, most of the interest in symbolic learning has focused on learning by induction and case-based reasoning, discussed in the following two sections. Reasoning by analogy is similar to case-based reasoning, while many of the problems and solutions associated with learning by induction also apply to the other categories of symbolic learning.

5.2 Learning by Induction

5.2.1 Overview

Rule induction involves generating from specific examples a general rule of the type:

if <general circumstance> then <general conclusion>

Since it is based on trial-and-error, induction can be said to be an empirical approach. We can never be certain of the accuracy of an induced rule, since it may be shown to be invalid by an example that we have not yet encountered. The aim of induction is to build rules that are successful as often as possible, and to modify them quickly when they are found to be wrong. Whatever is being learned—typically, rules and relationships—should match the positive examples but not the negative ones.

The first step is to generate an initial prototype rule that can subsequently be refined. The initial prototype may be a copy of a general-purpose template, or it can be generated by hypothesizing a causal link between a pair of observations. For instance, if a specific valve valve_1 in a particular plant is open and the flow rate through it is 0.5 m³s⁻¹, we can propose two initial prototype rules in Flex format:

rule r5_1
	if status of valve_1 is open
	then flow_rate of valve_1 becomes 0.5.
rule r5_2
	if flow_rate of valve_1 is 0.5
	then status of valve_1 becomes open.

The prototype rules can then be modified or rejected in the light of additional example data. Rule modifications can be classified as either strengthening or weakening. The condition is made stronger (or specialized) by restricting the circumstances to which it applies, and conversely it is made weaker (or generalized) by increasing its applicability. The pair of preceding examples could be made more general by considering other valves or a less precise measure of flow. On the other hand, they could be made more specific by specifying the necessary state of other parts of the plant. A rule needs to be generalized if it fails to fire for a given set of data, where we are told by the teacher that it should have fired. Conversely, a rule needs to be specialized if it fires when it should not.

Assume for the moment that we are dealing with a rule that needs to be generalized. The first task is to spot where the condition is deficient. This is easy when using pattern matching, provided that we have a suitable representation. Consider the following prototype rule:

rule r5_3
	if status of X is open
	and X is a gas_valve
	then flow_rate of X becomes high.

In Chapter 4, names beginning with an uppercase letter were used for classes, consistent with conventions in C++ and UML. However, in Flex, class names are in lowercase, while names beginning with an uppercase letter are used to indicate a local variable that can be matched to something. Rule r5_3 would fire, given the scenario:

status of valve_1 is open.
valve_1 is a gas_valve.

The rule is able to fire because status of valve_1 is open matches status of X is open and valve_1 is a gas_valve matches X is a gas_valve. The conclusion flow_rate of valve_1 is high would be drawn. Now consider the following scenario:

status of valve_2 is open.
valve_2 is a water_valve.

The rule would not fire. However, the teacher may tell us that the conclusion flow_rate of valve_2 is high should have been drawn. We now look for matches as before and find that status of valve_2 is open matches status of X is open. However, there is no match to X is a gas_valve, so this part of the condition needs to be generalized to embrace the circumstance X is a water_valve.

We can therefore recognize where a rule is deficient by pattern-matching between the condition part of the rule and the scenario description. This is analogous to means–ends analysis, which is used to determine a plan for changing the world from its current state to a goal state (see Chapter 4, Section 4.3.3 and Chapter 13, Section 13.3). Means–ends analysis typically uses pattern-matching to determine which rules can lead to such a change.
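
The deficiency-spotting step can be made concrete with a short sketch in Python. The triple representation of facts and clauses below is an illustrative assumption, following the Flex convention that capitalized names act as variables:

def is_var(term):
    return isinstance(term, str) and term[:1].isupper()

def match_clause(clause, fact, bindings):
    """Try to match one condition clause against one fact,
    extending the variable bindings; return None on failure."""
    new = dict(bindings)
    for c, f in zip(clause, fact):
        if is_var(c):
            if c in new and new[c] != f:
                return None
            new[c] = f
        elif c != f:
            return None
    return new

def unmatched_clauses(condition, scenario):
    """Return the clauses that no fact in the scenario satisfies,
    that is, the parts of the condition that need generalizing."""
    bindings, deficient = {}, []
    for clause in condition:
        for fact in scenario:
            result = match_clause(clause, fact, bindings)
            if result is not None:
                bindings = result
                break
        else:
            deficient.append(clause)
    return deficient

# Rule r5_3's condition and the water-valve scenario from the text:
condition = [("status", "X", "open"), ("isa", "X", "gas_valve")]
scenario = [("status", "valve_2", "open"), ("isa", "valve_2", "water_valve")]
print(unmatched_clauses(condition, scenario))
# -> [('isa', 'X', 'gas_valve')]: the clause requiring generalization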

5.2.2 Learning Viewed as a Search Problem

The task of generalizing or specializing a rule is not straightforward, as there are many alternative ways in which a rule can be changed. Finding the correctly modified rule is a search problem, where the search space can be enormous. Figure 5.1 shows a possible form of the search tree, where each branch represents a generalization or specialization that would correctly handle the most recently encountered input. Subsequent inputs may reveal an incorrect choice (indicated by a dot at the end of a branch). The system must keep track of its current position in the search tree, as it must backtrack whenever an unsuitable choice is found to have been made.

Figure 5.1 A search tree for rules.

Recording all of the valid options (or branches) that make up a search tree is likely to be unwieldy. The Lex system (Mitchell et al. 1983) offers a neat solution to this problem. Rather than keeping track of all the possible solutions, it simply records the most general and the most specific representations that fit the examples. These boundaries define the range of acceptable solutions. The boundaries move as further examples are encountered, converging on a smaller and smaller choice of rules, and ideally settling on a single rule.
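
The boundary-set idea can be sketched in a few lines of Python. The hypothesis representation below, tuples of attribute values in which '?' matches anything, is a simplification for illustration and not Lex's own rule language:

def covers(h, example):
    return all(hi in ('?', ei) for hi, ei in zip(h, example))

def minimally_generalize(s, example):
    """Widen the specific boundary just enough to cover a positive example."""
    return tuple(si if si == ei else '?' for si, ei in zip(s, example))

def minimal_specializations(g, s, example):
    """Tighten g just enough to exclude a negative example,
    while staying at least as general as the specific boundary s."""
    return [g[:i] + (si,) + g[i + 1:]
            for i, (gi, si) in enumerate(zip(g, s))
            if gi == '?' and si != '?' and si != example[i]]

# Hypotheses describe (valve_type, status); the target concept is
# "an open valve of any type".
examples = [(('gas_valve', 'open'), True),
            (('water_valve', 'open'), True),
            (('gas_valve', 'shut'), False)]

S = None            # most specific boundary
G = [('?', '?')]    # most general boundary
for example, positive in examples:
    if positive:
        S = example if S is None else minimally_generalize(S, example)
        G = [g for g in G if covers(g, example)]
    else:
        G = [h for g in G for h in
             (minimal_specializations(g, S, example) if covers(g, example) else [g])]

print(S, G)   # -> ('?', 'open') [('?', 'open')]: the boundaries converge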

There are other difficulties associated with the search problem. Suppose that a rule-based system is controlling a boiler. It makes a series of adjustments, but ultimately the boiler overheats, that is, the control objective has not been achieved. In this case, the feedback of the temperature reading to the learning system serves as the teacher. The system is faced with the difficulty of knowing where it went wrong, that is, which of its decisions were good and which ones were at fault. This is the credit-assignment problem. Credit assignment applies not only to negative examples such as this (which could be termed “blame assignment”), but also to cases where the overall series of decisions was successful. For example, a boiler controller that succeeds in keeping the temperature within the specified range might do so because several good decisions compensate for poorer ones.

The frame problem (or situation-identification problem), introduced in Chapter 1, affects many areas of artificial intelligence, particularly planning (Chapter 13). It is also pertinent here, where the problem is to determine which aspects of a given example situation are relevant to the new rule. A system for control of a boiler will have access to a wide range of information. Suppose that it comes across a set of circumstances where it is told by the teacher that it should shut off valve_2. The current world state perceived by the system is determined by stored data and sensor values, for example:

valve_1 is shut.
valve_2 is open.
gas_flow_rate is high.
gas_temperature is 300. /* Celsius assumed */
steam_temperature is 150. /* Celsius assumed */
fuel_oil_stored is 100. /* gallons assumed */
fuel_oil_supplier is acme.

In order to derive a rule condition such as that in rule r5_4 below, the system must be capable of deducing which parts of its world model to ignore—such as the information about fuel oil—and which to include.

rule r5_4
	if gas_flow_rate is high
	and status of valve_2 is open
	then report('** shut valve_2 **').

The ability to distinguish the relevant information from the rest requires that the system already have some knowledge about the domain.
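
As a crude illustration, suppose this prior knowledge takes the form of a hand-written list of attributes known to be relevant to valve control; filtering the world state then becomes trivial. The relevance list below is an assumption made purely for the example:

# World state from the text, as a Python dictionary.
world_state = {
    "valve_1": "shut",
    "valve_2": "open",
    "gas_flow_rate": "high",
    "gas_temperature": 300,
    "steam_temperature": 150,
    "fuel_oil_stored": 100,
    "fuel_oil_supplier": "acme",
}

# Assumed prior domain knowledge: what matters for valve control.
relevant_to_valve_control = {"valve_1", "valve_2", "gas_flow_rate"}

condition = {k: v for k, v in world_state.items()
             if k in relevant_to_valve_control}
print(condition)
# -> {'valve_1': 'shut', 'valve_2': 'open', 'gas_flow_rate': 'high'}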

5.2.3 Techniques for Generalization and Specialization

We can identify at least five methods of generalizing (or specializing through the inverse operation):

  1. Universalization;
  2. Replacing constants with variables;
  3. Using disjunctions (generalization) and conjunctions (specialization);
  4. Moving up a hierarchy (generalization) or down it (specialization); and
  5. Chunking.

5.2.3.1 Universalization

Universalization involves inferring a new general rule from a set of specific cases. Consider the following series of separate scenarios:

status of valve_1 is open and flow_rate of valve_1 is high.
status of valve_2 is open and flow_rate of valve_2 is high.
status of valve_3 is open and flow_rate of valve_3 is high.

From these we might induce the following general rule:

rule r5_5
	if status of X is open
	then flow_rate of X becomes high.

Thus, we have generalized from a few specific cases to a general rule. This rule turns out to be too general as it does not specify that X must be a valve. However, this could be fixed through subsequent specialization based on some negative examples.
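
A toy implementation of this step might scan the example scenarios for a co-occurring pattern and, if it holds in every case, propose the general rule. The triple representation and the hard-wired pattern below are illustrative assumptions:

scenarios = [
    [("status", "valve_1", "open"), ("flow_rate", "valve_1", "high")],
    [("status", "valve_2", "open"), ("flow_rate", "valve_2", "high")],
    [("status", "valve_3", "open"), ("flow_rate", "valve_3", "high")],
]

def universalize(scenarios):
    """Propose 'if status of X is open then flow_rate of X becomes high'
    if that pattern, with a shared object, holds in every scenario."""
    for facts in scenarios:
        objs_open = {o for (a, o, v) in facts if (a, v) == ("status", "open")}
        objs_high = {o for (a, o, v) in facts if (a, v) == ("flow_rate", "high")}
        if not (objs_open & objs_high):
            return None        # the pattern is not supported
    return [("status", "X", "open")], [("flow_rate", "X", "high")]

print(universalize(scenarios))
# -> ([('status', 'X', 'open')], [('flow_rate', 'X', 'high')]): rule r5_5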

5.2.3.2 Replacing Constants with Variables

Universalization shows how we might induce a rule from a set of example scenarios. Similarly, general rules can be generated from more specific ones by replacing constants with local variables (see Chapter 2, Section 2.6). Consider, for example, the following specific rules:

rule specific1
	if status of gas_valve_1 is open
	then flow_rate of gas_valve_1 becomes high.
rule specific2
	if status of gas_valve_2 is open
	then flow_rate of gas_valve_2 becomes high.
rule specific3
	if status of gas_valve_3 is open
	then flow_rate of gas_valve_3 becomes high.
rule specific4
	if status of gas_valve_4 is open
	then flow_rate of gas_valve_4 becomes high.
rule specific5
	if status of gas_valve_5 is open
	then flow_rate of gas_valve_5 becomes high.

From these specific rules we might induce a more general rule:

rule r5_5
	if status of X is open
	then flow_rate of X becomes high.

However, even this apparently simple change requires a certain amount of metaknowledge (knowledge about knowledge). The system must “know” to favor the creation of rule r5_5 rather than, for example:

rule r5_6
	if status of X is open
	then flow_rate of Y becomes high.

or:

rule r5_7
	if status of X is open
	then Y of X becomes high.

Rule r5_6 implies that if any valve is open (we’ll assume for now that we are only dealing with valves) then the flow rate through all valves is deduced to be high. Rule r5_7 implies that if a valve is open then everything associated with that valve (e.g., cost and temperature) becomes high.
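
One way to encode this metaknowledge is to generalize the specific rules jointly, assigning a shared variable to positions whose constants vary in step across the rules. The sketch below, using the same illustrative triple representation as earlier, therefore yields rule r5_5 rather than r5_6 or r5_7:

from itertools import count

def generalize(rules):
    """Anti-unify a list of (condition, conclusion) pairs: positions whose
    constants vary across the rules become variables, and positions that
    vary identically share the same variable."""
    names = (f"X{i}" for i in count(1))
    mapping = {}
    generalized = []
    for part_index, part in enumerate(rules[0]):
        new_part = []
        for clause_index, clause in enumerate(part):
            new_clause = []
            for term_index, term in enumerate(clause):
                column = tuple(r[part_index][clause_index][term_index]
                               for r in rules)
                if all(t == term for t in column):
                    new_clause.append(term)            # constant everywhere
                else:
                    if column not in mapping:          # same variation pattern,
                        mapping[column] = next(names)  # same variable
                    new_clause.append(mapping[column])
            new_part.append(tuple(new_clause))
        generalized.append(new_part)
    return generalized

specific = [([("status", f"gas_valve_{n}", "open")],
             [("flow_rate", f"gas_valve_{n}", "high")]) for n in range(1, 6)]
print(generalize(specific))
# -> [[('status', 'X1', 'open')], [('flow_rate', 'X1', 'high')]]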

5.2.3.3 Using Conjunctions and Disjunctions

Rules can be made more specific by adding conjunctions to the condition and more general by adding disjunctions. Suppose that rule r5_5 is applied when the world state includes the information status of office_door is open. We will draw the nonsensical conclusion that the flow_rate of office_door is high. The rule clearly needs to be modified by strengthening the condition. One way to achieve this is by use of a conjunction (and):

rule r5_8
	if status of X is open
	and X is a gas_valve
	then flow_rate of X becomes high.

We are relying on the teacher to tell us that flow_rate of office_door is high is not an accurate conclusion. We have already noted that there may be several alternative ways in which local variables might be introduced. The number of alternatives greatly increases when compound conditions are used. Here are some examples:

If valve_1 is open and valve_1 is a gas_valve then ...
If valve_1 is open and X is a gas_valve then ...
If X is open and valve_1 is a gas_valve then ...
If X is open and X is a gas_valve then ...
If X is open and Y is a gas_valve then ...

The existence of these alternatives is another illustration that learning by rule induction is a search process in which the system searches for the correct rule.

Suppose now that we wish to extend rule r5_8 so that it includes water valves as well as gas valves. One way of doing this is to add a disjunction (or) to the condition part of the rule:

rule r5_9
	if status of X is open
	and [X is a gas_valve or X is a water_valve]
	then flow_rate of X becomes high.

This is an example of generalization by use of disjunctions. The use of disjunctions in this way is a “cautious generalization,” as it caters for the latest example, but does not embrace any novel situations. The other techniques risk overgeneralization, but the risky approach is necessary if we want the system to learn to handle data that it may not have seen previously. Overgeneralization can be fixed by specialization at a later stage when negative examples have come to light. However, the use of disjunctions should not be avoided altogether, as there are cases where a disjunctive condition is correct. This dilemma between the cautious and risky approaches to generalization is called the disjunctive-concept problem. A reasonable approach is to look for an alternative form of generalization, only resorting to the use of disjunctions if no other form can be found that fits the examples.
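
Both operations are simple structural edits on a rule condition. In the sketch below, a condition is a list of conjoined clauses and each clause is a list of disjoined alternatives; the representation is an assumption made for illustration:

r5_5 = [[("status", "X", "open")]]

def specialize(condition, extra_clause):
    """Strengthen by conjunction: add a further requirement (r5_5 to r5_8)."""
    return condition + [[extra_clause]]

def generalize_with_disjunct(condition, existing, alternative):
    """Weaken cautiously by disjunction: allow one extra alternative in
    an existing clause (r5_8 to r5_9), covering the latest example and
    nothing more."""
    return [clause + [alternative] if existing in clause else clause
            for clause in condition]

r5_8 = specialize(r5_5, ("isa", "X", "gas_valve"))
r5_9 = generalize_with_disjunct(r5_8, ("isa", "X", "gas_valve"),
                                ("isa", "X", "water_valve"))
print(r5_9)
# -> [[('status', 'X', 'open')],
#     [('isa', 'X', 'gas_valve'), ('isa', 'X', 'water_valve')]]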

5.2.3.4 Moving up or down a Hierarchy

Rule r5_9 showed a cautious way of adapting rule r5_8 to include both water valves and gas valves. Another approach would be to modify rule r5_8 so that it dealt with valves of any description:

rule r5_10
	if status of X is open
	and X is a valve
	then flow_rate of X becomes high.

Here, we have made use of an is-a-kind-of relationship in order to generalize (see Chapter 4, Section 4.6.4). In the class hierarchy for valves, water_valve and gas_valve are both specializations of the class valve, as shown in Figure 5.2, which uses UML-style capitalization for class names (see Chapter 4, Section 4.6.6).

Figure 5.2 Class hierarchy for valves.
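
Generalizing in this way requires only access to the is-a-kind-of relation. A minimal sketch, assuming a hand-built parent table corresponding to Figure 5.2, might look like this:

parent = {"gas_valve": "valve", "water_valve": "valve"}

def generalize_class(cls):
    """Generalize by moving one level up the is-a-kind-of hierarchy."""
    return parent.get(cls)

def is_a_kind_of(specific, general):
    """True if specific equals general or lies somewhere beneath it."""
    while specific is not None:
        if specific == general:
            return True
        specific = parent.get(specific)
    return False

print(generalize_class("gas_valve"))         # -> valve
print(is_a_kind_of("water_valve", "valve"))  # -> True: rule r5_10 now
                                             #    embraces water valves too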

5.2.3.5 Chunking

Chunking is a mechanism for automated learning that is used in SOAR (Laird et al. 1986; Laird et al. 1987). SOAR is a production system (i.e., one that uses production rules—see Chapter 2) that has been modeled on a theory of human cognition. It works on the premise that, given an overall goal, every problem encountered along the way can be regarded as a subgoal. Problems are, therefore, tackled hierarchically; if a goal cannot be met at one level, it is broken down into subgoals. These automatically generated subgoals may be satisfied through the application of production rules stored in a long-term memory from previous runs of the system. However, the application of such rules may be slow, as several of them may need to be fired successively to satisfy the subgoal. Once SOAR has recognized the series of rules required to satisfy a subgoal, it can collapse them down into a single production rule. This is the process of chunking. The new rule, or chunk, is then stored so that it can rapidly solve the same subgoal if it should arise again in the future.
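
The essence of chunking, greatly simplified, is rule composition: the chunk's condition comprises the external facts that the rule chain consumed, and its conclusion is whatever the chain finally produced. The sketch below ignores SOAR's actual chunk-building machinery, and the boiler facts are invented:

def chunk(rule_chain):
    """Collapse a sequence of fired rules into a single rule."""
    derived, needed, produced = set(), [], []
    for condition, conclusion in rule_chain:
        needed += [c for c in condition if c not in derived]
        derived.update(conclusion)
        produced = conclusion
    return needed, produced

chain = [([("status", "valve_1", "open")],
          [("flow_rate", "valve_1", "high")]),
         ([("flow_rate", "valve_1", "high")],
          [("pressure", "boiler", "rising")])]
print(chunk(chain))
# -> ([('status', 'valve_1', 'open')], [('pressure', 'boiler', 'rising')])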

SOAR also offers a novel method of conflict resolution (see Chapter 2, Section 2.8). The usual role of a production rule is to change some aspect of the system’s state that models the problem being tackled. In SOAR, any rule that can fire does so, but the changes to the current state proposed by the rule are not applied immediately. Instead, they are added to a list of suggested changes. A separate set of production rules is then applied to arrange conflicting suggestions in order of preference—a process known as search control. Once conflict has been resolved through search control, changes are made to the actual state elements.

5.3 Case-Based Reasoning (CBR)

The design of intelligent systems is often inspired by attempts to emulate the characteristics of human intelligence. One such characteristic is the ability to recall previous experience whenever a similar problem arises. This is the essence of case-based reasoning (CBR). As Riesbeck and Schank (1989) put it,

A case-based reasoner solves new problems by adapting solutions that were used to solve old problems.

Consider the example of diagnosing a fault in a refrigerator, an example that will be revisited in Chapter 11. If an expert system has made a successful diagnosis of the fault, given a set of symptoms, it can file away this information for future use. If the expert system is subsequently presented with details of another faulty refrigerator of exactly the same type, displaying exactly the same symptoms in exactly the same circumstances, then the diagnosis can be completed simply by recalling the previous solution. However, a full description of the symptoms and the environment would need to be very detailed, and the chances of it ever being exactly reproduced are remote. What we need is the ability to identify a previous case, the solution of which can be modified to reflect the slightly altered circumstances, and then save it for future use. Aamodt and Plaza (1994) have therefore proposed that CBR can be described by a four-stage cycle:

  • Retrieve the most similar case(s).
  • Re-use the case(s) to attempt to solve the problem.
  • Revise the proposed solution if necessary.
  • Retain the new solution as a part of a new case.

Such an approach is arguably a good model of human reasoning. Indeed, case-based reasoning is often used in a semiautomated manner, where a human can intervene at any stage in the cycle.
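
The cycle is readily expressed as a skeleton in which each stage is a pluggable function. Everything in the sketch below is schematic; the toy cases and the stand-in stage functions are assumptions, with the real techniques discussed in the remainder of this section:

def cbr_cycle(probe, case_base, retrieve, reuse, revise, evaluate):
    """One pass through the four-stage cycle of Aamodt and Plaza."""
    nearest = retrieve(probe, case_base)    # 1. Retrieve similar case
    solution = reuse(nearest, probe)        # 2. Re-use its solution
    if not evaluate(probe, solution):       # 3. Revise if necessary
        solution = revise(solution, probe)
    case_base.append((probe, solution))     # 4. Retain the new case
    return solution

# Trivial instantiation: nearest case by temperature, null adaptation.
case_base = [({"temp": 18}, "reduce thermostat setting"),
             ({"temp": 4}, "no action needed")]
print(cbr_cycle(
    {"temp": 15}, case_base,
    retrieve=lambda p, cb: min(cb, key=lambda c: abs(c[0]["temp"] - p["temp"])),
    reuse=lambda case, p: case[1],          # null adaptation (Section 5.3.3.1)
    revise=lambda s, p: s,
    evaluate=lambda p, s: True))
# -> reduce thermostat setting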

5.3.1 Storing Cases

The stored cases form a case base. An effective way of representing the relevance of cases in the case base is by storing them as objects. Riesbeck and Schank (1989) have defined a number of types of link between classes and instances in order to assist in locating relevant cases. These links are described below and examples of their use are shown in Figure 5.3.

Figure 5.3 Classifying case histories: (a) abstraction links and index links between classes; (b) links between instances.

5.3.1.1 Abstraction Links and Index Links

The classes may form a structured hierarchy, in which the different levels correspond to the level of detail of the case descriptions. Riesbeck and Schank (1989) distinguish two types of link between classes and their specializations: abstraction links and index links. A subclass connected by an abstraction link provides additional detail to its superclass, without overriding any information. An index link is a specialization in which the subclass has an attribute value that is different from the default defined in its superclass. For example, the class Refrigerator_fault might have an index link to the class Electrical_fault, whose defaults relate to faults in vacuum cleaners (Figure 5.3).

Suppose that we wish to save an instance of fixing a refrigerator, where the refrigerator exhibited the following symptoms:

  • Failed to chill food.
  • Made no noise.
  • The light would not come on.

This case might be stored as an instance of the class Refrigerator_fault, which, as already noted, has an index link to the class Electrical_fault.

5.3.1.2 Instance-of Links

Each case is an instance of a specific class of cases (see Chapter 4 for a discussion of object classes and instances). Thus, each case has an instance-of relation with its parent class, as shown in Figure 5.3.

5.3.1.3 Scene Links

These are used to link a historical event to its subevents. For example, removing the outer casing of a refrigerator’s compressor is a subevent of the “changing compressor bearings” event. Both the event and subevent are stored as case instances, with a scene link between them (Figure 5.3).

5.3.1.4 Exemplar Links

These are used to link instances to other similar instances. Suppose that a refrigerator motor fault was diagnosed by referring to a previous case involving a failed washing machine motor. This new case could have an exemplar link to the case from which it was adapted (Figure 5.3).

5.3.1.5 Failure Links

A failure link is a special type of class–instance relation, where the instance represents a specific case where things did not turn out as expected. This is a convenient way of storing cases that form exceptions to their general category. For example, the class Motor_fault might have a failure link to a case in which a replacement motor created a new problem, such as radio interference (Figure 5.3).

5.3.2 Retrieving Cases

The problem of retrieving cases that match a new case, termed the probe case, is eased if the case base is carefully indexed using links as described earlier. Such links are often used as a basis for the first stage of a two-stage interrogation of the case base. The second stage involves ranking the matched cases according to a measure of similarity to the probe case (Ferguson and Bridge 2000).

For a given probe case, a suitable similarity measure S_i for case i in the case base could be:

S_i = \sum_{j=1}^{P} w_j m_{ij}

where P is the number of parameters considered, w_j is an importance weighting applied to parameter j, and m_ij is the degree of match between case i and the probe case for parameter j. For some parameters, such as the presence or absence of a given observation, m_ij is binary, that is, it is either 0 or 1. Other parameters may concern continuous variables such as temperature, or nearly continuous variables such as cost. These can still yield binary values for m_ij if the measure is whether the parameter is within an acceptable range or tolerance. Alternatively, a sliding scale between 0 and 1 can be applied to m_ij. This is akin to using a fuzzy membership value rather than a crisp one (see Chapter 3).
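
A direct implementation of this measure is straightforward. In the sketch below, the parameter names, weights, and tolerance are invented for illustration; the binary parameters score 0 or 1, while the temperature parameter scores on a sliding scale within a tolerance band:

def degree_of_match(probe_value, case_value, tolerance=None):
    """m_ij for one parameter: binary when no tolerance is given,
    otherwise a sliding scale between 0 and 1."""
    if tolerance is None:
        return 1.0 if probe_value == case_value else 0.0
    gap = abs(probe_value - case_value)
    return max(0.0, 1.0 - gap / tolerance)

def similarity(probe, case, weights, tolerances):
    """S_i: the weighted sum of degrees of match over all parameters.
    Assumes every case records every parameter."""
    return sum(w * degree_of_match(probe[p], case[p], tolerances.get(p))
               for p, w in weights.items())

probe = {"no_noise": True, "light_off": True, "cabinet_temp": 15}
cases = [{"no_noise": True, "light_off": True, "cabinet_temp": 18},
         {"no_noise": False, "light_off": False, "cabinet_temp": 4}]
weights = {"no_noise": 2.0, "light_off": 1.0, "cabinet_temp": 1.0}
tolerances = {"cabinet_temp": 10.0}

for case in cases:
    print(similarity(probe, case, weights, tolerances))
# -> 3.7 then 0.0: the first case ranks as the better match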

5.3.3 Adapting Case Histories

There are two distinct categories of techniques for adapting a case history to suit a new situation, namely, structural and derivational adaptation. Structural adaptation describes techniques that use a previous solution as a guide and adapt it to the new circumstances. Derivational adaptation involves looking at the way in which the previous solution was derived, rather than the solution itself. The same reasoning processes are then applied to the set of data describing the new circumstances, that is, the probe case. Four structural techniques and one derivational technique are outlined below.

5.3.3.1 Null Adaptation

This is the simplest approach and, as the name implies, involves no adaptation at all of the previous solution. Instead, the previous solution is given as the solution to the new case. Suppose that the case history selector decides that a failed refrigerator is similar to the case of a failed washing machine. If the washing machine failure was found to be due to a severed power lead, then this is offered as the solution to the refrigerator problem.

5.3.3.2 Parameterization

This is a structural adaptation technique that is applicable when both the symptoms and the solution have an associated magnitude or extent. The previous solution can then be scaled up or down in accordance with the severity of the symptoms. Suppose, for example, that a case history was as follows:

Symptom: Fridge cabinet temperature is 15°C, which is too warm;

Solution: Reduce thermostat setting by 11°C.

If our new scenario involves a fridge whose cabinet temperature is 10°C, which is still warmer than it should be, the new solution would be to turn down the thermostat, but by a modified amount (say, 6°C).
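
If we assume that the stored case implicitly defines a target temperature (15 - 11 = 4°C), the scaled adjustment can be computed directly. The sketch below encodes exactly that assumption:

def parameterize(case_symptom, case_adjustment, new_symptom):
    """Scale the stored adjustment to the new symptom's severity."""
    target = case_symptom - case_adjustment   # inferred set point: 4 deg C
    return new_symptom - target

print(parameterize(15, 11, 10))   # -> 6: turn the thermostat down by 6 deg C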

5.3.3.3 Reasoning by Analogy

This is another structural adaptation technique. If a case history cannot be found in the most appropriate class, then analogous case histories are considered. Given a hierarchically organized database of case histories, the search for analogous cases is relatively straightforward. The search begins by looking at siblings, and then cousins, in a class hierarchy like the one shown in Figure 5.3. Some parts of the historical solution may not be applicable, as the solution belongs to a different class. Under such circumstances, the inapplicable parts are replaced by referring back to the class of the current problem.

As an example, consider a refrigerator that is found to be excessively noisy. There may not be a case in the database that refers to noisy refrigerators, but there may be a case that describes a noisy washing machine. The solution in that case may have been that the bearings on the washer’s drum were worn and needed replacing. This solution is not directly applicable to the refrigerator, as a refrigerator does not have a drum. However, the refrigerator has bearings on the compressor, and so it is concluded that the compressor bearings are worn and are in need of replacement.

5.3.3.4 Critics

The use of critics has stemmed from work on a planning system called HACKER (Sussman 1975) that could rearrange its planned actions if they were found to be incorrectly ordered (see Chapter 13). The ideas are also applicable to other problems, such as diagnosis (see Chapter 11). Critics are modules that can look at a nearly correct solution and determine what flaws it has, if any, and suggest modifications. In the planning domain, critics would look for unnecessary actions, or actions that make subsequent actions more difficult. Adapting these ideas to diagnosis, critics can be used to “fine-tune” a previous solution so that it fits the current circumstances. For instance, a case-based reasoner might diagnose that the compressor bearings in a refrigerator need replacing. Critics might notice that most compressors have two sets of bearings and that, in this particular refrigerator, one set is fairly new. The modified solution would then be to replace only the older set of bearings.

5.3.3.5 Reinstantiation

The preceding adaptation techniques are all structural, that is, they modify a previous solution. Reinstantiation is a derivational technique, because it involves replaying the derivation of the previous solution using the new data. Previously used names, numbers, structures, and components are reinstantiated to the corresponding new ones. Suppose that a case history concerning a central heating system contained the following abductive explanation, built from a set of informally stated rules, of why a room felt cold:

If thermostat is set too low,
then boiler will not switch on.
If boiler is not switched on,
then radiators stay at ambient temperature.
If radiators are at ambient temperature,
then room will not warm up.
Abductive conclusion: thermostat is set too low.

By suitable reinstantiation, this case history can be adapted to diagnose why food in a refrigerator is not chilled:

If thermostat is set too high,
then compressor will not switch on.
If compressor is not switched on,
then cabinet stays at ambient temperature.
If cabinet is at ambient temperature,
then food will not be chilled.
Abductive conclusion: thermostat is set too high.
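
Mechanically, reinstantiation can be treated as replaying the stored derivation under a substitution that maps the old vocabulary onto the new. The hand-written mapping below is an assumption; a real system would need to derive it, for example from the class hierarchy:

substitution = {"too low": "too high",
                "boiler": "compressor",
                "radiators stay": "cabinet stays",
                "radiators are": "cabinet is",
                "room will not warm up": "food will not be chilled"}

stored_derivation = [
    "If thermostat is set too low, then boiler will not switch on.",
    "If boiler is not switched on, then radiators stay at ambient temperature.",
    "If radiators are at ambient temperature, then room will not warm up.",
    "Abductive conclusion: thermostat is set too low.",
]

def reinstantiate(lines, mapping):
    """Apply each textual substitution to every line of the derivation."""
    for line in lines:
        for old, new in mapping.items():
            line = line.replace(old, new)
        yield line

print("\n".join(reinstantiate(stored_derivation, substitution)))
# Reproduces the refrigerator derivation given above.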

5.3.4 Dealing with Mistaken Conclusions

Suppose that a system has diagnosed that a particular component is faulty, and that this has caused the failure of an electronic circuit. If it is then discovered that there was in fact nothing wrong with the component, or that replacing it made no difference, then the conclusion needs repair. Repair is conceptually very similar to adaptation, and similar techniques can be applied to modify the incorrect conclusion in order to reach a correct one. If this fails, then a completely new solution must be sought. In either case, it is important that the failed conclusion be recorded in the database of case histories with a link to the correct conclusion. If the case is subsequently retrieved in a new scenario, the system will be aware of a possible failure and of a possible way around that failure.

5.4 Summary

Systems that can learn offer a way around the so-called “knowledge acquisition bottleneck.” In cases where it is difficult or impossible to extract accurate knowledge about a specific domain, it is clearly an attractive proposition for a computer system to derive its own representation of the knowledge from examples. We have distinguished between two categories of learning systems—symbolic and numerical—with this chapter focusing on the former. Inductive and case-based methods are two particularly important types of symbolic learning.

Rule induction involves the generation and refinement of prototype rules from a set of examples. An initial prototype rule can be generated from a template or by hypothesizing a causal link between a pair of observations. The prototype rule can then be refined in the light of new evidence by generalizing, specializing, or rejecting it. Rule induction from numerical systems such as neural networks is also possible; discussion of this is deferred until Chapter 9 on hybrid systems.

Case-based reasoning involves storing the details of every case encountered, successful or not. A stored case can subsequently be retrieved and adapted for new, but related, sets of circumstances. This is arguably a good model of human reasoning. The principal difficulties are in recognizing relevant cases, which necessitates a suitable storage and retrieval system, and in adapting stored cases to the new circumstances. The concepts of CBR have been illustrated here with reference to fault diagnosis, but CBR has also found a wide range of other applications including engineering sales (Watson and Gardingen 1999), call-center customer support (Cheetham and Goebel 2007), decision support for color matching (Cheetham 2005), and planning (Marefat and Britanik 1997).

Further Reading

Alpaydin, E. 2010. Introduction to Machine Learning. 2nd ed. MIT Press, Cambridge, MA.

Kolodner, J. 1993. Case-Based Reasoning. Morgan Kaufmann, San Francisco, CA.

Langley, P. 1995. Elements of Machine Learning. Morgan Kaufmann, San Francisco, CA.

Leeland, A. M. 2010. Case-Based Reasoning: Processes, Suitability and Applications. Nova Science, Hauppauge, NY.

Mitchell, T. M. 1997. Machine Learning. McGraw-Hill, New York.

Watson, I. D. 1997. Applying Case-Based Reasoning: Techniques for Enterprise Systems. Morgan Kaufmann, San Francisco, CA.
