Transfer
No animal, human or nonhuman, steps twice into the same perceptual stream. A young monkey that picks a ripe mango in a tree must learn something general about ripe mangoes, because it will certainly never see that particular mango again. A young lion that stalks and kills an impala must learn something general about hunting impalas, because it will certainly never meet that particular impala again. A college course in ethology or learning, or any other subject, is valuable only to the extent that it teaches general principles that students can apply in new situations to events that are still in the future. The same is true of job experience, whether it be teaching school, building circuits, or selling shoes; the value of the experience depends on the general usefulness of the learning. Whatever may be learned about a particular object or event, the most valuable things that are learned have to do with variable aspects of general stimulus classes. Theories of learning must explain how past experience transfers to new situations.
STIMULUS GENERALIZATION
After conditioning with a particular stimulus, Sa, similar stimuli, Sb, Sc, … Sn, can also evoke the same conditioned response, even though these other stimuli never appeared during training. This phenomenon, which has direct implications for transfer, is called stimulus generalization. Interest in this phenomenon has generated an enormous volume of research showing that any experimental procedure that produces conditioning also produces stimulus generalization.
Basic Phenomenon
Hovland’s (1937) study of the generalization of the conditioned galvanic skin response along a pitch dimension is a good reference experiment. Hovland used psychophysical judgments to select four tones that were separated in pitch by 25 just noticeable differences (JNDs). These frequencies were 153, 468, 1,000, and 1,967 cycles per second. For one group of subjects, the Sa during conditioning was the lowest tone and the test stimuli during extinction were the three higher tones. For a second group, the Sa during conditioning was the highest tone and the test stimuli during extinction were the three lower tones. Hovland presented the test stimuli in different counterbalanced orders to different subjects to distribute the effects of extinction evenly among the test stimuli. Figure 13.1 shows the response to tones separated in pitch from the original Sa by 25, 50, and 75 JNDs. The amount of response decreased progressively with the stimulus distance in JNDs from the Sa, which is the typical result of generalization experiments.
Another way to test for generalization is to mix conditioning trials with probe trials in which test stimuli replace the Sa, but this introduces other difficulties. If S* appears on the probe trials, then it is paired with the test stimuli and they can become conditioned stimuli in their own right. If S* is omitted on probe trials, then the procedure becomes a partial reward procedure, which can introduce even more complications. These problems of measurement only matter when experimenters are attempting to find the precise shape of the generalization curve, usually to test a particular theory of generalization. The basic phenomenon of stimulus generalization that decreases with stimulus distance from the original conditioned stimulus is so well documented that it is beyond dispute.
Stimulus generalization raises a fundamental theoretical problem for a reinforcement theory. If response to the Sa is a measure of reinforcement strength, then how can other stimuli that the subject never experienced before gain reinforcement strength from Sa?
Implications
The reason for intense interest in gradients of stimulus generalization is that they describe transfer from original learning to new situations in terms of stimulus similarity. A rule for predicting, or better still enhancing, transfer from one learned task to another would be a valuable tool in practical teaching and training situations. In principle, it should be possible to measure the stimulus similarity between any two situations by analyzing each into basic dimensions, such as color, brightness, temperature, loudness, and so on, and then calculating the amount of transfer that should take place from any original object to any transfer object.
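To make the idea concrete, here is a minimal sketch of such a calculation, assuming a simple exponential-decay rule; the function and the decay constant k are illustrative choices, not a formula from this chapter.

```python
# Sketch: transfer predicted from distance along a stimulus dimension,
# assuming an exponential-decay rule. The decay constant k is an
# illustrative choice, not a value from this chapter.
import math

def generalized_response(distance_jnd, k=0.03):
    """Fraction of the trained response evoked by a test stimulus
    `distance_jnd` just noticeable differences away from Sa."""
    return math.exp(-k * distance_jnd)

# Hovland-style test stimuli at 25, 50, and 75 JNDs from Sa:
for d in (0, 25, 50, 75):
    print(f"{d:2d} JNDs -> {generalized_response(d):.2f} of trained response")
```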
Peak Shift
Pigeons have very good vision, so they are a favorite subject for the study of stimulus generalization in the Skinner box. The wavelength, or hue, of the light on the key is a favorite dimension because these birds also have very good color vision. In original training, pigeons see light of only one particular wavelength on the key and receive reward for pecking when that light is on the key. The wavelength used in original training is the particular wavelength that corresponds to the CS in classical conditioning. During tests for generalization carried out under extinction, lights of different wavelengths appear on the key in a counterbalanced order to sample responses to a spectrum of colors similar to the CS.
Hanson (1957, 1959) was the first to investigate the effect of discrimination training on generalization gradients in the Skinner box. He first rewarded pigeons for pecking the key when the wavelength of the light on the key was 550 nm, a wavelength that human beings describe as a yellowish green. Next he administered discrimination training, which consisted of lighting the key for brief periods with 550 nm during half of the periods and with a different wavelength during the other half. Hanson rewarded the pigeons with grain for pecking during the 550-nm periods on a VI 60 schedule, but never rewarded them for pecking during the other stimulus. Thus, 550 nm was S+ and the other wavelength was S–. For different groups of pigeons, 555, 560, 570, or 590 nm served as S–. Human beings usually describe 580 nm as yellow and 600 nm as the beginning of reddish yellow (with 620 nm as true orange). After this discrimination training, Hanson (1959) tested the pigeons with different wavelengths ranging from 480 nm (greenish blue) to 620 nm (orange). Thus, the generalization test measured the response to a series of lights of different colors that included the former S+ and S– as well as a variety of colors that the pigeons had never seen before in this experiment.
In the usual generalization test, the former S+ evokes the maximum response and the other stimuli evoke less response the farther they are from the former S+. That is, responding usually peaks at the former S+ and falls off more or less symmetrically with distance from the former S+. Figure 13.2 illustrates the peak shift effect that Hanson found instead.
In Fig. 13.2, the curve traced with a solid line is the control condition in which birds only saw the S+ of 550 nm during original training and saw the range of test wavelengths during extinction. The maximum response, or highest point of this curve, is at 550 nm as we would expect, and the amount of response to the test wavelengths falls off for lower wavelengths, or bluer lights, and for higher wavelengths, or yellower lights. The solid curve shows that generalization is roughly symmetrical, falling off in roughly equal amounts for lower and higher wavelengths. If anything, the control group generalized more to the higher, yellower wavelengths.
The other curves in Fig. 13.2 show generalization to the test stimuli after discrimination training; labels with arrows (S– = 555, S– = 560, S– = 570, and S– = 590) identify each curve. All of these curves are displaced away from the S–. The pigeons in these groups responded more to test stimuli that they had never seen before in their lives than they did to the former S+. Many other studies using a variety of different stimulus dimensions, tones or angles of lines as well as colors, have replicated the peak shift effect (Cheng, Spetch, & Johnston, 1997; Rilling, 1977; Thomas, 1993). From the point of view of reinforcement theory, this is a puzzling finding. If amount of response measures reinforcement strength, how can a new stimulus have even more strength than the original S+?
Spence (1936, 1937) based his reinforcement model (Fig. 13.3) on the notion that discrimination training produces two gradients, a positive gradient of reinforcement and a negative gradient of extinction. Spence’s model yields a pattern like the peak shift, with peak responding at a new stimulus displaced away from the former S–. While the model predicts the general pattern of results, peak shift experiments fail to confirm its specific predictions.
Suppose that the traditional view is wrong to say that pigeons in the Hanson experiments learn that 550 nm is positive and 590 nm is negative. Suppose they learn instead that greener is positive and yellower is negative. Then during the generalization test, they respond more to a light that is greener (or, perhaps, bluer) than the original CS. Helson (1964) proposed a theory of perception based on relative values of stimulus dimensions, which he called adaptation-level theory. Thomas (1993) shows in detail how Helson’s theory describes the results of many experiments in discrimination that followed Hanson’s discovery of the peak shift.
TRANSPOSITION
Hanson’s results depend on the relation between S+ and S– along the dimension of color. This agrees well with experiments on transposition that present one pair of stimuli S+ and S– in a simultaneous discrimination and measure the transfer to a second pair of stimuli.
Suppose that in a two-choice apparatus, like the one illustrated in Fig. 3.4, the two visual stimuli are a 4-cm circle that serves as S+ and a 2-cm circle that serves as S–. Suppose further that, after a sufficient number of training trials, the animals are choosing the 4-cm circle on nearly every trial. What should happen if the experimenter alters the problem so that now the animals must choose between a new 8-cm circle and the old 4-cm circle?
If all they learned in the original problem was to choose the 4-cm circle, then they should continue to choose the 4-cm circle because they never received either reward or nonreward for choosing the 8-cm circle. Indeed, they never saw the 8-cm circle before in their lives. On the other hand, if they learned to choose the larger of the two circles, then they should pick the new 8-cm circle at once rather than choosing the formerly correct 4-cm circle. The better they learned to choose the 4-cm circle in the original 4-cm versus 2-cm problem, the more certain the transfer to choosing the 8-cm circle in the new 8-cm versus 4-cm problem.
Transfer experiments of this type have appeared throughout this century using a wide variety of stimulus values and the results are decisive. Animals respond to the relation between the original and the transfer stimuli (larger/smaller, brighter/darker, and so on) rather than to the absolute values. In experiments like the one just described, the animals mostly choose the new 8-cm circle, which is larger than the formerly correct 4-cm circle. This result is called transposition because the subjects transpose the dimensional values of the stimuli in the training problem to the stimuli in the test problem. Cognitive psychologists claimed that reinforcement theory must predict that animals should choose the 4-cm circle in the transposition test between 4-cm and 8-cm. The cognitivists called their theory the relational theory and claimed that habit theories can only predict transfer on the basis of the absolute values of stimuli.
Spence’s Model
Spence (1936, 1937), a close associate of Hull, developed a model based on reinforcement theory plus generalization gradients, which is illustrated in Fig. 13.3. Spence, quite reasonably, pointed out that reinforcement theory entails two gradients of generalization that arise from training on the first problem, a positive gradient of reinforcement originating from the 4-cm circle (solid curve) and a negative gradient of inhibition originating from the 2-cm circle (dotted curve). In Hull’s theory, the negative gradient subtracts from the positive gradient. Figure 13.3 represents the amount of inhibition as dotted bars rising from each stimulus value and represents the difference between excitation and inhibition as white bars between the positive and negative gradients.
Spence assumed that the positive effect of reward is greater than the negative effect of nonreward, which is eminently reasonable. Otherwise, a mixture of reward for going to S+ and nonreward for going to S– would make rats stop running altogether. This assumption is also confirmed by the fact that partial reward is sufficient to maintain most responses. If reward excites and nonreward inhibits, then the excitation of a few rewards must be worth more than the inhibition of many nonrewards.
Notice that the stimulus values in Fig. 13.3 appear in logarithmic units. That is, each stimulus value on the horizontal axis of the graph is double the one before, even though each linear distance along the axis is equal to the one before. This agrees well with the known facts of psychophysics: psychologically equal stimulus intervals usually correspond to equal ratios, that is, to logarithmic steps, in physical units such as length, area, or intensity. In making this move, Spence based his model on the dimensionality of stimuli. The model assumes dimensionality rather than deriving it from a theory of learning or cognition.
In this graph of Spence’s model, the white bar that shows the net positive strength at 4 cm is longer than the one at 2 cm, but it is still not the longest white bar. This is because the negative gradient arising from 2 cm subtracts from the positive value at 4 cm. The positive differential value at 8 cm is greater than the positive differential value at 4 cm, and so Spence’s model also predicts transposition even though it is based on S-R reinforcement principles. The second thing we see is that the white bar at 16 cm is still greater than the white bar at 8 cm, although the difference is less than the difference between 4 cm and 8 cm. This means that there should be some transposition when the test problem is 8 cm versus 16 cm. The rats should choose 16 cm over 8 cm, but transposition should be weaker than it is in the 4-cm versus 8-cm test.
The third thing that we see from Fig. 13.3 is that the white bar at 32 cm is actually smaller than the white bar at 16 cm. This means that in a 16-cm versus 32-cm transfer problem, transposition should fail and the rats should choose the smaller 16-cm circle over the larger 32-cm circle. This is the most important prediction in Spence’s model because it predicts that transposition will fail or even reverse at extreme values. Cognitive theory, on the other hand, must predict that transposition will go on forever, because all the animals learn is relations between stimuli, such as larger or brighter. Thus, Spence’s dimensional model predicts a new result that contradicts cognitive theories.
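The arithmetic of Spence’s model is easy to reproduce. The following minimal sketch subtracts an inhibitory gradient from an excitatory gradient on a logarithmic size axis; the Gaussian shapes and the particular heights and spreads are illustrative assumptions chosen to mimic Fig. 13.3, not Spence’s published values.

```python
# Sketch of Spence's two-gradient model. Net strength at each circle
# size is an excitatory gradient centered on S+ (4 cm) minus an
# inhibitory gradient centered on S- (2 cm), both on a log size axis.
# Gaussian shapes, heights, and spreads are illustrative guesses.
import math

def gaussian(x, center, height, spread):
    return height * math.exp(-((x - center) ** 2) / (2 * spread ** 2))

def net_strength(size_cm):
    x = math.log2(size_cm)                  # log units, as in Fig. 13.3
    excitation = gaussian(x, math.log2(4), height=10, spread=2.5)
    inhibition = gaussian(x, math.log2(2), height=8, spread=1.5)
    return excitation - inhibition

for size in (2, 4, 8, 16, 32):
    print(f"{size:2d} cm: net strength = {net_strength(size):5.2f}")
# With these values, 8 beats 4 (transposition), 16 beats 8 by a smaller
# margin, and 16 beats 32: transposition weakens and finally fails.
```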
Experimental tests of Spence’s predictions (see Hebert & Krantz, 1965; Schwartz & Reisberg, 1991, for reviews) have generally confirmed his prediction that transposition eventually breaks down, but tests have only partly confirmed the more daring prediction of reversed transposition. Ehrenfreund’s (1952) experiment shown in Fig. 13.4 is typical. He tested rats in a T-maze with cards that varied in brightness serving as the stimuli. In Experiment I, S– was white and S+ was a light gray in the original training. In Experiment II, S– was black and S+ was a dark gray. In the transposition tests, the stimuli were intermediate grays presented in pairs chosen to appear at roughly equal steps apart in brightness. The results of the transfer tests appear in Fig. 13.4, which shows that transposition broke down when the transfer pairs were farthest from the original pairs and that there was some evidence for transposition reversal in Experiment II. The eventual breakdown of transposition in all experiments confirms Spence’s model. The weak and inconsistent evidence for eventual reversal of transposition indicates that the model is only partly successful. All of these results plainly contradict the cognitive relational model.
Psychophysics
Riley (1958) conducted the definitive transposition experiment. Riley used a Lashley jumping stand, which is a two-choice apparatus for simultaneous discrimination (see Fig. 3.4). He presented his animals with an original brightness discrimination problem and different transfer problems as in the usual transposition experiment. The difference was that Riley also varied the brightness of the front of the apparatus that surrounded the stimulus cards. As most readers should know, the apparent brightness of a gray patch depends on the brightness of the surround. If the surround is white, a gray patch looks dark, and if the surround is black, a gray patch looks light.
The design of Riley’s experiment appears in Table 13.1. All of the rats first mastered a discrimination between brightness value 3 and brightness value 10, in which 10 was S+ and 1 was the brightness value of the surround. Then they were tested on four different transfer problems.

TABLE 13.1
Design of Riley’s (1958) Experiment

Condition        Surround    Test stimuli
Training              1        3 vs. 10 (S+ = 10)
Near absolute         1       30 vs. 100
Far absolute          1      300 vs. 1,000
Near relative        10       30 vs. 100
Far relative        100      300 vs. 1,000

Note. These are simplified numbers for illustration purposes. The exact values that Riley used are slightly different.
In what Riley called the near absolute test, the surround remained at 1, while the brightnesses of the test stimuli increased to 30 and 100. In Riley’s far absolute test, the surround remained at 1, and the test pairs increased to 300 and 1,000. In what Riley called the near relative test, he raised the brightness of the surround to 10 and the brightness of the test pairs to 30 and 100. In Riley’s far relative test, he raised the surround to 100 and the test pairs to 300 and 1,000. Thus, in one set of tests, called the absolute tests, the relative brightness of the test stimuli increased while the brightness of the surround stayed constant, as in all other experiments on transposition up to that time. In the other set of tests, that Riley called relative, all three brightnesses including the surround increased in the same proportion.
Riley’s absolute tests replicated the earlier transposition experiments. The animals transposed when given the near absolute test in which the surround was constant from training to transfer. That is, they continued to choose the brighter of the new test stimuli even though the new S+ was 10 times brighter than the original S+. Transposition broke down on the far absolute test. The animals failed to choose the brighter of the two new test stimuli when Riley kept the brightness of the surround the same as in original training, but raised the brightnesses of the test stimuli by a factor of 100.
As in earlier experimental tests, which had kept the surround constant, transposition failed at the far absolute test just as Spence predicted. Spence’s most dramatic prediction failed, however. Even at the extreme, when the new S+ was 100 times brighter than the original S+, transposition only failed; there was no reversal. That is, the rats did not choose brightness 300, the test stimulus nearer the original training values, over brightness 1,000. Instead they chose 300 and 1,000 about equally.
The animals transposed completely on both the near and the far relative tests in which Riley raised the brightness of the surround along with the brightnesses of the test stimuli. When the ratio of the test stimuli to the surround stimulus was the same as in original training, transfer was perfect. The animals responded as if nothing had changed. Indeed, the graduate research assistant complained that he, a human, had difficulty telling the relative transfer stimuli apart and had to check the labels on the back of stimulus panels to be sure which set was which.
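If, as Riley’s results suggest, the effective stimulus is the ratio of patch brightness to surround brightness, then the design of Table 13.1 reduces to simple arithmetic. A minimal check using the simplified numbers from the table:

```python
# Worked check: if the effective stimulus is the ratio of patch
# brightness to surround brightness, the relative tests reproduce
# the training problem exactly. Numbers are the simplified values
# from Table 13.1.
conditions = {
    "training":      (1,   (3,   10)),
    "near absolute": (1,   (30,  100)),
    "far absolute":  (1,   (300, 1000)),
    "near relative": (10,  (30,  100)),
    "far relative":  (100, (300, 1000)),
}
for name, (surround, pair) in conditions.items():
    ratios = tuple(patch / surround for patch in pair)
    print(f"{name:13s}: patch/surround = {ratios}")
# training and both relative tests -> (3.0, 10.0);
# the absolute tests -> (30.0, 100.0) and (300.0, 1000.0).
```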
So, animals can transpose indefinitely as cognitive theories should predict, but they transpose for bottom-up perceptual rather than top-down cognitive reasons. Real animals see only relative brightness. Without an artificial meter, human beings also see only relative brightness. Psychological theories that treat living organisms as though they can respond to particular levels of brightness, size, color, and so on, treat the animals and the tasks as abstractions. Real animals are very different from meters. Animals respond to relative values rather than absolute values because they see only relative values in the first place.
SIGN STIMULI
Ethologists soon recognized the dimensionality of the sign stimuli that evoke species-specific action patterns (discussed in chap. 4). Herring gulls only sit on nests that contain eggs. What is the sign stimulus that makes a herring gull brood egglike objects? Tinbergen (1953a, pp. 144–159) removed eggs from temporarily unattended nests of herring gulls and placed them in a nearby empty and unattended nest. At the same time, he placed artificial eggs in a third nest, equally nearby. Sometimes he placed artificial eggs in both test nests. The gulls chose by brooding on the objects in one of the nests. Tinbergen varied such things as shape and color and found that the gulls usually chose gull eggs over artificial eggs, but they chose some artificial eggs over others. He found that size was a dimension that mattered very much to brooding gulls. When he offered them a choice between a herring gull egg and an artificial egg that had the same shape and color but was only half the normal size, all of the gulls chose the normal gull’s egg. When he offered them a choice between a normal egg and an artificial egg that was double the normal linear size—that is, eight times the volume of a normal egg—all gulls chose the giant egg. “All birds which were given the large egg became very excited and made frantic attempts to cover it. In doing so, they invariably lost their balance, and their evolutions were, I must confess, most amusing to watch” (p. 158).
Magnus (1963) found the same phenomenon in the mating behavior of the fritillary butterfly, Argynnis paphia. Butterflies live to mate, but how do males recognize females fluttering about in the fields? How do they discriminate female Argynnis paphia from other butterflies or even flowers in a meadow? Magnus moved artificial lures in pairs in an experimental room to see what sort of lure would attract males of this species. His lures were many times larger than normal fritillaries, but this seemed to be acceptable to the butterflies. The lures were cylinders with black stripes alternating with yellow stripes made by pasting the wings of female butterflies onto the cylinders. When Magnus rotated the cylinders, they presented a flickering pattern to the male butterflies like the flickering pattern of the wings of a flying female butterfly. The males chose lures with a flicker speed like the flicker of normal wings over slower flicker speeds, but they also chose faster speeds over the normal speed. In fact, like the herring gulls that chose the giant eggs, the fritillaries overwhelmingly chose a flicker speed far faster than any that a live butterfly could ever produce.
Magnus also varied the color of the stripes. He offered a cylinder with yellow stripes made of fritillary wings and cylinders with painted yellow stripes. The artificial yellows varied in saturation, which in the present context is similar to intensity of colors. The butterflies chose the normal fritillary yellow over less saturated yellows, but they also chose more saturated yellows over the normal yellow. Just as with flicker speed, their favorite yellow was much more saturated than anything that they could ever see on a female’s wings.
Human beings are also sensitive to species-specific sign stimuli. The profiles of newborn mammals are very different from the profiles of adults. Their jaws and snouts are severely reduced relative to their eyes and brain cases. It is a nursing face. Jaws and snouts develop adult proportions when the animals have to eat for themselves. Human babies also have a nursing face that attracts human beings. Artists take advantage of this and draw superbabies on greeting cards and illustrations, babies with impossibly large eyes and brain cases relative to their tiny mouths and noses. The nursing faces of other baby mammals also attract both juvenile and adult humans. Artists use this dimension very effectively to create super-adorable animals in animated films.
B. T. Gardner and Wallach (1965) showed that university students are sensitive to the dimension of shape that defines the relationship between human adults and human babies. They measured silhouettes of babies and adults to define the dimension of shape and used this dimension to create the superbabies illustrated in Fig. 13.5. In paired comparisons, undergraduates chose all of the superbabies as more babyish than the silhouette of an actual baby. This shape dimension is fairly complex, but it has the dimensional properties of sign stimuli.
Ethologists call sign stimuli that are impossible in nature, but more effective than naturally occurring stimuli, supernormal stimuli. Supernormal stimuli show that sign stimuli are dimensional. Eggs that are too small repel brooding gulls, but eggs that are too large attract them. Wings that flicker too slowly or are not yellow enough repel male butterflies, but superfast flicker and superyellow wings attract them. Why are living animals sensitive to impossible stimuli? The reason is that relative sensation is cheaper and more efficient than absolute sensation.
Dimensional Stimuli
Gradients of generalization and transposition only puzzle those who imagine separate receptors for each stimulus, in the case of Fig. 13.2, separate retinal receptors for each wavelength in the visible spectrum. Sensory research plainly contradicts this view. Research on human color vision (Kaiser & Boynton, 1996; Webster, 1996) points to a system of three color receptors with overlapping ranges of sensitivity that work like the overlapping categories of Kipersztok and Patterson’s (1995) fuzzy controller described in chapter 9. As in the case of fuzzy controllers, a relatively small set of overlapping sensors does the job more economically and efficiently than a large array of separate receptors for each value on a dimension (Erickson, Di Lorenzo, & Woodbury, 1994).
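A minimal sketch of this economy, assuming three Gaussian tuning curves as stand-ins for the overlapping color receptors; the peak wavelengths and bandwidth are illustrative placeholders, not measured human values.

```python
# Sketch: a few broadly tuned, overlapping sensors encode a whole
# dimension by their pattern of relative activity. The three Gaussian
# channels below are illustrative stand-ins, not measured human cone
# sensitivities.
import math

CHANNELS = {"S": 440, "M": 530, "L": 560}   # assumed peak wavelengths, nm
BANDWIDTH = 50                              # assumed spread, nm

def channel_responses(wavelength_nm):
    return {name: math.exp(-((wavelength_nm - peak) ** 2)
                           / (2 * BANDWIDTH ** 2))
            for name, peak in CHANNELS.items()}

# Each wavelength produces a distinct ratio of channel activities, so
# three sensors cover the dimension without one receptor per value.
for wl in (480, 550, 590, 620):
    resp = channel_responses(wl)
    print(wl, {name: round(r, 2) for name, r in resp.items()})
```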
Chapter 4 introduced the principle of parsimony in scientific theories. In comparisons between theories, the best theory is the one that accounts for the most evidence with the fewest assumptions. Biologists place a high value on parsimony for an additional reason. Competition for survival favors economical and efficient biology. Kipersztok and Patterson’s (1995) fuzzy controller appeals to biologists for the same reason that it appeals to the Boeing Aircraft Company; it does the job economically and efficiently.
Throughout science and industry, from antique beam balances to electronic strain gauges, the most economical and efficient measuring devices sense relative values of overlapping categories. It is cheaper and more efficient for fritillary butterflies to respond to relative yellow rather than to the absolute yellow of actual female wings.
Neural Networks
Animals extract dimensional and patterned information—color, brightness, curvature, angularity, and so on—from the blooming buzzing confusion of the natural world around them. This baffled early psychologists, who often attributed dimensional and patterned perception to mysterious cognitive abilities. Early simple-minded mechanistic models, such as Spence’s outlined earlier in this chapter, had more power than systems that only considered specific stimulus-response conditioning. They were also more parsimonious than cognitive systems and they made precise and even counterintuitive predictions, but they were clearly too crude to cope with the whole problem.
In modern times, fairly simple-minded electronic computers have taken some of the mystery out of complex psychological phenomena. The idea that a mechanical device could solve complex problems has a long history. In 1936 the mathematician and logician, Turing, showed how a mechanical device that could only write either a 1 or a 0 on squares of paper could, given enough paper and time, carry out any computation that can be specified exactly. Turing’s proof described a machine that would work in principle but was far beyond the technology of the 1930s, not to mention all that paper and time.
There were dramatic advances in computer technology during World War II, however, and Turing himself was a leader in these advances. In 1943, McCulloch and Pitts showed how a network of logical units that operated as switches with two positions, on or off, corresponding to Turing’s 1s and 0s, could compute sophisticated logical propositions limited only by the number of units in the network. They proposed a model of a brain based on existing computer technology and introduced the term neural networks. Next, Hebb (1949) showed how a neural network could extract patterned information from sensory input by a self-organizing system that he called the cell assembly. Hebb conjectured that, if units that fired together had an increased likelihood of firing together again, then groups of cells could organize themselves into cell assemblies that recognized repeating patterned and dimensional information.
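Hebb’s conjecture reduces to a simple update rule: strengthen the connection between any two units in proportion to their joint activity. A minimal sketch, assuming binary units and an arbitrary learning rate:

```python
# Minimal sketch of Hebb's conjecture: strengthen the connection
# between any two units in proportion to their joint activity, so a
# repeatedly co-active group becomes a mutually excitatory cell
# assembly. Binary units and the learning rate are illustrative
# simplifications.
import itertools

N_UNITS = 6
weights = [[0.0] * N_UNITS for _ in range(N_UNITS)]
pattern = [1, 1, 1, 0, 0, 0]          # units 0-2 repeatedly fire together

def hebb_step(w, activity, rate=0.1):
    for i, j in itertools.permutations(range(N_UNITS), 2):
        w[i][j] += rate * activity[i] * activity[j]

for _ in range(10):                    # repeated presentations
    hebb_step(weights, pattern)

print("inside the assembly:  w[0][1] =", round(weights[0][1], 2))  # 1.0
print("outside the assembly: w[0][4] =", round(weights[0][4], 2))  # 0.0
```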
The limited speed and capacity of early computers plagued the first attempts to build an electronic computer based on Hebb’s cell assemblies and McCulloch and Pitts’ neural networks. Computer scientists persisted, however, and more powerful computers appeared that use artificial neural networks to extract useful amounts of dimensional and patterned information from the natural world (J. A. Anderson, 1995). Webster (1996) shows how previously intractable problems of dimensional adaptation and contrast in human color vision can be solved if the visual system uses mechanisms like those found in artificial neural networks. Bateson and Horn (1994) proposed a model based on artificial neural networks that simulates major perceptual phenomena of imprinting (chap. 2), such as recognition of fine detail, while generalizing to classes of stimuli. The mysteries of patterned and dimensional perception are fading away.
HABITS, HYPOTHESES, AND STRATEGIES
The first part of this chapter discussed transfer based on stimulus similarity. This next part considers transfer based on problem-solving strategy.
Habits Versus Hypotheses
In a two-choice, simultaneous discrimination, the experimenter makes the left-right arrangement of S+ and S– random, or at least unpredictable. The subjects, however, respond in predictable ways because they are natural biological systems. Truly random behavior must be rare in nature. Gambling casinos try very hard to produce truly random sequences of events, but they always fail to some extent. Casinos must constantly change equipment such as dice and roulette wheels to prevent their customers from winning by memorizing slight deviations from randomness that appear in all such devices.
Usually, animals in a two-choice apparatus respond exclusively left or right for dozens, even scores, of trials. Sometimes, particularly early in training, they make alternating runs of trials to one side or the other, and even switch from side to side trial by trial for several trials. Experimenters take pains to see that the left-right arrangement of S+ and S– varies unpredictably so that the nonrandom choices of the animals earn rewards 50% of the time on average. In terms of correct and incorrect choices, the typical learning curve of an individual animal stays flat around 50% for many trials.
Once response to S+ starts to rise above 50%, it rises steeply until it reaches 100%. Group learning curves look much more gradual because experimenters pool and average the data for many subjects. Because different subjects start rising at different points, the average rise looks smooth. In reinforcement theories, correct choices of S+ and avoidance of S– grow out of reward and nonreward. In feed forward theory, a rhythm of responding within trials builds up because, by definition, rhythms require repetition. Both theories assume that animals are learning even when their choices yield only 50% reward in what is called the presolution period.
Cognitive theorists have taken a different view of the function of the presolution period. According to Lashley (1929), the form of the learning curve “suggests that the actual association is formed very quickly and that both the practice preceding and the errors following are irrelevant to the actual formation of the association.” According to Krechevsky (1932), “Learning consists of changing from one systematic, generalized, purposive way of behaving to another and another until the problem is solved” (p. 532). In Krechevsky’s (1938) analysis of discrimination learning, the subject tries out hypotheses during the presolution period:
Once the animal is immersed in a given problem-situation, the animal selects out of the welter of possible stimuli certain sets of discriminanda to which he reacts. Each time (while “paying attention to” this particular set of discriminanda) he makes what proves to be “correct response,” he learns (wrongly perhaps) something about the significance of this particular stimulus; each time he makes a “wrong” response he learns something else, but he does not learn anything about the “correctness” or “wrongness” of the to-be-final-learned set of discriminanda. Eventually he gives up responding to his first set of discriminanda and responds to another set, and another set, etc. (p. 111)
In this view, subjects actually learn nothing about S+ and S– during the presolution period because they are trying out hypotheses about other solutions to the discrimination problem. It is only after they have tried and rejected the other hypotheses that they begin to try hypotheses about which is correct, S+ or S–, and only then do they form a preference for S+ over S–. This suggested an experimental test.
Suppose that the experimenter reverses the correctness of the two stimuli during the presolution period. If black was serving as S+ and white as S–, then the experimenter would switch to reward for white choices and nonreward for black choices. If the animals had, indeed, learned nothing about the correctness of black and white before the reversal, as Krechevsky claimed, then they should do as well as animals that had the original problem, black+ versus white– throughout. According to reinforcement theory, of course, there should be negative transfer with S+ and S– reversed because the differential strength of black over white had already begun to form even though the subjects were still responding according to position or alternation biases.
Krechevsky’s own experiments and several others conducted by cognitive theorists agreed with his prediction. Groups of rats that had S+ and S– reversed during the presolution period performed roughly as well as groups of rats that stayed on the same problem from start to finish. Krechevsky tested this, however, by counting the first trial of reversal as Trial 1 for the reversed groups and counting the first trial of original training as Trial 1 for the group that had the same problem throughout.
The error here is an error in operational definition. It is very similar to the error that cognitive theorists made in the latent learning experiments discussed in chapter 5. Karn and Porter (1946) showed how familiarity with general aspects of the experimental procedure, such as handling, deprivation schedules, the type of apparatus, and so on, are critical factors for a rat that must solve an experimental learning task. During the presolution period, both the reversed and the constant groups are learning all of these nonspecific aspects of their task as well as learning to choose S+ over S–. For the groups that stayed on the same problem from start to finish, Krechevsky and his cognitive colleagues counted all of the trials from start to finish. For the groups that had S+ and S– reversed, however, they counted only the trials after the reversal. Obviously, they failed to count some important trials. When trials to the solution criterion are counted from the start to finish for both groups, then the groups that reversed during the presolution period always required more total trials than the groups that continued with the same problem throughout. That is, the reversed groups showed the negative transfer that habit theories predict (see Hall, 1976, pp. 360–363; Kimble, 1961, pp. 128–134, for more extensive discussions of this problem).
Overtraining and Reversal
In another claim for the superiority of cognitive theory over reinforcement theory, rats were overtrained, that is, run for many trials after reaching criterion, before reversal. Reid (1953) was the first to demonstrate the overtraining reversal effect. He trained three groups of rats to make a black-white discrimination in a Y-maze. All groups first reached a criterion of 9 out of 10 correct responses. Following this, one group was reversed immediately, a second group continued on the original problem for 50 trials of overtraining before reversal, and a third group received 150 overtraining trials before they were reversed. Counting trials from the point of reversal, the immediate reversal group required a mean of 138.3 trials to reach criterion on the reversal problem, while the 50-trial overtrained group required a mean of 129 trials, and the 150-trial overtrained group required a mean of only 70 trials.
The overtraining reversal effect attracted a great deal of experimental and theoretical interest for many years (see Gardner, 1966; Hall, 1976, pp. 370–373; Mackintosh, 1974, pp. 602–607, for detailed reviews). A popular cognitive view was that overtrained subjects learn to attend better to critical aspects of the stimuli (Flaherty, 1985; Sutherland & Mackintosh, 1971). But, why should overtraining have an attentional effect? In other situations, repetitious drill lowers attention and impairs problem solving as illustrated in the section on learning sets and problem solving later in this chapter.
A positive effect of overtraining on reversal only contradicts reinforcement theory if reinforcement theory must predict that repetition can only strengthen a discrimination. This grossly underestimates the depth and generality of reinforcement theories such as those of Hull and Spence. As explained in chapter 10, those theories predict the opposite: that overtraining eventually produces a reduction in response strength, as confirmed by runway experiments. In addition, Wolford and Bower (1969) applied Hull-Spence theory directly to overtraining and reversal in a two-choice discrimination and showed how it predicted faster reversal after overtraining.
In either case, however, there is an experimental error in the usual interpretation of the overtraining reversal effect. This is the same error as in the latent learning experiments and presolution hypothesis experiments. A fair test must add the overtraining trials to the reversal trials because the overtrained animals did receive these trials. In the case of Reid’s (1953) typical finding, the immediate reversal group only took 138.3 trials to master the reversal after reaching the 9 out of 10 criterion on the original problem, the 50-trial overtraining group took 129 + 50 or 179 trials to reach criterion on the reversed problem, and the 150-trial overtraining group required 70 + 150 or 220 trials for reversal. That is, the sooner they got the reversed problem, the sooner they solved it—if we count total trials. The following example illustrates this point.
Suppose that a colleague at a university medical school needs to study the effect of diet on vision and consults a comparative psychologist about the best way to solve the following practical problem. To study vision our medical colleague trained rats on a black versus white discrimination in a T-maze and rewarded them all for choosing black. Now, our colleague realizes that to test the physiological hypothesis all of the rats should approach white rather than black. Furthermore, the research project is running behind schedule, so the time it takes to reverse the animals is important. What should we advise? Should we tell our colleague in the medical school to give the rats the reversal problem immediately, or should we advise 100 or 200 trials of overtraining before the reversal?
Coate and R. A. Gardner (1965) tested this directly when they trained two groups of rats to criterion on the same discrimination problem in the apparatus of Fig. 3.4. They varied the amount of experience with the experimental procedure by mixing in different numbers of trials on a second problem. The group that had more experience with the experimental procedure reversed faster even though both groups should have performed equally well according to the cognitive theory of reversal.
In the case of latent learning without reward, presolution hypotheses, and overtraining reversal, cognitive theorists insisted that reinforcement theories can only explain strengthening by reward and weakening by nonreward. In the traditional cognitive view, anything else that animals learn from experience must be cognitive. Both human and nonhuman animals learn a great deal from their experience in the experimental procedure, and experimental operations must control for the amount of experience before attributing results either to cognition or to reinforcement.
Learning Sets
The phenomenon that Harlow called learning sets is one of the great discoveries of 20th-century psychology. It was the work of an ingenious experimenter trying to find out how intelligent his monkeys were, rather than the work of a clever theorist out to prove how intelligent he was.
To test his monkeys, Harlow designed an apparatus that he called the Wisconsin General Test Apparatus (WGTA), shown in Fig. 13.6. In the WGTA, the experimenter places various pairs of objects over two food wells. One well is baited with food; the other is empty. The left-right arrangement of baited and unbaited wells varies in a counterbalanced and unpredictable sequence. If a monkey displaces S+, the object over the baited food well, it gets the food. If the monkey displaces S–, the object over the empty well, it gets nothing. In either case, the experimenter immediately removes the tray bearing the objects and food wells to set up the next trial. Harlow took advantage of the fact that monkeys can respond to a very large range of stimuli and used what he called “junk” objects as S+ and S–. That is, he and his assistants shopped at large stores and bought a variety of cheap objects, such as buttons, spools of thread, control knobs, and so forth. With these “junk” objects, Harlow and his associates could present hundreds of problems to a monkey over a long period of time without repeating any given pair of objects.
The monkeys could try to solve these problems by always choosing, say, the larger object of each pair because the larger object was correct on a previous problem. Or, they could always choose the rounder, or the darker, or the redder object for the same reason. Such strategies would have to fail because the positive and negative objects varied randomly in size, shape, color, and so on, from problem to problem.
Harlow found that, while naive monkeys could take as many as 100 trials to reach criterion with the early pairs of objects, they steadily improved and solved later problems in fewer and fewer trials. If we call a particular pair of objects A and B, and A serves as S+, then on the first trial with A and B the monkey cannot know which object is S+. On average, the best it can do is 50% correct. If the monkey correctly chooses A on Trial 1, then it can be correct 100% of the time from then on, if it chooses A on Trial 2 and on every trial after that. If the monkey chooses B on Trial 1, then it is incorrect on that trial, but it has enough information to choose A on Trial 2 and every trial after that and can also choose correctly 100% of the time starting with Trial 2. Each problem consists of a fresh pair of objects. After the first trial with a new pair of objects, the monkeys can win every time if they always repeat their correct choices and never repeat their incorrect choices—that is, if they use a win-stay/lose-shift strategy.
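The win-stay/lose-shift strategy is simple enough to state as a few lines of code. This sketch, with illustrative object names and trial counts, shows why the strategy guarantees reward on every trial after the first:

```python
# Sketch of the win-stay/lose-shift strategy on one Harlow-style
# problem. Object names, the hidden S+ assignment, and the trial
# count are illustrative.
import random

def run_problem(n_trials=6):
    objects = ("A", "B")
    s_plus = random.choice(objects)       # experimenter's hidden choice
    choice = random.choice(objects)       # Trial 1 can only be a guess
    outcomes = []
    for _ in range(n_trials):
        rewarded = (choice == s_plus)
        outcomes.append(rewarded)
        if not rewarded:                  # lose -> shift to the other object
            choice = "B" if choice == "A" else "A"
        # win -> stay with the same object
    return outcomes

# Whatever happens on Trial 1, Trials 2 through n are always rewarded:
print(run_problem())    # e.g. [False, True, True, True, True, True]
```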
Harlow found that, as the monkeys proceeded from the first problem, A versus B, to the next problem, C versus D, and the next, E versus F, and so on, they improved steadily. Eventually, they were solving new problems within the minimum possible two trials. Figure 13.7 shows the improvement on Trials 2 through 6. Each line shows the averages at successive stages of improvement. During the latest stage, Problems 201 to 312, the monkeys were nearly always correct from Trial 2 onward. They had learned a strategy for solving this kind of problem. Harlow and others replicated this finding many times. Later experiments with the WGTA showed that monkeys can also adopt other strategies such as win-shift/lose-stay. Monkeys can even shift from strategy to strategy as the task demands (McDowell & Brown, 1963a, 1963b).
These animals can learn general strategies that are independent of the particular objects that serve as S+ and S–, even more general than broad stimulus qualities such as color and shape. This ability to develop an overall strategy to solve a series of problems, regardless of the particular stimuli, is really much more intelligent than the hypotheses or the cognitive maps proposed by theorists such as Krechevsky and Tolman.
In the original 1949 study, Harlow gave his monkeys 50 trials per problem at first, and then gradually reduced the number of trials to 6 per problem, since he was only interested in performance on the second trial and the first few after that. In later experiments, Levine, Harlow, and Pontrelli (1961) and Levine, Levinson, and Harlow (1959) gave different groups of monkeys 3, 6, or 12 trials per problem from the start and found that they required fewer problems to achieve learning sets—100% correct choices starting on Trial 2—if they had fewer trials per problem. That is, the fewer the trials per problem, down to the minimum of three used in these experiments, the faster the monkeys learned the winning strategy.
In the early problems, as Fig. 13.7 shows, it took much more than 12 trials for a monkey to master a problem. At only three trials per problem, Harlow and his associates were certainly switching to new pairs of objects before the monkeys mastered the early problems. Other experiments have shown that monkeys can form learning set strategies with only two trials per problem. They can do this even if the first trial on each of 16 different problems appears on one day and the second trial appears on the next day with the sequence of the 16 problems randomly shuffled from Day 1 to Day 2 (Bessemer & Stollnitz, 1971).
Monkeys, at least, perform more intelligently with less drill and more varied experience. It is rather the reverse of the traditional doctrine that human learners progress faster if they are drilled on each succeeding problem until they have overlearned it—the “one-step-at-a-time” strategy of the Skinnerian behaviorists. The problem may be that conventional reinforcement theory applies to habit and skill rather than to intelligent strategies. Habits and skills depend on rhythmic repetition, actually getting into a rut. Intelligent problem solving is quite different. A problem usually begins when a well-practiced habit fails. The more practice, the more difficult it is to abandon the old solution and try new solutions.
PROBLEM SOLVING VERSUS HABIT
Here is a little test that you can try on trusting friends. Try asking them to pronounce the words you spell out loud. Try M - A - C - D - U - F - F. Then a few like M - A - C - T - A - V - I - S - H, or M - A - C - H - E - N - R - Y, and then, perhaps, M - A - C - H - I - N - E - R - Y. Presented orally, this little test in negative transfer trips up many English-speaking people.
R. A. Gardner and Runquist (1958) used a variant of this pronunciation test to study an everyday type of problem-solving skill in college students. Each student subject had the following type of problem. They were to imagine that they had three empty jars, and each jar was a different size. The problem was to measure out a fourth volume of water using just those three jars and an unlimited supply of tap water. Each problem consisted of a 3 × 5 inch index card with four numbers on it as in the following example:
A     B     C     X
50    81    7     17
It could take the average undergraduate a few minutes to solve this problem by filling Jar B with 81 units, measuring out 50 units into Jar A, and then measuring 7 units out into Jar C twice. They had to indicate this by writing 81 – 50 – 7 – 7 = 17 on the card for that trial. Each student solved 6, 11, or 21 problems of the same kind one after the other in rapid succession as a training series. Each problem had different numbers but could only be solved by the same formula as the first, B - A - C - C. At the end of the training series they got the following problem:
A     B     C     X
21    52    9     12
This problem is much easier than the problems in the training series. It can be solved by writing 21 – 9 = 12. After training with the difficult problems, the easy problem became very difficult. After solving 21 examples of B - A - C - C problems, some subjects worked for more than 10 minutes before they could solve the first A - C problem. The students acted as if conditioned to solve all the problems with B - A - C - C. They failed, of course. But, they repeated the same thing again and again as if blaming their failure on some error in their arithmetic. Next, they tried variations of B - A - C - C, such as B - C - C - A or B - C - A - C. They acted as if they had to extinguish the solution that had worked so many times in the past by repeating it and failing over and over again. Next, they tried other difficult solutions with many steps. But, these also failed. They acted as if they had to extinguish the strategy of looking for a complicated solution before they could try something simple like A - C.
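The structure of these problems is easy to reproduce. The following sketch, which uses the signed-jar arithmetic shorthand of the text rather than a pour-by-pour simulation, searches for the shortest formula; the four-term depth limit is an illustrative choice:

```python
# Sketch: a brute-force search over the signed-jar shorthand used
# above (B - A - C - C and the like), abstracting away the actual
# pouring. Returns the shortest formula that hits the target.
from itertools import product

def solve(a, b, c, target, max_terms=4):
    jars = {"A": a, "B": b, "C": c}
    for n in range(1, max_terms + 1):
        for names in product(jars, repeat=n):
            for signs in product((1, -1), repeat=n):
                if signs[0] == -1:
                    continue              # must begin by filling a jar
                if sum(s * jars[m] for s, m in zip(signs, names)) == target:
                    return " ".join(("+" if s > 0 else "-") + m
                                    for s, m in zip(signs, names))
    return None

print(solve(50, 81, 7, 17))   # +B -A -C -C  (the trained hard solution)
print(solve(21, 52, 9, 12))   # +A -C        (the short solution drill hides)
```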
With practice, the subjects took less and less time to solve the hard problems, as shown in the line labeled “Pre-Test” in Fig. 13.8. If speed of response measures strength of conditioning, then the stronger the conditioning the less time it should take to solve the training problems, and that is what happened. The more practice they had with hard problems, the longer it took them to solve the first easy problem, as shown in the line labeled “Extinction” in Fig. 13.8. That is just what should happen if they had to extinguish the hard solution before they could solve the easy problem.
Now, if the students had to extinguish the hard solution before they could solve the easy problem, then that one trial of extinction should bring all of the subjects down to the same level of extinction of the hard solutions. To test this, Gardner and Runquist followed the easy A - C trial with one more problem of the hard type, B - A - 2C. All of the subjects took about the same time to solve the postextinction problem, even though some had practiced that solution for only 6 trials during the training series and others had practiced it for 21 trials. This appears in the line labeled “Post-Test” in Fig. 13.8. The Post-Test averages are not precisely level, but they come very close and the differences were statistically insignificant. The substantially equal performance of all three groups on the Post-Test agrees with the notion that they had to extinguish the hard solution to a similar level before they could try the easy solution.
The response times in the graph of Fig. 13.8 are in logarithmic units. This is a common practice with time scores because the actual results of response time measures are often badly skewed. That is, there was a floor on the scores because the subjects had to take at least a few seconds to write their answers. But there was no ceiling; they could take a few minutes, or in the case of extinction trials, many minutes to solve the problem, and many did. The very high scores tend to distort the averages, but using logarithms of the time scores often corrects the problem.
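A small, invented example shows why the logarithm helps; the sample times below are made up to mimic a floor of a few seconds with no ceiling:

```python
# Invented sample to show why log units tame skewed time scores:
# a floor of a few seconds, no ceiling, and one extreme solver.
import math
import statistics

times = [8, 9, 10, 12, 15, 20, 600]               # seconds (illustrative)
print(statistics.mean(times))                      # ~96.3, ruled by the outlier
log_mean = statistics.mean(math.log10(t) for t in times)
print(round(10 ** log_mean, 1))                    # ~20.6, a more typical value
```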
The highly intelligent college students in this experiment discovered a relatively difficult solution to a problem and got better and better at solving fresh problems with the same solution. The more they practiced the difficult solution, however, the harder it got for them to solve a much easier problem. The Post-Test results show that the habit of trying the difficult solution first made it harder for them to find the easy solution (see A. S. Luchins & E. H. Luchins, 1994, for a history of the use of water jar problems to study the negative effect of drill on human problem solving).
There is a lesson here. Most of the research discussed in the early part of this book deals with habits and skills, with well-practiced rhythmical patterns of response to predictable rhythmical patterns of stimuli. Habit and skill are essential modes of response. What happens when a well-practiced habit fails? Many people start out for work or school by getting their things together and going out to the car. They start the car and drive off. Often, they cannot remember the well-practiced steps that got them to their destination. But, suppose that the car fails to start when you turn the key in the ignition. Then you have a problem. Most people turn the key a few more times to make sure. Those who persistently turn the key in the ignition over and over again are failing to face the problem. The most likely result for them is a dead battery and more problems. You can also start swearing or go out and kick the tires. Responses of that sort can be well-practiced habits also.
Clearly, there is a conflict between habit and problem solving. Both are necessary for survival, but they interfere with each other. Even simple problem-solving strategies like the learning sets formed by Harlow’s monkeys suffer from overtraining, but thrive in conditions that foster variability. A truly powerful theory must account both for well-practiced habit and for variable problem solving. The lesson for the student is to profit from skills, and even to be persistent in the face of failure, but at the same time to recognize when failure reveals a problem that requires a fresh and variable attack.
COMPARATIVE INTELLIGENCE AND INTELLIGENT COMPARISONS
At the beginning of the 20th century, Köhler (1925/1959) studied a group of captive chimpanzees on the island of Tenerife off the coast of Africa. Köhler invented ingenious problems that the chimpanzees had to solve with objects that they could find in a large testing arena. To get a banana suspended from the ceiling, they had to drag a large box from a distance to a point under the lure. When Köhler raised the banana higher, the chimpanzees had to drag two, three, and even four boxes and then stack them before they could reach the lure.
Köhler’s chimpanzees also solved problems by pulling in lures attached to strings and by reaching through barriers with sticks. In a particularly difficult, and therefore interesting, problem the chimpanzee had two bamboo sticks to work with, but neither stick was long enough to reach the lure. At least one young chimpanzee, named Sultan, spontaneously inserted the thin end of one stick into the hollow end of the second, thus making a longer stick. Sultan then ran with this new object to draw in food that was out of reach with either of the short sticks. Sultan extended this skill to a situation in which the food was so far away that he had to join three sticks together. The three-stick tool was awkward to handle, so Sultan would shorten the tool by disconnecting sections of bamboo as he drew the food closer (pp. 113–119).
Köhler revealed intelligence by giving chimpanzees interesting problems to solve and keeping them in relatively free and stimulating conditions, as opposed to the usual mind-numbing confinement of caged life. As Köhler (1925/1959) put it:
… there has arisen among animal psychologists a distinct negativistic tendency, according to which it is considered particularly exact to establish non-performance, non-human behavior, mechanically-limited actions and stupidity in [nonhuman] animals…. For my part, I have tried to be impartial, and I believe that my description is not influenced by any emotional factor, beyond a deep interest in these remarkable products of nature. (p. 241)
Köhler’s chimpanzees were captured in the jungle and sold to the laboratory by traders, so he could not know their past experience or even their ages at the beginning of his experiments. Later, working with laboratory-born and -reared chimpanzees, Birch (1945) discovered how skill with sticks depended on past experience with sticks. Birch studied six juvenile chimpanzees who were between 4 and 5 years old, which is quite young for chimpanzees (see chap. 14). He gave them a series of problems that they had to solve by using strings and sticks to reach food on a special table just outside of their testing cage.
After a series of string problems, Birch gave each chimpanzee the problem shown in Fig. 13.9. One end of a stick was within reach of the chimpanzee. The other end had a cross-bar. Near the cross-bar was the usual banana. All the chimpanzees had to do was to pull on the stick and rake in the food. Of the six subjects, only Jojo solved the problem by raking in the lure. A second subject, Bard, solved the problem without using the stick as a rake. While agitated by frustration, Bard happened to hit the cage end of the rake in such a way that the other end hit the food and moved it closer to the bars of the testing cage. After observing this, Bard hit the stick repeatedly until he could reach the food with his hand. The remaining four chimpanzees failed to solve the rake problem within one hour.
One striking difference between Jojo and the other five chimpanzees was that Jojo had previous experience using a stick as a tool. She had already taught herself to use a stick to operate a light switch outside her living cage, and to unscrew light bulbs from a socket that was also out of reach beyond the cage.
Birch next released the chimpanzees as a group into a large outdoor area and left them there to run and climb and play freely for 3 days. Birch had left many 1 × 1 inch sticks, some 16 inches long and some 23 inches long, for the chimpanzees to find in the play area. Provided that their previous confinement has not been too severe or too long, all chimpanzees play with any objects they can find. All six chimpanzees of Birch’s study group were soon playing with the sticks, running with them, hitting with them, and poking with them. The sticks evoked manipulation. After only 3 days of free play with sticks, Birch returned the chimpanzees to the rake problem, and the slowest of them took only 20 seconds to rake in the food.
Birch then gave the chimpanzees eight more problems in which they had to use straight sticks without cross-bars to bring in food that was out of reach. In the first problem the stick was beside the food, just as in the rake problem. In another problem it was entirely inside the cage, but near the bars separating the chimpanzees from the lure. In another problem the stick was at the opposite side of the cage from the bars, so the chimpanzees had to turn away from the lure to find the stick. In another problem the chimpanzees had to use a short stick to bring in a longer stick and then use the longer stick to bring in the food.
After eight problems of that sort, Birch gave them a ninth problem that was quite different. In the ninth problem, Birch let the chimpanzees watch him place food inside a length of pipe. The only way that the chimpanzees could get the food out of the pipe was by poking it out with the stick that Birch left lying near the pipe. This problem baffled these otherwise clever and stick-wise chimpanzees. The first thing that they all did, of course, was to pick up the stick and run to the bars to find the food on the table outside of the cage because they had practiced that so many times before. Within the allotted hour, only three of them stopped trying the old problem solution and took the stick back to poke through the pipe. Jojo solved the new problem faster than the others, as we might expect. The remaining three became frustrated and agitated when they failed to find any food outside of the cage, and eventually lost all interest in the problem. The young chimpanzees behaved very much like the college students in Gardner and Runquist (1958) and in many other studies of humans reviewed by Luchins and Luchins (1994).
Chapter 9 described an experiment in which Harlow et al. (1950) left special latching hardware mounted on the walls of cages for monkeys to solve without any extrinsic reward. Figure 13.10 is a diagram of the latching devices. To unlatch these devices, a monkey had to remove the pin attached to the chain, then lift the hook holding the hasp in place, then open the hasp by swinging it on its hinge. A monkey had to perform all three steps in that fixed order to open the hasp.
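The force of the fixed order is easy to miss in prose, so the following minimal Python sketch models it; the step names and the try_sequence function are our own illustrative inventions, not part of Harlow et al.’s (1950) apparatus or analysis. The model scores an out-of-order attempt as the kind of anticipatory error that becomes important in Phase III below.

    # Illustrative model of the three-step latch (step names hypothetical).
    STEPS = ["remove_pin", "lift_hook", "swing_hasp"]

    def try_sequence(attempted_steps):
        """Return (opened, errors): whether the hasp ends up open and
        how many out-of-order (anticipatory) attempts were made."""
        next_required = 0   # index of the next step the latch will accept
        errors = 0
        for step in attempted_steps:
            if next_required < len(STEPS) and step == STEPS[next_required]:
                next_required += 1      # correct step, taken in order
            else:
                errors += 1             # anticipatory or wasted attempt
        return next_required == len(STEPS), errors

    # A practiced monkey performs the steps in order: opened, no errors.
    print(try_sequence(["remove_pin", "lift_hook", "swing_hasp"]))  # (True, 0)

    # A monkey that grabs at the hasp first still opens the latch,
    # but only after an anticipatory error.
    print(try_sequence(["swing_hasp", "remove_pin", "lift_hook",
                        "swing_hasp"]))                             # (True, 1)

The point of the sketch is only that the device rewards order, not effort: grabbing at the hasp before removing the pin and lifting the hook accomplishes nothing.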
There were two groups of four monkeys in this experiment. In Phase I, the experimenters set one latch in each cage for 12 days. For Group A the latches were assembled as shown in Fig. 13.10. For Group B the latches were disassembled with the hasps in their open position. The experimenters checked several times a day and reassembled any of the Group A latches that they found opened. They also checked the latches of Group B in order to reset them to their disassembled positions, but none of the monkeys in Group B ever reassembled a latch.
Phase II tested for learning during Phase I and consisted of 10 1-hour tests for each monkey in both groups with the latches assembled. The monkeys in Group A disassembled their latches within the hour on 31 out of these 40 opportunities and they usually succeeded within a minute. The monkeys in Group B succeeded in disassembling the latches on only 4 of these 40 opportunities and none of them succeeded within 1 minute. Clearly, experience in Phase I developed a skill in Group A, even though the monkeys never received any extrinsic rewards in either Phase I or Phase II.
In Phase III the monkeys in Group A had to open the hasps to get food reward. During Phases I and II, the experimenters had also adapted each monkey in Group A to the WGTA shown in Fig. 13.6 for 1 hour each day. As in the usual preliminary procedure for the WGTA, both food wells contained food, in this case raisins. At first the wells were open; then they were covered with identical junk objects that the monkeys had to displace to get the raisins. After this experience, and immediately after Phase II, the monkeys found a single food well in the WGTA closed by an assembled latch. After the test with food reward in the WGTA, the experimenters brought each monkey back to its home cage and reassembled a latch attached to the wall as before (Fig. 9.2), this time with a raisin stuck under the hasp so that the monkey had to disassemble the latch to get it. Immediately after the test with food under the hasp, an experimenter reassembled the latch in the home cage without food, again in full view of the monkey.
Food incentives significantly disrupted performance. Only one of the monkeys succeeded in opening the hasp to get the food in the WGTA. Three out of the four monkeys succeeded in opening the hasps in their home cages to get food rewards, but all four succeeded after the experimenters removed all food. Even when they succeeded, the monkeys made many more errors in Phase III than they had in Phase II. The main type of error was anticipatory; they tried to open the hasp without disassembling the rest of the latch. Food incentive disrupted performance by inducing anticipatory errors as in the experiments reviewed in chapter 10. The monkeys improved markedly when food incentive was withdrawn at the end of Phase III, but they continued to make more errors than they had before the experimenters introduced extrinsic reward.
Near the end of the 20th century, Visalberghi and Limongelli (1994) carried on the tradition of testing intelligence by drilling clever nonhuman animals on repetitive tasks. Their subjects were four capuchin monkeys who had learned to get a piece of candy out of a transparent tube by poking with a stick. Three of the monkeys were young adult females between 5 and 8 years old who had years of various kinds of unspecified laboratory experience, including “more than 80” repetitive trials of overtraining with the transparent tube. The fourth subject was a 3-year-old juvenile female who had much less experience with laboratory drill and who had only “about 20” successful trials with the transparent tube.
Visalberghi and Limongelli next gave the monkeys the following challenge. They modified the transparent tube so that it had a trap in the center as shown in Fig. 13.11 and they placed the food randomly either to the right or to the left of the trap. When the food was on the right in the diagram of Fig. 13.11, a monkey could only poke it out by inserting the stick through the left end of the tube. If a monkey poked from the right end, then the food fell into the trap where it was impossible to reach by any means available to monkeys.
The three well-practiced adults adopted a strategy of always poking from the same end of the tube, although sometimes they switched ends across days. In this way they earned food 50% of the time with stereotyped, repetitive habits (chap. 11). The three adults stuck to this strategy for the entire course of 140 trials. Visalberghi and Limongelli concluded that the overtrained adults, whom they described as “expert tool users,” failed to understand the problem in spite of their training.
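The 50% figure follows directly from the random left/right placement of the food: a fixed-end habit succeeds exactly when the food happens to lie on the far side of the trap. A short Python simulation makes the arithmetic explicit; the strategy names and the run function are ours, not the experimenters’.

    import random

    def run(strategy, trials=140, seed=0):
        """Proportion of rewards when food appears randomly to the left
        or right of the central trap. Poking from the end nearer the
        food pushes it into the trap; poking from the far end pushes
        it safely out."""
        rng = random.Random(seed)
        wins = 0
        for _ in range(trials):
            food_side = rng.choice(["left", "right"])
            if strategy(food_side) != food_side:   # far-end poke succeeds
                wins += 1
        return wins / trials

    def fixed_end_habit(food_side):
        # the overtrained adults: always insert the stick at the same end
        return "left"

    def away_from_food(food_side):
        # the conditional strategy the trap tube demands
        return "right" if food_side == "left" else "left"

    print(f"fixed-end habit:      {run(fixed_end_habit):.0%}")   # about 50%
    print(f"conditional strategy: {run(away_from_food):.0%}")    # 100%

On this analysis, the adults’ 50% is exactly the payoff of a stereotyped habit, while Rb’s later behavior, described next, shows that even the conditional rule can itself harden into a habit.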
The relatively inexperienced juvenile, Rb, did solve the problem about halfway through the course, but the experimenters kept her repeating it for another 40 or 50 trials. Doubting that Rb understood tools better than the well-trained adults, Visalberghi and Limongelli tested her further. In the first test, for example, they rotated the tube so that the trap was above rather than below the path of the food. Starting from either end of the tube in this arrangement, Rb could poke the food safely past the trap, but she stuck to the strategy that she had practiced so well before: she always started the stick from the end of the tube farthest from the lure and poked the food out through the near end. She persisted in this habitual strategy even with straight tubes that lacked any trap at all.
Visalberghi and Limongelli concluded that the now well-practiced Rb also failed to understand the problem. All she had learned, according to them, was to poke the food out by starting the stick from the far end. For their part, the experimenters failed to consider the uniformly negative effect of mind-numbing drill on both human and nonhuman animals throughout the history of research on problem solving.
SUMMARY
Extending the findings of earlier chapters to problem-solving strategy reveals a basic conflict between skill at repetitive sequences and flexible attack on new problems. Chapter 11 showed how partial reward increases resistance to extinction because it separates sequential skill from reward. Repetitive practice with minimum reward perfects skill at repetitive tasks, but the tolerance for failure that it builds delays, and can even prevent, problem solving.
Early cognitive approaches like Krechevsky’s transposed the effect of contingency to the reinforcement and extinction of abstract hypotheses without taking into account the practical benefit that a learner gets from general experience with experimental procedures. Later cognitive approaches like Mackintosh’s transposed Skinner’s notion of stamping in responses to the idea that overtraining stamps in attention to stimulus dimensions. In both cases, support for the cognitive position depended on confusion about noncognitive alternatives and the operational definition of efficient problem solving.
Harlow’s research on problem solving by monkeys revealed the development of problem-solving strategies and showed how repetitive drill and extrinsic reward retard problem solving in monkeys as well as in human beings. The notion that rigorous application of Skinnerian reinforcement is the best way to train other animals has only made them fail at laboratory tasks. The next chapters of this book extend these findings to research on teaching sign language to chimpanzees.