CHAPTER 5
Reinforcement Versus Expectancy
Possibly the longest-running dispute in the psychology of learning concerns two principles: response contingent reinforcement and cognitive expectancy. Do rewards and punishments automatically stamp in the association between stimuli and responses? Or is much, if not all, of learning a matter of building up cognitive expectancies about future events? During the middle of the 20th century, this dispute flared up to the point where almost all experimenters seemed to be involved in one way or another. For about 30 years laboratories seethed with activity, and the resulting research and argument yielded a rich harvest of solid findings. This chapter considers the theoretical roots of the dispute and some of the lasting findings and insights that came out of it. The basic questions remain, but significant advances in operational definition and experimental design grew out of the dispute.
RESPONSE CONTINGENT REINFORCEMENT
S-R-S* Contingency
In this book the expression S-R-S* represents conditioning by contingency in which an S* is contingent on the performance of a particular response to a particular stimulus. In Fig. 4.2, the S* acts backward to strengthen or weaken the S-R association. In the case of an appetitive S* like food, the effect is to strengthen the bond between an arbitrary stimulus, Sa, and an arbitrary response, Ra. In the case of an aversive S* like shock, the effect is to weaken the bond between Sa and Ra. Learning, according to this principle, is brought about by the automatic effect of an S* on an arbitrary response to an arbitrary stimulus. The principle is called the law of effect. In this view, the association between Sa and Ra is arbitrary in the sense that any response can be conditioned to any stimulus by the arbitrary action of the law of effect.
According to the law of effect:
1. Motivation is necessary for learning.
2. Animals learn a response to a stimulus (with the understanding that there can be patterns of responding and patterns of stimulation).
3. The motivational consequences (rewards, punishments) of a response to a stimulus determine whether or not learning takes place.
The S-R-S* formula seems to be designed for a single, highly simplified form of laboratory learning. How can reinforcement theorists adapt this principle to more complex, sequential behavior such as playing a game of tennis, speaking a foreign language, or managing a business?
Goal Gradient
Often, particularly in the Skinnerian tradition, the response in instrumental conditioning is treated as a single act ending in a reward. Many of the facts about common forms of instrumental learning, however, indicate that this view is oversimplified and misleading. The instrumental response always requires a series of movements. Even in a straight alley, rats make a series of running responses that take them from the start box to the goal box segment by segment. The advantage of studying rats in mazes is that experimenters can monitor progress through each segment of the maze. Ideally, this information can reveal the way individual units fit into a skilled sequence. This could be the bridge relating simple conditioning to more complex behavior.
A bottom-up view of the process leads to questions such as why rats take shorter rather than longer routes through a maze, why they eliminate some blind alleys before others, and why mazes with different patterns vary in difficulty. The answers to these questions become more reasonable when we analyze the act in terms of its component segments and the stimuli that begin and end each segment.
Given enough trials, rats solved mazes as complicated as the Hampton Court maze in Fig. 3.2. This is a challenging problem for conditioning theory. On the first trial, and many others, rats made many errors and went down many blind alleys, eventually getting to the food at the goal. If the food rewards the whole performance, then the food in the goal box must reward errors as well as correct turns. How can rats improve under those circumstances? The obvious answer is that they get the food sooner and with less effort when they make correct turns. But, in a maze as complicated as the Hampton Court maze, how can rats remember all the turns they made on each trial and compare time and effort on different trials?
In multiple-unit mazes such as those at the bottom of Fig. 3.3, distance from reward is a critical variable. Rats usually master the choices that are closer to the goal box before the choices that are farther from the goal box. There is also a clear-cut speed gradient; in general, rats run faster as they near the goal. This phenomenon is called the goal gradient, or delay-of-reinforcement gradient. C. L. Hull, one of the two or three most influential learning theorists of the 20th century, formulated a goal gradient principle as follows: “The mechanism … depended upon as an explanatory and integrating principle is that the goal reaction gets conditioned the most strongly to the stimuli preceding it, and the other reactions of the behavior sequence get conditioned to their stimuli progressively weaker as they are more remote (in time or space) from the goal reaction” (1932, pp. 25–26).
Figure 5.1 is a diagram of Hull’s analysis of an act into segments. In a maze or runway, each segment contains somewhat different stimuli. That is, each segment of the runway looks a little different from the others because it has a slightly different pattern of stains, knotholes, nails, bolts, temperature gradients, light gradients, and so on. As rats run through a maze, each stimulus becomes a stimulus for running to the next segment until they get to the goal box, which presents stimuli for approaching the food dish and eating. The goal gradient principle says that these S-R connections are stronger or weaker depending on whether they are closer or farther from the reward. Hull’s system is a bottom-up system, working step by step through each unit of a series of stimuli and responses.
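To make the gradient concrete, here is a minimal numerical sketch in Python. The exponential decay and the constants are illustrative assumptions for this sketch, not Hull's actual equations; the point is only that habit strength falls off with delay of reinforcement, so segments nearer the goal are conditioned more strongly.

    # Illustrative sketch: habit strength falls off with delay of reinforcement.
    # The exponential form and the constants are assumptions, not Hull's equations.
    import math

    def habit_strength(delay_s, h0=1.0, k=0.05):
        """Hypothetical S-R habit strength for a segment whose response
        is reinforced after delay_s seconds."""
        return h0 * math.exp(-k * delay_s)

    # Five runway segments, each taking 10 seconds, with food at the end.
    for segment, delay in enumerate([40, 30, 20, 10, 0], start=1):
        print(f"segment {segment}: delay {delay:3d} s -> "
              f"strength {habit_strength(delay):.2f}")

Running the sketch prints strengths that rise from 0.14 in the first segment to 1.00 in the segment nearest the food, which is the shape of the goal gradient.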
Learning the Shorter Path to a Goal
If there is more than one path to the goal box, as in the maze plan of Fig. 5.2, animals learn to take the shorter path. According to the goal gradient principle, the responses on the shorter path are closer to the reward and should, therefore, be conditioned more strongly, as illustrated in Fig. 5.2. More generally, animals should learn to take the shortest of several alternative paths to a goal.
Order of Elimination of Blinds
A similar application of the goal gradient principle predicts the backward order of elimination of blinds in maze learning. Performance in a multiple-unit maze is a more complicated case of learning the shortest route to a reward. Errors, such as entries into blind alleys and retracings, are all rewarded eventually, but a correct run is rewarded sooner. To illustrate, suppose that entering a blind alley adds 5 seconds to the time required to run the maze, and therefore, 5 seconds’ delay of reinforcement. Consequently, a correct response is always rewarded 5 seconds sooner than an error. Suppose that, after a correct choice at the last choice point, it takes 5 more seconds to reach the food in the goal box. Then a correct choice is rewarded after 5 seconds and an incorrect choice is rewarded after 5 seconds plus 5 seconds or 10 seconds. Suppose further that, after a correct choice at a point in the middle of the maze, it takes 50 seconds to get to the goal box. In that case, a correct choice would be rewarded after 50 seconds and an incorrect choice after 55 seconds. Suppose finally that, after a correct choice at the first choice point at the beginning of the maze, it takes 100 seconds to reach the goal box. In that case, a correct choice would be rewarded after 100 seconds and an incorrect choice after 105 seconds.
In most psychophysical dimensions, relative differences are more important than absolute differences. Accordingly, in Hull’s theory, the difference in habit strength depends on the relative difference in delay. For the last choice point in the hypothetical example, the relative delay would be (10 – 5)/5 = 5/5. For the middle choice point the relative delay would be (55 – 50)/50 = 5/50, and for the case of the first choice point (105 – 100)/100 = 5/100. If difficulty depends on relative delay, then the last choice point before the goal should be the least difficult because the relative delay is greatest, and the first choice point after the start should be the most difficult because an incorrect choice at that point causes the smallest relative delay.
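The arithmetic can be checked directly. The short Python sketch below reproduces the hypothetical example: an error adds 5 seconds to the delay of reinforcement, and difficulty is ranked by the relative delay at each choice point.

    # Relative delay of reinforcement at three choice points, using the
    # hypothetical figures from the text: an error adds 5 s to the delay.
    ERROR_PENALTY_S = 5

    # Delay (in seconds) from a correct choice to the food reward.
    correct_delays = {"last": 5, "middle": 50, "first": 100}

    for point, correct in correct_delays.items():
        incorrect = correct + ERROR_PENALTY_S
        relative = (incorrect - correct) / correct
        print(f"{point} choice point: correct {correct:3d} s, "
              f"incorrect {incorrect:3d} s, relative delay {relative:.2f}")

The relative delay shrinks from 1.00 at the last choice point to 0.05 at the first, so the discrimination should be easiest at the end of the maze and hardest at the beginning.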
Experimental tests in multiple-unit mazes have roughly agreed with this prediction, but other factors complicate the picture. Mechanical inertia favors a forward-going tendency, and centrifugal swing favors running off at a tangent after a turn to take the opposite turn at the next choice point. Rats also tend to orient in the direction of the goal box, or sometimes their home cages, leading them into blinds that point in certain directions. Finally, they tend to make anticipatory errors, that is, to make the response that is correct at the last choice point (just before the reward) too soon. Because of these factors, rats seldom eliminate blinds in a perfect backward order, and the usual result is only a rough approximation.
The goal gradient principle predicts that the correct turn at the last choice point in a multiple-unit maze will be learned first and best. Once learned, however, the last response will tend, on the basis of stimulus generalization, to appear at other choice points, that is, in anticipation of the final choice point. Whether such anticipation aids or interferes with learning the maze depends on the maze pattern.
Figure 5.3 is a drawing of a maze used by C. L. Hull (1939, 1947) and Sprow (1947) to study patterns of error in a linear maze. The upper panel of Fig. 5.4 shows how the correct path through the maze may require either a series of identical responses (homogeneous case) or a series of different responses (heterogeneous case). In the homogeneous case, anticipation is always correct and aids learning, while in the heterogeneous case anticipation is always an error and interferes. There are also perseverative errors when the learners repeat their previous choices. In the homogeneous case, perseveration is always correct and aids learning, while in the heterogeneous case, perseveration is always an error and interferes. Both anticipation of later responses and perseveration of earlier responses interfere with learning in the heterogeneous maze, so the middle of the maze should be more difficult than the beginning and the end. The lower panel of Fig. 5.4 shows that the pattern of errors agrees with the expected sources of error.
Speed-of-Locomotion Gradient
The same reasoning that predicts the order of elimination of blinds applies to speed of running. The goal gradient alone leads us to expect faster running as the animal approaches the goal. Once again, there are mechanical factors that influence the actual gradient. In a straight alley, in particular, rats that build up their speed as they run must decelerate somewhat before they get to the end of the alley or else they will crash into the end wall. Results of a typical experiment appear in Fig. 5.5.
Figure 5.5 presents data obtained by Hull (1934) on the time spent by rats in various segments of a 40-foot straight runway. As expected, the animals ran slowest in the early sections and progressively faster as they approached the goal, slowing slightly in the final segment. Late in training the curve flattens because running times approach their physical lower limit. The top curve shows how the gradient continued, and was even accentuated, during extinction. Later, Drew (1939) showed that the gradient occurs in many situations and becomes more pronounced under massed as compared with distributed practice.
The goal gradient principle is only an illustrative sample of the way a representative bottom-up theory can work to describe specific results of specific experiments. Hull (1952) constructed an elaborate theoretical system which involves many additional functions that are far beyond the scope of this book.
COGNITIVE EXPECTANCY
Theories of cognitive expectancy differ from theories of response contingent reinforcement in that learning is perceptual and top-down rather than motor and bottom-up. Human and nonhuman animals learn to expect events. Their perception of the situation determines what they do. Rewards and punishments cannot strengthen or weaken responses directly. Instead, perception of rewards and punishments creates expectancies. An executive entity at the top of the system commands the lower motor entities according to these expectancies. Cognitive theories are top-down systems.
Learning, according to expectancy theory, consists of perceiving a stimulus and its significance. With experience in a T-maze, a rat perceives that the food is at the end of one arm. If the rat is hungry, this expectancy motivates running down that arm of the maze. The rat “sees” that such and such a stimulus followed by such and such behavior leads to the goal box. In this way, the rat forms a cognitive map or representation of the maze somewhere in its brain. The executive entity in the brain uses this cognitive map to direct the motor entity to the reward.
Expectancy theory accounts for all of the findings covered by the goal gradient principle. With a cognitive map of the maze, rats can select the shortest path to the food in the goal box. Rats run faster as they approach the goal because the expectancy of reward is greater, and so on. Top-down principles of cognitive expectancy seemed to predict some additional findings that contradict the bottom-up principles of response contingent reinforcement, and this is what took the dispute out of the debating halls and into the laboratories.
Unrewarded Trials
If all that rats learn by exploring the maze is a cognitive map, then a rat should learn just as much whether or not it is hungry and whether or not there is food in the goal box. According to cognitive expectancy theory, food makes a hungry rat take the shortest path to the goal box because the executive center at the top evaluates the expected value of the available actions and commands lower, motor centers to respond appropriately. Cognitive expectancy theory distinguishes between the motivated performance that appears in the maze and the cognitive learning that governs performance.
A rather straightforward experiment should reveal whether reward in the goal box is necessary for learning or only necessary for performance. Rats in one group run through the maze as usual with reward in the goal box on every trial starting with Trial 1. Rats in a second group run through the maze for the same number of trials but without reward in the goal box during Phase I. Both reinforcement theory and expectancy theory predict that only the rewarded group should improve during Phase I. According to expectancy theory, however, both groups should learn the same amount during Phase I because learning is the cognitive result of experience in the maze. What the unrewarded group learns in Phase I remains latent, however, until the experimenter puts food in the goal box and shows the rats in this group that it is worthwhile to proceed to the goal box by the shortest route.
To test this prediction, both groups find food in the goal box during Phase II of the experiment. Phase II tests whether the previously unrewarded group profited from their latent learning during Phase I. According to cognitive expectancy theory, both groups should perform equally well during Phase II because both groups learned roughly the same amount during Phase I.
According to reinforcement theory, on the other hand, the unrewarded group learns nothing during Phase I because they receive no rewards. Without an S*, there is no strengthening of the association between Sa and Ra. Therefore, the performance of the experimental group should start at about the same level as the performance of the control group when they first found food in the goal box.
Tolman and Honzik (1930) reported one of the first experiments to test for latent learning after unrewarded trials. They ran hungry rats in a 14-unit maze like the one shown in Fig. 5.6. They chose this fairly elaborate maze, reasoning that a more complicated maze would require a better cognitive map and thus amplify the difference between learning by cognitive expectancy and learning by response contingent reinforcement.
In the Tolman and Honzik experiment, one group found food in the goal box at the end of the maze on every trial beginning with the first trial. This is, of course, the usual procedure. This group showed a steady decline in errors throughout the experiment, which is, of course, the usual finding. The second group never found any food in the goal box and performed much more poorly than the first group throughout the experiment, as any theory would expect. The critical third group ran the maze from beginning to end without finding any food in the goal box for 10 trials. Then, on the 11th trial and every trial after that, they found food in the goal box.
Figure 5.7 shows the results, which are typical of many similar studies using the same design to test for latent learning. The third, latent learning group performed very poorly for the first 11 trials. After their first reward at the end of the 11th trial, however, the latent learning group improved rapidly, so rapidly that on the very next trial, the 12th trial, they actually equaled the performance of the group that found food in the goal box starting from Trial 1. After a single food reward, the experimental group performed just as well as the first control group did after 11 food rewards. Indeed, in Tolman and Honzik (1930), the experimental group slightly outperformed the control group from the 12th trial onward.
Soon, many other experimenters replicated the basic results of the Tolman and Honzik (1930) experimental design (see Kimble, 1961, and MacCorquodale & Meehl, 1954, for extensive reviews). Sometimes the latent learning groups took more than one trial to overtake the animals that were rewarded throughout, but a very few rewarded trials were always enough to bring the latent learning groups up to the performance of animals that had been rewarded many more times for running in the same maze. We cannot doubt the reliability of this experimental finding, but does that mean that there can be significant, even dramatic, amounts of instrumental learning without any reward? Is learning a cognitive phenomenon that depends on exposure to the stimulus situation, while reward only provides the incentive for performance?
Karn and Porter (1946) entered this arena after several additional experiments had thoroughly replicated the main results of Tolman and Honzik (1930). Karn and Porter replicated the main conditions, but they asked additional questions. They looked at their subjects as living animals that learn many things in the course of an experiment, more things than the plan of the maze and the location of food. By broadening the scope of their experiment beyond the traditional dispute between reinforcement and expectancy, Karn and Porter discovered more about living animals. The warring parties in the dispute had dismissed the nature of their living subjects as irrelevant to theory; Karn and Porter showed that the dispute was itself irrelevant to essential aspects of a learning situation. They were among the first to discover important features of a learning situation that the grand theorists usually ignore, features that have more direct implications for human and nonhuman learning outside of the laboratory. This book describes and discusses this particular experiment in detail because its implications extend far beyond the latent learning dispute that inspired it.
Karn and Porter (1946) used the special maze illustrated in Fig. 5.8. It is called a Dashiell maze after its inventor, John Dashiell. The distinctive feature of a Dashiell maze is that, if an animal is put into the maze in compartment A and finds food in compartment C, then there are many paths that all lead to food. At most of the choice points in this maze, animals can choose among four alternative alleys, two leading toward the goal box and two leading away from the goal box. Responses that lead toward the goal box count as correct responses; those leading away count as errors. Correct and incorrect responses in a Dashiell maze depend on orientation toward the goal, which should make it especially appropriate for studying cognitive maps.
Table 5.1 outlines the design of Karn and Porter’s (1946) experiment. In this experiment, Phase III was the usual training condition for maze learning. Rats were deprived of food for 23.5 hours each day; once a day they were placed in the maze in compartment A and left there until they found food in compartment C, where they ate for 1 minute from a dish of bread soaked in milk (a preferred food for laboratory rats). During Phase III, each animal ran through the maze for one trial a day until it reached a criterion of two successive errorless runs.
The purpose of the experiment was to compare the effects of different kinds of preliminary training that the rats received before Phase III, including Tolman and Honzik’s preliminary trials without reward. The number of trials that each group took to reach the criterion of two successive errorless trials in Phase III measured the effects of these different earlier treatments.
Group 5 is like the group that Tolman and Honzik (1930) rewarded on every trial starting with the first trial, and Group 1 is like the latent learning group that had a series of unrewarded trials before finding reward in the goal box. Once each day in Phase II, which lasted for 5 days, the animals in Group 1 were placed individually in the maze in compartment A and left there until they reached compartment C, where they were confined for 1 minute and then removed from the maze. After 5 days of Phase II, Group 1 began Phase III, which was the same as Phase II except for the food reward in compartment C. The first experience that Group 5 ever had in the maze was the rewarded treatment of Phase III; they never experienced Phase I or Phase II. Together, Group 1 and Group 5 of Karn and Porter (1946) replicated the procedure of Tolman and Honzik (1930).
The results, shown in the last column of Table 5.1, also replicated the findings of Tolman and Honzik; once they found reward in the goal box, Group 1 learned much faster than Group 5. According to cognitive expectancy theory, this demonstrates that Group 1 learned a cognitive map of the maze during Phase II and used it to reach the goal box after they found food there. If contingent rewards such as food and water are necessary to stamp in S-R connections, then Group 1 should have performed at about the same level as Group 5 because their rewarded treatment in Phase III was identical.
The remaining three groups of the experiment show how experimental controls separate out critical components of a main result. The difference between Karn and Porter (1946) and Tolman and Honzik (1930) offers a classic lesson in operational definition. During Phase II, the experimenters recorded the time that each animal in Group 1 took to get from compartment A to compartment C each day. According to cognitive expectancy theory, the animals of Group 1 spent this time learning a cognitive map of the maze. Each animal in Group 1 was then paired with a particular animal in Group 2. On each day of Phase II, each animal in Group 2 was placed in the maze in compartment A, B, C, or D, chosen according to a table of random numbers, and then taken out of the maze wherever it happened to be after it had spent the same amount of time in the maze as its yoked mate in Group 1.
This is called a yoked control. In each yoked pair, both animals spent the same amount of time in the maze. The only difference between them was that the animal in Group 1 had to get from compartment A to compartment C on every trial before the experimenter would take it out of the maze. According to cognitive expectancy theory, the animals in Group 2 had just as much opportunity to form a cognitive map of the maze as their yoked mates in Group 1, because both groups spent the same amount of time exploring the maze and neither got any food reward during Phase II. If that is all there is to maze learning, both groups should have performed equally well in Phase III. As Table 5.1 shows, however, Group 1 performed much better than Group 2. This is a puzzle for both cognitive theory and reinforcement theory. Both groups got the same amount of exploration and neither got any food. According to expectancy theory both should have learned the same amount, and according to reinforcement theory neither should have learned anything. The results seem to refute both theories.
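The logic of the yoking is easy to state as a procedure. Here is a minimal sketch, assuming hypothetical session times; it pairs each Group 1 animal's daily time in the maze with a matched session for its Group 2 partner, who starts in a randomly chosen compartment and is removed wherever it happens to be when its yoked time runs out.

    # Minimal sketch of the yoked-control pairing; the times are invented.
    import random

    COMPARTMENTS = ["A", "B", "C", "D"]

    def yoked_sessions(group1_times_s):
        """For each Group 1 rat's time (s) from A to C on one day, return
        the matched plan for its yoked Group 2 partner: same time in the
        maze, random starting compartment, removal wherever it happens
        to be when the time is up."""
        return [{"start": random.choice(COMPARTMENTS), "time_s": t}
                for t in group1_times_s]

    # Hypothetical Group 1 times for one day of Phase II.
    for rat, plan in enumerate(yoked_sessions([312, 145, 480]), start=1):
        print(f"Group 2 rat {rat}: start in {plan['start']}, "
              f"remove after {plan['time_s']} s")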
The advantage of Group 3 over Group 5 is also puzzling. The animals in Group 3 were taken out of their home cages once each day and brought to the maze room where they spent their yoked time in a plain, wooden box that was made of the same material and built as a duplicate of the starting box of the maze. In Phase III, Group 3 took more trials to reach criterion than either Group 1 or Group 2. This is only reasonable, because Group 3 never had any experience with the maze before Phase III. But, Group 3 still learned the maze faster than Group 5. Both reinforcement theories and expectancy theories depend on experience in the maze, so neither theory can cope with the difference between Group 3 and Group 5 because neither group had any prior experience in the maze.
In Phase I, which lasted for six days, the experimenters handled each of the animals in Groups 1, 2, 3, and 4 once each day. This daily handling was the only difference between Group 4 and Group 5, and yet Group 4 mastered the maze faster than Group 5. Again, neither expectancy theory nor reinforcement theory can cope with the difference between Group 4 and Group 5, because neither group had any prior experience in the maze.
A procedure like Phase I of Karn and Porter (1946) became standard after the results of this and similar experiments conducted at about the same time. In most experiments with nonhuman subjects, the animals live in cages, and the experimenters must take them out of their cages, bring them to the testing apparatus (which is often housed in a different room), and then bring them back to their cages after testing. This is a disturbing experience for animals that have never before been handled by human beings. Modern experimenters take their animals out of the cages and handle them once a day for at least one week before the start of an experiment. When laboratories can afford the luxury of a breeding colony, human beings handle the animals from birth, and this is an important part of the colony procedure. An animal that arrives in the maze stressed and agitated by its first experience with a human handler is definitely a different learner from an animal that is tame and calm when it enters the maze (Morato & Brandão, 1996).
The need for preliminary taming by handling was not generally appreciated at the time of Tolman and Honzik’s (1930) experiment. One of the pitfalls of abstract theorizing is that theorists and experimenters often think of their subjects as furry little test tubes and forget that they are talking about live animals that respond to everything they experience, not just to rewards and punishments and cognitive maps. Handling the animals in Group 4 helped them learn the maze faster, even though handling is irrelevant both to response reinforcement and to cognitive expectancy. Karn and Porter (1946) showed that handling plays a role in maze learning even if it plays no role in abstract theories.
Rats also have to get used to a schedule of feeding once a day for only half an hour, and few experimenters appreciated this at the time of the Tolman and Honzik (1930) experiment. As Table 5.1 shows, another difference between Groups 1, 2, and 3 and Groups 4 and 5 was that Groups 1, 2, and 3 got five days of adaptation to the schedule of 23.5 hours of deprivation and 0.5 hours of eating at the same time each day, while Groups 4 and 5 got their first experience with food deprivation when they started rewarded training in the maze during Phase III. We can see from Table 5.1 that a large portion of the difference between Group 1 and Group 5 depended on differences in adaptation to the experimental conditions—handling, deprivation schedule, maze environment, and so on—quite apart from any difference between cognitive expectancy and contingent reward.
From the point of view of cognitive expectancy theory, the most perplexing finding in the Karn and Porter study is the advantage of Group 1 over Group 2 in Phase III. In Phase II, both groups spent the same average amount of time in the maze so both had the same opportunity to form cognitive maps, and also the same amount of time to become adapted to the maze environment. The difference was that the animals of Group 1 always started at compartment A and ended at compartment C, while the animals in Group 2 were placed in the maze at random and taken out wherever they happened to be at the end of their yoked study time. How can we explain this finding within either theory?
Suppose that the rats were unhappy in the maze, preferred being put back in their home cages, or came to enjoy handling once they got used to it. During Phase II, the sooner the animals in Group 1 found their way to compartment C, the sooner they got out of the maze and back to their home cages, and during the trip they were handled by a familiar experimenter. Perhaps getting out of the maze and being handled by the experimenter was their reward, and that is why they performed better than the animals in Group 2. Looking at Fig. 5.7, we see that the animals in the unrewarded conditions of the Tolman and Honzik (1930) experiment also improved. Many other replications of the Tolman and Honzik experiment confirmed this result. Indeed, in some of these experiments (e.g., Meehl & MacCorquodale, 1951) the so-called “unrewarded” animals performed almost as well as the rewarded animals before receiving the first reward. Perhaps all of the so-called “unrewarded” groups in Tolman and Honzik type latent learning experiments were rewarded for finding their way to the place where the experimenter took them out of the maze and handled them.
It certainly is reasonable to say that there are lots more ways of rewarding rats—or human beings for that matter—than by giving them food or water. While this move does get reinforcement theory out of trouble here, there is a heavy price. Reinforcement theorists must now admit that they have no independent operational definition of reward. They found out that being picked up out of the maze and handled was an incentive only after many experiments demonstrated learning without any other incentive. The value of a scientific theory lies in telling us more than we know already. Saying that the rats must have been reinforced because they learned and then saying that whatever they got must have been a reinforcement only tells us what we already know. They might as well say that the rats learned because they learned.
Some experimenters tried to measure the reward value of picking rats up out of the goal box and handling them. In these experiments, rats were taken out of a T-maze and handled after they reached the goal box at the end of one arm, but only confined briefly after they reached the goal box at the end of the other arm. In one experiment, the rats learned to go to the handling goal box, in a second experiment they learned to go to the confinement side, and in the third experiment they ran equally often to both sides (Candland, Faulds, Thomas, & Candland, 1960; Candland, Horowitz, & Culbertson, 1962; Sperling & Valle, 1964). If we follow the line of argument that whatever the rats learned must have been rewarding, then we have to say that handling was rewarding in the first experiment, punishing in the second experiment, and neutral in the third. That sort of theory cannot tell us anything in advance. It only gives us names for whatever we happen to find—after we find it.
Saying that taking a rat out of the maze and putting it back in the home cage is rewarding also removes the difference between cognitive expectancy and response reinforcement. This is because cognitive expectancy theory always predicts the same thing as response reinforcement theory when there is any incentive to get somewhere or do something. All of those experiments and all of those arguments only proved that latent learning experiments of this type cannot distinguish between cognitive expectancy and response reinforcement. Allowing theorists to invent incentives after the results are in is like giving them blank checks to pay for their mistakes.
The inconclusive results of experiments on latent learning after unrewarded trials teach valuable lessons in operational definition. First, many aspects of experimental treatment have crucial effects on learning quite apart from the hypothetical effects of stimulus response habits or cognitive expectancies. Theories about the behavior of live animals rather than furry test tubes must take these effects into account. That is the ethological view. Second, there is a serious weakness in both reinforcement theory and expectancy theory. Neither theory has a suitable operational definition of reward or punishment that is independent of the learning already observed in past experiments. Later chapters of this book return to both of these themes.
Free Exploration
In this type of latent learning, the experimenter places a hungry rat in a multiple-unit maze like the one in Fig. 5.6 and then goes home for the night. In the morning the experimenter places the rat in the start box and feeds it when it reaches the goal box. Animals in the control group spend the night in the same experimental room in a different apparatus, say a plain box, a simple straight alley, or a rectangular arrangement of four straight alleys. In most of these experiments, the animals that explored the maze during the night performed nearly perfectly when they ran the maze for rewards in the morning. The animals in the control groups performed like naive animals on the first trials of the usual experiment (see Kimble, 1961, and MacCorquodale & Meehl, 1954, for extensive reviews). This certainly looks like learning without any possible source of reward. Or does it?
After several reports of latent learning following free exploration, some experimenters became curious enough to stay in the laboratory and watch the animals to see what they did during the night. They found that the rats spent a lot of time exploring the maze from one end to the other, as ethologists would expect of animals that normally live in burrows. The exploration was far from random, however; the rats soon started to avoid the blind alleys and spend more and more time in the true path running between the start box and the goal box. After a while each animal spent most of its time in the true path and very little time in the blind alleys. They found the true path and practiced running in this path without any food reward or any other help from the experimenters who never even touched the rats during the study periods. Where is the reward here and where is the expectancy?
MacCorquodale and Meehl (1951) reasoned as follows. Rats like to run freely through the maze, especially rats that live in small cages. They prefer the true path because it lets them run more freely. They avoid the blind alleys because these are confining. So far, all we have is another post hoc (after the fact) explanation of what we know already. But, MacCorquodale and Meehl went a step farther and formulated the following hypothesis. If rats avoid blind alleys because they are confining, then narrower alleys should be more confining than wider alleys. The effect should depend on the width of the alleys. Accordingly, they built mazes with narrower alleys and found that the narrower the alleys the more the rats avoided the blind alleys during free exploration. This is a classic example of a theoretical explanation that generated a prediction which was confirmed experimentally.
Notice that the fresh thinking of MacCorquodale and Meehl started when experimenters became curious about what the animals were doing during the periods of free exploration. Then they approached learning as ethologists. Parties to the latent learning dispute who were only interested in the final scores on test trials were content to go home for the night and forget about the animals till testing time in the morning.
By watching the animals as well as theorizing about them, MacCorquodale and Meehl discovered an unexpected incentive that favors the true path of the maze over the blind alleys. That is certainly a step up from saying that rats learn because they learn, but it opens up the possibility of an unlimited number of similar sources of reward that can appear unexpectedly in any learning situation. Without a rule that tells us in advance what will be rewarding and what will not, reinforcement theorists are a long way from giving us a useful theory. And once again, after the source of incentive has been found, cognitive expectancy theories predict the same result that response reinforcement theories predict.
Irrelevant Drive
In this type of latent learning, experimenters deprive rats of one incentive, say water, and satiate them for another, say food. In Phase I, they typically run the rats in a T-maze with water in both goal boxes and food in only one of the two goal boxes. With equal experience in both goal boxes, the animals get to see that there is food in one goal box but not in the other. They never eat, however, because they are not hungry. After many such trials, the experimenters satiate the animals for water and deprive them of food for Phase II. If the rats now go to the food side, then they demonstrate latent learning because we cannot say that they were reinforced with food in that goal box if they never ate there. Or can’t we?
Once again, reinforcement theory turns out to be more complicated than people thought. When animals learned without eating, reinforcement theorists were quick to point out that the rats may have received secondary or conditioned reinforcement from the sight or the smell of food. Secondary or conditioned reinforcement is an integral part of reinforcement theory. We know that animals have to learn to recognize food. Therefore the sight of laboratory food is a learned or derived secondary reward rather than a primary reward based directly on eating. The concept of stimuli that acquire the power of reinforcement is extremely important and chapter 7 considers this concept in detail.
Reinforcement theory can explain any result of learning from experience with an irrelevant drive. When animals learned even though they never ate during the exploratory phase, their reward could have been the sight or the smell of food. At the same time, when animals failed to learn from the sight of uneaten food in these experiments—that is, when rats now satiated for water and deprived of food continued to run equally often to the food and the nonfood side—this also confirmed reinforcement theory, because lack of latent learning always confirms reinforcement theory.
According to expectancy theorists, when rats failed to learn to go to the food side, it was because they were so thirsty that they failed to notice the food; they were literally blinded by their thirst. Several experiments attempted to avoid this objection by ensuring that the rats would attend to the food.
Walker (1948), for example, used the maze plan in Fig. 5.9 to test for latent learning about food by thirsty rats. In Phase I, Walker deprived rats of water and let them eat all they could in their home cages. In the maze, if they ran to the water goal box, he stopped them there and let them drink. If they ran to the food side, he let them continue past the food goal box until they got to the water goal box from the other direction, and then stopped them and let them drink. Doors at key points in the maze prevented the animals from retracing their steps. Walker made sure that the animals would have equal experience with both goal boxes by blocking one arm occasionally so that they had to go the other way. To make sure that the rats experienced the food, Walker piled it up in the food goal box so that they had to climb over the food on their way to the water.
In spite of experiments like Walker’s, the results remained inconclusive. Once you grant the possibility that rats can be so blinded by thirst that they fail to notice food, then nothing really guarantees attention. At the same time, once you grant the possibility that the sight of food can be secondarily reinforcing without eating, secondary reinforcement becomes indistinguishable from cognitive expectancy.
Experiments with irrelevant drive in Phase I yielded positive results that demonstrated latent learning and negative results that demonstrated absence of latent learning in roughly equal proportions. Since both sides of the dispute had an explanation for both positive and negative results, both sides could claim victory but neither side was actually supported. The dispute over this type of experiment revealed the importance of the concepts of secondary reward and attention. These concepts play parallel roles and predict parallel results in both types of theory. Yet, without clear operational definitions that tell us in advance just when an animal has received secondary reward or when an animal has paid attention, both reinforcement theory and expectancy theory fail as theories. Chapter 7 takes up the problem of secondary reward in detail.
SUMMARY
The latent learning controversy frustrated those who looked for a crucial experiment that could prove that response reinforcement theory fits the evidence better than cognitive expectancy, or vice versa. The rest of us can still profit from these experiments because they were good experiments. In their attempts to design rigorous experiments that would define the difference between reinforcement and expectancy, experimenters discovered serious logical defects in both theories. The chief problem is the lack of an operational definition for either primary or secondary reward. This problem is the subject of chapter 7.
Many important aspects of animal learning, such as the need to tame animal subjects and accustom them to the regimen of deprivation, came to light only when the latent learning controversy forced all sides to scrutinize every possible flaw of these experiments. All in all, the controversy enriched experimental method and enriched our understanding of the learning process.
Notice how much more experimenters can discover about animal behavior with bottom-up experiments than by debating the grand hypotheses of top-down theories. When experiments on the value of unrewarded trials separated out aspects of experience in a maze, they showed that adaptation to the learning environment is at least as important as incentive and reward. This is a general principle of all learning, directly applicable to human learning in factories and classrooms. It is a valuable principle whatever the fate of the controversy between the grandest theories of reinforcement and cognition.
In studies of free exploration, experimenters who were only interested in grand theories went home for the night and left the rats to study on their own. The only thing that interested these experimenters was the scores on tests in the morning. When experimenters who were interested in live animals started watching the animals, they discovered that rats gradually learned to keep to the true path of a maze without any input from the experimenter. They discovered something about rats in mazes that was quite beyond the imagination of the grand theorists.