CHAPTER 8
Contiguity Versus Contingency
Since Skinner invented the procedure in the 1930s, armies of experimenters have conditioned hordes of rats and pigeons to press levers and peck keys in Skinner boxes. The success of this procedure is supposed to support traditional, almost biblical, faith in reward and punishment. This chapter reviews the modern evidence that bears on this ancient belief.
SHAPING AND AUTOSHAPING
Instrumental conditioning in the Skinner box is easily and cheaply automated, which makes it cost-effective and popular with experimenters. The most inefficient step in the procedure used to be the wait for the first response. Without some intervention from the experimenter, a great deal of wasted time often elapsed before a naive rat or pigeon first pressed the lever or pecked the key. Some impatient experimenters smeared the lever with moist food to induce the first lever-press. Skinner (1953) described the orthodox technique as follows:
We first give the bird food when it turns slightly in the direction of the spot from any part of the cage. This increases the frequency of such behavior. We then withhold reinforcement until a slight movement is made toward the spot. This again alters the general distribution of behavior without producing a new unit. We continue by reinforcing positions successively closer to the spot, then by reinforcing only when the head is moved slightly forward, and finally only when the beak actually makes contact with the spot….
The original probability of the response in its final form is very low; in some cases it may even be zero. In this way we can build complicated operants which would never appear in the repertoire of the organism otherwise. By reinforcing a series of successive approximations, we bring a rare response to a very high probability in a short time…. The total act of turning toward the spot from any point in the box, walking toward it, raising the head, and striking the spot may seem to be a functionally coherent unit of behavior; but it is constructed by a continual process of differential reinforcement from undifferentiated behavior, just as the sculptor shapes his figure from a lump of clay. (pp. 92–93)
Obviously, human and nonhuman animals learn many things every day on their own without the intervention of the dedicated shaping that Skinner prescribed. The only reasonable claim that anyone could ever make for shaping is that it is more efficient: speedier, longer lasting, in some practical way superior. There is only one way to establish such a claim, however, and that is an experimental comparison between the prescription and an alternative. In a long lifetime, Skinner never attempted any experiment that compared his prescribed methods with alternatives. After decades, a comparison did emerge when more adventurous experimenters deviated from accepted wisdom in an attempt to save themselves a little time.
Manual shaping is obviously labor intensive. Inevitably, experimenters began to look for labor-saving devices. In 1968, P. Brown and Jenkins reported a truly economical procedure, which they called autoshaping. They lighted up the response key for a pigeon in a Skinner box for 8 seconds. At the end of that time, an automatic device turned off the key light and delivered food no matter what the pigeon did. If the pigeon pecked the key while it was still lighted, the device turned off the light and delivered food immediately. In either case, after an intertrial interval, the key was relighted and the cycle repeated. Soon, pigeons were pecking the key on their own. Autoshaping was as effective as Skinner’s laborious manual procedure.
The discovery of autoshaping generated a large volume of research that replicated and extended the early findings in great detail. Later experiments induced robust rates of pecking with food delivered only at the end of the light-on period, independently of anything that the pigeon did. D. R. Williams and H. Williams (1969) showed that they could maintain key-pecking if they delivered food only when pigeons failed to peck the key—that is, when they omitted food every time the pigeon pecked the key. In this omission contingency, free food evokes robust rates of pecking, at first. As a result of the negative contingency, the more the pigeons peck the less food they get. As food is omitted, pecking declines. When pecking declines, food is again delivered, pecking recovers, food is again omitted, and so on, indefinitely. Overall, this procedure maintains a robust rate of key-pecking.
Under the omission contingency, hungry pigeons behave as if they are trying to avoid food. When food stops they rest content, as if a painful stimulus has been removed. When food appears again, they hasten to make responses that stopped food in the past (see Schwartz & Gamzu, 1977, for an extensive review). This experimental result of omission contingency indicates that key-pecking in the Skinner box is an obligatory response like salivation in Pavlovian conditioning (as explained in chap. 4) and raises fundamental questions about the traditional distinction between classical and instrumental conditioning.
SUPERSTITION
The elaborate manual shaping—so carefully described and so painstakingly followed by Skinner and a generation or two of his faithful followers—was entirely unnecessary. Its popularity among experimenters could serve as a striking human example of what Skinner called superstitious behavior and described as follows:
If there is only an accidental connection between the response and the appearance of a reinforcer, the behavior is called “superstitious.” We may demonstrate this in the pigeon by accumulating the effect of several accidental contingencies. Suppose we give a pigeon a small amount of food every fifteen seconds regardless of what it is doing. When food is first given, the pigeon will be behaving in some way—if only standing still—and conditioning will take place. It is then more probable that the same behavior will be in progress when food is given again. If this proves to be the case, the “operant” will be further strengthened. If not, some other behavior will be strengthened. Eventually a given bit of behavior reaches a frequency at which it is often reinforced. It then becomes a permanent part of the repertoire of the bird, even though the food has been given by a clock which is unrelated to the bird’s behavior. Conspicuous responses which have been established in this way include turning sharply to one side, hopping from one foot to the other and back, bowing and scraping, turning around, strutting, and raising the head. (Skinner, 1953, p. 85)
The concept of superstitious behavior plays a central role in the traditional view of key-pecking as arbitrary behavior reinforced by feeding rather than obligatory behavior evoked by feeding. Skinner published only one report of any research observations to support the highly interpretive term, superstition, and that (1948) report never mentions any direct observation of a single adventitious conjunction between response and food. In a long lifetime, Skinner never reported any further direct observations of any such adventitious conjunction between response and reward. Nor, as late as 1997, has anyone else ever reported any direct observations that support Skinner’s purely speculative interpretation. Nevertheless, adventitious reinforcement survives as an argument for the arbitrary effect of reinforcement together with the suggestion that it is the mechanism of human superstitious ritual. Skinner’s “superstition” lived on for many years without any supporting evidence and in spite of the following contradictory evidence.
Staddon and Simmelhag (1971) published the first fully reported description of what hungry pigeons actually do when they receive noncontingent food at intervals. After repeated, noncontingent delivery of food, all of Staddon and Simmelhag’s pigeons developed the same habit—pecking at the wall above the food hopper. This is a sign of an obligatory as opposed to arbitrary effect of food arriving at intervals. Most of the pecking occurred just before each delivery of food. This is what we would expect if pecking is a prefeeding response evoked when food is about to arrive, just as salivation is a prefeeding response in Pavlovian conditioning.
Staddon and Simmelhag (1971) also reported other stereotyped behaviors, such as wing-flapping and circling movements, with individual variations from pigeon to pigeon. These resembled Skinner’s descriptions of individually stereotyped behaviors that he called superstitious. Individual variation is a sign of arbitrary as opposed to obligatory effects. These arbitrary variations appeared, however, early in the intervals, long before food delivery, so they could hardly depend on adventitious conjunction between the responses and food. Other experimenters soon replicated Staddon and Simmelhag (e.g., Fenner, 1980; Reberg, Innis, Mann, & Eizenga, 1978; Timberlake & Lucas, 1985).
EARNING VERSUS FREELOADING
In the traditional view, lever-pressing and key-pecking entail arbitrary work that rats and pigeons perform in order to earn food or other necessities of life. The biological utility of the incentives must justify the biological effort, because animals would not press the bar or peck the key if they had an easier way to get the incentives. This proposition seemed so obvious to so many that no one thought to test it empirically before G. D. Jensen’s (1963) experiment on free feeding. Jensen was the first to offer rats a choice between pellets of food that they could earn by pressing the lever in the usual way, and a heap of identical pellets free for the taking from a convenient food dish.
Soon, many other experimenters replicated Jensen’s findings with pigeons as well as with rats. With an abundant supply of free food in front of them in the food dish, most animals earn some food by pressing the lever or pecking the key. Under some conditions, animals earned as much as 90% of the food they ate. So much for the cognitive interpretation that rats press the lever because images of food make them expect that lever-pressing brings them food.
It is tempting to suppose that operant conditioning turns rats into lever-pressing machines so thoroughly conditioned that they go on pressing the lever even when there is free food in front of their faces. Further experiments showed, however, that from their first experience in a Skinner box animals begin to press levers and peck keys if there is free food continuously available, and they acquire the habit of pressing levers and pecking keys before they have had any opportunity to receive reward for responding. Hungry animals press levers and peck keys even when the consequences of work are negative; that is, when they abandon the heap of free food to work at the lever or the key, they lose time that they could have spent eating (see extensive reviews in Inglis, Forkman, & Lazarus, 1997; Osborne, 1977, 1978).
AVOIDING FOOD
In 1951, Breland and Breland first described a kind of show business based on techniques they had learned as junior associates of B. F. Skinner. The Brelands taught chickens to bat baseballs and parakeets to ride bicycles; and some 38 different species acquired a wide variety of unlikely skills, which they displayed in museums, zoos, fairs, department stores, and—the ultimate achievement—television commercials. The popularity of the animal actors combined with the practical effectiveness of the conditioning techniques resulted in a profitable business. There were some failures, not random failures, but patterns of failure that plainly contradicted the behavior theory behind their otherwise so successful program of conditioning. The Brelands referred to these patterns as misbehavior (Breland & Breland, 1961).
In one display that the Brelands planned for the window of a savings bank, a pig picked up large wooden coins from a pile and deposited them in a “piggy bank” several feet away. After shaping with a single coin placed farther and farther from the bank, the pig progressed to four or five coins picked up and deposited one by one to earn each small portion of food. It was a textbook example of ratio reinforcement (chap. 3) and most pigs acquired it rapidly. Instead of improving with practice, however, performance deteriorated day by day. The pigs continued to pick up the coins readily enough, but they were slower and slower to deposit them in the bank. On the way to the bank, they would drop a coin and root it along the ground with their snouts, pick it up and drop it again, often tossing it in the air before rooting it along the ground once more. Pig after pig indulged in more and more rooting and tossing until they delayed the delivery of food indefinitely.
Raccoons failed in a similar way. Adept at manipulating objects, raccoons quickly learned to grasp and carry wooden coins and even to insert them into a slot. But, after a few rewards the raccoons seemed unable to let the coins out of their grasp. They kept on handling and rubbing the coins and dipping them in the slot as if washing them. Given two coins, a raccoon would rub them together over and over again like a miser. Raccoon misbehavior looked very much like the manipulatory behavior raccoons normally direct toward naturally occurring portions of food—they handle and rub foods that have husks and shells, and even pull prey such as crayfish out of pools. Similarly, wild and domestic pigs kill small rodents by rooting and tossing them before eating them.
In the Breland failures, animals first had to direct responses that resembled prefeeding toward objects that resembled food. Then they had to stop directing prefeeding at the token food before they could receive actual food. The Brelands tried making their animal actors hungrier, reasoning that increased hunger should increase the incentive value of the food and so speed up performance. If, instead, “misbehavior” consists of obligatory components of prefeeding evoked by a conditioned connection between the tokens and the food, then it should increase with increased hunger. This is precisely what the Brelands report: the hungrier the animals were, the more they persisted in “misbehavior” that postponed food.
Rodents also manipulate their food before eating it. Timberlake and his associates (Timberlake, 1983, 1986; Timberlake, Wahl, & D. King, 1982) placed rats in a specially designed Skinner box without the usual lever, and dropped a 5/8-in. steel ball into the box through a hole in one wall. Unimpeded, the ball took 3.1 seconds to roll down a groove in the slightly inclined floor and pass out of the box through a hole in the opposite wall. The experimenters also dropped pellets of food one at a time into a food dish located to one side and above the exit hole. Under some conditions the food arrived at the same time as the ball, under others it was delayed for a measured period of time, and under still others food was delayed until the ball rolled out of the box.
The rats handled, mouthed, and carried the ball as if it were a seed or a nut in a shell. They dug at the entry hole during the delay between the sound of the dispenser and the entry of the ball, even though this tended to delay the ball by blocking the hole. When food was delayed until the ball dropped through the exit hole, the rats continued to handle the ball, thus blocking its progress, preventing its exit, and postponing the food. Under experimental conditions in which food arrived before the ball rolled out of the box, most of the rats formed a habit of carrying the ball to the food dish or otherwise blocking its exit and later resumed handling the ball after they had consumed the food. They persisted in this even though it lengthened the intertrial interval, thus delaying the next feeding.
In the usual Skinner box for rats, the lever is the only graspable, movable object, hence the only available target for the prefeeding manipulatory behavior of these animals. Lacking a manipulatory appendage, pigeons peck at targets with their beaks, and the most prominent target in the Skinner box for pigeons is the lighted key. The box used by Staddon and Simmelhag (1971) had no key, but their pigeons pecked at the wall above the food bin, anyway. If conditioned pecking were based on a kind of stimulus substitution in which the pigeons pecked at random spots of dirt or other imperfections that resembled grains of food, then we would expect them to peck at the floor. Random spots on the floor of the chamber should resemble grain as much as random spots on the upright walls, and downward pecking should resemble normal feeding behavior more than horizontal pecking. Why should pigeons peck at spots on a wall? Unlike precocial birds such as chickens that feed themselves from the first, the young of altricial birds such as pigeons are fed by their parents. The young of altricial birds solicit food by pecking upward at a parent’s beak and crop (Lehrman, 1955).
The misbehaviors of the Breland pigs and raccoons and the Timberlake rats are components of prefeeding in these species. They were evoked by stimuli associated with feeding. When food appeared, the animals interrupted the prefeeding they were directing at the tokens, in favor of consummatory responses directed at the food. The prefeeding responses only became “misbehavior” when the animals had to interrupt them before the food appeared.
In the Skinner box, rats and pigeons interrupt lever-pressing and key-pecking to consume food. They also mix eating with prefeeding, as when heaps of food are already available in free feeding experiments. But, they drift into “misbehavior” when they have to stop key-pecking or lever-pressing to get food, as in the omission contingency. It is then that their responses postpone rather than hasten the arrival of food.
These observations point to a recursive program that ends with a test. Fresh inputs initiate the next loop in the program; without fresh inputs, the loop repeats. Under conditions in which heaps of free food are already available, prefeeding behavior such as lever-pressing and key-pecking is mixed with feeding proper. This suggests a recursive program of the form:
Feed → Manipulate + Eat + Feed
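For readers who think in programming terms, the recursion can be sketched in a few lines of Python. Everything in the sketch is an illustrative assumption chosen to mirror the formula above; it is not a model fitted to data.

```python
import random

def feed(fresh_input, depth=0, max_depth=5):
    """Illustrative sketch of Feed -> Manipulate + Eat + Feed.

    Each pass through the loop mixes prefeeding (manipulate) with
    eating, and ends with a test for fresh input.  Fresh input
    initiates the next loop; without it, the current loop repeats."""
    if depth >= max_depth:            # stop the demonstration, not the rat
        return
    print("manipulate")               # prefeeding: press, peck, handle
    if fresh_input():                 # the test that ends each loop
        print("eat")                  # fresh food is eaten...
    feed(fresh_input, depth + 1, max_depth)   # ...and the program recurses

# Assume, arbitrarily, that fresh food arrives on 70% of the passes.
feed(lambda: random.random() < 0.7)
```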
The Brelands describe how easy it was to teach chickens to pull in a loop of string or wire, a simple feat that was extremely difficult for pigeons. Unlike pigeons, chickens in a farmyard get much of their food by scratching in the earth for worms. In one of their displays, the Brelands wanted a chicken to stand still on a platform for 12–15 seconds, waiting for food. They found that about 50% of their chicken performers began to shift from place to place on the platform while scratching vigorously at the floor beneath them. The resourceful Brelands labeled the platform “dance floor” and had the chickens start a “juke box” by pulling a loop. They then required the chickens to stand on the platform (thus depressing a switch) for 15 seconds and called this show “The Dancing Chicken.” They estimated that in the course of an average performing day, each chicken made more than 10,000 useless scratching responses when all they had to do was to stand still on the platform for 15 seconds at a time. Audiences were amazed to see how a behaviorist could get an animal to dance to any tune through the power of positive reinforcement.
CONSTRAINTS AND CONTINGENCY
The Skinner box was supposed to be so arbitrary that the obligatory, species-specific aspects of rat and pigeon behavior would be minimal, just as civilized life was supposed to be so artificial and arbitrary that human behavior must be virtually free of biological constraints. But, rats and pigeons stubbornly refuse to leave their ethology behind when they enter the Skinner box. Lever-pressing and key-pecking were supposed to represent arbitrary work that rats and pigeons would only perform for food and water rewards. Instead, food and water evoke and maintain these particular responses without any contingency at all. Indeed, under a wide range of conditions, positive and negative contingencies are irrelevant to learning. The Brelands discovered analogous phenomena in many other species under many other conditions. We cannot dismiss these findings as artifacts peculiar to the behavior of rats and pigeons in a Skinner box.
In response to this mounting evidence, defenders of the law of effect retreated to the position that the arbitrary effects of consequences must operate within limits imposed by certain species-specific constraints. This is known as the “constraints on learning” approach (Hinde, 1973; Shettleworth, 1972). The constraints are said to be evolutionary adaptations for vital functions such as feeding, courtship, and defense.
Like the law of effect, the constraints position is a traditional view with a long cultural history. For example, in the film classic, The African Queen (Huston & Agee, 1951), Katharine Hepburn plays a devout missionary in Africa in the early 1900s. To Humphrey Bogart, who plays the rough operator of a tramp riverboat, she preaches, “Nature, Mr. Allnutt, is what we were put in this world to rise above.” Echoing the missionary tradition in recent times, Skinner (1977) wrote, “Civilization has supplied an unlimited number of examples of the suppression of the phylogenic repertoire of the human species by learned behavior. In fact, it is often the very function of a culture to mask a genetic endowment” (p. 1007).
The missionary tradition supposes a fundamental conflict between the civilizing effect of learning and the brutalizing effect of biology. This supposed conflict separates learning from biology. Nevertheless, the learning of living animals, whether human or nonhuman, can only be a biological phenomenon. The laws of learning must emerge from more general laws of ethology. The rest of this chapter develops this theme to resolve the supposed conflict between learning and ethology.
Meanwhile, the constraints in actual experiments are hardly constraints on learning because learning is abundant in these experiments. A much better name would be “constraints on the law of effect.” So much ethologically based learning by contiguity appears in these experiments that it is difficult to see anything that remains to be learned by contingency. This raises a fundamental question of experimental operations: how could an experiment measure any effect of contingency over and above the effect of contiguity? The answer to this question is the subject of the next section of this chapter.
YOKED CONTROL
Responses that seem to be reinforced by the contingent delivery of food or water are evoked by these incentives without any contingency at all. In that case, how much of the results of instrumental conditioning can we attribute to the effect of response contingent reinforcement? The experimental answer to this question is straightforward. It requires two conditions. Under one condition, C, incentives are contingent on some criterion response. Under the second condition, Y, the same number of incentives are delivered, but independently of the criterion response. In virtually every experiment that used this design to test for contingency, regardless of the response measured and regardless of the species, whether human or nonhuman, the contingent subjects responded more than the yoked subjects. This result seems to confirm the principle of S-R-S* contingency. Nevertheless, this experimental design contains a fatal error that vitiates the results. So many experimenters and commentators have failed to appreciate this error that students and nonspecialists often miss the problem at a first reading. Because the error has profound implications for any theory of learning, this chapter analyzes the yoked control for contingency in great detail.
Experimental Design
The two Skinner boxes shown in Fig. 8.1 are identical except that the lever of box C operates both food magazines, while the lever of box Y is not connected to either magazine. Both magazines dispense pellets of food when rat C presses his lever, but neither magazine operates when rat Y presses his lever. Thus, C’s feeding is contiguous with and contingent upon his lever-presses, while Y receives the same number of pellets at the same time intervals but Y’s feeding is independent of his lever-presses. The difference between C’s responses and Y’s responses should measure the difference between contingent and evocative effects of the incentives. This general procedure can be adapted to use with any criterion response and any method of dispensing incentives.
Perhaps the error of the yoked control appears more clearly in the design shown in Fig. 8.2, which represents two college students C and Y participating in a word recognition experiment. They sit in separate rooms at identical terminals. Each subject reports by speaking into a microphone. Immediately after subject C says “Ready,” identical target words appear on both screens, and immediately after C reports a word both screens black out, but nothing that subject Y says has any effect on either screen. As a result, the target words will usually appear when C is attending to her screen. Sometimes, Y will also be attending to her screen, but many trials will begin at times when Y is paying no attention to the screen whatsoever. On the average trial, then, Y’s attention will be lower than C’s, and C will report more correct words than Y does. C will have the same advantage if the experiment is converted from word recognition to word learning by requiring C and Y to memorize the words. But, we cannot attribute C’s superiority in either case to the rewarding effect of her control over the screen.
The yoked control is an ex post facto design. The classical example of this error (discussed in Underwood, 1957, pp. 97–99) is a study of the effects of time spent in the Boy Scouts on later participation in community affairs. The study found that youths who had joined the Scouts and remained Scouts for an average of 4 years later participated in more community activities than other youths who had joined the Scouts at about the same time but quit in an average of 1.4 years. The authors concluded that the additional years in the Boy Scouts increased the amount of community involvement when the youths became adult citizens.
The Boy Scout study proves nothing of the kind. The quitters were different from the stickers at the time that they quit. That is why they quit. It is a mistake to attribute later differences to participation in the Boy Scouts because the two groups were different before the differential conditions had any chance to act on them. Once experimenters select subjects on the basis of past behavior, they cannot logically attribute subsequent differences in behavior to later conditions imposed by the experiment.
Subject Selection
A sound experiment assigns subjects to different conditions in an unbiased way. Suppose that an experiment aims to test the relative effectiveness of praises and insults on college students memorizing lists of words in a special laboratory. Suppose further that in the praise condition the experimenter praises the subjects every time they are correct and in the insult condition the experimenter insults the subjects every time they make an error. In a sound experiment, the experimenter must assign students to the two conditions at random. One way to do this would be to toss a coin each time a new student arrived in the laboratory, and then to assign a student to the praise condition whenever the coin came up heads and to the insult condition every time the coin came up tails. This would be unbiased selection.
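In programming terms, the coin toss amounts to one line per student. A minimal sketch, with the group labels taken from this example:

```python
import random

def assign_by_coin_toss(n_students):
    """Unbiased assignment: an independent coin toss for each student."""
    return ["praise" if random.random() < 0.5 else "insult"
            for _ in range(n_students)]

print(assign_by_coin_toss(10))   # e.g., ['insult', 'praise', 'praise', ...]
```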
An example of biased selection would be to assign the subjects to conditions on the basis of arrival time. One way to do this would be to assign the first 20 volunteers who arrived to the praise condition and the second 20 to the insult condition. This would be biased selection because early volunteers are different from latecomers—otherwise all students would arrive at the same time. If personal characteristics that lead to early instead of late arrival also affect ability or effort to learn, then the difference between the two groups on the memorization task could depend on difference in personal characteristics between early and late arrivers, rather than the relative effects of praise and insult.
In the Boy Scout study, the stickers received 4 or more years of scouting and the quitters received an average of 1.4 years of scouting, but this was after the scouts themselves decided to stick or to quit. The Boy Scout study would come closer to the yoked design illustrated in Figs. 8.1 and 8.2 if experimenters had allowed the boys to apply for release from the Scouts and then forced them all to stay in for the full 4 years whether they liked it or not. This more rigorous design would still fail to demonstrate that volunteering to stay enhances the positive effects of membership in the Boy Scouts. Just as in the original study, quitters are different from stickers. Otherwise, all the scouts would quit or all would stick. This basic difference between the two types of scout can account for any difference that appears later. Scouts who stayed in voluntarily might as adults participate in more community affairs than the scouts who were forced to stay. But, that could only be a further consequence of the personal differences that made stickers ask to stick and quitters ask to quit.
Readiness
The E-S-Q-R paradigm in Fig. 4.1 shows how response outputs depend on the interaction between stimulus inputs and the state Q of an animal or a machine. Remember from chapter 4 that the state Q depends partly on the original design of the animal or machine and partly on past history. A state Q is a state of readiness to respond in certain ways to certain stimuli.
In Fig. 8.1, food always arrives when rat C has just pressed his lever—that is, when he is engaged in that particular prefeeding behavior. By definition this means that his current state, Q (E-S-Q-R paradigm of chap. 4), makes pressing more likely, that he is ready to press.
A rat in a Skinner box that has just pressed the lever is in a different state from a rat that has just done something else—otherwise, they would both have done the same thing. The first rat was ready to press; the second rat was ready for something else. If the experimenter delivers a pellet at this moment, the procedure is biased because, at this moment, the contingent rat is more ready to press than the yoked rat. In the same way, the contingent student in the word recognition task was, on average, more ready when she said “Ready,” than she would be at other times.
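A small Monte Carlo sketch can make the confound concrete. In the simulation below, food never reinforces anything; it only raises a momentary, decaying readiness to manipulate, and that readiness is directed at the yoked rat's lever only when the pellet happens to find him there. All of the numerical values are arbitrary assumptions, yet the contingent rat reliably out-presses the yoked rat.

```python
import random

random.seed(1)
STEPS = 100_000
BASE = 0.05      # baseline probability of a lever-press (assumed)
EVOKED = 0.50    # press probability at full evocative drive (assumed)
DECAY = 0.8      # evocative drive fades on each time step (assumed)

def yoked_pair():
    """Purely evocative food: nothing is stamped in, yet C > Y."""
    c_drive = y_drive = 0.0          # lever-directed readiness
    c_total = y_total = 0
    for _ in range(STEPS):
        c_press = random.random() < BASE + (EVOKED - BASE) * c_drive
        y_press = random.random() < BASE + (EVOKED - BASE) * y_drive
        c_total += c_press
        y_total += y_press
        c_drive *= DECAY
        y_drive *= DECAY
        if c_press:                  # pellet delivered to both rats
            c_drive = 1.0            # C is at the lever by definition
            y_drive = max(y_drive, 1.0 if y_press else 0.3)
            # Y is usually somewhere else, so most of his evoked
            # prefeeding is directed at other objects (0.3 is assumed)
    return c_total, y_total

print(yoked_pair())   # C presses more than Y, with no reinforcement at all
```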
Conclusion
Yoked control experiments always confound contiguity with contingency when the reinforcing stimulus, S*, either evokes or inhibits the to-be-conditioned response. The source of confounding cannot be eliminated by more powerful procedures or more precise instruments. All conceivable versions of the yoked control for response contingent reinforcement suffer from the same ex post facto error.
The principle of response contingent reward and punishment is an intuitively attractive hypothesis that agrees with an everyday, commonsense view of learning that has appealed to parents, teachers, animal trainers, moralists, and psychologists for centuries. Nevertheless, for those who judge scientific merit on the basis of experimental operations rather than on the basis of intuition or common sense, a principle without any possibility of operational definition is also without any scientific merit.
LEARNING WITHOUT HEDONISM
If we can dispense with response contingent reinforcement, then we can also dispense with much of the clutter of a learning process based on experienced or expected pleasure and pain.
Bioassay
Consider the following version of the experiment in Fig. 8.1. Suppose that instead of delivering pellets of food into a dish, the apparatus delivers doses of a drug directly into the bloodstream by means of a fistula (a surgically implanted tube). The doses are small enough to metabolize rapidly, hence their effect is transitory and brief. Let us further suppose that there is a drug, Epsilon, that has only one effect on rats: It excites feeding behavior.
The rats are yoked in pairs as before, so that both receive an injection of Epsilon when rat C presses his lever, but neither is affected when rat Y presses his lever. When C presses his lever, he receives a dose of Epsilon, which excites his feeding behavior. Since manipulation is one of his feeding behaviors, that bout of lever-pressing will be prolonged. Sometimes Y will also be pressing his lever when Epsilon is released into his bloodstream, but often he will be in some other part of the Skinner box engaged in some other behavior altogether. Consequently, rats in Condition C will, on the average, press the lever more times than rats in Condition Y.
Since we know that Epsilon is limited to one effect—the excitation of feeding behaviors—we can resist the temptation to conclude that Epsilon is a positive reinforcer. But we only have this advantage in thought experiments. Real experiments are performed to determine the effects of unknown drugs. An actual bioassay experiment that used the traditional definition of reinforcement (chap. 7) and failed to take into account the logical fallacy of the yoked control would have to conclude that Epsilon was a positive reinforcer.
In fact, bioassay experiments with analogous designs frequently appear to support the claim that electrodes implanted in certain regions of the rat brain deliver “reinforcing brain stimulation.” The argument for this claim begins with the assumption that lever-pressing is arbitrary work that rats shun unless we pay them for it with food or some other commodity. It follows that, if contingent brain stimulation increases lever-pressing, it must have some value for the rat, hence the stimulated region is a “pleasure center.” As we have seen, however, feeding evokes lever-pressing. It is more parsimonious to conclude that stimulation of certain regions of the rat brain evokes manipulation just as feeding does. The experimental result would be the same.
If reinforcing brain stimulation is a way of paying the rat with the pleasurable result of eating, then we would expect eating, itself, to decline and even stop entirely when brain stimulation is freely available. Indeed, we might expect some rats to starve to death under these conditions. Nevertheless, when food is freely available and brain stimulation is also freely administered by the experimenter, rats eat more rather than less food (Mogenson & Cioé, 1977, pp. 581–584). This is the opposite of what we would expect if brain stimulation is a substitute for the pleasure of eating. But it is just what we should expect if electrical stimulation delivered to certain regions of the brain evokes a repertoire of obligatory feeding behaviors that includes both eating and manipulation. In that case, when there is food and no lever, brain stimulation should evoke eating, and when there is a lever but no food, brain stimulation should evoke lever-pressing. When both are available, brain stimulation should evoke a mixture of feeding and prefeeding just as food does (see also Stewart, De Wit, & Eikelboom, 1984).
Inhibition and Competition
Consider one more variant of the yoked control experiment. Once again paired rats receive injections of a drug by means of a fistula. This time, however, the drug is Iota, which has only one effect on rats: It inhibits feeding. The subjects are yoked in pairs as before, so that both rats receive Iota when rat C presses his lever, but neither gets any Iota when rat Y presses his lever. When C presses his lever, he receives a dose of Iota that inhibits his feeding. Since manipulation is one of his feeding behaviors, that bout of lever-pressing will be shortened. Sometimes Y will also be pressing his lever when Iota is released into his bloodstream, but often he will be in some other part of the Skinner box engaged in some other behavior, which may or may not be part of a rat’s feeding behavior.
On the average, doses of Iota will depress C’s lever-pressing more than Y’s. If we did not know that Iota has only one effect—the inhibition of feeding behavior—we might conclude that Iota was aversive. For, if lever-pressing is arbitrary work that increases with pleasurable consequences and decreases with painful consequences, then Iota must be a punisher, and a site of brain stimulation that has the same negative effect must be a “pain center.” Only the omniscience of the thought experimenter or the logical analysis of the yoked control can protect us from this error.
Suppose that instead of inhibiting feeding behavior, Iota excites a repertoire of nonfeeding behavior that is incompatible with lever-pressing. Doses of such a drug or stimulation of such a brain site would also interrupt bouts of lever-pressing. In the same way, doses of Epsilon or stimulation of an Epsilon brain site would interrupt bouts of nonfeeding behavior. An inhibitory process based on negative consequences is only required in a system that requires an excitatory process based on positive consequences. The result is a cumbersome, top-down system in which each response is governed by a separate pair of opposing excitatory and inhibitory processes. All that a living animal requires, however, is circuitry that can select now one response and then another.
The carrot and the stick move the donkey forward when the carrot is applied to the front and the stick is applied to the rear of the donkey. If the stick is applied to the nose of the donkey, then the donkey moves backward even if the stick is applied as a punishment for moving backward. Whether applied to the front or to the rear, the stick has a feed forward effect. Aversive stimulation depresses feeding behavior by evoking responses that are incompatible with feeding. In aversive conditioning with shock, as in appetitive conditioning with food, the responses that are conditioned are the responses that the S* evokes. Aversive conditioning is also independent of the contingency between response and S*, and robust conditioning develops even when responding earns more pain. Chapter 10 considers the problem of aversive conditioning.
Feedback Versus Sign Stimuli
The prototypes of what Norbert Wiener called cybernetics were two-phase systems such as thermostats, in which one of two possible inputs maintains the current output (positive feedback) and the second possible input switches the device back to the alternative phase (negative feedback), thus limiting fluctuations in temperature (discussed in chap. 4). Modern computers are multiphase systems that perform a given operation until a test is passed. This initiates the next operation, and so on. A computer can go through many operations before it repeats the first operation. Indeed, it may never return to the logical state that initiated the first operation. The pulse that ends one operation initiates the next. It feeds forward rather than backward, like the sign stimuli of classical ethology. In the case of a male stickleback, for example, establishing a territory and building a nest are not rewarded by the appearance of a gravid female. Instead, the swollen abdomen of the female is a sign stimulus that initiates courtship. Courtship is not rewarded when a female deposits eggs in the nest; instead, the clutch of eggs is a sign stimulus that initiates fanning and nest tending.
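The feed forward principle can be sketched as a simple lookup in which each sign stimulus directly initiates the next operation. The labels below are illustrative tags for the stickleback sequence just described, not formal ethological categories.

```python
# Each sign stimulus feeds forward: it initiates the next operation.
# Nothing here is reinforced by the operation that follows it.
NEXT_OPERATION = {
    "swollen_abdomen": "courtship",     # gravid female initiates courtship
    "eggs_in_nest": "fan_and_tend",     # clutch initiates fanning and tending
}

def respond(sign_stimulus):
    """Select the next operation directly from the current input."""
    return NEXT_OPERATION.get(sign_stimulus, "continue_current_operation")

print(respond("swollen_abdomen"))   # -> courtship
print(respond("eggs_in_nest"))      # -> fan_and_tend
```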
Highly artificial sign stimuli can evoke genuine, species-specific patterns of behavior. Remember Tinbergen’s description of the male sticklebacks in his laboratory that attempted to attack Royal Mail vans passing by the windows, even though the only stimulus that a van shared with an intruding male stickleback fish was its fiery red color (chap. 4).
According to S-R contiguity, an S* serves to evoke obligatory responses rather than to reinforce arbitrary responses. The stimulus feeds forward rather than backward. The rats studied by Timberlake and his associates manipulated small steel balls almost indefinitely until pellets of prepared grain arrived. Under comparable conditions, rodents manipulate a seed until they find a suitable place in the husk to break open. This is the S* for breaking open the husk. If a rat finds a kernel of food in the husk, this is the S* for consuming the kernel, and so on (Lawhon & Hafner, 1981; Weigl & Hanson, 1980).
The sight of food evokes prefeeding responses, such as salivation. If an arbitrary stimulus, such as a tone, appears when the subject is salivating, the tone later evokes salivation before food appears. A vast number of experiments demonstrate that this procedure is sufficient for conditioning. Is it necessary to introduce an additional feed backward mechanism by which positive or negative consequences reinforce or weaken the connection between stimulus and response? Are there economic advantages to a feed backward mechanism of learning with all of the cumbersome, prespecified, top-down mechanisms that it entails? If not, the additional burden handicaps both organism and robot in a competitive world. A feed forward model of the learning process offers a more practical alternative that is also more consistent with the experimental evidence (see also, R. A. Gardner & B. T. Gardner, 1988a, 1988b, 1992).
AUTONOMOUS ROBOTS
Recent advances in robotics are beginning to offer attractive alternatives to traditional learning theories, alternatives that are even more simple-minded than Hull’s model and behave even more intelligently than Tolman’s. To be autonomous a robot must operate under unpredictable conditions on its own without any human supervision. So far, successful autonomous robots follow basic ethological principles.
Top-Down Robots
In traditional philosophy, information coming in from the senses must be represented in the brain in some reduced shadowy form, as in Plato’s parable of the cave. In this view, human beings must decide what to do next from moment to moment by looking at these shadows of the outside world, often called representations. Sadly, many modern attempts to build autonomous robots have used this ancient model of how a nervous system governs behavior. In traditional robotics, a central processor receives information from sensors and constructs a reduced model of the present state of the world. The central processor then uses this model to decide the robot’s next move and commands the motors accordingly. After the next move, the world outside changes, which changes the inside model of the world and causes the machine to reevaluate the new situation and decide once again on the next move after that. This conception of a central processor that governs behavior by receiving and processing information from sensors and then transmitting commands to motors is, of course, the top-down model of cognitive psychology.
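The cycle that this architecture imposes might be sketched as follows. Every class here is a hypothetical stand-in for a heavy subsystem; the point of the sketch is only that the world model must be rebuilt and re-evaluated on every cycle before the robot can move at all.

```python
class Sensors:
    def read(self):
        # stand-in for a flood of camera, sonar, and encoder data
        return {"obstacle_ahead": False}

class WorldModel:
    def update(self, readings):
        self.state = readings          # in a real robot: costly re-modeling

class Planner:
    def decide(self, model):
        # in a real robot: costly evaluation of alternative moves
        return "stop" if model.state["obstacle_ahead"] else "advance"

class Motors:
    def execute(self, action):
        print(action)

def top_down_cycle(sensors, model, planner, motors):
    """One sense -> model -> plan -> act cycle."""
    model.update(sensors.read())
    motors.execute(planner.decide(model))

top_down_cycle(Sensors(), WorldModel(), Planner(), Motors())   # -> advance
```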
So far, top-down robots have performed rather poorly. Imagine a top-down robot trying to maneuver through a corridor in a building. Each time it moves, the information coming in changes and alters the model of the world stored inside. Each time someone or some object in the corridor moves, this also changes the information and alters the model. To receive and process all of this information, the onboard computer must have a fairly large capacity. Even if it only has to proceed through the corridor avoiding obstacles, the onboard computer has to do a lot of computation. If the robot has additional tasks to perform (and why ever would anyone send it into the corridor or out into space to move around without doing anything?), it has to have still greater capacity. This requires a rather large onboard computer to process the information rapidly. Otherwise, the robot could only move at a snail’s pace. A large computer that computes with reasonable speed has to have a large power source in the form of large, heavy batteries. This means large, heavy wheels and motors to carry the onboard computer even before the engineers add the hardware that does the jobs required by the mission.
As late as 1997, the best top-down robots were so large and so cumbersome and their processing was so slow that the fastest of them were still too slow to do any practical work. Some engineers continue to design larger top-down robots with larger onboard computers in the hopes that larger computers may eventually process the information fast enough to become practical. There is always hope that miniaturization may someday make top-down robots more practical. Meanwhile, existing top-down robots are so large and so expensive that it would be impossible to send more than one at a time on a space mission, say to the moon. And, if one bearing or one spring failed, the whole robot would be lost and the mission would fail with it.
Bottom-Up Robots
More recently, engineers led by Brooks (1990) of MIT shifted their attention to bottom-up models that need little or no central processing. Suppose that the diagram in Fig. 8.3 represents an egg-shaped robot equipped with two light-sensitive units at its front end and two motor-driven wheels at its rear end. In this illustration, the light sensor on the left side of the robot is wired to the wheel motor on the right of the robot and the light sensor on the right is wired to the wheel motor on the left of the robot. This is called contralateral connection.
The star-shaped object in the diagram represents a source of light. When the light is closer to the sensor on the left, more power goes to the wheel on the right and less power to the wheel on the left. As a result, the robot moves toward the light. When the light moves, the robot follows, eventually colliding with the light. Notice that the robot can follow the light faithfully without any central processor to compute where the light is or the speed and direction of the light’s movement, or where the robot is and the speed and direction of its own movement. The robot never has to compute anything and evaluate the results in order to decide what to do next. All it has to do is respond directly to the relative amount of light reaching its sensors.
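A few lines of Python capture the wiring. The light values are arbitrary intensities, and the flag anticipates the ipsilateral wiring discussed later in this section.

```python
def motor_power(left_light, right_light, contralateral=True):
    """Wire each sensor directly to a wheel motor.  No world model,
    no computation of target position or speed: relative stimulation
    alone steers the robot."""
    if contralateral:
        left_motor, right_motor = right_light, left_light   # cross wiring
    else:                        # ipsilateral wiring turns the robot away
        left_motor, right_motor = left_light, right_light
    return left_motor, right_motor

# Light off to the robot's left: the right wheel gets more power,
# so the robot turns left, toward the light.
print(motor_power(left_light=0.9, right_light=0.2))   # -> (0.2, 0.9)
```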
Bottom-up robots have been around for some time. By the time of the Korean War, the U.S. military had already developed a system like the one in Fig. 8.3 to operate heat-seeking missiles. Heat-seeking missiles are fired at enemy aircraft, but they only have to get near enough to sense the heat of the enemy’s engines. With more than two pairs of jets, the heat-seeking missile can maneuver in three dimensions and track the enemy engine no matter what evasive action the pilot may take. Eventually, the missile collides with the enemy engine and destroys it.
It is a simple, cheap, and effective design that operates without a central processor. It never has to know anything about the location or the movements of a target, it never has to know anything about its own location or movements, either, and it never has to create a model of the world outside in order to decide what to do next. In fact, it never makes any specific response to any specific stimulus. It only responds to relative stimulation. To people who do not know how these robots work, they seem very clever. To the first pilot who was pursued and shot down by a heat-seeking missile, the robot must have seemed devilishly clever, indeed.
In Fig. 8.3, the connections between the sensors and the motors are contralateral, and the robot always moves toward the light. Suppose the connections are ipsilateral, instead—that is, the sensor on the left wired to the motor on the left and the sensor on the right wired to the motor on the right. With ipsilateral connections, the robot should always turn away from the light. If the sensors have wide-angle reception, like the eyes of many animals such as rabbits, the ipsilateral robot not only turns away from the light, but continues to move away until it gets to a dark place.
The behavior of the robot would be more complicated if it had another pair of sensors, say a pair of odor sensors. Suppose that the light sensors have ipsilateral connections to the motors and the odor sensors have contralateral connections. Then the robot would track odors in the dark and avoid light in general. With a little adjustment, the robot would approach when an odor is relatively strong and light is relatively weak. Such a robot would even dither a bit when attraction to the odor conflicted with repulsion from the light. In close cases, the robot would seem to be having trouble deciding what to do next. Brooks (1990) has already tested a robot that runs away from the light and into the darkest corner it can find, when it hears a loud sound. It then stays in the dark until an interval of quiet time has passed.
In recent times, engineers have used the simple principle of direct response to relative stimulation to produce more ingenious robots. Horswill and Brooks (1988), for example, mounted two small television cameras on a small robot about 2 feet long, and wired the robot (as in Fig. 8.3) to follow small solid objects about the size of a baseball. The vision of the cameras is very crude compared with the vision of most vertebrate animals, but they can sense simple objects and direct the robot motors to follow the objects. When someone dangles a ball on a string near this robot, the robot chases it like a kitten.
The robot also has sonar units that sense obstacles and it is wired to avoid obstacles before collision. If the ball moves over an object like a chair but still remains in view, then the robot goes around the chair. If a large object, like a utility cart, gets between the robot and the ball, the cameras lose sight of the target. The robot then remains oriented toward the last place where it detected a ball and stays oriented in this direction for a while. If the obstacle moves away soon enough, the robot picks up the target again and continues to follow it. If the obstacle remains in place too long, then the robot begins to wander randomly until it finds another target.
Again, this simple, autonomous robot can navigate around obstacles and deal with a wide variety of unpredictable situations without any cognition at all. It only has to make relative responses to relative sources of stimulation. Bottom-up robots can do without any internal representation of the outside world because they respond directly to the world, itself. They operate without constructing an internal model, because they use the world as a model of itself.
Building From the Bottom Up
Autonomous robots of the Brooks type are designed from the bottom up. The most basic element in the design is the system that avoids obstacles. The engineers perfect this system first. Next, the engineers design and perfect a simple system that makes the robot wander. This is necessary for situations in which the robot reaches a blank wall or some other obstacle. Without a spontaneous wandering system, it would continue to rest at a safe distance from the obstacle.
The next layer of the system begins to direct the robot. With two cameras, the robot can have depth perception to detect openings such as doorways and to go through them. One of Brooks’ requirements for bottom-up robots is that they can operate in a natural and unpredictable environment, such as the lobby outside his MIT lab where people come and go and move carts and furniture at unpredictable times. After entering a doorway, a robot called Herbert seeks out desk-size objects and approaches them. When Herbert is close enough, it passes a laser beam over the top of a desk till it detects an object like a soda pop can. Herbert’s arm then reaches out, grabs the can, and hefts it. If the can is heavy, Herbert replaces it. If the can is light, Herbert takes it away and deposits it in a recycling bin down the hall.
Bottom-up robots only need simple computers, so the robots can be quite small, often only about 1 foot long. Because they are so small and so cheap to build, a rocket ship to the moon could carry 100–200 of them. Any that failed to operate could be left behind with little loss. The odds are high that many of them would perform well enough to assure the success of a mission.
Bottom-Up Learning
Suppose that Herbert’s laser beam can scan only to the right or to the left. Suppose further that 80% of the empty cans in this environment are on the right and only 20% are on the left. To profit from this contingency, all Herbert would need is a circuit that prevented immediate repetition, right or left, but ensured repetition after a delay, say till the next desk.
This would guarantee that at the next desk, Herbert would beam in the same direction that was successful at the last desk. On average, Herbert would find a can on the first try 80% of the 80% of the time that the cans were on the right and 20% of the 20% of the time that cans were on the left. That is 64% plus 4% or 68% of the time, which is distinctly better than 50%. This could be improved by allowing a cumulative effect of repeated success in one direction. This would dampen the effect of infrequent instances of left-lying cans. By contrast, engineers could burden Herbert with onboard computing facilities that recorded past right and left successes and computed the contingencies. When Herbert had enough data to conclude that beaming to the right was the best strategy, the robot could beam to the right at every desk. That would increase performance to 100% of 80% or 80% success, only a modest improvement over 68%.
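The arithmetic is easy to confirm by simulation. The sketch below assumes a stationary 80/20 right-left split and gives Herbert exactly one bit of memory, the direction of the last success:

```python
import random

random.seed(0)
P_RIGHT = 0.8          # assumed proportion of cans lying on the right
DESKS = 100_000

def win_stay_first_try_rate():
    """Beam first in the direction that succeeded at the last desk."""
    last_success = "right"
    hits = 0
    for _ in range(DESKS):
        can_side = "right" if random.random() < P_RIGHT else "left"
        if last_success == can_side:
            hits += 1              # found the can on the first try
        last_success = can_side    # Herbert always finds the can eventually
    return hits / DESKS

print(win_stay_first_try_rate())   # ~0.68, matching 0.8*0.8 + 0.2*0.2
```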
The practical advantages of a simple-minded, win-stay system increase if there are three or more alternatives, as when cans can have right, center, and left or still more different positions (R. A. Gardner, 1957, 1961). An onboard computer that had to compute all of the contingencies would have to take into account an N × N matrix of reaches, successes, and failures. The last-trial-only learning system would only have to record the last successful reach. While modestly less accurate than a system that computed exact probabilities, it would make up for this by responding immediately to the prevailing contingency. Herbert could respond to changes in the right-left distribution of cans on desks without waiting until new instances averaged out the old computations, a serious problem in practical applications of neural network computing (Kosko, 1993, pp. 331–369).
With this bottom-up learning mechanism, Herbert could benefit from experience without discriminating correlation from causation and without evaluating, computing, or storing positive and negative hedonic contingencies. Herbert could use the recent history of the world as a model of the future and respond directly to that.
SUMMARY
Abundant evidence reviewed in earlier chapters shows that S-R contiguity is sufficient for learning. Ethological evidence shows that lever-pressing and key-pecking in the Skinner box or operant conditioning chamber are obligatory behaviors evoked by food or water. They persist even when the contingency between responses and incentives is negative. The idea that lever-pressing and key-pecking are arbitrary behaviors that animals only perform for contingent food or water has caused much confusion. Analysis of the yoked control experiment shows that, when food or water evokes a response, then it is impossible under any conceivable experimental arrangements to measure any additional effect of contingency beyond the well-established evidence for contiguity.
This chapter goes on to show how relatively simple, feed forward, ethological principles fit the facts of conditioning without recourse to cumbersome, hedonistic, feed backward principles of contingency. This chapter ends by contrasting the practical failures of top-down robotics with the practical successes of bottom-up robotics. A foraging robot with a simple, low-powered, bottom-up mechanism could learn much more cheaply and efficiently than a foraging robot that had to compute actual contingencies. The simpler, bottom-up system is also more flexible so that its advantages increase with complexity and natural fluctuations in practical foraging problems.