5


The Endpoint of Skilled Word Recognition: The ROAR Model

Gordon D. A. Brown

University of Warwick

The chapters in this book represent many different perspectives on the same landscape. This landscape includes the terrain that children must negotiate in order to reach the final destination that researchers and educators are all concerned to help them reach—the state of skilled adult reading. This chapter aims to describe what this destination looks like and, more important, why it is the chosen destination. If beginning readers are to be provided with the best possible maps and guides to the route they must follow, an understanding of why skilled adult readers converge on a particular destination (as opposed to some other one) is essential. The present chapter focuses on the nature of skilled single-word reading, because it is there that most progress has been made in understanding the demands of the task and the different types of solution that may be possible.1

The chapter begins with a brief description of the rational analysis approach to human cognitive processing (Anderson, 1990) according to which mature human cognitive behavior can be understood as an adaptive reflection of the structure of the world. When applied to reading, this perspective leads to the suggestion that the cognitive abilities of skilled adult readers should have developed in such a way that performance will be statistically optimal with respect to the structure of the English spelling-to-sound mapping system. This is the ROAR model (for rational, optimal, adaptive reading). The substantial amount of empirical evidence consistent with this suggestion is reviewed, as are the implications of the approach for learning. If the endpoint of learning to read can indeed be characterized as statistically optimal behavior, then the performance of artificial learning systems that learn statistically optimal behavior can provide important insights into the learning process. In particular, the study of such systems may lead to ways of structuring the learning process that facilitate the acquisition of a system involving both regular and irregular spelling-to-sound mappings.

THE IMPORTANCE OF THE “WHY” QUESTION

Many highly sophisticated models of reading have been developed over the past decade (e.g. Coltheart, Curtis, Atkins, & Haller, 1993; Norris, 1994; Plaut, McClelland, Seidenberg, & Patterson, 1996; Seidenberg & McClelland, 1989). Such models have been concerned to characterize the information-processing mechanisms that underlie skilled adult reading. However, this research has remained largely silent on a question that is surely of central importance: Why do skilled adult readers develop the mechanisms that they do? Indeed, the question seems often to be missed or ignored completely. To illustrate, imagine that some group of researchers developed a model of skilled adult reading that accounted for all known relevant empirical data. Let us call this “Model M.” Further assume that the predictions of Model M are tested experimentally, and that no disconfirming evidence can be found. The development of Model M would not be the end of the theoretical enterprise, because the mere existence of the model would bring us no nearer the question of why Model M (rather than, say, Model P or Model Q) was the one to describe human reading. Why, in other words, does human learning lead to the development of a system that behaves like Model M? Why is Model M the best one to have?

In the first section of this chapter one possible answer to this question is proposed. It is suggested that the properties of the mechanisms used by skilled adult readers can be understood as those of a system that is optimally adapted (i.e., maximally efficient) in the light of the statistical nature of the mapping between spelling and sound in English. This is the ROAR model. According to this view, empirical phenomena, such as the use of spelling-to-sound correspondences at multiple levels, can be explained in terms of the operation of a mechanism that is optimized for the task it must perform. Such a characterization cuts across any particular implementation of the mechanisms in question—it is possible to claim that skilled adult reading is statistically optimal without commitment as to the exact nature of the mechanisms that are involved in reading.

This leads to a view of reading as statistics that is, of course, not a novel one. The approach has been implicit in connectionist models of reading dating back to the mid-1980s, and has more recently been made more explicit in the work of, for example, Jared, McRae, and Seidenberg (1990) and, especially, Treiman, Mullennix, Bijeljac-Babic, and Richmond-Welty (1995). In the present chapter, however, the intention is to go beyond this general claim to argue that the reading as statistics approach can be used as the basis for understanding the question posed at the beginning of this chapter, namely, why do skilled adults possess the reading system that they do (i.e., as opposed to some other one). It is one thing to show that adult reading is in some way sensitive to the statistical regularities in the English spelling-to-sound mapping system; it is quite another to explain why this should be so and, in particular, why adults should be sensitive to these regularities in the precise way that they are.

The second part of the chapter reviews empirical evidence on skilled adult reading that is consistent with the ROAR approach, and this is taken as suggestive evidence that the endpoint of reading development is indeed optimal reading in the specific sense of statistical efficiency. In the third section of the chapter the implications of the new characterization of the endpoint of reading instruction for instructional practice are considered.

A RATIONAL ANALYSIS OF READING

This section describes the approach taken by Brown (1996) to characterizing skilled adult single-word reading as “adaptively rational” (Anderson, 1990). The suggestion is that such reading can be seen as representing optimal performance given the statistical properties of the task to be performed. This is the approach known as “rational analysis” that has been developed by Anderson (1990; Anderson & Milson, 1989) and others. A central idea of the rational analysis approach is that human psychological behavior can be understood in terms of the operation of a mechanism that is optimally adapted to its environment (Anderson, 1990), in the sense that the behavior of the mechanism is as efficient as it conceivably could be given the structure of the problem or input-output mapping it must solve. An example comes from the study of human memory, where Anderson and Milson (1989; cf. also Anderson & Schooler, 1991; Brown & Vousden, in press) showed that the rate at which information is lost from memory can be explained if it is assumed that the availability of information in memory reflects the probability that it will need to be accessed, a probability that declines as a function of the time since the memory was last accessed.

In applying the same approach to the study of single-word reading, our aim has been to develop an understanding of why the mechanisms that skilled adults have developed for reading have the properties they do—in particular, why do they exhibit the observed forms of sensitivity to the regularity or consistency of the spelling-to-sound correspondences contained in the words to be read? Note that the question of why (or, indeed, whether) it is adaptively rational, optimal, or at least efficient for humans to have a reading system that is organized in such a way as to have the observed empirical properties is quite independent of questions concerning the actual implementation of the relevant system. For example, much debate has centered on the ability of dual-route models (e.g. Coltheart et al., 1993) or connectionist models (e.g. Brown, 1987, 1997; Norris, 1994; Seidenberg & McClelland, 1989) of reading to account for the wide range of empirical data available concerning skilled adult decoding. However, even if it were to be shown that a particular dual-route model or a particular connectionist model were to provide a complete account of the relevant experimental findings, researchers would still be no nearer an understanding of why it is that skilled adults develop a dual-route system rather than a single-route connectionist system (or vice versa). In other words, understanding the nature of the mechanism that causes or enables us to read in a particular way is just one part of the research enterprise.

The ROAR model is based on what Brown (1996) termed the optimal reading hypothesis. The optimal reading hypothesis states that skilled adults are reading with maximum efficiency, given the statistical structure of the mapping from orthography to phonology in English. Thus, the process of learning to read is seen, in this view, as basically a statistical process in that it requires the learner to acquire a set of associations between written word forms and their pronunciations (Brown, 1996, 1997; Brown & Loosemore, 1994, 1995; cf. also Seidenberg & McClelland, 1989; Treiman et al., 1995). In other words, learners must infer a model (of the language) from the data to which they are exposed. If skilled adult reading is adaptively rational and statistically optimal, then the sensitivity of the adult reading mechanism to spelling-to-sound consistency should reflect optimal representation and usage of the statistical properties of the spelling-to-sound mapping system in English.

More specifically, Brown (1996) provided formal demonstrations (using Bayesian and other analyses) that the ratio of consistent pronunciations of a given orthographic segment to all pronunciations of that segment in the language should be the only spelling-to-sound factor to influence skilled adult reading. In other words, if a given orthographic segment O can receive different pronunciations with probabilities p1, p2, p3, and p4, then the difficulty of assigning the pronunciation that has probability p3 to O will be given by:

p3 / (p1 + p2 + p3 + p4)

This will be referred to as the consistency ratio of an orthographic segment. In the case of the orthographic rime segment -ave, for example, there is only one word, have, with the /Av/ pronunciation (we consider here only monosyllabic words with a frequency of at least one per million in the Kucera & Francis, 1967, count). However, there are 12 words with a different pronunciation of the same orthographic rime segment (save, cave, gave, etc.). Therefore, if the consistency ratio is calculated in terms of word types, it is equal to 1/(1 + 12) = 0.08. If word frequency is taken into account, the consistency ratio (by tokens) is equal to 0.89—a much higher value, because the token frequency of have is very high in relation to the summed frequency of its inconsistently pronounced neighbors. Treiman et al. (1995) examined consistency ratios (and other measures) by both types and tokens, and both measures are also used in the analyses reported later in this chapter.
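As a concrete illustration of this calculation, the short sketch below (in Python) computes type- and token-based consistency ratios for the -ave neighborhood. The word list and frequency counts are approximate, illustrative values standing in for the Kucera and Francis (1967) norms, so the exact outputs should not be treated as corpus statistics.

```python
# Hypothetical neighborhood for the orthographic rime -ave: each entry is
# (word, rime pronunciation, approximate frequency per million). Frequencies
# are illustrative stand-ins for Kucera and Francis (1967) counts.
ave_neighborhood = [
    ("have", "/Av/", 3941),   # the single word with the exceptional pronunciation
    ("gave", "/eIv/", 285), ("save", "/eIv/", 62), ("wave", "/eIv/", 46),
    ("grave", "/eIv/", 33), ("slave", "/eIv/", 30), ("brave", "/eIv/", 26),
    ("cave", "/eIv/", 10), ("shave", "/eIv/", 5), ("pave", "/eIv/", 3),
    ("rave", "/eIv/", 2), ("crave", "/eIv/", 2), ("knave", "/eIv/", 1),
]

def consistency_ratio(neighborhood, target_pronunciation, by="types"):
    """Ratio of consistent pronunciations of a rime to all of its pronunciations.

    by="types"  counts each word once; by="tokens" weights words by frequency.
    """
    weight = (lambda f: 1) if by == "types" else (lambda f: f)
    consistent = sum(weight(f) for _, p, f in neighborhood if p == target_pronunciation)
    total = sum(weight(f) for _, p, f in neighborhood)
    return consistent / total

# Difficulty of assigning /Av/ to -ave (i.e., of reading "have"):
print(round(consistency_ratio(ave_neighborhood, "/Av/", by="types"), 2))   # 1/13 = 0.08
print(round(consistency_ratio(ave_neighborhood, "/Av/", by="tokens"), 2))  # roughly 0.89 with these counts
```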

Note that the claim that only the consistency ratio of an orthographic segment will determine its ease of pronunciation is not a trivial one: It excludes, for example, the possibility that the overall frequency with which an orthographic letter pattern receives a given pronunciation will independently determine pronunciation time or accuracy, and it also excludes the possibility that the overall frequency with which an orthographic pattern receives a different pronunciation to the one intended should influence processing. These factors (numbers of consistent and inconsistent orthographic neighbors) should only, according to the optimal reading hypothesis, influence pronunciation in so far as they contribute to the consistency ratio. Thus, the optimal reading model makes a strong prediction regarding the precise quantitative form of the relationship between reading and spelling-to-sound consistency. What the rational analysis has provided is a specific hypothesis concerning why and how skilled adult reading should reflect the statistical structure of the language. The next section focuses on evidence consistent with the account of how the sensitivity should manifest itself; that is, what is the precise quantitative form of the relationship between spelling-to-sound consistency and actual reading performance as assessed, for example, by word naming latency and accuracy?

EVIDENCE FOR OPTIMAL READING

The central prediction of the described approach is that single-word reading performance should be influenced by the consistency ratio of the orthographic segments within a word, and not by the absolute frequencies of the orthographic or phonological rimes. An immediate problem in evaluating this prediction with respect to real empirical data is that spelling-to-sound regularities must be defined over some specific units in the language. For example, is it regularities at the level of graphemes and phonemes that matter, or the statistical relationships between larger units such as phonological rimes (e.g. the /Av/ in have) and the orthographic letter clusters that correspond to them (following other researchers, these are referred to as orthographic rimes, analogously to the concept of graphemes as the orthographic units corresponding to a single phoneme)? In line with the optimal reading hypothesis, it might be assumed that optimal readers will make use of whatever levels of spelling-to-sound correspondence are most useful (i.e., provide the most reliable guide to pronunciations). Treiman et al. (1995) adopted this perspective, and showed that, in English, the most reliable sublexical guide to the pronunciation of the vowels (where much of the inconsistency resides) of monosyllabic English words often resides in the correspondence between orthographic and phonological rimes.

Consistent with this, Treiman et al. found substantial evidence that skilled adults do indeed make use of such orthographic units, in that their single-word naming latencies and error rates are affected by spelling-to-sound consistency at the rime level independent of consistency at other levels. However, Treiman et al. also found independent effects of the consistency of other orthographic segments—such as graphemes at the beginnings of words and sometimes at the ends, as well as medial vowels. Many other studies have also found evidence of consistency effects at the rime level, but the Treiman et al. study is unique in its careful comparison of different levels of spelling-to-sound correspondence.

The analyses I now describe focus on spelling-to-sound consistency at the rime level, but this primarily reflects the fact that most studies that provide quantitative information on effects of different levels of spelling-to-sound consistency have focused on this particular level. As described previously, the ROAR approach predicted that naming latency and accuracy should be determined by the consistency ratio of any given orthographic segment—the ratio of consistent to all pronunciations of that orthographic segment in the language. Several studies have shown that the consistency with which an orthographic rime is pronounced affects the speed and accuracy of pronunciation of words containing that orthographic rime (e.g. Bowey, 1996; Bowey & Hansen, 1994; Brown, 1987; Brown & Watson, 1994; Coltheart & Leahy, 1992; Glushko, 1979; Jared et al., 1990; Laxon, Masterson, & Coltheart, 1991; Laxon, Masterson, & Moran, 1994; Seidenberg, Waters, Barnes, & Tanenhaus, 1984; Taraban & McClelland, 1987; Treiman, Goswami, & Bruck, 1990; Waters & Seidenberg, 1985; Waters, Seidenberg, & Bruck, 1984). Jared et al. (1990) concluded, on the basis of many studies carried out by themselves and others, that word-naming latency is determined by the relative frequencies of the consistently and inconsistently pronounced orthographic neighbors of a word (orthographic neighborhood here being defined in terms of other words in the language sharing the same orthographic rime unit). Consistent with this, they found larger exception effects (disadvantages for words with mainly inconsistently pronounced neighbors) for low-frequency words with few consistently pronounced neighbors. This led Jared et al. to suggest that a word’s consistently and inconsistently pronounced orthographic neighbors combine to determine its naming time.

Furthermore, they provided some evidence that it was token frequency (i.e., the summed frequency for all neighbors) rather than type frequency (number of different words in the neighborhood, ignoring frequency) that was relevant in determining pronunciation time. Thus, there is ample empirical evidence in support of the general conclusion that the consistency of pronunciation of a word’s orthographic rime will determine the ease of its reading.

Treiman et al. (1995) were the first to go beyond this general claim, and they did so in two important ways. First, they examined the ability of a quantitative measure of consistency to predict individual word-naming latencies and error rates. Specifically, the measure they used was the ratio of consistent to all pronunciations of a given orthographic segment (note that this is exactly the consistency ratio that the rational analysis discussed earlier predicted should be influential). Second, they directly compared the ability of the consistency of different orthographic segments (at different levels) to predict latencies and accuracies. This allowed conclusions to be drawn about the size of orthographic units used by skilled readers. Treiman et al. found that the consistency ratio of orthographic rimes in monosyllabic words accounted for independent variance in both naming latency and error rate. This finding was obtained in two large analyses, each of several hundred words. They also found effects of other units (e.g., the graphemes at the beginnings and ends of words), but here the focus is on orthographic rime effects. The fact that Treiman et al. found these strong independent effects of rime consistency ratio is highly consistent with the ROAR framework described previously.

However, further evidence could be obtained if it could be shown that this consistency ratio is the only spelling-to-sound measure to influence performance, as predicted by the optimal reading account. To evaluate this further, Brown (1996) examined the ability of the rime consistency ratio measure to account for variance in other published data. In one of their experiments, Jared et al. (1990) measured naming latency for eight different word types, varying in spelling-to-sound consistency. Both consistent and inconsistent words were used, with either high- or low-summed frequencies of consistent and inconsistent neighbors. Brown (1996) found that the simple consistency ratio was very highly correlated (around 0.95) with mean naming latency for the different word types, and also highly correlated with error rate. A regression with consistency ratio as the only predictor predicted every one of the eight mean naming latencies to within 4 milliseconds. Partial correlations were also carried out, to examine whether the total frequency of consistently pronounced neighbors, or the total number of inconsistently pronounced neighbors, would be related to naming latency or error rate after the effects of the consistency ratio measure were taken into account. Although the results of this analysis should be treated with considerable caution due to the small number of data points, in all cases it was found that there was no independent correlation between performance and the other measures of the spelling-to-sound characteristics of the rimes in the different word types. The comparisons between different spelling-to-sound measures are not strong evidence in themselves. However, in combination with the Treiman et al. (1995) results, the analyses do provide some evidence consistent with the optimal reading hypothesis.
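The logic of this kind of analysis can be sketched as follows: regress mean naming latency on consistency ratio, and then ask, via a partial correlation, whether some other spelling-to-sound measure accounts for any of the residual variance. The eight data points below are invented for illustration; they are not the Jared et al. (1990) means or the values analyzed by Brown (1996).

```python
# A minimal sketch of a regression plus partial-correlation analysis of the kind
# described in the text. All numbers are made up for illustration.
import numpy as np
from scipy import stats

cr      = np.array([0.95, 0.90, 0.80, 0.70, 0.55, 0.40, 0.25, 0.10])  # consistency ratios
latency = np.array([545., 548., 552., 557., 563., 570., 577., 586.])  # mean naming RTs (ms)
enemy_f = np.array([  5.,  12.,  30.,  25.,  60.,  45.,  90., 120.])  # summed enemy frequency

# Regression of latency on consistency ratio alone.
slope, intercept, r, p, se = stats.linregress(cr, latency)
print(f"latency = {intercept:.1f} + {slope:.1f} * CR,  r = {r:.3f}")

def residuals(y, x):
    """Residuals of y after regressing out x."""
    b, a, *_ = stats.linregress(x, y)
    return y - (a + b * x)

# Partial correlation of enemy frequency with latency, controlling for CR:
# correlate the residuals of each variable after CR has been partialed out.
r_partial, p_partial = stats.pearsonr(residuals(latency, cr), residuals(enemy_f, cr))
print(f"partial r (enemy frequency | CR) = {r_partial:.3f}, p = {p_partial:.3f}")
```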

In an attempt to predict a larger number of data points, Brown (1996) used the consistency ratio measure to predict the results of Jared et al.’s meta-analysis of more than 20 studies of consistency effects previously reported in the literature. All of these studies examined spelling-to-sound consistency effects at the rime level, and it was on the basis of their meta-analyses of the numbers of consistently and inconsistently pronounced orthographic neighbors of the words used in the various studies that Jared et al. concluded that these numbers combine and conspire to determine a word’s naming latency. For present purposes, the question of interest is whether the effect size can be predicted by the consistency ratios of the orthographic rimes of the stimuli used in the various experiments brought together by Jared et al.2

Figure 5.1 shows the effect sizes in 20 of the experiments described by Jared et al. as a function of consistency ratio as calculated by Brown (1996) from the data provided in Jared et al. (1990).3 It can be seen that, consistent with the optimal reading hypothesis, there is an extremely good relation between spelling-to-sound consistency ratio and the size of the observed consistency effect. Partial correlations again found that the overall frequency of consistently or inconsistently pronounced neighbors of the word failed to correlate independently with naming latencies. This analysis therefore provides some tentative evidence consistent with the claim embodied in the ROAR model; namely, the claim that consistency ratio should be the only measure of spelling-to-sound consistency to account for variation in performance in skilled adult reading. Further evaluation of this claim will be important to distinguish the specific predictions of the ROAR model from the more general claim, made by a number of researchers, that the relative proportions of consistently and inconsistently pronounced orthographic neighbors will conspire in some way to determine naming latency or accuracy.


FIG. 5.1. Observed and predicted consistency effect sizes (analysis from Brown, 1996; data from Jared et al., 1990). CR = consistency ratio.

The studies and analyses described here have focused on the effect of consistency of pronunciation of orthographic rimes, because it is on that issue that most of the data are available. Such quantitative evidence as exists for skilled adult reading concerning the effects of consistency at other levels of analysis (e.g., of individual graphemes) comes mainly from the analyses carried out by Treiman et al. (1995). Such effects as were found in that study were also effects of the consistency ratio of the orthographic units examined. Thus, these results are also consistent with the predictions of the ROAR model, although they do not provide strong evidence for it. Further evidence will be necessary to compare the effects of different quantitative measures of spelling-to-sound consistency for various levels of correspondence. Brown (1996) reviewed further evidence consistent with the optimal reading analysis, and also suggested that the approach could account for the widely observed interactions between spelling-to-sound consistency and word frequency.

IMPLICATIONS FOR LEARNING

This section examines some of the implications for learning of taking seriously the view of reading as statistics described previously. If the endpoint of successful learning to read can be characterized as an internalization of the statistical regularities embodied in the English spelling-to-sound mapping system, then our understanding of the best way to attain that endpoint may be increased by examination of other learning systems that are designed to do exactly that: connectionist learning systems. Connectionist models of learning work by extracting out underlying statistical structure from a set of learned exemplars. They can be seen as performing statistical inference, in that they infer a model from a set of data (see Chater, 1995, for extensive discussion). Over the last few years, connectionist models of learning have provided us with important insights about the nature of statistical learning systems and the conditions under which different kinds of input-output mapping systems are easy or difficult to learn. This section briefly reviews some of the conclusions that have emerged from this work, because they have important potential implications for the process of learning to read. First, an important caveat: The present chapter focuses mainly on evidence relevant to the order of learning of regular and irregular words in beginning reading. In order to emphasize the issues that concern us here, motivational issues are largely ignored on the assumption that they cut across the questions addressed here. The importance of providing a rich and rewarding environment in which reading learning can take place is clear; here, the focus is on general issues concerning the learning of a system containing regular and exceptional items that may apply independent of the wider framework that is adopted.

Catastrophic Interference

Studies of associative learning in both human and connectionist learning systems have paid much attention to the phenomenon of catastrophic interference (Barnes & Underwood, 1959; McCloskey & Cohen, 1989; Ratcliff, 1990). The phenomenon is most easily illustrated using the paradigm classically used to demonstrate it. Barnes and Underwood (1959) trained subjects to associate items from one list (List A) with items from another list (List B). When these A-B associations had been learned, a second set of associations was learned, in this case between List A items and a new set of items, List C items. The question of interest is the extent to which the more recently learned A-C associations interfere with or overwrite the first-learned A-B associations. Barnes and Underwood found gradual and incomplete forgetting of the A-B associations throughout learning of the A-C associations, and even when the A-C associations were completely learned, much memory was retained for the original A-B associations. In other words, in human learners the learning of new information only partially and gradually interferes with existing stored knowledge. This paradigm has attracted particular interest in recent years because many connectionist models of learning and memory behave quite differently: Learning the new set of associations (the A-C associations) causes rapid and total forgetting of the first-learned A-B associations (e.g., McCloskey & Cohen, 1989). This is catastrophic interference.

Ratcliff (1990) further analyzed the conditions under which catastrophic interference occurs. The AB-AC paradigm, as it has become known, captures an important aspect of the task facing children as they learn to read. In this case, orthographic and phonological representations are the patterns to become associated with one another. Just as more than one item becomes associated with each List A item in the Barnes and Underwood paradigm, more than one phonological pattern may become associated with a given orthographic segment. Thus, it would clearly be undesirable if, once a child had learned that the orthographic sequence -ave is pronounced /eIv/ (as in rave, gave, save, etc.), subsequent learning that -ave is sometimes pronounced /Av/ (as in have) completely wiped out prior knowledge of the original association. In other words, it is important to ensure that catastrophic interference is avoided in learning the mapping from spelling to sound. In the connectionist modeling literature, various solutions to the problem of catastrophic interference have been proposed (e.g. French, 1992; Lewandowsky, 1991). However, of most interest to the present proposal is a kind of learning known as interleaved learning (McClelland, McNaughton, & O’Reilly, 1995).
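The flavor of this phenomenon can be conveyed with a small simulation of the AB-AC paradigm using a back-propagation network. The pattern sizes, learning rate, and training regime below are arbitrary illustrative choices rather than those of any published simulation; the point is only that accuracy on the A-B list collapses rapidly once A-C training begins.

```python
# A small numpy sketch of the AB-AC interference paradigm applied to a
# back-propagation network. All sizes and parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_items, n_in, n_hid, n_out = 8, 20, 15, 20

A = rng.integers(0, 2, (n_items, n_in)).astype(float)   # shared "stimulus" patterns
B = rng.integers(0, 2, (n_items, n_out)).astype(float)  # first-list responses
C = rng.integers(0, 2, (n_items, n_out)).astype(float)  # second-list responses

W1 = rng.normal(0, 0.3, (n_in, n_hid))
W2 = rng.normal(0, 0.3, (n_hid, n_out))
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def forward(X):
    h = sigmoid(X @ W1)
    return h, sigmoid(h @ W2)

def train_epoch(X, T, lr=0.5):
    """One epoch of online back propagation over all items."""
    global W1, W2
    for x, t in zip(X, T):
        h, o = forward(x[None, :])
        d_out = (o - t) * o * (1 - o)          # output delta (squared error, sigmoid)
        d_hid = (d_out @ W2.T) * h * (1 - h)   # backpropagated hidden delta
        W2 -= lr * h.T @ d_out
        W1 -= lr * x[None, :].T @ d_hid

def accuracy(X, T):
    _, o = forward(X)
    return np.mean((o > 0.5) == T.astype(bool))

for _ in range(300):                           # learn the A-B list first
    train_epoch(A, B)
print("A-B accuracy after A-B training:", accuracy(A, B))

for epoch in range(20):                        # then learn the A-C list
    train_epoch(A, C)
    if epoch % 5 == 4:
        print(f"A-C epoch {epoch + 1}: A-B acc = {accuracy(A, B):.2f}, "
              f"A-C acc = {accuracy(A, C):.2f}")
```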

Interleaved Learning and Gradient Descent Learning

A recent insight into connectionist learning systems concerns the need for two completely different kinds of learning to be possible (McClelland et al., 1995). On the one hand, it is necessary to be able to store distinguishable representations of (perhaps similar) events after a single exposure. On the other hand, it is by now well established that slow incremental learning procedures, in which the weights in a network change just a small amount after presentation of each item in the set to be learned, are good for extracting underlying regularities in the mapping to be learned. In such systems, similar representations are assigned to similar items. Thus, these slow incremental learning procedures (e.g., gradient descent learning algorithms, such as back propagation in connectionist nets) are well suited to developing systems that are good at generalization. This is because the structure of the underlying system only changes by a small amount in response to each presented exemplar, and therefore a single atypical instance cannot have too great an influence on the underlying structure of the network. This leads to a system that is (a) sensitive to the underlying regularities in the mapping to be learned, and (b) good at generalization, because it has extracted what is common to all the examples with which it has been presented. It is therefore no coincidence that slow incremental learning procedures such as the back propagation or other error-correcting rules widely used in connectionist systems have provided good models of the learning of underlying regularities inherent in mappings such as the English spelling-to-sound mapping system (Plaut et al., 1996; Seidenberg & McClelland, 1989) or the English verb tense system (Plunkett & Marchman, 1991, 1993; Rumelhart & McClelland, 1986).

A further important characteristic of these systems is that they avoid the catastrophic interference referred to earlier because they use what McClelland et al. (1995) referred to as interleaved learning. This means that different examples are not presented all at once, in blocks, but rather are “interleaved” (so that, e.g., irregular examples would be interspersed among regular exemplars). This has the additional consequence that the relative frequencies of different exemplars can be accurately represented in the connection strengths of the fully learned system. It remains to be seen how far such findings can be generalized to the case of learning to read. However, on the basis of computational results already established, it seems likely that interleaved learning will provide the most efficient method of ending up with a system in which both regular and irregular spelling-to-sound correspondences can be represented. Thus, for example, massed practice of regular or consistent items, followed by massed practice of irregular items, would be likely (if the previous analysis applies to real reading learning) to lead to undesirable forgetting of the originally learned regularities. Given that the final system must encode both regular spelling-to-sound correspondences and exceptions to those regularities, it seems likely that interleaved learning of regular and exceptional items will be most efficient.4
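The contrast between the two presentation schedules can be made concrete with a short sketch. The word lists and frequencies below are purely illustrative; either schedule could be used to drive an incremental learner of the kind sketched in the previous example.

```python
# A sketch of the two presentation schedules contrasted above: "blocked" presents
# all regular trials before any exception trials, whereas "interleaved" mixes
# exceptions among regulars in proportion to (hypothetical) token frequency.
import random

random.seed(0)
regular    = {"gave": 285, "save": 62, "wave": 46, "cave": 10}   # illustrative frequencies
exceptions = {"have": 3941}

def blocked_schedule(n_trials):
    """Massed practice: all regular trials (here, the first half) before any exception trials."""
    half = n_trials // 2
    reg = [random.choice(list(regular)) for _ in range(half)]
    exc = [random.choice(list(exceptions)) for _ in range(n_trials - half)]
    return reg + exc

def interleaved_schedule(n_trials):
    """Frequency-weighted sampling: exceptions interspersed among regulars."""
    words = list(regular) + list(exceptions)
    weights = [regular.get(w, 0) + exceptions.get(w, 0) for w in words]
    return random.choices(words, weights=weights, k=n_trials)

print(blocked_schedule(12))
print(interleaved_schedule(12))
```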

Little empirical investigation has been done on the issue of how to introduce regular and irregular items in children’s reading, although Kryzanowski and Carnine (1980), among others, found the expected general advantage for distributed versus massed practice—in one study, nearly twice as many correct posttraining naming responses were made to letters when they had been presented in distributed rather than blocked patterns (cf. also Rea & Modigliani, 1985). This study only examined responses to single letters, however, and it is possible that interleaved learning may be particularly important under conditions where multiple responses must be associated with a single input (as with ambiguously or inconsistently pronounced orthographic segments).

Thus, there is at least suggestive evidence that interleaved learning may provide the best way of avoiding catastrophic interference in learning input-output mapping systems such as the English spelling-to-sound mapping system, as well as extracting out the relevant underlying structural regularities that will allow generalization. It remains to be seen whether the predictions of this account can be confirmed experimentally.

Additional evidence concerning the conditions under which it is easiest for regular and exceptional items to become represented in the same system is reviewed next.

THE IMPORTANCE OF RELATIVE FREQUENCY

In studying the acquisition of verb tense learning, Plunkett and Marchman (1991) delineated some important constraints on the conditions under which regularities and exceptions/subregularities can become represented within the same system. A complete review of this research is beyond the scope of this chapter, but for present purposes one significant conclusion may be emphasized. This is that, in a simple associative network, exceptions can only become stably represented in a system where there are underlying and conflicting regularities if the exception items are sufficiently high in frequency. If the (token) frequency of exceptional items is too low, then it is difficult or impossible for the correct output to become associated with them, because the tendency to regularize those items, driven by the high frequency of regular items, becomes too strong. Therefore, it appears to be no accident that in language systems such as the English spelling-to-sound system or verb tense formation the irregular items tend to be high in token frequency. This appears to be a computational consequence of the need to represent exceptions and regularities in the same system.
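A minimal single-layer illustration of this point is sketched below. The input coding (a distinct onset unit for each word plus a single shared rime unit), the frequencies, and the fixed training budget are all illustrative assumptions; the qualitative pattern of interest is simply that, under frequency-weighted sampling, a low-frequency exception tends to be regularized whereas a high-frequency exception is learned.

```python
# A single-layer (delta-rule) sketch: words sharing the rime -ave are coded as a
# distinct onset unit plus one shared rime unit, and the output unit should be 1
# for the exceptional vowel (/Av/ in "have") and 0 for the regular vowel (/eIv/).
# Frequencies and the training budget are illustrative.
import numpy as np

rng = np.random.default_rng(1)
words = ["gave", "save", "wave", "cave", "rave", "have"]
is_exception = np.array([0, 0, 0, 0, 0, 1.0])            # target: 1 = /Av/, 0 = /eIv/

# Input coding: a distinct onset unit per word plus one shared rime unit.
X = np.hstack([np.eye(len(words)), np.ones((len(words), 1))])

def train(exception_freq, n_trials=2000, lr=0.1):
    freqs = np.array([50, 50, 50, 50, 50, exception_freq], dtype=float)
    p = freqs / freqs.sum()
    w = np.zeros(X.shape[1])
    for _ in range(n_trials):
        i = rng.choice(len(words), p=p)                   # frequency-weighted sampling
        out = 1 / (1 + np.exp(-X[i] @ w))                 # sigmoid output unit
        w += lr * (is_exception[i] - out) * X[i]          # delta-rule weight update
    return 1 / (1 + np.exp(-X @ w))                       # final outputs for all words

for f in (1, 50, 500):
    outputs = train(exception_freq=f)
    print(f"exception token frequency {f:4d}: output for 'have' = {outputs[-1]:.2f}")
```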

Again, this has important implications for learning. It suggests that it might be a mistake to avoid exceptional items as far as possible in the early stages of learning to read, because the irregular items will be swamped and become hard or impossible to represent if they are too low in token frequency. Again, there is a dearth of empirical evidence on this question. Although there is as yet no evidence against it, the conclusion must remain tentative, and the additional possibility remains that an early emphasis on regular items may lead to a useful focus on the levels of representation over which regularities may be defined. The importance of representations is discussed in the next section.

THE IMPORTANCE OF REPRESENTATION

Subsequent to the influential connectionist reading model of Seidenberg and McClelland (1989), much research has demonstrated that the nature of the representations that a connectionist model of reading is provided with has a major impact on its performance. The Seidenberg and McClelland model was criticized for its poor nonword reading performance (Besner, Twilley, McCann, & Seergobin, 1990), but improved nonword performance can be obtained if more fine-grained input and output representations are used. These allow the network to more easily capture generalizations at the level of graphemes and phonemes (e.g., Brown, 1997; Bullinaria, 1995; Norris, 1994; Plaut et al., 1996). Therefore, a model that is provided with explicit representations at many different levels will perform well (Norris, 1994; see also Phillips, Hay, & Smith, 1993). Furthermore, it has been argued that some paradoxical deficits associated with developmental dyslexia may be explained in terms of the computational capacity of a network (Brown & Loosemore, 1995; Seidenberg & McClelland, 1989) or, alternatively, in terms of the specificity of the representations given to the model (Brown, 1997; Metsala & Brown, in press). Finally, it is well-established that a network will learn more easily if it is provided with prestructured phonological output representations (Harm, Altmann, & Seidenberg, 1994; Hulme, Snowling, & Quinlan, 1991). Other extensions of the approach have been more concerned to introduce alternative routes into the model.

This evidence for the importance of representation will, of course, come as no surprise to those involved in reading instruction (see, e.g., Stahl & Murray, chap. 3, this volume; Torgesen & Burgess, chap. 7, this volume). However, the computational results described here show how the importance of developing the right phonological and orthographic representations can be understood in terms of a system developing toward an endpoint that is characterized by statistically optimal behavior in a given cognitive domain.

DISCUSSION

The aim of this chapter has been to show that new perspectives on the nature of skilled adult reading may have important implications for our understanding of the best way to teach reading skills. Although much further research remains to be done, empirical evidence consistent with the idea that skilled adult reading involves statistically optimal or adaptively rational behavior has been reviewed. Thus, the process of learning to decode the spelling-to-sound system can be seen as just one instantiation of the general cognitive process of coming to internalize the statistical structure of the environment. This view leads naturally to an emphasis on the role of task structuring in instruction.

A consequence of this perspective is that general results obtained from the study of statistical learning become relevant to reading instruction. The final section drew attention to some results from the study of learning in connectionist systems and suggested that these results might have implications for the process of literacy instruction. Relatively little empirical work has addressed these issues. Some relevant work was conducted by Becker, Carnine, Engelmann, and their colleagues in the early 1980s in the context of their theory of instruction. The direct instruction model (e.g., Becker, Engelmann, Carnine, & Rhine, 1981) was based on the assumption that classroom learning can be enhanced by careful engineering of students’ interaction with the environment to be learned. This research program emphasized the importance of the analysis of sameness and difference among examples used in teaching. The aim was to identify, in the input to be learned, the structural basis for generalization. In general terms this fits well with the insights from computational learning described previously. As Carnine and Becker (1982) pointed out, an enormous amount of work on stimulus generalization has been carried out under the banner of learning theory. They focused on the implications for generalization in learning to read. How does one structure a set of examples to ensure maximum generalization, and how should the design be informed by the underlying structure of sameness and difference in the examples to be used? Much of the direct instruction approach was directed toward drawing attention to significant contrasts, by using sets of training exemplars that are only minimally different from one another (see, e.g., Carnine & Becker, 1982). On the basis of the computational results outlined earlier, it is suggested that there is ample reason to believe that learning of the English spelling-to-sound mapping system may be greatly facilitated if regular and inconsistent items are introduced in an appropriate order, with appropriate frequency, and if practice is carefully scheduled to facilitate maintenance and generalization (Rea & Modigliani, 1985; Schmidt & Bjork, 1992). There is little evidence that current educational practice approaches optimality in any of these respects.

ACKNOWLEDGMENTS

The research reported here was partially supported by a grant from the Economic and Social Research Council (U.K.), R000 23 2576. I thank Jonathan Solity for many helpful discussions. Correspondence concerning this article should be addressed to Gordon D. A. Brown, Department of Psychology, University of Warwick, Coventry, CV4 7AL, U.K. E-mail: [email protected]

REFERENCES

Anderson, J. R. (1990). The adaptive character of thought. Hillsdale, NJ: Lawrence Erlbaum Associates.

Anderson, J. R., & Milson, R. (1989). Human memory: An adaptive perspective. Psychological Review, 96, 703–719.

Anderson, J. R., & Schooler, L. J. (1991). Reflections of the environment in memory. Psychological Science, 2, 396–408.

Barnes, J. M., & Underwood, B. J. (1959). “Fate” of first-list associations in transfer theory. Journal of Experimental Psychology, 58, 97–105.

Becker, W. C., Engelmann, S., Carnine, D. W., & Rhine, W. R. (1981). Direct instruction model. In W. R. Rhine (Ed.), Making schools more effective. New York: Academic.

Besner, D., Twilley, L., McCann, R. S., & Seergobin, K. (1990). On the association between connectionism and data: Are a few words necessary? Psychological Review, 97, 432–446.

Bowey, J. A. (1996). Phonological recoding of nonword orthographic rime primes. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22, 117–131.

Bowey, J. A., & Hansen, J. (1994). The development of orthographic rimes as units of word recognition. Journal of Experimental Child Psychology, 58, 465–488.

Brown, G. D. A. (1987). Resolving inconsistency: A computational model of word naming. Journal of Memory and Language, 26, 1–23.

Brown, G. D. A. (1996). A rational analysis of reading: Spelling-to-sound translation is optimal. Manuscript submitted for publication.

Brown, G. D. A. (1997). Developmental and acquired dyslexia: A connectionist comparison. Brain and Language.

Brown, G. D. A., & Loosemore, R. L. (1994). Computational approaches to normal and impaired spelling. In G. D. A. Brown & N. C. Ellis (Eds.), Handbook of spelling: Theory, process and application (pp. 9–33). Chichester, England: Wiley.

Brown, G. D. A., & Loosemore, R. L. (1995). A computational approach to dyslexic reading and spelling. In C. K. Leong & R. M. Joshi (Eds.), Developmental and acquired dyslexia: Neuropsychological and neurolinguistic perspectives (pp. 195–219). Dordrecht, The Netherlands: Kluwer.

Brown, G. D. A., & Vousden, J. I. (in press). Adaptive analysis of sequential behaviour: Oscillators as rational mechanisms. To appear in M. Oaksford & N. Chater (Eds.), Rational models of cognition. Oxford, England: Oxford University Press.

Brown, G. D. A., & Watson, F. L. (1994). Spelling-to-sound effects in single-word reading. British Journal of Psychology, 85, 181–202.

Bullinaria, J. D. (1995). Neural network models of reading without wickelfeatures. In J. Levy, D. Bairaktaris, J. Bullinaria, & D. Cairns (Eds.), Connectionist models of memory and language (pp. 161–178). London: UCL Press.

Carnine, D. W., & Becker, W. C. (1982). Theory of instruction: Generalisation issues. Educational Psychology, 2, 249–262.

Chater, N. (1995). Neural networks: The new statistical models of mind. In J. Levy, D. Bairaktaris, J. Bullinaria, & D. Cairns (Eds.), Connectionist models of memory and language (pp. 207–228). London: UCL Press.

Coltheart, M., Curtis, B., Atkins, P., & Haller, M. (1993). Models of reading aloud: Dual-route and parallel-distributed-processing accounts. Psychological Review, 100, 589–608.

Coltheart, V., & Leahy, J. (1992). Children’s and adults’ reading of nonwords: Effects of regularity and consistency. Journal of Experimental Psychology: Learning, Memory and Cognition, 18, 718–729.

French, R. M. (1992). Semi-distributed representations and catastrophic forgetting in connectionist networks. Connection Science, 4, 365–377.

Glushko, R. J. (1979). The organization and activation of orthographic knowledge in reading aloud. Journal of Experimental Psychology: Human Perception and Performance, 5, 674–691.

Harm, M., Altmann, L., & Seidenberg, M. S. (1994). Using connectionist networks to examine the role of prior constraints in human learning. Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society (pp. 392–396). Hillsdale, NJ: Lawrence Erlbaum Associates.

Hulme, C., Snowling, M., & Quinlan, P. (1991). Connectionism and learning to read: Steps towards a psychologically plausible model. Reading and Writing, 3(2), 159–168.

Jared, D., McRae, K., & Seidenberg, M. S. (1990). The basis of consistency effects in word naming. Journal of Memory and Language, 29, 687–715.

Kryzanowski, J. A., & Carnine, D. W. (1980). Effects of massed versus distributed practice schedules in teaching sound-symbol correspondences to young children. Journal of Reading Behavior, 8, 225–229.

Kucera, H., & Francis, W. (1967). Computational analysis of present-day American English. Providence, RI: Brown University Press.

Laxon, V., Masterson, J., & Coltheart, V. (1991). Some bodies are easier to read: The effect of consistency and regularity on children’s reading. Quarterly Journal of Experimental Psychology, 43A, 793–824.

Laxon, V., Masterson, J., & Moran, R. (1994). Are children’s representations of words distributed? Effects of orthographic neighbourhood size, consistency and regularity of naming. Language and Cognitive Processes, 9, 1–27.

Lewandowsky, S. (1991). Gradual unlearning and catastrophic interference: A comparison of distributed architectures. In W. E. Hockley, & S. Lewandowsky (Eds.), Relating theory and data: Essays on human memory in honor of Bennet B. Murdoch (pp. 445–476). Hillsdale, NJ: Lawrence Erlbaum Associates.

McClelland, J. L., McNaughton, B. L., & O’Reilly, R. C. (1995). Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. Psychological Review, 102, 419–457.

McCloskey, M., & Cohen, N. J. (1989). Catastrophic interference in connectionist networks: The sequential learning problem. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 24, pp. 109–165). New York: Academic.

Metsala, J. L., & Brown, G. D. A. (in press). The development of orthographic units in reading and spelling. To appear in R. M. Joshi & C. Hulme (Eds.), Reading and spelling: Development and disorders. Mahwah, NJ: Lawrence Erlbaum Associates.

Norris, D. (1994). A quantitative model of reading aloud. Journal of Experimental Psychology: Human Perception and Performance, 20, 1212–1232.

Phillips, W. A., Hay, I. M., & Smith, L. S. (1993). Lexicality and pronunciation in a simulated neural net. British Journal of Mathematical and Statistical Psychology, 46, 193–205.

Plaut, D. C., McClelland, J. L., Seidenberg, M. S., & Patterson, K. E. (1996). Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review, 103, 56–105.

Plunkett, K., & Marchman, V. (1991). U-shaped learning and frequency effects in a multi-layered perceptron: Implications for child language acquisition. Cognition, 38, 43–102.

Plunkett, K., & Marchman, V. (1993). From rote learning to system building: Acquiring verb morphology in children and connectionist nets. Cognition, 48, 21–69.

Ratcliff, R. (1990). Connectionist models of recognition memory: Constraints imposed by learning and forgetting functions. Psychological Review, 97, 285–308.

Rea, C. P., & Modigliani, V. (1985). The effect of expanded vs. massed practice on the retention of multiplication facts and spelling lists. Human Learning, 4, 11–18.

Rumelhart, D. E., & McClelland, J. L. (1986). On learning the past tenses of English verbs. In J. L. McClelland & D. E. Rumelhart (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition (Vol. 2, pp. 216–271). Cambridge, MA: Bradford Books/MIT Press.

Schmidt, R. A., & Bjork, R. A. (1992). New conceptualizations of practice: Common principles in three paradigms suggest new concepts for training. Psychological Science, 3, 207–217.

Seidenberg, M. S., & McClelland, J. L. (1989). A distributed, developmental model of word recognition and naming. Psychological Review, 96, 523–568.

Seidenberg, M. S., Waters, G. S., Barnes, M. A., & Tanenhaus, M. K. (1984). When does irregular spelling or pronunciation influence word recognition? Journal of Verbal Learning and Verbal Behavior, 23, 383–404.

Taraban, R., & McClelland, J. L. (1987). Conspiracy effects in word pronunciation. Journal of Memory and Language, 26, 608–631.

Treiman, R., Goswami, U., & Bruck, M. (1990). Not all nonwords are alike: Implications for reading development and theory. Memory & Cognition, 18, 559–567.

Treiman, R., Mullennix, J., Bijeljac-Babic, R., & Richmond-Welty, E. D. (1995). The special role of rimes in the description, use, and acquisition of English orthography. Journal of Experimental Psychology: General, 124, 107–136.

Waters, G. S., & Seidenberg, M. S. (1985). Spelling-sound effects in reading: Time course and decision criteria. Memory & Cognition, 13, 557–572.

Waters, G. S., Seidenberg, M. S., & Bruck, M. (1984). Children’s and adults’ use of spelling-sound information in three reading tasks. Memory & Cognition, 12, 293–305.

__________

1It is not, of course, intended to imply that there are not many other important processes that combine to make up skilled adult reading ability. The rational analytic approach that is applied here to the case of single-word reading may eventually be applicable to other aspects of reading behavior; at present, however, single-word reading provides the most fruitful domain of application for the approach.

2Note that all of the experiments considered were primarily designed to examine the consistency of pronunciation of orthographic rimes, and it is therefore reasonable to assume that effects of spelling-to-sound consistency at other levels of analysis would not be a major factor in the present analyses.

3Three studies were excluded prior to analysis as unrepresentative; see Brown (1996) for further discussion.

4One possible exception to this is provided by the phenomenon of “starting small” (Elman, 1990). In studying the acquisition of syntax, Elman showed that a system can best learn if it is initially exposed to a relatively simplified form of the grammar to be learned, thus enabling it to encode solidly the underlying regularities in the system, and then is subsequently introduced to more complex mappings (longer distance dependencies, in the case of syntax acquisition). This shows that there are some circumstances under which a connectionist learning system can most easily accommodate both exceptions and regularities if it is initially made easy for the system to encode regularities. If the system is, in contrast, given the task of learning the simple and complex regularities all at once, it may fail ever to learn the underlying regularities.
