Chapter 2

Spatial Representations: What Do We Mean by Space?

WHAT DO WE MEAN BY SPACE?

In this chapter, the tour of types of psychological representations begins with a focus on spatial representations. A spatial representation has a representing world that incorporates a notion of space in it. Although the representing world is a space, the represented world need not be a space. Indeed, in many of the examples in this chapter, the represented world is not space, but rather concepts or word meanings. In addition to thinking about how space can be used as a representation, I also examine the processes that can be applied to a spatial representation.

What is meant by spatial representation? First, think about the notion of a space. From where I sit now, I see my computer and keyboard. To the left is a telephone and in front of that a coffee mug. The coffee mug is closer to the telephone and farther away from the computer. All these objects have a set of spatial relations to each other, with a distance and direction to each other object. What I perceive as real space has three dimensions. From my current vantage point, if I fix an origin directly between my eyes, I can think of the locations of objects in the world in terms of their coordinates in three dimensions: One is vertical and passes directly between my eyes; the second is horizontal and passes through each eye; and the third is depth, which is perpendicular (or orthogonal) to the other two dimensions and meets the other dimensions at the origin. I fix this origin arbitrarily. It could easily be fixed at the door to my office, in the ceiling, or even on the George Washington Bridge, a good 5 miles from where I am now. Likewise, the three dimensions, which all pass through the origin, are arbitrarily oriented with respect to the origin. As long as the dimensions are independent (that is, orthogonal), I can select three dimensions to describe space in any way I want. In real space, a pair of orthogonal dimensions meets at a 90-degree angle. When the origin changes, the coordinates used to fix the locations of the objects in space are different, but the relations among the objects themselves do not change. In each space, the objects have the same relative locations with the same distances between them. For now, I focus on three key aspects of a space: (1) points that fix locations in a space, (2) distances between points in a space, and (3) a set of orthogonal dimensions that specify important reference directions in a space.

Space is commonly used as a representation in diagrams. For example, a football coach may draw Xs and Os on a diagram like the one in Figure 2.1. In this diagram, the two-dimensional space corresponds to the two dimensions on the ground of a football field. An origin can be fixed in this space at the point where the ball would be located, and two orthogonal dimensions (shown in Figure 2.1 as dotted lines) can be drawn. In this diagram, the Xs and Os represent players (Xs are defensive players, and Os are offensive players). The distance between points corresponds to the distance between players. In general, points in a space can correspond to some entities in the represented world, and distances between points can be used to represent some relation between the entities in the represented world. In Figure 2.1, the representing space represents a real spatial configuration, and distance in the representing space corresponds to distance in the represented world.


FIG. 2.1. Diagram illustrating positions of players on a football field. The spatial relations between players in the represented world are represented by the spatial relations between circles and Xs in the representing world.

The represented world need not be a space; the points in a space can represent objects, and the distances between them can represent similarities between objects (as in Figure 2.4). In this case, points close in the space represent objects that are similar to each other. Points that are far away represent objects that are dissimilar. This use of distance fits with the commonsense notion that similar things are mentally “nearer” to each other than are dissimilar things. The distance need not be construed as similarity, however. It is also possible to think of the space as a preference space, with the distance between points corresponding to differences in preference between them. Any quantity that can be construed as changing with the distance between points in this mental space can be represented by distance. Finally, the dimensions of the space can be conceptual features. For example, the dimensions in a space representing a set of animals may reflect size and predacity.

A space is a mathematical concept (I give a formal definition of a space later). This concept is useful for explaining how to measure distance between points in physical space, but I can abstract the definition of a space to more than three dimensions. Of course, the physical world appears as a three-dimensional space, and most people cannot imagine a space with more than three dimensions. Nonetheless, high-dimensional spaces have been used as representations in many psychological models.

MULTIDIMENSIONAL SCALING

Mental distance models have been very popular in psychology because they are supported by a sophisticated mathematical technique for deriving a map of a space from the distances between points. This technique, called multidimensional scaling or MDS, takes as input the distances between points. Using some algorithm (often a gradient descent algorithm that moves the points in the space in directions that improve the fit between the positions of the points and the actual distances), an MDS program develops a map that locates all the points in space (Kruskal & Wish, 1978; Shepard, 1962). Imagine giving an MDS program the lower diagonal matrix in Table 2.1. Each number in the table represents the flying distance between the city in the row and the city in the column. For example, the flying distance between Atlanta and Chicago is 587 miles. The MDS program needs only the lower diagonal matrix, because distances are symmetric, and the upper diagonal matrix would be redundant.

TABLE 2.1
Flying Distances Between 10 Cities in the United States

[Table entries lost in reproduction: a lower diagonal matrix of flying distances in miles among 10 U.S. cities.]

Figure 2.2 shows the best two-dimensional solution for the distances in Table 2.1. The MDS analysis reconstructs a spatial map for these cities in which the cities appear in the same relative locations as on an ordinary map of the United States. The dimensions generated by MDS are arbitrary. In this solution, the east-west dimension is similar to that seen in maps, but the north-south dimension is inverted. Because the dimensions are arbitrary, I can rotate the dimensions to any position desired, as long as the dimensions are orthogonal. Thus, I can easily invert this dimension to get the configuration typically seen on a map.
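To make the algorithmic idea concrete, here is a minimal gradient-descent MDS sketch in Python with NumPy. It illustrates the general approach described above, not Kruskal's or Shepard's actual algorithms; the function name, learning rate, and toy target distances are all invented for the example.

```python
import numpy as np

def mds(target, n_dims=2, steps=2000, lr=0.01, seed=0):
    """Find coordinates whose pairwise distances approximate `target`,
    a symmetric (n x n) matrix of desired distances."""
    rng = np.random.default_rng(seed)
    n = target.shape[0]
    X = rng.normal(size=(n, n_dims))              # random starting configuration
    for _ in range(steps):
        diff = X[:, None, :] - X[None, :, :]      # (n, n, d) displacement vectors
        dist = np.linalg.norm(diff, axis=-1)      # current pairwise distances
        np.fill_diagonal(dist, 1.0)               # avoid dividing by zero
        # Gradient of the squared stress sum((dist - target)**2) with
        # respect to each point's coordinates.
        grad = (((dist - target) / dist)[:, :, None] * diff).sum(axis=1)
        X -= lr * grad                            # move points to improve the fit
    return X

# Toy usage: three points that should form a 3-4-5 right triangle.
target = np.array([[0., 3., 4.],
                   [3., 0., 5.],
                   [4., 5., 0.]])
coords = mds(target)
recovered = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
print(np.round(recovered, 2))   # pairwise distances approximate the target
```

Each step moves every point in the direction that most reduces the squared mismatch between current and target distances, which is exactly the fit-improving movement described above.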

For cities in the United States, it seems strange to take all this trouble to develop a map, particularly because I can get the same information from an atlas. The power of MDS comes from mapping uncharted mental spaces. Imagine that instead of entering distances between U.S. cities, people were asked to listen to pairs of letters in Morse code and to press one button when the letters were the same and another button when the letters were different. Morse code was developed for telegraphy, and consists of letters made up of combinations of long and short tones. People unfamiliar with Morse code will make some errors in this task. The errors can be placed into a lower diagonal matrix, like the distances between U.S. cities, and a space can be developed. Rothkopf (1957) did such a study and fed the confusion data into an MDS program, which generated the two-dimensional solution shown in Figure 2.3. For reference, the Morse code signals for the 26 letters of the English alphabet are shown in Table 2.2.

The data from this study were simply confusions between Morse code signals. The two-dimensional solution generated is only a spatial description of the data in the lower diagonal matrix. Yet, the MDS solution is a candidate for the representation underlying people’s perception of Morse code signals, and the space seems to divide the signals in a way that seems psychologically plausible. The shortest signals (E and T, which correspond to the Morse code strings /./ and /-/, respectively) are together in the upper left corner of the space. Following these to the right is a band of four other letters (I, A, N, and M), which correspond to Morse code strings with two tones each. In this band, the I and M are farthest apart, a sensible arrangement because I corresponds to the string with two short tones /../, and M corresponds to the string with two long tones /--/. The letters A (/.-/) and N (/-./), which have one short tone and one long tone, are between them. This band of two-tone letters is followed by a band of three-tone letters (S, U, R, D, W, K, G, and O); again, the letters farthest apart (S and O) consist of all short tones /.../ and all long tones /---/, respectively. Finally, there is a band of four-tone letters at the farthest right (H, V, F, B, L, C, X, P, Z, Y, J, and Q). In this case, H, the letter consisting of all short tones /..../, is farthest to the bottom. There is no Morse code string with four long tones, but the three letters with three long tones (J, Q, and Y) are all at the opposite end of the four-tone band from H. Thus, the dimensions of the space seem to be the length of the Morse code strings and the mix of long and short tones in the character string. The actual dimensions derived from an MDS program are arbitrary: Rotating the dimensions and drawing a pair of orthogonal dimensions make the character length and long- versus short-tone dimensions clearer. These dimensions are also drawn in Figure 2.3.


FIG. 2.2. Configuration generated by a multidimensional scaling program with the distances given in Table 2.1. This configuration reconstructs the relative positions of U.S. cities, although the north-south axis is reversed.


FIG. 2.3. MDS solution obtained from the data from Rothkopf’s (1957) study of the similarity of Morse code characters. Two interpretable dimensions are shown as dotted lines on the figure. The more horizontal dimension corresponds to the number of tones in a character; the more vertical dimension corresponds to the relative number of short and long tones in the character.

TABLE 2.2
The 26 Letters of the English Alphabet and Their Corresponding Morse Code Signals

A /.-/     B /-.../   C /-.-./   D /-../    E /./      F /..-./
G /--./    H /..../   I /../     J /.---/   K /-.-/    L /.-../
M /--/     N /-./     O /---/    P /.--./   Q /--.-/   R /.-./
S /.../    T /-/      U /..-/    V /...-/   W /.--/    X /-..-/
Y /-.--/   Z /--../

In many situations, the experimenter determines the dimensions underlying the space by looking at the configuration of points. In the case of the Morse code data, it is straightforward to look at this configuration and pull out length and tone type as dimensions. Another technique that can produce interpretations of the dimensions derived from MDS requires people to give ratings of the objects forming the points in space on dimensions expected to form the basis of the space. These data can be entered into a multiple regression equation in which the scale rating for each point is predicted by the coordinates of the point in some arbitrary space. If the multiple regression equation predicts significant variance in the rating, the best fitting regression line may be interpreted as a dimension underlying the space.
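Here is a minimal sketch of this regression technique, with toy data invented for illustration: the rated property is regressed onto the MDS coordinates, and the fitted coefficients give a candidate direction through the space.

```python
import numpy as np

rng = np.random.default_rng(1)
coords = rng.normal(size=(12, 2))    # toy 2-D MDS coordinates for 12 objects
# Toy ratings of, say, "size," constructed here to vary along the first axis.
ratings = 2.0 * coords[:, 0] + rng.normal(scale=0.1, size=12)

# Multiple regression: predict the ratings from the coordinates (plus intercept).
X = np.column_stack([np.ones(len(coords)), coords])
beta, *_ = np.linalg.lstsq(X, ratings, rcond=None)

pred = X @ beta
r2 = 1 - ((ratings - pred) ** 2).sum() / ((ratings - ratings.mean()) ** 2).sum()
direction = beta[1:] / np.linalg.norm(beta[1:])  # candidate dimension in the space
print(r2, direction)   # high r2 -> the rating can be read as a dimension
```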

MDS provides a powerful tool for describing psychological data. From responses by participants in studies (in this case, judgments of “same” and “different”), it is possible to derive a description that seems like a plausible candidate for the information people extract from the stimuli presented. Regardless of whether mental space representations are a good idea for developing an account of a particular psychological process, MDS remains an important tool for describing proximity data of this type. Although I am not concerned here with the many MDS algorithms that have been developed for data analysis, a number of excellent sources have reviewed these techniques (Arabie, Carroll, & Desarbo, 1987; Kruskal & Wish, 1978).

USING SPACE TO REPRESENT SPACE

If mental spaces are used as representations, it seems straightforward to use mental space as a representation of physical space and particularly of the visual experience of physical space. Researchers have offered many proposals that have implicitly or explicitly assumed that key spatial properties of images are preserved in mental representations of visual images (Glenberg, Kruley, & Langston, 1994; Shepard & Cooper, 1982; Tye, 1991). According to these models, representations of mental images are like real images: The format of the mental representation of visual images preserves spatial relations among elements. In this representation, points near each other in the represented space are represented by points near each other in the representing world, and operations performed on the representing world treat it as a space. For example, to move from one point in a representation to another requires traversing the intermediate points.

A number of classic studies were performed to support this view. For example, Kosslyn, Ball, and Reiser (1978) asked people to learn the locations of objects on the map of an island. After this learning phase, the researchers asked the subjects to form an image of the whole map and to focus on an object on the map named by an experimenter. Then, a second object was named. If the new object was also on the map, people were to scan their mental image of the map from one location to the next and to press a button when they reached the location of the second object. As expected if mental images preserved spatial relations, it took longer to mentally scan between objects distant on the map than to scan between objects close together on the map. Expanding the image of the map uniformly increased the scanning times, as expected if the image representation was spatial.

A second classic phenomenon that bears on visual representation is that of mental rotation (Shepard & Cooper, 1982). In the typical study of this phenomenon, researchers showed people nonsense figures or letters. For the nonsense figures, pairs of figures oriented in slightly different ways were shown. On each trial, participants were to press one button if the figures were identical (except for the difference in orientation) and a second button if they were mirror images of each other. For letters, a single letter was presented with some degree of rotation from its canonical position, and participants were asked to press one button if the letter was presented as it is normally written (except for the difference in orientation) and a second button if the letter was a mirror image of the way it is normally written. For both studies, the time to respond was essentially a linear function of the angle of rotation, with figures at higher degrees of rotation requiring more time for response than figures at low degrees of rotation.

Both of these studies were motivated by the assumption that representations of visual information have a spatial component. That is, the mental image itself has a two- or three-dimensional form, and operating on an image involves processes like “scanning” from one point in the image to another or “rotating” the image. Early critics of these models (e.g., Pylyshyn, 1981) described flaws in this approach to mental representation. Pylyshyn pointed out that going from one point to another in an image requires knowing the relative locations of the points. Otherwise, a person is likely to take a mental journey in the wrong direction. Similarly, to rotate one figure into another through the shortest distance requires some knowledge of what the figure depicts. To know the relative location of pairs of points or to know what a figure depicts, a person must have some way of looking at the entire space. If the mental image was really a “picture” in the head, though, looking at the entire space would require someone who could look at the image and interpret it. This is the homunculus problem discussed in chapter 1.

The homunculus problem is avoided here by assuming that all mental representations are acted on by computational processes. These computations can be carried out without benefit of an intelligent agent (e.g., Kosslyn, 1994; Tye, 1991). This move, however, does not change the fact that going from point A to point B requires knowing the direction in which the journey takes place. To solve problems of this type, most models of mental imagery assume that there are both spatial representing worlds for mental images and also other representing worlds that explicitly represent relations. The mental representation of a space may include a relation specifying that point A is north of point B. This qualitative relation is hard to represent in a space; because distance is a continuous quantity, it is better to use discrete symbols to represent such relations (see chaps. 3 and 5). Thus, mental imagery is likely to require both a spatial representing world and a representing world with discrete symbols in it (see also Kosslyn, 1994).

USING SPACE TO REPRESENT CONCEPTS

As already discussed, mental spaces can be used to represent actual spaces. How can a mental space be used in a psychological model of conceptual processing? In the example of Rothkopf’s study of Morse code, I used a matrix of confusion data to develop a spatial representation, then decided that the space looked psychologically plausible. I did not specify a process that may act on such a representation or embed this representation in a model of a psychological task. Fortunately, there are examples of spatial representations that have been used in models of psychological processes. This section describes a few such models.

One simple and elegant model was developed by Rips, Shoben, and Smith (1973) to account for the ease with which people can verify simple class-inclusion statements like “A robin is an animal” and “A robin is a bird.” The typical measure of ease for verifying these statements is a person’s response time to say True or False when presented with a statement of the form “An X is a Y.” People generally need less time to verify that an object is a member of its immediate superordinate category than to verify that it is a member of a more distant superordinate category. For example, people are faster to respond True to the sentence “A robin is a bird” than they are to respond to the sentence “A robin is an animal.” This finding is not surprising, as bird is a more specific category than is animal. The opposite pattern of results, however, also occurs: People are slower to respond to the sentence “A bear is a mammal” than to the sentence “A bear is an animal,” even though mammal is a more specific category than is animal. People also respond faster to the sentence “A bear is an animal” than to the sentence “A robin is an animal,” even though both sentences require examining a common superordinate category.

To explain patterns of data like this, Rips et al. (1973) derived a spatial representation of animals and a representation of birds by collecting ratings of pairwise relatedness for all pairs in a set of mammals and in a set of birds. These relatedness ratings were then submitted to an MDS program. The configurations generated by the MDS program are shown in Figure 2.4. The researchers assumed that people verified one category as a subset of another by traversing the distance from one to another. If the points were near to each other, that would be evidence that the categories were related. Rips et al. did not specify how the subset–superset relation is determined; nothing inherent in distance discloses the relation between objects, and distance measures only the relatedness of the concepts. Nonetheless, according to this model, people would be faster to respond to sentences involving points that are near to each other than to respond to sentences involving points that are far apart. To test this hypothesis, the distances from the derived space were compared to the response times in the verification task. The distances between the object and its superordinate category in the mental space were correlated with the statement verification times, a finding suggesting that the more similar the object was to its superordinate, the faster a statement of the form “An X is a Y” could be verified.


FIG. 2.4. MDS solution obtained from judgments of relatedness among a set of birds and a set of animals. These spatial solutions were used in a model of people’s ability to verify sentences of the form “An X is a Y.” From L. J. Rips, E. J. Shoben, and E. E. Smith (1973). Copyright © 1973 by Academic Press. Reprinted with permission.

A more elaborate use of spatial representations was developed by Rumelhart and Abrahamson (1973), who used such a representation to model people’s performance on four-term analogy problems (see also Rips et al., 1973; Tourangeau & Sternberg, 1981). Four-term analogy problems are the A is to B as C is to D problems typically seen on achievement tests; a person must extract some relation between the A and the B terms and then find C and D terms with the same relation between them. In a spatial representation, an analogy can be thought of as a correspondence between one part of a dimensional structure and another. Rumelhart and Abrahamson (1973) suggested that analogies can be solved by a parallelogram method in which the relation between the A and B terms is found by drawing a line from the A term to the B term. Next, a line is drawn from the A term to the prospective C term. The analogy is a good one to the extent that the line drawn from the C term to the D term is of the same distance and direction as the line from the A term to the B term. For example, Figure 2.4 shows the two-dimensional space for birds developed by Rips et al. (1973). A simple analogy such as Robin is to Sparrow as Duck is to? is processed by first drawing a line between Robin and Sparrow and then drawing a line from Robin to Duck. Finally, a line with the same length and direction as the line from Robin to Sparrow can be drawn starting at Duck. This procedure suggests that Chicken is an appropriate completion to this analogy.
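The parallelogram method reduces to a few lines of code. In this sketch the 2-D coordinates are invented stand-ins (the actual coordinates came from the MDS solution in Figure 2.4): the ideal completion lies at C + (B − A), and the candidate closest to that point is chosen.

```python
import numpy as np

# Hypothetical 2-D coordinates for a few birds (for illustration only).
birds = {
    "robin":   np.array([0.0, 0.0]),
    "sparrow": np.array([0.2, -0.1]),
    "duck":    np.array([1.5, 0.8]),
    "chicken": np.array([1.7, 0.7]),
    "hawk":    np.array([0.3, 1.2]),
}

def solve_analogy(a, b, c, vocabulary):
    """A : B :: C : ?  -- pick the point closest to C + (B - A)."""
    ideal = vocabulary[c] + (vocabulary[b] - vocabulary[a])
    candidates = {w: v for w, v in vocabulary.items() if w not in (a, b, c)}
    return min(candidates, key=lambda w: np.linalg.norm(candidates[w] - ideal))

print(solve_analogy("robin", "sparrow", "duck", birds))   # -> "chicken"
```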

In a variety of studies, Rumelhart and Abrahamson (1973) and Rips et al. (1973) gave people four-term analogies in which they had to select the most appropriate D term from a set of choices. The results suggested that people were indeed selecting a D term that differed from the C term by about the same distance and direction as the B term differed from the A term. Rumelhart and Abrahamson found that people could learn new animals by analogy to old ones. In one study, they arbitrarily placed new points in a space of animals and labeled the points with nonsense words. During the study, the nonsense words sometimes appeared in the four-term analogies. By the end of the study, people appeared to place the nonsense words in approximately the right point in space.

Although this kind of representation provides an interesting account of analogies, there are problems with this approach; in particular, the analogies used to test it seem rather strange. For example, in the absence of a set of choice alternatives, it is not clear that people can generate the correct solution to an analogy like Pigeon is to Parrot as Goose is to? even though the spatial representations in Figure 2.4 suggest that Duck provides a reasonable response to this analogy. In contrast, in many analogies that people think are good ones, they can generate responses on their own. For example, Cat is to Kitten as Dog is to? can be solved fairly easily. One reason that the first analogy seems strange whereas the second seems natural is that people apparently prefer relations between A and B terms that they can name explicitly; the relation between Pigeon and Parrot seems hard to name. The model of analogy based on spatial representations requires only a distance and a direction and needs no more explicit labeled relations between things; hence it seems to lack something as a model of analogy.

Indeed, there is evidence that subjects found the analogies used in studies testing this spatial model to be odd. Sadler and Shoben (1993) demonstrated that people in these studies were probably not using the parallelogram model suggested by Rumelhart and Abrahamson (1973). Instead, participants tended to select the answer from the set of choices closest to the C term in space. This result suggests that participants could not find the relation between the A and B terms and so used a simple heuristic to select a D term for the analogy on the basis of a C term and choices of possible D terms. This solution strategy contrasts with people’s behavior with analogies like the Cat is to Kitten analogy, for which the relation between the A and B terms is readily apparent and easily extended to other objects.

Despite these problems with the model of analogy, these examples offer some lessons on the use of spatial models. First, it is straightforward to use spatial representations in models of psychological processing. In the case of sentence verification, the key aspect was a correlation between distance in a mental space and response time. Of course, many details would have to be filled in to make an algorithmic-level description of sentence verification. In the case of analogy, the process was somewhat more complex and required a mapping between two spaces. In the examples discussed here, all the terms were points in the same space, and so this mapping was not required. With more complex analogies in which birds were mapped to animals or characters in a play, the dimensions of two distinct spaces would have to be aligned. Indeed, Tourangeau and Sternberg (1981) suggested that metaphors become more apt not only as the distance and direction between the A and B terms match the distance and direction between the C and D terms, but also as the distance between the domain encompassing the A and B terms and the domain encompassing the C and D terms increases. Thus, a metaphor comparing one person’s eyes to another person’s is less apt than a metaphor comparing one person’s eyes to the sun.

These examples also demonstrate the limits of the power of a spatial representation. The processes that operate on a spatial representation are sensitive to the distance between points. Thus, spatial models provide no way of describing points other than through their distances from other points. A psychologist looking at an MDS solution like the one in Figure 2.4 may infer that people were sensitive to the size of birds, because small birds lie on one side of the space and large birds on the other. A model that operates over this space, however, does not use these dimension interpretations explicitly and thus does not explicitly use information about size.

Using the Definition of Space

Earlier, I pointed out that space is a mathematical notion with a formal mathematical definition. In this section, I discuss some main axioms that must hold for something to be a space. The formal properties of a space make spatial representations easy to test. If a domain that needs to be represented does not obey the axioms that define a space, a spatial model may be a poor choice as a representing world for that domain. The axioms help determine the power of spatial representations.

A. Tversky (1977) discussed three metric axioms that define the concept of a space: minimality, symmetry, and the triangle inequality. Minimality is captured by the mathematical regularity:

\[ d(x, x) = 0 \]

where d(x, y) is the function that returns the distance between two points (here x and itself). This axiom states that the distance between any point and itself in a space is a minimum (that is, 0). This minimum distance is the same for any point and itself. That is, I am as close to myself as I can get, and you are as close to yourself as you can get.

The symmetry axiom can be cast as:

\[ d(x, y) = d(y, x) \]

That is, the distance between any two points (here x and y) is the same regardless of whether measured starting with point x or with point y. For example, a person travels the same distance from New York to Boston whether the departure is from New York or Boston (otherwise, a host of junior high school algebra problems would not work at all).

Finally, the triangle inequality states:

\[ d(x, y) \leq d(x, z) + d(z, y) \]

That is, the distance between any two points is always less than or equal to the sum of the distances between those points and a third point. If the third point lies on the line segment between x and y, the two sides are equal. Otherwise, the sum is always greater than the distance between x and y. This axiom simply says that it cannot be shorter to travel from point A to point B by going through point C than it is to go directly from point A to point B.

Another aspect of the mathematical concept of space is that it is not always necessary to calculate distance as it is calculated in the world. In daily life, people calculate distance “as the crow flies”: They draw a straight line from point x to point y, and the length of the line is the distance between points. This is the standard Euclidean notion of distance. Mathematically, I can write this distance calculation as:

\[ d(x, y) = \sqrt{\sum_{i=1}^{N} (x_i - y_i)^2} \]

where N is the number of dimensions, and xi and yi are the values for points x and y along dimension i. In two dimensions, this is the familiar Pythagorean formula for the length of the hypotenuse of a right triangle, but this equation can be generalized to consider non-Euclidean distances. The generalization of this formula is simply:

\[ d(x, y) = \left( \sum_{i=1}^{N} |x_i - y_i|^r \right)^{1/r} \]

The exponent r in this formula defines a family of distances known as Minkowski metrics. When r = 2, distance is Euclidean. Another frequently used metric is the city block distance metric, which is obtained with an exponent of 1. In the city block metric, distance is not measured “as the crow flies,” but rather as if a person were walking on the sidewalks of a city: Distance accumulates separately along each dimension, so the person walks around the two legs of the triangle rather than cutting across the hypotenuse. For example, in New York, getting from 120th Street and Broadway to 119th Street and Amsterdam requires walking one block east (from Broadway to Amsterdam) and one block south (from 120th St. to 119th St.). It would be convenient to cut the corner and walk straight, but Columbia University is in the way.
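The whole family of metrics can be written as one short function, a direct transcription of the formula above:

```python
import numpy as np

def minkowski(x, y, r=2.0):
    """Minkowski distance: r = 2 is Euclidean, r = 1 is city block."""
    return float((np.abs(np.asarray(x) - np.asarray(y)) ** r).sum() ** (1.0 / r))

# One block east and one block south in Manhattan:
print(minkowski([0, 0], [1, 1], r=1))   # 2.0, walking the sidewalks
print(minkowski([0, 0], [1, 1], r=2))   # ~1.414, "as the crow flies"
```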

The importance of this variety of distance metrics is that people can choose to measure distance in a representing space in different ways depending on what they are representing with it. If the represented world is a real space, they may calculate distance in the representation by using a Euclidean distance metric. If the represented world is something that comes in discrete steps, they may use a city block metric. I can use a space to represent bags of Halloween candy collected by children; here, each dimension corresponds to a type of candy. The distance between points can represent the perceived similarity of the load of candy collected by each child. Because candy bars are discrete units, however, it probably makes more sense to calculate the distance between points by using a city block metric rather than a Euclidean distance.

One nice thing about having a strict formal definition for a space is that, for any given application of spatial representations, it is possible to test whether a representation is appropriate for the task. A. Tversky (1977) examined whether metric axioms held for similarity judgments to test whether spatial representations were appropriate for models of similarity. He found systematic violations of all three metric axioms. As a violation of the minimality axiom, Tversky pointed to studies like the one by Rothkopf described previously. In that study, the probability that people made a correct same–different judgment was not the same for all items. This result can be interpreted as an indication that people consider some pairs of identical objects more similar to each other than other identical pairs are (at least in this task context).

Tversky also found violations of the symmetry axiom, most typically from embedding objects in different sentence contexts. People (at least in the United States) prefer to say “Mexico is like the United States” rather than “The United States is like Mexico.” This example suggests that comparison statements do not necessarily mean the same thing when the order of the terms is reversed. As another example, the metaphorical statement “That butcher is a surgeon” is a compliment, but the statement “That surgeon is a butcher” is not.

Finally, researchers have pointed out violations of the triangle inequality. Typically, these violations result from different comparisons that use different aspects of an object. William James (1892/1989) noted that the moon is similar to a ball, because both are round; that the moon is similar to a lamp, because both are bright; but that a ball and a lamp are not at all similar. A. Tversky and Gati (1982) obtained a similar result with a variety of materials.

These findings do not mean that mental space representations are incorrect for all applications in psychology, but they do suggest limits on how effective such representations can be. When cases lead to violations of the metric axioms, it is possible that some other kind of representation is more appropriate as the basis of a psychological account of these phenomena. Of course, faced with data like these, it is important to remember that explanations of psychological processes require both representations and statements of processes that use these representations. Changing the way that representations are processed can also help explain violations of the metric axioms.

To save mental space models, Nosofsky (1986) proposed that the weights on dimensions be adjusted dynamically. For example, dimensions along which there are close matches may be given greater weight than dimensions along which two items have very different values. Thus, in the William James example earlier mentioned, “roundness” may have greater weight when comparing the moon to a ball than when comparing the moon to a lamp. Similarly, “brightness” may have greater weight when comparing the moon to a lamp than when comparing the moon to a ball. The comparison of a ball to a lamp has no close matches, and so no dimensions are weighted heavily. Processing principles like this one may explain violations of the metric axioms.
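Here is one way such a processing principle might look in code. The specific weighting rule is invented for illustration (Nosofsky's actual model sets attention weights differently), but it captures the idea that closely matching dimensions dominate the comparison:

```python
import numpy as np

def weighted_similarity(x, y):
    """Similarity with extra weight on closely matching dimensions."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    diffs = np.abs(x - y)
    w = np.exp(-diffs)                   # close matches get large weight (illustrative rule)
    w = w / w.sum()                      # normalize the attention weights
    return np.exp(-(w * diffs).sum())    # map weighted distance to similarity

# Invented feature values on (roundness, brightness).
moon, ball, lamp = [1.0, 1.0], [1.0, 0.0], [0.2, 1.0]
print(weighted_similarity(moon, ball))   # fairly high: "roundness" dominates
print(weighted_similarity(moon, lamp))   # fairly high: "brightness" dominates
print(weighted_similarity(ball, lamp))   # low: no dimension matches closely
```

Because the weights are recomputed for each pairing, the moon can be similar to both the ball and the lamp while the ball and the lamp remain dissimilar, mirroring the James example without violating anything about the underlying space itself.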

Krumhansl (1978) gave a comprehensive suggestion for dealing with violations of the metric axioms. She proposed that the density of points in a neighborhood can be factored into calculations of similarity in a mental space representation. As one test of this hypothesis, Krumhansl examined asymmetries in Rothkopf’s Morse code data. In all, she found 39 cases with large asymmetries in correct responding (in which presenting the pair with one of the letters appearing first led to more incorrect responses than did presenting the pair with the other letter first). Consistent with her hypothesis, in 26 cases, there were more errors when the first letter presented in the pair had more neighbors in semantic space than did the second letter (i.e., the space around the first letter was more dense). In contrast, for only 5 cases did the asymmetry show more errors when the first letter presented had fewer neighbors in semantic space than did the second letter presented (there were eight ties). Thus, by assuming that mental distance can be sensitive to the density of points in a space, data that appear to violate the metric axioms can be accommodated.

A final objection against MDS models as techniques for uncovering latent psychological dimensions arises from studies examining the relation between MDS solutions and ratings of other psychological dimensions (Gerrig, Maloney, & Tversky, 1991). As discussed previously, ratings on scales can be used to label the dimensions of a space derived from MDS. Gerrig, Maloney, and Tversky asked subjects to state the degree of relatedness of personality traits by filling in the proportion of people likely to have one trait if they were known to have another. This relatedness task was done with a set of positive traits, a set of negative traits, or a mixed set of positive and negative traits. Subjects also rated the traits on dimensions like agreeableness and potency. Gerrig et al. reasoned that if the resulting MDS solutions reflect the underlying psychological dimensions, then the relation of the rated dimensions to the personality traits should be the same in all three data sets (i.e., positive, negative, and mixed). In contrast to this prediction, the relation between the ratings and the coordinates of the traits in space was very different for each set, a finding suggesting that, although MDS provided a reasonable description of the relatedness data, it did not extract a set of fundamental psychological dimensions that form people’s representations of personality traits.

To summarize, mental distance models have assumed that objects are represented as points in a multidimensional space. The examples discussed so far consist of fairly low-dimensional spaces. Indeed, MDS techniques rarely yield more than five dimensions and generally give only three dimensions for which there is a clear interpretation. Some work (like Tversky’s) has suggested that the rigid geometric definition of spaces limits the applicability of mental space models to psychology, but with sufficiently rich processing assumptions, spatial models can account for violations of the metric axioms. In the next section, I discuss representational systems using high-dimensionality mental spaces.

HIGH-DIMENSIONAL SEMANTIC SPACES

As just mentioned, many applications of mental space models in psychology have used low-dimensional spaces because spaces derived from MDS models tend to have no more than five good dimensions and typically have only three usable dimensions. Dimensions above the third tend to be unusable for two reasons. First, it is often difficult to figure out what the dimensions mean psychologically, and it is unsatisfying to use dimensions that seem nonsensical. More important, MDS techniques build spaces by minimizing the difference between the distances obtained from data and the fitted distances in the space derived from the model. As the number of dimensions increases, there is less remaining misfit for each new dimension to account for. Furthermore, in most applications of MDS, fewer than 30 points are placed in the space because a lower diagonal matrix has:

\[ \frac{N(N-1)}{2} \]

entries in it, where N is the number of points in the space. Obtaining the distances between the 26 letters of the alphabet (and the 10 numerals) as Rothkopf did requires 630 distances. If these distances are obtained by having people do pairwise comparisons, 630 experimental trials would be required. At a rate of one trial every 10 seconds, this experiment would require an hour and 45 minutes. It is difficult to get people to sit through so many trials for that long a period without getting frustrated, and so psychological experiments often use many fewer points. For example, getting the pairwise distances between 16 points requires 120 observations to fill the lower diagonal matrix. At 10 seconds per trial, this study can be done in 20 minutes.

There are, however, other techniques for developing higher dimensional spaces. Increases in the speed of computers have allowed brute-force techniques that develop representations of words in the lexicon by analyzing large corpora of text (Burgess & Lund, 1997; Landauer & Dumais, 1996, 1997). For example, Landauer and Dumais developed a model of the lexicon through the analysis of a large corpus of text. Their interest was in the practical problem of retrieving articles from a large database on the basis of a query in the form of a set of keywords. One way to solve this problem, used in many database retrieval systems, is to index each entry by a set of keywords determined in advance. With this technique, however, new keywords cannot be added, and if a user enters a synonym to a keyword, the system finds nothing. In traditional systems, getting around this problem requires a thesaurus of index terms so that people can look up the proper word to enter for their query.

It would be convenient to have a system developed on the basis of articles entered into the database, rather than on the basis of pre-established keywords. Unfortunately, devising a system like this requires creating a method for dynamically establishing the meaning of keywords used for a search. Landauer and Dumais’s solution to this problem was to use a mathematical technique called singular value decomposition (related to factor analysis) to analyze statistical relationships among words in a corpus. In this way, they obtained some notion of the “meaning” of the words in the articles being indexed.

In this technique, a large number of documents are put into a matrix in which words that are part of the system are placed in the rows of the matrix; documents in the corpus make up the columns. This matrix is then factored, and many factors that account for only small amounts of variance in the original matrix are discarded. Thus, the matrix may start with 50,000 rows (corresponding to 50,000 words); after dimensions are discarded, it may be reduced to between 150 and 300 dimensions. This reduced matrix contains much information about the higher order correlational structure between the words in the input.

Imagine that a new query is entered into a system that has been reduced to 200 dimensions. This query is translated into a 200-dimensional vector. It may be hard to visualize 200 dimensions, but things work just as if there were only 2 or 3 dimensions. Just as a 2-dimensional vector is a ray in 2-dimensional space, a 200-dimensional vector shoots out in a particular direction in a higher dimensional space. By computing the trigonometric function cosine, it is possible to find other vectors in space that are similar to the vector formed by the query. In a space, cosine is a measure of similarity. Two vectors that go in exactly the same direction have a cosine of one, and two vectors that are orthogonal have a cosine of zero. Vectors that go in approximately the same direction have cosines near one. Those vectors in the system with a high vector cosine with the query are likely to be semantically similar to the query. In practice, this system has eased retrieval of journal articles from a database.
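A toy version of the whole pipeline fits in a few lines. The corpus below is invented and far too small to produce real semantic structure, but it shows the mechanics: build a word-by-document matrix, truncate its singular value decomposition, and compare the reduced word vectors by cosine.

```python
import numpy as np

docs = [
    "the cat meowed at the dog",
    "the dog barked at the cat",
    "the senate passed the budget bill",
    "the budget bill failed in the senate",
]
vocab = sorted({w for d in docs for w in d.split()})
# Word-by-document count matrix: rows are words, columns are documents.
counts = np.array([[d.split().count(w) for d in docs] for w in vocab], float)

U, S, Vt = np.linalg.svd(counts, full_matrices=False)
k = 2                                  # keep only the two largest factors
word_vecs = U[:, :k] * S[:k]           # reduced word representations

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

w = dict(zip(vocab, word_vecs))
print(cosine(w["cat"], w["dog"]))      # near 1: the words share contexts
print(cosine(w["cat"], w["senate"]))   # much lower: different contexts
```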

This method was also tested on the synonyms test from the Test of English as a Foreign Language (TOEFL). First, a singular value decomposition was obtained for segments of text from over 30,000 articles in an encyclopedia. The initial matrix had entries for over 60,000 words, and optimum performance was obtained with a decomposition that reduced this structure to between 300 and 325 dimensions. The system was given the vector corresponding to a word from the test, and the cosines between the vector corresponding to the test word and the vectors corresponding to the forced-choice options were determined. The choice option with the highest cosine to the test vector was selected as the right answer. This system was able to get about 65% of the responses correct, approximately the level obtained by the average student from a non-English-speaking country who takes the test.

This test demonstrates both the strengths and weaknesses of this high-dimensional space approach. On the positive side, a simple brute-force mechanism was able to capture semantic similarities between words. Although the system must process a lot of text to work, it is not clear that this amount of text is much greater than the amount of language a child experiences while learning. Furthermore, the calculations are very simple: There is no need for an extensive decomposition of the text entered. The system does not analyze words and does not even remove morphological endings (like the -ed past tense ending on verbs). Rather, the system must discover the similarities between the same word stem with different morphological endings. Nonetheless, with very little tweaking, it performs a synonyms test at the level of non-native speakers of English who also take the same test.

Despite this success, there are weaknesses with this kind of spatial representation. In particular, it is fortunate that the model was tested with the synonyms test and not the antonyms test. The commonsense view is that antonyms are dissimilar from each other, but from another viewpoint, antonyms are very similar. Depressed may be an antonym for elated, but it is far more similar to elated than it is to rhinoceros. Indeed, a good pair of antonyms is as similar as possible to each other except for a very different value along a salient dimension. If a language processor cannot decompose a representation into a dimensional or featural structure, it cannot tell the difference between synonyms and antonyms. Thus, although techniques like singular value decomposition are very powerful, they cannot provide a complete basis for understanding word meaning, because they can determine only the proximity of two words in a semantic space. The representations do not contain all the information about the meanings of words that people seem to be able to use when processing language.

CONNECTIONIST MODELS: HOW THEY WORK

High-dimensional spaces like those developed using singular value decomposition can be quite powerful. In the example just described, the vectors were processed in only a simple way: Pairs of vectors were compared to find the vector cosine between them. Researchers have developed, however, a powerful set of techniques that allow pairs of vectors to become associated in much the same way that people can learn associations between concepts. The technique used to develop these associations involves a form of connectionist modeling. In this section, I discuss connectionist models that learn associations, and I examine their use of high-dimensional spaces (see J. A. Anderson, 1995; McClelland & Rumelhart, 1986).

Connectionism is a blanket term for a computational model described by analogy to the way the brain may function. Although the brain has an enormous number of neurons, each individual neuron does not do much. It can “fire” or send an electrical signal from the cell body to the end of the cell. When this signal (called an action potential) reaches the end of the cell, it sends chemicals (called neurotransmitters) into the space between cells (called a synaptic cleft). These chemicals influence neighboring cells. If the influence makes a neighboring cell more likely to fire, the relation between cells is excitatory. If the influence makes a neighboring cell less likely to fire, the relation between cells is inhibitory. No single cell seems to serve a vital function in the brain. Rather, the collective activity of groups of neurons leads to interesting behavior.

These basic facts about the brain are at the core of connectionist models. A simple connectionist model is illustrated in Figure 2.5. The models consist of units (analogous to individual neurons) and connections (analogous to synapses). In Figure 2.5A, four units, depicted as circles, are labeled input units (for reasons that will become clear), and four more units are labeled output units. The connections are illustrated as lines connecting an input unit to an output unit. The units in the connectionist models I discuss in this section are not identical to neurons: Instead of sending discrete signals that influence neighboring units, each unit has a level of activation, which is a value that measures the current influence of a given unit. Activation may be analogous to the rate of firing of a neuron (i.e., how many pulses the neuron sends in a given time). The connections between units determine how the activation of one unit affects the activation of a second unit. If high levels of activation in one unit promote high levels of activation in a second unit, the connection is excitatory. If high levels of activation in one unit promote low levels of activation in a second unit, the connection is inhibitory. Connections also vary in their weight, which determines the degree to which the activation on one unit affects another. Finally, as in the brain, no single unit in a connectionist model represents a concept; the pattern of activation across a set of units is used to represent a concept. For example, in Figure 2.5B, the network has activation levels of 1, –1, –1, and 1 across its four units. For the moment, assume that this pattern of activation represents the concept cat.


FIG. 2.5. A: An illustration of a simple connectionist model. B: The same model with a pattern of activity on its input units giving rise to a second pattern of activity on its output units. C: A set of input and output patterns used in the example of how connectionist models use spatial representations.

In a simple connectionist model like the one shown in Figure 2.5A, the activation of the input units influences the activation of the output units. For example, I want the network to learn to produce a pattern of activation representing the concept meow when the pattern of activation representing the concept cat occurs. For this example, I assume that the concept meow is represented by the pattern of activity –1, –1, 1, 1 on the four output units. How may this association be learned?

Understanding how connectionist models do what they do requires thinking about the spatial properties of the distributed representation. (This discussion draws heavily on concepts from linear algebra, and I present vectors and matrices in the text that follows. Readers not familiar with the concepts of linear algebra can read this section without spending too much time on the mathematics and still get the general flavor of what a distributed connectionist model does.) To think of a distributed representation as a spatial representation, first notice that the values of the input units can be written as a vector. For example, the activations of the units representing cat in Figure 2.5B can be written:

\[ \mathbf{f}_{cat} = \begin{pmatrix} 1 \\ -1 \\ -1 \\ 1 \end{pmatrix} \tag{2.6} \]

This vector points from an origin in a four-dimensional space to the point (1, –1, –1, 1). What is needed is some way of having this pattern of activation on the input units pass through the connections and give rise to the vector:

\[ \mathbf{g}_{meow} = \begin{pmatrix} -1 \\ -1 \\ 1 \\ 1 \end{pmatrix} \tag{2.7} \]

on the output units. In the discussion that follows, I refer to whole vectors and matrices using bold type (e.g., f) and individual elements of a vector or matrix using regular type (e.g., f). Following established conventions, I refer to input vectors as f, with a subscript that states what the vector represents. I refer to output vectors as g, also with a subscript that states what it represents. Sometimes it is convenient to write a vector as a row rather than a column. In this case, I refer to the transpose of a vector f as f’ or:

\[ \mathbf{f}' = ( f_1, f_2, \ldots, f_N ) \]

The connections between the input and output units can be written as a matrix (typically called A), in which a given element Aij represents the weight of the connection from input unit j to output unit i (so rows index output units and columns index input units).

Simple connectionist models assume that the activation of a unit in the output is just the sum of the activations of the input units connected to that output unit, each multiplied by the weight of its connection. That is, the activation of some output unit gi is just:

\[ g_i = \sum_{j=1}^{N} A_{ij} f_j \tag{2.9} \]

where Aij is the weight of the connection from input unit j to output unit i, and fj is the activation of input unit j. More generally, with a matrix of connection weights A and an input vector f, the operation:

\[ \mathbf{g} = \mathbf{A}\,\mathbf{f} \]

determines the entire output vector (where each element gi of the output vector is determined by equation 2.9).

There is a way of getting just the right pattern of connections between the input and output units through the linear algebraic operation called the outer product. The outer product is the operation g f’, which with two N-dimensional vectors yields an N × N matrix. The change in each element of the matrix A, which I denote Δ Aij, is:

\[ \Delta A_{ij} = g_i f_j \tag{2.10} \]

If this formula is applied to the vectors corresponding to cat and meow in Figure 2.5C, the resulting weight matrix is:

\[ \mathbf{A} = \mathbf{g}_{meow}\,\mathbf{f}'_{cat} = \begin{pmatrix} -1 & 1 & 1 & -1 \\ -1 & 1 & 1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & -1 & 1 \end{pmatrix} \tag{2.12} \]

To see that this connection matrix really does create the output vector for meow (vector 2.7) with the input cat (vector 2.6), apply equation 2.9 to this matrix of connection weights. The outer product operation allows the weights on the connections between the input and output units to be created in a way that produces a given output with the corresponding input. In this way, the input is associated with the output. Thus, like a toddler learning to make animal sounds, this matrix is learning the association between the animal name and the sound (if there really is a way of attaching these representing vectors to some represented world).

If connectionist models learned only a single association, they would not be very interesting. Connectionist models have generated excitement in the cognitive science community because the same set of units can be used to store a variety of distinct associations. For example, Figure 2.5C shows activation patterns for CAT, DOG, MEOW, and BOW WOW. If I have already taught a connectionist model the association between cat and meow by using the procedure just described, I can now teach it the association between dog and bow wow in the same way. I take the outer product of the vectors for dog and bow wow by using equation 2.10, yielding:

[Matrix 2.13, the outer product of the BOW WOW and DOG patterns from Figure 2.5C, was lost in reproduction.]

Then, I add the connection matrix with the weights that yield MEOW with CAT (matrix 2.12) to the connection matrix that yields BOW WOW with DOG (matrix 2.13). Adding two matrices involves adding together all the corresponding elements in a pair of matrices. This operation yields:

[Matrix 2.14, the element-by-element sum of matrices 2.12 and 2.13, was lost in reproduction.]

First, I can demonstrate that the two associations entered into the system were learned by multiplying the connection matrix and the input vector as in equation 2.9. Carrying out this multiplication for the cat association yields:

\[ \mathbf{A}\,\mathbf{f}_{cat} = 4\,\mathbf{g}_{meow} = \begin{pmatrix} -4 \\ -4 \\ 4 \\ 4 \end{pmatrix} \]

Notice that the elements in the resulting vector have magnitude 4, whereas the elements of the initial MEOW pattern shown in Figure 2.5C had magnitude 1. This vector, however, does go in the same direction. Indeed, the obtained vector is simply the vector for MEOW with each entry multiplied by 4. Such a scalar multiplication preserves the direction of the initial vector. In connectionist models, it is the direction of vectors that matters. If I were worried about vector length, I could have arbitrarily set the values of the elements in the input and output vectors so that the vectors would have a length of 1, and then the retrieved output would have matched the learned output pattern exactly, but this procedure would have made the example harder to follow. Finally, for completeness, I can also retrieve the association for DOG by:

\[ \mathbf{A}\,\mathbf{f}_{dog} = (\mathbf{f}_{dog} \cdot \mathbf{f}_{dog})\,\mathbf{g}_{bowwow} \]

The output vector associated with dog is in the same direction as the bow wow vector, but it is longer.

This example demonstrates that one can superimpose multiple patterns and still retrieve information from the system. How is all this related to the notion of a space? The ability to overlay one pattern of connection weights on another is made possible by the geometric (i.e., spatial) properties of vectors. In the section on high-dimensional semantic spaces, I introduced the idea that the similarity of a pair of vectors can be determined by finding the cosine between them. The cosine of a pair of vectors can be determined by taking the dot product of those vectors as follows:

\[ \mathbf{f} \cdot \mathbf{g} = \mathbf{f}'\,\mathbf{g} = \sum_{i=1}^{N} f_i\, g_i \]

and then dividing the dot product by the product of the lengths of the vectors being compared (the length of a vector is the square root of the sum of its squared elements). Taking the dot product of the DOG and CAT vectors yields a value of 0; thus there is no similarity between these vectors (they are orthogonal). No part of the DOG vector goes in the direction of the CAT vector. This result is important, because the output of the multiplication of the weight matrix and a new input vector in a system like this is proportional to the amount of the input vector in the same direction as (i.e., the amount that projects on) input vectors that the system has already seen. Because the DOG and CAT vectors go in orthogonal directions, the DOG → BOW WOW association and the CAT → MEOW association can be stored in the same connection matrix.

This geometry is evident in two additional examples. First, imagine a new vector for the input pattern pig, which the system has not learned. Pig is represented by:

[The activation pattern for PIG, shown in Figure 2.5C, was lost in reproduction; it is orthogonal to both the CAT and DOG patterns.]

Calculating the dot product of this vector with both the dog and cat vectors shows that both products are 0. That is, this vector is in a new direction from the cat and dog vectors. Associated with pig in the current weight matrix is:

[Equation: the connection matrix multiplied by the PIG vector, yielding the zero vector]

Thus, there is no previously learned association for pig, and the network gives no output.
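
The same check works in the sketch; this pig vector is again a hypothetical stand-in, constructed to be orthogonal to both cat and dog:

```python
pig = np.array([1, -1, -1, 1])             # orthogonal to both cat and dog
print(np.dot(pig, cat), np.dot(pig, dog))  # 0 0
print(W @ pig)                             # [0 0 0 0]: no stored association
```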

What happens if I ask the network about an animal that is like a cat? To make a vector for an animal similar to a cat, I just add a bit of “noise” to the cat vector by slightly perturbing each of its values:

[Vector: the CAT vector with a small amount of noise added to each element]

This vector does not point in quite the same direction as the cat vector; in fact, it has a vector cosine of 0.93 with the cat vector. It is clearly still similar to cat, but it no longer points in an identical direction. The output obtained when this vector is presented to the system is:

[Equation: the connection matrix multiplied by the noisy CAT vector, yielding a vector in the MEOW direction]

The output pattern is still in the same direction as the vector for meow. That is, the network was able to “generalize” to another input that was geometrically similar but not identical to the one that the model learned.
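
The generalization demonstration works for almost any small perturbation. The noise range below is arbitrary, so the cosine comes out close to, though not exactly, the 0.93 of the text:

```python
rng = np.random.default_rng(0)
noisy_cat = cat + rng.uniform(-0.4, 0.4, size=4)  # cat plus a little noise

print(cosine(noisy_cat, cat))  # slightly less than 1: similar but not identical
output = W @ noisy_cat
print(cosine(output, meow))    # nearly 1: the output still points toward meow
# The small residual away from meow reflects the noise's slight overlap with dog.
```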

This geometry holds the power of distributed connectionist models. The input vectors exist in a space defined by the weight matrix, and new vectors that have never been seen can still generate an output to the extent that some portion of the vector falls in a direction for which the network already has an association. In connectionist models, there are generally more than four input and output units. For a two-layer network (one with only an input and an output layer), a maximum of N distinct associations can be learned perfectly, where N is the number of dimensions, because only N mutually orthogonal vectors fit into an N-dimensional space. In practice, the input vectors used in applications of connectionist models are not crafted to ensure that they are all orthogonal. As a result, the number of vectors that can be stored before interference prevents the network from properly learning all the desired associations is often only about 10 to 20% of the number of dimensions. Thus, it is not uncommon for a connectionist model to have hundreds of units that form a high-dimensional space similar to those described in the previous section.
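
The capacity limit can be seen directly in the sketch by storing a third association whose input is not orthogonal to an existing one; the fox and yip patterns below are, once again, hypothetical:

```python
fox = np.array([1, -1, 1, 1])  # fox . cat = 2, so fox is NOT orthogonal to cat
yip = np.array([1, 1, -1, 1])
W3 = W + np.outer(yip, fox)

print(W3 @ cat)  # [ 6  6  2 -2] = 4 * meow + 2 * yip: cross-talk between patterns
```

With inputs like these, each new non-orthogonal pattern adds cross-talk to every retrieval, which is why only a modest fraction of the theoretical capacity is usable in practice.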

AN EXAMPLE OF A CONNECTIONIST MODEL

As an example of connectionist models in action, I examine a model of prototype formation developed by Knapp and Anderson (1984). The basic psychological phenomenon follows from Posner and Keele’s (1970) classic work on classification. Posner and Keele presented participants with dot patterns in a two-dimensional array and asked them to classify the patterns into one of two groups. As shown in Figure 2.6, each class of dot patterns was developed from a prototype pattern. The prototype itself was never shown during learning; participants saw only various distortions of it. After learning to classify the patterns reliably, participants were shown new distortions of the prototype as well as the prototype itself, and they were asked to classify them. Participants classified the prototype correctly even though they had never seen it. Indeed, they classified the prototype correctly more often than they classified the extreme distortions of the prototypical pattern that had actually been presented during learning.


FIG. 2.6. Configurations of points modeled in the simulation by Knapp and Anderson.

To create a connectionist model that exhibits this pattern of behavior, Knapp and Anderson assumed that each dot was represented as a pattern of activity in a two-dimensional grid. As shown in Figure 2.7, the pattern of activity was greatest at the unit coding the center of the dot and fell off gradually for units coding locations farther from the center. This distributed representation corresponded to a vector of activation values in an input layer, and such input patterns can be associated with output responses. Because the input represents dots as patterns of activity, new patterns that are similar to those seen during learning have vector representations pointing in directions similar to the patterns used to form associations during learning. The prototype of a category, which is similar to many different learned patterns (all of which are distortions of that prototype), is categorized easily by this system. In contrast, patterns that are large distortions of the prototype are not categorized as well, because few patterns stored in the network are similar to them. Thus, the geometric similarity between different input vectors allows this simple connectionist model to account for prototype formation.
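
A rough sketch of this kind of input coding appears below. The Gaussian falloff and the grid size are illustrative assumptions of mine, not details from Knapp and Anderson; the point is only that nearby dots produce overlapping activity patterns and hence vectors with high cosines:

```python
import numpy as np

def dot_pattern(x, y, grid=10, width=1.5):
    # Each grid unit's activation falls off with its distance from the dot at
    # (x, y); width controls how widely the activity spreads across the grid.
    xs, ys = np.meshgrid(np.arange(grid), np.arange(grid), indexing="ij")
    activity = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * width ** 2))
    return activity.ravel()  # flatten the two-dimensional grid into a vector

a = dot_pattern(3.0, 3.0)
b = dot_pattern(3.5, 3.2)  # a nearby dot
c = dot_pattern(8.0, 8.0)  # a distant dot

cos = lambda u, v: (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
print(cos(a, b))  # high: nearby dots yield similar vectors
print(cos(a, c))  # near 0: distant dots yield nearly orthogonal vectors
```

Shrinking width toward zero reproduces the single-unit case discussed below, in which there is no generalization; making it very large makes all patterns overlap and interfere.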

This model also highlights the distinction between two uses of space. At one level, this model uses a spatial representation of space: As shown in Figure 2.7, each point in perceptual space is represented by a point in a grid, and nearby points in perceptual space are represented by points near to each other in representational space. At another level, the grid representation is a vector in a high-dimensional space, and similar perceptual patterns give rise to vectors with high vector cosines. Knapp and Anderson were careful to point out that both types of representation are critical for understanding the behavior of the system. It is crucial to the functioning of the model that dots are represented as distributed patterns of activity in a two-dimensional grid. If each dot is represented as activity in only a single unit in the grid, there is no generalization. Similarly, if activity is spread too widely across the grid for each dot, all patterns tend to interfere with each other. If a spatial grid is not used at all, the network cannot recognize which points are near to each other in space. The use of an intermediate level of distributed activity, combined with the geometric properties of vector representations, allows the network to account for the observed pattern of results.


FIG. 2.7. Distributed representation of a dot in space. Units representing points at the center of a dot are most strongly activated. Units representing points farther from the center of a dot are less strongly activated. From A. G. Knapp and J. A. Anderson (1984). Copyright © 1984 by American Psychological Association. Reprinted with permission.

HOW SHOULD WE THINK ABOUT CONNECTIONIST MODELS?

There is often some confusion about how to think of representations in connectionist systems. Because units do not have meaning by themselves, it is tempting to think of them as implementing an alternative representation that is not like the symbols often seen in computer programs and in many psychological models. Indeed, Smolensky (1988) suggested that connectionist models operate at a subsymbolic level, inspired by the fact that each individual unit is best thought of as representing a very fine-grained feature that is active in many concepts. People may be unable to interpret what a given unit means, but because the collective activity of a set of units does a reasonable job of representing a concept, a fine-grained set of features must be there somewhere.

In this chapter, I have suggested that the behavior of a simple distributed connectionist model is probably best conceptualized as a high-dimensional space. Thinking about the role of individual units is confusing, but seeing the patterns of activity as vectors makes it clear how the connectionist model is doing what it does. In this way, the representing world in a connectionist model is a lot like the representing worlds in other spatial models described in this chapter.

STRENGTHS AND WEAKNESSES

Some of the same limitations of spatial models discussed previously also apply to connectionist models. I pointed out earlier that these models perform well as long as the major operation is one of finding neighbors in space. For example, the space created by singular value decomposition performed quite well on a test of synonyms, although it would have performed poorly on a test of antonyms. Likewise, simple associative connectionist models give an output based on the spatial similarity between an input vector and previously seen input vectors. The process of multiplying a weight matrix by an input vector does not allow the vector to be analyzed in the way needed to search for antonyms rather than synonyms.

Still, connectionist models of this type are very powerful. The example presented previously is J. A. Anderson’s (1972) simple linear associator. There are more powerful learning mechanisms, even for simple two-layer networks (Widrow & Hoff, 1960). In addition, researchers have developed algorithms for creating connectionist models with additional layers of units (called hidden units) between the input and output units. These hidden units perform a function similar to that of the singular value decomposition described earlier: They find a way of reducing the dimensionality needed to represent information. This dimension reduction finds higher order statistical relations in the activation patterns of units in the network. The networks maintain the basic geometry but are capable of learning more complex associations between input units and output units.
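
As one illustration of a more powerful learning mechanism, the sketch below implements error-correcting learning in the spirit of the Widrow-Hoff rule. It assumes NumPy is imported as np, and the loop structure and learning rate are my choices for illustration, not a reconstruction of their procedure:

```python
def delta_rule(inputs, targets, lr=0.1, epochs=200):
    # inputs: one row per input pattern; targets: the corresponding outputs.
    n_out, n_in = targets.shape[1], inputs.shape[1]
    W = np.zeros((n_out, n_in))
    for _ in range(epochs):
        for x, t in zip(inputs, targets):
            error = t - W @ x             # how far the current output misses
            W += lr * np.outer(error, x)  # adjust weights to shrink the error
    return W
```

Unlike the purely Hebbian storage used in the earlier example, error correction can learn input patterns that are merely linearly independent rather than strictly orthogonal, which is one reason such rules handle more realistic inputs.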

BRINGING THIS ALL TOGETHER

This chapter has been quite a tour. I started with low-dimensional spaces that were used to represent a few points in a common frame. These spaces were derived with multidimensional scaling programs, which illuminated the information that people used to make judgments. Then, I examined high-dimensional space representations formed by finding statistical relationships among words in large corpora of text. Finally, I discussed connectionist models, which also operate in high-dimensional spaces. In these systems, the geometric similarity of new input vectors to vectors learned in previous associations determines the output.

The great strength of spatial representations is their power in simplicity. Spaces have a rigid formal definition that gives modelers a few well-controlled aspects to use in developing the representing world. A set of dimensions can be created for the space, and points or vectors can be located in that space, where the distance between them becomes an important quantity. In many of the examples I discussed, distances represented psychological similarity, but distance can represent other constructs, such as perceived distance or preference, as well. Other factors have been considered for spatial models, such as variable attention weights for dimensions in the space and the density of points in neighborhoods of the space, although most models have focused selectively on the distance between points or vectors.

Mental spaces are often associated with rather simple processing assumptions. The model of sentence verification required only a measurement of the distance between points. Similarly, the analogy model of Rumelhart and Abrahamson (1973) involved drawing only a few lines between points in a space. High-dimensional space and connectionist models derive most of their power from simple calculations in linear algebra. As discussed in subsequent chapters, when proposals for the nature of mental representations grow more complex, the processes needed to extract and use information from those representations also become more complex. Thus, it is a virtue of mental space models that elaborate processing assumptions are not needed.

Despite this simplicity, mental space models have formed the basis of various psychological models. In this chapter alone, I described models of sentence verification, analogy, category formation, and word meaning. Although there are problems with all these models, it is important to point out that they are very successful. Indeed, the problems with models that use mental space representations almost all boil down to a single issue: the need to make use of specific parts of the representations. In all applications of mental space models, the interaction of two points or vectors yields only a quantity, a measure of proximity in the space. When proximity is all that is needed, these models are quite powerful, but many complex processes involve not only the sense of conceptual proximity but also an understanding of what makes two points near to each other or far apart. In these situations, it seems important to have access to the elements that make up the representations. For example, to determine that corrupt is the opposite of virtuous, it is necessary to go beyond simply determining that both terms are part of the same conceptual field. I must also determine in what way virtuous and corrupt are related. After all, virtuous and chaste are also in the same conceptual field, but they are not antonyms. In cases like this, mental space models falter.

¹To see what the configuration looks like with the other axes drawn, rotate the page so that the other axes are vertical and horizontal. This may seem obvious, but a configuration can look different when rotated to a different orientation.
