CHAPTER 2

Recommender Systems as a Collaborative Form of ASI

Artificial social intelligence is much broader than mere multiagent systems, whether the agents are represented by neural nets or rule-based systems, and it involves many kinds of partnership between humans and machines. A powerful and well-established example is recommender systems, which are not popularly called AI yet typically involve machine learning of some form, analyzing the preferences or behaviors of people, commonly in order to advise them what movie or other cultural product they might want to experience next.1 A good example is the classic 2006 Netflix dataset of ratings of 17,770 movies by hundreds of thousands of its customers. When computer scientists work on recommender systems, they tend to seek to improve the predictive power of the algorithms, which serves the financial interests of the vendor company that acquired the data and offers good advice to the customers. However, it is relatively rare for such data to be used in the humanities to map the current structure and historical dynamics of cinema cultures or other subcultures interested in particular artistic genres. This topic is good for the second chapter of this book for three reasons: (1) it offers clear examples of principles that will feature in later chapters, (2) the field of research is well but unevenly developed, such that research priorities can be proposed with some confidence, and (3) it has obvious relevance both to the economic profits of media companies and to educational benefits for a range of college and precollege academic subjects.

Systems for Recommending Movies

Since the very beginning of the 20th century, motion pictures have been a major segment of popular culture and an economically significant industry. Arguably, the technology became technically mature by the year 1939, with the release of high-quality sound and color films, notably The Wizard of Oz and Gone with the Wind. Computer-generated graphics have facilitated some improvement in special effects, especially for fantasy and science-fiction films, but occasional technological fads like 3D have not led to radical reformulation of the medium over the past eight decades. However, Internet and cable streaming services have liberated audiences from theaters and allowed them to view what they wanted when they wanted. For example, anyone can invest the 12 minutes required to view the 1903 classic short The Great Train Robbery in any of several copies freely available on YouTube. To get a sense of its historical context, one may search the old newspapers at the Chronicling America online digital library of the U.S. Library of Congress, for example, finding this notice on the May 26, 1904 front page of The Morning Appeal newspaper in Carson City, Nevada: “The great train robbery, the finest film yet shown at the Vitagraph will be shown for the last time this evening.”2 A much more recent evaluation of this pioneering work of action-oriented art can be found on its Wikipedia page, which had been viewed 384,761 times in the period July 1, 2015, through March 12, 2019.3

Before we consider how today’s revolutionary communication media are either fragmenting or unifying popular culture, we need both an historical perspective and a sense of the technical features of well-established methods for mapping a cultural genre. The Internet Movie Database (IMDb) classifies The Great Train Robbery as “short, action, crime” or “short, western,” depending upon which of two editions one checks.4 IMDb links to two sources of judgment: published reviews by professional movie critics and ratings plus descriptions from ordinary users of the Metacritic online information service. Those are really the traditional sources that audiences relied upon to decide whether to see a film: (1) professional critics who published in the local newspaper or elsewhere and (2) neighbors who had seen the film and commented privately about it. While there are various methods to draw insights from traditional sources, modern recommender systems often gather their own information in a manner similar to social-science questionnaire survey research. A total of 15,258 IMDb users rated The Great Train Robbery on a scale from 1 to 10, the modal response being 7, which 31.6 percent of them selected. If we had data on how the same users rated many other films, we could use statistical methods to identify clusters of films having similar characteristics, such as westerns. To illustrate this approach, we need a different database.

In August 1978, I took a carload of paper questionnaires of two types to the 36th annual World Science Fiction Convention in Phoenix, Arizona. The main part of this research project sought to map the cultural structure of science fiction literature, and its questionnaire asked respondents to rate 140 authors and many types of literature described in standard terms used by critics who published in the popular science fiction magazines. There were five versions of this questionnaire, listing the authors in different random orders to avoid correlations arising simply from the placement of the names, and two of the authors were fake names to catch frivolous responders. The data were manually entered onto Hollerith-style computer cards, and I wrote programs to rearrange the author data into a single common order. Factor analysis of the data did an excellent job of identifying four main subcultures of science fiction literature: (1) hard-science SF that was logical and based on the physical sciences, (2) new-wave SF that was more poetic and connected to social science, (3) a cluster of types of SF-related fantasy including horror and “sword-and-sorcery,” and (4) classic SF of its early days headed by the works of Jules Verne and H.G. Wells. The results were published as a book from Harvard University Press.5

The second questionnaire was more problematic, focusing on 67 movies the attendees at the convention were likely to have seen, asking them to rate each on a 7-point scale from 0 (do not like) to 6 (like very much). Because a general issue for cultural survey research is the degree of familiarity respondents have with the topic, for the analysis reported here I focused on the 200 respondents who had rated at least 45 of the 67 movies. Again, one of the main methodologies was factor analysis. Wikipedia correctly describes this approach and connects it to modern artificial intelligence:

Factor analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors ... Factor analysis aims to find independent latent variables. It is a theory used in machine learning and related to data mining.6

But it is not a new method; it was largely developed by Charles Spearman (1863–1945) more than a century ago.7

The most central connection of factor analysis to modern machine learning is that it is an iterative process, many stages of which can be either automated or decided by the human user. In my analysis of the 1978 movie data, I instructed the computer to rotate the factors, which meant that it automatically went through a series of iterations to improve the clarity of the analysis. I had the option of telling it how many factors to seek, or of following the statistical criterion of keeping all factors with eigenvalues greater than 1. In much more recent online research, I administered Lewis Goldberg’s set of 100 questionnaire items measuring the so-called Big Five personality dimensions to 3,267 respondents, vastly more than such studies usually have, and thus potentially strengthening the reliability of the statistics.8 I ran two different factor analyses. In a confirmatory analysis, I told the statistical analysis software to do a common kind of principal component analysis with rotation, calling for five factors, and I got almost exactly the Big Five. Then I compared an exploratory analysis, with everything the same, but asking for as many factors as had eigenvalues greater than 1. That produced fully 15 factors.9 Even though the 100 questionnaire items had been designed to measure exactly five dimensions, the human responses had a more complex structure. A system that automatically experimented with a range of criteria, from rotation methods to eigenvalue thresholds to selection of subsets of items for further analysis, would clearly identify factor analysis as a form of machine learning.
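
To make the two runs concrete, here is a minimal Python sketch. It substitutes scikit-learn’s maximum-likelihood factor analysis with varimax rotation for the principal-component routine mentioned above, and the data are randomly generated stand-ins, so only the workflow, not the numerical results, mirrors the study.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import FactorAnalysis

# Randomly generated stand-in for questionnaire data:
# 300 respondents, 20 numeric items.
rng = np.random.default_rng(0)
responses = pd.DataFrame(rng.normal(size=(300, 20)),
                         columns=[f"item_{i}" for i in range(20)])

# Exploratory criterion: count eigenvalues of the correlation matrix
# that exceed 1 (the rule that yielded 15 factors in the study above).
corr = np.corrcoef(responses.to_numpy(), rowvar=False)
eigenvalues = np.linalg.eigvalsh(corr)
print("Factors by the eigenvalue-1 rule:", int((eigenvalues > 1.0).sum()))

# Confirmatory run: ask for exactly five rotated factors.
fa = FactorAnalysis(n_components=5, rotation="varimax")
fa.fit(responses.to_numpy())
loadings = pd.DataFrame(fa.components_.T, index=responses.columns,
                        columns=[f"factor_{k + 1}" for k in range(5)])
print(loadings.round(2).head())
```

With real questionnaire responses, comparing the eigenvalue count against the fixed five-factor solution reproduces the exploratory-versus-confirmatory contrast described above.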

Other options illustrated with the 1978 movie data are focusing on a subset of more knowledgeable respondents and on films that have characteristics in common. Table 2.1 reports results of an analysis of the 36 best-known movies, 17 of which clustered well into 3 factors. The most popular film for the 1978 respondents was Star Wars (1977), seen by all the respondents and rated an average of 5.45 on the 0 to 6 scale, but it did not fall into any particular factor. While the movies are more than four decades old, they are relevant today, as demonstrated by the large numbers of times people have viewed their Wikipedia articles, data covering the period from July 1, 2015, through March 12, 2019. Notably, the 1939 classic The Wizard of Oz attracted fully 6,192,237 recent pageviews.


Table 2.1 Factor analysis of old but popular movies

(The year, mean score, percent rated, and factor loadings come from the 1978 SF convention questionnaire data; the final column gives Wikipedia pageviews for July 1, 2015, through March 12, 2019.)

| Movie | Year | Mean Score | Percent Rated | Factor 1 Loading | Factor 2 Loading | Factor 3 Loading | Wikipedia Pageviews |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Factor 1 | | | | | | | |
| Logan’s Run | 1976 | 3.31 | 96.5% | 0.67 | –0.04 | –0.05 | 1,539,820 |
| Soylent Green | 1973 | 3.50 | 93.0% | 0.61 | 0.12 | 0.04 | 2,798,725 |
| The Andromeda Strain | 1971 | 4.59 | 98.0% | 0.59 | –0.02 | 0.07 | 460,597 |
| The Omega Man | 1971 | 3.31 | 91.5% | 0.58 | –0.10 | –0.05 | 769,621 |
| Westworld | 1973 | 3.89 | 95.5% | 0.57 | 0.11 | 0.03 | 1,437,308 |
| Fantastic Voyage | 1966 | 4.00 | 98.0% | 0.52 | –0.14 | 0.12 | 509,048 |
| The Land That Time Forgot | 1974 | 2.49 | 88.5% | 0.50 | 0.16 | 0.05 | 245,642 |
| Factor 2 | | | | | | | |
| Psycho | 1960 | 4.18 | 88.0% | –0.11 | 0.68 | –0.02 | 5,320,274 |
| The Bride of Frankenstein | 1935 | 3.42 | 87.5% | –0.06 | 0.65 | 0.29 | 922,810 |
| Rosemary’s Baby | 1968 | 2.62 | 88.0% | 0.02 | 0.59 | 0.12 | 2,769,900 |
| The Fly | 1958 | 2.76 | 90.5% | 0.15 | 0.59 | 0.16 | 480,875 |
| King Kong | 1933 | 4.56 | 98.0% | 0.01 | 0.54 | 0.33 | 1,537,859 |
| Invasion of the Body Snatchers | 1956 | 4.19 | 88.0% | 0.12 | 0.51 | 0.16 | 842,633 |
| Factor 3 | | | | | | | |
| 20,000 Leagues Under the Sea | 1954 | 4.16 | 97.0% | 0.10 | 0.14 | 0.68 | 825,987 |
| Forbidden Planet | 1956 | 5.09 | 99.0% | 0.00 | 0.08 | 0.60 | 1,036,063 |
| The Wizard of Oz | 1939 | 4.47 | 99.0% | –0.18 | 0.30 | 0.54 | 6,192,237 |
| The Day the Earth Stood Still | 1951 | 5.25 | 97.5% | 0.11 | 0.07 | 0.52 | 1,118,914 |


Factor analysis takes a correlation matrix as its input, and the output includes new scores for the items, called loadings, that are rather like correlations between the item and each of the latent factors. Table 2.1 uses a somewhat arbitrary criterion, including only films that loaded at least 0.50 on one of the factors. In most cases, a film has one big loading, and the other two are statistically indistinguishable from 0, reflecting a very successful factor analysis. The interesting exceptions are The Bride of Frankenstein (loaded 0.65 on factor 2 and 0.29 on factor 3), King Kong (0.54 on factor 2 and 0.33 on factor 3), and The Wizard of Oz (0.30 on factor 2 and 0.54 on factor 3). A film that did not meet the 0.50 criterion for inclusion, The Creature from the Black Lagoon, loaded 0.20, 0.47, and 0.39. In applications like the Big Five personality dimensions, items that load on multiple factors were excluded from the design of the instrument; but when we study natural cultures, we need to realize that a table like Table 2.1 maps films into a multidimensional conceptual space, rather than clustering the films, strictly speaking.
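
The inclusion rule just described is easy to state in code. Here is a small sketch over a hypothetical loadings table in the style of Table 2.1; the 0.50 cutoff comes from the text, while the 0.30 secondary threshold and the helper names are my own illustrative choices.

```python
import pandas as pd

# Hypothetical loadings in the style of Table 2.1.
loadings = pd.DataFrame(
    {"factor_1": [0.67, -0.11, -0.18, 0.20],
     "factor_2": [-0.04, 0.68, 0.30, 0.47],
     "factor_3": [-0.05, -0.02, 0.54, 0.39]},
    index=["Logan's Run", "Psycho", "The Wizard of Oz",
           "The Creature from the Black Lagoon"])

# Keep films whose largest absolute loading reaches the 0.50 criterion.
peak = loadings.abs().max(axis=1)
selected = loadings[peak >= 0.50]

def secondary_loading(row):
    """Return the second-largest absolute loading in a row."""
    return row.abs().sort_values(ascending=False).iloc[1]

# Flag films that also load noticeably on a second factor, illustrating
# that the table maps a space rather than strict clusters.
cross_loaded = selected[selected.apply(secondary_loading, axis=1) >= 0.30]
print(cross_loaded)  # only The Wizard of Oz qualifies in this toy sample
```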

In the questionnaire rating science fiction authors, it worked very well to employ an extension of the factor analysis method, generating factor scores that could then be correlated with the respondent’s ratings of the main terms used by critics to describe different subgenres or aspects of science fiction literature. This resulted in very solid descriptions of the meanings of all four main factors. That approach did not work as well for the movies, perhaps because the questionnaire was not based on very clear categorization schemes developed by knowledgeable film critics, and the subculture of science fiction fans was primarily oriented toward the literature. Indeed, cultures vary in terms of how solidly based they are upon clear ontologies.

How can we describe each of the three movie factors? From the dates, factor 1 clusters films that were very recent in 1978 when the questionnaire was administered, with a mean release date of 1972. The other two factors are older, with mean dates of 1952 and 1950. It is noteworthy that very high majorities of the respondents had seen each of the films, but that testifies to their relevance for the science fiction subculture, and today movies of all vintages are far more accessible to the general public than they were four decades ago. The third factor stands out in terms of its ratings, having an average of 4.7 on the 0 to 6 scale, while the other two factors were essentially tied at 3.6.

Having seen all the films in the table, I was aware that the factor 1 films were not merely similar in vintage but also in theme, each exploring an imagined, exotic environment. The right-hand column of the table suggests how anyone could learn their topics and other features, because all have their own article on Wikipedia, nine of them having been viewed over a million times in the period reported in their Wikipedia pageviews statistic. Here is how Wikipedia describes the factor 1 films:

Logan’s Run “depicts a utopian future society on the surface, revealed as a dystopia where the population and the consumption of resources are maintained in equilibrium by killing everyone who reaches the age of thirty.”10

Soylent Green “combines both police procedural and science fiction genres; the investigation into the murder of a wealthy businessman and a dystopian future of dying oceans and year-round humidity due to the greenhouse effect, resulting in suffering from pollution, poverty, overpopulation, euthanasia and depleted resources.”11

The Andromeda Strain is a “science fiction thriller film” about “a team of scientists who investigate a deadly organism of extraterrestrial origin”: “near the small rural town of Piedmont, New Mexico, almost all of the town’s inhabitants die quickly.”12

The Omega Man is about “a survivor of a global pandemic” who “spends his days patrolling the now-desolate Los Angeles, hunting and killing members of ‘the Family’, a cult of plague victims who were turned into nocturnal albino mutants.”13

Westworld explores “a high-tech, highly realistic adult amusement park ... The resort’s three ‘worlds’ are populated with lifelike androids that are practically indistinguishable from human beings, each programmed in character for their assigned historical environment.”14

Fantastic Voyage follows “a submarine crew who are shrunk to microscopic size and venture into the body of an injured scientist to repair damage to his brain.”15

The Land That Time Forgot explores “an uncharted sub-continent called Caprona, a fantastical land of lush vegetation where dinosaurs still roam, co-existing with primitive man.”16

Clearly, these themes harmonize with standard science fiction literature, but in fact there are also direct connections, Fantastic Voyage with a novel by Isaac Asimov, The Land That Time Forgot with one by Edgar Rice Burroughs, and Logan’s Run with a novel written collaboratively by William F. Nolan and George Clayton Johnson. The Omega Man was based on the 1954 novel I Am Legend by Richard Matheson, about which Wikipedia correctly reports:

It was influential in the development of the zombie-vampire genre and in popularizing the concept of a worldwide apocalypse due to disease. The novel was a success and was adapted into the films The Last Man on Earth (1964), The Omega Man (1971), and I Am Legend (2007). It was also an inspiration behind Night of the Living Dead (1968).17

Not only was Soylent Green based on the 1966 novel Make Room! Make Room! by Harry Harrison, but it was like a sequel to The Omega Man, starring the same actor, Charlton Heston, and having one of the same producers, Walter Seltzer. Novelist Michael Crichton wrote the stories for both Westworld and The Andromeda Strain. Clearly, the films in factor 1 represent a subculture of science fiction literature that converged with the movies.

While not defined as a recommender system, Wikipedia can easily serve as an advisor about which movie one might next want to see. For example, Logan’s Run, Soylent Green, and The Omega Man are included in its list of dystopian films, so that seeing one could encourage viewing the other two or indeed becoming a fan of the entire genre.18 Many other current online information sources about the film industry are relevant to recommender systems, notably IMDb, which links to reviews by professional critics and reports ratings by users.19 The IMDb page for Logan’s Run headlines a brief synopsis: “An idyllic science fiction future has one major drawback: life must end at the age of thirty.”20 The average metascore is rather low, just 53 on a scale of 1 to 100, but IMDb highlighted four reviews linked via Metacritic that had much higher scores:21

Variety (metascore = 90): “Logan’s Run is a rewarding futuristic film that appeals both as spectacular-looking escapist adventure as well as intelligent drama.”

Chicago Sun-Times, Roger Ebert (metascore = 75): “Logan’s Run is a vast, silly extravaganza that delivers a certain amount of fun, once it stops taking itself seriously.”

The New York Times, Vincent Canby (metascore = 70): “Logan’s Run is less interested in logic than in gadgets and spectacle, but these are sometimes jazzily effective and even poetic. Had more attention been paid to the screenplay, the movie might have been a stunner.”

IGN (metascore = 70): “Taken for what it is, Logan’s Run delivers a fun ride and a glimpse at another era, even if it’s probably not the time frame the producers had in mind.”

The phrase “escapist adventure as well as intelligent drama” illustrates a challenge that in some form may be faced by cultural science in all areas: Which cultural features are relevant to a particular analysis, serving its goals rather than distracting from them? We have not emphasized that 5 of the 17 movies were filmed in black and white, rather than color. All of the factor 1 films could be described as escapist, because they venture far away from ordinary life. Yet one of the proverbs of SF fandom is: “Science fiction is escape ... into reality.” By this the fans mean that popular culture ignores key truths about the nature of humans and the universe they inhabit. The term intelligent may refer to the asking of questions as much as answering. This returns us to the key theme of cultural relativism: The criteria used to map the structure of a culture are themselves cultural products, so very different analyses may be equally valid.

Table 2.2 summarizes the IMDb and Metacritic information about the three factors of films rated at the 1978 science fiction convention, chiefly based on new data but including a few of the original movie reviews. The Metacritic scores are on a 1 to 100 scale, while the IMDb user scores are on a 1 to 10 scale. Interestingly, IMDb reports the gender of most user raters. In the case of Logan’s Run, of 46,894 raters, 33,178 self-identified as male and 4,252 as female. Thus, only 11.4 percent of the 37,430 who reported their gender were female. The highest percent female, 26.7, is for The Wizard of Oz, in which a heroine explores a fantasy world, and the second-highest, 22.2 percent, is for Rosemary’s Baby. We may weigh the balance between two hypotheses: (1) users of IMDb who report their gender are predominantly male and (2) female users of IMDb are less interested in science fiction than are male users.


Table 2.2 Data about three sci-fi factors at the Internet Movie Database

| Movie | Metacritic Score (1–100) | User Score (1–10) | Users Who Rated It | Percent Female | Category Description |
| --- | --- | --- | --- | --- | --- |
| Factor 1 | | | | | |
| Logan’s Run | 53 | 6.8 | 46,894 | 11.4% | Action, sci-fi |
| Soylent Green | 66 | 7.1 | 51,971 | 11.6% | Crime, mystery, sci-fi |
| The Andromeda Strain | 60 | 7.2 | 30,001 | 9.1% | Sci-fi, thriller |
| The Omega Man | 56 | 6.6 | 26,420 | 7.9% | Action, sci-fi, thriller |
| Westworld | 77 | 7.0 | 46,137 | 8.8% | Action, sci-fi, thriller |
| Fantastic Voyage | None | 6.8 | 15,554 | 8.9% | Adventure, family, sci-fi |
| The Land That Time Forgot | None | 5.7 | 4,915 | 8.0% | Adventure, fantasy |
| Factor 2 | | | | | |
| Psycho | 97 | 8.5 | 527,957 | 19.9% | Horror, mystery, thriller |
| The Bride of Frankenstein | None | 7.9 | 38,670 | 13.3% | Drama, horror, sci-fi |
| Rosemary’s Baby | 96 | 8.0 | 167,461 | 22.2% | Drama, horror |
| The Fly | None | 7.1 | 18,606 | 12.4% | Drama, horror, sci-fi |
| King Kong | 90 | 7.9 | 72,981 | 11.2% | Adventure, horror, sci-fi |
| Invasion of the Body Snatchers | 92 | 7.8 | 40,170 | 12.5% | Drama, horror, sci-fi |
| Factor 3 | | | | | |
| 20,000 Leagues Under the Sea | None | 7.2 | 25,758 | 10.1% | Adventure, drama, family |
| Forbidden Planet | None | 7.6 | 40,722 | 8.8% | Action, adventure, sci-fi |
| The Wizard of Oz | 100 | 8.0 | 348,129 | 26.7% | Adventure, family, fantasy |
| The Day the Earth Stood Still | None | 7.8 | 71,638 | 11.0% | Drama, sci-fi |


In an earlier study, I reported data about a different set of science fiction movies from the highly respected MovieLens database.22 It accurately describes itself as the primary academic center for movie-related recommender system research:

MovieLens is a research site run by GroupLens Research at the University of Minnesota. MovieLens uses “collaborative filtering” technology to make recommendations of movies that you might enjoy, and to help you avoid the ones that you won’t. Based on your movie ratings, MovieLens generates personalized predictions for movies you haven’t seen yet. MovieLens is a unique research vehicle for dozens of undergraduates and graduate students researching various aspects of personalization and filtering technologies.23

The heart of the system is asking the user to rate some movies, then recommending others that tended to correlate in the ratings by earlier users, but it is not limited to that correlational approach, allowing users to specify movie genres, release dates, actors, and directors, and increasingly other variables.24 If scholars in the humanities and social scientists want to explore the potential of a convergent cultural science, they might well be advised to try doing research studies in collaboration with MovieLens.

Several of the most widely used recommender system algorithms do not perform anything like factor analysis on movie preferences across the entire dataset, but take each respondent, hunt for the neighborhood of similar people, and use only their data to make the recommendations.25 To conclude this section, however, one more correlational analysis will help make useful points, employing the Netflix “data set of 100,480,507 ratings that 480,189 users gave to 17,770 movies,” shared in 2006 with the public in a contest to see who could develop algorithms better than those already in use.26 Table 2.3 shows results of statistical analysis of six of the movies, the four from factor 1 with the highest number of pageviews, and the one with the most pageviews from each of the other two factors. The diagonal, from upper left, shows the number who rated each film, for example, 14,384 rating Logan’s Run on a 1 to 5 scale. The cells below and to the left give the number who rated each pair, 4,977 in the combination of Logan’s Run and Soylent Green. The numbers above and to the right are the correlation coefficients, in this case 0.43 which indicates that respondents who liked Logan’s Run rather strongly also tended to like Soylent Green. The total number of respondents rating any of these movies was 110,465. Just 956 rated all 6, while 69,246 rated only 1 and thus could not be used in the correlation analysis.
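
As an aside before the correlational analysis, the neighborhood approach mentioned above can be sketched in a few lines of Python. This is a generic user-based collaborative filter with cosine similarity, not the actual Netflix or MovieLens algorithm, and the tiny ratings matrix is invented for illustration.

```python
import numpy as np

# Toy ratings matrix: rows = users, columns = movies, 0 = unrated.
R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)

def predict(user: int, movie: int, k: int = 2) -> float:
    """Predict a rating from the k most similar users who rated the movie."""
    mask = R[:, movie] > 0          # users who rated this movie
    mask[user] = False              # never compare the user with themselves
    candidates = np.where(mask)[0]
    # Cosine similarity between the target user and each candidate.
    sims = np.array([np.dot(R[user], R[c]) /
                     (np.linalg.norm(R[user]) * np.linalg.norm(R[c]))
                     for c in candidates])
    order = np.argsort(sims)[-k:]   # indices of the k nearest neighbors
    neighbors, weights = candidates[order], sims[order]
    # Similarity-weighted average of the neighborhood's ratings.
    return float(np.dot(weights, R[neighbors, movie]) / weights.sum())

# Low prediction, because user 0's closest neighbor disliked movie 2.
print(predict(user=0, movie=2))
```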


Table 2.3 Comparative analysis using the 2006 Netflix movie database

(Diagonal cells give the number of users who rated each film; cells below the diagonal give the number who rated each pair; cells above the diagonal give the correlation coefficients.)

| | Logan | Soylent | Omega | Westworld | Psycho | Wizard |
| --- | --- | --- | --- | --- | --- | --- |
| Logan’s Run | 14,384 | 0.43 | 0.42 | 0.39 | 0.17 | 0.17 |
| Soylent Green | 4,977 | 11,276 | 0.50 | 0.41 | 0.22 | 0.16 |
| The Omega Man | 3,122 | 3,309 | 5,736 | 0.40 | 0.16 | 0.16 |
| Westworld | 4,447 | 3,751 | 2,641 | 8,239 | 0.21 | 0.18 |
| Psycho | 7,651 | 5,930 | 3,225 | 4,856 | 58,837 | 0.26 |
| The Wizard of Oz | 8,556 | 6,131 | 3,218 | 5,023 | 31,409 | 74,829 |


The most obvious pattern in the table is that the four movies from factor 1 tend to correlate well and about equally with each other, replicating the analysis of data collected more than a quarter century earlier. This suggests that the structure of movie culture is somewhat stable over time, at least in some sectors. The 0.50 correlation between Soylent Green and The Omega Man is noticeably higher than the others, probably because audiences recognized the same lead actor in both, but perhaps also because of stylistic commonalities shaped by the fact that this pair had the same producer. The correlations between the factor 1 quartet and the two other movies are much lower, and we see a moderate but respectable correlation of 0.26 between Psycho and The Wizard of Oz, perhaps reflecting the fact that both were of older vintage than the quartet. Given that people vary in how much they like movies in general, and positive response biases are often found in questionnaire data, the three 0.16 coefficients may not really be meaningfully greater than zero. But the differences across the coefficients are significant and indicate that commercial recommender system data can indeed be used to map cultural structures.
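
For readers who want to replicate this kind of analysis, here is a minimal pandas sketch of the Table 2.3 logic: pairwise co-rating counts plus Pearson correlations computed only over co-raters. The tiny ratings table and its column names are hypothetical, not the real Netflix schema.

```python
import pandas as pd

# Hypothetical long-format ratings like the Netflix data: one row per
# (user, movie, rating) triple. The column names are assumptions.
ratings = pd.DataFrame({
    "user":   [1, 1, 2, 2, 2, 3, 3, 4, 4],
    "movie":  ["Logan", "Soylent", "Logan", "Soylent", "Psycho",
               "Logan", "Psycho", "Soylent", "Psycho"],
    "rating": [5, 4, 4, 5, 2, 2, 3, 3, 4],
})

# Pivot to a users-by-movies grid; unrated cells become NaN.
wide = ratings.pivot(index="user", columns="movie", values="rating")

# Above-diagonal style: pairwise Pearson correlations, each computed only
# over the users who rated both films in the pair.
correlations = wide.corr(method="pearson", min_periods=2)

# Diagonal and below-diagonal style: raters per film and per pair of films.
rated = wide.notna().astype(int)
pair_counts = rated.T @ rated       # diagonal = number rating each film

print(correlations.round(2))
print(pair_counts)
```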

Audience-Generated Culture

To gain a sense of the variety of recommender systems that already exist and to raise more methodological challenges and opportunities, we shall now examine the case of The Foundry, a software system incorporated in the massively multiplayer online game Neverwinter that allowed players to create their own virtual environments and missions for other players to undertake. This is an excellent example for cultural science, because Neverwinter is one component of the vast Dungeons and Dragons subculture that nicely illustrates the convergence–divergence dynamic, having drawn into itself elements of many existing subcultures, then greatly stimulating the emergence of new subcultures. In a recent chapter about online virtual worlds, I reported27:

A revolutionary development was the emergence of Dungeons and Dragons (D&D) in 1974, a tabletop role-playing game that allowed players to invent their own stories or follow an increasing number of partially prewritten scripts, usually within a fantasy environment that was frankly influenced by Lord of the Rings, but avoided copyright infringement by calling Hobbits halflings instead.28 Another one of the many influences was jetan, a chesslike game devised by Edgar Rice Burroughs for Chessmen of Mars, one of a series of novels that also influenced the Star Wars mythos and that embedded the gameplay in the fictional history of competing alien ethnicities.29

The freedom to create their own missions, or use role-play stories written by professionals, was central to D&D culture. Thus, it was natural for Cryptic Studios to include that feature when it released Neverwinter in 2013. It had included similar features in two earlier online multiplayer games, City of Heroes, which shut down in 2012, and Star Trek Online, where I had studied The Foundry extensively.30 On March 4, 2019, Cryptic announced that The Foundry would be removed from both of the surviving virtual worlds on April 11.31 I had done some research on Neverwinter’s version of The Foundry and, given how fundamental player-created material was to D&D culture, I immediately rushed to collect as much additional data as I could, especially the public recommender system data concerning more than a thousand missions. This makes the point that many forms of culture, even important ones, are ephemeral and thus require active documentation and archiving by researchers.32

When looking for a mission to undertake, players could open a catalog interface and conduct a search, for example, checking the popular missions that had received the highest ratings from earlier players or narrowing the search to a particular language and specifying a keyword that should be in the description written by the creator of the mission. Some missions were simple and brief, but others took as much as an hour. At the end, the player got the opportunity to rate the mission on a scale from 1 to 5 “stars,” write a brief comment, and then respond to a 13-item questionnaire that listed features that any mission might have. For example, one item was “story focus,” and the player could check a box if the mission seemed to have a strong narrative. Another choice was “combat focus,” and the player could select either, both, or neither. A player who performed the mission a second time could update this personal evaluation, but would be counted only once for the rating and classification. Many players who assigned a rating declined to offer any categorization, and it is common for recommender systems to collect more fragmentary data than do social science questionnaires. Table 2.4 reports the distribution of classifications for all the missions in the public database, for the six languages other than English.


Table 2.4 Player categorizations of player-created game missions

| Category | German | French | Polish | Portuguese | Italian | Turkish |
| --- | --- | --- | --- | --- | --- | --- |
| Challenging | 12.1% | 5.2% | 9.7% | 23.5% | 11.7% | 12.3% |
| Story focus | 10.4% | 14.0% | 10.7% | 7.8% | 10.1% | 12.4% |
| Combat focus | 16.0% | 12.7% | 18.9% | 12.8% | 9.5% | 16.0% |
| Lore | 2.3% | 1.1% | 6.5% | 5.8% | 3.2% | 4.1% |
| Exploration | 8.3% | 8.0% | 3.8% | 9.6% | 6.6% | 7.8% |
| Humor | 7.1% | 8.7% | 9.7% | 9.3% | 9.6% | 8.1% |
| Unusual | 5.7% | 7.0% | 7.2% | 7.8% | 6.4% | 7.9% |
| Solo friendly | 17.5% | 21.4% | 10.3% | 4.6% | 15.9% | 3.7% |
| Group friendly | 3.0% | 3.3% | 3.7% | 2.7% | 5.1% | 3.8% |
| Role-play | 4.7% | 5.8% | 4.4% | 2.5% | 3.5% | 4.9% |
| Puzzle | 2.1% | 2.2% | 2.0% | 2.2% | 2.8% | 3.1% |
| Adjustable difficulty | 1.7% | 2.5% | 3.0% | 5.0% | 4.4% | 4.3% |
| Eventful | 9.1% | 8.1% | 9.9% | 6.5% | 11.0% | 11.5% |
| Total | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% |
| Categorizations | 195,505 | 192,438 | 124,098 | 98,645 | 49,710 | 26,333 |
| Missions | 387 | 313 | 109 | 217 | 128 | 79 |
| Plays | 1,091,905 | 774,614 | 505,449 | 434,726 | 345,084 | 201,789 |


There were 387 missions in the German language that had been played a total of 1,091,905 times. Among the details not shown in the table are that there were 518,338 ratings on the 1 to 5 scale, with a mean of 3.68 stars. We do not know whether only about half of the plays produced ratings because players tended to do each mission twice while being counted only once, or because some fraction of those who completed a mission chose not to rate it. But there is a major discrepancy between the 518,338 ratings and the 195,505 categorizations. If one player marked both questionnaire boxes for story focused and combat focused, that would count as one rating and two categorizations. We simply do not know how many categorizations the average categorizer gave, but it appears that only a small fraction of the players went this far in the recommendation process. This complex situation offers two insights:

  1. Participants in a culture differ in their knowledge of it and in their willingness to share observations or assessments.
  2. In cultural science, the nature of the “unit of analysis” is often optional.

As I defined the term in an earlier textbook-software educational project, a unit of analysis is “the thing being counted or the basic element of reality that constitutes a case of the thing being studied.”33 The unit of analysis in questionnaire survey research is typically the individual respondent. This makes particular sense in political science election polling, because each voter gets one vote. But in culture, some individuals may be far more influential than others, and for decades social scientists have distinguished opinion leaders from the rank and file members of the population.34 In Table 2.4 the unit of analysis is the assignment of a mission to a category by one player who may have assigned it as well to other categories, each assignment counting separately.

The 13 category names in Table 2.4 are all the information The Foundry gave its users about the meanings of these terms. Any player would understand the distinction between “solo friendly” and “group friendly,” but the exact meanings are open to debate. “Solo friendly” states clearly that it is possible for a single player to complete the mission without help from other players. “Group friendly” does not explicitly state that a team of players is necessary, and for many missions additional players could be a hindrance rather than a help. An alternate definition is that a group-friendly mission is one that a group of friends would have fun completing, even if one of them could easily have “soloed” it. It is worth noting that solo categorizations were far more common than group categorizations, and a constant debate about online role-playing games is whether they implicitly discourage group play, even as they seek to be multiplayer. The low numbers for the “lore” category are also interesting, because we might have predicted that D&D fans would have prioritized the mythos, for example, the fact that Neverwinter is the name of a city governed by Lord Protector Dagult Neverember, within the Forgotten Realms part of the D&D backstory.35

Whatever language is supposedly spoken in the fictional city of Neverwinter, the Neverwinter game was created in English, and then translated to German, French, Polish, Portuguese, Italian, and Turkish. The novels of the popular American D&D author R.A. Salvatore have been translated into German, Italian, Finnish, Hebrew, Greek, Hungarian, Turkish, Croatian, Bulgarian, Yiddish, Spanish, Russian, Polish, Czech, and French.36 Cryptic Studios is currently owned by a Chinese company with the imposing name Perfect World Entertainment.37 So we are clearly examining a cross-cultural subculture, and it is an open question whether there are cultural differences across the six language cultures represented in Table 2.4. I often thought about this issue as I entered the data into my computer, because the only way to extract this information from the software for this particular game was to do so manually, working from “screenshot” pictures of the game display. The data raise this question, but they do not answer it.

Massively multiplayer online games are primarily produced in the United States, China, and South Korea, with some production in Britain and Germany, and a few highly creative game studios in such places as Iceland and Norway. Poland has recently entered the game industry, notably with the impressive and popular Witcher series of fantasy games.38 Spanish is missing from the list of Neverwinter Foundry missions, although I did find a couple of Spanish language missions hidden in the English category. In 2013, soon after Neverwinter’s release, several Spanish-speakers complained online, including someone using the moniker SpellSigner in the online forums of the Steam game distributor: “Es ridiculo esto. El juego está hasta en Turco y Portugués, pero del español no habla ni 2 palabras.” (“This is ridiculous. The game is even in Turkish and Portuguese, but it does not speak two words of Spanish.”) Cat Hugger replied, “Or you could just learn English and this way be able to enjoy thousands of other english only-games. Just a thought!”39

The inclusion of Portuguese in the list reflects the population of Brazil more than Portugal, and the popular scripting language used with many games, Lua, was developed there.40 We cannot definitively interpret the two obvious statistical facts about the Portuguese data in Table 2.4: that Portuguese missions are rated as more challenging, and that Portuguese speakers produced twice as many missions as the Polish, despite similar numbers of plays. The reason relates to the issue of unit of analysis, because a few players produced many short-duration missions in Portuguese, and the demise of The Foundry prevents us from playing all of the missions to see how they were designed.

Here we shift the definition of unit of analysis from the categorization to the mission, conceptually jumping right over the player. In social science, it is quite common to perform statistical analysis of data originally concerning individual people, but aggregated by geographic area. Having often done statistical research of that kind, I am very aware of the problems of defining geographic units. For example, much American government data can be analyzed by metropolitan statistical areas, which traditionally matched county boundaries except in New England, and some of the “units” seemed heterogeneous, notably the Baltimore–Washington metropolitan area, given how different the city of Baltimore is from the nation’s capital. Foundry mission creators differed in whether they would break a very long mission into parts, each of which would then count as a separate mission. In Foundry data, we cannot be sure which missions are copies or adaptations of others. Also, the recommender system featured specific missions for a few days and reports data from that period separately from data for other periods for the same mission, leaving open the question of whether they should be combined. Here I simply accept the unit of analysis as reported in Neverwinter, but focus on cases that meet reasonable criteria to maximize the meaningfulness of the data.

First, we will focus just on missions that took players on average at least 10 minutes to complete. The Foundry database reported average durations for all missions that met this criterion, but did not report the durations of shorter missions. Of the total of 1,233 missions in Table 2.4, 649 took on average 10 minutes or more to complete. Multiplying their specific average durations by the numbers of plays gives the remarkable total of 612,764 hours! This does not include the time invested by players who did not complete a mission, nor the times for the missions under 10 minutes. The reason for excluding the short missions is that many of them were not missions at all, but quick in-out actions, such as saying hello to a computer-simulated person to earn a simulated white pearl.

Second, we focus on missions that received at least 100 ratings on the star scale of quality. Another option would have been to focus just on missions that received 100 categorizations, or some other number. However, that alternative might distort the balance between very specialized missions that had their votes concentrated in one category versus those that were more general and got many votes from each rater. The English language missions were also included, giving a total of 774 missions with durations of at least 10 minutes and having at least 100 star quality votes. This will allow us to correlate categorizations, using mission as the unit of analysis. However, there are multiple ways to handle correlations in a situation like this, and Table 2.5 reports the results of two of them, focusing just on the eight categories with clear cultural significance.
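
A brief pandas sketch of these selection criteria, and of the player-hours multiplication mentioned above; the column names and the toy records are my own assumptions, not The Foundry’s data format.

```python
import pandas as pd

# Invented mission records; the column names are assumptions.
missions = pd.DataFrame({
    "language":         ["German", "French", "English", "Polish"],
    "duration_minutes": [4, 12, 35, 18],
    "star_votes":       [250, 90, 400, 150],
    "plays":            [5000, 800, 12000, 2600],
})

# Keep missions averaging at least 10 minutes that drew at least 100 ratings.
kept = missions[(missions["duration_minutes"] >= 10) &
                (missions["star_votes"] >= 100)]

# Total player-hours: average duration times number of plays, summed.
hours = (kept["duration_minutes"] * kept["plays"]).sum() / 60
print(len(kept), "missions kept;", round(hours), "player-hours")
```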


Table 2.5 The correlation structure of game mission descriptors

(Coefficients below the diagonal correlate the raw categorization counts; coefficients above the diagonal correlate the categorization fractions, as explained in the text.)

| | Story | Combat | Lore | Exploration | Humor | Unusual | Role-play | Puzzle |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Story | 1.00 | –0.65 | 0.35 | 0.32 | 0.04 | –0.05 | 0.69 | –0.03 |
| Combat | 0.56 | 1.00 | –0.35 | –0.48 | –0.30 | –0.39 | –0.53 | –0.44 |
| Lore | 0.88 | 0.55 | 1.00 | 0.05 | –0.07 | 0.00 | 0.26 | 0.19 |
| Exploration | 0.88 | 0.59 | 0.81 | 1.00 | –0.23 | –0.01 | 0.29 | 0.36 |
| Humor | 0.63 | 0.50 | 0.44 | 0.45 | 1.00 | 0.46 | –0.02 | –0.06 |
| Unusual | 0.85 | 0.57 | 0.64 | 0.75 | 0.79 | 1.00 | 0.01 | 0.29 |
| Role-play | 0.89 | 0.54 | 0.88 | 0.81 | 0.49 | 0.62 | 1.00 | –0.01 |
| Puzzle | 0.63 | 0.45 | 0.60 | 0.79 | 0.36 | 0.58 | 0.61 | 1.00 |


The diagonal of cells in the table from upper left to lower right simply reports that each variable has a Pearson correlation coefficient of 1.00 with itself. The coefficients below and to the left of the diagonal are the results of correlating the raw numbers. They are rather large, as questionnaire correlations usually go, simply because they reflect the overall number of categorization votes each mission received, and some missions had been played and classified by far more players than others. It is interesting that the 0.56 correlation between story focus and combat focus is far less than the other correlations in the story focus column, but it is hard to know exactly what this means prior to completing the alternate analysis.

The correlations above and to the right of the diagonal are much easier to interpret. Before calculating them, the data for each mission were transformed into the fraction of the mission’s total categorizations that was assigned to each particular category. While not perfect, this is a pretty good method for removing the distorting effect of the highly varied number of categorizations across missions. Here we see a very strong negative correlation of –0.65 between story focus and combat focus, indicating that these characteristics are rather contradictory. Looking across the top row we see that story focus has solid positive correlations with lore (0.35), exploration (0.32), and role-play (0.69). Looking over the whole table we see a connection of 0.46 between humor and unusual, and 0.36 between exploration and puzzle. Frankly, this method allows the data to speak very clearly, helping us understand the meanings of the categories and how they fit together.
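
The two correlation methods can be expressed compactly. In this sketch the raw-count correlation corresponds to the coefficients below the diagonal of Table 2.5, and the fraction-based correlation to those above it; the three-mission dataset is invented for illustration.

```python
import pandas as pd

# Invented categorization counts: rows = missions, columns = descriptors.
counts = pd.DataFrame(
    {"story": [120, 5, 40], "combat": [10, 90, 35], "puzzle": [30, 2, 20]},
    index=["mission_a", "mission_b", "mission_c"])

# Below-diagonal style: correlate raw counts, which mostly reflect how
# many categorizations each mission received overall.
raw_corr = counts.corr()

# Above-diagonal style: convert each mission's counts to fractions of its
# total categorizations, removing the popularity effect, then correlate.
fractions = counts.div(counts.sum(axis=1), axis=0)
fraction_corr = fractions.corr()

print(raw_corr.round(2))
print(fraction_corr.round(2))
```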

Although The Foundry no longer exists, many other online systems support creativity by members of the audience for popular culture franchises and folk traditions. Recently, vast numbers of nonprofessional writers have published their stories online, notably in the fan fiction context. Among the most popular is the Star Wars subculture, commercially manifested in many media, including computer games, novels, television programs, and most centrally, three trilogies of movies, the last of which was not yet complete at the time of this writing. As of May 5, 2019, Archive of Our Own offered online access to fully 4,775,000 stories and other works, including 87,412 directly inspired by Star Wars. It defines itself as “A fan-created, fan-run, nonprofit, noncommercial archive for transformative fanworks, like fanfiction, fanart, fan videos, and podfic.”41 The archive’s search engine facilitates identification of many interesting subsets, such as all the stories that contain a specified pair of familiar Star Wars characters, including the eight reported in Table 2.6.


Table 2.6 Co-occurrence of characters in fan fiction stories

(Diagonal cells give the number of stories that include each character; each off-diagonal cell gives the percentage of the column character’s stories that also include the row character.)

| | Obi | Luke | Han | Leia | Anakin | Padmé | Rey | Finn |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Obi-Wan Kenobi | 12,134 | 17.4% | 10.6% | 10.3% | 73.3% | 64.4% | 2.8% | 2.8% |
| Luke Skywalker | 16.8% | 11,720 | 55.7% | 47.0% | 17.2% | 25.1% | 22.3% | 22.8% |
| Han Solo | 7.9% | 42.7% | 8,995 | 45.5% | 8.5% | 11.5% | 16.1% | 18.4% |
| Leia Organa | 13.0% | 61.4% | 77.4% | 15,303 | 15.1% | 25.9% | 34.5% | 43.3% |
| Anakin Skywalker | 47.1% | 11.5% | 7.4% | 7.7% | 7,802 | 64.2% | 2.4% | 2.4% |
| Padmé Amidala | 24.4% | 9.8% | 5.9% | 7.8% | 37.8% | 4,592 | 1.8% | 1.7% |
| Rey | 4.5% | 37.2% | 35.2% | 44.1% | 6.1% | 7.6% | 19,595 | 74.7% |
| Finn | 3.0% | 25.4% | 26.7% | 36.8% | 4.0% | 4.9% | 49.7% | 13,037 |


Each column of the table gives, on the diagonal, the number of stories that include the character named at the top, and, on the other rows, the percentage of those stories that also include each row character. Of the 87,412 works related to Star Wars, 12,134 included the character Obi-Wan Kenobi, who was a somewhat elderly Jedi master when he mentored Luke Skywalker in the original 1977 movie. Of the stories including him, 2,038 (16.8 percent) also included Luke, but rather more, 5,719 (47.1 percent), included Anakin Skywalker. The explanation will be obvious to any Star Wars fan. The second trilogy of movies takes place earlier in the fictional history, when Obi-Wan was a young man who interacted with Anakin Skywalker, who was Luke’s father, and Obi-Wan died before the end of the 1977 film. Padmé Amidala was Luke’s mother, but she died and he was raised as an orphan by an uncle and aunt. Rey and Finn were central characters in the third trilogy, which took place later in the history, and thus only a very unusual story could justify including them along with Anakin and Padmé.
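
Given only a list of which characters appear in each story, a table like Table 2.6 can be computed in a few lines. The sketch below uses an invented five-story sample; only the computation, not the data, reflects the Archive of Our Own statistics.

```python
import pandas as pd

# Invented tagging data: each story lists the characters who appear in it.
stories = [
    {"Obi-Wan", "Anakin"},
    {"Obi-Wan", "Anakin", "Padmé"},
    {"Obi-Wan", "Luke"},
    {"Rey", "Finn"},
    {"Luke", "Leia", "Han"},
]
characters = sorted({c for s in stories for c in s})

# Indicator matrix: one row per story, one column per character.
flags = pd.DataFrame([[c in s for c in characters] for s in stories],
                     columns=characters).astype(int)

counts = flags.T @ flags  # co-appearance counts; diagonal = story totals
diag = pd.Series([counts.loc[c, c] for c in characters], index=characters)

# Table 2.6 orientation: each cell is the percentage of the column
# character's stories that also include the row character.
percent = counts.div(diag, axis=1) * 100
print(percent.round(1))
```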

An analysis like Table 2.6 may not contain surprises for members of the particular subculture, although it may call their attention to interesting anomalies like the stories that contain both Luke and Anakin. But it can serve as a useful introduction for students and scholars who are not members of the subculture and provide a framework to guide their further study. Professional literary critics and literature professors will still be able to express their own views after cultural science is well established, but much of their work should focus on how knowledgeable audiences interpret a subculture to which they belong. The table is like a recommender system, but based on observation of behavior, not expression of opinions in response to a questionnaire.

By writing a story that includes both Luke and Anakin, at least some of the amateur writers demonstrated that the relationship between these characters deserves consideration, and reading what they wrote may provide the material for scholarship, social-scientific analysis of family relationships, or even provide an idea for a future movie in which time travel overcomes death of a parent. Oh, wait! Anakin did not immediately die, but morphed into Darth Vader, the chief enemy of the original trilogy. The archive has 1,612 stories that contain Darth Vader, 729 of which also contain Luke; 346 contain Anakin, and 184 contain all three personalities. We can imagine a journal article based on deep analysis of these 184 works. Table 2.6 would only be the first step toward a serious cultural science analysis, whether a monograph or collection of essays written by a diversity of authors, or even a complex computer game in which the player could choose not only which character to become, but which others would appear as artificially intelligent nonplayer characters.

Conclusion

Although the fact is seldom noted, here we saw that recommender systems are rather similar to the questionnaire research developed long ago in sociology and political science, and thus may offer solutions to the serious problems faced by that traditional methodology today. Two related problems deserve mention: (1) response rates have dropped rather low in questionnaire research, failing to achieve representative samples, and (2) many kinds of questionnaire items have very different meanings for a culturally diverse population, thus obscuring the implications of statistical results. Similar problems plague ethnography and related forms of field observation in the social sciences. Originally, cultural anthropology tended to assume that each “primitive” society had a relatively coherent and uniform culture, and so elaborate population sampling methods were not necessary, but if that was ever true in the past, it is not true today. Thus, while social science can contribute to the convergence of the humanities with information science, it also needs help from these partner disciplines.



1 Goldberg, D., D. Nichols, B.M. Oki, and D. Terry. 1992. “Using Collaborative Filtering to Weave an Information Tapestry.” Communications of the ACM 35, no. 12, pp. 61–70; Resnick, P., and H.R. Varian. 1997. “Recommender Systems.” Communications of the ACM 40, no. 3, pp. 56–58; Basu, C., H. Hirsh, and W. Cohen. 1998. “Recommendation as Classification: Using Social and Content-Based Information in Recommendation.” In Proceedings of the Fifteenth National Conference on Artificial Intelligence. Madison, Wisconsin; Canny, J. 2002. “Collaborative Filtering with Privacy via Factor Analysis.” In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 238–245. New York, NY: ACM; Herlocker, J.L., J.A. Konstan, L.G. Terveen, and J.T. Riedl. 2004. “Evaluating Collaborative Filtering Recommender Systems.” ACM Transactions on Information Systems 22, pp. 5–53.

2 chroniclingamerica.loc.gov

3 en.wikipedia.org/wiki/The_Great_Train_Robbery_(1903_film)

5 Bainbridge, W.S. 1986. Dimensions of Science Fiction. Cambridge: Harvard University Press.

6 en.wikipedia.org/wiki/Factor_analysis

7 en.wikipedia.org/wiki/Charles_Spearman

8 Goldberg, L.R. 1993. “The Structure of Phenotypic Personality Traits.” American Psychologist 48, pp. 26–34; Goldberg, L.R. 1999. “A Broad-Bandwidth, Public Domain, Personality Inventory Measuring the Lower-Level Facets of Several Five-Factor Models.” In Personality Psychology in Europe, eds. I. Mervielde, I. Deary, F. De Fruyt, and F. Ostendorf, Vol. 7, 7–28. Tilburg, Netherlands: Tilburg University Press.

9 Bainbridge, W.S. 2012. “Whole-Personality Emulation.” International Journal of Machine Consciousness 4, no. 1, pp. 159–175; Bainbridge, W.S. 2014. Personality Capture and Emulation, 58–62. London: Springer.

10 en.wikipedia.org/wiki/Logan%27s_Run_(film)

11 en.wikipedia.org/wiki/Soylent_Green

12 en.wikipedia.org/wiki/The_Andromeda_Strain_(film)

13 en.wikipedia.org/wiki/The_Omega_Man

14 en.wikipedia.org/wiki/Westworld_(film)

15 en.wikipedia.org/wiki/Fantastic_Voyage

16 en.wikipedia.org/wiki/The_Land_That_Time_Forgot_(1975_film)

17 en.wikipedia.org/wiki/I_Am_Legend_(novel)

18 en.wikipedia.org/wiki/Category:Dystopian_films

22 Bainbridge, W.S. 2020. The Social Structure of Online Communities. Cambridge: Cambridge University Press.

23 movielens.org/info/about

24 Harper, F.M., and J.A. Konstan. 2015. “The MovieLens Datasets: History and Context.” ACM Transactions on Interactive Intelligent Systems, files.grouplens.org/papers/harper-tiis2015.pdf

25 Resnick, P., N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl. 1994. “GroupLens: An Open Architecture for Collaborative Filtering of Netnews.” In Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work, CSCW ’94, 175–186. New York, NY: ACM; Hill, W., L. Stead, M. Rosenstein, and G. Furnas. 1995. “Recommending and Evaluating Choices in a Virtual Community of Use.” In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’95, 194–201. New York, NY: ACM; Sarwar, B., G. Karypis, J. Konstan, and J. Riedl. 2001. “Item-based Collaborative Filtering Recommendation Algorithms.” In Proceedings of the 10th International Conference on World Wide Web, WWW ’01, 285–295. New York, NY: ACM; Herlocker, J., J.A. Konstan, and J. Riedl. 2002. “An Empirical Analysis of Design Choices in Neighborhood-Based Collaborative Filtering Algorithms.” Information Retrieval 5, no. 4, pp. 287–310.

26 en.wikipedia.org/wiki/Netflix_Prize

27 Bainbridge, W.S. 2019. Virtual Local Manufacturing Communities: Online Simulations of Future Workshop Systems, 31. New York, NY: Business Expert Press.

28 Gygax, G. 1979. Advanced Dungeons and Dragons, Dungeon Masters Guide. New York, NY: TSR/Random House.

29 Burroughs, E.R. 1922. Chessmen of Mars. Chicago: A.C. McClurg.

30 Bainbridge, W.S. 2016. Star Worlds: Freedom Versus Control in Online Gameworlds. Ann Arbor, Michigan: University of Michigan Press.

31 Kael, A. March 4, 2019. “Foundry Sunset—April 11, 2019.” www.arcgames.com/en/games/neverwinter/news/detail/11102923

32 forums.mmorpg.com/discussion/480490/whats-the-future-of-player-made-content-in-the-mmorpg-genre

33 Bainbridge, W.S. 1989. Survey Research: A Computer-Assisted Introduction, 366. Belmont, California: Wadsworth.

34 Katz, E., and P. Lazarsfeld. 1955. Personal Influence. New York, NY: Free Press.

35 forgottenrealms.fandom.com/wiki/Neverwinter

36 forgottenrealms.fandom.com/wiki/R.A._Salvatore

37 en.wikipedia.org/wiki/Cryptic_Studios

38 en.wikipedia.org/wiki/The_Witcher_(video_game)

39 steamcommunity.com/app/109600/discussions/0/648813727946039645/

40 en.wikipedia.org/wiki/Lua_(programming_language)

41 archiveofourown.org
