Chapter 10

Thread networks: Mapping message boards and email lists

Abstract

Many online communities use threaded conversations in the form of email lists, discussion forums, Facebook groups, and sites like Reddit, Quora, and Stack Overflow. Threaded conversations are composed of single-authored messages organized into threads (i.e., top-level message with a chain of replies). Threads are often found within topics or groups. These conversations lend themselves to the creation of several networks including the directed, weighted Reply network and Top-Level Reply network; the undirected, weighted affiliation network connecting threads (or forums) to the individuals that posted to them; and the undirected, weighted networks derived from the affiliation network including the user-to-user network and thread-to-thread network. An analysis of the CSS-D technical support community shows how to identify important social roles and individuals who fill those roles including answer people, discussion starters, and questioners. The analysis of the bimodal Ravelry network shows how to identify important people and integrate non-discussion network metrics.

Keywords

Threaded conversation; Email list; Reply network; Social roles; Answer people; Discussion starters; Questioners; CSS-D; Ravelry

10.1 Introduction

Threaded conversations have served as the foundation of virtual communities since the inception of the Internet. Usenet newsgroups, email lists, web boards, and discussion forums demonstrated the value of threaded conversation from the beginning. More recent incarnations of threaded conversations show up in Facebook and LinkedIn group and profile page discussions, Reddit threads, GameSpot, Craigslist posts, YouTube comments, Amazon ratings, and Q&A sites like Quora and Stack Overflow. All contain collections of messages sent in reply to one another. The natural conversation style supported by the basic post-and-reply threaded message structure has proven enormously versatile, serving communities ranging widely in focus and goals. Cancer survivors and those seeking technical support or religious guidance are as likely to use a threaded discussion as a corporate workgroup. Although the basic structure of threaded conversation has remained surprisingly similar over time, conversations now include multimedia elements, user profiles (often with social network features), participation statistics, reputation scores, and ratings. Conversations are now attached to a host of other entities ranging from people (e.g., a public wall on a person's profile page) to items (e.g., movies; actors) to groups (e.g., university alumni) to events. This chapter primarily focuses on more traditional forums and email lists, but the core analysis techniques for threaded networks apply to other contexts.

The threaded conversation structure lends itself well to network analysis, because a directed link between individuals is created each time someone replies to another person's message. Unfortunately, most threaded conversation systems do not make this network data easily accessible: communities run on many different software platforms, and many groups make content available only to subscribed members. Many threaded message systems report participation statistics and ratings (e.g., top 10 contributors), which are important metrics. However, they fail to capture the social connections between members, a critical component of virtual communities and internal communities of practice.

10.2 Definition and history of threaded conversation

Threaded conversation is a commonly used design theme that enables online discussion between multiple participants using the ubiquitous post-reply-reply structure. The key properties of threaded conversation were enumerated in Resnick et al. [1] and are listed here with some modification:

  •  Topics: A set of topics, groups, or spaces, sometimes hierarchically organized to aid users in discovering interesting groups to “join.” Topics or groups are persistent, though their contents may change over time. Figure 10.1 includes two topics: TOPIC 1: Social Media and TOPIC 2: NodeXL.
    Figure 10.1 Threaded conversation diagram showing five threads that are part of two different topics. Each post includes a subject (e.g., Thread A), a single author (e.g., Adam), and a time stamp (e.g., 12/10/2010 2:30 pm). Indenting indicates placement in the reply structure. Green posts initiate new threads (i.e., they are top-level messages), yellow posts reply directly to green posts, orange posts reply to yellow posts, and the pink post replies to the orange post.
  •  Threads: Within each topic or group, there are top-level messages and responses to those messages. Sometimes further nesting (responses to responses) is permitted. The top-level message and the entire tree of responses to it is called a thread. In Figure 10.1, there are five unique threads (starting with a green background box). Thread A includes only two messages, whereas Thread B includes six messages. Thread D includes only a single message.
  •  Single authored: Each message contributed to a thread is authored by a single user. Typically, the person's username, real name, or email address is shown alongside the post so people know who is talking. In Figure 10.1, the author of each message and the time of its post are indicated. Users may post to multiple threads (e.g., Beth) or multiple times within a thread (e.g., Cathy).
  •  Permanence: In many threaded conversations including email lists, once a message has been posted it cannot be rewritten or edited. A new message may be posted, but no matter how much someone may wish it, an original post often cannot be retracted. However, in many social media threaded conversations such as Facebook and LinkedIn, original posts can be modified or deleted after the initial contribution.
  •  Thread Navigation: Threaded conversation systems also differ in how users navigate through the different threads. The partitioning of threads and messages into topics is a feature shared by many discussion interfaces. Most systems sort threads and messages in chronological or reverse chronological order (e.g., Figure 10.1). Other systems display threads or messages based on user ratings (e.g., Stack Overflow; Reddit). Often the aggregated views of the threads are the same for all users, but sometimes they are personalized for individuals, such as when Facebook's algorithms decide which threads to display on a feed, or unread messages are shown at the top of an interface.

In addition to this basic structure, there are a few important ways that threaded conversation platforms differ. Some, such as email lists, are push technologies that send updates to all subscribers. Others, such as discussion forums and social media sites, are pull technologies that require individuals to visit a website in order to view messages. These are often accompanied by smartphone or email notifications when there are updates. Also, this chapter focuses on asynchronous threaded conversations, but many synchronous conversations such as texting, Instant Messenger, and group chat follow a similar reply structure even if the pace of interaction and nature of messages is different.

Another important distinction is who can access content. Public conversations allow anyone who visits a website (or email list archive) to read the content, even if they are not part of the community. These are often indexed by search engines, helping their content rise out of obscurity. Semi-public conversations require users to create a username and log in, or join a group (e.g., on Facebook) before accessing content. While often anyone can join and gain access to prior messages, the content may not be indexed by search engines, making it more obscure. Finally, private conversations are only open to those who receive invitations via some existing member or are a member of some organization. Many corporate forums or email lists fall into this category.

The history of public online, threaded conversation communities began in the late 1970s with significant developments in the 1980s as bulletin board systems (BBS), email lists, and Usenet gained traction. The earliest threaded conversation systems relied on dialup connections, which encouraged groups within local telephone calling distance to form. Users of these early community systems demonstrated that text-only communication was sufficient to develop surprisingly meaningful relationships and rich cultures. Early communities covered topics ranging from dentistry to gaming to the occult. Access to public communities was often free. Others charged subscription fees, such as the WELL, a community of writers primarily from the San Francisco Bay Area described in Howard Rheingold's classic book “The Virtual Community: Homesteading on the Electronic Frontier” [2]. Technologies such as Listserv began to develop in the mid-1980s, allowing interested groups to create their own community email lists with increasing ease.

As the Internet and the World Wide Web became ubiquitous in the 1990s, many of the original BBS services became or were replaced by Internet service providers (ISPs). Although they provided many services, asynchronous threaded conversation became one of the mainstays. Tools like Usenet that had relatively few users in the 1980s experienced exponential growth in the 1990s, growing from approximately 2,000 newsgroups in 1991 to close to 11,000 newsgroups in 1996. Today, email lists and discussion forums continue to support numerous communities ranging from neighborhood lists for sharing free items (e.g., Freecycle) to gaming communities to medical support groups. Although email lists may sound passé, they can outperform social media as a marketing platform, since there is often less competition for attention in an email inbox than in a social media feed. These traditional threaded conversation platforms have proven surprisingly robust, enabling user groups to adapt them to a wide range of usage scenarios.

Threaded conversations have also worked their way into a range of social networking sites, corporate intranets, multimedia sites (e.g., YouTube), customer review sites, Q&A sites, and specialized online community software. Reddit, with 18 billion page views per month in November 2018, and the wildly popular Stack Overflow have shown the power of combining threaded conversation with a voting-based navigation scheme. New mobile apps, such as Marco Polo, support threaded video messaging, and popular social networking sites like Facebook include threaded conversations in their groups, fan pages, and profile pages. Corporate communication tools, such as Slack, have integrated threaded conversation into a group chat system.

Research on communities that use threaded conversation began in the early days of BBS and Usenet. Many of the same themes continue to be explored today. For example, Kollock and Smith's 1999 book “Communities in Cyberspace” included chapters on identity online, deviant behavior and conflict management, social order and control, community structure and dynamics, visualization, and collective action [3]. All of these topics are still being explored in new contexts and with new technologies such as social networking sites, blogs, microblogging, and wikis. Early books by Preece [4], Kim [5], and Powazek [6] provided some enduring, practical advice and inspiration for those managing online communities. More recent additions by Kraut & Resnick [7] and Howard [8] continue the conversation on how to build community using online conversations and related technologies.

Researchers from a variety of disciplines analyze threaded conversation communities and publish results in communication, business, information science, health, sociology, and computer science journals and conferences. Several venues have emerged around the Internet: the International Conference on Web and Social Media (ICWSM), the Association of Internet Researchers (AoIR), the Journal of Computer-Mediated Communication (JCMC), and the Association for Computing Machinery's Computer-Human Interaction (ACM-CHI) and Computer-Supported Cooperative Work (ACM-CSCW) conferences are just a few. Findings show that there is a consistent pattern of participation with few core members contributing the majority of content, many peripheral members contributing infrequently, and a large number of lurkers [9] who benefit by overhearing the conversations of others [10]. The nature of computer-mediated conversations depends largely on the type of community that engages in them. For example, technical and medical support communities differ in the level of empathy expressed [11] and the reusability of their content [12].

10.3 What questions can be asked

There are many reasons to explore networks that form within large collections of conversations. New employees or community members need to rapidly catch up with the “story so far” to get to a point that they can make useful contributions. Community managers need tools to help them serve as metaphorical fire rangers and game wardens for huge populations of discussion contributors and the mass of content they produce. When outsiders such as researchers or competitors peer into a set of relationships, social network analysis can point out people, documents, and events that are most notable. A few of the specific questions that can be addressed with network analysis of community conversations are described next:

  •  Individuals. Who are important individuals within the community? For example, who are the question answerers, discussion starters, and administrators? Who are the topic experts? Who would be a good replacement for an outgoing administrator? Who fills a unique niche?
  •  Groups. Who makes up the core members of the community? How interconnected are the core group members? Are there subgroups within the larger community? If so, how are the subgroups interconnected? How do they differ?
  •  Temporal comparisons. How have participation patterns and overall structural characteristics of the community changed over time? What does the progression of an individual from peripheral participant to core participant look like and who has made that transition well? How is the community structure affected by a major event like a new administrative team, the leaving of a prominent member, or an initiative to bring in new members?
  •  Structural patterns. What network properties are related to community sustainability? What are the common social roles that recur among community members (e.g., answer person, discussion starter, questioner, administrator)?

10.4 Threaded conversation networks

The network most commonly used to analyze threaded conversations is the Reply network. Each time someone replies to another person's message, she creates a directed tie to that other person. If she replies to the same person multiple times, a stronger weighted tie is created. To understand the nuances of how a Reply network gets created, you can compare the original data in Figure 10.1 to the Reply network derived from that data and shown in Figure 10.2.

Figure 10.2 An example discussion Reply network graph displayed in NodeXL, based on the data found in Figure 10.1. The network is constructed by creating an edge pointing from each replier to the person he or she replied to and then merging duplicate edges. Notice that Beth has replied directly to Dave twice, so the edge connecting them is thicker. Fiona replied to her own message, so there is a self-loop. Greg started a thread but was not replied to. He would normally not show up on the graph because he is not in the edge list; however, he was manually added to the Vertices tab and his visibility was set to Show, so he would appear.
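The construction described above can be sketched in a few lines of Python. This is a minimal illustration rather than NodeXL's own procedure: the message tuples and the `reply_network` helper are hypothetical, and the sample data only imitates the kinds of cases in Figure 10.1 (a repeated reply, a self-reply, and an unanswered thread starter) instead of reproducing the figure.

```python
from collections import Counter

# Hypothetical messages (not the actual contents of Figure 10.1).
# Each tuple is (message_id, author, parent_id); parent_id is None
# for a top-level message that starts a thread.
messages = [
    (1, "Dave", None),
    (2, "Beth", 1),     # Beth replies to Dave
    (3, "Dave", None),
    (4, "Beth", 3),     # Beth replies to Dave again, in another thread
    (5, "Fiona", None),
    (6, "Fiona", 5),    # Fiona replies to her own message (self-loop)
    (7, "Greg", None),  # Greg starts a thread that nobody answers
]

def reply_network(messages):
    """Return {(replier, replied_to): weight}, merging duplicate edges."""
    author_of = {msg_id: author for msg_id, author, _ in messages}
    edges = Counter()
    for _, author, parent_id in messages:
        if parent_id is not None:
            edges[(author, author_of[parent_id])] += 1
    return dict(edges)

edges = reply_network(messages)

# Authors who appear in no edge are missing from the edge list and
# have to be added to the vertex list separately if they should be drawn.
all_authors = {author for _, author, _ in messages}
connected = {vertex for edge in edges for vertex in edge}
isolates = all_authors - connected
```

In this sketch the edge weight is simply the number of replies from one person to another, which corresponds to the thicker edge between Beth and Dave in the figure.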

A related, but different network is the Top-Level Reply network, which connects all repliers to the person who started each thread (Figure 10.3) instead of the person they are replying to directly. This network emphasizes those who start threads (i.e., post the top-level message), while de-emphasizing conversations that occur midway through a thread. In some communities with short threads where all replies are typically directed at the original poster, such as Question and Answer (Q&A) sites or Reddit posts, this network can better reflect the underlying dynamics. However, in discussion communities or forums with longer threads, the standard Reply network is typically preferred because people later in the thread are often replying to each other.

Figure 10.3 A Top-Level Reply network graph displayed in NodeXL based on the data found in Figure 10.1. The network is constructed by creating an edge pointing from each replier to the person who started the thread (i.e., posted the top-level message) and then merging duplicate edges. Notice how Cathy plays a more prominent role (i.e., has a higher in-degree) than in the standard Reply network graph (Figure 10.2) because she started the longest thread and all subsequent repliers link to her. Self-loops are more frequent in this type of network because people like Cathy may respond to those who replied to her initially, leading to a self-loop.
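To make the difference from the standard Reply network concrete, the following sketch builds a Top-Level Reply network by walking each reply back to the message that started its thread. The data and the `top_level_reply_network` helper are hypothetical and assume each message records the ID of the message it replies to.

```python
from collections import Counter

# Hypothetical thread: Cathy starts it, Adam replies, Cathy answers Adam,
# and Beth replies to Adam's message.
# Each tuple is (message_id, author, parent_id).
messages = [
    (1, "Cathy", None),
    (2, "Adam", 1),
    (3, "Cathy", 2),
    (4, "Beth", 2),
]

def top_level_reply_network(messages):
    """Return {(replier, thread_starter): weight}, merging duplicate edges."""
    parent_of = {msg_id: parent_id for msg_id, _, parent_id in messages}
    author_of = {msg_id: author for msg_id, author, _ in messages}

    def thread_root(msg_id):
        # Follow parent links until reaching the top-level message.
        while parent_of[msg_id] is not None:
            msg_id = parent_of[msg_id]
        return msg_id

    edges = Counter()
    for msg_id, author, parent_id in messages:
        if parent_id is not None:
            edges[(author, author_of[thread_root(msg_id)])] += 1
    return dict(edges)

top_edges = top_level_reply_network(messages)
```

Here Beth's reply to Adam becomes an edge to Cathy, and Cathy's reply within her own thread becomes a self-loop, which is why self-loops are more frequent in this type of network.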

Affiliation data connecting posters to threads (or forums) can also be used to create bimodal networks (see Chapter 6). These are undirected, weighted networks that connect posters (i.e., users) to threads. For example, an edge would connect Cathy to Thread B with a weight of 2 because she posted to that thread twice. Beth would be connected with a weight of 1 to Thread A, Thread B, and Thread C because she posted to each of them once. This network is ideal for identifying boundary spanners, as you will see in Section 10.6. As discussed in Advanced topic: Transforming a bimodal affiliation network into two unimodal networks in Chapter 6, affiliation networks can be transformed into two additional undirected, weighted networks. With threaded networks, you can create a user-to-user network connecting people based on the number of threads (or forums) they both contribute to, and a thread-to-thread (or forum-to-forum) network connecting threads together based on the number of contributors they share. These networks are good for creating overview graphs of large communities with many threads or forums.
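As a rough sketch of these three networks, the Python below builds the bimodal affiliation network from a list of posts and then derives the user-to-user and thread-to-thread networks. The Cathy and Beth rows follow the example in the text; the Adam and Dave rows and all variable names are hypothetical additions for illustration.

```python
from collections import Counter
from itertools import combinations

# One (author, thread) pair per post.
posts = [
    ("Cathy", "Thread B"), ("Cathy", "Thread B"),
    ("Beth", "Thread A"), ("Beth", "Thread B"), ("Beth", "Thread C"),
    ("Adam", "Thread A"),   # hypothetical
    ("Dave", "Thread C"),   # hypothetical
]

# Bimodal network: edge weight = number of posts by a user in a thread.
affiliation = Counter(posts)

# Index the affiliation edges in both directions.
threads_by_user, users_by_thread = {}, {}
for user, thread in affiliation:
    threads_by_user.setdefault(user, set()).add(thread)
    users_by_thread.setdefault(thread, set()).add(user)

# User-to-user network: weight = number of threads both users posted to.
user_to_user = Counter()
for users in users_by_thread.values():
    for pair in combinations(sorted(users), 2):
        user_to_user[pair] += 1

# Thread-to-thread network: weight = number of contributors two threads share.
thread_to_thread = Counter()
for threads in threads_by_user.values():
    for pair in combinations(sorted(threads), 2):
        thread_to_thread[pair] += 1
```

Because Beth is the only person who posted to more than one thread in this sample, she alone is responsible for every thread-to-thread edge, which is the kind of boundary-spanning position the affiliation network makes visible.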

Preparing data needed to create threaded conversation networks can be challenging because they rely on such a wide range of technologies. Email lists are the easiest conversations to capture in NodeXL, because you can use the email import wizard (see Chapter 9). Data from discussion forums, Reddit, Facebook, etc. must be generated by using screen scrapers, manually entering data, or performing queries on the forum's database or through a web Application Programming Interface (API). Whichever approach you use, your dataset will likely have header information that includes some of the following information for each message: a time stamp, a message author, an identifier for the message this message is a reply to (if any), a subject line (or thread ID), a set of tags, an attachment, a link to the author's profile, a group or forum the thread is a part of, and a rating. A separate file of information on each user is also often useful. It may include aggregate participation statistics on other community activities (see Section 10.6 for an example). All of this data can be useful in creating maps of conversation networks, but at the core a simple edge list is the minimum necessary requirement to start a social network analysis of a conversation.

The type of discussion platform in use will also influence the potential data problems you are likely to run into. For example, email lists often have people registered with multiple email addresses, making it necessary to combine duplicate addresses for the same person (see Chapter 9). Email lists also have problems when the reply structure isn't clear, because people reply to the email list address rather than to one another. Corporate email lists and discussion forums typically have the cleanest set of unique identifiers for individuals, but even then, name and title changes can cause problems.
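One simple way to combine duplicate addresses outside of NodeXL is to map every known address to a single canonical identity before merging edges. The sketch below assumes you have already compiled such an alias table by hand; the addresses, the `aliases` table, and the `canonical` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical alias table: every known address maps to one identity.
aliases = {
    "beth@example.com": "Beth",
    "beth.work@example.org": "Beth",
    "dave@example.com": "Dave",
}

def canonical(address):
    """Normalize an address and map it to a person when an alias is known."""
    key = address.strip().lower()
    return aliases.get(key, key)  # unknown addresses are kept as-is

# Raw (sender, recipient) reply edges as they might appear in an archive.
raw_edges = [
    ("Beth@example.com", "dave@example.com"),
    ("beth.work@example.org", "dave@example.com"),
    ("new.person@example.net", "beth@example.com"),
]

# Merge edges after resolving aliases so Beth's two addresses count as one vertex.
merged = Counter((canonical(s), canonical(t)) for s, t in raw_edges)
```

Keeping unknown addresses unchanged, rather than guessing at matches, leaves it to you to decide which remaining vertices represent the same person.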

It is also important to realize that the reply network from an email list only captures messages sent or posted to the list. Many personal messages sent directly to and from individuals on the email list are not captured. Depending on the type of community and the default settings (e.g., email list Reply To settings), these private messages may account for the majority of all messages exchanged among a population. In addition, other types of communications such as corporate meetings, phone calls, and instant messenger exchanges are invisible to discussion forum networks. People who communicate and contribute more effectively through other channels may show up only marginally in discussion forum datasets. However, even given these limitations, analysis of threaded conversation networks can provide vital information about community dynamics and help identify important individuals and groups.

10.5 Identifying important people and social roles in the CSS-D Q&A reply network

There are a host of email lists, forums, and Q&A websites such as Stack Overflow and Quora where people post technical questions and volunteers provide answers. Many companies host these Q&A discussions to learn about problems with existing products, resolve customer concerns, generate new ideas on future improvements, and build a loyal customer community. To meet these goals, it is often important to understand which individuals play important roles within the community, something that can be challenging when managing multiple, active communities or viewing content from across large sites. In this section you will learn how to identify key members of a technical support Q&A discussion focused on cascading style sheets (CSS), which are used in front-end web development. During the time of data collection, the community sent approximately 50 messages a day and included several key administrators who kept the conversation friendly and on topic. See [12] for a complete description of the community and some of the strategies used by the community administrators and members to make it so effective.

In an online community, users contribute in different patterns and styles. In other words, community members fill different social roles. Understanding the composition of social roles within your community or social networking site can provide many insights that can help you be a more effective community manager. Social network analysis provides metrics that can be used to automatically identify those who fill unique social roles and track their prevalence over time. This can help community and social media managers:

  •  Identify high-value contributors of different types. Which community members are the most important question answerers or question starters? Who connects many other users together? Answering these questions can help you know who to thank (and for what) and how to support individuals' needs.
  •  Determine if your community has the right mix of people. Is your community attracting enough question answerers? Are there enough connectors to hold the community together? Is discussion crowding out Q&A? Is a discussion space dissolving into Q&A? Knowing the answers to these questions can help you know who to recruit or encourage more, as well as what policies may be needed.
  •  Recognize changes and vulnerabilities in the social space. How has the community composition changed as it has grown? What is the effect of a certain prominent member leaving the community going to have? Which members are currently irreplaceable in the type of work they do? What is the effect of a policy change or change in settings on the community dynamics? Answering these questions can help you prepare for change, understand the effects of prior decisions and events, and cultivate important relationships.

In this section you will learn how to identify important individuals and social roles within the CSS-D community. You will do this by using subgraph images (introduced in Chapter 7) and creating a composite metric that helps identify the two most important social roles within Q&A communities like CSS-D: answer people and discussion starters. You will then use this metric to develop visualizations that show the relationships between these individuals.

The easiest way to get a sense of the key individuals within a network is to create 1.5-degree subgraph images for each vertex using a layout like the Harel-Koren Fast Multiscale layout (see Figure 10.4). Once these subgraph images of each email contributor's local network are created, you can sort on the graph metrics in the Vertices worksheet associated with each contributor, such as in-degree (who receives messages from the most people) and out-degree (who sends messages to the most people), to bring differently connected individuals to the top. You can also sort by centrality measures like PageRank to get a sense of who is a core member of the community, because a member with a high score is an active participant who talks to other active participants.

Figure 10.4 NodeXL Subgraph images (1.5 degree; vertex and incident edges highlighted red) for six CSS-D contributors that fill three different social roles within the CSS-D community. Answer people predominantly reply to questions from isolates (i.e., those who are not connected to others). Question people typically have a low degree themselves, but they receive messages from those with high degree (i.e., answer people). Discussion starters initiate long threads and receive many replies, often from people who know each other.

Scanning through the Subgraph Images of CSS-D contributors helps you get a sense of the different social roles that exist within the email list community. Figure 10.4 shows examples of three types of contributors (question people, answer people, and discussion starters) along with some of the metrics that could be used to identify them (see Advanced topic: Social role measures for more). Question people post a question and receive a reply from one or two individuals who are likely to be answer people. Answer people mostly send messages (arrows point toward other vertices) to individuals who are not well connected themselves [13]. Discussion starters mostly receive messages (arrows point toward them), often from people who are well connected to each other.

You can typically identify a person's social role by looking at his or her subgraph image (Figure 10.4), but doing so for many individuals becomes problematic. Instead, it is possible to create aggregate metrics that automatically identify those who play certain social roles. These metrics consist of a combination of network metrics and participation metrics. Automatically identifying social roles within a community using metrics facilitates their tracking over time, which allows you to keep your finger on the pulse of your community's health. It can also be used in combination with visualizations as shown in Figures 10.5 and 10.6 to more easily understand individuals' social roles and how they relate to one another. The specific metrics used to identify social roles will depend on the metrics that are available (i.e., those that are tracked) and will be tied to some extent to the underlying type of social media being analyzed. Advanced topic: Social role measures describes several metrics that can be used to identify different roles within threaded conversations.

Figure 10.5 NodeXL map of the CSS-D Q&A network after removing the vertex for the email list address itself. Answer people (greener) and discussion starters (redder) are identified by the calculated answer person score (see Advanced topic: Social role measures). Blue vertices have a total degree of fewer than 15. Subgraph images (1.5) of the top four discussion starters are shown. Vertex size is mapped to eigenvector centrality. Edge weight is mapped to both edge size (1.5–4) and opacity (20–80), applying the logarithmic scale and ignoring outliers. Like many help-based communities, CSS-D consists of mostly question askers with a handful of answer people and discussion starters.
Figure 10.6 NodeXL map of the filtered version of the CSS-D email list seen in Figure 10.5 showing only the most central members. The maximum size of vertices and edges has been increased to more clearly draw comparisons.

Advanced topic

Social role measures

Custom metrics can be created based on network or attribute data. Some metrics, such as the core social network metrics, are created automatically using the Graph Metrics feature (see Chapter 6). Other metrics, such as the average posts per thread or the days active, must be tracked via some other means. All of the metrics presented in this section are devised so that higher values correspond with typical answer person behaviors. Depending on what data you have available, you can combine different metrics into an aggregate metric by averaging them, multiplying them, or taking a weighted average. Table 10.1 shows a list of different custom metrics that can be created, alongside their description and interpretation. Figures 10.5 and 10.6 show values for a Degree_Cuttoff, Percent_Out-Degree, and Ans_Person_Score (i.e., answer person score) that are used in the visualizations. The Degree_Cuttoff equals Out-Degree + In-Degree, so that those with a low total degree can be filtered out (e.g., those with under 15). The Percent_Out-Degree is described in the last row of Table 10.1 and is used to distinguish those who reply to others from those who have people reply to them. Finally, the Ans_Person_Score is calculated by multiplying the Percent_Out-Degree by (1 − Clustering Coefficient), so a high score indicates that the person both has a high percent out-degree and messages people who rarely message each other. Thus, high values of the answer person score identify answer people. In addition, low values identify discussion starters, as they have a high percent in-degree (they solicit replies from many people while not sending messages to many people) and a high clustering coefficient (those who they are connected to know each other).

Table 10.1 Social role metrics.

Metric | Description
(User's Thread Count) ÷ (User's Post Count) | Brevity is preferred. Larger values = fewer messages per thread
(User's Reply Posts) ÷ (User's Total Posts) | Initiation is avoided. Larger values = avoids starting threads
(User's Degree) ÷ (Total Users) | Talks to many people. Larger values = replies to a significant fraction of community members
(1 − Clustering Coefficient) | Talks to people who aren't well connected to each other. Larger values = lower clustering coefficient (i.e., less well-connected neighbors)
1 ÷ (Average of Neighbors' Degree) | Talks to people who connect to few others. Larger values = talks to more isolates
(User's Days Active) ÷ (User's Possible Active Days) | Posts on most days. Larger values = posts on multiple days more often
(User's Out-Degree) ÷ (User's Out-Degree + User's In-Degree) | Percent out-degree. Larger values = is connected to more people because of replying to them than because of receiving from them
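The answer person score described in this Advanced topic can be sketched in Python as follows. This is an illustration under stated assumptions, not NodeXL's implementation: the edge list and the `answer_person_scores` helper are hypothetical, degrees count distinct partners, self-loops are ignored, and the clustering coefficient is computed on the undirected version of the network, which may differ from the value NodeXL reports for a directed graph.

```python
# Hypothetical merged, directed reply edges: (replier, replied_to).
edges = [
    ("Ann", "Q1"), ("Ann", "Q2"), ("Ann", "Q3"),        # Ann answers unconnected askers
    ("Bob", "Dana"), ("Cal", "Dana"), ("Bob", "Cal"),   # Dana's repliers also talk to each other
]

def answer_person_scores(edges):
    """Return {vertex: Percent_Out-Degree * (1 - Clustering Coefficient)}."""
    out_nbrs, in_nbrs, nbrs = {}, {}, {}
    for source, target in edges:
        if source == target:
            continue  # assumption: self-loops are ignored
        out_nbrs.setdefault(source, set()).add(target)
        in_nbrs.setdefault(target, set()).add(source)
        nbrs.setdefault(source, set()).add(target)
        nbrs.setdefault(target, set()).add(source)

    scores = {}
    for vertex, neighbors in nbrs.items():
        out_deg = len(out_nbrs.get(vertex, ()))
        in_deg = len(in_nbrs.get(vertex, ()))
        percent_out = out_deg / (out_deg + in_deg)

        # Undirected local clustering: share of neighbor pairs that are linked.
        k = len(neighbors)
        links = sum(1 for a in neighbors for b in neighbors
                    if a < b and b in nbrs[a])
        clustering = 2 * links / (k * (k - 1)) if k > 1 else 0.0

        scores[vertex] = percent_out * (1 - clustering)
    return scores

scores = answer_person_scores(edges)
```

In this toy network Ann receives the maximum score because she only sends replies and the people she replies to are not connected to each other, whereas Dana, who only receives replies from people who also talk to each other, scores at the bottom like a discussion starter.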

The specific social roles and their prevalence within a particular community will depend on the nature of that community. Because the CSS-D community is primarily a Q&A community, it consists of mostly question askers, a handful of prominent answer people, and a small number of discussion starters. Other more discussion-based communities would have many more discussion starters as well as other social roles such as flame warriors, commentators, and connectors. Tracking the ratio of people that play different social roles can be a good way to ensure that a community is healthy. For example, if the CSS-D community had too few answer people or an influx of many question people, it could not function effectively.

Viewing the entire reply network for the CSS-D email list (Figure 10.5) can provide some general insights about the composition of its population, although the size of the network can make it challenging to interpret without filtering. Figure 10.5 maps the answer person score (see Advanced topic: Social role measures) to color: green-colored nodes represent answer people, red-colored nodes represent discussion starters, and blue nodes have a total degree of less than 15. Larger nodes have a higher eigenvector centrality suggesting they are connected to many people and others who are well connected. The binned layout is used to identify isolates, of which there are many because the email list address itself was removed. Isolates represent those who posted to the list and didn't receive a response (e.g., they posted an announcement) or in some cases those who replied to the list without copying in the address of the person who they were replying to. Overall the composite network shows many individuals connected primarily through a handful of central question answerers and a small but stable core group of members that interact with one another regularly.

To better focus on the core members of the community, you can filter out vertices with a total degree of less than 15. Figure 10.6 shows the resulting network after manually positioning the vertices. Subgraph images for the top three answer people are shown. The edge weights, mapped to the edge width and opacity, provide a good sense of who interacted with whom during the 2-month time period and who are thus likely to know each other and perhaps have similar interests. Note that even among these core members, discussion starters rarely reply to other discussion starters. Also notice that the largest vertex (i.e., the one with the highest eigenvector centrality), while categorized as an answer person, receives many messages from the core members. This suggests that he plays multiple important roles within the community. In fact, if he were removed from the network, there would be considerably fewer connections between the core members. This suggests that community administrators should make sure this individual is adequately appreciated and encouraged to remain in the community.
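The degree filter and the "what if this person left" question can both be checked with a few lines of code. The sketch below is illustrative only: the edges are invented, the helper names (total_degree, core_edges, without_vertex) are hypothetical, and a cutoff of 2 stands in for the cutoff of 15 used with the much larger CSS-D network.

```python
from collections import Counter

# Hypothetical directed reply edges (replier, person replied to).
edges = [
    ("hub", "a"), ("a", "hub"), ("hub", "b"), ("b", "hub"),
    ("hub", "c"), ("c", "hub"), ("a", "b"),
    ("p1", "hub"), ("p2", "a"),
]

def total_degree(edge_list):
    """Out-degree plus in-degree for each vertex, counting distinct edges."""
    degree = Counter()
    for src, dst in set(edge_list):
        degree[src] += 1
        degree[dst] += 1
    return degree

def core_edges(edge_list, min_degree):
    """Keep only edges whose endpoints both meet the total degree cutoff."""
    degree = total_degree(edge_list)
    return {(s, d) for s, d in set(edge_list)
            if degree[s] >= min_degree and degree[d] >= min_degree}

def without_vertex(edge_set, vertex):
    """Drop every edge that touches the given vertex."""
    return {(s, d) for s, d in edge_set if vertex not in (s, d)}

core = core_edges(edges, min_degree=2)
remaining = without_vertex(core, "hub")
print(len(core), "core edges;", len(remaining), "remain without hub")
```

Here the core network has seven edges, and only one survives once the hub is removed, which mirrors the kind of dependence on a single member described above.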

10.6 Understanding groups at Ravelry

Ravelry (www.ravelry.com) is a thriving online community for anyone passionate about yarn. Millions of knitters and crocheters have registered on the site. Users organize their projects, yarn stashes, and needles; share and discover designs, ideas, and techniques; and form friendships through discussions and exploration of shared interests. In this section, you play the role of a fictional Ravelry community administrator. You will work with data on the top 20 posters to three discussion forums created for different groups. The data and initial network analysis for this section were developed by Rachel Collins, a graduate student at the University of Maryland's iSchool. Special thanks to the Ravelry community for allowing us to analyze it and discuss their fascinating community in the book. All group and individual usernames have been modified for privacy reasons. The techniques used to analyze this bimodal network can be applied to many similar networks that connect people to discussions or other items.

Imagine you are assigned three group discussion forums to monitor and help develop. They are highly active groups, making it hard to keep up with all the messages and see the forest for the trees. You'd like to get a better sense of how the most important community members relate to one another, as well as how the groups differ. This understanding will help you recommend the best group for a newcomer to join (which could mean you link to the forum more prominently on your website), as well as identify individuals with certain expertise or social relations that you can call upon if needed. You also hope to share the visualization with the groups themselves to encourage self-reflection.

The three groups you are in charge of include one common-interest group (Apathetic Funloving Crafters [AFC]), one meetup (Chicago Fiber Arts), and one knit-along (Project Needy). They are three of hundreds of similar groups. Discussion forums for each group serve as their central hubs. Individuals can participate in as many forum groups as they desire. You have collected data on project output, discussion board usage, blog activity, community roles, and total friends for the top 20 posters in each group. This lets you relate many different activities together in a single analysis, focusing attention on the most active members who are typically the most important.

Figure 10.7 shows a bimodal graph of the three forums/groups (shown in blue) connected to individuals who have posted to them. Edge thickness is based on the number of forum posts (using a logarithmic mapping). The thinnest lines connect users to groups that they are members of but have not yet posted to. Other visual properties are used to convey individuals' levels of activity in other parts of the community. Maroon vertices represent individuals who also maintain a community blog, whereas solid disks are community moderators or volunteer editors. The graph helps you identify important individuals, such as those who post to multiple groups or have certain color/size/shape combinations. It also helps you compare the three groups. For example, the graph makes clear that the Apathetic Funloving Crafters (AFC) forum is very active, includes many bloggers, and includes relatively few people who complete a large number of projects (perhaps explaining the “Apathetic” in the title). In contrast, the Project Needy group includes many highly productive members, many of whom are both administrators and bloggers. The Chicago Fiber Arts group, by comparison, has fewer bloggers and less project activity.

Figure 10.7 Bimodal network connecting three Ravelry groups (i.e., forums) represented as blue text boxes to contributors represented as circles. Edge width is based on number of posts (with logarithmic mapping). Vertex size is based on number of completed Ravelry projects. Maroon vertices have a blog and solid circles are either community moderators or volunteer editors. The network helps identify important boundary spanners (e.g., those connected to multiple groups) as well as compare groups.
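Two ideas behind this graph, the logarithmic edge width and the search for boundary spanners, are easy to prototype. The Python sketch below is a rough illustration under stated assumptions: the membership rows and user names are invented, the functions edge_width and boundary_spanners are hypothetical, and the width range and max_posts ceiling are arbitrary choices rather than settings taken from the figure.

```python
import math
from collections import defaultdict

# Hypothetical (user, group, post_count) rows; a count of 0 means the user
# is a member of the group but has not yet posted to it.
memberships = [
    ("user_a", "AFC", 120), ("user_a", "Project Needy", 8),
    ("user_b", "AFC", 40),
    ("user_c", "Chicago Fiber Arts", 0), ("user_c", "Project Needy", 15),
    ("user_d", "Chicago Fiber Arts", 3),
]

def edge_width(posts, min_width=1.0, max_width=10.0, max_posts=1000):
    """Map a post count to an edge width on a logarithmic scale.

    Zero posts map to the thinnest line; counts at or above max_posts
    (an assumed ceiling) map to the thickest line.
    """
    scale = math.log1p(posts) / math.log1p(max_posts)
    return min_width + (max_width - min_width) * min(scale, 1.0)

def boundary_spanners(rows):
    """Return users connected to more than one group, with those groups."""
    groups_by_user = defaultdict(set)
    for user, group, _posts in rows:
        groups_by_user[user].add(group)
    return {user: sorted(groups)
            for user, groups in groups_by_user.items() if len(groups) > 1}

spanners = boundary_spanners(memberships)
for user, group, posts in memberships:
    print(user, group, round(edge_width(posts), 2))
print("Boundary spanners:", spanners)
```

The logarithmic scale keeps a handful of very heavy posters from making every other edge look equally thin, which is why it suits post counts that vary by orders of magnitude.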

A newcomer to the Ravelry community could use a visualization like the one displayed in Figure 10.7 to quickly get a sense of which group(s) he or she may want to join, as well as identify some of the prominent members. Administrators can use similar graphs to identify potential candidates for volunteer editors or identify clusters of boundary spanners with which to form new groups because of shared interests. Providing graphs like this one to the groups themselves can also prompt self-reflection and potentially foster new connections. They can also be used to better understand how the activities on the site relate to one another, although use of statistics may be needed to more systematically validate initial claims. For example, Figure 10.7 shows that location-based groups have a lower percentage of active members who blog and people who complete many projects seem to cluster into project groups. Understanding trends such as these can help you better target your community and groups around the different user types involved.

10.7 Practitioner's summary

Many online communities use threaded conversations in the form of email lists, discussion forums, Facebook groups, Reddit, and Q&A sites like Quora and Stack Overflow. Although they make use of a wide range of technologies that differ in their delivery infrastructure, all threaded conversations share similar characteristics: they are composed of single-authored messages organized into threads (i.e., a top-level message, replies to that message, and possibly replies to those replies), threads are often found within topics or groups, messages are often permanent, and users often have a shared view of the conversation. These conversations lend themselves to the creation of several networks including the directed, weighted Reply network and Top-Level Reply network; the undirected, weighted affiliation network connecting threads (or forums) to the individuals that posted to them; and the undirected, weighted networks derived from the affiliation network including the user-to-user network and thread-to-thread network. The analysis of the CSS-D technical support community showed how to identify important social roles and individuals who fill those roles including answer people, discussion starters, and questioners. The analysis of Ravelry forums and posters showed how to use a bimodal affiliation network to understand how forum-based groups are connected, identify important boundary spanners, and relate nondiscussion network metrics (e.g., blog activity, project activity) to group discussion activity.

10.8 Researcher's agenda

Research on threaded conversation communities has a long history as outlined in Section 10.2, yet there remain many interesting research questions to explore. As threaded conversations become embedded within more complex social spaces with multiple interaction technologies, it is increasingly important to understand how they all interact. For example, Hansen has found that technical and patient support groups benefit from combining a threaded conversation (i.e., email list) with a more permanent wiki repository [12]. The Ravelry example showed strategies that have not yet been widely used by the research community to understand how network position relates to use of other tools (e.g., blogs) or activities (e.g., projects). Network-based research is also needed to better understand the determinants of successful online communities. For example, we don't know what mix of answer people, discussion starters, and questioners leads to better outcomes or what overall network statistics (e.g., clustering coefficient) are correlated with success. From a design perspective, there are many fascinating opportunities to enhance the threaded conversation model [1] as evidenced by new features on sites like Quora, Reddit, and Stack Overflow. One particularly promising approach is to use visualization to help people make sense of community interaction [14, 15].

References

[1] Resnick P., Hansen D., Riedl J., Terveen L., Ackerman M. Beyond threaded conversation. In: CHI ’05 Extended Abstracts on Human Factors in Computing Systems (Portland, OR, USA, April 02–07, 2005). CHI ’05. New York, NY: ACM; 2005:2138–2139.

[2] Rheingold H. The Virtual Community: Homesteading on the Electronic Frontier. Reading, MA: Addison-Wesley Pub. Co; 1993.

[3] Smith M., Kollock P., eds. Communities in Cyberspace. London, UK: Routledge; 1999.

[4] Preece J. Online Communities: Designing Usability and Supporting Sociability. New York, NY: John Wiley & Sons, Inc; 2000.

[5] Kim A.J. Community Building on the Web: Secret Strategies for Successful Online Communities. First ed. Peachpit Press; 2000.

[6] Powazek D. Design for Community. Illustrated ed. Waite Group Press; 2001.

[7] Kraut R.E., Resnick P. Building Successful Online Communities: Evidence-Based Social Design. MIT Press; 2012.

[8] Howard T. Design to Thrive: Creating Social Networks and Online Communities that Last. Morgan Kaufmann; 2009.

[9] Nonnecke B., Preece J. Lurker demographics: Counting the silent. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (The Hague, The Netherlands, April 01–06, 2000). CHI ’00. New York, NY: ACM; 2000:73–80.

[10] Hansen D.L. Overhearing the crowd: An empirical examination of conversation reuse in a technical support community. In: Proceedings of the Fourth International Conference on Communities and Technologies (University Park, PA, USA, June 25–27, 2009). C&T ’09. New York, NY: ACM; 2009:155–164.

[11] Preece J., Ghozati K. In search of empathy online: A review of 100 online communities. In: Proceedings of the 1998 Association for Information Systems Americas Conference. 1998:92–94.

[12] Hansen D. Knowledge Sharing, Maintenance, and Use in Online Support Communities. Unpublished dissertation, University of Michigan. http://hdl.handle.net/2027.42/57608.

[13] Welser H.T., Gleave E., Fisher D., Smith M. Visualizing the signatures of social roles in online discussion groups. J. Social Struct. 2007;8(2).

[14] Chen Y. Visual opinion analysis of threaded discussions. In: 2015 IEEE International Conference on Data Mining Workshop (ICDMW). IEEE; 2015:646–651.

[15] Viégas F.B., Smith M. Newsgroup crowds and AuthorLines: Visualizing the activity of individuals in conversational cyberspaces. In: Proceedings of the Hawaii International Conference on System Sciences (HICSS). 2004. [Best Paper: Persistent Conversation Minitrack]

Further reading

Butler B., Sproull L., Kiesler S., Kraut R.E. Community effort in online groups: Who does the work and why. In: Weisband S., Atwater L., eds. Leadership at a Distance. Mahwah, NJ: Lawrence Erlbaum Associates Inc; 2005.

Wenger E. Communities of Practice: Learning, Meaning and Identity. Cambridge: Cambridge University Press; 1998.
