Many online communities use threaded conversations in the form of email lists, discussion forums, Facebook groups, and sites like Reddit, Quora, and Stack Overflow. Threaded conversations are composed of single-authored messages organized into threads (i.e., top-level message with a chain of replies). Threads are often found within topics or groups. These conversations lend themselves to the creation of several networks including the directed, weighted Reply network and Top-Level Reply network; the undirected, weighted affiliation network connecting threads (or forums) to the individuals that posted to them; and the undirected, weighted networks derived from the affiliation network including the user-to-user network and thread-to-thread network. An analysis of the CSS-D technical support community shows how to identify important social roles and individuals who fill those roles including answer people, discussion starters, and questioners. The analysis of the bimodal Ravelry network shows how to identify important people and integrate non-discussion network metrics.
Threaded conversation; Email list; Reply network; Social roles; Answer people; Discussion starters; Questioners; CSS-D; Ravelry
Threaded conversations have served as the foundation of virtual communities since the inception of the Internet. Usenet newsgroups, email lists, web boards, and discussion forums demonstrated the value of threaded conversation from the beginning. More recent incarnations of threaded conversations show up in Facebook and LinkedIn group and profile page discussions, Reddit threads, GameSpot, Craigslist posts, YouTube comments, Amazon ratings, and Q&A sites like Quora and Stack Overflow. All contain collections of messages sent in reply to one another. The natural conversation style supported by the basic post-and-reply threaded message structure has proven enormously versatile, serving communities ranging widely in focus and goals. Cancer survivors and those seeking technical support or religious guidance are as likely to use a threaded discussion as a corporate workgroup. Although the basic structure of threaded conversation has remained surprisingly similar over time, conversations now include multimedia elements, user profiles (often with social network features), participation statistics, reputation scores, and ratings. Conversations are now attached to a host of other entities ranging from people (e.g., a public wall on a person's profile page) to items (e.g., movies, actors) to groups (e.g., university alumni) to events. This chapter primarily focuses on more traditional forums and email lists, but the core analysis techniques for threaded networks apply to other contexts.
The threaded conversation structure lends itself well to network analysis, because a directed link between individuals is created each time someone replies to another person's message. Unfortunately, most threaded conversation systems do not make this network data easily accessible: content is spread across many different software platforms, and many groups make content available only to subscribed members. Many threaded message systems report participation statistics and ratings (e.g., top 10 contributors), which are important metrics. However, they fail to capture the social connections between members, a critical component of virtual communities and internal communities of practice.
Threaded conversation is a commonly used design theme that enables online discussion between multiple participants using the ubiquitous post-reply-reply structure. The key properties of threaded conversation were enumerated in Resnick et al. [1] and are listed here with some modification:
In addition to this basic structure, there are a few important ways that threaded conversation platforms differ. Some, such as email lists, are push technologies that send updates to all subscribers. Others, such as discussion forums and social media sites, are pull technologies that require individuals to visit a website in order to view messages. These are often accompanied by smartphone or email notifications when there are updates. Also, this chapter focuses on asynchronous threaded conversations, but many synchronous conversations such as texting, Instant Messenger, and group chat follow a similar reply structure even if the pace of interaction and nature of messages is different.
Another important distinction is who can access content. Public conversations allow anyone who visits a website (or email list archive) to read the content, even if they are not part of the community. These are often indexed by search engines, helping their content rise out of obscurity. Semi-public conversations require users to create a username and log in, or join a group (e.g., on Facebook) before accessing content. While often anyone can join and gain access to prior messages, the content may not be indexed by search engines, making it more obscure. Finally, private conversations are only open to those who receive invitations via some existing member or are a member of some organization. Many corporate forums or email lists fall into this category.
The history of public online, threaded conversation communities began in the late 1970s with significant developments in the 1980s as bulletin board systems (BBS), email lists, and Usenet gained traction. The earliest threaded conversation systems relied on dialup connections, which encouraged groups within local telephone calling distance to form. Users of these early community systems demonstrated that text-only communication was sufficient to develop surprisingly meaningful relationships and rich cultures. Early communities covered topics ranging from dentistry to gaming to the occult. Access to public communities was often free. Others charged subscription fees, such as the WELL, a community of writers primarily from the San Francisco Bay Area described in Howard Rheingold's classic book “The Virtual Community: Homesteading on the Electronic Frontier” [2]. Technologies such as Listserv began to develop in the mid-1980s, allowing interested groups to create their own community email lists with increasing ease.
As the Internet and the World Wide Web became ubiquitous in the 1990s, many of the original BBS services became or were replaced by Internet service providers (ISPs). Although they provided many services, asynchronous threaded conversation became one of the mainstays. Tools like Usenet that had relatively few users in the 1980s experienced exponential growth in the 1990s, growing from approximately 2,000 newsgroups in 1991 to close to 11,000 newsgroups in 1996. Today, email lists and discussion forums continue to support numerous communities ranging from neighborhood lists for sharing free items (e.g., Freecycle) to gaming communities to medical support groups. Although email lists may sound passé, they can outperform social media as a marketing platform, since there is often less competition for attention in an email inbox than in a social media feed. These traditional threaded conversation platforms have proven surprisingly robust, enabling user groups to adapt them to a wide range of usage scenarios.
Threaded conversations have also worked their way into a range of social networking sites, corporate intranets, multimedia sites (e.g., YouTube), customer review sites, Q&A sites, and specialized online community software. Reddit, with 18 billion page views per month in November 2018, and the wildly popular Stack Overflow have shown the power of combining threaded conversation with a voting-based navigation scheme. New mobile apps, such as Marco Polo, support threaded video messaging, and popular social networking sites like Facebook include threaded conversations in their groups, fan pages, and profile pages. Corporate communication tools, such as Slack, have integrated threaded conversation into a group chat system.
Research on communities that use threaded conversation began in the early days of BBS and Usenet. Many of the same themes continue to be explored today. For example, Kollock and Smith's 1999 book “Communities in Cyberspace” included chapters on identity online, deviant behavior and conflict management, social order and control, community structure and dynamics, visualization, and collective action [3]. All of these topics are still being explored in new contexts and with new technologies such as social networking sites, blogs, microblogging, and wikis. Early books by Preece [4], Kim [5], and Powazek [6] provided some enduring, practical advice and inspiration for those managing online communities. More recent additions by Kraut & Resnick [7] and Howard [8] continue the conversation on how to build community using online conversations and related technologies.
Researchers from a variety of disciplines analyze threaded conversation communities and publish results in communication, business, information science, health, sociology, and computer science journals and conferences. Several venues focused on the Internet have also emerged; the International Conference on Web and Social Media (ICWSM), the Association of Internet Researchers (AoIR), the Journal of Computer-Mediated Communication (JCMC), and the Association for Computing Machinery's Computer-Human Interaction (ACM-CHI) and Computer-Supported Cooperative Work (ACM-CSCW) conferences are just a few. Findings show that there is a consistent pattern of participation with few core members contributing the majority of content, many peripheral members contributing infrequently, and a large number of lurkers [9] who benefit by overhearing the conversations of others [10]. The nature of computer-mediated conversations depends largely on the type of community that engages in them. For example, technical and medical support communities differ in the level of empathy expressed [11] and the reusability of their content [12].
There are many reasons to explore networks that form within large collections of conversations. New employees or community members need to rapidly catch up with the “story so far” to get to a point that they can make useful contributions. Community managers need tools to help them serve as metaphorical fire rangers and game wardens for huge populations of discussion contributors and the mass of content they produce. When outsiders such as researchers or competitors peer into a set of relationships, social network analysis can point out people, documents, and events that are most notable. A few of the specific questions that can be addressed with network analysis of community conversations are described next:
The network most commonly used to analyze threaded conversations is the Reply network. Each time someone replies to another person's message, she creates a directed tie to that other person. If she replies to the same person multiple times, a stronger weighted tie is created. To understand the nuances of how a Reply network gets created, you can compare the original data in Figure 10.1 to the Reply network derived from that data and shown in Figure 10.2.
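The construction described above can be sketched in a few lines of Python. This is only an illustration, not how NodeXL builds the network: the message records below are hypothetical (they are not the data in Figure 10.1), and the field layout of message ID, author, and parent message ID is an assumption about how your own export might look.

```python
from collections import Counter

# Hypothetical messages: (message_id, author, parent_id).
# parent_id is None for a top-level message that starts a thread.
messages = [
    (1, "Ann", None),
    (2, "Beth", 1),
    (3, "Cathy", 2),
    (4, "Beth", 3),
    (5, "Cathy", 2),
]

def reply_network(messages):
    """Return {(replier, replied_to): weight} for a directed, weighted Reply network."""
    author_of = {mid: author for mid, author, _ in messages}
    edges = Counter()
    for _, author, parent in messages:
        # Skip top-level messages and replies whose parent is missing from the data.
        if parent is not None and parent in author_of:
            edges[(author, author_of[parent])] += 1
    return dict(edges)

reply_edges = reply_network(messages)
for (replier, target), weight in sorted(reply_edges.items()):
    print(replier, "->", target, "weight", weight)
```

Because Cathy replies twice to Beth's message, the Cathy-to-Beth edge has a weight of 2, while each of the other edges has a weight of 1. Replies to one's own message would appear as self-loops in this sketch; whether to keep them is a judgment call for your analysis.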
A related, but different network is the Top-Level Reply network, which connects all repliers to the person who started each thread (Figure 10.3) instead of the person they are replying to directly. This network emphasizes those who start threads (i.e., post the top-level message), while de-emphasizing conversations that occur midway through a thread. In some communities with short threads where all replies are typically directed at the original poster, such as Question and Answer (Q&A) sites or Reddit posts, this network can better reflect the underlying dynamics. However, in discussion communities or forums with longer threads, the standard Reply network is typically preferred because people later in the thread are often replying to each other.
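A minimal sketch of the Top-Level Reply network follows, again using hypothetical message records with an assumed (message ID, author, parent ID) layout. The only change from a standard Reply network is that each reply is traced back up the chain to the message that started its thread.

```python
from collections import Counter

# Hypothetical messages: (message_id, author, parent_id); None marks a thread starter.
messages = [
    (1, "Ann", None),
    (2, "Beth", 1),
    (3, "Cathy", 2),
    (4, "Beth", 3),
    (5, "Cathy", 2),
]

def top_level_reply_network(messages):
    """Return {(replier, thread_starter): weight} for a Top-Level Reply network."""
    parent_of = {mid: parent for mid, _, parent in messages}
    author_of = {mid: author for mid, author, _ in messages}

    def thread_root(mid):
        # Follow parent links until reaching a message with no recorded parent.
        while parent_of.get(mid) is not None:
            mid = parent_of[mid]
        return mid

    edges = Counter()
    for mid, author, parent in messages:
        if parent is None:
            continue  # thread starters do not reply to anyone
        starter = author_of.get(thread_root(mid))
        if starter is not None:  # skip threads whose first message is missing
            edges[(author, starter)] += 1
    return dict(edges)

top_level_edges = top_level_reply_network(messages)
```

In this toy thread every reply is credited to Ann, the thread starter, so the exchange between Beth and Cathy that appears in a standard Reply network disappears. That is the de-emphasis of mid-thread conversation described above.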
Affiliation data connecting posters to threads (or forums) can also be used to create bimodal networks (see Chapter 6). These are undirected, weighted networks that connect posters (i.e., users) to threads. For example, an edge would connect Cathy to Thread B with a weight of 2 because she posted to that thread twice. Beth would be connected with a weight of 1 to Thread A, Thread B, and Thread C because she posted to each of them once. This network is ideal for identifying boundary spanners, as you will see in Section 10.6. As discussed in Advanced topic: Transforming a bimodal affiliation network into two unimodal networks in Chapter 6, affiliation networks can be transformed into two additional undirected, weighted networks. With threaded networks, you can create a user-to-user network connecting people based on the number of threads (or forums) they both contribute to, and a thread-to-thread (or forum-to-forum) network connecting threads together based on the number of contributors they share. These networks are good for creating overview graphs of large communities with many threads or forums.
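The affiliation network and its two derived networks can be sketched as follows. The post counts for Cathy and Beth follow the example in the text; the single post by Ann is a hypothetical addition so that the user-to-user network has more than one edge. The function name and its parameter are illustrative, not part of any tool.

```python
from collections import Counter
from itertools import combinations

# One (author, thread) pair per post.
posts = [
    ("Beth", "Thread A"), ("Beth", "Thread B"), ("Beth", "Thread C"),
    ("Cathy", "Thread B"), ("Cathy", "Thread B"),
    ("Ann", "Thread A"),  # hypothetical extra poster, not from the text
]

# Bimodal affiliation network: {(user, thread): number of posts}.
affiliation = Counter(posts)

def project(affiliation, onto_users=True):
    """Project the bimodal network onto one mode.

    Edge weight is the number of shared nodes of the other mode, e.g. the
    number of threads two users both posted to.
    """
    groups = {}
    for user, thread in affiliation:
        key, member = (thread, user) if onto_users else (user, thread)
        groups.setdefault(key, set()).add(member)
    edges = Counter()
    for members in groups.values():
        for a, b in combinations(sorted(members), 2):
            edges[(a, b)] += 1
    return dict(edges)

user_to_user = project(affiliation, onto_users=True)
thread_to_thread = project(affiliation, onto_users=False)
```

Beth, who posts to all three threads, is the only reason the threads are connected to one another in the thread-to-thread network, which is what makes her a boundary spanner in this small example.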
Preparing data needed to create threaded conversation networks can be challenging because they rely on such a wide range of technologies. Email lists are the easiest conversations to capture in NodeXL, because you can use the email import wizard (see Chapter 9). Data from discussion forums, Reddit, Facebook, etc. must be generated by using screen scrapers, manually entering data, or performing queries on the forum's database or through a web Application Programming Interface (API). Whichever approach you use, your dataset will likely have header information that includes some of the following information for each message: a time stamp, a message author, an identifier for the message this message is a reply to (if any), a subject line (or thread ID), a set of tags, an attachment, a link to the author's profile, a group or forum the thread is a part of, and a rating. A separate file of information on each user is also often useful. It may include aggregate participation statistics on other community activities (see Section 10.6 for an example). All of this data can be useful in creating maps of conversation networks, but at the core a simple edge list is the minimum necessary requirement to start a social network analysis of a conversation.
The type of discussion platform in use will also influence the potential data problems you are likely to run into. For example, email lists often have people registered with multiple email addresses, making it necessary to combine duplicate addresses for the same person (see Chapter 9). Email lists also have problems when the reply structure isn't clear, because people reply to the email list address rather than to one another. Corporate email lists and discussion forums typically have the cleanest set of unique identifiers for individuals, but even then, name and title changes can cause problems.
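One way to combine duplicate addresses before analysis is to map every known alias to a single canonical address and then re-sum the edge weights. The sketch below assumes you have already compiled such an alias table by hand; the addresses and the table itself are hypothetical.

```python
from collections import Counter

# Hypothetical alias table: alternate address -> canonical address.
aliases = {"beth@home.example": "beth@work.example"}

def canonical(address):
    """Normalize case and whitespace, then resolve known aliases."""
    address = address.strip().lower()
    return aliases.get(address, address)

def merge_duplicate_addresses(edges):
    """Re-key a {(sender, receiver): weight} edge list on canonical addresses."""
    merged = Counter()
    for (sender, receiver), weight in edges.items():
        merged[(canonical(sender), canonical(receiver))] += weight
    return dict(merged)

raw_edges = {
    ("beth@home.example", "ann@list.example"): 1,
    ("Beth@Work.example", "ann@list.example"): 2,
}
merged_edges = merge_duplicate_addresses(raw_edges)
```

Here the two edges from Beth's separate addresses collapse into a single edge with a weight of 3. Building the alias table is the hard part and usually requires manual inspection of names and signatures.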
It is also important to realize that the reply network from an email list only captures messages sent or posted to the list. Many personal messages sent directly to and from individuals on the email list are not captured. Depending on the type of community and the default settings (e.g., email list Reply To settings), these private messages may account for the majority of all messages exchanged among a population. In addition, other types of communications such as corporate meetings, phone calls, and instant messenger exchanges are invisible to discussion forum networks. People who communicate and contribute more effectively through other channels may show up only marginally in discussion forum datasets. However, even given these limitations, analysis of threaded conversation networks can provide vital information about community dynamics and help identify important individuals and groups.
There are a host of email lists, forums, and Q&A websites such as Stack Overflow and Quora where people post technical questions and volunteers provide answers. Many companies host these Q&A discussions to learn about problems with existing products, resolve customer concerns, generate new ideas on future improvements, and build a loyal customer community. To meet these goals, it is often important to understand which individuals play important roles within the community, something that can be challenging when managing multiple, active communities or viewing content from across large sites. In this section you will learn how to identify key members of a technical support Q&A discussion focused on cascading style sheets (CSS), which are used in front-end web development. During the time of data collection, the community sent approximately 50 messages a day and included several key administrators who kept the conversation friendly and on topic. See [12] for a complete description of the community and some of the strategies used by the community administrators and members to make it so effective.
In an online community, users contribute in different patterns and styles. In other words, community members fill different social roles. Understanding the composition of social roles within your community or social networking site can provide many insights that can help you be a more effective community manager. Social network analysis provides metrics that can be used to automatically identify those who fill unique social roles and track their prevalence over time. This can help community and social media managers:
In this section you will learn how to identify important individuals and social roles within the CSS-D community. You will do this by using subgraph images (introduced in Chapter 7) and creating a composite metric that helps identify the two most important social roles within Q&A communities like CSS-D: answer people and discussion starters. You will then use this metric to develop visualizations that show the relationships between these individuals.
The easiest way to get a sense of the key individuals within a network is to create the 1.5-degree subgraph images for each vertex using a layout like the Harel-Koren Fast Multiscale layout (see Figure 10.4). Once these subgraph images of each email contributor's local networks are created, you can sort on the graph metrics in the Vertices worksheet associated with each contributor such as in-degree (who receives messages from the most people) and out-degree (who sends messages to the most people) to bring differently connected individuals to the top. You can also sort by centrality measures like PageRank to get a sense of who is a core member of the community, that is, an active participant who talks to other active participants.
Scanning through the Subgraph Images of CSS-D contributors helps you get a sense of the different social roles that exist within the email list community. Figure 10.4 shows examples of three types of contributors (question people, answer people, and discussion starters) along with some of the metrics that could be used to identify them (see Advanced topic: Social role measures for more). Question people post a question and receive a reply from one or two individuals who are likely to be answer people. Answer people mostly send messages (arrows point toward other vertices) to individuals who are not well connected themselves [13]. Discussion starters mostly receive messages (arrows point toward them), often from people who are well connected to each other.
You can typically identify a person's social role by looking at his or her subgraph image (Figure 10.4), but doing so for many individuals becomes problematic. Instead, it is possible to create aggregate metrics that automatically identify those who play certain social roles. These metrics consist of a combination of network metrics and participation metrics. Automatically identifying social roles within a community using metrics facilitates their tracking over time, which allows you to keep your finger on the pulse of your community's health. It can also be used in combination with visualizations as shown in Figures 10.5 and 10.6 to more easily understand individuals' social roles and how they relate to one another. The specific metrics used to identify social roles will depend on the metrics that are available (i.e., those that are tracked) and will be tied to some extent to the underlying type of social media being analyzed. Advanced topic: Social role measures describes several metrics that can be used to identify different roles within threaded conversations.
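In-degree and out-degree as defined above can be computed directly from a Reply network edge list. The sketch below is an illustration outside NodeXL; the edge list is hypothetical, and degree is counted as the number of distinct people rather than the number of messages.

```python
# Hypothetical Reply network: {(replier, replied_to): number of replies}.
reply_edges = {("Beth", "Ann"): 1, ("Cathy", "Beth"): 2, ("Beth", "Cathy"): 1}

def degree_table(edges):
    """Return {person: (in_degree, out_degree)}, counting distinct neighbors."""
    in_neighbors, out_neighbors = {}, {}
    for sender, receiver in edges:
        out_neighbors.setdefault(sender, set()).add(receiver)
        in_neighbors.setdefault(receiver, set()).add(sender)
    people = set(in_neighbors) | set(out_neighbors)
    return {
        person: (len(in_neighbors.get(person, ())), len(out_neighbors.get(person, ())))
        for person in people
    }

table = degree_table(reply_edges)

# Sort by out-degree, highest first, breaking ties alphabetically.
by_out_degree = sorted(table, key=lambda person: (-table[person][1], person))
```

Sorting on the first element of each tuple instead brings those who receive replies from the most people to the top.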
The specific social roles and their prevalence within a particular community will depend on the nature of that community. Because the CSS-D community is primarily a Q&A community, it consists of mostly question askers, a handful of prominent answer people, and a small number of discussion starters. Other more discussion-based communities would have many more discussion starters as well as other social roles such as flame warriors, commentators, and connectors. Tracking the ratio of people that play different social roles can be a good way to assure that a community is healthy. For example, if the CSS-D community had too few answer people or an influx of many question people, it could not function effectively.
Viewing the entire reply network for the CSS-D email list (Figure 10.5) can provide some general insights about the composition of its population, although the size of the network can make it challenging to interpret without filtering. Figure 10.5 maps the answer person score (see Advanced topic: Social role measures) to color: green-colored nodes represent answer people, red-colored nodes represent discussion starters, and blue nodes have a total degree of less than 15. Larger nodes have a higher eigenvector centrality suggesting they are connected to many people and others who are well connected. The binned layout is used to identify isolates, of which there are many because the email list address itself was removed. Isolates represent those who posted to the list and didn't receive a response (e.g., they posted an announcement) or in some cases those who replied to the list without copying in the address of the person who they were replying to. Overall the composite network shows many individuals connected primarily through a handful of central question answerers and a small but stable core group of members that interact with one another regularly.
To better focus in on the core members of the community you can filter out vertices with a total degree of less than 15. Figure 10.6 shows the resulting network after manually positioning the vertices. Subgraph images for the top three answer people are shown. The edge weights, mapped to the edge width and opacity, provide a good sense of who interacted with whom during the 2-month time period and are thus likely to know each other and perhaps have similar interests. Note that even among these core members, discussion starters rarely reply to other discussion starters. Also notice that the largest vertex (i.e., the one with the highest eigenvector centrality), while categorized as an answer person, receives many messages from the core members. This suggests that he plays multiple important roles within the community. In fact, if he were removed from the network, there would be considerably fewer connections between the core members. This suggests that community administrators should make sure this individual is adequately appreciated and encouraged to remain in the community.
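The degree filter can be sketched as follows. This sketch assumes that total degree means in-degree plus out-degree, each counted over distinct people; check how your tool defines it before relying on a specific threshold. The edge list is hypothetical and the demonstration uses a threshold of 2 so that the effect is visible on a tiny network.

```python
# Hypothetical Reply network: {(replier, replied_to): number of replies}.
reply_edges = {("Beth", "Ann"): 1, ("Cathy", "Beth"): 2, ("Beth", "Cathy"): 1}

def filter_by_total_degree(edges, min_degree=15):
    """Keep only edges whose endpoints both have total degree >= min_degree.

    Total degree is taken here to be in-degree + out-degree (an assumption).
    """
    in_neighbors, out_neighbors = {}, {}
    for sender, receiver in edges:
        out_neighbors.setdefault(sender, set()).add(receiver)
        in_neighbors.setdefault(receiver, set()).add(sender)
    people = set(in_neighbors) | set(out_neighbors)
    keep = {
        p for p in people
        if len(in_neighbors.get(p, ())) + len(out_neighbors.get(p, ())) >= min_degree
    }
    return {
        (sender, receiver): weight
        for (sender, receiver), weight in edges.items()
        if sender in keep and receiver in keep
    }

core_edges = filter_by_total_degree(reply_edges, min_degree=2)
```

Degrees are computed once on the full network, so a vertex that passes the threshold stays even if some of its neighbors are removed. Recomputing degree after each removal would give a stricter, k-core style filter instead.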
Ravelry (www.ravelry.com) is a thriving online community for anyone passionate about yarn. Millions of knitters and crocheters have registered on the site. Users organize their projects, yarn stashes, and needles; share and discover designs, ideas, and techniques; and form friendships through discussions and exploration of shared interests. In this section, you play the role of a fictional Ravelry community administrator. You will work with data on the top 20 posters to three discussion forums created for different groups. The data and initial network analysis for this section were developed by Rachel Collins, a graduate student at University of Maryland's iSchool. Special thanks to the Ravelry community for allowing us to analyze it and discuss their fascinating community in the book. All group and individual usernames have been modified for privacy reasons. The techniques used to analyze this bi-modal network can be applied to many similar networks that connect people to discussions or other items.
Imagine you are assigned three group discussion forums to monitor and help develop. They are highly active groups, making it hard to keep up with all the messages and see the forest for the trees. You'd like to get a better sense of how the most important community members relate to one another, as well as how the groups differ. This understanding will help you recommend the best group for a newcomer to join (which could mean you link to the forum more prominently in your website), as well as identify individuals with certain expertise or social relations that you can call upon if needed. You also hope to share the visualization with the groups themselves to encourage self-reflection.
The three groups you are in charge of include one common-interest group (Apathetic Funloving Crafters [AFC]), one meetup (Chicago Fiber Arts), and one knit-along (Project Needy). They are three of hundreds of similar groups. Discussion forums for each group serve as their central hubs. Individuals can participate in as many forum groups as they desire. You have collected data on project output, discussion board usage, blog activity, community roles, and total friends for the top 20 posters in each group. This lets you relate many different activities together in a single analysis, focusing attention on the most active members who are typically the most important.
Figure 10.7 shows a bimodal graph of the three forums/groups (shown in blue) connected to individuals who have posted to them. Edge thickness is based on the number of forum posts (using a logarithmic mapping). The thinnest lines connect users to groups that they are members of but have not yet posted to. Other visual properties are used to convey individuals' levels of activity in other parts of the community. Maroon vertices represent members who also maintain a community blog, whereas solid disks represent community moderators or volunteer editors. The graph helps you identify important individuals, such as those who post to multiple groups or have certain color/size/shape combinations. It also helps you compare the three groups. For example, the graph makes clear that the Apathetic Funloving Crafters (AFC) forum is very active, includes many bloggers, and includes relatively few people who complete a large number of projects (perhaps explaining the “Apathetic” in the title). In contrast, the Project Needy group includes many highly productive members, many of whom are both administrators and bloggers. The Chicago Fiber Arts group, by comparison, has fewer bloggers and less project activity.
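A logarithmic mapping from post counts to edge thickness can be sketched as below. The formula and the width range of 1 to 5 are assumptions chosen for illustration; they are not the exact mapping used to produce Figure 10.7.

```python
import math

def edge_width(posts, max_posts, min_width=1.0, max_width=5.0):
    """Map a post count to an edge width on a logarithmic scale.

    Uses log(1 + posts) so that members with zero posts get the thinnest line.
    All parameter values here are illustrative assumptions.
    """
    if max_posts <= 0:
        return min_width
    scale = math.log1p(posts) / math.log1p(max_posts)
    return min_width + (max_width - min_width) * scale

# Hypothetical post counts for four members of one forum.
post_counts = [0, 10, 100, 1000]
widths = [edge_width(n, max(post_counts)) for n in post_counts]
```

The logarithmic scale keeps a handful of extremely prolific posters from making every other edge look equally thin, which is a common problem with a linear mapping on skewed participation data.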
A newcomer to the Ravelry community could use a visualization like the one displayed in Figure 10.7 to quickly get a sense of which group(s) he or she may want to join, as well as identify some of the prominent members. Administrators can use similar graphs to identify potential candidates for volunteer editors or identify clusters of boundary spanners with which to form new groups because of shared interests. Providing graphs like this one to the groups themselves can also prompt self-reflection and potentially foster new connections. They can also be used to better understand how the activities on the site relate to one another, although use of statistics may be needed to more systematically validate initial claims. For example, Figure 10.7 shows that location-based groups have a lower percentage of active members who blog and people who complete many projects seem to cluster into project groups. Understanding trends such as these can help you better target your community and groups around the different user types involved.
Many online communities use threaded conversations in the form of email lists, discussion forums, Facebook groups, Reddit, and Q&A sites like Quora and Stack Overflow. Although they make use of a wide range of technologies that differ in their delivery infrastructure, all threaded conversations share similar characteristics: they are composed of single-authored messages organized into threads (i.e., a top-level message, replies to that message, and possibly replies to those replies), threads are often found within topics or groups, messages are often permanent, and users often have a shared view of the conversation. These conversations lend themselves to the creation of several networks including the directed, weighted Reply network and Top-Level Reply network; the undirected, weighted affiliation network connecting threads (or forums) to the individuals that posted to them; and the undirected, weighted networks derived from the affiliation network including the user-to-user network and thread-to-thread network. The analysis of the CSS-D technical support community showed how to identify important social roles and individuals who fill those roles including answer people, discussion starters, and questioners. The analysis of Ravelry forums and posters showed how to use a bimodal affiliation network to understand how forum-based groups are connected, identify important boundary spanners, and relate nondiscussion network metrics (e.g., blog activity, project activity) to group discussion activity.
Research on threaded conversation communities has a long history as outlined in Section 10.2, yet there remain many interesting research questions to explore. As threaded conversations become embedded within more complex social spaces with multiple interaction technologies, it is increasingly important to understand how they all interact. For example, Hansen has found that technical and patient support groups benefit from combining a threaded conversation (e.g., an email list) with a more permanent wiki repository [12]. The Ravelry example showed strategies that have not yet been widely used by the research community to understand how network position relates to use of other tools (e.g., blogs) or activities (e.g., projects). Network-based research is also needed to better understand the determinants of successful online communities. For example, we don't know which mixtures of answer people, discussion starters, and questioners lead to better outcomes or which overall network statistics (e.g., clustering coefficient) are correlated with success. From a design perspective, there are many fascinating opportunities to enhance the threaded conversation model [1] as evidenced by new features on sites like Quora, Reddit, and Stack Overflow. One particularly promising approach is to use visualization to help people make sense of community interaction [14, 15].