Chapter 3

Social network analysis: Measuring, mapping, and modeling collections of connections

Abstract

Social network analysis uses mathematical tools to systematically understand networks, which are made up of vertices (e.g., people) that are connected to one another via edges (e.g., friendship ties). Network metrics help identify who is most important or central in a network, subgroups (i.e., network clusters) of tightly connected people, and the overall network structure (e.g., the density of a network). Social scientists have developed social network analysis and visualization techniques for decades. Network data is represented as an edge list or matrix. Directed edges have a clear origin and destination, while undirected edges do not. Weighted networks include a value associated with the edge. The scope of a network determines if it is an ego network, partial network, or full network. Multimodal networks include vertices of different types, while multiplex networks include edges of different types. Affiliation networks connect people based on shared affiliations (e.g., club).

Keywords

Centrality metrics; Clustering algorithms; Directed; Undirected; Weighted; Ego network; Multimodal; Multiplex; Affiliation network; History

3.1 Introduction

Human beings have been part of social networks since our earliest days. We are born and live in a world of connections. People connect with others through social networks formed by kinship, language, trade, exchange, conflict, citation, and collaboration. Using computer technologies to create social networks is relatively new, but networks of social interactions and exchanges are primordial. Simply defined, a network is a collection of things and their relationships to one another. The “things” that are connected are called nodes, vertices, entities, and in some contexts people. The connections between the vertices are called edges, ties, relationships or links. Many natural and artificial systems form networks, which exist in systems from the atomic level to the planetary level. A special subset of networks are social networks which are created whenever people interact, directly or indirectly, with other people, institutions, and artifacts. Social network theory and analysis is a relatively recent set of ideas and methods largely developed over the past century. It builds on and uses concepts from the mathematics of graph theory, which has a longer history, starting with Leonhard Euler in 1736. Using network analysis, you can visualize complex sets of relationships as maps (i.e., graphs or sociograms) of connected symbols and calculate precise measures of the size, shape, and density of the network as a whole and the positions of each element and group of elements within it.

The recent proliferation of Internet social media applications and smartphone devices has made social connections more visible than ever before (Chapter 2). A new subset of social networks, social media networks, are a growing focal point for the application of network analysis tools. The idea of networks, whether they are composed of friends, ideas, or web pages, is increasingly an important way to think about the modern world. You can use social network analysis to explore and visualize patterns found within collections of linked entities that include people. From a social network analysis perspective, the treelike “org-chart” that commonly represents the hierarchical structure of an organization or enterprise is too simple and lacks important information about the cross connections that exist between and across departments and divisions. In contrast with the simplified tree structure of an org-chart, a social network view of an organization or population leads to the creation of visualizations that resemble maps of highway systems, airline routes, or rail networks (see Chapter 9). Social network maps can similarly guide journeys through social landscapes and tell a story about how some points or people are at the center or periphery of the network. Maps of transportation networks where distance is measured in number of flights or road miles from one city to another city are familiar. They inspire application to less familiar networks of electrical connections, protein expression, and webs of information, conversation, and human connection.

Social network analysis and metrics are described in several excellent books and journals [1–6]. This chapter touches on the key historical developments, ideas, and concepts in social network analysis and applies them to social media network examples. We have left details of advanced topics and mathematical definitions of various concepts to the many fine technical works. The following is intended as an introductory survey of the core network concepts and methods used in subsequent chapters, which focus on the networks that can be extracted from social media sources like Twitter, Facebook, email, discussion forums, YouTube, and wikis.

3.2 The network perspective

Network analysts see the world as a collection of interconnected pieces. Those studying social networks see relationships as the building blocks of the social world, each set of relationships combining to create emergent patterns of connections among people, groups, and things. The focus of social network analysis is between, not within, people. Whereas traditional social science research methods such as surveys focus on individuals and their attributes (e.g., gender, age, income), network scientists focus on individuals and their “alters”—the people to whom they connect. Network analysis shifts the focus of analysis to the bonds between individuals in addition to the internal qualities and abilities of individuals. This change in focus from attribute data to relational data dramatically affects how data are collected, represented, and analyzed. Social network analysis complements methods that focus more narrowly on individuals, adding a critical dimension that captures the connective tissue of societies and other complex interconnections.

Network analysis shares some core ideas with the real estate profession. In contrast to approaches that look at internal attributes of each individual, network analysis shares the real estate focus on location, location, location! The interior of a house may be a liability, but where a property is located matters far more when trying to get a good sale price. The network perspective looks at a collection of ties among a population and creates measurements that describe the location of each person or entity within the structure of all relationships in the network. The position or location of a person or “node” or “vertex” in relation to all the others is a primary concern of social network analysis. Many network explanations look for causes of outcomes in the patterns of connections around an individual instead of their personal characteristics. “Know who” is often more important in network explanations than “know how.” Network approaches observe that different people in similar social positions often act in similar ways, even if they have different backgrounds. Positions within networks may be as significant a factor as any aspect of the people who occupy them. Network analysis argues that explanations about the success or failures of organizations are often to be found in the structure of relationships that limit or provide opportunities for interaction [7].

Many network concepts are intuitive and echo familiar phrases like “friend of a friend,” “word of mouth,” and “six degrees of separation.” Other network terms like “transitivity,” “triadic closure,” and “centrality” (see Section 3.5) may be unfamiliar terms for familiar social arrangements. Many of us recognize social network differences among people: we know some people who are “popular” and have connections to many others. We may also know some people who may be less “popular” but are still “influential,” connecting to a smaller number of people who have “better” connections. Network analysis recognizes these and other less intuitively sensed patterns in social relationships, like measuring the number of your friends who know each other and how much a person occupies a gatekeeper or bridge role between two groups. The network analysis approach makes the web of interconnections that bind people to one another visible, creating a mathematical and graphical language that can highlight important people, events, and subgroups.

3.2.1 A simple Twitter network example

To better understand the network perspective, consider the social network of Twitter users shown in Figure 3.1 (see Chapter 11 for a description of Twitter). It is an example of a sociogram, also called a network graph, which is a common way of visualizing networks. Like all networks, it consists of two primary building blocks: vertices (also called nodes or agents) and edges (also called ties or connections). The vertices are represented by images of the Twitter user profile photo, and the edges are represented by the lines that point from one vertex to another.

This network graph visualization paints a picture of the social relationships among the Twitter accounts of members of the United States Senate in 2018. The size of each Twitter user’s profile image is determined by the user’s total number of followers as reported by the Twitter Application Programmer Interface (API), which gives software access to extended details about each user’s profile and message data. This is one example of how attribute data (e.g., data that describe a person) can be overlaid onto a network. A line, or edge, exists between two people when one user account “mentions” or “replies-to” another. All of these connections in aggregate reveal the emergent structure of two large distinct groups (G1 and G2) with relatively few connecting links, which loosely map to the two political parties in the United States. These separate clusters reflect the higher rate that members of one party mention one another in contrast to the rate they mention members of the opposing party. This network analysis identifies the individuals who fill important positions within the network, such as those with whom many other people interact and those who are connected across cluster boundaries. The current and following chapters will provide a guide to creating maps like these from Twitter and other social media platforms and data sources. For now, let’s consider the major components of a network in a bit more detail.

3.2.2 Vertices

Vertices, also called nodes, agents, entities, or items, can represent many kinds of things. Often they represent people or social structures such as workgroups, teams, organizations, institutions, states, or even countries. At other times they represent content such as web pages, keywords, or videos. They can even represent physical or virtual locations or events. Vertices often correspond with the primary building blocks of social media platforms as described in Chapter 2: pages in wikis, friends in social networking sites, and posts or authors in blogs.

Although it is not an absolute requirement for network analysis, having attribute data that describe each of the vertices can add insights to an analysis and visualization. For example, Figure 3.1 used descriptive attribute data about the total number of followers for each user to convey a sense of who is most popular on Twitter within the network. Other attribute data from Twitter, such as the number of people each user follows and the date they joined Twitter, can also be mapped to visual attributes (see Chapter 11). More generally, attribute data may describe demographic characteristics of a person (age, gender, race), data that describe the person’s use of a system (number of logins, messages posted, edits made) or other characteristics such as income, location or brand preferences. In network visualization tools like NodeXL, attribute data can be mapped to visual properties such as the size, color, or opacity of each vertex (see Chapter 5).

3.2.3 Edges

Edges, also known as links, ties, connections, and relationships, are the connective tissue of networks. An edge connects two vertices together. Edges can represent many different types of relationships like proximity, collaboration, kinship, friendship, trade partnership, literature citation, investment, hyperlink, transaction, or any shared attribute (e.g., people who attended the same University). An edge can be said to exist if it has some official status, is recognized by the participants, or is observed by exchange or interaction between them. In summary, an edge is any form of relationship or connection between two entities.

Network scientists have developed a language to describe different types of edges. In Section 2.3.5 of Chapter 2, we introduced the core types of connections that occur in social media networks. Here we describe how those concepts map to network and graph theory concepts more generally.

Undirected or directed edges are the two major types of connections. Directed edges (also known as asymmetric edges) have a clear origin and destination: money is lent from one person to another, a Twitter user follows another user, an email is sent from an author to a recipient, or a web page links to another web page. They are represented on a graph as a line with an arrow pointing from the source vertex to the recipient vertex (see Figure 3.1). Directed edges may be reciprocated or not. If I sent you a message, you may send one back in return, or not. An undirected edge (also known as a symmetric or mutual edge) simply exists between two people or things: a couple is married, two Facebook users are friends, or two people are members of the same organization. No origin or destination is clear in these mutual relationships. They cannot exist unless they are reciprocated. Undirected edges are represented on a graph as a line connecting two vertices with no arrows.

Edges can be further described by additional types of data. The simplest type of edge, an unweighted edge or binary edge, only indicates if an edge exists or not. For example, a friendship tie between Facebook users either exists or it does not. In contrast, a weighted edge includes values associated with each edge that indicate the strength or frequency of a tie. For example, a weighted edge between two Facebook users may indicate the number of photo comments exchanged or the duration since the creation of a friendship. Weighted edges are often represented visually as thicker or darker or as more or less opaque lines. Including weighted edge data in a network dataset is preferable because this provides additional information about each tie. However, many social network analysis metrics (see Section 3.5) are designed for unweighted networks. Fortunately, any weighted network can be converted to an unweighted one by choosing a cutoff point. For example, an unweighted edge could be shown between individuals who exchanged at least 10 email messages, with no edge between people who exchanged fewer than 10 messages.
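The cutoff conversion described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the names, the message counts, and the `binarize` helper are all hypothetical, and the cutoff of 10 simply mirrors the email example.

```python
# Hypothetical weighted edge list: (sender, recipient, messages exchanged).
weighted_edges = [
    ("Ann", "Bob", 14),
    ("Ann", "Carol", 3),
    ("Carol", "Ann", 10),
]

def binarize(edges, cutoff):
    """Keep an unweighted edge wherever the weight meets or exceeds the cutoff."""
    return [(source, target) for source, target, weight in edges if weight >= cutoff]

# Edges with at least 10 messages survive; weaker ties are dropped.
unweighted_edges = binarize(weighted_edges, cutoff=10)
print(unweighted_edges)
```

Choosing the cutoff is a judgment call: a higher value yields a sparser network that keeps only the strongest ties.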

3.2.4 Network data representations

Because network data differ from attribute data, they are represented differently. With attribute data, it is common to create a data matrix where each row represents an individual and each column represents an individual’s characteristics, behaviors, or answers to survey questions. A modified approach is used to represent relational data. As in attribute matrices, each row represents an individual in the network. Unlike attribute matrices, however, each column represents another individual in the network, as shown in Table 3.1.

Table 3.1

A network represented as a matrix^a
	Ann	Bob	Carol
Ann	0	1	1
Bob	0	0	0
Carol	1	0	0

^a This network is a directed network, as it is not symmetrical (i.e., Ann points to Bob in row 1, but Bob doesn't point to Ann in row 2). It is a simple binary network: either a tie exists (value = 1) or not (value = 0).

Different types of edges can be represented in network matrices. Table 3.1 describes a directed network because not all connections are reciprocated. For example, Ann “points to” Bob as shown in row 1, but Bob does not “point to” Ann as shown in row 2. If it were an undirected network it would be a symmetric matrix; if Ann points to Bob then Bob must necessarily point to Ann. This network is a binary network because it only includes 1s and 0s, where a 1 indicates that there is a connection and a 0 indicates that there is no connection. Allowing additional values would create a weighted network. For example, the 1s could be replaced with the number of email messages sent or phone calls made to the other person. Notice that the diagonal of the matrix connects each person with himself or herself. In this network, like most networks, the diagonal values are 0 indicating that a person does not “point to” herself. However, in some networks a “self-loop” connecting a person to herself can exist. For example, a person may send herself an email message as a reminder. Network matrices are powerful forms of representation that lend themselves to efficient mathematical manipulation for those inclined. However, they can also become quite large and challenging to navigate, particularly when networks are relatively “sparse” with few connections and many items.
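As a rough sketch of these properties, the matrix in Table 3.1 can be stored as a list of rows and inspected with a few small checks. The helper names (`is_symmetric`, `self_loops`, `is_binary`) are illustrative choices, not part of any particular tool.

```python
# Table 3.1 as an adjacency matrix: each row "points to" the columns holding a 1.
names = ["Ann", "Bob", "Carol"]
matrix = [
    [0, 1, 1],  # Ann points to Bob and Carol
    [0, 0, 0],  # Bob points to no one
    [1, 0, 0],  # Carol points to Ann
]

def is_symmetric(m):
    """True when every tie is mirrored across the diagonal (an undirected network)."""
    size = len(m)
    return all(m[i][j] == m[j][i] for i in range(size) for j in range(size))

def self_loops(names, m):
    """Names of vertices whose diagonal cell is nonzero."""
    return [names[i] for i in range(len(m)) if m[i][i] != 0]

def is_binary(m):
    """True when the matrix holds only 0s and 1s (an unweighted network)."""
    return all(cell in (0, 1) for row in m for cell in row)

print(is_symmetric(matrix))       # directed networks are generally not symmetric
print(self_loops(names, matrix))  # empty when the diagonal is all zeros
print(is_binary(matrix))
```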

Advanced topic

The foundations of graph theory

Network analysis is rooted in the work of the mathematician Leonhard Euler, who in 1736 studied the question of whether a single path could be walked over the Seven Bridges of Königsberg that connected islands in the river Pregel (which flows through what was then Prussia and is now Kaliningrad in Russia) without crossing any bridge more than once.¹ By reimagining the problem in terms of vertices and edges, he showed it is impossible to cross each bridge just once. Although the problem seems abstract, its solution led to the development of the mathematics of graph theory and, notably, hundreds of years later, the mathematical work of Paul Erdős and Alfréd Rényi on random graphs in the 1950s, an important theoretical development that allows for the generation of a graph from random processes. Social network analysis builds on these concepts and extends them to capture the nonrandom connections that occur among groups of people.

An alternative to the matrix data format that is a more efficient representation of a network is called an “edge list.” As its name suggests, it is simply a list of all edges in the network as shown in Table 3.2. This is the same network as shown in Table 3.1. Individuals in the Vertex1 column “point to” those in the Vertex2 column. Unless data describing the value of each edge are provided in additional columns, the network is implied to be a binary one. Self-loops are possible to represent in edge lists by having a row with the person’s name repeated in both columns. Throughout this book, you will use edge lists instead of matrices. Edge lists are “efficient” in that they only record a row of data for each connection that does exist in a network, rather than store a “zero” for each possible connection that does not exist. Edge lists can be smaller files and easier to edit and review.

Table 3.2

A network represented as an edge list^a
Vertex1	Vertex2
Ann	Bob
Ann	Carol
Carol	Ann

^a Individuals in the Vertex1 column “point to” those in the Vertex2 column in this directed network. The network is implied to be a binary network. Additional columns could be used to describe each edge. For example, an Edge Weight column could be added with values representing the strength of various ties.
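To make the relationship between the two formats concrete, the following sketch converts the matrix of Table 3.1 into the edge list of Table 3.2. The function name `matrix_to_edge_list` is a hypothetical helper introduced for illustration.

```python
names = ["Ann", "Bob", "Carol"]
matrix = [
    [0, 1, 1],
    [0, 0, 0],
    [1, 0, 0],
]

def matrix_to_edge_list(names, matrix):
    """Emit one (Vertex1, Vertex2) row for each nonzero cell of the matrix."""
    edges = []
    for i, source in enumerate(names):
        for j, target in enumerate(names):
            if matrix[i][j] != 0:
                edges.append((source, target))
    return edges

edge_list = matrix_to_edge_list(names, matrix)
for vertex1, vertex2 in edge_list:
    print(vertex1, vertex2, sep="\t")

# The matrix stores a cell for every possible tie; the edge list stores only real ties.
cells_stored = len(names) ** 2
rows_stored = len(edge_list)
```

For three people the saving is small (nine cells versus three rows), but the gap widens quickly in sparse networks with many vertices.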

The final method for representing networks is through network graphs. Figure 3.2 is a network graph based on the data in Table 3.2. It makes immediately clear that the relationship between Ann and Carol is reciprocated (i.e., there are arrows on both sides of the line connecting them) and that there is no connection between Bob and Carol. Our earlier analysis of Figure 3.1, another network graph, demonstrates how network graphs can lead to insights that are hard to identify in tabular data, particularly when large networks are presented. However, many network graphs require significant preparation to assure that they are readable as described in Section 3.9 and Chapter 4.
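The same two observations can also be recovered directly from the edge list, which is useful when a network is too large to inspect by eye. The sketch below is one simple way to do so; `reciprocated_pairs` and `connected` are hypothetical helper names.

```python
# The directed edge list from Table 3.2.
edge_list = [("Ann", "Bob"), ("Ann", "Carol"), ("Carol", "Ann")]

def reciprocated_pairs(edges):
    """Pairs of vertices that point to each other in both directions."""
    edge_set = set(edges)
    return {
        frozenset((source, target))
        for source, target in edge_set
        if source != target and (target, source) in edge_set
    }

def connected(edges, a, b):
    """True when an edge exists between a and b in either direction."""
    edge_set = set(edges)
    return (a, b) in edge_set or (b, a) in edge_set

print(reciprocated_pairs(edge_list))         # the Ann and Carol pair
print(connected(edge_list, "Bob", "Carol"))  # no tie in either direction
```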

3.3 Types of networks

Social networks range in size from a handful of people to national and planetary populations. They also differ in the types of vertices they include, the nature of the edges that connect them, and the ways in which they are formed. In this section we introduce some of the distinctions that network scientists have identified to describe different types of networks. These distinctions affect the metrics and maps generated from them, as well as their interpretation.

3.3.1 Egocentric, partial, and full networks

It is often useful to consider social networks from an individual member’s point of view. Network analysts call the individual that is the focus of attention “ego” and the people he or she is connected to “alters.” Some networks, called egocentric networks, only include individuals who are connected to a specified ego. For example, a network of your personal Facebook friends would be an egocentric network because you are, by definition, connected to all other vertices, like the hub of a wagon wheel with many spokes. Other egocentric networks and their associated “subgraphs” (see Chapter 7) may extend out from an ego, reaching not only friends, but also friends of friends. More generally, egocentric networks can extend out any number of “degrees” from an ego. The basic “1-degree” ego network consists of the ego and their alters. The “1.5-degree” ego network extends the 1-degree network by including connections between all of the alters. For example, a Facebook 1.5-degree ego network would characterize which of your friends know each other (sadly this data is no longer available from the Facebook platform). The “2-degree” ego network extends the 1.5-degree network by including all of the alters’ own alters (i.e., friends of friends), some of whom may not be connected to the ego. These three sizes of ego networks allow you to look at increasingly large, but still “local,” neighborhoods around a particular individual in a social network. Higher-degree networks (e.g., 2.5, 3) are feasible to create but not used as often in practice because they can quickly grow to a large size and become intractable. Consider, for example, that of the 1.59 billion Facebook users in 2016, there were an average of only 3.57 “intermediaries” between any two people in the network!²
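The three ego network sizes can be illustrated with a short sketch over a small, made-up set of undirected friendship ties. The names and the `ego_network` helper are assumptions for illustration only; the function follows the definitions given above.

```python
from collections import defaultdict

# Hypothetical undirected friendship ties.
friendships = [
    ("Ego", "Ann"), ("Ego", "Bob"), ("Ego", "Carol"),
    ("Ann", "Bob"),     # two alters who know each other
    ("Carol", "Dave"),  # Dave is a friend of a friend
    ("Dave", "Erin"),   # Erin is three steps away from Ego
]

def build_neighbors(edges):
    """Map each vertex to the set of vertices it is tied to."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return neighbors

def ego_network(edges, ego, degree):
    """Return the ties in the 1-, 1.5-, or 2-degree ego network around ego."""
    core = build_neighbors(edges)[ego] | {ego}  # the ego plus its alters
    if degree == 1:
        # Only ties between the ego and an alter.
        return {frozenset(e) for e in edges if ego in e}
    if degree == 1.5:
        # Add ties among the alters themselves.
        return {frozenset(e) for e in edges if set(e) <= core}
    if degree == 2:
        # Add every tie that reaches from an alter to that alter's own alters.
        return {frozenset(e) for e in edges if set(e) & core}
    raise ValueError("degree must be 1, 1.5, or 2")

for degree in (1, 1.5, 2):
    print(degree, len(ego_network(friendships, "Ego", degree)))
```

In this toy network the tie between Dave and Erin never appears, because neither of them is the ego or one of the ego’s alters.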

Networks that are smaller than the complete human population are often interesting, and some can be small enough to be manageable with the resources available in a desktop or laptop computer. A “full” or “complete” network contains the subset of people or entities who match some interest or attribute and includes information about the set of connections among them all. All the “egos” in a full network are treated equally; none is assumed to be the “ego” of the network, although analysis of these networks will reveal that some people are more strategically located in the network than others. A full network is often created and available when a single system, such as a social media platform, acts as a hub among a group of connected people. For example, the Twitter network includes all users of the service and the connections between them. In practice, it is not always feasible (or particularly insightful) to analyze a platform-scale full network in one dataset. Instead, analysts create more selective sub-networks by selecting a sample or slice of the larger complete network. For example, Figure 3.1 showed the slice of the Twitter network that included the connections among the 100 user accounts for the members of the 115th United States Senate. This partial network is based on a known list of users. Other networks are topic-centric: they start with a search term, and the people who will be included in the data are not (necessarily) known prior to the data collection. Other partial networks may be created to include a subgroup of users (e.g., all conference attendees), or include only people and connections that occurred within a specified time frame, or be limited to people who have certain characteristics (e.g., CEOs of Fortune 500 companies, members of a national or state legislature).

3.3.2 Unimodal, multimodal, and affiliation networks

Up until this point we have only considered networks that connect the same type of entity. These standard networks are called unimodal networks because they include one type (i.e., mode) of vertex. They connect users to users or they connect documents to documents, but they don’t include both users and documents. However, networks can include different types of vertices creating multimodal networks. Chapter 6 includes an example multimodal network that connects Marvel Movies to Characters in those movies. Rich sets of intersecting networks often form in social media environments composed of connections between people, photos, videos, messages, documents, groups, organizations, locations, and services. In many cases, these multimodal networks have to be transformed into simpler unimodal networks to perform meaningful network analysis, as most network metrics are designed for unimodal networks.

A common type of multimodal network is a bimodal network with exactly two types of vertices. Data for these networks often include individuals and some event, activity, or content with which they are affiliated, creating an affiliation network. For example, an affiliation network may connect users with the wiki pages they have edited. People are affiliated with pages. In this network, no two users would directly connect to each other. Likewise, no two pages would directly connect to each other. Pages only link to people (i.e., editors).

Bimodal affiliation networks can be transformed into two separate unimodal networks: a “user edits page” network can be converted into a user-to-user network and a page-to-page network (see Chapter 6, Advanced topic: Transforming a bimodal affiliation network into two unimodal networks for details). The user-to-user network connects people based on their indirect links to one another through edits to a common page. For example, in a wiki co-edit affiliation network Derek and Marc would be strongly connected because they both edit many of the same wiki pages. In contrast, a page-to-page network connects pages based on the number of shared editors. For example, a pair of wiki pages would be closely connected if many people edited both of the pages (see Chapter 14). More generally, this approach can be used to relate objects of all types (e.g., books, photos, and audio recordings) based on users’ behaviors (e.g., purchasing or reading habits) and preferences (e.g., ratings). Affiliation networks are the raw material of many recommender systems that recommend items of interest, such as Amazon’s “Customers Who Bought This Item Also Bought” feature. A network data structure can return results to queries like “people who linked to this document also linked to these documents” or “if you link to this document, you may want to link to these people.”
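A minimal sketch of this transformation follows, assuming a tiny invented “user edits page” edge list. The `project` function and the page names are hypothetical, and weighting each tie by the count of shared pages or shared editors is one common choice rather than the only one.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical "user edits page" affiliation edges.
edits = [
    ("Derek", "PageA"), ("Derek", "PageB"),
    ("Marc", "PageA"), ("Marc", "PageB"),
    ("Ben", "PageB"),
]

def project(affiliations, onto_users=True):
    """Project a bimodal edge list onto one mode.

    The weight of each resulting tie counts the partners the two vertices
    share in the other mode (shared pages for users, shared editors for pages).
    """
    groups = defaultdict(set)
    for user, page in affiliations:
        if onto_users:
            groups[page].add(user)  # users grouped by the page they edited
        else:
            groups[user].add(page)  # pages grouped by the user who edited them
    weights = defaultdict(int)
    for members in groups.values():
        for a, b in combinations(sorted(members), 2):
            weights[(a, b)] += 1
    return dict(weights)

user_to_user = project(edits, onto_users=True)
page_to_page = project(edits, onto_users=False)
print(user_to_user)
print(page_to_page)
```

Here Derek and Marc end up with the strongest tie because they co-edited both pages, while Ben is tied to each of them only through PageB.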

3.3.3 Multiplex networks

Although it is common for two people to be connected in many different ways (e.g., by exchanging phone calls, emails, sharing group membership, and being married), most networks only include one type of connection or edge. However, it is possible to consider networks with multiple types of connections, called multiplex networks. For example, the Twitter network shown in Figure 3.1 includes two types of directed edges: “reply to” relationships and “mention” relationships. The network graph visualization could have uniquely represented each type of edge by using color, different edge types (e.g., dotted lines, solid lines), or edge labels (see Chapter 5). In the case of Figure 3.1, the difference between the two types of edge (reply and mention) was not deemed important, so the multiplex network data were condensed into a uniplex network that showed a single directed edge if either type of connection was present. This strategy of combining multiple types of edges is a common one that allows for the use of network metrics, which are mostly based on uniplex networks.
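Condensing a multiplex network into a uniplex one amounts to ignoring the edge type and keeping each directed pair once. The sketch below shows one way to do this; the account names are placeholders and `to_uniplex` is a hypothetical helper.

```python
# Hypothetical multiplex edges: (source, target, edge type).
multiplex_edges = [
    ("SenatorA", "SenatorB", "mention"),
    ("SenatorA", "SenatorB", "reply"),
    ("SenatorB", "SenatorC", "mention"),
]

def to_uniplex(edges):
    """Keep one directed edge per (source, target) pair, whatever its type."""
    return sorted({(source, target) for source, target, _ in edges})

uniplex_edges = to_uniplex(multiplex_edges)
print(uniplex_edges)
```

If the number of collapsed ties matters for your analysis, the same pass could count them and store the total as an edge weight instead of discarding it.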

3.4 The network analysis research and practitioner landscape

You can find network scientists in nearly every academic discipline and an increasing number of practitioner communities. Network concepts and techniques are now widely found throughout a range of disciplines including sociology, anthropology, communications, computer science, education, economics, physics, management, information science, medicine, political science, public health, psychology, biology, history and digital humanities. In the past several decades, social scientists have shown that network structures have a profound influence on health, work, and community. Getting a job, being promoted, catching an illness, adopting an innovation, and many more activities and processes have been explained in the terms of social networks. Network structures are important in the biological sciences where research is focused on connections between metabolic and genetic processes. The shape and function of networks can have great consequences as ideas, genes, innovations, or pathogens diffuse through populations. Researchers now apply network theory and methods to understanding how Supreme Court decisions relate to previous cases, how the United States Senate votes (see Chapter 7), how epidemics spread within cities, and how characters in a movie relate to one another (see Chapter 6). Networks are formed from many physical processes and are echoed in a number of structures created inside information systems such as the collection of linked documents within the World Wide Web or an enterprise’s collections of files and emails. Information scientists use these links to identify high-quality web pages (e.g., Google’s PageRank algorithm), or use the citations from research articles to identify high-impact articles and authors.

Network methods are diffusing beyond academic research, becoming an important tool for managing organizations, markets, and movements. Entrepreneurs apply network analysis techniques to understand how to leverage the powerful effects of word-of-mouth marketing as their customers spread news about their new products to one another. Many politicians recognize the potential power of a connected network of supporters who can be turned into contributors, volunteers, and voters. Engineers use network analysis to build more effective power grids, computer networks, and transportation systems. Law enforcement officers and lawyers analyze email networks to identify and prosecute potential criminals. And the intelligence community seeks to identify national security threats by looking at networks created by communication links, money trails and kinship. Having at least a basic understanding of network thinking and concepts is a core literacy of our time. Like statistics, network analysis has countless applications to a number of fields.

This book primarily focuses on social network analysis, a subfield of network science that focuses on networks that connect people or social units (e.g., organizations, teams) to one another (see Advanced topic: Early social network analysis). We are also interested in networks that connect human-generated content or artifacts, such as websites or cell phones, and in social media networks.

Advanced topic

Early social network analysis

The social science roots of social network analysis can be found in the early 1800s in the work of the person credited with being the first sociologist, Auguste Comte, and later in the early 1900s in the work of the sociologist Georg Simmel. Both saw patterns of social ties as the main focus of sociology in contrast to the study of individuals and their attributes. Early in the 19th century, Comte defined society as more than simply a group of people. He argued that a population became a society only when people had influence on one another and considered the choices and interests of others as part of their own choices. Simmel echoed these ideas at the turn of the twentieth century, focusing social science on the study of how people come together and form groups and associations. These sociologists imagined society as composed of a web of relationships—more than a mass of individuals; they saw societies as networks of interaction and influence.

The idea of connected actions linking people to one another has remained at the core of the social sciences, but efforts to create a systematic language to record social relationships started only in the 20th century. Anthropologists studying the range of kinship systems they documented in fieldwork from around the world created symbol systems that are related to social network analysis. Their maps of who is related to whom were early forms of social networks focused on just the subset of social ties that are considered to be “family.” The core concepts and methods of modern social network analysis date from the 1930s and the pioneering work of Jacob Moreno and his many collaborators. Researchers at New York University, Columbia, and Harvard created the first scholarly works featuring the distinctive core components of modern social network theory: measures, maps, and models. Moreno and his research partners created the first pictures of patterns of groups of people and their partnerships, using visual maps with symbols that represented individuals with different types of lines connecting them to others that represented different kinds of relationships.

Moreno documented relationships among schoolchildren and the way an innovative behavior, running away, moved through chains of student connections. In 1934, Moreno [8] published “Who Shall Survive?”, which catalyzed work among a group of scholars who refined his approach and added critical mathematical elements that today are a standard part of network analysis. These approaches were applied to various settings, and revealed the key roles a relatively small number of people played in their networks along with the presence of subgroups of distinct people. For example, in the 1930s, Davis et al. collected detailed records of observed attendance at 14 social events by 18 southern women, and the graph of that data revealed two distinct groups with minimal overlap [9]. Moreno developed sociometry and is often considered the founder of the sociogram, applying these diagrams in studies of relationships among members of a football team. These diagrams revealed patterns of friendship and animosity (see Figure 3.3) (as produced in Freeman [10]).

Figure 3.3 Jacob Moreno's early social network diagram of positive and negative relationships among members of a football team. Originally published in Moreno, J.L., 1934. Who Shall Survive? Nervous and Mental Disease Publishing Company, Washington, DC.

At Harvard in the 1930s, a group formed around W. Lloyd Warner and Elton Mayo to explore interpersonal relationships in workplaces. Early social network analysis work focused on connections in small work groups in industrial factory settings. For example, Roethlisberger and Dickson [11] studied the Western Electric wiring room, documenting the ways individuals within a group worked with one another. As seen in Figure 3.4, some workers in the study emerged as the most connected, whereas others appeared as peripheral or isolated. Another dataset was created that represented the relationships among 14 manufacturing employees of the Western Electric Hawthorne Plant. Employees and two inspectors were observed, and each contact among them was coded. When employees played games with one another, argued, or were openly friendly, confrontational, or helpful, a note and a tie were recorded. The result was six networks, which led to a seminal work by the Harvard sociologist George Homans [12] and later to more mathematical work that focused on automatically finding clusters or groups within these datasets [13]. In the 1950s, Nadel wrote about social roles and the social structures that define them [14]. He saw that the patterns of connections people had might be similar, even if they were connected to different people. These patterns, Nadel suggested, could be studied systematically, but in the 1950s the limits of data and computational resources made that ambition a challenge.

Figure 3.4 An early social network diagram of relationships among workers in a factory illustrates the positions different workers occupy within the workgroup. From Management and the Worker: An Account of a Research Program Conducted by the Western Electric Company, Hawthorne Works, Chicago by F. J. Roethlisberger and William J. Dickson, Cambridge, Mass.: Harvard University Press, Copyright © 1939 by the President and Fellows of Harvard College. Copyright renewed © 1966 by the President and Fellows of Harvard College.

Over time, Moreno's colleagues, including Paul Lazarsfeld, added key ingredients of the modern form of social network analysis: metrics and algorithms for calculating important network properties of the graph as a whole and for each individual in the graph (see Freeman [10] for details).

3.5 Network analysis metrics

Social scientists, physicists, computer scientists, and mathematicians have collaborated to create novel theories and algorithms for calculating measurements of social networks and the people and things that populate them. These quantitative network metrics allow analysts to systematically inspect the patterns of connection within the social world, creating a basis on which to compare networks, track changes in a network over time, and determine the relative position of individuals and clusters within a network.

Social network measures initially focused on simple counts of connections and over time became more sophisticated as they incorporated concepts of network density, centrality, structural holes, balance, and transitivity. Some metrics describe a network as a whole. For example, vertex count is the number of entities in the network, while edge count is the number of connections among them. Another whole-network metric, “density,” captures how connected a set of vertices is by calculating the percentage of the maximum possible number of connections (the count if everyone were connected to everyone) that is actually observed. Other metrics are calculated for each vertex in a network. For example, “centrality” measures, of which there are many, capture how “important” (central) a vertex is within the network based on some objective criteria. Some people sit at the edge or periphery of their networks, whereas others are firmly at the center, connected to many of the other most connected people. In most human networks, even highly connected networks, some pairs of people are not directly connected. When a third person bridges a connection (a “friend of a friend”), we can think of that person as a broker, a “bridge” or a “connector.” When that person is missing, we can think of the gap as a “structural hole,” a place in which there is a missing connector, potentially a good spot to build a “bridge.” The following sections describe some of these metrics in more detail. Chapter 6 introduces some of the core metrics found in NodeXL through hands-on exercises.

Advanced topic

Historical obstacles to the development of network analysis

Following the rapid development of the major elements of social network analysis in the 1930s there was a period of stagnation and neglect. For a variety of reasons, from Moreno’s own personal and professional conflicts to the cost and lack of available network datasets and computing resources, social network analysis languished for decades.

The early social network literature was built on manually collected and processed data about social ties. Researchers would typically observe or survey population members, asking each to list those they came in contact with regularly for a variety of tasks and purposes. People are often unable to recall all their interactions accurately. The prohibitive cost of this approach was also a major limiting factor in the widespread application of social network analysis in enterprises and organizations. The recent explosion of computer-mediated social relationships and the associated drop in the costs of creating network datasets have made network approaches increasingly practical. As more details about our interactions and associations are tracked and captured by mobile devices and social media services, network analysis becomes increasingly useful.

Network analysis is computationally intensive: many network metrics can require millions of calculations even when processing modest-sized datasets. The recent explosion of computing power and the associated drop in costs have made network approaches increasingly practical, even if network methods remain among the most computationally intensive in use.

3.5.1 Aggregate network metrics

A number of metrics are used to describe and summarize an entire network. In some cases, a single network dataset contains sub-networks separated into several disconnected pieces, called components. Some aggregate network metrics only work on networks where all of the vertices are connected in a single component, whereas others can be applied to entire networks even if they are split up into disconnected segments. Here we describe just a few aggregate network metrics to give a flavor for what is possible, leaving a fuller discussion for Chapter 6.
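As a minimal sketch of how components can be identified, the following Python snippet groups the vertices of an undirected network by breadth-first search. The function name, the people, and the ties are hypothetical illustrations, not data from this chapter.

```python
from collections import defaultdict, deque

def connected_components(vertices, edges):
    """Return a list of components (sets of vertices) of an undirected network."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen = set()
    components = []
    for start in vertices:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in neighbors[v]:
                if w not in seen:
                    seen.add(w)
                    component.add(w)
                    queue.append(w)
        components.append(component)
    return components

# Hypothetical example: two separate groups plus one isolated person.
vertices = ["Ann", "Bob", "Cat", "Dan", "Eve", "Fay"]
edges = [("Ann", "Bob"), ("Bob", "Cat"), ("Dan", "Eve")]
components = connected_components(vertices, edges)
```

Here the network splits into three components: Ann, Bob, and Cat; Dan and Eve; and Fay on her own. A metric that requires a single connected component would have to be computed separately for each piece.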

As mentioned, density is an aggregate network metric used to describe the level of interconnectedness among a set of vertices. Density is a count of the number of relationships observed to be present in a network divided by the total number of possible relationships that could be present. It is a quantitative way to capture important sociological ideas like cohesion, solidarity, and membership.
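The density calculation described above can be sketched in a few lines of Python. This is an illustrative sketch that assumes a simple network with no self-loops and no duplicate edges; the function name and the `directed` parameter are our own choices.

```python
def density(vertex_count, edge_count, directed=False):
    """Share of possible edges that are actually present (0.0 to 1.0)."""
    if vertex_count < 2:
        return 0.0  # no pairs of vertices, so no possible edges
    # Every ordered pair of distinct vertices could hold a directed edge.
    possible = vertex_count * (vertex_count - 1)
    if not directed:
        # An undirected edge covers both orderings of a pair.
        possible //= 2
    return edge_count / possible

# Hypothetical example: 5 people with 4 observed friendship ties.
undirected_density = density(5, 4)                 # 4 of 10 possible ties
directed_density = density(5, 4, directed=True)    # 4 of 20 possible ties
```

The same four ties yield a lower density in the directed case because each pair of people could be connected in two directions.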

Centralization is an aggregate metric that characterizes the extent to which the network is centered on one or just a few important vertices. Centralized networks have many edges that emanate from a few important vertices, whereas decentralized networks have many vertices with many interconnections. Networks with high levels of centralization are likely to be more hierarchical, with a few people playing hub roles.
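One common way to quantify this idea is Freeman's degree centralization, which compares every vertex's degree with the highest degree in the network. The text above does not commit to a particular formula, so the sketch below is one possible formulation, assuming an undirected network without self-loops or duplicate edges. The example networks are hypothetical.

```python
from collections import Counter

def degree_centralization(vertices, edges):
    """Freeman degree centralization: 1.0 for a star, 0.0 when all degrees are equal."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    n = len(vertices)
    if n < 3:
        return 0.0  # the measure is not meaningful for fewer than 3 vertices
    max_degree = max(degree[v] for v in vertices)
    # Sum of the gaps between the best-connected vertex and everyone else.
    total_gap = sum(max_degree - degree[v] for v in vertices)
    # (n - 1)(n - 2) is the largest possible total gap, reached by a star.
    return total_gap / ((n - 1) * (n - 2))

people = ["Hub", "A", "B", "C", "D"]
star = [("Hub", "A"), ("Hub", "B"), ("Hub", "C"), ("Hub", "D")]
ring = [("Hub", "A"), ("A", "B"), ("B", "C"), ("C", "D"), ("D", "Hub")]
star_score = degree_centralization(people, star)
ring_score = degree_centralization(people, ring)
```

The star, in which every tie runs through one hub, scores 1.0, while the ring, in which everyone has exactly two ties, scores 0.0.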

Other metrics integrate attribute data with network data. For example, metrics that measure homophily look at the similarity of people who are connected. Studies typically show that people are connected to others who are similar to themselves on core attributes like income, education level, religious affiliation, and age.
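As a rough illustration of combining attribute data with network data, the sketch below computes the share of edges that connect two people with the same attribute value. This is a deliberately crude indicator chosen for brevity; established homophily measures, such as assortativity coefficients, also account for how common each attribute value is. All names and attribute values are hypothetical.

```python
def same_attribute_share(edges, attribute):
    """Fraction of edges whose two endpoints share the same attribute value."""
    if not edges:
        return 0.0
    same = sum(1 for a, b in edges if attribute[a] == attribute[b])
    return same / len(edges)

# Hypothetical attribute data: each person's education level.
education = {"Ann": "college", "Bob": "college", "Cat": "college",
             "Dan": "high school", "Eve": "high school"}
ties = [("Ann", "Bob"), ("Bob", "Cat"), ("Dan", "Eve"), ("Cat", "Dan")]
share = same_attribute_share(ties, education)
```

In this example three of the four ties link people with the same education level, so the share is 0.75.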

3.5.2 Vertex-specific network metrics

A set of network metrics is similar to the geographic concepts of latitude and longitude, coordinates that identify each individual's position within a network. Paramount among these is the set of “centrality” measures, which describe how a particular vertex can be said to be in the “middle” of a network. In the 1970s and 1980s, the sociologist Phillip Bonacich developed a refined measure of centrality that took into consideration the different value a highly-connected person can have in contrast to people with a few rare connections. Network theorists noted that simply having many connections, called “degree centrality,” was only one way to be “at the center” of things. A person with fewer connections might have more rare and potentially “important” connections than someone with more connections. One connection can be more important than another in different ways. Some are better because they bridge across otherwise separated portions of the network, whereas others are important because they connect to well-connected people. The following centrality metrics provide quantifiable measures for these concepts (see Chapter 6 for more details).

Degree centrality

Degree centrality is a simple count of the total number of edges connected to a vertex. It can be thought of as a kind of popularity measure, but a crude one that does not recognize a difference between quantity and quality. Degree centrality does not differentiate between a link to the CEO of a big company and a link to its most recent trainee hire. For directed networks, where relationships have an origin and a destination rather than being mutual, there are two measures of degree: in-degree and out-degree. In-degree is the number of connections that point inward at a vertex. Out-degree is the number of connections that originate at a vertex and point outward to other vertices.
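In-degree and out-degree can be counted directly from a directed edge list, as in the following minimal sketch. The "follows" relationships and names are hypothetical, and each pair is read as (origin, destination).

```python
from collections import Counter

def in_and_out_degree(directed_edges):
    """Count edges arriving at (in-degree) and leaving (out-degree) each vertex."""
    in_degree = Counter()
    out_degree = Counter()
    for source, target in directed_edges:
        out_degree[source] += 1   # edge originates at `source`
        in_degree[target] += 1    # edge points inward at `target`
    return in_degree, out_degree

# Hypothetical example: who follows whom.
follows = [("Ann", "Bob"), ("Cat", "Bob"), ("Bob", "Ann"), ("Dan", "Bob")]
in_degree, out_degree = in_and_out_degree(follows)
```

Bob has an in-degree of 3 but an out-degree of only 1, while Dan follows one person and is followed by no one. A `Counter` returns 0 for any vertex that never appears in a given role.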

Betweenness centrality: Bridge scores for boundary spanners

The notion of connection paths is central to the study of networks. Perhaps one of the most natural questions to ask about any two people in a network is “How far apart are they?” This distance is measured simply: the distance between people who are not neighbors is the smallest number of neighbor-to-neighbor hops needed to connect one to the other. For instance, people who are not your neighbors, but are your neighbors' neighbors, are at a distance of 2 from you, and so on. The length of the shortest path between two people is called the “geodesic distance” and is used in many centrality metrics. For example, betweenness centrality is a measure of how often a given vertex lies on the shortest path between two other vertices. This can be thought of as a kind of “bridge” score, a measure of how much removing a person would disrupt the connections between other people in the network. The idea of brokering is often captured in the measure of betweenness centrality.
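Both ideas can be sketched in Python for an undirected, unweighted network. Geodesic distances come from a breadth-first search, and the betweenness function follows the approach of Brandes's algorithm, which counts shortest paths from each vertex in turn. The scores are left unnormalized, and the example network of two triangles joined by a single tie is hypothetical.

```python
from collections import defaultdict, deque

def build_neighbors(edges):
    """Map each vertex to the set of vertices it is directly tied to."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return neighbors

def geodesic_distances(neighbors, source):
    """Number of hops from `source` to every vertex it can reach."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in neighbors[v]:
            if w not in distance:
                distance[w] = distance[v] + 1
                queue.append(w)
    return distance

def betweenness(neighbors):
    """Unnormalized betweenness centrality for every vertex."""
    vertices = list(neighbors)
    score = {v: 0.0 for v in vertices}
    for s in vertices:
        # sigma[v] counts the shortest paths from s to v.
        sigma = {v: 0 for v in vertices}
        sigma[s] = 1
        distance = {s: 0}
        predecessors = {v: [] for v in vertices}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in neighbors[v]:
                if w not in distance:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    sigma[w] += sigma[v]
                    predecessors[w].append(v)
        # Walk back from the farthest vertices, crediting those on the paths.
        dependency = {v: 0.0 for v in vertices}
        for w in reversed(order):
            for v in predecessors[w]:
                dependency[v] += sigma[v] / sigma[w] * (1 + dependency[w])
            if w != s:
                score[w] += dependency[w]
    # Each pair was counted once from each endpoint, so halve the totals.
    return {v: total / 2 for v, total in score.items()}

# Hypothetical example: two triangles joined by the Cat-Dan tie.
edges = [("Ann", "Bob"), ("Ann", "Cat"), ("Bob", "Cat"), ("Cat", "Dan"),
         ("Dan", "Eve"), ("Dan", "Fay"), ("Eve", "Fay")]
neighbors = build_neighbors(edges)
distances_from_ann = geodesic_distances(neighbors, "Ann")
scores = betweenness(neighbors)
```

Ann is three hops from Eve. Cat and Dan each score 6 because every shortest path between the two triangles runs through them, whereas Ann, Bob, Eve, and Fay score 0.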

A “structural hole” is a term for recognizing a missing bridge. Wherever two or more groups fail to connect, one can argue that there is a structural hole, a gap waiting to be filled. Burt provides compelling evidence that individuals who bridge structural holes within their organizations are promoted faster than others [15]. Social network analysis has many strategic applications for people in an organization to analyze their position and the position of others. Managers and leaders can recognize gaps or disconnections within organizations and devote resources to bridging the divide. People may be able to apply social network analysis to identify locations in which a gap exists and elect to fill them, recognizing the value they can generate as broker between two otherwise separate groups.

Closeness centrality: Distance scores for strategically located people

Closeness centrality measures each individual’s position in the network from a different perspective than the other network metrics, capturing the average distance between each vertex and every other vertex in the network. Assuming that vertices can only pass messages to or influence their existing connections, a low closeness centrality means that a person is directly connected or “just a hop away” from most others in the network. In contrast, vertices in very peripheral locations may have high closeness centrality scores, indicating the high number of hops or connections they need to take to connect to distant others in the network. Think of closeness, paradoxically, as a “distance” score. Some people are just a few miles from the big city, others must drive for hours: similarly, people with high “closeness” centrality scores have many miles, or rather personal connections, that they must travel to reach many other people in the network. Note that in some cases the inverse of the average distance to others in the network is used as a measure of closeness centrality. In that case, higher values indicate a more central position.
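Both conventions can be illustrated with a short sketch that computes the average geodesic distance from one vertex and then its inverse. The sketch assumes an undirected, connected network; vertices that cannot be reached are simply ignored, which is one of several possible ways to handle them. The chain of five people is a hypothetical example.

```python
from collections import defaultdict, deque

def average_distance(edges, vertex):
    """Mean geodesic distance from `vertex` to every vertex it can reach."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    # Breadth-first search gives the hop count to each reachable vertex.
    distance = {vertex: 0}
    queue = deque([vertex])
    while queue:
        v = queue.popleft()
        for w in neighbors[v]:
            if w not in distance:
                distance[w] = distance[v] + 1
                queue.append(w)
    hops = [d for v, d in distance.items() if v != vertex]
    if not hops:
        raise ValueError("vertex has no connections")
    return sum(hops) / len(hops)

# Hypothetical example: a chain Ann - Bob - Cat - Dan - Eve.
chain = [("Ann", "Bob"), ("Bob", "Cat"), ("Cat", "Dan"), ("Dan", "Eve")]
cat_average = average_distance(chain, "Cat")   # middle of the chain
ann_average = average_distance(chain, "Ann")   # end of the chain
cat_inverse = 1 / cat_average
ann_inverse = 1 / ann_average
```

Cat, in the middle, averages 1.5 hops to everyone else, while Ann, at the end, averages 2.5. Under the "distance" convention Cat has the lower (more central) score; under the inverse convention Cat has the higher one.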

Eigenvector and PageRank centrality: Influence scores for strategically connected people

Eigenvector centrality is a more sophisticated view of centrality: a person with few connections could have a very high eigenvector centrality if those few connections were to very well-connected others. Eigenvector centrality allows for connections to have a variable value, so that connecting to some vertices has more benefit than connecting to others. The PageRank algorithm used by Google's search engine is a variant of eigenvector centrality, primarily used for directed networks. PageRank considers (1) the number of in-bound links (i.e., sites that link to your site), (2) the quality of the linkers (i.e., the PageRank of sites that link to your site), and (3) the link propensity of the linkers (i.e., the number of sites the linkers link to). See Chapter 6 for a more in-depth discussion and examples.
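The three PageRank ingredients listed above can be seen in a simplified power-iteration sketch. This is an illustration of the general idea rather than Google's production algorithm. The damping factor of 0.85 is a commonly used value that we assume here, the fixed iteration count is an arbitrary choice, and the four-page web is hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # (2) the linker's own rank, (3) split across all of its links
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share   # (1) summed over in-bound links
            else:
                # A page with no out-links spreads its score evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical web: A, B, and D all link to C; C links only to A.
web = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
```

C ranks highest because it has the most in-bound links. A ranks above B even though each has a single out-link and A has only one in-bound link, because A's one in-bound link comes from the highly ranked C.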

Clustering coefficient: How connected are my friends?

The clustering coefficient metric differs from measures of centrality. It is more akin to the density metric for whole networks, but focused on egocentric networks. Specifically, the clustering coefficient is a measure of the density of the 1.5-degree egocentric network for each vertex. When these connections are dense, the clustering coefficient is high. If your “friends” (alters) all know each other, you have a high clustering coefficient. If your “friends” (alters) don’t know each other, then you have a low clustering coefficient. People have different measures for their clustering coefficient depending on the ways they cultivate connections to others and the environments they are in.
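A minimal sketch of this calculation for a single vertex (the "ego") in an undirected network follows. It counts how many pairs of the ego's alters are themselves connected and divides by the number of possible pairs. The friendship ties are hypothetical, and returning 0.0 for a vertex with fewer than two alters is a convention we assume here.

```python
from collections import defaultdict
from itertools import combinations

def clustering_coefficient(edges, ego):
    """Density of ties among the alters (direct neighbors) of `ego`."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    alters = neighbors[ego]
    k = len(alters)
    if k < 2:
        return 0.0  # no pairs of alters to compare
    # Count the pairs of alters that are directly tied to each other.
    linked = sum(1 for a, b in combinations(alters, 2) if b in neighbors[a])
    return linked / (k * (k - 1) / 2)

# Hypothetical example: Ann has three friends; only Bob and Cat know each other.
ties = [("Ann", "Bob"), ("Ann", "Cat"), ("Ann", "Dan"), ("Bob", "Cat")]
ann_score = clustering_coefficient(ties, "Ann")
bob_score = clustering_coefficient(ties, "Bob")
```

Only one of the three possible pairs among Ann's friends is connected, so her clustering coefficient is 1/3. Bob's two friends, Ann and Cat, know each other, so his is 1.0.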

3.5.3 Grouping, clustering, and community detection algorithms

A network approach can discover and identify the boundaries of groups and clusters, or apply existing information about each vertex to create categories or divisions. In a network perspective, people maintain many relationships and are potentially members in many loosely defined groups and clusters. Defining exact group boundaries in a network may be difficult, reflecting the reality of people with multiple and shifting memberships. From a network perspective, a group is a collection of vertices. Groups can be formed for many reasons; in some cases, some vertices are more connected to one another than they are to others. Relatively more cohesive or densely connected sets of vertices form regions, also called clusters, that may reflect the existence of groups. A group of people discovered in this way might not be explicitly named or recognized. Members of a network cluster might not recognize their collective membership despite their individual connections to others in the group. A rapidly growing body of research describes clustering algorithms, also called community detection algorithms, that automatically identify these clusters based on network structures, as discussed in Chapter 7.

3.5.4 Structures, network motifs, and social roles

Two people within a network may sometimes share a pattern of connection to other people, even if they do not connect to the same people. Certain professions have distinct patterns of connections, either linking with many others (real estate agents and other retail professionals) or few (reclusive authors and artists, remote office workers, and some people whose work focuses on things rather than people). In addition to having the same number of connections, some people share the same pattern of connections among the people with whom they connect. In some cases people are connected to people who are strangers to one another; in other cases a group may be densely connected to one another. These secondary patterns of connection are a distinctive feature of network analysis approaches: networks are as much about the attributes and patterns of connection among neighbors as they are about the attributes and connections of any individual.

Social roles are complex cultural and structural features of social life. An example social role like “father” is explicitly recognized in society, has a wide set of culturally shared meanings and expectations, is associated with particular goals and interests, and is partly defined by the content and structure of actions directed toward other distinctive role holders. Other types of social roles may not be as clearly defined or explicitly recognized by all the actors in a given social setting, but they have identifiable content, behavioral, and structural features.

Studies of social media have illustrated the ways contributors create distinctive network patterns that reflect their role or status within the community (e.g., Welser, Gleave, and Smith [16]). These patterns are evidence of specialization of behavior in these social spaces. Examples of roles in a social media space include the “answer person,” who disproportionately provides the answers to questions asked in message board environments (see Chapter 10); “discussion people,” who engage in extended exchanges of messages in large and populous threaded discussions; “discussion starters,” who demonstrate influence over the topics discussed by the “discussion people”; “influential” people, who are well connected to others who are more highly connected than they are; and boundary spanners, who bridge between unconnected subgroups.

Advanced topic

A renaissance of network research and data

Since the 1960s, network analysis has blossomed. New research and methods have flourished and social networking has gained a new prominence in mainstream culture. Despite early challenges, in the past several decades a healthy and growing subfield has reemerged around social network analysis. New network tools and concepts have been created and applied to a wide and growing range of domains. Mathematical sociology has developed as a major subdiscipline in the social sciences, dedicated to finding elegant descriptions of complex social phenomena. Starting 80 years ago with simple hand-drawn charts and diagrams that described small groups of people and their connections, network science concepts, methods, and tools are used today to calculate a range of measures that describe the shape, structure, and dynamics of networks with potentially millions or billions of vertices. New methods have been developed for automatically organizing and displaying visualizations of the links among large populations. This combination of structural models, visualizations, and metrics forms the key features of modern social network analysis.

In the late 1960s, Stanley Milgram explored the idea of small world networks in a study that came to be referred to as “Six Degrees of Separation” [17], which later inspired the 1990 John Guare play and 1993 movie of the same name. The Milgram study explored the question of how connected any two people selected at random might be. Milgram sent a collection of letters to randomly selected people around the United States asking them to send the message to someone they knew who could move their letter closer to the target, a stock broker in Massachusetts. On average, the letters took six steps to arrive at their destination. The “six degrees” or steps suggested that even in large networks where most people are not directly connected, people can be reached from almost every other person through a small number of steps (although possibly more than six, which was the average number of hops not the maximum!).

Sampson’s study in the late 1960s of relationships among members of a residential monastery captured social network data during an event in which several members were expelled or chose to leave [18]. A series of social network datasets were collected by asking participants about who they liked and spent time with. Social network analysis of this data allowed Sampson to identify the future lines of division among the members of the network. The idea that members of a network can be grouped based on how densely they are connected is an important concept in network analysis. The sub-groups identified by network analysis can reflect important real world social divisions with consequences for the future of that network. For example, a notable study by Zachary in the 1970s mapped the structure of a karate club based on affinities and connections between students and teachers. These maps predicted the ways the club eventually split when a new teacher, in conflict with the owner, left the studio and took many students with him [19].

The sociologist Barry Wellman demonstrated in the 1970s that real-world communities are composed of interlocking social networks of specialized relationships that changed dramatically in composition over a period of years. He proposed that society was now characterized by networked individualism in contrast to the clearly defined group memberships and identities of prior periods. Rather than defining oneself in professional or political terms, Wellman observed that people create personal networks in which they occupy distinct locations and roles. He later applied these techniques to study online networks [20]. In 1977, Wellman founded a social network analysis professional association, the International Network for Social Network Analysis (INSNA). INSNA now has more than a thousand members, many of whom have gathered for more than 20 years for an annual conference (“Sunbelt”) on social network analysis research.³ Journals and publications devoted to social network analysis include Social Networks, Connections, and the Journal of Social Structure. Social network data, methods, and visualizations appear across a much wider spectrum of journals and conference publications.

In the early 1970s, the sociologist Mark Granovetter investigated the employment market, looking at how people discovered new job opportunities. He observed that, in contrast to the view held by classical economics, people were not freely floating independent actors in the labor market. They were embedded in a set of different relationships with particular people. Granovetter found that job news passed through connections that were not the closest and most intense relationships [21]. A person’s “weak ties” brought news from distant parts of the social network to which “strong ties” did not have access because they occupied such a similar place in the network as the job seeker. Thus weak ties proved particularly useful for finding novel information, such as information about job prospects. Because weak ties were less intense, they were also less costly to maintain in terms of time and attention. As a result, it is possible to have many weak ties but only a few strong ties.

Empowered with new network metrics and the means to calculate them, network analysts have focused on a variety of data sources and questions. Social network analysis has been applied to historical studies using records of investments, marriages, and memberships in elected positions. In the 1400s in the city of Florence, the Medici and Strozzi families struggled for domination. These families, along with many others, were locked in political struggles. In the 1970s, John Padgett collected records of the social relations among Renaissance Florentine families that he extracted from historical documents. Families were often connected through a variety of ties, relations, and business connections. A dataset was created that represented the financial loans, credits and joint partnerships, and marriages that bound families to one another. The resulting dataset included information about each family as well as their links to others. Each family had a value representing its net wealth in the year 1427, the number of seats it held in the local government between the years 1282 and 1344, and the number of business or marriage ties among the population of 116 families. Analyzing these data, Padgett found that the Medici held great power because, he argued, they sat at the center of business and family networks, brokering connections that no other family could equal [22, 23].

A more modern version of the study of historical Florentine politics can be found in the study of interlocking directorships in modern corporations. Many corporations and other institutions have a board of directors, some of whom serve on more than one board. When board members serve on two or more boards, they link those corporations and, in aggregate, create interlocking directorships that combine to form even larger meta-institutions. By building on research on interlocking directorships in U.S. corporations [24, 25], websites like “They Rule” provide an interactive map that displays the common links between major corporations.⁴

In 1992, Robin Dunbar famously argued that people have an innate ability to handle a number of social relationships but not an endless number of them. Remembering people's names may have a biological limit as our brains evolved over long periods in which there were rarely more than a few hundred people within any region, group, or tribe. The number 150 has been loosely associated with the idea of a “Dunbar” number, an upper limit on the number of relationships a person can normally manage.⁵ Other social animals with smaller brains have lower Dunbar numbers than humans, suggesting that complex social relationships require more mental resources (and the cranial volume to hold them). The Dunbar number advantage humans already have can be expanded with augmentation, through analog technologies like diaries, address books, and the Filofax, and now through friend and contact list managers in social media platforms. Social media tools like Facebook, LinkedIn, text messengers and email contact lists extend our ability to maintain more relationships. These additional relationships can be said to be “weaker” than the core 150 “organic” relationships, but as Granovetter has shown, weak ties can collectively be of enormous strength and value.

Business applications of social network analysis

Social network analysis has historically been an academic endeavor, but as network analysis tools and datasets become more available, pioneering businesses are applying it to help manage business challenges, gain insight into markets and communities, and build more robust industry relationships. For example, the work of Rob Cross and the Network Roundtable focuses on several practical applications of social network analysis for corporations and other large organizations, highlighting differences between healthy and underperforming divisions and the value of organization spanning connections [26, 27]. Others apply network analysis to the improvement of corporate structures and processes [28]. In the early 1990s, Monge and Contractor [29] documented the many forms of social network patterns that emerge inside of organizations and institutions.

Social networks have been shown to have a significant influence on the adoption of new technologies or social practices. The sociologist Everett Rogers described the concept of the “diffusion of innovations,” arguing that people with particular patterns of connections to others played pivotal roles in whether a new idea or message was rejected or was adopted and distributed through the network [30]. Networks with different patterns of connection have different properties in terms of how they propagate a new message, rumor, or product and how they resist being dissolved when vertices are removed from the graph. These observations have significant implications for interventions into disease and rumor propagation and the cultivation of innovation [31].

Networks play an important role in e-commerce where collaborative filtering powers the familiar list of “books that people who liked this book also liked.” Businesses are also interested in learning the requirements of viral marketing. Diffusion can sometimes lead to “cascades,” in which an unknown, even marginal idea spreads rapidly throughout the entire network and becomes widely observed, although such cascades are still rare.⁶ Memes are a commonly cited example of contagion, as are viral messages, such as viral videos on YouTube that go from dozens to millions of viewers in a few months, weeks or even days.

3.6 Social networks in the era of abundant computation

The widespread adoption of networked communication technologies has significantly expanded the population of people who are both aware of network concepts and interested in network data. Although the idea of networks of connections of people spanning societies and nations was once esoteric, today many people actively manage an explicit social network of friends, contacts, buddies, associates, and addresses that compose their family, social, professional, and civic lives. Facebook posts forwarded from person to person have become a common and visible example of the ways information passes through networks of connected people. The notion of “friends of friends” is now easy to illustrate in the features of social media applications like Facebook and LinkedIn that provide explicitly named “social networking” services. Viral videos and chain emails illustrate the way word of mouth has moved into computer-mediated communication channels. The idea of “six degrees of separation” has moved from the offices of Harvard sociologists to the dramatic premise of a Broadway play, and it now appears as an expected feature of services that allow people to browse and connect to their friends’ friends.

As network concepts have entered everyday life, the previously less visible ties and connections that have always woven people together into relationships, cliques, clusters, groups, teams, partnerships, clans, tribes, coalitions, companies, institutions, organizations, nations, and populations have become more apparent. Patterns of information sharing, investment, personal time and attention have always generated network structures, but only recently have these linkages been made plainly visible to a broad population. In the past few decades, the network approach to thinking about the world has expanded beyond the core population of researchers to a wide range of analysts and practitioners who have applied social network methods and perspectives to understand their businesses, communities, markets, and disciplines. Today, because many of us manage many aspects of our social relationships through a computer-networked social world, it is useful for many more people to develop a language and literacy in the ways networks can be described, analyzed, and visualized. Visualizing and analyzing a social network is an increasingly common personal or business interest. The science of networks is a growing topic of interest and attention, with a growing number of courses for graduates and undergraduates, as well as educational materials for a wider audience (e.g., television documentaries).⁷

The availability of cheaper computing resources and network datasets has given a new generation of researchers the means to study the structures of social relationships at vastly larger scale and in far greater detail. Since the late 1960s, as computing resources and network datasets have grown in availability and dropped in cost, researchers have developed tools and concepts that enable a wider and more sophisticated application of social network analysis.

Advanced topic

Social network analysis research meets the web

As access to electronic networks grew in the 1970s, academic and professional discussions and collaborations began to take place through them. Systems to support the exchange of messages and the growth of discussions and even decisions became a major focus of systems development and the focus of study itself. Freeman and Freeman [32] collected data from the records of the Electronic Information Exchange System (EIES) that itself hosted a discussion among social network researchers. Two relations were recorded: the number of messages sent and acquaintanceship. These systems became the focus of the first systematic research into naturally occurring social media. Even before the Internet, early computer network applications supported the creation of exchanges, discussions, and therefore social networks, built by reply connections among authors.

Early proprietary systems evolved into the public World Wide Web. In the 1990s, the computer scientist Jon Kleinberg created an algorithm called HITS that identified the patterns of links between high-quality web pages. This algorithm later inspired Stanford graduate students Sergey Brin and Larry Page, who went on to found Google, to develop a further refinement they called “PageRank.” Kleinberg’s work described different locations within a population of linked documents on the World Wide Web: not all documents are equal. On the Web, a document or page can link to another page, forming a complex network of related documents. Some documents contain many pointers to other documents, whereas others have many documents that point at them. These “hubs” and “authorities” defined two broad classes of web pages that offered a path to identifying high-quality content. Links from one page to another are considered to be indicators of value. Refinements of this approach use eigenvector centrality to implement the PageRank algorithm that is the core of Google’s web search ranking method [33].
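The hub and authority idea can be made concrete with a short sketch of the mutual-reinforcement iteration behind HITS: a page is a good authority if good hubs point to it, and a good hub if it points to good authorities. The toy link graph, the function name `hits`, and the fixed iteration count below are illustrative assumptions, not details taken from Kleinberg’s paper or from any search engine.

```python
from math import sqrt

def hits(links, iterations=50):
    """Estimate hub and authority scores for a directed link graph.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages that link here.
        auth = {page: 0.0 for page in pages}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        # Hub score: sum of the authority scores of the pages linked to.
        hub = {page: sum(auth[t] for t in links.get(page, [])) for page in pages}
        # Rescale both score sets to unit length so values stay bounded.
        for scores in (auth, hub):
            norm = sqrt(sum(value * value for value in scores.values())) or 1.0
            for page in scores:
                scores[page] /= norm
    return hub, auth

# Hypothetical five-page web.
toy_web = {
    "directory": ["paper_a", "paper_b", "paper_c"],
    "blog": ["paper_a", "paper_b"],
    "paper_a": [],
    "paper_b": ["paper_a"],
    "paper_c": [],
}
hub, auth = hits(toy_web)
top_hub = max(hub, key=hub.get)
top_authority = max(auth, key=auth.get)
print(top_hub, top_authority)
```

In this toy graph the page with the most outgoing pointers emerges as the strongest hub and the most pointed-at page as the strongest authority. PageRank differs in that it assigns a single score per page, so this sketch should not be read as Google’s method.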

Network researchers studying social networks and the Internet found that empirical networks often exhibit “small-world properties”: most nodes are not neighbors with each other, but most nodes can be reached from almost every other node in a small number of hops. In the late 1990s, the physicist/sociologist Duncan Watts, working with the mathematician Steven Strogatz, created mathematical models of “small world” networks and contrasted them with purely random networks such as those proposed by Erdős and Rényi [34]. Their model captured the natural properties of social networks far better than those that assumed a purely random or normal distribution of links. Although most people have connections to other people who are local to them, people occasionally have a few connections that link them to another person physically far away. Many of our friends are likely to live or work near us, but a few may be very far away. Even a modest number of these relatively rare far-reaching links can dramatically change the properties of a network, making the widespread transmission of messages much easier. This model significantly improved on earlier ways of thinking about network growth and structure, better approximating the observed structure of naturally occurring social networks.

Later researchers have built upon their work to devise models that generate “small world” networks that more closely match empirical networks, helping us to understand how networks may have become the way they are. For example, Barabási and Albert have developed a family of models of preferential attachment that can generate “scale-free” networks, a structure commonly observed in social networks [35]. Scale-free networks have a power law degree distribution, meaning that there are a few key hubs in a network and many poorly connected vertices. While none of these models perfectly predicts real-world observed social networks, they provide a method for systematically comparing networks and focus attention on the processes that may have led to the characteristics that we do see in the networks around us.
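The effect of a few far-reaching links can be illustrated with a small simulation in the spirit of the Watts and Strogatz model: start from a ring in which everyone knows only their nearest neighbors, rewire a fraction of the ties to random vertices, and compare the average number of hops between vertices. This is a simplified sketch; the function names, the network size, the rewiring probability, and the random seed are assumptions chosen for illustration.

```python
import random
from collections import deque

def watts_strogatz(n, k, p, seed=0):
    """Ring of n vertices, each tied to its k nearest neighbors (k even);
    each tie is rewired to a random vertex with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k // 2 + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    for v in range(n):
        for step in range(1, k // 2 + 1):
            w = (v + step) % n
            if rng.random() < p and w in adj[v]:
                # Candidate endpoints: anyone v is not already tied to.
                choices = [u for u in range(n) if u != v and u not in adj[v]]
                if choices:
                    new = rng.choice(choices)
                    adj[v].discard(w)
                    adj[w].discard(v)
                    adj[v].add(new)
                    adj[new].add(v)
    return adj

def average_path_length(adj):
    """Mean hop count over all reachable ordered pairs, via breadth-first search."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

lattice_len = average_path_length(watts_strogatz(200, 4, 0.0))
rewired_len = average_path_length(watts_strogatz(200, 4, 0.1))
print(round(lattice_len, 2), round(rewired_len, 2))
```

Both networks have the same number of ties; only the placement of a small share of them differs. Preferential attachment models are a separate construction and are not shown here.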

In the past few years, researchers have begun to study large web-based networks. For example, Leskovec and Horvitz calculated metrics for a network built from the activity of about 240 million users of the Microsoft Messenger service [36]. Each user typically had one or more “buddies” with whom he or she might exchange one or more messages. Buddies often listed their locations, allowing these linkages to be aggregated into a complex map of the world and the flow of conversation around it. Others have reported on the hyperlink network created by web pages hyperlinking to other web pages (e.g., Park and Thelwall [37]). A number of studies have examined the blog network. For example, Adamic and Glance [38] showed how political blogs are divided into two clear clusters with minimal overlap that represent the left and right political populations in the United States. More recently, Kelly and Etling mapped Iran’s blogosphere, identifying more than 20 subcommunities of bloggers who wrote in Farsi for an Iranian audience.⁸

Another line of research has focused on visualizing social networks. For example, an early influential paper by Heer and Boyd [39] described a tool called Vizster that allowed users to navigate through their friends from a social networking site to explore social connections. Now there are entire conferences dedicated to network visualization, such as the annual Graph Drawing and Network Visualization symposium.

As social media has matured and its ability to shape perceptions has been recognized, it has become the target of concerted efforts at manipulation. Misinformation, disinformation, and propaganda have grown in visibility and concern. Claims and counterclaims of “fake news” have become common. While initial critical analysis of social media, like the work of Eli Pariser, focused on its divisive ability to create “filter bubbles,” later work has explored the ways the bubbles can be penetrated and the divisions between them amplified.⁹ Computer scientist Kate Starbird shows the ways that national, political, and commercial groups have collaborated to influence social media by creating messages aimed at making already divided groups more extreme.¹⁰ Fil Menczer and collaborators are building tools to identify “bots” and address the propagation of disputed or low-value information.¹¹ Information networks are designed to move all kinds of information, not just the true or high-value kinds. Paradoxically, people often resist abandoning beliefs despite strong evidence and often increase their commitment to beliefs that are challenged. Since a large amount of the information people want to create and consume is explicitly “fiction,” building machines that can highlight facts and diminish fake information is a challenge. The concept of “tribal epistemology” suggests that most facts exist only for certain people in certain places and times. If multiple truths can coexist, it may be better to build maps of the many beliefs and their believers. Rather than seeking to identify the true among the field of untrue material, an alternative approach could map the range of claims and the people and groups who make them.

Social media is often thought of as an example of a “marketplace of ideas.”¹² If so, social media network analysis could be thought of as a form of accounting software for this marketplace. Without accounting and auditing, most markets become rife with fraud, and marketplaces for ideas are no exception. With better accounting, it should be possible to clearly trace which groups are the source and support for various ideas, claims, and beliefs. When independent scrutiny of markets is possible, manipulation and collusion can be identified and potentially addressed.

3.7 The era of abundant social networks: From the desktop to your hand

We now live in a new era of network data abundance. Network data collection was once a time-consuming and laborious process that yielded small datasets at great cost. Observations, surveys and interviews took many days or weeks to perform, could not be repeated frequently, required many people to produce, and often yielded low rates of participation with inherent biases and errors. Asking people about their relationships with others continues to have benefits and offers unique sources of insight, but people have been shown to be a poor source of accurate information as bias and faulty memory warp what people report about who they know and with whom they interact. The challenge of creating a dataset that spanned long periods or large numbers of people or contained records of many events proved insurmountable using traditional methods.

Today, interactions between people increasingly take place through computing systems. We create many types of networks in machine-readable form each day as our interactions are documented by computers. When we use these communication tools, databases are created and maintained with records and log files that document the details of the time, place, and participants of each interaction, whether via computers, telephones, or even televisions. These event logs describe many different kinds of connection but share a common structure in which one person or entity is linked to another by some relationship.
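The common structure described above, in which one entity is linked to another by some relationship, is what allows an event log to be turned into a network. The following sketch collapses a handful of interaction records into a directed, weighted edge list. The record layout, the names in the log, and the function name `to_weighted_edges` are hypothetical and are not drawn from any particular system.

```python
from collections import Counter

# Hypothetical log records: (timestamp, sender, recipient, channel).
event_log = [
    ("2019-03-01T09:15", "ana", "ben", "email"),
    ("2019-03-01T09:40", "ben", "ana", "email"),
    ("2019-03-01T10:02", "ana", "ben", "chat"),
    ("2019-03-02T14:30", "cy", "ana", "email"),
    ("2019-03-02T16:05", "ana", "ben", "email"),
]

def to_weighted_edges(records):
    """Collapse interaction records into directed edges whose weight is
    the number of interactions from the source to the target."""
    weights = Counter((sender, recipient) for _, sender, recipient, _ in records)
    return sorted((source, target, count) for (source, target), count in weights.items())

edges = to_weighted_edges(event_log)
for source, target, weight in edges:
    print(source, target, weight)
```

Keeping the channel in the edge key instead of discarding it would yield a multiplex edge list, with one edge type per kind of interaction.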

The creation of these machine-readable network datasets means that long periods of time or large populations connected by many events can now be studied using widely available computing equipment and data sources.

Like a jump from Galileo’s handmade telescope to the orbiting Hubble, network science has made a vast leap in scale and scope as we create a digitally networked world around ourselves.

The historical drought of social network data has ended with a flood of new sources of network data. The challenge has shifted to rapidly developing the tools and concepts needed to process and analyze this deluge of connected data. Technical methods for building multi-terabyte databases have shifted to the even vaster task of managing petabytes of data. New methods of harnessing thousands and even millions of computers in parallel have been driven by the growing need to manage vast data stores growing from the web. The challenge is likely to grow steeper as new sources of network data come pouring out of an emerging class of sensor-rich devices (the “Internet of Things”) that record vast streams of data from billions of people, devices, and locations. The early wave of this surge of data can be seen in new sources of data from everyday life that are being captured and recorded with mobile and wearable devices, creating a new stream of archival material that is richer than all but the most obsessively observed biographies. It has become common in recent years for the most timely and well-placed photographs and video recordings to come from everyday individuals with phones and computers rather than from news photographers and reporters.

The coming wave of mobile technologies is likely to deepen this trend, with new ways for smartphones or other devices to capture information about their users and the relationships and world around them. Many mobile applications integrate location into their service (see Chapter 2). Because phones are aware of their location, a new set of mobile social software applications is becoming possible, as evidenced by services such as Strava, a good example of a mobile data collection, analysis, and presentation service for cyclists, runners, and other trail sports. Other products like Fitbit¹³ and Apple Watch are examples of social location and vital signs recording technologies that enable web applications to provide self-monitoring medical and fitness tracking. Medical communities overlap with trail-based exercise communities by using devices that extensively quantify your “self” and “others.” These devices enable consumers to collect detailed medical readings nearly all the time that are cross-referenced by location. The result is a growing aggregated map of the health and environmental conditions of the planet, not unlike the collectively authored road maps of whole nations produced by the OpenStreetMap project.¹⁴

3.8 Tools for network analysis

The growth of interest in network analysis has been dramatic, but until recently the development of social network analysis tools has lagged, and they remained challenging for many non-technical people to use. Applying network approaches has been traditionally a challenge that involved much more than simply mastering a new set of concepts and ideas that focus on relationships and patterns. Network data have traditionally been difficult to create and collect, and the tools for analyzing and visualizing networks have demanded significant technical skill and often mastery of programming languages. Many tools that exist to support network analysis demand significant commitment to learn and master. The existing network tools that are relatively easier to use have typically lacked support for easily importing social media network data. In the past few years, many network analysis projects and research papers have focused on computer-mediated networks of people, documents, and systems. Only recently have new tools made it simpler for people to extract data from major social media network sources and to perform a basic network analysis workflow without requiring programming skills or using a command line interface.

Social media network data collection, scrubbing, analysis, and display tasks have historically required a remarkable collection of tools and skills. While tools like DataSift make data available from numerous social media platforms, significant technical skills are needed to connect to application programming interfaces (APIs). In contrast, this book focuses on a single tool designed for non-programmers, NodeXL, because of its relative ease of use, support for rich visuals and analytics, and integration with the ubiquitous Excel spreadsheet software. The Python or R programming path is certainly the high road for experts and those with demanding volumes of data or esoteric data analysis requirements. But for the noncoding user, NodeXL may be one of the easiest ways to both manipulate network graphs and get graph datasets from a variety of social media sources. A detailed step-by-step guide to the core features of NodeXL can be found in Part II of the book.

3.9 Node-link diagrams: Visually mapping social networks

One of the key elements that characterizes modern social network analysis is the use of visualizations of complex networks. Compared to staring at edge lists or network matrices (see Section 3.2.4), looking at a network graph can provide an intuitive visual overview of the structure of the network, calling out cliques, clusters, communities, and key participants. It could be said that a graph visualization is worth a thousand ties. Not only can network visualizations inspire understanding and insights, they can also be appealing and even beautiful. They can serve as persuasive tools that demonstrate important points about networks. The ability to map attribute data and network metric scores to visual properties of the vertices and edges (see Chapters 5 and 6) makes them particularly powerful.

However, network visualizations are often as frustrating as they are appealing. Network graphs can rapidly get too dense and large to make out any meaningful patterns as illustrated in Figure 3.5. Many obstacles like vertex occlusions and edge crossings make creating well-organized and readable network graphs challenging. There is an upper limit on the numbers of vertices and edges that can be displayed in a bounded set of pixels; typically only a few hundred or thousand vertices can be meaningfully and distinctly represented on average-sized computer screens. In his appeal for better-quality network visualization, Shneiderman [40] has suggested that we aspire to reach the worthy but not always attainable goal of “netviz nirvana” in which the following goals are proposed:

• Every vertex is visible.
• Every vertex’s degree is countable (i.e., the number of connections that start or end at that vertex).
• Every edge can be followed from source to destination.
• Clusters and outliers are identifiable.

Figure 3.5 A medium-sized node-link network diagram visualization of Twitter users linked by patterns of following. A graph of this size illustrates many of the issues that arise when a network graph contains more than a few dozen vertices. Many vertices sit on or overlap with other vertices. The number of edges associated with some vertices is impossible to count, and some edges cannot be traced from source to destination. Improvements to network layout are an active area of research.

To approach netviz nirvana, careful preparation, layout, and filtering techniques must be used. In practice, network visualizations often fall far from the mark. However, the graphs shown throughout this book illustrate the value of carefully crafting network graphs. We hope they will inspire network analysts to take the care needed to create substantive, understandable, and esthetically pleasing graphs.

3.10 Common network analysis questions applied to social media

Once a set of social media networks has been constructed and social network measurements have been calculated, the resulting dataset can be used for many applications. For example, network datasets can be used to create reports about community health, comparisons of subgroups, and identification of important individuals, as well as in applications that rank, sort, compare, and search for content and experts.

The value of a social network approach is the ability to ask and answer questions that are not available to other methods. Network methods focus on the patterns of relationships in contrast to the volumes of individuals. Although analysts, marketers, and administrators often track social media participation statistics, they rarely consider measures of network position and structure. Traditional participation statistics can provide important insights into the volume of engagement of a community, but can say little about the structure of the connections between community members. Network analysis can help explain important social phenomena such as group formation, group cohesion, social roles, personal influence, and overall community health. Combining traditional participation metrics with network metrics provides the best of both worlds and allows you to answer important questions such as the following:

• What kinds of social roles are being performed within a social media collection? Does a community have enough people filling the important roles?
• Which individuals play important social roles within a group or collection? Who would make a good administrator based on that person’s network position?
• What subgroups exist? Do connections between subgroups exist? Who plays the bridge roles that connect otherwise unconnected groups?
• How do new ideas propagate through a network? Who are the influencers that spark the spread of ideas?
• How do the overall structures of a social network change after a particular event (e.g., a company social, a round of new hires or layoffs, a product launch or recall)?
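The question of who plays the bridge roles can be approached with betweenness centrality, which counts how many shortest paths between other vertices pass through a given vertex. Below is a minimal sketch using Brandes’ algorithm for unweighted, undirected networks. The seven-person network and the function names are hypothetical; tools such as NodeXL compute this metric for you.

```python
from collections import deque

def adjacency(edges):
    """Build an undirected adjacency map from a list of (a, b) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest vertices, accumulating dependencies.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each end.
    return {v: value / 2 for v, value in score.items()}

# Two tightly knit trios joined only through "dee".
ties = [("al", "bo"), ("al", "cy"), ("bo", "cy"), ("cy", "dee"),
        ("dee", "eve"), ("eve", "fay"), ("eve", "gus"), ("fay", "gus")]
scores = betweenness(adjacency(ties))
bridge = max(scores, key=scores.get)
print(bridge, scores[bridge])
```

Here the vertex with the fewest ties has the highest betweenness, which is exactly the kind of insight that participation counts alone would miss.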

3.11 Practitioner’s summary

The opportunities for practitioners to apply network analysis to contemporary business, community management, political influence, and team collaboration have dramatically increased in recent years. The once esoteric concepts and metrics of network analysis have become talk show and airport lounge topics. The difficulties in collecting and analyzing network data have been dramatically reduced by powerful database methods and well-designed network analysis and visualization tools. There is still a lot of work to be done, but practitioners now have the potential to make more effective decisions based on network analyses of their own data conducted in a few hours, rather than a few months.

Learning network concepts and tools is a necessary first step, but the payoffs for applying network methods are large. The growing numbers of trained social media network analysts and consultants are complemented by a vast array of books and informative websites, online seminars, and Wikipedia pages which make the necessary training widely available. At the same time, network analysis methods are rapidly spreading through university curricula and filtering into high school courses.

Attending public seminars and professional conferences provides other means to acquire skills and make valuable connections. Your first steps may be a struggle, but we hope that with each step the processes become smoother and the professional benefits larger.

3.12 Researcher’s agenda

The research progress on network analysis has been dramatic in the past few decades, transforming an exotic research topic into a thriving research community in academia, government, and industry. The existing metrics, clustering, and layout algorithms are stabilizing, but innovative approaches are still emerging to trigger bursts of new research. As practitioner pressure builds to apply network analysis to ever larger datasets, researchers have developed remarkably more efficient algorithms, while hardware developers have produced powerful graphics processors (based on gaming computers), huge arrays of computers, and scalable cloud computing services. Meanwhile, new social media services generate more relational data than ever before, ushering in a golden era of social science research on human relationships and collaboration.

The algorithms and hardware provide the platforms, but the concomitant development of vastly improved user interfaces for network analysis has begun to enlarge the community of users from the dedicated sociologists who are also programmers to the broad segment of business analysts who use spreadsheets or simplified web-based tools. Packaging the complex processes of frequently applied network analyses into a few clicks is the next challenge in many fields, thereby inspiring other researchers and developers to simplify the processes even further, while increasing the power offered to users. The best is yet to come.
