Social network analysis uses mathematical tools to systematically understand networks, which are made up of vertices (e.g., people) that are connected to one another via edges (e.g., friendship ties). Network metrics help identify who is most important or central in a network, subgroups (i.e., network clusters) of tightly connected people, and the overall network structure (e.g., the density of a network). Social scientists have developed social network analysis and visualization techniques for decades. Network data is represented as an edge list or matrix. Directed edges have a clear origin and destination, while undirected edges do not. Weighted networks include a value associated with the edge. The scope of a network determines if it is an ego network, partial network, or full network. Multimodal networks include vertices of different types, while multiplex networks include edges of different types. Affiliation networks connect people based on shared affiliations (e.g., club).
Centrality metrics; Clustering algorithms; Directed; Undirected; Weighted; Ego network; Multimodal; Multiplex; Affiliation network; History
Human beings have been part of social networks since our earliest days. We are born and live in a world of connections. People connect with others through social networks formed by kinship, language, trade, exchange, conflict, citation, and collaboration. Using computer technologies to create social networks is relatively new, but networks of social interactions and exchanges are primordial. Simply defined, a network is a collection of things and their relationships to one another. The “things” that are connected are called nodes, vertices, entities, and in some contexts people. The connections between the vertices are called edges, ties, relationships or links. Many natural and artificial systems form networks, which exist in systems from the atomic level to the planetary level. A special subset of networks are social networks which are created whenever people interact, directly or indirectly, with other people, institutions, and artifacts. Social network theory and analysis is a relatively recent set of ideas and methods largely developed over the past century. It builds on and uses concepts from the mathematics of graph theory, which has a longer history, starting with Leonhard Euler in 1736. Using network analysis, you can visualize complex sets of relationships as maps (i.e., graphs or sociograms) of connected symbols and calculate precise measures of the size, shape, and density of the network as a whole and the positions of each element and group of elements within it.
The recent proliferation of Internet social media applications and smartphone devices has made social connections more visible than ever before (Chapter 2). A new subset of social networks, social media networks, are a growing focal point for the application of network analysis tools. The idea of networks, whether they are composed of friends, ideas, or web pages, is increasingly an important way to think about the modern world. You can use social network analysis to explore and visualize patterns found within collections of linked entities that include people. From a social network analysis perspective, the treelike “org-chart” that commonly represents the hierarchical structure of an organization or enterprise is too simple and lacks important information about the cross connections that exist between and across departments and divisions. In contrast with the simplified tree structure of an org-chart, a social network view of an organization or population leads to the creation of visualizations that resemble maps of highway systems, airline routes, or rail networks (see Chapter 9). Social network maps can similarly guide journeys through social landscapes and tell a story about how some points or people are at the center or periphery of the network. Maps of transportation networks where distance is measured in number of flights or road miles from one city to another city are familiar. They inspire application to less familiar networks of electrical connections, protein expression, and webs of information, conversation, and human connection.
Social network analysis and metrics are described in several excellent books and journals [1–6]. This chapter touches on the key historical developments, ideas, and concepts in social network analysis and applies them to social media network examples. We have left details of advanced topics and mathematical definitions of various concepts to the many fine technical works. The following is intended as an introductory survey of the core network concepts and methods used in subsequent chapters, which focus on the networks that can be extracted from social media sources like Twitter, Facebook, email, discussion forums, YouTube, and wikis.
Network analysts see the world as a collection of interconnected pieces. Those studying social networks see relationships as the building blocks of the social world, each set of relationships combining to create emergent patterns of connections among people, groups, and things. The focus of social network analysis is between, not within, people. Whereas traditional social science research methods such as surveys focus on individuals and their attributes (e.g., gender, age, income), network scientists focus on individuals and their “alters”—the people to whom they connect. Network analysis shifts the focus of analysis to the bonds between individuals in addition to the internal qualities and abilities of individuals. This change in focus from attribute data to relational data dramatically affects how data are collected, represented, and analyzed. Social network analysis complements methods that focus more narrowly on individuals, adding a critical dimension that captures the connective tissue of societies and other complex interconnections.
Network analysis shares some core ideas with the real estate profession. In contrast to approaches that look at internal attributes of each individual, network analysis shares the real estate focus on location, location, location! The interior of a house may be a liability, but where a property is located matters far more when trying to get a good sale price. The network perspective looks at a collection of ties among a population and creates measurements that describe the location of each person or entity within the structure of all relationships in the network. The position or location of a person or “node” or “vertex” in relation to all the others is a primary concern of social network analysis. Many network explanations look for causes of outcomes in the patterns of connections around an individual instead of their personal characteristics. “Know who” is often more important in network explanations than “know how.” Network approaches observe that different people in similar social positions often act in similar ways, even if they have different backgrounds. Positions within networks may be as significant a factor as any aspect of the people who occupy them. Network analysis argues that explanations about the success or failures of organizations are often to be found in the structure of relationships that limit or provide opportunities for interaction [7].
Many network concepts are intuitive and echo familiar phrases like “friend of a friend,” “word of mouth,” and “six degrees of separation.” Other network terms like “transitivity,” “triadic closure,” and “centrality” (see Section 3.5) may be unfamiliar terms for familiar social arrangements. Many of us recognize social network differences among people: we know some people who are “popular” and have connections to many others. We may also know some people who may be less “popular” but are still “influential,” connecting to a smaller number of people who have “better” connections. Network analysis recognizes these and other less intuitively sensed patterns in social relationships, like measuring the number of your friends who know each other and how much a person occupies a gatekeeper or bridge role between two groups. The network analysis approach makes the web of interconnections that bind people to one another visible, creating a mathematical and graphical language that can highlight important people, events, and subgroups.
To better understand the network perspective, consider the social network of Twitter users shown in Figure 3.1 (see Chapter 11 for a description of Twitter). It is an example of a sociogram, also called a network graph, which is a common way of visualizing networks. Like all networks, it consists of two primary building blocks: vertices (also called nodes or agents) and edges (also called ties or connections). The vertices are represented by images of the Twitter user profile photo, and the edges are represented by the lines that point from one vertex to another.
This network graph visualization paints a picture of the social relationships among the Twitter accounts of members of the United States Senate in 2018. The size of each Twitter user’s profile image is determined by the user’s total number of followers as reported by the Twitter Application Programming Interface (API), which gives software access to extended details about each user’s profile and message data. This is one example of how attribute data (i.e., data that describe a person) can be overlaid onto a network. A line, or edge, exists between two people when one user account “mentions” or “replies-to” another. All of these connections in aggregate reveal the emergent structure of two large distinct groups (G1 and G2) with relatively few connecting links, which loosely map to the two political parties in the United States. These separate clusters reflect the higher rate at which members of one party mention one another compared with the rate at which they mention members of the opposing party. This network analysis identifies the individuals who fill important positions within the network, such as those with whom many other people interact and those who are connected across cluster boundaries. The current and following chapters will provide a guide to creating maps like these from Twitter and other social media platforms and data sources. For now, let’s consider the major components of a network in a bit more detail.
Vertices, also called nodes, agents, entities, or items, can represent many kinds of things. Often they represent people or social structures such as workgroups, teams, organizations, institutions, states, or even countries. At other times they represent content such as web pages, keywords, or videos. They can even represent physical or virtual locations or events. Vertices often correspond with the primary building blocks of social media platforms as described in Chapter 2: pages in wikis, friends in social networking sites, and posts or authors in blogs.
Although it is not an absolute requirement for network analysis, having attribute data that describe each of the vertices can add insights to an analysis and visualization. For example, Figure 3.1 used descriptive attribute data about the total number of followers for each user to convey a sense of who is most popular on Twitter within the network. Other attribute data from Twitter, such as the number of people each user follows and the date they joined Twitter, can also be mapped to visual attributes (see Chapter 11). More generally, attribute data may describe demographic characteristics of a person (age, gender, race), data that describe the person’s use of a system (number of logins, messages posted, edits made) or other characteristics such as income, location or brand preferences. In network visualization tools like NodeXL, attribute data can be mapped to visual properties such as the size, color, or opacity of each vertex (see Chapter 5).
Edges, also known as links, ties, connections, and relationships, are the connective tissue of networks. An edge connects two vertices. Edges can represent many different types of relationships like proximity, collaboration, kinship, friendship, trade partnership, literature citation, investment, hyperlink, transaction, or any shared attribute (e.g., people who attended the same university). An edge can be said to exist if it has some official status, is recognized by the participants, or is observed by exchange or interaction between them. In summary, an edge is any form of relationship or connection between two entities.
Network scientists have developed a language to describe different types of edges. In Section 2.3.5 of Chapter 2, we introduced the core types of connections that occur in social media networks. Here we describe how those concepts map to network and graph theory concepts more generally.
Undirected or directed edges are the two major types of connections. Directed edges (also known as asymmetric edges) have a clear origin and destination: money is lent from one person to another, a Twitter user follows another user, an email is sent from an author to a recipient, or a web page links to another web page. They are represented on a graph as a line with an arrow pointing from the source vertex to the recipient vertex (see Figure 3.1). Directed edges may be reciprocated or not. If I sent you a message, you may send one back in return, or not. An undirected edge (also known as a symmetric or mutual edge) simply exists between two people or things: a couple is married, two Facebook users are friends, or two people are members of the same organization. No origin or destination is clear in these mutual relationships. They cannot exist unless they are reciprocated. Undirected edges are represented on a graph as a line connecting two vertices with no arrows.
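The idea that a directed edge may or may not be reciprocated can be sketched in a few lines of Python. This is only an illustration: the names, the `messages` data, and the `reciprocated_pairs` helper are hypothetical and are not part of NodeXL or any dataset used in this book.

```python
# Hypothetical directed edges: (sender, recipient).
messages = {("Ann", "Bob"), ("Bob", "Ann"), ("Ann", "Carol")}


def reciprocated_pairs(edges):
    """Return each unordered pair of vertices that is linked in both directions."""
    edge_set = set(edges)
    return {
        frozenset((a, b))
        for a, b in edge_set
        if a != b and (b, a) in edge_set  # skip self-loops, require the reverse edge
    }


pairs = reciprocated_pairs(messages)
print(pairs)  # a single pair containing Ann and Bob
```

Ann and Bob have written to each other, so their tie is reciprocated; Ann's message to Carol is not. An undirected network would need no such check, because every edge is mutual by definition.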
Edges can be further described by additional types of data. The simplest type of edge, an unweighted edge or binary edge, only indicates if an edge exists or not. For example, a friendship tie between Facebook users either exists or it does not. In contrast, a weighted edge includes values associated with each edge that indicate the strength or frequency of a tie. For example, a weighted edge between two Facebook users may indicate the number of photo comments exchanged or the duration since the creation of a friendship. Weighted edges are often represented visually as thicker or darker or as more or less opaque lines. Including weighted edge data in a network dataset is preferable because this provides additional information about each tie. However, many social network analysis metrics (see Section 3.5) are designed for unweighted networks. Fortunately, any weighted network can be converted to an unweighted one by choosing a cutoff point. For example, an unweighted edge could be shown between individuals who exchanged at least 10 email messages, with no edge between people who exchanged fewer than 10 messages.
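The cutoff approach described above can be sketched as follows. The message counts are made up for illustration, and `binarize` is a hypothetical helper name, not a function from any particular tool.

```python
# Hypothetical weighted edge list: (sender, recipient, number of emails exchanged).
weighted_edges = [
    ("Ann", "Bob", 14),
    ("Ann", "Carol", 3),
    ("Carol", "Ann", 10),
]


def binarize(edges, cutoff):
    """Keep an unweighted edge only where the weight is at least the cutoff."""
    return [(source, target) for source, target, weight in edges if weight >= cutoff]


unweighted_edges = binarize(weighted_edges, cutoff=10)
print(unweighted_edges)  # [('Ann', 'Bob'), ('Carol', 'Ann')]
```

The choice of cutoff is a judgment call: a higher value keeps only the strongest ties, while a lower value keeps more of the original network but also more incidental contact.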
Because network data differ from attribute data, they are represented differently. With attribute data, it is common to create a data matrix where each row represents an individual and each column represents one of that individual’s characteristics, behaviors, or answers to survey questions. A modified approach is used to represent relational data. As in attribute matrices, each row represents an individual in the network. However, unlike attribute matrices, each column also represents an individual in the network, as shown in Table 3.1.
Table 3.1
| | Ann | Bob | Carol |
|---|---|---|---|
| Ann | 0 | 1 | 1 |
| Bob | 0 | 0 | 0 |
| Carol | 1 | 0 | 0 |
a This network is a directed network, as it is not symmetrical (i.e., Ann points to Bob in row 1, but Bob doesn't point to Ann in row 2). It is a simple binary network: either a tie exists (value = 1) or not (value = 0).
Different types of edges can be represented in network matrices. Table 3.1 describes a directed network because not all connections are reciprocated. For example, Ann “points to” Bob as shown in row 1, but Bob does not “point to” Ann as shown in row 2. If it were an undirected network it would be a symmetric matrix; if Ann points to Bob then Bob must necessarily point to Ann. This network is a binary network because it only includes 1s and 0s, where a 1 indicates that there is a connection and a 0 indicates that there is no connection. Allowing additional values would create a weighted network. For example, the 1s could be replaced with the number of email messages sent or phone calls made to the other person. Notice that the diagonal of the matrix connects each person with himself or herself. In this network, like most networks, the diagonal values are 0 indicating that a person does not “point to” herself. However, in some networks a “self-loop” connecting a person to herself can exist. For example, a person may send herself an email message as a reminder. Network matrices are powerful forms of representation that lend themselves to efficient mathematical manipulation for those inclined. However, they can also become quite large and challenging to navigate, particularly when networks are relatively “sparse” with few connections and many items.
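The properties discussed above (directed versus undirected, binary versus weighted, and self-loops) can each be checked directly on the matrix. The following is a minimal sketch using the values from Table 3.1; the helper function names are illustrative assumptions.

```python
names = ["Ann", "Bob", "Carol"]

# Rows "point to" columns, matching Table 3.1.
matrix = [
    [0, 1, 1],  # Ann
    [0, 0, 0],  # Bob
    [1, 0, 0],  # Carol
]


def is_symmetric(m):
    """An undirected network has a symmetric matrix."""
    n = len(m)
    return all(m[i][j] == m[j][i] for i in range(n) for j in range(n))


def is_binary(m):
    """A binary network contains only 0s and 1s."""
    return all(value in (0, 1) for row in m for value in row)


def self_loops(m, labels):
    """List the vertices with a nonzero value on the diagonal."""
    return [labels[i] for i in range(len(m)) if m[i][i] != 0]


print(is_symmetric(matrix))       # False, so the network is directed
print(is_binary(matrix))          # True
print(self_loops(matrix, names))  # []
```

The matrix stores nine cells to describe three edges, which hints at why matrices become unwieldy for large, sparse networks.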
An alternative to the matrix data format that is a more efficient representation of a network is called an “edge list.” As its name suggests, it is simply a list of all edges in the network as shown in Table 3.2. This is the same network as shown in Table 3.1. Individuals in the Vertex1 column “point to” those in the Vertex2 column. Unless data describing the value of each edge are provided in additional columns, the network is implied to be a binary one. Self-loops are possible to represent in edge lists by having a row with the person’s name repeated in both columns. Throughout this book, you will use edge lists instead of matrices. Edge lists are “efficient” in that they only record a row of data for each connection that does exist in a network, rather than store a “zero” for each possible connection that does not exist. Edge lists can be smaller files and easier to edit and review.
Table 3.2
Vertex1 | Vertex2 |
---|---|
Ann | Bob |
Ann | Carol |
Carol | Ann |
a Individuals in the Vertex1 column “point to” those in the Vertex2 column in this directed network. The network is implied to be a binary network. Additional columns could be used to describe each edge. For example, an Edge Weight column could be added with values representing the strength of various ties.
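To see that the two formats carry the same information, the edge list in Table 3.2 can be expanded back into the matrix of Table 3.1. This is a sketch only; `edge_list_to_matrix` is a hypothetical helper, and it assumes every vertex appears in at least one edge (an isolated vertex would have to be listed separately).

```python
# The directed edge list from Table 3.2: Vertex1 "points to" Vertex2.
edge_list = [("Ann", "Bob"), ("Ann", "Carol"), ("Carol", "Ann")]


def edge_list_to_matrix(edges):
    """Build a binary adjacency matrix with rows and columns in alphabetical order."""
    labels = sorted({vertex for edge in edges for vertex in edge})
    index = {label: position for position, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for source, target in edges:
        matrix[index[source]][index[target]] = 1
    return labels, matrix


labels, matrix = edge_list_to_matrix(edge_list)
print(labels)  # ['Ann', 'Bob', 'Carol']
for row in matrix:
    print(row)
# [0, 1, 1]
# [0, 0, 0]
# [1, 0, 0]
```

The edge list needs three rows where the matrix needs nine cells, and the gap widens quickly as the number of vertices grows.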
The final method for representing networks is through network graphs. Figure 3.2 is a network graph based on the data in Table 3.2. It makes immediately clear that the relationship between Ann and Carol is reciprocated (i.e., there are arrowheads at both ends of the line connecting them) and that there is no connection between Bob and Carol. Our earlier analysis of Figure 3.1, another network graph, demonstrates how network graphs can lead to insights that are hard to identify in tabular data, particularly when large networks are presented. However, many network graphs require significant preparation to ensure that they are readable, as described in Section 3.9 and Chapter 4.
Social networks range in size from a handful of people to national and planetary populations. They also differ in the types of vertices they include, the nature of the edges that connect them, and the ways in which they are formed. In this section we introduce some of the distinctions that network scientists have identified to describe different types of networks. These distinctions affect the metrics and maps generated from them, as well as their interpretation.
It is often useful to consider social networks from an individual member’s point of view. Network analysts call the individual that is the focus of attention “ego” and the people he or she is connected to “alters.” Some networks, called egocentric networks, only include individuals who are connected to a specified ego. For example, a network of your personal Facebook friends would be an egocentric network because you are, by definition, connected to all other vertices, like the hub of a wagon wheel with many spokes. Other egocentric networks and their associated “subgraphs” (see Chapter 7) may extend out from an ego, reaching not only friends, but also friends of friends. More generally, egocentric networks can extend out any number of “degrees” from an ego. The basic “1-degree” ego network consists of the ego and their alters. The “1.5-degree” ego network extends the 1-degree network by including connections between all of the alters. For example, a Facebook 1.5-degree ego network would characterize which of your friends know each other (sadly, these data are no longer available from the Facebook platform). The “2-degree” ego network extends the 1.5-degree network by including all of the alters’ own alters (i.e., friends of friends), some of whom may not be connected to the ego. These three sizes of ego networks allow you to look at increasingly larger, but still “local” neighborhoods around a particular individual in a social network. Higher-degree networks (e.g., 2.5, 3) are feasible to create but not used as often in practice because they can quickly grow to a large size and become intractable. Consider, for example, that among the 1.59 billion Facebook users in 2016, there was an average of only 3.57 “intermediaries” between any two people in the network!
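The three sizes of ego network can be made concrete with a small sketch. The friendship ties below are invented for illustration, the network is treated as undirected, and `neighbors` and `ego_network` are hypothetical helper names rather than functions from any particular tool.

```python
# Hypothetical undirected friendship ties.
ties = [
    ("Ego", "Ann"), ("Ego", "Bob"), ("Ego", "Carol"),
    ("Ann", "Bob"),     # two alters who know each other
    ("Carol", "Dave"),  # Dave is a friend of a friend
    ("Dave", "Erin"),   # Erin is three steps away from Ego
]


def neighbors(edges, vertex):
    """Return the set of vertices directly tied to the given vertex."""
    found = set()
    for a, b in edges:
        if a == vertex:
            found.add(b)
        elif b == vertex:
            found.add(a)
    return found


def ego_network(edges, ego, degree):
    """Return the edges in the 1-, 1.5-, or 2-degree network around ego."""
    alters = neighbors(edges, ego)
    if degree == 1:
        # Only the ties between ego and each alter.
        return [(a, b) for a, b in edges if ego in (a, b)]
    if degree == 1.5:
        # Add the ties among the alters themselves.
        keep = alters | {ego}
        return [(a, b) for a, b in edges if a in keep and b in keep]
    if degree == 2:
        # Add every tie that touches an alter, which brings in friends of friends.
        return [(a, b) for a, b in edges if a in alters or b in alters]
    raise ValueError("degree must be 1, 1.5, or 2")


print(len(ego_network(ties, "Ego", 1)))    # 3
print(len(ego_network(ties, "Ego", 1.5)))  # 4
print(len(ego_network(ties, "Ego", 2)))    # 5
```

The tie between Dave and Erin never appears, because neither of them is one of Ego's alters; including it would require a 2.5-degree or 3-degree network.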
Networks that are smaller than the complete human population are often interesting, and some can be small enough to be manageable with the resources available in a desktop or laptop computer. A “full” or “complete” network contains the subset of people or entities who match some interest or attribute and includes information about the set of connections among them all. All the “egos” in a full network are treated equally; none is assumed to be the “ego” of the network, although analysis of these networks will reveal that some people are more strategically located in the network than others. A full network is often created and available when a single system, such as a social media platform, acts as a hub among a group of connected people. For example, the Twitter network includes all users of the service and the connections between them. In practice, it is not always feasible (or particularly insightful) to analyze a platform-scale full network in one dataset. Instead, analysts create more selective sub-networks by selecting a sample or slice of the larger complete network. For example, Figure 3.1 showed the slice of the Twitter network that included the connections among the 100 user accounts for the members of the 115th United States Senate. This partial network is based on a known list of users. Other types of networks are topic centric: they start with a search term, and the people who will be included in the data are not (necessarily) known prior to the data collection. Other partial networks may be created to include a subgroup of users (e.g., all conference attendees), to include only people and connections that occurred within a specified time frame, or to be limited to people who have certain characteristics (e.g., CEOs of Fortune 500 companies, members of a national or state legislature).
Up until this point we have only considered networks that connect the same type of entity. These standard networks are called unimodal networks because they include one type (i.e., mode) of vertex. They connect users to users or they connect documents to documents, but they don’t include both users and documents. However, networks can include different types of vertices creating multimodal networks. Chapter 6 includes an example multimodal network that connects Marvel Movies to Characters in those movies. Rich sets of intersecting networks often form in social media environments composed of connections between people, photos, videos, messages, documents, groups, organizations, locations, and services. In many cases, these multimodal networks have to be transformed into simpler unimodal networks to perform meaningful network analysis, as most network metrics are designed for unimodal networks.
A common type of multimodal network is a bimodal network with exactly two types of vertices. Data for these networks often include individuals and some event, activity, or content with which they are affiliated, creating an affiliation network. For example, an affiliation network may connect users with the wiki pages they have edited. People are affiliated with pages. In this network, no two users would directly connect to each other. Likewise, no two pages would directly connect to each other. Pages only link to people (i.e., editors).
Bimodal affiliation networks can be transformed into two separate unimodal networks: a “user edits page” network can be converted into a user-to-user network and a page-to-page network (see Chapter 6, Advanced topic: Transforming a bimodal affiliation network into two unimodal networks for details). The user-to-user network connects people based on their indirect links to one another through edits to a common page. For example, in a wiki co-edit affiliation network Derek and Marc would be strongly connected because they both edit many of the same wiki pages. In contrast, a page-to-page network connects pages based on the number of shared editors. For example, a pair of wiki pages would be closely connected if many people edited both of the pages (see Chapter 14). More generally, this approach can be used to relate objects of all types (e.g., books, photos, and audio recordings) based on users’ behaviors (e.g., purchasing or reading habits) and preferences (e.g., ratings). Affiliation networks are the raw material of many recommender systems that recommend items of interest, such as Amazon’s “Customers Who Bought This Item Also Bought” feature. A network data structure can return results to queries like “people who linked to this document also linked to these documents” or “if you link to this document, you may want to link to these people.”
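The transformation from one bimodal network into two unimodal networks can be sketched as follows. The edit records are invented for illustration (including the editor "Ben" and the page names), and `project` is a hypothetical helper; this is one simple way to do the conversion, weighting each new tie by the count of shared pages or shared editors, and is not necessarily the procedure used in Chapter 6.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical "user edits page" affiliation edges.
edits = [
    ("Derek", "PageA"), ("Derek", "PageB"), ("Derek", "PageC"),
    ("Marc", "PageA"), ("Marc", "PageB"),
    ("Ben", "PageC"),
]


def project(affiliations, onto="user"):
    """Project (user, page) pairs onto one mode.

    Each resulting tie is weighted by the number of shared partners:
    shared pages for a user-to-user tie, shared editors for a page-to-page tie.
    """
    groups = defaultdict(set)
    for user, page in affiliations:
        if onto == "user":
            groups[page].add(user)  # users grouped by the page they share
        else:
            groups[user].add(page)  # pages grouped by the editor they share
    weights = defaultdict(int)
    for members in groups.values():
        for a, b in combinations(sorted(members), 2):
            weights[(a, b)] += 1
    return dict(weights)


user_to_user = project(edits, onto="user")
page_to_page = project(edits, onto="page")
print(user_to_user)  # {('Derek', 'Marc'): 2, ('Ben', 'Derek'): 1}
print(page_to_page)  # {('PageA', 'PageB'): 2, ('PageA', 'PageC'): 1, ('PageB', 'PageC'): 1}
```

Derek and Marc are the most strongly connected pair of users because they share two pages, and PageA and PageB are the most closely connected pages because they share two editors.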
Although it is common for two people to be connected in many different ways (e.g., by exchanging phone calls, emails, sharing group membership, and being married), most networks only include one type of connection or edge. However, it is possible to consider networks with multiple types of connections, called multiplex networks. For example, the Twitter network shown in Figure 3.1 includes two types of directed edges: “reply to” relationships and “mention” relationships. The network graph visualization could have uniquely represented each type of edge by using color, different edge types (e.g., dotted lines, solid lines), or edge labels (see Chapter 5). In the case of Figure 3.1, the difference between the two types of edge (reply and mention) was not deemed important, so the multiplex network data were condensed into a uniplex network that showed a single directed edge if one or both of the two types of connections were present. This strategy of combining multiple types of edges is a common one that allows for the use of network metrics, which are mostly based on uniplex networks.
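Condensing a multiplex network into a uniplex one amounts to dropping the edge type and removing the duplicates that result. A minimal sketch, with invented data and a hypothetical `to_uniplex` helper:

```python
# Hypothetical multiplex edge list: (source, target, edge type).
multiplex_edges = [
    ("Ann", "Bob", "mention"),
    ("Ann", "Bob", "reply"),
    ("Carol", "Ann", "mention"),
]


def to_uniplex(edges):
    """Keep one directed edge per (source, target) pair, in first-seen order."""
    return list(dict.fromkeys((source, target) for source, target, _ in edges))


uniplex_edges = to_uniplex(multiplex_edges)
print(uniplex_edges)  # [('Ann', 'Bob'), ('Carol', 'Ann')]
```

An alternative design would keep a count of how many edge types were merged into each pair and treat that count as an edge weight, which preserves a little more information.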
You can find network scientists in nearly every academic discipline and an increasing number of practitioner communities. Network concepts and techniques are now widely found throughout a range of disciplines including sociology, anthropology, communications, computer science, education, economics, physics, management, information science, medicine, political science, public health, psychology, biology, history and digital humanities. In the past several decades, social scientists have shown that network structures have a profound influence on health, work, and community. Getting a job, being promoted, catching an illness, adopting an innovation, and many more activities and processes have been explained in the terms of social networks. Network structures are important in the biological sciences where research is focused on connections between metabolic and genetic processes. The shape and function of networks can have great consequences as ideas, genes, innovations, or pathogens diffuse through populations. Researchers now apply network theory and methods to understanding how Supreme Court decisions relate to previous cases, how the United States Senate votes (see Chapter 7), how epidemics spread within cities, and how characters in a movie relate to one another (see Chapter 6). Networks are formed from many physical processes and are echoed in a number of structures created inside information systems such as the collection of linked documents within the World Wide Web or an enterprise’s collections of files and emails. Information scientists use these links to identify high-quality web pages (e.g., Google’s PageRank algorithm), or use the citations from research articles to identify high-impact articles and authors.
Network methods are diffusing beyond academic research, becoming an important tool for managing organizations, markets, and movements. Entrepreneurs apply network analysis techniques to understand how to leverage the powerful effects of word-of-mouth marketing as their customers spread news about their new products to one another. Many politicians recognize the potential power of a connected network of supporters who can be turned into contributors, volunteers, and voters. Engineers use network analysis to build more effective power grids, computer networks, and transportation systems. Law enforcement officers and lawyers analyze email networks to identify and prosecute potential criminals. And the intelligence community seeks to identify national security threats by looking at networks created by communication links, money trails and kinship. Having at least a basic understanding of network thinking and concepts is a core literacy of our time. Like statistics, network analysis has countless applications to a number of fields.
This book primarily focuses on social network analysis, a subfield of network science that studies networks connecting people or social units (e.g., organizations, teams) to one another (see Advanced topic: Early social network analysis). We are also interested in social media networks and in networks that connect human-generated content or artifacts, such as websites or cell phones.
Social scientists, physicists, computer scientists, and mathematicians have collaborated to create novel theories and algorithms for calculating measurements of social networks and the people and things that populate them. These quantitative network metrics allow analysts to systematically inspect the patterns of connection within the social world, creating a basis on which to compare networks, track changes in a network over time, and determine the relative position of individuals and clusters within a network.
Social network measures initially focused on simple counts of connections and over time became more sophisticated as they developed and incorporated concepts of network density, centrality, structural holes, balance, and transitivity. Some metrics describe a network as a whole. For example, vertex count is the number of entities in the network, while edge count is the number of connections among them. Another whole-network metric, “density,” captures how connected a set of vertices is by calculating the percentage of possible connections that are actually observed, where the maximum possible count is reached when everyone is connected to everyone. Other metrics are calculated for each vertex in a network. For example, “centrality” measures, of which there are many, capture how “important” (central) a vertex is within the network based on some objective criteria. Some people sit at the edge or periphery of their networks, whereas others are firmly at the center, connected to many of the other most connected people. In most human networks, even highly connected networks, some pairs of people are not directly connected. When a third person bridges a connection (a “friend of a friend”), we can think of that person as a broker, a “bridge” or a “connector.” When that person is missing, we can think of the gap as a “structural hole,” a place in which there is a missing connector, potentially a good spot to build a “bridge.” The following sections describe some of these metrics in more detail. Chapter 6 introduces some of the core metrics found in NodeXL through hands-on exercises.
A number of metrics are used to describe and summarize an entire network. In some cases, a single network dataset contains sub-networks separated into several disconnected pieces, called components. Some aggregate network metrics only work on networks where all of the vertices are connected in a single component, whereas others can be applied to entire networks even if they are split up into disconnected segments. Here we describe just a few aggregate network metrics to give a flavor for what is possible, leaving a fuller discussion for Chapter 6.
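To make the idea of components concrete, the following is a minimal Python sketch that splits an undirected edge list into its disconnected pieces using a breadth-first search. The function name `connected_components` and the example people are hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque

def connected_components(vertices, edges):
    """Return a list of components, each a set of vertices (undirected network)."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    seen, components = set(), []
    for start in vertices:
        if start in seen:
            continue
        # Breadth-first search collects everyone reachable from `start`.
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            current = queue.popleft()
            for nxt in neighbors[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    component.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components

# Hypothetical network: two separate friendship groups.
example_vertices = ["Ann", "Bob", "Cat", "Dan", "Eve"]
example_edges = [("Ann", "Bob"), ("Bob", "Cat"), ("Dan", "Eve")]
components = connected_components(example_vertices, example_edges)
```

Here the five people fall into two components, so a metric that requires a single connected component would have to be computed on each piece separately.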
As mentioned, density is an aggregate network metric used to describe the level of interconnectedness among a set of vertices. Density is a count of the number of relationships observed to be present in a network divided by the total number of possible relationships that could be present. It is a quantitative way to capture important sociological ideas like cohesion, solidarity, and membership.
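The density calculation described above can be sketched in a few lines of Python. This sketch assumes a simple network with no self-loops and no duplicate edges; the function name `density` is illustrative rather than taken from any particular tool.

```python
def density(vertex_count, edge_count, directed=False):
    """Observed edges divided by the maximum possible number of edges.

    Assumes no self-loops and no duplicate edges.
    """
    if vertex_count < 2:
        return 0.0
    # Each ordered pair of distinct vertices is a possible directed edge.
    possible = vertex_count * (vertex_count - 1)
    if not directed:
        # In an undirected network, A-B and B-A are the same relationship.
        possible //= 2
    return edge_count / possible

undirected_density = density(4, 3)               # 3 of 6 possible ties
directed_density = density(4, 3, directed=True)  # 3 of 12 possible ties
```

The same three edges among four vertices yield a density of 0.5 when ties are undirected and 0.25 when they are directed, because a directed network has twice as many possible relationships.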
Centralization is an aggregate metric that characterizes the extent to which the network is centered on one or just a few important nodes. Centralized networks have many edges that emanate from a few important vertices, whereas decentralized networks have many vertices with many interconnections. Networks with high levels of centralization are likely to be more hierarchical, with a few people playing hub roles.
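There are several ways to quantify centralization. As one illustration, the sketch below uses a degree-based formulation commonly attributed to Freeman, which is an assumption here since the text does not name a specific formula. It sums how far each vertex's degree falls below the highest degree and divides by the largest value that sum can take, which occurs in a star-shaped network.

```python
from collections import Counter

def degree_centralization(vertices, edges):
    """Degree centralization for a simple undirected network.

    Returns 1.0 for a star (one hub) and 0.0 when all degrees are equal.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(vertices)
    if n < 3:
        return 0.0
    max_degree = max(degree[v] for v in vertices)
    total_gap = sum(max_degree - degree[v] for v in vertices)
    # (n - 1) * (n - 2) is the gap total reached by a star on n vertices.
    return total_gap / ((n - 1) * (n - 2))

people = ["Hub", "A", "B", "C", "D"]
star = [("Hub", "A"), ("Hub", "B"), ("Hub", "C"), ("Hub", "D")]
ring = [("Hub", "A"), ("A", "B"), ("B", "C"), ("C", "D"), ("D", "Hub")]
star_score = degree_centralization(people, star)
ring_score = degree_centralization(people, ring)
```

The star, where every tie runs through a single hub, scores 1.0, while the ring, where everyone has exactly two ties, scores 0.0.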
Other metrics integrate attribute data with network data. For example, metrics that measure homophily look at the similarity of people who are connected. Studies typically show that people are connected to others who are similar to themselves on core attributes like income, education level, religious affiliation, and age.
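A very rough way to look at homophily is to count the share of edges that connect two people with the same value of some attribute. The Python sketch below does only that; the names and education levels are made up, and a real study would compare this raw share against what would be expected by chance given the mix of attributes in the population.

```python
def same_attribute_share(edges, attribute):
    """Fraction of edges whose two endpoints share the same attribute value."""
    if not edges:
        return 0.0
    same = sum(1 for u, v in edges if attribute[u] == attribute[v])
    return same / len(edges)

# Hypothetical attribute data: highest education level per person.
education = {
    "Ann": "college",
    "Bob": "college",
    "Cat": "high school",
    "Dan": "high school",
}
ties = [("Ann", "Bob"), ("Bob", "Cat"), ("Cat", "Dan"), ("Ann", "Dan")]
share = same_attribute_share(ties, education)
```

In this toy network two of the four ties connect people with the same education level, giving a share of 0.5.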
Some network metrics are similar to the geographic concepts of latitude and longitude: they act as coordinates that identify each individual's position within a network. Paramount among these is the set of “centrality” measures, which describe how a particular vertex can be said to be in the “middle” of a network. In the 1970s and 1980s, the sociologist Phillip Bonacich developed a refined measure of centrality that took into consideration the different value a highly connected person can have in contrast to people with a few rare connections. Network theorists noted that simply having many connections, called “degree centrality,” was only one way to be “at the center” of things. A person with fewer connections might have more rare and potentially “important” connections than someone with more connections. One connection can be more important than another in different ways. Some are better because they bridge across otherwise separated portions of the network, whereas others are important because they connect to well-connected people. The following centrality metrics provide quantifiable measures for these concepts (see Chapter 6 for more details).
Degree centrality is a simple count of the total number of connections linked to a vertex. It can be thought of as a kind of popularity measure, but a crude one that does not recognize a difference between quantity and quality. Degree centrality does not differentiate between a link to the CEO of a big company and a link to its most recent trainee hire. For directed networks, where relationships have an origin and a destination rather than being mutual, there are two measures of degree: in-degree and out-degree. In-degree is the number of connections that point inward at a vertex. Out-degree is the number of connections that originate at a vertex and point outward to other vertices.
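In-degree and out-degree can be tallied directly from a directed edge list. The following Python sketch assumes each edge is a (source, target) pair; the "follows" relationships are hypothetical.

```python
from collections import Counter

def in_out_degree(directed_edges):
    """Count incoming and outgoing connections for each vertex."""
    in_degree, out_degree = Counter(), Counter()
    for source, target in directed_edges:
        out_degree[source] += 1  # edge originates at the source
        in_degree[target] += 1   # edge points inward at the target
    return in_degree, out_degree

# Hypothetical "follows" relationships: (follower, person followed).
follows = [("Ann", "Bob"), ("Cat", "Bob"), ("Bob", "Ann"), ("Dan", "Bob")]
in_deg, out_deg = in_out_degree(follows)
```

Bob is followed by three people (in-degree 3) but follows only one (out-degree 1), whereas Dan follows one person and is followed by no one.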
The notion of connection paths is central to the study of networks. Perhaps one of the most natural questions to ask about any two people in a network is “How far apart are they?” This distance is measured simply: the distance between people who are not neighbors is the smallest number of neighbor-to-neighbor hops needed to connect one to the other. For instance, people who are not your neighbors, but are your neighbors' neighbors, are a distance 2 from you, and so on. The length of the shortest path between two people is called the “geodesic distance” and is used in many centrality metrics. For example, betweenness centrality is a measure of how often a given vertex lies on the shortest path between two other vertices. This can be thought of as a kind of “bridge” score, a measure of how much removing a person would disrupt the connections between other people in the network. The idea of brokering is often captured in the measure of betweenness centrality.
A “structural hole” is a term for recognizing a missing bridge. Wherever two or more groups fail to connect, one can argue that there is a structural hole, a gap waiting to be filled. Burt provides compelling evidence that individuals who bridge structural holes within their organizations are promoted faster than others [15]. Social network analysis has many strategic applications for people in an organization to analyze their position and the position of others. Managers and leaders can recognize gaps or disconnections within organizations and devote resources to bridging the divide. People may be able to apply social network analysis to identify locations in which a gap exists and elect to fill them, recognizing the value they can generate as broker between two otherwise separate groups.
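Geodesic distance and betweenness centrality can both be computed with breadth-first searches. The Python sketch below is one possible implementation for small, unweighted, undirected networks: `geodesic_distances` counts hops from one person to everyone else, and `betweenness` follows the approach of Brandes' algorithm, a choice made for this illustration rather than one prescribed by the text. Scores are raw counts of vertex pairs, not normalized.

```python
from collections import defaultdict, deque

def build_neighbors(edges):
    """Map each vertex to the set of its neighbors (undirected)."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    return neighbors

def geodesic_distances(neighbors, source):
    """Hop counts from `source` to every reachable vertex."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for nxt in neighbors[current]:
            if nxt not in distance:
                distance[nxt] = distance[current] + 1
                queue.append(nxt)
    return distance

def betweenness(neighbors):
    """Unnormalized betweenness: shortest-path pairs passing through each vertex."""
    vertices = list(neighbors)
    score = dict.fromkeys(vertices, 0.0)
    for s in vertices:
        sigma = dict.fromkeys(vertices, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in vertices}    # predecessors on shortest paths
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in neighbors[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest vertices, accumulating path shares.
        delta = dict.fromkeys(vertices, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: score[v] / 2 for v in vertices}

ties = [("Ann", "Bob"), ("Bob", "Cat"), ("Cat", "Dan"), ("Cat", "Eve")]
network = build_neighbors(ties)
from_ann = geodesic_distances(network, "Ann")
bridge_scores = betweenness(network)
```

In this small example Dan is three hops from Ann, and Cat has the highest betweenness because five pairs of people can only reach each other through her.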
Closeness centrality measures each individual’s position in the network via a different perspective from the other network metrics, capturing the average distance between each vertex and every other vertex in the network. Assuming that vertices can only pass messages to or influence their existing connections, a low closeness centrality means that a person is directly connected or “just a hop away” from most others in the network. In contrast, vertices in very peripheral locations may have high closeness centrality scores, indicating the high number of hops or connections they need to take to connect to distant others in the network. Think of closeness, paradoxically, as a “distance” score. Some people are just a few miles from the big city, others must drive for hours: similarly, people with high “closeness” centrality scores have many miles or rather personal connections that they must travel to reach many other people in the network. Note that in some cases the inverse of the average distance to others in the network is used as a measure of closeness centrality. In that case, higher values indicate a more central position.
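Both conventions described above can be illustrated with a short Python sketch. It assumes an undirected network that forms a single connected component, computes each person's average geodesic distance to everyone else, and then takes the inverse so that higher values indicate a more central position. The function name and the chain of five people are hypothetical.

```python
from collections import defaultdict, deque

def average_distances(edges):
    """Average geodesic distance from each vertex to every other vertex.

    Assumes an undirected network with a single connected component.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    averages = {}
    for source in list(neighbors):
        # Breadth-first search gives hop counts from `source`.
        distance = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for nxt in neighbors[current]:
                if nxt not in distance:
                    distance[nxt] = distance[current] + 1
                    queue.append(nxt)
        others = [d for vertex, d in distance.items() if vertex != source]
        averages[source] = sum(others) / len(others)
    return averages

# Hypothetical chain of acquaintances: Ann - Bob - Cat - Dan - Eve.
chain = [("Ann", "Bob"), ("Bob", "Cat"), ("Cat", "Dan"), ("Dan", "Eve")]
avg = average_distances(chain)
inverse = {person: 1 / value for person, value in avg.items()}
```

Cat, in the middle of the chain, has the lowest average distance (1.5) and therefore the highest inverse score, while Ann at the end of the chain has the highest average distance (2.5).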
Eigenvector centrality is a more sophisticated view of centrality: a person with few connections could have a very high eigenvector centrality if those few connections were to very well-connected others. Eigenvector centrality allows for connections to have a variable value, so that connecting to some vertices has more benefit than connecting to others. The PageRank algorithm used by Google's search engine is a variant of eigenvector centrality, primarily used for directed networks. PageRank considers (1) the number of in-bound links (i.e., sites that link to your site), (2) the quality of the linkers (i.e., the PageRank of sites that link to your site), and (3) the link propensity of the linkers (i.e., the number of sites the linkers link to). See Chapter 6 for a more in-depth discussion and examples.
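The three ingredients of PageRank listed above can be seen in a simplified power-iteration sketch in Python. This is an illustration of the general idea, not Google's production algorithm: the damping value of 0.85, the fixed number of iterations, and the even redistribution of rank from pages with no out-links are all assumptions commonly made in textbook versions.

```python
def pagerank(directed_edges, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of rank along links."""
    vertices = sorted({v for edge in directed_edges for v in edge})
    out_links = {v: [] for v in vertices}
    for source, target in directed_edges:
        out_links[source].append(target)

    n = len(vertices)
    rank = dict.fromkeys(vertices, 1.0 / n)
    for _ in range(iterations):
        # Rank held by pages with no out-links is spread evenly over all pages.
        dangling = sum(rank[v] for v in vertices if not out_links[v])
        base = (1 - damping) / n + damping * dangling / n
        new_rank = dict.fromkeys(vertices, base)
        for source in vertices:
            for target in out_links[source]:
                # A linker's rank (quality) is split across all of its
                # out-links (link propensity) and passed to each target.
                new_rank[target] += damping * rank[source] / len(out_links[source])
        rank = new_rank
    return rank

# Hypothetical links between four sites: (linking site, linked site).
links = [("A", "C"), ("B", "C"), ("C", "A"), ("D", "C")]
ranks = pagerank(links)
```

Site C receives the most in-bound links and ends up with the highest rank. Site A has only one in-bound link, but because that link comes from the highly ranked C, A still ranks well above B and D.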
The clustering coefficient metric differs from measures of centrality. It is more akin to the density metric for whole networks, but focused on egocentric networks. Specifically, the clustering coefficient is a measure of the density of the 1.5-degree egocentric network for each vertex. When these connections are dense, the clustering coefficient is high. If your “friends” (alters) all know each other, you have a high clustering coefficient. If your “friends” (alters) don’t know each other, then you have a low clustering coefficient. People have different measures for their clustering coefficient depending on the ways they cultivate connections to others and the environments they are in.
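One common way to compute a clustering coefficient for each vertex is to count how many of the possible ties among that vertex's alters actually exist. The Python sketch below follows that formulation for an undirected network and assigns 0.0 to vertices with fewer than two alters, which is a convention adopted here for simplicity.

```python
from collections import defaultdict
from itertools import combinations

def clustering_coefficients(edges):
    """For each vertex, the share of possible ties among its alters that exist."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    neighbors = dict(neighbors)  # freeze: no new keys created on lookup

    result = {}
    for vertex, alters in neighbors.items():
        k = len(alters)
        if k < 2:
            result[vertex] = 0.0  # no pairs of alters to compare
            continue
        # Count pairs of alters who are themselves connected.
        linked_pairs = sum(
            1 for a, b in combinations(alters, 2) if b in neighbors[a]
        )
        result[vertex] = linked_pairs / (k * (k - 1) / 2)
    return result

# Hypothetical friendships: Ann knows Bob, Cat, and Dan; only Bob and Cat know each other.
friendships = [("Ann", "Bob"), ("Ann", "Cat"), ("Ann", "Dan"), ("Bob", "Cat")]
coefficients = clustering_coefficients(friendships)
```

Only one of the three possible pairs among Ann's friends is connected, so her coefficient is one third, whereas Bob's two friends (Ann and Cat) know each other, giving him a coefficient of 1.0.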
A network approach can discover and identify the boundaries of groups and clusters, or apply existing information about each vertex to create categories or divisions. In a network perspective, people maintain many relationships and are potentially members in many loosely defined groups and clusters. Defining exact group boundaries in a network may be difficult, reflecting the reality of people with multiple and shifting memberships. From a network perspective, a group is a collection of vertices. Groups can be formed for many reasons; in some cases, some vertices are more connected to one another than they are to others. Relatively more cohesive or densely connected sets of vertices form regions, also called clusters, that may reflect the existence of groups. A group of people discovered in this way might not be explicitly named or recognized. Members of a network cluster might not recognize their collective membership despite their individual connections to others in the group. A rapidly growing body of research describes clustering algorithms, also called community detection algorithms, that automatically identify these clusters based on network structures, as discussed in Chapter 7.
Two people within a network may sometimes share a pattern of connection to other people, even if they do not connect to the same people. Certain professions have distinct patterns of connections, either linking with many others (real estate agents and other retail professionals) or few (reclusive authors and artists, remote office workers, and some people whose work focuses on things rather than people). In addition to having the same number of connections, some people share the same pattern of connections among the people with whom they connect. In some cases people are connected to people who are strangers to one another; in other cases a group may be densely connected to one another. These secondary patterns of connection are a distinctive feature of network analysis approaches: networks are as much about the attributes and patterns of connection among neighbors as they are about the attributes and connections of any individual.
Social roles are complex cultural and structural features of social life. An example social role like “father” is explicitly recognized in society, has a wide set of culturally shared meanings and expectations, is associated with particular goals and interests, and is partly defined by the content and structure of actions directed toward other distinctive role holders. Other types of social roles may not be as clearly defined or explicitly recognized by all the actors in a given social setting, but they have identifiable content, behavioral, and structural features.
Studies of social media have illustrated the ways contributors create distinctive network patterns that reflect their role or status within the community (e.g., Welser, Gleave, and Smith [16]). These patterns are evidence of specialization of behavior in these social spaces. An example of a role in a social media space is the “answer person” who disproportionately provides the answers to questions asked in message board environments (see Chapter 10), “discussion people” who engage in extended exchanges of messages in large and populous threaded discussions, “discussion starters” who demonstrate influence over the topics discussed by the “discussion people,” “influential” people who are well connected to others who are more highly connected than they are, and boundary spanners who bridge between unconnected subgroups.
The widespread adoption of networked communication technologies has significantly expanded the population of people who are both aware of network concepts and interested in network data. Although the idea of networks of connections of people spanning societies and nations was once esoteric, today many people actively manage an explicit social network of friends, contacts, buddies, associates, and addresses that compose their family, social, professional, and civic lives. Facebook posts forwarded from person to person have become a common and visible example of the ways information passes through networks of connected people. The notion of “friends of friends” is now easy to illustrate in the features of social media applications like Facebook and LinkedIn that provide explicitly named “social networking” services. Viral videos and chain emails illustrate the way word of mouth has moved into computer-mediated communication channels. The idea of “six degrees of separation” has moved from the offices of Harvard sociologists to become the dramatic premise of a Broadway play to now appear as an expected feature of services that allow people to browse and connect to their friend’s friends.
As network concepts have entered everyday life, the previously less visible ties and connections that have always woven people together into relationships, cliques, clusters, groups, teams, partnerships, clans, tribes, coalitions, companies, institutions, organizations, nations, and populations have become more apparent. Patterns of information sharing, investment, personal time and attention have always generated network structures, but only recently have these linkages been made plainly visible to a broad population. In the past few decades, the network approach to thinking about the world has expanded beyond the core population of researchers to a wide range of analysts and practitioners who have applied social network methods and perspectives to understand their businesses, communities, markets, and disciplines. Today, because many of us manage many aspects of our social relationships through a computer-networked social world, it is useful for many more people to develop a language and literacy in the ways networks can be described, analyzed, and visualized. Visualizing and analyzing a social network is an increasingly common personal or business interest. The science of networks is a growing topic of interest and attention, with an increasing number of courses for graduates and undergraduates, as well as educational materials for a wider audience (e.g., television documentaries).
The availability of cheaper computing resources and network datasets has given a new generation of researchers the means to study the structures of social relationships at vastly larger scale and in far greater detail. Since the late 1960s, as computing resources and network datasets have grown in availability and dropped in cost, researchers have developed tools and concepts that enable a wider and more sophisticated application of social network analysis.
We now live in a new era of network data abundance. Network data collection was once a time-consuming and laborious process that yielded small datasets at great cost. Observations, surveys and interviews took many days or weeks to perform, could not be repeated frequently, required many people to produce, and often yielded low rates of participation with inherent biases and errors. Asking people about their relationships with others continues to have benefits and offers unique sources of insight, but people have been shown to be a poor source of accurate information as bias and faulty memory warp what people report about who they know and with whom they interact. The challenge of creating a dataset that spanned long periods or large numbers of people or contained records of many events proved insurmountable using traditional methods.
Today, interactions between people increasingly take place through computing systems. Users create many types of networks in a machine-readable form each day as our interactions are documented in a computer. When we use these communication tools, databases are created and maintained with records and log files that document the details of the time, place, and participants of each interaction, whether via computers or telephones or even televisions. These event logs describe many different kinds of connection but share a common structure in which one person or entity is linked to another by some relationship.
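The common structure shared by such event logs can be illustrated with a short Python sketch that turns a list of interaction records into a weighted, directed edge list. The record fields (`time`, `sender`, `receiver`) and the sample messages are hypothetical; real logs vary widely in format.

```python
from collections import Counter

def log_to_weighted_edges(records):
    """Count how many logged interactions link each (sender, receiver) pair."""
    return Counter((record["sender"], record["receiver"]) for record in records)

# Hypothetical message log: each record documents one interaction.
message_log = [
    {"time": "2024-01-05T09:00", "sender": "Ann", "receiver": "Bob"},
    {"time": "2024-01-05T09:12", "sender": "Bob", "receiver": "Ann"},
    {"time": "2024-01-06T14:30", "sender": "Ann", "receiver": "Bob"},
    {"time": "2024-01-07T08:45", "sender": "Cat", "receiver": "Ann"},
]
weighted_edges = log_to_weighted_edges(message_log)
```

Each distinct (sender, receiver) pair becomes one directed edge, and the count of logged events serves as the edge weight, so the two messages from Ann to Bob produce a single edge with weight 2.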
The creation of these machine-readable network datasets means that long periods of time or large populations connected by many events can now be studied using widely available computing equipment and data sources.
Like a jump from Galileo’s handmade telescope to the orbiting Hubble, network science has made a vast leap in scale and scope as we create a digitally networked world around ourselves.
The historical drought of social network data has ended with a flood of new sources of network data. The challenge has shifted to rapidly developing the tools and concepts needed to process and analyze this deluge of connected data. Technical methods for building multi-terabyte databases have shifted to the even vaster task of managing petabytes of data. New methods of harnessing thousands and even millions of computers in parallel have been driven by the growing need to manage vast data stores growing from the web. The challenge is likely to grow steeper as new sources of network data come pouring out of an emerging class of sensor-rich devices (the “Internet of Things”) that record vast streams of data from billions of people, devices, and locations. The early wave of this surge of data can be seen in new sources of data from everyday life that are being captured and recorded with mobile and wearable devices, creating a new stream of archival material that is richer than all but the most obsessively observed biographies. It has become common in recent years that the most timely and well-placed photographs and video recordings have come from everyday individuals with phones and computers rather than from news photographers and reporters.
The coming wave of mobile technologies is likely to deepen this trend, with new ways for smartphones or other devices to capture information about their users and the relationships and world around them. Many mobile applications integrate location into their service (see Chapter 2). As phones are aware of their location, a new set of mobile social software applications is becoming possible, as evidenced by new services such as Strava, a good example of a mobile data collection, analysis, and presentation service for cyclists, runners, and other trail-sport participants. Other products like Fitbit and Apple Watch are examples of social location and vital signs recording technologies that enable web applications to provide self-monitoring medical and fitness tracking. Medical communities overlap with trail-based exercise communities by using devices that extensively quantify your “self” and “others.” These devices enable consumers to collect detailed medical readings nearly all the time that are cross-referenced by location. The result is a growing aggregated map of the health and environmental conditions of the planet, not unlike early examples of collectively authored road maps of whole nations accomplished by the OpenStreetMap project.
The growth of interest in network analysis has been dramatic, but until recently the development of social network analysis tools has lagged, and they remained challenging for many non-technical people to use. Applying network approaches has been traditionally a challenge that involved much more than simply mastering a new set of concepts and ideas that focus on relationships and patterns. Network data have traditionally been difficult to create and collect, and the tools for analyzing and visualizing networks have demanded significant technical skill and often mastery of programming languages. Many tools that exist to support network analysis demand significant commitment to learn and master. The existing network tools that are relatively easier to use have typically lacked support for easily importing social media network data. In the past few years, many network analysis projects and research papers have focused on computer-mediated networks of people, documents, and systems. Only recently have new tools made it simpler for people to extract data from major social media network sources and to perform a basic network analysis workflow without requiring programming skills or using a command line interface.
Social media network data collection, scrubbing, analysis, and display tasks have historically required a remarkable collection of tools and skills. While tools like Datasift make data available from numerous social media platforms, significant technical skills are needed to connect to application programming interfaces (APIs). In contrast, this book focuses on a single tool designed for non-programmers, NodeXL, because of its relative ease of use, support for rich visuals and analytics, and integration with the ubiquitous Excel spreadsheet software. The Python or R programming language path is certainly the high road for experts and those with demanding volumes of data or esoteric data analysis requirements. But for the noncoding user, NodeXL may be one of the easiest ways to both manipulate network graphs and get graph datasets from a variety of social media sources. A detailed step-by-step guide to the core features of NodeXL can be found in Part II of the book.
One of the key elements that characterizes modern social network analysis is the use of visualizations of complex networks. Compared to staring at edge lists or network matrices (see Section 3.2.4), looking at a network graph can provide an intuitive visual overview of the structure of the network, calling out cliques, clusters, communities, and key participants. It could be said that a graph visualization is worth a thousand ties. Not only can network visualizations inspire understanding and insights, they can also be appealing and even beautiful. They can serve as persuasive tools that demonstrate important points about networks. The ability to map attribute data and network metric scores to visual properties of the vertices and edges (see Chapters 5 and 6) makes them particularly powerful.
However, network visualizations are often as frustrating as they are appealing. Network graphs can rapidly get too dense and large to make out any meaningful patterns as illustrated in Figure 3.5. Many obstacles like vertex occlusions and edge crossings make creating well-organized and readable network graphs challenging. There is an upper limit on the numbers of vertices and edges that can be displayed in a bounded set of pixels; typically only a few hundred or thousand vertices can be meaningfully and distinctly represented on average-sized computer screens. In his appeal for better-quality network visualization, Shneiderman [40] has suggested that we aspire to reach the worthy but not always attainable goal of “netviz nirvana” in which the following goals are proposed:
To approach netviz nirvana, careful preparation, layout, and filtering techniques must be used. In practice, network visualizations often fall far from the mark. However, the graphs shown throughout this book illustrate the value of carefully crafting network graphs. We hope they will inspire network analysts to take the care needed to create substantive, understandable, and esthetically pleasing graphs.
Once a set of social media networks has been constructed and social network measurements have been calculated, the resulting dataset can be used for many applications. For example, network datasets can be used to create reports about community health, comparisons of subgroups, and identification of important individuals, as well as in applications that rank, sort, compare, and search for content and experts.
The value of a social network approach is the ability to ask and answer questions that are not available to other methods. Network methods focus on the patterns of relationships in contrast to the volumes of individuals. Although analysts, marketers, and administrators often track social media participation statistics, they rarely consider measures of network position and structure. Traditional participation statistics can provide important insights into the volume of engagement of a community, but can say little about the structure of the connections between community members. Network analysis can help explain important social phenomena such as group formation, group cohesion, social roles, personal influence, and overall community health. Combining traditional participation metrics with network metrics provides the best of both worlds and allows you to answer important questions such as the following:
The opportunities for practitioners to apply network analysis to contemporary business, community management, political influence, and team collaboration have dramatically increased in recent years. The once esoteric concepts and metrics of network analysis have become talk show and airport lounge topics. The difficulties in collecting and analyzing network data have been dramatically reduced by powerful database methods and well-designed network analysis and visualization tools. There is still a lot of work to be done, but practitioners now have the potential to make more effective decisions based on network analyses of their own data conducted in a few hours, rather than a few months.
Learning network concepts and tools is a necessary first step, but the payoffs for applying network methods are large. The growing numbers of trained social media network analysts and consultants are complemented by a vast array of books and informative websites, online seminars, and Wikipedia pages which make the necessary training widely available. At the same time, network analysis methods are rapidly spreading through university curricula and filtering into high school courses.
Attending public seminars and professional conferences provides other means to acquire skills and make valuable connections. Your first steps may be a struggle, but we hope that with each step the processes become smoother and the professional benefits larger.
The research progress on network analysis has been dramatic in the past few decades, transforming an exotic research topic into a thriving research community in academia, government, and industry. The existing metrics, clustering, and layout algorithms are stabilizing, but innovative approaches are still emerging to trigger bursts of new research. As practitioner pressure builds to apply network analysis to ever larger datasets, researchers have developed remarkably more efficient algorithms, while hardware developers have produced powerful graphics processors (based on gaming computers), huge arrays of computers, and scalable cloud computing services. Meanwhile, new social media services generate more relational data than ever before, ushering in a golden era of social science research on human relationships and collaboration.
The algorithms and hardware provide the platforms, but the concomitant development of vastly improved user interfaces for network analysis has begun to enlarge the community of users from the dedicated sociologists who are also programmers to the broad segment of business analysts who use spreadsheets or simplified web-based tools. Packaging the complex processes of frequently applied network analyses into a few clicks is the next challenge in many fields, thereby inspiring other researchers and developers to simplify the processes even further, while increasing the power offered to users. The best is yet to come.