Chapter 13

YouTube: Exploring video networks

Itai Himelboim*; Jen Golbeck; Bryan M. Trude    * SEE Suite: Social media Engagement & Evaluation lab, Department of Advertising & Public Relations, Grady College of Journalism and Mass Communication, University of Georgia, Athens, Georgia
College of Information Studies, University of Maryland, College Park, MD, United States
Department of Advertising & Public Relations, Grady College of Journalism and Mass Communication, University of Georgia, Athens, Georgia

Abstract

Analyzing YouTube social networks can offer many insights into the ways videos become popular (e.g., go viral) and the way information is disseminated through videos. This chapter walks you through the collection, analysis, and visualization of YouTube video networks and YouTube user networks, which can be created with the NodeXL YouTube importer. The analysis of video network clusters helps reduce an immense number of topical videos into user-created groups of videos, based on their posted comments, a proxy for their expressed interests. Such clusters provide insights into the sub-themes of large collections of YouTube videos. These observations, coupled with YouTube’s immense popularity, provide deep insights into one of today’s major media outlets. The size of the YouTube network can be daunting, but by focusing on appropriate samples of topical data, filtering with relevant metrics, and using visualizations extensively, you can grasp at least part of what shapes our contemporary culture.

Keywords

YouTube; Importer; Network clusters; Video networks

13.1 Introduction

Billions of videos about wildly diverse topics have been uploaded to the Internet by hundreds of millions of people. Using the techniques of social network analysis, you can visualize the landscape of connected videos and users to highlight important patterns that link the producers, commentators, and consumers. In just the past few years, online video sharing has become a growing mainstream social practice. Gone are the days when watching a video online was an onerous task, involving the installation of media players and a prolonged wait for the content to download. Today, people easily share private videos with friends and family; amateurs and professionals broadcast artistic endeavors, from music to comedy to directorial experimentation; media corporations distribute TV programs and movie excerpts; and millions of people watch and recommend videos to others, making some of them “viral” and wildly popular. Online video sharing services offer something for almost everyone, whether they are video content creators or consumers. Video content is used for many purposes: conveying knowledge, disseminating information, self-promotion, documenting world affairs, and much more. The diverse content that can be found on video-sharing sites draws large numbers of users. But not all content, even popular content, is popular in the same way. As you will see, networks of producers and consumers of online videos vary greatly.

YouTube has become almost synonymous with watching videos online. Although “video sharing” is a term less familiar to most people, “YouTube” videos are commonplace. Using video-sharing systems to publish or consume videos creates a variety of connections between the people who use these systems, the videos, and the tags that describe them. The social structures that emerge from the interaction of video creators and viewers can be represented as a social network graph. Participation in online video sharing generates a number of network graphs that can reveal not only users’ preferences for video content but also their habits, motivations, and social interaction. Many people upload, view, and comment on videos, but some become centers of dense communities or are well connected to others. Because of its extreme popularity, YouTube presents an especially inviting domain to explore the social structure and dynamics of networks within video-sharing communities.

13.2 What is YouTube?

Created in 2005, YouTube is an arena for personal communication, a place to create online communities or egocentric social networks, and a platform that can be used for distributing commercial content. YouTube was one of the first online services to offer users the opportunity to upload videos and share them with the world. Though similar services, such as MetaCafe, Yahoo! Video, and Google Video, emerged around the same time and were later followed by Vimeo, DropShots, and many others, YouTube has become the most popular video-sharing service in the world. Google bought YouTube in 2006 and now operates it as a subsidiary of Google Inc. Some of the site’s current features are based on Google tools (e.g., the search and suggested list of friends based on Gmail contacts).

The immense popularity of YouTube can be seen in the current statistics offered by the company: as of April 2019 there were over 1.9 billion users from 91 countries in 80 different languages who watch over 1 billion hours of video daily.1 YouTube’s popularity can be attributed to several factors: the relative ease of uploading and sharing videos, the site’s continuous design updates (which reflect the evolution of online social networking practices), and strategic collaborations with commercial content providers such as broadcast networks, movie studios, and political parties. There is an entire cottage industry that has developed to help YouTubers commercialize their content (e.g., see [1]).

The foremost reason for YouTube’s success is the relative ease of uploading and sharing videos. Video sharing existed before YouTube prevailed: videos were sent as email attachments or were available through other video hosting services, yet these services were slow, cumbersome, and limited in the amount of storage they offered. Viewers could not watch videos instantly but had to download them to their computers before playing and viewing them using proprietary video players. Metadata descriptions about the video were rarely available. YouTube changed all that, as well as moving the viewing experience from a solitary experience to a social one. YouTube supports and encourages the embedding of videos in other forms of online communication—from email to microblogging to status updates in social networks—by displaying the relevant link address next to each video. All users have to do is cut and paste these video link addresses into other social media such as blogs, wikis, emails, and status updates. The simplicity of video sharing and embedding contributed not only to YouTube’s popularity but also to the phenomenon of “viral videos”: usually provocative, quirky, or creative videos that achieve extreme popularity after being mass-distributed through electronic word of mouth via various online social tools.

The practice of video sharing creates several types of networks: some networks are based on content and others on social affinity or social ties. Content networks reflect mutual interests or shared hobbies or practices, essentially creating communities of practice [2] that stem from the commonalities between users’ interests. These communities can loosely be based on the preexisting categorical definition suggested by YouTube (e.g., “music,” “entertainment,” “how-to and style,” “politics and news”) or sub-networks of people interested in a specific aspect of the overall category (e.g., Japanese anime enthusiasts, environmentalists, cosmetic makeup aficionados, gamers). Content networks evolve around videos and are based on ties (edges) between videos (vertices), which are formed through the use of social tools for creating comments, creating linked videos, collecting “favorites,” and tagging content with keywords.

Social affinity networks are formed when users interact with each other. YouTube allows users to subscribe to other users’ “channels” of video collections. Subscription ties are the edges of the networks, and users are the vertices. Social affinity networks can be based on preexisting relationships (e.g., family members, friends, or fans) or form on the site, as people interact with one another based on mutual interest in content. As you will see, the conceptual distinction between the two networks is sometimes difficult to draw. However, as the next section explains, they are structurally separate.

Analysis of different YouTube networks will allow you to reveal the key positions some people and videos occupy within the mesh of connections created when users collectively create, watch, and comment on content or form personal relationships; in some cases they may even offer insights as to the underlying reasons for these connections.

13.3 YouTube’s structure

According to YouTube’s official policy, the purpose of the service is to give everyone a voice and show them the world. As such, YouTube’s structure is based on two sub-layers, clearly differentiating between videos (content) and users (community) while maintaining a close linkage between the two. Similarly, network analysis can be performed on networks of videos or of users, either independently of each other or in conjunction.

YouTube features are constantly changing, with new tools being introduced and others periodically removed, often with the aim of making YouTube an even more social space. As with almost every online social network, you should familiarize yourself with the latest changes before analyzing a YouTube network.

13.3.1 Videos

YouTube videos are displayed on separate pages with consistent layouts. YouTube pages also include controls for video playback features, a Subscribe button, a Share button, and the ability to Report inappropriate videos. Related videos are also shown, as well as metadata about the video as described below:

  •  Title—a title chosen by the user who uploads the video.
  •  Description—a detailed description provided by the user when first uploading the video.
  •  Username—the poster's username and icon, which links to their channel (see Section 13.3.2).
  •  Tags—chosen by the user to describe the video so searchers can more easily find it. These are not shown publicly on video pages, but can be viewed in the page source or with third-party tools such as the vidIQ Vision for YouTube Chrome plugin.
  •  Category—chosen by the user from a closed list of categories provided by YouTube.
  •  Views, data, and statistics—YouTube provides the number of views the video generated, as well as the number of likes and dislikes, and the date of publication. YouTube Studio beta provides many additional stats for videos you have posted (e.g., location of viewers, viewer trends over time).
  •  Comments—comments about the video by users. Comments can be threaded (see Chapter 10), pinned to the top, and liked/disliked by other users.

13.3.2 The user channel

Similar to other social networks, YouTube users can create personal profiles called “channels,” which are customizable. Users can choose what information to share with other viewers and which sections to display on their channel. Sections allow you to display videos (e.g., highlight popular or recent or linked videos), show a playlist, show a series, or highlight recent activity. They can also be reordered on the page. Thus, users’ pages can be strikingly different: some users prefer not to disclose any personal information, whereas others exhibit their social relations and detailed personal information publicly. Most users display information about themselves (e.g., name, age, location, the date they joined YouTube and the last date they logged in, links to other personal websites, and alternative means of communication—email, instant messaging (IM), or Facebook account), although this can be limited with privacy settings.

Users have the option to display two different social networks via the Channels section of the page:

  1. Featured Channels. These are other YouTube channels that the current channel wants to highlight. They are often related channels or ones that the current user values highly. They create a directed (asymmetrical) edge that points from the current user’s channel to each featured channel that is listed. YouTube limits this to 100 featured channels, and many YouTubers only include a handful of them.
  2. Subscriptions. These are other YouTube channels that the current channel is subscribed to. Only those that are listed as public will be listed on the Subscriptions section of the Channels page. Like Featured Channels, these are directed links that point from the current user’s channel to each user’s channel that they are subscribed to.

With effort and time, a dedicated reader or researcher can piece together information that forms an interesting picture of a user’s activity and preferences.

13.4 Networks in YouTube

YouTube’s rich collection of users, videos, comments, subscriptions, featured channels, tags, ratings, and favorite videos offers multiple ways networks can be formed. Broadly, video networks are different from user networks in both content and structure. Within these networks are several subnetworks that provide insights into the important people, videos, and events in these video-sharing networks. Interesting connections between people and content can be found. Examples of these different networks are presented later in this chapter, but first you’ll need to understand the attributes of each network.

13.4.1 Video networks

Several networks can be constructed that connect videos to other videos using the attributes found on video pages:

  •  Videos that share the same descriptors. When users upload videos to YouTube, they must provide video content descriptions, including a title and tags or related keywords. Videos can also be classified according to predefined categories, such as comedy, music, education, politics and news, people and blogs, how-to and style, and so on. Videos that share the same descriptors may also share the same type of content; however, as users are the ones who assign video descriptors, the descriptors can vary widely and you cannot assume content similarity in videos that share descriptors. The emergent collection of connections among videos has varying degrees of density, from tightly knit networks of topical videos that discuss similar content to a dispersed network of videos that have little in common. These content networks can be created using data captured in the Tags, Author, Description, and Title fields that are populated when using the Import From YouTube Video Network as described in Section 13.6.1. However, you will also need to apply techniques described in Chapter 8 to transform data in those columns into these types of networks.
  •  Shared comment networks. Users leave textual comments about videos they have watched, often producing lively discussions. Videos can be connected to one another when the same person (or people) comment on them. For example, if Marc Smith comments on a NodeXL Intro video and a NodeXL New Feature video, an undirected edge would then connect the two videos. If Derek Hansen also comments on those two videos, the edge would have a weight of 2. This type of network can be imported using the Import From YouTube Video Network as described in Section 13.6.1.
  •  Related videos. A list of related videos is adjacent to each selected video. These lists are based on YouTube algorithms. The current version of NodeXL does not have a way to capture these networks.
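The shared comment network described above can be illustrated with a minimal Python sketch. It is not how the NodeXL importer is implemented; the function name `shared_commenter_edges` and the sample data are hypothetical, and the sketch assumes that the weight of an edge counts the distinct users who commented on both videos, as in the Marc Smith and Derek Hansen example.

```python
from collections import Counter
from itertools import combinations


def shared_commenter_edges(comments):
    """Build weighted undirected video-video edges from (user, video) pairs.

    Two videos are linked when the same user commented on both; the weight
    counts how many distinct users did so (an assumption of this sketch).
    """
    videos_by_user = {}
    for user, video in comments:
        videos_by_user.setdefault(user, set()).add(video)

    weights = Counter()
    for videos in videos_by_user.values():
        # sorted() gives every undirected pair a single canonical ordering
        for pair in combinations(sorted(videos), 2):
            weights[pair] += 1
    return dict(weights)


# Hypothetical comment records: (commenter, video title)
comments = [
    ("Marc Smith", "NodeXL Intro"),
    ("Marc Smith", "NodeXL New Feature"),
    ("Derek Hansen", "NodeXL Intro"),
    ("Derek Hansen", "NodeXL New Feature"),
    ("Derek Hansen", "NodeXL Intro"),  # a repeat comment adds no extra weight here
]
edges = shared_commenter_edges(comments)
print(edges)
```

With these records the two videos are joined by a single edge of weight 2, one unit for each of the two commenters.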

13.4.2 Users’ networks

In contrast to video networks, which focus on content, user networks focus on connections between users. User networks can be explicit or implicit. The direct request or action of at least one user creates explicit networks. Users take the effort to click to create “subscription” networks and display those connections on their channel. Or, users choose to add another user’s channel to their Featured Channel list. Implicit networks are created when two or more users interact through comments, ratings, and favoriting on user and video profile pages. Of the three, only comments are visible to external users of YouTube in personally identifiable form; ratings and favoriting are anonymized and summed to a single value. When one user comments on another’s channel profile, an implied connection is created between them. Not all, or even most, comments are responded to, but all create a connection and allow other users to exhibit their interest in that user and the content he or she provides. The NodeXL Import From YouTube User’s Network option described in Section 13.6.2 allows you to capture the network of user subscriptions. Currently, other user-to-user networks, such as a comment reply network (see Chapter 10), cannot be captured using the importer. However, it is possible to create a user-to-user network that connects users who have commented on the same videos using data from the Shared Commenter field after using the Import From YouTube Video Network described in Section 13.6.1. This will require significant data preparation, which is not covered in this chapter. Exploring user networks can provide insights into the overall structure of video collections and key videos and users who occupy critical positions in that network, as well as contextual connections among videos and users (Figure 13.1). You will learn in this chapter about the characteristics of these networks.

Figure 13.1 A YouTube user's 1.5 ego subscription network.

Understanding the nature and structure of the relationships and ties within YouTube networks can help you understand important interaction patterns and information flows within the networks.

13.5 Hubs, groups, and layers: What questions can social network analysis of YouTube answer?

YouTube does not easily reveal its underlying network structure. The interface displays individual leaves and branches but not the larger forest of connections it contains. Before deciding which data to collect and analyze, you have to step back and conceptualize the questions that are of interest. Once formulated, questions may be pertinent to both video and user networks.

13.5.1 Video networks

  1. Centrality. Which videos are central within a category/type of videos? Which videos generate many comments, response videos, and higher ratings? These videos and users may influence the content produced in other videos and attract many relationships (i.e., subscriptions) with people who share an interest in that content. Some videos are central to a specific category, whereas others are peripheral. Are there differences between a single video and a series of videos produced by the same user? (Do series increase the overall popularity of individual videos? Can a single video be as pivotal as a series of videos?)
  2. Groups. Does the network contain hubs of densely interconnected videos that share properties like common tags and descriptors? Which videos are central to those hubs? Is their centrality correlated to other attributes? Are different hubs connected to each other? Which are the boundary videos that connect such hubs? How dense are these hubs? How do they compare with other types of social content?
  3. Temporal comparisons. How does a video network evolve over time? What affects its development? Are certain descriptors, tags, topics, and types of videos crucial to the evolution of the network? What is the effect of rapidly and widely exchanged viral videos on the development of the video and user network? Do these videos disrupt or reinforce existing networks, or is the effect of viral videos visible mostly outside YouTube’s boundaries? What changes occur when a video becomes popular?

13.5.2 User networks

  1. Centrality. Which users are central in the network of connected YouTube users? Some users may be central in a specific category but not in others. Is centrality an outcome of the explicit or implicit networks? Which users are boundary spanners between different parts of the networks? Which are peripheral? Can you identify rising YouTube stars?
  2. Groups. How do users link together to form emergent groups? What brings them together (e.g., a certain interest, a topic, or another reason)? How do the populations of subscribers and featured channels overlap? Implicit groups can be found and compared with explicit groups. Are there central and peripheral groups? Do subscribers-of-subscribers belong to the same groups? How dense are these groups?
  3. Temporal and structural comparisons. How and why does the popularity of users change? How do users move from being peripheral to central and vice versa? Are boundary spanners changing over time? How do external circumstances affect users, their popularity, and their networks? How do the video, subscription, and friendship networks align? What are the differences between a user’s subscription and friendship network? Which is denser? Which is larger? How does a change in a video’s popularity affect these networks? Are there differences between the explicit networks and the implicit ones? How do they affect each other?

Some of YouTube’s features, such as favoriting, number of views, and lists like “Most viewed,” “Most subscribed to,” or categories such as “Rising videos” can give you an understanding of popularity trends. But you cannot learn about information flows, centrality, and subnetwork structures from these features alone. For that, make use of NodeXL's network analysis metrics, coupled with network graph visualizations.

13.6 Importing YouTube data into NodeXL

To import data from YouTube into NodeXL, first select which network you are interested in and what type of data to import using one of the data Import options in the NodeXL Data menu. When deciding what data to import, remember that YouTube differentiates between videos and users. Although networks of both users and videos can be imported and integrated into the same NodeXL network file and later compared, they cannot be imported simultaneously.

13.6.1 Importing video data

From the NodeXL ribbon, choose Import then From YouTube’s Video Network. A dialog box, similar to Figure 13.2, will be displayed.

  •  In the first field, type your search string. It can be a single word or a full Boolean string, similar to your Twitter data collection (see Chapter 11).
  •  Next, select: Pair of videos commented by the same user (slower). Two videos will then be connected if at least one user posted a text comment on both. It is helpful to think of a link as a shared interest, because at least one user expressed interest in both videos by commenting on each.
  •  You may want to limit the number of videos, the number of top-level comments, and the number of replies in discussion threads to consider when identifying co-commenters. Use the checkboxes and numbers in the bottom three fields to do this.

Figure 13.2 Import from YouTube video network.

Your decision about which data to import depends on your research question. Some options will substantially slow the import process and will require further filtering of the data in order to analyze it later. You may need to use the filters to set an upper limit to the number of videos that will be imported in any data import. This is very useful when considering the size of YouTube and the staggering number of videos and the limits of desktop spreadsheet programs.

You are encouraged to collect your own data, but if you prefer, you can download the sample dataset based on the search term “eye shadow” AND “makeup” used in this chapter from here: https://nodexlgraphgallery.org/Pages/Graph.aspx?graphID=175213

13.6.2 Importing user data

From the NodeXL ribbon, choose Import then From YouTube User’s Network. A dialog box, similar to Figure 13.3, will appear. This option allows you to import a network associated with YouTube users. In the search box you can indicate either a user or a channel as the Seed of your ego network:

Figure 13.3 Importing user’s network.

You can then select the network Levels to Include. A level is a step in the network from one node to another.

  •  A single level starts with a target user and takes a single step out to all of their subscribers or subscriptions. A second level includes data about all of the friends of their friends.
  •  Taking a half step back, you can construct a 1.5 level network, which limits the list to the ties among first level subscribers. A tie from a friend to someone who is not the selected user’s friend will not appear while ties among their mutual friends do appear.
  •  The higher the number of levels you choose to go into the network to collect data, the longer it will take to import the data from YouTube into NodeXL. In most cases, asking the spigot to bring back 1.5 levels of the network is sufficient to answer many common questions. You can also limit the number of users (vertices) that will be imported, if needed, to extract a dataset of manageable size.
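The difference between the 1, 1.5, and 2 level options can be made concrete with a short Python sketch. It is an illustration only, not the importer's own logic: the function `ego_network`, the vertex names, and the reading of level 2 as "every edge that touches the seed or one of its direct neighbors" are assumptions made for this example.

```python
def ego_network(edges, seed, level):
    """Return the directed edges of a 1, 1.5, or 2 level ego network."""
    if level not in (1, 1.5, 2):
        raise ValueError("level must be 1, 1.5, or 2")
    edges = set(edges)

    # Level 1: only the edges that touch the seed itself.
    level1 = {(a, b) for a, b in edges if seed in (a, b)}
    if level == 1:
        return level1

    # The seed's direct neighbors ("alters").
    alters = {v for edge in level1 for v in edge} - {seed}
    if level == 1.5:
        # Add ties among the direct neighbors, but nothing further out.
        return level1 | {(a, b) for a, b in edges if a in alters and b in alters}

    # Level 2: every edge that touches the seed or one of its neighbors.
    core = alters | {seed}
    return {(a, b) for a, b in edges if a in core or b in core}


# Hypothetical subscription edges, pointing from subscriber to channel.
edges = [
    ("ann", "seed"),
    ("bob", "seed"),
    ("ann", "bob"),   # tie between two first-level users
    ("bob", "cat"),   # reaches a second-level user
    ("cat", "dan"),   # beyond two levels
]
for level in (1, 1.5, 2):
    print(level, sorted(ego_network(edges, "seed", level)))
```

In this toy data the 1.5 level network adds only the ann-to-bob tie, while the 2 level network also brings in cat; the edge from cat to dan lies outside all three.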

You are encouraged to collect your own data, but if you prefer, you can download the sample dataset based on the user GameGrumps used in this chapter from here: https://nodexlgraphgallery.org/Pages/Graph.aspx?graphID=175211

Advanced topic

Using pre-prepared data

If you would prefer working with data that is not imported into NodeXL through the data imports, this data can be gathered independently and opened directly in Excel and imported into NodeXL (see Chapter 4).

YouTube requires use of the YouTube API to gather data. This is maintained as part of the Google Code system and documentation is available at https://developers.google.com/youtube/v3/. The API describes how data can be accessed from YouTube. It is based on the Google Data Protocol, a REST-inspired protocol for accessing the data over the web. This means data can be accessed as precisely formatted, machine-readable content over the web and through a wide variety of programming languages.

You can write your own code to gather YouTube data through this API. You may want to crawl a network, beginning with a specific user, moving outward through the network. You may also want to send queries and create connections between users or videos based on criteria different from those available in the NodeXL YouTube data spigot.

Once you have gathered the data, it can be loaded into Excel and formatted for use in NodeXL.
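As a rough illustration of the crawl described above, the following Python sketch walks outward from a seed user breadth-first and writes the resulting edge list as CSV text that could be opened in Excel. Everything specific here is an assumption: `get_neighbors` is a hypothetical stand-in for whatever API call you write to return a user's subscriptions (an in-memory dictionary plays that role below), and the column headers are chosen for illustration only.

```python
import csv
import io
from collections import deque


def crawl(seed, get_neighbors, max_depth=2, max_vertices=100):
    """Breadth-first crawl outward from seed; returns directed edges."""
    seen = {seed}
    queue = deque([(seed, 0)])
    edges = []
    while queue:
        user, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand users at the outermost level
        for other in get_neighbors(user):
            if other not in seen:
                if len(seen) >= max_vertices:
                    continue  # cap reached: skip users not yet collected
                seen.add(other)
                queue.append((other, depth + 1))
            edges.append((user, other))
    return edges


# Hypothetical stand-in for real API responses.
SUBSCRIPTIONS = {"seed": ["a", "b"], "a": ["b", "c"], "b": [], "c": ["d"]}

edges = crawl("seed", lambda user: SUBSCRIPTIONS.get(user, []), max_depth=2)

# Write the edge list as CSV text; use open(..., "w", newline="") for a file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Vertex 1", "Vertex 2"])  # assumed column headers
writer.writerows(edges)
csv_text = buffer.getvalue()
print(csv_text)
```

Limiting both the depth and the number of vertices keeps the crawl to a manageable size, which matters given how quickly a YouTube network grows with each additional level.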

13.6.3 Ethical considerations

YouTube users can choose to make certain segments of the information on their channel private. They can decide that their subscribers lists are private, approve or delete comments, and send private messages to other users. When importing data from YouTube, it is important to consider that data may be missing some components because of privacy preferences (for instance, complete subscription lists will not be imported into NodeXL if they are defined as private). In addition, ethical considerations stem from using other people’s data, even when publicly available. When looking at a user’s data or performing social network analysis on them, researchers should act carefully and with respect to users’ expectations of privacy.

Because not all users are aware of privacy settings and related considerations, researchers should demonstrate care when dealing with information that may be perceived as private, such as personal ties and other personally identifying information. In the case of YouTube, there is the danger of inadvertently disclosing extremely personal, embarrassing, or sensitive information. Putting together a name, face, and an opinion may be more revealing than many users expect. Anonymizing data can alleviate some of these concerns; however, it is problematic in a network that offers many facets of extensive data about the users. Therefore the rich metadata offered along with YouTube videos should be handled with care and with respect to the users behind it.

13.6.4 Problems with YouTube network data

Data collected for analysis in NodeXL, either through the importers or through your own code written with the API, is not necessarily complete or accurate. First, the YouTube data API is not 100% reliable when used by either your own code or the NodeXL importers. Because it accesses data over the web, requests may be lost or timed out. Thus, the same query may yield different results at different times. Second, even without errors, imported data may not reveal a complete network. Videos and user profiles may be marked private, preventing them from being accessed and included in the analysis. Third, users can choose to remove a video they uploaded at any time, and YouTube can remove a video when it violates the site’s terms of use or is flagged for review by other users. However, there is a delay between the time a video was removed from the site and the time it will stop appearing in search results. This can cause data previously accessed to be incorrect—a video may appear as a vertex in the network but will not actually remain a part of it in the current data on YouTube.
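If you write your own collection code, one common way to soften the effect of lost or timed-out requests is to retry with an increasing delay. The sketch below is a generic pattern, not part of NodeXL or of any YouTube client library; `fetch_with_retry` and `flaky_request` are hypothetical names, and the stand-in request simply simulates two timeouts. Retrying does not address the other problems listed above, such as private or removed videos.

```python
import time


def fetch_with_retry(fetch, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return fetch()
        except (TimeoutError, ConnectionError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


# Demonstration with a stand-in request that times out twice, then succeeds.
calls = {"count": 0}


def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated timeout")
    return {"items": []}


waits = []  # record the delays instead of actually sleeping
result = fetch_with_retry(flaky_request, sleep=waits.append)
print(result, waits)
```

Passing the sleep function as a parameter keeps the example fast to run; in real use the default `time.sleep` would pause between attempts.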

13.7 Preparing YouTube network data

Using the YouTube video network importer will result in collecting multiple edges for the same pairs of vertices (i.e., videos). As a result, it is possible for videos to be connected to each other more than once. When the data is imported, the Relationship column on the Edges worksheet will say Shared commenter, since that is the type of edge. The importer creates duplicate edges, one for each comment. Even multiple comments from the same user will create duplicate edges (with unique content, but the same videos in the Vertex1 and Vertex2 columns). This is so that the content of each comment can be viewed (see the Video1 Comment and Video2 Comment columns). In such cases, you can use the Count and Merge Duplicate Edges feature (available in the Prepare Data drop-down in the NodeXL ribbon) to present a clearer picture of the relationships, allowing you to visually and statistically represent the data accurately. Clicking on the first two options will compare each edge and find the duplicates and then merge them into one connection. The connection will be weighted according to the number of duplicate relationships that were merged. A column containing the edge weight is added to the NodeXL Edges worksheet.
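The effect of counting and merging duplicate edges can be sketched in a few lines of Python. This is an illustration of the idea rather than NodeXL's implementation; the function name and the video identifiers are hypothetical, and the sketch assumes the edges are undirected, so that A-B and B-A count as the same pair.

```python
from collections import Counter


def merge_duplicate_edges(edges):
    """Collapse repeated undirected edges into one edge with a weight."""
    # Sorting each pair makes (A, B) and (B, A) the same key.
    weights = Counter(tuple(sorted(pair)) for pair in edges)
    return [(v1, v2, weight) for (v1, v2), weight in sorted(weights.items())]


# Hypothetical raw edges, one per shared comment.
raw_edges = [
    ("vidA", "vidB"),
    ("vidB", "vidA"),
    ("vidA", "vidB"),
    ("vidA", "vidC"),
]
merged = merge_duplicate_edges(raw_edges)
print(merged)
```

Here the three edges between vidA and vidB collapse into one edge of weight 3, while the single vidA-vidC edge keeps a weight of 1.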

13.8 Analyzing YouTube networks

13.8.1 User networks

YouTube users (or “YouTubers”) are content providers, drawing other users, viewers, and participants to the videos they produce. However, YouTube users do not limit their activity on the site to video content production. The massive amount of social interaction that occurs on YouTube strengthens the idea that the site is host to a complex and lively community of users and participants and is more than just a platform through which incidental viewers can watch videos [34].

YouTube user networks are important for several reasons. (1) Users are the essence of the YouTube community; the way they congregate and interact may reveal the flow of information in the YouTube network and highlight the importance of certain users to the community or to a specific sub-community of users. (2) Differences between users’ networks can demonstrate different participation patterns and aid in improving the interface to accommodate various audiences. (3) These networks can also be used by companies and organizations wishing to use the YouTube platform for advertising, lobbying, or disseminating information.

Users’ egocentric networks tell an interesting story about the type of ties created among users who subscribe to both the central user being analyzed and other users that subscribe to each other. The first network you will analyze in this chapter belongs to the popular YouTube video game content creators “Game Grumps” (see file link provided above). The “Game Grumps” are a small team of creators whose primary use for YouTube is to upload and share “Let’s Play” videos of themselves playing video games. To see the network, use the Import From YouTube User’s Network option in the NodeXL ribbon as shown in Figure 13.3. Selecting 1.5 levels will retrieve both users who have ties to the Game Grumps and the ties among themselves. This will allow you not only to analyze users’ relations with the user at the center of the egocentric network but also to find any hubs of relationships that exist among the other users.

In the Edges worksheet tab you can see each pair of vertices and a new column titled Relationship, which is added to the standard NodeXL Edges worksheet and filled according to the type of YouTube relationship that connects these vertices. A quick look at the relationship column (Figure 13.4) will reveal that there are only Subscribed To relationships between the vertices. Note that in the Vertex 1 and 2 columns you will find User IDs. The information about each user is located in the Vertices spreadsheet, as a user—a vertex—is the unit of analysis.

Figure 13.4
Figure 13.4 User’s network Edges spreadsheet showing User IDs and a Subscribed To relationship type.

Select the Vertices spreadsheet to learn more about the users in the network. While the Vertex column uses the unique YouTube ID of the user, you can find user information in the Title and Description columns (see Figure 13.5). You will return to these spreadsheets as you start calculating social network metrics and visualizing the graphs.

Figure 13.5
Figure 13.5 User's network Vertices spreadsheet showing the User ID, Title (i.e., username), and Description columns.

For now, you will create a visual display of the egocentric network that surrounds the Game Grumps. It is useful to calculate network graph metrics that will describe the shape of the graph and describe the location of each person or video within it. You can then move on to explore the characteristics of the graphic display of the network.

Click on Show Graph and look at the initial visual display (Figure 13.6). Each vertex represents a user in the network that subscribed to Game Grumps. Each edge captures a subscription-relationship between two users. Because this is a directed network, there are arrows indicating which user follows which user. In order to make better sense of the network, you can highlight key vertices, based on their Vertex-level metrics.

Figure 13.6
Figure 13.6 User's network—initial graph.
  •  Calculate vertex in-degree, out-degree, PageRank, betweenness centrality, vertex reciprocated vertex pair ratio, and edge reciprocation using Graph Metrics.
  •  Use Autofill Columns to set Vertex Color to Out-Degree, Vertex Size to Betweenness Centrality, and Vertex Label to Title. See Figure 13.7.
  •  Set Vertex Shape to Label, using the Graph Options Pane. It appears on the top-right side of your Graph pane (Document Action). See Figure 13.8. If you wish, you can change other characteristics of Vertices and Edges, such as the Curvature of the edges.
    Figure 13.7
    Figure 13.7 AutoFill columns.
  •  Click Refresh Graph (Figure 13.9).
Figure 13.8
Figure 13.8 Graph options.
Figure 13.9
Figure 13.9 User's network—customized visualization.
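
To make the vertex-level metrics in the steps above more concrete, here is a minimal Python sketch that computes in-degree, out-degree, and a reciprocated vertex pair ratio from a directed edge list. The ratio is taken here as the share of a vertex's neighbors that are connected to it in both directions; that definition is an assumption of this sketch, and NodeXL's own calculation may differ in detail. The function name and sample data are hypothetical.

```python
from collections import defaultdict


def degree_metrics(edges):
    """Compute simple per-vertex metrics from directed (source, target) edges."""
    out_nbrs = defaultdict(set)
    in_nbrs = defaultdict(set)
    for source, target in edges:
        out_nbrs[source].add(target)
        in_nbrs[target].add(source)

    metrics = {}
    for vertex in set(out_nbrs) | set(in_nbrs):
        outgoing = out_nbrs.get(vertex, set())
        incoming = in_nbrs.get(vertex, set())
        neighbors = outgoing | incoming
        mutual = outgoing & incoming  # neighbors tied in both directions
        metrics[vertex] = {
            "in_degree": len(incoming),
            "out_degree": len(outgoing),
            "reciprocated_ratio": len(mutual) / len(neighbors) if neighbors else 0.0,
        }
    return metrics


# Hypothetical subscriptions: "ego" and "a" subscribe to each other.
edges = [("ego", "a"), ("a", "ego"), ("ego", "b"), ("c", "ego")]
metrics = degree_metrics(edges)
```

In this sample, "ego" has three neighbors but only one mutual tie, so its ratio is one third, while "a" has a ratio of 1.0.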

Let us examine the graph (Figure 13.9). Each vertex is labeled with its user's title, as you designated earlier. Examine the Vertices spreadsheet and find the vertices (users) with the highest out-degree and betweenness centrality. You can use the sort options to organize these columns from largest to smallest. Now, examine the graph. At the center you will find the seed for this ego-network, the GameGrumps YouTube account. You designated vertex color to be associated with out-degree, so it has the darkest color, as it subscribed to the largest number of users in the network (not a surprise, as you collected data about this user’s subscriptions). You set vertex size to correspond with betweenness centrality, and as expected GameGrumps has the highest betweenness centrality, as it is connected to all users and they are all connected to others either directly or through this account. Looking at the rest of the users, you can see that users at the core of the network are more connected: their higher number of subscriptions (out-degree) gives them a darker color, and their higher betweenness centrality values determine their larger sizes. In contrast, at the periphery you will find the smaller and brighter vertices. The graph and the spreadsheets are linked, so if you select a vertex in the Vertices spreadsheet it will highlight the corresponding vertex in the graph and vice versa. You may consider removing GameGrumps from the network, since including it can obscure connections between others in the graph. This will lose some information (i.e., who subscribes to or is subscribed to by GameGrumps), but it may be worth it to remove the edges connecting GameGrumps to all other vertices in the network.

13.8.2 Video networks

The second layer of networks that can be found on YouTube consists of the content-related networks that stem from the various kinds of links between videos uploaded on the site. These networks are less about personal affinity or depiction of preexisting relationships and more about shared topicality and thematic association. Understanding video networks can offer you insights about several important happenings on YouTube, from the phenomenon of viral videos to institutionalized information dissemination, including how independent and sponsored videos are connected to each other and how users form these connections and react to them. These insights can help others fathom the way YouTube is used for different purposes, and guide their own engagement in this community.

13.8.3 The YouTube “makeup” video network

In this section you will focus your attention on an example of videos related to “makeup” and eye shadow (see file link earlier). Videos tagged with the word “makeup” may come from several sources: some are cosmetics companies’ efforts to extend their marketing to reach viewers online, some videos are from makeup professionals who promote themselves by providing tutorials to the masses, others are created by teenage girls who share their first experimentations with cosmetics. “Makeup” is one of the most popular topics on YouTube, with millions of videos related to the topic and an extensive presence in various categories.

To start analyzing the YouTube makeup video network, look for separate or overlapping groups or cliques that users create around the shared terms. Some uses of a specific term are distinct from each other (e.g., “relationship” AND “makeup”), whereas other terms blend together (“eye shadow” AND “makeup”). Some videos are clearly personal or amateur, whereas others are professional productions.

Start by importing the video network data into the NodeXL workbook using the data shown in Figure 13.2 earlier in the chapter. Make sure to check the first box, which will create edges between videos that have comments by the same person. Although NodeXL can import large numbers of videos, because of the popularity of the topic and the huge number of related videos, limit retrieval to 300 videos, 200 for top-level comments and 100 for replies (Figure 13.2). Alternatively, you can download the file linked to earlier from the NodeXL Graph Gallery.
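
To illustrate what a "shared commenter" edge means, the following Python sketch links every pair of videos that received a comment from the same person. It is a simplified stand-in, not the NodeXL importer: the function name, the input format, and the sample data are assumptions, and the importer may count repeated comments differently.

```python
from collections import defaultdict
from itertools import combinations


def shared_commenter_edges(comments):
    """Build undirected video-to-video edges from (video_id, commenter) pairs.

    One edge is emitted per commenter for each pair of videos that the
    commenter commented on, so the same pair of videos can appear many times.
    """
    videos_by_commenter = defaultdict(set)
    for video_id, commenter in comments:
        videos_by_commenter[commenter].add(video_id)

    edges = []
    for commenter, videos in sorted(videos_by_commenter.items()):
        for video_1, video_2 in combinations(sorted(videos), 2):
            edges.append((video_1, video_2, commenter))
    return edges


# Hypothetical comments: "ann" commented on three videos, "bob" on two.
comments = [
    ("v1", "ann"), ("v2", "ann"), ("v3", "ann"),
    ("v1", "bob"), ("v2", "bob"),
    ("v3", "cat"),
]
edges = shared_commenter_edges(comments)
```

Note that v1 and v2 are linked twice, once through each shared commenter. Duplicate edges of this kind are why a topical video network can grow very dense.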

When you search for the keywords typed into the search box, the application looks in all the possible search fields combined. Currently, the YouTube API does not allow for distributed searches that differentiate between various fields (e.g., header, description, tags, and category). Therefore, some of the videos will have the keyword “makeup” in their header or their description, and some will not. However, all videos (vertices) will be associated with the “makeup” and “eye shadow” keywords in at least one of the searched fields.

After importing the YouTube makeup video network into NodeXL, the workbook will include the following: an Edges worksheet that includes pairs of vertices along with a column that describes their Relationship (in our case all edges will be connected by Shared commenter), and a Vertices worksheet that will be populated with information about each of the individual vertices (i.e., videos). This includes useful information such as the number of views, comments, likes count, dislikes count, creation date (UTC), title, description, author, tags, and links to the actual videos. Select the Show Graph button in the NodeXL Graph menu to display a visualization of the set of connections among the population of “makeup” videos. The first look at a YouTube video network can be daunting. What can this blob in Figure 13.10 mean?

Figure 13.10
Figure 13.10 YouTube “makeup” AND “eye shadow” video network in its raw form.

Many shared commenter networks are densely interconnected, since there are people who comment on many of the videos related to a certain topic. Using a range of tools in NodeXL, you can look at deeper layers within this network, filtering obscuring details to reveal interesting things. The first step in this process is to prepare the data, first by using the Count and Merge Duplicate Edges feature in the Import menu (Figure 13.11). In the case of video networks this is especially important as multiple ties can be repeatedly created based on the actions of only a couple of active commenters. In our example, the makeup video network, the Merge Duplicate Edges feature reduced the network from more than 16,000 edges to only 775 unique edges. Note that you are not losing any network data, though you do lose some of the content of the comments in the Other Columns. The number of duplicate edges appears in a new column in the Edges spreadsheet titled Edge Weight.

Figure 13.11
Figure 13.11 Count and merge duplicate edges.
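
The effect of counting and merging duplicate edges can be sketched in Python as follows. This is an illustration of the idea only, assuming an undirected network in which (v1, v2) and (v2, v1) describe the same tie; the function name and sample edges are hypothetical.

```python
from collections import Counter


def merge_duplicate_edges(edges):
    """Collapse repeated undirected edges into unique edges with a weight.

    The weight of each unique edge is the number of times it occurred.
    """
    # Sorting each pair makes (v1, v2) and (v2, v1) count as the same edge.
    weights = Counter(tuple(sorted(pair)) for pair in edges)
    return dict(weights)


# Hypothetical raw edge list with repeated ties between v1 and v2.
raw_edges = [("v1", "v2"), ("v2", "v1"), ("v1", "v2"), ("v1", "v3")]
edge_weights = merge_duplicate_edges(raw_edges)
```

Four raw edges become two unique edges, and the number of occurrences is preserved as the edge weight, which mirrors the role of the Edge Weight column.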

Once the dataset is prepared, it is ready for the creation of network metrics. Compute the metrics relevant to undirected networks using the Graph Metrics menu option. Summary metrics about the network are reported on the Overall Metrics worksheet. While an analysis of some of the metrics and the basic visualization can provide some insights, the network is difficult to decipher, and the visualization gives us only a point from which to begin our exploration. Filtering the network, based on Edge Weight, can remove some of the excessive data, especially the peripheral videos that do not belong to the core of the network.

Looking at the range of Edge Weights you can see that it is quite wide, ranging from 1 to 8429. This suggests that some pairs of videos have only one shared commenter, while others have many. Remember from Chapter 7 that there are two approaches to filtering in NodeXL: one (Dynamic Filters) operates on the display of vertices and edges in the visualization pane, and the other operates on the spreadsheet data rows that feed the graph visualization. Unlike Dynamic Filters, when an edge or a vertex is Skipped using Autofill Columns it will not be read into the graph visualization, and clicking on related edges or vertices will not display it. Filtering at the spreadsheet level can be useful for reducing the size of the data sent to computation intensive tasks like the calculation of metrics, clusters, and layouts. Data filtered at this level will never appear in the graph display no matter how the Dynamic Filters are set; you have to be careful about filtering this way and not exclude important parts of the network graph. In this case, using the Autofill Columns option is preferred, as at a later stage you can recalculate the network metrics to include only its core components.

Use Autofill Columns to display the desired resolution of the network. To do that, select Edge Visibility and base it on Edge Weight. In the right-hand Options tab, select an edge weight starting from 2 (Greater than 1) as shown in Figure 13.12. Also map Edge Width to Edge Weight with Options that make the maximum width 2 and use a logarithmic mapping.

Figure 13.12
Figure 13.12 Autofill Columns with Edge Visibility Options dialog also open and set to Greater than 1.
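
The two Autofill settings described above, a visibility threshold and a logarithmic width mapping, can be approximated in Python. The formula below is one plausible way to map weights to widths on a log scale; it is an assumption of this sketch rather than NodeXL's documented formula, and the minimum width of 1 is also assumed. Function names and sample weights are hypothetical.

```python
import math


def visible_edges(weights, threshold=1):
    """Keep only edges whose weight is greater than the threshold."""
    return {pair: w for pair, w in weights.items() if w > threshold}


def log_width(weight, w_min, w_max, width_min=1.0, width_max=2.0):
    """Map an edge weight to a display width on a logarithmic scale."""
    if w_max == w_min:
        return width_min
    fraction = (math.log(weight) - math.log(w_min)) / (math.log(w_max) - math.log(w_min))
    return width_min + fraction * (width_max - width_min)


# Hypothetical merged edge weights.
weights = {("v1", "v2"): 1, ("v1", "v3"): 2, ("v2", "v3"): 8, ("v3", "v4"): 32}
shown = visible_edges(weights)
w_min, w_max = min(shown.values()), max(shown.values())
widths = {pair: log_width(w, w_min, w_max) for pair, w in shown.items()}
```

With a logarithmic mapping, the edge of weight 8 lands halfway between the minimum and maximum width even though 8 is much closer to 2 than to 32. This keeps a few very heavy edges from visually flattening all the others.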

At this stage you can see multiple isolates that clutter the visualization. An easy way to de-clutter the graph is by applying a different layout for the graph. From the graph pane layout algorithm drop-down menu select Layout Options, and then choose the third option to lay out the smaller connected components at the bottom of the graph. Then choose the Harel-Koren Fast Multiscale layout algorithm and choose Refresh Graph. This will position the vertices in a much more meaningful way.

Once the filtering has presented a workable visualization, the next step is to find clusters (i.e., groups) within the network. You can often recognize clusters in networks visually, but clusters can also be automatically identified as described in Chapter 7. This feature creates a set of groupings of vertices based on their patterns of interconnection. It finds some obvious clusters but can also identify more subtle distinctions that may not be visually obvious. Click on the Groups drop-down menu in the NodeXL ribbon and choose Group by Cluster, selecting the Wakita-Tsurumi clustering algorithm from the list and checking the Put all neighborless vertices into one group option. Then, choose Refresh Graph to exhibit the clusters. Look carefully at the clusters that were created: NodeXL assigns each cluster a shape and color, and sometimes different clusters may share the same or a similar color. If, as in this case, it is hard to differentiate between clusters because of the color similarity, use the Groups worksheet to manually set the group colors or shapes.
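
The Wakita-Tsurumi algorithm belongs to the family of greedy methods that search for a grouping with a high modularity score. The sketch below does not implement that algorithm; it only computes the modularity of a grouping you supply, for an undirected and unweighted network, to show what such methods try to maximize. The function name and the sample network are hypothetical.

```python
def modularity(edges, groups):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges:  list of (u, v) pairs, each undirected edge listed once.
    groups: dict mapping every vertex to its group label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = {}    # number of edges with both ends inside each group
    degree_sum = {}  # total degree of the vertices in each group
    for u, v in edges:
        degree_sum[groups[u]] = degree_sum.get(groups[u], 0) + 1
        degree_sum[groups[v]] = degree_sum.get(groups[v], 0) + 1
        if groups[u] == groups[v]:
            internal[groups[u]] = internal.get(groups[u], 0) + 1
    return sum(
        internal.get(g, 0) / m - (d / (2 * m)) ** 2
        for g, d in degree_sum.items()
    )


# Hypothetical network: two triangles joined by a single edge (c-d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_groups = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
one_group = {v: 1 for v in two_groups}
q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
```

Splitting the sample network into its two triangles scores higher than lumping every vertex into one group, which matches the visual impression of two clusters.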

The resulting graph should be fairly readable. However, you may want to split groups into distinct boxes by using the Layout Options second choice Lay out each of the graph's groups in its own box (Figure 13.13). Calculate Graph Metrics again, this time including Group metrics.

Figure 13.13
Figure 13.13 YouTube video network displayed by groups with group labels manually added after content analysis of the video descriptions.

Compare the groups visually on the graph. Next, use the metrics presented on the Groups worksheet to compare them numerically. Notice that some of the groups have high density, while others have low density. Navigate to the Vertices worksheet and sort on the Vertex Group column. Then read through the Author, Title, and Description columns to better understand how the videos are related. You can also right-click on any vertex in the graph pane and choose the Play Video in Browser button to open the video in YouTube. This content analysis can help you compare different groups to identify common themes. For example, some groups are based on regions (e.g., India or Pakistan), while others cover introductory tutorials versus more advanced tutorials. Add titles to the groups if desired.
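
Group density can be illustrated with a short Python sketch. Density is taken here as the number of edges inside a group divided by the number of possible edges among its members, for an undirected network; the function name, group labels, and sample data are hypothetical.

```python
def group_density(edges, groups):
    """Density of each group: internal edges / possible internal edges.

    edges:  iterable of undirected (u, v) pairs.
    groups: dict mapping every vertex to its group label.
    """
    sizes = {}
    for label in groups.values():
        sizes[label] = sizes.get(label, 0) + 1

    internal = {label: set() for label in sizes}
    for u, v in edges:
        if u != v and u in groups and v in groups and groups[u] == groups[v]:
            # frozenset ignores direction and removes duplicate edges.
            internal[groups[u]].add(frozenset((u, v)))

    density = {}
    for label, n in sizes.items():
        possible = n * (n - 1) / 2
        density[label] = len(internal[label]) / possible if possible else 0.0
    return density


# Hypothetical groups of videos and the edges among them.
groups = {"a": "tutorials", "b": "tutorials", "c": "tutorials",
          "d": "regional", "e": "regional", "f": "regional"}
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("c", "d")]
density = group_density(edges, groups)
```

In this sample every pair of "tutorials" videos is linked, giving a density of 1.0, while only one of three possible ties exists among the "regional" videos. The edge between c and d crosses groups and counts toward neither.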

It is interesting to discover that no commercial videos or product placement videos were included in this dataset. That may be because of cosmetics companies’ reluctance to use YouTube as an advertising venue or because of the popularity of tutorial videos in comparison to commercial content (a search for “makeup commercial” retrieves only about 4800 results, compared with more than 300,000 results for makeup tutorials and 46,000 for makeup tips). It can also be attributed to the sample size.

However, if commercial corporations want to use YouTube as an advertising platform, they should consider how and through which route they can best promote their products. One way is to engage prominent users in product placement (i.e., using specific products in their tutorials or tips). To find the most strategic users to approach, advertisers need to identify the most central users that affect the community. Some users are more influential than others.

Social network analysis has several measurements that can be argued to relate to the “influence” of a position in the network. One way you can identify influential users in YouTube networks is by sorting users by their betweenness centrality measurement. This value captures the extent to which a person lies on the paths between otherwise separate parts of the network. Such a person is “between” separate networks and acts as a “bridge” between them. In YouTube, some videos are linked strategically so that they are the only connection between separate clusters of videos. At this stage it is important to remember that at the beginning of this analysis you used Autofill Columns to filter out all edges with an edge weight of less than 2. This not only allowed for a clearer visualization but also affected the accuracy of the network metrics that were previously calculated. Videos that were previously central or prominent may change their position in the network after the filtering. Currently, because you calculated Graph Metrics after filtering, many videos will not have any metric values. These metrics are thus focused on the relationships among videos that are more closely connected to one another.
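
For readers who want to see how betweenness centrality is computed, here is a compact Python sketch of Brandes' algorithm for an undirected, unweighted network. It returns unnormalized scores, which seems to match the scale of the values discussed in this chapter, but whether it reproduces NodeXL's numbers exactly is an assumption. The sample network is hypothetical.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Unnormalized betweenness for an undirected, unweighted graph.

    adjacency: dict mapping each vertex to the set of its neighbors
               (every edge must be listed in both directions).
    """
    score = {v: 0.0 for v in adjacency}
    for source in adjacency:
        # Breadth-first search that counts shortest paths from the source.
        order = []
        predecessors = {v: [] for v in adjacency}
        paths = {v: 0 for v in adjacency}
        distance = {v: -1 for v in adjacency}
        paths[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    paths[w] += paths[v]
                    predecessors[w].append(v)
        # Accumulate path dependencies, farthest vertices first.
        dependency = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in predecessors[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: s / 2 for v, s in score.items()}


# Hypothetical network: vertex "c" bridges the pair {a, b} and the pair {d, e}.
adjacency = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d", "e"},
    "d": {"c", "e"},
    "e": {"c", "d"},
}
centrality = betweenness_centrality(adjacency)
```

Every shortest path between the two sides of the sample network passes through "c", so it receives all of the betweenness while the other four vertices score zero.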

Often, when analyzing YouTube video networks, you can identify certain videos that serve as bridges between two clusters. If this is not as apparent, turn to the Vertices worksheet and sort on the Betweenness Centrality column. Select the top row and click or scroll through the list and look at the placement of the high betweenness videos within the clusters. These videos are boundary objects around which the YouTube “eye shadow AND makeup” network congregates. Boundary objects are intellectual concepts, artifacts, or objects that connect different communities of interest [2, 5], though each community may interpret or use them differently. Boundary objects are also used as translational mechanisms—they provide a channel for transferring information, ideas, and understandings between different communities, where each community recognizes the boundary object’s common structure yet applies to it its own interpretation. A video is not a boundary object per se—it becomes one when different communities give it meaning and use it as such. It is elastic enough to accommodate the different meanings attributed to it by the different clusters.

It is interesting to examine the overall placement of the videos in the entire YouTube network. This can be done using the statistical data provided by YouTube (e.g., views, likes count, dislikes count, number of comments), which is also imported into NodeXL. These statistics can help you compare videos central to the entire YouTube social network. What kind of story could you tell if a video has a high betweenness score, but a low number of views compared to less central videos within the network? In this dataset, the video that is most central to the network for linking videos together, at a centrality of 1580.169, has fewer than 95,000 views, while the video with the most views (more than 18 million) has a centrality score of less than 25. Despite having millions more views than our most central video, this video is out on the periphery of our network.

To better view this metadata, use Autofill Columns to set the Vertex Shape to Views and in Vertex Shape Options select Label for every video with more than 3,600,000 views. Set Vertex Size to Degree, Vertex Layout Order to Views, and Vertex Visibility to Degree, using Vertex Visibility Options to include only vertices with a Degree of 2 or more in order to hide the less connected vertices. Examine the Vertices spreadsheet and find that for the 19 most-viewed videos in our network the Shape column indicates Label. The result should look something like Figure 13.14.

Figure 13.14
Figure 13.14 Video network with top viewed videos.

Examining the graph that you just created, you will notice that videos and users that have widespread popularity among a general population of viewers (i.e., those showing up as a label instead of a disk) may lack influence within local communities of interest (i.e., they may not have a high degree). You will also notice that of the 19 most-viewed videos only 15 appear in the graph. Look at the Visibility column in the Vertices spreadsheet and note that for the missing videos the Visibility is set to Skip. A quick look at their degree values will reveal that they have very low values (under 2). Since you set the Vertex Visibility Options to include only vertices with a degree of 2 or more, these highly viewed videos are not in the graph. This is another illustration that videos may be very popular on YouTube and still peripheral in your video network.
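
The combination of a view-count threshold for labels and a degree threshold for visibility can be mimicked in Python to find videos that are popular yet peripheral. The thresholds come from the text above; the function name, the "Label", "Disk", "Show", and "Skip" values used as plain strings, and the sample videos are assumptions of this sketch.

```python
def style_vertices(videos, view_threshold=3_600_000, min_degree=2):
    """Assign a shape and visibility to each video, as in the Autofill step.

    videos: dict mapping a video id to a dict with "views" and "degree".
    """
    styled = {}
    for video_id, info in videos.items():
        styled[video_id] = {
            "shape": "Label" if info["views"] > view_threshold else "Disk",
            "visibility": "Show" if info["degree"] >= min_degree else "Skip",
        }
    return styled


# Hypothetical videos; the ids and numbers are made up for illustration.
videos = {
    "tutorial_hub": {"views": 94_000, "degree": 12},
    "viral_clip": {"views": 18_000_000, "degree": 1},
    "popular_tutorial": {"views": 5_000_000, "degree": 6},
}
styled = style_vertices(videos)
popular_but_hidden = sorted(
    vid for vid, s in styled.items()
    if s["shape"] == "Label" and s["visibility"] == "Skip"
)
```

In this sample the most-viewed video is the one that drops out of the graph, while a video with far fewer views remains visible because it is well connected.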

To display complex combinations of network attributes, you can map different metrics to different visual properties. A node may hold different attributes like its centrality in its local network and popularity in the overall YouTube population, measured in terms of the number of views or comments. This is demonstrated in Figure 13.15 where, using a fresh copy of the dataset, edge-bearing vertices were filtered out if they had less than 700 comments, labeled by author, with each node shaped to the video’s thumbnail image.

Figure 13.15
Figure 13.15 Filtered YouTube eye makeup tutorial network.

These additional attributes can help distinguish between different kinds of popularity and activity in YouTube, showing that some videos are popular but do not generate discussion and vice versa. They can also help illustrate and reveal other interesting insights about our networks.

13.9 Practitioner’s summary

Analyzing YouTube social networks can offer many insights into the ways videos become popular, sometimes even becoming “viral,” and the way information is disseminated through videos. YouTube’s popularity makes it a channel that enables professionals, from marketing experts to political advisers, to gauge popular themes and public trends. Analyzing video networks will make it easier to decide on the types of interventions, the creative routes that will maximize their outcome, and, even more important, what approaches not to take in order to avoid negative backlash.

For artists and content producers who are not backed by large-scale media corporations, simple social network analysis—focusing on egocentric and content networks—can give a glimpse of the many facets that affect success and popularity within the YouTube universe.

As YouTube becomes an important tool for information dissemination in other noncommercial domains, such as education and public health (e.g., the Center for Disease Control’s streaming channel2), social network analysis of a different kind—one that explores which audiences can be reached through YouTube videos, for example—can provide a useful tool to coordinate funding and production efforts in an efficient manner.

These observations, coupled with YouTube’s immense popularity, provide deep insights into one of the major media outlets in existence today. The size of the YouTube network can be daunting, but by focusing on appropriate samples of data, filtering with relevant metrics, and using visualizations extensively, you can grasp at least part of what shapes our contemporary culture.

13.10 Researcher’s agenda

Despite YouTube’s immense popularity, the research on YouTube’s underlying social networks is in its early stages. Although practitioners, from marketing experts to educators, have attempted to explore these networks to gain an understanding of the best ways to utilize YouTube for information dissemination, researchers have, for the most part, preferred to study more “obvious” social networks such as Facebook and Twitter. Structural studies of YouTube have focused on its overall macrostructure [6, 7] or on a category level [3, 8]. The rich composite data found in YouTube offer a compelling reason to map the networks of connection it contains. The combination of user-generated content and social ties can illuminate many phenomena that shape not just our popular culture [9] but also the way institutional information is disseminated or the ways in which public opinion is broadcast [10]. Increasingly, it is used to understand propaganda campaigns as well [11].

Using social network analysis, researchers can identify important YouTubers or pivotal videos, as well as the types, structures, and development of the networks that are created around them. Researchers can also explore how the structure of ties and networks on YouTube affects content creation. The interplay between content and structure is one of the more important attributes of YouTube and is worthy of deeper exploration.

As you have seen, different users or videos have different networks built around them. Understanding the nature and evolution of these networks can lead to improved use of YouTube by users and enterprises or by designers of video-sharing interaction. Researchers can extend our knowledge of the social processes that underlie the YouTube interaction and the ways the social networks that exist on the site contribute to the popularity of, or disregard for, certain views, opinions, or video content.

References

[1] Kane B. One Million Followers: How I Built a Massive Social Following in 30 Days. Dallas, TX: BenBella Books; 2018.

[2] Wenger E. Communities of Practice: Learning, Meaning, and Identity. Cambridge, MA: Cambridge University Press; 1998.

[3] Rotman D., et al. The community is where the rapport is: on sense and structure in the YouTube community. In: Presented at the Proceedings of the Fourth International Conference on Communities and Technologies, University Park, PA, June 4–7; 2009.

[4] Rotman D., Preece J. The “WeTube” in YouTube: creating an online community through video sharing. Int. J. Web-based Commun. 2010;6.

[5] Star S.L., Griesemer J.R. Institutional ecology, ‘translations’ and boundary objects: amateurs and professionals in Berkeley’s museum of vertebrate zoology. Social Stud. of Sci. 1989;19:387–420.

[6] Cha M., et al. I tube, you tube, everybody tubes: analyzing the world’s largest user generated content video system. In: Presented at the Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement, San Diego, CA, October 24–26; 2007.

[7] Geisler G., Burns S. Tagging video: conventions and strategies of the YouTube community. In: Presented at the Proceedings of the 7th ACM/IEEE-CS Joint Conference on Digital Libraries, Vancouver, BC, Canada, June 18–23; 2007.

[8] Paolillo J.C. Structure and network in the YouTube core. In: Presented at the Proceedings of the 41st Annual Hawaii International Conference on System Sciences, January 07–10; 2008.

[9] Burgess J., Green J. YouTube: Online Video and Participatory Culture. Malden, MA: Polity Press; 2009.

[10] Gueorguieva V. Voters, MySpace, and YouTube: the impact of alternative communication channels on the 2006 election cycle and beyond. Social Sci. Comput. Rev. 2008;26:288–300.

[11] Klausen J., Barbieri E.T., Reichlin-Melnick A., Zelin A.Y. The YouTube Jihadists: a social network analysis of Al-Muhajiroun’s propaganda campaign. Perspect. Terrorism. 2012;6(1).

Suggested reading

[Shoham et al., 2013] Shoham M.D., Arora A.B., Al-Busaidi A. Writing on the wall: an online “community” of YouTube patrons as communication network or cyber-graffiti? In: System Sciences (HICSS), 2013 46th Hawaii International Conference on IEEE; 2013:3951–3960.

[Park et al., 2015] Park S.J., Lim Y.S., Park H.W. Comparing Twitter and YouTube networks in information diffusion: the case of the “Occupy Wall Street” movement. Technol. Forecast. Soc. Chang. 2015;95:208–217.

[Hai-Jew, 2016] Hai-Jew S. Exploring “User,” “Video,” and (Pseudo) multi-mode networks on YouTube with NodeXL. In: Social Media Data Extraction and Content Analysis. 2016:242.

[Xu et al., 2016] Xu W.W., Park J.Y., Kim J.Y., Park H.W. Networked cultural diffusion and creation on YouTube: an analysis of YouTube memes. J. Broadcast. Electron. Media. 2016;60(1):104–122.

[Xu et al., 2015] Xu W.W., Park J.Y., Park H.W. The networked cultural diffusion of Korean wave. Online Inf. Rev. 2015;39(1):43–60.

[America, 2017] America K. The communicative features of online hate in temporary social networks in Twitter and YouTube. Multilingual Margins. 2017;2(2):74.

[Oksanen et al., 2014] Oksanen A., Hawdon J., Räsänen P. Glamorizing rampage online: School shooting fan communities on YouTube. Technol. Soc. 2014;39:55–67.

