Chapter 4. Network Patterns

One of the most critical roles of networks is their use in identifying various behaviors, regardless of whether they originate from social networks, disease tracking, idea diffusion, or any one of many other sources. There is a large and growing body of literature dedicated to network analysis that can be applied to a wide range of behavior across multiple realms, merging theory with real life examples to help you provide solutions to critical needs.

In this chapter, we will initially examine several of the most critical concepts, providing general overviews as well as external resources that examine each of these concepts more thoroughly. The topics covered in this chapter are as follows:

  • Contagion and diffusion
  • Clustering and homophily
  • Network growth patterns
  • Using Gephi generators
  • Viewing a contagion network
  • Viewing network diffusion
  • Identifying homophily

After covering the theoretical foundations, we'll then move onto how to identify and apply these ideas using Gephi. The chapter will conclude with a section on traversing networks that will incorporate the previously noted concepts.

A significant portion of this chapter will be spent working with example datasets that can be used to illustrate various theories, demonstrate how Gephi can be leveraged to create and then to understand each of these network behaviors. So you are encouraged to join in using the same datasets used in the chapter and to replicate the steps used here in your own version of Gephi.

Contagion and diffusion

Contagion and diffusion are related concepts that are often best understood through network analysis. The use of network graphs can help you provide considerable insight into the potential spread of a disease as well as the success of a new innovation. This is done by exposing the structure of the underlying network. Densely connected networks can be effective in both of these realms, which can have both positive and negative effects, depending on whether we are viewing the diffusion of a positive innovation, or conversely, the spread of an infectious disease.

Contagion

One of the most intriguing and potentially valuable ways of using networks is to examine how disease transmission works within a network, and how the structure of a network can either promote or limit the spread of the disease. If the transmission pattern can be altered through some sort of intervention, such as vaccination or even simple avoidance (for example, not contacting a friend if there is a high likelihood of contagion), then the spread of the disease can be significantly reduced and perhaps stopped altogether.

We'll begin our discussion of contagion at a theoretical level by laying the groundwork for a practical approach later in the chapter, and using Gephi to demonstrate how network structures give rise to contagion levels, or conversely, how they might prevent further spread of a disease.

So what is the definition of contagion we'll be using in this chapter? Out of the multiple definitions using the Merriam-Webster dictionary, here are the two that best represent the concept for our purposes:

  • The transmission of a disease by direct or indirect contact
  • An influence that spreads rapidly

David Easley and Jon Kleinberg also allude to the similarities between diffusion and contagion in their book, Networks, Crowds, and Markets: Reasoning About a Highly Connected World, Cambridge University Press (published in 2010).

Easley and Kleinberg highlight the primary difference between the two when it comes to modeling:

"…the biggest difference between biological and social contagion lies in the process by which one person "infects" another. With social contagion, people are making decisions to adopt a new idea or innovation…With diseases, on the other hand, not only is there a lack of decision-making in the transmission of the disease from one person to another, but the process is sufficiently complex and unobservable at the person-to-person level that it is most useful to model it as random."

In other words, contagion is often best expressed using random models, given the unpredictable contact patterns that often spread a disease. Diffusion (or social contagion), is typically the result of a network structure and the interaction between members of a network, and can thus be modeled using methods with a higher degree of sophistication compared to a random model.

Contagion, in its biological form, requires close physical contact to propagate, although there are some caveats to this rule. In many cases, direct contact is required, but in other instances, merely being in the same place where infectious germs remain from a prior occupant (a bathroom, bus station, escalator, and so on) is sufficient to spread the disease. Even in these cases, there are limiting factors that will halt the spread of the disease, at least for the moment. The most critical element here is time. Suppose two strangers ride the same train on the same day and occupy the same seat just 15 minutes apart. Infectious germs left by Stranger A might well survive long enough to infect Stranger B. Stranger C, who occupies the same seat 4 hours later, is highly unlikely to be exposed to the same germs, simply due to the difference in time. Here, we are witness to a bit of the randomness characterized by Easley and Kleinberg.

One of the ways in which the contagion can spread is through a simple branching network, where the individual carrying the disease enters a population and transmits it to each person he or she makes contact with. Note that there is always a probability associated with this event, as not everyone will contract the disease. The probability might range from a relatively low 0.10, with one of 10 contacts picking up the disease, to something much higher, contingent on both the infectiousness of the disease coupled with the immunity levels of the members in the network. In cases with a high level of infectiousness combined with relatively low immunity levels, the probability of each individual being infected can be very high. The inverse is true if there is a low contagion level paired with high immunity levels.

The branching network works quite simply. Let's assume a 30 percent (0.3) probability that members of the first individual's contacts will also get the disease. For simplicity, we'll assume this represents three individuals. This is the first wave in the spread of the disease. These three infected persons then go out into their respective (albeit somewhat random) networks, and further spread the disease to roughly one of every three persons they meet. This is the second wave of the disease. Finally, each subsequent wave is spread in the same fashion, making the branching process potentially infinite; although in practice, it will tend to die out well before this occurs.

For the next few graphs we used CmapTools, which is available at http://cmap.ihmc.us/. Let's track this initial scenario using a simple diagram to help you understand the spread of the disease through its first several waves:

Contagion

Contagion branching network with 0.3 probability

Notice how quickly the contagion fades in this instance, with just two of the 12 possible nodes getting infected. This obviously changes dramatically in the case of a highly infectious disease where the probability of transmission is 0.7, as shown in the following diagram:

Contagion

Contagion branching network with 0.7 probability

Instead of just two of 12 being infected, now half of the susceptible nodes have contracted the infection. This begins to illustrate the influence of individual nodes close to the source and demonstrates the downstream impact they can have. In this instance, even with a high transmission rate, those who come in contact with D have no chance of contracting the virus, given that D had some sort of immunity not present for either B or C.

Now let's move onto a brief discussion of three epidemic models discussed by Easley and Kleinberg in Networks, Crowds, and Markets: Reasoning About a Highly Connected World, Cambridge University Press (published in 2010), that provide further insight into the spread of disease and how network structure plays a critical role in the progression of an epidemic.

The SIR model

The SIR model offers three distinct stages for each node in a network: Susceptible, Infectious, and Removed. This can be viewed in temporal fashion, as follows:

The SIR model

A brief definition of each stage is provided here:

  • Susceptible: This refers to the stage where an individual node could potentially be the recipient of an infection, as they neither have it currently nor do they have immunity. This does not mean that infection is inevitable, as that depends on other factors such as the strength of the virus, the immune system of the potential recipient, and the timing of contact with an infected node.
  • Infectious: This simply describes nodes who currently have the infection and have the ability to transmit it to those in the susceptible stage. In the SIR model, those who have been in the infectious stage wind up as part of the removed population, as they cannot acquire the same infection for a second time.
  • Removed: These nodes are those persons who have already been in the infectious stage and have now recovered. In the SIR model, they cannot be infected a second time and can therefore eventually bring the contagion process to a halt through their acquired immunity.

The SIS model

In a SIS model, the removed status has been replaced by a second instance of susceptible, as this model assumes the case that individuals are not immune from a disease simply because they were previously exposed to the infection. In this case, after leaving an infectious state, each member will become susceptible once again, which is represented as follows:

The SIS model

The single difference between SIR and SIS is the ability of previously infected persons to be reinfected; in other words, there is no immunity built up simply as a result of having had the infection. A SIS cycle has the potential to last almost indefinitely, as an infection can be passed back and forth through repeated exposure to infected individuals.

The SIRS model

A third case is presented that merges the first two models, resulting in the SIRS model, with a fourth stage where individuals have temporary immunity from the infection, but eventually re-enter the susceptible stage. It is represented as follows:

The SIRS model

In this instance, persons coming out of the infectious stage are removed, as they are in the SIR model, but for only a finite time period. Once this period has ended, individuals revert back to the susceptible stage, where they can be infected for a second (or greater) time. We would anticipate that a contagion process will die off faster in this situation compared to the SIS model, but has the potential to run considerably longer than a contagion under the SIR model.

I hope this discussion has provided you with a fundamental understanding of how the theory behind contagion works, and how we might be able to use Gephi to understand the process further. Specifically, we can start down this path by working with dynamic networks in Gephi. A very quick approach will be to navigate to the Generate | Dynamic Graph Example..., which will create a relatively simple network. From there, we can choose to edit that file, or create our own dynamic dataset using time elements (start time, end time) for use in Gephi. We'll address this topic more specifically in Chapter 8, Dynamic Networks.

Diffusion

While contagion is often viewed through the lens of an infectious disease and other direct contact effects, diffusion can be thought of in a somewhat different manner. We'll use the following definition to best describe this process from a network analysis context:

Diffusion is the process or state of something spreading more widely.

Note how this is related to contagion, in the sense of something spreading (an idea, an innovation, and so on), but has a much broader definition that goes beyond the spread of disease. We should also recognize that while contact plays a role in the process of diffusion, it might not be the same sort of direct contact implied by contagion. For instance, the behavior within a network of Facebook friends can certainly enable a diffusion process through sharing and forwarding a particular post. No direct physical contact is required for this process to succeed.

We assume from our preceding example that one of these Facebook friends has recently contracted a flu virus. This virus will not spread through his or her Facebook network regardless of how many posts are shared, unless he or she also has direct physical contact with the same group of friends. This brings us to another critical distinction between the two processes: diffusion generally requires some sort of social contact or influence to take root, while contagion is dependent on physical proximity, regardless of whether others are part of your social or professional network. Diffusion can be witnessed in the spread of a specific good across a network. This might or might not be a physical product; it can just as easily be the spread of digital property (music files, videos, images, and so on) or even a concept or new idea. The Web has certainly aided the diffusion process in many cases, but is not the sole channel for propagation. Traditional media, advertising, word of mouth, and industry events are just a few other means for initiating a diffusion process.

The concept of preferential attachment, addressed earlier in the book, also plays a significant role in diffusion (as well as contagion). Networks that are defined by the presence of a relatively small number of influential hubs enable more rapid diffusion of many products or processes. For instance, when one of the large hubs promotes the latest cat video, it will soon be found in millions of Facebook feeds, regardless of whether the user is fond of cats. In contrast, try to picture this same process in a highly fragmented low density network. The diffusion process in this situation will have difficulty in establishing any momentum, and the process is likely to stop almost as quickly as it started.

Many of the ideas presented in this section are based on the work of Easley and Kleinberg in their book, Networks, Crowds, and Markets: Reasoning About a Highly Connected World, Cambridge University Press (published in 2010). Their work will help in providing a general overview of how network diffusion works, which will then be translated into examples using Gephi. The process of how diffusion takes place within networks can be fascinating, as it is frequently tied to both global and local structures within a given network. These structures can either promote or discourage diffusion, which we will demonstrate through some examples in the following sections.

Diffusion, as we noted earlier, is concerned primarily with the spread of ideas, innovations, and other concepts that do not necessarily require close physical proximity. Proximity can play a role, but it is generally not essential to the spread of something, quite the opposite of contagion. We might, in fact, find cases where physical location is critical—say the opening of an independent coffee house in Minneapolis, where the reputation of the store is only relevant to the customers living (or visiting) within a short distance of the store. So there can be a physical component to diffusion, but in this sort of case, it is likely to be a limiting factor that is dependent on network members in a small geographic area.

Contrast this with the spread of ideas and innovations in our twenty-first century world. Early adopters of the latest Apple or Samsung product might quickly become critical to the diffusion pattern of the product, with glowing reviews likely to lead to an accelerated adoption of the product, while a spate of negative reviews might well serve to limit the diffusion process. Similarly, ideas can also be spread rapidly via the Web, social media, texting, or other means of inexpensive, rapid communication.

You might have detected a physical element to diffusion that is not always addressed. Products or services that are inexpensive, or lightweight and easily shipped, are far more likely to have explosive diffusion patterns than large physical entities that are either immobile, expensive, or both. At its most extreme, think of the rapid spread of a simple video (probably involving cats) across the Web. The cost is low, access to the video is high, and thus diffusion is widespread and rapid, assuming the video is well-received.

In contrast, think of a spectacular new audio component (perhaps a revolutionary new speaker design) that has been developed in Boston and is available in just five high-end audio specialists at the outset. Perhaps all of the audiophiles in your personal network are very excited by this new product and wish to own it someday. The reviews might be spectacular, the desire to own the product is high, but the price, limited availability, and high cost of delivery beyond the Boston market will quickly restrict the diffusion of the product, at least until a comprehensive distribution system becomes available. Even then, the spread of the product will pale in comparison to the aforementioned cat video, where the cost of ownership is effectively zero, and where the distribution network (YouTube, Facebook, and so on) is already constructed.

We have thus established that the rapid diffusion of a product or service is not necessarily related to its quality, even in cases where an underlying peer network exists. Better products and innovations might well lose out to inferior ones, largely dependent on the ability of each product to access appropriate levels within a network and then to spread from there.

Now, let's take a brief look at what diffusion process might look like in theory, and then later in the chapter we'll examine this process using Gephi. Here's a case where we have the following criteria:

  • Node B purchases a new product.
  • Node G is considering purchasing the product, but needs to reach an aggregate threshold of at least 0.6 before making the purchase for himself/herself.
  • Each number on the chart indicates a relative influence level one node has on another. So we can see that G is influenced by B, D, and F, who are themselves subjected to influences from other network members.

Here's our network:

Diffusion

A simple diffusion example

Notice that based on the threshold level of 0.6, it is not enough for G to see that B owns the product. He also needs to know that either D or F also has purchased the item before feeling confident enough to also make the purchase. So G is dependent on the decisions of D and F as well as B. At least two of the three must own the product before G will elect to buy.

While this clearly represents a simplification of the diffusion process, it does illustrate an important concept that is crucial to both product adoption and more generally to the spread of information. We will return to this concept by building some examples in Gephi in the Viewing Network Diffusion section of this chapter.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.117.189.228