© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2022
R. Li, A. NakanoSimulation with Pythonhttps://doi.org/10.1007/978-1-4842-8185-7_9

9. Misinformation Spreading and Simulations on a Graph

Rongpeng Li1   and Aiichiro Nakano1
(1)
Los Angeles, CA, USA
 

Misinformation has never been so deadly in the age of COVID-19. On social networks like Twitter and Facebook, conspiracy theories about the origin of the virus, the treatment of the virus, and vaccination are rampaging. They often spread in communities and circles and echo with other conspiracy theories about the US presidential elections. Such misinformation and conspiracy theories reinforce each other and form a waterproof echo chamber.

Misinformation and disinformation are quite similar. The difference is that disinformation is false information that is spread deliberately to deceive. In other words, disinformation is wrong on purpose. Misinformation is a super set of disinformation. The spreader of misinformation may be unaware of the incorrectness or harmfulness of the wrong information. In this chapter, we will study how misinformation spreads with a network/graph setting.

Model the Social Network

We have studied some network-like systems earlier like the state transition of the Markov model and the forest fire spreading model. In the Markov model of poem generation, we treat each word as a node, and each node has a probability to be followed by another one, thus forming a network of words with directional edges.

In the forest fire spreading model, each tree is neighbored by four nearest trees on a square grid. Fire can spread from one tree to another with a predefined probability. The network is somewhat homogeneous because they all have four neighbors, except the boundary ones, and no tree is special than another in any way.

In this chapter, we will study general networks that can represent arbitrary connectivity of a community. Let’s first introduce some terminologies.

So far, I have been using network and graph interchangeably in this chapter. Their definition differences are usually subject to the underlined scientific domains. For example, sometimes people use links instead of edges in networks, but they essentially represent the same thing. For consistency with other literature, I will use graphs for the following content.

A graph is consistent with nodes and edges. For example, Figure 9-1 has five nodes and six edges.

Five nodes are depicted graphically in a rectangular shape, connected by lines. A parallelogram formed by four dots and a fifth dot diagonally across from it was created.

Figure 9-1

A simple graph with five nodes and six edges

The graph in Figure 9-1 is also a connected graph because you can reach any node to any other by traversing the edges. This is not always true. The graph in Figure 9-2 is a disconnected graph.

A diagram of nodes. The nodes are randomly distributed within a rectangular box. Colors differentiate the nodes, which are denoted by numbers 0 to 7. Lines connecting nodes 5 and 0, 6 and 2, 4 and 7 form three pairs.

Figure 9-2

A disconnected graph example

This graph has five disconnected clusters (also called communities). They are not connected through edges.

A common misconception is the definition of subgraph. Any subset of nodes and edges from one graph will form a subgraph. Those nodes and edges may not be visually forming a cluster. For example, in the five-cluster graph earlier, node 1 forms a single-node subgraph, while nodes 1, 2, 3, and 6 together with the edge between 2 and 6 also form a four-node subgraph, although a weird one in three visually disconnected clusters.

A graph’s edge can also be directional, which makes a graph directional. For example, the Twitter following relationship is a directional relationship. You may follow a super star, but the super star is not likely to follow you back. If two Twitter accounts follow each other, then there must be two edges between them to represent such a relationship: one from one to the other and the other way around. We call such a graph which allows multiple edges between nodes a multigraph. Oftentimes, multigraphs represent directional relationships but not always.

Figure 9-3 is a directional multigraph. The edges are directional, and multiple edges are allowed. For example, there are two edges between nodes 2 and 3. Another interesting edge is the loop edge that points to the node itself for nodes 0, 1, and 2. With proper modeling, a loop can represent a tweet’s self-retweet or similar actions. Figure 9-3 is a directed graph example.

Six nodes are graphically represented inside a rectangular box. The nodes are highlighted and denoted by numbers 0 to 6. Nodes 0 and 1 are connected to each other, and nodes 1, 2, and 2, 3, and 4 are linked together. Arrow lines connect each node.

Figure 9-3

A directed graph with self-loops

Nodes and edges can also have attributes. For example, nodes can have weights that indicate how influential they are in the graph, while edges’ weights can represent how strong the bondings are between two nodes. To model a social network, we need to assign certain attributes to graph elements, either intrinsic or calculated.

Let’s define our social network, specifically the Twitter ecosystem, using a graph. Each node represents an account, and each edge represents a following relationship. Information can therefore flow from the followee to the follower through posting tweets.

Think about the properties of a Twitter account. The most important indicator of an account’s influence is the number of followers; in our case, it is exactly the number of outgoing edges. In graph theory, it has a name: out-degree, the number of outgoing edges. Similarly, the in-degree counts the number of incoming edges. The larger the out-degree is, the more accounts on the social network can be potentially influenced.

We want to also borrow some concepts from the previous chapter. We will partition the nodes into three categories, besides the sources of misinformation, like in the susceptible, infected, and recovered model. The susceptible are accounts who haven’t been exposed to misinformation; they have a probability to be infected if the people they follow are the sources of the misinformation or infected. The infected nodes are like the infected patients in the disease spreading simulation that they actively influence their followers by retweeting and sharing misinformation. The recovered nodes are accounts who either intrinsically resist, R for both resistance and recovered, misinformation or recover from misinformation pollution.

In summary, we use the properties in Table 9-1 to characterize a node on the graph.
Table 9-1

Properties of nodes/accounts in a social network

Name

Description

Possible Values

Changeable

State

The state of the node

“Source,” “Susceptible,” “Infected,” or “Recovered”

Yes

Influence

How influential the node is, measured by the out-degree

A nonnegative integer

No*

Resistance

The resistance of an account against misinformation

A value between 0 and 1

Yes

The influence power is not changeable because our social network is not dynamic. If we allow the follower-followee relationship to change during the simulation, which is much more realistic, then the influence power of a node will change if fewer and fewer accounts follow it.

Now, we have all the static settings laid out. The next step is to define the mechanism that governs the simulation.

Simulate Misinformation Spreading

A simulation starts with a few nodes, likely malicious, beginning spreading misinformation. Their susceptible followers will respond to the misinformation and react. The followers may become infected or stay susceptible. If a follower becomes susceptible, the follower’s followers will also be exposed to misinformation and so on and so forth.

At any time, a susceptible node has a chance to recover.

The probabilities in the simulation are also dynamic. For example, if a node is exposed to multiple sources of misinformation, then the chance that they become infected can become very high, while the chance of recovering can be strongly suppressed.

Simple Cases

Let’s start with a simple case, in Figure 9-4, with only five nodes to test the accuracy of our code. The graph looks like Figure 9-4. Note that nodes 0 and 1 are following each other.

A graphical representation of five nodes comprised within a rectangular box. The nodes are denoted by the numbers 0 to 4 and are highlighted. Arrow lines connect nodes 3, 4, 0, and 1 to form a parallelogram. Its diagonal is formed by an arrow from node 4 to node 1. Node 2 can be found at the bottom, connected to node 3.

Figure 9-4

A directed graph with five nodes

The following is the code to generate the graph:
# import networkx as nx
# import matplotlib.pyplot as plt
fig, ax = plt.subplots()
sg = nx.fast_gnp_random_graph(n=5, p=0.2, seed = 3, directed = True)
pos=nx.spring_layout(sg,seed=5)
nx.draw_networkx_nodes(sg, pos, ax=ax)
nx.draw_networkx_labels(sg, pos, ax=ax, font_weight='bold')
nx.draw_networkx_edges(sg, pos, ax=ax, edgelist= sg.edges());

Networkx is a powerful Python library to manipulate graphs. Its syntax is quite straightforward. The only non-intuitive method in the preceding code is fast_gnp_random_graph. It is a built-in graph generator that, in this example, generates 5 nodes and arbitrarily connects every pair with a probability of 20%. You can choose to generate a directed graph and set the random seed for reproducibility as well. I choose the name sg because it is indeed a small graph.

We can check the nodes and edges of the graph by running the following:
sg.nodes, sg.edges
The results are two nicely ordered iterables:
 (NodeView((0, 1, 2, 3, 4)),
 OutEdgeView([(0, 1), (1, 0), (1, 3), (2, 3), (3, 4), (4, 0), (4, 1)]))
We can also check the neighbors of a specific node:
list(sg.neighbors(1))

For node 1, the code returns nodes 0 and 3. Node 4 is not returned because the neighbors method only returns the successors. In the social network context, it means we are only viewing the followers of node 1.

To see all neighbors including the followee, we use
list(nx.all_neighbors(sg,1))

This returns [0, 4, 0, 3]; we have duplicates because the relationship between 0 and 1 is bidirectional. We can easily use a set operation to deduplicate it.

One last piece of networkx knowledge is the usage of attributes. Each node and edge in a graph can have attributes. For example, the following code will assign a value to node 1 with an attribute name attr1_1. This can be very handy to update attributes of the social network accounts.
sg.nodes[1]["attr_1"] = "val_1"
First, let’s define some helpful data structures:
class State(Enum):
    SOURCE = 0
    SUSCEPTIBLE = 1
    INFECTED = 2
    RECOVERED = 3
STATE2COLOR = {
    State.SOURCE: "red",
    State.SUSCEPTIBLE: "grey",
    State.INFECTED: "orange",
    State.RECOVERED: "green"
}
We can now initialize the attributes of our simple five-node graph as follows:
import numpy as np
np.random.seed(1)
for node in sg.nodes:
    sg.nodes[node]["influence"] = len(list(sg.neighbors(node)))
    if node == 4:
        sg.nodes[node]["state"] = State.SOURCE
        sg.nodes[node]["resistance"] = 0
    else:
        sg.nodes[node]["state"] = State.SUSCEPTIBLE
        sg.nodes[node]["resistance"] = np.random.random()
We can now plot the graph with color indicating the initial state of the accounts. As the dictionary STATE2COLOR denotes, red means the source of the misinformation. Figure 9-5 indicates the source of misinformation.

Five nodes are depicted graphically inside a rectangular box. The nodes are represented by the numbers 0 through 4, with node 4 highlighted. Arrow lines connect nodes 3, 4, 0 and 1, forming a parallelogram. The diagonal is formed by an arrow connecting nodes 4 and 1. Node 2 can be seen at the bottom, connected to node 3.

Figure 9-5

Source of misinformation is in red

To simulate the simultaneous states updating in one step, we need to
  1. 1.

    Update the states of each node and save the state in a copy of the graph

     
  2. 2.

    Copy the states to the original graph

     
  3. 3.

    Repeat steps 1 and 2

     
Here is the code skeleton for each iteration:
for _ in range(5):
    sg_copy = sg.copy()
    for node in sg.nodes:
        update_state(sg, sg_copy, node)
    # copy state
    for node in sg.nodes:
        sg.nodes[node]["state"] = sg_copy.nodes[node]["state"]
All the property updating details are in function update_state. The logics can be extended further, but here is the first version:
def update_state(sg, sg_copy, node):
    # update states in sg_copy to achieve simultaneous updates
    successors = set(sg.neighbors(node))
    predecessors = set(nx.all_neighbors(sg,node)) - successors
    state = sg.nodes[node]["state"]
    if state == State.SOURCE:
        return
    elif state == State.RECOVERED:
        if sg.nodes[node]["resistance"] > np.random.random():
            sg.nodes[node]["resistance"] = min(sg.nodes[node]["resistance"]*2,sg.nodes[node]["resistance"] + np.random.random(), 1)
        else:
            sg_copy.nodes[node][state] = State.SUSCEPTIBLE
    elif state == State.SUSCEPTIBLE:
        source_influenced = State.SOURCE in [sg_copy.nodes[pre]["state"] for pre in predecessors]
        infected_influenced = State.INFECTED in [sg_copy.nodes[pre]["state"] for pre in predecessors]
        if source_influenced or infected_influenced:
            if sg.nodes[node]["resistance"] < np.random.random():
                sg_copy.nodes[node]["state"] = State.INFECTED
    elif state == State.INFECTED:
        # infected has a chance to become recovered
        if sg.nodes[node]["resistance"] > np.random.random():
            sg_copy.nodes[node]["state"] = State.RECOVERED
        else:
            sg.nodes[node]["resistance"] = max(sg.nodes[node]["resistance"]/2, sg.nodes[node]["resistance"] - np.random.random())
    else:
        print("Unsupported state, exit.")

Our logic is based on the observation of accounts’ behaviors on social media. The source of the information is always trying to influence the followers. A susceptible person will be infected if their resistance is smaller than a random number.

For an infected person, there is a chance to become recovered by comparing against a random number, but if it fails, the resistance will be halved or reduced by a random number, whichever is larger.

For a recovered person, the resistance will be increased, up to 100% immune, if the resistance is greater than a random number in each round; otherwise, the person will fall back to the susceptible domain.

We already see the initial state, but let’s also take note of the resistance numbers:
{node: sg.nodes[node]["resistance"] for node in sg.nodes}
{0: 0.417022004702574,
 1: 0.7203244934421581,
 2: 0.00011437481734488664,
 3: 0.30233257263183977,
 4: None}
Next, let’s explore the system step by step. Depending on the random number seeds, your results may look different from mine. After four iterations, my network looks like the one in Figure 9-6. The resistance values remain unchanged, but account 0 bought the story pushed by account 4. Figure 9-6 shows the updates of states.

A graphical representation of five nodes inside a rectangular box. The nodes are denoted by numerics 0 to 4, and nodes 4 and 0 are highlighted by different colors. Lines of arrows connect nodes 3, 4, 0, and 1 which forms a parallelogram. An arrow from nodes 4 to 1 forms its diagonal. Node 2 is observed at the bottom linked to node 3.

Figure 9-6

Account 0 is contaminated with misinformation

Just after another iteration, my graph changes to the following. Account 1, with a relatively high resistance, bought the story, as shown in Figure 9-7.

Five nodes are graphically represented within a rectangular box. The nodes are represented by the numbers 0 to 4, with nodes 4, 1, and 0 highlighted in different colours. Arrow lines connect nodes 3, 4, 0, and 1, forming a parallelogram. Its diagonal is formed by an arrow connecting nodes 4 and 1. Node 2 is seen at the bottom, connected to node 3.

Figure 9-7

Account 0 recovered, while account 1 bought the story

After two more iterations, things change again. Figure 9-8 shows that none of account 4’s followers believe in it.

A graphical representation of five nodes inside a rectangular box. The nodes are denoted by numerics 0 to 4. Nodes 0 and 1, are differentiated from node 4 by different colors. Lines of arrows connect nodes 3, 4, 0, and 1 which forms a parallelogram. An arrow from nodes 4 to 1 forms its diagonal. Node 2 is observed at the bottom linked to node 3.

Figure 9-8

Both accounts 0 and 1 are immune to account 4’s misinformation

The resistance data shows that account 1 is not only recovered but also highly unlikely to buy the misinformation spreader’s story anymore.
{0: 0.417022004702574,
 1: 0.9184259825270369,
 2: 0.00011437481734488664,
 3: 0.30233257263183977,
 4: None}

Such states are likely going to last forever as account 4 has no approach to reach account 3, unless ads are available. We say that our system is stable.

Let’s try a bigger system with ten accounts and a different topology. In this topology, there are much more bidirectional connections which can represent a small community, like a family group or a local community. This topology is represented in Figure 9-9.

Ten nodes are graphically represented. The nodes are represented by the numbers 0 to 9 and are distributed at random within a rectangular box. Each node is connected by arrow lines. The nodes 4 and 6 have been highlighted.

Figure 9-9

A graph with more bidirected relationships

I randomly pick accounts 4 and 6 as the sources. After one iteration, two other accounts are infected as shown in Figure 9-10.

A graphical representation of ten nodes. The nodes are denoted by the numbers 0 to 9 and are distributed at random within a rectangular box. Arrow lines connect each node. Nodes 4, 6, and 5 are highlighted and differentiated by different hues.

Figure 9-10

Two followers of the misinformation source got infected

After four iterations, account 9 is recovered, while account 3 is infected. Figure 9-11 shows the evolution.

A graphical representation of ten nodes. The nodes are denoted by numerics 0 to 9 and are distributed randomly inside a rectangular box. Lines of arrows connect each node. Nodes 4, 6, and 3, 5 are highlighted and differentiated from node 9 by different colors.

Figure 9-11

Account 9 recovered, while account 3 got infected

Continuing the simulation, the stable state of my system looks as in Figure 9-12.

10 nodes are graphically represented. The nodes are indicated by the numbers 0 to 9 and are dispersed at random inside a box that is rectangular in shape. Each node is connected by arrow lines. Different colours are used to emphasise and distinguish nodes 4, 6, 1, 3, and 9 from node 5.

Figure 9-12

The stable state of the first simulation

Note that simply changing the random seed initiator can significantly change the final stable state of the graph. Another stable state is presented in Figure 9-13.

The ten nodes are displayed graphically. A rectangular box has a random distribution of the nodes, which are represented by the numbers 0 to 9. Every node is joined by arrow lines. Different coloured highlights and lines separate nodes 4, 6, 0, 1, 3, and 5 from node 9 respectively.

Figure 9-13

Another stable state with a different random number seed

Alrighty, let’s move on to much bigger graphs.

Misinformation Spreading on Different Networks

The preceding graphs don’t actually capture the essence of online social networks. First, they are too small. Second, they don’t exhibit the most significant properties of social media networks: only a small portion of accounts have the majority of the followers.

To understand it, let’s plot the distribution of the numbers of followers in the network. We need to create a bigger graph using the fast_gnp_random_graph() function:
fig, ax = plt.subplots()
bg = nx.fast_gnp_random_graph(n=1000, p=0.15, seed = 1, directed = True)
degree_sequence = sorted((d for n, d in bg.out_degree()), reverse=True)
ax.bar(*np.unique(degree_sequence, return_counts=True))
ax.set_title("Degree histogram")
ax.set_xlabel("Degree")
ax.set_ylabel("# of Nodes");
The result is presented in Figure 9-14.

The hyphen degree is displayed on a histogram plot with a horizontal axis that spans from 110 to 190 in increments of 10. The nodes' hash is displayed on the vertical axis, which runs from 0 to 40 in increments of 5. The hyphen degree histogram is displayed at the top of the graph. With 38 hash of nodes, the largest peak is shown at two out-hyphen degree levels of 147 and 153

Figure 9-14

The distribution of the out-degrees of a random graph

As you can see, the network doesn’t have a single, or a handful of, strong influencers. The average number of followers is about 150, and the distribution is almost symmetric. This is not true in real social networks. People don’t follow each other randomly with a probability of 15%.

We need to use another random graph generator, the scale_free_graph() method. It has many parameters, but we will take the default arguments for simplicity.

Let’s take a quick comparison of the ten-node graphs generated by fast_gnp_random_graph() and scale_free_graph():
fig, axes = plt.subplots(1,2, figsize=(15,6))
G = nx.fast_gnp_random_graph(n=10, p =0.2, seed = 1)
pos=nx.circular_layout(G)
nx.draw(G, with_labels=True, font_weight='bold', pos = pos, ax = axes[0])
G = nx.scale_free_graph(10)
pos=nx.circular_layout(G)
nx.draw(G, with_labels=True, font_weight='bold', pos = pos, ax = axes[1])
The result looks like the one in Figure 9-15. The one on the left is much more centric visually. Well, account 0 follows many other accounts, not the other way around though. The result is presented in Figure 9-15.

A pair of graphical representations of 10 nodes. In both the illustrations, the nodes are placed in circular representation and denoted by numerics 0 to 9. In image a, node 5 is suspended and the remaining nodes are connected by lines. In image b, lines of arrows connect each node, all the nodes are connected to node 0. Node 1 is connected to itself.

Figure 9-15

A visual comparison of a random graph and a scale-free graph

Let’s create the same out-degree distribution visualization for a 1000-node scale-free graph. The result is presented in Figure 9-16.

A histogram plot with a horizontal axis reads out hyphen degree and ranges from 0 to 80 in increments of 20. The vertical axis reads the hash of nodes and ranges from 0 to 600 in increments of 100. The top of the graph reads out hyphen degree histogram. The highest peak is observed at out hyphen degree level of 2 with 580 hash of nodes.

Figure 9-16

Out-degree distribution for a scale-free graph

As you can see, there are accounts with more than 80 followers, but the majority of the accounts have only 1 or 2 followers. By controlling the parameters of the scale_free_graph function, you can control the mechanism of the graph generation. The algorithm for generating a scale-free graph is to continuously add new nodes to an existing graph according to a set of preferences. I changed the default parameters to alter the out-degree distribution. For example:
fig, ax = plt.subplots()
g = nx.scale_free_graph(1000, alpha = 0.6, beta = 0.39, gamma = 0.01, seed = 0)
degree_sequence = sorted((d for n, d in g.out_degree()), reverse=True)
ax.bar(*np.unique(degree_sequence, return_counts=True))
ax.set_title("Out-degree histogram")
ax.set_xlabel("Out-degree")
ax.set_ylabel("# of Nodes");
The preceding code gives me a more equal world that the biggest influencer in the community is not that influential. The largest out-degree in Figure 9-17 is around 30, not 80 as in Figure 9-16.

A histogram plot with a horizontal axis that reads "hyphen degree" and ranges from 0 to 30 in steps of 5, displays this information. The nodes' hash is displayed on the vertical axis, which runs in 100-point increments from 0 to 00. It says "hyphen degree histogram" at the top of the graph. The largest peak is seen at our hyphen degree level 1, where there are more than 700 hash nodes.

Figure 9-17

A scale-free graph with a smaller maximum out-degree

The name scale-free comes from the fact that if the network is large enough and you can zoom in to a local small subgraph, you will identify the similar properties and metrics.

We care about the two different cases because misinformation in the second case can spread much faster and dangerously. To quantify that, we need a function to aggregate the numbers of accounts in different state first.
from collections import Counter
def count_states(g):
    states = [g.nodes[node]["state"] for node in g.nodes]
    return Counter(states)

Alrighty, let’s track the spreading of misinformation for both types of graphs. We will randomly generate graphs with 1000 nodes for each case and check how long it takes to reach the stable states. We will also examine the ratios of infected accounts when the stable states are reached.

We will start by setting the top two influential accounts as the sources of misinformation. Note that we can also control the p parameter in the fast_gnp_random_graph function to control the edge density. It is a crucial parameter because if p is very large, then every node is essentially connected with any other node. The misinformation reaches everyone in step one. We will see how it affects the simulation in detail.

First, let me bundle our earlier code into several functions:
def initialize(g, top_k = 5):
    tops = sorted(( (n, d) for n, d in g.out_degree()), reverse=True,
                  key = lambda pair: pair[1])[:top_k]
    for node in g.nodes:
        g.nodes[node]["influence"] = len(list(g.neighbors(node)))
        if node in [pair[0] for pair in tops]:
            g.nodes[node]["state"] = State.SOURCE
            g.nodes[node]["resistance"] = None
        else:
            g.nodes[node]["state"] = State.SUSCEPTIBLE
            g.nodes[node]["resistance"] = np.random.random()
def simulate(g, steps = 100, top_k = 5):
    initialize(g, top_k)
    res = []
    for _ in range(steps):
        g_copy = g.copy()
        for node in g.nodes:
            update_state(g, g_copy, node)
        # copy state
        for node in g.nodes:
            g.nodes[node]["state"] = g_copy.nodes[node]["state"]
        res.append(count_states(g))
    return res
def visualize(res):
    steps = range(1, len(res) + 1)
    susceptible = [r.get(State.SUSCEPTIBLE,0) for r in res]
    recovered = [r.get(State.RECOVERED,0) for r in res]
    infected = [r.get(State.INFECTED,0) for r in res]
    fig, ax = plt.subplots()
    ax.plot(steps, susceptible, label="susceptible")
    ax.plot(steps, recovered, label="recovered")
    ax.plot(steps, infected, label="infected")
    plt.legend()

The initialize method assigns the SOURCE state to the top k most influential accounts in the graph. Other codes are quite straightforward.

Now, let’s run the simulation for 20 steps and collect the statistics. For a normal graph, we use the following code to simulate it.
g_normal = nx.fast_gnp_random_graph(n=1000, p=0.002, directed = True)
res = simulate(g_normal, steps = 20,  top_k = 5)
visualize(res)
The result shown in Figure 9-18 indicates that the system does tend to reach a stable state fairly quickly, although only 0.2% of all possible edges exist. Go ahead and do the calculation that this is actually a quite large number.

A graph illustrates a vertical axis range of 0 to 1000 and a horizontal axis range of 0 to 20.0. Three curve lines are plotted. The susceptible curve has a final point that is constant and has a decreasing slope. From the origin, recovered and infected curves are arranged one below the other and resemble growing slopes with constant slopes at their endpoints.

Figure 9-18

The evolution of populations in a 1000-node random graph

How about the scale-free graph? Our expectation is that the misinformation spreading should be faster. However, Figure 9-19 doesn’t say so.
g_scale_free = nx.scale_free_graph(1000, alpha = 0.5, beta = 0.1, gamma = 0.4, delta_out = 0.9)
res = simulate(g_scale_free, steps = 20,  top_k = 5)
visualize(res)

A graph. The horizontal axis ranges from 0 to 20.0 and the vertical axis ranges from 0 to 800. It plots three lines of curves. The susceptible curve initiates at the left top and decreases to form a straight line. The infected and the recovered curve plots one below the other and form a straight line with a small increase at their origin.

Figure 9-19

The evolution of populations in a 1000-node scale-free graph

Clearly, something is off. The system seems stuck in less than five steps. Why? Let’s take a look at the most influential account’s neighbors:
tops = sorted(((n, d) for n, d in g_scale_free.out_degree()), reverse=True,
                  key = lambda pair: pair[1])[:5]
In my case, account 2 has the most followers:
[(2, 15), (18, 14), (6, 12), (11, 12), (32, 10)]
Let’s examine the neighbors of account 2. We are mostly interested in the number of their followers:
[g_scale_free.out_degree()[k] for k in list(nx.neighbors(g_scale_free,2))]
It looks like the majority of them don’t have any followers!
[2, 0, 0, 0, 1, 3, 1, 0, 9, 0, 0, 0, 0, 0, 0]

This is why with a naively created directed scale-free graph, the misinformation spreading is kind of limited to a small neighborhood of the sources.

An easy way to fix it is to use an undirected graph. Or significantly increase the top players’ influential power in a much larger graph. Unfortunately, the second option crashes my laptop. Please try the first approach on your own.

Lastly, let’s check how the p parameter in the fast_gnp_random_graph method influences the speed of misinformation spreading.

To do that, we need to slightly modify the simulate function and make one approximation. When the counting of different states is not changing in the last two iterations, we assume that the system reaches a stable state. Of course, there is a possibility that the situation will change, but that’s not very predictable due to random number generation.

The code looks like the following:
def simulate(g, steps = 100, top_k = 5):
    initialize(g, top_k)
    res = []
    for _ in range(steps):
        g_copy = g.copy()
        for node in g.nodes:
            update_state(g, g_copy, node)
        # copy state
        for node in g.nodes:
            g.nodes[node]["state"] = g_copy.nodes[node]["state"]
        if len(res) > 1 and res[-2] == res[-1] == count_states(g):
            return res
        res.append(count_states(g))
    return res
for p in [0.001,0.002, 0.004, 0.01, 0.02, 0.04, 0.08, 0.1, 0.2, 0.4]:
    for _ in range(1000):
        len_res = []
        infected_rate = []
        recovered_rate = []
        g_normal = nx.fast_gnp_random_graph(n=100, p=p, directed = True)
        res = simulate(g_normal, steps = 1000,  top_k = 5)
        len_res.append(len(res))
        infected_rate.append(res[-1][State.INFECTED]/100)
        recovered_rate.append(res[-1][State.RECOVERED]/100)
    print(p, np.mean(len_res), np.mean(infected_rate), np.mean(recovered_rate))
The result is quite interesting as follows. I am just going to paste the number here:
0.001 3.0 0.02 0.02
0.002 4.0 0.03 0.05
0.004 5.0 0.04 0.04
0.01 16.0 0.21 0.12
0.02 10.0 0.2 0.29
0.04 19.0 0.31 0.54
0.08 11.0 0.23 0.64
0.1 17.0 0.4 0.51
0.2 11.0 0.38 0.47
0.4 13.0 0.28 0.6

At the beginning, because of the bad connectivity, the simulations stop quite early, and only a small portion of accounts see the misinformation and recover from it. As more and more edges are added to the graph, the stable state takes a longer time to reach with a generally larger size of infected community and recovered community as well.

Exercise

  1. 1.

    Use undirected scale-free graphs to redo the simulation. Check the built-in networkx graph generators to choose the right one.

     
  2. 2.

    Each account has an influence property, which should influence how likely its followers are to accept the misinformation. Introduce a mechanism to address such behavior.

     

Summary

In this chapter, we discussed another important type of simulation: the simulation on a graph data structure. Specifically, we studied the simulation of misinformation spreading on social media. We introduced basic concepts of graphs and quantified properties to describe graph elements. On top of that, we ran the simulations on different types of graphs and interpreted the behaviors.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.138.122.4