Chapter 2. Markov Network Fundamentals

In the previous chapter, we saw how we can represent a joint probability distribution (JPD) using a directed graph and a set of conditional probability distributions (CPDs). However, it's not always possible to capture the independencies of a distribution using a Bayesian model. In this chapter, we will introduce undirected models, also known as Markov networks. We generally use Markov networks when we can't naturally define directionality in the interaction between random variables.

In this chapter, we will cover:

  • The basics of factors and their operations
  • The Markov model and Gibbs distribution
  • The factor graph
  • Independencies in the Markov model
  • Conversion of the Bayesian model to the Markov model and vice versa
  • Chordal graphs and triangulation heuristics

Introducing the Markov network

Let's take an example of four people who go out for dinner in different groups of two. A goes out with B, B goes out with C, C with D, and D with A. Due to some reasons (maybe due to a bad relationship), B doesn't want to go with D, and the same holds true for A and C. Let's think about the probability of them ordering food of the same cuisine. From our social experience, we know that people interacting with each other may influence each other's choice of food. In general, we can say that if A influences B's choice and B influences C's, then A might (as it is probabilistic) indirectly be influencing C's choice. However, given B's and D's choices, we can say with confidence that A won't affect C's choice of food. Formally, we can put this as Introducing the Markov network. Similarly, Introducing the Markov network as there is no direct interaction between A and C nor between B and D.

Let's try to model these independencies using a Bayesian network:

Introducing the Markov network

In the preceding figure, the one labeled Fig 2.1(a) is the Bayesian network representing Introducing the Markov network, whereas the one labeled Fig 2.1(b) is the Bayesian network representing Introducing the Markov network

The first Bayesian network, Fig 2.1(a), satisfied the first independence assertion, that is, Introducing the Markov network, but it couldn't satisfy the second one. Similarly, the second Bayesian network, Fig 2.1(b), satisfied the independence assertion Introducing the Markov network, but not the other one. Thus, neither of them is an I-Map for the distribution. Hence, we see that directed models have a limitation and there are conditions that we are unable to represent using directed models.

Introducing the Markov network

Fig 2.2 Undirected graphical model encoding independencies Introducing the Markov network and Introducing the Markov network

To correctly represent these independencies, we require an undirected model, also known as a Markov network. These are similar to the Bayesian network, in the sense that we represent all the random variables in the form of nodes, but we represent the dependencies or interaction between these random variables with an undirected edge. Before we go into the representation of these models, we need to think about the parameterization of these models. In the Bayesian network, we had a CPD Introducing the Markov network associated with every node Introducing the Markov network. As we don't have any directional influence or a parent-children relationship in the case of the Markov network, instead of using CPDs, we use a more symmetric representation called factor, which basically represents how likely it is for some states of a variable to agree with the states of other variables.

Parameterizing a Markov network – factor

Formally, a factor Parameterizing a Markov network – factor of a set of random variables X is defined as a function mapping values of X to some real number Parameterizing a Markov network – factor:

Parameterizing a Markov network – factor

Unlike CPDs, there is no notion of directionality or causal relationship among random variables in factors. Factors help in symmetric parameterization of random variables. As the values in a factor don't represent the probability, they are not constrained to sum up to 1 or to be in the range [0,1]. In general, they represent the similarities (or, sometimes, compatibility) among the random variables. Therefore, the higher the value of a combination of states, the greater the compatibility for those states of variables. For example, if we say that two binary random variables A and B are likely to be in the same state rather than different states, we can have a factor where Parameterizing a Markov network – factor, Parameterizing a Markov network – factor, Parameterizing a Markov network – factor, and Parameterizing a Markov network – factor. This situation can be represented by a factor as follows:

A

B

Parameterizing a Markov network – factor

Parameterizing a Markov network – factor

Parameterizing a Markov network – factor

1000

Parameterizing a Markov network – factor

Parameterizing a Markov network – factor

1

Parameterizing a Markov network – factor

Parameterizing a Markov network – factor

5

Parameterizing a Markov network – factor

Parameterizing a Markov network – factor

100

We also define the scope of a factor to be the set of random variables over which it is defined. For example, the scope of the preceding factor is {A, B}.

In pgmpy, we can define factors in the following way:

# Firstly we need to import Factor
In [1]: from pgmpy.factors import Factor

# Each factor is represented by its scope,
# cardinality of each variable in the scope and their values
In [2]: phi = Factor(['A', 'B'], [2, 2], [1000, 1, 5, 100])

Note

In pgmpy, the order in which variables are passed to the factor also has significance. The entries in the factors assume that the random variables on the right change more frequently than the ones present on left (as represented in the previous example).

Now let's try printing a factor:

In [3]: print(phi)

a

b

phi(A,B)

A_0

B_0

1000.0000

A_0

B_1

1.0000

A_1

B_0

5.0000

A_1

B_1

100.0000

Factors subsume the notion of CPD. So, in pgmpy, CPD classes such as TabularCPD, TreeCPD, and RuleCPD are derived from the Factor class.

Factor operations

There are numerous mathematical operations on factors; the major ones are marginalization, reduction, and product.

The marginalization of a factor is similar to its probabilistic marginalization. If we marginalize a factor Factor operations whose scope is W with respect to a set of random variables X, it means to sum out all the entries of X, to reduce its scope to Factor operations. Here's an example for marginalizing a factor:

# In the preceding example phi, let's try to marginalize it with 
# respect to B
In [4]: phi_marginalized = phi.marginalize('B', inplace=False)
In [5]: phi_marginalized.scope()
Out[6]: ['A']
# If inplace=True (default), it would modify the original factor 
# instead of returning a new one
In [7]: phi.marginalize('A')
In [8]: print(phi)
╒═════╤═══════════=╕
│ B   │    phi(B)  │
╞═════╪═══════════=╡
│ B_0 │ 1005.0000  │
├─────┼───────────-┤
│ B_1 │  101.0000  │
╘═════╧═══════════=╛

In [9]: phi.scope()
Out[9]: ['B']

# A factor can be also marginalized with respect to more than one # random variable
In [10]: price = Factor(['price', 'quality', 'location'], 
                        [2, 2, 2], 
                        np.arange(8))
In [11]: price_marginalized = price.marginalize(
                                   ['quality', 'location'], 
                                   inplace=False)
In [12]: price_marginalized.scope()
Out[12]: ['price']
In [13]: print(price_marginalized)
╒═════════╤══════════════╕
│ price   │   phi(price) │
╞═════════╪══════════════╡
│ price_0 │       6.0000 │
├─────────┼──────────────┤
│ price_1 │      22.0000 │
╘═════════╧══════════════╛

Reduction of a factor Factor operations whose scope is W to the context Factor operations means removing all the entries from the factor where Factor operations. This reduces the scope to Factor operations, as Factor operations no longer depends on X.

# In the preceding example phi, let's try to reduce to the context 
# of b_0
In [14]: phi = Factor(['a', 'b'], [2, 2], [1000, 1, 5, 100])
In [15]: phi_reduced = phi.reduce(('b', 0), inplace=False)
In [16]: print(phi_reduced)
╒═════╤═══════════╕
│ a   │    phi(a) │
╞═════╪═══════════╡
│ a_0 │ 1000.0000 │
├─────┼───────────┤
│ a_1 │    5.0000 │
╘═════╧═══════════╛

In [17]: phi_reduced.scope()
Out[17]: ['a']

# If inplace=True (default), it would modify the original factor
# instead of returning a new object.
In [18]: phi.reduce(('a', 1))
In [19]: print(phi)
╒═════╤══════════╕
│ b   │   phi(b) │
╞═════╪══════════╡
│ b_0 │   5.0000 │
├─────┼──────────┤
│ b_1 │ 100.0000 │
╘═════╧══════════╛
In [20]: phi.scope()
Out[20]: ['b']

# A factor can be also reduced with respect to more than one
# random variable
In [21]: price_reduced = price.reduce(
                       [('quality', 0), ('location', 1)], 
                       inplace=False)
In [22]: price_reduced.scope()
Out[22]: ['price']

The term factor product refers to the product of factors Factor operations with a scope X and Factor operations with scope Y to produce a factor Factor operations with a scope Factor operations:

In [23]: phi1 = Factor(['a', 'b'], [2, 2], [1000, 1, 5, 100])
In [24]: phi2 = Factor(['b', 'c'], [2, 3],
                       [1, 100, 5, 200, 3, 1000])
# Factors product can be accomplished with the * (product)
# operator
In [25]: phi = phi1 * phi2
In [26]: phi.scope()
Out[26]: ['a', 'b', 'c']
In [27]: print(phi)
╒═════╤═════╤═════╤══════════════╕
│ a   │ b   │ c   │   phi(a,b,c) │
╞═════╪═════╪═════╪══════════════╡
│ a_0 │ b_0 │ c_0 │    1000.0000 │
├─────┼─────┼─────┼──────────────┤
│ a_0 │ b_0 │ c_1 │  100000.0000 │
├─────┼─────┼─────┼──────────────┤
│ a_0 │ b_0 │ c_2 │    5000.0000 │
├─────┼─────┼─────┼──────────────┤
│ a_0 │ b_1 │ c_0 │     200.0000 │
├─────┼─────┼─────┼──────────────┤
│ a_0 │ b_1 │ c_1 │       3.0000 │
├─────┼─────┼─────┼──────────────┤
│ a_0 │ b_1 │ c_2 │    1000.0000 │
├─────┼─────┼─────┼──────────────┤
│ a_1 │ b_0 │ c_0 │       5.0000 │
├─────┼─────┼─────┼──────────────┤
│ a_1 │ b_0 │ c_1 │     500.0000 │
├─────┼─────┼─────┼──────────────┤
│ a_1 │ b_0 │ c_2 │      25.0000 │
├─────┼─────┼─────┼──────────────┤
│ a_1 │ b_1 │ c_0 │   20000.0000 │
├─────┼─────┼─────┼──────────────┤
│ a_1 │ b_1 │ c_1 │     300.0000 │
├─────┼─────┼─────┼──────────────┤
│ a_1 │ b_1 │ c_2 │  100000.0000 │
╘═════╧═════╧═════╧══════════════╛

# or with product method
In [28]: phi_new = phi.product(phi1, phi2)
# would produce a factor with phi_new = phi * phi1 * phi2

Gibbs distributions and Markov networks

In the previous section, we saw that we use factors to parameterize a Markov network, which is quite similar to CPDs. Hence, we may think that factors behave in the same way as the CPD. Marginalizing and normalizing it may represent the probability of a variable, but this intuition turns out to be wrong, as we will see in this section. A single factor is just one contribution to the overall joint probability distribution; to have a joint distribution over all the variables, we need the contributions from all the factors of the model. For the dinner example, let's consider the following factors to parameterize the network.

Factor over the variables A and B represented by Gibbs distributions and Markov networks:

A

B

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks
Gibbs distributions and Markov networks

90

Gibbs distributions and Markov networks
Gibbs distributions and Markov networks

100

Gibbs distributions and Markov networks
Gibbs distributions and Markov networks

1

Gibbs distributions and Markov networks
Gibbs distributions and Markov networks

10

Factor over the variables B and C represented by Gibbs distributions and Markov networks:

B

C

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

10

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

80

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

70

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

30

Factor over variables C and D represented by Gibbs distributions and Markov networks:

C

D

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

10

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

1

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

100

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

90

Factor over variables A and D represented by Gibbs distributions and Markov networks:

D

A

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

80

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

60

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

20

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

10

Now, considering the preceding factors, let's try to calculate the probability of A by just considering the factor Gibbs distributions and Markov networks. On normalizing and marginalizing the factor with respect to B, we get Gibbs distributions and Markov networks and Gibbs distributions and Markov networks. Now, let's try to compute the probability by considering all the factors. To do this, we first need to calculate the factor product of these factors:

A

B

C

D

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Gibbs distributions and Markov networks

Now, if we normalize and marginalize this factor product, we get Gibbs distributions and Markov networks and Gibbs distributions and Markov networks. Here, we see that there is a huge difference in the probability when we consider only a single factor as compared to when we consider all the factors. Hence, our intuition of factors behaving like CPDs is wrong.

Therefore, in a Markov network over a set of variables Gibbs distributions and Markov networks having a set of factors Gibbs distributions and Markov networks associated with it, we can compute the joint probability distribution over these variables as follows:

Gibbs distributions and Markov networks

Here, Z is the partition function and Gibbs distributions and Markov networks.

Also, a distribution Gibbs distributions and Markov networks is called a Gibbs distribution parameterized by a set of factors Gibbs distributions and Markov networks if it is defined as follows:

Gibbs distributions and Markov networks

Here, Gibbs distributions and Markov networks is a normalizing constant called the partition function.

To construct a Markov network, we need to associate the parameterization of a Gibbs distribution to the set of factors of an undirected graph structure. A factor with the X and Y scopes represents a direct relationship between them.

Let's see how we can represent a Markov model using pgmpy:

# First import MarkovModel class from pgmpy.models
In [1]: from pgmpy.models import MarkovModel
In [2]: model = MarkovModel([('A', 'B'), ('B', 'C')])
In [3]: model.add_node('D')
In [4]: model.add_edges_from([('C', 'D'), ('D', 'A')])

Now, let's try to define a few factors to associate with this model:

In [5]: from pgmpy.factors import Factor
In [6]: factor_a_b = Factor(variables=['A', 'B'],
                            cardinality=[2, 2],
                            value=[90, 100, 1, 10])
In [7]: factor_b_c = Factor(variables=['B', 'C'],
                            cardinality=[2, 2],
                            value=[10, 80, 70, 30])
In [8]: factor_c_d = Factor(variables=['C', 'D'],
                            cardinality=[2, 2],
                            value=[10, 1, 100, 90])
In [9]: factor_d_a = Factor(variables=['D', 'A'],
                            cardinality=[2, 2],
                            value=[80, 60, 20, 10])

We can associate the factors to the model using the add_factors method:

In [10]: model.add_factors(factor_a_b, factor_b_c,
                           factor_c_d, factor_d_a)
In [11]: model.get_factors()
Out[11]:
[<Factor representing phi(A:2, B:2) at 0x7f18504477b8>,
 <Factor representing phi(B:2, C:2) at 0x7f18504479b0>,
 <Factor representing phi(C:2, D:2) at 0x7f1850447f98>,
 <Factor representing phi(D:2, A:2) at 0x7f1850455358>]
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.117.103.5