Bayesian networks

Bayesian networks are a type of graphical model whose structure is a directed acyclic graph. We often refer to the tail node of a directed edge in a graphical model as the parent and the head node as the child or descendant. In fact, we generalize this latter notion: if there is a directed path from node A to node B in the model, node B is a descendant of node A. We can distinguish the special case in which node A is connected to node B by a single edge by saying that node B is a direct descendant (that is, a child).

Because a Bayesian network contains no cycles, a node's parent can never also be one of its descendants. Bayesian networks have the distinguishing property that, given its parents, every node in the network is conditionally independent of all other nodes in the network that are not its descendants. This is sometimes referred to as the local Markov property. It is an important property because it means that we can factorize the joint probability distribution of all the random variables in the model simply by reading off the edges of the graph.
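To make the local Markov property concrete, here is a minimal sketch using a hypothetical three-node chain A → B → C with invented probability tables: B is the only parent of C, and A is not a descendant of C, so P(C | B, A) should equal P(C | B) in the joint distribution the graph defines.

```python
from itertools import product

# Hypothetical CPTs for a chain A -> B -> C (all numbers invented).
p_a = {True: 0.3, False: 0.7}
p_b_given_a = {True: {True: 0.9, False: 0.1}, False: {True: 0.2, False: 0.8}}
p_c_given_b = {True: {True: 0.6, False: 0.4}, False: {True: 0.25, False: 0.75}}

# Build the full joint from the factorization implied by the graph.
joint = {
    (a, b, c): p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]
    for a, b, c in product([True, False], repeat=3)
}

def marginal(**fixed):
    """Sum the joint over all entries matching the fixed assignments."""
    return sum(p for (a, b, c), p in joint.items()
               if all(dict(a=a, b=b, c=c)[k] == v for k, v in fixed.items()))

# Local Markov property: given its parent B, C is conditionally
# independent of the non-descendant A.
p_c_given_ba = marginal(c=True, b=True, a=True) / marginal(b=True, a=True)
p_c_given_b_only = marginal(c=True, b=True) / marginal(b=True)
assert abs(p_c_given_ba - p_c_given_b_only) < 1e-12
```

Note that the independence holds by construction here, since we built the joint from the graph's factorization; the point is that the graph structure alone guarantees it, whatever numbers the tables contain.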

To understand how this works, we will begin with the product rule of probability for three variables, which says the following (taking G, J, and U as example variables):

P(G, J, U) = P(G | J, U) P(J | U) P(U)

This rule is completely general and always holds. Let's return to our student applicant example. This is actually a simple Bayesian network in which G and J both have U as their parent. Using the local Markov property of Bayesian networks, G is conditionally independent of J given their common parent U, so we can simplify the equation for the joint probability distribution as follows:

P(G, J, U) = P(G | U) P(J | U) P(U)
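The factorized joint for the student applicant example can be sketched as follows, with hypothetical conditional probability tables (all numbers invented for illustration):

```python
from itertools import product

# Hypothetical CPTs for the student applicant network: U is the parent
# of both G and J (all probabilities invented for illustration).
p_u = {"yes": 0.4, "no": 0.6}
p_g_given_u = {"yes": {"yes": 0.8, "no": 0.2}, "no": {"yes": 0.3, "no": 0.7}}
p_j_given_u = {"yes": {"yes": 0.7, "no": 0.3}, "no": {"yes": 0.25, "no": 0.75}}

def joint(g, j, u):
    # P(G, J, U) = P(G | U) P(J | U) P(U)
    return p_g_given_u[u][g] * p_j_given_u[u][j] * p_u[u]

# Sanity check: a valid joint distribution sums to 1 over all outcomes.
total = sum(joint(g, j, u) for g, j, u in product(["yes", "no"], repeat=3))
assert abs(total - 1.0) < 1e-12
```

Each factor is just a lookup in a small table, so the full joint never needs to be stored explicitly.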

The ability to factorize a probability distribution in this way is useful as it simplifies the computations we need to make. It can also allow us to represent the entire distribution in a more compact form. Suppose that the distribution of each random variable is discrete and takes on a finite set of values; for example, random variables G and J could each take on the two discrete values {yes, no}. To store a joint probability distribution without factorizing it or exploiting any independence relations, we need one entry for every possible combination of values of all the random variables.

By contrast, if the distribution factorizes into a product of simpler distributions as we saw earlier, the total number of values we need to store is far smaller. For networks with many random variables that each take on many values, the savings are very substantial indeed.
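A rough sketch of the savings, assuming binary variables: an unfactorized joint over n variables needs 2^n − 1 free parameters (the last entry is fixed because probabilities sum to 1), while a network in which every node has at most k parents needs at most n · 2^k stored probabilities, one per parent configuration per node.

```python
def full_joint_params(n):
    # Free parameters in an unfactorized joint over n binary variables.
    return 2 ** n - 1

def factorized_params(n, k):
    # Upper bound for a network of n binary nodes with at most k binary
    # parents each: one stored probability per parent configuration.
    return n * 2 ** k

# For example, 20 binary variables with at most 2 parents each:
print(full_joint_params(20))     # 1048575
print(factorized_params(20, 2))  # 80
```

The gap grows exponentially with n, which is precisely why factorization makes otherwise intractable models practical.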

Besides computation and storage, another significant benefit is that when we want to estimate the joint probability distribution of our random variables from data, known independence relations make the task much simpler, because each factor in the product can be estimated separately. We will see this in detail in the next section, where we study an important example of a Bayesian network.

To wrap up this section, we'll note the factorization of the joint probability function of the Bayesian network, represented by the graph we saw in the first diagram in this chapter, and leave it as an exercise for the reader to verify:

[Equation not reproduced here: the factorization of the joint probability for the network shown in the first diagram of this chapter.]