3 Uncertainty

This chapter focuses on uncertainty and the means to evaluate it through a mathematical theory of probability. Our objective here is to pedagogically present such a theory as a way to operate with random variables and random processes with special attention to well‐defined systems following the demarcation process presented in Chapter 2. In this case, uncertainty can be related to either internal or external aspects of a given particular system, as well as to its articulation with its environment. We will introduce this theoretical field in an accessible manner with an explicit intention of reinforcing foundational concepts instead of detailing mathematical techniques and derivations. Such concepts are the raw material needed to understand the definition of information to be presented in Chapter 4. For readers with interest in the mathematics, the original book by Kolmogorov is a must [1]. Engineers and engineering students could also refer to [2] and readers with interest in complex systems to [3].

3.1 Introduction

The word “uncertainty” refers to aspects of a phenomenon, an object, a process, or a system that are totally or partly unknown, either in relative or absolute terms. For example, it is uncertain to me how many people will read this book, or if I will be able to travel to Brazil in the summer vacations. It is also uncertain to me who will win the lottery this evening; however, I am certain that I will not win because I have no ticket. Other phenomena involve even more fundamental types of uncertainty as, for instance, quantum mechanics (e.g. Heisenberg's uncertainty principle) or chaotic systems that are sensitive to initial conditions that are impossible to determine with the required infinite precision.

Probability theory is a branch of mathematics that formalizes the study of uncertainty through a mathematical characterization of random variables and random processes. Its basis is set theory, and it is considered to have been inaugurated as an autonomous field when Kolmogorov postulated the axioms of probability theory [1]. We can affirm that probability theory is a purely theoretical discipline related to purified mathematical objects. To avoid misunderstanding, it is worth mentioning that the treatment of uncertainty in real‐world data is related to another discipline called statistics. While statistics is, in fact, founded on probability theory, it constitutes an autonomous technical discipline based on its own methodology and raw materials. Interestingly, the ongoing research efforts in machine learning and artificial intelligence are closely related to statistics and hence, probability theory. Some aspects of statistics will be discussed in forthcoming chapters.

Our focus here is only on the fundamentals of probability theory. We will mostly provide intuitive, concrete examples as illustrations. These are controlled experimental systems isolated from the environment (i.e. closed systems), and thus, a rigorous analysis of real‐world data is put aside. In this sense, the most important concepts and the basics of their mathematical formulation will be stated in a comprehensive manner.

3.2 Games and Uncertainty

Most games involve some sort of randomness. Lotteries, dice, urns, roulette, or even simple coin flipping are classical examples of games that are almost entirely games of chance. Those are usually one‐shot games, and the uncertainty is directly related to the outcome in relation to the choices the players have made before the outcome is revealed. They are the usual examples in probability theory textbooks.

Other classes of games, like chess or checkers, are strategic and, in some sense, deterministic. However, the (strategic) interactions between the two players in order to win also lead to uncertainty but of a different type in comparison with chance games. We can say that this is more an operational uncertainty in a sense that it is impossible to know with certainty how the process will evolve, except in very few (unrealistic) cases where very strong assumptions about the players are taken (e.g. both players will always act in the same way when faced with the same situation).

There are other kinds of games that combine both types of uncertainty. These are games where the initial state of the players is given by a drawing process (like a lottery), and the game itself is based on (strategic) decisions of the players. Card games and dominoes are well‐known examples. These are the most interesting games to exemplify and introduce the idea of random variables and random processes. Because of its relative simplicity, dominoes will be the main subject of our discussions, and its main aspects will be defined next. Note that all games considered in this chapter are closed systems.

Definition 3.1 Dominoes. Dominoes is a tile‐based game with 28 pieces, each showing a pair of numbers ranging from 0 to 6, represented here in square brackets, without repeating pairs. For example, if the numbers in the tile are 4 and 3, we represent this as [4,3], or [3,4], because the pairs are unordered. Before starting, all tiles are placed with their numbers down and shuffled. The players (usually from two to four) take five pieces each without the others seeing them and keep the leftovers as a bank. Whoever has the highest double (i.e. [6,6], [5,5], and so on) in hand begins; if no one has a double, then the tile with the highest sum begins. With the first tile in the deck, each player in turn (e.g. clockwise) must match the numbers. If a player in his/her turn does not have a tile with a number that matches the current deck, then he/she needs to get new pieces from the bank until he/she gets a suitable one. If the bank is empty, then that player misses the turn. The game ends when the first player has used all his/her pieces. If the game is blocked (i.e. no one has a piece to keep the game running), the winner is the one with the lowest sum in hand.

A full set of dominoes is shown in Figure 3.1a, while a snapshot of a typical deck is presented in Figure 3.1b. This game is interesting because it provides a pedagogical way to present probability theory, ranging from the simplest case of one random variable to the most challenging cases of random processes. In more precise terms, there are two main sources of uncertainty in dominoes, namely (i) the drawing processes at the beginning and during the game, and (ii) the behavior of the players, who are dynamically interacting in a sequential process. In this case, interesting questions could be posed: (a) how frequently will a player begin with the same hand; (b) how frequently will all players begin with the same hand; and (c) how frequently will a game with all players starting with the same hand lead to the same playing sequence resulting in identical games? Without any formalism, even a child can readily answer that (a) is more frequent than (b) and (c), and (b) more frequent than (c). Our task is then not only to prove these inequalities but also to find a way to quantify the chances of these outcomes. Let us move step‐by‐step in a series of examples.

Figure 3.1 (a) Dominoes set; (b) example of a game.

Example 3.1 Drawing one specific tile. Consider that a given player takes only one tile from the full dominoes set without seeing its respective number. Before looking at it, any tile out of the 28 could be selected. In this case, we can say that the player has a chance of 1 out of 28 possible outcomes, which we denote by 1 : 28, to get any specific tile. Note that 1 : 28 is not the same as 1/28 (which means 1 divided by 28). Hence, this number 1 : 28 could be used to quantify the uncertainty of the outcome. After the player sees the content of the tile, the outcome is certain, and thus, the tile is exactly the selected one. For example, the chance of having the tile [6,6] is 1 : 28 before the selection is revealed. After that, there is no uncertainty: the selected tile is either [6,6] or not. Without the [6,6] in the set, the chance of any other remaining tile to be drawn is 1 out of 27 possible outcomes (1 : 27). If the [6,6] is returned to the set, the chance is again 1 : 28.

Example 3.2 Drawing two specific tiles in different ways. The player takes two tiles at the same time, placing one on the right side and the other on the left. What is the chance that the tiles are [0,0] and [6,6]? There are two possible favorable outcomes: (A) the tile on the right is [0,0] and the one on the left is [6,6], or (B) the tile on the right is [6,6] and the one on the left is [0,0]. In this case, the chances are 2 out of 28 × 27 possible outcomes, or 1 : 378, where 28 × 27 = 756 refers to all possibilities of pairs. If the positions of the tiles matter (i.e. the favorable outcome is [6,6] on the right and [0,0] on the left), only outcome (B) is favorable, and thus, the chances are smaller: 1 out of 756. This is similar to a case of sequential drawings where the first tile needs to be [6,6] and the second needs to be [0,0]. This results in 1 out of 756 possible outcomes. Note, though, that the chance of losing this game in the first draw is 27 out of 28 possible outcomes. If the drawings are sequential but the order does not matter, then we are in the first case: 2 out of 756 possible outcomes (or 1 : 378), with the chance of losing in the first draw equal to 26 out of 28.
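The counting in Example 3.2 can be checked by brute-force enumeration. The following is a minimal sketch, not part of the original text: the names `TILES`, `ordered_draws`, `either_order`, and `fixed_order` are illustrative, and each draw is represented as a (left, right) pair of tiles.

```python
from fractions import Fraction
from itertools import combinations_with_replacement, permutations

# The 28 tiles: unordered pairs of numbers from 0 to 6.
TILES = list(combinations_with_replacement(range(7), 2))

# Every way to place two distinct tiles as (left tile, right tile).
ordered_draws = list(permutations(TILES, 2))

# Positions do not matter: [0,0] and [6,6] are both present.
either_order = [d for d in ordered_draws if set(d) == {(0, 0), (6, 6)}]

# Positions matter: [0,0] on the left and [6,6] on the right.
fixed_order = [d for d in ordered_draws if d == ((0, 0), (6, 6))]

chance_either = Fraction(len(either_order), len(ordered_draws))
chance_fixed = Fraction(len(fixed_order), len(ordered_draws))
print(len(ordered_draws), chance_either, chance_fixed)
```

Exact fractions are used here so that the ratios 2 out of 756 and 1 out of 756 are reduced without rounding.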

Example 3.3 One player draws the same five tiles two times. Consider a game with two players where each one takes five tiles. We can determine the chances of one player getting the same hand, assuming that he/she is the first to take all five tiles from the deck. In this case, the player knows the five tiles of the first game, and therefore, we need to find the chance of having the same five tiles. The number of overall possibilities is 28 × 27 × 26 × 25 × 24 = 11 793 600. The favorable outcomes are all possible arrangements of these five tiles, which can be computed as five options for the first, four for the second, and so on. This leads to 5 × 4 × 3 × 2 × 1 = 120. Therefore, the chance of having the same hand twice is 120 out of 11 793 600, i.e. 1 out of 98 280 possibilities. Another way to understand this is that the chance of the first tile being in the desired set is 5 : 28, the second is 4 : 27, until the fifth is 1 : 24, reaching the same result. We could follow similar steps to find the solution for the second player and also in the case of a sequential one‐by‐one draw (this will be the focus of Exercise 3.1). Note that the definition of the favorable outcomes is more intricate in this case, since the chance of the second player getting a given tile after the first player has drawn combines two conditions: (i) the first player did not select that tile in his/her turn, and (ii) the second player then selects it.
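The arithmetic of Example 3.3 can be reproduced in a few lines. This is a small sketch under the example's own assumption that the player draws all five tiles first; the variable names are illustrative.

```python
import math
from fractions import Fraction

# Ordered ways to draw five tiles out of 28: 28 * 27 * 26 * 25 * 24.
ordered_hands = math.perm(28, 5)

# Orderings in which the same five tiles can be drawn: 5 * 4 * 3 * 2 * 1.
arrangements = math.factorial(5)

chance = Fraction(arrangements, ordered_hands)

# Tile-by-tile view: 5/28 for the first tile, 4/27 for the second, and so on.
stepwise = Fraction(1)
for k in range(5):
    stepwise *= Fraction(5 - k, 28 - k)

print(ordered_hands, arrangements, chance, stepwise)
```

Both routes give the same fraction, whose denominator is also the number of unordered five-tile hands, `math.comb(28, 5)`.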

Now, we can provide an outline of how to answer the three questions posed before these three examples. For question (a), how frequently will a player begin with the same hand, we should proceed similarly to Example 3.3 but indicating the order in which the pieces are taken from the deck (i.e. whether the player is the first one to take the five tiles, or if there is a different procedure for taking the tiles). The idea is to check the favorable outcomes (the specific five tiles) with respect to the number of pieces in the deck. Question (b), how frequently will all players begin with the same hand, follows the same procedure, but each player has to be analyzed individually, and every player must get the same five tiles as before. However, it is easy to see that the situation described in (b) includes the situation described in (a) and specifies the favorable outcome even further. We can say that the favorable outcomes related to (b) are a subset of the ones related to (a). Being more restrictive, (b) has smaller chances than (a).

The last question (c) how frequently will a game with all players starting with the same hand lead to the same playing sequence becoming identical games is trickier. First, the situation described in (b) is covered by the one described in (c), and thus, (c) has, at best, the same chance as (b). However, (c) involves an operational uncertainty related to how the game develops from the decisions and actions of the (strategic) players. Such decision‐making processes are, in principle, unknown, and thus, we cannot compute the chances related to them without imposing (strong) assumptions. For example, if we assume that (i) all players will always take the same decisions and actions when they are in the same situation, and (ii) they play in the same sequence, then if they have the same tiles, the outcome of the game will be the same, reducing the chances of (c) happening to the outcome that all players have the same hands, which is the situation described in (b). In this case, the chances of (b) and (c) are the same. Nevertheless, any other assumption related to the players' behavior would imply more possibilities and therefore, the chances of (c) to occur would be lower than (b), and thus, (a). The situation described in (a) and (b) refers to random variables, while (c) refers to random processes or stochastic processes. Random variables are then associated with single observations, while stochastic processes with sequential observations constituted by indexed (e.g. time‐ or event‐stamped) random variables considering the order of the observations. More details can be found in [3].

Besides, an attentive reader will probably not be satisfied, because the questions ask about the frequency of a specific outcome, while the answers shift to talking about the chances of those outcomes. This is a valid concern. The fundamental fact taken into account in those cases is that the studied outcomes are one‐shot realizations of either a given process of drawing as in (a) and (b), or of a well‐determined complete dominoes match as in (c). Since each one‐shot realization of the process is unrelated to both previous and future realizations, the outcomes are then independent. A deeper discussion about the dependence and independence of outcomes will be provided later in this chapter. Let us now exemplify this by revisiting Example 3.1.

Example 3.4 Chance and frequency of drawing one specific tile. It was shown that the chance that a player gets a specific tile, let us say [6,6], from the full dominoes set is 1 out of 28 possible tiles. In other words, there is a chance of 1 : 28 of selecting [6,6] in a one‐shot trial. If the tile is returned and the tiles are mixed again, the chance of getting [6,6] in the next one‐shot trial is the same 1 : 28. If a new trial happens following the same procedure, the chances are the same 1 : 28. This could go indefinitely with the same chance of 1 : 28.

A skeptical person is concerned with the fairness of the game because one player gets [6,6] three times in a row. He/she then decides to take notes on how many times [6,6] appears as an outcome by filling a table with two possible outcomes: [6,6] and not [6,6]. After 2800 trials, the person decides to count the frequency of the outcomes. The [6,6] column is marked 102 times out of 2800 outcomes, leading to an approximate frequency of 1 : 28. The not [6,6] column is marked 2698 times out of 2800 outcomes, leading to an approximate frequency of 27 : 28. In this case, the frequency of outcomes is numerically approximated by the chance of the specific outcome in a one‐shot trial. In addition, the counts of the two events of interest, [6,6] and not [6,6], add up to 2800, i.e. 2800 times out of 2800 outcomes, which equals 28 : 28, or 1 : 1 (a certain outcome).
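The note‐taking of the skeptical person can be imitated with a short simulation. This is only a sketch: it assumes a fair draw with the tile returned after each trial, the function name `count_double_six` and the seed are hypothetical, and the counts it produces will differ from the 102 and 2698 reported above because they depend on the random draws.

```python
import random
from itertools import combinations_with_replacement

# The 28 tiles: unordered pairs of numbers from 0 to 6.
TILES = list(combinations_with_replacement(range(7), 2))

def count_double_six(trials, seed=0):
    """Draw one tile per trial (with return) and tally [6,6] versus not [6,6]."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if rng.choice(TILES) == (6, 6))
    return hits, trials - hits

hits, misses = count_double_six(2800)
print(f"[6,6]: {hits} of 2800; not [6,6]: {misses} of 2800")
```

Increasing `trials` should bring the ratio `hits / trials` closer to 1/28, which is the behavior the example describes.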

With this example, we have shown that the frequency of the outcomes in a (fair) drawing process can be approximated by the chance of a given outcome in a one‐shot realization. If the drawing process is repeated many times, then the frequency will tend to the same value as the chance of a one‐shot realization. For the questions asked in (a), (b), and (c), this fact is enough to sustain that frequency and chance have the same numerical value. But we need to go further, and this will require some formalization of the concepts presented so far in order to extend them. In the following section, we will start our brief tour of probability theory, always trying to provide concrete examples.

3.3 Uncertainty and Probability Theory

Consider a particular system or process containing different observable attributes that can be unambiguously defined. For example, the dominoes set has 28 tiles with two numbers between zero and six, coins have two sides (heads and tails), the weather outside can be rainy or not, or the temperature outside my apartment during July could be any temperature between 10 and 35 °C. Given these observations, there is an unlimited number of possible experiments that could be proposed. One experiment could be related to measuring the temperature outside for the whole of July, every day at 7 am and 7 pm. Another experiment could be to throw a coin five times in a row and then record the sequence of heads and tails. The examples provided with dominoes in the previous sections are all different experiments related to drawing tiles of the dominoes set.

Each experiment is thus defined by events related to observations of a set of predefined attributes of the system following a well‐defined protocol of action (i.e. a procedure of how to observe and record observations). Depending on the specific case, the result of an observation is referred to as outcome, sample, measurement, or realization. If, before the observation, the attribute to be observed has its outcome already known with certainty, the system or process is called deterministic. If this is not the case, there is uncertainty involved, and thus, the possible set of outcomes is associated with chance, or probability.

The formal notation to be employed here will be presented next.

Definition 3.2 Notations.

A particular system or process $\Phi$ is associated with different observable attributes, each with its own set composed of all possible outcomes of that attribute.
Each observation process is determined by a specific protocol $\rho$ that unambiguously determines these sets of possible outcomes.
The result of each observation of an attribute is denoted $a$, which belongs to the set of possible outcomes of that attribute.
An experiment is designed with respect to $\Phi$ by defining events of interest as subsets of a sample space, which is obtained through a function that maps the set of all possible observation outcomes into the sample space.
The sample space covers all possible outcomes of the experiment, including the possibility of sequential observations of the same attribute.
If the outcome of the experiment is known with certainty, the system or process is said to be deterministic with respect to the experiment, and thus, the event of interest will either happen or not.
If the outcome of the experiment is unknown but the sample space is known with certainty, then the system or process is random with respect to the experiment; the chances that the event of interest will happen can be quantified by a function $P$ named probability measure, which is defined in Definition 3.3.

It is noteworthy that the proposed notation formalizes the observation process with respect to a system or process. This is a remarkable difference from most textbooks, e.g. [2], where neither the observation protocol grounded in a particular system or process is formalized nor its randomness is formally defined with respect to the experiment under consideration. Although these aspects are found in most books in a practical state, our aim here is to explicitly formalize them in order to support the assessment and evaluation of the uncertainty of different systems and processes. To do so, we first need to mathematically define probability measure.
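For reference, the three axioms of a probability measure can be sketched as follows. The symbols $\Omega$ for the sample space and $E$, $E_1$, $E_2$ for events are notation assumed here, and the labels (A1) to (A3) are chosen to agree with the way the axioms are cited in the examples of this chapter: (A2) for the total probability and (A3) for disjoint sets.

```latex
\begin{align*}
\text{(A1)}\quad & P(E) \geq 0 \quad \text{for every event } E \subseteq \Omega, \\
\text{(A2)}\quad & P(\Omega) = 1, \\
\text{(A3)}\quad & P(E_1 \cup E_2) = P(E_1) + P(E_2) \quad \text{whenever } E_1 \cap E_2 = \emptyset.
\end{align*}
```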

These are the well‐known axioms of probability proposed by Kolmogorov [1]. In plain words: probability is nonnegative with a maximum value of 1, and the probability of the union of two disjoint sets is the sum of their individual probabilities. With those three basic axioms, it is possible to derive several properties of the probability measure $P$ [1, 2].

With Definitions 3.2 and 3.3, we can return to the dominoes but now discussing probabilities.

Example 3.5 Probability of drawing one specific tile.

System $\Phi$: a dominoes set with 28 pieces.
Protocol $\rho$: (i) all tiles are facing down, (ii) they are mixed, (iii) one tile is taken, (iv) its attribute that will be defined next is observed and recorded, and (v) the tile is returned.
Attributes $a$: the two numbers written in the tile. Note that in the dominoes set, the two numbers form an unordered pair so that, for example, [3,4] is equivalent to [4,3], and so on.

Experiment 3.1

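Experiment: Perform the protocol $\rho$ one time, leading to a sample space composed of the 28 tiles.
Event of interest: Observe one specific tile as the outcome of the experiment.
The outcome is uncertain because of the protocol $\rho$ and the definition of the experiment, but the outcomes are fair (i.e. equally likely).
If the protocol leads to a fair selection where all tiles have the same likelihood to be taken, then the probability of taking any specific tile is the same.
Since the tiles are mutually exclusive (once one is selected, all the others are not), and thus, form disjoint sets whose union is the sample space, we can apply (A3): the probability of the sample space is the sum of the probabilities of the 28 individual tiles.
Since the probability of the sample space is 1, we can apply (A2) so that 28 times the probability of a single tile equals 1, leading to a probability of 1/28 for each tile.
Finally, the probability of the event of interest is 1/28.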

Experiment 3.2

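Experiment: Perform the protocol $\rho$ three times, leading to a sample space given by the product of three copies of the set of 28 tiles, where the symbol “$\times$” denotes in this case the Cartesian product of sets.
Event of interest: Observe one specific sequence of three tiles as the outcome of the experiment.
If the protocol leads to a fair selection where all the tiles have the same likelihood to be taken, then the probability of taking any specific combination of three tiles is the same.
Since the sequences of three tiles are mutually exclusive, and thus, form disjoint sets whose union is the sample space, we can apply (A3): the probability of the sample space is the sum of the probabilities of the 28 × 28 × 28 = 21 952 individual sequences.
Since the probability of the sample space is 1, we can apply (A2) so that 21 952 times the probability of a single sequence equals 1, leading to a probability of 1/21 952 for each sequence.
Finally, the probability of the event of interest is 1/21 952.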
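Both experiments can be checked by enumerating their sample spaces and counting. The sketch below assumes equally likely outcomes, as in the example; the helper `probability` and the choice of [6,6] as the event of interest are hypothetical and used only for illustration.

```python
from fractions import Fraction
from itertools import combinations_with_replacement, product

# The 28 tiles: unordered pairs of numbers from 0 to 6.
TILES = list(combinations_with_replacement(range(7), 2))

def probability(event, sample_space):
    """Probability of an event when all outcomes are equally likely."""
    favorable = sum(1 for outcome in sample_space if outcome in event)
    return Fraction(favorable, len(sample_space))

# Experiment 3.1: the protocol is performed once.
space_one = TILES
p_one = probability({(6, 6)}, space_one)

# Experiment 3.2: the protocol is performed three times (tile returned each
# time), so the sample space is the Cartesian product of three tile sets.
space_three = list(product(TILES, repeat=3))
p_three = probability({((6, 6), (6, 6), (6, 6))}, space_three)

print(len(space_one), p_one, len(space_three), p_three)
```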

What is interesting is that one can propose an unlimited number of experiments following the same basic observation process related to a particular system. In this case, it is also important to have a more detailed characterization of this process, which can then be used to solve questions related to different experiments grounded in it. One important tool is the frequency diagram, or histogram. The idea is quite simple: repeat the observation process several times, counting how many times each attribute is observed, and then plot the counts as a bar diagram associating each attribute with its respective number of observations. Figure 3.2 illustrates this for the case of the dominoes set.

Figure 3.2 Example of a frequency diagram related to the observations of the attributes $a$ (two numbers in the tile) of the system $\Phi$ (a dominoes set composed of 28 tiles) following the observation protocol $\rho$ repeated for 10 000 times. The dark gray line indicates the expected number of occurrences for each tile.

In this case, the dark gray line is the expected number of occurrences of each tile; this number is intuitively defined as the probability of a given outcome (in this case 1/28) times the number of observations (in this case 10 000). This leads to an expected number of 357.14 observations for each different tile. Let us try to better understand this with the following examples.
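Before moving to those examples, a frequency diagram of this kind can be sketched with a short simulation. This is an illustrative sketch and not the code behind Figure 3.2: the function name `frequency_diagram`, the seed, and the text-based bars are assumptions.

```python
import random
from collections import Counter
from itertools import combinations_with_replacement

# The 28 tiles: unordered pairs of numbers from 0 to 6.
TILES = list(combinations_with_replacement(range(7), 2))

def frequency_diagram(observations, seed=0):
    """Repeat the draw-observe-return protocol and count each observed tile."""
    rng = random.Random(seed)
    return Counter(rng.choice(TILES) for _ in range(observations))

counts = frequency_diagram(10_000)
expected = 10_000 / len(TILES)  # probability 1/28 times 10 000 observations

print(f"expected per tile: {expected:.2f}")
for tile in TILES:
    # One '#' per ten observations gives a rough text version of the bars.
    print(tile, counts[tile], "#" * (counts[tile] // 10))
```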

Example 3.6 Number on the left, number on the right. We are now interested in checking the frequency of the different numbers in a tile, the one on the left and the other on the right. Different from the previous cases, the order of the pair in the tile matters because of the two different orientation options according to which the tile could be placed. This implies a change in our basic protocol $\rho$. The new protocol $\rho'$ and attribute set are defined next.

System $\Phi$: A dominoes set with 28 pieces.
Protocol $\rho'$: (i) All tiles are facing down, (ii) they are mixed, (iii) one tile is taken and horizontally positioned so that one number is on the left and the other on the right, (iv) its attribute that will be defined next is observed and recorded, and (v) the tile is returned.
Attributes: The two numbers written in the tile where one number is on the left, the other is on the right (see Figure 3.3). Formally, each attribute is an ordered pair formed by a left number $a_{\text{left}}$ and a right number $a_{\text{right}}$, each ranging from 0 to 6.

Figure 3.3 Domino tile in horizontal position: one number at the left side, another at the right side.

Note that the new attribute set has 49 elements but the dominoes have only 28 tiles. The difference comes from the observation protocols $\rho$ and $\rho'$. In fact, the protocol $\rho'$ brings a new uncertainty to the observation process: the position where each number will appear in the tile. We can analyze this situation by considering two different classes of tiles: doubles and not doubles. There are seven doubles, for which there is no uncertainty about the sequence in which the numbers will appear. The not‐double class is composed of the other 21 tiles, where the sequence of appearance of the numbers does matter: the attribute [0,1] is different from [1,0], and so on. In this case, the number of attributes has a twofold increase, leading to 42 possibilities. The new sample space then has 7 + 42 = 49 elements. Using these numbers, we can plot the frequency diagram as presented in Figure 3.4. Since the protocol $\rho'$ does not favor any specific outcome, the chance of having one specific value is 1 out of 7 possibilities (or simply 1 : 7) for both sides, leading to a probability of 1/7, because the outcomes are equally likely. If we repeat the observation process 10 000 times, the expected number of observed values is then 1/7 × 10 000, which is equal to 1428.57, regardless of the specific number and the side.
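The claim that each number appears on each side with probability 1/7 can be verified exactly. The sketch below assumes that a tile is drawn uniformly and that a not‐double tile lands in either orientation with probability 1/2; this orientation rule and all variable names are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations_with_replacement

# The 28 tiles: unordered pairs of numbers from 0 to 6.
TILES = list(combinations_with_replacement(range(7), 2))

# Probability of each ordered (left, right) attribute.
attribute_prob = {}
for low, high in TILES:
    p_tile = Fraction(1, len(TILES))
    if low == high:
        # A double looks the same in both orientations.
        attribute_prob[(low, high)] = p_tile
    else:
        # A not-double splits its probability between two orientations.
        attribute_prob[(low, high)] = p_tile / 2
        attribute_prob[(high, low)] = p_tile / 2

# Marginal probability of each number on the left and on the right.
left = {k: sum(p for (a_left, _), p in attribute_prob.items() if a_left == k)
        for k in range(7)}
right = {k: sum(p for (_, a_right), p in attribute_prob.items() if a_right == k)
         for k in range(7)}

print(len(attribute_prob), left[0], right[6])
```

Under these assumptions the 49 ordered attributes are not all equally likely (doubles keep the full 1/28), yet every number still has probability 1/7 on either side.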

Figure 3.4 Example of a frequency diagram related to the observations of the attributes (number on the left and number on the right) of the system $\Phi$ (a dominoes set composed of 28 tiles) following the observation protocol $\rho'$ repeated for 10 000 times.

Figure 3.5 Example of a frequency diagram related to the attribute $a'' = a_{\text{left}} + a_{\text{right}}$ of the system $\Phi$ (dominoes set composed of 28 tiles) following the observation protocol $\rho''$ repeated for 10 000 times.

Example 3.7 Sum of the two numbers in the tile. Consider another observation protocol where we are interested in the sum of the number on the right and on the left. We can define the new attribute as $a'' = a_{\text{left}} + a_{\text{right}}$, which takes values from 0 to 12. We have the new protocol $\rho''$ defined as: (i) all tiles are facing down, (ii) they are mixed, (iii) one tile is taken, (iv) its attribute $a''$ is observed and recorded, and (v) the tile is returned. Figure 3.5 illustrates the frequency diagram of this sum. The situation is now more challenging, because the expected number of observations of each attribute is not the same. By inspection, it is easy to see that $a'' = 0$ only if $a_{\text{left}} = 0$ and $a_{\text{right}} = 0$, which have chances 1 : 7 and 1 : 7, resulting in 1 : 49 and an associated probability of 1/49, because the outcomes of $a_{\text{left}}$ and $a_{\text{right}}$ are equally likely. Considering the 10 000 observations, it is expected that $a'' = 0$ will be observed 1/49 times 10 000, which is equal to 204.08. For $a'' = 1$, there are two possibilities: [0,1] or [1,0]. In this case, there is twice the chance compared with the previous case, i.e. 2 out of 49 possibilities. Hence, the expected number of observations for $a'' = 1$ is 408.16. It is possible to follow a similar procedure for the other values of $a''$.
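The same procedure can be carried out for every value of the sum at once. The sketch below follows the counting used in the example, i.e. it assumes that each of the 49 (left, right) combinations is equally likely; this assumption and the variable names are part of the sketch only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many (left, right) combinations produce each sum from 0 to 12.
counts = Counter(left + right for left, right in product(range(7), repeat=2))

# Probability of each sum, with 49 equally likely combinations (assumed).
pmf = {total: Fraction(n, 49) for total, n in sorted(counts.items())}

# Expected number of observations of each sum in 10 000 repetitions.
expected = {total: float(p) * 10_000 for total, p in pmf.items()}

for total in pmf:
    print(total, pmf[total], round(expected[total], 2))
```

The resulting values rise from 1/49 at a sum of 0 to 7/49 at a sum of 6 and then fall symmetrically back to 1/49 at a sum of 12.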

With these examples, we have shown how the different observation protocols define the observable outcomes and the respective sample spaces, which are the basis of specific experiments constructed to evaluate the uncertainty of particular systems or processes. Because the values of the attributes may vary at every observation, it is then possible to create a map that associates each possible observable attribute with a probability that it will be indeed observed as the outcome of one observation. Probability theory names a quantified version of this unknown observable attribute as random variable and the mathematical relation that maps the random variable with probabilities as probability functions; they are formally defined in the following.

Definition 3.4 Random variable and probability functions. Consider a system $\Phi$ associated with an observation process $\rho$ and observable attributes $a$. A random variable $X$ is a function that maps every attribute $a$ to a real number $X(a)$.

[Discrete] A random variable $X$ is called discrete if its possible outcomes are defined in a countable set, which can be either infinite or finite. The probability mass function $p_X(x)$ is defined for discrete random variables as:

(3.1) $p_X(x) = P(X = x) = P(\{a : X(a) = x\}),$

where the function $P$ satisfies the axioms defined in Definition 3.3.

[Continuous] A random variable $X$ is called continuous if its possible outcomes are defined in an uncountable set as, e.g., the set of real numbers. The cumulative distribution function $F_X(x)$ is then defined as

(3.2) $F_X(x) = P(-\infty < X \leq x),$

where the function $P$ satisfies the axioms defined in Definition 3.3. The probability density function $f_X(x)$ (if it exists) is then defined as:

(3.3) $f_X(x) = \dfrac{\mathrm{d}F_X(x)}{\mathrm{d}x}.$

This can now be used not only to better characterize the uncertainty related to different processes and systems but also to develop a whole mathematical theory of probability. From the probability function, it is also possible to define a simpler characterization of random variables. For example, it is possible to indicate which is the outcome that is more frequent, or the average value of outcomes, or how much the outcomes vary. Among different possible measures, we will define next the moments of random variables, which can be directly employed to quantify their expected value and variance.
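As a reference for the measures just mentioned, the standard textbook expressions for the $k$th moment, the expected value, and the variance of a discrete random variable $X$ with probability mass function $p_X$ can be sketched as below. The symbols $\mathbb{E}[\cdot]$, $\mathrm{Var}(\cdot)$, and $\mathcal{X}$ (the set of values that $X$ can take) are notation assumed here; for a continuous random variable, the sums would be replaced by integrals weighted by $f_X(x)$.

```latex
\begin{align*}
\mathbb{E}\left[X^k\right] &= \sum_{x \in \mathcal{X}} x^k \, p_X(x), \\
\mathbb{E}[X] &= \sum_{x \in \mathcal{X}} x \, p_X(x), \\
\mathrm{Var}(X) &= \mathbb{E}\left[(X - \mathbb{E}[X])^2\right]
                 = \mathbb{E}\left[X^2\right] - \left(\mathbb{E}[X]\right)^2 .
\end{align*}
```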

Besides, it is worth introducing two other concepts that are helpful to analyze different observation processes. Consider a case of different observations of the same random variable $X$ (i.e. all conditions of the observation process are the same). If a given observation of this process does not depend on past and future observations, we say that the realizations of $X$ are independent and identically distributed (iid), which will lead to a memoryless process (a formal definition will be provided later). We can now illustrate these concepts with some examples.

In this example, although it is possible to compute the expected value and the variance, these measures are meaningless because the mapping function $X$ is arbitrary and its actual values are not related to the attribute $a$. Note that, in other cases, the mapping is straightforward from the measurement as, for instance, the number of persons in a queue. In summary, the correct approach to define the random variable under investigation depends on the system or process $\Phi$, the observation protocol $\rho$, and the experiment of interest.

It is also important to mention that one interesting way to numerically estimate probability mass functions is by experiments like the ones presented in Figures 3.2, 3.4, and 3.5. Those results were obtained by computer simulations based on the Monte Carlo method [4]. The idea is to generate many observations and empirically estimate each probability as the number of times that a specific observation appears divided by the overall number of observations. Figure 3.6 presents analytical results generated from the probability mass function of $X(a)$ and empirical results generated by computer simulations. Note that this random variable is uniformly distributed over its sample space.

Figure 3.6 Example of a probability mass function diagram related to the random variable $X(a)$ obtained from the observations of the attributes $a$ (two numbers in the tile) of the system $\Phi$ (dominoes set composed of 28 tiles) following the observation protocol $\rho$ repeated 10 000 times. The bars are the empirical results, and the line indicates the analytic probabilities.
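The Monte Carlo estimation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used to produce the figures: it assumes that each of the 28 tiles is one outcome, that the drawn tile is returned to the bank after every observation, and the names `empirical_pmf`, `n_draws`, and `seed` are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations_with_replacement

def empirical_pmf(n_draws=10_000, seed=0):
    """Estimate the PMF of a uniformly drawn domino tile by simulation."""
    rng = random.Random(seed)
    # The 28 tiles of a double-six set: pairs (i, j) with 0 <= i <= j <= 6.
    tiles = list(combinations_with_replacement(range(7), 2))
    # Assumption: the tile is returned after each draw, so the draws are iid.
    counts = Counter(rng.choice(tiles) for _ in range(n_draws))
    # Relative frequency = occurrences of a tile / total number of observations.
    return {tile: counts[tile] / n_draws for tile in tiles}

pmf = empirical_pmf()
analytic = 1 / 28  # uniform distribution over the 28 tiles
worst = max(abs(p - analytic) for p in pmf.values())
print(f"analytic = {analytic:.4f}, largest deviation = {worst:.4f}")
```

Increasing `n_draws` should bring the empirical bars closer to the analytic line, which is the behavior the figure illustrates.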

There are several well‐known named probability distributions, for instance, the Gaussian (or Normal) distribution, the Poisson distribution, and the Binomial distribution. Their mathematical characterization is available in different public repositories and textbooks, e.g. [2].

A well‐defined formulation of probability distributions is important because it allows computer experiments to be performed in a generative manner. The idea is to describe a real‐world process or system with a known probability distribution and simulate its dynamics. Note also that, in many cases, the random variable values can be directly obtained from observations of the attributes, without defining a special mapping function as in Example 3.8.

Example 3.9 Arrivals in a restaurant. The manager of a restaurant is planning how many meals he/she should prepare every hour for dinner. Without any good characterization, he/she estimated that from 5 pm to 6 pm there are usually fewer people than from 6 pm to 9 pm, which is the peak time. From 9 pm to 10 pm (when the restaurant closes), there is about the same number of people arriving as in the 5 pm to 6 pm period. He/she estimates the arrivals as follows.

5 pm to 6 pm and 9 pm to 10 pm: 5 persons per hour,
6 pm to 9 pm: 20 persons per hour.

However, he/she knows that the actual number may vary night after night, and that the numbers are somehow independent.

Figure 3.7 Poisson random variable considering two different client arrival rates in a period of one hour.

After joining a probability course, the manager discovered that this scenario could be studied by using Poisson random variables. The probability mass function is defined as:

(3.6) $P(x) = \frac{\lambda^{x} \exp(-\lambda)}{x!},$

where $\lambda$ is the estimated number of arrivals per hour, $x \in \{0, 1, 2, \ldots\}$, and $x!$ is the factorial operation. For Poisson random variables, both the mean value and the variance are equal to $\lambda$.

The manager then understands that the estimated arrival rate might be related to the mean value, and thus, he/she uses the aforementioned values as the parameter $\lambda$. Figure 3.7 shows the distributions defined by (3.6) considering $\lambda = 5$ and $\lambda = 20$. It is interesting to see that there is a great variation around the estimated number of arrivals, which was also indicated by the relatively high variance value. In the case of five arrivals per hour, there is a nonnegligible probability that 10 persons will arrive in that period, as well as none. However, the highest probability is associated with four or five arrivals. In this case, the probability that the random variable is $x = 4$ or $x = 5$ is 35%.

Now, looking at the case of 20 arrivals per hour, the situation is more challenging because the distribution is more spread: the highest probabilities are at $x = 19$ and $x = 20$, both below 10%. In plain words, this fatter distribution means that there is a relatively high probability that an outcome far from the estimated arrival rate is actually observed. For instance, on one day 30 persons may arrive between 7 pm and 8 pm, while on another day only 10 will come at the same time.

The probability mass function does not help the manager much to estimate how many meals he/she needs to prepare every evening, but it helps him/her to better understand the uncertainty involved in this case. Depending on other considerations, the manager could use other known distributions to model the problem. He/she could also acquire more fine‐grained data and find an empirical distribution, possibly trying to match the empirical points with known distributions.
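The numbers quoted in this example can be checked with a short Python sketch of the Poisson probability mass function (3.6). The function names and the truncation point `x_max` are assumptions made here for illustration; the infinite sums for the mean and variance are simply cut off where the remaining probability is negligible.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """Equation (3.6): P(x) = lam**x * exp(-lam) / x!."""
    return lam**x * exp(-lam) / factorial(x)

def mean_and_variance(lam, x_max=100):
    """Approximate mean and variance by truncating the sums at x_max."""
    xs = range(x_max + 1)
    mean = sum(x * poisson_pmf(x, lam) for x in xs)
    var = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in xs)
    return mean, var

# Off-peak hours (5 arrivals per hour): probability of four or five arrivals.
p_four_or_five = poisson_pmf(4, 5) + poisson_pmf(5, 5)
# Peak hours (20 arrivals per hour): the two most likely outcomes.
p_19, p_20 = poisson_pmf(19, 20), poisson_pmf(20, 20)

print(f"P(4 or 5 | lambda=5) = {p_four_or_five:.3f}")
print(f"P(19 | lambda=20) = {p_19:.4f}, P(20 | lambda=20) = {p_20:.4f}")
print("mean, variance for lambda=5:", mean_and_variance(5))
```

The first value comes out at roughly 0.35, matching the 35% stated in the text, and the two peak-hour probabilities are equal and below 0.10.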

Example 3.10 Maxwell's demon. Consider the Maxwell's demon thought experiment introduced in the previous chapter. Let us consider two situations: (i) before the demon's interventions: the system is in a thermodynamic equilibrium, and thus, with the same temperature; and (ii) after the demon's interventions: each side is in a thermodynamic equilibrium but with different temperatures, one warmer, the other colder than before. From classical statistical mechanics, we know that the velocity of the particles for a given temperature is a random variable that follows a Maxwell–Boltzmann probability distribution, whose probability density function is given by:

(3.7) $f_X(x) = \sqrt{\frac{2}{\pi}} \, \frac{x^{2}}{a^{3}} \exp\left(-\frac{x^{2}}{2a^{2}}\right),$

where $x \geq 0$ and $a = \sqrt{kT/m}$, with $T$, $m$, and $k$ being the temperature, the molecule mass, and the Boltzmann constant.

Figure 3.8 presents the distribution (3.7) for $a = 1$ (arbitrary value) that characterizes how the velocity of the particles is distributed in the experiment for a given equilibrium temperature. It is interesting to see that the velocity of each molecule refers to the characteristic of one of the components of the system, which is composed of a very large number of molecules, while the temperature is a macrostate of the system.

The idea of the operation of Maxwell's demon is to separate the slow and fast molecules based on a given threshold (dashed line in the figure) to create a new equilibrium macrostate constituted by (a) slow molecules (cold side) and (b) fast molecules (hot side). In this case, Maxwell's demon is violating the second law of thermodynamics by creating a new equilibrium state (ii) where there will be two different probability distributions, one for (a) and the other for (b). The formalization of this thought experiment and the estimation of the new equilibrium states will be the focus of Exercise 3.3.
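As a minimal sketch of how the threshold splits the molecules, the Python code below evaluates the density (3.7) and its cumulative distribution for $a = 1$. The threshold value of 1.5 is a hypothetical choice made only for illustration, the closed-form cumulative distribution is the standard one for the Maxwell–Boltzmann distribution rather than a formula from this chapter, and the midpoint-rule integrator serves only as a numerical cross-check.

```python
from math import erf, exp, pi, sqrt

def mb_pdf(x, a=1.0):
    """Maxwell-Boltzmann density (3.7) for speed x >= 0 and scale a."""
    return sqrt(2 / pi) * x**2 / a**3 * exp(-x**2 / (2 * a**2))

def mb_cdf(x, a=1.0):
    """Closed-form cumulative distribution, i.e. the integral of (3.7) from 0 to x."""
    return erf(x / (sqrt(2) * a)) - sqrt(2 / pi) * (x / a) * exp(-x**2 / (2 * a**2))

def integrate(f, lo, hi, steps=20_000):
    """Midpoint-rule integration, used only to cross-check the closed form."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))

threshold = 1.5                      # hypothetical separation threshold
slow_fraction = mb_cdf(threshold)    # share of molecules sent to the cold side
fast_fraction = 1 - slow_fraction    # share of molecules sent to the hot side
numeric_check = integrate(mb_pdf, 0.0, threshold)

print(f"slow: {slow_fraction:.4f}, fast: {fast_fraction:.4f}, check: {numeric_check:.4f}")
```

Moving `threshold` changes how many molecules end up on each side, which is the quantity needed when looking for a threshold that yields a prescribed split.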

Most of the examples presented in this section illustrate experiments where the different observations of the specific random variables are assumed iid. However, this is not always the case. Example 3.7 shows one type of dependence, because there is one new random variable defined as the sum of two other random variables. Another example is a drawing process presented in Example 3.1 but without returning the observed tile to the bank, which decreases the sample space, and thus, modifies the probability distribution. There are many other modes of dependence between two or more random variables, which include operations and functions of random variables, and random processes. These topics will be the focus of our next section.

Figure 3.8 Maxwell–Boltzmann probability density function of the random variable that characterizes the speed of the gas molecules for $a = 1$ (arbitrary value), indicating the separation threshold used by Maxwell's demon to separate the fast and slow molecules.

3.4 Random Variables: Dependence and Stochastic Processes

When thinking about systems and processes, we intuitively imagine that different observations of attributes and outcomes of experiments may be dependent. Observable attributes might be related to a background process or system, for example, when measuring temperature in a room using two sensors in different places, or the same sensor but whose measurements are taken every ten minutes. In both cases, if we consider that these observable attributes are associated with random variables, it is expected that they would be related to each other as they are measurements of the temperature (a physical property) of the same place. Hence, if the outcome of one of these random variables is known, the uncertainty of the other should decrease.

In other cases, the attributes might have another type of physical dependence, like the rotation speed of rotors in electric machines and the associated values of electric current and voltage. There is also the case of relations that are constructed in the design of the experiment of interest. For instance, in Example 3.7, the outcome of the experiment is constructed as the sum of two attributes that are characterized as two independent random variables.

Another type of dependence between random variables is related to sequential processes as exemplified by a dominoes game where the sample space dynamically changes at each action. A queue at the cashier of a supermarket also illustrates a sequential process where the number of persons waiting in the line to be served depends on the number of persons arriving in the queue and the service time that each person requires to leave the cashier.

To evaluate the dependence between random variables, it is necessary to formally define a few key concepts, which are presented next.

To mathematically characterize the dependence/independence of random variables, we need to consider functions of two or more random variables. They are the joint, conditional, and marginal probability distributions. From them, it is possible to propose a classification of types of uncertainties associated with different systems, processes, and experiments. Their formal definition is provided next.

Definition 3.7 Joint, conditional, and marginal probability distributions. Consider without loss of generality two random variables $X$ and $Y$, with $x \in S_X$ and $y \in S_Y$.

[Discrete] The joint probability mass function is defined for the vector of discrete random variables $(X, Y)$ as:

(3.8) $p_{X,Y}(x, y) = P(\{X = x\} \cap \{Y = y\}) = P(X = x, Y = y),$

where the function satisfies the axioms defined in Definition 3.3.

The marginal probability mass functions are defined as:

(3.9) $p_X(x) = P(X = x, Y \in S_Y),$

(3.10) $p_Y(y) = P(X \in S_X, Y = y),$

from where the probability mass functions of $X$ and $Y$ can respectively be obtained from the joint function by covering the whole domain of $Y$ and $X$ (i.e. the sample spaces $S_Y$ and $S_X$).

The conditional probability mass functions are defined as:

(3.11) $p_{X|Y}(x|y) = P(X = x \mid Y = y),$

(3.12) $p_{Y|X}(y|x) = P(Y = y \mid X = x),$

where the symbol $X|Y$ reads “$X$ given $Y$”, and $Y|X$ reads “$Y$ given $X$”.

They are related through the following equality:

(3.13) $p_{X,Y}(x, y) = p_{X|Y}(x|y) \, p_Y(y) = p_{Y|X}(y|x) \, p_X(x),$

from which the well‐known Bayes' theorem follows directly.

[Continuous] The joint cumulative distribution function is then defined for $(X, Y)$ as

(3.14) $F_{X,Y}(x, y) = P(\{X \leq x\} \cap \{Y \leq y\}) = P(X \leq x, Y \leq y),$

where the function satisfies the axioms defined in Definition 3.3.

The joint probability density function (if it exists) is then defined as:

(3.15) $f_{X,Y}(x, y) = \frac{\partial^{2} F_{X,Y}(x, y)}{\partial x \, \partial y}.$

The marginal probability density functions of $X$ and $Y$ are defined as:

(3.16) $f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y') \, \mathrm{d}y', \qquad f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x', y) \, \mathrm{d}x'.$

The conditional probability density functions are obtained by the following relation:

(3.17) $f_{X,Y}(x, y) = f_{X|Y}(x|y) \, f_Y(y) = f_{Y|X}(y|x) \, f_X(x).$
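As a short worked step, dividing the two right-hand sides of (3.13) by $p_Y(y)$, assuming $p_Y(y) > 0$, gives the form in which Bayes' theorem is usually quoted. The denominator is expanded here using the marginal (3.10) together with (3.13); the primed summation variable $x'$ is notation introduced only for this step.

```latex
p_{X|Y}(x|y) = \frac{p_{Y|X}(y|x)\, p_X(x)}{p_Y(y)},
\qquad
p_Y(y) = \sum_{x' \in S_X} p_{Y|X}(y|x')\, p_X(x').
```

The same rearrangement applies to the densities in (3.17), with the sum replaced by the integral in (3.16).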

Despite the potentially heavy mathematical notation, the concepts represented by those definitions are quite straightforward. The following example illustrates the key ideas.

Example 3.11 Dominoes tile. Consider the situation described in Example 3.6, where the observable attribute is with following the protocol . We can directly associate with a discrete random variable and with a discrete random variable , associated with the sample spaces . Because of the protocol , the events are independent, and therefore, the outcomes of and do not affect each other. We can then write:

Marginal: and ,
Joint: with the sampling space being , where the pairs are ordered (i.e. is not equivalent to , and so on).
Conditional: and .

We propose a slight change in the protocol so that the smaller number of the tile is always on the left. In this case, , and thus, . For example, if it is known that , we are sure that as well. If , then there is still uncertainty since there are two possible values that can assume. The conditional probability mass function will then be . If we consider the case that is known, then if we have , then . If , then can assume two values, and so on. The conditional probability will be then .

The joint probability is considerably simple: . This is the case because there is only one tile of each, and the difference is how to arrange them always keeping the smaller number on the left. With these distributions, we can easily find the marginal distributions as follows:
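The distributions of the modified protocol can be enumerated with a small Python sketch. It rests on assumptions stated here rather than taken from the example: $X$ is the left (smaller) number, $Y$ is the right (larger) number, and each of the 28 tiles is drawn with the same probability. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations_with_replacement

# Assumption: X is the left (smaller) number, Y the right (larger) number,
# so the joint sample space holds the 28 pairs (x, y) with x <= y.
joint = {pair: Fraction(1, 28) for pair in combinations_with_replacement(range(7), 2)}

def marginal_x(x):
    """p_X(x): sum the joint PMF over every y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def marginal_y(y):
    """p_Y(y): sum the joint PMF over every x."""
    return sum(p for (_, yi), p in joint.items() if yi == y)

def conditional_x_given_y(x, y):
    """p_{X|Y}(x|y) = p_{X,Y}(x, y) / p_Y(y), following (3.13)."""
    return joint.get((x, y), Fraction(0)) / marginal_y(y)

def conditional_y_given_x(y, x):
    """p_{Y|X}(y|x) = p_{X,Y}(x, y) / p_X(x), following (3.13)."""
    return joint.get((x, y), Fraction(0)) / marginal_x(x)

print([str(marginal_x(x)) for x in range(7)])
print([str(marginal_y(y)) for y in range(7)])
print(conditional_x_given_y(0, 0), conditional_x_given_y(0, 1))
```

Under these assumptions, knowing that the right number is 0 leaves no uncertainty about the left one, while a right number of 1 leaves two equally likely values, and the product of the marginals differs from the joint probability, so the two variables are dependent.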

Another possible way to create a dependence is by defining experiments where new random variables are defined as functions of already defined ones. For example, one new random variable can be a sum or a product of two other random variables. Example 3.7 intuitively introduces this idea. Despite the importance of this topic, the mathematical formulation is rather complicated, going beyond the introductory nature of this chapter. The reader can learn more about those topics, for example, in [2]. What is extremely important to remember is that one shall not manipulate random variables by usual algebraic manipulation; random variables must be manipulated according to probability theory.

Before closing this chapter, there is still one last topic: stochastic processes. A stochastic process, also called random process, is defined as a collection of random variables that are uniquely indexed by another set. For instance, this index set could be related to a timestamp. If we consider that the temperature measured by a given thermometer is a random variable, the index set can be a specific timestamp (i.e. year‐day‐hour‐minute‐second) of the measurement. The stochastic process is then a collection of single measurements – which is the random variable – indexed by the timestamp. A formal definition is presented next.

Example 3.12 Queue in the university restaurant. The manager of the university restaurant monitors every half an hour, from 10:00 to 14:30, how many people are lined up in the queue waiting to have lunch. This situation can be defined as:

System $\Phi$: Line of persons waiting to be served
Protocol $\rho$: (i) Count the number of persons in the queue every half an hour, and (ii) record this number indexed by the observation time.
The attributes are natural numbers, i.e. , with indices .
The random variable for the th observation is:
The stochastic process is defined as .

Figure 3.9 exemplifies this random process considering three different realizations. Each different line refers to a different day and represents the measured number of persons in the queue following the proposed observation time sequence (the observation times are represented by the dotted vertical lines).

Figure 3.9 Example of a random process. The number of persons in a queue, measured every 30 minutes, from 10:00 to 14:30. Three different realizations were simulated.
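A random process of this kind can be simulated with the Python sketch below. The queue model is purely an assumption for illustration and is not the one used to generate the figure: arrivals in each half hour are Poisson distributed with hypothetical rates, a fixed hypothetical number of persons is served per half hour, and the names `ARRIVAL_RATES`, `service_capacity`, and `simulate_day` are invented here.

```python
import random
from math import exp

# Observation times 10:00, 10:30, ..., 14:30 act as the index set of the process.
OBSERVATION_TIMES = [f"{10 + k // 2}:{'30' if k % 2 else '00'}" for k in range(10)]
# Hypothetical mean arrivals per half hour, peaking around noon.
ARRIVAL_RATES = [2, 5, 10, 20, 30, 30, 20, 10, 5, 2]

def poisson_sample(rng, lam):
    """Knuth's method: multiply uniforms until the product drops to exp(-lam) or below."""
    limit, k, prod = exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_day(rng, service_capacity=15):
    """One realization: the queue length recorded at each observation time."""
    queue, observations = 0, []
    for lam in ARRIVAL_RATES:
        # New queue = old queue + arrivals - persons served, never below zero.
        queue = max(0, queue + poisson_sample(rng, lam) - service_capacity)
        observations.append(queue)
    return observations

rng = random.Random(1)
realizations = [simulate_day(rng) for _ in range(3)]  # three different days
for day in realizations:
    print(dict(zip(OBSERVATION_TIMES, day)))
```

Each printed dictionary is one realization of the process; note that in this model the queue length at one time depends on the length at the previous time, so the observations are not iid.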

The different observations that form a stochastic process can be composed of iid random variables as in Example 3.5. However, this is not always the case, and thus, it is important to define random processes in terms of how a given observation is related to past observations of the same process. We will follow [3] and define the following types of stochastic processes.

Definition 3.9 Types of stochastic processes. Consider a process with elements $X_1, X_2, \ldots, X_n$, where each random variable $X_i$ is associated with its own sample space $S_i$ with $i = 1, \ldots, n$. We have the joint distribution $p(x_1, x_2, \ldots, x_n)$ and the conditional probability $p(x_n \mid x_1, \ldots, x_{n-1})$. We can now define different types of stochastic processes as follows.

Memoryless: $p(x_n \mid x_1, \ldots, x_{n-1}) = p(x_n)$. In plain words, the present outcome does not depend on past realizations of the process.
Markov processes: $p(x_n \mid x_1, \ldots, x_{n-1}) = p(x_n \mid x_{n-1})$. In other words, the present outcome only depends on the last outcome; all the previous outcomes are irrelevant.
Path‐ or history‐dependent processes: $p(x_n \mid x_1, \ldots, x_{n-1})$, i.e. the present outcome depends on the history, and thus, the conditional probability is a generic function of all prior outcomes. This is the most general class of processes.

There is also another interesting classification of processes that are related to variations in the sample spaces. They are presented next.

Reinforcement processes: . In this case, the sample space from where the random variable will be selected is a generic function of the previous outcomes by positively or negatively reinforcing outcomes that were successful before.
Processes with dynamical sample spaces: where denotes the cardinality (i.e. the number of elements) of a given set. In those processes, the sample space may increase or decrease as a generic function of previous outcomes.

Example 3.13 Examples of different types of stochastic processes. Consider different observation protocols or experiments related to a random drawing of tiles from a dominoes set.

Memoryless: A protocol defined in Example 3.5, where the selected tile is returned to the bank to be used for the next draw. Each outcome is independent of the other outcomes.
Markov process: The same protocol is used but a new experiment is defined to count the difference between the number of times that 5 and 6 are observed. The value that the random variable will assume after the next observation depends only on its current value and on how the new observation changes the state. Another similar case, but simpler, is the difference between how many times 5 and 6 are observed as outcomes of sequential dice rolls, exemplified by a Markov chain presented in Figure 3.10.
Path‐ or history‐dependent process: Consider a real dominoes game being played by two persons. If we define as a random variable the last tile put on the table, the stochastic process can be defined as the sequence of tiles being used by the players. This random process depends on the whole history of the game actions.
Reinforcement process: Consider a different protocol where the tile that is selected is positively reinforced in the following way: (i) a tile is selected and removed, (ii) another tile is randomly removed from the dominoes set, and (iii) two tiles identical to the one that was first selected are returned. In this case, the probability that the same tile will be observed in the next stage increases, positively reinforcing the chances of its occurrence.
Processes with dynamical sample spaces: This case can be exemplified by applying the same observation process used in the memoryless case, but now without returning the selected tile. Therefore, the sample space dynamically decreases during the stochastic process.
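The drawing protocols in this example can be contrasted with a brief Python sketch. It is a sketch under assumptions: tiles are represented as pairs of numbers, every tile in the bank is equally likely to be picked, and the function names are hypothetical.

```python
import random
from itertools import combinations_with_replacement

TILES = list(combinations_with_replacement(range(7), 2))  # the 28 tiles

def draw_memoryless(rng, n):
    """The tile is returned after each draw: the sample space stays at 28 tiles."""
    return [rng.choice(TILES) for _ in range(n)]

def draw_without_return(rng, n):
    """The tile is kept out: the sample space shrinks by one tile per draw."""
    bank, draws, sizes = list(TILES), [], []
    for _ in range(n):
        sizes.append(len(bank))  # size of the sample space before this draw
        draws.append(bank.pop(rng.randrange(len(bank))))
    return draws, sizes

def draw_reinforced(rng, n):
    """Steps (i)-(iii) of the reinforcement protocol; the bank keeps 28 tiles."""
    bank, draws = list(TILES), []
    for _ in range(n):
        tile = bank.pop(rng.randrange(len(bank)))  # (i) select and remove a tile
        bank.pop(rng.randrange(len(bank)))         # (ii) remove another random tile
        bank.extend([tile, tile])                  # (iii) return two copies of the first
        draws.append(tile)
    return draws, bank

rng = random.Random(0)
print(draw_memoryless(rng, 5))
print(draw_without_return(rng, 5))
first_draws, bank_after = draw_reinforced(rng, 1)
print(first_draws, bank_after.count(first_draws[0]))
```

After one reinforced draw the selected tile appears twice in a bank of 28, so its chance of being observed next rises from 1/28 to 2/28.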

Figure 3.10 Markov chain representing a random variable defined by the difference between the numbers of occurrences of 5 and 6 in sequential dice rolls. At each roll, the difference can grow by one if the outcome is 5 with probability 1/6, decrease by one if the outcome is 6 with probability 1/6, or be in the same state with probability 4/6.
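The Markov chain in this figure can be simulated with the Python sketch below, which assumes a fair six-sided die and a process that starts at a difference of zero; the function names are hypothetical. The next state is computed from the current state and the new roll only, which is exactly the Markov property.

```python
import random

def step(state, rng):
    """One transition: +1 on a 5, -1 on a 6, unchanged otherwise."""
    roll = rng.randint(1, 6)  # fair die, each face with probability 1/6
    if roll == 5:
        return state + 1
    if roll == 6:
        return state - 1
    return state

def simulate(n_rolls, seed=0):
    """Path of the difference (count of 5s minus count of 6s), starting at 0."""
    rng = random.Random(seed)
    path = [0]
    for _ in range(n_rolls):
        path.append(step(path[-1], rng))
    return path

path = simulate(60_000)
changes = [b - a for a, b in zip(path, path[1:])]
freq_up = changes.count(1) / len(changes)
freq_down = changes.count(-1) / len(changes)
freq_stay = changes.count(0) / len(changes)
print(f"up: {freq_up:.3f}, down: {freq_down:.3f}, stay: {freq_stay:.3f}")
```

With enough rolls, the three empirical frequencies should approach the transition probabilities 1/6, 1/6, and 4/6 given in the caption.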

This ends the proposed brief review of probability theory. Now that we are better equipped to mathematically characterize uncertainty in processes and systems, we can move on to the next chapter, where we will define information as a concept that indicates uncertainty resolution.

3.5 Summary

This chapter introduced the mathematical theory of probability as a formal way to approach uncertainty in systems or processes. By using intuitive examples, we navigated through different concepts like observation process, random variables, sample space, and probability function, among others. The idea was to set the basis for understanding the cyber domain of cyber‐physical systems, which is built upon information – a concept closely associated with uncertainty. As a final note, we would like to stress that the present chapter is an extremely brief introduction, and thus, interested readers are strongly encouraged to consult textbooks in the field, such as [2]. Another interesting book is [3], where the authors build a theory for complex systems using many fundamental concepts of probability theory. In particular, its second chapter provides a pedagogical way to characterize random variables and processes, including well‐known probability distributions and stochastic processes. Finally, reading the original work by Kolmogorov [1] may also be a beneficial exercise.

Exercises

3.1 Players drawing two times the same five tiles. Example 3.3 analyzed the situation where one player is the first one to get five tiles out of the full set containing 28 tiles. The task here is to evaluate other situations.
1. Compute the chances of the player to get the same five tiles considering that he/she draws after the first one has drawn his/her five tiles.
2. Consider that the drawing is performed in a sequential manner so that one player gets one tile first, then the other player gets the second tile, and so on. Compute the chances of the first player of the sequence of getting the same five tiles regardless of the other player's hand.
3. Compute the chances of the second player of the sequence of getting the same five tiles regardless of the other player's hand.
4. Consider the situation where both players get the same hand. Does the drawing protocol affect the chances of the outcome?
3.2 Formal analysis to compute probability. The task in this exercise is to practice the formalization proposed in Definitions 3.2 and 3.3 and illustrated in Example 3.5.
1. Formalize the results obtained in Example 3.2, computing the probability associated with the events described there.
2. Do the same for Example 3.3.
3.3 Maxwell's demon operation. Consider Example 3.10. The task is to formalize the Maxwell's demon experiment based on probability theory and illustrate the equilibrium macrostates before and after the demon's intervention.
1. Formalize the experiment following Definitions 3.2 and 3.3.
2. Plot the Maxwell–Boltzmann distribution introduced in (3.7) for , which is the equilibrium state before the demon's intervention.
3. Find the separation threshold value that leads to .
4. Plot an estimation of the Maxwell–Boltzmann distribution for the two new equilibrium states. [Hint: do not compute the new temperature, just think about the effect of the demon's intervention to guess the new values of the constant $a$.]
3.4 Markov chain. The task is to extend the Markov process presented in Figure 3.10 to the dominoes case presented in Example 3.5, considering a random variable that counts the difference between the number of times that 5 and 6 are observed (note that if the tiles $(5,5)$ or $(6,6)$ are observed, the state will increase by two or decrease by two, respectively, while the tile $(5,6)$ maintains the process in the same state).
1. Formalize the experiment following Definitions 3.2, 3.3, and 3.6.
2. Prove that the difference between the number of times that 5 and 6 are observed is a Markov process, i.e. $p(x_n \mid x_1, \ldots, x_{n-1}) = p(x_n \mid x_{n-1})$.
3. Find the transition probabilities .
4. Represent this stochastic process as a Markov chain.

References

1 Kolmogorov A. Foundations of the Theory of Probability: Second English Edition. Courier Dover Publications; 2018.
2 Leon‐Garcia A. Probability, Statistics, and Random Processes For Electrical Engineering. Pearson Prentice Hall; 2008.
3 Thurner S, Hanel R, Klimek P. Introduction to the Theory of Complex Systems. Oxford University Press; 2018.
4 Kroese DP, Brereton T, Taimre T, Botev ZI. Why the Monte Carlo method is so important today. Wiley Interdisciplinary Reviews: Computational Statistics. 2014;6(6):386–392.
