Chapter 5
Probability Theory

Package(s): prob, scatterplot3d, ConvergenceConcepts

5.1 Introduction

Probability is the branch of science that deals with understanding uncertainty from a mathematical perspective. The foundations of probability are about three centuries old and can be traced back to the works of Laplace, Bernoulli, and others. However, the formal acceptance of probability as a legitimate stream of science is just about a century old. Kolmogorov (1933) firmly laid the foundations of probability in a purely mathematical framework.

An experiment, deterministic as well as random, results in some kind of outcome. The collection of all possible outcomes is generally called the sample space or the universal space. An example of the sample space of a deterministic experiment is the set of possible distances traveled as a consequence of the application of some force, namely the non-negative real line. On the other hand, for a random experiment of tossing a coin, the sample space consists of the set {Head, Tail}. The difference between these two types of experiments lies in the nature of the final outcome. For a stationary object, if the application of a force results in an acceleration of a, the distance traveled after 60 seconds is known from the formula s = at²/2 with t = 60. That is, given the acceleration and time, the distance is uniquely determined. For a random experiment of coin tossing, the outcome is sometimes a Head, and at other times a Tail.

In this chapter, we will mainly focus on the essential topics of probability that will be required during the rest of the book. We begin with the essential elements of probability and discuss interesting problems using mathematical thinking embedded within R programs. Thus, we begin with sets and elementary counting methods and compute probabilities using the software in Section 5.2. Combinatorial aspects with useful examples will be treated in Section 5.3. Measure theory is required at this point, and the core concepts needed for the developments that follow are discussed in Section 5.4. Conditional probability, independence, and the Bayes formula are dealt with in Sections 5.5 and 5.6. Random variables and their important properties are detailed in Section 5.7. Convergence of random variables and other important sequences of functions of random variables are discussed through Sections 5.9-5.12. We emphasize here that you may sometimes come across phrases such as "graphical proof", or "it follows from the diagram that ...". In most of these cases, the general statement holds true and admits an analytical proof; what is really meant is that the mathematical argument admits a clean visual display. The reader is cautioned, though, that such visual displays do not by themselves constitute a mathematical proof, and hence we must resist the temptation to generalize statements based on the displays alone. The spirit adopted in this chapter in particular is to emphasize that probability concepts can be integrated well with software, and that programming may sometimes be viewed as the e-version of problem-solving skills.

5.2 Sample Space, Set Algebra, and Elementary Probability

The sample space, denoted by c05-math-0004, is the collection of all possible outcomes associated with a specific experiment or phenomenon. A single coin tossing experiment results in either a head or a tail. The difference between the opening and closing prices of a company's share may be a negative number, zero, or a positive number. A (anti-)virus scan on your computer returns a non-negative integer, whereas a file may either be completely recovered or deleted on a corrupted disk of a file storage system such as hard-drive, pen-drive, etc. It is indeed possible for us to consider experiments with finite possible outcomes in R, and we will begin with a few familiar random experiments and the associated set algebra.

It is important to note that the sample space is uniquely determined, though in some cases we may not know it completely. For example, if the sequence {Head, Tail, Head, Head} is observed, the sample space is uniquely determined under binomial and negative binomial probability models. However, it may not be known whether the governing model is binomial or negative binomial. Prof G. Jay Kerns developed the prob package, which has many useful functions, including set operators, sample spaces, etc. We will deploy a few of them now.
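As a small appetizer, and only a sketch, two of these functions are shown below; the column names toss1, toss2, ... and the makespace argument are assumed to be as documented in the prob package, and the reader should consult its help pages for the full set of options.

> library(prob)
> tosscoin(2)                         # sample space of two coin tosses
> rolldie(1, nsides = 6)              # sample space of a single roll of a six-sided die
> S <- tosscoin(3, makespace = TRUE)  # attach equal probabilities to the 8 outcomes
> Prob(subset(S, toss1 == "H"))       # probability that the first toss is a head, 0.5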

Mahmoud (2009) is a recent introduction to the importance of urn models in probability. Johnson and Kotz (1977) is a classic account of urn problems. These discrete experiments are of profound interest and are still an active area of research and applications.

Note that the sample space may be treated as a superset. It has to be exhaustive, covering all possible outcomes. Loosely speaking, we may say that any subset of the sample space is an event. Thus, we next consider some set operations which, in the language of probability, are operations on events.

Let A and B be two subsets of the sample space Ω. The union, intersection, and complement operations for sets are defined as follows:

  • The union of the two sets A and B is defined as A ∪ B = {ω ∈ Ω : ω ∈ A or ω ∈ B}.
  • The intersection of the two sets A and B is defined as A ∩ B = {ω ∈ Ω : ω ∈ A and ω ∈ B}.
  • The complement of a set A is defined as A^c = {ω ∈ Ω : ω ∉ A}.
  • The set difference, or relative complement, of the sets A and B is defined by A \ B = A ∩ B^c = {ω ∈ Ω : ω ∈ A and ω ∉ B}.

Software, and hence R, will now be used to illustrate the above operations through simple examples.
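A minimal base R sketch of these operations on finite sets is given below; the sets A and B are arbitrary illustrations and letters plays the role of the sample space.

> A <- c("a","e","i","o","u")      # the vowels
> B <- c("a","b","c","d","e")      # the first five letters
> Omega <- letters                 # sample space: the 26 lower-case letters
> union(A, B)                      # A union B
> intersect(A, B)                  # A intersection B
> setdiff(Omega, A)                # complement of A with respect to Omega
> setdiff(A, B)                    # set difference A \ B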

We will now consider a few introductory problems for computation of probabilities of events, which are also sometimes called elementary events. If an experiment has finite possible outcomes and each of the outcomes is as likely as the other, the natural and intuitive way of defining the probability for an event is as follows:

P(A) = |A| / |Ω|,

where |·| denotes the number of elements in the set. The number of elements in a set is also called the cardinality of the set.
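As a simple sketch of this counting definition, consider tossing a fair coin three times and the event of obtaining exactly two heads; the choice of three tosses is purely illustrative.

> Omega <- expand.grid(t1 = c("H","T"), t2 = c("H","T"), t3 = c("H","T"))
> A <- subset(Omega, rowSums(Omega == "H") == 2)   # outcomes with exactly two heads
> nrow(A)/nrow(Omega)                              # cardinality ratio, equals 3/8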

It is also the case that, even for elementary events, not all outcomes are necessarily equally likely. The next example is a case in point.


5.3 Counting Methods

The events of interest may unfold in a number of different ways. We need mechanisms to find in how many different ways an event can occur. As an example, if we throw two dice and consider the sum of the numbers that appear on the two faces, a sum of 8 can occur in five different ways, viz., (2,6), (6,2), (3,5), (5,3), (4,4). In this section, we will discuss some results which will be useful for a large class of problems and theorems.

The seasoned probabilist Kai Lai Chung has noted the importance of permutations and combinations in a probability course. Over a period of years, he obtained different answers for the number of ways a man can dress differently from a combination of three shirts and two ties. The answers vary from c05-math-0045 and c05-math-0046 to c05-math-0047 and c05-math-0048. See page 46, Chung and AitSahlia (2003).

A natural extension of the above experiment is shown by the next result.

At the outset, the fundamental theorem of counting may appear very simple. Its strength will be used to derive the total number of ways a task can be done in the next subsection.
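As a quick sketch of the fundamental counting principle behind the shirts-and-ties anecdote, the outfits can be enumerated explicitly; the shirt and tie labels used below are, of course, hypothetical.

> shirts <- c("S1","S2","S3")
> ties <- c("T1","T2")
> outfits <- expand.grid(shirt = shirts, tie = ties)
> nrow(outfits)                  # 3 x 2 = 6 distinct shirt-tie combinations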

5.3.1 Sampling: The Diverse Ways

Sampling from a population can be carried out in many different ways. Suppose that we have n balls in an urn, and each ball carries a unique label along with it. Without loss of generality, we can label the balls 1 to n. In this case we say that the balls are ordered. If the labels of the balls carry no meaning, or are not available, that is, if the balls are indistinguishable, we are dealing with sampling problems with unordered units.

5.3.1.1 Sampling with Replacement and with Ordering

Consider the situation where we draw k units sequentially from the n units. Before each draw, the balls in the urn are shaken well so that every ball has the same chance of being selected. After each draw, the label of the drawn unit is noted and the order of the labeled units is duly recorded before the ball is placed back in the urn. Thus, if x_i denotes the label of the unit drawn on the i-th occasion, we have an ordered k-tuple (x_1, x_2, ..., x_k), with each x_i taking a value between 1 and n, i = 1, ..., k. The fundamental counting theorem then gives the answer of doing this task in

5.1   n^k

distinct ways.

5.3.1.2 Sampling without Replacement and with Ordering

Suppose that the experiment remains the same as above, with the variation that the drawn ball is not put back in the urn. That is, at the first draw we have n units to choose from. At the second draw, we have n − 1 units to choose from, at the third draw n − 2 units, and so on. Thus, at the k-th draw, we have n − k + 1 units available. Hence, sampling without replacement from ordered units can be performed in n(n − 1)⋯(n − k + 1) distinct ways. Since we cannot draw more than n units among n, we have the constraint k ≤ n. We have again applied the fundamental theorem of counting. Let us introduce here the continued-product notation (n)_k to denote n(n − 1)⋯(n − k + 1). That is,

5.2   (n)_k = n(n − 1)⋯(n − k + 1) = n!/(n − k)!

In popular terms, this is the permutation of k units sampled from a pool of n units. An interesting case is the permutation of obtaining all the n units. This experiment is about drawing all the units without replacement from the urn, and the number of ways of doing so is given by

(n)_n = n!

To see the behavior of the number of ways we can possibly draw, let us look at the permutation of drawing 1 to 12 units.

> sapply(1:12,factorial)
[1]  1 2 6 24 120 720 5040  40320 362880
[10] 3628800  39916800 479001600

The sapply function ensures that the factorial function is applied to each integer from 1 to 12.

5.3.1.3 Sampling without Replacement and without Ordering

We now consider a sampling variation of the previous experiment. In this experiment we do not record the order of occurrence of the sampling units. Alternatively, we can think of this experiment as seeing only the final result of sampling the desired number of k units. We have seen that an ordered sample of k units can be obtained in (n)_k = n(n − 1)⋯(n − k + 1) different ways. Furthermore, the number of distinct ordered arrangements of a given set of k units is the continued product (k)_k = k!. Thus, the number of ways of obtaining an unordered sample of k units by sampling without replacement from n units is (n)_k / k!. By multiplying the numerator and denominator by (n − k)!, we get the desired result

5.3   C(n, k) = n! / (k!(n − k)!)

5.3.1.4 Sampling with Replacement and without Ordering

In this setup, we draw k balls one after another, replacing the drawn ball in the urn before the next draw. During this process, we register the frequencies of the labels drawn, with possible repetitions, without storing the order of occurrence. Note that in this case n may be less than k. The number of distinct unordered samples with replacement can be shown to be C(n + k − 1, k).

We summarize all of these results in Table 5.1; the four counts are also verified numerically in the short sketch below.
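A sketch cross-checking the four counts with the urnsamples function of the prob package is given below; the small values n = 4 and k = 2 are chosen only to keep the enumerations manageable.

> library(prob)
> n <- 4; k <- 2
> nrow(urnsamples(1:n, size = k, replace = TRUE, ordered = TRUE))    # n^k = 16
> nrow(urnsamples(1:n, size = k, replace = FALSE, ordered = TRUE))   # n!/(n-k)! = 12
> nrow(urnsamples(1:n, size = k, replace = FALSE, ordered = FALSE))  # choose(n,k) = 6
> nrow(urnsamples(1:n, size = k, replace = TRUE, ordered = FALSE))   # choose(n+k-1,k) = 10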

5.3.2 The Binomial Coefficients and Pascal's Triangle

Pascal's triangle is a simple and useful device for obtaining the binomial coefficients. To begin with, consider the following relationship:

C(n, k) = C(n − 1, k − 1) + C(n − 1, k).

This relationship says that C(n, k) can be obtained as the sum of the number of ways of selecting k − 1 and k objects from n − 1 objects. Thus, we can move to a higher level using the quantities at the level just below it. This leads to the famous Pascal's triangle. A short program is given below to obtain the triangle.

> pascal <- function(n){
+   # returns the n-th row of Pascal's triangle,
+   # i.e., the binomial coefficients choose(n-1, 0:(n-1))
+   if(n<=1) pasc <- 1
+   if(n==2) pasc <- c(1,1)
+   if(n>2){
+     pasc <- c(1,1)
+     j <- 2
+     while(j<n){
+       j <- j+1
+       # adjacent sums of the previous row, padded with 1 at both ends
+       pasc <- c(1,as.numeric(na.omit(stats::filter(pasc,rep(1,2)))),1)
+     }
+   }
+   return(pasc)
+ }
> sapply(1:7, pascal)
[[1]]
[1] 1
[[2]]
[1] 1 1
[[3]]
[1] 1 2 1
[[4]]
[1] 1 3 3 1
[[5]]
[1] 1 4 6 4 1
[[6]]
[1]  1  5 10 10  5  1
[[7]]
[1]  1  6 15 20 15  6  1
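As a quick cross-check, and only a sketch, the n-th row returned by pascal should agree with the built-in binomial coefficients choose(n-1, 0:(n-1)):

> all(pascal(7) == choose(6, 0:6))   # TRUE: the seventh row is 1 6 15 20 15 6 1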

With the basics of combinatorics with us, we are now equipped to develop R solutions for some interesting problems in probability.

5.3.3 Some Problems Based on Combinatorics

Feller (1968) has had a very deep influence on almost all probabilists. Diaconis and Holmes (2002) considered some problems from Feller's Volume 1 in the Bayesian paradigm. In particular, we consider two of those three problems here: (i) the Birthday Problem, and (ii) the Banach Match Box Problem. The problems selected here serve to make the point that probability can on some occasions be very counter-intuitive. A survey of some of these problems may also be found in Mosteller (1962).

The next R program gives us the birthday probabilities, where we compute the probability that at least two people share a birthday when there are 2, 5, 10, 20, 30, 40, or 50 people in a classroom.

> k <- c(2,5,10,20,30,40,50)
> probdiff <- c(); probat2same <- c()
> for(i in 1:length(k))  {
+   kk <- k[i]
+   # P(all kk birthdays differ) = 365 x 364 x ... x (365-kk+1) / 365^kk
+   probdiff[i] <- prod(365:(365-kk+1))/(365^kk)
+   probat2same[i] <- 1 - probdiff[i]
+ }
> plot(k,probat2same,xlab="Number of Students in Classroom",
+      ylab="Birthday Probability",col="green",type="l")
> lines(k,probdiff,col="red")
> legend(10,1,"Birthdays are not same",box.lty=NULL)
> legend(30,.7,"Birthdays are same",box.lty=NULL)
> title("A: The Birthday Problem")

The R numeric vectors probdiff and probat2same respectively hold the probability that all the birthdays are different and the probability that at least two birthdays coincide. The code prod(365:(365-kk+1))/(365^kk) gives the probability that all kk birthdays differ, and by using it we easily obtain probat2same. The plot, lines, legend, and title functions are simple to follow.

Table 5.1 Diverse Sampling Techniques

                      ordered = True        ordered = False
replace = True        n^k                   C(n + k - 1, k)
replace = False       n!/(n - k)!           C(n, k)

The birthday probabilities are summarized in Table 5.2. It can be seen from the table that we need to have just 50 people in a classroom to be almost sure of finding a pair of birthday mates. Part A of Figure 5.2 gives the visual display of the table probabilities. The meeting point of the complementary curves at 0.5 gives k=23 as claimed in Williams (2001).□

Table 5.2 Birthday Match Probabilities

Size k    Probability of Different Birthdays    Probability of at Least Two Same Birthdays
  2       0.9973                                0.0027
  5       0.9729                                0.0271
 10       0.8831                                0.1169
 20       0.5886                                0.4114
 30       0.2937                                0.7063
 40       0.1088                                0.8912
 50       0.0296                                0.9704
Figure 5.2 Birthday Match and Banach Match Box Probabilities. Panel A (the birthday problem) plots the birthday probability against the number of students in the classroom; Panel B (the match box problem) plots the cumulative probability against the number of sticks remaining.

We will now digress a little and have a close look at a failure of the definition of probability discussed thus far. It seems intuitive that if A ⊂ Ω, the probability of any subset A must exist. That is, any subset A must have a well-defined measure. Ross and Pekoz (2007) consider a very elegant example of a subset for which the measure cannot be defined in any rational way. Suppose that we begin at the top of a unit circle and take steps of one radian each in a counter-clockwise direction. If required, we may perform more than one loop. The task is then the computation of the probability of returning to the point from which we started our journey. Suppose that it takes n steps and m loops to return to the top of the circle. This implies that n = 2πm, that is, π = n/(2m). As π is an irrational number, it cannot be expressed as the ratio of two integers, so no such n and m exist. Thus, the probability of returning to the start point cannot be measured.

This and some other limitations of the definition of probability considered up to now are overcome using Kolmogorov's definition of probability, which is based on measure theory. We will leave out the details of these important discussions!


5.4 Probability: A Definition

We will begin with a few definitions and lemmas.

5.4.1 The Prerequisites

Let us consider a class of sets of Ω, say {A_n, n ≥ 1}. The sequence {A_n} is said to be monotone increasing if A_n ⊆ A_{n+1} for each n. The sequence is said to be monotone decreasing if A_{n+1} ⊆ A_n for each n. We introduce two sequences, the infimum {B_n} and the supremum {C_n}, as below:

5.5   B_n := ⋂_{k=n}^∞ A_k,   C_n := ⋃_{k=n}^∞ A_k.

The symbol := is used to convey that the quantity on the left-hand side is by definition equal to that on the right-hand side. It is easy to see that the {B_n} sequence is a monotone increasing sequence, whereas the {C_n} sequence is a monotone decreasing sequence. We further define the limit infimum and the limit supremum as follows:

5.6   lim inf_{n→∞} A_n := ⋃_{n=1}^∞ ⋂_{k=n}^∞ A_k = ⋃_{n=1}^∞ B_n,   lim sup_{n→∞} A_n := ⋂_{n=1}^∞ ⋃_{k=n}^∞ A_k = ⋂_{n=1}^∞ C_n.
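These definitions can be mimicked in R for a concrete, truncated sequence of finite sets. In the hypothetical sketch below the sets alternate between {1,...,5} and {3,...,8}, and the tail intersections and unions are truncated at N = 20 terms, so the computed sets only approximate the true limiting sets.

> N <- 20; N0 <- 10
> A <- lapply(1:N, function(n) if(n %% 2 == 0) 1:5 else 3:8)
> Bn <- lapply(1:N0, function(n) Reduce(intersect, A[n:N]))  # truncated infimum sets
> Cn <- lapply(1:N0, function(n) Reduce(union, A[n:N]))      # truncated supremum sets
> Reduce(union, Bn)       # approximates lim inf A_n, the set {3, 4, 5}
> Reduce(intersect, Cn)   # approximates lim sup A_n, the set {1, 2, ..., 8}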

It is also common to refer to the couplet c05-math-0217 as a field. It may be easily proved that a field is also closed under finite union.

In plain words, the minimal field is the least collection of sets containing c05-math-0241, which is a field.

Fields need a generalization, especially in the case of a countably infinite or continuous set c05-math-0249.

The couplet c05-math-0260 is sometimes called the c05-math-0261-field.

As earlier, we can define the minimal c05-math-0274-field as an extension of the minimal field. There is an important class of c05-math-0275-field, which will be defined now.

It may be verified that the Borel c05-math-0286-field consists of the intervals, for all c05-math-0287, of the following types:

  • c05-math-0288
  • c05-math-0289
  • c05-math-0290
  • c05-math-0291
  • c05-math-0292
  • c05-math-0293

The definition of c05-math-0294 may be extended to c05-math-0295 and the reader may refer to Chung (2001). Starting with the sample space c05-math-0296, a general class of sets c05-math-0297 has been defined. Next a formal definition of a measure is required.

Let us now look at two important types of measures.

We are now equipped with the necessary tools for a proper construction of a probability measure.

5.4.2 The Kolmogorov Definition

Here, the finiteness refers to the collection of sets in c05-math-0330 and additiveness refers to disjoint collection of events. Any set c05-math-0331 is called an event or (more precisely) a measurable event. Next, we list some properties of the finitely additive probability measure:

  • c05-math-0332.
  • c05-math-0333.
  • If c05-math-0334, then c05-math-0335.
  • c05-math-0336.
  • The General Additive Rule: c05-math-0337, and hence c05-math-0338.
  • For an arbitrary set c05-math-0339 and a collection of mutually exclusive and exhaustive sets c05-math-0340: equation
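For a finite sample space these properties can be verified numerically; the following sketch with two fair dice (the events are arbitrary illustrations) checks the general additive rule.

> S <- expand.grid(d1 = 1:6, d2 = 1:6)       # 36 equally likely outcomes
> P <- function(E) nrow(E)/nrow(S)
> A <- subset(S, d1 + d2 == 8)               # the sum is 8
> B <- subset(S, d1 == d2)                   # a doublet
> AB <- subset(S, d1 + d2 == 8 & d1 == d2)   # both events occur
> P(A) + P(B) - P(AB)                        # right-hand side of the additive rule
> P(subset(S, d1 + d2 == 8 | d1 == d2))      # P(A union B), the same value 10/36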

We now extend the finitely additive probability measure to the countably infinite case.

5.5 Conditional Probability and Independence

An important difference between measure theory and probability theory lies in the notion of independence of events. The concept of independence is introduced through conditional probability, whose definition is considered first.

The next examples deal with the notion of conditional probability.

Conditional probabilities help in defining independence of events.

If events c05-math-0415 and c05-math-0416 are independent, then the following pairs of events are also independent:

  • c05-math-0417 and c05-math-0418 are independent.
  • c05-math-0419 and c05-math-0420 are independent.
  • c05-math-0421 and c05-math-0422 are independent.
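A quick numerical check of independence, again a sketch on the two-dice sample space with arbitrarily chosen events, may look as follows.

> S <- expand.grid(d1 = 1:6, d2 = 1:6)
> P <- function(E) nrow(E)/nrow(S)
> A <- subset(S, d1 %% 2 == 0)                # first die shows an even number
> B <- subset(S, d2 > 4)                      # second die shows 5 or 6
> AB <- subset(S, d1 %% 2 == 0 & d2 > 4)
> c(P(AB), P(A)*P(B))                         # both equal 1/6, so A and B are independent
> P(AB)/P(B)                                  # conditional probability P(A|B), equals P(A) = 1/2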

5.6 Bayes Formula

An excellent exposition of the Bayes formula is provided in Chapter 4 of Bolstad (2007). Consider a partition of the sample space Ω into a collection of k events B_1, B_2, ..., B_k, also defined earlier in Section 5.4, from the underlying probability space. That is, Ω = ⋃_{i=1}^k B_i. Consider an arbitrary event A. Then A = ⋃_{i=1}^k (A ∩ B_i). Since the B_i's form a partition of Ω, it can be visualized from the Venn diagram, Figure 5.4, that the A ∩ B_i's are also a disjoint collection of events. Hence,

P(A) = Σ_{i=1}^k P(A ∩ B_i) = Σ_{i=1}^k P(A | B_i) P(B_i).

Figure 5.4 Venn Diagram to Understand Bayes Formula

The last step of the above expression is obtained by the multiplication rule of probability. The beauty of the Bayes formula is that we can obtain inverse probabilities, that is, the probability of a cause given the effect. This famous Bayes formula is given by:

P(B_j | A) = P(A | B_j) P(B_j) / Σ_{i=1}^k P(A | B_i) P(B_i),   j = 1, ..., k.
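A short numerical sketch of the formula, with entirely hypothetical prior probabilities and likelihoods, is given below; B1, B2, B3 play the role of the partition and A that of the observed effect.

> prior <- c(B1 = 0.5, B2 = 0.3, B3 = 0.2)   # P(B_i): hypothetical partition probabilities
> lik <- c(B1 = 0.1, B2 = 0.4, B3 = 0.7)     # P(A|B_i): hypothetical conditional probabilities
> joint <- lik*prior                         # P(A and B_i) by the multiplication rule
> sum(joint)                                 # P(A), the total probability
> joint/sum(joint)                           # P(B_i|A), the Bayes formula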

5.7 Random Variables, Expectations, and Moments

5.7.1 The Definition

Sample events on their own are not always of interest, as we may only be interested in more general events. For instance, if 100 coins are tossed, we may not be interested in the probability of each individual outcome, but we may ask for the probability that the number of heads lies in the range 10 to 40, or exceeds 80. Thus, we need to consider functions of the events which help to answer such generic questions. This leads us to the concept of a random variable.

In plain words, it is required that for any Borel set B, the inverse image X^{-1}(B) must belong to the underlying σ-field of events:

X^{-1}(B) = {ω : X(ω) ∈ B} must be an event, for every Borel set B.

A random variable will be simply abbreviated as RV. A commonly accepted convention is to denote the random variables by capital letters X, Y, Z, etc., and the observed values by small letters x, y, z, etc., respectively. Random variables are identified to be of three types: (i) simple random variable, (ii) elementary random variable, and (iii) extended random variable. A random variable X is said to be simple if there exists a finite partition A_1, ..., A_n of Ω, n finite, and real numbers a_1, ..., a_n, such that

5.10   X(ω) = Σ_{i=1}^n a_i I_{A_i}(ω),

where I_{A_i}(ω) = 1 if ω ∈ A_i and 0 otherwise. The random variable X is said to be elementary if, for a countably infinite partition A_1, A_2, ... of Ω,

5.11   X(ω) = Σ_{i=1}^∞ a_i I_{A_i}(ω).

Finally, the random variable X is said to be an extended random variable if it is also allowed to take the values ±∞. An extended RV is often split into a positive part and a negative part defined by

5.12   X^+ = max(X, 0),   X^− = −min(X, 0).

It can be easily seen that X = X^+ − X^−.

We next consider some important properties of random variables.

Properties of Random Variables

  • Borel function c05-math-0492 of a random variable c05-math-0493 is also a random variable.
  • If c05-math-0494 is an RV, then c05-math-0495 is also a RV for c05-math-0496.
  • Let c05-math-0497 and c05-math-0498 be two RVs in a probability space c05-math-0499. Then c05-math-0500 and c05-math-0501 are also RVs.
  • Consider an independent sequence of RVs c05-math-0502. Define, for all c05-math-0503,
    equation

    Then c05-math-0505, c05-math-0506, c05-math-0507, and c05-math-0508 are all RVs.

  • If c05-math-0509 converges as c05-math-0510 for every c05-math-0511, then c05-math-0512 is also an RV.

An RV is better understood through important summaries such as the mean and variance. We next define the expectation of an RV, which further helps in obtaining these summaries.

5.7.2 Expectation of Random Variables

As an introduction, the expectation of a discrete RV X is defined by E(X) = Σ_x x P(X = x), and for a continuous RV by E(X) = ∫ x f(x) dx, where f is the probability density function. In light of the three types of RV discussed earlier, we need to now define the expectation of an RV for simple, elementary, and extended RVs defined in a probability space. For the moment, we will assume that the extended RVs are non-negative RVs.
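Before moving to the measure-theoretic construction, the two introductory definitions are easily evaluated in R; the binomial and exponential choices below are merely illustrative.

> sum(0:10*dbinom(0:10, size = 10, prob = 0.3))              # E(X) for X ~ B(10, 0.3), equals 3
> integrate(function(x) x*dexp(x, rate = 2), 0, Inf)$value   # E(X) for X ~ Exp(2), equals 0.5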

The expectations of a simple RV and an elementary RV, as defined respectively in Equations 5.10 and 5.11, are given by

5.14   E(X) = Σ_i a_i P(A_i),

where the sum runs over the finite, respectively countably infinite, partition.

The expectation of a non-negative random variable X is defined by

5.15   E(X) = lim_{n→∞} E(X_n),

where {X_n} is an increasing sequence of simple RVs converging to X. It may be noted that the limit can be +∞ too. The expectation of an arbitrary (extended) RV, as given in Equation 5.12, is defined by

5.16   E(X) = E(X^+) − E(X^−),

provided that at least one of E(X^+) and E(X^−) is finite. If both E(X^+) and E(X^−) are infinite, the expectation of the RV is not defined.

Let us now consider some properties of expectations of RVs.

Properties of Expectation of Random Variables

  • If c05-math-0560 and c05-math-0561 are two partitions of a simple RV c05-math-0562 such that
    equation

    then

    equation
  • If c05-math-0565 is a non-negative and simple RV, then c05-math-0566. This property continues to hold for non-negative RVs too.
  • If c05-math-0567 and c05-math-0568 are two non-negative (simple or otherwise) RVs, and c05-math-0569, then c05-math-0570. This property is also known as the linearity of expectations.

The third counter-intuitive problem from Diaconis and Holmes (2002) will be discussed next.

Higher-order expectations are of interest for many RVs.

The forthcoming section will consider three important functions related to an RV.


5.8 Distribution Function, Characteristic Function, and Moment Generation Function

Let c05-math-0600 be an c05-math-0601-measurable random variable in the probability space c05-math-0602.

Properties of the cdf

  a. The cdf F is a non-decreasing function, that is, F(x_1) ≤ F(x_2) whenever x_1 ≤ x_2.
  b. The cdf F is right-continuous, that is, lim_{h↓0} F(x + h) = F(x) for every x.
  c. F(x) → 0 as x → −∞.
  d. F(x) → 1 as x → +∞.
  e. The set of discontinuity points of F is at most countable.
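The first four properties are readily visible for, say, a Binomial(5, 0.5) RV, whose cdf is available as pbinom; a small sketch follows.

> Fcdf <- function(x) pbinom(x, size = 5, prob = 0.5)
> Fcdf(c(-10, 0, 1, 2, 5, 10))              # non-decreasing, and tends to 0 and 1 in the tails
> Fcdf(2 - 1e-8); Fcdf(2); Fcdf(2 + 1e-8)   # a jump at x = 2, yet right-continuous there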

A related function is now defined.

We list below some properties of the mgf, and the reader can refer to Gut (2007) for details.

Properties of the mgf

  a. If the mgf M_X(t) is finite (in the above sense), the k-th moment of X is the k-th derivative (with respect to t) of the mgf evaluated at t = 0.
  b. Define Y = a + bX, where a and b are real constants and the mgf of X is finite. Then M_Y(t) = e^{at} M_X(bt).
  c. A finite mgf determines the distribution of the RV uniquely.
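Property (a) can be illustrated numerically; for a standard normal RV the mgf is known to be exp(t^2/2), and finite-difference derivatives at t = 0 recover the first two moments. This is only a sketch, with the step size h chosen arbitrarily.

> mgf <- function(t) exp(t^2/2)              # mgf of the standard normal
> h <- 1e-3
> (mgf(h) - mgf(-h))/(2*h)                   # approximately E(X) = 0
> (mgf(h) - 2*mgf(0) + mgf(-h))/h^2          # approximately E(X^2) = 1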

The mgf does not exist for many important RVs, and thus we look at a function which will always exist.

The cf always exists for any random variable.

Properties of the cf

  a. φ_X(0) = 1, and |φ_X(t)| ≤ 1 for all t.
  b. φ_X(t) is uniformly continuous on the real line.
  c. c05-math-0637.

The concepts of cdf, mgf, and cf will be illustrated in more detail in Sections 6.2–6.4.


5.9 Inequalities

The first inequality which comes to mind is the triangle inequality. It is claimed that this inequality has more than 200 different proofs! In probability theory, inequalities are useful when we do not have enough information about the distribution of the random variables. We will face practical scenarios where we may not know anything about the experiment beyond the fact that the random variables are independent, together with perhaps their means and standard deviations. We will now see the role of probabilistic inequalities.

5.9.1 The Markov Inequality

5.9.2 The Jensen's Inequality

Jensen's inequality is a very useful technique, especially when we deal with the theoretical side of the subject.

5.9.3 The Chebyshev Inequality

Consider a random variable X with mean μ and finite variance σ². Chebyshev's inequality states that

5.18   P(|X − μ| ≥ kσ) ≤ 1/k²,   k > 0.

This inequality has a very useful interpretation: "Not more than 1/k² of the distribution's values can be more than k standard deviations away from the mean." Note that the inequality does not make any assumptions about the probability distribution of X.

A one-tailed version of Chebyshev's inequality is

P(X − μ ≥ kσ) ≤ 1/(1 + k²),   k > 0.
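As a small numerical illustration (an arbitrary choice of distribution), take X ~ Exp(1), for which μ = σ = 1 and P(|X − μ| ≥ kσ) = exp(−(1 + k)) for k ≥ 1; the Chebyshev bound 1/k² is seen to be conservative but valid.

> k <- 2:5
> exact <- exp(-(1 + k))     # P(|X - 1| >= k) for X ~ Exp(1), k >= 1
> rbind(k = k, exact = exact, chebyshev_bound = 1/k^2)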

The forthcoming section will deal with various convergence concepts of RVs.

5.10 Convergence of Random Variables

The concept of convergence is very strongly built on the c05-math-0679 argument. As Jiang (2010) emphasizes, the convergence concept embeds in itself the c05-math-0680 argument. Jiang refers to this tool as the A-B-C of large sample theory, and we will first discuss this in Example 1.1.

In the rest of this section, we attempt to gain insight using R programs and also to establish the results analytically.

5.10.1 Convergence in Distributions

Consider a sequence of probability spaces c05-math-0707, and let the random variable sequence c05-math-0708 be such that for each c05-math-0709, c05-math-0710 is c05-math-0711- measurable and c05-math-0712. Furthermore, let c05-math-0713 be the cumulative distribution function associated with c05-math-0714. Also, let c05-math-0715 be another probability space and we consider an c05-math-0716-measurable random variable c05-math-0717, whose c05-math-0718 associated cumulative probability distribution function is c05-math-0719. Then we say that the sequence c05-math-0720 converges in distribution to c05-math-0721 if as c05-math-0722

5.19   F_n(x) → F(x), at every point x at which F is continuous.

The standard notation for convergence in distribution is c05-math-0724 or c05-math-0725. Sometimes convergence in distribution is also denoted by c05-math-0726.
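A small simulation-free sketch of this definition (the particular sequence is chosen only for convenience): let M_n be the maximum of n iid U(0,1) variables and put X_n = n(1 − M_n); then F_n(x) = 1 − (1 − x/n)^n for 0 ≤ x ≤ n, which converges to the Exp(1) cdf.

> Fn <- function(x, n) 1 - (1 - x/n)^n          # cdf of X_n = n(1 - M_n) on [0, n]
> x <- c(0.5, 1, 2)
> sapply(c(5, 50, 500, 5000), function(n) Fn(x, n))
> pexp(x)                                       # the limiting cdf values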

We next state an important result, see page 259 of Rohatgi and Saleh (2000).

The above theorem is now illustrated through an example.

5.10.2 Convergence in Probability

Consider a probability space c05-math-0761 and let c05-math-0762 be a sequence of random variables defined in this space. We say that the sequence c05-math-0763 converges in probability to a random variable c05-math-0764 if for each c05-math-0765

5.21   P(|X_n − X| > ε) → 0, as n → ∞,
5.22   or, equivalently, P(|X_n − X| ≤ ε) → 1, as n → ∞.

Symbolically, we denote this as c05-math-0768.

Convergence in probability implies convergence in distribution. However, the converse is not true.
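As an exact, simulation-free sketch, let the X_i be iid Bernoulli(0.5) and consider the sample mean; the probability that it deviates from 0.5 by more than 0.05 can be computed from the binomial cdf and is seen to decrease to zero.

> p <- 0.5; eps <- 0.05
> P_out <- function(n) {
+   # P(S_n < n(p - eps)) + P(S_n > n(p + eps)), with S_n ~ binomial(n, p)
+   pbinom(ceiling(n*(p - eps)) - 1, n, p) + 1 - pbinom(floor(n*(p + eps)), n, p)
+ }
> sapply(c(10, 100, 1000, 10000), P_out)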

5.10.3 Convergence in c05-math-0776 Mean

We begin with a probability space c05-math-0777 and let {X_n} be a sequence of random variables defined in it. We say that the sequence {X_n} converges in the r-th mean to an RV X if E|X_n|^r < ∞ for all n and, as n → ∞,

5.23   E|X_n − X|^r → 0.

5.10.4 Almost Sure Convergence

Let {X_n} be a sequence of random variables on the probability space c05-math-0799. The sequence {X_n} is said to converge almost surely to an RV X if

P( lim_{n→∞} X_n = X ) = 1.

The next result is very useful in establishing almost sure convergence. Almost sure convergence is denoted by X_n →_{a.s.} X.


5.11 The Law of Large Numbers

The "large" in the Law of Large Numbers, abbreviated as LLN, is a pointer that the number of observations is very large. This clarification is issued in the interest of the student community, as we felt that quite a few of them have confused the "large" with the magnitude of the observations themselves.

The LLN has two very important variants: (i) the Weak Law of Large Numbers, abbreviated as WLLN, and (ii) the Strong Law of Large Numbers, SLLN. The former is a convergence in probability criterion and the latter an almost sure convergence criterion. Formal statements are given here. We discuss the concepts with a few analytical illustrations. For an understanding of these concepts through simulation, we refer the reader to Section 6 of Chapter 12.

5.11.1 The Weak Law of Large Numbers

Let {X_n} be a sequence of random variables and define S_n = X_1 + X_2 + ... + X_n. The sequence {X_n} is said to obey the weak law of large numbers, WLLN, with respect to a sequence of constants {B_n}, B_n > 0, B_n ↑ ∞, if there exists a sequence of constants {A_n} which satisfies

(S_n − A_n)/B_n →_P 0, as n → ∞.

The sequence {A_n} is called the centering sequence and the sequence {B_n} is called the norming sequence. We state below a result which helps to determine whether the WLLN holds true for the sequence under consideration.

5.12 The Central Limit Theorem

5.12.1 The de Moivre-Laplace Central Limit Theorem

A complete de-mystification of the de Moivre-Laplace CLT appears in Ramasubramaniam (1997). Suppose that X_n ~ B(n, p). Let q = 1 − p. Define

5.24   Z_n = (X_n − np) / √(npq).

We will begin with a statement of this result.

Typically, the CLT is demonstrated using a simulation study. Though that approach is not wrong, we would like to make a few points here. In a simulation study we actually use the realized values of the RVs, as in x_1, x_2, ..., x_n. However, the CLT is truly about the convergence of the RVs themselves, as in X_1, X_2, ..., X_n. The principal point is that we cannot pretend that the realized values are the same as the random variables. Thus, our illustration, at least in this section, will not resort to a simulation study.

We give an animated version of the convergence in the following program.

> n <- 10:1000
> p <- 0.4
> for(i in 1:length(n)){
+     # exact binomial probabilities for the current value of n
+     plot(0:n[i],dbinom(0:n[i],size=n[i],prob=p),type="h",
+     xaxt="n",yaxt="n",xlab="x",ylab="PDF")
+     title("The de Moivre-Laplace Central Limit Theorem")
+     # overlay the approximating normal density
+     curve(dnorm(x,mean=n[i]*p,sd=sqrt(n[i]*p*(1-p))),
+     from=0,to=n[i],add=TRUE)
+ }

Change the value of p to different levels and enjoy the convergence. Consider the general iid case next.

5.12.2 CLT for iid Case

Let X_1, X_2, ... be iid random variables with finite first and second moments. Define S_n = X_1 + X_2 + ... + X_n, and let μ and σ² denote the common mean and variance of the X_i's. The general CLT for the iid case is stated below.

Two equivalent forms of the CLT statement are that, as n → ∞,

(S_n − nμ)/(σ√n) →_d N(0, 1)   and   √n(X̄_n − μ)/σ →_d N(0, 1),   where X̄_n = S_n/n.

This statement appears very generic and we consider an illustration with varied distributions.
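One simulation-free check, offered as a sketch, uses iid Exp(1) variables, for which S_n has exactly a Gamma(n, 1) distribution; the standardized cdf values approach those of the standard normal (the grid of z values is arbitrary).

> z <- c(-2, -1, 0, 1, 2)
> clt_cdf <- function(n) pgamma(n + z*sqrt(n), shape = n, rate = 1)  # P((S_n - n)/sqrt(n) <= z)
> sapply(c(5, 50, 500), clt_cdf)
> pnorm(z)                                                           # the limiting values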

The more general case of a sequence of independent RVs is considered next.

5.12.3 The Lindeberg-Feller CLT

The previous theorem along with Equation 5.26 handles a sequence of iid RVs. An extension of this result for the more generic case of a sequence of independent RVs is then required. The Lindeberg-Feller theorem gives a set of necessary and sufficient conditions which help establish CLT for a sequence of independent RVs. A few notations are in order towards this end.

Let c05-math-0886 be a sequence of RVs, and let their respective probability spaces be defined by the sequence c05-math-0887. In the case of the iid sequence, we have c05-math-0888. Assume that c05-math-0889, and c05-math-0890. Define c05-math-0891, c05-math-0892, and c05-math-0893. It may be noted by the reader here that c05-math-0894 does not correspond to the realized value of the sum c05-math-0895. The next theorem is the famous result sought in this section.

From a programming perspective, it is better to first consider the simpler case of a sequence of iid RVs for verifying the Feller condition 5.27 and also the Lindeberg condition 5.28. It will become clearer why the simpler case has been considered first.
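Following that suggestion, a sketch for the iid Exp(1) case is given below: here the means are 1, the variances are 1, s_n² = n, and the Lindeberg sum reduces to n times a single truncated second moment, which is evaluated by numerical integration and is seen to decay to zero, as the Lindeberg condition requires.

> lindeberg <- function(n, eps) {
+   mu <- 1; s_n <- sqrt(n)
+   f <- function(x) (x - mu)^2*dexp(x, rate = 1)
+   upper <- integrate(f, mu + eps*s_n, Inf)$value                      # tail above mu + eps*s_n
+   lower <- if(mu - eps*s_n > 0) integrate(f, 0, mu - eps*s_n)$value else 0
+   n*(upper + lower)/s_n^2          # (1/s_n^2) times the sum of n identical terms
+ }
> sapply(c(10, 100, 1000, 10000), lindeberg, eps = 0.1)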

A stronger, but more restrictive, condition can be imposed, which can be used for examining the CLT for a sequence of independent RVs.

5.12.4 The Liapounov CLT

Liapounov's CLT is given in the next theorem.

Since Liapounov's condition 5.29 requires a higher-order moment condition, it is in general more difficult to establish whether it holds for a given sequence of RVs. However, it is sometimes very useful for a sequence of discrete RVs.


5.13 Further Reading

A futile exercise is being undertaken now as we promise the reader a complete bibliography on the sources of probability! With the possibility, that is positive probabilities, of some abuse, we have further classified the sources into different subsections.

5.13.1 Intuitive, Elementary, and First Course Source

Chung and AitSahlia (2004) is an appealing beginner's starting point. Chung and AitSahlia (2004) is a fourth edition enhancement of Chung (1979), which has been an excellent introduction since its first print. A higher secondary school introduction to probability has been written by two eminent Russian probabilists Gnedenko and Khinchin (1964). Ash (1969), Chandra and Chatterjee (2001), Gnedenko (1978), Durrett (2009), and Ross (2010) are some of the competitive first course texts on probability theory.

As far as intuition is concerned, one of the unrivalled works in the literature ever is the two volumes of Feller (1968 and 1971). A testimony and tribute to this fact is that almost any decent work on probability theory will cite Feller.

5.13.2 The Classics and Second Course Source

Feller's (1968 and 1971) two volumes are intuitive, classic, advanced courses, and almost everything else too. The measure-theoretic approach was first detailed in Kolmogorov (1933). This short book served as a cornerstone for the way probability would be written from that point onward. It is generally agreed that Loève (1955) was the first comprehensive take on the measure-theoretic approach to probability, and the last edition of his work appeared in two volumes, Loève (1977). Chung has been another famous probability researcher who has written some of the best probability books, Chung (2000) among them.

Ash and Doléans-Dade (2000), Shiryaev (1995), Chow and Teicher (1995), Athreya and Lahiri (2005), Durrett (2010), and Breiman (1962, 1992), among others, are some of the excellent measure theoretic approaches to probability. Parthasarathy (1978) and Billingsley (1995) have also stood the test of time and are still a favorite of many probability readers. In most of the books mentioned in this paragraph, the first editions have appeared a couple of decades earlier than their most recent editions that we have listed here.

Rosenthal (2006) and Kallenberg (2002) are two of the modern texts on measure-theoretic probability. In the Indian subcontinent, students and teachers have benefited from Bhat (2012) and Basu (1998), and both books make for a great read.

5.13.3 The Problem Books

If the intimidated people could have their way, we are sure that most of the authors mentioned in this subsection would be facing capital punishment! After all, the offense of these authors is not any less. Mosteller (1962), Grimmett and Stirzaker (2001), Cacoullos (1989), Schwarz (2007), Capiński and Zastawniak (2001), Nahin (2008), Sveshnikov (1968), and Chaumont and Yor (2003) are all guilty of the same offense. They batter the readers with a never-ending sequence of problems and thereby offer a proof that this is the way to understand infinity! To be fair to these authors, some of them have been really nice! They clearly mention that their book has only 40, 50, or 100 problems, in contrast to some others, who blatantly emphasize that the reader must solve 1000 problems!

5.13.4 Other Useful Sources

Johnson and Kotz (1969–73) have written a four-volume book on the distributions that arise in probability and statistics. An update to this work has appeared in the late 1990s by Johnson and Kotz and co-workers. DasGupta (2011) and DasGupta (2010) cover a lot of topics and modern advancements. Dworsky (2008) offers a different view and makes a smart read. Stoyanov (1997) is an entirely different type of book, with its focus mainly on counter examples.

5.13.5 R for Probability

Prof Jay Kerns' open source book, Kerns (2010), has been an influence on the first few sections of this chapter. Horgan (2008) and Baclawski (2008) are two introductory books that demonstrate many aspects of probability that can be understood through R.

5.14 Complements, Problems, and Programs

  1. Problem 5.1 Consider the three sets from c05-math-0963 LETTERS: c05-math-0964, c05-math-0965, and c05-math-0966. Using the operators intersect and union, for the sets c05-math-0967, c05-math-0968, and c05-math-0969, verify the following:

     1. c05-math-0970 and c05-math-0971.
     2. c05-math-0972 and c05-math-0973.
     3. Using the sample function, verify (i) and (ii) for arbitrary sets c05-math-0974, c05-math-0975, and c05-math-0976.
  2. Problem 5.2 The R code tosscoin(times=3) returns an object of the data.frame class. However, a probabilist is more at home when the sample space is neatly written out as c05-math-0977 or c05-math-0978. Use the paste function to convert the data.frame object into c05-math-0979 elements such as c05-math-0980, and so on, for any number of coin tosses.

  3. Problem 5.3 The sample space of die rolling becomes very large, depending on the number of times we roll the die and on the number of sides of the die. Write an R program, using the rolldie function from the prob package and preferably in a single line, which returns the total number of possible outcomes when (i) a die is rolled 1 to 6 times, and (ii) the number of sides of the die varies from 3 to 10. An indicative syntax-based solution is along the lines of sapply(sapply(option2,rolldie,option1),nrow). Justify the use of sapply and the options.

  4. Problem 5.4 Find out more details about the Roulette game and make a preliminary finding about it in the function roulette.

  5. Problem 5.5 Run the codes names(table(rowSums(S_Die))) and table(rowSums(S_Die)) from Example 5.2.7 and verify that you have completely understood the example's code. Now, roll four dice and find the probability that the sum is an odd number greater than ten.

  6. Problem 5.6 For the thirteenth of a month problem, start with an arbitrary year, say 1857, and then run the program up to year 2256. Do you expect that the 13th will more likely be a Friday than any other day? Confirm your intuition with the R program.

  7. Problem 5.7 In Example 5.3.3, the digits are drawn to solve a replacement problem. Obtain the probability of obtaining at least two even numbers in a draw of five using the leading digits of c05-math-0981.

  8. Problem 5.8 What is the number of people whose birthday you need to ask so that the probability of finding a birthday mate is at least half? Write a brief R program to obtain the size as the probability varies from 0 to 1.

  9. Problem 5.9 Construct a program which can conclude if the collection of sets over a finite probability space is a field.

  10. Problem 5.10 Extend the program in the previous problem to verify if probabilities defined over an arbitrary collection of finite sets satisfy the requirements of being a probability measure.

  11. Problem 5.11 Explore if the addrv function from the prob package can be used to handle more than two variables.

  12. Problem 5.12 * For small c05-math-0982 values and c05-math-0983 around 0, write a program to obtain the expectation of a normal RV, which incorporates the expectation of an RV for an arbitrary RV, as given in Equation 5.16.

  13. Problem 5.13 Extend the R function Expectation_NNRV_Unif for computing the expectation of a uniform RV over the interval c05-math-0984.

  14. Problem 5.14 Evaluate the R program of de Moivre-Laplace CLT for different values of c05-math-0985.

  15. Problem 5.15 Using the normal approximation, CLT result, for the triangular distribution for various values of c05-math-0986, c05-math-0987, and c05-math-0988, create an R program for evaluating c05-math-0989.

  16. Problem 5.16 * Using the theoretical moments of the normal distribution, verify if Liapounov's condition holds for the sequence of RVs developed in Example 5.12.4.

  17. Problem 5.17 Let c05-math-0990 follow a Poisson distribution c05-math-0991. Verify if the Feller condition holds for this sequence. If the Feller condition is satisfied, verify the Liapounov condition as well.

  18. Problem 5.18 For an exponential RV with rate c05-math-0992, the mean and variance are respectively known to be c05-math-0993 and c05-math-0994. Suppose c05-math-0995 follows an exponential distribution with rate c05-math-0996. Does the R program indicate that the Lindeberg condition will be satisfied for the defined sequence?

  19. Problem 5.19 If the rate of exponential distribution for c05-math-0997 is c05-math-0998, verify the Lindeberg and Feller condition for the sequence under consideration.
