The characterization of a random process requires an understanding of random variable theory, and this has its basis in probability theory. This chapter provides an overview of the key concepts from probability theory that underpin random variable theory. This is followed by an introduction to random variable theory and an overview of discrete and continuous random variables. The concept of expectation of a random variable is fundamental and a brief discussion is included. As part of this discussion, the characteristic function of a random variable is defined. Random variable theory is extended by the generalization to pairs of random variables and a vector of random variables. Associated concepts for this case include the joint probability mass and density functions, marginal distribution and density functions, conditional probability mass and density functions, and covariance and correlation functions. The chapter concludes with a discussion of Stirling’s formula and the important DeMoivre–Laplace and Poisson approximations to the binomial probability mass function. Useful references for probability and random variable theory include Grimmett and Stirzaker (1992) and Loeve (1977).
The following definitions are fundamental and define the basic concepts that underpin probability and random process theory.
The terminology P[A] is used to denote the probability of the event A.
By definition, S is the set of possible experimental outcomes defined by an experiment, and each experimental outcome has a defined probability. The following two definitions are fundamental to the analysis related to random variables and apply, respectively, to the cases where the sample space comprises a countable and an uncountable number of outcomes.
The probability operator is defined on subsets of the sample space S. For the case of a countable number of outcomes, the sample space can be defined as
and P[{ω1}], P[{ω2}], etc. are well defined. For notational convenience, it is usual to write P[ω1] rather than P[{ω1}].
Consider an experiment that is repeated N times and performed under conditions denoted hypothesis H. Assume:
The relative frequency of the event is NAB/N and can be written as
As N → ∞, and with convergent ratios, it is reasonable to infer and define
where P[A/(B and H)] is the probability of A assuming conditions, as specified by B and H, have occurred. As the conditions specified by H are common to all probabilities, it is usual to omit them and to write
It is common to use the notation P[AB] for P[A ∩ B].
Bayes’ theorem finds widespread application.
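In its most commonly used form, and with the events B1, …, BN assumed to form a partition of the sample space (the notation here is illustrative rather than that of a specific equation in the text), Bayes’ theorem states that, for P[A] > 0,

P[Bj/A] = P[A/Bj] P[Bj] / (P[A/B1] P[B1] + ⋯ + P[A/BN] P[BN]),

so that observation of the event A updates the probabilities assigned to the events B1, …, BN.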
It is common for experiments to comprise unrelated stages or parts. Such stages, or parts, are independent and result in independence being an outcome of the structure of the experiment. For example, consider an experiment of throwing a die twice. The outcome of the second throw is independent of the outcome of the first throw.
Alternatively, consider a two-part experiment where the outcomes of the second part are independent of the first part. If A denotes an event based solely on the first part, and B denotes an event based solely on the second part, then P[B/A] = P[B].
Thus, knowing that A has occurred in the first part of the experiment does not affect the probability of the event B occurring in the second part of the experiment. Similarly, P[A/B] = P[A]. The conditional probability result P[A ∩ B] = P[B/A] P[A]
then implies P[A ∩ B] = P[A] P[B].
Because of its importance, the following is stated as being axiomatic.
The nature of the experiment determines the nature of the sample space. The following distinctions readily arise:
With an infinite number of outcomes, the probability of an individual point outcome (an outcome consistent with a set with zero measure) is zero. Nonzero probabilities are associated with intervals or collections of outcomes with nonzero length or measure. For example, for an outcome that is equally likely to lie anywhere in the interval [0, 1], the probability of any single point is zero, whereas the probability of the outcome lying in the subinterval [0.2, 0.4] is 0.2.
The outcome of an experiment, in general, is not a number and, for example, the sample space could consist of non-numerical outcomes such as the head and tail outcomes of a coin toss. For such cases, it is useful to associate a number with each outcome, for example, 1 with head and 0 with tail, and such an association defines a random variable.
For the specific case where the random variable associates a distinct value X(ω) with each outcome ω in the sample space, the probability of the outcome X(ω) equals the probability of the outcome ω. This case is illustrated in Figure 4.2. Mathematically,
In general, a random variable X assigns the same number to two or more experimental outcomes as illustrated in Figure 4.3. For the situation illustrated in Figure 4.3, it is the case that
It is clear that the probability of occurrence of a specific outcome for a random variable is dependent on the probability mass function, pΩ, or probability density function, fΩ, associated with the experimental outcomes.
Consider a random variable X:
For the case of a 1 : 1 structural mapping between experimental outcomes and numerical values of the associated random variable, the sample spaces, as detailed in Table 4.1, are indicative of those that arise.
Table 4.1 Forms for the sample space of a random variable
Sample space S | Sample space SX
{ω1, ω2, …} | {X(ω1), X(ω2), …}
For the case where the experimental outcomes are numerical values, a random variable can be defined based directly on the distinct experimental outcomes according to X(ω) = ω. For this case, the sample space SX equals S,
and similarly for cases where the experiment yields a vector or matrix of numerical values for each trial.
For this case, the probability mass function, or density function, of the random variable X is identical to that of the probability mass function, pΩ, or the probability density function, fΩ, associated with the experiment.
Random variables can be classified as being discrete, continuous, or a mixture of continuous and discrete. The former two types are the most common and dominate random variable theory.
For the case where the random variable X defines the numbers {x1, …, xi, …}, the following properties hold:
If, in n trials of an experiment, the number of outcomes of a random variable X that fall in the interval (x, x + Δx] equals nx, then P[x < X ≤ x + Δx] ≈ nx/n for n suitably large.
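As an illustration of this relative frequency interpretation, the following sketch (a minimal example; the uniform random variable and the interval [0.2, 0.4] are assumptions made purely for illustration) estimates a probability as the fraction of n simulated trials whose outcomes fall in a specified interval.

```python
import random

def relative_frequency(n_trials, lo, hi):
    """Estimate P[lo <= X <= hi] for X uniform on [0, 1] via the relative frequency n_x / n."""
    n_x = sum(1 for _ in range(n_trials) if lo <= random.random() <= hi)
    return n_x / n_trials

# For n suitably large, the relative frequency approaches the true probability (0.2 here).
print(relative_frequency(100_000, 0.2, 0.4))
```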
The cumulative distribution function specifies the probability that a random variable takes on a value in the interval (−∞, x].
The following properties hold for the cumulative distribution function:
For the continuous random variable case, FX is a continuous function (this follows from the assumption that fX has at most a finite number of discontinuities and is not impulsive for any values in its domain) and

FX(x) = ∫_{−∞}^{x} fX(λ) dλ,  fX(x) = dFX(x)/dx.
The latter result holds at all points where FX is differentiable.
The number of commonly used random variables is large. The discrete uniform, the Bernoulli, the Binomial, the Poisson, and the geometric are common discrete random variables. Common continuous random variables include the uniform, the Gaussian, the exponential, and the Rayleigh. The random variables that are used in subsequent sections, and chapters, are detailed in the reference section at the end of the book.
A random variable X associates a number with each experimental outcome ω of an experiment, and the values X takes on define the sample space, SX, for the random variable. If there is a function, g, which maps each number in the sample space of the random variable X to a new subset of the real line, denoted SY, then a new random variable Y has been defined according to Y = g(X). These mappings are illustrated in Figure 4.5.
Consistent with the illustration in Figure 4.5, the random variable Y could have been defined directly on the set S of experimental outcomes. However, in many instances, the outcomes of a random variable, X, are observed, and then algebraic operations on the outcomes are undertaken consistent with a mapping as defined by a function g. Accordingly, it is natural to consider functions of a random variable.
The expectation operator allows the precise definition of various statistical properties of a random variable including the mean and variance.
The mean and variance are, respectively, first- and second-order measures of the nature of a random variable.
To simplify notation in the aforementioned definitions, the limits of −∞ to ∞ are often used for the summation, while the limits of −∞ and ∞ are often used for the integral.
The characteristic function of a random variable facilitates analysis, for example, when determining the probability density function of a summation of independent and identically distributed random variables.
Consider the random sum of Gaussian random variables defined according to
where M is a discrete random variable taking on the positive values m1, m2, … with probabilities p1, p2, …; X1, X2, … is a sequence of independent Gaussian random variables; and the probability density function of Xi is
The important result, which is a special case of the general result stated in Theorem 4.11, then follows.
Consider the random sum of independent Gaussian random variables
where , , , , , , , and . Consistent with Theorem 4.10, the probability density function is
where , , , and . This probability density function is shown in Figure 4.6.
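A numerical sketch of this construction follows: conditioned on M = m, the sum Z is Gaussian with mean mμ and variance mσ², so the unconditional density is the probability-weighted mixture of these conditional densities. The parameter values below (two possible values for M and zero-mean, unit-variance components) are illustrative assumptions rather than the values used in Figure 4.6, and M is assumed independent of the Xi.

```python
import math

def gaussian_pdf(x, mean, var):
    """Gaussian probability density function with the stated mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def random_sum_pdf(z, m_values, m_probs, mu, var):
    """Density of Z = X1 + ... + XM for i.i.d. N(mu, var) components, where M takes the
    value m_values[i] with probability m_probs[i] and is independent of the components.
    Conditioned on M = m, Z is N(m * mu, m * var); the unconditional density is the
    probability-weighted mixture of these conditional densities."""
    return sum(p * gaussian_pdf(z, m * mu, m * var)
               for m, p in zip(m_values, m_probs))

# Illustrative values: M equals 1 with probability 0.5 and 4 with probability 0.5.
print(random_sum_pdf(0.0, [1, 4], [0.5, 0.5], mu=0.0, var=1.0))
```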
Modern mathematical software packages can generate random data consistent with many standard distributions. For non-standard distributions, a method is required to generate data consistent with a defined probability density function. The usual starting point is to generate a number, uo, at random from the interval [0,1].
Consider a random variable with a probability density function, and a cumulative distribution function, defined according to
The required transformation is based on the inverse of the cumulative distribution function: if U is drawn uniformly from [0, 1], then X = FX⁻¹(U) has the specified distribution.
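The following sketch illustrates this inverse transform approach; the target density fX(x) = 2x on [0, 1], with FX(x) = x² and inverse FX⁻¹(u) = √u, is an assumed example rather than the density considered in the text.

```python
import math
import random

def sample_from_density():
    """Generate one outcome of X with density f_X(x) = 2x on [0, 1] (assumed example)
    via the inverse transform method: X = F_X^{-1}(U) with F_X(x) = x**2."""
    u = random.random()   # a number drawn at random from the interval [0, 1]
    return math.sqrt(u)   # inverse of the cumulative distribution function

print([sample_from_density() for _ in range(5)])
```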
The experiments, whether real or hypothesized, that underpin random phenomena lead to a variety of sample spaces S:
The ith trial of an experiment will result, depending on the nature of the random phenomenon being modelled, in a single outcome ωi, a vector of outcomes, a matrix of outcomes, etc. A vector of outcomes may arise from N repeated trials of a subexperiment; a matrix of outcomes may arise from N trials of a subexperiment where each subexperiment is a result of M sub-subexperiments. The following are typical forms for sample spaces:
When a number xi, a vector of numbers, or a matrix of numbers, as appropriate, is associated with each experimental outcome, a random variable is defined in the usual manner. The corresponding sample spaces of the random variables are
For the case where the experimental outcomes are numerical values, a random variable can be defined based directly on the experimental outcomes according to X(ω) = ω. For this case, the sample space SX equals S. For such a case, it is usual to use the probability mass function, pΩ, or the probability density function, fΩ, of the experimental outcomes and not the corresponding function, pX or fX, of the random variable X.
Consider a random variable X whose sample space SX comprises vectors in N-dimensional space. For this case, the random variable can be interpreted as a vector random variable.
Often, the values defined by a vector random variable are utilized directly, and the experiment and sample space for the random phenomena underpinning the vector random variable are left implicit. For this case, the sample space SX of the random vector is defined directly and in the following manner:
The probability associated with a specific outcome xi is usually clear from the context of this definition.
To establish the properties of multiple random variables, the starting point is to establish the properties for the vector random variable case of two dimensions, that is, properties for the case of (X1, X2). Of importance is the generalization of the ways of characterizing a random variable for the one-dimensional case: the generalization of the cumulative distribution function, the probability mass function, and the probability density function.
For notational convenience, the vector pair of random variables (X1, X2) is written as (X, Y), and the notation Z = (X, Y) is used.
The sample space for Z = (X, Y) has the following general forms:
where the set RXY consists of a countable number of ordered pairs for the discrete random variable case and an uncountable number of ordered pairs for the continuous case.
Consider a pair of random variables defined on a sample space S. The generalization of the cumulative distribution function for the one-dimensional case is the joint cumulative distribution function.
The cumulative distribution function has the following properties:
For the joint random variable case, the individual probability mass functions are still of interest, and it is useful to be able to determine these from the joint probability mass function.
For the case of two continuous random variables (X, Y) and consistent with the one-dimensional case, the probability of a precise value, X = x and Y = y, equals zero for all values of x and y. Nonzero probabilities are only associated with intervals or sets. The joint probability density function of two continuous random variables (X, Y) can be defined in an analogous manner to the one-dimensional case.
If X and Y are continuous random variables with a joint probability density function, fXY, and a differentiable joint cumulative distribution function FXY, then
The following properties hold for the joint probability density function:
For any subset A of the two-dimensional plane, it is the case that
Consider a pair of continuous random variables (X, Y). The marginal cumulative distribution functions, FX and FY, can be determined from the joint cumulative distribution function according to

FX(x) = lim_{y→∞} FXY(x, y),  FY(y) = lim_{x→∞} FXY(x, y).

The marginal probability density functions, fX and fY, can be determined from the joint probability density function according to

fX(x) = ∫_{−∞}^{∞} fXY(x, y) dy,  fY(y) = ∫_{−∞}^{∞} fXY(x, y) dx.
The definition of the joint and marginal probability mass function, and the joint and marginal probability density function, allows the important linearity property of the expectation operator to be proved.
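In its usual statement, this linearity property is that, for random variables X and Y and constants a and b,

E[aX + bY] = aE[X] + bE[Y],

irrespective of whether X and Y are independent.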
In general, the random variable Z = (X, Y) is such that the random variables X and Y are not independent, that is, the occurrence of a given outcome of X contains information about the occurrence, or non-occurrence, of outcomes of Y, and vice versa.
Consider a communication channel where one trial of an experiment is to send one bit of information, denoted x, and to recover this information, denoted y, at a receiver. The outcome of the trial is the ordered pair (x, y) where x, y ∈ {0, 1} for binary data communication. For a good communication channel, it is expected that the received value y = 1 is consistent with the sent value x = 1 and that y = 0 is consistent with x = 0, that is, reliable communication is consistent with the output data containing information about the sent, or input, data.
Characterization of the dependence between two random variables is important and conditional probability mass functions, and conditional probability density functions, are widely used. Understanding these functions is facilitated by a discussion of conditioning.
Consider Z = (X, Y), where the sample space of Z is defined by Equation 4.109, and consider events that lead to subsets of this sample space. It follows from the conditional probability result P[A/B] = P[A ∩ B]/P[B] that
The following notation is used:
Consider the case of a random variable Z = (X, Y) which defines the sample space
and where the probabilities of the outcomes are defined in Figure 4.9.
Consider the case of sets A1, A2, and B, and the goal of determining PZ[A1/B] and PZ[A2/B]. First,
and, as the set B consists of elementary events, the probability of B is
It then follows from Equation 4.110 that
Consider the random variable Z = (X, Y), with the sample space defined by Equation 4.109. The two component random variables X and Y are mappings:
The intersection of the events associated with specific outcomes of X and Y defines a subset of the sample space:
The probability of occurrence of the joint events is
From conditional probability theory,
and the following definition can be made.
For and with the sample space defined by Equation 4.109, the joint conditional probability mass function can be defined for the case where X and Y are discrete random variables.
For and with the sample space defined by Equation 4.109, the joint conditional probability density function can be defined for the case where X and Y are continuous random variables.
Consider the determination of the conditional joint probability density function, fXY/A, for the case of :
where fXY is defined according to
First, Figure 4.10 illustrates the possible outcomes of the random variable Z and the outcomes consistent with the event A. Second, the probability of the event A can be determined from the joint probability mass function:
Finally, using the result stated in Theorem 4.15, the required result follows:
The following general results have been established: first, for discrete random variables,
Second, for continuous random variables,
A specific subcase of these results, that of conditioning on a single outcome of one of the random variables, is widely used and of importance. The following definitions clarify notation.
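In their standard form, and using the slash notation adopted above (the precise notation here is assumed), these conditional functions are defined, for pX(x) > 0 and fX(x) > 0 respectively, according to

pY/X(y/x) = pXY(x, y)/pX(x),  fY/X(y/x) = fXY(x, y)/fX(x).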
The covariance function applies to two random variables and is a generalization of the variance function for a single random variable. A related function is the correlation function.
Consider two subexperiments with countable sample spaces
which underpin an experiment with a sample space S:
The probability of individual outcomes is specified according to
Consider the case where a random variable is defined on S according to Z = (X, Y), where X and Y define random variables on the two subexperiments in a manner such that distinct outcomes lead to distinct values for the random variables. Notation for the outcomes of the random variables is detailed in Table 4.2.
Table 4.2 Outcomes of the random variables Z, X, and Y
ω | Z(ω) | X(ω) | Y(ω) |
ω1 | |||
ω2 | |||
ω3 | |||
… |
The joint probability of (xi, yj) is pXY(xi, yj), and for the case where X is independent of Y, it is the case that pXY(xi, yj) = pX(xi) pY(yj).
The following measures can be proposed for the dependence of the random variables X and Y for the specific case of the outcome (xi, yj):
Consider the first measure. When this measure is weighted by the values of the random variables according to
and summed over all possible outcomes, a mean measure for the dependence of the random variables X and Y is obtained according to
This measure is the covariance function of the two random variables X and Y as the following analysis shows:
The covariance function utilizes the simplest measure for the dependence between two random variables.
The concept of uncorrelatedness for two random variables arises from the definition of the covariance.
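In the usual form, X and Y are said to be uncorrelated when cov[X, Y] = 0 which, given the expansion cov[X, Y] = E[XY] − E[X]E[Y], is equivalent to E[XY] = E[X]E[Y]. Independent random variables are necessarily uncorrelated but, as the following example illustrates, uncorrelated random variables need not be independent.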
Consider the case of a random variable Z = (X, Y), with outcomes of the form
and with
It then follows that
It then follows that X and Y are dependent. The covariance between X and Y is zero as
Thus, the random variables are uncorrelated but dependent. The joint probability density function is shown in Figure 4.12.
The correlation coefficient is widely used as it gives a normalized measure of the correlation between two random variables.
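The following sketch (a minimal example using sample averages; the generated data are purely illustrative) estimates the covariance and the correlation coefficient, ρXY = cov[X, Y]/(σX σY), from a set of paired outcomes.

```python
import math
import random

def covariance_and_correlation(xs, ys):
    """Estimate cov[X, Y] and the correlation coefficient rho = cov / (sigma_X * sigma_Y)
    from paired samples, using sample means and variances."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return cov, cov / math.sqrt(var_x * var_y)

# Illustrative data: Y is a noisy copy of X, so the correlation coefficient is close to one.
xs = [random.gauss(0.0, 1.0) for _ in range(10_000)]
ys = [x + 0.1 * random.gauss(0.0, 1.0) for x in xs]
print(covariance_and_correlation(xs, ys))
```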
Consider the random variable Z that is defined as the weighted summation of N random variables X1, …, XN according to
The following important results hold.
The probability density function for a sum of independent Gaussian random variables has been detailed in Theorem 4.11.
In general, and consistent with the (N − 1)-fold integral expression specified in Equation 4.181, determining the probability density function of a sum of random variables is difficult.
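For the case of two independent random variables, the density of the sum reduces to the convolution fZ(z) = ∫ fX(u) fY(z − u) du, which can be evaluated numerically. The following sketch (a minimal example; the two uniform densities are assumed purely for illustration) approximates the convolution by a Riemann sum.

```python
def uniform(x):
    """Density of a uniform random variable on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def sum_density(f_x, f_y, z, lo=-10.0, hi=10.0, n=2000):
    """Approximate f_Z(z) = integral of f_X(u) * f_Y(z - u) du, for independent X and Y,
    by a Riemann sum over the interval [lo, hi] with n points."""
    du = (hi - lo) / n
    return sum(f_x(lo + k * du) * f_y(z - (lo + k * du)) for k in range(n)) * du

# Illustrative example: X and Y uniform on [0, 1]; the sum has a triangular density.
print(sum_density(uniform, uniform, 1.0))   # close to 1.0, the peak of the triangular density
```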
The number of useful joint probability mass functions and joint probability density functions is large. The most widely used joint probability density function is the bivariate Gaussian.
The outcomes of complex phenomena, in many instances, can be modelled based on the outcomes of repetitions of a simple experiment. The simplest experiment is the Bernoulli experiment with two outcomes: success or failure. The probability of a specific outcome arising from a large number of repetitions of such an experiment is specified by the binomial probability mass function. This probability mass function is expressed in terms of factorials and powers, and the computational burden is reduced if suitable approximations can be defined. The two important approximations are the DeMoivre–Laplace approximation and the Poisson approximation. Both approximations are based on Stirling’s formula, which provides an approximation to the factorial of a number.
Stirling’s formula (Stirling, 1730; Feller, 1957, p. 50f; Parzen, 1960, p. 242) provides an accurate approximation to the factorial of a number.
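As a check on its accuracy, the following sketch compares n! with the usual form of Stirling’s formula, n! ≈ √(2πn) n^n e^(−n) (a minimal illustration; the relative error decreases as n increases).

```python
import math

def stirling(n):
    """Stirling's approximation to n!: sqrt(2*pi*n) * n**n * exp(-n)."""
    return math.sqrt(2.0 * math.pi * n) * n ** n * math.exp(-n)

for n in (5, 10, 20):
    exact = math.factorial(n)
    approx = stirling(n)
    print(n, exact, approx, abs(exact - approx) / exact)   # relative error
```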
The first important approximation to the binomial probability mass function is the DeMoivre–Laplace approximation (Feller, 1945; Feller, 1957, p. 168f; Papoulis and Pillai, 2002, p. 105f).
For the case of values of k close to the mean Np, the relative error is usually small. For a given N and μ and a set bound on the relative error, a nonlinear root solving algorithm can be used on Equation 4.191 to solve for the range of k for which the DeMoivre–Laplace approximation is valid.
The binomial probability mass function, along with the relative error and the approximation to the relative error as given by Equation 4.191, is shown in Figure 4.15 for the case of and . The DeMoivre–Laplace approximation is valid, with a relative error less than 0.1, for assuming the relative error expression specified in Equation 4.191. The approximate range for k, as given by Equation 4.192, is , which is slightly conservative. This arises as .
The following theorem specifies a useful extension of the DeMoivre–Laplace theorem.
For the case when N is large and p is small, the Poisson approximation to the binomial probability mass function is appropriate (Feller, 1957, p. 142f).
The binomial probability mass function, along with the relative error and relative error approximation as given by the two expressions in Equation 4.199, are shown in Figure 4.16 for the case of and . The Poisson approximation is valid as , , , and the region of validity, as given by , is .
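The following sketch (a minimal comparison; the values of N and p are chosen purely for illustration) evaluates the binomial probability mass function alongside the DeMoivre–Laplace approximation, exp(−(k − Np)²/(2Npq))/√(2πNpq) with q = 1 − p, and the Poisson approximation, (Np)^k e^(−Np)/k!.

```python
import math

def binomial_pmf(k, n, p):
    """Exact binomial probability mass function."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def demoivre_laplace(k, n, p):
    """Gaussian (DeMoivre-Laplace) approximation with mean n*p and variance n*p*(1 - p)."""
    var = n * p * (1.0 - p)
    return math.exp(-(k - n * p) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def poisson_approx(k, n, p):
    """Poisson approximation with parameter n*p, appropriate when p is small."""
    lam = n * p
    return lam ** k * math.exp(-lam) / math.factorial(k)

n, p = 100, 0.05   # illustrative values
for k in (2, 5, 8):
    print(k, binomial_pmf(k, n, p), demoivre_laplace(k, n, p), poisson_approx(k, n, p))
```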
assuming the set of events {B1, …, BN} is a partition of the sample space.
where the first element in the ordered pair represents the sent information and the second element represents the received information.
where A is a subset of S. Define, and graph, the probability mass function, and the cumulative distribution function, of X.
This result states a memoryless property: if there have been no successes up until the koth subexperiment, then the probability of the first success occurring after a further k1 subexperiments is independent of ko.
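In symbols, and with K denoting the number of the subexperiment on which the first success occurs and p the probability of success on each subexperiment (notation assumed here), the memoryless property can be written as

P[K > ko + k1 / K > ko] = P[K > ko + k1]/P[K > ko] = (1 − p)^(ko + k1)/(1 − p)^ko = (1 − p)^k1 = P[K > k1].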
where λ is the mean arrival rate. The probability mass function is Poisson with a parameter λT. By considering an infinite number of such detectors, and the experiment of choosing one photodetector at random, an experimental sample space can be defined based on the photon arrival times.
A new random variable, Z, is defined according to
which is the magnitude of the vector defined by each point. Note that
where . Determine the probability density function, and cumulative distribution function, of Z. This probability density function finds widespread application in communications.
Determine the mean and variance of X.
Determine the probability density function of Y in terms of the probability density function of X.
Specify the set of experimental outcomes.
and the probability of occurrence of a given outcome is . Two random variables, and an event A, are defined on this sample space according to
Determine the conditional probability mass function pXY/A(x, y) for the case of and for the general case where .
where X and Y are independent random variables with probability density functions:
when the probability of success in an independent trial is p.
Determine the range of validity of the DeMoivre–Laplace approximation for a relative error of less than 0.1 and compare these values with the range
for the Poisson approximation to the binomial probability mass function by considering the following values: (i) , (ii) , and (iii) .
By definition
The integral is that of a Gaussian probability density function with a mean of and a variance of σ2, and accordingly, the integral is unity. The required result of
then follows. The final result arises by using the association .
By definition
From Theorem 4.23,
Interchanging the order of integration yields
A change of variable in the inner integral results in
The assumption of independence yields the required result:
The result for a weighted sum follows from the relationship given in Theorem 4.7: if , then .
Consider the disjoint outcomes of SM and their associated probabilities:
where is the probability density function of . It then follows that
The assumption of independent random variables, and the result for the characteristic function of a sum of independent random variables stated in Theorem 4.8, yields the required result:
Consider a random variable for some constant α. Consider
Since , it follows that
and
for all values of α. With , the upper bound for the correlation coefficient results according to
The lower bound for the correlation coefficient arises by choosing and proceeding in a similar manner.
Stirling’s formula can be proved by considering upper and lower bounds on the summation
To determine an upper bound on Sn, consider the integral and the integral approximation illustrated in Figure 4.17. Consistent with the areas defined in this figure, it follows that
Hence, an upper bound for Sn is
To determine a lower bound for Sn, consider an affine, and tangent, approximation to ln(x) around the point as illustrated in Figure 4.18. Consistent with the tangent approximation illustrated in this Figure, it follows that
As ln(x) is monotonically increasing, it follows that
and, hence,
Combining the upper and lower bounds for Sn, as given by Equations 4.243 and 4.246, yields
Evaluation of the integrals using the result
and substitution of the definition yields
Hence,
As the exponential function is a monotonically increasing function, it follows that
and thus,
It can be shown that the limit of the sequence is √(2π) ≈ 2.50663; for example, the sequence equals 2.527597, 2.508718, and 2.506837 for successively larger values of n. Hence,
The relative error bound in the approximation is
Consider the use of Stirling’s formula on the result for k successes in N trials, that is,
which is the first required result. Using the relationship yields the second required form
To establish the third result, define the difference between k and the mean value of k, as given by Np, as Δk, that is, Δk = k − Np. The following results then hold
Substitution of these results into Equation 4.255 yields
With the manipulations
and , it follows that
Consider the expression
Taking the natural logarithm yields
Consider the expansion ln(1 + x) = x − x²/2 + x³/3 − ⋯, which can be applied provided
Use of this expansion leads to
Expanding and collecting terms yields the following result:
where
It then follows that
When e(N, p, Δk) ≪ 1, the DeMoivre–Laplace approximation follows from the definitions μ = Np and σ² = Np(1 − p), that is,
The relative error in the DeMoivre–Laplace approximation is
The DeMoivre–Laplace approximation is based on using Stirling’s formula, and the relative error in this approximation is small for N moderate to large. Accordingly, using Equation 4.260, a reasonable approximation to the relative error is
To obtain the last equation, the result has been used and this is the stated relative error expression.
An alternative approximation to the relative error can be obtained by considering Equation 4.267:
assuming . For the case of and , a simplified relative error approximation can be obtained from Equation 4.266 according to
The first and third terms typically dominate the second term and with :
To determine the range of values of k, where the approximation is within a set relative error, the equation can be solved for x and hence k. To this end, consider the function
For the case of , the approximation yields a relative error of less than 10% in the solution of . With such an approximation, it follows that values of k in the range
satisfy the relative error criterion. Hence, for a set relative error,
As , it follows that
Consider , which can be written as
using the expansion . It then follows, assuming and , that . With the additional assumption of , the required approximation follows, that is,
The relative error in this approximation is
Stirling’s formula yields
The relative error is small when the magnitude of the argument of the exponential term is much less than one such that the approximation
is valid. To determine when the relative error is small, consider the approximation noted earlier for , which yields
Expanding out and collecting terms yields
With the existing assumptions of , the further assumptions of and are required for the relative error to be small. The restrictions on k then are
The assumption of implies that . In summary, the constraints for the validity of the Poisson approximation are . When these constraints hold, the relative error can be approximated according to
which is the required result.