Chapter 7    Probability, Random Variables, and Stochastic Processes

Dinesh Rajan

Southern Methodist University, Dallas, USA

7.1      Introduction to Probability

Probability theory provides a framework and tools to quantify and predict the chance of occurrence of an event in the presence of uncertainty. It also provides a logical way to make decisions in situations where the outcomes are uncertain. Probability theory has widespread applications in a plethora of fields such as financial modeling, weather prediction, and engineering. The literature on probability theory is rich and extensive; a partial list of excellent references includes [1, 5]. The goal of this chapter is to focus on the basic results and to illustrate the theory with several numerical examples. Proofs of the major results are not provided here and are relegated to the references.

While there are many different philosophical approaches to define and derive probability theory, Kolmogorov’s axiomatic approach is the most widely used. This axiomatic approach begins by defining a small number of precise axioms or postulates and then deriving the rest of the theory from these postulates.

Before formally defining Kolmogorov’s axioms, we first specify the basic framework used to study probability theory. Probability is defined in the context of a repeatable random experiment. An experiment consists of a procedure for conducting the experiment and a set of outcomes/observations. A probability model is assigned to the experiment, which governs the occurrence of the various outcomes. The sample space, S, is the finest-grain, mutually exclusive, and collectively exhaustive set of all possible outcomes. Each element ω of the sample space S represents a particular outcome of the experiment. An event E is a collection of outcomes.

Example 7.1.1. A fair coin is tossed three times. The sample space S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}. Event E1 = {HTT, THT, TTH} is the set of all outcomes with exactly 1 Head in the three coin flips.

Example 7.1.2. The angle that the needle makes in a wheel of fortune game is observed. The sample space S = {θ : 0 ≤ θ < 2π}.

Events Ej and Ek are said to be mutually exclusive or disjoint events if there are no outcomes that are common to both events, i.e., EjEk = ϕ.

A collection ℱ of events defined over a sample space S is called a sigma field if:

•  ℱ includes both the impossible event ϕ and the certain event S.

•  For every set A ∈ ℱ, its complement Ac ∈ ℱ.

•  ℱ is closed under countable set operations of union and intersection, i.e., A ∪ B ∈ ℱ and AB ∈ ℱ, ∀A, B ∈ ℱ.

Given a sigma field ℱ, a probability measure Pr (·) is a mapping from every event A ∈ ℱ to a real number Pr (A), called the probability of event A, satisfying the following three axioms:

1.  Pr (A) ≥ 0.

2.  Pr (S) = 1.

3.  For a countable collection of mutually exclusive events A1, A2, …, Pr (A1 ∪ A2 ∪ A3 ∪ …) = Pr (A1) + Pr (A2) + Pr (A3) + …

A probability space consists of the triplet (S, ℱ, Pr).

Example 7.1.3. A fair coin is flipped 1 time. In this case, S = {H, T}. The sigma field ℱ consists of the sets ϕ, {H}, {T}, and S. The probability measure maps these sets to probabilities as follows: Pr (H) = Pr (T) = 0.5, Pr (ϕ) = 0, and Pr (S) = 1.

The following simple and intuitive properties of the probability of an event can be readily derived from these axioms:

•  The probability of the null set equals 0, i.e., Pr (ϕ) = 0.

•  The probability of any event A is no greater than 1, i.e., Pr (A) ≤ 1.

•  The sum of the probability of an event and the probability of its complement equals 1, i.e., Pr (Ac) = 1 − Pr (A).

•  If A ⊆ B then Pr (A) ≤ Pr (B).

•  The probability of the union of events A and B can be expressed in terms of the probability of events A, B and their intersection AB, i.e.,

\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(AB).

(7.1)

To prove (7.1), we express A ∪ B as the union of three mutually exclusive sets A1 = AB, A2 = A − B, and A3 = B − A. Hence, by Axiom 3, Pr (A ∪ B) = Pr (A1) + Pr (A2) + Pr (A3). Similarly, Pr (A) = Pr (A1) + Pr (A2) and Pr (B) = Pr (A1) + Pr (A3). Property (7.1) readily follows. The other properties stated above can be proved in a similar manner.
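Property (7.1) can also be checked by direct enumeration on a small sample space. The following Python sketch uses the equiprobable sample space of Example 7.1.1, with two events chosen only for this illustration.

from itertools import product

# Equiprobable sample space of Example 7.1.1: three flips of a fair coin.
S = [''.join(w) for w in product('HT', repeat=3)]

def prob(event):
    # Probability of an event (a subset of S) under the equiprobable model.
    return len(event) / len(S)

# Illustrative events (assumed for this sketch):
A = {w for w in S if w.count('H') == 1}   # exactly one Head (event E1 of Example 7.1.1)
B = {w for w in S if w[0] == 'H'}         # first flip is a Head

lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)
print(lhs, rhs)   # both print 0.75, in agreement with (7.1)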

The conditional probability Pr (A∣B) for events A and B is defined as

\Pr(A \mid B) = \frac{\Pr(AB)}{\Pr(B)},

(7.2)

if Pr (B) > 0. This conditional probability represents the probability of occurrence of event A given the knowledge that event B has already occurred.

If events A1, A2, …, An form a set of mutually exclusive events (AiAj = ϕ, ∀i ≠ j) that partition the sample space (A1 ∪ A2 ∪ … ∪ An = S), then

\Pr(A_j \mid B) = \frac{\Pr(B \mid A_j)\Pr(A_j)}{\sum_{i=1}^{n} \Pr(B \mid A_i)\Pr(A_i)}.

(7.3)

Conditional probabilities are useful to infer the probability of events that may not be directly measurable.

Example 7.1.4. A card is selected at random from a standard deck of cards. Let event A1 represent the event of picking a diamond and let event B represent the event of picking a card with the number 7. Then the probabilities of the various events are Pr (A1) = 1/4 and Pr (B) = 1/13. Further, Pr (A1|B) = Pr (A1B)/Pr (B) = (1/52)/(1/13) = 1/4. Also, Pr (B|A1) = Pr (A1B)/Pr (A1) = (1/52)/(1/4) = 1/13.

Let events A2, A3 and A4 represent the events of picking a heart, a spade, and a club, respectively. Clearly, events Ai, i = 1, 2, 3, and 4 are mutually exclusive and partition the sample space. Now, we evaluate Pr (A1|B) using Bayes' result (7.3) as

\Pr(A_1 \mid B) = \frac{\Pr(B \mid A_1)\Pr(A_1)}{\sum_{i=1}^{4} \Pr(B \mid A_i)\Pr(A_i)} = \frac{(1/13)(1/4)}{4\,(1/13)(1/4)} = \frac{1}{4},

(7.4)

which is the same value as calculated directly.

Example 7.1.5. Consider the transmission of an equiprobable binary bit sequence over a binary symmetric channel (BSC) with crossover probability α, i.e., a bit gets flipped by the channel with probability α. For simplicity, we consider the transmission of a single bit and let event A0 denote the event that a bit 0 was sent and event A1 denote the event that a bit 1 was sent. Similarly, let B0 and B1 denote, respectively, the events that bit 0 and bit 1 are received. In this case, the conditional probability that a bit 0 was sent given that a bit 0 was received can be calculated as

\Pr(A_0 \mid B_0) = \frac{\Pr(B_0 \mid A_0)\Pr(A_0)}{\Pr(B_0 \mid A_0)\Pr(A_0) + \Pr(B_0 \mid A_1)\Pr(A_1)} = \frac{0.5(1-\alpha)}{0.5(1-\alpha) + 0.5\alpha} = 1 - \alpha.

(7.5)
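As a rough check of (7.5), the following sketch estimates Pr (A0|B0) for a BSC by Monte Carlo simulation; the crossover probability, the number of trials, and the seed are assumptions chosen only for this illustration.

import random

random.seed(1)
alpha = 0.1            # assumed crossover probability
n_trials = 200000

sent0_and_recv0 = 0
recv0 = 0
for _ in range(n_trials):
    bit = random.randint(0, 1)            # equiprobable source bit
    flipped = random.random() < alpha     # channel flips the bit with probability alpha
    received = bit ^ flipped
    if received == 0:
        recv0 += 1
        if bit == 0:
            sent0_and_recv0 += 1

print(sent0_and_recv0 / recv0)   # close to 1 - alpha = 0.9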

Events A and B are independent events if

Pr(AB) = Pr(A)Pr(B).

(7.6)

Equivalently, the events are independent if Pr(A∣B) = Pr (A) and Pr (B∣A) = Pr(B). Intuitively, if events A and B are independent then the occurrence or nonoccurrence of event A does not provide any additional information about the occurrence or nonoccurrence of event B.

Multiple events E1, E2, …, En are jointly independent if, for every subcollection of these events, the probability of their intersection equals the product of their individual probabilities. It should be noted that pairwise independence of events does not imply joint independence, as the following example clearly illustrates.

Example 7.1.6. A fair coin is flipped n − 1 times, where n is odd, and event Ei, i = 1, 2, …, n − 1, represents the event of receiving a Head in the ith flip. Let event En represent the event that there is an even number of Heads in the n − 1 flips. Clearly, Pr (Ei) = 1/2, ∀i = 1, 2, …, n. It is also clear that Pr (EiEj) = 1/4, ∀i ≠ j, which implies that the events are pairwise independent. It can also be verified that any k-tuple of these events is independent for k < n. However, events E1, E2, …, En are not jointly independent, since Pr (E1E2⋯En) = (1/2)^{n−1} ≠ ∏_{i=1}^{n} Pr (Ei) = (1/2)^n.
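The distinction drawn in Example 7.1.6 can be verified by brute-force enumeration for a small odd n. The sketch below assumes n = 3 and checks that the events are pairwise independent but not jointly independent.

from itertools import product

n = 3                           # assumed odd n for this sketch; n - 1 = 2 coin flips
outcomes = list(product('HT', repeat=n - 1))
p_outcome = 1 / len(outcomes)   # every outcome is equally likely

# E[i] holds the outcomes in event E_{i+1}; the last entry is E_n (even number of Heads).
E = [{w for w in outcomes if w[i] == 'H'} for i in range(n - 1)]
E.append({w for w in outcomes if w.count('H') % 2 == 0})

def P(event):
    return len(event) * p_outcome

# Pairwise independence: Pr(Ei Ej) = Pr(Ei) Pr(Ej) for every i != j.
pairwise = all(abs(P(E[i] & E[j]) - P(E[i]) * P(E[j])) < 1e-12
               for i in range(n) for j in range(i + 1, n))

# Joint independence fails: Pr(E1 E2 ... En) differs from the product of the Pr(Ei).
joint = P(set.intersection(*E))
product_of_probs = 1.0
for e in E:
    product_of_probs *= P(e)

print(pairwise, joint, product_of_probs)   # True 0.25 0.125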

7.2      Random Variables

A random variable, X(ω), is a mapping that assigns a real number to each outcome ω of the random experiment. The mapping must be such that all outcomes mapped to the values +∞ and −∞ have probability 0. Further, for every value x, the set {X ≤ x} corresponds to an event. Random variables are typically used to quantify and study the statistical properties associated with a random experiment.

A complex random variable is defined as Z = X + iY where X and Y are real valued random variables. For simplicity, most of the material in this chapter will focus on real valued random variables.

The cumulative distribution function (CDF) or probability distribution function, FX, of random variable X is defined as

F_X(x) = \Pr(X \le x).

(7.7)

The following properties of the CDF immediately follow:

•  The CDF is a number between 0 and 1, i.e., 0 ≤ FX (x) ≤ 1.

•  The CDF of a random variable evaluated at infinity and negative infinity equals 1 and 0, respectively, i.e., FX (∞) = 1 and FX (−∞) = 0.

•  The CDF FX (x) is a nondecreasing function of x.

•  The probability that the random variable takes values between x1 and x2 is given by the difference in the CDF at those values, i.e., Pr (x1 < X ≤ x2) = FX (x2) − FX (x1), if x1 < x2.

•  The CDF is right continuous, i.e., limϵ→0 FX (x + ϵ) = FX (x), when ϵ > 0.

A random variable is completely defined by its CDF in the sense that any property of the random variable can be calculated from the CDF. A random variable is typically categorized as being a discrete random variable, continuous random variable, or mixed random variable.

7.2.1      Discrete Random Variables

Random variable X is said to be a discrete random variable if its CDF is constant except at a countable set of points. For a discrete random variable, the probability mass function (PMF), PX (x), is equal to the probability that random variable X takes on the value x. Thus PX (x) = FX (x) − FX (x⁻), where FX (x⁻) denotes the limit of the CDF from the left. Clearly, since the PMF represents a probability value, PX (x) ≥ 0 and Σx PX (x) = 1. Similar to the CDF, the PMF also completely determines the properties of a discrete random variable.

Example 7.2.1. Let random variable X be defined as the number of Heads that appear in 3 flips of a biased coin with probability of Head in each flip equal to 0.3. Figure 7.1 shows a plot of the PMF and corresponding CDF for random variable X .
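The PMF and CDF plotted in Figure 7.1 can be reproduced with a few lines of Python; the sketch below computes the binomial probabilities for 3 flips with Pr (Head) = 0.3, as in Example 7.2.1.

from math import comb

p, N = 0.3, 3
pmf = [comb(N, k) * p**k * (1 - p)**(N - k) for k in range(N + 1)]

# Running sums of the PMF give the CDF values at the jump points 0, 1, 2, 3.
cdf, total = [], 0.0
for value in pmf:
    total += value
    cdf.append(total)

print([round(v, 3) for v in pmf])   # [0.343, 0.441, 0.189, 0.027]
print([round(v, 3) for v in cdf])   # [0.343, 0.784, 0.973, 1.0]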

Certain random variables appear in many different contexts and consequently have been assigned special names. Moreover, their properties have been thoroughly studied and documented. We now highlight a few of the common discrete random variables, their distributions, and a typical scenario where each is applicable; a short numerical sketch follows the list.

•  A Bernoulli random variable takes values 0 and 1 with probabilities α and 1 − α, respectively. A Bernoulli random variable is commonly used to model scenarios in which there are only two possible outcomes such as in a coin toss or in a pass or fail testing.


Figure 7.1:  Illustration of the PMF and CDF of a simple discrete random variable in Example 7.2.1.

•  A binomial random variable X takes values in the set {0,1, 2,…, N} and could represent the number of Heads in N independent flips of a coin. If the probability of receiving a Head in each flip of the coin equals p, then the PMF of the binomial random variable is given by

P_X(x) = \binom{N}{x} p^x (1-p)^{N-x}, \quad 0 \le x \le N.

(7.8)

Example 7.2.2. In error control coding, a rate 1/n repetition code [6] consists of transmitting n identical copies of each bit. Let these bits be transmitted over a binary symmetric channel (BSC) with crossover probability p. In this case, the random variable X that represents the number of bits that are received in error has the binomial distribution given in (7.8) with N = n.

•  A geometric random variable has a PMF of the form

P_X(x) = (1-p)^x\, p, \quad x = 0, 1, 2, \ldots

(7.9)

Example 7.2.3. Consider a packet communication network in which a packet is retransmitted by the transmitter until a successful acknowledgment is received. If the probability of successful packet transmission in each attempt equals p and each transmission attempt is independent of the others, then the random variable X that represents the number of unsuccessful attempts before the packet is received correctly has the geometric distribution given in (7.9).

•  A discrete uniform random variable X has PMF of the form

P_X(x) = \frac{1}{b-a+1}, \quad x = a, a+1, \ldots, b,

(7.10)

where without loss of generality b ≥ a.

•  A Pascal random variable has PMF

P_X(x) = \binom{x-1}{L-1} p^L (1-p)^{x-L}, \quad x = L, L+1, L+2, \ldots

(7.11)

Consider a sequence of independent Bernoulli trials in which the probability of success in each trial equals p, and suppose the experiment is repeated until exactly L successes are obtained. The random variable X that represents the number of trials required has the Pascal distribution given in (7.11).

•  A Poisson random variable has PMF of the form

P_X(x) = \frac{e^{-a} a^x}{x!}, \quad x = 0, 1, \ldots

(7.12)

The Poisson random variable is obtained as the limit of the binomial random variable as N → ∞ and p → 0 with the product Np held constant at a. The Poisson random variable represents the number of occurrences of an event in a given time period. For instance, the number of radioactive particles emitted in a given period of time by a radioactive source is modeled as a Poisson random variable. Similarly, in queueing theory a common model for packet arrivals is the Poisson process, in which the number of packet arrivals per unit time is given by (7.12).
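As a numerical illustration of the PMFs above (referred to at the start of this list), the following sketch evaluates the binomial PMF for a large N and a small p and compares it with the Poisson PMF with a = Np. The parameter values are assumptions chosen only for this example.

from math import comb, exp, factorial

def binomial_pmf(x, N, p):
    return comb(N, x) * p**x * (1 - p)**(N - x)

def poisson_pmf(x, a):
    return exp(-a) * a**x / factorial(x)

# Assumed parameters: N large, p small, with a = N*p fixed at 2.
N, p = 1000, 0.002
a = N * p

for x in range(6):
    print(x, round(binomial_pmf(x, N, p), 5), round(poisson_pmf(x, a), 5))
# The two columns agree to several decimal places, illustrating the Poisson limit.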

7.2.2      Continuous Random Variables

Random variable X is said to be a continuous random variable if the CDF of X is continuous. The probability density function (PDF) fX (x) of random variable X is defined as

F_X(x) = \int_{-\infty}^{x} f_X(u)\, du.

(7.13)

Note that, unlike the PMF, the PDF may take values greater than 1; the PDF is a density and is only proportional to the probability of an event. The interpretation of the PDF is that the probability of X taking values between x and x + δx approximately equals fX (x)δx for small positive values of δx. Similar to the CDF, the PDF is also a complete description of the random variable. The PDF of X satisfies the following properties:

•  Since the CDF is a nondecreasing function, the PDF is non-negative, i.e., fX (x) ≥ 0.

•  The integral of the PDF over an interval equals the probability of the random variable taking values in that interval, i.e., ∫_a^b fX (x) dx = Pr (a < X ≤ b).

•  Extending the above property, the integral of the PDF over the entire real line equals 1, i.e., ∫_{−∞}^{∞} fX (x) dx = 1.

Example 7.2.4. Suppose a point is chosen at random on the number line, uniformly between the values 0 and 3. Let random variable X represent the coordinate of that point. Then the PDF of X is given by fX (x) = 1/3, 0 < x < 3. The corresponding CDF is given by

F_X(x) = \begin{cases} 0 & x \le 0 \\ x/3 & 0 < x < 3 \\ 1 & x \ge 3 \end{cases}

(7.14)

Plots of this PDF and CDF are given in Figure 7.2.


Figure 7.2:  Illustration of the PDF and CDF of a simple continuous random variable in Example 7.2.4.

Similar to the discrete case, several commonly occurring continuous random variables have been studied, including the following; a short numerical sketch follows the list.

•  The PDF of a uniform random variable is given by

f_X(x) = \frac{1}{b-a}, \quad x \in (a, b).

(7.15)

•  The PDF of a Gaussian random variable (also referred to as Normal random variable) is given by

f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu_X)^2}{2\sigma^2}}.

(7.16)

The special case of a Gaussian random variable with 0 mean and unit variance is called a standard normal random variable. As will become clear in our study of the Central Limit Theorem, the distribution of the sum of a large number of independent random variables approaches that of a Gaussian under mild conditions. Consequently, noise and many other physical phenomena are modeled as Gaussian. The CDF of a Gaussian random variable is unfortunately not known in closed form. However, the CDF of a standard normal random variable has been computed numerically and is provided in the form of tables in several books. This CDF, denoted by Φ, is defined as

\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{u^2}{2}}\, du.

(7.17)

The CDF of any other Gaussian random variable X with mean μX and variance σX² can be evaluated using the CDF tables of a standard normal random variable as follows:

\Pr(X \le x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma_X^2}}\, e^{-\frac{(u-\mu_X)^2}{2\sigma_X^2}}\, du = \Phi\!\left(\frac{x-\mu_X}{\sigma_X}\right).

(7.18)

Note that several authors use other variants of Φ to numerically calculate the CDF and tail probabilities of a Gaussian random variable. For instance, the error function erf (x) is defined as

\operatorname{erf}(x) = \frac{1}{\sqrt{\pi}} \int_{0}^{x} e^{-t^2}\, dt.

(7.19)

This error function erf (x) and the Φ function can be expressed in terms of each other as Φ(x) = 0.5 + erf(x/√2) and erf(x) = Φ(x√2) − 0.5 for positive values of x.

•  An exponential random variable has a PDF given by

f_X(x) = a e^{-ax}, \quad x > 0.

(7.20)

The exponential distribution is frequently used to model the interarrival time between packets in queueing theory (see Chapter 17). The exponential distribution has a special memoryless property, as demonstrated by the following example.

Example 7.2.5. Let the lifetime of a fluorescent bulb be modeled as an exponential random variable X with a mean of 10 years. Then the probability that the lifetime X exceeds 15 years is given by

\Pr(X > 15) = \int_{15}^{\infty} \frac{1}{10}\, e^{-x/10}\, dx = e^{-15/10}.

(7.21)

Now suppose the bulb has already been working for 6 years. In this case, the conditional probability that the lifetime X exceeds 15 years is given by

\Pr(X > 15 \mid X > 6) = \frac{e^{-15/10}}{e^{-6/10}} = e^{-9/10},

(7.22)

which is the same as the probability that the lifetime exceeds 15 − 6 = 9 years. The exponential random variable is the only continuous random variable that has this memoryless property.
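As a numerical companion to the Gaussian and exponential distributions above (referred to before this list), the sketch below evaluates a Gaussian CDF through the standard-library error function and checks the memoryless computation of Example 7.2.5 by simulation. Note that Python's math.erf follows the more common convention that includes a factor 2/√π, so the relation takes the form Φ(x) = 0.5(1 + erf(x/√2)); the sample size and seed are arbitrary choices.

from math import erf, exp, sqrt
import random

def gaussian_cdf(x, mu=0.0, sigma=1.0):
    # Phi((x - mu)/sigma); math.erf carries the usual factor 2/sqrt(pi),
    # hence the 0.5*(1 + erf(.)) form of the relation quoted in the text.
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

print(gaussian_cdf(1.0))   # Pr(X <= 1) for a standard normal RV, about 0.8413

# Memoryless property of Example 7.2.5: exponential lifetime with mean 10 years.
random.seed(0)
mean = 10.0
samples = [random.expovariate(1.0 / mean) for _ in range(200000)]
alive_at_6 = [s for s in samples if s > 6.0]
cond = sum(s > 15.0 for s in alive_at_6) / len(alive_at_6)
print(cond, exp(-9.0 / mean))   # both are close to e^{-0.9}, about 0.4066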

7.3      Joint Random Variables

Recall that a random variable is a mapping from the set of outcomes of an experiment to the real numbers. Clearly, for a given experiment there can be numerous such mappings, each defining a different random variable. To understand the relationships between these random variables, it is not sufficient to study their properties separately; a joint study of the random variables is required.

The joint CDF FX,Y (x,y) of two random variables X and Y is given by

F_{X,Y}(x, y) = \Pr(X \le x, Y \le y).

(7.23)

Similar to the case of a single random variable, the joint CDF completely specifies the properties of the random variables. From the joint CDF, the marginal CDFs of RVs X and Y can be obtained as FX (x) = FX,Y (x, ∞) and FY (y) = FX,Y (∞, y). The joint CDF satisfies the following properties:

•  0 ≤ FX,Y (x, y) ≤ 1.

•  FX,Y (−∞, −∞) = 0 and FX,Y (∞, ∞) = 1.

•  Pr (a < X ≤ b, c < Y ≤ d) = FX,Y (a, c) + FX,Y (b, d) − FX,Y (a, d) − FX,Y (b, c).

•  FX,Y (x, y) = limϵ→0,ϵ>0 FX,Y (x+ϵ, y) and FX,Y (x, y) = limϵ→0,ϵ>0 FX,Y (x, y+ ϵ).

The joint PDF of RVs X and Y is given by any function fX,Y (x, y) such that

F_{X,Y}(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{X,Y}(u, v)\, dv\, du.

(7.24)

Example 7.3.1. Consider random variables X and Y with joint PDF given by

f_{X,Y}(x, y) = \begin{cases} a(x + y + xy) & 0 \le x \le 1,\ 0 \le y \le 2 \\ 0 & \text{else} \end{cases}

(7.25)

The value of the constant a = 1/4 can be computed using the property that the integral of the PDF over the entire support equals 1. In this case, the CDF can be computed as

F_{X,Y}(x, y) = \begin{cases} 0 & x < 0 \text{ or } y < 0 \\ a\left(\frac{x^2 y}{2} + \frac{x y^2}{2} + \frac{x^2 y^2}{4}\right) & 0 \le x \le 1,\ 0 \le y \le 2 \\ a\left(\frac{y}{2} + \frac{3y^2}{4}\right) & x > 1,\ 0 \le y \le 2 \\ a\left(2x^2 + 2x\right) & 0 \le x \le 1,\ y > 2 \\ 1 & x > 1,\ y > 2 \end{cases}

(7.26)

The probability of various events can be computed either from the PDF or from the CDF. For example, let event A = {0 ≤ X ≤ 1/2, 1 ≤ Y ≤ 2}. The probability of A can be calculated using the PDF as

\Pr(A) = \int_{0}^{1/2} \int_{1}^{2} a(x + y + xy)\, dy\, dx

(7.27)

= a \int_{0}^{1/2} \left( x + \tfrac{3}{2} + \tfrac{3}{2}x \right) dx

(7.28)

 = 17/64

(7.29)

The same probability can also be calculated using the joint CDF as

\Pr(A) = F_{X,Y}(1/2, 2) + F_{X,Y}(0, 1) - F_{X,Y}(0, 2) - F_{X,Y}(1/2, 1)

(7.30)

= 3/8 + 0 − 0 − 7/64

(7.31)

 = 17/64

(7.32)

The marginal PDFs of X and Y can now be computed as

f_X(x) = \int_{0}^{2} a(x + y + xy)\, dy = a(4x + 2), \quad 0 \le x \le 1,

(7.33)

\text{and} \quad f_Y(y) = \int_{0}^{1} a(x + y + xy)\, dx = a(3y + 1)/2, \quad 0 \le y \le 2.

(7.34)
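The constants in Example 7.3.1 can be double-checked numerically. The following sketch approximates the required double integrals with a simple midpoint-rule sum on a grid; the grid resolution is an arbitrary choice made for this illustration.

import numpy as np

a = 0.25
nx, ny = 1000, 2000                  # number of grid cells in x and y
dx, dy = 1.0 / nx, 2.0 / ny
x = (np.arange(nx) + 0.5) * dx       # cell midpoints covering [0, 1]
y = (np.arange(ny) + 0.5) * dy       # cell midpoints covering [0, 2]
X, Y = np.meshgrid(x, y, indexing='ij')
f = a * (X + Y + X * Y)

# Total probability: should be close to 1 for a = 1/4.
print(f.sum() * dx * dy)

# Pr(A) for A = {0 <= X <= 1/2, 1 <= Y <= 2}: the exact value is 17/64.
mask = (X < 0.5) & (Y > 1.0)
print(f[mask].sum() * dx * dy, 17 / 64)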

The conditional PDF fX|Y (x∣y) is defined as

f_{X \mid Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)},

(7.35)

when fY (y) > 0. For instance, in Example 7.3.1, the conditional PDF fX|Y (x|y) = 2(x + y + xy)/(3y + 1), 0 < x < 1, and the conditional PDF fY|X (y|x) = (x + y + xy)/(4x + 2), 0 < y < 2. Continuous random variables X and Y are said to be independent if and only if

f_{X,Y}(x, y) = f_X(x) f_Y(y), \quad \forall x, y.

(7.36)

Example 7.3.2. Let the joint PDF of random variables X and Y be given by fX,Y (x, y) = xy for 0 ≤ x ≤ 1, 0 ≤ y ≤ 2. Then the corresponding marginal PDFs are given by fX (x) = ∫₀² xy dy = 2x, 0 ≤ x ≤ 1, and fY (y) = ∫₀¹ xy dx = y/2, 0 ≤ y ≤ 2. Clearly, fX,Y (x, y) = fX (x) fY (y), which implies that random variables X and Y are independent.

Similarly, discrete random variables X and Y are said to be independent if and only if

P_{X,Y}(x, y) = P_X(x) P_Y(y), \quad \forall x, y.

(7.37)

It is important to note that, for independence of random variables, (7.37) needs to be satisfied for all values of the random variables X and Y.

Example 7.3.3. Let the joint PMF of random variables X and Y be given by

P_{X,Y}(x, y) = \begin{cases} 1/6 & x = 1, y = 1 \\ 1/4 & x = 1, y = 2 \\ 1/12 & x = 1, y = 3 \\ 1/6 & x = 2, y = 1 \\ 1/12 & x = 2, y = 2 \\ 1/4 & x = 2, y = 3 \end{cases}

(7.38)

The marginal PMFs of X and Y can be computed to show that they are both uniform over their respective alphabets. In this case, it can be easily verified that the events X = 1 and Y = 1 are independent. However, the events X = 1 and Y = 2 are not independent. Thus, the random variables X and Y are not independent.
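The conclusion of Example 7.3.3 can be verified directly from the joint PMF. The following sketch computes the marginal PMFs with exact rational arithmetic and tests the factorization (7.37) at every point of the alphabet.

from fractions import Fraction

pmf = {
    (1, 1): Fraction(1, 6), (1, 2): Fraction(1, 4),  (1, 3): Fraction(1, 12),
    (2, 1): Fraction(1, 6), (2, 2): Fraction(1, 12), (2, 3): Fraction(1, 4),
}

px = {x: sum(p for (xx, _), p in pmf.items() if xx == x) for x in (1, 2)}
py = {y: sum(p for (_, yy), p in pmf.items() if yy == y) for y in (1, 2, 3)}

print(px)   # both values equal 1/2: the marginal of X is uniform
print(py)   # all three values equal 1/3: the marginal of Y is uniform

# (7.37) must hold at every (x, y) for independence; it holds at (1, 1) and (2, 1)
# but fails elsewhere, so X and Y are not independent.
for (x, y), p in sorted(pmf.items()):
    print((x, y), p == px[x] * py[y])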

Example 7.3.4. Consider a network in which packets are routed from the source node to the destination node using a routing protocol. Let there be a probability α of packet loss at each hop, due to buffer overflows at the node or errors in the link. In order to increase the overall chance of success, the routing algorithm sends three copies of each packet over mutually exclusive routes. The three routes have a1, a2 and a3 hops between the source and destination, respectively. Assume that the probability of success in each hop is independent of the other hops. In this case, the overall probability that at least one copy of the packet is received correctly at the destination node can be calculated as 1 − ∏_{i=1}^{3} (1 − (1 − α)^{a_i}).
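A small sketch of the calculation in Example 7.3.4 is given below, with assumed values for α and for the hop counts a1, a2, a3; it evaluates the closed-form expression and confirms it with a Monte Carlo simulation.

import random

random.seed(0)
alpha = 0.1          # assumed per-hop loss probability
hops = [2, 3, 4]     # assumed hop counts a1, a2, a3 of the three routes

# Closed form: 1 - prod_i (1 - (1 - alpha)^{a_i}).
p_fail_all = 1.0
for a_i in hops:
    p_fail_all *= 1.0 - (1.0 - alpha) ** a_i
p_success = 1.0 - p_fail_all
print(p_success)

# Monte Carlo check: a copy survives route i only if every one of its hops succeeds.
trials = 200000
hits = sum(
    any(all(random.random() < 1.0 - alpha for _ in range(a_i)) for a_i in hops)
    for _ in range(trials)
)
print(hits / trials)   # close to the closed-form value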

7.3.1      Expected Values, Characteristic Functions

As noted before, the PDF, CDF and PMF are all complete descriptors of the random variable and can be used to evaluate any property of the random variable. However, for many complex scenarios, computing the exact distribution can be challenging. In contrast, there are several statistical values that are computationally simple, but provide only partial information about the random variable. In this section, we highlight some of the frequently utilized statistical measures.

The expected value, E {X }, of random variable X is defined as

E\{X\} = \mu_X = \int_{-\infty}^{\infty} x f_X(x)\, dx.

(7.39)

In general the expected value of any function g(X ) of a random variable X is given by

E\{g(X)\} = \int_{-\infty}^{\infty} g(x) f_X(x)\, dx.

(7.40)

The term E {X^k} is known as the kth moment of X. The variance σX² of X is related to the second moment and is given by

\sigma_X^2 = E\{(X - \mu_X)^2\} = E\{X^2\} - \mu_X^2.

(7.41)

As another variation, the kth central moment of a random variable is defined as E {(X − μX)^k}.

The covariance between random variables X and Y is defined as

\mathrm{Cov}(X, Y) = E\{(X - \mu_X)(Y - \mu_Y)\}.

(7.42)

The correlation coefficient ρXY is defined as

\rho_{X,Y} = \frac{E\{(X - \mu_X)(Y - \mu_Y)\}}{\sigma_X \sigma_Y}.

(7.43)

Example 7.3.5 (Jointly Gaussian Vector). The joint PDF of the Gaussian vector [X1, X2, …, XN] is given by

f_{X_1, X_2, \ldots, X_N}(x_1, x_2, \ldots, x_N) = \frac{1}{(2\pi)^{N/2} |R_X|^{1/2}}\, e^{-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_X)^T R_X^{-1} (\mathbf{x} - \boldsymbol{\mu}_X)},

(7.44)

where x = [x1, x2, …, xN]^T, μX = [μX1, μX2, …, μXN]^T is the vector of means of the individual random variables, and RX is the covariance matrix whose element in the ith row and jth column is Cov(Xi, Xj). Gaussian random vectors are frequently used in signal processing applications, for instance, when estimating a vector parameter in the presence of additive noise. The reasons for the popularity of the Gaussian vector model are: i) by the central limit theorem, the noise density is well approximated as Gaussian, ii) several closed form analytical results can be derived using the Gaussian model, and iii) the results derived using a Gaussian approximation serve as a benchmark for the true performance.

The marginal densities of a jointly Gaussian vector are Gaussian. However, the marginal densities being Gaussian does not necessarily imply that the joint density is also Gaussian.

The random variables X and Y are said to be uncorrelated if ρX,Y = 0. If X and Y are independent, then

E\{(X - \mu_X)(Y - \mu_Y)\} = E\{X - \mu_X\}\, E\{Y - \mu_Y\} = 0,

(7.45)

which implies that the random variables are also uncorrelated. However, uncorrelated random variables are not always independent as demonstrated by the following example.

Example 7.3.6. Let X1 be uniformly distributed in the interval (0, 2π). Let X2 = cos(X1) and X3 = sin(X1). Then it is clear that μX2 = μX3 = 0 and E {X2X3} = 0. Consequently, ρX2,X3 = 0. However, X2 and X3 are dependent random variables, since X2² + X3² = 1 and, given the value of X2, the value of X3 is known except for its sign.
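Example 7.3.6 can also be illustrated empirically. The sketch below draws samples of X1 uniformly on (0, 2π) and shows that the sample correlation between cos(X1) and sin(X1) is close to zero even though the two variables are deterministically related; the sample size and seed are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(0.0, 2.0 * np.pi, size=200000)
x2, x3 = np.cos(x1), np.sin(x1)

# The sample correlation coefficient is close to 0 ...
print(np.corrcoef(x2, x3)[0, 1])

# ... yet X2 and X3 are clearly dependent: X2^2 + X3^2 = 1 for every sample.
print(np.max(np.abs(x2**2 + x3**2 - 1.0)))   # essentially 0 up to rounding error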

In the special case that two random variables are jointly Gaussian, if they are uncorrelated then they are also independent. This result can be verified by setting the cross-covariance terms to 0 in the covariance matrix RX that determines the joint PDF: the matrix becomes diagonal, and consequently the joint PDF simply becomes the product of the marginal PDFs.

The characteristic function ϕX(ω) of X is defined as

\phi_X(\omega) = E\{e^{j\omega X}\} = \int_{-\infty}^{\infty} f_X(x)\, e^{j\omega x}\, dx.

(7.46)

The characteristic function and the PDF form a unique pair; thus, the characteristic function also completely defines the random variable.

The characteristic function can be used to easily compute the moments of the random variable. Using the Taylor series expansion of e^{jωX}, we can expand the characteristic function as

\phi_X(\omega) = E\{e^{j\omega X}\}

(7.47)

= E\left\{1 + j\omega X + \frac{(j\omega X)^2}{2!} + \cdots\right\}

(7.48)

= 1 + j\omega E\{X\} + \frac{(j\omega)^2}{2!} E\{X^2\} + \cdots

(7.49)

Now, to compute the kth moment E {X^k}, we can differentiate (7.49) k times with respect to ω and then evaluate the result at ω = 0. Thus, E {X^k} = (1/j^k) (d^k/dω^k) φX(ω) |_{ω=0}.

Example 7.3.7. Let X be an exponential random variable with parameter λ. The characteristic function of this random variable X is given by

\phi_X(\omega) = E\{e^{j\omega X}\} = \int_{-\infty}^{\infty} f_X(x)\, e^{j\omega x}\, dx = \int_{0}^{\infty} \lambda e^{-\lambda x} e^{j\omega x}\, dx = \frac{\lambda}{\lambda - j\omega}.

(7.50)

The mean of X can be calculated as

\mu_X = \frac{1}{j} \left. \frac{d\phi_X(\omega)}{d\omega} \right|_{\omega = 0} = \frac{1}{j}\, \left. \frac{j\lambda}{(\lambda - j\omega)^2} \right|_{\omega = 0} = \frac{1}{\lambda}.

(7.51)

The second order moment can be evaluated as

E\{X^2\} = \frac{1}{j^2} \left. \frac{d^2\phi_X(\omega)}{d\omega^2} \right|_{\omega = 0} = \frac{1}{j^2}\, \left. \frac{-2\lambda}{(\lambda - j\omega)^3} \right|_{\omega = 0} = \frac{2}{\lambda^2}.

(7.52)

Consequently, the variance can be calculated as

\sigma_X^2 = E\{X^2\} - \mu_X^2 = \frac{1}{\lambda^2}.

(7.53)
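The moment calculations in Example 7.3.7 can be reproduced symbolically. The sketch below differentiates the characteristic function λ/(λ − jω) with SymPy; it is an illustrative check, with the symbol name lam standing in for λ.

import sympy as sp

w = sp.symbols('omega', real=True)
lam = sp.symbols('lam', positive=True)
phi = lam / (lam - sp.I * w)     # characteristic function of an exponential RV

def moment(k):
    # E{X^k} = (1/j^k) d^k phi / d omega^k evaluated at omega = 0
    return sp.simplify(sp.diff(phi, w, k).subs(w, 0) / sp.I**k)

m1, m2 = moment(1), moment(2)
print(m1)                          # 1/lam
print(m2)                          # 2/lam**2
print(sp.simplify(m2 - m1**2))     # variance 1/lam**2, as in (7.53)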

The second characteristic function ΨX(s) is defined as the natural logarithm of the characteristic function φX(ω), expressed in terms of the variable s = jω, i.e., ΨX(s) = ln φX(ω)|_{jω = s}. The cumulants λn are

\lambda_n = \left. \frac{d^n \Psi_X(s)}{ds^n} \right|_{s = 0}.

(7.54)

The various cumulants are related to the moments as follows:

\lambda_1 = E\{X\} = \mu_X

(7.55)

\lambda_2 = E\{(X - E\{X\})^2\} = \sigma_X^2

(7.56)

\lambda_3 = E\{(X - E\{X\})^3\}

(7.57)

\lambda_4 = E\{X^4\} - 4E\{X^3\}E\{X\} - 3(E\{X^2\})^2 + 12E\{X^2\}(E\{X\})^2 - 6(E\{X\})^4

(7.58)

Cumulants of order higher than 3 are not the same as the corresponding central moments. Of special interest is the fourth-order cumulant, which is also referred to as the kurtosis. The kurtosis is typically used as a measure of the deviation of a random variable from Gaussianity. The kurtosis of a Gaussian random variable equals 0. Further, for a distribution with a heavy tail and a peak at zero the kurtosis is positive, while for a distribution with a fast decaying tail the kurtosis is negative.

Example 7.3.8. Consider the uniform random variable X with support over (−1, 1). The first four moments of X can be calculated as

E\{X\} = 0, \quad E\{X^2\} = 1/3, \quad E\{X^3\} = 0, \quad \text{and} \quad E\{X^4\} = 1/5.

The kurtosis of X can now be calculated using (7.58) as λ4 = −2/15.
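The kurtosis value in Example 7.3.8 can be checked by simulation. The sketch below estimates the first four moments of a uniform random variable on (−1, 1) from samples and substitutes them into (7.58); the sample size and seed are arbitrary.

import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=1_000_000)

m1 = x.mean()
m2 = (x**2).mean()
m3 = (x**3).mean()
m4 = (x**4).mean()

# Fourth cumulant (kurtosis) computed from the moments, as in (7.58).
lam4 = m4 - 4*m3*m1 - 3*m2**2 + 12*m2*m1**2 - 6*m1**4
print(lam4, -2/15)   # the estimate is close to -2/15 = -0.1333...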
