CHAPTER 1

Basic Probability Theory

PART I: THEORY

It is assumed that the reader has had a course in elementary probability. In this chapter we discuss more advanced material, which is required for further developments.

1.1 OPERATIONS ON SETS

Let Ω denote a sample space and let E1, E2 be subsets of Ω. We denote the union by E1 ∪ E2 and the intersection by E1 ∩ E2. Ē = Ω − E denotes the complement of E. By DeMorgan's laws, the complement of E1 ∪ E2 is Ē1 ∩ Ē2 and the complement of E1 ∩ E2 is Ē1 ∪ Ē2.

Given a sequence of sets {En, n ≥ 1} (finite or infinite), we define

(1.1.1)   ∪_{n≥1} En = {w : w ∈ En for at least one n},   ∩_{n≥1} En = {w : w ∈ En for all n}.

Furthermore, lim sup En and lim inf En are defined as

(1.1.2)   lim sup_{n→∞} En = ∩_{n≥1} ∪_{k≥n} Ek,   lim inf_{n→∞} En = ∪_{n≥1} ∩_{k≥n} Ek.

If a point of Ω belongs to lim sup En, it belongs to infinitely many of the sets En. The sets lim sup_{n→∞} En and lim inf_{n→∞} En always exist and

(1.1.3)   lim inf_{n→∞} En ⊂ lim sup_{n→∞} En.

If lim inf_{n→∞} En = lim sup_{n→∞} En, we say that the limit of {En, n ≥ 1} exists. In this case,

(1.1.4)   lim_{n→∞} En = lim inf_{n→∞} En = lim sup_{n→∞} En.

A sequence {En, n ≥ 1} is called monotone increasing if En ⊂ En+1 for all n ≥ 1. In this case lim_{n→∞} En = ∪_{n≥1} En. The sequence is monotone decreasing if En ⊃ En+1 for all n ≥ 1. In this case lim_{n→∞} En = ∩_{n≥1} En. We conclude this section with the definition of a partition of the sample space. A collection of sets 𝒟 = {E1, …, Ek} is called a finite partition of Ω if all elements of 𝒟 are pairwise disjoint and their union is Ω, i.e., Ei ∩ Ej = ∅ for all i ≠ j, Ei, Ej ∈ 𝒟, and ∪_{i=1}^{k} Ei = Ω. If 𝒟 contains a countable number of sets that are mutually exclusive and whose union is Ω, we say that 𝒟 is a countable partition.

1.2 ALGEBRA AND σ–FIELDS

Let Ω be a sample space. An algebra 𝒜 is a collection of subsets of Ω satisfying

(1.2.1)
(i) Ω ∈ 𝒜;
(ii) if E ∈ 𝒜 then Ē ∈ 𝒜;
(iii) if E1, E2 ∈ 𝒜 then E1 ∪ E2 ∈ 𝒜.

Since ∅ = Ω̄, (i) and (ii) imply that ∅ ∈ 𝒜. Also, if E1, E2 ∈ 𝒜 then E1 ∩ E2 ∈ 𝒜.

The trivial algebra is 𝒜0 = {∅, Ω}. An algebra 𝒜1 is a subalgebra of 𝒜2 if all sets of 𝒜1 are contained in 𝒜2. We denote this inclusion by 𝒜1 ⊂ 𝒜2. Thus, the trivial algebra 𝒜0 is a subalgebra of every algebra 𝒜. We will denote by 𝒜(Ω) the algebra generated by all subsets of Ω (see Example 1.1).

If a sample space Ω has a finite number of points n, 1 ≤ n < ∞, then the collection of all subsets of Ω is called the discrete algebra generated by the elementary events of Ω. It contains 2ⁿ events.

Let 𝒟 be a partition of Ω having k, 2 ≤ k < ∞, disjoint sets. Then, the algebra generated by 𝒟, 𝒜(𝒟), is the algebra containing all the 2ᵏ − 1 unions of elements of 𝒟 and the empty set.
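To make this concrete, here is a minimal sketch (plain Python; the six–point sample space and its partition are hypothetical illustration values) that enumerates the algebra generated by a partition with k = 3 blocks: every member is a union of blocks, giving the 2ᵏ − 1 nonempty unions plus the empty set.

```python
from itertools import combinations

omega = {1, 2, 3, 4, 5, 6}                 # hypothetical finite sample space
partition = [{1, 2}, {3, 4}, {5, 6}]       # k = 3 disjoint blocks covering omega

# Each element of the generated algebra is a union of a subset of the blocks;
# the empty union yields the empty set.
algebra = []
for r in range(len(partition) + 1):
    for blocks in combinations(partition, r):
        algebra.append(frozenset().union(*blocks))

print(len(algebra))   # 2**3 = 8: the 2**k - 1 nonempty unions plus the empty set
```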

An algebra 𝒜 on Ω is called a σ–field if, in addition to being an algebra, the following holds.

(iv) If En ∈ 𝒜, n ≥ 1, then ∪_{n≥1} En ∈ 𝒜.

We will denote a σ–field by ℱ. In a σ–field ℱ the supremum, infimum, lim sup, and lim inf of any sequence of events belong to ℱ. If Ω is finite, the discrete algebra 𝒜(Ω) is a σ–field. In Example 1.3 we show an algebra that is not a σ–field.

The minimal σ–field containing the algebra generated by {(−∞, x], −∞ < x < ∞} is called the Borel σ–field on the real line ℝ; we denote it by ℬ.

A sample space Ω with a σ–field ℱ of its subsets, (Ω, ℱ), is called a measurable space.

The following lemmas establish the existence of a smallest σ–field containing a given collection of sets.

Lemma 1.2.1 Let 𝒞 be a collection of subsets of a sample space Ω. Then, there exists a smallest σ–field ℱ(𝒞) containing the elements of 𝒞.

Proof.   The algebra 𝒜(Ω) of all subsets of Ω obviously contains all elements of 𝒞. Similarly, the σ–field containing all subsets of Ω contains all elements of 𝒞. Define ℱ(𝒞) to be the intersection of all σ–fields that contain all elements of 𝒞. An intersection of σ–fields is itself a σ–field, and by construction ℱ(𝒞) is contained in every σ–field containing 𝒞; hence it is the smallest one.        QED

A collection ℳ of subsets of Ω is called a monotonic class if the limit of any monotone sequence in ℳ belongs to ℳ.

If 𝒞 is a collection of subsets of Ω, let ℳ*(𝒞) denote the smallest monotonic class containing 𝒞.

Lemma 1.2.2. A necessary and sufficient condition for an algebra 𝒜 to be a σ–field is that it is a monotonic class.

Proof.   (i) Obviously, if 𝒜 is a σ–field, it is a monotonic class.

(ii) Let 𝒜 be a monotonic class.

Let En ∈ 𝒜, n ≥ 1. Define Bn = ∪_{i=1}^{n} Ei. Since 𝒜 is an algebra, Bn ∈ 𝒜, and obviously Bn ⊂ Bn+1 for all n ≥ 1. Hence, since 𝒜 is a monotonic class, lim_{n→∞} Bn = ∪_{n≥1} Bn ∈ 𝒜. But ∪_{n≥1} Bn = ∪_{n≥1} En. Thus, ∪_{n≥1} En ∈ 𝒜. Similarly, ∩_{n≥1} En ∈ 𝒜. Thus, 𝒜 is a σ–field.        QED

Theorem 1.2.1. Let 𝒜 be an algebra. Then ℳ*(𝒜) = ℱ(𝒜), where ℱ(𝒜) is the smallest σ–field containing 𝒜.

Proof.   See Shiryayev (1984, p. 139).

The measurable space (ℝ, ℬ), where ℝ is the real line and ℬ = ℬ(ℝ), called the Borel measurable space, plays a most important role in the theory of statistics. Another important measurable space is (ℝⁿ, ℬⁿ), n ≥ 2, where ℝⁿ = ℝ × ℝ × ··· × ℝ is the Euclidean n–space, and ℬⁿ = ℬ × ··· × ℬ is the smallest σ–field containing ℝⁿ, ∅, and all n–dimensional rectangles I = I1 × ··· × In, where

Ii = (ai, bi],   −∞ ≤ ai < bi ≤ ∞,   i = 1, …, n.

The measurable space (ℝ^∞, ℬ^∞) is used as a basis for probability models of experiments with infinitely many trials. ℝ^∞ is the space of ordered sequences x = (x1, x2, …), −∞ < xn < ∞, n = 1, 2, …. Consider the cylinder sets

C(B1 × ··· × Bn) = {x : x ∈ ℝ^∞, x1 ∈ B1, …, xn ∈ Bn}

and

C(Bⁿ) = {x : x ∈ ℝ^∞, (x1, …, xn) ∈ Bⁿ},   Bⁿ ∈ ℬⁿ,

where the Bi are Borel sets, i.e., Bi ∈ ℬ. The smallest σ–field containing all these cylinder sets, n ≥ 1, is ℬ(ℝ^∞). Examples of Borel sets in ℬ(ℝ^∞) are

(a) {x : x ∈ ℝ^∞, sup_{n≥1} xn > a}

or

(b) {x : x ∈ ℝ^∞, inf_{n≥1} xn ≤ a}.

1.3 PROBABILITY SPACES

Given a measurable space (Ω, ℱ), a probability model ascribes a countably additive function P on ℱ, which assigns a probability P{A} to every set A ∈ ℱ. This function satisfies the following properties:

(1.3.1)   P{A} ≥ 0 for every A ∈ ℱ,   and P{Ω} = 1;

(1.3.2)   if {An, n ≥ 1} are mutually disjoint sets in ℱ, then P{∪_{n≥1} An} = Σ_{n≥1} P{An}.

Recall that if A ⊂ B then P{A} ≤ P{B}, and P{Ā} = 1 − P{A}. Other properties will be given in the examples and problems. In the sequel we often write AB for A ∩ B.

Theorem 1.3.1. Let (Ω, ℱ, P) be a probability space, where ℱ is a σ–field of subsets of Ω and P a probability function. Then

(i) if Bn ⊂ Bn+1, n ≥ 1, Bn ∈ ℱ, then

(1.3.3)   P{∪_{n≥1} Bn} = lim_{n→∞} P{Bn};

(ii) if Bn ⊃ Bn+1, n ≥ 1, Bn ∈ ℱ, then

(1.3.4)   P{∩_{n≥1} Bn} = lim_{n→∞} P{Bn}.

Proof.   (i) Since Bn ⊂ Bn+1, lim_{n→∞} Bn = ∪_{n≥1} Bn. Moreover,

(1.3.5)   ∪_{n≥1} Bn = B1 ∪ ∪_{n≥2} (Bn ∩ B̄n−1),

where the sets on the right–hand side are mutually disjoint. Notice that for n ≥ 2, since Bn−1 ⊂ Bn,

(1.3.6)   P{Bn ∩ B̄n−1} = P{Bn} − P{Bn−1}.

Also, by countable additivity in (1.3.5),

(1.3.7)   P{∪_{n≥1} Bn} = P{B1} + Σ_{n≥2} (P{Bn} − P{Bn−1}) = lim_{n→∞} P{Bn}.

Thus, Equation (1.3.3) is proven.

(ii) Since Bn ⊃ Bn+1, n ≥ 1, we have B̄n ⊂ B̄n+1, n ≥ 1, and ∩_{n≥1} Bn is the complement of ∪_{n≥1} B̄n. Hence,

P{∩_{n≥1} Bn} = 1 − P{∪_{n≥1} B̄n} = 1 − lim_{n→∞} P{B̄n} = lim_{n→∞} P{Bn}.        QED

Sets in a probability space are called events.

1.4 CONDITIONAL PROBABILITIES AND INDEPENDENCE

The conditional probability of an event A ∈ ℱ, given an event B ∈ ℱ such that P{B} > 0, is defined as

(1.4.1)   P{A | B} = P{A ∩ B}/P{B}.

Notice first that P{· | B} is a probability function on ℱ. Indeed, for every A ∈ ℱ, 0 ≤ P{A | B} ≤ 1. Moreover, P{Ω | B} = 1 and if A1 and A2 are disjoint events in ℱ, then

(1.4.2)   P{A1 ∪ A2 | B} = P{A1 | B} + P{A2 | B}.

If P{B} > 0 and P{A} ≠ P{A | B}, we say that the events A and B are dependent. On the other hand, if P{A} = P{A | B}, we say that A and B are independent events. Notice that two events are independent if and only if

(1.4.3)   P{A ∩ B} = P{A} P{B}.

Given n events in ℱ, namely A1, …, An, we say that they are pairwise independent if P{Ai Aj} = P{Ai} P{Aj} for any i ≠ j. The events are said to be independent in triplets if

P{Ai Aj Ak} = P{Ai} P{Aj} P{Ak}

for any distinct i, j, k. Example 1.4 shows that pairwise independence does not imply independence in triplets.

Given n events A1, …, An of ℱ, we say that they are independent if, for any 2 ≤ k ≤ n and any k–tuple 1 ≤ i1 < i2 < ··· < ik ≤ n,

(1.4.4)   P{∩_{j=1}^{k} Aij} = Π_{j=1}^{k} P{Aij}.

Events in an infinite sequence {A1, A2, …} are said to be independent if {A1, …, An} are independent for each n ≥ 2. Given a sequence of events A1, A2, … of a σ–field ℱ, we have seen that

lim sup_{n→∞} An = ∩_{n≥1} ∪_{k≥n} Ak.

This event means that the points w in lim sup_{n→∞} An belong to infinitely many of the events {An}. Thus, the event lim sup_{n→∞} An is denoted also as {An, i.o.}, where i.o. stands for "infinitely often."

The following important theorem, known as the Borel–Cantelli Lemma, gives conditions under which P{An, i.o.} is either 0 or 1.

Theorem 1.4.1 (Borel–Cantelli) Let {An} be a sequence of sets in ℱ.

(i) If Σ_{n≥1} P{An} < ∞, then P{An, i.o.} = 0.
(ii) If Σ_{n≥1} P{An} = ∞ and {An} are independent, then P{An, i.o.} = 1.

Proof.   (i) Notice that {∪_{k≥n} Ak, n ≥ 1} is a decreasing sequence of events. Thus

P{An, i.o.} = P{∩_{n≥1} ∪_{k≥n} Ak} = lim_{n→∞} P{∪_{k≥n} Ak}.

But

P{∪_{k≥n} Ak} ≤ Σ_{k≥n} P{Ak}.

The assumption that Σ_{n≥1} P{An} < ∞ implies that lim_{n→∞} Σ_{k≥n} P{Ak} = 0.

(ii) Since A1, A2, … are independent, Ā1, Ā2, … are independent. This implies that

P{∩_{k≥n} Āk} = Π_{k≥n} (1 − P{Ak}) = exp(Σ_{k≥n} log(1 − P{Ak})).

If 0 < x ≤ 1 then log(1 − x) ≤ −x. Thus,

Π_{k≥n} (1 − P{Ak}) ≤ exp(−Σ_{k≥n} P{Ak}) = 0,

since Σ_{n≥1} P{An} = ∞. Thus P{∩_{k≥n} Āk} = 0 for all n ≥ 1. This implies that P{An, i.o.} = 1.        QED
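As a numerical illustration of the lemma (a simulation sketch, with hypothetical choices of the probabilities), take independent events An with P{An} = 1/n², whose probabilities are summable, against P{An} = 1/n, whose probabilities diverge; the occurrence patterns over a long horizon behave as parts (i) and (ii) predict.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000                        # finite horizon as a proxy for n -> infinity
n = np.arange(1, N + 1)

occ_i = rng.random(N) < 1.0 / n**2   # case (i): sum P{A_n} < infinity
occ_ii = rng.random(N) < 1.0 / n     # case (ii): sum P{A_n} = infinity, independent

print("case (i):  count =", occ_i.sum(), " last occurrence at n =", n[occ_i].max())
print("case (ii): count =", occ_ii.sum(), " last occurrence at n =", n[occ_ii].max())
```

In case (i) the occurrences stop early (only finitely many An occur), while in case (ii) occurrences keep appearing arbitrarily late.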

We conclude this section with the celebrated Bayes Theorem.

Let 𝒟 = {Bj, j ∈ J} be a partition of Ω, where J is an index set having a finite or countable number of elements. Let Bj ∈ ℱ and P{Bj} > 0 for all j ∈ J. Let A ∈ ℱ, P{A} > 0. We are interested in the conditional probabilities P{Bj | A}, j ∈ J.

Theorem 1.4.2 (Bayes).

(1.4.5)   P{Bj | A} = P{Bj} P{A | Bj} / Σ_{i∈J} P{Bi} P{A | Bi},   j ∈ J.

Proof.   Left as an exercise.        QED

Bayes Theorem is widely used in scientific inference. Examples of the application of Bayes Theorem are given in many elementary books. Advanced examples of Bayesian inference will be given in later chapters.
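As an elementary numerical sketch of (1.4.5) (all numbers below are hypothetical illustration values), consider a two–set partition {B, B̄} with a diagnostic–testing interpretation: B is a condition with prior probability 0.01, and A is a positive test result.

```python
p_B = 0.01             # prior P{B}
p_A_given_B = 0.99     # P{A | B}
p_A_given_notB = 0.05  # P{A | complement of B}

# Denominator of (1.4.5): total probability of A over the partition
p_A = p_B * p_A_given_B + (1 - p_B) * p_A_given_notB
p_B_given_A = p_B * p_A_given_B / p_A     # Bayes formula (1.4.5)
print(round(p_B_given_A, 4))              # about 0.1667
```

Despite the accurate test, the low prior keeps P{B | A} near 1/6, which is the typical lesson of such examples.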

1.5 RANDOM VARIABLES AND THEIR DISTRIBUTIONS

Random variables are finite real–valued functions on the sample space Ω, such that the inverse images of Borel sets on the real line are measurable subsets of Ω and can thus be assigned probabilities. The situation is simple if Ω contains only a finite or countably infinite number of points.

In the general case, Ω might contain uncountably many points. Even if Ω is the space of all infinite binary sequences w = (i1, i2, …), the number of points in Ω is uncountable. To make our theory rich enough, we will require the probability space to be (Ω, ℱ, P), where ℱ is a σ–field. A random variable X is a finite real–valued function on Ω. We wish to define the distribution function of X, on ℝ, as

(1.5.1)   FX(x) = P{w : X(w) ≤ x},   −∞ < x < ∞.

For this purpose, we must require that every Borel set on ℝ have a measurable inverse image with respect to X. More specifically, given (Ω, ℱ, P), let (ℝ, ℬ) be the Borel measurable space, where ℝ is the real line and ℬ the Borel σ–field of subsets of ℝ. A set B ⊂ ℝ is called a Borel set if B belongs to ℬ. Let X: Ω → ℝ. The inverse image of a Borel set B with respect to X is

(1.5.2)   X⁻¹(B) = {w : X(w) ∈ B}.

A function X: Ω → ℝ is called ℱ–measurable if X⁻¹(B) ∈ ℱ for all B ∈ ℬ. Thus, a random variable with respect to (Ω, ℱ, P) is an ℱ–measurable function on Ω. The class ℱX = {X⁻¹(B) : B ∈ ℬ} is also a σ–field, generated by the random variable X. Notice that ℱX ⊂ ℱ.

By definition, every random variable X has a distribution function FX. The probability measure PX{·} induced by X on (ℝ, ℬ) is

(1.5.3)   PX{B} = P{X⁻¹(B)},   B ∈ ℬ.

A distribution function FX is a real–valued function satisfying the properties

(i) lim_{x→−∞} FX(x) = 0;
(ii) lim_{x→∞} FX(x) = 1;
(iii) if x1 < x2 then FX(x1) ≤ FX(x2); and
(iv) lim_{ε↓0} FX(x + ε) = FX(x) and lim_{ε↓0} FX(x − ε) = FX(x−), for all −∞ < x < ∞.

Thus, a distribution function F is right–continuous.

Given a distribution function FX, we obtain from (1.5.1), for every −∞ < a < b < ∞,

(1.5.4)   P{w : a < X(w) ≤ b} = FX(b) − FX(a)

and

(1.5.5)   P{w : X(w) = a} = FX(a) − FX(a−).

Thus, if FX is continuous at a point x0, then P{w : X(w) = x0} = 0. If X is a random variable, then Y = g(X) is a random variable only if g is ℬ–(Borel) measurable, i.e., for any B ∈ ℬ, g⁻¹(B) ∈ ℬ. Thus, if Y = g(X), g is ℬ–measurable, and X is ℱ–measurable, then Y is also ℱ–measurable. The distribution function of Y is

(1.5.6)   FY(y) = P{w : g(X(w)) ≤ y} = PX{x : g(x) ≤ y}.

Any two random variables X, Y having the same distribution are called equivalent. We denote this by Y ~ X.

A distribution function F may have a countable number of distinct points of discontinuity. If x0 is a point of discontinuity, F(x0) − F(x0−) > 0. Between points of discontinuity, F is continuous. If F assumes a constant value between points of discontinuity (a step function), it is called discrete. Formally, let −∞ < x1 < x2 < ··· < ∞ be the points of discontinuity of F. Let IA(x) denote the indicator function of a set A, i.e.,

IA(x) = 1 if x ∈ A,   IA(x) = 0 otherwise.

Then a discrete F can be written as

(1.5.7)   F(x) = Σ_{j} pj I_{[xj,∞)}(x),   where pj = F(xj) − F(xj−).

Let μ1 and μ2 be measures on (ℝ, ℬ). We say that μ1 is absolutely continuous with respect to μ2, and write μ1 ≪ μ2, if B ∈ ℬ and μ2(B) = 0 imply μ1(B) = 0. Let λ denote the Lebesgue measure on (ℝ, ℬ). For every interval (a, b], −∞ < a < b < ∞, λ((a, b]) = b − a. The celebrated Radon–Nikodym Theorem (see Shiryayev, 1984, p. 194) states that if μ1 ≪ μ2 and μ1, μ2 are σ–finite measures on (ℝ, ℬ), then there exists a ℬ–measurable nonnegative function f(x) so that, for each B ∈ ℬ,

(1.5.8)   μ1(B) = ∫_{B} f(x) μ2(dx),

where the Lebesgue integral in (1.5.8) will be discussed later. In particular, if Pc is absolutely continuous with respect to the Lebesgue measure λ, then there exists a function f ≥ 0 so that

(1.5.9)   Pc(B) = ∫_{B} f(x) dx,   B ∈ ℬ.

Moreover,

(1.5.10)   ∫_{−∞}^{∞} f(x) dx = 1.

A distribution function F is called absolutely continuous if there exists a nonnegative function f such that

(1.5.11)   F(x) = ∫_{−∞}^{x} f(t) dt,   −∞ < x < ∞.

The function f, which can be represented for almost all x by the derivative of F, is called the probability density function (p.d.f.) corresponding to F.

If F is absolutely continuous, then f(x) = (d/dx)F(x) almost everywhere. The term "almost everywhere" or "almost all x" means for all x values, excluding perhaps a set N of Lebesgue measure zero. Moreover, the probability assigned to any interval (α, β], α < β, is

(1.5.12)   P{(α, β]} = F(β) − F(α) = ∫_{α}^{β} f(x) dx.

Due to the continuity of F we can also write

P{(α, β]} = P{[α, β]} = P{(α, β)} = P{[α, β)}.

Often the density functions f are Riemann integrable, in which case the above integrals are Riemann integrals. Otherwise, these are Lebesgue integrals, which are defined in the next section.

There are continuous distribution functions that are not absolutely continuous. Such distributions are called singular. An example of a singular distribution is the Cantor distribution (see Shiryayev, 1984, p. 155).

Finally, every distribution function F(x) is a mixture of the three types of distributions—discrete distributions Fd(·), absolutely continuous distributions Fac(·), and singular distributions Fs(·). That is, for some 0 ≤ p1, p2, p3 ≤ 1 such that p1 + p2 + p3 = 1,

F(x) = p1 Fd(x) + p2 Fac(x) + p3 Fs(x).

In this book we treat only mixtures of Fd(x) and Fac(x).

1.6 THE LEBESGUE AND STIELTJES INTEGRALS

1.6.1 General Definition of Expected Value: The Lebesgue Integral

Let (Ω, ℱ, P) be a probability space. If X is a random variable, we wish to define the integral

(1.6.1)   E{X} = ∫_{Ω} X(w) P(dw).

We define first E{X} for nonnegative random variables, i.e., X(w) ≥ 0 for all w ∈ Ω. Generally, X = X⁺ − X⁻, where X⁺(w) = max(0, X(w)) and X⁻(w) = −min(0, X(w)).

Given a nonnegative random variable X, we construct for a given finite integer n the events

A_{k,n} = {w : (k − 1)/2ⁿ ≤ X(w) < k/2ⁿ},   k = 1, …, n2ⁿ,

and

A_{n2ⁿ+1,n} = {w : X(w) ≥ n}.

These events form a partition of Ω. Let Xn, n ≥ 1, be the discrete random variable defined as

(1.6.2)   Xn(w) = Σ_{k=1}^{n2ⁿ} ((k − 1)/2ⁿ) I_{A_{k,n}}(w) + n I_{A_{n2ⁿ+1,n}}(w).

Notice that for each w, Xn(w) ≤ Xn+1(w) ≤ X(w) for all n. Also, if w ∈ A_{k,n}, k = 1, …, n2ⁿ, then |X(w) − Xn(w)| ≤ 2⁻ⁿ. Moreover, A_{n2ⁿ+1,n} ⊃ A_{(n+1)2ⁿ⁺¹+1,n+1} for all n ≥ 1. Thus

∩_{n≥1} A_{n2ⁿ+1,n} = {w : X(w) = ∞} = ∅.

Thus for all w ∈ Ω

(1.6.3)   lim_{n→∞} Xn(w) = X(w).

Now, for each discrete random variable Xn(w),

(1.6.4)   E{Xn} = Σ_{k=1}^{n2ⁿ} ((k − 1)/2ⁿ) P{A_{k,n}} + n P{A_{n2ⁿ+1,n}}.

Obviously E{Xn} ≤ n, and E{Xn+1} ≥ E{Xn}. Thus, lim_{n→∞} E{Xn} exists (it might be +∞). Accordingly, the Lebesgue integral of X is defined as

(1.6.5)   E{X} = ∫_{Ω} X(w) P(dw) = lim_{n→∞} E{Xn}.
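The construction (1.6.2)–(1.6.5) is easy to mimic numerically. The sketch below (Python; the exponential distribution is a hypothetical choice with E{X} = 1) bins X on the dyadic grid, truncates at level n, and shows E{Xn} increasing toward E{X}.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=200_000)   # nonnegative X with E{X} = 1

def dyadic_mean(x, n):
    # X_n of (1.6.2): floor X to the grid (k - 1)/2**n below level n, and X_n = n where X >= n
    xn = np.where(x >= n, n, np.floor(x * 2**n) / 2**n)
    return xn.mean()       # Monte Carlo estimate of E{X_n}

for n in (1, 2, 4, 8):
    print(n, round(dyadic_mean(x, n), 4))      # nondecreasing, approaching 1
```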

The Lebesgue integral may exist when the Riemann integral does not. For example, consider the probability space (Ω, ℱ, P), where Ω = {x : 0 ≤ x ≤ 1}, ℱ is the Borel σ–field on Ω, and P is the Lebesgue measure on [0, 1]. Define

f(x) = 1 if x is rational,   f(x) = 0 if x is irrational.

Let B0 = {x : 0 ≤ x ≤ 1, f(x) = 0} and B1 = [0, 1] − B0. The Lebesgue integral of f is

∫_{[0,1]} f(x) P{dx} = 0 · P{B0} + 1 · P{B1} = 0,

since the Lebesgue measure of B1 is zero. On the other hand, the Riemann integral of f(x) does not exist. Notice that, contrary to the construction of the Riemann integral, the Lebesgue integral ∫ f(x) P{dx} of a nonnegative function f is obtained by partitioning the range of the function f into subintervals ℐ_{n,j}, constructing the discrete functions fn(x) = Σ_{j} f_{n,j} I{x ∈ ℐ_{n,j}}, where f_{n,j} = inf{f(x) : x ∈ ℐ_{n,j}}, and evaluating E{fn} = Σ_{j} f_{n,j} P{x ∈ ℐ_{n,j}}. The sequence {E{fn}, n ≥ 1} is nondecreasing, and its limit exists (it might be +∞). Generally, we define

(1.6.6)   E{X} = E{X⁺} − E{X⁻}

if either E{X⁺} < ∞ or E{X⁻} < ∞.

If E{X⁺} = ∞ and E{X⁻} = ∞, we say that E{X} does not exist. As a special case, if F is absolutely continuous with density f, then

E{X} = ∫_{−∞}^{∞} x f(x) dx,

provided ∫_{−∞}^{∞} |x| f(x) dx < ∞. If F is discrete, then

E{X} = Σ_{j} xj pj,

where {xj} are the jump points of F and pj = P{X = xj}, provided the sum is absolutely convergent.

From the definition (1.6.4), it is obvious that if P{X(w) ≥ 0} = 1 then E{X} ≥ 0. This immediately implies that if X and Y are two random variables such that P{w : X(w) ≥ Y(w)} = 1, then E{X − Y} ≥ 0. Also, if E{X} exists then, for all A ∈ ℱ,

E{X IA} = ∫_{A} X(w) P(dw)

and E{X IA} exists. If E{X} is finite, E{X IA} is also finite. From the definition of expectation we immediately obtain that, for any finite constant c,

(1.6.7)   E{cX} = c E{X}.

Equation (1.6.7) implies that the expected value is a linear functional, i.e., if X1, …, Xn are random variables on (Ω, ℱ, P) and β0, β1, …, βn are finite constants then, if all expectations exist,

(1.6.8)   E{β0 + Σ_{j=1}^{n} βj Xj} = β0 + Σ_{j=1}^{n} βj E{Xj}.

We present now a few basic theorems on the convergence of the expectations of sequences of random variables.

Theorem 1.6.1 (Monotone Convergence) Let {Xn} be a monotone sequence of random variables and Y a random variable.

(i) Suppose that Xn(w) ↑ X(w), Xn(w) ≥ Y(w) for all n and all w ∈ Ω, and E{Y} > −∞. Then

lim_{n→∞} E{Xn} = E{X}.

(ii) If Xn(w) ↓ X(w), Xn(w) ≤ Y(w) for all n and all w ∈ Ω, and E{Y} < ∞, then

lim_{n→∞} E{Xn} = E{X}.

Proof.   See Shiryayev (1984, p. 184).        QED

Corollary 1.6.1. If X1, X2, … are nonnegative random variables, then

(1.6.9)   E{Σ_{n≥1} Xn} = Σ_{n≥1} E{Xn}.

Theorem 1.6.2 (Fatou) Let Xn, n ≥ 1, and Y be random variables.

(i) If Xn(w) ≥ Y(w), n ≥ 1, for each w and E{Y} > −∞, then

E{lim inf_{n→∞} Xn} ≤ lim inf_{n→∞} E{Xn};

(ii) if Xn(w) ≤ Y(w), n ≥ 1, for each w and E{Y} < ∞, then

lim sup_{n→∞} E{Xn} ≤ E{lim sup_{n→∞} Xn};

(iii) if |Xn(w)| ≤ Y(w) for each w, and E{Y} < ∞, then

(1.6.10)   E{lim inf_{n→∞} Xn} ≤ lim inf_{n→∞} E{Xn} ≤ lim sup_{n→∞} E{Xn} ≤ E{lim sup_{n→∞} Xn}.

Proof.   (i) Notice that

lim inf_{n→∞} Xn = lim_{n→∞} inf_{m≥n} Xm.

The sequence Zn(w) = inf_{m≥n} Xm(w), n ≥ 1, is monotonically increasing for each w, and Zn(w) ≥ Y(w), n ≥ 1. Hence, by Theorem 1.6.1,

lim_{n→∞} E{Zn} = E{lim inf_{n→∞} Xn}.

Or, since Zn ≤ Xn for all n ≥ 1,

E{lim inf_{n→∞} Xn} = lim_{n→∞} E{Zn} ≤ lim inf_{n→∞} E{Xn}.

The proof of (ii) is obtained by defining Zn(w) = sup_{m≥n} Xm(w) and applying part (i). Part (iii) is a result of (i) and (ii).        QED

Theorem 1.6.3 (Lebesgue Dominated Convergence) Let Y, X, Xn, n ≥ 1, be random variables such that |Xn(w)| ≤ Y(w), n ≥ 1, for almost all w, and E{Y} < ∞. Assume also that P{lim_{n→∞} Xn = X} = 1. Then E{|X|} < ∞ and

(1.6.11)   lim_{n→∞} E{Xn} = E{X}

and

(1.6.12)   lim_{n→∞} E{|Xn − X|} = 0.

Proof.   By Fatou's Theorem (Theorem 1.6.2),

E{lim inf_{n→∞} Xn} ≤ lim inf_{n→∞} E{Xn} ≤ lim sup_{n→∞} E{Xn} ≤ E{lim sup_{n→∞} Xn}.

But since lim_{n→∞} Xn(w) = X(w) with probability 1,

E{lim inf_{n→∞} Xn} = E{lim sup_{n→∞} Xn} = E{X}.

Moreover, |X(w)| ≤ Y(w) for almost all w (with probability 1). Hence E{|X|} < ∞. Finally, since |Xn(w) − X(w)| ≤ 2Y(w) with probability 1,

lim_{n→∞} E{|Xn − X|} = E{lim_{n→∞} |Xn − X|} = 0.        QED

We conclude this section with a theorem on change of variables under Lebesgue integrals.

Theorem 1.6.4 Let X be a random variable with respect to (Ω, ℱ, P). Let g: ℝ → ℝ be a Borel measurable function. Then, for each B ∈ ℬ,

(1.6.13)   ∫_{X⁻¹(B)} g(X(w)) P(dw) = ∫_{B} g(x) PX(dx).

The proof of the theorem is based on the following steps.

1. If A ∈ ℬ and g(x) = IA(x), then both sides of (1.6.13) equal PX{A ∩ B}.
2. Show that Equation (1.6.13) holds for simple random variables.
3. Follow the steps of the definition of the Lebesgue integral.

1.6.2 The Stieltjes–Riemann Integral

Let g be a function of a real variable and F a distribution function. Let (α, β] be a half–closed interval. Let

α = x0 < x1 < ··· < xn = β

be a partition of (α, β] into n subintervals (xi−1, xi], i = 1, …, n. In each subinterval choose a point x′i, xi−1 < x′i ≤ xi, and consider the sum

(1.6.14)   Sn = Σ_{i=1}^{n} g(x′i)[F(xi) − F(xi−1)].

If, as n → ∞, max_{1≤i≤n} |xi − xi−1| → 0 and lim_{n→∞} Sn exists (finite) independently of the partitions, then the limit is called the Stieltjes–Riemann integral of g with respect to F. We denote this integral as

∫_{α}^{β} g(x) dF(x).

This integral has the usual linear properties, i.e.,

(i) ∫_{α}^{β} c g(x) dF(x) = c ∫_{α}^{β} g(x) dF(x), for any constant c;

(ii)

(1.6.15)   ∫_{α}^{β} (g1(x) + g2(x)) dF(x) = ∫_{α}^{β} g1(x) dF(x) + ∫_{α}^{β} g2(x) dF(x);

and

(iii) ∫_{α}^{β} g(x) d(γF1(x) + δF2(x)) = γ ∫_{α}^{β} g(x) dF1(x) + δ ∫_{α}^{β} g(x) dF2(x).

One can integrate by parts, if all expressions exist, according to the formula

(1.6.16)   ∫_{α}^{β} g(x) dF(x) = g(β)F(β) − g(α)F(α) − ∫_{α}^{β} g′(x)F(x) dx,

where g′(x) is the derivative of g(x). If F is strictly discrete, with jump points −∞ < ξ1 < ξ2 < ··· < ∞, then

(1.6.17)   ∫_{−∞}^{∞} g(x) dF(x) = Σ_{j} g(ξj) pj,

where pj = F(ξj) − F(ξj−), j = 1, 2, …. If F is absolutely continuous, then at almost all points,

F(x + dx) − F(x) = f(x) dx + o(dx),

as dx → 0. Thus, in the absolutely continuous case,

(1.6.18)   ∫_{α}^{β} g(x) dF(x) = ∫_{α}^{β} g(x) f(x) dx.

Finally, the improper Stieltjes–Riemann integral, if it exists, is

(1.6.19)   ∫_{−∞}^{∞} g(x) dF(x) = lim_{α→−∞} lim_{β→∞} ∫_{α}^{β} g(x) dF(x).

If B is a set obtained by unions and complementations of a sequence of intervals, we can write, by setting g(x) = I{x ∈ B},

(1.6.20)   P{B} = ∫_{−∞}^{∞} I{x ∈ B} dF(x),

where F is either discrete or absolutely continuous.
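A direct numerical evaluation of the sum (1.6.14) is also instructive. The sketch below (Python; the choices g(x) = x² and F(x) = 1 − e^{−x} are hypothetical) computes Sn on a fine partition and recovers ∫ g(x) f(x) dx as in (1.6.18).

```python
import numpy as np

g = lambda x: x**2
F = lambda x: 1.0 - np.exp(-x)        # standard exponential distribution function

alpha, beta, n = 0.0, 30.0, 200_000
x = np.linspace(alpha, beta, n + 1)
S_n = np.sum(g(x[1:]) * (F(x[1:]) - F(x[:-1])))   # (1.6.14) with x_i' = x_i

print(round(S_n, 4))   # close to 2, the second moment of Exp(1)
```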

1.6.3 Mixtures of Discrete and Absolutely Continuous Distributions

Let Fd be a discrete distribution function and let Fac be an absolutely continuous distribution function. Then, for every α, 0 ≤ α ≤ 1,

(1.6.21)   F(x) = α Fd(x) + (1 − α) Fac(x)

is also a distribution function, which is a mixture of the two types. Thus, for such mixtures, if −∞ < ξ1 < ξ2 < ··· < ∞ are the jump points of Fd, then for every −∞ < γ ≤ δ < ∞ and B = (γ, δ],

(1.6.22)   P{B} = α Σ_{j: γ<ξj≤δ} pj + (1 − α) ∫_{γ}^{δ} fac(x) dx,

where pj = Fd(ξj) − Fd(ξj−). Moreover, if B⁺ = [γ, δ] then

P{B⁺} = P{B} + α(Fd(γ) − Fd(γ−)).

The expected value of X, when F(x) = p Fd(x) + (1 − p) Fac(x), is

(1.6.23)   E{X} = p Σ_{j} ξj fd(ξj) + (1 − p) ∫_{−∞}^{∞} x fac(x) dx,

where {ξj} is the set of jump points of Fd; fd and fac are the corresponding p.d.f.s. We assume here that the sum and the integral are absolutely convergent.

1.6.4 Quantiles of Distributions

The pquantiles or fractiles of distribution functions are inverse points of the distributions. More specifically, the p–quantile of a distribution function F, designated by xp or F−1(p), is the smallest value of x at which F(x) is greater or equal to p, i.e.,

(1.6.24) numbered Display Equation

The inverse function defined in this fashion is unique. The median of a distribution, x.5, is an important parameter characterizing the location of the distribution. The lower and upper quartiles are the .25– and .75–quantiles. The difference between these quantiles, RQ = x.75x.25, is called the interquartile range. It serves as one of the measures of dispersion of distribution functions.
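The infimum in (1.6.24) matters when F is a step function. A small sketch (Python; the four–point distribution is hypothetical) of the p–quantile of a discrete distribution:

```python
import bisect

support = [0, 1, 2, 3]            # hypothetical jump points
probs   = [0.2, 0.3, 0.3, 0.2]
cdf = [sum(probs[:j + 1]) for j in range(len(probs))]   # F at the jump points

def quantile(p):
    # smallest x in the support with F(x) >= p, per (1.6.24)
    return support[bisect.bisect_left(cdf, p)]

print(quantile(0.5))                       # 1, since F(1) = 0.5
print(quantile(0.75))                      # 2, since F(2) = 0.8
print(quantile(0.75) - quantile(0.25))     # interquartile range R_Q
```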

1.6.5 Transformations

From the distribution function F(x) = α Fd(x) + (1−α) Fac(x), 0 ≤ α ≤ 1, we can derive the distribution function of a transformed random variable Y = g(X), which is

(1.6.25)   FY(y) = α ∫ I{x ∈ By} dFd(x) + (1 − α) ∫ I{x ∈ By} dFac(x),

where

By = {x : g(x) ≤ y}.

In particular, if F is absolutely continuous and if g is a strictly increasing differentiable function, then the p.d.f. of Y, h(y), is

(1.6.26)   h(y) = f(g⁻¹(y)) (d/dy) g⁻¹(y),

where g⁻¹(y) is the inverse function. If g′(x) < 0 for all x, then

(1.6.27)   h(y) = −f(g⁻¹(y)) (d/dy) g⁻¹(y) = f(g⁻¹(y)) |(d/dy) g⁻¹(y)|.

Suppose that X is a continuous random variable with p.d.f. f(x). Let g(x) be a differentiable function that is not necessarily one–to–one, like g(x) = x². Excluding cases where g(x) is constant over an interval, like the indicator function, let m(y) denote the number of roots of the equation g(x) = y, and let ξj(y), j = 1, …, m(y), denote these roots. Then the p.d.f. of Y = g(X) is

(1.6.28)   h(y) = Σ_{j=1}^{m(y)} f(ξj(y)) |ξ′j(y)|

if m(y) > 0, and zero otherwise.
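Formula (1.6.28) can be checked by simulation. In the sketch below (Python; taking X standard normal is a hypothetical choice), g(x) = x² has the two roots ξ1(y) = √y and ξ2(y) = −√y, each with |ξ′j(y)| = 1/(2√y), and the resulting density agrees with a Monte Carlo estimate.

```python
import numpy as np

rng = np.random.default_rng(2)

def f(x):                       # p.d.f. of X ~ N(0, 1)
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def h(y):                       # (1.6.28) for g(x) = x**2, y > 0
    r = np.sqrt(y)
    return (f(r) + f(-r)) / (2 * r)

y = rng.standard_normal(1_000_000) ** 2
mc = np.mean((y >= 0.5) & (y < 0.6)) / 0.1   # empirical density on [0.5, 0.6)
print(round(h(0.55), 4), round(mc, 4))       # the two values nearly agree
```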

1.7 JOINT DISTRIBUTIONS, CONDITIONAL DISTRIBUTIONS AND INDEPENDENCE

1.7.1 Joint Distributions

Let (X1, …, Xk) be a vector of k random variables defined on the same probability space. These random variables represent variables observed in the same experiment. The joint distribution function of these random variables is a real–valued function F of k real arguments (ξ1, …, ξk) such that

(1.7.1)   F(ξ1, …, ξk) = P{X1 ≤ ξ1, …, Xk ≤ ξk}.

The joint distribution of two random variables is called a bivariate distribution function.

Every bivariate distribution function F has the following properties:

(1.7.2)
(i) lim_{ξ1→−∞} F(ξ1, ξ2) = lim_{ξ2→−∞} F(ξ1, ξ2) = 0;
(ii) lim_{ξ1→∞, ξ2→∞} F(ξ1, ξ2) = 1;
(iii) lim_{ε↓0} F(ξ1 + ε, ξ2) = lim_{ε↓0} F(ξ1, ξ2 + ε) = F(ξ1, ξ2);
(iv) for every a1 < b1 and a2 < b2, F(b1, b2) − F(a1, b2) − F(b1, a2) + F(a1, a2) ≥ 0.

Property (iii) is the right continuity of F(ξ1, ξ2). Property (iv) means that the probability of every rectangle is nonnegative. Moreover, the total increase of F(ξ1, ξ2) is from 0 to 1. Similar properties are required in cases of a larger number of variables.

Given a bivariate distribution function F, the univariate distributions of X1 and X2 are F1 and F2, where

(1.7.3)   F1(ξ1) = lim_{ξ2→∞} F(ξ1, ξ2),   F2(ξ2) = lim_{ξ1→∞} F(ξ1, ξ2).

F1 and F2 are called the marginal distributions of X1 and X2, respectively. In cases of joint distributions of three variables, we can distinguish between three marginal bivariate distributions and three marginal univariate distributions. As in the univariate case, multivariate distributions are either discrete, absolutely continuous, singular, or mixtures of the three main types. In the discrete case there are at most a countable number of points {(ξ1(j), …, ξk(j)), j = 1, 2, …} on which the distribution concentrates. In this case the joint probability function is

(1.7.4)   p(x1, …, xk) = P{X1 = x1, …, Xk = xk}.

Such a discrete p.d.f. can be written as

p(x1, …, xk) = Σ_{j} pj I{x1 = ξ1(j), …, xk = ξk(j)},

where pj = P{X1 = ξ1(j), …, Xk = ξk(j)}.

In the absolutely continuous case there exists a nonnegative function f(x1, …, xk) such that

(1.7.5)   F(ξ1, …, ξk) = ∫_{−∞}^{ξ1} ··· ∫_{−∞}^{ξk} f(x1, …, xk) dx1 ··· dxk.

The function f(x1, …, xk) is called the joint density function.

The marginal probability (or density) functions of single variables or of a subvector of variables can be obtained by summing (in the discrete case) or integrating (in the absolutely continuous case) the joint probability (density) functions with respect to the variables that are not under consideration, over their range of variation.

Although the presentation here is in terms of k discrete or k absolutely continuous random variables, the joint distributions can involve some discrete and some continuous variables, or mixtures.

If X1 has an absolutely continuous marginal distribution and X2 is discrete, we can introduce the function N(B) on ℬ, which counts the number of jump points of X2 that belong to B. N(B) is a σ–finite measure. Let λ(B) be the Lebesgue measure on ℬ. Consider the σ–finite measure on ℬ⁽²⁾, μ(B1 × B2) = λ(B1)N(B2). If X1 is absolutely continuous and X2 discrete, their joint probability measure PX is absolutely continuous with respect to μ. There exists then a nonnegative function fX such that

PX{B1 × B2} = Σ_{ξj ∈ B2} ∫_{B1} fX(x1, ξj) dx1,

where ξ1, ξ2, … are the jump points of X2. The function fX is a joint p.d.f. of (X1, X2) with respect to μ. The joint p.d.f. fX is positive only at the jump points of X2.

If X1, …, Xk have a joint distribution with p.d.f. f(x1, …, xk), the expected value of a function g(X1, …, Xk) is defined as

(1.7.6)   E{g(X1, …, Xk)} = ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} g(x1, …, xk) dF(x1, …, xk).

We have used here the conventional notation for Stieltjes integrals.

Notice that if (X, Y) have a joint distribution function F(x, y), X is discrete with jump points of F1(x) at ξ1, ξ2, …, and Y is absolutely continuous, then, as in the previous example,

E{g(X, Y)} = Σ_{j} ∫_{−∞}^{∞} g(ξj, y) f(ξj, y) dy,

where f(x, y) is the joint p.d.f. A similar formula holds for the case of X absolutely continuous and Y discrete.

1.7.2 Conditional Expectations: General Definition

Let X(w) ≥ 0, for all w ∈ Ω, be a random variable with respect to (Ω, ℱ, P). Consider a σ–field 𝒢, 𝒢 ⊂ ℱ. The conditional expectation of X given 𝒢 is defined as a 𝒢–measurable random variable E{X | 𝒢} satisfying

(1.7.7)   ∫_{A} X(w) P(dw) = ∫_{A} E{X | 𝒢}(w) P(dw)

for all A ∈ 𝒢. Generally, E{X | 𝒢} is defined if min{E{X⁺ | 𝒢}, E{X⁻ | 𝒢}} < ∞, and then E{X | 𝒢} = E{X⁺ | 𝒢} − E{X⁻ | 𝒢}. To see that such conditional expectations exist, where X(w) ≥ 0 for all w, consider the σ–finite measure on 𝒢,

(1.7.8)   Q(A) = ∫_{A} X(w) P(dw),   A ∈ 𝒢.

Obviously Q ≪ P and, by the Radon–Nikodym Theorem, there exists a nonnegative, 𝒢–measurable random variable E{X | 𝒢} such that

(1.7.9)   Q(A) = ∫_{A} E{X | 𝒢}(w) P(dw),   A ∈ 𝒢.

According to the Radon–Nikodym Theorem, E{X | 𝒢} is determined only up to a set of P–measure zero.

If B ∈ ℱ and X(w) = IB(w), then E{X | 𝒢} = P{B | 𝒢} and, according to (1.7.7),

(1.7.10)   ∫_{A} P{B | 𝒢}(w) P(dw) = P{A ∩ B},   A ∈ 𝒢.

Notice also that if X is 𝒢–measurable, then X = E{X | 𝒢} with probability 1.

On the other hand, if 𝒢 = {∅, Ω} is the trivial algebra, then E{X | 𝒢} = E{X} with probability 1.

From the definition (1.7.7), since Ω ∈ 𝒢,

E{E{X | 𝒢}} = ∫_{Ω} E{X | 𝒢}(w) P(dw) = ∫_{Ω} X(w) P(dw) = E{X}.

This is the law of iterated expectation; namely, for all 𝒢 ⊂ ℱ,

(1.7.11)   E{E{X | 𝒢}} = E{X}.

Furthermore, if X and Y are two random variables on (Ω, ℱ, P), the collection of all sets {Y⁻¹(B), B ∈ ℬ} is the σ–field generated by Y. Let ℱY denote this σ–field. Since Y is a random variable, ℱY ⊂ ℱ. We define

(1.7.12)   E{X | Y} = E{X | ℱY}.

Let y0 be such that fY(y0) > 0.

Consider the ℱY–measurable set Aδ = {w : y0 < Y(w) ≤ y0 + δ}. According to (1.7.7),

(1.7.13)   ∫_{Aδ} X(w) P(dw) = ∫_{Aδ} E{X | Y}(w) P(dw).

The left–hand side of (1.7.13) is, if E{|X|} < ∞,

∫_{Aδ} X(w) P(dw) = ∫_{−∞}^{∞} x ∫_{y0}^{y0+δ} f(x, y) dy dx = δ ∫_{−∞}^{∞} x f(x, y0) dx + o(δ),

where lim_{δ→0} o(δ)/δ = 0. The right–hand side of (1.7.13) is

∫_{Aδ} E{X | Y}(w) P(dw) = ∫_{y0}^{y0+δ} E{X | Y = y} fY(y) dy = δ E{X | Y = y0} fY(y0) + o(δ).

Dividing both sides of (1.7.13) by δ fY(y0) and letting δ → 0, we obtain that

E{X | Y = y0} = (1/fY(y0)) ∫_{−∞}^{∞} x f(x, y0) dx.

We therefore define, for fY(y0) > 0,

(1.7.14)   E{X | Y = y0} = (1/fY(y0)) ∫_{−∞}^{∞} x f(x, y0) dx.

More generally, for k > 2, let f(x1, …, xk) denote the joint p.d.f. of (X1, …, Xk). Let 1 ≤ r < k and let g(x1, …, xr) denote the marginal joint p.d.f. of (X1, …, Xr). Suppose that (ξ1, …, ξr) is a point at which g(ξ1, …, ξr) > 0. The conditional p.d.f. of (Xr+1, …, Xk) given {X1 = ξ1, …, Xr = ξr} is defined as

(1.7.15)   f(xr+1, …, xk | ξ1, …, ξr) = f(ξ1, …, ξr, xr+1, …, xk)/g(ξ1, …, ξr).

We remark that conditional distribution functions are not defined at points (ξ1, …, ξr) where g(ξ1, …, ξr) = 0. However, it is easy to verify that the probability associated with this set of points is zero. Thus, the definition presented here is sufficiently general for statistical purposes. Notice that f(xr+1, …, xk | ξ1, …, ξr) is, for a fixed point (ξ1, …, ξr) at which it is well defined, a nonnegative function of (xr+1, …, xk) and that

∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} f(xr+1, …, xk | ξ1, …, ξr) dxr+1 ··· dxk = 1.

Thus, f(xr+1, …, xk | ξ1, …, ξr) is indeed a joint p.d.f. of (Xr+1, …, Xk). The point (ξ1, …, ξr) can be considered a parameter of the conditional distribution.

If φ(Xr+1, …, Xk) is an (integrable) function of (Xr+1, …, Xk), the conditional expectation of φ(Xr+1, …, Xk) given {X1 = ξ1, …, Xr = ξr} is

(1.7.16)   E{φ(Xr+1, …, Xk) | ξ1, …, ξr} = ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} φ(xr+1, …, xk) f(xr+1, …, xk | ξ1, …, ξr) dxr+1 ··· dxk.

This conditional expectation exists if the integral is absolutely convergent.

1.7.3 Independence

Random variables X1, …, Xn on the same probability space are called mutually independent if, for any Borel sets B1, …, Bn,

(1.7.17)   P{X1 ∈ B1, …, Xn ∈ Bn} = Π_{i=1}^{n} P{Xi ∈ Bi}.

Accordingly, the joint distribution function of any k–tuple (Xi1, …, Xik) is the product of their marginal distributions. In particular,

(1.7.18)   F(x1, …, xn) = Π_{i=1}^{n} FXi(xi).

Equation (1.7.18) implies that if X1, …, Xn have a joint p.d.f. fX(x1, …, xn) and if they are independent, then

(1.7.19)   fX(x1, …, xn) = Π_{i=1}^{n} fXi(xi).

Moreover, if g(X1, …, Xn) = Π_{j=1}^{n} gj(Xj), where g(x1, …, xn) is ℬ⁽ⁿ⁾–measurable and the gj(x) are ℬ–measurable, then under independence

(1.7.20)   E{Π_{j=1}^{n} gj(Xj)} = Π_{j=1}^{n} E{gj(Xj)}.

Probability models with independence structure play an important role in statistical theory. From (1.7.12) and (1.7.17), we conclude that if X(r) = (X1, …, Xr) and Y(r) = (Xr+1, …, Xn) are independent subvectors, then the conditional distribution of X(r) given Y(r) is independent of Y(r), i.e.,

(1.7.21)   F_{X(r)}(x1, …, xr | Y(r)) = F_{X(r)}(x1, …, xr),

with probability one.

1.8 MOMENTS AND RELATED FUNCTIONALS

A moment of order r, r = 1, 2, …, of a distribution F(x) is

(1.8.1)   μr = E{X^r} = ∫_{−∞}^{∞} x^r dF(x).

The moments of Y = X − μ1 are called central moments and those of |X| are called absolute moments. It is simple to prove that the existence of an absolute moment of order r, r > 0, implies the existence of all moments of order s, 0 < s ≤ r (see Section 1.13.3).

Let μ*r = E{(X − μ1)^r}, r = 1, 2, …, denote the rth central moment of a distribution. From the binomial expansion and the linear properties of the expectation operator, we obtain the relationship between the moments (about the origin) μr and the central moments μ*r:

(1.8.2)   μ*r = Σ_{j=0}^{r} (r choose j) (−1)^{r−j} μj μ1^{r−j},

where μ0 ≡ 1.

A distribution function F is called symmetric about a point ξ0 if its p.d.f. is symmetric about ξ0, i.e.,

f(ξ0 + x) = f(ξ0 − x),   for all x.

From this definition we immediately obtain the following results.

(i) If F is symmetric about ξ0 and E{|X|} < ∞, then ξ0 = E{X}.
(ii) If F is symmetric, then all central moments of odd order are zero, i.e., E{(X − E{X})^{2m+1}} = 0, m = 0, 1, …, provided E{|X|^{2m+1}} < ∞.

The central moment of the second order occupies a central role in the theory of statistics and is called the variance of X. The variance is denoted by V{X}. The square–root of the variance, called the standard deviation, is a measure of dispersion around the expected value. We denote the standard deviation by σ. The variance of X is equal to

(1.8.3)   V{X} = E{(X − μ1)²} = μ*2 = μ2 − μ1².

The variance is always nonnegative, and hence, for every distribution having a finite second moment, E{X²} ≥ (E{X})². One can easily verify from the definition that if X is a random variable and a and b are constants, then V{a + bX} = b²V{X}.

The variance is equal to zero if and only if the distribution function is concentrated at one point (a degenerate distribution).

A famous inequality, called the Chebychev inequality, relates the probability that X concentrates around its mean to the standard deviation σ.

Theorem 1.8.1 (Chebychev) If FX has a finite standard deviation σ, then, for every a > 0,

(1.8.4)   P{|X − μ| ≥ aσ} ≤ 1/a²,

where μ = E{X}.

Proof.

(1.8.5)   σ² = ∫_{−∞}^{∞} (x − μ)² dFX(x) ≥ ∫_{|x−μ|≥aσ} (x − μ)² dFX(x) ≥ a²σ² P{|X − μ| ≥ aσ}.

Hence,

P{|X − μ| ≥ aσ} ≤ 1/a².        QED

Notice that in the proof of the theorem we used the Riemann–Stieltjes integral. The theorem is true for any type of distribution for which 0 ≤ σ < ∞. The Chebychev inequality is a crude inequality. Various types of better inequalities are available under additional assumptions (see Zelen and Severo, 1968; Rohatgi, 1976, p. 102).
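An empirical comparison makes the crudeness visible. The sketch below (Python; Exp(1) samples, for which μ = σ = 1, are a hypothetical test case) contrasts the exact tail frequency with the bound 1/a²:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(size=1_000_000)   # Exp(1): mu = 1, sigma = 1
mu, sigma = 1.0, 1.0

for a in (1.5, 2.0, 3.0):
    tail = np.mean(np.abs(x - mu) >= a * sigma)
    print(a, round(tail, 4), "<=", round(1 / a**2, 4))   # empirical tail vs bound
```

The empirical tails are far below the Chebychev bound, as expected for a light–tailed distribution.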

The moment generating function (m.g.f.) of a random variable X, denoted by M, is defined as

(1.8.6)   M(t) = E{e^{tX}},

where t is such that M(t) < ∞. Obviously, at t = 0, M(0) = 1. However, M(t) may not exist for t ≠ 0. Assume that M(t) exists for all t in some interval (a, b), a < 0 < b. Then there is a one–to–one correspondence between the distribution function F and the moment generating function M. M is analytic on (a, b) and can be differentiated under the expectation integral. Thus

(1.8.7)   M^{(r)}(t) = E{X^r e^{tX}},   r = 1, 2, ….

Under this assumption, the rth derivative of M(t) evaluated at t = 0 yields the moment of order r.

To overcome the problem of M being undefined in certain cases, it is useful to use the characteristic function

(1.8.8)   φ(t) = E{e^{itX}},

where i = √(−1). The characteristic function exists for all t since

(1.8.9)   |φ(t)| ≤ E{|e^{itX}|} = 1.

Indeed, |e^{itx}| = 1 for all x and all t.

If X assumes nonnegative integer values, it is often useful to use the probability generating function (p.g.f.)

(1.8.10)   G(t) = E{t^X} = Σ_{j≥0} pj t^j,

which is convergent if |t| ≤ 1. Moreover, given the p.g.f. of a nonnegative integer–valued random variable X, its p.d.f. can be obtained by the formula

(1.8.11)   pj = G^{(j)}(0)/j!,   j = 0, 1, ….

The logarithm of the moment generating function is called the cumulants generating function. We denote this generating function by K. K exists for all t for which M is finite. Both M and K are analytic functions in the interior of their domains of convergence. Thus we can write, for t close to zero,

(1.8.12)   K(t) = log M(t) = Σ_{j≥1} κj t^j/j!.

The coefficients {κj} are called cumulants. Notice that κ0 = 0, and κj, j ≥ 1, can be obtained by differentiating K(t) j times and setting t = 0. Generally, the relationships between the cumulants and the moments of a distribution are, for j = 1, …, 4,

(1.8.13)   κ1 = μ1,   κ2 = μ*2,   κ3 = μ*3,   κ4 = μ*4 − 3(μ*2)².
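The relations (1.8.13) can be verified numerically. In the sketch below (Python; Poisson(λ) is a hypothetical example, chosen because all its cumulants equal λ), the raw moments are computed by direct summation and converted to cumulants via (1.8.2) and (1.8.13).

```python
import math

lam = 2.5    # hypothetical Poisson rate; every cumulant of Poisson(lam) equals lam

def raw_moments(rmax, terms=200):
    # mu_r = sum over k of k**r * P{X = k}, with the Poisson pmf updated recursively
    pmf, mom = math.exp(-lam), [0.0] * (rmax + 1)
    for k in range(terms):
        for r in range(rmax + 1):
            mom[r] += k**r * pmf
        pmf *= lam / (k + 1)
    return mom

m = raw_moments(4)                                     # mu_0, ..., mu_4
c = [sum(math.comb(r, j) * (-m[1])**(r - j) * m[j]     # central moments via (1.8.2)
         for j in range(r + 1)) for r in range(5)]

k1, k2, k3, k4 = m[1], c[2], c[3], c[4] - 3 * c[2]**2  # (1.8.13)
print([round(k, 6) for k in (k1, k2, k3, k4)])         # all approximately 2.5
```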

The following two indices,

(1.8.14)   β1 = μ*3/σ³

and

(1.8.15)   β2 = μ*4/σ⁴,

where σ² = μ*2 is the variance, are called coefficients of skewness (asymmetry) and kurtosis (steepness), respectively. If the distribution is symmetric, then β1 = 0. If β1 > 0 we say that the distribution is positively skewed; if β1 < 0, it is negatively skewed. If β2 > 3 we say that the distribution is steep, and if β2 < 3 we say that the distribution is flat.

The following equation is called the law of total variance.

If E{X²} < ∞, then

(1.8.16)   V{X} = E{V{X | Y}} + V{E{X | Y}},

where V{X | Y} denotes the conditional variance of X given Y.

It is often the case that it is easier to find the conditional mean and variance, E{X | Y} and V{X | Y}, than to find E{X} and V{X} directly. In such cases, formula (1.8.16) becomes very handy.
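A quick simulation sketch (Python; the hierarchical model X | Y ~ N(Y, 1) with Y ~ Exp(1) is a hypothetical choice) comparing the two sides of (1.8.16):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000
y = rng.exponential(size=n)          # Y with E{Y} = V{Y} = 1
x = rng.normal(loc=y, scale=1.0)     # X | Y ~ N(Y, 1), so V{X | Y} = 1, E{X | Y} = Y

lhs = x.var()                        # V{X}
rhs = 1.0 + y.var()                  # E{V{X | Y}} + V{E{X | Y}}
print(round(lhs, 3), round(rhs, 3))  # both approximately 2
```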

The product central moment of two variables (X, Y) is called the covariance and is denoted by cov(X, Y). More specifically,

(1.8.17)   cov(X, Y) = E{(X − E{X})(Y − E{Y})}.

Notice that cov(X, Y) = cov(Y, X) and cov(X, X) = V{X}. Notice also that if X is a random variable having a finite first moment and a is any finite constant, then cov(a, X) = 0. Furthermore, whenever the second moments of X and Y exist, the covariance exists. This follows from the Schwarz inequality (see Section 1.13.3), i.e., if F is the joint distribution of (X, Y) and FX, FY are the marginal distributions of X and Y, respectively, then

(1.8.18)   (∫∫ g(x) h(y) dF(x, y))² ≤ ∫ g²(x) dFX(x) · ∫ h²(y) dFY(y),

whenever E{g²(X)} and E{h²(Y)} are finite. In particular, for any two random variables having second moments,

cov²(X, Y) ≤ V{X} V{Y}.

The ratio

(1.8.19)   ρ = cov(X, Y)/(σX σY),

where σX and σY are the standard deviations of X and Y, is called the coefficient of correlation (Pearson's product moment correlation). From (1.8.18) we deduce that −1 ≤ ρ ≤ 1. The sign of ρ is that of cov(X, Y).

The m.g.f. of a multivariate distribution is a function of k variables,

(1.8.20)   M(t1, …, tk) = E{exp(Σ_{j=1}^{k} tj Xj)}.

Let X1, …, Xk be random variables having a joint distribution. Consider the linear transformation Y = Σ_{j=1}^{k} βj Xj, where β1, …, βk are constants. Some formulae for the moments and covariances of such linear functions are developed here. Assume that all the moments under consideration exist. Starting with the expected value of Y, we have

(1.8.21)   E{Y} = Σ_{j=1}^{k} βj E{Xj}.

This result is a direct implication of the definition of the integral as a linear operator.

Let X denote a random vector in column form and X′ its transpose. The expected value of the random vector X′ = (X1, …, Xk) is defined as the corresponding vector of expected values, i.e.,

(1.8.22)   E{X} = (E{X1}, …, E{Xk})′.

Furthermore, let Σ denote the k × k matrix whose elements are the variances and covariances of the components of X. In symbols,

(1.8.23)   Σ = (σij;  i, j = 1, …, k),

where σij = cov(Xi, Xj) for i ≠ j and σii = V{Xi}. If Y = β′X, where β is a vector of constants, then

(1.8.24)   V{Y} = β′Σβ.

The result given by (1.8.24) can be generalized in the following manner. Let Y1 = β′X and Y2 = α′X, where α and β are arbitrary constant vectors. Then

(1.8.25)   cov(Y1, Y2) = β′Σα.

Finally, if X is a k–dimensional random vector with covariance matrix Σ and Y is an m–dimensional vector Y = AX, where A is an m × k matrix of constants, then the covariance matrix of Y is

(1.8.26)   V[Y] = AΣA′.

In addition, if the covariance matrix of X is Σ, then the covariance matrix of Y = ξ + AX, where ξ is a vector of constants and A is a matrix of constants, is the same V[Y] = AΣA′. Finally, if Y = AX and Z = BX, where A and B are matrices of constants with compatible dimensions, then the covariance matrix of Y and Z is

(1.8.27)   C(Y, Z) = AΣB′.
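A numpy sketch (with hypothetical Σ and A) checking (1.8.26) against a sample covariance:

```python
import numpy as np

rng = np.random.default_rng(5)
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])       # hypothetical covariance matrix of X
A = np.array([[1.0, -1.0, 0.0],
              [0.5,  0.5, 1.0]])          # hypothetical 2 x 3 matrix of constants

X = rng.multivariate_normal(mean=np.zeros(3), cov=Sigma, size=200_000)
Y = X @ A.T                               # Y = AX, applied row by row

print(np.round(A @ Sigma @ A.T, 3))            # theoretical covariance of Y, (1.8.26)
print(np.round(np.cov(Y, rowvar=False), 3))    # sample covariance, nearly equal
```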

We conclude this section with an important theorem concerning characteristic functions. Recall that φ is generally a complex–valued function on ℝ, i.e.,

φ(t) = ∫_{−∞}^{∞} cos(tx) dF(x) + i ∫_{−∞}^{∞} sin(tx) dF(x).

Theorem 1.8.2 A characteristic function φ of a distribution function F has the following properties.

(i) |φ(t)| ≤ φ(0) = 1;
(ii) φ(t) is a uniformly continuous function of t on ℝ;
(iii) φ(−t) = φ̄(t), where z̄ denotes the complex conjugate of z;
(iv) φ(t) is real valued if and only if F is symmetric around x0 = 0;
(v) if E{|X|ⁿ} < ∞ for some n ≥ 1, then the rth order derivative φ^{(r)}(t) exists for every 1 ≤ r ≤ n, and

(1.8.28)   φ^{(r)}(t) = i^r ∫_{−∞}^{∞} x^r e^{itx} dF(x),

(1.8.29)   μr = φ^{(r)}(0)/i^r,   1 ≤ r ≤ n,

and

(1.8.30)   φ(t) = Σ_{r=0}^{n} ((it)^r/r!) μr + ((it)ⁿ/n!) Rn(t),

where |Rn(t)| ≤ 3E{|X|ⁿ} and Rn(t) → 0 as t → 0;

(vi) if φ^{(2n)}(0) exists and is finite, then μ2n < ∞;
(vii) if E{|X|ⁿ} < ∞ for all n ≥ 1 and

(1.8.31)   lim sup_{n→∞} (E{|X|ⁿ}/n!)^{1/n} = 1/R < ∞,

then

(1.8.32)   φ(t) = Σ_{n=0}^{∞} ((it)ⁿ/n!) μn,   for all |t| < R.

Proof.   The proof of (i) and (ii) is based on the fact that |e^{itx}| = 1 for all t and all x. Now, φ(−t) = ∫_{−∞}^{∞} e^{−itx} dF(x) = φ̄(t). Hence (iii) is proven.

(iv) Suppose F(x) is symmetric around x0 = 0. Then dF(x) = dF(−x) for all x. Therefore, since sin(−tx) = −sin(tx) for all x, ∫_{−∞}^{∞} sin(tx) dF(x) = 0, and φ(t) is real. If φ(t) is real, φ(t) = φ̄(t) = φ(−t). Hence φX(t) = φ−X(t). Thus, by the one–to–one correspondence between φ and F, for any Borel set B, P{X ∈ B} = P{−X ∈ B} = P{X ∈ −B}. This implies that F is symmetric about the origin.

(v) If E{|X|ⁿ} < ∞, then E{|X|^r} < ∞ for all 1 ≤ r ≤ n. Consider

(φ(t + h) − φ(t))/h = ∫_{−∞}^{∞} e^{itx} ((e^{ihx} − 1)/h) dF(x).

Since |(e^{ihx} − 1)/h| ≤ |x| and E{|X|} < ∞, we obtain from the Dominated Convergence Theorem that

φ^{(1)}(t) = i ∫_{−∞}^{∞} x e^{itx} dF(x).

Hence μ1 = φ^{(1)}(0)/i.

Equations (1.8.28)–(1.8.29) follow by induction. A Taylor expansion of e^{iy} yields

e^{iy} = Σ_{r=0}^{n−1} (iy)^r/r! + ((iy)ⁿ/n!)(cos(θ1 y) + i sin(θ2 y)),

where |θ1| ≤ 1 and |θ2| ≤ 1. Hence

φ(t) = Σ_{r=0}^{n} ((it)^r/r!) μr + ((it)ⁿ/n!) Rn(t),

where

Rn(t) = E{Xⁿ(cos(θ1 tX) + i sin(θ2 tX))} − μn.

Since |cos(ty)| ≤ 1 and |sin(ty)| ≤ 1, evidently |Rn(t)| ≤ 3E{|X|ⁿ}. Also, by dominated convergence, lim_{t→0} Rn(t) = 0.

(vi) By induction on n. Suppose φ^{(2)}(0) exists. By L'Hospital's rule,

φ^{(2)}(0) = lim_{h→0} (φ(h) − 2φ(0) + φ(−h))/h² = −lim_{h→0} ∫_{−∞}^{∞} (2(1 − cos(hx))/h²) dF(x).

By Fatou's Lemma, since 2(1 − cos(hx))/h² → x² as h → 0,

∫_{−∞}^{∞} x² dF(x) ≤ lim inf_{h→0} ∫_{−∞}^{∞} (2(1 − cos(hx))/h²) dF(x).

Thus, μ2 ≤ −φ^{(2)}(0) < ∞. Assume that 0 < μ2k < ∞. Then, by (v),

φ^{(2k)}(t) = (−1)^k ∫_{−∞}^{∞} x^{2k} e^{itx} dF(x) = (−1)^k ∫_{−∞}^{∞} e^{itx} dG(x),

where dG(x) = x^{2k} dF(x), or

ψ(t) = (−1)^k φ^{(2k)}(t)/G(∞).

Notice that G(∞) = μ2k < ∞. Thus, ψ is the characteristic function of the distribution G(x)/G(∞). Since φ^{(2k+2)}(0) exists, ψ^{(2)}(0) exists, and by the first step of the induction ∫_{−∞}^{∞} x^{2k+2} dF(x) = ∫_{−∞}^{∞} x² dG(x) < ∞. This proves that μ2k < ∞ for all k = 1, …, n.

(vii) Assuming (1.8.31), if 0 < t0 < R, then lim sup_{n→∞} (E{|X|ⁿ} t0ⁿ/n!)^{1/n} = t0/R < 1. Therefore,

Σ_{n≥1} (t0ⁿ/n!) E{|X|ⁿ} < ∞

by Cauchy's root test. By Stirling's approximation, lim_{n→∞} n/(e(n!)^{1/n}) = 1. Thus, for 0 < t0 < R,

(t0ⁿ/n!) E{|X|ⁿ} → 0,   as n → ∞.

By (v), for any n ≥ 1 and any t with |t| ≤ t0,

φ(t) = Σ_{r=0}^{n} ((it)^r/r!) μr + ((it)ⁿ/n!) Rn(t),

where |Rn(t)| ≤ 3E{|X|ⁿ}. Thus, for every t with |t| ≤ t0, |((it)ⁿ/n!) Rn(t)| ≤ 3(t0ⁿ/n!) E{|X|ⁿ} → 0, which implies that

φ(t) = Σ_{n=0}^{∞} ((it)ⁿ/n!) μn.        QED

1.9 MODES OF CONVERGENCE

In this section we formulate many definitions and results in terms of random vectors X = (X1, X2, …, Xk)′, 1 ≤ k < ∞. The notation ||x|| is used for the Euclidean norm, i.e., ||x||² = Σ_{i=1}^{k} xi².

We discuss here four modes of convergence of sequences of random vectors to a random vector.

(i) Convergence in distribution, Xn →d X;
(ii) Convergence in probability, Xn →p X;
(iii) Convergence in rth mean, Xn →r X; and
(iv) Convergence almost surely, Xn →a.s. X.

A sequence Xn is said to converge in distribution to X, Xn →d X, if the corresponding distribution functions Fn and F satisfy

(1.9.1)   lim_{n→∞} ∫ g(x) dFn(x) = ∫ g(x) dF(x)

for every continuous bounded function g on ℝᵏ.

One can show that this definition is equivalent to the following statement.

A sequence {Xn} converges in distribution to X, Xn →d X, if lim_{n→∞} Fn(x) = F(x) at all continuity points x of F.

If Xn →d X, we say that Fn converges to F weakly. The notation is Fn ⇒ F or Fn →w F.

We define now convergence in probability.

A sequence {Xn} converges in probability to X, Xn →p X, if, for each ε > 0,

(1.9.2)   lim_{n→∞} P{||Xn − X|| > ε} = 0.

We define now convergence in rth mean.

A sequence of random vectors {Xn} converges in rth mean, r > 0, to X, Xn →r X, if E{||Xn − X||^r} → 0 as n → ∞.

A fourth mode of convergence is the following.

A sequence of random vectors {Xn} converges almost surely to X, Xn →a.s. X, as n → ∞ if

(1.9.3)   P{w : lim_{n→∞} Xn(w) = X(w)} = 1.

The following is an equivalent definition.

Xn →a.s. X as n → ∞ if and only if, for any ε > 0,

(1.9.4)   lim_{n→∞} P{sup_{m≥n} ||Xm − X|| > ε} = 0.

Equation (1.9.4) is equivalent to

lim_{n→∞} P{∪_{m≥n} {||Xm − X|| > ε}} = 0.

But,

P{∪_{m≥n} {||Xm − X|| > ε}} ≤ Σ_{m≥n} P{||Xm − X|| > ε}.

By the Borel–Cantelli Lemma (Theorem 1.4.1), a sufficient condition for Xn →a.s. X is

(1.9.5)   Σ_{n≥1} P{||Xn − X|| > ε} < ∞

for all ε > 0.

Theorem 1.9.1. Let {Xn} be a sequence of random vectors. Then

(a) Xn →a.s. X implies Xn →p X.
(b) Xn →r X, r > 0, implies Xn →p X.
(c) Xn →p X implies Xn →d X.

Proof.   (a) Since Xn →a.s. X, for any ε > 0,

(1.9.6)   P{||Xn − X|| > ε} ≤ P{sup_{m≥n} ||Xm − X|| > ε} → 0,   as n → ∞.

The inequality (1.9.6) implies that Xn →p X.

(b) It can be immediately shown that, for any ε > 0,

P{||Xn − X|| > ε} ≤ E{||Xn − X||^r}/ε^r.

Thus, Xn →r X implies Xn →p X.

(c) Let ε > 0. If Xn ≤ x0 then either X ≤ x0 + ε1, where 1 = (1, …, 1)′, or ||Xn − X|| > ε. Thus, for all n,

Fn(x0) ≤ F(x0 + ε1) + P{||Xn − X|| > ε}.

Similarly,

F(x0 − ε1) ≤ Fn(x0) + P{||Xn − X|| > ε}.

Finally, since Xn →p X,

F(x0 − ε1) ≤ lim inf_{n→∞} Fn(x0) ≤ lim sup_{n→∞} Fn(x0) ≤ F(x0 + ε1).

Thus, if x0 is a continuity point of F, by letting ε → 0 we obtain

lim_{n→∞} Fn(x0) = F(x0).        QED

Theorem 1.9.2 Let {Xn} be a sequence of random vectors. Then

(a) if c ∈ ℝᵏ, then Xn →d c implies Xn →p c;
(b) if Xn →p X and ||Xn||^r ≤ Z for some r > 0 and some (positive) random variable Z with E{Z} < ∞, then Xn →r X.

For proof, see Ferguson (1996, p. 9). Part (b) is implied also by Theorem 1.13.3.

Theorem 1.9.3 Let {Xn} be a sequence of nonnegative random variables such that Xn →p X and E{Xn} → E{X}, E{X} < ∞. Then

(1.9.7)   lim_{n→∞} E{|Xn − X|} = 0.

Proof.   Since E{Xn} → E{X} < ∞, for sufficiently large n, E{Xn} < ∞. For such n,

E{|Xn − X|} = 2E{(X − Xn)⁺} + E{Xn} − E{X}.

But,

(X − Xn)⁺ ≤ X,   and (X − Xn)⁺ →p 0.

Therefore, by the Lebesgue Dominated Convergence Theorem,

lim_{n→∞} E{(X − Xn)⁺} = 0.

This implies (1.9.7).        QED

1.10 WEAK CONVERGENCE

The following theorem plays a major role in weak convergence.

Theorem 1.10.1. The following conditions are equivalent.

(a) Xn →d X;
(b) E{g(Xn)} → E{g(X)}, for all continuous functions g that vanish outside a compact set;
(c) E{g(Xn)} → E{g(X)}, for all continuous bounded functions g;
(d) E{g(Xn)} → E{g(X)}, for all bounded measurable functions g such that P{X ∈ C(g)} = 1, where C(g) is the set of all points at which g is continuous.

For proof, see Ferguson (1996, pp. 14–16).

Theorem 1.10.2. Let {Xn} be a sequence of random vectors in ℝᵏ with Xn →d X, and let f: ℝᵏ → ℝˡ be a measurable function such that P{X ∈ C(f)} = 1. Then

(i) f(Xn) →d f(X);
(ii) if {Yn} is a sequence such that Xn − Yn →p 0, then Yn →d X;
(iii) if Xn ∈ ℝᵏ, Yn ∈ ℝˡ, and Yn →p c, then

(Xn, Yn) →d (X, c).

Proof.   (i) Let g: ℝˡ → ℝ be bounded and continuous. Let h(x) = g(f(x)). If x is a continuity point of f, then x is a continuity point of h, i.e., C(f) ⊂ C(h). Hence P{X ∈ C(h)} = 1. By Theorem 1.10.1 (c), it is sufficient to show that E{g(f(Xn))} → E{g(f(X))}. Theorem 1.10.1 (d) implies, since P{X ∈ C(h)} = 1 and Xn →d X, that E{h(Xn)} → E{h(X)}.

(ii) According to Theorem 1.10.1 (b), let g be a continuous function on ℝᵏ vanishing outside a compact set. Then g is uniformly continuous and bounded. Let ε > 0 and find δ > 0 such that, if ||x − y|| < δ, then |g(x) − g(y)| < ε. Also, g is bounded, say |g(x)| ≤ B < ∞. Thus,

|E{g(Yn)} − E{g(X)}| ≤ |E{g(Yn)} − E{g(Xn)}| + |E{g(Xn)} − E{g(X)}| ≤ ε + 2B P{||Yn − Xn|| ≥ δ} + |E{g(Xn)} − E{g(X)}|.

Letting n → ∞ and then ε → 0, we obtain E{g(Yn)} → E{g(X)}. Hence Yn →d X.

(iii)

||(Xn, Yn) − (Xn, c)|| = ||Yn − c|| →p 0,

and (Xn, c) →d (X, c). Hence, from part (ii), (Xn, Yn) →d (X, c).        QED

As a special case of the above theorem we get

Theorem 1.10.3 (Slutsky's Theorem) Let {Xn} and {Yn} be sequences of random variables, Xn →d X and Yn →p c. Then

(1.10.1)   Xn + Yn →d X + c,   Xn Yn →d cX,   and Xn/Yn →d X/c if c ≠ 0.

A sequence of distribution functions may not converge to a distribution function. For example, let Xn be random variables with

P{Xn = n} = P{Xn = −n} = 1/2.

Then lim_{n→∞} Fn(x) = 1/2 for all x, and F(x) = 1/2 for all x is not a distribution function. In this example, half of the probability mass escapes to −∞ and half of the mass escapes to +∞. In order to avoid such situations, we require that collections (families) of probability distributions be tight.

Let 𝒫 = {Fu, u ∈ 𝒰} be a family of distribution functions on ℝᵏ. 𝒫 is tight if, for any ε > 0, there exists a compact set C ⊂ ℝᵏ such that

Fu{C} > 1 − ε,   for all u ∈ 𝒰.

In the above example, the sequence Fn(x) is not tight.

If 𝒫 is tight, then every sequence of distributions of 𝒫 contains a subsequence converging weakly to a distribution function (see Shiryayev, 1984, p. 315).

Theorem 1.10.4. Let {Fn} be a tight family of distribution functions on ℝ. A necessary and sufficient condition for Fn ⇒ F is that, for each t ∈ ℝ, lim_{n→∞} φn(t) exists, where φn(t) = ∫ e^{itx} dFn(x) is the characteristic function corresponding to Fn.

For proof, see Shiryayev (1984, p. 321).

Theorem 1.10.5 (Continuity Theorem) Let {Fn} be a sequence of distribution functions and {φn} the corresponding sequence of characteristic functions. Let F be a distribution function with characteristic function φ. Then Fn ⇒ F if and only if φn(t) → φ(t) for all t ∈ ℝᵏ (Shiryayev, 1984, p. 322).

1.11 LAWS OF LARGE NUMBERS

1.11.1 The Weak Law of Large Numbers (WLLN)

Let X1, X2, … be a sequence of identically distributed, uncorrelated random vectors. Let μ = E{X1} and let Σ = E{(X1 − μ)(X1 − μ)′} be finite. Then the means X̄n = (1/n) Σ_{i=1}^{n} Xi converge in probability to μ, i.e.,

(1.11.1)   X̄n →p μ,   as n → ∞.

The proof is simple. Since cov(Xn, Xn′) = 0 for all n ≠ n′, the covariance matrix of X̄n is (1/n)Σ. Moreover, since E{X̄n} = μ,

E{||X̄n − μ||²} = (1/n) tr{Σ} → 0,   as n → ∞.

Hence X̄n →2 μ, which implies that X̄n →p μ. Here tr{Σ} denotes the trace of Σ.

If X1, X2, … are independent and identically distributed, with E{X1} = μ, then the characteristic function of X̄n is

(1.11.2)   φ_{X̄n}(t) = (φ(t/n))ⁿ,

where φ(t) is the characteristic function of X1. Fix t. Then for large values of n,

φ(t/n) = 1 + i(t/n)μ + o(1/n),   as n → ∞.

Therefore,

(1.11.3)   lim_{n→∞} φ_{X̄n}(t) = lim_{n→∞} (1 + itμ/n + o(1/n))ⁿ = e^{itμ}.

φ(t) = e^{itμ} is the characteristic function of X, where P{X = μ} = 1. Thus, since e^{itμ} is continuous at t = 0, X̄n →d μ. This implies that X̄n →p μ (left as an exercise).
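A short simulation sketch of (1.11.1) (Python; Bernoulli(0.3) summands are a hypothetical choice):

```python
import numpy as np

rng = np.random.default_rng(6)
p = 0.3
x = rng.binomial(1, p, size=100_000)
running_mean = np.cumsum(x) / np.arange(1, len(x) + 1)

for n in (100, 1_000, 10_000, 100_000):
    print(n, round(running_mean[n - 1], 4))   # settles near mu = 0.3
```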

1.11.2 The Strong Law of Large Numbers (SLLN)

Strong laws of large numbers, for independent random variables having finite expected values, are of the form

(1/n) Σ_{i=1}^{n} (Xi − μi) →a.s. 0,   as n → ∞,

where μi = E{Xi}.

Theorem 1.11.1 (Cantelli) Let {Xn} be a sequence of independent random variables having uniformly bounded fourth central moments, i.e.,

(1.11.4)   E{(Xn − μn)⁴} ≤ C < ∞

for all n ≥ 1, where μn = E{Xn}. Then

(1.11.5)   (1/n) Σ_{i=1}^{n} (Xi − μi) →a.s. 0,   as n → ∞.

Proof.   Without loss of generality, we can assume that μn = E{Xn} = 0 for all n ≥ 1. Write X̄n = (1/n) Σ_{i=1}^{n} Xi. By independence,

E{(Σ_{i=1}^{n} Xi)⁴} = Σ_{i=1}^{n} μ4,i + 3 Σ_{i≠j} σi² σj²,

where μ4,i = E{Xi⁴} and σi² = E{Xi²}. By the Schwarz inequality, σi² σj² ≤ (μ4,i μ4,j)^{1/2} ≤ C for all i ≠ j. Hence,

E{(Σ_{i=1}^{n} Xi)⁴} ≤ nC + 3n(n − 1)C ≤ 3Cn².

By Chebychev's inequality,

P{|X̄n| > ε} ≤ E{(Σ_{i=1}^{n} Xi)⁴}/(n⁴ε⁴) ≤ 3C/(n²ε⁴).

Hence, for any ε > 0,

Σ_{n≥1} P{|X̄n| > ε} ≤ (C*/ε⁴) Σ_{n≥1} 1/n² < ∞,

where C* is some positive finite constant. Finally, by the Borel–Cantelli Lemma (Theorem 1.4.1),

P{|X̄n| > ε, i.o.} = 0,   for every ε > 0.

Thus, P{lim_{n→∞} X̄n = 0} = 1.        QED

Cantelli's Theorem is quite stringent, in the sense that it requires the existence of the fourth moments of the independent random variables. Kolmogorov relaxed this condition and proved that if the random variables have finite variances, 0 < σn² < ∞, and

(1.11.6)   Σ_{n≥1} σn²/n² < ∞,

then (1/n) Σ_{i=1}^{n} (Xi − μi) →a.s. 0 as n → ∞.

If the random variables are independent and identically distributed (i.i.d.), then Kolmogorov showed that E{|X1|} < ∞ is sufficient for the strong law of large numbers. To prove Kolmogorov’s strong law of large numbers one has to develop more theoretical results. We refer the reader to more advanced probability books (see Shiryayev, 1984).

1.12 CENTRAL LIMIT THEOREM

The Central Limit Theorem (CLT) states that, under quite general conditions, the distributions of properly normalized sample means converge weakly to the standard normal distribution.

A continuous random variable Z is said to have a standard normal distribution, denoted Z ~ N(0, 1), if its distribution function is absolutely continuous with p.d.f.

(1.12.1)   f(z) = (1/√(2π)) e^{−z²/2},   −∞ < z < ∞.

The c.d.f. of N(0, 1), called the standard normal integral, is

(1.12.2)   Φ(z) = (1/√(2π)) ∫_{−∞}^{z} e^{−x²/2} dx.

The general family of normal distributions is studied in Chapter 2. Here we just mention that if Z ~ N(0, 1), the moments of Z are

(1.12.3)   E{Z^{2m}} = (2m)!/(2^m m!),   E{Z^{2m+1}} = 0,   m = 0, 1, 2, ….

The characteristic function of N(0, 1) is

(1.12.4)   φ(t) = e^{−t²/2}.

A random vector Z = (Z1, …, Zk)′ is said to have a multivariate normal distribution with mean μ = E{Z} = 0 and covariance matrix V (see Chapter 2), Z ~ N(0, V), if the p.d.f. of Z is

f(z) = (2π)^{−k/2} |V|^{−1/2} exp(−(1/2) z′V⁻¹z).

The corresponding characteristic function is

(1.12.5)   φZ(t) = exp(−(1/2) t′Vt),

t ∈ ℝᵏ.

Using the method of characteristic functions, with the continuity theorem we prove the following simple two versions of the CLT. A proof of the Central Limit Theorem, which is not based on the continuity theorem of characteristic functions, can be obtained by the method of Stein (1986) for approximating expected values or probabilities.

Theorem 1.12.1 (CLT) Let {Xn} be a sequence of i.i.d. random variables having a finite positive variance, i.e., μ = E{X1}, V{X1} = σ^2, 0 < σ^2 < ∞. Then

(1.12.6)    √n (X̄n − μ)/σ →d N(0, 1), as n → ∞.

Proof.   Notice that √n(X̄n − μ)/σ = (1/√n) Σ_{i=1}^n Zi, where Zi = (Xi − μ)/σ, i ≥ 1. Moreover, E{Zi} = 0 and V{Zi} = 1, i ≥ 1. Let φ_Z(t) be the characteristic function of Z1. Then, since E{Z1} = 0, V{Z1} = 1, (1.8.33) implies that

    φ_Z(t) = 1 − t^2/2 + o(t^2), as t → 0.

Accordingly, since {Zn} are i.i.d., the characteristic function of (1/√n) Σ_{i=1}^n Zi is

    [φ_Z(t/√n)]^n = [1 − t^2/(2n) + o(t^2/n)]^n → exp(−t^2/2), as n → ∞.

Hence, by the continuity theorem, √n(X̄n − μ)/σ →d N(0, 1).        QED
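A minimal simulation sketch of (1.12.6) (assuming Python with NumPy; the Exponential(1) model, n, and the number of replicates are arbitrary choices) compares the empirical c.d.f. of √n(X̄n − μ)/σ with Φ at a few points:

    import numpy as np
    from math import erf, sqrt

    rng = np.random.default_rng(0)
    n, reps = 200, 20_000
    samples = rng.exponential(1.0, size=(reps, n))     # Exponential(1): mu = sigma = 1
    z = np.sqrt(n) * (samples.mean(axis=1) - 1.0)      # sqrt(n)(X̄n - mu)/sigma

    Phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))   # standard normal c.d.f.
    for x in (-1.0, 0.0, 1.0, 2.0):
        print(x, (z <= x).mean(), Phi(x))              # empirical frequency vs. Phi(x)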

Theorem 1.12.1 can be generalized to random vectors. Let X̄n = (1/n) Σ_{i=1}^n Xi, n ≥ 1. The generalized CLT is the following theorem.

Theorem 1.12.2 Let {Xn} be a sequence of i.i.d. random vectors with E{Xn} = 0, and covariance matrix E{Xn Xn′} = V, n ≥ 1, where V is positive definite with finite eigenvalues. Then

(1.12.7)    √n X̄n →d N(0, V), as n → ∞.

Proof.   Let φ_X(t) be the characteristic function of X1. Then, since E{X1} = 0, the characteristic function of √n X̄n satisfies

    [φ_X(t/√n)]^n = [1 − (1/(2n)) t′Vt + o(1/n)]^n → exp{ −(1/2) t′Vt }

as n → ∞. Hence

    √n X̄n →d N(0, V).        QED

When the random variables are independent but not identically distributed, we need a stronger version of the CLT. The following celebrated CLT is sufficient for most purposes.

Theorem 1.12.3 (Lindeberg–Feller) Consider a triangular array of random variables {Xn,k}, k = 1, …, n, n ≥ 1, such that, for each n ≥ 1, {Xn,k, k = 1, …, n} are independent, with E{Xn,k} = 0 and V{Xn,k} = σ_{n,k}^2. Let Sn = Σ_{k=1}^n Xn,k and Bn^2 = Σ_{k=1}^n σ_{n,k}^2. Assume that Bn > 0 for each n ≥ 1, and Bn ↗ ∞, as n → ∞. If, for every ε > 0,

(1.12.8)    (1/Bn^2) Σ_{k=1}^n E{ Xn,k^2 I{|Xn,k| ≥ ε Bn} } → 0

as n → ∞, then Sn/Bn →d N(0, 1) as n → ∞. Conversely, if max_{1≤k≤n} σ_{n,k}^2/Bn^2 → 0 as n → ∞ and Sn/Bn →d N(0, 1), then (1.12.8) holds.

For a proof, see Shiryayev (1984, p. 326). The following theorem, known as Lyapunov’s Theorem, is weaker than the Lindeberg–Feller Theorem, but is often sufficient to establish the CLT.

Theorem 1.12.4 (Lyapunov) Let {Xn} be a sequence of independent random variables. Assume that E{Xn} = 0, V{Xn} > 0, and E{|Xn|^3} < ∞, for all n ≥ 1. Moreover, assume that Bn^2 = Σ_{j=1}^n V{Xj} ↗ ∞. Under the condition

(1.12.9)    (1/Bn^3) Σ_{j=1}^n E{|Xj|^3} → 0, as n → ∞,

the CLT holds, i.e., Sn/Bn →d N(0, 1) as n → ∞.

Proof.   It is sufficient to prove that (1.12.9) implies the Lindeberg–Feller condition (1.12.8). Indeed, for every ε > 0,

    E{ Xj^2 I{|Xj| ≥ ε Bn} } ≤ (1/(ε Bn)) E{ |Xj|^3 I{|Xj| ≥ ε Bn} } ≤ (1/(ε Bn)) E{|Xj|^3}.

Thus,

    (1/Bn^2) Σ_{j=1}^n E{ Xj^2 I{|Xj| ≥ ε Bn} } ≤ (1/(ε Bn^3)) Σ_{j=1}^n E{|Xj|^3} → 0, as n → ∞.        QED
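For instance, in the i.i.d. case the Lyapunov condition is immediate: if the Xj are i.i.d. with V{X1} = σ^2 and γ = E{|X1|^3} < ∞, then Bn = σ√n and

    (1/Bn^3) Σ_{j=1}^n E{|Xj|^3} = nγ/(σ^3 n^{3/2}) = γ/(σ^3 √n) → 0, as n → ∞,

so (1.12.9) holds and Theorem 1.12.1 is recovered (under the extra third-moment assumption) as a special case.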

Stein (1986, p. 97) proved, using a novel approximation to expectations, that if X1, X2, … are independent and identically distributed, with E{X1} = 0, E{X1^2} = 1, and γ = E{|X1|^3} < ∞, then, for all −∞ < x < ∞ and all n = 1, 2, …,

    | P{ Sn/√n ≤ x } − Φ(x) | ≤ C γ/√n,

where Φ(x) is the c.d.f. of N(0, 1) and C is a universal constant. This immediately implies the CLT and shows that the convergence is uniform in x.

1.13 MISCELLANEOUS RESULTS

In this section we review additional results.

1.13.1 Law of the Iterated Logarithm

We denote by log2(x) the function log(log(x)), x > e.

Theorem 1.13.1 Let {Xn} be a sequence of i.i.d. random variables, such that E{X1} = 0 and V{X1} = σ^2, 0 < σ^2 < ∞. Let Sn = Σ_{i=1}^n Xi. Then

(1.13.1)    P{ limsup_{n→∞} Sn/Λ(n) = 1 } = 1,

where Λ(n) = (2σ^2 n log2(n))^{1/2}, n ≥ 3.

For a proof in the normal case, see Shiryayev (1984, p. 372).

The theorem means that, with probability 1 and for every ε > 0, the sequence |Sn| will cross the boundary (1 + ε)Λ(n), n ≥ 3, only a finite number of times, as n → ∞. Notice that although E{Sn} = 0, n ≥ 1, the variance of Sn is V{Sn} = nσ^2, and limsup_{n→∞} |Sn| = ∞ with probability 1. However, if we consider Sn/n then, by the SLLN, Sn/n → 0 a.s. If we divide only by √n then, by the CLT, Sn/(σ√n) →d N(0, 1). The law of the iterated logarithm says that, for every ε > 0, P{ |Sn| > (1 + ε)Λ(n), i.o. } = 0. This means that the fluctuations of Sn are not too wild. In Example 1.19 we see that if {Xn} are i.i.d. with P{X1 = 1} = P{X1 = −1} = 1/2, then Sn/n → 0 a.s. as n → ∞. But n goes to infinity faster than Λ(n). Thus, by (1.13.1), if we consider the sequence Wn = Sn/Λ(n), then P{ |Wn| < 1 + ε for all n sufficiently large } = 1; {Wn} fluctuates between −1 and 1 almost always.
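A minimal simulation sketch of this behavior (assuming Python with NumPy; the walk length and seed are arbitrary) tracks Wn = Sn/Λ(n) for the ±1 random walk and shows it hovering within roughly [−1, 1]:

    import numpy as np

    rng = np.random.default_rng(7)
    n = 1_000_000
    s = np.cumsum(rng.choice([-1.0, 1.0], size=n))   # Sn with P{X=1} = P{X=-1} = 1/2, sigma = 1
    k = np.arange(10, n + 1)                         # start past e, so log(log k) > 0
    w = s[k - 1] / np.sqrt(2.0 * k * np.log(np.log(k)))

    tail = w[k >= 10_000]
    print(tail.min(), tail.max())   # typically close to -1 and +1, rarely beyond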

1.13.2 Uniform Integrability

A sequence of random variables {Xn} is uniformly integrable if

(1.13.2)    lim_{c→∞} sup_{n≥1} E{ |Xn| I{|Xn| > c} } = 0.

Clearly, if |Xn| ≤ Y for all n ≥ 1 and E{Y} < ∞, then {Xn} is a uniformly integrable sequence. Indeed, |Xn| I{|Xn| > c} ≤ Y I{Y > c} for all n ≥ 1. Hence,

    sup_{n≥1} E{ |Xn| I{|Xn| > c} } ≤ E{ Y I{Y > c} } → 0

as c → ∞, since E{Y} < ∞.

Theorem 1.13.2 Let {Xn} be uniformly integrable. Then,

(i)

(1.13.3)    E{ liminf_{n→∞} Xn } ≤ liminf_{n→∞} E{Xn} ≤ limsup_{n→∞} E{Xn} ≤ E{ limsup_{n→∞} Xn };

(ii) if, in addition, Xn → X in probability as n → ∞, then X is integrable and

(1.13.4)    lim_{n→∞} E{Xn} = E{X},

(1.13.5)    lim_{n→∞} E{ |Xn − X| } = 0.

Proof.   (i) For every c > 0,

(1.13.6)    E{Xn} = E{ Xn I{Xn ≥ −c} } + E{ Xn I{Xn < −c} }.

By uniform integrability, for every ε > 0, take c sufficiently large so that

    sup_{n≥1} | E{ Xn I{Xn < −c} } | ≤ sup_{n≥1} E{ |Xn| I{|Xn| > c} } < ε.

By Fatou’s Lemma (Theorem 1.6.2), applied to Xn I{Xn ≥ −c}, which is bounded below by −c,

(1.13.7)    liminf_{n→∞} E{ Xn I{Xn ≥ −c} } ≥ E{ liminf_{n→∞} Xn I{Xn ≥ −c} }.

But Xn I{Xn ≥ −c} ≥ Xn. Therefore,

(1.13.8)    E{ liminf_{n→∞} Xn I{Xn ≥ −c} } ≥ E{ liminf_{n→∞} Xn }.

From (1.13.6)–(1.13.8), we obtain

(1.13.9)    liminf_{n→∞} E{Xn} ≥ E{ liminf_{n→∞} Xn } − ε.

In a similar way, we show that

(1.13.10)    limsup_{n→∞} E{Xn} ≤ E{ limsup_{n→∞} Xn } + ε.

Since ε is arbitrary, we obtain (1.13.3). Part (ii) is obtained from (i) as in the Dominated Convergence Theorem (Theorem 1.6.3).        QED

Theorem 1.13.3 Let Xn ≥ 0, n ≥ 1, with Xn → X in probability and E{Xn} < ∞ for all n. Then E{Xn} → E{X} if and only if {Xn} is uniformly integrable.

Proof.   The sufficiency follows from part (ii) of the previous theorem.

To prove necessity, let

    A = { c > 0 : P{X = c} = 0 }.

Then, for each c ∈ A,

    Xn I{Xn < c} → X I{X < c} in probability, as n → ∞.

The family {Xn I{Xn < c}} is uniformly integrable, being bounded by c. Hence, by sufficiency,

    E{ Xn I{Xn < c} } → E{ X I{X < c} }

for c ∈ A, as n → ∞. The complement of A is at most countable, since the c.d.f. of X has at most countably many jump points. Since E{X} < ∞, we can choose c0 ∈ A sufficiently large so that, for a given ε > 0, E{ X I{X ≥ c0} } < ε. Choose N0(ε) sufficiently large so that, for n ≥ N0(ε),

    E{ Xn I{Xn ≥ c0} } = E{Xn} − E{ Xn I{Xn < c0} } ≤ E{ X I{X ≥ c0} } + ε < 2ε.

Choose c1 > c0 sufficiently large so that E{ Xn I{Xn ≥ c1} } ≤ ε for the finitely many n < N0(ε). Then sup_{n≥1} E{ Xn I{Xn ≥ c1} } ≤ 2ε.        QED
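A classical counterexample shows what fails without uniform integrability. Take Xn = n I{Un < 1/n} with Un uniform on (0, 1); then Xn → 0 in probability, yet E{Xn} = 1 for all n, so by Theorem 1.13.3 {Xn} cannot be uniformly integrable. A minimal sketch (assuming Python with NumPy; the seed and replicate count are arbitrary):

    import numpy as np

    rng = np.random.default_rng(1)
    reps = 200_000
    for n in (10, 100, 1_000):
        u = rng.uniform(size=reps)
        x = np.where(u < 1.0 / n, float(n), 0.0)   # Xn = n with prob. 1/n, else 0
        # E{Xn} stays near 1 while P{Xn > 0} = 1/n → 0: the mass escapes to infinity,
        # and E{Xn I{Xn > c}} = 1 for every c < n, so (1.13.2) fails.
        print(n, x.mean(), (x > 0).mean())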

Lemma 1.13.1 If {Xn} is a sequence of uniformly integrable random variables, then

(1.13.11)    sup_{n≥1} E{|Xn|} < ∞.

Proof.

    E{|Xn|} = E{ |Xn| I{|Xn| < c} } + E{ |Xn| I{|Xn| ≥ c} } ≤ c + sup_{n≥1} E{ |Xn| I{|Xn| ≥ c} } < ∞

for 0 < c < ∞ sufficiently large.        QED

Theorem 1.13.4 A necessary and sufficient condition for a sequence {Xn} to be uniformly integrable is that

(1.13.12)    sup_{n≥1} E{|Xn|} < ∞

and

(1.13.13)    for every ε > 0 there exists δ(ε) > 0 such that, for every event A with P{A} < δ(ε), sup_{n≥1} E{ |Xn| I_A } < ε.

Proof.   (i) Necessity: Condition (1.13.12) was proven in the previous lemma. Furthermore, for any 0 < c < ∞,

(1.13.14)    E{ |Xn| I_A } = E{ |Xn| I_A I{|Xn| < c} } + E{ |Xn| I_A I{|Xn| ≥ c} } ≤ c P{A} + E{ |Xn| I{|Xn| ≥ c} }.

Choose c sufficiently large, so that sup_{n≥1} E{ |Xn| I{|Xn| ≥ c} } < ε/2, and A so that P{A} < ε/(2c); then E{ |Xn| I_A } < ε for all n ≥ 1. This proves the necessity of (1.13.13).

(ii) Sufficiency: Let ε > 0 be given, and choose δ(ε) > 0 so that sup_{n≥1} E{ |Xn| I_A } ≤ ε whenever P{A} < δ(ε).

By Chebychev’s inequality, for every c > 0,

    P{ |Xn| ≥ c } ≤ E{|Xn|}/c.

Hence,

(1.13.15)    sup_{n≥1} P{ |Xn| ≥ c } ≤ (1/c) sup_{n≥1} E{|Xn|}.

The right-hand side of (1.13.15) goes to zero, when c → ∞, by (1.13.12). Choose c sufficiently large so that sup_{n≥1} P{ |Xn| ≥ c } < δ(ε); such a value of c exists, independently of n, due to (1.13.15). Then, with An = { |Xn| ≥ c }, P{An} < δ(ε) for every n and, therefore,

    sup_{n≥1} E{ |Xn| I{|Xn| ≥ c} } ≤ ε

for all sufficiently large c. This establishes the uniform integrability of {Xn}.        QED

Notice that according to Theorem 1.13.3, if E{|Xn|^r} < ∞, r ≥ 1, and Xn → X in probability, then lim_{n→∞} E{|Xn|^r} = E{|X|^r} if and only if {|Xn|^r} is a uniformly integrable sequence.

1.13.3 Inequalities

In previous sections we established several inequalities, such as the Chebychev and the Kolmogorov inequalities. In this section we establish some additional useful inequalities.

1. The Schwarz Inequality

Let (X, Y) be random variables with joint distribution function F_{XY} and marginal distribution functions F_X and F_Y, respectively. Then, for all Borel measurable functions g and h such that E{g^2(X)} < ∞ and E{h^2(Y)} < ∞,

(1.13.16)    ( E{ g(X) h(Y) } )^2 ≤ E{ g^2(X) } · E{ h^2(Y) }.

To prove (1.13.16), consider the random variable Q(t) = (g(X) + t h(Y))^2, −∞ < t < ∞. Obviously, Q(t) ≥ 0, for all t, −∞ < t < ∞. Moreover,

    E{Q(t)} = E{ g^2(X) } + 2t E{ g(X) h(Y) } + t^2 E{ h^2(Y) } ≥ 0

for all t. But this quadratic in t is nonnegative for all t if and only if its discriminant is nonpositive, i.e., if and only if

    ( E{ g(X) h(Y) } )^2 ≤ E{ g^2(X) } · E{ h^2(Y) }.

This establishes (1.13.16).
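In particular, taking g(X) = X − E{X} and h(Y) = Y − E{Y} in (1.13.16) yields

    ( Cov(X, Y) )^2 ≤ V{X} · V{Y},

so the correlation coefficient ρ_{XY} = Cov(X, Y)/(σ_X σ_Y) always lies in [−1, 1].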

2. Jensen’s Inequality

A function g: R → R is called convex if, for any −∞ < x < y < ∞ and 0 ≤ α ≤ 1,

    g(αx + (1 − α)y) ≤ α g(x) + (1 − α) g(y).

Suppose X is a random variable and E{|X|} < ∞. Then, if g is convex,

(1.13.17)    g(E{X}) ≤ E{ g(X) }.

To prove (1.13.17), notice that since g is convex, for every x0, −∞ < x0 < ∞, there is a finite supporting slope g*(x0) such that g(x) ≥ g(x0) + (x − x0) g*(x0) for all x, −∞ < x < ∞. Substitute x0 = E{X}. Then

    g(X) ≥ g(E{X}) + (X − E{X}) g*(E{X})

with probability one. Since E{X − E{X}} = 0, taking expectations of both sides yields (1.13.17).
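For example, g(x) = x^2 is convex, and (1.13.17) gives (E{X})^2 ≤ E{X^2}, i.e., V{X} ≥ 0; similarly, g(x) = e^x gives exp(E{X}) ≤ E{e^X}.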

3. Lyapunov’s Inequality

If 0 < s < r and E{|X|^r} < ∞, then

(1.13.18)    ( E{|X|^s} )^{1/s} ≤ ( E{|X|^r} )^{1/r}.

To establish this inequality, let t = r/s. Notice that g(x) = |x|^t is convex, since t > 1. Thus, applying Jensen’s inequality to |X|^s, and noting that (|X|^s)^t = |X|^r,

    ( E{|X|^s} )^t ≤ E{ (|X|^s)^t } = E{|X|^r}.

Hence, ( E{|X|^s} )^{1/s} ≤ ( E{|X|^r} )^{1/r}. As a result of Lyapunov’s inequality we have the following chain of inequalities among absolute moments:

(1.13.19)    E{|X|} ≤ ( E{|X|^2} )^{1/2} ≤ ( E{|X|^3} )^{1/3} ≤ ··· ≤ ( E{|X|^n} )^{1/n} ≤ ···.

4. Hölder’s Inequality

Let 1 < p < ∞ and 1 < q < ∞ be such that 1/p + 1/q = 1, and suppose that E{|X|^p} < ∞ and E{|Y|^q} < ∞. Then

(1.13.20)    E{|XY|} ≤ ( E{|X|^p} )^{1/p} ( E{|Y|^q} )^{1/q}.

Notice that the Schwarz inequality is a special case of Hölder’s inequality for p = q = 2.

For a proof, see Shiryayev (1984, p. 191).

5. Minkowski’s Inequality

If E{|X|^p} < ∞ and E{|Y|^p} < ∞ for some 1 ≤ p < ∞, then E{|X + Y|^p} < ∞ and

(1.13.21)    ( E{|X + Y|^p} )^{1/p} ≤ ( E{|X|^p} )^{1/p} + ( E{|Y|^p} )^{1/p}.

For a proof, see Shiryayev (1984, p. 192).

1.13.4 The Delta Method

The delta method is designed to yield large-sample approximations to the distribution, expectation, and variance of a nonlinear function g of the sample mean X̄n. More specifically, let {Xn} be a sequence of i.i.d. random variables with E{X1} = μ and 0 < V{X1} = σ^2 < ∞. By the SLLN, X̄n → μ a.s., as n → ∞, where X̄n = (1/n) Σ_{j=1}^n Xj, and by the CLT, √n(X̄n − μ)/σ →d N(0, 1). Let g: R → R have a continuous third-order derivative. By the Taylor expansion of g(X̄n) around μ,

(1.13.22)    g(X̄n) = g(μ) + g^{(1)}(μ)(X̄n − μ) + (1/2) g^{(2)}(μ)(X̄n − μ)^2 + Rn,

where Rn = (1/6)(X̄n − μ)^3 g^{(3)}(X̄n*), and X̄n* is a point between X̄n and μ, i.e., |X̄n* − μ| < |X̄n − μ|. Since we assumed that g^{(3)}(x) is continuous, it is bounded on the closed interval [μ − Δ, μ + Δ], for any Δ > 0, and X̄n lies in this interval for all n sufficiently large, with probability one. Moreover, g^{(3)}(X̄n*) → g^{(3)}(μ), as n → ∞. Thus Rn → 0 a.s., as n → ∞. The distribution of g(μ) + g^{(1)}(μ)(X̄n − μ) is asymptotically N(g(μ), (g^{(1)}(μ))^2 σ^2/n), and (X̄n − μ)^2 →p 0, as n → ∞. Thus, √n(g(X̄n) − g(μ)) →d N(0, σ^2 (g^{(1)}(μ))^2). Accordingly, if X̄n satisfies the CLT, an approximation to the expected value of g(X̄n) is

(1.13.23)    E{g(X̄n)} ≅ g(μ) + (σ^2/(2n)) g^{(2)}(μ).

An approximation to the variance of g(X̄n) is

(1.13.24)    V{g(X̄n)} ≅ (σ^2/n) (g^{(1)}(μ))^2.

Furthermore, expanding g(X̄n) around μ only to second order, with a Lagrange remainder,

(1.13.25)    √n (g(X̄n) − g(μ)) = √n g^{(1)}(μ)(X̄n − μ) + Dn,

where

(1.13.26)    Dn = (√n/2)(X̄n − μ)^2 g^{(2)}(X̄n**),

and |X̄n** − μ| ≤ |X̄n − μ| with probability one. Thus, since X̄n − μ → 0 a.s., as n → ∞, and since |g^{(2)}(X̄n**)| is bounded for n large, Dn →p 0, as n → ∞. Hence

(1.13.27)    √n (g(X̄n) − g(μ)) →d N(0, σ^2 (g^{(1)}(μ))^2).
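A minimal simulation sketch of (1.13.23)–(1.13.24) (assuming Python with NumPy; the choices g(x) = log x and Xj ~ Exponential(μ), for which σ^2 = μ^2, g^{(1)}(μ) = 1/μ, and g^{(2)}(μ) = −1/μ^2, are arbitrary):

    import numpy as np

    rng = np.random.default_rng(3)
    n, reps, mu = 100, 50_000, 2.0
    xbar = rng.exponential(mu, size=(reps, n)).mean(axis=1)   # replicated X̄n
    g = np.log(xbar)

    # (1.13.24): V{g(X̄n)} ≈ (sigma^2/n)(g'(mu))^2 = (mu^2/n)(1/mu)^2 = 1/n
    print(g.var(), 1.0 / n)
    # (1.13.23): E{g(X̄n)} ≈ log(mu) + (sigma^2/(2n)) g''(mu) = log(mu) - 1/(2n)
    print(g.mean(), np.log(mu) - 1.0 / (2 * n))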

1.13.5 The Symbols op and Op

Let {Xn} and {Yn} be two sequences of random variables, Yn > 0 a.s. for all n ≥ 1. We say that Xn = op(Yn), i.e., Xn is of a smaller order of magnitude than Yn in probability, if

(1.13.28)    lim_{n→∞} P{ |Xn|/Yn > ε } = 0, for every ε > 0.

We say that Xn = Op(Yn), i.e., Xn has the same order of magnitude in probability as Yn if, for all ε > 0, there exists a finite Kε such that sup_{n≥1} P{ |Xn|/Yn > Kε } < ε.

One can verify the following relations:

(1.13.29)    op(Yn) = Yn op(1);  Op(Yn) = Yn Op(1);  op(1) + op(1) = op(1);  Op(1) + op(1) = Op(1);  Op(1) · op(1) = op(1);  Op(1) + Op(1) = Op(1);  Op(1) · Op(1) = Op(1).
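For example, in the notation of the delta method, √n(X̄n − μ) = Op(1) by the CLT, so (X̄n − μ)^2 = Op(1/n) = op(n^{−1/2}), and (1.13.25) can be written as g(X̄n) = g(μ) + g^{(1)}(μ)(X̄n − μ) + op(n^{−1/2}).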

1.13.6 The Empirical Distribution and Sample Quantiles

Let X1, X2, …, Xn be i.i.d. random variables having a distribution F. The function

(1.13.30)    Fn(x) = (1/n) Σ_{i=1}^n I{Xi ≤ x},  −∞ < x < ∞,

is called the empirical distribution function (EDF).

Notice that E{I{Xi ≤ x}} = F(x). Thus, the SLLN implies that at each fixed x, Fn(x) → F(x) a.s. as n → ∞. The question is whether this convergence is uniform in x. The answer is given by

Theorem 1.13.5 (Glivenko–Cantelli) Let X1, X2, X3, … be i.i.d. random variables with distribution F. Then

(1.13.31)    P{ lim_{n→∞} sup_{−∞<x<∞} |Fn(x) − F(x)| = 0 } = 1.

For a proof, see Sen and Singer (1993, p. 185).
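A minimal simulation sketch of the Glivenko–Cantelli theorem (assuming Python with NumPy; the N(0, 1) model is an arbitrary choice) evaluates Dn = sup_x |Fn(x) − F(x)|, using the fact that the supremum is attained at the order statistics:

    import numpy as np
    from math import erf, sqrt

    rng = np.random.default_rng(5)
    Phi = np.vectorize(lambda t: 0.5 * (1.0 + erf(t / sqrt(2.0))))   # here F = Phi

    for n in (100, 1_000, 10_000):
        x = np.sort(rng.normal(size=n))            # order statistics X(1:n) ≤ ... ≤ X(n:n)
        upper = np.arange(1, n + 1) / n - Phi(x)   # Fn(X(i:n)) - F(X(i:n))
        lower = Phi(x) - np.arange(0, n) / n       # F(X(i:n)) - Fn(X(i:n)-)
        print(n, max(upper.max(), lower.max()))    # Dn decreases toward 0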

The pth sample quantile xn,p is defined as

(1.13.32)    xn,p = Fn^{−1}(p) = inf{ x : Fn(x) ≥ p },

for 0 < p < 1, where Fn(x) is the EDF. When F(x) is continuous, the points of increase of Fn(x) are the order statistics X(1:n) < ··· < X(n:n), with probability one. Also, Fn(X(i:n)) = i/n, i = 1, …, n. Thus,

(1.13.33)    xn,p = X(⌈np⌉ : n),

where ⌈a⌉ denotes the smallest integer greater than or equal to a.

Theorem 1.13.6 Let F be a continuous distribution function, and ξp = F^{−1}(p), and suppose that F(ξp) = p and, for any ε > 0, F(ξp − ε) < p < F(ξp + ε). Let X1, …, Xn be i.i.d. random variables from this distribution. Then

    xn,p → ξp a.s., as n → ∞.

For a proof, see Sen and Singer (1993, p. 167).

The following theorem establishes the asymptotic normality of xn,p.

Theorem 1.13.7 Let F(x) be an absolutely continuous distribution, with continuous p.d.f. f(x). Let 0 < p < 1, ξp = F^{−1}(p), and f(ξp) > 0. Then

(1.13.34)    √n (xn,p − ξp) →d N(0, p(1 − p)/f^2(ξp)), as n → ∞.

For a proof, see Sen and Singer (1993, p. 168).
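For the sample median of a N(0, 1) sample (p = 1/2, ξ_{1/2} = 0, f(0) = 1/√(2π)), (1.13.34) gives V{xn,1/2} ≈ (π/2)/n. A minimal check (assuming Python with NumPy; n is chosen odd so the median is an order statistic):

    import numpy as np

    rng = np.random.default_rng(11)
    n, reps = 401, 20_000
    med = np.median(rng.normal(size=(reps, n)), axis=1)
    print(n * med.var(), np.pi / 2)   # both close to pi/2 ≈ 1.5708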

The results of Theorems 1.13.6–1.13.7 will be used in Chapter 7 to establish the asymptotic relative efficiency of the sample median relative to the sample mean.
