Chapter Nine

Moment-Generating Function

9.1 Introduction/Purpose of the Chapter

We chose to introduce the generating function and the moment-generating function after the chapter on the characteristic function. This is contrary to the traditional and historical approach. In modern-day probability the characteristic function is much more widespread simply because it exists and can be constructed for any random variable or vector regardless of its distribution. On the other hand, generating functions are defined only for discrete random variables taking nonnegative integer values, and the moment-generating function does not exist for many discrete or continuous random variables. Despite this fact, when these functions exist, they are much easier to work with than the characteristic function (recall that the characteristic function takes values in the complex plane). Thus, we feel that a knowledge of these functions should not be missing from the toolkit of anyone applying probability concepts.

9.2 Vignette/Historical Notes

According to Roy (2011), Euler found a generating function for the Bernoulli numbers around the 1730s. That is, he defined numbers Bn such that

\frac{x}{e^x - 1} = \sum_{n=0}^{\infty} \frac{B_n}{n!} x^n.

Around the same time, DeMoivre independently introduced the method of generating functions in a general way (similar to our definition). He used this method to encode all binomial probabilities in a simple polynomial function. Laplace (1812) applied the Laplace transform to probability, in effect creating the moment-generating function.

9.3 Theory and Applications

9.3.1 Generating Functions and Applications

First, we introduce a generating function for discrete probability distributions. This function in fact can be defined for any sequence of numbers.


Definition 9.1. (Regular generating function) Let a = {a_i}, i ∈ {0, 1, 2, ...}, be a sequence of real numbers. Define the generating function of the sequence a as

G_a(s) = \sum_{i=0}^{\infty} a_i s^i.

The domain of definition of the function G_a(s) is made of the values s for which the series converges.


Remark 9.2 Please note that we can obtain all the terms of the sequence if we know the generating function. Specifically,

a_i = \frac{G_a^{(i)}(0)}{i!},

where we denote by G_a^{(i)}(x_0) the i-th derivative of the function G_a calculated at x_0.


Example 9.1 A simple generating function
Take the sequence a as
equation
Then its generating function is
equation
provided that the power term converges to 0. This holds whenever |s| < 1, for every α; the radius of convergence may of course be larger, depending on the value of α. The DeMoivre theorem may be proven easily by taking derivatives of this expression and evaluating them at s = 0.


Definition 9.3 The convolution of two sequences {a_i} and {b_i} is the sequence {c_i} with

c_i = \sum_{k=0}^{i} a_k b_{i-k}.

We write c = a * b.


Lemma 9.4 If two sequences a and b have generating functions G_a and G_b, then the generating function of the convolution c = a * b is

G_c(s) = G_a(s) G_b(s).

Proof: We have

G_c(s) = \sum_{i=0}^{\infty} c_i s^i = \sum_{i=0}^{\infty} \left( \sum_{k=0}^{i} a_k b_{i-k} \right) s^i = \left( \sum_{k=0}^{\infty} a_k s^k \right) \left( \sum_{j=0}^{\infty} b_j s^j \right) = G_a(s) G_b(s).

□

9.3.1.1 Generating Functions for Discrete Random Variables


Definition 9.5 (Probability-generating function (discrete)) If X is a discrete random variable with outcomes in {0, 1, 2, ...} and corresponding probabilities p_i = P(X = i), then we define

G_X(s) = E\left( s^X \right) = \sum_{i=0}^{\infty} p_i s^i.

This is the probability-generating function of the discrete random variable X.


Remark 9.6 The first observation we want to make is that the definition applies only to discrete random variables with nonnegative integer values. However, the definition can be extended to any discrete random variable taking values in a countable set. Recall that all countable sets have the same cardinality, so to use the definition we first relabel the outcomes of the random variable as x0 → 0, x1 → 1, x2 → 2, and so on; the definition above then applies.


Lemma 9.7 Suppose that two independent discrete random variables X and Y have associated probability distributions p = {p_n} and q = {q_n} and generating functions G_X(s) = \sum_i p_i s^i and G_Y(s) = \sum_i q_i s^i, respectively. Then the random variable X + Y has as its distribution the convolution of the two probability sequences p * q as in Definition 9.3, and its generating function is

G_{X+Y}(s) = G_X(s) G_Y(s).

Proof. The proof is an exercise. Just use the preceding lemma and calculate

P(X + Y = n)

in terms of p_i and q_i.

□

This definition and the lemma above explain the main use of generating functions. Recall that the probability distribution may be recovered entirely from the generating function. On the other hand, the generating function of a sum of two random variables is the product of the individual generating functions. Thus, while the distribution of the sum is complicated to calculate using the convolution definition, calculating the generating function of the sum is a much simpler process. Once calculated, the probability distribution of the sum may be obtained by taking derivatives of the generating function.
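To make this concrete, here is a minimal Python sketch (using the sympy library; the fair-die example and the variable names are purely illustrative, not part of the text) that multiplies two probability-generating functions and reads the distribution of the sum off the coefficients of the product:

    import sympy as sp

    s = sp.symbols('s')

    # Generating function of a fair six-sided die: G(s) = (s + s**2 + ... + s**6)/6
    G_die = sum(sp.Rational(1, 6) * s**k for k in range(1, 7))

    # By Lemma 9.4, the generating function of the sum of two independent dice
    # is the product of the individual generating functions
    G_sum = sp.expand(G_die * G_die)

    # The coefficient of s**k is P(X + Y = k), exactly the convolution of the two pmfs
    probs = {k: G_sum.coeff(s, k) for k in range(2, 13)}
    print(probs[7])               # 1/6
    print(sum(probs.values()))    # sanity check: the probabilities sum to 1

The coefficient of s^7 comes out to 1/6, which agrees with the familiar convolution computation for two dice.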


Proposition 9.8 (Properties of the probability-generating function) Let GX(s), GY(s) be the generating functions of two discrete random variables X and Y.
i. There exists R ≥ 0, a radius of convergence, such that the series is convergent if |s| < R and divergent if |s| > R. Therefore, the corresponding generating function GX is defined at least on the interval (−R, R).
ii. The generating function GX(s) is differentiable at any s within the domain |s| < R.
iii. If the two generating functions satisfy GX(s) = GY(s) = G(s) for all |s| < R, where R = min {RX, RY}, then X and Y have the same distribution. Furthermore, the common distribution may be calculated from the generating function using

P(X = k) = \frac{G^{(k)}(0)}{k!}, \qquad k = 0, 1, 2, \ldots

iv. As particular cases of the definition, we readily obtain that

G_X(0) = p_0 = P(X = 0) \quad \text{and} \quad G_X(1) = \sum_{i=0}^{\infty} p_i = 1.

The last relationship implies that the probability-generating function always exists for s = 1; thus the radius of convergence satisfies R ≥ 1.

Proof: Exercise 9.8.

□

9.3.1.2 Generating Functions for Simple Distributions

Below we present the generating functions for commonly encountered discrete random variables.

Once again we stress that the generating functions are very useful when working with discrete random variables (basically when they exist).

i. Constant variable, identically equal to c (i.e., P(X = c) = 1):

G_X(s) = E\left( s^X \right) = s^c.

ii. Bernoulli(p):

G_X(s) = (1 - p) + ps.

iii. Geometric(p):

G_X(s) = \frac{ps}{1 - (1 - p)s} \qquad \left( \text{under the convention } P(X = k) = p(1-p)^{k-1},\ k = 1, 2, \ldots \right).

iv. Poisson(λ):

G_X(s) = \sum_{k=0}^{\infty} e^{-\lambda} \frac{\lambda^k}{k!} s^k = e^{\lambda(s - 1)}.
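As a quick symbolic check of item iv, the following Python/sympy sketch (our own illustration; the symbol names are arbitrary) sums the series defining the Poisson generating function:

    import sympy as sp

    s = sp.symbols('s')
    lam = sp.symbols('lam', positive=True)
    k = sp.symbols('k', integer=True, nonnegative=True)

    # G_X(s) = sum_k e^{-lam} * lam**k / k! * s**k
    G = sp.summation(sp.exp(-lam) * lam**k / sp.factorial(k) * s**k, (k, 0, sp.oo))
    print(sp.simplify(G))    # exp(lam*(s - 1)), possibly displayed as exp(lam*s - lam)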


Lemma 9.9 If X has generating function G(s), then:
i. E(X) = G′(1).
ii. E[X(X − 1) ⋯ (X − k + 1)] = G^{(k)}(1), the k-th factorial moment of X.

Proof. Again these results are easy to prove and are left as exercises.

□

9.3.1.3 Examples

In the next examples we show the utility of the generating functions.


Example 9.2 Using Generating Function for Poisson Distribution
Let X, Y be two independent Poisson random variables with parameters λ and μ. What is the distribution of Z = X + Y?

Solution: We could compute the convolution f_Z = f_X * f_Y using the formulas in the previous section. However, calculating the distribution using generating functions is easier. We have

G_X(s) = e^{\lambda(s-1)}, \qquad G_Y(s) = e^{\mu(s-1)}.

By independence of X and Y and from Lemma 9.7 above, we have

G_Z(s) = G_X(s) G_Y(s) = e^{\lambda(s-1)} e^{\mu(s-1)} = e^{(\lambda + \mu)(s-1)}.

But this is the generating function of a Poisson random variable. Identifying the parameter implies that Z has a Poisson(λ + μ) distribution.

□
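The same conclusion can be checked mechanically. The sketch below (illustrative only, using sympy) multiplies the two Poisson generating functions and recovers P(Z = k) from the derivatives of the product at s = 0, as in Proposition 9.8(iii):

    import sympy as sp

    s = sp.symbols('s')
    lam, mu = sp.symbols('lam mu', positive=True)

    G_X = sp.exp(lam * (s - 1))
    G_Y = sp.exp(mu * (s - 1))
    G_Z = G_X * G_Y

    k = 3   # any fixed k works the same way
    p_k = sp.diff(G_Z, s, k).subs(s, 0) / sp.factorial(k)
    poisson_pmf = sp.exp(-(lam + mu)) * (lam + mu)**k / sp.factorial(k)
    print(sp.simplify(p_k - poisson_pmf))    # 0, so P(Z = 3) matches the Poisson(lam + mu) pmf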


Example 9.3 Using Generating Function for Binomial Distribution
We know that a Binomial(n, p) random variable X may be written as X = X1 + ⋯ + Xn, where the Xi's are independent Bernoulli(p) random variables. What is its generating function? What is the distribution of a sum of binomial random variables?

Solution: Again applying Lemma 9.7, we obtain the generating function of a Binomial(n, p) as

G_X(s) = \left( G_{X_1}(s) \right)^n = (1 - p + ps)^n.

What if we have two independent binomially distributed random variables X ~ Binomial(m, p) and Y ~ Binomial(k, p)? Note that the probability of success p is the same for both. What is the distribution of X + Y in this case?

G_{X+Y}(s) = G_X(s) G_Y(s) = (1 - p + ps)^m (1 - p + ps)^k = (1 - p + ps)^{m+k}.

We see that we obtain the generating function of a binomial random variable with parameters m + k and p.

This is of course expected. If X can be written as a sum of m Bernoulli(p) random variables and Y can be written as a sum of k Bernoulli(p) random variables, then of course X + Y is a sum of m + k Bernoulli(p) random variables. Even more intuitively, suppose X is the number of heads obtained in m tosses of a coin with probability p of heads, and Y is the number of heads obtained in k further tosses of the same coin (separate tosses). Then of course X + Y is the total number of heads in m + k tosses.

□

Let us expand on these results, in which we presented the distribution of a sum of a fixed number of terms. Consider now the case when the sum contains a random number of terms. The next theorem helps with such a situation.


Theorem 9.10 Let X1, X2, ... denote a sequence of i.i.d. random variables with common generating function GX(s). Let N be an integer-valued random variable independent of the Xi's, with generating function GN(s). Then the random variable

T = X_1 + X_2 + \cdots + X_N

has generating function

G_T(s) = G_N\left( G_X(s) \right).

Now let us give an example of application of the previous theorem.


Example 9.4 A Chicken and Eggs Problem
Suppose a hen lays N eggs. We assume that N has a Poisson distribution with parameter λ. Each egg then hatches with probability p, independently of all other eggs. Let K be the number of chicks that hatch. Find E(K|N), E(N|K), and E(K). Furthermore, find the distribution of K.

Solution: We know that

N \sim \text{Poisson}(\lambda), \qquad K \mid N = n \sim \text{Binomial}(n, p).

The last distribution is the distribution of the number of eggs hatching if we know how many eggs were laid.

We may then calculate the conditional expectations. Let

\varphi(n) = E(K \mid N = n) = np,

since that is the formula for the expectation of a Binomial(n, p) random variable. Therefore, E(K|N) = ϕ(N) = Np. Recall that E[E[X|Y]] = E[X]. Using this relation, we get

E(K) = E\left[ E(K \mid N) \right] = E(Np) = \lambda p.

To find E(N|K), we need to find fN|K:

f_{N|K}(n \mid k) = \frac{P(K = k \mid N = n) P(N = n)}{P(K = k)} = \frac{\binom{n}{k} p^k q^{n-k}\, e^{-\lambda} \frac{\lambda^n}{n!}}{e^{-\lambda p} \frac{(\lambda p)^k}{k!}} = e^{-\lambda q} \frac{(\lambda q)^{n-k}}{(n-k)!}

if n ≥ k and 0 otherwise, where q = 1 − p and for the denominator we have used the following:

P(K = k) = \sum_{m \geq k} P(K = k \mid N = m) P(N = m) = \sum_{m \geq k} \binom{m}{k} p^k q^{m-k}\, e^{-\lambda} \frac{\lambda^m}{m!} = e^{-\lambda p} \frac{(\lambda p)^k}{k!},

since P(K = k|N = m) = 0 for any k > m (it is not possible to hatch more chicks than eggs laid).

Consequently, we immediately obtain

N - k \mid \{K = k\} \sim \text{Poisson}(\lambda q),

which gives us E(N|K) = λq + K.

Note that in the previous derivation we essentially obtained the distribution of K: we still had to calculate a sum to find the closed expression for the denominator. Even though this is not that difficult, we will use generating functions to demonstrate how easy it is to calculate the distribution with them.

Let K = X1 + ⋯ + XN, where the Xi ~ Bernoulli(p). Then, using the above theorem, we obtain

G_K(s) = G_N\left( G_{X_1}(s) \right).

Using the above equations, we conclude that

G_K(s) = e^{\lambda\left( G_{X_1}(s) - 1 \right)} = e^{\lambda(1 - p + ps - 1)} = e^{\lambda p (s - 1)}.

But since this is a generating function for a Poisson r.v., we immediately conclude that

K \sim \text{Poisson}(\lambda p).

□
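A quick Monte Carlo experiment (our own illustration; the parameter values are arbitrary) confirms the conclusion that K is Poisson(λp):

    import numpy as np

    rng = np.random.default_rng(0)
    lam, p, trials = 6.0, 0.3, 200_000

    N = rng.poisson(lam, size=trials)    # eggs laid in each repetition
    K = rng.binomial(N, p)               # chicks hatched: K | N ~ Binomial(N, p)

    print(K.mean(), lam * p)             # both approximately 1.8
    print(K.var(), lam * p)              # for a Poisson law the variance equals the mean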


Example 9.5 A Quantum Mechanics Extension
In an experiment a laser beam emits photons with an intensity of λ photons/second. Each photon travels through a deflector that polarizes (spin up) each photon with probability p. A measuring device is placed after the deflector. This device measures the number of photons with spin up (the rest are spin down). Let N denote the number of photons emitted by the laser beam in a second. Let K denote the number of photons measured by the capturing device. Assume that N is distributed as a Poisson random variable with mean λ. Assume that each photon is polarized independently of any other photon. Give the distribution of K and its parameters. In particular, calculate the average intensity recorded by the measuring device.

Solution: Exercise. Please read and understand the previous example.

□

9.3.2 Moment-Generating Functions. Relation with the Characteristic Functions

Generating functions are very useful when dealing with discrete random variables. To be able to study arbitrary random variables, the idea is to make the transformation s = e^t in G(s) = E(s^X). Thus, the following definition is introduced.


Definition 9.11. (Moment-generating function) The moment-generating function of a random variable X is the function MX given by

M_X(t) = E\left( e^{tX} \right),

defined for the values of t for which MX(t) < ∞.

The moment-generating function is related to the Laplace transform of the distribution of the random variable X.


Definition 9.12 (Laplace transform of the function f) Suppose we are given a positive function f(x). Assume that

\int_0^{\infty} e^{-t_0 x} f(x)\, dx < \infty

for some value t_0 > 0. Then the Laplace transform of f is defined as

f^*(t) = \int_0^{\infty} e^{-t x} f(x)\, dx

for all t > t_0. If the random variable X has distribution function F, the following defines the Laplace–Stieltjes transform:

F^*(t) = \int_0^{\infty} e^{-t x}\, dF(x),

again with the same caution about the condition for existence.

With the previous definitions it is easy to see that the moment-generating function is a sum of two Laplace-type transforms (provided the integrals exist). Specifically, suppose the random variable X has density f(x). Then

M_X(t) = \int_{-\infty}^{\infty} e^{t x} f(x)\, dx = I_1(t) + I_2(t),

where

I_1(t) = \int_0^{\infty} e^{t x} f(x)\, dx

is the Laplace transform of f restricted to the positive half-line, evaluated at −t, and

I_2(t) = \int_{-\infty}^{0} e^{t x} f(x)\, dx = \int_0^{\infty} e^{-t x} f(-x)\, dx

is the Laplace transform of x ↦ f(−x), evaluated at t. This decomposition is generally needed to use the theorems related to Laplace and especially inverse Laplace transforms.


Proposition 9.13 (Properties) Suppose that the moment-generating function MX(t) of a random variable X is finite for all t in an open interval containing 0.
i. If the moment-generating function is differentiable, then the first moment is E(X) = MX′(0); in general, if the moment-generating function is k times differentiable, then E(X^k) = MX^(k)(0).
ii. More generally, if the variable X is in L^p (the first p moments exist), then we may write (Taylor expansion of e^x):

M_X(t) = \sum_{k=0}^{p} \frac{E(X^k)}{k!} t^k + o(t^p).

iii. If X, Y are two independent random variables, then MX+Y(t) = MX(t)MY(t).
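As a small illustration of property (i) of Proposition 9.13, the following sympy sketch (illustrative only) differentiates the MGF M(t) = λ/(λ − t) of the Exp(λ) distribution (derived later, in Proposition 9.18) and reads off the mean and the variance:

    import sympy as sp

    t = sp.symbols('t')
    lam = sp.symbols('lam', positive=True)
    M = lam / (lam - t)                   # MGF of Exp(lam), valid for t < lam

    EX  = sp.diff(M, t, 1).subs(t, 0)     # first moment
    EX2 = sp.diff(M, t, 2).subs(t, 0)     # second moment
    print(sp.simplify(EX))                # 1/lam
    print(sp.simplify(EX2 - EX**2))       # variance 1/lam**2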


Remark 9.14 It is the last property in the above proposition that makes it desirable to work with moment-generating functions. However, as mentioned, these functions have a big disadvantage when compared with characteristic functions: the expectation, and thus the integrals involved, may not be finite, and therefore the function may not exist. Obviously, point iii in Proposition 9.13 generalizes to any finite sum of independent random variables.


Remark 9.15 Another important fact concerning the moment-generating function is that, as in the case of the characteristic function, it characterizes the law of a random variable completely. See Exercise 9.16.


Example 9.6 Calculating Moments Using the M.G.F.
Let X be an r.v. with moment-generating function
equation
Derive the expectation and the variance of X.

Solution: Since

equation

we get, for t = 0 and using Proposition 9.13,

equation

By differentiating MX twice, we obtain

equation

and

equation

Thus

equation

□

Let us compute the moment-generating function for certain common probability distributions.


Proposition 9.16 Suppose X ~ N(0, 1). Then

M_X(t) = e^{t^2/2}

for every t ∈ ℝ.

Proof: For every t ∈ ℝ, we have

M_X(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{t x} e^{-x^2/2}\, dx = e^{t^2/2} \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(x - t)^2/2}\, dx = e^{t^2/2}.

□
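Readers who wish to verify this computation symbolically can run the short sympy sketch below (the tooling choice is ours, not the text's):

    import sympy as sp

    x, t = sp.symbols('x t', real=True)

    # E[e^{tX}] for X ~ N(0, 1): integrate e^{tx} against the standard normal density
    M = sp.integrate(sp.exp(t * x) * sp.exp(-x**2 / 2) / sp.sqrt(2 * sp.pi),
                     (x, -sp.oo, sp.oo))
    print(sp.simplify(M))    # exp(t**2/2)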


Proposition 9.17 If X ~ N(μ, σ²), prove that for every t ∈ ℝ we have

M_X(t) = e^{\mu t + \sigma^2 t^2/2}.

Proof: See problem 9.3.

□


Proposition 9.18 If X ~ Exp(λ) with λ > 0, then

M_X(t) = \frac{\lambda}{\lambda - t} \qquad \text{for } t < \lambda.

Proof: See exercise 9.5.

□


Proposition 9.19 If X ~ Γ(a, λ) (gamma distribution with parameters a, λ > 0), then the moment-generating function MX exists for t < λ, and in this case we have

M_X(t) = \left( \frac{\lambda}{\lambda - t} \right)^a.

Proof: By the definition of the gamma density, we have

M_X(t) = \int_0^{\infty} e^{t x} \frac{\lambda^a}{\Gamma(a)} x^{a-1} e^{-\lambda x}\, dx = \frac{\lambda^a}{\Gamma(a)} \int_0^{\infty} x^{a-1} e^{-(\lambda - t) x}\, dx.

Use the change of variables y = x(λ − t) to conclude.

□
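The integral in the proof can also be checked with sympy. In the sketch below we write u for λ − t (assumed positive, i.e., t < λ); this substitution is our own bookkeeping device, not the text's notation:

    import sympy as sp

    x, a, lam, u = sp.symbols('x a lam u', positive=True)   # u stands for lam - t > 0

    # E[e^{tX}] = lam**a / Gamma(a) * integral of x**(a-1) * e^{-(lam - t) x} over (0, oo)
    M = sp.integrate(lam**a * x**(a - 1) * sp.exp(-u * x) / sp.gamma(a), (x, 0, sp.oo))
    print(sp.simplify(M))    # lam**a * u**(-a), i.e. (lam/(lam - t))**a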

9.3.3 Relationship with the Characteristic Function

If the moment-generating function of a random variable X exists, then we obviously have

\varphi_X(t) = M_X(it),

where φ_X denotes the characteristic function of X.

We recall that, just as the moment-generating function is related to the Laplace transform, the characteristic function is related to the Fourier transform.

We note that the characteristic function is a better-behaved function than the moment-generating function. The drawback is that it takes complex values and we need to have a modicum of understanding of complex analysis to be able to work with it.

9.3.4 Properties of the MGF

The following theorems are very similar to the corresponding theorems for the characteristic function. This is why, if the moment-generating function exists, it is typically easier to work with it than with the characteristic function. We will state these theorems without proofs. The proofs are similar to the ones given for the characteristic function.


Theorem 9.20 (Uniqueness theorem) If the MGF of X exists for all t in an open interval containing 0, then the MGF uniquely determines the c.d.f. of X. That is, no two different distributions can have the same values of the MGF on an interval containing 0.

The uniqueness theorem in fact comes from the uniqueness of the Laplace transform and inverse Laplace transform.


Theorem 9.21. (Continuity theorem) Let X1, X2, ... be a sequence of random variables with c.d.f.'s F_{X_n} and moment-generating functions M_{X_n} (which are defined for all t). Suppose that

M_{X_n}(t) \to M_X(t)

for all t as n → ∞. Then MX(t) is the moment-generating function of some random variable X. If FX(x) denotes the c.d.f. of X, then

F_{X_n}(x) \to F_X(x)

for all x which are continuity points of FX.

We note that the usual hypotheses of this theorem include the requirement that MX(t) be a moment-generating function. This is in fact not needed: the inversion theorem stated next makes this condition obsolete. The only requirement is that the limit MX(t) be defined for all t.

Recall that the MGF can be written as a sum of two Laplace transforms. The inversion theorem is stated in terms of the Laplace transform. This is in fact the only form in which it actually is useful.


Theorem 9.22 (Inversion theorem) Let f be a function with Laplace transform denoted by f*, that is,

f^*(s) = \int_0^{\infty} e^{-s x} f(x)\, dx.

Then we have

f(x) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} e^{s x} f^*(s)\, ds,

where c is a constant chosen such that it is greater than the real part of all singularities of f*. The formula above gives f as the inverse Laplace transform, f = L^{-1}(f^*).


Remark 9.23 In practice, computing the complex integral is done using the residue theorem (Cauchy's residue theorem). If all singularities of f* have negative real part, or there is no singularity, then c can be chosen equal to 0. In this case the integral formula of the inverse Laplace transform becomes identical to the inverse Fourier transform (given in the chapter about the characteristic function, Chapter 8).

If no complex analysis knowledge is available, the typical way of computing Laplace transforms and inverse Laplace transforms is through tables.
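When symbolic software is available, tables can often be replaced by a computer algebra system. The sketch below (an illustration of tooling, not something the text prescribes) uses sympy's inverse_laplace_transform to recover the Exp(λ) density from its Laplace transform λ/(λ + s):

    import sympy as sp

    s, x = sp.symbols('s x', positive=True)
    lam = sp.symbols('lam', positive=True)

    f_star = lam / (lam + s)                        # Laplace transform of lam*exp(-lam*x)
    f = sp.inverse_laplace_transform(f_star, s, x)
    print(sp.simplify(f))                           # lam*exp(-lam*x)*Heaviside(x)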

Exercises

Problems with Solution

9.1 Suppose that X has a discrete uniform distribution on {1, 2, ..., n}, that is,

P(X = i) = \frac{1}{n}

for every i = 1, ..., n.
a. Compute the moment-generating function of X.
b. Deduce that

equation

Solution: We have

M_X(t) = E\left( e^{tX} \right) = \frac{1}{n} \sum_{i=1}^{n} e^{i t} = \frac{e^{t}\left( e^{n t} - 1 \right)}{n \left( e^{t} - 1 \right)}.

By differentiating with respect to t,

equation

and

equation

□

9.2 Suppose X has uniform distribution U[a, b] with a < b. Show that for every t ≠ 0 we have

M_X(t) = \frac{e^{t b} - e^{t a}}{t (b - a)},

and MX(0) = 1.

Solution: Clearly, MX(0) = 1. For t ≠ 0,

M_X(t) = \int_a^b \frac{e^{t x}}{b - a}\, dx = \frac{e^{t b} - e^{t a}}{t (b - a)}.

□
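For completeness, the integral above can be reproduced with sympy (an illustrative sketch only; the symbol names are ours):

    import sympy as sp

    x, a, b = sp.symbols('x a b', real=True)
    t = sp.symbols('t', positive=True)    # any t != 0 behaves the same way

    M = sp.integrate(sp.exp(t * x) / (b - a), (x, a, b))
    print(sp.simplify(M))    # (exp(b*t) - exp(a*t))/(t*(b - a))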

9.3 Prove Proposition 9.17.

Solution: For every t, we have

M_X(t) = E\left( e^{tX} \right) = \int_{-\infty}^{\infty} e^{t x} \frac{1}{\sigma\sqrt{2\pi}} e^{-(x - \mu)^2/(2\sigma^2)}\, dx = e^{\mu t + \sigma^2 t^2/2} \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} e^{-(x - \mu - \sigma^2 t)^2/(2\sigma^2)}\, dx = e^{\mu t + \sigma^2 t^2/2},

since

\int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} e^{-(x - \mu - \sigma^2 t)^2/(2\sigma^2)}\, dx = 1

(density of a normal law).

□

9.4 Use the moment-generating function of X ~ N(μ, σ²) to prove that

E(X) = \mu \quad \text{and} \quad \text{Var}(X) = \sigma^2.

Solution: From Exercise 9.3, we get

M_X'(t) = \left( \mu + \sigma^2 t \right) e^{\mu t + \sigma^2 t^2/2},

and thus

E(X) = M_X'(0) = \mu.

Also,

M_X''(t) = \left[ \left( \mu + \sigma^2 t \right)^2 + \sigma^2 \right] e^{\mu t + \sigma^2 t^2/2},

and thus E(X²) = MX″(0) = μ² + σ². Then

\text{Var}(X) = E(X^2) - \left( E(X) \right)^2 = \mu^2 + \sigma^2 - \mu^2 = \sigma^2.

□

9.5 Prove Proposition 9.18.

Solution: We have

M_X(t) = \int_0^{\infty} e^{t x} \lambda e^{-\lambda x}\, dx = \lambda \int_0^{\infty} e^{-(\lambda - t) x}\, dx = \frac{\lambda}{\lambda - t}

for every t < λ.

□

9.6 Suppose that X admits a moment-generating function MX. Prove that

P(X \geq a) \leq e^{-t a} M_X(t) \qquad \text{for every } t > 0,

and

P(X \leq a) \leq e^{-t a} M_X(t) \qquad \text{for every } t < 0.

(These are called the Chernoff bounds.)

Proof: One has, for t > 0,

P(X \geq a) = P\left( e^{t X} \geq e^{t a} \right),

and, for t < 0,

P(X \leq a) = P\left( e^{t X} \geq e^{t a} \right).

We can use Markov's inequality to conclude.

□
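A small numerical illustration (our own; it uses the standard normal MGF e^{t²/2} from Proposition 9.16) shows how the Chernoff bound compares with the exact tail probability:

    import math

    a = 2.0
    t = a                                       # exp(-t*a + t**2/2) is minimized over t > 0 at t = a
    bound = math.exp(-t * a + t**2 / 2)         # Chernoff bound exp(-a**2/2)
    exact = 0.5 * math.erfc(a / math.sqrt(2))   # P(X >= 2) for X ~ N(0, 1)
    print(exact, bound)                         # approx 0.0228 <= approx 0.1353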

Problems without Solution

9.7 If X ~ Binom(n, p), use the moment generating function to get

equation

9.8 Prove Proposition 9.8.

9.9 Show that

equation

for any n ∈ ℕ and θ ∈ [0, 2π], using generating functions.

9.10 Find both the ordinary generating function and the moment-generating function for the following discrete probability distributions.

a. The distribution describing a fair coin.
b. The distribution describing a fair die.
c. The distribution describing a die that always comes up 3.

9.11 Let X, Y be two random variables such that

equation

and

equation

a. Show that X and Y have the same first and second moments, but not the same third and fourth moments.
b. Calculate the moment generating functions of X and Y.

9.12 Let X be a discrete r.v. with values in {0, 1, ..., n}. Let MX denote its moment-generating function. Find in terms of MX the moment-generating functions of

a. X.
b. X + 1.
c. 3X.
d. aX + b.

9.13 Use Proposition 9.19 to show that if X ~ Γ(a, λ) and Y ~ Γ(b, λ) are two independent random variables, then

X + Y \sim \Gamma(a + b, \lambda).

9.14 Suppose X has the density

equation

where a > 0. This means that X follows a Pareto distribution (power law).

a. Show that MX(t) = ∞ for every t > 0.
b. Show that

E(X^n) < \infty

if and only if a > n.

9.15 Suppose X ~ Poisson(a) with a > 0. Use Exercise 9.6 to show that

equation

9.16 Let X, Y be two random variables with moment-generating functions MX and MY, respectively. Show that X and Y have the same distribution if and only if

M_X(t) = M_Y(t)

for any t ∈ ℝ.

9.17 Let X ~ Exp(λ). Calculate E(X^3) and E(X^4). Give a general formula for E(X^n).

9.18 Using the properties of the MGF, show that if the Xi are i.i.d. Exp(λ) random variables, then

X_1 + X_2 + \cdots + X_n

is distributed as a gamma random variable with parameters n and λ.

9.19 Prove the following facts about Laplace transform:

a.

equation

b.

equation

where Γ(n) denotes the gamma function evaluated at n.

9.20 Suppose f* denotes the Laplace transform of the function f. Prove the following properties:

a.

equation

b.

equation

where f(0+) denotes the right limit of f at 0.
c.

equation

9.21 Suppose that X1 ~ Exp(λ1) and X2 ~ Exp(λ2) are independent. Given that the inverse Laplace transform of the function 1/(s − c) is e^{cx} and that the inverse Laplace transform is linear, calculate the p.d.f. of X1 + X2.

9.22 Repeat the problem above with three exponentially distributed random variables, that is, Xi ~ Exp(λi), i = 1, 2, 3. As a note, the distribution of a sum of n such independent exponentials is called the Erlang distribution and is heavily used in queuing theory.
