Chapter Nine
Moment-Generating Function
9.1 Introduction/Purpose of the Chapter
We chose to introduce the generating function and the moment-generating function after the chapter on the characteristic function. This is contrary to the traditional and historical approach. In modern-day probability, characteristic functions are much more widespread simply because they exist and can be constructed for any random variable or vector, regardless of its distribution. Generating functions, on the other hand, are defined only for nonnegative integer-valued random variables, and the moment-generating function does not exist for many discrete or continuous random variables. Despite this fact, when these functions do exist, they are much easier to work with than characteristic functions (recall that the characteristic function takes values in the complex plane). Thus, we feel that these functions should be part of the toolkit of anyone applying probability concepts.
9.2 Vignette/Historical Notes
According to Roy (2011), Euler found a generating function for the Bernoulli numbers around the 1730s. That is, he defined numbers Bn such that t/(e^t − 1) = Σn≥0 (Bn/n!) t^n.
Around the same time, De Moivre independently introduced the method of generating functions in a general way (similar to our definition). He used this method to encode all binomial probabilities in a single polynomial function. Laplace (1812) applied the Laplace transform to probability, in effect creating the moment-generating function.
9.3 Theory and Applications
9.3.1 Generating Functions and Applications
First, we introduce a generating function for discrete probability distributions. This function in fact can be defined for any sequence of numbers.
Proof: We have
9.3.1.1 Generating Functions for Discrete Random Variables
Proof. The proof is an exercise. Just use the preceding lemma and calculate
in terms of pi and qi.
This definition and the lemma above explain the main use of generating functions. Recall that the probability distribution may be recovered entirely from the generating function. On the other hand, the generating function of a sum of two independent random variables is the product of the individual generating functions. Thus, while the distribution of the sum is complicated to calculate using the convolution definition, calculating the generating function of the sum is a much simpler process. Once calculated, the probability distribution of the sum may be obtained by taking derivatives of the generating function.
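The "product of generating functions" idea can be sketched with a toy example not taken from the text: the sum of two fair dice. Representing each pmf as a list of polynomial coefficients, multiplying the polynomials performs the convolution.

```python
# Sketch only: the generating function of a pmf (p_0, p_1, ...) is the
# polynomial G(s) = sum_k p_k s^k, so multiplying coefficient lists
# convolves the two distributions.

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

# pmf of one fair die: P(X = k) = 1/6 for k = 1, ..., 6
die = [0.0] + [1.0 / 6] * 6          # coefficient of s^k is P(X = k)

# generating function of the sum = product of the generating functions
two_dice = poly_mul(die, die)

# coefficient of s^7 is P(X + Y = 7) = 6/36
p7 = two_dice[7]
```

Reading off any other probability of the sum is just reading off the corresponding coefficient.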
Proof: Exercise 9.8.
9.3.1.2 Generating Functions for Simple Distributions
Below we present the generating functions for commonly encountered discrete random variables.
Once again we stress that the generating functions are very useful when working with discrete random variables (basically when they exist).
Proof. Again these results are easy to prove and are left as exercises.
9.3.1.3 Examples
In the next examples we show the utility of the generating functions.
Solution: We could compute the convolution fZ = fX ∗ fY using the formulas in the previous section. However, calculating the distribution using generating functions is easier. We have
By independence of X and Y and from the Lemma 9.3.7 above, we have
But this is the generating function of a Poisson random variable. Identifying the parameters implies that Z has a Poisson(λ + μ) distribution.
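As a numerical sanity check of this conclusion (illustration only, with parameter values of our choosing), the convolution of the Poisson(λ) and Poisson(μ) pmfs can be compared directly against the Poisson(λ + μ) pmf:

```python
import math

def poisson_pmf(k, lam):
    # P(X = k) for X ~ Poisson(lam)
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam, mu = 2.0, 3.0

# convolution of the two pmfs versus the Poisson(lam + mu) pmf
max_err = max(
    abs(sum(poisson_pmf(x, lam) * poisson_pmf(z - x, mu) for x in range(z + 1))
        - poisson_pmf(z, lam + mu))
    for z in range(12)
)
```

The discrepancy `max_err` is at the level of floating-point round-off, as the generating-function argument predicts.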
Solution: Again applying Lemma 9.7, we obtain the generating function of a Binomial(n, p) as
What if we have two independent binomially distributed random variables X ~ Binomial(m, p) and Y ~ Binomial(k, p)? Note that the probability of success p is the same for both. What is the distribution of X + Y in this case?
We see that we obtain the generating function of a Binomial random variable with parameters m + k and p.
This is of course expected. If X can be written as a sum of m Bernoulli(p) random variables and Y can be written as a sum of k Bernoulli(p) random variables, then of course X + Y is a sum of m + k Bernoulli(p) random variables. Even more intuitively, suppose X is the number of heads obtained in m tosses of a coin with probability p, and Y is the number of heads obtained in k tosses of the same coin (separate tosses). Then of course X + Y is the total number of heads in m + k tosses.
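The same kind of numerical check works here (illustration only; m, k, and p are values we picked): convolving Binomial(m, p) with Binomial(k, p) reproduces the Binomial(m + k, p) pmf.

```python
import math

def binom_pmf(j, n, p):
    # P(X = j) for X ~ Binomial(n, p)
    return math.comb(n, j) * p ** j * (1 - p) ** (n - j)

m, k, p = 4, 6, 0.3

errs = []
for z in range(m + k + 1):
    # x must satisfy 0 <= x <= m and 0 <= z - x <= k
    conv = sum(binom_pmf(x, m, p) * binom_pmf(z - x, k, p)
               for x in range(max(0, z - k), min(m, z) + 1))
    errs.append(abs(conv - binom_pmf(z, m + k, p)))
max_err = max(errs)
```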
Let us extend this result, in which we presented the distribution of a sum with a fixed number of terms, to the case when the sum contains a random number of terms. The next theorem helps with such a situation.
Now let us give an example of application of the previous theorem.
Solution: We know that
The last distribution is the distribution of the number of eggs hatching if we know how many eggs were laid.
We may then calculate the conditional expectations. Let
since that is the formula for the expectation of a Binomial(n, p) random variable. Therefore, E(K|N) = ϕ(N) = Np. Recall that E[E[X|Y]] = E[X]. Using this relation, we get
To find E(N|K), we need to find fN|K:
if n ≥ k and 0 otherwise, where q = 1 − p and for the denominator we have used the following:
since P(K = k|N = m) = 0 for any k > m (it is not possible to hatch more chicks than eggs laid).
Consequently, we immediately obtain
which gives us E(N|K) = qλ + K.
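A Monte Carlo check of E(N | K) = qλ + K is easy to run (a sketch, not from the text; λ and p are values we chose): simulate N ~ Poisson(λ) eggs, hatch each with probability p, and average N over the runs where exactly 2 eggs hatched.

```python
import math
import random

random.seed(0)
lam, p = 6.0, 0.4
q = 1 - p

def sample_poisson(rate):
    # Knuth's multiplication method; adequate for moderate rates
    threshold = math.exp(-rate)
    k, prod = 0, random.random()
    while prod >= threshold:
        prod *= random.random()
        k += 1
    return k

samples_n_given_k2 = []
for _ in range(200_000):
    n = sample_poisson(lam)
    hatched = sum(random.random() < p for _ in range(n))
    if hatched == 2:
        samples_n_given_k2.append(n)

# empirical E(N | K = 2), to be compared with q*lam + 2 = 5.6
emp = sum(samples_n_given_k2) / len(samples_n_given_k2)
```

With this many runs the empirical conditional mean lands close to qλ + 2.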
Note that in the previous derivation we essentially obtained the distribution of K. We would still have to calculate a sum to find a closed-form expression. Even though this is not that difficult, we will use generating functions to demonstrate how easy it is to calculate the distribution with them.
Let K = X1 + ⋯ + XN, where the Xi ~ Bernoulli(p). Then, using the above theorem, we obtain
Using the above equations, we conclude that
But since this is a generating function for a Poisson r.v., we immediately conclude that
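The conclusion here is that K is Poisson(λp). The composition of generating functions behind it can be verified numerically (illustration only; λ and p are values we picked): G_N(G_X(s)) collapses to exp(λp(s − 1)).

```python
import math

lam, p = 3.0, 0.25
G_N = lambda s: math.exp(lam * (s - 1))   # Poisson(lam) generating function
G_X = lambda s: (1 - p) + p * s           # Bernoulli(p) generating function

# G_N(G_X(s)) should equal the Poisson(lam * p) generating function
max_err = max(abs(G_N(G_X(s)) - math.exp(lam * p * (s - 1)))
              for s in [0.0, 0.25, 0.5, 0.75, 1.0])
```

Algebraically, G_N(G_X(s)) = exp(λ((1 − p + ps) − 1)) = exp(λp(s − 1)), so the agreement is exact.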
Solution: Exercise. Please read and understand the previous example.
9.3.2 Moment-Generating Functions. Relation with the Characteristic Functions
Generating functions are very useful when dealing with discrete random variables. To be able to study arbitrary random variables, the idea is to make the substitution s = e^t in G(s) = E(s^X). Thus, the following definition is introduced.
The moment-generating function is related to the Laplace transform of the distribution of the random variable X.
With the previous definitions it is easy to see that the moment-generating function is a sum of two Laplace transforms (provided either side exists). Specifically, suppose the random variable X has density f(x). Then,
where
and
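The decomposition the text refers to can be sketched as follows (our notation, assuming X has density f; the substitution y = −x is used in the second integral):

```latex
M_X(t) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx
       = \underbrace{\int_{0}^{\infty} e^{tx} f(x)\,dx}_{\text{Laplace transform of } f|_{(0,\infty)} \text{ at } -t}
       + \underbrace{\int_{0}^{\infty} e^{-ty} f(-y)\,dy}_{\text{Laplace transform of } y \mapsto f(-y) \text{ at } t}
```

Each summand is an ordinary one-sided Laplace transform, which is what makes Laplace-transform theorems applicable to the MGF.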
This representation is what allows us to use the theorems about the Laplace transform and, especially, the inverse Laplace transform.
Solution: Since
we get, by setting t = 0 and applying Proposition 9.13,
By differentiating MX twice, we obtain
and
Thus
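Reading moments off an MGF via derivatives can also be checked numerically (a sketch, not from the text): central finite differences approximate M′(0) = EX and M″(0) = E(X²). Here we use the Poisson(λ) MGF, M(t) = exp(λ(e^t − 1)), with λ of our choosing.

```python
import math

lam = 2.5
M = lambda t: math.exp(lam * (math.exp(t) - 1))  # Poisson(lam) MGF

h = 1e-5
m1 = (M(h) - M(-h)) / (2 * h)             # approximates M'(0) = E X = lam
m2 = (M(h) - 2 * M(0) + M(-h)) / h ** 2   # approximates M''(0) = E X^2
var = m2 - m1 ** 2                         # approximates Var X = lam
```

For the Poisson distribution both the mean and the variance equal λ, which the two approximations recover.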
Let us compute the moment-generating function for certain common probability distributions.
Proof: For every t, we have
Proof: See problem 9.3.
Proof: See exercise 9.5.
Proof: By the definition of the gamma density, we have
Use the change of variables y = x(λ − t) to conclude.
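For completeness, the computation may be sketched as follows (our notation, in the shape-rate parametrization Γ(α, λ), valid for t < λ):

```latex
M_X(t) = \int_0^\infty e^{tx}\, \frac{\lambda^\alpha}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\lambda x}\, dx
       = \frac{\lambda^\alpha}{\Gamma(\alpha)} \int_0^\infty x^{\alpha-1} e^{-(\lambda - t)x}\, dx
       = \frac{\lambda^\alpha}{\Gamma(\alpha)\,(\lambda - t)^\alpha} \int_0^\infty y^{\alpha-1} e^{-y}\, dy
       = \left( \frac{\lambda}{\lambda - t} \right)^{\alpha}
```

The last step uses the definition of the gamma function, Γ(α) = ∫₀^∞ y^{α−1} e^{−y} dy.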
9.3.3 Relationship with the Characteristic Function
If the moment-generating function of a random variable X exists, then we obviously have φX(t) = MX(it).
We recall that, just as the moment-generating function is related to the Laplace transform, the characteristic function is related to the Fourier transform.
We note that the characteristic function is a better-behaved function than the moment-generating function. The drawback is that it takes complex values and we need to have a modicum of understanding of complex analysis to be able to work with it.
9.3.4 Properties of the MGF
The following theorems are very similar to the corresponding theorems for the characteristic function. This is why, if the moment-generating function exists, it is typically easier to work with than the characteristic function. We will state these theorems without proofs. The proofs are similar to the ones given for the characteristic function.
The uniqueness theorem in fact comes from the uniqueness of the Laplace transform and inverse Laplace transform.
We note that the usual hypotheses of the theorem include the requirement that MX(t) be a moment-generating function. In fact, this is not needed: the inversion theorem stated next makes this condition obsolete. The only requirement is that the MGF be defined at all t.
Recall that the MGF can be written as a sum of two Laplace transforms. The inversion theorem is stated in terms of the Laplace transform. This is in fact the only form in which it is actually useful.
If no complex analysis knowledge is available, the typical way of computing Laplace transform and inverse Laplace is through tables.
Exercises
Problems with Solution
9.1 Suppose that X has a discrete uniform distribution on {1, 2, ..., n}, that is,
Solution: We have
By differentiating with respect to t,
and
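The MGF of the discrete uniform distribution can be summed as a geometric series; a quick numerical check (illustration only, our derivation of the closed form) compares the direct sum with M_X(t) = e^t (e^{nt} − 1) / (n (e^t − 1)) for t ≠ 0:

```python
import math

n = 7
# direct definition: M_X(t) = (1/n) * sum_{k=1}^{n} e^{kt}
M_sum = lambda t: sum(math.exp(k * t) for k in range(1, n + 1)) / n
# closed geometric-series form, valid for t != 0
M_closed = lambda t: math.exp(t) * (math.exp(n * t) - 1) / (n * (math.exp(t) - 1))

max_err = max(abs(M_sum(t) - M_closed(t)) for t in [-1.0, -0.3, 0.2, 0.5, 1.0])
```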
9.2 Suppose X has uniform distribution U[a, b] with a < b. Show that for every t ≠ 0, we have
and MX(0) = 1.
Solution: Clearly, MX(0) = 1. For t ≠ 0,
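The U[a, b] formula M_X(t) = (e^{tb} − e^{ta}) / (t(b − a)) for t ≠ 0 can be checked numerically (a sketch with values of our choosing) against a midpoint-rule approximation of E(e^{tX}):

```python
import math

a, b, t = -1.0, 2.0, 0.7
closed = (math.exp(t * b) - math.exp(t * a)) / (t * (b - a))

# midpoint rule for (1 / (b - a)) * integral_a^b e^{tx} dx
N = 100_000
width = (b - a) / N
approx = sum(math.exp(t * (a + (i + 0.5) * width)) for i in range(N)) * width / (b - a)
err = abs(closed - approx)
```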
9.3 Prove Proposition 9.17.
Solution: For every t, we have
since
(density of a normal law).
9.4 Use the moment-generating function of X ~ N(μ, σ2) to prove that
Solution: From exercise 9.3, we get
and thus
Also,
and thus E(X²) = M″X(0) = μ² + σ². Then Var(X) = E(X²) − (EX)² = σ².
9.5 Prove Proposition 9.18.
Solution: We have
for every t < λ.
9.6 Suppose that X admits a moment-generating function MX. Prove that
and
(These are called the Chernoff bounds.)
Proof: One has
and
We can use Markov's inequality to conclude.
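The Chernoff bound P(X ≥ a) ≤ e^{−ta} MX(t), valid for every t > 0, can be illustrated with a binomial example (parameters are ours; the binomial MGF (q + pe^t)^n is used, which exercise 9.7 asks the reader to derive):

```python
import math

n, p, a = 100, 0.5, 60
q = 1 - p

def chernoff_bound(t):
    # e^{-t a} * M_X(t) with M_X(t) = (q + p e^t)^n for X ~ Binomial(n, p)
    return math.exp(-t * a) * (q + p * math.exp(t)) ** n

# optimize the bound over a grid of t > 0
bound = min(chernoff_bound(i / 100) for i in range(1, 300))

# exact tail probability for comparison
exact = sum(math.comb(n, k) * p ** k * q ** (n - k) for k in range(a, n + 1))
```

For these values the optimized bound (about 0.13) indeed dominates the exact tail probability (about 0.03); the bound is loose but decays at the correct exponential rate in n.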
Problems without Solution
9.7 If X ~ Binom(n, p), use the moment-generating function to get
9.8 Prove Proposition 9.8.
9.9 Show that
for any θ ∈ [0, 2π], using generating functions.
9.10 Find the generating functions, both ordinary and moment-generating function, for the following discrete probability distributions.
9.11 Let X, Y be two random variables such that
and
9.12 Let X be a discrete r.v. with values in {0, 1, ..., n}. Let MX denote its moment-generating function. Find in terms of MX the moment-generating functions of
9.13 Use Proposition 9.19 to show that if X ~ Γ(a, λ) and Y ~ Γ(b, λ) are two independent random variables, then
9.14 Suppose X has the density
where a > 0. This means that X follows a Pareto distribution (power law).
Show that EXn exists if and only if a > n.
9.15 Suppose X ~ Poisson(a) with a > 0. Use exercise 9.6 to show that
9.16 Let X, Y be two random variables with moment-generating functions MX and MY respectively. Show that X and Y have the same distribution if and only if
for any t.
9.17 Let X ~ Exp(λ). Calculate EX3 and EX4. Give a general formula for EXn.
9.18 Using the properties of the MGF, show that if Xi are i.i.d. Exp(λ) random variables, then
is distributed as a gamma random variable with parameters n and λ.
9.19 Prove the following facts about Laplace transform:
where Γ(n) denotes the gamma function calculated at n.
9.20 Suppose f* denotes the Laplace transform of the function f. Prove the following properties:
9.21 Suppose that X1 ~ Exp(λ1) and X2 ~ Exp(λ2) are independent. Given that the inverse Laplace transform of the function 1/(s + c) is e−cx and that the inverse Laplace transform is linear, calculate the p.d.f. of X1 + X2.
9.22 Repeat the problem above with three exponentially distributed random variables, that is, Xi ~ Exp(λi), i = 1, 2, 3. As a note, the distribution of a sum of n such independent exponentials is called the Erlang distribution and is heavily used in queuing theory.