CHAPTER 3

Sufficient Statistics and the Information in Samples

PART I: THEORY

3.1 INTRODUCTION

The problem of statistical inference is to draw conclusions from the observed sample about characteristics of interest of the parent distribution of the random variables under consideration. For this purpose we formulate a model that presents our assumptions about the family of distributions to which the parent distribution belongs. For example, in an inventory management problem one of the important variables is the number of units of a certain item demanded every period by the customers. This is a random variable with an unknown distribution. We may be ready to assume that the distribution of the demand variable is Negative Binomial NB(ψ, ν). The statistical model specifies the possible range of the parameters, called the parameter space, and the corresponding family of distributions ℱ. In this example of an inventory system, the model may be

ℱ = {NB(ψ, ν); 0 < ψ < 1, 0 < ν < ∞}.

Such a model represents the case where the two parameters, ψ and ν, are unknown. The parameter space here is Θ = {(ψ, ν); 0 < ψ < 1, 0 < ν < ∞}. Given a sample of n independent and identically distributed (i.i.d.) random variables X1, …, Xn, representing the weekly demand, the question is what can be said about the specific values of ψ and ν from the observed sample.

Every sample contains a certain amount of information on the parent distribution. Intuitively we understand that the larger the number of observations in the sample (on i.i.d. random variables), the more information it contains on the distribution under consideration. Later in this chapter we discuss two specific information functions, which are used in the statistical design of experiments and in data analysis. We start with the question of whether the sample data can be condensed, by computing first the values of certain statistics, without losing information. If such statistics exist they are called sufficient statistics. The term statistic will be used to indicate a function of the (observable) random variables that does not involve any function of the unknown parameters. The sample mean, the sample variance, the sample order statistics, etc., are examples of statistics. As will be shown, the notion of sufficiency of a statistic depends strongly on the model under consideration. For example, in the inventory example just mentioned, if the value of the parameter ν is known, a sufficient statistic is the sample mean X̄, as will be established later. On the other hand, if ν is unknown, the sufficient statistic is the order statistic (X(1), …, X(n)); the sample mean X̄ by itself then does not contain all the information on ψ and ν. In the following section we provide a definition of sufficiency relative to a specified model and give a few examples; a small numerical illustration of the NB claim is sketched below.
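The following is a minimal simulation sketch (an added illustration, not part of the original text). It assumes SciPy's nbinom parameterization, in which nbinom.pmf(x, ν, ψ) is the probability of x failures before the ν-th success, matching NB(ψ, ν) above; the two data vectors are arbitrary. With ν known, samples with equal sums yield log-likelihood curves in ψ that differ only by a constant, i.e., equivalent likelihoods:

```python
# Sketch: with nu known, the NB(psi, nu) log-likelihood depends on the
# sample only through sum(x). Two samples with equal sums therefore give
# log-likelihood curves that differ by a constant in psi.
import numpy as np
from scipy.stats import nbinom

nu = 3.0
x1 = np.array([0, 2, 5, 1, 4])   # sum = 12
x2 = np.array([3, 3, 3, 3, 0])   # sum = 12 as well

psi_grid = np.linspace(0.05, 0.95, 19)
ll1 = np.array([nbinom.logpmf(x1, nu, p).sum() for p in psi_grid])
ll2 = np.array([nbinom.logpmf(x2, nu, p).sum() for p in psi_grid])

# The difference is constant in psi, so the two likelihoods are equivalent.
print(np.ptp(ll1 - ll2))  # ~0 up to floating-point error
```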

3.2 DEFINITION AND CHARACTERIZATION OF SUFFICIENT STATISTICS

3.2.1 Introductory Discussion

Let X = (X1, …, Xn) be a random vector having a joint c.d.f. Fθ(x) belonging to a family ℱ = {Fθ(x); θ ∈ Θ}. Such a random vector may consist of n i.i.d. variables or of dependent random variables. Let T(X) = (T1(X), …, Tr(X))′, 1 ≤ r ≤ n, be a statistic based on X. T could be real (r = 1) or vector valued (r > 1). The transformations Tj(X), j = 1, …, r, are not necessarily one–to–one. Let f(x; θ) denote the (joint) probability density function (p.d.f.) of X. In our notation here Ti(X) is a concise expression for Ti(X1, …, Xn). Similarly, Fθ(x) and f(x; θ) represent the multivariate functions Fθ(x1, …, xn) and f(x1, …, xn; θ). As in the previous chapter, we assume throughout the present chapter that all the distribution functions belonging to the same family are either absolutely continuous, discrete, or mixtures of the two types.

Definition of Sufficiency. Let ℱ be a family of distribution functions and let X = (X1, …, Xn) be a random vector having a distribution in ℱ. A statistic T(X) is called sufficient with respect to ℱ if the conditional distribution of X given T(X) is the same for all the elements of ℱ.

Accordingly, if the joint p.d.f. of X, f(x; θ), depends on a parameter θ and T(X) is a sufficient statistic with respect to ℱ, the conditional p.d.f. h(x | t) of X given {T(X) = t} is independent of θ. Since f(x; θ) = h(x | t) g(t; θ), where g(t; θ) is the p.d.f. of T(X), all the information on θ in x is summarized in T(x).

The process of checking whether a given statistic is sufficient for some family directly from the above definition may often be very tedious. Generally, the identification of sufficient statistics is done by applying the following theorem. This celebrated theorem was first given by Fisher (1922) and Neyman (1935). We state the theorem here in terms appropriate for families of absolutely continuous or discrete distributions. For more general formulations see Section 3.2.2. For the purposes of our presentation we require that the family of distributions ℱ consists of

(i) absolutely continuous distributions; or
(ii) discrete distributions, having jumps on a set of points {ξ1, ξ2, …} independent of θ, i.e., Σ_j Pθ{X = ξj} = 1 for all θ ∈ Θ; or
(iii) mixtures of distributions satisfying (i) or (ii). Such families of distributions will be called regular (Bickel and Doksum, 1977, p. 61).

The families of discrete or absolutely continuous distributions discussed in Chapter 2 are all regular.

Theorem 3.2.1 (The Neyman–Fisher Factorization Theorem). Let X be a random vector having a distribution belonging to a regular family ℱ and having a joint p.d.f. f(x; θ), θ ∈ Θ. A statistic T(X) is sufficient for ℱ if and only if

(3.2.1) f(x; θ) = K(x) g(T(x); θ),

where K(x) ≥ 0 is independent of θ and g(T(x); θ) ≥ 0 depends on x only through T(x).

Proof. We provide here a proof for the case of discrete distributions.

(i) Sufficiency:

We show that (3.2.1) implies that the conditional distribution of X given {T(X) = t} is independent of θ. The (marginal) p.d.f. of T(X) is, according to (3.2.1),

(3.2.2) g*(t; θ) = Σ_{x: T(x) = t} f(x; θ) = g(t; θ) Σ_{x: T(x) = t} K(x).

The joint p.d.f. of X and T(X) is

(3.2.3) f(x, t; θ) = K(x) g(t; θ) I{T(x) = t}.

Hence, the conditional p.d.f. of X, given {T(X) = t}, at every point t such that g*(t; θ) > 0, is

(3.2.4) h(x | t) = f(x, t; θ)/g*(t; θ) = K(x) I{T(x) = t} / Σ_{x′: T(x′) = t} K(x′),

which is independent of θ. This proves that T(X) is sufficient for ℱ.

(ii) Necessity:

Suppose that T(X) is sufficient for ℱ. Then, for every t at which the (marginal) p.d.f. of T(X), g*(t; θ), is positive we have

(3.2.5) Pθ{X = x | T(X) = t} = B(x),

where B(x) ≥ 0 is independent of θ. Moreover, Σ_{x: T(x) = t} B(x) = 1, since (3.2.5) is a conditional p.d.f. Thus, for every x,

(3.2.6) Pθ{X = x, T(X) = T(x)} = g*(T(x); θ) B(x).

Finally, since for every x,

(3.2.7) f(x; θ) = Pθ{X = x} = Pθ{X = x, T(X) = T(x)},

we obtain that

(3.2.8) f(x; θ) = g*(T(x); θ) B(x),

which is of the form (3.2.1).        QED
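The discrete case just proved can be checked by simulation. The following minimal sketch (an added illustration, not from the original text) uses i.i.d. Poisson variables: conditional on T = Σ Xi = t, the sample is multinomial with equal cell probabilities 1/n, whatever λ is (Example 3.2 below derives this in closed form); n, t, and the two λ values are arbitrary choices.

```python
# Simulation check of sufficiency for i.i.d. Poisson samples: conditional
# on T = sum(X) = t, the vector X is multinomial(t, 1/n) for every lambda.
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(0)
n, t = 3, 6

def cond_freq_x1(lam, reps=200_000):
    """Empirical P{X1 = k | T = t}, k = 0..t, under Poisson(lam) sampling."""
    x = rng.poisson(lam, size=(reps, n))
    keep = x[x.sum(axis=1) == t]
    return np.bincount(keep[:, 0], minlength=t + 1) / len(keep)

# The same conditional distribution appears for very different lambdas ...
print(cond_freq_x1(0.5))
print(cond_freq_x1(4.0))
# ... and matches Binomial(t, 1/n), the marginal of one multinomial cell.
print(binom.pmf(np.arange(t + 1), t, 1.0 / n))
```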

3.2.2 Theoretical Formulation

3.2.2.1 Distributions and Measures

We generalize the definitions and proofs of this section by providing a measure–theoretic formulation. Some of these concepts were discussed in Chapter 1. This material can be skipped by students who have not had real analysis.

Let (Ω, 𝒜, P) be a probability space. A random variable X is a finite real-valued measurable function on this probability space, i.e., X: Ω → ℝ. Let 𝒳 be the sample space (range of X), i.e., 𝒳 = X(Ω). Let ℬ be the Borel σ–field on 𝒳, and consider the probability space (𝒳, ℬ, P_X) where, for each B ∈ ℬ, P_X{B} = P{X^{−1}(B)}. Since X is a random variable, 𝒜_X = {A: A = X^{−1}(B), B ∈ ℬ} ⊂ 𝒜.

The distribution function of X is

(3.2.9) F_X(x) = P{X ≤ x} = P_X{(−∞, x]},  −∞ < x < ∞.

Let X1, X2, …, Xn be n random variables defined on the same probability space (Ω, 𝒜, P). The joint distribution of X = (X1, …, Xn)′ is a real-valued function on ℝ^n defined as

(3.2.10) F_X(x1, …, xn) = P{X1 ≤ x1, …, Xn ≤ xn}.

Consider the probability space (𝒳^{(n)}, ℬ^{(n)}, P^{(n)}) where 𝒳^{(n)} = 𝒳 × ··· × 𝒳, ℬ^{(n)} = ℬ × ··· × ℬ (or the Borel σ–field generated by the intervals (−∞, x1] × ··· × (−∞, xn], (x1, …, xn) ∈ ℝ^n) and, for B ∈ ℬ^{(n)},

(3.2.11) P^{(n)}{B} = P{ω: (X1(ω), …, Xn(ω)) ∈ B}.

A function h: 𝒳^{(n)} → ℝ is said to be ℬ^{(n)}–measurable if the sets h^{−1}((−∞, ζ]) are in ℬ^{(n)} for all −∞ < ζ < ∞. By the notation h ∈ ℬ^{(n)} we mean that h is ℬ^{(n)}–measurable.

A random sample of size n is the realization of n i.i.d. random variables (see Chapter 2 for definition of independence).

To economize in notation, we will denote by bold x the vector (x1, …, xn), and by F(x) the joint distribution of (X1, …, Xn). Thus, for all B ∈ ℬ^{(n)},

(3.2.12) P_F{B} = ∫_B dF(x).

This is a probability measure on (𝒳^{(n)}, ℬ^{(n)}) induced by F(x). Generally, a σ–finite measure μ on ℬ^{(n)} is a nonnegative real-valued set function, i.e., μ: ℬ^{(n)} → [0, ∞], such that

(i) μ(∅) = 0;
(ii) if {Bi} is a sequence of mutually disjoint sets in ℬ^{(n)}, i.e., Bi ∩ Bj = ∅ for any i ≠ j, then

μ(∪_i Bi) = Σ_i μ(Bi);

(iii) there exists a partition {B1, B2, …} of 𝒳^{(n)} for which μ(Bi) < ∞ for all i = 1, 2, … .

The Lebesgue measure is a σ–finite measure on ℬ^{(n)}.

If there is a countable set of marked points in ℝ^n, S = {x1, x2, x3, …}, the counting measure is

N(B; S) = #{i: xi ∈ B},  B ∈ ℬ^{(n)}.

N(B; S) is a σ–finite measure, and for any finite real-valued function g(x)

∫_B g(x) dN(x; S) = Σ_{i: xi ∈ B} g(xi).

Notice that if B is such that N(B; S) = 0 then ∫_B g(x) dN(x; S) = 0. Similarly, if B is such that ∫_B dx = 0 then, for any positive integrable function g(x), ∫_B g(x) dx = 0. Moreover, ν(B) = ∫_B g(x) dx and λ(B) = ∫_B g(x) dN(x; S) are σ–finite measures on (𝒳^{(n)}, ℬ^{(n)}).

Let ν and μ be two σ–finite measures defined on (𝒳^{(n)}, ℬ^{(n)}). We say that ν is absolutely continuous with respect to μ if μ(B) = 0 implies that ν(B) = 0. We denote this relationship by ν ≪ μ. If ν ≪ μ and μ ≪ ν, we say that ν and μ are equivalent, ν ≡ μ. We will use the notation F ≪ μ if the probability measure P_F is absolutely continuous with respect to μ. If F ≪ μ there exists a nonnegative function f(x), which is ℬ^{(n)}–measurable, satisfying

(3.2.13) P_F{B} = ∫_B f(x) dμ(x),  for all B ∈ ℬ^{(n)}.

f(x) is called the (Radon–Nikodym) derivative of F with respect to μ, or the generalized density (p.d.f.) of F(x). We write

(3.2.14) f(x) = dF(x)/dμ(x),

or

(3.2.15) dF(x) = f(x) dμ(x).

As discussed earlier, a statistical model is represented by a family ℱ of distribution functions Fθ on 𝒳^{(n)}, θ ∈ Θ. The family ℱ is dominated by a σ–finite measure μ if Fθ ≪ μ for each θ ∈ Θ.

We consider only models of dominated families. A theorem in measure theory states that if ℱ ≪ μ then there exists a countable sequence {F_{θi}} ⊂ ℱ such that

(3.2.16) F*(x) = Σ_{i=1}^∞ λ_i F_{θi}(x),  λ_i > 0, Σ_i λ_i = 1,

induces a probability measure P*, which dominates ℱ.

A statistic T(X) is a measurable function of the data X. More precisely, let T: 𝒳^{(n)} → 𝒯^{(k)}, k ≥ 1, and let ℬ^{(k)} be the Borel σ–field of subsets of 𝒯^{(k)}. The function T(X) is a statistic if, for every C ∈ ℬ^{(k)}, T^{−1}(C) ∈ ℬ^{(n)}. Let ℬ_T = {B: B = T^{−1}(C) for C ∈ ℬ^{(k)}}. The probability measure P^T on ℬ^{(k)}, induced by P_X, is given by

(3.2.17) P^T{C} = P_X{T^{−1}(C)},  C ∈ ℬ^{(k)}.

Thus, the induced distribution function of T is F^T(t), where t ∈ ℝ^k and

(3.2.18) F^T(t) = P^T{(−∞, t1] × ··· × (−∞, tk]}.

If F ≪ μ then F^T ≪ μ^T, where μ^T(C) = μ(T^{−1}(C)) for all C ∈ ℬ^{(k)}. The generalized density (p.d.f.) of T with respect to μ^T is g^T(t), where

(3.2.19) g^T(t) = dF^T(t)/dμ^T(t).

If h(x) is ℬ^{(n)} measurable and ∫ |h(x)| dF(x) < ∞, then the conditional expectation of h(X) given {T(X) = t} is a ℬ_T measurable function, E_F{h(X) | T(X) = t}, for which

(3.2.20) ∫_{T^{−1}(C)} h(x) dF(x) = ∫_C E_F{h(X) | T(X) = t} dF^T(t)

for all C ∈ ℬ^{(k)}. In particular, if C = 𝒯^{(k)} we obtain the law of the iterated expectation; namely,

(3.2.21) E_F{h(X)} = E_F{E_F{h(X) | T(X)}}.

Notice that E_F{h(X) | T(X)} assumes a constant value on the coset A(t) = {x: T(x) = t} = T^{−1}({t}), t ∈ 𝒯^{(k)}.

3.2.2.2 Sufficient Statistics

Consider a statistical model (𝒳^{(n)}, ℬ^{(n)}, ℱ) where ℱ is a family of joint distributions of the random sample. A statistic T: (𝒳^{(n)}, ℬ^{(n)}, ℱ) → (𝒯^{(k)}, ℬ^{(k)}, ℱ^T) is called sufficient for ℱ if, for all B ∈ ℬ^{(n)}, P_F{B | T(X) = t} = p(B; t) for all F ∈ ℱ. That is, the conditional distribution of X given T(X) is the same for all F in ℱ. Moreover, for a fixed t, p(·; t) is a probability measure on ℬ^{(n)} and, for a fixed B, p(B; ·) is ℬ^{(k)} measurable.

Theorem 3.2.2 Let (𝒳^{(n)}, ℬ^{(n)}, ℱ) be a statistical model with ℱ ≪ μ. Let {F_{θi}} ⊂ ℱ be such that F*(x) = Σ_i λ_i F_{θi}(x) and ℱ ≪ P*. Then T(X) is sufficient for ℱ if and only if for each θ ∈ Θ there exists a ℬ_T measurable function g_θ(T(x)) such that, for each B ∈ ℬ^{(n)},

(3.2.22) P_θ{B} = ∫_B g_θ(T(x)) dF*(x),

i.e.,

(3.2.23) dF_θ(x)/dF*(x) = g_θ(T(x)).

Proof. (i) Assume that T(X) is sufficient for ℱ. Accordingly, for each B ∈ ℬ^{(n)},

P_θ{B | T(X) = t} = p(B; t)

for all θ ∈ Θ. Fix B in ℬ^{(n)} and let C ∈ ℬ^{(k)}. Then

P_θ{B ∩ T^{−1}(C)} = ∫_C p(B; t) dF_θ^T(t)

for each θ ∈ Θ. In particular,

P*{B ∩ T^{−1}(C)} = ∫_C p(B; t) dF*^T(t).

By the Radon–Nikodym Theorem, since F_θ ≪ F* for each θ, there exists a ℬ_T measurable function g_θ(T(X)) so that, for every C ∈ ℬ^{(k)},

P_θ{T^{−1}(C)} = ∫_C g_θ(t) dF*^T(t).

Now, for B ∈ ℬ^{(n)} and θ ∈ Θ,

P_θ{B} = ∫ p(B; t) dF_θ^T(t) = ∫ p(B; t) g_θ(t) dF*^T(t) = E*{I_B(X) g_θ(T(X))} = ∫_B g_θ(T(x)) dF*(x).

Hence, dF_θ(x)/dF*(x) = g_θ(T(x)), which is ℬ_T measurable.

(ii) Assume that there exists a ℬ_T measurable function g_θ(T(x)) so that, for each θ ∈ Θ,

dF_θ(x) = g_θ(T(x)) dF*(x).

Let A ∈ ℬ^{(n)} and C ∈ ℬ^{(k)}. Then

P_θ{A ∩ T^{−1}(C)} = ∫_{A ∩ T^{−1}(C)} g_θ(T(x)) dF*(x) = ∫_C P*{A | T(X) = t} g_θ(t) dF*^T(t) = ∫_C P*{A | T(X) = t} dF_θ^T(t).

Thus, P_θ{A | T(X)} = P*{A | T(X)} for all θ ∈ Θ. Therefore T is a sufficient statistic.        QED

Theorem 3.2.3 (Abstract Formulation of the Neyman–Fisher Factorization Theorem) Let (𝒳^{(n)}, ℬ^{(n)}, ℱ) be a statistical model with ℱ ≪ μ. Then T(X) is sufficient for ℱ if and only if

(3.2.24) dF_θ(x)/dμ(x) = h(x) g_θ(T(x)),  [μ]-a.e.,

where h ≥ 0 and h ∈ ℬ^{(n)}, g_θ ∈ ℬ_T.

Proof. Since ℱ ≪ μ, there exists {F_{θi}} ⊂ ℱ such that F*(x) = Σ_i λ_i F_{θi}(x) dominates ℱ. Hence, by the previous theorem, T(X) is sufficient for ℱ if and only if there exists a ℬ_T measurable function g_θ(T(x)) so that

dF_θ(x) = g_θ(T(x)) dF*(x).

Let f_{θi}(x) = dF_{θi}(x)/dμ(x) and set h(x) = Σ_i λ_i f_{θi}(x). The function h(x) ∈ ℬ^{(n)} and

dF_θ(x)/dμ(x) = g_θ(T(x)) dF*(x)/dμ(x) = g_θ(T(x)) h(x).        QED

3.3 LIKELIHOOD FUNCTIONS AND MINIMAL SUFFICIENT STATISTICS

Consider a vector X = (X1, …, Xn)′ of random variables having a joint c.d.f. Fθ(x) belonging to a family ℱ = {Fθ(x); θ ∈ Θ}. It is assumed that ℱ is a regular family of distributions, i.e., ℱ ≪ μ, and, for each θ ∈ Θ, there exists f(x; θ) such that

dFθ(x) = f(x; θ) dμ(x).

f(x; θ) is the joint p.d.f. of X with respect to μ(x). We define over the parameter space Θ a class of functions L(θ; X), called likelihood functions. The likelihood function of θ associated with a vector of random variables X is defined, up to a positive factor of proportionality, as

(3.3.1) L(θ; X) ∝ f(X; θ).

The factor of proportionality in (3.3.1) may depend on X but not on θ. Accordingly, we say that two likelihood functions L1(θ; X) and L2(θ; X) are equivalent, i.e., L1(θ; X) ∼ L2(θ; X), if L1(θ; X) = A(X) L2(θ; X), where A(X) is a positive function independent of θ. For example, suppose that X = (X1, …, Xn)′ is a vector of i.i.d. random variables having a N(θ, 1) distribution, −∞ < θ < ∞. The likelihood function of θ can be defined as

(3.3.2) L1(θ; X) = exp{−(1/2) Σ_{i=1}^n (Xi − θ)²} = exp{−(1/2) Q(X) − (n/2)(X̄ − θ)²},

where Q(X) = Σ_{i=1}^n (Xi − X̄)², X̄ = (1/n) 1′X and 1′ = (1, …, 1), or as

(3.3.3) L2(θ; X) = exp{−(n/2)(X̄ − θ)²},

where X̄ = (1/n) Σ_{i=1}^n Xi. We see that, for a given value of X, L1(θ; X) ∼ L2(θ; X), with A(X) = exp{−(1/2) Q(X)}. All the equivalent versions of a likelihood function L(θ; X) belong to the same equivalence class. They all represent similar functions of θ.

If S(X) is a statistic having a p.d.f. gS(s; θ), θ ∈ Θ, then the likelihood function of θ given S(X) = s is LS(θ; s) ∝ gS(s; θ). LS(θ; s) may or may not have a shape similar to L(θ; X). From the Factorization Theorem we obtain that if L(θ; X) ∼ LS(θ; S(X)), for all X, then S(X) is a sufficient statistic for ℱ. The information on θ given by X can thus be reduced to S(X) without changing the factor of the likelihood function that depends on θ. This factor is called the kernel of the likelihood function. In terms of the above example, if T(X) = X̄ then, since X̄ ∼ N(θ, 1/n), LT(θ; t) = exp{−(n/2)(t − θ)²}. Thus, for all x such that T(x) = t, LT(θ; t) ∼ L1(θ; x) ∼ L2(θ; x), and X̄ is indeed a sufficient statistic. The likelihood function LT(θ; T(x)) associated with any sufficient statistic for ℱ is equivalent to the likelihood function L(θ; x) associated with X. Thus, if T(X) is a sufficient statistic, then the likelihood ratio

L(θ; X) / LT(θ; T(X))

is independent of θ. A sufficient statistic T(X) is called minimal if it is a function of any other sufficient statistic S(X). The question is how to determine whether a sufficient statistic T(X) is minimal sufficient.

Every statistic S(X) induces a partition of the sample space 𝒳^{(n)} of the observable random vector X. Such a partition is a collection of disjoint sets whose union is 𝒳^{(n)}. Each set in this partition is determined so that all its elements yield the same value of S(X). Conversely, every partition of 𝒳^{(n)} corresponds to some function of X. Consider now the partition whose sets contain only x points having equivalent likelihood functions. More specifically, let x0 be a point in 𝒳^{(n)}. A coset of x0 in this partition is

(3.3.4) C(x0) = {x: L(θ; x) ∼ L(θ; x0)}.

The partition of 𝒳^{(n)} is obtained by varying x0 over all the points of 𝒳^{(n)}. We call this partition the equivalent–likelihood partition. For example, in the N(θ, 1) case, −∞ < θ < ∞, each coset consists of vectors x having the same mean x̄ = (1/n) 1′x. These means index the cosets of the equivalent–likelihood partition. The statistic T(X) corresponding to the equivalent–likelihood partition is called the likelihood statistic. This statistic is an index of the likelihood function L(θ; x). We show now that the likelihood statistic T(X) is a minimal sufficient statistic (m.s.s.).

Let x(1) and x(2) be two different points and let T(x) be the likelihood statistic. Then, T(x(1)) = T(x(2)) if and only if L(θ; x(1)) ∼ L(θ ;x(2)). Accordingly, L(θ; X) is a function of T(X), i.e., f(X;θ) = A(X) g*(T(X); θ). Hence, by the Factorization Theorem, T(X) is a sufficient statistic. If S(X) is any other sufficient statistic, then each coset of S(X) is contained in a coset of T(X). Indeed, if x(1) and x(2) are such that S(x(1)) = S(x(2)) and f(x(i), θ) > 0 (i = 1,2), we obtain from the Factorization Theorem that f(x(1); θ) = k(x(1)) g(S(x(1)); θ) = k(x(1))g(S(x(2)); θ) = k(x(1)) f(x(2); θ)/k(x(2)), where k(x(2)) > 0. That is, L(θ; X(1)) ∼ L(θ; X(2)) and hence T(X(1)) = T(X(2)). This proves that T(X) is a function of S(X) and therefore minimal sufficient.

The minimal sufficient statistic can be determined by determining the likelihood statistic or, equivalently, by determining the partition of 𝒳^{(n)} having the property that f(x(1); θ)/f(x(2); θ) is independent of θ for every two points in the same coset. A small numerical sketch of this criterion follows.
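As an added illustration (not from the original text), the coset criterion can be checked on a grid of θ values for the N(θ, 1) family: the log-likelihood difference between two sample points is constant in θ exactly when their sample means agree. The sample vectors and the θ grid below are arbitrary choices.

```python
# Coset criterion for minimal sufficiency in the N(theta, 1) family:
# x and y lie in the same coset iff their log-likelihood difference
# is constant in theta, which happens iff their sample means are equal.
import numpy as np
from scipy.stats import norm

def log_lik_diff(x, y, thetas):
    return np.array([norm.logpdf(x, th, 1).sum() - norm.logpdf(y, th, 1).sum()
                     for th in thetas])

thetas = np.linspace(-3, 3, 13)
x1 = np.array([0.0, 1.0, 2.0])   # mean 1.0
x2 = np.array([1.5, 1.5, 0.0])   # mean 1.0 -> same coset
x3 = np.array([0.0, 0.0, 0.0])   # mean 0.0 -> different coset

print(np.ptp(log_lik_diff(x1, x2, thetas)))  # ~0: ratio free of theta
print(np.ptp(log_lik_diff(x1, x3, thetas)))  # > 0: depends on theta
```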

3.4 SUFFICIENT STATISTICS AND EXPONENTIAL TYPE FAMILIES

In Section 2.16 we discussed the k–parameter exponential type family of distributions. If X1, …, Xn are i.i.d. random variables having a k–parameter exponential type distribution, then the joint p.d.f. of X = (X1, …, Xn), in its canonical form, is

(3.4.1) f(x; ψ) = (∏_{i=1}^n h(xi)) exp{Σ_{j=1}^k ψj Σ_{i=1}^n Uj(xi) − n K(ψ)}.

It follows that T(X) = (Σ_{i=1}^n U1(Xi), …, Σ_{i=1}^n Uk(Xi)) is a sufficient statistic. The statistic T(X) is minimal sufficient if the parameters {ψ1, …, ψk} are linearly independent. Otherwise, by reparametrization we can reduce the number of natural parameters and obtain an m.s.s. that is a function of T(X).

Dynkin (1951) investigated the conditions under which the existence of an m.s.s. that is a nontrivial reduction of the sample data implies that the family of distributions, ℱ, is of the exponential type. The following regularity conditions are called Dynkin's Regularity Conditions. In Dynkin's original paper, condition (iii) required only piecewise continuous differentiability. Brown (1964) showed that this is insufficient. We phrase (iii) as required by Brown.

Dynkin’s Regularity Conditions

(i) The family ℱ = {Fθ(x); θ ∈ Θ} is a regular parametric family. Θ is an open subset of the Euclidean space R^k.
(ii) If f(x; θ) is the p.d.f. of Fθ(x), then χ = {x; f(x; θ) > 0} is independent of θ.
(iii) The p.d.f.s f(x; θ) are such that, for each θ ∈ Θ, f(x; θ) is a continuously differentiable function of x over χ.
(iv) The p.d.f.s f(x; θ) are differentiable with respect to θ for each x ∈ χ.

Theorem 3.4.1 (Dynkin's). If the family ℱ is regular in the sense of Dynkin, and if for a sample of n ≥ k i.i.d. random variables U1(X), …, Uk(X) are linearly independent sufficient statistics, then the p.d.f. of X is

f(x; θ) = h(x) exp{Σ_{j=1}^k ψj(θ) Uj(x) + C(θ)},

where the functions ψ1(θ), …, ψk(θ) are linearly independent.

For a proof of this theorem and further reading on the subject, see Dynkin (1951), Brown (1964), Denny (1967, 1969), Tan (1969), Schmetterer (1974, p. 215), and Zacks (1971, p. 60). The connection between sufficient statistics and the exponential family was further investigated by Borges and Pfanzagl (1965), and Pfanzagl (1972). A one dimensional version of the theorem is proven in Schervish (1995, p. 109).

3.5 SUFFICIENCY AND COMPLETENESS

A family of distribution functions ℱ = {Fθ(x); θ ∈ Θ} is called complete if, for any integrable function h(X),

(3.5.1) Eθ{h(X)} = 0, for all θ ∈ Θ,

implies that Pθ[h(X) = 0] = 1 for all θ ∈ Θ.

A statistic T(X) is called a complete sufficient statistic if it is sufficient for a family ℱ, and if the family ℱ_T of all the distributions of T(X) corresponding to the distributions in ℱ is complete.

Minimal sufficient statistics are not necessarily complete. To see this, consider the family of distributions of Example 3.6 with ξ1 = ξ2 = ξ. It is a four–parameter, exponential–type distribution and the m.s.s. is

T = (X, X², Y, Y²).

The family ℱ_T is incomplete since Eθ{X − Y} = 0 for all θ = (ξ, σ1, σ2), but Pθ{X − Y = 0} = 0 for all θ. The reason for this incompleteness is that when ξ1 = ξ2 the four natural parameters are not independent. Notice that in this case the parameter space Ω = {ψ = (ψ1, ψ2, ψ3, ψ4); ψ1 = ψ2 ψ3/ψ4} is three–dimensional.

Theorem 3.5.1 If the parameter space Ω corresponding to a k–parameter exponential type family is k–dimensional, then the family of distributions of the minimal sufficient statistic is complete.

The proof of this theorem is based on the analyticity of integrals of the type (2.16.4). For details, see Schervish (1995, p. 108).

From this theorem we immediately deduce that the following families are complete.

1. B(N, θ), 0 < θ < 1; N fixed.
2. P(λ), 0 < λ < ∞.
3. NB(ψ, ν), 0 < ψ < 1; ν fixed.
4. G(λ, ν), 0 < λ < ∞, 0 < ν < ∞.
5. β(p, q), 0 < p, q < ∞.
6. N(μ, σ²), −∞ < μ < ∞, 0 < σ < ∞.
7. M(N, θ), θ = (θ1, …, θk), 0 < Σ_i θi < 1; N fixed.
8. N(μ, V), μ ∈ ℝ^k; V positive definite.

We define now the weaker notion of boundedly complete families. These are families for which, if h(X) is a bounded function and Eθ{h(X)} = 0 for all θ ∈ Θ, then Pθ{h(X) = 0} = 1 for all θ ∈ Θ. For an example of a boundedly complete family that is incomplete, see Fraser (1957, p. 25).

Theorem 3.5.2 (Bahadur). If T(X) is a boundedly complete sufficient statistic, then T(X) is minimal.

Proof. Suppose that S(X) is a sufficient statistic. If S(X) = ψ(T(X)) then, for any Borel set B ∈ ℬ,

Unnumbered Display Equation

Define

Unnumbered Display Equation

By the law of iterated expectation, Eθ{h(T)} = 0 for all θ ∈ Θ. But since T(X) is boundedly complete,

Unnumbered Display Equation

Hence, T ∈ ℬ_S, which means that T is a function of S. Hence T(X) is an m.s.s.        QED

3.6 SUFFICIENCY AND ANCILLARITY

A statistic A(X) is called ancillary if its distribution does not depend on the particular parameter(s) specifying the distribution of X. For example, suppose that X ∼ N(θ1n, In), −∞ < θ < ∞. The statistic U = (X2 − X1, …, Xn − X1) is distributed like N(0_{n−1}, I_{n−1} + J_{n−1}). Since the distribution of U does not depend on θ, U is ancillary for the family ℱ = {N(θ1, I), −∞ < θ < ∞}. If S(X) is a sufficient statistic for a family ℱ, the inference on θ can be based on the likelihood based on S. If fS(s; θ) is the p.d.f. of S, and if A(X) is ancillary for ℱ, with p.d.f. h(a), one could write

(3.6.1) fS(s; θ) = ∫ fθ(s | a) h(a) dμ(a),

where fθ(s | a) is the conditional p.d.f. of S given {A = a}. One could claim that, for inferential objectives, one should consider the family of conditional p.d.f.s ℱ_{S|A} = {fθ(s | a), θ ∈ Θ}. However, the following theorem shows that if S is a complete sufficient statistic, conditioning on A(X) does not yield anything different, since fS(s; θ) = fθ(s | a), with probability one, for each θ ∈ Θ.

Theorem 3.6.1 (Basu's Theorem). Let X = (X1, …, Xn)′ be a vector of i.i.d. random variables with a common distribution belonging to ℱ = {Fθ(x), θ ∈ Θ}. Let T(X) be a boundedly complete sufficient statistic for ℱ. Furthermore, suppose that A(X) is an ancillary statistic. Then T(X) and A(X) are independent.

Proof. Let C ∈ ℬ_A, where ℬ_A is the Borel σ–subfield induced by A(X). Since the distribution of A(X) is independent of θ, we can determine P{A(X) ∈ C} without any information on θ. Moreover, the conditional probability P{A(X) ∈ C | T(X)} is independent of θ, since T(X) is a sufficient statistic. Hence, P{A(X) ∈ C | T(X)} − P{A(X) ∈ C} is a statistic depending on T(X). According to the law of the iterated expectation,

(3.6.2) Eθ{P{A(X) ∈ C | T(X)} − P{A(X) ∈ C}} = 0, for all θ ∈ Θ.

Finally, since T(X) is boundedly complete,

(3.6.3) P{A(X) ∈ C | T(X)} = P{A(X) ∈ C},

with probability one for each θ. Thus, A(X) and T(X) are independent.        QED

From Basu's Theorem we can deduce that an inference on θ conditional on an ancillary statistic can be meaningful only if the sufficient statistic S(X) is incomplete for ℱ. An example of such inference is given in Example 3.10. A simulation sketch of the theorem itself follows.
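The theorem is easy to visualize by Monte Carlo. The following minimal sketch (an added illustration, not from the original text) uses the location family {N(μ, 1)}: the sample mean is a complete sufficient statistic, the sample variance is location invariant and hence ancillary, so by Basu's Theorem the two are independent; the empirical correlation is near zero.

```python
# Basu's theorem illustrated for the location family N(mu, 1): the sample
# mean (complete sufficient) and sample variance (ancillary) are independent.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(2.0, 1.0, size=(100_000, 10))   # 100,000 samples of size 10
xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)
print(np.corrcoef(xbar, s2)[0, 1])  # ~0: consistent with independence
```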

3.7 INFORMATION FUNCTIONS AND SUFFICIENCY

In this section, we discuss two types of information functions used in statistical analysis: the Fisher information function and the Kullback–Leibler information function. These two information functions are somewhat related but are designed to fulfill different roles. The Fisher information function is applied in various estimation problems, while the Kullback–Leibler information function has direct applications in the theory of testing hypotheses. Other types of information functions, based on the log-likelihood function, are discussed by Basu (1975) and Barndorff–Nielsen (1978).

3.7.1 The Fisher Information

We start with the Fisher information and consider parametric families of distribution functions with p.d.f.s f(x; θ), θ ∈ Θ, which depend only on one real parameter θ. A generalization to vector-valued parameters is provided later.

Definition 3.7.1. The Fisher information function for a family ℱ = {F(x; θ); θ ∈ Θ}, where dF(x; θ) = f(x; θ) dμ(x), is

(3.7.1) I(θ) = Eθ{((∂/∂θ) log f(X; θ))²}.

Notice that according to this definition, (∂/∂θ) log f(x; θ) should exist with probability one under Fθ, and its second moment should exist. The random variable (∂/∂θ) log f(X; θ) is called the score function. In Example 3.11 we show a few cases.

We develop now some properties of the Fisher information when the density functions in ℱ satisfy the following set of regularity conditions.

(i) Θ is an open interval on the real line (could be the whole line);
(ii) (∂/∂θ) f(x; θ) exists (finite) for every x and every θ ∈ Θ.
(iii) For each θ in Θ there exist a δ > 0 and a positive integrable function G(x; θ) such that, for all θ′ in (θ − δ, θ + δ),

(3.7.2) |f(x; θ′) − f(x; θ)| / |θ′ − θ| ≤ G(x; θ).

(iv) 0 < Eθ{((∂/∂θ) log f(X; θ))²} < ∞ for each θ ∈ Θ.

One can show that under condition (iii) (using the Lebesgue Dominated Convergence Theorem)

(3.7.3) Eθ{(∂/∂θ) log f(X; θ)} = ∫ (∂/∂θ) f(x; θ) dμ(x) = (∂/∂θ) ∫ f(x; θ) dμ(x) = 0,

for all θ ∈ Θ. Thus, under these regularity conditions, the score function has mean zero and

I(θ) = Vθ{(∂/∂θ) log f(X; θ)}.

This may not be true if conditions (3.7.2) do not hold. Example 3.11 illustrates such a case, where X ∼ R(0, θ). Indeed, if X ∼ R(0, θ) then

Eθ{(∂/∂θ) log f(X; θ)} = −1/θ ≠ 0.

Moreover, in that example Vθ{(∂/∂θ) log f(X; θ)} = 0 for all θ, while I(θ) = 1/θ². Returning to cases where the regularity conditions (3.7.2) are satisfied, we find that if X1, …, Xn are i.i.d. and In(θ) is the Fisher information function based on their joint distribution,

(3.7.4) In(θ) = Vθ{Σ_{i=1}^n (∂/∂θ) log f(Xi; θ)}.

Since X1, …, Xn are i.i.d. random variables, then

(3.7.5) In(θ) = Σ_{i=1}^n Vθ{(∂/∂θ) log f(Xi; θ)} = n Vθ{(∂/∂θ) log f(X1; θ)},

and, due to (3.7.3),

(3.7.6) In(θ) = n I(θ).

Thus, under the regularity conditions (3.7.2), the Fisher information is an additive function; a numerical check is sketched below.
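As a minimal numerical sketch of (3.7.6) (an added illustration, not from the original text), take Bernoulli trials, for which I(θ) = 1/(θ(1 − θ)) (the n = 1 case of Example 3.11A): the Monte Carlo variance of the joint score should be close to n I(θ). The values of θ, n, and the number of replications are arbitrary.

```python
# Additivity of the Fisher information for i.i.d. samples: for
# Bernoulli(theta), I_1(theta) = 1/(theta(1-theta)) and I_n = n * I_1.
import numpy as np

rng = np.random.default_rng(2)
theta, n, reps = 0.3, 5, 400_000

x = rng.binomial(1, theta, size=(reps, n))
# The score of the joint likelihood is the sum of per-observation scores.
score = (x / theta - (1 - x) / (1 - theta)).sum(axis=1)
print(score.var())                 # Monte Carlo estimate of I_n(theta)
print(n / (theta * (1 - theta)))   # n * I_1(theta) = 23.81...
```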

We consider now the information available in a statistic S = (S1(X), …, Sr(X)), where 1 ≤ r ≤ n. Let gS(y1, …, yr; θ) be the joint p.d.f. of S. The Fisher information function corresponding to S is, analogously,

(3.7.7) IS(θ) = Eθ{((∂/∂θ) log gS(S; θ))²}.

We obviously assume that the family of induced distributions of S satisfies the regularity conditions (i)–(iv). We show now that

(3.7.8) IS(θ) ≤ In(θ), for all θ ∈ Θ.

We first show that

(3.7.9) Eθ{(∂/∂θ) log f(X; θ) | S(X) = y} = (∂/∂θ) log gS(y; θ).

We prove (3.7.9) first for the discrete case. The general proof follows. Let A(y) = {x; S1(x) = y1, …, Sr(x) = yr}. The joint p.d.f. of S at y is given by

(3.7.10) gS(y; θ) = Σ_{x ∈ A(y)} f(x; θ),

where f(x; θ) is the joint p.d.f. of X. Accordingly,

(3.7.11) (∂/∂θ) log gS(y; θ) = [Σ_{x ∈ A(y)} (∂/∂θ) f(x; θ)] / gS(y; θ).

Furthermore, for each x such that f(x; θ) > 0, and according to regularity condition (iii),

(3.7.12) (∂/∂θ) f(x; θ) = f(x; θ) · (∂/∂θ) log f(x; θ),

so that (3.7.11) equals Σ_{x ∈ A(y)} [(∂/∂θ) log f(x; θ)] f(x; θ)/gS(y; θ) = Eθ{(∂/∂θ) log f(X; θ) | S(X) = y}, which proves (3.7.9) in the discrete case.

To prove (3.7.9) generally, let S: (𝒳, ℬ, ℱ) → (𝒴, Γ, 𝒢) be a statistic, where ℱ ≪ μ and ℱ is regular, and let dGθ(y) = gS(y; θ) dλ(y) be the induced distribution of S. Then, for any C ∈ Γ,

(3.7.13) ∫_{S^{−1}(C)} (∂/∂θ) log f(x; θ) dFθ(x) = (∂/∂θ) ∫_{S^{−1}(C)} f(x; θ) dμ(x) = (∂/∂θ) ∫_C gS(y; θ) dλ(y) = ∫_C (∂/∂θ) log gS(y; θ) dGθ(y).

Since C is arbitrary, (3.7.9) is proven. Finally, to prove (3.7.8), write

(3.7.14) In(θ) = Eθ{((∂/∂θ) log f(X; θ))²} = Eθ{Vθ{(∂/∂θ) log f(X; θ) | S(X)}} + Eθ{((∂/∂θ) log gS(S; θ))²} ≥ IS(θ).

We prove now that if T(X) is a sufficient statistic for ℱ, then

IT(θ) = In(θ), for all θ ∈ Θ.

Indeed, from the Factorization Theorem, if T(X) is sufficient for ℱ then f(x; θ) = K(x) g(T(x); θ), for all θ ∈ Θ. Accordingly, In(θ) = Eθ{((∂/∂θ) log g(T(X); θ))²}. On the other hand, the p.d.f. of T(X) is gT(t; θ) = A(t) g(t; θ), for all θ ∈ Θ. Hence, (∂/∂θ) log gT(t; θ) = (∂/∂θ) log g(t; θ) for all θ and all t. This implies that

(3.7.15) In(θ) = IT(θ),

for all θ ∈ Θ. Thus, we have proven that if a family of distributions, ℱ, admits a sufficient statistic, we can determine the amount of information in the sample from the distribution of the m.s.s.

Under the regularity conditions (3.7.2), for any statistic U(X),

IU(θ) ≤ In(θ).

In particular, if U(X) is an ancillary statistic, gU(u; θ) is independent of θ. In this case (∂/∂θ) log gU(u; θ) = 0 with probability 1, and

IU(θ) = 0, for all θ ∈ Θ.

3.7.2 The Kullback–Leibler Information

The Kullback–Leibler (K–L) information function for discriminating between two distributions Fθ(x) and Fφ(x) of ℱ = {Fθ(x); θ ∈ Θ} is defined as

(3.7.16) I(θ, φ) = Eθ{log [f(X; θ)/f(X; φ)]}.

The family ℱ is assumed to be regular. We show now that I(θ, φ) ≥ 0, with equality if and only if f(X; θ) = f(X; φ) with probability one. To verify this, we recall that log x is a concave function of x and, by the Jensen inequality (see Problem 8, Section 2.5), log(E{Y}) ≥ E{log Y} for every nonnegative random variable Y having a finite expectation. Accordingly,

(3.7.17) Eθ{log [f(X; φ)/f(X; θ)]} ≤ log Eθ{f(X; φ)/f(X; θ)} = log ∫ f(x; φ) dμ(x) = 0.

Thus, multiplying both sides of (3.7.17) by −1, we obtain that I(θ, φ) ≥ 0. Obviously, if Pθ{f(X; θ) = f(X; φ)} = 1, then I(θ, φ) = 0. If X1, …, Xn are i.i.d. random variables, then the information function in the whole sample is

(3.7.18) In(θ, φ) = Eθ{Σ_{i=1}^n log [f(Xi; θ)/f(Xi; φ)]} = n I(θ, φ).

This shows that the K–L information function is additive if the random variables are independent.

If S(X) = (S1(X), …, Sr(X)), 1 ≤ r ≤ n, is a statistic having a p.d.f. gS(y1, …, yr; θ), then the K–L information function based on S(X) is

(3.7.19) IS(θ, φ) = Eθ{log [gS(S; θ)/gS(S; φ)]}.

We show now that

(3.7.20) IS(θ, φ) ≤ I(θ, φ),

for all θ, φ ∈ Θ and every statistic S(X), with equality if S(X) is a sufficient statistic; here I(θ, φ) denotes the K–L information in X. Since the logarithmic function is concave, we obtain from the Jensen inequality

(3.7.21) Eθ{log [f(X; φ)/f(X; θ)] | S(X) = s} ≤ log Eθ{f(X; φ)/f(X; θ) | S(X) = s}.

Generally, if S is a statistic,

S: (𝒳, ℬ, ℱ) → (𝒴, Γ, 𝒢), with dGθ(s) = gS(s; θ) dλ(s),

then for any C ∈ Γ

∫_{S^{−1}(C)} [f(x; φ)/f(x; θ)] dFθ(x) = ∫_{S^{−1}(C)} f(x; φ) dμ(x) = ∫_C gS(s; φ) dλ(s) = ∫_C [gS(s; φ)/gS(s; θ)] dGθ(s).

This proves that

(3.7.22) Eθ{f(X; φ)/f(X; θ) | S(X) = s} = gS(s; φ)/gS(s; θ).

Substituting this expression for the conditional expectation in (3.7.21), multiplying both sides of the inequality by −1, and taking expectations, we obtain (3.7.20). To show that if S(X) is sufficient then equality holds in (3.7.20), we apply the Factorization Theorem. Accordingly, if S(X) is sufficient for ℱ,

(3.7.23) f(x; θ)/f(x; φ) = g(S(x); θ)/g(S(x); φ)

at all points x at which K(x) > 0. We recall that this set is independent of θ and has probability 1. Furthermore, the p.d.f. of S(X) is

(3.7.24) gS(s; θ) = A(s) g(s; θ).

Therefore,

(3.7.25) I(θ, φ) = Eθ{log [g(S(X); θ)/g(S(X); φ)]} = Eθ{log [gS(S; θ)/gS(S; φ)]} = IS(θ, φ).
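As a small added illustration (not from the original text), consider X1, …, Xn i.i.d. N(μ, 1) and the K–L information for discriminating between μ1 and μ2. Using the closed form for two normal distributions with a common variance (derived in Example 3.13), the whole sample carries n(μ1 − μ2)²/2; the sufficient statistic X̄ ∼ N(μ, 1/n) carries exactly the same amount, in line with (3.7.25), while the single observation X1 carries only 1/n of it, in line with (3.7.20):

```python
# K-L information and data reduction for X1..Xn i.i.d. N(mu, 1).
def kl_normal(m1, m2, v):
    """KL information between N(m1, v) and N(m2, v): (m1-m2)^2/(2v)."""
    return (m1 - m2) ** 2 / (2.0 * v)

mu1, mu2, n = 1.0, 0.0, 8
print(n * kl_normal(mu1, mu2, 1.0))   # whole sample: 4.0
print(kl_normal(mu1, mu2, 1.0 / n))   # Xbar ~ N(mu, 1/n): also 4.0
print(kl_normal(mu1, mu2, 1.0))       # X1 alone: 0.5 = (1/n) of 4.0
```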

3.8 THE FISHER INFORMATION MATRIX

We generalize here the notion of information to cases where f(x; θ) depends on a vector of k parameters. The score function, in the multiparameter case, is defined as the random vector

(3.8.1) S(θ; x) = ((∂/∂θ1) log f(x; θ), …, (∂/∂θk) log f(x; θ))′.

Under the regularity conditions (3.7.2), which are imposed on each component of θ,

(3.8.2) Eθ{S(θ; X)} = 0.

The covariance matrix of S(θ; X) is the Fisher Information Matrix (FIM)

(3.8.3) I(θ) = Eθ{S(θ; X) S′(θ; X)}.

If the components of (3.8.1) are not linearly dependent, then I(θ) is positive definite.

In the k–parameter canonical exponential type family

(3.8.4) f(x; ψ) = h(x) exp{Σ_{i=1}^k ψi Ui(x) − K(ψ)}.

The score vector is then

(3.8.5) S(ψ; x) = (U1(x) − (∂/∂ψ1) K(ψ), …, Uk(x) − (∂/∂ψk) K(ψ))′,

and the FIM is

(3.8.6) I(ψ) = ((∂²/∂ψi ∂ψj) K(ψ); i, j = 1, …, k).

Thus, in the canonical exponential type family, I(ψ) is the Hessian matrix of the cumulant generating function K(ψ); a numerical sketch follows.
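A minimal sketch of this fact (an added illustration, not from the original text): for the Poisson(λ) family in canonical form, ψ = log λ, U(x) = x and K(ψ) = e^ψ, so I(ψ) = K″(ψ) = e^ψ. The Monte Carlo variance of the score U(X) − K′(ψ) should agree:

```python
# For Poisson(lambda) in canonical form, f(x; psi) = exp(psi*x - e^psi)/x!
# with psi = log(lambda), so I(psi) = K''(psi) = e^psi.
import numpy as np

rng = np.random.default_rng(3)
psi = 0.7
lam = np.exp(psi)

x = rng.poisson(lam, size=500_000)
score = x - np.exp(psi)            # d/dpsi [psi*x - K(psi)] = x - K'(psi)
print(score.var())                 # Monte Carlo estimate of I(psi)
print(np.exp(psi))                 # exact value e^psi = 2.0137...
```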

It is interesting to study the effect of reparametrization on the FIM. Suppose that the original parameter vector is θ. We reparametrize by defining the k functions

wj = wj(θ1, …, θk), j = 1, …, k,

which are assumed to be one-to-one, with inverse θ = θ(w). Let

D(w) = ((∂/∂wj) θi(w); i, j = 1, …, k)

and let S(w; x) denote the score vector with respect to w. Then,

S(w; x) = D′(w) S(θ(w); x).

It follows that the FIM, in terms of the parameters w, is

(3.8.7) I*(w) = D′(w) I(θ(w)) D(w).

Notice that I(θ(w)) is obtained from I(θ) by substituting θ(w) for θ.

Partition θ into subvectors θ(1), …, θ(l) (2 ≤ l ≤ k). We say that θ(1), …, θ(l) are orthogonal subvectors if the FIM is block diagonal, with l blocks, each containing only the parameters in the corresponding subvector.

In Example 3.14, μ and σ² are orthogonal parameters, while the canonical parameters ψ1 and ψ2 are not orthogonal; a quick numerical check of the first claim is sketched below.
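As a minimal Monte Carlo sketch (an added illustration, not from the original text), the empirical covariance matrix of the score vector of N(μ, σ²) should be close to diag(1/σ², 1/(2σ⁴)), with a vanishing off-diagonal entry:

```python
# Orthogonality of (mu, sigma^2) in N(mu, sigma^2): the off-diagonal
# entry of the FIM (the covariance of the two score components) is zero.
import numpy as np

rng = np.random.default_rng(4)
mu, s2 = 1.0, 2.0
x = rng.normal(mu, np.sqrt(s2), size=500_000)

score_mu = (x - mu) / s2
score_s2 = -0.5 / s2 + (x - mu) ** 2 / (2.0 * s2 ** 2)
fim = np.cov(np.vstack([score_mu, score_s2]))
print(fim)   # ~ diag(1/s2, 1/(2 s2^2)) = diag(0.5, 0.125); off-diagonal ~ 0
```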

3.9 SENSITIVITY TO CHANGES IN PARAMETERS

3.9.1 The Hellinger Distance

There are a variety of distance functions for probability distributions. Following Pitman (1979), we apply here the Hellinger distance.

Let ℱ = {F(x; θ), θ ∈ Θ} be a family of distribution functions, dominated by a σ–finite measure μ, i.e., dF(x; θ) = f(x; θ) dμ(x) for all θ ∈ Θ. Let θ1, θ2 be two points in Θ. The Hellinger distance between f(x; θ1) and f(x; θ2) is

(3.9.1) ρ(θ1, θ2) = [∫ (√f(x; θ1) − √f(x; θ2))² dμ(x)]^{1/2}.

Obviously, ρ(θ1, θ2) = 0 if θ1 = θ2.

Notice that

(3.9.2) ρ²(θ1, θ2) = 2 (1 − ∫ √(f(x; θ1) f(x; θ2)) dμ(x)).

Thus, ρ(θ1, θ2) ≤ √2, for all θ1, θ2 ∈ Θ.

The sensitivity of ρ(θ, θ0) at θ0 is the derivative (if it exists) of ρ(θ, θ0) at θ = θ0.

Notice that

(3.9.3) ρ²(θ, θ0)/(θ − θ0)² = ∫ [(√f(x; θ) − √f(x; θ0))/(θ − θ0)]² dμ(x).

If one can introduce the limit, as θ → θ0, under the integral on the r.h.s. of (3.9.3), then

(3.9.4) lim_{θ→θ0} ρ²(θ, θ0)/(θ − θ0)² = ∫ ((∂/∂θ) √f(x; θ)|_{θ=θ0})² dμ(x) = (1/4) I(θ0).

Thus, if the regularity conditions (3.7.2) are satisfied, then

(3.9.5) lim_{θ→θ0} ρ(θ, θ0)/|θ − θ0| = (1/2) (I(θ0))^{1/2}.

Equation (3.9.5) expresses the sensitivity of ρ(θ, θ0) at θ0 as a function of the Fisher information I(θ0).

Families of densities that do not satisfy the regularity conditions (3.7.2) usually will not satisfy (3.9.5). For example, consider the family of rectangular distributions ℱ = {R(0, θ), 0 < θ < ∞}.

For θ > θ0 > 0,

ρ²(θ, θ0) = 2 (1 − ∫_0^{θ0} (θ θ0)^{−1/2} dx) = 2 (1 − (θ0/θ)^{1/2}).

Thus,

ρ(θ, θ0)/(θ − θ0) ≈ (θ0 (θ − θ0))^{−1/2} → ∞, as θ ↓ θ0.

On the other hand, according to (3.7.1) with n = 1, I(θ0) = 1/θ0².

The results of this section are generalizable to families depending on k parameters (θ1, …, θk). Under similar smoothness conditions, if λ = (λ1, …, λk)′ is such that λ′λ = 1, then

(3.9.6) lim_{t→0} ρ(θ0 + tλ, θ0)/|t| = (1/2) (λ′ I(θ0) λ)^{1/2},

where I(θ0) is the FIM. A numerical sketch of the one-parameter relation (3.9.5) closes this part.
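Here is a minimal numerical sketch of (3.9.5) (an added illustration, not from the original text) for the exponential family ℱ = {E(λ)}, for which ρ²(λ1, λ2) = 2 − 4√(λ1λ2)/(λ1 + λ2) (see Example 3.16) and I(λ) = 1/λ²; the ratio ρ(θ0 + ε, θ0)/ε should approach (1/2)√I(θ0) = 1/(2θ0):

```python
# Sensitivity of the Hellinger distance in the E(lambda) family:
# rho(theta, theta0)/|theta - theta0| -> sqrt(I(theta0))/2 = 1/(2*theta0).
import numpy as np

def rho(l1, l2):
    """Hellinger distance between E(l1) and E(l2)."""
    return np.sqrt(2.0 - 4.0 * np.sqrt(l1 * l2) / (l1 + l2))

theta0 = 2.0
for eps in [0.1, 0.01, 0.001]:
    print(rho(theta0 + eps, theta0) / eps)   # -> 0.25 = 1/(2*theta0)
```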

PART II: EXAMPLES

Example 3.1. Let X1, …, Xn be i.i.d. random variables having an absolutely continuous distribution with a p.d.f. f(x). Here we consider the family ℱ of all absolutely continuous distributions. Let T(X) = (X(1), …, X(n)), where X(1) ≤ ··· ≤ X(n), be the order statistic. It is immediately shown that

f(x1, …, xn) = ∏_{i=1}^n f(xi) = ∏_{i=1}^n f(x(i)).

Thus, the order statistic is a sufficient statistic. This result is obvious because the order in which the observations are obtained is irrelevant to the model. The order statistic is always a sufficient statistic when the random variables are i.i.d. On the other hand, as will be shown in the sequel, any statistic that further reduces the data is insufficient for ℱ and causes some loss of information.

Example 3.2. Let X1, …, Xn be i.i.d. random variables having a Poisson distribution, P(λ). The family under consideration is ℱ = {P(λ); 0 < λ < ∞}. Let T(X) = Σ_{i=1}^n Xi. We know that T(X) ∼ P(nλ). Furthermore, the joint p.d.f. of X and T(X) is

f(x, t; λ) = e^{−nλ} λ^t (∏_{i=1}^n xi!)^{−1} I{Σ xi = t}.

Hence, the conditional p.d.f. of X given T(X) = t is

h(x | t) = (t! / ∏_{i=1}^n xi!) n^{−t},  Σ_{i=1}^n xi = t,

where x1, …, xn are nonnegative integers and t = 0, 1, … . We see that the conditional p.d.f. of X given T(X) = t is independent of λ. Hence T(X) is a sufficient statistic. Notice that X1, …, Xn have a conditional multinomial distribution given Σ Xi = t.

Example 3.3. Let X = (X1, …, Xn)′ have a multinormal distribution N(μ1n, In), where 1n = (1, 1, …, 1)′. Let T = Σ_{i=1}^n Xi. We set X* = (X2, …, Xn) and derive the joint distribution of (X*, T). According to Section 2.9, (X*, T) has the multinormal distribution

Unnumbered Display Equation

where

Unnumbered Display Equation

Hence, the conditional distribution of X* given T is the multinormal

Unnumbered Display Equation

where X̄n = T/n is the sample mean and V_{n−1} = I_{n−1} − (1/n) J_{n−1}. It is easy to verify that V_{n−1} is nonsingular. This conditional distribution is independent of μ. Finally, the conditional p.d.f. of X1 given (X*, T) is that of a one–point distribution,

Unnumbered Display Equation

We notice that it is independent of μ. Hence the p.d.f. of X given T is independent of μ and T is a sufficient statistic. inline

Example 3.4. Let (X1, Y1), …, (Xn, Yn) be i.i.d. random vectors having a bivariate normal distribution. The joint p.d.f. of the n vectors is

Unnumbered Display Equation

where −∞ < ξ, η < ∞; 0 < σ1, σ2 < ∞; −1 < ρ < 1. This joint p.d.f. can be written in the form

Unnumbered Display Equation

where x̄ = (1/n) Σ xi, ȳ = (1/n) Σ yi, Q(x) = Σ (xi − x̄)², Q(y) = Σ (yi − ȳ)², P(x, y) = Σ (xi − x̄)(yi − ȳ).

According to the Factorization Theorem, a sufficient statistic for ℱ is

T(X, Y) = (X̄, Ȳ, Q(X), Q(Y), P(X, Y)).

It is interesting that even if σ1 and σ2 are known, the sufficient statistic is still T(X, Y). On the other hand, if ρ = 0 then the sufficient statistic is T*(X, Y) = (X̄, Ȳ, Q(X), Q(Y)).

Example 3.5.

A. Binomial Distributions

ℱ = {B(n, θ), 0 < θ < 1}, n is known. X1, …, Xn is a sample of i.i.d. random variables. For every point x0 at which f(x0; θ) > 0 we have

f(x; θ)/f(x0; θ) = [∏_i C(n, xi) / ∏_i C(n, xi0)] (θ/(1 − θ))^{Σ xi − Σ xi0}.

Accordingly, this likelihood ratio is independent of θ if and only if Σ_{i=1}^n xi = Σ_{i=1}^n xi0. Thus, the m.s.s. is T(X) = Σ_{i=1}^n Xi.

B. Hypergeometric Distributions

Xi ∼ H(N, M, S), i = 1, …, n. The joint p.d.f. of the sample is

∏_{i=1}^n [C(M, xi) C(N − M, S − xi) / C(N, S)].

The unknown parameter here is M, M = 0, …, N. N and S are fixed known values. The minimal sufficient statistic is the order statistic Tn = (X(1), …, X(n)). To see this, we consider the likelihood ratio

∏_{i=1}^n [C(M, x(i)) C(N − M, S − x(i))] / ∏_{i=1}^n [C(M, x(i)0) C(N − M, S − x(i)0)].

This ratio is independent of M if and only if x(i) = x(i)0 for all i = 1, 2, …, n.

C. Negative–Binomial Distributions

Xi ∼ NB(ψ, ν), i = 1, …, n; 0 < ψ < 1, 0 < ν < ∞.

(i) If ν is known, the joint p.d.f. of the sample is

f(x; ψ) = [∏_{i=1}^n Γ(ν + xi)/(Γ(ν) xi!)] ψ^{nν} (1 − ψ)^{Σ xi}.

Therefore, the m.s.s. is Tn = Σ_{i=1}^n Xi.

(ii) If ν is unknown, the p.d.f.s ratio is

Unnumbered Display Equation

Hence, the minimal sufficient statistic is the order statistic.

D. Multinomial Distributions

We have a sample of n i.i.d. random vectors X(i) = (X1(i), …, Xk(i)), i = 1, …, n. Each X(i) is distributed like the multinomial M(s, θ). The joint p.d.f. of the sample is

f(x(1), …, x(n); θ) = [∏_{i=1}^n (s!/∏_j xj(i)!)] exp{Σ_{j=1}^{k−1} Tn,j log(θj/θk) + ns log θk}.

Accordingly, an m.s.s. is Tn = (Tn,1, …, Tn,k−1), where Tn,j = Σ_{i=1}^n Xj(i), j = 1, …, k − 1. Notice that Σ_{j=1}^k Tn,j = ns.

E. Beta Distributions

X1, …, Xn i.i.d. ∼ β(p, q), 0 < p, q < ∞.

The joint p.d.f. of the sample is

f(x; p, q) = (B(p, q))^{−n} (∏_{i=1}^n xi)^{p−1} (∏_{i=1}^n (1 − xi))^{q−1},

0 ≤ xi ≤ 1 for all i = 1, …, n. Hence, an m.s.s. is Tn = (∏_{i=1}^n Xi, ∏_{i=1}^n (1 − Xi)). In cases where either p or q is known, the m.s.s. reduces to the component of Tn that corresponds to the unknown parameter.

F. Gamma Distributions

X1, …, Xn i.i.d. ∼ G(λ, ν), 0 < λ, ν < ∞.

The joint distribution of the sample is

f(x; λ, ν) = (λ^ν/Γ(ν))^n (∏_{i=1}^n xi)^{ν−1} exp{−λ Σ_{i=1}^n xi}.

Thus, if both λ and ν are unknown, then an m.s.s. is Tn = (Σ_{i=1}^n Xi, ∏_{i=1}^n Xi). If only ν is unknown, the m.s.s. is ∏_{i=1}^n Xi. If only λ is unknown, the corresponding statistic Σ_{i=1}^n Xi is minimal sufficient.

G. Weibull Distributions

X has a Weibull distribution if (X − ξ)^α ∼ E(λ). This is a three–parameter family, θ = (ξ, λ, α), where ξ is a location parameter (the density is zero for all x < ξ); λ^{−1} is a scale parameter; and α is a shape parameter. We distinguish among three cases.

(i) ξ and α are known.

Let Yi = Xi − ξ, i = 1, …, n. Since Yi^α ∼ E(λ), we immediately obtain an m.s.s., which is

Tn = Σ_{i=1}^n Yi^α = Σ_{i=1}^n (Xi − ξ)^α.

(ii) If α and λ are known but ξ is unknown, then a minimal sufficient statistic is the order statistic.
(iii) α is unknown.

The joint p.d.f. of the sample is

Unnumbered Display Equation

for all i = 1, …, n. By examining this joint p.d.f., we realize that a minimal sufficient statistic is the order statistic, i.e., Tn = (X(1), …, X(n)).

H. Extreme Value Distributions

The joint p.d.f. of the sample is

f(x; λ, α) = (λα)^n exp{−α Σ_{i=1}^n xi − λ Σ_{i=1}^n e^{−α xi}}.

Hence, if α is known then Tn = Σ_{i=1}^n e^{−α Xi} is a minimal sufficient statistic; otherwise, a minimal sufficient statistic is the order statistic.

Normal Distributions

(i) Single (Univariate) Distribution Model

X1, …, Xn i.i.d. ∼ N(ξ, σ²); −∞ < ξ < ∞, 0 < σ < ∞.

The m.s.s. is Tn = (Σ_{i=1}^n Xi, Σ_{i=1}^n Xi²). If ξ is known, then an m.s.s. is Σ_{i=1}^n (Xi − ξ)²; if σ is known, then the first component of Tn is sufficient.

(ii) Two Distributions Model

We consider a two–sample model according to which X1, …, Xn are i.i.d. having a N(ξ, σ1²) distribution and Y1, …, Ym are i.i.d. having a N(η, σ2²) distribution. The X–sample is independent of the Y–sample. In the general case, an m.s.s. is

T = (Σ_{i=1}^n Xi, Σ_{i=1}^n Xi², Σ_{j=1}^m Yj, Σ_{j=1}^m Yj²).

If σ1² = σ2² then the m.s.s. reduces to (Σ Xi, Σ Yj, Σ Xi² + Σ Yj²). On the other hand, if ξ = η but σ1 ≠ σ2 then the minimal sufficient statistic is T.

Example 3.6. Let (X, Y) have a bivariate distribution with independent components, X ∼ N(ξ1, σ1²) and Y ∼ N(ξ2, σ2²), with −∞ < ξ1, ξ2 < ∞; 0 < σ1, σ2 < ∞. The p.d.f. of (X, Y) is

Unnumbered Display Equation

This bivariate p.d.f. can be written in the canonical form

Unnumbered Display Equation

where

Unnumbered Display Equation

and

Unnumbered Display Equation

Thus, if ψ1, ψ2, ψ3, and ψ4 are linearly independent, then an m.s.s. is T(X) = (X, X², Y, Y²). This is obviously the case when ξ1, ξ2, σ1, σ2 can assume arbitrary values. Notice that if ξ1 = ξ2 but σ1 ≠ σ2 then ψ1, …, ψ4 are still linearly independent and T(X) is an m.s.s. On the other hand, if ξ1 ≠ ξ2 but σ1 = σ2 then an m.s.s. is

T′ = (X, Y, X² + Y²).

The case of ξ1 = ξ2, σ1 ≠ σ2, is a case of a four–dimensional m.s.s. when the parameter space is three–dimensional. This is a case of a curved exponential family.

Example 3.7. Binomial Distributions

ℱ = {B(n, θ); 0 < θ < 1}, n fixed. Suppose that Eθ{h(X)} = 0 for all 0 < θ < 1. This implies that

Σ_{j=0}^n h(j) C(n, j) ψ^j = 0, for all 0 < ψ < ∞,

where ψ = θ/(1 − θ) is the odds ratio. Let an,j = h(j) C(n, j), j = 0, …, n. The expected value of h(X) is then proportional to a polynomial of order n in ψ. According to the fundamental theorem of algebra, such a polynomial can have at most n roots. However, the hypothesis is that the expected value is zero for all ψ in (0, ∞). Hence an,j = 0 for all j = 0, …, n, independently of ψ. Or,

h(j) = 0, for all j = 0, 1, …, n.
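The same argument can be mirrored numerically (a minimal added sketch, not from the original text): if Eθ{h(X)} vanishes at n + 1 distinct values of the odds ratio ψ, the resulting Vandermonde-type linear system is nonsingular, so h ≡ 0 is the only solution.

```python
# Completeness of B(n, theta), numerically: E{h(X)} is (up to a positive
# factor) a degree-n polynomial in psi = theta/(1-theta); vanishing at
# n+1 distinct psi values forces every coefficient h(j)*C(n,j) to be 0.
import numpy as np
from scipy.special import comb

n = 4
psi = np.linspace(0.5, 2.5, n + 1)   # n+1 distinct odds-ratio values
# Row i: coefficients of h(0..n) in E{h(X)} at psi[i] (common factor dropped)
V = psi[:, None] ** np.arange(n + 1) * comb(n, np.arange(n + 1))
print(np.linalg.matrix_rank(V))              # n+1: full rank
print(np.linalg.solve(V, np.zeros(n + 1)))   # unique solution: h = 0
```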

Example 3.8. Rectangular Distributions

Suppose that ℱ = {R(0, θ); 0 < θ < ∞}. Let X1, …, Xn be i.i.d. random variables having a common distribution from ℱ. Let X(n) be the sample maximum. We show that the family of distributions of X(n) is complete. The p.d.f. of X(n) is

g(x; θ) = n x^{n−1}/θ^n,  0 ≤ x ≤ θ.

Suppose that Eθ{h(X(n))} = 0 for all 0 < θ < ∞. That is,

∫_0^θ h(x) x^{n−1} dx = 0, for all 0 < θ < ∞.

Consider this integral as a Lebesgue integral. Differentiating with respect to θ yields

h(θ) θ^{n−1} = 0, a.e.,

so that h(θ) = 0 for almost all θ ∈ (0, ∞).

Example 3.9. In Example 2.15, we considered Model II of analysis of variance. The complete sufficient statistic for that model is

(X̿, Sw², Sb²),

where X̿ is the grand mean, Sw² is the "within" sample variance, and Sb² is the "between" sample variance. Employing Basu's Theorem we can immediately conclude that X̿ is independent of (Sw², Sb²). Indeed, if we consider the subfamily ℱσ,ρ for a fixed σ and ρ, then X̿ is a complete sufficient statistic. The distributions of Sw² and Sb², however, do not depend on μ. Hence, they are independent of X̿. Since this holds for any σ and ρ, we obtain the result.

Example 3.10. This example follows Example 2.23 of Barndorff–Nielsen and Cox (1994, p. 42). Consider the random vector N = (N1, N2, N3, N4) having a multinomial distribution M(n, p) where p = (p1, …, p4)′ and, for 0 < θ < 1,

p1 = (1 − θ)/6, p2 = (1 + θ)/6, p3 = (2 − θ)/6, p4 = (2 + θ)/6.

The distribution of N is a curved exponential type. N is an m.s.s., but N is incomplete. Indeed, Eθ{N1 + N2 − n/3} = 0 for all 0 < θ < 1, but Pθ{N1 + N2 = n/3} < 1 for all θ. Consider the statistic A1 = N1 + N2. A1 ∼ B(n, 1/3). Thus, A1 is ancillary. The conditional p.d.f. of N, given A1 = a, is

g(n1, n3 | a; θ) = C(a, n1) ((1 − θ)/2)^{n1} ((1 + θ)/2)^{a−n1} C(n − a, n3) ((2 − θ)/4)^{n3} ((2 + θ)/4)^{n−a−n3},

for n1 = 0, 1, …, a; n3 = 0, 1, …, n − a; n2 = a − n1 and n4 = n − a − n3. Thus, N1 is conditionally independent of N3 given A1 = a.

Example 3.11. A. Let X ∼ B(n, θ), n known, 0 < θ < 1; f(x; θ) = C(n, x) θ^x (1 − θ)^{n−x} satisfies the regularity conditions (3.7.2). Furthermore,

(∂/∂θ) log f(x; θ) = x/θ − (n − x)/(1 − θ) = (x − nθ)/(θ(1 − θ)).

Hence, the Fisher information function is

I(θ) = Vθ{X}/(θ(1 − θ))² = n/(θ(1 − θ)).

B. Let X1, …, Xn be i.i.d. random variables having a rectangular distribution R(0, θ), 0 < θ < ∞. The joint p.d.f. is

f(x; θ) = θ^{−n} I{0 ≤ x(1)} I{x(n) ≤ θ},

where x(1) = min{xi} and x(n) = max{xi}. Accordingly,

(∂/∂θ) log f(x; θ) = −n/θ,  x(n) ≤ θ,

and the Fisher information in the whole sample is

In(θ) = n²/θ².

C. Let X ∼ μ + G(1, 2), −∞ < μ < ∞. In this case,

f(x; μ) = (x − μ) e^{−(x−μ)},  x ≥ μ.

Thus,

(∂/∂μ) log f(x; μ) = 1 − 1/(x − μ).

But

Eμ{(X − μ)^{−2}} = ∫_0^∞ y^{−1} e^{−y} dy = ∞.

Hence, I(μ) does not exist.

Example 3.12. In Example 3.10, we considered a four–nomial distribution with parameters pi(θ), i = 1, …, 4, which depend on a real parameter θ, 0 < θ < 1. We considered two alternative ancillary statistics, A = N1 + N2 and A′ = N1 + N4. The question is which ancillary statistic should be used for conditional inference. Barndorff–Nielsen and Cox (1994, p. 43) recommend using the ancillary statistic that maximizes the variance of the conditional Fisher information.

A version of the log–likelihood function, conditional on {A = a} is

Unnumbered Display Equation

This yields the conditional score function

Unnumbered Display Equation

The corresponding conditional Fisher information is

Unnumbered Display Equation

Finally, since A ∼ B(n, 1/3), the Fisher information is

Unnumbered Display Equation

In addition,

Unnumbered Display Equation

In a similar fashion, we can show that

Unnumbered Display Equation

and

Unnumbered Display Equation

Thus, V{I(θ | A)} > V{I(θ | A′)} for all 0 < θ < 1. Ancillary A is preferred. inline

Example 3.13. We provide here a few examples of the Kullback–Leibler information function.

A. Normal Distributions

Let ℱ be the class of all the normal distributions {N(μ, σ²); −∞ < μ < ∞, 0 < σ < ∞}. Let θ1 = (μ1, σ1) and θ2 = (μ2, σ2). We compute I(θ1, θ2). The likelihood ratio is

Unnumbered Display Equation

Thus,

Unnumbered Display Equation

Obviously, Eθ1{(X − μ1)²} = σ1². On the other hand,

Eθ1{(X − μ2)²} = σ1² + (μ1 − μ2)²,

since X ∼ μ1 + σ1 U, where U ∼ N(0, 1). Hence, we obtain that

I(θ1, θ2) = (μ1 − μ2)²/(2σ2²) + log ρ + 1/(2ρ²) − 1/2,  ρ = σ2/σ1.

We see that the distance between the means contributes to the K–L information function quadratically, while the contribution of the variances is through the ratio ρ = σ2/σ1.

B. Gamma Distributions

Let θi = (λi, νi), i = 1, 2, and consider the ratio

Unnumbered Display Equation

We consider here two cases.

Case I: ν1 = ν2 = ν. Since the νs are the same, we simplify by setting θi = λi (i = 1, 2). Accordingly,

I(λ1, λ2) = ν (ρ − 1 − log ρ),  ρ = λ2/λ1.

This information function depends on the scale parameters λi (i = 1, 2) only through their ratio ρ = λ2/λ1.

Case II: λ1 = λ2 = λ. In this case, we write

I(ν1, ν2) = log [Γ(ν2)/Γ(ν1)] + (ν1 − ν2) Eλ,ν1{log(λX)}.

Furthermore,

Eλ,ν1{log(λX)} = (d/dν) log Γ(ν)|_{ν=ν1}.

The derivative of the log–gamma function is tabulated (Abramowitz and Stegun, 1968).

Example 3.14. Consider the normal distribution N(μ, σ²); −∞ < μ < ∞, 0 < σ² < ∞. The score vector, with respect to θ = (μ, σ²), is

S(θ; x) = ((x − μ)/σ², −1/(2σ²) + (x − μ)²/(2σ⁴))′.

Thus, the FIM is

(3.8.8) I(μ, σ²) = diag(1/σ², 1/(2σ⁴)).

We have seen in Example 2.16 that this distribution is a two–parameter exponential type, with canonical parameters ψ1 and ψ2 (defined there). Making the reparametrization in terms of ψ1 and ψ2, we compute the FIM as a function of ψ1, ψ2.

The inverse transformation is

Unnumbered Display Equation

Thus,

Unnumbered Display Equation

Substituting (3.8.9) into (3.8.8) and applying (3.8.7) we obtain

Unnumbered Display Equation

Notice that I(ψ1, ψ2) is not diagonal, i.e., ψ1 and ψ2 are not orthogonal.

Example 3.15. Let (X, Y) have the bivariate normal distribution with zero means, common variance σ², and correlation ρ, 0 < σ² < ∞, −1 < ρ < 1. This is a two–parameter exponential type family with

Unnumbered Display Equation

where U1(x, y) = x2 + y2, U2(x, y) = xy and

Unnumbered Display Equation

The Hessian of K(ψ1, ψ2) is the FIM with respect to the canonical parameters. We obtain

Unnumbered Display Equation

Using the reparametrization formula, we get

Unnumbered Display Equation

Notice that in this example neither ψ1, ψ2 nor σ², ρ are orthogonal parameters.

Example 3.16. Let ℱ = {E(λ), 0 < λ < ∞}. Then

ρ²(λ1, λ2) = 2 − 4√(λ1 λ2)/(λ1 + λ2).

Notice that 2√(λ1 λ2)/(λ1 + λ2) ≤ 1 for all 0 < λ1, λ2 < ∞. If λ1 = λ2 then ρ(λ1, λ2) = 0, while 0 < ρ²(λ1, λ2) < 2 whenever λ1 ≠ λ2. However, for λ1 fixed,

lim_{λ2→∞} ρ²(λ1, λ2) = 2.

PART III: PROBLEMS

Section 3.2

3.2.1 Let X1, …, Xn be i.i.d. random variables having a common rectangular distribution R(θ1, θ2), −∞ < θ1 < θ2 < ∞.

(i) Apply the Factorization Theorem to prove that X(1) = min{Xi} and X(n) = max{Xi} are sufficient statistics.
(ii) Derive the conditional p.d.f. of X = (X1, …, Xn) given (X(1), X(n)).

3.2.2 Let X1, X2, …, Xn be i.i.d. random variables having a two–parameter exponential distribution, i.e., X ∼ μ + E(λ), −∞ < μ < ∞, 0 < λ < ∞. Let X(1) ≤ ··· ≤ X(n) be the order statistic.

(i) Apply the Factorization Theorem to prove that X(1) and S = Σ_{i=1}^n (Xi − X(1)) are sufficient statistics.
(ii) Derive the conditional p.d.f. of X given (X(1), S).
(iii) How would you generate an equivalent sample X′ (by simulation) when the value of (X(1), S) is given?

3.2.3 Consider the linear regression model (Problem 3, Section 2.9). The unknown parameters are (α, β, σ). What is a sufficient statistic for ℱ?

3.2.4 Let X1, …, Xn be i.i.d. random variables having a Laplace distribution with p.d.f.

Unnumbered Display Equation

0 < σ < ∞. What is a sufficient statistic for ℱ

(i) when μ is known?
(ii) when μ is unknown?

Section 3.3

3.3.1 Let X1, …, Xn be i.i.d. random variables having a common Cauchy distribution with p.d.f.

Unnumbered Display Equation

−∞ < μ < ∞, 0 < σ < ∞. What is an m.s.s. for ℱ?

3.3.2 Let X1, …, Xn be i.i.d. random variables with a distribution belonging to a family inline of contaminated normal distributions, having p.d.f.s,

Unnumbered Display Equation

−∞ < μ < ∞; 0 < σ < ∞; 0 < α < 10⁻². What is an m.s.s. for ℱ?

3.3.3 Let X1, …, Xn be i.i.d. having a common distribution belonging to the family inline of all location and scale parameter beta distributions, having the p.d.f.s

Unnumbered Display Equation

μ ≤ x ≤ μ + σ; −∞ < μ < ∞; 0 < σ < ∞; 0 < p, q < ∞.

(i) What is an m.s.s. when all the four parameters are unknown?
(ii) What is an m.s.s. when p, q are known?
(iii) What is an m.s.s. when μ, σ are known?

3.3.4 Let X1, …, Xn be i.i.d. random variables having a rectangular R(θ1, θ2), −∞ < θ1 < θ2 < ∞. What is an m.s.s.?

3.3.5 ℱ is a family of joint distributions of (X, Y) with p.d.f.s

Unnumbered Display Equation

Given a sample of n i.i.d. random vectors (Xi, Yi), i = 1, …, n, what is an m.s.s. for ℱ?

3.3.6 The following is a model in population genetics, called the Hardy–Weinberg model. The frequencies N1, N2, N3 (N1 + N2 + N3 = n) of three genotypes among n individuals have a distribution belonging to the family ℱ of trinomial distributions with parameters (n, p1(θ), p2(θ), p3(θ)), where

(3.3.1) p1(θ) = θ², p2(θ) = 2θ(1 − θ), p3(θ) = (1 − θ)²,

0 < θ < 1. What is an m.s.s. for ℱ?

Section 3.4

3.4.1 Let X1, …, Xn be i.i.d. random variables having a common distribution with p.d.f.

Unnumbered Display Equation

−∞ < θ1 < θ2 < ∞. Prove that T(X) = … is an m.s.s.

3.4.2 Let {(Xi, Yi), i = 1, …, n} be i.i.d. random vectors having a common bivariate normal distribution

N((ξ, η)′, Σ), Σ = ((σx², ρσxσy), (ρσxσy, σy²)),

where −∞ < ξ, η < ∞; 0 < σx, σy < ∞; −1 < ρ < 1.

(i) Write the p.d.f. in canonical form.
(ii) What is the m.s.s. for inline?

3.4.3 In continuation of the previous problem, what is the m.s.s.

(i) when ξ = η = 0?
(ii) when σx = σy = 1?
(iii) when ξ = η = 0, σx = σy = 1?

Section 3.5

3.5.1 Let ℱ = {(G(λ, 1))^{1/α}; 0 < α < ∞, 0 < λ < ∞} be the family of Weibull distributions. Is ℱ complete?

3.5.2 Let ℱ be the family of extreme–value distributions. Is ℱ complete?

3.5.3 Let ℱ = {R(θ1, θ2); −∞ < θ1 < θ2 < ∞}. Let X1, X2, …, Xn, n ≥ 2, be a random sample from a distribution of ℱ. Is the m.s.s. complete?

3.5.4 Is the family of trinomial distributions complete?

3.5.5 Show that for the Hardy–Weinberg model the m.s.s. is complete.

Section 3.6

3.6.1 Let {(Xi, Yi), i = 1, …, n} be i.i.d. random vectors distributed like the standard bivariate normal with correlation ρ, −1 < ρ < 1.

(i) Show that the random vectors X and Y are ancillary statistics.
(ii) What is an m.s.s. based on the conditional distribution of Y given {X = x}?

3.6.2 Let X1, …, Xn be i.i.d. random variables having a normal distribution N(μ, σ2), where both μ and σ are unknown.

(i) Show that U(X) = (Me − X̄)/(Q3 − Q1) is ancillary, where Me is the sample median, X̄ is the sample mean, and Q1 and Q3 are the sample first and third quartiles.
(ii) Prove that U(X) is independent of |X̄|/S, where S is the sample standard deviation.

Section 3.7

3.7.1 Consider the one–parameter exponential family with p.d.f.s

f(x; θ) = h(x) exp{ψ(θ) U(x) − K(θ)}.

Show that the Fisher information function for θ is

I(θ) = ψ′(θ) (d/dθ)[K′(θ)/ψ′(θ)].

Check this result specifically for the Binomial, Poisson, and Negative–Binomial distributions.

3.7.2 Let (Xi, Yi), i = 1, …, n be i.i.d. vectors having the bivariate standard normal distribution with unknown coefficient of correlation ρ, −1 ≤ ρ ≤ 1. Derive the Fisher information function In(ρ).

3.7.3 Let φ(x) denote the p.d.f. of N(0, 1). Define the family of mixtures

Unnumbered Display Equation

Derive the Fisher information function I(α).

3.7.4 Let ℱ = {f(x; ψ), −∞ < ψ < ∞} be a one–parameter exponential family, where the canonical p.d.f. is

f(x; ψ) = h(x) exp{ψ U(x) − K(ψ)}.

(i) Show that the Fisher information function is

I(ψ) = K″(ψ).

(ii) Derive this Fisher information for the Binomial and Poisson distributions.

3.7.5 Let X1, …, Xn be i.i.d. N(0, σ2), 0 < σ2 < ∞.

(i) What is the m.s.s. T?
(ii) Derive the Fisher information I(σ2) from the distribution of T.
(iii) Derive the Fisher information IS²(σ²), where S² = (1/(n − 1)) Σ_{i=1}^n (Xi − X̄)² is the sample variance. Show that IS²(σ²) < I(σ²).

3.7.6 Let (X, Y) have the bivariate standard normal distribution with correlation ρ, −1 < ρ < 1. X is an ancillary statistic. Derive the conditional Fisher information I(ρ | X) and then the Fisher information I(ρ).

3.7.7 Consider the model of Problem 6. What is the Kullback–Leibler information function I(ρ1, ρ2) for discriminating between ρ1 and ρ2, where −1 ≤ ρ1 < ρ2 ≤ 1?

3.7.8 Let X ∼ P(λ). Derive the Kullback–Leibler information I(λ1, λ2) for 0 < λ1, λ2 < ∞.

3.7.9 Let X ∼ B(n, θ). Derive the Kullback–Leibler information function I(θ1, θ2), 0 < θ1, θ2 < 1.

3.7.10 Let X ∼ G(λ, ν), 0 < λ < ∞, ν known.

(i) Express the p.d.f. of X as a one–parameter canonical exponential type density, g(x; ψ).
(ii) Find the value ψ̂ for which g(x; ψ) is maximal.
(iii) Find the Kullback–Leibler information function I(ψ̂, ψ) and show that I(ψ̂, ψ) = log [g(x; ψ̂)/g(x; ψ)].

Section 3.8

3.8.1 Consider the trinomial distribution M(n, p1, p2), 0 < p1, p2, p1 + p2 < 1.

(i) Show that the FIM is

Unnumbered Display Equation

(ii) For the Hardy–Weinberg model, p1(θ) = θ², p2(θ) = 2θ(1 − θ), derive the Fisher information function

I(θ) = 2n/(θ(1 − θ)).

3.8.2 Consider the bivariate normal distribution. Derive the FIM I(ξ, η, σ1, σ2, ρ).

3.8.3 Consider the gamma distribution G(λ, ν). Derive the FIM I(λ, ν).

3.8.4 Consider the Weibull distribution W(λ, α) ∼ (G(λ, 1))^{1/α}; 0 < α, λ < ∞. Derive the Fisher information matrix I(λ, α).

Section 3.9

3.9.1 Find the Hellinger distance between two Poisson distributions with parameters λ1 and λ2.

3.9.2 Find the Hellinger distance between two Binomial distributions with parameters p1 and p2 and the same parameter n.

3.9.3 Show that for the Poisson and the Binomial distributions Equation (3.9.4) holds.

PART IV: SOLUTIONS TO SELECTED PROBLEMS

3.2.1 X1, …, Xn are i.i.d. ∼ R(θ1, θ2), −∞ < θ1 < θ2 < ∞.

(i)

Unnumbered Display Equation

Thus, f(x1, …, xn; θ) = A(x) g(T(x); θ), where A(x) ≡ 1 and

Unnumbered Display Equation

T(X) = (X(1), X(n)) is a likelihood statistic and thus minimal sufficient.

(ii) The p.d.f. of (X(1), X(n)) is

Unnumbered Display Equation

Let (X(1), …, X(n)) be the order statistic. The p.d.f. of (X(1), …, X(n)) is

Unnumbered Display Equation

The conditional p.d.f. of (X(1), …, X(n)) given (X(1), X(n)) is

Unnumbered Display Equation

That is, given (X(1), X(n)), the vector (X(2), …, X(n−1)) is distributed like the order statistic of (n − 2) i.i.d. random variables from R(X(1), X(n)).

3.3.6 The likelihood function of θ, 0 < θ < 1, is

Unnumbered Display Equation

Since N3 = n − N1 − N2, 2N3 + N2 = 2n − 2N1 − N2. Hence,

Unnumbered Display Equation

The sample size n is known. Thus, the m.s.s. is Tn = 2N1 + N2.

3.4.1

Unnumbered Display Equation

The likelihood function of inline is

Unnumbered Display Equation

Thus T(X) is a likelihood statistic, i.e., minimal sufficient.

3.4.2 The joint p.d.f. of (Xi, Yi), i = 1, …, n is

Unnumbered Display Equation

In canonical form, the joint density is

Unnumbered Display Equation

where

Unnumbered Display Equation

and

Unnumbered Display Equation

The m.s.s. for ℱ is T(X, Y) = (Σ Xi, Σ Yi, Σ Xi², Σ Yi², Σ XiYi).

3.5.5 We have seen that the likelihood of θ is L(θ) ∝ θ^{T(N)} (1 − θ)^{2n−T(N)}, where T(N) = 2N1 + N2. This is the m.s.s. Thus, the distribution of T(N) is B(2n, θ). Finally, ℱ_T = {B(2n, θ); 0 < θ < 1} is complete.

3.6.2

(i)

Unnumbered Display Equation

independent of μ and σ.

(ii) By Basu’s Theorem, U(X) is independent of (inline, S), which is a complete sufficient statistic. Hence, U(X) is independent of |inline|/S.

3.7.1 The score function is

Unnumbered Display Equation

Hence, the Fisher information is

Unnumbered Display Equation

Consider the equation

Unnumbered Display Equation

Differentiating both sides of this equation with respect to θ, we obtain that

Unnumbered Display Equation

Differentiating the above equations twice with respect to θ, yields

Unnumbered Display Equation

Thus, we get

Unnumbered Display Equation

Therefore

Unnumbered Display Equation

In the Binomial case,

Unnumbered Display Equation

Thus, ψ(θ) = log [θ/(1 − θ)], K(θ) = −n log (1 − θ),

ψ′(θ) = 1/(θ(1 − θ)), K′(θ) = n/(1 − θ).

Hence, I(θ) = n/(θ(1 − θ)).

In the Poisson case,

Unnumbered Display Equation

In the Negative–Binomial case, ν known,

Unnumbered Display Equation

3.7.2 Let inline.

Let l(ρ) denote the log–likelihood function of ρ, −1 < ρ < 1. This is

Unnumbered Display Equation

Furthermore,

Unnumbered Display Equation

Recall that I(ρ) = E{−l″(ρ)}. Moreover,

Unnumbered Display Equation

Thus,

Unnumbered Display Equation

3.7.7

Unnumbered Display Equation

Thus, the Kullback–Leibler information is

Unnumbered Display Equation

The formula also holds for ρ1 > ρ2.

3.7.10 The p.d.f. of G(λ, ν) is

Unnumbered Display Equation

When ν is known, we can write the p.d.f. as g(x; ψ) = h(x) exp(ψx + ν log(−ψ)), where h(x) = x^{ν−1}/Γ(ν) and ψ = −λ. The value of ψ maximizing g(x; ψ) is ψ̂ = −ν/x. The K–L information I(ψ1, ψ) is

Unnumbered Display Equation

Substituting ψ1 = ψ̂ = −ν/x, we have

Unnumbered Display Equation

Thus,

Unnumbered Display Equation

3.8.2 The log–likelihood function is

Unnumbered Display Equation

The score coefficients are

Unnumbered Display Equation

The FIM

Unnumbered Display Equation

Unnumbered Display Equation

Unnumbered Display Equation

and

Unnumbered Display Equation

3.8.4 The FIM for the Weibull parameters. The likelihood function is

Unnumbered Display Equation

Thus,

Unnumbered Display Equation

Unnumbered Display Equation

Recall that X^α ∼ E(λ). Thus, E{X^α} = 1/λ and E{S1} = 0. Let ψ(1) denote the digamma function at 1 (see Abramowitz and Stegun, 1965, p. 259). Then,

Unnumbered Display Equation

Thus,

Unnumbered Display Equation

Unnumbered Display Equation

where Y = X^α ∼ E(λ).

Unnumbered Display Equation

Accordingly,

Unnumbered Display Equation

and

Unnumbered Display Equation

Finally,

Unnumbered Display Equation

Thus,

Unnumbered Display Equation

The Fisher Information Matrix is

Unnumbered Display Equation
