8
PARAMETRIC POINT ESTIMATION

8.1 INTRODUCTION

In this chapter we study the theory of point estimation. Suppose, for example, that a random variable X is known to have a normal distribution images(μ,σ2), but we do not know one of the parameters, say μ. Suppose further that a sample X1, X2,…,Xn is taken on X. The problem of point estimation is to pick a (one-dimensional) statistic T(X1, X2,…,Xn) that best estimates the parameter μ. The numerical value of T when the realization is x1, x2,…,xn is frequently called an estimate of μ, while the statistic T is called an estimator of μ. If both μ and σ2 are unknown, we seek a joint statistic images as an estimator of (μ, σ2).

In Section 8.2 we formally describe the problem of parametric point estimation. Since the class of all estimators in most problems is too large, it is not possible to find the “best” estimator in this class. One narrows the search somewhat by requiring that the estimators have some specified desirable properties. We describe some of these and also outline some criteria for comparing estimators.

Section 8.3 deals, in detail, with some important properties of statistics such as sufficiency, completeness, and ancillarity. We use these properties in later sections to facilitate our search for optimal estimators. Sufficiency, completeness, and ancillarity also have applications in other branches of statistical inference such as testing of hypotheses and nonparametric theory.

In Section 8.4 we investigate the criterion of unbiased estimation and study methods for obtaining optimal estimators in the class of unbiased estimators. In Section 8.5 we derive two lower bounds for variance of an unbiased estimator. These bounds can sometimes help in obtaining the “best” unbiased estimator.

In Section 8.6 we describe one of the oldest methods of estimation and in Section 8.7 we study the method of maximum likelihood estimation and its large sample properties. Section 8.8 is devoted to Bayes and minimax estimation, and Section 8.9 deals with equivariant estimation.

8.2 PROBLEM OF POINT ESTIMATION

Let X be an RV defined on a probability space (Ω, images, P). Suppose that the DF F of X depends on a certain number of parameters, and suppose further that the functional form of F is known except perhaps for a finite number of these parameters. Let images be the unknown parameter associated with F.

Let images be an RV with DF Fθ, where images is a vector of unknown parameters, θ ∈ Θ. Let ψ be a real-valued function on Θ. In this chapter we investigate the problem of approximating ψ(θ) on the basis of the observed value x of X.

The problem of point estimation is to find an estimator δ for the unknown parametric function ψ(θ) that has some nice properties. The value δ(x) of δ(X) for the data x is called the estimate of ψ(θ).

In most problems X1,X2,…, Xn are iid RVs with common DF Fθ.

It is clear that in any given problem of estimation we may have a large, often infinite, class of appropriate estimators to choose from. Clearly we would like the estimator δ to be close to ψ(θ); but since δ is a statistic, the usual measure of closeness images is itself an RV, so we interpret “δ close to ψ” to mean “close on the average.” Examples of such measures of closeness are

for some images, and

for some images. Obviously we want (1) to be large and (2) to be small. For images, the quantity defined in (2) is called the mean square error, and we denote it by

(3)images

Among all estimators for ψ we would like to choose one, say δ0, such that

for all δ, all images, and all θ. In the case of (2) the requirement is to choose δ0 such that

for all δ, and all θ ∈ Θ. Estimators satisfying (4) or (5) do not generally exist.

We note that

(6)images

where

(7)images

is called the bias of δ. An estimator that has small MSE has small bias and variance. In order to control MSE, we need to control both variance and bias.
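
Written out explicitly (a reconstruction of the standard identity behind (6) and (7), since the displays are not reproduced here), the decomposition is

\[
\mathrm{MSE}_\theta(\delta)
= E_\theta\{\delta(X) - \psi(\theta)\}^2
= \operatorname{var}_\theta\{\delta(X)\} + b^2(\theta, \delta),
\qquad
b(\theta, \delta) = E_\theta\,\delta(X) - \psi(\theta),
\]

so that driving the MSE down requires controlling the variance and the bias simultaneously.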

One approach is to restrict attention to estimators which have zero bias, that is,

The condition of unbiasedness (8) ensures that, on the average, the estimator δ has no systematic error; it neither over- nor underestimates ψ on the average. If we restrict attention only to the class of unbiased estimators, then we need to find an estimator δ0 in this class such that δ0 has the least variance for all θ ∈ Θ. The theory of unbiased estimation is developed in Section 8.4.

Another approach is to replace images in (2) by a more general function. Let L(θ, δ) measure the loss in estimating ψ by δ. Assume that L, the loss function, satisfies images for all θ and δ, and images for all θ. Measure average loss by the risk function

(9)images

Instead of seeking an estimator that minimizes the risk R uniformly in θ, we minimize the average risk

for some weight function π on Θ, or we minimize the maximum risk

The estimator that minimizes the average risk defined in (10) leads to the Bayes estimator and the estimator that minimizes (11) leads to the minimax estimator. Bayes and minimax estimation are discussed in Section 8.8.

Sometimes there are symmetries in the problem which may be used to restrict attention to estimators that exhibit the same symmetry. Consider, for example, an experiment in which the length of life of a light bulb is measured. Then an estimator obtained from the measurements expressed in hours must agree with the corresponding estimator obtained from the measurements expressed in minutes. If X represents measurements in the original units (hours) and Y represents the corresponding measurements in transformed units (minutes), then Y = cX (here c = 60). If δ(X) is an estimator of the true mean, then we would expect δ(Y), the estimator of the true mean in the new units, to correspond to δ(X) according to the relation images. That is, images, for all images. This is an example of an equivariant estimator, a topic discussed extensively in Section 8.9.

Finally, we consider some large sample properties of estimators. As the sample size n → ∞, the data x are practically the whole population, and we should expect δ(X) to approach ψ(θ) in some sense. For example, if images, and X1,X2,…,Xn are iid RVs with finite mean, then the strong law of large numbers tells us that images with probability 1. This property of a sequence of estimators is called consistency.

It is important to remember that consistency is a large sample property. Moreover, we speak of consistency of a sequence of estimators rather than one point estimator.
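
A minimal simulation sketch of this idea (not from the text; the normal model, the value of μ, and the sample sizes are arbitrary choices for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    mu = 2.0                                   # hypothetical true mean
    for n in (10, 100, 1_000, 10_000):
        x = rng.normal(loc=mu, scale=1.0, size=n)
        # By the strong law of large numbers the sample mean approaches mu,
        # so the sequence of estimators X-bar_n is consistent for mu.
        print(n, x.mean())

As n grows, the printed sample means settle near μ, which is exactly the consistency property described above.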

Example 4 is a particular case of the following theorem.

In Section 8.7 we consider large sample properties of maximum likelihood estimators and in Section 8.5 asymptotic efficiency is introduced.

PROBLEMS 8.2

  1. Suppose that Tn is a sequence of estimators for parameter θ that satisfies the conditions of Theorem 2. Then images, that is, Tn is squared error consistent for θ. If Tn is consistent for θ and images for all θ and all (x1, x2,…,xn) ∈ imagesn, show that images. If, however, images, then show that Tn may not be squared error consistent for θ.
  2. Let X1,X2,…,Xn be a sample from images. Let images. Show that images. Write images. Is Yn consistent for θ?
  3. Let X1,X2,…,Xn be iid RVs with images and images. Show that images is a consistent estimator for μ.
  4. Let X1,X2,…,Xn be a sample from U[0,θ]. Show that images is a consistent estimator for θe–1.
  5. In Problem 2 show that images is asymptotically biased for θ and is not BAN. (Show that images.)
  6. In Problem 5 consider the class of estimators images. Show that the estimator images in this class has the least MSE.
  7. Let X1, X2,…,Xn be iid with PDF images. Consider the class of estimators images. Show that the estimator that has the smallest MSE in this class is given by images.

8.3 SUFFICIENCY, COMPLETENESS AND ANCILLARITY

After the completion of any experiment, the job of a statistician is to interpret the data she has collected and to draw some statistically valid conclusions about the population under investigation. The raw data by themselves, besides being costly to store, are not suitable for this purpose. Therefore the statistician would like to condense the data by computing some statistics from them and to base her analysis on these statistics, provided that there is “no loss of information” in doing so. In many problems of statistical inference a function of the observations contains as much information about the unknown parameter as do all the observed values. The following example illustrates this point.

A rigorous definition of the concept involved in the above discussion requires the notion of a conditional distribution and is beyond the scope of this book. In view of the discussion of conditional probability distributions in Section 4.2, the following definition will suffice for our purposes.

Not every statistic is sufficient.

Definition 1 is not a constructive definition since it requires that we first guess a statistic T and then check to see whether T is sufficient. Moreover, the procedure for checking that T is sufficient is quite time-consuming. We now give a criterion for determining sufficient statistics.

We note that the order statistic (X(1),X(2),…,X(n)) is also sufficient. Note also that the parameter is one-dimensional, the statistic (X(1), X(n)) is two-dimensional, whereas the order statistic is n-dimensional.

In Example 9 we saw that the order statistic is sufficient. This is not a mere coincidence. In fact, if images are exchangeable, then the joint PDF of X is a symmetric function of its arguments. Thus

images

and it follows that the order statistic is sufficient for fθ.

The concept of sufficiency is frequently used with another concept, called completeness, which we now define.

In Definition 3 X will usually be a multiple RV. The family of distributions of T is obtained from the family of distributions of X1,X2,…,Xn by the usual transformation technique discussed in Section 4.4.

The next example illustrates the existence of a sufficient statistic which is not complete.

We see by a similar argument that X(n) is complete, which is the same as saying that images is a complete family of densities. Clearly, X(n) is sufficient.

Using an induction argument, we conclude that images and hence images. It follows that images is a complete family of distributions, and X(n) is a complete sufficient statistic.

Now suppose that we exclude the value images for some fixed images from the family images. Let us write images. Then images is not complete. We ask the reader to show that the class of all functions g such that images for all images consists of functions of the form

images

where c is a constant, images.

Remark 7. Completeness is a property of a family of distributions. In Remark 6 we saw that if a statistic is sufficient for a class of distributions it is sufficient for any subclass of those distributions. Completeness works in the opposite direction. Example 14 shows that the exclusion of even one member from the family images destroys completeness.

The following result covers a large class of probability distributions for which a complete sufficient statistic exists.

Let us write images,images, and images,images. Then images, and both images are nonnegative functions. In terms of images, (3) is the same as

for all θ.

Let images be fixed, and write

(5)images

Then both images are PMFs, and it follows from (4) that

for all images. By the uniqueness of MGFs (6) implies that

images

and hence that images for all t, which is equivalent to images for all t. Since T is clearly sufficient (by the factorization criterion), it is proved that T is a complete sufficient statistic.

In Examples 6, 8, and 9 we have shown that a given family of probability distributions that admits a nontrivial sufficient statistic usually admits several sufficient statistics. Clearly we would like to be able to choose the sufficient statistic that results in the greatest reduction of the data. We next study the notion of a minimal sufficient statistic. For this purpose it is convenient to introduce the notion of a sufficient partition. The reader will recall that a partition of a space images is just a collection of disjoint sets Eα such that images. Any statistic T(X1,X2,…,Xn) induces a partition of the space of values of (X1,X2,…,Xn), that is, T induces a covering of images by a family images of disjoint sets images, where t belongs to the range of T. The sets At are called partition sets. Conversely, given a partition, any assignment of a number to each set so that no two partition sets have the same number assigned defines a statistic. Clearly this function is not, in general, unique.

Let images1, images2 be two partitions of a space images. We say that images1 is a subpartition of images2 if every partition set in images2 is a union of sets of images1. We sometimes say also that images1 is finer than images2 (images2 is coarser than images1) or that images2 is a reduction of images1. In this case, a statistic T2 that defines images2 must be a function of any statistic T1 that defines images1. Clearly, this function need not have a unique inverse unless the two partitions have exactly the same partition sets.

Given a family of distributions images for which a sufficient partition exists, we seek to find a sufficient partition images that is as coarse as possible, that is, any reduction of images leads to a partition that is not sufficient.

The question of the existence of the minimal partition was settled by Lehmann and Scheffé [65] and, in general, involves measure-theoretic considerations. However, in the cases that we consider where the sample space is either discrete or a finite-dimensional Euclidean space and the family of distributions of X is defined by a family of PDFs (PMFs) images such difficulties do not arise. The construction may be described as follows.

Two points x and y in the sample space are said to be likelihood equivalent, and we write images, if and only if there exists a images which does not depend on θ such that images. We leave it to the reader to check that “~” is an equivalence relation (that is, it is reflexive, symmetric, and transitive), and hence “~” defines a partition of the sample space. This partition defines the minimal sufficient partition.
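
As a concrete illustration of this construction (a standard example, supplied here for clarity rather than taken from the text), let X1, X2,…,Xn be iid b(1, θ), 0 < θ < 1. Then

\[
\frac{f_\theta(\mathbf{x})}{f_\theta(\mathbf{y})}
= \frac{\theta^{\sum x_i}(1-\theta)^{\,n-\sum x_i}}{\theta^{\sum y_i}(1-\theta)^{\,n-\sum y_i}}
= \left(\frac{\theta}{1-\theta}\right)^{\sum x_i - \sum y_i}
\]

is free of θ if and only if \(\sum x_i = \sum y_i\). The partition sets of the minimal sufficient partition are therefore the sets on which \(\sum_{i=1}^n x_i\) is constant, and \(T(X) = \sum_{i=1}^n X_i\) is a minimal sufficient statistic.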

A rigorous proof of the above assertion is beyond the scope of this book. The basic ideas are outlined in the following theorem.

To prove the sufficiency of the minimal sufficient partition images, let T1 be an RV that induces images. Then T1 takes on distinct values over distinct sets of images but remains constant on the same set. If images, then

Now

images

depending on whether the joint distribution of X is absolutely continuous or discrete. Since fθ(x)/fθ(y) is independent of θ whenever images, it follows that the ratio on the right-hand side of (7) does not depend on θ. Thus T1 is sufficient.

In view of Theorem 3 a minimal sufficient statistic is a function of every sufficient statistic. It follows that if T1 and T2 are both minimal sufficient, then both must induce the same minimal sufficient partition and hence T1 and T2 must be equivalent in the sense that each must be a function of the other (with probability 1).

How does one show that a statistic T is not sufficient for a family of distributions images? Other than using the definition of sufficiency one can sometimes use a result of Lehmann and Scheffé [65] according to which if T1(X) is sufficient for images, then T2(X) is also sufficient if and only if images for some Borel-measurable function g and all images, where B is a Borel set with images.

Another way to prove T nonsufficient is to show that there exist x and y for which images but x and y are not likelihood equivalent. We refer to Sampson and Spencer [98] for this and other similar results.

The following important result will be proved in the next section.

We emphasize that the converse is not true. A minimal sufficient statistic may not be complete.

If X1, X2,…,Xn is a sample from images, then (X(1), X(n)) is minimal sufficient for θ but not complete since

images

for all θ.

Finally, we consider statistics that have distributions free of the parameter(s) θ and seem to contain no information about θ. We will see (Example 23) that such statistics can sometimes provide useful information about θ.

In Example 20 we saw that S2 was independent of the minimal sufficient statistic images. The following result, due to Basu, shows that this is not a mere coincidence.

The converse of Basu’s Theorem is not true. A statistic S that is independent of every ancillary statistic need not be complete (see, for example, Lehmann [62]).

The following example due to R.A. Fisher shows that if there is no sufficient statistic for θ, but there exists a reasonable statistic not independent of an ancillary statistic A(X), then the recovery of information is sometimes helped by the ancillary statistic via a conditional analysis. Unfortunately, the lack of uniqueness of ancillary statistics creates problems with this conditional analysis.

Consider the statistics

images

and

images

Then the joint PDF of S and A is given by

images

and it is clear that S and A are not independent. The marginal distribution of A is given by the PDF

images

where C(x, y) is the constant of integration which depends only on x, y, and n but not on θ. In fact, images, where K0 is the standard form of a Bessel function (Watson [116]). Consequently A is ancillary for θ.

Clearly, the conditional PDF of S given images is of the form

images

The amount of information lost by using S(X, Y) alone is an images th part of the total, and this loss of information is recovered by the knowledge of the ancillary statistic A(X, Y). These calculations will be discussed in Example 8.5.9.

PROBLEMS 8.3

  1. Find a sufficient statistic in each of the following cases based on a random sample of size n:
    1. images when (i) α is unknown, β known; (ii) β is unknown, α known; and (iii) α, β are both unknown.
    2. images when (i) α is unknown, β known; (ii) β is unknown, α known; and (iii) α,β are both unknown.
    3. images, where
      images

      and images are integers, when (i) N1 is known, N2 unknown; (ii) N2 known, N1 unknown; and (iii) N1,N2 are both unknown.

    4. images, where
      images
    5. images, where
      images
    6. images, where
      images

      and

      images
    7. images, where
      images

      when (i) p is known, θ unknown; (ii) p is unknown, θ known; and (iii) p, θ are both unknown.

  2. Let images be a sample from images(ασ, σ2), where α is a known real number. Show that the statistic images is sufficient for σ but that the family of distributions of T(X) is not complete.


  3. Let X1,X2,…,Xn be a sample from images(μ,σ2). Then images is clearly sufficient for the family (μ,σ2), μ∈images, images. Is the family of distributions of X complete?
  4. Let X1,X2,…,Xn be a sample from images Show that the statistic images is sufficient for θ but not complete.
  5. If images and T is sufficient, then so also is U.
  6. In Example 14 show that the class of all functions g for which images for all P ∈ images consists of functions of the form
    images
    where c is a constant.
  7. For the class images of two DFs where images is images(0,1) and images is images(1,0), find a sufficient statistic.
  8. Consider the class of hypergeometric probability distributions images, where
    images

    Show that it is a complete class. If images, is images complete?

  9. Is the family of distributions of the order statistic in sampling from a Poisson distribution complete?
  10. Let (X1,X2,…,Xn) be a random vector of the discrete type. Is the statistic images sufficient?
  11. Let X1,X2,…,Xn be a random sample from a population with law images(X). Find a minimal sufficient statistic in each of the following cases:
    1. images.
    2. images.
    3. images.
    4. images.
    5. images.
    6. images.
    7. images.
    8. images.
  12. Let X1,X2 be a sample of size 2 from P(λ). Show that the statistic X1 + αX2, where images is an integer, is not sufficient for λ.
  13. Let X1, X2,…,Xn be a sample from the PDF
    images

    Show that images is a minimal sufficient statistic for θ, but images is not sufficient.

  14. Let X1,X2,…,Xn be a sample from images (0,σ2). Show that images is a minimal sufficient statistic but images is not sufficient for σ2.
  15. Let X1,X2,…,Xn be a sample from PDF images. Find a minimal sufficient statistic for (α,β).
  16. Let T be a minimal sufficient statistic. Show that a necessary condition for a sufficient statistic U to be complete is that U be minimal.
  17. Let X1,X2,…,Xn be iid images(μ, σ2). Show that (images, S2) is independent of each of images
  18. Let X1,X2,…,Xn be iid images(θ,1). Show that a necessary and sufficient condition for images and images to be independent is images.
  19. Let X1,X2,…,Xn be a random sample from images. Show that X(1) is a complete sufficient statistic which is independent of S2.
  20. Let X1,X2,…,Xn be iid RVs with common PDF images. Show that X must be independent of every scale-invariant statistic such as images
  21. Let T1,T2 be two statistics with common domain D. Then T1 is a function of T2 if and only if
    images
  22. Let S be the support of fθ, θ ∈ Θ, and let T be a statistic such that for some θ1, θ2 ∈ Θ and x, y ∈ S, images, images but images. Then show that T is not sufficient for θ.
  23. Let X1,X2,…,Xn be iid images(θ, 1). Use the result in Problem 22 to show that images is not sufficient for θ.
  24.
    1. If T is complete, then show that any one-to-one mapping of T is also complete.
    2. Show with the help of an example that a complete statistic is not unique for a family of distributions.

8.4 UNBIASED ESTIMATION

In this section we focus attention on the class of unbiased estimators. We develop a criterion to check if an unbiased estimator is optimal in this class. Using sufficiency and completeness, we describe a method of constructing uniformly minimum variance unbiased estimators.

Note that S is not, in general, unbiased for σ. If X1,X2,…,Xn are iid images RVs we know that images. Therefore,

images

The bias of S is given by

images

We note that images so that S is asymptotically unbiased for σ.
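
A quick simulation sketch (illustrative only; the value of σ, the sample sizes, and the number of replications are arbitrary) showing that S underestimates σ on the average while the bias vanishes as n grows:

    import numpy as np

    rng = np.random.default_rng(1)
    sigma = 2.0                            # hypothetical true standard deviation
    for n in (5, 20, 100):
        x = rng.normal(0.0, sigma, size=(200_000, n))
        s = x.std(axis=1, ddof=1)          # S computed with divisor n - 1
        # E(S) - sigma is negative and shrinks toward 0 as n increases
        print(n, s.mean() - sigma)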

If T is unbiased for θ, g(T) is not, in general, an unbiased estimator of g(θ) unless g is a linear function.

Let θ be estimable, and let T be an unbiased estimator of θ. Let T1 be another unbiased estimator of θ, different from T. This means that there exists at least one θ such that images. In this case there exist infinitely many unbiased estimators of θ of the form images. It is therefore desirable to find a procedure to differentiate among these estimators.

In general, a particular estimator will be better than another for some values of θ and worse for others. Definitions 2 and 3 are special cases of this concept if we restrict attention only to unbiased estimators.

The following result gives a necessary and sufficient condition for an unbiased estimator to be a UMVUE.

Conversely, let (6) hold for some T0 ∈ images, all θ ∈ Θ, and all v ∈ images0, and let T ∈ images. Then images, and for every θ

images

We have

images

by the Cauchy-Schwarz inequality. If images, then images and there is nothing to prove. Otherwise

images

or images. Since T is arbitrary, the proof is complete.

Since T0 and T are both UMVUEs, images, and it follows that the correlation coefficient between T and T0 is 1. This implies that images for some a, b and all θ ∈ Θ. Since T and T0 are both unbiased for θ, we must have images for all θ.

Remark 4. Both Theorems 1 and 2 have analogs for LMVUE's at θ0 ∈ Θ, θ0 fixed.

We now turn our attention to some methods for finding UMVUE's.

We will show that images. Let images. Then Y is images(nθ, n), X1 is images(θ,1), and (X1, Y) is a bivariate normal RV with variance covariance matrix images. Therefore,

images

as asserted.

If we let images, we can show similarly that images is the UMVUE for ψ(θ). Note that images may occasionally be negative, so that the UMVUE for θ2 is not very sensible in this case.

If we consider the family images instead, we have seen (Example 8.3.14 and Problem 8.3.6) that images is not complete. The UMVUE for the family images is images, which is not the UMVUE for images. The UMVUE for images is, in fact, given by

images

The reader is asked to check that T1 has covariance 0 with all unbiased estimators g of 0 that are of the form described in Example 8.3.14 and Problem 8.3.6, and hence Theorem 1 implies that T1 is the UMVUE. Actually, T1(X1) is a complete sufficient statistic for images. Note that images is not even unbiased for the family images. The minimum variance is given by

images

The following example shows that a UMVUE may exist even when a minimal sufficient statistic does not.

It follows that images, and for images. Thus

images

and so on. Consequently, all unbiased estimators of 0 are of the form images. Clearly, images if images otherwise is unbiased for ψ(θ). Moreover, for all θ

images

so that T is UMVUE of ψ(θ).

We conclude this section with a proof of Theorem 8.3.4.

PROBLEMS 8.4

  1. Let X1,X2,…,Xn images be a sample from b(1,p). Find an unbiased estimator for images.
  2. Let X1,X2,…,Xn images be a sample from images(μ,σ2). Find an unbiased estimator for σp, where images. Find a minimum MSE estimator of σp.
  3. Let X1,X2,…,Xn be iid images(μ,σ2) RVs. Find a minimum MSE estimator of the form αS2 for the parameter σ2. Compare the variances of the minimum MSE estimator and the obvious estimator S2.
  4. Let images. Does there exist an unbiased estimator of θ?
  5. Let images. Does there exist an unbiased estimator of images?
  6. Let X1,X2,…,Xn be a sample from images be an integer. Find the UMVUE for (a) images and (b) images.
  7. Let X1,X2,…,Xn be a sample from a population with mean θ and finite variance, and T be an estimator of θ of the form images. If T is an unbiased estimator of θ that has minimum variance and T' is another linear unbiased estimator of θ, then
    images
  8. Let T1, T2 be two unbiased estimators having common variance, images, where σ2 is the variance of the UMVUE. Show that the correlation coefficient between images.
  9. Let images and images. Let X1,X2,…,Xn be a sample on X. Find the UMVUE of d(θ).
  10. This example covers most discrete distributions. Let X1,X2,…,Xn be a sample from PMF
    images

    where images, and let images. Write

    images

    Show that T is a complete sufficient statistic for θ and that the UMVUE for images (r > 0 is an integer) is given by

    images
    (Roy and Mitra [94])

     

  11. Let X be a hypergeometric RV with PMF
    images

    where max images.

    1. Find the UMVUE for M when N is assumed to be known.
    2. Does there exist an unbiased estimator of N (M known)?
  12. Let X1,X2,…Xn be iid images. Find the UMVUE of images, where images is a fixed real number.
  13. Let X1,X2,…,Xn be a random sample from P(λ). Let images be a parametric function. Find the UMVUE for ψ(λ). In particular, find the UMVUE for (a) images, (b) images for some fixed integer images, (c) images, and (d) images.
  14. Let X1,X2,…,Xn be a sample from PMF
    images

    Let ψ(N) be some function of N. Find the UMVUE of ψ(N).

  15. Let X1,X2,…,Xn be a random sample from P(λ). Find the UMVUE of images, where k is a fixed positive integer.
  16. Let (X1,Y1),(X2,Y2),…,(Xn,Yn) be a sample from a bivariate normal population with parameters images, and ρ. Assume that images, and it is required to find an unbiased estimator of μ. Since a complete sufficient statistic does not exist, consider the class of all linear unbiased estimators
    images
    1. Find the variance of images.
    2. Choose images to minimize images and consider the estimator
      images

      Compute images. If images, the BLUE of μ (in the sense of minimum variance) is

      images

      irrespective of whether σ1 and ρ are known or unknown.

    3. If images and ρ, σ1, σ2 are unknown, replace these values in α0 by their corresponding estimators. Let
      images

      Show that

      images

      is an unbiased estimator of μ.

  17. Let X1,X2,…,Xn be iid images(θ,1). Let images, where Φ is the DF of a images(0,1) RV. Show that the UMVUE of p is given by images.
  18. Prove Theorem 5.
  19. In Example 10 show that T1 is the UMVUE for N (restricted to the family images), and compute the minimum variance.
  20. Let (X1,Y1),…,(Xn,Yn) be a sample from a bivariate population with finite variances images, respectively, and covariance γ. Show that
    images

    where images. It is assumed that appropriate order moments exist.

  21. Suppose that a random sample is taken on (X,Y) and it is desired to estimate γ, the unknown covariance between X and Y. Suppose that for some reason a set S of n observations is available on both X and Y, an additional n1 – n observations are available on X but the corresponding Y values are missing, and an additional n2 – n observations of Y are available for which the X values are missing. Let S1 be the set of all images X values, and S2, the set of all images Y values, and write
    images

    Show that

    images

    is an unbiased estimator of γ. Find the variance of images, and show that images, where S11 is the usual unbiased estimator of γ based on the n observations in S (Boas [11]).

  22. Let X1,X2,…,Xn be iid with common PDF images. Let x0 be a fixed real number. Find the UMVUE of fθ(x0).
  23. Let X1,X2,…,Xn be iid images(μ,1) RVs. Let images. Show that images is the UMVUE of φ(x; μ, 1), where φ(x; μ, 1) is the PDF of a images RV.
  24. Let X1,X2,…,Xn be iid G(1, θ) RVs. Show that the UMVUE of images, images, is given by h(x|t) the conditional PDF of X1 given images where
    images
  25. Let X1,X2,…,Xn be iid RVs with common PDF images, and = 0 elsewhere. Show that images is a complete sufficient statistic for θ. Find the UMVU estimator of θr.
  26. Let X1,X2,…,Xn be a random sample from PDF
    images

    where images.

    1. Show that images is a complete sufficient statistic for θ.
    2. Show that the UMVUEs of μ and σ are given by
      images
    3. Find the UMVUE of images.
    4. Show that the UMVUE of images is given by
      images

      where images.

8.5 UNBIASED ESTIMATION (CONTINUED): A LOWER BOUND FOR THE VARIANCE OF AN ESTIMATOR

In this section we consider two inequalities, each of which provides a lower bound for the variance of an estimator. These inequalities can sometimes be used to show that an unbiased estimator is the UMVUE. We first consider an inequality due to Fréchet, Cramér, and Rao (the FCR inequality).

Let ψ(p) be a function of p and T(X) be an unbiased estimator of ψ(p). The only condition that need be checked is differentiability under the summation sign. We have

images

which is a polynomial in p and hence can be differentiated with respect to p. For any unbiased estimator T(X) of p we have

images

and since

images

it follows that the variance of the estimator X/n attains the lower bound of the FCR inequality, and hence T(X) has least variance among all unbiased estimators of p. Thus T(X) is the UMVUE for p.
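
For the record, the computation behind this example is the familiar one (reconstructed here, since the displays above are not reproduced): with ψ(p) = p,

\[
\operatorname{var}_p\!\left(\frac{X}{n}\right) = \frac{p(1-p)}{n},
\qquad
I(p) = E_p\!\left\{\frac{\partial}{\partial p}\log f_p(X)\right\}^{2} = \frac{n}{p(1-p)},
\]

so the FCR lower bound \(\{\psi'(p)\}^2 / I(p) = p(1-p)/n\) coincides with \(\operatorname{var}_p(X/n)\).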

Let us next consider the problem of unbiased estimation of images based on a sample of size 1. The estimator

images

is unbiased for ψ(λ) since images

images

Also,

images

To compute the FCR lower bound we have

images

This has to be differentiated with respect to images, since we want a lower bound for an estimator of the parameter images. Let images. Then

images

and

images

so that

images

where images.

Since images for images, we see that var(δ(X)) is greater than the lower bound obtained from the FCR inequality. We show next that δ(X) is the only unbiased estimator of θ and hence is the UMVUE.

If h is any unbiased estimator of θ, it must satisfy images. That is, for all images

images

Equating coefficients of powers of λ we see immediately that images and images for images. It follows that images.

The same computation can be carried out when X1,X2,…,Xn is a random sample from P(λ). We leave it to the reader to show that the FCR lower bound for any unbiased estimator of images is images. The estimator images is clearly unbiased for images with variance images. The UMVUE of images is given by images with images.
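
The UMVUE referred to here is commonly given as ((n − 1)/n)^ΣXi; a small simulation sketch (illustrative values of λ and n, not from the text) checking its unbiasedness for e^–λ:

    import numpy as np

    rng = np.random.default_rng(2)
    lam, n, reps = 1.3, 5, 500_000          # hypothetical values for illustration
    x = rng.poisson(lam, size=(reps, n))
    t = ((n - 1) / n) ** x.sum(axis=1)      # candidate UMVUE of exp(-lambda)
    # The average of t should be close to exp(-lam) if t is unbiased
    print(t.mean(), np.exp(-lam))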

Corollary. Let X1,X2,…,Xn be iid with common PDF fθ(x). Suppose the family images satisfies the conditions of Theorem 1. Then equality holds in (2) if and only if, for all images,

for some function k(θ).

Proof. Recall that we derived (2) by an application of the Cauchy–Schwarz inequality, where equality holds if and only if (8) holds.

Remark 7. Integrating (8) with respect to θ we get

images

for some functions images, S, and A. It follows that fθ belongs to a one-parameter exponential family and that the statistic T is sufficient for θ.

Remark 8. A result that simplifies computations is the following. If fθ is twice differentiable and images can be differentiated under the expectation sign, then

For the proof of (9), it is straightforward to check that

images

Taking expectations on both sides we get (9).
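
Spelled out (a reconstruction of the standard identity that (9) refers to), Remark 8 states that

\[
I(\theta)
= E_\theta\!\left\{\frac{\partial}{\partial\theta}\log f_\theta(X)\right\}^{2}
= -\,E_\theta\!\left\{\frac{\partial^{2}}{\partial\theta^{2}}\log f_\theta(X)\right\},
\]

which follows because \(\frac{\partial^{2}}{\partial\theta^{2}}\log f_\theta = \frac{f_\theta''}{f_\theta} - \left(\frac{f_\theta'}{f_\theta}\right)^{2}\) and, under the stated differentiability assumptions, \(E_\theta\{f_\theta''(X)/f_\theta(X)\} = 0\).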

We next consider an inequality due to Chapman, Robbins, and Kiefer (the CRK inequality) that gives a lower bound for the variance of an estimator but does not require regularity conditions of the Fréchet-Cramér-Rao type.

We next introduce the concept of efficiency.

It is usual to consider the performance of an unbiased estimator by comparing its variance with the lower bound given by the FCR inequality.

In view of Remarks 6 and 7, the following result describes the relationship between most efficient unbiased estimators and UMVUEs.

Clearly, an estimator T satisfying the conditions of Theorem 3 will be the UMVUE, and the two estimators coincide. We emphasize that we have assumed the regularity conditions of the FCR inequality in making this statement.

We return to Example 8.3.23 where X1, X2,…,Xn are iid G(1, θ), and Y1,Y2,…,Yn iid G(1, 1/θ), and X’s and Y’s are independent. Then (X1, Y1) has common PDF fθ(x, y) given above. We will compute Fisher’s Information for θ in the family of PDFs of images. Using the PDFs of images and images and the transformation technique, it is easy to see that S(X,Y) has PDF

images

Thus

images

It follows that

images

That is, the information about θ in S is smaller than that in the sample.

The Fisher Information in the conditional PDF of S given images, where images, can be shown (Problem 12) to equal

images

where K0 and K1 are Bessel functions of order 0 and 1, respectively. Averaging over all values of A, one can show that the information is 2n/θ2 which is the total Fisher information in the sample of n pairs (xj, yj)’s.

PROBLEMS 8.5

  1. Are the following families of distributions regular in the sense of Fréchet, Cramér, and Rao? If so, find the lower bound for the variance of an unbiased estimator based on a sample of size n.
    1. images if images, and = 0 otherwise; images.
    2. images if images, and = 0 otherwise.
    3. images.
    4. images.
  2. Find the CRK lower bound for the variance of an unbiased estimator of θ, based on a sample of size n from the PDF of Problem 1(b).
  3. Find the CRK bound for the variance of an unbiased estimator of θ in sampling from images(θ,1).
  4. In Problem 1 check to see whether there exists a most efficient estimator in each case.
  5. Let X1, X2,…,Xn be a sample from a three-point distribution:
    images

    where images Does the FCR inequality apply in this case? If so, what is the lower bound for the variance of an unbiased estimator of θ?

    images

  6. Let X1, X2,…,Xn be iid RVs with mean μ and finite variance. What is the efficiency of the unbiased (and consistent) estimator images relative to images?
  7. When does the equality hold in the CRK inequality?
  8. Let X1, X2,…,Xn be a sample from images(μ, 1), and let images:
    1. Show that the minimum variance of any unbiased estimator of μ2 obtained from the FCR inequality is 4μ2/n.
    2. Show that images is the UMVUE of μ2 with variance images.
  9. Let X1, X2,…,Xn be iid G(1, 1/α) RVs:
    1. Show that the estimator images is the UMVUE for α with variance images.
    2. Show that the minimum variance from FCR inequality is α2/n.
  10. In Problem 8.4.16 compute the relative efficiency of images with respect to images.
  11. Let X1,X2,…,Xn and Y1,Y2,…,Ym be independent samples from images and images, respectively, where images are unknown. Let images and images, and consider the problem of unbiased estimation of μ:
    1. If ρ is known, show that
      images

      where images is the BLUE of μ. Compute images.

    2. If ρ is unknown, the unbiased estimator
      images

      is optimum in the neighborhood of images. Find the variance of images.

    3. Compute the efficiency of images relative to images.
    4. Another unbiased estimator of μ is
      images

      where images is an images RV.

  12. Show that the Fisher Information on θ based on the PDF
    images

    for fixed a equals images, where K0(2a) and K1(2a) are Bessel functions of order 0 and 1 respectively.

8.6 SUBSTITUTION PRINCIPLE (METHOD OF MOMENTS)

One of the simplest and oldest methods of estimation is the substitution principle: Let ψ(θ), θ ∈ Θ, be a parametric function to be estimated on the basis of a random sample X1, X2,…, Xn from a population DF F. Suppose we can write images for some known function h. Then the substitution principle estimator of ψ(θ) is images, where images is the sample distribution function. Accordingly, we estimate images by images by images, and so on. The method of moments is a special case when we need to estimate some known function of a finite number of unknown moments. Let us suppose that we are interested in estimating

where h is some known numerical function and mj is the jth-order moment of the population distribution that is known to exist for images.

Remark 1. It is easy to extend the method to the estimation of joint moments. Thus we use images to estimate E(XY) and so on.

Remark 2. From the WLLN, images. Thus, if one is interested in estimating the population moments, the method of moments leads to consistent and unbiased estimators. Moreover, the method of moments estimators in this case are asymptotically normally distributed (see Section 7.5).

Again, if one estimates parameters of the type θ defined in (1) and h is a continuous function, the estimators T(X1,X2,…, Xn) defined in (2) are consistent for θ (see Problem 1). Under some mild conditions on h, the estimator T is also asymptotically normal (see Cramér [17, pp. 386–387]).

In particular, if X1, X2,…, Xn are iid P(λ) RVs, we know that images and images. The method of moments leads to using either images or images as an estimator of λ. To avoid this kind of ambiguity we take the estimator involving the lowest-order sample moment.
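
A minimal sketch of the method in Python (not from the text; it assumes a shape–scale gamma parametrization with mean αβ and variance αβ², and the parameter values are arbitrary):

    import numpy as np

    rng = np.random.default_rng(3)
    alpha, beta = 3.0, 2.0                     # hypothetical shape and scale
    x = rng.gamma(shape=alpha, scale=beta, size=5_000)

    m1 = x.mean()                              # first sample moment
    m2 = (x ** 2).mean()                       # second sample moment
    # Solve m1 = alpha*beta and m2 - m1**2 = alpha*beta**2 for (alpha, beta)
    beta_hat = (m2 - m1 ** 2) / m1
    alpha_hat = m1 / beta_hat
    print(alpha_hat, beta_hat)

This is the kind of calculation asked for in Problem 2 below.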

The method of moments may lead to absurd estimators. The reader is asked to compute estimators of θ in images(θ, θ) or images(θ, θ2) by the method of moments and verify this assertion.

PROBLEMS 8.6

  1. Let images, and images, where a and b are constants. Let images be a continuous function. Show that images.
  2. Let X1, X2,…, Xn be a sample from G(α, β). Find the method of moments estimator for (α, β).
  3. Let X1, X2,…, Xn be a sample from images(μ, σ2). Find the method of moments estimator for (μ, σ2).
  4. Let X1, X2,…, Xn be a sample from B(α, β). Find the method of moments estimator for (α, β).
  5. A random sample of size n is taken from the lognormal PDF
    images

    Find the method of moments estimators for μ and σ2.

8.7 MAXIMUM LIKELIHOOD ESTIMATORS

In this section we study a frequently used method of estimation, namely, the method of maximum likelihood estimation. Consider the following example.

The principle of maximum likelihood essentially assumes that the sample is representative of the population and chooses as the estimator that value of the parameter which maximizes the PDF (PMF) fθ(x).

Usually θ will be a multiple parameter. If X1, X2,…, Xn are iid with PDF (PMF) fθ(x), the likelihood function is

(2)images

Let images and images.

It is convenient to work with the logarithm of the likelihood function. Since log is a monotone function,

(4)images

Let Θ be an open subset of imagesk, and suppose that fθ(x) is a positive, differentiable function of θ (that is, the first-order partial derivatives exist in the components of θ). If a supremum images exists, it must satisfy the likelihood equations

Any nontrivial root of the likelihood equations (5) is called an MLE in the loose sense. A parameter value that provides the absolute maximum of the likelihood function is called an MLE in the strict sense or, simply, an MLE.

Remark 1. If images, there may still be many problems. Often the likelihood equation images has more than one root, or the likelihood function is not differentiable everywhere in Θ, or images may be a terminal value. Sometimes the likelihood equation may be quite complicated and difficult to solve explicitly. In that case one may have to resort to some numerical procedure to obtain the estimator. Similar remarks apply to the multiparameter case.
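
As an illustration of such a numerical procedure (a sketch only; the gamma model, the simulated data, and the use of scipy are my own choices, not the text's), one can maximize the log likelihood directly:

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import gamma

    rng = np.random.default_rng(4)
    x = rng.gamma(shape=2.5, scale=1.5, size=1_000)   # simulated data

    def neg_log_likelihood(params):
        # Optimize on the log scale so the shape and scale stay positive
        shape, scale = np.exp(params)
        return -np.sum(gamma.logpdf(x, shape, scale=scale))

    result = minimize(neg_log_likelihood, x0=np.log([1.0, 1.0]), method="Nelder-Mead")
    print(np.exp(result.x))                           # numerical MLE of (shape, scale)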

Note that images is not unbiased for σ2. Indeed, images. But images is unbiased, as we already know. Also, images is unbiased, and both images and images are consistent. In addition, images and images are the method of moments estimators for μ and σ2, and images is jointly sufficient.

Finally, note that images is the MLE of μ if σ2 is known; but if μ is known, the MLE of σ2 is not images but images.

We see that the MLE images is consistent, sufficient, and complete, but not unbiased.

In particular, let images and images and suppose that the observations, arranged in increasing order of magnitude, are 1, 2, 4. In this case the MLE can be shown to be images, which corresponds to the first-order statistic. If the sample values are 2, 3, 4, the third-order statistic is the MLE.

Remark 2. We have seen that MLEs may not be unique, although frequently they are. Also, they are not necessarily unbiased even if a unique MLE exists. In terms of MSE, an MLE may be worthless. Moreover, MLEs may not even exist. We have also seen that MLEs are functions of sufficient statistics. This is a general result, which we now prove.

Let us write images. Then

images

so that

images

We need only to show that images.

Recall from (8.5.4) with images that

images

and substituting images we get

images

That is,

images

and the proof is complete.

Remark 3. In Theorem 2 we assumed the differentiability of A(θ) and the existence of the second-order partial derivative images. If the conditions of Theorem 2 are satisfied, the most efficient estimator is necessarily the MLE. It does not follow, however, that every MLE is most efficient. For example, in sampling from a normal population, images is the MLE of σ2, but it is not most efficient. Since images is images, we see that images, which is not equal to the FCR lower bound, 2σ4/n. Note that images is not even an unbiased estimator of σ2.

We next consider an important property of MLEs that is not shared by other methods of estimation. Often the parameter of interest is not θ but some function h(θ). If images is the MLE of θ, what is the MLE of h(θ)? If images is a one-to-one function of θ, then the inverse function images is well defined, and we can write the likelihood function as a function of λ. We have

images

so that

images

It follows that the supremum of L* is achieved at images. Thus images is the MLE of h(θ).

In many applications images is not one-to-one. It is still tempting to take images as the MLE of λ. The following result provides a justification.

Let images, so that images. Therefore, the MLE of β is M/log images, where images is the MLE of p. To compute the MLE of p we have

images

so that the MLE of p is images. Thus the MLE of β is

images

Finally we consider some important large-sample properties of MLE's. In the following we assume that images is a family of PDFs (PMFs), where Θ is an open interval on images. The conditions listed below are stated when fθ is a PDF. Modifications for the case where fθ is a PMF are obvious and will be left to the reader.

  1. images exist for all θ ε Θ and every x. Also,
    images
  2. images for all θ ε Θ.
  3. images for all θ
  4. There exists a function H(x) such that for all θ ε Θ
    images
  5. There exists a function g(θ) which is positive and twice differentiable for every θεΘ, and a function H(x) such that for all θ
    images

Note that the condition (v) is equivalent to condition (iv) with the added qualification that images.

We state the following results without proof.

On occasions one encounters examples where the conditions of Theorem 4 are not satisfied and yet a solution of the likelihood equation is consistent and asymptotically normal.

The following theorem covers such cases also.

Remark 4. It is important to note that the results in Theorems 4 and 5 establish the consistency of some root of the likelihood equation but not necessarily that of the MLE when the likelihood equation has several roots. Huzurbazar [47] has shown that under certain conditions the likelihood equation has at most one consistent solution and that the likelihood function has a relative maximum for such a solution. Since there may be several solutions for which the likelihood function has relative maxima, Cramér's and Huzurbazar's results still do not imply that a solution of the likelihood equation that makes the likelihood function an absolute maximum is necessarily consistent.

Wald [115] has shown that under certain conditions the MLE is strongly consistent. It is important to note that Wald does not make any differentiability assumptions.

In any event, if the MLE is a unique solution of the likelihood equation, we can use Theorems 4 and 5 to conclude that it is consistent and asymptotically normal. Note that the asymptotic variance is the same as the lower bound of the FCR inequality.

We leave it to the reader to check that in Example 13 the conditions of Theorem 5 are satisfied.

Remark 5. The invariance and the large sample properties of MLEs permit us to find MLEs of parametric functions and their limiting distributions. The delta method introduced in Section 7.5 (Theorem 1) comes in handy in these applications. Suppose in Example 13 we wish to estimate images. By invariance of MLEs, the MLE of images where images is the MLE of θ. Applying Theorem 7.5.1 we see that images is AN(θ2, 8θ4/n).

In Example 14, suppose we wish to estimate images. Then images is the MLE of ψ(λ) and, in view of Theorem 7.5.1, images.
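
In general form (a restatement of the delta method of Theorem 7.5.1 as used here), if \(\hat\theta_n\) is \(\mathrm{AN}(\theta, \sigma^2(\theta)/n)\) and h is differentiable at θ with \(h'(\theta) \ne 0\), then

\[
h(\hat\theta_n) \ \text{is} \ \mathrm{AN}\!\left(h(\theta), \ \{h'(\theta)\}^{2}\,\sigma^{2}(\theta)/n\right).
\]

With \(h(\theta) = \theta^2\) this gives an asymptotic variance of \(4\theta^{2}\sigma^{2}(\theta)/n\), which matches the value \(8\theta^{4}/n\) quoted above provided the MLE in Example 13 has asymptotic variance \(2\theta^{2}/n\).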

Remark 6. Neither Theorem 4 nor Theorem 5 guarantees asymptotic normality for a unique MLE. Consider, for example, a random sample from U(0,θ]. Then X(n) is the unique MLE for θ, and in Problem 8.2.5 we asked the reader to show that images.

PROBLEMS 8.7

  1. Let X1, X2,…,Xn be iid RVs with common PMF (PDF) fθ(x). Find an MLE for θ in each of the following cases:
    1. images
    2. images.
    3. images and α known.
    4. images.
  2. Find an MLE, if it exists, in each of the following cases:
    1. images: both n and images are unknown, and one observation is available.
    2. images.
    3. images .
    4. X1, X2, …, Xn is a sample from
      images
    5. images .
    6. images.
  3. Suppose that n observations are taken on an RV X with distribution images(μ,1), but instead of recording all the observations one notes only whether or not the observation is less than 0. If images occurs images times, find the MLE of μ.
  4. Let X1, X2 ,…,Xn be a random sample from PDF
    images
    1. Find the MLE of (α, β).
    2. Find the MLE of images.
  5. Let X1, X2,…,Xn be a sample from exponential density images, images. Find the MLE of θ, and show that it is consistent and asymptotically normal.
  6. For Problem 8.6.5 find the MLE for (μ, σ2).
  7. For a sample of size 1 taken from images(μ, σ2), show that no MLE of (μ, σ2) exists.
  8. For Problem 8.6.5 suppose that we wish to estimate N on the basis of observations X1,X2,…, XM:
    1. Find the UMVUE of N.
    2. Find the MLE of N.
    3. Compare the MSEs of the UMVUE and the MLE.
  9. Let images be independent RVs, where images, images. Find MLEs for μ1, μ2, …, μs, and σ2. Show that the MLE for σ2 is not consistent as s → ∞ (n fixed) (Neyman and Scott [77]).
  10. Let (X, Y) have a bivariate normal distribution with parameters images, and ρ. Suppose that n observations are made on the pair (X,Y) and N–n observations on X only, that is, N–n observations on Y are missing. Find the MLE's of μ1, μ2, σ21, σ22, and ρ (Anderson [2]).

    [Hint: If images is the joint PDF of (X,Y) write

    images

    where f1 is the marginal (normal) PDF of X, and fY|X is the conditional (normal) PDF of Y, given x with mean

    images

    and variance images. Maximize the likelihood function first with respect to μ1 and images and then with respect to images and images.]

  11. In Problem 5, let images denote the MLE of θ. Find the MLE of images and its asymptotic distribution.
  12. In Problem 1(d), find the asymptotic distribution of the MLE of θ.
  13. In Problem 2(a), find MLE of images and its asymptotic distribution.
  14. Let X1,X2,…, Xn, be a random sample from some DF F on the real line. Suppose we observe x1,x2,…,xn which are all different. Show that the MLE of F is images, the empirical DF of the sample.
  15. Let X1, X2, …, Xn be iid images(μ,1). Suppose images. Find the MLE of μ.
  16. Let images have a multinomial distribution with parameters images, images, images, where n is known. Find the MLE of images.

    images.

  17. Consider the one-parameter exponential density introduced in Section 5.5 in its natural form with PDF
    images
    1. Show that the MGF of T(X) is given by
      images
      for t in some neighborhood of the origin. Moreover, images and images.
    2. If the equation images has a solution, it must be the unique MLE of η.
  18. In Problem 1(b) show that the unique MLE of θ is consistent. Is it asymptotically normal?

8.8 BAYES AND MINIMAX ESTIMATION

In this section we consider the problem of point estimation in a decision-theoretic setting. We will consider here Bayes and minimax estimation.

Let images be a family of PDFs (PMFs) and X1, X2,…,Xn be a sample from this distribution. Once the sample point (x1, x2,…,xn) is observed, the statistician takes an action on the basis of these data. Let us denote by images the set of all actions or decisions open to the statistician.

If images is observed, the statistician takes action images

Another element of decision theory is the specification of a loss function, which measures the loss incurred when we take a decision.

The value L(θ, a) is the loss to the statistician if he takes action a when θ is the true parameter value. If we use the decision function δ(X) and loss function L and θ is the true parameter value, then the loss is the RV L(θ, δ(X)). (As always, we will assume that L is a Borel-measurable function.)

The basic problem of decision theory is the following: Given a space of actions A, and a loss function L(θ, a), find a decision function δ in D such that the risk R(θ, δ) is "minimum" in some sense for all images. We need first to specify some criterion for comparing the decision functions δ.

If the problem is one of estimation, that is, if images, we call δ* satisfying (2) a minimax estimator of θ.

The computation of minimax estimators is facilitated by the use of the Bayes estimation method. So far, we have considered θ as a fixed constant and fθ(x) has represented the PDF (PMF) of the RV X. In Bayesian estimation we treat θ as a random variable distributed according to a PDF (PMF) π(θ) on Θ. Also, π is called the a priori distribution. Now images represents the conditional probability density (or mass) function of the RV X, given that images is held fixed. Since π is the distribution of θ, it follows that the joint density (PMF) of θ and X is given by

(3)images

In this framework R(θ, δ) is the conditional average loss, images, given that θ is held fixed. (Note that we are using the same symbol to denote the RV θ and a value assumed by it.)

Remark 1. The argument used in Theorem 1 shows that a Bayes estimator is one which minimizes images. Theorem 1 is a special case which says that if images the function

images

is the Bayes estimator for θ with respect to π, the a priori distribution on Θ.

Remark 2. Suppose T(X) is sufficient for the parameter θ. Then it is easily seen that the posterior distribution of θ given x depends on x only through T and it follows that the Bayes estimator of θ is a function of T.

The quadratic loss function used in Theorem 1 is but one example of a loss function in frequent use. Some of the many other loss functions that may be used are

images

Clearly δ* is also the Bayes estimator under the quadratic loss function images.

Key to the derivation of the Bayes estimator is the a posteriori distribution h(θ | x). The derivation of the a posteriori distribution images, however, is a three-step process:

  1. Find the joint distribution of X and θ given by images.
  2. Find the marginal distribution with PDF (PMF) g(x) by integrating (summing) over images
  3. Divide the joint PDF (PMF) by g(x).

It is not always easy to go through these steps in practice. It may not be possible to obtain images in a closed form.
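
When no closed form is available, the three steps can still be carried out numerically on a grid; a minimal sketch (the binomial model, the uniform prior, and the data are illustrative choices only):

    import numpy as np
    from scipy.stats import binom

    x, n = 7, 10                                  # hypothetical observed data
    theta = np.linspace(0.001, 0.999, 999)
    prior = np.ones_like(theta)                   # pi(theta): uniform prior on (0, 1)

    joint = binom.pmf(x, n, theta) * prior        # step 1: f_theta(x) * pi(theta)
    g = np.trapz(joint, theta)                    # step 2: marginal g(x)
    posterior = joint / g                         # step 3: h(theta | x)

    # Bayes estimate under squared error loss = posterior mean
    print(np.trapz(theta * posterior, theta))     # close to (x + 1)/(n + 2)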

To avoid problems of integration such as that in Example 8, statisticians use the so-called conjugate prior distributions. Often there is a natural parameter family of distributions such that the posterior distributions also belong to the same family. These priors make the computations much easier.

Conjugate priors are popular because, whenever the prior family is parametric, the posterior distributions are always computable, images being an updated parametric version of π(θ). One no longer needs to go through a computation of g, the marginal PDF (PMF) of X. Once images is known, g, if needed, is easily determined from

images

Thus in Example 10, we see easily that g(x) is beta images while in Example 6 g is given by

images

Conjugate priors are usually associated with a wide class of sampling distributions, namely, the exponential family of distributions.

Natural Conjugate Priors
Sampling PDF (PMF), images Prior π(θ) Posterior images
N(θ, σ2) N(μ, τ2) images
G(v, β) G(α, β) images
b(n, p) B(α, β) images
P(λ) G(α, β) images
NB(r; p) B(α, β) images
G(γ, 1/θ) G(α, β) images
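
As an illustration of the b(n, p) row (using the standard conjugate update, in which a B(α, β) prior leads to a B(α + x, β + n − x) posterior; all numerical values are illustrative):

    from scipy.stats import beta

    a0, b0 = 2.0, 3.0                 # hyperparameters of the B(a0, b0) prior
    n, x = 20, 13                     # n Bernoulli trials, x successes observed

    a1, b1 = a0 + x, b0 + n - x       # conjugate update: posterior is B(a1, b1)
    print(beta.mean(a1, b1))          # Bayes estimate of p under squared error loss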

Another easy way is to use a noninformative prior π(θ) though one needs some integration to obtain g(x).

Calculation of images becomes easier, bypassing the calculation of g(x), when images is invariant under a group images of transformations, following Fraser’s [33] structural theory.

Let images be a group of Borel-measurable functions on imagesn onto itself. The group operation is composition, that is, if g1 and g2 are mappings from imagesn onto imagesn, g2g1 is defined by images. Also, images is closed under composition and inverse, so that all maps in images are one-to-one. We define the group G of affine linear transformations images by

images

The inverse of {a, b} is

images

and the composition {a, b} and images is given by

images

In particular,

images

The following theorem provides a method for determining minimax estimators.

The following examples show how to obtain constant risk estimators and the suitable prior distribution.

Consider the natural conjugate a priori PDF

images

The a posteriori PDF of p, given x, is expressed by

images

It follows that

images

which is the Bayes estimator under squared error loss. For this to be of the form δ*, we must have

images

giving images. It follows that the estimator δ*(x) is minimax with constant risk

images

Note that the UMVUE (which is also the MLE) is images with risk images. Comparing the two risks (Figs. 1 and 2), we see that

images

Fig. 1 Comparison of R(p, δ) and images.


Fig. 2 Comparison of R(p, δ) and images.

so that

images

in the interval images, where images as images. Moreover,

images

Clearly, we would prefer the minimax estimator if n is small and would prefer the UMVUE because of its simplicity if n is large.
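
A short numerical sketch of this comparison (it uses the commonly cited form of the minimax rule, δ*(x) = (Σxi + √n/2)/(n + √n), whose risk is the constant 1/{4(1 + √n)²}; the sample size is illustrative):

    import numpy as np

    n = 10                                        # illustrative sample size
    p = np.linspace(0.0, 1.0, 101)

    risk_umvue = p * (1 - p) / n                  # R(p, X-bar)
    risk_minimax = np.full_like(p, 1.0 / (4.0 * (1.0 + np.sqrt(n)) ** 2))

    # The minimax rule has smaller risk only on an interval around p = 1/2,
    # and that interval shrinks as n grows.
    better = p[risk_minimax < risk_umvue]
    print(better.min(), better.max())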

The following theorem, which is an extension of Theorem 2, is of considerable help in proving minimaxity of various estimators.

Proof. Clearly, images. Suppose images is not admissible. Then there exists another rule δ*(x) such that images, while the inequality is strict for some images (say). Now, the risk R(θ, δ) is a continuous function of θ, and hence there exists an images such that images for images.

Now consider the prior N(0, τ2). Then the Bayes estimator is images with risk images. Thus,

images

However,

images

We get

images

The right-hand side goes to images. This contradicts the assumption that δ* dominates images, and hence images is admissible under squared error loss.

Thus we have proved that images is an admissible minimax estimator of the mean of a normal distribution images.
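
The rate comparison that drives this Blyth-type argument can be seen numerically. For X̄ based on n observations from N(θ, σ²) and the N(0, τ²) prior, the Bayes estimator shrinks X̄ by the factor nτ²/(nτ² + σ²), and the gap between the Bayes risks of X̄ and of the Bayes rule equals σ⁴/(n(nτ² + σ²)), which decays like 1/τ², while the prior mass of any fixed interval decays only like 1/τ. The sketch below (ours, with illustrative numbers) tabulates both quantities and their ratio.

    # Blyth-type rate comparison for the admissibility of the sample mean
    # (illustrative sketch; standard normal-normal formulas assumed).
    import numpy as np
    from scipy.stats import norm

    sigma2, n = 1.0, 5
    theta0, eps = 1.0, 0.5   # a fixed interval on which a dominating rule would have to win

    for tau in (1.0, 10.0, 100.0, 1000.0):
        bayes_risk_xbar = sigma2 / n                                     # constant risk of X-bar
        bayes_risk_bayes_rule = sigma2 * tau**2 / (n * tau**2 + sigma2)  # Bayes risk of the Bayes rule
        gap = bayes_risk_xbar - bayes_risk_bayes_rule                    # = sigma2**2 / (n*(n*tau**2 + sigma2))
        prior_mass = norm.cdf(theta0 + eps, scale=tau) - norm.cdf(theta0 - eps, scale=tau)
        print(tau, gap, prior_mass, gap / prior_mass)  # the ratio tends to 0 as tau grows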

PROBLEMS 8.8

  1. It rains quite often in Bowling Green, Ohio. On a rainy day a teacher has essentially three choices: (1) to take an umbrella and face the possible prospect of carrying it around in the sunshine; (2) to leave the umbrella at home and perhaps get drenched; or (3) to just give up the lecture and stay at home. Let images, where θ1 corresponds to rain, and θ2, to no rain. Let images, where ai corresponds to the choice i, images. Suppose that the following table gives the losses for the decision problem:
    θ1 θ2
    a1 1 2
    a2 4 0
    a3 5 5

    The teacher has to make a decision on the basis of a weather report that depends on θ as follows.

    θ1 θ2
    W1 (Rain) 0.7 0.2
    W2 (No Rain) 0.3 0.8

    Find the minimax rule to help the teacher reach a decision.

  2. Let X1,X2,…, Xn be a random sample from P(λ). For estimating λ, using the quadratic error loss function, an a priori distribution over Θ, given by PDF
    images

    is used:

    1. Find the Bayes estimator for λ.
    2. If it is required to estimate images with the same loss function and same a priori PDF, find the Bayes estimator for ϕ(λ).
  3. Let X1, X2,…, Xn be a sample from b(1, θ). Consider the class of decision rules δ of the form images, where α is a constant to be determined. Find α according to the minimax principle, using the loss function (θ–δ)2, where δ is an estimator for θ.
  4. Let δ* be a minimax estimator for (θ) with respect to the squared error loss function. Show that images is a minimax estimator for images.
  5. Let images, and suppose that the a priori PDF of θ is U(0, 1). Find the Bayes estimator of θ, using loss function images. Find a minimax estimator for θ.
  6. In Example 5 find the Bayes estimator for p2.
  7. Let X1, X2,…, Xn be a random sample from G(1, 1/λ). To estimate λ, let the a priori PDF on λ be images, and let the loss function be squared error. Find the Bayes estimator of λ.
  8. Let X1, X2,…, Xn be iid U(0, θ) RVs. Suppose the prior distribution of θ is a Pareto PDF images. Using the quadratic loss function find the Bayes estimator of θ.
  9. Let T be the unique Bayes estimator of θ with respect to the prior density π. Show that T is admissible.
  10. Let X1, X2,…, Xn be iid with PDF images. Take images. Find the Bayes estimator of θ under quadratic loss.
  11. For the PDF of Problem 10 consider the estimation of θ under quadratic loss. Consider the class of estimators a images for all images. Show that X(1) − 1/n is minimax in this class.

8.9 PRINCIPLE OF EQUIVARIANCE

Let images be a family of distributions of some RV X. Let images be the sample space of values of X. In Section 8.8 we saw that statistical decision theory revolves around the following four basic elements: the parameter space Θ, the action space images, the sample space X, and the loss function L(θ, a).

Let images be a group of transformations which map X onto itself. We say that images is invariant under images if for each images and every images there is a unique images such that images whenever images. Accordingly,

(1)images

for all Borel subsets in imagesn. We note that the invariance of images under images does not change the class of distributions we begin with; it only changes the parameter or index θ to images. The group images induces images, a group of transformations images on Θ onto itself.

In order to apply invariance considerations to a decision problem we need also to ensure that the loss function is invariant.

Suppose θ is the mean of PDF fθ, images, and {fθ} is invariant under images. Consider the estimator images. What we want in an estimator images of θ is that it changes in the same prescribed way as the data are changed. In our case, since X changes to images we would like images to transform to images.

Indeed g on S induces images on Θ. Thus if images, then images, so if δ(X) estimates θ, then δ(gX) should estimate images. The principle of equivariance requires that we restrict attention to equivariant estimators and select the “best” estimator in this class, in a sense to be described later in this section.

In Example 6 consider the statistic images. Note that under the translation group images and images. That is, for every images. A statistic images is said to be invariant under a group of transformations images if images for all images. When images is the translation group, an invariant statistic (function) under images is called location-invariant. Similarly, if images is the scale group, we call images scale-invariant, and if images is the location-scale group, we call images location-scale invariant. In Example 6 images is location-invariant but not equivariant, and images 2 and images are not location-invariant.
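
These definitions are easy to check numerically. Using the sample mean and the sample variance as stand-ins for the statistics whose symbols are not reproduced above, the sketch below (ours) verifies that under the translation x → x + b the mean is equivariant while the sample variance is location-invariant.

    # Location equivariance vs. invariance under translation (illustrative check).
    import numpy as np

    rng = np.random.default_rng(0)
    x, b = rng.normal(size=20), 3.7

    assert np.isclose(np.mean(x + b), np.mean(x) + b)            # equivariant: T(x + b) = T(x) + b
    assert np.isclose(np.var(x + b, ddof=1), np.var(x, ddof=1))  # invariant:   T(x + b) = T(x)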

A very important property of equivariant estimators is that their risk function is constant on orbits of θ.

Remark 1. When the risk function of every equivariant estimator is constant, an estimator (in the class of equivariant estimators) that is obtained by minimizing the constant is called the minimum risk equivariant (MRE) estimator.

Next consider estimation of σ2 with images and images. Then images is an equivariant estimator of σ2. Note that images may be used to designate x on its orbits

images

Again A(x) is invariant under images and A(X) is ancillary to σ2. Moreover, images, and A(X) are independent.

Finally, we consider estimation of (μ, σ2) when images. Then images, where images is an equivariant estimator of (μ, σ2). Also images may be used to designate x on its orbits

images

Note that the statistic A(X) defined in each of the three cases considered in Example 8 is constant on its orbits. A statistic A is said to be maximal invariant if

  1. A is invariant, and
  2. A is maximal, that is, images for some images.

We now derive an explicit expression for MRE estimator for a location parameter. Let X1, X2,…, Xn be iid with common PDF images. Then images is invariant under images and an estimator of θ is equivariant if

images

for all real b.

From Theorem 1 the risk function of an equivariant estimator is constant with risk

images

where the expectation is with respect to PDF images. Consequently, among all equivariant estimators for θ, the MRE estimator is ∂0 satisfying

images

Thus we only need to choose the function q in (4).

Let L(θ, a) be the loss function. Invariance considerations require that

images

for all real b, so that L(θ, a) must be some function w of images.

Let images, and g(y) be the joint PDF of Y under images. Let images be the conditional density, under images, of X1 given images. Then

(5)images

Then R(0, ∂) will be minimized by choosing, for each fixed y, q(y) to be that value of c which minimizes

Necessarily q depends on y. In the special case images, the integral in (6) is minimum when c is chosen to be the mean of the conditional distribution. Thus the unique MRE estimator of θ is given by

(7)images

This is the so-called Pitman estimator. Let us simplify it a little more by computing images.

First we need to compute h(u|y). When images, the joint PDF of X1, Y2,…, Yn is easily seen to be

images

so the joint PDF of (Y2,…, Yn) is given by

images

It follows that

Now let images. Then the conditional PDF of Z given y is images. It follows from (8) that

Remark 2. Since the joint PDF of X1, X2,…, Xn is images, the joint PDF of θ and X when θ has prior π(θ) is images. The marginal PDF of X is images. It follows that the conditional PDF of θ given images is given by

images

Taking images, the improper uniform prior on Θ, we see from (9) that ∂0(x) is the Bayes estimator of θ under squared error loss and prior images. Since the risk of ∂0 is constant, it follows that ∂0 is also a minimax estimator of θ.
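
Since the displays for (7)–(9) are not reproduced here, we record the Pitman estimator in its standard form, ∂0(x) = ∫ t ∏ f(x_i − t) dt / ∫ ∏ f(x_i − t) dt, and evaluate it by numerical quadrature in the sketch below (ours). With f the standard normal density the estimator reduces to the sample mean, which provides a check.

    # Pitman (MRE) estimator of a location parameter by quadrature
    # (illustrative sketch; standard form of the estimator assumed).
    import numpy as np
    from scipy import integrate
    from scipy.stats import norm

    def pitman_location(x, f, lo=-50.0, hi=50.0):
        x = np.asarray(x, dtype=float)
        num = integrate.quad(lambda t: t * np.prod(f(x - t)), lo, hi)[0]
        den = integrate.quad(lambda t: np.prod(f(x - t)), lo, hi)[0]
        return num / den

    x = [0.3, -1.2, 2.5, 0.7]
    print(pitman_location(x, norm.pdf), np.mean(x))  # the two agree in the normal case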

Remark 3. Suppose S is sufficient for θ. Then images, so that the Pitman estimator of θ can be rewritten as

images

which is a function of s alone.

We now consider, briefly, the Pitman estimator of a scale parameter. Let X have a joint PDF

images

where f is known and images is a scale parameter. The family images remains invariant under images, which induces images on Θ. Then, for estimation of σk, a loss function L(σ, a) is invariant under these transformations if and only if images. An estimator of σk is equivariant under images if

images

Some simple examples of scale-equivariant estimators of σ are the mean deviation images and the standard deviation images. We note that the group images over Θ is transitive, so, according to Theorem 1, the risk of any equivariant estimator of σk is free of σ, and an MRE estimator minimizes this risk over the class of all equivariant estimators of σk. Using the loss function images, it can be shown that the MRE estimator of σk, also known as the Pitman estimator of σk, is given by

images

Just as in the location case, one can show that ∂0 is a function of the minimal sufficient statistic and is the Bayes estimator of σk with respect to the improper prior images. Consequently, ∂0 is minimax.
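
As in the location case, the scale formula can be evaluated numerically. We assume the standard form of the Pitman estimator of σ^k under the loss (a/σ^k − 1)², namely ∂0(x) = ∫₀^∞ t^(n+k−1) f(tx) dt / ∫₀^∞ t^(n+2k−1) f(tx) dt, where f is the joint density at scale 1; the sketch below (ours) checks it against the closed form Σ x_i²/(n + 2) for iid N(0, σ²) data with k = 2.

    # Pitman (MRE) estimator of a scale parameter sigma**k by quadrature
    # (illustrative sketch; standard form of the estimator assumed).
    import numpy as np
    from scipy import integrate

    def pitman_scale(x, f_joint, k=1):
        x = np.asarray(x, dtype=float)
        n = x.size
        num = integrate.quad(lambda t: t**(n + k - 1) * f_joint(t * x), 0.0, np.inf)[0]
        den = integrate.quad(lambda t: t**(n + 2 * k - 1) * f_joint(t * x), 0.0, np.inf)[0]
        return num / den

    def std_normal_joint(z):
        # Joint N(0, 1) density of the vector z at scale sigma = 1.
        return np.exp(-0.5 * np.sum(z**2)) / (2.0 * np.pi) ** (z.size / 2)

    x = [0.9, -1.4, 0.2, 2.1, -0.6]
    print(pitman_scale(x, std_normal_joint, k=2), np.sum(np.square(x)) / (len(x) + 2))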

PROBLEMS 8.9

In all problems assume that X1, X2,…, Xn is a random sample from the distribution under consideration.

  1. Show that the following statistics are equivariant under translation group:
    1. Median (Xi).
    2. images.
    3. images, the quantile of order p, images.
    4. images.
    5. images, where images is the mean of a sample of size m, images.
  2. Show that the following statistics are invariant under the location, scale, or location-scale group:
    1. images – median (Xi).
    2. images.
    3. images.
    4. images, where (X1, Y1),…, (Xn, Yn) is a random sample from a bivariate distribution.
  3. Let the common distribution be G(α, σ) where images is known and images is unknown. Find the MRE estimator of σ under loss images.
  4. Let the common PDF be the folded normal distribution
    images

    Verify that the best equivariant estimator of μ under quadratic loss is given by

    images
  5. Let images.
    1. Show that (X(1), X(n)) is jointly sufficient for θ.
    2. Verify whether or not (X(n)X(1)) is an unbiased estimator of θ. Find an ancillary statistic.
    3. Determine the best invariant estimator of θ under loss function images.
  6. Let
    images

    Find the Pitman estimator of θ.

  7. Let images, for images. Find the Pitman estimator of θ.
  8. Show that an estimator ∂ is (location) equivariant if and only if
    images

    where ∂0 is any equivariant estimator and φ is an invariant function.

  9. Let X1, X2 be iid with PDF
    images

    Find, explicitly, the Pitman estimator of σr.

  10. Let X1, X2,…,Xn be iid with PDF
    images

    Find the Pitman estimator of θk.
