2
Point Estimation

2.1 Introduction

The theory of point estimation is described in most books about mathematical statistics, and we refer here, as in other chapters, mainly to Rasch and Schott (2018).

We describe the problem as follows. Let the distribution Pθ of a random variable y depend on a parameter (vector) θ ∈ Ω ⊆ Rp, p ≥ 1. With the help of a realisation Y = (y1, y2, … , yn)T, n ≥ 1, of a random sample Y = (y1, y2, … , yn)T we have to make a statement concerning the value of θ (or of a function of it). The elements of a random sample Y are independently and identically distributed (i.i.d.) like y. Obviously the statement about θ should be as precise as possible; what this really means depends on the choice of the loss function defined in Section 1.4 of Rasch and Schott (2018).

We define an estimator S(Y), i.e. a measurable mapping of Rn into Ω, taking the value S(Y) for the realisation Y of Y; S(Y) is called the estimate of θ. The estimate is thus the realisation of the estimator. In this chapter, data are assumed to be realisations (y1, y2, … , yn) of one random sample, where n is called the sample size; the case of more than one sample is discussed in the following chapters. The random sample, i.e. the random variable y, stems from some distribution, which is specified whenever the method of estimation depends on it, as in maximum likelihood estimation. For this distribution the rth central moment

$$\mu_r = E\big[(y-\mu)^r\big], \qquad r = 2, 3, \ldots \tag{2.1}$$

is assumed to exist where μ = E(y) is the expectation and σ2 = E[(y − μ)2] is the variance of y. The rth central sample moment mr is defined as

$$m_r = \frac{1}{n}\sum_{i=1}^{n}(y_i-\bar{y})^r \tag{2.2}$$

with

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \tag{2.3}$$

An estimator S(Y) based on a random sample Y = (y1, y2, … , yn)T of size n ≥ 1 is said to be unbiased with respect to θ if

$$E\big[S(Y)\big] = \theta \tag{2.4}$$

holds for all θ ∈ Ω.

The difference bn(θ) = E[S(Y)] − θ is called the bias of the estimator S(Y).
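For example, the second central sample moment m2 from (2.2), used as an estimator of σ², has expectation ((n − 1)/n)σ² and hence the bias bn(σ²) = −σ²/n, which vanishes with increasing n.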

We show here how R can easily calculate estimates of location and scale parameters as well as higher moments from a data set. We first create a simple data set y in R. The following values are weights in kilograms and therefore non-negative.

 > y <- c(5,7,1,7,8,9,13,9,10,10,18,10,15,10,10,11,8,11,12,13,15, 22,10,25,11) 

If we consider y as a sample, the sample size n can be determined with R via

 > length(y)
  [1] 25 

i.e. n = 25. We start with estimating the parameters of location.

In Sections 2.2, 2.3, and 2.4 we assume that the observed measurements are on an interval or ratio scale; if they are on an ordinal or nominal scale we use the methods described in Section 2.5.

2.2 Estimating Location Parameters

When we estimate any parameter we assume that it exists; so, when speaking about expectations, the skewness γ1 = μ3/σ3, the kurtosis γ2 = μ4/σ4 − 3, and so on, we assume that the corresponding moments of the underlying distribution exist.

The arithmetic mean or, briefly, the mean

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \tag{2.5}$$

is an estimate of the expectation μ of some distribution.
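With the data set y created above, R computes the mean with the base function mean(); the median, a further estimate of location, is obtained with median():

 > mean(y)
 [1] 11.2
 > median(y)
 [1] 10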

2.2.1 Maximum Likelihood Estimation of Location Parameters

We now show how maximum likelihood estimates of location parameters are calculated for non-normal distributions. We start with the lognormal distribution.

Note: more advanced R users can calculate maximum likelihood estimates directly using the package 'maxLik'.
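As an illustration, the following sketch fits the lognormal distribution to the data y by maximum likelihood, once numerically with maxLik and once using the closed-form solution; the starting values are arbitrary choices and not prescribed by the package.

 library(maxLik)  # assumes the package is installed
 # log-likelihood of the lognormal distribution as a function of
 # the parameter vector (mu, sigma)
 loglik <- function(param) {
   mu <- param[1]
   sigma <- param[2]
   if (sigma <= 0) return(NA)  # keep the search in the admissible region
   sum(dlnorm(y, meanlog = mu, sdlog = sigma, log = TRUE))
 }
 ml <- maxLik(loglik, start = c(mu = 1, sigma = 1))
 summary(ml)
 # for the lognormal distribution the maximum likelihood estimates
 # are also available in closed form:
 c(meanlog = mean(log(y)),
   sdlog   = sqrt(mean((log(y) - mean(log(y)))^2)))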

2.2.2 Estimating Expectations from Censored Samples and Truncated Distributions

We consider a random variable y that is normally distributed with expectation μ and variance σ². In animal breeding, selection often means one-sided truncation of the distribution: all animals with a performance (birth weight, for example) smaller than a value a are excluded from further breeding. In general we can say that we only use those observations of a normally distributed random variable y that are larger than a. The left-sided truncated standard normal distribution is defined on the region [a, ∞). Since the total area under the curve of a truncated distribution must equal 1, the remaining curve is scaled up to compensate for the truncated area over the region (−∞, a). Therefore the density function of the 'in a' (left-sided) truncated standard normal distribution is

$$f(y) = \begin{cases}\dfrac{\varphi(y)}{1-\Phi(a)}, & y \ge a \\[4pt] 0, & y < a\end{cases}$$

where φ and Φ denote the density and the distribution function of the (untruncated) standard normal distribution.

The expectation of y after truncation is

$$E(y) = \frac{\varphi(a)}{1-\Phi(a)} \tag{2.17}$$

The right-sided truncated standard normal distribution is defined on the region (−∞, b]. The density function of the 'in b' (right-sided) truncated standard normal distribution is

$$f(y) = \begin{cases}\dfrac{\varphi(y)}{\Phi(b)}, & y \le b \\[4pt] 0, & y > b\end{cases}$$

For a normal distribution with expectation μ and variance σ² the expectation of y after truncation is, in the left-sided case,

$$E(y \mid y \ge a) = \mu + \sigma\,\frac{\varphi(\alpha)}{1-\Phi(\alpha)}, \qquad \alpha = \frac{a-\mu}{\sigma},$$

and in the right-sided case

$$E(y \mid y \le b) = \mu - \sigma\,\frac{\varphi(\beta)}{\Phi(\beta)}, \qquad \beta = \frac{b-\mu}{\sigma}.$$

However, often after truncation (selection), the expectation μ of the initial distribution has to be estimated.
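As an illustration, the following sketch estimates μ and σ of the initial distribution from a left-truncated sample by maximising the truncated likelihood with base R functions; the truncation point a = 8 and the starting values are hypothetical choices for demonstration only.

 # negative log-likelihood of a sample from a normal distribution
 # truncated from the left at a
 negloglik <- function(par, x, a) {
   mu <- par[1]
   sigma <- par[2]
   if (sigma <= 0) return(Inf)
   # each observation contributes log dnorm minus log(1 - Phi((a-mu)/sigma))
   -sum(dnorm(x, mu, sigma, log = TRUE) -
          pnorm(a, mu, sigma, lower.tail = FALSE, log.p = TRUE))
 }
 a <- 8              # hypothetical truncation point
 x <- y[y > a]       # the observations exceeding a
 fit <- optim(c(mu = 10, sigma = 3), negloglik, x = x, a = a)
 fit$par             # estimates of mu and sigma of the initial distribution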

2.2.3 Estimating Location Parameters of Finite Populations

We assume that we have a finite population of size N. We first define the location parameters of such distributions and then show how to estimate them from a realised random sample of size n. It seems reasonable first to read Section 1.3. The usual procedure is sampling without replacement; when we sample with replacement, the factor $\left(1-\frac{n}{N}\right)$ in some of the formulae below is dropped. We write Y1, Y2, … , YN for the N values in the finite population with expectation

$$\mu = \frac{1}{N}\sum_{i=1}^{N} Y_i$$

and variance

$$\sigma^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i-\mu)^2$$

for sampling without replacement or

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(Y_i-\mu)^2$$

for sampling with replacement.

The quantity

$$\mathrm{MSE}\big[S(Y)\big] = E\Big[\big(S(Y)-\theta\big)^2\Big] = \mathrm{var}\big[S(Y)\big] + b_n^2(\theta)$$

with the bias bn(θ) = E[S(Y)] − θ of the estimator S(Y) is called the mean square error (MSE) of S(Y).
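As a small sketch, assume the 25 values in y were drawn without replacement from a hypothetical finite population of size N = 100; the population mean is then estimated as follows, with the factor (1 − n/N) entering the variance of the estimator:

 N <- 100                          # assumed population size
 n <- length(y)
 ybar <- mean(y)                   # estimate of the population mean
 s2 <- var(y)                      # sample variance
 var_ybar <- (1 - n / N) * s2 / n  # drop the factor (1 - n/N) when
                                   # sampling with replacement
 c(mean = ybar, se = sqrt(var_ybar))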

2.3 Estimating Scale Parameters

The most important scale parameters are the range, the interquartile range (IQR), the standard deviation, and the variance. Except for the variance, all of them have the same dimension as the observations.

The sample range R is a function of the order statistics of the sample; its realisation is the difference between the largest and the smallest value of the sample, i.e. R = y(n) − y(1).
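For the data set y these estimates are obtained in R via

 > max(y) - min(y)   # realised sample range R
 [1] 24
 > IQR(y)            # interquartile range
 [1] 4
 > var(y)            # sample variance
 [1] 25.25
 > sd(y)             # standard deviation
 [1] 5.024938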

2.4 Estimating Higher Moments

In Section 2.1 the rth central moment of a random variable was defined for any r > 1, and it is assumed to exist whenever we discuss it. In (2.2) the rth central sample moment was defined; for r > 2 we speak of higher moments. Usually, for r > 2, the sample moments mr are used as (biased) estimates of the corresponding moments μr of a random variable.

We consider here functions of the third and the fourth moments: the skewness and the kurtosis.

The skewness γ1 is the standardised third moment

$$\gamma_1 = \frac{\mu_3}{\sigma^3} \tag{2.23}$$

Sometimes it is estimated from a sample Y = (y1, y2, … , yn)T by the sample skewness

$$g_1 = \frac{m_3}{s^3}$$

with s² defined in (2.21). The estimator

$$g_1 = \frac{m_3}{s^3},$$

obtained by replacing the realisation by the random sample Y, is biased.

In the statistical packages SAS and IBM SPSS Statistics (with weight 1 for all sampled data) the skewness is estimated as

$$\hat{\gamma}_1 = \frac{n}{(n-1)(n-2)}\sum_{i=1}^{n}\left(\frac{y_i-\bar{y}}{s}\right)^3.$$
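For the data y both estimates can be computed directly from the sample moments, as in the following sketch; the package e1071, assuming it is installed, also offers skewness() with several variants selected through its type argument.

 m3 <- mean((y - mean(y))^3)  # third central sample moment, see (2.2)
 s <- sd(y)                   # square root of the sample variance
 g1 <- m3 / s^3               # sample skewness as defined above
 n <- length(y)
 G1 <- n / ((n - 1) * (n - 2)) * sum(((y - mean(y)) / s)^3)  # SAS/SPSS form
 c(g1 = g1, G1 = G1)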

2.5 Contingency Tables

Contingency tables are used when observations are nominally scaled. We describe contingency tables here in general, even though not only problems of estimation are handled with them. They will mainly be used in Chapter 3, but describing them here gives a unified approach. A k-dimensional contingency table for k factors F1, … , Fk, where the ith factor Fi has si levels (i = 1, …, k), consists of s1 · s2 · … · sk classes containing the numbers of those of the N investigated objects, observed in a nominal scale, that fall into the corresponding level combinations of the factors. For such contingency tables there exist k + 1 different models. The models depend on how many factors are observed by the experimenter (these are observation factors) and thus contain random results; the other factors are called fixed factors. We explain this with a two-dimensional contingency table.

2.5.1 Models of Two‐Dimensional Contingency Tables

In two‐dimensional contingency tables three models exist.

2.5.1.1 Model I

If we investigate N pupils and record whether or not they have blue eyes and whether or not they are fair-haired, then we have k = 2 factors: A, eye colour, with s1 = 2 levels, and B, hair colour, with s2 = 2 levels. The observations can be arranged in a contingency table like Table 2.3.

Here both factors are observation factors; the entries nij, i = 1, 2, j = 1, 2, and the marginal sums N1·, N2·, N·1, and N·2 of the contingency Table 2.3 are random variables. Investigated is a random sample of size N. We call this situation model I of a contingency table.

2.5.1.2 Model II

If the marginal sums of one of the factors, say A, are fixed in advance, we obtain a contingency table like Table 2.4.

Such a situation occurs if N1 female and N2 male pupils are observed and it is counted how many of them have blue eyes and how many do not. We call this model II of a contingency table.

2.5.1.3 Model III

The situation of model III, with all marginal sums fixed in advance, is of theoretical interest, as in Fisher's 'problem of the lady tasting tea' reported in Fisher (1935, 1971). The lady in question (Muriel Bristol) claimed to be able to tell whether the tea or the milk was added to the cup first. Fisher proposed to give her eight cups, four of each variety, in random order. One could then ask what the probability was of her identifying correctly, just by chance, exactly the number of cups she did. However, when the lady knows that four cups of each variety have been prepared, she will make all marginal sums equal to four. That situation leads to Fisher's exact test in Chapter 3.

Here we describe two‐dimensional contingency tables; three‐dimensional tables are described in Rasch et al. (2008, Verfahren 4/31/3000).

From contingency tables we can estimate measures of association, but we can also test hypotheses. Here we only show how to calculate several such measures from observed data in two-dimensional contingency tables; tests of hypotheses can be found in Chapter 3.

The degree of association between the two variables (here factors) can be assessed by a number of coefficients, so‐called association measures. The simplest, applicable only to the case of 2 × 2 contingency tables, are as follows.

2.5.2 Association Coefficients for 2 × 2 Tables

These coefficients do not depend on the marginal sums and are often calculated from a two‐dimensional contingency table in the form of Table 2.5.
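As a small sketch, two classical coefficients for a 2 × 2 table with cell counts n11, n12 (first row) and n21, n22 (second row) can be computed as follows; the counts used here are hypothetical, and the layout is the usual convention assumed for Table 2.5.

 n11 <- 10; n12 <- 5   # hypothetical cell counts, first row
 n21 <- 4;  n22 <- 11  # second row
 # Yule's Q (Yule, 1900)
 Q <- (n11 * n22 - n12 * n21) / (n11 * n22 + n12 * n21)
 # phi coefficient
 phi <- (n11 * n22 - n12 * n21) /
   sqrt((n11 + n12) * (n21 + n22) * (n11 + n21) * (n12 + n22))
 c(Q = Q, phi = phi)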

References

  1. Digby, P.G.N. (1983). Approximating the tetrachoric correlation coefficient. Biometrics 39: 753–757.
  2. Fisher, R.A. (1971) [1935]. The Design of Experiments, 9e. New York: Macmillan. ISBN 0-02-844690-9.
  3. Rasch, D., Herrendörfer, G., Bock, J., Victor, N. and Guiard, V. (2008). Verfahrensbibliothek Versuchsplanung und -auswertung, 2. verbesserte Auflage in einem Band mit CD. München, Wien: R. Oldenbourg Verlag.
  4. Rasch, D. and Schott, D. (2018). Mathematical Statistics. Oxford: Wiley.
  5. Yule, G.U. (1900). On the association of attributes in statistics: with illustrations from the material of the Childhood Society, &c. Philosophical Transactions of the Royal Society of London (A) 194: 257–319.
  6. Yule, G.U. (1911). Introduction to the Theory of Statistics. London: Griffin.