2
Point Estimation

2.1 Introduction

The theory of point estimation is described in most books about mathematical statistics, and we refer here, as in other chapters, mainly to Rasch and Schott (2018).

We describe the problem as follows. Let the distribution Pθ of a random variable y depend on a parameter (vector) θ ∈ Ω ⊆ Rp, p ≥ 1. With the help of a realisation Y = (y1, y2, … , yn)T, n ≥ 1, of a random sample Y = (y1, y2, … , yn)T we have to make a statement concerning the value of θ (or of a function of it). The elements of a random sample Y are independently and identically distributed (i.i.d.) like y. Obviously the statement about θ should be as precise as possible; what this really means depends on the choice of the loss function defined in Section 1.4 of Rasch and Schott (2018).

We define an estimator S(Y), i.e. a measurable mapping of Rn into Ω, taking the value S(Y) for the realisation Y of Y; S(Y) is called the estimate of θ. The estimate is thus the realisation of the estimator. In this chapter, data are assumed to be realisations (y1, y2, … , yn) of one random sample, where n is called the sample size; the case of more than one sample is discussed in the following chapters. The random sample, i.e. the random variable y, stems from some distribution, which is specified whenever the method of estimation depends on it, as in maximum likelihood estimation. For this distribution the rth central moment

$$\mu_r = E\big[(y-\mu)^r\big], \qquad r = 2, 3, \ldots \tag{2.1}$$

is assumed to exist where μ = E(y) is the expectation and σ2 = E[(y − μ)2] is the variance of y. The rth central sample moment mr is defined as

$$m_r = \frac{1}{n}\sum_{i=1}^{n}(y_i-\bar{y})^r \tag{2.2}$$

with

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \tag{2.3}$$

An estimator S(Y) based on a random sample Y = (y1, y2, … , yn)T of size n ≥ 1 is said to be unbiased with respect to θ if

$$E\big[S(Y)\big] = \theta \tag{2.4}$$

holds for all θ ∈ Ω.

The difference bn(θ) = E[S(Y)] − θ is called the bias of the estimator S(Y).
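For example, the second central sample moment m2 from (2.2), used as an estimator of σ², has expectation ((n − 1)/n)σ² and hence the bias bn(σ²) = −σ²/n, which vanishes with increasing n.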

We show here how R can easily calculate estimates of location and scale parameters as well as higher moments from a data set. We first create a simple data set y in R. The following values are weights in kilograms and therefore non-negative.

 > y <- c(5,7,1,7,8,9,13,9,10,10,18,10,15,10,10,11,8,11,12,13,15, 22,10,25,11) 

If we consider y as a sample, the sample size n can be determined with R via

 > length(y)
  [1] 25 

i.e. n = 25. We start with estimating the parameters of location.

In Sections 2.2, 2.3, and 2.4 we assume that the observed measurements are on an interval or ratio scale; if they are on an ordinal or nominal scale we use the methods described in Section 2.5.

2.2 Estimating Location Parameters

When we estimate any parameter we assume that it exists; so, when speaking about expectations, the skewness γ1 = μ3/σ3, the kurtosis γ2 = μ4/σ4 − 3, and so on, we assume that the corresponding moments of the underlying distribution exist.

The arithmetic mean or, briefly, the mean

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \tag{2.5}$$

is an estimate of the expectation μ of some distribution.
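With the data set y created above, R computes the mean with the base function mean(); the median, a further estimate of location, is obtained with median():

 > mean(y)
 [1] 11.2
 > median(y)
 [1] 10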

2.2.1 Maximum Likelihood Estimation of Location Parameters

We now show how maximum likelihood estimates of location parameters are calculated for non-normal distributions. We start with the lognormal distribution.

Note: more advanced R users can calculate maximum likelihood estimates directly using the package 'maxLik'.
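As an illustration, the following sketch fits the lognormal distribution to the data y by maximum likelihood, once numerically with maxLik and once using the closed-form solution; the starting values are arbitrary choices and not prescribed by the package.

 library(maxLik)  # assumes the package is installed
 # log-likelihood of the lognormal distribution as a function of
 # the parameter vector (mu, sigma)
 loglik <- function(param) {
   mu <- param[1]
   sigma <- param[2]
   if (sigma <= 0) return(NA)  # keep the search in the admissible region
   sum(dlnorm(y, meanlog = mu, sdlog = sigma, log = TRUE))
 }
 ml <- maxLik(loglik, start = c(mu = 1, sigma = 1))
 summary(ml)
 # for the lognormal distribution the maximum likelihood estimates
 # are also available in closed form:
 c(meanlog = mean(log(y)),
   sdlog   = sqrt(mean((log(y) - mean(log(y)))^2)))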

2.2.2 Estimating Expectations from Censored Samples and Truncated Distributions

We consider a random variable y that is normally distributed with expectation μ and variance σ². In animal breeding, selection often means one-sided truncation of the distribution: all animals with a performance (birth weight, for example) smaller than a value a are excluded from further breeding. In general we can say that we only use those observations of a normally distributed random variable y that are larger than a. The left-sided truncated standard normal distribution is defined on the region [a, ∞). Since the total area under the curve of a truncated distribution must equal 1, the remaining curve is scaled up to compensate for the truncated area over the region (−∞, a). Therefore the density function of the 'in a' (left-sided) truncated standard normal distribution is

$$f(y) = \begin{cases}\dfrac{\varphi(y)}{1-\Phi(a)}, & y \ge a \\[4pt] 0, & y < a\end{cases}$$

where φ and Φ denote the density and the distribution function of the (untruncated) standard normal distribution.

The expectation of y after truncation is

$$E(y) = \frac{\varphi(a)}{1-\Phi(a)} \tag{2.17}$$

The right-sided truncated standard normal distribution is defined on the region (−∞, b]. The density function of the 'in b' (right-sided) truncated standard normal distribution is

$$f(y) = \begin{cases}\dfrac{\varphi(y)}{\Phi(b)}, & y \le b \\[4pt] 0, & y > b\end{cases}$$

For a normal distribution with expectation μ and variance σ² the expectation of y after truncation is, in the left-sided case,

$$E(y \mid y \ge a) = \mu + \sigma\,\frac{\varphi(\alpha)}{1-\Phi(\alpha)}, \qquad \alpha = \frac{a-\mu}{\sigma},$$

and in the right-sided case

$$E(y \mid y \le b) = \mu - \sigma\,\frac{\varphi(\beta)}{\Phi(\beta)}, \qquad \beta = \frac{b-\mu}{\sigma}.$$

However, often after truncation (selection), the expectation μ of the initial distribution has to be estimated.
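As an illustration, the following sketch estimates μ and σ of the initial distribution from a left-truncated sample by maximising the truncated likelihood with base R functions; the truncation point a = 8 and the starting values are hypothetical choices for demonstration only.

 # negative log-likelihood of a sample from a normal distribution
 # truncated from the left at a
 negloglik <- function(par, x, a) {
   mu <- par[1]
   sigma <- par[2]
   if (sigma <= 0) return(Inf)
   # each observation contributes log dnorm minus log(1 - Phi((a-mu)/sigma))
   -sum(dnorm(x, mu, sigma, log = TRUE) -
          pnorm(a, mu, sigma, lower.tail = FALSE, log.p = TRUE))
 }
 a <- 8              # hypothetical truncation point
 x <- y[y > a]       # the observations exceeding a
 fit <- optim(c(mu = 10, sigma = 3), negloglik, x = x, a = a)
 fit$par             # estimates of mu and sigma of the initial distribution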

2.2.3 Estimating Location Parameters of Finite Populations

We assume that we have a finite population of size N. We first define the location parameters of such distributions and then show how to estimate them from a realised random sample of size n. It seems reasonable first to read Section 1.3. The usual procedure is sampling without replacement; when we sample with replacement, the factor $\left(1-\frac{n}{N}\right)$ in some of the formulae below is dropped. We write Y1, Y2, … , YN for the N values in the finite population with expectation

$$\mu = \frac{1}{N}\sum_{i=1}^{N} Y_i$$

and variance

$$\sigma^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i-\mu)^2$$

for sampling without replacement or

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(Y_i-\mu)^2$$

for sampling with replacement.

The quantity

$$\mathrm{MSE}\big[S(Y)\big] = E\Big[\big(S(Y)-\theta\big)^2\Big] = \mathrm{var}\big[S(Y)\big] + b_n^2(\theta)$$

with the bias bn(θ) = E[S(Y)] − θ of the estimator S(Y) is called the mean square error (MSE) of S(Y).
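As a small sketch, assume the 25 values in y were drawn without replacement from a hypothetical finite population of size N = 100; the population mean is then estimated as follows, with the factor (1 − n/N) entering the variance of the estimator:

 N <- 100                          # assumed population size
 n <- length(y)
 ybar <- mean(y)                   # estimate of the population mean
 s2 <- var(y)                      # sample variance
 var_ybar <- (1 - n / N) * s2 / n  # drop the factor (1 - n/N) when
                                   # sampling with replacement
 c(mean = ybar, se = sqrt(var_ybar))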

2.3 Estimating Scale Parameters

The most important scale parameters are the range, the interquartile range (IQR), the standard deviation, and the variance. Except for the variance, all of them have the same dimension as the observations.

The sample range R is a function of the order statistics of the sample; its realisation is the difference between the largest and the smallest value of the sample, i.e. R = y(n) − y(1).
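For the data set y these estimates are obtained in R via

 > max(y) - min(y)   # realised sample range R
 [1] 24
 > IQR(y)            # interquartile range
 [1] 4
 > var(y)            # sample variance
 [1] 25.25
 > sd(y)             # standard deviation
 [1] 5.024938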

2.4 Estimating Higher Moments

In Section 2.1 the rth central moment of a random variable was defined for any r > 1, and it is assumed to exist whenever we discuss it. In (2.2) the rth central sample moment was defined; for r > 2 we speak of higher moments. Usually, for r > 2, the sample moments mr are used as (biased) estimates of the corresponding moments μr of a random variable.

We consider here functions of the third and the fourth moments: the skewness and the kurtosis.

The skewness γ1 is the standardised third moment

$$\gamma_1 = \frac{\mu_3}{\sigma^3} \tag{2.23}$$

Sometimes it is estimated from a sample Y = (y1, y2, … , yn)T by the sample skewness

$$g_1 = \frac{m_3}{s^3}$$

with s² defined in (2.21). The estimator

$$g_1 = \frac{m_3}{s^3},$$

obtained by replacing the realisation by the random sample Y, is biased.

In the statistical packages SAS and IBM SPSS Statistics (with weight 1 for all sampled data) the skewness is estimated as

$$\hat{\gamma}_1 = \frac{n}{(n-1)(n-2)}\sum_{i=1}^{n}\left(\frac{y_i-\bar{y}}{s}\right)^3.$$
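For the data y both estimates can be computed directly from the sample moments, as in the following sketch; the package e1071, assuming it is installed, also offers skewness() with several variants selected through its type argument.

 m3 <- mean((y - mean(y))^3)  # third central sample moment, see (2.2)
 s <- sd(y)                   # square root of the sample variance
 g1 <- m3 / s^3               # sample skewness as defined above
 n <- length(y)
 G1 <- n / ((n - 1) * (n - 2)) * sum(((y - mean(y)) / s)^3)  # SAS/SPSS form
 c(g1 = g1, G1 = G1)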

2.5 Contingency Tables

Contingency tables are used when observations are nominally scaled. We describe contingency tables here in general, even though not only problems of estimation are handled with them. They will mainly be used in Chapter 3, but describing them here gives a unified approach. A k-dimensional contingency table for k factors F1, … , Fk, where the ith factor Fi has si levels (i = 1, …, k), consists of s1 · s2 · … · sk classes containing the numbers of those of the N investigated objects, observed in a nominal scale, that fall into the corresponding level combinations of the factors. For such contingency tables there exist k + 1 different models. The models depend on how many factors are observed by the experimenter (these are observation factors) and thus contain random results; the other factors are called fixed factors. We explain this with a two-dimensional contingency table.

2.5.1 Models of Two‐Dimensional Contingency Tables

In two‐dimensional contingency tables three models exist.

2.5.1.1 Model I

If we investigate N pupils and record whether or not they have blue eyes and whether or not they are fair-haired, then we have k = 2 factors: A, eye colour, with s1 = 2 levels, and B, hair colour, with s2 = 2 levels. The observations can be arranged in a contingency table like Table 2.3.

Here both factors are observation factors; the entries nij, i = 1, 2, j = 1, 2, and the marginal sums N1·, N2·, N·1, and N·2 of the contingency Table 2.3 are random variables. Investigated is a random sample of size N. We call this situation model I of a contingency table.

2.5.1.2 Model II

If the marginal sums of one of the factors, say A, are fixed in advance, we obtain a contingency table like Table 2.4.

Such a situation occurs if N1 female and N2 male pupils are observed and it is counted how many of them have blue eyes and how many do not. We call this model II of a contingency table.

2.5.1.3 Model III

The situation of model III, with all marginal sums fixed in advance, is of theoretical interest, as in Fisher's 'problem of the lady tasting tea' reported in Fisher (1935, 1971). The lady in question (Muriel Bristol) claimed to be able to tell whether the tea or the milk was added to the cup first. Fisher proposed to give her eight cups, four of each variety, in random order. One could then ask what the probability was of her identifying correctly, just by chance, exactly the number of cups she did. However, when the lady knows that four cups of each variety have been prepared, she will make all marginal sums equal to four. That situation leads to Fisher's exact test in Chapter 3.

Here we describe two‐dimensional contingency tables; three‐dimensional tables are described in Rasch et al. (2008, Verfahren 4/31/3000).

From contingency tables we can estimate measures of association, but we can also test hypotheses. Here we only show how to calculate several such measures from observed data in two-dimensional contingency tables; tests of hypotheses can be found in Chapter 3.

The degree of association between the two variables (here factors) can be assessed by a number of coefficients, so‐called association measures. The simplest, applicable only to the case of 2 × 2 contingency tables, are as follows.

2.5.2 Association Coefficients for 2 × 2 Tables

These coefficients do not depend on the marginal sums and are often calculated from a two‐dimensional contingency table in the form of Table 2.5.
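As a small sketch, two classical coefficients for a 2 × 2 table with cell counts n11, n12 (first row) and n21, n22 (second row) can be computed as follows; the counts used here are hypothetical, and the layout is the usual convention assumed for Table 2.5.

 n11 <- 10; n12 <- 5   # hypothetical cell counts, first row
 n21 <- 4;  n22 <- 11  # second row
 # Yule's Q (Yule, 1900)
 Q <- (n11 * n22 - n12 * n21) / (n11 * n22 + n12 * n21)
 # phi coefficient
 phi <- (n11 * n22 - n12 * n21) /
   sqrt((n11 + n12) * (n21 + n22) * (n11 + n21) * (n12 + n22))
 c(Q = Q, phi = phi)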

References

  1. Digby, P.G.N. (1983). Approximating the tetrachoric correlation coefficient. Biometrics 39: 753–757.
  2. Fisher, R.A. (1971) [1935]. The Design of Experiments, 9e. New York: Macmillan. ISBN 0-02-844690-9.
  3. Rasch, D., Herrendörfer, G., Bock, J., Victor, N. and Guiard, V. (2008). Verfahrensbibliothek Versuchsplanung und -auswertung, 2. verbesserte Auflage in einem Band mit CD. München, Wien: R. Oldenbourg Verlag.
  4. Rasch, D. and Schott, D. (2018). Mathematical Statistics. Oxford: Wiley.
  5. Yule, G.U. (1900). On the association of attributes in statistics: with illustrations from the material of the Childhood Society, &c. Philosophical Transactions of the Royal Society of London (A) 194: 257–319.
  6. Yule, G.U. (1911). Introduction to the Theory of Statistics. London: Griffin.