8
Regression Analysis

8.1 Introduction

The term regression stems from Galton (1885) who described biological phenomena. Later Yule (1897) generalised this term. It describes the relationship between two (or more) characters, which at this early stage were considered as realisations of random variables. Nowadays regression analysis is a theory within mathematical statistics with broad applications in empirical research.

We partly take our notation from that of functions in mathematics. A mathematical function describes a deterministic relation between variables. For instance, the circumference c of a circle depends on the radius r of this circle, c = 2πr.

Written in this way, the circumference is called the dependent variable and the radius the independent variable; more generally, in a function

y = f(x)

x is called the independent variable and y the dependent variable. The role of these variables can be interchanged under some mathematical assumptions by using the inverse function (assumed to exist)

x = f⁻¹(y)

In contrast to mathematics, in empirical sciences such functional relationships seldom exist. For instance, let us consider height at withers and age, or height at withers and chest girth of cattle. There is obviously no formula by which one can calculate the chest girth or the age of cattle from the height at withers; nevertheless, there is clearly a connection between the two. When both measurements are available, the value pair of each animal can be represented by a point in a coordinate system. These points do not, as in the case of a functional dependency, lie on a curve; rather they form a point cloud or, as we say, a scatter plot. In such a cloud a clear trend is frequently recognisable, which suggests the existence of a relationship. We demonstrate this by examples.

Let us discuss the figures and examples. The scatter‐plot in Figure 8.3 shows us that the relation between height and age is not linear. We shall therefore fit quasilinear and intrinsically non‐linear functions to these data.

Figure 8.3 Scatter‐plot of the data in Example 8.3: height and age of 25 students.

In Example 8.1 the age when hemp plants were measured was chosen by the experimenter – a situation discussed in more detail in Section 8.2 and of course the age is plotted on the x‐axis. The corresponding scatter plot in Figure 8.1 shows a strong relation between both variables and it seems that they lie on an S‐shaped (sigmoid) curve.

In Example 8.2 both variables are collected at the same time – a situation discussed in more detail in Section 8.3 and here any of the two variables could be plotted on the x‐axis. The corresponding scatter plot in Figure 8.2 shows no clear relation between both variables and it is uncertain which curve could be drawn through the scatter plot.

Relationships that are not strictly functional are stochastic, and their investigation is the main object of a regression analysis. We call the variable(s) used to explain another variable the regressor(s) and the dependent variable the regressand. In Example 8.1, clearly, the age is the regressor, but in Example 8.2 the shoe size as well as the body height can be used as regressor.

In the style of mathematics, the terms dependent and independent variables are also used for stochastic relations in regression analysis – even if this sometimes makes no sense. If we consider the body height and the shoe size of students in Example 8.2, neither is independent of the other. In contrast, in the pair (height of hemp plants, age) in Example 8.1 the first depends on the second and not the other way around. We have here two very different situations. In the first case, we could model each of the two characteristics by a random variable, both measured on the same individual at the same time. In the second case, the experimenter determines at which ages the height of plants should be measured. As we see later, the choice of these ages leads to the topic of optimal experimental designs.

We use two different models for such situations. In regression model I the regressor is not a stochastic variable; its values are fixed before observation (given by the experimenter). In regression model II both variables are observed together and only in this case may we interchange the regressor – regressand – role of both variables.

In all cases, a function of the regressor variables is considered as the regression function, and its parameters are estimated. In a narrower sense, regression may refer specifically to the estimation of continuous response variables, as opposed to the discrete response variables used in Chapter 11. The sample is assumed to be representative of the population for inference and prediction. The basis of regression analysis is the regression function; this is a mathematical function within a regression model. Let the regression function depend on a vector of regressors x (fixed in model I, random in model II) and a parameter vector β. In addition, the regression model contains a random error term e. We write the model for the regressand y either as

y = f(x, β) + e

for model I or as

y = f(x, β) + e, where now x is a random vector,

for model II respectively.

For both models, the following assumptions must be fulfilled. The error term is a random variable with expectation zero and, in addition:

  1. The regressor variables are measured without error. (If this is not so, modelling may be done instead using error‐in‐variables models not discussed in this book).
  2. The regressor variables are linearly independent.
  3. The error terms are uncorrelated for several observations yi.
  4. The variance of the error is equal for all observations.

Regressor and regressand variables often refer to values measured at point locations. There may be spatial trends and spatial autocorrelation in the variables that violate statistical assumptions of regression. We discuss such problems in Chapter 12.

We start with linear and non‐linear regression models with non‐random regressors in Section 8.2 and continue with regression models with random regressors in Section 8.3.

8.2 Regression with Non‐Random Regressors – Model I of Regression

In this section, we use the regression model

yi = f(xi, β) + ei,  i = 1, …, n (8.1)

with n larger than the number of unknown components of β. The xi may be vectors xiᵀ = (xi1, …, xik); if k = 1 we speak about simple regression, and if k > 1 about multiple regression. For (8.1) the assumptions 1–4 given above hold.

8.2.1 Linear and Quasilinear Regression

8.2.1.1 Parameter Estimation

When we know nothing about the distribution of the error terms, we use the least squares method, which for normally distributed y gives the same results as the maximum likelihood method. A least squares estimator b of the regression coefficient vector β is an estimator whose realisations b fulfil

(Y − Xb)ᵀ(Y − Xb) = min over β of (Y − Xβ)ᵀ(Y − Xβ)

This leads to the estimator ((XᵀX)⁻¹ exists because rank(X) = k + 1 ≤ n)

b = (XᵀX)⁻¹XᵀY (8.3)

The variance var(b) of the vector b is

var(b) = σ²(XᵀX)⁻¹ (8.4)

From the Gauss–Markov theorem (Rasch and Schott 2018, Theorem 4.3) we know that in linear models the least squares method gives linear unbiased estimators with minimal variance – so‐called best linear unbiased estimators (BLUE).

We write (XᵀX)⁻¹ = (cij), i, j = 0, 1, …, k, and use this later in the testing part.
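As an illustration of (8.3) and (8.4), the following minimal R sketch (artificial data; the variable names are ours) computes b and its estimated covariance matrix directly from the design matrix and checks the result against lm():

# Least squares estimate b = (X'X)^{-1} X'Y and var(b) = sigma^2 (X'X)^{-1}
set.seed(1)
n  <- 20
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 5)
y  <- 2 + 1.5 * x1 - 0.8 * x2 + rnorm(n, sd = 1)   # simulated observations

X  <- cbind(1, x1, x2)                       # design matrix with intercept column
b  <- solve(crossprod(X), crossprod(X, y))   # (X'X)^{-1} X'Y computed via solve()
s2 <- sum((y - X %*% b)^2) / (n - ncol(X))   # residual mean square
vb <- s2 * solve(crossprod(X))               # estimated covariance matrix of b

b
coef(lm(y ~ x1 + x2))                        # same estimates from lm() as a check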

For those not familiar with matrix notation we can proceed as follows.

Determine the minimum of

S = Σ (yi − β0 − β1xi1 − … − βkxik)² (8.5)

We obtain the minimum by differentiating with respect to the components of β, setting these derivatives equal to zero and checking that the solution gives a minimum:

∂S/∂βj = −2 Σ (yi − β0 − β1xi1 − … − βkxik)·xij = 0,  j = 0, 1, …, k (with xi0 = 1)

The solution we call b, the least squares estimate; switching to the random variables yi gives the least squares estimator b.

The components of β in (8.5) are called regression coefficients, the components of (8.3) are the estimated regression coefficients.

We consider the special case of simple linear regression

yi = β0 + β1xi + ei,  i = 1, …, n (8.6)

The values of β0 and β1 minimising

S = Σ (yi − β0 − β1xi)²

are denoted by b0 and b1. We obtain the following equations, the so‐called normal equations, by setting the partial derivatives of S with respect to β0 and β1 equal to zero and replacing all yi by the random variables yi. (We check that we really obtain a minimum by showing that the matrix of the second partial derivatives of S is positive definite; because S is a convex function, setting the first partial derivatives equal to zero indeed gives a minimum.)

We write with SS the sum of squares and SP the sum of products:

SSx = Σ (xi − x̄)² (8.7)

and

SPxy = Σ (xi − x̄)(yi − ȳ) (8.8)

and obtain

b1 = SPxy/SSx (8.9)

and

b0 = ȳ − b1x̄ (8.10)
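A small R sketch of (8.7)–(8.10) with artificial data (variable names are ours); the estimates are checked against lm():

# Simple linear regression estimates from sums of squares and products
set.seed(2)
x <- 1:12
y <- 3 + 0.7 * x + rnorm(12, sd = 0.5)

SSx  <- sum((x - mean(x))^2)                 # SS_x, (8.7)
SPxy <- sum((x - mean(x)) * (y - mean(y)))   # SP_xy, (8.8)

b1 <- SPxy / SSx                             # slope, (8.9)
b0 <- mean(y) - b1 * mean(x)                 # intercept, (8.10)

c(b0 = b0, b1 = b1)
coef(lm(y ~ x))                              # check against lm()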

8.2.1.2 Confidence Intervals and Hypotheses Testing

For confidence estimation and hypotheses testing we must assume that the error terms e of the corresponding regression models are independent and N(0, σ²)-distributed. We construct confidence intervals for the regression coefficients, as well as confidence and prediction intervals for the regression line, for the simple linear regression only.

If t(f;P) is the P‐quantile of the t‐distribution with f degrees of freedom and CS(f;P) the P‐quantile of the chi‐squared distribution with f degrees of freedom we obtain the following (1−α)100% confidence intervals for βj, j = 0, 1 based on n > 2 data pairs

[bj − t(n − 2; 1 − α/2)·s·√cjj,  bj + t(n − 2; 1 − α/2)·s·√cjj] (8.22)

and for σ²

[(n − 2)s²/CS(n − 2; 1 − α/2),  (n − 2)s²/CS(n − 2; α/2)] (8.23)

where s² is the residual mean square with n − 2 degrees of freedom.

Equation (8.23) is not the best solution: a shorter interval is obtained by writing α = α1 + α2 as a sum of positive components α1 and α2 different from α1 = α2 = α/2, and then replacing α/2 by α1 and 1 − α/2 by 1 − α2 in (8.23). The reason for this is the asymmetry of the chi‐squared distribution.

For E(y) = β0 + β1x at any x in the experimental region we need the value of the following quantity Kx:

Kx = √(1/n + (x − x̄)²/SSx)

and then the (1 − α) confidence interval for E(y) = β0 + β1x is:

[b0 + b1x − t(n − 2; 1 − α/2)·s·Kx,  b0 + b1x + t(n − 2; 1 − α/2)·s·Kx] (8.24)

If we calculate (8.24) for each x‐value in the experimental region, and draw a graph of the realised upper and lower limits, the included region is called a confidence belt.
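In R such a confidence belt can be obtained from predict() with interval = "confidence"; a minimal sketch with artificial data:

# Pointwise confidence limits for E(y) = b0 + b1*x over the experimental region
set.seed(3)
x   <- seq(0, 10, by = 0.5)
y   <- 1 + 2 * x + rnorm(length(x), sd = 1.5)
fit <- lm(y ~ x)

grid <- data.frame(x = seq(min(x), max(x), length.out = 100))
belt <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)

plot(x, y)
lines(grid$x, belt[, "fit"])            # fitted regression line
lines(grid$x, belt[, "lwr"], lty = 2)   # lower limit of the confidence belt
lines(grid$x, belt[, "upr"], lty = 2)   # upper limit of the confidence belt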

8.2.2 Intrinsically Non‐Linear Regression

In this section, we give estimates for the parameters of regression functions that are non‐linear in x and in the parameter vector β.

8.2.2.1 The Asymptotic Distribution of the Least Squares Estimators

For the asymptotic distribution of the least squares estimators, i.e. the distribution of the estimators if the number of observations n tends to infinity, we use results of an important theorem of Jennrich (1969). We need some notation – readers with a less mathematical background may skip the next part.

We assume that the function f(x, β) in Definition 8.2 is twice continuously differentiable with respect to β. For the n regressor values we obtain f(xi, β), i = 1, …, n. We consider the abbreviations:

F = (∂f(xi, β)/∂βj),  i = 1, …, n; j = 1, …, k, (8.31)

the n × k matrix of first partial derivatives.

We assume that FᵀF is positive definite and that (1/σ²)FᵀF is the asymptotic information matrix.

Jennrich (1969) showed that the least squares estimator of β is asymptotically

N(β, σ²(FᵀF)⁻¹)

distributed.

We call

V = σ²(FᵀF)⁻¹ (8.32)

for each n the asymptotic covariance matrix of the least squares estimator (a bit strange, but generally in use).

We estimate V by

V̂ = s²(F̂ᵀF̂)⁻¹ (8.33)

In F we replace the parameters by their estimators, giving F̂, and s² is given by (8.30).

Based on the estimated asymptotic covariance matrix we can derive test statistics to test the hypothesis that a component βj of β equals βj0 as

tj = (bj − βj0)/sbj (8.34)

where sbj is the square root of the j-th diagonal element of (8.33); this statistic is asymptotically t‐distributed with n − k degrees of freedom.

Analogous to (8.22) we obtain asymptotic confidence intervals for a component βj as

[bj − t(n − k; 1 − α/2)·sbj,  bj + t(n − k; 1 − α/2)·sbj] (8.35)

From now on, in the special functions below, we write the parameter vector as θ or use the components α, β, γ instead of the components of β.

How good the corresponding tests (or confidence intervals) for small n are was investigated by simulation studies, which followed the scheme below:

For the role simulations play in statistics, see Pilz et al. (2018).

For simulations reported for special functions resulting from a research project of Rasch at the University of Rostock during the years 1980 to 1990, see Rasch and Schimke (1983) and Rasch (1990).

The hypothesis Hj0: βj = βj0 is tested against HjA: βj ≠ βj0 with the test statistic (8.34). For each of 10 000 runs we added pseudorandom numbers ei from a normal distribution with expectation 0 and variance σ² to the values of the function f(xi, θ) at n fixed support points xi ∈ [xl, xu]. Then for each i

yi = f(xi, θ) + ei

is a simulated observation. From the n simulated observations we then calculate the least squares estimates of the parameters, the estimate s² of σ² and the realisation tj of the test statistic (8.34). Over the 10 000 runs we counted how often tj fulfilled

|tj| ≥ t(n − k; 1 − αnom/2)

where αnom is the nominal risk of the first kind. This count (the null hypothesis in the simulation was always correct), divided by 10 000, gives an estimate αact of the actual risk of the first kind reached by the test. Further 10 000 runs to test HjA: βj = βj0 + Δj with three Δj values have been performed to get information about the power.
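A condensed R sketch of this simulation scheme is given below; it uses the exponential function of Section 8.2.2.3 as an example, with made-up support points and parameter values, and nls() for the least squares fit:

# Simulated actual type-I risk of the asymptotic t-test (8.34) for the
# non-linear regression function f(x, theta) = a + b*exp(g*x) (values made up)
set.seed(4)
x      <- c(1, 2, 4, 6, 8, 10, 12, 14)       # fixed support points in [xl, xu]
theta  <- c(a = 15, b = -12, g = -0.3)       # "true" parameter values
sigma  <- 0.5
alpha  <- 0.05
nrun   <- 10000
reject <- 0

f <- function(x, a, b, g) a + b * exp(g * x)

for (run in seq_len(nrun)) {
  y   <- f(x, theta["a"], theta["b"], theta["g"]) + rnorm(length(x), sd = sigma)
  fit <- try(nls(y ~ a + b * exp(g * x),
                 start = list(a = 15, b = -12, g = -0.3)), silent = TRUE)
  if (inherits(fit, "try-error")) next       # skip runs where nls fails to converge
  est <- summary(fit)$coefficients
  # t statistic for H0: g equals its true value (H0 is correct in every run)
  tj  <- (est["g", "Estimate"] - theta["g"]) / est["g", "Std. Error"]
  if (abs(tj) >= qt(1 - alpha / 2, df = length(x) - 3)) reject <- reject + 1
}
reject / nrun    # estimate of the actual risk alpha_act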

We now consider a two‐parametric and four three‐parametric intrinsically non‐linear regression functions important in applications and report the corresponding simulation results, changing the parameter symbols as mentioned above.

8.2.2.2 The Michaelis–Menten Regression

Michaelis–Menten kinetics is one of the best‐known models of enzyme kinetics. It is named after the German biochemist Michaelis and the Canadian physician Menten. The model takes the form of an equation relating the reaction rate y to the concentration x of a substrate. Michaelis and Menten (1913) published the data of Example 8.5 and an equation of the form

yi = β1xi/(β2 + xi) + ei,  i = 1, …, n (8.36)
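A minimal R sketch of such a fit by non-linear least squares (artificial data; the self-starting model SSmicmen parametrises the curve as Vm·x/(K + x), which may differ from the parametrisation used in (8.36)):

# Non-linear least squares fit of a Michaelis-Menten curve
set.seed(5)
x <- c(0.02, 0.06, 0.11, 0.22, 0.56, 1.1, 2.2, 4.4)   # substrate concentrations (made up)
y <- 200 * x / (0.1 + x) + rnorm(length(x), sd = 5)   # simulated reaction rates

fit <- nls(y ~ SSmicmen(x, Vm, K))   # self-starting Michaelis-Menten model in base R
summary(fit)                         # estimates, asymptotic standard errors, t values
confint.default(fit)                 # asymptotic (Wald) intervals, cf. (8.35)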

8.2.2.3 Exponential Regression

The model

yi = α + β·e^(γxi) + ei,  i = 1, …, n (8.42)

is an exponential regression model where γ is the non‐linearity parameter. αβγ ≠ 0 implies that no parameter is zero. Brody (1945) used it as a growth function – therefore model (8.42) is sometimes called Brody's model.

The least squares estimates a, b, c of α, β, γ are found by setting the first partial derivatives of the sum of squared deviations equal to zero and solving the simultaneous equations

Σ [yi − a − b·e^(cxi)] = 0
Σ [yi − a − b·e^(cxi)]·e^(cxi) = 0
Σ [yi − a − b·e^(cxi)]·xi·e^(cxi) = 0 (8.43)

For the solution of these simultaneous equations we check whether there is a minimum. This means that the matrix of the second partial derivatives must have positive eigenvalues.

From Rasch and Schott (2018, Chapter 9) we know that

8.44equation

We will not use this approach in this section because R finds the minimum directly, as shown in Problem 8.19.

Minimising S(α, β, γ) = Σ [yi − α − β·e^(γxi)]² gives us the estimates a, b, c of α, β, γ.
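A minimal R sketch of such a fit with nls() (artificial data; the starting values are rough guesses):

# Least squares fit of the exponential (Brody) regression y = alpha + beta*exp(gamma*x)
set.seed(6)
x <- 1:15
y <- 16 - 15 * exp(-0.2 * x) + rnorm(length(x), sd = 0.3)   # simulated growth data

fit <- nls(y ~ a + b * exp(g * x),
           start = list(a = 16, b = -13, g = -0.15))        # rough starting values
coef(fit)    # least squares estimates of alpha, beta, gamma
vcov(fit)    # estimated asymptotic covariance matrix, cf. (8.33)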

8.2.2.4 The Logistic Regression

The model

yi = α/(1 + β·e^(γxi)) + ei,  i = 1, …, n (8.52)

is the model of a logistic regression where γ is the non‐linearity parameter. αβγ ≠ 0 implies that no parameter is zero. The term logistic regression is not uniquely used – see Chapter 11 for another use of this term.

The first and second derivatives with respect to x are

f′(x) = −αβγ·e^(γx)/(1 + β·e^(γx))²

and

f″(x) = −αβγ²·e^(γx)·(1 − β·e^(γx))/(1 + β·e^(γx))³

At the inflection point (xω, ηω) the numerator of the second derivative has to be zero, so that

xω = −ln(β)/γ,  ηω = α/2

The second derivative f″(x) changes its sign at xω, so we really obtain an inflection point.

Now we show that the same curve is described by more than one function. The transition from one function to another is called reparametrisation.

Because 1/(1 + e^(−2z)) = [1 + tanh(z)]/2 we have

f(x) = (α/2)·{1 + tanh[−(γx + ln β)/2]} (8.53)

Thus the logistic regression function can be written as a three‐parametric hyperbolic tangent regression function.
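A quick numerical check of this reparametrisation in R (parameter values are made up; under the parametrisation assumed above both expressions agree up to rounding error):

# Check that the logistic and the hyperbolic-tangent forms describe the same curve
alpha <- 126; beta <- 20; gamma <- -0.46      # made-up parameter values
x <- seq(0, 30, by = 0.5)

f_logistic <- alpha / (1 + beta * exp(gamma * x))
f_tanh     <- (alpha / 2) * (1 + tanh(-(gamma * x + log(beta)) / 2))

max(abs(f_logistic - f_tanh))   # should be of the order of machine epsilon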

8.2.2.5 The Bertalanffy Function

The regression function fB(x) of the model

yi = (α + β·e^(γxi))³ + ei,  i = 1, …, n (8.64)

is called the Bertalanffy function and was used by Bertalanffy (1929) to describe the growth of the body weight of animals. This function has two inflection points if α and β have different signs; they are located at

x1 = (1/γ)·ln(−α/β), with fB(x1) = 0

respectively

x2 = (1/γ)·ln(−α/(3β)), with fB(x2) = 8α³/27
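A small R sketch computing these inflection points, assuming the cubed-exponential parametrisation used above and taking the rounded estimates of Table 8.13 as parameter values (the first inflection point may lie outside the experimental region):

# Inflection points of the Bertalanffy function f(x) = (a + b*exp(g*x))^3 (assumed form)
a <- 2.44; b <- -1.43; g <- -0.24    # rounded estimates from Table 8.13

x1 <- log(-a / b) / g                # first inflection point, f(x1) = 0
x2 <- log(-a / (3 * b)) / g          # second inflection point, f(x2) = 8*a^3/27

fB <- function(x) (a + b * exp(g * x))^3
c(x1 = x1, fB_x1 = fB(x1), x2 = x2, fB_x2 = fB(x2), eight_a3_27 = 8 * a^3 / 27)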

8.2.2.6 The Gompertz Function

The regression function fG(x, θ) of the model

yi = α·e^(β·e^(γxi)) + ei,  i = 1, …, n (8.73)

is the Gompertz function. Gompertz (1825) used it to describe population growth. The function has an inflection point at

xω = −ln(−β)/γ,  ηω = α/e

8.2.3 Optimal Experimental Designs

In a model I of regression, we determine the values of the predictor variable in advance. Clearly, we would wish to do this so that a given precision can be reached using a minimal number of observations. In this context, any selection of the x‐values is called an experimental design. The value of the precision requirement is called the optimality criterion, and an experimental design that minimises the criterion for a given sample size is called an optimal experimental design.

Most optimal experimental designs are concerned with estimation criteria. Thus, the xi values are often chosen in such a way that the variance of an estimator is minimised (for a given number of measurements) amongst all possible allocations of xi values for linear or quasilinear regression functions. In the case of an intrinsically non‐linear regression function, the asymptotic variance of an estimator is minimised.

First we have to define the so‐called experimental region (xl, xu) in which measurements can or will be taken. Here xl is the lower bound and xu the upper bound of an interval on the x‐axis.

A big disadvantage of optimal designs is their dependence on the accuracy of the chosen model: they are only certain to be optimal if the assumed model is correct.

8.2.3.1 Simple Linear and Quasilinear Regression

8.2.3.2 Intrinsically Non‐linear Regression

For intrinsically non‐linear regression models, the covariance matrix cannot be used for defining optimal designs because it is unknown. We therefore use the asymptotic covariance matrix, but it unfortunately depends on at least one of the unknown parameters. Even though Rasch (1993) could show that the position of the support points often depends only slightly on θ, we must check this for each special regression function separately. We can therefore find optimal designs only in dependence on at least one of the function parameters; we call such designs locally optimal. We denote the part of θ that occurs in the formula of the asymptotic covariance matrix (8.33) by θ0 and write θ0 as an argument of the asymptotic covariance matrix, V(θ0). At first, some general results are presented and later locally optimal designs for the functions in Section 8.2.3.1 are given.

The asymptotic covariance matrix can now be written in dependence on θ0 and on the design Vn,m ∈ V (V is the set of all admissible designs) as

V(θ0, Vn,m)

and a design V*n,m is called a locally D‐, G‐ or C‐optimal m‐point design at θ = θ0, if

H[V(θ0, V*n,m)] = min over Vn,m in V of H[V(θ0, Vn,m)] (8.76)

Here H means any of the criteria D, G or C.

If Vm is the set of concrete m‐point designs, then V*n,m is called a concrete locally H‐optimal m‐point design, if

H[V(θ0, V*n,m)] = min over Vn,m in Vm of H[V(θ0, Vn,m)] (8.77)

For some regression functions and optimality criteria analytical solutions in closed form of the problems could be found. Otherwise, search methods are applied.

The first analytical solution is found in Box and Lucas (1959); see Theorem 8.1.

8.2.3.3 The Michaelis‐Menten Regression

The locally D‐optimum design of the function images in [a,b] is for even n = 2t given by (Ermakov and Zigljavski 1987):

equation

For odd n = 2 t + 1 we have two locally D‐optimum designs namely images and images respectively.

In the interval [0, 1440], which was used in Table 8.6, we had n = 10 and thus t = 5. With the estimated parameters the locally D‐optimum design is given as

equation

with the criterion value 6.445. The original design of Michaelis and Menten in the form

equation

has the criterion value 16.48.

Rasch (2008) found by search procedures the locally C1‐optimal design images and the locally C2‐optimal design images.

8.2.3.4 Exponential Regression

In constructing locally optimal designs within the class of m‐point designs (m ≥ 3), only three‐point designs have been found to be optimal, as derived by Box and Lucas (1959) for n = 3.

In Table 8.11 we list for rounded estimated parameter values found in the example of Problem 8.19 (oil palm data) and smaller and larger values of these parameters the locally D‐ and Ci(i = 1, 2, 3)‐optimal designs.

Table 8.11 Locally optimal designs for the exponential regression.

Criterion (α,β,γ) = (14,−18, −0.2) (α,β,γ) = (16.5, −16.7, −0.14) (α,β,γ) = (18, −17, −0.097)
D images images images
C1 images images images
C2 images images images
C3 images images images

We compare the criterion value 0.00015381σ⁶ of the D‐optimal design with the criterion value of the design used in the experiment, which is 0.0004782σ⁶.

8.2.3.5 The Logistic Regression

By constructing locally optimal designs in the class of m‐point designs (m ≥ 3) only three‐point designs have been found.

In Table 8.12 we list for rounded estimated parameter values found in the example of Problem 8.23 (hemp data) and smaller and larger values of these parameters the locally D‐ and Ci (i = 1, 2, 3)‐optimal designs.

Table 8.12 Locally optimal designs for the logistic regression.

Criterion (α,β,γ) = (123, 16, −0.5) (α,β,γ) = (126, 20, −0.46) (α,β,γ) = (130, 23, −0.42)
D images images images
Cα images images images
Cβ images images images
Cγ images images images

8.2.3.6 The Bertalanffy Function

By constructing locally optimal designs in the class of m‐point designs (m ≥ 3) only three‐point designs have been found.

In Table 8.13 we list for rounded estimated parameter values found in the example of Problem 8.28 (oil palm data) and smaller and larger values of these parameters the locally D‐ and Ci (i = 1, 2, 3)‐optimal designs.

Table 8.13 Locally optimal designs for the Bertalanffy regression.

Criterion (α,β,γ) = (2.35, −1.63, −0.31) (α,β,γ) = (2.44, −1.43, −0.24) (α,β,γ) = (2.54, −1.23, −0.17)
D images images images
C1 images images images
C2 images images images
C3 images images images

8.2.3.7 The Gompertz Function

By constructing locally optimal designs in the class of m‐point designs (m ≥ 3) only three‐point designs have been found.

In Table 8.14 we list for rounded estimated parameter values found in the example of Problem 8.31 and smaller and larger values of these parameters the locally D‐ and Ci (i = 1, 2, 3)‐optimal designs.

Table 8.14 Locally optimal designs for the Gompertz regression.

Criterion (β,γ) = (−17,−0.14) (β,γ) = (−19,−0.2) (β,γ) = (−14,−0.08)
D images images images
Cα images images images
Cβ images images images
Cγ images images images

8.3 Models with Random Regressors

We consider mainly the linear or quasilinear case for model II of the regression analysis where not only the regressand but also the regressor is a random variable. As mentioned at the top of this chapter we use the regression model

yi = f(xi, β) + ei,  i = 1, …, n (with random regressors xi)

where the following assumptions must be fulfilled:

  1. The error terms ei are random variables with expectation zero.
  2. The variance of the error terms ei is equal for all observations.
  3. For hypotheses testing and confidence estimation we assume that the ei are N(0, σ2) distributed.
  4. The error terms ei are independent of the regressor variables xi.
  5. images.

Intrinsically non‐linear regression models with random regressors are more the exception than the rule and play a minor role. An exception is the relative growth of a part of a body relative to the whole body or another part; here we speak about allometry and consider this in Section 8.3.2.

8.3.1 The Simple Linear Case

The model

yi = β0 + β1xi1 + … + βkxik + ei,  i = 1, …, n (8.79)

with the additional assumptions (1) to (5) is called a model II of the (if k > 1, multiple) linear regression. Correlation coefficients are defined, as long as (8.79) holds and the distribution has finite second moments.

We first discuss the case with k = 1 and

yi = β0 + β1xi + ei,  i = 1, …, n (8.80)

is the simple linear model II. Of course, we may interchange regressor and regressand and consider the model

xi = β0* + β1*yi + ei*,  i = 1, …, n

Instead of discussing both models we simply rename the two variables and come back to (8.80).

The two‐dimensional distribution of (x, y) may have existing second moments so that

8.81equation

In model I we had E(yi) = β0 + β1xi but now from (8.81) we see that E(yi) = β0 + β1μx ∀ i and does not depend on the regressor variable. The estimator of (8.81) is

equation

We consider the conditional expectation (i.e. the expectation of the conditional distribution of y for given x = x).

E(y | x = x) = β0 + β1x (8.82)

and the right‐hand side of (8.82) equals (8.6) for model I.

Because the parameter estimation in model II is done for this conditional expectation (8.82) the formulae for the estimates are identical for both models. This is a reason why program packages like IBM SPSS‐Statistics and SAS in regression analysis do not ask which model is assumed for the data as in the analysis of variance.

The estimators are

b1 = SPxy/SSx (8.83)

and

b0 = ȳ − b1x̄ (8.84)

Graybill (1961) showed with Lemma 10.1 that E(b1) = β1. Further

equation

Even if the pairs (xi, yi) are (independently) normally distributed, b1 is not normally distributed.

The ratio

ρ = σxy/(σx·σy) (8.85)

is called the correlation coefficient between x and y.

By replacing σxy, σx² and σy² by their unbiased estimators sxy, sx² and sy² we get the (biased) estimator

r = sxy/(sx·sy) = SPxy/√(SSx·SSy) (8.86)

of the correlation coefficient.

To test the hypothesis H0: ρ ≤ ρ0 against the alternative hypothesis HA: ρ > ρ0 (respectively H0: ρ ≥ ρ0 against HA: ρ < ρ0), we replace r by the modified Fisher transform

z = ½·ln[(1 + r)/(1 − r)] (8.87)

which is approximately normally distributed even if n is rather small. Cramér (1946) proved that for n = 10 the approximation is actually sufficient as long as −0.8 ≤ ρ ≤ 0.8. The expectation E of z, being a function ζ of ρ, amounts to

ζ(ρ) = ½·ln[(1 + ρ)/(1 − ρ)] + ρ/(2(n − 1))

The statistic z in (8.87) can be used to test the hypothesis H0: ρ ≤ ρ0 against the alternative hypothesis HA: ρ > ρ0 (respectively H0: ρ ≥ ρ0 against HA: ρ < ρ0). H0: ρ ≤ ρ0 is rejected with a first kind risk α if z ≥ ζ(ρ0) + z1−α/√(n − 3) (respectively H0: ρ ≥ ρ0 is rejected if z ≤ ζ(ρ0) − z1−α/√(n − 3)); z1−α is the (1 − α)-quantile of the standard normal distribution.
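A small R sketch of this one-sided test, using the standard Fisher transformation and the standard error 1/√(n − 3) as given above (artificial data):

# One-sided test of H0: rho <= rho0 against HA: rho > rho0 via the Fisher z-transform
set.seed(7)
n     <- 30
x     <- rnorm(n)
y     <- 0.6 * x + rnorm(n, sd = 0.8)         # artificial correlated pair
rho0  <- 0.3
alpha <- 0.05

r     <- cor(x, y)
z     <- atanh(r)                              # 0.5 * log((1 + r)/(1 - r))
zeta0 <- atanh(rho0) + rho0 / (2 * (n - 1))    # approximate expectation under rho = rho0
crit  <- zeta0 + qnorm(1 - alpha) / sqrt(n - 3)

c(r = r, z = z, critical = crit, reject_H0 = z >= crit)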

We use a sequential triangular test (Whitehead 1992; Schneider 1992). We split the sequence of data pairs into sub‐samples of length, say, k > 3 each. For each sub‐sample j (j = 1, 2, …, m) we calculate a statistic whose distribution is a function of ρ and k only.

A triangular test must be based on a statistic with expectation 0 given the null hypothesis; therefore we transform z into a realisation of a standardised variable

z̃ = (z − ζ(ρ0))·√(k − 3)

which has the expectation 0 if the null‐hypothesis is true.

The parameter

θ = [ζ(ρ) − ζ(ρ0)]·√(k − 3)

is used as a test parameter. For ρ = ρ0 the parameter θ is zero (as demanded). For ρ = ρ1 we obtain:

θ1 = [ζ(ρ1) − ζ(ρ0)]·√(k − 3)

The difference δ = ρ1 − ρ0 is the practically relevant difference, which should be detected with power 1 − β.

From each sub‐sample j we now calculate the sample correlation coefficient rj as well as its transformed value z̃j (j = 1, 2, …, m).

Now with Zm = z̃1 + … + z̃m and Vm = m the sequential path is defined by the points (Vm, Zm) for m = 1, 2, …, up to the maximum of V, below or exactly at the point where a decision can be made. The continuation region is a triangle whose three sides depend on α, β, and θ1 via

equation

and

equation

with the percentiles zP of the standard normal distribution. That is, one side of the looked‐for triangle lies between –a and a on the ordinate of the (V, Z) plane (V = 0). The two other borderlines are defined by the lines L1: Z = a + cV and L2: Z = −a + 3cV, which intersect at

Vmax = a/c,  Z = 2a

The maximum sample size is of course k·Vmax. The decision rule now is: continue sampling as long as −a + 3cVm < Zm < a + cVm if θ1 > 0, or a + 3cVm > Zm > −a + cVm if θ1 < 0. Given θ1 > 0, accept HA in case Zm reaches or exceeds L1 and accept H0 in case Zm reaches or underruns L2. Given θ1 < 0, accept HA in case Zm reaches or underruns L1 and accept H0 in case Zm reaches or exceeds L2. If the apex of the triangle is reached, HA is to be accepted.
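The following R sketch illustrates the sequential path and the decision rule for θ1 > 0; the per-sub-sample statistic is the standardised value introduced above, and the boundary constants a and c are assumed to have been computed from α, β and θ1 by the formulas referred to above (here they are simply supplied as made-up inputs; the function name is ours):

# Sequential triangular test path for H0: rho <= rho0 (case theta1 > 0)
triangular_cor_test <- function(xy, k, rho0, a, cc) {
  zeta0 <- atanh(rho0) + rho0 / (2 * (k - 1))   # expectation of z under rho = rho0
  m_max <- floor(nrow(xy) / k)                  # number of complete sub-samples
  Zm <- 0
  for (m in seq_len(m_max)) {
    sub <- xy[((m - 1) * k + 1):(m * k), ]
    zj  <- (atanh(cor(sub[, 1], sub[, 2])) - zeta0) * sqrt(k - 3)   # standardised statistic
    Zm  <- Zm + zj                                                   # cumulative sum, Vm = m
    if (Zm >= a + cc * m)      return(list(decision = "accept HA", subsamples = m))
    if (Zm <= -a + 3 * cc * m) return(list(decision = "accept H0", subsamples = m))
  }
  list(decision = "no decision yet", subsamples = m_max)
}

# Example call with made-up boundary constants and simulated data (rho = 0.4)
set.seed(8)
xy <- matrix(rnorm(2 * 400), ncol = 2)
xy[, 2] <- 0.4 * xy[, 1] + sqrt(1 - 0.4^2) * xy[, 2]
triangular_cor_test(xy, k = 16, rho0 = 0.2, a = 3.0, cc = 0.25)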

What values must be chosen for k? The answer was found by a simulation experiment in Rasch et al. (2018) where

  (a) the optimal size of subsamples (kopt), where the actual type‐I‐risk (αact) is below but as close as possible to the nominal type‐I‐risk (αnom), was determined and
  (b) the optimal nominal type‐II‐risk (βopt), where the corresponding actual type‐II‐risk (βact) is below but as close as possible to the nominal type‐II‐risk (βnom), was determined.

Starting from k = 4, the size of the subsample was systematically increased with an increment of 1 for each parameter combination until the actual type‐I‐risk (αact) fell below the nominal type‐I‐risk (αnom). This optimal size of subsample (kopt) was then used in the next step to determine the optimal nominal type‐II‐risk (βopt). That is, the nominal type‐II‐risk was systematically increased with an increment of 0.005 until the actual type‐II‐risk (βact) fell below the nominal type‐II‐risk (βnom).

Paths (Z, V) were generated by bivariate normally distributed random numbers x and y with means μx = μy = 0, variances σx² = σy² = 1, and a correlation coefficient σxy = ρ.

Using the seqtest package version 0.1‐0 (Yanagida 2016), simulations can be performed for any αnom, βnom, and δ = ρ1 − ρ0. We present here results for nominal risks αnom = 0.05 and 0.01, βnom = 0.1 and 0.2, values of ρ0 ranging from 0.1 to 0.9 with an increment of 0.1, and δ = ρ1 − ρ0 = 0.05, 0.10, 0.15, and 0.20.

For each parameter combination, 1 000 000 runs (paths) were generated. As criteria, we calculated:

  (a) the relative frequency of wrongly accepting H1, given ρ = ρ0, which is an estimate of the actual type‐I‐risk (αact),
  (b) the relative frequency of keeping H0, given ρ = ρ1, which is an estimate of the actual type‐II‐risk (βact),
  (c) the average number of sample pairs (x, y), i.e. the average sample number (ASN) (Table 8.15).

Table 8.15 Optimal size of sub‐samples (k) and optimal nominal type‐II‐risk (βopt) values for αnom = 0.05.

Given values of the test problem Optimal values
ρ0 ρ1 β kopt βopt ASN | ρ1
0.2 0.25 0.1 78 0.125 1909
0.2 65 0.235 1532
0.30 0.1 33 0.130 491
0.2 27 0.245 396
0.35 0.1 20 0.135 224
0.2 16 0.250 183
0.40 0.1 14 0.140 129
0.2 12 0.260 106
0.3 0.35 0.1 76 0.125 1710
0.2 79 0.240 1364
0.40 0.1 38 0.135 428
0.2 32 0.250 348
0.45 0.1 23 0.140 193
0.2 19 0.255 158
0.50 0.1 16 0.150 109
0.2 13 0.270 90
0.5 0.55 0.1 102 0.140 1122
0.2 83 0.245 916
0.60 0.1 41 0.145 278
0.2 33 0.265 224
0.65 0.1 23 0.155 120
0.2 19 0.275 98
0.70 0.1 16 0.165 66
0.2 13 0.285 54
0.7 0.75 0.1 73 0.145 503
0.2 61 0.265 403
0.80 0.1 28 0.165 115
0.2 23 0.285 94
0.85 0.1 15 0.180 46

8.3.2 The Multiple Linear Case and the Quasilinear Case

In the case k = 2 the random vector (x1, x2, x3) is assumed to be three‐dimensional normally distributed with existing second moments. Any of these three random variables may be used as the regressand y in the regression model and the other two renamed as x1, x2 in

yi = β0 + β1xi1 + β2xi2 + ei,  i = 1, …, n (8.88)

as special case of (8.79) [with assumptions (1) to (5)].

Unbiased estimators of β0, β1, β2 can be obtained analogously to (8.83) and (8.84).

The three conditional two‐dimensional distributions are two‐dimensional distributions with correlation coefficients

ρij·k = (ρij − ρik·ρjk)/√[(1 − ρik²)(1 − ρjk²)] (8.89)

Here ρij,ρik and ρjk are the correlation coefficients of the three two‐dimensional marginal distributions of (xi,xj,xk). It can easily be shown that these marginal distributions are two‐dimensional normal distributions if (x1,x2,x3) is normally distributed.

The correlation coefficient (8.89) of the conditional two‐dimensional normal distribution of (xi, xj) for given xk = xk (i, j, k = 1, 2, 3, all different) is called the partial correlation coefficient between xi and xj after fixing xk.

We obtain (biased) estimators rij·k of the partial correlation coefficients by replacing the simple correlation coefficients in (8.89) by their estimators:

rij·k = (rij − rik·rjk)/√[(1 − rik²)(1 − rjk²)] (8.90)
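A small R sketch computing (8.90) from the pairwise sample correlations (artificial data; the helper function name is ours):

# Partial correlation r_{12.3} from the three pairwise correlations, formula (8.90)
partial_cor <- function(r12, r13, r23) (r12 - r13 * r23) / sqrt((1 - r13^2) * (1 - r23^2))

set.seed(9)
x3 <- rnorm(100)
x1 <- 0.7 * x3 + rnorm(100, sd = 0.5)
x2 <- 0.5 * x3 + rnorm(100, sd = 0.8)

R <- cor(cbind(x1, x2, x3))                 # matrix of simple correlations
partial_cor(R[1, 2], R[1, 3], R[2, 3])      # estimated partial correlation r_{12.3}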

For the general parameter estimation, we use the formulae in Section 8.2

8.3.2.1 Hypotheses Testing ‐ General

Bartlett (1933) showed that all the tests in Section 8.2.1.2 can also be applied for model II, but the power of the tests differs between the two models.

8.3.2.2 Confidence Estimation

An approximate (1 − α) 100%‐confidence interval for ϱ is

[tanh(z − z(1 − α/2)/√(n − 3)),  tanh(z + z(1 − α/2)/√(n − 3))] (8.92)

with z from (8.87) and the (1 − α/2)-quantile z(1 − α/2) of the standard normal distribution.
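In R this interval can be computed in a few lines (the helper function name is ours):

# Approximate confidence interval for rho via the Fisher z-transformation
ci_rho <- function(x, y, level = 0.95) {
  n    <- length(x)
  r    <- cor(x, y)
  half <- qnorm(1 - (1 - level) / 2) / sqrt(n - 3)
  tanh(atanh(r) + c(-1, 1) * half)
}

set.seed(10)
x <- rnorm(40)
y <- 0.5 * x + rnorm(40, sd = 0.9)
ci_rho(x, y)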

To interpret the value of ϱ (and also of r), we again consider the regression function f(x) = E(y|x) = β0 + β1x.

ϱ² can now be interpreted as a measure of the proportion of the variance of y explainable by the regression on x. The conditional variance of y is

var(y | x) = σy²·(1 − ϱ²)

and

var(y | x)/σy² = 1 − ϱ²

is the proportion of the variance of y not explainable by the regression on x, and by this the statement above follows. We call ϱ² = B the measure of determination.

To construct confidence intervals for β0 and β1 or to test hypotheses about these parameters seems to be difficult but the methods for model I can also be applied for model II. We demonstrate this for the example of the confidence interval for β0. The argumentation for confidence intervals for other parameters and for the statistical tests is analogous.

The probability statement

equation

leading to the confidence interval (8.58) for j = 0 is true if, for fixed values x1, …, xn, samples of y values are selected repeatedly. Using the frequency interpretation, β0 is covered in about (1 − α)100% of the cases by the interval (8.58). This statement is valid for each arbitrary n‐tuple x1, …, xn, also for an n‐tuple x1, …, xn randomly selected from the distribution, because (8.58) is independent of x1, …, xn if the conditional distribution of the yi is normal. However, this is the case because (y, x1, …, xk) was assumed to be normally distributed. By this, the construction of confidence intervals and testing of hypotheses can be done by the methods and formulae given above. However, the expected width of the confidence intervals and the power function of the tests differ for both models.

That (8.58) is really a confidence interval with confidence coefficient 1 − α also for model II can of course be proven exactly, using a theorem of Bartlett (1933) by which

equation

is t(n − 2)‐distributed.

8.3.3 The Allometric Model

In model II we discuss only one intrinsically non‐linear model, the allometric model

yi = α·xi^β + ei,  i = 1, …, n (8.93)

first used by Snell (1892) to describe the dependency between body mass and brain mass.

The allometric model is mainly used to describe the relative growth of one part of a growing individual (plant or animal) relative to the total mass or the mass of another part; see Huxley (1972). Nowadays this model is also used in technical applications.
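A minimal sketch of fitting such a power-law relation in R, either by linear regression after a log–log transformation or directly by nls() (artificial data; parameter values are made up):

# Allometric relation y = alpha * x^beta
set.seed(11)
x <- runif(50, 1, 100)                            # e.g. body mass (made up)
y <- 0.12 * x^0.75 * exp(rnorm(50, sd = 0.1))     # e.g. organ mass, multiplicative error

loglog <- lm(log(y) ~ log(x))                     # log(y) = log(alpha) + beta*log(x)
c(alpha = exp(coef(loglog)[1]), beta = coef(loglog)[2])

nls(y ~ a * x^b, start = list(a = 0.1, b = 0.8))  # direct non-linear least squares fit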

8.3.4 Experimental Designs

The experimental design for model II of the regression analysis differs fundamentally from that of model I. Because x in model II is a random variable, the problem of the optimal choice of x does not occur. Experimental design in model II means only the optimal choice of n depending on given precision requirements. A systematic description is given in Rasch et al. (2008). We repeat this in the following.

At first we restrict (8.59) to k = 1 and consider the more general model of regression within a groups (a ≥ 1) with the same slope β1

yij = β0i + β1xij + eij,  i = 1, …, a; j = 1, …, ni (8.94)

We estimate β1 for a > 1 not by 8.9, but by

b1 = (SPxy,1 + … + SPxy,a)/(SSx,1 + … + SSx,a) (8.95)

with SSx,i and SPxy,i computed within each of the a groups as defined in Example 8.1.

If we look for a minimal images so that images we find in Bock (1998)

equation

If, for k = 1, a (1 − α)-confidence interval for the expectation E(y|x) = β0 + β1x should be given so that the expectation of the square of the half width of the interval does not exceed δ, then

8.96equation

must be chosen.

References

  1. Atkinson, A.C. and Hunter, W.G. (1968). The design of experiments for parameter estimations. Technometrics 10: 271–289.
  2. Barath, C.S., Rasch, D., and Szabo, T. (1996). Összefügges a kiserlet pontossaga es az ismetlesek szama kozött. Allatenyesztes es takarmanyozas 45: 359–371.
  3. Bartlett, M.S. (1933). On the theory of statistical regression. Proc. Royal Soc. Edinburgh 53: 260–283.
  4. Bertalanffy, L. von (1929). Vorschlag zweier sehr allgemeiner biologischer Gesetze. Biol. Zbl. 49: 83–111.
  5. Bock, J. (1998). Bestimmung des Stichprobenumfanges für biologische Experimente und kontrollierte klinische Studien. München Wien: R. Oldenbourg Verlag.
  6. Box, G.E.P. and Lucas, H.L. (1959). Design of experiments in nonlinear statistics. Biometrics 46: 77–96.
  7. Brody, S. (1945). Bioenergetics and Growth. N.Y: Rheinhold Pub. Corp.
  8. Cramér, H. (1946). Mathematical Methods of Statistics. Princeton: Princeton University Press.
  9. Ermakov, S.M. and Zigljavski, A.A. (1987). Математическая теория оптималных экспериметов (Mathematical Theory of Optimal Experiments). Moskwa: Nauka.
  10. Galton, F. (1885). Opening address as President of the Anthropology Section of the British Association for the Advancement of Science, September 10th, 1885, at Aberdeen. Nature 32: 507–510.
  11. Gompertz, B. (1825). On the nature of the function expressive of the law of human mortality, and on a new method determining the value of life contingencies. Phil. Trans. Roy. Soc. (B), London 513–585.
  12. Graybill, A.F. (1961). An Introduction to Linear Statistical Models. New York: McGraw-Hill.
  13. Huxley, J.S. (1972). Problems of Relative Growth, 2e. New York: Dover.
  14. Jennrich, R.J. (1969). Asymptotic properties of nonlinear least squares estimation. Ann. Math. Stat. 40: 633–643.
  15. Michaelis, L. and Menten, M.L. (1913). Die Kinetik der Invertinwirkung. Biochem. Z. 49: 333–369.
  16. Pilz, J., Rasch, D., Melas, V.B., and Moder, K. (eds.) (2018). Statistics and Simulation: IWS 8, Vienna, Austria, September 2015, Springer Proceedings in Mathematics & Statistics, vol. 231. Heidelberg: Springer.
  17. Pukelsheim, F. (1993). Optimal Design of Experiments. New York: Wiley.
  18. Rasch, D. (1968). Elementare Einführung in die Mathematische Statistik. Berlin: VEB Deutscher Verlag der Wissenschaften.
  19. Rasch, D. (1990). Optimum experimental design in nonlinear regression. Commun. Stat. Theory and Methods 19: 4789–4806.
  20. Rasch, D. (1993). The robustness against parameter variation of exact locally optimum experimental designs in growth models ‐ a case study. In: Techn. Note 93‐3. Department of Mathematics, Wageningen Agricultural University.
  21. Rasch, D. (2008) Versuchsplanung in der nichtlinearen Regression ‐ Michaelis‐Menten gestern und heute. Vortrag an der Boku Wien, Mai 2008.
  22. Rasch, D. and Schimke, E. (1983). Distribution of estimators in exponential regression – a simulation study. Skand. J. Stat. 10: 293–300.
  23. Rasch, D. and Schott, D. (2018). Mathematical Statistics. Oxford: Wiley.
  24. Rasch, D., Herrendörfer, G., Bock, J. et al. (eds.) (2008). Verfahrensbibliothek Versuchsplanung und ‐ auswertung, 2. verbesserte Auflage in einem Band mit CD. München Wien: R. Oldenbourg Verlag.
  25. Rasch, D., Yanagida, T., Kubinger, K.D., and Schneider, B. (2018). Determination of the optimal size of subsamples for testing a correlation coefficient by a sequential triangular test. In: Statistics and Simulation: IWS 8, Vienna, Austria, September 2015, Springer Proceedings in Mathematics & Statistics, vol. 231 (ed. J. Pilz, D. Rasch, V.B. Melas and K. Moder).
  26. Schlettwein, K. (1987). Beiträge zur Analyse von vier speziellen Wachstumsfunktionen. Rostock: Dipl.Arbeit, Sektion Mathematik, Univ.
  27. Schneider, B. (1992). An interactive computer program for design and monitoring of sequential clinical trials. In Proceedings of the XVIth international biometric conference (pp. 237–250), Hamilton, New Zealand.
  28. Snell, O. (1892). Die Abhängigkeit des Hirngewichts von dem Körpergewicht und den geistigen Fähigkeiten. Arch. Psychiatr. 23: 436–446.
  29. Steger, H. and Püschel, F. (1960). Der Einfluß der Feuchtigkeit auf die Haltbarkeit des Carotins in künstlich getrocknetem Grünfutter. Die Deutsche Landwirtschaft 11: 301–303.
  30. Whitehead, J. (1992). The Design and Analysis of Sequential Clinical Trials, 2e. Chichester: Ellis Horwood.
  31. Yanagida, T. (2016). seqtest: Sequential triangular test, R package version 0.1‐0. http://CRAN.R‐project.org/package=seqtest.
  32. Yule, G.U. (1897). On the theory of correlation. J. R. Stat. Soc. 60 (4): 812–854, Blackwell Publishing.