Chapter 12
Using masks

A mask refers to the temporary fixing of one or more parameters while optimizing over the rest. This can be very helpful when one parameter is a setting or control that we wish to optimize only at a later stage. It can also be useful to fix a parameter to which the objective function is particularly sensitive. In this chapter, we explore this idea, noting that some tools provide such options directly.

12.1 An example

We return to our Hobbs weeds example with maximum likelihood estimation of the three logistic parameters plus the dispersion. That is, given a set of values $y_i$ of the growth of some quantity, for example, density of weeds, at times $t_i$, we wish to maximize the product of terms of the form

\[
\frac{1}{\sqrt{2\pi\sigma^2}}\,\exp\left(-\frac{(y_i - m_i)^2}{2\sigma^2}\right) \qquad (12.1)
\]

where

\[
m_i = \frac{b_1}{1 + b_2\exp(-b_3 t_i)} \qquad (12.2)
\]

(see Section 1.2).

Our parameters to adjust are again $b_1$, $b_2$, and $b_3$, which here are unscaled, as well as the standard deviation $\sigma$.

12.2 Specifying the objective

We first want to set up the residuals. As R is very good at vector operations, we compute res as a vector and avoid explicit looping over the index $i$. Let us furthermore work with the logarithms of the parameters, which is one form of scaling. The code to do this is the function lhobbs.res.

lhobbs.res <- function(xl, y) {
    # log-scaled Hobbs weeds problem -- residuals, with the
    # parameters supplied as log(x)
    if (length(xl) != 3)
        stop("lhobbs.res -- parameter vector length != 3")
    x <- exp(xl)
    if (abs(12 * x[3]) > 50) {
        # guard: exp(-x[3] * t) would overflow or underflow badly
        rbad <- rep(.Machine$double.xmax, length(y))
        return(rbad)
    }
    t <- 1:length(y)
    res <- x[1]/(1 + x[2] * exp(-x[3] * t)) - y
    res
}

There are a few points to note.

  • The parameters used to compute the logistic are the exponentiated values of the input parameters. This ensures that these parameters are positive without the need for explicit lower bounds at zero that sometimes give rise to computational issues.
  • There is an explicit check that our third parameter, in its computational form, that is, exponentiated, is not too large.
  • The residual is expressed as “model minus data,” as noted in Chapter 10, which is not the usual choice made by statisticians today. As we square the residuals, the result is the same, and the derivatives of the residual then have the same sign as those of the model. A small point, but one that I have found helps to reduce errors.
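As a quick check (this snippet is not from the original text), we can evaluate the residuals at a crude guess and near the known solution; y0 holds the Hobbs weeds data, which also appears later in the chapter.

y0 <- c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 38.558,
    50.156, 62.948, 75.995, 91.972)
## residuals at the crude start c(1, 1, 1) (log parameters all zero)
lhobbs.res(c(0, 0, 0), y = y0)
## residuals near the known solution, supplied as log parameters
lhobbs.res(log(c(196, 49, 0.31)), y = y0)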

The residuals must now be combined into the negative log-likelihood function.

lhobbs.lik <- function(xaug, y = y0) {
    # negative log-likelihood, including the dispersion;
    # xaug = c(log parameters, log(sigma))
    xl <- xaug[1:3]
    logSigma <- xaug[4]
    sigma2 <- exp(2 * logSigma)
    res <- lhobbs.res(xl, y)
    nll <- 0.5 * (length(res) * log(2 * pi * sigma2) + sum(res *
        res)/sigma2)
    nll
}

This function computes the negative of the logarithm of the product of the likelihood terms, so we can minimize it with respect to the four parameters. Note again that we use the logarithms of the quantities we want as our parameters.
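Written out, with $n$ residuals $r_i$ and $\sigma^2 = e^{2\,\mathrm{logSigma}}$, the quantity computed is

\[
-\log L = \frac{1}{2}\left( n \log(2\pi\sigma^2) + \frac{\sum_{i=1}^{n} r_i^2}{\sigma^2} \right)
\]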

In this problem, it is relatively easy, although rather tedious, to develop expressions for the gradient via the chain rule, and we will do so. First, we need the Jacobian of the residuals.
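Because we optimize over the logarithms $u_j = \log(b_j)$, the chain rule gives, for each residual $r_i$,

\[
\frac{\partial r_i}{\partial u_j} = \frac{\partial r_i}{\partial b_j}\,\frac{\partial b_j}{\partial u_j}
 = \frac{\partial r_i}{\partial b_j}\, e^{u_j}
\]

which is why each column of the Jacobian below carries a factor of the form exp(xl[j]).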

lhobbs.jac <- function(xl, y) {
    # log-scaled Hobbs weeds problem -- Jacobian of the residuals
    # with respect to the log parameters xl
    x <- exp(xl)
    n <- length(y)
    jj <- matrix(0, n, 3)
    t <- 1:n
    yy <- exp(-x[3] * t)
    zz <- 1/(1 + x[2] * yy)
    jj[t, 1] <- zz * exp(xl[1])
    jj[t, 2] <- -x[1] * zz * zz * yy * exp(xl[2])
    jj[t, 3] <- x[1] * zz * zz * yy * x[2] * t * exp(xl[3])
    return(jj)
}
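Since hand-coded derivatives are a frequent source of error, it is worth checking the analytic Jacobian against a numerical approximation. This check is not in the original text; it uses the numDeriv package, which Rvmmin loads in any case.

require(numDeriv)
xl0 <- log(c(200, 50, 0.3))  # a trial point in log parameters
JA <- lhobbs.jac(xl0, y = y0)
JN <- jacobian(lhobbs.res, xl0, y = y0)
max(abs(JA - JN))  # should be very small, e.g., of order 1e-7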

From this, we can compute the gradient of the sum-of-squares function $S = r^\top r$, namely $\nabla S = 2 J^\top r$.

lhobbs.g <- function(xl, y) {
    # log-scaled Hobbs weeds problem -- gradient of the sum of
    # squared residuals, 2 * t(J) %*% r
    shj <- lhobbs.jac(xl, y)
    shres <- lhobbs.res(xl, y)
    shg <- as.vector(2 * (shres %*% shj))
    return(shg)
}

Finally, we combine all these to get the gradient of the negative log likelihood.

lhobbs.lg <- function(xaug, y = y0) {
    # gradient of the negative log-likelihood, including sigma
    xl <- xaug[1:3]
    logSigma <- xaug[4]
    sigma2 <- exp(2 * logSigma)
    res3 <- lhobbs.res(xl, y)
    n <- length(res3)
    f3 <- crossprod(res3)  # sum of squared residuals
    g3 <- 0.5 * lhobbs.g(xl, y)/sigma2
    gg <- c(g3, (n - f3/sigma2))
    gg
}
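The last element of gg is the derivative with respect to logSigma. Writing $s = \log\sigma$ and $f = \sum r_i^2$, the negative log-likelihood is

\[
-\log L = \frac{n}{2}\log(2\pi) + n s + \frac{f}{2}\, e^{-2s},
\qquad\text{so}\qquad
\frac{\partial(-\log L)}{\partial s} = n - \frac{f}{\sigma^2}
\]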

While we could set up a separate function of just three parameters and solve the least squares problem as a way to get preliminary estimates for the logistic parameters, with masks we can use the full function just created and simply fix the sigma parameter. Here is the computation. Note how the vector bdmsk fixes the fourth parameter (a zero entry marks a masked parameter; a one marks a free one), and note also the very crude starting values.

y0 <- c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 38.558,
    50.156, 62.948, 75.995, 91.972)
require(Rvmmin)
## Loading required package: Rvmmin
## Loading required package: optextras
## Loading required package: numDeriv
start <- c(1, 1, 1, 1)
bdmsk <- c(1, 1, 1, 0)  # fix parameter 4 at first
## afix4 <- Rvmmin(start, lhobbs.lik, lhobbs.lg, bdmsk=bdmsk,
##    y=y0, control=list(trace=2))  # same call with diagnostic trace
afix4 <- Rvmmin(start, lhobbs.lik, lhobbs.lg, bdmsk = bdmsk,
    y = y0, control = list(trace = 0))
print(afix4)
## $par
## [1] 1 1 1 1
##
## $value
## [1] 1520
##
## $counts
## [1] 5 2
##
## $convergence
## [1] -1
##
## $message
## [1] "Rvmminb appears to have converged"
##
## $bdmsk
## [1] 1 1 1 0

Let us now take the parameters from this run and use them to start the four-parameter problem with all parameters free, that is, without masks.

start2 <- afix4$par
## new start
print(start2)
## [1] 1 1 1 1
bdmsk <- rep(1, 4)  ## EXPLICITLY free all parameters
aall <- Rvmmin(start2, lhobbs.lik, lhobbs.lg, bdmsk = bdmsk,
    y = y0)
print(aall)
## $par
## [1]  5.2791  3.8937 -1.1597 -0.7672
##
## $value
## [1] 7.821
##
## $counts
## [1] 393  91
##
## $convergence
## [1] 0
##
## $message
## [1] "Rvmminb appears to have converged"

Finally, let us return to the full four-parameter problem and then compare the work done by the two approaches.
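The code for this comparison is not shown above; a plausible reconstruction (the object name adirect is illustrative) is as follows.

print(start)
adirect <- Rvmmin(start, lhobbs.lik, lhobbs.lg, y = y0)
print(adirect)
cat("\nComparison of work: Masked      vs.    Free\n")
cat("    Functions     ", afix4$counts[1] + aall$counts[1], "           ",
    adirect$counts[1], "\n")
cat("    Gradients     ", afix4$counts[2] + aall$counts[2], "            ",
    adirect$counts[2], "\n")
cat("\n Exponentiated parameters, last is sigma:\n")
print(exp(aall$par))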

## [1] 1 1 1 1
## $par
## [1]  5.2791  3.8937 -1.1597 -0.7672
##
## $value
## [1] 7.821
##
## $counts
## [1] 393  91
##
## $convergence
## [1] 0
##
## $message
## [1] "Rvmminb appears to have converged"
##
##
## Comparison of work: Masked      vs.    Free
##     Functions          398             393
##     Gradients           93              91
##
##  Exponentiated parameters, last is sigma:
## [1] 196.1863  49.0916   0.3136   0.4643

Here we see that the staged approach costs only a handful of extra function and gradient evaluations, and on modern computers the difference in time is not noticeable. More importantly, it is sometimes difficult to get a solution directly when all parameters are left free, while the staged approach is able to succeed.

12.3 Masks for nonlinear least squares

Package nlmrt allows masks to be specified using the names of the parameters. Here is a small example of the Bates form of the three-parameter logistic where we fix the upper asymptote to a particular value.

weeddata <- data.frame(y = y0, t = 1:12)
mystart <- c(Asym = 250, xmid = 6, scal = 1)  # This sets the Asym value
require(nlmrt)  # Ensure tools available
## Loading required package: nlmrt
maskrun <- nlxb(y ~ Asym/(1 + exp((xmid - t)/scal)), start = mystart,
    data = weeddata, mask = c("Asym"), trace = FALSE)
maskrun
## nlmrt class object: x
## residual sumsquares =  6.0738  on  12 observations
##     after  6    Jacobian and  7 function evaluations
##   name        coeff        SE      tstat     pval      gradient   JSingval
## Asym             250  M       NA        NA        NA           0       52.8
## xmid         13.8144          NA        NA        NA   3.094e-06       12.9
## scal         3.43303          NA        NA        NA  -6.946e-06          0

In the above output, we note that standard error estimates and $t$-statistics are omitted. This is because the Jacobian of the reduced-dimension problem is now singular (note the zero in the JSingval column). We can, of course, recast the problem, as we do in the following. However, it is part of my “to-do” list to adjust the package functions to provide the appropriate estimates.

mystart2 <- c(xmid = 6, scal = 1)  # Asym is now fixed in the formula
maskrun2 <- nlxb(y ~ 250/(1 + exp((xmid - t)/scal)), start = mystart2,
    data = weeddata)
maskrun2
## nlmrt class object: x
## residual sumsquares =  6.0738  on  12 observations
##     after  6    Jacobian and  7 function evaluations
##   name      coeff        SE      tstat      pval      gradient    JSingval
## xmid       13.8144     0.04945     279.3          0   3.094e-06        52.8
## scal       3.43303     0.03772        91  6.661e-16  -6.946e-06        12.9

12.4 Other approaches to masks

Clearly, we can employ the idea of masks by explicitly coding the appropriate functions. This is, of course, more work when our goal is to estimate the fully parameterized model. It has surprised the author that the idea of masks is not more common in nonlinear parameter estimation software. One tool that does include the idea is AD Model Builder (ADMB), where it is referred to as phases. In ADMB, parameters are specified via lower and upper bounds and an integer giving the “phase” at which they will be allowed to participate in the optimization.

In the stats4 package that is distributed with base R, the mle() function claims to allow the user to specify parameters with fixed values, though I have not found success in trying to use this function. By contrast, the very similar mle2() from bbmle (Bolker and Team, 2013) seems to work quite well. It includes the same facility for fixing parameters, that is, fixed = list(parameter = value). There is an example in Section 21.1.
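As a minimal sketch (not from the original text), the weeds model could be fitted with mle2 via its formula interface, fixing the asymptote; here lsig, the logarithm of sigma, is an illustrative parameter name.

require(bbmle)
## fix Asym at 250; the remaining parameters are estimated
mfit <- mle2(y ~ dnorm(mean = Asym/(1 + exp((xmid - t)/scal)),
    sd = exp(lsig)), start = list(Asym = 250, xmid = 6, scal = 1,
    lsig = 0), fixed = list(Asym = 250), data = weeddata)
summary(mfit)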

The package maxLik (Toomet and Henningsen, 2008) also allows for what the authors call “fixed parameters.” Here the parameters to be fixed (masked) are specified in a vector giving the positions of the values to be fixed in the parameter vector. For situations with many parameters, this will be tidier than my use of an indicator of zero in the appropriate position in bdmsk. Even better—an idea I shall likely copy—they allow for a vector of character names of the relevant parameters.
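A hedged sketch of this facility, reusing the functions built earlier in the chapter: maxLik maximizes, so we negate our negative log-likelihood, and the fourth parameter can be fixed either by position or, with a named start vector, by name.

require(maxLik)
## maxLik maximizes, so supply the positive log-likelihood
loglik <- function(xaug) -lhobbs.lik(xaug, y = y0)
mlfix <- maxLik(loglik, start = c(b1 = 1, b2 = 1, b3 = 1, lsig = 1),
    fixed = "lsig")  # equivalently, fixed = 4 by position
summary(mlfix)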

References

  1. Bolker B and Team RDC 2013 bbmle: Tools for general maximum likelihood estimation. R package version 1.0.13.
  2. Toomet O and Henningsen A 2008 Sample selection models in R: package sampleSelection. Journal of Statistical Software 27(7), 1–23.