Masks refers to the temporary fixing of one or more parameters and optimizing over the rest. This can be very helpful when one parameter is a setting or control that we later wish to optimize. It can also be useful to fix a parameter to which the objective function is particularly sensitive. In this chapter, we explore this idea, noting that some tools allow such options.
We return to our Hobbs weeds example with maximum likelihood estimation of the three logistic parameters plus the dispersion. That is, given a set of values of the growth of some quantity , for example, density of weeds, at times , we wish to maximize the product of terms of the form
where
(see Section 1.2)
Our parameters to adjust are again the , , and , which here are unscaled, as well as the standard error sigma.
We first want to set up the residuals. As R is very good at vector operations, we compute res as a vector and avoid the index . Let us furthermore work with the log of the parameters, which is oneform of scaling. The code to do this is function lhobbs.res
.
lhobbs.res <- function(xl, y) { # log scaled Hobbs weeds problem -- residual base parameters # on log(x) x <- exp(xl) if (abs(12 * x[3]) > 50) { # check computability rbad <- rep(.Machine$double.xmax, length(x)) return(rbad) } if (length(x) != 3) stop("hobbs.res -- parameter vector n!=3") t <- 1:length(y) res <- x[1]/(1 + x[2] * exp(-x[3] * t)) - y }
There are a few points to note.
The residuals must now be combined into the likelihood function.
lhobbs.lik <- function(xaug, y = y0) { # likelihood function including sigma xl <- xaug[1:3] logSigma <- xaug[4] sigma2 = exp(2 * logSigma) res <- lhobbs.res(xl, y) nll <- 0.5 * (length(res) * log(2 * pi * sigma2) + sum(res * res)/sigma2) }
This function explicitly negates the logarithm of the product of the likelihoods. Thus we can minimize this function with respect to the four parameters. Note again that we use the logarithms of the quantities we want as our parameters.
In this problem, it is relatively easy, although rather tedious, to develop expressions for the gradient via the chain rule, and we will do so. First, we need the Jacobian of the residuals.
lhobbs.jac <- function(xl, y) { # scaled Hobbs weeds problem -- Jacobian x <- exp(xl) jj <- matrix(0, 12, 3) t <- 1:12 yy <- exp(-x[3] * t) zz <- 1/(1 + x[2] * yy) jj[t, 1] <- zz * exp(xl[1]) jj[t, 2] <- -x[1] * zz * zz * yy * exp(xl[2]) jj[t, 3] <- x[1] * zz * zz * yy * x[2] * t * exp(xl[3]) return(jj) }
From this, we can compute the gradient of the sum-of-squares function.
lhobbs.g <- function(xl, y) { # scaled Hobbs weeds problem -- gradient shj <- lhobbs.jac(xl, y) shres <- lhobbs.res(xl, y) shg <- as.vector(2 * (shres %*% shj)) return(shg) }
Finally, we combine all these to get the gradient of the negative log likelihood.
lhobbs.lg <- function(xaug, y = y0) { # gradient function including sigma xl <- xaug[1:3] logSigma <- xaug[4] sigma2 = exp(2 * logSigma) res3 <- lhobbs.res(xl, y) n <- length(res3) f3 <- crossprod(res3) g3 <- 0.5 * lhobbs.g(xl, y)/sigma2 gg <- c(g3, (n - f3/sigma2)) }
While we could set up a separate function of just three parameters and solve the least squares problem as a way to get preliminary estimates for the logistic parameters, with masks we can use the full function just created and fix the sigma parameter. Here is the computation. Note how we use the vector bdmsk
to fix the fourth parameter, as well as the very crude starting values.
y0 <- c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 38.558, 50.156, 62.948, 75.995, 91.972) require(Rvmmin) ## Loading required package: Rvmmin ## Loading required package: optextras ## Loading required package: numDeriv start <- c(1, 1, 1, 1) bdmsk <- c(1, 1, 1, 0) # Cat fix parameter 4 at first ## afix4<-Rvmmintry(start, lhobbs.lik, lhobbs.lg, bdmsk=bdmsk, ## y=y0, control=list(trace=2)) afix4 <- Rvmmin(start, lhobbs.lik, lhobbs.lg, bdmsk = bdmsk, y = y0, control = list(trace = 0)) print(afix4) ## $par ## [1] 1 1 1 1 ## ## $value ## [1] 1520 ## ## $counts ## [1] 5 2 ## ## $convergence ## [1] -1 ## ## $message ## [1] "Rvmminb appears to have converged" ## ## $bdmsk ## [1] 1 1 1 0
Let us now take the parameters from this run and use them on the four-parameter problem with all parameters free, that is, without masks.
start2 <- afix4$par ## new start print(start2) ## [1] 1 1 1 1 bdmsk <- rep(1, 4) ## EXPLICITLY free all parameters aall <- Rvmmin(start2, lhobbs.lik, lhobbs.lg, bdmsk = bdmsk, y = y0) print(aall) ## $par ## [1] 5.2791 3.8937 -1.1597 -0.7672 ## ## $value ## [1] 7.821 ## ## $counts ## [1] 393 91 ## ## $convergence ## [1] 0 ## ## $message ## [1] "Rvmminb appears to have converged"
Finally, let us return to the full four-parameter problem and then compare the work done by the two approaches.
## [1] 1 1 1 1 ## $par ## [1] 5.2791 3.8937 -1.1597 -0.7672 ## ## $value ## [1] 7.821 ## ## $counts ## [1] 393 91 ## ## $convergence ## [1] 0 ## ## $message ## [1] "Rvmminb appears to have converged" ## ## ## Comparison of work: Masked vs. Free ## Functions 398 393 ## Gradients 93 91 ## ## Exponentiated parameters, last is sigma: ## [1] 196.1863 49.0916 0.3136 0.4643
Here we see that there is some saving in workload, although on modern computers the time saving is not noticeable. More importantly, however, it is sometimes difficult to get a solution directly when all parameters are left free, while the staged approach is able to succeed.
Package nlmrt
allows masks to be specified using the names of the parameters. Here is a small example of the Bates form of the three-parameter logistic where we fix the upper asymptote to a particular value.
weeddata <- data.frame(y = y0, t = 1:12) mystart <- c(Asym = 250, xmid = 6, scal = 1) # This sets the Asym value require(nlmrt) # Ensure tools available ## Loading required package: nlmrt maskrun <- nlxb(y ∼ Asym/(1 + exp((xmid - t)/scal)), start = mystart, data = weeddata, mask = c("Asym"), trace = FALSE) maskrun ## nlmrt class object: x ## residual sumsquares = 6.0738 on 12 observations ## after 6 Jacobian and 7 function evaluations ## name coeff SE tstat pval gradient JSingval ## Asym 250 M NA NA NA 0 52.8 ## xmid 13.8144 NA NA NA 3.094e-06 12.9 ## scal 3.43303 NA NA NA -6.946e-06 0
In the above output, we note that standard error estimates and -statistics are omitted. This is because the Jacobian of the reduced dimension problem is now singular. We can, of course, recast the problem, as we do in the following. However, it is part of my “to-do” list to adjust the package functions to provide the appropriate estimates.
mystart2 <- c(xmid = 6, scal = 1) # This sets the Asym value maskrun2 <- nlxb(y ∼ 250/(1 + exp((xmid - t)/scal)), start = mystart2, data = weeddata) maskrun2 ## nlmrt class object: x ## residual sumsquares = 6.0738 on 12 observations ## after 6 Jacobian and 7 function evaluations ## name coeff SE tstat pval gradient JSingval ## xmid 13.8144 0.04945 279.3 0 3.094e-06 52.8 ## scal 3.43303 0.03772 91 6.661e-16 -6.946e-06 12.9
Clearly, we can employ the idea of masks by explicitly coding the appropriate functions. This is, of course, more work when our goal is to estimate the fully parameterized model. It has surprised the author that the idea of masks is not more common in nonlinear parameter estimation software. One tool that does include the idea is AD Model Builder (ADMD), where it is referred to as phases. In ADMB, parameters are specified via lower and upper bounds and an integer giving the “phase” at which they will be allowed to participate in the optimization.
In the stats4
package that is distributed with base R the mle()
function claims to allow the user to specify parameters with fixed values, through I have not found success in trying to use this function. By contrast, the very similar mle2()
from bbmle
(Bolker and Team, 2013) seems to work quite well. It includes the same facility for fixing parameters, that is, fixed=list(parameter= value)
. There is an example in Section 21.1.
The package maxLik
(Toomet and Henningsen, 2008) also allows for what the authors call “fixed parameters.” Here the parameters to be fixed (masked) are specified in a vector giving the positions of the values to be fixed in the parameter vector. For situations with many parameters, this will be tidier than my use of an indicator of zero in the appropriate position in bdmsk
. Even better—an idea I shall likely copy—they allow for a vector of character names of the relevant parameters.
3.147.60.63