CRAN (Comprehensive R Archive Network) and R-Forge have a number of packages for function minimization. Indeed, I am a developer involved in several of them. This chapter explores some of these. Truthfully, there is more material than can be sensibly included in one book, let alone one chapter of a book, but we will do our best to provide a decent overview.
optimx
The package optimx (Nash and Varadhan, 2011a, 2011b) is a wrapper for a number of optimization routines, including several from the base system. One purpose is to unify the calls to a number of useful optimizers under a common syntax, thereby allowing their performance to be compared easily. This ability makes the package a useful introduction to the packages it calls. Different calling syntaxes for different routines are a great nuisance for the user and a frequent source of error.
However, optimx is more than simply a tool to unify the calling sequence syntax. The design incorporates several important concepts that are discussed elsewhere in this book (see Chapter 3).
summary(), coef(), and print() methods are available for the returned results. There are other features that could improve the success of optimization calculations, and these have been investigated in the experimental package optplus in the optimizer project on R-Forge (https://r-forge.r-project.org/R/?group_id=395). This extends optimx as follows:
Parameter scaling, as provided by the parscale control of the optim() function, is available for all methods. This extension added a great deal of complexity to the code and incurs a performance penalty. Unfortunately, the performance costs of these features appear to outweigh the reliability benefits. Scaling and function checking are useful ideas, but I have yet to discover a general yet efficient implementation in R. When an optimization task is to be carried out many times, we would like to separate out some of the checks and scaling so that they are not repeated unnecessarily; for performance, the scaling and checks should be applied at the stage when the objective and gradient functions are written. On the other hand, when we want to solve a problem only once, such checks can help to avoid time-consuming errors, and scaling can help us to find a solution.
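To make the idea of user-supplied scaling concrete, here is a minimal sketch using the parscale control of base R's optim(); the badly scaled test function badscale() is invented for illustration. Parameters are divided internally by the parscale values, so choosing them near the magnitude of the expected solution makes the search better conditioned.

```r
## Illustrative only: a function whose two parameters differ in
## magnitude by eight orders
badscale <- function(x) (x[1] - 1e4)^2 + (x[2] - 1e-4)^2

## Unscaled: both parameters are treated as if of similar size
a1 <- optim(c(1, 1), badscale, method = "BFGS")

## Scaled: tell optim() the expected magnitudes via parscale;
## optimization proceeds on par/parscale internally
a2 <- optim(c(1, 1), badscale, method = "BFGS",
            control = list(parscale = c(1e4, 1e-4)))
```

This is exactly the kind of scaling that, as argued above, is best built in when the objective function is written, rather than re-derived on every call.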
optimx
At the time of writing, optimx includes the following methods from the base system that were discussed in Chapter 8:
- the optim() function;
- nlm();
- nlminb().

Also included (again at the time of writing) are:
- Rvmmin (Nash, 2011b), my all-R implementation of the variable metric method of Fletcher (1970);
- Rcgmin (Nash, 2011a), my all-R implementation of the conjugate gradient method of Dai and Yuan (2001);
- spg from BB (Varadhan and Gilbert, 2009), Ravi Varadhan's R implementation of the Birgin and Raydan (2001) code;
- ucminf (Nielsen and Mortensen, 2012), a variable metric method using an inverse Hessian approximation with a line search due to Nielsen (2000);
- nmkb and hjkb from package dfoptim (Varadhan and Borchers, 2011), which are Kelley's (1987) variant of the Nelder–Mead method with bounds and a Hooke and Jeeves (1961) method with bounds;
- newuoa and bobyqa from package minqa (Bates et al., 2010), an interfacing of Powell's (2009) codes, which are written in Fortran.

optimx()
Let us use the traditional Rosenbrock function fr(), with the analytic gradient frg() where it can be used, as presented in Chapter 8.
require(optimx)
## Loading required package: optimx
fr <- function(x) {
    ## Rosenbrock Banana function
    x1 <- x[1]
    x2 <- x[2]
    100 * (x2 - x1 * x1)^2 + (1 - x1)^2
}
frg <- function(x) {
    ## Rosenbrock Banana function gradient
    x1 <- x[1]
    x2 <- x[2]
    g1 <- -400 * (x2 - x1 * x1) * x1 - 2 * (1 - x1)
    g2 <- 200 * (x2 - x1 * x1)
    gg <- c(g1, g2)
}
sstart <- c(-1.2, 1)
arb <- optimx(par = sstart, fn = fr, gr = frg, method = "all")
## Loading required package: numDeriv
## Warning: bounds can only be used with method L-BFGS-B (or Brent)
## Warning: bounds can only be used with method L-BFGS-B (or Brent)
## Warning: bounds can only be used with method L-BFGS-B (or Brent)
print(summary(arb, order = value))
##                 p1     p2     value fevals gevals niter
## Rvmmin      1.0000 1.0000 1.700e-26     59     39    NA
## nlminb      1.0000 1.0000 4.292e-22     43     36    35
## Rcgmin      1.0000 1.0000 8.805e-18    735    434    NA
## ucminf      1.0000 1.0000 2.726e-17     38     38    NA
## newuoa      1.0000 1.0000 2.178e-15    257     NA    NA
## BFGS        1.0000 1.0000 2.268e-13     47     47    NA
## CG          1.0000 1.0000 2.268e-13     47     47    NA
## Nelder-Mead 1.0000 1.0000 2.268e-13     47     47    NA
## L-BFGS-B    1.0000 1.0000 2.268e-13     47     47    NA
## bobyqa      1.0000 1.0000 2.845e-13    290     NA    NA
## nlm         1.0000 1.0000 3.974e-12     NA     NA    23
## hjkb        1.0000 0.9999 2.618e-09    540     NA    19
## spg         0.9998 0.9996 3.896e-08    141     NA   112
## nmkb        1.0003 1.0006 4.269e-07    120     NA    NA
##             convcode kkt1  kkt2 xtimes
## Rvmmin             2 TRUE  TRUE  0.004
## nlminb             0 TRUE  TRUE  0.000
## Rcgmin             0 TRUE  TRUE  0.032
## ucminf             0 TRUE  TRUE  0.000
## newuoa             0 TRUE  TRUE  0.000
## BFGS               0 TRUE  TRUE  0.000
## CG                 0 TRUE  TRUE  0.000
## Nelder-Mead        0 TRUE  TRUE  0.000
## L-BFGS-B           0 TRUE  TRUE  0.000
## bobyqa             0 TRUE  TRUE  0.004
## nlm                0 TRUE  TRUE  0.000
## hjkb               0 TRUE  TRUE  0.008
## spg                0 TRUE  TRUE  0.060
## nmkb               0 FALSE TRUE  0.004
optimx() allows a very quick comparison of how well methods work on particular classes of problem. It should not, of course, be used in this way for obtaining solutions on a production basis. In the present example, we see that we could use any of the first half-dozen methods satisfactorily.
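The summary() and coef() methods mentioned earlier make such comparisons easy to interrogate programmatically. A small sketch, assuming the arb object computed above:

```r
## Reorder the result table by objective value and pick off the
## best-performing method (first row after sorting)
sorted <- summary(arb, order = value)
best <- sorted[1, ]

## coef() extracts the parameter estimates from every method,
## one row per method, for side-by-side inspection
allpar <- coef(arb)
```

For production use, one would then call the single chosen method directly rather than method = "all".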
The collection of packages for R is large and fluid. Those in what I consider the primary collection, the Comprehensive R Archive Network (CRAN), must pass regular build tests, so packages sometimes disappear from the collection. Overall, though, the collection is growing. There are, moreover, many other repositories and collections; some are excellent, and others are of dubious value. Thus my comments here are directed at some tools that I consider useful.
nloptr and nloptwrap
nloptr (Ypma, 2013) is an interface to the NLopt library maintained by Steven G. Johnson of MIT (http://ab-initio.mit.edu/wiki/index.php/NLopt). There is a simplified wrapper for it called nloptwrap (Borchers, 2013). NLopt offers a number of optimization tools, some of which mirror, and are derived from the same sources as, some of the tools seen above for unconstrained and bounds-constrained optimization.
require(nloptwrap)
## Loading required package: nloptwrap
## Loading required package: nloptr
##
## Attaching package: 'nloptwrap'
##
## The following objects are masked from 'package:minqa':
##
##     bobyqa, newuoa
nowp <- function(answer) {
    # nloptwrap answer summary
    cat("Fn =", answer$value, " after ", answer$iter, " iterations, parameters:\n")
    print(answer$par)
    cat(answer$message, "\n")
    invisible(0)
}
albfgs <- lbfgs(sstart, fr, gr = frg)
nowp(albfgs)
## Fn = 7.357e-23  after  56  iterations, parameters:
## [1] 1 1
## NLOPT_SUCCESS: Generic success return value.
atnewton <- tnewton(sstart, fr, gr = frg)
nowp(atnewton)
## Fn = 1.735e-24  after  74  iterations, parameters:
## [1] 1 1
## NLOPT_SUCCESS: Generic success return value.
avarmetric <- varmetric(sstart, fr, gr = frg)
nowp(avarmetric)
## Fn = 1.391e-18  after  53  iterations, parameters:
## [1] 1 1
## NLOPT_SUCCESS: Generic success return value.
anelmead <- neldermead(sstart, fr)
nowp(anelmead)
## Fn = 3.656e-14  after  214  iterations, parameters:
## [1] 1 1
## NLOPT_XTOL_REACHED: Optimization stopped because xtol_rel or xtol_abs (above) was reached.
anewuoa <- newuoa(sstart, fr)
nowp(anewuoa)
## Fn = 8.023e-17  after  194  iterations, parameters:
## [1] 1 1
## NLOPT_SUCCESS: Generic success return value.
# remove the packages in case of confusion
detach(package:nloptwrap)
detach(package:nloptr)
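The wrapper hides the underlying nloptr() call. For comparison, a direct call looks like the following sketch, which reuses fr and frg from above; the result name ares is mine, and NLopt algorithms are selected by their library names, here the gradient-based ("LD") L-BFGS variant.

```r
library(nloptr)
## Direct use of nloptr(): algorithm and tolerances are passed in
## the 'opts' list rather than chosen by the wrapper function name
ares <- nloptr(x0 = c(-1.2, 1), eval_f = fr, eval_grad_f = frg,
               opts = list(algorithm = "NLOPT_LD_LBFGS",
                           xtol_rel = 1e-8, maxeval = 1000))
## ares$solution holds the parameters, ares$objective the value,
## and ares$status the NLopt return code
```

The wrapper interface is clearly more convenient for routine work, but the direct interface exposes the full set of NLopt options.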
trust and trustOptim
Trust region methods of optimization, as explained very nicely at http://en.wikipedia.org/wiki/Trust_region, first choose a step size for the search for a new and hopefully better set of parameters and then find a search direction. The step size is determined by the region in which a model of the objective function is sufficiently accurate for the needs of the optimization. Traditional gradient methods instead choose the search direction first and then work out a step size.
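The "step size first" idea can be illustrated by the Cauchy point, the minimizer of the local quadratic model along the steepest-descent direction restricted to a region of radius Delta. This sketch is purely illustrative and not taken from any package:

```r
## Cauchy point: minimize the quadratic model m(p) = g'p + p'Hp/2
## along -g, subject to ||p|| <= Delta (the trust region radius)
cauchy_point <- function(g, H, Delta) {
    gn <- sqrt(sum(g^2))               # gradient norm
    curv <- drop(t(g) %*% H %*% g)     # curvature along -g
    ## if curvature is not positive the model decreases all the way
    ## to the boundary; otherwise stop at the unconstrained minimum
    tau <- if (curv <= 0) 1 else min(1, gn^3 / (Delta * curv))
    -tau * (Delta / gn) * g            # step of length tau * Delta
}
```

Practical trust region codes refine this step (e.g., by dogleg or conjugate gradient steps) and adjust Delta according to how well the model predicted the actual function decrease.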
trust (Geyer, 2013) implements a trust region method in which the user must supply the objective function code and a starting set of parameters. However, the objective function now returns not only the function value but also its gradient and Hessian. For many problems, this is a large and onerous task. In the following, we illustrate this using the Rosenbrock function just used above. We also use the example code from trust to show, in objfun2(), an alternative way to generate the objective function.
frh <- function(x) {
    ## Rosenbrock Banana function Hessian
    x1 <- x[1]
    x2 <- x[2]
    h11 <- -400 * x2 + 1200 * x1 * x1 + 2
    h12 <- -400 * x1
    h21 <- h12
    h22 <- 200
    HH <- matrix(c(h11, h12, h21, h22), nrow = 2, ncol = 2)
}
objfun1 <- function(x) {
    val <- fr(x)
    gg <- frg(x)
    HH <- frh(x)
    list(value = val, gradient = gg, hessian = HH)
}
objfun2 <- function(x) {
    stopifnot(is.numeric(x))
    stopifnot(length(x) == 2)
    f <- expression(100 * (x2 - x1^2)^2 + (1 - x1)^2)
    g1 <- D(f, "x1")
    g2 <- D(f, "x2")
    h11 <- D(g1, "x1")
    h12 <- D(g1, "x2")
    h22 <- D(g2, "x2")
    x1 <- x[1]
    x2 <- x[2]
    f <- eval(f)
    g <- c(eval(g1), eval(g2))
    B <- rbind(c(eval(h11), eval(h12)), c(eval(h12), eval(h22)))
    list(value = f, gradient = g, hessian = B)
}
require(trust)
atrust1 <- trust(objfun1, sstart, rinit = 1, rmax = 5)
atrust1
## $value
## [1] 3.7902e-22
##
## $gradient
## [1]  2.9933e-10 -1.6724e-10
##
## $hessian
##      [,1] [,2]
## [1,]  802 -400
## [2,] -400  200
##
## $argument
## [1] 1 1
##
## $converged
## [1] TRUE
##
## $iterations
## [1] 27
## Get same answers from atrust2 <- trust(objfun2, sstart, rinit = 1, rmax = 5)
We note that trust works fine for the Rosenbrock function, as it should, given the amount of information provided.
Where should a package such as trust be used? As it requires the gradient and Hessian as well as the function value, it is clearly inadvisable to apply it to functions for which these are difficult or impossible to compute, including functions containing conditional statements, absolute values, or similar constructs. Conversely, trust will be most favored when the gradient and Hessian are easy to compute, which can be the case for certain likelihoods based on exponential-family probability distributions. Communications from its author, Charles Geyer, make it clear that he understands this very well and that trust is intended for situations where it is very economical to compute the objective function, gradient, and Hessian in a coordinated manner.
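As a hedged sketch of such a situation, consider a Poisson log-likelihood with canonical log link: the linear predictor and mean are computed once and shared by value, gradient, and Hessian. The design matrix X, response y, and starting vector bstart are assumed data, not taken from the source.

```r
## Negative Poisson log-likelihood (up to a constant) in the
## list(value, gradient, hessian) form that trust() expects
poisobj <- function(beta, X, y) {
    eta <- drop(X %*% beta)       # linear predictor, computed once
    mu <- exp(eta)                # Poisson mean
    val <- -sum(y * eta - mu)     # negative log-likelihood
    grad <- -drop(t(X) %*% (y - mu))
    hess <- t(X) %*% (mu * X)     # X' diag(mu) X, always p.s.d.
    list(value = val, gradient = grad, hessian = hess)
}
## e.g. trust(function(b) poisobj(b, X, y), bstart, rinit = 1, rmax = 5)
```

Here the gradient and Hessian cost little beyond the function value itself, which is precisely the coordinated computation trust is designed to exploit.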
If gradient and Hessian functions are available, we still need to consider the control parameters rinit and rmax. These values are likely not critical, but defaults are not provided.
A more recent package is trustOptim (Braun, 2013), which claims to be usable for sparse Hessians. A vignette illustrates how the package can be used. However, some suggested test cases supplied by the package author gave unexpected results, so there are uncertainties about the package at the time of writing. I suspect that in time it will become useful in its intended sphere of application, but for now I urge caution in using it.
Should we replace the optim() routines?
In several places in this book, I have noted that some of the methods in optim() are "old," even though they are my own. Among those of us working to improve the optimization tools in R, there is an ongoing discussion of how to replace the minimizers in optim(). This is, and should be, an ongoing task and discussion. With optimx, we do have some reasonable, and indeed evolving, replacements for the methods in optim(). However, there remains an important and possibly difficult adjustment to be made in the various packages and functions that make use of optim(), in particular those that directly call the internal functions making up the optim() function. For example, nnet() directly calls the internal C routine vmmin.c, which is the principal computational engine of the optim() method "BFGS." The difficulty lies less in offering "new" functions than in removing the obsolete ones.
Bates D, Mullen KM, Nash JC, Varadhan R (2010). minqa: Derivative-Free Optimization Algorithms by Quadratic Approximation. R Foundation for Statistical Computing. R package version 1.1.13.
Nash JC (2011a). Rcgmin: Conjugate Gradient Minimization of Nonlinear Functions with Box Constraints. Nash Information Services Inc. R package version 2011-2.10.
Nash JC (2011b). Rvmmin: Variable Metric Nonlinear Function Minimization with Bounds Constraints. Nash Information Services Inc. R package version 2011-2.25.
Nash JC, Varadhan R (2011). optimx: A Replacement and Extension of the optim() Function. Nash Information Services Inc. and Johns Hopkins University. R package version 2011-3.5.
Varadhan R, Gilbert PD (2009). BB: An R package for solving a large system of nonlinear equations and for optimizing a high-dimensional nonlinear objective function. Journal of Statistical Software 32(4), 1–26.