Chapter 9
Add-in function minimization packages for R

CRAN (Comprehensive R Archive Network) and R-Forge have a number of packages for function minimization. Indeed, I am a developer involved in several of them. This chapter explores some of these. Truthfully, there is more material than can be sensibly included in one book, let alone one chapter of a book, but we will do our best to provide a decent overview.

9.1 Package optimx

The package optimx (Nash and Varadhan, 2011a, 2011b) is a wrapper package for a number of optimization routines, including several of those included in the base system. One purpose is to unify the calls to a number of useful optimizers into a common syntax and thereby allow their performance comparison in an easy way. This ability for comparison makes the package a useful introduction to a number of the packages it calls. Different syntaxes for the calls to different routines are a great nuisance for the user and a frequent source of error.

However, optimx is more than simply a tool to unify the calling sequence syntax. The design incorporates several important concepts that are discussed elsewhere in this book (see Chapter 3).

  • A number of initial tests of the function, gradient, and scaling are available before optimization is attempted.
  • After the optimizers complete, Karush–Kuhn–Tucker (KKT) verification of the solutions is the default behavior if the number of parameters is not too large.
  • summary(), coef(), and print() methods are available for the returned results.
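For example, if arb holds the result of an optimx() call such as the one in Section 9.1.2, these methods can be used as follows. This is only a brief sketch; the exact columns reported may vary with the package version.

summary(arb, order = value)  # rank the methods by final objective value
coef(arb)  # matrix of parameter estimates, one row per method
print(arb)  # the full result table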

There are other features that could improve the success of optimization calculations, and these have been investigated in the experimental package optplus in the optimizer project on R-Forge (https://r-forge.r-project.org/R/?group_id=395). This package extends optimx as follows:

  • Parameter scaling, as in the optim() function, is available for all methods. This extension added a great deal of complexity to the code and incurs a performance penalty.
  • Similarly, function evaluations are checked to see whether they are computable, so that trial points or search directions can be revised. This allows optimization to proceed and avoids some stoppages that would otherwise kill a user's program, but it wastes time on problems where all trial points are computable.
  • After optimizers complete, an axial search verification of the solutions is the default behavior. This requires 2n function evaluations for each method, where n is the number of parameters.

Unfortunately, the performance costs of these features appear to outweigh the reliability benefits. Scaling and function checking are useful ideas, but I have yet to discover a general but efficient implementation in R. When an optimization task is being carried out many times, we would like to separate out some of the checks and scaling so that they are not repeated unnecessarily. For performance, the scaling and checks should be applied at the stage that the objective and gradient functions are written. On the other hand, when we want to solve a problem only one time, such checks can help to avoid time-consuming errors and scaling can help us to find a solution.
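To recall how scaling is specified in the base optim() function, here is a minimal sketch using the parscale control element; the test function and numerical values are invented for illustration only.

# Hypothetical badly scaled problem: the first parameter is naturally of
# order 1e4, the second of order 1e-4 (illustrative, not from the text).
sq <- function(x) (x[1] - 1e4)^2 + (1e4 * (x[2] - 1e-4))^2
a.unscaled <- optim(c(0, 0), sq, method = "BFGS")
a.scaled <- optim(c(0, 0), sq, method = "BFGS",
    control = list(parscale = c(1e4, 1e-4)))
# compare the solutions and the evaluation counts; the scaled run works
# internally with par/parscale, so both parameters are of order 1
a.unscaled$par; a.unscaled$counts
a.scaled$par; a.scaled$counts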

9.1.1 Optimizers in optimx

At the time of writing, the optimizers callable from optimx include the following methods from the base system that were discussed in Chapter 8:

  • the BFGS, Nelder–Mead, L-BFGS-B, and CG methods from the optim() function;
  • nlm();
  • nlminb().

Also included (again at time of writing) are

  • package Rvmmin (Nash, 2011b), which is my all-R implementation of the variable metric method of Fletcher (1970);
  • package Rcgmin (Nash, 2011a), which is my all-R implementation of the conjugate gradient method of Dai and Yuan (2001);
  • function spg from BB (Varadhan and Gilbert, 2009), an R implementation of the Birgin and Raydan (2001) code by Ravi Varadhan;
  • package ucminf (Nielsen and Mortensen, 2012), a variable metric method using an inverse Hessian approximation with a line search due to Nielsen (2000);
  • functions nmkb and hjkb from package dfoptim (Varadhan and Borchers, 2011), which provide Kelley's (1987) variant of the Nelder–Mead method with bounds and a Hooke and Jeeves (1961) method with bounds;
  • methods newuoa and bobyqa from package minqa (Bates et al., 2010), which interfaces Powell's Fortran codes for derivative-free optimization by quadratic approximation, including BOBYQA (Powell, 2009).

9.1.2 Example use of optimx()

Let us use the traditional Rosenbrock function fr(), together with the analytic gradient frg() for those methods that can use it, as presented in Chapter 8.

require(optimx)
## Loading required package: optimx
fr <- function(x) {
    ## Rosenbrock Banana function
    x1 <- x[1]
    x2 <- x[2]
    100 * (x2 - x1 * x1)^2 + (1 - x1)^2
}
frg <- function(x) {
    ## Rosenbrock Banana function gradient
    x1 <- x[1]
    x2 <- x[2]
    g1 <- -400 * (x2 - x1 * x1) * x1 - 2 * (1 - x1)
    g2 <- 200 * (x2 - x1 * x1)
    gg <- c(g1, g2)
}
sstart <- c(-1.2, 1)
arb <- optimx(par = sstart, fn = fr, gr = frg, method = "all")
## Loading required package: numDeriv
## Warning: bounds can only be used with method L-BFGS-B (or Brent)
## Warning: bounds can only be used with method L-BFGS-B (or Brent)
## Warning: bounds can only be used with method L-BFGS-B (or Brent)
print(summary(arb, order = value))
##                 p1     p2     value fevals gevals niter
## Rvmmin      1.0000 1.0000 1.700e-26     59     39    NA
## nlminb      1.0000 1.0000 4.292e-22     43     36    35
## Rcgmin      1.0000 1.0000 8.805e-18    735    434    NA
## ucminf      1.0000 1.0000 2.726e-17     38     38    NA
## newuoa      1.0000 1.0000 2.178e-15    257     NA    NA
## BFGS        1.0000 1.0000 2.268e-13     47     47    NA
## CG          1.0000 1.0000 2.268e-13     47     47    NA
## Nelder-Mead 1.0000 1.0000 2.268e-13     47     47    NA
## L-BFGS-B    1.0000 1.0000 2.268e-13     47     47    NA
## bobyqa      1.0000 1.0000 2.845e-13    290     NA    NA
## nlm         1.0000 1.0000 3.974e-12     NA     NA    23
## hjkb        1.0000 0.9999 2.618e-09    540     NA    19
## spg         0.9998 0.9996 3.896e-08    141     NA   112
## nmkb        1.0003 1.0006 4.269e-07    120     NA    NA
##             convcode  kkt1 kkt2 xtimes
## Rvmmin             2  TRUE TRUE  0.004
## nlminb             0  TRUE TRUE  0.000
## Rcgmin             0  TRUE TRUE  0.032
## ucminf             0  TRUE TRUE  0.000
## newuoa             0  TRUE TRUE  0.000
## BFGS               0  TRUE TRUE  0.000
## CG                 0  TRUE TRUE  0.000
## Nelder-Mead        0  TRUE TRUE  0.000
## L-BFGS-B           0  TRUE TRUE  0.000
## bobyqa             0  TRUE TRUE  0.004
## nlm                0  TRUE TRUE  0.000
## hjkb               0  TRUE TRUE  0.008
## spg                0  TRUE TRUE  0.060
## nmkb               0 FALSE TRUE  0.004

optimx() allows a very quick comparison of how well methods work on particular classes of problem. It should not, of course, be used in this way for obtaining solutions on a production basis. In the present example, we see that we could use any of the first half dozen methods satisfactorily.
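Once a suitable method has been identified, it is generally better to call the underlying package directly rather than go through the wrapper. As a rough sketch, a direct call to Rvmmin for this problem could look as follows; the argument and result names are as I understand the package at the time of writing, so consult its documentation before relying on them.

require(Rvmmin)
adirect <- Rvmmin(par = sstart, fn = fr, gr = frg)
adirect$par  # parameter estimates, here near c(1, 1)
adirect$value  # objective function value at the solution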

9.2 Some other function minimization packages

The collection of packages for R is large and fluid. Those in what I consider the primary collection, the Comprehensive R Archive Network or CRAN, must pass regular build tests and so sometimes disappear from the collection, although overall the collection keeps growing. There are, moreover, many other repositories and collections, some excellent and others of dubious value. Thus my comments here are directed to some tools that I consider useful.

9.2.1 nloptr and nloptwrap

nloptr (Ypma, 2013) is an interface to the NLopt library maintained by Steven G. Johnson of MIT (http://ab-initio.mit.edu/wiki/index.php/NLopt). There is a simplified wrapper for this called nloptwrap (Borchers, 2013). There are a number of optimization tools in NLopt, some of which mirror, and are derived from the same sources as, some of the tools seen above for unconstrained and bounds constrained optimization.
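The wrapper hides the opts list that the underlying nloptr() function requires. For comparison, a direct call could look roughly like the following sketch; the algorithm name and tolerance are illustrative choices, not values taken from the text.

require(nloptr)
adirect <- nloptr(x0 = sstart, eval_f = fr, eval_grad_f = frg,
    opts = list(algorithm = "NLOPT_LD_LBFGS", xtol_rel = 1e-8))
adirect$solution  # parameter estimates
adirect$objective  # objective function value at the solution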

require(nloptwrap)
## Loading required package: nloptwrap
## Loading required package: nloptr
##
## Attaching package: 'nloptwrap'
##
## The following objects are masked from 'package:minqa':
##
##    bobyqa, newuoa
nowp <- function(answer) {
    # nloptwrap answer summary
    cat("Fn =", answer$value, " after ", answer$iter, " iterations, parameters:\n")
    print(answer$par)
    cat(answer$message, "\n")
    invisible(0)
}
albfgs <- lbfgs(sstart, fr, gr = frg)
nowp(albfgs)
## Fn = 7.357e-23  after  56  iterations, parameters:
## [1] 1 1
## NLOPT_SUCCESS: Generic success return value.
atnewton <- tnewton(sstart, fr, gr = frg)
nowp(atnewton)
## Fn = 1.735e-24  after  74  iterations, parameters:
## [1] 1 1
## NLOPT_SUCCESS: Generic success return value.
avarmetric <- varmetric(sstart, fr, gr = frg)
nowp(avarmetric)
## Fn = 1.391e-18  after  53  iterations, parameters:
## [1] 1 1
## NLOPT_SUCCESS: Generic success return value.
anelmead <- neldermead(sstart, fr)
nowp(anelmead)
## Fn = 3.656e-14  after  214  iterations, parameters:
## [1] 1 1
## NLOPT_XTOL_REACHED: Optimization stopped because xtol_rel or xtol_abs (above) was reached.
anewuoa <- newuoa(sstart, fr)
nowp(anewuoa)
## Fn = 8.023e-17  after  194  iterations, parameters:
## [1] 1 1
## NLOPT_SUCCESS: Generic success return value.
# remove the packages in case of confusion
detach(package:nloptwrap)
detach(package:nloptr)

9.2.2 trust and trustOptim

Trust region methods of optimization, as pointed out very nicely at http://en.wikipedia.org/wiki/Trust_region, first choose a step size and then find a search direction toward a new and hopefully better set of parameters. The step size is determined by the radius of a region in which a model of the objective function is sufficiently accurate for the needs of the optimization. Traditional gradient methods work the other way around: they choose the search direction and then work out a step size.
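In symbols, a standard statement of the idea (not specific to any particular R package) is that at iteration k one approximately solves the subproblem

\min_{p} \; m_k(p) = f(x_k) + g_k^T p + \tfrac{1}{2}\, p^T B_k p \quad \text{subject to} \quad \|p\| \le \Delta_k,

where g_k is the gradient, B_k the Hessian (or an approximation to it), and \Delta_k the trust region radius, which is expanded or shrunk according to how well the model m_k predicted the actual change in f at the previous step.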

trust (Geyer, 2013) implements a trust region method where the user must supply objective function code and a starting set of parameters. However, the objective function now returns not only the function but also its gradient and Hessian. This is, for many problems, a large and onerous task. In the following text, we illustrate this using the Rosenbrock function that has just been used above. Furthermore, we use the example code from trust to show an alternative way to generate the objective function in objfun2().

frh <- function(x) {
    ## Rosenbrock Banana function Hessian
    x1 <- x[1]
    x2 <- x[2]
    h11 <- -400 * x2 + 1200 * x1 * x1 + 2
    h12 <- -400 * x1
    h21 <- h12
    h22 <- 200
    HH <- matrix(c(h11, h12, h21, h22), nrow = 2, ncol = 2)
}
objfun1 <- function(x) {
    val <- fr(x)
    gg <- frg(x)
    HH <- frh(x)
    list(value = val, gradient = gg, hessian = HH)
}
objfun2 <- function(x) {
    stopifnot(is.numeric(x))
    stopifnot(length(x) == 2)
    f <- expression(100 * (x2 - x1^2)^2 + (1 - x1)^2)
    g1 <- D(f, "x1")
    g2 <- D(f, "x2")
    h11 <- D(g1, "x1")
    h12 <- D(g1, "x2")
    h22 <- D(g2, "x2")
    x1 <- x[1]
    x2 <- x[2]
    f <- eval(f)
    g <- c(eval(g1), eval(g2))
    B <- rbind(c(eval(h11), eval(h12)), c(eval(h12), eval(h22)))
    list(value = f, gradient = g, hessian = B)
}
require(trust)
atrust1 <- trust(objfun1, sstart, rinit = 1, rmax = 5)
atrust1
## $value
## [1] 3.7902e-22
##
## $gradient
## [1]  2.9933e-10 -1.6724e-10
##
## $hessian
##      [,1] [,2]
## [1,]  802 -400
## [2,] -400  200
##
## $argument
## [1] 1 1
##
## $converged
## [1] TRUE
##
## $iterations
## [1] 27
## The same answers are obtained from
## atrust2 <- trust(objfun2, sstart, rinit = 1, rmax = 5)

We note that trust works fine for the Rosenbrock function, which it should, given the amount of information provided.

Where should a package such as trust be used? As it requires the gradient and Hessian as well as the function to be computed, it clearly will not be advisable to apply it to functions where these are difficult or impossible to compute. This includes functions where we have conditional statements, absolute values, or similar constructs. Moreover, trust will be most favored when the gradient and Hessian are easy to compute, which can be the case for certain likelihoods based on exponential family probability distributions. Communications from its author, Charles Geyer, make it clear that he understands this very well and that trust is intended for situations where it is very economical to compute the objective function, gradient, and Hessian in a coordinated manner.
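When analytic derivatives are tedious to derive but the function is smooth, one possibility is to build the required list with numerical approximations, for example, via the numDeriv package. The following is a minimal sketch for the Rosenbrock function only; the many extra function evaluations and the loss of accuracy make this unattractive for serious work.

require(numDeriv)
objfnd <- function(x) {
    # value, gradient, and Hessian of fr() obtained numerically;
    # analytic or algorithmic derivatives are preferable in practice
    list(value = fr(x), gradient = grad(fr, x), hessian = hessian(fr, x))
}
atrustnd <- trust(objfnd, sstart, rinit = 1, rmax = 5)
atrustnd$argument  # should be near c(1, 1)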

If gradient and Hessian functions are available, we still need to consider the control parameters rinit and rmax. Their values are likely not critical, but defaults are not provided.

A more recent package is trustOptim (Braun, 2013), which claims to be usable for sparse Hessians. There is a vignette to illustrate how this package can be used. However, some test cases suggested by the package author gave unexpected results, so there are uncertainties about the package at the time of writing. I suspect that this package will in time become useful in its intended sphere of application, but for now I urge caution in using it.

9.3 Should we replace optim() routines?

In several places in this book, I have noted that some of the methods in optim() are “old,” even though they are my own. Among those of us working to improve the optimization tools in R, there is an ongoing discussion, as there should be, of how to replace the minimizers in optim(). With optimx, we do have some reasonable and indeed evolving replacements for the methods in optim(). However, there remains an important and possibly difficult adjustment to be made in the various packages and functions that make use of optim(), in particular those that directly call the internal routines behind the optim() function. For example, nnet() directly calls vmmin.c, the internal C routine that is the principal computational engine of the optim() method “BFGS.” The difficulty is less in offering “new” functions than in removing the obsolete ones.

References

  1. Bates D, Mullen KM, Nash JC and Varadhan R 2010 minqa: Derivative-Free Optimization Algorithms by Quadratic Approximation. R Foundation for Statistical Computing. R package version 1.1.13.
  2. Birgin EG and Raydan MM 2001 SPG: software for convex-constrained optimization. ACM Transactions on Mathematical Software 27, 340–349.
  3. Borchers HW 2013 nloptwrap: Wrapper for Package nloptr. R package version 0.5-1.
  4. Braun M 2013 trustOptim: Trust region nonlinear optimization, efficient for sparse Hessians. R package version 0.8.1.
  5. Dai YH and Yuan Y 2001 An efficient hybrid conjugate gradient method for unconstrained optimization. Annals of Operations Research 103(1–4), 33–47.
  6. Fletcher R 1970 A new approach to variable metric algorithms. Computer Journal 13(3), 317–322.
  7. Geyer CJ 2013 trust: Trust Region Optimization. R package version 0.1-4.
  8. Hooke R and Jeeves TA 1961 “Direct search” solution of numerical and statistical problems. Journal of the ACM 8(2), 212–229.
  9. Kelley C 1987 Iterative Methods for Optimization. Frontiers in Applied Mathematics. Society for Industrial and Applied Mathematics.
  10. Nash JC 2011a Rcgmin: Conjugate Gradient Minimization of Nonlinear Functions with Box Constraints. Nash Information Services Inc. R package version 2011-2.10.
  11. Nash JC 2011b Rvmmin: Variable Metric Nonlinear Function Minimization with Bounds Constraints. Nash Information Services Inc. R package version 2011-2.25.
  12. Nash JC and Varadhan R 2011a optimx: A Replacement and Extension of the optim() Function. Nash Information Services Inc. and Johns Hopkins University. R package version 2011-3.5.
  13. Nash JC and Varadhan R 2011b Unifying optimization algorithms to aid software system users: optimx for R. Journal of Statistical Software 43(9), 1–14.
  14. Nielsen HB 2000 UCMINF - an algorithm for unconstrained, nonlinear optimization. Technical report, Department of Mathematical Modelling, Technical University of Denmark. Report IMM-REP-2000-18.
  15. Nielsen HB and Mortensen SB 2012 ucminf: General-purpose unconstrained non-linear optimization. R package version 1.1-3.
  16. Powell MJD 2009 The BOBYQA algorithm for bound constrained optimization without derivatives. Technical report, Department of Applied Mathematics and Theoretical Physics, University of Cambridge.
  17. Varadhan R and Borchers HW 2011 dfoptim: Derivative-free Optimization. Johns Hopkins University and ABB Corporate Research. R package version 2011.12-9.
  18. Varadhan R and Gilbert P 2009 BB: an R package for solving a large system of nonlinear equations and for optimizing a high-dimensional nonlinear objective function. Journal of Statistical Software 32(4), 1–26.
  19. Ypma J 2013 nloptr: R interface to NLopt. R package version 0.9.3.