Chapter 17
Finding the right solution

This chapter looks at how we

  • find an appropriate solution and
  • examine it to decide whether it is satisfactory for our purposes.

Almost any developer of software tools for solving computational problems has a few “war stories” about the user who, given a perfectly good mathematical solution to his or her problem, complains that it “is not right.” But even before that we have to actually generate a solution.

Throughout this chapter, we will only talk of minimization problems. This is the framework we have used elsewhere, and maximization problems are assumed to have been converted to minimizations.

17.1 Particular requirements

As we have touched on earlier, users generally “know” their own problem in ways that the optimization software and its designers cannot be expected to address. If one or more parameters must obey a particular constraint, such as being positive, this must be communicated to the software in some way. If a parameter must be an integer, then we need to choose how we solve the problem so that this is taken into account.

The advice in this section, therefore, is

  • write down every requirement on the solution and solution parameters, even if these conditions may not be imposed explicitly in the software;
  • given a trial solution, check that each condition is satisfied;
  • as needed, impose the conditions in the software, which may imply a change of method.
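As a concrete (and entirely hypothetical) illustration of the first two points, the requirements can be written down as small predicate functions and applied to any trial solution; the names and bounds below are invented for the sketch:

```r
## Hypothetical check list for a trial solution: each requirement written
## down as a predicate, then tested against the proposed parameters
checks <- list(
    positive   = function(x) all(x > 0),
    rate_lt_1  = function(x) x[2] < 1,    # assumed problem-specific bound
    sum_modest = function(x) sum(x) < 100
)
xtrial <- c(3.2, 0.7)
results <- vapply(checks, function(chk) chk(xtrial), logical(1))
results        # named logical vector; any FALSE flags a violated condition
all(results)   # TRUE only if every requirement is met
```

Keeping the checks in a named list makes it easy to report exactly which condition a proposed solution violates.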

17.1.1 A few integer parameters

In cases where there are at most two or three integer optimization parameters, I would create a loop or grid and solve the optimization problem for each value of the integer in a reasonable range. This is, admittedly, a very crude approach, but mixed integer optimization is a field full of difficulties. Let us look at our classic Rosenbrock function in the integer domain as an example.

fr <- function(x) {
    ## Rosenbrock Banana function
    x1 <- x[1]
    x2 <- x[2]
    value <- 100 * (x2 - x1 * x1)^2 + (1 - x1)^2
    value  # return the function value explicitly
}
x1r <- seq(-10, 10, 1)
x2r <- x1r
frbest <- 1e+300
xbest <- rep(NA, 2)
for (x1 in x1r) {
    for (x2 in x2r) {
        fval <- fr(c(x1, x2))
        if (fval < frbest) {
            xbest <- c(x1, x2)
            frbest <- fval
        }
    }
}
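The same exhaustive search can be written more compactly with expand.grid(), which enumerates the whole integer grid at once; this is an equivalent sketch of the loop above, not a different method:

```r
## Enumerate the integer grid and evaluate the Rosenbrock function everywhere
fr <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
grid <- expand.grid(x1 = -10:10, x2 = -10:10)
fvals <- apply(grid, 1, fr)
ibest <- which.min(fvals)
grid[ibest, ]  # x1 = 1, x2 = 1
fvals[ibest]   # 0
```

For two parameters the grid has only 441 points, so the cost is trivial; the approach scales badly, of course, as the number of integer parameters grows.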

This approach can also be used for situations where we need to decide which sequence of activities is best by generating the set of possible sequences and looping over them. Clearly this is NOT a good approach in general. I would not recommend it, for example, for the type of schedule optimization performed by Michael Trick (see, for example, http://mat.tepper.cmu.edu/trick/). However, when a simple construct such as the example above can be used to perform an essentially exhaustive search, it is likely less work than learning how to set up and use a more sophisticated tool. R is quite weak in this area, and relies on calls to external packages, no doubt because of its statistical heritage. My own experience with integer optimization parameters is severely limited for the same reasons.

17.2 Starting values for iterative methods

Starting values for iterative methods for nonlinear optimization or least squares are a general issue for almost all such software. The choice of starting values is often directed by the purpose of the computations, especially when there are multiple optima in the domain of possible solutions.

From the point of view of testing solutions, starting values may seem an incongruous topic. However, for iterative methods, it is useful to consider not only the parameters that a method produces but also where the method started. Moreover, if the end point is very far from a start that is considered reasonable, we should wonder what has happened. The test, if we can consider it as such, is one of consistency with our anticipated outcome.

The starting parameters we use come from a variety of sources:

  • For a number of test problems, there are well-known sets of starting values. Hence, for the Rosenbrock test function (Rosenbrock, 1960), the traditional start is at (-1.2, 1).
  • For some problems, such as the nls() problems with selfStart models, there are starting values that satisfy some supposedly “reasonable” conditions. Note the capitalization in selfStart.
  • For problems with which the user has a base of experience, it is likely that quite good guesses at the parameters can be provided.
  • If there are sensible bounds on the parameters, then some choice such as the midpoints of the intervals may work reasonably well. For positive parameters, the geometric mean of the bounds, that is, the square root of the product of the lower and upper bounds, may be a better choice.
  • Where possible, I recommend that starting values are NOT on a bound. Some methods (e.g., nmkb() in dfoptim) cannot be called with parameters on the boundary.
  • For problems that are well known to be reliably solved from practically any starts, it is not uncommon to use 1 for every starting value. This is, in fact, used as a default by nls() and some other programs, and I have on occasion used it myself.
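The bounds-based suggestions above can be sketched in a couple of lines; the bounds here are invented for illustration:

```r
## Invented bounds for two positive parameters
lower <- c(0.01, 1)
upper <- c(100, 1e6)
start_mid <- (lower + upper) / 2   # midpoint of each interval
start_geo <- sqrt(lower * upper)   # geometric mean, often better scaled
start_mid  # 50.005  500000.5
start_geo  # 1  1000
```

Note how the geometric mean keeps widely scaled positive parameters near the “middle” of their range in a multiplicative sense, whereas the arithmetic midpoint sits very close to the upper bound.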

In many cases, very crude approximations will suffice to get the scale of parameters, and some commonsense thinking will provide loose bounds. Better starting values can be obtained if a rough linearization of a model is possible so that, for example, a linear regression can be performed. The starting values for the SSlogis() tool use a variant of this idea (see Sections 6.5.1 and 16.5). Note that it is possible to display the code to see what this does by typing SSlogis at the R command line.
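To illustrate the linearization idea, here is a rough sketch along the lines of SSlogis(): for a three-parameter logistic model y = Asym/(1 + exp((xmid - x)/scal)), guessing the asymptote and applying a logit transform makes the model approximately linear in x, so lm() gives crude values for xmid and scal. The simulated data and the 1.05 inflation factor are illustrative assumptions, not the exact SSlogis() code.

```r
## Simulated logistic data (true Asym = 5, xmid = 4, scal = 1.2)
x <- 1:10
y <- 5 / (1 + exp((4 - x) / 1.2))
Asym0 <- 1.05 * max(y)              # crude guess at the asymptote
z <- log(Asym0 / y - 1)             # logit transform: z is roughly (xmid - x)/scal
fit <- lm(z ~ x)
scal0 <- -1 / unname(coef(fit)[2])  # slope is approximately -1/scal
xmid0 <- unname(coef(fit)[1]) * scal0
c(xmid0 = xmid0, scal0 = scal0)     # rough but usable starting values
```

The estimates are deliberately crude; their job is only to land the iteration in a region from which a proper nonlinear solver can converge.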

17.3 KKT conditions

17.3.1 Unconstrained problems

For unconstrained minimization of functions that have continuous gradients, the solution must have a zero gradient at every local minimum (Karush, 1939; Kuhn and Tucker, 1951). This is a necessary condition, but because local maxima and saddle points also give zero gradients, it is not sufficient. We call this the first-order KKT (Karush–Kuhn–Tucker) condition.

We also want the objective function to be higher at every point “near” the proposed minimum, so that the function is locally convex. There is a nice mnemonic at http://shreevatsa.wordpress.com/2009/03/17/convex-and-concave/ to help one remember this. If the second-order derivatives of the objective can be computed, they define the Hessian matrix

    H[i, j] = ∂²f(x) / (∂x_i ∂x_j)

The condition for the proposed solution to be at “the bottom of a bowl” is that the Hessian is positive-definite, for which the most common test is that its eigenvalues are all positive.

Reality, of course, is never so nice. The villain here is the floating-point representation of eigenvalues that are “small.” Indeed, we can likely create matrices resembling Hessians that are provably positive-definite in exact arithmetic yet yield some small negative eigenvalue estimates. And it is very common for real problems to give positive but small Hessian eigenvalues. Thus we must test for “small.”

In this activity, my own rule of thumb is to first look for all positive eigenvalues. Those computed negative or zero should, I believe, be reported as such, even if it turns out later that they are an awkward special case of unfortunate rounding. In any event, these cases are candidates for careful examination and will likely cause trouble for our optimizers.

Assuming all the eigenvalues are strictly positive, I take the ratio of the smallest to the largest of the Hessian eigenvalues. This ratio should not be “too small.” In package optimx, ratios smaller than 1e-6 are taken as evidence of effective singularity of the Hessian.

There is no absolute rule about such tolerances. My own justification comes from multiple linear regression, where the singular values of the matrix of variables are comparable to the square roots of the Hessian eigenvalues. Thus a ratio of singular values smaller than 1e-3 would be taken as evidence of singularity, which in linear regression is indicative of a lack of independence between the explanatory variables. The singular values give the relative contribution of orthogonalized explanatory variables to the overall variability of these variables. Thus a singular value that is 1/1000 as big as the largest is contributing a very tiny fraction of the overall variability. Indeed, if we graph the singular values as bars on a regular-sized sheet of paper, the smallest singular value is represented by a bar about 0.2 mm long or less, because A4 and “letter” sizes are 210 and 216 mm wide, respectively.

Warning: Many people assume that computed eigenvalues will be sorted largest to smallest. While many programs do indeed provide them in this ordering, it pays to be careful and check or else use program code that explicitly looks for the maximal and minimal values. The extra time for the check is small compared to the time to repair the damage when we get this wrong.
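Putting these pieces together for the Rosenbrock function, whose analytic Hessian is easy to write down, a second-order check might look like the following sketch (the 1e-6 ratio tolerance follows the rule of thumb above; a numerical Hessian from, e.g., package numDeriv could be substituted):

```r
fr <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
## analytic Hessian of the Rosenbrock function
frhess <- function(x) matrix(c(1200 * x[1]^2 - 400 * x[2] + 2, -400 * x[1],
                               -400 * x[1],                     200),
                             nrow = 2, byrow = TRUE)
H <- frhess(c(1, 1))  # Hessian at the known minimum
ev <- eigen(H, symmetric = TRUE, only.values = TRUE)$values
## use max() and min() explicitly rather than assuming sorted eigenvalues
posdef <- all(ev > 0) && (min(ev) / max(ev) > 1e-6)
posdef  # TRUE: (1, 1) passes the second-order test
```

Note that the smaller eigenvalue here is roughly 0.4 against a larger one of about 1000, a ratio near 4e-4: comfortably positive-definite, but a reminder that even textbook problems have poorly scaled Hessians.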

17.3.2 Constrained problems

The optimality conditions for constrained problems are more demanding to both specify and compute. Gill et al. (1981, Chapter 3) give a fairly comprehensive solution, but for most R users, I believe that the material will not be easy reading. There is a good summary introduction to the wider topic in http://en.wikipedia.org/wiki/Karush%E2%80%93Kuhn%E2%80%93Tucker_conditions.

Clearly, if the constraints are not active, we can use the KKT conditions above. When one or more constraints are active, however, we need to ensure that the gradient as projected along any feasible direction is essentially zero as a first-order condition. There are extensions of the KKT conditions for such situations, but we need to be conscious that they depend on assumptions of continuity and differentiability that may not be true.

My own preference—and this is colored by the limited number of problems with general constraints I have encountered—is to test the proposed optimum to see if it satisfies the constraints. This has the merit that the computations are done using the user's own code. We can also check nearby points to ensure that they are either infeasible (we are on a constraint boundary) or have a higher value of the objective function.
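A sketch of this style of check, with an invented objective and constraint: minimize sum(x^2) subject to sum(x) >= 1, written in the common form g(x) <= 0, with the proposed optimum on the constraint boundary.

```r
## Invented example: minimize f subject to g(x) <= 0
f <- function(x) sum(x^2)
g <- function(x) 1 - sum(x)   # g <= 0 means sum(x) >= 1
xstar <- c(0.5, 0.5)          # proposed solution, on the boundary
feasible <- g(xstar) <= 1e-8  # does it satisfy the constraint?
eps <- 1e-4
no_better <- TRUE
for (i in seq_along(xstar)) for (s in c(-eps, eps)) {
    xt <- xstar
    xt[i] <- xt[i] + s
    ## a feasible neighbour with a lower objective would refute optimality
    if (g(xt) <= 0 && f(xt) < f(xstar)) no_better <- FALSE
}
c(feasible = feasible, no_better = no_better)
```

Because the feasibility test calls the user's own constraint function g(), it checks the problem as actually coded, not as we imagine it to be.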

17.4 Search tests

Following the idea of checking points near a proposed solution, I have found it useful for a number of problems to perform a simple axial search around a proposed or trial minimum. Here we will consider only minimization. I have found in my own work that trying to add maximization in a clean way to such search strategies—which the user may wish to alter—is a recipe for trouble. For maximization problems, we minimize (-1) times the function and report the negated value. (It is the process of trying to apply scalings appropriately for reporting results that creates the issues for searches.)

The most common search I apply is an axial search where along each parameter direction a small step “forward” and “backward” is attempted and the objective function reevaluated. Code for this in BASIC appeared in Nash and Walker-Smith (1987). There is also a version in R in the package optextras in the optimizer project packages on R-forge (https://r-forge.r-project.org/R/?group_id=395). This is able to recognize bounds and mask constraints. However, it does not evaluate more general constraints and would need modification for them.
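A stripped-down version of such an axial search, ignoring bounds, masks, and parameter scaling, might be sketched as follows:

```r
## Try a small step forward and backward along each axis; report whether
## any trial point has a lower objective value than the proposed minimum
axial_check <- function(fn, x, step = 1e-4) {
    f0 <- fn(x)
    for (i in seq_along(x)) for (s in c(-step, step)) {
        xt <- x
        xt[i] <- xt[i] + s
        if (fn(xt) < f0) return(list(lower_found = TRUE, at = xt))
    }
    list(lower_found = FALSE, at = x)
}
fr <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
axial_check(fr, c(1, 1))$lower_found  # FALSE: no lower point along the axes
```

A more serious implementation, such as the one in optextras, would scale the step to each parameter's magnitude and respect bounds; this sketch only conveys the idea.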

One of the purposes for which the axial search was implemented was to restart direct search algorithms like Nelder–Mead. It is, of course, not foolproof. For example, a function having a saddle between peaks to the northwest and southeast of a supposed minimum that falls away in the northeast and southwest directions could find that the north, south, east, and west axial directions all give higher points on our fog-bound surface.

References

  1. Gill PE, Murray W, and Wright MH 1981 Practical Optimization. Academic Press, London and New York.
  2. Karush W 1939 Minima of functions of several variables with inequalities as side constraints. M.Sc. Dissertation, Department of Mathematics, University of Chicago, Chicago, IL.
  3. Kuhn HW and Tucker AW 1951 Nonlinear programming. Proceedings of 2nd Berkeley Symposium, pp. 481–492, Berkeley, USA.
  4. Nash JC and Walker-Smith M 1987 Nonlinear Parameter Estimation: An Integrated System in BASIC. Marcel Dekker, New York. See http://www.nashinfo.com/nlpe.htm for an expanded downloadable version.
  5. Rosenbrock HH 1960 An automatic method for finding the greatest or least value of a function. The Computer Journal 3, 175–184.