Chapter 9. Optimization

Often, in scientific computing, we are required to find the value of x for which a function f(x) attains a maximum or minimum value. In other words, we want to maximize or minimize f(x). This process is termed numerical optimization and can be summarized as follows:

minimize f(x) with respect to x, subject to z_i(x) = 0 for i ∈ N and z_i(x) ≥ 0 for i ∈ M

In the preceding formula, x represents a vector of variables, also known as the unknowns or parameters; f is the function of x that we want to maximize or minimize, known as the objective function; z_i are the constraint functions that x must fulfill; and N and M are sets of indices. Optimization problems are used in mathematics, finance, and computer science to find the best solution from all feasible solutions. We can convert maximization problems into minimization problems by remembering that the maximum of f(x) is attained at the same point as the minimum of -f(x). Therefore, to maximize the function f, we simply need to minimize -f. In this chapter, you will learn different methods and functions used to perform numerical optimization in R.
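
Before we get to those methods, here is a minimal sketch of the maximization/minimization equivalence mentioned above; the function f.demo and the evaluation grid are illustrative choices, not taken from the text. The maximizer of f.demo(x) = -(x - 1)^2 is the same point as the minimizer of -f.demo(x):

> f.demo <- function(x) -(x - 1)^2
> x <- seq(0, 2, by=0.001)
> x[which.max(f.demo(x))]   # 1, the maximizer of f.demo
> x[which.min(-f.demo(x))]  # 1, the same point minimizes -f.demo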

In this chapter, we will cover the following topics:

  • The golden section search method
  • The Newton-Raphson method
  • The Nelder-Mead simplex method
  • Other methods with the optim() function
  • Linear programming
  • Integer-restricted optimization
  • Unrestricted variables with the lp() function
  • Quadratic programming

Let's start by minimizing the following function:

f(x) = |x - 2.5| + (x - 1)^2

In the preceding example, the function f has no constraints, so we are free to choose x anywhere in its domain. To solve this problem, we will apply the golden section search method, which is a one-dimensional unconstrained optimization method.

One-dimensional optimization

One-dimensional unconstrained optimization is used to minimize a function of a single variable, that is, a function of the type f: R → R. The golden section search method is a zeroth-order search method for functions of this type; it uses only the values of the objective function f and not its derivatives, which makes it well suited to minimizing f(x) = |x - 2.5| + (x - 1)^2, since f(x) is not differentiable at x = 2.5.

The golden section search method

The golden section search method uses an interval reduction strategy in which the ratio between the sizes of two consecutive intervals is constant, independent of the number of iterations, and makes use of the golden ratio φ. The golden ratio is defined algebraically as follows:

φ = (1 + √5)/2

Here, φ ≈ 1.618034.

Basically, it is an iterative method that minimizes the function by:

  • Defining an interval [a, b] that contains the minimizer.
  • Shrinking the interval into progressively smaller intervals [a', b'] that still contain the minimizer.
  • Repeating the second step until the difference between b' – a' is small enough based on a pre-set tolerance.
  • Using the midpoint of that interval as an estimate for the true minimizer, which is calculated as (a' + b')/2. The maximum error for this estimate is (b' – a')/2.

Let's take a look at the following steps to apply the golden section search method and minimize f(x):

  1. To apply the previous points to minimize the function, we will write a function that picks two points x1 and x2 on the [a, b] interval, where x1 < x2.
  2. Then, we calculate f(x1) and f(x2). If f(x1) > f(x2), then the minimizer must be to the right of x1. Conversely, if f(x1) < f(x2), then the minimizer must be to the left of x2.
  3. Consequently, the minimizer will be in the interval [x1, b] if the former is true and in the interval [a, x2] if the latter is true.
  4. To take advantage of certain properties of the golden ratio, we will set x1 and x2 as follows:
    x1 = b - (b - a)/φ and x2 = a + (b - a)/φ
  5. Here is how we justify the equation for x2. After the first iteration of the golden section search method, a will be replaced with a' = x1, and the next value for x1 will be x1':

    x1' = b - (b - a')/φ = b - (b - x1)/φ
  6. Now we can replace x1 in the formula by b - (b - a)/φ, which yields the following formula:
    x1' = b - (b - (b - (b - a)/φ))/φ = b - (b - a)/φ^2
  7. One of the properties of the golden ratio is as follows:
    φ^2 = φ + 1, which implies 1/φ^2 = 1 - 1/φ
  8. So we can substitute 1/φ^2 with 1 - 1/φ, and by applying simple algebra, we can write:
    x1' = b - (b - a)(1 - 1/φ) = a + (b - a)/φ
    Since x2 = a + (b - a)/φ, we can write x1' = x2. This means the old x2 can be reused as the new x1, so only one new point has to be evaluated per iteration. (A short numerical check of this reuse property appears right after the golden.method() listing below.)
  9. Now, to translate these steps into an R function, we can write the following function to find the minimizer.
  10. We start by defining our function, which we will call golden.method(), and set arguments to specify the pre-set tolerance and the initial values for a and b. By default, let's set the tolerance value to 0.000001, as given in the following code:
    golden.method <- function(f, a, b, tolerance=0.000001){
      
      # store the golden ratio in the object psi
      psi <- (1 + sqrt(5))/2
      
      #Calculate value for x1 and x2 using the formulas we defined 
      x1 <- b - (b - a)/psi
      x2 <- a + (b - a)/psi
    
      #Find the value for f(x1) and f(x2)
      fx1 <- f(x1)
      fx2 <- f(x2)
    
    # Repeat test for the minimizer while the absolute difference
    # between b and a is greater than the set threshold
    
      while(abs(b-a) > tolerance){
    
      # Use an if statement to test if the minimizer is to the  
      # left of x2 else it is to the right of x1
    
        if(fx2 > fx1){
          
          # Since the minimizer is to the left of x2, we can
          # re-use x2 for b' (defined as b in the loop)
          
          b <- x2
          
          # We can reuse x1 for x2 for the next iteration of loop 
          
          x2 <- x1
          
          # Since we re-use x1 for x2 we can use the value stored 
          # in fx1 for fx2
          
          fx2 <- fx1
          
          # Now we calculate a new value for x1 and f(x1) to test 
          # in the next iteration of the loop
          
          x1 <- b - (b - a)/psi
          
          fx1 <- f(x1)
          
        } else {
        
          # Similarly, since the minimizer is to the right of x1,   
          # we can re-use the value x1 for a 
          
          a <- x1
          
          # We can re-use x2 for x1 in the next iteration of the loop
          
          x1 <- x2
          fx1 <- fx2
          
          # We calculate a new value for x2 and f(x2) to test in 
          # the next iteration of the loop
          
          x2 <- a + (b - a)/psi
          fx2 <- f(x2)
          
        }
      }
      
      # When the absolute difference between b and a is below our 
      # set threshold, we can use the midpoint of the final 
      # [a', b'] interval as an estimate for the true minimizer
      
      minimizer <- (a + b)/2
      max.error <- (b - a)/2
      
      #return the minimizer and the maximum error in a list
      
      results <- list(minimizer=minimizer, maximum.error=max.error)
      
      return(results)
    } 
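
Before trying golden.method() on our example, we can quickly verify the reuse property derived in step 8. This is just a numerical sketch using arbitrary illustrative values for a and b:

> psi <- (1 + sqrt(5))/2
> a <- 1; b <- 3
> x1 <- b - (b - a)/psi
> x2 <- a + (b - a)/psi
> x1.new <- b - (b - x1)/psi   # the next x1 after a is replaced by x1
> all.equal(x1.new, x2)        # the old x2 can be reused as the new x1
[1] TRUE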

Now let's give our golden.method() function a try with f(x) = |x - 2.5| + (x - 1)^2.

First, let's take a look at the curve for f(x) using the curve() function to help determine the best [a, b] interval to start from in our golden.method() function, as follows:

# Define the function f
> f <- function(x) { 
  abs(x - 2.5) + (x - 1)^2 
}

#Plot the curve
> curve(f, from=0, to=5)

The result is shown in the following plot:

[Figure: the curve of f(x) = |x - 2.5| + (x - 1)^2 plotted from 0 to 5]

From the previous plot, it seems that the minimizer is located between 1 and 3, so we will set a=1 and b=3 in our golden.method() function, as follows:

> golden.method(f, a=1, b=3)
$minimizer
[1] 1.5

$maximum.error
[1] 3.321874e-07

You can also add the minimizer to the graph as follows:

> curve(f, from=0, to=5, cex.lab=1.5)
> res <- golden.method(f, a=1, b=3)
> points(res$minimizer,f(res$minimizer))

The result is shown in the following plot:

[Figure: the curve of f(x) with the minimizer at x = 1.5 marked as a point]

Now let's use the golden.method() function to maximize g(x) = |x - 2.5| - |x - 1| - |x - 0.5| using the following lines of code:

# Define the function f
> g <- function(x) {
  abs(x-2.5) - abs(x -1) - abs(x -0.5)
}

#Plot the curve
> curve(g, from=-10, to=10)

The result is shown in the following plot:

[Figure: the curve of g(x) plotted from -10 to 10]

Now, to maximize g(x), we simply need to minimize -g(x) by creating a function that calculates negative g and using it as input to our golden.method() function. Let's take a look at this in the following lines of code:

> h <- function(x) { 
  -g(x)
}

# We know from the curve the minimizer is between [-5, 5]
> golden.method(h, a=-5, b=5)
$minimizer
[1] 0.4999999

$maximum.error
[1] 3.92094e-07

Note

In the previous example, we had to write a new function that calculates –g(x), because if we write golden.method(-g, a=-5, b=5), we will get Error in -g : invalid argument to unary operator.
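
As a sketch of an alternative that avoids defining h() separately, you can pass an anonymous wrapper function directly to golden.method(); this is equivalent to the call above and returns the same result:

> golden.method(function(x) -g(x), a=-5, b=5)
$minimizer
[1] 0.4999999

$maximum.error
[1] 3.92094e-07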

The optimize() function

Instead of using the golden.method() function we defined, we could also use the built-in optimize() function. This function uses a combination of the golden section search method and successive parabolic interpolation, which we will not go into detail about here. Nevertheless, we will show you how to minimize our first example, f(x) = |x - 2.5| + (x - 1)^2, with the optimize() function. Let's take a look at this in the following lines of code:

> f <- function(x) { 

abs(x - 2.5) + (x - 1)^2 
}

Then, we use the optimize function using the interval argument to define a and b and the tol argument to set the tolerance. Let's take a look at this in the following lines of code:

> optimize(f, interval=c(1, 3), tol=0.000001)
$minimum
[1] 1.5

$objective
[1] 1.25

As you can see, we get the same answer as before, that is, 1.5. The function also returns $objective, which corresponds to the value of f(x) at that point. We could have also defined the interval with the lower and upper arguments instead as follows:

> optimize(f, lower=1, upper=3, tol=0.000001)
$minimum
[1] 1.5

$objective
[1] 1.25

Another useful feature of the optimize() function is its ability to directly maximize a function without the user needing to specify the negative of the function, as we did earlier. Instead, you just need to specify maximum=TRUE when you use the optimize() function. Let's use it to maximize g(x) from our earlier example, as shown in the following lines of code:

> g <- function(x) {
   abs(x-2.5) - abs(x -1) - abs(x -0.5)
 }
> optimize(g, lower=-5, upper=5, tol=0.000001, maximum=TRUE)
$maximum
[1] 0.5

$objective
[1] 1.5

The Newton-Raphson method

The Newton-Raphson method is a derivative-based numerical method used to solve one-dimensional optimization problems. Using an initial point and the derivative information of the objective function, you can obtain the minimum of f(x). The method uses a linear approximation to find the root of an equation. Given that f(x) is a convex, "well-behaved" function and r is a root of the equation f(x) = 0, we can estimate the true root r with an initial guess x0, where r = x0 + h and h is a measure of how far x0 is from r. If h is very small and near 0, we can use a Taylor series approximation to solve the equation as follows:

0 = f(r) = f(x0 + h) ≈ f(x0) + h f'(x0), which gives h ≈ -f(x0)/f'(x0)

Therefore, r = x0 + h can be approximated as follows:

r = x0 + h ≈ x0 - f(x0)/f'(x0)

This suggests a potentially better estimate for r, which we call x1 and define as follows:

x1 = x0 - f(x0)/f'(x0)

If we repeat this process multiple times, the subsequent estimates for r will be defined as follows:

xn+1 = xn - f(xn)/f'(xn)

This process is repeated until f(xn) is near 0 or until the change between successive estimates falls below a pre-set tolerance ε, that is, |xn+1 - xn| < ε. Therefore, as long as the minimizer of f(x) lies inside the interval [a, b] (and not at a or b) and satisfies f'(x) = 0, we can find the minimizer of f(x) by applying the same scheme to f'(x), using the following formula:

xn+1 = xn - f'(xn)/f''(xn)

However, we must also check that the value we find is actually a minimizer of the function and not a maximizer or an inflection point. So, to guarantee that our solution is a minimizer, we check whether f''(x) > 0 at the solution.
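
As a quick hand-rolled illustration of this update formula (not the newton.method() function we write below), consider the quadratic q(x) = (x - 3)^2, an illustrative example that is not from the text: since q'(x) = 2(x - 3) and q''(x) = 2, a single Newton step from any starting point lands exactly on the minimizer x = 3, and q''(3) = 2 > 0 confirms that it is a minimum.

> qp <- function(x) 2*(x - 3)    # q'(x)
> qpp <- function(x) 2           # q''(x) is constant
> x0 <- 10
> x0 - qp(x0)/qpp(x0)            # one Newton step
[1] 3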

Now let's use the Newton-Raphson method to minimize the function f(x) = exp(-x^2) + x^3.

The first derivative of f(x) is as follows:

f'(x) = 3x^2 - 2x exp(-x^2)

The second derivative of f(x) is as follows:

f''(x) = 4x^2 exp(-x^2) - 2 exp(-x^2) + 6x

Before we begin, we will plot the function to determine the best initial estimate for x0. Let's take a look at this in the following lines of code:

> f <- function(x) {
  exp(-x^2) + x^3
 }
> curve(f, from=-1, to=4)

The result is shown in the following plot:

[Figure: the curve of f(x) = exp(-x^2) + x^3 plotted from -1 to 4]

From the curve, a reasonable estimate for x0 is 1. Now, we will write a function to implement the Newton-Raphson method to minimize the function.

We include the N argument to define the maximum number of iterations we should run, which we default to 100, as follows:

newton.method <- function(f, ff, fff, x0, tol=0.000001, N=100){
  
  # We start the counter at 1
  
  i <- 1
  
  # We create vectors to store the estimates and the values for
  # f(x), f'(x) and f''(x)
  
  estimates <- numeric(N)
  fvalue <- numeric(N)
  ffvalue <- numeric(N)
  fffvalue <- numeric(N)
  
  while(i <= N){
    
    # We use the Newton-Raphson formula to estimate the minimizer
    
    x1 <- x0 - ff(x0)/fff(x0)
    
    # We store the estimate and the values for f(x), f'(x) and f''(x)
    
    estimates[i] <- x1
    fvalue[i] <- f(x1)
    ffvalue[i] <- ff(x1)
    fffvalue[i] <- fff(x1)
    
    # We update the counter
    
    i <- i + 1
    
    # We break from the loop if the change drops below our pre-set
    # tolerance
    
    if(abs(x0 - x1) < tol) break
    
    # Before the next iteration of the loop, we replace x0 with
    # the value x1
    
    x0 <- x1
    
  }
  
  # We return a data frame of all the estimates and the values for
  # f(x) and its first and second derivatives
  
  estimates <- estimates[1:(i-1)]
  fvalue <- fvalue[1:(i-1)]
  ffvalue <- ffvalue[1:(i-1)]
  fffvalue <- fffvalue[1:(i-1)]
  
  df <- as.data.frame(cbind(estimates, fvalue, ffvalue, fffvalue))
  
  return(df)
  
}

Now, let's use the function we wrote to minimize our function stored in f. First, we need to create functions for f'(x) and f''(x), which we will store in fp and fpp, respectively.

For the first derivative, f'(x) = 3x^2 - 2x exp(-x^2), we use the following lines of code:

fp <- function(x) {
  3*x^2 - 2*x*exp(-x^2) 
} 

For the second derivative, f''(x) = 4x^2 exp(-x^2) - 2 exp(-x^2) + 6x, we use the following lines of code:

fpp <- function(x) {
  4*x^2*exp(-x^2) - 2*exp(-x^2) + 6*x
}

Next, we use the newton.method() function with x0=1 using f, fp, and fpp, as follows:

> newton.method(f, fp, fpp, x0=1)
  estimates    fvalue       ffvalue fffvalue
1 0.6638477 0.9361433  4.675900e-01 3.830410
2 0.5417746 0.9046560  7.262741e-02 2.634812
3 0.5142100 0.9036205  3.761765e-03 2.361857
4 0.5126173 0.9036175  1.255900e-05 2.346086
5 0.5126120 0.9036175  1.418679e-10 2.346033
6 0.5126120 0.9036175 -1.110223e-16 2.346033

From the output, we can see that our minimizer is 0.5126120, and since f''(0.5126120) = 2.346033 (the fffvalue column) is positive, we can conclude that it is a local minimum. We can check our solution with the optimize() function in the [0, 2] interval as follows:

> optimize(f, interval=c(0, 2), tol=0.000001)
$minimum
[1] 0.5126119

$objective
[1] 0.9036175

As you can see, we get a pretty close value of 0.5126120 compared to 0.5126119 using the optimize() function.

There is also a Newton-Raphson function, newtonraphson(), which is part of the spuRs package. Let's use it to minimize the function as follows:

> install.packages("spuRs")
> library("spuRs")

To be able to use the newtonraphson() function to minimize our function, we need to create a function that returns the values of f'(x) and f''(x) in a vector, which we pass as the ftn argument.

Since minimizing f(x) amounts to applying the Newton-Raphson root-finding scheme to f'(x), with f''(x) as its derivative, we store these two values in separate variables that we combine in a results vector returned to the newtonraphson() function. Let's write the ftn argument function and call it minimizer.ftn() as follows:

minimizer.ftn <- function(x){
  ffvalue <- 3*x^2 - 2*x*exp(-x^2)                  # f'(x)
  fffvalue <- 4*x^2*exp(-x^2) - 2*exp(-x^2) + 6*x   # f''(x)
  results <- c(ffvalue, fffvalue)
  return(results)
}

Now we run the analysis setting x0=1 for our initial estimate and tol=0.000001 for our pre-set tolerance as follows:

>  newtonraphson(minimizer.ftn, x0=1, tol=0.000001)
At iteration 1 value of x is: 0.6638477 
At iteration 2 value of x is: 0.5417746 
At iteration 3 value of x is: 0.51421 
At iteration 4 value of x is: 0.5126173 
At iteration 5 value of x is: 0.512612 
Algorithm converged
[1] 0.512612

We get almost the same answer as with our newton.method() function.

The Nelder-Mead simplex method

An alternative derivative-free approach to the Newton-Raphson method is the Nelder-Mead simplex method. It is a nonlinear optimization technique used to minimize an objective function in multiple dimensions. By default, the optim() function, which is part of base R, uses the Nelder-Mead simplex method to minimize your function of choice. Let's use the optim() function with its default settings to minimize f(x) = exp(-x^2) + x^3. To set the initial estimate x0 in the optim() function, we use the par argument. In this case, we set par to 1 as follows:

> f <- function(x) {
 exp(-x^2) + x^3
 }

> optim(par=1, fn=f)
$par
[1] 0.5126953

$value
[1] 0.9036175

$counts
function gradient 
      26       NA 

$convergence
[1] 0

$message
NULL

Warning message:
In optim(1, f) : one-dimensional optimization by Nelder-Mead is unreliable:
use "Brent" or optimize() directly

From the output, you will notice that the minimizer is given in $par. You will also notice that there is a warning message telling us to use the Brent method or the optimize() function directly, since these yield more accurate results for one-dimensional problems. We can apply the Brent method directly by specifying method="Brent" and the [a, b] interval with the upper and lower arguments, as follows:

> optim(1, f, method="Brent", lower=0, upper=2)
$par
[1] 0.512612

$value
[1] 0.9036175

$counts
function gradient 
      NA       NA 

$convergence
[1] 0

$message
NULL

As you can see, with the Brent method we get the same value we found earlier, and a more precise result than the one returned by the Nelder-Mead method. However, it is worth mentioning that the Nelder-Mead approach is a heuristic search method that is preferentially used to solve problems that can't easily be solved with other methods.

The Nelder-Mead method is also used to solve multidimensional problems. For example, let's minimize the Rosenbrock function, which is defined as follows:

f(x, y) = (1 - x)^2 + 100(y - x^2)^2

First, we store the function in an object called rosenbrock.f, as follows:

> rosenbrock.f <- function(x1,y1) {
 (1-x1)^2 + 100*(y1 - x1^2)^2
 }

Then, we can plot the function as follows:

  1. First, we define the x and y values to plot:
    > x <- seq(-3, 3, by=0.2)
    > y <- seq(-2, 3, by=0.2)
  2. Then, we obtain the outer product of the x and y vectors using the outer() function as follows:
    > z <- outer(x, y, rosenbrock.f)
  3. Next, we plot the function using the persp() function as follows:
    > persp(x, y, z, phi=40, theta=40, col="turquoise", shade=.000001, ticktype="detailed", cex.lab=1.5, zlab="")

The result is shown in the following plot:

[Figure: perspective plot of the Rosenbrock function f(x, y)]

One of the properties of the Rosenbrock function is that it is always positive except at x = 1 and y = 1, where it is 0; so, (1, 1) is the minimum of this function. Let's see how close we get to the minimum using the Nelder-Mead method with the optim() function.

To be able to use the optim() function, we need to rewrite rosenbrock.f so that x and y are respectively stored in the first and second index of a single vector x as follows:

> rosenbrock.f2 <- function(x) {
 (1-x[1])^2 + 100*(x[2]-x[1]^2)^2
}

Now we can use the optim() function to minimize the Rosenbrock function. This time, we must specify coordinates (x0, y0) for the initial estimate of the minimizer using a vector for the par argument. In this case, we will guess (0.7, 0.7) for the initial values of the parameters to be optimized, as follows:

> optim(par=c(0.7,0.7), rosenbrock.f2)
$par
[1] 1.000065 1.000120

$value
[1] 1.499038e-08

$counts
function gradient 
      71       NA 

$convergence
[1] 0

$message
NULL

As you can see, we get values pretty close to (1, 1). If we set the initial parameters to (1.5, 1.5), we get similar results as follows:

> optim(par=c(1.5,1.5), rosenbrock.f2)
$par
[1] 0.9995667 0.9991578

$value
[1] 2.463013e-07
# Output truncated here

It is also recommended that you check out the neldermead package if you are interested in applying variants of the Nelder-Mead method to your optimization problems. The neldermead package lets you manage settings specific to the Nelder-Mead method, such as the method used to compute the initial simplex and the specific termination criteria. The neldermead package also allows you to apply the fixed shape simplex method of Spendley, Hext, and Himsworth, the variable shape simplex method of Nelder and Mead, and the Box complex method. You can read more on this package at http://cran.r-project.org/web/packages/neldermead/vignettes/neldermead_manual.pdf.

More optim() features

The optim() function allows you to use a variety of methods to minimize functions, which are given in the following table:

The optim() function method name    Full method name
--------------------------------    ---------------------------------------
"Nelder-Mead"                       Nelder and Mead
"BFGS"                              Broyden, Fletcher, Goldfarb, and Shanno
"CG"                                Conjugate gradient
"L-BFGS-B"                          Limited-memory BFGS
"SANN"                              Simulated annealing
"Brent"                             Brent
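
As a brief sketch of switching methods, assuming the rosenbrock.f2 function defined earlier is still in your workspace, you only need to change the method argument; with BFGS the result typically lands very close to the true minimum at (1, 1):

> optim(par=c(0.7, 0.7), rosenbrock.f2, method="BFGS")$par
# values very close to c(1, 1)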

For example, we could minimize the Rosenbrock function using the simulated annealing method by setting method="SANN". The simulated annealing method is a stochastic global optimization method that can also be used for non-differentiable functions. It performs random jumps around the starting point to explore its vicinity and progressively narrows the jumps around a point until it finds the minimum. Since its output depends on the random number generator, you will get slightly different values each time you run the method, unless you set the seed first so that you can reproduce the results later. Let's take a look at the following lines of code:

> set.seed(267)

> optim(par=c(0.7,0.7), rosenbrock.f2, method="SANN")
$par
[1] 1.019317 1.039894

$value
[1] 0.000451745

#Output truncated

The optim() function also allows you to apply quasi-Newton methods to solve your optimization problems. For example, let's use the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm to minimize f(x, y) = (x - 2)^2 + (y - 1)^2.

First, let's plot the function using the lattice package instead of the persp() function as follows:

# We store the function f(x, y) in h using the first index
# for x and the second index for y

h <- function(x){
  (x[1]-2)^2 + (x[2] -1)^2
}

# We get values for x[1] and x[2] and store them in separate vectors

> x1 <- seq(-4, 4, by=0.5)
> x2 <- seq(-4, 4, by=0.5)

# We use expand.grid() to create a matrix from all combinations of both vectors

> mat <- as.matrix(expand.grid(x1, x2))

# We rename the columns to x1 and x2
> colnames(mat) <- c("x1", "x2")

# Calculate f(x, y) and store in z by applying the function
# h on the matrix by row 

> z <- apply(mat, 1, h)

# We create a data frame containing x1, x2 and z

> df <- data.frame(mat, z)

# We load the lattice package and use wireframe function to plot

> library(lattice) 
> wireframe(z ~ x1 * x2 , data=df, shade = TRUE, scales = list(arrows = FALSE), screen = list(z = -35, x = -50))

The result is shown in the following plot:

[Figure: wireframe plot of f(x, y) = (x - 2)^2 + (y - 1)^2]

If you prefer to plot the function using the persp() function, use the following lines of code:

# Script to use to plot f(x, y) with the persp() function
> hfn <- function(x, y){
 (x-2)^2 + (y -1)^2
 }
> x <- seq(-5, 5, by=0.2)
> y <- seq(-5, 5, by=0.2)
> z <- outer(x, y, hfn)


> persp(x,y,z,phi=35,theta=50,col="purple",shade=.00000001, ticktype="detailed")

# Plot not shown

From this plot, we can use (0, 0) for the initial parameters in the optim() function. This method does not require the Hessian matrix of second derivatives to be calculated directly. Instead, the Hessian matrix is approximated using low-rank updates obtained from (approximate) gradient evaluations. The optim() function allows us to return a numerically differentiated Hessian matrix with hessian=TRUE and to supply a function that returns the gradient with the gr argument. For simplicity, we will leave the default setting gr=NULL, which uses a finite-difference approximation for the gradient. Let's apply these conditions to minimize the function as follows:

> optim(par=c(0,0), h, method="BFGS", hessian = T)
$par
[1] 2 1

$value
[1] 6.357135e-26

$counts
function gradient 
       9        3 

$convergence
[1] 0

$message
NULL

$hessian
     [,1] [,2]
[1,]    2    0
[2,]    0    2

As expected, the minimizer is (2, 1).
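
If you prefer to avoid the finite-difference approximation, you can also pass the analytic gradient through the gr argument. Here is a minimal sketch for our function h, whose gradient is (2(x - 2), 2(y - 1)); the name h.grad is an illustrative choice, not from the text:

> h.grad <- function(x) c(2*(x[1] - 2), 2*(x[2] - 1))
> optim(par=c(0, 0), h, gr=h.grad, method="BFGS", hessian=TRUE)
# returns essentially the same solution, with $par close to c(2, 1)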
