Often, in scientific computing, we are required to find the value of x for which a function f(x) will attain a maximum or minimum value. In other words, we want to maximize or minimize f(x). This process is termed as numerical optimization and can be summarized as follows:
In the preceding formula, x represents a vector of variables also known as the unknowns or parameters, f is the function of x we want to maximize or minimize known as the objective function, zi is the constraint functions that x must fulfill, and N and M are sets of indices. Optimization problems are used in mathematics, finance, and computer science to find the best solution from all feasible solutions. We can simplify maximization optimization problems to minimization problems by remembering that the maximum of f(x) is essentially the minimum of -f(x). Therefore, to maximize the function f, we simply need to minimize f. In this chapter, you will learn different methods and functions used to perform numerical optimization in R.
In this chapter, we will cover the following topics:
optim()
functionlp()
functionLet's start by minimizing the following function:
In the preceding example, the function f has no constraints, so we can select variables within its full range freely. To solve this problem, we will apply the golden section search method, which is a one-dimensional unconstrained optimization method.
One-dimensional unconstrained optimization is used to minimize the function of the type . The golden section search method is a zero-line search method used to solve functions of the type . This method uses the values of the objective function f and not their derivatives, making this type of solution best applicable to minimize the function since f(x) is not differentiable at x = 2.5.
The golden section search method uses an interval reduction strategy independent of the number of iterations, where the ratio between the sizes of two consecutive intervals is constant and makes use of the golden ratio . The golden ratio is defined algebraically as follows:
Here, .
Basically, it is an iterative method that minimizes the function by:
Let's take a look at the following steps to perform the Golden search method to initialize f(x):
golden.method()
, and set arguments to specify the pre-set the tolerance and initial values for a and b. By default, let's set the tolerance value to 0.000001 as given in the following code:golden.method <- function(f, a, b, tolerance=0.000001){ # store the golden ratio in the object psi psi <- (1 + sqrt(5))/2 #Calculate value for x1 and x2 using the formulas we defined x1 <- b - (b - a)/psi x2 <- a + (b - a)/psi #Find the value for f(x1) and f(x2) fx1 <- f(x1) fx2 <- f(x2) # Repeat test for the minimizer while the absolute difference # between b and a is greater than the set threshold while(abs(b-a) > tolerance){ # Use an if statement to test if the minimizer is to the # left of x2 else it is to the right of x1 if(fx2 > fx1){ # Since the minimizer is to the left of x2, we can re- # use x2 for b' (defined b in loop) b <- x2 # We can reuse x1 for x2 for the next iteration of loop x2 <- x1 # Since we re-use x1 for x2 we can use the value stored # in fx1 for fx2 fx2 <- fx1 # Now we calculate a new value for x1 and f(x1) to test # in the next iteration of the loop x1 <- b - (b - a)/psi fx1 <- f(x1) } else { # Similarly, since the minimizer is to the right of x1, # we can re-use the value x1 for a a <- x1 # We can re-use x2 for x1 in the next iteration of the loop x1 <- x2 fx1 <- fx2 # We calculate a new value for x2 and f(x2) to test in # the next iteration of the loop x2 <- a + (b - a)/psi fx2 <- f(x2) } } # When the absolute difference between b and a is below our # set threshold, we can use the midpoint of the final # [a', b'] interval as an estimate for the true minimize minimizer <- (a + b)/2 max.error <- (b - a)/2 #return the minimizer and the maximum error in a list results <- list(minimizer=minimizer, maximum.error=max.error) return(results) }
Now let's give our golden.method()
function a try with .
First, let's take a look at the curve for f(x) using the curve()
function to help determine the best [a, b] interval to start from in our golden.method()
function, as follows:
# Define the function f > f <- function(x) { abs(x - 2.5) + (x - 1)^2 } #Plot the curve > curve(f, from=0, to=5)
The result is shown in the following plot:
From the previous plot, it seems that the minimizer is located between 1 and 3, so we will set a=1
and b=3
in our golden.method()
function, as follows:
> golden.method(f, a=1, b=3) $minimizer [1] 1.5 $maximum.error [1] 3.321874e-07
You can also add the minimizer to the graph as follows:
> curve(f, from=0, to=5, cex.lab=1.5) > res <- golden.method(f, a=1, b=3) > points(res$minimizer,f(res$minimizer))
The result is shown in the following plot:
Now let's use the golden.method()
function to maximize using the following lines of code:
# Define the function f > g <- function(x) { abs(x-2.5) - abs(x -1) - abs(x -0.5) } #Plot the curve > curve(g, from=-10, to=10)
The result is shown in the following plot:
Now, to maximize g(x), we simply need to minimize –g(x) by creating a function to calculate negative g and use it as input into our golden.method()
function. Let's take a look at this in the following lines of code:
> h <- function(x) { -g(x) } # We know from the curve the minimizer is between [-5, 5] > golden.method(h, a=-5, b=5) $minimizer [1] 0.4999999 $maximum.error [1] 3.92094e-07
Instead of using the golden.method()
function we defined, we could also use the built-in optimize()
function. This function uses a combination of the golden search method and successive parabolic interpolation, which we will not go into detail about here. Nevertheless, we will show you how to use the function to minimize our first example, , with the optimize()
function. Let's take a look at this in following lines of code:
> f <- function(x) { abs(x - 2.5) + (x - 1)^2 }
Then, we use the optimize function using the interval
argument to define a and b and the tol
argument to set the tolerance. Let's take a look at this in the following lines of code:
> optimize(f, interval=c(1, 3), tol=0.000001) $minimum [1] 1.5 $objective [1] 1.25
As you can see, we get the same answer as before, that is, 1.5. The function also returns $objective
, which corresponds to the value of f(x) at that point. We could have also defined the interval with the lower
and upper
arguments instead as follows:
> optimize(f, lower=1, upper=3, tol=0.000001) $minimum [1] 1.5 $objective [1] 1.25
Another useful feature of the optimize()
function is its ability to directly maximize a function without the user needing to specify the negative of the function, like we did earlier. Instead, you just need to specify maximum=TRUE
when you use the optimize
function. Let's use the optimize
function to maximize from our earlier example. Let's take a look at this in the following lines of code:
> g <- function(x) { abs(x-2.5) - abs(x -1) - abs(x -0.5) } > optimize(g, lower=-5, upper=5, tol=0.000001, maximum=TRUE) $maximum [1] 0.5 $objective [1] 1.5
The Newton-Raphson method is a derivative-based numerical method used to solve one-dimensional optimization problems. By using an initial point and the derivative information of the objective
function, you can obtain the minimum for f(x). It uses linear approximation to solve the root of the equation. Given that f(x) is a convex function, otherwise known as a "well-behaved" function, and r is the root of the equation f(x) = 0, we can estimate the true root r with x0, where r = x0 + h and h is the measure of how far x0 is from r. If h is very small and near 0, we can use a Taylor series approximation to solve the equation as follows:
Therefore, r = x0 + h can be approximated as follows:
This approximation can be done with a potentially better estimate for r being x1 defined as follows:
If we repeat this process multiple times, the subsequent estimates for r will be defined as follows:
This process will be repeated until is near 0 or a pre-set tolerance ε such that . Therefore, as long as the minimizer for f(x) is in the interval [a, b] and not a or b and satisfies f'(x) = 0, we can find the minimizer of f(x) using this approach by using the following formula:
However, we must also check that the value we find is actually a minimizer of the function and not just a maximizer or a point of inflection. So, to guarantee that our solution is a minimizer, we check whether .
Now let's use the Newton-Raphson method to minimize the function .
The first derivative of f(x) is as follows:
The second derivative of f(x) is as follows:
Before we begin, we will plot the function to determine the best initial estimate for x0. Let's take a look at this in the following lines of code:
> f <- function(x) { exp(-x^2) + x^3 } > curve(f, from=-1, to=4)
The result is shown in the following plot:
From the curve, a reasonable estimate for x0 is 1. Now, we will write a function to implement the Newton-Raphson method to minimize the function.
We include the N
argument to define the maximum number of iterations we should run, which we default to 100
, as follows:
newton.method <- function(f, ff, fff, x0, tol=0.000001, N=100){ # We start the counter at 1 i <- 1 # We create a vector to store the estimates and values for # f(x), f'(x) and f''(x) estimates <- numeric(N) fvalue <- numeric(N) ffvalue <- numeric(N) fffvalue <- numeric(N) while(i <= N){ # We use the Newton-Raphson formula to estimate the # minimizer x1 <- x0 - ff(x0)/fff(x0) # We store the estimates and values for f(x), f'(x) and # f''(x) estimates[i] <- x1 fvalue[i] <- f(x1) ffvalue[i] <- ff(x1) fffvalue[i] <- fff(x1) # We update the counter i <- i + 1 # We break from the loop if we reach below our pre-set # tolerance if(abs(x0 - x1) < tol) break # Before the next iteration of the loop we replace x0 with # the value x1 x0 <- x1 } # We return a dataframe of all the estimates and values for # the first & second derivative estimates <- estimates[1:(i-1)] fvalue <- fvalue[1:(i-1)] ffvalue <- ffvalue[1:(i-1)] fffvalue <- fffvalue[1:(i-1)] df <- as.data.frame(cbind(estimates, fvalue, ffvalue, fffvalue)) return(df) }
Now, let's use the function we wrote to minimize our function stored in f
. First, we need to create a function for and , which we will store in fp
and fpp
, respectively.
For the first derivative , we use the following lines of code:
fp <- function(x) { 3*x^2 - 2*x*exp(-x^2) }
For the second derivative , we use the following lines of code:
fpp <- function(x) { 4*x^2*exp(-x^2 )- 2*exp(-x^2 )+ 6*x }
Next, we use the newton.method()
function with x0=1
using f
, fp
, and fpp
, as follows:
> newton.method(f, fp, fpp, x0=1) estimates fvalue ffvalue fffvalue 1 0.6638477 0.9361433 4.675900e-01 3.830410 2 0.5417746 0.9046560 7.262741e-02 2.634812 3 0.5142100 0.9036205 3.761765e-03 2.361857 4 0.5126173 0.9036175 1.255900e-05 2.346086 5 0.5126120 0.9036175 1.418679e-10 2.346033 6 0.5126120 0.9036175 -1.110223e-16 2.346033
From the output, we can see that our minimum is 0.5126120, and since is positive, we can conclude that it is a local minimum. We can check our solution with the optimize()
function in the [0, 2] interval as follows:
> optimize(f, interval=c(0, 2), tol=0.000001) $minimum [1] 0.5126119 $objective [1] 0.9036175
As you can see, we get a pretty close value of 0.5126120 compared to 0.5126119 using the optimize()
function.
There is also a Newton-Raphson function, ()
, which is part of the spuRs
package. Let's use it to minimize the function as follows:
> install.packages("spuRs") > library("spuRs")
To be able to use the newtonraphson()
function to minimize our equation, we need to create a function that will return the value for and in a vector as the fftn
argument.
Since we need to apply the Newton-Raphson method on the and function, we store these values in separate variables that we will combine in a results
vector to return to the newtonraphson()
function. Let's write the ftn
argument function and call it minimizer.ftn()
as follows:
minimizer.ftn <- function(x){ ffvalue <- 3*x^2 - 2*x*exp(-x^2) fffvalue <- 4*x^2*exp(-x^2 )- 2*exp(-x^2 )+ 6*x results <- c(ffvalue, fffvalue) return(results) }
Now we run the analysis setting x0=1
for our initial estimate and tol=0.000001
for our pre-set tolerance as follows:
> newtonraphson(minimizer.ftn, x0=1, tol=0.000001) At iteration 1 value of x is: 0.6638477 At iteration 2 value of x is: 0.5417746 At iteration 3 value of x is: 0.51421 At iteration 4 value of x is: 0.5126173 At iteration 5 value of x is: 0.512612 Algorithm converged [1] 0.512612
We get almost the same answer as with our newton.method()
function.
An alternative derivative-free approach to the Newton-Raphson method is the Nelder-Mead simplex method. It is a nonlinear optimization technique to minimize an objective function in multiple dimensions. By default, the optim()
function, which is part of the basic functions in R, uses the Nelder-Mead simplex method to minimize your function of choice. Let's use the optim()
function with its default settings to solve . To set the initial estimate x0 in the optim()
function, we use the par
argument. In this case, we set par
to 1
as follows:
> f <- function(x) { exp(-x^2) + x^3 } > optim(par=1, fn=f) $par [1] 0.5126953 $value [1] 0.9036175 $counts function gradient 26 NA $convergence [1] 0 $message NULL Warning message: In optim(1, f) : one-dimensional optimization by Nelder-Mead is unreliable: use "Brent" or optimize() directly
From the output, you will notice that the minimizer is given in $par
. You will also notice that there is a warning message to use the Brent method or optimize()
function directly since it yields more accurate results. We can apply the Brent method directly by specifying method="Brent"
and the [a, b] interval with the upper
and lower
arguments, as follows:
> optim(1, f, method="Brent", lower=0, upper=2) $par [1] 0.512612 $value [1] 0.9036175 $counts function gradient NA NA $convergence [1] 0 $message NULL
As you can see, we get the same value we found earlier with the Brent method compared to the Nelder-Mead method. However, it is worth mentioning that the Nelder-Mead approach is a heuristic search method that is preferentially used to solve problems that can't easily be solved with other methods.
The Nelder-Mead method is also used to solve multidimensional problems. For example, let's minimize the rosenbrock
function, which is defined as follows:
First, we store the function in an object called rosenbrock.f
, as follows:
> rosenbrock.f <- function(x1,y1) { (1-x1)^2 + 100*(y1 - x1^2)^2 }
Then, we can plot the function as follows:
x
and y
values to plot:> x <- seq(-3, 3, by=0.2) > y <- seq(-2, 3, by=0.2)
x
and y
vectors using the outer()
function as follows:> z <- outer(x, y, rosenbrock.f)
persp()
function as follows:>persp(x,y,z,phi=40,theta=40,col="turquoise",shade=.000001,ticktype="detailed", cex.lab=1.5, zlab="")
The result is shown in the following plot:
One of the properties of the Rosenbrock function is that it is always positive except when and , where it is 0; so, (1, 1) is a minimum for this function. Let's see how close we get to the minimum using the Nelder-Mead method with the optim()
function.
To be able to use the optim()
function, we need to rewrite rosenbrock.f
so that x and y are respectively stored in the first and second index of a single vector x
as follows:
> rosenbrock.f2 <- function(x) { (1-x[1])^2 + 100*(x[2]-x[1]^2)^2 }
Now we can use the optim()
function to minimize the Rosenbrock function. This time, we must specify coordinates (x0, y0) for the initial estimate of the minimizer using a vector for the $par
argument. In this case, we will guess (0.7, 0.7)
for the initial values of the parameters to be optimized as follows:
> optim(par=c(0.7,0.7), rosenbrock.f2) $par [1] 1.000065 1.000120 $value [1] 1.499038e-08 $counts function gradient 71 NA $convergence [1] 0 $message NULL
As you can see, we get values pretty close to (1, 1)
. If we set the initial parameters to (1.5, 1.5)
, we get similar results as follows:
> optim(par=c(1.5,1.5), rosenbrock.f2) $par [1] 0.9995667 0.9991578 $value [1] 2.463013e-07 # Output truncated here
It is also recommended that you check out the neldermead
package if you are interested in applying variants of the Nelder-Mead method for your optimization problems. The neldermead
package provides you with the opportunity to manage settings specific to the Nelder-Mead method such as the method to compute the initial simplex, that is, the specific termination criteria. The neldermead
package also allows you to apply the fixed shape simplex method of Spendley, Hext, and Himsworth, the variable shape simplex method of Nelder and Mead, and the Box complex method. You can read more on this package at http://cran.r-project.org/web/packages/neldermead/vignettes/neldermead_manual.pdf.
The optim()
function allows you to use a variety of methods to minimize functions, which are given in the following table:
The optim() function method name |
Full method name |
---|---|
"Nelder-Mead" |
Nelder and Mead |
"BFGS" |
Broyden, Fletcher, Goldfarb, and Shanno |
"CG" |
Conjugate gradient |
"L-BFGS-B" |
Limited-memory BFGS |
"SANN" |
Simulated annealing |
"Brent" |
Brent |
For example, we could minimize the Rosenbrock function using the simulated annealing method by setting method="SANN"
. The simulated annealing method is a stochastic global optimization method that can also be used for non-differentiable functions. It performs random jumps around the starting point to explore its vicinity and progressively narrows the jumps around a point until it finds the minimum. Since its output depends on the random number generator, you will get slightly different values each time you run the method, unless you set the seed first to reproduce the data at a later point. Let's take a look at the following lines of code:
> set.seed(267) > optim(par=c(0.7,0.7), rosenbrock.f2, method="SANN") $par [1] 1.019317 1.039894 $value [1] 0.000451745 #Output truncated
The optim()
function also allows you to apply quasi-Newton methods to solve your optimization problems. For example, let's use the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm method to solve .
First, let's plot the function using the lattice
package instead of the persp()
function as follows:
# We store the function f(x, y) in h using the first index # for x and the second index for y h <- function(x){ (x[1]-2)^2 + (x[2] -1)^2 } # We get values for x[1] and x[2] and store them in separate vectors > x1 <- seq(-4, 4, by=0.5) > x2 <- seq(-4, 4, by=0.5) #We use the expand.grid() to create a matrix from all combinations of both vectors > mat <- as.matrix(expand.grid(x1, x2)) # We rename the columns to x1 and x2 > colnames(mat) <- c(«x1», «x2») # Calculate f(x, y) and store in z by applying the function # h on the matrix by row > z <- apply(mat, 1, h) # We create a date frame containing x1, x2 and z > df <- data.frame(mat, z) # We load the lattice package and use wireframe function to plot > library(lattice) > wireframe(z ~ x1 * x2 , data=df, shade = TRUE, scales = list(arrows = FALSE), screen = list(z = -35, x = -50))
The result is shown in the following plot:
If you prefer to plot the function using the persp()
function, use the following lines of code:
# Script to use to plot f(x, y) with the persp() function > hfn <- function(x, y){ (x-2)^2 + (y -1)^2 } > x <- seq(-5, 5, by=0.2) > y <- seq(-5, 5, by=0.2) > z <- outer(x, y, hfn) > persp(x,y,z,phi=35,theta=50,col="purple",shade=.00000001, ticktype="detailed") # Plot not shown
From this plot, we can use (0, 0) for the initial parameters in the optim()
function. This method does not require that the Hessian matrix of second derivatives be calculated directly. Instead, the Hessian matrix is approximated using rank-one updates obtained from approximate gradient evaluations. The optim()
function allows us to return a numerically differentiated Hessian matrix with hessian=TRUE
and supply a function to return the gradient with gr
. For simplicity, we will leave the default setting gr=NULL
, which uses a finite-difference approximation for the gradient. Let's apply these conditions to minimize the function as follows:
> optim(par=c(0,0), h, method="BFGS", hessian = T) $par [1] 2 1 $value [1] 6.357135e-26 $counts function gradient 9 3 $convergence [1] 0 $message NULL $hessian [,1] [,2] [1,] 2 0 [2,] 0 2
As expected, the minimizer is (2, 1).
18.226.170.187