6 Nonlinear Programming II: Unconstrained Optimization Techniques

6.1 Introduction

This chapter deals with the various methods of solving the unconstrained minimization problem:

(6.1) \[ \text{Find } X = \{x_1, x_2, \ldots, x_n\}^T \text{ which minimizes } f(X) \]

It is true that a practical design problem would rarely be unconstrained; still, a study of this class of problems is important for the following reasons:

  1. The constraints do not have significant influence in certain design problems.
  2. Some of the powerful and robust methods of solving constrained minimization problems require the use of unconstrained minimization techniques.
  3. The study of unconstrained minimization techniques provides the basic understanding necessary for the study of constrained minimization methods.
  4. The unconstrained minimization methods can be used to solve certain complex engineering analysis problems. For example, the displacement response (linear or nonlinear) of any structure under any specified load condition can be found by minimizing its potential energy. Similarly, the eigenvalues and eigenvectors of any discrete system can be found by minimizing the Rayleigh quotient.

As discussed in Chapter 2, a point X * will be a relative minimum of f(X) if the necessary conditions

(6.2) \[ \frac{\partial f}{\partial x_i}(X^*) = 0, \qquad i = 1, 2, \ldots, n \]

are satisfied. The point X * is guaranteed to be a relative minimum if the Hessian matrix is positive definite, that is,

(6.3) \[ [J]\big|_{X^*} = \left[ \frac{\partial^2 f}{\partial x_i \, \partial x_j}(X^*) \right] = \text{positive definite} \]

Equations (6.2) and (6.3) can be used to identify the optimum point during numerical computations. However, if the function is not differentiable, Eqs. (6.2) and (6.3) cannot be applied to identify the optimum point. For example, consider the function

\[ f(x) = \begin{cases} a x, & x \ge 0 \\ -b x, & x \le 0 \end{cases} \]

where a > 0 and b > 0. The graph of this function is shown in Figure 6.1. It can be seen that this function is not differentiable at the minimum point, x * = 0, and hence Eqs. (6.2) and (6.3) are not applicable in identifying x *. In all such cases, the commonly understood notion of a minimum, namely, f(X *) < f(X) for all X, can be used only to identify a minimum point. The following example illustrates the formulation of a typical analysis problem as an unconstrained minimization problem.


Figure 6.1 Function is not differentiable at minimum point.

6.1.1 Classification of Unconstrained Minimization Methods

Several methods are available for solving an unconstrained minimization problem. These methods can be classified into two broad categories as direct search methods and descent methods as indicated in Table 6.1. The direct search methods require only the objective function values but not the partial derivatives of the function in finding the minimum and hence are often called the nongradient methods. The direct search methods are also known as zeroth‐order methods since they use zeroth‐order derivatives of the function. These methods are most suitable for simple problems involving a relatively small number of variables. These methods are, in general, less efficient than the descent methods. The descent techniques require, in addition to the function values, the first and in some cases the second derivatives of the objective function. Since more information about the function being minimized is used (through the use of derivatives), descent methods are generally more efficient than direct search techniques. The descent methods are known as gradient methods. Among the gradient methods, those requiring only first derivatives of the function are called first‐order methods; those requiring both first and second derivatives of the function are termed second‐order methods.

Table 6.1 Unconstrained minimization methods.

Direct search methods^a          Descent methods^b
Random search method             Steepest descent (Cauchy) method
Grid search method               Fletcher–Reeves method
Univariate method                Newton's method
Pattern search methods           Marquardt method
Powell's method                  Quasi‐Newton methods
Simplex method                     Davidon–Fletcher–Powell method
                                   Broyden–Fletcher–Goldfarb–Shanno method

a Do not require the derivatives of the function.

b Require the derivatives of the function.

6.1.2 General Approach

All the unconstrained minimization methods are iterative in nature and hence they start from an initial trial solution and proceed toward the minimum point in a sequential manner as shown in Figure 5.3. The iterative process is given by

(6.4) \[ X_{i+1} = X_i + \lambda_i^* S_i \]

where X_i is the starting point, S_i is the search direction, λ_i^* is the optimal step length, and X_{i+1} is the final point in iteration i. It is important to note that all the unconstrained minimization methods (i) require an initial point X_1 to start the iterative procedure, and (ii) differ from one another only in the method of generating the new point X_{i+1} (from X_i) and in testing the point X_{i+1} for optimality.

6.1.3 Rate of Convergence

Different iterative optimization methods have different rates of convergence. In general, an optimization method is said to have convergence of order p if [2]

(6.5) \[ \frac{\| X_{i+1} - X^* \|}{\| X_i - X^* \|^p} \le k, \qquad k \ge 0, \; p \ge 1 \]

where X i and X i+1 denote the points obtained at the end of iterations i and i + 1, respectively, X * represents the optimum point, and ||X|| denotes the length or norm of the vector X:

\[ \| X \| = \sqrt{X^T X} = \left( x_1^2 + x_2^2 + \cdots + x_n^2 \right)^{1/2} \]

If p = 1 and 0 ≤ k < 1, the method is said to be linearly convergent (corresponds to slow convergence). If p = 2, the method is said to be quadratically convergent (corresponds to fast convergence). An optimization method is said to have superlinear convergence (corresponds to fast convergence) if

(6.6) \[ \lim_{i \to \infty} \frac{\| X_{i+1} - X^* \|}{\| X_i - X^* \|} = 0 \]

The definitions of rates of convergence given in Eqs. (6.5) and (6.6) are applicable to single‐variable as well as multivariable optimization problems. In the case of single‐variable problems, the vector, X i , for example, degenerates to a scalar, x i .

6.1.4 Scaling of Design Variables

The rate of convergence of most unconstrained minimization methods can be improved by scaling the design variables. For a quadratic objective function, the scaling of the design variables changes the condition number of the Hessian matrix. When the condition number of the Hessian matrix is 1, the steepest descent method, for example, finds the minimum of a quadratic objective function in one iteration.

If q = X^T [A] X denotes a quadratic term, a transformation of the form

(6.7) \[ X = [R] Y \]

can be used to obtain a new quadratic term as

(6.8) \[ \tilde{q} = Y^T [R]^T [A] [R] Y = Y^T [\tilde{A}] Y \]

The matrix [R] can be selected to make [Ã] = [R]T[A][R] diagonal (i.e. to eliminate the mixed quadratic terms). For this, the columns of the matrix [R] are to be chosen as the eigenvectors of the matrix [A]. Next the diagonal elements of the matrix [Ã] can be reduced to 1 (so that the condition number of the resulting matrix will be 1) by using the transformation

(6.9) \[ Y = [S] Z \]

where the matrix [S] is given by

(6.10) \[ [S] = \operatorname{diag}\left( \frac{1}{\sqrt{\tilde{a}_{11}}}, \frac{1}{\sqrt{\tilde{a}_{22}}}, \ldots, \frac{1}{\sqrt{\tilde{a}_{nn}}} \right) \]

Thus the complete transformation that reduces the Hessian matrix of f to an identity matrix is given by

(6.11) \[ X = [R][S] Z = [T] Z, \qquad [T] = [R][S] \]

so that the quadratic term q = X^T [A] X reduces to Z^T [I] Z = Z^T Z.
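
The transformation of Eqs. (6.7)–(6.11) can be checked numerically. The following Python sketch (the sample matrix [A] is an arbitrary illustrative choice, not one from the text) diagonalizes a quadratic term and rescales it so that the condition number of the transformed Hessian becomes 1:

```python
import numpy as np

# Illustrative positive definite matrix [A] of a quadratic term q = X^T [A] X.
A = np.array([[6.0, 2.0],
              [2.0, 3.0]])

# Eq. (6.7): the columns of [R] are the eigenvectors of [A],
# so [A~] = [R]^T [A] [R] is diagonal (the mixed terms vanish).
eigvals, R = np.linalg.eigh(A)
A_tilde = R.T @ A @ R

# Eqs. (6.9)-(6.10): [S] = diag(1/sqrt(a~_ii)) scales the diagonal entries to 1.
S = np.diag(1.0 / np.sqrt(np.diag(A_tilde)))

# Eq. (6.11): complete transformation X = [T] Z with [T] = [R][S].
T = R @ S
print(T.T @ A @ T)                   # ~ identity matrix
print(np.linalg.cond(T.T @ A @ T))   # condition number ~ 1
```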

If the objective function is not a quadratic, the Hessian matrix and hence the transformations vary with the design vector from iteration to iteration. For example, the second‐order Taylor's series approximation of a general nonlinear function at the design vector X i can be expressed as

(6.12) \[ f(X) \approx f(X_i) + \nabla f_i^T (X - X_i) + \tfrac{1}{2} (X - X_i)^T [J_i] (X - X_i) = c + B^T X + \tfrac{1}{2} X^T [A] X \]

where

(6.13) \[ c = f(X_i) - \nabla f_i^T X_i + \tfrac{1}{2} X_i^T [J_i] X_i \]
(6.14) \[ B = \nabla f_i - [J_i] X_i \]
(6.15) \[ [A] = [J_i] = \left[ \frac{\partial^2 f}{\partial x_j \, \partial x_k} \right]_{X_i} \]

The transformations indicated by Eqs. (6.7) and (6.9) can be applied to the matrix [A] given by Eq. (6.15). The procedure of scaling the design variables is illustrated with the following example.

Direct Search Methods

6.2 Random Search Methods

Random search methods are based on the use of random numbers in finding the minimum point. Since most of the computer libraries have random number generators, these methods can be used quite conveniently. Some of the best known random search methods are presented in this section.

6.2.1 Random Jumping Method

Although the problem is an unconstrained one, we establish the bounds l i and u i for each design variable x i , i = 1, 2, …, n, for generating the random values of x i :

(6.16) \[ l_i \le x_i \le u_i, \qquad i = 1, 2, \ldots, n \]

In the random jumping method, we generate sets of n random numbers (r_1, r_2, …, r_n) that are uniformly distributed between 0 and 1. Each set of these numbers is used to find a point, X, inside the hypercube defined by Eq. (6.16) as

(6.17) \[ X = \{ l_1 + r_1 (u_1 - l_1), \; l_2 + r_2 (u_2 - l_2), \; \ldots, \; l_n + r_n (u_n - l_n) \}^T \]

and the value of the function is evaluated at this point X. By generating a large number of random points X and evaluating the value of the objective function at each of these points, we can take the smallest value of f(X) as the desired minimum point.
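
A minimal Python sketch of the random jumping method follows; the bounds, sample count, and test function in the usage line are illustrative assumptions, not values prescribed by the text:

```python
import numpy as np

def random_jump(f, lower, upper, n_points=10000, seed=0):
    """Random jumping method (Eqs. 6.16-6.17): sample uniformly inside the
    hypercube l_i <= x_i <= u_i and keep the point with the lowest f."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    best_x, best_f = None, np.inf
    for _ in range(n_points):
        r = rng.random(lower.size)        # r_i uniform in [0, 1]
        x = lower + r * (upper - lower)   # Eq. (6.17)
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f

# Usage on a simple bowl with minimum at (1, 2) (illustrative choice):
x, fx = random_jump(lambda x: (x[0] - 1)**2 + (x[1] - 2)**2, [-5, -5], [5, 5])
```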

6.2.2 Random Walk Method

The random walk method is based on generating a sequence of improved approximations to the minimum, each derived from the preceding approximation. Thus if X i is the approximation to the minimum obtained in the (i − 1)th stage (or step or iteration), the new or improved approximation in the ith stage is found from the relation

(6.18) \[ X_{i+1} = X_i + \lambda u_i \]

where λ is a prescribed scalar step length and u_i is a unit random vector generated in the ith stage. The detailed procedure of this method is given by the following steps [3]:

  1. Start with an initial point X 1, a sufficiently large initial step length λ, a minimum allowable step length ε, and a maximum permissible number of iterations N.
  2. Find the function value f 1 = f(X 1).
  3. Set the iteration number as i = 1.
  4. Generate a set of n random numbers r 1, r 2, …, r n each lying in the interval [−1, 1] and formulate the unit vector u as
    (6.19) \[ u = \frac{1}{(r_1^2 + r_2^2 + \cdots + r_n^2)^{1/2}} \{ r_1, r_2, \ldots, r_n \}^T \]

    The directions generated using Eq. (6.19) are expected to have a bias toward the diagonals of the unit hypercube [3]. To avoid such a bias, the length of the vector, R, is computed as

    \[ R = (r_1^2 + r_2^2 + \cdots + r_n^2)^{1/2} \]

    and the random numbers generated (r 1, r 2, …, r n ) are accepted only if R ≤ 1 but are discarded if R > 1. If the random numbers are accepted, the unbiased random vector u i is given by Eq. (6.19).

  5. Compute the new vector and the corresponding function value as X = X 1 + λ u and f = f(X).
  6. Compare the values of f and f 1. If f < f 1, set the new values as X 1 = X and f 1 = f and go to step 3. If f ≥ f 1, go to step 7.
  7. If i ≤ N, set the new iteration number as i = i + 1 and go to step 4. On the other hand, if i > N, go to step 8.
  8. Compute the new, reduced, step length as λ = λ/2. If the new step length is smaller than or equal to ε, go to step 9. Otherwise (i.e. if the new step length is greater than ε), go to step 4.
  9. Stop the procedure by taking X opt ≈ X 1 and f opt ≈ f 1.
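
The steps above translate directly into code. The following Python sketch is one possible implementation (the test function in the usage line is an arbitrary choice; after each halving of λ the iteration counter is simply restarted):

```python
import numpy as np

def random_walk(f, x1, lam=1.0, eps=1e-4, N=100, seed=0):
    """Random walk method of Section 6.2.2 (step numbers refer to the list above)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x1, float)
    fx = f(x)                                   # steps 1-2
    n = x.size
    while lam > eps:                            # step 8 test
        i = 1                                   # step 3
        while i <= N:                           # step 7 test
            # Step 4: unbiased unit direction -- accept r only if R <= 1,
            # then normalize (Eq. 6.19).
            while True:
                r = rng.uniform(-1.0, 1.0, n)
                R = np.sqrt(r @ r)
                if R <= 1.0:
                    break
            u = r / R
            x_new = x + lam * u                 # step 5
            f_new = f(x_new)
            if f_new < fx:                      # step 6: success
                x, fx = x_new, f_new
                i = 1
            else:                               # step 7: failure
                i += 1
        lam /= 2.0                              # step 8: halve step length
    return x, fx                                # step 9

x_opt, f_opt = random_walk(lambda x: (x[0] - 1)**2 + 4 * (x[1] + 2)**2, [0.0, 0.0])
```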

This method is illustrated with the following example.

6.2.3 Random Walk Method with Direction Exploitation

In the random walk method described in Section 6.2.2, we proceed to generate a new unit random vector u i+1 as soon as we find that u i is successful in reducing the function value for a fixed step length λ. However, we can expect to achieve a further decrease in the function value by taking a longer step length along the direction u i . Thus the random walk method can be improved if the maximum possible step is taken along each successful direction. This can be achieved by using any of the one‐dimensional minimization methods discussed in Chapter 5. According to this procedure, the new vector X i+1 is found as

(6.20) \[ X_{i+1} = X_i + \lambda_i^* u_i \]

where λ_i^* is the optimal step length found along the direction u_i so that

(6.21) \[ f(X_i + \lambda_i^* u_i) = \min_{\lambda} f(X_i + \lambda u_i) \]

The search method incorporating this feature is called the random walk method with direction exploitation.

6.2.4 Advantages of Random Search Methods

  1. These methods can work even if the objective function is discontinuous and nondifferentiable at some of the points.
  2. The random methods can be used to find the global minimum when the objective function possesses several relative minima.
  3. These methods are applicable when other methods fail due to local difficulties such as sharply varying functions and shallow regions.
  4. Although the random methods are not very efficient by themselves, they can be used in the early stages of optimization to detect the region where the global minimum is likely to be found. Once this region is found, some of the more efficient techniques can be used to find the precise location of the global minimum point.

6.3 Grid Search Method

This method involves setting up a suitable grid in the design space, evaluating the objective function at all the grid points, and finding the grid point corresponding to the lowest function value. For example, if the lower and upper bounds on the ith design variable are known to be l_i and u_i, respectively, we can divide the range (l_i, u_i) into p_i − 1 equal parts so that x_i^{(1)} = l_i, x_i^{(2)}, …, x_i^{(p_i)} = u_i denote the grid points along the x_i axis (i = 1, 2, …, n). This leads to a total of p_1 p_2 ⋯ p_n grid points in the design space. A grid with p_i = 4 is shown in a two-dimensional design space in Figure 6.4. The grid points can also be chosen based on methods of experimental design [4, 5]. It can be seen that the grid method requires a prohibitively large number of function evaluations in most practical problems. For example, for a problem with 10 design variables (n = 10), the number of grid points will be 3^10 = 59 049 with p_i = 3 and 4^10 = 1 048 576 with p_i = 4. However, for problems with a small number of design variables, the grid method can be used conveniently to find an approximate minimum. Also, the grid method can be used to find a good starting point for one of the more efficient methods.


Figure 6.4 Grid with p i  = 4.
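
For small n, the grid search is a few lines of code. A Python sketch under assumed illustrative bounds, grid sizes, and test function:

```python
import numpy as np
from itertools import product

def grid_search(f, lower, upper, p):
    """Grid search (Section 6.3): p_i points per axis, hence
    p_1 * p_2 * ... * p_n evaluations -- feasible only for small n."""
    axes = [np.linspace(l, u, pi) for l, u, pi in zip(lower, upper, p)]
    best_x, best_f = None, np.inf
    for x in product(*axes):
        x = np.asarray(x)
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f

# A 4 x 4 grid on [-2, 2]^2 (illustrative):
x, fx = grid_search(lambda x: (x[0] - 1)**2 + (x[1] + 1)**2,
                    [-2, -2], [2, 2], [4, 4])
```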

6.4 Univariate Method

In this method we change only one variable at a time and seek to produce a sequence of improved approximations to the minimum point. By starting at a base point X i in the ith iteration, we fix the values of n − 1 variables and vary the remaining variable. Since only one variable is changed, the problem becomes a one‐dimensional minimization problem and any of the methods discussed in Chapter 5 can be used to produce a new base point X i+1. The search is now continued in a new direction. This new direction is obtained by changing any one of the n − 1 variables that were fixed in the previous iteration. In fact, the search procedure is continued by taking each coordinate direction in turn. After all the n directions are searched sequentially, the first cycle is complete and hence we repeat the entire process of sequential minimization. The procedure is continued until no further improvement is possible in the objective function in any of the n directions of a cycle. The univariate method can be summarized as follows:

  1. Choose an arbitrary starting point X_1 and set i = 1.
  2. Find the search direction S i as
    (6.22) \[ S_i = u_i = \{0, \ldots, 0, 1, 0, \ldots, 0\}^T \quad (\text{1 in the position of the variable varied in iteration } i) \]
  3. Determine whether λ_i should be positive or negative. For the current direction S_i, this means finding whether the function value decreases in the positive or the negative direction. For this we take a small probe length (ε) and evaluate f_i = f(X_i), f^+ = f(X_i + εS_i), and f^− = f(X_i − εS_i). If f^+ < f_i, S_i will be the correct direction for decreasing the value of f, and if f^− < f_i, −S_i will be the correct one. If both f^+ and f^− are greater than f_i, we take X_i as the minimum along the direction S_i.
  4. Find the optimal step length λ_i^* such that
    (6.23) \[ f(X_i \pm \lambda_i^* S_i) = \min_{\lambda} f(X_i \pm \lambda S_i) \]

    where the + or − sign has to be used depending on whether S_i or −S_i is the direction for decreasing the function value.

  5. Set X_{i+1} = X_i ± λ_i^* S_i depending on the direction for decreasing the function value, and f_{i+1} = f(X_{i+1}).
  6. Set the new value of i = i + 1 and go to step 2. Continue this procedure until no significant change is achieved in the value of the objective function.
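
A compact Python sketch of steps 1–6 follows; it delegates the one-dimensional minimization of step 4 to scipy.optimize.minimize_scalar, which is one possible stand-in for the Chapter 5 methods:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def univariate_method(f, x1, eps=1e-4, tol=1e-8, max_cycles=100):
    """Univariate (cyclic coordinate) search: one line search per coordinate
    direction per cycle, following steps 1-6 above."""
    x = np.asarray(x1, float)
    n = x.size
    for _ in range(max_cycles):
        f_start = f(x)
        for i in range(n):
            s = np.zeros(n)
            s[i] = 1.0                                # Eq. (6.22)
            # Step 3: probe both senses of the coordinate direction.
            fi, fp, fm = f(x), f(x + eps * s), f(x - eps * s)
            if fp >= fi and fm >= fi:
                continue            # x is already the minimum along this line
            if fm < fp:
                s = -s              # descend in the negative sense
            # Step 4: one-dimensional minimization along s (Eq. 6.23).
            lam = minimize_scalar(lambda a: f(x + a * s)).x
            x = x + lam * s                           # step 5
        if abs(f_start - f(x)) < tol:                 # step 6 stopping test
            break
    return x, f(x)
```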

The univariate method is very simple and can be implemented easily. However, it will not converge rapidly to the optimum solution, as it has a tendency to oscillate with steadily decreasing progress toward the optimum. Hence it will be better to stop the computations at some point near the optimum point rather than trying to find the precise optimum point. In theory, the univariate method can be applied to find the minimum of any function that possesses continuous derivatives. However, if the function has a steep valley, the method may not even converge. For example, consider the contours of a function of two variables with a valley as shown in Figure 6.5. If the univariate search starts at point P, the function value cannot be decreased either in the direction ±S_1 or in the direction ±S_2. Thus the search comes to a halt and one may be misled into taking the point P, which is certainly not the optimum point, as the optimum point. This situation arises whenever the value of the probe length ε needed for detecting the proper direction (±S_1 or ±S_2) happens to be smaller than the precision (the number of significant figures) used in the computations.


Figure 6.5 Failure of the univariate method on a steep valley.

6.5 Pattern Directions

In the univariate method, we search for the minimum along directions parallel to the coordinate axes. We noticed that this method may not converge in some cases, and that even if it converges, its convergence will be very slow as we approach the optimum point. These problems can be avoided by changing the directions of search in a favorable manner instead of retaining them always parallel to the coordinate axes. To understand this idea, consider the contours of the function shown in Figure 6.6. Let the points 1, 2, 3, … indicate the successive points found by the univariate method. It can be noticed that the lines joining the alternate points of the search (e.g. 1, 3; 2, 4; 3, 5; 4, 6; …) lie in the general direction of the minimum and are known as pattern directions. It can be proved that if the objective function is a quadratic in two variables, all such lines pass through the minimum. Unfortunately, this property will not be valid for multivariable functions even when they are quadratics. However, this idea can still be used to achieve rapid convergence while finding the minimum of an n‐variable function. Methods that use pattern directions as search directions are known as pattern search methods.


Figure 6.6 Lines defined by the alternate points lie in the general direction of the minimum.

One of the best-known pattern search methods, Powell's method, is discussed in Section 6.6. In general, a pattern search method takes n univariate steps, where n denotes the number of design variables, and then searches for the minimum along the pattern direction S_i defined by

(6.24) \[ S_i = X_i - X_{i-n} \]

where X_i is the point obtained at the end of the n univariate steps and X_{i−n} is the starting point before taking the n univariate steps. In general, the directions used prior to taking a move along a pattern direction need not be univariate directions.

6.6 Powell's Method

Powell's method is an extension of the basic pattern search method. It is the most widely used direct search method and can be proved to be a method of conjugate directions [7]. A conjugate directions method will minimize a quadratic function in a finite number of steps. Since a general nonlinear function can be approximated reasonably well by a quadratic function near its minimum, a conjugate directions method is expected to speed up the convergence of even general nonlinear objective functions. The definition, a method of generating conjugate directions, and the property of quadratic convergence are presented in this section.

6.6.1 Conjugate Directions

  • Definition: Conjugate Directions. Let A = [A] be an n × n symmetric matrix. A set of n vectors (or directions) {S i } is said to be conjugate (more accurately A‐conjugate) if
    (6.25) \[ S_i^T [A] S_j = 0 \quad \text{for all } i \ne j, \qquad i, j = 1, 2, \ldots, n \]

    It can be seen that orthogonal directions are a special case of conjugate directions (obtained with [A] = [I] in Eq. (6.25)).

  • Definition: Quadratically Convergent Method. If a minimization method, using exact arithmetic, can find the minimum point in n steps while minimizing a quadratic function in n variables, the method is called a quadratically convergent method.

Figure 6.7 Conjugate directions.

6.6.2 Algorithm

The basic idea of Powell's method is illustrated graphically for a two‐variable function in Figure 6.8. In this figure the function is first minimized once along each of the coordinate directions starting with the second coordinate direction and then in the corresponding pattern direction. This leads to point 5. For the next cycle of minimization, we discard one of the coordinate directions (the x 1 direction in the present case) in favor of the pattern direction. Thus we minimize along u 2 and S 1 and obtain point 7. Then we generate a new pattern direction S 2 as shown in the figure. For the next cycle of minimization, we discard one of the previously used coordinate directions (the x 2 direction in this case) in favor of the newly generated pattern direction. Then, by starting from point 8, we minimize along directions S 1 and S 2, thereby obtaining points 9 and 10, respectively. For the next cycle of minimization, since there is no coordinate direction to discard, we restart the whole procedure by minimizing along the x 2 direction. This procedure is continued until the desired minimum point is found.


Figure 6.8 Progress of Powell's method.

The flow diagram for the version of Powell's method described above is given in Figure 6.9. Note that the search will be made sequentially in the directions S_n; S_1, S_2, S_3, …, S_{n−1}, S_n; S_p^{(1)}; S_2, S_3, …, S_{n−1}, S_n, S_p^{(1)}; S_p^{(2)}; S_3, S_4, …, S_{n−1}, S_n, S_p^{(1)}, S_p^{(2)}; S_p^{(3)}; … until the minimum point is found. Here S_i indicates the coordinate direction u_i and S_p^{(j)} the jth pattern direction. In Figure 6.9, the previous base point is stored as the vector Z in block A, and the pattern direction is constructed by subtracting the previous base point from the current one in block B. The pattern direction is then used as a minimization direction in blocks C and D. For the next cycle, the first direction used in the previous cycle is discarded in favor of the current pattern direction. This is achieved by updating the numbers of the search directions as shown in block E. Thus, both points Z and X used in block B for the construction of the pattern direction are points that are minima along S_n in the first cycle, the first pattern direction S_p^{(1)} in the second cycle, the second pattern direction S_p^{(2)} in the third cycle, and so on.


Figure 6.9 Flowchart for Powell's Method.

Quadratic Convergence

It can be seen from Figure 6.9 that the pattern directions S_p^{(1)}, S_p^{(2)}, S_p^{(3)}, … are nothing but the lines joining the minima found along the directions S_n, S_p^{(1)}, S_p^{(2)}, …, respectively. Hence by Theorem 6.1, the pairs of directions (S_n, S_p^{(1)}), (S_p^{(1)}, S_p^{(2)}), and so on, are A-conjugate. Thus all the directions S_n, S_p^{(1)}, S_p^{(2)}, … are A-conjugate. Since, by Theorem 6.2, any search method involving minimization along a set of conjugate directions is quadratically convergent, Powell's method is quadratically convergent. From the method used for constructing the conjugate directions S_p^{(1)}, S_p^{(2)}, …, we find that n minimization cycles are required to complete the construction of n conjugate directions. In the ith cycle, the minimization is done along the i already-constructed conjugate directions and the n − i remaining coordinate directions. Thus after n cycles, all the n search directions are mutually conjugate, and a quadratic will theoretically be minimized in n² one-dimensional minimizations. This proves the quadratic convergence of Powell's method.

It is to be noted that as with most of the numerical techniques, the convergence in many practical problems may not be as good as the theory seems to indicate. Powell's method may require a lot more iterations to minimize a function than the theoretically estimated number. There are several reasons for this:

  1. Since the estimate of n cycles is valid only for quadratic functions, nonquadratic functions generally require more than n cycles.
  2. The proof of quadratic convergence has been established with the assumption that the exact minimum is found in each of the one-dimensional minimizations. However, the actual minimizing step lengths λ_i^* will be only approximate, and hence the subsequent directions will not be conjugate. Thus the method requires more iterations to achieve overall convergence.
  3. Powell's method, described above, can break down before the minimum point is found. This is because the search directions S i might become dependent or almost dependent during numerical computation.

Convergence Criterion

The convergence criterion one would generally adopt in a method such as Powell's is to stop the procedure whenever a minimization cycle produces a change in all variables of less than one-tenth of the required accuracy. However, a more elaborate convergence criterion, which is more likely to prevent premature termination of the process, was given by Powell [7].

6.7 Simplex Method

Definition: Simplex. The geometric figure formed by a set of n + 1 points in an n‐dimensional space is called a simplex. When the points are equidistant, the simplex is said to be regular. Thus, in two dimensions the simplex is a triangle, and in three dimensions, it is a tetrahedron.

The basic idea in the simplex method is to compare the values of the objective function at the n + 1 vertices of a general simplex and move the simplex gradually toward the optimum point during the iterative process. The following equations can be used to generate the vertices of a regular simplex (equilateral triangle in two-dimensional space) of size a in the n-dimensional space [10]:

(6.46) \[ X_i = X_0 + p \, u_i + \sum_{\substack{j=1 \\ j \ne i}}^{n} q \, u_j, \qquad i = 1, 2, \ldots, n \]

where

(6.47) \[ p = \frac{a}{n\sqrt{2}} \left( \sqrt{n+1} + n - 1 \right), \qquad q = \frac{a}{n\sqrt{2}} \left( \sqrt{n+1} - 1 \right) \]

where X_0 is the initial base point and u_j is the unit vector along the jth coordinate axis. This method was originally given by Spendley et al. [10] and was developed later by Nelder and Mead [11]. The movement of the simplex is achieved by using three operations, known as reflection, contraction, and expansion.

6.7.1 Reflection

If X h is the vertex corresponding to the highest value of the objective function among the vertices of a simplex, we can expect the point X r obtained by reflecting the point X h in the opposite face to have the smallest value. If this is the case, we can construct a new simplex by rejecting the point X h from the simplex and including the new point X r . This process is illustrated in Figure 6.10. In Figure 6.10a, the points X 1, X 2, and X 3 form the original simplex, and the points X 1, X 2, and X r form the new one. Similarly, in Figure 6.10b, the original simplex is given by points X 1, X 2, X 3, and X 4, and the new one by X 1, X 2, X 3, and X r . Again, we can construct a new simplex from the present one by rejecting the vertex corresponding to the highest function value. Since the direction of movement of the simplex is always away from the worst result, we will be moving in a favorable direction. If the objective function does not have steep valleys, repetitive application of the reflection process leads to a zigzag path in the general direction of the minimum as shown in Figure 6.11. Mathematically, the reflected point X r is given by

(6.48) \[ X_r = (1 + \alpha) X_0 - \alpha X_h \]

where X h is the vertex corresponding to the maximum function value:

(6.49) \[ f(X_h) = \max_{i = 1, \ldots, n+1} f(X_i) \]

Figure 6.10 Reflection.


Figure 6.11 Progress of the reflection process.

X 0 is the centroid of all the points X i except i = h:

(6.50) \[ X_0 = \frac{1}{n} \sum_{\substack{i=1 \\ i \ne h}}^{n+1} X_i \]

and α > 0 is the reflection coefficient defined as

(6.51) \[ \alpha = \frac{\text{distance between } X_r \text{ and } X_0}{\text{distance between } X_h \text{ and } X_0} \]

Thus X r will lie on the line joining X h and X 0, on the far side of X 0 from X h with |X r  − X 0| = α|X h  − X 0|. If f(X r ) lies between f(X h ) and f(X l ), where X l is the vertex corresponding to the minimum function value,

(6.52) \[ f(X_l) = \min_{i = 1, \ldots, n+1} f(X_i) \]

X h is replaced by X r and a new simplex is started.

If we use only the reflection process for finding the minimum, we may encounter certain difficulties in some cases. For example, if one of the simplexes (triangles in two dimensions) straddles a valley as shown in Figure 6.12 and if the reflected point X r happens to have an objective function value equal to that of the point X h , we will enter into a closed cycle of operations. Thus if X 2 is the worst point in the simplex defined by the vertices X 1, X 2, and X 3, the reflection process gives the new simplex with vertices X 1, X 3, and X r . Again, since X r has the highest function value out of the vertices X 1, X 3, and X r , we obtain the old simplex itself by using the reflection process. Thus the optimization process is stranded over the valley and there is no way of moving toward the optimum point. This trouble can be overcome by making a rule that no return can be made to points that have just been left.


Figure 6.12 Reflection process not leading to a new simplex.

Whenever such a situation is encountered, we reject the vertex corresponding to the second-worst value instead of the vertex corresponding to the worst function value. This method, in general, leads the process to continue toward the region of the desired minimum. However, the final simplex may again straddle the minimum, or it may lie within a distance of the order of its own size from the minimum. In such cases it may not be possible to obtain a new simplex with vertices closer to the minimum compared to those of the previous simplex, and the pattern may lead to a cyclic process, as shown in Figure 6.13. In this example the successive simplexes formed from the simplex 123 are 234, 245, 456, 467, 478, 348, 234, 245, …, which can be seen to form a cyclic process. Whenever this type of cycling is observed, one can take the vertex that occurs in every simplex (point 4 in Figure 6.13) as the best approximation to the optimum point. If more accuracy is desired, the simplex has to be contracted or reduced in size, as indicated later.


Figure 6.13 Reflection process leading to a cyclic process.

6.7.2 Expansion

If a reflection process gives a point X r for which f(X r ) < f(X l ) (i.e. if the reflection produces a new minimum), one can generally expect to decrease the function value further by moving along the direction pointing from X 0 to X r . Hence we expand X r to X e using the relation

(6.53) \[ X_e = \gamma X_r + (1 - \gamma) X_0 \]

where γ is called the expansion coefficient, defined as

\[ \gamma = \frac{\text{distance between } X_e \text{ and } X_0}{\text{distance between } X_r \text{ and } X_0} > 1 \]

If f(X e ) < f(X l ), we replace the point X h by X e and restart the process of reflection. On the other hand, if f(X e ) > f(X l ), it means that the expansion process is not successful and hence we replace point X h by X r and start the reflection process again.

6.7.3 Contraction

If the reflection process gives a point X r for which f(X r ) > f(X i ) for all i except i = h, and f(X r ) < f(X h ), we replace point X h by X r . Thus the new X h will be X r . In this case we contract the simplex as follows:

(6.54) \[ X_c = \beta X_h + (1 - \beta) X_0 \]

where β is called the contraction coefficient (0 ≤ β ≤ 1) and is defined as

\[ \beta = \frac{\text{distance between } X_c \text{ and } X_0}{\text{distance between } X_h \text{ and } X_0} \]

If f(X r ) > f(X h ), we still use Eq. (6.54) without changing the previous point X h . If the contraction process produces a point X c for which f(X c ) < min[f(X h ), f(X r )], we replace the point X h in X 1, X 2, …, X n+1 by X c and proceed with the reflection process again. On the other hand, if f(X c ) ≥ min[f(X h ), f(X r )], the contraction process will be a failure, and in this case we replace all X i by (X i  + X l )/2 and restart the reflection process.

The method is assumed to have converged whenever the standard deviation of the function at the n + 1 vertices of the current simplex is smaller than some prescribed small quantity ε, that is,

(6.55) \[ \left\{ \frac{\sum_{i=1}^{n+1} \left[ f(X_i) - f(X_0) \right]^2}{n + 1} \right\}^{1/2} \le \varepsilon \]
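
The reflection, expansion, and contraction moves can be combined into a short routine. The following Python sketch condenses the logic of Sections 6.7.1–6.7.3; the coefficient values α = 1, γ = 2, β = 0.5 are common defaults rather than values mandated by the text, and the contraction branch is slightly simplified relative to the description above:

```python
import numpy as np

def simplex_search(f, vertices, alpha=1.0, gamma=2.0, beta=0.5,
                   eps=1e-8, max_iter=500):
    """Condensed simplex search; `vertices` is an (n+1) x n array."""
    X = np.asarray(vertices, float)
    for _ in range(max_iter):
        fv = np.array([f(x) for x in X])
        # Convergence test, Eq. (6.55): std. deviation of f over the simplex.
        if np.sqrt(((fv - fv.mean())**2).mean()) <= eps:
            break
        h, l = fv.argmax(), fv.argmin()                  # worst and best
        x0 = (X.sum(axis=0) - X[h]) / (X.shape[0] - 1)   # centroid, Eq. (6.50)
        xr = (1 + alpha) * x0 - alpha * X[h]             # reflection, Eq. (6.48)
        fr = f(xr)
        if fr < fv[l]:                          # new minimum: try expansion
            xe = gamma * xr + (1 - gamma) * x0  # Eq. (6.53)
            X[h] = xe if f(xe) < fv[l] else xr
        elif fr < fv[h]:                        # better than worst: accept
            X[h] = xr
        else:                                   # still worst: contract, Eq. (6.54)
            xc = beta * X[h] + (1 - beta) * x0
            if f(xc) < min(fv[h], fr):
                X[h] = xc
            else:                               # failed contraction: shrink to best
                X = (X + X[l]) / 2.0
    fv = np.array([f(x) for x in X])
    return X[fv.argmin()], fv.min()

# Usage with an arbitrary starting simplex and test function (illustrative):
verts = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 0.9]])
x_best, f_best = simplex_search(lambda x: (x[0] - 2)**2 + (x[1] - 1)**2, verts)
```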

Indirect Search (Descent) Methods

6.8 Gradient of a Function

The gradient of a function is an n‐component vector given by

(6.56) \[ \nabla f = \left\{ \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n} \right\}^T \]

The gradient has a very important property. If we move along the gradient direction from any point in n-dimensional space, the function value increases at the fastest rate. Hence the gradient direction is called the direction of steepest ascent. Unfortunately, the direction of steepest ascent is a local property and not a global one. This is illustrated in Figure 6.14, where the gradient vectors ∇f evaluated at points 1, 2, 3, and 4 lie along the directions 11′, 22′, 33′, and 44′, respectively. Thus the function value increases at the fastest rate in the direction 11′ at point 1, but not at point 2. Similarly, the function value increases at the fastest rate in direction 22′ (33′) at point 2 (3), but not at point 3 (4). In other words, the direction of steepest ascent generally varies from point to point, and if we make infinitely small moves along the direction of steepest ascent, the path will be a curved line like the curve 1–2–3–4 in Figure 6.14.


Figure 6.14 Steepest ascent directions.

Since the gradient vector represents the direction of steepest ascent, the negative of the gradient vector denotes the direction of steepest descent. Thus any method that makes use of the gradient vector can be expected to give the minimum point faster than one that does not make use of the gradient vector. All the descent methods make use of the gradient vector, either directly or indirectly, in finding the search directions. Before considering the descent methods of minimization, we prove that the gradient vector represents the direction of steepest ascent.

6.8.1 Evaluation of the Gradient

The evaluation of the gradient requires the computation of the partial derivatives ∂f/∂x i , i = 1, 2, …, n. There are three situations where the evaluation of the gradient poses certain problems:

  1. The function is differentiable at all the points, but the calculation of the components of the gradient, ∂f/∂x i , is either impractical or impossible.
  2. The expressions for the partial derivatives ∂f/∂x i can be derived, but they require large computational time for evaluation.
  3. The gradient ∇f is not defined at all the points.

In the first case, we can use the forward finite‐difference formula

(6.63) \[ \left. \frac{\partial f}{\partial x_i} \right|_{X_m} \approx \frac{f(X_m + \Delta x_i u_i) - f(X_m)}{\Delta x_i} \]

to approximate the partial derivative ∂f/∂x_i at X_m. If the function value at the base point X_m is known, this formula requires one additional function evaluation to find (∂f/∂x_i)|_{X_m}. Thus it requires n additional function evaluations to evaluate the approximate gradient ∇f|_{X_m}. For better results we can use the central finite-difference formula to find the approximate partial derivative (∂f/∂x_i)|_{X_m}:

(6.64) \[ \left. \frac{\partial f}{\partial x_i} \right|_{X_m} \approx \frac{f(X_m + \Delta x_i u_i) - f(X_m - \Delta x_i u_i)}{2 \, \Delta x_i} \]

This formula requires two additional function evaluations for each of the partial derivatives. In Eqs. (6.63) and (6.64), Δx i is a small scalar quantity and u i is a vector of order n whose ith component has a value of 1, and all other components have a value of zero. In practical computations, the value of Δx i has to be chosen with some care. If Δx i is too small, the difference between the values of the function evaluated at (X m  + Δx i u i ) and (X m  − Δx i u i ) may be very small and numerical round‐off error may predominate. On the other hand, if Δx i is too large, the truncation error may predominate in the calculation of the gradient.
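
Both formulas are easy to implement. A Python sketch of Eqs. (6.63) and (6.64):

```python
import numpy as np

def gradient_fd(f, x, dx=1e-6, central=True):
    """Finite-difference gradient: Eq. (6.64) (central, 2n extra evaluations)
    or Eq. (6.63) (forward, n extra evaluations). The step dx must balance
    round-off error (dx too small) against truncation error (dx too large)."""
    x = np.asarray(x, float)
    g = np.zeros_like(x)
    f0 = f(x)
    for i in range(x.size):
        u = np.zeros_like(x)
        u[i] = 1.0                       # ith coordinate unit vector u_i
        if central:
            g[i] = (f(x + dx * u) - f(x - dx * u)) / (2 * dx)   # Eq. (6.64)
        else:
            g[i] = (f(x + dx * u) - f0) / dx                    # Eq. (6.63)
    return g
```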

In the second case also, the use of finite‐difference formulas is preferred whenever the exact gradient evaluation requires more computational time than the one involved in using Eq. (6.63) or (6.64).

In the third case, we cannot use the finite-difference formulas since the gradient is not defined at all the points. For example, consider the function shown in Figure 6.15. If Eq. (6.64) is used to evaluate the derivative df/dx at x_m, we obtain a value of α_1 for a step size Δx_1 and a value of α_2 for a step size Δx_2. Since, in reality, the derivative does not exist at the point x_m, use of the finite-difference formulas might lead to a complete breakdown of the minimization process. In such cases the minimization can be done only by one of the direct search techniques discussed earlier.


Figure 6.15 Gradient not defined at x m .

6.8.2 Rate of Change of a Function Along a Direction

In most optimization techniques, we are interested in finding the rate of change of a function with respect to a parameter λ along a specified direction, S i , away from a point X i . Any point in the specified direction away from the given point X i can be expressed as X = X i  + λ S i . Our interest is to find the rate of change of the function along the direction S i (characterized by the parameter λ), that is,

(6.65) \[ \frac{df}{d\lambda} = \sum_{j=1}^{n} \frac{\partial f}{\partial x_j} \frac{dx_j}{d\lambda} \]

where x j is the jth component of X. But

(6.66) \[ \frac{dx_j}{d\lambda} = s_{ij} \quad \text{since } x_j = x_{ij} + \lambda s_{ij} \]

where x ij and s ij are the jth components of X i and S i , respectively. Hence

(6.67) \[ \frac{df}{d\lambda} = \sum_{j=1}^{n} \frac{\partial f}{\partial x_j} s_{ij} = \nabla f^T S_i \]

If λ * minimizes f in the direction S i , we have

(6.68) \[ \left. \frac{df}{d\lambda} \right|_{\lambda = \lambda^*} = S_i^T \nabla f = 0 \]

at the point X i  + λ * S i .

6.9 Steepest Descent (Cauchy) Method

The use of the negative of the gradient vector as a direction for minimization was first made by Cauchy in 1847 [12]. In this method we start from an initial trial point X_1 and iteratively move along the steepest descent directions until the optimum point is found. The steepest descent method can be summarized by the following steps:

  1. Start with an arbitrary initial point X 1. Set the iteration number as i = 1.
  2. Find the search direction S i as
    (6.69) \[ S_i = -\nabla f(X_i) = -\nabla f_i \]
  3. Determine the optimal step length λ_i^* in the direction S_i and set
    (6.70) \[ X_{i+1} = X_i + \lambda_i^* S_i = X_i - \lambda_i^* \nabla f_i \]
  4. Test the new point, X i+1, for optimality. If X i+1 is optimum, stop the process. Otherwise, go to step 5.
  5. Set the new iteration number i = i + 1 and go to step 2.

The method of steepest descent may appear to be the best unconstrained minimization technique since each one‐dimensional search starts in the “best” direction. However, owing to the fact that the steepest descent direction is a local property, the method is not really effective in most problems.
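A Python sketch of the steps above, using scipy.optimize.minimize_scalar as the one-dimensional minimizer (an assumed choice) and an illustrative quadratic test function:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def steepest_descent(f, grad, x1, tol=1e-6, max_iter=1000):
    """Cauchy's steepest descent (Section 6.9): exact line search along
    S_i = -grad f(X_i) at every iteration."""
    x = np.asarray(x1, float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:                    # step 4: optimality test
            break
        s = -g                                          # step 2: Eq. (6.69)
        lam = minimize_scalar(lambda a: f(x + a * s)).x # step 3
        x = x + lam * s                                 # Eq. (6.70)
    return x

# Usage on a quadratic with minimum at (1, 1) (illustrative):
x = steepest_descent(lambda x: 2 * x[0]**2 + x[1]**2 - 4 * x[0] - 2 * x[1],
                     lambda x: np.array([4 * x[0] - 4, 2 * x[1] - 2]),
                     [0.0, 0.0])
```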

6.10 Conjugate Gradient (Fletcher–Reeves) Method

The convergence characteristics of the steepest descent method can be improved greatly by modifying it into a conjugate gradient method (which can be considered as a conjugate directions method involving the use of the gradient of the function). We saw (in Section 6.6) that any minimization method that makes use of the conjugate directions is quadratically convergent. This property of quadratic convergence is very useful because it ensures that the method will minimize a quadratic function in n steps or less. Since any general function can be approximated reasonably well by a quadratic near the optimum point, any quadratically convergent method is expected to find the optimum point in a finite number of iterations.

We have seen that Powell's conjugate direction method requires n single‐variable minimizations per iteration and sets up a new conjugate direction at the end of each iteration. Thus it requires, in general, n 2 single‐variable minimizations to find the minimum of a quadratic function. On the other hand, if we can evaluate the gradients of the objective function, we can set up a new conjugate direction after every one‐dimensional minimization, and hence we can achieve faster convergence. The construction of conjugate directions and development of the Fletcher–Reeves method are discussed in this section.

6.10.1 Development of the Fletcher–Reeves Method

The Fletcher–Reeves method is developed by modifying the steepest descent method to make it quadratically convergent. Starting from an arbitrary point X 1, the quadratic function

(6.74) \[ f(X) = \tfrac{1}{2} X^T [A] X + B^T X + C \]

can be minimized by searching along the search direction S 1 = −∇f 1 (steepest descent direction) using the step length (see Problem 40):

(6.75) \[ \lambda_1^* = -\frac{S_1^T \nabla f_1}{S_1^T [A] S_1} \]

The second search direction S 2 is found as a linear combination of S 1 and − ∇f 2:

(6.76) \[ S_2 = -\nabla f_2 + \beta_2 S_1 \]

where the constant β 2 can be determined by making S 1 and S 2 conjugate with respect to [A]. This leads to (see Problem 41):

(6.77) \[ \beta_2 = \frac{\nabla f_2^T \nabla f_2}{\nabla f_1^T \nabla f_1} = \frac{|\nabla f_2|^2}{|\nabla f_1|^2} \]

This process can be continued to obtain the general formula for the ith search direction as

(6.78) \[ S_i = -\nabla f_i + \beta_i S_{i-1} \]

where

(6.79) \[ \beta_i = \frac{\nabla f_i^T \nabla f_i}{\nabla f_{i-1}^T \nabla f_{i-1}} \]

Thus the Fletcher–Reeves algorithm can be stated as follows.

6.10.2 Fletcher–Reeves Method

The iterative procedure of Fletcher–Reeves method can be stated as follows:

  1. Start with an arbitrary initial point X 1.
  2. Set the first search direction S 1 = −∇f(X 1) = −∇f 1.
  3. Find the point X 2 according to the relation
    (6.80) \[ X_2 = X_1 + \lambda_1^* S_1 \]

    where λ_1^* is the optimal step length in the direction S_1. Set i = 2 and go to the next step.

  4. Find ∇f i  = ∇f(X i ), and set
    (6.81) \[ S_i = -\nabla f_i + \frac{|\nabla f_i|^2}{|\nabla f_{i-1}|^2} S_{i-1} \]
  5. Compute the optimum step length λ_i^* in the direction S_i, and find the new point
    (6.82) \[ X_{i+1} = X_i + \lambda_i^* S_i \]
  6. Test for the optimality of the point X i+1. If X i+1 is optimum, stop the process. Otherwise, set the value of i = i + 1 and go to step 4.

Remarks

  1. The Fletcher–Reeves method was originally proposed by Hestenes and Stiefel [14] as a method for solving systems of linear equations derived from the stationary conditions of a quadratic. Since the directions S_i used in this method are A-conjugate, the process should converge in n cycles or less for a quadratic function. However, for ill-conditioned quadratics (whose contours are highly eccentric and distorted), the method may require much more than n cycles for convergence. The reason for this has been found to be the cumulative effect of rounding errors. Since S_i is given by Eq. (6.81), any error resulting from the inaccuracies involved in the determination of λ_i^*, and from the round-off error involved in accumulating the successive |∇f_i|² S_{i−1}/|∇f_{i−1}|² terms, is carried forward through the vector S_i. Thus the search directions S_i will be progressively contaminated by these errors. Hence it is necessary, in practice, to restart the method periodically after, say, every m steps by taking the new search direction as the steepest descent direction. That is, after every m steps, S_{m+1} is set equal to −∇f_{m+1} instead of the usual form. Fletcher and Reeves have recommended a value of m = n + 1, where n is the number of design variables.
  2. Despite the limitations indicated above, the Fletcher–Reeves method is vastly superior to the steepest descent method and the pattern search methods, but it turns out to be rather less efficient than the Newton and the quasi-Newton (variable metric) methods discussed in later sections.
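
A Python sketch of the Fletcher–Reeves iteration, including the periodic restart discussed in remark 1 (the line-search routine is an assumed choice):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def fletcher_reeves(f, grad, x1, tol=1e-6, max_iter=200, restart=None):
    """Fletcher-Reeves conjugate gradient (Section 6.10.2). Restarting with
    the steepest descent direction every `restart` steps (m = n + 1 is the
    value recommended in the text) guards against accumulated round-off."""
    x = np.asarray(x1, float)
    restart = restart or x.size + 1
    g = grad(x)
    s = -g                                    # step 2
    for i in range(1, max_iter + 1):
        lam = minimize_scalar(lambda a: f(x + a * s)).x
        x = x + lam * s                       # Eqs. (6.80)/(6.82)
        g_new = grad(x)
        if np.linalg.norm(g_new) <= tol:      # step 6: optimality test
            break
        if i % restart == 0:
            s = -g_new                        # periodic restart
        else:
            beta = (g_new @ g_new) / (g @ g)  # Eq. (6.79)
            s = -g_new + beta * s             # Eq. (6.81)
        g = g_new
    return x
```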

6.11 Newton's Method

Newton's method presented in Section 5.12.1 can be extended for the minimization of multivariable functions. For this, consider the quadratic approximation of the function f(X) at X = X i using the Taylor's series expansion

(6.83) \[ f(X) \approx f(X_i) + \nabla f_i^T (X - X_i) + \tfrac{1}{2} (X - X_i)^T [J_i] (X - X_i) \]

where [J i ] = [J]| X i is the matrix of second partial derivatives (Hessian matrix) of f evaluated at the point X i . By setting the partial derivatives of Eq. (6.83) equal to zero for the minimum of f(X), we obtain

(6.84) \[ \nabla f = \nabla f_i + [J_i] (X - X_i) = 0 \]

Equations (6.84) and (6.83) give

(6.85) \[ [J_i] (X - X_i) = -\nabla f_i \]

If [J i ] is nonsingular, Eq. (6.85) can be solved to obtain an improved approximation (X = X i+1) as

(6.86) \[ X_{i+1} = X_i - [J_i]^{-1} \nabla f_i \]

Since higher‐order terms have been neglected in Eq. (6.83), Eq. (6.86) is to be used iteratively to find the optimum solution X*.

The sequence of points X 1, X 2, …, X i+1 can be shown to converge to the actual solution X* from any initial point X 1 sufficiently close to the solution X*, provided that [J 1] is nonsingular. It can be seen that Newton's method uses the second partial derivatives of the objective function (in the form of the matrix [J i ]) and hence is a second‐order method.
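
A Python sketch of the Newton iteration, Eq. (6.86); solving the linear system of Eq. (6.85) is numerically preferable to forming [J_i]^{-1} explicitly. The quadratic in the usage lines is an illustrative choice and converges in a single step, as the theory predicts:

```python
import numpy as np

def newton(grad, hess, x1, tol=1e-8, max_iter=50):
    """Newton's method: X_{i+1} = X_i - [J_i]^{-1} grad f_i (Eq. 6.86)."""
    x = np.asarray(x1, float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            break
        x = x - np.linalg.solve(hess(x), g)   # Eq. (6.85) solved for X_{i+1}
    return x

# f = x1^2 + 2*x2^2 - 2*x1*x2 - 2*x2 has a constant Hessian and minimum (1, 1):
x = newton(lambda x: np.array([2 * x[0] - 2 * x[1], 4 * x[1] - 2 * x[0] - 2]),
           lambda x: np.array([[2.0, -2.0], [-2.0, 4.0]]),
           [0.0, 0.0])
```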

6.12 Marquardt Method

The steepest descent method reduces the function value when the design vector X_i is away from the optimum point X*. The Newton method, on the other hand, converges fast when the design vector X_i is close to the optimum point X*. The Marquardt method [15] attempts to take advantage of both the steepest descent and Newton methods. This method modifies the diagonal elements of the Hessian matrix, [J_i], as

(6.88) \[ [\tilde{J}_i] = [J_i] + \alpha_i [I] \]

where [I] is an identity matrix and α_i is a positive constant that ensures the positive definiteness of [J̃_i] when [J_i] is not positive definite. It can be noted that when α_i is sufficiently large (on the order of 10^4), the term α_i [I] dominates [J_i] and the inverse of the matrix [J̃_i] becomes

(6.89) \[ [\tilde{J}_i]^{-1} = \left[ [J_i] + \alpha_i [I] \right]^{-1} \approx \left[ \alpha_i [I] \right]^{-1} = \frac{1}{\alpha_i} [I] \]

Thus if the search direction S i is computed as

(6.90) \[ S_i = -[\tilde{J}_i]^{-1} \nabla f_i = -\left[ [J_i] + \alpha_i [I] \right]^{-1} \nabla f_i \]

S i becomes a steepest descent direction for large values of α i . In the Marquardt method, the value of α i is taken to be large at the beginning and then reduced to zero gradually as the iterative process progresses. Thus as the value of α i decreases from a large value to zero, the characteristics of the search method change from those of a steepest descent method to those of the Newton method. The iterative process of a modified version of Marquardt method can be described as follows.

  1. Start with an arbitrary initial point X_1 and constants α_1 (on the order of 10^4), c_1 (0 < c_1 < 1), c_2 (c_2 > 1), and ε (on the order of 10^{−2}). Set the iteration number as i = 1.
  2. Compute the gradient of the function, ∇f i  = ∇f(X i ).
  3. Test for optimality of the point X i . If ||∇f i || = ||∇f(X i )|| ≤ ε, X i is optimum and hence stop the process. Otherwise, go to step 4.
  4. Find the new vector X i+1 as
    (6.91) \[ X_{i+1} = X_i - \left[ [J_i] + \alpha_i [I] \right]^{-1} \nabla f_i \]
  5. Compare the values of f_{i+1} and f_i. If f_{i+1} < f_i, go to step 6. If f_{i+1} ≥ f_i, go to step 7.
  6. Set α i+1 = c 1 α i , i = i + 1, and go to step 2.
  7. Set α i  = c 2 α i and go to step 4.

An advantage of this method is the absence of the step size λ i along the search direction S i . In fact, the algorithm above can be modified by introducing an optimal step length in Eq. (6.91) as

(6.92) \[ X_{i+1} = X_i - \lambda_i^* \left[ [J_i] + \alpha_i [I] \right]^{-1} \nabla f_i \]

where λ_i^* is found using any of the one-dimensional search methods described in Chapter 5.
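
A Python sketch of steps 1–7 of the modified Marquardt method; the constants follow the suggested orders of magnitude, and the function arguments (objective, gradient, and Hessian callables) are assumptions of this sketch:

```python
import numpy as np

def marquardt(f, grad, hess, x1, alpha=1e4, c1=0.25, c2=2.0, eps=1e-2,
              max_iter=200):
    """Modified Marquardt method (Section 6.12). The damping alpha_i blends
    steepest descent (large alpha) with Newton's method (alpha -> 0)."""
    x = np.asarray(x1, float)
    n = x.size
    fx = f(x)
    for _ in range(max_iter):
        g = grad(x)                                  # step 2
        if np.linalg.norm(g) <= eps:                 # step 3: optimality test
            break
        while True:                                  # steps 4-7
            x_new = x - np.linalg.solve(hess(x) + alpha * np.eye(n), g)  # Eq. (6.91)
            f_new = f(x_new)
            if f_new < fx:                           # step 5 -> step 6
                x, fx, alpha = x_new, f_new, c1 * alpha
                break
            alpha = c2 * alpha                       # step 7: increase damping
    return x
```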

6.13 Quasi‐Newton Methods

The basic iterative process used in the Newton's method is given by Eq. (6.86):

(6.93) \[ X_{i+1} = X_i - [J_i]^{-1} \nabla f_i \]

where the Hessian matrix [J i ] is composed of the second partial derivatives of f and varies with the design vector X i for a nonquadratic (general nonlinear) objective function f. The basic idea behind the quasi‐Newton or variable metric methods is to approximate either [J i ] by another matrix [A i ] or [J i ]−1 by another matrix [B i ], using only the first partial derivatives of f. If [J i ]−1 is approximated by [B i ], Eq. (6.93) can be expressed as

(6.94) \[ X_{i+1} = X_i - \lambda_i^* [B_i] \nabla f_i \]

where λ_i^* can be considered as the optimal step length along the direction

(6.95) \[ S_i = -[B_i] \nabla f_i \]

It can be seen that the steepest descent method can be obtained as a special case of Eq. (6.95) by setting [B_i] = [I].

6.13.1 Computation of [B_i]

To implement Eq. (6.94), an approximation to the inverse of the Hessian matrix, [B_i] ≡ [A_i]^{−1}, is to be computed. For this, we first expand the gradient of f about an arbitrary reference point, X_0, using Taylor's series as

(6.96) \[ \nabla f(X) \approx \nabla f(X_0) + [J_0] (X - X_0) \]

If we pick two points X i and X i+1 and use [A i ] to approximate [J 0], Eq. (6.96) can be rewritten as

(6.97) \[ \nabla f_i = \nabla f_0 + [A_i] (X_i - X_0) \]
(6.98) \[ \nabla f_{i+1} = \nabla f_0 + [A_i] (X_{i+1} - X_0) \]

Subtracting Eq. (6.98) from (6.97) yields

(6.99) \[ [A_i] d_i = g_i \]

where

(6.100) \[ d_i = X_{i+1} - X_i \]
(6.101) \[ g_i = \nabla f_{i+1} - \nabla f_i = \nabla f(X_{i+1}) - \nabla f(X_i) \]

The solution of Eq. (6.99) for d i can be written as

(6.102) \[ d_i = [B_i] g_i \]

where [B i ] = [A i ]−1 denotes an approximation to the inverse of the Hessian matrix, [J 0]−1. It can be seen that Eq. (6.102) represents a system of n equations in n 2 unknown elements of the matrix [B i ]. Thus for n > 1, the choice of [B i ] is not unique and one would like to choose [B i ] that is closest to [J 0]−1, in some sense. Numerous techniques have been suggested in the literature for the computation of [B i ] as the iterative process progresses (i.e. for the computation of [B i+1] once [B i ] is known). A major concern is that in addition to satisfying Eq. (6.102), the symmetry and positive definiteness of the matrix [B i ] is to be maintained; that is, if [B i ] is symmetric and positive definite, [B i+1] must remain symmetric and positive definite.

6.13.2 Rank 1 Updates

The general formula for updating the matrix [B i ] can be written as

(6.103) \[ [B_{i+1}] = [B_i] + [\Delta B_i] \]

where [ΔB i ] can be considered to be the update (or correction) matrix added to [B i ]. Theoretically, the matrix [ΔB i ] can have its rank as high as n. However, in practice, most updates, [ΔB i ], are only of rank 1 or 2. To derive a rank 1 update, we simply choose a scaled outer product of a vector z for [ΔB i ] as

(6.104) \[ [\Delta B_i] = c \, z z^T \]

where the constant c and the n‐component vector z are to be determined. Equations (6.103) and (6.104) lead to

(6.105) \[ [B_{i+1}] = [B_i] + c \, z z^T \]

By forcing Eq. (6.105) to satisfy the quasi‐Newton condition, Eq. (6.102),

(6.106) \[ d_i = [B_{i+1}] g_i \]

we obtain

(6.107) \[ d_i = [B_i] g_i + c \, z z^T g_i \]

Since (z T g i ) in Eq. (6.107) is a scalar, we can rewrite Eq. (6.107) as

(6.108) \[ c \, z (z^T g_i) = d_i - [B_i] g_i \]

Thus a simple choice for z and c would be

(6.109) \[ z = d_i - [B_i] g_i \]
(6.110) \[ c = \frac{1}{z^T g_i} = \frac{1}{(d_i - [B_i] g_i)^T g_i} \]

This leads to the unique rank 1 update formula for [B i+1]:

(6.111) \[ [B_{i+1}] = [B_i] + \frac{(d_i - [B_i] g_i)(d_i - [B_i] g_i)^T}{(d_i - [B_i] g_i)^T g_i} \]

This formula has been attributed to Broyden [16]. To implement Eq. (6.111), an initial symmetric positive definite matrix is selected for [B_1] at the start of the algorithm, and the next point X_2 is computed using Eq. (6.94). Then the new matrix [B_2] is computed using Eq. (6.111) and the new point X_3 is determined from Eq. (6.94). This iterative process is continued until convergence is achieved. If [B_i] is symmetric, Eq. (6.111) ensures that [B_{i+1}] is also symmetric. However, there is no guarantee that [B_{i+1}] remains positive definite even if [B_i] is positive definite. This might lead to a breakdown of the procedure, especially when used for the optimization of nonquadratic functions. It can be verified easily that the columns of the matrix [ΔB_i] given by Eq. (6.111) are multiples of each other. Thus the updating matrix has only one independent column, and hence the rank of the matrix will be 1. This is the reason why Eq. (6.111) is considered to be a rank 1 updating formula. Although the Broyden formula, Eq. (6.111), is not robust, it has the property of quadratic convergence [17]. The rank 2 update formulas given next guarantee both the symmetry and the positive definiteness of the matrix [B_{i+1}] and are more robust in minimizing general nonlinear functions; hence they are preferred in practical applications.
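
A Python sketch of the rank 1 update, Eqs. (6.109)–(6.111), with a numerical check of the quasi-Newton condition, Eq. (6.102); the skip threshold on a near-zero denominator is an added safeguard, not part of the formula itself:

```python
import numpy as np

def rank1_update(B, d, g):
    """Broyden's rank 1 update, Eq. (6.111): the correction is a scaled outer
    product, so [B_{i+1}] satisfies d_i = [B_{i+1}] g_i exactly."""
    z = d - B @ g                        # Eq. (6.109)
    denom = z @ g                        # Eq. (6.110); near zero -> skip update
    if abs(denom) < 1e-12:
        return B
    return B + np.outer(z, z) / denom    # Eq. (6.111)

# The updated matrix reproduces the secant pair (d, g):
B = np.eye(2)
d, g = np.array([1.0, 0.5]), np.array([2.0, 1.0])
B_new = rank1_update(B, d, g)
assert np.allclose(B_new @ g, d)         # quasi-Newton condition, Eq. (6.102)
```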

6.13.3 Rank 2 Updates

In rank 2 updates we choose the update matrix [ΔB i ] as the sum of two rank 1 updates as

(6.112) \[ [\Delta B_i] = c_1 z_1 z_1^T + c_2 z_2 z_2^T \]

where the constants c 1 and c 2 and the n‐component vectors z 1 and z 2 are to be determined. Equations (6.103) and (6.112) lead to

(6.113) \[ [B_{i+1}] = [B_i] + c_1 z_1 z_1^T + c_2 z_2 z_2^T \]

By forcing Eq. (6.113) to satisfy the quasi‐Newton condition, Eq. (6.106), we obtain

(6.114) \[ d_i = [B_i] g_i + c_1 z_1 (z_1^T g_i) + c_2 z_2 (z_2^T g_i) \]

where (z_1^T g_i) and (z_2^T g_i) can be identified as scalars. Although the vectors z_1 and z_2 in Eq. (6.114) are not unique, the following choices can be made to satisfy Eq. (6.114):

(6.115) \[ z_1 = d_i \]
(6.116) \[ z_2 = [B_i] g_i \]
(6.117) \[ c_1 = \frac{1}{z_1^T g_i} = \frac{1}{d_i^T g_i} \]
(6.118) \[ c_2 = -\frac{1}{z_2^T g_i} = -\frac{1}{g_i^T [B_i] g_i} \]

Thus the rank 2 update formula can be expressed as

(6.119) \[ [B_{i+1}] = [B_i] + \frac{d_i d_i^T}{d_i^T g_i} - \frac{[B_i] g_i g_i^T [B_i]}{g_i^T [B_i] g_i} \]

This equation is known as the Davidon–Fletcher–Powell (DFP) formula [20, 21]. Since

(6.120) \[ X_{i+1} = X_i + \lambda_i^* S_i \]

where S_i is the search direction, the vector d_i = X_{i+1} − X_i can be rewritten as

(6.121) \[ d_i = \lambda_i^* S_i \]

Thus Eq. (6.119) can be expressed as

(6.122) \[ [B_{i+1}] = [B_i] + \frac{\lambda_i^* S_i S_i^T}{S_i^T g_i} - \frac{[B_i] g_i g_i^T [B_i]}{g_i^T [B_i] g_i} \]

Remarks

  1. Equations (6.111) and (6.119) are known as inverse update formulas since these equations approximate the inverse of the Hessian matrix of f.
  2. It is possible to derive a family of direct update formulas in which approximations to the Hessian matrix itself are considered. For this we express the quasi‐Newton condition as (see Eq. (6.99))
    (6.123) \[ g_i = [A_{i+1}] d_i \]

    The procedure used in deriving Eqs. (6.111) and (6.119) can be followed by using [A_i], d_i, and g_i in place of [B_i], g_i, and d_i, respectively. This leads to a rank 2 update formula, similar to Eq. (6.119), known as the Broyden–Fletcher–Goldfarb–Shanno (BFGS) formula [22–25]:

    (6.124) \[ [A_{i+1}] = [A_i] + \frac{g_i g_i^T}{g_i^T d_i} - \frac{[A_i] d_i d_i^T [A_i]}{d_i^T [A_i] d_i} \]

    In practical computations, Eq. (6.124) is rewritten more conveniently in terms of [B i ], as

    (6.125) \[ [B_{i+1}] = [B_i] + \left( 1 + \frac{g_i^T [B_i] g_i}{d_i^T g_i} \right) \frac{d_i d_i^T}{d_i^T g_i} - \frac{d_i g_i^T [B_i]}{d_i^T g_i} - \frac{[B_i] g_i d_i^T}{d_i^T g_i} \]
  3. The DFP and the BFGS formulas belong to a family of rank 2 updates known as Huang's family of updates [18], which can be expressed for updating the inverse of the Hessian matrix as
    (6.126) \[ [B_{i+1}] = \rho_i \left( [B_i] - \frac{[B_i] g_i g_i^T [B_i]}{g_i^T [B_i] g_i} + \theta_i y_i y_i^T \right) + \frac{d_i d_i^T}{d_i^T g_i} \]

    where

    (6.127) \[ y_i = \left( g_i^T [B_i] g_i \right)^{1/2} \left[ \frac{d_i}{d_i^T g_i} - \frac{[B_i] g_i}{g_i^T [B_i] g_i} \right] \]

    and ρ_i and θ_i are constant parameters. It has been shown [18] that Eq. (6.126) maintains the symmetry and positive definiteness of [B_{i+1}] if [B_i] is symmetric and positive definite. Different choices of ρ_i and θ_i in Eq. (6.126) lead to different algorithms. For example, when ρ_i = 1 and θ_i = 0, Eq. (6.126) gives the DFP formula, Eq. (6.119). When ρ_i = 1 and θ_i = 1, Eq. (6.126) yields the BFGS formula, Eq. (6.125).

  4. It has been shown that the BFGS method exhibits superlinear convergence near X* [17].
  5. Numerical experience indicates that the BFGS method is the best unconstrained variable metric method and is less influenced by errors in finding λ_i^* compared to the DFP method.
  6. The methods discussed in this section are also known as secant methods since Eqs. (6.99) and (6.102) can be considered as secant equations (see Section 5.12).

The DFP and BFGS iterative methods are described in detail in the following sections.

6.14 Davidon–Fletcher–Powell Method

The iterative procedure of the DFP method can be described as follows:

  1. Start with an initial point X_1 and an n × n positive definite symmetric matrix [B_1] to approximate the inverse of the Hessian matrix of f. Usually, [B_1] is taken as the identity matrix [I]. Set the iteration number as i = 1.
  2. Compute the gradient of the function, ∇f i , at point X i , and set
    (6.128) \[ S_i = -[B_i] \nabla f_i \]
  3. Find the optimal step length λ_i^* in the direction S_i and set
    (6.129) \[ X_{i+1} = X_i + \lambda_i^* S_i \]
  4. Test the new point X i+1 for optimality. If X i+1 is optimal, terminate the iterative process. Otherwise, go to step 5.
  5. Update the matrix [B i ] using Eq. (6.119) as
    (6.130) \[ [B_{i+1}] = [B_i] + [M_i] + [N_i] \]

    where

    (6.131) \[ [M_i] = \lambda_i^* \frac{S_i S_i^T}{S_i^T Q_i} \]
    (6.132) \[ [N_i] = -\frac{([B_i] Q_i)([B_i] Q_i)^T}{Q_i^T [B_i] Q_i} \]
    (6.133) \[ Q_i = \nabla f(X_{i+1}) - \nabla f(X_i) = \nabla f_{i+1} - \nabla f_i \]
  6. Set the new iteration number as i = i + 1, and go to step 2.

Note: The matrix [B_{i+1}], given by Eq. (6.130), remains positive definite only if λ_i^* is found accurately. Thus if λ_i^* is not found accurately in any iteration, the matrix [B_i] should not be updated. There are several alternatives in such a case. One possibility is to compute a better value of λ_i^* by using more refits in the one-dimensional minimization procedure (until the product S_i^T ∇f_{i+1} becomes sufficiently small). However, this involves more computational effort. Another possibility is to specify a maximum number of refits in the one-dimensional minimization method and to skip the updating of [B_i] if λ_i^* could not be found accurately in the specified number of refits. The last possibility is to continue updating the matrix [B_i] using the approximate values of λ_i^* found, but to restart the whole procedure after a certain number of iterations, that is, to restart with i = 1 in step 2 of the method.
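
A Python sketch of the DFP steps above; the exact line search is approximated by scipy.optimize.minimize_scalar, so in practice the safeguards discussed in the note would be added:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def dfp(f, grad, x1, tol=1e-6, max_iter=200):
    """DFP method (Section 6.14): [B_i] approximates the inverse Hessian and
    is updated by Eqs. (6.130)-(6.133) after each line search."""
    x = np.asarray(x1, float)
    B = np.eye(x.size)                     # step 1: [B_1] = [I]
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:       # step 4: optimality test
            break
        s = -B @ g                         # step 2: Eq. (6.128)
        lam = minimize_scalar(lambda a: f(x + a * s)).x
        x_new = x + lam * s                # step 3: Eq. (6.129)
        g_new = grad(x_new)
        d = x_new - x                      # = lam * S_i
        q = g_new - g                      # Eq. (6.133)
        # Step 5: Eq. (6.130), written in the equivalent form of Eq. (6.119).
        B = B + np.outer(d, d) / (d @ q) \
              - (B @ np.outer(q, q) @ B) / (q @ B @ q)
        x, g = x_new, g_new
    return x
```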

6.15 Broyden–Fletcher–Goldfarb–Shanno Method

As stated earlier, a major difference between the DFP and BFGS methods is that in the BFGS method, the Hessian matrix is updated iteratively rather than the inverse of the Hessian matrix. The BFGS method can be described by the following steps.

  1. Start with an initial point X_1 and an n × n positive definite symmetric matrix [B_1] as an initial estimate of the inverse of the Hessian matrix of f. In the absence of additional information, [B_1] is taken as the identity matrix [I]. Compute the gradient vector ∇f_1 = ∇f(X_1) and set the iteration number as i = 1.
  2. Compute the gradient of the function, ∇f i , at point X i , and set
    (6.134) \[ S_i = -[B_i] \nabla f_i \]
  3. Find the optimal step length λ_i^* in the direction S_i and set
    (6.135) \[ X_{i+1} = X_i + \lambda_i^* S_i \]
  4. Test the point X i+1 for optimality. If ||∇f i+1|| ≤ ε, where ε is a small quantity, take X*  ≈ X i+1 and stop the process. Otherwise, go to step 5.
  5. Update the matrix [B_i] as
    (6.136) \[ [B_{i+1}] = [B_i] + \left( 1 + \frac{g_i^T [B_i] g_i}{d_i^T g_i} \right) \frac{d_i d_i^T}{d_i^T g_i} - \frac{d_i g_i^T [B_i]}{d_i^T g_i} - \frac{[B_i] g_i d_i^T}{d_i^T g_i} \]

    where

    (6.137) \[ d_i = X_{i+1} - X_i = \lambda_i^* S_i \]
    (6.138) \[ g_i = \nabla f_{i+1} - \nabla f_i \]
  6. Set the new iteration number as i = i + 1 and go to step 2.

Remarks

  1. The BFGS method can be considered as a quasi‐Newton, conjugate gradient, and variable metric method.
  2. Since the inverse of the Hessian matrix is approximated, the BFGS method can be called an indirect update method.
  3. If the step lengths λ_i* are found accurately, the matrix [B i ] retains its positive definiteness as the value of i increases. In practical applications, however, the matrix [B i ] might become indefinite or even singular if the λ_i* are not found accurately. As such, periodic resetting of the matrix [B i ] to the identity matrix [I] is desirable. However, numerical experience indicates that the BFGS method is less influenced by errors in λ_i* than is the DFP method.
  4. It has been shown that the BFGS method exhibits superlinear convergence near X* [19].
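In code, the only change relative to the DFP sketch of Section 6.14 is the update of step 5: with d = lam*S per Eq. (6.137) and q = gf(Xnew) - g per Eq. (6.138), the rank‐two formula of Eq. (6.136) reads as below. This is a sketch under the same assumptions as the earlier DFP code.

```matlab
% BFGS update of Eq. (6.136); replaces step 5 of the DFP sketch.
d  = lam*S;            % Eq. (6.137)
q  = gf(Xnew) - g;     % Eq. (6.138)
dq = d'*q;             % scalar d_i' * g_i
B  = B + (1 + (q'*B*q)/dq)*(d*d')/dq ...
       - (d*(q'*B))/dq - ((B*q)*d')/dq;
```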

6.16 Test Functions

The efficiency of an optimization algorithm is studied using a set of standard functions. Several functions involving different numbers of variables and representing a variety of complexities have been used as test functions. Almost all of the test functions presented in the literature are nonlinear least‐squares functions; that is, each function can be represented as

(6.139) f(X) = Σ_{i=1}^{m} f_i^2(x_1, x_2, …, x_n)

where n denotes the number of variables and m indicates the number of functions (f i ) that define the least‐squares problem. The purpose of testing an algorithm on these functions is to show how well it works compared with other algorithms. Usually, each test function is minimized from a standard starting point, and the total number of function evaluations required to find the optimum solution is taken as a measure of the efficiency of the algorithm. References [29–32] present a comparative study of the various unconstrained optimization techniques.
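To read Eq. (6.139) concretely: given handles for the residuals f_i, the test function is simply the sum of their squares. A small illustrative sketch follows; the helper name sumsq and its cell‐array convention are ours, not the text's.

```matlab
% Illustrative: build f(X) = sum_i fi(X)^2, Eq. (6.139), from residuals.
sumsq = @(res, x) sum(cellfun(@(fi) fi(x)^2, res));
% Rosenbrock's function, Eq. (6.140), as the m = 2 residuals
% f1 = 10(x2 - x1^2) and f2 = 1 - x1:
res = { @(x) 10*(x(2) - x(1)^2), @(x) 1 - x(1) };
sumsq(res, [-1.2; 1.0])    % returns 24.2 at the standard starting point
```

Some of the commonly used test functions are given below.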

  1. Rosenbrock's parabolic valley [8]:
    (6.140) f(x_1, x_2) = 100(x_2 − x_1^2)^2 + (1 − x_1)^2
    X_1 = {−1.2, 1.0}^T
    X* = {1, 1}^T, f* = 0
  2. A quadratic function:
    (6.141) f(x_1, x_2) = (x_1 + 2x_2 − 7)^2 + (2x_1 + x_2 − 5)^2
    X_1 = {0, 0}^T
    X* = {1, 3}^T, f* = 0
  3. Powell's quartic function [7]:
    (6.142) f(x_1, x_2, x_3, x_4) = (x_1 + 10x_2)^2 + 5(x_3 − x_4)^2 + (x_2 − 2x_3)^4 + 10(x_1 − x_4)^4
  4. Fletcher and Powell's helical valley [21]:
    (6.143) f(x_1, x_2, x_3) = 100{[x_3 − 10θ(x_1, x_2)]^2 + [sqrt(x_1^2 + x_2^2) − 1]^2} + x_3^2

    where

    2πθ(x_1, x_2) = arctan(x_2/x_1) for x_1 > 0, and π + arctan(x_2/x_1) for x_1 < 0
  5. A nonlinear function of three variables [7]:
    (6.144) f(x_1, x_2, x_3) = −1/[1 + (x_1 − x_2)^2] − sin((1/2)πx_2 x_3) − exp{−[(x_1 + x_3)/x_2 − 2]^2}
    X_1 = {0, 1, 2}^T, X* = {1, 1, 1}^T, f* = −3
  6. Freudenstein and Roth function [27]:
    (6.145) f(x_1, x_2) = [−13 + x_1 + ((5 − x_2)x_2 − 2)x_2]^2 + [−29 + x_1 + ((x_2 + 1)x_2 − 14)x_2]^2
    X_1 = {0.5, −2}^T, X* = {5, 4}^T, f* = 0
  7. Powell's badly scaled function [28]:
    (6.146) f(x_1, x_2) = (10^4 x_1 x_2 − 1)^2 + [exp(−x_1) + exp(−x_2) − 1.0001]^2
    X_1 = {0, 1}^T, X* = {1.098 × 10^−5, 9.106}^T, f* = 0
  8. Brown's badly scaled function [29]:
    (6.147) f(x_1, x_2) = (x_1 − 10^6)^2 + (x_2 − 2 × 10^−6)^2 + (x_1 x_2 − 2)^2
    X_1 = {1, 1}^T, X* = {10^6, 2 × 10^−6}^T, f* = 0
  9. Beale's function [29]:
    (6.148) f(x_1, x_2) = [1.5 − x_1(1 − x_2)]^2 + [2.25 − x_1(1 − x_2^2)]^2 + [2.625 − x_1(1 − x_2^3)]^2
    X_1 = {1, 1}^T, X* = {3, 0.5}^T, f* = 0
  10. Wood's function [30]:
    (6.149) f(x_1, x_2, x_3, x_4) = 100(x_2 − x_1^2)^2 + (1 − x_1)^2 + 90(x_4 − x_3^2)^2 + (1 − x_3)^2 + 10.1[(x_2 − 1)^2 + (x_4 − 1)^2] + 19.8(x_2 − 1)(x_4 − 1)
    X_1 = {−3, −1, −3, −1}^T, X* = {1, 1, 1, 1}^T, f* = 0
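
When coding these functions, it is worth verifying a few values at the standard starting points. A brief sketch; the hand‐computed values in the comments follow directly from Eqs. (6.145) and (6.148).

```matlab
% Coding check: two of the test functions above at their starting points.
beale = @(x) (1.5   - x(1)*(1 - x(2)))^2 + ...
             (2.25  - x(1)*(1 - x(2)^2))^2 + ...
             (2.625 - x(1)*(1 - x(2)^3))^2;                 % Eq. (6.148)
froth = @(x) (-13 + x(1) + ((5 - x(2))*x(2) - 2)*x(2))^2 + ...
             (-29 + x(1) + ((x(2) + 1)*x(2) - 14)*x(2))^2;  % Eq. (6.145)
beale([1; 1])      % 14.203125
froth([0.5; -2])   % 400.5
```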

6.17 Solutions Using MATLAB

The solution of different types of optimization problems using MATLAB is presented in Chapter 17. Specifically, the MATLAB solution of a multivariable unconstrained optimization problem, Rosenbrock's parabolic valley function given by Eq. (6.140), is given in Example 17.6.
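As a preview, a call of the following form minimizes Eq. (6.140) with MATLAB's quasi‐Newton solver. This is a sketch assuming the Optimization Toolbox is available; the exact setup used in Example 17.6 may differ.

```matlab
% Hedged sketch: minimize Rosenbrock's function with fminunc.
f = @(x) 100*(x(2) - x(1)^2)^2 + (1 - x(1))^2;
[xopt, fopt] = fminunc(f, [-1.2; 1.0])   % expect xopt near [1; 1]
```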

References and Bibliography

  1. Rao, S.S. (2018). The Finite Element Method in Engineering, 6e. Oxford: Elsevier‐Butterworth‐Heinemann.
  2. Edgar, T.F. and Himmelblau, D.M. (1988). Optimization of Chemical Processes. New York: McGraw‐Hill.
  3. Fox, R.L. (1971). Optimization Methods for Engineering Design. Reading, MA: Addison‐Wesley.
  4. Biles, W.E. and Swain, J.J. (1980). Optimization and Industrial Experimentation. New York: Wiley.
  5. Hicks, C.R. (1993). Fundamental Concepts in the Design of Experiments. Fort Worth, TX: Saunders College Publishing.
  6. Hooke, R. and Jeeves, T.A. (1961). Direct search solution of numerical and statistical problems. Journal of the ACM 8 (2): 212–229.
  7. Powell, M.J.D. (1964). An efficient method for finding the minimum of a function of several variables without calculating derivatives. The Computer Journal 7 (4): 303–307.
  8. Rosenbrock, H.H. (1960). An automatic method for finding the greatest or least value of a function. The Computer Journal 3 (3): 175–184.
  9. Rao, S.S. (1984). Optimization: Theory and Applications, 2e. New Delhi: Wiley Eastern.
  10. Spendley, W., Hext, G.R., and Himsworth, F.R. (1962). Sequential application of simplex designs in optimization and evolutionary operation. Technometrics 4: 441.
  11. Nelder, J.A. and Mead, R. (1965). A simplex method for function minimization. The Computer Journal 7: 308.
  12. Cauchy, A.L. (1847). Méthode générale pour la résolution des systèmes d'équations simultanées. Comptes Rendus de l'Académie des Sciences, Paris 25: 536–538.
  13. Fletcher, R. and Reeves, C.M. (1964). Function minimization by conjugate gradients. The Computer Journal 7 (2): 149–154.
  14. Hestenes, M.R. and Stiefel, E. (1952). Methods of Conjugate Gradients for Solving Linear Systems, Report 1659, National Bureau of Standards, Washington, DC.
  15. Marquardt, D. (1963). An algorithm for least squares estimation of nonlinear parameters. SIAM Journal on Applied Mathematics 11 (2): 431–441.
  16. Broyden, C.G. (1967). Quasi‐Newton methods and their application to function minimization. Mathematics of Computation 21: 368.
  17. Broyden, C.G., Dennis, J.E., and Moré, J.J. (1975). On the local and superlinear convergence of quasi‐Newton methods. Journal of the Institute of Mathematics and Its Applications 12: 223.
  18. Huang, H.Y. (1970). Unified approach to quadratically convergent algorithms for function minimization. Journal of Optimization Theory and Applications 5: 405–423.
  19. Dennis, J.E. Jr. and Moré, J.J. (1977). Quasi‐Newton methods, motivation and theory. SIAM Review 19 (1): 46–89.
  20. Davidon, W.C. (1959). Variable Metric Method of Minimization, Report ANL‐5990, Argonne National Laboratory, Argonne, IL.
  21. Fletcher, R. and Powell, M.J.D. (1963). A rapidly convergent descent method for minimization. The Computer Journal 6 (2): 163–168.
  22. Broyden, C.G. (1970). The convergence of a class of double‐rank minimization algorithms, Parts I and II. Journal of the Institute of Mathematics and Its Applications 6: 76–90, 222–231.
  23. Fletcher, R. (1970). A new approach to variable metric algorithms. The Computer Journal 13: 317–322.
  24. Goldfarb, D. (1970). A family of variable metric methods derived by variational means. Mathematics of Computation 24: 23–26.
  25. Shanno, D.F. (1970). Conditioning of quasi‐Newton methods for function minimization. Mathematics of Computation 24: 647–656.
  26. Powell, M.J.D. (1962). An iterative method for finding stationary values of a function of several variables. The Computer Journal 5: 147–151.
  27. Freudenstein, F. and Roth, B. (1963). Numerical solution of systems of nonlinear equations. Journal of the ACM 10 (4): 550–556.
  28. Powell, M.J.D. (1970). A hybrid method for nonlinear equations. In: Numerical Methods for Nonlinear Algebraic Equations (ed. P. Rabinowitz), 87–114. New York: Gordon & Breach.
  29. Moré, J.J., Garbow, B.S., and Hillstrom, K.E. (1981). Testing unconstrained optimization software. ACM Transactions on Mathematical Software 7 (1): 17–41.
  30. Colville, A.R. (1968). A Comparative Study of Nonlinear Programming Codes, Report 320‐2949, IBM New York Scientific Center.
  31. Eason, E.D. and Fenton, R.G. (1974). A comparison of numerical optimization methods for engineering design. ASME Journal of Engineering Design 96: 196–200.
  32. Sargent, R.W.H. and Sebastian, D.J. (1972). Numerical experience with algorithms for unconstrained minimization. In: Numerical Methods for Nonlinear Optimization (ed. F.A. Lootsma), 45–113. London: Academic Press.
  33. Shanno, D.F. (1983). Recent advances in gradient based unconstrained optimization techniques for large problems. ASME Journal of Mechanisms, Transmissions, and Automation in Design 105: 155–159.
  34. Rao, S.S. (2017). Mechanical Vibrations, 6e. Hoboken, NJ: Pearson Education.
  35. Haftka, R.T. and Gürdal, Z. (1992). Elements of Structural Optimization, 3e. Dordrecht, The Netherlands: Kluwer Academic.
  36. Kowalik, J. and Osborne, M.R. (1968). Methods for Unconstrained Optimization Problems. New York: American Elsevier.

Review Questions

  6.1 State the necessary and sufficient conditions for the unconstrained minimum of a function.
  6.2 Give three reasons why the study of unconstrained minimization methods is important.
  6.3 What is the major difference between zeroth‐, first‐, and second‐order methods?
  6.4 What are the characteristics of a direct search method?
  6.5 What is a descent method?
  6.6 Define each term:
    1. Pattern directions
    2. Conjugate directions
    3. Simplex
    4. Gradient of a function
    5. Hessian matrix of a function
  6.7 State the iterative approach used in unconstrained optimization.
  6.8 What is quadratic convergence?
  6.9 What is the difference between linear and superlinear convergence?
  6.10 Define the condition number of a square matrix.
  6.11 Why is the scaling of variables important?
  6.12 What is the difference between the random jumping and random walk methods?
  6.13 Under what conditions are the processes of reflection, expansion, and contraction used in the simplex method?
  6.14 When is the grid search method preferred in minimizing an unconstrained function?
  6.15 Why is a quadratically convergent method considered to be superior for the minimization of a nonlinear function?
  6.16 Why is Powell's method called a pattern search method?
  6.17 What are the roles of univariate and pattern moves in Powell's method?
  6.18 What is the univariate method?
  6.19 Indicate a situation where a central difference formula is not as accurate as a forward difference formula.
  6.20 Why is a central difference formula more expensive than a forward or backward difference formula in finding the gradient of a function?
  6.21 What is the role of one‐dimensional minimization methods in solving an unconstrained minimization problem?
  6.22 State possible convergence criteria that can be used in direct search methods.
  6.23 Why is the steepest descent method not efficient in practice, although the directions used are the best directions?
  6.24 What are rank 1 and rank 2 updates?
  6.25 How are the search directions generated in the Fletcher–Reeves method?
  6.26 Give examples of methods that require n^2, n, and 1 one‐dimensional minimizations for minimizing a quadratic in n variables.
  6.27 What is the reason for the possible divergence of Newton's method?
  6.28 Why is a conjugate directions method preferred in solving a general nonlinear problem?
  6.29 What is the difference between Newton and quasi‐Newton methods?
  6.30 What is the basic difference between the DFP and BFGS methods?
  6.31 Why are the search directions reset to the steepest descent directions periodically in the DFP method?
  6.32 What is a metric? Why is the DFP method considered a variable metric method?
  6.33 Answer true or false:
    1. A conjugate gradient method can be called a conjugate directions method.
    2. A conjugate directions method can be called a conjugate gradient method.
    3. In the DFP method, the Hessian matrix is sequentially updated directly.
    4. In the BFGS method, the inverse of the Hessian matrix is sequentially updated.
    5. The Newton method requires the inversion of an n × n matrix in each iteration.
    6. The DFP method requires the inversion of an n × n matrix in each iteration.
    7. The steepest descent directions are the best possible directions.
    8. The central difference formula always gives a more accurate value of the gradient than does the forward or backward difference formula.
    9. Powell's method is a conjugate directions method.
    10. The univariate method is a conjugate directions method.

Problems

  6.1 A bar is subjected to an axial load, P 0, as shown in Figure 6.17. By using a one‐finite‐element model, the axial displacement, u(x), can be expressed as [1]
    equation

    where N i (x) are called the shape functions:

    equation

    and u 1 and u 2 are the end displacements of the bar. The deflection of the bar at point Q can be found by minimizing the potential energy of the bar (f), which can be expressed as

    equation

    where E is Young's modulus and A is the cross‐sectional area of the bar. Formulate the optimization problem in terms of the variables u 1 and u 2 for the case P 0 l/EA = 1.

  6.2 The natural frequencies (ω) of the tapered cantilever beam shown in Figure 6.18, based on the Rayleigh–Ritz method, can be found by minimizing the function [34]:
    equation

    with respect to c 1 and c 2, where f = ω^2, E is Young's modulus, and ρ is the density. Plot the graph of 3fρl^3/Eh^2 in the (c 1, c 2) space and identify the values of ω 1 and ω 2.

  6.3 Rayleigh's quotient corresponding to the three‐degree‐of‐freedom spring–mass system shown in Figure 6.19 is given by [34]
    equation

    where

    equation

    It is known that the fundamental natural frequency of vibration of the system can be found by minimizing R(X). Derive the expression of R(X) in terms of x 1, x 2, and x 3 and suggest a suitable method for minimizing the function R(X).

  6.4 The steady‐state temperatures at points 1 and 2 of the one‐dimensional fin (x 1 and x 2) shown in Figure 6.20 correspond to the minimum of the function [1]:
    equation

    Plot the function f in the (x 1, x 2) space and identify the steady‐state temperatures of points 1 and 2 of the fin.

  6.5 Figure 6.21 shows two bodies, A and B, connected by four linear springs. The springs are at their natural positions when no force is applied to the bodies. The displacements x 1 and x 2 of the bodies under any applied force can be found by minimizing the potential energy of the system. Find the displacements of the bodies when forces of 1000 and 2000 lb are applied to bodies A and B, respectively, using Newton's method. Use the starting vector, images .

    Hint:

    equation

    where the strain energy of a spring of stiffness k and end displacements x 1 and x 2 is given by (1/2)k(x 2 − x 1)^2 and the potential of the applied force, F i , is given by x i F i .

  6.6 The potential energy of the two‐bar truss shown in Figure 6.22 under the applied load P is given by
    equation

    where E is Young's modulus, A the cross‐sectional area of each member, l the span of the truss, s the length of each member, h the depth of the truss, θ the angle at which the load is applied, x 1 the horizontal displacement of the free node, and x 2 the vertical displacement of the free node.

    (a) Simplify the expression of f for the data E = 207 × 10^9 Pa, A = 10^−5 m^2, l = 1.5 m, h = 4 m, P = 10 000 N, and θ = 30°.
    (b) Find the steepest descent direction, S 1, of f at the trial vector X 1 = images .
    (c) Derive the one‐dimensional minimization problem, f(λ), at X 1 along the direction S 1.
    (d) Find the optimal step length λ* using the calculus method and find the new design vector X 2.
  6.7 Three carts, interconnected by springs, are subjected to the loads P 1, P 2, and P 3 as shown in Figure 6.23. The displacements of the carts can be found by minimizing the potential energy of the system (f):
    equation

    where

    equation

    Derive the function f(x 1, x 2, x 3) for the following data: k 1 = 5000 N/m, k 2 = 1500 N/m, k 3 = 2000 N/m, k 4 = 1000 N/m, k 5 = 2500 N/m, k 6 = 500 N/m, k 7 = 3000 N/m, k 8 = 3500 N/m, P 1 = 1000 N, P 2 = 2000 N, and P 3 = 3000 N. Complete one iteration of Newton's method and find the equilibrium configuration of the carts. Use X 1 = {0 0 0}^T.

  6.8 Plot the contours of the following function over the region (−5 ≤ x 1 ≤ 5, −3 ≤ x 2 ≤ 6) and identify the optimum point:
    equation
  6.9 Plot the contours of the following function in the two‐dimensional (x 1, x 2) space over the region (−4 ≤ x 1 ≤ 4, −3 ≤ x 2 ≤ 6) and identify the optimum point:
    equation
  6.10 Consider the problem
    equation

    Plot the contours of f over the region (−4 ≤ x 1 ≤ 4, −3 ≤ x 2 ≤ 6) and identify the optimum point.

  6.11 It is required to find the solution of a system of linear algebraic equations given by [A]X = b, where [A] is a known n × n symmetric positive‐definite matrix and b is an n‐component vector of known constants. Develop a scheme for solving the problem as an unconstrained minimization problem.
  6.12 Solve the following equations using the steepest descent method (two iterations only) with the starting point X 1 = {0 0 0}:
    equation
  6.13 An electric power of 100 MW generated at a hydroelectric power plant is to be transmitted 400 km to a stepdown transformer for distribution at 11 kV. The power dissipated due to the resistance of conductors is i^2 c^−1, where i is the line current in amperes and c is the conductance in mhos. The resistance loss, based on the cost of power delivered, can be expressed as 0.15i^2 c^−1 dollars. The power transmitted (k) is related to the transmission line voltage at the power plant (e) by the relation images , where e is in kilovolts. The cost of conductors is given by 2c millions of dollars, and the investment in the equipment needed to accommodate the voltage e is given by 500e dollars. Find the values of e and c to minimize the total cost of transmission using Newton's method (one iteration only).
  6.14 Find a suitable transformation of variables to reduce the condition number of the Hessian matrix of the following function to one:
    equation
  6.15 Find a suitable transformation or scaling of variables to reduce the condition number of the Hessian matrix of the following function to one:
    equation
  6.16 Determine whether the following vectors serve as conjugate directions for minimizing the function images .
    1. images
    2. images
  6.17 Consider the problem:
    equation

    Find the solution of this problem in the range −10 ≤ x i  ≤ 10, i = 1, 2, using the random jumping method. Use a maximum of 10 000 function evaluations.

  6.18 Consider the problem:
    equation

    Find the minimum of this function in the range −5 ≤ x i  ≤ 5, i = 1, 2, using the random walk method with direction exploitation.

  6.19 Find the condition number of each matrix:
    1. images
    2. images
  6.20 Perform two iterations of Newton's method to minimize the function
    f = 100(x_2 − x_1^2)^2 + (1 − x_1)^2

    from the starting point X_1 = {−1.2, 1.0}^T.

  6.21 Perform two iterations of the univariate method to minimize the function given in Problem 6.20 from the stated starting vector.
  6.22 Perform four iterations of Powell's method to minimize the function given in Problem 6.20 from the stated starting point.
  6.23 Perform two iterations of the steepest descent method to minimize the function given in Problem 6.20 from the stated starting point.
  6.24 Perform two iterations of the Fletcher–Reeves method to minimize the function given in Problem 6.20 from the stated starting point.
  6.25 Perform two iterations of the DFP method to minimize the function given in Problem 6.20 from the stated starting vector.
  6.26 Perform two iterations of the BFGS method to minimize the function given in Problem 6.20 from the indicated starting point.
  6.27 Perform two iterations of Marquardt's method to minimize the function given in Problem 6.20 from the stated starting point.
  6.28 Prove that the search directions used in the Fletcher–Reeves method are [A]‐conjugate while minimizing the function
    equation
  6.29 Generate a regular simplex of size 4 in a two‐dimensional space using each base point:
    (a) images (b) images (c) images
  6.30 Find the coordinates of the vertices of a simplex in a three‐dimensional space such that the distance between vertices is 0.3 and one vertex is given by (2, −1, −8).
  6.31 Generate a regular simplex of size 3 in a three‐dimensional space using each base point:
    (a) images (b) images (c) images
  6.32 Find a vector S 2 that is conjugate to the vector
    equation

    with respect to the matrix:

    equation
  6.33 Compare the gradients of the function images at images given by the following methods:
    (a) Analytical differentiation
    (b) Central difference method
    (c) Forward difference method
    (d) Backward difference method

    Use a perturbation of 0.005 for x 1 and x 2 in the finite‐difference methods.

  6.34 It is required to evaluate the gradient of the function
    equation

    at the point images using a finite‐difference scheme. Determine the step size Δx to be used to limit the error in any of the components, ∂f/∂x i , to 1% of the exact value, in the following methods:

    (a) Central difference method
    (b) Forward difference method
    (c) Backward difference method
  6.35 Consider the minimization of the function
    equation

    Perform one iteration of Newton's method from the starting point images using Eq. (6.86). How much improvement is achieved with X 2?

  6.36 Consider the problem:
    equation

    If a base simplex is defined by the vertices

    equation

    find a sequence of four improved vectors using reflection, expansion, and/or contraction.

  6.37 Consider the problem:
    equation

    If a base simplex is defined by the vertices

    equation

    find a sequence of four improved vectors using reflection, expansion, and/or contraction.

  6.38 Consider the problem:
    equation

    Find the solution of the problem using grid search with a step size Δx i  = 0.1 in the range −3 ≤ x i  ≤ 3, i = 1, 2.

  6.39 Show that the property of quadratic convergence of conjugate directions is independent of the order in which the one‐dimensional minimizations are performed by considering the minimization of
    equation

    using the conjugate directions images and images and the starting point images .

  6.40 Show that the optimal step length λ_i* that minimizes f(X) along the search direction S i  = −∇f i is given by Eq. (6.75).
  6.41 Show that β 2 in Eq. (6.76) is given by Eq. (6.77).
  6.42 Minimize f = 2x_1^2 + x_2^2 from the starting point (1, 2) using the univariate method (two iterations only).
  6.43 Minimize images by using the steepest descent method with the starting point (1, 2) (two iterations only).
  6.44 Minimize images by Newton's method using the starting point (2, −1, 1).
  6.45 Minimize images starting from the point (0, 0) using Powell's method. Perform four iterations.
  6.46 Minimize images by the simplex method. Perform two steps of reflection, expansion, and/or contraction.
  6.47 Solve the following system of equations using Newton's method of unconstrained minimization with the starting point
    equation
  6.48 It is desired to solve the following set of equations using an unconstrained optimization method:
    equation

    Formulate the corresponding problem and complete two iterations of optimization using the DFP method starting from images .

  6.49 Solve Problem 6.48 using the BFGS method (two iterations only).
  6.50 The following nonlinear equations are to be solved using an unconstrained optimization method:
    equation

    Complete two one‐dimensional minimization steps using the univariate method starting from the origin.

  6.51 Consider the two equations
    equation

    Formulate the problem as an unconstrained optimization problem and complete two steps of the Fletcher–Reeves method starting from the origin.

  6.52 Solve the equations 5x 1 + 3x 2 = 1 and 4x 1 − 7x 2 = 76 using the BFGS method with the starting point (0, 0).
  6.53 Indicate the number of one‐dimensional steps required for the minimization of the function images according to each scheme:
    (a) Steepest descent method
    (b) Fletcher–Reeves method
    (c) DFP method
    (d) Newton's method
    (e) Powell's method
    (f) Random search method
    (g) BFGS method
    (h) Univariate method
  6.54 Same as Problem 6.53 for the following function:
    equation
  6.55 Verify whether the following search directions are [A]‐conjugate while minimizing the function
    equation
    1. images
    2. images
  6.56 Solve the equations x 1 + 2x 2 + 3x 3 = 14, x 1 − x 2 + x 3 = 1, and 3x 1 − 2x 2 + x 3 = 2 using Marquardt's method of unconstrained minimization. Use the starting point X 1 = {0, 0, 0}^T.
  6.57 Apply the simplex method to minimize the function f given in Problem 6.20. Use the point (−1.2, 1.0) as the base point to generate an initial regular simplex of size 2 and go through three steps of reflection, expansion, and/or contraction.
  6.58 Write a computer program to implement Powell's method using the golden section method of one‐dimensional search.
  6.59 Write a computer program to implement the Davidon–Fletcher–Powell method using the cubic interpolation method of one‐dimensional search. Use a finite‐difference scheme to evaluate the gradient of the objective function.
  6.60 Write a computer program to implement the BFGS method using the cubic interpolation method of one‐dimensional minimization. Use a finite‐difference scheme to evaluate the gradient of the objective function.
  6.61 Write a computer program to implement the steepest descent method of unconstrained minimization with the direct root method of one‐dimensional search.
  6.62 Write a computer program to implement the Marquardt method coupled with the direct root method of one‐dimensional search.
  6.63 Find the minimum of the quadratic function given by Eq. (6.141) starting from X 1 = {0, 0}^T using MATLAB.
  6.64 Find the minimum of Powell's quartic function given by Eq. (6.142) starting from X 1 = {3, −1, 0, 1}^T using MATLAB.
  6.65 Find the minimum of Fletcher and Powell's helical valley function given by Eq. (6.143) starting from X 1 = {−1, 0, 0}^T using MATLAB.
  6.66 Find the minimum of the nonlinear function given by Eq. (6.144) starting from X 1 = {0, 1, 2}^T using MATLAB.
  6.67 Find the minimum of Wood's function given by Eq. (6.149) starting from X 1 = {−3, −1, −3, −1}^T using MATLAB.
