3.3. Optimality Conditions

A basic knowledge of optimality conditions is important for understanding the performance of the various numerical methods discussed later in the chapter. In this section, we introduce the basic concept of optimality, the necessary and sufficient conditions for the relative maxima and minima of a function, as well as the solution methods based on the optimality conditions. Simple examples are used to explain the underlying concepts. The examples will also show the practical limitations of the methods.

3.3.1. Basic Concept of Optimality

We start by recalling a few basic concepts we learned in Calculus regarding maxima and minima, followed by defining local and global optima; thereafter, we illustrate the concepts using functions of one and multiple variables.

3.3.1.1. Functions of a single variable

This section presents a few definitions for basic terms.
Stationary point: For a continuous and differentiable function f(x), a stationary point x∗ is a point at which the slope of the function vanishes; that is, f ′(x) = df/dx = 0 at x = x∗, where x∗ belongs to the domain of definition of f(x). As illustrated in Figure 3.7, a stationary point can be a minimum if f ″(x∗) > 0, a maximum if f ″(x∗) < 0, or an inflection point if f ″(x∗) = 0 and changes sign in the neighborhood of x∗.
FIGURE 3.7 A stationary point may be (a) a minimum, (b) a maximum, or (c) an inflection point.
FIGURE 3.8 Global and local minimum of a function f(x).
Global and local minimum: A function f(x) is said to have a local (or relative) minimum at x = x∗ if f(x∗) ≤ f(x∗ + δ) for all sufficiently small positive and negative values of δ, that is, in the neighborhood of the point x∗. A function f(x) is said to have a global (or absolute) minimum at x = x∗ if f(x∗) ≤ f(x) for all x in the domain over which f(x) is defined. Figure 3.8 shows the global and local optimum points of a function f(x) of a single variable x.
Necessary condition: Consider a function f(x) of a single variable defined for a < x < b. If a point x∗ ∈ (a, b) minimizes f(x), then the first derivative must vanish at x = x∗; that is, x∗ must be a stationary point with f ′(x∗) = 0.
Sufficient condition: For the same function f(x) stated above with f ′(x∗) = 0, f(x∗) is a minimum value of f(x) if f ″(x∗) > 0, or a maximum value if f ″(x∗) < 0.
EXAMPLE 3.1
Find a minimum of the function f(x) = x^2 − 2x, for x ∈ (0, 2).
Solutions
The first derivative of f(x) with respect to x is f ′(x) = 2x − 2. Setting f ′(x) = 0 and solving gives x = 1, which is a stationary point. This satisfies the necessary condition for x = 1 to be a minimum of the function f(x).
Taking the second derivative of f(x) with respect to x gives f ″(x) = 2 > 0, which satisfies the sufficient condition for f(x) to have a minimum at x = 1. The minimum value of the function at x = 1 is f(1) = −1.
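Although the scripts in this chapter are written in MATLAB, the two conditions of Example 3.1 can also be checked numerically; the short Python sketch below (an illustrative translation, not part of the original text) uses central differences to approximate f ′ and f ″:

```python
# Numerical check of Example 3.1: f(x) = x^2 - 2x on (0, 2).
# Calculus gives f'(1) = 0 (necessary condition) and f''(1) = 2 > 0
# (sufficient condition), so x = 1 is a minimum with f(1) = -1.

def f(x):
    return x**2 - 2*x

def d1(f, x, h=1e-5):
    # central-difference approximation of the first derivative
    return (f(x + h) - f(x - h)) / (2*h)

def d2(f, x, h=1e-4):
    # central-difference approximation of the second derivative
    return (f(x + h) - 2*f(x) + f(x - h)) / h**2

x_star = 1.0
print(d1(f, x_star))   # ~0: necessary condition holds
print(d2(f, x_star))   # ~2 > 0: sufficient condition holds
print(f(x_star))       # -1.0, the minimum value
```
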
The concepts illustrated above can be easily extended to functions of multiple variables. We use functions of two variables to provide a graphical illustration of the concepts.

3.3.1.2. Functions of multiple variables

A function of two variables f(x1, x2) = −(cos²x1 + cos²x2)², graphed in Figure 3.9a, has a local minimum at (x1, x2) = (0, 0). Perturbations from this point in any direction result in an increase in the function value of f(x); the slopes of the function with respect to x1 and x2 are zero at this point of local minimum. Similarly, the function f(x1, x2) = (cos²x1 + cos²x2)², graphed in Figure 3.9b, has a local maximum at (x1, x2) = (0, 0). Perturbations from this point in any direction result in a decrease in the function value of f(x); the slopes of the function with respect to x1 and x2 are again zero at this point of local maximum. A point at which the first derivatives of the function with respect to all variables vanish, whether a minimum or a maximum, is again called a stationary point.
Necessary condition: Consider a function f(x) of multiple variables defined for x ∈ Rⁿ, where n is the number of variables. If a point x∗ ∈ Rⁿ minimizes f(x), then x∗ must be a stationary point of f(x); that is, the gradient must vanish: ∇f(x∗) = 0.
The gradient of a function of multivariables is defined as

$$\nabla f(\mathbf{x}) = \left[ \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n} \right]^T \tag{3.8}$$

Geometrically, the gradient vector is normal to the tangent plane at a given point x, and it points in the direction of maximum increase in the function. These properties are quite important; they will be used in developing optimality conditions and numerical methods for optimum design. In Example 3.2, the gradient vector of a function of two variables is calculated for illustration purposes.
FIGURE 3.9 Functions of two variables (MATLAB Script 2 can be found in Appendix A): (a) f(x1, x2) = −(cos²x1 + cos²x2)² with a local minimum at (0, 0) and (b) f(x1, x2) = (cos²x1 + cos²x2)² with a local maximum at (0, 0).
EXAMPLE 3.2
A function of two variables is defined as

$$f(x_1, x_2) = x_2\, e^{-x_1^2 - x_2^2} \tag{3.9a}$$

 
which is graphed in MATLAB as shown below (left). The MATLAB script for the graph can be found in Appendix A (Script 3). Calculate the gradient vectors of the function at (x1, x2) = (1, 1) and (x1, x2) = (1, −1).
Solutions
From Eq. 3.8, the gradient vector of the function f(x1, x2) is

$$\nabla f(x_1, x_2) = \left[ -2 x_1 x_2\, e^{-x_1^2 - x_2^2},\; \left( 1 - 2 x_2^2 \right) e^{-x_1^2 - x_2^2} \right]^T \tag{3.9b}$$

 
At (x1, x2) = (1, 1), f(1, 1) = e^−2 = 0.1353 and ∇f(1, 1) = [−2e^−2, −e^−2]T; at (x1, x2) = (1, −1), f(1, −1) = −e^−2 = −0.1353 and ∇f(1, −1) = [2e^−2, −e^−2]T. The iso-lines of f through (1, 1) and (1, −1), as well as the gradient vectors at these two points, are shown in the figure below (right). In this example, the gradient vector at a point x is perpendicular to the iso-line (level curve) through x, and the vector points in the direction of maximum increase in the function value. The maximum and minimum of the function are shown for clarity.
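The analytic gradient of Eq. 3.9b can be cross-checked against a finite-difference approximation; the Python sketch below (an illustrative check, not part of the original text) evaluates both at the two points of this example:

```python
import math

def f(x1, x2):
    # f(x1, x2) = x2 * exp(-x1^2 - x2^2), Eq. 3.9a
    return x2 * math.exp(-x1**2 - x2**2)

def grad_analytic(x1, x2):
    # Eq. 3.9b
    e = math.exp(-x1**2 - x2**2)
    return (-2*x1*x2*e, (1 - 2*x2**2)*e)

def grad_numeric(x1, x2, h=1e-6):
    # central-difference approximation of the gradient
    gx1 = (f(x1 + h, x2) - f(x1 - h, x2)) / (2*h)
    gx2 = (f(x1, x2 + h) - f(x1, x2 - h)) / (2*h)
    return (gx1, gx2)

print(grad_analytic(1, 1))    # (-2e^-2, -e^-2) ≈ (-0.2707, -0.1353)
print(grad_analytic(1, -1))   # (2e^-2, -e^-2) ≈ (0.2707, -0.1353)
print(grad_numeric(1, 1))     # matches the analytic values
```
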
Sufficient condition: For the same function f(x) stated above with ∇f(x∗) = 0, f(x∗) is a minimum value of f(x) if the Hessian matrix of f, defined in Eq. 3.10 and evaluated at x∗, is positive-definite.

$$\mathbf{H} = \nabla^2 f = \left[ \frac{\partial^2 f}{\partial x_i \partial x_j} \right] = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\ \dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_n \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{bmatrix}_{n \times n} \tag{3.10}$$

where all derivatives are calculated at the given point x. The Hessian matrix is an n × n matrix, where n is the number of variables. It is important to note that each element of the Hessian is a function in itself that is evaluated at the given point x. Also, because f(x) is assumed to be twice continuously differentiable, the cross partial derivatives are equal; that is,

$$\frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial^2 f}{\partial x_j \partial x_i}; \quad i, j = 1, n \tag{3.11}$$

Therefore, the Hessian is always a symmetric matrix. The Hessian matrix plays a prominent role in exploring the sufficiency conditions for optimality.
Note that a symmetric square matrix is positive-definite if and only if (a) all its leading principal minors are positive (which implies, in particular, |H| > 0) or, equivalently, (b) all its eigenvalues are positive. To calculate the eigenvalues λ of a square matrix, the following equation is solved:

$$\left| \mathbf{H} - \lambda \mathbf{I} \right| = 0 \tag{3.12}$$

where I is the n × n identity matrix.
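Both tests can be sketched in a few lines of Python with NumPy (an illustrative translation of the MATLAB-based text, not part of the original):

```python
import numpy as np

def is_positive_definite(H):
    # Sylvester's criterion: every leading principal minor must be positive.
    H = np.asarray(H, dtype=float)
    return all(np.linalg.det(H[:k, :k]) > 0 for k in range(1, H.shape[0] + 1))

H = np.array([[2.0, 2.0],
              [2.0, 4.0]])
print(is_positive_definite(H))   # True
print(np.linalg.eigvalsh(H))     # eigenvalues 3 -/+ sqrt(5), both positive

M = np.array([[1.0, 3.0],
              [3.0, 1.0]])
print(is_positive_definite(M))   # False: second leading minor is 1 - 9 = -8
```
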
EXAMPLE 3.3
A function of three variables is defined as

$$f(x_1, x_2, x_3) = x_1^2 + 2 x_1 x_2 + 2 x_2^2 + x_3^2 - 2 x_1 + x_2 + 8 \tag{3.13a}$$

 
Calculate the gradient vector of the function and determine a stationary point, if it exists. Calculate the Hessian matrix of the function f, and determine whether the stationary point found gives a minimum value of the function f.
Solutions
We first calculate the gradient of the function and set it to zero to find the stationary point(s), if any:

$$\nabla f(x_1, x_2, x_3) = \left[ 2 x_1 + 2 x_2 - 2,\; 2 x_1 + 4 x_2 + 1,\; 2 x_3 \right]^T \tag{3.13b}$$

 
Setting Eq. 3.13b to zero and solving, we have x∗ = [2.5, −1.5, 0]T, which is the only stationary point. Now, we calculate the Hessian matrix:

$$\mathbf{H} = \nabla^2 f = \begin{bmatrix} 2 & 2 & 0 \\ 2 & 4 & 0 \\ 0 & 0 & 2 \end{bmatrix} \tag{3.13c}$$

 
which is positive-definite because its leading principal minors are all positive:

$$M_1 = 2 > 0, \quad M_2 = \begin{vmatrix} 2 & 2 \\ 2 & 4 \end{vmatrix} = 8 - 4 = 4 > 0, \quad M_3 = \left| \mathbf{H} \right| = 2 (8 - 4) = 8 > 0 \tag{3.13d}$$

 
or

$$\left| \mathbf{H} - \lambda \mathbf{I} \right| = \begin{vmatrix} 2 - \lambda & 2 & 0 \\ 2 & 4 - \lambda & 0 \\ 0 & 0 & 2 - \lambda \end{vmatrix} = (2 - \lambda) \left[ (2 - \lambda)(4 - \lambda) - 4 \right] = 0 \tag{3.13e}$$

 
Solving Eq. 3.13e, we have λ = 2, 0.7639, and 5.2361, which are all positive. Hence, the Hessian matrix is positive-definite; therefore, the stationary point x∗ = [2.5, −1.5, 0]T is a minimum point, at which the function value is f(x∗) = 4.75.
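Because f in Eq. 3.13a is quadratic, setting its gradient to zero is a linear system whose coefficient matrix is the (constant) Hessian itself; the Python sketch below (an illustrative check with NumPy, not part of the original text) recovers the stationary point, the function value, and the eigenvalues:

```python
import numpy as np

# Stationary point of Eq. 3.13a: solve grad f = 0 as a linear system A x = b.
A = np.array([[2.0, 2.0, 0.0],
              [2.0, 4.0, 0.0],
              [0.0, 0.0, 2.0]])   # Hessian of Eq. 3.13c (constant for a quadratic)
b = np.array([2.0, -1.0, 0.0])    # constant terms of Eq. 3.13b moved to the right
x_star = np.linalg.solve(A, b)
print(x_star)                     # [ 2.5 -1.5  0. ]

def f(x):
    x1, x2, x3 = x
    return x1**2 + 2*x1*x2 + 2*x2**2 + x3**2 - 2*x1 + x2 + 8

print(f(x_star))                  # 4.75, the minimum value
print(np.linalg.eigvalsh(A))      # all positive: Hessian is positive-definite
```
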

3.3.2. Basic Concept of Design Optimization

For an optimization problem defined in Eq. 3.3, we find the design variable vector x that minimizes an objective function f(x) subject to the inequality constraints gi(x) ≤ 0, i = 1 to m; the equality constraints hj(x) = 0, j = 1 to p; and the side constraints xk^l ≤ xk ≤ xk^u, k = 1, n. In Eq. 3.5, we define the feasible set S, or feasible region, for a design problem as the collection of feasible designs. For unconstrained problems, the entire design space is feasible because there are no constraints. In general, the optimization problem is to find a point in the feasible region that gives a minimum value to the objective function. From a design perspective, in particular for solving Eq. 3.3, we state the following terms.
Global minimum: A function f(x) of n design variables has a global minimum at x∗ if the value of the function at x∗ ∈ S is less than or equal to the value of the function at any other point x in the feasible set S. That is,

$$f(\mathbf{x}^*) \le f(\mathbf{x}), \quad \forall\, \mathbf{x} \in S \tag{3.14}$$

If strict inequality holds for all x other than x∗ in Eq. 3.14, then x∗ is called a strong (strict) global minimum; otherwise, it is called a weak global minimum.
Local minimum: A function f(x) of n design variables has a local (or relative) minimum at x∗ ∈ S if the inequality of Eq. 3.14 holds for all x in a small neighborhood N (vicinity) of x∗ in the feasible set S. If strict inequality holds, then x∗ is called a strong (strict) local minimum; otherwise, it is called a weak local minimum.
Neighborhood N of point x∗ is defined as the set of points:

$$N = \left\{ \mathbf{x} \,\middle|\, \mathbf{x} \in S \text{ with } \left\| \mathbf{x} - \mathbf{x}^* \right\| < \delta \right\} \tag{3.15}$$

for some small δ > 0. Geometrically, it is a small feasible region around point x∗; for example, a sphere of radius δ when the number of design variables is n = 3.
Next, we illustrate the derivation of the necessary and sufficient conditions using Taylor's series expansion. For the time being, we assume unconstrained problems. In the next subsection, we extend the discussion to constrained problems.
Expanding the objective function f(x) about the point x∗ using Taylor's series, we have

$$f(\mathbf{x}) = f(\mathbf{x}^*) + \nabla f(\mathbf{x}^*)^T \Delta \mathbf{x} + \frac{1}{2} \Delta \mathbf{x}^T \mathbf{H}(\mathbf{x}^*) \Delta \mathbf{x} + R \tag{3.16}$$

where R is the remainder containing higher-order terms in Δx, and Δx = x − x∗. We define the increment Δf(x) as

$$\Delta f(\mathbf{x}) = f(\mathbf{x}) - f(\mathbf{x}^*) = \nabla f(\mathbf{x}^*)^T \Delta \mathbf{x} + \frac{1}{2} \Delta \mathbf{x}^T \mathbf{H}(\mathbf{x}^*) \Delta \mathbf{x} + R \tag{3.17}$$

If we assume a local minimum at x∗, then Δf must be nonnegative by the definition of a local minimum given in Eq. 3.14; that is, Δf ≥ 0.
Because Δx is small, the first-order term ∇f(x∗)TΔx dominates the other terms, and therefore Δf can be approximated as Δf(x) ≈ ∇f(x∗)TΔx. Note that Δf in this approximation can be positive or negative depending on the sign of the term ∇f(x∗)TΔx. Because Δx is arbitrary (a small increment in x), its components may be positive or negative. Therefore, Δf cannot be nonnegative for all possible Δx unless

$$\nabla f(\mathbf{x}^*) = 0 \tag{3.18}$$

In other words, the gradient of the function at x∗ must be zero. In component form, this necessary condition becomes

$$\frac{\partial f(\mathbf{x}^*)}{\partial x_i} = 0, \quad i = 1, n \tag{3.19}$$

Again, points satisfying Eq. 3.18 or Eq. 3.19 are called stationary points.
Considering the second term on the right-hand side of Eq. 3.17 evaluated at a stationary point x∗, the positivity of Δf is assured if

$$\Delta \mathbf{x}^T \mathbf{H}(\mathbf{x}^*) \Delta \mathbf{x} > 0 \tag{3.20}$$

for all Δx ≠ 0. This is true if the Hessian H(x∗) is a positive-definite matrix, which is then the sufficient condition for a local minimum of f(x) at x∗.
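The argument above can be observed numerically. The Python sketch below uses a simple quadratic chosen for illustration (it is not from the text): f(x1, x2) = x1² + x1x2 + x2², which is stationary at the origin with a positive-definite Hessian, so for a quadratic the expansion of Eq. 3.17 is exact and every small perturbation increases f:

```python
import numpy as np

# Illustrative quadratic: f(x) = x1^2 + x1*x2 + x2^2, stationary at the origin.
def f(x):
    x1, x2 = x
    return x1**2 + x1*x2 + x2**2

H = np.array([[2.0, 1.0],
              [1.0, 2.0]])   # constant Hessian; eigenvalues 1 and 3, both positive

x_star = np.zeros(2)
rng = np.random.default_rng(42)
for _ in range(200):
    dx = 1e-3 * rng.standard_normal(2)
    delta_f = f(x_star + dx) - f(x_star)
    # Eq. 3.17 with grad f(x*) = 0 and R = 0 for a quadratic:
    assert abs(delta_f - 0.5 * dx @ H @ dx) < 1e-15
    assert delta_f > 0   # positive-definite H: every perturbation increases f
print("all perturbations increase f")
```
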

3.3.3. Lagrange Multipliers

We begin the discussion of optimality conditions for constrained problems by including only the equality constraints in the formulation in this section; that is, inequalities in Eq. 3.3b are ignored temporarily. More specifically, the optimization problem is restated as

$$\text{Minimize:} \quad f(\mathbf{x}) \tag{3.21a}$$

$$\text{Subject to:} \quad h_j(\mathbf{x}) = 0, \quad j = 1, p \tag{3.21b}$$

$$x_k^{\ell} \le x_k \le x_k^{u}, \quad k = 1, n \tag{3.21c}$$

The reason is that the nature of equality constraints is quite different from that of inequality constraints. Equality constraints are always active for any feasible design, whereas an inequality constraint may not be active at a feasible point. This changes the nature of the necessary conditions for the problem when inequalities are included.
A common approach for dealing with equality constraints is to introduce scalar multipliers associated with each constraint, called Lagrange multipliers. These multipliers play a prominent role in optimization theory as well as in numerical methods, in which a constrained problem is converted into an unconstrained problem that can be solved by using optimality conditions or numerical algorithms specifically developed for them. The values of the multipliers depend on the form of the objective and constraint functions. If these functions change, the values of the Lagrange multipliers also change.
Through Lagrange multipliers, the constrained problem (with equality constraints) shown in Eq. 3.21 is converted into an unconstrained problem as

$$L(\mathbf{x}, \boldsymbol{\lambda}) = f(\mathbf{x}) + \sum_{j=1}^{p} \lambda_j h_j(\mathbf{x}) = f(\mathbf{x}) + \boldsymbol{\lambda}^T \mathbf{h}(\mathbf{x}) \tag{3.22}$$

which is called a Lagrangian function, or simply Lagrangian. If we expand the vector of design variables to include the Lagrange multipliers, then the necessary and sufficient conditions of a local minimum discussed in the previous subsection are applicable to the problem defined in Eq. 3.22.
Before discussing the optimality conditions, we define an important term, the regular point. Consider the constrained optimization problem defined in Eq. 3.21. A point x∗ satisfying the constraint functions h(x∗) = 0 is said to be a regular point of the feasible set if the objective f(x∗) is differentiable and the gradient vectors of all constraints at the point x∗ are linearly independent. Linear independence means that no two gradients are parallel to each other, and no gradient can be expressed as a linear combination of the others. When inequality constraints are included in the problem definition, then for a point to be regular, the gradients of all the active constraints must also be linearly independent.
The necessary condition (or Lagrange multiplier theorem) is stated next.
Consider the optimization problem defined in Eq. 3.21. Let x∗ be a regular point that is a local minimum for the problem. Then, there exist unique Lagrange multipliers λj∗, j = 1, p such that

$$\nabla L(\mathbf{x}^*, \boldsymbol{\lambda}^*) = 0; \quad \text{that is,} \quad \frac{\partial L(\mathbf{x}^*, \boldsymbol{\lambda}^*)}{\partial x_i} = 0; \quad i = 1, n \tag{3.23}$$

Differentiating the Lagrangian L(x, λ) with respect to λj, we recover the equality constraints as

$$\frac{\partial L(\mathbf{x}^*, \boldsymbol{\lambda}^*)}{\partial \lambda_j} = 0 \;\Rightarrow\; h_j(\mathbf{x}^*) = 0; \quad j = 1, p \tag{3.24}$$

The gradient conditions of Eqs 3.23 and 3.24 show that the Lagrangian is stationary with respect to both x and λ. Therefore, it may be treated as an unconstrained function in the variables x and λ to determine the stationary points. Note that any point that does not satisfy the conditions cannot be a local minimum point. However, a point satisfying the conditions need not be a minimum point either. It is simply a candidate minimum point, which can actually be an inflection or maximum point.
The second-order necessary and sufficient conditions, similar to those of Eq. 3.20 but with a Hessian matrix that includes terms of the Lagrange multipliers, can distinguish between minimum, maximum, and inflection points. More specifically, a sufficient condition for f(x) to have a local minimum at x∗ is that each root ε of the polynomial defined by the following determinant equation be positive:

$$\begin{vmatrix} \mathbf{L} - \varepsilon \mathbf{I} & \mathbf{G}^T \\ \mathbf{G} & \mathbf{0} \end{vmatrix} = 0 \tag{3.25}$$

where

$$\mathbf{L} - \varepsilon \mathbf{I} = \begin{bmatrix} L_{11} - \varepsilon & L_{12} & \cdots & L_{1n} \\ L_{21} & L_{22} - \varepsilon & \cdots & L_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ L_{n1} & L_{n2} & \cdots & L_{nn} - \varepsilon \end{bmatrix}_{n \times n} \tag{3.26}$$

and

$$\mathbf{G} = \begin{bmatrix} g_{11} & g_{12} & \cdots & g_{1n} \\ g_{21} & g_{22} & \cdots & g_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ g_{m1} & g_{m2} & \cdots & g_{mn} \end{bmatrix}_{m \times n} \tag{3.27}$$

Note that Lij is the second partial derivative of the Lagrangian L with respect to xi and xj; that is, Lij = ∂²L(x∗, λ∗)/∂xi∂xj, i, j = 1, n; and gpq is the partial derivative of the pth constraint function with respect to xq; that is, gpq = ∂gp(x∗)/∂xq, p = 1, m and q = 1, n.
EXAMPLE 3.4
Find the optimal solution for the following problem:

$$\text{Minimize:} \quad f(\mathbf{x}) = 3 x_1^2 + 6 x_1 x_2 + 5 x_2^2 + 7 x_1 + 5 x_2 \tag{3.28a}$$

 

$$\text{Subject to:} \quad g(\mathbf{x}) = x_1 + x_2 - 5 = 0 \tag{3.28b}$$

 
Solution
Define the Lagrangian as

$$L(\mathbf{x}, \lambda) = f(\mathbf{x}) + \lambda g(\mathbf{x}) = \left( 3 x_1^2 + 6 x_1 x_2 + 5 x_2^2 + 7 x_1 + 5 x_2 \right) + \lambda \left( x_1 + x_2 - 5 \right)$$

 
Taking derivatives of L(x, λ) with respect to x1, x2, and λ, respectively, we have

$$\partial L / \partial x_1 = 6 x_1 + 6 x_2 + 7 + \lambda = 0$$

 
From Eq. 3.28b, x1 + x2 = 5, so 6(x1 + x2) = 6(5) = 30 = −λ − 7. Therefore, λ = −37.
Also, ∂L/∂x2 = 6x1 + 10x2 + 5 + λ = 0, which can be written as 6(x1 + x2) + 4x2 + 5 + λ = 6(5) + 4x2 + 5 − 37 = 0. Hence x2 = 0.5 and x1 = 4.5.
We obtain the stationary point x∗ = [4.5, 0.5]T and λ∗ = −37. Next, we check the sufficient condition of Eq. 3.25; that is, for this example, we have

$$\begin{vmatrix} L_{11} - \varepsilon & L_{12} & g_{11} \\ L_{21} & L_{22} - \varepsilon & g_{12} \\ g_{11} & g_{12} & 0 \end{vmatrix} = 0$$

 
in which L11 = ∂²L/∂x1² = 6, L12 = L21 = ∂²L/∂x1∂x2 = 6, L22 = ∂²L/∂x2² = 10, g11 = ∂g/∂x1 = 1, and g12 = ∂g/∂x2 = 1, all evaluated at (x∗, λ∗). Hence the determinant equation becomes

$$\begin{vmatrix} 6 - \varepsilon & 6 & 1 \\ 6 & 10 - \varepsilon & 1 \\ 1 & 1 & 0 \end{vmatrix} = -(6 - \varepsilon) + 6 + (\varepsilon - 4) = 2\varepsilon - 4 = 0$$

 
Therefore, ε = 2. Because ε is positive, x∗ and λ∗ correspond to a minimum.
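The stationarity conditions of this example can be verified in a few lines; the Python sketch below (an illustrative check, not part of the original text) substitutes x∗ = (4.5, 0.5) and λ∗ = −37 back into the derivatives of the Lagrangian:

```python
# Check the stationarity conditions of Example 3.4 at the computed solution.
x1, x2, lam = 4.5, 0.5, -37.0

dL_dx1 = 6*x1 + 6*x2 + 7 + lam    # dL/dx1
dL_dx2 = 6*x1 + 10*x2 + 5 + lam   # dL/dx2
dL_dlam = x1 + x2 - 5             # dL/dlambda recovers the equality constraint

print(dL_dx1, dL_dx2, dL_dlam)    # 0.0 0.0 0.0

f = 3*x1**2 + 6*x1*x2 + 5*x2**2 + 7*x1 + 5*x2
print(f)                          # 109.5, the objective value at the minimum
```
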

3.3.4. Karush–Kuhn–Tucker Conditions

Next, we extend the Lagrange multiplier to include inequality constraints and consider the general optimization problem defined in Eq. 3.3.
We first transform an inequality constraint into an equality constraint by adding a new variable to it, called the slack variable. Because the constraint is of the form “≤”, its value is either negative or zero at a feasible point. Thus, the slack variable must always be nonnegative (i.e., positive or zero) to make the inequality an equality.
An inequality constraint gi(x) ≤ 0 is equivalent to the equality constraint

$$G_i(\mathbf{x}) = g_i(\mathbf{x}) + s_i^2 = 0 \tag{3.29}$$

where si is a slack variable. The variables si are treated as unknowns of the design problem along with the design variables. Their values are determined as a part of the solution. When the variable si has zero value, the corresponding inequality constraint is satisfied at equality. Such an inequality is called an active (or tight) constraint; that is, there is no “slack” in the constraint. For any si ≠ 0, the corresponding constraint is a strict inequality. It is called an inactive constraint, and has slack given by si2.
Note that once a design point is specified, Eq. 3.29 can be used to calculate the slack variable as si² = −gi(x). If the constraint is satisfied at the point (i.e., gi(x) ≤ 0), then si² ≥ 0. If it is violated, then si² is negative, which is not acceptable; that is, the point is not feasible and hence is not a candidate minimum point.
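This bookkeeping can be sketched in a few lines of Python (an illustration of Eq. 3.29, not part of the original text; the helper names are hypothetical):

```python
# Slack-variable bookkeeping of Eq. 3.29: s_i^2 = -g_i(x).
# A negative s^2 flags a violated constraint; zero flags an active one.

def slack_squared(g_value):
    return -g_value

def status(g_value, tol=1e-9):
    s2 = slack_squared(g_value)
    if s2 < -tol:
        return "violated"    # g > 0: infeasible, not a candidate point
    if abs(s2) <= tol:
        return "active"      # g = 0: constraint satisfied at equality
    return "inactive"        # g < 0: strict inequality, slack s^2 > 0

print(status(-4.0))  # inactive, with slack s^2 = 4
print(status(0.0))   # active
print(status(1.5))   # violated: s^2 would be negative
```
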
Similar to that of Section 3.3.3, through Lagrange multipliers, the constrained problem (with equality and inequality constraints) defined in Eq. 3.3 is converted into an unconstrained problem as

$$L(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\mu}, \mathbf{s}) = f(\mathbf{x}) + \sum_{i=1}^{p} \lambda_i h_i(\mathbf{x}) + \sum_{j=1}^{m} \mu_j \left( g_j(\mathbf{x}) + s_j^2 \right) = f(\mathbf{x}) + \boldsymbol{\lambda}^T \mathbf{h}(\mathbf{x}) + \boldsymbol{\mu}^T \mathbf{G}(\mathbf{x}) \tag{3.30}$$

If we expand the vector of design variables to include the Lagrange multipliers λ and μ, and the slack variables s, then the necessary and sufficient conditions of a local minimum discussed in the previous subsection are applicable to the unconstrained problem defined in Eq. 3.30.
Note that derivatives of the Lagrangian L with respect to x and λ lead to Eqs 3.23 and 3.24, respectively. On the other hand, the derivatives of L with respect to μ yield converted equality constraints of Eq. 3.29. Furthermore, the derivatives of L with respect to s yield

$$\mu_j s_j = 0, \quad j = 1, m \tag{3.31}$$

which is an additional necessary condition. Furthermore, the Lagrange multipliers of "≤ type" constraints must satisfy the nonnegativity condition

$$\mu_j \ge 0; \quad j = 1, m \tag{3.32}$$

where μj is the Lagrange multiplier for the jth inequality constraint. Equation 3.32 is referred to as the nonnegativity of Lagrange multipliers (Arora, 2012).
The necessary conditions for the constrained problem with equality and inequality constraints defined in Eq. 3.3 can be summed up in what are commonly known as the KKT first-order necessary conditions.
Karush–Kuhn–Tucker Necessary Conditions: Let x∗ be a regular point of the feasible set that is a local minimum for f(x), subject to hi(x) = 0, i = 1, p; gj(x) ≤ 0, j = 1, m. Then there exist Lagrange multipliers λ∗ (a p-vector) and μ∗ (an m-vector) such that the Lagrangian L is stationary with respect to xk, λi, μj, and sj at the point x∗; that is:
1. Stationarity:

$$\nabla L(\mathbf{x}^*, \boldsymbol{\lambda}^*, \boldsymbol{\mu}^*, \mathbf{s}^*) = 0 \tag{3.33a}$$

    or

$$\frac{\partial L}{\partial x_k} = \frac{\partial f}{\partial x_k} + \sum_{i=1}^{p} \lambda_i \frac{\partial h_i}{\partial x_k} + \sum_{j=1}^{m} \mu_j \frac{\partial g_j}{\partial x_k} = 0; \quad k = 1, n \tag{3.33b}$$

2. Equality constraints:

$$\frac{\partial L}{\partial \lambda_i} = 0 \;\Rightarrow\; h_i(\mathbf{x}^*) = 0; \quad i = 1, p \tag{3.34}$$

3. Inequality constraints (or complementary slackness condition):

$$\frac{\partial L}{\partial \mu_j} = 0 \;\Rightarrow\; g_j(\mathbf{x}^*) + s_j^2 = 0; \quad j = 1, m \tag{3.35}$$

$$\frac{\partial L}{\partial s_j} = 0 \;\Rightarrow\; 2 \mu_j s_j = 0; \quad j = 1, m \tag{3.36}$$

In addition, the multipliers of the inequality constraints must be nonnegative (Eq. 3.32), and gradients of the active constraints must be linearly independent. In such a case, the Lagrange multipliers for the constraints are unique.
EXAMPLE 3.5
Solve the following optimization problem using the KKT conditions.

$$\text{Minimize:} \quad f(\mathbf{x}) = -2 x_1 - 2 x_2 + \frac{1}{2} \left( x_1^2 + x_2^2 \right) \tag{3.37a}$$

 

$$\text{Subject to:} \quad g_1(\mathbf{x}) = 3 x_1 + x_2 - 6 \le 0 \tag{3.37b}$$

 

$$h_1(\mathbf{x}) = x_1 - x_2 = 0 \tag{3.37c}$$

 

$$0 \le x_1, \quad 0 \le x_2 \tag{3.37d}$$

 
Solutions
Using Eq. 3.30, we state the Lagrangian of the problem as

$$L = -2 x_1 - 2 x_2 + \frac{1}{2} \left( x_1^2 + x_2^2 \right) + \lambda_1 \left( x_1 - x_2 \right) + \mu_1 \left( 3 x_1 + x_2 - 6 + s_1^2 \right) + \mu_2 \left( -x_1 + s_2^2 \right) + \mu_3 \left( -x_2 + s_3^2 \right) \tag{3.37e}$$

 
Taking derivatives of the Lagrangian with respect to x1, x2, λ1, μ1, μ2, μ3, s1, s2, and s3 and setting them to zero, we have

$$\frac{\partial L}{\partial x_1} = -2 + x_1 + \lambda_1 + 3 \mu_1 - \mu_2 = 0 \tag{3.37f}$$

 

$$\frac{\partial L}{\partial x_2} = -2 + x_2 - \lambda_1 + \mu_1 - \mu_3 = 0 \tag{3.37g}$$

 

$$\frac{\partial L}{\partial \lambda_1} = x_1 - x_2 = 0 \tag{3.37h}$$

 

$$\frac{\partial L}{\partial \mu_1} = 3 x_1 + x_2 - 6 + s_1^2 = 0 \tag{3.37i}$$

 

$$\frac{\partial L}{\partial \mu_2} = -x_1 + s_2^2 = 0 \tag{3.37j}$$

 

$$\frac{\partial L}{\partial \mu_3} = -x_2 + s_3^2 = 0 \tag{3.37k}$$

 

$$\frac{\partial L}{\partial s_1} = 2 \mu_1 s_1 = 0 \tag{3.37l}$$

 

$$\frac{\partial L}{\partial s_2} = 2 \mu_2 s_2 = 0 \tag{3.37m}$$

 

$$\frac{\partial L}{\partial s_3} = 2 \mu_3 s_3 = 0 \tag{3.37n}$$

 
Note that, as expected, Eqs 3.37h–3.37k reduce back to the constraint equations of the original optimization problem in Eqs 3.37b–3.37d. We have a total of nine equations (Eqs 3.37f–3.37n) in nine unknowns. Not all nine equations are linear, and it is not guaranteed that all nine unknowns can be solved uniquely from them. As discussed before, these equations are solved by considering different cases. In this example, the equality constraint must always be satisfied; hence, from Eq. 3.37h, x1 = x2. Next, we make assumptions and proceed with different cases.
Case 1: We assume that the inequality constraint, Eq. 3.37i, is active; hence, s1 = 0. Substituting x1 = x2 into the same equation, we have

$$3 x_1 + x_2 - 6 = 4 x_1 - 6 = 0, \quad \text{hence} \; x_1 = x_2 = 1.5,$$

 
which implies that the side constraints are not active; hence, from Eqs 3.37j and 3.37k, s2 ≠ 0 and s3 ≠ 0, implying μ2 = μ3 = 0 from Eqs 3.37m and 3.37n.
Solving for μ1 and λ1 from Eqs 3.37f and 3.37g, we have μ1 = 0.25 and λ1 = −0.25, which satisfy the nonnegativity condition of Eq. 3.32. Then, from Eq. 3.37a, the objective function is

$$f(1.5, 1.5) = -2(1.5) - 2(1.5) + \frac{1}{2} \left( 1.5^2 + 1.5^2 \right) = -3.75$$
image

 
Case 2: We assume that the side constraints, Eqs 3.37j and 3.37k, are active; hence, s2 = s3 = 0 and x1 = x2 = 0. From Eq. 3.37i, s1 ≠ 0; hence μ1 = 0 from Eq. 3.37l. No further assumption can be made logically. Equations 3.37f and 3.37g contain three unknowns, λ1, μ2, and μ3; they cannot be solved uniquely. If we further assume μ2 = 0, then we have λ1 = 2 and μ3 = −4; if we instead assume μ3 = 0, then λ1 = −2 and μ2 = −4. Either way a multiplier is negative, violating the nonnegativity condition of Eq. 3.32, so this case yields no valid candidate point. Indeed, from Eq. 3.37a, the objective function is

$$f(0, 0) = -2(0) - 2(0) + \frac{1}{2} \left( 0^2 + 0^2 \right) = 0,$$
image

 
which is greater than that of Case 1.
Case 3: We assume that the side constraint, Eq. 3.37j, is active; hence, s2 = 0 and x1 = 0. We assume constraints 3.37i and 3.37k are not active; hence, s1 ≠ 0 and s3 ≠ 0, implying μ1 = μ3 = 0. No further assumption can be made logically. Equations 3.37f and 3.37g contain three unknowns, λ1, μ2, and x2; they cannot be solved uniquely. If we further assume λ1 = 0, then we have x2 = 2 and μ2 = −2. Note that x1 ≠ x2 violates the equality constraint of Eq. 3.37h and μ2 < 0 violates Eq. 3.32, so this case also yields no valid candidate point. From Eq. 3.37a, the objective function at this point is

$$f(0, 2) = -2(0) - 2(2) + \frac{1}{2} \left( 0^2 + 2^2 \right) = -2,$$
image

 
which is greater than that of Case 1. We may proceed with a few more cases and find possible solutions. After exhausting all cases, the solution obtained in Case 1 gives the minimum value of the objective function.
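The Case 1 candidate can be verified directly; the Python sketch below (an illustrative check, not part of the original text) substitutes x = (1.5, 1.5), λ1 = −0.25, μ1 = 0.25, and μ2 = μ3 = 0 into the KKT conditions:

```python
# Check the Case 1 candidate of Example 3.5 against the KKT conditions.
x1, x2 = 1.5, 1.5
lam1, mu1, mu2, mu3 = -0.25, 0.25, 0.0, 0.0

stat1 = -2 + x1 + lam1 + 3*mu1 - mu2   # Eq. 3.37f
stat2 = -2 + x2 - lam1 + mu1 - mu3     # Eq. 3.37g
h1 = x1 - x2                           # equality constraint, Eq. 3.37c
g1 = 3*x1 + x2 - 6                     # inequality constraint, active (zero slack)

print(stat1, stat2, h1, g1)            # 0.0 0.0 0.0 0.0
f = -2*x1 - 2*x2 + 0.5*(x1**2 + x2**2)
print(f)                               # -3.75, the minimum objective value
print(all(m >= 0 for m in (mu1, mu2, mu3)))  # True: Eq. 3.32 satisfied
```
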
As seen in the example above, solving optimization problems using the KKT conditions is not straightforward, even for a simple problem. All possible cases must be identified and carefully examined. In addition, the sufficient condition, which involves second derivatives of the Lagrangian, is difficult to verify. Furthermore, for practical engineering design problems, objective and constraint functions are often not expressed explicitly in terms of the design variables, and taking derivatives analytically is not possible. Nevertheless, the KKT conditions serve well for understanding the concept of optimality.