12 Duality Revisited

12.1 Introduction

The material presented in this chapter is largely theoretical in nature and may be considered to be a more “refined” approach to the study of linear programming and duality. That is to say, the mathematical techniques employed herein are quite general in nature in that they encompass a significant portion of the foundations of linear (as well as nonlinear) programming. In particular, an assortment of mathematical concepts often encountered in the calculus, along with the standard matrix operations which normally underlie the theoretical development of linear programming, are utilized to derive the Karush‐Kuhn‐Tucker necessary conditions for a constrained extremum and to demonstrate the formal equivalence between a solution to the primal maximum problem and the associated saddle‐point problem. Additionally, the duality and complementary slackness theorems of the preceding chapter are reexamined in the context of this “alternative” view of the primal and dual problems.

12.2 A Reformulation of the Primal and Dual Problems

Turning to the primal problem, let us maximize f(X) = C′X subject to AX ≤ b, X ≥ O, X ∈ Eⁿ, where A is of order (m × n) with rows a₁, …, a_m, and b is an (m × 1) vector with components b₁, …, b_m. Alternatively, if

$$\bar{A} = \begin{bmatrix} A \\ -I_n \end{bmatrix}, \qquad \bar{b} = \begin{bmatrix} b \\ O \end{bmatrix}$$

are, respectively, of order ((m + n) × n) and ((m + n) × 1), with rows $\bar{a}_i$ and components $\bar{b}_i$, then we may maximize f(X) = C′X subject to $\bar{A}X \le \bar{b}$, where X ∈ Eⁿ is now unrestricted. In this formulation x_j ≥ 0, j = 1, …, n, is treated as a structural constraint, so that $K = \{X \mid \bar{A}X \le \bar{b}\}$ defines a region of feasible or admissible solutions K ⊆ Eⁿ.
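As a small illustration of this reformulation, the sketch below stacks the non-negativity conditions under the structural constraints so that X can be treated as unrestricted. The helper names (`augment`, `satisfies`) and the numerical data are hypothetical and chosen only for illustration; the sketch assumes the stacked system consists of the rows of A followed by the rows −x_j ≤ 0.

```python
def augment(A, b):
    """Stack the rows -x_j <= 0 under A X <= b (assumed form of the augmented system)."""
    n = len(A[0])
    neg_identity = [[-1 if i == j else 0 for j in range(n)] for i in range(n)]
    return A + neg_identity, b + [0] * n


def satisfies(A, b, X):
    """True when every row of A X <= b holds at X."""
    return all(sum(a * x for a, x in zip(row, X)) <= bi for row, bi in zip(A, b))


# Hypothetical data: three structural constraints in two variables.
A = [[1, 0], [0, 2], [3, 2]]
b = [4, 12, 18]
A_bar, b_bar = augment(A, b)

# With X unrestricted, the sign conditions are now enforced by the extra rows.
inside = satisfies(A_bar, b_bar, [2, 6])
outside = satisfies(A_bar, b_bar, [-1, 0])
```

Here `inside` is true and `outside` is false: the point (−1, 0) satisfies the original three rows but violates the appended row −x₁ ≤ 0.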

If X₀ yields an optimal solution to the primal maximum problem, then no permissible change in X (i.e. one that does not violate any of the constraints specifying K) can improve the optimal value of the objective function. How may we characterize such admissible changes? It is evident that if, starting at some feasible X, we can find a vector h such that a small movement along it violates no constraint, then h specifies a feasible direction at X. More formally, let δ(X₀) be a suitably restricted spherical δ‐neighborhood about the point X₀ ∈ Eⁿ. Then the (n × 1) vector h is a feasible direction at X₀ if X₀ + th is feasible for every scalar t with 0 ≤ t < δ. In this light, the set of feasible directions at X₀, D(X₀), is the collection of all (n × 1) vectors h such that X₀ + th is feasible for t sufficiently small, i.e.

$$D(X_0) = \{h \mid X_0 + th \in K,\ 0 \le t < \delta\}.$$

Here D(X₀) is the tangent support cone consisting of all feasible directions at X₀ and is generated by the tangent or supporting hyperplanes to K at X₀ (Figure 12.1). And since each such hyperplane (a_rX = b_r and a_sX = b_s in Figure 12.1) specifies a closed half‐space, the tangent support cone D(X₀) represents that portion of the intersection of these closed half‐spaces in the immediate vicinity of X₀.

**Figure 12.1** Set of feasible directions D(X₀): the cone lying below the tangent hyperplanes a_rX = b_r and a_sX = b_s, which intersect at X₀.

To discern the relationship between D(X₀) (X₀ optimal) and: (i) f(X₀); (ii) the constraints defining K, let us consider the following two theorems, starting with Theorem 12.1.

Hence f cannot increase along any feasible direction h, i.e. C′h ≤ 0 for every h ∈ D(X₀). Geometrically, no feasible direction may form an acute angle (< π/2) with C.

To set the stage for our next theorem, we note that for any optimal X₀ ∈ K, it is usually the case that not all the inequality constraints are binding, i.e. hold as an equality. To incorporate this observation into our discussion, let us divide the constraints into two mutually exclusive classes: those that are binding at X₀ and those that are not. So if

$$J = \{i \mid \bar{a}_i X_0 = \bar{b}_i\}$$

depicts the index set of binding constraints (with $\bar{a}_i$ the ith row of the full constraint matrix, non‐negativity rows included, and $\bar{b}_i$ the matching right‐hand side), then $\bar{a}_i X_0 < \bar{b}_i$ for i ∉ J. In this regard, we have Theorem 12.2.

To interpret this theorem, we note that $-\bar{a}_i'$, the gradient of the ith binding constraint written as $\bar{b}_i - \bar{a}_i X \ge 0$, may be characterized as an inward‐pointing or interior normal to the boundary of K at X₀. So if h is a feasible direction, it makes a nonobtuse angle (≤ π/2) with all of the interior normals to the boundary of K at X₀. Geometrically, these interior normals, or gradients of the binding constraints, form a finite cone whose generators make nonobtuse angles with every feasible direction. Such a cone is polar to the tangent support cone D(X₀) and will be termed the polar support cone D(X₀)⁺ (Figure 12.2a). Thus, D(X₀)⁺ is the cone spanned by the gradients $-\bar{a}_i'$ such that $-\bar{a}_i h \ge 0$ for h ∈ D(X₀), i ∈ J. Looked at from another perspective, $\bar{a}_i'$ may be considered an outward‐pointing or exterior normal to the boundary of K at X₀. In this instance, if h is a feasible direction, it must now make a nonacute angle (≥ π/2) with all of the outward‐pointing normals to the boundary of K at X₀. Again looking to geometry, the exterior normals, or negative gradients of the binding constraints, form a finite cone whose generators make nonacute angles with every feasible direction. This cone is the dual of the tangent support cone D(X₀) and is termed the dual support cone D(X₀)^* (Figure 12.2b). Thus, D(X₀)^* is the cone spanned by the negative gradients $\bar{a}_i'$ such that, for all h ∈ D(X₀), $\bar{a}_i h \le 0$, i ∈ J.

**Figure 12.2** (a) Polar support cone D(X₀)⁺; (b) Dual support cone D(X₀)^*.
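The relationship between feasible directions and the binding constraints can be checked numerically. The sketch below is a minimal illustration under stated assumptions: the feasible region is polyhedral, a direction h is treated as feasible exactly when it does not increase the left‐hand side of any binding row, and the function names and data are hypothetical.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def binding_rows(A_bar, b_bar, X0, tol=1e-9):
    """Indices i with a_i X0 = b_i (the binding constraints at X0)."""
    return [i for i, (row, bi) in enumerate(zip(A_bar, b_bar))
            if abs(dot(row, X0) - bi) <= tol]


def is_feasible_direction(A_bar, b_bar, X0, h, tol=1e-9):
    """Assumed test: h is feasible at X0 when a_i h <= 0 for every binding row i."""
    return all(dot(A_bar[i], h) <= tol for i in binding_rows(A_bar, b_bar, X0, tol))


# Hypothetical data: x1 <= 4, 2 x2 <= 12, 3 x1 + 2 x2 <= 18, plus -x1 <= 0, -x2 <= 0.
A_bar = [[1, 0], [0, 2], [3, 2], [-1, 0], [0, -1]]
b_bar = [4, 12, 18, 0, 0]
C = [3, 5]
X0 = [2, 6]  # optimal vertex of this example

J = binding_rows(A_bar, b_bar, X0)
feasible_h = [h for h in ([-1, 0], [1, -2], [1, -1], [0, 1])
              if is_feasible_direction(A_bar, b_bar, X0, h)]
# At an optimum, the objective should not increase along any feasible direction.
no_improvement = all(dot(C, h) <= 0 for h in feasible_h)
```

In this example only the second and third rows bind at X₀, and the two directions that pass the test both give C′h ≤ 0.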

At this point, let us collect our major results. We found that if f(X) subject to $\bar{a}_i X \le \bar{b}_i$, i ∈ J, attains its maximum at X₀ (here $\bar{a}_i$ and $\bar{b}_i$ denote the ith row and right‐hand side of the full constraint system), then

$$C'h \le 0 \ \text{ for all } h \text{ satisfying } \bar{a}_i h \le 0,\ i \in J. \tag{12.1}$$

How may we interpret this condition? Given any h ∈ D(X₀), (12.1) holds if the gradient of f, namely C, lies within the finite cone spanned by the exterior normals $\bar{a}_i'$, i ∈ J, i.e. C ∈ D(X₀)^* (or if −C is contained within the finite cone generated by the interior normals $-\bar{a}_i'$, i ∈ J, i.e. −C ∈ D(X₀)⁺). Hence (12.1) requires that the gradient of f be a nonnegative linear combination of the negative gradients of the binding constraints at X₀ (Figure 12.3). In this regard, there must exist real numbers $\bar{u}_i^0 \ge 0$, i ∈ J, such that

$$C = \sum_{i \in J} \bar{u}_i^0\, \bar{a}_i'. \tag{12.2}$$

**Figure 12.3** C ∈ D(X₀)^*: the gradient C lies between the exterior normals a′_r and a′_s at X₀.

Under what conditions will the numbers $\bar{u}_i^0 \ge 0$, i ∈ J, exist? To answer this question, let us employ Theorem 12.3.

Theorem 12.3

Minkowski‐Farkas Theorem. A necessary and sufficient condition for the m‐component vector V to lie within the finite cone spanned by the columns of the (m × n) matrix B is that V′Y ≤ 0 for all Y satisfying B′Y ≤ O, i.e. there exists an n‐component vector λ ≥ O such that Bλ = V if and only if V′Y ≤ 0 for all Y satisfying B′Y ≤ O.

Proof. (sufficiency) If Bλ = V, λ ≥ O, then λ′B′ = V′ and λ′B′Y = V′Y ≤ 0 for all Y for which B′Y ≤ O. (necessity) If V′Y ≤ 0 for each Y satisfying B′Y ≤ O, then the (primal) linear programming problem

$$\max\ V'Y \quad \text{s.t.} \quad B'Y \le O,\ Y \text{ unrestricted}$$

has as its optimal solution Y = O. Hence, the dual problem

$$\min\ O'\lambda \quad \text{s.t.} \quad B\lambda = V,\ \lambda \ge O$$

also has an optimal solution by Duality Theorem 6.4. Hence there exists at least one λ ≥ O satisfying Bλ = V. Q.E.D.
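The two alternatives in the Minkowski‐Farkas theorem can be seen in a two‐dimensional example. The sketch below is only an illustration under stated assumptions: B has exactly two linearly independent columns, so the weights λ follow from Cramer's rule, and the function names, the data, and the certificate vector Y are hypothetical choices.

```python
from fractions import Fraction


def cone_weights_2d(col1, col2, V):
    """Solve lam1*col1 + lam2*col2 = V by Cramer's rule (independent columns assumed)."""
    det = col1[0] * col2[1] - col2[0] * col1[1]
    if det == 0:
        raise ValueError("columns are linearly dependent")
    lam1 = Fraction(V[0] * col2[1] - col2[0] * V[1], det)
    lam2 = Fraction(col1[0] * V[1] - V[0] * col1[1], det)
    return lam1, lam2


def in_cone_2d(col1, col2, V):
    """V lies in the cone spanned by the two columns when both weights are nonnegative."""
    return all(lam >= 0 for lam in cone_weights_2d(col1, col2, V))


# Hypothetical columns of B: normals (0, 2)' and (3, 2)' of two binding constraints.
col1, col2 = (0, 2), (3, 2)

weights = cone_weights_2d(col1, col2, (3, 5))   # V inside the cone
outside = in_cone_2d(col1, col2, (5, 1))        # V outside the cone

# When V is outside the cone, some Y with B'Y <= O has V'Y > 0.
Y = (1, -2)
BtY = (col1[0] * Y[0] + col1[1] * Y[1], col2[0] * Y[0] + col2[1] * Y[1])
VtY = 5 * Y[0] + 1 * Y[1]
```

For V = (3, 5) the weights are 3/2 and 1, both nonnegative. For V = (5, 1) one weight is negative, and Y = (1, −2) satisfies B′Y ≤ O while V′Y = 3 > 0, which is the violation the theorem predicts.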

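Returning now to our previous question, in terms of the notation used above, if: C = V; the vectors $\bar{a}_i'$, i ∈ J, are taken to be the columns of B (i.e. $B = [\bar{a}_i']$, i ∈ J); and the $\bar{u}_i^0$, i ∈ J, are the elements of λ ≥ O, then a necessary and sufficient condition for C to lie within the finite cone generated by the vectors $\bar{a}_i'$ is that C′h ≤ 0 for all h satisfying $\bar{a}_i h \le 0$, i ∈ J. Hence, there exist real numbers $\bar{u}_i^0 \ge 0$, i ∈ J, such that (12.2) holds.

We may incorporate all of the constraints (active or otherwise) into our discussion by defining the scalar $\bar{u}_i^0$ as zero whenever $\bar{a}_i X_0 < \bar{b}_i$, i.e. i ∉ J. Then (12.2) is equivalent to the system

$$C - \bar{A}'\bar{U}_0 = O,\quad \bar{U}_0'(\bar{b} - \bar{A}X_0) = 0,\quad \bar{b} - \bar{A}X_0 \ge O,\quad \bar{U}_0 \ge O, \tag{12.3}$$

where $\bar{A}$ and $\bar{b}$ are the augmented constraint matrix and right‐hand side and $\bar{U}_0' = (\bar{u}_1^0, \dots, \bar{u}_{m+n}^0)$. Note that $\bar{u}_i^0(\bar{b}_i - \bar{a}_i X_0) = 0$ for all i values since, if $\bar{b}_i - \bar{a}_i X_0 > 0$, then $\bar{u}_i^0 = 0$, while if $\bar{b}_i - \bar{a}_i X_0 = 0$, then $\bar{u}_i^0 \ge 0$.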

When we formed the structural constraint inequality $\bar{A}X \le \bar{b}$, the n non‐negativity conditions X ≥ O were treated as structural constraints. That is, x_j ≥ 0 was converted to $\bar{b}_{m+j} - \bar{a}_{m+j}X = x_j \ge 0$, j = 1, …, n. However, if the non‐negativity conditions are not written in this fashion, but appear explicitly as X ≥ O, then (12.3) may be rewritten in an equivalent form provided by Theorem 12.4.

Theorem 12.4

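Karush‐Kuhn‐Tucker Theorem for Linear Programs (A Necessary and Sufficient Condition). The point X₀ ∈ Eⁿ solves the primal problem

$$\max\ f(X) = C'X \quad \text{s.t.} \quad b - AX \ge O,\ X \ge O$$

if and only if

$$\begin{aligned}
&\text{(a)}\ C - A'U_0 \le O, &&\text{(b)}\ X_0'(C - A'U_0) = 0, &&\text{(c)}\ U_0'(b - AX_0) = 0,\\
&\text{(d)}\ b - AX_0 \ge O, &&\text{(e)}\ U_0 \ge O, &&\text{(f)}\ X_0 \ge O,
\end{aligned} \tag{12.4}$$

where U₀ is an (m × 1) vector with components u₁⁰, …, u_m⁰.²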

Proof. (necessity) to solve the primal problem, let us employ the technique of Lagrange. In this regard, we first convert each of the inequality constraints to an equality by subtracting, from its left‐hand side, a nonnegative slack variable. (Actually, since we require that each slack variable be nonnegative, its square will be subtracted from the left‐hand side to ensure its nonnegativity.) That is, b_i − a_iX ≥ 0 is transformed to while x_j ≥ 0 is converted to where are all squares of slack variables. Hence, the primal Lagrangian appears as

where u₁, …, u_m, v₁, …, v_n are Lagrange multipliers. In matrix terms, the primal Lagrangian may be written as

where

Here are of order (m × 1) while have order (n × 1). Then

(12.5)

Let us first transform (12.5b, c) to respectively. Then, from and (12.5d) we obtain, at That is, if the constraint b_i − a_iX ≥ 0 is not binding at X₀, then while if it is binding, Hence, at least one of each pair vanishes, thus guaranteeing that Next, combining and (12.5e) yields X′V = 0. From (12.5a) we have −V = C − A′U. Then, at X₀ and U₀, these latter two expressions yield In this regard, either allowing c_j to be less than or in which case may be positive. To see this, let us assume, without loss of generality, that the first k < m structural constraints are binding at X₀. Then while Now, if the primal problem possesses a solution at X₀, C′h ≤ 0 for those vectors h that do not violate a_ih = 0, i = 1, …, k. If we multiply each equality a_ih = 0 by some constant and form the sum we ultimately obtain

(12.6)

Let us further assume that at least one component of is strictly positive (and thus ). Then there exists a set of sufficiently small positive and negative deviations h_l = δ, − δ such that, for h_l = δ > 0, (12.6) becomes

(12.7)

while for h_l = − δ, (12.6) may be written as

(12.8)

whence, upon combining (12.7), (12.8),

If (so that ), h_l can only be positive. Hence h_l = δ > 0 in (12.6) yields

In general then,

(sufficiency) For X∈ K, let us express the Lagrangian of f as

With L(X, U₀, V₀) concave in X over K, the linear approximation to L at X₀ is

(12.9)

Hence C′X₀ ≥ L(X, U₀, V₀) ≥ C′X, X∈ K, and thus X₀ solves the primal problem. Q.E.D.
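A candidate pair (X₀, U₀) can be tested against the Karush‐Kuhn‐Tucker conditions directly. The sketch below is a minimal check, assuming the six conditions take the usual form for max C′X subject to AX ≤ b, X ≥ O, with the labels (a) to (f) assigned as in the code comments; the function name `kkt_report` and the data are hypothetical.

```python
def kkt_report(A, b, C, X0, U0, tol=1e-9):
    """Evaluate each assumed KKT condition for max C'X s.t. AX <= b, X >= O."""
    m, n = len(A), len(C)
    reduced = [C[j] - sum(A[i][j] * U0[i] for i in range(m)) for j in range(n)]  # C - A'U0
    slack = [b[i] - sum(A[i][j] * X0[j] for j in range(n)) for i in range(m)]    # b - A X0
    return {
        "a": all(r <= tol for r in reduced),                            # C - A'U0 <= O
        "b": abs(sum(x * r for x, r in zip(X0, reduced))) <= tol,       # X0'(C - A'U0) = 0
        "c": abs(sum(u * s for u, s in zip(U0, slack))) <= tol,         # U0'(b - A X0) = 0
        "d": all(s >= -tol for s in slack),                             # b - A X0 >= O
        "e": all(u >= -tol for u in U0),                                # U0 >= O
        "f": all(x >= -tol for x in X0),                                # X0 >= O
    }


# Hypothetical example: max 3x1 + 5x2 s.t. x1 <= 4, 2x2 <= 12, 3x1 + 2x2 <= 18, X >= O.
A = [[1, 0], [0, 2], [3, 2]]
b = [4, 12, 18]
C = [3, 5]

at_optimum = kkt_report(A, b, C, X0=[2, 6], U0=[0, 1.5, 1])
at_other_vertex = kkt_report(A, b, C, X0=[4, 3], U0=[0, 1.5, 1])
```

All six flags are true at (2, 6). At the feasible but non‐optimal vertex (4, 3), paired with the same multipliers, condition (c) fails because the second constraint is slack while its multiplier is positive.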

We note briefly that if the primal problem appears as

Then (12.4) becomes

(12.4.1)

We now turn to a set of important corollaries to the preceding theorem (Corollaries 12.1, 12.2, and 12.3).

Next,

Finally,

So if the right‐hand side b_i of the ith primal structural constraint were changed by a sufficiently small amount ε_i, then the corresponding change in the optimal value of f, f⁰, would be df⁰ = u_i⁰ε_i, where u_i⁰ is the ith component of U₀. Moreover, since $df^0 = \sum_{i=1}^{m} (\partial f^0/\partial b_i)\,db_i = \sum_{i=1}^{m} u_i^0\,db_i$, we may determine the amount by which the optimal value of the primal objective function changes given small (marginal) changes in the right‐hand sides of any specific subset of structural constraints. For instance, let us assume, without loss of generality, that the right‐hand sides of the first t < m primal structural constraints are increased by sufficiently small increments ε₁, …, ε_t, respectively. Then the optimal value of the primal objective function would be increased by

$$df^0 = \sum_{i=1}^{t} u_i^0\,\varepsilon_i.$$
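The marginal interpretation of the dual variables can be checked on a small problem. The sketch below assumes a two‐variable problem with a bounded feasible region, which it solves by brute‐force vertex enumeration (a technique swapped in for brevity, not one taken from the text). The function name `solve_lp_2d`, the data, and the dual solution U₀ = (0, 1.5, 1) are hypothetical inputs for this example.

```python
from itertools import combinations


def solve_lp_2d(A, b, C, tol=1e-9):
    """Maximize C'X s.t. AX <= b, X >= O in two variables by enumerating vertices."""
    rows = [list(r) for r in A] + [[-1, 0], [0, -1]]
    rhs = list(b) + [0, 0]
    best = None
    for i, j in combinations(range(len(rows)), 2):
        det = rows[i][0] * rows[j][1] - rows[i][1] * rows[j][0]
        if abs(det) < tol:
            continue  # parallel lines, no intersection point
        x = (rhs[i] * rows[j][1] - rows[i][1] * rhs[j]) / det
        y = (rows[i][0] * rhs[j] - rhs[i] * rows[j][0]) / det
        if all(r[0] * x + r[1] * y <= bi + tol for r, bi in zip(rows, rhs)):
            value = C[0] * x + C[1] * y
            if best is None or value > best[0]:
                best = (value, (x, y))
    return best


A = [[1, 0], [0, 2], [3, 2]]
b = [4, 12, 18]
C = [3, 5]
U0 = [0, 1.5, 1]  # assumed optimal dual solution for this data

f0, X0 = solve_lp_2d(A, b, C)

# Perturb each right-hand side separately and compare with u_i * eps_i.
eps = [0.5, 0.2, 0.3]
changes = []
for i in range(3):
    b_new = list(b)
    b_new[i] += eps[i]
    changes.append(solve_lp_2d(A, b_new, C)[0] - f0)
predicted = [u * e for u, e in zip(U0, eps)]
```

For these small increments each entry of `changes` agrees with the corresponding entry of `predicted`, including the zero change for the non‐binding first constraint.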

Throughout the preceding discussion the reader may have been wondering why the qualification “generally” was included in the statement of Corollary 12.3. The fact of the matter is that there exist values of, say, b_k, for which ∂f⁰/∂b_k is not defined, i.e. points where ∂f⁰/∂b_k is discontinuous so that (12.10) does not hold. To see this, let us express f in terms of b_k, as

$$f(b_k) = \max\{C'X \mid a_iX \le b_i^0,\ i \ne k;\ a_kX \le b_k;\ X \ge O\},$$

where the b_i⁰ depict fixed levels of the requirements b_i, i ≠ k.

If in Figure 12.4 we take area OABCG to be the initial feasible region when (here the line segment CG consists of all feasible points X that satisfy , then increasing the kth requirement from changes the optimal extreme point from C to D. The result of this increase in b_k is a concomitant increase in f⁰ proportionate to the distance where the constant of proportionality is Thus, over this range of b_k values, ∂f⁰/∂b_k exists (is continuous). If we again augment the level of b_k by the same amount as before, this time shifting the kth structural constraint so that it passes through point E, ∂f⁰/∂b_k becomes discontinuous since now the maximal point moves along a more steeply sloping portion (segment EF) of the boundary of the feasible region so that any further equal parallel shift in the kth structural constraint would produce a much larger increment in the optimal value of f than before. Thus, each time the constraint line a_kX = b_k passes through an extreme point of the feasible region, there occurs a discontinuity in ∂f⁰/∂b_k.

To relate this argument directly to f(b_k) above, let us describe the behavior of f(b_k) as b_k assumes the values In Figure 12.5 the piecewise continuous curve f(b_k) indicates the effect on f(b_k) precipitated by changes in b_k. (Note that each time the constraint a_kX = b_k passes through an extreme point of the feasible region in Figure 12.4, the f(b_k) curve assumes a kink, i.e. a point of discontinuity in its derivative. Note also that for a requirement level of [say ], f cannot increase any further so that beyond F′, f(b_k) is horizontal) As far as the dual objective function g(u) = b′u is concerned, g may be expressed as g(b_k) = b_ku_k+ constant since is independent of b_k. For ( optimal), Moreover, since the dual structural constraints are unaffected by changes in b_k, dual feasibility is preserved for variations in b_k away from so that, in general, for any pair of feasible solutions to the primal and dual problems, f(b_k) ≤ g(b_k), i.e. g(b_k) is tangent to f(b_k) for (point E′ in Figure 12.5) while sufficiently small. In this regard, for while for whence So while both the left‐ and right‐hand derivatives of f(b_k) exist at itself does not exist since i.e. ∂f/∂b_k possesses a finite discontinuity at . In fact, ∂f/∂b_k is discontinuous at each of the points, A′, B′, E′ and F′ in Figure 12.5. For it is evident that since f(b_k), g(b_k) coincide all along the segment B′ E′ of Figure 12.5. In this instance, we may unequivocally write Moreover, for (see Figures 12.4 and 12.5), since the constraint is superfluous, i.e. it does not form any part of the boundary of the feasible region. In general, then, ∂f/∂b_k|₊ ≤ u_k ≤ ∂f/∂b_k|₋.

Top: Graph displaying an ascending curve labeled f(bk) with dot markers labeled A, B, C, D, E, and F. Bottom: Graph displaying horizontal lines in between vertical lines connecting to dot markers indicated on top. — **Figure 12.5** Tracking f(b_k) as b_k increases.
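The kinks in f(b_k) and the tangency of the dual lines g(b_k) can be reproduced on a small example. The sketch below rests on two loudly stated assumptions: the example problem is hypothetical, and the list of extreme points of its dual feasible region was worked out by hand for this data. Since the dual constraints do not involve b_k, the optimal value is taken as the lowest of the dual lines b′U over those extreme points.

```python
# Hypothetical example: max 3x1 + 5x2 s.t. x1 <= 4, 2x2 <= 12, 3x1 + 2x2 <= b3, X >= O.
# Extreme points of the dual region {u1 + 3u3 >= 3, 2u2 + 2u3 >= 5, U >= O},
# computed by hand for this example (an assumption of the sketch).
DUAL_VERTICES = [(0.0, 0.0, 2.5), (0.0, 1.5, 1.0), (3.0, 2.5, 0.0)]


def f_opt(b3):
    """Optimal primal value as a function of b3 >= 0: the lowest dual line g(b3) = b'U."""
    return min(4 * u1 + 12 * u2 + b3 * u3 for u1, u2, u3 in DUAL_VERTICES)


def one_sided_slopes(b3, step=1e-3):
    """Finite-difference estimates of the left- and right-hand derivatives of f at b3."""
    left = (f_opt(b3) - f_opt(b3 - step)) / step
    right = (f_opt(b3 + step) - f_opt(b3)) / step
    return left, right


smooth = one_sided_slopes(18)   # inside a linear segment
kink = one_sided_slopes(12)     # constraint line passes through an extreme point
flat = one_sided_slopes(30)     # constraint has become superfluous
```

At b₃ = 18 both one‐sided slopes equal the dual value 1. At b₃ = 12 the left slope is 2.5 and the right slope is 1, so any dual value between them satisfies the sandwich inequality, and beyond b₃ = 24 the curve is horizontal.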

To formalize the preceding discussion, we state Theorem 12.5.

To summarize: if both the left‐ and right‐hand derivatives ∂f⁰/∂b_i|₋, ∂f⁰/∂b_i|₊ exist and their common value is u_i⁰, then ∂f⁰/∂b_i = u_i⁰; otherwise ∂f⁰/∂b_i|₊ ≤ u_i⁰ ≤ ∂f⁰/∂b_i|₋.

12.3 Lagrangian Saddle Points

(Balinski and Baumol 1968; Williams 1963)

If we structure the primal‐dual pair of problems as

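PRIMAL PROBLEM
$$\max\ f(X) = C'X \quad \text{s.t.} \quad AX \le b,\ X \ge O$$

DUAL PROBLEM
$$\min\ g(U) = b'U \quad \text{s.t.} \quad A'U \ge C,\ U \ge O$$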

then the Lagrangian associated with the primal problem is L(X, U) = C′X + U′(b − AX) with X ≥ O, U ≥ O. And if we now express the dual problem as maximize {−b′U} subject to −A′U ≤ −C, U ≥ O, then the Lagrangian of the dual problem is M(U, X) = −b′U + X′(−C + A′U), where again U ≥ O, X ≥ O. Moreover, since M(U, X) = −b′U − X′C + X′A′U = −C′X − U′b + U′AX = −L(X, U), we see that the Lagrangian of the dual problem is actually the negative of the Lagrangian of the primal problem. Alternatively, if each x_j ≥ 0 is converted to a structural constraint, as in Section 12.2, then the revised primal problem appears as

with associated Lagrangian where As far as the dual of this revised problem is concerned, we seek to

In this instance also.
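The identity M(U, X) = −L(X, U) can be confirmed numerically at arbitrary points. The sketch below is a spot check only, using the two Lagrangians L(X, U) = C′X + U′(b − AX) and M(U, X) = −b′U + X′(−C + A′U) as written in the text; the function names and the test data are hypothetical.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def primal_lagrangian(X, U, A, b, C):
    """L(X, U) = C'X + U'(b - AX)."""
    return dot(C, X) + sum(U[i] * (b[i] - dot(A[i], X)) for i in range(len(b)))


def dual_lagrangian(U, X, A, b, C):
    """M(U, X) = -b'U + X'(-C + A'U)."""
    m, n = len(b), len(C)
    AtU = [sum(A[i][j] * U[i] for i in range(m)) for j in range(n)]
    return -dot(b, U) + sum(X[j] * (-C[j] + AtU[j]) for j in range(n))


# Hypothetical data and sample points (any X >= O, U >= O would do).
A = [[1, 0], [0, 2], [3, 2]]
b = [4, 12, 18]
C = [3, 5]
samples = [([1, 2], [0.5, 1, 2]), ([0, 0], [1, 1, 1]), ([2, 6], [0, 1.5, 1])]

sums = [primal_lagrangian(X, U, A, b, C) + dual_lagrangian(U, X, A, b, C)
        for X, U in samples]
```

Every entry of `sums` is zero, as the sign pattern in the expansion of M(U, X) requires.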

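We next turn to the notion of a saddle point of the Lagrangian. Specifically, a point (X₀, U₀) ∈ E^{n+m} is termed a saddle point of the Lagrangian L(X, U) = C′X + U′(b − AX) if

$$L(X, U_0) \le L(X_0, U_0) \le L(X_0, U) \tag{12.11}$$

for all X ≥ O, U ≥ O. Alternatively, $(X_0, \bar{U}_0) \in E^{2n+m}$ is a saddle point of the Lagrangian $L(X, \bar{U})$ of the revised primal problem, in which the non‐negativity conditions are treated as structural constraints, if

$$L(X, \bar{U}_0) \le L(X_0, \bar{U}_0) \le L(X_0, \bar{U}) \tag{12.12}$$

for all X unrestricted and $\bar{U} \ge O$. What these definitions imply is that the Lagrangian simultaneously attains a maximum with respect to X and a minimum with respect to its multiplier vector. Hence (12.11), (12.12) appear, respectively, as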

(12.11.1)

(12.12.1)

Under what conditions will L(X, U) possess a saddle point at (X₀, U₀), or $L(X, \bar{U})$ a saddle point at $(X_0, \bar{U}_0)$? To answer these questions, we must look to the solution of what may be called the saddle point problem: to find a point (X₀, U₀) such that (12.11) holds for all (X, U) ∈ E^{n+m}, X ≥ O, U ≥ O; or, to find a point $(X_0, \bar{U}_0)$ such that (12.12) holds for all $(X, \bar{U}) \in E^{2n+m}$, X unrestricted, $\bar{U} \ge O$.

In the light of (12.11.1), (12.12.1) we shall ultimately see that determining a saddle point of a Lagrangian corresponds to maxi‐minimizing or mini‐maximizing it. To this end, we look to Theorem 12.6.
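The saddle‐point inequalities L(X, U₀) ≤ L(X₀, U₀) ≤ L(X₀, U) can be spot‐checked on a grid of nonnegative points. The sketch below is not a proof: it samples only finitely many X and U, and the function names, grid values, and data are hypothetical.

```python
from itertools import product

# Hypothetical example: max 3x1 + 5x2 s.t. x1 <= 4, 2x2 <= 12, 3x1 + 2x2 <= 18, X >= O.
A = [[1, 0], [0, 2], [3, 2]]
b = [4, 12, 18]
C = [3, 5]


def lagrangian(X, U):
    """L(X, U) = C'X + U'(b - AX)."""
    value = sum(c * x for c, x in zip(C, X))
    for u, row, bi in zip(U, A, b):
        value += u * (bi - sum(a * x for a, x in zip(row, X)))
    return value


def looks_like_saddle_point(X0, U0, grid=(0, 1, 2, 4, 8), tol=1e-9):
    """Check both saddle inequalities on a finite grid of X >= O and U >= O."""
    center = lagrangian(X0, U0)
    max_in_x = all(lagrangian(X, U0) <= center + tol
                   for X in product(grid, repeat=len(X0)))
    min_in_u = all(center <= lagrangian(X0, U) + tol
                   for U in product(grid, repeat=len(U0)))
    return max_in_x and min_in_u


good = looks_like_saddle_point([2, 6], [0, 1.5, 1])
bad = looks_like_saddle_point([4, 3], [0, 1.5, 1])
```

The pair built from the optimal primal and dual solutions passes the check, while replacing X₀ by the non‐optimal vertex (4, 3) breaks the right‐hand inequality.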

Our discussion in this chapter has centered around the solution of two important types of problems – the primal linear programming problem and the saddle point problem. As we shall now see, there exists an important connection between them. Specifically, what does the attainment of a saddle point of the Lagrangian of the primal problem imply about the existence of a solution to the primal problem? To answer this, we cite Theorem 12.7.

The importance of the developments in this section is that they set the stage for an analysis of the Karush‐Kuhn‐Tucker equivalence theorem. This theorem (Theorem 12.8) establishes the notion that the existence of an optimal solution to the primal problem is equivalent to the existence of a saddle point of its associated Lagrangian, i.e. solving the primal problem is equivalent to maxi‐minimizing (mini‐maximizing) its Lagrangian.

Theorem 12.8

Karush‐Kuhn‐Tucker Equivalence Theorem (A Necessary and Sufficient Condition). A vector X₀ is an optimal solution to the primal linear programming problem

if and only if there exists a vector such that is a saddle point of the Lagrangian , in which case

Proof. (sufficiency) Let solve the saddle point problem. Then by Theorem 12.6, system (12.13) obtains. In this regard, if X₀ maximizes then Hence, this latter expression replaces (12.13a) so that (12.13) is equivalent to (12.3), i.e. if has a saddle point at , then (12.3) holds. Thus X₀ solves the primal problem.

(necessity) Let X₀ solve the primal problem.

Then, by Corollary 12.1, the optimality of X₀ implies the existence of a such that And since represents an optimal solution to the dual problem, Since X₀ satisfies , then, for any , it follows that Thus

Hence the right‐hand side of (12.12) is established. To verify its left‐hand side, we note that since is equivalent to (12.13a), by the sufficiency portion of Theorem 12.6. Q.E.D.

It is instructive to analyze this equivalence from an alternative viewpoint. We noted above that solving the primal linear programming problem amounts to maxi‐minimizing (mini‐maximizing) its associated Lagrangian. In this regard, if we express the primal problem as

$$\max\ f(X) = C'X \quad \text{s.t.} \quad AX \le b,\ X \ge O,$$

then, if L(X, U₀) has a maximum in the X‐direction at X₀, it follows from (12.4a, b, f) that

$$C - A'U_0 \le O,\quad X_0'(C - A'U_0) = 0,\quad X_0 \ge O,$$

while, if L(X₀, U) attains a minimum in the U‐direction at U₀, then, from (12.4c, d, e),

$$U_0'(b - AX_0) = 0,\quad b - AX_0 \ge O,\quad U_0 \ge O.$$

Thus, X₀ is an optimal solution to the preceding primal problem if and only if there exists a vector U₀ ≥ O such that (X₀, U₀) is a saddle point of L(X, U), in which case system (12.4) is satisfied.

One final point is in order. As a result of the discussion underlying the Karush‐Kuhn‐Tucker equivalence theorem, we have Corollary 12.4.

12.4 Duality and Complementary Slackness Theorems

(Dreyfus and Freimer 1962)

We now turn to an alternative exposition of two of the aforementioned fundamental theorems of linear programming (Chapter 6), namely the duality and complementary slackness theorems. First, Theorem 12.9 restates the duality theorem.

Theorem 12.9

Duality Theorem. A feasible solution to the primal maximum problem

is optimal if and only if the dual minimum problem

has a feasible solution with

Proof. (sufficiency) Let be a feasible solution to the dual problem with Since X₀ is, by hypothesis, a feasible solution to the primal problem, Corollary 12.4 informs us that X₀ is optimal.

(necessity) For any particular requirements vector β, let the differentiable function f(β) depict the maximum possible value of f when Then at the optimal solution X₀ to the primal problem, Moreover, it is evident that if any component of say were increased, additional feasible solutions may be admitted to our discussion without excluding any solutions that were originally feasible so that f⁰ would not decrease, i.e.

(12.14)

And if for some i at X₀, then increasing clearly does not affect the optimal value of f so that

(12.15)

where is the ith row of .

Let us now consider a suitably restricted positive increase h_j in the jth component of X₀ from where now

Here represents a solution to the modified primal problem

where is the jth column of . Clearly is feasible since

In addition, at , the objective value of the modified problem,

must be less than or equal to the maximal value of the modified problem, i.e.

(12.16)

(12.16.1)

Applying a first‐order Taylor expansion to at h_j = 0 yields

or, utilizing (12.16.1),

Then

(12.17)

With unrestricted, a sufficiently small negative increment is admissible so that is also feasible for the modified primal problem. In this instance (12.16) becomes

(12.16.2)

A second application of Taylor’s formula now yields

so that

(12.18)

Since infinitesimally small positive and negative changes preserve feasibility when is unrestricted, (12.17), (12.18) together imply that

(12.19)

From (12.11) we know that images Hence, satisfies the dual structural constraints(12.19) and thus represents a feasible solution to the dual minimum problem. (We note briefly that if the primal problem appears as

then (12.19) replaces

where images is the ith component of U₀ is a feasible solution to the associated dual minimum problem.)

It now remains to demonstrate that the primal and dual objective values are the same. First, for unrestricted, (12.19) implies that

where is the jth component of the (n × 1) vector Then And for some so that, in either instance,

Summing over all j thus yields

(12.20)

Next, if for some specific i value then while if (12.15) indicates that so that In each case, then,

Summing over all i thus yields

(12.21)

so that, from (12.20) and (12.21), Q.E.D.
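The content of the duality theorem can be illustrated numerically: any feasible primal value is bounded above by any feasible dual value, and the two coincide at the optimum. The sketch below assumes the primal is max C′X subject to AX ≤ b, X ≥ O with dual min b′U subject to A′U ≥ C, U ≥ O; the function names and the data are hypothetical.

```python
def primal_feasible(A, b, X, tol=1e-9):
    rows_ok = all(sum(a * x for a, x in zip(row, X)) <= bi + tol
                  for row, bi in zip(A, b))
    return rows_ok and all(x >= -tol for x in X)


def dual_feasible(A, C, U, tol=1e-9):
    m, n = len(A), len(C)
    cols_ok = all(sum(A[i][j] * U[i] for i in range(m)) >= C[j] - tol
                  for j in range(n))
    return cols_ok and all(u >= -tol for u in U)


def objective(weights, point):
    return sum(w * p for w, p in zip(weights, point))


# Hypothetical example: max 3x1 + 5x2 s.t. x1 <= 4, 2x2 <= 12, 3x1 + 2x2 <= 18, X >= O.
A = [[1, 0], [0, 2], [3, 2]]
b = [4, 12, 18]
C = [3, 5]

primal_points = [[0, 0], [4, 3], [2, 6]]
dual_points = [[3, 2.5, 0], [0, 0, 2.5], [0, 1.5, 1]]

# Every feasible primal value lies at or below every feasible dual value.
weak_duality = all(objective(C, X) <= objective(b, U) + 1e-9
                   for X in primal_points for U in dual_points
                   if primal_feasible(A, b, X) and dual_feasible(A, C, U))
gap_at_optimum = objective(b, [0, 1.5, 1]) - objective(C, [2, 6])
```

All listed points are feasible for their respective problems, the bound holds for every pairing, and the gap is zero for the pair (2, 6) and (0, 1.5, 1).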

Next, if, as above, we express the primal‐dual pair of problems as

PRIMAL PROBLEM
images

DUAL PROBLEM
images

then their associated Lagrangians may be written, respectively, as

where is an (m + n × 1) vector of nonnegative primal slack variables, i.e.

In this light, we now look to Theorem 12.10.

By way of interpretation: (i) if the primal slack variable corresponding to the ith primal structural constraint is positive, then the associated dual variable must equal zero. Conversely, if that dual variable is positive, then the ith primal structural constraint must hold as an equality, with zero slack. Before turning to point (ii), let us examine the form of the jth dual structural constraint. Since A = [α₁, …, α_n],

$$A'U = \begin{bmatrix} \alpha_1'U \\ \vdots \\ \alpha_n'U \end{bmatrix},$$

and thus the jth dual structural constraint appears as α_j′U ≥ c_j. In this light, (ii) if the jth primal variable is different from zero, then the dual surplus variable associated with the jth dual structural constraint equals zero. Conversely, if that dual surplus variable is positive, then the jth dual structural constraint is not binding and thus the jth primal variable must be zero.
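The two complementary slackness statements can be read off from the primal slacks and dual surpluses of a solved example. The sketch below assumes the primal is max C′X subject to AX ≤ b, X ≥ O and that the jth dual structural constraint is α_j′U ≥ c_j; the function name and the data are hypothetical.

```python
def complementary_slackness(A, b, C, X0, U0, tol=1e-9):
    """Return primal slacks, dual surpluses, and whether all paired products vanish."""
    m, n = len(b), len(C)
    slack = [b[i] - sum(A[i][j] * X0[j] for j in range(n)) for i in range(m)]
    surplus = [sum(A[i][j] * U0[i] for i in range(m)) - C[j] for j in range(n)]
    rows_ok = all(abs(U0[i] * slack[i]) <= tol for i in range(m))      # u_i * slack_i = 0
    cols_ok = all(abs(X0[j] * surplus[j]) <= tol for j in range(n))    # x_j * surplus_j = 0
    return slack, surplus, rows_ok and cols_ok


# Hypothetical example: max 3x1 + 5x2 s.t. x1 <= 4, 2x2 <= 12, 3x1 + 2x2 <= 18, X >= O.
A = [[1, 0], [0, 2], [3, 2]]
b = [4, 12, 18]
C = [3, 5]

slack, surplus, holds = complementary_slackness(A, b, C, [2, 6], [0, 1.5, 1])
```

Here the first constraint has positive slack and a zero dual variable, the other two constraints bind with positive dual variables, and both dual surpluses vanish because both primal variables are positive.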

Notes
