Chapter 14  Unconstrained and Constrained Optimization Problems

Shuguang Cui, Anthony Man-Cho So, and Rui Zhang

Texas A&M University, College Station, USA

The Chinese University of Hong Kong, Hong Kong, China

National University of Singapore, Singapore

In the first section of this chapter, we will give an overview of the basic mathematical tools that are useful for analyzing both unconstrained and constrained optimization problems. In order to allow the readers to focus on the applications of these tools and not to be burdened with too many technical details, we shall state most of the results without proof. However, the readers are strongly encouraged to refer to the texts [1, 2, 3, 4] for expositions of these results and other further developments. In the second section, we provide three application examples to illustrate how we could apply the optimization techniques to solve real-world problems, with a focus on communications, networking, and signal processing. In the last section, several exercise questions are given to help the audience gain a deeper understanding of the material.

14.1    Basics of Convex Analysis

The notion of convexity plays a very important role in both the theoretical and algorithmic aspects of optimization. Before we discuss the relevance of convexity in optimization, let us first introduce the notions of convex sets and convex functions and state some of their properties.

Definition 14.1.1. Let S ⊆ ℝⁿ be a set. We say that

1.  S is affine if αx + (1 − α)y ∈ S whenever x, y ∈ S and α ∈ ℝ;

2.  S is convex if αx + (1 − α)y ∈ S whenever x, y ∈ S and α ∈ [0,1].

Given x, y ∈ ℝⁿ and α ∈ ℝ, the vector z = αx + (1 − α)y is called an affine combination of x and y. If α ∈ [0,1], then z is called a convex combination of x and y.

Geometrically, when x and y are distinct points in ℝn, the set

L = {z ∈ ℝⁿ : z = αx + (1 − α)y, α ∈ ℝ}

of all affine combinations of x and y is simply the line determined by x and y; and the set

S = {z ∈ ℝⁿ : z = αx + (1 − α)y, α ∈ [0,1]}

is the line segment between x and y. By convention, the empty set ∅ is convex.

It is clear that one can generalize the notion of affine (resp. convex) combination of two points to any finite number of points. In particular, an affine combination of the points x₁,…,xₖ ∈ ℝⁿ is a point z = ∑_{i=1}^k αᵢxᵢ, where ∑_{i=1}^k αᵢ = 1. Similarly, a convex combination of the points x₁,…,xₖ ∈ ℝⁿ is a point z = ∑_{i=1}^k αᵢxᵢ, where ∑_{i=1}^k αᵢ = 1 and α₁,…,αₖ ≥ 0.

Here are some sets in Euclidean space whose convexity can be easily established by first principles:

Example 14.1.1. (Some Examples of Convex Sets)

1.  Non-negative Orthant: ℝ₊ⁿ = {x ∈ ℝⁿ : x ≥ 0}.

2.  Hyperplane: H (s, c) = {x ∈ ℝn : sT x = c}.

3.  Halfspaces: H+(s, c) = {x ∈ ℝn : sTxc}, H(s, c) = {x ∈ ℝn : sTxc}.

4.  Euclidean Ball: B(x̄, r) = {x ∈ ℝⁿ : ‖x − x̄‖₂ ≤ r}.

5.  Ellipsoid: E(x̄, Q, r) = {x ∈ ℝⁿ : (x − x̄)ᵀQ(x − x̄) ≤ r²}, where Q is an n × n symmetric, positive definite matrix (i.e., xᵀQx > 0 for all x ∈ ℝⁿ \ {0}), which is denoted by Q ≻ 0.

6.  Simplex: Δ = {∑_{i=0}^n αᵢxᵢ : ∑_{i=0}^n αᵢ = 1, αᵢ ≥ 0 for i = 0, 1, …, n}, where x₀, x₁,…,xₙ are vectors in ℝⁿ such that the vectors x₁ − x₀, x₂ − x₀,…, xₙ − x₀ are linearly independent (equivalently, the vectors x₀, x₁,…, xₙ are affinely independent).

7.  Positive Semidefinite Cone:

S₊ⁿ = {A ∈ ℝⁿˣⁿ : A is symmetric and xᵀAx ≥ 0 for all x ∈ ℝⁿ} (a symmetric matrix A ∈ ℝⁿˣⁿ is said to be positive semidefinite if xᵀAx ≥ 0 for all x ∈ ℝⁿ, and is denoted by A ⪰ 0).

Let us now turn to the notion of a convex function.

Definition 14.1.2. Let S ⊆ ℝⁿ be a nonempty convex set, and let f : S → ℝ be a real-valued function.

1.  We say that f is convex on S if

f(αx₁ + (1 − α)x₂) ≤ αf(x₁) + (1 − α)f(x₂)

(14.1)

for all x₁, x₂ ∈ S and α ∈ [0,1]. We say that f is concave if −f is convex.

2.  We say that f is strictly convex on S if

f(αx₁ + (1 − α)x₂) < αf(x₁) + (1 − α)f(x₂)

for all x₁, x₂ ∈ S and α ∈ (0,1).

3.  The epigraph of f is the set epi(f ) = {(x, r) ∈ S × ℝ : f (x) ≤ r}.

The relationship between convex sets and convex functions can be summarized as follows:

Proposition 14.1.1. Let f be as in Definition 14.1.2. Then, f is convex (as a function) iff epi(f) is convex (as a set in S × ℝ).

Let r ∈ ℝ be arbitrary. A set closely related to the epigraph is the so-called r-level set of f, which is defined as L(r) = {x ∈ ℝⁿ : f(x) ≤ r}. It is clear that if f is convex, then L(r) is convex for all r ∈ ℝ. However, the converse is not true, as illustrated by the function x ↦ x³. A function f : S → ℝ whose domain is convex and whose r-level sets are convex for all r ∈ ℝ is called quasi-convex.
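
To make the distinction concrete, the following minimal numerical sketch (an added illustration, not part of the original text) checks that x ↦ x³ violates the convexity inequality (14.1) at specific points, even though every level set of x³ is an interval and hence convex.

```python
# A minimal sketch: x |-> x**3 is quasi-convex but not convex.
import numpy as np

f = lambda x: x ** 3

# Convexity inequality f(a*x1 + (1-a)*x2) <= a*f(x1) + (1-a)*f(x2) fails here:
x1, x2, a = -1.0, 0.0, 0.5
lhs = f(a * x1 + (1 - a) * x2)      # f(-0.5) = -0.125
rhs = a * f(x1) + (1 - a) * f(x2)   # -0.5
print("convexity inequality holds?", lhs <= rhs)   # False -> not convex

# Every r-level set {x : x**3 <= r} = (-inf, r^(1/3)] is an interval (convex),
# so the function is quasi-convex.
r = 8.0
print("upper end point of the 8-level set:", np.cbrt(r))   # 2.0
```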

One of the most desirable features of convexity is the following:

Proposition 14.1.2. Consider the optimization problem:

minimize   f(x)
subject to   x ∈ S,

where S ⊆ ℝⁿ is a convex set and f : S → ℝ is convex. Then, any local minimum of f is also a global minimum¹.

Now, let S ⊆ ℝⁿ be an open convex set, and let f : S → ℝ be an arbitrary function. When f has a suitable degree of differentiability, we can characterize its convexity by its gradient or Hessian. Specifically, we have the following:

Theorem 14.1.1. Let S ⊆ ℝⁿ be an open convex set, and let f : S → ℝ be a differentiable function on S. Then, f is convex on S iff

f(x₁) ≥ f(x₂) + (∇f(x₂))ᵀ(x₁ − x₂)

for all x₁, x₂ ∈ S. Furthermore, if f is a twice continuously differentiable function on S, then f is convex on S iff ∇²f(x) is positive semidefinite for all x ∈ S.
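
The two characterizations in Theorem 14.1.1 are easy to probe numerically. The sketch below is an added illustration (the log-sum-exp function and its derivatives are standard facts, not taken from the chapter): it samples random points and checks both the gradient inequality and the positive semidefiniteness of the Hessian.

```python
# A minimal sketch for f(x) = log(sum_i exp(x_i)): gradient is the softmax p,
# Hessian is diag(p) - p p^T.
import numpy as np

def f(x):
    return np.log(np.sum(np.exp(x)))

def grad_f(x):
    return np.exp(x) / np.sum(np.exp(x))

def hess_f(x):
    p = grad_f(x)
    return np.diag(p) - np.outer(p, p)

rng = np.random.default_rng(0)
n = 4
for _ in range(1000):
    x1, x2 = rng.normal(size=n), rng.normal(size=n)
    # First-order characterization: f(x1) >= f(x2) + grad_f(x2)^T (x1 - x2)
    assert f(x1) >= f(x2) + grad_f(x2) @ (x1 - x2) - 1e-9
    # Second-order characterization: Hessian is positive semidefinite
    assert np.min(np.linalg.eigvalsh(hess_f(x1))) >= -1e-9
print("both convexity characterizations hold on all sampled points")
```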

Sometimes it may be difficult to verify directly from the definition whether a given function is convex or not. However, a function can often be obtained as a composition of several, more elementary functions. When each of those elementary functions is convex, it is natural to ask whether their composition is also convex. In general, the answer is no. On the other hand, here are some transformations that preserve convexity.

Theorem 14.1.2. Let S ⊆ ℝⁿ be a nonempty convex set. Then, the following hold:

1.  (Non-negative Combinations) Let f₁,…,fₘ : S → ℝ be convex functions, and let α₁,…,αₘ ≥ 0. Then, the function ∑_{i=1}^m αᵢfᵢ is convex on S.

2.  (Pointwise Supremum) Let {fᵢ}_{i∈I} be an arbitrary family of convex functions on S. Then, the pointwise supremum f = sup_{i∈I} fᵢ is convex on S.

3.  (Affine Composition) Let f : ℝⁿ → ℝ be a convex function and A : ℝᵐ → ℝⁿ be an affine mapping². Then, the function f ∘ A : ℝᵐ → ℝ given by (f ∘ A)(x) = f(A(x)) is convex on ℝᵐ.

4.  (Composition with an Increasing Convex Function) Let f : S → ℝ be a convex function, and let g : ℝ → ℝ be an increasing convex function. Then, the function g ∘ f : S → ℝ defined by (g ∘ f)(x) = g(f (x)) is convex on S.

5.  (Restriction on Lines) Let f : S → ℝ be a function. Given x₀ ∈ S and h ∈ ℝⁿ, define the function f̃_{x₀,h} : ℝ → ℝ ∪ {+∞} by

f̃_{x₀,h}(t) = f(x₀ + th) if x₀ + th ∈ S, and f̃_{x₀,h}(t) = +∞ otherwise.

Then, f is convex on S iff f̃_{x₀,h} is convex on ℝ for any x₀ ∈ S and h ∈ ℝⁿ.

Let us now illustrate an application of Theorem 14.1.2.

Example 14.1.2. Let f : ℝ^{m×n} → ℝ₊ be given by f(X) = ‖X‖₂, where ‖·‖₂ denotes the spectral norm, or largest singular value, of the m × n matrix X. By the Courant–Fischer theorem (see, e.g., [5]), we have

f(X) = sup{uᵀXv : ‖u‖₂ = 1, ‖v‖₂ = 1}.

(14.2)

Now, for each u ∈ ℝᵐ and v ∈ ℝⁿ with ‖u‖₂ = ‖v‖₂ = 1, define the function f_{u,v} : ℝ^{m×n} → ℝ by

f_{u,v}(X) = uᵀXv.

Note that f_{u,v} is a convex (in fact, linear) function of X for each u, v. Hence, it follows from (14.2) that f is a pointwise supremum of a family of linear functions of X. By Theorem 14.1.2, this implies that f is convex.
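
A quick numerical sanity check of Example 14.1.2 (an added illustration) is to sample random matrix pairs and verify the convexity inequality (14.1) for the spectral norm.

```python
# A minimal sketch: the spectral norm satisfies the convexity inequality.
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 3
spec = lambda X: np.linalg.norm(X, 2)   # largest singular value

for _ in range(1000):
    X1, X2 = rng.normal(size=(m, n)), rng.normal(size=(m, n))
    a = rng.uniform()
    lhs = spec(a * X1 + (1 - a) * X2)
    rhs = a * spec(X1) + (1 - a) * spec(X2)
    assert lhs <= rhs + 1e-9
print("convexity inequality verified on all sampled matrix pairs")
```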

14.2    Unconstrained vs. Constrained Optimization

14.2.1    Optimality Conditions for Unconstrained Optimization

One of the most fundamental problems in optimization is to derive conditions for identifying potential optimal solutions to an optimization problem. Typically, such conditions, which are known as optimality conditions, would enable us to reduce the original optimization problem to that of checking the validity of certain geometric conditions, or to that of checking the consistency of a certain system of inequalities. As an illustration and to motivate our discussion, let us first consider a univariate, twice continuously differentiable function f : ℝ → ℝ. Recall from basic calculus that if x̄ ∈ ℝ is a local minimum of f, then we must have

df(x)/dx |_{x = x̄} = 0.

(14.3)

In other words, condition (14.3) is a necessary condition for x̄ to be a local minimum. However, it is not a sufficient condition, as an x̄ ∈ ℝ that satisfies (14.3) can be a local maximum or just a stationary point. In order to certify that x̄ is indeed a local minimum, one could check, in addition to (14.3), whether

d²f(x)/dx² |_{x = x̄} > 0.

(14.4)

In particular, condition (14.4), together with (14.3), is a sufficient condition for x̄ to be a local minimum.

In the above discussion, conditions (14.3) and (14.4) together yield a system of inequalities whose solutions are local minima of the function f. Alternatively, they can be viewed as stating the geometric fact that there is no descent direction in a neighborhood of a local minimum. In particular, the former is an algebraic interpretation of local optimality, while the latter is a geometric interpretation. It is worth noting that each interpretation has its own advantage. Indeed, the geometric interpretation can often help us gain intuitions about the problem at hand, and the algebraic interpretation would help to make those intuitions precise. Thus, it is good to keep both interpretations in mind.

To derive optimality conditions for the local minima of a multivariate twice continuously differentiable function f : ℝn → ℝ, we first recall that ∇f (x), the gradient of f at x ∈ ℝn, is the direction of steepest ascent at x. Thus, if ∇f (x) ≠ 0, then starting at x, we can proceed in the direction −∇f (x) and achieve a smaller function value. More specifically, we have the following

Proposition 14.2.1. Suppose that f : ℝⁿ → ℝ is continuously differentiable at x̄ ∈ ℝⁿ. If there exists a d ∈ ℝⁿ such that (∇f(x̄))ᵀd < 0, then there exists an α₀ > 0 such that f(x̄ + αd) < f(x̄) for all α ∈ (0, α₀). In other words, d is a descent direction of f at x̄.

Using Proposition 14.2.1, we can establish the following:

Corollary 14.2.1. (First Order Necessary Condition for Unconstrained Optimization) Suppose that f : ℝⁿ → ℝ is continuously differentiable at x̄ ∈ ℝⁿ. If x̄ is a local minimum, then we have ∇f(x̄) = 0. In particular, the set of descent directions of f at x̄ is empty.

Similar to the univariate case, even if x̄ ∈ ℝⁿ satisfies ∇f(x̄) = 0, we cannot conclude that x̄ is a local minimum. For instance, consider the function f : ℝ² → ℝ given by f(x₁, x₂) = −x₁² − (x₁ − x₂)². Then, we have

∇f(x) = −2(2x₁ − x₂, x₂ − x₁).

In particular, the (unique) solution to ∇f(x) = 0 is x̄₁ = x̄₂ = 0. However, as can be easily verified, the point (x̄₁, x̄₂) = (0, 0) is a global maximum of f.

The above example shows that some extra conditions are needed in order to guarantee that a solution to the equation ∇f(x) = 0 is a local minimum of f. For instance, we have the following proposition, which states that if f is convex at x̄, then the necessary condition in Corollary 14.2.1 is also sufficient³:

Proposition 14.2.2. Suppose that f : ℝⁿ → ℝ is continuously differentiable and convex at x̄. Then, x̄ is a global minimum iff ∇f(x̄) = 0.

Alternatively, if ∇f(x̄) = 0 and ∇²f(x̄), the Hessian of f at x̄, is positive definite, then x̄ is a local minimum. Specifically, we have the following proposition,

which generalizes the corresponding result for the univariate case (cf. (14.3) and (14.4)).

Proposition 14.2.3. (Second Order Sufficient Condition for Unconstrained Optimization) Suppose that f : ℝⁿ → ℝ is twice continuously differentiable at x̄ ∈ ℝⁿ. If ∇f(x̄) = 0 and ∇²f(x̄) is positive definite, then x̄ is a local minimum.

Let us now illustrate the above results with an example.

Example 14.2.1. Let f : ℝⁿ → ℝ be defined by f(x) = ½xᵀQx + cᵀx, where Q ∈ Sⁿ and c ∈ ℝⁿ are given. Then, f is continuously differentiable, and we have ∇f(x) = Qx + c and ∇²f(x) = Q. Now, if f is convex, or equivalently, if Q ⪰ 0, then by Proposition 14.2.2, any x̄ ∈ ℝⁿ that satisfies Qx̄ + c = 0 will be a global minimum of f. Note that in this case, we cannot even conclude from Proposition 14.2.3 that x̄ is a local minimum of f, since we only have Q ⪰ 0. On the other hand, suppose that Q ≻ 0. Then, Q is invertible, and by Proposition 14.2.3, the point x̄ = −Q⁻¹c is a local minimum of f. However, since f is convex, Proposition 14.2.2 allows us to draw a stronger conclusion, namely, the point x̄ = −Q⁻¹c is in fact the unique global minimum.
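
The conclusion of Example 14.2.1 can be checked numerically. The sketch below (an added illustration with randomly generated data) builds a positive definite Q, solves Qx̄ = −c, and confirms that the gradient vanishes at x̄ and that no sampled point achieves a smaller objective value.

```python
# A minimal sketch: for Q positive definite, x_bar = -Q^{-1} c is the unique
# global minimum of f(x) = (1/2) x^T Q x + c^T x.
import numpy as np

rng = np.random.default_rng(2)
n = 4
M = rng.normal(size=(n, n))
Q = M @ M.T + n * np.eye(n)          # symmetric positive definite
c = rng.normal(size=n)

f = lambda x: 0.5 * x @ Q @ x + c @ x
x_bar = -np.linalg.solve(Q, c)       # solves Q x = -c

print("gradient at x_bar:", Q @ x_bar + c)        # ~ 0
samples = rng.normal(size=(10000, n))
vals = 0.5 * np.sum((samples @ Q) * samples, axis=1) + samples @ c
print("f(x_bar) <= all sampled values?", np.all(f(x_bar) <= vals + 1e-9))
```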

14.2.2    Optimality Conditions for Constrained Optimization

After deriving optimality conditions for unconstrained optimization problems, let us turn our attention to constrained optimization problems of the form

min_{x∈S} f(x),

(14.5)

where S is a nonempty subset of ℝⁿ. Note that due to the constraint x ∈ S, even if x̄ ∈ ℝⁿ satisfies ∇f(x̄) = 0 and ∇²f(x̄) ≻ 0, it may not be a solution to (14.5), since x̄ need not lie in S. Similarly, a local minimum of f over S need not satisfy ∇f(x̄) = 0, since all the descent directions of f at x̄ may lead to points that do not lie in S. Thus, in order to derive optimality conditions for (14.5), we need to consider not only the set of descent directions at x̄, i.e.,

D = {d ∈ ℝⁿ : ∇f(x̄)ᵀd < 0},

(14.6)

but also the set of feasible directions at x̄, i.e.,

F = {d ∈ ℝⁿ \ {0} : there exists an α₀ > 0 such that x̄ + αd ∈ S for all α ∈ (0, α₀)}.

(14.7)

We emphasize that in order for d ∈ F, the entire open line segment {x̄ + αd : α ∈ (0, α₀)} must belong to S. This is to ensure that whenever d ∈ D, one can find a feasible solution x ∈ S with f(x) < f(x̄) by proceeding from x̄ in the direction d. Indeed, by Proposition 14.2.1, if d ∈ D, then there exists an α₁ > 0 such that f(x̄ + αd) < f(x̄) for all α ∈ (0, α₁). However, if x̄ + αd ∉ S for any α ∈ (0, α₁), then we cannot rule out the local minimality of x̄, even if x̄ + αd ∈ S for some α ≥ α₁.

As the following proposition shows, the sets D and F provide a necessary, and under some additional assumptions, even sufficient condition for optimality.

Proposition 14.2.4. Consider Problem (14.5). Suppose that f : ℝⁿ → ℝ is continuously differentiable at x̄ ∈ S. If x̄ is a local minimum, then we have D ∩ F = ∅. Conversely, suppose that (i) D ∩ F = ∅, (ii) f is convex at x̄, and (iii) there exists an ϵ > 0 such that d = x − x̄ ∈ F for any x ∈ S ∩ B°(x̄, ϵ). Then, x̄ is a local minimum of f over S.

REMARKS: Condition (iii) is to ensure that the entire line segment {x̄ + α(x − x̄) : α ∈ [0,1]} lies in S for any x ∈ S ∩ B°(x̄, ϵ), so that d = x − x̄ ∈ F; see the remark after (14.7).

So far we have only discussed optimality conditions for a very general class of optimization problems, i.e., problems of the form (14.5). In particular, we derived a necessary condition for local optimality in terms of the sets D and F, namely that DF = ∅. However, such a condition is largely geometric, and it is not as easy to manipulate as algebraic conditions (e.g., a system of inequalities). On the other hand, as we will show below, if the feasible region has more structure, then one can circumvent such difficulty and derive algebraic optimality conditions. To begin, let us consider the following class of optimization problems:

minimize   f(x)
subject to   gᵢ(x) ≤ 0 for i = 1,…,m,
             x ∈ X,

(14.8)

where f : ℝⁿ → ℝ and gᵢ : ℝⁿ → ℝ are continuously differentiable functions, and X is a nonempty open subset of ℝⁿ (usually we take X = ℝⁿ). We then have the following:

Proposition 14.2.5. Let S = {x ∈ X : gᵢ(x) ≤ 0 for i = 1,…,m} be the feasible region of problem (14.8), and let x̄ ∈ S. Define

I = {i ∈ {1,…,m} : gᵢ(x̄) = 0}

to be the index set for the active or binding constraints. Furthermore, define

G = {d ∈ ℝⁿ : ∇gᵢ(x̄)ᵀd < 0 for i ∈ I},    Ḡ = {d ∈ ℝⁿ \ {0} : ∇gᵢ(x̄)ᵀd ≤ 0 for i ∈ I}.

(14.9)

Then, we have G ⊆ F ⊆ Ḡ, where F is defined in (14.7). Moreover, if the functions gᵢ, where i ∈ I, are strictly convex (resp. concave) at x̄, then F = G (resp. F = Ḡ).

Using Proposition 14.2.4 and Proposition 14.2.5, we can establish the following geometric optimality condition for (14.8):

Corollary 14.2.2. Let S be the feasible region of problem (14.8). Let x̄ ∈ S, and define I = {i ∈ {1,…,m} : gᵢ(x̄) = 0}. If x̄ is a local minimum, then D ∩ G = ∅, where D is defined in (14.6) and G is defined in (14.9).

The intuition behind Corollary 14.2.2 is quite straightforward. Indeed, suppose that d ∈ D ∩ G. Then, by Proposition 14.2.1, there exists an α₀ > 0 such that f(x̄ + αd) < f(x̄) and gᵢ(x̄ + αd) < gᵢ(x̄) = 0 for all i ∈ I and α ∈ (0, α₀). Moreover, by the continuity of the functions g₁,…,gₘ, for sufficiently small α > 0, we have gᵢ(x̄ + αd) < 0 for all i ∉ I. It follows that there exists an α₁ > 0 such that x̄ + αd ∈ S and f(x̄ + αd) < f(x̄) for all α ∈ (0, α₁). In other words, x̄ is not a local minimum.

The upshot of Corollary 14.2.2 is that it allows us to derive optimality conditions for (14.8) that are more algebraic in nature. Specifically, Corollary 14.2.2, together with Farkas’ lemma, yields the following:

Theorem 14.2.1. (Karush–Kuhn–Tucker Necessary Conditions) Let x̄ ∈ S be a local minimum of problem (14.8), and let I = {i ∈ {1,…,m} : gᵢ(x̄) = 0} be the index set for the active constraints. Suppose that the family {∇gᵢ(x̄)}_{i∈I} of vectors is linearly independent. Then, there exist ū₁,…,ūₘ ∈ ℝ such that

∇f(x̄) + ∑_{i=1}^m ūᵢ∇gᵢ(x̄) = 0,
ūᵢgᵢ(x̄) = 0 for i = 1,…,m,
ūᵢ ≥ 0 for i = 1,…,m.

(14.10)

We say that x̄ ∈ ℝⁿ is a KKT point if (i) x̄ ∈ S and (ii) there exist Lagrange multipliers ū₁,…,ūₘ such that (x̄, ū₁,…,ūₘ) satisfies the system (14.10).

Note that if the gradient vectors of the active constraints are not linearly independent, then the KKT conditions are not necessary for local optimality, even when the optimization problem is convex. This is demonstrated in the following example.

Example 14.2.2. Consider the following optimization problem:

minimize   x₁
subject to   (x₁ − 1)² + (x₂ − 1)² ≤ 1,
             (x₁ − 1)² + (x₂ + 1)² ≤ 1.

(14.11)

Since there is only one feasible solution (i.e., (x1, x2) = (1, 0)), it is naturally optimal. Besides the primal feasibility condition, the KKT conditions of (14.11) are given by

[1, 0]ᵀ + 2u₁[x₁ − 1, x₂ − 1]ᵀ + 2u₂[x₁ − 1, x₂ + 1]ᵀ = 0,
u₁((x₁ − 1)² + (x₂ − 1)² − 1) = 0,
u₂((x₁ − 1)² + (x₂ + 1)² − 1) = 0,
u₁, u₂ ≥ 0.

However, it is clear that there is no solution (u₁, u₂) ≥ 0 to the above system when (x₁, x₂) = (1, 0).
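
The inconsistency of the above system can also be seen numerically. In the added sketch below, the best least-squares choice of (u₁, u₂) still leaves a residual of 1 in the stationarity equation, because both constraint gradients vanish in their first component at (1, 0).

```python
# A minimal sketch for Example 14.2.2: no Lagrange multipliers exist at (1, 0).
import numpy as np

x1, x2 = 1.0, 0.0
grad_f = np.array([1.0, 0.0])
grad_g1 = 2 * np.array([x1 - 1, x2 - 1])    # gradient of (x1-1)^2 + (x2-1)^2 - 1
grad_g2 = 2 * np.array([x1 - 1, x2 + 1])    # gradient of (x1-1)^2 + (x2+1)^2 - 1

# Try to solve grad_f + u1*grad_g1 + u2*grad_g2 = 0 in the least-squares sense.
G = np.column_stack([grad_g1, grad_g2])
u, *_ = np.linalg.lstsq(G, -grad_f, rcond=None)
print("best (u1, u2):", u)
print("residual norm:", np.linalg.norm(G @ u + grad_f))   # stays at 1
```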

Let us now illustrate Theorem 14.2.1 with an example.

Example 14.2.3. (Optimization of a Matrix Function) Let A ≻ 0 and b > 0 be given. Consider the following problem:

minimize   −log det(Z)
subject to   tr(AZ) ≤ b,
             Z ≻ 0.

(14.12)

Note that (14.12) is of the form (14.8), since we may write (14.12) as

minimize   −log det(Z)
subject to   tr(AZ) ≤ b,
             Z ∈ S₊₊ⁿ,

and S₊₊ⁿ ⊆ ℝ^{n(n+1)/2} is an open set. Now, it is known that for X ≻ 0,

∇ log det(X) = X⁻¹,   ∇ tr(AX) = A;

see, e.g., [6]. Hence, the KKT conditions associated with (14.12) are given by

tr(AZ) ≤ b, Z ≻ 0,   (a)
−Z⁻¹ + uA = 0, u ≥ 0,   (b)
u(tr(AZ) − b) = 0.   (c)

Condition (a) is simply primal feasibility. Condition (c) is known as complementarity. As we shall see later, condition (b) can be interpreted as feasibility with respect to a certain dual of (14.12).
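
For this particular problem the KKT system can be solved in closed form. The sketch below is our own derivation from conditions (a)–(c), not text from the chapter: condition (b) gives Z = (1/u)A⁻¹, and complementarity with u > 0 forces tr(AZ) = b, hence u = n/b and Z = (b/n)A⁻¹, which the code verifies numerically.

```python
# A minimal sketch: closed-form KKT solution of (14.12) for random A > 0.
import numpy as np

rng = np.random.default_rng(3)
n, b = 4, 2.5
M = rng.normal(size=(n, n))
A = M @ M.T + n * np.eye(n)        # symmetric positive definite

u = n / b
Z = (b / n) * np.linalg.inv(A)

print("primal feasibility  tr(AZ) <= b :", np.trace(A @ Z), "<=", b)
print("stationarity ||-Z^{-1} + u A||  :", np.linalg.norm(-np.linalg.inv(Z) + u * A))
print("complementarity u*(tr(AZ) - b)  :", u * (np.trace(A @ Z) - b))
```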

Note that Theorem 14.2.1 applies only to inequality-constrained optimization problems of the form (14.8). However, by extending the geometric arguments used to prove Corollary 14.2.2, one can establish similar necessary optimality conditions for optimization problems of the form

minimize   f(x)
subject to   gᵢ(x) ≤ 0 for i = 1,…,m₁,
             hⱼ(x) = 0 for j = 1,…,m₂,
             x ∈ X,

(14.13)

where f, g₁,…,g_{m₁}, h₁,…,h_{m₂} : ℝⁿ → ℝ are continuously differentiable functions, and X is a nonempty open subset of ℝⁿ. Specifically, we have the following:

Theorem 14.2.2. (Karush–Kuhn–Tucker Necessary Conditions) Let S be the feasible region of Problem (14.13). Suppose that x̄ ∈ S is a local minimum of problem (14.13), with I = {i ∈ {1,…,m₁} : gᵢ(x̄) = 0} being the index set for the active constraints. Furthermore, suppose that x̄ is regular, i.e., the family {∇gᵢ(x̄)}_{i∈I} ∪ {∇hⱼ(x̄)}_{j=1}^{m₂} of vectors is linearly independent. Then, there exist v̄₁,…,v̄_{m₁} ∈ ℝ and w̄₁,…,w̄_{m₂} ∈ ℝ such that

∇f(x̄) + ∑_{i=1}^{m₁} v̄ᵢ∇gᵢ(x̄) + ∑_{j=1}^{m₂} w̄ⱼ∇hⱼ(x̄) = 0,
v̄ᵢgᵢ(x̄) = 0 for i = 1,…,m₁,
v̄ᵢ ≥ 0 for i = 1,…,m₁.

(14.14)

As demonstrated in Example 14.2.2, the linear independence of the gradient vectors of the active constraints is generally needed to guarantee the existence of Lagrange multipliers. However, such a regularity condition is not always easy to check. As it turns out, there are other forms of regularity conditions, one of the more well-known being the following:

Theorem 14.2.3. Suppose that in Problem (14.13), the functions g₁,…,g_{m₁} are convex and h₁,…,h_{m₂} are linear. Let x̄ ∈ S be a local minimum, and let I = {i ∈ {1,…,m₁} : gᵢ(x̄) = 0}. If the Slater condition is satisfied, i.e., if there exists an x′ ∈ S such that gᵢ(x′) < 0 for all i ∈ I, then x̄ satisfies the KKT conditions (14.14).

Another setting in which the existence of Lagrange multipliers is guaranteed is the following:

Theorem 14.2.4. Suppose that in Problem (14.13), the functions g₁,…,g_{m₁} are concave and h₁,…,h_{m₂} are linear. Let x̄ ∈ S be a local minimum. Then, x̄ satisfies the KKT conditions (14.14).

In particular, Theorem 14.2.4 implies that when all the constraints in problem (14.13) are linear, one can always find Lagrange multipliers for any local minimum of problem (14.13).

So far we have only discussed necessary optimality conditions for constrained optimization problems. Let us now turn our attention to sufficient conditions. The following theorem can be viewed as an extension of the first order sufficient condition in Proposition 14.2.2 to the constrained setting.

Theorem 14.2.5. Suppose that in Problem (14.13), the functions g₁,…,g_{m₁} are convex, h₁,…,h_{m₂} are linear, and X = ℝⁿ. Let x̄ ∈ ℝⁿ be feasible for (14.13). If there exist vectors v̄ ∈ ℝ^{m₁} and w̄ ∈ ℝ^{m₂} such that (x̄, v̄, w̄) satisfies the KKT conditions (14.14), then x̄ is a global minimum.

To demonstrate the usage of the above results, let us consider the following example:

Example 14.2.4. (Linear Programming) Consider the standard form linear program (LP):

minimize   f(x) := cᵀx
subject to   hⱼ(x) := aⱼᵀx − bⱼ = 0 for j = 1,…,m,
             gᵢ(x) := −xᵢ ≤ 0 for i = 1,…,n,

(14.15)

where a₁,…,aₘ, c ∈ ℝⁿ and b₁,…,bₘ ∈ ℝ. Since

∇f(x) = c,   ∇gᵢ(x) = −eᵢ for i = 1,…,n,   ∇hⱼ(x) = aⱼ for j = 1,…,m,

the KKT conditions associated with (14.15) are given by

c − ∑_{i=1}^n vᵢeᵢ + ∑_{j=1}^m wⱼaⱼ = 0,
xᵢvᵢ = 0 for i = 1,…,n,
vᵢ ≥ 0 for i = 1,…,n,
aⱼᵀx = bⱼ for j = 1,…,m,
xᵢ ≥ 0 for i = 1,…,n.

The above system can be written more compactly as follows:

Ax = b, x ≥ 0,   (a)
Aᵀw + c = v, v ≥ 0,   (b)
xᵀv = 0,   (c)

where A is an m × n matrix whose j-th row is aj, where j = 1,…, m. Readers who are familiar with the theory of linear programming will immediately recognize that (a) is primal feasibility, (b) is dual feasibility, and (c) is complementarity. In particular, when we apply Theorem 14.2.4 to Problem (14.15), we obtain the strong duality theorem of linear programming.
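
The compact system (a)–(c) is easy to verify on a toy instance. The sketch below (an added illustration with made-up data) checks primal feasibility, dual feasibility, and complementarity for the LP minimize x₁ + 2x₂ subject to x₁ + x₂ = 1, x ≥ 0, whose optimum is x = (1, 0).

```python
# A minimal sketch of the KKT system (a)-(c) for a small standard-form LP.
import numpy as np

A = np.array([[1.0, 1.0]])     # one equality constraint a_1^T x = b_1
b = np.array([1.0])
c = np.array([1.0, 2.0])

x = np.array([1.0, 0.0])       # candidate primal solution
w = np.array([-1.0])           # candidate multiplier for the equality
v = c + A.T @ w                # dual feasibility (b): A^T w + c = v

print("(a) Ax = b, x >= 0 :", np.allclose(A @ x, b), np.all(x >= 0))
print("(b) v >= 0         :", np.all(v >= -1e-12), "v =", v)
print("(c) x^T v = 0      :", x @ v)
# With all constraints linear and a linear (hence convex) objective,
# Theorem 14.2.5 certifies that x = (1, 0) is a global minimum.
```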

14.2.3    Lagrangian Duality

Given an optimization problem P (the primal problem), we can associate with it a dual problem whose properties are closely related to those of P. To begin our investigation, consider the following primal problem:

v*_p = inf   f(x)
subject to   gᵢ(x) ≤ 0 for i = 1,…,m₁,      (P)
             hⱼ(x) = 0 for j = 1,…,m₂,
             x ∈ X.

Here, f, g₁,…,g_{m₁}, h₁,…,h_{m₂} : ℝⁿ → ℝ are arbitrary functions, and X is an arbitrary nonempty subset of ℝⁿ. For the sake of brevity, we shall write the first two sets of constraints in (P) as g(x) ≤ 0 and h(x) = 0, where g : ℝⁿ → ℝ^{m₁} is given by g(x) = (g₁(x),…,g_{m₁}(x)) and h : ℝⁿ → ℝ^{m₂} is given by h(x) = (h₁(x),…,h_{m₂}(x)).

Now, the Lagrangian dual problem associated with (P) is the following problem:

v*_d = sup   θ(u,v) := inf_{x∈X} L(x,u,v)      (D)
subject to   u ≥ 0.

Here, L : ℝⁿ × ℝ^{m₁} × ℝ^{m₂} → ℝ is the Lagrangian function given by

L(x,u,v) = f(x) + ∑_{i=1}^{m₁} uᵢgᵢ(x) + ∑_{j=1}^{m₂} vⱼhⱼ(x) = f(x) + uᵀg(x) + vᵀh(x).

(14.16)

Observe that the above formulation can be viewed as a penalty function approach, in the sense that we incorporate the primal constraints g(x) ≤ 0 and h(x) = 0 into the objective function of (D) using the Lagrange multipliers u and v. Also, since the set X is arbitrary, there can be many different Lagrangian dual problems for the same primal problem, depending on which constraints are handled as g(x) ≤ 0 and h(x) = 0, and which constraints are treated by X. However, different choices of the Lagrangian dual problem will in general lead to different outcomes, both in terms of the dual optimal value as well as the computational efforts required to solve the dual problem.

Let us now investigate the relationship between (P) and (D). For any x̄ ∈ X and (ū, v̄) ∈ ℝ₊^{m₁} × ℝ^{m₂}, we have

inf_{x∈X} L(x, ū, v̄) ≤ f(x̄) + ūᵀg(x̄) + v̄ᵀh(x̄) ≤ sup_{u≥0, v} L(x̄, u, v).

This implies that

sup_{u≥0, v} inf_{x∈X} L(x, u, v) ≤ inf_{x∈X} sup_{u≥0, v} L(x, u, v).

(14.17)

In particular, we have the following weak duality theorem, which asserts that the dual objective value is always a lower bound on the primal objective value:

Theorem 14.2.6. (Weak Duality) Let x̄ be feasible for (P) and (ū, v̄) be feasible for (D). Then, we have θ(ū, v̄) ≤ f(x̄). In particular, if v*_d = +∞, then (P) has no feasible solution.

Given the primal–dual pair of problems (P) and (D), the duality gap between them is defined as Δ = v*_p − v*_d. By Theorem 14.2.6, we always have Δ ≥ 0. It would be nice to have Δ = 0 (i.e., zero duality gap). However, as the following example shows, this is not true in general.

Example 14.2.5. Consider the following problem from [1, Example 6.2.2]:

minimize   f(x) := −2x₁ + x₂
subject to   h(x) := x₁ + x₂ − 3 = 0,
             x ∈ X,

(14.18)

where X ⊆ ℝ² is the following discrete set:

X = {(0,0), (0,4), (4,4), (4,0), (1,2), (2,1)}.

By enumeration, we see that the optimal value of (14.18) is −3, attained at the point (x₁, x₂) = (2, 1). Now, one can verify that the dual function is given by

θ(v) = min_{x∈X} { −2x₁ + x₂ + v(x₁ + x₂ − 3) } =
  −4 + 5v   for v ≤ −1,
  −8 + v    for −1 ≤ v ≤ 2,
  −3v       for v ≥ 2.

It follows that max_v θ(v) = −6, which is attained at v = 2. Note that the duality gap in this example is Δ = −3 − (−6) = 3 > 0.
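
The gap can be reproduced by brute force. The added sketch below enumerates X for the primal value and evaluates θ(v) on a fine grid for the dual value.

```python
# A minimal sketch reproducing the duality gap of Example 14.2.5.
import numpy as np

X = [(0, 0), (0, 4), (4, 4), (4, 0), (1, 2), (2, 1)]
f = lambda x1, x2: -2 * x1 + x2
h = lambda x1, x2: x1 + x2 - 3

primal = min(f(*x) for x in X if h(*x) == 0)          # feasible points have h = 0
theta = lambda v: min(f(*x) + v * h(*x) for x in X)   # dual function
vs = np.linspace(-10, 10, 20001)
dual = max(theta(v) for v in vs)

print("primal optimal value:", primal)            # -3, attained at (2, 1)
print("dual optimal value  :", round(dual, 4))    # -6, attained near v = 2
print("duality gap         :", primal - dual)     # 3 > 0
```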

The above example raises the important question of when the duality gap is zero. It turns out that there is a relatively simple answer to this question. Before we proceed, let us introduce the following definition:

Definition 14.2.1. We say that (, , ) is a saddle point of the Lagrangian function L defined in (14.16) if the following conditions are satisfied:

1.  x̄ ∈ X,

2.  ū ≥ 0, and

3.  for all x ∈ X and (u, v) ∈ ℝ₊^{m₁} × ℝ^{m₂}, we have

L(x̄, u, v) ≤ L(x̄, ū, v̄) ≤ L(x, ū, v̄).

In particular, observe that (x̄, ū, v̄) is a saddle point of L if x̄ minimizes L over X when (u, v) is fixed at (ū, v̄), and (ū, v̄) maximizes L over all (u, v) ∈ ℝ^{m₁} × ℝ^{m₂} with u ≥ 0 when x is fixed at x̄.

We are now ready to state the following theorem:

Theorem 14.2.7. (Saddle Point Optimality Conditions) The point (x̄, ū, v̄) with x̄ ∈ X and ū ≥ 0 is a saddle point of L iff

1.  L(x̄, ū, v̄) = min_{x∈X} L(x, ū, v̄),

2.  g(x̄) ≤ 0 and h(x̄) = 0, and

3.  ūᵀg(x̄) = 0.

Moreover, the point (x̄, ū, v̄) is a saddle point of L iff x̄ and (ū, v̄) are optimal solutions to (P) and (D), respectively, with f(x̄) = θ(ū, v̄), i.e., there is no duality gap.

In other words, the existence of a saddle point (x̄, ū, v̄) of L implies that

inf_{x∈X} L(x, ū, v̄) = L(x̄, ū, v̄) = sup_{u≥0, v} L(x̄, u, v),

which in turn implies that

sup_{u≥0, v} inf_{x∈X} L(x, u, v) = inf_{x∈X} sup_{u≥0, v} L(x, u, v),

i.e., inequality (14.17) holds with equality, and v*_p = v*_d.

Now, if we want to apply Theorem 14.2.7 to certify that the duality gap between (P) and (D) is zero, we need to produce a saddle point of the Lagrangian function L, which is not always an easy task. The following theorem, which is an application of Sion’s minimax theorem [7] (see [8] for an elementary proof), provides an easy-to-check sufficient condition for certifying zero duality gap.

Theorem 14.2.8. Let L be the Lagrangian function defined in (14.16). Suppose that

1.  X is a compact convex subset of ℝⁿ,

2.  (u, v) ↦ L(x, u, v) is continuous and concave on ℝ₊^{m₁} × ℝ^{m₂} for each x ∈ X, and

3.  x ↦ L(x, u, v) is continuous and convex on X for each (u, v) ∈ ℝ₊^{m₁} × ℝ^{m₂}.

Then, we have

sup_{u≥0, v} inf_{x∈X} L(x, u, v) = inf_{x∈X} sup_{u≥0, v} L(x, u, v).

Let us now illustrate some of the above results with an example.

Example 14.2.6. (Semidefinite Programming) Consider the following standard form semidefinite program (SDP):

inf   f(Z) := tr(CZ)
subject to   hⱼ(Z) := bⱼ − tr(AⱼZ) = 0 for j = 1,…,m,
             Z ∈ X := S₊ⁿ,

(14.19)

where C, A₁,…,Aₘ ∈ ℝⁿˣⁿ are symmetric matrices, b₁,…,bₘ ∈ ℝ, and S₊ⁿ is the set of n × n symmetric positive semidefinite matrices. The Lagrangian dual associated with (14.19) is given by

sup θ(v) := inf_{Z ∈ S₊ⁿ} { tr(CZ) + ∑_{j=1}^m vⱼ(bⱼ − tr(AⱼZ)) }.

(14.20)

Now, for any fixed v ∈ ℝm, we have

θ(v) = bᵀv   if C − ∑_{j=1}^m vⱼAⱼ ∈ S₊ⁿ,   and θ(v) = −∞ otherwise.

(14.21)

To see this, let UΛUᵀ be the spectral decomposition of C − ∑_{j=1}^m vⱼAⱼ, and suppose that Λᵢᵢ < 0 for some i = 1,…,n. Consider the matrix Z(α) = αUeᵢeᵢᵀUᵀ. Clearly, we have Z(α) ∈ S₊ⁿ for all α > 0. Moreover, as α → ∞, we have

tr((C − ∑_{j=1}^m vⱼAⱼ)Z(α)) = α tr((UΛUᵀ)(UeᵢeᵢᵀUᵀ)) = α tr(Λeᵢeᵢᵀ) = αΛᵢᵢ → −∞,

whence

θ(v) = bᵀv + inf_{Z ∈ S₊ⁿ} tr((C − ∑_{j=1}^m vⱼAⱼ)Z) = −∞.

On the other hand, if C − ∑_{j=1}^m vⱼAⱼ ∈ S₊ⁿ, then we have tr((C − ∑_{j=1}^m vⱼAⱼ)Z) ≥ 0 for any Z ∈ S₊ⁿ. It follows that θ(v) = bᵀv in this case (the infimum is attained by taking, say, Z = 0).

Now, using (14.21), we see that (14.20) is equivalent to

sup   bᵀv
subject to   C − ∑_{j=1}^m vⱼAⱼ ∈ S₊ⁿ,

(14.22)

which is known as a dual standard form SDP.
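
The relationship between (14.19) and (14.22) can be observed numerically. The sketch below is an added illustration that assumes the third-party package CVXPY (with its default SDP-capable solver) is installed; the data are generated so that the primal is feasible and the dual is strictly feasible, hence the two optimal values coincide.

```python
# A minimal sketch: solve a small instance of the primal SDP (14.19) and the
# dual SDP (14.22) and compare their optimal values.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(4)
n, m = 3, 2
C = np.eye(n)
A_list = []
for _ in range(m):
    M = rng.normal(size=(n, n))
    A_list.append(M + M.T)                 # symmetric data matrices A_j
Z0 = rng.normal(size=(n, n))
Z0 = Z0 @ Z0.T                             # PSD matrix used only to build a feasible b
b = np.array([np.trace(A_list[j] @ Z0) for j in range(m)])

# Primal (14.19): minimize tr(CZ) subject to tr(A_j Z) = b_j, Z PSD.
Z = cp.Variable((n, n), PSD=True)
primal = cp.Problem(cp.Minimize(cp.trace(C @ Z)),
                    [cp.trace(A_list[j] @ Z) == b[j] for j in range(m)])
primal.solve()

# Dual (14.22): maximize b^T v subject to C - sum_j v_j A_j PSD.
v = cp.Variable(m)
dual = cp.Problem(cp.Maximize(b @ v),
                  [C - sum(v[j] * A_list[j] for j in range(m)) >> 0])
dual.solve()

print("primal optimal value:", primal.value)
print("dual optimal value  :", dual.value)   # equal up to solver tolerance
```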

14.3    Application Examples

In the past decade optimization techniques, especially convex optimization techniques, have been widely used in various engineering fields such as industrial engineering, mechanical engineering, and electrical engineering. For electrical engineering in particular, optimization techniques have been applied to solve problems in communications [9, 10, 11, 12, 13, 14], networking [15, 16, 17, 18, 19], signal processing [20, 22], and even circuit design [23]. In this section, we briefly go through several examples in communications, networking, and signal processing to illustrate how we could apply the results introduced in the previous section to solve real-world problems.

Example 14.3.1. (Power Allocation Optimization in Parallel AWGN Channels) Consider the transmission over n parallel AWGN channels. The ith channel, i ∈ {1,…, n}, is characterized by the channel power gain, hi ≥ 0, and the additive Gaussian noise power, σi > 0. Let the transmit power allocated to the ith channel be denoted by pi ≥ 0. The maximum information rate that can be reliably transmitted over the ith channel is given by [24]

rᵢ = log(1 + hᵢpᵢ/σᵢ).

(14.23)

Given a constraint P on the total transmit power over n channels, i.e., ∑_{i=1}^n pᵢ ≤ P, we want to optimize the allocated power p₁,…,pₙ such that the sum rate of n channels, ∑_{i=1}^n rᵢ, is maximized. This problem is thus formulated as

maximize   ∑_{i=1}^n log(1 + hᵢpᵢ/σᵢ)
subject to   ∑_{i=1}^n pᵢ ≤ P,
             pᵢ ≥ 0 for i = 1,…,n.

(14.24)

For convenience, we rewrite the above problem equivalently as

minimize   f(p) := −∑_{i=1}^n log(1 + hᵢpᵢ/σᵢ)
subject to   h(p) := ∑_{i=1}^n pᵢ − P ≤ 0,
             gᵢ(p) := −pᵢ ≤ 0 for i = 1,…,n,
             p ∈ ℝⁿ,

(14.25)

where p = [p₁,…,pₙ]ᵀ. It is easy to verify that f(p) is convex, and h(p), g₁(p),…,gₙ(p) are all affine and thus convex. According to Theorem 14.2.5, if we can find a feasible solution p̄ = [p̄₁,…,p̄ₙ]ᵀ ∈ ℝⁿ for the above constrained minimization problem, as well as u ≥ 0 and vᵢ ≥ 0, i = 1,…,n, such that the following KKT conditions are satisfied,

∇f(p̄) + u∇h(p̄) + ∑_{i=1}^n vᵢ∇gᵢ(p̄) = 0,   (a)
u h(p̄) = 0,   (b)
vᵢ gᵢ(p̄) = 0 for i = 1,…,n,   (c)

(14.26)

then we can claim that p̄ is a global minimum for this problem. Suppose that u > 0. From (b), it follows that h(p̄) = 0, i.e., ∑_{i=1}^n p̄ᵢ = P. From (a), it follows that

p̄ᵢ = 1/(u − vᵢ) − σᵢ/hᵢ for i = 1,…,n.

(14.27)

Suppose that p̄ᵢ > 0. From (c), it follows that vᵢ = 0. Then from (14.27), it follows that p̄ᵢ = 1/u − σᵢ/hᵢ > 0. Clearly, if this inequality holds, the corresponding p̄ᵢ will satisfy both (a) and (c). Otherwise, the presumption of p̄ᵢ > 0 cannot be true and the only feasible value for p̄ᵢ is p̄ᵢ = 0. In this case, since 1/u − σᵢ/hᵢ ≤ 0, we can always find a vᵢ ≥ 0 such that p̄ᵢ = 0 holds in (14.27). To summarize, for any u > 0, the feasible values of p̄ᵢ that satisfy both (a) and (c) are given by

p̄ᵢ = (1/u − σᵢ/hᵢ)₊ for i = 1,…,n,

(14.28)

where (x)₊ = max(0, x) for x ∈ ℝ. Furthermore, recall that this set of p̄ᵢ’s needs to satisfy ∑_{i=1}^n p̄ᵢ = P, i.e.,

∑_{i=1}^n (1/u − σᵢ/hᵢ)₊ = P.

(14.29)

Note that for any P > 0, there exists a unique positive root u of the above equation (which can be found numerically by a simple bisection search over the interval 0 < u < maxᵢ(hᵢ/σᵢ)). With this root u, the corresponding p̄ᵢ’s given in (14.28) satisfy all the KKT conditions (a), (b), and (c), and are thus the globally optimal solution for Problem (14.25). It is worth noting that the structure of the optimal power allocation in (14.28) is known as the “water-filling” solution [24].
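
A minimal implementation of the bisection search described above is sketched below (an added illustration; variable names and tolerances are our own choices, and the routine assumes a moderate total power P so that the root lies in the stated interval).

```python
# A minimal sketch of the water-filling solution (14.28)-(14.29).
import numpy as np

def water_filling(h, sigma, P, tol=1e-10):
    """Bisect on u over (0, max_i h_i/sigma_i) until the total power matches P."""
    h, sigma = np.asarray(h, float), np.asarray(sigma, float)
    power = lambda u: np.sum(np.maximum(1.0 / u - sigma / h, 0.0))
    lo, hi = 1e-12, np.max(h / sigma)        # power(u) is decreasing in u here
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if power(mid) > P:
            lo = mid          # too much power allocated -> increase u
        else:
            hi = mid
    u = 0.5 * (lo + hi)
    return np.maximum(1.0 / u - sigma / h, 0.0)

h = [1.0, 0.5, 0.2]
sigma = [1.0, 1.0, 1.0]
P = 2.0
p = water_filling(h, sigma, P)
print("allocation:", p, " total power:", p.sum())   # total power ~= P
```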

Example 14.3.2. (Transmit Optimization for MIMO AWGN Channels with Per-Antenna Power Constraints) Consider the transmission over a MIMO AWGN channel with n transmitting antennas and m receiving antennas. The propagation channel from the transmitter to the receiver is represented by a real matrix H ∈ ℝ^{m×n}, in which all the columns are assumed to be “non-empty”, i.e., there is at least one element in each column being non-zero. The additive noises at the m receiving antennas are assumed to be i.i.d. Gaussian random variables with zero mean and unit variance. The transmit signals from the ith antenna, i ∈ {1,…,n}, are denoted by xᵢ(t) ∈ ℝ, t = 0, 1,…, which are subject to a per-antenna average power constraint Pᵢ, i.e., E{(xᵢ(t))²} ≤ Pᵢ, where E{·} denotes the expectation. Let Z ∈ S₊ⁿ denote the transmit covariance matrix, i.e., Z = E{x(t)(x(t))ᵀ}, where x(t) = [x₁(t),…,xₙ(t)]ᵀ. The set of per-antenna transmit power constraints can then be expressed as

tr(AᵢZ) ≤ Pᵢ for i = 1,…,n,

(14.30)

where Aᵢ ∈ ℝⁿˣⁿ is a matrix with all zero elements except for the ith diagonal element, which is one.

For any transmit covariance matrix Z ∈ S₊ⁿ, the maximum transmit rate over the MIMO AWGN channel is given by [25]

r = log det(I + HZHᵀ),

(14.31)

where I denotes an identity matrix. The problem of interest here is to maximize the rate r over Z ∈ S₊ⁿ subject to the set of per-antenna transmit power constraints, which can be equivalently formulated as

v*_p = minimize   f(Z) := −log det(I + HZHᵀ)
subject to   gᵢ(Z) := tr(AᵢZ) − Pᵢ ≤ 0 for i = 1,…,n,
             Z ∈ S₊ⁿ.

(14.32)

In the following, we apply the Lagrangian duality to solve the above problem. The Lagrangian function for this problem is given by

L(Z,u) = f(Z) + ∑_{i=1}^n uᵢgᵢ(Z) = −log det(I + HZHᵀ) + ∑_{i=1}^n uᵢ(tr(AᵢZ) − Pᵢ),

(14.33)

where u = [u₁,…,uₙ]ᵀ ∈ ℝ₊ⁿ. The Lagrangian dual problem associated with problem (14.32) is then given by

v*_d = maximize   θ(u) := min_{Z ∈ S₊ⁿ} L(Z, u)
subject to   u ≥ 0.

(14.34)

It can be verified that the conditions listed in Theorem 14.2.8 are all satisfied for the Lagrangian function L(Z, u) given in (14.33). We thus conclude that v*p = v*d, i.e., the duality gap for Problem (14.32) is zero. Accordingly, we can solve this problem equivalently by solving its dual problem (14.34), as shown next.

First, we solve the minimization problem in (14.34) to obtain the dual function θ(u) for any given u ≥ 0. Observe that θ(u) can be explicitly written as

θ(u) = min_{Z ∈ S₊ⁿ} −log det(I + HZHᵀ) + tr(A_u Z) − ∑_{i=1}^n uᵢPᵢ,

(14.35)

where A_u = ∑_{i=1}^n uᵢAᵢ is a diagonal matrix with the ith diagonal element equal to uᵢ, i = 1,…,n. Note that for the above minimization problem, the optimal solution for Z is independent of the term ∑_{i=1}^n uᵢPᵢ, which can thus be ignored. To solve this minimization problem, we first observe that if any diagonal element of A_u, say uᵢ, i ∈ {1,…,n}, is equal to zero, then the minimum value for this problem becomes −∞, which is attained by, e.g., taking Z = α1ᵢ1ᵢᵀ, where 1ᵢ denotes an n × 1 vector with all zero elements except for the ith element being one, and letting α → ∞. Next, we consider the case where all uᵢ’s are greater than zero. In this case, A_u is full-rank and thus its inverse exists. By defining a new variable Z̄ = A_u^{1/2} Z A_u^{1/2} ∈ S₊ⁿ and using the fact that tr(AB) = tr(BA), the minimization problem in (14.35) can be rewritten as

min_{Z̄ ∈ S₊ⁿ} −log det(I + H A_u^{-1/2} Z̄ A_u^{-1/2} Hᵀ) + tr(Z̄).

(14.36)

Let the SVD of H A_u^{-1/2} be denoted by

H A_u^{-1/2} = UΛVᵀ,

(14.37)

where U ∈ ℝm × m and V ∈ ℝn × n are unitary matrices, and Λ ∈ ℝm×n is a diagonal matrix with the diagonal elements being denoted by λ1,…,λk, k = min(m, n), and λi ≥ 0, i = 1,…, k. Substituting (14.37) into (14.36) and using the fact that log det (I + AB) = log det (I + BA) yield

min_{Z̄ ∈ S₊ⁿ} −log det(I + ΛVᵀZ̄VΛᵀ) + tr(Z̄).

(14.38)

By letting Ẑ = VᵀZ̄V and using the fact that tr(Ẑ) = tr(Z̄), we obtain an equivalent problem of (14.38) as

min_{Ẑ ∈ S₊ⁿ} −log det(I + ΛẐΛᵀ) + tr(Ẑ).

(14.39)

Recall Hadamard’s inequality [24], which states that for any X ∈ S₊ᵐ, det(X) ≤ ∏_{i=1}^m Xᵢᵢ, with equality iff X is a diagonal matrix, where Xᵢᵢ denotes the ith diagonal element of X, i = 1,…,m. Applying this result to Problem (14.39), it follows that the minimum value for this problem is attained iff Ẑ is a diagonal matrix. Let the diagonal elements of Ẑ be denoted by p₁,…,pₙ. Since Ẑ ∈ S₊ⁿ, Problem (14.39) can be simplified as

minimize   −∑_{i=1}^n log(1 + λᵢ²pᵢ) + ∑_{i=1}^n pᵢ
subject to   pᵢ ≥ 0 for i = 1,…,n.

(14.40)

Note that in the above problem, for convenience we have assumed that λᵢ = 0 for i = k + 1,…,n. Similar to Example 14.3.1, the global minimum for the above problem can be shown to be the following water-filling solution:

pᵢ = (1 − 1/λᵢ²)₊ for i = 1,…,n.

(14.41)

To summarize, for any given u > 0, the optimal solution for the minimization problem in (14.35) is given by

Z_u = A_u^{-1/2} V Ẑ Vᵀ A_u^{-1/2},

(14.42)

where Ẑ is a diagonal matrix with the diagonal elements given in (14.41). Moreover, the dual function θ(u) in (14.35) can be simplified to

θ(u) = −∑_{i=1}^k (log(λᵢ²))₊ + ∑_{i=1}^k (1 − 1/λᵢ²)₊ − ∑_{i=1}^n uᵢPᵢ   if u > 0,   and θ(u) = −∞ otherwise,

(14.43)

where λ1,…, λk are related to u via (14.37).

Next, we solve the dual problem (14.34) by maximizing the dual function θ(u) in (14.43) over u ≥ 0. The corresponding dual optimal solution of u then leads to the optimal solution Z_u in (14.42) for the primal problem (14.32). Since v*_d = v*_p > −∞, in fact we only need to consider the maximization of θ(u) over u > 0 in (14.43). However, due to the coupled structure of the λᵢ’s and uᵢ’s shown in (14.37), it is not evident whether θ(u) in (14.43) is differentiable over the uᵢ’s for u > 0. As a result, conventional descent methods for finding the global minimum of differentiable convex functions, such as Newton’s method, are ineffective for the problem at hand. Thus, we resort to an alternative method, known as the subgradient-based method, to handle the non-differentiable function θ(u). First, we introduce the definition of a subgradient for an arbitrary real-valued function z(x) defined over a nonempty convex set S ⊆ ℝⁿ. We assume that z(x) has a finite maximum. However, z(x) need not be continuously differentiable nor have an analytical expression for its differential. In this case, a vector v ∈ ℝⁿ is called a subgradient of z(x) at the point x = x₀ if for any x ∈ S, the following inequality holds:

z(x) ≤ z(x₀) + vᵀ(x − x₀).

(14.44)

If at any point x ∈ S a corresponding subgradient v of z(x) is attainable, then the maximum of z(x) can be found via an iterative search over x ∈ S based on v (see, e.g., the ellipsoid method [26]). Since θ(u) is defined over the convex set u > 0 and has a finite maximum, the dual problem (14.34) can thus be solved by a subgradient-based method. Next, we show that a subgradient of θ(u) at any point u > 0 is given by [tr(A₁Z_u) − P₁,…, tr(AₙZ_u) − Pₙ]ᵀ, where Z_u is given in (14.42). Suppose that at any two points u > 0 and u′ > 0, θ(u) and θ(u′) are attained by Z = Z_u and Z = Z_{u′}, respectively. Then, we have the following inequalities:

θ(u′) = L(Z_{u′}, u′)
     = min_{Z ∈ S₊ⁿ} L(Z, u′)
     ≤ L(Z_u, u′)
     = −log det(I + HZ_uHᵀ) + [tr(A₁Z_u) − P₁, …, tr(AₙZ_u) − Pₙ] u′
     = −log det(I + HZ_uHᵀ) + [tr(A₁Z_u) − P₁, …, tr(AₙZ_u) − Pₙ] u + [tr(A₁Z_u) − P₁, …, tr(AₙZ_u) − Pₙ] (u′ − u)
     = L(Z_u, u) + [tr(A₁Z_u) − P₁, …, tr(AₙZ_u) − Pₙ] (u′ − u)
     = θ(u) + [tr(A₁Z_u) − P₁, …, tr(AₙZ_u) − Pₙ] (u′ − u),

from which the subgradient of θ(u) follows.

Last, we can verify that the optimal primal and dual solutions, Zu given in (14.42) and the corresponding u > 0 satisfy (a) of the following KKT conditions:

∇f(Z_u) + ∑_{i=1}^n uᵢ∇gᵢ(Z_u) = 0,   (a)
uᵢgᵢ(Z_u) = 0 for i = 1,…,n,   (b)

(14.45)

while since u > 0, from (b) it follows that gᵢ(Z_u) = 0, i.e., tr(AᵢZ_u) = Pᵢ must hold for i = 1,…,n. Thus, all transmit antennas should transmit at their maximum power levels under the optimal transmit covariance matrix Z_u, which is consistent with the observation that the subgradient of the dual function θ(u) at the optimal dual solution u should vanish.
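
The dual approach of this example can be turned into a short numerical routine. The sketch below is an added illustration; the step-size rule, iteration count, and numerical guards are arbitrary choices, not part of the chapter. It evaluates θ(u), Z_u, and the subgradient via (14.37)–(14.43) and runs a projected subgradient ascent on u > 0.

```python
# A minimal sketch of subgradient-based dual optimization for (14.32)-(14.34).
import numpy as np

rng = np.random.default_rng(5)
m, n = 3, 4
H = rng.normal(size=(m, n))
P = np.full(n, 1.0)                     # per-antenna power limits P_i

def dual_pieces(u):
    """Return (theta(u), Z_u, subgradient) for a given u > 0."""
    Au_isqrt = np.diag(1.0 / np.sqrt(u))              # A_u^{-1/2}
    U, lam, Vt = np.linalg.svd(H @ Au_isqrt)          # (14.37)
    lam_full = np.zeros(n)
    lam_full[:lam.size] = lam
    p = np.maximum(1.0 - 1.0 / np.maximum(lam_full ** 2, 1e-300), 0.0)   # (14.41)
    Zu = Au_isqrt @ Vt.T @ np.diag(p) @ Vt @ Au_isqrt                    # (14.42)
    theta = (-np.sum(np.maximum(np.log(np.maximum(lam_full ** 2, 1e-300)), 0.0))
             + np.sum(p) - u @ P)                                        # (14.43)
    sub = np.diag(Zu) - P                             # tr(A_i Z_u) - P_i
    return theta, Zu, sub

u = np.ones(n)
for t in range(1, 2001):
    theta, Zu, sub = dual_pieces(u)
    u = np.maximum(u + (1.0 / t) * sub, 1e-6)         # projected subgradient step

print("per-antenna powers diag(Z_u):", np.round(np.diag(Zu), 3))   # ~ P_i
print("dual value theta(u):", theta)
print("primal value -logdet(I+HZH^T):",
      -np.log(np.linalg.det(np.eye(m) + H @ Zu @ H.T)))
```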

Example 14.3.3. (Power Efficient Beamforming in Two-Way Relay Network via SDP Relaxation) In this example, we illustrate how an originally nonconvex problem can be solved via convex techniques. As shown in Figure 14.1, we consider a two-way relay channel (TWRC) consisting of two source nodes, S1 and S2, each with a single antenna and a relay node, R, equipped with M antennas, M ≥ 2. It is assumed that the transmission protocol of TWRC uses two consecutive equal-duration time slots for one round of information exchange between S1 and S2 via R. During the first time slot, both S1 and S2 transmit concurrently to R, which linearly processes the received signal and then broadcasts the resulting signal to S1 and S2 during the second time slot. It is also assumed that perfect synchronization has been established among S1, S2, and R prior to data transmission. The received baseband signal at R in the first time slot is expressed as

y_R(n) = h₁√p₁ s₁(n) + h₂√p₂ s₂(n) + z_R(n)

(14.46)

where y_R(n) ∈ ℂᴹ is the received signal vector at symbol index n, n = 1,…,N, with N denoting the total number of transmitted symbols during one time slot; h₁ ∈ ℂᴹ and h₂ ∈ ℂᴹ represent the channel vectors from S1 to R and from S2 to R, respectively, which are assumed to be constant during the two time slots; s₁(n) and s₂(n) are the transmitted symbols from S1 and S2, respectively, with E{|s₁(n)|²} = 1, E{|s₂(n)|²} = 1, and |·| denoting the absolute value of a complex number; p₁ and p₂ denote the transmit powers of S1 and S2, respectively; and z_R(n) ∈ ℂᴹ is the receiver noise vector, independent over n, and without loss of generality, it is assumed that z_R(n) has a circularly symmetric complex Gaussian (CSCG) distribution with zero mean and identity covariance matrix, denoted by z_R(n) ~ CN(0, I), ∀n. Upon receiving the mixed signal from S1 and S2, R processes it with the amplify-and-forward (AF) relay operation, also known as linear analogue relaying, and then broadcasts the processed signal to S1 and S2 during the second time slot. Mathematically, the linear processing (beamforming) operation at the relay can be concisely represented as

x_R(n) = A y_R(n),   n = 1,…,N

(14.47)


Figure 14.1:  The two-way multi antenna relay channel.

where xR(n) ∈ ℂM is the transmitted signal at R, and A ∈ ℂM × M is the relay processing matrix.

Note that the transmit power of R can be shown to be equal to

p_R(A) = E[tr(x_R(n)x_Rᴴ(n))] = ‖Ah₁‖₂²p₁ + ‖Ah₂‖₂²p₂ + tr(AAᴴ).

(14.48)

We can assume w.l.o.g. that channel reciprocity holds for the TWRC during uplink and downlink transmissions, i.e., the channels from R to S1 and S2 during the second time slot are given as h₁ᵀ and h₂ᵀ, respectively. Thus, the received signal at S1 can be written as

y₁(n) = h₁ᵀx_R(n) + z₁(n) = h₁ᵀAh₁√p₁ s₁(n) + h₁ᵀAh₂√p₂ s₂(n) + h₁ᵀAz_R(n) + z₁(n)

(14.49)

for n = 1,…,N, where z₁(n)’s are the independent receiver noise samples at S1, and it is assumed that z₁(n) ~ CN(0, 1), ∀n. Note that on the right-hand side of (14.49), the first term is the self-interference of S1, while the second term contains the desired message from S2. Assuming that both h₁ᵀAh₁ and h₁ᵀAh₂ are perfectly known at S1 via training-based channel estimation prior to data transmission, S1 can first subtract its self-interference from y₁(n) and then coherently demodulate s₂(n). The above practice is known as analogue network coding (ANC). From (14.49), subtracting the self-interference from y₁(n) yields

ỹ₁(n) = h̃₂₁√p₂ s₂(n) + z̃₁(n),   n = 1,…,N

(14.50)

where h̃₂₁ = h₁ᵀAh₂ and z̃₁(n) ~ CN(0, ‖Aᴴh₁*‖₂² + 1), with (·)* denoting the complex conjugate. From (14.50), for a given A, the maximum achievable SNR for the end-to-end link from S2 to S1 via R, denoted by γ₂₁, is given as

γ₂₁ = |h₁ᵀAh₂|²p₂ / (‖Aᴴh₁*‖₂² + 1)

(14.51)

Similarly, it can be shown that the maximum SNR γ₁₂ for the link from S1 to S2 via R is given as

γ₁₂ = |h₂ᵀAh₁|²p₁ / (‖Aᴴh₂*‖₂² + 1).

(14.52)

Now we minimize the relay transmission power given in (14.48), under the constraints that the achievable SNRs γ₂₁ and γ₁₂ over the two directions are above two target values, γ̄₁ and γ̄₂. As such, the optimization can be formulated as

minimize_A   p_R := ‖Ah₁‖₂²p₁ + ‖Ah₂‖₂²p₂ + tr(AAᴴ)
subject to   |h₁ᵀAh₂|² ≥ (γ̄₁/p₂)‖Aᴴh₁*‖₂² + γ̄₁/p₂,
             |h₂ᵀAh₁|² ≥ (γ̄₂/p₁)‖Aᴴh₂*‖₂² + γ̄₂/p₁,

(14.53)

For the convenience of analysis, we further modify the above problem as follows. First, let Vec(Q) be the K² × 1 vector associated with a K × K square matrix Q = [q₁,…,q_K]ᵀ, where qₖ ∈ ℂᴷ, k = 1,…,K, by the rule Vec(Q) = [q₁ᵀ,…,q_Kᵀ]ᵀ. Next, with b = Vec(A) and Θ = p₁h₁h₁ᴴ + p₂h₂h₂ᴴ + I, we can express p_R in the objective function of (14.53) as p_R = tr(AΘAᴴ) = ‖Φb‖₂², with Φ = (diag(Θᵀ, Θᵀ))^{1/2}, where diag(A, B) denotes a block-diagonal matrix with A and B as the diagonal square blocks. Similarly, let f₁ = Vec(h₁h₂ᵀ) and f₂ = Vec(h₂h₁ᵀ). Then, from (14.53) it follows that |h₁ᵀAh₂|² = |f₁ᵀb|² and |h₂ᵀAh₁|² = |f₂ᵀb|². Furthermore, by defining

H̃ᵢ = [ hᵢ(1,1)   0    hᵢ(2,1)   0
       0    hᵢ(1,1)   0    hᵢ(2,1) ],   i = 1, 2,

we have ‖Aᴴhᵢ*‖₂² = ‖H̃ᵢb‖₂², i = 1, 2. Using the above transformations, (14.53) can be rewritten as

minimize_b   p_R := ‖Φb‖₂²
subject to   |f₁ᵀb|² ≥ (γ̄₁/p₂)‖H̃₁b‖₂² + γ̄₁/p₂,
             |f₂ᵀb|² ≥ (γ̄₂/p₁)‖H̃₂b‖₂² + γ̄₂/p₁.

(14.54)

The above problem can be shown to be still nonconvex. However, in the following, we show that the exact optimal solution could be obtained via a relaxed SDP problem.

We first define E₀ = ΦᴴΦ, E₁ = (p₂/γ̄₁)f₁*f₁ᵀ − H̃₁ᴴH̃₁, and E₂ = (p₁/γ̄₂)f₂*f₂ᵀ − H̃₂ᴴH̃₂. Since standard semidefinite programming (SDP) formulations only involve real variables and constants, we introduce a new real matrix variable X = [b_R; b_I][b_R; b_I]ᵀ, where b_R = Re(b) and b_I = Im(b) are the real and imaginary parts of b, respectively. To rewrite the norm representations in (14.54) in terms of X, we need to rewrite E₀, E₁, and E₂ as expanded matrices F₀, F₁, and F₂, respectively, in terms of their real and imaginary parts. Specifically, to write out F₀, we first define the shorthand notations Φ_R = Re(Φ) and Φ_I = Im(Φ); then we have

F₀ = [ Φ_RᵀΦ_R + Φ_IᵀΦ_I     Φ_IᵀΦ_R − Φ_RᵀΦ_I
       Φ_RᵀΦ_I − Φ_IᵀΦ_R     Φ_RᵀΦ_R + Φ_IᵀΦ_I ].

The expanded matrices F1 and F2 can be generated from E1 and E2 in a similar way, where the two terms in E1 or E2 could first be expanded separately then summed together.

As such, problem (14.54) can be equivalently rewritten as

minimize_X   p_R := tr(F₀X)
subject to   tr(F₁X) ≥ 1,
             tr(F₂X) ≥ 1,
             X ⪰ 0,
             rank(X) = 1.

(14.55)

The above problem is still not convex given the last rank-one constraint. However, if we remove such a constraint, this problem is relaxed into a convex SDP problem as shown below.

minimize_X   p_R := tr(F₀X)
subject to   tr(F₁X) ≥ 1,
             tr(F₂X) ≥ 1,
             X ⪰ 0.

(14.56)

Given the convexity of the above SDP problem, the optimal solution could be efficiently found by various convex optimization methods. Note that SDP relaxation usually leads to an optimal X for problem (14.56) that is of rank r with r ≥ 1, which makes it impossible to reconstruct the exact optimal solution for Problem (14.54) when r > 1. A commonly adopted method in the literature to obtain a feasible rank-one (but in general suboptimal) solution from the solution of SDP relaxation is via “randomization” (see, e.g., [27] and references therein). Fortunately, we show in the following that with the special structure in Problem (14.56), we could efficiently reconstruct an optimal rank-one solution from its optimal solution that could be of rank r with r > 1, based on some elegant results derived for SDP relaxation in [28]. In other words, we could obtain the exact optimal solution for the nonconvex problem in (14.55) without losing any optimality, and as efficiently as solving a convex problem.

Theorem 14.3.1. Assume that an optimal solution X* of rank r > 1 has been found for Problem (14.56). Then we can efficiently construct another feasible optimal solution X** of rank one, i.e., X** is the optimal solution for both (14.55) and (14.56).

Proof: Please refer to [21].

Note that the above proof is self-constructive, based on which we could easily obtain a routine to obtain an optimal rank-one solution for Problem (14.55) from X*. Then we could map the solution back to obtain an optimal solution for the problem in (14.53).
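
The relaxation step (14.56) itself is routine to set up. The sketch below is an added illustration that assumes the third-party package CVXPY is available and uses random placeholder matrices in place of the true F₀, F₁, F₂ (which would be built from the channel data as described above); it solves the relaxed SDP and reports the numerical rank of the solution.

```python
# A minimal sketch of the SDP relaxation (14.56) with placeholder data.
import cvxpy as cp
import numpy as np

def random_sym_psd(k, rng):
    M = rng.normal(size=(k, k))
    return M @ M.T

rng = np.random.default_rng(6)
k = 8                                   # 2*M^2 real dimensions for M = 2 antennas
F0 = random_sym_psd(k, rng)             # placeholder for the true F0
F1 = random_sym_psd(k, rng)             # placeholder for the true F1
F2 = random_sym_psd(k, rng)             # placeholder for the true F2

X = cp.Variable((k, k), PSD=True)
prob = cp.Problem(cp.Minimize(cp.trace(F0 @ X)),
                  [cp.trace(F1 @ X) >= 1, cp.trace(F2 @ X) >= 1])
prob.solve()

eigvals = np.linalg.eigvalsh(X.value)
rank = int(np.sum(eigvals > 1e-6 * eigvals.max()))
print("relaxed optimal value:", prob.value, " numerical rank of X*:", rank)
# If rank == 1, b can be read off from the leading eigenvector; otherwise a
# rank-one optimal solution can be constructed as in Theorem 14.3.1 / [28].
```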

14.4    Exercises

Exercise 14.4.1. Please indicate whether the following sets are convex or not.

1.  {x : (aᵀx − b)/(cᵀx + d) ≤ 1, cᵀx + d < 0};

2.  {x : Ax = b, ‖x‖₂ = 1};

3.  {X : X₁₁a₀ + X₂₂a₁ ⪰ 0, a₀ ∈ Sⁿ, a₁ ∈ Sⁿ}; (Xᵢⱼ stands for the ijth element of the matrix X)

4.  {X : aTXa = 1}

Exercise 14.4.2. Please indicate whether the following functions are convex or concave or neither.

1.  f(x) = sup_w { log ∑_{i=1}^n e^{xᵢw} };

2.  f(x) = −(x₁x₂x₃)^{1/3}, x > 0;

3.  f(X) = log det(AᵀXA), X ≻ 0;

4.  f(x) = xᵀAx + 2x − 5, A = [0, 1; 1, 0].

Exercise 14.4.3. With the following problem formulation, answer the followup questions.

minimize   −(x₁ + x₂)
subject to   ‖a₁x‖₂ ≤ 1,
             ‖a₂x − b₂‖₂ ≤ 1,
where x = [x₁, x₂]ᵀ, a₁ = a₂ = [1, 0; 0, 1], and b₂ = [1, 0]ᵀ.


1.  Is this problem convex?

2.  Does Slater’s constraint condition hold?

3.  What is the optimal solution for this problem? (Hint: Try to solve this problem graphically if the KKT conditions are hard to solve.)

4.  What is the optimal objective value for the dual problem?

5.  What is the optimal value for the dual variable associated with the second constraint?

Exercise 14.4.4. Given the optimization problem shown in Exercise 14.4.3, please reformulate it as a semidefinite programming (SDP) problem, then derive the dual problem of the resulting SDP problem.

Exercise 14.4.5. With the following optimization problem, answer the followup questions.

maximize_P   ∑_{i=1}^n log(1 + Pᵢ/δᵢ)
subject to   ∑_{i=1}^n Pᵢ = P_total,
             P ≥ 0,

where P = [P1,…, Pn]T, and δi > 0, i = 1,…, n.

1.  Is KKT sufficient for us to get the optimal solution for the above problem?

2.  Is KKT necessary for the optimal solution?

3.  Please write out the KKT conditions for this problem.

4.  Please solve the general form of optimal Pi’s.

5.  If n = 3, δ1 = 2, δ2 = 10, δ3 = 5, and Ptotal = 10, what are the optimal Pi values?

Exercise 14.4.6. Let 1 < m < n be integers, and let A be an m × n matrix with full row rank. Furthermore, let c ∈ ℝⁿ and Q ∈ S₊₊ⁿ be given. Consider the following optimization problem:

minimize   ½xᵀQx + cᵀx
subject to   Ax = 0.

(14.57)

1.  Explain why the KKT conditions are necessary and sufficient for (14.57).

2.  Write down the KKT conditions associated with (14.57). Hence, express the optimal solution to (14.57) in closed form.

Exercise 14.4.7. Let f : ℝⁿ → ℝ be a differentiable convex function. Consider the following problem:

minimize   f(x)
subject to   x ≥ 0.

(14.58)

Show that x̄ ∈ ℝⁿ is an optimal solution to (14.58) iff it satisfies the following system:

∇f(x̄) ≥ 0,   x̄ ≥ 0,   x̄ᵀ∇f(x̄) = 0.

Exercise 14.4.8. This problem is concerned with finding the minimum-volume enclosing ellipsoid of a set of vectors.

1.  Let u ∈ ℝⁿ be fixed, and define the function g : S₊ⁿ → ℝ by g(X) = ‖Xu‖₂². Find ∇g(X).

2.  Let V = {v₁,…,vₘ} ⊆ ℝⁿ be a set of vectors that span ℝⁿ. Consider the following problem:

inf   −log det(X)
subject to   ‖Xvᵢ‖₂² ≤ 1 for i = 1,…,m,
             X ∈ S₊₊ⁿ.

(14.59)

Let X̄ be an optimal solution to (14.59) (it can be shown that such an X̄ exists). Write down the KKT conditions that X̄ must satisfy.

3.  Suppose that m = n and vi = ei for i = 1,…,n, where ei is the i-th standard basis vector. Using the above result, determine the optimal solution to (14.59) and find the corresponding Lagrange multipliers.

Exercise 14.4.9. Let a ∈ ℝn, b ∈ ℝ and c ∈ ℝn be such that a, c > 0 and b > 0. Consider the following problem:

minimize   ∑_{i=1}^n cᵢxᵢ
subject to   ∑_{i=1}^n aᵢxᵢ = b,
             x ≥ 0.

(14.60)

1.  Let u1 ∈ ℝ and u2 ∈ ℝn be the Lagrange multipliers associated with the equality and inequality constraints, respectively. Write down the KKT conditions associated with (14.60).

2.  Give explicit expressions for x̄ ∈ ℝⁿ, ū₁ ∈ ℝ and ū₂ ∈ ℝⁿ such that (x̄, ū₁, ū₂) satisfies the KKT conditions above.

3.  Is the solution x̄ ∈ ℝⁿ found above an optimal solution to (14.60)? Explain.

References

[1]  M. S. Bazaraa, H. D. Sherali, and C. M. Shetty, Nonlinear Programming: Theory and Algorithms, 2nd ed., ser. Wiley-Interscience Series in Discrete Mathematics and Optimization. New York: John Wiley & Sons, Inc., 1993.

[2]  D. P. Bertsekas, Nonlinear Programming, 2nd ed. Belmont, Massachusetts: Athena Scientific, 1999.

[3]  S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge: Cambridge University Press, 2004, available online at http://www.stanford.edu/~boyd/cvxbook/.

[4]  D. G. Luenberger and Y. Ye, Linear and Nonlinear Programming, 3rd ed., ser. International Series in Operations Research and Management Science. New York: Springer Science+Business Media, LLC, 2008, vol. 116.

[5]  R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge: Cambridge University Press, 1985.

[6]  M. Brookes, “The Matrix Reference Manual,” 2005, available online at http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/intro.html.

[7]  M. Sion, “On General Minimax Theorems,” Pacific Journal of Mathematics, vol. 8, no. 1, pp. 171–176, 1958.

[8]  H. Komiya, “Elementary Proof for Sion’s Minimax Theorem,” Kodai Mathematical Journal, vol. 11, no. 1, pp. 5–7, 1988.

[9]  S. Cui, A. J. Goldsmith, and A. Bahai, “Energy-constrained modulation optimization,” IEEE Transactions on Wireless Communications, vol. 4, no. 5, pp. 2349–2360, September 2005.

[10]  S. Cui, M. Kisialiou, Z.-Q. Luo, and Z. Ding, “Robust blind multiuser detection against signature waveform mismatch based on second order cone programming,” IEEE Transactions on Wireless Communications, vol. 4, no. 4, pp. 1285–1291, July 2005.

[11]  R. Zhang and Y.-C. Liang, “Exploiting multi-antennas for opportunistic spectrum sharing in cognitive radio networks,” IEEE Journal of Selected Topics in Signal Processing, vol. 2, no. 1, pp. 88–102, Feb. 2008.

[12]  W. Yu and R. Lui, “Dual methods for nonconvex spectrum optimization of multicarrier systems,” IEEE Transactions on Communications, vol. 54, no. 7, pp. 1310–1322, July 2006.

[13]  L. Zhang, R. Zhang, Y.-C. Liang, Y. Xin, and S. Cui, “On the relationship between the multi-antenna secrecy communications and cognitive radio communications,” IEEE Transactions on Communications, vol. 58, no. 6, pp. 1877–1886, June 2010.

[14]  R. Zhang, S. Cui, and Y.-C. Liang, “On ergodic sum capacity of fading cognitive multiple-access and broadcast channels,” IEEE Transactions on Information Theory, vol. 55, no. 11, pp. 5161–5178, November 2009.

[15]  M. Chiang, “Balancing transport and physical layers in wireless multihop networks: Jointly optimal congestion control and power control,” IEEE Journal on Selected Areas in Communications, vol. 23, no. 1, pp. 104–116, January 2005.

[16]  J. Xiao, S. Cui, Z. Q. Luo, and A. J. Goldsmith, “Joint estimation in sensor networks under energy constraint,” IEEE Transactions on Signal Processing, vol. 54, no. 2, pp. 413–422, February 2005.

[17]  A. So and Y. Ye, “Theory of semidefinite programming for sensor network localization,” Mathematical Programming, vol. 109, pp. 367–384, 2007.

[18]  S. Cui and A. J. Goldsmith, “Cross-layer design in energy-constrained networks using cooperative MIMO techniques,” EURASIP’s Signal Processing Journal, Special Issue on Advances in Signal Processing-based Cross-layer Designs, vol. 86, pp. 1804–1814, August 2006.

[19]  R. Madan, S. Cui, S. Lall, and A. Goldsmith, “Modeling and optimization of transmission schemes in energy-constrained wireless sensor networks,” IEEE/ACM Transactions on Networking, vol. 15, no. 6, pp. 1359–1372, December 2007.

[20]  Z. Quan, S. Cui, H. V. Poor, and A. Sayed, “Collaborative wideband sensing for cognitive radios,” IEEE Signal Processing Magazine special issue on cognitive radios, vol. 25, no. 6, pp. 60–73, January 2009.

[21]  R. Zhang, Y.-C. Liang, C.-C. Chai, and S. Cui, “Optimal beamforming for two-way multi-antenna relay channel with analogue network coding,” IEEE Journal on Selected Areas of Communications, vol. 27, no. 6, pp. 699–712, June 2009.

[22]  R. Zhang and S. Cui, “Cooperative interference management with miso beamforming,” IEEE Transactions on Signal Processing, vol. 58, no. 10, pp. 5450–5458, October 2010.

[23]  S. P. Boyd, S.-J. Kim, D. D. Patil, and M. A. Horowitz, “Digital Circuit Optimization via Geometric Programming,” Operations Research, vol. 53, no. 6, pp. 899–932, November 2005.

[24]  T. Cover and J. Thomas, Elements of Information Theory. New York: Wiley, 1991.

[25]  I. E. Telatar, “Capacity of multi-antenna Gaussian channels,” Eur. Trans. Telecommun., vol. 10, no. 6, pp. 585–595, November 1999.

[26]  R. G. Bland, D. Goldfarb, and M. J. Todd, “The ellipsoid method: a survey,” Operations Research, vol. 29, no. 6, pp. 1039–1091, November 1981.

[27]  Z.-Q. Luo and W. Yu, “An introduction to convex optimization for communications and signal processing,” IEEE Journal of Selected Topics in Signal Processing, vol. 24, no. 8, pp. 1426–1438, August 2006.

[28]  Y. Ye and S. Zhang, “New results on quadratic minimization,” SIAM J. Optim., vol. 14, pp. 245–267, 2003.

1Recall that for a generic optimization problem min_{x∈S⊆ℝⁿ} f(x), a point x* ∈ S is called a global minimum if f(x*) ≤ f(x) for all x ∈ S. On the other hand, if there exists an ϵ > 0 such that the point x* ∈ S satisfies f(x*) ≤ f(x) for all x ∈ S ∩ B°(x*, ϵ), then it is called a local minimum. Here, B°(x̄, ϵ) = {x ∈ ℝⁿ : ‖x − x̄‖₂ < ϵ} denotes the open ball centered at x̄ ∈ ℝⁿ of radius ϵ > 0.

2A map A : ℝᵐ → ℝⁿ is said to be affine if there exists an n × m matrix B and a vector d ∈ ℝⁿ such that A(x) = Bx + d for all x ∈ ℝᵐ.

3Let S be a nonempty convex subset of ℝⁿ. We say that f : S → ℝ is convex at x̄ ∈ S if f(αx̄ + (1 − α)x) ≤ αf(x̄) + (1 − α)f(x) for all α ∈ (0, 1) and x ∈ S. Note that a function f : S → ℝ can be convex at a particular point x̄ ∈ S without being convex on S.
