Chapter 14 Unconstrained and Constrained Optimization Problems
‡ Texas A&M University, College Station, USA
♮ The Chinese University of Hong Kong, Hong Kong, China
♮ National University of Singapore, Singapore
In the first section of this chapter, we will give an overview of the basic mathematical tools that are useful for analyzing both unconstrained and constrained optimization problems. In order to allow the readers to focus on the applications of these tools and not to be burdened with too many technical details, we shall state most of the results without proof. However, the readers are strongly encouraged to refer to the texts [1, 2, 3, 4] for expositions of these results and other further developments. In the second section, we provide three application examples to illustrate how we could apply the optimization techniques to solve real-world problems, with a focus on communications, networking, and signal processing. In the last section, several exercise questions are given to help the audience gain a deeper understanding of the material.
14.1 Basics of Convex Analysis
The notion of convexity plays a very important role in both the theoretical and algorithmic aspects of optimization. Before we discuss the relevance of convexity in optimization, let us first introduce the notions of convex sets and convex functions and state some of their properties.
Definition 14.1.1. Let S ∈ ℝn be a set. We say that
1. S is affine if αx + (1 − α)y ∈ S whenever x, y ∈ S and α ∈ ℝ;
2. S is convex if αx + (1 — a)y ∈ S whenever x, y ∈ S and α ∈ [0,1].
Given x, y ∈ ℝn and a ∈ ℝ, the vector z = αx + (1 − α)y is called an affine combination of x and y. If α ∈ [0,1], then z is called a convex combination of x and y.
Geometrically, when x and y are distinct points in ℝn, the set
L = {z∈ℝn:z = αx + (1 − α)y,α∈ℝ}
of all affine combinations of x and y is simply the line determined by x and y; and the set
S = {z∈ℝn:z = αx + (1 − α)y,α∈[0,1]}
is the line segment between x and y. By convention, the empty set 0
It is clear that one can generalize the notion of affine (resp. convex) combination of two points to any finite number of points. In particular, an affine combination of the points x1,…,xk ∈ ℝn is a point z = ∑ki = 1αiXi
Here are some sets in Euclidean space whose convexity can be easily established by first principles:
Example 14.1.1. (Some Examples of Convex Sets)
1. Non-negative Orthant: ℝn + = {x∈ℝn:x≥0}
2. Hyperplane: H (s, c) = {x ∈ ℝn : sT x = c}.
3. Halfspaces: H+(s, c) = {x ∈ ℝn : sTx ≤ c}, H−(s, c) = {x ∈ ℝn : sTx ≥ c}.
4. Euclidean Ball: B(x̄, r) = {x ∈ ℝn : ‖x − x̄|‖2 ≤ r}.
5. Ellipsoid: E(x̄, Q, r) = {x ∈ ℝn : (x − x̄)TQ(x − x̄) ≤ r2}, where Q is an n × n symmetric, positive definite matrix (i.e., xTQx ̎ 0 for all x ∈ ℝn {0}), and is denoted by Q ≻ 0.?
6. Simplex: Δ = {∑ni = 0αixi:∑ni = 0αi = 1,αi≥0 for i=0,1,⋯,n}
7. Positive Semidefinite Cone:
Sn + = {A∈ℝn × n:A
□
Let us now turn to the notion of a convex function.
Definition 14.1.2. Let S ∈ ℝn be a nonempty convex set, and let f: S → ℝ be a real-valued function.
1. We say that f is convex on S if
f(αx1 + (1 − α)x2)≤αf(x1) + (1 − α)f(x2) |
(14.1) |
for all x1, x2 ∈ S and α ∈ [0,1]. We say that f is concave if −f is convex.
2. We say that f is strictly convex on S if
f(αx1 + (1 − α)x2)<αf(x1) + (1 − α)f(x2)
for all x1, x2 ∈ S and α ∈ (0,1).
3. The epigraph of f is the set epi(f ) = {(x, r) ∈ S × ℝ : f (x) ≤ r}.
The relationship between convex sets and convex functions can be summarized as follows:
Proposition 14.1.1. Let f be as in Definition 14.1.2. Then, f is convex (as a function) iff epi(f) is convex (as a set in S × ℝ).
Let r ∈ ℝ be arbitrary. A set closely related to the epigraph is the so-called r—level set of f, which is defined as L(r) = {x ∈ ℝn : f (x) ≤ r}. It is clear that if f is convex, then L(r) is convex for all r ∈ ℝ. However, the converse is not true, as illustrated by the function x ↦ x3. A function f: S → ℝ whose domain is convex and whose r-level sets are convex for all r ∈ ℝ is called quasi-convex.
One of the most desirable features of convexity is the following:
Proposition 14.1.2. Consider the optimization problem:
minimize f(x)subject to x∈S,
where S ∈ ℝn is a convex set and f : S → ℝ is convex. Then, any local minimum of f is also a global minimum1.
Now, let S ∈ ℝn be an open convex set, and let f: S → ℝ be an arbitrary function. When f has suitable degree of differentiability, we can characterize its convexity by its gradient or Hessian. Specifically, we have the following:
Theorem 14.1.1. Let S ∈ ℝn be an open convex set, and let f : S → ℝ be a differentiable function on S. Then, fs convex on S iff
f(x1)≥f(x2) + (∇f(x2))T(x1 − x2)
for all x1, x2 ∈ S. Furthermore, if f is twice continuously differentiable function on S, then f is convex on S iff ∇2f (x) is positive semidefinite for all x ∈ S.
Sometimes it may be difficult to verify directly from the definition whether a given function is convex or not. However, a function can often be obtained as a composition of several, more elementary functions. When each of those elementary functions is convex, it is natural to ask whether their composition is also convex. In general, the answer is no. On the other hand, here are some transformations that preserve convexity.
Theorem 14.1.2. Let S ∈ ℝn be a nonempty convex set. Then, the following hold:
1. (Non-negative Combinations) Let f1,…,fm : S → ℝ be convex functions, and let α1,…, αm ≥ 0. Then, the function ∑mi = 1αifi
2. (Pointwise Supremum) Let {fi}i∈I be an arbitrary family of convex functions on S. Then, the pointwise supremum f = supi∈I fi is convex on S.
3. (Affine Composition) Let f: ℝn → ℝ be a convex function and A: ℝm → ℝn be an affine mapping2. then, the function f o A: ℝm → ℝn given by (f ∘ A)(x) = f (A(x)) is convex on ℝm.
4. (Composition with an Increasing Convex Function) Let f : S → ℝ be a convex function, and let g : ℝ → ℝ be an increasing convex function. Then, the function g ∘ f : S → ℝ defined by (g ∘ f)(x) = g(f (x)) is convex on S.
5. (Restriction on Lines) Let f : S → ℝ be a function. Given x0 ∈ S and h ∈ ℝn, define the function fx0,h:ℝ→ℝ∪{ + ∞}
˜fx0,h(t) = {f(x0 + th)if x0 + th∈S, + ∞otherwise.
Then, f is convex on S iff ˜fx0,h
Let us now illustrate an application of Theorem 14.1.2.
Example 14.1.2. Let f: ℝm × x → ℝ+ be given by f(X) = ‖X‖2, where ‖·‖2 denotes the spectral norm or largest singular value of the m × n matrix X. By the Courant-Fischer theorem (see, e.g., [5]), we have
f(X) = sup{uTXv:‖u‖2 = 1, ‖v‖2 = 1}. |
(14.2) |
Now, for each u ∈ ℝm and v ∈ ℝn with ‖u‖2 = ‖v‖2 = 1, define the function fu, v :ℝm × n → ℝ by
fu,v(X) = uTXv.
Note that fu, v is a convex (in fact, linear) function of X for each u, v. Hence, it follows from (14.2) that f is a pointwise supremum of a family of linear functions of X. By Theorem 14.1.2, this implies that f is convex.
14.2 Unconstrained vs. Constrained Optimization
14.2.1 Optimality Conditions for Unconstrained Optimization
One of the most fundamental problems in optimization is to derive conditions for identifying potential optimal solutions to an optimization problem. Typically, such conditions, which are known as optimality conditions, would enable us to reduce the original optimization problem to that of checking the validity of certain geometric conditions, or to that of checking the consistency of certain system of inequalities. As an illustration and to motivate our discussion, let us first consider a univariate, twice continuously differentiable function f: ℝ → ℝ. Recall from basic calculus that if x̄ ∈ ℝ is a local minimum of f, then we must have
(14.3) |
In other words, condition (14.3) is a necessary condition for x̄ to be a local minimum. However, it is not a sufficient condition, as an x̄ ∈ ℝ that satisfies (14.3) can be a local maximum or just a stationary point. In order to certify that x̄ is indeed a local minimum, one could check, in addition to (14.3), whether
(14.4) |
In particular, condition (14.4) is a sufficient condition for x̄ to be a local minimum.
In the above discussion, conditions (14.3) and (14.4) together yield a system of inequalities whose solutions are local minima of the function f. Alternatively, they can be viewed as stating the geometric fact that there is no descent direction in a neighborhood of a local minimum. In particular, the former is an algebraic interpretation of local optimality, while the latter is a geometric interpretation. It is worth noting that each interpretation has its own advantage. Indeed, the geometric interpretation can often help us gain intuitions about the problem at hand, and the algebraic interpretation would help to make those intuitions precise. Thus, it is good to keep both interpretations in mind.
To derive optimality conditions for the local minima of a multivariate twice continuously differentiable function f : ℝn → ℝ, we first recall that ∇f (x), the gradient of f at x ∈ ℝn, is the direction of steepest ascent at x. Thus, if ∇f (x) ≠ 0, then starting at x, we can proceed in the direction −∇f (x) and achieve a smaller function value. More specifically, we have the following
Proposition 14.2.1. Suppose that f: ℝn → ℝ is continuously differentiable at x̄ ∈ ℝn. If there exists a d ∈ ℝn such that ∇ (f (x̄))T d < 0, then there exists an α0 > 0 such that f (x̄ + αd) < f (x̄) for all a ∈ (0, α0). In other words, d is a descent direction of f at x̄.
Using Proposition 14.2.1, we can establish the following:
Corollary 14.2.1. (First Order Necessary Condition for Unconstrained Optimization) Suppose that f : ℝn → ℝ is continuously differentiable at x̄ ∈ ℝn. If x̄ is a local minimum, then we have ▽f (x̄) = 0. In particular, we have = ∅.
Similar to the univariate case, even if x̄ ∈ ℝ0 satisfies ∇f (x̄) = 0, we cannot conclude that x̄ is a local minimum. For instance, consider the function f : ℝ2 → ℝ given by f(x1,x2) = − x21 − (x − x2)
∇f(x) = − 2(2x1 − x2,x2 − x1).
In particular, the (unique) solution to ▽f (x) = 0 is x̄1 = x̄2 =0. However, as can be easily verified, the point (x̄1, x̄2) = (0,0) is a global maximum of f.
The above example shows that some extra conditions are needed in order to guarantee that a solution to the equation ∇f(x) = 0 is a local minimum of f. for instance, we have the following proposition, which states that if f is convex at x̄, then the necessary condition in Corollary 14.2.1 is also sufficient3:
Proposition 14.2.2. Suppose that f: ℝn → ℝ is continuously differentiable and convex at x̄. Then, x̄ is a global minimum iff ∇f (x̄) = 0.
Alternatively, if ∇f (x̄) = 0 and ∇2f (x̄), the Hessian of f at x̄, is positive definite, then x̄ is a local minimum. Specifically, we have the following proposition,
which generalizes the corresponding result for the univariate case (cf. (14.3) and (14.4)).
Proposition 14.2.3. (Second Order Sufficient Condition for Unconstrained Optimization) Suppose that f : ℝn → ℝ is twice continuously differentiable at x̄ ∈ ℝn. If ∇f (x̄) = 0 and ∇2f (x̄) is positive definite, then x̄ is a local minimum.
Let us now illustrate the above results with an example.
Example 14.2.1. Let f : ℝn → ℝ be defined by f(x) = 12xTQx+cTx
14.2.2 Optimality Conditions for Constrained Optimization
After deriving optimality conditions for unconstrained optimization problems, let us turn our attention to constrained optimization problems of the form
(14.5) |
where S is a nonempty subset of ℝn. Note that due to the constraint x ∈ S, even if x̄ ∈ ℝn satisfies ∇f (x̄) = 0 and ∇2f (x̄) ≻ 0, it may not be a solution to (14.5), since x̄ need not lie in 0. Similarly, a local minimum x̄ of f over S need not satisfy ∇f (x̄) = 0, since all the descent directions of f at x̄ may lead to points that do not lie in S. Thus, in order to derive optimality conditions for (14.5), we need to consider not only the set of descent directions at x̄, i.e.,
(14.6) |
but also the set of feasible directions at x̄, i.e.,
F = {d∈ℝn{0}:there exists an α0>0 such that ˉx + αd∈S for all α∈(0,α0)}. |
(14.7) |
We emphasize that in order for d ∈ F, the entire open line segment {x̄ + αd : ∈ (0, α0)} must belong to S. This is to ensure that whenever d ∈ D, one can find a feasible solution x̄′ ∈ S with f (x̄′) < f (x̄) by proceeding from x̄ in the direction d. Indeed, by Proposition 14.2.1, if d ∈ D, then there exists an α1 > 0 such that f (x̄ + αd) < f (x̄) for all α ∈ (0, α1). However, if x̄ + αd ∈ S for any α ∈ (0, α1), then we cannot rule out the local minimality of x̄, even if x̄ + αd ∈ S for some α ̄ α1.
As the following proposition shows, the sets D and F provide a necessary, and under some additional assumptions, even sufficient condition for optimality.
Proposition 14.2.4. Consider Problem (14.5). Suppose that f : ℝn → ℝ is continuously differentiable at x̄ ∈ S. If x̄ is a local minimum, then we have D ∩ F = ∅. Conversely, suppose that (i) D ∩ F = ∅, (ii) f is convex at x̄, and (iii) there exists an ϵ ̄ 0 such that d = x − x̄ ∈ F for any x ∈ S ∩ B°(x̄, ϵ). Then, x̄ is a local minimum of f over S.
REMARKS: Condition (iii) is to ensure that the entire line segment {x̄ + α(x − x̄) : α ∈ [0,1]} lies in S for any x ∈ S ∩ B°(x̄, ϵ), so that d = x − x̄ ∈ F; see the remark after (14.7).
So far we have only discussed optimality conditions for a very general class of optimization problems, i.e., problems of the form (14.5). In particular, we derived a necessary condition for local optimality in terms of the sets D and F, namely that D ∩ F = ∅. However, such a condition is largely geometric, and it is not as easy to manipulate as algebraic conditions (e.g., a system of inequalities). On the other hand, as we will show below, if the feasible region has more structure, then one can circumvent such difficulty and derive algebraic optimality conditions. To begin, let us consider the following class of optimization problems:
minimizef(x)subject togi(x)≤0 for i = 1,…,m,x∈X, |
(14.8) |
where f: ℝn → ℝ and gi : ℝn → ℝ are continuously differentiable functions, and X is a nonempty open subset of ℝn (usually we take X = ℝn). We then have the following:
Proposition 14.2.5. Let S = {x ∈ X : gi(x) ≤ 0 for i = 1,…, m} be the feasible region of problem (14.8), and let x̄ ∈ S. Define
I = {i∈{1,…,m}:gi(ˉx) = 0}
to be the index set for the active or binding constraints. Furthermore, define
G = {d∈ℝn:∇gi(ˉx)Td<0 for i∈I},ˉG = {d∈ℝn{0}:∇gi(ˉx)Td<0 for i∈I}. |
(14.9) |
Then, we have G ∈ F ∈ G̅, where F is defined in (14.7). Moreover, if the functions gi, where i ∈ I, are strictly convex (resp. concave) at x̄, then F = ∈ (resp. F = G̅).
Using Proposition 14.2.4 and Proposition 14.2.5, we can establish the following geometric optimality condition for (14.8):
Corollary 14.2.2. Let S be the feasible region of problem (14.8). Let x̄ ∈ S, and define I = {i ∈ {1,…, m} : gi(x̄) = 0}. If x̄ is a local minimum, then D ∩ G = ∅, where D is defined in (14.6) and G is defined in (14.9).
The intuition behind Corollary 14.2.2 is quite straightforward. Indeed, suppose that d ∈ D ∩ G. Then, by Proposition 14.2.1, there exists an α0 > 0 such that f (x̄ + αd) < f (x̄) and gi(x̄ + αd) gi(x̄) = 0 for all i ∈ I and α ∈ (0, α0). Moreover, by the continuity of the functions g1,…, gm, for sufficiently small α > 0, we have gi(x̄ + αd) > 0 for all i ∈ I. It follows that there exists an α1 > 0 such that x̄ + αd ∈ S and f (x̄ + αd) < f (x̄) for all α ∈ (0, α1). In other words, x is not a local minimum.
The upshot of Corollary 14.2.2 is that it allows us to derive optimality conditions for (14.8) that is more algebraic in nature. Specifically, Corollary 14.2.2, together with Farkas’ lemma, yields the following:
Theorem 14.2.1. (Karush–Kuhn–Tucker Necessary Conditions) Let x̄ ∈ S be a local minimum of problem (14.8), and let I = {i G m} : gi(x̄) = 0} be the index set for the active constraints. Suppose that the family {∇gi(x̄)}i∈I of vectors is linearly independent. Then, there exist ū1,…, ūm ∈ ℝ such that
∇f(ˉx) + m∑i = 1ˉui∇gi(ˉx) = 0,ˉuigi(ˉx) = 0for i = 1,…,m,ˉui≥0for i = 1,….m. |
(14.10) |
We say that x̄ ∈ ℝn is a KKT point if (i) x̄ ∈ S and (ii) there exist Lagrange multipliers ū1,…, ūm such that (x̄, ū1,…, ūm) satisfies the system (14.10).
Note that if the gradient vectors of the active constraints are not linearly independent, then the KKT conditions are not necessary for local optimality, even when the optimization problem is convex. This is demonstrated in the following example.
Example 14.2.2. Consider the following optimization problem:
minimizex1subject to(x1 − 1)2 + (x2 − 1)2≤1,(x1 − 1)2 + (x2 + 1)2≤1. |
(14.11) |
Since there is only one feasible solution (i.e., (x1, x2) = (1, 0)), it is naturally optimal. Besides the primal feasibility condition, the KKT conditions of (14.11) are given by
[10] + 2u1[x1 − 1x2 − 1] + 2u2[x1 − 1x2 + 1] = 0,u1((x1 − 1)2 + (x2 − 1)2 − 1) = 0,u2((x1 − 1)2 + (x2 + 1)2 − 1) = 0.u1,u2≥0
However, it is clear that there is no solution (ul,u2) ≥ 0 to the above system when (x1, x2) = (1,0).
□
Let us now illustrate Theorem 14.2.1 with an example.
Example 14.2.3. (Optimization of a Matrix Function) Let A ≻ 0 and b > 0 be given. Consider the following problem:
minimize − log det(Z)subject totr(AZ)≤b,Z≻0. |
(14.12) |
Note that (14.12) is of the form (14.8), since we may write (14.12) as
minimize − log det(Z)subject totr(AZ)≤b,Z∈Sn + + ,
and Sn + + ∈ℝn(n + 2)/2
∇log det (X) = X − 1, ∇tr(AX) = A;
see, e.g., [6]. Hence, the KKT conditions associated with (14.12) are given by
tr(AZ) ≤ b, Z ≻ 0, (a) − Z − 1 + uA = 0, u ≥ 0, (b)u(tr(AZ) − b) = 0, (c)
Condition (a) is simply primal feasibility. Condition (c) is known as complementarity. As we shall see later, condition (b) can be interpreted as feasibility with respect to a certain dual of (14.12).
□
Note that Theorem 14.2.1 applies only to inequality-constrained optimization problems of the form (14.8). However, by extending the geometric arguments used to prove Corollary 14.2.2, one can establish similar necessary optimality conditions for optimization problems of the form
minimizef(x)subject togi(x)≤0for i = 1,…,m1hj(x) = 0for j = 1,…,m2x∈X, |
(14.13) |
where f,g1,…, gm1,, h1,…, hm2 : ℝn → ℝ are continuously differentiable functions, and X is a nonempty open subset of ℝn. Specifically, we have the following:
Theorem 14.2.2. (Karush–Kuhn–Tucker Necessary Conditions) Let S be the feasible region of Problem (14.13). Suppose that x̄ ∈ S is a local minimum of problem (14.13), with I = {i ∈ {1,…,m1} : gi (x̄) = 0} being the index set for the active constraints. Furthermore, suppose that x̄ is regular, i.e., the family {∇gi(ˉx)}i∈I∪{∇hj(ˉx)}m2j = 1 of vectors is linearly independent. Then, there exist ˉυ1,⋯,ˉυm1∈ℝ and ˉw1,⋯,ˉwm2k∈ℝ such that
∇f(ˉx) + m1∑i = 1ˉvi∇gi(ˉx) + m2∑j = 1ˉwi∇hj(ˉx) = 0,ˉvigi(ˉx) = 0for i = 1,…,m1,ˉvi≥0for i = 1,…,m1. |
(14.14) |
As demonstrated in Exercise 14.2.2, the linear independence of the gradient vectors of the active constraints is generally needed to guarantee the existence of Lagrange multipliers. However, such a regularity condition is not always easy to check. As it turns out, there are other forms of regularity conditions, a more well-known of which is the following:
Theorem 14.2.3. Suppose that in Problem (14.13), the functions g1,…,gm1 are convex and h1,…,hm2 are linear. Let x̄ ∈ S be a local minimum, and let I = {i ∈ {1,…,m1} : gi(x̄) = 0}. If the Slater condition is satisfied, i.e., if there exists an x′ ∈ S such that gi(x′) < 0 for all i ∈ I, then x̄ satisfies the KKT conditions (14.14).
Another setting in which the existence of Lagrange multipliers is guaranteed is the following:
Theorem 14.2.4. Suppose that in Problem (14.13), the functions g1,…,gm1 are concave and h1,…,hm2 are linear. Let x̄ ∈ S be a local minimum. Then, x̄ satisfies the KKT conditions (14.14).
In particular, Theorem 14.2.4 implies that when all the constraints in problem (14.13) are linear, one can always find Lagrange multipliers for any local minimum of problem (14.13).
So far we have only discussed necessary optimality conditions for constrained optimization problems. Let us now turn our attention to sufficient conditions. The following theorem can be viewed as an extension of the first order sufficient condition in Proposition 14.2.2 to the constrained setting.
Theorem 14.2.5. Suppose that in Problem (14.13), the functions g1,…,gm1 are convex, h1,…,hm2 are linear, and X = ℝn. Let x̄ ∈ ℝn be feasible for (14.13). If there exist vectors v̄ ∈ Rmi and ˉw∈ℝm2 such that (x̄, v̄, w̄) satisfies the KKT conditions (14.14), then x̄ is a global minimum.
To demonstrate the usage of the above results, let us consider the following example:
Example 14.2.4. (Linear Programming) Consider the standard form linear programming (LP):
minimizef(x)≡cTxsubject tohj(x)≡ajTx − bj = 0for j = 1,…,m,gi(x)≡ − xi≤0for i = 1,…,n, |
(14.15) |
where a1,…, am, c ∈ ℝn and b1,…, bm ∈ ℝ. Since
∇f(x) = c,∇gi(x) = − eifor i = 1,…,n,∇hi(x) = ajfor j = 1,…,m,
the KKT conditions associated with (14.15) are given by
c − n∑i = 1viei + m∑j = 1wjaj = 0,xivi = 0for i = 1,…,n,vi≥0for i = 1,…,n,ajTx = bfor j = 1,…,m,xi≥0for i = 1,…,n.
The above system can be written more compactly as follows:
Ax = b,x≥0,(a)ATw + c = v,v≥0,(b)xTv = 0,(c)
where A is an m × n matrix whose j-th row is aj, where j = 1,…, m. Readers who are familiar with the theory of linear programming will immediately recognize that (a) is primal feasibility, (b) is dual feasibility, and (c) is complementarity. In particular, when we apply Theorem 14.2.4 to Problem (14.15), we obtain the strong duality theorem of linear programming.
Given an optimization problem P (the primal problem), we can associate with it a dual problem whose properties are closely related to those of P. To begin our investigation, consider the following primal problem:
v∗p = inff(x)subject togi(x)≤0for i = 1,…,m1,(P)hj(x) = 0for j = 1,…,m2,x∈X.
Here, f, g1,⋯gm1, h1,⋯hm2:ℝn→ℝ are arbitrary functions, and X is an arbitrary nonempty subset of ℝn. For the sake of brevity, we shall write the first two sets of constraints in (P) as g(x) ≤ 0 and h(x) = 0, where g:ℝn→ℝm1 is given by g(x) = (g1(x),⋯,gm1(x)) and h:ℝn→ℝm2 is given by (h1(x),⋯,hm2(x)).
Now, the Lagrangian dual problem associated with (P) is the following problem:
v∗d = supθ(u,v)≡infx∈XL(x,u,v)(D)subject tou≥0.
Here, L:ℝn × ℝm1 × ℝm2→ℝ is the Lagrangian function given by
L(x,u,v) = f(x) + m∑i = 1uigi(x) + m2∑j = 1uihi(x) = f(x) + uTg(x) + vTh(x). |
(14.16) |
Observe that the above formulation can be viewed as a penalty function approach, in the sense that we incorporate the primal constraints g(x) ≤ 0 and h(x) = 0 into the objective function of (D) using the Lagrange multipliers u and v. Also, since the set X is arbitrary, there can be many different Lagrangian dual problems for the same primal problem, depending on which constraints are handled as g(x) ≤ 0 and h(x) = 0, and which constraints are treated by X. However, different choices of the Lagrangian dual problem will in general lead to different outcomes, both in terms of the dual optimal value as well as the computational efforts required to solve the dual problem.
Let us now investigate the relationship between (P) and (D). For any x̄ ∈ X and (ˉu,ˉv)∈ℝm1 + × ℝm2, we have
infx∈XL(x,ˉu,ˉv)≤f(ˉx) + ˉuTg(ˉx) + ˉvTh(ˉx)≤supu≥0L(ˉx,u,v).
This implies that
(14.17) |
In particular, we have the following weak duality theorem, which asserts that the dual objective value is always a lower bound on the primal objective value:
Theorem 14.2.6. (Weak Duality) Let x̄ be feasible for (P) and (ū, v̄) be feasible for (D). Then, we have θ(ū, v̄) ≤ f (x̄). In particular, if υ*d=+∞, then (P) has no feasible solution.
Given the primal–dual pair of problems (P) and (D), the duality gap between them is defined as Δ=υ*p−υ*d. By Theorem 14.2.6, we always have Δ ≥ 0. It would be nice to have Δ = 0 (i.e., zero duality gap). However, as the following example shows, this is not true in general.
Example 14.2.5. Consider the following problem from [1, Example 6.2.2]:
(14.18) |
where X ∈ ℝ2 is the following discrete set:
X = {(0,0),(0,4),(4,4)(4,0),(1,2),(2,1)}.
By enumeration, we see that the optimal value of (14.18) is −3, attained at the point (x1, x2) = (2,1). Now, one can verify that the Lagrangian function is given by
θ(v) = minx∈X{ − 2x1 + x2 + v(x1 + x2 − 3)} = { − 4 + 5vfor v≤ − 1, − 8 + vfor − 1≤v≤2, − 3vfor v≥2.
It follows that maxv θ(ν) = −6, which is attained at v = 2. Note that the duality gap in this example is Δ = −3 − (−6) = 3 > 0.
The above example raises the important question of when the duality gap is zero. It turns out that there is a relatively simple answer to this question. Before we proceed, let us introduce the following definition:
Definition 14.2.1. We say that (x̄, ū, v̄) is a saddle point of the Lagrangian function L defined in (14.16) if the following conditions are satisfied:
1. x̄ ∈ X,
2. ū ≥ 0, and
3. for all x ∈ X and (u,v)∈ℝm1 × ℝm2, we have
L(ˉx,u,v)≤L(ˉx,ˉu,ˉv)≤L(x,ˉu,ˉv).
In particular, observe that (x̄, ū, v̄) is a saddle point of L if x̄ minimizes L over X when (u, v) is fixed at (ū, v̄), and that (ū, v̄) maximizes L over all ℝm1 × ℝm2 with u ≥ 0 when x is fixed at x̄.
We are now ready to state the following theorem:
Theorem 14.2.7. (Saddle Point Optimality Conditions) The point (x̄, ū, v̄) with x̄ ∈ X and ū ≥ 0 is a saddle point of L iff
1. L(x̄, ux̄, vx̄) = minx∈X L(X, ū, v̄),
2. g(x̄) ≤ 0 and h(x̄) = 0, and
3. ūTg(x̄) = 0.
Moreover, the point (x̄, ū, v̄) is a saddle point of L iff x̄ and (ū, v̄) are the optimal solutions to (P) and (D), respectively, with f (x̄) = θ(ū, v), i.e., there is no duality gap.
In other words, the existence of a saddle point (x̄, ū, v̄) of L implies that
infx∈XL(x,ˉu,ˉv) = L(ˉx,ˉu,ˉv) = supu≥0L(ˉx,u,v),
which in turn implies that
supu≥0infx∈XL(x,u,v) = infx∈Xsupu≥0L(x,u,v),
i.e., inequality (14.17) holds with equality, and υ*p=υ*d.
Now, if we want to apply Theorem 14.2.7 to certify that the duality gap between (P) and (D) is zero, we need to produce a saddle point of the Lagrangian function L, which is not always an easy task. The following theorem, which is an application of Sion’s minimax theorem [7] (see [8] for an elementary proof), provides an easy-to-check sufficient condition for certifying zero duality gap.
Theorem 14.2.8. Let L be the Lagrangian function defined in (14.16). Suppose that
1. X is a compact convex subset of ℝn,
2. (u, v) ↦ L (x, u, v) is continuous and concave on ℝm1 + × ℝm2 for each x ∈ X, and
3. x ↦ L (x, u, v) is continuous and convex on X for each (u,v)∈ℝm1 + × ℝm2.
Then we have.
supu≥0infx∈XL(x,u,v) = infx∈Xsupu≥0L(x,u,v).
Let us now illustrate some of the above results with an example.
Example 14.2.6. (Semidefinite Programming) Consider the following standard form semidefinite programming (SDP):
inff(Z)≡tr(CZ),subject tohi(Z)≡bj − tr(AjZ) = 0 for j = 1,…,m,Z∈X≡Sn + , |
(14.19) |
where C, A1,…, Am ∈ ℝn × n are symmetric matrices, b1,…, bm ∈ ℝ and Sn + is the set of n × n symmetric positive semidefinite matrices. The Lagrangian dual associated with (14.19) is given by
(14.20) |
Now, for any fixed v ∈ ℝm, we have
(14.21) |
To see this, let UΛUT be the spectral decomposition of C − ∑mj = 1υjAj, and suppose that Λii < 0 for some i = 1,…, n. Consider the matrix Z(α) = αUeieiTU. Clearly, we have Z(α)∈Sn + for all α > 0. Moreover, as α → ∞, we have
tr((C − m∑j = 1vjAj)Z(α)) = α⋅tr((UΛUT)(UeieiTUT)) = α⋅tr(ΛeieiT) = αΛii→ − ∞,
whence
θ(v) = bTv + infZ∈Sn + tr((C − m∑j = 1vjAj)Z) = − ∞.
On the other hand, if C − ∑mj − 1υjAj∈Sn + , then we have tr ((C − ∑mj = 1υjAj)Z)≥0 for any Z∈Sn + . It follows that θ(ν) = bTv in this case (by taking, say, Z = 0).
Now, using (14.21), we see that (14.20) is equivalent to
(14.22) |
which is known as a dual standard form SDP.
In the past decade optimization techniques, especially convex optimization techniques, have been widely used in various engineering fields such as industrial engineering, mechanical engineering, and electrical engineering. For electrical engineering in particular, optimization techniques have been applied to solve problems in communications [9, 10, 11, 12, 13, 14], networking [15, 16, 17, 18, 19], signal processing [20, 22], and even circuit design [23]. In this section, we briefly go through several examples in communications, networking, and signal processing to illustrate how we could apply the results introduced in the previous section to solve real-world problems.
Example 14.3.1. (Power Allocation Optimization in Parallel AWGN Channels) Consider the transmission over n parallel AWGN channels. The ith channel, i ∈ {1,…, n}, is characterized by the channel power gain, hi ≥ 0, and the additive Gaussian noise power, σi > 0. Let the transmit power allocated to the ith channel be denoted by pi ≥ 0. The maximum information rate that can be reliably transmitted over the ith channel is given by [24]
(14.23) |
Given a constraint P on the total transmit power over n channels, i.e., ∑ni = 1pi≤P, we want to optimize the allocated power p1,…,pn such that the sum rate of n channels, ∑ni = 1ri, is maximized. This problem is thus formulated as
maximize∑ni = 1log(1 + hipiσi)subject to∑ni = 1pi≤P,pi≥0 for i = 1,…,n. |
(14.24) |
For convenience, we rewrite the above problem equivalently as
maximizef(p)≡ − ∑ni = 1log(1 + hipiσi)subject toh(p)≡∑ni = 1pi − P≤0,gi(p)≡ − pi≤0 for i = 1,…,n,p∈ℝn, |
(14.25) |
where p = [p1,…,pn]T. It is easy to verify that f(p) is convex, and h(p), g1(p),…,gn(p) are all affine and thus convex. According to Theorem 14.2.5, if we can find a set of feasible solutions p̄ = [p̄1,…, p̄n] ∈ ℝn for the above constrained minimization problem as well as a set of u ≥ 0 and vi ≥ 0, i = 1,…, n such that the following KKT conditions are satisfied,
∇f(ˉp) + u∇h(ˉp) + n∑i = 1vi∇gi(ˉp) = 0,(a)uh(ˉp) = 0,(b)vigi(ˉp) = 0,for i = 1,…,n,(c) |
(14.26) |
then we can claim that p̄ is a global minimum for this problem. Suppose that u > 0. From (b), it follows that h(p̄) = 0, i.e., ∑ni = 1ˉpi = P. From (a), it follows that
(14.27) |
Suppose that p̄i > 0. From (c), it follows that vi = 0. Then from (14.27), it follows that ˉpi = 1u − σihi>0. Clearly, if this inequality holds, the corresponding p̄i will satisfy both (a) and (c). Otherwise, the preassumption of p̄i > 0 cannot be true and the only feasible value for p̄i is p̄i = 0. In this case, since 1u − σihi≤0, we can always find a vi ≥ 0 such that p̄i = 0 holds in (14.27). To summarize, for any u > 0, the set of feasible values for p̄i that satisfy both (a) and (c) are given by
(14.28) |
where (x)+ = max(0, x) for x ∈ ℝ. Furthermore, recall that this set of p̄i’s needs to satisfy ∑ni = 1ˉpi = P, i.e.,
(14.29) |
Note that for any P > 0, in the above equation there exists a unique positive root of u (which can be found numerically by a simple bisection search over the interval 0 < u <maxi(hi/σi)). With the root of u, the corresponding p̄i’s given in (14.28) satisfy all the KKT conditions in (a), (b), and (c), and are thus the global optimal solutions for Problem (14.25). It is worth noting that the structure for the optimal power allocation in (14.28) is known as the “water-filling” solution [24].
Example 14.3.2. (Transmit Optimization for MIMO AWGN Channels with Per-Antenna Power Constraints) Consider the transmission over a MIMO AWGN channel with n transmitting antennas and m receiving antennas. The propagation channel from the transmitter to the receiver is represented by a real matrix, H ∈ ℝm×n, in which all the columns are assumed to be “non-empty”, i.e., there is at least one element in each column being non-zero. The additive noises at m receiving antennas are assumed to be i.i.d. Gaussian random variables with zero mean and unit variance. The transmit signals from the ith antenna, i ∈ {1,…, n}, are denoted by xi(t) ∈ ℝ, t = 0, 1,…, which are subject to a per-antenna average power constraint Pi, i.e., E {(xi(t))2 } ≤ Pi, where E {·} denotes the expectation. Let Z∈Sn + denote the transmit covariance matrix, i.e., Z = E {x(t) (x(t))T}, where x(t) = [x1(t),…,xn(t)]T. The set of per-antenna transmit power constraints can then be expressed as
(14.30) |
where Ai ∈ ℝn×n is a matrix with all zero elements expect for the ith diagonal element being one.
For any transmit covariance matrix Z∈Sn + , the maximum transmit rate over the MIMO AWGN channel is given by [25]
(14.31) |
where I denotes an identity matrix. The problem of our interest here is to maximize the rate r over Z∈Sn + subject to the set of per-antenna transmit power constraints, which can be equivalently formulated as
v∗p = minimizef(Z)≡ − log det(I + HZHT)subject togi(Z)≡tr(AiZ) − Pi≤0 for i = 1,…,n,Z∈Sn + . |
(14.32) |
In the following, we apply the Lagrangian duality to solve the above problem. The Lagrangian function for this problem is given by
L(Z,u) = f(Z) + n∑i = 1uigi(Z) = − log det(I + HZHT) + n∑i = 1ui(tr(AiZ) − Pi), |
(14.33) |
where u = [u1,⋯,un]T∈ℝn + . The Lagrangian dual problem associated with problem (14.32) is then given by
(14.34) |
It can be verified that the conditions listed in Theorem 14.2.8 are all satisfied for the Lagrangian function L(Z, u) given in (14.33). We thus conclude that v*p = v*d, i.e., the duality gap for Problem (14.32) is zero. Accordingly, we can solve this problem equivalently by solving its dual problem (14.34), as shown next.
First, we solve the minimization problem in (14.34) to obtain the dual function θ(u) for any given u ≥ 0. Observe that θ(u) can be explicitly written as
θ(u) = minZ∈Sn + − log det(I + HZHT) + tr(AuZ) − n∑i = 1uiPi |
(14.35) |
where Au = ∑ni = 1uiAi is a diagonal matrix with the ith diagonal element equal to ui, i = 1,…,n. Note that for the minimization problem in the above, the optimal solution for Z is independent of the term ∑ni = 1uiPi, which thus can be ignored. To solve this minimization problem, we first observe that if any diagonal element in Au, say, ui, i ∈ {1,…,n}, is equal to zero, then the minimum value for this problem becomes −∞, which is attained by, e.g., taking Z = α1i1Ti, where 1i denotes an n × 1 vector with all zero elements except for the ith element being one, and letting α → ∞. Next, we consider the case where all ui’s are greater than zero. In this case, Au is full-rank and thus its inverse exists. By defining a new variable ˉZ = A1/2uZA1/2u∈Sn + and using the fact that tr(AB) = tr(BA), the minimization problem in (14.35) can be rewritten as
(14.36) |
Let the SVD of HA − 1/2u be denoted by
(14.37) |
where U ∈ ℝm × m and V ∈ ℝn × n are unitary matrices, and Λ ∈ ℝm×n is a diagonal matrix with the diagonal elements being denoted by λ1,…,λk, k = min(m, n), and λi ≥ 0, i = 1,…, k. Substituting (14.37) into (14.36) and using the fact that log det (I + AB) = log det (I + BA) yield
(14.38) |
By letting Ẑ = VTZ̄V and using the fact that tr (Z̄)→ = tr (Z̄), we obtain an equivalent problem of (14.38) as
(14.39) |
Recall the Hadamard’s inequality [24], which states that for any X∈Sm + , det (X)≤∏mi=1Xii, iff X is a diagonal matrix, where Xii denotes the ith diagonal element of X, i = 1,…, m. Applying this result to Problem (14.39), it follows that the minimum value for this problem is attained iff Ẑ is a diagonal matrix. Let the diagonal elements of Ẑ be denoted by p1,…,pn. Since ˆZ∈Sn + , Problem (14.39) can be simplified as
minimize − ∑ni = 1log(1 + λ2ipi) + ∑ni = 1pisubject topi≥0 for i = 1,…,n. |
(14.40) |
Note that in the above problem, for convenience we have assumed that λi = 0, for i = k + 1,…, n. Similar to Exercise 14.3.1, the global minimum for the above problem can be shown to be the following water-filling solution:
(14.41) |
To summarize, for any given u > 0, the optimal solution for the minimization problem in (14.35) is given by
(14.42) |
where ˆZ is a diagonal matrix with the diagonal elements given in (14.41). Moreover, the dual function θ(u) in (14.35) can be simplified to be
θ(u){ − ∑ki = 1(log(λ2i)) + + ∑ki = 1(1 − 1λ2i) + − ∑ni = 1uiPi − ∞if u>0otherwise, |
(14.43) |
where λ1,…, λk are related to u via (14.37).
Next, we solve the dual problem (14.34) by maximizing the dual function θ(u) in (14.43) over u ≥ 0. The corresponding dual optimal solution of u then leads to the optimal solution of Zu in (14.42) for the primal problem (14.32). Since v*d = v*p ≥ 0, in fact we only need to consider the maximization of θ(u) over u > 0 in (14.43). However, due to the coupled structure of λi’s and ui’s shown in (14.37), it is not evident whether θ(u) in (14.43) is differentiable over ui’s for u > 0. As a result, conventional decent methods to find the global minimum for differentiable convex functions such as Newton’s method are ineffective for our problem at hand. Thus, we resort to an alternative method, known as subgradient based method, to handle the non-differentiable function θ(u). First, we introduce the definition of subgradient for an arbitrary real-valued function z(x) defined over a nonempty convex set S ∈ ℝn. We assume that z(x) has a finite maximum. However, z(x) need not be continuously differentiable nor have an analytical expression for its differential. In this case, a vector v ∈ ℝn is called the subgradient of z(x) at point x = x0 if for any x ∈ S, the following inequality holds:
(14.44) |
If at any point x ∈ S a corresponding subgradient v for z(x) is attainable, then the maximum of z(x) can be found via an iterative search over x ∈ S based on v (see, e.g., the ellipsoid method [26]). Since 0(u) is defined over a convex set u > 0 and has a finite maximum, the dual problem (14.34) can thus be solved by a subgradient based method. Next, we show that the subgradient of θ(u) at any point u > 0 is given by [tr (A1Zu) − P1,…, tr (AnZu) − Pn]T, where Zu is given in (14.42). Suppose that at any two points u > 0 and u′ > 0, θ(u) and θ(u′) are attained by Z = Zu and Z = Z′u, respectively. Then, we have the following inequalities:
θ(u′) = L(Z′u,u′) = minz∈Sn + L(Z,u′) ≤L(Zu,u′) = − log det(I + HZuHT) + [tr(A1Zu) − P1,…,tr(AnZu) − Pn]u′ = − log det(I + HZuHT) + [tr(A1Zu) − P1,…,tr(AnZu) − Pn]u + [tr(A1Zu) − P1,…,tr(AnZu) − Pn](u′ − u) = L(Zu,u)[tr(A1Zu) − P1,…,tr(AnZu) − Pn](u′ − u) = θ(u)[tr(A1Zu) − P1,…,tr(AnZu) − Pn](u′ − u),
from which the subgradient of θ(u) follows.
Last, we can verify that the optimal primal and dual solutions, Zu given in (14.42) and the corresponding u > 0 satisfy (a) of the following KKT conditions:
∇f(Zu) + n∑i = 1ui∇gi(Zu) = 0, (a)uigi(Zu) = 0 for i = 1,…,n, (b) |
(14.45) |
while since u > 0, from (b) it follows that gi(Zu) = 0, i.e., tr(Ai Zu) = Pi must hold for i = 1,…,n. Thus, all transmit antennas should transmit with their maximum power levels with the optimal transmit covariance matrix Zu, which is consistent with the observation that the subgradient of the dual function θ(u) at the optimal dual solution of u should vanish to 0.
Example 14.3.3. (Power Efficient Beamforming in Two-Way Relay Network via SDP Relaxation) In this example, we illustrate how an originally nonconvex problem can be solved via convex techniques. As shown in Figure 14.1, we consider a two-way relay channel (TWRC) consisting of two source nodes, S1 and S2, each with a single antenna and a relay node, R, equipped with M antennas, M ≥ 2. It is assumed that the transmission protocol of TWRC uses two consecutive equal-duration time slots for one round of information exchange between S1 and S2 via R. During the first time slot, both S1 and S2 transmit concurrently to R, which linearly processes the received signal and then broadcasts the resulting signal to S1 and S2 during the second time slot. It is also assumed that perfect synchronization has been established among S1, S2, and R prior to data transmission. The received baseband signal at R in the first time slot is expressed as
(14.46) |
where yR(n) ∈ ℂM is the received signal vector at symbol index n, n = 1,…,N, with N denoting the total number of transmitted symbols during one time slot; h1 ∈ ℂM and h2 ∈ ℂM represent the channel vectors from S1 to R and from S2 to R, respectively, which are assumed to be constant during the two time slots; s1(n) and s2(n) are the transmitted symbols from S1 and S2, respectively, with E {|s1(n)|} = 1, E {|s2(n)|} = 1, and |·| denoting the absolute value for a complex number; p1 and p2 denote the transmit powers of S1 and S2, respectively; and zR(n) ∈ ℂM is the receiver noise vector, independent over n, and without loss of generality, it is assumed that zR(n) has a circular symmetric complex Gaussian (CSCG) distribution with zero mean and identity covariance matrix, denoted by zR(n) ~ CN(0, I), ∀n. Upon receiving the mixed signal from S1 and S2, R processes it with amplify-and-forward (AF) relay operation, also known as linear analogue relaying, and then broadcasts the processed signal to S1 and S2 during the second time slot. Mathematically, the linear processing (beamforming) operation at the relay can be concisely represented as
(14.47) |
where xR(n) ∈ ℂM is the transmitted signal at R, and A ∈ ℂM × M is the relay processing matrix.
Note that the transmit power of R can be shown equal to
pR(A) = E[tr(xR(n)xHR(n))] = ‖Ah1‖22p1 + ‖Ah2‖22p2 + tr(AAH). |
(14.48) |
We can assume w.l.o.g. that channel reciprocity holds for TWRC during uplink and downlink transmissions, i.e., the channels from R to S1 and S2 during the second time slot are given as hT1 and hT2, respectively. Thus, the received signals at S1 can be written as
y1(n) = hT1xR(n) + z1(n) = hT1Ah1√p1s1(n) + hT1Ah2√p2s2(n) + hT1AzR(n) + z1(n) |
(14.49) |
for n = 1,…, N, where z1(n)’s are the independent receiver noise samples at S1, and it is assumed that z1(n) ~ CN(0, 1), ∀n. Note that on the right-hand side of (14.49), the first term is the self-interference of S1, while the second term contains the desired message from S2. Assuming that both hT1Ah1 and hT1Ah2 are perfectly known at S1 via training-based channel estimation prior to data transmission, S1 can first subtract its self-interference from y1(n) and then coherently demodulate s2(n). The above practice is known as analogue network coding (ANC). From (14.49), subtracting the self-interference from y1(n) yields
(14.50) |
where ˜h21 = hT1Ah2, and ˜z1(n)~CN(0,‖AHh*1‖22 + 1), where * denotes the complex conjugate. From (14.50), for a given A, the maximum achievable SNR for the end-to-end link from S2 to S1 via R, denoted by γ21, is given as
(14.51) |
Similarly, it can be shown that the maximum SNR ϓ12 for the link from S1 to S2 via R is given as
(14.52) |
Now we minimize the relay transmission power given in (14.48), under the constraints that the achievable SNRs ϓ21 and ϓ12 over the two directions are above two target values, ϓ̄1 and ϓ̄2. As such, the optimization can be formulated as
minimizeApR: = ‖Ah1‖22p1 + ‖Ah2‖22p2 + tr(AAH)subject to|hT1Ah2|2≥ˉγ1p2‖AHh∗1‖22 + ˉγ1p2|hT2Ah1|2≥ˉγ2p1‖AHh∗2‖22 + ˉγ2p1, |
(14.53) |
For the convenience of analysis, we further modify the above problem as follows. First, let Vec(Q) be a K2 x 1 vector associated with a K × K square matrix Q = [q1,…, qK ]T, where qk ∈ ℂK ,k = 1,…, K, by the rule of Vec(Q) = [qT1,⋯,qTK]T. Next, with b = Vec(A) and Θ = p1h1hH1 + p2h2hH2 + I, we can express pR in the objective function of (14.53) as pR = tr(AΘAH) = ‖Φb‖22, with Φ = (diag(ΘT,ΘT))12, where diag(A, B) denotes a block-diagonal matrix with A and B as the diagonal square matrices. Similarly, let f1 = Vec(h1hT2) and f2 = Vec(h2hT1). Then, from (14.53) it follows that |hT1Ah2|2 = |fT1b|2 and |hT2Ah1|2 = |fT2b|2. Furthermore, by defining
hi = [hi(1,1)0hi(2,1)00hi(1,1)0hi(2,1)],i = 1,2,
we have ‖AHh*i‖22 = ‖hib‖22,, i=1,2. Using the above transformations, (14.53) can be rewritten as
minimizebpR: = ‖Φb‖22subject to|fT1b|2≥ˉγ1p2‖h1b‖22 + ˉγ1p2|fT2b|2≥ˉγ2p1‖h2b‖22 + ˉγ2p1. |
(14.54) |
The above problem can be shown to be still nonconvex. However, in the following, we show that the exact optimal solution could be obtained via a relaxed SDP problem.
We first define E0 = ΦHΦ, E1 = p2˜γ1fT1 − hH1h1 and E2 = p2˜γ2f*2fT2 − hH2h2. Since standard semidefinite programming (SDP) formulations only involve real variables and constants, we introduce a new real matrix variable as X = [bR; bI] × [bR; bI]T, where bR = Re(b) and bI = Im (b) are the real and imaginary parts of b, respectively. To rewrite the norm representations at (14.54) in terms of X, we need to rewrite E0, E1, and E2, as expanded matrices F0, F1, and F2, respectively, in terms of their real and imaginary parts. Specifically, to write out F0, we first define the short notations ΦR = Re(Φ) and ΦI = Im(Φ); then we have
F0 = [ΦTRΦR + ΦTIΦIΦTIΦR − ΦTRΦIΦTRΦI − ΦTIΦRΦTRΦR + ΦTIΦI].
The expanded matrices F1 and F2 can be generated from E1 and E2 in a similar way, where the two terms in E1 or E2 could first be expanded separately then summed together.
As such, problem (14.54) can be equivalently rewritten as
minimizeXpR: = tr(F0X)subject totr(F1X)≥1,tr(F2X)≥1, X≽0,rank(X) = 1. |
(14.55) |
The above problem is still not convex given the last rank-one constraint. However, if we remove such a constraint, this problem is relaxed into a convex SDP problem as shown below.
(14.56) |
Given the convexity of the above SDP problem, the optimal solution could be efficiently found by various convex optimization methods. Note that SDP relaxation usually leads to an optimal X for problem (14.56) that is of rank r with r ≥ 1, which makes it impossible to reconstruct the exact optimal solution for Problem (14.54) when r > 1. A commonly adopted method in the literature to obtain a feasible rank-one (but in general suboptimal) solution from the solution of SDP relaxation is via “randomization” (see, e.g., [27] and references therein). Fortunately, we show in the following that with the special structure in Problem (14.56), we could efficiently reconstruct an optimal rank-one solution from its optimal solution that could be of rank r with r > 1, based on some elegant results derived for SDP relaxation in [28]. In other words, we could obtain the exact optimal solution for the nonconvex problem in (14.55) without losing any optimality, and as efficiently as solving a convex problem.
Theorem 14.3.1. Assume that an optimal solution X* of rank r > 1 has been found for Problem (14.56), we could efficiently construct another feasible optimal solution X** of rank one, i.e., X** is the optimal solution for both (14.55) and (14.56).
Proof: Please refer to [21].
Note that the above proof is self-constructive, based on which we could easily obtain a routine to obtain an optimal rank-one solution for Problem (14.55) from X*. Then we could map the solution back to obtain an optimal solution for the problem in (14.53).
Exercise 14.4.1. Please indicate whether the following sets are convex or not.
1. {x:aTx − bcTx + d≤1;cTx+d<0} ;
2. {X : Ax = b, ‖x‖2 = 1};
3. {X : X11a0 + X22a1 ⪰ 0, a0 ∈ Sn, a1 ∈ Sn}; (Xij stands for the ijth element in matrix X)
4. {X : aTXa = 1}
Exercise 14.4.2. Please indicate whether the following functions are convex or concave or neither.
1. f(x) = supw{log∑ni = 1exiw};
2. f (x) = −(x1x2x3)1/3, x > 0;
3. f (X) = lo gdet (ATXA) ; X ≻ 0;
4. f(x) = xTAx+2x − 5,A = [0110].
Exercise 14.4.3. With the following problem formulation, answer the followup questions.
minimize − (x1 + x2)subject to‖a1x‖2≤1,‖a2x − b2‖2≤1,where x = [x1,x2]T,a1 = a2 = [1001],and b2 = [1, 0]T.
where f(x) = xTAx + 2x − 5,A = [0110], and b2 = [1,0]T.
1. Is this problem convex?
2. Does Slater’s constraint condition hold?
3. What is the optimal solution for this problem? (Hint: Try to solve this problem graphically if the KKT conditions are hard to solve.)
4. What is the optimal objective value for the dual problem?
5. What is the optimal value for the dual variable associated with the second constraint?
Exercise 14.4.4. Given the optimization problem shown in Exercise 14.4.3, please reformulate it as a semidefinite programming (SDP) problem, then derive the dual problem of the resulting SDP problem.
Exercise 14.4.5. With the following optimization problem, answer the followup questions.
maximizePn∑i = 1log(1 + Piδi)subject ton∑i = 1Pi = Ptotal,P≥0,
where P = [P1,…, Pn]T, and δi > 0, i = 1,…, n.
1. Is KKT sufficient for us to get the optimal solution for the above problem?
2. Is KKT necessary for the optimal solution?
3. Please write out the KKT conditions for this problem.
4. Please solve the general form of optimal Pi’s.
5. If n = 3, δ1 = 2, δ2 = 10, δ3 = 5, and Ptotal = 10, what are the optimal Pi values?
Exercise 14.4.6. Let 1 <m <n be integers, and let A be an m × n matrix with full row rank. Furthermore, let c ∈ ℝn and Q∈Sn + + be given. Consider the following optimization problem:
(14.57) |
1. Explain why the KKT conditions are necessary and sufficient for (14.57).
2. Write down the KKT conditions associated with (14.57). Hence, express the optimal solution to (14.57) in closed form.
Exercise 14.4.7. Let f : ℝn → ℝn be a differentiable convex function. Consider the following problem:
(14.58) |
Show that x̄ ∈ ℝn is an optimal solution to (14.58) iff it satisfies the following system:
∇f(ˉx)≥0,ˉx≥0,ˉxT∇f(ˉx) = 0.
Exercise 14.4.8. This problem is concerned with finding the minimum-volume enclosing ellipsoid of a set of vectors.
1. Let u ∈ ℝ n be fixed, and define the function g:S nℝ+ g(X) = ‖Xu‖22 Find ∇g(X).
2. Let V = {v1,…, vm} ∈ ℝn be a set of vectors that span ℝn. Consider the following problem:
(14.59) |
Let X̄ be an optimal solution to (14.59) (it can be shown that such an X̄ exists). Write down the KKT conditions that X̄ must satisfy.
3. Suppose that m = n and vi = ei for i = 1,…,n, where ei is the i-th standard basis vector. Using the above result, determine the optimal solution to (14.59) and find the corresponding Lagrange multipliers.
Exercise 14.4.9. Let a ∈ ℝn, b ∈ ℝ and c ∈ ℝn be such that a, c > 0 and b > 0. Consider the following problem:
(14.60) |
1. Let u1 ∈ ℝ and u2 ∈ ℝn be the Lagrange multipliers associated with the equality and inequality constraints, respectively. Write down the KKT conditions associated with (14.60).
2. Give explicit expressions for x̄ ∈ ℝn, ū1 ∈ ℝ and ū2 ∈ ℝn such that (x̄, ū1, ū2) satisfies the KKT conditions above.
3. Is the solution X ∈ Rn found above an optimal solution to (14.60)? Explain.
[1] M. S. Bazaraa, H. D. Sherali, and C. M. Shetty, Nonlinear Programming: Theory and Algorithms, 2nd ed., ser. Wiley-Interscience Series in Discrete Mathematics and Optimization. New York: John Wiley & Sons, Inc., 1993.
[2] D. P. Bertsekas, Nonlinear Programming, 2nd ed. Belmont, Massachusetts: Athena Scientific, 1999.
[3] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge: Cambridge University Press, 2004, available online at http://www.stanford.edu/~boyd/cvxbook/.
[4] D. G. Luenberger and Y. Ye, Linear and Nonlinear Programming, 3rd ed., ser. International Series in Operations Research and Management Science. New York: Springer Science+Business Media, LLC, 2008, vol. 116.
[5] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge: Cambridge University Press, 1985.
[6] M. Brookes, “The Matrix Reference Manual,” 2005, available online at http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/intro.html.
[7] M. Sion, “On General Minimax Theorems,” Pacific Journal of Mathematics, vol. 8, no. 1, pp. 171–176, 1958.
[8] H. Komiya, “Elementary Proof for Sion’s Minimax Theorem,” Kodai Mathematical Journal, vol. 11, no. 1, pp. 5–7, 1988.
[9] S. Cui, A. J. Goldsmith, and A. Bahai, “Energy-constrained modulation optimization,” IEEE Transactions on Wireless Communications, vol. 4, no. 5, pp. 2349–2360, September 2005.
[10] S. Cui, M. Kisialiou, Z.-Q. Luo, and Z. Ding, “Robust blind multiuser detection against signature waveform mismatch based on second order cone programming,” IEEE Transactions on Wireless Communications, vol. 4, no. 4, pp. 1285–1291, July 2005.
[11] R. Zhang and Y.-C. Liang, “Exploiting multi-antennas for opportunistic spectrum sharing in cognitive radio networks,” IEEE Journal of Selected Topics in Signal Processing, vol. 2, no. 1, pp. 88–102, Feb. 2008.
[12] W. Yu and R. Lui, “Dual methods for nonconvex spectrum optimization of multicarrier systems,” IEEE Transactions on Communications, vol. 54, no. 7, pp. 1310–1322, July 2006.
[13] L. Zhang, R. Zhang, Y.-C. Liang, Y. Xin, and S. Cui, “On the relationship between the multi-antenna secrecy communications and cognitive radio communications,” IEEE Transactions on Communications, vol. 58, no. 6, pp. 1877–1886, June 2010.
[14] R. Zhang, S. Cui, and Y.-C. Liang, “On ergodic sum capacity of fading cognitive multiple-access and broadcast channels,” IEEE Transactions on Information Theory, vol. 55, no. 11, pp. 5161–5178, November 2009.
[15] M. Chiang, “Balancing transport and physical layers in wireless multihop networks: Jointly optimal congestion control and power control,” IEEE Journal on Selected Areas in Communications, vol. 23, no. 1, pp. 104–116, January 2005.
[16] J. Xiao, S. Cui, Z. Q. Luo, and A. J. Goldsmith, “Joint estimation in sensor networks under energy constraint,” IEEE Transactions on Signal Processing, vol. 54, no. 2, pp. 413–422, February 2005.
[17] A. So and Y. Ye, “Theory of semidefinite programming for sensor network localization,” Mathematical Programming, vol. 109, pp. 367–384, 2007.
[18] S. Cui and A. J. Goldsmith, “Cross-layer design in energy-constrained networks using cooperative MIMO techniques,” EURASIP’s Signal Processing Journal, Special Issue on Advances in Signal Processing-based Cross-layer Designs, vol. 86, pp. 1804–1814, August 2006.
[19] R. Madan, S. Cui, S. Lall, and A. Goldsmith, “Modeling and optimization of transmission schemes in energy-constrained wireless sensor networks,” IEEE/ACM Transactions on Networking, vol. 15, no. 6, pp. 1359–1372, December 2007.
[20] Z. Quan, S. Cui, H. V. Poor, and A. Sayed, “Collaborative wideband sensing for cognitive radios,” IEEE Signal Processing Magazine special issue on cognitive radios, vol. 25, no. 6, pp. 60–73, January 2009.
[21] R. Zhang, Y.-C. Liang, C.-C. Chai, and S. Cui, “Optimal beamforming for two-way multi-antenna relay channel with analogue network coding,” IEEE Journal on Selected Areas of Communications, vol. 27, no. 6, pp. 699–712, June 2009.
[22] R. Zhang and S. Cui, “Cooperative interference management with miso beamforming,” IEEE Transactions on Signal Processing, vol. 58, no. 10, pp. 5450–5458, October 2010.
[23] S. P. Boyd, S.-J. Kim, D. D. Patil, and M. A. Horowitz, “Digital Circuit Optimization via Geometric Programming,” Operations Research, vol. 53, no. 6, pp. 899–932, November 2005.
[24] T. Cover and J. Thomas, Elements of Information Theory. New York: Wiley, 1991.
[25] I. E. Telatar, “Capacity of multi-antenna Gaussian channels,” Eur. Trans. Telecommun., vol. 10, no. 6, pp. 585–595, November 1999.
[26] R. G. Bland, D. Goldfarb, and M. J. Todd, “The ellipsoid method: a survey,” Operations Research, vol. 29, no. 6, pp. 1039–1091, November 1981.
[27] Z.-Q. Luo and W. Yu, “An introduction to convex optimization for communications and signal processing,” IEEE Journal of Selected Topics in Signal Processing, vol. 24, no. 8, pp. 1426–1438, August 2006.
[28] Y. Ye and S. Zhang, “New results on quadratic minimization,” SIAM J. Optim., vol. 14, pp. 245–267, 2003.
1Recall that for a generic optimization problem minx∈S∈ℝn f(x), a point x* ∈ S is called a global minimum if f(x*) ≤ f(x) for all x ∈ S. On the other hand, if there exists an ϵ > 0 such that the point x* ∈ S satisfies f(x*) ≤ f(x) for all x ∈ S ∩ B°(x*, ϵ), then it is called a local minimum. Here, B°(x̄, ϵ) = denotes the open ball centered at x̄ ∈ ℝn of radius ϵ > 0.
2A map A : ℝm → ℝn is said to be affine if there exists an n × m matrix B and a vector d ∈ ℝn such that A(x) = Bx + d for all x ∈ ℝm.
3Let S be a nonempty convex subset of ℝn. We say that f : S → ℝ is convex at x̄ ∈ S if f(αx̄ + (1 − α)x) ≤ αf (x̄) + (1 − α)f (x) for all a ∈ (0, 1) and x ∈ S. Note that a function f: S → ℝ can be convex at a particular point x̄ ∈ S without being convex on S.
3.134.78.106