14 Unconstrained and Constrained Optimization Problems

Shuguang Cui^‡, Anthony Man-Cho So^♮, and Rui Zhang^♮

^‡ Texas A&M University, College Station, USA

^♮ The Chinese University of Hong Kong, Hong Kong, China

^♮ National University of Singapore, Singapore

In the first section of this chapter, we will give an overview of the basic mathematical tools that are useful for analyzing both unconstrained and constrained optimization problems. In order to allow the readers to focus on the applications of these tools and not to be burdened with too many technical details, we shall state most of the results without proof. However, the readers are strongly encouraged to refer to the texts [1, 2, 3, 4] for expositions of these results and other further developments. In the second section, we provide three application examples to illustrate how we could apply the optimization techniques to solve real-world problems, with a focus on communications, networking, and signal processing. In the last section, several exercise questions are given to help the audience gain a deeper understanding of the material.

14.1 Basics of Convex Analysis

The notion of convexity plays a very important role in both the theoretical and algorithmic aspects of optimization. Before we discuss the relevance of convexity in optimization, let us first introduce the notions of convex sets and convex functions and state some of their properties.

Definition 14.1.1. Let S ⊆ ℝⁿ be a set. We say that

1. S is affine if αx + (1 − α)y ∈ S whenever x, y ∈ S and α ∈ ℝ;

2. S is convex if αx + (1 − α)y ∈ S whenever x, y ∈ S and α ∈ [0,1].

Given x, y ∈ ℝⁿ and α ∈ ℝ, the vector z = αx + (1 − α)y is called an affine combination of x and y. If α ∈ [0,1], then z is called a convex combination of x and y.

Geometrically, when x and y are distinct points in ℝⁿ, the set

$L = \{z \in ℝ^{n} : z = α x + (1 - α) y, \ α \in ℝ\}$

of all affine combinations of x and y is simply the line determined by x and y; and the set

$S = \{z \in ℝ^{n} : z = α x + (1 - α) y, \ α \in [0, 1]\}$

is the line segment between x and y. By convention, the empty set ∅ is convex.

It is clear that one can generalize the notion of affine (resp. convex) combination of two points to any finite number of points. In particular, an affine combination of the points x₁,…, x_k ∈ ℝⁿ is a point $z = \sum_{i = 1}^{k} α_{i} x_{i}$, where $\sum_{i = 1}^{k} α_{i} = 1$. Similarly, a convex combination of the points x₁,…, x_k ∈ ℝⁿ is a point $z = \sum_{i = 1}^{k} α_{i} x_{i}$, where $\sum_{i = 1}^{k} α_{i} = 1$ and α₁,…, α_k ≥ 0.

Here are some sets in Euclidean space whose convexity can be easily established by first principles:

Example 14.1.1. (Some Examples of Convex Sets)

1. Non-negative Orthant: $ℝ_{+}^{n} = \{x \in ℝ^{n} : x \geq 0\}$.

2. Hyperplane: H (s, c) = {x ∈ ℝⁿ : s^T x = c}.

3. Halfspaces: H⁺(s, c) = {x ∈ ℝⁿ : s^Tx ≤ c}, H⁻(s, c) = {x ∈ ℝⁿ : s^Tx ≥ c}.

4. Euclidean Ball: B(x̄, r) = {x ∈ ℝⁿ : ‖x − x̄‖₂ ≤ r}.

5. Ellipsoid: E(x̄, Q, r) = {x ∈ ℝⁿ : (x − x̄)^TQ(x − x̄) ≤ r²}, where Q is an n × n symmetric, positive definite matrix (i.e., x^TQx > 0 for all x ∈ ℝⁿ \ {0}), which is denoted by Q ≻ 0.

6. Simplex: $Δ = \{\sum_{i = 0}^{n} α_{i} x_{i} : \sum_{i = 0}^{n} α_{i} = 1, \ α_{i} \geq 0 \text{ for } i = 0, 1, \dots, n\}$, where x₀, x₁,…, x_n are vectors in ℝⁿ such that the vectors x₁ − x₀, x₂ − x₀,…, x_n − x₀ are linearly independent (equivalently, the vectors x₀, x₁,…, x_n are affinely independent).

7. Positive Semidefinite Cone: $S_{+}^{n} = \{A \in ℝ^{n \times n} : A \text{ is symmetric and } x^{T} A x \geq 0 \text{ for all } x \in ℝ^{n}\}$ (a symmetric matrix A ∈ ℝ^{n × n} is said to be positive semidefinite if x^TAx ≥ 0 for all x ∈ ℝⁿ, which is denoted by A ⪰ 0).

□
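As a quick numerical sanity check of the definition (not a proof), the sketch below samples pairs of points from two of the sets above and tests whether their convex combinations stay inside. The particular ball, halfspace, sampling box, and helper names are illustrative assumptions, not part of the text.

```python
import math
import random

def in_ball(x, center, r, tol=0.0):
    """Membership test for the Euclidean ball B(center, r)."""
    return math.dist(x, center) <= r + tol

def in_halfspace(x, s, c, tol=0.0):
    """Membership test for the halfspace {x : s^T x <= c}."""
    return sum(si * xi for si, xi in zip(s, x)) <= c + tol

def sample_member(member, dim, rng, box=2.0):
    """Rejection-sample a point of the set from the cube [-box, box]^dim."""
    while True:
        x = [rng.uniform(-box, box) for _ in range(dim)]
        if member(x, 0.0):
            return x

def convex_combinations_stay_inside(member, dim, trials=1000, seed=0):
    """Sample x, y in the set and check that alpha*x + (1-alpha)*y stays in it."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = sample_member(member, dim, rng)
        y = sample_member(member, dim, rng)
        alpha = rng.random()
        z = [alpha * xi + (1 - alpha) * yi for xi, yi in zip(x, y)]
        if not member(z, 1e-9):  # small tolerance for floating-point rounding
            return False
    return True

# Hypothetical instances: the unit ball in R^2 and the halfspace x1 - x2 <= 0.5.
ball_ok = convex_combinations_stay_inside(
    lambda x, tol: in_ball(x, [0.0, 0.0], 1.0, tol), dim=2)
halfspace_ok = convex_combinations_stay_inside(
    lambda x, tol: in_halfspace(x, [1.0, -1.0], 0.5, tol), dim=2)
print(ball_ok, halfspace_ok)
```

A passing check only fails to find a counterexample; convexity itself follows from the triangle inequality (ball) and linearity (halfspace).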

Let us now turn to the notion of a convex function.

Definition 14.1.2. Let S ⊆ ℝⁿ be a nonempty convex set, and let f : S → ℝ be a real-valued function.

1. We say that f is convex on S if

$f (α x_{1} + (1 - α) x_{2}) \leq α f (x_{1}) + (1 - α) f (x_{2})$

(14.1)

for all x₁, x₂ ∈ S and α ∈ [0,1]. We say that f is concave if −f is convex.

2. We say that f is strictly convex on S if

$f (α x_{1} + (1 - α) x_{2}) < α f (x_{1}) + (1 - α) f (x_{2})$

for all x₁, x₂ ∈ S with x₁ ≠ x₂ and α ∈ (0,1).

3. The epigraph of f is the set epi(f ) = {(x, r) ∈ S × ℝ : f (x) ≤ r}.

The relationship between convex sets and convex functions can be summarized as follows:

Proposition 14.1.1. Let f be as in Definition 14.1.2. Then, f is convex (as a function) iff epi(f) is convex (as a set in S × ℝ).

Let r ∈ ℝ be arbitrary. A set closely related to the epigraph is the so-called r-level set of f, which is defined as L(r) = {x ∈ S : f(x) ≤ r}. It is clear that if f is convex, then L(r) is convex for all r ∈ ℝ. However, the converse is not true, as illustrated by the function x ↦ x³. A function f : S → ℝ whose domain is convex and whose r-level sets are convex for all r ∈ ℝ is called quasi-convex.
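The counterexample x ↦ x³ can be checked directly. The sketch below exhibits one pair of points at which inequality (14.1) fails, and then tests on a grid that each level set is a contiguous run of grid points (i.e., an interval). The grid and the chosen values of r are arbitrary illustrative choices.

```python
def f(x):
    return x ** 3

# Inequality (14.1) fails at x1 = -2, x2 = 0, alpha = 1/2.
x1, x2, alpha = -2.0, 0.0, 0.5
lhs = f(alpha * x1 + (1 - alpha) * x2)      # f(-1) = -1
rhs = alpha * f(x1) + (1 - alpha) * f(x2)   # 0.5 * (-8) + 0.5 * 0 = -4
violates_convexity = lhs > rhs

def is_contiguous(flags):
    """True if the True entries of flags form one unbroken run (or none)."""
    idx = [i for i, flag in enumerate(flags) if flag]
    return not idx or idx[-1] - idx[0] + 1 == len(idx)

# On a sorted grid, a convex subset of the real line shows up as a contiguous run.
grid = [k / 10 for k in range(-50, 51)]
level_sets_convex = all(
    is_contiguous([f(x) <= r for x in grid]) for r in (-8.0, -1.0, 0.0, 1.0, 8.0)
)
print(violates_convexity, level_sets_convex)
```

Because x³ is increasing, each level set is a half-line (−∞, r^{1/3}], which is why the grid test passes even though the function is not convex.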

One of the most desirable features of convexity is the following:

Proposition 14.1.2. Consider the optimization problem:

$\begin{array}{ll} \text{minimize} & f (x) \\ \text{subject to} & x \in S, \end{array}$

where S ⊆ ℝⁿ is a convex set and f : S → ℝ is convex. Then, any local minimum of f is also a global minimum¹.

Now, let S ⊆ ℝⁿ be an open convex set, and let f : S → ℝ be an arbitrary function. When f has a suitable degree of differentiability, we can characterize its convexity by its gradient or Hessian. Specifically, we have the following:

Theorem 14.1.1. Let S ⊆ ℝⁿ be an open convex set, and let f : S → ℝ be a differentiable function on S. Then, f is convex on S iff

$f (x_{1}) \geq f (x_{2}) + (\nabla f (x_{2}))^{T} (x_{1} - x_{2})$

for all x₁, x₂ ∈ S. Furthermore, if f is a twice continuously differentiable function on S, then f is convex on S iff ∇²f(x) is positive semidefinite for all x ∈ S.

Sometimes it may be difficult to verify directly from the definition whether a given function is convex or not. However, a function can often be obtained as a composition of several, more elementary functions. When each of those elementary functions is convex, it is natural to ask whether their composition is also convex. In general, the answer is no. On the other hand, here are some transformations that preserve convexity.

Theorem 14.1.2. Let S ⊆ ℝⁿ be a nonempty convex set. Then, the following hold:

1. (Non-negative Combinations) Let f₁,…, f_m : S → ℝ be convex functions, and let α₁,…, α_m ≥ 0. Then, the function $\sum_{i = 1}^{m} α_{i} f_{i}$ is convex on S.

2. (Pointwise Supremum) Let {f_i}_{i∈I} be an arbitrary family of convex functions on S. Then, the pointwise supremum f = sup_{i∈I} f_i is convex on S.

3. (Affine Composition) Let f : ℝⁿ → ℝ be a convex function and A : ℝ^m → ℝⁿ be an affine mapping². Then, the function f ∘ A : ℝ^m → ℝ given by (f ∘ A)(x) = f(A(x)) is convex on ℝ^m.

4. (Composition with an Increasing Convex Function) Let f : S → ℝ be a convex function, and let g : ℝ → ℝ be an increasing convex function. Then, the function g ∘ f : S → ℝ defined by (g ∘ f)(x) = g(f(x)) is convex on S.

5. (Restriction on Lines) Let f : S → ℝ be a function. Given x₀ ∈ S and h ∈ ℝⁿ, define the function $\tilde{f}_{x_{0}, h} : ℝ \to ℝ \cup \{+ \infty\}$ by

$\tilde{f}_{x_{0}, h} (t) = \begin{cases} f (x_{0} + t h) & \text{if } x_{0} + t h \in S, \\ + \infty & \text{otherwise}. \end{cases}$

Then, f is convex on S iff $\tilde{f}_{x_{0}, h}$ is convex on ℝ for any x₀ ∈ S and h ∈ ℝⁿ.

Let us now illustrate an application of Theorem 14.1.2.

Example 14.1.2. Let f : ℝ^{m × n} → ℝ₊ be given by f(X) = ‖X‖₂, where ‖·‖₂ denotes the spectral norm or largest singular value of the m × n matrix X. By the Courant-Fischer theorem (see, e.g., [5]), we have

$f (X) = \sup \{u^{T} X v : ‖u‖_{2} = 1, \ ‖v‖_{2} = 1\}.$

(14.2)

Now, for each u ∈ ℝ^m and v ∈ ℝⁿ with ‖u‖₂ = ‖v‖₂ = 1, define the function f_{u, v} :ℝ^{m × n} → ℝ by

$f_{u, v} (X) = u^{T} X v .$

Note that f_{u, v} is a convex (in fact, linear) function of X for each u, v. Hence, it follows from (14.2) that f is a pointwise supremum of a family of linear functions of X. By Theorem 14.1.2, this implies that f is convex.
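The conclusion of this example can be spot-checked numerically in the 2 × 2 case, where the largest singular value has a closed form through the eigenvalues of XᵀX. The sketch below is only an illustration under that restriction; the helper names and the sampling range are assumptions made for the example.

```python
import math
import random

def spectral_norm_2x2(X):
    """Largest singular value of a 2x2 matrix via the eigenvalues of X^T X."""
    (a, b), (c, d) = X
    t = a * a + b * b + c * c + d * d      # trace of X^T X
    det = a * d - b * c                    # det(X^T X) = det(X)^2
    disc = max(t * t - 4.0 * det * det, 0.0)
    return math.sqrt((t + math.sqrt(disc)) / 2.0)

def random_matrix(rng):
    return [[rng.uniform(-5, 5) for _ in range(2)] for _ in range(2)]

def mix(X, Y, alpha):
    """Entrywise convex combination alpha*X + (1-alpha)*Y."""
    return [[alpha * X[i][j] + (1 - alpha) * Y[i][j] for j in range(2)]
            for i in range(2)]

rng = random.Random(0)
convex_ok = True
for _ in range(1000):
    X, Y, alpha = random_matrix(rng), random_matrix(rng), rng.random()
    lhs = spectral_norm_2x2(mix(X, Y, alpha))
    rhs = alpha * spectral_norm_2x2(X) + (1 - alpha) * spectral_norm_2x2(Y)
    convex_ok = convex_ok and lhs <= rhs + 1e-9  # inequality (14.1) with tolerance
print(convex_ok)
```

For larger matrices one would normally compute singular values with a numerical linear algebra library rather than a closed form.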

14.2 Unconstrained vs. Constrained Optimization

14.2.1 Optimality Conditions for Unconstrained Optimization

One of the most fundamental problems in optimization is to derive conditions for identifying potential optimal solutions to an optimization problem. Typically, such conditions, which are known as optimality conditions, would enable us to reduce the original optimization problem to that of checking the validity of certain geometric conditions, or to that of checking the consistency of a certain system of inequalities. As an illustration and to motivate our discussion, let us first consider a univariate, twice continuously differentiable function f : ℝ → ℝ. Recall from basic calculus that if x̄ ∈ ℝ is a local minimum of f, then we must have

$\frac{d f (x)}{d x} \Big|_{x = \bar{x}} = 0 .$

(14.3)

In other words, condition (14.3) is a necessary condition for x̄ to be a local minimum. However, it is not a sufficient condition, as an x̄ ∈ ℝ that satisfies (14.3) can be a local maximum or just a stationary point. In order to certify that x̄ is indeed a local minimum, one could check, in addition to (14.3), whether

$\frac{d^{2} f (x)}{d x^{2}} \Big|_{x = \bar{x}} > 0 .$

(14.4)

In particular, condition (14.4), together with (14.3), is a sufficient condition for x̄ to be a local minimum.

In the above discussion, conditions (14.3) and (14.4) together yield a system of inequalities whose solutions are local minima of the function f. Alternatively, they can be viewed as stating the geometric fact that there is no descent direction in a neighborhood of a local minimum. In particular, the former is an algebraic interpretation of local optimality, while the latter is a geometric interpretation. It is worth noting that each interpretation has its own advantage. Indeed, the geometric interpretation can often help us gain intuitions about the problem at hand, and the algebraic interpretation would help to make those intuitions precise. Thus, it is good to keep both interpretations in mind.
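To make conditions (14.3) and (14.4) concrete, here is a minimal sketch that classifies candidate points of a univariate function. The function f(x) = x³ − 3x and the helper name `classify` are hypothetical choices for illustration only.

```python
def f(x):
    return x ** 3 - 3 * x

def df(x):
    return 3 * x ** 2 - 3      # first derivative of f

def d2f(x):
    return 6 * x               # second derivative of f

def classify(x_bar, tol=1e-12):
    """Apply the necessary condition (14.3) and the second-order test (14.4)."""
    if abs(df(x_bar)) > tol:
        return "not stationary"          # (14.3) fails
    if d2f(x_bar) > 0:
        return "local minimum"           # (14.3) and (14.4) hold
    if d2f(x_bar) < 0:
        return "local maximum"
    return "inconclusive"                # second derivative vanishes

for x_bar in (-1.0, 0.0, 1.0):
    print(x_bar, classify(x_bar))
```

Here x̄ = 1 passes both tests, x̄ = −1 satisfies (14.3) but is a local maximum, and x̄ = 0 is not stationary at all.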

To derive optimality conditions for the local minima of a multivariate twice continuously differentiable function f : ℝⁿ → ℝ, we first recall that ∇f(x), the gradient of f at x ∈ ℝⁿ, is the direction of steepest ascent at x. Thus, if ∇f(x) ≠ 0, then starting at x, we can proceed in the direction −∇f(x) and achieve a smaller function value. More specifically, we have the following:

Proposition 14.2.1. Suppose that f : ℝⁿ → ℝ is continuously differentiable at x̄ ∈ ℝⁿ. If there exists a d ∈ ℝⁿ such that ∇f(x̄)^T d < 0, then there exists an α₀ > 0 such that f(x̄ + αd) < f(x̄) for all α ∈ (0, α₀). In other words, d is a descent direction of f at x̄.
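The following sketch illustrates the proposition on a hypothetical function f(x) = x₁² + 2x₂² at the point x̄ = (1, 1), using d = −∇f(x̄). Both the function and the step sizes are assumptions chosen so that the arithmetic is easy to follow.

```python
def f(x):
    return x[0] ** 2 + 2.0 * x[1] ** 2

def grad_f(x):
    return [2.0 * x[0], 4.0 * x[1]]

def step(x, d, alpha):
    """Return the point x + alpha * d."""
    return [xi + alpha * di for xi, di in zip(x, d)]

x_bar = [1.0, 1.0]
g = grad_f(x_bar)                               # (2, 4)
d = [-gi for gi in g]                           # steepest-descent direction
slope = sum(gi * di for gi, di in zip(g, d))    # grad f(x_bar)^T d = -20 < 0

# Small steps along d decrease f, as the proposition guarantees ...
small_steps = [0.5, 0.1, 0.01, 0.001]
decreases = [f(step(x_bar, d, a)) < f(x_bar) for a in small_steps]

# ... but a step that is too long need not: the guarantee only covers (0, alpha_0).
too_far = f(step(x_bar, d, 1.0))
print(slope, decreases, too_far, f(x_bar))
```

For this particular f, one can check by hand that f(x̄ + αd) = 3 − 20α + 36α², so the decrease holds exactly for α ∈ (0, 5/9).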

Using Proposition 14.2.1, we can establish the following:

Corollary 14.2.1. (First Order Necessary Condition for Unconstrained Optimization) Suppose that f : ℝⁿ → ℝ is continuously differentiable at x̄ ∈ ℝⁿ. If x̄ is a local minimum, then we have ∇f(x̄) = 0. In particular, we have {d ∈ ℝⁿ : ∇f(x̄)^T d < 0} = ∅.

Similar to the univariate case, even if x̄ ∈ ℝⁿ satisfies ∇f(x̄) = 0, we cannot conclude that x̄ is a local minimum. For instance, consider the function f : ℝ² → ℝ given by $f (x_{1}, x_{2}) = - x_{1}^{2} - (x_{1} - x_{2})^{2}$. Then, we have

$\nabla f (x) = - 2 (2 x_{1} - x_{2}, \ x_{2} - x_{1}) .$

In particular, the (unique) solution to ∇f(x) = 0 is x̄₁ = x̄₂ = 0. However, as can be easily verified, the point (x̄₁, x̄₂) = (0, 0) is a global maximum of f.

The above example shows that some extra conditions are needed in order to guarantee that a solution to the equation ∇f(x) = 0 is a local minimum of f. For instance, we have the following proposition, which states that if f is convex at x̄, then the necessary condition in Corollary 14.2.1 is also sufficient³:

Proposition 14.2.2. Suppose that f: ℝⁿ → ℝ is continuously differentiable and convex at x̄. Then, x̄ is a global minimum iff ∇f (x̄) = 0.

Alternatively, if ∇f(x̄) = 0 and ∇²f(x̄), the Hessian of f at x̄, is positive definite, then x̄ is a local minimum. Specifically, we have the following proposition, which generalizes the corresponding result for the univariate case (cf. (14.3) and (14.4)).

Proposition 14.2.3. (Second Order Sufficient Condition for Unconstrained Optimization) Suppose that f : ℝⁿ → ℝ is twice continuously differentiable at x̄ ∈ ℝⁿ. If ∇f (x̄) = 0 and ∇²f (x̄) is positive definite, then x̄ is a local minimum.

Let us now illustrate the above results with an example.

Example 14.2.1. Let f : ℝⁿ → ℝ be defined by $f (x) = \frac{1}{2} x^{T} Q x + c^{T} x$, where Q ∈ Sⁿ and c ∈ ℝⁿ are given. Then, f is continuously differentiable, and we have ∇f(x) = Qx + c and ∇²f(x) = Q. Now, if f is convex, or equivalently, if Q ⪰ 0, then by Proposition 14.2.2, any x̄ ∈ ℝⁿ that satisfies Qx̄ + c = 0 will be a global minimum of f. Note that in this case, we cannot even conclude from Proposition 14.2.3 that x̄ is a local minimum of f, since we only have Q ⪰ 0. On the other hand, suppose that Q ≻ 0. Then, Q is invertible, and by Proposition 14.2.3, the point x̄ = −Q⁻¹c is a local minimum of f. However, since f is convex, Proposition 14.2.2 allows us to draw a stronger conclusion, namely, the point x̄ = −Q⁻¹c is in fact the unique global minimum.
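A small numerical instance may help here. The sketch below uses hypothetical 2 × 2 data: a positive definite Q, for which x̄ = −Q⁻¹c is computed by Cramer's rule, and a singular positive semidefinite Q, for which several points satisfy Qx + c = 0 and share the same objective value.

```python
def quad(Q, c, x):
    """f(x) = 0.5 * x^T Q x + c^T x for 2x2 Q."""
    Qx = [Q[0][0] * x[0] + Q[0][1] * x[1], Q[1][0] * x[0] + Q[1][1] * x[1]]
    return 0.5 * (x[0] * Qx[0] + x[1] * Qx[1]) + c[0] * x[0] + c[1] * x[1]

def grad(Q, c, x):
    """Gradient Qx + c."""
    return [Q[0][0] * x[0] + Q[0][1] * x[1] + c[0],
            Q[1][0] * x[0] + Q[1][1] * x[1] + c[1]]

def solve_2x2(Q, rhs):
    """Solve Q x = rhs by Cramer's rule (Q must be nonsingular)."""
    (a, b), (p, q) = Q
    det = a * q - b * p
    if det == 0:
        raise ValueError("Q is singular")
    return [(q * rhs[0] - b * rhs[1]) / det, (a * rhs[1] - p * rhs[0]) / det]

# Case Q > 0 (positive definite): unique stationary point x_bar = -Q^{-1} c.
Q, c = [[2.0, 1.0], [1.0, 2.0]], [-3.0, 0.0]
x_bar = solve_2x2(Q, [-ci for ci in c])          # (2, -1)
f_bar = quad(Q, c, x_bar)                        # -3
nearby = [[x_bar[0] + dx, x_bar[1] + dy]
          for dx in (-0.5, 0.0, 0.5) for dy in (-0.5, 0.0, 0.5)]
is_min_on_grid = all(quad(Q, c, y) >= f_bar - 1e-12 for y in nearby)

# Case Q >= 0 but singular: every x with x1 + x2 = 1 is stationary, same value.
Q2, c2 = [[1.0, 1.0], [1.0, 1.0]], [-1.0, -1.0]
stationary = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = [quad(Q2, c2, x) for x in stationary]
print(x_bar, f_bar, is_min_on_grid, values)
```

The second case shows why uniqueness of the minimizer is tied to Q ≻ 0 rather than to convexity alone.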

14.2.2 Optimality Conditions for Constrained Optimization

After deriving optimality conditions for unconstrained optimization problems, let us turn our attention to constrained optimization problems of the form

$\min_{x \in S} f (x),$

(14.5)

where S is a nonempty subset of ℝⁿ. Note that due to the constraint x ∈ S, even if x̄ ∈ ℝⁿ satisfies ∇f(x̄) = 0 and ∇²f(x̄) ≻ 0, it may not be a solution to (14.5), since x̄ need not lie in S. Similarly, a local minimum x̄ of f over S need not satisfy ∇f(x̄) = 0, since all the descent directions of f at x̄ may lead to points that do not lie in S. Thus, in order to derive optimality conditions for (14.5), we need to consider not only the set of descent directions at x̄, i.e.,

$D = \{d \in ℝ^{n} : \nabla f (\bar{x})^{T} d < 0\},$

(14.6)

but also the set of feasible directions at x̄, i.e.,

$F = \{d \in ℝ^{n} \setminus \{0\} : \text{there exists an } α_{0} > 0 \text{ such that } \bar{x} + α d \in S \text{ for all } α \in (0, α_{0})\} .$

(14.7)

We emphasize that in order for d ∈ F, the entire open line segment {x̄ + αd : α ∈ (0, α₀)} must belong to S. This is to ensure that whenever d ∈ D, one can find a feasible solution x̄′ ∈ S with f(x̄′) < f(x̄) by proceeding from x̄ in the direction d. Indeed, by Proposition 14.2.1, if d ∈ D, then there exists an α₁ > 0 such that f(x̄ + αd) < f(x̄) for all α ∈ (0, α₁). However, if x̄ + αd ∉ S for all α ∈ (0, α₁), then we cannot rule out the local minimality of x̄, even if x̄ + αd ∈ S for some α > α₁.

As the following proposition shows, the sets D and F provide a necessary, and under some additional assumptions, even sufficient condition for optimality.

Proposition 14.2.4. Consider Problem (14.5). Suppose that f : ℝⁿ → ℝ is continuously differentiable at x̄ ∈ S. If x̄ is a local minimum, then we have D ∩ F = ∅. Conversely, suppose that (i) D ∩ F = ∅, (ii) f is convex at x̄, and (iii) there exists an ϵ > 0 such that d = x − x̄ ∈ F for any x ∈ S ∩ B°(x̄, ϵ). Then, x̄ is a local minimum of f over S.

REMARKS: Condition (iii) is to ensure that the entire line segment {x̄ + α(x − x̄) : α ∈ [0,1]} lies in S for any x ∈ S ∩ B°(x̄, ϵ), so that d = x − x̄ ∈ F; see the remark after (14.7).

So far we have only discussed optimality conditions for a very general class of optimization problems, i.e., problems of the form (14.5). In particular, we derived a necessary condition for local optimality in terms of the sets D and F, namely that D ∩ F = ∅. However, such a condition is largely geometric, and it is not as easy to manipulate as algebraic conditions (e.g., a system of inequalities). On the other hand, as we will show below, if the feasible region has more structure, then one can circumvent such difficulty and derive algebraic optimality conditions. To begin, let us consider the following class of optimization problems:

$\begin{array}{lll} \text{minimize} & f (x) & \\ \text{subject to} & g_{i} (x) \leq 0 & \text{for } i = 1, \dots, m, \\ & x \in X, & \end{array}$

(14.8)

where f: ℝⁿ → ℝ and g_i : ℝⁿ → ℝ are continuously differentiable functions, and X is a nonempty open subset of ℝⁿ (usually we take X = ℝⁿ). We then have the following:

Proposition 14.2.5. Let S = {x ∈ X : g_i(x) ≤ 0 for i = 1,…, m} be the feasible region of problem (14.8), and let x̄ ∈ S. Define

$I = \{i \in \{1, \dots, m\} : g_{i} (\bar{x}) = 0\}$

to be the index set for the active or binding constraints. Furthermore, define

$\begin{array}{l} G = \{d \in ℝ^{n} : \nabla g_{i} (\bar{x})^{T} d < 0 \text{ for } i \in I\}, \\ \bar{G} = \{d \in ℝ^{n} \setminus \{0\} : \nabla g_{i} (\bar{x})^{T} d \leq 0 \text{ for } i \in I\} . \end{array}$

(14.9)

Then, we have G ⊆ F ⊆ G̅, where F is defined in (14.7). Moreover, if the functions g_i, where i ∈ I, are strictly convex (resp. concave) at x̄, then F = G (resp. F = G̅).

Using Proposition 14.2.4 and Proposition 14.2.5, we can establish the following geometric optimality condition for (14.8):

Corollary 14.2.2. Let S be the feasible region of problem (14.8). Let x̄ ∈ S, and define I = {i ∈ {1,…, m} : g_i(x̄) = 0}. If x̄ is a local minimum, then D ∩ G = ∅, where D is defined in (14.6) and G is defined in (14.9).

The intuition behind Corollary 14.2.2 is quite straightforward. Indeed, suppose that d ∈ D ∩ G. Then, by Proposition 14.2.1, there exists an α₀ > 0 such that f(x̄ + αd) < f(x̄) and g_i(x̄ + αd) < g_i(x̄) = 0 for all i ∈ I and α ∈ (0, α₀). Moreover, by the continuity of the functions g₁,…, g_m, for sufficiently small α > 0, we have g_i(x̄ + αd) < 0 for all i ∉ I. It follows that there exists an α₁ > 0 such that x̄ + αd ∈ S and f(x̄ + αd) < f(x̄) for all α ∈ (0, α₁). In other words, x̄ is not a local minimum.

The upshot of Corollary 14.2.2 is that it allows us to derive optimality conditions for (14.8) that are more algebraic in nature. Specifically, Corollary 14.2.2, together with Farkas’ lemma, yields the following:

Theorem 14.2.1. (Karush–Kuhn–Tucker Necessary Conditions) Let x̄ ∈ S be a local minimum of problem (14.8), and let I = {i ∈ {1,…, m} : g_i(x̄) = 0} be the index set for the active constraints. Suppose that the family {∇g_i(x̄)}_{i∈I} of vectors is linearly independent. Then, there exist ū₁,…, ū_m ∈ ℝ such that

$\begin{array}{rcll} \nabla f (\bar{x}) + \sum_{i = 1}^{m} \bar{u}_{i} \nabla g_{i} (\bar{x}) & = & 0, & \\ \bar{u}_{i} g_{i} (\bar{x}) & = & 0 & \text{for } i = 1, \dots, m, \\ \bar{u}_{i} & \geq & 0 & \text{for } i = 1, \dots, m . \end{array}$

(14.10)

We say that x̄ ∈ ℝⁿ is a KKT point if (i) x̄ ∈ S and (ii) there exist Lagrange multipliers ū₁,…, ū_m such that (x̄, ū₁,…, ū_m) satisfies the system (14.10).

Note that if the gradient vectors of the active constraints are not linearly independent, then the KKT conditions are not necessary for local optimality, even when the optimization problem is convex. This is demonstrated in the following example.

Example 14.2.2. Consider the following optimization problem:

$\begin{array}{ll} \text{minimize} & x_{1} \\ \text{subject to} & (x_{1} - 1)^{2} + (x_{2} - 1)^{2} \leq 1, \\ & (x_{1} - 1)^{2} + (x_{2} + 1)^{2} \leq 1. \end{array}$

(14.11)

Since there is only one feasible solution (i.e., (x₁, x₂) = (1, 0)), it is naturally optimal. Besides the primal feasibility condition, the KKT conditions of (14.11) are given by

$\begin{array}{c} \begin{bmatrix} 1 \\ 0 \end{bmatrix} + 2 u_{1} \begin{bmatrix} x_{1} - 1 \\ x_{2} - 1 \end{bmatrix} + 2 u_{2} \begin{bmatrix} x_{1} - 1 \\ x_{2} + 1 \end{bmatrix} = 0, \\ u_{1} ((x_{1} - 1)^{2} + (x_{2} - 1)^{2} - 1) = 0, \\ u_{2} ((x_{1} - 1)^{2} + (x_{2} + 1)^{2} - 1) = 0, \\ u_{1}, u_{2} \geq 0. \end{array}$

However, it is clear that there is no solution (u₁, u₂) ≥ 0 to the above system when (x₁, x₂) = (1, 0), since the first component of the first equation then reads 1 = 0.

□
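The failure of the KKT system in Example 14.2.2 can be seen by evaluating the stationarity residual at x̄ = (1, 0) for a range of multipliers. The sketch below is only a numerical illustration; the helper names and the multiplier grid are assumptions.

```python
def constraint_gradients(x):
    """Gradients of g1 = (x1-1)^2 + (x2-1)^2 - 1 and g2 = (x1-1)^2 + (x2+1)^2 - 1."""
    x1, x2 = x
    return [2 * (x1 - 1), 2 * (x2 - 1)], [2 * (x1 - 1), 2 * (x2 + 1)]

def stationarity_residual(x, u):
    """grad f + u1 * grad g1 + u2 * grad g2, with f(x) = x1."""
    g1, g2 = constraint_gradients(x)
    return [1 + u[0] * g1[0] + u[1] * g2[0],
            0 + u[0] * g1[1] + u[1] * g2[1]]

x_bar = (1.0, 0.0)
g1, g2 = constraint_gradients(x_bar)            # (0, -2) and (0, 2)

# The two active gradients are linearly dependent (zero 2x2 determinant),
# so the regularity assumption of Theorem 14.2.1 fails at x_bar.
determinant = g1[0] * g2[1] - g1[1] * g2[0]

# Whatever u >= 0 we try, the first residual component stays equal to 1.
multipliers = [(a / 2, b / 2) for a in range(0, 21) for b in range(0, 21)]
first_components = {stationarity_residual(x_bar, u)[0] for u in multipliers}
print(determinant, first_components)
```

Both active gradients have a zero first coordinate, so no non-negative combination of them can cancel the first coordinate of ∇f(x̄) = (1, 0).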

Let us now illustrate Theorem 14.2.1 with an example.

Example 14.2.3. (Optimization of a Matrix Function) Let A ≻ 0 and b > 0 be given. Consider the following problem:

$\begin{array}{ll} \text{minimize} & - \log \det (Z) \\ \text{subject to} & \text{tr} (A Z) \leq b, \\ & Z ≻ 0. \end{array}$

(14.12)

Note that (14.12) is of the form (14.8), since we may write (14.12) as

$\begin{array}{ll} \text{minimize} & - \log \det (Z) \\ \text{subject to} & \text{tr} (A Z) \leq b, \\ & Z \in S_{+ +}^{n}, \end{array}$

and $S_{+ +}^{n} \subseteq ℝ^{n (n + 1) / 2}$ is an open set. Now, it is known that for X ≻ 0,

$\nabla \log \det (X) = X^{- 1}, \quad \nabla \text{tr} (A X) = A;$

see, e.g., [6]. Hence, the KKT conditions associated with (14.12) are given by

$\begin{array}{lr} \text{tr} (A Z) \leq b, \quad Z ≻ 0, & (a) \\ - Z^{- 1} + u A = 0, \quad u \geq 0, & (b) \\ u (\text{tr} (A Z) - b) = 0. & (c) \end{array}$

Condition (a) is simply primal feasibility. Condition (c) is known as complementarity. As we shall see later, condition (b) can be interpreted as feasibility with respect to a certain dual of (14.12).

□
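Although the text stops at stating the KKT system, conditions (a)–(c) of Example 14.2.3 can be solved by hand. The short derivation below is a sketch of that calculation, assuming Z is an n × n matrix; it is offered as a worked illustration rather than as part of the original example.

```latex
% Condition (b) forces u > 0: with u = 0 it would read Z^{-1} = 0, which is impossible.
\begin{align*}
  \text{(b)} \;&\Longrightarrow\; Z = \tfrac{1}{u} A^{-1}, \\
  \text{(c) with } u > 0 \;&\Longrightarrow\; \operatorname{tr}(AZ) = b, \\
  \operatorname{tr}\!\bigl(A \cdot \tfrac{1}{u} A^{-1}\bigr) = \tfrac{n}{u} = b
    \;&\Longrightarrow\; u = \tfrac{n}{b}, \qquad Z = \tfrac{b}{n} A^{-1}.
\end{align*}
```

Since A ≻ 0 and b > 0, the matrix Z = (b/n)A⁻¹ is positive definite and satisfies tr(AZ) = b, so it is consistent with condition (a) as well.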

Note that Theorem 14.2.1 applies only to inequality-constrained optimization problems of the form (14.8). However, by extending the geometric arguments used to prove Corollary 14.2.2, one can establish similar necessary optimality conditions for optimization problems of the form

$\begin{array}{lll} \text{minimize} & f (x) & \\ \text{subject to} & g_{i} (x) \leq 0 & \text{for } i = 1, \dots, m_{1}, \\ & h_{j} (x) = 0 & \text{for } j = 1, \dots, m_{2}, \\ & x \in X, & \end{array}$

(14.13)

where f, g₁,…, g_{m₁}, h₁,…, h_{m₂} : ℝⁿ → ℝ are continuously differentiable functions, and X is a nonempty open subset of ℝⁿ. Specifically, we have the following:

Theorem 14.2.2. (Karush–Kuhn–Tucker Necessary Conditions) Let S be the feasible region of Problem (14.13). Suppose that x̄ ∈ S is a local minimum of problem (14.13), with I = {i ∈ {1,…, m₁} : g_i(x̄) = 0} being the index set for the active constraints. Furthermore, suppose that x̄ is regular, i.e., the family $\{\nabla g_{i} (\bar{x})\}_{i \in I} \cup \{\nabla h_{j} (\bar{x})\}_{j = 1}^{m_{2}}$ of vectors is linearly independent. Then, there exist $\bar{v}_{1}, \dots, \bar{v}_{m_{1}} \in ℝ$ and $\bar{w}_{1}, \dots, \bar{w}_{m_{2}} \in ℝ$ such that

$\begin{array}{rcll} \nabla f (\bar{x}) + \sum_{i = 1}^{m_{1}} \bar{v}_{i} \nabla g_{i} (\bar{x}) + \sum_{j = 1}^{m_{2}} \bar{w}_{j} \nabla h_{j} (\bar{x}) & = & 0, & \\ \bar{v}_{i} g_{i} (\bar{x}) & = & 0 & \text{for } i = 1, \dots, m_{1}, \\ \bar{v}_{i} & \geq & 0 & \text{for } i = 1, \dots, m_{1} . \end{array}$

(14.14)

As demonstrated in Example 14.2.2, the linear independence of the gradient vectors of the active constraints is generally needed to guarantee the existence of Lagrange multipliers. However, such a regularity condition is not always easy to check. As it turns out, there are other forms of regularity conditions, one of the better-known of which is the following:

Theorem 14.2.3. Suppose that in Problem (14.13), the functions $g_{1}, \dots, g_{m_{1}}$ are convex and $h_{1}, \dots, h_{m_{2}}$ are linear. Let x̄ ∈ S be a local minimum, and let I = {i ∈ {1,…,m₁} : g_i(x̄) = 0}. If the Slater condition is satisfied, i.e., if there exists an x′ ∈ S such that g_i(x′) < 0 for all i ∈ I, then x̄ satisfies the KKT conditions (14.14).

Another setting in which the existence of Lagrange multipliers is guaranteed is the following:

Theorem 14.2.4. Suppose that in Problem (14.13), the functions $g_{1}, \dots, g_{m_{1}}$ are concave and $h_{1}, \dots, h_{m_{2}}$ are linear. Let x̄ ∈ S be a local minimum. Then, x̄ satisfies the KKT conditions (14.14).

In particular, Theorem 14.2.4 implies that when all the constraints in problem (14.13) are linear, one can always find Lagrange multipliers for any local minimum of problem (14.13).

So far we have only discussed necessary optimality conditions for constrained optimization problems. Let us now turn our attention to sufficient conditions. The following theorem can be viewed as an extension of the first order sufficient condition in Proposition 14.2.2 to the constrained setting.

Theorem 14.2.5. Suppose that in Problem (14.13), the functions f, $g_{1}, \dots, g_{m_{1}}$ are convex, $h_{1}, \dots, h_{m_{2}}$ are linear, and X = ℝⁿ. Let x̄ ∈ ℝⁿ be feasible for (14.13). If there exist vectors $\bar{v} \in ℝ^{m_{1}}$ and $\bar{w} \in ℝ^{m_{2}}$ such that (x̄, v̄, w̄) satisfies the KKT conditions (14.14), then x̄ is a global minimum.

To demonstrate the usage of the above results, let us consider the following example:

Example 14.2.4. (Linear Programming) Consider the standard form linear programming (LP):

$\begin{matrix} minimize & f (x) \equiv c^{T} x \\ subject to & h_{j} (x) \equiv a_{j}^{T} x - b_{j} = 0 & for j = 1, \dots, m, \\ g_{i} (x) \equiv - x_{i} \leq 0 & for i = 1, \dots, n, \end{matrix}$

(14.15)

where a₁,…, a_m, c ∈ ℝⁿ and b₁,…, b_m ∈ ℝ. Since

$\begin{array}{rll} \nabla f (x) & = c, & \\ \nabla g_{i} (x) & = - e_{i} & \text{for } i = 1, \dots, n, \\ \nabla h_{j} (x) & = a_{j} & \text{for } j = 1, \dots, m, \end{array}$

where e_i denotes the i-th standard basis vector of ℝⁿ, the KKT conditions associated with (14.15) are given by

$\begin{array}{rcll} c - \sum_{i = 1}^{n} v_{i} e_{i} + \sum_{j = 1}^{m} w_{j} a_{j} & = & 0, & \\ x_{i} v_{i} & = & 0 & \text{for } i = 1, \dots, n, \\ v_{i} & \geq & 0 & \text{for } i = 1, \dots, n, \\ a_{j}^{T} x & = & b_{j} & \text{for } j = 1, \dots, m, \\ x_{i} & \geq & 0 & \text{for } i = 1, \dots, n . \end{array}$

The above system can be written more compactly as follows:

$\begin{matrix} A x & = & b, & x & \geq 0, & (a) \\ A^{T} w + c & = & v, & v & \geq 0, & (b) \\ x^{T} v & = & 0, & (c) \end{matrix}$

where A is an m × n matrix whose j-th row is a_j, where j = 1,…, m. Readers who are familiar with the theory of linear programming will immediately recognize that (a) is primal feasibility, (b) is dual feasibility, and (c) is complementarity. In particular, when we apply Theorem 14.2.4 to Problem (14.15), we obtain the strong duality theorem of linear programming.
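Conditions (a)–(c) are easy to verify mechanically for a candidate pair (x, w). The sketch below does so for a tiny hypothetical LP (minimize x₁ + 2x₂ subject to x₁ + x₂ = 1, x ≥ 0); the data and the helper name `kkt_check` are illustrative assumptions, and the sign convention v = Aᵀw + c follows condition (b) above.

```python
def kkt_check(A, b, c, x, w, tol=1e-9):
    """Check conditions (a)-(c) for the LP (14.15) at the pair (x, w)."""
    m, n = len(A), len(c)
    # (b): v = A^T w + c must be non-negative.
    v = [c[i] + sum(A[j][i] * w[j] for j in range(m)) for i in range(n)]
    dual_ok = all(vi >= -tol for vi in v)
    # (a): A x = b and x >= 0.
    primal_ok = (
        all(abs(sum(A[j][i] * x[i] for i in range(n)) - b[j]) <= tol
            for j in range(m))
        and all(xi >= -tol for xi in x)
    )
    # (c): x^T v = 0.
    comp_ok = abs(sum(xi * vi for xi, vi in zip(x, v))) <= tol
    return primal_ok and dual_ok and comp_ok

A, b, c = [[1.0, 1.0]], [1.0], [1.0, 2.0]

optimal = kkt_check(A, b, c, x=[1.0, 0.0], w=[-1.0])       # v = (0, 1)
not_optimal = kkt_check(A, b, c, x=[0.0, 1.0], w=[-2.0])   # v = (-1, 0), violates (b)

# At a KKT pair, c^T x = -b^T w, i.e. the primal and dual objective values coincide.
primal_value = sum(ci * xi for ci, xi in zip(c, [1.0, 0.0]))
dual_value = -sum(bj * wj for bj, wj in zip(b, [-1.0]))
print(optimal, not_optimal, primal_value, dual_value)
```

For the vertex x = (0, 1), complementarity forces w = −2, which makes v₁ negative; hence no multiplier certifies that vertex, in line with its larger objective value.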

14.2.3 Lagrangian Duality

Given an optimization problem $P$ (the primal problem), we can associate with it a dual problem whose properties are closely related to those of $P$ . To begin our investigation, consider the following primal problem:

$\begin{array}{llll} (P) & v_{p}^{*} = \inf & f (x) & \\ & \text{subject to} & g_{i} (x) \leq 0 & \text{for } i = 1, \dots, m_{1}, \\ & & h_{j} (x) = 0 & \text{for } j = 1, \dots, m_{2}, \\ & & x \in X . & \end{array}$

Here, f, $g_{1}, \dots, g_{m_{1}}$, $h_{1}, \dots, h_{m_{2}} : ℝ^{n} \to ℝ$ are arbitrary functions, and X is an arbitrary nonempty subset of ℝⁿ. For the sake of brevity, we shall write the first two sets of constraints in (P) as g(x) ≤ 0 and h(x) = 0, where $g : ℝ^{n} \to ℝ^{m_{1}}$ is given by $g (x) = (g_{1} (x), \dots, g_{m_{1}} (x))$ and $h : ℝ^{n} \to ℝ^{m_{2}}$ is given by $h (x) = (h_{1} (x), \dots, h_{m_{2}} (x))$.

Now, the Lagrangian dual problem associated with (P) is the following problem:

$\begin{array}{lll} (D) & v_{d}^{*} = \sup & θ (u, v) \equiv \inf_{x \in X} L (x, u, v) \\ & \text{subject to} & u \geq 0. \end{array}$

Here, $L : ℝ^{n} \times ℝ^{m_{1}} \times ℝ^{m_{2}} \to ℝ$ is the Lagrangian function given by

$L (x, u, v) = f (x) + \sum_{i = 1}^{m_{1}} u_{i} g_{i} (x) + \sum_{j = 1}^{m_{2}} v_{j} h_{j} (x) = f (x) + u^{T} g (x) + v^{T} h (x) .$

(14.16)

Observe that the above formulation can be viewed as a penalty function approach, in the sense that we incorporate the primal constraints g(x) ≤ 0 and h(x) = 0 into the objective function of (D) using the Lagrange multipliers u and v. Also, since the set X is arbitrary, there can be many different Lagrangian dual problems for the same primal problem, depending on which constraints are handled as g(x) ≤ 0 and h(x) = 0, and which constraints are treated by X. However, different choices of the Lagrangian dual problem will in general lead to different outcomes, both in terms of the dual optimal value as well as the computational efforts required to solve the dual problem.

Let us now investigate the relationship between (P) and (D). For any x̄ ∈ X and $(\bar{u}, \bar{v}) \in ℝ_{+}^{m_{1}} \times ℝ^{m_{2}}$ , we have

$\inf_{x \in X} L (x, \bar{u}, \bar{v}) \leq f (\bar{x}) + {\bar{u}}^{T} g (\bar{x}) + {\bar{v}}^{T} h (\bar{x}) \leq \sup_{u \geq 0} L (\bar{x}, u, v) .$

This implies that

$\sup_{u \geq 0} \inf_{x \in X} L (x, u, v) \leq \inf_{x \in X} \sup_{u \geq 0} L (x, u, v) .$

(14.17)

In particular, we have the following weak duality theorem, which asserts that the dual objective value is always a lower bound on the primal objective value:

Theorem 14.2.6. (Weak Duality) Let x̄ be feasible for (P) and (ū, v̄) be feasible for (D). Then, we have θ(ū, v̄) ≤ f(x̄). In particular, if $v_{d}^{*} = + \infty$, then (P) has no feasible solution.
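The max-min inequality (14.17) behind weak duality already holds for finite tables of numbers, which makes it easy to experiment with. In the sketch below, rows play the role of the dual variables and columns the role of x; the two tables are made-up illustrations.

```python
def max_min(table):
    """sup over rows of the inf over columns."""
    return max(min(row) for row in table)

def min_max(table):
    """inf over columns of the sup over rows."""
    return min(max(col) for col in zip(*table))

# A table with a strict gap: max-min = 2 < 3 = min-max.
gap_table = [[1, 3],
             [4, 2]]

# A table with a saddle entry (row 2, column 1): both sides equal 3.
saddle_table = [[1, 2],
                [3, 4]]

for table in (gap_table, saddle_table):
    print(max_min(table), min_max(table))
```

The first table is the finite analogue of a positive duality gap, and the second of a zero gap.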

Given the primal–dual pair of problems (P) and (D), the duality gap between them is defined as $Δ = v_{p}^{*} - v_{d}^{*}$. By Theorem 14.2.6, we always have Δ ≥ 0. It would be nice to have Δ = 0 (i.e., zero duality gap). However, as the following example shows, this is not true in general.

Example 14.2.5. Consider the following problem from [1, Example 6.2.2]:

$\begin{array}{l} minimize & f (x) \equiv - 2 x_{1} + x_{2} \\ subject to & h (x) \equiv x_{1} + x_{2} - 3 = 0, \\ x \in X, \end{array}$

(14.18)

where X ⊆ ℝ² is the following discrete set:

$X = \{(0, 0), (0, 4), (4, 4), (4, 0), (1, 2), (2, 1)\} .$

By enumeration, we see that the optimal value of (14.18) is −3, attained at the point (x₁, x₂) = (2, 1). Now, one can verify that the Lagrangian dual function is given by

$θ (v) = \min_{x \in X} \{- 2 x_{1} + x_{2} + v (x_{1} + x_{2} - 3)\} = \begin{cases} - 4 + 5 v & \text{for } v \leq - 1, \\ - 8 + v & \text{for } - 1 \leq v \leq 2, \\ - 3 v & \text{for } v \geq 2. \end{cases}$

It follows that max_v θ(v) = −6, which is attained at v = 2. Note that the duality gap in this example is Δ = −3 − (−6) = 3 > 0.
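Because X is finite, every quantity in this example can be reproduced by brute force. The sketch below enumerates X to obtain the primal value and evaluates θ on a grid of multipliers; the grid range and resolution are arbitrary choices made for illustration.

```python
X = [(0, 0), (0, 4), (4, 4), (4, 0), (1, 2), (2, 1)]

def f(x):
    return -2 * x[0] + x[1]

def h(x):
    return x[0] + x[1] - 3

def theta(v):
    """Dual function: minimize the Lagrangian f(x) + v * h(x) over the finite set X."""
    return min(f(x) + v * h(x) for x in X)

# Primal optimal value: minimize f over the points of X satisfying h(x) = 0.
primal_value = min(f(x) for x in X if h(x) == 0)

# Dual optimal value, approximated by a grid search over v in [-5, 5].
grid = [k / 100 for k in range(-500, 501)]
v_best = max(grid, key=theta)
dual_value = theta(v_best)

gap = primal_value - dual_value
print(primal_value, v_best, dual_value, gap)
```

Each term f(x) + v·h(x) is affine in v, so θ is a pointwise minimum of affine functions and hence concave and piecewise linear, which is why the maximum sits at a breakpoint.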

The above example raises the important question of when the duality gap is zero. It turns out that there is a relatively simple answer to this question. Before we proceed, let us introduce the following definition:

Definition 14.2.1. We say that (x̄, ū, v̄) is a saddle point of the Lagrangian function L defined in (14.16) if the following conditions are satisfied:

1. x̄ ∈ X,

2. ū ≥ 0, and

3. for all x ∈ X and $(u, v) \in ℝ_{+}^{m_{1}} \times ℝ^{m_{2}}$, we have

$L (\bar{x}, u, v) \leq L (\bar{x}, \bar{u}, \bar{v}) \leq L (x, \bar{u}, \bar{v}) .$

In particular, observe that (x̄, ū, v̄) is a saddle point of L if x̄ minimizes L over X when (u, v) is fixed at (ū, v̄), and (ū, v̄) maximizes L over all $(u, v) \in ℝ^{m_{1}} \times ℝ^{m_{2}}$ with u ≥ 0 when x is fixed at x̄.

We are now ready to state the following theorem:

Theorem 14.2.7. (Saddle Point Optimality Conditions) The point (x̄, ū, v̄) with x̄ ∈ X and ū ≥ 0 is a saddle point of L iff

1. L(x̄, ū, v̄) = min_{x∈X} L(x, ū, v̄),

2. g(x̄) ≤ 0 and h(x̄) = 0, and

3. ū^Tg(x̄) = 0.

Moreover, the point (x̄, ū, v̄) is a saddle point of L iff x̄ and (ū, v̄) are the optimal solutions to (P) and (D), respectively, with f (x̄) = θ(ū, v̄), i.e., there is no duality gap.

In other words, the existence of a saddle point (x̄, ū, v̄) of L implies that

$\inf_{x \in X} L (x, \bar{u}, \bar{v}) = L (\bar{x}, \bar{u}, \bar{v}) = \sup_{u \geq 0} L (\bar{x}, u, v),$

which in turn implies that

$\sup_{u \geq 0} \inf_{x \in X} L (x, u, v) = \inf_{x \in X} \sup_{u \geq 0} L (x, u, v),$

i.e., inequality (14.17) holds with equality, and $υ_{p}^{*} = υ_{d}^{*}$ .

Now, if we want to apply Theorem 14.2.7 to certify that the duality gap between (P) and (D) is zero, we need to produce a saddle point of the Lagrangian function L, which is not always an easy task. The following theorem, which is an application of Sion’s minimax theorem [7] (see [8] for an elementary proof), provides an easy-to-check sufficient condition for certifying zero duality gap.

Theorem 14.2.8. Let L be the Lagrangian function defined in (14.16). Suppose that

1. X is a compact convex subset of ℝⁿ,

2. (u, v) ↦ L (x, u, v) is continuous and concave on $ℝ_{+}^{m_{1}} \times ℝ^{m_{2}}$ for each x ∈ X, and

3. x ↦ L (x, u, v) is continuous and convex on X for each $(u, v) \in ℝ_{+}^{m_{1}} \times ℝ^{m_{2}}$ .

Then, we have

$\sup_{u \geq 0} \inf_{x \in X} L (x, u, v) = \inf_{x \in X} \sup_{u \geq 0} L (x, u, v) .$

Let us now illustrate some of the above results with an example.

Example 14.2.6. (Semidefinite Programming) Consider the following standard form semidefinite programming (SDP) problem:

$\begin{array}{ll} \inf & f (Z) \equiv tr (C Z) \\ \text{subject to} & h_{j} (Z) \equiv b_{j} - tr (A_{j} Z) = 0 \quad \text{for } j = 1, \dots, m, \\ & Z \in X \equiv S_{+}^{n}, \end{array}$

(14.19)

where C, A₁,…, A_m ∈ ℝ^{n × n} are symmetric matrices, b₁,…, b_m ∈ ℝ and $S_{+}^{n}$ is the set of n × n symmetric positive semidefinite matrices. The Lagrangian dual associated with (14.19) is given by

$\sup_{v \in ℝ^{m}} θ (v) \equiv \inf_{Z \in S_{+}^{n}} \left\{ tr (C Z) + \sum_{j = 1}^{m} v_{j} (b_{j} - tr (A_{j} Z)) \right\} .$

(14.20)

Now, for any fixed v ∈ ℝ^m, we have

$θ (v) = \begin{cases} b^{T} v & \text{if } C - \sum_{j = 1}^{m} v_{j} A_{j} \in S_{+}^{n}, \\ - \infty & \text{otherwise} . \end{cases}$

(14.21)

To see this, let $U Λ U^{T}$ be the spectral decomposition of $C - \sum_{j = 1}^{m} v_{j} A_{j}$ , and suppose that Λ_ii < 0 for some i ∈ {1,…, n}. Consider the matrix $Z (α) = α U e_{i} e_{i}^{T} U^{T}$ . Clearly, we have $Z (α) \in S_{+}^{n}$ for all α > 0. Moreover, as α → ∞, we have

$\begin{array}{l} tr ((C - \sum_{j = 1}^{m} v_{j} A_{j}) Z (α)) = α \cdot tr ((U Λ U^{T}) (U e_{i} e_{i}^{T} U^{T})) \\ = α \cdot tr (Λ e_{i} e_{i}^{T}) = α Λ_{i i} \to - \infty, \end{array}$

whence

$θ (v) = b^{T} v + \inf_{Z \in S_{+}^{n}} tr ((C - \sum_{j = 1}^{m} v_{j} A_{j}) Z) = - \infty .$

On the other hand, if $C - \sum_{j = 1}^{m} v_{j} A_{j} \in S_{+}^{n}$ , then we have $tr ((C - \sum_{j = 1}^{m} v_{j} A_{j}) Z) \geq 0$ for any $Z \in S_{+}^{n}$ . It follows that θ(v) = b^Tv in this case (by taking, say, Z = 0).

Now, using (14.21), we see that (14.20) is equivalent to

$\begin{array}{l} \sup & b^{T} v \\ subject to & C - \sum_{j = 1}^{m} v_{j} A_{j} \in S_{+}^{n}, \end{array}$

(14.22)

which is known as a dual standard form SDP.
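The two cases in (14.21) can be illustrated numerically. The sketch below assumes NumPy and uses small hypothetical data (`C`, `A_list`, `b`, and the two multipliers are chosen only for illustration). The helper names `dual_slack` and `unbounded_direction` are not from the text; the second one returns the eigenpair that plays the role of Λ_ii and U e_i in the argument above.

```python
import numpy as np

def dual_slack(C, A_list, v):
    """Return the matrix C - sum_j v_j A_j that appears in (14.21)."""
    return C - sum(vj * Aj for vj, Aj in zip(v, A_list))

def unbounded_direction(M, tol=1e-10):
    """If M has a negative eigenvalue, return (eigenvalue, unit eigenvector); else None."""
    eigvals, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    if eigvals[0] < -tol:
        return eigvals[0], eigvecs[:, 0]
    return None

# Hypothetical 2x2 data, for illustration only.
C = np.array([[1.0, 0.0], [0.0, 2.0]])
A_list = [np.eye(2)]
b = np.array([1.0])

# Case 1: C - 1.5*I has the eigenvalue -0.5, so theta(v) = -infinity.
v_bad = np.array([1.5])
M = dual_slack(C, A_list, v_bad)
lam, q = unbounded_direction(M)
# Along Z(alpha) = alpha * q q^T, the term tr(M Z(alpha)) equals alpha * lam.
values = [np.trace(M @ (alpha * np.outer(q, q))) for alpha in (1.0, 10.0, 100.0)]

# Case 2: C - 0.5*I is positive semidefinite, so theta(v) = b^T v.
v_good = np.array([0.5])
is_psd = unbounded_direction(dual_slack(C, A_list, v_good)) is None
theta_good = b @ v_good if is_psd else -np.inf
print(values, theta_good)
```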

14.3 Application Examples

In the past decade optimization techniques, especially convex optimization techniques, have been widely used in various engineering fields such as industrial engineering, mechanical engineering, and electrical engineering. For electrical engineering in particular, optimization techniques have been applied to solve problems in communications [9, 10, 11, 12, 13, 14], networking [15, 16, 17, 18, 19], signal processing [20, 22], and even circuit design [23]. In this section, we briefly go through several examples in communications, networking, and signal processing to illustrate how we could apply the results introduced in the previous section to solve real-world problems.

Example 14.3.1. (Power Allocation Optimization in Parallel AWGN Channels) Consider the transmission over n parallel AWGN channels. The ith channel, i ∈ {1,…, n}, is characterized by the channel power gain, h_i ≥ 0, and the additive Gaussian noise power, σ_i > 0. Let the transmit power allocated to the ith channel be denoted by p_i ≥ 0. The maximum information rate that can be reliably transmitted over the ith channel is given by [24]

$r_{i} = \log (1 + \frac{h_{i} p_{i}}{σ_{i}}) .$

(14.23)

Given a constraint P on the total transmit power over n channels, i.e., $\sum_{i = 1}^{n} p_{i} \leq P$ , we want to optimize the allocated power p₁,…,p_n such that the sum rate of n channels, $\sum_{i = 1}^{n} r_{i}$ , is maximized. This problem is thus formulated as

$\begin{array}{l} maximize & \sum_{i = 1}^{n} \log (1 + \frac{h_{i} p_{i}}{σ_{i}}) \\ subject to & \sum_{i = 1}^{n} p_{i} \leq P, \\ p_{i} \geq 0 for i = 1, \dots, n . \end{array}$

(14.24)

For convenience, we rewrite the above problem equivalently as

$\begin{array}{ll} \text{minimize} & f (p) \equiv - \sum_{i = 1}^{n} \log (1 + \frac{h_{i} p_{i}}{σ_{i}}) \\ \text{subject to} & h (p) \equiv \sum_{i = 1}^{n} p_{i} - P \leq 0, \\ & g_{i} (p) \equiv - p_{i} \leq 0 \quad \text{for } i = 1, \dots, n, \\ & p \in ℝ^{n}, \end{array}$

(14.25)

where p = [p₁,…,p_n]^T. It is easy to verify that f(p) is convex, and h(p), g₁(p),…,g_n(p) are all affine and thus convex. According to Theorem 14.2.5, if we can find a feasible solution p̄ = [p̄₁,…, p̄_n]^T ∈ ℝⁿ for the above constrained minimization problem as well as multipliers u ≥ 0 and v_i ≥ 0, i = 1,…, n, such that the following KKT conditions are satisfied,

$\begin{matrix} \nabla f (\bar{p}) + u \nabla h (\bar{p}) + \sum_{i = 1}^{n} v_{i} \nabla g_{i} (\bar{p}) & = & 0, & (a) \\ u h (\bar{p}) & = & 0, & (b) \\ v_{i} g_{i} (\bar{p}) & = & 0, & for i = 1, \dots, n, & (c) \end{matrix}$

(14.26)

then we can claim that p̄ is a global minimum for this problem. Suppose that u > 0. From (b), it follows that h(p̄) = 0, i.e., $\sum_{i = 1}^{n} {\bar{p}}_{i} = P$ . From (a), it follows that

${\bar{p}}_{i} = \frac{1}{u - v_{i}} - \frac{σ_{i}}{h_{i}} for i = 1, \dots, n .$

(14.27)
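For readers who want the intermediate step, the passage from (a) to (14.27) can be sketched as follows, under the additional assumption that h_i > 0 (so that dividing by h_i is allowed). Since ∂h/∂p_i = 1 and ∂g_i/∂p_i = −1, the ith component of (a) reads

$- \frac{h_{i}}{σ_{i} + h_{i} {\bar{p}}_{i}} + u - v_{i} = 0 \quad \Longleftrightarrow \quad σ_{i} + h_{i} {\bar{p}}_{i} = \frac{h_{i}}{u - v_{i}},$

and dividing both sides of the second relation by h_i and rearranging gives (14.27).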

Suppose that p̄_i > 0. From (c), it follows that v_i = 0. Then from (14.27), it follows that ${\bar{p}}_{i} = \frac{1}{u} - \frac{σ_{i}}{h_{i}} > 0$ . Clearly, if this inequality holds, the corresponding p̄_i will satisfy both (a) and (c). Otherwise, the assumption p̄_i > 0 cannot be true, and the only feasible value for p̄_i is p̄_i = 0. In this case, since $\frac{1}{u} - \frac{σ_{i}}{h_{i}} \leq 0$ , we can always find a v_i ≥ 0 such that p̄_i = 0 holds in (14.27). To summarize, for any u > 0, the values of p̄_i that satisfy both (a) and (c) are given by

${\bar{p}}_{i} = {(\frac{1}{u} - \frac{σ_{i}}{h_{i}})}^{+} for i = 1, \dots, n,$

(14.28)

where (x)^+ = max(0, x) for x ∈ ℝ. Furthermore, recall that this set of p̄_i’s needs to satisfy $\sum_{i = 1}^{n} {\bar{p}}_{i} = P$ , i.e.,

$\sum_{i = 1}^{n} {(\frac{1}{u} - \frac{σ_{i}}{h_{i}})}^{+} = P .$

(14.29)

Note that for any P > 0, the above equation has a unique positive root u (which can be found numerically by a simple bisection search over the interval 0 < u < max_i(h_i/σ_i)). With this root u, the corresponding p̄_i’s given in (14.28) satisfy all the KKT conditions in (a), (b), and (c), and are thus globally optimal for Problem (14.25). It is worth noting that the structure of the optimal power allocation in (14.28) is known as the “water-filling” solution [24].
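The bisection search mentioned above can be sketched as follows. This is an illustrative implementation rather than a prescribed one: it assumes NumPy, strictly positive gains h_i (so that σ_i/h_i is finite), and P > 0. The function name `water_filling`, the tolerance, and the way the lower end of the bracket is found are choices made here.

```python
import numpy as np

def water_filling(h, sigma, P, tol=1e-12, max_iter=200):
    """Solve (14.29) for u by bisection and return (p, u), with p as in (14.28).

    Assumes h_i > 0, sigma_i > 0 and P > 0.
    """
    h = np.asarray(h, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    inv_gain = sigma / h  # sigma_i / h_i

    def total_power(u):
        return np.maximum(0.0, 1.0 / u - inv_gain).sum()

    # total_power is nonincreasing in u, equals 0 at u = max_i(h_i / sigma_i),
    # and grows without bound as u -> 0+.
    hi = np.max(h / sigma)
    lo = hi
    while total_power(lo) < P:  # halve lo until [lo, hi] brackets the root
        lo /= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if total_power(mid) > P:
            lo = mid  # too much power allocated: the root lies at a larger u
        else:
            hi = mid
        if hi - lo < tol:
            break
    u = 0.5 * (lo + hi)
    return np.maximum(0.0, 1.0 / u - inv_gain), u

# Hypothetical data: three channels with unit gain and noise powers 1, 2, 4.
p, u = water_filling(h=[1.0, 1.0, 1.0], sigma=[1.0, 2.0, 4.0], P=3.0)
print(p, u)
```

In this instance the "water level" 1/u settles at 3, so the noisiest channel (σ = 4) receives no power, which is the characteristic behavior of (14.28).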

Example 14.3.2. (Transmit Optimization for MIMO AWGN Channels with Per-Antenna Power Constraints) Consider the transmission over a MIMO AWGN channel with n transmitting antennas and m receiving antennas. The propagation channel from the transmitter to the receiver is represented by a real matrix, H ∈ ℝ^{m × n}, in which all the columns are assumed to be “non-empty”, i.e., each column contains at least one non-zero element. The additive noises at the m receiving antennas are assumed to be i.i.d. Gaussian random variables with zero mean and unit variance. The transmit signals from the ith antenna, i ∈ {1,…, n}, are denoted by x_i(t) ∈ ℝ, t = 0, 1,…, which are subject to a per-antenna average power constraint P_i, i.e., E {(x_i(t))² } ≤ P_i, where E {·} denotes the expectation. Let $Z \in S_{+}^{n}$ denote the transmit covariance matrix, i.e., Z = E {x(t) (x(t))^T}, where x(t) = [x₁(t),…,x_n(t)]^T. The set of per-antenna transmit power constraints can then be expressed as

$tr (A_{i} Z) \leq P_{i} for i = 1, \dots, n,$

(14.30)

where A_i ∈ ℝ^{n × n} is a matrix with all zero elements except for the ith diagonal element, which is equal to one.

For any transmit covariance matrix $Z \in S_{+}^{n}$ , the maximum transmit rate over the MIMO AWGN channel is given by [25]

$r = \log \det (I + H Z H^{T}),$

(14.31)

where I denotes an identity matrix. The problem of our interest here is to maximize the rate r over $Z \in S_{+}^{n}$ subject to the set of per-antenna transmit power constraints, which can be equivalently formulated as

$\begin{array}{lll} v_{p}^{*} = & \text{minimize} & f (Z) \equiv - \log \det (I + H Z H^{T}) \\ & \text{subject to} & g_{i} (Z) \equiv tr (A_{i} Z) - P_{i} \leq 0 \quad \text{for } i = 1, \dots, n, \\ & & Z \in S_{+}^{n} . \end{array}$

(14.32)

In the following, we apply the Lagrangian duality to solve the above problem. The Lagrangian function for this problem is given by

$L (Z, u) = f (Z) + \sum_{i = 1}^{n} u_{i} g_{i} (Z) = - \log \det (I + H Z H^{T}) + \sum_{i = 1}^{n} u_{i} (tr (A_{i} Z) - P_{i}),$

(14.33)

where $u = {[u_{1}, \dots, u_{n}]}^{T} \in ℝ_{+}^{n}$ . The Lagrangian dual problem associated with problem (14.32) is then given by

$\begin{array}{lll} v_{d}^{*} = & \text{maximize} & θ (u) \equiv \min_{Z \in S_{+}^{n}} L (Z, u) \\ & \text{subject to} & u \geq 0. \end{array}$

(14.34)

It can be verified that the conditions listed in Theorem 14.2.8 are all satisfied for the Lagrangian function L(Z, u) given in (14.33). We thus conclude that v*_p = v*_d, i.e., the duality gap for Problem (14.32) is zero. Accordingly, we can solve this problem equivalently by solving its dual problem (14.34), as shown next.

First, we solve the minimization problem in (14.34) to obtain the dual function θ(u) for any given u ≥ 0. Observe that θ(u) can be explicitly written as

$θ (u) = \min_{Z \in S_{+}^{n}} - \log \det (I + H Z H^{T}) + tr (A_{u} Z) - \sum_{i = 1}^{n} u_{i} P_{i}$

(14.35)

where $A_{u} = \sum_{i = 1}^{n} u_{i} A_{i}$ is a diagonal matrix with the ith diagonal element equal to u_i, i = 1,…,n. Note that for the minimization problem in the above, the optimal solution for Z is independent of the term $\sum_{i = 1}^{n} u_{i} P_{i}$ , which thus can be ignored. To solve this minimization problem, we first observe that if any diagonal element in A_u, say, u_i, i ∈ {1,…,n}, is equal to zero, then the minimum value for this problem becomes −∞, which is attained by, e.g., taking $Z = α 1_{i} 1_{i}^{T}$ , where 1_i denotes an n × 1 vector with all zero elements except for the ith element being one, and letting α → ∞. Next, we consider the case where all u_i’s are greater than zero. In this case, A_u is full-rank and thus its inverse exists. By defining a new variable $\bar{Z} = A_{u}^{1 / 2} Z A_{u}^{1 / 2} \in S_{+}^{n}$ and using the fact that tr(AB) = tr(BA), the minimization problem in (14.35) can be rewritten as

$\min_{\bar{Z} \in S_{+}^{n}} - \log \det (I + H A_{u}^{- 1 / 2} \bar{Z} A_{u}^{- 1 / 2} H^{T}) + tr (\bar{Z}) .$

(14.36)

Let the SVD of $H A_{u}^{- 1 / 2}$ be denoted by

$H A_{u}^{- 1 / 2} = U Λ V^{T},$

(14.37)

where U ∈ ℝ^{m × m} and V ∈ ℝ^{n × n} are unitary matrices, and Λ ∈ ℝ^{m × n} is a diagonal matrix with the diagonal elements being denoted by λ₁,…,λ_k, k = min(m, n), and λ_i ≥ 0, i = 1,…, k. Substituting (14.37) into (14.36) and using the fact that log det (I + AB) = log det (I + BA) yield

$\min_{\bar{Z} \in S_{+}^{n}} - \log \det (I + Λ V^{T} \bar{Z} V Λ^{T}) + tr (\bar{Z}) .$

(14.38)

By letting Ẑ = V^TZ̄V and using the fact that tr(Ẑ) = tr(Z̄), we obtain an equivalent problem of (14.38) as

$\min_{\hat{Z} \in S_{+}^{n}} - \log \det (I + Λ \hat{Z} Λ^{T}) + tr (\hat{Z}) .$

(14.39)

Recall Hadamard’s inequality [24], which states that for any $X \in S_{+}^{m}$ , $\det (X) \leq \prod_{i = 1}^{m} X_{i i}$ , with equality when X is a diagonal matrix, where X_ii denotes the ith diagonal element of X, i = 1,…, m. Applying this result to Problem (14.39), it follows that the minimum value for this problem is attained by a diagonal matrix Ẑ. Let the diagonal elements of Ẑ be denoted by p₁,…,p_n. Since $\hat{Z} \in S_{+}^{n}$ , Problem (14.39) can be simplified as

$\begin{array}{l} minimize & - \sum_{i = 1}^{n} \log (1 + λ_{i}^{2} p_{i}) + \sum_{i = 1}^{n} p_{i} \\ subject to & p_{i} \geq 0 for i = 1, \dots, n . \end{array}$

(14.40)

Note that in the above problem, for convenience we have assumed that λ_i = 0 for i = k + 1,…, n. Similar to Example 14.3.1, the global minimum for the above problem can be shown to be the following water-filling solution:

$p_{i} = {(1 - \frac{1}{λ_{i}^{2}})}^{+} \quad \text{for } i = 1, \dots, n .$

(14.41)

To summarize, for any given u > 0, the optimal solution for the minimization problem in (14.35) is given by

$Z_{u} = A_{u}^{- 1 / 2} V \hat{Z} V^{T} A_{u}^{- 1 / 2},$

(14.42)

where $\hat{Z}$ is a diagonal matrix with the diagonal elements given in (14.41). Moreover, the dual function θ(u) in (14.35) can be simplified to be

$θ (u) = \begin{cases} - \sum_{i = 1}^{k} {(\log (λ_{i}^{2}))}^{+} + \sum_{i = 1}^{k} {(1 - \frac{1}{λ_{i}^{2}})}^{+} - \sum_{i = 1}^{n} u_{i} P_{i} & \text{if } u > 0, \\ - \infty & \text{otherwise}, \end{cases}$

(14.43)

where λ₁,…, λ_k are related to u via (14.37).
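The chain (14.37), (14.41), (14.42), (14.43) translates almost line by line into code. The following sketch assumes NumPy and a strictly positive multiplier vector u. The function names and the test data (`H`, `u`, `P`) are hypothetical. As a sanity check, it also evaluates the Lagrangian (14.33) at Z_u, which should agree with the closed form (14.43).

```python
import numpy as np

def dual_minimizer(H, u):
    """Return (Z_u, p, lam2) for u > 0, following (14.37), (14.41) and (14.42)."""
    H = np.asarray(H, dtype=float)
    u = np.asarray(u, dtype=float)
    n = H.shape[1]
    inv_sqrt = np.diag(1.0 / np.sqrt(u))                 # A_u^{-1/2}
    _, s, Vt = np.linalg.svd(H @ inv_sqrt, full_matrices=True)
    lam2 = np.zeros(n)
    lam2[: s.size] = s ** 2                              # lambda_i^2, padded with zeros
    p = 1.0 - 1.0 / np.maximum(lam2, 1.0)                # (1 - 1/lambda_i^2)^+
    Z_u = inv_sqrt @ Vt.T @ np.diag(p) @ Vt @ inv_sqrt   # (14.42)
    return Z_u, p, lam2

def dual_function(H, u, P):
    """Closed-form theta(u) from (14.43), valid for u > 0."""
    Z_u, p, lam2 = dual_minimizer(H, u)
    theta = -np.sum(np.log(np.maximum(lam2, 1.0))) + p.sum() - np.dot(u, P)
    return theta, Z_u

def lagrangian(H, Z, u, P):
    """Evaluate L(Z, u) in (14.33); note that tr(A_i Z) is the ith diagonal entry of Z."""
    m = H.shape[0]
    _, logdet = np.linalg.slogdet(np.eye(m) + H @ Z @ H.T)
    return -logdet + np.dot(u, np.diag(Z) - P)

# Hypothetical instance with m = 3 receive and n = 2 transmit antennas.
rng = np.random.default_rng(0)
H = rng.standard_normal((3, 2))
u = np.array([0.4, 0.7])
P = np.array([1.0, 2.0])
theta, Z_u = dual_function(H, u, P)
print(theta, lagrangian(H, Z_u, u, P))  # the two values should agree
```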

Next, we solve the dual problem (14.34) by maximizing the dual function θ(u) in (14.43) over u ≥ 0. The corresponding dual optimal solution u then leads to the optimal solution Z_u in (14.42) for the primal problem (14.32). Since v*_d = v*_p is finite, in fact we only need to consider the maximization of θ(u) over u > 0 in (14.43). However, due to the coupled structure of the λ_i’s and u_i’s shown in (14.37), it is not evident whether θ(u) in (14.43) is differentiable with respect to the u_i’s for u > 0. As a result, conventional descent methods for finding the global minimum of differentiable convex functions, such as Newton’s method, are ineffective for our problem at hand. Thus, we resort to an alternative method, known as the subgradient-based method, to handle the possibly non-differentiable function θ(u). First, we introduce the definition of a subgradient for an arbitrary real-valued function z(x) defined over a nonempty convex set S ⊆ ℝⁿ. We assume that z(x) has a finite maximum. However, z(x) need not be continuously differentiable nor have an analytical expression for its differential. In this case, a vector v ∈ ℝⁿ is called a subgradient of z(x) at the point x = x₀ if for any x ∈ S, the following inequality holds:

$z (x) \leq z (x_{0}) + v^{T} (x - x_{0}) .$

(14.44)

If at any point x ∈ S a corresponding subgradient v for z(x) is attainable, then the maximum of z(x) can be found via an iterative search over x ∈ S based on v (see, e.g., the ellipsoid method [26]). Since θ(u) is defined over the convex set {u : u > 0} and has a finite maximum, the dual problem (14.34) can thus be solved by a subgradient-based method. Next, we show that a subgradient of θ(u) at any point u > 0 is given by [tr (A₁Z_u) − P₁,…, tr (A_nZ_u) − P_n]^T, where Z_u is given in (14.42). Suppose that at any two points u > 0 and u′ > 0, θ(u) and θ(u′) are attained by Z = Z_u and Z = Z_{u′}, respectively. Then, we have the following inequalities:

$\begin{array}{rl} θ (u^{'}) & = L (Z_{u^{'}}, u^{'}) \\ & = \min_{Z \in S_{+}^{n}} L (Z, u^{'}) \\ & \leq L (Z_{u}, u^{'}) \\ & = - \log \det (I + H Z_{u} H^{T}) + [tr (A_{1} Z_{u}) - P_{1}, \dots, tr (A_{n} Z_{u}) - P_{n}] u^{'} \\ & = - \log \det (I + H Z_{u} H^{T}) + [tr (A_{1} Z_{u}) - P_{1}, \dots, tr (A_{n} Z_{u}) - P_{n}] u \\ & \quad + [tr (A_{1} Z_{u}) - P_{1}, \dots, tr (A_{n} Z_{u}) - P_{n}] (u^{'} - u) \\ & = L (Z_{u}, u) + [tr (A_{1} Z_{u}) - P_{1}, \dots, tr (A_{n} Z_{u}) - P_{n}] (u^{'} - u) \\ & = θ (u) + [tr (A_{1} Z_{u}) - P_{1}, \dots, tr (A_{n} Z_{u}) - P_{n}] (u^{'} - u), \end{array}$

from which the subgradient of θ(u) follows.
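To see how this subgradient might be used in practice, here is a small sketch assuming NumPy. It evaluates θ(u) together with the vector [tr(A_i Z_u) − P_i] and then runs a plain projected subgradient ascent with step size 1/√k. Note that this is a simpler scheme swapped in for illustration and is not the ellipsoid method cited in the text. The step-size rule, the floor that keeps u strictly positive, and all names and data are assumptions.

```python
import numpy as np

def theta_and_subgradient(H, u, P):
    """Return theta(u) from (14.43) and the vector [tr(A_i Z_u) - P_i]_i, for u > 0."""
    n = H.shape[1]
    inv_sqrt = np.diag(1.0 / np.sqrt(u))                 # A_u^{-1/2}
    _, s, Vt = np.linalg.svd(H @ inv_sqrt, full_matrices=True)
    lam2 = np.zeros(n)
    lam2[: s.size] = s ** 2
    p = 1.0 - 1.0 / np.maximum(lam2, 1.0)                # water-filling (14.41)
    Z_u = inv_sqrt @ Vt.T @ np.diag(p) @ Vt @ inv_sqrt   # (14.42)
    theta = -np.sum(np.log(np.maximum(lam2, 1.0))) + p.sum() - np.dot(u, P)
    return theta, np.diag(Z_u) - P

def projected_subgradient_ascent(H, P, u0, steps=200, floor=1e-6):
    """Maximize theta over u > 0; returns the best value seen and its u."""
    u = np.array(u0, dtype=float)
    best_theta, best_u = -np.inf, u.copy()
    for k in range(1, steps + 1):
        theta, g = theta_and_subgradient(H, u, P)
        if theta > best_theta:
            best_theta, best_u = theta, u.copy()
        u = np.maximum(u + g / np.sqrt(k), floor)        # ascent step, then keep u > 0
    return best_theta, best_u

# Hypothetical instance.
rng = np.random.default_rng(3)
H = rng.standard_normal((3, 2))
P = np.array([1.0, 2.0])
best_theta, best_u = projected_subgradient_ascent(H, P, u0=np.ones(2))
print(best_theta, best_u)
```

By weak duality, the value returned can never exceed the primal objective at any feasible Z (for instance Z = diag(P₁,…,P_n)), which gives a cheap consistency check on the iteration.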

Last, we can verify that the optimal primal and dual solutions, Z_u given in (14.42) and the corresponding u > 0 satisfy (a) of the following KKT conditions:

$\begin{matrix} \nabla f (Z_{u}) + \sum_{i = 1}^{n} u_{i} \nabla g_{i} (Z_{u}) = 0, & (a) \\ u_{i} g_{i} (Z_{u}) = 0 \quad \text{for } i = 1, \dots, n, & (b) \end{matrix}$

(14.45)

while since u > 0, from (b) it follows that g_i(Z_u) = 0, i.e., tr(A_i Z_u) = P_i must hold for i = 1,…,n. Thus, all transmit antennas should transmit with their maximum power levels with the optimal transmit covariance matrix Z_u, which is consistent with the observation that the subgradient of the dual function θ(u) at the optimal dual solution of u should vanish to 0.

Example 14.3.3. (Power Efficient Beamforming in Two-Way Relay Network via SDP Relaxation) In this example, we illustrate how an originally nonconvex problem can be solved via convex techniques. As shown in Figure 14.1, we consider a two-way relay channel (TWRC) consisting of two source nodes, S1 and S2, each with a single antenna and a relay node, R, equipped with M antennas, M ≥ 2. It is assumed that the transmission protocol of TWRC uses two consecutive equal-duration time slots for one round of information exchange between S1 and S2 via R. During the first time slot, both S1 and S2 transmit concurrently to R, which linearly processes the received signal and then broadcasts the resulting signal to S1 and S2 during the second time slot. It is also assumed that perfect synchronization has been established among S1, S2, and R prior to data transmission. The received baseband signal at R in the first time slot is expressed as

$y_{R} (n) = h_{1} \sqrt{p_{1}} s_{1} (n) + h_{2} \sqrt{p_{2}} s_{2} (n) + z_{R} (n)$

(14.46)

where y_R(n) ∈ ℂ^M is the received signal vector at symbol index n, n = 1,…,N, with N denoting the total number of transmitted symbols during one time slot; h₁ ∈ ℂ^M and h₂ ∈ ℂ^M represent the channel vectors from S1 to R and from S2 to R, respectively, which are assumed to be constant during the two time slots; s₁(n) and s₂(n) are the transmitted symbols from S1 and S2, respectively, with E {|s₁(n)|²} = 1, E {|s₂(n)|²} = 1, and |·| denoting the absolute value of a complex number; p₁ and p₂ denote the transmit powers of S1 and S2, respectively; and z_R(n) ∈ ℂ^M is the receiver noise vector, independent over n. Without loss of generality, it is assumed that z_R(n) has a circularly symmetric complex Gaussian (CSCG) distribution with zero mean and identity covariance matrix, denoted by $z_{R} (n) \sim \mathcal{CN} (0, I)$ , ∀n. Upon receiving the mixed signal from S1 and S2, R processes it with the amplify-and-forward (AF) relay operation, also known as linear analogue relaying, and then broadcasts the processed signal to S1 and S2 during the second time slot. Mathematically, the linear processing (beamforming) operation at the relay can be concisely represented as

$x_{R} (n) = A y_{R} (n), n = 1, \dots, N$

(14.47)

Figure 14.1: The two-way multi antenna relay channel.

where x_R(n) ∈ ℂ^M is the transmitted signal at R, and A ∈ ℂ^{M × M} is the relay processing matrix.

Note that the transmit power of R can be shown to be equal to

$\begin{array}{l} p_{R} (A) = E [tr (x_{R} (n) x_{R}^{H} (n))] \\ = {‖ A h_{1} ‖}_{2}^{2} p_{1} + {‖ A h_{2} ‖}_{2}^{2} p_{2} + tr (A A^{H}) . \end{array}$

(14.48)

We can assume w.l.o.g. that channel reciprocity holds for TWRC during uplink and downlink transmissions, i.e., the channels from R to S1 and S2 during the second time slot are given as $h_{1}^{T}$ and $h_{2}^{T}$ , respectively. Thus, the received signals at S1 can be written as

$\begin{array}{l} y_{1} (n) = h_{1}^{T} x_{R} (n) + z_{1} (n) \\ = h_{1}^{T} A h_{1} \sqrt{p_{1}} s_{1} (n) + h_{1}^{T} A h_{2} \sqrt{p_{2}} s_{2} (n) + h_{1}^{T} A z_{R} (n) + z_{1} (n) \end{array}$

(14.49)

for n = 1,…, N, where z₁(n)’s are the independent receiver noise samples at S1, and it is assumed that $z_{1} (n) \sim \mathcal{CN} (0, 1)$ , ∀n. Note that on the right-hand side of (14.49), the first term is the self-interference of S1, while the second term contains the desired message from S2. Assuming that both $h_{1}^{T} A h_{1}$ and $h_{1}^{T} A h_{2}$ are perfectly known at S1 via training-based channel estimation prior to data transmission, S1 can first subtract its self-interference from y₁(n) and then coherently demodulate s₂(n). The above practice is known as analogue network coding (ANC). From (14.49), subtracting the self-interference from y₁(n) yields

${\tilde{y}}_{1} (n) = {\tilde{h}}_{21} \sqrt{p_{2}} s_{2} (n) + {\tilde{z}}_{1} (n), n = 1, \dots, N$

(14.50)

where ${\tilde{h}}_{21} = h_{1}^{T} A h_{2}$ , and ${\tilde{z}}_{1} (n) \sim \mathcal{CN} (0, {‖ A^{H} h_{1}^{*} ‖}_{2}^{2} + 1)$ , where * denotes the complex conjugate. From (14.50), for a given A, the maximum achievable SNR for the end-to-end link from S2 to S1 via R, denoted by γ₂₁, is given as

$γ_{21} = \frac{| h_{1}^{T} A h_{2} |^{2} p_{2}}{‖ A^{H} h_{1}^{*} ‖_{2}^{2} + 1}$

(14.51)

Similarly, it can be shown that the maximum SNR γ₁₂ for the link from S1 to S2 via R is given as

$γ_{12} = \frac{| h_{2}^{T} A h_{1} |^{2} p_{1}}{‖ A^{H} h_{2}^{*} ‖_{2}^{2} + 1} .$

(14.52)

Now we minimize the relay transmission power given in (14.48), under the constraints that the achievable SNRs γ₂₁ and γ₁₂ over the two directions are above two target values, γ̄₁ and γ̄₂. As such, the optimization can be formulated as

$\begin{array}{l} {minimize}_{A} & p_{R} : = ‖ A h_{1} ‖_{2}^{2} p_{1} + ‖ A h_{2} ‖_{2}^{2} p_{2} + tr (A A^{H}) \\ subject to & | h_{1}^{T} A h_{2} |^{2} \geq \frac{{\bar{γ}}_{1}}{p_{2}} ‖ A^{H} h_{1}^{*} ‖_{2}^{2} + \frac{{\bar{γ}}_{1}}{p_{2}} \\ | h_{2}^{T} A h_{1} |^{2} \geq \frac{{\bar{γ}}_{2}}{p_{1}} ‖ A^{H} h_{2}^{*} ‖_{2}^{2} + \frac{{\bar{γ}}_{2}}{p_{1}}, \end{array}$

(14.53)

For the convenience of analysis, we further modify the above problem as follows. First, let Vec(Q) be a K² × 1 vector associated with a K × K square matrix Q = [q₁,…, q_K ]^T, where q_k ∈ ℂ^K, k = 1,…, K, by the rule $Vec (Q) = {[q_{1}^{T}, \dots, q_{K}^{T}]}^{T}$ . Next, with b = Vec(A) and $Θ = p_{1} h_{1} h_{1}^{H} + p_{2} h_{2} h_{2}^{H} + I$ , we can express p_R in the objective function of (14.53) as $p_{R} = tr (A Θ A^{H}) = {‖ Φ b ‖}_{2}^{2}$ , with $Φ = {(diag (Θ^{T}, Θ^{T}))}^{\frac{1}{2}}$ , where diag(A, B) denotes a block-diagonal matrix with A and B as the diagonal square matrices. Similarly, let $f_{1} = Vec (h_{1} h_{2}^{T})$ and $f_{2} = Vec (h_{2} h_{1}^{T})$ . Then, from (14.53) it follows that ${| h_{1}^{T} A h_{2} |}^{2} = {| f_{1}^{T} b |}^{2}$ and ${| h_{2}^{T} A h_{1} |}^{2} = {| f_{2}^{T} b |}^{2}$ . Furthermore, by defining

$h_{i} = [\begin{matrix} h_{i} (1, 1) & 0 & h_{i} (2, 1) & 0 \\ 0 & h_{i} (1, 1) & 0 & h_{i} (2, 1) \end{matrix}], i = 1, 2,$

we have $‖ A^{H} h_{i}^{*} ‖_{2}^{2} = ‖ h_{i} b ‖_{2}^{2}$ , i = 1, 2. Using the above transformations, (14.53) can be rewritten as

$\begin{array}{l} {minimize}_{b} & p_{R} : = ‖ Φ b ‖_{2}^{2} \\ subject to & | f_{1}^{T} b |^{2} \geq \frac{{\bar{γ}}_{1}}{p_{2}} ‖ h_{1} b ‖_{2}^{2} + \frac{{\bar{γ}}_{1}}{p_{2}} \\ | f_{2}^{T} b |^{2} \geq \frac{{\bar{γ}}_{2}}{p_{1}} ‖ h_{2} b ‖_{2}^{2} + \frac{{\bar{γ}}_{2}}{p_{1}} . \end{array}$

(14.54)
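The three identities behind the passage from (14.53) to (14.54) are easy to check numerically. The sketch below assumes NumPy and M = 2 relay antennas (the case for which the 2 × 4 matrix above is written out). The random data, the names `B` and `H1`, and the use of a Kronecker product to build the block-diagonal and 2 × 4 matrices are choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
M = 2
h1 = rng.standard_normal(M) + 1j * rng.standard_normal(M)
h2 = rng.standard_normal(M) + 1j * rng.standard_normal(M)
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
p1, p2 = 1.5, 0.8

b = A.reshape(-1)  # Vec(A): the rows of A stacked into one vector
Theta = p1 * np.outer(h1, h1.conj()) + p2 * np.outer(h2, h2.conj()) + np.eye(M)

# Objective: p_R from (14.48) versus ||Phi b||^2 with Phi = diag(Theta^T, Theta^T)^(1/2).
p_R = (p1 * np.linalg.norm(A @ h1) ** 2 + p2 * np.linalg.norm(A @ h2) ** 2
       + np.trace(A @ A.conj().T).real)
B = np.kron(np.eye(M), Theta.T)                 # block-diagonal, Hermitian positive definite
w, Q = np.linalg.eigh(B)
Phi = Q @ np.diag(np.sqrt(w)) @ Q.conj().T      # Hermitian square root of B
p_R_vec = np.linalg.norm(Phi @ b) ** 2

# Signal term: h1^T A h2 = f1^T b with f1 = Vec(h1 h2^T) (plain transposes, no conjugation).
f1 = np.outer(h1, h2).reshape(-1)
signal_direct = h1 @ A @ h2
signal_vec = f1 @ b

# Noise term: ||A^H h1^*||^2 = ||H1 b||^2, where H1 is the 2 x 4 matrix built from h1.
H1 = np.kron(h1.reshape(1, -1), np.eye(M))
noise_direct = np.linalg.norm(A.conj().T @ h1.conj()) ** 2
noise_vec = np.linalg.norm(H1 @ b) ** 2

print(p_R, p_R_vec, signal_direct, signal_vec, noise_direct, noise_vec)
```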

The above problem can be shown to be still nonconvex. However, in the following, we show that the exact optimal solution could be obtained via a relaxed SDP problem.

We first define $E_{0} = Φ^{H} Φ$ , $E_{1} = \frac{p_{2}}{{\bar{γ}}_{1}} f_{1}^{*} f_{1}^{T} - h_{1}^{H} h_{1}$ , and $E_{2} = \frac{p_{1}}{{\bar{γ}}_{2}} f_{2}^{*} f_{2}^{T} - h_{2}^{H} h_{2}$ . Since standard semidefinite programming (SDP) formulations only involve real variables and constants, we introduce a new real matrix variable as X = [b_R; b_I] × [b_R; b_I]^T, where b_R = Re(b) and b_I = Im (b) are the real and imaginary parts of b, respectively. To rewrite the norm representations in (14.54) in terms of X, we need to rewrite E₀, E₁, and E₂ as expanded matrices F₀, F₁, and F₂, respectively, in terms of their real and imaginary parts. Specifically, to write out F₀, we first define the short notations Φ_R = Re(Φ) and Φ_I = Im(Φ); then we have

$F_{0} = [\begin{matrix} Φ_{R}^{T} Φ_{R} + Φ_{I}^{T} Φ_{I} & Φ_{I}^{T} Φ_{R} - Φ_{R}^{T} Φ_{I} \\ Φ_{R}^{T} Φ_{I} - Φ_{I}^{T} Φ_{R} & Φ_{R}^{T} Φ_{R} + Φ_{I}^{T} Φ_{I} \end{matrix}] .$

The expanded matrices F₁ and F₂ can be generated from E₁ and E₂ in a similar way, where the two terms in E₁ or E₂ could first be expanded separately then summed together.
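The real-valued expansion can be checked with a few lines of NumPy. In the sketch below, `real_expand` is a hypothetical helper that maps any complex Hermitian matrix E to a real symmetric matrix F with b^H E b = tr(F X). Applied to E₀ = Φ^HΦ, it reproduces the block formula for F₀ given above. The random Φ and b are for illustration only.

```python
import numpy as np

def real_expand(E):
    """Map a complex Hermitian E to the real symmetric F with b^H E b = x^T F x,
    where x = [Re(b); Im(b)]."""
    ER, EI = E.real, E.imag
    return np.block([[ER, -EI], [EI, ER]])

rng = np.random.default_rng(2)
K = 4
Phi = rng.standard_normal((K, K)) + 1j * rng.standard_normal((K, K))
b = rng.standard_normal(K) + 1j * rng.standard_normal(K)

E0 = Phi.conj().T @ Phi
F0 = real_expand(E0)

# The same matrix written out block by block, as in the displayed formula for F0.
PR, PI = Phi.real, Phi.imag
F0_blocks = np.block([
    [PR.T @ PR + PI.T @ PI, PI.T @ PR - PR.T @ PI],
    [PR.T @ PI - PI.T @ PR, PR.T @ PR + PI.T @ PI],
])

x = np.concatenate([b.real, b.imag])
X = np.outer(x, x)                       # rank-one real matrix variable

lhs = np.linalg.norm(Phi @ b) ** 2       # ||Phi b||^2
rhs = np.trace(F0 @ X)                   # tr(F0 X)
print(lhs, rhs)
```

The same helper can be applied to each Hermitian term of E₁ and E₂ separately, since the expansion is linear in E.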

As such, problem (14.54) can be equivalently rewritten as

$\begin{matrix} {minimize}_{X} & p_{R} : = tr (F_{0} X) \\ subject to & tr (F_{1} X) \geq 1, & tr (F_{2} X) \geq 1, X ≽ 0, \\ rank (X) = 1. \end{matrix}$

(14.55)

The above problem is still not convex given the last rank-one constraint. However, if we remove such a constraint, this problem is relaxed into a convex SDP problem as shown below.

$\begin{array}{l} {minimize}_{X} & p_{R} : = tr (F_{0} X) \\ subject to & tr (F_{1} X) \geq 1, tr (F_{2} X) \geq 1, X ≽ 0. \end{array}$

(14.56)

Given the convexity of the above SDP problem, the optimal solution could be efficiently found by various convex optimization methods. Note that SDP relaxation usually leads to an optimal X for problem (14.56) that is of rank r with r ≥ 1, which makes it impossible to reconstruct the exact optimal solution for Problem (14.54) when r > 1. A commonly adopted method in the literature to obtain a feasible rank-one (but in general suboptimal) solution from the solution of SDP relaxation is via “randomization” (see, e.g., [27] and references therein). Fortunately, we show in the following that with the special structure in Problem (14.56), we could efficiently reconstruct an optimal rank-one solution from its optimal solution that could be of rank r with r > 1, based on some elegant results derived for SDP relaxation in [28]. In other words, we could obtain the exact optimal solution for the nonconvex problem in (14.55) without losing any optimality, and as efficiently as solving a convex problem.

Theorem 14.3.1. Assume that an optimal solution X* of rank r > 1 has been found for Problem (14.56). Then, we can efficiently construct another feasible optimal solution X** of rank one, i.e., X** is an optimal solution for both (14.55) and (14.56).

Proof: Please refer to [21].

Note that the above proof is constructive, and it readily yields a routine for obtaining an optimal rank-one solution for Problem (14.55) from X*. We can then map this solution back to obtain an optimal solution for the problem in (14.53).

14.4 Exercises

Exercise 14.4.1. Please indicate whether the following sets are convex or not.

1. $\{x : \frac{a^{T} x - b}{c^{T} x + d} \leq 1; \; c^{T} x + d < 0\}$ ;

2. {x : Ax = b, ‖x‖₂ = 1};

3. {X : X₁₁a₀ + X₂₂a₁ ⪰ 0, a₀ ∈ Sⁿ, a₁ ∈ Sⁿ}; (X_ij stands for the ijth element in matrix X)

4. {X : a^TXa = 1}

Exercise 14.4.2. Please indicate whether the following functions are convex or concave or neither.

1. $f (x) = \sup_{w} {\log \sum_{i = 1}^{n} e^{\frac{x_{i}}{w}}}$ ;

2. f (x) = −(x₁x₂x₃)^{1/3}, x > 0;

3. f (X) = log det (A^TXA), X ≻ 0;

4. $f (x) = x^{T} A x+2x - 5, A = [\begin{matrix} 0 & 1 \\ 1 & 0 \end{matrix}]$ .

Exercise 14.4.3. With the following problem formulation, answer the followup questions.

$\begin{array}{l} \begin{array}{l} minimize & - (x_{1} + x_{2}) \\ subject to & {‖ a_{1} x ‖}_{2} \leq 1, \\ {‖ a_{2} x - b_{2} ‖}_{2} \leq 1, \end{array} \\ where x = {[x_{1}, x_{2}]}^{T}, a_{1} = a_{2} = [\begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix}], and b_{2} = {[1, 0]}^{T} . \end{array}$

1. Is this problem convex?

2. Does Slater’s constraint condition hold?

3. What is the optimal solution for this problem? (Hint: Try to solve this problem graphically if the KKT conditions are hard to solve.)

4. What is the optimal objective value for the dual problem?

5. What is the optimal value for the dual variable associated with the second constraint?

Exercise 14.4.4. Given the optimization problem shown in Exercise 14.4.3, please reformulate it as a semidefinite programming (SDP) problem, then derive the dual problem of the resulting SDP problem.

Exercise 14.4.5. With the following optimization problem, answer the followup questions.

$\begin{array}{l} {maximize}_{P} & \sum_{i = 1}^{n} \log (1 + \frac{P_{i}}{δ_{i}}) \\ subject to & \sum_{i = 1}^{n} P_{i} = P_{t o t a l}, \\ P \geq 0, \end{array}$

where P = [P₁,…, P_n]^T, and δ_i > 0, i = 1,…, n.

1. Is KKT sufficient for us to get the optimal solution for the above problem?

2. Is KKT necessary for the optimal solution?

3. Please write out the KKT conditions for this problem.

4. Please solve the general form of optimal P_i’s.

5. If n = 3, δ₁ = 2, δ₂ = 10, δ₃ = 5, and P_total = 10, what are the optimal P_i values?

Exercise 14.4.6. Let 1 < m < n be integers, and let A be an m × n matrix with full row rank. Furthermore, let c ∈ ℝⁿ and $Q \in S_{+ +}^{n}$ be given. Consider the following optimization problem:

$\begin{matrix} minimize & \frac{1}{2} x^{T} Qx + c^{T} x \\ subject to & Ax = 0. \end{matrix}$

(14.57)

1. Explain why the KKT conditions are necessary and sufficient for (14.57).

2. Write down the KKT conditions associated with (14.57). Hence, express the optimal solution to (14.57) in closed form.

Exercise 14.4.7. Let f : ℝⁿ → ℝ be a differentiable convex function. Consider the following problem:

$\begin{matrix} minimize & f (x) \\ subject to & x \geq 0. \end{matrix}$

(14.58)

Show that x̄ ∈ ℝⁿ is an optimal solution to (14.58) iff it satisfies the following system:

$\begin{matrix} \nabla f (\bar{x}) & \geq & 0, \\ \bar{x} & \geq & 0, \\ {\bar{x}}^{T} \nabla f (\bar{x}) & = & 0. \end{matrix}$

Exercise 14.4.8. This problem is concerned with finding the minimum-volume enclosing ellipsoid of a set of vectors.

1. Let u ∈ ℝⁿ be fixed, and define the function g : Sⁿ → ℝ₊ by $g (X) = {‖ X u ‖}_{2}^{2}$ . Find ∇g(X).

2. Let V = {v¹,…, v^m} ⊂ ℝⁿ be a set of vectors that span ℝⁿ. Consider the following problem:

$\begin{array}{ll} \inf & - \log \det (X) \\ \text{subject to} & {‖ X v^{i} ‖}_{2}^{2} \leq 1, \quad i = 1, \dots, m, \\ & X \in S_{+ +}^{n} . \end{array}$

(14.59)

Let X̄ be an optimal solution to (14.59) (it can be shown that such an X̄ exists). Write down the KKT conditions that X̄ must satisfy.

3. Suppose that m = n and vⁱ = e_i for i = 1,…,n, where e_i is the i-th standard basis vector. Using the above result, determine the optimal solution to (14.59) and find the corresponding Lagrange multipliers.

Exercise 14.4.9. Let a ∈ ℝⁿ, b ∈ ℝ and c ∈ ℝⁿ be such that a, c > 0 and b > 0. Consider the following problem:

$\begin{array}{l} minimize & \sum_{i = 1}^{n} \frac{c_{i}}{x_{i}} \\ subject to & \sum_{i = 1}^{n} a_{i} x_{i} = b, \\ x \geq 0. \end{array}$

(14.60)

1. Let u₁ ∈ ℝ and u₂ ∈ ℝⁿ be the Lagrange multipliers associated with the equality and inequality constraints, respectively. Write down the KKT conditions associated with (14.60).

2. Give explicit expressions for x̄ ∈ ℝⁿ, ū₁ ∈ ℝ and ū₂ ∈ ℝⁿ such that (x̄, ū₁, ū₂) satisfies the KKT conditions above.

3. Is the solution x̄ ∈ ℝⁿ found above an optimal solution to (14.60)? Explain.
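
For part 3, it can help to compare the KKT point against an independently computed minimizer. The sketch below does this for the assumed case n = 2 only: the equality constraint fixes x₂ in terms of x₁, and the resulting one-dimensional convex function is minimized by golden-section search. The data `a`, `b`, `c` and the function names are illustrative assumptions, and the search is a numerical cross-check rather than the argument the exercise asks for.

```python
import math

def objective_on_line(t, a, b, c):
    """Objective of (14.60) for n = 2, with x1 = t and x2 fixed by a1*x1 + a2*x2 = b."""
    x2 = (b - a[0] * t) / a[1]
    return c[0] / t + c[1] / x2

def golden_section_min(func, lo, hi, tol=1e-8):
    """Minimize a unimodal function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    x1 = hi - inv_phi * (hi - lo)
    x2 = lo + inv_phi * (hi - lo)
    f1, f2 = func(x1), func(x2)
    while hi - lo > tol:
        if f1 < f2:
            # The minimizer lies in [lo, x2]; reuse x1 as the new right probe.
            hi, x2, f2 = x2, x1, f1
            x1 = hi - inv_phi * (hi - lo)
            f1 = func(x1)
        else:
            # The minimizer lies in [x1, hi]; reuse x2 as the new left probe.
            lo, x1, f1 = x1, x2, f2
            x2 = lo + inv_phi * (hi - lo)
            f2 = func(x2)
    return 0.5 * (lo + hi)

# Illustrative data (assumed), with a, c > 0 and b > 0.
a, b, c = [1.0, 2.0], 4.0, [1.0, 8.0]

eps = 1e-9  # stay strictly inside 0 < x1 < b / a1, where the objective is finite
t_star = golden_section_min(lambda t: objective_on_line(t, a, b, c),
                            eps, b / a[0] - eps)
x_ref = [t_star, (b - a[0] * t_star) / a[1]]
print("reference minimizer:", x_ref)
print("objective value:", objective_on_line(t_star, a, b, c))
```

If the explicit expression for x̄ from part 2 is evaluated on the same data, it should agree with `x_ref` to within the search tolerance.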

References

[1] M. S. Bazaraa, H. D. Sherali, and C. M. Shetty, Nonlinear Programming: Theory and Algorithms, 2nd ed., ser. Wiley-Interscience Series in Discrete Mathematics and Optimization. New York: John Wiley & Sons, Inc., 1993.

[2] D. P. Bertsekas, Nonlinear Programming, 2nd ed. Belmont, Massachusetts: Athena Scientific, 1999.

[3] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge: Cambridge University Press, 2004, available online at http://www.stanford.edu/~boyd/cvxbook/.

[4] D. G. Luenberger and Y. Ye, Linear and Nonlinear Programming, 3rd ed., ser. International Series in Operations Research and Management Science. New York: Springer Science+Business Media, LLC, 2008, vol. 116.

[5] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge: Cambridge University Press, 1985.

[6] M. Brookes, “The Matrix Reference Manual,” 2005, available online at http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/intro.html.

[7] M. Sion, “On General Minimax Theorems,” Pacific Journal of Mathematics, vol. 8, no. 1, pp. 171–176, 1958.

[8] H. Komiya, “Elementary Proof for Sion’s Minimax Theorem,” Kodai Mathematical Journal, vol. 11, no. 1, pp. 5–7, 1988.

[9] S. Cui, A. J. Goldsmith, and A. Bahai, “Energy-constrained modulation optimization,” IEEE Transactions on Wireless Communications, vol. 4, no. 5, pp. 2349–2360, September 2005.

[10] S. Cui, M. Kisialiou, Z.-Q. Luo, and Z. Ding, “Robust blind multiuser detection against signature waveform mismatch based on second order cone programming,” IEEE Transactions on Wireless Communications, vol. 4, no. 4, pp. 1285–1291, July 2005.

[11] R. Zhang and Y.-C. Liang, “Exploiting multi-antennas for opportunistic spectrum sharing in cognitive radio networks,” IEEE Journal of Selected Topics in Signal Processing, vol. 2, no. 1, pp. 88–102, Feb. 2008.

[12] W. Yu and R. Lui, “Dual methods for nonconvex spectrum optimization of multicarrier systems,” IEEE Transactions on Communications, vol. 54, no. 7, pp. 1310–1322, July 2006.

[13] L. Zhang, R. Zhang, Y.-C. Liang, Y. Xin, and S. Cui, “On the relationship between the multi-antenna secrecy communications and cognitive radio communications,” IEEE Transactions on Communications, vol. 58, no. 6, pp. 1877–1886, June 2010.

[14] R. Zhang, S. Cui, and Y.-C. Liang, “On ergodic sum capacity of fading cognitive multiple-access and broadcast channels,” IEEE Transactions on Information Theory, vol. 55, no. 11, pp. 5161–5178, November 2009.

[15] M. Chiang, “Balancing transport and physical layers in wireless multihop networks: Jointly optimal congestion control and power control,” IEEE Journal on Selected Areas in Communications, vol. 23, no. 1, pp. 104–116, January 2005.

[16] J. Xiao, S. Cui, Z. Q. Luo, and A. J. Goldsmith, “Joint estimation in sensor networks under energy constraint,” IEEE Transactions on Signal Processing, vol. 54, no. 2, pp. 413–422, February 2005.

[17] A. So and Y. Ye, “Theory of semidefinite programming for sensor network localization,” Mathematical Programming, vol. 109, pp. 367–384, 2007.

[18] S. Cui and A. J. Goldsmith, “Cross-layer design in energy-constrained networks using cooperative MIMO techniques,” EURASIP’s Signal Processing Journal, Special Issue on Advances in Signal Processing-based Cross-layer Designs, vol. 86, pp. 1804–1814, August 2006.

[19] R. Madan, S. Cui, S. Lall, and A. Goldsmith, “Modeling and optimization of transmission schemes in energy-constrained wireless sensor networks,” IEEE/ACM Transactions on Networking, vol. 15, no. 6, pp. 1359–1372, December 2007.

[20] Z. Quan, S. Cui, H. V. Poor, and A. Sayed, “Collaborative wideband sensing for cognitive radios,” IEEE Signal Processing Magazine special issue on cognitive radios, vol. 25, no. 6, pp. 60–73, January 2009.

[21] R. Zhang, Y.-C. Liang, C.-C. Chai, and S. Cui, “Optimal beamforming for two-way multi-antenna relay channel with analogue network coding,” IEEE Journal on Selected Areas in Communications, vol. 27, no. 6, pp. 699–712, June 2009.

[22] R. Zhang and S. Cui, “Cooperative interference management with MISO beamforming,” IEEE Transactions on Signal Processing, vol. 58, no. 10, pp. 5450–5458, October 2010.

[23] S. P. Boyd, S.-J. Kim, D. D. Patil, and M. A. Horowitz, “Digital Circuit Optimization via Geometric Programming,” Operations Research, vol. 53, no. 6, pp. 899–932, November 2005.

[24] T. Cover and J. Thomas, Elements of Information Theory. New York: Wiley, 1991.

[25] I. E. Telatar, “Capacity of multi-antenna Gaussian channels,” Eur. Trans. Telecommun., vol. 10, no. 6, pp. 585–595, November 1999.

[26] R. G. Bland, D. Goldfarb, and M. J. Todd, “The ellipsoid method: a survey,” Operations Research, vol. 29, no. 6, pp. 1039–1091, November 1981.

[27] Z.-Q. Luo and W. Yu, “An introduction to convex optimization for communications and signal processing,” IEEE Journal on Selected Areas in Communications, vol. 24, no. 8, pp. 1426–1438, August 2006.

[28] Y. Ye and S. Zhang, “New results on quadratic minimization,” SIAM J. Optim., vol. 14, pp. 245–267, 2003.

¹Recall that for a generic optimization problem $\min_{x \in S \subseteq ℝ^{n}} f(x)$, a point x* ∈ S is called a global minimum if f(x*) ≤ f(x) for all x ∈ S. On the other hand, if there exists an ϵ > 0 such that the point x* ∈ S satisfies f(x*) ≤ f(x) for all x ∈ S ∩ B°(x*, ϵ), then it is called a local minimum. Here, B°(x̄, ϵ) = {x ∈ ℝⁿ : ‖x − x̄‖₂ < ϵ} denotes the open ball centered at x̄ ∈ ℝⁿ of radius ϵ > 0.

²A map A : ℝ^m → ℝⁿ is said to be affine if there exists an n × m matrix B and a vector d ∈ ℝⁿ such that A(x) = Bx + d for all x ∈ ℝ^m.

³Let S be a nonempty convex subset of ℝⁿ. We say that f : S → ℝ is convex at x̄ ∈ S if f(αx̄ + (1 − α)x) ≤ αf(x̄) + (1 − α)f(x) for all α ∈ (0, 1) and x ∈ S. Note that a function f : S → ℝ can be convex at a particular point x̄ ∈ S without being convex on S.
