Appendix A

Additional Topics

A.1 Combinatorial Problems

Combinatorics is a branch of mathematics concerning the study of finite or countable discrete structures and, in particular, how the discrete structures can be combined or arranged. Aspects of combinatorics include counting the structures of a given kind and size, deciding when certain criteria can be met, and combinatorial optimization.

Combinatorial problems arise in many areas of pure mathematics, notably in algebra, probability theory, topology, and geometry, and combinatorics also has many applications in optimization, computer science, ergodic theory, and statistical physics. In the late twentieth century, powerful and general theoretical methods were developed, making combinatorics into an independent branch of mathematics in its own right. Combinatorics is used frequently in computer science to obtain formulas and estimates in the analysis of algorithms.

An important part of the study is to analyze the complexity of the proposed algorithms. In combinatorics, we are interested in particular arrangements or combinations of the objects. For instance, neighboring countries are colored differently when drawn on a map. We might be interested in the minimal number of colors needed or in how many different ways we can color the map when given a number of colors to use.

Knuth [68, page 1] distinguishes five issues of concern in combinatorics:

i. Existence: Is there an arrangement?
ii. Construction: If yes, can we find one quickly?
iii. Enumeration: How many different arrangements are there?
iv. Generation: Can all arrangements be visited systematically?
v. Optimization: What arrangement is optimal, given an objective function on the set of arrangements?

In this book we focus on issues (i), (iii), and (v). Issue (i) is also called the decision problem; issue (iii) is also known as the counting problem. In particular, we consider combinatorial problems that can be modeled by an optimization program and, in fact, most often by an integer linear program. For that purpose, we denote by b01-math-0001 the set of feasible solutions of the problem, and we assume that it is a subset of an b01-math-0002-dimensional integer vector space and that it is given by the following linear integer constraints (cf. (1.1)):

A.1 b01-math-0003

where,

1. b01-math-0004 is a b01-math-0005 matrix of constraint coefficients, and b01-math-0006 is a b01-math-0007-vector of the right-hand side values. Most often we assume that the variables b01-math-0008 are non-negative integers.
2. The combinatorial problems associated with (A.1) are
  • Decision making: is b01-math-0009 nonempty?
  • Counting: calculate b01-math-0010.
  • Optimization: solve b01-math-0011 for a given objective or performance function b01-math-0012.

In this book, we describe in detail various problems, algorithms, and mathematical aspects that are related to issues (i), (iii), and (v) and associated with the problem (A.1). The following two sections deal with a sample of well-known counting and optimization problems, respectively.

A.1.1 Counting

In this section, we present details of the satisfiability problem, independent set problem, vertex coloring problem, Hamiltonian cycle problem, the knapsack problem, and the permanent problem.

A.1.1.1 Satisfiability Problem

The most common satisfiability problem (SAT) comprises the following two components:

  • A set of b01-math-0013 boolean variables b01-math-0014, representing statements that can either be TRUE (b01-math-0015) or FALSE (b01-math-0016). The negation (the logical NOT) of a variable b01-math-0017 is denoted by b01-math-0018. A variable or its negation is called a literal.
  • A set of b01-math-0019 distinct clauses b01-math-0020 of the form b01-math-0021, where the b01-math-0022's are the literals of the b01-math-0023 variables, and the b01-math-0024 denotes the logical OR operator.

The binary vector b01-math-0025 is called a truth assignment, or simply an assignment. Thus, b01-math-0026 assigns truth to b01-math-0027 and b01-math-0028 assigns truth to b01-math-0029, for each b01-math-0030. The simplest SAT problem can now be formulated as follows: find a truth assignment b01-math-0031 such that all clauses are true.

Denoting the logical AND operator by b01-math-0032, we can represent the above SAT problem via a single formula as

equation

where b01-math-0034 is the number of literals in clause b01-math-0035 and b01-math-0036 are the literals of the b01-math-0037 variables in clause b01-math-0038. The SAT formula is then said to be in conjunctive normal form (CNF). When all clauses have exactly the same number b01-math-0039 of literals per clause, we say that we deal with the b01-math-0040-SAT problem.

The problem of deciding whether there exists a valid assignment and, indeed, providing such a vector is called the SAT-assignment problem and is b01-math-0041-complete [92]. We are concerned with the harder problem of counting all valid assignments, which is b01-math-0042-complete.
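As a small illustration of the counting problem, the valid assignments of a CNF formula can be enumerated by brute force over all truth assignments. The function name and the clause encoding below (positive integer k for the literal of variable k, negative for its negation) are our own conventions, not notation from the text:

```python
from itertools import product

def count_sat(n_vars, clauses):
    """Count the truth assignments satisfying every clause.

    Each clause is a list of nonzero integers: k stands for the
    literal of variable k and -k for its negation (1-based).
    """
    count = 0
    for x in product([0, 1], repeat=n_vars):
        # a clause is true if at least one of its literals is true
        if all(any(x[abs(lit) - 1] == (1 if lit > 0 else 0) for lit in clause)
               for clause in clauses):
            count += 1
    return count

# (x1 OR x2) AND (NOT x1 OR x2): exactly the assignments with x2 = 1
print(count_sat(2, [[1, 2], [-1, 2]]))  # 2
```

This exhaustive enumeration costs b01-math-0013-exponential time, which is precisely why the randomized counting methods of the main text are needed.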

SAT problems can be converted into the framework of a solution set given by linear constraints. Define the b01-math-0043 matrix b01-math-0044 with entries b01-math-0045 by

equation

Furthermore, let b01-math-0047 be the b01-math-0048-vector with entries b01-math-0049. Then it is easy to see that, for any configuration b01-math-0050,

equation

Thus, the SAT problem presents a particular case of the integer linear program with constraints given in (A.1).

A.1.1.2 Independent Sets

Consider a graph b01-math-0052 with b01-math-0053 nodes and b01-math-0054 edges. Our goal is to count the number of independent node sets of this graph. A node set is called independent if no two nodes are connected by an edge, that is, no two nodes are adjacent; see Figure A.1 for an illustration of this concept.

Figure A.1 The black nodes form an independent set inasmuch as they are not adjacent to each other.

b01f001

The independent set problem can be converted into the family of integer programming problems (A.1) by considering the transpose of the incidence matrix b01-math-0055. The nodes are labeled b01-math-0056; the edges are labeled b01-math-0057. Then b01-math-0058 is a b01-math-0059 matrix of zeros and ones with b01-math-0060 iff the edge b01-math-0061 and node b01-math-0062 are incident. The decision vector b01-math-0063 denotes which nodes belong to the independent set. Finally, let b01-math-0064 be the b01-math-0065-column vector with entries b01-math-0066 for all b01-math-0067. Then it is easy to see that for any configuration b01-math-0068

equation

Having formulated the problem set b01-math-0070 as the feasible set of an integer linear program with bounded variables, we can apply the procedure mentioned in Section 4.1 for constructing a sequence of decreasing sets (4.1) in order to start the splitting method.

An alternative construction of the sequence of decreasing sets (4.1) is the following. Let b01-math-0071 be the set of the first b01-math-0072 edges and define the associated subgraph b01-math-0073, b01-math-0074. Note that b01-math-0075, and that b01-math-0076 is obtained from b01-math-0077 by adding the edge b01-math-0078, which is not in b01-math-0079. Define b01-math-0080 to be the set of the independent sets of b01-math-0081, then clearly b01-math-0082. Here b01-math-0083, because b01-math-0084 has no edges and thus every subset of b01-math-0085 is an independent set, including the empty set.
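For small graphs the quantity being estimated here can be checked by direct enumeration. The sketch below (our own naming, 0-based node labels) counts all independent sets, including the empty set, consistent with the remark above that every subset of an edgeless graph is independent:

```python
from itertools import combinations

def count_independent_sets(n, edges):
    """Count all independent node sets (the empty set included)
    of a graph on nodes 0..n-1, by brute-force enumeration."""
    count = 0
    for r in range(n + 1):
        for nodes in combinations(range(n), r):
            chosen = set(nodes)
            # independent: no edge has both endpoints chosen
            if not any(u in chosen and v in chosen for u, v in edges):
                count += 1
    return count

# Path 0-1-2: {}, {0}, {1}, {2}, {0,2} are the independent sets
print(count_independent_sets(3, [(0, 1), (1, 2)]))  # 5
```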

A.1.1.3 Vertex Coloring

Given a graph b01-math-0086 with b01-math-0087 nodes b01-math-0088, and b01-math-0089 edges b01-math-0090, color the nodes of b01-math-0091 with at most b01-math-0092 colors from a set of colors b01-math-0093, such that for each edge b01-math-0094, nodes b01-math-0095 and b01-math-0096 have different colors. We shall show how to cast the node coloring problem into the framework (A.1). Define the decision variables b01-math-0097 for b01-math-0098 and b01-math-0099 by

equation

Then, b01-math-0101 iff

A.2 b01-math-0102

An alternative to the framework (A.2) can be obtained by arguing as follows. Define decision variables b01-math-0103 for b01-math-0104 by

equation

Then, b01-math-0106 iff

A.3 b01-math-0107

The procedure for node coloring via the randomized algorithm is the same as for the independent sets problem. Again, let b01-math-0108 be the set of the first b01-math-0109 edges and let b01-math-0110 be the associated subgraph, b01-math-0111. Note that b01-math-0112, and that b01-math-0113 is obtained from b01-math-0114 by adding the edge b01-math-0115. Define b01-math-0116 to be the set of the node colorings of b01-math-0117, then clearly b01-math-0118. Here b01-math-0119, because b01-math-0120 has no edges and thus we can color any node with any color.
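Again, for small instances the number of proper colorings can be verified by exhaustive enumeration over all color assignments; the function below is our own illustrative sketch:

```python
from itertools import product

def count_colorings(n, edges, k):
    """Count proper node colorings of a graph on nodes 0..n-1
    using k available colors, by brute force."""
    return sum(
        # proper: every edge joins two differently colored nodes
        all(c[u] != c[v] for u, v in edges)
        for c in product(range(k), repeat=n)
    )

# A triangle with 3 colors admits 3! = 6 proper colorings
print(count_colorings(3, [(0, 1), (1, 2), (0, 2)], 3))  # 6
```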

A.1.1.4 Hamiltonian Cycles

Given a graph b01-math-0121 with b01-math-0122 nodes and b01-math-0123 edges, find all Hamiltonian cycles. Figure A.2 presents a graph with eight nodes and several Hamiltonian cycles, one of which is marked in bold lines.

Figure A.2 A Hamiltonian graph. The bold edges form a Hamiltonian cycle.

b01f002

A.1.1.5 Permanent

Let b01-math-0124 be a b01-math-0125 matrix consisting of zeros and ones only. The permanent of b01-math-0126 is defined by

A.4 b01-math-0127

where b01-math-0128 is the set of all permutations b01-math-0129 of b01-math-0130.

It is well known that the permanent of b01-math-0131 equals the number of perfect matchings in a balanced bipartite graph b01-math-0132 with matrix b01-math-0133 as its biadjacency matrix [90]; that is, b01-math-0132 has two disjoint sets of nodes b01-math-0134 and b01-math-0135, where b01-math-0136 iff b01-math-0137. Recall that a matching is a collection of edges b01-math-0138 such that each node occurs at most once in b01-math-0139. A perfect matching is a matching of size b01-math-0140.

To cast the bipartite perfect matching problem in an integer linear program, we define the decision variables b01-math-0141 by

equation

Let b01-math-0143 be the set of all perfect matchings of the bipartite graph associated with the given binary matrix b01-math-0144. Then b01-math-0145 iff

A.5 b01-math-0146

As an example, let b01-math-0147 be the b01-math-0148 matrix

A.6 b01-math-0149

The corresponding bipartite graph is given in Figure A.3. The graph has three perfect matchings, one of which is displayed in the figure.

Figure A.3 A bipartite graph. The bold edges form a perfect matching.

b01f003
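The defining sum (A.4) can be evaluated directly for small matrices. The 0-1 matrix below is an illustrative choice (the entries of the matrix in (A.6) are not reproduced here); its bipartite graph happens to have three perfect matchings, the count quoted for Figure A.3:

```python
from itertools import permutations

def permanent(A):
    """Permanent of a square 0-1 matrix via the defining sum (A.4)
    over all permutations; feasible only for small n."""
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        prod = 1
        for i in range(n):
            prod *= A[i][sigma[i]]
        total += prod  # contributes 1 iff sigma is a perfect matching
    return total

A = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]
print(permanent(A))  # 3 perfect matchings
```

The factorial cost of this enumeration is what makes the permanent a canonical b01-math-0042-complete counting problem.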

A.1.1.6 Knapsack Problem

Given items of sizes b01-math-0150 and a positive integer b01-math-0151, find the number of vectors b01-math-0152, such that

equation

The integer b01-math-0154 represents the size of the knapsack, and b01-math-0155 indicates whether or not item b01-math-0156 is placed in it. Let b01-math-0157 denote the set of all feasible solutions, that is, all different combinations of items that can be placed in the knapsack without exceeding its capacity. The goal is to determine b01-math-0158.
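The counting goal b01-math-0158 can be evaluated by brute force for a handful of items; the sketch below, with illustrative sizes and capacity of our own choosing, enumerates all binary item-selection vectors:

```python
from itertools import product

def count_knapsack(sizes, b):
    """Count binary vectors x with sum_i a_i * x_i <= b (brute force)."""
    return sum(
        sum(a * xi for a, xi in zip(sizes, x)) <= b
        for x in product([0, 1], repeat=len(sizes))
    )

# Items of sizes 2, 3, 4 and capacity 5: the feasible selections are
# {}, {2}, {3}, {4}, and {2, 3}
print(count_knapsack([2, 3, 4], 5))  # 5
```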

A.1.2 Combinatorial Optimization

In this section, we present details of the max-cut problem, the traveling salesman problem, the Internet security problem, the set covering/partitioning/packing problems, knapsack problems, the maximum satisfiability problem, the weighted matching problem, and graph coloring problems.

A.1.2.1 Max-Cut Problem

The maximal-cut (max-cut) problem in a graph can be formulated as follows. Given a graph b01-math-0159 with a set of nodes b01-math-0160, a set of edges b01-math-0161 between the nodes, and a weight function b01-math-0162 on the edges, the problem is to partition the nodes into two subsets b01-math-0163 and b01-math-0164 such that the sum of the weights b01-math-0165 of the edges going from one subset to the other is maximized:

equation

A cut can be conveniently represented via its corresponding cut vector b01-math-0167, where b01-math-0168 if node b01-math-0169 belongs to the same partition as b01-math-0170, and b01-math-0171 otherwise. For each cut vector b01-math-0172, let b01-math-0173 be the partition of b01-math-0174 induced by b01-math-0175, such that b01-math-0176 contains the set of nodes b01-math-0177. Unless stated otherwise, we let b01-math-0178 (thus, b01-math-0179).

Let b01-math-0180 be the set of all cut vectors b01-math-0181 and let b01-math-0182 be the corresponding cost of the cut. Then,

A.7 b01-math-0183

Unless stated otherwise, we assume that the graph is undirected. However, when we deal with a directed graph, the cost of a cut b01-math-0184 includes both the cost of the edges from b01-math-0185 to b01-math-0186 and from b01-math-0187 to b01-math-0188. In this case, the cost corresponding to a cut vector b01-math-0189 is, therefore,

A.8 b01-math-0190
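For a small undirected graph, (A.7) can be evaluated by enumerating all cut vectors. Following the text's convention of fixing one coordinate of the cut vector, the sketch below (our own naming) fixes node 0 in the first partition, which halves the search space:

```python
from itertools import product

def max_cut(n, weights):
    """Brute-force max-cut of an undirected graph on nodes 0..n-1.

    weights maps an edge (i, j) with i < j to its weight.
    Node 0 is fixed in the first partition.
    """
    best = 0
    for rest in product([0, 1], repeat=n - 1):
        x = (0,) + rest  # cut vector
        # cost of the cut: weights of edges crossing the partition
        cost = sum(w for (i, j), w in weights.items() if x[i] != x[j])
        best = max(best, cost)
    return best

# Triangle with unit weights: any cut separates at most 2 edges
print(max_cut(3, {(0, 1): 1, (0, 2): 1, (1, 2): 1}))  # 2
```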

A.1.2.2 Traveling Salesman Problem

The traveling salesman problem (TSP) can be formulated as follows. Consider a weighted graph b01-math-0191 with b01-math-0192 nodes, labeled b01-math-0193. The nodes represent cities, and the edges represent the roads between the cities. Each edge from b01-math-0194 to b01-math-0195 has weight or cost b01-math-0196, representing the length of the road. The problem is to find the shortest tour that visits all the cities exactly once and such that the starting city is also the terminating one.

Let b01-math-0197 be the set of all possible tours and let b01-math-0198 be the total length of tour b01-math-0199. We can represent each tour via a permutation b01-math-0200 with b01-math-0201. Mathematically, the TSP reads as

A.9 b01-math-0202
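The permutation representation of a tour suggests an immediate (if factorially expensive) brute-force solver for (A.9); the cost matrix below is an illustrative example of our own:

```python
from itertools import permutations

def tsp_length(costs):
    """Brute-force TSP: length of the shortest tour that starts and
    ends at city 0. costs[i][j] is the length of the road i -> j."""
    n = len(costs)
    best = float('inf')
    for perm in permutations(range(1, n)):
        tour = (0,) + perm + (0,)  # each city visited exactly once
        best = min(best, sum(costs[tour[k]][tour[k + 1]] for k in range(n)))
    return best

costs = [[0, 1, 4],
         [1, 0, 2],
         [4, 2, 0]]
print(tsp_length(costs))  # 7, e.g. the tour 0 -> 1 -> 2 -> 0
```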

A.1.2.3 RSA or Internet Security Problem

The RSA problem, also called the Internet security problem, reads as follows: given a large integer b01-math-0203 known to be the product of two primes, find those primes. Clearly, we can write the given number b01-math-0204 in the binary system with b01-math-0205-bit integers as

equation

Mathematically, the problem reads as follows: find binary b01-math-0207 such that

A.10 b01-math-0208

Remarks:

  • The problem (A.10) can be formulated for any basis, say for a decimal rather than a binary one.
  • To have a unique solution, we can impose in addition the constraint b01-math-0209.
  • The RSA problem can also be formulated as:
    Find binary b01-math-0210 such that

    A.11 b01-math-0211

Example A.1

Let b01-math-0212. In this case, the primes are b01-math-0213 and b01-math-0214. We can write them as

equation

and

equation

respectively. In this case b01-math-0217 and the program (A.10) reduces to:

Find binary b01-math-0218 such that

A.12 b01-math-0219

The program should deliver the unique solution

equation

Note that we should consider b01-math-0221 and b01-math-0222 as the same solution as the previous one.
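Since the specific numbers in Example A.1 are hidden behind placeholders here, we use m = 77 = 7 × 11 as an illustrative stand-in. A sketch of the exhaustive search that a program like (A.12) implies, with our own function name, enumerates the binary digits of the two factors directly:

```python
from itertools import product

def factor_by_bits(m, n_bits):
    """Find p = sum_i y_i 2^i and q = sum_i z_i 2^i with p * q = m
    by exhaustive search over their binary digits (illustrative only)."""
    for bits in product([0, 1], repeat=2 * n_bits):
        y, z = bits[:n_bits], bits[n_bits:]
        p = sum(b << i for i, b in enumerate(y))
        q = sum(b << i for i, b in enumerate(z))
        if p * q == m and 1 < p <= q:  # p <= q makes the solution unique
            return p, q
    return None

print(factor_by_bits(77, 4))  # (7, 11)
```

The search space grows as 2 to the power of twice the bit length, which is exactly why factoring large RSA moduli is considered hard.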

A.1.2.4 Set Covering, Set Packing, and Set Partitioning

Consider a finite set b01-math-0223 and let b01-math-0224, b01-math-0225 be a collection of subsets of the set b01-math-0226 where b01-math-0227. A subset b01-math-0228 is called a cover of b01-math-0229 if b01-math-0230, and is called a packing of b01-math-0231 if b01-math-0232 for all b01-math-0233 and b01-math-0234. If b01-math-0235 is both a cover and packing, it is called a partitioning.

Suppose b01-math-0236 is the cost associated with b01-math-0237. Then the set covering problem is to find a minimum cost cover. If b01-math-0238 is the value or weight of b01-math-0239, then the set packing problem is to find a maximum weight or value packing. Similarly, the set partitioning problem is to find a partitioning with minimum cost. These problems can be formulated as zero-one linear integer programs as shown below. For all b01-math-0240 and b01-math-0241, let

equation

and

equation

Then, the set covering, set packing, and set partitioning formulations are given by

equation

equation

and

equation

respectively.

A.1.2.5 Maximum Satisfiability Problem

The satisfiability problem was introduced in Section A.1.1. When there is no feasible solution, one might be interested in finding an assignment of truth values to the variables that satisfies the maximum number of clauses; this is called the MAX-SAT problem.

If one associates a weight b01-math-0247 with each clause b01-math-0248, one obtains the weighted MAX-SAT problem, denoted by MAX W-SAT. The goal is then to determine the assignment of truth values to the b01-math-0249 variables that maximizes the sum of the weights of the satisfied clauses. Of course, MAX-SAT is contained in MAX W-SAT (take all weights equal to one). When all clauses have exactly the same number b01-math-0250 of literals, we say that we deal with the MAX-b01-math-0251-SAT problem, or MAX W-b01-math-0252-SAT in the weighted case.

The MAX W-SAT problem has a natural integer linear programming formulation (ILP). Recall the boolean decision variables b01-math-0253, meaning TRUE (b01-math-0254) and FALSE (b01-math-0255). Let the boolean variable b01-math-0256 if clause b01-math-0257 is satisfied, b01-math-0258 otherwise. The integer linear program is

equation

equation

where b01-math-0261 and b01-math-0262 denote the set of indices of variables that appear un-negated and negated in clause b01-math-0263, respectively.

Because the sum of the b01-math-0264 is maximized and because each b01-math-0265 appears as the right-hand side of one constraint only, b01-math-0266 will be equal to one if and only if clause b01-math-0267 is satisfied.

A.1.2.6 Weighted Matching

Let b01-math-0268 be a graph with b01-math-0269 nodes (b01-math-0270 even), and let b01-math-0271 be a weight function on the edges. Then the (general weighted) matching problem is formulated as

A.13 b01-math-0272

A.1.2.7 Knapsack Problems

All knapsack problems consider a set of items with associated profit b01-math-0273 and weight b01-math-0274. A subset of the items is to be chosen such that the weight sum does not exceed the capacity b01-math-0275 of the knapsack, and such that the largest possible profit sum is obtained. We will assume that all coefficients b01-math-0276, b01-math-0277, b01-math-0278 are positive integers, although weaker assumptions may sometimes be handled in the individual problems.

The 0-1 knapsack problem is the problem of choosing some of the b01-math-0279 items such that the corresponding profit sum is maximized without the weight sum exceeding the capacity b01-math-0280. Thus, it may be formulated as the following maximization problem:

A.14 b01-math-0281

where b01-math-0282 is a binary variable having value b01-math-0283 if item b01-math-0284 should be included in the knapsack, and b01-math-0285 otherwise. If we have a bounded amount b01-math-0286 of each item type b01-math-0287, then the bounded knapsack problem appears:

A.15 b01-math-0288

Here b01-math-0289 gives the amount of each item type that should be included in the knapsack in order to obtain the largest objective value. The unbounded knapsack problem is the special case of the bounded knapsack problem in which an unlimited amount of each item type is available:

A.16 b01-math-0290

Actually, any variable b01-math-0291 of an unbounded knapsack problem is bounded by the capacity b01-math-0292, as the weight of each item is at least one. But, generally, there is no benefit in transforming an unbounded knapsack problem into its bounded version.
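The 0-1 knapsack problem (A.14) admits a classical dynamic-programming solution over capacities; the sketch below, with illustrative data of our own, is not the splitting method of the main text but a useful baseline:

```python
def knapsack_01(profits, weights, c):
    """Maximum profit of the 0-1 knapsack problem (A.14) by dynamic
    programming over capacities, in O(n * c) time."""
    best = [0] * (c + 1)  # best[cap] = max profit within capacity cap
    for p, w in zip(profits, weights):
        for cap in range(c, w - 1, -1):  # backward: each item used once
            best[cap] = max(best[cap], best[cap - w] + p)
    return best[c]

# Three items, capacity 50: taking the second and third items
# (profits 100 and 120, weights 20 and 30) gives profit 220
print(knapsack_01([60, 100, 120], [10, 20, 30], 50))  # 220
```

Iterating capacities downward is what enforces the 0-1 restriction; an upward sweep would instead solve the unbounded variant (A.16).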

The most general form of a knapsack problem is the multiconstrained knapsack problem, which is basically a general integer programming problem, where all coefficients b01-math-0293, b01-math-0294, and b01-math-0295 are non-negative integers. Thus, it may be formulated as

A.17 b01-math-0296

The quadratic knapsack problem is an example of a knapsack problem with a quadratic objective function. It may be stated as

A.18 b01-math-0297

Here, b01-math-0298 is the profit obtained if both items b01-math-0299 and b01-math-0300 are chosen, while b01-math-0301 is the weight of item b01-math-0302. The quadratic knapsack problem is a knapsack counterpart to the quadratic assignment problem, and the problem has several applications in telecommunication and hydrological studies.

A.1.2.8 Graph Coloring

Similar to many combinatorial optimization problems, the graph coloring problem has several mathematical programming formulations. Five such formulations are presented in the following:

(F-1):

A.19 b01-math-0303

A.20 b01-math-0304

A.21 b01-math-0305

A.22 b01-math-0306

In the above model, b01-math-0307 if color b01-math-0308 is used. The binary variables b01-math-0309 are associated with node b01-math-0310: b01-math-0311 if and only if color b01-math-0312 is assigned to node b01-math-0313. Constraints (A.19) ensure that exactly one color is assigned to each node. Constraints (A.20) prevent adjacent nodes from having the same color. Constraints (A.21) guarantee that no b01-math-0314 can be b01-math-0315 unless color b01-math-0316 is used. The optimal objective function value gives the chromatic number of the graph. Moreover, the sets b01-math-0317 for all b01-math-0318, comprise a partition of the nodes into (minimum number of) independent sets.

(F-2):

A.23 b01-math-0319

A.24 b01-math-0320

A.25 b01-math-0321

The value of b01-math-0322 indicates which color is assigned to b01-math-0323, for b01-math-0324. Constraints (A.24) and (A.25) prevent two adjacent nodes from having the same color. This can be seen by noting that, if b01-math-0325, then no feasible assignment of b01-math-0326 will satisfy both (A.24) and (A.25). The optimal objective function value equals the chromatic number of the graph under consideration.

The coloring problem can also be formulated as a set partitioning problem. Let b01-math-0327 be all the independent sets of b01-math-0328. Let the rows of the 0-1 matrix b01-math-0329 be the characteristic vectors of b01-math-0330, b01-math-0331. Define variables b01-math-0332 as follows:

equation

(F-3):

A.26 b01-math-0334

where b01-math-0335, and b01-math-0336 is of dimension b01-math-0337.

(F-4):

A.27 b01-math-0338

The graph coloring problem can also be formulated as a special case of the quadratic assignment problem. Given an integer b01-math-0339 and a graph b01-math-0340, the optimal objective function value of the following problem is zero if and only if b01-math-0341 is b01-math-0342 colorable.

(F-5):

A.28 b01-math-0343

A.2 Information

In this section we discuss briefly Shannon's entropy and the Kullback–Leibler cross-entropy. The subsequent material is taken almost verbatim from Section 1.14 of [108].

A.2.1 Shannon Entropy

One of the most celebrated measures of uncertainty in information theory is the Shannon entropy, or simply entropy. The entropy of a discrete random variable b01-math-0344 with density b01-math-0345 is defined as

equation

Here b01-math-0347 is interpreted as a random character from an alphabet b01-math-0348, such that b01-math-0349 with probability b01-math-0350. We will use the convention b01-math-0351.

It can be shown that the most efficient way to transmit characters sampled from b01-math-0352 over a binary channel is to encode them such that the number of bits required to transmit b01-math-0353 is equal to b01-math-0354. It follows that b01-math-0355 is the expected bit length required to send a random character b01-math-0356.
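The discrete entropy (in bits, matching the bit-length interpretation above) is straightforward to compute; the function below is an illustrative sketch with our own naming:

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a discrete distribution, using
    the convention 0 * log 0 = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # 1.0: a fair coin carries one bit
print(entropy([0.25] * 4))   # 2.0: a fair four-sided die carries two
```

Note how the uniform distribution on a given alphabet maximizes the entropy, in line with Example A.2.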

A more general approach, which includes continuous random variables, is to define the entropy of a random variable b01-math-0357 with density b01-math-0358 by

A.29 b01-math-0359

Definition (A.29) can easily be extended to random vectors b01-math-0360 as (in the continuous case)

A.30 b01-math-0361

Often, b01-math-0362 is called the joint entropy of the random variables b01-math-0363 and is also written as b01-math-0364. In the continuous case, b01-math-0365 is frequently referred to as the differential entropy to distinguish it from the discrete case.

Example A.2

Let b01-math-0366 have a b01-math-0367 distribution for some b01-math-0368. The density b01-math-0369 of b01-math-0370 is given by b01-math-0371 and b01-math-0372, so that the entropy of b01-math-0373 is

equation

The graph of the entropy as a function of b01-math-0375 is depicted in Figure A.4. Note that the entropy is maximal for b01-math-0376, which gives the uniform density on b01-math-0377.

Figure A.4 The entropy for the b01-math-0378 distribution as a function of b01-math-0379.

b01f004

Next, consider a sequence b01-math-0380 of iid b01-math-0381 random variables. Let b01-math-0382. The density of b01-math-0383, say b01-math-0384, is simply the product of the densities of the b01-math-0385, so that

equation

The properties of b01-math-0387 in the continuous case are somewhat different from those in the discrete one. In particular,

1. The differential entropy can be negative, whereas the discrete entropy is always nonnegative.
2. The discrete entropy is insensitive to invertible transformations, whereas the differential entropy is not. Specifically, if b01-math-0388 is discrete, b01-math-0389, and b01-math-0390 is an invertible mapping, then b01-math-0391, because b01-math-0392. However, in the continuous case, we have an additional factor due to the Jacobian of the transformation.

It is not difficult to see that, of all densities b01-math-0393, the one that gives the maximum entropy is the uniform density on b01-math-0394. That is,

A.31 b01-math-0395

A.2.2 Kullback–Leibler Cross-Entropy

Let b01-math-0396 and b01-math-0397 be two densities on b01-math-0398. The Kullback–Leibler cross-entropy between b01-math-0399 and b01-math-0400 is defined (in the continuous case) as

A.32 b01-math-0401

b01-math-0402 is also called the Kullback–Leibler divergence, the cross-entropy, and the relative entropy. Unless stated otherwise, we shall call b01-math-0403 the cross-entropy between b01-math-0404 and b01-math-0405. Notice that b01-math-0406 is not a distance between b01-math-0407 and b01-math-0408 in the formal sense, because, in general, b01-math-0409. Nonetheless, it is often useful to think of b01-math-0410 as a distance because

equation

and b01-math-0412 if and only if b01-math-0413. This follows from Jensen's inequality (if b01-math-0414 is a convex function, such as b01-math-0415, then b01-math-0416). Namely,

equation
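The three properties just discussed (nonnegativity, vanishing only when the densities agree, and asymmetry) are easy to check numerically in the discrete case; the sketch below uses our own naming and natural logarithms:

```python
import math

def kl(g, h):
    """Kullback-Leibler cross-entropy D(g, h) of two discrete
    densities (in nats); assumes h[i] > 0 wherever g[i] > 0."""
    return sum(p * math.log(p / q) for p, q in zip(g, h) if p > 0)

g = [0.5, 0.5]
h = [0.9, 0.1]
print(kl(g, g))              # 0.0: D vanishes when the densities agree
print(kl(g, h) > 0)          # True: D is nonnegative
print(kl(g, h) == kl(h, g))  # False: D is not symmetric
```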

A.3 Efficiency of Estimators

In this book, we frequently use

A.33 b01-math-0418

which is an unbiased estimator of the unknown quantity b01-math-0419, where b01-math-0420 are independent replications of some random variable b01-math-0421.

By the central limit theorem, b01-math-0422 has approximately a b01-math-0423 distribution for large b01-math-0424. We shall estimate b01-math-0425 via the sample variance

equation

By the law of large numbers, b01-math-0427 converges with probability 1 to b01-math-0428. Consequently, for b01-math-0429 and large b01-math-0430, the approximate b01-math-0431 confidence interval for b01-math-0432 is given by

equation

where b01-math-0434 is the b01-math-0435 quantile of the standard normal distribution. For example, for b01-math-0436 we have b01-math-0437. The quantity

equation

is often used in the simulation literature as an accuracy measure for the estimator b01-math-0439. For large b01-math-0440 it converges to the relative error of b01-math-0441, defined as

A.34 b01-math-0442

The square of the relative error

A.35 b01-math-0443

is called the squared coefficient of variation.
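The sample mean (A.33), its confidence interval, and the estimated relative error can be assembled in a few lines; the sketch below (our own naming, uniform test data with known mean 1/2) is illustrative only:

```python
import math
import random

def cmc_estimate(sample, z=1.96):
    """Sample mean (A.33), approximate 95% confidence interval, and
    the estimated relative error of the mean."""
    n = len(sample)
    mean = sum(sample) / n
    s2 = sum((x - mean) ** 2 for x in sample) / (n - 1)  # sample variance
    half = z * math.sqrt(s2 / n)
    rel_err = math.sqrt(s2 / n) / mean
    return mean, (mean - half, mean + half), rel_err

random.seed(1)
sample = [random.random() for _ in range(10_000)]  # E[X] = 1/2
mean, (lo, hi), re = cmc_estimate(sample)
print(f"mean = {mean:.3f}, 95% CI = ({lo:.3f}, {hi:.3f}), RE = {re:.4f}")
```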

Example A.3 Estimation of Rare-Event Probabilities

Consider estimation of the tail probability b01-math-0444 of some random variable b01-math-0445 for a large number b01-math-0446. If b01-math-0447 is very small, then the event b01-math-0448 is called a rare event and the probability b01-math-0449 is called a rare-event probability.

We may attempt to estimate b01-math-0450 via (A.33) as

A.36 b01-math-0451

which involves drawing a random sample b01-math-0452 from the pdf of b01-math-0453 and defining the indicators b01-math-0454, b01-math-0455. The estimator b01-math-0456 thus defined is called the crude Monte Carlo (CMC) estimator. For small b01-math-0457 the relative error of the CMC estimator is given by

A.37 b01-math-0458

As a numerical example, suppose that b01-math-0459. In order to estimate b01-math-0460 accurately with relative error, say, b01-math-0461, we need to choose a sample size

equation

This shows that estimating small probabilities via CMC estimators is computationally meaningless.
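The arithmetic behind this conclusion follows directly from (A.37): solving the relative-error formula for the sample size gives N = (1 - ℓ)/(κ² ℓ). Since the numerical values in Example A.3 are hidden behind placeholders here, the sketch below uses illustrative numbers of our own (ℓ = 10⁻¹⁰, 1% relative error):

```python
def cmc_sample_size(ell, rel_err):
    """Sample size N needed for the CMC estimator (A.36) to reach a
    target relative error, from kappa ~ sqrt((1 - ell) / (N * ell))."""
    return (1 - ell) / (rel_err ** 2 * ell)

# A rare-event probability of 1e-10 estimated to 1% relative error
# requires on the order of 1e14 samples
print(f"{cmc_sample_size(1e-10, 0.01):.2e}")  # 1.00e+14
```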

A.3.1 Complexity

The theoretical framework in which one typically examines rare-event probability estimation is based on complexity theory (see, for example, [4]).

In particular, the estimators are classified either as polynomial-time or as exponential-time. It is shown in [4] that for an arbitrary estimator b01-math-0463 of b01-math-0464 to be polynomial-time as a function of some b01-math-0465, it suffices that its squared coefficient of variation b01-math-0466 or its relative error b01-math-0467 is bounded in b01-math-0468 by some polynomial function b01-math-0469. For such polynomial-time estimators, the required sample size to achieve a fixed relative error does not grow too fast as the event becomes rarer.

Consider the estimator (A.36) and assume that b01-math-0470 becomes very small as b01-math-0471. Note that

equation

Hence, the best one can hope for with such an estimator is that its second moment b01-math-0473 decreases proportionally to b01-math-0474 as b01-math-0475. We say that the rare-event estimator (A.36) has bounded relative error if for all b01-math-0476

A.38 b01-math-0477

for some fixed b01-math-0478. Because bounded relative error is not always easy to achieve, the following weaker criterion is often used. We say that the estimator (A.36) is logarithmically efficient (sometimes called asymptotically optimal) if

A.39 b01-math-0479

Example A.4 The CMC Estimator Is Not Logarithmically Efficient

Consider the CMC estimator (A.36). We have

equation

so that

equation

Hence, the CMC estimator is not logarithmically efficient, and therefore alternative estimators must be found to estimate small b01-math-0482.

A.3.2 Complexity of Randomized Algorithms

A randomized algorithm is said to give an b01-math-0483-approximation for a parameter b01-math-0484 if its output b01-math-0485 satisfies

A.40 b01-math-0486

that is, the “relative error” b01-math-0487 of the approximation b01-math-0488 lies with high probability (b01-math-0489) below some small number b01-math-0490.

One of the main tools in proving (A.40) for various randomized algorithms is the so-called Chernoff bound, which states that for any random variable b01-math-0491 and any number b01-math-0492

A.41 b01-math-0493

Namely, for any fixed b01-math-0494 and b01-math-0495, define the functions b01-math-0496 and b01-math-0497. Then, clearly b01-math-0498 for all b01-math-0499. As a consequence, for any b01-math-0500,

equation

The bound (A.41) now follows by taking the smallest such b01-math-0502.

An important application is the following.

Theorem A.1
Let b01-math-0503 be iid b01-math-0504 random variables. Then their sample mean provides an b01-math-0505-approximation for b01-math-0506, that is,
equation

provided b01-math-0508.

For the proof see, for example, Rubinstein and Kroese [108].

Definition A.1 FPRAS
A randomized algorithm is said to provide a fully polynomial-time randomized approximation scheme (FPRAS) if for any input vector b01-math-0509 and any parameters b01-math-0510 and b01-math-0511 the algorithm outputs an b01-math-0512-approximation to the desired quantity b01-math-0513 in time polynomial in b01-math-0514 and the size b01-math-0515 of the input vector b01-math-0516.

Note that the sample mean in Theorem A.1 provides an FPRAS for estimating b01-math-0517. Note also that the input vector b01-math-0518 consists of the Bernoulli variables b01-math-0519.

There exists a fundamental connection between the ability to sample uniformly from some set b01-math-0520 and counting the number of elements of interest. Because exact uniform sampling is not always feasible, MCMC techniques are often used to sample approximately from a uniform distribution.

Definition A.2 ϵ-Uniform Sample
Let b01-math-0521 be a random output of a sampling algorithm for a finite sample space b01-math-0522. We say that the sampling algorithm generates an b01-math-0523-uniform sample from b01-math-0524 if, for any b01-math-0525,
equation

Definition A.3 Variation Distance
The variation distance between two distributions b01-math-0527 and b01-math-0528 on a countable space b01-math-0529 is defined as
equation

It is well-known [88] that the definition of variation distance coincides with that of an b01-math-0531-uniform sample in the sense that a sampling algorithm returns an b01-math-0532-uniform sample on b01-math-0533 if and only if the variation distance between its output distribution b01-math-0534 and the uniform distribution b01-math-0535 satisfies

equation
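The variation distance of Definition A.3 reduces to a short computation for distributions on a finite space; the sketch below (our own naming, illustrative distributions) makes the connection to ϵ-uniform sampling concrete:

```python
def variation_distance(mu, nu):
    """Total variation distance between two distributions on a
    countable space, given as dicts mapping outcomes to probabilities."""
    outcomes = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(k, 0.0) - nu.get(k, 0.0)) for k in outcomes)

uniform = {s: 0.25 for s in 'abcd'}
skewed = {'a': 0.4, 'b': 0.3, 'c': 0.2, 'd': 0.1}
print(round(variation_distance(uniform, skewed), 10))  # 0.2
```

A sampler whose output distribution is the skewed one above would therefore qualify as an ϵ-uniform sampler only for ϵ of at least 0.2.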

Bounding the variation distance between the uniform distribution and the empirical distribution of the Markov chain obtained after some warm-up period is a crucial issue in establishing the foundations of randomized algorithms because, with a bounded variation distance, one can produce an efficient approximation for b01-math-0537.

Definition A.4 FPAUS
A sampling algorithm is called a fully polynomial almost uniform sampler (FPAUS) if, given an input vector b01-math-0538 and a parameter b01-math-0539, the algorithm generates an b01-math-0540-uniform sample from b01-math-0541 and runs in time polynomial in b01-math-0542 and the size of the input vector b01-math-0543.

An important issue is to prove that, given an FPAUS for a combinatorial problem, one can construct a corresponding FPRAS.

Example A.5 FPAUS and FPRAS for Independent Sets

An FPAUS for independent sets takes as input a graph b01-math-0544 and a parameter b01-math-0545. The sample space b01-math-0546 consists of all independent sets in b01-math-0547, with the output being an b01-math-0548-uniform sample from b01-math-0549. The time to produce such an b01-math-0550-uniform sample should be polynomial in the size of the graph and b01-math-0551. Based on the product formula (4.2), Mitzenmacher and Upfal [88] prove that, given an FPAUS, one can construct a corresponding FPRAS.
