Combinatorics is a branch of mathematics concerning the study of finite or countable discrete structures and, in particular, how the discrete structures can be combined or arranged. Aspects of combinatorics include counting the structures of a given kind and size, deciding when certain criteria can be met, and combinatorial optimization.
Combinatorial problems arise in many areas of pure mathematics, notably in algebra, probability theory, topology, and geometry, and combinatorics also has many applications in optimization, computer science, ergodic theory, and statistical physics. In the late twentieth century, powerful and general theoretical methods were developed, making combinatorics an independent branch of mathematics in its own right. Combinatorics is used frequently in computer science to obtain formulas and estimates in the analysis of algorithms.
An important part of the study is to analyze the complexity of the proposed algorithms. In combinatorics, we are interested in particular arrangements or combinations of objects. For instance, when a map is drawn, neighboring countries should be colored differently. We might be interested in the minimal number of colors needed, or in how many different ways we can color the map when given a fixed number of colors to use.
Knuth [68, page 1] distinguishes five issues of concern in combinatorics: (i) existence — are there any arrangements of a specified kind? (ii) construction — if so, can such an arrangement be found quickly? (iii) enumeration — how many different arrangements are there? (iv) generation — can all arrangements be visited systematically? (v) optimization — which arrangement is best, according to some criterion?
In this book we focus on issues (i), (iii), and (v). Issue (i) is also called the decision problem; issue (iii) is also known as the counting problem. In particular, we consider combinatorial problems that can be modeled by an optimization program and, in fact, most often by an integer linear program. For that purpose, we denote by $\mathcal{X}^*$ the set of feasible solutions of the problem, and we assume that it is a subset of an $n$-dimensional integer vector space and that $\mathcal{X}^*$ is given by the following linear integer constraints (cf. (1.1)):

$$\begin{aligned}
\sum_{k=1}^{n} a_{ik} x_k &= b_i, \quad i = 1, \ldots, m_1, \\
\sum_{k=1}^{n} a_{jk} x_k &\ge b_j, \quad j = m_1 + 1, \ldots, m_1 + m_2,
\end{aligned} \tag{A.1}$$

where $\mathbf{x} = (x_1, \ldots, x_n)$ is a vector of nonnegative integers.
In this book, we describe in detail various problems, algorithms, and mathematical aspects that are related to issues (i)–(iii) and associated with problem (A.1). The following two sections deal with a sample of well-known counting and optimization problems, respectively.
In this section, we present details of the satisfiability problem, the independent set problem, the vertex coloring problem, the Hamiltonian cycle problem, the knapsack problem, and the permanent problem.
The most common satisfiability problem (SAT) comprises the following two components:

1. a set of $n$ Boolean variables $x_1, \ldots, x_n$, each of which can take the value 1 (TRUE) or 0 (FALSE); the negation of $x_i$ is denoted by $\bar{x}_i$, and a variable or its negation is called a literal;
2. a set of $m$ clauses $C_1, \ldots, C_m$, where each clause is a disjunction (logical OR) of literals.
The binary vector $\mathbf{x} = (x_1, \ldots, x_n)$ is called a truth assignment, or simply an assignment. Thus, $x_i = 1$ assigns truth to the literal $x_i$ and $x_i = 0$ assigns truth to $\bar{x}_i$, for each $i = 1, \ldots, n$. The simplest SAT problem can now be formulated as follows: find a truth assignment $\mathbf{x}$ such that all clauses are true.
Denoting the logical AND operator by $\wedge$ and the logical OR operator by $\vee$, we can represent the above SAT problem via a single formula as

$$F = \bigwedge_{i=1}^{m} \bigvee_{k=1}^{m_i} l_{ik},$$

where $m_i$ is the number of literals in clause $C_i$ and the $l_{ik}$ are the literals of the variables in clause $C_i$. The SAT formula is then said to be in conjunctive normal form (CNF). When all clauses have exactly $K$ literals, we say that we deal with the $K$-SAT problem.
The problem of deciding whether there exists a valid assignment and, indeed, providing such a vector is called the SAT-assignment problem and is NP-complete [92]. We are concerned with the harder problem of counting all valid assignments, which is #P-complete.
SAT problems can be converted into the framework of a solution set given by linear constraints. Define the $m \times n$ clause matrix $A$ with entries $a_{ij} \in \{-1, 0, 1\}$ by

$$a_{ij} = \begin{cases} 1 & \text{if the literal } x_j \text{ appears in clause } C_i, \\ -1 & \text{if the literal } \bar{x}_j \text{ appears in clause } C_i, \\ 0 & \text{otherwise.} \end{cases}$$

Furthermore, let $\mathbf{b}$ be the $m$-vector with entries $b_i = 1 - |\{j : a_{ij} = -1\}|$, that is, 1 minus the number of negated variables in clause $C_i$. Then it is easy to see that, for any configuration $\mathbf{x} \in \{0, 1\}^n$, all clauses are satisfied if and only if $A\mathbf{x} \ge \mathbf{b}$.
Thus, the SAT problem is a particular case of the integer linear program with constraints of the form (A.1).
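To make the encoding concrete, here is a minimal Python sketch (the function names and the toy CNF are illustrative, not from the text) that builds $A$ and $\mathbf{b}$ from clauses given as DIMACS-style signed integers and tests a truth assignment via $A\mathbf{x} \ge \mathbf{b}$:

```python
# A sketch of the SAT-to-linear-constraints encoding above (assumed clause
# format: DIMACS-style signed integers, e.g. 3 means x3 and -3 means NOT x3).

def clauses_to_system(clauses, n):
    """Build A and b such that an assignment x in {0,1}^n satisfies
    all clauses iff A x >= b holds componentwise."""
    A = [[0] * n for _ in clauses]
    b = []
    for i, clause in enumerate(clauses):
        negated = 0
        for lit in clause:
            j = abs(lit) - 1
            A[i][j] = 1 if lit > 0 else -1
            if lit < 0:
                negated += 1
        b.append(1 - negated)  # clause i holds iff sum_j A[i][j]*x[j] >= 1 - (#negated)
    return A, b

def satisfies(A, b, x):
    return all(sum(a * xj for a, xj in zip(row, x)) >= bi for row, bi in zip(A, b))

# Example: (x1 OR NOT x2) AND (x2 OR x3)
A, b = clauses_to_system([[1, -2], [2, 3]], n=3)
print(satisfies(A, b, [1, 1, 0]))  # True
print(satisfies(A, b, [0, 1, 0]))  # False: first clause violated
```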
Consider a graph $G = (V, E)$ with $n$ nodes and $m$ edges. Our goal is to count the number of independent node sets of this graph. A node set is called independent if no two of its nodes are connected by an edge, that is, if no two of its nodes are adjacent; see Figure A.1 for an illustration of this concept.
The independent set problem can be converted into the family of integer programming problems (A.1) by considering the transpose of the incidence matrix of the graph. The nodes are labeled $1, \ldots, n$; the edges are labeled $1, \ldots, m$. Then $A$ is an $m \times n$ matrix of zeros and ones with $a_{ij} = 1$ iff edge $i$ and node $j$ are incident. The decision vector $\mathbf{x} \in \{0, 1\}^n$ denotes which nodes belong to the independent set. Finally, let $\mathbf{b}$ be the $m$-dimensional column vector with entries $b_i = 1$ for all $i = 1, \ldots, m$. Then it is easy to see that, for any configuration $\mathbf{x}$, the corresponding node set is independent iff $A\mathbf{x} \le \mathbf{b}$.
Having formulated the problem as the feasible set of an integer linear program with bounded variables, we can apply the procedure mentioned in Section 4.1 for constructing a sequence of decreasing sets (4.1) in order to start the splitting method.
An alternative construction of the sequence of decreasing sets (4.1) is the following. Let $E_i = \{e_1, \ldots, e_i\}$ be the set of the first $i$ edges and define the associated subgraph $G_i = (V, E_i)$, $i = 1, \ldots, m$. Note that $G_m = G$, and that $G_i$ is obtained from $G_{i-1}$ by adding the edge $e_i$, which is not in $G_{i-1}$. Define $\mathcal{X}_i^*$ to be the set of the independent sets of $G_i$; then clearly $\mathcal{X}^* = \mathcal{X}_m^* \subseteq \mathcal{X}_{m-1}^* \subseteq \cdots \subseteq \mathcal{X}_0^*$. Here $|\mathcal{X}_0^*| = 2^n$, because $G_0$ has no edges and thus every subset of $V$ is an independent set, including the empty set.
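The nesting $\mathcal{X}_m^* \subseteq \cdots \subseteq \mathcal{X}_0^*$ can be observed directly on a toy graph. The following brute-force sketch (exponential in $n$, illustration only; the 4-cycle instance is ours) counts $|\mathcal{X}_i^*|$ as the edges $e_1, e_2, \ldots$ are added one at a time:

```python
from itertools import combinations

def independent_sets(n, edges):
    """All independent node sets (as frozensets) of a graph on nodes 0..n-1."""
    sets_ = []
    for r in range(n + 1):
        for nodes in combinations(range(n), r):
            s = set(nodes)
            if all(not (u in s and v in s) for u, v in edges):
                sets_.append(frozenset(s))
    return sets_

n = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]  # a 4-cycle (toy instance)
counts = [len(independent_sets(n, edges[:i])) for i in range(len(edges) + 1)]
print(counts)  # [16, 12, 10, 8, 7]: starts at 2**4 and decreases to |X*| = 7
```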
Given a graph $G = (V, E)$ with nodes $V = \{1, \ldots, n\}$ and edges $E = \{e_1, \ldots, e_m\}$, color the nodes of $G$ with at most $c$ colors from a set of colors $\{1, \ldots, c\}$, such that for each edge $(v, w) \in E$, nodes $v$ and $w$ have different colors. We shall show how to cast the node coloring problem into the framework (A.1). Define the decision variables $x_{vk}$ for $v \in V$ and $k = 1, \ldots, c$ by

$$x_{vk} = \begin{cases} 1 & \text{if color } k \text{ is assigned to node } v, \\ 0 & \text{otherwise.} \end{cases}$$
Then $\mathbf{x} \in \mathcal{X}^*$ iff

$$\sum_{k=1}^{c} x_{vk} = 1 \quad \forall v \in V, \qquad x_{vk} + x_{wk} \le 1 \quad \forall (v, w) \in E, \ k = 1, \ldots, c. \tag{A.2}$$
An alternative to the framework (A.2) can be obtained by arguing as follows. Define decision variables $x_v$ for $v \in V$ by letting $x_v \in \{1, \ldots, c\}$ be the color assigned to node $v$.
Then $\mathbf{x} = (x_1, \ldots, x_n) \in \mathcal{X}^*$ iff

$$x_v \ne x_w \quad \text{for all } (v, w) \in E. \tag{A.3}$$
The procedure for node coloring via the randomized algorithm is the same as for the independent set problem. Again, let $E_i = \{e_1, \ldots, e_i\}$ be the set of the first $i$ edges and let $G_i = (V, E_i)$ be the associated subgraph, $i = 1, \ldots, m$. Note that $G_m = G$, and that $G_i$ is obtained from $G_{i-1}$ by adding the edge $e_i$. Define $\mathcal{X}_i^*$ to be the set of the valid node colorings of $G_i$; then clearly $\mathcal{X}^* = \mathcal{X}_m^* \subseteq \cdots \subseteq \mathcal{X}_0^*$. Here $|\mathcal{X}_0^*| = c^n$, because $G_0$ has no edges and thus we can color any node with any color.
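For toy graphs, $|\mathcal{X}^*|$ in the formulation (A.3) can be computed by direct enumeration. A minimal sketch (the triangle instance is ours):

```python
from itertools import product

def count_colorings(n, edges, c):
    """Count proper node colorings of a graph on nodes 0..n-1 with c colors,
    i.e. vectors x with x[v] != x[w] for every edge (v, w) (cf. (A.3))."""
    return sum(
        all(x[v] != x[w] for v, w in edges)
        for x in product(range(c), repeat=n)
    )

# A triangle has c*(c-1)*(c-2) proper colorings:
print(count_colorings(3, [(0, 1), (1, 2), (0, 2)], c=3))  # 6
```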
Given a graph $G = (V, E)$ with $n$ nodes and $m$ edges, find all Hamiltonian cycles, that is, all cycles that visit each node exactly once. Figure A.2 presents a graph with eight nodes and several Hamiltonian cycles, one of which is marked in bold lines.
Let $A = (a_{ij})$ be an $n \times n$ matrix consisting of zeros and ones only. The permanent of $A$ is defined by

$$\mathrm{per}(A) = \sum_{\sigma \in \mathcal{P}_n} \prod_{i=1}^{n} a_{i\sigma(i)}, \tag{A.4}$$

where $\mathcal{P}_n$ is the set of all permutations $\sigma$ of $(1, \ldots, n)$.
It is well known that the permanent of $A$ equals the number of perfect matchings in the balanced bipartite graph $G = (V_1 \cup V_2, E)$ that has $A$ as its biadjacency matrix [90]: $V_1$ and $V_2$ are two disjoint sets of $n$ nodes each, and $(i, j) \in E$ iff $a_{ij} = 1$. Recall that a matching is a collection of edges $M \subseteq E$ such that each node occurs at most once in $M$. A perfect matching is a matching of size $n$.
To cast the bipartite perfect matching problem in an integer linear program, we define the decision variables $x_{ij}$ for $(i, j) \in E$ by

$$x_{ij} = \begin{cases} 1 & \text{if edge } (i, j) \text{ belongs to the matching}, \\ 0 & \text{otherwise.} \end{cases}$$

Let $\mathcal{X}^*$ be the set of all perfect matchings of the bipartite graph associated with the given binary matrix $A$. Then $\mathbf{x} \in \mathcal{X}^*$ iff

$$\sum_{j : (i,j) \in E} x_{ij} = 1 \quad \forall i, \qquad \sum_{i : (i,j) \in E} x_{ij} = 1 \quad \forall j. \tag{A.5}$$
As an example, let $A$ be the $3 \times 3$ matrix in (A.6). The corresponding bipartite graph is given in Figure A.3. The graph has three perfect matchings, one of which is displayed in the figure.
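Definition (A.4) translates directly into code. The following brute-force sketch ($O(n \cdot n!)$, so toy sizes only; the matrix below is an arbitrary illustrative example, not the matrix (A.6)) counts perfect matchings via the permanent:

```python
from itertools import permutations

def permanent(A):
    """Permanent of a square 0-1 matrix via definition (A.4)."""
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        prod = 1
        for i in range(n):
            prod *= A[i][sigma[i]]
        total += prod
    return total

# Each permutation with a nonzero product picks one 1-entry per row and per
# column, i.e. a perfect matching of the bipartite graph with biadjacency A.
A = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]
print(permanent(A))  # 2: this toy graph has two perfect matchings
```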
Given $n$ items of sizes $a_1, \ldots, a_n$ and a positive integer $b$, find the number of vectors $\mathbf{x} = (x_1, \ldots, x_n) \in \{0, 1\}^n$ such that

$$\sum_{j=1}^{n} a_j x_j \le b.$$
The integer $b$ represents the size of the knapsack, and $x_j$ indicates whether or not item $j$ is placed in it. Let $\mathcal{X}^*$ denote the set of all feasible solutions, that is, all different combinations of items that can be placed in the knapsack without exceeding its capacity. The goal is to determine $|\mathcal{X}^*|$.
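The count $|\mathcal{X}^*|$ can be computed exactly in $O(nb)$ time by dynamic programming over the knapsack capacities; a minimal sketch (the instance is ours):

```python
def count_knapsack_vectors(sizes, b):
    """Count binary vectors x with sum_j sizes[j]*x[j] <= b, via dynamic
    programming over capacities."""
    # ways[t] = number of ways to reach total size exactly t with items seen so far
    ways = [0] * (b + 1)
    ways[0] = 1
    for a in sizes:
        for t in range(b, a - 1, -1):  # reverse order: each item used at most once
            ways[t] += ways[t - a]
    return sum(ways)  # every total 0..b is feasible

print(count_knapsack_vectors([2, 3, 4], 5))  # 5: {}, {2}, {3}, {4}, {2,3}
```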
In this section, we present details of the max-cut problem, the traveling salesman problem, the Internet security problem, the set covering/partitioning/packing problem, knapsack problems, the maximum satisfiability problem, the weighted matching problem, and graph coloring problems.
The maximal-cut (max-cut) problem in a graph can be formulated as follows. Given a graph $G = (V, E)$ with a set of nodes $V = \{1, \ldots, n\}$, a set of edges $E$ between the nodes, and a weight function $c$ on the edges, with $c_{ij}$ the weight of edge $(i, j)$, the problem is to partition the nodes into two subsets $V_1$ and $V_2$ such that the sum of the weights of the edges going from one subset to the other is maximized.
A cut can be conveniently represented via its corresponding cut vector $\mathbf{x} = (x_1, \ldots, x_n)$, where $x_i = 1$ if node $i$ belongs to the same partition as node 1, and $x_i = 0$ otherwise. For each cut vector $\mathbf{x}$, let $\{V_1(\mathbf{x}), V_2(\mathbf{x})\}$ be the partition of $V$ induced by $\mathbf{x}$, such that $V_1(\mathbf{x})$ contains the set of nodes $\{i : x_i = 1\}$. Unless stated otherwise, we let $x_1 = 1$ (thus, $1 \in V_1(\mathbf{x})$).
Let $\mathcal{X}$ be the set of all cut vectors $\mathbf{x}$ and let $S(\mathbf{x})$ be the corresponding cost of the cut. Then,

$$S(\mathbf{x}) = \sum_{i \in V_1(\mathbf{x}),\, j \in V_2(\mathbf{x})} c_{ij}. \tag{A.7}$$
Unless we specify otherwise, we assume that the graph is undirected. When we deal with a directed graph, however, the cost of a cut includes the cost of both the edges from $V_1$ to $V_2$ and those from $V_2$ to $V_1$. In this case, the cost corresponding to a cut vector $\mathbf{x}$ is, therefore,

$$S(\mathbf{x}) = \sum_{i \in V_1(\mathbf{x}),\, j \in V_2(\mathbf{x})} (c_{ij} + c_{ji}). \tag{A.8}$$
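The cut costs (A.7) and (A.8) are straightforward to evaluate from a cut vector. A minimal sketch (the weight-matrix instance is ours):

```python
def cut_cost(c, x, directed=False):
    """Cost of the cut induced by cut vector x (x[i] = 1 iff node i is in V1),
    given a weight matrix c; cf. (A.7) and (A.8)."""
    n = len(x)
    V1 = [i for i in range(n) if x[i] == 1]
    V2 = [i for i in range(n) if x[i] == 0]
    cost = sum(c[i][j] for i in V1 for j in V2)      # edges from V1 to V2
    if directed:
        cost += sum(c[j][i] for i in V1 for j in V2)  # plus edges from V2 to V1
    return cost

c = [[0, 2, 1],
     [2, 0, 3],
     [1, 3, 0]]
print(cut_cost(c, [1, 0, 1]))  # c[0][1] + c[2][1] = 2 + 3 = 5
```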
The traveling salesman problem (TSP) can be formulated as follows. Consider a weighted graph $G$ with $n$ nodes, labeled $1, \ldots, n$. The nodes represent cities, and the edges represent the roads between the cities. Each edge from $i$ to $j$ has weight or cost $c_{ij}$, representing the length of the road. The problem is to find the shortest tour that visits all the cities exactly once and returns to the starting city.
Let $\mathcal{X}$ be the set of all possible tours and let $S(\mathbf{x})$ be the total length of tour $\mathbf{x} \in \mathcal{X}$. We can represent each tour via a permutation $\mathbf{x} = (x_1, \ldots, x_n)$ with $x_1 = 1$. Mathematically, the TSP reads as

$$\min_{\mathbf{x} \in \mathcal{X}} S(\mathbf{x}) = \min_{\mathbf{x} \in \mathcal{X}} \left\{ \sum_{i=1}^{n-1} c_{x_i, x_{i+1}} + c_{x_n, x_1} \right\}. \tag{A.9}$$
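For small $n$, (A.9) can be solved by direct enumeration over the $(n-1)!$ tours that fix city 0 first. A minimal Python sketch (the cost matrix is ours, for illustration):

```python
from itertools import permutations

def tsp_bruteforce(c):
    """Exact TSP by enumerating all tours starting at city 0 (cf. (A.9))."""
    n = len(c)
    best_tour, best_len = None, float("inf")
    for rest in permutations(range(1, n)):
        tour = (0,) + rest
        length = sum(c[tour[i]][tour[i + 1]] for i in range(n - 1)) + c[tour[-1]][tour[0]]
        if length < best_len:
            best_tour, best_len = tour, length
    return best_tour, best_len

c = [[0, 1, 4, 2],
     [1, 0, 3, 5],
     [4, 3, 0, 1],
     [2, 5, 1, 0]]
print(tsp_bruteforce(c))  # ((0, 1, 2, 3), 7)
```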
The RSA problem, also called the Internet security problem, reads as follows: find two primes, provided we are given a large integer $r$ known to be the product of these primes. Clearly, we can write the two primes in the binary system as $n$-bit and $m$-bit integers

$$p = \sum_{i=0}^{n-1} x_i 2^i \quad \text{and} \quad q = \sum_{j=0}^{m-1} y_j 2^j, \qquad x_i, y_j \in \{0, 1\}.$$

Mathematically, the problem reads as follows: find binary $x_0, \ldots, x_{n-1}$ and $y_0, \ldots, y_{m-1}$ such that

$$\left( \sum_{i=0}^{n-1} x_i 2^i \right) \left( \sum_{j=0}^{m-1} y_j 2^j \right) = r. \tag{A.10}$$
As an illustration, let $r = p\,q$ for two small primes $p$ and $q$. We can write them as

$$p = \sum_{i=0}^{n-1} x_i 2^i \quad \text{and} \quad q = \sum_{j=0}^{m-1} y_j 2^j, \tag{A.11}$$

respectively, where $n$ and $m$ are the bit lengths of $p$ and $q$. In this case the program (A.10) reduces to the following: find binary $x_0, \ldots, x_{n-1}$ and $y_0, \ldots, y_{m-1}$ such that

$$\left( \sum_{i=0}^{n-1} x_i 2^i \right) \left( \sum_{j=0}^{m-1} y_j 2^j \right) = r. \tag{A.12}$$

The program should deliver the unique solution $(x_0, \ldots, x_{n-1}, y_0, \ldots, y_{m-1})$ corresponding to $(p, q)$. Note that we should consider the solution with the roles of $p$ and $q$ interchanged as the same solution as the previous one.
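The program (A.10) can be attacked by brute force for tiny instances. The following sketch (the instance $r = 35$ and the function name are ours, for illustration only; real RSA moduli are far beyond such enumeration) searches over all binary vectors $(\mathbf{x}, \mathbf{y})$:

```python
from itertools import product

def factor_semiprime(r, n_bits, m_bits):
    """Solve program (A.10) by brute force: find binary vectors (x, y) with
    (sum_i x_i 2**i) * (sum_j y_j 2**j) = r. Exponential; illustration only."""
    for x in product((0, 1), repeat=n_bits):
        p = sum(xi << i for i, xi in enumerate(x))
        if p < 2:
            continue
        for y in product((0, 1), repeat=m_bits):
            q = sum(yj << j for j, yj in enumerate(y))
            if q >= 2 and p * q == r:
                return p, q
    return None

print(factor_semiprime(35, 3, 3))  # (5, 7); the swapped pair (7, 5) counts as the same solution
```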
Consider a finite set $U$ and let $S_j$, $j \in J = \{1, \ldots, n\}$, be a collection of subsets of the set $U$. A subset $J^* \subseteq J$ is called a cover of $U$ if $\bigcup_{j \in J^*} S_j = U$, and it is called a packing of $U$ if $S_j \cap S_k = \emptyset$ for all $j, k \in J^*$ with $j \ne k$. If $J^*$ is both a cover and a packing, it is called a partitioning.
Suppose $c_j$ is the cost associated with $S_j$. Then the set covering problem is to find a minimum-cost cover. If $c_j$ is the value or weight of $S_j$, then the set packing problem is to find a maximum-weight or maximum-value packing. Similarly, the set partitioning problem is to find a partitioning with minimum cost. These problems can be formulated as zero-one linear integer programs as shown below. For all $u \in U$ and $j \in J$, let

$$a_{uj} = \begin{cases} 1 & \text{if } u \in S_j, \\ 0 & \text{otherwise,} \end{cases}$$

and

$$x_j = \begin{cases} 1 & \text{if } j \in J^*, \\ 0 & \text{otherwise.} \end{cases}$$
Then, the set covering, set packing, and set partitioning formulations are given by

$$\min \sum_{j \in J} c_j x_j \quad \text{s.t.} \quad \sum_{j \in J} a_{uj} x_j \ge 1 \ \ \forall u \in U, \qquad x_j \in \{0, 1\},$$

$$\max \sum_{j \in J} c_j x_j \quad \text{s.t.} \quad \sum_{j \in J} a_{uj} x_j \le 1 \ \ \forall u \in U, \qquad x_j \in \{0, 1\},$$

and

$$\min \sum_{j \in J} c_j x_j \quad \text{s.t.} \quad \sum_{j \in J} a_{uj} x_j = 1 \ \ \forall u \in U, \qquad x_j \in \{0, 1\},$$

respectively.
The satisfiability problem was introduced in Section A.1.1. When there is no feasible solution, one might be interested in finding an assignment of truth values to the variables that satisfies the maximum number of clauses; this problem is called the MAX-SAT problem.
If one associates a weight $w_i$ with each clause $C_i$, one obtains the weighted MAX-SAT problem, denoted by MAX W-SAT, in which one seeks the assignment of truth values to the variables that maximizes the sum of the weights of the satisfied clauses. Of course, MAX-SAT is contained in MAX W-SAT (all weights equal to one). When all clauses have exactly $K$ literals, we say that we deal with the MAX-$K$-SAT problem, or MAX W-$K$-SAT in the weighted case.
The MAX W-SAT problem has a natural integer linear programming (ILP) formulation. Recall the Boolean decision variables $x_j$, meaning TRUE ($x_j = 1$) and FALSE ($x_j = 0$). Let the Boolean variable $z_i = 1$ if clause $C_i$ is satisfied, and $z_i = 0$ otherwise. The integer linear program is

$$\max \sum_{i=1}^{m} w_i z_i \quad \text{s.t.} \quad \sum_{j \in I_i^+} x_j + \sum_{j \in I_i^-} (1 - x_j) \ge z_i, \quad i = 1, \ldots, m, \qquad x_j, z_i \in \{0, 1\},$$

where $I_i^+$ and $I_i^-$ denote the sets of indices of the variables that appear un-negated and negated in clause $C_i$, respectively.
Because the sum of the $w_i z_i$ is maximized and because each $z_i$ appears on the right-hand side of one constraint only, $z_i$ will be equal to one if and only if clause $C_i$ is satisfied.
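For toy instances the ILP above can be checked against exhaustive search. A minimal sketch (the clause encoding and the example instance are ours):

```python
from itertools import product

def max_w_sat(clauses, weights, n):
    """Brute-force MAX W-SAT: maximize the total weight of satisfied clauses
    (clauses given as DIMACS-style signed integers). Exponential; toy sizes only."""
    def satisfied(clause, x):
        return any((lit > 0) == bool(x[abs(lit) - 1]) for lit in clause)

    best_x, best_w = None, -1
    for x in product((0, 1), repeat=n):
        w = sum(wi for clause, wi in zip(clauses, weights) if satisfied(clause, x))
        if w > best_w:
            best_x, best_w = x, w
    return best_x, best_w

# (x1) AND (NOT x1) AND (x2) is unsatisfiable; the best assignment trades off weights:
clauses = [[1], [-1], [2]]
weights = [2, 3, 1]
print(max_w_sat(clauses, weights, n=2))  # ((0, 1), 4): clauses 2 and 3 satisfied
```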
Let $G = (V, E)$ be a graph with $|V| = n$ nodes ($n$ even), and let $c$ be a weight function on the edges; then the (general weighted) matching problem is formulated as

$$\min \sum_{(i,j) \in E} c_{ij} x_{ij} \quad \text{s.t.} \quad \sum_{j : (i,j) \in E} x_{ij} = 1 \ \ \forall i \in V, \qquad x_{ij} \in \{0, 1\}, \tag{A.13}$$

where $x_{ij} = 1$ iff edge $(i, j)$ belongs to the matching.
All knapsack problems consider a set of $n$ items with associated profits $p_j$ and weights $w_j$, $j = 1, \ldots, n$. A subset of the items is to be chosen such that the weight sum does not exceed the capacity $c$ of the knapsack, and such that the largest possible profit sum is obtained. We will assume that all coefficients $p_j$, $w_j$, and $c$ are positive integers, although weaker assumptions may sometimes be handled in the individual problems.
The 0-1 knapsack problem is the problem of choosing some of the items such that the corresponding profit sum is maximized without the weight sum exceeding the capacity $c$. Thus, it may be formulated as the following maximization problem:

$$\max \sum_{j=1}^{n} p_j x_j \quad \text{s.t.} \quad \sum_{j=1}^{n} w_j x_j \le c, \qquad x_j \in \{0, 1\}, \ j = 1, \ldots, n, \tag{A.14}$$
where $x_j$ is a binary variable having value 1 if item $j$ should be included in the knapsack, and 0 otherwise. If we have a bounded amount $m_j$ of each item type $j$, then the bounded knapsack problem appears:

$$\max \sum_{j=1}^{n} p_j x_j \quad \text{s.t.} \quad \sum_{j=1}^{n} w_j x_j \le c, \qquad x_j \in \{0, 1, \ldots, m_j\}, \ j = 1, \ldots, n. \tag{A.15}$$
Here $x_j$ gives the amount of each item type $j$ that should be included in the knapsack in order to obtain the largest objective value. The unbounded knapsack problem is a special case of the bounded knapsack problem, since an unlimited amount of each item type is available:

$$\max \sum_{j=1}^{n} p_j x_j \quad \text{s.t.} \quad \sum_{j=1}^{n} w_j x_j \le c, \qquad x_j \ge 0 \text{ integer}, \ j = 1, \ldots, n. \tag{A.16}$$
Actually, any variable $x_j$ of an unbounded knapsack problem will be bounded by the capacity $c$, as the weight of each item is at least one. But, generally, there is no benefit in transforming an unbounded knapsack problem into its bounded version.
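The 0-1 knapsack problem (A.14) admits the classical dynamic program over capacities. A minimal sketch (the instance is ours):

```python
def knapsack_01(p, w, c):
    """0-1 knapsack (A.14) by dynamic programming:
    best[t] = maximum profit achievable with total weight <= t."""
    best = [0] * (c + 1)
    for pj, wj in zip(p, w):
        for t in range(c, wj - 1, -1):  # reverse order: each item used at most once
            best[t] = max(best[t], best[t - wj] + pj)
    return best[c]

print(knapsack_01(p=[60, 100, 120], w=[10, 20, 30], c=50))  # 220 (items 2 and 3)
```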
The most general form of a knapsack problem is the multiconstrained knapsack problem, which is basically a general integer programming problem where all coefficients $p_j$, $w_{ij}$, and $c_i$ are non-negative integers. Thus, it may be formulated as

$$\max \sum_{j=1}^{n} p_j x_j \quad \text{s.t.} \quad \sum_{j=1}^{n} w_{ij} x_j \le c_i, \ i = 1, \ldots, m, \qquad x_j \ge 0 \text{ integer}, \ j = 1, \ldots, n. \tag{A.17}$$
The quadratic knapsack problem is an example of a knapsack problem with a quadratic objective function. It may be stated as

$$\max \sum_{i=1}^{n} \sum_{j=1}^{n} p_{ij} x_i x_j \quad \text{s.t.} \quad \sum_{j=1}^{n} w_j x_j \le c, \qquad x_j \in \{0, 1\}, \ j = 1, \ldots, n. \tag{A.18}$$
Here, $p_{ij}$ is the profit obtained if both items $i$ and $j$ are chosen, while $w_j$ is the weight of item $j$. The quadratic knapsack problem is a knapsack counterpart to the quadratic assignment problem, and the problem has several applications in telecommunication and hydrological studies.
Similar to many combinatorial optimization problems, the graph coloring problem has several mathematical programming formulations. Five such formulations are presented in the following.
(F-1):

$$\begin{aligned}
\min \ & \sum_{k=1}^{n} y_k \\
\text{s.t. } & \sum_{k=1}^{n} x_{ik} = 1, \quad i \in V, & \text{(A.19)} \\
& x_{ik} + x_{jk} \le 1, \quad (i, j) \in E, \ k = 1, \ldots, n, & \text{(A.20)} \\
& x_{ik} \le y_k, \quad i \in V, \ k = 1, \ldots, n, & \text{(A.21)} \\
& x_{ik}, y_k \in \{0, 1\}, \quad i \in V, \ k = 1, \ldots, n. & \text{(A.22)}
\end{aligned}$$
In the above model, $y_k = 1$ if color $k$ is used. The binary variables $x_{ik}$ are associated with node $i$: $x_{ik} = 1$ if and only if color $k$ is assigned to node $i$. Constraints (A.19) ensure that exactly one color is assigned to each node. Constraints (A.20) prevent adjacent nodes from having the same color. Constraints (A.21) guarantee that no $x_{ik}$ can be 1 unless color $k$ is used. The optimal objective function value gives the chromatic number of the graph. Moreover, the sets $\{i \in V : x_{ik} = 1\}$, for all $k$ with $y_k = 1$, comprise a partition of the nodes into a (minimum number of) independent sets.
(F-2):

$$\begin{aligned}
\min \ & y & \text{(A.23)} \\
\text{s.t. } & y \ge x_i, \quad i \in V, \\
& x_i - x_j + n z_{ij} \ge 1, \quad (i, j) \in E, & \text{(A.24)} \\
& x_j - x_i + n (1 - z_{ij}) \ge 1, \quad (i, j) \in E, & \text{(A.25)} \\
& x_i \in \{1, \ldots, n\}, \ z_{ij} \in \{0, 1\}.
\end{aligned}$$
The value of $x_i$ indicates which color is assigned to node $i$, for $i \in V$. Constraints (A.24) and (A.25) prevent two adjacent nodes from having the same color. This can be seen by noting that, if $x_i = x_j$, then no feasible assignment of $z_{ij}$ will satisfy both (A.24) and (A.25). The optimal objective function value equals the chromatic number of the graph under consideration.
The coloring problem can also be formulated as a set partitioning problem. Let $S_1, \ldots, S_q$ be all the independent sets of $G$. Let the rows of the 0-1 matrix $A^{\mathsf{T}}$ be the characteristic vectors of $S_j$, $j = 1, \ldots, q$. Define variables $x_j$, $j = 1, \ldots, q$, as follows: $x_j = 1$ if the independent set $S_j$ is part of the partition, and $x_j = 0$ otherwise.
(F-3):

$$\min \sum_{j=1}^{q} x_j \quad \text{s.t.} \quad A \mathbf{x} = \mathbf{e}, \qquad x_j \in \{0, 1\}, \ j = 1, \ldots, q, \tag{A.26}$$

where $\mathbf{e} = (1, \ldots, 1)^{\mathsf{T}}$, and $A$ is of dimension $n \times q$.
(F-4):

$$\min \sum_{j=1}^{q} x_j \quad \text{s.t.} \quad A \mathbf{x} \ge \mathbf{e}, \qquad x_j \in \{0, 1\}, \ j = 1, \ldots, q, \tag{A.27}$$

the corresponding set covering formulation, in which a node may belong to more than one selected independent set.
The graph coloring problem can also be formulated as a special case of the quadratic assignment problem. Given an integer $c$ and a graph $G = (V, E)$, the optimal objective function value of the following problem is zero if and only if $G$ is $c$-colorable.
(F-5):

$$\min \sum_{(i,j) \in E} \sum_{k=1}^{c} x_{ik} x_{jk} \quad \text{s.t.} \quad \sum_{k=1}^{c} x_{ik} = 1, \ i \in V, \qquad x_{ik} \in \{0, 1\}, \tag{A.28}$$

in which the objective function counts the number of monochromatic edges.
In this section we briefly discuss Shannon's entropy and the Kullback–Leibler cross-entropy. The subsequent material is taken almost verbatim from Section 1.14 of [108].
One of the most celebrated measures of uncertainty in information theory is the Shannon entropy, or simply entropy. The entropy of a discrete random variable $X$ with density $f$ is defined as

$$H(X) = -\mathbb{E}[\log_2 f(X)] = -\sum_{x} f(x) \log_2 f(x).$$

Here $X$ is interpreted as a random character from an alphabet $\mathcal{X}$, such that $X = x$ with probability $f(x)$. We will use the convention $0 \log_2 0 = 0$.
It can be shown that the most efficient way to transmit characters sampled from $f$ over a binary channel is to encode them such that the number of bits required to transmit $x$ is equal to $\log_2 (1/f(x))$. It follows that $H(X)$ is the expected bit length required to send a random character $X \sim f$.
A more general approach, which includes continuous random variables, is to define the entropy of a random variable $X$ with density $f$ by

$$H(X) = -\mathbb{E}[\log f(X)] = -\int f(x) \log f(x) \, \mathrm{d}x, \tag{A.29}$$

where the integral is replaced by a sum in the discrete case.
Definition (A.29) can easily be extended to random vectors $\mathbf{X} = (X_1, \ldots, X_n)$ as (in the continuous case)

$$H(\mathbf{X}) = -\mathbb{E}[\log f(\mathbf{X})] = -\int f(\mathbf{x}) \log f(\mathbf{x}) \, \mathrm{d}\mathbf{x}. \tag{A.30}$$
Often, $H(\mathbf{X})$ is called the joint entropy of the random variables $X_1, \ldots, X_n$, and it is also written as $H(X_1, \ldots, X_n)$. In the continuous case, $H(\mathbf{X})$ is frequently referred to as the differential entropy, to distinguish it from the discrete case.
Let $X$ have a $\mathrm{Ber}(p)$ distribution for some $0 \le p \le 1$. The density $f$ of $X$ is given by $f(1) = \mathbb{P}(X = 1) = p$ and $f(0) = \mathbb{P}(X = 0) = 1 - p$, so that the entropy of $X$ is

$$H(X) = -p \log p - (1 - p) \log(1 - p).$$
The graph of the entropy as a function of $p$ is depicted in Figure A.4. Note that the entropy is maximal for $p = 1/2$, which gives the uniform density on $\{0, 1\}$.
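The location of the maximum is easy to verify numerically; a minimal sketch (natural logarithm used; base 2 would only rescale the values):

```python
import math

def bernoulli_entropy(p):
    """H(X) for X ~ Ber(p), with the convention 0*log(0) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

# The entropy over p = 0.01, ..., 0.99 peaks at p = 1/2:
print(max(range(1, 100), key=lambda k: bernoulli_entropy(k / 100)))  # 50
```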
Next, consider a sequence $X_1, \ldots, X_n$ of iid random variables with common density $f$. Let $\mathbf{X} = (X_1, \ldots, X_n)$. The density of $\mathbf{X}$, say $g$, is simply the product of the densities of the $X_i$, so that

$$H(\mathbf{X}) = -\mathbb{E}[\log g(\mathbf{X})] = -\mathbb{E}\left[ \log \prod_{i=1}^{n} f(X_i) \right] = -\sum_{i=1}^{n} \mathbb{E}[\log f(X_i)] = n\, H(X_1).$$
The properties of $H(\mathbf{X})$ in the continuous case are somewhat different from those in the discrete one. In particular, the differential entropy can be negative, and, unlike the discrete entropy, it is not invariant under invertible transformations of $\mathbf{X}$.
It is not difficult to see that, of all densities on a bounded set $\mathcal{X}$, the one that gives the maximum entropy is the uniform density on $\mathcal{X}$. That is,

$$\max_{f} H(X) = \log |\mathcal{X}|, \tag{A.31}$$

where $|\mathcal{X}|$ denotes the number of elements of $\mathcal{X}$ in the discrete case and its volume in the continuous case, and the maximum is attained by the uniform density on $\mathcal{X}$.
Let $g$ and $h$ be two densities. The Kullback–Leibler cross-entropy between $g$ and $h$ is defined (in the continuous case) as

$$\mathcal{D}(g, h) = \mathbb{E}_g\left[ \log \frac{g(\mathbf{X})}{h(\mathbf{X})} \right] = \int g(\mathbf{x}) \log g(\mathbf{x}) \, \mathrm{d}\mathbf{x} - \int g(\mathbf{x}) \log h(\mathbf{x}) \, \mathrm{d}\mathbf{x}. \tag{A.32}$$
$\mathcal{D}(g, h)$ is also called the Kullback–Leibler divergence, the cross-entropy, and the relative entropy. If not stated otherwise, we shall call $\mathcal{D}(g, h)$ the cross-entropy between $g$ and $h$. Notice that $\mathcal{D}(g, h)$ is not a distance between $g$ and $h$ in the formal sense, because, in general, $\mathcal{D}(g, h) \ne \mathcal{D}(h, g)$. Nonetheless, it is often useful to think of $\mathcal{D}(g, h)$ as a distance because

$$\mathcal{D}(g, h) \ge 0,$$
and $\mathcal{D}(g, h) = 0$ if and only if $g(\mathbf{x}) = h(\mathbf{x})$ almost everywhere. This follows from Jensen's inequality (if $\phi$ is a convex function, such as $-\log$, then $\mathbb{E}[\phi(X)] \ge \phi(\mathbb{E}[X])$). Namely,

$$\mathcal{D}(g, h) = \mathbb{E}_g\left[ -\log \frac{h(\mathbf{X})}{g(\mathbf{X})} \right] \ge -\log \mathbb{E}_g\left[ \frac{h(\mathbf{X})}{g(\mathbf{X})} \right] = -\log \int h(\mathbf{x}) \, \mathrm{d}\mathbf{x} = -\log 1 = 0.$$
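For discrete densities, (A.32) is a one-line computation. The following sketch (the toy densities are chosen here for illustration) also makes the asymmetry visible:

```python
import math

def kl_divergence(g, h):
    """D(g, h) = sum_x g(x) * log(g(x)/h(x)) for discrete densities given as
    lists over a common support; terms with g(x) = 0 contribute 0."""
    return sum(gx * math.log(gx / hx) for gx, hx in zip(g, h) if gx > 0)

g = [0.5, 0.5]
h = [0.9, 0.1]
print(kl_divergence(g, h), kl_divergence(h, g))  # two different values: D is not symmetric
print(kl_divergence(g, g))                        # 0.0: D(g, g) = 0
```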
In this book, we frequently use the sample mean

$$\hat{\ell} = \frac{1}{N} \sum_{i=1}^{N} Z_i, \tag{A.33}$$

which presents an unbiased estimator of the unknown quantity $\ell = \mathbb{E}[Z]$, where $Z_1, \ldots, Z_N$ are independent replications of some random variable $Z$.
By the central limit theorem, $\hat{\ell}$ has approximately a $\mathrm{N}(\ell, \mathrm{Var}(Z)/N)$ distribution for large $N$. We shall estimate $\mathrm{Var}(Z)$ via the sample variance

$$S^2 = \frac{1}{N - 1} \sum_{i=1}^{N} \left( Z_i - \hat{\ell} \right)^2.$$
By the law of large numbers, $S^2$ converges with probability 1 to $\mathrm{Var}(Z)$. Consequently, for $\mathrm{Var}(Z) < \infty$ and large $N$, an approximate $(1 - \alpha)$ confidence interval for $\ell$ is given by

$$\left( \hat{\ell} - z_{1 - \alpha/2} \frac{S}{\sqrt{N}}, \ \hat{\ell} + z_{1 - \alpha/2} \frac{S}{\sqrt{N}} \right),$$
where $z_{1 - \alpha/2}$ is the $(1 - \alpha/2)$ quantile of the standard normal distribution. For example, for $\alpha = 0.05$ we have $z_{1 - \alpha/2} = z_{0.975} = 1.96$. The quantity

$$\frac{S}{\hat{\ell}\sqrt{N}}$$

is often used in the simulation literature as an accuracy measure for the estimator $\hat{\ell}$. For large $N$ it converges to the relative error of $\hat{\ell}$, defined as

$$\mathrm{RE} = \frac{\sqrt{\mathrm{Var}(\hat{\ell})}}{\mathbb{E}[\hat{\ell}]} = \frac{\sqrt{\mathrm{Var}(Z)/N}}{\ell}. \tag{A.34}$$
The square of the relative error,

$$\mathrm{CV}^2 = \frac{\mathrm{Var}(\hat{\ell})}{\left( \mathbb{E}[\hat{\ell}] \right)^2}, \tag{A.35}$$

is called the squared coefficient of variation.
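The quantities above combine into a compact estimation routine. A minimal sketch (the function name and the toy experiment are ours):

```python
import math
import random

def mc_estimate(sample_z, N, z_quantile=1.96):  # 1.96: the 0.975 normal quantile
    """Sample mean (A.33), approximate 95% confidence interval, and the
    estimated relative error S / (ell_hat * sqrt(N)) (cf. (A.34))."""
    zs = [sample_z() for _ in range(N)]
    ell_hat = sum(zs) / N
    s2 = sum((z - ell_hat) ** 2 for z in zs) / (N - 1)  # sample variance S^2
    half_width = z_quantile * math.sqrt(s2 / N)
    re = math.sqrt(s2 / N) / ell_hat if ell_hat > 0 else float("inf")
    return ell_hat, (ell_hat - half_width, ell_hat + half_width), re

random.seed(1)
# Estimate ell = P(U <= 0.3) = 0.3 for U ~ U(0, 1):
print(mc_estimate(lambda: float(random.random() <= 0.3), N=10_000))
```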
Consider estimation of the tail probability $\ell = \mathbb{P}(X \ge \gamma)$ of some random variable $X$ for a large number $\gamma$. If $\ell$ is very small, then the event $\{X \ge \gamma\}$ is called a rare event and the probability $\mathbb{P}(X \ge \gamma)$ is called a rare-event probability.
We may attempt to estimate $\ell$ via (A.33) as

$$\hat{\ell} = \frac{1}{N} \sum_{i=1}^{N} I_{\{X_i \ge \gamma\}}, \tag{A.36}$$

which involves drawing a random sample $X_1, \ldots, X_N$ from the pdf of $X$ and defining the indicators $Z_i = I_{\{X_i \ge \gamma\}}$, $i = 1, \ldots, N$. The estimator thus defined is called the crude Monte Carlo (CMC) estimator. For small $\ell$ the relative error of the CMC estimator is given by

$$\mathrm{RE} = \sqrt{\frac{1 - \ell}{N \ell}} \approx \frac{1}{\sqrt{N \ell}}. \tag{A.37}$$
As a numerical example, suppose that $\ell = 10^{-6}$. In order to estimate $\ell$ accurately with relative error, say, $\mathrm{RE} = 10^{-2}$, we need by (A.37) to choose a sample size

$$N \approx \frac{1}{\mathrm{RE}^2 \, \ell} = 10^{10}.$$
This shows that estimating small probabilities via CMC estimators is computationally meaningless.
The theoretical framework in which one typically examines rare-event probability estimation is based on complexity theory (see, for example, [4]).
In particular, the estimators are classified as either polynomial-time or exponential-time. It is shown in [4] that, for an arbitrary estimator $\hat{\ell}$ of $\ell$ to be polynomial-time as a function of some parameter $\gamma$, it suffices that its squared coefficient of variation, $\mathrm{CV}^2$, or its relative error, $\mathrm{RE}$, is bounded in $\gamma$ by some polynomial function $p(\gamma)$. For such polynomial-time estimators, the required sample size to achieve a fixed relative error does not grow too fast as the event becomes rarer.
Consider the estimator (A.36) and assume that $\ell = \ell(\gamma)$ becomes very small as $\gamma \to \infty$. Note that

$$\mathbb{E}[\hat{\ell}^2] \ge \left( \mathbb{E}[\hat{\ell}] \right)^2 = \ell^2(\gamma).$$
Hence, the best one can hope for with such an estimator is that its second moment decreases proportionally to $\ell^2$ as $\gamma \to \infty$. We say that the rare-event estimator (A.36) has bounded relative error if, for all $\gamma$,

$$\mathbb{E}[\hat{\ell}^2] \le C \, \ell^2(\gamma) \tag{A.38}$$
for some fixed constant $C > 0$. Because bounded relative error is not always easy to achieve, the following weaker criterion is often used. We say that the estimator (A.36) is logarithmically efficient (sometimes called asymptotically optimal) if

$$\lim_{\gamma \to \infty} \frac{\ln \mathbb{E}[\hat{\ell}^2]}{\ln \ell^2(\gamma)} = 1. \tag{A.39}$$
Consider the CMC estimator (A.36). For a single sample, $Z = I_{\{X \ge \gamma\}}$, we have

$$\mathbb{E}[Z^2] = \mathbb{E}[Z] = \ell(\gamma),$$

so that

$$\lim_{\gamma \to \infty} \frac{\ln \mathbb{E}[Z^2]}{\ln \ell^2(\gamma)} = \lim_{\gamma \to \infty} \frac{\ln \ell(\gamma)}{2 \ln \ell(\gamma)} = \frac{1}{2} \ne 1.$$

Hence, the CMC estimator is not logarithmically efficient, and therefore alternative estimators must be found to estimate small $\ell$.
A randomized algorithm is said to give an $(\varepsilon, \delta)$-approximation for a parameter $z$ if its output $Z$ satisfies

$$\mathbb{P}\left( |Z - z| \le \varepsilon z \right) \ge 1 - \delta, \tag{A.40}$$

that is, the “relative error” of the approximation $Z$ lies, with high probability (at least $1 - \delta$), below some small number $\varepsilon$.
One of the main tools in proving (A.40) for various randomized algorithms is the so-called Chernoff bound, which states that for any random variable $X$ and any number $a$,

$$\mathbb{P}(X \le a) \le \min_{\theta \ge 0} \, \mathrm{e}^{\theta a} \, \mathbb{E}\left[ \mathrm{e}^{-\theta X} \right]. \tag{A.41}$$
Namely, for any fixed $a$ and $\theta \ge 0$, define the functions $H_1(x) = I_{\{x \le a\}}$ and $H_2(x) = \mathrm{e}^{\theta (a - x)}$. Then, clearly, $H_1(x) \le H_2(x)$ for all $x$. As a consequence, for any $\theta \ge 0$,

$$\mathbb{P}(X \le a) = \mathbb{E}[H_1(X)] \le \mathbb{E}[H_2(X)] = \mathrm{e}^{\theta a} \, \mathbb{E}\left[ \mathrm{e}^{-\theta X} \right].$$

The bound (A.41) now follows by taking the $\theta \ge 0$ that yields the smallest such upper bound.
An important application is the following.

Theorem A.1. Let $Z_1, \ldots, Z_N$ be iid $\mathrm{Ber}(\ell)$ random variables. Then the sample mean $\hat{\ell} = N^{-1} \sum_{i=1}^{N} Z_i$ provides an $(\varepsilon, \delta)$-approximation for $\ell$, that is,

$$\mathbb{P}\left( |\hat{\ell} - \ell| \le \varepsilon \, \ell \right) \ge 1 - \delta,$$

provided $N \ge 3 \ln(2/\delta) / (\ell \, \varepsilon^2)$.
For the proof see, for example, Rubinstein and Kroese [108].
Note that the sample mean in Theorem A.1 provides an FPRAS (fully polynomial randomized approximation scheme) for estimating $\ell$. Note also that the input vector consists of the Bernoulli variables $Z_1, \ldots, Z_N$.
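Given the sample-size bound assumed in Theorem A.1 above (itself a standard Chernoff-type bound, cf. [88]), the growth of $N$ as $\ell$ becomes small is immediate to compute:

```python
import math

def approx_sample_size(ell, eps, delta):
    """Sample size from the bound N >= 3*ln(2/delta) / (ell*eps**2) assumed in
    Theorem A.1 above, guaranteeing an (eps, delta)-approximation of ell."""
    return math.ceil(3 * math.log(2 / delta) / (ell * eps ** 2))

# The required N grows like 1/ell as the event becomes rarer:
print(approx_sample_size(ell=1e-2, eps=0.1, delta=0.05))  # about 1.1e5
print(approx_sample_size(ell=1e-6, eps=0.1, delta=0.05))  # about 1.1e9
```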
There exists a fundamental connection between the ability to sample uniformly from some set $\mathcal{X}$ and counting the number of elements of interest. Because exact uniform sampling is not always feasible, Markov chain Monte Carlo (MCMC) techniques are often used to sample approximately from a uniform distribution.
It is well known [88] that the definition of variation distance coincides with that of an $\varepsilon$-uniform sample in the sense that a sampling algorithm returns an $\varepsilon$-uniform sample on $\mathcal{X}$ if and only if the variation distance between its output distribution $\pi$ and the uniform distribution $\mathcal{U}$ on $\mathcal{X}$ satisfies

$$\| \pi - \mathcal{U} \| \le \varepsilon.$$
Bounding the variation distance between the uniform distribution and the empirical distribution of the Markov chain obtained after some warm-up period is a crucial issue in establishing the foundations of randomized algorithms, because, with a bounded variation distance, one can produce an efficient approximation for $|\mathcal{X}^*|$.
An important issue is to prove that, given an FPAUS (fully polynomial almost uniform sampler) for a combinatorial problem, one can construct a corresponding FPRAS.
An FPAUS for independent sets takes as input a graph $G = (V, E)$ and a parameter $\varepsilon > 0$. The sample space consists of all independent sets in $G$, with the output being an $\varepsilon$-uniform sample from this space. The time to produce such an $\varepsilon$-uniform sample should be polynomial in the size of the graph and in $\ln \varepsilon^{-1}$. Based on the product formula (4.2), Mitzenmacher and Upfal [88] prove that, given an FPAUS, one can construct a corresponding FPRAS.