39

Randomized Graph Data-Structures for Approximate Shortest Paths*

Surender Baswana

Indian Institute of Technology, Kanpur

Sandeep Sen

Indian Institute of Technology, Delhi

39.1Introduction

39.2A Randomized Data-Structure for Static APASP: Approximate Distance Oracles

3-Approximate Distance OraclePreliminaries(2k - 1)-Approximate Distance OracleComputing Approximate Distance Oracles

39.3A Randomized Data-Structure for Decremental APASP

Main IdeaNotationsHierarchical Distance Maintaining Data-StructureBounding the Size of Bd,SuBd,Su under Edge-DeletionsImproved Decremental Algorithm for APASP up to Distance d

39.4Further Reading and Bibliography

Acknowledgments

References

39.1Introduction

Let G = (V, E) be an undirected weighted graph on n = |V| vertices and m = |E| edges. Length of a path between two vertices is the sum of the weights of all the edges of the path. The shortest path between a pair of vertices is the path of least length among all possible paths between the two vertices in the graph. The length of the shortest path between two vertices is also called the distance between the two vertices. An α-approximate shortest path between two vertices is a path of length at-most α times the length of the shortest path.

Computing all-pairs exact or approximate distances in G is one of the most fundamental graph algorithmic problem. In this chapter, we present two randomized graph data-structures for all-pairs approximate shortest paths (APASP) problem in static and dynamic environments. Both the data-structures are hierarchical data-structures and their construction involves random sampling of vertices or edges of the given graph.

The first data-structure is a randomized data-structure designed for efficiently computing APASP in a given static graph. In order to answer a distance query in constant time, most of the existing algorithms for APASP problem output a data-structure which is an n × n matrix that stores the exact/approximate distance between each pair of vertices explicitly. Recently a remarkable data-structure of o(n2) size has been designed for reporting all-pairs approximate distances in undirected graph. This data-structure is called approximate distance oracle because of its ability to answer a distance query in constant time in spite of its sub-quadratic size. We present the details of this novel data-structure and an efficient algorithm to build it.

The second data-structure is a dynamic data-structure designed for efficiently maintaining APASP in a graph that is undergoing deletion of edges. For a given graph G = (V, E) and a distance parameter dn, this data-structure provides the first o(nd) update time algorithm for maintaining α-approximate shortest paths for all pairs of vertices separated by distance ≤ d in the graph.

39.2A Randomized Data-Structure for Static APASP: Approximate Distance Oracles

There exist classical algorithms that require O(mn log n) time for solving all-pairs shortest paths (APSP) problem. There also exist algorithms based on fast matrix multiplication that achieve sub-cubic time. However, there is still no combinatorial algorithm that could achieve O(n3−ε) running time for APSP problem. In recent past, many simple combinatorial algorithms have been designed that compute all-pairs approximate shortest paths (APASP) for undirected graphs. These algorithms achieve significant improvement in the running time compared to those designed for APSP, but the distance reported has some additive or/and multiplicative error. An algorithm is said to compute all pairs α-approximate shortest paths, if for each pair of vertices u, vV, the distance reported is bounded by αδ(u, v), where δ(u, v) denotes the actual distance between u and v.

Among all the data-structures and algorithms designed for computing all-pairs approximate shortest paths, the approximate distance oracles are unique in the sense that they achieves simultaneous improvement in running time (sub-cubic) as well as space (sub-quadratic), and still answers any approximate distance query in constant time. For any k ≥ 1, it takes O(kmn1/k) time to compute (2k − 1)-approximate distance oracle of size O(kn1+1/k) that would answer any (2k − 1)-approximate distance query in O(k) time.

39.2.13-Approximate Distance Oracle

For a given undirected graph, storing distance information from each vertex to all the vertices requires θ(n2) space. To achieve sub-quadratic space, the following simple idea comes to mind.

II:

From each vertex, if we store distance information to a small number of vertices, can we still be able to report distance between any pair of vertices?

The above idea can indeed be realized using a simple random sampling technique, but at the expense of reporting approximate, instead of exact, distance as an answer to a distance query. We describe the construction of 3-approximate distance oracle as follows.

1.Let RV be a subset of vertices formed by picking each vertex randomly independently with probability γ (the value of γ will be fixed later on).

2.For each vertex uV, store the distances to all the vertices of the sample set R.

3.For each vertex uV, let p(u) be the vertex nearest to u among all the sampled vertices, and let Su be the set of all the vertices of the graph G that lie closer to u than the vertex p(u). Store the vertices of set Su along with their distance from u.

For each vertex uV, storing distance to vertices Su helps in answering distance query to vertices in locality of u, whereas storing distance from all the vertices of the graph to all the sampled vertices will be required (as shown below) to answer distance query for vertices that are not present in locality of each other. In order to extract distance information in constant time, for each vertex uV, we use two hash tables (see Reference 1, chapter 12), for storing distances from u to vertices of sets Su and R respectively. The size of each hash-table is of the order of the size of corresponding set (Su or R). A typical hash table would require O(1) expected time to determine whether wSu, and if so, report the distance δ(u, w). In order to achieve O(1) worst case time, the 2 − level hash table (see Fredman, Komlos, Szemeredi, [2]) of optimal size is employed.

The collection of these hash-tables (two tables per vertex) constitute a data-structure that we call approximate distance oracle. Let u, vV be any two vertices whose intermediate distance is to be determined approximately. If either u or v belong to set R, we can report exact distance between the two. Otherwise also exact distance δ(u, v) will be reported if v lies in Su or vice versa. The only case, that is left, is when neither vSu nor uSv. In this case, we report δ(u, p(u)) + δ(v, p(u)) as approximate distance between u and v. This distance is bounded by 3δ(u, v) as shown below.

umath39_1.jpg

fig39_1.jpg

Figure 39.1v is farther to u than p(u), bounding δ(p(u),v) using triangle inequality.

Hence distance reported by the approximate distance oracle described above is no more than three times the actual distance between the two vertices. In other words, the oracle is a 3-approximate distance oracle. Now, we shall bound the expected size of the oracle. Using linearity of expectation, the expected size of the sample set R is . Hence storing the distance from each vertex to all the vertices of sample set will take a total of O(n2γ) space. The following lemma gives a bound on the expected size of the sets Su, uV.

Lemma 39.1

Given a graph G = (V, E), let RV be a set formed by picking each vertex independently with probability γ. For a vertex uV, the expected number of vertices in the set Su is bounded by 1/γ.

Proof. Let {v1, v2, …, vn−1} be the sequence of vertices of set V{u} arranged in non-decreasing order of their distance from u. The set Su consists of all those vertices of the set V{u} that lie closer to u than any vertex of set R. Note that the vertex vi belongs to Su if none of the vertices of set {v1, v2, …, vi−1} (i.e., the vertices preceding vi in the sequence above) is picked in the sample R. Since each vertex is picked independently with probability p, therefore the probability that vertex vi belongs to set Su is (1 − γ)i−1. Using linearity of expectation, the expected number of vertices lying closer to u than any sampled vertex is

n1i=1(1γ)i11γ
i=1n1(1γ)i11γ

Hence the expected number of vertices in the set Su is no more than 1/γ.

So the total expected size of the 3-approximate distance oracle is O(n2γ + n/γ). Choosing γ=1/nγ=1/n to minimize the size, we conclude that there is a 3-approximate distance oracle of expected size n3/2.

39.2.2Preliminaries

In the previous subsection, 3-approximate distance oracle was presented based on the idea II. The (2k − 1)-approximate distance oracle is a k-level hierarchical data-structure. An important construct of the data-structure is Ball(·) defined as follows.

Definition 39.1

For a vertex u, and subsets X, YV, the set Ball(u, X, Y ) is the set consisting of all those vertices of the set X that lie closer to u than any vertex from set Y. (see Figure 39.2)

fig39_2.jpg

Figure 39.2The vertices pointed by solid-arrows constitute Ball(u, X, Y ).

It follows from the definition given above that Ball(u, X, ∅) is the set X itself, whereas Ball(u, X, X) = ∅. It can also be seen that the 3-approximate distance oracle described in the previous subsection stores Ball(u, V, R) and Ball(u, R, ∅) for each vertex uV.

If the set Y is formed by picking each vertex of set X independently with probability γ, it follows from Lemma 39.1 that the expected size of Ball(u, X, Y ) is bounded by 1/γ.

Lemma 39.2

Let G = (V, E) be a weighted graph, and XV be a set of vertices. If YX is formed by selecting each vertex independently with probability γ, the expected number of vertices in Ball(u, X, Y ) for any vertex uV is at-most 1/γ.

39.2.3(2k - 1)-Approximate Distance Oracle

In this subsection we shall give the construction of a (2k − 1)-approximate distance oracle which is also based on the idea II, and can be viewed as a generalization of 3-approximate distance oracle.

The (2k − 1)-approximate distance oracle is obtained as follows.

1.Let R1kR2kRkkR1kR2kRkk be a a hierarchy of subsets of vertices with R1k=VR1k=V, and Rik,i>1Rik,i>1 is formed by selecting each vertex of set Ri1kRi1k independently with probability n−1/k.

2.For each vertex uV, store the distance from u to all the vertices of Ball(u,Rkk,)Ball(u,Rkk,) in a hash table.

3.For each uV and each i < k, store the vertices of Ball(u,Rik,Ri+1k)Ball(u,Rik,Ri+1k) along with their distance from u in a hash table.

For sake of conciseness and without causing any ambiguity in notations, henceforth we shall use Balli(u) to denote Ball(u,Rik,Ri+1k)Ball(u,Rik,Ri+1k) or the corresponding hash-table storing Ball(u,Rik,Ri+1k)Ball(u,Rik,Ri+1k) for i < k.

The collection of the hash-tables Balli(u) : uV, ik constitute the data-structure that will facilitate answering of any approximate distance query in constant time. To provide a better insight into the data-structure, Figure 39.3 depicts the set of vertices constituting {Balli(u) | ik}.

fig39_3.jpg

Figure 39.3(i) Close description of Balli(u), i < k, (ii) hierarchy of balls around u.

39.2.3.1Reporting Distance with Stretch At-most (2k − 1)

Given any two vertices u, vV whose intermediate distance has to be determined approximately. We shall now present the procedure to find approximate distance between the two vertices using the k-level data-structure described above.

Let p1(u) = u and let pi(u), i > 1 be the vertex from the set RikRik nearest to u. Since pi(u) ∈ Balli(u) for each uV, so distance from each u to pi(u) is known for each ik.

The query answering process performs at-most k search steps. In the first step, we search Ball1(u) for the vertex p1(v). If p1(v) is not present in Ball1(u), we move to the next level and in the second step we search Ball2(v) for vertex p2(u). We proceed in this way querying balls of u and v alternatively : In ith step, we search Balli(x) for pi(y), where (x = u, y = v) if i is odd, and (x = v, y = u) otherwise. The search ends at ith step if pi(y) belongs to Balli(x), and then we report δ(x, pi(y)) + δ(y, pi(y) as an approximate distance between u and v.

Distance_Report (u, v)

Algorithm for reporting (2k − 1)-approximate distance between u, vV

l←1,
xu,yv
While (pl(y)Balll(x))(pl(y)Balll(x)) do
            swap(x, y),
            ll + 1
return δ(y, pl(y)) + δ(x, pl(y))

Note that pk(y)Rkkpk(y)Rkk, and we store the distance from x to all the vertices of set RkkRkk in Ballk(x) (which is Ball(x,Rkk,Ball(x,Rkk,). Therefore, the “while loop” of the distance reporting algorithm will execute at-most k − 1 iterations, spending O(1) time querying a hash table in each iteration.

In order to ensure that the above algorithm reports (2k − 1)-approximate distance between u, vV, we first show that the following assertion holds:

AiAi: At the end of ith iteration of the “while loop”, δ(y, pi+1(y)) ≤ (u, v).

The assertion AiAi can be proved using induction on i as follows. First note that the variables x and y take the values u and v alternatively during the “while-loop”. So δ(x, y) = δ(u, v) always.

For the base case (i = 0), p1(y) is same as y, and y is v. So δ(y, p1(y)) = 0. Hence A0A0 is true. For the rest of the inductive proof, it suffices to show that if AjAj is true, then after (j + 1)th iteration Aj+1Aj+1 is also true. The proof is as follows.

We consider the case of “even j,” the arguments for the case of “odd j” are similar. For even j, at the end of jth iteration, {x = u, y = v}, Thus AjAj implies that at the end of jth iteration δ(v, pj+1(v)) ≤ (u, v). Consider the (j + 1)th iteration. For the execution of (j + 1)th iteration, the condition in the “while-loop” must have been true. Thus pj+1(v) does not belong to Ball(u,Rj+1k,Rj+2k)Ball(u,Rj+1k,Rj+2k). Hence by Definition 39.1, the vertex pj+2(u) must be lying closer to u than the vertex pj+1(v). So at the end of (j + 1)th iteration, δ(y, pj+2(y)) can be bounded as follows

δ(y,pj+2(y))=δ(u,pj+2(u))δ(u,pj+1(v))δ(u,v)+δ(v,pj+1(v)){using triangle inequality}δ(u,v)+jδ(u,v){using Aj}=(j+1)δ(u,v)
δ(y,pj+2(y))==δ(u,pj+2(u))δ(u,pj+1(v))δ(u,v)+δ(v,pj+1(v)){using triangle inequality}δ(u,v)+jδ(u,v){using Aj}(j+1)δ(u,v)

Thus the assertion Aj+1Aj+1 holds.

Theorem 39.1

The algorithm Distance_Report(u, v) reports (2k − 1)-approximate distance between u and v

Proof As an approximate distance between u and v, note that the algorithm Distance-Report(u, v) would output δ(y, pl(y)) + δ(x, pl(y)), which by triangle inequality is no more than 2δ(y, pl(y)) + δ(x, y). Since δ(x, y) = δ(u, v), and δ(y, pl(y)) ≤ (l − 1)δ(u, v) as follows from assertion AlAl. Therefore, the distance reported is no more than (2l − 1)δ(u, v). Since the “while loop” will execute at-most k − 1 iterations, so l = k, and therefore the distance reported by the oracle is at-most (2k − 1)δ(u, v).

39.2.3.2Size of the (2k − 1)-approximate Distance Oracle

The expected size of the set RkkRkk is O(n1/k), and the expected size of each Balli(u) is n1/k using Lemma 39.2. So the expected size of the (2k − 1)-approximate distance oracle is O(n1/kn+(k1)nn1/k)=O(kn1+1/k)O(n1/kn+(k1)nn1/k)=O(kn1+1/k).

39.2.4Computing Approximate Distance Oracles

In this subsection, a sub-cubic running time algorithm is presented for computing (2k − 1)-approximate distance oracles. It follows from the description of the data-structure associated with approximate distance oracle that after forming the sampled sets of vertices RikRik, that takes O(m) time, all that is required is the computation of Balli(u) along with the distance from u to the vertices belonging to these balls for each u and ik.

Since Balli(u) is the set of all the vertices of set RikRik that lie closer to u than the vertex pi+1(u). So, in order to compute Balli(u), first we compute pi(u) for all uV, ik.

39.2.4.1Computing pi(u), ∀u ∈ V

Recall from definition itself that pi(u) is the vertex of the set RikRik that is nearest to u. Hence, computing pi(u) for each uV requires solving the following problem with X=Rik,Y=VXX=Rik,Y=VX.

Given X, YV in a graph G = (V, E), with XY = ∅, compute the nearest vertex of set X for each vertex yY.

The above problem can be solved by running a single source shortest path algorithm (Dijkstra’s algorithm) on a modified graph as follows. Modify the original graph G by adding a dummy vertex s to the set V, and joining it to each vertex of the set X by an edge of zero weight. Let G′ be the modified graph. Running Dijkstra’s algorithm from the vertex s as the source, it can be seen that the distance from s to a vertex yY is indeed the distance from y to the nearest vertex of set X. Moreover, if e(s, x), xX is the edge leading to the shortest path from s to y, then x is the vertex from the set X that lies nearest to y. The running time of the Dijkstra’s algorithm is O(m log n), we can thus state the following lemma.

Lemma 39.3

Given X, YV in a graph G = (V, E), with XY = ∅, it takes O(m log n) to compute the nearest vertex of set X for each vertex yY.

Corollary 39.1

Given a weighted undirected graph G = (V, E), and a hierarchy of subsets {Rik|ik}{Rik|ik}, we can compute pi(u) for all ik, uV in O(km log n) time.

39.2.4.2Computing Balli(u) Efficiently

In order to compute Balli(u) for each vertex uV efficiently, we first compute clusters {C(v,Ri+1k)|vRik}{C(v,Ri+1k)|vRik} which are defined as follows:

Definition 39.2

For a graph G = (V, E), and a set XV, the cluster C(v, X) consists of each vertex wV for whom v lies closer than any vertex of set X. That is, δ(w, v) < δ(w, x) for each xX.

It follows from the definition given above that uC(v,Ri+1k)uC(v,Ri+1k) if and only if vBalli(u). So, given clusters {C(v,Ri+1k)|vRik}{C(v,Ri+1k)|vRik}, we can compute {Balli(u) : uV} as follows.

For each vRikvRik do
         For each uC(v,Ri+1k)uC(v,Ri+1k) do
                  Balli(u)←Balli(u)∪{v}

Hence we can state the following Lemma.

Lemma 39.4

Given the family of clusters {C(v,Ri+1k)|vRik}{C(v,Ri+1k)|vRik}, the time required to compute {Balli(u)} is bounded by O(uV|Balli(u)|)O(uV|Balli(u)|).

The following property of the cluster C(v,Ri+1k)C(v,Ri+1k) will be used in its efficient computation.

Lemma 39.5

If uC(v,Ri+1k)uC(v,Ri+1k), then all the vertices on the shortest path from v to u also belong to the set C(v,Ri+1k)C(v,Ri+1k).

Proof. We give a proof by contradiction. Given that uC(v,Ri+1k)uC(v,Ri+1k), let w be any vertex on the shortest path from v to u. If wC(v,Ri+1k)wC(v,Ri+1k), the vertex v doesn’t lie closer to w than the vertex pi+1(w). See Figure 39.4. In other words δ(w, v) ≥ δ(w, pi+1(w).

Hence

δ(u,v)=δ(u,w)+δ(w,v)δ(u,w)+δ(w,pi+1(w))δ(u,pi+1(w))
δ(u,v)=δ(u,w)+δ(w,v)δ(u,w)+δ(w,pi+1(w))δ(u,pi+1(w))

Thus v does not lie closer to u than pi+1(w) which is a vertex of set Ri+1kRi+1k. Hence by definition, uC(v,Ri+1k)uC(v,Ri+1k), thus a contradiction.

fig39_4.jpg

Figure 39.4if w does not lie in C(v,Ri+1k)C(v,Ri+1k), then pi+1(w) would lie closer to u than v.

From Lemma 39.5, it follows that the graph induced by the vertices of the cluster C(v,Ri+1k)C(v,Ri+1k) is connected (hence the name cluster). Moreover, the entire cluster C(v,Ri+1k)C(v,Ri+1k) appears as a sub-tree of the shortest path tree rooted at v in the graph. As follows from the definition, for each vertex xC(v,Ri+1k)xC(v,Ri+1k), δ(v, x) < δ(x, pi+1(x)). Based on these two observations, here follows an efficient algorithm that computes the set C(v,Ri+1k)C(v,Ri+1k). The algorithm performs a restricted Dijkstra’s algorithm from the vertex v, wherein we don’t proceed along any vertex that does not belong to the set C(v,Ri+1k)C(v,Ri+1k).

A restricted Dijkstra’s algorithm: Note that the Dijkstra’s algorithm starts with singleton tree {v} and performs n − 1 steps to grow the complete shortest path tree. Each vertex xV{v} is assigned a label L(x), which is infinity in the beginning, but eventually becomes the distance from v to x. Let Vi denotes the set of i nearest vertices from v. The algorithm maintains the following invariant at the end of lth step:

I(l)I(l) : For all the vertices of the set Vl, the label L(x) = δ(v, x), and for every other vertex yVVl, the label L(y) is equal to the length of the shortest path from v to y that passes through vertices of Vl only.

During the (j + 1)th step, we select the vertex, say w from set VVj with least value of L(·). Since all the edge weights are positive, it follows from the invariant I(j)I(j) that L(w) = δ(w, v). Thus we add w to set Vj to get the set Vj+1. Now in order to satisfy the invariant I(j+1)I(j+1), we relax each edge e(w, y) incident from w to a vertex yVVj+1 as follows: L(y)min{L(y),L(w)+weight(w,y)}L(y)min{L(y),L(w)+weight(w,y)}. It is easy to observe that this ensures the validity of the invariant I(j+1)I(j+1).

In the restricted Dijkstra’s algorithm, we will put the following restriction on relaxation of an edge e(w, y): we relax the edge e(w, y) only if L(w) + weight(w, y) is less than δ(y, pi(y)). This will ensure that a vertex yC(v,Ri+1k)yC(v,Ri+1k) will never be visited during the algorithm. The fact that the vertices of the cluster C(v,Ri+1k)C(v,Ri+1k) form a sub-tree of the shortest path tree rooted at v, ensures that the above restricted Dijkstra’s algorithm indeed finds all the vertices (along with their distance from v) that form the cluster C(v,Ri+1k)C(v,Ri+1k). Since the running time of Dijkstra’s algorithm is dominated by the number of edges relaxed, and each edge relaxation takes log (n) time only, therefore, the restricted Dijkstra’s algorithm will run in time of the order of xC(v,Ri+1k)degree(x)lognxC(v,Ri+1k)degree(x)logn. Thus the total time for computing all the clusters {C(v,Ri+1k)|vRik}{C(v,Ri+1k)|vRik} is given by:

vRik,xC(v,Ri+1k)degree(x)logn=(xV,vBalli(x)degree(x))logn=(xV|Balli(x)|degree(x))logn
vRik,xC(v,Ri+1k)degree(x)logn=xV,vBalli(x)degree(x)logn=(xV|Balli(x)|degree(x))logn

By Lemma 39.2, the expected size of Balli(x) is bounded by n1/k, hence using linearity of expectation, the total expected cost of computing {C(v,Ri+1k)|vRik}{C(v,Ri+1k)|vRik} is asymptotically bounded by

xVn1/kdegree(x)logn=2mn1/klogn
xVn1/kdegree(x)logn=2mn1/klogn

Using the above result and Lemma 39.4, we can thus conclude that for a given weighted graph G = (V, E) and an integer k, it takes a total of ˜O(kmn1/klogn)O~(kmn1/klogn) time for computing {Balli(u) | i < k, uV }. If we use Fibonacci heaps instead of binary heaps in implementation of the restricted Dijkstra’s algorithm, we can get rid of the logarithmic factor in the running time. Hence the total expected running time for building the data-structure is O(kmn1/k). As mentioned before, the expected size of the data-structure will be O(kn1+1/k). To get O(kn1+1/k) bound on the worst case size of the data-structure, we repeat the preprocessing algorithm. The expected number of iterations will be just a constant. Hence, we can state the following theorem.

Theorem 39.2

Given a weighted undirected graph G = (V, E) and an integer k, a data-structure of size O(kn1+1/k) can be built in O(kmn1/k) expected time so that given any pair of vertices, (2k − 1)-approximate distance between them can be reported in O(k) time.

39.3A Randomized Data-Structure for Decremental APASP

There are a number of applications that require efficient solutions of the APASP problem for a dynamic graph. In these applications, an initial graph is given, followed by an on-line sequence of queries interspersed with updates that can be insertion or deletion of edges. We have to carry out the updates and answer the queries on-line in an efficient manner. The goal of a dynamic graph algorithm is to update the solution efficiently after the dynamic changes, rather than having to re-compute it from scratch each time.

The approximate distance oracles described in the previous section can be used for answering approximate distance query in a static graph. However, there does not seem to be any efficient way to dynamize these oracles in order to answer distance queries in a graph under deletion of edges. In this section we shall describe a hierarchical data structure for efficiently maintaining APASP in an undirected unweighted graph under deletion of edges. In addition to maintaining approximate shortest paths for all-pairs of vertices, this scheme has been used for efficiently maintaining approximate shortest paths for pair of vertices separated by distance in an interval [a, b] for any 1 ≤ a < bn. However, to avoid giving too much detail in this chapter, we would outline an efficient algorithm for the following problem only.

APASP-d: Given an undirected unweighted graph G = (V, E) that is undergoing deletion of edges, and a distance parameter d ≤ n, maintain approximate shortest paths for all-pairs of vertices separated by distance at-most d.

39.3.1Main Idea

For an undirected unweighted graph G = (V, E), a breadth-first-search (BFS) tree rooted at a vertex uV stores distance information with respect to the vertex u. So in order to maintain shortest paths for all-pairs of vertices separated by distance ≤ d, it suffices to maintain a BFS tree of depth d rooted at each vertex under deletion of edges. This is the approach taken by the previously existing algorithms.

The main idea underlying the hierarchical data-structure that would provide efficient update time for maintaining APASP can be summarized as follows: Instead of maintaining exact distance information separately from each vertex, keep small BFS trees around each vertex for maintaining distance information within locality of each vertex, and some what larger BFS trees around fewer vertices for maintaining global distance information.

We now provide the underlying intuition of the above idea and a brief outline of the new techniques used.

Let BduBdu denote the BFS tree of depth d rooted at vertex uV. There exists a simple algorithm for maintaining a BFS tree BduBdu under deletion of edges that takes a total of μ(Bdu)dμ(Bdu)d time, where μ(t) is the number of edges in the graph induced by tree t. Thus the total update time for maintaining shortest path for all-pairs separated by distance at-most d is of the order of uVμ(Bdu)duVμ(Bdu)d. Potentially μ(Bdu)μ(Bdu) can be as large as θ(m), and so the total update time over any sequence of edge deletions will be O(mnd). Dividing this total update cost uniformly over the entire sequence of edge deletions, we can see that it takes O(nd) amortized update time per edge deletion, and O(1) time for reporting exact distance between any pair of vertices separated by distance at-most d.

In order to achieve o(nd) bound on the update time for the problem APASP-d, we closely look at the expression of total update time uVμ(Bdu)duVμ(Bdu)d. There are n terms in this expression each of potential size θ(m). A decrease in either the total number of terms or the size of each term would give an improvement in the total update time. Thus the following simple ideas come to mind.

Is it possible to solve the problem APASP-d by keeping very few depth-d BFS trees?

Is there some other alternative t for depth bounded BFS tree BduBdu that has o(m) bound on μ(t)?

While it appears difficult for any of the above ideas to succeed individually, they can be combined in the following way: Build and maintain BFS trees of depth 2d on vertices of a set SV of size o(n), called the set of special vertices, and for each remaining vertex uVS, maintain a BFS tree (denoted by BSuBSu) rooted at u and containing all the vertices that lie closer to u than the nearest special vertex, say N(u,S)N(u,S).

Along the above lines, we present a 2-level data-structure (and its generalization to k-levels) for the problem APASP-d.

It can be seen that unlike the tree BduBdu, the new BFS tree BSuBSu might not contain all the vertices lying within distance d from u. In order to ensure that our scheme leads to a solution of problem APASP-d, we use the following observation similar to that of 3-approximate distance oracle in the previous section. If v is a vertex lying within distance d from u but not present in BSuBSu, an approximate distance from u to v can be extracted from the tree rooted at the nearest special vertex N(u,S)N(u,S). This is because (by triangle inequality) the distance from N(u,S)N(u,S) to v is at most twice the distance from u to v.

For our hierarchical scheme to lead to improved update time, it is crucial that we establish sub-linear upper bounds on μ(BSu)μ(BSu). We show that if the set S is formed by picking each vertex independently with suitable probability, then μ(BSu)=˜O(m/|S|)μ(BSu)=O~(m/|S|) with probability arbitrarily close to 1.

39.3.2Notations

For an undirected unweighted graph G = (V, E), SV, and a distance parameter dn,

δ(u, v): distance between u and v.

N(v,S)N(v,S): the vertex of the set SV nearest to v.

BdvBdv: The BFS tree of depth d rooted at vV.

BSvBSv: The BFS tree of depth (δ(u,N(u,S))1)(δ(u,N(u,S))1) rooted at v.

Bd,SvBd,Sv: The BFS tree of depth min{d,δ(v,N(v,S))1}min{d,δ(v,N(v,S))1} rooted at v.

μ(t): the number of edges in the sub-graph (of G) induced by the tree t.

ν(t): the number of vertices in tree t.

For a sequence {S0,S1,Sk1},SiV{S0,S1,Sk1},SiV, and a vertex uS0, we define

p0(u) = u.

pi+1(u) = the vertex from set Si+1 nearest to pi(u).

¯αα¯¯¯: the smallest integer of the form 2i which is greater than α.

39.3.3Hierarchical Distance Maintaining Data-Structure

Based on the idea of “keeping many small trees, and a few large trees,” we define a k-level hierarchical data-structure for efficiently maintaining approximate distance information as follows. (See Figure 39.5)

fig39_5.jpg

Figure 39.5Hierarchical scheme for maintaining approximate distance.

Let S={S0,S1,,Sk1:SiV,|Si+1|<|Si|}S={S0,S1,,Sk1:SiV,|Si+1|<|Si|} be a sequence. For a given distance parameter dn and i < k − 1, let FiFi be the collection {B2id,Si+1u:uSi}{B2id,Si+1u:uSi} of BFS trees, and Fk1Fk1 be the collection of BFS trees of depth 2k1d2k1d rooted at each uSk−1. We shall denote the set {(S0,F0),(S1,F1),,(Sk1,Fk1)}{(S0,F0),(S1,F1),,(Sk1,Fk1)} as the k-level hierarchy HkdHkd induced by the sequence SS.

Let v be a vertex within distance d from u. If v is present in Bd,S1uBd,S1u, we can report exact distance between them. Otherwise, (as will soon become clear) we can extract the approximate distance between u and v from the collection of the BFS trees rooted at the vertices u, p(u), …, pk−1(u) (see Figure 39.5). The following Lemma is the basis for estimating the distance between two vertices using the hierarchy HkdHkd.

Lemma 39.6

Given a hierarchy HkdHkd, if j < k − 1 is such that v is not present in any of the BFS trees {B2id,Si+1pi(u)|0ij}{B2id,Si+1pi(u)|0ij}, then for all ij

δ(pi+1(u),pi(u))2iδ(u,v)andδ(pi+1(u),v)2i+1δ(u,v).
δ(pi+1(u),pi(u))2iδ(u,v)andδ(pi+1(u),v)2i+1δ(u,v).

Proof. We give a proof by induction on j.

Base Case (j = 0): Since v is not present in BS1uBS1u, so the vertex p(u) must be lying equidistant or closer to u than v. Hence δ(p(u), u) ≤ δ(u, v). Using triangle inequality, it follows that δ(p(u), v) ≤ δ(p(u), u) + δ(u, v) = 2δ(u, v).

Induction Hypothesis:

δ(pi+1(u),pi(u))2iδ(u,v)δ(pi+1(u),pi(u))2iδ(u,v), and

δ(pi+1(u),v)2i+1δ(u,v)δ(pi+1(u),v)2i+1δ(u,v), for all i < l.

Induction Step (j = l): if vBSl+1pl(u)vBSl+1pl(u), then the distance between pl+1 and pl(u) must not be longer than δ(pl(u), v), which is less than 2lδ(u, v) (using induction hypothesis).

Now using triangle inequality (see the Figure 39.6) we can bound δ(pl+1(u), v) as follows.

δ(pl+1(u),v)δ(pl+1(u),pl(u))+δ(pl(u),v)2lδ(u,v)+δ(pl(u),v)2lδ(u,v)+2lδ(u,v){using I.H.}=2l+1δ(u,v)
δ(pl+1(u),v)=δ(pl+1(u),pl(u))+δ(pl(u),v)2lδ(u,v)+δ(pl(u),v)2lδ(u,v)+2lδ(u,v){using I.H.}2l+1δ(u,v)

fig39_6.jpg

Figure 39.6Bounding the approximate distance between pi+1(u) and v.

Since the depth of a BFS tree at (k − 1)th level of hierarchy HkdHkd is 2k−1d, therefore the following corollary holds true.

Corollary 39.2

If δ(u, v) ≤ d, then there is some pi(u), i < k such that v is present in the BFS tree rooted at pi(u) in the hierarchy HkdHkd.

Lemma 39.7

Given a hierarchy HkdHkd, if j < k − 1 is such that v is not present in any of the BFS trees {B2id,Sipi(u)|0ij}{B2id,Sipi(u)|0ij}, then δ(pi+1(u),u)(2i+11)δ(u,v)δ(pi+1(u),u)(2i+11)δ(u,v), for all ij.

Proof. Using simple triangle inequality, it follows that

δ(pi+1(u),u)liδ(pl+1(u),pl(u))li2lδ(u,v)=(2i+11)δ(u,v)
δ(pi+1(u),u)liδ(pl+1(u),pl(u))li2lδ(u,v)=(2i+11)δ(u,v)

It follows from Lemmas 39.6 and 39.7 that if l is the smallest integer such that v is present in the BFS tree rooted at pl(u) in the hierarchy HkdHkd, then we can report δ(pl(u), u) + δ(pl(u), v) as an approximate distance between u and v. Along these lines, we shall present an improved decremental algorithms for APASP-d.

39.3.4Bounding the Size of Bd,SuBd,Su under Edge-Deletions

We shall now present a scheme based on random sampling to find a set SV of vertices that will establish a sub-linear bound on the number of vertices (ν(BSu)ν(BSu)) as well as the number of edges (μ(BSu)μ(BSu)) induced by BSuBSu under deletion of edges. Since Bd,SuBSuBd,SuBSu, so these upper bounds also hold for Bd,SuBd,Su.

Build the set S of vertices by picking each vertex from V independently with probability ncnncn. The expected size of S is O(nc). Consider an ordering of vertices V according to their levels in the BFS tree BSuBSu (see Figure 39.7). The set of vertices lying at higher levels than the nearest sampled vertex in this ordering is what constitutes the BFS tree BSuBSu. Along similar lines as that of Lemma 39.1, it follows that the expected size of this set (and hence ν(BSu)ν(BSu)) is nncnnc. Moreover, it can be shown that ν(BSu)ν(BSu) is no more than 4nlnnnc4nlnnnc with probability >11n4>11n4. Now as the edges are being deleted, the levels of the vertices in the tree BSuBSu may fall, and so the ordering of the vertices may change. There will be a total of m such orderings during the entire course of edge deletions. Since the vertices are picked randomly and independently, therefore, the upper bound of 4nlnnnc4nlnnnc holds for ν(BSu)ν(BSu) with probability (11n4)(11n4) for any of these orderings. So we can conclude that ν(BSu)ν(BSu), the number of vertices of tree BSuBSu never exceeds (4nlnnnc)(4nlnnnc) during the entire course of edge deletions with probability >11n2>11n2.

fig39_7.jpg

Figure 39.7Bounding the size of BFS tree BSuBSu.

To bound the number of edges induced by BSuBSu, consider the following scheme. Pick every edge independently with probability ncmncm. The set S consists of the end points of the sampled edges. The expected size of S is O(nc). Consider an ordering of the edges according to their level in BSuBSu (level of an edge is defined as the minimum of the levels of its end points). Along the lines of arguments given above (for bounding the the number of vertices of BSuBSu), it can be shown that μ(BSu)μ(BSu), the number of edges induced by BSuBSu remains 4mlnnnc4mlnnnc with probability >11n2>11n2 during the entire course of edge deletions.

Note that in the sampling scheme to bound the number of vertices of tree BSuBSu, a vertex v is picked with probability ncnncn. Whereas in the sampling scheme for bounding the number of edges in the sub-graph induced by BSuBSu, a vertex v is picked with probability degree(v)ncmdegree(v)ncm. It can thus be seen that both the bounds can be achieved simultaneously by the following random sampling scheme:

R(c):Pick each vertexvVindependently with probabilityncn+degree(v)ncm.R(c):Pick each vertexvVindependently with probabilityncn+degree(v)ncm.

It is easy to see that the expected size of the set formed by the sampling scheme R(c)R(c) will be O(nc).

Theorem 39.3

Given an undirected unweighted graph G = (V, E), a constant c < 1, and a distance parameter d; a set S of size O(nc) vertices can be found that will ensure the following bound on the number of vertices and number of edges in the sub-graph of G induced by Bd,SuBd,Su.

ν(Bd,Su)=O(nlnnnc),μ(Bd,Su)=O(mlnnnc)
ν(Bd,Su)=O(nlnnnc),μ(Bd,Su)=O(mlnnnc)

with probability Ω(11n2)Ω(11n2) during the entire sequence of edge deletions.

39.3.4.1Maintaining the BFS Tree Bd,SuBd,Su under Edge Deletions

Even and Shiloach [3] design an algorithm for maintaining a depth-d BFS tree in an undirected unweighted graph.

Lemma 39.8: (Even, Shiloach [3])

Given a graph under deletion of edges, a BFS tree Bdu,uVBdu,uV can be maintained in O(d) amortized time per edge deletion.

For maintaining a Bd,SuBd,Su tree under edge deletions, we shall use the same algorithm of [3] with the modification that whenever the depth of Bd,SuBd,Su has to be increased (due to recent edge deletion), we grow the tree to its new level min{d,δ(u,N(u,S))1}min{d,δ(u,N(u,S))1}. We analyze the total update time required for maintaining Bd,SuBd,Su as follows.

There are two computational tasks: one extending the level of the tree, and another that of maintaining the levels of the vertices in the tree Bd,SuBd,Su under edge deletions. For the first task, the time required is bounded by the edges of the new level introduced which is O(μ(Bd,Su))O(μ(Bd,Su)). For the second task, we give a variant of the proof of Even and Shiloach [3] (for details, please refer [3]). The running time is dominated by the processing of the edges in this process. In-between two consecutive processing of an edge, level of one of the end-points of the edge falls down by at least one unit. The processing cost of an edge can thus be charged to the level from which it has fallen. Clearly the maximum number of edges passing a level i is bounded by μ(Bd,Su)μ(Bd,Su). The number of levels in the tree Bd,SuBd,Su is min{d,ν(Bd,Su)}min{d,ν(Bd,Su)}. Thus the total cost for maintaining the BFS tree Bd,SuBd,Su over the entire sequence of edge deletions is O(μ(Bd,Su)min{d,ν(Bd,Su)})O(μ(Bd,Su)min{d,ν(Bd,Su)}).

Lemma 39.9

Given an undirected unweighted graph G = (V, E) under edge deletions, a distance parameter d, and a set SV; a BFS tree Bd,SuBd,Su can be maintained in

O(μ(Bd,Su)mmin{d,ν(Bd,Su)})
O(μ(Bd,Su)mmin{d,ν(Bd,Su)})

amortized update time per edge deletion.

39.3.4.2Some Technical Details

As the edges are being deleted, we need an efficient mechanism to detect any increase in the depth of tree Bd,SuBd,Su. We outline one such mechanism as follows.

For every vertex vS, we keep a count C[v] of the vertices of the S that are neighbors of v. It is easy to maintain C[u], ∀uV under edge-deletions. We use the count C[v] in order to detect any increase in the depth of a tree Bd,SuBd,Su as follows. Note that when depth of a tree Bd,SuBd,Su is less than d, there has to be at-least one vertex w at leaf-level in Bd,SuBd,Su with C[w] ≥ 1 (as an indicator that the vertex p(u) is at next level). Therefore, after an edge deletion if there is no vertex w at leaf level with C[w] ≥ 1, we grow the BFS tree Bd,SuBd,Su beyond its previous level until either depth becomes d or we reach some vertex w′ with C[w′] ≥ 1.

Another technical issue is that when an edge e(x, y) is deleted, we must update only those trees which contain x and y. For this purpose, we maintain for each vertex, a set of roots of all the BFS trees containing it. We maintain this set using any dynamic search tree.

39.3.5Improved Decremental Algorithm for APASP up to Distance d

Let {(S0,F0),(S1,F1),,(Sk1,Fk1)}{(S0,F0),(S1,F1),,(Sk1,Fk1)} be a k-level hierarchy HkdHkd with S0 = V and nci = |Si|, where each ci, i < k is a fraction to be specified soon. Each set Si, i > 0 is formed by picking the vertices from set V using the random sampling scheme RR mentioned in the previous subsection.

To report distance from u to v, we start form the level 0. We first inquire if v lies in Bd,S1uBd,S1u. If v does not lie in the tree, we move to the first level and inquire if v lies in B2d,S2p(u)B2d,S2p(u). It follows from the Corollary 39.2 that if δ(u, v) ≤ d, then proceeding in this way, we eventually find a vertex pl(u), lk − 1 in the hierarchy HkdHkd such that v is present in the BFS tree rooted at pl(u). (See Figure 39.5). We then report the sum of distances from pl(u) to both u and v.

Algorithm for reporting approximate distance using HkdHkd

Distance(u, v)
{ D←0;l←0
      While (vB2ld,Sl+1pl(u)l<k1vB2ld,Sl+1pl(u)l<k1) do
      {
      If uB2ld,Sl+1pl(u), then Dδ(pl(u), u),
      DD + δ(pl(u), pl+1(u))
      ll + 1;
      }
      If vB2ld,Sl+1pl(u), then “δ(u, v) is greater than d”,
                                         else return δ(pl(u), v) + D
}

The approximation factor ensured by the above algorithm can be bounded as follows.

It follows from the Lemma 39.7 that the final value of D in the algorithm given above is bounded by (2l − 1)δ(u, v), and it follows from Lemma 39.6 that δ(pl(u), v) is bounded by 2lδ(u, v). Since lk − 1, therefore the distance reported by the algorithm is bounded by (2k − 1)δ(u, v) if v is at distance ≤ d.

Lemma 39.10

Given an undirected unweighted graph G = (V, E), and a distance parameter d. If α is the desired approximation factor, then there exists a hierarchical scheme Hkd with k=log2¯α, that can report α-approximate shortest distance between any two vertices separated by distance ≤ d, in time O(k).

Update time for maintaining the hierarchy Hdk: The update time per edge deletion for maintaining the hierarchy Hdk is the sum total of the update time for maintaining the set of BFS trees Fi,ik1.

Each BFS tree from the set Fk1 has depth 2k−1d, and edges O(m). Therefore, using Lemma 39.8, each tree from set Fk1 requires O(2k−1d) amortized update time per edge deletion. So, the amortized update time Tk−1 per edge deletion for maintaining the set Fk1 is

Tk1=O(nck12k1d)

It follows from the Theorem 39.3 that a tree t from a set Fi,i<(k1), has μ(t) = m ln n/nci+1, and depth =min{2id,nlnn/nci+1}. Therefore, using the Lemma 39.9, each tree tFi,i<k1 requires O(min{2id/nci+1,nlnn/n2ci+1}) amortized update time per edge deletion. So the amortized update time Ti per edge deletion for maintaining the set Fi is

Ti=O(min{2idncinci+1lnn,n1+cin2ci+1ln2n}),i<k1

Hence, the amortized update time T per edge deletion for maintaining the hierarchy Hdk is

T=Tk1+i<k1Ti=O(nck12k1d)+i=k2i=0O(min{2idncinci+1lnn,n1+cin2ci+1ln2n})

To minimize the sum on right hand side in the above equation, we balance all the terms constituting the sum, and get

T=˜O(2k1min{knd,(nd)2(k1)2k1})

If α is the desired approximation factor, then it follows from Lemma 39.10 that the number of levels k, in the hierarchy are log2¯α. So the amortized update time required is ˜O(αmin{log2¯αnd,(nd)¯α2(¯α1)}).

Theorem 39.4

Let G = (V, E) be an undirected unweighted graph undergoing edge deletions, d be a distance parameter, and α > 2 be the desired approximation factor. There exists a data-structure ˙Dα(1,d) for maintaining α-approximate distances for all-pairs separated by distance ≤ d in ˜O(αmin{log2¯αnd,(nd)¯α2(¯α1)}) amortized update time per edge deletion, and O(log¯α) query time.

Based on the data-structure of [3], the previous best algorithm for maintaining all-pairs exact shortest paths of length ≤ d requires O(nd) amortized update time. We have been able to achieve o(nd) update time at the expense of introducing approximation as shown in Table 39.1 on the following page.

Table 39.1Maintaining α-approximate distances for all-pairs of vertices separated by distance ≤ d.

Data-structure

α (the approximation factor)

Amortized update time per edge deletion

˙D3(1,d)

3

˜O(min(nd,(nd)2/3))

˙D7(1,d)

7

˜O(min(3nd,(nd)4/7))

˙D15(1,d)

15

˜O(min(4nd,(nd)8/15))

39.4Further Reading and Bibliography

Zwick [4] presents a very recent and comprehensive survey on the existing algorithms for all-pairs approximate/exact shortest paths. Based on the fastest known matrix multiplication algorithms given by Coppersmith and Winograd [5], the best bound for computing all-pairs shortest paths is O(n2.575) [6].

Approximate distance oracles are designed by Thorup and Zwick [7]. Based on a 1963 girth conjecture of Erdös [8], they also show that Ω(n1+1/k) space is needed in the worst case for any oracle that achieves stretch strictly smaller than (2k + 1). The space requirement of their approximate distance oracle is, therefore, essentially optimal. Also note that the preprocessing time of (2k − 1)-approximate distance oracle is O(mn1/k), which is sub-cubic. However, for further improvement in the computation time for approximate distance oracles, Thorup and Zwick pose the following question: Can (2k − 1)-approximate distance oracle be computed in ˜O(n2) time? Recently Baswana and Sen [9] answer their question in affirmative for unweighted graphs. However, the question for weighted graphs is still open.

For maintaining fully dynamic all-pairs shortest paths in graphs, the best known algorithm is due to Demetrescu and Italiano [10]. They show that it takes O(n2) amortized time to maintain all-pairs exact shortest paths after each update in the graph. Baswana et al. [11] present a hierarchical data-structure based on random sampling that provides efficient decremental algorithm for maintaining APASP in undirected unweighted graphs. In addition to achieving o(nd) update time for the problem APASP-d (as described in this chapter), they also employ the same hierarchical scheme for designing efficient data-structures for maintaining approximate distance information for all-pairs of vertices separated by distance in an interval [a, b], 1 ≤ a < bn.

Acknowledgments

The work of the first author is supported, in part, by a fellowship from Infosys Technologies Limited, Bangalore.

References

1.T.H. Cormen, C.E. Leiserson, and R.L. Rivest, Introduction to Algorithms, the MIT Press, 1990, chapter 12.

2.M.L. Fredman, J. Komlós, and E. Szemerédi, Storing a sparse table with O(1) worst case time, J. ACM, 31, 538, 1984.

3.S. Even and Y. Shiloach, An on-line edge-deletion problem, J. ACM, 28, 1, 1981.

4.U. Zwick, Exact and approximate distances in graphs - a survey, in Proc. of the 9th European Symposium on Algorithms (ESA), 2001, 33.

5.D. Coppersmith and S. Winograd, Matrix multiplication via arithmetic progressions, J. Symbolic Computation, 9, 251, 1990.

6.U. Zwick, All-pairs shortest paths in weighted directed graphs - exact and almost exact algorithms, in Proc. of the 39th IEEE Symposium on Foundations of Computer Science (FOCS), 1998, 310.

7.M. Thorup and U. Zwick, Approximate distance oracles, in Proc. of 33rd ACM Symposium on Theory of Computing (STOC), 2001, 183.

8.P. Erdös, Extremal problems in graph theory, in Theory of Graphs and its Applications (Proc. Sympos. Smolenice, 1963), Publ. House Czechoslovak Acad. Sci., Prague. 1964, 29.

9.S. Baswana and S. Sen, Approximate distance oracle for unweighted graphs in ˜O(n2) time, to appear in Proc. of the 15th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2004.

10.C. Demetrescu and G.F. Italiano, A new approach to dynamic all pairs shortest paths, in Proc. of 35th ACM Symposium on Theory of Computing (STOC), 2003, 159.

11.S. Baswana, R. Hariharan, and S. Sen, Maintaining all-pairs approximate shortest paths under deletion of edges, in Proc. of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2003, 394.

*This chapter has been reprinted from first edition of this Handbook, without any content updates.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
52.14.119.120