ShortestPaths

Shortest paths algorithm finds the path between two vertices by starting at the source Vertex and then traversing the edges connecting the vertices to other vertices until it reaches the target vertex. The shortest paths algorithm works by exchanging messages between various vertices. Also this shortest paths algorithm is not directly a part of the Graph or GraphOps objects, rather must be invoked using lib.ShortestPaths():

scala> lib.ShortestPaths.run(graph,Array(1)).vertices.join(users).take(10)

res204: Array[(org.apache.spark.graphx.VertexId, (org.apache.spark.graphx.lib.ShortestPaths.SPMap, User))] = Array((4,(Map(1 -> 2),User(Liz,Doctor))), (6,(Map(1 -> 2),User(Beth,Accountant))), (8,(Map(1 -> 2),User(Mary,Cashier))), (10,(Map(1 -> 1),User(Ken,Librarian))), (2,(Map(1 -> 1),User(Mark,Doctor))), (1,(Map(1 -> 0),User(John,Accountant))), (3,(Map(1 -> 1),User(Sam,Lawyer))), (7,(Map(1 -> 2),User(Larry,Engineer))), (9,(Map(1 -> 3),User(Dan,Doctor))), (5,(Map(1 -> 2),User(Eric,Accountant))))

ShortestPaths picks the shortest paths in terms of number of hops between the two vertices. The following diagram shows three ways John can reach Larry and two of the paths are of length 2 and one of length 3. From the results of the preceding code, it clearly shows that the path chosen from Larry to John is of length 2.

The same is shown in the output in above code block as a vector containing the length of the path and the nodes (7,(Map(1 -> 2),User(Larry,Engineer))):

We can also compute the shortest path using weighted edges, which means every edge connecting users is not the same. For example, if we can consider the edge value/weight/attribute as the distance between where each user lives, we get a weighted graph. In this case, the shortest path is calculated by the distance between two users in miles:

scala> val srcId = 1 //vertex ID 1 is the user John
srcId: Int = 1

scala> val initGraph = graph.mapVertices((id, x) => if(id == srcId) 0.0 else Double.PositiveInfinity)
initGraph: org.apache.spark.graphx.Graph[Double,Long] = org.apache.spark.graphx.impl.GraphImpl@2b9b8608

scala> val weightedShortestPath = initGraph.pregel(Double.PositiveInfinity, 5)(
| (id, dist, newDist) => math.min(dist, newDist),
| triplet => {
| if (triplet.srcAttr + triplet.attr < triplet.dstAttr) {
| Iterator((triplet.dstId, triplet.srcAttr + triplet.attr))
| }
| else {
| Iterator.empty
| }
| },
| (a, b) => math.min(a, b)
| )
weightedShortestPath: org.apache.spark.graphx.Graph[Double,Long] = org.apache.spark.graphx.impl.GraphImpl@1f87fdd3

scala> weightedShortestPath.vertices.take(10).mkString(" ")
res247: String =
(4,10.0)
(6,6.0)
(8,6.0)
(10,5.0)
(2,1.0)
(1,0.0)
(3,3.0)
(7,7.0)
(9,5.0)
(5,4.0)

The following is a diagram that uses Pregel API to compute the Single Source Shortest Path from John to Larry starting from initialization and iteration by iteration until we reach the best paths.

Initialization of the graph is done by setting the value of vertex representing John to zero and all other vertices to positive infinity:

Once the initialization is complete, we will use Pregel for four iterations of recomputing the vertex values. In each iteration, we go through all the vertices and, at each vertex, check whether there is a better path from a source vertex to a destination vertex. If there is such an edge/path, then the vertex value is updated.

Let's define two functions distance(v) and distance(s, t), where distance(v) gives the value of a vertex and distance(s,t) gives the value of the edge connecting s to t.

In Iteration 1, every user except John is set to infinity and John is at 0, since he is the source vertex. Now, we use Pregel to loop through the vertices and check whether there is anything better than infinity. Using Ken as an example, we will check if distance("John") + distance("John", "Ken") < distance("Ken").

This is equivalent to checking whether 0 + 5 < Infinity, which is true; so we update Ken's distance to 5.

Similarly, we check for Mary, distance("Ken") + distance("Ken", "Mary") < distance("Mary"), which turns out to be false, since at that time Ken is still at infinity. Hence, in iteration 1, we could only update the users who are connected to John.

In the next iteration, Mary, Liz, Eric and so on, are all updated since now we have updated values for Ken, Mark, and Sam from iteration 1. This continues for a number of iterations specified in the Pregel API call.

Shown below are the illustrations of the various iterations when computing single source shortest path on the graph:

The shortest paths from John to Larry after four iterations shows that the shortest path is five miles. The path from John to Larry can be seen if you follow the path John | Mark | Sam | Larry:

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.191.215.117