Connected components

Connected components are subgraphs of a graph in which every vertex can be reached from every other vertex by following edges, regardless of edge direction, and no vertex has an edge to or from a vertex outside the subgraph. A vertex with no edge connecting it to an existing component forms a new component of its own. As a result, every vertex ends up in exactly one component.

The graph object provides a connectedComponents() function to compute the connected components. It uses the Pregel API underneath to calculate the component each vertex belongs to, and it labels every vertex with the lowest vertex ID in its component. The following code calculates the connected components of the graph. In this example, the graph has only one connected component, so every user is shown with 1 as the component ID:

scala> graph.connectedComponents.vertices.collect
res198: Array[(org.apache.spark.graphx.VertexId, org.apache.spark.graphx.VertexId)] = Array((4,1), (6,1), (8,1), (10,1), (2,1), (1,1), (3,1), (7,1), (9,1), (5,1))

scala> graph.connectedComponents.vertices.join(users).take(10)

res197: Array[(org.apache.spark.graphx.VertexId, (org.apache.spark.graphx.VertexId, User))] = Array((4,(1,User(Liz,Doctor))), (6,(1,User(Beth,Accountant))), (8,(1,User(Mary,Cashier))), (10,(1,User(Ken,Librarian))), (2,(1,User(Mark,Doctor))), (1,(1,User(John,Accountant))), (3,(1,User(Sam,Lawyer))), (7,(1,User(Larry,Engineer))), (9,(1,User(Dan,Doctor))), (5,(1,User(Eric,Accountant))))
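
The component ID of 1 in the output above is the lowest vertex ID in the single component. To illustrate how such labels can arise, here is a minimal plain-Scala sketch of min-label propagation that needs no Spark. It is not the GraphX implementation: it runs sequentially on local collections, and the names ComponentSketch and minLabelComponents are hypothetical, introduced only for this illustration. It assumes edges are treated as undirected and that every edge endpoint appears in the vertex set.

```scala
object ComponentSketch {
  // Hypothetical helper, not part of GraphX: labels every vertex with the
  // smallest vertex ID found in its connected component.
  // Assumes each edge endpoint is contained in `vertices`.
  def minLabelComponents(vertices: Set[Long], edges: Seq[(Long, Long)]): Map[Long, Long] = {
    // Every vertex starts in its own component, labelled with its own ID.
    var labels: Map[Long, Long] = vertices.map(v => v -> v).toMap
    var changed = true
    // Repeat until a full pass over the edges changes no label.
    while (changed) {
      changed = false
      for ((src, dst) <- edges) {
        // Both endpoints of an edge adopt the smaller of their two labels.
        val low = math.min(labels(src), labels(dst))
        if (labels(src) != low || labels(dst) != low) {
          labels = labels + (src -> low) + (dst -> low)
          changed = true
        }
      }
    }
    labels
  }
}
```

For example, calling ComponentSketch.minLabelComponents with vertices 1 to 5 and edges (1,2), (2,3), and (4,5) gives vertices 1, 2, and 3 the label 1 and vertices 4 and 5 the label 4, that is, two components. Labels only ever decrease, so the loop is guaranteed to stop.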