EdgeRDD

The EdgeRDD represents the set of Edges between the vertices and is a member of the Graph class as seen earlier. EdgeRDD, just like VertexRDD, extends from RDD and takes both Edge attributes and Vertex attributes.

EdgeRDD[ED, VD] extends RDD[Edge[ED]] by storing the edges in columnar format on each partition for performance. It may additionally store the vertex attributes associated with each edge to provide the triplet view:

class EdgeRDD[ED]() extends RDD[Edge[ED]]

EdgeRDD also implements many functions, which provide important functionality related to graph operations. Each function typically accepts inputs of edges represented by EdgeRDD. Each Edge consists of a source vertexId, destination vertexId and edge attributes such as a String, Integer, or any case class. In the following example, we use a String friend as the attribute. Later in this chapter, we use the distance in miles (Integer) as the attribute.

We can create EdgeRDD by reading a file of pairs of vertexIds:

Source Vertex ID Target/Destination Vertex ID Distance in Miles
1 3 5
3 1 5
1 2 1
2 1 1
4 10 5
10 4 5
1 10 5
10 1 5
2 7 6
7 2 6
7 4 3
4 7 3
2 3 2

 

Each line of the friends.txt file contains the source vertexId and destination vertexId, so we can use the String split function here:

scala> val friends = sc.textFile("friends.txt").map{ line =>
val fields = line.split(",")
Edge(fields(0).toLong, fields(1).toLong, "friend")
}
friends: org.apache.spark.rdd.RDD[org.apache.spark.graphx.Edge[String]] = MapPartitionsRDD[2648] at map at <console>:125

scala> friends.take(10)
res109: Array[org.apache.spark.graphx.Edge[String]] = Array(Edge(1,3,friend), Edge(3,1,friend), Edge(1,2,friend), Edge(2,1,friend), Edge(4,10,friend), Edge(10,4,friend), Edge(1,10,friend), Edge(10,1,friend), Edge(2,7,friend), Edge(7,2,friend))

We now have vertices and edges, so it is time to put everything together and explore how we can build a Graph from the lists of vertices and edges:

scala> val graph = Graph(users, friends)
graph: org.apache.spark.graphx.Graph[User,String] = org.apache.spark.graphx.impl.GraphImpl@327b69c8

scala> graph.vertices
res113: org.apache.spark.graphx.VertexRDD[User] = VertexRDDImpl[2658] at RDD at VertexRDD.scala:57

scala> graph.edges
res114: org.apache.spark.graphx.EdgeRDD[String] = EdgeRDDImpl[2660] at RDD at EdgeRDD.scala:41

Using the Graph object, we can look at the vertices and edges using the collect() function, which will show all vertices and edges. Each vertex is of the form (VertexId, User) and each edge is of the form (srcVertexId, dstVertexId, edgeAttribute).

scala> graph.vertices.collect
res111: Array[(org.apache.spark.graphx.VertexId, User)] = Array((4,User(Liz,Doctor)), (6,User(Beth,Accountant)), (8,User(Mary,Cashier)), (10,User(Ken,Librarian)), (2,User(Mark,Doctor)), (1,User(John,Accountant)), (3,User(Sam,Lawyer)), (7,User(Larry,Engineer)), (9,User(Dan,Doctor)), (5,User(Eric,Accountant)))

scala> graph.edges.collect
res112: Array[org.apache.spark.graphx.Edge[String]] = Array(Edge(1,2,friend), Edge(1,3,friend), Edge(1,10,friend), Edge(2,1,friend), Edge(2,3,friend), Edge(2,7,friend), Edge(3,1,friend), Edge(3,2,friend), Edge(3,10,friend), Edge(4,7,friend), Edge(4,10,friend), Edge(7,2,friend), Edge(7,4,friend), Edge(10,1,friend), Edge(10,4,friend), Edge(3,5,friend), Edge(5,3,friend), Edge(5,9,friend), Edge(6,8,friend), Edge(6,10,friend), Edge(8,6,friend), Edge(8,9,friend), Edge(8,10,friend), Edge(9,5,friend), Edge(9,8,friend), Edge(10,6,friend), Edge(10,8,friend))

Now that we have a graph created, we will look at various operations in the next section.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.145.64.241