VertexRDD

As seen earlier, VertexRDD is an RDD containing the vertices and their associated attributes. Each element in the RDD represents a vertex (node) in the graph. To keep vertices distinct, each one needs a unique ID, and for this purpose GraphX defines a very important identifier known as VertexId.

VertexId is defined as a 64-bit vertex identifier that uniquely identifies a vertex within a graph. It need not follow any ordering or satisfy any constraint other than uniqueness.

VertexId is declared simply as an alias for a 64-bit Long:

type VertexId = Long
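Because VertexId is nothing more than a Long, any scheme for producing IDs works as long as every vertex gets a different value. When the source data identifies entities by name rather than by number, the names must be mapped to IDs first. The following plain-Scala sketch illustrates this; it redeclares the alias locally so it runs without Spark, and the helper assignIds is hypothetical, not part of GraphX.

```scala
object VertexIdSketch {
  // Mirrors GraphX's alias (org.apache.spark.graphx.VertexId); redeclared here
  // so this sketch compiles without Spark on the classpath.
  type VertexId = Long

  // Hypothetical helper: gives each distinct key its own VertexId.
  // The IDs only have to be unique; their order carries no meaning.
  def assignIds(keys: Seq[String]): Map[String, VertexId] =
    keys.distinct.zipWithIndex.map { case (key, i) => key -> (i + 1).toLong }.toMap

  // "Mark" appears twice but receives a single ID.
  val ids: Map[String, VertexId] = assignIds(Seq("John", "Mark", "Sam", "Mark"))
}
```

On an RDD, the same idea is usually expressed with zipWithIndex or zipWithUniqueId, both of which yield Long values that can serve as VertexIds.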

VertexRDD extends an RDD of (VertexId, vertex attribute) pairs, RDD[(VertexId, VD)]. It also guarantees that there is only one entry for each vertex, and it preindexes the entries for fast, efficient joins: two VertexRDDs that share the same index can be joined efficiently.

abstract class VertexRDD[VD](sc: SparkContext, deps: Seq[Dependency[_]]) extends RDD[(VertexId, VD)](sc, deps)
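To make the idea of preindexing concrete, here is a deliberately simplified, single-machine model. It is a sketch under stated assumptions, not how GraphX implements VertexRDD: the class IndexedVertices and its methods are hypothetical. An index maps each VertexId to a slot, attributes are stored by slot, duplicates are dropped on construction, and a mapValues-style transformation reuses the same index, so joining the two results can pair attributes by position instead of probing the other side's index.

```scala
object PreindexedJoinSketch {
  type VertexId = Long

  // Hypothetical model (not the GraphX implementation): `index` maps each
  // VertexId to a slot, and `values(slot)` holds that vertex's attribute.
  final class IndexedVertices[VD](val index: Map[VertexId, Int], val values: Vector[VD]) {
    // New attributes, same index object: the result stays "co-indexed".
    def mapValues[VD2](f: VD => VD2): IndexedVertices[VD2] =
      new IndexedVertices(index, values.map(f))

    def toMap: Map[VertexId, VD] = index.map { case (id, slot) => id -> values(slot) }

    def innerJoin[U, VD2](other: IndexedVertices[U])(f: (VertexId, VD, U) => VD2): Map[VertexId, VD2] =
      if (index eq other.index) {
        // Fast path: slot i describes the same vertex on both sides, so the
        // attributes are paired by position without probing other.index.
        index.map { case (id, slot) => id -> f(id, values(slot), other.values(slot)) }
      } else {
        // Slow path: look every vertex up in the other side's index.
        index.flatMap { case (id, slot) =>
          other.index.get(id).map(o => id -> f(id, values(slot), other.values(o))).toList
        }
      }
  }

  object IndexedVertices {
    // Keeps exactly one entry per VertexId (the first one seen).
    def apply[VD](pairs: Seq[(VertexId, VD)]): IndexedVertices[VD] = {
      val (unique, _) = pairs.foldLeft((Vector.empty[(VertexId, VD)], Set.empty[VertexId])) {
        case ((acc, seen), (id, vd)) =>
          if (seen(id)) (acc, seen) else (acc :+ (id -> vd), seen + id)
      }
      new IndexedVertices(unique.map(_._1).zipWithIndex.toMap, unique.map(_._2))
    }
  }
}
```

According to the GraphX programming guide, two VertexRDDs derived from the same base VertexRDD (for example by filter or mapValues) can be joined without hash evaluations, which is the behavior this toy model imitates with its shared index.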

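VertexRDD also implements many functions that provide important functionality for graph operations, such as filter, mapValues, minus, diff, leftJoin, innerJoin, and aggregateUsingIndex. The joins and set-style operations accept another collection of vertices (a VertexRDD or an RDD of (VertexId, attribute) pairs) and return a new VertexRDD.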
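The sketch below exercises a few of these operations on a local SparkContext. It assumes Spark with GraphX on the classpath; the vertices are a small subset of the users listed in this section, User has the same shape as the case class declared in this section, and the experience attribute and message counts are made up purely for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{VertexId, VertexRDD}
import org.apache.spark.rdd.RDD

case class User(name: String, occupation: String)

object VertexRDDOpsSketch {
  // Runs a few VertexRDD operations locally and returns sorted plain collections.
  def run(): (List[(VertexId, String)], List[(VertexId, String)], List[(VertexId, Int)]) = {
    val conf = new SparkConf()
      .setAppName("vertexrdd-ops-sketch")
      .setMaster("local[2]")
      .set("spark.ui.enabled", "false")
    val sc = new SparkContext(conf)
    try {
      val users: RDD[(VertexId, User)] = sc.parallelize(Seq(
        1L -> User("John", "Accountant"),
        2L -> User("Mark", "Doctor"),
        4L -> User("Liz", "Doctor"),
        9L -> User("Dan", "Doctor")))
      val vertices: VertexRDD[User] = VertexRDD(users)

      // filter and mapValues both return VertexRDDs, not plain RDDs.
      val doctorNames: VertexRDD[String] =
        vertices.filter { case (_, u) => u.occupation == "Doctor" }.mapValues((u: User) => u.name)

      // Hypothetical extra attribute (years of experience), for illustration only.
      val experience: RDD[(VertexId, Int)] = sc.parallelize(Seq(2L -> 12, 9L -> 3))

      // innerJoin keeps only vertices present on both sides (Liz has no entry).
      val withExperience: VertexRDD[String] =
        doctorNames.innerJoin(experience)((_, name, years) => s"$name:$years")

      // aggregateUsingIndex merges messages that share a VertexId with the reduce function.
      val messages: RDD[(VertexId, Int)] = sc.parallelize(Seq(1L -> 1, 2L -> 1, 2L -> 1, 9L -> 1))
      val counts: VertexRDD[Int] = vertices.aggregateUsingIndex(messages, (a: Int, b: Int) => a + b)

      (doctorNames.collect().sortBy(_._1).toList,
        withExperience.collect().sortBy(_._1).toList,
        counts.collect().sortBy(_._1).toList)
    } finally {
      sc.stop()
    }
  }
}
```

The explicit parameter types on the mapValues and aggregateUsingIndex lambdas are a defensive choice: mapValues is overloaded (one variant also receives the VertexId), and annotating the lambdas avoids relying on type inference to pick the intended variant.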

Let's load vertices into a VertexRDD of users. For this, we shall first declare a case class User as shown here:

case class User(name: String, occupation: String)

Now, let's build the user vertices from the file users.txt, whose contents are:

1,John,Accountant
2,Mark,Doctor
3,Sam,Lawyer
4,Liz,Doctor
5,Eric,Accountant
6,Beth,Accountant
7,Larry,Engineer
8,Mary,Cashier
9,Dan,Doctor
10,Ken,Librarian


Each line of users.txt holds the VertexId, the name, and the occupation, separated by commas, so we can parse it with the String split function:

scala> val users = sc.textFile("users.txt").map{ line =>
val fields = line.split(",")
(fields(0).toLong, User(fields(1), fields(2)))
}
users: org.apache.spark.rdd.RDD[(Long, User)] = MapPartitionsRDD[2645] at map at <console>:127

scala> users.take(10)
res103: Array[(Long, User)] = Array((1,User(John,Accountant)), (2,User(Mark,Doctor)), (3,User(Sam,Lawyer)), (4,User(Liz,Doctor)), (5,User(Eric,Accountant)), (6,User(Beth,Accountant)), (7,User(Larry,Engineer)), (8,User(Mary,Cashier)), (9,User(Dan,Doctor)), (10,User(Ken,Librarian)))
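The REPL session above produces an RDD[(Long, User)], which is not yet a VertexRDD. Since VertexId is an alias for Long, that RDD already has the element type RDD[(VertexId, User)], and VertexRDD(users) builds the indexed structure from it; the Spark documentation notes that duplicate entries are removed arbitrarily. The following is a minimal sketch assuming Spark with GraphX on the classpath. The input lines are inlined with sc.parallelize as a stand-in for sc.textFile("users.txt"), and parseUser is a hypothetical helper that skips a header row or malformed line instead of failing on toLong.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{VertexId, VertexRDD}
import org.apache.spark.rdd.RDD
import scala.util.Try

case class User(name: String, occupation: String)

object UsersToVertexRDDSketch {
  // Hypothetical defensive parser: returns None for a header such as
  // "VertexID,Name,Occupation" or any line without exactly three fields.
  def parseUser(line: String): Option[(VertexId, User)] =
    line.split(",").map(_.trim) match {
      case Array(id, name, occupation) =>
        Try(id.toLong).toOption.map(vid => vid -> User(name, occupation))
      case _ => None
    }

  // Returns the vertex count and the sorted vertices.
  def run(): (Long, List[(VertexId, User)]) = {
    val conf = new SparkConf()
      .setAppName("users-vertexrdd-sketch")
      .setMaster("local[2]")
      .set("spark.ui.enabled", "false")
    val sc = new SparkContext(conf)
    try {
      // Stand-in for sc.textFile("users.txt") so the sketch runs without the file.
      val lines = sc.parallelize(Seq(
        "VertexID,Name,Occupation",
        "1,John,Accountant",
        "2,Mark,Doctor",
        "3,Sam,Lawyer",
        "3,Sam,Lawyer")) // deliberate duplicate VertexId
      val users: RDD[(VertexId, User)] = lines.flatMap(line => parseUser(line).toList)
      // VertexRDD.apply indexes the pairs and keeps one entry per VertexId.
      val vertices: VertexRDD[User] = VertexRDD(users)
      (vertices.count(), vertices.collect().sortBy(_._1).toList)
    } finally sc.stop()
  }
}
```

Running it should report three vertices: the header line is skipped by the parser, and the duplicate entry for VertexId 3 collapses into one.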