Constructing a graph is not a trivial task: we need to supply vertices and the edges between them. Let's focus on the first part, the vertices. Our vertices are our users, an RDD of (VertexId, String) pairs, as follows:
package com.tomekl007.chapter_7
import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import org.scalatest.FunSuite
class VertexAPI extends FunSuite {
  val spark: SparkContext = SparkSession.builder().master("local[2]").getOrCreate().sparkContext

  test("Should use Vertex API") {
    //given
    // vertices: pairs of (unique VertexId, data attached to the vertex)
    val users: RDD[(VertexId, String)] =
      spark.parallelize(Array(
        (1L, "a"),
        (2L, "b"),
        (3L, "c"),
        (4L, "d")
      ))
VertexId is not a separate class; it is simply a type alias for Long, defined in the org.apache.spark.graphx package object:
type VertexId = Long
Because VertexId is a 64-bit Long, there is plenty of room for distinct IDs even in very large graphs. Every vertex in our vertices' RDD should have a unique VertexId. The custom data associated with a vertex can be of any class, but for simplicity we will use String. We create four vertices: ID 1 with the string a, ID 2 with b, ID 3 with c, and ID 4 with d, as follows:
val users: RDD[(VertexId, String)] =
  spark.parallelize(Array(
    (1L, "a"),
    (2L, "b"),
    (3L, "c"),
    (4L, "d")
  ))
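The uniqueness requirement is easy to check before building a graph. The following is a minimal plain-Scala sketch, not the book's code: it redefines GraphX's type VertexId = Long alias locally and holds the same four users in an ordinary Seq instead of an RDD, so it runs without Spark. The object name VertexIdCheck and the helper duplicateIds are hypothetical.

```scala
// Plain-Scala sketch (no Spark required). Assumption: we mirror GraphX's
// alias `type VertexId = Long` locally so the snippet compiles on its own.
object VertexIdCheck {
  type VertexId = Long

  // Same vertex data as the `users` RDD, kept in a local Seq instead.
  val users: Seq[(VertexId, String)] = Seq(
    (1L, "a"),
    (2L, "b"),
    (3L, "c"),
    (4L, "d")
  )

  // Hypothetical helper: returns every ID that occurs more than once.
  // An empty result means all vertex IDs are unique.
  def duplicateIds(vertices: Seq[(VertexId, String)]): Set[VertexId] =
    vertices
      .groupBy(_._1)                                  // ID -> all pairs with that ID
      .collect { case (id, vs) if vs.size > 1 => id } // keep only repeated IDs
      .toSet

  def main(args: Array[String]): Unit = {
    println(duplicateIds(users))         // prints Set()
    val clash = users :+ ((2L, "e"))     // reuses ID 2 for a different user
    println(duplicateIds(clash))         // prints Set(2)
  }
}
```

On an actual RDD, a comparable check could compare users.keys.distinct().count() with users.count(); the two are equal only when no ID repeats.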