Constructing a graph using the vertex

Constructing a graph is not a trivial task: we need to supply the vertices and the edges between them. Let's focus on the vertices first. In our example, the vertices are users, held in an RDD of (VertexId, String) pairs, as follows:

package com.tomekl007.chapter_7

import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import org.scalatest.FunSuite

class VertexAPI extends FunSuite {
  val spark: SparkContext = SparkSession.builder().master("local[2]").getOrCreate().sparkContext

  test("Should use Vertex API") {
    //given
    val users: RDD[(VertexId, (String))] =
      spark.parallelize(Array(
        (1L, "a"),
        (2L, "b"),
        (3L, "c"),
        (4L, "d")
      ))

VertexId is not a separate type; GraphX defines it as a type alias for Long:

type VertexId = Long

Because a graph can hold a very large number of vertices, the VertexId is a 64-bit Long, and every vertex in the vertices RDD must have a unique VertexId. The custom data associated with a vertex can be of any class, but for simplicity we will use String. We first create a vertex with ID 1 and string data a, then one with ID 2 and data b, one with ID 3 and data c, and finally one with ID 4 and data d, as follows:

    val users: RDD[(VertexId, (String))] =
      spark.parallelize(Array(
        (1L, "a"),
        (2L, "b"),
        (3L, "c"),
        (4L, "d")
      ))
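
Since every vertex needs a unique VertexId, it can be worth checking the input before handing it to Spark. The following is a minimal sketch in plain Scala, not part of the chapter's test class; the object name VertexIdSketch and the helper duplicateIds are hypothetical, and the local type alias only mirrors the GraphX one so that the snippet runs without Spark on the classpath:

```scala
object VertexIdSketch {
  // Local stand-in that mirrors the GraphX alias (assumption: used here only
  // so the sketch compiles without a Spark dependency).
  type VertexId = Long

  // Hypothetical helper: returns the IDs that occur more than once.
  def duplicateIds[A](vertices: Seq[(VertexId, A)]): Set[VertexId] =
    vertices
      .groupBy(_._1)
      .collect { case (id, group) if group.size > 1 => id }
      .toSet

  def main(args: Array[String]): Unit = {
    val users: Seq[(VertexId, String)] =
      Seq((1L, "a"), (2L, "b"), (3L, "c"), (4L, "d"))

    // Because VertexId is only an alias, a VertexId is usable as a plain Long.
    val firstId: Long = users.head._1
    println(s"first id as Long: $firstId")

    // The four users from the text have distinct IDs.
    println(s"duplicates in users: ${duplicateIds(users)}")

    // Adding a second vertex with ID 1 is reported as a duplicate.
    println(s"duplicates after adding (1, e): ${duplicateIds(users :+ ((1L, "e")))}")
  }
}
```

The check runs on a local collection for clarity; with an RDD the same idea could be expressed by counting the distinct keys and comparing with the total count.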

Creating a graph from vertices alone is valid but not very useful. A graph is a natural way to model relationships between pieces of data, which is why graphs are the main building block of social networks.
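
To illustrate the point about a vertices-only graph, here is a hedged sketch that builds a GraphX graph from the same four users and an empty edge RDD. The object name VerticesOnlyGraph, the build method, and the choice of String as the edge attribute type are assumptions made for illustration; the chapter itself goes on to define real edges.

```scala
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object VerticesOnlyGraph {
  // Hypothetical helper: builds a graph that has vertices but no edges.
  def build(spark: SparkSession): Graph[String, String] = {
    val sc = spark.sparkContext

    val users: RDD[(VertexId, String)] =
      sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c"), (4L, "d")))

    // An empty edge RDD; the edge attribute type String is an arbitrary choice.
    val noEdges: RDD[Edge[String]] = sc.emptyRDD[Edge[String]]

    Graph(users, noEdges)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("vertices-only-graph")
      .getOrCreate()
    try {
      val graph = build(spark)
      // The graph is valid, but without edges it carries no relationships.
      println(s"vertices = ${graph.numVertices}, edges = ${graph.numEdges}")
    } finally {
      spark.stop()
    }
  }
}
```

Run locally, this should report four vertices and zero edges, which is why the next step is to connect the users with edges.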
