Using vertex and edge RDDs

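  1. First, we define the vertex and edge RDDs as follows:
// Imports used by the snippets in this section (place at the top of the source file)
import java.util.ArrayList;
import java.util.List;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.graphx.Edge;
import org.apache.spark.graphx.Graph;
import org.apache.spark.storage.StorageLevel;
import scala.Tuple2;
import scala.reflect.ClassTag;

// Each vertex is a (vertex ID, property) pair; GraphX vertex IDs are longs
List<Tuple2<Object, String>> vertices = new ArrayList<>();

vertices.add(new Tuple2<Object, String>(1L, "James"));
vertices.add(new Tuple2<Object, String>(2L, "Robert"));
vertices.add(new Tuple2<Object, String>(3L, "Charlie"));
vertices.add(new Tuple2<Object, String>(4L, "Roger"));
vertices.add(new Tuple2<Object, String>(5L, "Tony"));

// Each edge holds a source vertex ID, a destination vertex ID, and a property
List<Edge<String>> edges = new ArrayList<>();

edges.add(new Edge<String>(1L, 2L, "Friend"));
edges.add(new Edge<String>(2L, 3L, "Advisor"));
edges.add(new Edge<String>(1L, 3L, "Friend"));
edges.add(new Edge<String>(4L, 3L, "colleague"));
edges.add(new Edge<String>(4L, 5L, "Relative"));
edges.add(new Edge<String>(5L, 2L, "BusinessPartners"));

JavaRDD<Tuple2<Object, String>> verticesRDD = javaSparkContext.parallelize(vertices);
JavaRDD<Edge<String>> edgesRDD = javaSparkContext.parallelize(edges);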
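
To make the relationship between the two collections concrete, here is a minimal plain-Java sketch (no Spark involved) that resolves each edge's source and destination IDs against the vertex list defined above. The names PropertyGraphSketch, SimpleEdge, and describe are hypothetical and exist only for this illustration; they are not part of GraphX.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PropertyGraphSketch {

    // Hypothetical stand-in for a GraphX edge: source ID, destination ID, and a property.
    static final class SimpleEdge {
        final long srcId;
        final long dstId;
        final String attr;

        SimpleEdge(long srcId, long dstId, String attr) {
            this.srcId = srcId;
            this.dstId = dstId;
            this.attr = attr;
        }
    }

    // Look up both endpoint IDs of every edge and render the edge as a readable line.
    static List<String> describe(Map<Long, String> vertices, List<SimpleEdge> edges) {
        List<String> lines = new ArrayList<>();
        for (SimpleEdge e : edges) {
            lines.add(vertices.get(e.srcId) + " -[" + e.attr + "]-> " + vertices.get(e.dstId));
        }
        return lines;
    }

    // Same vertex data as in the section above, keyed by vertex ID.
    static Map<Long, String> sampleVertices() {
        Map<Long, String> vertices = new LinkedHashMap<>();
        vertices.put(1L, "James");
        vertices.put(2L, "Robert");
        vertices.put(3L, "Charlie");
        vertices.put(4L, "Roger");
        vertices.put(5L, "Tony");
        return vertices;
    }

    // Same edge data as in the section above.
    static List<SimpleEdge> sampleEdges() {
        List<SimpleEdge> edges = new ArrayList<>();
        edges.add(new SimpleEdge(1L, 2L, "Friend"));
        edges.add(new SimpleEdge(2L, 3L, "Advisor"));
        edges.add(new SimpleEdge(1L, 3L, "Friend"));
        edges.add(new SimpleEdge(4L, 3L, "colleague"));
        edges.add(new SimpleEdge(4L, 5L, "Relative"));
        edges.add(new SimpleEdge(5L, 2L, "BusinessPartners"));
        return edges;
    }

    public static void main(String[] args) {
        describe(sampleVertices(), sampleEdges()).forEach(System.out::println);
    }
}
```

The point of the sketch is only that edges carry vertex IDs rather than vertex objects, so the vertex and edge collections can be built and distributed independently.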
 

The org.apache.spark.graphx.Graph API requires scala.reflect.ClassTag objects for the vertex property type and the edge property type. From Java, these are obtained through the ClassTag companion object, which is a Scala singleton.

  2. A ClassTag can be created by calling the Scala API from Java as follows:
ClassTag<String> stringTag = scala.reflect.ClassTag$.MODULE$.apply(String.class);

Because the properties associated with both vertices and edges are of type String, the same ClassTag object can be used for both.
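
The call above has two unfamiliar parts: the MODULE$ field, which is how Java reaches a Scala singleton, and the class token passed to apply, which carries the element type that generics erase at runtime. The following plain-Java sketch imitates both ideas under simplifying assumptions. Tags and Tag are hypothetical classes written for this illustration and are not the real scala.reflect.ClassTag implementation.

```java
import java.lang.reflect.Array;

public class ClassTokenSketch {

    // Hypothetical analogue of a Scala companion object: exactly one shared instance,
    // exposed through a static field named MODULE$ to mirror what Java sees.
    static final class Tags {
        static final Tags MODULE$ = new Tags();

        private Tags() {
        }

        // Wrap a runtime Class in a typed tag.
        <T> Tag<T> apply(Class<T> cls) {
            return new Tag<>(cls);
        }
    }

    // Hypothetical tag that remembers the runtime class of T despite type erasure.
    static final class Tag<T> {
        private final Class<T> runtimeClass;

        Tag(Class<T> runtimeClass) {
            this.runtimeClass = runtimeClass;
        }

        Class<T> runtimeClass() {
            return runtimeClass;
        }

        // Creating a T[] is impossible with generics alone; the stored class makes it possible.
        @SuppressWarnings("unchecked")
        T[] newArray(int length) {
            return (T[]) Array.newInstance(runtimeClass, length);
        }
    }

    public static void main(String[] args) {
        Tag<String> stringTag = Tags.MODULE$.apply(String.class);
        String[] names = stringTag.newArray(2);
        names[0] = "James";
        names[1] = "Robert";
        System.out.println(names.getClass().getSimpleName() + " of length " + names.length);
    }
}
```

In this sketch, one tag instance serves any number of call sites that need the String type, which is the same reason a single stringTag can be passed for both the vertex and the edge type.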

  3. Using these RDDs, a Graph can be created as follows:
Graph<String, String> graph = Graph.apply(verticesRDD.rdd(), edgesRDD.rdd(), "", StorageLevel.MEMORY_ONLY(), StorageLevel.MEMORY_ONLY(), stringTag, stringTag);
  4. Vertices and edges can be printed using the collect() action as follows:
graph.vertices().toJavaRDD().collect().forEach(System.out::println); 
graph.edges().toJavaRDD().collect().forEach(System.out::println); 
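
The empty string passed as the third argument to Graph.apply is the default vertex attribute: GraphX assigns it to any vertex ID that appears in an edge but is missing from the vertex RDD. In the data above every edge endpoint is defined, so the default is never used. The plain-Java sketch below imitates that behaviour on a smaller, made-up data set; DefaultVertexSketch and resolveVertices are hypothetical names, and the code is an approximation of the idea rather than GraphX's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DefaultVertexSketch {

    // Return every vertex ID mentioned in the vertex map or in an edge.
    // IDs that only appear as edge endpoints receive defaultAttr.
    static Map<Long, String> resolveVertices(Map<Long, String> vertices,
                                             List<long[]> edgeEndpoints,
                                             String defaultAttr) {
        Map<Long, String> resolved = new TreeMap<>(vertices);
        for (long[] edge : edgeEndpoints) {
            resolved.putIfAbsent(edge[0], defaultAttr); // source vertex ID
            resolved.putIfAbsent(edge[1], defaultAttr); // destination vertex ID
        }
        return resolved;
    }

    public static void main(String[] args) {
        Map<Long, String> vertices = new HashMap<>();
        vertices.put(1L, "James");
        vertices.put(2L, "Robert");

        List<long[]> edges = new ArrayList<>();
        edges.add(new long[] {1L, 2L});
        edges.add(new long[] {2L, 6L}); // vertex 6 is not defined above (illustrative only)

        // Vertex 6 ends up in the result with the default attribute.
        System.out.println(resolveVertices(vertices, edges, ""));
    }
}
```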