- First, we define the vertex and edge RDDs as follows:
import java.util.ArrayList;
import java.util.List;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.graphx.Edge;
import scala.Tuple2;

// Vertices: (vertex ID, name). GraphX vertex IDs are longs, boxed here as Object.
List<Tuple2<Object, String>> vertices = new ArrayList<>();
vertices.add(new Tuple2<Object, String>(1L, "James"));
vertices.add(new Tuple2<Object, String>(2L, "Robert"));
vertices.add(new Tuple2<Object, String>(3L, "Charlie"));
vertices.add(new Tuple2<Object, String>(4L, "Roger"));
vertices.add(new Tuple2<Object, String>(5L, "Tony"));

// Edges: (source vertex ID, destination vertex ID, relationship).
List<Edge<String>> edges = new ArrayList<>();
edges.add(new Edge<String>(1, 2, "Friend"));
edges.add(new Edge<String>(2, 3, "Advisor"));
edges.add(new Edge<String>(1, 3, "Friend"));
edges.add(new Edge<String>(4, 3, "colleague"));
edges.add(new Edge<String>(4, 5, "Relative"));
edges.add(new Edge<String>(5, 2, "BusinessPartners"));

// javaSparkContext is the JavaSparkContext of the running application.
JavaRDD<Tuple2<Object, String>> verticesRDD = javaSparkContext.parallelize(vertices);
JavaRDD<Edge<String>> edgesRDD = javaSparkContext.parallelize(edges);
The org.apache.spark.graphx.Graph API requires scala.reflect.ClassTag objects for the vertex type and the edge type. From Java, a ClassTag is obtained through the ClassTag companion object, a Scala singleton that is exposed as ClassTag$.MODULE$.
- These can be defined by calling the Scala APIs in Java as follows:
import scala.reflect.ClassTag;

ClassTag<String> stringTag = scala.reflect.ClassTag$.MODULE$.apply(String.class);
Because the properties associated with both vertices and edges are of type String, the preceding ClassTag object can be used for both.
- Using these RDDs, the graph can be created with Graph.apply() as follows:
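import org.apache.spark.graphx.Graph;
import org.apache.spark.storage.StorageLevel;

Graph<String, String> graph = Graph.apply(
    verticesRDD.rdd(),           // vertex RDD
    edgesRDD.rdd(),              // edge RDD
    "",                          // default attribute for vertices that appear only in edges
    StorageLevel.MEMORY_ONLY(),  // storage level for edges
    StorageLevel.MEMORY_ONLY(),  // storage level for vertices
    stringTag,                   // ClassTag of the vertex property type
    stringTag);                  // ClassTag of the edge property type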
- Vertices and edges can be printed using the collect() action as follows:
graph.vertices().toJavaRDD().collect().forEach(System.out::println);
graph.edges().toJavaRDD().collect().forEach(System.out::println);
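To check what the graph above should contain, the same vertices and edges can be modeled in plain Java without Spark. The following is only a minimal sketch for illustration: the GraphSketch class, the SimpleEdge type, and the describe() helper are hypothetical names and are not part of GraphX. It resolves each edge's source and destination IDs to the corresponding names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GraphSketch {

    // Hypothetical stand-in for org.apache.spark.graphx.Edge; not the Spark class.
    static final class SimpleEdge {
        final long srcId;
        final long dstId;
        final String attr;

        SimpleEdge(long srcId, long dstId, String attr) {
            this.srcId = srcId;
            this.dstId = dstId;
            this.attr = attr;
        }
    }

    // Same vertex data as in the GraphX example: vertex ID -> name.
    static Map<Long, String> vertices() {
        Map<Long, String> v = new LinkedHashMap<>();
        v.put(1L, "James");
        v.put(2L, "Robert");
        v.put(3L, "Charlie");
        v.put(4L, "Roger");
        v.put(5L, "Tony");
        return v;
    }

    // Same edge data as in the GraphX example, in the same order.
    static List<SimpleEdge> edges() {
        List<SimpleEdge> e = new ArrayList<>();
        e.add(new SimpleEdge(1, 2, "Friend"));
        e.add(new SimpleEdge(2, 3, "Advisor"));
        e.add(new SimpleEdge(1, 3, "Friend"));
        e.add(new SimpleEdge(4, 3, "colleague"));
        e.add(new SimpleEdge(4, 5, "Relative"));
        e.add(new SimpleEdge(5, 2, "BusinessPartners"));
        return e;
    }

    // Renders each edge as "source name -[relationship]-> destination name".
    static List<String> describe(Map<Long, String> vertices, List<SimpleEdge> edges) {
        List<String> out = new ArrayList<>();
        for (SimpleEdge e : edges) {
            out.add(vertices.get(e.srcId) + " -[" + e.attr + "]-> " + vertices.get(e.dstId));
        }
        return out;
    }

    public static void main(String[] args) {
        describe(vertices(), edges()).forEach(System.out::println);
    }
}
```

Running main prints one line per edge, starting with James -[Friend]-> Robert. Comparing these lines with the collected vertices and edges of the Spark graph is a quick way to confirm that the RDDs were built as intended.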