outerJoinVertices

Joining datasets is a common operation in analytics systems. The outerJoinVertices operation joins the vertices of a graph with an external RDD, which is useful when properties held outside the graph need to be merged into it.

The following is the signature of the outerJoinVertices transformation:

outerJoinVertices(RDD<scala.Tuple2<Object,U>> other, scala.Function3<Object,VD,scala.Option<U>,VD2> mapFunc, scala.reflect.ClassTag<U> evidence$13, scala.reflect.ClassTag<VD2> evidence$14, scala.Predef.$eq$colon$eq<VD,VD2> eq)

Here, the first parameter is the dataset to join with. Each vertex ID in the graph is matched against the key (the first element) of the tuples in that dataset. The second parameter is the user-defined join function. As the name of the operation suggests, it works as an outer join: every vertex is kept, so the join function receives a scala.Option that is empty when no match for the vertex ID is found in the external dataset.
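The outer-join behavior described above can be sketched without Spark. The following is only an illustration in plain Java collections, not the GraphX API: the class OuterJoinSketch, the interface JoinFunction, and the sample names are all hypothetical, and java.util.Optional stands in for scala.Option.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the outer-join semantics; this is NOT GraphX code.
class OuterJoinSketch {

    // Same shape as the join function: (vertexId, vertexProperty, optionalMatch) -> newProperty.
    interface JoinFunction<VD, U, VD2> {
        VD2 apply(long vertexId, VD property, Optional<U> match);
    }

    // Every vertex is kept. The Optional is empty when `other` has no entry
    // for the vertex ID; entries in `other` without a vertex are ignored.
    static <VD, U, VD2> Map<Long, VD2> outerJoin(Map<Long, VD> vertices,
                                                 Map<Long, U> other,
                                                 JoinFunction<VD, U, VD2> f) {
        Map<Long, VD2> result = new LinkedHashMap<>();
        for (Map.Entry<Long, VD> e : vertices.entrySet()) {
            Optional<U> match = Optional.ofNullable(other.get(e.getKey()));
            result.put(e.getKey(), f.apply(e.getKey(), e.getValue(), match));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data for the sketch.
        Map<Long, String> vertices = new LinkedHashMap<>();
        vertices.put(1L, "Ann");
        vertices.put(2L, "Bob");

        Map<Long, String> lastNames = new LinkedHashMap<>();
        lastNames.put(1L, "Wilson"); // vertex 2 has no match

        Map<Long, String> joined = outerJoin(vertices, lastNames,
                (id, first, last) -> last.map(l -> first + " " + l).orElse(first));

        System.out.println(joined); // prints {1=Ann Wilson, 2=Bob}
    }
}
```

Vertex 2 has no entry in the external map, so its join function sees an empty Optional and the vertex keeps its original property.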

The remaining parameters are the Scala ClassTag objects and eq, which were described in the previous section. Next, let's define the external RDD. The vertex properties of the graph defined in the preceding sections are random names. Here we define an RDD that holds last names for those names, keyed by vertex ID:

// Each tuple pairs a vertex ID (the join key) with a last name.
List<Tuple2<Object, String>> dataToJoin = new ArrayList<>();
dataToJoin.add(new Tuple2<Object, String>(1L, "Wilson"));
dataToJoin.add(new Tuple2<Object, String>(2L, "Harmon"));
dataToJoin.add(new Tuple2<Object, String>(3L, "Johnson"));
dataToJoin.add(new Tuple2<Object, String>(4L, "Peterson"));
dataToJoin.add(new Tuple2<Object, String>(5L, "Adams"));
JavaRDD<Tuple2<Object, String>> dataToJoinRdd = javaSparkContext.parallelize(dataToJoin);

The following user-defined function defines how each vertex is combined with its match in the outer join:

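import java.io.Serializable;
import scala.Option;
import scala.runtime.AbstractFunction3;

public class AbsFunc1 extends AbstractFunction3<Object, String, Option<String>, String> implements Serializable {
    // o is the vertex ID, s1 the current vertex property, s2 the optional match from the external RDD.
    @Override
    public String apply(Object o, String s1, Option<String> s2) {
        if (s2.isEmpty()) {
            return s1;
        } else {
            return s1 + " " + s2.get();
        }
    }
}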

If no match is found, the function returns the first name unchanged; otherwise, it appends the last name to it.

The outerJoinVertices operation can then be executed as follows:

// The two ClassTags describe U (the external value type) and VD2 (the new vertex type); both are String here.
Graph<String, String> outerJoinVertices = graph.outerJoinVertices(
    dataToJoinRdd.rdd(),
    new AbsFunc1(),
    scala.reflect.ClassTag$.MODULE$.apply(String.class),
    scala.reflect.ClassTag$.MODULE$.apply(String.class),
    scala.Predef.$eq$colon$eq$.MODULE$.tpEquals());
outerJoinVertices.vertices().toJavaRDD().collect().forEach(System.out::println);

In this section, we learned about various graph-based operations. In the next section, we will implement some graph algorithms using the GraphX library.
