To connect Gephi's graph visualization capabilities with Spark GraphX graphs, we need a format that both sides can work with. The canonical candidate is Gephi's Graph Exchange XML Format (GEXF), which is described at https://gephi.org/gexf/format/. The following listing shows a very simple example of how a graph is described in this format:
<?xml version="1.0" encoding="UTF-8"?>
<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
  <meta lastmodifieddate="2009-03-20">
    <creator>Gexf.net</creator>
    <description>A hello world! file</description>
  </meta>
  <graph mode="static" defaultedgetype="directed">
    <nodes>
      <node id="0" label="Hello" />
      <node id="1" label="Word" />
    </nodes>
    <edges>
      <edge id="0" source="0" target="1" />
    </edges>
  </graph>
</gexf>
Apart from the XML header and the metadata, the graph encoding itself is self-explanatory. Note that the preceding XML is only the bare minimum required to describe a graph. GEXF can also encode other properties, such as edge weights, or even visual attributes that Gephi picks up automatically.
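To make the extra properties mentioned above more concrete, here is a small sketch that renders a weighted edge and a colored node as GEXF strings. The object and method names are hypothetical and chosen for illustration only. It relies on the standard `weight` attribute of GEXF edges and on the `viz:color` element of the GEXF visualization module, whose namespace has to be declared on the root element alongside the core GEXF 1.2 namespace.

```scala
// Hypothetical helpers illustrating GEXF properties beyond the bare minimum
object GexfAttributesSketch {
  // Root element declaring the core GEXF 1.2 namespace and the viz module
  val RootOpen: String =
    """<gexf xmlns="http://www.gexf.net/1.2draft" xmlns:viz="http://www.gexf.net/1.2draft/viz" version="1.2">"""

  // A directed edge carrying a numeric weight attribute
  def weightedEdge(id: Long, source: Long, target: Long, weight: Double): String =
    s"""<edge id="$id" source="$source" target="$target" weight="$weight"/>"""

  // A node with an RGB color from the viz module (label inserted verbatim)
  def coloredNode(id: Long, label: String, r: Int, g: Int, b: Int): String =
    s"""<node id="$id" label="$label"><viz:color r="$r" g="$g" b="$b"/></node>"""
}
```

For example, `GexfAttributesSketch.weightedEdge(0L, 0L, 1L, 2.5)` yields `<edge id="0" source="0" target="1" weight="2.5"/>`, which could replace the unweighted edge of the hello-world listing.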
To connect GEXF with GraphX, let's write a small helper function that takes a GraphX Graph and returns its GEXF representation as a String:
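import org.apache.spark.graphx._
def toGexf[VD, ED](g: Graph[VD, ED]): String = {
  // Fixed preamble: XML declaration, GEXF 1.2 root element, a short
  // description, and the opening tag of a static, directed graph
  val header =
    """<?xml version="1.0" encoding="UTF-8"?>
      |<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
      |  <meta>
      |    <description>A gephi graph in GEXF format</description>
      |  </meta>
      |  <graph mode="static" defaultedgetype="directed">
      |""".stripMargin
  // One <node> per vertex: the VertexId becomes the id, the vertex attribute
  // the label. collect pulls all vertices to the driver, so this suits small
  // graphs. Attributes are inserted verbatim, i.e. they are not XML-escaped.
  val vertices = "<nodes> " + g.vertices.map(
    v => s"""<node id="${v._1}" label="${v._2}"/> """
  ).collect.mkString + "</nodes> "
  // One <edge> per edge, using the edge attribute as its label
  val edges = "<edges> " + g.edges.map(
    e => s"""<edge source="${e.srcId}" target="${e.dstId}" label="${e.attr}"/> """
  ).collect.mkString + "</edges> "
  // Close the <graph> and <gexf> elements opened in the header
  val footer = "</graph> </gexf>"
  header + vertices + edges + footer
}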
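Because toGexf inserts vertex and edge attributes verbatim, a label such as R&D would produce malformed XML. The following sketch, with the hypothetical names GexfLocal, xmlEscape, and toGexfLocal, reproduces the same string building over plain Scala sequences, so it runs without a SparkContext, and escapes labels along the way. It also numbers the edges, as the hello-world listing does. This is an illustrative variant under those assumptions, not a replacement API.

```scala
// Hypothetical Spark-free variant of toGexf, for experimenting locally
object GexfLocal {
  // Escape the characters that are special in XML attribute values;
  // '&' must be replaced first so the other entities are not double-escaped
  def xmlEscape(s: String): String =
    s.replace("&", "&amp;")
      .replace("<", "&lt;")
      .replace(">", "&gt;")
      .replace("\"", "&quot;")
      .replace("'", "&apos;")

  // Same layout as toGexf (minus <meta>), over local (id, label) vertices
  // and (source, target, label) edges
  def toGexfLocal(vertices: Seq[(Long, String)],
                  edges: Seq[(Long, Long, String)]): String = {
    val header =
      """<?xml version="1.0" encoding="UTF-8"?>
        |<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
        |  <graph mode="static" defaultedgetype="directed">
        |""".stripMargin
    val nodes = vertices.map { case (id, label) =>
      s"""<node id="$id" label="${xmlEscape(label)}"/> """
    }.mkString("<nodes> ", "", "</nodes> ")
    // zipWithIndex gives every edge a running id, starting at 0
    val edgeXml = edges.zipWithIndex.map { case ((src, dst, label), i) =>
      s"""<edge id="$i" source="$src" target="$dst" label="${xmlEscape(label)}"/> """
    }.mkString("<edges> ", "", "</edges> ")
    header + nodes + edgeXml + "</graph> </gexf>"
  }
}
```

The same xmlEscape helper could be applied inside the map lambdas of toGexf, for example by writing `${GexfLocal.xmlEscape(v._2.toString)}` in place of `${v._2}`.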
While toGexf might look a bit cryptic at first sight, very little is happening. We define a fixed header and footer for the XML, and in between we map each vertex to a <node> element inside <nodes> and each edge to an <edge> element inside <edges>. To insert vertex IDs and attributes into these elements, we use Scala's string interpolation: in a string prefixed with s, every ${...} expression is replaced by its value.
For a change, let's use toGexf in a complete Scala app built around our simple friend graph from earlier. Note that this assumes toGexf is available to GephiApp, so either define it in the same object or put it in another file and import it from there. If you want to keep using spark-shell, pasting the imports and the body of the main method, excluding the creation of conf and sc, should work without problems:
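import java.io.PrintWriter
import org.apache.spark._
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
object GephiApp {
  def main(args: Array[String]): Unit = {
    // Run Spark locally with four worker threads
    val conf = new SparkConf()
      .setAppName("Gephi Test Writer")
      .setMaster("local[4]")
    val sc = new SparkContext(conf)
    // The friend graph from earlier: vertices are (id, name) pairs...
    val vertices: RDD[(VertexId, String)] = sc.parallelize(
      Array((1L, "Anne"),
        (2L, "Bernie"),
        (3L, "Chris"),
        (4L, "Don"),
        (5L, "Edgar")))
    // ...and each edge carries the relationship as a String attribute
    val edges: RDD[Edge[String]] = sc.parallelize(
      Array(Edge(1L, 2L, "likes"),
        Edge(2L, 3L, "trusts"),
        Edge(3L, 4L, "believes"),
        Edge(4L, 5L, "worships"),
        Edge(1L, 3L, "loves"),
        Edge(4L, 1L, "dislikes")))
    val graph: Graph[String, String] = Graph(vertices, edges)
    // Write as UTF-8 to match the encoding declared in the XML header
    // (the single-argument constructor would use the platform default)
    val pw = new PrintWriter("./graph.gexf", "UTF-8")
    try pw.write(toGexf(graph))
    finally pw.close()
    sc.stop() // skip this line when pasting into spark-shell
  }
}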
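Since GephiApp writes the output of toGexf to ./graph.gexf as-is, it can be worth checking that the file is well-formed XML, and that it contains the expected number of nodes and edges, before importing it. The following sketch uses only the JDK's DOM parser; the object and method names (GexfCheck, parse, counts) are hypothetical. It checks well-formedness only, not conformance to the GEXF schema.

```scala
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Document
import org.xml.sax.InputSource
import org.xml.sax.helpers.DefaultHandler
import scala.util.Try

// Hypothetical sanity check for generated GEXF strings
object GexfCheck {
  // Parse the string into a DOM; the Try fails on malformed XML
  def parse(xml: String): Try[Document] = Try {
    val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    // DefaultHandler rethrows fatal errors instead of printing them to stderr
    builder.setErrorHandler(new DefaultHandler)
    builder.parse(new InputSource(new StringReader(xml)))
  }

  // Count <node> and <edge> elements, or None if the XML is malformed
  def counts(xml: String): Option[(Int, Int)] =
    parse(xml).toOption.map { doc =>
      (doc.getElementsByTagName("node").getLength,
        doc.getElementsByTagName("edge").getLength)
    }
}
```

Reading the file back with `scala.io.Source.fromFile("./graph.gexf", "UTF-8").mkString` and passing the result to `GexfCheck.counts` should give `Some((5, 6))` for the five vertices and six edges of the friend graph.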
GephiApp stores our friend graph in graph.gexf, which we can then import into Gephi. To do so, go to File, click Open, and select this file to import the graph. The following diagram shows the result of this procedure after tweaking the visual attributes with the tabs and methods described earlier: