Summary

This chapter has introduced the Titan graph database from Aurelius and shown how it can be installed and configured on a Linux cluster. Using a Titan Gremlin shell example, graphs have been created and stored in both the HBase and Cassandra NoSQL databases. The choice of Titan storage option will depend upon your project requirements: HBase for HDFS-based storage, or Cassandra for non-HDFS-based storage. This chapter has also shown that you can use the Gremlin shell both interactively, to develop graph scripts, and from Bash shell scripts, so that you can run scheduled jobs with associated logging.

Simple Spark Scala code has been provided, which shows that Apache Spark can access the underlying tables that Titan creates on both HBase and Cassandra. This has been achieved by using the database connector modules provided by Cloudera (for HBase) and DataStax (for Cassandra). All example code and build scripts have been described along with the example output. I have included the Scala-based section to show you that the graph-based data can be accessed in Scala. The Gremlin-based section before it processed data from the Gremlin shell and used Spark as a processing backend. The Scala-based section uses Spark as the main processing engine and accesses Titan data from Spark. If the Gremlin shell is not suitable for your requirements, you might consider this approach. As Titan matures, so will the ways in which you can integrate Titan with Spark via Scala.

Finally, Titan's Gremlin shell has been used along with Apache Spark to demonstrate simple methods for creating and accessing Titan-based graphs. To do this, data has been stored on the file system, in Cassandra, and in HBase.

Google groups are available for Aurelius and Gremlin users via the URLs at https://groups.google.com/forum/#!forum/aureliusgraphs and https://groups.google.com/forum/#!forum/gremlin-users.

The community seems smaller than those of many Apache projects, so posting volume can be somewhat light, and it can be difficult to get a response to posts.

DataStax, the people who created Cassandra, acquired Aurelius, the creators of Titan, this year. The creators of Titan are now involved in the development of DataStax's DSE graph database, which may have a knock-on effect on Titan's development. Having said that, the 0.9.x Titan release has been created, and a 1.0 release is expected.

So, having shown some of the Titan functionality with the help of examples in both Scala and Gremlin, I will close the chapter here. I wanted to show the pairing of Spark-based graph processing with a graph storage system. I like open source systems for their speed of development and accessibility. I am not saying that Titan is the database for you, but it is a good example. If its future can be assured and its community grows, then as it matures, it could offer a valuable resource.

Note that two versions of Spark have been used in this chapter: 1.3 and 1.2.1. The earlier version, 1.2.1, was required because it was apparently the only version that would work with Titan's SparkGraphComputer, and so avoided Kryo serialization errors.

In the next chapter, extensions to the Apache Spark MLlib machine learning library will be examined in the form of the H2O product (http://h2o.ai/). A neural network-based deep learning example will be developed in Scala to demonstrate its potential functionality.
