MongoDB is an open source NoSQL database written in C++. MongoDB stores documents in a JSON-like binary format called BSON, which differs considerably from the flexible table format of Cassandra. This chapter discusses the procedure to migrate a BSON document stored in a MongoDB server to a table in a Cassandra database.
To set up the environment, you must install the following software:
MongoDB Windows binaries from http://www.mongodb.org/downloads. Extract the TGZ or ZIP file to a directory and add C:\MongoDB\mongodb-win32-x86_64-2008plus-2.4.9\bin to the PATH environment variable.
MongoDB Java driver JAR from http://central.maven.org/maven2/org/mongodb/mongo-java-driver/.
Eclipse IDE for Java EE developers from http://www.eclipse.org/downloads/.
Apache Commons Lang 2.6 commons-lang-2.6-bin.zip from http://commons.apache.org/proper/commons-lang/download_lang.cgi. Extract it to the commons-lang-2.6-bin directory.
Hector Java client hector-core-1.1-4.jar or a later version from http://repo2.maven.org/maven2/org/hectorclient/hector-core/1.1-4/.
Apache Cassandra 2.0.4 from http://cassandra.apache.org/download/. Add C:\Cassandra\apache-cassandra-2.0.4\bin to the PATH variable.
Start Apache Cassandra server with the following command:
>cassandra -f
Apache Cassandra is started, as shown in Figure 7.1.
Start MongoDB server with the following command:
>mongod
MongoDB server is started, as shown in Figure 7.2.
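Before moving on, it can help to confirm that both servers are actually accepting connections. The following is a minimal sketch using only the Java standard library; the PortCheck class and its isListening method are hypothetical helpers introduced here for illustration, and the ports 9160 and 27017 are the ones used in the connection code later in this chapter.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of the MongoDB or Hector libraries:
// reports whether a TCP port accepts connections.
final class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolved host: treat as "not listening".
            return false;
        }
    }

    public static void main(String[] args) {
        // 9160 and 27017 are the ports used by this chapter's client code.
        System.out.println("Cassandra (9160): " + isListening("localhost", 9160, 2000));
        System.out.println("MongoDB (27017): " + isListening("localhost", 27017, 2000));
    }
}
```

If either line prints false, check the corresponding server console for startup errors before running the migration code.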
In this section, you will create a Java project in Eclipse IDE to migrate a MongoDB document to Apache Cassandra. Follow these steps:
1. Select File > New > Other.
2. In the New dialog box, select Java Project or Java > Java Project. Then click Next, as shown in Figure 7.3.
3. In the New Java Project dialog box, specify a project name (MigrateMongoDB), select the Use Default Location checkbox, select JDK 1.7 as the JRE (Use Default JRE may already be selected), and click Next, as shown in Figure 7.4.
4. In the Java Settings dialog box, select the default settings. Select Allow Output Folders for Source Folders. Then click Finish. A Java project, MigrateMongoDB, is created.
5. Add two Java classes, CreateMongoDBDocument and MongoDBToCassandra. The CreateMongoDBDocument class creates a BSON document in MongoDB, and the MongoDBToCassandra class migrates that BSON document from MongoDB to Apache Cassandra. To add a Java class, select File > New > Other. Then, in the New dialog box, select Java > Class and click Next. Finally, in the New Java Class wizard, specify a package name and a class name and click Finish. The directory structure of the MigrateMongoDB project is shown in Figure 7.5.
6. Next, you must add some JAR files for Cassandra and MongoDB to the project class path. Add the JAR files listed in Table 7.1. These JAR files are from the Cassandra server download, the MongoDB server download, the Hector Java client for Cassandra, and some third-party JARs.
To add the required JARs, right-click the project node in Package Explorer and select Properties. Then, in the Properties dialog box, select Java Build Path. Finally, click the Add External JARs button to add the external JAR files. The JARs added to the migration project are shown in Figure 7.6.
Before you can migrate anything, MongoDB needs some data. Here, you will create a document in MongoDB using the Java application CreateMongoDBDocument. The main package for the MongoDB Java driver is com.mongodb. A client connection to the MongoDB server is represented with the MongoClient class. A MongoClient object provides connection pooling, and only one instance is required for the application. Create a MongoClient instance using the MongoClient(List<ServerAddress> seeds) constructor. Supply the host name (or IPv4 address) of the server and the port, 27017.
MongoClient mongoClient = new MongoClient(Arrays.asList(new ServerAddress ("localhost", 27017)));
A logical database in MongoDB is represented with the com.mongodb.DB class. Obtain a com.mongodb.DB instance for the local database, which is a default MongoDB database instance, using the getDB(String dbname) method in the MongoClient class. MongoDB stores data in collections. Get the names of all collections in the database instance using the getCollectionNames() method in the com.mongodb.DB class.
DB db = mongoClient.getDB("local");
Set<String> colls = db.getCollectionNames();
The getCollectionNames() method returns a Set<String> of collection names. Iterate over the set to output each name.
for (String s : colls) { System.out.println(s); }
A MongoDB collection is represented with the DBCollection class. Create a new DBCollection instance using the createCollection(String name, DBObject options) method in the com.mongodb.DB class. You specify the options to create a collection using a key/value map represented with the DBObject interface. The options that may be specified are listed in Table 7.2.
Create a collection called catalog and set the options to null:
DBCollection coll = db.createCollection("catalog", null);
A MongoDB-specific BSON object is represented with the BasicDBObject class, which implements the DBObject interface. The BasicDBObject class provides the constructors listed in Table 7.3 to create a new instance.
The BasicDBObject class provides some other utility methods, some of which are listed in Table 7.4.
Create a BasicDBObject instance using the BasicDBObject(String key, Object value) constructor and use the append(String key, Object val) method to append key/value pairs:
BasicDBObject catalog = new BasicDBObject("journal","Oracle Magazine").append("publisher", "Oracle Publishing").append("edition", "November December 2013").append("title", "Engineering as a Service").append("author", "David A. Kelly");
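The chained append() calls work because each call returns the object it was invoked on. The following sketch illustrates that pattern with a hypothetical SketchDocument class built on java.util.LinkedHashMap; it is a stand-in for illustration only and does not use or reproduce the driver's BasicDBObject.

```java
import java.util.LinkedHashMap;

// Hypothetical stand-in used only for illustration; this is not the driver's BasicDBObject.
final class SketchDocument extends LinkedHashMap<String, Object> {
    private static final long serialVersionUID = 1L;

    SketchDocument(String key, Object value) {
        put(key, value);
    }

    // Returning "this" is what allows append(...).append(...) chains.
    SketchDocument append(String key, Object value) {
        put(key, value);
        return this;
    }

    public static void main(String[] args) {
        SketchDocument catalog = new SketchDocument("journal", "Oracle Magazine")
                .append("publisher", "Oracle Publishing")
                .append("edition", "November December 2013");
        // A LinkedHashMap iterates its keys in insertion order.
        System.out.println(catalog.keySet());
    }
}
```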
The DBCollection class provides an overloaded insert method to add one or more BasicDBObject instances to a collection. Add the catalog BasicDBObject to the DBCollection instance for the catalog collection:
coll.insert(catalog);
The DBCollection class also provides an overloaded findOne() method to find a DBObject instance. Obtain the document just added using the findOne() method:
DBObject catalog = coll.findOne();
Output the DBObject found by iterating over the Set obtained from the DBObject using the keySet() method. The keySet() method returns a Set<String>. Create an Iterator from the Set<String> using the iterator() method. While the Iterator has elements, as determined by the hasNext() method, obtain each element using the next() method. Each element is a key in the DBObject fetched. Obtain the value for the key using the get(String key) method in DBObject.
Set<String> set = catalog.keySet();
Iterator<String> iter = set.iterator();
while (iter.hasNext()) {
    String key = iter.next();
    System.out.println(key);
    System.out.println(catalog.get(key));
}
The CreateMongoDBDocument class appears in Listing 7.1.
Listing 7.1 The CreateMongoDBDocument Class
package mongodb;

import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.Set;

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;
import com.mongodb.ServerAddress;

public class CreateMongoDBDocument {
    public static void main(String[] args) {
        try {
            // Connect to the MongoDB server on the default port.
            MongoClient mongoClient = new MongoClient(
                    Arrays.asList(new ServerAddress("localhost", 27017)));
            DB db = mongoClient.getDB("local");

            // List the collections already in the database.
            Set<String> colls = db.getCollectionNames();
            for (String s : colls) {
                System.out.println(s);
            }

            // Create the catalog collection and add one document to it.
            // Declared as DBObject so the variable can be reused for the document read back.
            DBCollection coll = db.createCollection("catalog", null);
            DBObject catalog = new BasicDBObject("journal", "Oracle Magazine")
                    .append("publisher", "Oracle Publishing")
                    .append("edition", "November December 2013")
                    .append("title", "Engineering as a Service")
                    .append("author", "David A. Kelly");
            coll.insert(catalog);

            // Read a document back and print it whole, then as key/value pairs.
            catalog = coll.findOne();
            System.out.println(catalog);
            Set<String> set = catalog.keySet();
            Iterator<String> iter = set.iterator();
            while (iter.hasNext()) {
                String key = iter.next();
                System.out.println(key);
                System.out.println(catalog.get(key));
            }
        } catch (UnknownHostException e) {
            e.printStackTrace();
        }
    }
}
To run the CreateMongoDBDocument application, right-click the CreateMongoDBDocument.java file in Package Explorer and select Run As > Java Application, as shown in Figure 7.7.
A new BSON document is stored in a new collection, catalog, in the MongoDB database. The stored document is also output, both as a whole and as key/value pairs, as shown in Figure 7.8.
In this section, you will query the BSON document stored earlier in the MongoDB server and migrate it to a Cassandra database using the MongoDBToCassandra class. First, create a MongoClient instance, as you did in the previous section when adding the document.
MongoClient mongoClient = new MongoClient(Arrays.asList(new ServerAddress ("localhost", 27017)));
Create a DB object for the local database instance using the getDB(String dbname) method in MongoClient. Using the DB instance, get the catalog collection as a DBCollection object. Create a DBObject instance from the document stored in MongoDB in the previous section using the findOne() method in the DBCollection class.
DB db = mongoClient.getDB("local"); DBCollection coll = db.getCollection("catalog"); DBObject catalog = coll.findOne();
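Note that findOne() returns null when the collection holds no documents, in which case the later call to catalog.keySet() fails with a NullPointerException. One way to fail earlier with a clearer message is a small guard such as the following sketch; the Documents class and its requireFound method are hypothetical names introduced here and are not part of the MongoDB driver.

```java
// Hypothetical helper, not part of the MongoDB driver.
final class Documents {

    /** Returns the document unchanged, or fails with a descriptive message if it is null. */
    static <T> T requireFound(T document, String collectionName) {
        if (document == null) {
            throw new IllegalStateException(
                    "No document found in collection '" + collectionName + "'");
        }
        return document;
    }

    public static void main(String[] args) {
        // A String stands in for the DBObject returned by findOne().
        Object found = requireFound("stand-in document", "catalog");
        System.out.println(found);
    }
}
```

In the migration class, the guard would wrap the query, for example catalog = Documents.requireFound(coll.findOne(), "catalog");.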
Next, you will migrate the resulting DBObject to the Cassandra database. Some of the procedures for migrating MongoDB to Cassandra are the same as for migrating Couchbase to Cassandra, which is discussed in Chapter 8, “Migrating Couchbase to Cassandra.”
The me.prettyprint.hector.api.Cluster interface represents a cluster of Cassandra hosts. To access a Cassandra cluster, create a Cluster instance using the static getOrCreateCluster(String clusterName, String hostIp) method in HFactory as follows:
Cluster cluster = HFactory.getOrCreateCluster("hector-cluster", "localhost:9160");
Next, create a schema if one is not already defined. A schema consists of a column family definition and a keyspace definition. Use the describeKeyspace method in Cluster to obtain a KeyspaceDefinition object for the HectorKeyspace keyspace. If the keyspace definition object is null, invoke a createSchema() method to create a schema.
KeyspaceDefinition keyspaceDef = cluster.describeKeyspace("HectorKeyspace"); if (keyspaceDef == null) { createSchema(); }
As discussed in Chapter 1, “Using Cassandra with Hector,” add a createSchema() method to create a column family definition and a keyspace definition for the schema. Create a column family definition for a column family named "catalog" in the keyspace HectorKeyspace, using the comparator ComparatorType.BYTESTYPE.
ColumnFamilyDefinition cfDef = HFactory.createColumnFamilyDefinition("HectorKeyspace", "catalog", ComparatorType.BYTESTYPE);
Use a replicationFactor of 1 to create a KeyspaceDefinition instance from the preceding column family definition. Specify the strategy class as org.apache.cassandra.locator.SimpleStrategy using the constant ThriftKsDef.DEF_STRATEGY_CLASS.
int replicationFactor = 1;
KeyspaceDefinition keyspace = HFactory.createKeyspaceDefinition("HectorKeyspace", ThriftKsDef.DEF_STRATEGY_CLASS, replicationFactor, Arrays.asList(cfDef));
Add the keyspace definition to the Cluster instance. With blockUntilComplete set to true, the method blocks until schema agreement is received.
cluster.addKeyspace(keyspace, true);
Adding a keyspace definition to a Cluster instance does not give the client a Keyspace object to work with. Having added the keyspace definition, you need to create a Keyspace instance. Add a createKeyspace() method for this and invoke it from the main method. A keyspace is represented with the me.prettyprint.hector.api.Keyspace interface. The HFactory class provides static methods to create a Keyspace instance from a Cluster instance to which a keyspace definition has been added. Invoke the createKeyspace(String keyspace, Cluster cluster) method to create a Keyspace instance with the name HectorKeyspace.
private static void createKeyspace() { keyspace = HFactory.createKeyspace("HectorKeyspace", cluster); }
Next, add a createTemplate() method to create a template, and invoke the method from the main method. Templates provide a reusable construct containing the fields common to all Hector client operations. Create an instance of ThriftColumnFamilyTemplate using the class constructor ThriftColumnFamilyTemplate(Keyspace keyspace, String columnFamily, Serializer<K> keySerializer, Serializer<N> topSerializer). Use the Keyspace instance created earlier and specify the column family name as "catalog".
ColumnFamilyTemplate<String, String> template = new ThriftColumnFamilyTemplate<String, String>(keyspace, "catalog", StringSerializer.get(), StringSerializer.get());
Next, you will migrate the data represented with the DBObject instance retrieved from MongoDB to the column family "catalog" in the keyspace HectorKeyspace. Add a method called migrate() and invoke it from the main method. In the Hector API, the Mutator interface is used to add data. First, create a Mutator instance using the static method createMutator(Keyspace keyspace, Serializer<K> keySerializer) in HFactory. Supply the Keyspace instance previously created and a StringSerializer instance.
Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
Obtain a Set object from the DBObject using the keySet() method and create an Iterator from the Set object.
Set<String> set = catalog.keySet();
Iterator<String> iter = set.iterator();
The Mutator interface provides the addInsertion(K key, String cf, HColumn<N, V> c) method, which adds an HColumn instance and returns the Mutator instance, so the call can be repeated to add further HColumn instances. Using the Iterator obtained from the key set of the DBObject retrieved from the MongoDB BSON document, you will add multiple columns to a Mutator instance through a series of addInsertion() invocations.
Using the Iterator, test for remaining elements with the hasNext() method and obtain each key of the BSON document's key/value pairs as an Object with the next() method. Specify the key for the Cassandra row as catalog1. The column family name is catalog. In the while loop, add one column per document field to the Mutator instance using addInsertion(). Create the HColumn<String,String> instances, which represent columns, using the static HFactory method createStringColumn(String name, String value). Obtain each column name using the obj.toString() method, and obtain the corresponding column value from the DBObject instance created from the BSON document using the catalog.get(obj.toString()).toString() method invocation.
while (iter.hasNext()) { Object obj = iter.next(); mutator = mutator.addInsertion("catalog1","catalog", HFactory.createStringColumn(obj.toString(), catalog.get(obj.toString()).toString())); }
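Two details of this loop are worth keeping in mind. The document returned by findOne() also carries the _id field that MongoDB assigns on insert, so it is migrated as a column as well; and calling toString() on a field whose value is null throws a NullPointerException. The following sketch shows one way to make the conversion null-safe. It uses a plain java.util.Map as a stand-in for the DBObject, and the ColumnMapping class, its toStringColumns method, and the sample fields pages and subtitle are hypothetical names used only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; a plain Map stands in for the DBObject read from MongoDB.
final class ColumnMapping {

    /** Converts each document field to a String column value; null fields are skipped. */
    static Map<String, String> toStringColumns(Map<String, ?> document) {
        Map<String, String> columns = new LinkedHashMap<>();
        for (Map.Entry<String, ?> field : document.entrySet()) {
            Object value = field.getValue();
            if (value == null) {
                continue; // nothing sensible to store for a null field
            }
            columns.put(field.getKey(), String.valueOf(value));
        }
        return columns;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("journal", "Oracle Magazine");
        doc.put("pages", 42);        // hypothetical non-string field
        doc.put("subtitle", null);   // hypothetical null field
        System.out.println(toStringColumns(doc));
    }
}
```

Skipping null fields is a design choice for this sketch; an application might instead store an empty string or a marker value.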
The mutations added to the Mutator instance are not sent to the Cassandra server until the execute() method is invoked:
mutator.execute();
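The queue-then-execute behavior can be pictured with a small in-memory model. The following sketch is not Hector's Mutator and makes no claim about how Hector is implemented; the BatchSketch class and its methods are hypothetical, and a nested Map stands in for the column family. It only illustrates that insertions are collected first and become visible after execute().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of "queue insertions, then execute"; not Hector's Mutator.
final class BatchSketch {
    // rowKey -> (columnName -> value), standing in for a column family.
    private final Map<String, Map<String, String>> store = new LinkedHashMap<>();
    // Each pending entry is {rowKey, columnName, value}.
    private final List<String[]> pending = new ArrayList<>();

    /** Queues one column insertion and returns this batch so calls can be chained. */
    BatchSketch addInsertion(String rowKey, String column, String value) {
        pending.add(new String[] {rowKey, column, value});
        return this;
    }

    /** Applies all queued insertions to the store and returns how many were applied. */
    int execute() {
        int applied = pending.size();
        for (String[] m : pending) {
            store.computeIfAbsent(m[0], k -> new LinkedHashMap<>()).put(m[1], m[2]);
        }
        pending.clear();
        return applied;
    }

    /** Returns the columns stored for a row, or an empty map if the row does not exist. */
    Map<String, String> row(String rowKey) {
        Map<String, String> columns = store.get(rowKey);
        return columns == null ? Collections.<String, String>emptyMap() : columns;
    }

    public static void main(String[] args) {
        BatchSketch batch = new BatchSketch();
        batch.addInsertion("catalog1", "journal", "Oracle Magazine")
             .addInsertion("catalog1", "title", "Engineering as a Service");
        System.out.println("before execute: " + batch.row("catalog1"));
        batch.execute();
        System.out.println("after execute: " + batch.row("catalog1"));
    }
}
```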
The BSON document from MongoDB is migrated to Cassandra. To find the table data created in Cassandra from the MongoDB BSON document, add a retrieveTableData() method and invoke it from the main method. In the retrieveTableData() method, use the ThriftColumnFamilyTemplate instance to query multiple columns with the queryColumns(K key) method. This method queries the columns in the row corresponding to the provided key and returns a ColumnFamilyResult instance. Using the template, query the columns in the row with the key "catalog1", which is the row key used in the migrate() method.
ColumnFamilyResult<String, String> res = template.queryColumns("catalog1");
Obtain and output the String column values in the ColumnFamilyResult instance obtained from the preceding query.
String journal = res.getString("journal"); String publisher = res.getString("publisher"); String edition = res.getString("edition"); String title = res.getString("title"); String author = res.getString("author"); System.out.println(journal); System.out.println(publisher); System.out.println(edition); System.out.println(title); System.out.println(author);
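The five near-identical lookups can also be written as a loop over the column names. The following sketch shows the idea with a plain java.util.Map standing in for the query result; the RowPrinter class and its describe method are hypothetical, and the "<missing>" marker is an arbitrary choice for columns that are absent.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; a plain Map stands in for the row returned by the query.
final class RowPrinter {

    /** Returns one "name: value" line per requested column, marking absent columns. */
    static List<String> describe(Map<String, String> row, String... columnNames) {
        List<String> lines = new ArrayList<>();
        for (String name : columnNames) {
            String value = row.get(name);
            lines.add(name + ": " + (value == null ? "<missing>" : value));
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("journal", "Oracle Magazine");
        row.put("publisher", "Oracle Publishing");
        for (String line : describe(row, "journal", "publisher", "edition")) {
            System.out.println(line);
        }
    }
}
```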
The MongoDBToCassandra class appears in Listing 7.2.
Listing 7.2 The MongoDBToCassandra Class
package mongodb;

import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.Set;

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.cassandra.service.ThriftKsDef;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.ddl.ColumnFamilyDefinition;
import me.prettyprint.hector.api.ddl.ComparatorType;
import me.prettyprint.hector.api.ddl.KeyspaceDefinition;
import me.prettyprint.hector.api.exceptions.HectorException;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;
import me.prettyprint.cassandra.service.template.ColumnFamilyResult;
import me.prettyprint.cassandra.service.template.ColumnFamilyTemplate;
import me.prettyprint.cassandra.service.template.ThriftColumnFamilyTemplate;

import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;
import com.mongodb.ServerAddress;

public class MongoDBToCassandra {
    private static DBObject catalog;
    private static Cluster cluster;
    private static Keyspace keyspace;
    private static ColumnFamilyTemplate<String, String> template;

    public static void main(String[] args) {
        try {
            // Connect to Cassandra and create the schema on first use.
            cluster = HFactory.getOrCreateCluster("hector-cluster", "localhost:9160");
            KeyspaceDefinition keyspaceDef = cluster.describeKeyspace("HectorKeyspace");
            if (keyspaceDef == null) {
                createSchema();
            }
            createKeyspace();
            createTemplate();

            // Read the document to migrate from MongoDB.
            MongoClient mongoClient = new MongoClient(
                    Arrays.asList(new ServerAddress("localhost", 27017)));
            DB db = mongoClient.getDB("local");
            DBCollection coll = db.getCollection("catalog");
            catalog = coll.findOne();

            migrate();
            retrieveTableData();
        } catch (UnknownHostException e) {
            e.printStackTrace();
        }
    }

    private static void migrate() {
        Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
        Set<String> set = catalog.keySet();
        Iterator<String> iter = set.iterator();
        // Add one column per document field to the row with key "catalog1".
        while (iter.hasNext()) {
            Object obj = iter.next();
            mutator = mutator.addInsertion("catalog1", "catalog",
                    HFactory.createStringColumn(obj.toString(),
                            catalog.get(obj.toString()).toString()));
        }
        // Nothing is sent to Cassandra until execute() is called.
        mutator.execute();
    }

    private static void createSchema() {
        int replicationFactor = 1;
        ColumnFamilyDefinition cfDef = HFactory.createColumnFamilyDefinition(
                "HectorKeyspace", "catalog", ComparatorType.BYTESTYPE);
        KeyspaceDefinition keyspace = HFactory.createKeyspaceDefinition(
                "HectorKeyspace", ThriftKsDef.DEF_STRATEGY_CLASS,
                replicationFactor, Arrays.asList(cfDef));
        cluster.addKeyspace(keyspace, true);
    }

    private static void createKeyspace() {
        keyspace = HFactory.createKeyspace("HectorKeyspace", cluster);
    }

    private static void createTemplate() {
        template = new ThriftColumnFamilyTemplate<String, String>(keyspace, "catalog",
                StringSerializer.get(), StringSerializer.get());
    }

    private static void retrieveTableData() {
        try {
            ColumnFamilyResult<String, String> res = template.queryColumns("catalog1");
            String journal = res.getString("journal");
            String publisher = res.getString("publisher");
            String edition = res.getString("edition");
            String title = res.getString("title");
            String author = res.getString("author");
            System.out.println(journal);
            System.out.println(publisher);
            System.out.println(edition);
            System.out.println(title);
            System.out.println(author);
        } catch (HectorException e) {
            // Report query failures instead of silently ignoring them.
            e.printStackTrace();
        }
    }
}
Run the MongoDBToCassandra application in the Eclipse IDE. Right-click the MongoDBToCassandra.java file in Package Explorer and select Run As > Java Application, as shown in Figure 7.9.
The BSON document from the MongoDB server is migrated to Cassandra. Subsequently, the Cassandra table column values created for the migrated BSON document are output in the Eclipse IDE, as shown in Figure 7.10.
In this chapter, you migrated a MongoDB BSON document to Apache Cassandra. You used the MongoDB Java driver to access MongoDB and the Hector Java client to access Cassandra. You used a Java application developed in the Eclipse IDE for the migration. In the next chapter, you will migrate a JSON document from a Couchbase server to a Cassandra database.