CHAPTER 7

MIGRATING MONGODB TO CASSANDRA

MongoDB is an open source NoSQL database written in C++. It stores documents in a JSON-like binary format called BSON, which differs considerably from the flexible table format of Cassandra. This chapter shows how to migrate a BSON document stored in a MongoDB server to a table in a Cassandra database.

SETTING THE ENVIRONMENT

To set the environment, you must install the following software:

MongoDB Windows binaries from http://www.mongodb.org/downloads. Extract the TGZ or ZIP file to a directory and add C:\MongoDB\mongodb-win32-x86_64-2008plus-2.4.9\bin to the PATH environment variable.

MongoDB Java driver JAR from http://central.maven.org/maven2/org/mongodb/mongo-java-driver/.

Eclipse IDE for Java EE developers from http://www.eclipse.org/downloads/.

Apache Commons Lang 2.6 (commons-lang-2.6-bin.zip) from http://commons.apache.org/proper/commons-lang/download_lang.cgi. Extract it to the commons-lang-2.6-bin directory.

Hector Java client hector-core-1.1-4.jar or a later version from http://repo2.maven.org/maven2/org/hectorclient/hector-core/1.1-4/.

Apache Cassandra 2.0.4 from http://cassandra.apache.org/download/. Add C:\Cassandra\apache-cassandra-2.0.4\bin to the PATH variable.

Start Apache Cassandra server with the following command:

>cassandra -f

Apache Cassandra is started, as shown in Figure 7.1.

Figure 7.1
Starting Apache Cassandra.

Source: Microsoft Corporation.

Start MongoDB server with the following command:

>mongod

MongoDB server is started, as shown in Figure 7.2.

Figure 7.2
Starting MongoDB.

Source: Microsoft Corporation.
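
Before running the Java applications in this chapter, it can help to confirm that both servers accept connections. The following is a minimal sketch, assuming the default ports used later in this chapter (27017 for MongoDB and 9160 for the Cassandra Thrift interface). PortCheck is a hypothetical helper class, not part of the chapter's project.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks whether a TCP port accepts connections.
final class PortCheck {

    // Returns true if a TCP connection to host:port succeeds within the timeout.
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host not resolvable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Ports are assumptions based on the defaults used in this chapter.
        System.out.println("MongoDB (27017): "
                + isListening("localhost", 27017, 1000));
        System.out.println("Cassandra Thrift (9160): "
                + isListening("localhost", 9160, 1000));
    }
}
```

A result of false for either port usually means the corresponding server has not finished starting or is listening on a different port.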

CREATING A JAVA PROJECT

In this section, you will create a Java project in Eclipse IDE to migrate a MongoDB document to Apache Cassandra. Follow these steps:

1. Select File > New > Other.

2. In the New dialog box, select Java Project or Java > Java Project. Then click Next, as shown in Figure 7.3.

Figure 7.3
Selecting the Java Project wizard.

Source: Eclipse Foundation.

3. In the New Java Project dialog box, specify a project name (MigrateMongoDB), select the Use Default Location checkbox, select JDK 1.7 as the JRE (Use Default JRE may already be selected), and click Next, as shown in Figure 7.4.

Figure 7.4
Specifying a project name.

Source: Eclipse Foundation.

4. In the Java Settings dialog box, select the default settings. Select Allow Output Folders for Source Folders. Then click Finish. A Java project, MigrateMongoDB, is created.

5. Add two Java classes, CreateMongoDBDocument and MongoDBToCassandra. The CreateMongoDBDocument class is for creating a BSON document in MongoDB and the MongoDBToCassandra class is for migrating the BSON document from MongoDB to Apache Cassandra. To add a Java class, select File > New > Other. Then, in the New dialog box, select Java > Class and click Next. Finally, in the New Java Class wizard, specify a package name and a class name and click Finish. The directory structure of the MigrateMongoDB project is shown in Figure 7.5.

Figure 7.5
The directory structure of the MigrateMongoDB project.

Source: Eclipse Foundation.

6. Next, you must add some JAR files for Cassandra and MongoDB to the project class path. Add the JAR files listed in Table 7.1. These JAR files come from the Cassandra server download, the MongoDB Java driver download, the Hector Java client for Cassandra, and some third-party libraries.

Table 7.1 JAR Files for Migration

To add the required JARs, right-click the project node in Package Explorer and select Properties. Then, in the Properties dialog box, select Java Build Path. Finally, click the Add External JARs button to add the external JAR files. The JARs added to the migration project are shown in Figure 7.6.

Figure 7.6
Adding JARs to the Java build path.

Source: Eclipse Foundation.
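
One way to confirm that the build path is complete is to try loading one class from each library. The following sketch assumes the class names imported in Listings 7.1 and 7.2, plus org.apache.commons.lang.StringUtils as a representative Commons Lang 2.6 class. ClasspathCheck is a hypothetical helper, not part of the chapter's project.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports which expected library classes
// cannot be loaded from the class path.
final class ClasspathCheck {

    // Returns true if the named class can be located by the class loader.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: locate the class without running static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            // The class was found, but something it depends on is missing.
            return false;
        }
    }

    // Returns the subset of class names that could not be loaded.
    static List<String> findMissing(String... classNames) {
        List<String> missing = new ArrayList<String>();
        for (String name : classNames) {
            if (!isOnClasspath(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Probed names are assumptions: one class per library used in this chapter.
        List<String> missing = findMissing(
                "com.mongodb.MongoClient",
                "me.prettyprint.hector.api.factory.HFactory",
                "org.apache.commons.lang.StringUtils");
        System.out.println(missing.isEmpty()
                ? "All probed classes found"
                : "Missing from build path: " + missing);
    }
}
```

Run the class from within the MigrateMongoDB project so that it sees the same build path as the migration classes.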

CREATING A BSON DOCUMENT IN MONGODB

You need some data in MongoDB before you can migrate it to the Cassandra database. Here, you will create a document in MongoDB using the Java application CreateMongoDBDocument. The main package for the MongoDB Java driver is com.mongodb. A client connection to the MongoDB server is represented with the MongoClient class. A MongoClient object provides connection pooling, so only one instance is required for the application. Create a MongoClient instance using the MongoClient(List<ServerAddress> seeds) constructor, supplying the host name (localhost here) and the port, 27017.

MongoClient mongoClient = new MongoClient(
        Arrays.asList(new ServerAddress("localhost", 27017)));

A logical database in MongoDB is represented with the com.mongodb.DB class. Obtain a com.mongodb.DB instance for the local database, which is a default MongoDB database, using the getDB(String dbname) method in the MongoClient class. MongoDB stores data in collections. Get the names of all collections in the database using the getCollectionNames() method in the com.mongodb.DB class.

DB db = mongoClient.getDB("local");
Set<String> colls = db.getCollectionNames();

The getCollectionNames() method returns a Set<String> of collection names. Iterate over the set to output the names.

for (String s : colls) {
    System.out.println(s);
}

A MongoDB collection is represented with the DBCollection class. Create a new DBCollection instance using the createCollection(String name, DBObject options) method in the com.mongodb.DB class. You specify the options to create a collection using a key/value map represented with the DBObject interface. The options that may be specified are listed in Table 7.2.

Table 7.2 Options to Create a DBCollection

Create a collection called catalog and set the options to null:

DBCollection coll = db.createCollection("catalog", null);

A MongoDB-specific BSON object is represented with the BasicDBObject class, which implements the DBObject interface. The BasicDBObject class provides the constructors listed in Table 7.3 to create a new instance.

Table 7.3 BasicDBObject Class Constructors

The BasicDBObject class provides some other utility methods, some of which are listed in Table 7.4.

Table 7.4 BasicDBObject Class Utility Methods

Create a BasicDBObject instance using the BasicDBObject(String key, Object value) constructor and use the append(String key, Object val) method to append key/value pairs:

BasicDBObject catalog = new BasicDBObject("journal", "Oracle Magazine")
        .append("publisher", "Oracle Publishing")
        .append("edition", "November December 2013")
        .append("title", "Engineering as a Service")
        .append("author", "David A. Kelly");

The DBCollection class provides an overloaded insert() method to add one or more BasicDBObject instances to a collection. Add the catalog BasicDBObject to the DBCollection instance for the catalog collection:

coll.insert(catalog);

The DBCollection class also provides an overloaded findOne() method to find a DBObject instance. Retrieve the document you just added using the findOne() method. The result is assigned to a separate variable, stored, because the name catalog is already used for the BasicDBObject created earlier:

DBObject stored = coll.findOne();

Output the retrieved document by iterating over its keys. The keySet() method in DBObject returns a Set<String>. Create an Iterator from the set using the iterator() method. While hasNext() returns true, obtain the next key with the next() method and look up its value with the get(String key) method in DBObject.

Set<String> set = stored.keySet();
Iterator<String> iter = set.iterator();
while (iter.hasNext()) {
    String key = iter.next();
    System.out.println(key);
    System.out.println(stored.get(key));
}
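
In the 2.x Java driver, BasicDBObject extends java.util.LinkedHashMap<String, Object> (an assumption worth checking against your driver version), so the fields come back in insertion order and can also be walked as map entries, which avoids a second lookup per key. The following sketch shows that style of iteration. It uses a plain LinkedHashMap as a stand-in for the document so that it runs without a MongoDB server, and DocumentPrinter is a hypothetical class, not part of the chapter's project.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: formats each field of a document as "key = value".
final class DocumentPrinter {

    // Walks the entries once, in the map's iteration order.
    static List<String> describe(Map<String, ?> document) {
        List<String> lines = new ArrayList<String>();
        for (Map.Entry<String, ?> field : document.entrySet()) {
            lines.add(field.getKey() + " = " + field.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in for the DBObject returned by findOne().
        Map<String, Object> stored = new LinkedHashMap<String, Object>();
        stored.put("journal", "Oracle Magazine");
        stored.put("publisher", "Oracle Publishing");
        for (String line : describe(stored)) {
            System.out.println(line);
        }
    }
}
```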

The CreateMongoDBDocument class appears in Listing 7.1.

Listing 7.1 The CreateMongoDBDocument Class

package mongodb;

import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.Set;

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;
import com.mongodb.ServerAddress;

public class CreateMongoDBDocument {

    public static void main(String[] args) {
        try {
            MongoClient mongoClient = new MongoClient(
                    Arrays.asList(new ServerAddress("localhost", 27017)));
            DB db = mongoClient.getDB("local");

            /* Optional: output the names of the existing collections.
            Set<String> colls = db.getCollectionNames();
            for (String s : colls) {
                System.out.println(s);
            }
            */

            // Create the catalog collection and add one document to it.
            DBCollection coll = db.createCollection("catalog", null);
            BasicDBObject catalog = new BasicDBObject("journal", "Oracle Magazine")
                    .append("publisher", "Oracle Publishing")
                    .append("edition", "November December 2013")
                    .append("title", "Engineering as a Service")
                    .append("author", "David A. Kelly");
            coll.insert(catalog);

            // Read the document back; output it whole, then key by key.
            DBObject stored = coll.findOne();
            System.out.println(stored);
            Set<String> set = stored.keySet();
            Iterator<String> iter = set.iterator();
            while (iter.hasNext()) {
                String key = iter.next();
                System.out.println(key);
                System.out.println(stored.get(key));
            }
        } catch (UnknownHostException e) {
            e.printStackTrace();
        }
    }
}

To run the CreateMongoDBDocument application, right-click the CreateMongoDBDocument.java file in Package Explorer and select Run As > Java Application, as shown in Figure 7.7.

Figure 7.7
Running the CreateMongoDBDocument application.

Source: Eclipse Foundation.

A new BSON document is stored in a new collection, catalog, in the MongoDB database. The stored document is output both as a whole and as individual key/value pairs, as shown in Figure 7.8.

Figure 7.8
Storing a document in MongoDB.

Source: Eclipse Foundation.

MIGRATING THE MONGODB DOCUMENT TO CASSANDRA

In this section, you will query the BSON document stored earlier in the MongoDB server and migrate it to a Cassandra database using the MongoDBToCassandra class. First, create a MongoClient instance, as you did in the previous section when adding the document.

MongoClient mongoClient = new MongoClient(
        Arrays.asList(new ServerAddress("localhost", 27017)));

Create a DB object for the local database instance using the getDB(String dbname) method in MongoClient. Using the DB instance, get the catalog collection as a DBCollection object. Then obtain a DBObject instance for the document stored in MongoDB in the previous section using the findOne() method in the DBCollection class.

DB db = mongoClient.getDB("local");
DBCollection coll = db.getCollection("catalog");
DBObject catalog = coll.findOne();

Next, you will migrate the resulting DBObject to the Cassandra database. Some of the procedures for migrating MongoDB to Cassandra are the same as for migrating Couchbase to Cassandra, which is discussed in Chapter 8, “Migrating Couchbase to Cassandra.”

The me.prettyprint.hector.api.Cluster interface represents a cluster of Cassandra hosts. To access a Cassandra cluster, create a Cluster instance for a Cassandra cluster using the getOrCreateCluster(String clusterName, String hostIp) method as follows:

Cluster cluster =
        HFactory.getOrCreateCluster("hector-cluster", "localhost:9160");

Next, create a schema if not already defined. A schema consists of a column family definition and a keyspace definition. Use the describeKeyspace method in Cluster to obtain a KeyspaceDefinition object for HectorKeyspace keyspace. If the keyspace definition object is null, invoke a createSchema() method to create a schema.

KeyspaceDefinition keyspaceDef = cluster.describeKeyspace("HectorKeyspace");
if (keyspaceDef == null) {
    createSchema();
}

As discussed in Chapter 1, “Using Cassandra with Hector,” add a createSchema() method to create a column family definition and a keyspace definition for the schema. Create a column family definition for a column family named "catalog" in a keyspace named HectorKeyspace, with ComparatorType.BYTESTYPE as the comparator.

ColumnFamilyDefinition cfDef = HFactory.createColumnFamilyDefinition(
        "HectorKeyspace", "catalog", ComparatorType.BYTESTYPE);

Use a replicationFactor of 1 to create a KeyspaceDefinition instance from the preceding column family definition. Specify the strategy class as org.apache.cassandra.locator.SimpleStrategy using the constant ThriftKsDef.DEF_STRATEGY_CLASS.

int replicationFactor = 1;
KeyspaceDefinition keyspace = HFactory.createKeyspaceDefinition(
        "HectorKeyspace", ThriftKsDef.DEF_STRATEGY_CLASS,
        replicationFactor, Arrays.asList(cfDef));

Add the keyspace definition to the Cluster instance. With blockUntilComplete set to true, the method blocks until the cluster reaches schema agreement.

cluster.addKeyspace(keyspace, true);

Adding the keyspace definition defines the keyspace in Cassandra, but the application still needs a client-side Keyspace object to read from and write to it. Add a createKeyspace() method to obtain one and invoke the method from the main method. A keyspace is represented with the me.prettyprint.hector.api.Keyspace interface. The HFactory class provides static methods to create a Keyspace instance from a Cluster instance to which a keyspace definition has been added. Invoke the createKeyspace(String keyspace, Cluster cluster) method to create a Keyspace instance for the HectorKeyspace keyspace.

private static void createKeyspace() {
       keyspace = HFactory.createKeyspace("HectorKeyspace", cluster);
}

Next, add a createTemplate() method to create a template, and invoke the method from the main method. Templates provide a reusable construct containing the fields common to all Hector client operations. Create an instance of ThriftColumnFamilyTemplate using the class constructor ThriftColumnFamilyTemplate(Keyspace keyspace, String columnFamily, Serializer<K> keySerializer, Serializer<N> topSerializer). Use the Keyspace instance created earlier and specify the column family name as "catalog".

ColumnFamilyTemplate<String, String> template =
        new ThriftColumnFamilyTemplate<String, String>(keyspace, "catalog",
                StringSerializer.get(), StringSerializer.get());

Next, you will migrate the data in the DBObject instance retrieved from MongoDB to the column family "catalog" in the keyspace HectorKeyspace. Add a method called migrate() and invoke it from the main method. In the Hector API, a Mutator is used to add data. Create a Mutator instance using the static method createMutator(Keyspace keyspace, Serializer<K> keySerializer) in HFactory, supplying the Keyspace instance created earlier and a StringSerializer instance.

Mutator<String> mutator = HFactory.createMutator(keyspace,
StringSerializer.get());

Obtain a Set object from the DBObject using the keySet() method and create an Iterator from the Set object.

Set<String> set = catalog.keySet();
Iterator<String> iter = set.iterator();

The Mutator class provides the addInsertion(K key, String cf, HColumn<N, V> c) method to add an HColumn instance and return the Mutator instance, which may be used again to add another HColumn instance. You can add a series of HColumn instances by invoking the Mutator instance sequentially. Using the Iterator obtained from the key set in the DBObject from MongoDB BSON document, you will add multiple columns to a Mutator instance using addInsertion() invocations in series.

While the hasNext() method of the Iterator returns true, obtain the next key of the BSON document's key/value pairs as an Object. Specify the key for the Cassandra row as catalog1; the column family name is catalog. In the while loop, add one column per key to the Mutator instance with an addInsertion() invocation. Create the HColumn<String, String> instances, which represent columns, using the static method createStringColumn(String name, String value) in HFactory. The column name is obtained with obj.toString(), and the corresponding column value is obtained from the DBObject instance with catalog.get(obj.toString()).toString().

while (iter.hasNext()) {
       Object obj = iter.next();
       mutator = mutator.addInsertion("catalog1","catalog",
       HFactory.createStringColumn(obj.toString(),
       catalog.get(obj.toString()).toString()));
}
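
The loop above calls toString() on every value, so a field that holds null would cause a NullPointerException, and a nested document would be stored as its string form. The following is a sketch of a null-safe conversion step under stated assumptions: it uses a plain Map as a stand-in for the DBObject so that it runs without either server, it simply skips null fields, and ColumnFlattener and toStringColumns are hypothetical names, not part of the Hector or MongoDB APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turns a flat document into column name/value strings.
final class ColumnFlattener {

    // Returns one String column per non-null field, preserving field order.
    static Map<String, String> toStringColumns(Map<String, ?> document) {
        Map<String, String> columns = new LinkedHashMap<String, String>();
        for (Map.Entry<String, ?> field : document.entrySet()) {
            Object value = field.getValue();
            if (value == null) {
                continue; // Assumption: fields without a value are not migrated.
            }
            columns.put(field.getKey(), value.toString());
        }
        return columns;
    }

    public static void main(String[] args) {
        // Stand-in for the DBObject retrieved from MongoDB.
        Map<String, Object> catalog = new LinkedHashMap<String, Object>();
        catalog.put("journal", "Oracle Magazine");
        catalog.put("reviewer", null);
        catalog.put("title", "Engineering as a Service");
        System.out.println(toStringColumns(catalog));
    }
}
```

Each entry of the returned map could then be passed to HFactory.createStringColumn(name, value) inside the addInsertion() loop.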

The mutations added to the Mutator instance are not sent to the Cassandra server until the execute() method is invoked:

mutator.execute();

The BSON document from MongoDB is now migrated to Cassandra. To find the table data created in Cassandra from the MongoDB BSON document, add a retrieveTableData() method and invoke it from the main method. In the retrieveTableData() method, use the ThriftColumnFamilyTemplate instance to query multiple columns with the queryColumns(K key) method, which returns the columns in the row for the given key as a ColumnFamilyResult instance. Using the template, query the columns in the row with the key "catalog1", the row key used in the migrate() method.

ColumnFamilyResult<String, String> res = template.queryColumns("catalog1");

Obtain and output the String column values in the ColumnFamilyResult instance obtained from the preceding query.

String journal = res.getString("journal");
String publisher = res.getString("publisher");
String edition = res.getString("edition");
String title = res.getString("title");
String author = res.getString("author");
System.out.println(journal);
System.out.println(publisher);
System.out.println(edition);
System.out.println(title);
System.out.println(author);
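
To check the migration field by field rather than by eye, the values read back from Cassandra can be compared with the source document. The following sketch assumes both sides have been copied into plain maps (the source fields and the column values read back), so it runs without either server. MigrationCheck and mismatchedFields are hypothetical names, not part of the chapter's project.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: compares source fields with migrated column values.
final class MigrationCheck {

    // Returns the names of fields whose string form differs from the
    // migrated value, including fields that are absent on the migrated side.
    static List<String> mismatchedFields(Map<String, ?> source,
                                         Map<String, String> migrated) {
        List<String> mismatched = new ArrayList<String>();
        for (Map.Entry<String, ?> field : source.entrySet()) {
            Object value = field.getValue();
            String expected = (value == null) ? null : value.toString();
            String actual = migrated.get(field.getKey());
            boolean same = (expected == null)
                    ? actual == null
                    : expected.equals(actual);
            if (!same) {
                mismatched.add(field.getKey());
            }
        }
        return mismatched;
    }

    public static void main(String[] args) {
        Map<String, Object> source = new LinkedHashMap<String, Object>();
        source.put("journal", "Oracle Magazine");
        source.put("author", "David A. Kelly");

        Map<String, String> migrated = new LinkedHashMap<String, String>();
        migrated.put("journal", "Oracle Magazine");
        migrated.put("author", "David A. Kelly");

        System.out.println("Mismatched fields: "
                + mismatchedFields(source, migrated));
    }
}
```

An empty result means every source field has a matching column value.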

The MongoDBToCassandra class appears in Listing 7.2.

Listing 7.2 The MongoDBToCassandra Class

package mongodb;

import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.Set;

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.cassandra.service.ThriftKsDef;
import me.prettyprint.cassandra.service.template.ColumnFamilyResult;
import me.prettyprint.cassandra.service.template.ColumnFamilyTemplate;
import me.prettyprint.cassandra.service.template.ThriftColumnFamilyTemplate;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.ddl.ColumnFamilyDefinition;
import me.prettyprint.hector.api.ddl.ComparatorType;
import me.prettyprint.hector.api.ddl.KeyspaceDefinition;
import me.prettyprint.hector.api.exceptions.HectorException;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;
import com.mongodb.ServerAddress;

public class MongoDBToCassandra {

    private static DBObject catalog;
    private static Cluster cluster;
    private static Keyspace keyspace;
    private static ColumnFamilyTemplate<String, String> template;

    public static void main(String[] args) {
        try {
            cluster = HFactory.getOrCreateCluster("hector-cluster",
                    "localhost:9160");
            KeyspaceDefinition keyspaceDef = cluster
                    .describeKeyspace("HectorKeyspace");
            if (keyspaceDef == null) {
                createSchema();
            }
            createKeyspace();
            createTemplate();

            // Fetch the document to migrate from MongoDB.
            MongoClient mongoClient = new MongoClient(
                    Arrays.asList(new ServerAddress("localhost", 27017)));
            DB db = mongoClient.getDB("local");
            DBCollection coll = db.getCollection("catalog");
            catalog = coll.findOne();

            migrate();
            retrieveTableData();
        } catch (UnknownHostException e) {
            e.printStackTrace();
        }
    }

    // Adds one column per document key to the row "catalog1".
    private static void migrate() {
        Mutator<String> mutator = HFactory.createMutator(keyspace,
                StringSerializer.get());
        Set<String> set = catalog.keySet();
        Iterator<String> iter = set.iterator();
        while (iter.hasNext()) {
            Object obj = iter.next();
            mutator = mutator.addInsertion(
                    "catalog1",
                    "catalog",
                    HFactory.createStringColumn(obj.toString(),
                            catalog.get(obj.toString()).toString()));
        }
        // Nothing is sent to Cassandra until execute() is invoked.
        mutator.execute();
    }

    private static void createSchema() {
        int replicationFactor = 1;
        ColumnFamilyDefinition cfDef = HFactory.createColumnFamilyDefinition(
                "HectorKeyspace", "catalog", ComparatorType.BYTESTYPE);
        KeyspaceDefinition keyspace = HFactory.createKeyspaceDefinition(
                "HectorKeyspace", ThriftKsDef.DEF_STRATEGY_CLASS,
                replicationFactor, Arrays.asList(cfDef));
        cluster.addKeyspace(keyspace, true);
    }

    private static void createKeyspace() {
        keyspace = HFactory.createKeyspace("HectorKeyspace", cluster);
    }

    private static void createTemplate() {
        template = new ThriftColumnFamilyTemplate<String, String>(keyspace,
                "catalog", StringSerializer.get(), StringSerializer.get());
    }

    // Reads back the migrated row and outputs its column values.
    private static void retrieveTableData() {
        try {
            ColumnFamilyResult<String, String> res = template
                    .queryColumns("catalog1");
            String journal = res.getString("journal");
            String publisher = res.getString("publisher");
            String edition = res.getString("edition");
            String title = res.getString("title");
            String author = res.getString("author");
            System.out.println(journal);
            System.out.println(publisher);
            System.out.println(edition);
            System.out.println(title);
            System.out.println(author);
        } catch (HectorException e) {
            // Report the failure instead of silently ignoring it.
            e.printStackTrace();
        }
    }
}

Run the MongoDBToCassandra application in the Eclipse IDE. Right-click MongoDBToCassandra.java in Package Explorer and select Run As > Java Application, as shown in Figure 7.9.

Figure 7.9
Running the MongoDBToCassandra application.

Source: Eclipse Foundation.

The BSON document from the MongoDB server is migrated to Cassandra. Subsequently, the Cassandra table column values created for the migrated BSON document are output in the Eclipse IDE, as shown in Figure 7.10.

Figure 7.10
The MongoDB document is migrated to Cassandra.

Source: Eclipse Foundation.

SUMMARY

In this chapter, you migrated a MongoDB BSON document to Apache Cassandra. You used the MongoDB Java driver to access MongoDB and the Hector Java client to access Cassandra. You used a Java application developed in the Eclipse IDE for the migration. In the next chapter, you will migrate a JSON document from a Couchbase server to a Cassandra database.
