Hour 24. Implementing a MongoDB GridFS Store


What You’ll Learn in This Hour:

• Using the MongoDB GridFS Store

• Accessing the MongoDB GridFS Store from the console

• Implementing the MongoDB GridFS Store in Java

• Implementing the MongoDB GridFS Store in PHP

• Implementing the MongoDB GridFS Store in Python

• Implementing the MongoDB GridFS Store in Node.js


Occasionally, you need to store and retrieve data in MongoDB that exceeds the 16MB document size limit, such as large images, ZIP files, or movies. To accommodate this, MongoDB provides the GridFS framework, which stores large files in chunks while still letting you access them through the MongoDB interface.

This hour first focuses on how the GridFS Store works. Then it takes you through the process of using it in the MongoDB shell and with some of the MongoDB drivers. Each driver section stands alone, so you can skip any language that does not interest you.

Understanding the GridFS Store

GridFS splits large files into chunks. The chunks are stored in a collection called chunks (fs.chunks by default) in the MongoDB database. Metadata about the file is stored in another collection, called files (fs.files by default). When you request a file from GridFS, the metadata is read from the files collection, and then the chunks are read from the chunks collection and returned in the response.

A great feature of GridFS is that the entire file does not need to be read into memory to be streamed back to the client. This reduces the risk of low-memory conditions.
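To make the layout concrete, the following Python sketch mimics the split using plain dictionaries instead of a real database. The function name split_into_chunks, the integer file ID, and the tiny 4-byte chunk size are assumptions chosen for illustration; a real GridFS uses a much larger chunk size and an ObjectId. The field names (files_id, n, data, length, chunkSize) follow the GridFS document layout.

```python
def split_into_chunks(file_id, filename, data, chunk_size=4):
    """Return (file_doc, chunk_docs) mimicking the GridFS files/chunks layout."""
    # One metadata document describes the whole file.
    file_doc = {
        "_id": file_id,
        "filename": filename,
        "length": len(data),
        "chunkSize": chunk_size,
    }
    # Each chunk document points back to its file and records its position n.
    chunk_docs = [
        {"files_id": file_id, "n": n, "data": data[start:start + chunk_size]}
        for n, start in enumerate(range(0, len(data), chunk_size))
    ]
    return file_doc, chunk_docs


file_doc, chunk_docs = split_into_chunks(1, "test.data", b"0123456789")
print(file_doc["length"], len(chunk_docs))
for chunk in chunk_docs:
    print(chunk["n"], chunk["data"])
```

Only the last chunk is shorter than the chunk size, which is why the metadata records both the total length and the chunk size.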

The following lists some instances when you might want to use the MongoDB GridFS Store instead of standard file storage:

• The file system limits the number of files in a directory. You can use GridFS to store as many files as needed.

• You want to keep your files and metadata automatically synchronized and deployed across a number of systems using MongoDB replication.

• You want to access information from portions of large files without having to load whole files into memory. You can use GridFS to recall sections of files without reading the entire file into memory.
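The last point works because each chunk has a known position within the file. As a rough sketch (not the driver's actual implementation), the following Python code shows the arithmetic for finding which chunks cover a byte range. The helper names chunks_for_range and read_range, the dictionary standing in for the chunks collection, and the 4-byte chunk size are all assumptions made for the demonstration.

```python
def chunks_for_range(offset, size, chunk_size):
    """Return the chunk numbers (n values) covering bytes [offset, offset + size)."""
    if size <= 0:
        return []
    first = offset // chunk_size
    last = (offset + size - 1) // chunk_size
    return list(range(first, last + 1))


def read_range(chunks, offset, size, chunk_size):
    """Read a byte range from a {n: bytes} mapping, touching only the needed chunks."""
    needed = chunks_for_range(offset, size, chunk_size)
    if not needed:
        return b""
    joined = b"".join(chunks[n] for n in needed)
    # Translate the file offset into an offset within the joined chunks.
    start = offset - needed[0] * chunk_size
    return joined[start:start + size]


# A 10-byte file stored in 4-byte chunks (a deliberately tiny size for the demo).
stored = {0: b"0123", 1: b"4567", 2: b"89"}
needed = chunks_for_range(5, 4, 4)
section = read_range(stored, 5, 4, 4)
print(needed, section)
```

Here chunk 0 is never touched, which is the saving that matters when a file spans thousands of chunks.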

You can implement GridFS either from the MongoDB shell or by using a MongoDB driver. Each of the MongoDB drivers provides GridFS functionality. The MongoDB Node.js driver provides the Grid and GridStore objects, which enable you to interact with the MongoDB GridFS.

Implementing a GridFS in the MongoDB Shell

MongoDB comes with a console-executable command named mongofiles that enables you to interact with a GridFS Store on the MongoDB server. The mongofiles command uses the following syntax:

mongofiles <options> <commands>

<options> enables you to specify options to connect to the MongoDB database, similar to those of the mongo command. Table 24.1 describes some of those options.

TABLE 24.1 Options Supported by the mongofiles Command

<commands> enables you to specify the GridFS command that lists, puts, gets, or deletes files in the GridFS Store. Table 24.2 describes some of those commands.

TABLE 24.2 Commands Supported by the mongofiles Command

For example, to store a file named test.data in the GridFS on the local running server, you would use a command similar to the following:

mongofiles --host localhost:27017 --db myFS put test.data
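If you script these commands, it can help to build the argument list in one place. The following Python sketch only constructs and prints the command lines; it does not run them. The helper name mongofiles_args and its default values are assumptions for illustration, and it uses only the --host and --db options and the put and list commands mentioned in this hour.

```python
import shlex


def mongofiles_args(command, filename=None, host="localhost:27017", db="myFS"):
    """Build the argument list for a mongofiles invocation (does not run it)."""
    args = ["mongofiles", "--host", host, "--db", db, command]
    if filename is not None:
        args.append(filename)
    return args


put_args = mongofiles_args("put", "test.data")
list_args = mongofiles_args("list")
print(shlex.join(put_args))
print(shlex.join(list_args))
# To actually run a command, pass the list to subprocess.run(put_args, check=True)
# on a machine where mongofiles is installed and the server is reachable.
```

Passing a list to subprocess.run() instead of a single string avoids shell quoting problems with filenames that contain spaces.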

Implementing a MongoDB GridFS Using the Java MongoDB Driver

In this section, you look at accessing and using the MongoDB GridFS from Java applications. This section assumes that you have read through the other Java MongoDB driver hours. It also assumes that you have completed the Try It Yourself section in the Java hours on installing Java and configuring the Java MongoDB driver.

The following sections cover the basics of implementing a MongoDB GridFS in Java. They go through the process of accessing the GridFS, as well as listing, getting, putting, and removing files.

Accessing the MongoDB GridFS Object in Java

In Java, the MongoDB GridFS is accessed through the GridFS object. This object provides the necessary methods to list, put, get, and remove files in the MongoDB GridFS. When accessing the grid, you need to get an instance of the GridFS object using the following syntax, where db is a Database object:

GridFS myFS = new GridFS(db);

For example, the following code gets an instance of the GridFS object in Java:

import com.mongodb.MongoClient;
import com.mongodb.DB;
import com.mongodb.gridfs.GridFS;

// Connect to the local server and open the myFS database
MongoClient mongoClient = new MongoClient("localhost", 27017);
DB db = mongoClient.getDB("myFS");
// Wrap the database in a GridFS object
GridFS myFS = new GridFS(db);

Listing Files in the MongoDB GridFS from Java

When you have an instance of a GridFS object in Java, you can list files in the MongoDB GridFS using the getFileList() method. You can call getFileList() using the following formats:

getFileList()
getFileList(DBObject query)
getFileList(DBObject query, DBObject sort)

Using standard query and sort objects, you can limit and order the files that are returned in the list. The getFileList() method returns a DBCursor object containing the files that match from the GridFS. Using the DBCursor object, you can then iterate through the files as necessary.

For example, the following code iterates through all the files in the GridFS Store:

GridFS myFS = new GridFS(db);
DBCursor files = myFS.getFileList();
for(final DBObject file : files) {
  System.out.println(file);
}

Putting a File into the MongoDB GridFS from Java

To put a file into the MongoDB GridFS in Java, you use the createFile() method. The createFile() method uses one of the following formats:

createFile(File f)
createFile(InputStream in)
createFile(InputStream in, String filename)
createFile(InputStream in, String filename, boolean closeStreamOnPersist)
createFile(String filename)

Pass a File to insert a file that already exists on the file system. Pass an InputStream to insert dynamically generated contents, or pass only a filename to create an empty file in the GridFS Store.

For example, the following code inserts an existing file into the GridFS Store:

File newFile = new File("/tmp/myFile.txt");
GridFSInputFile gridFile = myFS.createFile(newFile);
gridFile.save();

The createFile() method returns an instance of the GridFSInputFile class. You can use this class to get an output stream or to save the file.

Getting a File from the MongoDB GridFS from Java

The simplest method of retrieving files from the GridFS Store in Java is to use the find() or findOne() methods on the GridFS object. These methods work similarly to the find() and findOne() methods on the DBCollection object, in that they enable you to specify a query, but they return GridFSDBFile objects directly. The find() method uses the following formats and returns a List<GridFSDBFile> object, except for the ObjectId form, which returns a single GridFSDBFile object:

find(DBObject query)
find(DBObject query, DBObject sort)
find(ObjectId id)
find(String filename)
find(String filename, DBObject sort)

The findOne() method uses the following syntax and returns a GridFSDBFile object:

findOne(DBObject query)
findOne(ObjectId id)
findOne(String filename)

The GridFSDBFile object provides a couple of useful methods. The getInputStream() method returns an input stream from which you can read the file contents. The writeTo() method enables you to write the contents of the GridFS file to a File or OutputStream. For example, the following code gets a file and then writes the contents to disk:

GridFS myFS = new GridFS(db);
GridFSDBFile file = myFS.findOne("java.txt");
file.writeTo(new File("JavaRetrieved.txt"));

Deleting a File from the MongoDB GridFS from Java

The simplest method for removing files from the GridFS Store in Java is to use the remove() method on the GridFS object. This method deletes the file from the MongoDB GridFS Store. The remove() method uses the following syntax:

remove(DBObject query)
remove(ObjectId id)
remove(String filename)

For example, the following statement removes a file named test.txt:

GridFS myFS = new GridFS(db);
myFS.remove("test.txt");

Implementing a MongoDB GridFS Using the PHP MongoDB Driver

In this section, you look at accessing and using the MongoDB GridFS from PHP applications. This section assumes that you have read through the other PHP MongoDB driver hours. It also assumes that you have completed the Try It Yourself section in the PHP hours on installing PHP and configuring the PHP MongoDB driver.

The following sections cover the basics of implementing a MongoDB GridFS in PHP. They go through the process of accessing the GridFS, as well as listing, getting, putting, and removing files.

Accessing the MongoDB MongoGridFS Object in PHP

In PHP, the MongoDB GridFS is accessed through the MongoGridFS object. This object provides the necessary methods to list, put, get, and remove files in the MongoDB GridFS. When accessing the grid, you need to get an instance of the MongoGridFS object using the following syntax, where $db is the database (MongoDB) object:

$db->getGridFS();

For example, the following code gets an instance of the MongoGridFS object in PHP:

$mongo = new MongoClient("");
$db = $mongo->myFS;
$myFS = $db->getGridFS();

Listing Files in the MongoDB GridFS from PHP

When you have an instance of a MongoGridFS object in PHP, you can list files in the MongoDB GridFS using the find() or findOne() methods. The find() method uses the following format and returns a MongoCursor object containing MongoGridFSFile objects:

find([query], [fields])

The findOne() method uses the following format and returns a single MongoGridFSFile object:

findOne([query], [fields])

The MongoGridFSFile object has the following methods:

• getBytes(): Returns the file contents as a string of bytes

• getFileName(): Returns the filename

• getSize(): Returns the size of the file

• write(path): Writes the file to the file system

In addition, you can access the MongoDB ID of the MongoGridFSFile using the following syntax:

MongoGridFSFile->file["_id"]

For example, the following code finds and iterates through all the files in the GridFS Store:

$myFS = $db->getGridFS();
$files = $myFS->find();
foreach ($files as $id => $file){
  print_r($file->getFileName());
}

Putting a File into the MongoDB GridFS from PHP

To put a file into the MongoDB GridFS in PHP, you use the put() method of the MongoGridFS object. The put() method uses the following syntax:

put(filename, [metadata])

For example, the following code inserts an existing file into the GridFS Store:

$myFS = $db->getGridFS();
$file = $myFS->put('test.txt');

The put() method returns the _id of the saved file in the MongoDB GridFS Store.

Getting a File from the MongoDB GridFS from PHP

The simplest method of retrieving files from the GridFS Store in PHP is to use the find() or findOne() methods, already discussed in the previous sections. These methods return a MongoGridFSFile object that represents the file.

The following example shows how to get a specific file from the database, display the contents, and then write it to the local file system:

$myFS = $db->getGridFS();
$file = $myFS->findOne('php.txt');
print_r($file->getBytes());
$file.write('local.txt');

Deleting a File from the MongoDB GridFS from PHP

The simplest method of removing files from the GridFS Store in PHP is to use the delete() method on the MongoGridFS object. This method deletes the file from the MongoDB GridFS Store. The delete() method uses the following syntax:

delete(objectId)

For example, the following statement removes a file named test.txt:

$myFS = $db->getGridFS();
$file = $myFS->findOne('test.txt');
$myFS->delete($file->file["_id"]);

Implementing a MongoDB GridFS Using the Python MongoDB Driver

In this section, you look at accessing and using the MongoDB GridFS from Python applications. This section assumes that you have read through the other Python MongoDB driver hours. It also assumes that you have completed the Try It Yourself section in the Python hours on installing Python and configuring the Python MongoDB driver.

The following sections cover the basics of implementing a MongoDB GridFS in Python. They go through the process of accessing the GridFS, as well as listing, getting, putting, and removing files.

Accessing the MongoDB GridFS Object in Python

In Python, the MongoDB GridFS is accessed through the GridFS object. This object provides the necessary methods to list, put, get, and remove files in the MongoDB GridFS. When accessing the grid, you need to get an instance of the GridFS object using the following syntax, where db is the Database object:

fs = gridfs.GridFS(db)

For example, the following code gets an instance of the GridFS object in Python:

from pymongo import MongoClient
import gridfs

mongo = MongoClient('mongodb://localhost:27017/')
db = mongo['myFS']
fs = gridfs.GridFS(db)

Listing Files in the MongoDB GridFS from Python

When you have an instance of a GridFS object in Python, you can list files in the MongoDB GridFS using the list() method on the GridFS object. The list() method returns a list of filenames stored in the MongoDB GridFS.

For example, the following code finds and iterates through all the files in the GridFS Store:

fs = gridfs.GridFS(db)
files = fs.list()
for file in files:
  print(file)

Putting a File into the MongoDB GridFS from Python

To put a file into the MongoDB GridFS in Python, you use the put() method of the GridFS object. The put() method uses the following syntax:

put(data, [**kwargs])

You can specify a filename as one of the kwargs arguments when inserting data. The data must be a bytes object or a file-like object, unless you also supply an encoding argument. For example, the following code inserts a byte string with the filename test.txt into the GridFS Store:

fs = gridfs.GridFS(db)
fs.put(b"Test Text", filename="test.txt")

Getting a File from the MongoDB GridFS from Python

The simplest method to retrieve files from the GridFS Store in Python is to use the get_last_version() or get_version() methods on the GridFS object. These methods use the following syntax:

get_last_version(filename)
get_version(filename, version)

These methods return a GridOut object that enables you to read data from the server using the read() method.

The following example shows how to get a specific file from the database, read it, and display the contents:

fs = gridfs.GridFS(db)
file = fs.get_last_version(filename="python.txt")
print (file.read())
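The version argument matters when several files share a filename. In PyMongo, version 0 refers to the first file uploaded under that name and version -1 to the most recent one. The following sketch imitates that selection rule with plain dictionaries rather than a live database; the function name pick_version and the integer uploadDate values are assumptions made for illustration.

```python
def pick_version(file_docs, filename, version=-1):
    """Pick one stored file by name and version, ordered by upload time."""
    matches = sorted(
        (doc for doc in file_docs if doc["filename"] == filename),
        key=lambda doc: doc["uploadDate"],
    )
    # Python's own indexing matches the convention:
    # 0 is the oldest match and -1 is the newest.
    return matches[version]


# Hypothetical metadata documents; uploadDate is simplified to an integer.
stored_files = [
    {"_id": "a", "filename": "python.txt", "uploadDate": 1},
    {"_id": "b", "filename": "other.txt", "uploadDate": 2},
    {"_id": "c", "filename": "python.txt", "uploadDate": 3},
]
first = pick_version(stored_files, "python.txt", 0)
latest = pick_version(stored_files, "python.txt")
print(first["_id"], latest["_id"])
```

Because putting a file under an existing name adds a new version rather than overwriting the old one, older versions remain available until you delete them.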

Deleting a File from the MongoDB GridFS from Python

The simplest method of removing files from the GridFS Store in Python is to use the delete() method on the GridFS object. This method deletes the file from the MongoDB GridFS Store. The delete() method uses the following syntax:

delete(objectId)

The delete() method requires an objectId, so you need to use the _id attribute of the object returned from one of the get methods. For example, the following statements remove a file named python.txt:

fs = gridfs.GridFS(db)
file = fs.get_last_version(filename="python.txt")
fs.delete(file._id)

Implementing a MongoDB GridFS Using the Node.js MongoDB Driver

In this section, you access and use the MongoDB GridFS from Node.js applications. This section assumes that you have read through the other Node.js MongoDB driver hours. It also assumes that you have completed the Try It Yourself section in the Node.js hours on installing Node.js and configuring the Node.js MongoDB driver.

The following sections cover the basics of implementing a MongoDB GridFS in Node.js. They go through the process of accessing the GridFS, as well as listing, getting, putting, and removing files.

Accessing the MongoDB GridFS Object in Node.js

In Node.js, the MongoDB GridFS is accessed through the GridStore object. This object provides the necessary methods to list, put, get, and remove files in the MongoDB GridFS. When accessing the grid, you need to get an instance of the GridStore object for writing data to the GridFS. However, you can also call static methods on the object to list and delete files. To create an instance of a GridStore object for writing, you use the following syntax, where db is the Database object:

myFS = new GridStore(db, filename, mode, [options]);

For example, the following code gets an instance of the GridStore object for writing a file in Node.js:

mongo.connect("mongodb://localhost/myFS", function(err, db) {
  var myFS = new GridStore(db, 'myFile.txt', 'w');
});

Listing Files in the MongoDB GridFS from Node.js

You do not need an instance of a GridStore object in Node.js to list files. You can call the list() method directly on the class, passing the Database object and a callback function. The list() method passes a list of the filenames stored in the MongoDB GridFS to the callback.

For example, the following code finds and iterates through all the files in the GridFS Store:

GridStore.list(db, function(err, files){
  for (var i in files){
    console.log(files[i]);
  }
});

Putting a File into the MongoDB GridFS from Node.js

To put a file into the MongoDB GridFS in Node.js, you create an instance of the GridStore object and specify the write mode. Then you can use the write() or writeFile() methods to write data to the GridFS. To illustrate this process, the following code creates a new file and writes data to it:

var myFS = new GridStore(db, "test.txt", "w");
myFS.writeFile("nodejs.txt", function(err, fsObj){
 . . .
});

Getting a File from the MongoDB GridFS from Node.js

The simplest method of retrieving file data from the GridFS Store in Node.js is to use the static read() method on the GridStore class. The read() method uses the following syntax:

read(db, filename, callback)

The data read from the files is returned as the second parameter to the callback function. This example shows how to get a specific file from the database, read it, and display the contents:

GridStore.read(db, "nodejs.txt", function(err, data){
  console.log(data.toString());
});

You can also create an instance of the GridStore class for a specific file, open it, and then read its contents using the read([size]) and seek(position) methods. For example, the following creates an instance of a GridStore object for a specific file, opens it, reads 10 bytes from it, seeks to offset 1000, and then reads 10 more bytes:

var myFS = new GridStore(db, "test.txt", "r");
myFS.open(function(err, fsObj){
  fsObj.read(10, function(err, data){
    fsObj.seek(1000, function(err, fsObj){
      fsObj.read(10, function(err, data){
        . . .
      });
    });
  });
});

Deleting a File from the MongoDB GridFS from Node.js

The simplest method of removing files from the GridFS Store in Node.js is to use the unlink() method on the GridStore class. This method deletes the file from the MongoDB GridFS Store. The unlink() method uses the following syntax:

GridStore.unlink(db, filename, callback)

For example, the following statement removes a file named test.txt:

GridStore.unlink(db, "test.txt", function(err, gridStore){
  . . .
});

Summary

The MongoDB GridFS Store enables you to store large data files in the MongoDB database by splitting the large files into chunks. The chunks are stored in a collection called chunks in the MongoDB database. Metadata about the file is stored in another collection, called files. When you query the GridFS for a document, the metadata is read from the files collection; then the chunks are read from the chunks collection and sent back in the request.

In this hour, you learned how to implement the MongoDB GridFS Store from the console and in Java, PHP, Python and Node.js applications. You learned about some of the different objects and structures for each language and implemented your own basic applications.

Q&A

Q. Is it possible to directly access the files and chunks collections using normal MongoDB methods?

A. Yes. In reality, they are still just collections, but MongoDB abstracts the process of knowing which chunks to access to retrieve the file contents.
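To see what that abstraction saves you, the following Python sketch reassembles a file by hand from documents shaped like those in the two collections. It uses in-memory lists rather than real queries, and the function name reassemble and the sample documents are assumptions for illustration. Against a live database, the equivalent step would be a query on the chunks collection filtered by files_id and sorted by n.

```python
def reassemble(file_doc, chunk_docs):
    """Rebuild file contents from chunk documents, as a manual query would."""
    # Select this file's chunks and order them by their sequence number n.
    own = sorted(
        (c for c in chunk_docs if c["files_id"] == file_doc["_id"]),
        key=lambda c: c["n"],
    )
    data = b"".join(c["data"] for c in own)
    # The length recorded in the metadata gives a cheap consistency check.
    if len(data) != file_doc["length"]:
        raise ValueError("chunks do not match the recorded file length")
    return data


# Hypothetical documents, deliberately out of order and mixed with another file.
meta = {"_id": 1, "filename": "test.data", "length": 10}
chunks = [
    {"files_id": 1, "n": 2, "data": b"89"},
    {"files_id": 2, "n": 0, "data": b"zz"},
    {"files_id": 1, "n": 0, "data": b"0123"},
    {"files_id": 1, "n": 1, "data": b"4567"},
]
content = reassemble(meta, chunks)
print(content)
```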

Q. Is it faster to retrieve a file using the MongoDB GridFS or directly from the file system?

A. Directly from the file system. You should use the MongoDB GridFS store only to solve specific problems, such as directory entry limitations, or when distributing files across multiple servers.

Workshop

The workshop consists of a set of questions and answers designed to solidify your understanding of the material covered in this hour. Try answering the questions before looking at the answers.

Quiz

1. What is the name of the collection that stores the GridFS file metadata for a database?

2. What is the name of the collection that stores the actual GridFS file chunks for a database?

3. What option do you use with mongofiles to replace existing files instead of creating new ones when using the put command?

4. What command do you use with mongofiles to list files in the MongoDB GridFS Store?

Quiz Answers

1. files

2. chunks

3. --replace

4. list

Exercises

1. Copy a large file such as an audio, video, or image file into the code/hour24 folder and use mongofiles to store it in the MongoDB GridFS Store.

2. Use the mongofiles get command to retrieve the same file, but with a new name.
