What You’ll Learn in This Hour:
Using the MongoDB GridFS Store
Accessing the MongoDB GridFS Store from the console
Implementing the MongoDB GridFS Store in Java
Implementing the MongoDB GridFS Store in PHP
Implementing the MongoDB GridFS Store in Python
Implementing the MongoDB GridFS Store in Node.js
Occasionally, you want to store and retrieve data using MongoDB that exceeds the 16MB document size limit. For example, you might be storing large images, ZIP files, movies, and more. To accommodate this, MongoDB provides the GridFS framework. The GridFS framework stores large files in chunks while still letting you access them through the MongoDB interface.
The hour first focuses on how the GridFS Store works. Then it takes you through the process of using it in the MongoDB shell and with some of the MongoDB drivers. Each driver section is autonomous, so if you have no interest in that particular language, you can skip that section.
GridFS splits large files into chunks. The chunks are stored in a collection called chunks in the MongoDB database (named fs.chunks when the default fs prefix is used). Metadata about the file is stored in another collection, called files (fs.files by default). When you query the GridFS for a file, the metadata is read from the files collection and then the chunks are read from the chunks collection and sent back in the request.
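The split into metadata and chunks described above can be sketched in plain Python, without a database. This is only an illustration of the idea: the function names are hypothetical, the chunk size of 255 KB is an assumption that mirrors a common driver default, and a random hex string stands in for a real ObjectId.

```python
import uuid

# Assumption: 255 KB mirrors a common driver default chunk size.
DEFAULT_CHUNK_SIZE = 255 * 1024


def split_into_chunks(data, filename, chunk_size=DEFAULT_CHUNK_SIZE):
    """Return (file_doc, chunk_docs) shaped like GridFS documents."""
    file_id = uuid.uuid4().hex  # stand-in for a real ObjectId
    # One metadata document describes the whole file.
    file_doc = {
        "_id": file_id,
        "filename": filename,
        "length": len(data),
        "chunkSize": chunk_size,
    }
    # Each chunk document points back to the file and records its position n.
    chunk_docs = [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
    return file_doc, chunk_docs


def reassemble(file_doc, chunk_docs):
    """Rebuild the original bytes by ordering the file's chunks on n."""
    own = [c for c in chunk_docs if c["files_id"] == file_doc["_id"]]
    ordered = sorted(own, key=lambda c: c["n"])
    return b"".join(c["data"] for c in ordered)
```

Only the last chunk can be shorter than the chunk size, and the sequence number n is what allows the chunks to be put back in order no matter how they come back from the collection.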
A great feature of the GridFS is that the entire file does not need to be read into memory to be streamed back to the client request. This reduces the risk of low memory conditions.
The following lists some instances when you might want to use the MongoDB GridFS Store instead of standard file storage:
The file system limits the number of files in a directory. You can use GridFS to store as many files as needed.
You want to keep your files and metadata automatically synched and deployed across a number of systems using MongoDB replication.
You want to access information from portions of large files without having to load whole files into memory. You can use GridFS to recall sections of files without reading the entire file into memory.
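The last point in the list comes down to simple arithmetic: given a byte range, only the chunks that overlap it have to be fetched. The following sketch shows that calculation. The function name is hypothetical, and the chunk size and file length in the usage note are made-up values for illustration.

```python
def chunks_for_range(start, length, chunk_size, file_length):
    """Return (first_chunk, last_chunk, offset_in_first) for a byte range.

    first_chunk and last_chunk are the sequence numbers of the chunks that
    cover the range; offset_in_first is where the range begins inside the
    first of those chunks.
    """
    if start < 0 or length <= 0 or start >= file_length:
        raise ValueError("requested range is outside the file")
    end = min(start + length, file_length)  # exclusive end, clamped to the file
    first = start // chunk_size
    last = (end - 1) // chunk_size
    return first, last, start % chunk_size
```

For a file of 1,000,000 bytes stored in 261,120-byte chunks, reading 10 bytes at offset 1000 touches only chunk 0, whereas reading 10 bytes at offset 261,115 straddles chunks 0 and 1.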
You can implement GridFS either from the MongoDB shell or by using a MongoDB driver. Each of the MongoDB drivers provides GridFS functionality. The MongoDB Node.js driver provides the Grid and GridStore objects, which enable you to interact with the MongoDB GridFS.
MongoDB comes with a console-executable command named mongofiles that enables you to interact with a GridFS Store on the MongoDB server. The mongofiles command uses the following syntax:
mongofiles <options> <command>
<options> enables you to specify options to connect to the MongoDB database, similar to those of the mongo command. Table 24.1 describes some of those options.
<command> enables you to specify the GridFS command that lists, puts, gets, or deletes files in the GridFS Store. Table 24.2 describes some of those commands.
For example, to store a file named test.data in the GridFS on the local running server, you would use a command similar to the following:
mongofiles --host localhost:27017 --db myFS put test.data
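If you want to drive mongofiles from a script rather than typing the command by hand, one possible approach is to build the argument list and pass it to a subprocess. The sketch below assumes that mongofiles is on the system path; the helper names are hypothetical, and the defaults simply repeat the host and database from the command above.

```python
import subprocess


def mongofiles_args(command, filename=None, host="localhost:27017", db="myFS"):
    """Build the argument list for a mongofiles invocation."""
    args = ["mongofiles", "--host", host, "--db", db, command]
    if filename is not None:
        args.append(filename)  # commands such as list take no filename
    return args


def run_mongofiles(command, filename=None, **connection):
    """Run mongofiles and return its standard output (raises on failure)."""
    result = subprocess.run(
        mongofiles_args(command, filename, **connection),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Calling run_mongofiles("put", "test.data") would then run the same command shown above, and run_mongofiles("list") would return the listing of stored files as text.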
In this section, you look at accessing and utilizing the MongoDB GridFS from Java applications. This section assumes that you have read through the other Java MongoDB driver hours. It also assumes that you have completed the Try It Yourself section in the Java hours on installing Java and configuring the Java MongoDB driver.
The following sections cover the basics of implementing a MongoDB GridFS in Java. They go through the process of accessing the GridFS, as well as listing, getting, putting, and removing files.
In Java, the MongoDB GridFS is accessed through the GridFS object. This object provides the necessary methods to list, put, get, and remove files in the MongoDB GridFS. When accessing the grid, you need to get an instance of the GridFS object using the following syntax, where db is a DB object:
GridFS myFS = new GridFS(db);
For example, the following code gets an instance of the GridFS object in Java:
import com.mongodb.MongoClient;
import com.mongodb.DB;
import com.mongodb.gridfs.GridFS;
MongoClient mongoClient = new MongoClient("localhost", 27017);
DB db = mongoClient.getDB("myFS");
GridFS myFS = new GridFS(db);
When you have an instance of a GridFS object in Java, you can list files in the MongoDB GridFS using the getFileList() method. You can call getFileList() using the following formats:
getFileList()
getFileList(DBObject query)
getFileList(DBObject query, DBObject sort)
Using standard query and sort objects, you can limit which files are returned in the list. The getFileList() method returns a DBCursor object containing the files that match from the GridFS. Using the DBCursor object, you can then iterate through the files as necessary.
For example, the following code iterates through all the files in the GridFS Store:
GridFS myFS = new GridFS(db);
DBCursor files = myFS.getFileList();
for (final DBObject file : files) {
    System.out.println(file);
}
To put a file into the MongoDB GridFS in Java, you use the createFile() method. The createFile() method uses one of the following formats:
createFile(File f)
createFile(InputStream in)
createFile(InputStream in, String filename)
createFile(InputStream in, String filename, boolean closeStreamOnPersist)
createFile(String filename)
Use a File to insert a file that already exists on the file system. Use an InputStream to insert dynamic file contents, and pass only a filename to create an empty file in the GridFS Store.
For example, the following code inserts an existing file into the GridFS Store:
File newFile = new File("/tmp/myFile.txt");
GridFSInputFile gridFile = myFS.createFile(newFile);
gridFile.save();
The createFile() method returns an instance of the GridFSInputFile class. You can use this class to get an output stream or save the file.
The simplest method of retrieving files from the GridFS Store in Java is to use the find() or findOne() methods on the GridFS object. These methods work similarly to the find() and findOne() methods on the DBCollection object, in that they enable you to specify a query, but they return GridFSDBFile objects directly instead of a DBCursor. The find() method uses the following formats and returns either a GridFSDBFile object (when you look up a file by ObjectId) or a List<GridFSDBFile> object:
find(DBObject query)
find(DBObject query, DBObject sort)
find(ObjectId id)
find(String filename)
find(String filename, DBObject sort)
The findOne() method uses the following syntax and returns a GridFSDBFile object:
findOne(DBObject query)
findOne(ObjectId id)
findOne(String filename)
The GridFSDBFile object provides a couple of useful methods. getInputStream() provides an input stream from which you can read the file contents. The writeTo() method enables you to write the contents of the GridFS file to a File or OutputStream. For example, the following code gets a file and then writes the contents to disk:
GridFS myFS = new GridFS(db);
GridFSDBFile file = myFS.findOne("java.txt");
file.writeTo(new File("JavaRetrieved.txt"));
The simplest method for removing files from the GridFS Store in Java is to use the remove() method on the GridFS object. This method deletes the file from the MongoDB GridFS Store. The remove() method uses the following syntax:
remove(DBObject query)
remove(ObjectId id)
remove(String filename)
For example, the following statements remove a file named test.txt:
GridFS myFS = new GridFS(db);
myFS.remove("test.txt");
In this section, you look at accessing and utilizing the MongoDB GridFS from PHP applications. This section assumes that you have read through the other PHP MongoDB driver hours. It also assumes that you have completed the Try It Yourself section in the PHP hours on installing PHP and configuring the PHP MongoDB driver.
The following sections cover the basics of implementing a MongoDB GridFS in PHP. They go through the process of accessing the GridFS, as well as listing, getting, putting, and removing files.
In PHP, the MongoDB GridFS is accessed through the MongoGridFS object. This object provides the necessary methods to list, put, get, and remove files in the MongoDB GridFS. When accessing the grid, you need to get an instance of the MongoGridFS object using the following syntax, where $db is the MongoDatabase object:
$db->getGridFS();
For example, the following code gets an instance of the MongoGridFS object in PHP:
$mongo = new MongoClient("");
$db = $mongo->myFS;
$myFS = $db->getGridFS();
When you have an instance of a MongoGridFS object in PHP, you can list files in the MongoDB GridFS using the find() or findOne() methods. The find() method uses the following format and returns a MongoCursor object containing MongoGridFSFile objects:
find([query], [fields])
The findOne() method uses the following format and returns a single MongoGridFSFile object:
findOne([query], [fields])
The MongoGridFSFile object has the following methods:
getBytes(): Returns the file contents as a string of bytes
getFileName(): Returns the filename
getSize(): Returns the size of the file
write(path): Writes the file to the file system
In addition, you can access the MongoDB ID of the MongoGridFSFile using the following syntax:
MongoGridFSFile->file["_id"]
For example, the following code finds and iterates through all the files in the GridFS Store:
$myFS = $db->getGridFS();
$files = $myFS->find();
foreach ($files as $id => $file) {
    print_r($file->getFileName());
}
To put a file into the MongoDB GridFS in PHP, you use the put() method of the MongoGridFS object. The put() method uses the following syntax:
put(filename, [metadata])
For example, the following code inserts an existing file into the GridFS Store:
$myFS = $db->getGridFS();
$file = $myFS->put('test.txt');
The put() method returns the _id of the saved file in the MongoDB GridFS Store.
The simplest method of retrieving files from the GridFS Store in PHP is to use the find() or findOne() methods, already discussed in the previous sections. These methods return a MongoGridFSFile object that represents the file.
The following example shows how to get a specific file from the database, display the contents, and then write it to the local file system:
$myFS = $db->getGridFS();
$file = $myFS->findOne('php.txt');
print_r($file->getBytes());
$file->write('local.txt');
The simplest method of removing files from the GridFS Store in PHP is to use the delete() method on the MongoGridFS object. This method deletes the file from the MongoDB GridFS Store. The delete() method uses the following syntax:
delete(objectId)
For example, the following statements remove a file named test.txt:
$myFS = $db->getGridFS();
$file = $myFS->findOne('test.txt');
$myFS->delete($file->file["_id"]);
In this section, you look at accessing and utilizing the MongoDB GridFS from Python applications. This section assumes that you have read through the other Python MongoDB driver hours. It also assumes that you have completed the Try It Yourself section in the Python hours on installing Python and configuring the Python MongoDB driver.
The following sections cover the basics of implementing a MongoDB GridFS in Python. They go through the process of accessing the GridFS, as well as listing, getting, putting, and removing files.
In Python, the MongoDB GridFS is accessed through the GridFS object. This object provides the necessary methods to list, put, get, and remove files in the MongoDB GridFS. When accessing the grid, you need to get an instance of the GridFS object using the following syntax, where db is the Database object:
fs = gridfs.GridFS(db)
For example, the following code gets an instance of the GridFS object in Python:
import gridfs
from pymongo import MongoClient
mongo = MongoClient('mongodb://localhost:27017/')
db = mongo['myFS']
fs = gridfs.GridFS(db)
When you have an instance of a GridFS object in Python, you can list files in the MongoDB GridFS using the list() method on the GridFS object. The list() method returns a list of filenames stored in the MongoDB GridFS.
For example, the following code finds and iterates through all the files in the GridFS Store:
fs = gridfs.GridFS(db)
files = fs.list()
for file in files:
    print(file)
To put a file into the MongoDB GridFS in Python, you use the put() method of the GridFS object. The put() method uses the following syntax:
put(data, [**kwargs])
You can specify a filename as one of the kwargs arguments when inserting data. For example, the following code inserts a string with the filename test.txt into the GridFS Store:
fs = gridfs.GridFS(db)
fs.put(b"Test Text", filename="test.txt")
The simplest method to retrieve files from the GridFS Store in Python is to use the get_last_version() or get_version() methods on the GridFS object. These methods use the following syntax:
get_last_version(filename)
get_version(filename, version)
These methods return a GridOut object that enables you to read data from the server using the read() method.
The following example shows how to get a specific file from the database, read it, and display the contents:
fs = gridfs.GridFS(db)
file = fs.get_last_version(filename="python.txt")
print (file.read())
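Calling read() with no arguments pulls the whole file into memory, which works against the streaming benefit described at the start of the hour. Because the object returned by the get methods behaves like a file and read() accepts a size, you can copy the contents block by block instead. The following is a minimal sketch of that pattern; the function name and the 64KB block size are assumptions, and an in-memory stream stands in for the GridFS file so that the example runs without a server.

```python
import io


def copy_in_blocks(source, destination, block_size=64 * 1024):
    """Copy source to destination block by block; return the bytes copied."""
    total = 0
    while True:
        block = source.read(block_size)  # at most block_size bytes in memory
        if not block:  # an empty result signals the end of the file
            break
        destination.write(block)
        total += len(block)
    return total


# Demonstration with in-memory stand-ins for the GridFS file and the target.
demo_source = io.BytesIO(b"x" * 150000)
demo_target = io.BytesIO()
copied = copy_in_blocks(demo_source, demo_target)
```

With a real GridFS instance, the source would be the object returned by fs.get_last_version(filename="python.txt") and the destination a local file opened in "wb" mode.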
The simplest method of removing files from the GridFS Store in Python is to use the delete() method on the GridFS object. This method deletes the file from the MongoDB GridFS Store. The delete() method uses the following syntax:
delete(objectId)
The delete() method requires an objectId, so you need to use the _id attribute of the GridOut object returned from one of the get methods. For example, the following statements remove a file named test.txt:
fs = gridfs.GridFS(db)
file = fs.get_last_version(filename="test.txt")
fs.delete(file._id)
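Because put() adds a new version each time rather than overwriting a file with the same name, and delete() works on a single _id, replacing a file takes a few steps. The sketch below is one possible way to do it, assuming fs is a gridfs.GridFS instance; the helper name replace_file is hypothetical. It stores the new data first and then deletes every older version that shares the filename.

```python
def replace_file(fs, filename, data):
    """Store data under filename, then delete older versions of that name.

    Assumes fs offers put(), find(), and delete() as gridfs.GridFS does.
    """
    new_id = fs.put(data, filename=filename)
    # Collect the ids first so that nothing is deleted while iterating.
    old_ids = [
        found._id
        for found in fs.find({"filename": filename})
        if found._id != new_id
    ]
    for old_id in old_ids:
        fs.delete(old_id)
    return new_id
```

Writing the new version before removing the old ones means that a failed upload leaves the previous file in place.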
In this section, you access and use the MongoDB GridFS from Node.js applications. This section assumes that you have read through the other Node.js MongoDB driver hours. It also assumes that you have completed the Try It Yourself section in the Node.js hours on installing Node.js and configuring the Node.js MongoDB driver.
The following sections cover the basics of implementing a MongoDB GridFS in Node.js. They go through the process of accessing the GridFS, as well as listing, getting, putting, and removing files.
In Node.js, the MongoDB GridFS is accessed through the GridStore object. This object provides the necessary methods to list, put, get, and remove files in the MongoDB GridFS. When accessing the grid, you need to get an instance of the GridStore object for writing data to the GridFS. However, you can also call static methods on the object to list and delete files. To create an instance of a GridStore object for writing, you use the following syntax, where db is the Database object:
myFS = new GridStore(db, filename, mode, [options]);
For example, the following code gets an instance of the GridStore object for writing a file in Node.js:
mongo.connect("mongodb://localhost/myFS", function(err, db) {
var myFS = new GridStore(db, 'myFile.txt', 'w');
});
You do not need an instance of a GridStore object in Node.js to list files. You can call the list() method directly on the class. The list() method passes a list of filenames stored in the MongoDB GridFS to its callback function.
For example, the following code finds and iterates through all the files in the GridFS Store:
GridStore.list(db, function(err, files){
  for (var i in files){
    console.log(files[i]);
  }
});
To put a file into the MongoDB GridFS in Node.js, you create an instance of the GridStore object and specify the write mode. Then you can use the write() or writeFile() methods to write data to the GridFS. To illustrate this process, the following code creates a new file and writes data to it:
var myFS = new GridStore(db, "test.txt", "w");
myFS.writeFile("nodejs.txt", function(err, fsObj){
. . .
});
The simplest method of retrieving file data from the GridFS Store in Node.js is to use the static read() method on the GridStore class. The read() method uses the following syntax:
read(db, filename, callback)
The data read from the files is returned as the second parameter to the callback function. This example shows how to get a specific file from the database, read it, and display the contents:
GridStore.read(db, "nodejs.txt", function(err, data){
console.log(data.toString());
});
You can also create an instance of the GridStore class for a specific file and then read the contents of it using the read([size]) and seek(position) methods. For example, the following creates an instance of a GridStore object for a specific file, opens it, reads 10 bytes from it, seeks to offset 1000, and then reads 10 more bytes:
var myFS = new GridStore(db, "test.txt", "r");
// The store must be opened before it can be read
myFS.open(function(err, fsObj){
  fsObj.read(10, function(err, data){
    fsObj.seek(1000, function(err, fsObj){
      fsObj.read(10, function(err, data){
        . . .
      });
    });
  });
});
The simplest method of removing files from the GridFS Store in Node.js is to use the unlink() method on the GridStore class. This method deletes the file from the MongoDB GridFS Store. The unlink() method uses the following syntax:
GridStore.unlink(db, filename, callback)
For example, the following statement removes a file named test.txt:
GridStore.unlink(db, "test.txt", function(err, gridStore){
. . .
});
The MongoDB GridFS Store enables you to store large data files in the MongoDB database by splitting the large files into chunks. The chunks are stored in a collection called chunks in the MongoDB database. Metadata about the file is stored in another collection, called files. When you query the GridFS for a file, the metadata is read from the files collection; then the chunks are read from the chunks collection and sent back in the request.
In this hour, you learned how to implement the MongoDB GridFS Store from the console and in Java, PHP, Python and Node.js applications. You learned about some of the different objects and structures for each language and implemented your own basic applications.
Q. Is it possible to directly access the files and chunks collections using normal MongoDB methods?
A. Yes. In reality, they are still just collections, but MongoDB abstracts the process of knowing which chunks to access to retrieve the file contents.
Q. Is it faster to retrieve a file using the MongoDB GridFS or directly from the file system?
A. Directly from the file system. You should use the MongoDB GridFS store only to solve specific problems, such as directory entry limitations, or when distributing files across multiple servers.
The workshop consists of a set of questions and answers designed to solidify your understanding of the material covered in this hour. Try answering the questions before looking at the answers.
1. What is the name of the collection that stores the GridFS file metadata for a database?
2. What is the name of the collection that stores the actual GridFS file chunks for a database?
3. What option do you use with mongofiles to replace existing files instead of creating new ones when using the put command?
4. What command do you use with mongofiles to list files in the MongoDB GridFS Store?
1. files
2. chunks
3. --replace
4. list
1. Copy a large file such as an audio, video, or image file into the code/hour24 folder and use mongofiles to store it in the MongoDB GridFS Store.
2. Use the mongofiles get command to retrieve the same file, but with a new name.