What You’ll Learn in This Hour:
Using the Python objects to access the MongoDB database
Implementing the Python MongoDB driver in Python applications
Connecting to the MongoDB database in Python applications
Using methods to find and retrieve documents in Python applications
Sorting documents in a cursor before retrieving them in Python applications
This hour introduces you to implementing MongoDB in Python applications. To access and use MongoDB in a Python application, you first need to implement the Python MongoDB driver. The Python MongoDB driver is a library that provides the objects and functionality needed to access a MongoDB server from your Python applications.
These objects are similar to the objects you have already been working with in the MongoDB shell. The examples in this hour and the following one assume that you already understand the structure of MongoDB objects and requests. If you have not already gone through Hours 5–9, you should do so before continuing with this one.
The following sections describe the objects you deal with in Python to access the MongoDB server, databases, collections, and documents. You also implement the Python MongoDB driver and begin accessing documents in the example collection.
The Python MongoDB driver provides several objects that enable you to connect to the MongoDB database and find and manipulate documents in collections. These objects represent the connection, database, collection, cursor, and documents on the MongoDB server and provide the functionality needed to integrate data from a MongoDB database into your Python applications.
The following sections describe how each of these objects is created and used in Python.
The Python MongoClient object provides the functionality to connect to the MongoDB server and access databases. The first step in implementing MongoDB in your Python applications is to create an instance of the MongoClient object. Then you can use the object to get the database, set the write concern, and perform other operations (see Table 16.1).
To create an instance of the MongoClient object, you call the MongoClient() constructor with the appropriate options. The most basic form connects to the server on localhost with the default port:
from pymongo import MongoClient
mongo = MongoClient("mongodb://localhost:27017/")
You can also use a connection string that uses this format:
mongodb://username:password@host:port/database?options
For example, to connect to the words database on host 1.1.1.1 on port 8888 with username test and password myPass, you would use the following:
mongo = MongoClient("mongodb://test:myPass@1.1.1.1:8888/words")
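Usernames and passwords that contain characters such as @, : or / must be percent-escaped before they go into a connection string; the PyMongo documentation recommends urllib.parse.quote_plus for this. The sketch below builds a URI in the format shown above using only the standard library, so you can check the result before passing it to MongoClient(). The helper name build_mongo_uri is hypothetical.

```python
from urllib.parse import quote_plus


def build_mongo_uri(username, password, host, port, database, options=None):
    """Build a mongodb:// connection string (hypothetical helper).

    The username and password are percent-escaped with quote_plus so that
    characters such as '@' or ':' cannot be mistaken for URI delimiters.
    """
    uri = "mongodb://{}:{}@{}:{}/{}".format(
        quote_plus(username), quote_plus(password), host, port, database
    )
    if options:
        # Options are appended as a query string, e.g. ?authSource=admin
        pairs = ("{}={}".format(k, quote_plus(str(v))) for k, v in options.items())
        uri += "?" + "&".join(pairs)
    return uri


print(build_mongo_uri("test", "myPass", "1.1.1.1", 8888, "words"))
print(build_mongo_uri("test", "p@ss:word", "localhost", 27017, "words"))
```

The second call shows why escaping matters: without it, the @ inside the password would end the credentials part of the URI early. You would pass the returned string straight to MongoClient().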
After you have created an instance of the MongoClient object, you can use the methods in Table 16.1 to get the database and set options.
The Python Database object provides the functionality to authenticate, access users, and access and manipulate collections. A MongoClient object lets you look up its databases by name using the same syntax as a Python dictionary, so the simplest way to get an instance of a Database object is to access it directly by name on the MongoClient object. For example, the following gets a Database object for the words database:
from pymongo import MongoClient
mongo = MongoClient("mongodb://localhost:27017/")
db = mongo["words"]
After you have created an instance of the Database object, you can use the object to access the database. Table 16.2 lists the more common methods available on the Database object.
The Python Collection object provides the functionality to access and manipulate documents in a collection. A Database object lets you look up its collections by name using the same syntax as a Python dictionary, so the simplest way to get an instance of a Collection object is to access it directly by name on the Database object. For example, the following gets a Collection object for the word_stats collection in the words database:
from pymongo import MongoClient
mongo = MongoClient("mongodb://localhost:27017/")
db = mongo["words"]
collection = db["word_stats"]
After you have created an instance of the Collection object, you can use the object to access the collection. Table 16.3 lists the more common methods available on the Collection object.
The Python Cursor object represents a set of documents on the MongoDB server. Typically, a Cursor object is returned when you query the collection using a find operation. Instead of returning the full set of document objects to the Python application, the find operation returns a Cursor object, enabling you to access the documents in a controlled manner from Python.
The Cursor object uses an index to iterate through documents on the server. The cursor pulls down documents from the server in batches. During iteration, when the index passes the end of the current batch, a new batch of documents is retrieved from the server.
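To make the batching behavior concrete, here is a simplified, hypothetical model of a cursor written in plain Python. It is not PyMongo's implementation: fetch_batch stands in for a round trip to the server, and a list stands in for the collection. It shows the idea described above: documents are handed out one at a time, and a new batch is requested only when the current one is used up.

```python
from collections import deque


def fetch_batch(collection, start, batch_size):
    """Hypothetical stand-in for one round trip to the server."""
    return collection[start:start + batch_size]


class BatchedCursor:
    """Simplified model of a cursor that pulls documents in batches."""

    def __init__(self, collection, batch_size=2):
        self.collection = collection   # a list standing in for the server side
        self.batch_size = batch_size
        self.position = 0              # index of the next document to request
        self.batch = deque()           # documents already transferred
        self.exhausted = False         # True once the "server" has no more
        self.round_trips = 0           # number of batches requested

    def __iter__(self):
        return self

    def __next__(self):
        if not self.batch:
            if self.exhausted:
                raise StopIteration
            # The current batch is used up, so request the next one.
            fetched = fetch_batch(self.collection, self.position, self.batch_size)
            self.round_trips += 1
            self.position += len(fetched)
            # A short (or empty) batch means there is nothing left to fetch.
            if len(fetched) < self.batch_size:
                self.exhausted = True
            self.batch.extend(fetched)
            if not self.batch:
                raise StopIteration
        return self.batch.popleft()


words = [{"word": w} for w in ["the", "be", "and", "of", "a"]]
cursor = BatchedCursor(words, batch_size=2)
print([doc["word"] for doc in cursor])  # ['the', 'be', 'and', 'of', 'a']
print(cursor.round_trips)               # 3
```

Five documents in batches of two take three round trips; the Python code that iterates the cursor never sees the batch boundaries. The real driver lets the server choose batch sizes, so treat the numbers here as illustrative only.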
The following code shows an example of getting an instance of the Cursor object using a find operation:
from pymongo import MongoClient
mongo = MongoClient("mongodb://localhost:27017/")
db = mongo['words']
collection = db['word_stats']
cursor = collection.find()
After you have created an instance of the Cursor object, you can use the object to access documents in the collection. Table 16.4 lists the more common methods available on the Cursor object.
As you saw in the MongoDB shell hours, most of the database, collection, and cursor operations accept objects as parameters. These objects define things such as query, sort, aggregation, and other operators. In addition, documents are returned from the database as objects.
In the MongoDB shell, these are JavaScript objects. In Python, however, documents and request parameters are represented by Dictionary objects (the built-in dict type). When the server returns a document from a cursor or request, it is a Dictionary object whose keys match the fields in the document. For objects that you pass as parameters to requests, you also use a Dictionary object.
Dictionary objects are built using the standard Python syntax:
myDict = {key: value, ...}
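As a sketch of how documents map onto Dictionary objects, the following uses plain Python values only; no server is involved. The document is illustrative, modeled loosely on a word_stats entry, and its field names (word, first, size, stats, charsets) are assumptions for the example. Subdocuments become nested dictionaries, and arrays become Python lists.

```python
# An illustrative document shaped the way the driver returns one: a plain dict.
doc = {
    "word": "zebra",
    "first": "z",
    "size": 5,
    "stats": {"vowels": 2, "consonants": 3},   # subdocument -> nested dict
    "charsets": [                              # array -> list
        {"type": "consonants", "chars": ["z", "b", "r"]},
        {"type": "vowels", "chars": ["e", "a"]},
    ],
}

# Read fields with [] when the field must exist, or get() with a default.
print(doc["word"])                   # zebra
print(doc["stats"]["vowels"])        # 2
print(doc.get("last", "unknown"))    # unknown, because the field is absent

# Request parameters are built the same way, for example a query that
# matches on a subdocument field using MongoDB dot notation.
query = {"stats.vowels": {"$gte": 2}}
```

Using get() avoids a KeyError for documents that lack a field, which matters because documents in the same collection do not have to share a schema.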
Database operations that write data to the database use a write concern that defines how to verify database writes before returning. As you likely noticed in the previous sections, several objects have a write_concern property that you can set to a Dictionary object defining the write concern options. These options enable you to configure the write concern level, timeout, and other settings that best fit your application.
The following list describes some of the options you can set in the write_concern Dictionary object:
w: Sets the write concern value: 1 for acknowledged, 0 for unacknowledged, or 'majority' to require acknowledgment from a majority of the replica set members.
j: Set to True or False to enable or disable journal acknowledgment.
wtimeout: Amount of time (in milliseconds) to wait for write concern acknowledgment.
fsync: When True, forces the database to fsync all files before returning.
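The following sketch assembles a write concern Dictionary from the options listed above and rejects a few invalid combinations before anything is sent to the server. The helper name make_write_concern and its validation rules are assumptions for illustration. It refuses j=True together with fsync=True, because current PyMongo releases raise an error for that combination, and it refuses w=0 with j or fsync, since an unacknowledged write cannot wait for the journal or an fsync.

```python
def make_write_concern(w=1, j=None, wtimeout=None, fsync=None):
    """Build a write concern Dictionary (hypothetical helper).

    Options left as None are omitted so the server defaults apply.
    """
    valid_w = w == "majority" or (
        isinstance(w, int) and not isinstance(w, bool) and w >= 0
    )
    if not valid_w:
        raise ValueError("w must be a non-negative integer or 'majority'")
    if wtimeout is not None and wtimeout < 0:
        raise ValueError("wtimeout must be a non-negative number of milliseconds")
    if j and fsync:
        # Current PyMongo releases reject this combination as well.
        raise ValueError("j and fsync cannot both be True")
    if w == 0 and (j or fsync):
        # An unacknowledged write cannot wait for the journal or an fsync.
        raise ValueError("w=0 cannot be combined with j or fsync")

    concern = {"w": w}
    for key, value in (("j", j), ("wtimeout", wtimeout), ("fsync", fsync)):
        if value is not None:
            concern[key] = value
    return concern


print(make_write_concern(w=1, j=True, wtimeout=10000))
# {'w': 1, 'j': True, 'wtimeout': 10000}
```

In the older PyMongo 2.x releases this hour's examples follow, a dictionary like this can be assigned to a write_concern property. In PyMongo 3.0 and later that property is read-only; instead you create a reconfigured copy, for example collection.with_options(write_concern=WriteConcern(**concern)), where WriteConcern comes from pymongo.write_concern.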
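As an example, the following illustrates using a basic options Dictionary object in Python:
# Request acknowledgment, journaling, and a 10-second timeout.
# Current PyMongo releases reject j=True combined with fsync=True, so fsync is omitted.
collection.write_concern = {'w': 1, 'j': True, 'wtimeout': 10000}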
A common task in Python applications is to find one or more documents that you need to use in your application. Finding documents in Python is similar to finding them using the MongoDB shell. You can get one document or many, and you can use queries to limit which documents are returned.
The following sections discuss using the Python objects to find and retrieve documents from a MongoDB collection.
The Collection object provides the find() and find_one() methods, similar to what you saw in the MongoDB shell. The find_one() method finds a single document, and the find() method finds multiple documents.
When you call find_one(), the server returns a single document as a Dictionary object (or None if no document matches). You can then use the object in your application as needed. For example:
doc = myColl.find_one()
The find() method on the Collection object returns a Cursor object that represents the documents found but does not initially retrieve them. The Cursor object can be iterated in a few different ways.
You can use a for loop to iterate through the cursor; the loop ends automatically when the cursor reaches its last document. For example:
cursor = myColl.find()
for doc in cursor:
    print(doc)
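Because the Cursor object supports Python slice syntax, you can also access portions of the cursor. PyMongo applies the slice as skip and limit values on the query, so slice the cursor before you start iterating it. For example, the following finds all documents and then displays the documents at indexes 5 through 9 (the sixth through tenth documents):
cursor = collection.find()
subset = cursor[5:10]
for doc in subset:
    print(doc)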
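PyMongo translates a slice on a cursor into skip and limit values, so cursor[5:10] asks the server to skip five documents and return at most five, the same as cursor.skip(5).limit(5). The pure-Python helper below, slice_to_skip_limit, is a hypothetical illustration of that arithmetic, checked against an ordinary list slice; it does not call the driver.

```python
def slice_to_skip_limit(start, stop):
    """Translate cursor[start:stop] into (skip, limit) values (hypothetical helper).

    Only non-negative bounds are handled, because cursor slices do not
    support negative indexes.
    """
    if start < 0 or stop < 0:
        raise IndexError("cursor slices cannot use negative indexes")
    if stop < start:
        raise IndexError("stop index must not be less than start index")
    # Note: in MongoDB, limit(0) means "no limit", so an empty slice
    # (start == stop) needs special handling by the caller.
    return start, stop - start


docs = list(range(20))          # stand-in for 20 documents in query order
skip, limit = slice_to_skip_limit(5, 10)
print(skip, limit)              # 5 5
print(docs[skip:skip + limit])  # [5, 6, 7, 8, 9], same as docs[5:10]
```

Because the skip and limit run on the server, only the requested documents are transferred, which is what makes slicing useful for paging through large result sets.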
Generally, you do not want to retrieve all documents in a collection from the server. The find() and find_one() methods enable you to send a query object to the server that limits the documents returned, in the same way you saw with the MongoDB shell.
To build the query object, you can use the Dictionary object described earlier. For fields in the object that require subobjects, you can create a nested Dictionary object. For other types, such as integers, strings, and arrays, use the Python equivalent (int, str, and list).
For example, to create a query object that finds words with size=5, you would use
query = {'size' : 5}
myColl.find(query)
However, to create a query object that finds words with size>5, you would need to use
query = {'size' : {'$gt' : 5}}
myColl.find(query)
To create a query object that finds words with a first letter of x, y, or z, you would need to use a list of strings. For example:
query = {'first' : {'$in' : ["x", "y", "z"]}}
myColl.find(query)
You should be able to use these techniques to build any type of query object you need, not only for find operations but also for other operations that accept a query object.
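To see what these query objects mean, the sketch below evaluates them against plain dictionaries in memory. The matches function and the sample words are hypothetical teaching aids: the function understands only exact matches, $gt, and $in, whereas the real matching happens on the server and supports far more operators (including matching array fields element by element).

```python
def matches(doc, query):
    """Return True if doc satisfies query (tiny subset of MongoDB semantics)."""
    for field, condition in query.items():
        value = doc.get(field)
        if isinstance(condition, dict):
            # Operator expressions such as {'$gt': 5} or {'$in': [...]}.
            for op, operand in condition.items():
                if op == "$gt":
                    if value is None or not value > operand:
                        return False
                elif op == "$in":
                    if value not in operand:
                        return False
                else:
                    raise ValueError("unsupported operator: " + op)
        elif value != condition:
            # A plain value means an exact-match comparison.
            return False
    return True


words = [
    {"word": "zebra", "first": "z", "size": 5},
    {"word": "yes", "first": "y", "size": 3},
    {"word": "absolute", "first": "a", "size": 8},
]

print([w["word"] for w in words if matches(w, {"size": 5})])            # ['zebra']
print([w["word"] for w in words if matches(w, {"size": {"$gt": 5}})])   # ['absolute']
print([w["word"] for w in words
       if matches(w, {"first": {"$in": ["x", "y", "z"]}})])             # ['zebra', 'yes']
```

The three queries are the same Dictionary objects passed to find() in the examples above; only the evaluation has moved from the server into Python.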
When accessing document sets in MongoDB from Python, you might want to get only a count first before deciding to retrieve a set of documents. Performing a count is much less intensive on the MongoDB server and client because the actual documents do not need to be transferred.
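The count() method on the Cursor object enables you to get a simple count of the documents it represents. For example, the following code uses the find() method to get a Cursor object and then uses the count() method to get the number of items:
cursor = wordsColl.find()
itemCount = cursor.count()
The value of itemCount is the number of words that match the find() operation. Note that Cursor.count() was deprecated in PyMongo 3.7 and removed in PyMongo 4.0; with current releases, call count_documents() on the Collection object with the same query instead, for example wordsColl.count_documents({}).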
An important aspect of retrieving documents from a MongoDB database is the capability to get them in a sorted order. This is especially helpful if you are retrieving only a certain number of results, such as the top 10, or if you are paging the requests. The Python driver lets you specify the sort order and direction of one or more fields in the document.
The sort() method on the Cursor object enables you to specify fields to sort the documents represented in the cursor and return them in that order. The sort() method accepts a list of (key, order) tuples. The key is the field name to sort on, and order is 1 for ascending and -1 for descending.
For example, to sort on the name field in ascending order, you would use
sorter = [('name', 1)]
cursor = myCollection.find()
cursor.sort(sorter)
You can use multiple fields in the list passed to the sort() method, and the documents will be sorted on those fields in the order listed. For example, to sort on the name field ascending first and then the value field descending, you could use
sorter = [('name', 1), ('value', -1)]
cursor = myCollection.find()
cursor.sort(sorter)
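Calling sort() a second time on the same cursor replaces the earlier sort specification rather than adding to it, so list every sort field in a single call. Because sort() returns the cursor, you could also chain it directly onto find():
sorter = [('name', 1), ('value', -1)]
cursor = myCollection.find().sort(sorter)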
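The following in-memory sketch shows what a multi-field sort list means: documents are ordered by the first field, and later fields only break ties. The apply_sort helper and the sample rows are hypothetical; the helper relies on Python's stable sort, applying the keys from last to first, and assumes every document has every sort field. On a real cursor, the server performs the sort.

```python
def apply_sort(docs, sort_spec):
    """Sort dicts by a MongoDB-style list of (key, direction) tuples.

    Python's sort is stable, so sorting by the least significant key first
    and the most significant key last yields the combined ordering.
    """
    result = list(docs)  # leave the caller's list untouched
    for key, direction in reversed(sort_spec):
        result.sort(key=lambda d: d[key], reverse=(direction == -1))
    return result


rows = [
    {"name": "b", "value": 1},
    {"name": "a", "value": 1},
    {"name": "a", "value": 3},
    {"name": "b", "value": 2},
]

for row in apply_sort(rows, [("name", 1), ("value", -1)]):
    print(row["name"], row["value"])
# a 3
# a 1
# b 2
# b 1
```

Passing the same list, [('name', 1), ('value', -1)], to cursor.sort() asks the server for this ordering. PyMongo also defines the constants pymongo.ASCENDING and pymongo.DESCENDING, equal to 1 and -1, which can make sort lists easier to read.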
In this hour, you looked at the objects the Python MongoDB driver provides. These objects represent the connection, database, collection, cursor, and documents and provide functionality to access MongoDB from your Python applications.
You also implemented the Python MongoDB driver and created a basic Python MongoDB application to connect to the database. Then you learned how to use the Collection and Cursor objects to find and retrieve documents. Finally, you learned how to count and sort documents represented by the cursor before retrieving them.
Q. Are there additional Python objects not discussed in this hour?
A. Yes. This hour covers the major objects you need to know about. However, the Python MongoDB driver has a lot more supporting objects and functions. You can find the documentation at http://api.mongodb.org/python/current/api/index.html.
Q. Which versions of Python support implementing MongoDB?
A. It depends on your platform. Most platforms support version 2.5 and above for both 32- and 64-bit versions of Python.
The workshop consists of a set of questions and answers designed to solidify your understanding of the material covered in this hour. Try answering the questions before looking at the answers.
1. How do you control which documents a find() operation returns?
2. How do you sort documents based on the name field in ascending order?
3. How do you get the value of fields in a Dictionary object?
4. True or false: The find_one() method returns a Cursor object.
1. Create a Dictionary query object that defines a query filter.
2. Create the sort list [('name', 1)] and pass it to the sort() method.
3. Use the get(fieldName) method.
4. False. It returns a Dictionary object representing the document.
1. Extend the PythonFindSort.py file to include a method that sorts documents first by size in descending order and then by last letter, also in descending order.
2. Extend the PythonFindSpecific.py file to find words that start with a and end with e.