MongoDB (the name comes from "humongous") is a NoSQL document-oriented database. Documents are stored in BSON, a binary, JSON-like format. You can download a MongoDB distribution from http://www.mongodb.org/downloads. Installing should be just a matter of unpacking a compressed archive. The version at the time of writing was 2.6.3. In the bin directory of the distribution, we will find the mongod file, which starts the server. By default, MongoDB expects to find a /data/db directory, which is where the data is stored. We can specify another directory from the command line. First, create it:
$ mkdir /tmp/db
Then start the database from the directory containing its binary executables, passing the data directory with the --dbpath option:
$ ./mongod --dbpath /tmp/db
We need to keep this process running to be able to query the database.
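Before querying, it can help to confirm that the server process is actually accepting connections. The following is a minimal sketch using only the standard library. It assumes mongod is listening on its default port, 27017, on the local machine; the helper name is_port_open is hypothetical and not part of MongoDB or PyMongo.

```python
import socket


def is_port_open(host="localhost", port=27017, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds.

    27017 is the default mongod port; adjust it if the server
    was started with a different --port value.
    """
    try:
        conn = socket.create_connection((host, port), timeout)
    except socket.error:
        # Connection refused, timed out, or host not resolvable.
        return False
    conn.close()
    return True


# True only if something (presumably mongod) is listening locally.
print(is_port_open())
```

A successful TCP connection only shows that something is listening on the port. It does not prove that the listener is MongoDB.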
PyMongo is a Python driver for MongoDB. Install PyMongo as follows:
$ sudo pip install pymongo
$ pip freeze|grep pymongo
pymongo==2.7.1
Connect to the MongoDB test database:
from pymongo import MongoClient
client = MongoClient()
db = client.test_database
Recall that we can create JSON from a pandas DataFrame. Create the JSON and store it in MongoDB:
data_loader = sm.datasets.sunspots.load_pandas()
df = data_loader.data
rows = json.loads(df.T.to_json()).values()
db.sunspots.insert(rows)
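The transpose in the snippet above is what turns each DataFrame row into one document: with the default orientation, to_json() on the transposed frame yields a JSON object keyed by row label, and each value is a dictionary for one row. The sketch below illustrates this with the standard library only. The JSON string is a hand-written stand-in for what df.T.to_json() would return for a two-row frame, and the column names and values are assumptions made for illustration.

```python
import json

# Hand-written stand-in for df.T.to_json() on a two-row DataFrame
# with columns YEAR and SUNACTIVITY (illustrative values only).
transposed_json = (
    '{"0": {"YEAR": 1700.0, "SUNACTIVITY": 5.0},'
    ' "1": {"YEAR": 1701.0, "SUNACTIVITY": 11.0}}'
)

# Parsing gives a dict mapping row label -> row dict.
by_label = json.loads(transposed_json)

# The values are the per-row dicts, one per future MongoDB document.
# list() makes this an actual list on Python 3 as well.
rows = list(by_label.values())

# Dict ordering is not guaranteed on older Pythons, so sort explicitly.
rows.sort(key=lambda row: row["YEAR"])
print(rows)
```

Each dictionary in rows has the shape of one document that would be stored in the collection.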
Query the documents we just created:
cursor = db['sunspots'].find({})
df = pd.DataFrame(list(cursor))
print df
This prints the entire pandas DataFrame. Refer to the mongo_demo.py file in this book's code bundle:
from pymongo import MongoClient
import statsmodels.api as sm
import json
import pandas as pd

client = MongoClient()
db = client.test_database

data_loader = sm.datasets.sunspots.load_pandas()
df = data_loader.data
rows = json.loads(df.T.to_json()).values()
db.sunspots.insert(rows)

cursor = db['sunspots'].find({})
df = pd.DataFrame(list(cursor))
print df

db.drop_collection('sunspots')
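MongoDB adds an _id field to every stored document, so the DataFrame built from the cursor in the script above has an _id column in addition to the original columns. If that column is not wanted, one option is to drop the key from each document before constructing the DataFrame. The sketch below shows the idea in plain Python. The documents list is a hypothetical stand-in for list(cursor), with strings in place of the ObjectId values a real query would return, and the helper name without_id is an assumption.

```python
# Hypothetical stand-ins for the documents returned by find({});
# real documents carry an ObjectId under "_id", not a string.
documents = [
    {"_id": "id-0", "YEAR": 1700.0, "SUNACTIVITY": 5.0},
    {"_id": "id-1", "YEAR": 1701.0, "SUNACTIVITY": 11.0},
]


def without_id(doc):
    """Return a copy of doc with the MongoDB-generated _id key removed."""
    return dict((key, value) for key, value in doc.items() if key != "_id")


# The originals are left untouched; records holds the cleaned copies.
records = [without_id(doc) for doc in documents]
print(records)
```

The resulting records list can be passed to pd.DataFrame in place of list(cursor).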