Chapter 8. Working with Databases

If you work with data, sooner or later, you will come into contact with databases. This chapter introduces various databases (relational and NoSQL) and related APIs. A relational database is a database that has a collection of tables containing data organized by the relations between data items. A relationship can be set up between each row in the table and a row in another table. A relational database does not just pertain to relationships between tables; firstly, it has to do with the relationship between columns inside a table (obviously, columns within a table have to be related, for instance, a name column and an address column in a customer table); secondly, it relates to connections between tables.

NoSQL (Not Only SQL) databases are undergoing substantial growth in Big Data and web applications. NoSQL systems may in fact permit SQL-like query languages to be employed. The main theme of NoSQL databases is allowing data to be stored in a more flexible manner than the relational model permits. This may mean not having a database schema or a flexible database schema. Of course, the flexibility and speed may come at a price such as limited support for consistent transactions. NoSQL databases can store data using a dictionary style, in a column-oriented way, as documents, objects, graphs, tuples, or a combination thereof. The topics of this chapter are listed as follows:

  • Lightweight access with sqlite3
  • Accessing databases from pandas
  • Installing and setting up SQLAlchemy
  • Populating a database with SQLAlchemy
  • Querying the database with SQLAlchemy
  • Pony ORM
  • Dataset—databases for lazy people
  • PyMongo and MongoDB
  • Storing data in Redis
  • Apache Cassandra

Lightweight access with sqlite3

SQLite is a very popular relational database. It's very lightweight and used by many applications, for instance, web browsers such as Mozilla Firefox. The sqlite3 module in the standard Python distribution can be used to work with a SQLite database. With sqlite3, we can either store the database in a file or keep it in RAM. For this example, we will do the latter. Import sqlite3 as follows:

import sqlite3

A connection to the database is needed to proceed. If we wanted to store the database in a file, we would provide a filename. Instead, do the following:

with sqlite3.connect(":memory:") as con:

The with statement is standard Python and relies on the presence of a __exit__() method in a special context manager class. With this statement, we don't need to explicitly close the connection. The closing of the connection is done automatically by the context manager. After connecting to a database, we need a cursor. That's generally how it works with databases by the way. A database cursor is similar to a cursor in a text editor, in concept at least. We are required to close the cursor as well. Create the cursor as follows:

c = con.cursor()

We can now immediately create a table. Usually, you have to create a database first or have it created for you by a database specialist. In this chapter, you not only need to know Python, but SQL too. SQL is a specialized language for database querying and manipulating. We don't have enough space to describe SQL completely. However, basic SQL should be easy for you to pick up (for example, go through http://www.w3schools.com/sql/). To create a table, we pass a SQL string to the cursor as follows:

c.execute('''CREATE TABLE sensors
                 (date text, city text, code text, sensor_id real, temperature real)''')

This should create a table with several columns called sensors. In this string, text and real are data types corresponding to string and numerical values. We could trust the table creation to have worked properly. If something goes wrong, we will get an error. Listing the tables in a database is database dependent. There is usually a special table or set of tables containing metadata about user tables. List the SQLite tables as follows:

for table in c.execute("SELECT name FROM sqlite_master WHERE type = 'table'"):
        print "Table", table[0]

As expected, we get the following output:

Table sensors

Let's insert and query some random data as follows:

c.execute("INSERT INTO sensors VALUES ('2016-11-05','Utrecht','Red',42,15.14)")
c.execute("SELECT * FROM sensors")
print c.fetchone()

The record we inserted should be printed as follows:

(u'2016-11-05', u'Utrecht', u'Red', 42.0, 15.14)

When we don't need a table anymore, we can drop it. This is dangerous, so you have to be absolutely sure you don't need the table. Once a table is dropped, it cannot be recovered unless it was backed up. Drop the table and show the number of tables after dropping it as follows:

con.execute("DROP TABLE sensors")

print "# of tables", c.execute("SELECT COUNT(*) FROM sqlite_master WHERE type = 'table'").fetchone()[0]

We get the following output:

# of tables 0

Refer to the sqlite_demo.py file in this book's code bundle for the following code:

import sqlite3

with sqlite3.connect(":memory:") as con:
    c = con.cursor()
    c.execute('''CREATE TABLE sensors
                 (date text, city text, code text, sensor_id real, temperature real)''')

    for table in c.execute("SELECT name FROM sqlite_master WHERE type = 'table'"):
        print "Table", table[0]

    c.execute("INSERT INTO sensors VALUES ('2016-11-05','Utrecht','Red',42,15.14)")
    c.execute("SELECT * FROM sensors")
    print c.fetchone()
    con.execute("DROP TABLE sensors")

    print "# of tables", c.execute("SELECT COUNT(*) FROM sqlite_master WHERE type = 'table'").fetchone()[0]

    c.close()
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.223.196.146