Chapter 14. Accessing Databases

Just about every large enterprise system uses a database for storing data. For example, Amazon.com, the online retailer, needs a database to store information on each product for sale. For Python to prove capable of handling these types of enterprise applications, the language must be able to access databases.

Luckily, Python provides a database API (Application Programming Interface — how you program for the database), which is a generic API that enables you to access most databases, in spite of the databases' different native APIs. The database, or DB, API doesn't define all aspects of working with databases, so some minor differences exist. For the most part, though, you can access databases such as Oracle or MySQL from your Python scripts without worrying too much about the details of the specific databases.

Having a generic database API is very useful because you may need to switch databases or have your application work with multiple databases, and you won't want to recode major parts of your program to allow this. Normally, you can do all of this in Python without a lot of programming changes.

Even if you aren't writing the next amazon.com online site, databases provide a convenient means to persist data for longer than the program is running (so that you don't lose the data that a user has entered if you want to restart your program), query for items, and modify your data in a safe manner.

This chapter covers the two main database systems supported by Python: dbm persistent dictionaries and relational databases with the DB API. In addition, this chapter describes how to set up a database, in case you don't have one handy.

In this chapter you learn:

  • Using the dbm libraries to create persistent dictionaries

  • About relational databases

  • Setting up the Sqlite database

  • Setting up the MySQL database

  • Working with the Python DB API

  • Creating connections

  • Accessing data with cursors

  • Connecting to databases

  • Querying and modifying data

  • Working with transactions

  • Handling errors

  • Using other database tools

In many cases, you don't require a full-blown relational database. In such cases, creating a persistent dictionary using dbm files is enough.

Working with DBM Persistent Dictionaries

A persistent dictionary acts exactly like you'd expect. You can store name/value pairs in the dictionary, which are saved to a disk, and so their data will endure between various times that your program is run. So if you save data to a dictionary that's backed by a dbm, the next time you start your program you can read the value stored under a given key again, once you've loaded the dbm file. These dictionaries work like normal Python dictionaries, which are covered in Chapter 3. The main difference is that the data is written to and read from disk.

An additional difference is that the keys and the values must both be strings.

DBM, short for database manager, acts as a generic name for a number of C language libraries originally created on UNIX systems. These libraries sport names such as dbm, gdbm, ndbm, sdbm, and so on. These names correspond closely to the available modules in Python that provide the requisite functionality.

Choosing a DBM Module

Python supports a number of dbm modules. Each dbm module supports a similar interface and uses a particular C library to store the data to disk. The main difference lies in the underlying binary format of the data files on disk. Each dbm module, unfortunately, creates incompatible files. That is, if you create a dbm persistent dictionary with one dbm module, you must use the same module to read the data. None of the other modules will work with that data file.

The following table lists the dbm modules.

Module

Description

dbm

Chooses the best dbm module

dbm.dumb

Uses a simple, but portable, implementation of the dbm library

dbm.gnu

Uses the GNU dbm library

All of these libraries exist because of the history of the dbm library. Originally, this library was available only on commercial versions of UNIX. Free versions of UNIX, and later Linux, Windows, and so on, could not use the dbm library. This led to the creation of alternative libraries, such as the Berkeley UNIX library and the GNU gdbm library.

With all the incompatible file formats, this plethora of libraries can be a real pain. The dbm module, though, offers a handy alternative to choosing a specific dbm module. With the dbm module, you can let it choose for you. In general, the dbm module will choose the best implementation available on your system when creating a new persistent dictionary. When reading a file, the dbm module uses the function whichdb to make an informed guess as to which library created the data file.

Unless you need a specific advanced feature of one of the dbm libraries, use the dbm module.

Creating Persistent Dictionaries

All of the dbm modules support an open function to create a new dbm object. Once opened, you can store data in the dictionary, read data, close the dbm object (and the associated data file or files), remove items, and test for the existence of a key in the dictionary.

To open a dbm persistent dictionary, use the open function on the module you choose. For example, you can create a persistent dictionary with the dbm module.

Accessing Persistent Dictionaries

With the dbm modules, you can treat the object you get back from the open function as a dictionary object. Get and set values using code like the following:

db['key'] = 'value'
value = db['key']

Remember that the key and the value must both be text strings.

You can delete a value in the dictionary using del:

del db['key']

The keys method returns a list of all the keys, in the same way it would with a normal dictionary:

for key in db.keys():
    # do something...

Note

The keys method may take a long time to execute if there are a huge number of keys in the file. In addition, this method may require a lot of memory to store the potentially large list that it would create with a large file.

You can use the following script as a guide for how to program with dbm persistent dictionaries.

Deciding When to Use DBM and When to Use a Relational Database

The dbm modules work when your data needs can be stored as key/value pairs. You can store more complicated data within key/value pairs with some imagination — for instance, by creating formatted strings that use a comma or some other character to delimit items in the strings, both on the key and the value part of the dictionary. This can be useful, but it can also be very difficult to maintain, and it can restrict you because your data is stored in an inflexible manner. Another way that you can be limited is technical: Note that some dbm libraries limit the amount of space you can use for the values (sometimes to a maximum of 1024 bytes, which is very, very little).

You can use the following guidelines to help determine which of these two types of data storage is appropriate for your needs:

  • If your data needs are simple, use a dbm persistent dictionary.

  • If you plan to store only a small amount of data, use a dbm persistent dictionary.

  • If you require support for transactions, use a relational database. (Transactions are when more than one thing happens at once — they let you keep your data from getting changed in one place but not in another; you get to define what happens concurrently with transactions.)

  • If you require complex data structures or multiple tables of linked data, use a relational database.

  • If you need to interface to an existing system, use that system, obviously. Chances are good this type of system will be a relational database.

Unlike the simple dbm modules, relational databases provide a far richer and more complex API.

On a side note, I should mention that there is a third type of database you can work with, though it is beyond the scope of this chapter. That third type is known as an ORM or object-relational database, and it allows for the conversion of data between type systems that are incompatible in relational databases.

Python has several options available if you wish to work with an ORM, such as SQL Object, SQLAlchemy, and even the Django ORM. For more information, visit the Python Wiki at http://wiki.python.org/moin/HigherLevelDatabaseProgramming.

Working with Relational Databases

Relational databases have been around for decades so they are a mature and well-known technology. People who work with relational databases know what they are supposed to do, and how they are supposed to work, so relational databases are the technology of choice for complex data storage.

In a relational database, data is stored in tables that can be viewed as two-dimensional data structures. The columns, or vertical part of the two-dimensional matrix, are all of the same type of data; like strings, numbers, dates, and so on. Each horizontal component of the table is made up of rows, also called records. Each row in turn is made up of columns. Typically, each record holds the information pertaining to one item, such as an audio CD, a person, a purchase order, an automobile, and so on.

For example, the following shows a simple employee table.

Empid

firstname

lastname

department

manager

phone

105

Peter

Tosh

2

45

555-5555

201

Bob

Marley

1

36

555-5551

This table holds six columns:

  • empid: Holds the employee ID number. Relational databases make extensive use of ID numbers where the database manages the assignment of unique numbers so that each row can be referenced with these numbers to make each row unique (even if they have identical data). We can then refer to each employee by the ID number. The ID alone provides enough information to look up the employee.

  • firstname: Holds the person's first name.

  • lastname: Holds the person's last name.

  • department: Holds the ID of the department in which the employee works. This would likely be a numeric ID of the department, where departments are defined in a separate table that has a unique ID for each department.

  • manager: Holds the employee ID of the manager of the given employee. This is sort of self-referential, because in this example, a manager is actually an employee.

  • phone: Holds the office phone number.

In real life, a company would likely store a lot more information about an employee, such as a taxation authority identification number (Social Security number in the U.S.), home address, and more, but not anything that's really different in principle to what you've already seen.

In this example, the column empid, the employee ID, would be used as the primary key. A primary key is a unique index for a table, where each element has to be unique because the database will use that element as the key to the given row and as the way to refer to the data in that row, in a manner similar to dictionary keys and values in Python. So, each employee needs to have a unique ID number, and once you have an ID number, you can look up any employee. So, the empid will act as the key into this table's contents.

The department column holds an ID of a department — that is, an ID of a row in another table. This ID could be considered a foreign key, because the ID acts as a key into another table. (In databases, a foreign key has a much more strict definition, so it's okay to think of it this way.)

For example, the following table shows a possible layout for the department table.

department id

name

manager

1

Development

47

2

QA

32

In these examples, the employee Peter Tosh works for department 2, the QA, or Quality Assurance, department in a dynamic world-class high-quality software development firm. Bob Marley works for department 1, the Development department.

In a large enterprise, there may be hundreds of tables in the database, with thousands or even millions of records in some tables.

Writing SQL Statements

The Structured Query Language, or SQL, defines a standard language for querying and modifying databases.

You can pronounce SQL as "sequel" or "s-q-l."

SQL supports the basic operations listed in the following table.

Operation

Usage

Select

Perform a query to search the database for specific data.

Update

Modify a row or rows, usually based on a certain condition.

Insert

Create new rows in the database.

Delete

Remove a row or rows from the database.

In general, these basic operations are called QUID, short for Query, Update, Insert, and Delete, or CRUD, short for Create, Read, Update, and Delete. SQL offers more than these basic operations, but for the most part, these are the majority of what you're going to use to write applications.

If you are not familiar with SQL, look at a SQL book or search on the Internet. You will find a huge amount of tutorial material. You can also look at the website for this book for more references to SQL resources.

SQL is important because when you access databases with the Python DB API, you must first create SQL statements and then execute these statements by having the database evaluate them. You then retrieve the results and use them. Thus, you will find yourself in the awkward position of using one language, Python, to create commands in another language, SQL.

The basic SQL syntax for the CRUD operations follows:

SELECT columns FROM tables WHERE condition ORDER BY columns ascending_or_
descending

UPDATE table SET new values WHERE condition

INSERT INTO table (columns) VALUES (values)

DELETE FROM table WHERE condition

In addition to this basic look at the available syntax, many more parameters and specifiers for each operation are optional. You can still use them with Python's DB API if you're familiar with SQL.

To insert a new row in the employee table, using the previous employee example, you can use a SQL query like the following (even though it's adding data and not getting data, the convention is that all SQL commands or statements can also be called queries):

insert into employee (empid, firstname, lastname, manager, dept, phone)
    values (3, 'Bunny', 'Wailer', 2, 2, '555-5553')

In this example, the first tuple (it's useful to think of these in Python terms, even though SQL will give these different names) holds the names of the columns in the order you are using for inserting your data. The second tuple, after the keyword values, holds the data items in the same order. Notice how SQL uses single quotes to delimit strings, and no quotes around numbers. (The phone number is different — it's actually a string because it has to be able to contain non-numbers, like dashes, periods, and plus signs, depending on how the data is entered.)

With queries, you can use shortcuts such as * to say that you want an operation to be performed using all of the columns in a table. For example, to query all of the rows in the department table, showing all of the columns for each row, you can use a query like the following:

select * from department

Note that SQL is not case-sensitive for its keywords, such as SELECT and FROM. But, some databases require table and column names to be all uppercase. It is common, therefore, to see people use SELECT and FROM and other operations in all capital letters to make them easily distinguished from other parts of the query.

This SQL statement omits the names of the columns to read and any conditions that would otherwise narrow down the data that would be returned. Thus the query will return all of the columns (from the *) and all of the rows (because there is no where clause).

You can perform a join with the select command, to query data from more than one table, but present it all in a single response. It's called a join because the data from both tables will be returned as though it was queried from a single table. For example, to extract the department name with each employee, you could perform a query like the following (all of which would need to be in one string to be a single query):

select employee.firstname, employee.lastname, department.name
from employee, department
where employee.dept =  department.departmentid
order by lastname desc

In this example, the select statement requests two columns from the employee table (the firstname and the lastname, but these are specified as coming from employee by the convention of specifying the table name and the column name in the table) and one from the department table (department.name). The order by section of the statement tells the database to order the results by the value in the lastname column, in descending order.

To simplify these queries, you can use aliases for the table names, which make them easier to type and to read (but don't change the logic or the syntax of your queries). For example, to use the alias e with the employee table, you can start a query as follows:

select e.firstname, e.lastname
from employee e
...

In this case, you must place the alias, e, after the table name in the from clause. You can also use the following format with the optional keyword as, which could be easier for you to read:

select e.firstname, e.lastname
from employee as e
...

To modify (or update) a row, use a SQL statement like the following:

update employee set manager=55 where empid=3

This example modifies the employee with an ID of 3 by setting that employee's manager to the employee with an ID of 55. As with other queries, numbers don't need to have quotes around them; however, strings would need to be quoted with single quotes.

To delete a row, use a SQL statement like the following:

delete employee where empid=42

This example deletes the employee with an ID of 42 but doesn't affect anything else in the database.

Defining Tables

When you first set up a database, you need to define the tables and the relations between them. To do this, you use the part of the SQL called the DDL, or Data Definition Language. (It defines the structure of your tables — get it?) DDL basics are pretty simple, where you use one operation to create tables, and another one to remove them:

CREATE TABLE tablename (column, type column type, ... )
DROP TABLE tablename

There is also an ALTER TABLE command to modify an existing table, but you won't need to do that for now. When you want to use this, a dedicated SQL book or web page will have more about this command.

Unfortunately, SQL is not an entirely standard language, and there are parts of it that each database handles differently. The DDL remains a part of SQL that has not been standardized. Thus, when defining tables you will find differences between the SQL dialects supported by the different databases, though the basics concepts are the same.

Setting Up a Database

In most cases when you're the programmer, you will already have a database that's up and running, perhaps even a database chosen by some other organization that you're going to have to use. For example, if you host your website with a website hosting company that provides bells and whistles, like a database, your hosting package may include access to the MySQL database. If you work for a large organization, your IT department may have already standardized on a particular database such as Oracle, DB/2, Sybase, or Informix. These latter packages are likely present in your workplace if you create enterprise applications with Python.

If you have no database at all, yet still want to work on the examples in this chapter, a good starting database is Sqlite. The main virtues of Sqlite are that it comes installed with Python, and it is simple and small, but functional. This makes it a great candidate for experimentation while you're learning, even if you have another database available to you. Just keep in mind that each database has its own quirks.

The examples in this chapter were written to work with Sqlite so that you can follow them without any external infrastructure being needed. You can easily modify these examples, though, to work with a different database. That's one of the great aspects of the Python DB API.

Using Sqlite is as simple as importing the module. The following example shows you all you need to create a database.

If you are working with another database, such as SQL Server, chances are good that a database has already been created. If not, follow the instructions from your database vendor. (A lot of the time, you can get help on tasks like this from your Database Administrator, or DBA, who would really rather have you working on a test database instead of on a production database.)

With Sqlite, creating a database is rather easy.

Using the Python Database APIs

First, some history about Python and relational databases. Python's support for relational databases started out with ad hoc solutions, with one solution written to interface with each particular database, such as Oracle. Each database module created its own API, which was highly specific to that database because each database vendor evolved its own API based on its own needs. This is hard to support, because coding for one database and trying to move it to the other gives a programmer severe heartburn, as everything needs to be completely rewritten and retested.

Over the years, though, Python has matured to support a common database, or DB, API, that's called the DB API. Specific modules enable your Python scripts to communicate with different databases, such as DB/2, PostgreSQL, and so on. All of these modules, however, support the common API, making your job a lot easier when you write scripts to access databases. This section covers this common DB API.

The DB API provides a minimal standard for working with databases, using Python structures and syntax wherever possible. This API includes the following:

  • Connections, which cover guidelines for how to connect to databases

  • Executing statements and stored procedures to query, update, insert, and delete data with cursors

  • Transactions, with support for committing or rolling back a transaction

  • Examining metadata on the database module as well as on database and table structure

  • Defining the types of errors

The following sections take you step by step through the Python database APIs.

Downloading Modules

You must download a separate DB API module for each database you need to access. For example, if you need to access an Oracle database as well as a MySQL database, you must download both the Oracle and the MySQL database modules.

See http://wiki.python.org/moin/DatabaseInterfaces for a listing of databases.

Modules exist for most major databases with the notable exception of Microsoft's SQL Server. You can access SQL Server using an ODBC module, though. In fact, the mxODBC module can communicate with most databases using ODBC on Windows or an ODBC bridge on UNIX (including Mac OS X) or Linux. If you need to do this, you can search for more information on these terms online to find out how other people are doing it.

Download the modules you need. Follow the instructions that come with the modules to install them.

You may need a C compiler and build environment to install some of the database modules. If you do, this will be described in the module's own documentation, which you'll need to read.

For some databases, such as Oracle, you can choose among a number of slightly different modules. You should choose the module that seems to best fit your needs or go to the website for this book and ask the authors for any recommendations if you're not sure.

Once you have verified that the necessary modules are installed, you can start working with Connections.

Creating Connections

A Connection object provides the means to communicate from your script to a database program. Note the major assumption here that the database is running in a separate process (or processes). The Python database modules connect to the database. They do not include the database application itself.

Each database module needs to provide a connect function that returns a connection object. The parameters that are passed to connect vary by the module and what is required to communicate with the database. The following table lists the most common parameters.

Parameter

Usage

Dsn

Data source name, from ODBC terminology. This usually includes the name of your database and the server where it's running.

Host

Host, or network system name, on which the database runs.

Database

Name of the database.

User

User name for connecting to the database.

Password

Password for the given user name.

For example, you can use the following code as a guide:

conn = dbmodule.connect(dsn='localhost:MYDB',user='tiger',password='scott')

Use your database module documentation to determine which parameters are needed.

With a Connection object, you can work with transactions, covered later in this chapter; close the connection to free system resources, especially on the database; and get a cursor.

Working with Cursors

A cursor is a Python object that enables you to work with the database. In database terms, the cursor is positioned at a particular location within a table or tables in the database, sort of like the cursor on your screen when you're editing a document, which is positioned at a pixel location.

To get a cursor, you need to call the cursor method on the connection object:

cursor = conn.cursor()

Once you have a cursor, you can perform operations on the database, such as inserting records.

Working with Transactions and Committing the Results

Each connection, while it is engaged in an action, manages a transaction. With SQL, data is not modified unless you commit a transaction. The database then guarantees that it will perform all of the modifications in the transaction or none. Thus, you will not leave your database in an uncertain and potentially erroneous state.

To commit a transaction, call the commit method of a connection:

conn.commit()

Note that the transaction methods are part of the connection class, not the cursor class.

If something goes wrong, like an exception is thrown that you can handle, you should call the rollback method to undo the effects of the incomplete transaction; this will restore the database to the state it was in before you started the transaction, guaranteed:

conn.rollback()

The capability to roll back a transaction is very important, because you can handle errors by ensuring that the database does not get changed. In addition, rollbacks are very useful for testing. You can insert, modify, and delete a number of rows as part of a unit test and then roll back the transaction to undo the effects of all the changes. This enables your unit tests to run without making any permanent changes to the database. It also enables your unit tests to be run repeatedly, because each run resets the data.

See Chapter 12 for more on testing.

Examining Module Capabilities and Metadata

The DB API defines several globals that need to be defined at the module level. You can use these globals to determine information about the database module and the features it supports. The following table lists these globals.

Global

Holds

Apilevel

Should hold '2.0' for the DB API 2.0, or '1.0' for the 1.0 API.

Paramstyle

Defines how you can indicate the placeholders for dynamic data in your SQL statements. The values include the following:

  • 'qmark' — Use question marks, as shown in the examples in this chapter.

  • 'numeric' — Use a positional number style, with ':1', ':2', and so on.

  • 'named' — Use a colon and a name for each parameter, such as :name.

  • 'format' — Use the ANSI C sprintf format codes, such as %s for a string and %d for an integer.

  • 'pyformat' — Use the Python extended format codes, such as %(name)s.

In addition, remember that pydoc is your friend. You can use pydoc to display information on modules, such as the database modules.

With a cursor object, you can check the definition attribute to see information about the data returned. This information should be a set of seven-element sequences, one for each column of result data. These sequences include the following items:

(name, type_code, display_size, internal_size, precision, scale, null_ok

None can be used for all but the first two items, as shown in this example:

(('FIRSTNAME', None, None, None, None, None, None),
('LASTNAME', None, None, None, None, None, None),
('NAME', None, None, None, None, None, None))

Handling Errors

Errors happen. With databases, errors happen a lot. The DB API defines a number of errors that must exist in each database module. The following table lists these exceptions.

Exception

Usage

Warning

Used for non-fatal issues. Must subclass StandardError.

Error

Base class for errors. Must subclass StandardError.

InterfaceError

Used for errors in the database module, not the database itself. Must subclass Error.

DatabaseError

Used for errors in the database. Must subclass Error.

DataError

Subclass of DatabaseError that refers to errors in the data.

OperationalError

Subclass of DatabaseError that refers to errors such as the loss of a connection to the database. These errors are generally outside of the control of the Python scripter.

IntegrityError

Subclass of DatabaseError for situations that would damage the relational integrity, such as uniqueness constraints or foreign keys.

InternalError

Subclass of DatabaseError that refers to errors internal to the database module, such as a cursor no longer being active.

ProgrammingError

Subclass of DatabaseError that refers to errors such as a bad table name and other things that can safely be blamed on you.

NotSupportedError

Subclass of DatabaseError that refers to trying to call unsupported functionality.

Your Python scripts should handle these errors. You can get more information about them by reading the DB API specification. See www.python.org/topics/database/ and www.python.org/peps/pep-0249.html for more information.

Summary

Databases provide a handy means for storing data. You can write Python scripts that can access all the popular databases using add-on modules. This chapter provided a whirlwind tour of SQL, the Structured Query Language, and covered Python's database APIs.

You also learned about the dbm modules that enable you to persist a dictionary using a variety of dbm libraries. These modules enable you to use dictionaries and transparently persist the data.

In addition, this chapter covered the Python database APIs, which define a standard set of methods and functions that you should expect from all database modules. This includes the following:

  • A connection object encapsulates a connection to the database. Use the connect function on the database module to get a new connection. The parameters you pass to the connect function may differ for each module.

  • A cursor provides the main object for interacting with a database. Use the connection object to get a cursor. The cursor enables you to execute SQL statements.

  • You can pass dynamic data as a tuple of values to the cursor execute method. These values are placed into your SQL statements, enabling you to create reusable SQL statements.

  • After performing a query operation, the cursor object holds the data. Use the fetchone or fetchall methods to extract the data.

  • After modifying the database, call commit on the connection to commit the transaction and save the changes. Use the rollback method to undo the changes.

  • Call close on each cursor when done. Call close on the connection when done.

  • The DB APIs include a defined set of exceptions. Your Python scripts should check for these exceptions to handle the variety of problems that may arise.

Chapter 15 covers XML, HTML, and XSL style sheets, technologies frequently used for web development.

Exercises

  1. Suppose you need to write a Python script to store the pizza preferences for the workers in your department. You need to store each person's name along with that person's favorite pizza toppings. Which technologies are most appropriate to implement this script?

    1. Set up a relational database such as MySQL or Sqlite.

    2. Use a dbm module such as dbm.

    3. Implement a web-service-backed rich web application to create a buzzword-compliant application.

  2. Rewrite the following example query using table name aliases:

    select employee.firstname, employee.lastname, department.name
    from employee, department
    where employee.dept =  department.departmentid
    order by employee.lastname desc
  3. The terminate.py script, shown previously, removes an employee row from the employee table; but this script is not complete. There remains a row in the user table for the same person. Modify the terminate.py script to delete both the employee and the user table rows for that user.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.221.89.18