In this section, we discuss issues that emerge in database applications when multiple users access a database system; some users are inserting, updating, or deleting data, while others run queries.
To motivate the problems and solutions discussed here, consider an example. Imagine a user of the winestore wants to buy the last bottle of an expensive, rare wine that’s in stock. The user browses the database and finds the wine. There is only bottle left, and the user quickly adds this to her shopping cart. The shopping cart is a row in the order table with only one related row in the items table. Now, the user decides to finalize the purchase and is presented with a summary of the shopping cart.
However, while the user fumbles about finding her password to log in,
another user enters the system. This user quickly locates the same
wine, sees that there is only one bottle left, adds it to his
shopping cart, logs in to the system, and purchases the wine. When
our first user finally logs in to finalize the order, all the details
look fine, but the wine has actually been sold. Our database
UPDATE
operation to deduct from the inventory
fails since the stock value is already zero, and we end up reporting
an error to our original—now very unhappy and
confused—user.
Consider another example. Imagine that one of our winestore stock managers wants to order 12 more bottles of a popular wine, but only if there are less than two dozen bottles currently in stock. The manager runs a query to sum the total stock for that wine from the inventory table. The result is that there are 15 bottles left, so the manager decides to place an order. However, he heads off to fill his coffee cup first, leaving the system displaying the query result.
A second stock manager arrives at her desk with the same intention: to order more of this popular wine if there are less than 24 bottles. The result of the query is the same: 15 bottles. The second manager orders a dozen bottles, and updates the inventory to 27, knowing the bottles will arrive in the afternoon. The problem occurs when the first manager returns: he doesn’t rerun the query—why should he?—and he too orders 12 bottles and updates the inventory to 27. Now the system has record of 27 bottles, but two dozen will arrive in the afternoon to take the total to 39!
The first problem is a design issue—as well as an example of an unrepeatable read—and one that can be solved with more restrictive system requirements, knowledge of how the DBMS behaves, and some careful script development. The second is a classic problem—what textbooks describe as a lost update—and it requires more understanding of relational database problems and theory. We cover simple solutions to fundamental problems like these here, and discuss how MySQL implements locking for transactions, concurrency, and performance.
This section isn’t intended as a substitute for a relational database text. Most textbooks contain extensive treatment of transaction and concurrency topics, and most of these are highly relevant to the state problems of web database applications.
We have illustrated two examples of the problems users have when they access a web database at the same time (that is, concurrently). Allowing uncontrolled interleaving of SQL statements—where each of the users is reading and writing—can result in several well-known problems. The management of a group of SQL statements—we call these transactions —is one important area of the theory and practice of relational databases. Here are four of the more common problems of concurrent read and write transactions:
User A reads from the database, recording a value. User B reads the same value, then updates the value immediately. User A then updates the value, overwriting the update written by User B. An example of this lost update problem was described in the introduction to this section through the stock update example.
User A reads a value from the database, changes the value, and writes it back to the database. User B then reads the value, changes the value, and writes it back to the database. User A then decides not to proceed for some reason with the rest of his actions and therefore wants to undo the changes he made. The problem is that User B has read and used the changed value, resulting in a dirty read problem.
Consider an example: a manager decides to add a 3% surcharge to a particular wine inventory, so she reads and updates the cost of that wine in the inventory table. Another manager decides to apply a 10% discount to all wines made by a particular winery, which happens to include the wine just surcharged. After all this, the first manager realizes she has made a mistake: the wrong wine was updated! Unfortunately, the second manager has already used this incorrect value as input into his update, and the change can’t be undone correctly.
One user updates values while another reads and summarizes the same values. Values summarized may be read before or after each individual update, resulting in unpredictable results.
For example, consider a case in the winestore in which one user updates inventories and another produces a management stock report.
A value is read in by a user, updated by another user, and subsequently reread by the first user for verification. Despite not modifying the value, the first user encounters two different values, i.e., an unrepeatable read.
Fortunately, most of these problems can be solved through locking or careful design of scripts that carry out database transactions. However, some problems may be deliberately unsolved in a particular system because they restrict the system requirements or add unnecessary complexity. We discuss locking in the next section.
It has been shown that a simple scheme called locking—actually, two-phase locking —solves the four transaction problems identified in the last section.
The first and most important point is that the primary use of locking
is to solve concurrency problems. If scripts are being implemented
that write to the database but aren’t multistep
operations susceptible to the problems described in the last section,
locks aren’t needed. Simple scripts that insert one
row, delete one row, or update one row, and that
don’t use results of a previous
SELECT
or data entered by the user as input,
don’t require a lock.
Locking is required only when developing scripts that first read a value from a database and later write that value to the database. Locks are never required for self-contained insert, update, or delete operations such as updating a customer’s details, adding a region to the region table, or unconditionally deleting an inventory. Locking may not be required for all parts of a web database application: parts of the application can still be safely used without violating any locking conditions.
Locks
are variables with a special property. With its default settings,
each MySQL table has an associated lock variable. If a user sets the
lock variable for a particular table, no other user can perform
particular actions on that table. The user who has set the lock
variable holds the lock on the table. In
practice, there are two kinds of locks for each table:
READ
LOCKs
, when a user is
only reading from a table, and WRITE
LOCK
s, when a user is both reading and writing to
a table.
Having locks in a DBMS leads to four rules of use:
If a user wants to write to a table, and she is performing a
transaction susceptible to a concurrency problem, she must obtain a
WRITE
LOCK
on that table.
If a user only wants to read from a table, and he is performing a
transaction susceptible to a concurrency problem, he must obtain a
READ
LOCK
on that table.
If a user requires a lock, she must lock all tables used in the transaction.
A user must release all locks when a database transaction is complete.
When a user holds a WRITE
LOCK
on a table, no other users can read or write to that table. When a
user holds a READ
LOCK
on a
table, other users can also read or hold a READ
LOCK
, but no user can write or hold a
WRITE
LOCK
on that table.
SELECT
, UPDATE
,
INSERT
, or DELETE
operations
that don’t use LOCK
TABLES
don’t proceed if locks are
held that would logically prevent their operation. For example, if a
user holds a WRITE
LOCK
on a
table, no other user can issue a SELECT
,
UPDATE
, INSERT
,
DELETE
, or LOCK
operation on
that table.
The following segment of an interaction with the MySQL command interpreter illustrates the use of locks in a summarization task that requires locking:
mysql> LOCK TABLES items READ, temp_report WRITE; mysql> SELECT sum(price) FROM items WHERE cust_id=1; +------------+ | sum(price) | +------------+ | 438.65 | +------------+ 1 row in set (0.04 sec) mysql> UPDATE temp_report SET purchases=438.65 WHERE cust_id=1; mysql> UNLOCK TABLES;
In this example, a temporary
table
called temp_report is updated with the result of
a SELECT
operation on the
items table. If locks aren’t
used, the items table can be modified by another
user, possibly altering the summary value of $438.65 used as input to
the UPDATE
operation. There are two locks obtained
for this transaction: first, a READ
LOCK
on items, since we
don’t need to change items but
we don’t want another user to make a change to it;
and, second, a WRITE
LOCK
on
temp_report, because we want to change the
table, and we don’t want other users to read or
write to the report while we make changes. The
UNLOCK
TABLES
operation
releases all locks held; locks can’t be
progressively released.
It isn’t permitted by MySQL to lock only one of the two tables used in the transaction above. The following rules apply to locks:
If a lock is held, all other tables that are to be used must also be locked. Failing to do so results in a MySQL error.
If aliases are used in queries—for example:
SELECT * from customer c where c.custid=1
the alias must be locked with:
LOCK TABLES customer c READ
or:
LOCK TABLES customer c WRITE
(depending on the transaction requirements).
If different aliases for the same table are used, each different alias must be locked.
In many cases—including those in which locking is required if
the tasks are implemented intuitively—locking can be avoided.
When designing transactions, careful use of
mysql_insert_id( )
(as opposed to using max( )
to find the next available identifier), use of temporary
summary tables, and updates that are relative (such as
UPDATE customer SET discount=discount*1.1
) are
practical techniques to avoid using the output of previous
SELECT
statements.
The LOCK
TABLES
statement
is used to lock the listed tables in either READ
or WRITE
mode. As discussed earlier, all tables
that are accessed in the transaction must be locked in either
READ
or WRITE
mode and must be
listed in a single LOCK
TABLES
statement.
A script that issues a LOCK
TABLES
statement is suspended until all locks
listed are successfully obtained. There is no time limit in waiting
for locks. For locks that can’t be immediately
obtained—because the lock is held by another user or an
operation is running on the table already—the request is placed
at the back of either the
write-
or
read-lock
queue for the table,
depending on the lock required. The write-lock queue has priority
over the read-lock queue, so a user who wants a write lock obtains it
when it becomes available, regardless of how long another user has
been waiting in the read-lock queue. This is a design decision in
MySQL that gives priority to database modifications over database
queries.
MySQL is designed to give writing priority over reading. Regardless
of how long a user has been queued in the
READ
LOCK
queue, any
request in the WRITE
LOCK
queue receives
priority. This can lead to a problem called
starvation
, where a transaction never completes
because it can’t obtain the required locks. However,
since most web database applications read from databases much more
than they write, and locks are required in only a few situations,
starvation is very uncommon in practice.
If low-priority writing is essential to an application, a
LOW_PRIORITY
option can be prefixed before the
WRITE
clause. If a transaction is queued for a
LOW_PRIORITY WRITE
, it receives the lock only when
the READ LOCK
queue is empty and no other users
are reading from the table. Again, consideration of possible
starvation is important.
Locks can’t be progressively obtained through
several LOCK
TABLES
statements.
Indeed, issuing a second LOCK
TABLES
is the same as issuing an
UNLOCK
TABLES
to release
all locks and then issuing the second LOCK
TABLES
. There are good reasons for the strictness
of this related to a well-known locking problem called
deadlock
, which we don’t
discuss here. However, MySQL is deadlock-free because of the
strictness and functionality of the LOCK TABLES
and UNLOCK TABLES
statements.[9]
MySQL has a feature called
INSERT
DELAYED
for
insertion that is described in the MySQL manual.
Don’t mix locking with INSERT
DELAYED
for insertion operations. The
INSERT
DELAYED
process is
carried out by the MySQL DBMS at a later time—under its own
control—and the locks held by the user can’t
be used by the DBMS. INSERT
DELAYED
should be used only in situations in which
locking isn’t required.
Locking is primarily designed to ensure that concurrent transactions can execute safely. However, locking is also a useful performance tool to optimize the performance of a particular transaction.
Consider, for example, a situation where we urgently require a
complex report on the stock in the winestore that uses a slow
SELECT
statement with a GROUP
BY
, ORDER
BY
, and HAVING
clause, and that
joins together many tables. With other users running queries and
using system resources, this query may run even slower. A solution is
to use LOCK
TABLES
with the
WRITE
option to stop other users running queries
or database updates, and to have exclusive access to the database for
the query duration. This permits better optimization of the query
processing by the DBMS, dedication of all the system resources to the
query, and faster disk access.
The downside of locking for performance is the reduction in concurrent access to the database. Users may be inconvenienced by slow responses or timeouts from the web database application. Locking for performance should be used sparingly.
Example 6-9 shows a PHP
function, updateDiscount( )
, that requires locking to ensure that the
value returned from the SELECT
query
can’t change before the UPDATE
operation. The script is designed to be run either by the winestore
system administrator—it would then require a
<form>
for user input—or as the final
module in the ordering process for users. Another example that
requires locking for winestore ordering is included in Chapter 12.
The script in Example 6-9 is designed to reward
loyal customers. If the customer has spent a significant amount on an
order at the winestore, a percentage discount is applied to her
order. The function updateDiscount( )
forms the
body of the script. It takes as parameters a
cust_id
, an order_id
for that
customer, a discount
to apply to that order, and a
threshold
total. If the total amount spent by the
user exceeds the threshold total, the discount is applied to the
order.
Example 6-9. The updateDiscount function in which locking is required
function updateDiscount($custId, $orderId, $discount, $minimum, $connection) { $ok = false; // Lock all tables required in this transaction $query = "LOCK TABLES items READ, orders WRITE, customer READ"; if (!mysql_query($query, $connection)) showerror( ); // Run query to find out how much a user // has spent in this purchase $query = "SELECT SUM(price*qty) FROM items, orders, customer WHERE customer.cust_id = orders.cust_id AND orders.order_id = items.order_id AND items.cust_id = orders.cust_id AND orders.order_id = $orderId AND customer.cust_id = $custId"; if (!($result = mysql_query($query, $connection))) showerror( ); // Get the $row with the total spent $row = mysql_fetch_array($result); // Is the amount spent more than the threshold? if ($row["SUM(price*qty)"] > $minimum) { // Yes, so give the customer a discount // for this order $query = "UPDATE orders SET discount = $discount WHERE cust_id = $custId AND order_id = $orderId"; if (!mysql_query($query, $connection)) showerror( ); $ok = true; } // Unlock the tables $query = "UNLOCK TABLES"; if (!mysql_query($query, $connection)) showerror( ); // Return whether the discount was given or not return $ok; }
The locking of items,
orders, and customer is
performed before the query, and the UNLOCK
TABLES
statement is issued after the database
update of the discount. As discussed in the last section, all tables
and aliases that are used must be locked for either
READ
or WRITE
. MySQL reports an
error if, for example, items is accessed but not
locked while orders and
customer were locked. If an unlocked table needs
to be accessed—or locking must be avoided for a particular
table—a second DBMS connection can be opened and used.
There are several locking paradigms that don’t work in a web database application because of the statelessness of HTTP. Each approach fails, because there is either no guarantee or no possibility that the locked tables will be unlocked. If tables are locked indefinitely, other transactions can’t proceed, and the DBMS will most likely need to be shut down and restarted.
Be careful with locking in web database applications. Remember the basic rule that all locks should be unlocked by the same script during the same execution of the script.
All web scripts that require locking should have the structure lock, query, update, and unlock. There must be no user interaction or intervening calls to other scripts that require input.
The following approaches to transactions and locking in a web database application should be avoided:
Failing to issue an UNLOCK
TABLES
on a locked persistent database connection
opened with mysql_pconnect( )
. The locks
aren’t released when the script terminates, and
there is no guarantee that the script will be run in the future or
that the same persistent connection will be used again.
It isn’t necessary to issue an
UNLOCK
TABLES
if a
nonpersistent connection opened with mysql_connect( )
is used. Locks are automatically released when the
script finishes. However, it is good practice to include the
UNLOCK
TABLES
statement.
Locking one or more tables during the first execution of a script,
then querying or updating during a second or subsequent execution of
the script. Remember that each database connection in a script is
independent and is treated as a different user by MySQL. Such an
approach queries and updates without locks unless, by chance, the
same persistent connection that issued the locks is reused. A
subsequent UNLOCK TABLES
may fail.
Retrieving a value such as the next available primary key value, presenting this to the user, waiting for the user to enter further details, and then adding a row to the database with that identifier. Remember that another user may add a row while the first user is entering the required details, and locks should never be carried across several scripts or different executions of the same script.
If values must be shown to a user, consider adding a summary table for identifiers, or copying rows to a temporary table. For example, an identifier table can store the next available identifier for each other table, this can then be incremented by the script and the value can be used in subsequent scripts without locking problems and without any clashes in numbering.
This solution is shown in Example 6-10, using an
auxiliary table named ids that manages the next
available region_id
attribute. The use of the
additional table prevents duplicate rows being inserted, and avoids
any problems with locking or updates.
Example 6-10. An auxiliary table manages the next region_id attribute
<?php // This code needs an auxiliary table called "ids" // that might be created with: // CREATE TABLE ids ( // region_id int default 0, // other_id int default 0, // another_id int default 0 // ); // It has one row, and no primary key is required. // After creating the table, a row is needed, // so issue an: INSERT INTO ids (NULL, NULL, NULL); // (if it's being added later, use MAX( ) to get the // correct ID values!) include 'db.inc'; include 'error.inc'; function getNextRegion ($connection) { // A nice way to do it... use an auxiliary table // Lock the auxiliary table $query = "LOCK TABLES ids WRITE"; if (!mysql_query($query, $connection)) showerror( ); // Add one to the region_id attribute $query = "UPDATE ids SET region_id = region_id + 1"; if (!mysql_query($query, $connection)) showerror( ); // Find out the new value of region_id $query = "SELECT * FROM ids"; if (!($result = mysql_query($query, $connection))) showerror( ); // Get the row that is returned $row = mysql_fetch_array($result); // Unlock the table $query = "UNLOCK TABLES"; if (!mysql_query($query, $connection)) showerror( ); // Return the region_id return ($row["region_id"]); } // MAIN ----- if (!($connection = @ mysql_connect($hostName, $username, $password))) die("Could not connect to database"); if (!mysql_select_db($databaseName)) showerror( ); if (empty($regionId)) { $regionId = getNextRegion($connection, $databaseName); ?> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> <html> <head> <title>Insert a region</title> </head> <body bgcolor="white"> region_id: <?= $regionId ?> <br> <form method="post" action="example.6-10.php"> <input type="hidden" name="regionId" value="<?=$regionId;?>"> <br>region_name: <br><input type="text" name="regionName" size=80> <br>description: <br><textarea name="description" rows=4 cols=80> </textarea> <br><input type="submit"> </form> </body> </html> <?php } else { $regionId = clean($regionId, 3); $regionName = clean($regionName, 20); $description = clean($description, 255); $query = "INSERT INTO region SET " . "region_id = " . $regionId . ", " . "region_name = "" . $regionName . "", " . "description = "" . $description . """; if ((@ mysql_query ($query, $connection)) && @ mysql_affected_rows( ) == 1) header("Location:insert_receipt.php?" . "values=$regionId&status=T"); else header("Location: insert_receipt.php?status=F"); } ?>
Until recently, MySQL supported only table locking. Other DBMSs support locking at other levels, including locking rows, groups of rows, attributes across all rows in a table, and disk pages.
A common argument against using MySQL has been that table locking is too heavy-handed and that it limits concurrency in web database applications. This isn’t really true, except when there are specific requirements that are uncharacteristic of web database applications.
Table locking works particularly well in web database applications, where typically:
DELETE
and UPDATE
operations
are on specific rows—most often accessed by the primary key
value—and the rows are accessed through an index.
There are many more read operations than write operations.
Operations require locks on whole tables. Examples include
GROUP
BY
operations, updates of
sets of rows, and reading in most rows in a table.
By default, MySQL uses a type of table called MyISAM . Up to now, the MyISAM and heap have supported only table locking. However, three new database types have recently become supported by MySQL, and these have different locking paradigms:
The Berkeley Database (BDB) tables have disk
page-level locking; the LOCK
TABLES
statement can still be used in
BDB.
The InnoDB tables have row-level locking. They are designed to support very large volumes of data efficiently, and the locking mechanisms are designed to have low overheads.
The Gemini tables have both row- and table-level locking; unlike the other table types that can be used with MySQL, the Gemini table is covered by a commercial license and isn’t free software.
Support for BDB and InnoDB tables must be compiled into MySQL during the installation process, and each requires MySQL 3.23.34 or a later version. The Gemini table type is a component of the commercially available NuSphere product range. Configuration of these table types is outside the scope of this book.
Interestingly, the MyISAM tables permit a
limited form of concurrency that isn’t immediately
obvious with the table-locking paradigm. When a mix of select and
write operations occur on a MyISAM table, MySQL
automatically allows write operations to change copies of the data.
Other SELECT
statements being run by other users
read the unchanged data and, when they are completed, the modified
copies are written back to the database. This approach is known as
data versioning
.
The row-locking paradigm is used in the InnoDB and Gemini table types, and is the dominant paradigm in other DBMSs. The BDB table type offers page locking, which is similar to locking selected rows.
Row or page locking works well in situations that are infrequently seen in web database applications, such as:
Transaction environments where a number of steps need to be undone or rolled back.
Many users are writing to the same tables concurrently.
Locks need to be maintained for long periods of time.
The drawbacks of row and page locking include:
There are two significant topics related to transactions and concurrency that aren’t covered in this chapter. We have omitted these topics because they are less important in web database applications than in traditional relational systems, and because this book isn’t intended as a substitute for a good relational database text or the documentation of the MySQL DBMS.
The first is a more traditional treatment of transactions from a commit and rollback perspective. The InnoDB, BDB, and Gemini table types support a model where a statement can be issued to begin a transaction that consists of several database operations. On completion of the operations, a commit statement can be issued to write the changes to the database and verify that these changes have occurred. If, for some reason, the operations need to be undone—for example, when a user presses Cancel—a rollback command can be issued to return the database to its original state.
Commit and rollback processing is useful, but it can be argued that it is less interesting in the stateless HTTP environment, in which operations need to be as independent as possible. For most practical purposes in web database applications, complex transactional processing isn’t required.
The second topic we have not covered is recovery. Database recovery techniques are based on logging, in which database changes are written to a file that can be used for transaction rollback and for DBMS system recovery. MySQL does support logging for recovery, and more details can be found in the MySQL manual.
[9] Deadlock is a difficult problem. As recently as Version 3.22.23 of MySQL, there were bug fixes to MySQL to avoid deadlocking problems in the DBMS.
3.141.192.183