Chapter 15

Protecting Data

IN THIS CHAPTER

  • Avoiding database damage
  • Understanding the problems caused by concurrent operations
  • Dealing with concurrency problems through SQL mechanisms
  • Tailoring protection to your needs with SET TRANSACTION
  • Protecting your data without paralyzing operations
  • Reducing vulnerability to malicious attacks

Everyone has heard of Murphy’s Law — usually stated, “If anything can go wrong, it will.” People joke about this pseudo-law because most of the time things go fine. At times, you may feel lucky because you’re untouched by what purports to be one of the basic laws of the universe. When unexpected problems arise, you probably just recognize what has happened and deal with it.

In a complex structure, the potential for unanticipated problems shoots way up. (A mathematician might say it “increases approximately as the square of the complexity.”) Thus large software projects are almost always delivered late and are often loaded with bugs. A nontrivial, multiuser DBMS application is a large, complex structure. In the course of operation, many things can go wrong. Methods have been developed for minimizing the impact of these problems, but the problems can never be eliminated completely. This is good news for professional people who do database maintenance and repair — automating them out of a job will probably never be possible. This chapter discusses the major things that can go wrong with a database and the tools that SQL provides for you to deal with the problems that arise.

Threats to Data Integrity

Cyberspace (including your network) is a nice place to visit, but for the data living there, it’s no picnic. Data can be damaged or corrupted in a variety of ways. Chapter 5 discusses problems resulting from bad input data, operator error, and deliberate destruction. Poorly formulated SQL statements and improperly designed applications can also damage your data — and figuring out how doesn’t take much imagination. Two relatively obvious threats — platform instability and equipment failure — can also trash your data. The following sections detail both of these hazards, along with the problems that concurrent access can cause.

Platform instability

Platform instability is a category of problem that shouldn’t even exist, but alas, it does. It is most prevalent when you’re running one or more new and relatively untried components in your system. Problems can lurk in a new DBMS release, a new operating system version, or new hardware. Conditions or situations that have never appeared before can show up while you’re running a critical job. Your system locks up, and your data is damaged. Beyond directing a few choice words at your computer and the people who built it, you can’t do much except hope your latest backup was a good one.

Warning Never put important production work on a system that has any unproven components. Resist the temptation to put your bread-and-butter work on an untried beta release of the newest, most function-laden version of your DBMS or operating system. If you must gain some hands-on experience with a new software product, do so on a machine that’s completely isolated from your production network.

Equipment failure

Even well-proven, highly reliable equipment fails sometimes, sending your data to the great beyond. Everything physical wears out eventually — even modern, solid-state computers. If such a failure happens while your database is open and active, you can lose data — and sometimes (even worse) not realize it. Such a failure will happen sooner or later. If Murphy’s Law is in operation that day, the failure will happen at the worst possible time.

Tip One way to protect data against equipment failure is redundancy. Keep extra copies of everything. For maximum safety (provided your organization can swing it financially), have duplicate hardware configured exactly like your production system. Have database and application backups that can be loaded and run on your backup hardware when needed. If cost constraints keep you from duplicating everything (which effectively doubles your costs), at least be sure to back up your database and applications frequently enough that an unexpected failure doesn’t require you to reenter a large amount of data. Many DBMS products include replication capabilities. That is all well and good, but it won’t help unless you configure your system to actually use them.

Another way to avoid the worst consequences of equipment failure is to use transaction processing — a topic that takes center stage later in this chapter. A transaction is an indivisible unit of work, so when you use transaction processing, either an entire transaction is executed or none of it is. This all-or-nothing approach may seem drastic, but the worst problems arise when a series of database operations is only partially processed. Thus you’re much less likely to lose or corrupt your data, even if the machine on which the database resides is crashing.

Concurrent access

Assume that you’re running proven hardware and software, your data are good, your application is bug-free, and your equipment is inherently reliable. Data utopia, right? Not quite. Problems can still arise when multiple people try to use the same database table at the same time (concurrent access), and their computers argue about who gets to go first (contention). Multiple-user database systems must be able to handle the ruckus efficiently.

Transaction interaction trouble

Contention troubles can lurk even in applications that seem straightforward. Consider this example. You’re writing an order-processing application that involves four tables: ORDER_MASTER, CUSTOMER, LINE_ITEM, and INVENTORY. The following conditions apply:

  • The ORDER_MASTER table has OrderNumber as a primary key and CustomerNumber as a foreign key that references the CUSTOMER table.
  • The LINE_ITEM table has LineNumber as a primary key, ItemNumber as a foreign key that references the INVENTORY table, and Quantity as one of its columns.
  • The INVENTORY table has ItemNumber as a primary key; it also has a field named QuantityOnHand.
  • All three tables have other columns, but they don't enter into this example.

Your company policy is to ship each order completely or not at all. No partial shipments or back orders are allowed. (Relax. It’s a hypothetical situation.) You write the ORDER_PROCESSING application to process each incoming order in the ORDER_MASTER table as follows: It first determines whether your company can ship all the line items. If so, it writes the order and then decrements the QuantityOnHand column of the INVENTORY table as required. (This action deletes the affected entries from the ORDER_MASTER and LINE_ITEM tables.) So far, so good. You set up the application to process orders in one of two ways when users access the database concurrently:

  • Method 1 processes the INVENTORY row that corresponds to each row in the LINE_ITEM table. If QuantityOnHand is large enough, the application decrements that field. If QuantityOnHand is not large enough, it rolls back the transaction to restore all inventory reductions made to other LINE_ITEMs in this order. (A sketch of this step appears after this list.)
  • Method 2 checks every INVENTORY row that corresponds to a row in the order's LINE_ITEMs. If they are all big enough, then it processes those items by decrementing them.
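
Here’s a minimal sketch of the core of Method 1 for a single line item, using the INVENTORY table from this example (the item number and quantity are placeholders):

UPDATE INVENTORY
   SET QuantityOnHand = QuantityOnHand - 10
   WHERE ItemNumber = 1
     AND QuantityOnHand >= 10 ;

If the UPDATE affects no row, that item is short, and the application issues a ROLLBACK to undo the decrements already made for earlier line items in the order.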

Usually, Method 1 is more efficient when you succeed in processing the order; Method 2 is more efficient when you fail. Thus, if most orders can be filled most of the time, you’re better off using Method 1. If most orders can’t be filled most of the time, you’re better off with Method 2. Suppose this hypothetical application is up and running on a multiuser system that doesn’t have adequate concurrency control. Yep. Trouble is brewing, all right. Consider this scenario:

  1. A customer contacts an order processor at your company (User 1) to order ten bolt cutters and five wide adjustable wrenches.
  2. User 1 uses Method 1 to process the order. The first item in the order is ten pieces of Item 1 (bolt cutters).

    As it happens, your company has ten bolt cutters in stock, and User 1’s order takes them all.

    The order-processing function chugs along, decrementing the quantity of bolt cutters to zero. Then things get (as the Chinese proverb says) interesting. Another customer contacts your company to process an order and talks to User 2.

  3. User 2 attempts to process the customer’s small order for one bolt-cutter — and finds that there are no bolt cutters in stock.

    User 2’s order is rolled back because it can’t be filled.

  4. Meanwhile, User 1 tries to complete his customer’s order and checks the system for five pieces of Item 37 (wide adjustable wrenches).

    Unfortunately, your company only has four wide adjustable wrenches in stock. User 1’s complete order (including the bolt cutters) is rolled back because it can’t be completely filled.

    The INVENTORY table is now back to the state it was in before either user started operating. Neither order has been filled, even though User 2’s order could have been.

In a slightly different scenario, Method 2 fares little better, although for a different reason. User 1 checks all the items ordered and decides that all the items ordered are available. Then User 2 comes in and processes an order for one of those items before User 1 performs the decrement operation; User 1’s transaction fails.

Serialization eliminates harmful interactions

No conflict occurs if transactions are executed serially rather than concurrently. (Taking turns — what a concept.) In the first example, if User 1’s unsuccessful transaction had completed before User 2’s transaction started, the ROLLBACK would have made the single bolt cutter ordered by User 2 available. (The ROLLBACK statement rolls back, or undoes, the entire transaction.) If the transactions had run serially in the second example, User 2 would have had no opportunity to change the quantity of any item until User 1’s transaction was complete. User 1’s transaction completes, either successfully or unsuccessfully, and User 2 then sees how many bolt cutters are left in stock.

If transactions are executed serially (one after the other), they have no chance of interacting destructively. Execution of concurrent transactions is serializable if the result is the same as it would be if the transactions were executed serially.

Warning Serializing concurrent transactions isn’t a cure-all. You have to make a tradeoff between performance and protection from harmful interactions. The more you isolate transactions from each other, the more time it takes to perform each function. (In cyberspace, as in real life, waiting in line takes time.) Be aware of the tradeoffs so you can configure your system for adequate protection — but not more protection than you need. Controlling concurrent access too tightly can kill overall system performance.

Reducing Vulnerability to Data Corruption

You can take precautions at several levels to reduce the chances of losing data through some mishap or unanticipated interaction. You can set up your DBMS to take some of these precautions for you. When you configure your DBMS appropriately, it acts like a guardian angel to protect you from harm, operating behind the scenes; you don’t even know that the DBMS is helping you out. Your database administrator (DBA) can take other precautions at his or her discretion that you may not be aware of. As the developer, you can take precautions as you write your code.

Tip To avoid a lot of grief, get into the habit of adhering to a few simple principles automatically so they’re always included in your code or in your interactions with your database:

  • Use SQL transactions.
  • Tailor the level of isolation to balance performance and protection.
  • Know when and how to set transactions, lock database objects, and perform backups.

Details coming right up.

Using SQL transactions

The transaction is one of SQL’s main tools for maintaining database integrity. An SQL transaction encapsulates all the SQL statements that can have an effect on the database. An SQL transaction is completed with either a COMMIT or ROLLBACK statement:

  • If the transaction finishes with a COMMIT, the effects of all the statements in the transaction are applied to the database in one rapid-fire sequence.
  • If the transaction finishes with a ROLLBACK, the effects of all the statements are rolled back (that is, undone), and the database returns to the state it was in before the transaction began.

Remember In this discussion, the term application means either an execution of a program (whether in Java, C++, or some other programming language) or a series of actions performed at a terminal during a single logon.

An application can include a series of SQL transactions. The first SQL transaction begins when the application begins; the last SQL transaction ends when the application ends. Each COMMIT or ROLLBACK that the application performs ends one SQL transaction and begins the next. For example, an application with three SQL transactions has the following form:

Start of the application
   Various SQL statements (SQL transaction-1)
COMMIT or ROLLBACK
   Various SQL statements (SQL transaction-2)
COMMIT or ROLLBACK
   Various SQL statements (SQL transaction-3)
COMMIT or ROLLBACK
End of the application

Remember I use the phrase SQL transaction because the application may be using other capabilities (such as for network access) that do other sorts of transactions. In the following discussion, I use transaction to mean SQL transaction specifically.

A normal SQL transaction has an access mode that is either READ-WRITE or READ-ONLY; it has an isolation level that is SERIALIZABLE, REPEATABLE READ, READ COMMITTED, or READ UNCOMMITTED. (You can find transaction characteristics in the “Isolation levels” section, later in this chapter.) The default characteristics are READ-WRITE and SERIALIZABLE. If you want any other characteristics, you have to specify them with a SET TRANSACTION statement such as the following:

SET TRANSACTION READ ONLY ;

or

SET TRANSACTION READ ONLY, ISOLATION LEVEL REPEATABLE READ ;

or

SET TRANSACTION ISOLATION LEVEL READ COMMITTED ;

You can have multiple SET TRANSACTION statements in an application, but you can specify only one in each transaction, and it must be the first SQL statement executed in the transaction. If you want to use a SET TRANSACTION statement, execute it either at the beginning of the application or after a COMMIT or ROLLBACK.

Remember You must perform a SET TRANSACTION at the beginning of every transaction for which you want nondefault properties, because each new transaction after a COMMIT or ROLLBACK is given the default properties automatically.

Technical Stuff A SET TRANSACTION statement can also specify a DIAGNOSTICS SIZE, which determines the number of error conditions for which the implementation should be prepared to save information. (Such a numerical limit is necessary because an implementation can detect more than one error during a statement.) The SQL default for this limit is implementation-defined, and that default is almost always adequate.

The default transaction

The default SQL transaction has characteristics that are satisfactory for most users most of the time. If necessary, you can specify different transaction characteristics with a SET TRANSACTION statement, as described in the previous section. (SET TRANSACTION gets its own spotlight treatment later in the chapter.)

The default transaction makes a couple of other implicit assumptions:

  • The database will change over time.
  • It's always better to be safe than sorry.

It sets the mode to READ-WRITE, which, as you may expect, enables you to issue statements that change the database. It also sets the isolation level to SERIALIZABLE, which is the highest level of isolation possible (thus the safest). The default diagnostics size is implementation-dependent. Look at your SQL documentation to see what that size is for your system.

Isolation levels

Ideally, the system handles your transactions independently from every other transaction, even if those transactions happen concurrently with yours. This concept is referred to as isolation. In the real world of networked multiuser systems with real-time access requirements, however, complete isolation is not always feasible. Isolation may exact too large a performance penalty. A tradeoff question arises: “How much isolation do you really want, and how much are you willing to pay for it in terms of performance?”

Getting mucked up by a dirty read

The weakest level of isolation is called READ UNCOMMITTED, which allows the sometimes-problematic dirty read. A dirty read is a situation in which a change made by one user can be read by a second user before the first user completes her transaction with a COMMIT statement.

The problem arises if the first user aborts and rolls back her transaction. The second user's operations are now based on an incorrect value. The classic example of this foul-up can appear in an inventory application. In “Transaction interaction trouble,” earlier in this chapter, I outline one possible scenario of this type, but here’s another example: One user decrements inventory; a second user reads the new (lower) value. The first user rolls back her transaction (restoring the inventory to its initial value), but the second user, thinking inventory is low, orders more stock and possibly creates a severe overstock. And that’s if you’re lucky.

Warning Don’t use the READ UNCOMMITTED isolation level unless you don’t care about accurate results.

You can use READ UNCOMMITTED if you want to generate approximate statistical data, such as these examples:

  • Maximum delay in filling orders
  • Average age of salespeople who don't make quota
  • Average age of new employees

In many such cases, approximate information is sufficient; the extra cost of the concurrency control required to give an exact result — mainly a performance slowdown — may not be worthwhile.
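
For example, the second item in the preceding list might be gathered like this; the SALESPEOPLE table and its columns are invented for illustration:

SET TRANSACTION
   READ ONLY,
   ISOLATION LEVEL READ UNCOMMITTED ;

SELECT AVG(Age)
   FROM SALESPEOPLE
   WHERE QuotaMet = 'N' ;

COMMIT ;

Because the transaction is READ ONLY and tolerates dirty reads, it places very little burden on the transactions that are busy updating the same table.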

Getting bamboozled by a nonrepeatable read

The next highest level of isolation is READ COMMITTED: A change made by another transaction isn’t visible to your transaction until the other user has finalized the other transaction with the COMMIT statement. This level gives you a better result than you can get from READ UNCOMMITTED, but it’s still subject to a nonrepeatable read — a serious problem that can play out like a comedy of errors.

Consider the classic inventory example:

  1. User 1 queries the database to see how many items of a particular product are in stock. The number is ten.
  2. At almost the same time, User 2 starts, and then finalizes, a transaction with the COMMIT statement that records an order for ten units of that same product, decrementing the inventory to zero.
  3. Now User 1, having seen that ten are available, tries to order five of them. Five are no longer left, however, because User 2 has raided the pantry.

User 1’s initial read of the quantity available is not repeatable. Because the quantity has changed out from under User 1, any assumptions made on the basis of the initial read are not valid.
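
In SQL terms, User 1’s two looks at the stock level might go like this under READ COMMITTED (the item number is a placeholder); at REPEATABLE READ or SERIALIZABLE, the second query would return the same value the first one did:

SELECT QuantityOnHand
   FROM INVENTORY
   WHERE ItemNumber = 1 ;   -- returns 10

-- User 2's transaction commits an order for all ten units here

SELECT QuantityOnHand
   FROM INVENTORY
   WHERE ItemNumber = 1 ;   -- now returns 0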

Risking the phantom read

An isolation level of REPEATABLE READ guarantees that the nonrepeatable-read problem doesn't happen. This isolation level, however, is still haunted by the phantom read — a problem that arises when the data a user is reading changes in response to another transaction (and does not show the change onscreen) while the user is reading it.

Suppose, for example, that User 1 issues a command whose search condition (the WHERE clause or HAVING clause) selects a set of rows — and, immediately afterward, User 2 performs and commits an operation that changes the data in some of those rows. Those data items met User 1's search condition at the start of this snafu, but now they no longer do. Maybe some other rows that first did not meet the original search condition now do meet it. User 1, whose transaction is still active, has no inkling of these changes; the application behaves as if nothing has happened. The hapless User 1 issues another SQL statement with the same search conditions as the original one, expecting to retrieve the same rows. Instead, the second operation is performed on rows other than those used in the first operation. Reliable results go out the window, spirited away by the phantom read.

Getting a reliable (if slower) read

An isolation level of SERIALIZABLE is not subject to any of the problems that beset the other three levels. At this level, concurrent transactions can be run without interfering with each other, and results are the same as they’d be if the transactions had been run serially — one after the other — rather than in parallel. If you’re running at this isolation level, hardware or software problems can still cause your transaction to fail, but at least you don’t have to worry about the validity of your results if you know your system is functioning properly.

Of course, superior reliability may come at the price of slower performance, so you’re back in Tradeoff City. Table 15-1 shows how the different isolation levels stack up.

TABLE 15-1 Isolation Levels and Problems Solved

Isolation Level   | Dirty Read Solved? | Nonrepeatable Read Solved? | Phantom Read Solved?
READ UNCOMMITTED  | No                 | No                         | No
READ COMMITTED    | Yes                | No                         | No
REPEATABLE READ   | Yes                | Yes                        | No
SERIALIZABLE      | Yes                | Yes                        | Yes

The implicit transaction-starting statement

Some SQL implementations require that you signal the beginning of a transaction with an explicit statement, such as BEGIN or BEGIN TRAN. Standard SQL does not. If you don't have an active transaction and you issue a statement that calls for one, standard SQL starts a default transaction for you. CREATE TABLE, SELECT, and UPDATE are examples of statements that require the context of a transaction. Issue one of these statements, and standard SQL starts a transaction for you.
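
For example, these two statements form a complete transaction in standard SQL with no explicit transaction-starting keyword (the table comes from the inventory example earlier in this chapter; the values are placeholders):

UPDATE INVENTORY
   SET QuantityOnHand = QuantityOnHand - 1
   WHERE ItemNumber = 1 ;   -- implicitly starts a transaction

COMMIT ;                    -- ends that transaction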

SET TRANSACTION

On occasion, you may want to use transaction characteristics that are different from those set by default. You can specify different characteristics with a SET TRANSACTION statement before you issue your first statement that actually requires a transaction. The SET TRANSACTION statement enables you to specify mode, isolation level, and diagnostics size.

To change all three, for example, you may issue the following statement:

SET TRANSACTION
   READ ONLY,
   ISOLATION LEVEL READ UNCOMMITTED,
   DIAGNOSTICS SIZE 4 ;

With these settings, you can't issue any statements that change the database (READ ONLY), and you have set the lowest and most hazardous isolation level (READ UNCOMMITTED). The diagnostics area has a size of 4. You are making minimal demands on system resources.

In contrast, you may issue this statement:

SET TRANSACTION
   READ WRITE,
   ISOLATION LEVEL SERIALIZABLE,
   DIAGNOSTICS SIZE 8 ;

These settings enable you to change the database; they also give you the highest level of isolation — and a larger diagnostics area. The tradeoff is that they also make larger demands on system resources. Depending on your implementation, these settings may turn out to be the same as those used by the default transaction. Naturally, you can issue SET TRANSACTION statements with other choices for isolation level and diagnostics size.

Tip Set your transaction isolation level as high as you need to, but no higher. Always setting your isolation level to SERIALIZABLE just to be on the safe side may seem reasonable, but it isn't so for all systems. Depending on your implementation (and on what you’re doing), you may not need to do so — and performance can suffer significantly if you do. If you don’t intend to change the database in your transaction, for example, set the mode to READ ONLY. Bottom line: Don’t tie up any system resources that you don’t need.

COMMIT

Although SQL doesn’t require an explicit transaction-starting keyword, it has two that terminate a transaction: COMMIT and ROLLBACK. Use COMMIT when you've come to the end of the transaction and you want to make permanent the changes (if any) that you made to the database. You may include the optional keyword WORK (COMMIT WORK) if you want. If the database encounters an error or the system crashes while a COMMIT is in progress, you may have to roll the transaction back and try it again.
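
Here’s a minimal sketch of a transaction that ends with a COMMIT; the ACCOUNTS table and the account numbers are invented for illustration:

UPDATE ACCOUNTS
   SET Balance = Balance - 100
   WHERE AcctNo = '1001' ;

UPDATE ACCOUNTS
   SET Balance = Balance + 100
   WHERE AcctNo = '2002' ;

COMMIT WORK ;   -- both updates become permanent as a single unit

If anything had gone wrong before the COMMIT, the application would issue a ROLLBACK instead, and neither balance would change.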

ROLLBACK

When you come to the end of a transaction, you may decide that you don't want to make permanent the changes that have occurred during the transaction. In such a case, you should restore the database to the state it was in before the transaction began. To do this, issue a ROLLBACK statement. ROLLBACK is a fail-safe mechanism.

Tip Even if the system crashes while a ROLLBACK is in progress, you can restart the ROLLBACK after you restore the system; the rollback will continue its work, restoring the database to its pre-transaction state.

Locking database objects

The isolation level — set either by default or by a SET TRANSACTION statement — tells the DBMS how zealous to be in protecting your work from interaction with the work of other users. The main protection from harmful transactions that the DBMS gives to you is its application of locks to the database objects you're using. Here are a few examples:

  • The table row you’re accessing is locked, preventing others from accessing that record while you’re using it.
  • An entire table is locked, if you’re performing an operation that could affect the whole table.
  • Reading, but not writing, is allowed. Sometimes writing is allowed but not reading.

Each implementation handles locking in its own way. Some implementations are more bulletproof than others, but most up-to-date systems protect you from the worst problems that can arise in a concurrent-access situation.
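
Many implementations (though not all) also let you request a lock explicitly with a FOR UPDATE clause on a query, which holds the retrieved rows until the transaction ends; check your documentation before relying on it:

SELECT QuantityOnHand
   FROM INVENTORY
   WHERE ItemNumber = 1
   FOR UPDATE ;   -- other transactions can't change this row until COMMIT or ROLLBACK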

Backing up your data

Backing up data is a protective action that your DBA should perform on a regular basis. All system elements should be backed up at intervals that depend on how frequently they’re updated. If your database is updated daily, it should be backed up daily. Your applications, forms, and reports may change, too, though less frequently. Whenever you make changes to them, your DBA should back up the new versions.

Tip Keep several generations of backups. Sometimes, database damage doesn’t become evident until some time has passed. To return to the last good version, you may have to go back several backup versions.

You can perform a backup in one of several different ways:

  • Use SQL to create backup tables and copy data into them (a sketch of this approach follows this list).
  • Use an implementation-defined mechanism that backs up the whole database or portions of it. Using such a mechanism is generally more convenient and efficient than using SQL.
  • Your installation may have a mechanism in place for backing up everything, including databases, programs, documents, spreadsheets, utilities, and computer games. If so, you may not have to do anything beyond assuring yourself that the backups are performed frequently enough to protect you.
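
The first approach in the preceding list can be as simple as the following sketch; the backup table name is made up, and the exact table-copying syntax varies by implementation:

CREATE TABLE INVENTORY_BACKUP AS
   SELECT * FROM INVENTORY ;
-- or, if INVENTORY_BACKUP already exists and has been emptied:
INSERT INTO INVENTORY_BACKUP
   SELECT * FROM INVENTORY ;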

Savepoints and subtransactions

Ideally, transactions should be atomic — as indivisible as the ancient Greeks thought atoms were. However, atoms are not really indivisible — and, starting with SQL:1999, database transactions are not really atomic. A transaction is divisible into multiple subtransactions, each of which ends at a savepoint: the point in the transaction where a SAVEPOINT statement takes effect. The SAVEPOINT statement is used in conjunction with the ROLLBACK statement. Before savepoints were introduced, the ROLLBACK statement could be used only to cancel an entire transaction. Now it can be used to roll back a transaction to a savepoint within the transaction. What good is this, you might ask?

Granted, the primary use of the ROLLBACK statement is to prevent data corruption if a transaction is interrupted by an error condition. And no, rolling back to a savepoint does not make sense if an error occurred while a transaction was in progress; you'd want to roll back the entire transaction to bring the database back to the state it was in before the transaction started. But you might have other reasons for rolling back part of a transaction.

Suppose you’re performing a complex series of operations on your data. Partway through the process, you receive results that lead you to conclude that you’re going down an unproductive path. If you were thinking ahead enough to put a SAVEPOINT statement just before you started on that path, you can roll back to the savepoint and try another option. Provided the rest of your code was in good shape before you set the savepoint, this approach works better than aborting the current transaction and starting a new one just to try a new path.

To insert a savepoint into your SQL code, use the following syntax:

SAVEPOINT savepoint_name ;

You can cause execution to roll back to that savepoint with code such as the following:

ROLLBACK TO SAVEPOINT savepoint_name ;

Some SQL implementations may not include the SAVEPOINT statement. If your implementation is one of those, you won't be able to use it.
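
Putting the two statements together, the exploratory scenario just described might look like this; the savepoint name and the repricing statement are invented for illustration:

SAVEPOINT before_reprice ;

UPDATE PRICE_LIST
   SET Price = Price * 0.90 ;   -- try an experimental across-the-board discount

ROLLBACK TO SAVEPOINT before_reprice ;   -- results look wrong; undo only this step

Work done before the SAVEPOINT statement is unaffected, and the transaction remains open for another attempt.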

Constraints Within Transactions

Ensuring the validity of the data in your database means doing more than just making sure the data is of the right type. Perhaps some columns, for example, should never hold a null value — and maybe others should hold only values that fall within a certain range. Such restrictions are constraints, as discussed in Chapter 5.

Constraints are relevant to transactions because they can conceivably prevent you from doing what you want. For example, suppose that you want to add data to a table that contains a column with a NOT NULL constraint. One common method of adding a record is to append a blank row to your table and then insert values into it later. The NOT NULL constraint on one column, however, causes the append operation to fail. SQL doesn't allow you to add a row that has a null value in a column with a NOT NULL constraint, even though you plan to add data to that column before your transaction ends. To address this problem, SQL enables you to designate constraints as either DEFERRABLE or NOT DEFERRABLE.

Constraints that are NOT DEFERRABLE are applied immediately. You can set DEFERRABLE constraints to be either initially DEFERRED or IMMEDIATE. If a DEFERRABLE constraint is set to IMMEDIATE, it acts like a NOT DEFERRABLE constraint — it is applied immediately. If a DEFERRABLE constraint is set to DEFERRED, it is not enforced. (No, your code doesn't have an attitude problem; it’s simply following orders.)
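
To be deferred at all, a constraint must be declared DEFERRABLE when you create it; the default is NOT DEFERRABLE. Here’s a sketch using a made-up constraint on the LINE_ITEM table from earlier in the chapter:

CREATE TABLE LINE_ITEM
   (LineNumber INTEGER PRIMARY KEY,
    ItemNumber INTEGER NOT NULL,
    Quantity   INTEGER,
    CONSTRAINT Qty_Positive CHECK (Quantity > 0)
       DEFERRABLE INITIALLY IMMEDIATE) ;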

To append blank records or perform other operations that may violate DEFERRABLE constraints, you can use a statement similar to the following:

SET CONSTRAINTS ALL DEFERRED ;

This statement puts all DEFERRABLE constraints in the DEFERRED condition. It does not affect the NOT DEFERRABLE constraints. After you've performed all operations that could violate your constraints — and the table reaches a state that doesn’t violate them — you can reapply them. The statement that reapplies your constraints looks like this:

SET CONSTRAINTS ALL IMMEDIATE ;

If you made a mistake and any of your constraints are still being violated, you find out as soon as this statement takes effect.

If you do not explicitly set your DEFERRED constraints to IMMEDIATE, SQL does it for you when you attempt to COMMIT your transaction. If a violation is still present at that time, the transaction does not COMMIT; instead, SQL gives you an error message.

SQL's handling of constraints protects you from entering invalid data (or an invalid absence of data, which is just as important), at the same time giving you the flexibility to violate constraints temporarily while a transaction is still active.

Consider a payroll example to see why being able to defer the application of constraints is important.

Assume that an EMPLOYEE table has columns EmpNo, EmpName, DeptNo, and Salary. EMPLOYEE.DeptNo is a foreign key that references the DEPT table. Assume also that the DEPT table has columns DeptNo and DeptName. DeptNo is the primary key.

In addition, you want to have a table like DEPT that also contains a Payroll column which (in turn) holds the sum of the Salary values for employees in each department.

Assuming you are using a DBMS that supports this SQL standard functionality, you can create the equivalent of this table with the following view:

CREATE VIEW DEPT2 AS
   SELECT D.*, SUM(E.Salary) AS Payroll
   FROM DEPT D, EMPLOYEE E
   WHERE D.DeptNo = E.DeptNo
   GROUP BY D.DeptNo ;

You can also define this same view as follows:

CREATE VIEW DEPT3 AS
   SELECT D.*,
          (SELECT SUM(E.Salary)
             FROM EMPLOYEE E
             WHERE D.DeptNo = E.DeptNo) AS Payroll
   FROM DEPT D ;

But suppose that, for efficiency, you don't want to calculate the SUM every time you reference DEPT3.Payroll. Instead, you want to store an actual Payroll column in the DEPT table. You will then update that column every time you change a Salary.

To make sure that the Payroll column stays accurate, you can include a CONSTRAINT in the table definition:

CREATE TABLE DEPT
   (DeptNo   CHAR(5),
    DeptName CHAR(20),
    Payroll  DECIMAL(15,2),
    CHECK (Payroll = (SELECT SUM(Salary)
                      FROM EMPLOYEE E
                      WHERE E.DeptNo = DEPT.DeptNo))
       DEFERRABLE INITIALLY IMMEDIATE) ;  -- DEFERRABLE so the constraint can be deferred later in this example

Now, suppose you want to increase the Salary of employee 123 by 100. You can do it with the following update:

UPDATE EMPLOYEE
   SET Salary = Salary + 100
   WHERE EmpNo = '123' ;

With this approach, you must remember to do the following as well:

UPDATE DEPT D
   SET Payroll = Payroll + 100
   WHERE D.DeptNo = (SELECT E.DeptNo
                     FROM EMPLOYEE E
                     WHERE E.EmpNo = '123') ;

(You use the subquery to reference the DeptNo of employee 123.)

But there's a problem: Constraints are checked after each statement. In principle, all constraints are checked. In practice, implementations check only the constraints that reference the values modified by the statement.

After the first preceding UPDATE statement, the implementation checks all constraints that reference any values that the statement modifies. This includes the constraint defined in the DEPT table, because that constraint references the Salary column of the EMPLOYEE table and the UPDATE statement is modifying that column. After the first UPDATE statement, that constraint is violated. You assume that before you execute the UPDATE statement the database is correct, and each Payroll value in the DEPT table equals the sum of the Salary values in the corresponding rows of the EMPLOYEE table. When the first UPDATE statement increases a Salary value, this equality is no longer true. The second UPDATE statement corrects this — and again leaves the database values in a state for which the constraint is True. Between the two updates, the constraint is False.

The SET CONSTRAINTS DEFERRED statement lets you temporarily disable or suspend all constraints, or only specified constraints. The constraints are deferred until either you execute a SET CONSTRAINTS IMMEDIATE statement or you execute a COMMIT or ROLLBACK statement. So you surround the previous two UPDATE statements with SET CONSTRAINTS statements. The code looks like this:

SET CONSTRAINTS ALL DEFERRED ;

UPDATE EMPLOYEE
   SET Salary = Salary + 100
   WHERE EmpNo = '123' ;

UPDATE DEPT D
   SET Payroll = Payroll + 100
   WHERE D.DeptNo = (SELECT E.DeptNo
                     FROM EMPLOYEE E
                     WHERE E.EmpNo = '123') ;

SET CONSTRAINTS ALL IMMEDIATE ;

This procedure defers all constraints. If you insert new rows into DEPT, the primary keys won't be checked; you’ve removed protection that you may want to keep. Instead, you should specify the constraints that you want to defer. To do this, name the constraints when you create them:

CREATE TABLE DEPT
   (DeptNo   CHAR(5),
    DeptName CHAR(20),
    Payroll  DECIMAL(15,2),
    CONSTRAINT PayEqSumsal
       CHECK (Payroll = (SELECT SUM(Salary)
                         FROM EMPLOYEE E
                         WHERE E.DeptNo = DEPT.DeptNo))
       DEFERRABLE INITIALLY IMMEDIATE) ;

With constraint names in place, you can then reference your constraints individually:

SET CONSTRAINTS PayEqSumsal DEFERRED ;

UPDATE EMPLOYEE
   SET Salary = Salary + 100
   WHERE EmpNo = '123' ;

UPDATE DEPT D
   SET Payroll = Payroll + 100
   WHERE D.DeptNo = (SELECT E.DeptNo
                     FROM EMPLOYEE E
                     WHERE E.EmpNo = '123') ;

SET CONSTRAINTS PayEqSumsal IMMEDIATE ;

Without a constraint name in the CREATE statement, SQL generates one implicitly. That implicit name is in the schema information (catalog) tables. But specifying the names explicitly is more straightforward.

Now suppose that you mistakenly specified an increment value of 1000 in the second UPDATE statement. This value is allowed in the UPDATE statement because the constraint has been deferred. But when you execute SET CONSTRAINTS … IMMEDIATE, the specified constraints are checked. If they fail, SET CONSTRAINTS raises an exception. If, instead of a SET CONSTRAINTS … IMMEDIATE statement, you execute COMMIT and the constraints are found to be False, COMMIT instead performs a ROLLBACK.

Remember Bottom line: You can defer constraints only within a transaction. When the transaction terminates with a ROLLBACK or a COMMIT, the constraints are reenabled and checked. The SQL capability of deferring constraints is meant to be used within a transaction; used properly, it never makes data that violates a constraint visible to other transactions.

Avoiding SQL Injection Attacks

It’s hard enough to keep your data safe from platform instability, equipment failure, and concurrent access. What if someone is deliberately trying to steal or corrupt your data, or harm you in some other way? That can cause far more serious problems. There are many ways that a malicious actor could attack a computer system, but the one most connected to database applications is the SQL injection attack.

The name is very descriptive. In an SQL injection attack, a malicious actor attempts to inject malicious code into a database application. Such code could transfer control of the database to the attacker. At that point the attacker could surreptitiously alter the data to the detriment of the owner or the users, or she could just delete entire tables.

A weak point of any application is any solicitation of input from the user, including asking for login credentials. As an application developer, when you code text boxes for the user to enter a username and a password into, you expect the user to enter a username and a password. A hacker, on the other hand, will enter something that you don’t expect, something that will cause the application to react in a way that tells the attacker something that she did not already know. That knowledge enables her to penetrate a little farther. When she attains system administrator privileges, the game is over and your data is at her mercy.

SQL injection attacks take advantage of dynamic SQL in an application’s code. There are two kinds of SQL: static SQL and dynamic SQL. Static SQL is hard-coded into an application program and becomes fixed at compile time. It cannot be easily hacked. Dynamic SQL, on the other hand, is assembled and executed at runtime. An SQL injection attack makes use of this by tacking some extra code onto a legal data entry. This extra code gets incorporated into the dynamic SQL statement that was designed to accept and act upon the input that was supposed to be entered into that text box. The attacking code could cause sensitive information to be revealed to the hacker, or it could even destroy the database.
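
For example, suppose a login routine builds its query by pasting the contents of the username and password text boxes directly into the statement text. If the attacker types x' OR '1'='1 into the password box, the statement that actually runs could end up looking like this (the USERS table and its columns are invented for illustration):

SELECT * FROM USERS
   WHERE Username = 'alice'
     AND Password = 'x' OR '1'='1' ;

Because OR '1'='1' makes the WHERE clause true for every row, the query no longer tests the password at all, and the attacker is let in without knowing it.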

There are defenses against SQL injection attacks. Primarily, these involve carefully validating any user input before incorporating it into a dynamic SQL statement. I cover SQL injection in detail, as well as other threats to a database, in my SQL All-In-One For Dummies (Wiley Publishing, Inc.).
