CHAPTER 6
Locking and Latching

One of the key challenges in developing multiuser, database-driven applications is to maximize concurrent access and, at the same time, ensure that each user is able to read and modify the data in a consistent fashion. The locking mechanisms that allow this to happen are key features of any database, and Oracle excels in providing them. However, Oracle's implementation of these features is specific to Oracle—just as SQL Server's implementation is to SQL Server—and it is up to you, the application developer, to ensure that when your application performs data manipulation, it uses these mechanisms correctly. If you fail to do so, your application will behave in an unexpected way, and inevitably the integrity of your data will be compromised (as was demonstrated in Chapter 1).

In this chapter, we'll take a detailed look at how Oracle locks both data (e.g., rows in tables) and shared data structures (such as those found in the SGA). We'll investigate the granularity to which Oracle locks data and what that means to you, the developer. When appropriate, I'll contrast Oracle's locking scheme with other popular implementations, mostly to dispel the myth that row-level locking adds overhead; it adds overhead only if the implementation adds overhead. In the next chapter, we'll continue this discussion and investigate Oracle's multi-versioning techniques and how locking strategies interact with them.

What Are Locks?

Locks are mechanisms used to regulate concurrent access to a shared resource. Note how I used the term "shared resource" and not "database row." It is true that Oracle locks table data at the row level, but it also uses locks at many other levels to provide concurrent access to various resources. For example, while a stored procedure is executing, the procedure itself is locked in a mode that allows others to execute it, but it will not permit another user to alter it in any way. Locks are used in the database to permit concurrent access to these shared resources, while at the same time providing data integrity and consistency.

In a single-user database, locks are not necessary. There is, by definition, only one user modifying the information. However, when multiple users are accessing and modifying data or data structures, it is crucial to have a mechanism in place to prevent concurrent modification of the same piece of information. This is what locking is all about.

It is very important to understand that there are as many ways to implement locking in a database as there are databases. Just because you have experience with the locking model of one particular relational database management system (RDBMS) does not mean you know everything about locking. For example, before I got heavily involved with Oracle, I used other databases such as Sybase, Microsoft SQL Server, and Informix. All three of these databases provide locking mechanisms for concurrency control, but there are deep and fundamental differences in the way locking is implemented in each one. To demonstrate this, I'll outline my progression from a SQL Server developer to an Informix user and finally an Oracle developer. This happened many years ago, and the SQL Server fans out there will tell me "But we have row-level locking now!" It is true: SQL Server may now use row-level locking, but the way it is implemented is totally different from the way it is done in Oracle. It is a comparison between apples and oranges, and that is the key point.

As a SQL Server programmer, I would hardly ever consider the possibility of multiple users inserting data into a table concurrently. It was something that just didn't often happen in that database. At that time, SQL Server provided only page-level locking and, since all the data tended to be inserted into the last page of nonclustered tables, concurrent inserts by two users were simply not going to happen.


Note A SQL Server clustered table (a table that has a clustered index) is in some regard similar to, but very different from, an Oracle cluster. SQL Server used to support only page (block) level locking, and if every row inserted was to go to the "end" of the table, you would never have had concurrent inserts or concurrent transactions in that database. The clustered index in SQL Server was used to cause rows to be inserted all over the table, in sorted order by the cluster key, and as such was used to improve concurrency in that database.


Exactly the same issue affected concurrent updates (since an UPDATE was really a DELETE followed by an INSERT). Perhaps this is why SQL Server, by default, commits or rolls back immediately after execution of each and every statement, compromising transactional integrity in an attempt to gain higher concurrency.

So in most cases, with page-level locking, multiple users could not simultaneously modify the same table. Compounding this was the fact that while a table modification was in progress, many queries were also effectively blocked against that table. If I tried to query a table and needed a page that was locked by an update, I waited (and waited and waited). The locking mechanism was so poor that providing support for transactions that took more than a second was deadly—the entire database would appear to "freeze" if you did. I learned a lot of bad habits here. I learned that transactions were "bad" and that you ought to commit rapidly and never hold locks on data. Concurrency came at the expense of consistency. You either wanted to get it right or get it fast. I came to believe that you couldn't have both.

When I moved on to Informix, things were better, but not by much. As long as I remembered to create a table with row-level locking enabled, then I could actually have two people simultaneously insert data into that table. Unfortunately, this concurrency came at a high price. Row-level locks in the Informix implementation were expensive, both in terms of time and memory. It took time to acquire and "unacquire" or release them, and each lock consumed real memory. Also, the total number of locks available to the system had to be computed prior to starting the database. If you exceeded that number, then you were just out of luck. Consequently, most tables were created with page-level locking anyway, and, as with SQL Server, both row and page-level locks would stop a query in its tracks. As a result, I found that once again I would want to commit as fast as I could. The bad habits I picked up using SQL Server were simply reinforced and, furthermore, I learned to treat a lock as a very scarce resource—something to be coveted. I learned that you should manually escalate locks from row level to table level to try to avoid acquiring too many of them and bringing the system down, and bring it down I did—many times.

When I started using Oracle, I didn't really bother reading the manuals to find out how locking worked in this particular database. After all, I had been using databases for quite a while and was considered something of an expert in this field (in addition to Sybase, SQL Server, and Informix, I had used Ingres, DB2, Gupta SQLBase, and a variety of other databases). I had fallen into the trap of believing that I knew how things should work, so I thought of course they would work in that way. I was wrong in a big way.

It was during a benchmark that I discovered just how wrong I was. In the early days of these databases (around 1992/1993), it was common for the vendors to "benchmark" for really large procurements to see who could do the work the fastest, the easiest, with the most features.

The benchmark was between Informix, Sybase, SQL Server, and Oracle. Oracle was first. Their technical people came on-site, read through the benchmark specs, and started setting it up. The first thing I noticed was that the technicians from Oracle were going to use a database table to record their timings, even though we were going to have many dozens of connections doing work, each of which would frequently need to insert and update data in this log table. Not only that, but they were going to read the log table during the benchmark as well! Being a nice guy, I pulled one of the Oracle technicians aside to ask him if they were crazy—why would they purposely introduce another point of contention into the system? Wouldn't the benchmark processes all tend to serialize around their operations on this single table? Would they jam the benchmark by trying to read from this table as others were heavily modifying it? Why would they want to introduce all of these extra locks that they would need to manage? I had dozens of "Why would you even consider that?"-type questions. The technical folks from Oracle thought I was a little daft at that point. That is, until I pulled up a window into SQL Server or Informix, and showed them the effects of two people inserting into a table, or someone trying to query a table with others inserting rows (the query returns zero rows per second). The differences between the way Oracle does it and the way almost every other database does it are phenomenal—they are night and day.

Needless to say, neither the Informix nor the SQL Server technicians were too keen on the database log table approach during their attempts. They preferred to record their timings to flat files in the operating system. The Oracle people left with a better understanding of exactly how to compete against SQL Server and Informix: just ask the audience "How many rows per second does your current database return when data is locked?" and take it from there.

The moral to this story is twofold. First, all databases are fundamentally different. Second, when designing an application for a new database platform, you must make no assumptions about how that database works. You must approach each new database as if you had never used a database before. Things you would do in one database are either not necessary or simply won't work in another database.

In Oracle you will learn that

  • Transactions are what databases are all about. They are a "good thing."
  • You should defer committing until the correct moment. You should not do it quickly to avoid stressing the system, as it does not stress the system to have long or large transactions. The rule is commit when you must, and not before. Your transactions should only be as small or as large as your business logic dictates.
  • You should hold locks on data as long as you need to. They are tools for you to use, not things to be avoided. Locks are not a scarce resource. Conversely, you should hold locks on data only as long as you need to. Locks may not be scarce, but they can prevent other sessions from modifying information.
  • There is no overhead involved with row-level locking in Oracle—none. Whether you have 1 row lock or 1,000,000 row locks, the number of "resources" dedicated to locking this information will be the same. Sure, you'll do a lot more work modifying 1,000,000 rows rather than 1 row, but the number of resources needed to lock 1,000,000 rows is the same as for 1 row; it is a fixed constant.
  • You should never escalate a lock (e.g., use a table lock instead of row locks) because it would be "better for the system." In Oracle, it won't be better for the system—it will save no resources. There are times to use table locks, such as in a batch process, when you know you will update the entire table and you do not want other sessions to lock rows on you. But you are not using a table lock to make it easier for the system by avoiding having to allocate row locks.
  • Concurrency and consistency can be achieved simultaneously. You can get the data quickly and accurately, every time. Readers of data are not blocked by writers of data. Writers of data are not blocked by readers of data. This is one of the fundamental differences between Oracle and most other relational databases.
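
To see that last point for yourself, here is a minimal sketch of what you could run in two concurrent SQL*Plus sessions against the EMP table used throughout this chapter; the uncommitted update in the first session does not block the query in the second:

-- Session 1: lock a row by updating it, but do not commit yet
update emp set sal = sal * 1.1 where empno = 7934;

-- Session 2: this query returns immediately with the original SAL value;
-- the reader neither blocks on, nor is blocked by, the writer
select empno, ename, sal from emp where empno = 7934;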

As we cover the remaining components in this chapter and the next, I'll reinforce these points.

Locking Issues

Before we discuss the various types of locks that Oracle uses, it is useful to look at some locking issues, many of which arise from badly designed applications that do not make correct use (or make no use) of the database's locking mechanisms.

Lost Updates

A lost update is a classic database problem. Actually, it is a problem in all multiuser computer environments. Simply put, a lost update happens when the following events occur, in the order presented here:

  1. A transaction in Session1 retrieves (queries) a row of data into local memory and displays it to an end user, User1.
  2. Another transaction in Session2 retrieves that same row, but displays the data to a different end user, User2.
  3. User1, using the application, modifies that row and has the application update the database and commit. Session1's transaction is now complete.
  4. User2 modifies that row also, and has the application update the database and commit. Session2's transaction is now complete.

This process is referred to as a "lost update" because all of the changes made in step 3 will be lost. Consider, for example, an employee update screen that allows a user to change an address, work number, and so on. The application itself is very simple: a small search screen to generate a list of employees and then the ability to drill down into the details of each employee. This should be a piece of cake. So, we write the application with no locking on our part, just simple SELECT and UPDATE commands.

Then an end user (User1) navigates to the details screen, changes an address on the screen, clicks Save, and receives confirmation that the update was successful. Fine, except that when User1 checks the record the next day to send out a tax form, the old address is still listed. How could that have happened? Unfortunately, it can happen all too easily. In this case, another end user (User2) queried the same record just after User1 did—after User1 read the data, but before User1 modified it. Then after User2 queried the data, User1 performed her update, received confirmation, and even requeried to see the change for herself. However, User2 then updated the work telephone number field and clicked Save, blissfully unaware of the fact that he just overwrote User1's changes to the address field with the old data! The reason this can happen in this case is that the application developer wrote the program such that when one particular field is updated, all fields for that record are "refreshed" (simply because it's easier to update all the columns instead of figuring out exactly which columns changed and only updating those).

Notice that for this to happen, User1 and User2 didn't even need to be working on the record at the exact same time. They simply needed to be working on the record at about the same time.
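
In SQL terms, the sequence of events might look like the following sketch (the EMPLOYEE table and its columns are illustrative, not from the examples in this chapter):

-- both sessions read the row first (the query takes no locks)
select address, work_phone from employee where empno = 1234;

-- Session 1 (User1) saves her address change; the application writes
-- back every column it read, not just the one she changed
update employee
   set address = :new_address, work_phone = :old_phone
 where empno = 1234;
commit;

-- Session 2 (User2) saves his phone change a moment later, writing back
-- the stale address he read before Session 1 committed
update employee
   set address = :old_address, work_phone = :new_phone
 where empno = 1234;
commit;   -- User1's address change has now been silently overwritten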

I've seen this database issue crop up time and again when GUI programmers with little or no database training are given the task of writing a database application. They get a working knowledge of SELECT, INSERT, UPDATE, and DELETE and then set about writing the application. When the resulting application behaves in the manner just described, it completely destroys a user's confidence in it, especially since it seems so random, so sporadic, and it is totally irreproducible in a controlled environment (leading the developer to believe it must be a user error).

Many tools, such as Oracle Forms and HTML DB, transparently protect you from this behavior by ensuring the record is unchanged from the time you query it and locked before you make any changes to it, but many others (such as a handwritten Visual Basic or Java program) do not. What the tools that protect you do behind the scenes, or what the developers must do themselves, is use one of two types of locking strategies: pessimistic or optimistic.

Pessimistic Locking

This locking method would be put into action the instant before a user modifies a value on the screen. For example, a row lock would be placed as soon as the user indicates his intention to perform an update on a specific row that he has selected and has visible on the screen (by clicking a button on the screen, say).

Pessimistic locking is useful only in a stateful or connected environment—that is, one where your application has a continual connection to the database and you are the only one using that connection for at least the life of your transaction. This was the prevalent way of doing things in the early to mid 1990s with client/server applications. Every application would get a direct connection to the database to be used solely by that application instance. This method of connecting, in a stateful fashion, has become less common (though it is not extinct), especially with the advent of application servers in the mid to late 1990s.

Assuming you are using a stateful connection, you might have an application that queries the data without locking anything:

scott@ORA10G> select empno, ename, sal from emp where deptno = 10;

     EMPNO ENAME             SAL
---------- ---------- ----------
      7782 CLARK            2450
      7839 KING             5000
      7934 MILLER           1300

Eventually, the user picks a row she would like to update. Let's say in this case, she chooses to update the MILLER row. Our application will at that point in time (before the user makes any changes on the screen but after the row has been out of the database for a while) bind the values the user selected so we can query the database and make sure the data hasn't been changed yet. In SQL*Plus, to simulate the bind calls the application would make, we can issue the following:

scott@ORA10G> variable empno number
scott@ORA10G> variable ename varchar2(20)
scott@ORA10G> variable sal number
scott@ORA10G> exec :empno := 7934; :ename := 'MILLER'; :sal := 1300;
PL/SQL procedure successfully completed.

Now in addition to simply querying the values and verifying that they have not been changed, we are going to lock the row using FOR UPDATE NOWAIT. The application will execute the following query:

scott@ORA10G> select empno, ename, sal
  2  from emp
  3  where empno = :empno
  4    and ename = :ename
  5    and sal = :sal
  6    for update nowait
  7  /
     EMPNO ENAME             SAL
---------- ---------- ----------
      7934 MILLER           1300

The application supplies values for the bind variables from the data on the screen (in this case 7934, MILLER, and 1300) and requeries this same row from the database, this time locking the row against updates by other sessions; hence, this approach is called pessimistic locking. We lock the row before we attempt to update because we doubt—we are pessimistic—that the row will remain unchanged otherwise.

Since all tables should have a primary key (the preceding SELECT will retrieve at most one record since it includes the primary key, EMPNO) and primary keys should be immutable (we should never update them), we'll get one of three outcomes from this statement:

  • If the underlying data has not changed, we will get our MILLER row back, and this row will be locked from updates (but not reads) by others.
  • If another user is in the process of modifying that row, we will get an ORA-00054 resource busy error. We must wait for the other user to finish with it (one way to handle each of these outcomes is shown in the sketch following this list).
  • If, in the time between selecting the data and indicating our intention to update, someone has already changed the row, then we will get zero rows back. That implies the data on our screen is stale. To avoid the lost update scenario previously described, the application needs to requery and lock the data before allowing the end user to modify it. With pessimistic locking in place, when User2 attempts to update the telephone field the application would now recognize that the address field had been changed and would requery the data. Thus, User2 would not overwrite User1's change with the old data in that field.
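
The following PL/SQL sketch shows one way an application might distinguish these three outcomes (the error codes and messages are illustrative):

declare
    resource_busy  exception;
    pragma exception_init( resource_busy, -54 );
    l_rec          emp%rowtype;
begin
    select empno, ename, sal
      into l_rec.empno, l_rec.ename, l_rec.sal
      from emp
     where empno = :empno
       and ename = :ename
       and sal   = :sal
       for update nowait;
    -- outcome 1: we got the row and it is now locked; safe to update it
exception
    when resource_busy then
        -- outcome 2 (ORA-00054): another session has the row locked
        raise_application_error( -20000, 'Row is locked by another session' );
    when no_data_found then
        -- outcome 3: the row changed since we read it; requery before retrying
        raise_application_error( -20001, 'Row was modified; please requery' );
end;
/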

Once we have locked the row successfully, the application will bind the new values, and issue the update and commit the changes:

scott@ORA10G> update emp
  2 set ename = :ename, sal = :sal
  3 where empno = :empno;
1 row updated.

scott@ORA10G> commit;
Commit complete.

We have now very safely changed that row. It is not possible for us to overwrite someone else's changes, as we verified the data did not change between when we initially read it out and when we locked it.

Optimistic Locking

The second method, referred to as optimistic locking, defers all locking up to the point right before the update is performed. In other words, we will modify the information on the screen without a lock being acquired. We are optimistic that the data will not be changed by some other user; hence, we wait until the very last moment to find out if we are right.

This locking method works in all environments, but it does increase the probability that a user performing an update will "lose." That is, when that user goes to update her row, she finds that the data has been modified, and she has to start over.

One popular implementation of optimistic locking is to keep the old and new values in the application, and upon updating the data use an update like this:

Update table
   Set column1 = :new_column1, column2 = :new_column2, ....
 Where primary_key = :primary_key
   And column1 = :old_column1
   And column2 = :old_column2
    ...

Here, we are optimistic that the data doesn't get changed. In this case, if our update updates one row, we got lucky; the data didn't change between the time we read it and the time we got around to submitting the update. If we update zero rows, we lose; someone else changed the data and now we must figure out what we want to do to continue in the application. Should we make the end user rekey the transaction after querying the new values for the row (potentially causing the user frustration, as there is a chance the row will have changed yet again)? Should we try to merge the values of the two updates by performing update conflict-resolution based on business rules (lots of code)?

The preceding UPDATE will, in fact, avoid a lost update, but it does stand a chance of being blocked—hanging while it waits for an UPDATE of that row by another session to complete. If all of your applications use optimistic locking, then using a straight UPDATE is generally OK since rows are locked for a very short duration as updates are applied and committed. However, if some of your applications use pessimistic locking, which will hold locks on rows for relatively long periods of time, then you will want to consider using a SELECT FOR UPDATE NOWAIT instead, to verify the row was not changed and lock it immediately prior to the UPDATE to avoid getting blocked by another session.
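
Applied to the EMP table from the pessimistic locking example, that generic statement might look like the following sketch, where the :old_* bind variables hold the column values as they were originally read:

update emp
   set ename = :new_ename, sal = :new_sal
 where empno = :empno           -- the immutable primary key
   and ename = :old_ename       -- before image of each updatable column
   and sal   = :old_sal;

-- if this reports zero rows updated (SQL%ROWCOUNT in PL/SQL, or the row
-- count your driver returns), the row changed; requery and let the user retry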

There are many methods of implementing optimistic concurrency control. We've discussed one whereby the application will store all of the before images of the row in the application itself. In the following sections, we'll explore three others, namely

  • Using a special column that is maintained by a database trigger or application code to tell us the "version" of the record
  • Using a checksum or hash that was computed using the original data
  • Using the new Oracle 10g feature ORA_ROWSCN

Optimistic Locking Using a Version Column

This is a simple implementation that involves adding a single column to each database table you wish to protect from lost updates. This column is generally either a NUMBER or DATE/TIMESTAMP column. It is typically maintained via a row trigger on the table, which is responsible for incrementing the NUMBER column or updating the DATE/TIMESTAMP column every time a row is modified.
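
For illustration, such a trigger might look like the following sketch (assuming a table with a TIMESTAMP WITH TIME ZONE column named LAST_MOD, like the DEPT copy created below; note that later in this section I'll recommend letting the application or a stored procedure do this instead):

create or replace trigger dept_last_mod
before insert or update on dept
for each row
begin
    -- stamp every inserted or modified row with the current system time
    :new.last_mod := systimestamp;
end;
/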

The application you want to implement optimistic concurrency control on would need only to save the value of this additional column, not all of the before images of the other columns. The application would only need to verify that the value of this column in the database at the point when the update is requested matches the value that was initially read out. If these values are the same, then the row has not been updated.

Let's look at an implementation of optimistic locking using a copy of the SCOTT.DEPT table. We could use the following Data Definition Language (DDL) to create the table:

ops$tkyte@ORA10G> create table dept
  2  ( deptno     number(2),
  3    dname      varchar2(14),
  4    loc        varchar2(13),
  5    last_mod   timestamp with time zone
  6               default systimestamp
  7               not null,
  8    constraint dept_pk primary key(deptno)
  9  )
 10  /
Table created.

Then we INSERT a copy of the DEPT data into this table:

ops$tkyte@ORA10G> insert into dept( deptno, dname, loc )
  2  select deptno, dname, loc
  3    from scott.dept;
4 rows created.

ops$tkyte@ORA10G> commit;
Commit complete.

That code re-creates the DEPT table, but with an additional LAST_MOD column that uses the TIMESTAMP WITH TIME ZONE datatype (available in Oracle9i and above). We have defined this column to be NOT NULL so that it must be populated, and its default value is the current system time.

This TIMESTAMP datatype has the highest precision in Oracle, typically going down to the microsecond (millionth of a second). For an application that involves user think time, this level of precision on the TIMESTAMP is more than sufficient, as it is highly unlikely that the process of the database retrieving a row and a human looking at it, modifying it, and issuing the update back to the database could take place within a fraction of a second. The odds of two people reading and modifying the same row in the same fraction of a second are very small indeed.

Next, we need a way of maintaining this value. We have two choices: either the application can maintain the LAST_MOD column by setting its value to SYSTIMESTAMP when it updates a record, or a trigger/stored procedure can maintain it. Having the application maintain LAST_MOD is definitely more performant than a trigger-based approach, since a trigger will add additional processing on the part of Oracle to the modification. However, this does mean that you are relying on all of the applications to maintain LAST_MOD consistently in all of the places that they modify this table. So, if each application is responsible for maintaining this field, it needs to consistently verify that the LAST_MOD column was not changed and set the LAST_MOD column to the current SYSTIMESTAMP. For example, if an application queries the row where DEPTNO=10

ops$tkyte@ORA10G> variable deptno  number
ops$tkyte@ORA10G> variable dname    varchar2(14)
ops$tkyte@ORA10G> variable loc      varchar2(13)
ops$tkyte@ORA10G> variable last_mod varchar2(50)

ops$tkyte@ORA10G> begin
  2      :deptno := 10;
  3      select dname, loc, last_mod
  4        into :dname,:loc,:last_mod
  5        from dept
  6       where deptno = :deptno;
  7  end;
  8  /
PL/SQL procedure successfully completed.

which we can see is currently

ops$tkyte@ORA10G> select :deptno dno, :dname dname, :loc loc, :last_mod lm
  2    from dual;

       DNO DNAME      LOC      LM
---------- ---------- -------- -----------------------------------
        10 ACCOUNTING NEW YORK 25-APR-05 10.54.00.493380 AM -04:00

it would use this next update statement to modify the information. The last line does the very important check to make sure the timestamp has not changed, using the built-in function TO_TIMESTAMP_TZ (TZ is short for time zone) to convert the string we saved when we first queried the row back into the proper datatype. Additionally, line 3 of the update sets the LAST_MOD column to the current time if the row is found and updated:

ops$tkyte@ORA10G> update dept
  2     set dname = initcap(:dname),
  3         last_mod = systimestamp
  4   where deptno = :deptno
  5     and last_mod = to_timestamp_tz(:last_mod);
1 row updated.

As you can see, one row was updated—the row of interest. We updated the row by primary key (DEPTNO) and verified that the LAST_MOD column had not been modified by any other session between the time we read it first and the time we did the update. If we were to try to update that same record again, using the same logic, but without retrieving the new LAST_MOD value, we would observe the following:

ops$tkyte@ORA10G> update dept
  2     set dname = upper(:dname),
  3         last_mod = systimestamp
  4   where deptno = :deptno
  5     and last_mod = to_timestamp_tz(:last_mod);
0 rows updated.

Notice how 0 rows updated is reported this time because the predicate on LAST_MOD was not satisfied. While DEPTNO 10 still exists, the value at the moment we wish to update no longer matches the timestamp value at the moment we queried the row. So, the application knows, based on the fact that no rows were modified, that the data has been changed in the database—and it must now figure out what it wants to do about that.

You would not rely on each application to maintain this field for a number of reasons. For one, it adds code to an application, and it is code that must be repeated and correctly implemented anywhere this table is modified. In a large application, that could be in many places. Furthermore, every application developed in the future must conform to these rules. There are many chances to "miss" a spot in the application code and not have this field properly used. So, if the application code itself is not to be made responsible for maintaining this LAST_MOD field, then I believe that the application should not be made responsible for checking this LAST_MOD field either (if it can do the check, it can certainly do the update!). So, in this case, I suggest encapsulating the update logic in a stored procedure and not allowing the application to update the table directly at all. If it cannot be trusted to maintain the value in this field, then it cannot be trusted to check it properly either. So, the stored procedure would take as inputs the bind variables we used in the previous updates and do exactly the same update. Upon detecting that zero rows were updated, the stored procedure could raise an exception back to the client to let the client know the update had, in effect, failed.
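
A sketch of such a stored procedure might look like this (the error code and message are, of course, up to you):

create or replace procedure update_dept
( p_deptno   in dept.deptno%type,
  p_dname    in dept.dname%type,
  p_last_mod in dept.last_mod%type )
as
begin
    update dept
       set dname    = p_dname,
           last_mod = systimestamp
     where deptno   = p_deptno
       and last_mod = p_last_mod;

    if ( sql%rowcount = 0 )
    then
        -- zero rows updated means the row was changed (or deleted)
        -- by some other session since we read it
        raise_application_error
        ( -20001, 'Row was modified by another session; please requery' );
    end if;
end;
/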

An alternate implementation uses a trigger to maintain this LAST_MOD field, but for something as simple as this, my recommendation is to avoid the trigger and let the DML take care of it. Triggers introduce a measurable amount of overhead, and in this case they would be unnecessary.

Optimistic Locking Using a Checksum

This is very similar to the previous version column method, but it uses the base data itself to compute a "virtual" version column. I'll quote the Oracle 10g PL/SQL Supplied Packages Guide (before showing how to use one of the supplied packages!) to help explain the goal and concepts behind a checksum or hash function:

A one-way hash function takes a variable-length input string, the data, and converts it to a fixed-length (generally smaller) output string called a hash value. The hash value serves as a unique identifier (like a fingerprint) of the input data. You can use the hash value to verify whether data has been changed or not.

Note that a one-way hash function is a hash function that works in one direction. It is easy to compute a hash value from the input data, but it is hard to generate data that hashes to a particular value.

The hash value or checksum is not truly unique. It is just designed such that the probability of a collision is sufficiently small—that is, the probability of two random strings having the same checksum or hash is so small as to be negligible.

We can use these hashes or checksums in the same way that we used our version column. We simply compare the hash or checksum value we obtain when we read data out of the database with what we obtain before modifying the data. If someone modified the row's values after we read it out, but before we updated it, then the hash or checksum will almost certainly be different.

There are many ways to compute a hash or checksum. I'll list three of these and demonstrate one in this section. All of these methods are based on supplied database packages:

  • OWA_OPT_LOCK.CHECKSUM: This method is available on Oracle8i version 8.1.5 and up. There is a function that, given a string, returns a 16-bit checksum, and another function that, given a ROWID, will compute the 16-bit checksum of that row and lock it at the same time. The probability of a collision is 1 in 65,536 (the highest chance of a false positive of the three methods).
  • DBMS_OBFUSCATION_TOOLKIT.MD5: This method is available in Oracle8i version 8.1.7 and up. It computes a 128-bit message digest. The odds of a collision are about 1 in 3.4028E+38 (very small).
  • DBMS_CRYPTO.HASH: This method is available in Oracle 10g Release 1 and up. It is capable of computing Secure Hash Algorithm 1 (SHA-1) or MD4/MD5 message digests. It is recommended that you use the SHA-1 algorithm.

Note Hash and checksum functions are available in many programming languages, so there may be others at your disposal outside the database.


The following example shows how you might use the DBMS_CRYPTO built-in package in Oracle 10g to compute these hashes/checksums. The technique would also be applicable for the other two listed packages; the logic would not be very much different, but the APIs you call would be.

Here we query out the information for department 10 to be displayed in some application. Immediately after querying the information, we compute the hash using the DBMS_CRYPTO package. This is the "version" information that we retain in our application:

ops$tkyte@ORA10G> begin
  2      for x in ( select deptno, dname, loc
  3                   from dept
  4                  where deptno = 10 )
  5      loop
  6          dbms_output.put_line( 'Dname: ' || x.dname );
  7          dbms_output.put_line( 'Loc:  ' || x.loc );
  8          dbms_output.put_line( 'Hash:  ' ||
  9            dbms_crypto.hash
 10            ( utl_raw.cast_to_raw(x.deptno||'/'||x.dname||'/'||x.loc),
 11              dbms_crypto.hash_sh1 ) );
 12      end loop;
 13  end;
 14  /
Dname: ACCOUNTING
Loc:   NEW YORK
Hash:  C44F7052661CE945D385D5C3F911E70FA99407A6

PL/SQL procedure successfully completed.

As you can see, the hash is just a big string of hex digits. The return value from DBMS_CRYPTO is a RAW variable, and when displayed it will be implicitly converted into HEX for us. This is the value we would want to use before updating. To update that row, we would retrieve and lock the row in the database as it exists right now, and then compute the hash value of that retrieved row and compare this new hash value with the hash value we computed when we read the data out of the database. The logic for doing so could look like the following (in real life, we would use bind variables for the literal hash values, of course):

ops$tkyte@ORA10G> begin
  2      for x in ( select deptno, dname, loc
  3                   from dept
  4                  where deptno = 10
  5                    for update nowait )
  6      loop
  7          if ( hextoraw( 'C44F7052661CE945D385D5C3F911E70FA99407A6' ) <>
  8               dbms_crypto.hash
  9               ( utl_raw.cast_to_raw(x.deptno||'/'||x.dname||'/'||x.loc),
  10                dbms_crypto.hash_sh1 ) )
  11          then
  12              raise_application_error(-20001, 'Row was modified' );
  13          end if;
  14      end loop;
  15      update dept
  16         set dname = lower(dname)
  17       where deptno = 10;
  18      commit;
  19  end;
  20  /

PL/SQL procedure successfully completed.

Upon requerying that data and computing the hash again after the update, we can see that the hash value is very different. If someone had modified the row before we did, our hash values would not have compared:

ops$tkyte@ORA10G> begin
  2      for x in ( select deptno, dname, loc
  3                   from dept
  4                  where deptno = 10 )
  5      loop
  6          dbms_output.put_line( 'Dname: ' || x.dname );
  7          dbms_output.put_line( 'Loc:  ' || x.loc );
  8          dbms_output.put_line( 'Hash:  ' ||
  9            dbms_crypto.hash
 10            ( utl_raw.cast_to_raw(x.deptno||'/'||x.dname||'/'||x.loc),
 11              dbms_crypto.hash_sh1 ) );
 12      end loop;
 13  end;
 14  /
Dname: accounting
Loc:   NEW YORK
Hash:  F3DE485922D44DF598C2CEBC34C27DD2216FB90F

PL/SQL procedure successfully completed.

This example showed how to implement optimistic locking with a hash or checksum. You should bear in mind that computing a hash or checksum is a somewhat CPU-intensive operation—it is computationally expensive. On a system where CPU is a scarce resource, you must take this fact into consideration. However, this approach is much more "network-friendly," as the transmission of a relatively small hash instead of a before and after image of the row (to compare column by column) over the network will consume much less of that resource. Our last example will use a new Oracle 10g function, ORA_ROWSCN, which is small in size like a hash, but not CPU intensive to compute.

Optimistic Locking Using ORA_ROWSCN

Starting with Oracle 10g Release 1, you have the option to use the built-in ORA_ROWSCN function. It works very much like the version column technique described previously, but it can be performed automatically by Oracle—you need no extra column in the table and no extra update/maintenance code to update this value.

ORA_ROWSCN is based on the internal Oracle system clock, the SCN (system change number). Every time you commit in Oracle, the SCN advances (other things can advance it as well, but it only advances; it never goes back). The concept is identical to the previous methods in that you retrieve ORA_ROWSCN upon data retrieval, and you verify it has not changed when you go to update. The only reason I give it more than passing mention is that unless you created the table to support the maintenance of ORA_ROWSCN at the row level, it is maintained at the block level. That is, by default many rows on a single block will share the same ORA_ROWSCN value. If you update a row on a block with 50 other rows, then they will all have their ORA_ROWSCN advanced as well. This would almost certainly lead to many false positives, whereby you believe a row was modified that in fact was not. Therefore, you need to be aware of this fact and understand how to change the behavior.

To see the behavior and then change it, we'll use the small DEPT table again:

ops$tkyte@ORA10G> create table dept
  2  (deptno, dname, loc, data,
  3   constraint dept_pk primary key(deptno)
  4  )
  5  as
  6  select deptno, dname, loc, rpad('*',3500,'*')
  7    from scott.dept;
Table created.

Now we can inspect what block each row is on (it is safe to assume in this case they are in the same file, so a common block number indicates they are on the same block). I was using an 8KB block size with a row width of about 3,550 bytes, so I am expecting there to be two rows per block for this example:

ops$tkyte@ORA10G> select deptno, dname,
  2         dbms_rowid.rowid_block_number(rowid) blockno,
  3             ora_rowscn
  4    from dept;
    DEPTNO DNAME             BLOCKNO ORA_ROWSCN
---------- -------------- ---------- ----------
        10 ACCOUNTING          20972   34676029
        20 RESEARCH            20972   34676029
        30 SALES               20973   34676029
        40 OPERATIONS          20973   34676029

And sure enough, that is what we observe in this case. So, let's update the row where DEPTNO = 10, which is on block 20972:

ops$tkyte@ORA10G> update dept
  2     set dname = lower(dname)
  3   where deptno = 10;
1 row updated.

ops$tkyte@ORA10G> commit;
Commit complete.

What we'll observe next shows the consequences of ORA_ROWSCN being tracked at the block level. We modified and committed the changes to a single row, but the ORA_ROWSCN values of both of the rows on block 20972 have been advanced:

ops$tkyte@ORA10G> select deptno, dname,
  2         dbms_rowid.rowid_block_number(rowid) blockno,
  3             ora_rowscn
  4    from dept;

    DEPTNO DNAME             BLOCKNO ORA_ROWSCN
---------- -------------- ---------- ----------
        10 accounting          20972   34676046
        20 RESEARCH            20972   34676046
        30 SALES               20973   34676029
        40 OPERATIONS          20973   34676029

It would appear to anyone else who had read the DEPTNO=20 row that it had been modified, even though it was not. The rows on block 20973 are "safe"—we didn't modify them, so they did not advance. However, if we were to update either of them, both would advance. So the question becomes how to modify this default behavior. Well, unfortunately, we have to re-create the segment with ROWDEPENDENCIES enabled.

Row dependency tracking was added to the database with Oracle9i in support of advanced replication to allow for better parallel propagation of changes. Prior to Oracle 10g, its only use was in a replication environment, but starting in Oracle 10g we can use it to implement an effective optimistic locking technique with ORA_ROWSCN. It will add 6 bytes of overhead to each row (so it is not a space saver compared to the do-it-yourself version column) and that is, in fact, why it requires a table re-create and not just a simple ALTER TABLE: the physical block structure must be changed to accommodate this feature.

Let's rebuild our table to enable ROWDEPENDENCIES. We could use the online rebuild capabilities in DBMS_REDEFINITION (another supplied package) to do this, but for something so small, we'll just start over:

ops$tkyte@ORA10G> drop table dept;
Table dropped.

ops$tkyte@ORA10G> create table dept
  2  (deptno, dname, loc, data,
  3   constraint dept_pk primary key(deptno)
  4  )
  5  ROWDEPENDENCIES
  6  as
  7  select deptno, dname, loc, rpad('*',3500,'*')
  8    from scott.dept;
Table created.

ops$tkyte@ORA10G> select deptno, dname,
  2         dbms_rowid.rowid_block_number(rowid) blockno,
  3             ora_rowscn
  4    from dept;

    DEPTNO DNAME             BLOCKNO ORA_ROWSCN
---------- -------------- ---------- ----------
        10 ACCOUNTING          21020   34676364
        20 RESEARCH            21020   34676364
        30 SALES               21021   34676364
        40 OPERATIONS          21021   34676364

We're back where we were before: four rows on two blocks, all having the same initial ORA_ROWSCN value. Now when we update DEPTNO=10

ops$tkyte@ORA10G> update dept
  2     set dname = lower(dname)
  3   where deptno = 10;
1 row updated.

ops$tkyte@ORA10G> commit;
Commit complete.

we should observe the following upon querying the DEPT table:

ops$tkyte@ORA10G> select deptno, dname,
  2         dbms_rowid.rowid_block_number(rowid) blockno,
  3             ora_rowscn
  4    from dept;

    DEPTNO DNAME             BLOCKNO ORA_ROWSCN
---------- -------------- ---------- ----------
        10 accounting          21020   34676381
        20 RESEARCH            21020   34676364
        30 SALES               21021   34676364
        40 OPERATIONS          21021   34676364

The only modified ORA_ROWSCN at this point belongs to DEPTNO = 10, exactly what we wanted. We can now rely on ORA_ROWSCN to detect row-level changes for us.
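
With ROWDEPENDENCIES in place, the read/verify/update cycle follows the same pattern as the version column approach. ORA_ROWSCN is a pseudocolumn, so a sketch of the technique might look like this:

-- when reading, fetch ORA_ROWSCN along with the row and remember it
select dname, loc, ora_rowscn
  from dept
 where deptno = :deptno;

-- when updating, verify ORA_ROWSCN has not changed; zero rows updated
-- means another transaction has committed a change to this row since we read it
update dept
   set dname = :dname
 where deptno = :deptno
   and ora_rowscn = :rowscn;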

Optimistic or Pessimistic Locking?

So which method is best? In my experience, pessimistic locking works very well in Oracle (but perhaps not in other databases) and has many advantages over optimistic locking. However, it requires a stateful connection to the database, like a client/server connection. This is because locks are not held across connections. This single fact makes pessimistic locking unrealistic in many cases today. In the past, with client/server applications and a couple dozen or hundred users, it would have been my first and only choice. Today, however, optimistic concurrency control is what I would recommend for most applications. Having a connection for the entire duration of a transaction is just too high a price to pay.

Of the methods available, which do I use? I tend to use the version column approach with a timestamp column. It gives me the extra information "when was this row last updated" in a long-term sense. So it adds value in that way. It is less computationally expensive than a hash or checksum, and it doesn't run into the issues potentially encountered with a hash or checksum when processing LONG, LONG RAW, CLOB, BLOB, and other very large columns.

If I had to add optimistic concurrency controls to a table that was still being used with a pessimistic locking scheme (e.g., the table was accessed in both client/server applications and over the Web), I would opt for the ORA_ROWSCN approach. The reason is that the existing legacy application might not appreciate a new column appearing, or even if we took the additional step of hiding the extra column, we might not appreciate the overhead of the necessary trigger to maintain it. The ORA_ROWSCN technique would be nonintrusive and lightweight in that respect (well, after we get over the table re-creation, that is).

The hashing/checksum approach is very database independent, especially if we compute the hashes or checksums outside of the database. However, by performing the computations in the middle tier rather than the database, we will incur higher resource usage penalties, in terms of CPU usage and network transfers.

Blocking

Blocking occurs when one session holds a lock on a resource that another session is requesting. As a result, the requesting session will be blocked—it will "hang" until the holding session gives up the locked resource. In almost every case, blocking is avoidable. In fact, if you do find that your session is blocked in an interactive application, then you have probably been suffering from the lost update bug as well, perhaps without realizing it. That is, your application logic is flawed and that is the cause of the blocking.

The five common DML statements that will block in the database are INSERT, UPDATE, DELETE, MERGE, and SELECT FOR UPDATE. The solution to a blocked SELECT FOR UPDATE is trivial: simply add the NOWAIT clause and it will no longer block. Instead, your application will report back to the end user that the row is already locked. The interesting cases are the remaining four DML statements. We'll look at each of them and see why they should not block and how to correct the situation if they do.

Blocked Inserts

There are few times when an INSERT will block. The most common scenario is when you have a table with a primary key or unique constraint placed on it and two sessions attempt to insert a row with the same value. One of the sessions will block until the other session either commits (in which case the blocked session will receive an error about a duplicate value) or rolls back (in which case the blocked session succeeds). Another case involves tables linked together via referential integrity constraints. An insert into a child table may become blocked if the parent row it depends on is being created or deleted.

Blocked INSERTs typically happen with applications that allow the end user to generate the primary key/unique column value. This situation is most easily avoided by using a sequence to generate the primary key/unique column value. Sequences were designed to be a highly concurrent method of generating unique keys in a multiuser environment. In the event that you cannot use a sequence, you can use the following technique, which avoids the issue by using manual locks implemented via the built-in DBMS_LOCK package.
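
For example, a minimal sketch of sequence-based key generation (the table T and sequence T_SEQ are hypothetical names):

create sequence t_seq;

create table t ( id int primary key, data varchar2(30) );

-- each session gets its own unique value; no two sessions can collide
-- on the primary key, so the insert never blocks on a duplicate
insert into t ( id, data ) values ( t_seq.nextval, 'some data' );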


Note The following example demonstrates how to prevent a session from blocking on an insert due to a primary key or unique constraint. It should be stressed that the "fix" demonstrated here should be considered a short-term solution while the application architecture itself is inspected. This approach adds obvious overhead and should not be implemented lightly. A well-designed application would not encounter this issue. This should be considered a last resort and is definitely not something you want to do to every table in your application "just in case."


With inserts, there's no existing row to select and lock; there's no way to prevent others from inserting a row with the same value, thus blocking our session and causing us to wait indefinitely. Here is where DBMS_LOCK comes into play. To demonstrate this technique, we will create a table with a primary key and a trigger that will prevent two (or more) sessions from inserting the same values simultaneously. The trigger will use DBMS_UTILITY.GET_HASH_VALUE to hash the primary key into some number between 0 and 1,073,741,823 (the range of lock ID numbers permitted for our use by Oracle). In this example, I've chosen a hash table of size 1,024, meaning we will hash our primary keys into one of 1,024 different lock IDs. Then we will use DBMS_LOCK.REQUEST to allocate an exclusive lock based on that ID. Only one session at a time will be able to do that, so if someone else tries to insert a record into our table with the same primary key, then that person's lock request will fail (and the error resource busy will be raised):


Note To successfully compile this trigger, execute permission on DBMS_LOCK must be granted directly to your schema. The privilege to execute DBMS_LOCK may not come from a role.


scott@ORA10G> create table demo ( x int primary key );
Table created.

scott@ORA10G> create or replace trigger demo_bifer
  2  before insert on demo
  3  for each row
  4  declare
  5      l_lock_id  number;
  6      resource_busy  exception;
  7      pragma exception_init( resource_busy, -54 );
  8  begin
  9      l_lock_id :=
 10         dbms_utility.get_hash_value( to_char( :new.x ), 0, 1024 );
 11      if ( dbms_lock.request
 12               (  id                => l_lock_id,
 13                  lockmode          => dbms_lock.x_mode,
 14                  timeout           => 0,
 15                  release_on_commit => TRUE ) <> 0 )
 16      then
 17          raise resource_busy;
 18      end if;
 19  end;
 20  /
Trigger created.

Now, if in two separate sessions we execute the following:

scott@ORA10G> insert into demo values ( 1 );
1 row created.

it will succeed in the first session but will immediately raise the following error in the second session:

scott@ORA10G> insert into demo values ( 1 );
insert into demo values ( 1 )
            *
ERROR at line 1:
ORA-00054: resource busy and acquire with NOWAIT specified
ORA-06512: at "SCOTT.DEMO_BIFER", line 14
ORA-04088: error during execution of trigger 'SCOTT.DEMO_BIFER'

The concept here is to take the supplied primary key value of the table protected by the trigger and put it in a character string. We can then use DBMS_UTILITY.GET_HASH_VALUE to come up with a "mostly unique" hash value for the string. As long as we use a hash table smaller than 1,073,741,823, we can "lock" that value exclusively using DBMS_LOCK.

After hashing, we take that value and use DBMS_LOCK to request that lock ID to be exclusively locked with a timeout of ZERO (this returns immediately if someone else has locked that value). If we time out or fail for any reason, we raise ORA-54 Resource Busy. Otherwise, we do nothing—it is OK to insert, we won't block.

Of course, if the primary key of your table is an INTEGER and you don't expect the key to go over 1 billion, you can skip the hash and just use the number as the lock ID.

You'll need to play with the size of the hash table (1,024 in this example) to avoid artificial resource busy messages due to different strings hashing to the same number. The size of the hash table will be application (data)-specific, and it will be influenced by the number of concurrent insertions as well. Lastly, bear in mind that although Oracle has unlimited row-level locking, it has a finite number of enqueue locks. If you insert lots of rows this way without committing in your session, then you might find that you create so many enqueue locks that you exhaust the system of enqueue resources (you exceed the maximum value set in the ENQUEUE_RESOURCES system parameter), as each row will create another enqueue (a lock). If this does happen, you'll need to raise the value of the ENQUEUE_RESOURCES parameter. You might also add a flag to the trigger to allow people to turn the check on and off. If I were going to insert hundreds or thousands of records, for example, I might not want this check enabled.
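
One way to sketch such a flag is with a package global that the trigger consults; a session about to perform a bulk load could simply set it to FALSE first (the package name is illustrative):

create or replace package demo_lock_ctl
as
    -- set to FALSE in your session before a bulk load to bypass the check
    g_check_enabled boolean := true;
end;
/

create or replace trigger demo_bifer
before insert on demo
for each row
declare
    l_lock_id      number;
    resource_busy  exception;
    pragma exception_init( resource_busy, -54 );
begin
    if ( demo_lock_ctl.g_check_enabled )
    then
        l_lock_id :=
           dbms_utility.get_hash_value( to_char( :new.x ), 0, 1024 );
        if ( dbms_lock.request
                 ( id                => l_lock_id,
                   lockmode          => dbms_lock.x_mode,
                   timeout           => 0,
                   release_on_commit => TRUE ) <> 0 )
        then
            raise resource_busy;
        end if;
    end if;
end;
/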

Blocked Merges, Updates, and Deletes

In an interactive application—one where you query some data out of the database, allow an end user to manipulate it, and then "put it back" into the database—a blocked UPDATE or DELETE indicates that you probably have a lost update problem in your code (I would call it a bug in your code if you do). You are attempting to UPDATE a row that someone else is already updating (in other words, that someone else already has locked). You can avoid the blocking issue by using the SELECT FOR UPDATE NOWAIT query to

  • Verify the data has not changed since you queried it out (preventing lost updates).
  • Lock the row (preventing the UPDATE or DELETE from blocking).

As discussed earlier, you can do this regardless of the locking approach you take. Both pessimistic and optimistic locking may employ the SELECT FOR UPDATE NOWAIT query to verify the row has not changed. Pessimistic locking would use that statement the instant the user indicated her intention to modify the data. Optimistic locking would use that statement immediately prior to updating the data in the database. Not only will this resolve the blocking issue in your application, but also it will correct the data integrity issue.

Since a MERGE is simply an INSERT and UPDATE (and in 10g with the enhanced MERGE syntax, it's a DELETE as well), you would use both techniques simultaneously.
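
For reference, a single 10g MERGE statement can therefore perform all three operations; here is a sketch (DEPT_COPY is a hypothetical target table):

merge into dept_copy d
using dept s
   on ( d.deptno = s.deptno )
 when matched then
      update set d.dname = s.dname, d.loc = s.loc
      delete where ( s.loc is null )      -- 10g: optional delete clause
 when not matched then
      insert ( deptno, dname, loc )
      values ( s.deptno, s.dname, s.loc );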

Deadlocks

Deadlocks occur when you have two sessions, each of which is holding a resource that the other wants. For example, if I have two tables, A and B, in my database, and each has a single row in it, I can demonstrate a deadlock easily. All I need to do is open two sessions (e.g., two SQL*Plus sessions). In session A, I update table A. In session B, I update table B. Now, if I attempt to update table A in session B, I will become blocked. Session A has this row locked already. This is not a deadlock; it is just blocking. I have not yet deadlocked because there is a chance that session A will commit or roll back, and session B will simply continue at that point.
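
In SQL, this sequence (together with the final step discussed next) might look like the following sketch:

-- setup: two tables, one row each
create table a ( x int );
create table b ( x int );
insert into a values ( 1 );
insert into b values ( 1 );
commit;

-- session A:
update a set x = x+1;      -- locks the row in A

-- session B:
update b set x = x+1;      -- locks the row in B
update a set x = x+1;      -- blocks, waiting on session A (not a deadlock yet)

-- session A:
update b set x = x+1;      -- now a deadlock; one session receives ORA-00060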

If I go back to session A and then try to update table B, I will cause a deadlock. One of the two sessions will be chosen as a "victim" and will have its statement rolled back. For example, the attempt by session B to update table A may be rolled back, with an error such as the following:

update a set x = x+1
       *
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource

Session A's attempt to update table B will remain blocked—Oracle will not roll back the entire transaction. Only one of the statements that contributed to the deadlock is rolled back. Session B still has the row in table B locked, and session A is patiently waiting for the row to become available. After receiving the deadlock message, session B must decide whether to commit the outstanding work on table B, roll it back, or continue down an alternate path and commit later. As soon as this session does commit or roll back, the other blocked session will continue on as if nothing happened.

Oracle considers deadlocks to be so rare, so unusual, that it creates a trace file on the server each and every time one does occur. The contents of the trace file will look something like this:

*** 2005-04-25 15:53:01.455
*** ACTION NAME:() 2005-04-25 15:53:01.455
*** MODULE NAME:(SQL*Plus) 2005-04-25 15:53:01.455
*** SERVICE NAME:(SYS$USERS) 2005-04-25 15:53:01.455
*** SESSION ID:(145.208) 2005-04-25 15:53:01.455
DEADLOCK DETECTED
Current SQL statement for this session:
update a set x = 1
The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:...

Obviously, Oracle considers these application deadlocks a self-induced error on the part of the application and, for the most part, Oracle is correct. Unlike in many other RDBMSs, deadlocks are so rare in Oracle they can be considered almost nonexistent. Typically, you must come up with artificial conditions to get one.

The number one cause of deadlocks in the Oracle database, in my experience, is unindexed foreign keys (the number two cause is bitmap indexes on tables subject to concurrent updates, which we'll cover in Chapter 11). Oracle will place a full table lock on a child table after modification of the parent table in two cases:

  • If you update the parent table's primary key (a very rare occurrence if you follow the rule of relational databases stating that primary keys should be immutable), the child table will be locked in the absence of an index on the foreign key.
  • If you delete a parent table row, the entire child table will be locked (in the absence of an index on the foreign key) as well.

These full table locks are a short-term occurrence in Oracle9i and above, meaning they are held only for the duration of the DML operation, not for the entire transaction. Even so, they can and do cause significant locking issues. As a demonstration of the first point, if we have a pair of tables set up as follows:

ops$tkyte@ORA10G> create table p ( x int primary key );
Table created.

ops$tkyte@ORA10G> create table c ( x references p );
Table created.

ops$tkyte@ORA10G> insert into p values ( 1 );
1 row created.

ops$tkyte@ORA10G> insert into p values ( 2 );
1 row created.

ops$tkyte@ORA10G> commit;
Commit complete.

and then we execute the following:

ops$tkyte@ORA10G> insert into c values ( 2 );
1 row created.

nothing untoward happens yet. But if we go into another session and attempt to delete the first parent record

ops$tkyte@ORA10G> delete from p where x = 1;

we'll find that the session is immediately blocked. It is attempting to gain a full table lock on table C before it does the delete. Now no other session can initiate a DELETE, INSERT, or UPDATE of any rows in C (the sessions that had already started may continue, but no new sessions may start to modify C).

This blocking would happen with an update of the primary key value as well. Because updating a primary key is a huge no-no in a relational database, this is generally not an issue with updates. Where I have seen this updating of the primary key become a serious issue is when developers use tools that generate SQL for them, and those tools update every single column, regardless of whether the end user actually modified that column or not. For example, say that we use Oracle Forms and create a default layout on any table. Oracle Forms by default will generate an update that modifies every single column in the table we choose to display. If we build a default layout on the DEPT table and include all three fields, Oracle Forms will execute the following command whenever we modify any of the columns of the DEPT table:

update dept set deptno=:1,dname=:2,loc=:3 where rowid=:4

In this case, if the EMP table has a foreign key to DEPT and there is no index on the DEPTNO column in the EMP table, then the entire EMP table will be locked during an update to DEPT. This is something to watch out for carefully if you are using any tools that generate SQL for you. Even though the value of the primary key does not change, the child table EMP will be locked after the execution of the preceding SQL statement. In the case of Oracle Forms, the solution is to set that table's UPDATE CHANGED COLUMNS ONLY property to YES. Oracle Forms will generate an UPDATE statement that includes only the changed columns (not the primary key).

Problems arising from deletion of a row in a parent table are far more common. As I demonstrated, if I delete a row in table P, then the child table, C, will become locked during the DML operation, thus preventing other updates against C from taking place for the duration of that statement (assuming no one else was modifying C, of course; if someone was, the delete will wait). This is where the blocking and deadlock issues come in. By locking the entire table C, I have seriously decreased the concurrency in my database to the point where no one will be able to modify anything in C. In addition, I have increased the probability of a deadlock, since I now "own" lots of data until I commit. The probability that some other session will become blocked on C is now much higher; any session that tries to modify C will get blocked. Therefore, I'll start seeing lots of sessions that hold some preexisting locks on other resources getting blocked in the database. If any of these blocked sessions are, in fact, locking a resource that my session also needs, we will have a deadlock. The deadlock in this case is caused by my session preventing access to many more resources (in this case, all of the rows in a single table) than it ever needed. When someone complains of deadlocks in the database, I have them run a script that finds unindexed foreign keys, and 99 percent of the time we locate an offending table. By simply indexing that foreign key, the deadlocks—and lots of other contention issues—go away. The following example demonstrates the use of this script to locate the unindexed foreign key in table C:

ops$tkyte@ORA10G> column columns format a30 word_wrapped
ops$tkyte@ORA10G> column tablename format a15 word_wrapped
ops$tkyte@ORA10G> column constraint_name format a15 word_wrapped

ops$tkyte@ORA10G> select table_name, constraint_name,
  2       cname1 || nvl2(cname2,','||cname2,null) ||
  3       nvl2(cname3,','||cname3,null) || nvl2(cname4,','||cname4,null) ||
  4       nvl2(cname5,','||cname5,null) || nvl2(cname6,','||cname6,null) ||
  5       nvl2(cname7,','||cname7,null) || nvl2(cname8,','||cname8,null)
  6              columns
  7    from ( select b.table_name,
  8                  b.constraint_name,
  9                  max(decode( position, 1, column_name, null )) cname1,
 10                  max(decode( position, 2, column_name, null )) cname2,
 11                  max(decode( position, 3, column_name, null )) cname3,
 12                  max(decode( position, 4, column_name, null )) cname4,
 13                  max(decode( position, 5, column_name, null )) cname5,
 14                  max(decode( position, 6, column_name, null )) cname6,
 15                  max(decode( position, 7, column_name, null )) cname7,
 16                  max(decode( position, 8, column_name, null )) cname8,
 17                  count(*) col_cnt
 18             from (select substr(table_name,1,30) table_name,
 19                          substr(constraint_name,1,30) constraint_name,
 20                          substr(column_name,1,30) column_name,
 21                          position
 22                     from user_cons_columns ) a,
 23                  user_constraints b
 24            where a.constraint_name = b.constraint_name
 25              and b.constraint_type = 'R'
 26            group by b.table_name, b.constraint_name
 27         ) cons
 28   where col_cnt > ALL
 29           ( select count(*)
 30               from user_ind_columns i
 31              where i.table_name = cons.table_name
 32                and i.column_name in (cname1, cname2, cname3, cname4,
 33                                      cname5, cname6, cname7, cname8 )
 34                and i.column_position <= cons.col_cnt
 35              group by i.index_name
 36           )
 37  /

TABLE_NAME                     CONSTRAINT_NAME COLUMNS
------------------------------ --------------- ------------------------------
C                              SYS_C009485     X

This script works on foreign key constraints that have up to eight columns in them (if you have more than that, you probably want to rethink your design). It starts by building an inline view named CONS in the previous query. This inline view transposes the appropriate column names in the constraint from rows into columns, with the result being a row per constraint and up to eight columns that have the names of the columns in the constraint. Additionally, there is a column, COL_CNT, which contains the number of columns in the foreign key constraint itself. For each row returned from the inline view, we execute a correlated subquery that checks all of the indexes on the table currently being processed. It counts the columns in that index that match columns in the foreign key constraint and then groups them by index name. So, it generates a set of numbers, each of which is a count of matching columns in some index on that table. If the original COL_CNT is greater than all of these numbers, then there is no index on that table that supports that constraint. If COL_CNT is less than or equal to at least one of these numbers, then there is at least one index that supports that constraint. Note the use of the NVL2 function, which we used to "glue" the list of column names into a comma-separated list. This function takes three arguments: A, B, and C. If argument A is not null, then it returns argument B; otherwise, it returns argument C. This query assumes that the owner of the constraint is the owner of the table and index as well. If another user indexed the table or the table is in another schema (both rare events), it will not work correctly.

So, this script shows us that table C has a foreign key on the column X, but no index. By indexing X, we can remove this locking issue altogether. In addition to this table lock, an unindexed foreign key can also be problematic in the following cases:

  • When you have an ON DELETE CASCADE and have not indexed the child table: For example, EMP is a child of DEPT. DELETE FROM DEPT WHERE DEPTNO = 10 should CASCADE to EMP. If DEPTNO in EMP is not indexed, you will get a full table scan of EMP for each row deleted from the DEPT table. This full scan is probably undesirable, and if you delete many rows from the parent table, the child table will be scanned once for each parent row deleted.
  • When you query from the parent to the child: Consider the EMP/DEPT example again. It is very common to query the EMP table in the context of a DEPTNO. If you frequently run the following query (say, to generate a report), you'll find that not having the index in place will slow down the queries:

    select *
      from dept, emp
     where emp.deptno = dept.deptno
       and dept.deptno = :X;

So, when do you not need to index a foreign key? The answer is, in general, when the following conditions are met:

  • You do not delete from the parent table.
  • You do not update the parent table's unique/primary key value (watch for unintended updates to the primary key by tools!).
  • You do not join from the parent to the child (like DEPT to EMP).

If you satisfy all three conditions, feel free to skip the index—it is not needed. If you do any of the above (delete from the parent table, update its unique/primary key, or join from parent to child), be aware of the consequences. This is the one rare instance when Oracle tends to "overlock" data.
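
To close the loop on the earlier demonstration with tables P and C, the fix is a single index on the foreign key column (the index name here is my own):

create index c_x_fk_idx on c(x);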

Lock Escalation

When lock escalation occurs, the system is decreasing the granularity of your locks. An example would be the database system turning your 100 row-level locks against a table into a single table-level lock. You are now using "one lock to lock everything" and, typically, you are also locking a whole lot more data than you were before. Lock escalation is used frequently in databases that consider a lock to be a scarce resource and overhead to be avoided.


Note Oracle will never escalate a lock. Never.


Oracle never escalates locks, but it does practice lock conversion or lock promotion—terms that are often confused with lock escalation.


Note The terms "lock conversion" and "lock promotion" are synonymous. Oracle typically refers to the process as "lock conversion."


Oracle will take a lock at the lowest level possible (i.e., the least restrictive lock possible) and convert that lock to a more restrictive level if necessary. For example, if you select a row from a table with the FOR UPDATE clause, two locks will be created. One lock is placed on the row(s) you selected (and this will be an exclusive lock; no one else can lock that specific row in exclusive mode). The other lock, a ROW SHARE TABLE lock, is placed on the table itself. This will prevent other sessions from placing an exclusive lock on the table and thus prevent them from altering the structure of the table, for example. Another session can modify any other row in this table without conflict. Any command that can execute successfully in the presence of that locked row will be permitted.
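
Here is a minimal sketch of that in action, reusing the DEPT table from earlier (the second query simply lists the current session's locks; expect one TX row lock and one TM lock on DEPT):

select * from dept where deptno = 10 for update;

-- in the same session, list our locks
select type, id1, lmode, request
  from v$lock
 where sid = (select sid from v$mystat where rownum = 1);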

Lock escalation is not a database "feature." It is not a desired attribute. The fact that a database supports lock escalation implies there is some inherent overhead in its locking mechanism and significant work is performed to manage hundreds of locks. In Oracle, the overhead to have 1 lock or 1 million locks is the same: none.

Lock Types

The three general classes of locks in Oracle are as follows:

  • DML locks: DML stands for Data Manipulation Language. In general this means SELECT, INSERT, UPDATE, MERGE, and DELETE statements. DML locks are the mechanism that allows for concurrent data modifications. DML locks will be, for example, locks on a specific row of data or a lock at the table level that locks every row in the table.
  • DDL locks: DDL stands for Data Definition Language (CREATE and ALTER statements, and so on). DDL locks protect the definition of the structure of objects.
  • Internal locks and latches: Oracle uses these locks to protect its internal data structures. For example, when Oracle parses a query and generates an optimized query plan, it will "latch" the library cache to put that plan in there for other sessions to use. A latch is a lightweight, low-level serialization device employed by Oracle, similar in function to a lock. Do not confuse or be misled by the term "lightweight"—latches are a common cause of contention in the database, as you will see. They are lightweight in their implementation, but not in their effect.

We will now take a more detailed look at the specific types of locks within each of these general classes and the implications of their use. There are more lock types than I can cover here. The ones I cover in the sections that follow are the most common and are held for a long duration. The other types of lock are generally held for very short periods of time.

DML Locks

DML locks are used to ensure that only one person at a time modifies a row and that no one can drop a table upon which you are working. Oracle will place these locks for you, more or less transparently, as you do work.

TX (Transaction) Locks

A TX lock is acquired when a transaction initiates its first change, and it is held until the transaction performs a COMMIT or ROLLBACK. It is used as a queuing mechanism so that other sessions can wait for the transaction to complete. Each and every row you modify or SELECT FOR UPDATE in a transaction will "point" to an associated TX lock for that transaction. While this sounds expensive, it is not. To understand why this is, you need a conceptual understanding of where locks "live" and how they are managed. In Oracle, locks are stored as an attribute of the data (see Chapter 10 for an overview of the Oracle block format). Oracle does not have a traditional lock manager that keeps a long list of every row that is locked in the system. Many other databases do it that way because, for them, locks are a scarce resource, the use of which needs to be monitored. The more locks are in use, the more these systems have to manage, so it is a concern in these systems if "too many" locks are being used.

In a database with a traditional memory-based lock manager, the process of locking a row would resemble the following:

  1. Find the address of the row you want to lock.
  2. Get in line at the lock manager (which must be serialized, as it is a common in-memory structure).
  3. Lock the list.
  4. Search through the list to see if anyone else has locked this row.
  5. Create a new entry in the list to establish the fact that you have locked the row.
  6. Unlock the list.

    Now that you have the row locked, you can modify it. Later, as you commit your changes, you must continue the procedure as follows:

  7. Get in line again.
  8. Lock the list of locks.
  9. Search through the list and release all of your locks.
  10. Unlock the list.

As you can see, the more locks acquired, the more time spent on this operation, both before and after modifying the data. Oracle does not do it that way. Oracle's process looks like this:

  1. Find the address of the row you want to lock.
  2. Go to the row.
  3. Lock the row (waiting for the transaction that has it locked to end if it is already locked, unless you are using the NOWAIT option).

That's it. Since the lock is stored as an attribute of the data, Oracle does not need a traditional lock manager. The transaction will simply go to the data and lock it (if it is not locked already). The interesting thing is that the data may appear locked when you get to it, even if it is not. When you lock rows of data in Oracle, the row points to a copy of the transaction ID that is stored with the block containing the data, and when the lock is released that transaction ID is left behind. This transaction ID is unique to your transaction and represents the rollback segment number, slot, and sequence number. You leave it on the block that contains your row to tell other sessions that you "own" this data (not all of the data on the block—just the one row you are modifying). When another session comes along, it sees the lock ID and, using the fact that it represents a transaction, it can quickly see if the transaction holding the lock is still active. If the transaction is no longer active, the session is allowed access to the data. If the transaction is still active, that session will ask to be notified as soon as the lock is released. Hence, you have a queuing mechanism: the session requesting the lock will be queued up waiting for that transaction to complete, and then it will get the data.

Here is a small example showing how this happens, using three V$ tables:

  • V$TRANSACTION, which contains an entry for every active transaction.
  • V$SESSION, which shows us the sessions logged in.
  • V$LOCK, which contains an entry for all enqueue locks being held as well as for sessions that are waiting on locks. You will not see a row in this view for each row locked in this table by a session. As stated earlier, that master list of locks at the row level doesn't exist. If a session has one row in the EMP table locked, there will be one row in this view for that session indicating that fact. If a session has millions of rows in the EMP table locked, there will still be just one row in this view. This view shows what enqueue locks individual sessions have.

First, let's start a transaction (if you don't have a copy of the DEPT table, simply make one using a CREATE TABLE AS SELECT):

ops$tkyte@ORA10G> update dept set deptno = deptno+10;
4 rows updated.

Now, let's look at the state of the system at this point. This example assumes a single-user system; otherwise, you may see many rows in V$TRANSACTION. Even in a single-user system, do not be surprised to see more than one row in V$TRANSACTION, as many of the background Oracle processes may be performing a transaction as well.

ops$tkyte@ORA10G> select username,
  2         v$lock.sid,
  3         trunc(id1/power(2,16)) rbs,
  4         bitand(id1,to_number('ffff','xxxx'))+0 slot,
  5         id2 seq,
  6         lmode,
  7         request
  8  from v$lock, v$session
  9  where v$lock.type = 'TX'
 10    and v$lock.sid = v$session.sid
 11    and v$session.username = USER;

USERNAME   SID RBS SLOT    SEQ LMODE REQUEST
--------- ---- --- ---- ------ ----- -------
OPS$TKYTE  145   4   12  16582     6       0

ops$tkyte@ORA10G> select XIDUSN, XIDSLOT, XIDSQN
  2    from v$transaction;

    XIDUSN    XIDSLOT     XIDSQN
---------- ---------- ----------
         4         12      16582

The interesting points to note here are as follows:

  • The LMODE is 6 in the V$LOCK table and the request is 0. If you refer to the definition of the V$LOCK table in the Oracle Server Reference manual, you will find that LMODE=6 is an exclusive lock. A value of 0 in the request means you are not making a request; you have the lock.
  • There is only one row in this table. This V$LOCK table is more of a queuing table than a lock table. Many people expect there would be four rows in V$LOCK since we have four rows locked. What you must remember, however, is that Oracle does not store a master list of every row locked anywhere. To find out if a row is locked, we must go to that row.
  • I took the ID1 and ID2 columns and performed some manipulation on them. Oracle needed to save three 16-bit numbers, but had only two columns in which to do it. So, the first column, ID1, holds two of these numbers. By dividing by 2^16 with trunc(id1/power(2,16)) rbs, and by masking out the high bits with bitand(id1,to_number('ffff','xxxx'))+0 slot, I am able to get back the two numbers that are hiding in that one number.
  • The RBS, SLOT, and SEQ values match the V$TRANSACTION information. This is my transaction ID.

Now we'll start another session using the same username, update some rows in EMP, and then try to update DEPT:

ops$tkyte@ORA10G> update emp set ename = upper(ename);
14 rows updated.

ops$tkyte@ORA10G> update dept set deptno = deptno-10;

We're now blocked in this session. If we run the V$ queries again, we see the following:

ops$tkyte@ORA10G> select username,
  2         v$lock.sid,
  3         trunc(id1/power(2,16)) rbs,
  4         bitand(id1,to_number('ffff','xxxx'))+0 slot,
  5         id2 seq,
  6         lmode,
  7         request
  8  from v$lock, v$session
  9  where v$lock.type = 'TX'
 10    and v$lock.sid = v$session.sid
 11    and v$session.username = USER;

USERNAME   SID RBS SLOT    SEQ LMODE REQUEST
--------- ---- --- ---- ------ ----- -------
OPS$TKYTE  144   4   12  16582     0       6
OPS$TKYTE  144   5   34   1759     6       0
OPS$TKYTE  145   4   12  16582     6       0
ops$tkyte@ORA10G> select XIDUSN, XIDSLOT, XIDSQN
  2    from v$transaction;

    XIDUSN    XIDSLOT     XIDSQN
---------- ---------- ----------
         5         34       1759
         4         12      16582

What we see here is that a new transaction has begun, with a transaction ID of (5,34,1759). Our new session, SID=144, has two rows in V$LOCK this time. One row represents the locks that it owns (where LMODE=6). It also has a row in there that shows a REQUEST with a value of 6. This is a request for an exclusive lock. The interesting thing to note here is that the RBS/SLOT/SEQ values of this request row are the transaction ID of the holder of the lock. The transaction with SID=145 is blocking the transaction with SID=144. We can see this more explicitly simply by doing a self-join of V$LOCK:

ops$tkyte@ORA10G> select
  2        (select username from v$session where sid=a.sid) blocker,
  3         a.sid,
  4        ' is blocking ',
  5         (select username from v$session where sid=b.sid) blockee,
  6             b.sid
  7    from v$lock a, v$lock b
  8   where a.block = 1
  9     and b.request > 0
 10     and a.id1 = b.id1
 11     and a.id2 = b.id2;

BLOCKER    SID 'ISBLOCKING'  BLOCKEE   SID
--------- ---- ------------- --------- ----
OPS$TKYTE  145 is blocking   OPS$TKYTE 144

Now, if we commit our original transaction, SID=145, and rerun our lock query, we find that the request row has gone:

ops$tkyte@ORA10G> select username,
  2         v$lock.sid,
  3         trunc(id1/power(2,16)) rbs,
  4         bitand(id1,to_number('ffff','xxxx'))+0 slot,
  5         id2 seq,
  6         lmode,
  7         request
  8  from v$lock, v$session
  9  where v$lock.type = 'TX'
 10    and v$lock.sid = v$session.sid
 11    and v$session.username = USER;
USERNAME   SID RBS SLOT    SEQ LMODE REQUEST
--------- ---- --- ---- ------ ----- -------
OPS$TKYTE  144   5   34   1759     6       0

ops$tkyte@ORA10G> select XIDUSN, XIDSLOT, XIDSQN
  2    from v$transaction;

    XIDUSN    XIDSLOT     XIDSQN
---------- ---------- ----------
         5         34       1759

The request row disappeared the instant the other session gave up its lock. That request row was the queuing mechanism. The database is able to wake up the blocked sessions the instant the transaction is completed. There are infinitely more "pretty" displays with various GUI tools, but in a pinch, having knowledge of the tables you need to look at is very useful.

However, before we can say that we have a good understanding of how row locking in Oracle works, we must look at one last topic: how the locking and transaction information is managed with the data itself. It is part of the block overhead. In Chapter 10, we'll get into the details of the block format, but suffice it to say that at the top of a database block is some leading "overhead" space in which to store a transaction table for that block. This transaction table contains an entry for each "real" transaction that has locked some data in that block. The size of this structure is controlled by two physical attribute parameters on the CREATE statement for an object:

  • INITRANS: The initial, preallocated size of this structure. This defaults to 2 for indexes and tables (regardless of what Oracle SQL Reference says, I have filed the documentation bug regarding that).
  • MAXTRANS: The maximum size to which this structure may grow. It defaults to 255 and has a minimum of 2, practically. In Oracle 10g, this setting has been deprecated, so it no longer applies. MAXTRANS is 255 regardless in that release.

Each block starts life with, by default, two transaction slots. The number of simultaneous active transactions that a block can ever have is constrained by the value of MAXTRANS and by the availability of space on the block. You may not be able to achieve 255 concurrent transactions on the block if there is not sufficient space to grow this structure.

We can artificially demonstrate how this works by creating a table with a constrained MAXTRANS. We'll need to use Oracle9i or before for this, since in Oracle 10g MAXTRANS is ignored. In Oracle 10g, even if MAXTRANS is set, Oracle will grow the transaction table, as long as there is room on the block to do so. In Oracle9i and before, once the MAXTRANS value is reached for that block, the transaction table will not grow, for example:

ops$tkyte@ORA9IR2> create table t ( x int ) maxtrans 2;
Table created.

ops$tkyte@ORA9IR2> insert into t select rownum from all_users;
24 rows created.

ops$tkyte@ORA9IR2> commit;
Commit complete.

ops$tkyte@ORA9IR2> select distinct dbms_rowid.rowid_block_number(rowid) from t;

DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID)
------------------------------------
                                  18

So, we have 24 rows and we've verified they are all on the same database block. Now, in one session we issue

ops$tkyte@ORA9IR2> update t set x = 1 where x = 1;
1 row updated.

and in another, we issue

ops$tkyte@ORA9IR2> update t set x = 2 where x = 2;
1 row updated.

Finally, in a third session, we issue

ops$tkyte@ORA9IR2> update t set x = 3 where x = 3;

Now, since those three rows are on the same database block, and we set MAXTRANS (the maximum degree of concurrency for that block) to 2, the third session will be blocked.


Note Remember, in Oracle 10g this blocking will not happen in this example; MAXTRANS is set to 255 regardless. There would have to be insufficient space on the block to grow the transaction table to see this blocking in that release.


This example demonstrates what happens when more transactions than MAXTRANS allows attempt to access the same block simultaneously. Similarly, blocking may also occur if INITRANS is set low and there is not enough space on the block to dynamically expand the transaction table. In most cases, the default of 2 for INITRANS is sufficient, as the transaction table will dynamically grow (space permitting), but in some environments you may need to increase this setting to increase concurrency and decrease waits. An example of when you might need to do this would be on a table or, even more frequently, on an index (since index blocks can hold many more rows than a typical table block) that is frequently modified. You may need to increase either PCTFREE (discussed in Chapter 10) or INITRANS to set aside sufficient space on the block ahead of time for the number of expected concurrent transactions. This is especially true if you anticipate the blocks will be nearly full to begin with, meaning there is no room for the dynamic expansion of the transaction structure on the block.
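
As a minimal sketch (the index name and values here are illustrative, not recommendations), you would set these attributes at creation time:

create index emp_deptno_idx on emp(deptno)
  initrans 10    -- preallocate ten transaction slots per block
  pctfree  20;   -- leave 20 percent of each block free at creation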

TM (DML Enqueue) Locks

TM locks are used to ensure that the structure of a table is not altered while you are modifying its contents. For example, if you have updated a table, you will acquire a TM lock on that table. This will prevent another user from executing DROP or ALTER commands on that table. If another user attempts to perform DDL on the table while you have a TM lock on it, he'll receive the following error message:

drop table dept
           *
ERROR at line 1:
ORA-00054: resource busy and acquire with NOWAIT specified

This is a confusing message at first, since there is no method to specify NOWAIT or WAIT on a DROP TABLE at all. It is just the generic message you get when you attempt to perform an operation that would be blocked, but the operation does not permit blocking. As you've seen before, it's the same message you get if you issue a SELECT FOR UPDATE NOWAIT against a locked row.

The following shows how these locks would appear in the V$LOCK table:

ops$tkyte@ORA10G> create table t1 ( x int );
Table created.

ops$tkyte@ORA10G> create table t2 ( x int );
Table created.

ops$tkyte@ORA10G> insert into t1 values ( 1 );
1 row created.

ops$tkyte@ORA10G> insert into t2 values ( 1 );
1 row created.

ops$tkyte@ORA10G> select (select username
  2                         from v$session
  3                        where sid = v$lock.sid) username,
  4         sid,
  5         id1,
  6         id2,
  7         lmode,
  8         request, block, v$lock.type
  9    from v$lock
 10   where sid = (select sid
 11                  from v$mystat
 12                 where rownum=1)
 13  /
USERNAME   SID     ID1    ID2 LMODE REQUEST BLOCK TYPE
--------- ---- ------- ------ ----- ------- ----- ----
OPS$TKYTE  161  262151  16584     6       0     0 TX
OPS$TKYTE  161   62074      0     3       0     0 TM
OPS$TKYTE  161   62073      0     3       0     0 TM

ops$tkyte@ORA10G> select object_name, object_id
  2    from user_objects
  3   where object_name in ('T1','T2')
  4  /

OBJECT_NAME  OBJECT_ID
------------ ----------
T1           62073
T2           62074

Whereas we get only one TX lock per transaction, we can get as many TM locks as there are objects we modify. Here, the interesting thing is that the ID1 column for the TM lock is the object ID of the DML-locked object, so it is easy to find the object on which the lock is being held.
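
For example, to see which objects your own session's TM locks are protecting, you might join ID1 to the object ID (a minimal sketch, assuming the objects are in your own schema):

select o.object_name, l.lmode, l.request
  from v$lock l, user_objects o
 where l.type = 'TM'
   and l.id1  = o.object_id
   and l.sid  = (select sid from v$mystat where rownum = 1);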

An interesting aside to the TM lock: the total number of TM locks allowed in the system is configurable by you (for details, see the DML_LOCKS parameter definition in the Oracle Database Reference manual). It may in fact be set to zero. This does not mean that your database becomes a read-only database (no locks), but rather that DDL is not permitted. This is useful in very specialized applications, such as RAC implementations, to reduce the amount of inter-instance coordination that would otherwise take place. You can also remove the ability to gain TM locks on an object-by-object basis using the ALTER TABLE TABLENAME DISABLE TABLE LOCK command. This is a quick way to make it "harder" to accidentally drop a table, as you will have to re-enable the table lock before dropping the table. It can also be used to detect a full table lock as a result of the unindexed foreign key we discussed previously.
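
A minimal sketch of that last technique, using the T1 table from above:

alter table t1 disable table lock;

-- DDL that needs the table lock (e.g., DROP TABLE T1) will now fail
-- until we re-enable it:
alter table t1 enable table lock;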

DDL Locks

DDL locks are automatically placed against objects during a DDL operation to protect them from changes by other sessions. For example, if I perform the DDL operation ALTER TABLE T, the table T will have an exclusive DDL lock placed against it, preventing other sessions from getting DDL locks and TM locks on this table. DDL locks are held for the duration of the DDL statement and are released immediately afterward. This is done, in effect, by always wrapping DDL statements in implicit commits (or a commit/rollback pair). For this reason, DDL always commits in Oracle. Every CREATE, ALTER, and so on statement is really executed as shown in this pseudo-code:

Begin
   Commit;
   DDL-STATEMENT
   Commit;
Exception
   When others then rollback;
End;

So, DDL will always commit, even if it is unsuccessful. DDL starts by committing—be aware of this. It commits first so that if it has to roll back, it will not roll back your transaction. If you execute DDL, it will make permanent any outstanding work you have performed, even if the DDL is not successful. If you need to execute DDL, but you do not want it to commit your existing transaction, you may use an autonomous transaction.
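
For example, here is a minimal sketch of that approach (the procedure name is my own); the DDL commits inside the autonomous transaction, leaving the caller's transaction intact:

create or replace procedure exec_ddl( p_stmt in varchar2 )
as
    pragma autonomous_transaction;
begin
    execute immediate p_stmt;  -- commits only the autonomous transaction
end;
/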

There are three types of DDL locks:

  • Exclusive DDL locks: These prevent other sessions from gaining a DDL lock or TM (DML) lock themselves. This means that you may query a table during a DDL operation, but you may not modify it in any way.
  • Share DDL locks: These protect the structure of the referenced object against modification by other sessions, but allow modifications to the data.
  • Breakable parse locks: These allow an object, such as a query plan cached in the Shared pool, to register its reliance on some other object. If you perform DDL against that object, Oracle will review the list of objects that have registered their dependence and invalidate them. Hence, these locks are "breakable"—they do not prevent the DDL from occurring.

Most DDL takes an exclusive DDL lock. If you issue a statement such as

Alter table t add new_column date;

the table T will be unavailable for modifications during the execution of that statement. The table may be queried using SELECT during this time, but most other operations will be prevented, including all DDL statements. In Oracle, some DDL operations may now take place without DDL locks. For example, I can issue the following:

create index t_idx on t(x) ONLINE;

The ONLINE keyword modifies the method by which the index is actually built. Instead of taking an exclusive DDL lock, preventing modifications of data, Oracle will only attempt to acquire a low-level (mode 2) TM lock on the table. This will effectively prevent other DDL from taking place, but it will allow DML to occur normally. Oracle accomplishes this feat by keeping a record of modifications made to the table during the DDL statement and applying these changes to the new index as it finishes the CREATE. This greatly increases the availability of data.

Other types of DDL take share DDL locks. These are taken out against dependent objects when you create stored, compiled objects, such as procedures and views. For example, if you execute

Create view MyView
as
select *
  from emp, dept
  where emp.deptno = dept.deptno;

share DDL locks will be placed against both EMP and DEPT, while the CREATE VIEW command is being processed. You can modify the contents of these tables, but you cannot modify their structure.

The last type of DDL lock is a breakable parse lock. When your session parses a statement, a parse lock is taken against every object referenced by that statement. These locks are taken in order to allow the parsed, cached statement to be invalidated (flushed) in the Shared pool if a referenced object is dropped or altered in some way.

A view that is invaluable for looking at this information is DBA_DDL_LOCKS. There is no V$ view for you to look at. The DBA_DDL_LOCKS view is built on the more mysterious X$ tables and, by default, it will not be installed in your database. You can install this and other locking views by running the catblock.sql script found in the directory [ORACLE_HOME]/rdbms/admin. This script must be executed as the user SYS in order to succeed. Once you have executed this script, you can run a query against the view. For example, in a single-user database I see the following:

ops$tkyte@ORA10G> select session_id sid, owner, name, type,
  2      mode_held held, mode_requested request
  3  from dba_ddl_locks;

 SID OWNER     NAME                  TYPE                 HELD REQUEST
---- --------- --------------------- -------------------- ---- ---------
 161 SYS       DBMS_UTILITY          Body                 Null None
 161 SYS       DBMS_UTILITY          Body                 Null None
 161 SYS       DBMS_APPLICATION_INFO Table/Procedure/Type Null None
 161 OPS$TKYTE OPS$TKYTE             18                   Null None
 161 SYS       DBMS_OUTPUT           Body                 Null None
 161 SYS       DATABASE              18                   Null None
 161 SYS       DBMS_UTILITY          Table/Procedure/Type Null None
 161 SYS       DBMS_UTILITY          Table/Procedure/Type Null None
 161 SYS       PLITBLM               Table/Procedure/Type Null None
 161 SYS       DBMS_APPLICATION_INFO Body                 Null None
 161 SYS       DBMS_OUTPUT           Table/Procedure/Type Null None

11 rows selected.

These are all the objects that my session is "locking." I have breakable parse locks on a couple of the DBMS_* packages. These are a side effect of using SQL*Plus; it calls DBMS_APPLICATION_INFO, for example. I may see more than one copy of various objects here—this is normal, and it just means I have more than one thing I'm using in the Shared pool that references these objects. It is interesting to note that in the view, the OWNER column is not the owner of the lock; rather, it is the owner of the object being locked. This is why you see many SYS rows. SYS owns these packages, but they all belong to my session.

To see a breakable parse lock in action, let's first create and run a stored procedure, P:

ops$tkyte@ORA10G> create or replace procedure p as begin null; end;
  2 /
Procedure created.

ops$tkyte@ORA10G> exec p
PL/SQL procedure successfully completed.

The procedure, P, will now show up in the DBA_DDL_LOCKS view. We have a parse lock on it:

ops$tkyte@ORA10G> select session_id sid, owner, name, type,
  2         mode_held held, mode_requested request
  3    from dba_ddl_locks
  4  /

 SID OWNER     NAME                  TYPE                 HELD REQUEST
---- --------- --------------------- -------------------- ---- ---------
 161 OPS$TKYTE P                     Table/Procedure/Type Null None
 161 SYS       DBMS_UTILITY          Body                 Null None
 161 SYS       DBMS_UTILITY          Body                 Null None
 ...
 161 SYS       DBMS_OUTPUT           Table/Procedure/Type Null None

12 rows selected.

We then recompile our procedure and query the view again:

ops$tkyte@ORA10G> alter procedure p compile;
Procedure altered.

ops$tkyte@ORA10G> select session_id sid, owner, name, type,
  2         mode_held held, mode_requested request
  3    from dba_ddl_locks
  4  /

 SID OWNER     NAME                  TYPE                 HELD REQUEST
---- --------- --------------------- -------------------- ---- ---------
 161 SYS       DBMS_UTILITY          Body                 Null None
 161 SYS       DBMS_UTILITY          Body                 Null None
...
 161 SYS       DBMS_OUTPUT           Table/Procedure/Type Null None

11 rows selected.

We find that P is now missing from the view. Our parse lock has been broken.

This view is useful to you, as a developer, when you find that some piece of code won't compile in the test or development system—it hangs and eventually times out. That indicates someone else is using it (actually running it), and you can use this view to see who that might be. The same will happen with grants and other types of DDL against the object. You cannot grant EXECUTE on a procedure that is running, for example. You can use the same method to discover the potential blockers and waiters.
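
For example, to see which sessions hold a parse lock on our procedure P and would therefore block a compile or grant (a minimal sketch):

select session_id, owner, name, type, mode_held
  from dba_ddl_locks
 where name  = 'P'
   and owner = user;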

Latches

Latches are lightweight serialization devices used to coordinate multiuser access to shared data structures, objects, and files.

Latches are locks designed to be held for extremely short periods of time—for example, the time it takes to modify an in-memory data structure. They are used to protect certain memory structures, such as the database block buffer cache or the library cache in the Shared pool. Latches are typically requested internally in a "willing to wait" mode. This means that if the latch is not available, the requesting session will sleep for a short period of time and retry the operation later. Other latches may be requested in an "immediate" mode, which is similar in concept to a SELECT FOR UPDATE NOWAIT, meaning that the process will go do something else, such as try to grab an equivalent sibling latch that may be free, rather than sit and wait for this latch to become available. Since many requestors may be waiting for a latch at the same time, you may see some processes waiting longer than others. Latches are assigned rather randomly, based on the luck of the draw, if you will. Whichever session asks for a latch right after it was released will get it. There is no line of latch waiters—just a mob of waiters constantly retrying.

Oracle uses atomic instructions like "test and set" and "compare and swap" for operating on latches. Since the instructions to set and free latches are atomic, the operating system itself guarantees that only one process gets to test and set the latch, even though many processes may be going for it simultaneously. And since this takes only one instruction, it can be quite fast. Latches are held for short periods of time and provide a mechanism for cleanup in case a latch holder "dies" abnormally while holding one. This cleanup process would be performed by PMON.

Enqueues, which were discussed earlier, are another, more sophisticated serialization device used when updating rows in a database table, for example. They differ from latches in that they allow the requestor to "queue up" and wait for the resource. With a latch request, the requestor session is told right away whether or not it got the latch. With an enqueue lock, the requestor session will be blocked until it can actually attain it.


Note Using SELECT FOR UPDATE NOWAIT or WAIT [n], you can optionally decide not to wait for an enqueue lock if your session would be blocked, but if you do block and wait, you will wait in a queue.


As such, an enqueue is not as fast as a latch can be, but it does provide functionality over and above what a latch can offer. Enqueues may be obtained at various levels, so you can have many shared locks and locks with various degrees of shareability.

Latch "Spinning"

One thing I'd like to drive home with regard to latches is this: latches are a type of lock, locks are serialization devices, and serialization devices inhibit scalability. If your goal is to construct an application that scales well in an Oracle environment, you must look for approaches and solutions that minimize the amount of latching you need to perform.

Even seemingly simple activities, such as parsing a SQL statement, acquire and release hundreds or thousands of latches on the library cache and related structures in the Shared pool. If we have a latch, then someone else might be waiting for it. When we go to get a latch, we may well have to wait for it ourselves.

Waiting for a latch can be an expensive operation. If the latch is not available immediately and we are willing to wait for it, as we likely are most of the time, then on a multi-CPU machine our session will spin—trying over and over, in a loop, to get the latch. The reasoning behind this is that context switching (i.e., getting "kicked off" the CPU and having to get back on the CPU) is expensive. So, if the process cannot get a latch immediately, we'll stay on the CPU and try again immediately rather than just going to sleep, giving up the CPU, and trying later when we'll have to get scheduled back on the CPU. The hope is that the holder of the latch is busy processing on the other CPU (and since latches are designed to be held for very short periods of time, this is likely) and will give it up soon. If, after spinning and constantly trying to get the latch, we still fail to obtain it, only then will our process sleep, or take itself off the CPU, and let some other work take place. The pseudo-code for a latch get might look like this:

Attempt to get Latch
If Latch gotten
Then
  return SUCCESS
Else
  Misses on that Latch = Misses+1;
  Loop
    Sleeps on Latch = Sleeps + 1
    For I in 1 .. 2000
    Loop
      Attempt to get Latch
      If Latch gotten
      Then
        Return SUCCESS
      End if
    End loop
    Go to sleep for short period
  End loop
End if

The logic is to try to get the latch and, failing that, to increment the miss count—a statistic we can see in a Statspack report or by querying the V$LATCH view directly. Once the process misses, it will loop some number of times (an undocumented parameter controls the number of times and is typically set to 2,000), attempting to get the latch over and over. If one of these get attempts succeeds, then it returns and we continue processing. If they all fail, the process will go to sleep for a short duration of time, after incrementing the sleep count for that latch. Upon waking up, the process begins all over again. This implies that the cost of getting a latch is not just the "test and set"-type operation that takes place, but can also be a considerable amount of CPU while we try to get the latch. Our system will appear to be very busy (with much CPU being consumed), but not much work is getting done.

Measuring the Cost of Latching a Shared Resource

As an example, we'll study the cost of latching the Shared pool. We'll compare a well-written program (one that uses bind variables) and a program that is not so well written (it uses literal SQL, or unique SQL for each statement). To do this, we'll use a very small Java program that simply logs into Oracle, turns off auto-commit (as all Java programs should do immediately after connecting to a database), and executes 25,000 unique INSERT statements in a loop. We'll perform two sets of tests: our program will not use bind variables in the first set, and in the second set it will.

To evaluate these programs and their behavior in a multiuser environment, I opted to use Statspack to gather the metrics, as follows:

  1. Execute a Statspack snapshot to gather the current state of the system.
  2. Run N copies of the program, having each program INSERT into its own database table so as to avoid the contention associated with having all programs trying to insert into a single table.
  3. Take another snapshot immediately after the last copy of the program finishes.

Then it is a simple matter of printing out the Statspack report and finding out how long it took N copies of the program to complete, how much CPU was used, what the major wait events occurred, and so on.
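
A minimal sketch of this harness from SQL*Plus (assuming Statspack is installed under the PERFSTAT schema):

exec statspack.snap        -- snapshot before the test run

-- start N copies of the test program from the shell and wait for them

exec statspack.snap        -- snapshot after the last copy finishes
@?/rdbms/admin/spreport    -- print the report between the two snapshots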

These tests were performed on a dual-CPU machine with hyperthreading enabled (making it appear as if there were four CPUs). Given that there were two physical CPUs, you might expect very linear scaling here—that is, if one user uses 1 unit of CPU to process her inserts, you might expect that two users would require 2 units of CPU. You'll discover that this premise, while sounding plausible, may well be inaccurate (just how inaccurate depends on your programming technique, as you'll see). It would be correct if the processing we were performing needed no shared resource, but our processes do use a shared resource: the Shared pool. We need to latch the Shared pool to parse SQL statements, because the Shared pool is a shared data structure; we cannot modify it while others are reading it, and we cannot read it while it is being modified.


Note I've performed these tests using Java, PL/SQL, Pro*C, and other languages. The end results are very much the same every time. This demonstration and discussion applies to all languages and all interfaces to the database. I chose Java for this example as I find Java and Visual Basic applications are most likely to not use bind variables when working with the Oracle database.


Without Bind Variables

In the first instance, our program will not use bind variables, but rather will use string concatenation to insert data:

import java.sql.*;
public class instest
{
   static public void main(String args[]) throws Exception
   {
      DriverManager.registerDriver(new oracle.jdbc.driver.OracleDriver());
      Connection
        conn = DriverManager.getConnection
               ("jdbc:oracle:thin:@dellpe:1521:ora10gr1",
                "scott","tiger");
      conn.setAutoCommit( false );
      Statement stmt = conn.createStatement();
      // build a brand-new, literal SQL statement for every row;
      // each one must be hard parsed by the server
      for( int i = 0; i < 25000; i++ )
      {
         stmt.execute
         ("insert into "+ args[0] +
          " (x) values(" + i + ")" );
      }
      conn.commit();
      conn.close();
   }
}

I ran the test in "single user" mode and the Statspack report came back with this information:

   Elapsed:               0.52 (mins)

Cache Sizes (end)
~~~~~~~~~~~~~~~~~
               Buffer Cache:       768M      Std Block Size:         8K
           Shared Pool Size:       244M          Log Buffer:     1,024K

Load Profile
~~~~~~~~~~~~                            Per Second       Per Transaction
                                   ---------------       ---------------
...
                     Parses:                810.58             12,564.00
                Hard parses:                807.16             12,511.00
....
Top 5 Timed Events
~~~~~~~~~~~~~~~~~~                                                      % Total
Event                                               Waits    Time (s) Call Time
-------------------------------------------- ------------ ----------- ---------
CPU time                                                           26     55.15
class slave wait                                        2          10     21.33
Queue Monitor Task Wait                                 2          10     21.33
log file parallel write                                48           1      1.35
control file parallel write                            14           0      .51

I included the SGA configuration for reference, but the relevant statistics are as follows:

  • Elapsed time of approximately 30 seconds
  • 807 hard parses per second
  • 26 CPU seconds used

Now, if we were to run two of these programs simultaneously, we might expect the hard parsing to jump to about 1,600 per second (we have two CPUs available, after all) and the CPU time to double to perhaps 52 CPU seconds. Let's take a look:

   Elapsed:               0.78 (mins)

Load Profile
~~~~~~~~~~~~                            Per Second       Per Transaction
                                   ---------------       ---------------
                     Parses:              1,066.62             16,710.33
                Hard parses:              1,064.28             16,673.67

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~                                                      % Total
Event                                               Waits    Time (s) Call Time
-------------------------------------------- ------------ ----------- ---------
CPU time                                                           74     97.53
log file parallel write                                53           1      1.27
latch: shared pool                                    406           1       .66
control file parallel write                            21           0       .45
log file sync                                           6           0       .04

What we discover is that the hard parsing goes up a little bit, but the CPU time triples rather than doubles! How could that be? The answer lies in Oracle's implementation of latching. On this multi-CPU machine, when we could not immediately get a latch, we "spun." The act of spinning itself consumes CPU. Process 1 attempted many times to get a latch onto the Shared pool only to discover that process 2 held that latch, so process 1 had to spin and wait for it (consuming CPU). The converse would be true for process 2—many times it would find that process 1 was holding the latch to the resource it needed. So, much of our processing time was spent not doing real work, but waiting for a resource to become available. If we page down through the Statspack report to the "Latch Sleep Breakdown" report, we discover the following:

Latch Name            Requests      Misses   Sleeps Sleeps 1->3+
---------------- ------------- ----------- -------- ------------
shared pool          1,126,006     229,537      406 229135/398/4/0
library cache        1,108,039      45,582        7 45575/7/0/0

Note how the number 406 appears in the SLEEPS column here? That 406 corresponds to the number of waits reported in the preceding "Top 5 Timed Events" report. This report shows us the number of times we tried to get a latch and failed in the spin loop. That means the "Top 5" report is showing us only the tip of the iceberg with regard to latching issues—the 229,537 misses (which means we spun trying to get the latch) are not revealed in the "Top 5" report for us. After examination of the "Top 5" report, we might not be inclined to think "We have a hard parse problem here," even though we have a very serious one. To perform 2 units of work, we needed to use 3 units of CPU. This was due entirely to the fact that we need that shared resource, the Shared pool—such is the nature of latching. However, it can be very hard to diagnose a latching-related issue, unless we understand the mechanics of how they are implemented. A quick glance at a Statspack report, using the "Top 5" section, might cause us to miss the fact that we have a fairly bad scaling issue on our hands. Only by deeper investigation in the latching section of the Statspack report will we see the problem at hand.

Additionally, it is not normally possible to determine how much of the CPU time used by the system is due to this spinning—all we know in looking at the two-user test is that we used 74 seconds of CPU time and that we missed getting a latch on the Shared pool 229,537 times. We don't know how many times we spun trying to get the latch each time we missed, so we have no real way of gauging how much of the CPU time was spent spinning and how much was spent processing. We need multiple data points to derive that information.

In our tests, because we have the single-user example for comparison, we can conclude that about 22 CPU seconds was spent spinning on the latch, waiting for that resource.

With Bind Variables

Now I'd like to look at the same situation as presented in the previous section, but this time using a program that uses significantly less latches during its processing. We'll take that Java program and code it using bind variables. To accomplish this, we'll change the Statement into a PreparedStatement, parse a single INSERT statement, and then bind and execute that PreparedStatement repeatedly in the loop:

import java.sql.*;
public class instest
{
   static public void main(String args[]) throws Exception
   {
      DriverManager.registerDriver(new oracle.jdbc.driver.OracleDriver());
      Connection
         conn = DriverManager.getConnection
                ("jdbc:oracle:thin:@dellpe:1521:ora10gr1",
                 "scott","tiger");
      conn.setAutoCommit( false );
      // parse the INSERT once; the bind placeholder (?) keeps the
      // SQL text identical on every execution
      PreparedStatement pstmt =
          conn.prepareStatement
          ("insert into "+ args[0] + " (x) values(?)" );
      for( int i = 0; i < 25000; i++ )
      {
         pstmt.setInt( 1, i );
         pstmt.executeUpdate();
      }
      conn.commit();
      conn.close();
   }
}

Let's look at the single-user and dual-user Statspack reports, as we did for the "no bind variable" example. We'll see dramatic differences here. Here is the single-user report:

   Elapsed:             0.12 (mins)

Load Profile
~~~~~~~~~~~~                          Per Second      Per Transaction
                                 ---------------      ---------------
...
                  Parses:                   8.43                29.50
             Hard parses:                   0.14                 0.50

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~                                                   % Total
Event                                               Waits    Time (s) Call Time
-------------------------------------------- ------------ ----------- ---------
CPU time                                                            4     86.86
log file parallel write                                49           0     10.51
control file parallel write                             4           0      2.26
log file sync                                           4           0       .23
control file sequential read                          542           0       .14

That is quite dramatic: from 26 CPU seconds in the no bind variables example to 4 CPU seconds here. From 807 hard parses per second to 0.14 per second. Even the elapsed time was dramatically reduced, from about 31 seconds down to about 7 seconds. When not using bind variables, we spent five-sixths of our CPU time parsing SQL. This was not entirely latch related, as much of the CPU time incurred without bind variables was spent parsing and optimizing the SQL. Parsing SQL is very CPU intensive, but to expend five-sixths of our CPU doing something (parsing) that doesn't really do useful work for us—work we didn't need to perform—is pretty expensive.

When we get to the two-user test, the results continue to look better:

   Elapsed:              0.20 (mins)

Load Profile
~~~~~~~~~~~~                       Per Second         Per Transaction
                              ---------------         ---------------
                  Parses:               6.58                    26.33
             Hard parses:               0.17                     0.67

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~                                                   % Total
Event                                               Waits    Time (s) Call Time
-------------------------------------------- ------------ ----------- ---------
CPU time                                                           11     89.11
log file parallel write                                48           1      9.70
control file parallel write                             4           0       .88
log file sync                                           5           0       .23
log buffer space                                        2           0       .05

The amount of CPU time is about 2 to 2.5 times the amount reported by the single-user test case.


Note Due to rounding, the 4 CPU seconds is really anywhere from 3.5 to 4.49, and the 11 is really anywhere from 10.5 to 11.49 seconds.


Further, the amount of CPU used by two users with bind variables is less than half the amount of CPU a single user not using bind variables required! When I went to look at the latch report in this Statspack report, I found it was missing entirely: there was so little contention for the Shared pool and library cache that it was not even reported. In fact, digging deeper showed that the Shared pool latch was requested 50,367 times, versus well over 1,000,000 times in the two-user test without bind variables shown earlier.

Performance/Scalability Comparison

Table 6-1 summarizes the CPU used by each implementation, as well as the latching results as we increase the number of users beyond two. As you can see, the solution using fewer latches will scale much better as the user load goes up.

Table 6-1. CPU Usage Comparison With and Without Bind Variables

Users   CPU Seconds/            Shared Pool             Waits for Latches
        Elapsed Time (Mins)     Latch Requests          (Waits/Seconds in Wait)
        No Binds     Binds      No Binds     Binds      No Binds      Binds
  1      26/0.52     4/0.10       563,883    25,232         0/0
  2      74/0.78    11/0.20     1,126,006    50,367       406/1
  3     155/1.13    29/0.37     1,712,280    75,541     2,830/4
  4     272/1.50    44/0.45     2,298,179   100,682     9,400/5
  5     370/2.03    64/0.62     2,920,219   125,933    13,800/20
  6     466/2.58    74/0.72     3,526,704   150,957    30,800/80      17/0
  7     564/3.15    95/0.92     4,172,492   176,085    40,800/154
  8     664/3.57   106/1.00     4,734,793   201,351    56,300/240    120/1
  9     747/4.05   117/1.15     5,360,188   230,516    74,600/374    230/1
 10     822/4.42   137/1.30     5,901,981   251,434    60,000/450    354/1

The interesting observation for me is that 10 users using bind variables (and hence making very few latch requests) use about the same amount of hardware resources as 2 to 2.5 users that do not use bind variables (i.e., that overuse a latch, or process more than they need to). When you examine the results for 10 users, you see that nonuse of bind variables results in the use of 6 times the CPU and takes 3.4 times the execution time when compared to the bind variable solution. Moreover, as users are added, each one spends longer and longer waiting for these latches: we went from an average of 4 seconds per session of latch wait time with 5 users (20 seconds of waits spread over 5 sessions) to an average of 45 seconds per session with 10 users (450 seconds over 10 sessions). However, the implementation that avoided overuse of the latch suffered no ill effects as it scaled up.

Manual Locking and User-Defined Locks

So far we have looked mostly at locks that Oracle places for us transparently. When we update a table, Oracle places a TM lock on it to prevent other sessions from dropping that table (or performing most DDL, in fact). We have TX locks that are left on the various blocks we modify so others can tell what data we "own." The database employs DDL locks to protect objects from change while we ourselves are changing them. It uses latches and locks internally to protect its own structure.

Next, let's take a look at how we can get involved in some of this locking action. Our options are as follows:

  • Manually lock data via a SQL statement.
  • Create our own locks via the DBMS_LOCK package.

In the following sections, we will briefly discuss why you might want to do each of these.

Manual Locking

We have, in fact, already seen a couple of cases where we might want to use manual locking. The SELECT...FOR UPDATE statement is the predominant method of manually locking data. We used it in previous examples to avoid the lost update issue, whereby one session would overwrite another session's changes. We've seen it used as a method to serialize access to detail records to enforce business rules (e.g., the resource scheduler example from Chapter 1).
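
As a quick illustration, here is the pattern, sketched against the familiar SCOTT.DEPT demonstration table (any table works the same way):

select *
  from dept
 where deptno = 10
   for update;

Any other session that attempts to lock the DEPTNO=10 row will now block until we commit or roll back. Appending NOWAIT (FOR UPDATE NOWAIT) makes the statement fail immediately with ORA-00054 instead of blocking.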

We can also manually lock data using the LOCK TABLE statement. This statement is actually used rarely, because of the coarseness of the lock. It simply locks the table, not the rows in the table. If you start modifying the rows, they will be "locked" as normal. So, this is not a method to save on resources (as it might be in other RDBMSs). You might use the LOCK TABLE IN EXCLUSIVE MODE statement if you were writing a large batch update that would affect most of the rows in a given table and you wanted to be sure that no one would "block" you. By locking the table in this manner, you can be assured that your update will be able to do all of its work without getting blocked by other transactions. It would be the rare application, however, that has a LOCK TABLE statement in it.
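
Such a batch job might look like the following sketch (BIG_TABLE and its STATUS column are hypothetical names):

lock table big_table in exclusive mode;

update big_table set status = 'PROCESSED';

commit;

The COMMIT (or a ROLLBACK) releases the table lock; there is no explicit UNLOCK statement.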

Creating Your Own Locks

Oracle actually exposes to developers the enqueue lock mechanism that it uses internally, via the DBMS_LOCK package. You might be wondering why you would want to create your own locks. The answer is typically application specific. For example, you might use this package to serialize access to some resource external to Oracle. Say you are using the UTL_FILE package, which allows you to write to files on the server's file system. You might have developed a common message routine that every application calls to record messages. Since the file is external, Oracle won't coordinate the many users trying to modify it simultaneously. In comes the DBMS_LOCK package. Now, before you open, write, and close the file, you will request a lock named after the file in exclusive mode, and after you close the file, you will manually release the lock. In this fashion, only one person at a time will be able to write a message to this file. Everyone else will queue up. The DBMS_LOCK package allows you to manually release a lock when you are done with it, or to give it up automatically when you commit, or even to keep it as long as you are logged in.
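
A minimal sketch of such a message routine follows. The lock name, the directory object MSG_DIR, the file name, and the procedure name are all hypothetical; DBMS_LOCK.ALLOCATE_UNIQUE converts a lock name into a lock handle, and DBMS_LOCK.REQUEST/DBMS_LOCK.RELEASE do the actual serialization:

create or replace procedure write_message( p_msg in varchar2 )
as
    l_lock_handle  varchar2(128);
    l_status       number;
    l_file         utl_file.file_type;
begin
    -- Turn our agreed-upon lock name into a lock handle.
    -- (Note: ALLOCATE_UNIQUE commits, so call it before transactional work.)
    dbms_lock.allocate_unique( 'MSG_FILE_LOCK', l_lock_handle );

    -- Request the lock in exclusive mode; wait up to 10 seconds for it.
    l_status := dbms_lock.request
                ( lockhandle        => l_lock_handle,
                  lockmode          => dbms_lock.x_mode,
                  timeout           => 10,
                  release_on_commit => false );
    if ( l_status <> 0 )
    then
        raise_application_error
        ( -20001, 'Could not get lock, status = ' || l_status );
    end if;

    -- We now have exclusive access: no other session calling this
    -- routine can be writing to the file at the same time.
    l_file := utl_file.fopen( 'MSG_DIR', 'messages.log', 'a' );
    utl_file.put_line( l_file, p_msg );
    utl_file.fclose( l_file );

    -- Let the next session in the queue have the file.
    l_status := dbms_lock.release( l_lock_handle );
end;
/

Every application that needs to write a message simply calls WRITE_MESSAGE; the sessions serialize on the named lock rather than corrupting each other's output.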

Summary

This chapter covered a lot of material that, at times, may have made you scratch your head. While locking is rather straightforward, some of its side effects are not. However, it is vital that you understand these issues. For example, if you were not aware of the table lock Oracle uses to enforce a foreign key relationship when the foreign key is not indexed, then your application would suffer from poor performance. If you did not understand how to review the data dictionary to see who was locking whom, you might never figure that one out. You would just assume that the database "hangs" sometimes. I sometimes wish I had a dollar for every time I was able to solve the insolvable hanging issue by simply running the query to detect unindexed foreign keys and suggesting that we index the one causing the problem—I would be very rich.
