Locks

We’ve seen latching as a mechanism for protecting memory while we search through, and possibly modify, lists, but I’ve pointed out that we ought to avoid holding latches for any length of time. This means that if we need to do something time-consuming when we’ve found the bit of memory we were searching for, we need to devise a different mechanism to protect that memory while we’re using it so that we can release the latch. This is where library cache locking (and pinning) comes into play.

Before looking at library cache locks, though, we’ll turn our attention to the better known and more easily observable form of locking—the activity we see in v$lock, which, broadly speaking, is a view over x$ksqrs (the enqueue resources) joined to a union of x$ksqeq (the enqueues) and various other structures.

Note If you’re ever asked, “What’s the difference between a lock and an enqueue?” the answer is “nothing.” They are two terms for the same concept, even though, historically, the documentation tended to use the term “lock” while the internal structures were called “enqueues” and the instance activity stats referred to “enqueues”; for example, if you query the dynamic performance view v$lock, you are querying the structure defined by the parameter enqueues. To add to the confusion, there are structures called KGL locks, or breakable parse locks, which are never called enqueues and which people don’t generally think of when you mention locks.

Infrastructure

There are many types of resources in Oracle—tables, files, tablespaces, parallel execution slaves, and redo threads, to name just a few—and there are many different reasons why we might want to protect a resource for a long time (and, to paraphrase former UK Prime Minister Harold Wilson, “7 milliseconds can be a long time in Oracle”).

To create a standard method for handling different types of resources, Oracle keeps an array in the SGA (exposed as x$ksqrs and v$resource, with a size defined by hidden parameter _enqueue_resources) where each entry can be used to represent a resource. The critical columns in the array definition are

Name              Null?    Type
----------------- -------- ------------
KSQRSIDT                   VARCHAR2(2)
KSQRSID1                   NUMBER
KSQRSID2                   NUMBER

You will probably recognize, hidden in these column names, the type, id1, and id2 that are revealed in view v$lock. To use an array entry as a representative of some resource, your session will simply fill in the columns; for example:

  • ('PS', 1, 4) represents parallel execution slave P004 on instance 1.
  • ('TM', 80942, 0) represents the table that has 80,942 as its object_id.
  • ('TX', 65543, 11546) represents the transaction using slot 7 (mod(65543,65536)) in undo segment 1 (trunc(65543/65536)) for the 11,546th time.
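
As a quick sanity check on that arithmetic, you can do the decoding in SQL (65543 is just the id1 value from the example above):

select  trunc(65543/65536)  undo_segment,
        mod(65543, 65536)   slot
from    dual;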

Once we have an in-memory object representing a particular resource, we can start attaching things to it to show which sessions want to use that resource and how restrictive they want to be in their use. There are several arrays of structures that Oracle uses for this. The most frequently used are x$ksqeq (generic enqueues), x$ktadm (table/DML locks), and x$ktcxb (transactions). The others are x$kdnssf, x$ktatrfil, x$ktatrfsl, x$ktatl, x$ktstusc, x$ktstusg, and x$ktstuss. These arrays have a common core of elements that, with the exception of a couple of columns from x$ktcxb, follow a uniform naming convention. These columns are given different names when they are exposed through v$lock, but the original names from the x$ structures are as follows:

Name              Null?    Type
----------------- -------- ------
ADDR                       RAW(4)   -- ktcxbxba in x$ktcxb
KSQLKADR                   RAW(4)   -- ktcxblkp in x$ktcxb
KSQLKMOD                   NUMBER
KSQLKREQ                   NUMBER
KSQLKCTIM                  NUMBER
KSQLKLBLK                  NUMBER

It’s just possible to recognize the last four columns as the lmode, request, ctime, and block of v$lock. The various structures also have a column ksqlkses, which is the session address of the locking session, exposed indirectly in v$lock through the sid (session id), and a column ksqlres, which is the address of the resource being locked, exposed indirectly through the type, id1, and id2.

The basic idea, then, is very simple: if you wish to protect a resource, you acquire a row from x$ksqrs, label it to identify the resource, and then acquire a row from x$ksqeq (or equivalent), set the lock mode, and link it to the row from x$ksqrs. There are, of course, many subtleties to worry about:

  • How do you find out (efficiently) whether or not someone else has already labeled a row in x$ksqrs to identify the same resource so that you don’t create a duplicate?
  • If someone else has already labeled a row from x$ksqrs, will you always be allowed to attach your row from x$ksqeq (or equivalent) to it? (What if their lmode denies you access to the resource?)
  • If you are allowed to attach your row from x$ksqeq (or equivalent), does that mean you will be able to use the resource? If not, how do you find out when you can use the resource?

A Graphic Image of v$lock

Before doing anything else, let’s extract some rows from v$lock and draw a picture of what those rows look like in the SGA:

select
        sid, type, id1, id2, lmode, request, ctime, block
from    v$lock
where
        type = 'TM'
and     id1 =  82772
;
     SID TY        ID1        ID2      LMODE    REQUEST      CTIME      BLOCK
-------- -- ---------- ---------- ---------- ---------- ---------- ----------
      37 TM      82772          0          3          5         66          1
      36 TM      82772          0          3          0         42          1
      39 TM      82772          0          0          6         27          0
      35 TM      82772          0          0          3          3          0

I have four sessions here handling a table that is the child table of a parent/child link—and there is no supporting index on the child table for the foreign key constraint. To get into this state, I took the four sessions through the following steps:

  1. Session 37: Delete the only child of parent 1.
  2. Session 36: Delete the only child of parent 2.
  3. Session 39: Attempt to lock the child table in exclusive mode (and start to wait).
  4. Session 37: Attempt to delete parent 1 (and start to wait due to missing FK index).
  5. Session 35: Attempt to delete the only child of parent 3 (and start to wait).

I contrived this unlikely sequence of actions because I wanted to demonstrate the range of possibilities with locking. Without the history I’ve given you, you might find it a little hard to determine the sequence of steps that led to this set of rows in v$lock even though all the necessary clues are available in the output. Figure 4-4 converts the text to a picture.
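
Before looking at the picture, here is a minimal sketch of the sort of setup that produces it; the table and column names are mine for illustration, not the objects behind object_id 82772:

create table parent (
        id              number(6) primary key,
        padding         varchar2(100)
);

create table child (
        id              number(6) primary key,
        id_parent       number(6) references parent,   -- deliberately no index on this column
        padding         varchar2(100)
);

insert into parent select rownum, rpad('x',100) from dual connect by level <= 3;
insert into child  select rownum, rownum, rpad('x',100) from dual connect by level <= 3;
commit;

-- Then, from four separate sessions, in order:
-- Session 37: delete from child where id_parent = 1;
-- Session 36: delete from child where id_parent = 2;
-- Session 39: lock table child in exclusive mode;      -- waits
-- Session 37: delete from parent where id = 1;         -- waits (mode 3 to mode 5 conversion)
-- Session 35: delete from child where id_parent = 3;   -- waits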


Figure 4-4. A graphic image of a few rows from v$lock

A single row from x$ksqrs (v$resource) holds the end points of three doubly linked lists (often referred to as queues in this case). The three lists are made up of rows from one of the enqueue structures, the choice of structure being dependent on what the resource represents. In our case the enqueues will come from x$ktadm because the resource represents a table (resource type is TM).

The three lists are for owners (sometimes called holders), converters, and waiters, and the process of “locking” an object requires you to join the end of the waiters queue until there is no one ahead of you in the waiters queue or the converters queue, and only then can you attach yourself to the owners queue—provided the lock mode you want is compatible with the modes held by everyone currently on the owners queue.

Note There are a few places in the Oracle code where a process may take a lock in one mode to start doing some work, and then find that it needs a more aggressive level of locking as the work proceeds. This results in lock conversion. This is not the same as lock escalation—a mechanism Oracle does not need to implement—which involves (for example) exchanging a large set of row locks for a single table lock to reduce the resources used in managing locks.

The most common example of lock conversion appears in the foreign key locking problem. If you declare a foreign key constraint on a table but don’t create a supporting index for the constraint, then a session that tries to delete or update a parent key value will try to lock the child table with a share lock (mode 4); however, if the session had deleted some rows from the child table before modifying the parent table, then it would already be holding a subexclusive lock (mode 3), which means it has to up-convert the lock mode. In fact, the rules of lock modes dictate it would have to convert to a mode 5—not just a mode 4—which is what we see in our example.

So in Figure 4-4 we see that SID 35 is sitting at the end of the waiters queue because SID 39 was already stuck there. SID 39 has to wait because SID 37 is in the way in the converters queue. SID 37 is stuck in the converters queue because it wants to convert a mode 3 lock to mode 5 and SID 36 is in the owners queue holding a lock in mode 3—and no one else is allowed to hold a mode 5 if anyone is holding a mode 3.

As processes commit (or roll back), the queue will shuffle forward as follows:

  1. When SID 36 commits, the owners queue will become empty, which will allow SID 37 to move from the converters queue to the owners queue, acquiring its mode 5 lock and setting its ctime back to 0. Sessions 39 and 35 will still be stuck on the waiters queue.
  2. When SID 37 commits, the owners queue will again become empty and SID 39 will be able to move to the owners queue, acquire its mode 6 lock, and set its ctime to 0. SID 35 will now be at the front of the waiters queue, but can’t join the owners queue because you can’t get any lock on a table if any other session is already holding an exclusive (mode 6) lock.
  3. When SID 39 commits, the owners queue will again become empty and SID 35 will be able to move to the owners queue, acquire its mode 3 lock, and set its ctime to 0.

If you compare this picture with the extract I’ve shown from v$lock, you might realize that you can’t normally get the whole picture unless you’re a little lucky (or have a helpful author). In our case we can see that session 36 is holding mode 3 and that it is blocking something (block = 1). Since we know that converters take precedence over waiters, we can deduce that session 37 is the thing blocked by session 36 and that it too is blocking something. We can see that sessions 35 and 39 are blocked, but we can’t necessarily tell which one is at the front of the queue—fortunately I waited several seconds after each step of the demonstration, so we can look at the ctime column and see that session 39 started waiting roughly 24 seconds before session 35. The ctime column, however, is only updated each time the lock wait times out, which usually means every 3 seconds, so it won’t always help—and this generally is where we have a little bit of an information gap: if several sessions are blocked somewhere in the converters or waiters queue, we can’t always tell which one will be released first.
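
If all you need is to find out who is blocking whom (rather than the exact queue order), a simple self-join of v$lock is often enough; this is a generic sketch rather than anything specific to the TM example:

select  blocker.sid     blocker_sid,
        waiter.sid      waiter_sid,
        waiter.type,
        waiter.id1,
        waiter.id2,
        waiter.request
from    v$lock  blocker,
        v$lock  waiter
where   blocker.block   = 1
and     waiter.request  > 0
and     waiter.type     = blocker.type
and     waiter.id1      = blocker.id1
and     waiter.id2      = blocker.id2
;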

Note If you’re waiting to acquire a lock, the ctime column tells you how long you’ve waited, and if you’re holding a lock, it tells you how long you’ve been holding it. But if you’re converting from one mode to another, does it tell you how long you’ve been waiting to convert to the new mode, or how long you’ve been holding the old mode? Both pieces of information are useful, but the one you get is how long you’ve been holding the old mode. It would be nice if Oracle Corp. could enhance the view (and underlying code) to give us both times.

If you really need to pick the details apart, when you’re jammed in a complex locking pattern, you could dump an enqueue trace. It will look a little messy, and you’ll have to do some decoding and formatting, but it will give you a completely sequenced picture. The command is

alter session set events 'immediate trace name enqueues level 3'

Here’s an extract from the trace file I dumped after I had got all my locks in place:

res      identification         NUL SS  SX  S   SSX X   md link
         owners              converters          waiters
-------------------------------------------------------------------------
21A4CD90 TM-00014354-00000000 U   0   0   2   0   0   0  8 [21a5534c,21a5534c]
         [212d1190,212d1190] [212d1008,212d1008] [212d1254,212d13dc]

   lock     que owner    session        hold wait ser link
   ----------------------------------------------------------------------
   212D1188 OWN 2198C2C4 2198C2C4 (036)   SX NLCK  12 [21a4cd98,21a4cd98]

   212D1000 CON 2198D5AC 2198D5AC (037)   SX  SSX  65 [21a4cda8,21a4cda8]

   212D124C WAT 2198FB7C 2198FB7C (039) NLCK    X  17 [212d13dc,21a4cda0]
   212D13D4 WAT 2198AFDC 2198AFDC (035) NLCK   SX   6 [21a4cda0,212d1254]

There are two lines of data about each resource, starting with its address in x$ksqrs and then listing the resource’s identity (type, id1, id2).

The second line of information identifies the three linked lists using a format that appears very frequently in all sorts of trace files: two addresses in square brackets, the first address acting as a “forward pointer” and the second address acting as a “backward pointer.” In this case the first address points to the first item of the linked list of enqueues and the second address points to the last—but just to confuse the issue, there’s an 8-byte offset in the figures. Take a look at the pair of pointers for the waiters queue: the forward pointer (212d1254) minus 8 is the value that appears at the start of the third line of the locks, and the backward pointer (212d13dc) minus 8 is the value that appears at the start of the fourth line of the locks. We see the same pattern in the pointer pairs for the owners and converters queues, but the forward and backward pointers are the same as each other because there is only one item in each queue.

The same linked list format appears in the locks information in the link column. Again the waiters (que = WAT) are the easier to follow. Note how the forward pointer (212d13dc) minus 8 in line 3 points to the lock address in line 4, while the backward pointer (212d1254) minus 8 in line 4 corresponds to the lock address in line 3.

Finally, we can see that the backward pointer in line 3 and the forward pointer in line 4 are both (21a4cda0) pointing back to the resource address (21A4CD90), although the offset is 16 rather than 8. (The same logic applies to the converters and owners, but the offset for owners is 8 bytes and the offset for converters is 24 bytes—this may seem like an error, but it’s probably just a case of pointing to the specific field rather than pointing to the start of the whole record.)

Deadlock

Looking back at the original picture, you might wonder what would happen if session 36 decided to delete the parent row for the child row it has successfully deleted. Because of the missing foreign key index, it would have to convert its mode 3 lock to a mode 5 lock, which means it would have to join the converters queue behind session 37—you always join a queue at the tail end.

When I drew my picture, I left a “ghost” of session 37 on the owners queue; if you look at the trace file it’s not really there, of course, but as a visualization of the way the code works, it’s quite helpful to think of it as still sitting in the owners queue while also being in the converters queue.

So at this point we have session 37 in the converters queue waiting for the owners queue to empty, and session 36 sitting “behind” session 37. But session 36 (or its “ghost”) is at the end of the owners queue, so session 36 is also “in front of” session 37—we have constructed a simple deadlock, where both sessions are waiting for the other session to get out of the way.

Within 3 seconds, one of the sessions will do a statement level rollback and raise Oracle error ORA-00060, “deadlock detected,” dumping a deadlock trace into the session’s trace file at the same time. Although the manuals say that a session will be chosen at random, I have found that it’s always the session that has been waiting the longest that gets the deadlock error. In this case it would be the session at the front of the converters queue, session 37. The explanation for this behavior is that TM enqueues (and many other types of enqueue) have a timeout interval of 3 seconds—so every 3 seconds a session waiting for a TM lock will wake up, check the state of all the locks on the table, and (typically) go into another 3-second wait. It’s this check on timeout, though, that allows a session to realize it’s in a deadlock and raise the error.

So what should session 37 do when it raises the error? Many applications will respond to the error by simply terminating the session (which usually clears up the locking problem automatically, eventually, because of the implicit rollback). Some applications will issue an explicit rollback;, which will also clear out the problem but may need a lot of time to roll back the work done up to that point. Some applications, unfortunately, will simply retry the last step—which in our case would put session 37 in the waiters queue behind session 36, resulting in session 36 reporting a deadlock error within 3 seconds, and if the application is consistent, it will make session 36 try again, leading to session 37 reporting a deadlock error within 3 seconds, and so on.

There is no perfect solution for handling deadlocks. Rolling back the transaction is a safe but possibly expensive option; terminating the session with extreme prejudice so that it doesn’t do an implicit commit of the work done so far is an alternative. Ideally, though, you could consider coding your application so that any long-running jobs (which are the ones where it really matters) that receive a deadlock error are coded to identify the other session involved and either page a DBA to resolve the problem or check which session will have to roll back the smaller amount of work—with the option, then, to terminate the other session. However it’s done, applications shouldn’t be allowed to respond to deadlocks with an arbitrary failure.
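
As a sketch of that advice, a long-running job can trap ORA-00060 explicitly and make a deliberate decision instead of blindly retrying. The table name here is just the child table from my earlier demonstration sketch, and the comments mark the decision points rather than prescribing a policy:

declare
        deadlock_detected       exception;
        pragma exception_init(deadlock_detected, -60);
begin
        update child set padding = rpad('y',100) where id_parent = 1;
        commit;
exception
        when deadlock_detected then
                --
                -- Only the failing statement has been rolled back automatically;
                -- roll back the whole transaction before deciding what to do next,
                -- e.g., record the other session's details from v$lock, page a DBA,
                -- or work out which transaction has less work to lose.
                --
                rollback;
                raise;
end;
/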

Note Although I’ve said that it’s always the session that has been waiting the longest that gets the deadlock error, I have to point out that I managed to construct a very specific and totally repeatable sequence of steps in 9.2.0.1 where both sessions got a deadlock error simultaneously. I couldn’t reproduce this behavior in any other version of Oracle, so it may have been a temporary side effect (or bug) due to some of the changes that were working their way through the enqueue manager in 9.2.

Technically it’s also possible (at least in theory) that your O/S scheduler could break this pattern. Even if the two waiting sessions are restarted in the right order, this only means they are put on the CPU run queue in the right order. If your hardware implements multiple run queues, it’s possible for the second process that goes onto a run queue to be the first process that actually runs.

I’ve used table locking to demonstrate the structures involved and highlight one of the classic examples of locking and deadlocking. Foreign key deadlocking is relatively common, but you are probably more likely to see transaction deadlocking or (to give it another name) data deadlocking.

As you saw in Chapter 3, a transaction table slot is the “focal point” for all the work done by a single transaction, and every data (and index) block modified by a transaction will hold a reference to that transaction table slot in its interested transaction list (ITL). The session executing the transaction will lock the transaction table slot by creating an enqueue resource for it (type TX, id1 representing the undo segment number and slot number, and id2 representing the slot’s sequence number) and attaching an enqueue to it—specifically, a row from x$ktcxb rather than x$ksqeq—at lock mode 6 (exclusive).
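
You can watch this happen by joining v$transaction to v$lock for the active transactions in your instance; the arithmetic on xidusn and xidslot should reproduce the id1 of the TX resource (this is a generic sketch, not tied to the earlier example):

select  s.sid,
        t.xidusn, t.xidslot, t.xidsqn,
        t.xidusn * 65536 + t.xidslot    expected_id1,
        l.id1, l.id2, l.lmode
from    v$transaction   t,
        v$session       s,
        v$lock          l
where   s.taddr = t.addr
and     l.sid   = s.sid
and     l.type  = 'TX'
and     l.lmode = 6
;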

If a second session wants to update a row that has been modified but not committed by the first session, then it will attach an enqueue to the first session’s TX resource requesting an exclusive lock—and this enqueue will be attached to the waiters queue. If, at this point, the first session decides to modify a row that has been modified but not committed by the second session, it will attach an enqueue to the second session’s TX resource—again, this enqueue will have to go into the waiters queue.

So we have the first session in the waiters queue on the second session’s TX resource, and the second session in the waiters queue on the first session’s TX resource—the classic transaction (or data) deadlock. Within 3 seconds, one of the sessions will raise an Oracle error ORA-00060 and its last statement will be rolled back; if the deadlock occurs inside a PL/SQL block and you don’t have an exception handler for it, then all the uncommitted SQL statements inside the block will be rolled back.

Note When a data deadlock occurs and dumps a trace file, the most important lines in the trace file (from the perspective of Oracle Corp., at least) will say

DEADLOCK DETECTED ( ORA-00060 )
[Transaction Deadlock]
The following deadlock is not an ORACLE error. It is a deadlock due to user error in the
design of an application or from issuing incorrect ad-hoc SQL.

The ORA-00060 error is (almost always) a program error, although, unfortunately, some recent changes in 11g have introduced some very strange locking anomalies.

When deadlocks happen, the session that gets the error will dump a trace file. There’s a lot of information in it, but the starting point is the deadlock graph, which, in the simple case I’ve described, would look like this:

Deadlock graph:
                     ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name        process session holds waits  process session holds waits
TX-00030026-000040d6      13      38     X             12      50           X
TX-00060002-00004113      12      50     X             13      38           X

In this graph you can see that session 38, process 13, was holding TX lock (undo segment 3, slot 38 (0x26), sequence 16,598) in mode 6, and session 50, process 12 was waiting to acquire it. Conversely, session 50 was holding TX lock (undo segment 6, slot 2, sequence 16,659), which session 38 was waiting for.
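
If you want to decode a resource name from the graph yourself, the hex-to-decimal arithmetic is easy to reproduce in SQL (the literals here are just the two halves of the first resource name above):

select  to_number('0003', 'xxxx')           undo_segment,
        to_number('0026', 'xxxx')           slot,
        to_number('000040d6', 'xxxxxxxx')   sequence#
from    dual;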

The rest of the trace file will tell you the SQL statements the sessions were executing when they got stuck, may give you the rowid for problem rows, and will give you the entire process state dump for the session that got the error.

It’s worth highlighting a couple of important concepts. First, of course, it’s possible for several sessions to get into a ring; a deadlock doesn’t have to be limited to two sessions, although the commonest ones are. Secondly, although TX locks crossing over in mode 6 are the most common form of deadlock, any time that it’s possible to wait for a lock, it’s also possible to engineer a scenario where you can get into a deadlock (as we saw with the discussion of table deadlock from the earlier example).

Note When a transaction deadlock occurs, one of the sessions involved will have to commit; or rollback; to resolve the problem. Even though you can set savepoints in a transaction, it is important to remember that a rollback to savepoint is not sufficient to clear the deadlock as it does not release the TX resource; all it does is apply any undo records created since the savepoint was declared.

There are a few variations on TX deadlocks that are worth mentioning. If one of the tables is an index organized table (IOT), one of the waits will be for mode S (share). An IOT is an index, and if you’re waiting to modify an IOT, you wait in mode 4 (share) rather than mode 6 (exclusive).

This leads to several other possibilities relating to indexes—and bear in mind my earlier comment that if there are circumstances that make you wait for a lock, there will be a variation on those circumstances that will result in a deadlock. Various coding errors can lead to waits because of conflict over indexes; for example:

  • Two sessions try to insert the same primary key value
  • One session inserts a row into a child table as another session deletes the parent
  • One session inserts a parent row and another inserts child rows before the parent row has been committed
  • Two sessions try to delete rows that are covered by the same bitmap index chunk

Essentially, if you’re waiting for another session to commit because of a conflict over indexes (or possibly the constraint enforced by those indexes), then you will see a TX wait in mode 4. There’s even an index case where the problem isn’t a coding problem—it appears when you have to wait for another session to complete a leaf block split.

There are, however, a few other causes for TX/4 waits. The most common of these (though still rare) appears when a transaction wants to modify a block whose ITL is full of active transactions and there is no free space to grow the ITL. If this happens frequently, then you need to decide whether it’s worth tracking down the problem object and changing its definition (typically the initrans value). Conveniently, 10g displays ITL waits as a specific type of TX wait (enq: TX - allocate ITL entry), and captures information about the specific objects in the Statspack and AWR reports as Segments by ITL Waits.
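
If you suspect ITL waits, a quick way to find the objects (outside the Statspack/AWR report) is v$segment_statistics; and remember that changing initrans affects only blocks formatted after the change, so a move or rebuild is needed to apply it to existing blocks. The table name in the alter statements is, again, just illustrative:

select  owner, object_name, object_type, value
from    v$segment_statistics
where   statistic_name = 'ITL waits'
and     value > 0
order by
        value desc;

alter table child initrans 10;       -- affects only newly formatted blocks
alter table child move;              -- rebuild to apply it to existing blocks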

Distributed transactions can also result in TX/4 waits and, surprisingly, these waits can appear on select statements. In recent versions of Oracle (9.2.0.6 and later), this can only happen if you are using database links; in earlier versions it could also happen through XA-compliant transaction-processing monitors (TPMs). The problem comes from the two-phase commit (2PC)—if a remote database has modified some of your data and has sent you the prepare step but not yet sent the commit step of a two-phase commit, your database doesn’t know whether to view the changes as committed or rolled back, so anyone who queries the modified data has to wait until the commit (or rollback) arrives. (If you’re using an XA-compliant TPM, Oracle will simply create a read-consistent version of the data, which, theoretically, could lead to inconsistent results across a distributed system.)

There are a few other exotic reasons for TX/4. One of them relates to starvation of free lists (but it’s a very extreme edge case, and most systems have stopped using free lists anyway). Another relates to waits for a data file to finish growing, so watch out what you set your autoextend values to. Yet another is when you switch a tablespace to read-only mode and have to wait for all current transactions to complete before the switch can occur; that wait is a necessity, and it’s not something you should be doing often anyway.

Lock Modes

I always have trouble remembering the names of the different locking levels (or modes) that Oracle allows. I tend to think in numbers, and Oracle doesn’t make it easier by having two different names for some of the lock modes. So I’ve produced Table 4-5 to show a list of the lock modes with a brief note of how they apply to tables. It’s probably easiest to understand the purpose of the different modes in the context of tables, but, in principle, any resource could be subject to locks in any of the lock modes.

Table 4-5. Lock modes (as they apply to tables)

Mode  Name(s)                                                 Typical table-level meaning
----  ------------------------------------------------------  ----------------------------------------
  0   none                                                    no lock held or requested
  1   Null (NL)                                               placeholder; no real restriction
  2   Row Share (RS) / Sub Share (SS)                         least restrictive explicit table lock
  3   Row Exclusive (RX) / Sub Exclusive (SX)                 taken by ordinary DML (insert/update/delete)
  4   Share (S)                                               e.g., create index, or protecting an
                                                              unindexed foreign key check
  5   Share Row Exclusive (SRX) / Share Sub Exclusive (SSX)   the mode 3 to mode 5 conversion from the
                                                              foreign key example
  6   Exclusive (X)                                           lock table ... in exclusive mode; some DDL

Unless you’ve done something unusual with your application, and ignoring the lock-related bugs that have appeared in 11g, the only significant thing you need to remember about table locks is that if you see a mode 4 or mode 5, you probably need to create an index on a foreign key constraint.

In fact, for many applications, the only table locks that will ever appear will be mode 3 as the code updates tables, and (for versions in the 9.2 range) mode 2 locks associated with foreign key constraints.
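
Given how often the foreign key issue comes up, it’s worth having a script that lists candidate unindexed foreign keys. The following is a deliberately simplified sketch: it only checks that each constraint column appears at the same position in some index on the same table, so it can mis-report multi-column cases, but it’s a reasonable starting point:

select  c.table_name, c.constraint_name, cc.column_name, cc.position
from    user_constraints        c,
        user_cons_columns       cc
where   c.constraint_type       = 'R'
and     cc.constraint_name      = c.constraint_name
and     not exists (
                select  null
                from    user_ind_columns ic
                where   ic.table_name       = cc.table_name
                and     ic.column_name      = cc.column_name
                and     ic.column_position  = cc.position
        )
order by
        c.table_name, c.constraint_name, cc.position
;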

Latches for Locks

We haven’t yet answered the question of how efficiently we can find out whether we need to set up a new row in x$ksqrs (v$resource) to represent a new resource or whether there’s a row in use already that we should attach our enqueue to. We also have an outstanding question about the apparently random “link” values we saw in the enqueue trace file.

There are two more parts of the infrastructure we need to know about before we can answer these questions—and we’re back to arrays, pointers, linked lists, and hash tables, which inevitably bring in latching.

We’ve already seen that x$ksqrs and x$ksqeq (which I’ll use as representative of all the rest of the enqueue structures) are arrays—each row is a fixed size, and it’s easy to get the Nth row if we know where the array starts and how long each row is. But we know that each of the rows in the underlying structures holds lots of pointers—we saw them in the fragment of the enqueue trace, even though they’re not visible in the x$ itself—and by playing with pointers, Oracle can “rearrange” the arrays to make them look like something completely different.

We’ve already seen that there are three doubly linked lists attached to the resources array (x$ksqrs), and that these link together chains from the enqueue arrays (x$ksqeq, et al.). What we haven’t yet seen is how Oracle makes it possible to use these structures efficiently—and the answer is a picture very similar to the earlier picture of the library cache, shown in Figure 4-5.


Figure 4-5. Enqueues and enqueue resources in use—the bigger picture

The resource in Figure 4-5 labeled “Resource” is a miniature copy of the resource from Figure 4-4, but now it’s shown as one link in an enqueue hash chain. The whole picture can best be explained by describing the process of acquiring a lock:

  • Use the resource identifier (type, id1, id2) to generate a hash value. The size of the hash table seems to be 2 * sessions + 35, so doesn’t fall into either of the common patterns of “prime number” or “power of 2”!
  • The hash buckets are protected by latches known as the enqueue hash chains latches. The number of latches matches the number of CPUs, and the hash buckets are shared “round-robin” across the latches (the sketch following this list shows one way to check the latch count on your own system). So get the latch and search the bucket (i.e., walk along the chain) to see if a row from x$ksqrs has already been linked to that hash chain to represent the resource you want to lock.
  • If the enqueue resource is in place already, then pick a row from the relevant enqueue structure (x$ksqeq, et al.), but to do this you have to get the associated enqueue latch to stop other people from picking the same enqueue row at the same time. The latch you need to acquire depends on the specific type of enqueue you are using; for example, if you want a row from x$ksqeq you need to get the enqueues latch, but for a row from x$ktadm you need to get the dml lock allocation latch. Drop this latch as soon as you have made the enqueue row safe.
  • If the enqueue resource is not in place, then (still holding the enqueue hash chains latch) pick a row from x$ksqrs, label it to represent your resource, link it in to the hash chain, and then get an enqueue row to link to it.
  • Once all this is done, you can drop the enqueue hash chains latch.
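
For what it’s worth, you can check the claim about the number of enqueue hash chains latches on your own system by counting the child latches and comparing the result with your cpu_count:

select  count(*)        enqueue_hash_chains_latches
from    v$latch_children
where   name = 'enqueue hash chains';

select  value           cpu_count
from    v$parameter
where   name = 'cpu_count';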

There are two places where we might have to worry about efficiency, and both of them involve the same type of work: if you need to pick a row from the enqueue resources or the enqueues, how do you avoid walking through the array looking for an unused item? In a large, busy system with lots of resources and enqueues, you may have to walk through a large section of the array before finding a free element.

To my surprise, it looks as if Oracle does exactly that when looking for an enqueue row (whether it’s looking at x$ktadm, x$ksqeq, or [I assume] any of the other shorter arrays).

On the other hand, something completely different happens with the x$ksqrs array. I’m not sure of all the details, and there may be two or three variations in strategies that make it hard to work out what’s going on, but, firstly, when Oracle takes enqueue resources off the hash chains, I think it has a mechanism that uses another set of “linked list” pointers in the structure to emulate a stack (LIFO) so that it always has a pointer to the first available row in x$ksqrs and doesn’t have to search the array.

I think there’s also a lovely example of Oracle’s “lazy cleanup” strategy. When the last enqueue is detached from an enqueue resource, I think Oracle leaves the enqueue resource in the hash chain, simply marking it as “available for reuse.” The benefit of this is that, over time, all the hash chains will end up with a few “empty” resources linked to them, so if you don’t find the resource you want when walking a hash chain, you will already have an empty one there waiting to be used and you won’t have to scan the x$ksqrs array. If this hypothesis is correct, there will also have to be code to handle the case when there are no free enqueue resources on the hash chain you are searching, and you find that your only available option is to transfer a resource from another hash chain.

If you’re trying to get to grips with the complexities of how Oracle works, one of the best places to start is with the enqueue mechanisms. The methods and structures are typical of the behavior that appears all through the Oracle code, but enqueues have a wonderful benefit for the early student—they happen quite slowly and are easy to control, so it’s much easier to observe the activity and work out what’s going on.

KGL Locks (and Pins)

After all the groundwork we’ve put in on enqueues, there isn’t really very much to say about library cache locks (and pins and mutexes) at this point; we’ve covered the key points of why we need to protect areas of memory in a multiuser system, and we’ve talked about the need to be able to find the right bit of memory as efficiently as possible.

We used an initial picture (Figure 4-1) of the library cache to demonstrate the concept of hash buckets with linked lists attached as a mechanism for finding (or placing) an object efficiently. Then we saw a picture (Figure 4-4) of an enqueue resource with a number of enqueue chains linked to it—and that’s a pattern that reappears in the library cache (although the things we attach to a library cache object come from different x$ structures). Finally, we used a picture (Figure 4-5) showing the enqueue hash chains latch and the collection of hash buckets it protects—and the association of latches and hash buckets in the library cache is exactly the same, until we get to 10g, and by the time we reach 11.2 it’s completely different. We’ll stick with the older library cache mechanisms to start with, because they are still repeated in the buffer cache, even in 11g.

Every object in the library cache belongs in a library cache hash bucket, and the bucket number is derived from the object’s name (and the term “name” is interpreted very freely in this context). The hash buckets are shared evenly around a number of library cache latches, and the number of latches is similar to the number of CPUs, up to a maximum of 67 latches (and, like me, you might think that seems to be a rather small number of latches). The most significant difference between the locks we attach to library cache objects and the locks we attach to enqueue resources is that there are two sets of locking structures involved with the library cache objects—the KGL locks and the KGL pins—and I don’t think Oracle creates three queues for each type, so there may only be two, the owners and waiters. (The picture that I’ve described so far doesn’t look very different from the enqueues picture shown in Figure 4-5, so I won’t provide another one in this chapter.)

Locks and Pins

If you’re wondering why we have two types of locking structure on the library cache objects, it’s because they express two different intentions. (I have to say that I’ve never quite managed to convince myself that two structures are absolutely necessary, but that probably means I haven’t thought through the requirements properly—it may simply have been a scalability decision.)

The most important task of the KGL lock (which is exposed in x$kgllk and becomes v$open_cursor) is to improve concurrency. If your session has a KGL lock on an object—and for an SQL statement, that means a lock on the parent cursor and on the relevant child cursor—then the Oracle code has a mechanism that allows it to go directly to the address it has for the object rather than having to grab the library cache hash chain latch and search the hash chain.

There are three major ways in which you can make sure that this little saving in work takes place:

  • You can write your front-end code to “hold” cursors that you know you are going to use frequently.
  • You can set the session_cached_cursors parameter so that Oracle library code will automatically start holding a cursor if it sees you using a statement more than two or three times.
  • You can benefit from the semiautomatic way in which PL/SQL will hold cursors that have been opened (explicitly or implicitly) from within a PL/SQL call—from Oracle 9.2.0.5 onward, setting the session_cached_cursors parameter will also control this feature.

Note The session_cached_cursors parameter dictates how many cursors can be held when your code does not explicitly hold cursors. It also controls the number of cursors that can be held open by PL/SQL code being run by the session. Historically (pre-9.2.0.5) the size of the PL/SQL cursor cache was set as a secondary feature of open_cursors. The value people use for open_cursors is often quite high (sometimes too high), but many people leave session_cached_cursors to its default, and the default value is often too low.
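
If you want to check whether a session is actually benefiting from the cursor cache, the session statistics give a quick indication; this is a generic sketch, and the exact set of statistic names varies a little across versions:

select  n.name, s.value
from    v$statname      n,
        v$mystat        s
where   s.statistic#    = n.statistic#
and     n.name in (
                'session cursor cache count',
                'session cursor cache hits',
                'parse count (total)'
        )
;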

The KGL pin comes into play when you actually use an object. Although a KGL lock will hold an object in memory, there are parts of the object that are dynamically re-creatable (the execution plan for an SQL statement, for example), and these can still be discarded if there is a heavy demand for memory even if you have a KGL lock in place.

However, when you are actually using an object (running an SQL statement, say), you need to ensure that the re-creatable bits can’t be pushed out of memory, so you pin the object to protect them.

The KGL locks and KGL pins themselves are simply little packets of memory that, at one time, were individually created and discarded on demand by making calls to allocate memory from the shared pool. Since a KGL lock is about 200 bytes and a KGL pin is about 40 bytes, you can imagine that between them the constant allocation and freeing of memory could cause the free memory in the shared pool to end up in a “honey-combed” state—i.e., lots of free memory in total, but not many large pieces of contiguous memory. The KGL pins were particularly nasty because they would come and go very quickly; the KGL locks weren’t quite so bad because they could stay attached to an object for some time.

Another issue with the KGL locks and KGL pins was that, to use them, you had to constantly manipulate linked lists, attaching and detaching chunks of memory, and you had to do this while holding a latch exclusively (and, as I’ve pointed out, there are surprisingly few latches for the library cache). So for very busy systems, the whole lock/pin issue could become a significant scalability threat. In 10g, though, Oracle Corp. introduced the library cache lock latch and the library cache pin latch, which allowed some concurrent activity to be done on different hash buckets covered by the same library cache latch (you could pin a cursor in one bucket while I locked a cursor in another bucket because we wouldn’t both need to hold the same library cache latch at the same time).

However, as we moved through 10g to 11g, the whole KGL lock/KGL pin mechanism was gradually replaced by the mutex mechanism.

MUTEXES, PART 2
