First Things, First . . .

Before we can talk about what latches are and how they work, though, you need some idea of how Oracle uses arrays, pointers, linked lists, and hash tables when tracking information in memory, because those are the types of structure that need the most protection when they are shared by multiple users.

Arrays

An array is essentially a list of objects of the same shape and size, and since all the objects are the same size, it’s easy to walk through the array looking at each object in turn. For example, x$ksuse (the structure for user sessions reported through v$session) is a fixed array with rows of 11,360 bytes in Oracle 11.2.0.2 on 32-bit Windows. Oracle need only record the starting position of the array, and the code can work out the starting position of any row in the array by simple arithmetic.

In some cases the array is a “segmented” array, which means Oracle allocates an initial chunk of memory for a fixed number of array items, but may allocate more chunks dynamically as needed. In this case Oracle needs to record the addresses of the starting point of each chunk—which might mean that it keeps a little list of the chunks, or that each chunk includes a little information about the next chunk in the list. The structure x$ktatl (the entry labeled temporary_table_locks in v$resource_limit) is such a case. In a little test I ran against a 10g instance, this started life as an array with 16 entries of 144 bytes each, but then added chunks of 16 rows at a time, with each chunk allocated fairly randomly in memory. In a similar fashion, although x$ksqrs and x$ksqeq start with a much larger array size they can grow by chunks of 32 rows at a time.

images Note The entry for enqueue locks in the dynamic performance view v$resource_limit is broken. In my little test I pushed the allocation for enqueues locks up to 2,500 entries when the original limit_value was 1,130. At this point v$resource_limit still showed the limit_value as 1,130 and the value for the current_allocation had stuck at 1,129. It’s by spotting little anomalies like this that I often learn what Oracle is doing. Perhaps this particular anomaly is a clue that entry 1,130 is being used to point to the next segment in a segmented array.

Pointers

This brings us briefly to the idea of pointers. A pointer is just a memory location holding the address of a more interesting piece of memory. For example, if I look in the fixed SGA variables array x$ksmfsv, I can find the following entry (numeric values will differ across versions):

ADDR           INDX    INST_ID KSMFSNAM     KSMFSTYP     KSMFSADR   KSMFSSIZ
-------- ---------- ---------- ------------ ------------ -------- ----------
035004B0       3923          1 kcbllsb_     ksqeq *      03D3C818          4

This tells me that location 0x035004b0 holds the value 0x0d3c818, which is an item of data that is 4 bytes long, and it is a “pointer to something of type ksqeq.” When I dump the contents of address 0x0d3c818 I find the value 0x21a33960, which happens to be the address of the first row in the fixed array x$ksqeq. So I’m looking at a pointer to a pointer to a fixed-length array—this is another clue, perhaps that we have a segmented array where the last element of a segment is pointing to the first element of the next segment.

On the other hand, we might find for (some) segmented arrays that we have two entries in x$ksmfsv, one telling us how many segments there are, and the other pointing to an array of pointers, each of which points to one of the segments.

Linked Lists

Once you have the idea of pointing from one location to another, of course, you can think about getting away from the rather rigid “fixed structure” approach that characterizes arrays. You can create lists of associated data items of varying shapes and sizes simply by making sure that each item points to the next item in the list, and this approach is used quite frequently in Oracle. In fact, many of the lists used by Oracle are doubly linked lists, which means each item points forward to the next item of the list and backward to the previous item.

We don’t even need to look at anything as transient as memory structures to find examples of linked lists, because we’ve already seen an example of Oracle using one in Chapter 3, in the transaction table (and transaction control) in the undo segment header. Here, stripped to a minimum, is a reminder of the structure:

  TRN CTL:: seq: 0x0b02 chd: 0x0011 ctl: 0x001c inc: 0x00000000 nfb: 0x0001

TRN TBL:
  index  state cflags  wrap#    uel         scn            dba
  ---------------------------------------------------------------
...
   0x11    9    0x00  0x2d25  0x001b  0x0000.041c818a  0x00805805
...
   0x19    9    0x00  0x2d24  0x002e  0x0000.041c81d1  0x00805c0b
...
   0x1b    9    0x00  0x2d23  0x0019  0x0000.041c81ce  0x00805c09
   0x1c    9    0x00  0x2d24  0xffff  0x0000.041c907c  0x00805c0d  
...
   0x27    9    0x00  0x2d25  0x001c  0x0000.041c9072  0x00805c0d
...
   0x2e    9    0x00  0x2d1f  0x001a  0x0000.041c81d2  0x00805806

In the transaction control we see that the head (chd) of the list is element 0x0011, and the tail (ctl) is element 0x001c. If we look at row 0x11 (using the index column) in the list, the uel column points us to row 0x001b; row 0x1b points us to 0x0019, row 0x19 points us to 0x002e, row 0x2e points us to 0x001a . . . at which point I’ve omitted the next 30 or so links . . . until something points us to row 0x27, which points us to row 0x1c (the ctl), which terminates the linked list by “pointing” to 0xffff.

images Note In the previous example you can see that the dump switches between one and two bytes for the index/uel entry. It’s important to remember that you can’t depend on the various dump files to tell you the truth, the whole truth, and nothing but the truth. If you want to know whether the item is stored as 1 or 2 bytes (or whether the index is actually stored at all—it isn’t), then you need to check the raw data rather than relying on the dump file.

It is interesting to note that this is an example where we have an array (fixed size, fixed type of element) but still use a linked list so that we can pick the next item that we want to use (the head item) as quickly as possible and, by attaching an item to the current end (tail) of the list when we’ve finished with it, ensure that the previous item that we used won’t get used again until every other item has also been used. This is a fairly convenient way of dealing with the need to allow multiple sessions to remove an entry from the list, hold it for a random amount of time, and then put it back on the list.

This is an example of a singly linked list—it’s a one-way journey. If we pick an item, it’s easy to find out which item we’re going to use next because the item we’re looking at currently points to it, but it’s expensive to identify the previous item because we have to check every other item to see which one is pointing at the current item. (Notice, also, how Oracle has to keep a separate note of the tail of the list so that it can attach an item to the end of the list without having to walk the whole list first.) You could call this an example of a FIFO (first in, first out) linked list; in other circumstance (e.g., freelist space management) Oracle uses linked lists to represent stacks (or LIFOs—last in, first out).

Although the singly linked list works perfectly for our transaction table, there are many places in the code where Oracle needs to use doubly linked lists. Again, we don’t need to go into memory structures to see this, as the principle is visible in a well-known data structure. Here’s an extract from a dump of the header section of an index leaf block that demonstrates the point:

Leaf block dump
===============
header address 116564572=0x6f2a25c
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 1
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 1
kdxcosdc 2
kdxconro 571
kdxcofbo 1178=0x49a
kdxcofeo 1190=0x4a6
kdxcoavs 12
kdxlespl 0
kdxlende 0
kdxlenxt 4194525=0x4000dd
kdxleprv 4194523=0x4000db
kdxledsz 6
kdxlebksz 8036

Note the entries kdxlenxt (next leaf block) and kdxleprv (previous leaf block) near the end of the dump. If we want to do a large index range scan, it’s clearly convenient to be able to move from one leaf block to the next without having to climb up and down the B-tree, so a forward pointer is a good idea; and since Oracle also allows descending index range scans, then a backward pointer is also a good idea.

Hash Tables

Arrays are fine if you have to deal with fixed structures, and linked lists can be very useful with a relatively small number of associated items; in both cases you may have to pay the cost of walking through the whole array or list to find something, but that’s okay if the number of items involved is small.

How do you deal with a large number of items that are constantly appearing and disappearing when you need to find the right one quickly every time? Of course we’re familiar with the idea of using B-tree indexes for rapid data access, but the concept might not work too efficiently when we are handling lots of very small in-memory structures concurrently. However the B-tree index isn’t the only mechanism for locating data quickly. Some Oracle DBAs take advantage of hash clusters, and this mechanism translates very nicely into the most commonly used mechanism in the Oracle code for handling memory.

The hash concept is simple. You decide on a fixed number of buckets (powers of two are popular, as are prime numbers, for the number of buckets you choose because of the number of mathematical studies that have used them for investigating hashing functions). You then choose a hashing algorithm, a piece of arithmetic that you can apply to an object to produce a number between 1 and the number of buckets (or, if you think like a computer, zero and “number of buckets – 1”). For example, you might choose to distribute your friends across ten buckets using the algorithm “associate person with bucket N, where N is the last digit of their mobile phone number”; alternatively you might choose to distribute them across 16 buckets according to the algorithm “put person in bucket N, where N is the number of children they have mod 16.”

There are details that need some thought, of course: hashing your friends on the last digit of their phone number will probably spread them fairly evenly across the ten buckets, but you won’t be able to find a specific friend easily unless you already know the last digit of their phone number; moreover, you may have lots of friends, so scattering them over ten buckets could leave a lot of people in each bucket, which means you still have a lot of work to find a friend even after you’ve picked the right bucket. (So maybe you should have 100 buckets and hash on the last two digits, but that makes it even harder to remember which bucket to look in for a given friend.)

Hashing your friends by number of children is probably going to leave a lot of the higher-numbered buckets empty and place a lot of friends in each of the lower-numbered buckets—even though you may find it easier to remember which friend has how many children.

In both cases, of course, the hash value for a friend could change—they may get a new phone number, or they may produce another child.

These examples highlight a few important principles about hashing:

  • Different input values (friends) may hash to the same hash value (bucket).
  • You don’t want to have lots of items hashing to the same bucket.
  • Some hashing algorithms will spread the data more evenly than others.
  • The hashing algorithm must put an object in the same bucket every time it sees it.
  • You want the hashing algorithm to be applied to something useful.

Let’s take the library cache as an example of Oracle using hashing. At the moment, logging on to an instance with a fairly small SGA and dumping the library cache at level 2 (see Appendix), I can see that I have 131,072 buckets for my library cache with 5,880 buckets in use; most buckets have one object in them, 136 buckets have two objects, and one bucket has three. (In passing, of the 6,000 current objects, around 800 are currently child cursors, which are the things that appear in v$sql.)

images Note The number of hash buckets in the library cache seems to have a hard-coded upper limit of 131,072 in recent versions of Oracle. In earlier versions you could increase this by fiddling with a hidden parameter, but in later versions you can only reduce it.

When you pass an SQL statement to Oracle and say “find it” (or place it) in the library cache, Oracle treats the SQL statement as a stream of numbers and does some fancy arithmetic to it to create a bucket number—so the most memorable thing about the statement (its text) is the thing that tells Oracle where to find it—and the arithmetic does a good job of sharing different statements across a large number of different buckets (although some of the early implementations, around v7, weren’t all that clever about their choice of algorithm).

So we have the “even spread,” the “useful algorithm,” and “not much in a bucket.” But this still leaves two important questions: what, exactly, is a bucket, and how does Oracle deal with two statements ending up in the same bucket? The answers to these questions explain why I started this section with a discussion of arrays, pointers, and linked lists. A bucket is just an item in a segmented array, acting as the end point to a doubly linked list of objects. Figure 4-1 shows one (highly simplified) view of the library cache.

images Note You will see the terms hash bucket and hash chain used interchangeably. There’s no great point in splitting hairs about the way people use the two terms, but if you want to make a distinction, you can think of the bucket as being the fixed end point, and the chain as being the list attached to the end point.

images

Figure 4-1. First approximation to the structure of a very small library cache

To load a new object into the library cache, Oracle does a bit of arithmetic to decide which bucket the object belongs to, and then it links the object into the appropriate list—which means picking two existing objects that currently point to each other, setting the forward and backward pointers in the new object to point to them, and then modifying the backward pointer on one object and the forward pointer on the other object to point to the new object. Adding a new object to a linked list forces me to interfere with two existing objects, as shown in Figure 4-2.

images

Figure 4-2. Inserting an item into a (doubly) linked list

Conversely, there will be times when there is not enough memory (no single chunk of free memory large enough) to create a new object in the library cache, in which case Oracle uses a least recently used (LRU) algorithm to pick a few “random” objects that can be detached from their hash chains so that their memory can be attached to the memory free lists for reuse. Again, I have to interfere with the two objects on either side of the one I am removing to make them point to each other once I’ve taken the target object out of the chain.

This is the point at which we finally realize that we need a high-speed mechanism to stop accidents from happening to complex memory structures. Imagine that your session decides that it needs to use an object in the first hash bucket in the diagram; unfortunately, at the same time, my session has decided that it needs to free some memory to use it somewhere else and has decided that there is an item in the same hash bucket that can be removed.

If we are both allowed to access the contents of the hash bucket at the same time, your session could end up following a pointer to an object that I’ve just discarded because I haven’t yet had time to correct both pointers—and bear in mind, as you work through this scenario, that the operating system could preemptively stop my session running at any moment, so I could easily be in a position where I’ve got through half the work I need to do. Somehow we have to ensure that I can’t restructure the linked list if you’re walking along it, and you can’t walk the linked list if I’m restructuring it.

A well-known mantra among Oracle users is “Readers don’t block writers, and writers don’t block readers”—but this is only true at the data level; when you get down to the raw memory level, there are moments when readers must block writers, and where a single write must block all other operations.

images Note Oracle uses latches to eliminate problems that would appear if multiple processes were allowed to modify shared memory simultaneously; the patterns of use may vary but the intent is always the same. One pattern of use addresses the conflict between searching linked lists and modifying the contents of linked lists. Another, simpler pattern involves “isolating” a counter or pointer so that only one process can update it at a time (such as the control pointers for the redo log buffer).

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.145.206.43