Log Writer

The log writer has one critical task: copy the contents of the log buffer from memory to disk. The activity surrounding this task has, however, increased over the years to cope with increased volumes of change and higher levels of concurrency. (The requirement has also expanded to allow for the possibility of writing to a remote standby database by way of calls to RFS [remote file server] processes from LNS [network server] processes in 10g and NSS or NSA [redo transport sync/async] processes in 11g.)

In earlier versions of Oracle there was just a single contiguous area of memory called the log buffer, and, to a very large extent, the appearance of multiple public and private buffers hasn’t changed the mechanics that Oracle used on that single buffer, so we’ll start by talking as if there is still only a single buffer and cover the differences later.

Note Recently disk manufacturers have started to produce disks with a 4KB sector size. For reasons we will see later, a larger block size for the log file can make lgwr waste more space in the log file; on the other hand, a smaller block size can make log file writes less efficient (because of the need for a read/write cycle). From 11.2 Oracle lets the DBA choose which is the lesser of two evils by allowing you to specify the block size for the log file as 512 bytes, 1KB, or 4KB.

The log files are automatically formatted in blocks matching the disk sector size (typically 512 bytes), and the log buffer itself is mapped as a “rolling window” onto the current log file, with a 16-byte header labeling each block. Figure 6-1 uses a log buffer sized at 8 blocks to demonstrate the principle.


Figure 6-1. Simple schematic of log buffer

In Figure 6-1, the log buffer has filled and wrapped 11 times since the last log file switch (the first time around the separate block buffers were formatted to match blocks 1 to 8 of the file). At the moment we are on the 12th cycle through the buffer, which is why the last position in the buffer is formatted for block 88 (i.e., 8 blocks × 11 cycles). The log writer has just written blocks 89 to 91, and will shortly be writing blocks 92 to 94, and relabeling block 87 to become block 95.
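The block-number arithmetic behind Figure 6-1 is easy to sketch. The following Python fragment is a toy model of my own (not Oracle code): it maps a buffer slot and a cycle count to the file block the slot is currently formatted as.

```python
# Toy model of the "rolling window" in Figure 6-1: an 8-slot log
# buffer cycling over the blocks of the current log file.

BUFFER_BLOCKS = 8

def file_block(slot, cycle):
    """File block number labelling buffer slot `slot` (1-based)
    on pass `cycle` (1-based) through the buffer."""
    return (cycle - 1) * BUFFER_BLOCKS + slot

# Slot 8 was formatted as block 88 on the 11th cycle (8 blocks x 11 cycles),
# and slot 7 is about to be relabelled from block 87 to block 95:
print(file_block(8, 11))   # 88
print(file_block(7, 11))   # 87
print(file_block(7, 12))   # 95
```

The relabeling of block 87 to block 95 in the figure is just this calculation moving from cycle 11 to cycle 12.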

Note There are many notes on the Internet about sizing the log files, but many of them are out of date. Essentially, you shouldn’t need to set the log_buffer parameter from 10g and later; it’s automatically set by the instance at startup, typically allowing each public log buffer to be a few megabytes. In very special cases you may need to set the parameter to a value such as 16MB or 32MB to extend the memory allocation to multiple granules, but only if you are seeing significant time lost on log buffer space waits.

Since the log buffer is public, and subject to modification by multiple concurrent processes, you might expect it to be protected by latches. It is, but not directly. There are three (public) memory locations affecting the use of the log buffer, and it is these locations that are protected by latches. Two locations hold pointers, one identifying the start of free space in the buffer and the other identifying the end of free space. The third location is a flag showing whether or not lgwr is busy writing. With multiple public log buffers, there are two pointers per buffer, but there is just one write flag covering all the buffers—because there is just one lgwr. The mechanics of pointer manipulation are fairly straightforward, and Figure 6-2 gives an idea of the actions required.


Figure 6-2. Key points in the log buffer

In Figure 6-2 we see that lgwr is currently writing, and some sessions have allocated (and possibly filled) some space beyond the end of the write. Over the next few moments the following sequence of actions (with minor variations) will take place:

  • lgwr will complete its write, at which point it will also move the “End of free space” marker (pointer 2) up to the point labeled “Current write by lgwr ends here”/“Next write by lgwr starts here.”
  • lgwr will be triggered (we’ll see how and why in a moment) into writing more of the buffer to disk. Its target will be to copy out any of the log buffer currently allocated (and we’ll consider what that means and the possible side effects in a moment).
  • Other sessions will be generating more redo for copying into the buffer, so they will be allocating space for themselves by moving pointer 1 (“Start of free space”) forward to leave space behind the pointer that they can use. Eventually pointer 1 will fall off the end of the buffer, which means it will cycle back to the beginning of the buffer.
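The two pointers can be modeled with simple modular arithmetic. Here is a minimal Python sketch (the class and names are mine, purely for illustration) showing how an allocation moves the start-of-free-space pointer and how it cycles back to the beginning of the buffer:

```python
class LogBufferPointers:
    """Toy model of the two space pointers in a circular log buffer."""

    def __init__(self, size):
        self.size = size
        self.start_free = 0   # pointer 1: next byte a session may claim
        self.end_free = 0     # pointer 2: moved up as lgwr completes writes

    def free_space(self):
        # space between the two pointers, allowing for wrap-around
        return self.size - (self.start_free - self.end_free) % self.size

    def allocate(self, nbytes):
        """A session claims nbytes; the pointer may fall off the end
        of the buffer and cycle back to the beginning."""
        assert nbytes < self.free_space(), "would need a log buffer space wait"
        pos = self.start_free
        self.start_free = (self.start_free + nbytes) % self.size
        return pos
```

With an 8KB toy buffer, allocating 6KB, letting `end_free` catch up (lgwr completing a write), and then allocating another 3KB leaves `start_free` wrapped back near the start of the buffer.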

The description is simple, but the questions it raises are significant. What triggers lgwr? How do sessions avoid overwriting each other’s redo? What happens when a session wants more space than is free? What risks are there of destructive interference between lgwr and the other sessions? Let’s start with an easy one—what triggers lgwr.

Log Writer Writes

Traditionally there have been four reasons given for lgwr to start writing:

  • On a wake-up alarm call that lgwr sets itself every 3 seconds
  • When the (public) log buffer is one-third full
  • When the (public) log buffer holds more than 1MB of change records
  • When a session issues a commit; or rollback; (see note)

Note As stated in Chapter 3, when a session issues a commit, it generates the redo describing how to update its transaction table slot in the undo segment header block, puts this redo into the log buffer, applies it to the undo segment header block, calls the log writer to flush the log buffer to disk, and then goes into a log file sync wait until the log writer lets it know that its entry in the log buffer has been copied to disk. A call to roll back ends in a similar fashion: after applying the undo change vectors (and logging the changes they make), the last steps taken by a rollback also update the transaction table slot and call the log writer. This commit/rollback mechanism is the fourth item in our list, and it’s the mechanism that makes transactions durable.

The first option is easy to observe, especially if you’re prepared to enable tracing for wait states on lgwr. Every 3 seconds you will see that lgwr is waiting on event rdbms ipc message (waiting for interprocess communication) with a timeout of 300 centiseconds. So, if nothing happens for a few seconds, lgwr will wake up, do a little tidying up, and write any outstanding redo from the log buffer to disk.

If you want to check the next two options (1MB or one-third full), you can do so by setting up carefully sized array updates from a single session and checking three statistics: messages sent and redo size in the user session, and messages received in the lgwr session. With the log_buffer set to a value less than 3MB, you can see a message going from the user session to lgwr when the redo size is one-third of the log buffer size (as reported by x$kcrfstrand rather than the log_buffer parameter); but when the log buffer is larger than 3MB, you can see that the messages start appearing when the redo size is about 1MB.

Of course, the idea of a space limit raises more questions. The log writer sleeps for 3 seconds at a time, so how does it know how much of the buffer has been used? The answer is that each session checks the total allocated space (i.e., the size of the gap between the “End of free space” and the “Start of free space”) every time it does an allocation. If the total space used exceeds the limit, then the session sends a message to lgwr immediately after it has copied its redo into the buffer.
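That check is simple to express. The sketch below captures the decision a session makes after copying its redo into the buffer (the thresholds come from the description above; the function name is my own invention):

```python
# Sketch of the "should I post lgwr?" test a session performs after
# each allocation. Thresholds as described in the text.

ONE_MB = 1024 * 1024

def should_post_lgwr(used_bytes, buffer_size, is_commit):
    """True if this allocation should trigger a message to lgwr."""
    return (is_commit
            or used_bytes > buffer_size // 3     # buffer one-third full
            or used_bytes > ONE_MB)              # more than 1MB of change

# In a 2MB buffer the one-third limit (~683KB) is the binding threshold:
print(should_post_lgwr(700 * 1024, 2 * ONE_MB, False))   # True
# In a 6MB buffer the 1MB cap fires long before one-third (2MB):
print(should_post_lgwr(1100 * 1024, 6 * ONE_MB, False))  # True
print(should_post_lgwr(900 * 1024, 6 * ONE_MB, False))   # False
```

This matches the observation above: with a log buffer under 3MB the one-third rule dominates, and with a larger buffer the messages start appearing at about 1MB.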

But that raises another question: in a busy system where lots of sessions are generating bits of redo very quickly, won’t there be lots of redundant messages sent to lgwr the moment the amount of redo in the buffer exceeds the limit? This doesn’t happen, thanks to the write flag that I mentioned earlier.

When the log writer starts to write, it sets the write flag, and when it finishes writing, it clears the write flag; so any session that thinks it needs to send a message to the log writer checks this flag to see if lgwr is already writing, and if lgwr is writing, the session doesn’t send the message. This is where latching comes in: lgwr gets the redo writing latch to set and clear the flag (which means two gets for every write), and each session has to get the latch to read the flag so that it doesn’t read the flag while it’s in flux. In both cases the gets on the latch are in willing-to-wait mode.


So the fourth cause of lgwr writing (a session issuing a commit;) is true—if we ignore for the moment an enhancement in 10g that I shall discuss shortly. However, as we have just seen, if lgwr is already writing, the foreground session will not send a message; it will simply increment redo synch writes and go into a log file sync wait. This leads to one more cause for lgwr writing.

When lgwr completes a write it can scan the list of active sessions to see which of them is waiting on a log file sync and post them to allow them to resume. (In earlier versions of Oracle, I think it used to scan the structure underlying v$session, but as systems got larger, later versions of Oracle may have introduced a secondary structure that listed just those sessions that were waiting for lgwr). When lgwr scans for sessions, it can see where in the log buffer their commit record was (parameter1 of v$session_wait is their target buffer#) so it can post the sessions that have been satisfied by the write and, if there are any sessions still outstanding, start a new write immediately. So a final trigger that makes lgwr write is that it has just finished writing and can see that there are already more commit records in the log buffer that need to be written.
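The post-write scan lends itself to a small sketch. In the fragment below (the data structures are invented for illustration; Oracle’s internal list is not exposed) lgwr posts every waiter whose commit record position falls inside the range it has just written, and reports whether it needs to start another write immediately:

```python
# Toy model of lgwr's post-write scan for piggyback/group commits.

def complete_write(written_up_to, waiters):
    """waiters maps session name -> buffer position of its commit record.
    Returns (sessions to post, whether lgwr should start another write)."""
    posted = [s for s, pos in waiters.items() if pos <= written_up_to]
    remaining = {s: p for s, p in waiters.items() if p > written_up_to}
    return posted, bool(remaining)

# lgwr has written up to position 91; the commits at 89 and 90 are
# covered (group commit), but the one at 94 keeps lgwr writing.
posted, write_again = complete_write(91, {"sess_a": 89, "sess_b": 90, "sess_c": 94})
print(posted)       # ['sess_a', 'sess_b']
print(write_again)  # True
```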

PL/SQL Optimization

We have seen that after putting a commit record into the log buffer, a session will wait on a log file sync wait until lgwr posts the session to allow it to continue processing, and this is how Oracle ensures the durability of transactions. But this isn’t always the way things work. There is a little-known PL/SQL optimization (and you may argue with that choice of label) that means PL/SQL doesn’t always wait for the write to complete.

Consider, first, a simple PL/SQL block (see script core_commit_01.sql, available in the Source Code/Download area of the Apress web site [www.apress.com]):

begin

        for r in (
                select  id from t1
                where   mod(id,20) = 0
        ) loop
                update  t1
                set     small_no = small_no + .1
                where   id = r.id;

                commit;

        end loop;

end;
/

This simple piece of code updates every 20th row in a table I’ve created that holds 500 rows, for a total of 25 updates and 25 commits. If we run this code in isolation (so that other sessions don’t confuse the issue with concurrent redo generation and commits) and examine the session activity, we might expect to see the following statistics and wait events:

user commits (session statistic)             25
messages sent (session statistic)            25
redo synch writes (session statistic)        25
log file sync (session events)               25

messages received (lgwr statistic)           25
redo writes (lgwr statistic)                 25
log file parallel write (lgwr events)        25

If we saw this output we could interpret it as 25 cycles of the following sequence:

  • User session issues a commit
  • User session posts lgwr and increments redo synch writes
  • User session goes into a wait (log file sync) waiting to be posted by lgwr
  • Lgwr gets woken up
  • Lgwr writes the log buffer to disk, waiting a short time on each write

However, when I first ran this test (some 12 years ago on Oracle8i), I got results more like the following:

user commits (session statistic)             25
messages sent (session statistic)             6
redo synch writes (session statistic)         1
log file sync (session events)                1

messages received (lgwr statistic)            6
redo writes (lgwr statistic)                  6
log file parallel write (lgwr events)         6

Clearly the user session is not behaving as expected—it has posted lgwr to write a few times, but it has only incremented redo synch writes once, which suggests it didn’t stop and wait for lgwr to wake it up again. The user’s session is breaching the durability requirement; if the instance crashed somewhere in the middle of this loop, it’s entirely possible that a transaction that had been committed would not be recovered.

There is a rationale for this behavior: until the entire PL/SQL block has finished running, you (the end user) don’t know how many of the transactions have committed, so Oracle doesn’t try to make them recoverable until the moment before the PL/SQL block ends and control returns to the calling program; that’s the one point at which the session increments redo synch writes and goes into a log file sync wait. In fact, you might not see a log file sync at all; you could, for example, insert a call to dbms_lock.sleep() as the last statement in the block and make the session pause for a few seconds, giving lgwr time to do one of its 3-second timeouts and clear the buffer. With a long enough pause in place, the session would wake up, check the state of lgwr, and find that it had already written all the necessary redo log. In this case you would see redo synch writes incremented, but no messages sent and no log file sync wait.

Note Generally speaking, if you issue commits inside a single database call, your session will not go to sleep and wait for lgwr to wake it up on each commit. Instead, it will go into a log file sync wait (incrementing redo synch writes) just once as the call is about to end. The commonest example of this is probably the case of commits inside a PL/SQL loop, which could, in principle, leave you in a position where some committed transactions will not be recovered after instance failure. Having made this point, though, it’s worth mentioning that the window of opportunity for things to go wrong is (typically) less than a few hundredths of a second. This strategy does not get used if the code is doing updates across database links, so there have been occasions in the past where I’ve used a totally redundant loopback database link to ensure that some PL/SQL code would wait for a log file sync on every commit.

There is a flaw with the rationale. Although you can’t normally tell exactly how many transactions have committed (because by the time you query the data to find out, some more may have committed before you get the result), Oracle gives you various mechanisms (such as pipes, external procedures, or utl_file) to push a message out from the PL/SQL block—and this means you have a mechanism for seeing the discrepancy.

Consider, for example, a banking application that uses a PL/SQL block to loop through a table one row at a time, calling an external procedure to transmit funds across the world, updating each row to say the funds have been sent, and then committing for each row. With a little bad luck you could find, after recovering from an instance failure, that you have a table showing rows where funds were (apparently) still due to be sent even though they had already been sent. Technically an application like this has a design flaw—it’s trying to do a distributed transaction without using two-phase commit, and that can always lead to inconsistent data. If your loop involves a database link, which means Oracle is using two-phase commit internally, this commit optimization is disabled.

It’s worth taking a few moments to look a little more closely at the results from the simple PL/SQL code. We can see six messages from the user session to lgwr; this might prompt you to ask if there is some special algorithm in the loop telling Oracle how often to send messages. In fact, we’ve already seen the answer to that question. Every time the code issues a commit, your session checks to see what lgwr is doing; if lgwr is writing, it would be a waste of resources to send a write message; if lgwr isn’t writing, then your session sends a message. So if you try to repeat my little test, the number of messages you see will depend on the speed of your CPU (how fast can you go round the loop) compared with the speed of your disks (how long will it take lgwr to complete a write).
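A toy simulation makes the point. In the sketch below (entirely my own model, not Oracle code) lgwr takes `write_time` loop iterations to complete a write, and a committing session posts it only when it finds the write flag clear:

```python
# Toy simulation: messages sent to lgwr during a tight commit loop.
# The faster the CPU relative to the disk, the fewer messages are sent.

def simulate(commits, write_time):
    """Count messages sent to lgwr across `commits` loop iterations."""
    messages = 0
    write_ends_at = -1                 # iteration when the current write finishes
    for i in range(commits):
        lgwr_writing = i <= write_ends_at
        if not lgwr_writing:           # write flag clear: post lgwr
            messages += 1
            write_ends_at = i + write_time
    return messages

print(simulate(25, 0))    # 25: the disk keeps pace, a message per commit
print(simulate(25, 4))    # 5: each write covers several later commits
```

With a slow enough “disk” relative to the loop, a handful of messages covers all 25 commits, which is the shape of the result shown earlier.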

Note You will have seen that the statistic redo synch writes is tied up very closely with the event log file sync; they are nearly measuring the same thing. In fact, it gets better than that because the statistic redo synch write time (in hundredths of a second) is a match—with rounding errors—for the time spent in log file sync waits. 11.2.0.2 takes this a little further with two new statistics: redo synch write time (usec), which is the time in microseconds, and a very useful indicator named redo synch long waits, which tells you how many times a session had to wait for a log file sync that was a little slow to complete (I can’t find a statement of the limit but, at the moment, my best estimate is something between 13 and 17 milliseconds).

ACID Anomaly

If you look back at Chapter 3, or at the note in the section “Log Writer Writes” a few pages back, and think about my description of how the commit; works, you may notice a problem with durability in Oracle. The steps are as follows:

  1. Create a change vector to update the transaction table.
  2. Copy the change vector into the log buffer.
  3. Apply the change vector to the undo segment header.
  4. Post lgwr to write.

Between steps 3 and 4, another session will be able to see the transaction changes even though the commit record hasn’t been written to disk. If the instance crashes between steps 3 and 4, then you could have a report (or a remote database) showing the results of a committed transaction that won’t be recovered when the instance restarts. It’s easy to demonstrate this effect:

  1. Session 1: Use oradebug to suspend lgwr.
  2. Session 2: Update some data and commit—the session will hang.
  3. Session 1: Query the data, and notice that the update is visible.
  4. Session 1: Crash the instance (shutdown abort).

When the instance restarts, the change in step 2 will not be visible and, as I said, this change might have been observed and recorded in another database, resulting in global inconsistency. On the plus side, lgwr doesn’t crash often, and the window of opportunity is extremely short.

The most astonishing thing about this anomaly (to me, at least) is that I’ve described how the mechanism works many times over the last 12 years, and didn’t notice the consequences of what I was saying until Tony Hasler (http://tonyhasler.wordpress.com) wrote a little blog item that raised the issue.

Commit Enhancements

It’s time to look at a change that appeared in 10g, when Oracle moved from a “secret” optimization on commits to a published option for controlling the behavior of commits: the parameter commit_write appeared in 10.2, and then turned into the two parameters commit_logging and commit_wait in 11g. In its simplest form you can address the durability issue of the PL/SQL loop by changing a commit into commit write batch wait.

Note The commit command allows for four new combinations: commit write [batch|immediate] [wait|nowait]. In 11g the batch/immediate choice can be set with the parameter commit_logging and the wait/nowait choice can be set with the parameter commit_wait. In 10g the single commit_write parameter is used to set both choices.

The quickest and simplest way to demonstrate the effects of the new commit options is to show the variations in a few critical statistics when I run my little PL/SQL loop test (on its own) with the original commit and then with the four different combinations offered by the new feature. Table 6-1 shows the results for an instance of Oracle 11.2.0.2.

Table 6-1. (table not reproduced)

The most important points to note in these results are as follows. If you specify the wait option, then your session will increment redo synch writes on every commit and wait on a log file sync. This generally results in a larger number of smaller writes and an increase in redo wastage (which we will examine shortly), redo size (because each write has its own little packet of management information when using multiple log buffers), and redo blocks written (the number of redo log blocks written to disk). You’ll notice, of course, that the value of redo entries is 25 compared to 50; the smaller number is the one we expected; it’s the standard 10g redo optimization (which we saw in Chapter 2) collecting redo change vectors for a small transaction into a single redo record.

If you specify immediate rather than batch, then two key changes appear: the commit change vector becomes a separate redo record (I can’t think why; perhaps it’s to minimize the time window on the ACID anomaly I previously mentioned), which gives you a small increase in the redo size, and—provided you haven’t specified wait—your session doesn’t send a message to lgwr at all; it simply assumes that something else will trigger a write in the near future.

The bottom line with the new commit options is simple: if you really need to ensure that every commit is guaranteed to be recoverable (e.g., in a banking system or medical system), then you should use the wait option, and the batch wait combination ought to give you a little edge of efficiency if the transactions are all small. If you don’t mind risking a few committed transactions disappearing in the event of an instance crash (e.g., a social network), then batch nowait is clearly worth looking at for the reduction in redo, redo wastage, and session waits.

Mechanics

In the last few pages I’ve managed to throw out teasers about lgwr writing as much of the log buffer as possible, redo wastage, and contention with or between user sessions. It’s time now to take a closer look at how the log buffer is used. We’ll look at some log writer mechanics first, followed by redo wastage, and then examine private redo, which is Oracle’s latest strategy for reducing contention.

Figure 6-3 shows the log buffer at a moment in time when several sessions have committed transactions, other sessions have been busy generating more redo, and the log writer is writing. It’s a very busy picture with lots of components, illustrating a number of different points.


Figure 6-3. A busy log buffer

In the moments before we got to the state in Figure 6-3, the log writer was woken up by a redo write message arriving from the session that generated the commit record labeled c1. Before lgwr could swing into action, various other sessions managed to get more redo records into the log buffer, including two sessions that created commit records c2 and c3, respectively, and posted further redo write messages to lgwr, so at this point there are three sessions waiting on log file sync waits. Note that it’s possible for there to be a delay between a session posting lgwr and the moment that lgwr actually starts running—it’s this time lag that allows the extra sessions time to post lgwr (redundantly) despite the fact that the Oracle code tries to avoid redundant calls by using the write flag.

At the moment it woke up, lgwr would have acquired the redo writing latch, set the write flag, and dropped the latch, thus ensuring that all future sessions could tell that it was busy writing. Then it would have acquired the redo allocation latch to identify the highest point allocated in the log buffer at that moment (i.e., the start of free space), moved the pointer up to the end of the block—this is the point labeled “Current write by lgwr ends here” (I’ll explain the significance of allocating to the end of the block in the next section, “Redo Wastage”)—dropped the latch, and started copying the log buffer to disk. Note that although c1 was the commit record that triggered the write, lgwr is going to write all the way through commit records c2 and c3 as well, earning c2 and c3 the title of group or piggyback commits.

So Figure 6-3 shows lgwr in the act of writing, but as it writes, more redo arrives, including a few more commit records (c4, c5). The sessions that supplied these commit records will have acquired the redo writing latch and discovered that lgwr was writing, so they won’t have posted lgwr (messages sent will not change) although they will have incremented redo synch writes and they will be waiting on log file sync waits. We can also see that some space has been allocated by a session that has not yet copied in the redo record it has prepared.

As lgwr completes its write, it will clear the write flag (getting and releasing the redo writing latch to do so), move the “End of free space” pointer up to the point it has just written (getting and releasing the redo allocation latch to do so), and run through the list of sessions waiting on log file sync waits, signaling the operating system to put back on the run queue any session whose commit record has just been written. (I may have the exact order of activity wrong here, but these are the steps that have to take place.)

Note If there are a lot of processes waiting on lgwr, and they all wake up at the same time, this can lead to a peculiar priority inversion problem where lgwr is pushed off the run queue by the sessions it has woken—lgwr has been working hard up to this point and the sessions haven’t, so they get a higher priority than lgwr when they get back on the run queue. As a result you can sometimes see the system slowing down because lgwr isn’t allowed to work. In rare cases this means you might be justified in raising the priority of the lgwr process or preventing the operating system from applying its normal rules to lgwr for reducing the priority of tasks using too much CPU.

The next thing that lgwr does is notice that there are some sessions waiting on log file sync that were not cleared by its write, so it goes through the cycle again (set the write flag, move the “Start of free space” pointer, and start writing), except this time around it discovers that there’s a hole in the log buffer; a session has allocated space but not copied into it. You might ask how it can tell, and to answer that question we have to switch briefly to the foreground sessions at this point to see what steps they take to copy their redo into the log buffer. It works like this:

  1. Get a redo copy latch. Since the number of latches is 2 × cpu_count, we can try each in turn in immediate mode, only going to willing-to-wait mode if they are all busy. The latch we pick first is randomized so that different sessions won’t all end up waiting on the same latch.
  2. Get the redo allocation latch. (With multiple public redo threads, the copy latch will dictate which buffer, which dictates which allocation latch; I’ll make a few more comments on the effects of multiple log buffers a bit later in the chapter.)
  3. Move the start of free space pointer.
  4. Drop the redo allocation latch.
  5. Make the copy.
  6. Drop the redo copy latch.
  7. If the allocation has taken the used space over one-third of the log buffer or 1MB, or if the redo record was a commit record, then post lgwr to write (but get the redo writing latch first to check if lgwr is already writing).
  8. If the redo record was a commit record, increment redo synch writes, set up a log file sync wait, set up a 1-second (10g) or 10-centisecond alarm (11.2), and take self off run queue.
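Steps 1 through 6 can be sketched in Python using locks as stand-in latches (everything here is illustrative and invented; steps 7 and 8, the messaging and wait logic, are omitted):

```python
import random
import threading

# Toy model of the allocate-and-copy protocol. One allocation latch
# guards the pointer; several copy latches cover the actual copy, so
# a copy still in flight is visible as a "hole" in the buffer.

alloc_latch = threading.Lock()                       # redo allocation latch
copy_latches = [threading.Lock() for _ in range(4)]  # 2 x cpu_count in real life
log_buffer = {}                                      # position -> redo record
start_free = 0

def copy_redo(record):
    global start_free
    # Step 1: try each copy latch in turn in immediate mode, starting
    # from a random one; willing-to-wait only if all are busy.
    n = len(copy_latches)
    first = random.randrange(n)
    held = None
    for i in range(n):
        latch = copy_latches[(first + i) % n]
        if latch.acquire(blocking=False):            # immediate-mode get
            held = latch
            break
    if held is None:
        held = copy_latches[first]
        held.acquire()                               # willing-to-wait get
    try:
        # Steps 2-4: move the pointer under the allocation latch,
        # then drop that latch before copying.
        with alloc_latch:
            pos = start_free
            start_free += len(record)
        # Step 5: the copy itself, holding only the copy latch - until
        # it completes, this allocation is a hole lgwr must not write.
        log_buffer[pos] = record
    finally:
        held.release()                               # step 6
    return pos
```

The important design point is that the allocation latch is held only for the pointer move, not for the copy, so many sessions can be filling their allocations concurrently.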

Note When a session gets the redo allocation latch to allocate some space in the log buffer, it may find that there isn’t enough space available. In this case it will drop the latch and get the redo writing latch to check whether lgwr is writing, posting it to write if it isn’t already doing so. The session will then drop this latch and put itself into a wait for log buffer space. So another check that lgwr has to make when it completes a write is a check for sessions waiting on log buffer space so that it can post them to continue processing. If lgwr was already writing when the session checks, the session simply goes back to the redo allocation latch and has another go at allocating space because the current write from lgwr should free up some space in the log buffer.

So a session allocates space in the log buffer and then copies into that space while holding the redo copy latch. In theory this means that lgwr need only get (and then release) all the redo copy latches as a check for holes before it starts to write, because once it is holding all of them, any holes in the log buffer must have been filled. According to a note by Steve Adams, though (see www.ixora.com.au/tips/tuning/redo_latches.htm), Oracle stopped doing this after 8.0; instead, it has a mechanism for seeing who is currently holding the redo copy latches without obtaining the latch (using the internal equivalent of v$latchholder), and if there are any holders, it starts waiting on LGWR wait for redo copy until posted that the latch has been released.

By watching the latch activity, we can see that something like this must be happening, but I don’t know for certain what process posts lgwr (presumably it has to be the session holding the latch) and I don’t know how it knows it has to post lgwr. However, since parameter1 of the LGWR wait for redo copy wait is the copy latch #, and since each session will know which redo copy child latch it is holding, it’s not hard to imagine that there is some code a session always calls just before it drops the latch to check if it has been holding the latch that lgwr has been waiting for, and posting lgwr if it was.


Redo Wastage

I pointed out earlier that one of the first things that lgwr does in response to a write request is to get the redo allocation latch and move the start of free space to the end of the current block. Figure 6-4 zooms in on Figure 6-3 as we move on from one write to the next.


Figure 6-4. Block boundaries and redo wastage

In Figure 6-4 I’m back to showing my tiny log buffer, with space for just eight blocks. The log writer has moved on from the write that cleared commit records c1, c2, and c3, and has noticed that a couple of sessions still have outstanding commit records in the buffer.

In this diagram, though, I have included the block boundaries, and you can see that the most recent space allocation doesn’t reach the end of a block. So why does lgwr get the redo allocation latch and move the start of free space forward, wasting some space in that last block?

Writes to disk have to operate at the granularity of the disk sector, which is typically 512 bytes—the commonest block size for the redo log (and that’s not an accident). If lgwr didn’t move the start of free space to the end of the block, it would have to write part of a sector to disk, which would require a pre-read and fill to take place. Moments later another session would fill in the rest of the space in that last block, and lgwr would have to either copy the same block from the buffer again or read the sector back from disk, fill just the extra redo, and write it out again. The idea of reading the sector back and writing it again was seen as a nasty performance threat (especially in the days of Oracle 6 when the redo log concept was introduced); and the idea of keeping track of a block that had already been partly written was, presumably, seen as an undesirable complication in the code.

So Oracle’s strategy is simple: move the pointer to the end of the block, write the block, never read, and never look back. The aim is to keep the code as simple as possible and keep streaming the log to disk without interruptions.

The space wasted by moving the pointer forward is reported as redo wastage. Since lgwr is the only session that deals with writing the log buffer, it’s the only thing that moves the pointer in this way, and hence is the only session that will show a non-zero value for redo wastage. Statistically the average volume of redo wastage per write is likely to be about half a block unless all your transactions are virtually the same size. One of the worst cases of “same size” transactions, though, appears in applications that do a very large number of very small data changes—the killer “row by row” applications that aren’t taking advantage of the features of the database.

In fact, in the example I used for Table 6-1, I’ve accidentally demonstrated the problem. Looking at the column of stats for commit write immediate wait, you can see that I’ve generated just over 12.5KB in redo size and a further 12KB in redo wastage. Each of my transactions needed just a few bytes more than a block (if you’re doing the arithmetic, don’t forget that there’s a 16-byte header that you can’t use in each block), so we lost most of the rest of the second block on each write.


This brings me back, of course, to my comments about newer disks and 4KB sector sizes. On one of the new disks my demo would have generated 12KB for the redo size and about 89KB of redo wastage: roughly 510 bytes of redo size for each of the 25 transactions, plus 3.5KB to get to the end of the block. For some applications you will want to look carefully at what your redo wastage is likely to be and then make a choice: specify 512 bytes or 1KB for the redo log block size to minimize the redo wastage while paying the penalty of rereading and updating blocks, or use 4KB blocks to minimize the overheads of rereading and updating blocks while paying the penalty of increased redo wastage, with the associated increase in the volume of redo written.
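The trade-off in the paragraph above can be reproduced with a few lines of arithmetic. This sketch models the demo from Table 6-1 (25 transactions of roughly 510 bytes of redo each, one lgwr write per commit; all figures and the function name are my approximations, not Oracle internals) and compares the wastage for 512-byte and 4KB redo log blocks:

```python
# Model the chapter's demo: 25 transactions, each generating ~510 bytes of
# redo, each commit forcing a separate lgwr write (an approximation).
TXNS, REDO_PER_TXN, HEADER = 25, 510, 16

def wastage_per_write(record, block):
    usable = block - HEADER                 # bytes available per block
    blocks = -(-record // usable)           # blocks lgwr must write (ceiling)
    return blocks * usable - record         # bytes padded to the block end

for block in (512, 4096):
    w = wastage_per_write(REDO_PER_TXN, block) * TXNS
    print(f"{block:4d}-byte blocks: redo size {TXNS * REDO_PER_TXN / 1024:.1f}KB, "
          f"wastage {w / 1024:.1f}KB")
```

With 512-byte blocks each write spills a few bytes into a second block (482 bytes wasted per write, roughly the 12KB total reported); with 4KB blocks the same record fits in one block but wastes about 3.5KB of it, giving a total in the region of the 89KB quoted above.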

Private Redo

It’s about time I said something about the impact of the 10g enhancements on redo handling, and this basically means dealing with the role and use of private log buffers (usually called private redo threads) and multiple public log buffers. Fortunately there isn’t a lot of new stuff to add.

The first point to note is that the number of threads in use, both private and public, is dynamic and Oracle tries to keep the number to a minimum, only using more if the redo allocation latches come under pressure and start showing sleep time. So, for example, when I had a system with 4 public threads and 20 private threads and started running six very busy sessions for half an hour, I found that most of the time only 1 of the public threads and only 6 of the private threads had been used. (I had a tiny amount of work done in the second public thread and, thanks to various background jobs, a seventh private thread had reported a little work.)

Note Although Oracle will dynamically modify the number of public and private redo threads in use, the memory is allocated and fixed at instance startup.

The second point is that the basic handling of the public threads doesn’t change. When a session wants to copy something into a public thread, the method is just the same as always. The significant difference is that sessions are usually copying the content of a private thread into the public thread so, instead of a small allocation for a redo record holding a couple of change vectors, a session might allocate 80KB for a redo record holding 120 change vectors. (See Chapter 2.)

A key difference, of course, is that when a session starts a transaction, it’s going to try to acquire a private redo thread. The initial (and maximum) number of private threads defined is transactions / 10, which means that for many systems the session will often find an available private thread; if there are no private threads available, the session will use one of the public threads in the traditional (pre-10g) fashion.

Each private thread has its own redo allocation latch, and to acquire a private thread a session will first get the associated latch. I had assumed that Oracle would use an immediate get to do this, on the basis that there are multiple private threads, each with its own private latch, so if it failed on the first thread/latch it could immediately try the next one (in much the same way that Oracle deals with the redo copy latches), only using a willing-to-wait get on the latch for the last thread. On a simple test, though, it looked as if the sessions were always getting the latches in willing-to-wait mode, leading me to believe that a session walks through the private redo structures without taking any latches, looking for the first thread that is not in use, and only gets the latch to mark that thread as in use. (This still leaves a window of opportunity for two sessions to try to get the same latch for the same private thread at the same time, of course, but generally reduces the work done trying.)

The final difference I want to mention relates to the write activity. When lgwr is posted to write, it has to go through every single (active) public redo thread in turn. This means it may have to go through all the activity of acquiring allocation latches and waiting for redo several times in the course of a single post. In part this explains why Oracle adjusts the number of active buffers dynamically to keep it to a minimum.
