C H A P T E R  10

Exadata Wait Events

The Oracle database is a very well instrumented piece of code, and it has been so for quite a while. It keeps track of the amount of time spent in discrete operations via the use of wait events. While the database software is quite complex, wait event analysis allows performance analysts to determine precisely where the database is spending its time. Many difficult performance problems can be resolved by analyzing these wait events. The introduction of Exadata has resulted in the creation of several new wait events to support the unique operations that are performed on the platform. This chapter will focus on describing these new events and how they relate to the activities actually being performed, while contrasting them with the wait events used by the database on non-Exadata platforms. It will also describe a few wait events that aren't specific to Exadata but play an important role on Exadata platforms.

A wait event is actually a section of code that is timed and assigned a name. The pieces of code covered by these events are quite often discrete operating system calls such as I/O requests, but some wait events cover considerably larger portions of code. The events may even include sections of code that contain other wait events. The naming of wait events has been somewhat inconsistent over the years, and many events have somewhat misleading names. Even though some of the event names are acknowledged to be potentially misleading, Oracle has been understandably reluctant to change them. The Exadata platform has provided an excuse to rename some of the I/O-related events, and as you'll see shortly, the developers took the opportunity to do just that.

Events Specific to Exadata

There are actually no events that exist exclusively on the Exadata platform. Wait events are built into the database code, and since the compute nodes on Exadata run standard Oracle Database software, all the wait events that are used when invoking Exadata-specific features are available in databases that are running on non-Exadata platforms as well. But because the Exadata features are only available on the Exadata platform, no time is ever allocated to those events on other platforms. By way of proof, consider the following example, which compares the events from V$EVENT_NAME (which exposes the valid wait events) first on an Exadata Database Machine and then on a standard 11gR2 database on a non-Exadata platform:

SYS@EXDB> select count(*) from v$event_name;

  COUNT(*)
----------
      1118

SYS@EXDB> select count(*) from v$event_name@LAB112;

  COUNT(*)
----------
      1118

SYS@EXDB> select name event from v$event_name
  2  minus
  3* select name from v$event_name@LAB112;

no rows selected

So there are no differences in the events. This does make it a little difficult to come up with a list of “Exadata Only” events.

The “cell” Events

The Oracle Exadata Storage Server Software User's Guide 11g Release 2 provides a table of wait events. All of them start with the word “cell.” The manual lists nine such events. One of them (cell interconnect retransmit during physical read) doesn't actually exist.

There are also another nine events having the word “cell” in their names that are not included in the documentation. So we'll use the complete list of “cell” events as a starting point. You can query V$EVENT_NAME for that list, and you'll get the following results. Notice that most of the events are in one of the I/O classes. There are only 17 rows, because one of the documented events does not exist in the version of Oracle that we're running (11.2.0.2).

SYS@POC2> @valid_events
Enter value for event_name: cell%

EVENT                                         CLASS
--------------------------------------------- -------------------------
cell list of blocks physical read             User I/O
cell multiblock physical read                 User I/O
cell single block physical read               User I/O
cell smart file creation                      User I/O
cell smart index scan                         User I/O
cell smart table scan                         User I/O
cell statistics gather                        User I/O
cell manager closing cell                     System I/O
cell manager discovering disks                System I/O
cell manager opening cell                     System I/O
cell smart incremental backup                 System I/O
cell smart restore from backup                System I/O
cell manager cancel work request              Other
cell smart flash unkeep                       Other
cell worker online completion                 Other
cell worker retry                             Other
cell worker idle                              Idle

17 rows selected.

The following sections will cover all of these events, along with a couple of additional events that have special applicability to Exadata.

Plan Steps That Trigger Events

First, though, it might be interesting to see what operations (plan steps) cause the “cell” wait events to occur. Here is a query against DBA_HIST_ACTIVE_SESS_HISTORY on an active production system running on an Exadata V2 that shows cell events and the operations that caused them:

SYS@EXDB1> select event, operation,  count(*) from (
  2  select sql_id, event, sql_plan_operation||' '||sql_plan_options operation
  3    from DBA_HIST_ACTIVE_SESS_HISTORY
  4    where event like 'cell %')
  5    group by operation, event
  6    order by 1,2,3
  7  /

EVENT                              OPERATION                                  COUNT(*)
---------------------------------- ---------------------------------------- ----------
cell list of blocks physical read                                                   62
                                   DDL STATEMENT                                     2
                                   INDEX FAST FULL SCAN                              1
                                   INDEX RANGE SCAN                               3060
                                   INDEX STORAGE FAST FULL SCAN                      7
                                   INDEX STORAGE SAMPLE FAST FULL SCAN              10
                                   INDEX UNIQUE SCAN                              1580
                                   INSERT STATEMENT                                  6
                                   TABLE ACCESS BY GLOBAL INDEX ROWID              151
                                   TABLE ACCESS BY INDEX ROWID                    5458
                                   TABLE ACCESS BY LOCAL INDEX ROWID               131
                                   TABLE ACCESS STORAGE FULL                       183
                                   TABLE ACCESS STORAGE SAMPLE                       2
                                   TABLE ACCESS STORAGE SAMPLE BY ROWID RAN          1
cell multiblock physical read                                                      3220
                                   DDL STATEMENT                                   157
                                   INDEX FAST FULL SCAN                             94
                                   INDEX RANGE SCAN                                  2
                                   INDEX STORAGE FAST FULL SCAN                   6334
                                   INDEX STORAGE SAMPLE FAST FULL SCAN             429

                                   UNIQUE SCAN                                       2
                                   VIEW ACCESS STORAGE FULL                        634
                                   MAT_VIEW ACCESS STORAGE SAMPLE                   56
                                   TABLE ACCESS BY GLOBAL INDEX ROWID                5
                                   TABLE ACCESS BY INDEX ROWID                     484
                                   TABLE ACCESS BY LOCAL INDEX ROWID                 3
                                   TABLE ACCESS STORAGE FULL                     41559
                                   TABLE ACCESS STORAGE SAMPLE                    1763
                                   TABLE ACCESS STORAGE SAMPLE BY ROWID RAN         78
                                   UPDATE                                            4

cell single block physical read                                                 181186
                                   BUFFER SORT                                       1
                                   CREATE TABLE STATEMENT                           67
                                   DDL STATEMENT                                   985
                                   DELETE                                        11204
                                   DELETE STATEMENT                                  6
                                   FIXED TABLE FIXED INDEX                         352
                                   FOR UPDATE                                       27
                                   HASH GROUP BY                                     3
                                   HASH JOIN                                        14
                                   HASH JOIN RIGHT OUTER                             1
                                   INDEX BUILD NON UNIQUE                           80
                                   INDEX BUILD UNIQUE                                6
                                   INDEX FAST FULL SCAN                              9
                                   INDEX FULL SCAN                                1101
                                   INDEX RANGE SCAN                              17597
                                   INDEX RANGE SCAN (MIN/MAX)                        1
                                   INDEX RANGE SCAN DESCENDING                       6
                                   INDEX SKIP SCAN                                 691
                                   INDEX STORAGE FAST FULL SCAN                    313
                                   INDEX STORAGE SAMPLE FAST FULL SCAN              72
                                   INDEX UNIQUE SCAN                             30901
                                   INSERT STATEMENT                               5174
                                   LOAD AS SELECT                                  120
                                   LOAD TABLE CONVENTIONAL                        5827
                                   MAT_VIEW ACCESS STORAGE FULL                      3
                                   MAT_VIEW ACCESS STORAGE SAMPLE                    1
                                   MERGE                                            12
                                   PX COORDINATOR                                    1
                                   SELECT STATEMENT                                978
                                   SORT CREATE INDEX                                 1
                                   SORT GROUP BY                                     1
                                   SORT JOIN                                         5
                                   SORT ORDER BY                                     2
                                   TABLE ACCESS BY GLOBAL INDEX ROWID             5812
                                   TABLE ACCESS BY INDEX ROWID                   65799
                                   TABLE ACCESS BY LOCAL INDEX ROWID              4591
                                   TABLE ACCESS BY USER ROWID                      464
                                   TABLE ACCESS CLUSTER                             57
                                   TABLE ACCESS STORAGE FULL                      7168
                                   TABLE ACCESS STORAGE SAMPLE                     205
                                   TABLE ACCESS STORAGE SAMPLE BY ROWID RAN         24
                                   UNION-ALL                                         7
                                   UPDATE                                        89353
                                   UPDATE STATEMENT                                367
                                   WINDOW CHILD PUSHED RANK                          2
                                   WINDOW SORT                                       1
                                   WINDOW SORT PUSHED RANK                           1
cell smart file creation                                                            35
                                   DELETE                                            3
                                   INDEX BUILD NON UNIQUE                            5
                                   LOAD AS SELECT                                    3
                                   LOAD TABLE CONVENTIONAL                           1
                                   UPDATE                                            1
cell smart incremental backup                                                      714
cell smart index scan                                                               14
                                   INDEX STORAGE FAST FULL SCAN                     42
                                   INDEX STORAGE SAMPLE FAST FULL SCAN              32
cell smart table scan                                                              163
                                   MAT_VIEW ACCESS STORAGE FULL                      1
                                   TABLE ACCESS STORAGE FULL                     12504

Obviously this output does not show all the possible combinations, but it should give you an idea of the relative frequency of events and which operations generally motivate them.

Exadata Wait Events in the User I/O Class

The User I/O Class is far and away the most important for Exadata. The most interesting events in this category are of course the two Smart Scan events (cell smart table scan and cell smart index scan). These are the events that record time for the primary Offloading optimizations provided by Exadata, which include predicate filtering, column projection, and storage index usage. The User I/O Class also contains three events described as physical I/O events. These three events actually measure time for physical I/O using the more familiar multi-block and single-block read mechanisms we're used to seeing on non-Exadata platforms, although their names have been changed to something a little more meaningful. Finally, there are two events that don't really seem to belong in the User I/O category at all. One has to do with initialization of blocks when file space is allocated. The other has to do with gathering statistical information from the storage cells. We'll cover each of these wait events in turn in the next several sections, starting with the Smart Scan events.

cell smart table scan

The cell smart table scan event is what Oracle uses to account for time spent waiting for Full Table Scans that are Offloaded. It is the most important new event on the Exadata platform. Its presence or absence can be used to verify whether a statement benefited from Offloading or not. As discussed in Chapter 2, Offloading only occurs when Oracle is able to do direct path reads. Consequently, this event replaces the direct path read event in most cases on Exadata. As with normal direct path reads, data is returned directly to the PGA of the requesting process on the database server (either the user's shadow process or a parallel slave process). Blocks are not returned to the buffer cache.

Event Meaning

Although the mechanism for performing reads across the InfiniBand network is very different from that for normal reads on non-Exadata platforms, the code path driving the smart scans is actually very similar to a direct path read on a non-Exadata platform. The main difference lies in the fact that each request to a storage cell contains a reference to the metadata of the statement, which in the case of Exadata includes the predicates and the list of columns to be returned. Since the storage cells have access to this information, they can apply the filters and do the column projection before returning the data to the requesting process. These optimizations are applied to each set of blocks as they are requested. The processes on the database servers that request the data have access to the ASM extent map and can therefore request the required allocation units (AUs) from each storage cell. The storage cells read the requested AU and apply the predicate filters and, if any rows satisfy the filters, return the projected columns to the requesting process. The process then requests the next AU and the entire routine is repeated until all the data has been scanned. So this event will occur repeatedly in a large scan.

images Note Column Projection is one of the major optimizations provided by Smart Scans. The feature is slightly misunderstood. It does not pass only columns in the select list back to the database servers; it also passes back some columns from the WHERE clause. Older versions of cellsrv passed all columns specified in a WHERE clause back to the database tier. Later versions have corrected this behavior to include only columns that are involved in join predicates.

Here's a portion of a 10046 trace file showing some of the events. The query generating this trace file did a full table scan which was performed via the direct read mechanism. (Note that we've shortened some of the lines to get them to fit, primarily by stripping off the time values):

PARSING IN CURSOR #2 len=32 dep=0 uid=0 oct=3 lid=0 hv=123 ad='196' sqlid='162wjnvwyybhn'
select sum(pk_col) from kso.skew

END OF STMT
PARSE #2:c=2000,e=2424,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=568322376
EXEC #2:c=0,e=34,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=568322376
WAIT #2: nam='SQL*Net message to client' ela= 3 driver id=1650815232 #bytes=1 p3=0 obj#=-1
WAIT #2: nam='ges message buffer allocation' ela= 2 pool=0 request=1 allocated=0 obj#=-1
WAIT #2: nam='KJC: Wait for msg sends to complete' ela= 10 msg=6674450368 dest|rcvr=65536
WAIT #2: nam='reliable message' ela= 1107 channel context=6712270872 channel handle=66967991
WAIT #2: nam='ges message buffer allocation' ela= 1 pool=0 request=1 allocated=0 obj#=-1
WAIT #2: nam='enq: KO - fast object checkpoint' ela= 104 name|mode=126 2=65575 0=1 obj#=-1
WAIT #2: nam='ges message buffer allocation' ela= 1 pool=0 request=1 allocated=0 obj#=-1
WAIT #2: nam='enq: KO - fast object checkpoint' ela= 103 name|mode=126 2=65575 0=2 obj#=-1
WAIT #2: nam='cell smart table scan' ela= 162 cellhash#=2133459483 p2=0 p3=0 obj#=66849
WAIT #2: nam='cell smart table scan' ela= 244 cellhash#=379339958 p2=0 p3=0 obj#=66849
WAIT #2: nam='cell smart table scan' ela= 181 cellhash#=3176594409 p2=0 p3=0 obj#=66849
WAIT #2: nam='cell smart table scan' ela= 1285 cellhash#=2133459483 p2=0 p3=0 obj#=66849
WAIT #2: nam='cell smart table scan' ela= 1327 cellhash#=379339958 p2=0 p3=0 obj#=66849
WAIT #2: nam='cell smart table scan' ela= 1310 cellhash#=3176594409 p2=0 p3=0 obj#=66849
WAIT #2: nam='cell smart table scan' ela= 19755 cellhash#=3176594409 p2=0 p3=0 obj#=66849
WAIT #2: nam='cell smart table scan' ela= 39 cellhash#=3176594409 p2=0 p3=0 obj#=66849
WAIT #2: nam='cell smart table scan' ela= 24 cellhash#=2133459483 p2=0 p3=0 obj#=66849
WAIT #2: nam='cell smart table scan' ela= 847 cellhash#=3176594409 p2=0 p3=0 obj#=66849
WAIT #2: nam='cell smart table scan' ela= 44 cellhash#=2133459483 p2=0 p3=0 obj#=66849
WAIT #2: nam='cell smart table scan' ela= 674 cellhash#=3176594409 p2=0 p3=0 obj#=66849
WAIT #2: nam='cell smart table scan' ela= 32 cellhash#=3176594409 p2=0 p3=0 obj#=66849
WAIT #2: nam='cell smart table scan' ela= 39 cellhash#=2133459483 p2=0 p3=0 obj#=66849
WAIT #2: nam='cell smart table scan' ela= 928 cellhash#=3176594409 p2=0 p3=0 obj#=66849
WAIT #2: nam='cell smart table scan' ela= 37 cellhash#=3176594409 p2=0 p3=0 obj#=66849
WAIT #2: nam='cell smart table scan' ela= 97 cellhash#=379339958 p2=0 p3=0 obj#=66849

This portion of the trace file also shows the enq: KO - fast object checkpoint event, which is used to ensure that any dirty blocks for the scanned object are flushed to disk prior to beginning the scan. By the way, the direct path read event is not completely eliminated on Exadata platforms. In fact, we can use a hint to disable Offloading and see how the same statement behaves without Offloading.

PARSING IN CURSOR #2 len=84 dep=0 uid=0 oct=3 lid=0 hv=123 ad='196' sqlid='cczz6y6rtz4bn'
select /*+ OPT_PARAM('cell_offload_processing' 'false') */ sum(pk_col) from kso.skew
END OF STMT
PARSE #2:c=26995,e=33582,p=1,cr=100,cu=0,mis=1,r=0,dep=0,og=1,plh=568322376
EXEC #2:c=0,e=20,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=568322376
WAIT #2: nam='SQL*Net message to client' ela= 3 driver id=1650815232 #bytes=1 p3=0 obj#=-1
WAIT #2: nam='ges message buffer allocation' ela= 3 pool=0 request=1 allocated=0 obj#=-1
WAIT #2: nam='KJC: Wait for msg sends to complete' ela= 13 msg=66 dest|rcvr=65536 mtype=12
WAIT #2: nam='reliable message' ela= 1355 channel context=6712270872 channel
  handle=6696795216 broadcast message=6741335856 obj#=-1 tim=1285455676264108
WAIT #2: nam='ges message buffer allocation' ela= 2 pool=0 request=1 allocated=0 obj#=-1
WAIT #2: nam='enq: KO - fast object checkpoint' ela= 172 name|mode=126 2=65573 0=1 obj#=-1
WAIT #2: nam='ges message buffer allocation' ela= 2 pool=0 request=1 allocated=0 obj#=-1
WAIT #2: nam='enq: KO - fast object checkpoint' ela= 221 name|mode=126 2=65573 0=2 obj#=-
WAIT #2: nam='kfk: async disk IO' ela= 14 count=1 intr=0 timeout=4294967295 obj#=66849

WAIT #2: nam='direct path read' ela= 23746 file number=7 first dba=76322 block cnt=94
  obj#=66849
WAIT #2: nam='kfk: async disk IO' ela= 28 count=1 intr=0 timeout=4294967295 obj#=66849
WAIT #2: nam='kfk: async disk IO' ela= 20 count=1 intr=0 timeout=4294967295 obj#=66849
WAIT #2: nam='kfk: async disk IO' ela= 11 count=1 intr=0 timeout=4294967295 obj#=66849
WAIT #2: nam='kfk: async disk IO' ela= 6 count=1 intr=0 timeout=4294967295 obj#=66849
WAIT #2: nam='direct path read' ela= 18566 file number=7 first dba=76800 block cnt=128
  obj#=66849
WAIT #2: nam='kfk: async disk IO' ela= 19 count=1 intr=0 timeout=4294967295 obj#=66849
WAIT #2: nam='kfk: async disk IO' ela= 20 count=1 intr=0 timeout=4294967295 obj#=66849
WAIT #2: nam='kfk: async disk IO' ela= 6 count=1 intr=0 timeout=4294967295 obj#=66849
WAIT #2: nam='direct path read' ela= 35083 file number=7 first dba=77184 block cnt=128
  obj#=66849
WAIT #2: nam='kfk: async disk IO' ela= 19 count=1 intr=0 timeout=4294967295 obj#=66849
WAIT #2: nam='kfk: async disk IO' ela= 22 count=1 intr=0 timeout=4294967295 obj#=66849
WAIT #2: nam='kfk: async disk IO' ela= 7 count=1 intr=0 timeout=4294967295 obj#=66849
WAIT #2: nam='kfk: async disk IO' ela= 5 count=1 intr=0 timeout=4294967295 obj#=66849

Note that we still have the enq: KO – fast object checkpoint events for flushing dirty blocks. So it is clear that the cell smart table scan event replaces this event.

Parameters

The parameters for this event are not particularly informative. Only the object ID of the table being scanned and the cell hash number are provided:

P1 - Cell hash number

P2 - Not used

P3 - Not used

obj# - The object number of the table being scanned

You'll notice that the direct path read event (which cell smart table scan replaces) provides additional information including the file number, the offset into the file (first dba) and the number of contiguous blocks read (block cnt). On the other hand, with the direct path read event, there is no indication how the read requests are routed to the individual cells.

images Note The Cell hash number reported with many of the wait events can be found in the V$CELL view. This view has only two columns, CELL_PATH and CELL_HASHVAL. The CELL_PATH column actually contains the IP address of the storage cell.

cell smart index scan

Time is clocked to the cell smart index scan event when fast full index scans are performed that are Offloaded. This event is analogous to cell smart table scan, except that the object being scanned is an index. It replaces the direct path read event and returns data directly to the PGA of the requesting process as opposed to the buffer cache.

Event Meaning

This event does not show up very often on the systems we have observed, probably for several reasons:

  • Exadata is quite good at doing full table scans and so the tendency is to eliminate a lot of indexes when moving to the platform.
  • Direct path reads are not done as often on index scans as they are on table scans. One of the important changes to Oracle 11.2 is the aggressiveness with which it does direct path reads on serial table scans. This enhancement was probably pushed forward specifically in order to allow Exadata to do more Smart Full Table Scans, but regardless of that, without this feature, only parallel table scans would be able to take advantage of Smart Scans. The same enhancement applies to index fast full scans. That is, they can also be done via serial direct path reads. However, the algorithm controlling when they happen appears to be less likely to use this technique with indexes (probably because the indexes are usually much smaller than tables).
  • In addition, only fast full scans of indexes are eligible for Smart Scans (range scans and full scans are not eligible).

images Kevin says: A SmartScan on an index is possible without parallelism, but the rules are strict. If you alter the session to disable parallel query and set _serial_direct_read=TRUE in the session, and then run your test, you should get a smart, serial, index fast full scan (IFFS). If not, then you're looking at a bug. In short, to get a serial IFFS the object needs to be more than five times the size of the SGA.

Gluing range scans to smart index scans can be done, but it's very messy. There isn't likely much to be gained there in terms of completing a query.

As a result of these issues, cell smart index scans are fairly rare compared to cell smart table scans. It is, of course, possible to encourage the feature with hints (such as parallel_index) or by decorating specific indexes with a parallel degree setting of greater than 1. Here's an excerpt from a 10046 trace file showing the event:

PARSING IN CURSOR #1 len=112 dep=1 uid=0 oct=3 lid=0 hv=123 ad='197' sqlid='dnmumhk9599p7'
select /*+ parallel_index (a skew2_all 8) index_ffs (a skew2_all) */ sum(pk_col) from kso.skew2 a where col1 > 1
END OF STMT
PARSE #1:c=0,e=89,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=2801817111,tim=1285372448705192
WAIT #1: nam='PX Deq: Execution Msg' ela= 3124 sleeptime/senderid=268 passes=1 p3=0 obj#=-1
WAIT #1: nam='cell smart index scan' ela= 133 cellhash#=379339958 p2=0 p3=0 obj#=66856
WAIT #1: nam='cell smart index scan' ela= 522 cellhash#=2133459483 p2=0 p3=0 obj#=66856
WAIT #1: nam='cell smart index scan' ela= 277 cellhash#=3176594409 p2=0 p3=0 obj#=66856
WAIT #1: nam='cell smart index scan' ela= 1910 cellhash#=379339958 p2=0 p3=0 obj#=66856
WAIT #1: nam='cell smart index scan' ela= 330 cellhash#=2133459483 p2=0 p3=0 obj#=66856
WAIT #1: nam='cell smart index scan' ela= 348 cellhash#=3176594409 p2=0 p3=0 obj#=66856
WAIT #1: nam='cell smart index scan' ela= 45162 cellhash#=3176594409 p2=0 p3=0 obj#=66856
WAIT #1: nam='cell smart index scan' ela= 13 cellhash#=3176594409 p2=0 p3=0 obj#=66856
WAIT #1: nam='cell smart index scan' ela= 207 cellhash#=379339958 p2=0 p3=0 obj#=66856
WAIT #1: nam='cell smart index scan' ela= 30 cellhash#=379339958 p2=0 p3=0 obj#=66856
WAIT #1: nam='cell smart index scan' ela= 779 cellhash#=3176594409 p2=0 p3=0 obj#=66856
WAIT #1: nam='cell smart index scan' ela= 30 cellhash#=2133459483 p2=0 p3=0 obj#=66856

Note that this trace file was produced by one of the parallel slave processes and not the requesting process. The trace produced for the same statement when Offloading is disabled should look familiar. Here's an excerpt:

PARSING IN CURSOR #1 len=158 dep=1 uid=0 oct=3 lid=0 hv=338 ad='196' sqlid='g2966dwa2xfm8'
select /*+ OPT_PARAM('cell_offload_processing' 'false') parallel_index (a skew2_all 8) index_ffs (a skew2_all) */ sum(pk_col) from kso.skew2 a where col1 > 1
END OF STMT
PARSE #1:c=0,e=94,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=2801817111,tim=1285546990415744
WAIT #1: nam='PX Deq: Execution Msg' ela= 2377 sleeptime/senderid=268566527 passes=1 p3=0
WAIT #1: nam='latch: cache buffers chains' ela= 44 address=6636639304 number=150 tries=0
WAIT #1: nam='kfk: async disk IO' ela= 12 count=1 intr=0 timeout=4294967295 obj#=66856
WAIT #1: nam='direct path read' ela= 28957 file number=6 first dba=13 block cnt=124 obj#=66856
WAIT #1: nam='kfk: async disk IO' ela= 32 count=1 intr=0 timeout=4294967295 obj#=66856
WAIT #1: nam='kfk: async disk IO' ela= 24 count=1 intr=0 timeout=4294967295 obj#=66856
WAIT #1: nam='kfk: async disk IO' ela= 24 count=1 intr=0 timeout=4294967295 obj#=66856
WAIT #1: nam='kfk: async disk IO' ela= 8 count=1 intr=0 timeout=4294967295 obj#=66856
WAIT #1: nam='direct path read' ela= 1410 file number=6 first dba=1304064 block cnt=128 obj#=66856
WAIT #1: nam='kfk: async disk IO' ela= 24 count=1 intr=0 timeout=4294967295 obj#=66856
WAIT #1: nam='kfk: async disk IO' ela= 21 count=1 intr=0 timeout=4294967295 obj#=66856
WAIT #1: nam='kfk: async disk IO' ela= 24 count=1 intr=0 timeout=4294967295 obj#=66856
WAIT #1: nam='kfk: async disk IO' ela= 23 count=1 intr=0 timeout=4294967295 obj#=66856
WAIT #1: nam='kfk: async disk IO' ela= 23 count=1 intr=0 timeout=4294967295 obj#=66856
WAIT #1: nam='kfk: async disk IO' ela= 23 count=1 intr=0 timeout=4294967295 obj#=66856
WAIT #1: nam='kfk: async disk IO' ela= 32 count=1 intr=0 timeout=4294967295 obj#=66856
WAIT #1: nam='kfk: async disk IO' ela= 28 count=1 intr=0 timeout=4294967295 obj#=66856
WAIT #1: nam='kfk: async disk IO' ela= 21 count=1 intr=0 timeout=4294967295 obj#=66856
WAIT #1: nam='kfk: async disk IO' ela= 25 count=1 intr=0 timeout=4294967295 obj#=66856
WAIT #1: nam='kfk: async disk IO' ela= 31 count=1 intr=0 timeout=4294967295 obj#=66856

WAIT #1: nam='kfk: async disk IO' ela= 7 count=1 intr=0 timeout=4294967295 obj#=66856
WAIT #1: nam='direct path read' ela= 16725 file number=6 first dba=1303171 block cnt=125 obj#=66856
WAIT #1: nam='kfk: async disk IO' ela= 24 count=1 intr=0 timeout=4294967295 obj#=66856
WAIT #1: nam='kfk: async disk IO' ela= 27 count=1 intr=0 timeout=4294967295 obj#=66856
WAIT #1: nam='kfk: async disk IO' ela= 8 count=1 intr=0 timeout=4294967295 obj#=66856

Again we see the repeating pattern of a single direct path read event followed by several kfk: async disk IO events. The cell smart index scan event replaces both of these events. The enq: KO – fast object checkpoint events for flushing dirty blocks prior to starting the direct path reads are still present (although not shown in this excerpt because they occur in the query coordinator process, not in the parallel slave processes).

Parameters

Just as with the cell smart table scan event, the parameters for cell smart index scan do not contain a lot of details. The cell hash number and the object ID of the segment being scanned are the only information provided.

P1 - Cell hash number

P2 - Not used

P3 - Not used

obj# - The object number of the index being scanned

cell single block physical read

This event is equivalent to the db file sequential read event used on non-Exadata platforms. Single block reads are used most often for index access paths (both the index block reads and the table block reads via rowids from the index lookups). They can also be used for a wide variety of other operations where it makes sense to read a single block.

Event Meaning

Here is the output of a query that shows the operations that resulted in the cell single block physical read wait event on an active production system:

SYS@EXDB1> select event, operation,  count(*) from (
  2  select sql_id, event, sql_plan_operation||' '||sql_plan_options operation
  3    from DBA_HIST_ACTIVE_SESS_HISTORY
  4    where event like 'cell single%')
  5    group by operation, event
  6    order by 1,2,3
  7  /

EVENT                             OPERATION                                  COUNT(*)
--------------------------------- ---------------------------------------- ----------
cell single block physical read                                                 13321
                                  CREATE TABLE STATEMENT                           35
                                  DDL STATEMENT                                   118
                                  DELETE                                          269
                                  FIXED TABLE FIXED INDEX                           3
                                  FOR UPDATE                                        2
                                  HASH JOIN                                         4
                                  HASH JOIN RIGHT OUTER                             8
                                  INDEX FULL SCAN                                9283
                                  INDEX FULL SCAN (MIN/MAX)                         1
                                  INDEX RANGE SCAN                               2763
                                  INDEX STORAGE FAST FULL SCAN                      6
                                  INDEX STORAGE SAMPLE FAST FULL SCAN              13
                                  INDEX UNIQUE SCAN                              1676
                                  INSERT STATEMENT                               1181
                                  LOAD AS SELECT                                    6
                                  LOAD TABLE CONVENTIONAL                          92
                                  MERGE                                           106
                                  SELECT STATEMENT                                 41
                                  SORT ORDER BY                                     6
                                  TABLE ACCESS BY GLOBAL INDEX ROWID            10638
                                  TABLE ACCESS BY INDEX ROWID                    8714
                                  TABLE ACCESS BY LOCAL INDEX ROWID             10446
                                  TABLE ACCESS CLUSTER                             12
                                  TABLE ACCESS STORAGE FULL                       776
                                  TABLE ACCESS STORAGE SAMPLE                      40
                                  UPDATE                                         8116

As you can see, row access via an index is the most common operation that generates this event. You should also be aware that Exadata provides a large amount of flash cache (384G) on each storage cell. For that reason, physical reads (both multi-block and single-block) are considerably faster than on most disk-based storage systems. Here is an excerpt from an AWR report showing a histogram of single block reads for the instance:

                                 % of Waits
                                 -----------------------------------------
                           Total

Event                      Waits <1ms <2ms <4ms <8ms <16ms <32ms <=1s  >1s
-------------------------- ----- ---- ---- ---- ---- ----- ----- ---- ----
cell single block physical 2940K 94.4  3.2   .3   .6    .9    .5   .2   .0

Notice that about 95% of the cell single block physical read events take less than 1ms. This is fairly representative of several production systems that we've observed.

Parameters

The cell single block physical read event provides more information than most cell events. The parameters allow you to tell exactly which object was read along with providing the disk and cell where the block was stored.

P1 - Cell hash number

P2 - Disk hash number

P3 - Total bytes passed during read operation (always 8192 assuming 8K block size)

obj# - The object number of the object being read

cell multiblock physical read

This is another renamed event. It is equivalent to the less clearly named db file scattered read event. On non-Exadata platforms, Oracle Database 11gR2 still uses the db file scattered read event whenever it issues a contiguous multi-block read to the operating system.

Event Meaning

This event is generally used with Full Table Scans and Fast Full Index scans, although it can be used with many other operations. The new name on the Exadata platform is much more descriptive than the older name. This wait event is not nearly as prevalent on Exadata platforms as on non-Exadata platforms, because Exadata handles many full scan operations with Smart Scans that have their own wait events (cell smart table scan and cell smart index scan). The cell multiblock physical read event on Exadata platforms is used for serial Full Scan operations on tables that are below the threshold for serial direct path reads. That is to say, you will see this event used most often on Full Scans of relatively small tables. It is also used for Fast Full Index Scans that are not executed with direct path reads. Here is the output of a query that shows the operations that resulted in the cell multiblock physical read wait event on an active production system:

EVENT                            OPERATION                                  COUNT(*)
-------------------------------- ---------------------------------------- ----------
cell multiblock physical read                                                    764
                                 DDL STATEMENT                                    28
                                 INDEX FAST FULL SCAN                              2
                                 INDEX STORAGE FAST FULL SCAN                    657
                                 INDEX STORAGE SAMPLE FAST FULL SCAN             133
                                 TABLE ACCESS BY INDEX ROWID                      74

                                 TABLE ACCESS BY LOCAL INDEX ROWID              1428
                                 TABLE ACCESS STORAGE FULL                      5046
                                 TABLE ACCESS STORAGE SAMPLE                     916
Parameters

The cell multiblock physical read event also provides more information than most cell events. The parameters in the following list allow you to tell which object was read and identifies the disk and cell where the blocks were stored. The total bytes passed should be a multiple of the block size.

P1 - Cell hash number

P2 - Disk hash number

P3 - Total bytes passed during read operation

obj# - The object number of the object being read

cell list of blocks physical read

This event is a replacement for the db file parallel read event. It appears that the developers took the opportunity to rename some of the events that are related to disk operations, and this is one of those events. The new name is actually much more descriptive than the previous name, since the wait event has nothing whatsoever to do with parallel query or parallel DML.

Event Meaning

This event is used for a multi-block read of noncontiguous blocks. This is more effective with async I/O, which is enabled on Exadata by default. This event can be provoked by several operations. The most common are Index Range Scans, Index Unique Scans and Table Access By Index Rowid. The most common reason for the event is index pre-fetching. Here is the output of a query that shows the operations that resulted in this wait event on an active production system:

SYS@EXDB1> select event, operation,  count(*) from (
  2  select sql_id, event, sql_plan_operation||' '||sql_plan_options operation
  3    from DBA_HIST_ACTIVE_SESS_HISTORY
  4    where event like 'cell list%')
  5    group by operation, event
  6    order by 1,2,3
  7  /

EVENT                               OPERATION                                COUNT(*)
----------------------------------- ---------------------------------------- --------
cell list of blocks physical read                                                   2
                                    INDEX RANGE SCAN                              156
                                    INDEX STORAGE FAST FULL SCAN                    1
                                    INDEX UNIQUE SCAN                              66
                                    TABLE ACCESS BY GLOBAL INDEX ROWID             90
                                    TABLE ACCESS BY INDEX ROWID                  1273
                                    TABLE ACCESS BY LOCAL INDEX ROWID            2593

                                    TABLE ACCESS STORAGE FULL                      20
                                    TABLE ACCESS STORAGE SAMPLE                     1

As you can see, the vast majority of these events were motivated by index access paths. By the way, on non-Exadata platforms, noncontiguous multi-block reads still clock time to the old db file parallel read event. It's also possible for this older wait event name to show up on an Exadata platform for some operations.

Parameters

The cell list of blocks physical read event provides more information than most cell events. The following parameters allow you to tell exactly which object was read along with identifying the disk and cell where the block was stored.

P1 - Cell hash number

P2 - Disk hash number

P3 - Number of blocks read

obj# - The object number of the object being read

cell smart file creation

Exadata has an optimization technique that allows the storage cells to do the initialization of blocks when a data file is created or extended. This occurs when a tablespace is created or a data file is manually added to a tablespace. However, it can also occur when a data file is automatically extended during DML operations.

Event Meaning

We previously stated that this event seemed out of place in the User I/O class. However, if it occurs because of DML operations, it makes sense to have it in this category. At any rate, offloading the block formatting eliminates CPU usage and I/O from the database servers and moves it to the storage tier. When this occurs, time is collected in the cell smart file creation event. This event replaces the Data file init write event that is still used on non-Exadata platforms. Here is the output of a query from a busy production system showing operations that generated this event:

SYS@EXDB1> select event, operation,  count(*) from (
  2  select sql_id, event, sql_plan_operation||' '||sql_plan_options operation
  3    from DBA_HIST_ACTIVE_SESS_HISTORY
  4    where event like 'cell smart file%')
  5    group by operation, event
  6    order by 1,2,3
  7  /

EVENT                      OPERATION                 COUNT(*)
-------------------------- ------------------------- --------
cell smart file creation                                   35
                           DELETE                           3
                           INDEX BUILD NON UNIQUE           5
                           LOAD AS SELECT                   3
                           LOAD TABLE CONVENTIONAL          1
                           UPDATE                           1

You'll notice that on this particular system, the cell smart file creation event was occasionally generated by a DELETE statement. The fact that a DELETE could cause this event might be a little surprising. But remember that this event is actually timing a section of code that does block formatting, not file creation.

Parameters

The only parameter of interest for this event is P1, which shows which cell was being accessed when this event was generated.

P1 - Cell hash number

P2 - Not used

P3 - Not used

cell statistics gather

The cell statistics gather event records time spent reading from various V$ and X$ tables. Although the event is grouped in the User I/O category, it does not refer to I/O in the sense of reading and writing to and from disk.

Event Meaning

Time is clocked to this event when a session is reading from V$CELL, V$CELL_THREAD_HISTORY (X$KCFISOSSN), V$CELL_REQUEST_TOTALS (X$KCFISOSST), and a few other X$ tables in the same family. The event is miscategorized in our opinion and does not really belong in the User I/O category.

Parameters

The parameters for this event provide no additional information. In fact, values are not even set for the parameters in this event. Here are the parameter definitions:

P1- Cell hash number (always 0)

P2 - Not used

P3 - Not used

Exadata Wait Events in the System I/O Class

The Exadata wait events that are assigned to the System I/O class are of less importance and do not generally show up as major consumers of time. The backup events are the most interesting as they record time for sections of code that have been optimized on the Exadata platform. The others are simply housekeeping events. The non-backup events are listed in Table 12.1, while the backup events are detailed in the following sections.

images

cell smart incremental backup

This event is used to measure time spent waiting on RMAN when doing an Incremental Level 1 backup. Exadata optimizes incremental backups by offloading much of the processing to the storage tier. This new wait event was added to account for time spent waiting on the optimized incremental backup processing that is offloaded to the storage cells.

Event Meaning

Interestingly, an Incremental Level 0 backup does not result in this wait event even though the word “incremental” is in the RMAN command. That is because the Level 0 backup doesn't actually do an incremental backup at all. It generates a full backup that is flagged as a baseline for future incremental backups. Here's an excerpt from a 10046 trace file generated on a process that was doing an incremental Level 1 backup:

WAIT #0: nam='cell smart incremental backup' ela= 27 cellhash#=3176594409 p2=0 p3=0 obj#=-1
WAIT #0: nam='cell smart incremental backup' ela= 54 cellhash#=3176594409 p2=0 p3=0 obj#=-1
WAIT #0: nam='cell smart incremental backup' ela= 119 cellhash#=3176594409 p2=0 p3=0 obj#=-1
WAIT #0: nam='cell smart incremental backup' ela= 603 cellhash#=3176594409 p2=0 p3=0 obj#=-1
WAIT #0: nam='cell smart incremental backup' ela= 120 cellhash#=3176594409 p2=0 p3=0 obj#=-1
WAIT #0: nam='control file sequential read' ela= 378 file#=0 block#=1 blocks=1 obj#=-1
WAIT #0: nam='control file sequential read' ela= 378 file#=1 block#=1 blocks=1 obj#=-1
WAIT #0: nam='control file sequential read' ela= 254 file#=0 block#=15 blocks=1 obj#=-1
WAIT #0: nam='control file sequential read' ela= 332 file#=0 block#=17 blocks=1 obj#=-1
WAIT #0: nam='control file sequential read' ela= 268 file#=0 block#=309 blocks=1 obj#=-1
WAIT #0: nam='control file sequential read' ela= 15794 file#=0 block#=307 blocks=1 obj#=-1
WAIT #0: nam='control file parallel write' ela= 351 files=2 block#=310 requests=2 obj#=-1
WAIT #0: nam='control file parallel write' ela= 404 files=2 block#=18 requests=2 obj#=-1
WAIT #0: nam='control file parallel write' ela= 211 files=2 block#=16 requests=2 obj#=-1
WAIT #0: nam='control file parallel write' ela= 294 files=2 block#=1 requests=2 obj#=-1
WAIT #0: nam='control file sequential read' ela= 944 file#=0 block#=1 blocks=1 obj#=-1
WAIT #0: nam='KSV master wait' ela= 13352 p1=0 p2=0 p3=0 obj#=-1 tim=1285518586442775
WAIT #0: nam='ASM file metadata operation' ela= 32 msgop=41 locn=0 p3=0 obj#=-1
WAIT #0: nam='kfk: async disk IO' ela= 333 count=1 intr=0 timeout=4294967295 obj#=-1
WAIT #0: nam='Disk file operations I/O' ela= 67 FileOperation=3 fileno=1 filetype=10 obj#=-1
WAIT #0: nam='cell smart incremental backup' ela= 392 cellhash#=3176594409 p2=0 p3=0 obj#=-1
WAIT #0: nam='cell smart incremental backup' ela= 95 cellhash#=3176594409 p2=0 p3=0 obj#=-1
WAIT #0: nam='cell smart incremental backup' ela= 169 cellhash#=3176594409 p2=0 p3=0 obj#=-1
WAIT #0: nam='cell smart incremental backup' ela= 104 cellhash#=3176594409 p2=0 p3=0 obj#=-1
WAIT #0: nam='cell smart incremental backup' ela= 128 cellhash#=3176594409 p2=0 p3=0 obj#=-1
WAIT #0: nam='cell smart incremental backup' ela= 115 cellhash#=3176594409 p2=0 p3=0 obj#=-1
WAIT #0: nam='cell smart incremental backup' ela= 722 cellhash#=3176594409 p2=0 p3=0 obj#=-1
Parameters

The only parameter used for this event is P1, which shows which cell was responsible for generating the event.

P1 - Cell hash number

P2 - Not used

P3 - Not used

obj# - Not used

images Note The obj# field is a part of many wait events, even some that are not specifically related to an individual object. Be aware that in some cases, the value may be set by one event and then not cleared appropriately when the wait ends, resulting in meaningless values left in place for the next wait event. In the previous example, the obj# was cleared (set to a value of -1).

cell smart restore from backup

This event is used to measure time spent waiting on RMAN when doing a restore. Exadata optimizes RMAN restores by offloading processing to the storage cells.

Event Meaning

The event actually records time related to file initialization during a restore. Here's an excerpt from a 10046 trace file taken while a restore was in progress:

The input backup piece +RECO/test/backupset/2010_09_26/nnndn0_tag20100926t111118_0.1263.730725079 is in BASIC 11.2 compressed format.

WAIT #0: nam='kfk: async disk IO' ela= 98 count=1 intr=0 timeout=4294967295 obj#=-1
WAIT #0: nam='kfk: async disk IO' ela= 757 count=1 intr=0 timeout=4294967295 obj#=-1
WAIT #0: nam='kfk: async disk IO' ela= 94 count=1 intr=0 timeout=4294967295 obj#=-1
WAIT #0: nam='RMAN backup & recovery I/O' ela= 20 count=1 intr=256 timeout=214748364 obj#=-1
WAIT #0: nam='RMAN backup & recovery I/O' ela= 22 count=1 intr=256 timeout=2147483647 obj#=-
WAIT #0: nam='RMAN backup & recovery I/O' ela= 15 count=1 intr=256 timeout=2147483647 obj#=-
WAIT #0: nam='RMAN backup & recovery I/O' ela= 10 count=1 intr=256 timeout=2147483647 obj#=-
WAIT #0: nam='RMAN backup & recovery I/O' ela= 14 count=1 intr=256 timeout=2147483647 obj#=-
. . .
WAIT #0: nam='RMAN backup & recovery I/O' ela= 1364 count=1 intr=256 timeout=2147483647
WAIT #0: nam='kfk: async disk IO' ela= 16 count=1 intr=0 timeout=4294967295 obj#=-1
WAIT #0: nam='kfk: async disk IO' ela= 30 count=1 intr=0 timeout=4294967295 obj#=-1
WAIT #0: nam='RMAN backup & recovery I/O' ela= 4783 count=1 intr=256 timeout=2147483647
WAIT #0: nam='kfk: async disk IO' ela= 5 count=1 intr=0 timeout=4294967295 obj#=-1
WAIT #0: nam='RMAN backup & recovery I/O' ela= 2952 count=1 intr=256 timeout=2147483647
WAIT #0: nam='kfk: async disk IO' ela= 14 count=1 intr=0 timeout=4294967295 obj#=-1
WAIT #0: nam='kfk: async disk IO' ela= 33 count=1 intr=0 timeout=4294967295 obj#=-1
WAIT #0: nam='kfk: async disk IO' ela= 62 count=1 intr=0 timeout=4294967295 obj#=-1
WAIT #0: nam='kfk: async disk IO' ela= 92 count=1 intr=0 timeout=4294967295 obj#=-1
WAIT #0: nam='kfk: async disk IO' ela= 12 count=1 intr=0 timeout=4294967295 obj#=-1
. . .
WAIT #0: nam='cell smart restore from backup' ela= 43249 cellhash#=379339958 p2=0 p3=0 obj#=
WAIT #0: nam='cell smart restore from backup' ela= 18141 cellhash#=213345948 p2=0 p3=0 obj#=
WAIT #0: nam='cell smart restore from backup' ela= 430 cellhash#=379339958 p2=0 p3=0 obj#=-1
WAIT #0: nam='cell smart restore from backup' ela= 378 cellhash#=213345948 p2=0 p3=0 obj#=-1
WAIT #0: nam='cell smart restore from backup' ela= 15 cellhash#=2133459483 p2=0 p3=0 obj#=-1
WAIT #0: nam='cell smart restore from backup' ela= 1355 cellhash#=213345948 p2=0 p3=0 obj#=-
WAIT #0: nam='cell smart restore from backup' ela= 766 cellhash#=2133459483 p2=0 p3=0 obj#=-
WAIT #0: nam='cell smart restore from backup' ela= 167 cellhash#=379339958 p2=0 p3=0 obj#=-1
WAIT #0: nam='cell smart restore from backup' ela= 199 cellhash#=213345948 p2=0 p3=0 obj#=-1
WAIT #0: nam='cell smart restore from backup' ela= 19 cellhash#=213345948 p2=0 p3=0 obj#=-1
WAIT #0: nam='cell smart restore from backup' ela= 226 cellhash#=379339958 p2=0 p3=0 obj#=-1
WAIT #0: nam='cell smart restore from backup' ela= 127 cellhash#=213345948 p2=0 p3=0 obj#=-1
WAIT #0: nam='cell smart restore from backup' ela= 110 cellhash#=213345948 p2=0 p3=0 obj#=-1
WAIT #0: nam='cell smart restore from backup' ela= 177 cellhash#=379339958 p2=0 p3=0 obj#=-1
WAIT #0: nam='cell smart restore from backup' ela= 160 cellhash#=213345948 p2=0 p3=0 obj#=-1
Parameters

The only parameter used for this event is P1, which shows which cell was responsible for generating the event.

P1 - Cell hash number

P2 - Not used

P3 - Not used

Exadata Wait Events in the Other and Idle Classes

These are relatively minor events that occur primarily during startup and shut down of storage cells and fault conditions. You will probably not see them on normally functioning systems. There is one exception to this, the cell smart flash unkeep event. Table 12.2 lists the “cell” wait events in the Other class along with their parameters. A separate section will cover cell smart flash unkeep.

images

cell smart flash unkeep

This event records the time spent waiting when Oracle must flush blocks out of Exadata Smart Flash Cache. This can occur when a table that has a storage clause designating that it be pinned in Exadata Smart Flash Cache is truncated or dropped.

Event Meaning

Truncating and dropping tables doesn't generally happen very often in most production systems (with some notable exceptions) and therefore this wait event will probably not be seen very often. Here's an excerpt from a 10046 trace file showing the event:

WAIT #4: nam='enq: RO - fast object reuse' ela= 393 name|mode=1380909062 2=65581 0=1 obj#=-1
WAIT #4: nam='reliable message' ela= 1548 channel context=6712270872 channel handle=6696807856
broadcast message=6741342984 obj#=-1 tim=1286218926420916
WAIT #4: nam='enq: RO - fast object reuse' ela= 21027 name|mode=1380909 2=65581 0=1 obj#=-1
WAIT #4: nam='ges message buffer allocation' ela= 2 pool=0 request=1 allocated=0 obj#=-1
WAIT #4: nam='enq: RO - fast object reuse' ela= 159 name|mode=1380909057 2=65581 0=2 obj#=-1
WAIT #4: nam='cell smart flash unkeep' ela= 336 cellhash#=379339958 p2=0 p3=0 obj#=-1
WAIT #4: nam='cell smart flash unkeep' ela= 291 cellhash#=2133459483 p2=0 p3=0 obj#=-1
WAIT #4: nam='cell smart flash unkeep' ela= 319 cellhash#=3176594409 p2=0 p3=0 obj#=-1
WAIT #4: nam='cell smart flash unkeep' ela= 418 cellhash#=379339958 p2=0 p3=0 obj#=-1
WAIT #4: nam='cell smart flash unkeep' ela= 1 cellhash#=2133459483 p2=0 p3=0 obj#=-1
WAIT #4: nam='cell smart flash unkeep' ela= 216 cellhash#=3176594409 p2=0 p3=0 obj#=-1
WAIT #4: nam='cell smart flash unkeep' ela= 222 cellhash#=379339958 p2=0 p3=0 obj#=-1
WAIT #4: nam='cell smart flash unkeep' ela= 196 cellhash#=2133459483 p2=0 p3=0 obj#=-1
WAIT #4: nam='cell smart flash unkeep' ela= 216 cellhash#=3176594409 p2=0 p3=0 obj#=-1

Note that the cell smart flash unkeep event is preceded by a handful of enq: RO – fast object reuse events, which are used to mark time associated with cleaning up the buffer cache after a drop or truncate. The cell smart flash unkeep is basically an extension of that event to clean up the Exadata Smart flash cache on the storage server as well.

Parameters

The only parameter used for this event is P1, which shows which cell was responsible for generating the event.

P1 - Cell hash number

P2 - Not used

P3 - Not used

Old Events

In addition to the new cell events, there are also a few non-Exadata-specific wait events that you should be aware of. These are events that you may already be familiar with from managing Oracle on other platforms. They happen to also be important in an Exadata environment, so they represent cases in which your existing knowledge and skill can carry over and stand you in good stead as you move in to managing Exadata.

direct path read

Direct path reads are used by Oracle to read data directly into PGA memory (instead of into the buffer cache). They are an integral part of Exadata Offloading because SQL processing can only be offloaded to the storage cells when the direct path read mechanism is used. The direct path read wait event is actually replaced by the cell smart table scan and cell smart index scan wait events when a query is offloaded. However, the direct path read mechanism is still used by the code covered by those new wait events. That is, either the plan must include a parallel scan or Oracle must decide to use the serial direct path read mechanism.

Event Meaning

This event records time that Oracle spends waiting on a direct path read to complete. You should know that the direct path read wait event can be very misleading. Both the number of events recorded and the timings associated with them can appear to be inaccurate. This is due to the fact that direct path reads are done in an asynchronous and overlapping fashion. See MOS note 50415.1 for further details about why time recorded by this event can be misleading. It also bears mentioning that 11gR2 contains an enhancement that causes serial direct path reads to occur more frequently than in previous releases. See MOS Note 793845.1, which briefly mentions this change.

By the way, the direct path read wait event does still show up on the Exadata platform for various operations but generally not for full table scans unless the table (or partition is relatively small).

Parameters

The parameters for this event show you exactly which segment (obj) is scanned and which set of blocks were scanned during this event.

P1 - File number

P2 - First DBA

P3 - Block count

obj# - The object number of the table being scanned

As mentioned in the Cell Smart Table Scan section, the parameters contain specific information about which file and object are being accessed. The offset into the file is also provided in the P2 parameter, along with the number of contiguous blocks read in the P3 parameter.

enq: KO—fast object checkpoint

The enq:KO event has a very strange name. Don't be put off by that. The event is essentially an object checkpoint event. The V$LOCK_TYPE view describes the KO lock as follows:

SYS@EXDB1> select type, name, description from v$lock_type
  2* where type = 'KO';

TYPE  NAME                           DESCRIPTION
----- ------------------------------ ---------------------------------------------
KO    Multiple Object Checkpoint     Coordinates checkpointing of multiple objects
Event Meaning

This event is used when a session is waiting for all dirty blocks to be flushed from the buffer cache for an object prior to starting a direct path read or cell smart table scan or cell smart index scan. This event is important because the time required to do the checkpoint may outweigh the benefit of the direct path reads. This is unlikely on Exadata storage, though, where the additional Smart Scan benefits are only enabled by the direct path read mechanism.

Parameters

The parameters for this event are not overly helpful but the event does show which object is scanned. Here are the parameter definitions:

P1 - Name/ Mode

P2 - Not used

P3 - Not used

obj# - The object number of the object being checkpointed

reliable message

The reliable message event is used to record time spent communicating with background processes, like the checkpoint process (CKPT). We have included it here because of its close association with the enq: KO – fast object checkpoint event.

Event Meaning

This event is the precursor to the enq: KO – fast object checkpoint event (among others). The communication is done using an inter-process communication channel rather than a more normal post mechanism. This communication method allows the sender to request an ACK before it continues, thus the reason it is called a reliable message. It is generally a very short duration event as it only records time for communicating between processes. Both the users foreground process and the chkpt process will wait on this event as they communicate with each other. Here's an excerpt of a 10046 trace file showing a complete reliable message event:

PARSING IN CURSOR #46963980936744 len=50 dep=0 uid=0 oct=3 lid=0 tim=1301885220743528 hv=3032626544 ad='2cf675ff0' sqlid='7y09dtyuc4dbh'
select avg(pk_col) from kso.skew2 a where col1 > 0
END OF STMT
PARSE #46963980936744:c=1999,e=2122,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=2117817910,tim=1301885220743527
WAIT #46963980936744: nam='reliable message' ela= 1360 channel context=11888341784 channel handle=12088585896 broadcast message=12089037216 obj#=75759 tim=1301885220745397
WAIT #469: nam='enq: KO - fast object checkpoint' ela= 183 ... obj#=75759 tim=130
WAIT #469: nam='enq: KO - fast object checkpoint' ela= 144 ... obj#=75759 tim=130
WAIT #469: nam='enq: PS - contention' ela= 200 ... obj#=75759 tim=130
WAIT #469: nam='os thread startup' ela= 58333 p1=0 p2=0 p3=0 obj#=75759 tim=130
WAIT #469: nam='os thread startup' ela= 101705 p1=0 p2=0 p3=0 obj#=75759 tim=130
WAIT #469: nam='enq: PS - contention' ela= 276 ... obj#=75759 tim=130
WAIT #469: nam='os thread startup' ela= 102988 p1=0 p2=0 p3=0 obj#=75759 tim=130
WAIT #469: nam='os thread startup' ela= 103495 p1=0 p2=0 p3=0 obj#=75759 tim=130
WAIT #469: nam='enq: PS - contention' ela= 237 ... obj#=75759 tim=130
WAIT #469: nam='os thread startup' ela= 102655 p1=0 p2=0 p3=0 obj#=75759 tim=130
WAIT #469: nam='os thread startup' ela= 102329 p1=0 p2=0 p3=0 obj#=75759 tim=130
WAIT #469: nam='enq: PS - contention' ela= 313 ... obj#=75759 tim=130
WAIT #469: nam='os thread startup' ela= 102673 p1=0 p2=0 p3=0 obj#=75759 tim=130
Parameters

Here are the parameters for the reliable message event.

P1 - channel context

P2 - channel handle

P3 - broadcast message

obj# - The object number of the object of interest (not always set)

Resource Manager Events

Finally, there are a few Resource Manager events that you should be aware of. While these are not specific to Exadata, Resource Manager provides key functionality for combining mixed workloads on Exadata. There are actually eight separate events as of release 11.2.0.2. The following query against V$EVENT_NAME shows these events and their parameters.

SYS@SANDBOX1> @valid_events
Enter value for event_name: resmgr%

EVENT                          CLASS           P1       P2                   P3
------------------------------ --------------- -------- -------------------- ----
resmgr:internal state change   Concurrency     location
resmgr:sessions to exit                        location

resmgr:internal state cleanup  Other           location

resmgr:become active           Scheduler       location
resmgr:cpu quantum                             location  consumer group id
resmgr:large I/O queued                        location
resmgr:pq queued                               location
resmgr:small I/O queued                        location

8 rows selected.

There are only two of these events that are of interest.

resmgr:cpu quantum

This event is used to record forced idle time imposed by Database Resource Manager (DBRM) due to competition with higher priority work. Said another way, it is the time a process spent waiting for DBRM to allocate it a time slice.

Event Meaning

DBRM behaves in an analogous manner to CPU scheduling algorithms in that it divides time into units (quantum) and either allows a process to run or not depending on other workload on the system. Unlike CPU scheduling algorithms though, DBRM throttling is interjected at key locations in the Oracle code to eliminate the possibility of a process being kicked off of the CPU when it is holding a shared resource such as a latch. This prevents some nasty behavior that may occur on heavily loaded systems such as priority inversion problems. In effect, the processes voluntarily go to sleep when they are not holding these shared resources. There are multiple locations in the code where these checks are implemented. Here's an excerpt of a 10046 trace file showing the resmgr:cpu quantum event:

PARSING IN CURSOR #47046073104400 len=34 dep=0 uid=85 oct=3 lid=85 tim=1301275393323414
hv=3308960238 ad='b3c9faf8' sqlid='c4js15z2mpfgf'
select count(*)
from kso.skew
END OF STMT
PARSE #470460:c=0,e=77,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=937687140,tim=130
EXEC #470460:c=0,e=49,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=937687140,tim=130
WAIT #470460: nam='SQL*Net message to client' ela= 4 driver id=1650815232 #bytes=1 ...
WAIT #470460: nam='resmgr:cpu quantum' ela= 3457 location=3 consumer group id=75525 ...
WAIT #470460: nam='resmgr:cpu quantum' ela= 68126 location=3 consumer group id=75525 ...
WAIT #470460: nam='resmgr:cpu quantum' ela= 68046 location=3 consumer group id=75525 ...
WAIT #470460: nam='resmgr:cpu quantum' ela= 108941 location=3 consumer group id=75525 ...
WAIT #470460: nam='resmgr:cpu quantum' ela= 38234 location=3 consumer group id=75525 ...
WAIT #470460: nam='resmgr:cpu quantum' ela= 103282 location=3 consumer group id=75525 ...
Parameters

Here are the parameters for this event. Note that the obj# parameter exists but is not used.

P1 - Location

P2 - Consumer group id

P3 - Not used

obj# - NA

The location parameter is a numeric value that most likely refers to a location (function) in the Oracle code. There are at least 5 distinct locations that we have observed. Unfortunately, Oracle does not publicly document where in the Oracle kernel these checks are performed.

The consumer group number in the P2 parameter is pretty self-explanatory. It maps to the CONSUMER_GROUP_ID column in the DBA_RSRC_CONSUMER_GROUPS view. This parameter allows you to tell what consumer group a process was assigned to when its CPU usage was curtailed.

resmgr:pq queued

This event is used to record time spent waiting in the parallel query queue.

Event Meaning

The parallel statement queuing feature comes with its own wait event. Statements that are queued due to insufficient parallel server processes or other directives clock time to this event. Here's an excerpt of a 10046 trace file showing the resmgr:pq queued event:

PARSING IN CURSOR #47898436021000 len=73 dep=0 uid=0 oct=3 lid=0 tim=1301966072332694 hv=3186480617 ad='2cd2d1cb8' sqlid='bgcmmcyyyvpg9'
select /*+ parallel (a 16) */ avg(pk_col) from kso.skew3 a where col1 > 0
END OF STMT
PARSE #478:c=2000,e=1572,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=1404581711,tim=130
WAIT #478: nam='resmgr:pq queued' ela= 65102047 location=1  =0  =0 obj#=523 tim=130
WAIT #478: nam='enq: KO - fast object checkpoint' ela= 258 ... obj#=523
WAIT #478: nam='reliable message' ela= 2005 channel context=118 channel handle=120 ... obj#=523
WAIT #478: nam='enq: KO - fast object checkpoint' ela= 156 ... obj#=523 ...
WAIT #478: nam='PX Deq: Join ACK' ela= 772 sleeptime/senderid=268 passes=1 p3=120 obj#=523 ...
WAIT #478: nam='PX Deq: Join ACK' ela= 932 sleeptime/senderid=268 passes=8 p3=120 obj#=523 ...
WAIT #478: nam='PX Deq: Join ACK' ela= 2 sleeptime/senderid=0 passes=0 p3=0 obj#=523 ...
Parameters

Here are the parameters for the resmgr:pq queued event.

P1 - Location

P2 - Not used

P3 - Not used

obj# - NA

The location parameter is a numeric value that most likely refers to a location (function) in the Oracle code as described above in the resmgr:cpu quantum event.

Summary

The wait interface has been expanded to cover several Exadata-specific features. In this chapter we've covered the new wait events. By far the most interesting of the new events are cell smart table scan and cell smart index scan. These events cover the time spent waiting on an offloadable I/O request to a storage cell. There is a lot of processing that occurs at the storage layer that is lumped together under these events. It's important to understand that these events replace the direct path read event and that the mechanism of returning the data directly to the process PGA employed by the Smart Scan events is analogous to the way it is handled by direct path read.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
13.58.209.201