C H A P T E R  2

Offloading / Smart Scan

Offloading is the secret sauce of Exadata. It’s what makes Exadata different from every other platform that Oracle runs on. Offloading refers to the concept of moving processing from the database servers to the storage layer. It is also the key paradigm shift provided by the Exadata platform. But it’s more than just moving work in terms of CPU usage. The primary benefit of Offloading is the reduction in the volume of data that must be returned to the database server. This is one of the major bottlenecks of most large databases.

The terms Offloading and Smart Scan are used somewhat interchangeably. Offloading is a better description in our opinion, as it refers to the fact that part of the traditional SQL processing done by the database can be “offloaded” from the database layer to the storage layer. It is a rather generic term, though, and is used to refer to many optimizations that are not even related to SQL processing, such as improvements to backup and restore operations.

Smart Scan, on the other hand, is a more focused term, in that it refers only to Exadata’s optimization of SQL statements. These optimizations come into play for scan operations (typically Full Table Scans). A more specific definition of a Smart Scan would be any section of the Oracle kernel code that is covered by the Smart Scan wait events. There are actually two wait events that include the term “Smart Scan” in their names, Cell Smart Table Scan and Cell Smart Index Scan. We’ll discuss both of these wait events in detail a bit later, in Chapter 10. While it’s true that “Smart Scan” has a bit of a marketing flavor, it does have specific context when referring to the code covered by these wait events. At any rate, while the terms are somewhat interchangeable, keep in mind that Offloading can refer to more than just speeding up SQL statement execution.
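If you are curious whether either of these wait events has already shown up on your system, a quick look at V$SYSTEM_EVENT will tell you. The following is just a minimal sketch of the idea; the events themselves (and the statistics that accompany them) are covered thoroughly in Chapter 10.

select event, total_waits, time_waited
from v$system_event
where event in ('cell smart table scan', 'cell smart index scan');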

In this chapter we will focus on Smart Scan optimizations. We’ll cover the various optimizations that can come into play with Smart Scans, the mechanics of how they work, and the requirements that must be met for Smart Scans to occur. We’ll also cover some techniques that can be used to help you determine whether Smart Scans have occurred for a given SQL statement. The other offloading optimizations will only be mentioned briefly as they are covered elsewhere in the book.

Why Offloading Is Important

We can’t emphasize enough how important this concept is. The idea of moving database processing to the storage tier is a giant leap forward. The concept has been around for some time. In fact, rumor has it that Oracle approached at least one of the large SAN manufacturers several years ago with the idea. The manufacturer was apparently not interested at the time and Oracle decided to pursue the idea on its own. Oracle subsequently partnered with HP to build the original Exadata V1, which incorporated the Offloading concept. Fast-forward a couple of years, and you have Oracle’s acquisition of Sun Microsystems. This put the company in a position to offer an integrated stack of hardware and software and gives it complete control over which features to incorporate into the product.

Offloading is important because one of the major bottlenecks on large databases is the time it takes to transfer the large volumes of data necessary to satisfy DW-type queries between the disk systems and the database servers (that is, because of bandwidth). This is partly a hardware architecture issue, but the bigger issue is the sheer volume of data that is moved by traditional Oracle databases. The Oracle database is very fast and very clever about how it processes data, but for queries that access a large amount of data, getting the data to the database can still take a long time. So as any good performance analyst would do, Oracle focused on reducing the time spent on the thing that accounted for the majority of the elapsed time. During the analysis, the team realized that every query that required disk access was very inefficient in terms of how much data had to be returned to and processed by the database servers. Oracle has made a living by developing the best cache-management software available, but for really large data sets, it is just not practical to keep everything in memory on the database servers.

Kevin Says: The authors make a good point based on a historical perspective of Oracle query processing. However, I routinely find myself reminding people that modern commodity x64 servers are no longer architecturally constrained to small memory configurations. For example, servers based on Intel Xeon 7500 processors with Quick Path Interconnect support large numbers of memory channels, each with a large number of DIMM slots. Commodity-based servers with multiple terabytes of main memory are quite common. In fact, the X2-8 Exadata model supports two terabytes of main memory in the database grid, and that capacity will increase naturally over time. I expect this book to remain relevant long enough for future readers to look back on this comment as arcane, since the trend toward extremely large main memory x64 systems has only just begun. The important thing to remember about Exadata is that it is everything Oracle database offers plus Exadata Storage Servers. This point is relevant because customers can choose to combine deep compression (for example, Exadata Hybrid Columnar Compression) with the In-Memory Parallel Query feature for those cases where ruling out magnetic media entirely is the right solution for meeting service levels.

Imagine the fastest query you can think of: a single column from a single row from a single table where you actually know where the row is stored (rowid). On a traditional Oracle database, at least one block of data has to be read into memory (typically 8K) to get the one column. Let’s assume your table stores an average of 50 rows per block. You’ve just transferred 49 extra rows to the database server that are simply overhead for this query. Multiply that by a billion and you start to get an idea of the magnitude of the problem in a large data warehouse. Eliminating the time spent on transferring completely unnecessary data between the storage and the database tier is the main problem that Exadata was designed to solve.
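If you want a rough feel for this ratio on one of your own tables, the optimizer statistics already hold the numbers. The following sketch uses the SKEW3 table from our examples; adjust the owner and table name for your environment, and note that it assumes the statistics are reasonably current.

select table_name, num_rows, blocks,
       round(num_rows/blocks) rows_per_block
from dba_tables
where owner = 'KSO'
and table_name = 'SKEW3'
and blocks > 0;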

Offloading is the approach that was used to solve the problem of excessive time spent moving irrelevant data between the tiers. Offloading has three design goals, although the primary goal far outweighs the others in importance:

  • Reduce the volume of data transferred from disk systems to the database servers.
  • Reduce CPU usage on database servers.
  • Reduce disk access times at the storage layer.

Reducing the volume was the main focus and primary goal. The majority of the optimizations introduced by Offloading contribute to this goal. Reducing CPU load is important as well, but is not the primary benefit provided by Exadata and therefore takes a back seat to reducing the volume of data transferred. (As you’ll see, however, decompression is a notable exception to that generalization, as it is performed on the storage servers.) Several optimizations to reduce disk access time were also introduced, and while some of the results can be quite stunning, we don’t consider them to be the bread-and-butter optimizations of Exadata.

Exadata is an integrated hardware/software product that depends on both components to provide substantial performance improvement over non-Exadata platforms. However, the performance benefits of the software component dwarf the benefits provided by the hardware. Here is an example:

SYS@SANDBOX> alter session set cell_offload_processing=false;

Session altered.

Elapsed: 00:00:00.06
SYS@SANDBOX> select count(*) from kso.skew3 where col1 < 0;

  COUNT(*)
----------
         2

1 row selected.

Elapsed: 00:00:51.09
SYS@SANDBOX> alter session set cell_offload_processing=true;

Session altered.

Elapsed: 00:00:00.07
SYS@SANDBOX> select count(*) from kso.skew3 where col1 < 0;

  COUNT(*)
----------
         2

1 row selected.

Elapsed: 00:00:00.15

This example shows the performance of a scan against a single table with 384 million rows. We ran it once with Offloading disabled, effectively using all the hardware benefits of Exadata and none of the software benefits. You’ll notice that even on Exadata hardware, this query took almost a minute. Keep in mind that this was only spread across three storage servers on our V2 quarter rack and did not utilize the flash cache at all. We then re-enabled Offloading, and the query completed in substantially less than a second. Obviously the hardware in play was the same in both executions. The point is that it was the software, by way of Offloading, that made the difference.
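By the way, CELL_OFFLOAD_PROCESSING does not have to be toggled for the whole session; it is also valid with the OPT_PARAM hint, which is convenient when you want to compare a single statement with and without Offloading. Here is the general form of such a test (a sketch only; the result of the query is of course the same either way):

select /*+ opt_param('cell_offload_processing' 'false') */ count(*)
from kso.skew3
where col1 < 0;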


What Offloading Includes

There are many optimizations that can be lumped under the Offloading banner. This chapter focuses on SQL statement optimizations that are implemented via Smart Scans. The big three Smart Scan optimizations are Column Projection, Predicate Filtering, and Storage Indexes. The primary goal of most of the Smart Scan optimizations is to reduce the amount of data that needs to be transmitted back to the database servers during scan execution. However, some of the optimizations also attempt to offload CPU-intensive operations, such as decompression. We won’t have much to say in this chapter about optimizations that are not related to SQL statement processing, such as Smart File Creation and RMAN-related optimizations. Those topics will be covered in more detail elsewhere in the book.

Kevin Says: This aspect of Offload Processing seems quite complicated. The authors are correct in stating that the primary benefit of Smart Scan is payload reduction between storage and the database grid. And it’s true that some CPU-offload benefit is enjoyed by decompressing Exadata Hybrid Columnar Compression units in the storage cells. However, therein lies one case where Offload Processing actually aims to increase the payload between the cells and the database grid. The trade-off is important, however. It makes sense to decompress EHCC data in the cells (after filtration) in spite of the fact that more data is sent to the database grid due to the decompression. All technology solutions have trade-offs.

Column Projection

The term Column Projection refers to Exadata’s ability to limit the volume of data transferred between the storage tier and the database tier by only returning columns of interest (that is, those in the select list or necessary for join operations on the database tier). If your query requests five columns from a 100-column table, Exadata can eliminate most of the data that would be returned to the database servers by non-Exadata storage. This feature is a much bigger deal than you might expect and it can have a very significant impact on response times. Here is an example:

SYS@SANDBOX1> alter system flush shared_pool;

System altered.

Elapsed: 00:00:00.12
SYS@SANDBOX1> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.13
SYS@SANDBOX1> alter session set "_serial_direct_read"=true;

Session altered.

Elapsed: 00:00:00.00
SYS@SANDBOX1> alter session set cell_offload_processing=false;

Session altered.

Elapsed: 00:00:00.01
SYS@SANDBOX1> select count(col1) from kso.skew3;

COUNT(COL1)
-----------
  384000044

1 row selected.

Elapsed: 00:00:51.32
SYS@SANDBOX1> alter session set cell_offload_processing=true;

Session altered.

Elapsed: 00:00:00.00
SYS@SANDBOX1> select count(col1) from kso.skew3;

COUNT(COL1)
-----------
  384000044

1 row selected.

Elapsed: 00:00:26.27

This example deserves a little discussion. First we used a trick to force direct path reads with the _SERIAL_DIRECT_READ parameter (more on that later). Next we disabled Smart Scans by setting CELL_OFFLOAD_PROCESSING to FALSE. You can see that our test query doesn’t have a WHERE clause. This means that Predicate Filtering and Storage Indexes cannot be used to cut down the volume of data that must be transferred from the storage tier, because those two optimizations can only be done when there is a WHERE clause (we’ll discuss those optimizations shortly). That leaves Column Projection as the only optimization in play. Are you surprised that Column Projection alone could cut a query’s response time in half? We were, the first time we saw it, but it makes sense if you think about it. You should be aware that columns in the select list are not the only columns that must be returned to the database server. This is a very common misconception. Join columns in the WHERE clause must also be returned. As a matter of fact, in early versions of Exadata, the Column Projection feature was not as effective as it could have been and actually returned all the columns included in the WHERE clause, which in many cases included some unnecessary columns.

The DBMS_XPLAN package can display information about column projection, although by default it does not. The projection data is stored in the PROJECTION column in the V$SQL_PLAN view as well. Here is an example:

SYS@SANDBOX> select count(s.col1),avg(length(s.col4))
  2  from kso.skew s, kso.skew2 s2
  3  where s.pk_col = s2.pk_col
  4  and s.col1 > 0
  5  and s.col2='asddsadasd';

COUNT(S.COL1) AVG(LENGTH(S.COL4))
------------- -------------------
    127999992                   1

1 row selected.

SYS@SANDBOX> select sql_id, child_number, sql_text
  2  from v$sql where sql_text like '%skew%';

SQL_ID         CHILD SQL_TEXT
------------- ------ --------------------------------------------------------------
8xa3wjh48b9ar      0 select count(s.col1),avg(length(s.col4)) from kso.skew s, kso.

1 row selected.

SYS@SANDBOX> select * from
  2 table(dbms_xplan.display_cursor('&sql_id','&child_no','+projection'));
Enter value for sql_id: 8xa3wjh48b9ar
Enter value for child_no:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------
SQL_ID  8xa3wjh48b9ar, child number 0
-------------------------------------
select count(s.col1),avg(length(s.col4)) from kso.skew s, kso.skew2 s2
where s.pk_col = s2.pk_col and s.col1 > 0 and s.col2='asddsadasd'

Plan hash value: 3361152066

-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |      |       |       |   360K(100)|          |
|   1 |  SORT AGGREGATE             |      |    1 |    30 |       |            |          |
|*  2 |   HASH JOIN                 |      |   64M|  1836M|   549M|   360K  (1)| 01:12:02 |
|*  3 |    TABLE ACCESS STORAGE FULL| SKEW |   16M|   366M|       | 44585   (2)| 00:08:56 |
|   4 |    TABLE ACCESS STORAGE FULL| SKEW2|  128M|   732M|       |   178K  (1)| 00:35:37 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("S"."PK_COL"="S2"."PK_COL")
   3 - storage(("S"."COL2"='asddsadasd' AND "S"."COL1">0))
       filter(("S"."COL2"='asddsadasd' AND "S"."COL1">0))

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - (#keys=0) COUNT(LENGTH("S"."COL4"))[22], COUNT("S"."COL1")[22],
       SUM(LENGTH("S"."COL4"))[22]
   2 - (#keys=1) "S"."COL4"[VARCHAR2,1], "S"."COL1"[NUMBER,22]
   3 - "S"."PK_COL"[NUMBER,22], "S"."COL1"[NUMBER,22], "S"."COL4"[VARCHAR2,1]
   4 - "S2"."PK_COL"[NUMBER,22]


33 rows selected.

SYS@SANDBOX> select projection from v$sql_plan
  2  where projection is not null
  3  and sql_id = '8xa3wjh48b9ar';

PROJECTION
-------------------------------------------------------------------------------------------
(#keys=0) COUNT(LENGTH("S"."COL4"))[22], COUNT("S"."COL1")[22], SUM(LENGTH("S"."COL4"))[22]
(#keys=1) "S"."COL4"[VARCHAR2,1], "S"."COL1"[NUMBER,22]
"S"."PK_COL"[NUMBER,22], "S"."COL1"[NUMBER,22], "S"."COL4"[VARCHAR2,1]
"S2"."PK_COL"[NUMBER,22]

4 rows selected.

So as you can see, the plan output shows the projection information, but only if you use the +PROJECTION argument in the call to the DBMS_XPLAN package. Note also that the PK_COL columns from both tables were listed in the PROJECTION section, but that not all columns in the WHERE clause are included. Of the columns referenced in the WHERE clause, only those that actually need to be returned to the database (the join columns) are listed. Note also that the projection information is not unique to Exadata but is a generic part of the database code.

The V$SQL family of views contains columns that report the volume of data that may be saved by Offloading (IO_CELL_OFFLOAD_ELIGIBLE_BYTES) and the volume of data that was actually returned by the storage servers (IO_INTERCONNECT_BYTES). Note that these columns are cumulative for all the executions of the statement. We’ll be using these two columns throughout the book because they are key indicators of offload processing. Here’s a quick demonstration to show that projection does affect the amount of data returned to the database servers and that selecting fewer columns results in less data transferred:

SYS@SANDBOX> select /* single col */ avg(pk_col)
  2  from kso.skew3;

AVG(PK_COL)
-----------
 16093750.3

1 row selected.

Elapsed: 00:00:32.13

SYS@SANDBOX> select /* multi col */ avg(pk_col),sum(col1)
  2  from kso.skew3;

AVG(PK_COL)  SUM(COL1)
----------- ----------
 16093750.3 1.9003E+14

1 row selected.

Elapsed: 00:00:45.32
SYS@SANDBOX> set timing off
SYS@SANDBOX> select sql_id,sql_text from v$sql
  2  where sql_text like '%col */ avg(pk_col)%';

SQL_ID        SQL_TEXT
------------- ----------------------------------------------------------------------
bb3z4aaa9du7j select /* single col */ avg(pk_col) from kso.skew3
555pskb8aaqct select /* multi col */ avg(pk_col),sum(col1) from kso.skew3

2 rows selected.

SYS@SANDBOX> select sql_id, IO_CELL_OFFLOAD_ELIGIBLE_BYTES eligible,
  2  IO_INTERCONNECT_BYTES actual,
  3  100*(IO_CELL_OFFLOAD_ELIGIBLE_BYTES-IO_INTERCONNECT_BYTES)
  4  /IO_CELL_OFFLOAD_ELIGIBLE_BYTES "IO_SAVED_%", sql_text
  5  from v$sql where sql_id in ('bb3z4aaa9du7j','555pskb8aaqct');

SQL_ID          ELIGIBLE     ACTUAL IO_SAVED_% SQL_TEXT
------------- ---------- ---------- ---------- ------------------------------------
bb3z4aaa9du7j 1.6025E+10 4511552296      71.85 select /* single col */ avg(pk_col)
555pskb8aaqct 1.6025E+10 6421233960      59.93 select /* multi col */ avg(pk_col),s

2 rows selected.

SYS@SANDBOX> @fsx4
Enter value for sql_text: %col */ avg(pk_col)%
Enter value for sql_id:

SQL_ID         CHILD OFFLOAD IO_SAVED_%     AVG_ETIME SQL_TEXT
------------- ------ ------- ---------- ------------- --------------------
6u7v77c2f8x5r      0 Yes          59.93         45.15 select /* multi col
d43dr7hvmw3yb      0 Yes          71.85         31.94 select /* single col

2 rows selected.

Note that the extra column resulted in a great deal of extra time required to complete the query and that the columns in V$SQL verified the increased volume of data that had to be transferred. We’ve also shown the output of a modified version of the fsx.sql script, which we’ll discuss in more detail later in this chapter. For now, please just accept that it shows us whether a statement was offloaded or not.

Predicate Filtering

The second of the big three Smart Scan optimizations is Predicate Filtering. This term refers to Exadata’s ability to return only rows of interest to the database tier. Since iDB includes the predicate information in its requests, this is accomplished by performing the standard filtering operations at the storage cells before returning the data. On databases using non-Exadata storage, filtering is done on the database servers. This generally means that a large number of records that will eventually be discarded will be returned to the database tier. Filtering these rows at the storage layer can provide a very significant decrease in the volume of data that must be transferred to the database tier. While this optimization also results in some savings in CPU usage on the database servers, the biggest advantage is generally the reduction in data transfer.

Here is an example:

SYS@SANDBOX> alter session set cell_offload_processing=false;

Session altered.

Elapsed: 00:00:00.01
SYS@SANDBOX> select count(pk_col) from kso.skew3;

COUNT(PK_COL)
-------------
    384000036

Elapsed: 00:00:48.45
SYS@SANDBOX> alter session set cell_offload_processing=true;

Session altered.

Elapsed: 00:00:00.01
SYS@SANDBOX> select count(pk_col) from kso.skew3;

COUNT(PK_COL)
-------------
    384000036

Elapsed: 00:00:26.61
SYS@SANDBOX> -- disable storage indexes
SYS@SANDBOX> alter system set "_kcfis_storageidx_disabled"=true;
System altered.

Elapsed: 00:00:00.17
SYS@SANDBOX> select count(pk_col) from kso.skew3 where col1 < 0;

COUNT(PK_COL)
-------------
            2

Elapsed: 00:00:08.53

First we completely disabled Offloading using the CELL_OFFLOAD_PROCESSING parameter and ran a query without a WHERE clause. Without the benefit of Offloading this query took about 48 seconds. We then enabled Offloading and re-ran the query. This time the query took only about 27 seconds. The savings of approximately 21 seconds was due strictly to Column Projection (because without a WHERE clause for filtering, there were no other optimizations that could come into play). We then used a trick to disable storage indexes by setting the hidden parameter, _KCFIS_STORAGEIDX_DISABLED, to TRUE (we’ll discuss that more in the next section) and added a WHERE clause, which reduced the execution time to about 9 seconds. This reduction of an additional 18 seconds or so was thanks to Predicate Filtering. Note that we had to disable storage indexes to be sure that we weren’t getting any benefit from that optimization and that all the improvement was due to Predicate Filtering, which brings us to the next topic.

Storage Indexes

Storage Indexes provide the third level of optimization for Smart Scans. Storage Indexes are in-memory structures on the storage cells that maintain a maximum and minimum value for each 1MB disk storage unit, for up to eight columns of a table. Storage Indexes are a little different from most Smart Scan optimizations. The goal of Storage Indexes is not to reduce the amount of data being transferred back to the database tier. In fact, whether they are used on a given query or not, the amount of data returned to the database tier remains constant. Instead, Storage Indexes are designed to eliminate time spent reading data from disk on the storage servers themselves. Think of this feature as a pre-filter. Since Smart Scans pass the query predicates to the storage servers, and Storage Indexes contain a map of values in each 1MB storage region, any region that can’t possibly contain a matching row can be eliminated without ever being read. You can also think of Storage Indexes as an alternate partitioning mechanism. Disk I/O is eliminated in a fashion analogous to partition elimination. If a partition can’t contain any records of interest, the partition’s blocks will not be read. Similarly, if a storage region cannot contain any records of interest, that storage region need not be read.

Storage Indexes cannot be used in all cases, and there is little that can be done to affect when or how they are used. But in the right situations, the results from this optimization technique can be astounding.

Here is an example:

SYS@SANDBOX> -- disable storage indexes
SYS@SANDBOX> alter system set "_kcfis_storageidx_disabled"=true;

System altered.

Elapsed: 00:00:00.22
SYS@SANDBOX> select count(pk_col) from kso.skew3 where col1 < 0;

COUNT(PK_COL)
-------------
            2

Elapsed: 00:00:08.74
SYS@SANDBOX> -- enable storage indexes
SYS@SANDBOX> alter system set "_kcfis_storageidx_disabled"=false;

System altered.

Elapsed: 00:00:00.03
SYS@SANDBOX> select count(pk_col) from kso.skew3 where col1 < 0;

COUNT(PK_COL)
-------------
            2

Elapsed: 00:00:00.08

In this example we disabled storage indexes (using the _KCFIS_STORAGEIDX_DISABLED parameter) to remind you of the elapsed time required to read through 384 million rows using Column Projection and Predicate Filtering. Remember that even though the amount of data returned to the database tier is extremely small in this case, the storage servers still had to read through every block containing data for the SKEW3 table and then had to check each row to see if it matched the WHERE clause. This is where the majority of the 8 seconds was spent. We then re-enabled storage indexes and reran the query, which reduced the execution time to about .08 seconds. This reduction in elapsed time is a result of storage indexes being used to avoid virtually all of the disk I/O and the time spent filtering through those records.
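Another way to confirm that the improvement really came from storage indexes (rather than relying on elapsed time alone) is to check the statistic Oracle maintains for exactly this purpose. Here is a minimal sketch, run from the same session right after the query; a nonzero value means the cells skipped that many bytes of disk I/O thanks to storage indexes.

select n.name, s.value
from v$mystat s, v$statname n
where s.statistic# = n.statistic#
and n.name = 'cell physical IO bytes saved by storage index';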

Just to reiterate, Column Projection and Predicate Filtering (and most other Smart Scan optimizations) improve performance by reducing the volume of data being transferred back to the database servers (and thus the amount of time to transfer the data). Storage Indexes improve performance by eliminating time spent reading data from disk on the storage servers and filtering that data. Storage Indexes are covered in much more detail in Chapter 4.

Simple Joins (Bloom Filters)

In some cases, join processing can be offloaded to the storage tier as well. Offloaded joins are accomplished by creating what is called a bloom filter. Bloom filters have been around for a long time and have been used by Oracle since Oracle Database Version 10g Release 2, so they are not specific to Exadata. One of the main ways Oracle uses them is to reduce traffic between parallel query slaves. They have the advantage of being very small relative to the data set that they represent. However, this comes at a price: they can return false positives. That is, rows that should not be included in the desired result set can occasionally pass a bloom filter. For that reason, an additional filter must be applied after the bloom filter to ensure that any false positives are eliminated. The interesting thing about bloom filters from an Exadata perspective is that they may be passed to the storage servers and evaluated there. This technique can result in a large decrease in the volume of data that must be transmitted back to the database servers.

Here’s an example:

SYS@SANDBOX> -- disable bloom filter offloading
SYS@SANDBOX> alter session set "_bloom_predicate_pushdown_to_storage"=false;

Session altered.

Elapsed: 00:00:00.82
SYS@SANDBOX> @bloom_join2.sql

COL2                           SUM(A.COL1)
------------------------------ -----------
2342                                   144
asddsadasd                       153598416

2 rows selected.

Elapsed: 00:11:39.39
SYS@SANDBOX> -- enable bloom filter offloading
SYS@SANDBOX> alter session set "_bloom_predicate_pushdown_to_storage"=true;

Session altered.

Elapsed: 00:00:00.82
SYS@SANDBOX> @bloom_join2.sql

COL2                           SUM(A.COL1)
------------------------------ -----------
asddsadasd                       153598416
2342                                   144

2 rows selected.

Elapsed: 00:02:06.13

SYS@SANDBOX> @dplan
Enter value for sql_id: 09m6t5qpgkywx
Enter value for child_no: 0

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------
SQL_ID  09m6t5qpgkywx, child number 0
-------------------------------------
select /*+ bloom join 2  use_hash (skew temp_skew) */ a.col2,
sum(a.col1) from kso.skew3 a, kso.skew2 b where a.pk_col = b.pk_col and
b.col1 = 1 group by a.col2

Plan hash value: 466947137

-----------------------------------------------------------------------------------------
| Id  | Operation                             | Name       |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |            |        |      |            |
|   1 |  PX COORDINATOR                       |            |        |      |            |
|   2 |   PX SEND QC (RANDOM)                 | :TQ10002   |  Q1,02 | P->S | QC (RAND)  |
|   3 |    HASH GROUP BY                      |            |  Q1,02 | PCWP |            |
|   4 |     PX RECEIVE                        |            |  Q1,02 | PCWP |            |
|   5 |      PX SEND HASH                     | :TQ10001   |  Q1,01 | P->P | HASH       |
|   6 |       HASH GROUP BY                   |            |  Q1,01 | PCWP |            |
|*  7 |        HASH JOIN                      |            |  Q1,01 | PCWP |            |
|   8 |         BUFFER SORT                   |            |  Q1,01 | PCWC |            |
|   9 |          PX RECEIVE                   |            |  Q1,01 | PCWP |            |
|  10 |           PX SEND BROADCAST           | :TQ10000   |        | S->P | BROADCAST  |
|  11 |            TABLE ACCESS BY INDEX ROWID| SKEW2      |        |      |            |
|* 12 |             INDEX RANGE SCAN          | SKEW2_COL1 |        |      |            |
|  13 |         PX BLOCK ITERATOR             |            |  Q1,01 | PCWC |            |
|* 14 |          TABLE ACCESS STORAGE FULL    | SKEW3      |  Q1,01 | PCWP |            |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   7 - access("A"."PK_COL"="B"."PK_COL")
  12 - access("B"."COL1"=1)
  14 - storage(:Z>=:Z AND :Z<=:Z)
       filter(SYS_OP_BLOOM_FILTER(:BF0000,"A"."PK_COL"))


36 rows selected.

SYS@SANDBOX> @dplan
Enter value for sql_id: 09m6t5qpgkywx
Enter value for child_no: 1

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------
SQL_ID  09m6t5qpgkywx, child number 1
-------------------------------------
select /*+ bloom join 2  use_hash (skew temp_skew) */ a.col2,
sum(a.col1) from kso.skew3 a, kso.skew2 b where a.pk_col = b.pk_col and
b.col1 = 1 group by a.col2

Plan hash value: 466947137


-----------------------------------------------------------------------------------------
| Id  | Operation                             | Name       |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |            |        |      |            |
|   1 |  PX COORDINATOR                       |            |        |      |            |
|   2 |   PX SEND QC (RANDOM)                 | :TQ10002   |  Q1,02 | P->S | QC (RAND)  |
|   3 |    HASH GROUP BY                      |            |  Q1,02 | PCWP |            |
|   4 |     PX RECEIVE                        |            |  Q1,02 | PCWP |            |
|   5 |      PX SEND HASH                     | :TQ10001   |  Q1,01 | P->P | HASH       |
|   6 |       HASH GROUP BY                   |            |  Q1,01 | PCWP |            |
|*  7 |        HASH JOIN                      |            |  Q1,01 | PCWP |            |
|   8 |         BUFFER SORT                   |            |  Q1,01 | PCWC |            |
|   9 |          PX RECEIVE                   |            |  Q1,01 | PCWP |            |
|  10 |           PX SEND BROADCAST           | :TQ10000   |        | S->P | BROADCAST  |
|  11 |            TABLE ACCESS BY INDEX ROWID| SKEW2      |        |      |            |
|* 12 |             INDEX RANGE SCAN          | SKEW2_COL1 |        |      |            |
|  13 |         PX BLOCK ITERATOR             |            |  Q1,01 | PCWC |            |
|* 14 |          TABLE ACCESS STORAGE FULL    | SKEW3      |  Q1,01 | PCWP |            |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   7 - access("A"."PK_COL"="B"."PK_COL")
  12 - access("B"."COL1"=1)
  14 - storage(:Z>=:Z AND :Z<=:Z AND SYS_OP_BLOOM_FILTER(:BF0000,"A"."PK_COL"))
       filter(SYS_OP_BLOOM_FILTER(:BF0000,"A"."PK_COL"))


36 rows selected.

In this listing we used a hidden parameter, _BLOOM_PREDICATE_PUSHDOWN_TO_STORAGE, to disable this feature for comparison purposes. Notice that our test query ran in about 2 minutes with Offloading and about 11.5 minutes without. If you look closely at the Predicate Information of the plans, you will see that for the second run the SYS_OP_BLOOM_FILTER(:BF0000,"A"."PK_COL") predicate appears in the storage() clause, meaning it was evaluated on the storage servers. The offloaded version ran faster because the storage cells were able to apply the bloom filter and discard most of the non-matching rows, which eliminated a large amount of data that would otherwise have been transferred back to the database servers.
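Note that whether a bloom filter gets created at all is an optimizer decision made during parallel hash join processing; the documented PX_JOIN_FILTER and NO_PX_JOIN_FILTER hints can be used to encourage or suppress the filter when you are experimenting. A hypothetical variant of our test query might look like the following (the hints are shown for illustration only; they influence bloom filter creation, not whether the filter is pushed to storage):

select /*+ px_join_filter(a) use_hash(a b) */ a.col2, sum(a.col1)
from kso.skew3 a, kso.skew2 b
where a.pk_col = b.pk_col
and b.col1 = 1
group by a.col2;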

Function Offloading

Oracle’s implementation of SQL includes many built-in SQL functions. These functions can be used directly in SQL statements. They may be divided into two main groups: single-row functions and multi-row functions. Single-row functions return a single result row for every row of a queried table. These single-row functions can be further subdivided into the following general categories:

  • Numeric functions (SIN, COS, FLOOR, MOD, LOG, …)
  • Character functions (CHR, LPAD, REPLACE, TRIM, UPPER, LENGTH, …)
  • Datetime functions (ADD_MONTHS, TO_CHAR, TRUNC, …)
  • Conversion functions (CAST, HEXTORAW, TO_CHAR, TO_DATE, …)

Virtually all of these single-row functions can be offloaded to Exadata storage. The second major group of SQL functions operate on a set of rows. There are two subgroups in this multi-row function category:

  • Aggregate functions (AVG, COUNT, SUM, …)
  • Analytic functions (AVG, COUNT, DENSE_RANK, LAG, …)

These functions return either a single row (aggregate functions) or multiple rows (analytic functions). Note that some of the functions are overloaded and belong to both groups. None of these functions can be offloaded to Exadata, which makes sense, because many of them require access to the entire set of rows, and no individual storage cell has access to the complete set.

There are some additional functions that don’t fall neatly into any of the previously described groupings. These functions are a mixed bag in terms of Offloading. For example, DECODE and NVL are offloadable, but the XML functions are not. Some of the Data Mining functions are offloadable and some are not. Also keep in mind that the list of offloadable functions may change as newer versions are released. The definitive list of which functions are offloadable for your particular version is contained in V$SQLFN_METADATA.

SYS@SANDBOX> select distinct name, version, offloadable
  2  from V$SQLFN_METADATA
  3  order by 1,2;

NAME                           VERSION      OFF
------------------------------ ------------ ---
!=                             SQL/DS       YES
!=                             V6 Oracle    YES
<                              SQL/DS       YES
<                              V6 Oracle    YES
<=                             SQL/DS       YES
<=                             V6 Oracle    YES
=                              SQL/DS       YES
=                              V6 Oracle    YES
>                              SQL/DS       YES
>                              V6 Oracle    YES
>=                             SQL/DS       YES
>=                             V6 Oracle    YES
ABS                            V10 Oracle   YES
ABS                            V10 Oracle   YES
ABS                            V6 Oracle    YES
ACOS                           V10 Oracle   YES
ACOS                           V73 Oracle   YES
. . .
VSIZE                          V6 Oracle    YES
WIDTH_BUCKET                   V82 Oracle   NO
XMLCAST                        V11R1 Oracle NO
XMLCDATA                       V10 Oracle   NO
XMLCOMMENT                     V10 Oracle   NO
XMLCONCAT                      V92 Oracle   NO
XMLDIFF                        V11R1 Oracle NO
XMLEXISTS2                     V11R1 Oracle NO
XMLISNODE                      V92 Oracle   NO
XMLISVALID                     V92 Oracle   NO
XMLPATCH                       V11R1 Oracle NO
XMLQUERY                       V10 Oracle   NO
XMLTOOBJECT                    V11R1 Oracle NO
XMLTRANSFORM                   V92 Oracle   NO
XMLTRANSFORMBLOB               V10 Oracle   NO
XS_SYS_CONTEXT                 V11R1 Oracle NO

921 rows selected.

Offloading functions does allow the storage cells to do some of the work that would normally be done by the CPUs on the database servers. However, the savings in CPU usage is generally a relatively minor benefit. The big gain usually comes from limiting the amount of data transferred back to the database servers. Being able to evaluate functions contained in WHERE clauses allows storage cells to send only rows of interest back to the database tier. So as with most Offloading, the primary goal of this optimization is to reduce the amount of traffic between the storage and database tiers.
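A simple way to check whether a function in a WHERE clause was actually pushed to the storage layer is to look for it in the storage() entry of the Predicate Information section, just as we did earlier for the projection data. The following is only a sketch against the SKEW table used throughout this chapter (output not shown); if UPPER is offloaded, it will appear in the storage() predicate for the full scan operation.

select count(*) from kso.skew where upper(col2) = 'ASDDSADASD';

select * from table(dbms_xplan.display_cursor());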

Compression/Decompression

One Exadata feature that has received quite a bit of attention is Hybrid Columnar Compression (HCC). Exadata offloads the decompression of data stored in HCC format during Smart Scan operations. That is, columns of interest are decompressed on the storage cells when the compressed data is accessed via Smart Scans. This decompression is not necessary for filtering, so only the data that will be returned to the database tier will be decompressed. Note that all compression is currently done at the database tier, however. Decompression may also be done at the database tier when data is not accessed via a Smart Scan. So to make it simple, Table 2-1 shows where the work is done.

Table 2-1. Where Compression and Decompression Are Performed

Operation        Accessed via Smart Scan      Accessed without Smart Scan
---------------  ---------------------------  ---------------------------
Compression      Database servers             Database servers
Decompression    Storage servers              Database servers

Decompressing data at the storage tier runs counter to the theme of most of the other Smart Scan optimizations. Most of them are geared to reducing the volume of data to be transported back to the database servers. Because decompression is such a CPU-intensive task, particularly with the higher levels of compression, the decision was made to do the decompression on the storage servers. This decision is not locked in stone, however, as in some situations there may be ample CPU resources available to make decompressing data on the database servers an attractive option (that is, in some situations the reduction in data to be shipped may outweigh the reduction in database server CPU consumption). In fact, as of cellsrv version 11.2.2.3.1, Exadata does have the ability to return compressed data to the database servers when the storage cells are busy.

Kevin Says: The authors are correct to surmise that the delineation of responsibility between the database grid and the storage grid is quite fluid—at least from an architectural standpoint. Allow me to explain. During Smart Scan processing, Exadata Storage Server software performs offload processing (such as filtration, projection, and decompression of EHCC data) on 1MB chunks of data. In true Smart Scan form, the product of Smart Scan (filtered and projected data) flows over iDB into the PGA of the requesting process in the database grid. However, there have always been conditions where Smart Scan is forced to halt intelligent processing within one of these 1MB chunks and return data in original block form. One such case is when Smart Scan encounters a chained row during filtration. Only the database grid can determine the location of the block that contains the chained row. Even if cells were given the intelligence to determine the locale of the chained row, it would most likely not be on the same cell; and cells have no direct communication paths among themselves, anyway. So, with this example in mind, it is easy to see that Smart Scan can revert to a block server when it needs to. For that matter, Smart Scan can revert to a block server when it wants to. I expect the shelf-life of this book to outlive some of the rigid descriptions we’ve given regarding roles and the delineation of responsibilities between the database grid and the storage grid of the Exadata Database Machine. Consider a scenario where the cells are servicing a moderately selective query of minimal complexity (for example, few or no joins, and light aggregation and sorting) against deeply compressed data. Given these parameters, the processor utilization would skew heavily toward the storage grid, as the cost of filtering, projecting, and decompressing the data is much greater than the effort being expended in the database grid. We know Smart Scan can revert to a block server when it needs to. The scenario I just described may indeed be a case when Smart Scan would want to revert to block server for at least some of its payload. Indeed, it’s better not to broker work to fully saturated processors if there are other processors near idle.

By the way, there is a hidden parameter that controls whether decompression will be offloaded at all. Unfortunately, it doesn’t just move the decompression back and forth between the storage and database tiers. If the _CELL_OFFLOAD_HYBRIDCOLUMNAR parameter is set to a value of FALSE, Smart Scans will be completely disabled on HCC data.

Encryption/Decryption

Encryption and decryption are handled in a manner very similar to compression and decompression of HCC data. Encryption is always done at the database tier, while decryption can be done by the storage servers or by the database servers. When encrypted data is accessed via Smart Scan, it is decrypted on the storage servers. Otherwise, it is decrypted on the database servers. Note that the X2-2 and X2-8 platforms both use Intel Xeon Westmere chips in the storage servers (the X2-2 uses the same chips in the database servers, by the way). These chips contain a special instruction set (Intel AES-NI) that effectively adds a hardware boost to processes doing encryption or decryption. Note that Oracle Database Release 11.2.0.2 is necessary to take advantage of the new instruction set.

Encryption and HCC compression work well together. Since compression is done first, there is less work for the processes doing encryption and decryption on HCC data. Note that the CELL_OFFLOAD_DECRYPTION parameter controls this behavior and that, as with the hidden parameter _CELL_OFFLOAD_HYBRIDCOLUMNAR, setting it to a value of FALSE completely disables Smart Scans on encrypted data, which in turn disables decryption at the storage layer.
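You can confirm how the parameter is set on a given database with a quick SQL*Plus check (its default value is TRUE):

show parameter cell_offload_decryption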

Virtual Columns

Virtual columns provide the ability to define pseudo-columns that can be calculated from other columns in a table, without actually storing the calculated value. Virtual columns may be used as partition keys, used in constraints, or indexed. Column level statistics can also be gathered on them. Since the values of virtual columns are not actually stored, they must be calculated on the fly when they are accessed. These calculations can be offloaded when access is via Smart Scans.

SYS@SANDBOX1> alter table kso.temp_skew add col1_plus_pk as (col1+pk_col);

Table altered.

SYS@SANDBOX1> select col1_plus_pk from kso.temp_skew where rownum < 10;

COL1_PLUS_PK
------------
    27998260
    27998258
    27998256
    27998254
    27998252
    27998250
    27998248
    27998246
    27998244

9 rows selected.

SYS@SANDBOX1> select count(*) from kso.temp_skew where col1_plus_pk=27998244;

  COUNT(*)
----------
         2


SYS@SANDBOX> @fsx4
Enter value for sql_text: select count(*) from kso.temp_skew where col1_plus_pk=27998244
Enter value for sql_id:

SQL_ID         CHILD OFFLOAD     AVG_ETIME IO_SAVED_% SQL_TEXT
------------- ------ ------- ------------- ---------- --------------------
35tqjjq5vzg4b      0 Yes              1.14      99.99 select count(*) from

1 row selected.

SYS@SANDBOX1> @dplan
Enter value for sql_id: 35tqjjq5vzg4b
Enter value for child_no:

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------
SQL_ID  35tqjjq5vzg4b, child number 0
-------------------------------------
select count(*) from kso.temp_skew where col1_plus_pk=27998244

Plan hash value: 725706675

----------------------------------------------------------------------------------------
| Id  | Operation                  | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |           |       |       | 44804 (100)|          |
|   1 |  SORT AGGREGATE            |           |     1 |    13 |            |          |
|*  2 |   TABLE ACCESS STORAGE FULL| TEMP_SKEW |   320K|  4062K| 44804   (2)| 00:08:58 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - storage("COL1"+"PK_COL"=27998244)
       filter("COL1"+"PK_COL"=27998244)


20 rows selected.


SYS@SANDBOX1> alter session set "_cell_offload_virtual_columns"=false;

Session altered.

SYS@SANDBOX1> @flush_sql
Enter value for sql_id: 35tqjjq5vzg4b

PL/SQL procedure successfully completed.

SYS@SANDBOX1> select count(*) from kso.temp_skew where col1_plus_pk=27998244;

  COUNT(*)
----------
         2

1 row selected.

SYS@SANDBOX1> @fsx4
Enter value for sql_text: select count(*) from kso.temp_skew where col1_plus_pk=27998244
Enter value for sql_id:
Enter value for inst_id:

SQL_ID         CHILD OFFLOAD     AVG_ETIME IO_SAVED_% SQL_TEXT
------------- ------ ------- ------------- ---------- --------------------
35tqjjq5vzg4b      0 Yes              3.00      59.79 select count(*) from

1 row selected.

SYS@SANDBOX1>  alter session set "_cell_offload_virtual_columns"=true;

Session altered.

SYS@SANDBOX1> select count(*) from kso.temp_skew where col1_plus_pk=27998244;

  COUNT(*)
----------
         2

1 row selected.

SYS@SANDBOX1> @fsx4
Enter value for sql_text: select count(*) from kso.temp_skew where col1_plus_pk=27998244
Enter value for sql_id:
Enter value for inst_id:

SQL_ID         CHILD OFFLOAD     AVG_ETIME IO_SAVED_% SQL_TEXT
------------- ------ ------- ------------- ---------- --------------------
35tqjjq5vzg4b      0 Yes              3.00      59.79 select count(*) from
35tqjjq5vzg4b      0 Yes              1.19      99.99 select count(*) from

2 rows selected.

This example shows that virtual column evaluation can be offloaded. It also shows that the optimization can be controlled using the _CELL_OFFLOAD_VIRTUAL_COLUMNS parameter. Note that storage indexes are not built on virtual columns. As with Function Offloading, the real advantage of offloading virtual column calculations has more to do with reducing the volume of data returned to the database servers than with reducing CPU usage on the database tier.
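As an aside, if you are not sure which columns on a table are virtual in the first place, DBA_TAB_COLS exposes a VIRTUAL_COLUMN flag. Here is a minimal sketch using the table from the example above:

select column_name, data_type, virtual_column
from dba_tab_cols
where owner = 'KSO'
and table_name = 'TEMP_SKEW'
and virtual_column = 'YES';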

Data Mining Model Scoring

Some of the Data Mining model scoring functions can be offloaded. Generally speaking, this optimization is aimed at reducing the amount of data transferred to the database tier as opposed to pure CPU offloading. As with other Function Offloading, you can verify which Data Mining functions can be offloaded by querying V$SQLFN_METADATA. The output looks like this:

SYS@SANDBOX> select distinct name, version, offloadable
  2  from V$SQLFN_METADATA
  3  where name like 'PREDICT%'
  4  order by 1,2;

NAME                           VERSION      OFFLOADABLE
------------------------------ ------------ -----------
PREDICTION                     V10R2 Oracle YES
PREDICTION_BOUNDS              V11R1 Oracle NO
PREDICTION_COST                V10R2 Oracle YES
PREDICTION_DETAILS             V10R2 Oracle NO
PREDICTION_PROBABILITY         V10R2 Oracle YES
PREDICTION_SET                 V10R2 Oracle NO

6 rows selected.

As you can see, some of the functions are offloadable and some are not. The ones that are offloadable can be used by the storage cells for Predicate Filtering. Here’s an example query that should only return records that meet the scoring requirement specified in the WHERE clause:

select cust_id
from customers
where region = 'US'
and prediction_probability(churnmod,'Y' using *) > 0.8;

This optimization is designed to offload CPU usage as well as reduce the volume of data transferred. However, it is most beneficial in situations where it can reduce the data returned to the database tier, such as in the previous example.

Non-Smart Scan Offloading

There are a few optimizations that are not related to query processing. As these are not the focus of this chapter we will only touch on them briefly.

Smart File Creation

This optimization has a somewhat misleading name. It really is an optimization designed to speed up block initialization. Whenever blocks are allocated, the database must initialize them. This activity happens when tablespaces are created, but it also occurs when files are extended for any number of other reasons. On non-Exadata storage, these situations require the database server to format each block and then write it back to disk. All that reading and writing causes a lot of traffic between the database servers and the storage cells. As you are now aware, eliminating traffic between the layers is a primary goal of Exadata. So as you might imagine, this totally unnecessary traffic has been eliminated. Blocks are formatted by the storage cells themselves without having to send them to the database servers. Time spent waiting on this activity is recorded by the Smart File Creation wait event. This wait event and the operations that invoke it are covered in more detail in Chapter 10.
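The kinds of statements that exercise this optimization are ordinary space management operations. The sketch below is hypothetical (the tablespace name and the +DATA disk group are just examples); the accompanying query shows one way to see whether the associated wait event has been recorded on your system.

create tablespace demo_big_ts datafile '+DATA' size 10g;

select event, total_waits, time_waited
from v$system_event
where event like 'cell smart file creation%';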

RMAN Incremental Backups

Exadata speeds up incremental backups by increasing the granularity of block change tracking. On non-Exadata platforms, block changes are tracked for groups of blocks; on Exadata, changes are tracked for individual blocks. This can significantly decrease the number of blocks that must be backed up, resulting in smaller backup sizes, less I/O bandwidth, and reduced time to complete incremental backups. This feature can be disabled by setting the _DISABLE_CELL_OPTIMIZED_BACKUPS parameter to a value of TRUE. This optimization is covered in Chapter 9.
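Remember that block change tracking still has to be enabled at the database level before incremental backups can take advantage of the tracking data, on Exadata or otherwise. Here is a minimal sketch (this assumes DB_CREATE_FILE_DEST is set; otherwise add a USING FILE clause to name the tracking file):

alter database enable block change tracking;

Once tracking is in place, an ordinary RMAN BACKUP INCREMENTAL LEVEL 1 DATABASE is all that is needed; the finer granularity described above is handled transparently, and Chapter 9 walks through the details.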

RMAN Restores

This optimization speeds up the file initialization portion when restoring from backup on a cell. Although restoring databases from backups is rare, this optimization can also help speed up cloning of environments. The optimization reduces CPU usage on the database servers and reduces traffic between the two tiers. If the _CELL_FAST_FILE_RESTORE parameter is set to a value of FALSE, this behavior will be disabled. This optimization is also covered in Chapter 9.

Smart Scan Prerequisites

Smart Scans do not occur for every query run on Exadata. There are three basic requirements that must be met for Smart Scans to occur:

  • There must be a full scan of an object.
  • The scan must use Oracle’s Direct Path Read mechanism.
  • The object must be stored on Exadata storage.

There is a simple explanation as to why these requirements exist. Oracle is a C program. The function that performs Smart Scans (kcfis_read) is called by the direct path read function (kcbldrget), which is called by one of the full scan functions. It’s that simple. You can’t get to the kcfis_read function without traversing the code path from full scan to direct read. And of course, the storage will have to be running Oracle’s software in order to process Smart Scans.

We’ll discuss each of these requirements in turn.

Full Scans

In order for queries to take advantage of Exadata’s Offloading capabilities, the optimizer must decide to execute a statement with a Full Table Scan or a Fast Full Index Scan. Note that we are using these terms somewhat generically. Generally speaking, these terms correspond to TABLE ACCESS FULL and INDEX FAST FULL SCAN operations of an execution plan. With Exadata, these familiar operations have been renamed slightly to show that they are accessing Exadata storage. The new operation names are TABLE ACCESS STORAGE FULL and INDEX STORAGE FAST FULL SCAN. Note that there are also some minor variations of these operations, such as MAT_VIEW ACCESS STORAGE FULL, that also qualify for Smart Scans. You should, however, be aware that the fact that your execution plan shows a TABLE ACCESS STORAGE FULL operation does not mean that your query was performed with a Smart Scan. It merely means that this prerequisite has been satisfied. We’ll discuss how to verify whether a statement was actually Offloaded via a Smart Scan a little later in this chapter.

Direct Path Reads

In addition to requiring Full Scan operations, Smart Scans also require that the read operations be executed via Oracle’s Direct Path Read mechanism. Direct Path Reads have been around for a long time. Traditionally, this read mechanism has been used by slave processes, which service parallel queries. Because parallel queries were originally expected to be used for accessing very large amounts of data (typically much too large to fit in the Oracle buffer cache), it was decided that the parallel slaves should read data directly into their own memory (also known as the program global area or PGA). The Direct Path Read mechanism completely bypasses the standard Oracle caching mechanism of placing blocks in the buffer cache. This was a very good thing for very large data sets, as it eliminated extra work that was expected to not be helpful (caching full table scan data that would probably not be reused) and kept them from flushing other data out of the cache. And as we previously mentioned, the kcfis (kernel file intelligent storage) functions are buried under the kcbldrget (kernel block direct read get) function. Therefore, Smart Scans can only be performed if the direct path read mechanism is being used.

In addition to parallel slaves, direct path reads are possible for non-parallel SQL statements, when conditions are right. There is a hidden parameter, _SERIAL_DIRECT_READ, which controls this feature. When this parameter is set to its default value (AUTO), Oracle automatically determines whether to use direct path reads for non-parallel scans. The calculation is based on several factors, including the size of the object, the size of the buffer cache, and how many of the object’s blocks are already in the buffer cache. There is also a hidden parameter (_SMALL_TABLE_THRESHOLD), which plays a role in determining how big a table must be before it will be considered for serial direct path reads. The algorithm for determining whether to use the direct path read mechanism on non-parallel scans is not published. While the ability to do serial direct path reads has been around for some time, it has only recently become a relatively common occurrence. Oracle Database 11gR2 has a modified version of the calculations used to determine whether to use direct path reads for non-parallel scans. The new modifications to the algorithm make the direct path read mechanism much more likely to occur than it was in previous versions. This was probably done as a result of Exadata’s Smart Scan optimizations and the desire for them to be triggered whenever possible. The algorithm may be somewhat overly aggressive on non-Exadata platforms.

Note: My Oracle Support Note 793845.1 contains the following statement:

There have been changes in 11g in the heuristics to choose between direct path reads or reads through buffer cache for serial table scans. In 10g, serial table scans for “large” tables used to go through cache (by default) which is not the case anymore. In 11g, this decision to read via direct path or through cache is based on the size of the table, buffer cache size and various other stats. Direct path reads are faster than scattered reads and have less impact on other processes because they avoid latches.
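Because the decision is made at run time, the easiest way to tell whether a serial scan actually used direct path reads is to look at the wait events recorded for the session that ran it. Here is a minimal sketch against your own session; on Exadata, a direct path scan that also qualified for offload will show up under cell smart table scan rather than direct path read, while a buffered scan shows neither of these events.

select event, total_waits, time_waited
from v$session_event
where sid = (select sid from v$mystat where rownum = 1)
and event in ('direct path read', 'cell smart table scan');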

Exadata Storage

Of course the data being scanned must be stored on Exadata storage in order for Smart Scans to occur. It is possible to create ASM disk groups that access non-Exadata storage on Exadata database servers. And of course it makes sense that any SQL statements accessing objects defined using these non-Exadata diskgroups will not be eligible for Offloading. While it is unusual, it is also possible to create ASM diskgroups using a combination of Exadata and non-Exadata storage. This might be done to facilitate a migration via an ASM rebalance, for example. Queries against objects whose segments reside on these mixed storage diskgroups are also not eligible for Offloading. There is actually an attribute assigned to ASM disk groups (cell.smart_scan_capable) that specifies whether a disk group is capable of processing Smart Scans. This attribute must be set to FALSE before non-Exadata storage can be assigned to an ASM disk group. Here’s a listing showing the process of creating a mixed storage disk group and the effects on queries against a table with segments stored in that disk group. Note that we are jumping ahead in this example, so some of the commands may not make a lot of sense yet. Don’t worry; we’ll cover the details in due time. For now we just want you to see that you can access non-Exadata storage, but that it will disable the Exadata storage-based optimizations.

SYS@+ASM> -- Add non-Exadata storage

SYS@+ASM> alter diskgroup SMITHERS add failgroup LOCAL disk '/dev/raw/raw5','/dev/raw/raw6';
alter diskgroup SMITHERS add failgroup LOCAL disk '/dev/raw/raw5','/dev/raw/raw6'
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15285: disk '/dev/raw/raw5' violates disk group attribute cell.smart_scan_capable
ORA-15285: disk '/dev/raw/raw6' violates disk group attribute cell.smart_scan_capable


SYS@+ASM> alter diskgroup smithers set attribute 'cell.smart_scan_capable' = 'FALSE';

Diskgroup altered.

SYS@+ASM> alter diskgroup SMITHERS add failgroup LOCAL disk '/dev/raw/raw5','/dev/raw/raw6';

Diskgroup altered.

SYS@+ASM> select name, total_mb from v$asm_diskgroup where state='MOUNTED'

NAME                                               TOTAL_MB
-------------------------------------------------- --------
SMITHERS                                            512,000
SMITHERS_LOCAL                                        1,562


SYS@+ASM> select g.name "diskgroup", d.path "disk", d.failgroup "failgroup", d.total_mb "disk size" from v$asm_diskgroup g, v$asm_disk d where g.group_number=d.group_number and g.state='MOUNTED'

diskgroup       disk                                  failgroup               disk size
--------------- ------------------------------------- ----------------------- ---------
SMITHERS        /dev/raw/raw5                         LOCAL                   102,400
SMITHERS        /dev/raw/raw6                         LOCAL                   102,400
SMITHERS        o/192.168.12.3/SMITHERS_CD_05_cell01  ENKCEL01                102,400
SMITHERS        o/192.168.12.4/SMITHERS_CD_05_cell02  ENKCEL02                102,400
SMITHERS        o/192.168.12.5/SMITHERS_CD_05_cell03  ENKCEL03                102,400
SMITHERS_LOCAL  /dev/raw/raw1                         SMITHERS_LOCAL_0000     781
SMITHERS_LOCAL  /dev/raw/raw2                         SMITHERS_LOCAL_0001     781

7 rows selected.

We started out with a diskgroup on Exadata storage that contained a table compressed with HCC. When we initially tried to add the non-Exadata storage to that disk group, we got an error saying we had violated the cell.smart_scan_capable disk group attribute. Once we changed the attribute, setting it to FALSE, we were able to add the non-Exadata storage (of course, changing this setting disables Smart Scans on any object with segments stored in this disk group). We then logged onto the database and tried to access our compressed table:

SYS@SMITHERS> select table_name, compression, compress_for
  2  from dba_tables where owner='ACOLVIN';

TABLE_NAME                     COMPRESS COMPRESS_FOR
------------------------------ -------- ------------
SKEW3                          ENABLED  QUERY HIGH

SYS@SMITHERS> @table_size
Enter value for owner: ACOLVIN
Enter value for table_name: SKEW3
Enter value for type: TABLE

OWNER                SEGMENT_NAME                   TYPE               TOTALSIZE_MEGS TS
-------------------- ------------------------------ ------------------ -------------- -----
ACOLVIN              SKEW3                          TABLE                     1,020.0 USERS
                                                                       --------------
sum                                                                           1,020.0

SYS@SMITHERS> select count(*) from acolvin.skew3 where col1<0;
select count(*) from acolvin.skew3 where col1<0
                             *
ERROR at line 1:
ORA-64307: hybrid columnar compression is only supported in tablespaces residing on Exadata
storage

SYS@SMITHERS> alter table acolvin.skew3 move nocompress;

Table altered.

SYS@SMITHERS> select /*+ parallel (a 8) */ count(*) from acolvin.skew3 a;

  COUNT(*)
----------
 384000048

Elapsed: 00:03:24.64
SYS@SMITHERS> @fsx4
Enter value for sql_text: select /*+ parallel (a 8) */ count(*) from acolvin.skew3 a
Enter value for sql_id:

SQL_ID         CHILD OFFLOAD IO_SAVED_%     AVG_ETIME SQL_TEXT
------------- ------ ------- ---------- ------------- --------------------
5y9jm9pfbrg7q      0 No             .00        204.58 select /*+ parallel

1 row selected.

SYS@SMITHERS> @table_size
Enter value for owner: ACOLVIN
Enter value for table_name: SKEW3
Enter value for type: TABLE

OWNER                SEGMENT_NAME                   TYPE               TOTALSIZE_MEGS TS
-------------------- ------------------------------ ------------------ -------------- -----
ACOLVIN              SKEW3                          TABLE                    13,814.0 USERS
                                                                       --------------
sum                                                                          13,814.0

1 row selected.

So adding the non-Exadata storage also disabled HCC. We had to decompress the table simply to access it. Once this was done we executed a parallel query on the table and, as you can see, the query was not Offloaded. So all that was to show that you need Exadata storage to do Smart Scans.
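Incidentally, if you just want to check whether an existing disk group still allows Smart Scans, the attribute is visible in V$ASM_ATTRIBUTE when you are connected to the ASM instance. Here’s a simple sketch:

select g.name diskgroup, a.value smart_scan_capable
from v$asm_diskgroup g, v$asm_attribute a
where g.group_number = a.group_number
and a.name = 'cell.smart_scan_capable';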

Smart Scan Disablers

There are situations where Smart Scans are effectively disabled. The simple case is where they have not been enabled in the code yet, and so Smart Scans don’t happen at all. There are other cases where Oracle starts down the Smart Scan path but the storage software either decides, or is forced, to revert to block shipping mode. Generally this decision is made on a block-by-block basis.

Simply Unavailable

During the discussion of Smart Scan optimizations we have covered the prerequisites that must be met to enable Smart Scans. However, even when those conditions are met, there are circumstances that prevent Smart Scans. Here are a few other situations that are not related to specific optimizations, but where Smart Scans simply cannot be used (at least as of cellsrv version 11.2.2.2.0).

  • Smart Scans cannot be used on clustered tables.
  • Smart Scans cannot be used on Index Organized Tables (IOTs).
  • Smart Scans cannot be used on tables with ROWDEPENDENCIES enabled.
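If you want to know ahead of time whether any tables in a schema fall into one of these categories, the relevant flags are all exposed in DBA_TABLES. Here’s a quick sketch (the schema name is just an example):

select table_name, iot_type, cluster_name, dependencies
from dba_tables
where owner = 'ACOLVIN'
and (iot_type is not null
     or cluster_name is not null
     or dependencies = 'ENABLED');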

Reverting to Block Shipping

There are situations where Smart Scans are used, but for various reasons cellsrv reverts to block shipping mode. This is a very complex topic, and we struggled with whether to include it in an introductory chapter on offloading. But it is a fundamental concept and so in the end we decided to discuss it here.

We’ve described how Smart Scans avoid transferring large amounts of data to the database layer by returning prefiltered data directly to the PGA. The key concept to understand here is that Smart Scans can choose (or be forced) to return complete blocks to the SGA. Basically any situation that would cause Oracle to have to read another block to complete a record will cause this to happen. A chained row is perhaps the simplest example. When Oracle encounters a chained row, the head piece will contain a pointer to the block containing the second row piece. Since the storage cells do not communicate directly with each other, and it is unlikely that the chained block resides on the same storage cell, cellsrv simply ships the entire block and allows the database layer to deal with it.

So in this very simple case, the Smart Scan is paused momentarily, and a single block read is effectively performed, which motivates another single block read to get the additional row piece. Keep in mind that this is a very simple case. This same behavior comes into play when Oracle must deal with read consistency issues. For example, if Oracle notices that a block is “newer” than the current query, the process of finding an age-appropriate version of the block is left for the database layer to deal with. This effectively pauses the Smart Scan processing while the database does its traditional read consistency processing.

So is this really important, and why should you care? The answer, of course, is that it depends. In most cases you probably won’t care. Oracle guarantees that reads will be consistent, even when doing Smart Scans. This is an important point, by the way. The fact that Oracle behaves exactly the same from the application standpoint, regardless of whether Smart Scans are used, is a big deal. The fact that Oracle may do some single block reads along with its Smart Scan is of little concern if the results are correct and the performance is not severely impacted, and in most cases it won’t be. There are cases, though, where choosing to do a Smart Scan and then reverting to block shipping mode can be painful from a performance standpoint. These are the cases where it’s important to understand what’s going on under the covers. You’ll find more information on this issue in Chapter 16.
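One rough way to spot this fallback on a running system is to look at the mix of wait events recorded for the statement. The query below is only a sketch, and because ASH data is sampled it can easily miss short-lived events, but a statement that shows both cell smart table scan and cell single block physical read waits is a hint that some blocks are being shipped back for processing at the database layer:

select event, count(*) samples
from v$active_session_history
where sql_id = '&sql_id'
and event is not null
group by event
order by samples desc;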

Skipping Some Offloading

Another very complex behavior that we will only mention briefly is the ability of cellsrv to refuse to do some of the normal offload processing. This can be done to avoid overloading the CPU resources on the storage cells, for example. A good example of this behavior occurs when decompressing HCC data. Decompression is an extremely CPU-intensive task, especially for the higher levels of compression. In later versions of the storage software (11.2.2.3.0 and later), cellsrv can choose to skip the decompression step on some portion of the data when the CPUs on the storage cells are very busy and the database hosts’ CPUs are not. This effectively moves some of the workload back to the database tier by forcing the database hosts to do the decompression. In this case, some steps such as projection may still be done on the storage cells, despite the fact that the decompression step is skipped. As Kevin has already pointed out, this selective determination of which offload processing to do can make the boundaries between the tiers very fluid and difficult to monitor. Chapter 12 covers some techniques for monitoring this behavior.
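There is also a hidden parameter, _KCFIS_CELL_PASSTHRU_ENABLED (it appears in the parameter listings later in this chapter), that tells the cells to skip Smart Scan filtering altogether. On a test system it can be used to mimic this pass-through behavior and measure its cost; as with all hidden parameters, don’t set it on a production system without talking to Oracle Support first. A minimal sketch:

-- test systems only: reads still use the smart I/O path, but the cells
-- skip predicate filtering and ship the data back unfiltered
alter session set "_kcfis_cell_passthru_enabled"=true;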

How to Verify That Smart Scan is Happening

One of the most important things you can learn about Exadata is how to identify whether a query has been able to take advantage of Smart Scans. This is not as easy as it sounds. Unfortunately, the normal execution plan output produced by the DBMS_XPLAN package will not show you whether a Smart Scan was used or not. Here’s an example:
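For reference, the plan output shown below was pulled from the shared pool with DBMS_XPLAN.DISPLAY_CURSOR; a call along the following lines, using the SQL_ID and child number shown in the output, produces this kind of report:

select * from table(dbms_xplan.display_cursor('05cq2hb1r37tr', 0));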

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
SQL_ID  05cq2hb1r37tr, child number 0
-------------------------------------
select avg(pk_col) from kso.skew a where col1 > 0

Plan hash value: 568322376

-----------------------------------------------------------------------------------
| Id  | Operation                  | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |      |       |       | 44486 (100)|          |
|   1 |  SORT AGGREGATE            |      |     1 |    11 |            |          |
|*  2 |   TABLE ACCESS STORAGE FULL| SKEW |    32M|   335M| 44486   (1)| 00:08:54 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - storage("COL1">0)
       filter("COL1">0)

Notice that the optimizer chose a TABLE ACCESS STORAGE FULL operation and that the predicate section shows a storage() predicate associated with step 2 of the plan. Both of these characteristics indicate that a Smart Scan was possible, but neither provides a definitive verification. In fact, the statement in this listing was not executed with a Smart Scan.

Note An interesting feature of the plan output is worth mentioning. Notice that in the predicate section there is a storage() clause and there is a matching filter() clause that both perform the same comparison. We scratched our heads about this for a while, wondering whether these clauses represented separate parts of the plan or whether it was just a quirk of the XPLAN output. There is actually a very simple explanation for this behavior. The cells have to fall back to regular block I/O in a number of cases. When a Smart Scan hits a chained row, for example, that block must be returned in its entirety to the DB layer. Since there is no such thing as a guaranteed “pure” Smart Scan, a filter operation must be included in the plan in addition to the filtering provided by the storage cells (represented by the storage() clause). So the two predicates actually represent two distinct operations. Keep in mind, though, that they will not overlap. The filter() operation will be done on rows returned via the block shipping mode, while the storage() operation will be performed on the storage cells for the rows that can be returned directly to the PGA via the normal Smart Scan mechanism.

The fact that execution plans do not show whether a Smart Scan was performed is a bit frustrating. However, there are several techniques that we can use to work around this issue. We’ll cover a few options in the next several sections.

10046 Trace

One of the most straightforward ways to determine whether a Smart Scan was used is to enable a 10046 trace on the statement in question. Unfortunately, this is a bit cumbersome and doesn’t allow you to do any investigation into what has happened with past executions. Nevertheless, tracing is a fairly foolproof way to verify whether a Smart Scan was used or not. If Smart Scan was used, there will be CELL SMART TABLE SCAN or CELL SMART INDEX SCAN events in the trace file. Here is an excerpt from the trace file collected for the previous statement:

PARSING IN CURSOR #47387827351064 len=49 dep=0 uid=0 oct=3 lid=0 tim=1297219338278533
 hv=3279003447 ad='2c8743808' sqlid='05cq2hb1r37tr'
select avg(pk_col) from kso.skew a where col1 > 0
END OF STMT
PARSE #47387827351064:c=57991,e=78256,p=25,cr=199,cu=0,mis=1,r=0,dep=0,
og=1,plh=568322376,tim=12
EXEC #47387827351064:c=0,e=14,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=568322376,tim=1297
WAIT #47387827351064: nam='SQL*Net message to client' ela= 2 . . .
WAIT #47387827351064: nam='cell single block physical read' ela= 487 . . .
WAIT #47387827351064: nam='cell multiblock physical read' ela= 25262 . . .

*** 2011-02-08 20:42:19.106
WAIT #47387827351064: nam='cell multiblock physical read' ela= 20303 . . .
WAIT #47387827351064: nam='gc cr multi block request' ela= 493 . . .
WAIT #47387827351064: nam='gc cr multi block request' ela= 271 . . .
WAIT #47387827351064: nam='cell multiblock physical read' ela= 2550 . . .

*** 2011-02-08 20:42:20.107
WAIT #47387827351064: nam='cell multiblock physical read' ela= 3095 . . .
WAIT #47387827351064: nam='gc cr multi block request' ela= 548 . . .
WAIT #47387827351064: nam='gc cr multi block request' ela= 331 . . .
WAIT #47387827351064: nam='cell multiblock physical read' ela= 22930 . . .

Notice that there are no Smart Scan wait events in the trace file output. For comparison, here is a brief excerpt from a statement using Smart Scan:

PARSING IN CURSOR #2 len=32 dep=0 uid=0 oct=3 lid=0 hv=123 ad='196' sqlid='162wjnvwyybhn'
select sum(pk_col) from kso.skew
END OF STMT
PARSE #2:c=2000,e=2424,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=568322376
EXEC #2:c=0,e=34,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=568322376
WAIT #2: nam='SQL*Net message to client' ela= 3 driver id=1650815232 #bytes=1 p3=0 obj#=-1
WAIT #2: nam='ges message buffer allocation' ela= 2 pool=0 request=1 allocated=0 obj#=-1
WAIT #2: nam='KJC: Wait for msg sends to complete' ela= 10 msg=6674450368 dest|rcvr=65536
WAIT #2: nam='reliable message' ela= 1107 channel context=6712270872 channel handle=66967991
WAIT #2: nam='ges message buffer allocation' ela= 1 pool=0 request=1 allocated=0 obj#=-1
WAIT #2: nam='enq: KO - fast object checkpoint' ela= 104 name|mode=126 2=65575 0=1 obj#=-1
WAIT #2: nam='ges message buffer allocation' ela= 1 pool=0 request=1 allocated=0 obj#=-1
WAIT #2: nam='enq: KO - fast object checkpoint' ela= 103 name|mode=126 2=65575 0=2 obj#=-1
WAIT #2: nam='cell smart table scan' ela= 162 cellhash#=2133459483 p2=0 p3=0 obj#=66849
WAIT #2: nam='cell smart table scan' ela= 244 cellhash#=379339958 p2=0 p3=0 obj#=66849
WAIT #2: nam='cell smart table scan' ela= 181 cellhash#=3176594409 p2=0 p3=0 obj#=66849
WAIT #2: nam='cell smart table scan' ela= 1285 cellhash#=2133459483 p2=0 p3=0 obj#=66849
WAIT #2: nam='cell smart table scan' ela= 1327 cellhash#=379339958 p2=0 p3=0 obj#=66849
WAIT #2: nam='cell smart table scan' ela= 1310 cellhash#=3176594409 p2=0 p3=0 obj#=66849
WAIT #2: nam='cell smart table scan' ela= 19755 cellhash#=3176594409 p2=0 p3=0 obj#=66849
WAIT #2: nam='cell smart table scan' ela= 39 cellhash#=3176594409 p2=0 p3=0 obj#=66849

As you can see, this trace file contains several CELL SMART TABLE SCAN wait events. There is no doubt that this statement was offloaded. We cover Exadata-specific and -related wait events in detail in Chapter 10.
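If you want to generate a trace like these yourself, the 10046 event can be enabled at the session level. Here is one common form (level 8 includes wait events; level 12 adds bind variable values as well):

alter session set events '10046 trace name context forever, level 8';
-- run the statement of interest, then turn tracing off
alter session set events '10046 trace name context off';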

Performance Statistics (v$sesstat)

Of course, we can also look at some of the performance views such as V$SESSTAT and V$ACTIVE_SESSION_HISTORY. Tanel Poder’s Snapper script provides a great way to see what wait events are being generated while a statement is running; but again, you must catch it in the act to verify what’s happening. Active Session History (ASH) is good as well, but since the data is sampled, there is no guarantee that it will catch the wait event you are looking for. Nevertheless, performance statistics provide a reliable source of data as long as you can access the system during the execution of the statement you are investigating. Here’s an example using V$MYSTAT, which is simply a version of V$SESSTAT that limits data to your current session. For this example we’ll look at the cell scans statistic, which is incremented when a Smart Table Scan occurs:

SYS@dbm1> set echo on
SYS@dbm1> @mystats
SYS@dbm1> select name, value
  2  from v$mystat s, v$statname n
  3  where n.statistic# = s.statistic#
  4  and name like nvl('%&name%',name)
  5  order by 1
  6  /
Enter value for name: cell scans

NAME                                                                             VALUE
---------------------------------------------------------------------- ---------------
cell scans                                                                         833

1 row selected.

SYS@dbm1> set echo off
SYS@dbm1> select avg(pk_col) from kso.skew2 a where col1 > 0;

AVG(PK_COL)
-----------
 16093748.8

1 row selected.

SYS@dbm1> @mystats
Enter value for name: cell scan

NAME                                                                             VALUE
---------------------------------------------------------------------- ---------------
cell scans                                                                         834

1 row selected.

SYS@dbm1> alter session set cell_offload_processing=false;

Session altered.

SYS@dbm1> select avg(pk_col) from kso.skew2 a where col1 > 0;

AVG(PK_COL)
-----------
 16093748.8

1 row selected.

SYS@dbm1> @mystats
Enter value for name: cell scans

NAME                                                                             VALUE
---------------------------------------------------------------------- ---------------
cell scans                                                                         834

1 row selected.

So as you can see, the first time the statement was executed it was offloaded, and the cell scans statistic was incremented from 833 to 834. We then turned off Smart Scans and ran the statement again. This time the statistic was not incremented. So this approach works well as long as we can catch the statements of interest in action. Note that Oracle performance statistics are complementary to the Oracle wait event interface and provide information that is just not available elsewhere. We cover the Exadata-related statistics in detail in Chapter 11.
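The same technique works for a session other than your own by querying V$SESSTAT with the target session’s SID. Here’s a sketch; the second statistic listed is another Smart Scan-related counter that is often worth watching at the same time:

select sn.name, ss.value
from v$sesstat ss, v$statname sn
where ss.statistic# = sn.statistic#
and ss.sid = &sid
and sn.name in ('cell scans',
                'cell physical IO bytes eligible for predicate offload');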

Offload Eligible Bytes

There is another clue to whether a statement used a Smart Scan or not. As we’ve already mentioned, the V$SQL family of views contains a column called IO_CELL_OFFLOAD_ELIGIBLE_BYTES, which shows the number of bytes that are eligible for Offloading. This column can be used as an indicator of whether a statement used a Smart Scan. It appears that this column is set to a value greater than 0 only when a Smart Scan is used. We can make use of this observation to write a little script (fsx.sql) that returns a value of YES or NO depending on whether that column in V$SQL has a value greater than 0. The output of the script is a little too wide to fit in a book format, so we’ve used a couple of cut-down versions in our examples. And of course, all of the versions will be available in the online code repository. You’ve already seen the script in action in several of the previous sections. Here’s what’s inside the script and an example of its use:

SYS@SANDBOX1> !cat fsx.sql
----------------------------------------------------------------------------------------
--
-- File name:   fsx.sql
--
-- Purpose:     Find SQL and report whether it was Offloaded and % of I/O saved.
--
-- Usage:       This scripts prompts for two values.
--
--              sql_text: a piece of a SQL statement like %select col1, col2 from skew%
--
--              sql_id: the sql_id of the statement if you know it (leave blank to ignore)
--
-- Description:
--
--              This script can be used to locate statements in the shared pool and
--              determine whether they have been executed via Smart Scans.
--
--              It is based on the observation that the IO_CELL_OFFLOAD_ELIGIBLE_BYTES
--              column in V$SQL is only greater than 0 when a statement is executed
--              using a Smart Scan. The IO_SAVED_% column attempts to show the ratio of
--              data received from the storage cells to the actual amount of data
--              that would have had to be retrieved on non-Exadata storage. Note that
--              as of 11.2.0.2, there are issues calculating this value with some queries.
--
--              Note that the AVG_ETIME will not be accurate for parallel queries. The
--              ELAPSED_TIME column contains the sum of all parallel slaves. So the
--              script divides the value by the number of PX slaves used which gives an
--              approximation.
--
--              Note also that if parallel slaves are spread across multiple nodes on
--              a RAC database the PX_SERVERS_EXECUTIONS column will not be set.
--
---------------------------------------------------------------------------------------
set pagesize 999
set lines 190
col sql_text format a70 trunc
col child format 99999
col execs format 9,999
col avg_etime format 99,999.99
col "IO_SAVED_%" format 999.99
col avg_px format 999
col offload for a7

select sql_id, child_number child, plan_hash_value plan_hash, executions execs,
(elapsed_time/1000000)/decode(nvl(executions,0),0,1,executions)/
decode(px_servers_executions,0,1,px_servers_executions/
decode(nvl(executions,0),0,1,executions)) avg_etime,
px_servers_executions/decode(nvl(executions,0),0,1,executions) avg_px,
decode(IO_CELL_OFFLOAD_ELIGIBLE_BYTES,0,'No','Yes') Offload,
decode(IO_CELL_OFFLOAD_ELIGIBLE_BYTES,0,0,
100*(IO_CELL_OFFLOAD_ELIGIBLE_BYTES-IO_INTERCONNECT_BYTES)
/decode(IO_CELL_OFFLOAD_ELIGIBLE_BYTES,0,1,IO_CELL_OFFLOAD_ELIGIBLE_BYTES)) "IO_SAVED_%",
sql_text
from v$sql s
where upper(sql_text) like upper(nvl('&sql_text',sql_text))
and sql_text not like 'BEGIN :sql_text := %'
and sql_text not like '%IO_CELL_OFFLOAD_ELIGIBLE_BYTES%'
and sql_id like nvl('&sql_id',sql_id)
order by 1, 2, 3
/

SYS@SANDBOX1> select avg(pk_col) from kso.skew3 where col1 < 0;

AVG(PK_COL)
-----------
  1849142.5

Elapsed: 00:00:00.07
SYS@SANDBOX1> alter session set cell_offload_processing=false;

Session altered.

Elapsed: 00:00:00.00
SYS@SANDBOX1> select avg(pk_col) from kso.skew3 where col1 < 0;

AVG(PK_COL)
-----------
  1849142.5

Elapsed: 00:00:49.68
SYS@SANDBOX1> @fsx4
Enter value for sql_text: select avg(pk_col) from kso.skew3 where col1 < 0
Enter value for sql_id:

SQL_ID         CHILD OFFLOAD IO_SAVED_%  AVG_ETIME SQL_TEXT
------------- ------ ------- ---------- ---------- ---------------------------------
a6j7wgqf84jvg      0 Yes         100.00        .07 select avg(pk_col) from kso.skew3
a6j7wgqf84jvg      1 No             .00      49.68 select avg(pk_col) from kso.skew3

Elapsed: 00:00:00.04

In the fsx script you can see that the OFFLOAD column is just a DECODE that checks to see if the IO_CELL_OFFLOAD_ELIGIBLE_BYTES column is equal to 0 or not. The IO_SAVED_% column is calculated using the IO_INTERCONNECT_BYTES field, and it attempts to show the percentage of data that did not have to be transferred back to the database servers. The example shows the same statement run once with Smart Scan enabled and once with Smart Scan disabled. We used the CELL_OFFLOAD_PROCESSING parameter to turn Smart Scans on and off for this purpose. Changing this parameter caused the original cursor to be invalidated because of an optimizer mismatch. This resulted in two child cursors for the statement. In our example, the output of the script shows that one version of the statement was offloaded using Smart Scans and took less than a second. The second execution of the statement, though, was not Offloaded and took almost a minute.

The technique used by the fsx script seems to work pretty well most of the time. However, in situations where a single child cursor is used repeatedly, it is possible that some executions may be executed using Smart Scans while others are not. While this is an unusual situation, it can cause confusion because the IO_CELL_OFFLOAD_ELIGIBLE_BYTES column contains a cumulative value for all executions. That is to say that each execution adds its eligible bytes count to the running total. When some executions use Smart Scans and some don’t, the IO_CELL_OFFLOAD_ELIGIBLE_BYTES column will be greater than 0. This is a fairly rare occurrence and you may never run into it. Nevertheless, here is an example:

SYS@SANDBOX1> alter session set "_serial_direct_read"=true;

Session altered.

Elapsed: 00:00:00.01
SYS@SANDBOX1> -- execution 1
SYS@SANDBOX1> select avg(pk_col) from kso.skew a where col1 > 0;

AVG(PK_COL)
-----------
 16093748.8

1 row selected.

Elapsed: 00:00:03.51
SYS@SANDBOX1> @fsx3
Enter value for sql_text: %skew%
Enter value for sql_id:
SQL_ID         CHILD OFFLOAD EXECS ELIGIBLE_BYTES SQL_TEXT
------------- ------ ------- ----- -------------- --------------------
05cq2hb1r37tr      0 Yes         1       38797312 select avg(pk_col) f

1 row selected.

Elapsed: 00:00:00.01
SYS@SANDBOX1> alter session set "_serial_direct_read"=false;

Session altered.

Elapsed: 00:00:00.00
SYS@SANDBOX1> -- execution 2
SYS@SANDBOX1> select avg(pk_col) from kso.skew a where col1 > 0;

AVG(PK_COL)
-----------
 16093748.8

1 row selected.

Elapsed: 00:00:04.71
SYS@SANDBOX1> @fsx3
Enter value for sql_text: %skew%
Enter value for sql_id:

SQL_ID         CHILD OFFLOAD EXECS ELIGIBLE_BYTES SQL_TEXT
------------- ------ ------- ----- -------------- --------------------
05cq2hb1r37tr      0 Yes         2       38797312 select avg(pk_col) f

1 row selected.

Elapsed: 00:00:00.01
SYS@SANDBOX1>
SYS@SANDBOX1> alter session set "_serial_direct_read"=true;

Session altered.

Elapsed: 00:00:00.01
SYS@SANDBOX1> -- execution 3
SYS@SANDBOX1> select avg(pk_col) from kso.skew a where col1 > 0;

AVG(PK_COL)
-----------
 16093748.8

1 row selected.

Elapsed: 00:00:03.54
SYS@SANDBOX1> @fsx3
Enter value for sql_text: %skew%
Enter value for sql_id:
SQL_ID         CHILD OFFLOAD EXECS ELIGIBLE_BYTES SQL_TEXT
------------- ------ ------- ----- -------------- --------------------
05cq2hb1r37tr      0 Yes         3       58195968 select avg(pk_col) f

1 row selected.

Elapsed: 00:00:00.01

In this example we used the _SERIAL_DIRECT_READ parameter to disable Smart Scans. This parameter does not invalidate cursors, so the same cursor was used for all three executions of the statement. If you look at the second execution, you can probably guess that it didn’t use a Smart Scan, as it was slower than the previous execution. You can verify that assumption by noting that the Eligible Byte Count did not increase. However, the fsx script simply checks to see if the value of the IO_CELL_OFFLOAD_ELIGIBLE_BYTES column is greater than 0. As you can see, the value of the column remained the same between executions causing the fsx script to report that the statement was offloaded, even though the second execution was not offloaded. So keep in mind that our indicator is a cumulative value for all the executions of this cursor. Note that in the previous example we set up a rather unusual situation where a single cursor is offloaded on one execution and not on another. This rarely happens in real life.

The technique demonstrated by the fsx script provides a very useful alternative to tracing or using session statistics to verify whether Smart Scans are being performed. The biggest advantage is that you don’t have to try to catch Smart Scan in real time. The IO_CELL_OFFLOAD_ELIGIBLE_BYTES column is stored in V$SQL and related views, which means it’s also captured by AWR. This provides us with a historical view into how statements have been processed in the past. It is one of the main tools that we use for a quick-and-dirty verification of whether a Smart Scan has been used or not.

SQL Monitoring

There is one other tool that is very useful for determining whether a SQL statement was Offloaded. The REPORT_SQL_MONITOR function is part of the Real Time SQL Monitoring functionality that was added in Oracle Database 11g. It is built into the DBMS_SQLTUNE package and provides a great deal of information, not only on whether a statement was offloaded, but also on which steps in a plan were offloaded. Here’s an example (first of a statement that was not offloaded and then of the same statement when it was offloaded):

SYS@SANDBOX1> alter session set cell_offload_processing=false;

Session altered.

SYS@SANDBOX1> set echo off
SYS@SANDBOX1> @avgskew3

AVG(PK_COL)
-----------
 16093750.2

SYS@SANDBOX1> @fsx4
Enter value for sql_text: %skew3%
Enter value for sql_id:
SQL_ID         CHILD OFFLOAD IO_SAVED_%  AVG_ETIME SQL_TEXT
------------- ------ ------- ---------- ---------- ----------------------------------------
6uutdmqr72smc      0 No             .00      57.95 select /* avgskew3.sql */ avg(pk_col) fr

SYS@SANDBOX1> @report_sql_monitor
Enter value for sid:
Enter value for sql_id: 6uutdmqr72smc
Enter value for sql_exec_id:

REPORT
------------------------------------------------------------------------------------------
SQL Monitoring Report

SQL Text
------------------------------
select /* avgskew3.sql */ avg(pk_col) from kso.skew3 a where col1 > 0

Global Information
------------------------------
 Status              :  DONE (ALL ROWS)
 Instance ID         :  1
 Session             :  SYS (3:5465)
 SQL ID              :  6uutdmqr72smc
 SQL Execution ID    :  16777216
 Execution Started   :  03/15/2011 15:26:11
 First Refresh Time  :  03/15/2011 15:26:19
 Last Refresh Time   :  03/15/2011 15:27:09
 Duration            :  58s
 Module/Action       :  [email protected] (TNS V1-V3)/-
 Service             :  SYS$USERS
 Program             :  [email protected] (TNS V1-V3)
 Fetch Calls         :  1

Global Stats
==========================================================================================
| Elapsed |   Cpu   |    IO    | Application | Cluster  | Fetch | Buffer | Read  | Read  |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Waits(s) | Calls |  Gets  | Reqs  | Bytes |
==========================================================================================
|      58 |      35 |       23 |        0.00 |     0.00 |     1 |     2M | 15322 |  15GB |
==========================================================================================

SQL Plan Monitoring Details (Plan Hash Value=2684249835)
======================================================================================
| Id |          Operation           | Name  | ... | Activity |    Activity Detail    |
|    |                              |       |     |   (%)    |      (# samples)      |
======================================================================================
|  0 | SELECT STATEMENT             |       |     |          |                       |
|  1 |   SORT AGGREGATE             |       |     |    41.38 | Cpu (24)              |
|  2 |    TABLE ACCESS STORAGE FULL | SKEW3 |     |    58.62 | Cpu (11)              |
|    |                              |       |     |          | direct path read (23) |
======================================================================================

SYS@SANDBOX1> @ss_on
SYS@SANDBOX1> alter session set cell_offload_processing=true;

Session altered.

SYS@SANDBOX1> @avgskew3

AVG(PK_COL)
-----------
 16093750.2

SYS@SANDBOX1> @fsx4
Enter value for sql_text: %skew3%
Enter value for sql_id:

SQL_ID         CHILD OFFLOAD IO_SAVED_%  AVG_ETIME SQL_TEXT
------------- ------ ------- ---------- ---------- ----------------------------------------
6uutdmqr72smc      0 Yes          71.85      34.54 select /* avgskew3.sql */ avg(pk_col) fr

SYS@SANDBOX1> @report_sql_monitor
Enter value for sid:
Enter value for sql_id: 6uutdmqr72smc
Enter value for sql_exec_id:

REPORT
----------------------------------------------------------------------------------------
SQL Monitoring Report

SQL Text
------------------------------
select /* avgskew3.sql */ avg(pk_col) from kso.skew3 a where col1 > 0

Global Information
------------------------------
 Status              :  DONE (ALL ROWS)
 Instance ID         :  1
 Session             :  SYS (3:5467)
 SQL ID              :  6uutdmqr72smc
 SQL Execution ID    :  16777219
 Execution Started   :  03/15/2011 15:36:11
 First Refresh Time  :  03/15/2011 15:36:15
 Last Refresh Time   :  03/15/2011 15:36:45
 Duration            :  34s
 Module/Action       :  [email protected] (TNS V1-V3)/-
 Service             :  SYS$USERS
 Program             :  [email protected] (TNS V1-V3)
 Fetch Calls         :  1

Global Stats
=========================================================================================
| Elapsed |   Cpu   |    IO    | Application | Fetch | Buffer | Read  | Read  |  Cell   |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Calls |  Gets  | Reqs  | Bytes | Offload |
=========================================================================================
|      35 |      31 |     3.85 |        0.00 |     1 |     2M | 18422 |  15GB |  71.83% |
=========================================================================================

SQL Plan Monitoring Details (Plan Hash Value=2684249835)
============================================================================================
| Id |          Operation           | Name  | |  Cell   | Activity |      Activity Detail  |
|    |                              |       | | Offload |   (%)    |        (# samples)    |
============================================================================================
|  0 | SELECT STATEMENT             |       | |         |          |                       |
|  1 |   SORT AGGREGATE             |       | |         |    50.00 | Cpu (17)              |
|  2 |    TABLE ACCESS STORAGE FULL | SKEW3 | |  71.83% |    50.00 | Cpu (12)              |
|    |                              |       | |         |          | cell smart tab... (5) |
============================================================================================

Note that we cut out a number of columns from the report because it is very wide and doesn’t fit nicely in book format. Nevertheless, you can see that the report shows which steps were Offloaded (Cell Offload), where the statement spent its time (Activity %), and what it spent time doing (Activity Detail). This can be extremely useful with more complex statements that have multiple steps eligible for offloading. Also note that monitoring occurs automatically on parallelized statements and on statements that the optimizer anticipates will run for a long time. If Oracle is not automatically choosing to monitor a statement that is of interest, you can use the MONITOR hint to tell Oracle to monitor the statement.
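By the way, the report_sql_monitor script used in the listings is just a thin wrapper around the DBMS_SQLTUNE call. If you don’t have the script handy, a direct call along the following lines returns the same kind of report for the most recently monitored execution of the statement (this is a sketch; in SQL*Plus you will probably want to SET LONG and LONGCHUNKSIZE to large values to see the whole report):

select dbms_sqltune.report_sql_monitor(
         sql_id       => '6uutdmqr72smc',
         type         => 'TEXT',
         report_level => 'ALL') report
from dual;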

Parameters

There are several parameters that apply to Offloading. The main one is CELL_OFFLOAD_PROCESSING, which turns Offloading on and off. There are several others of less importance. Table 2-2 shows a list of the non-hidden parameters that affect Offloading (as of Oracle Database version 11.2.0.2). Note that we have also included the hidden parameter _SERIAL_DIRECT_READ, which controls the serial direct path read mechanism discussed earlier in this chapter.

[Table 2-2. Database parameters that affect Offloading (presented as a table image in the original; see the parameter listings below)]
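Since the table itself is not reproduced here, a quick way to see the non-hidden cell-related parameters and their current values on your own system is to query V$PARAMETER directly:

select name, value, isdefault
from v$parameter
where name like 'cell%'
order by name;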

In addition to the normal Oracle-approved parameters, there are a number of so-called hidden parameters that affect various aspects of Offloading. The following listing shows all the cell parameters, including the hidden parameters, along with their descriptions:

SYS@POC1> @parmsd
Enter value for parameter: cell
Enter value for isset:
Enter value for show_hidden: Y

NAME                                          DESCRIPTION
--------------------------------------------- ---------------------------------------------
cell_offload_compaction                       Cell packet compaction strategy
cell_offload_decryption                       enable SQL processing offload of encrypted
                                              data to cells

cell_offload_parameters                       Additional cell offload parameters
cell_offload_plan_display                     Cell offload explain plan display
cell_offload_processing                       enable SQL processing offload to cells
_allow_cell_smart_scan_attr                   Allow checking smart_scan_capable Attr
_cell_fast_file_create                        Allow optimized file creation path for Cells
_cell_fast_file_restore                       Allow optimized rman restore for Cells
_cell_file_format_chunk_size                  Cell file format chunk size in MB
_cell_index_scan_enabled                      enable CELL processing of index FFS
_cell_offload_capabilities_enabled            specifies capability table to load
_cell_offload_hybridcolumnar                  Query offloading of hybrid columnar
                                              compressed tables to exadata

_cell_offload_predicate_reordering_enabled    enable out-of-order SQL processing offload to
                                              cells

_cell_offload_timezone                        enable timezone related SQL processing
                                              offload to cells

_cell_offload_virtual_columns                 enable offload of predicates on virtual
                                              columns to cells

_cell_range_scan_enabled                      enable CELL processing of index range scans
_cell_storidx_mode                            Cell Storage Index mode
_db_check_cell_hints
_disable_cell_optimized_backups               disable cell optimized backups
_kcfis_cell_passthru_enabled                  Do not perform smart IO filtering on the cell
_kcfis_kept_in_cellfc_enabled                 Enable usage of cellsrv flash cache for kept
                                              objects

_kcfis_nonkept_in_cellfc_enabled              Enable use of cellsrv flash cache for
                                              non-kept objects


22 rows selected.

We’ve used a few of these parameters in the examples in this chapter and we’ll use several in the upcoming chapters. There are several other kcfis (kernel file intelligent storage) parameters as well. Here’s a listing of them with descriptions:

SYS@SMITHERS> @parmsd
Enter value for parameter: kcfis
Enter value for isset:
Enter value for show_hidden: Y

NAME                                     DESCRIPTION
---------------------------------------- --------------------------------------------------
_kcfis_block_dump_level                  Smart IO block dump level
_kcfis_caching_enabled                   enable kcfis intra-scan session caching
_kcfis_cell_passthru_enabled             Do not perform smart IO filtering on the cell
_kcfis_control1                          Kcfis control1
_kcfis_control2                          Kcfis control2
_kcfis_control3                          Kcfis control3
_kcfis_control4                          Kcfis control4
_kcfis_control5                          Kcfis control5
_kcfis_control6                          Kcfis control6
_kcfis_disable_platform_decryption       Don't use platform-specific decryption on the
                                         storage cell

_kcfis_dump_corrupt_block                Dump any corrupt blocks found during smart IO
_kcfis_fast_response_enabled             Enable smart scan optimization for fast response
                                         (first rows)

_kcfis_fast_response_initiosize          Fast response - The size of the first IO in
                                         logical blocks

_kcfis_fast_response_iosizemult          Fast response - (next IO size = current IO size *
                                         this parameter)

_kcfis_fast_response_threshold           Fast response - the number of IOs after which
                                         smartIO is used

_kcfis_fault_control                     Fault Injection Control
_kcfis_io_prefetch_size                  Smart IO prefetch size for a cell
_kcfis_ioreqs_throttle_enabled           Enable Smart IO requests throttling
_kcfis_kept_in_cellfc_enabled            Enable usage of cellsrv flash cache for kept
                                         objects

_kcfis_large_payload_enabled             enable large payload to be passed to cellsrv
_kcfis_max_cached_sessions               Sets the maximum number of kcfis sessions cached
_kcfis_max_out_translations              Sets the maximum number of outstanding
                                         translations in kcfis

_kcfis_nonkept_in_cellfc_enabled         Enable use of cellsrv flash cache for non-kept
                                         objects

_kcfis_oss_io_size                       KCFIS OSS I/O size
_kcfis_rdbms_blockio_enabled             Use block IO instead of smart IO in the smart IO
                                         module on RDBMS

_kcfis_read_buffer_limit                 KCFIS Read Buffer (per session) memory limit in
                                         bytes

_kcfis_spawn_debugger                    Decides whether to spawn the debugger at kcfis
                                         initialize

_kcfis_stats_level                       sets kcfis stats level
_kcfis_storageidx_diag_mode              Debug mode for storage index on the cell
_kcfis_storageidx_disabled               Don't use storage index optimization on the
                                         storage cell

_kcfis_test_control1                     kcfis tst control1
_kcfis_trace_bucket_size                 KCFIS tracing bucket size in bytes
_kcfis_trace_level                       sets kcfis tracing level
_kcfis_work_set_appliances               Working Set of appliances in a KCFIS session

34 rows selected.

Note that hidden parameters should not be used on production systems without prior discussion with Oracle support. But they do provide valuable clues about how some of the Exadata features work and are controlled.

Summary

Offloading really is the secret sauce of Exadata. While the hardware architecture does a good job of providing more balance between the storage layer’s ability to deliver data and the database layer’s ability to consume it, the bulk of the performance gains are provided by the software. Smart Scans are largely responsible for these gains. The primary focus of most of these optimizations is to reduce the amount of data transferred between the storage tier and the database tier.
