This chapter presents Oracle’s parallel execution (sometimes known as parallel SQL) features: parallel query, parallel DML, parallel DDL, and several other types of parallel operations. If implemented effectively, parallel operations can significantly improve the performance of data-intensive jobs. The decision to execute an operation in parallel is made by Oracle at runtime and is mostly transparent to programmers and end users.
Oracle’s parallel execution features are different from the features provided by Oracle Parallel Server (OPS). You don’t need OPS to perform parallel SQL execution. However, parallel SQL execution can work in conjunction with OPS, and some aspects of parallel execution apply only to Oracle Parallel Server. Throughout this chapter, unless we note otherwise, we are dealing with parallel SQL operations in a standalone instance environment. Part III of this book discusses OPS; in particular, Chapter 13, deals with parallel execution in an OPS environment.
When a user connects to an Oracle instance, a process referred to as a server process, or shadow process, starts on the database server. The purpose of this shadow process is to perform the operations requested by the user process. When the user executes a SQL statement, the corresponding shadow process performs the necessary tasks to get the result. In a serial execution environment, all the tasks are performed by a single process (the shadow process), as shown in Figure 3.1.
Things are different in a parallel execution environment. In a parallel execution environment, the tasks involved in executing a SQL statement are divided among multiple processes, which work together to complete the execution faster. These multiple processes are referred to as slave processes. Figure 3.2 illustrates parallel execution.
Symmetric Multiprocessing (SMP) systems, Massively Parallel Processing (MPP) systems, and clustered systems benefit significantly from executing SQL statements in parallel, because they allow tasks to be spread across the multiple CPUs available in these systems. You can use parallel SQL on single-CPU machines too. However, multiple parallel processes working on behalf of one SQL statement would contend for the same CPU and ultimately might result in poor performance.
Parallel SQL was first introduced in Oracle7 (release 7.1) as the Oracle Parallel Query Option (PQO). PQO is a separately installable option in Oracle7 and helps in parallelizing queries (SELECT statements). With the release of Oracle8, PQO was incorporated into the core RDBMS product and now gets installed along with it. In addition, the parallel execution feature was enhanced to add support for more parallel operations. The full list of Oracle parallel execution features currently includes the following:
The ability to break up the execution of a SELECT statement into multiple tasks and then to execute those tasks in parallel using multiple processors
The ability to run multiple SQL*Loader sessions in parallel, which all load data into the same table
The ability to parallelize INSERT, UPDATE, and DELETE statements
The ability to parallelize DDL statements such as CREATE TABLE, CREATE INDEX, and ALTER INDEX
The ability to use multiple processes to perform instance and media recovery
The ability to propagate changes from one database to another using multiple processes working together in parallel
In the following sections we’ll talk in somewhat more detail about the types of operations that can be parallelized and the benefits that can accrue from parallelization. We’ll also include a specific example showing both a statement that benefits from parallelism and a statement that does not.
Oracle can parallelize operations that involve processing an entire table or an entire partition. These operations include:
SQL queries requiring at least one full table scan or queries involving an index range scan spanning multiple partitions
Operations such as creating or rebuilding an index or rebuilding one or more partitions of an index
Partition operations such as moving or splitting partitions
CREATE TABLE AS SELECT operations, if the SELECT involves a full table or partition scan
INSERT INTO . . . SELECT operations, if the SELECT involves a full table or partition scan
Update and delete operations on partitioned tables
Parallel execution of SQL statements is not beneficial in all circumstances. To gain a benefit from parallelization, one or more of the following must apply:
Oracle parallellizes a SQL statement by breaking it down into smaller units, assigning each unit to a slave process, and executing all of the units in parallel. Each of these slave processes needs CPU and memory resources in order to run. We recommend not using parallel execution when your system is running at full CPU or memory utilization. Doing so will cause memory and CPU bottlenecks as a result of competition for these resources by the slave processes.
Multiple parallel slave processes can simultaneously read data from multiple disks without having to compete with one another. If the data being accessed resides on a single disk, then all of the parallel slave processes working on this data will contend for that same disk, and I/O bottlenecks will occur.
Executing a SQL statement in parallel involves some overhead. The coordinator process must divide up the work of executing the statement among two or more slave processes. It then must collect the results from each of those slave processes and combine them. All of these operations add to the burden of executing the query. For SQL statements that already execute quickly, the overhead involved in parallelizing them will exceed the savings. It’s the long-running and resource-intensive queries that benefit most from being executed in parallel.
Let’s look at some examples that demonstrate the effect of parallel execution on performance. You will see that while parallel execution is not always beneficial when small amounts of data are involved, it can improve performance for queries involving a large amount of data.
The following example shows the DEGREE setting for the customers table, as well as the elapsed time needed to execute a query to count the number of records in the table. The DEGREE setting determines the number of parallel slave processes to be used to execute the query in parallel. Section 3.2.2 later in this chapter discusses the DEGREE setting (and how it is set) in more detail. For example:
SQL>SELECT degree
2FROM user_tables
3WHERE table_name = 'CUSTOMERS';
DEGREE ---------- 1 SQL>SELECT COUNT(*) FROM customers;
COUNT(*) ---------- 984 Elapsed: 00:00:00.09
The customers table is small and won’t be accessed in parallel because DEGREE is set to 1. The COUNT(*) query on this table returned a result in 0.09 seconds.
Now, let’s enable parallel query for the customers table and see what difference that makes on execution time. We’ll set the table’s DEGREE parameter to 4 (via the ALTER TABLE statement), thus requesting the use of four parallel slave processes. Notice the effect on elapsed execution time:
SQL>ALTER TABLE customers PARALLEL(DEGREE 4);
Table altered. SQL>SELECT degree FROM user_tables
WHERE table_name = 'CUSTOMERS';
DEGREE ---------- 4 SQL>SELECT COUNT(*) FROM customers;
COUNT(*) ---------- 984 Elapsed: 00:00:00.18
Enabling parallel execution for this small table actually hurts the performance of the query in this example. Using parallel execution and a DEGREE of 4, it takes 0.18 seconds to count the records in the table. That’s twice as long as the 0.09 seconds required to count the records when parallel execution was not used. In this case, because the query is so small, the overhead of parallelism is not justified.
Next, let’s look at an example in which we query a larger table. This example shows the benefit of parallelism in larger queries. The ORDERS table currently has a DEGREE setting of 1. Note the elapsed time for the COUNT(*) query with serial execution:
SQL>SELECT degree FROM user_tables
2WHERE table_name = 'ORDERS';
DEGREE ---------- 1 SQL>SELECT COUNT(*) FROM orders;
COUNT(*) ---------- 2465726 Elapsed: 00:00:01.32
The serial COUNT(*) query takes 1.32 seconds to complete. Now, let’s alter the table definition to set DEGREE to 4, thereby requesting that Oracle use four parallel slave processes to execute a query on the table. Note the elapsed time for the parallel COUNT(*) query with DEGREE 4:
SQL>ALTER TABLE orders PARALLEL(DEGREE 4);
Table altered. SQL>SELECT degree FROM user_tables
2WHERE table_name = 'ORDERS';
DEGREE ---------- 4 SQL>SELECT COUNT(*) FROM orders;
COUNT(*) ---------- 2465726 Elapsed: 00:00:00.74
With serial execution, a COUNT(*) query on the orders table returns a result in 1.32 seconds. With parallel execution and with DEGREE set to 4, the same query takes 0.74 seconds. With DEGREE set to 4, 4 parallel processes are used by this query to produce the result. However, it’s interesting to notice that the improvement in performance (from 1.32 seconds to 0.74 seconds) is not fourfold. This is because time is spent in splitting the job into four smaller units, assigning each unit of work to a separate process and combining the results. This simple example illustrates the impact of parallelism on performance. However, queries taking only a few seconds to execute may not be ideal candidates for parallel execution in all situations. As discussed in the previous section, long-running queries are the ones that gain maximum advantage from parallel execution.
18.191.223.123