Chapter 3. Parallel Execution Concepts

This chapter presents Oracle’s parallel execution (sometimes known as parallel SQL) features: parallel query, parallel DML, parallel DDL, and several other types of parallel operations. If implemented effectively, parallel operations can significantly improve the performance of data-intensive jobs. The decision to execute an operation in parallel is made by Oracle at runtime and is mostly transparent to programmers and end users.

Oracle’s parallel execution features are different from the features provided by Oracle Parallel Server (OPS). You don’t need OPS to perform parallel SQL execution. However, parallel SQL execution can work in conjunction with OPS, and some aspects of parallel execution apply only to Oracle Parallel Server. Throughout this chapter, unless we note otherwise, we are dealing with parallel SQL operations in a standalone instance environment. Part III of this book discusses OPS; in particular, Chapter 13, deals with parallel execution in an OPS environment.

What Is Parallel SQL?

When a user connects to an Oracle instance, a process referred to as a server process, or shadow process, starts on the database server. The purpose of this shadow process is to perform the operations requested by the user process. When the user executes a SQL statement, the corresponding shadow process performs the necessary tasks to get the result. In a serial execution environment, all the tasks are performed by a single process (the shadow process), as shown in Figure 3.1.

Serial statement execution

Figure 3-1. Serial statement execution

Things are different in a parallel execution environment. In a parallel execution environment, the tasks involved in executing a SQL statement are divided among multiple processes, which work together to complete the execution faster. These multiple processes are referred to as slave processes. Figure 3.2 illustrates parallel execution.

Parallel statement execution

Figure 3-2. Parallel statement execution

Symmetric Multiprocessing (SMP) systems, Massively Parallel Processing (MPP) systems, and clustered systems benefit significantly from executing SQL statements in parallel, because they allow tasks to be spread across the multiple CPUs available in these systems. You can use parallel SQL on single-CPU machines too. However, multiple parallel processes working on behalf of one SQL statement would contend for the same CPU and ultimately might result in poor performance.

Parallel SQL was first introduced in Oracle7 (release 7.1) as the Oracle Parallel Query Option (PQO). PQO is a separately installable option in Oracle7 and helps in parallelizing queries (SELECT statements). With the release of Oracle8, PQO was incorporated into the core RDBMS product and now gets installed along with it. In addition, the parallel execution feature was enhanced to add support for more parallel operations. The full list of Oracle parallel execution features currently includes the following:

Parallel query

The ability to break up the execution of a SELECT statement into multiple tasks and then to execute those tasks in parallel using multiple processors

Parallel data loading

The ability to run multiple SQL*Loader sessions in parallel, which all load data into the same table

Parallel DML

The ability to parallelize INSERT, UPDATE, and DELETE statements

Parallel DDL (object creation)

The ability to parallelize DDL statements such as CREATE TABLE, CREATE INDEX, and ALTER INDEX

Parallel recovery

The ability to use multiple processes to perform instance and media recovery

Parallel replication propagation

The ability to propagate changes from one database to another using multiple processes working together in parallel

In the following sections we’ll talk in somewhat more detail about the types of operations that can be parallelized and the benefits that can accrue from parallelization. We’ll also include a specific example showing both a statement that benefits from parallelism and a statement that does not.

Operations That Can Be Parallelized

Oracle can parallelize operations that involve processing an entire table or an entire partition. These operations include:

  • SQL queries requiring at least one full table scan or queries involving an index range scan spanning multiple partitions

  • Operations such as creating or rebuilding an index or rebuilding one or more partitions of an index

  • Partition operations such as moving or splitting partitions

  • CREATE TABLE AS SELECT operations, if the SELECT involves a full table or partition scan

  • INSERT INTO . . . SELECT operations, if the SELECT involves a full table or partition scan

  • Update and delete operations on partitioned tables

When Parallel Execution Is Beneficial

Parallel execution of SQL statements is not beneficial in all circumstances. To gain a benefit from parallelization, one or more of the following must apply:

The host computer has spare CPU and memory capacity

Oracle parallellizes a SQL statement by breaking it down into smaller units, assigning each unit to a slave process, and executing all of the units in parallel. Each of these slave processes needs CPU and memory resources in order to run. We recommend not using parallel execution when your system is running at full CPU or memory utilization. Doing so will cause memory and CPU bottlenecks as a result of competition for these resources by the slave processes.

The data being accessed is spread across multiple disk drives

Multiple parallel slave processes can simultaneously read data from multiple disks without having to compete with one another. If the data being accessed resides on a single disk, then all of the parallel slave processes working on this data will contend for that same disk, and I/O bottlenecks will occur.

Jobs are long running or resource intensive

Executing a SQL statement in parallel involves some overhead. The coordinator process must divide up the work of executing the statement among two or more slave processes. It then must collect the results from each of those slave processes and combine them. All of these operations add to the burden of executing the query. For SQL statements that already execute quickly, the overhead involved in parallelizing them will exceed the savings. It’s the long-running and resource-intensive queries that benefit most from being executed in parallel.

Examples of Parallelism’s Impact on Performance

Let’s look at some examples that demonstrate the effect of parallel execution on performance. You will see that while parallel execution is not always beneficial when small amounts of data are involved, it can improve performance for queries involving a large amount of data.

The following example shows the DEGREE setting for the customers table, as well as the elapsed time needed to execute a query to count the number of records in the table. The DEGREE setting determines the number of parallel slave processes to be used to execute the query in parallel. Section 3.2.2 later in this chapter discusses the DEGREE setting (and how it is set) in more detail. For example:

SQL> SELECT degree 
  2  FROM user_tables 
  3  WHERE table_name = 'CUSTOMERS';
    DEGREE
----------
         1

SQL> SELECT COUNT(*) FROM customers;
  COUNT(*)
----------
       984
Elapsed: 00:00:00.09

The customers table is small and won’t be accessed in parallel because DEGREE is set to 1. The COUNT(*) query on this table returned a result in 0.09 seconds.

Now, let’s enable parallel query for the customers table and see what difference that makes on execution time. We’ll set the table’s DEGREE parameter to 4 (via the ALTER TABLE statement), thus requesting the use of four parallel slave processes. Notice the effect on elapsed execution time:

SQL> ALTER TABLE customers PARALLEL(DEGREE 4);
Table altered.

SQL> SELECT degree FROM user_tables 
                  WHERE table_name = 'CUSTOMERS';
    DEGREE
----------
         4

SQL> SELECT COUNT(*) FROM customers;
  COUNT(*)
----------
       984
Elapsed: 00:00:00.18

Enabling parallel execution for this small table actually hurts the performance of the query in this example. Using parallel execution and a DEGREE of 4, it takes 0.18 seconds to count the records in the table. That’s twice as long as the 0.09 seconds required to count the records when parallel execution was not used. In this case, because the query is so small, the overhead of parallelism is not justified.

Next, let’s look at an example in which we query a larger table. This example shows the benefit of parallelism in larger queries. The ORDERS table currently has a DEGREE setting of 1. Note the elapsed time for the COUNT(*) query with serial execution:

SQL> SELECT degree FROM user_tables 
  2  WHERE table_name = 'ORDERS';
    DEGREE
----------
         1

SQL> SELECT COUNT(*) FROM orders;
  COUNT(*)
----------
   2465726
Elapsed: 00:00:01.32

The serial COUNT(*) query takes 1.32 seconds to complete. Now, let’s alter the table definition to set DEGREE to 4, thereby requesting that Oracle use four parallel slave processes to execute a query on the table. Note the elapsed time for the parallel COUNT(*) query with DEGREE 4:

SQL> ALTER TABLE orders PARALLEL(DEGREE 4);
Table altered.

SQL> SELECT degree FROM user_tables 
  2  WHERE table_name = 'ORDERS';
    DEGREE
----------
         4

SQL> SELECT COUNT(*) FROM orders;
  COUNT(*)
----------
   2465726
Elapsed: 00:00:00.74

With serial execution, a COUNT(*) query on the orders table returns a result in 1.32 seconds. With parallel execution and with DEGREE set to 4, the same query takes 0.74 seconds. With DEGREE set to 4, 4 parallel processes are used by this query to produce the result. However, it’s interesting to notice that the improvement in performance (from 1.32 seconds to 0.74 seconds) is not fourfold. This is because time is spent in splitting the job into four smaller units, assigning each unit of work to a separate process and combining the results. This simple example illustrates the impact of parallelism on performance. However, queries taking only a few seconds to execute may not be ideal candidates for parallel execution in all situations. As discussed in the previous section, long-running queries are the ones that gain maximum advantage from parallel execution.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.191.223.123