There are many fine books on the market that cover the PL/SQL language in great detail.[1] Because this is a book about Oracle SQL, the focus of this chapter is the use of PL/SQL within SQL statements as well as the use of SQL within PL/SQL programs.
PL/SQL is a procedural programming language from Oracle that combines the following elements:
Logical constructs such as IF-THEN-ELSE and WHILE
SQL DML statements, built-in functions, and operators
Transaction control statements such as COMMIT and ROLLBACK
Cursor control statements
Object and collection manipulation statements
Despite its humble origins as a scripting language in Version 6.0, PL/SQL became an integral part of the Oracle server with release 7.0. Because release 7.0 included the ability to compile and store PL/SQL programs within the server, Oracle began using the language to provide server functionality and to assist in database installation and configuration. With the 7.1 release of the server, Oracle added a new feature of particular use to SQL programmers: the ability to call PL/SQL stored functions from SQL statements (more on this later).
Along with the array of new features made available with each release of PL/SQL, Oracle began supplying prefabricated sets of PL/SQL functionality to allow programmers to tackle more sophisticated programming tasks and to help integrate with various Oracle product offerings. These collections of stored procedures and functions, known as Oracle Supplied Packages, allow you to (among other things):
Interface with and administer Oracle’s Advanced Queueing feature
Schedule database tasks for periodic execution
Manipulate Oracle large objects (LOBs)
Read from and write to external files
Interface with Oracle’s Advanced Replication features
Issue dynamic SQL statements
Generate and parse XML files
Issue LDAP commands
The ever-expanding feature set of the PL/SQL language combined with the wide array of supplied packages has yielded a powerful database programming environment. Whether you are generating reports, writing data loading scripts, or writing custom applications, there’s probably a place for PL/SQL in your project.
Although PL/SQL can still be used to write scripts, also known as anonymous blocks, the focus of this chapter is PL/SQL routines stored in the Oracle Database server. PL/SQL routines stored in the database may be one of two types: stored procedures or stored functions.[2] Stored functions and procedures are essentially identical except for the following:
Stored functions have a return type, whereas procedures do not.
Because stored functions return a value, they can be used in expressions, whereas procedures cannot.
Stored functions and procedures may be compiled individually, or they may be grouped together into packages. Along with being a convenient way to group related functionality together, packages are important for the following reasons:
Packages are loaded into memory as a whole, increasing the likelihood that a procedure or function will be resident in memory when called.
Packages can include private elements, allowing logic to be hidden from view.
Placing functions and procedures inside packages eliminates the need to recompile all functions and procedures that reference a newly recompiled function/procedure.
Function and procedure names may be overloaded within packages, whereas standalone functions and procedures cannot be overloaded.
Functions and procedures inside packages can be checked for side effects at compile time rather than at execution time.
If these reasons haven’t convinced you to place your stored functions and procedures inside packages, here’s a bit of advice we can offer after working with PL/SQL since Version 7.0 of the Oracle Database server: you will never be sorry that you bundled your PL/SQL code into packages, but you will eventually be sorry if you don’t.
Packages consist of two distinct parts: a package specification, which defines the signatures of the package’s public procedures and functions, and a package body, which contains the code for the public procedures and functions and may also contain code for any private functions and procedures not included in the package specification. To give you an idea of what a package looks like, here is a simple example of a package specification:
CREATE OR REPLACE PACKAGE my_pkg AS PROCEDURE my_proc(arg1 IN VARCHAR2); FUNCTION my_func(arg1 IN NUMBER) RETURN VARCHAR2; END my_pkg;
and its matching package body:
CREATE OR REPLACE PACKAGE BODY my_pkg AS FUNCTION my_private_func(arg1 IN NUMBER) RETURN VARCHAR2 IS return_val VARCHAR2(20); BEGIN SELECT col1 INTO return_val FROM tab2 WHERE col2 = arg1; RETURN return_val; EXCEPTION WHEN NO_DATA_FOUND THEN RETURN 'NOT FOUND'; END my_private_func; PROCEDURE my_proc(arg1 IN VARCHAR2) IS BEGIN UPDATE tab1 SET col1 = col1 + 1 WHERE col2 = arg1; END my_proc; FUNCTION my_func(arg1 IN NUMBER) RETURN VARCHAR2 IS BEGIN RETURN my_private_func(arg1); END my_func; END my_pkg;
As you can see, the my_pkg
package includes one
public procedure and one public function. The package specification
includes the parameter names and types of the procedure and function,
along with the return type of the function, but does not include any
implementation code. The package body includes the implementation
logic for the public function and procedure, and it also includes a
private function (my_private_func
) that is only
accessible from inside the package body.
As mentioned earlier, stored functions may be called from within SQL statements. Since stored functions can in turn make calls to stored procedures, it can also be said that stored procedures may be called, albeit indirectly, from within SQL statements. Since stored functions may be used in expressions, they may be included wherever expressions are allowed in a query, including:
The SELECT clause
The WHERE clause
The GROUP BY and HAVING clauses
The ORDER BY clause
The START WITH/CONNECT BY clauses (for hierarchical queries)
The FROM clause (indirectly by using inline views or TABLE statements)
One of the most common uses of stored functions is to isolate commonly-used functionality to facilitate code reuse and simplify maintenance. For example, imagine that you are working with a large team to build a custom N-tier application. To simplify integration efforts between the various layers, it has been decided that all dates will be passed back and forth as the number of milliseconds since January 1, 1970. You could include the conversion logic in all of your queries, as in:
SELECT co.order_nbr, co.cust_nbr, co.sale_price,
ROUND((co.order_dt - TO_DATE('01011970','MMDDYYYY')) * 86400 * 1000)
FROM cust_order co
WHERE ship_dt = TRUNC(SYSDATE);
However, this would become somewhat tedious and prove problematic should you wish to modify your logic in the future. Instead, build a utility package that includes functions for translating between Oracle’s internal date format and the desired format:
CREATE OR REPLACE PACKAGE BODY pkg_util AS FUNCTION translate_date(dt IN DATE) RETURN NUMBER IS BEGIN RETURN ROUND((dt - TO_DATE('01011970','MMDDYYYY')) * 86400 * 1000); END translate_date; FUNCTION translate_date(dt IN NUMBER) RETURN DATE IS BEGIN RETURN TO_DATE('01011970','MMDDYYYY') + (dt / (86400 * 1000)); END translate_date; END pkg_util;
If you think you’re seeing double, don’t worry; the package contains two identically-named functions, one that requires a DATE parameter and returns a NUMBER, and another that requires a NUMBER parameter and returns a DATE. This strategy, called overloading, is only possible when your functions are contained in a package.
Your development team can now use these functions whenever they need to convert date formats, as in:
SELECT co.order_nbr, co.cust_nbr, co.sale_price,
pkg_util.translate_date(co.order_dt) utc_order_dt
FROM cust_order co
WHERE co.ship_dt = TRUNC(SYSDATE);
Another common use of stored functions is to simplify and hide
complex IF-THEN-ELSE logic from your SQL statements. Suppose you have
to generate a report detailing all customer orders for the past
month. You want to sort the orders using the
ship_dt
column if an order has been shipped, the
expected_ship_dt
column if a ship date has been
assigned and is not in the past, the current day if the
expected_ship_dt
is in the past, or the
order_dt
column if the order
hasn’t been assigned a ship date. You could utilize
a CASE statement in the ORDER BY clause:
SELECT co.order_nbr, co.cust_nbr, co.sale_price FROM cust_order co WHERE co.order_dt > TRUNC(SYSDATE, 'MONTH') AND co.cancelled_dt IS NULL ORDER BY CASE WHEN co.ship_dt IS NOT NULL THEN co.ship_dt WHEN co.expected_ship_dt IS NOT NULL AND co.expected_ship_dt > SYSDATE THEN co.expected_ship_dt WHEN co.expected_ship_dt IS NOT NULL THEN GREATEST(SYSDATE, co.expected_ship_dt) ELSE co.order_dt END;
However, there are two problems with this approach:
The resulting ORDER BY clause is fairly complex.
You may wish to use this logic elsewhere, and duplicating it will create maintenance problems.
Instead, add a stored function to the pkg_util
package that returns the appropriate date for a given order:
FUNCTION get_best_order_date(ord_dt IN DATE, exp_ship_dt IN DATE, ship_dt IN DATE) RETURN DATE IS BEGIN IF ship_dt IS NOT NULL THEN RETURN ship_dt; ELSIF exp_ship_dt IS NOT NULL AND exp_ship_dt > SYSDATE THEN RETURN exp_ship_dt; ELSIF exp_ship_dt IS NOT NULL THEN RETURN SYSDATE; ELSE RETURN ord_dt; END IF; END get_best_order_date;
You may then call this function from both the SELECT and ORDER BY clauses:
SELECT co.order_nbr, co.cust_nbr, co.sale_price, pkg_util.get_best_order_date(co.order_dt, co.expected_ship_dt, co.ship_dt) best_date FROM cust_order co WHERE co.order_dt > TRUNC(SYSDATE, 'MONTH') AND co.cancelled_dt IS NULL ORDER BY pkg_util.get_best_order_date(co.order_dt, co.expected_ship_dt, co.ship_dt);
If you are bothered by the fact that the stored function is called twice per row with the same parameters, you can always retrieve the data within an inline view and sort the results afterward, as in:
SELECT orders.order_nbr, orders.cust_nbr, orders.sale_price, orders.best_date FROM (SELECT co.order_nbr order_nbr, co.cust_nbr cust_nbr, co.sale_price sale_price, pkg_util.get_best_order_date(co.order_dt, co.expected_ship_dt, co.ship_dt) best_date FROM cust_order co WHERE co.order_dt > TRUNC(SYSDATE, 'MONTH') AND co.cancelled_dt IS NULL) orders ORDER BY orders.best_date;
Since a view is nothing more than a stored query and stored functions can be called from the SELECT clause of a query, columns of a view can map to stored function calls. This is an excellent way to shield your user community from complexity, and it has another interesting benefit as well. Consider the following view definition, which includes calls to several different stored functions:
CREATE OR REPLACE VIEW vw_example (col1, col2, col3, col4, col5, col6, col7, col8) AS SELECT t1.col1, t1.col2, t2.col3, t2.col4, pkg_example.func1(t1.col1, t2.col3), pkg_example.func2(t1.col2, t2.col4), pkg_example.func3(t1.col1, t2.col3), pkg_example.func4(t1.col2, t2.col4) FROM tab1 t1 INNER JOIN tab2 t2 ON t1.col1 = t2.col3;
Whereas the first four columns of the view map to columns of the
tab1
and tab2
tables, values
for the remaining columns are generated by calling various functions
in the pkg_example
package. If one of your users
executes the following query:
SELECT col2, col4, col7 FROM vw_example WHERE col1 = 1001;
Only one stored function (pkg_example.func3
) is
actually executed even though the view contains four columns that map
to stored function calls. This is because when a query is executed
against a view, the Oracle server constructs a new query by combining
the original query and the view definition. In this case, the query
that is actually executed looks like:
SELECT t1.col2, t2.col4, pkg_example.func3(t1.col1, t2.col3) FROM tab1 t1 INNER JOIN tab2 t2 ON t1.col1 = t2.col3 WHERE t1.col1 = 1001;
Therefore, your view could contain dozens of stored function calls, but only those that are explicitly referenced by queries will be executed.[3]
Imagine that you have deployed a set of views for your users to generate reports and ad-hoc queries against, and one of your users asks that a new column be added to one of the views. The column is from a table not yet included in the FROM clause, and the column is only needed for a single report issued once a month. You could add the table to the FROM clause, add the column to the SELECT clause, and add the join conditions to the ON clause. However, every query issued against the view would include the new table, even though most queries don’t reference the new column.
An alternative strategy is to write a stored function that queries
the new table and returns the desired column. The stored function can
then be added to the SELECT clause without the need to add the new
table to the FROM clause. To illustrate, let’s
expand on the previous simple example. If the desired column is
col6
in the tab3
table, you
could add a new function to the pkg_example
package such as:
FUNCTION func5(param1 IN NUMBER) RETURN VARCHAR2 IS ret_val VARCHAR2(20); BEGIN SELECT col6 INTO ret_val FROM tab3 WHERE col5 = param1; RETURN ret_val; EXCEPTION WHEN NO_DATA_FOUND THEN RETURN null; END func5;
You can now add a column to the view that maps to the new function, as in:
CREATE OR REPLACE VIEW vw_example
(col1, col2, col3, col4, col5, col6, col7, col8, col9)
AS SELECT t1.col1,
t1.col2,
t2.col3,
t2.col4,
pkg_example.func1(t1.col1, t2.col3),
pkg_example.func2(t1.col2, t2.col4),
pkg_example.func3(t1.col1, t2.col3),
pkg_example.func4(t1.col2, t2.col4),
pkg_example.func5(t2.col3)
FROM tab1 t1 INNER JOIN tab2 t2
ON t1.col1 = t2.col3;
Thus, you have provided your users access to column
col6
of the tab3
table without
adding the tab3
table to the
view’s FROM clause. Users who don’t
reference the new col9
column of the view will
experience no changes to the performance of their queries against
vw_example
.
Even though the column was originally targeted for a single report,
don’t be surprised if other users decide to include
the new column in their queries. As the column utilization increases,
it may be advantageous to abandon the stored function strategy and
include the tab3
table in the FROM clause. Since a
view was employed, however, you would be able to make this change
without the need for any of your users to modify their queries.
Earlier in this section, we created a package containing two functions to be used for translating between a date and the number of milliseconds since January 1, 1970. Because these functions do not depend on data stored in the database or in package variables, they will always return the same result for any given input parameter. Any function having this property can be marked as DETERMINISTIC in the function declaration, as in:
CREATE OR REPLACE PACKAGE BODY pkg_util AS FUNCTION translate_date(dt IN DATE) RETURN NUMBER DETERMINISTIC; FUNCTION translate_date(dt IN NUMBER) RETURN DATE DETERMINISTIC; END pkg_util;
Marking your functions as DETERMINISTIC allows the Oracle server to perform certain optimizations, such as storing a function’s parameters and results in memory so that subsequent calls to the same function can be handled without the need to call the function again.
Although calling stored functions from SQL is a powerful feature, it is important to understand how doing so might have unintended consequences. For example, imagine that one of your co-workers has written a stored function that, given a part number, returns the number of times that part is included in all open orders. The function is contained in a utilities package, such as the following:
CREATE OR REPLACE PACKAGE pkg_util AS FUNCTION get_part_order_qty(pno IN VARCHAR2) RETURN NUMBER; END pkg_util;
You have been tasked with generating a weekly inventory report, and you would like to make use of the function in one of your queries, as in:
SELECT p.part_nbr, p.name, s.name, p.inventory_qty,
pkg_util.get_part_order_qty(p.part_nbr) open_order_qty
FROM part p INNER JOIN supplier s
ON p.supplier_id = s.supplier_id
ORDER BY s.name, p.part_nbr;
When you run the query, however, you are surprised to see the following error:
ORA-14551: cannot perform a DML operation inside a query
Upon checking the package body, you find that the
get_part_order_qty
function, along with
calculating the number of times a part is included in all open
orders, generates a request to restock the part by inserting a record
into the part_order
table if the calculated value
exceeds the number in inventory. Had Oracle allowed your statement to
be executed, your query would have resulted in changes to the
database without your knowledge or consent.
To determine whether a stored function might have unintended consequences when called from a SQL statement, Oracle assigns a purity level to the function that answers the following four questions:
Does the function read from database tables?
Does the function reference any global package variables?
Does the function write to any database tables?
Does the function modify any global package variables?
For each negative response to these questions, a designation is added to the purity level, as shown in Table 11-1.
Question # |
Designation |
Description |
1 |
RNDS |
Reads no database state |
2 |
RNPS |
Reads no package state |
3 |
WNDS |
Writes no database state |
4 |
WNPS |
Writes no package state |
Therefore, a function with a purity level of {WNPS, WNDS} is guaranteed not to write to the database or modify package variables, but it may reference package variables and/or read from database tables. For a function to be called from a SQL statement, its purity level must at a minimum include the WNDS designation.
When using packaged functions in Oracle versions prior to release 8.1, it was required that the purity level be specified prior to calling a function from a SQL statement. This is accomplished by adding a pragma, or compiler directive, to the package specification. The RESTRICT_REFERENCES pragma follows the function declaration in the package specification, as demonstrated here:
CREATE OR REPLACE PACKAGE my_pkg AS
FUNCTION my_func(arg1 IN NUMBER) RETURN VARCHAR2;
PRAGMA RESTRICT_REFERENCES(my_func, RNPS, WNPS, WNDS);
END my_pkg;
When the package body is compiled, the code is checked against the designations listed in the RESTRICT_REFERENCES pragma. If the code does not meet the purity level asserted in the pragma, compilation fails with the following error:
PLS-00452: Subprogram 'MY_FUNC' violates its associated pragma
Therefore, you tell the compiler what your function will and won’t do via the RESTRICT_REFERENCES pragma, and the compiler checks to see if you are telling it the truth.
Beginning with Oracle8i, you are no longer required to specify the purity level of functions in the package specification. All functions called from queries, whether stand-alone or within packages, are checked at runtime to ensure that there are no side effects. Nevertheless, you may want to consider using the RESTRICT_REFERENCES pragma so that you avoid any surprises later on.
One of the reasons Oracle has relaxed the requirement that the purity level be asserted at compile time is that PL/SQL can make calls to functions written in C and Java, which have no mechanisms similar to PL/SQL’s PRAGMA for asserting purity. To allow functions written in different languages to call each other, Oracle introduced the TRUST keyword in Oracle8i. Adding TRUST to the RESTRICT_REFERENCES pragma for a function causes Oracle to:
Treat the function as if it satisfies the pragma without actually checking the code.
Treat any functions or procedures called from the function that have the TRUST keyword as if they satisfy the pragma as well.
Thus, a stored function whose RESTRICT_REFERENCES pragma includes WNDS and TRUST could make calls to other PL/SQL functions that do not specify RESTRICT_REFERENCES pragmas and/or external C and Java functions and still be callable from SQL statements. In the case of external C or Java calls, you will need to include the TRUST designation in your function’s RESTRICT_REFERENCES pragma if you want to call the function from SQL, since the C or Java source code is not available to the server for inspection.
To use TRUST, simply append it to the end of the purity designation list, as in:
CREATE OR REPLACE PACKAGE my_pkg AS
FUNCTION my_func(arg1 IN NUMBER) RETURN VARCHAR2;
PRAGMA RESTRICT_REFERENCES(my_func, RNPS, WNPS, WNDS, TRUST);
END my_pkg;
In addition to the WNDS requirement, Oracle checks that each function invoked from a SQL statement abides by the following rules:
The function can’t end the current transaction using COMMIT or ROLLBACK.
The function can’t alter a transaction by creating savepoints or rolling back to a previously defined savepoint.
The function can’t issue an ALTER SYSTEM or ALTER SESSION statement.
All parameter types, including the return type, must be standard SQL types such as VARCHAR2, NUMBER, and DATE. PL/SQL types such as BOOLEAN and RECORD are not allowed.
All parameters must be IN parameters. OUT and IN OUT parameters are not allowed.
The first three restrictions are designed to protect against changes that could alter the operational environment of the parent query. The fourth and fifth restrictions ensure that the data being returned from the PL/SQL function can be handled by the SQL statement.
Stored functions may also be called from INSERT, UPDATE, and DELETE statements. Whereas most of the restrictions outlined earlier apply equally to stored functions called from DML statements, there is one major difference: since the parent DML statement is changing the state of the database, stored functions invoked from DML statements do not need to abide by the WNDS restriction. However, such stored functions may not read or modify the same table as the parent DML statement.
Like queries, DML statements may call stored functions where expressions are allowed, including:
The VALUES clause of an INSERT statement
The SET clause of an UPDATE statement
The WHERE clause of an INSERT, UPDATE, or DELETE statement
Any subqueries called from a DML statement may also call stored functions as well under the same set of restrictions as the parent DML statement.
Often, sets of complementary stored functions are called from both
queries and DML statements. For example, you saw earlier how the
pkg_util.translate_date
function could be called
from a query to translate from the Oracle date format stored in the
database to the format needed by a Java client. Similarly, the
overloaded pkg_util.translate_date
function may be
used within an update statement to perform the reverse translation,
as in:
UPDATE cust_order SET expected_ship_dt = pkg_util.translate_date(:1) WHERE order_nbr = :2;
where :1
and :2
are
placeholders for the UTC timedate and order number passed in by the
Java client.
Stored functions may also be used in the WHERE clause in place of
correlated subqueries, both to simplify the DML statement and to
facilitate code reuse. For example, suppose you have been asked to
push the expected ship date by five days for any order containing
part number F34-17802. You could issue an UPDATE statement against
the cust_order
table using a correlated subquery,
as in:
UPDATE cust_order co SET co.expected_ship_dt = NVL(co.expected_ship_dt, SYSDATE) + 5 WHERE co.cancelled_dt IS NULL and co.ship_dt IS NULL AND EXISTS (SELECT 1 FROM line_item li WHERE li.order_nbr = co.order_nbr AND li.part_nbr = 'F34-17802'),
After having written many subqueries against the
line_item
table, however, you feel
it’s time to write a multipurpose function and add
it to the pkg_util
package:
FUNCTION get_part_count(ordno IN NUMBER, partno IN VARCHAR2 DEFAULT NULL, max_cnt IN NUMBER DEFAULT 9999) RETURN NUMBER IS tot_cnt NUMBER(5) := 0; li_part_nbr VARCHAR2(20); CURSOR cur_li(c_ordno IN NUMBER) IS SELECT part_nbr FROM line_item WHERE order_nbr = c_ordno; BEGIN OPEN cur_li(ordno); WHILE tot_cnt < max_cnt LOOP FETCH cur_li INTO li_part_nbr; EXIT WHEN cur_li%NOTFOUND; IF partno IS NULL OR (partno IS NOT NULL AND partno = li_part_nbr) THEN tot_cnt := tot_cnt + 1; END IF; END LOOP; CLOSE cur_li; RETURN tot_cnt; END get_part_count;
The function may be used for multiple purposes, including:
To count the number of line items in a given order
To count the number of line items in a given order containing a given part
To determine whether the given order has at least X occurrences of a given part
The UPDATE statement may now use the function to locate open orders that have at least one occurrence of part F34-17802:
UPDATE cust_order co SET co.expected_ship_dt = NVL(co.expected_ship_dt, SYSDATE) + 5 WHERE co.cancelled_dt IS NULL and co.ship_dt IS NULL AND 1 = pkg_util.get_part_count(co.order_nbr, 'F34-17802', 1);
Now that we’ve explored calling PL/SQL from SQL, let’s turn the tables and explore the use of SQL inside your PL/SQL code. SQL is great at manipulating large sets of data, but there are situations where you need to work with data at the row level. PL/SQL, with its looping and cursor control capabilities, allows the flexibility to work at the set level using SQL or at the row level using cursors. However, many PL/SQL programmers forego the power of SQL and do everything at the row level, even when it is unnecessary and time-consuming to do so.
As an analogy, imagine that you are working at a warehouse, and a large shipment of parts arrives on the loading dock. Your job is to separate the shipment by part type and distribute the pieces to different areas of the warehouse. To make your job easier, the warehouse owner has procured the best forklift money can buy. There are two possible strategies to employ:
Pick up one box at a time, determine the type, and drive it to the appropriate destination.
Spend some time analyzing the situation, determine that every box on a pallet is of the same type, and drive entire pallets to the appropriate destination.
Although this analogy might be overly simplistic, it does serve to illustrate the difference between set operations and row operations. Allowing the Oracle server to manipulate large sets in a single operation can often yield a performance improvement of several orders of magnitude over manipulating one row at a time, especially on systems with multiple CPUs.
When a procedural language is used for database access (whether it is PL/SQL, C with OCI calls, or Java using JDBC), there is a tendency to employ the first strategy. Perhaps programmers are accustomed to coding at a low level of granularity when using a procedural language and this spills over into their data access logic. This situation is especially prevalent in systems that need to process and load large amounts of data from external files, such as data warehouse load utilities.
Imagine that you are charged with building an infrastructure to accept files from multiple OLTP systems, perform various data cleaning operations, and aggregate the data into a data warehouse. Using PL/SQL (or C, Java, C++, Cobol, etc.), you could build functionality that:
Opens a given file.
Reads a line, verifies/cleans the data, and updates the appropriate row of the appropriate fact table in the data warehouse.
Repeats #2 until the file is exhausted.
Closes the file.
Although this approach might work for small files, it is not uncommon for large warehouses to receive feeds containing hundreds of thousands or millions of items. Even if your code is extremely efficient, processing a million-line file could take several hours.
Here’s an alternate strategy that employs the power of the Oracle server to make quick work of large data feeds:
Create a staging table for each unique data feed file format.
At the start of the load process, truncate the staging tables.
Use SQL*Loader with the direct path option to quickly load the data file into the appropriate staging table.
Update all rows of the staging table to clean, verify, and transform data, marking rows as invalid if they fail verification. Perform the operation in parallel if possible.
Update the appropriate fact table using a subquery against the staging table. Again, perform in parallel if possible.
For this strategy to succeed, you need to have adequate disk space and sufficiently large rollback and temporary tablespaces. With adequate resources and properly constructed SQL statements, however, this strategy can yield a 10X improvement over the previous strategy.
So what role should PL/SQL play in such a scenario? In this case, PL/SQL would be an excellent vehicle for executing steps 4 and 5 of the previous list. Although the stored procedures might each contain only a single update statement, the SQL is likely to be complex and may contain optimizer hints and other advanced features. Therefore, it would be advisable to isolate the SQL from the rest of the application so that it may be independently monitored and tuned.
In general, when dealing with complex logic involving large data sets, it is advantageous to think in terms of data sets rather than programming steps. In other words, ask yourself where your data is, where it needs to move to, and what needs to happen to it during its journey instead of thinking in terms of what needs to happen with each piece of data to satisfy the business requirements. If you follow this strategy, you will find yourself writing substantial, efficient SQL statements that employ PL/SQL where appropriate, rather than writing complex PL/SQL routines that employ SQL when needed. In doing so, you will be providing the server with the opportunity to split large workloads into multiple pieces that run in parallel, which can greatly improve performance.
[1] For example, Oracle PL/SQL Programming by Steven Feuerstein (O’Reilly).
[2] Database triggers are another type of stored PL/SQL, but they are outside the scope of this discussion.
[3] This is one reason why you should never use SELECT * when working with a view. Always explicitly name the columns that you need so that the server doesn’t waste time generating data that you never reference.
18.118.253.223