Chapter 11. PL/SQL

There are many fine books on the market that cover the PL/SQL language in great detail.[1] Because this is a book about Oracle SQL, the focus of this chapter is the use of PL/SQL within SQL statements as well as the use of SQL within PL/SQL programs.

What Is PL/SQL?

PL/SQL is a procedural programming language from Oracle that combines the following elements:

  • Logical constructs such as IF-THEN-ELSE and WHILE

  • SQL DML statements, built-in functions, and operators

  • Transaction control statements such as COMMIT and ROLLBACK

  • Cursor control statements

  • Object and collection manipulation statements

Despite its humble origins as a scripting language in Version 6.0, PL/SQL became an integral part of the Oracle server with release 7.0. Because release 7.0 included the ability to compile and store PL/SQL programs within the server, Oracle began using the language to provide server functionality and to assist in database installation and configuration. With the 7.1 release of the server, Oracle added a new feature of particular use to SQL programmers: the ability to call PL/SQL stored functions from SQL statements (more on this later).

Along with the array of new features made available with each release of PL/SQL, Oracle began supplying prefabricated sets of PL/SQL functionality to allow programmers to tackle more sophisticated programming tasks and to help integrate with various Oracle product offerings. These collections of stored procedures and functions, known as Oracle Supplied Packages, allow you to (among other things):

  • Interface with and administer Oracle’s Advanced Queueing feature

  • Schedule database tasks for periodic execution

  • Manipulate Oracle large objects (LOBs)

  • Read from and write to external files

  • Interface with Oracle’s Advanced Replication features

  • Issue dynamic SQL statements

  • Generate and parse XML files

  • Issue LDAP commands

The ever-expanding feature set of the PL/SQL language combined with the wide array of supplied packages has yielded a powerful database programming environment. Whether you are generating reports, writing data loading scripts, or writing custom applications, there’s probably a place for PL/SQL in your project.

Procedures, Functions, and Packages

Although PL/SQL can still be used to write scripts, also known as anonymous blocks, the focus of this chapter is PL/SQL routines stored in the Oracle Database server. PL/SQL routines stored in the database may be one of two types: stored procedures or stored functions.[2] Stored functions and procedures are essentially identical except for the following:

  • Stored functions have a return type, whereas procedures do not.

  • Because stored functions return a value, they can be used in expressions, whereas procedures cannot.

Stored functions and procedures may be compiled individually, or they may be grouped together into packages. Along with being a convenient way to group related functionality together, packages are important for the following reasons:

  • Packages are loaded into memory as a whole, increasing the likelihood that a procedure or function will be resident in memory when called.

  • Packages can include private elements, allowing logic to be hidden from view.

  • Placing functions and procedures inside packages eliminates the need to recompile all functions and procedures that reference a newly recompiled function/procedure.

  • Function and procedure names may be overloaded within packages, whereas standalone functions and procedures cannot be overloaded.

  • Functions and procedures inside packages can be checked for side effects at compile time rather than at execution time.

If these reasons haven’t convinced you to place your stored functions and procedures inside packages, here’s a bit of advice we can offer after working with PL/SQL since Version 7.0 of the Oracle Database server: you will never be sorry that you bundled your PL/SQL code into packages, but you will eventually be sorry if you don’t.

Packages consist of two distinct parts: a package specification, which defines the signatures of the package’s public procedures and functions, and a package body, which contains the code for the public procedures and functions and may also contain code for any private functions and procedures not included in the package specification. To give you an idea of what a package looks like, here is a simple example of a package specification:

CREATE OR REPLACE PACKAGE my_pkg AS
  PROCEDURE my_proc(arg1 IN VARCHAR2);

  FUNCTION my_func(arg1 IN NUMBER) RETURN VARCHAR2;
END my_pkg;

and its matching package body:

CREATE OR REPLACE PACKAGE BODY my_pkg AS
  FUNCTION my_private_func(arg1 IN NUMBER) RETURN VARCHAR2 IS
    return_val VARCHAR2(20);
  BEGIN
    SELECT col1 INTO return_val
    FROM tab2
    WHERE col2 = arg1;

    RETURN return_val;
  EXCEPTION
    WHEN NO_DATA_FOUND THEN 
      RETURN 'NOT FOUND';
  END my_private_func;

  PROCEDURE my_proc(arg1 IN VARCHAR2) IS
  BEGIN
    UPDATE tab1 SET col1 = col1 + 1
    WHERE col2 = arg1;
  END my_proc;

  FUNCTION my_func(arg1 IN NUMBER) RETURN VARCHAR2 IS
  BEGIN
    RETURN my_private_func(arg1);
  END my_func;
END my_pkg;

As you can see, the my_pkg package includes one public procedure and one public function. The package specification includes the parameter names and types of the procedure and function, along with the return type of the function, but does not include any implementation code. The package body includes the implementation logic for the public function and procedure, and it also includes a private function (my_private_func) that is only accessible from inside the package body.

Calling Stored Functions from Queries

As mentioned earlier, stored functions may be called from within SQL statements. Since stored functions can in turn make calls to stored procedures, it can also be said that stored procedures may be called, albeit indirectly, from within SQL statements. Since stored functions may be used in expressions, they may be included wherever expressions are allowed in a query, including:

  • The SELECT clause

  • The WHERE clause

  • The GROUP BY and HAVING clauses

  • The ORDER BY clause

  • The START WITH/CONNECT BY clauses (for hierarchical queries)

  • The FROM clause (indirectly by using inline views or TABLE statements)

One of the most common uses of stored functions is to isolate commonly-used functionality to facilitate code reuse and simplify maintenance. For example, imagine that you are working with a large team to build a custom N-tier application. To simplify integration efforts between the various layers, it has been decided that all dates will be passed back and forth as the number of milliseconds since January 1, 1970. You could include the conversion logic in all of your queries, as in:

SELECT co.order_nbr, co.cust_nbr, co.sale_price,
  ROUND((co.order_dt - TO_DATE('01011970','MMDDYYYY')) * 86400 * 1000)
FROM cust_order co
WHERE ship_dt = TRUNC(SYSDATE);

However, this would become somewhat tedious and prove problematic should you wish to modify your logic in the future. Instead, build a utility package that includes functions for translating between Oracle’s internal date format and the desired format:

CREATE OR REPLACE PACKAGE BODY pkg_util AS  
  FUNCTION translate_date(dt IN DATE) RETURN NUMBER IS
  BEGIN
    RETURN ROUND((dt - TO_DATE('01011970','MMDDYYYY')) * 86400 * 1000);
  END translate_date;

  FUNCTION translate_date(dt IN NUMBER) RETURN DATE IS
  BEGIN
    RETURN TO_DATE('01011970','MMDDYYYY') + (dt / (86400 * 1000));
  END translate_date;
END pkg_util;

If you think you’re seeing double, don’t worry; the package contains two identically-named functions, one that requires a DATE parameter and returns a NUMBER, and another that requires a NUMBER parameter and returns a DATE. This strategy, called overloading, is only possible when your functions are contained in a package.

Your development team can now use these functions whenever they need to convert date formats, as in:

SELECT co.order_nbr, co.cust_nbr, co.sale_price,
  pkg_util.translate_date(co.order_dt) utc_order_dt
FROM cust_order co
WHERE co.ship_dt = TRUNC(SYSDATE);

Another common use of stored functions is to simplify and hide complex IF-THEN-ELSE logic from your SQL statements. Suppose you have to generate a report detailing all customer orders for the past month. You want to sort the orders using the ship_dt column if an order has been shipped, the expected_ship_dt column if a ship date has been assigned and is not in the past, the current day if the expected_ship_dt is in the past, or the order_dt column if the order hasn’t been assigned a ship date. You could utilize a CASE statement in the ORDER BY clause:

SELECT co.order_nbr, co.cust_nbr, co.sale_price
FROM cust_order co
WHERE co.order_dt > TRUNC(SYSDATE, 'MONTH')
  AND co.cancelled_dt IS NULL
ORDER BY 
  CASE 
    WHEN co.ship_dt IS NOT NULL THEN co.ship_dt
    WHEN co.expected_ship_dt IS NOT NULL 
      AND co.expected_ship_dt > SYSDATE
        THEN co.expected_ship_dt
    WHEN co.expected_ship_dt IS NOT NULL 
      THEN GREATEST(SYSDATE, co.expected_ship_dt)
    ELSE co.order_dt
  END;

However, there are two problems with this approach:

  • The resulting ORDER BY clause is fairly complex.

  • You may wish to use this logic elsewhere, and duplicating it will create maintenance problems.

Instead, add a stored function to the pkg_util package that returns the appropriate date for a given order:

FUNCTION get_best_order_date(ord_dt IN DATE, exp_ship_dt IN DATE,
  ship_dt IN DATE) RETURN DATE IS
BEGIN
  IF ship_dt IS NOT NULL THEN
    RETURN ship_dt;
  ELSIF exp_ship_dt IS NOT NULL AND exp_ship_dt > SYSDATE THEN
    RETURN exp_ship_dt;
  ELSIF exp_ship_dt IS NOT NULL THEN
    RETURN SYSDATE;
  ELSE
    RETURN ord_dt;
  END IF;
END get_best_order_date;

You may then call this function from both the SELECT and ORDER BY clauses:

SELECT co.order_nbr, co.cust_nbr, co.sale_price,
  pkg_util.get_best_order_date(co.order_dt, co.expected_ship_dt,
    co.ship_dt) best_date
FROM cust_order co
WHERE co.order_dt > TRUNC(SYSDATE, 'MONTH')
  AND co.cancelled_dt IS NULL
ORDER BY pkg_util.get_best_order_date(co.order_dt, co.expected_ship_dt,
    co.ship_dt);

If you are bothered by the fact that the stored function is called twice per row with the same parameters, you can always retrieve the data within an inline view and sort the results afterward, as in:

SELECT orders.order_nbr, orders.cust_nbr, 
  orders.sale_price, orders.best_date
FROM
 (SELECT co.order_nbr order_nbr, co.cust_nbr cust_nbr, 
    co.sale_price sale_price,
    pkg_util.get_best_order_date(co.order_dt, co.expected_ship_dt,
    co.ship_dt) best_date
  FROM cust_order co
  WHERE co.order_dt > TRUNC(SYSDATE, 'MONTH')
    AND co.cancelled_dt IS NULL) orders
ORDER BY orders.best_date;

Stored Functions and Views

Since a view is nothing more than a stored query and stored functions can be called from the SELECT clause of a query, columns of a view can map to stored function calls. This is an excellent way to shield your user community from complexity, and it has another interesting benefit as well. Consider the following view definition, which includes calls to several different stored functions:

CREATE OR REPLACE VIEW vw_example 
 (col1, col2, col3, col4, col5, col6, col7, col8)
AS SELECT t1.col1,
  t1.col2,
  t2.col3,
  t2.col4,
  pkg_example.func1(t1.col1, t2.col3),
  pkg_example.func2(t1.col2, t2.col4),
  pkg_example.func3(t1.col1, t2.col3),
  pkg_example.func4(t1.col2, t2.col4)
FROM tab1 t1 INNER JOIN tab2 t2
  ON t1.col1 = t2.col3;

Whereas the first four columns of the view map to columns of the tab1 and tab2 tables, values for the remaining columns are generated by calling various functions in the pkg_example package. If one of your users executes the following query:

SELECT col2, col4, col7
FROM vw_example
WHERE col1 = 1001;

Only one stored function (pkg_example.func3) is actually executed even though the view contains four columns that map to stored function calls. This is because when a query is executed against a view, the Oracle server constructs a new query by combining the original query and the view definition. In this case, the query that is actually executed looks like:

SELECT t1.col2,
  t2.col4,
  pkg_example.func3(t1.col1, t2.col3)
FROM tab1 t1 INNER JOIN tab2 t2
  ON t1.col1 = t2.col3
WHERE t1.col1 = 1001;

Therefore, your view could contain dozens of stored function calls, but only those that are explicitly referenced by queries will be executed.[3]

Avoiding Table Joins

Imagine that you have deployed a set of views for your users to generate reports and ad-hoc queries against, and one of your users asks that a new column be added to one of the views. The column is from a table not yet included in the FROM clause, and the column is only needed for a single report issued once a month. You could add the table to the FROM clause, add the column to the SELECT clause, and add the join conditions to the ON clause. However, every query issued against the view would include the new table, even though most queries don’t reference the new column.

An alternative strategy is to write a stored function that queries the new table and returns the desired column. The stored function can then be added to the SELECT clause without the need to add the new table to the FROM clause. To illustrate, let’s expand on the previous simple example. If the desired column is col6 in the tab3 table, you could add a new function to the pkg_example package such as:

FUNCTION func5(param1 IN NUMBER) RETURN VARCHAR2 IS
  ret_val VARCHAR2(20);
BEGIN
  SELECT col6 INTO ret_val
  FROM tab3
  WHERE col5 = param1;

  RETURN ret_val;
EXCEPTION
  WHEN NO_DATA_FOUND THEN
    RETURN null;
END func5;

You can now add a column to the view that maps to the new function, as in:

CREATE OR REPLACE VIEW vw_example 
 (col1, col2, col3, col4, col5, col6, col7, col8, col9)
AS SELECT t1.col1,
  t1.col2,
  t2.col3,
  t2.col4,
  pkg_example.func1(t1.col1, t2.col3),
  pkg_example.func2(t1.col2, t2.col4),
  pkg_example.func3(t1.col1, t2.col3),
  pkg_example.func4(t1.col2, t2.col4),
  pkg_example.func5(t2.col3)
FROM tab1 t1 INNER JOIN tab2 t2
  ON t1.col1 = t2.col3;

Thus, you have provided your users access to column col6 of the tab3 table without adding the tab3 table to the view’s FROM clause. Users who don’t reference the new col9 column of the view will experience no changes to the performance of their queries against vw_example.

Even though the column was originally targeted for a single report, don’t be surprised if other users decide to include the new column in their queries. As the column utilization increases, it may be advantageous to abandon the stored function strategy and include the tab3 table in the FROM clause. Since a view was employed, however, you would be able to make this change without the need for any of your users to modify their queries.

Deterministic Functions

Earlier in this section, we created a package containing two functions to be used for translating between a date and the number of milliseconds since January 1, 1970. Because these functions do not depend on data stored in the database or in package variables, they will always return the same result for any given input parameter. Any function having this property can be marked as DETERMINISTIC in the function declaration, as in:

CREATE OR REPLACE PACKAGE BODY pkg_util AS  
  FUNCTION translate_date(dt IN DATE) RETURN NUMBER DETERMINISTIC;
  FUNCTION translate_date(dt IN NUMBER) RETURN DATE DETERMINISTIC;
END pkg_util;

Marking your functions as DETERMINISTIC allows the Oracle server to perform certain optimizations, such as storing a function’s parameters and results in memory so that subsequent calls to the same function can be handled without the need to call the function again.

Restrictions on Calling PL/SQL from SQL

Although calling stored functions from SQL is a powerful feature, it is important to understand how doing so might have unintended consequences. For example, imagine that one of your co-workers has written a stored function that, given a part number, returns the number of times that part is included in all open orders. The function is contained in a utilities package, such as the following:

CREATE OR REPLACE PACKAGE pkg_util AS
  FUNCTION get_part_order_qty(pno IN VARCHAR2) RETURN NUMBER;
END pkg_util;

You have been tasked with generating a weekly inventory report, and you would like to make use of the function in one of your queries, as in:

SELECT p.part_nbr, p.name, s.name, p.inventory_qty,
  pkg_util.get_part_order_qty(p.part_nbr) open_order_qty
FROM part p INNER JOIN supplier s
  ON p.supplier_id = s.supplier_id
ORDER BY s.name, p.part_nbr;

When you run the query, however, you are surprised to see the following error:

ORA-14551: cannot perform a DML operation inside a query

Upon checking the package body, you find that the get_part_order_qty function, along with calculating the number of times a part is included in all open orders, generates a request to restock the part by inserting a record into the part_order table if the calculated value exceeds the number in inventory. Had Oracle allowed your statement to be executed, your query would have resulted in changes to the database without your knowledge or consent.

Purity Level

To determine whether a stored function might have unintended consequences when called from a SQL statement, Oracle assigns a purity level to the function that answers the following four questions:

  1. Does the function read from database tables?

  2. Does the function reference any global package variables?

  3. Does the function write to any database tables?

  4. Does the function modify any global package variables?

For each negative response to these questions, a designation is added to the purity level, as shown in Table 11-1.

Table 11-1. Purity level designations

Question #

Designation

Description

1

RNDS

Reads no database state

2

RNPS

Reads no package state

3

WNDS

Writes no database state

4

WNPS

Writes no package state

Therefore, a function with a purity level of {WNPS, WNDS} is guaranteed not to write to the database or modify package variables, but it may reference package variables and/or read from database tables. For a function to be called from a SQL statement, its purity level must at a minimum include the WNDS designation.

When using packaged functions in Oracle versions prior to release 8.1, it was required that the purity level be specified prior to calling a function from a SQL statement. This is accomplished by adding a pragma, or compiler directive, to the package specification. The RESTRICT_REFERENCES pragma follows the function declaration in the package specification, as demonstrated here:

CREATE OR REPLACE PACKAGE my_pkg AS
  FUNCTION my_func(arg1 IN NUMBER) RETURN VARCHAR2;
  PRAGMA RESTRICT_REFERENCES(my_func, RNPS, WNPS, WNDS);
END my_pkg;

When the package body is compiled, the code is checked against the designations listed in the RESTRICT_REFERENCES pragma. If the code does not meet the purity level asserted in the pragma, compilation fails with the following error:

PLS-00452: Subprogram 'MY_FUNC' violates its associated pragma

Therefore, you tell the compiler what your function will and won’t do via the RESTRICT_REFERENCES pragma, and the compiler checks to see if you are telling it the truth.

Beginning with Oracle8i, you are no longer required to specify the purity level of functions in the package specification. All functions called from queries, whether stand-alone or within packages, are checked at runtime to ensure that there are no side effects. Nevertheless, you may want to consider using the RESTRICT_REFERENCES pragma so that you avoid any surprises later on.

Trust Me...

One of the reasons Oracle has relaxed the requirement that the purity level be asserted at compile time is that PL/SQL can make calls to functions written in C and Java, which have no mechanisms similar to PL/SQL’s PRAGMA for asserting purity. To allow functions written in different languages to call each other, Oracle introduced the TRUST keyword in Oracle8i. Adding TRUST to the RESTRICT_REFERENCES pragma for a function causes Oracle to:

  • Treat the function as if it satisfies the pragma without actually checking the code.

  • Treat any functions or procedures called from the function that have the TRUST keyword as if they satisfy the pragma as well.

Thus, a stored function whose RESTRICT_REFERENCES pragma includes WNDS and TRUST could make calls to other PL/SQL functions that do not specify RESTRICT_REFERENCES pragmas and/or external C and Java functions and still be callable from SQL statements. In the case of external C or Java calls, you will need to include the TRUST designation in your function’s RESTRICT_REFERENCES pragma if you want to call the function from SQL, since the C or Java source code is not available to the server for inspection.

To use TRUST, simply append it to the end of the purity designation list, as in:

CREATE OR REPLACE PACKAGE my_pkg AS
  FUNCTION my_func(arg1 IN NUMBER) RETURN VARCHAR2;
  PRAGMA RESTRICT_REFERENCES(my_func, RNPS, WNPS, WNDS, TRUST);
END my_pkg;

Other Restrictions

In addition to the WNDS requirement, Oracle checks that each function invoked from a SQL statement abides by the following rules:

  • The function can’t end the current transaction using COMMIT or ROLLBACK.

  • The function can’t alter a transaction by creating savepoints or rolling back to a previously defined savepoint.

  • The function can’t issue an ALTER SYSTEM or ALTER SESSION statement.

  • All parameter types, including the return type, must be standard SQL types such as VARCHAR2, NUMBER, and DATE. PL/SQL types such as BOOLEAN and RECORD are not allowed.

  • All parameters must be IN parameters. OUT and IN OUT parameters are not allowed.

The first three restrictions are designed to protect against changes that could alter the operational environment of the parent query. The fourth and fifth restrictions ensure that the data being returned from the PL/SQL function can be handled by the SQL statement.

Stored Functions in DML Statements

Stored functions may also be called from INSERT, UPDATE, and DELETE statements. Whereas most of the restrictions outlined earlier apply equally to stored functions called from DML statements, there is one major difference: since the parent DML statement is changing the state of the database, stored functions invoked from DML statements do not need to abide by the WNDS restriction. However, such stored functions may not read or modify the same table as the parent DML statement.

Like queries, DML statements may call stored functions where expressions are allowed, including:

  • The VALUES clause of an INSERT statement

  • The SET clause of an UPDATE statement

  • The WHERE clause of an INSERT, UPDATE, or DELETE statement

Any subqueries called from a DML statement may also call stored functions as well under the same set of restrictions as the parent DML statement.

Often, sets of complementary stored functions are called from both queries and DML statements. For example, you saw earlier how the pkg_util.translate_date function could be called from a query to translate from the Oracle date format stored in the database to the format needed by a Java client. Similarly, the overloaded pkg_util.translate_date function may be used within an update statement to perform the reverse translation, as in:

UPDATE cust_order
SET expected_ship_dt = pkg_util.translate_date(:1)
WHERE order_nbr = :2;

where :1 and :2 are placeholders for the UTC timedate and order number passed in by the Java client.

Stored functions may also be used in the WHERE clause in place of correlated subqueries, both to simplify the DML statement and to facilitate code reuse. For example, suppose you have been asked to push the expected ship date by five days for any order containing part number F34-17802. You could issue an UPDATE statement against the cust_order table using a correlated subquery, as in:

UPDATE cust_order co
SET co.expected_ship_dt = NVL(co.expected_ship_dt, SYSDATE) + 5
WHERE co.cancelled_dt IS NULL and co.ship_dt IS NULL
  AND EXISTS (SELECT 1 FROM line_item li
    WHERE li.order_nbr = co.order_nbr
      AND li.part_nbr = 'F34-17802'),

After having written many subqueries against the line_item table, however, you feel it’s time to write a multipurpose function and add it to the pkg_util package:

FUNCTION get_part_count(ordno IN NUMBER, 
  partno IN VARCHAR2 DEFAULT NULL, max_cnt IN NUMBER DEFAULT 9999)
  RETURN NUMBER IS
  tot_cnt NUMBER(5) := 0;
  li_part_nbr VARCHAR2(20);
  CURSOR cur_li(c_ordno IN NUMBER) IS
    SELECT part_nbr
    FROM line_item
    WHERE order_nbr = c_ordno;
BEGIN
  OPEN cur_li(ordno);
  WHILE tot_cnt < max_cnt LOOP
    FETCH cur_li INTO li_part_nbr;
    EXIT WHEN cur_li%NOTFOUND;

    IF partno IS NULL OR 
      (partno IS NOT NULL AND partno = li_part_nbr) THEN
      tot_cnt := tot_cnt + 1;
    END IF;
  END LOOP;
  CLOSE cur_li;

  RETURN tot_cnt;
END get_part_count;

The function may be used for multiple purposes, including:

  • To count the number of line items in a given order

  • To count the number of line items in a given order containing a given part

  • To determine whether the given order has at least X occurrences of a given part

The UPDATE statement may now use the function to locate open orders that have at least one occurrence of part F34-17802:

UPDATE cust_order co
SET co.expected_ship_dt = NVL(co.expected_ship_dt, SYSDATE) + 5
WHERE co.cancelled_dt IS NULL and co.ship_dt IS NULL
  AND 1 = pkg_util.get_part_count(co.order_nbr, 'F34-17802', 1);

The SQL Inside Your PL/SQL

Now that we’ve explored calling PL/SQL from SQL, let’s turn the tables and explore the use of SQL inside your PL/SQL code. SQL is great at manipulating large sets of data, but there are situations where you need to work with data at the row level. PL/SQL, with its looping and cursor control capabilities, allows the flexibility to work at the set level using SQL or at the row level using cursors. However, many PL/SQL programmers forego the power of SQL and do everything at the row level, even when it is unnecessary and time-consuming to do so.

As an analogy, imagine that you are working at a warehouse, and a large shipment of parts arrives on the loading dock. Your job is to separate the shipment by part type and distribute the pieces to different areas of the warehouse. To make your job easier, the warehouse owner has procured the best forklift money can buy. There are two possible strategies to employ:

  • Pick up one box at a time, determine the type, and drive it to the appropriate destination.

  • Spend some time analyzing the situation, determine that every box on a pallet is of the same type, and drive entire pallets to the appropriate destination.

Although this analogy might be overly simplistic, it does serve to illustrate the difference between set operations and row operations. Allowing the Oracle server to manipulate large sets in a single operation can often yield a performance improvement of several orders of magnitude over manipulating one row at a time, especially on systems with multiple CPUs.

When a procedural language is used for database access (whether it is PL/SQL, C with OCI calls, or Java using JDBC), there is a tendency to employ the first strategy. Perhaps programmers are accustomed to coding at a low level of granularity when using a procedural language and this spills over into their data access logic. This situation is especially prevalent in systems that need to process and load large amounts of data from external files, such as data warehouse load utilities.

Imagine that you are charged with building an infrastructure to accept files from multiple OLTP systems, perform various data cleaning operations, and aggregate the data into a data warehouse. Using PL/SQL (or C, Java, C++, Cobol, etc.), you could build functionality that:

  1. Opens a given file.

  2. Reads a line, verifies/cleans the data, and updates the appropriate row of the appropriate fact table in the data warehouse.

  3. Repeats #2 until the file is exhausted.

  4. Closes the file.

Although this approach might work for small files, it is not uncommon for large warehouses to receive feeds containing hundreds of thousands or millions of items. Even if your code is extremely efficient, processing a million-line file could take several hours.

Here’s an alternate strategy that employs the power of the Oracle server to make quick work of large data feeds:

  1. Create a staging table for each unique data feed file format.

  2. At the start of the load process, truncate the staging tables.

  3. Use SQL*Loader with the direct path option to quickly load the data file into the appropriate staging table.

  4. Update all rows of the staging table to clean, verify, and transform data, marking rows as invalid if they fail verification. Perform the operation in parallel if possible.

  5. Update the appropriate fact table using a subquery against the staging table. Again, perform in parallel if possible.

For this strategy to succeed, you need to have adequate disk space and sufficiently large rollback and temporary tablespaces. With adequate resources and properly constructed SQL statements, however, this strategy can yield a 10X improvement over the previous strategy.

So what role should PL/SQL play in such a scenario? In this case, PL/SQL would be an excellent vehicle for executing steps 4 and 5 of the previous list. Although the stored procedures might each contain only a single update statement, the SQL is likely to be complex and may contain optimizer hints and other advanced features. Therefore, it would be advisable to isolate the SQL from the rest of the application so that it may be independently monitored and tuned.

In general, when dealing with complex logic involving large data sets, it is advantageous to think in terms of data sets rather than programming steps. In other words, ask yourself where your data is, where it needs to move to, and what needs to happen to it during its journey instead of thinking in terms of what needs to happen with each piece of data to satisfy the business requirements. If you follow this strategy, you will find yourself writing substantial, efficient SQL statements that employ PL/SQL where appropriate, rather than writing complex PL/SQL routines that employ SQL when needed. In doing so, you will be providing the server with the opportunity to split large workloads into multiple pieces that run in parallel, which can greatly improve performance.



[1] For example, Oracle PL/SQL Programming by Steven Feuerstein (O’Reilly).

[2] Database triggers are another type of stored PL/SQL, but they are outside the scope of this discussion.

[3] This is one reason why you should never use SELECT * when working with a view. Always explicitly name the columns that you need so that the server doesn’t waste time generating data that you never reference.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.118.253.223