CHAPTER 2

image

SQL and PL/SQL by Example

A number of our established “powerful” programming language features, even beloved ones, could very well turn out to belong rather to “the problem set” than to “the solution set.”

—Dutch computer scientist Edsger Dijkstra, advocate of structured programming and winner of the 1972 Turing Award, in “Correctness Concerns and, Among Other Things, Why They Are Resented”

All database activity is conducted in SQL, and therefore database administrators need to be intimately familiar with it. Figure 2-1 illustrates that the greatest potential for performance improvement usually lies within the software application, not within the database where the application stores its data or within the physical infrastructure where the database is housed.1

9781484201947_Fig02-01.jpg
Figure 2-1. The performance improvement pyramid

A common example of performance problems caused by applications is that the applications aren’t designed to take advantage of indexing schemes. As an example of a performance problem that you might encounter in SQL, consider the query in Listing 2-1, which retrieves the names of employees with first_name “Steven” and last_name “King”. An index exists on the first_name and last_name columns of the employees table, but the query can’t use the index because it uses the upper function in the restriction. Because Oracle Database can’t use the index, it has to retrieve and examine every single record in the employees table.2

Another source of performance problems is the temptation to use the Oracle Database engine as a calculator to perform arithmetic calculations and string manipulations. This is a highly inefficient use of SQL and is better done using math and string libraries instead. However, it is usually fine to issue a simple query against dual such as in Listing 2-2 that executes just one time.

An equally important reason why database administrators need to understand SQL is that all database administration activities such as database maintenance and user management are also conducted in SQL. It should come as no surprise, therefore, that the Oracle Database 12c SQL Language Reference is almost 2,000 pages—compare this with the 20 or so pages in this chapter. Fortunately, you can go online and search the SQL manual or download the electronic version free of charge.3 You can purchase a printed copy if you like having a mountain of paper on your desk.

A third reason why database administrators need to understand SQL is that it has deficiencies that must be guarded against. These deficiencies include redundancy, problems introduced by nullable data items, and the absence of prohibitions on duplicate data records. I return to this subject later in this chapter.

TRUE STORY

One day, an angry software developer demanded that we find out why Oracle was not responding to “simple queries.” We found that he had submitted a query that indiscriminately joined seven tables. His query was of the form SELECT COUNT(*) ... FROM Table#1, Table#2, Table#3, Table#4, Table#5, Table#6, Table#7.4 The number of rows produced by such a query equals the product of the number of rows in each table; if each specified table contained 1,000 rows, the query would produce 100 trillion rows.

When we asked the software developer why he hadn’t specified any joining criteria, he said that he first wanted to determine whether Oracle could handle a “simple” query before submitting a “complex” query!

It happened to me—it could happen to you!

Railroad Diagrams

SQL statements have many optional clauses, so the Oracle reference manuals use railroad diagrams as a visual aid. Figure 2-2 is an example of a railroad diagram for a hypothetical ROW command.

9781484201947_Fig02-02.jpg
Figure 2-2. A railroad diagram for a hypothetical ROW command

Start at the left of the diagram, and take any path the diagram allows you to take. Along the way, you encounter mandatory words and clauses as well as optional words and clauses and even subdiagrams. The diagram indicates that the ROW command can take the forms indicated in Listing 2-3.

Let’s look at some railroad diagrams for a simplified version of the SELECT statement. The SELECT statement starts with the keyword SELECT and is used to retrieve data from a relational database. What you see here is a very small subset of the complete syntax, but it’s powerful enough for many purposes.

The diagrams in Figures 2-3 through 2-8 show that the keyword SELECT is followed a number of clauses: the mandatory SELECT_list and FROM_clause and the optional WHERE_clause, GROUP_BY_clause, and ORDER_BY_clause. Separate railroad diagrams are also shown for each of the clauses.

9781484201947_Fig02-03.jpg
Figure 2-3. A simplified railroad diagram for the SELECT statement
9781484201947_Fig02-04.jpg
Figure 2-4. The railroad diagram for SELECT_list
9781484201947_Fig02-05.jpg
Figure 2-5. The railroad diagram for FROM_clause
9781484201947_Fig02-06.jpg
Figure 2-6. The railroad diagram for WHERE_clause
9781484201947_Fig02-07.jpg
Figure 2-7. The railroad diagram for GROUP_BY_clause
9781484201947_Fig02-08.jpg
Figure 2-8. The railroad diagram for ORDER_BY_clause

As an example, let’s generate a report of the total salary in each department, sorted by descending order of average salary. Let’s assemble the necessary clauses.

The SELECT_list clause is a comma-separated list of column names or expressions in which you’re interested. Here’s what you need. The upper function produces the uppercase version of a character string, and the avg function produces the average of values in a group:

SELECT upper(department_name),
  AVG(salary)

The WHERE_clause portion of the statement is a list of tables that must be joined. You need the departments table and the employees table. You give each table a short alias that will come in handy in other clauses:

FROM departments d, 
  employees e

The optional GROUP_BY_clause portion of the statement tells Oracle Database that you apply aggregate functions such as avg to groups of records. In this case, you need to group by upper(department_name):

GROUP BY upper(department_name)

Finally, the ORDER_BY_clause portion of the statement specifies how you want the report sorted. In this case, you need to sort by descending values of AVG(salary):

ORDER BY AVG(salary) DESC;

Let’s put this all together:

SELECT upper(department_name),
  AVG(salary)
FROM departments d,
  employees e
WHERE e.department_id = d.department_id
GROUP BY upper(department_name)
ORDER BY AVG(salary) DESC;

And here is the result:

UPPER(DEPARTMENT_NAME)         AVG(SALARY)
------------------------------ -----------
EXECUTIVE                       19333.3333
ACCOUNTING                           10150
PUBLIC RELATIONS                     10000
MARKETING                             9500
SALES                           8955.88235
FINANCE                               8600
HUMAN RESOURCES                       6500
IT                                    5760
ADMINISTRATION                        4400
PURCHASING                            4150
SHIPPING                        3475.55556

Notice the formatting of the results. Formatting instructions aren’t part of the SQL query; the formatting is left to the application program—SQL*Plus in this case—that sent the SQL query to the Oracle Database engine for processing.

Now take a minute and check out the complete syntax of the SELECT statement in the SQL reference manual. Notice the subquery diagram in particular and how it references itself; just before the ORDER BY clause, you can optionally specify a UNION, INTERSECT, or MINUS clause followed by another subquery. Also observe that the subquery diagram is referenced by other diagrams. For example:

  • If you check the Subquery Factoring Clause diagram, you see that it too refers to the subquery diagram.
  • If you check the Table Reference subdiagram, you see that a table reference can include a subquery. This kind of subquery is called an inline view.
  • Subqueries that produce a single value can be used any place where a single value is indicated. This kind of subquery is called a scalar subquery. For example, scalar subqueries can be used in a SELECT list and in a WHERE clause.

Types of SQL

Only a small fraction of the SQL reference manual is devoted to the sort of SQL statements you’ve encountered so far. SQL statements are commonly classified into Database Manipulation Language (DML) statements to modify data and Database Definition Language (DDL) statements to create and modify the different types of objects that compose an Oracle database. The SQL reference manual also describes commands that can create and modify databases and perform database administration activities such as stopping and starting databases.

Data Definition Language

A large portion of the SQL reference manual is devoted to DDL: commands that are used to create, alter, and drop different types of database objects such as tables and indexes. Listing 2-4 shows the DDL commands that can be used to create the employees table in the HR schema. These commands specify a data type such as variable character (VARCHAR2) or numeric for each data item, the length of each data item, and, in the case of numeric data items, the precision. For example, VARCHAR2(32) indicates no more than 32 data characters, and NUMBER(8, 2) indicates a decimal number with no more than 8 digits, 2 of which are to the right of the decimal point. Some of the columns are specified to be NOT NULL, and you also have check, unique-key, and primary-key constraints.

Using SQL Developer, you can review the DDL-creation commands of other tables in the HR schema. Simply go to the SQL tab for each table.

Database Manipulation Language

You’ve already seen several examples of the SELECT statement. Next, let’s consider the INSERT, UPDATE, DELETE, and MERGE statements.

The INSERT Statement

The INSERT statement is the way in which data enters the relational database. In Listing 2-5, a new record is being added to the employees table. However, if you were to try this command using SQL Developer, it would fail because the employee_id column is the primary-key column and the value 100 is already in use. Also, values in the email column are required to be unique, and the value “SKING” is already in use. Try this statement using SQL Developer, and observe the error message. Then change the value to one that will succeed.

The UPDATE Statement

The UPDATE statement enables you to specify a subset of data rows and instructions for modifying them. In Listing 2-6, the e-mail address of an employee is being changed. However, if you were to try this command using SQL Developer, it would fail because the values in the email column are required to be unique and the value “NKOCHAR” is already in use. Try this statement using SQL Developer, and observe the error message. Then change the value “NKOCHAR” to one that will succeed.

The DELETE Statement

The DELETE statement lets you specify a subset of data rows to delete. In Listing 2-7, an employee record is being deleted. However, if you were to try this statement using SQL Developer, it would fail, because the employee_id column is referenced by a foreign-key constraint specifying that the value in the manager_id column must be a valid employee_id (or be left unspecified). The employee_id column is also referenced by a foreign-key constraint on the job_history table, which specifies that the values in the employee_id column should all be valid. Try this statement using SQL Developer, and observe the error message. This statement won’t succeed as long as there any rows for which the manager_id column has the value 101.

The MERGE Statement

MERGE is a powerful statement that combines the capabilities of the INSERT statement, the UPDATE statement, and the DELETE statement. Consider the following example. Suppose you have employee updates in a table called employee_updates. If a record contains an employee_id that already exists in the table, then the corresponding record in the employees table has to be updated. If a record contains an employee_id that doesn’t exist in the table, then the values must be inserted into the employees table. Finally, if the terminated column has value 1, then the corresponding record has to be deleted from the database. You can merge the contents of the employee_updates table into the employees table by using the MERGE statement—inserting, updating, or deleting records as necessary—as shown in Listing 2-8.

Embedded SQL

Application programs written by application software developers can communicate with an Oracle database only by using SQL. These application programs must therefore be linked with Oracle-supplied routines that give them the capability to communicate with an Oracle database.

Listing 2-9 shows an example of SQL statements embedded in a Java program. Each embedded SQL statement is prefixed with the phrase #sql (refer to the text in boldface in the example). The example lists the names of all parts supplied by a specified supplier and the price quote in each case; the account name and password required to connect to the database must be provided by the user at runtime.

SQL*Plus and SQL Developer

Oracle Database provides a command-line tool called SQL*Plus that enables you to interact with the database without having to embed SQL in an application program; this is the tool that database administrators most frequently use in their work besides SQL Developer. The advantage of a command-line tool such as SQL*Plus is that it can be used to automate tasks—that is, a series of SQL statements can be placed in a file and automatically executed at prescribed times.

The SQL*Plus utility can also be used as a simple report-writing tool to produce neatly formatted reports. The commands in Listing 2-10 cause the entire employees table to be listed. The average salary is also printed for each department. Listing 2-11 shows the output produced by the commands in Listing 2-10.

Criticisms of SQL

The IBM team that developed SQL made certain decisions that violated relational principles—chief among them that duplicate rows were allowed. The inventor of relational database technology, Dr. Edgar Codd, was not part of IBM’s development team, and, despite his urging, these deficiencies were not corrected in subsequent revisions of the language. In the opinion of some commentators, the gap then continued to widen. Nevertheless, no one has been successful in providing an alternative to SQL, and you must learn to live with its deficiencies. Chris Date makes the point forcefully in An Introduction to Database Systems:

SQL is now so far from being a true embodiment of relational principles—it suffers from so many sins of both omission and commission—that I would frankly prefer not to discuss it at all! However, SQL is obviously important from a commercial point of view; thus, every database professional needs to have some familiarity with it...

Michael Stonebraker makes the point that SQL is the intergalactic language. Everybody in the galaxy speaks SQL, so, like it or not, we have to be masters of it.

Duplicates

The SQL standard allows tables to contain duplicate data records and doesn’t require that duplicates be eliminated from the results of a projection or union operation, thus violating the principles laid down by Codd. Date has published an example in which 12 formulations of a certain SQL query returned 9 different results because duplicate data rows existed in the tables in question. Not only does this become a programming nightmare, but Date points out that this is one reason why the query optimizer can’t always rewrite queries into alternative forms that are more efficient—the optimizer can’t be sure the result will be unaffected by the rewrite.

Redundancy

Redundancy isn’t a violation of relational principles but creates a serious performance problem that database administrators must understand. There are usually many ways to rephrase the same SQL query. For various reasons (such as the one described in the previous section), the query optimizer might not find the optimal query execution plan in all cases even though all the plans retrieve the same data and, therefore, the query plan that is optimal in one case is equally optimal in all other cases.

Listing 2-12 shows lots of different ways to express the query Departments that have at least one employee. Only the first version conforms to the simplified railroad diagram in Figure 2-3 and is obviously an abbreviated way of specifying the application of three separate relational algebra operators: join, restriction, and projection.

In Chapter 1, you saw the SQL answer to the problem Which employees have worked in all accounting positions; that is, those for which the job_id starts with the characters AC? Listing 2-13 shows two alternative formulations: the first formulation uses correlated subqueries, and the second formulation uses aggregate functions.

Nullable Data Items

Sometimes the value of a data item isn’t known. If you permit a record to be stored even if the values of some data items aren’t known, those data items are said to be nullable. Missing information leads to “fuzzy” logic in which there is a third alternative—unknown—to truth and falsehood of a statement such as department_id=90. Nullable data items are commonly used by database designers, but three-valued logic isn’t intuitive. In the employees table, manager_id is a nullable column. Intuitively, the following queries should return the value 107—the number of rows in the employees table—but instead they return the value 106 because one record doesn’t contain a value for department_id:

SELECT COUNT(*) FROM employees WHERE manager_id = 100 OR manager_id != 100;
SELECT COUNT(*) FROM employees WHERE manager_id = manager_id;
SELECT COUNT(*) FROM employees WHERE manager_id != -1;

To obtain the expected answer, you need to add the clause OR manager_id IS NULL to the preceding SQL queries; you can find more information about the problems created by nullable data items in any good book on SQL.

Introduction to PL/SQL

Oracle Database gives you the ability to store programs in the database using a language called Procedural Language/SQL (PL/SQL). PL/SQL offers the entire suite of structured programming mechanisms, such as condition checking, loops, and subroutines, as shown in Figure 2-9. Just like programs written in other languages, PL/SQL programs use SQL to interact with the database.

9781484201947_Fig02-09.jpg
Figure 2-9. Programming patterns provided by all programming languages

A common use of PL/SQL is to write triggers. Triggers are tied to actions such as the action of inserting a record into a table. For example, the HR schema contains a trigger that inserts a history record into the job_history table whenever the job_id in the employees table is updated. This preserves the history of changes to an employee’s position.

Figure 2-10 diagrams the logic of a PL/SQL program to merge the contents of the employee_updates table with the employees table. This is not the way you should do things, because the merge command is available, but the approach in the figure nicely illustrates the capabilities of PL/SQL. The program corresponding to Figure 2-10 is shown in Listing 2-14. It uses a cursor, which is a mechanism that allows you to iterate through the rows of data returned by a SQL query.

9781484201947_Fig02-10.jpg
Figure 2-10. Flow chart for the employee_updates procedure

Storing programs in the database has many advantages. Special PL/SQL programs called triggers can be executed whenever a user performs a specified action. This gives you the ability to enforce business rules, control access to data, and keep records of who accessed the data and how it changed. Storing sequences of commands in the database greatly reduces the amount of communication between client and server and improves efficiency. Also, PL/SQL functions can be used in SQL statements; this increases the power and flexibility of SQL.

Summary

Database administrators need to understand SQL in all its forms. Many Oracle experts have devoted their entire careers to the study of SQL, and I have hardly been able to scratch the surface in this chapter. I highly recommend the books listed at the end of this chapter; and, of course, you can always download the official Oracle reference works from the Oracle web site. Here is a short summary of the concepts discussed in this chapter:

  • All database activity, including database administration activities, is transacted in SQL.
  • Oracle reference works use railroad diagrams to teach the SQL language. Railroad diagrams can include subdiagrams and can even refer to themselves in recursive fashion. For instance, a table reference can be an entire subquery—this kind of subquery is called an inline view. The SELECT list can include scalar subqueries—that is, subqueries that return exactly one data item from exactly one data row.
  • SQL is divided into Data Manipulation Language (DML) and Data Definition Language (DDL). DML includes the SELECT, INSERT, UPDATE, MERGE, and DELETE statements. DDL includes the CREATE, ALTER, and DROP statements for the different classes of objects in an Oracle database. The SQL reference manual also describes commands that can be used to perform database administration activities such as stopping and starting databases.
  • SQL needs to be embedded into software application programs so they can communicate with the database.
  • SQL has been criticized because it doesn’t prohibit duplicates; the absence of a prohibition against duplicates causes queries that are seemingly equivalent to produce differing results and inhibits the query optimization process.
  • Using nullable data items can lead to results that contradict common sense.
  • SQL queries can usually be formulated in several ways. The Oracle Database optimizer may not always choose the same query execution plan in all cases, even though the query plan that is most efficient in one case is obviously the most efficient for all other cases.
  • Programs written in PL/SQL can be stored in an Oracle database. Using these programs has many advantages, including efficiency, control, and flexibility. PL/SQL offers a full complement of structured programming mechanisms such as condition checking, loops, and subroutines.

Exercises

  • Download Oracle Database 12c SQL Reference from http://docs.oracle.com. Review the railroad diagram for the SELECT statement and all the subdiagrams.
  • Using SQL Developer, try the INSERT, UPDATE, and DELETE statements in Listing 2-5, Listing 2-6, and Listing 2-7. Construct a solution in each case.
  • Use SQL Developer to perform the following tasks:
    1. Create an employee_updates table by cloning the employees table.
    2. Add a terminated column to the employee_updates table.
    3. Insert records into the employee_updates table.
    4. Merge the data in the employee_updates table into the employees table using the MERGE command.

Further Reading

  • De Haan, Lex, and Tim Gorman. Beginning Oracle SQL: For Oracle Database 12c. Apress, 2009. The work of the late Lex De Haan, a well-known Oracle expert and teacher has been revised for Oracle Database 12c by Tim Gorman, an Oracle ACE Director and a member of the OakTable network.
  • Feuerstein, Steven. Oracle Magazine. www.oracle.com/technetwork/issue-archive/index-087690.html. A 12-part series on PL/SQL starting in the March/April 2011 issue and ending in the March/April 2013 issue. Part 2, “Controlling the Flow of Execution,” and Part 12, “Working with Cursors,” are especially relevant to the PL/SQL example in this chapter.
  • Feuerstein, Steven, and Bill Pribyl. Oracle PL/SQL Programming. O’Reilly, 2014. Steven Feuerstein is the world’s top expert in PL/SQL and has a highly readable style. If you’re planning to invest in a PL/SQL book, I recommend this one.
  • Gennick, Jonathan. Oracle SQL*Plus: The Definitive Guide. O’Reilly, 2009. Oracle reference works are good at documenting every detail of a product but not as good at describing how to use the product to maximum advantage. This book will help you do that.
  • McJones, Paul. “The 1995 SQL Reunion: People, Projects, and Politics.” www.hpl.hp.com/techreports/Compaq-DEC/SRC-TN-1997-018.pdf.

Footnotes

1Based on a discussion in Oracle Rdb Guide to Database Performance and Tuning.

2It’s possible to create a function-based index for the upper(first_name) and upper(last_name) columns.

3The URL is http://docs.oracle.com.

4Such a query is called a Cartesian product.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.142.121.153