Core SQL
Whether you’re relatively new to writing SQL or you’ve been writing it for years, learning to write “good” SQL is a process that requires a strong knowledge foundation of core syntax and concepts. This chapter provides a review of the core concepts of the SQL language and its capabilities, along with descriptions of the common SQL commands with which you should already be familiar. For those of you who have worked with SQL previously and have a good grasp on the basics, this chapter will be a brief refresher to prepare you for the more detailed treatment of SQL we examine in later chapters. If you’re a new SQL user, you may want to read Beginning Oracle SQL first to make sure you’re comfortable with the basics. Either way, Chapter 1 “level sets” you with a whirlwind tour of the five core SQL statements and provides a quick review of the tool we’ll be using to execute SQL: SQL*Plus.
The SQL Language
The SQL language was originally developed during the 1970s by IBM and was called Structured English Query Language, or SEQUEL. The language was based on the model for relational database management systems (RDBMSs) developed by E. F. Codd in 1969. The acronym was later shortened to SQL as a result of a trademark dispute. In 1986, the American National Standards Institute (ANSI) adopted SQL as a standard, and in 1987, the International Organization for Standardization (otherwise known as the ISO) did so as well. A piece of not-so-common knowledge is that the official pronunciation of the language was declared to be “ess queue ell” by ANSI. Most people, including me, still use the “see-qwell” pronunciation just because it flows a bit easier linguistically.
The purpose of SQL is simply to provide an interface to a database—in our case, Oracle. Every SQL statement is a command, or instruction, to the database. SQL differs from other programming languages such as C and Java in that it is intended to process data in sets, not individual rows. The language also doesn’t require that you provide instructions on how to navigate to the data; this happens transparently, under the covers. But, as you’ll see in the chapters ahead, knowing about your data and how and where they are stored is very important if you want to write efficient SQL in Oracle.
Although there are minor differences in how vendors (such as Oracle, IBM, and Microsoft) implement the core functionality of SQL, the skills you learn in one database transfer to another. Basically, you can use the same SQL statements to query, insert, update, and delete data and create, alter, and drop objects regardless of the database vendor.
Although SQL is the standard for use with various RDBMSs, it is not particularly relational in practice (I expand on this a bit later in the book). I recommend that you read C. J. Date’s book SQL and Relational Theory (O’Reilly Media, 2011) for a detailed review of how SQL and relational theory intersect. Keep in mind that the SQL language doesn’t always follow the relational model precisely; it doesn’t implement some elements of the relational model at all, and it implements other elements improperly. The fact remains that because SQL is based on this model, you must understand SQL and the relational model as well as know how to write SQL as correctly and efficiently as possible.
Interfacing to the Database
Numerous ways have been developed throughout the years for transmitting SQL to a database and getting results back. The native interface to the Oracle database is the Oracle Call Interface (OCI). The OCI powers the queries that are sent internally by the Oracle kernel to the database. You use the OCI any time you use one of Oracle’s tools, such as SQL*Plus or SQL Developer. Various other Oracle tools, including SQL*Loader, Data Pump, and Real Application Testing (RAT), use OCI as well as language-specific interfaces, such as Oracle JDBC-OCI, ODP.Net, Oracle Precompilers, Oracle ODBC, and the Oracle C++ Call Interface (OCCI) drivers.
When you use programming languages such as COBOL or C, the statements you write are known as embedded SQL statements and are preprocessed by a SQL preprocessor before the application program is compiled. Listing 1-1 shows an example of a SQL statement that could be used within a C/C++ block.
Listing 1-1. Embedded SQL Statement Used within a C/C++ Block
{
int a;
/* ... */
EXEC SQL SELECT salary INTO :a
FROM hr.employees
WHERE employee_id = 108;
/* ... */
printf("The salary is %d ", a);
/* ... */
}
Other tools, such as SQL*Plus and SQL Developer, are interactive tools. You enter and execute commands, and the output is displayed back to you. Interactive tools don’t require you to compile your code explicitly before running it; you simply enter the command you wish to execute. Listing 1-2 shows an example of using SQL*Plus to execute a statement.
Listing 1-2. Using SQL*Plus to Execute a SQL Statement
SQL> select salary
2 from hr.employees
3 where employee_id = 108;
SALARY
---------------
12000
In this book, we’ll use SQL*Plus for our example listings for consistency’s sake, but keep in mind that whichever method or tool you use to enter and execute SQL statements, everything ultimately goes through the OCI. The bottom line is that the tool you use doesn’t matter; the native interface is the same for all.
Review of SQL*Plus
SQL*Plus is a command-line tool provided with every Oracle installation regardless of platform (Windows, Unix). It is used to enter and execute SQL commands and to display the resulting output in a text-only environment. The tool allows you to enter and edit commands, save and execute commands individually or via script files, and display the output in nicely formatted report form. To start SQL*Plus you simply start sqlplus from your host’s command prompt.
Connect to a Database
There are multiple ways to connect to a database from SQL*Plus. Before you can connect, however, you likely need to have entries for the databases to which you need to connect entered in the $ORACLE_HOME/network/admin/tnsnames.ora file. Two common ways to supply your connection information when you start SQL*Plus are shown in Listing 1-3; another is to use the SQL*Plus connect command after SQL*Plus starts, as shown in Listing 1-4.
Listing 1-3. Connecting to SQL*Plus from the Windows Command Prompt
$ sqlplus hr@ora12c
SQL*Plus: Release 12.1.0.1.0 Production on Tue May 7 12:32:36 2013
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Enter password:
Last Successful login time: Tue May 07 2013 12:29:09 -04:00
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
SQL>
Listing 1-4. Connecting to SQL*Plus and Logging in to the Database from the SQL> Prompt
$ sqlplus /nolog
SQL*Plus: Release 12.1.0.1.0 Production on Tue May 7 12:34:21 2013
Copyright (c) 1982, 2013, Oracle. All rights reserved.
SQL> connect hr@ora12c
Enter password:
Connected.
SQL>
When starting SQL*Plus using the 12c version, you will notice a new feature that displays your last login time by default. If you don’t want this to appear, simply use the option –nologintime.
To start SQL*Plus without being prompted to log in to a database, start SQL*Plus with the /nolog option.
Configuring the SQL*Plus Environment
SQL*Plus has numerous commands that allow you to customize the working environment and display options. The SQL*Plus help data must be installed by the database administrator or it may not be available. There are three help script files found in $ORACLE_HOME/SQLPLUS/ADMIN/HELP/ that drop, create, and populate the SQL*Plus help tables: HLPBLD.SQL, HELPDROP.SQL, and HELPUS.SQL. Listing 1-5 shows the SQL*Plus commands available after entering the SQL*Plus help index command at the SQL> prompt.
Listing 1-5. SQL*Plus Command List
SQL> help index
Enter Help [topic] for help.
The set command is the primary command used for customizing your environment settings. Listing 1-6 shows the help text for the set command.
Listing 1-6. SQL*Plus SET Command
SQL> help set
SET
---
Sets a system variable to alter the SQL*Plus environment settings
for your current session. For example, to:
- set the display width for data
- customize HTML formatting
- enable or disable printing of column headings
- set the number of lines per page
SET system_variable value
where system_variable and value represent one of the following clauses:
APPI[NFO]{OFF|ON|text} NEWP[AGE] {1|n|NONE}
ARRAY[SIZE] {15|n} NULL text
AUTO[COMMIT] {OFF|ON|IMM[EDIATE]|n} NUMF[ORMAT] format
AUTOP[RINT] {OFF|ON} NUM[WIDTH] {10|n}
AUTORECOVERY {OFF|ON} PAGES[IZE] {14|n}
AUTOT[RACE] {OFF|ON|TRACE[ONLY]} PAU[SE] {OFF|ON|text}
[EXP[LAIN]] [STAT[ISTICS]] RECSEP {WR[APPED]|EA[CH]|OFF}
BLO[CKTERMINATOR] {.|c|ON|OFF} RECSEPCHAR {_|c}
CMDS[EP] {;|c|OFF|ON} SERVEROUT[PUT] {ON|OFF}
COLSEP {_|text} [SIZE {n | UNLIMITED}]
CON[CAT] {.|c|ON|OFF} [FOR[MAT] {WRA[PPED] |
COPYC[OMMIT] {0|n} WOR[D_WRAPPED] |
COPYTYPECHECK {ON|OFF} TRU[NCATED]}]
DEF[INE] {&|c|ON|OFF} SHIFT[INOUT] {VIS[IBLE] |
DESCRIBE [DEPTH {1|n|ALL}] INV[ISIBLE]}
[LINENUM {OFF|ON}] [INDENT {OFF|ON}] SHOW[MODE] {OFF|ON}
ECHO {OFF|ON} SQLBL[ANKLINES] {OFF|ON}
EDITF[ILE] file_name[.ext] SQLC[ASE] {MIX[ED] |
EMB[EDDED] {OFF|ON} LO[WER] | UP[PER]}
ERRORL[OGGING] {ON|OFF} SQLCO[NTINUE] {> | text}
[TABLE [schema.]tablename] SQLN[UMBER] {ON|OFF}
[TRUNCATE] [IDENTIFIER identifier] SQLPLUSCOMPAT[IBILITY]
{x.y[.z]}
ESC[APE] {|c|OFF|ON} SQLPRE[FIX] {#|c}
ESCCHAR {@|?|%|$|OFF} SQLP[ROMPT] {SQL>|text}
EXITC[OMMIT] {ON|OFF} SQLT[ERMINATOR] {;|c|ON|OFF}
FEED[BACK] {6|n|ON|OFF} SUF[FIX] {SQL|text}
FLAGGER {OFF|ENTRY|INTERMED[IATE]|FULL} TAB {ON|OFF}
FLU[SH] {ON|OFF} TERM[OUT] {ON|OFF}
HEA[DING] {ON|OFF} TI[ME] {OFF|ON}
HEADS[EP] {||c|ON|OFF} TIMI[NG] {OFF|ON}
INSTANCE [instance_path|LOCAL] TRIM[OUT] {ON|OFF}
LIN[ESIZE] {80|n} TRIMS[POOL] {OFF|ON}
LOBOF[FSET] {1|n} UND[ERLINE] {-|c|ON|OFF}
LOGSOURCE [pathname] VER[IFY] {ON|OFF}
LONG {80|n} WRA[P] {ON|OFF}
LONGC[HUNKSIZE] {80|n} XQUERY {BASEURI text|
MARK[UP] HTML [OFF|ON] ORDERING{UNORDERED|
[HEAD text] [BODY text] [TABLE text] ORDERED|DEFAULT}|
[ENTMAP {ON|OFF}] NODE{BYVALUE|BYREFERENCE|
[SPOOL {OFF|ON}] DEFAULT}|
[PRE[FORMAT] {OFF|ON}] CONTEXT text}
SQL>
Given the number of commands available, you can customize your environment easily to suit you best. One thing to keep in mind is that the set commands aren’t retained by SQL*Plus when you exit/close the tool. Instead of typing in each of the set commands you want to apply each time you use SQL*Plus, you can create a file named login.sql. There are actually two files that SQL*Plus reads by default each time you start it. The first is glogin.sql and it can be found in the directory $ORACLE_HOME/sqlplus/admin. If this file is found, it is read and the statements it contains are executed. This allows you to store the SQL*Plus commands and SQL statements that customize your experience across SQL*Plus sessions.
After reading glogin.sql, SQL*Plus looks for the login.sql file. This file must exist in either the directory from which SQL*Plus was started or in a directory included in the path to which the environment variable SQLPATH points. Any commands in login.sql will take precedence over those in glogin.sql. Since version 10g, Oracle reads both glogin.sql and login.sql each time you either start SQL*Plus or execute the connect command from within SQL*Plus. Prior to 10g, the login.sql script was only executed when SQL*Plus started. The contents of a common login.sql file are shown in Listing 1-7.
Listing 1-7. A Common login.sql File
SET LINES 3000
--Sets width of display line (default 80 characters)
SET PAGES 1000
-Sets number of lines per page (default 14 lines)
SET TIMING ON
--Sets display of elapsed time (default OFF)
SET NULL <null>
--Sets display of nulls to show <null> (default empty)
SET SQLPROMPT '&_user@&_connect_identifier> '
--Sets the prompt to show connected user and instance
Note the use of the variables _user and _connect_identifier in the SET SQLPROMPT command. These are two examples of predefined variables. You may use any of the following predefined variables in your login.sql file or in any other script file you may create:
Executing Commands
There are two types of commands that can be executed within SQL*Plus: SQL statements and SQL*Plus commands. The SQL*Plus commands shown in Listings 1-5 and 1-6 are specific to SQL*Plus and can be used for customizing the environment and executing commands that are specific to SQL*Plus, such as DESCRIBE and CONNECT. Executing a SQL*Plus command requires only that you type the command at the prompt and press Enter. The command is executed automatically. On the other hand, to execute SQL statements, you must use a special character to indicate you wish to execute the entered command. You may use either a semicolon or a forward slash to do this. A semicolon may be placed directly at the end of the typed command or on a following blank line. The forward slash must be placed on a blank line to be recognized. Listing 1-8 shows how these two execution characters are used.
Listing 1-8. Execution Character Usage
SQL>select empno, deptno from scott.emp where ename = 'SMITH' ;
EMPNO DEPTNO
---------- ----------
7369 20
SQL>select empno, deptno from scott.emp where ename = 'SMITH'
2 ;
EMPNO DEPTNO
---------- ----------
7369 20
SQL>select empno, deptno from scott.emp where ename = 'SMITH'
2 /
EMPNO DEPTNO
---------- ----------
7369 20
SQL>select empno, deptno from scott.emp where ename = 'SMITH'
2
SQL>/
EMPNO DEPTNO
---------- ----------
7369 20
SQL>select empno, deptno from scott.emp where ename = 'SMITH'/
2
SQL>l
1* select empno, deptno from scott.emp where ename = 'SMITH'/
SQL>/
select empno, deptno from scott.emp where ename = 'SMITH'/
*
ERROR at line 1:
ORA-00936: missing expression
Notice the fifth example that puts / at the end of the statement. The cursor moves to a new line instead of executing the command immediately. Then, if you press Enter again, the statement is entered into the SQL*Plus buffer but is not executed. To view the contents of the SQL*Plus buffer, the list command is used (abbreviated to l). If you then attempt to execute the statement in the buffer using /, which is how the / command is intended to be used, you get an error because, originally, you placed / at the end of the SQL statement line. The forward slash is not a valid SQL command and thus causes an error when the statement attempts to execute.
Another way to execute commands is to place them in a file. You can produce these files with the text editor of your choice outside of SQL*Plus or you may invoke an editor directly from SQL*Plus using the EDIT command. The EDIT command either opens a named file or creates a file if it doesn’t exist. The file must be in the default directory or you must specify the full path of the file. To set the editor to one of your choice, simply set the predefined _editor variable using the following command: define _editor='/<full path>/myeditor.exe'. Files with the extension .sql will execute without having to include the extension and can be run using either the @ or START command. Listing 1-9 shows the use of both commands.
Listing 1-9. Executing .sql Script Files
SQL> @list_depts
DEPTNO DNAME LOC
---------- -------------- -------------
10 ACCOUNTING NEW YORK
20 RESEARCH DALLAS
30 SALES CHICAGO
40 OPERATIONS BOSTON
SQL>
SQL> start list_depts
DEPTNO DNAME LOC
---------- -------------- -------------
10 ACCOUNTING NEW YORK
20 RESEARCH DALLAS
30 SALES CHICAGO
40 OPERATIONS BOSTON
SQL>
SQL>l
1* select * from scott.dept
SQL>
SQL*Plus has many features and options—far too many to cover here. For what we need in this book, the previous overview should suffice. However, the Oracle documentation provides guides for SQL*Plus use and there are numerous books, including Beginning Oracle SQL (as mentioned earlier), that go in to more depth if you’re interested.
The Five Core SQL Statements
The SQL language contains many different statements. In your professional career you may end up using just a small percentage of what is available to you. But isn’t that the case with almost any product you use? I once heard a statistic that most people use 20 percent or less of the functionality available in the software products or programming languages they use regularly. I don’t know whether this is actually true, but in my experience, it seems fairly accurate. I have found the same basic SQL statement formats in use within most applications for almost 20 years. Very few people ever use everything SQL has to offer—and they often implement improperly those features they do use frequently. Obviously, we will not be able to cover all the statements and their options found in the SQL language. This book is intended to provide you with deeper insight into the most commonly used statements and to help you understand how to apply them more effectively.
In this book, we will examine five of the most frequently used SQL statements. These statements are SELECT, INSERT, UPDATE, DELETE, and MERGE. Although we’ll address each of these core statements in some fashion, our focus is primarily on the SELECT statement. Developing a good command of these five statements will provide a strong foundation for your day-to-day work with the SQL language.
The SELECT statement is used to retrieve data from one or more tables or other database objects. You should already be familiar with the basics of the SELECT statement, so instead of reviewing the statement from the beginner’s point of view, I want to review how a SELECT statement processes logically. You should have already learned the basic clauses that form a common SELECT statement, but to build the foundation mind-set you need to write well-formed and efficient SQL consistently, you need to understand how SQL processes.
How a query statement is processed logically may be quite different from its actual physical processing. The Oracle cost-based optimizer (CBO) is responsible for generating the actual execution plan for a query, and we will examine what the optimizer does, and how it does it and why. For now, note that the optimizer determines how to access tables and the order in which to process them, and how to join multiple tables and apply filters. The logical order of query processing occurs in a very specific order. However, the steps the optimizer chooses for the physical execution plan can end up actually processing the query in a very different order. Listing 1-10 shows a query stub that contains the main clauses of a SELECT statement with step numbers assigned to each clause in the order it is processed logically.
Listing 1-10. Logical Query Processing Order
5 SELECT <column list>
1 FROM <source object list>
1.1 FROM <left source object> <join type>
JOIN <right source object> ON <on predicates>
2 WHERE <where predicates>
3 GROUP BY <group by expression(s)>
4 HAVING <having predicates>
6 ORDER BY <order by list>
You should notice right away that SQL differs from other programming languages in that the first written statement (SELECT) is not the first line of code that is processed; the FROM clause is processed first. Note that I show two different FROM clauses in this listing. The one marked 1.1 is provided to show the difference when ANSI syntax is used. It may be helpful to imagine that each step in the processing order creates a temporary dataset. As each step is processed, the dataset is manipulated until a final result is formulated. It is this final result set of data that the query returns to the caller.
To walk through each part of the SELECT statement in more detail, you need to use the query in Listing 1-11 that returns a result set that contains a list of female customers that have placed more than four orders.
Listing 1-11. Female Customers Who Have Placed More Than Four Orders
SQL> select c.customer_id, count(o.order_id) as orders_ct
2 from oe.customers c
3 join oe.orders o
4 on c.customer_id = o.customer_id
5 where c.gender = 'F'
6 group by c.customer_id
7 having count(o.order_id) > 4
8 order by orders_ct, c.customer_id
9 /
CUSTOMER_ID ORDERS_CT
----------- ----------
146 5
147 5
The FROM Clause
The FROM clause lists the source objects from which data are selected. This clause can contain tables, views, materialized views, partitions, or subpartitions, or may specify a subquery that identifies objects. If multiple source objects are used, this logical processing phase also applies to each join type and ON predicate. We explore join types in more detail later, but note that, as joins are processed, they occur in the following order:
In the example query in Listing 1-11, the FROM clause lists two tables: customers and orders. They are joined on the customer_id column. So, when this information is processed, the initial dataset that is produced by the FROM clause includes rows in which customer_id matches in both tables. The result set contains 105 rows at this point. To verify this is true, simply execute only the first four lines of the example query, as shown in Listing 1-12.
Listing 1-12. Partial Query Execution through the FROM Clause Only
SQL> select c.customer_id cust_id, o.order_id ord_id, c.gender
2 from oe.customers c
3 join oe.orders o
4 on c.customer_id = o.customer_id;
CUST_ID ORD_ID G CUST_ID ORD_ID G CUST_ID ORD_ID G
------- ------ - ------- ------ - ------- ------ -
147 2450 F 101 2430 M 109 2394 M
147 2425 F 101 2413 M 116 2453 M
147 2385 F 101 2447 M 116 2428 M
147 2366 F 101 2458 M 116 2369 M
147 2396 F 102 2431 M 116 2436 M
148 2451 M 102 2414 M 117 2456 M
148 2426 M 102 2432 M 117 2429 M
148 2386 M 102 2397 M 117 2370 M
148 2367 M 103 2437 F 117 2446 M
148 2406 M 103 2415 F 118 2457 M
149 2452 M 103 2433 F 118 2371 M
149 2427 M 103 2454 F 120 2373 M
149 2387 M 104 2438 F 121 2374 M
149 2368 M 104 2416 F 122 2375 M
149 2434 M 104 2355 F 123 2376 F
150 2388 M 104 2354 F 141 2377 M
151 2389 M 105 2439 F 143 2380 M
152 2390 M 105 2417 F 144 2445 M
153 2391 M 105 2356 F 144 2422 M
154 2392 F 105 2358 F 144 2382 M
155 2393 M 106 2441 M 144 2363 M
156 2395 F 106 2418 M 144 2435 M
157 2398 M 106 2359 M 145 2448 M
158 2399 M 106 2381 M 145 2423 M
159 2400 M 107 2442 F 145 2383 M
160 2401 M 107 2419 F 145 2364 M
161 2402 M 107 2360 F 145 2455 M
162 2403 M 107 2440 F 119 2372 M
163 2404 M 108 2443 M 142 2378 M
164 2405 M 108 2420 M 146 2449 F
165 2407 M 108 2361 M 146 2424 F
166 2408 F 108 2357 M 146 2384 F
167 2409 M 109 2444 M 146 2365 F
169 2411 F 109 2421 M 146 2379 F
170 2412 M 109 2362 M 168 2410 M
105 rows selected.
Note I formatted the result of this output manually to make it fit nicely on the page. The actual output was displayed over 105 separate lines.
The WHERE Clause
The WHERE clause provides a way to limit conditionally the rows emitted to the query’s final result set. Each condition, or predicate, is entered as a comparison of two values or expressions. The comparison matches (evaluates to TRUE) or does not match (evaluates to FALSE). If the comparison is FALSE, then the row is not included in the final result set.
I need to digress just a bit to cover an important aspect of SQL related to this step. Actually, the possible values of a logical comparison in SQL are TRUE, FALSE, and UNKNOWN. The UNKNOWN value occurs when a null is involved. Nulls compared with anything or nulls used in expressions evaluate to null, or UNKNOWN. A null represents a missing value and can be confusing because of inconsistencies in how nulls are treated within different elements of the SQL language. I address how nulls affect the execution of SQL statements throughout the book, although I mention the topic only briefly at this point. What I stated previously is still basically true; comparisons return either TRUE or FALSE. What you’ll find is that when a null is involved in a filter comparison, it is treated as if it is FALSE.
In our example, there is a single predicate used to limit the result only to females who have placed orders. If you review the intermediate result after the FROM clause was processed (see Listing 1-12), you’ll note that only 31 of the 105 rows were placed by female customers (gender = ‘F’). Therefore, after the WHERE clause is applied, the intermediate result set is reduced from 105 rows to 31 rows.
After the WHERE clause is applied, the detailed result set is ready. Note that I use the phrase detailed result set. What I mean is the rows that satisfy the query requirements are now available. Other clauses may be applied (GROUP BY, HAVING) that aggregate and limit the final result set further that the caller receives, but it is important to note that, at this point, all the data your query needs to compute the final answer are available.
The WHERE clause is intended to restrict, or reduce, the result set. The less restrictions you include, the more data your final result set contains. The more data you need to return, the longer the query takes to execute.
The GROUP BY Clause
The GROUP BY clause aggregates the filtered result set available after processing the FROM and WHERE clauses. The selected rows are grouped by the expression(s) listed in this clause to produce a single row of summary information for each group. You may group by any column of any object listed in the FROM clause, even if you don’t intend to display that column in the list of output columns. Conversely, any nonaggregate column in the select list must be included in the GROUP BY expression.
There are two additional operations that can be included in a GROUP BY clause: ROLLUP and CUBE. The ROLLUP operation is used to produce subtotal values. The CUBE operation is used to produce cross-tabulation values. If you use either of these operations, you get more than one row of summary information. Both these operations are discussed in detail in Chapter 7.
In the example query, the requested grouping is customer_id. This means that there is only one row for each distinct customer_id. Of the 31 rows that represent the females who placed orders that made it through the WHERE clause processing, there are 11 distinct customer_id values, as shown in Listing 1-13.
Listing 1-13. Partial Query Execution through the GROUP BY Clause
SQL> select c.customer_id, count(o.order_id) as orders_ct
2 from oe.customers c
3 join oe.orders o
4 on c.customer_id = o.customer_id
5 where gender = 'F'
6 group by c.customer_id;
CUSTOMER_ID ORDERS_CT
----------- ----------
156 1
123 1
166 1
154 1
169 1
105 4
103 4
107 4
104 4
147 5
146 5
11 rows selected.
Notice that the output from the query, although grouped, is not ordered. The display makes it appear as though the rows are ordered by order_ct, but this is more coincidence and not guaranteed behavior. This is an important item to remember; the GROUP BY clause does not ensure ordering of data. If you want the list to display in a specific order, you have to specify an ORDER BY clause.
The HAVING Clause
The HAVING clause restricts the grouped summary rows to those for which the condition(s) in the clause are TRUE. Unless you include a HAVING clause, all summary rows are returned. The GROUP BY and HAVING clauses are actually interchangeable positionally (it doesn’t matter which one comes first). However, it seems to make more sense to code them with the GROUP BY clause first, because GROUP BY is processed logically first. Essentially, the HAVING clause is a second WHERE clause that is evaluated after GROUP BY occurs, and is used to filter on grouped values. Because the WHERE clause is applied before the GROUP BY occurs, you cannot filter grouped results in the WHERE clause; you must use the HAVING clause instead.
In our example query, the HAVING clause, HAVING COUNT(o.order_id) > 4, limits the grouped result data of 11 rows down to two rows. You can confirm this by reviewing the list of rows returned after GROUP BY is applied, as shown in Listing 1-13. Note that only customers 146 and 147 have placed more than four orders. The two rows that make up the final result set are now ready.
The SELECT List
The SELECT list is where the columns included in the final result set from your query are provided. A column can be an actual column from a table, an expression, or even the result of a SELECT statement, as shown in Listing 1-14.
Listing 1-14. Example Query Showing SELECT List Alternatives
SQL> select c.customer_id, c.cust_first_name||' '||c.cust_last_name,
2 (select e.last_name from hr.employees e where e.employee_id = c.account_mgr_id) acct_mgr)
3 from oe.customers c;
CUSTOMER_ID CUST_NAME ACCT_MGR
--------------- ----------------------------------------- --------------
147 Ishwarya Roberts Russell
148 Gustav Steenburgen Russell
...
931 Buster Edwards Cambrault
981 Daniel Gueney Cambrault
319 rows selected.
When another SELECT statement is used to produce the value of a column, the query must return only one row and one column value. These types of subqueries are referred to as scalar subqueries. Although this can be very useful syntax, keep in mind that the scalar subquery is executed once for each row in the result set. There are optimizations available that may eliminate some duplicate executions of the subquery, but the worst-case scenario is that each row requires this scalar subquery to be executed. Imagine the possible overhead involved if your result set had thousands, or millions, of rows! I review scalar subqueries later in the book and discuss how to use them optimally.
Another option you may need to use in the SELECT list is the DISTINCT clause. The example provided here doesn’t use it, but I want to mention it briefly. The DISTINCT clause causes duplicate rows to be removed from the dataset produced after the other clauses have been processed.
After the SELECT list is processed, you now have the final result set for your query. The only thing that remains to be done, if it is included, is to sort the result set into a desired order.
The ORDER BY Clause
The ORDER BY clause is used to order the final set of rows returned by the statement. In this case, the requested sort order was to be by orders_ct and customer_id. The orders_ct column is the value computed using the COUNT aggregate function in the GROUP BY clause. As shown in Listing 1-13, there are two customers that each placed more than four orders. Because each customer placed five orders, the order_ct is the same, so the second ordering column determines the final display order. As shown in Listing 1-15, the final sorted output of the query is a two-row dataset ordered by customer_id.
Listing 1-15. Example Query Final Output
SQL> select c.customer_id, count(o.order_id) as orders_ct
2 from oe.customers c
3 join oe.orders o
4 on c.customer_id = o.customer_id
5 where c.gender = 'F'
6 group by c.customer_id
7 having count(o.order_id) > 4
8 order by orders_ct, c.customer_id
9 /
CUSTOMER_ID ORDERS_CT
----------- ----------
146 5
147 5
When ordered output is requested, Oracle must take the final set of data after all other clauses have been processed and sort them as specified. The size of the data that needs to be sorted is important. When I say size, I mean total bytes of data in the result set. To estimate the size of the dataset, multiply the number of rows by the number of bytes per row. The bytes per row are determined by summing the average column lengths of each of the columns in the SELECT list.
The example query requests only the customer_id and orders_ct column values in the SELECT list. Let’s use ten as our estimated bytes-per-row value. In Chapter 6, I show you where to find the optimizer’s estimate for this value. So, given that we only have two rows in the result set, the sort size is actually quite small, approximately 20 bytes. Remember, this is only an estimate, but the estimate is an important one.
Small sorts should be accomplished entirely in memory whereas large sorts may have to use temporary disk space to complete the sort. As you may likely deduce, a sort that occurs in memory is faster than a sort that must use disk. Therefore, when the optimizer estimates the effect of sorting data, it has to consider how big the sort is to adjust how to accomplish getting the query result in the most efficient way. In general, consider sorts as a fairly expensive overhead to your query processing time, particularly if the size of your result set is large.
The INSERT statement is used to add rows to a table, partition, or view. Rows can be inserted in either a single-table or multitable method. A single-table insert inserts values into one row of one table either by specifying the values explicitly or by retrieving the values using a subquery. The multitable insert inserts rows into one or more tables and computes the row values it inserts by retrieving the values using a subquery.
Single-Table Inserts
The first example in Listing 1-16 illustrates a single-table insert using the VALUES clause. Each column value is entered explicitly. The column list is optional if you include values for each column defined in the table. However, if you only want to provide values for a subset of the columns, you must specify the column names in the column list. A good practice is to include the column list regardless of whether you specify values for all the columns. Doing so acts to self-document the statement and also helps reduce possible errors that might occur in the future should someone add a new column to the table.
Listing 1-16. Single-Table Insert
SQL> insert into hr.jobs (job_id, job_title, min_salary, max_salary)
2 values ('IT_PM', 'Project Manager', 5000, 11000) ;
1 row created.
SQL> insert into scott.bonus (ename, job, sal)
2 select ename, job, sal * .10
3 from scott.emp;
14 rows created.
The second example in Listing 1-16 illustrates an insert using a subquery, which is a very flexible option for inserting rows. The subquery can be written to return one or more rows. Each row returned is used to supply column values for the new rows to be inserted. The subquery can be as simple or as complex as needed to satisfy your needs. In this example, we use the subquery to compute a 10 percent bonus for each employee based on his or her current salary. The bonus table actually has four columns, but we only populate three of them with this insert. The comm column isn’t populated with a value from the subquery and we do not include it in the column list. Because we don’t include this column, the value for it is null. Note that if the comm column had a NOT NULL constraint, you would get a constraint error and the statement would fail.
Multitable Inserts
The multitable insert example in Listing 1-17 illustrates how rows returned from a single subquery can be used to insert rows into more than one table. We start with three tables: small_customers, medium_customers, and large_customers. Let’s populate these tables with customer data based on the total amount of orders a customer has placed. The subquery sums the order_total column for each customer, and then the insert places a row conditionally in the proper table based on whether the customer is considered to be small (less than $10,000 of total orders), medium (between $10,000 and $99,999.99), or large (greater than or equal to $100,000).
Listing 1-17. Multitable Insert
SQL> select * from small_customers ;
no rows selected
SQL> select * from medium_customers ;
no rows selected
SQL> select * from large_customers ;
no rows selected
SQL> insert all
2 when sum_orders < 10000 then
3 into small_customers
4 when sum_orders >= 10000 and sum_orders < 100000 then
5 into medium_customers
6 else
7 into large_customers
8 select customer_id, sum(order_total) sum_orders
9 from oe.orders
10 group by customer_id ;
47 rows created.
SQL> select * from small_customers ;
CUSTOMER_ID SUM_ORDERS
----------- ----------
120 416
121 4797
152 7616.8
157 7110.3
160 969.2
161 600
162 220
163 510
164 1233
165 2519
166 309
167 48
12 rows selected.
SQL> select * from medium_customers ;
CUSTOMER_ID SUM_ORDERS
----------- ----------
102 69211.4
103 20591.4
105 61376.5
106 36199.5
116 32307
119 16447.2
123 11006.2
141 38017.8
142 25691.3
143 27132.6
145 71717.9
146 88462.6
151 17620
153 48070.6
154 26632
155 23431.9
156 68501
158 25270.3
159 69286.4
168 45175
169 15760.5
170 66816
22 rows selected.
SQL> select * from large_customers ;
CUSTOMER_ID SUM_ORDERS
----------- ----------
101 190395.1
104 146605.5
107 155613.2
108 213399.7
109 265255.6
117 157808.7
118 100991.8
122 103834.4
144 160284.6
147 371278.2
148 185700.5
149 403119.7
150 282694.3
13 rows selected.
Note the use of the ALL clause after the INSERT keyword. When ALL is specified, the statement performs unconditional multitable inserts, which means that each WHEN clause is evaluated for each row returned by the subquery regardless of the outcome of a previous condition. Therefore, you need to be careful about how you specify each condition. For example, if I had used WHEN sum_orders < 100000 instead of the range I specified, the medium_customers table would have included the rows that were also inserted into small_customers.
Specify the FIRST option to cause each WHEN to be evaluated in the order it appears in the statement and to skip subsequent WHEN clause evaluations for a given subquery row. The key is to remember which option, ALL or FIRST, best meets your needs, then use the one most suitable.
The UPDATE Statement
The UPDATE statement is used to change the column values of existing rows in a table. The syntax for this statement is composed of three parts: UPDATE, SET, and WHERE. The UPDATE clause specifies the table to update. The SET clause specifies which columns are changed and the modified values. The WHERE clause is used to filter conditionally which rows are updated. This clause is optional; if it is omitted, the update operation is applied to all rows of the specified table.
Listing 1-18 demonstrates several different ways an UPDATE statement can be written. First, I create a duplicate of the employees table called employees2, then I execute several different updates that accomplish basically the same task: the employees in department 90 are updated to have a 10 percent salary increase and, in the case of example 5, the commission_pct column is also updated. The following list includes the different approaches taken:
Listing 1-18. UPDATE Statement Examples
SQL> -- create a duplicate employees table
SQL> create table employees2 as select * from employees ;
Table created.
SQL> -- add a primary key
SQL> alter table employees2
1 add constraint emp2_emp_id_pk primary key (employee_id) ;
Table altered.
SQL> -- retrieve list of employees in department 90
SQL> select employee_id, last_name, salary
2 from employees where department_id = 90 ;
EMPLOYEE_ID LAST_NAME SALARY
--------------- ------------------------- ---------------
100 King 24000
101 Kochhar 17000
102 De Haan 17000
3 rows selected.
SQL> -- Example 1: Update a single column value using an expression
SQL> update employees2
2 set salary = salary * 1.10 -- increase salary by 10%
3 where department_id = 90 ;
3 rows updated.
SQL> commit ;
Commit complete.
SQL> select employee_id, last_name, salary
2 from employees2 where department_id = 90 ;
EMPLOYEE_ID LAST_NAME SALARY
----------- ---------- ------
100 King 26400 -- previous value 24000
101 Kochhar 18700 -- previous value 17000
102 De Haan 18700 -- previous value 17000
3 rows selected.
SQL> -- Example 2: Update a single column value using a subquery
SQL> update employees
2 set salary = (select employees2.salary
3 from employees2
4 where employees2.employee_id = employees.employee_id
5 and employees.salary != employees2.salary)
6 where department_id = 90 ;
3 rows updated.
SQL> select employee_id, last_name, salary
2 from employees where department_id = 90 ;
EMPLOYEE_ID LAST_NAME SALARY
--------------- ------------------------- ---------------
100 King 26400
101 Kochhar 18700
102 De Haan 18700
3 rows selected.
SQL> rollback ;
Rollback complete.
SQL> -- Example 3: Update single column using subquery in
SQL> -- WHERE clause to determine which rows to update
SQL> update employees
2 set salary = salary * 1.10
3 where department_id in (select department_id
4 from departments
5 where department_name = 'Executive') ;
3 rows updated.
SQL> select employee_id, last_name, salary
2 from employees
3 where department_id in (select department_id
4 from departments
5 where department_name = 'Executive') ;
EMPLOYEE_ID LAST_NAME SALARY
--------------- ------------------------- ---------------
100 King 26400
101 Kochhar 18700
102 De Haan 18700
3 rows selected.
SQL> rollback ;
Rollback complete.
SQL> -- Example 4: Update a table using a SELECT statement
SQL> -- to define the table and column values
SQL> update (select e1.salary, e2.salary new_sal
2 from employees e1, employees2 e2
3 where e1.employee_id = e2.employee_id
4 and e1.department_id = 90)
5 set salary = new_sal;
3 rows updated.
SQL> select employee_id, last_name, salary, commission_pct
2 from employees where department_id = 90 ;
EMPLOYEE_ID LAST_NAME SALARY COMMISSION_PCT
--------------- ------------------------- --------------- ---------------
100 King 26400
101 Kochhar 18700
102 De Haan 18700
3 rows selected.
SQL> rollback ;
Rollback complete.
SQL> -- Example 5: Update multiple columns using a subquery
SQL> update employees
2 set (salary, commission_pct) = (select employees2.salary, .10 comm_pct
3 from employees2
4 where employees2.employee_id = employees.employee_id
5 and employees.salary != employees2.salary)
6 where department_id = 90 ;
3 rows updated.
SQL> select employee_id, last_name, salary, commission_pct
2 from employees where department_id = 90 ;
EMPLOYEE_ID LAST_NAME SALARY COMMISSION_PCT
--------------- ------------------------- --------------- ---------------
100 King 26400 .1
101 Kochhar 18700 .1
102 De Haan 18700 .1
3 rows selected.
SQL> rollback ;
Rollback complete.
SQL>
The DELETE Statement
The DELETE statement is used to remove rows from a table. The syntax for this statement is composed of three parts: DELETE, FROM, and WHERE. The DELETE keyword stands alone. Unless you decide to use a hint, which we examine later, there are no other options associated with the DELETE keyword. The FROM clause identifies the table from which rows are to be deleted. As the examples in Listing 1-19 demonstrate, the table can be specified directly or via a subquery. The WHERE clause provides any filter conditions to help determine which rows are deleted. If the WHERE clause is omitted, the DELETE operation deletes all rows in the specified table.
Listing 1-19. DELETE Statement Examples
SQL> select employee_id, department_id, last_name, salary
2 from employees2
3 where department_id = 90;
EMPLOYEE_ID DEPARTMENT_ID LAST_NAME SALARY
--------------- --------------- ------------------------- ---------------
100 90 King 26400
101 90 Kochhar 18700
102 90 De Haan 18700
3 rows selected.
SQL> -- Example 1: Delete rows from specified table using
SQL> -- a filter condition in the WHERE clause
SQL> delete from employees2
2 where department_id = 90;
3 rows deleted.
SQL> select employee_id, department_id, last_name, salary
2 from employees2
3 where department_id = 90;
no rows selected
SQL> rollback;
Rollback complete.
SQL> select employee_id, department_id, last_name, salary
2 from employees2
3 where department_id = 90;
EMPLOYEE_ID DEPARTMENT_ID LAST_NAME SALARY
--------------- --------------- ------------------------- ---------------
100 90 King 26400
101 90 Kochhar 18700
102 90 De Haan 18700
3 rows selected.
SQL> -- Example 2: Delete rows using a subquery in the FROM clause
SQL> delete from (select * from employees2 where department_id = 90);
3 rows deleted.
SQL> select employee_id, department_id, last_name, salary
2 from employees2
3 where department_id = 90;
no rows selected
SQL> rollback;
Rollback complete.
SQL> select employee_id, department_id, last_name, salary
2 from employees2
3 where department_id = 90;
EMPLOYEE_ID DEPARTMENT_ID LAST_NAME SALARY
--------------- --------------- ------------------------- ---------------
100 90 King 26400
101 90 Kochhar 18700
102 90 De Haan 18700
3 rows selected.
SQL> -- Example 3: Delete rows from specified table using
SQL> -- a subquery in the WHERE clause
SQL> delete from employees2
2 where department_id in (select department_id
3 from departments
4 where department_name = 'Executive'),
3 rows deleted.
SQL> select employee_id, department_id, last_name, salary
2 from employees2
3 where department_id = 90;
no rows selected
SQL> rollback;
Rollback complete.
SQL>
Listing 1-19 demonstrates several different ways a DELETE statement can be written. Note that I am using the employees2 table created in Listing 1-18 for these examples. The following are the different delete methods that you can use:
The MERGE Statement
The MERGE statement is a single command that combines the ability to update or insert rows into a table by deriving conditionally the rows to be updated or inserted from one or more sources. It is used most frequently in data warehouses to move large amounts of data, but its use is not limited only to data warehouse environments. The big value-add this statement provides is that you have a convenient way to combine multiple operations into one, which allows you to avoid issuing multiple INSERT, UPDATE, and DELETE statements. And, as you’ll see later in the book, if you can avoid doing work you really don’t have to do, your response times will likely improve.
The syntax for the MERGE statement is as follows:
MERGE <hint>
INTO <table_name>
USING <table_view_or_query>
ON (<condition>)
WHEN MATCHED THEN <update_clause>
DELETE <where_clause>
WHEN NOT MATCHED THEN <insert_clause>
[LOG ERRORS <log_errors_clause> <reject limit <integer | unlimited>];
To demonstrate the use of the MERGE statement, Listing 1-20 shows how to create a test table and then insert or update rows appropriately into that table based on the MERGE conditions.
Listing 1-20. MERGE Statement Example
SQL> create table dept60_bonuses
2 (employee_id number
3 ,bonus_amt number);
Table created.
SQL> insert into dept60_bonuses values (103, 0);
1 row created.
SQL> insert into dept60_bonuses values (104, 100);
1 row created.
SQL> insert into dept60_bonuses values (105, 0);
1 row created.
SQL> commit;
Commit complete.
SQL> select employee_id, last_name, salary
2 from employees
3 where department_id = 60 ;
EMPLOYEE_ID LAST_NAME SALARY
--------------- ------------------------- ---------------
103 Hunold 9000
104 Ernst 6000
105 Austin 4800
106 Pataballa 4800
107 Lorentz 4200
5 rows selected.
SQL> select * from dept60_bonuses;
EMPLOYEE_ID BONUS_AMT
--------------- ---------------
103 0
104 100
105 0
3 rows selected.
SQL> merge into dept60_bonuses b
2 using (
3 select employee_id, salary, department_id
4 from employees
5 where department_id = 60) e
6 on (b.employee_id = e.employee_id)
7 when matched then
8 update set b.bonus_amt = e.salary * 0.2
9 where b.bonus_amt = 0
10 delete where (e.salary > 7500)
11 when not matched then
12 insert (b.employee_id, b.bonus_amt)
13 values (e.employee_id, e.salary * 0.1)
14 where (e.salary < 7500);
4 rows merged.
SQL> select * from dept60_bonuses;
EMPLOYEE_ID BONUS_AMT
--------------- ---------------
104 100
105 960
106 480
107 420
4 rows selected.
SQL> rollback;
Rollback complete.
SQL>
The MERGE accomplished the following:
Without the MERGE statement, you would have had to write at least three different statements to complete the same work.
Summary
As you can tell from the examples shown so far, the SQL language offers many alternatives that can produce the same result set. What you may have also noticed is that each of the five core statements can use similar constructs, such as subqueries. The key is to learn which constructs are the most efficient under various circumstances. We look at how to do this later.
If you had any trouble following the examples in this chapter, make sure to take the time to review either Beginning Oracle SQL (mentioned earlier) or The SQL Reference Guide in the Oracle documentation. The rest of this book assumes you are comfortable with the basic constructs for each of the five core SQL statements: SELECT, INSERT, UPDATE, DELETE, and MERGE.
3.138.106.233