Chapter 26. Relational Databases and SQL

Relational Databases and SQL

In the first half of this chapter we’ll look at the most widely used kind of database, the relational database. We’ll examine the model for using it, and we’ll introduce some of the special terms that database experts use. We will work through a couple of small examples to show the techniques that apply equally well to much larger databases. The second half of the chapter is a primer on SQL, the specialized programming language used to talk to a database and update information in it. We’ll look at the way you create database tables and populate them with data. Then we’ll describe the SQL statements to extract, update, and remove information from a database. SQL has its own data types and operators, presented here. All this will prepare us to use JDBC, the Java library that supports access to databases, in the following chapter. If you are already well versed in databases and SQL, you can safely go straight on to the next chapter now.

The JDBC library is one of the most important Java libraries, right up there with the XML library and the servlet library. JDBC solves a problem that has previously been a drawback to all other database programming approaches, namely, platform lock-in. When you write your database access code in Java using JDBC, your database can easily be moved to another database vendor and another OS environment. If you outgrow the capacities of either of these, it’s straightforward to trade up to a more capable platform with minimal or no changes to your software. If you want to change database vendors for any reason, your software is easily portable to the new environment.

Database programming is a central part of modern enterprise software systems. All professional programmers should understand the basics. This chapter gives you a solid grounding in programming database queries in SQL. JDBC uses SQL to do its work. We focus on SQL here and defer the Java part to the next chapter.

Introduction to Relational Databases

A database is a structured collection of data. It may be anything from a simple list of members of a sports club, to the inventory of a store, or the huge amounts of information in a corporate network. To retrieve and update data stored in a computer database, you need database management software. There are several approaches to database software architecture: relational, hierarchical, and Codasyl network. Here, we’ll focus on relational databases, which have become the most widely used approach by far. They are simpler to understand, to implement, and to program.

Like so many of the best ideas in computer science (e.g., TCP/IP, HTTP, XML, ethernet, sockets, or the JVM), relational databases are based on a simple fundamental concept. The idea behind relational databases is that you can organize your data into tables. Other approaches to databases have tried to keep everything in one big repository. Having individual tables that keep the related pieces of data together simplifies the design and programming. It also adds speed and flexibility to the implementation.

What specifically do we mean by a “table”? We mean the data can be represented in tabular form with columns that all contain values of one type, and where a row holds the related data on one thing (one customer, one account, one order, one product, etc.). The table format is a logical, not a physical, organization. Under the covers, the database management software will typically store the data in indexed files and cache it in memory in tree structures when the program is running. But it will always present the appearance of tables to your programs.

Table 26-1 shows some (fictional) people, their age, favorite bands, and where they live.

Table 26-1. The “People and Music” Table

Name

Age

Lives in

Listens To

Robert Bellamy

24

England

Beatles

Robert Bellamy

24

England

Abba

Robert Bellamy

24

England

Oasis

Judith Brown

34

Africa

Muddy Ibe

Judith Brown

34

Africa

Abba

Butch Fad

53

USA

Metallica

Timothy French

24

Africa

Oasis

Timothy French

24

Africa

Nirvana

Grayham Downer

59

Africa

Beatles

There are some special database terms that go with tables. A single row of data is known as a tuple, or more commonly a record or row. It’s effectively a set of data that belong together in some way. In our first table, the record (Robert Bellamy, 24, England, Beatles) is saying that the age, place, and music preferences are those of Robert Bellamy. The second record stores another music preference for Robert: he also listens to Abba. The name of the data in a column is called an attribute. Age is an attribute here. An attribute is the term for an individual field of a record.

A domain is the set of allowable values for an attribute. An example domain for the age attribute could be “integers between 0 and 125.” The domain is a constraint on the values of the attribute. A relation is (informally) a table with columns and rows. The number of attributes in a relation (i.e., how many fields a record has) is called its degree. The number of tuples in a relation (i.e., the number of data records you have in it) is called its cardinality. If math scares you, you can forget this paragraph, but it’s how database maestros talk to each other at conferences. We’ll prefer the more widely used terms of record, column, etc. from this point.

We have special terms for all these things because they are mathematical concepts. Relational databases are based on the mathematical concepts known as set theory and predicate logic. People often think that the “relational” part of the name comes because we store related things together. Actually, it is because the architecture uses mathematical relations that say how one group of data is associated with another. There is a formal underpinning to our data manipulation, and it can be proved that certain operations will yield the correct result, are equivalent to some longer operation, and so on. To make certain that our databases remain true to the mathematics, these qualities are required to be true at all times:

  • Each table has a different name from all the others in the database.

  • Each column has a different name from all the others in the database.

  • Each row has a different value from all the others in the table. There is no duplicate data in a table.

  • Each “cell” (attribute) in a table contains exactly one value.

  • The order in which the rows and columns appear has no significance. To make order be significant, you create a new column to relate two rows instead.

  • Values of a column are all from the same domain, i.e., if a column starts off representing age, it doesn’t suddenly change into salary halfway through a table.

Referring to Table 26-1, say we want to find all the people who live in Africa and listen to Abba. We simply look at each record and compare the “Lives in” and “Listens to” attributes, printing out the ones that match “Africa” and “Abba,” respectively. It works well for this query, but there are some big disadvantages to storing the data all in one table. If someone moves out of the country, we have to find every record in which their name appears and update it. As we are going through the database trying to update records, we have to lock out other read attempts to stop them seeing inconsistent data. Because so much data is duplicated, our storage needs will be bigger, and all programs which run against the database will take longer. The amount of data here is very small, but keep in mind that all the sizes scale up. A company could easily have a database containing hundreds of tables, some of which have millions of rows.

The recommended approach to designing databases is to try to minimize the amount of data duplication. You try to have one relation for each kind of entity (customer, employee, order, cd catalog, shipment, etc.) and store in there only the attributes directly associated with that kind of entity, not every possible attribute it has. In this case, we will probably create a couple of tables, such as Table 26-2.

Table 26-2. The “Person” Table

Name

Age

Lives in

Robert Bellamy

24

England

Grayham Downer

59

Africa

Timothy French

24

Africa

Butch Fad

53

USA

Judith Brown

34

Africa

In Table 26-3, we store attributes called “Name” and “Music Group Name.” There, each record represents a person and a group that they listen to regularly.

Table 26-3. The “Listens To” Table

Name Music

Group Name

Robert Bellamy

Beatles

Robert Bellamy

Abba

Robert Bellamy

Oasis

Butch Fad

Metallica

Judith Brown

Muddy Ibe

Judith Brown

Abba

Timothy French

Oasis

Timothy French

Nirvana

Grayham Downer

Beatles

Do you see what we have done? We have “factored out” the common data of name, age, and lives in into one table, leaving name and music group name in Table 26-3. Now when Robert Bellamy moves to the USA, that information only needs to be updated in exactly one record. The data in “Listens To” is related to the data in “Person” by matching the name attribute. If Robert stops listening to Abba, we can delete that row without also dropping Robert, his age, and his country from the database.

We will be using these tables with this data for the rest of the chapter, so you may want to turn the corner of the page down so you can easily refer back to this section. Now we dive into some low-level details for a few paragraphs. You need these terms to understand how to use SQL, so don’t go on until you have understood it.

Primary and Foreign Keys

Every table must and will have an attribute (or group of attributes together) that uniquely identifies a record. This attribute or group of attributes is called the primary key to the table. By “uniquely identify a record,” we mean that no other record has the same value for that attribute or group of attributes. The primary key to our Person table is the “Name” attribute. We can never allow two different people to have the same name in this small database, although that is an unrealistic restriction in real life. That’s why banks and other agencies identify you by social security number or an account number, which is guaranteed to be unique.

In our “Listens To” table, we need both attributes to uniquely identify a record. People who like two bands are in there once for each band, so name is not unique. And since several people can listen to the same band, music group names are duplicated too. But the combination of person name plus music group name is unique. So the primary key to our “Listens To” table is both these attributes.

As well as primary keys, many tables contain foreign keys. These are attributes in one table that are a primary key in some other table. The “Listens To” table has a foreign key of Name, which is the primary key for Person. The Person table does not contain any foreign keys. Although it has the Name attribute, that is only part of the key for the “Listens To” table, not the whole key. It’s called a “foreign” key because it is not a key in this table, but a table in some other place.

When you have a value of a foreign key, for instance, “Judith Brown” in the “Listens To” table, that value must also occur in the table for which it is the primary key. In other words, there must be a “Judith Brown” entry in the Person table. In fact, the purpose of keys is to be able to get to related data in other tables. Keys are how we navigate through the database. When the foreign key existence requirement is met, then the database is said to have referential integrity. You keep referential integrity in a database by being careful about the data you remove. If you drop a customer account because of lack of activity, you must also drop all references to that customer in all tables in your database. The onus is on the programmer to keep referential integrity; the database cannot do it for you. If your database lacks referential integrity, you’ll get funny results when you try to extract data from more than one table together.

There are two other forms of integrity that databases need: entity integrity and database integrity. If you don’t have a value for some attribute, perhaps because you are still acquiring data for that table, there is a special value called “null” that can be assigned. Null doesn’t mean zero. It means “no value has yet been assigned.” An expression involving null evaluates to null. We could use null in the age column, when someone does not wish to give us their age. We must never use null in any column that is part of a primary or foreign key. This is referred to as entity integrity. The entity integrity rule ensures that all our keys are always valid keys.

To understand database integrity think of an update to a database that moves money from one account to another. There will probably be a relation for each of the different accounts. You need to write a statement that deducts the money from the first account, and a second statement that adds that sum to the second account. You want to be absolutely sure that either both statements are executed or neither of them are. You never want to be in a situation where the money was deducted but not paid in to the second account. If you have a way to group statements and ensure that the whole transaction either occurs or does not occur, you can maintain database integrity.

1-Many, Many-Many Relationships

Let’s say more about the relationship between a foreign key and the table where it is a primary key. That relationship can be 1-to-1, 1-to-many, or many-to-many. A 1-1 relationship says that the two things are matched exactly (it’s a special case of a 1-many relationship, too). The relationship between ship and captain is 1-1. Each ship has one captain, and each captain has one ship (ignoring real world details like captains waiting for a command, etc.). As you might guess, a 1-many relationship means that one thing in this table corresponds to potentially many things in that table. Each individual ship has multiple sailors, so the ship/crew member relationship is 1-many. All the sailors on a given ship will have the same value for the “is a crewmember of” attribute. The “1” side of a 1-many relationship will be a primary key, as shown in Figure 26-1. The ship’s name would be a primary key in the table of a shipping line’s fleet.

Breaking up a many-many relationship: “Many people listen to many bands.”

Figure 26-1. Breaking up a many-many relationship: “Many people listen to many bands.”

Many-many relationships occur when multiple records in the first table are somehow related to multiple records in the second table. We can see what this means if we introduce a “MusicGroup” table that is a list of bands. We could store any band-specific information in it too, such as the “land of origin” for each music group. Take, for example, Table 26-4.

Table 26-4. The “MusicGroup” Table

Music Group Name

Land of Origin

Beatles

England

Abba

Sweden

Oasis

England

Metallica

USA

Muddy Ibe

Africa

Nirvana

USA

Now our database has tables that hold a many-many relationship. Some people listen to several bands, and other bands are listened to by several people. That’s a many-to-many relationship between the Person and MusicGroup tables. Many-to-many relationships can’t be processed directly in relational databases (though it’s clearly possible to create the table). The reason is that a primary key can only link tables on a one-to-many basis. Unless you take other steps, the restriction on many-many means we cannot make direct queries based on band name (e.g., “who listens to a given band?”). That may be acceptable if you never want to make that kind of query, but you want to build flexibility into your designs, not rule it out.

Stating the many-many limitation in terms of Java code, you can think of a primary key as being like the index variable in a Java “for” loop. It lets you process the whole table without missing any rows out, or considering any primary keys twice. Many-many would be like resetting the index variable several times in the looping. Luckily, there is an easy way to get over the restriction that many-many relationships can’t be processed directly. You resolve many-many relationships by adding a new table that can express the relationship in terms of two 1-many tables.

We simply need a new table that, for each band, has a record for each person who listens to it. That new table is on the “many” end of a 1-many relationship with the MusicGroup table. The new table must also have for each person, a record for each band they listen to. So the new table will also be on the “many” end of a 1-many relationship with the Person table. And that’s an exact description of our existing ListensTo table! (Of course, the design was chosen with this in mind). Each name there is related to several music group names, and each music group is related to several names. You can see that Robert is associated with the Beatles, Abba, and Oasis, while Oasis is associated with Timothy and Robert.

Using the ListensTo table we represent the many-many relationship between MusicGroup and Person. We can now do the SQL equivalent of “for each MusicGroupName in MusicGroup table, find the matching MusicGroupName in the ListensTo table, and print out the person name.” This new table allows us to do queries by band. To summarize, we resolve many-many relationships in a relational database by decomposing them into two 1-many relationships, adding a new table as needed.

Normal Forms

Tables need to follow a set of numbered rules known as “normal form.” First normal form says that all attributes must be atomic. That means there can be no lists of items in an attribute. You can’t have an attribute that is “contents of a shopping cart,” because that could contain several items. Atomic means “only one, and it cannot be subdivided any further.”

Second normal form says that it is in first normal form and every non-key attribute depends fully on the key. The Person table has non-key attributes of “age” and “lives in” and both of these are completely dependent on the person we are identifying by the name attribute. However, if we added a column to the table to store, say, the land of origin of the band, we would be breaking second normal form. The band’s land does not depend on the primary key (the person name).

A table is in third normal form if it is in second normal form and all non-key columns are mutually independent. In the Person table, the non-key columns are “age” and “lives in.” These are mutually independent because they can change without affecting the other. If we were to add a column to store “is a minor” data in Person, the table would no longer be in third normal form. Whether or not someone is a minor depends on their age, which is another attribute in the Person relation.

If you find your table designs break a normal form rule, you can always fix it by adding an extra table as we did above to resolve a many-many relationship. You may need to move columns from one table to another or to add an identification number to a couple of tables to relate records between them. Raw data can always be put into third normal form. There are additional normal forms beyond this, but third normal form is enough for most purposes. If you make sure all your tables are in third normal form, you will be able to use relational database operations on them and get the right results. A database that has been designed to be in third normal form is said to be “normalized.” There’s a great memory aid for the form in which you want your tables: the data in a table has to depend on the key, the whole key, and nothing but the key.

Relational Database Glossary

Table 26-5 is a glossary of terms that you can review and refer to as necessary. There are a few terms in here that appear later in the chapter.

Table 26-5. Relational Database Glossary

Name

Description

attributea

column in a table, e.g., the “Name” attribute

cardinality

The number of tuples in a relation

database integrity

You maintain database integrity by grouping statements in a “transaction” that is either executed as a whole, or has no part of it executed

degree

The number of attributes in a relation

domain

The set of permissible values for an attribute

entity integrity

The requirement that key attributes never contain a null

first normal form

A table in which all attributes consist of exactly one data item

foreign key

The attribute or group of attributes in one table that form a primary key for some other table

join

An operation that combines the data in two or more tables by using foreign keys in the first table to access related data in a subsequent table. This is an “inner join” or “equijoin”

prepared statement

An SQL statement which is cached in native code form to allow faster processing

primary key

The attribute or group of attributes that together uniquely identify a record in a table

referential integrity

The requirement that all foreign keys are present in the table where they are a primary key

relation

A table in a relational database. It corresponds to a file of records

relational database

A collection of relations in third normal form

second normal form

A table is in second normal from when it is in first normal form and also every non-key attribute depends on the primary key

stored procedure

A group of related SQL statements which are kept in precompiled form, and can be invoked just like a method call

third normal form

A table is in third normal form when it is in second normal form and also all non-key attributes are independent of each other

trigger

A trigger is an SQL statement that is stored in the database, and that executes automatically when a specified event occurs in the database, such as a column update. It will usually be some kind of automatic delete, insert, or update of some other attribute

tuple

A tuple is a row of related data in a table. It is essentially a record of related data

Once we have our tables, SQL is the language we use to access and process them. The JDBC library is a way of connecting to a database, shipping it SQL statements, and getting back the results in a form that Java can process. In this section we will present a primer on SQL, showing the highlights of the language for those who have never seen it before. If you are already familiar with SQL, you can safely skip this section.

At first, every database vendor had its own special database query language. Users eventually got fed up enough to create an industry standard around IBM’s SQL. There was the SQL’89 standard, followed by the SQL’92 standard, both created under the umbrella of ANSI (American National Standards Institute). SQL version 3 was published in 1999, and is known as “SQL:1999” or SQL-3. SQL is also a FIPS standard, FIPS PUB 127-2. FIPS is a Federal Information Processing Standard issued with the full weight and authority of the U.S. Government, after approval by the Secretary of Commerce. In practice, SQL is fragmented with many slightly incompatible dialects from database vendors. In this chapter we keep to the current ANSI standard and do not present any vendor-specific code. You should follow the same practice in your programs too.

Basic SQL Primer

SQL is an abbreviation for “Structured Query Language” and is a programming language in its own right. It’s usually pronounced like the word “sequel” or spelled out as individual letters s-q-l. SQL is specialized for its application area, and is not used for general purpose programming. But all of the operations that you are likely to want to do to a database are built-in functions in SQL. One attractive feature of the language is that you express what you want to do in English-like text such as this. By convention, the SQL keywords are written in uppercase. They need not be.

SELECT name FROM Person
     WHERE lives_in = 'Africa'
     ORDER BY name; 

In other words, you describe the results you want, not the steps to carry out to get them. This style of programming is known as “functional programming” and it contrasts with the “procedural programming” of more familiar languages like Java. Because you don’t give the steps to get what you want, database implementors are free to find the most efficient way to get it. The big database companies put a lot of effort into their query optimizers, and this is one of the big advantages of SQL over earlier query languages.

The designers of SQL could have chosen to make programmers express the operations in terms of mathematical formulae or algebra, instead of words. That would make programs harder to read for many people and raise an unnecessary barrier to learning and teaching. Executing the above SQL statement on our Person table would yield a result set of:

Grayham Downer

Judith Brown

Timothy French

So far we have outlined the way a relational database stores data and extracts it from a single table. The power of the technology comes from the flexible way you can extract and combine data from several tables to create new tables. It’s a little contrived to show this in a small example, so keep in mind that this works equally well on the huge datasets common in industry, and the benefits are proportionately larger.

There are four categories of SQL statement:

  • CREATE and INSERT to create tables and put records into them

  • SELECT to query the database and get back data that matches your criteria

  • UPDATE to change the values in existing records

  • DELETE and DROP to remove records and tables from the database

There is a surprisingly rich variety of options that can be added to these statements, allowing a large amount of work to be done with a few simple English phrases. JDBC issues SQL commands by putting them in a String, and passing that String to various methods in the JDBC library. So to program in JDBC we need to know what the SQL phrases look like. Or to put it another way, you don’t have to learn another database language if you know SQL. So why bother with JDBC and Java at all? Because you want to do something with the data you pull out of the database: send it to a client, wrap an email around it, mark it up with XML, and so on. Java is frequently the best way to do that something.

These SQL statements are powerful, but they seem to have been designed in a way that makes it very hard to present them in an easy-to-read format! So they are shown here in terms of a template, which is annotated on the right-hand side with some additional remarks. If you try to show the formal grammar for SQL, it explodes in size and gets in the way of clarity. Even so, it’s a bit of work to show the four different kinds of statement. You may want to make one quick pass through the remainder of this section, and then come back to it when you need specific information.

Creating and Populating Tables

The CREATE statement is used to create a new table, and the INSERT statement is used to add a new record to a table.

The CREATE statement has this general format:

Format of SQL CREATE Statement

Additional Information

CREATE TABLE tablename(
     colName  dataType
optionalConstraint
); 

<— can repeat this line, separated by commas

Here is an example of the use of the CREATE statement:

Example of SQL CREATE Statement

Additional Information

CREATE TABLE Person (
       name      VARCHAR(100) PRIMARY KEY,
       age       INTEGER,
       lives_in  VARCHAR(100)
); 

“PRIMARY KEY” is a constraint

This statement will create the Person table that we saw earlier in the chapter. It will have three columns called “name,” “age,” and “lives in.” The “optionalConstraint” above means that you can add or omit a constraint to a column, giving more information about what kind of values are legal there. We have added a constraint to the “name” column, saying that this is the primary key of the table. That has the effect of making sure that records always have a non-null unique value there when the records are inserted into the table. “Not null” and “unique” are also constraints that can be applied individually.

Some datatypes that SQL understands and the corresponding Java types are shown in Table 26-6. The SQL keywords and datatypes can use any letter case. The convention is to write them all in uppercase.

Table 26-6. Some SQL Datatypes

SQL Datatype

Corresponding Java Type

CHAR(n )

String, exactly n chars

VARCHAR(n )

String, up to n chars

INTEGER or INT

int

DOUBLE

double

DATE

java.sql.Date

TIMESTAMP

java.sql.Timestamp

BLOB

java.sql.Blob

ARRAY

java.sql.Array

DECIMAL, NUMERIC

java.math.BigDecimal

The BLOB datatype means “Binary Large Object.” There are also CLOBs, “Character Large Objects.” SQL arrays are sequences of data. Think back to earlier in the chapter where we said that first normal form means that there are no lists of items in an attribute. There’s a question of where you draw the line though. If you are recording student marks over 12 weeks of homework assignments, it’s overkill to store each mark in a new record. Instead, we’ll have a student/course marks record, and the course marks will be held as a fixed length array that gets updated with a new mark each week. BLOBs, CLOBs, and arrays are represented by classes in the java.sql package, and they have methods to get their values.

The CREATE statement just creates an empty table. As yet it has no records in it. We will put records in using the INSERT statement, which has this general appearance:

Format of SQL INSERT Statement

Additional Information

INSERT INTO tablename
   (colName1  ,colName2  ,colName3 ...)
   VALUES
   (value1    ,value2    ,value3 ...)
; 

<— can provide a value for all attributes or just some

<— can repeat this line, separated by commas these values are inserted into the attributes listed

Here is an example of the use of the INSERT statement:

Examples of SQL INSERT Statement

Additional Information

INSERT INTO Person ( name, age, lives in )
   VALUES ('Robert Bellamy', 24, 'England' ),
              ( 'Grayham Downer', null, 'Africa' ),
              ( 'Judith Brown', 34, 'Africa' ); 

Note the use of single quotes to surround a String

Downer doesn't want to give his age

This statement will start to populate (fill in with data) the Person table that we saw earlier in the chapter. The values are inserted into the record in the order in which the attributes are named. The number of values given in each of the value lists should match the number of attributes in the list before the “values” keyword. SQL is very picky about the requirement that character strings be enclosed in single quotes.

Note that some database vendors have not implemented support for inserting multiple rows with one statement. So to retain maximum portability you would want to restrict yourself to adding one record per insert statement.

Querying and Retrieving Data

The SELECT statement is used to query a database and get back the data that matches your query.

The SELECT statement has this general format:

Format of SQL SELECT Statement

Additional Information

SELECT
   Name1  ,Name2  ,Name3 ...
FROM
   tablename1, tableName2, ...
WHERE
      conditions
ORDER BY  colNames
; 

<— can mention one or more columns, or “*” for all columns

<— can mention one or more tables

<— the “WHERE” clause is optional and can be omitted

<— the “ORDER BY” clause is optional and can be omitted

<— It returns the data sorted by this field

We have already seen an example of a basic select from a single table. The power of the statement arises when you select from two or more tables at once. So to find all the people in Africa in our database who listen to the Beatles or the band Fela Kuti, we could use the SQL command shown below. Numbers have been added on the left to help with commenting on the code; these will not appear in actual SQL code.

1    SELECT Person.name, Person.lives_in, ListensTo.music_group_name
2    FROM Person, ListensTo
3    WHERE ListensTo.music_group_name IN ( 'Fela Kuti', 'Beatles' )
4    AND Person.name = ListensTo.person_name
5    AND Person.lives in = 'Africa' ; 

Going through the statement line by line, we can make the following observations:

Line 1 gives the columns that we want to get back in our answer. Notice that the table name can be used to qualify the column so that there is no ambiguity.

Line 2 gives the names of the tables that we will be running the query on.

Line 3 starts our “where” clause. It says which data values or rows will be returned as the answer, based on matching the criteria that follow. The first criterion is that the music_group_name must be one of those in the list given. Notice the way you can compare against a list of items in parentheses.

Line 4 adds another condition. It says that whenever we have found one of those two bands, we look for the same name in the Person table.

Line 5 is the final part of the condition. It says that the “lives in” field for that person should hold the value “Africa.” Voila, we are done. Running the query produces the output:

Grayham Downer, Africa, Fela Kuti
Grayham Downer, Africa, Beatles
Judith Brown, Africa, Fela Kuti 

Grayham Downer appears in the list twice because he matches the criteria twice. He listens to both the target bands. If you were sending out promotional mail based on this query, you would want to ensure that you did not send two mails to him. Database inquiries frequently have results that may seem surprising if you are not familiar with set theory. The keyword “DISTINCT” after “SELECT” will eliminate duplicate records from being returned to you. If you sent the SQL command:

SELECT DISTINCT Person.name
    FROM Person, ListensTo
    WHERE ListensTo.music_group_name IN ( 'Fela Kuti', 'Beatles' )
    AND Person.name = ListensTo.person_name
    AND Person.lives_in = 'Africa' ; 

The result set will be:

Grayham Downer
Judith Brown 

The significance of primary key and foreign key should now be clearer. You always use a foreign key to relate one table to another. The operation is called “join” because you are merging or joining the data in two or more tables where the data match your conditions. In this case we used person_name which is a foreign key in the ListensTo table and the primary key for the Person table. Because it is a key, that allows us to retrieve the data from Person that corresponds to the name we found in ListensTo. This kind of join is an “inner join” or “equijoin.” In set theory terms, it is data that falls in the intersection of the two tables. There are also “outer joins,” which get you the data that is in one table, but not the other. These are outside the scope of this basic primer.

Let’s elaborate on the conditional selections. You can use all the operators shown in Table 26-7 to compare attributes.

Table 26-7. SQL Comparison Operators

Meaning

SQL Operator

Example

equals

=

WHERE lives_in = 'Africa'

greater than

>

WHERE age > 39

less than

<

WHERE age < 21

greater than or equal

>=

WHERE name >= 'Brown'

less than or equal

<=

WHERE age <= 65

not equal to

<>

WHERE name <> 'Brown'

pattern match

LIKE

WHERE name LIKE '%own'

matches any of several choices

IN

WHERE age IN (18, 19, 20)

When you compare a string for being greater than some other string, it does a lexical comparison of the characters. So the name “Crown” is greater than “Brown.” The “like” operator is for pattern-matching, and uses a “%” as a wild card. The example shown in the table will match any names that end with “own.” There are other operators in SQL. The name we select from a table can be a mathematical function of some column in the table. That’s expressed like this:

SELECT COUNT(*) FROM Person; 

That statement gives you the cardinality of the Person table. There are other functions too. Table 26-8 shows some of them. These come after the SELECT keyword, and the entire statement may also have a WHERE clause that restricts the records that are input to the function.

Table 26-8. Some SQL Functions

Meaning

SQL Function

Example

Gives the number of rows satisfying the WHERE condition if present.

COUNT(*)

SELECT COUNT(*) FROM Person;

Gives the total of the named column, for all rows that meet the condition. This examples adds the ages of people over 21 in our database.

SUM(col )

SELECT SUM(age) FROM Person

WHERE age 21;

Calculates the average of the named column. This example gives the average age of the minors.

AVG(col )

SELECT AVG(age) FROM Person

WHERE age < 21;

Returns the largest value in that column.

MAX(col )

SELECT MAX(age) FROM Person;

Returns the smallest value in that column.

MIN(col )

SELECT MIN(age) FROM Person;

Subquery Selections

Quite frequently you want to submit a further select on the result of a select. There are several ways to do that, one way being to nest a select statement inside another. A nested select statement is called a subquery. Here is an example of a subquery:

SELECT person.name FROM person
WHERE
   person.lives_in IN ('England', 'USA')
AND
   person.name NOT IN
     ( SELECT listens_to.name FROM listens_to
       WHERE
          listens_to.music_group_name = 'Beatles' ); 

The simplest way to understand subqueries is to look at them piece by piece, starting from the innermost nested one. In this case, the nested select statement is:

( SELECT listens_to.name FROM listens_to
  WHERE
     listens_to.music_group_name = 'Beatles' ); 

A moment’s reading should convince you that this provides a result set of names of people who listen to the Beatles. So substitute that into the entire statement, and we get:

SELECT person.name FROM person
WHERE
   person.lives_in IN ('England', 'USA')
AND
   person.name NOT IN (names-of-people-who-listen-to-Beatles) ; 

That can quickly be seen as all the people who live in England or the USA, and who do not listen to the Beatles. Be careful. Excessive use of subqueries results in SQL code that is hard to understand and hard to debug. As an alternative to subqueries you can often create, insert into, select from, and then drop temporary tables. Another alternative is to generate the queries dynamically. That is, to use one query to get the list of names, hold that in a variable, and use that variable in the second query. This will become clearer after reading the next chapter.

SELECT and all the SQL statements have even more features than are shown here, but this is enough to start writing real applications.

Result Set of a SELECT Query

We’ve seen informally in previous examples how the results of a SELECT statement are returned to you. The results of a query come back in the form of zero, one or more rows, and is called the result set. The rows in the result set can be retrieved and examined individually using something called a cursor. Just as a GUI cursor marks your position on the screen, a database cursor indicates the row of the result set that you are currently looking at. A cursor is usually implemented as an unsigned integer that holds the offset into the file containing your result set. It has enough knowledge to move forward row by row through the result set.

Database management systems typically provide a cursor to the SQL programmer automatically. The programmer can use it to iterate through the result set. JDBC 2 upgrades the features of a cursor available to Java. Now you can move the cursor backward as well as forward, providing the underlying database supports that. You can also move the cursor to an absolute position (e.g., the fifth row in the result set) or to a position relative to where it is now (e.g., go to the immediate previous record).

We can ask for our result set to come to us sorted by some column or columns. We achieve this by using the “ORDER BY” clause. In this case, it makes sense to use the cursor to ask for the record before the one we are currently looking at. For example, if you order by “billing price” you can go backward until you reach orders under $10. That way, you can process your most valuable orders first, and stop invoicing when the amount is smaller than the cost of processing.

SELECT Pitfalls

Here are some common pitfalls encountered when using the SELECT statement. When you hit an error in your programs in the next chapter, check if it is one of these!

  • not surrounding a literal string in single quotes

  • only mentioning the tables that you are extracting from in the “from” clause. You need to mention all the tables that you will be looking at in the “where” clause

  • failing to specify “distinct,” and thus getting duplicate values in certain columns

  • failing to leave a space between keywords when creating a Java String on several lines containing SQL

Updating Values

The UPDATE statement is used to change the values in an existing record.

The UPDATE statement has this general format:

Format of SQL UPDATE Statement

Additional Information

UPDATE tablename
   SET
     colName1=value1  ,colName2=value2...
   WHERE
     colNamei  someOperator  valuei   ...
; 

<— can provide a value for all attributes or just some

<— can repeat this line, separated by AND or OR

Here is an example of the use of the UPDATE statement:

Example of SQL UPDATE Statement

Additional Information

UPDATE Person
  SET  age = 25, lives_in = 'USA'
  WHERE name='Robert Bellamy' ; 

Robert celebrated his birthday by moving to the USA.

This statement will start to populate (fill in with data) the Person table that we saw earlier in the chapter. The values are inserted into the record in the order in which the attributes are named.

Deleting Records and Tables

The DELETE statement is used to remove records from a table, and the DROP statement is used to completely remove all trace of a table from the database.

The DELETE statement has this general format:

Format of SQL DELETE Statement

Additional Information

DELETE FROM tablename
   WHERE
     colName  someOperator  value   ...
; 

<— can repeat this line, separated by AND/OR to further refine which records get deleted

If you forget the “where” clause, all records in the table are deleted! A table with no records still exists in the database, and can be queried, updated, etc. To get rid of all traces of a table (not a common operation in most databases), use the DROP statement.

The DROP statement has this general format.

Format of SQL DROP Statement

Additional Information

DROP TABLE tablename ;

There is frequently more than one way to write an SQL query. Some of the ways will do less work than other ways. Nowadays it is the database’s responsibility to reorder queries for the best performance.

SQL Prepared Statements and Stored Procedures

Prepared statements and stored procedures are two different ways of organizing your SQL code and getting it to run faster. When you send an SQL statement to your database, there is an SQL interpreter that reads the statement, figures out what it means and which database files are involved, and then issues the lower level native instructions to carry it out. Depending on what the statement is exactly, it may be quite a lot of work to analyze and interpret it.

If you find that you are issuing a statement over and over again, the database will be doing a lot of work that can be avoided. The way to do this is with a prepared statement. As the name implies, the prepared statement is constructed and sent to the SQL interpreter. The output of the interpreter (the native code instructions) is then saved. The prepared statement can later be reissued, perhaps with different parameters, and it will run much more quickly because the interpretation step has already been done. Does this remind you of anything? This is exactly how Just-In-Time (JIT) Java compilers speed up execution—by compiling to native code and caching the results.

A stored procedure is a similar idea to prepared statements but taken one step further. Instead of caching an individual statement, you can save a whole series of statements as a procedure. A stored procedure will typically implement one entire operation on a database, like adding an employee to all the relevant tables (payroll, department, benefits, social club, etc.). It is typical to provide parameters to a stored procedure; for example, giving the details of the employee who is being added to the company.

The vast majority of database systems support stored procedures, but a major sticking point has been the variation in the exact syntax used. JDBC 2 solves this issue by allowing you to write stored procedures in Java. That means your library of stored procedures is now portable to all databases, which is a major step forward!

This concludes our tour of the concepts of SQL and databases, and we now proceed to the next chapter to look at how Java interfaces to all this.

Further Reading

There is a terrific SQL tutorial written by Frank Torres that includes a server-side script allowing you to submit SQL queries against a small database at www.sqlcourse.com/. It is worthwhile to look at this site, which is sponsored by Oracle, the world’s premier database management software company. There is a follow-up advanced version of the course at sqlcourse2.com. Note that the SQL server at Frank’s site doesn’t support all of the standard SQL shown in this chapter, but it easily does enough to let you try some SQL now.

If you’re interested in the emerging technology known as “object-oriented databases,” there is a good paper to read at the Slashdot site. See “Why Aren't You Using an Object Oriented Database Management System?” by Dan Obasanjo at slashdot.org/features/01/05/03/1434242.shtml.

Exercises

  1. Define and give examples of the following database terms: tuple, attribute, relation.

  2. What does it mean to normalize a database design? Describe first, second, and third normal forms.

  3. Review the basic SQL course at www.sqlcourse.com/ that allows you to formulate and run SQL queries online.

  4. Write an SQL statement to display the name and age of everyone in the Person table who is older than 39.

  5. Write an SQL statement to display the name of everyone in the Person table who lives in a NATO country and listens to the Beatles. There are 19 member nations of the North Atlantic Treaty Organization, including the USA, UK, Canada, France, Germany, Greece, and Poland. New members join from time to time, so the roll should probably be kept in a table, rather than a set of literals.

  6. What is an SQL subquery, and when would you use one?

  7. Write an SQL statement to display the name of everyone in the Person table who lives in a NATO country and does not listen to the Beatles. Be careful to exclude people who listen to the Beatles, and also listen to other bands as well. The simplest way to do this is to use a subquery.

  8. Explain, using examples, the difference between a primary key and a foreign key.

Some Light Relief—Reading the Docs

How do you tell if a user has read the software documentation? If users are anything like us programmers, it’s a pretty safe assumption that they have not read the documentation. Time is short, and reading manuals is tedious and time-consuming. I knew someone who worked on the support desk for a large internal software application. He cut his workload by 85% using one simple technique. Whenever someone complained about a bug in the software, he asked them which page of the manual it violated before he would investigate it. Most users preferred to live with any bug rather than spend hours tunneling through the manual, and Jenkins’s technique saved him a lot of bother, right up until the time he got fired.

Another way of encouraging people to read the manual is to have the program ask “Did you read the manual, answer y/n:” The program won’t proceed until it gets the right answer. And neither “yes” nor “no” is the right answer. Somewhere in the manual, buried deep in an obscure paragraph, is the information that this question expects to continue: the answer “foo.” But you’ll only know that if you read the manual thoroughly.

Are you a student reading this chapter for an “Advanced Java” class? OK, then! Please demonstrate that you have read this chapter by writing your favorite color at the top right of the front sheet of your homework for this chapter. If blue is your favorite color, write “blue.” Write “black” if you like black best, etc. Professors: see how many of your students really do the assigned reading.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.128.171.246