Chapter 23. Relational Databases and SQL

  • Introduction to Relational Databases

  • Primary and Foreign Keys

  • Relationships

  • Normal Forms

  • Relational Database Glossary

  • Download and Install Mckoi

  • Basic SQL Primer

  • Creating and Populating Tables

  • Querying and Retrieving Data

  • Subquery Selections

  • Result Set of a SELECT Query

  • Updating Values

  • Deleting Records and Tables

  • SQL Prepared Statements and Stored Procedures

  • Exercises

  • Some Light Relief—Reading the Docs

In the first half of this chapter we'll look at the most widely used kind of database, the relational database. We'll examine the model for using it, and we'll introduce some of the special terms that database experts use. We will work through a couple of small examples to show the techniques that apply equally well to much larger databases. The second half of the chapter is a primer on SQL, the specialized programming language used to talk to a database and update information in it. We'll look at the way you create database tables and populate them with data. Then we'll describe the SQL statements to extract, update, and remove information from a database. SQL has its own data types and operators, presented here.

All this will prepare us to use JDBC, the Java library that supports access to databases, in Chapter 24, “JDBC.” If you are already well versed in databases and SQL, you can safely go straight on to that chapter now.

The JDBC library is one of the most important Java libraries, right up there with the XML library and the servlet library. JDBC solves a problem that has previously been a drawback to all other database programming approaches, namely, platform lock-in. When you write your database access code in Java using JDBC, your database can easily be moved to another database vendor and another OS environment. If you outgrow the capacities of either of these, it's straightforward to trade up to a more capable platform with minimal or no changes to your software. If you want to change database vendors for any reason, your software is easily portable to the new environment.

Database programming is a central part of modern enterprise software systems. All professional programmers should understand the basics. This chapter gives you a solid grounding in programming database queries in SQL. JDBC uses SQL to do its work. We focus on SQL here and defer the Java part to Chapter 24.

Finally, we'll use a really terrific open source database known as the Mckoi database as our example. This is a free, open source database, written in 100% Java, and available for easy download. The Mckoi database comes with a small GUI program that lets you run SQL queries against the example database. It is an excellent tool for learning and experimenting with SQL without wasting a lot of time.

Introduction to Relational Databases

A database is a structured collection of data. It may be anything from a simple list of members of a sports club, to the inventory of a store, or the huge amounts of information in a corporate network. To retrieve and update data stored in a computer database, you need database management software. There are several approaches to database software architecture: relational, hierarchical, and Codasyl network. Here, we'll focus on relational databases, which is the most widely used approach by far.

The relational database concept

Like so many of the best ideas in computer science (e.g., TCP/IP, HTTP, XML, ethernet, sockets, or the JVM), relational databases are based on a simple fundamental concept. The idea behind relational databases is that you can organize your data into tables. Other approaches to databases have tried to keep everything in one big repository. Having individual tables that keep the related pieces of data together simplifies the design and programming. It also adds speed and flexibility to the implementation.

What is a table?

What specifically do we mean by a “table”? We mean the data can be represented in tabular form with columns that all contain values of one type, and where a row holds the related data on one thing (one customer, one account, one order, one product, etc.). The table format is a logical, not a physical, organization. Under the covers, the database management software will typically store the data in indexed files and cache it in memory in tree structures when the program is running. But it will always present the appearance of tables to your programs.

Table 23-1 shows some (fictional) people, their age, favorite bands, and where they live.

Table 23-1. The “People and Music” table

Name

Age

Lives in

Listens to

Robert Bellamy

24

England

Beatles

Robert Bellamy

24

England

Abba

Robert Bellamy

24

England

Oasis

Judith Brown

34

Africa

Muddy Ibe

Judith Brown

34

Africa

Abba

Butch Fad

53

USA

Metallica

Timothy French

24

Africa

Oasis

Timothy French

24

Africa

Nirvana

Grayham Downer

59

Africa

Beatles

Tricky terminology

There are some special database terms that go with tables. A single row of data is known as a tuple, or more commonly a record or row. It's effectively a set of data that belong together in some way. In our first table, the record (Robert Bellamy, 24, England, Beatles) is saying that the age, place, and music preferences are those of Robert Bellamy. The second record stores another music preference for Robert: he also listens to Abba.

The name of the data in a column is called an attribute. Age is an attribute here. An attribute is the term for an individual field of a record. If math scares you, forget the rest of this paragraph, but it's how database maestros talk to each other at conferences. A domain is the set of allowable values for an attribute. An example domain for the age attribute could be “integers between 0 and 125.” The domain is a constraint on the values of the attribute. A relation is (informally) a table with columns and rows. The number of attributes in a relation (i.e., how many fields a record has) is called its degree. The number of tuples in a relation (i.e., the number of data records you have in it) is called its cardinality. We'll prefer the more widely used terms of record, column, etc. from this point.

Mathematics makes databases work perfectly

We have special terms for all these things because they are mathematical concepts. Relational databases are based on the mathematical concepts known as set theory and predicate logic. People often think that the “relational” part of the name comes because we store related things together. Actually, it is because the architecture uses mathematical relations that say how one group of data is associated with another. There is a formal underpinning to our data manipulation, and it can be proved that certain operations will yield the correct result, are equivalent to some longer operation, and so on. To make certain that our databases remain true to the mathematics, these qualities are required to be true at all times:

  • Each table has a different name from all the others in the database.

  • Each column has a different name from all the others in the database.

  • Each row has a different value from all the others in the table. There is never a duplicate row in a table.

  • Each “cell” (attribute) in a table contains exactly one value.

  • The order in which the rows and columns appear has no significance. If you want order to be significant, you create a new column to store that value.

  • Values of a column are all from the same domain, i.e., if a column starts off representing age, it doesn't suddenly change into salary halfway through a table.

Database design

Referring to Table 23-1, say we want to find all the people who live in Africa and listen to Abba. We simply look at each record and compare the “Lives in” and “Listens to” attributes, printing out the ones that match “Africa” and “Abba,” respectively. It works well for this query, but there are some big disadvantages to storing the data all in one table. If someone moves out of the country, we have to find every record in which their name appears and update it. As we are going through the database trying to update records, we have to lock out other read attempts to stop them seeing inconsistent data. Because so much data is duplicated, our storage needs will be bigger, and all programs which run against the database will take longer. The amount of data here is very small, but keep in mind that all the sizes scale up. A company could easily have a database containing hundreds of tables, some of which have millions of rows.

The recommended approach to designing databases is to try to minimize the amount of data duplication. You try to have one relation for each kind of entity (customer, employee, order, cd catalog, shipment, etc.) and store in there only the attributes directly associated with that kind of entity, not every possible attribute it has. In this case, we will probably create a couple of tables, such as Table 23-2.

Table 23-2. The “Person” table

Name

Age

Lives in

Robert Bellamy

24

England

Grayham Downer

59

Africa

Timothy French

24

Africa

Butch Fad

53

USA

Judith Brown

34

Africa

In Table 23-3, we store attributes called “Name” and “Music Group Name.” There, each record represents a person and a group that they listen to regularly.

Table 23-3. The “ListensTo” table

Name

Music Group Name

Robert Bellamy

Beatles

Robert Bellamy

Abba

Robert Bellamy

Oasis

Butch Fad

Metallica

Judith Brown

Muddy Ibe

Judith Brown

Abba

Timothy French

Oasis

Timothy French

Nirvana

Grayham Downer

Beatles

Do you see what we have done? We have “factored out” the common data of name, age, and lives in into one table, leaving name and music group name in Table 23-3. Now when Robert Bellamy moves to the USA, that information only needs to be updated in exactly one record. The data in “Listens To” is related to the data in “Person” by matching the name attribute. If Robert stops listening to Abba, we can delete that row without also dropping Robert, his age, and his country from the database.

We will be using these tables with this data for the rest of the chapter, so you may want to put a post-it note on this page so you can easily refer back to this section. Now we dive into some low-level details for a few paragraphs. You need these terms to understand how to use SQL, so don't go on until you have understood it.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
13.59.79.176