2
Relational Overview

Recall the question posed in Chapter 1: What is a database? The answer given there was:

A database is a tool that stores data and lets you create, read, update, and delete the data in some manner.

This broad definition allows you to consider all sorts of odd things as databases, including notebooks, filing cabinets, and your brain. If you're flexible about what you consider data, then this definition includes even stranger objects like a chess set (which stores board positions) or a parking lot (which stores car types and positions, although it might be hard for you to update any given car's position without the owner's consent).

This chapter begins our move into the realm of computerized databases.

Relational databases are the most commonly used computerized databases today, and most of this book (and other database books) focus on them. Relational databases are useful in a huge number of situations, but they're not the only game in town. Sometimes, a different kind of database may make more sense for your particular problem, so I'll say more about nonrelational databases in the next chapter and later in the book. For now, though, let's focus on relational databases.

Before you can start learning how to properly design a relational database, you should understand some basic concepts and terms. This chapter provides an introduction to relational databases. It explains the major ideas and terms that you need to know before you can start designing and building relational databases.

In this chapter, you'll learn about relational database terms such as:

  • Table and relation
  • Record, row, and tuple
  • Column, field, and attribute
  • Constraint, key, and index

Finally, you'll learn about the operations that you can use to get data out of a relational database.

PICKING A DATABASE

There's an expression, “If all you have is a hammer, everything looks like a nail.” If the only kind of database you understand is the relational database, then you'll probably try to hammer every kind of data into a relational database, and that can sometimes lead to trouble.

I once worked on a fairly large database application with 40 developers and more than 120,000 lines of code. The program loaded some fairly large relational databases and used their data to build huge, tree-like structures. Those structures allowed sales representatives to design and modify extremely complicated projects for customers involving tens of thousands of line items.

The data was naturally hierarchical but was stored in relational databases, so the program was forced to spend a long time loading each data set. Many projects took 5 to 20 minutes to load. When the user made even a simple change, the program's design required it to recalculate parts of the tree and then save the changes back into the database—a process that took another 5 to 30 minutes depending on the complexity of the model. The program was so slow that the users couldn't perform the types of experiments they needed to optimize the projects they were building. You couldn't quickly see the effects of tweaking a couple of numbers here and there.

To make matters worse, loading and saving all of that hierarchical data in a relational database required tens of thousands of lines of moderately tricky code that was hard to debug and maintain.

At one point, I performed a quick experiment to see what would happen if the data were stored in an XML database, a database that naturally stores hierarchical data. My test program was able to load and save data sets containing 20,000 items in 3 to 4 seconds.

At that point, the project was too big and the design too entrenched to make such a fundamental change. (Afterward, political pressure within the company pulled the project in too many directions and it eventually shredded like a tissue in a tug-of-war.)

The lesson is clear: before you spend a lot of time building the ultimate relational database and piling thousands of lines of code on top of it, make sure that's the kind of database you need. Had this project started with an XML database, it probably would have had a simpler, more natural design with much less code and would probably have lasted for many years to come.

RELATIONAL POINTS OF VIEW

Relational databases play a critical role in many important (i.e., money-related) computer applications. As is the case whenever enormous amounts of money are at stake, people have spent a huge amount of time and effort building, studying, and refining relational databases. Database researchers usually approach relational databases from one of three points of view.

The first group approaches the problem from a database-theoretical point of view. These people tend to think in terms of provability, mathematical set theory, and propositional logic. You'll see them at parties throwing around phrases like relational algebra, Cartesian product, and tuple relational calculus. This approach is intellectually stimulating (and looks good on a résumé) but can be a bit intimidating. These researchers focus on logical design and idealized database principles.

The second group approaches the matter from a less formal “just build the database and get it done” point of view. Their terminology tends to be less precise and rigorous but more intuitive. They tend to use terms that you may have heard before like table, row, and column. These people focus on physical database design and pay more attention to concrete bits-and-bytes issues dealing with actually building a database and getting the most out of it.

The third group tends to think in terms of flat files and the underlying disk structure used to hold data. Although these people are probably in the minority these days, their terms file, record, and field snuck into database nomenclature and stuck. Many who still use these terms are programmers and other developers who look at the database from a consumer's “how do I get my data out of it” point of view.

These differing viewpoints have led to several different and potentially puzzling ways to consider relational databases. This can cause some confusion, particularly because the different groups have latched on to some of the same terms but used them for different meanings. In fact, they sometimes use the term relation in very different ways (that are described later in this chapter).

This chapter loosely groups these terms into “formal” and “informal” categories, where the formal category includes the database theoretical terms and the informal category includes everything else.

This chapter begins with informal terms. Each section initially focuses on informal terms and concepts, and then explains how they fit together with their more formal equivalents.

TABLE, ROWS, AND COLUMNS

Informally you can think of a relational database as a collection of tables, with each containing rows and columns. At this level, it looks a lot like a computerized workbook containing several worksheets (or spreadsheets), although a worksheet is much less constrained than a database table is. You can put just about anything in any cell in a worksheet. In contrast, every entry in a particular column of a table is expected to contain the same kind of data. For example, all of the cells in a particular column might contain phone numbers or last names.

The set of the values that are allowed for a column is called the column's domain. For example, a column's domain might be telephone numbers, bank account numbers, snowshoe sizes, or hang glider colors.

Domain is closely related to data type, but it's not quite the same. A column's data type is the kind of data that the column can hold. (Yes, I know this is pretty obvious.) The data types that you can use for a column depend on the particular database you are using, but typical data types include integer, floating-point number (a number with a decimal point), string, and date.

To see the difference between domain and data type, note that street address (323 Relational Rd) and jersey color (red) are both strings. However, the domain for the street address column is valid street addresses, whereas the domain for the jersey color column is colors (and possibly not even all colors if you only allow a few choices). You can think of the data type as the highest level or most general possible domain. For example, an address or color domain is a more restrictive subset of the domain allowing all strings.

The rows in a table correspond to column values that are related to each other according to the table's purpose. For example, suppose you have a Competitors table that contains contact information for participants in your First (and probably Last) Annual Extreme Pyramid Sports Championship (aka the Cairolympics). This table includes columns to hold competitor name, address, event, blood type, and next of kin, as shown in Figure 2.1. (Note that this is not a good database design. You'll see why in later chapters.)

A representation exhibits a relational model.

FIGURE 2.1

A particular row in the table holds all of the values for a given competitor. For example, the values in the first row (Alice Adventure, 6543 Flak Ter, Runner AZ 82018, Pyramid Boarding, A+, Art Adventure) all apply to the competitor Alice Adventure.

Back in olden times when database developers worked with primitive tools by candlelight, everyone lived much closer to nature. In this case, it means they needed to work more closely with the underlying filesystem. It was common to store data in “flat” files without any indexes, search tools, or other fancy modern luxuries. A file would hold the related information that you might represent at a higher level as a table. The file was divided into chunks called records, each of which had the same size and corresponded to a row in a table. The records were divided into fixed-length fields that corresponded to the columns in a table.

For example, if you think of the table shown in Figure 2.1 as a flat file, the first row corresponds to a record in the file. Each record contains Name, Address, Event, and other fields to hold the data.

Though relatively few DBAs still work with flat files at this level, the terms file, record, and field are still with us and are often used in database documentation and discussions.

RELATIONS, ATTRIBUTES, AND TUPLES

The values in a row are related by the fact that they apply to a particular person. Because of this fact, the formal term for a table is a relation. This may cause some confusion because the word relation is also used informally to describe a relationship between two tables. This use is described in the section “Foreign Key Constraints,” later in this chapter.

The formal term for a column is an attribute or data element. For example, in the Competitors relation shown in Figure 2.1, Name, Address, BloodType, and NextOfKin are the attributes of each of the people represented. You can think of this as in “each person in the relation has a Name attribute.”

The formal term for a row is a tuple (rhymes with “scruple”). This almost makes sense if you think of a two-attribute relation as holding data pairs, a three-attribute relation as holding value triples, and a four-attribute relation as holding data quadruples. Beyond four items, mathematicians would say 5-tuple, 6-tuple, and so forth, hence the name tuple.

Don't confuse the formal term relation (meaning table) with the more general and less precise use of the term that means “related to” as in “these fields form a relation between these two tables” (or “that psycho is no relation of mine”). Similarly, don't confuse the formal term attribute with the less precise use that means “feature of” as in “this field has the ‘required’ attribute” (or “don't attribute that comment to me!”). I doubt you'll confuse the term tuple with anything—it's probably confusing enough all by itself.

Theoretically a relation does not impose any ordering on the tuples that it contains, nor does it give an ordering to its attributes. Generally, the orderings don't matter to mathematical database theory. In practice, however, database applications usually sort the records selected from a table in some manner to make it easier for the user to understand the results. It's also a lot easier to write the program (and for the user to understand) if the order of the fields remains constant, so database products typically return fields in the order in which they were created in the table unless told otherwise.

KEYS

Relational database terminology includes an abundance of different flavors of keys. (They are key terms, so you could say that relational databases have a lot of key key terms.) In the loosest sense, a key is a combination of one or more columns that you use to find rows in a table. For example, a Customers table might use CustomerID to find customers. If you know a customer's ID, then you can quickly find that customer's record in the table. (In fact, many ID numbers, such as employee IDs, student IDs, driver's license numbers, and so forth, were invented just to make searching in database tables easier. My library card certainly doesn't include a 10-digit ID number for my convenience.)

The more formal relational vocabulary includes several other more precise definitions of keys.

In general, a key is a set of one or more columns in the table that have certain properties. A compound key or composite key is a key that includes more than one column. For example, you might use the combination of FirstName and LastName to look up customers.

A superkey is a set of one or more columns in a table for which no two rows can have the exact same values. For example, in the Competitors table shown in Figure 2.1, the Name, Address, and Event columns together form a superkey because no two rows have exactly the same Name, Address, and Event values. Because superkeys define fields that must be unique within a table, they are sometimes called unique keys.

Because no two rows in the table have the same values for a superkey, a superkey can uniquely identify a particular row in the table. In other words, a program could use a superkey to find any particular record.

A candidate key is a minimal superkey. That means if you remove any of the columns from the superkey, it won't be a superkey anymore.

For example, you already know that Name/Address/Event is a superkey for the Competitors table. If you remove Event from the superkey, you're left with Name/Address. This is not a superkey because everyone in the table is participating in multiple events, and therefore they have more than one record in the table with the same name and address.

If you remove Name, then Address/Event is not a superkey because Dean Daring and his roommate George Foreman share the same address and are both signed up for Pyramid Luge. (They also have the same blood type. They became friends and decided to become roommates when Dean donated blood for George after a particularly flamboyant skateboarding accident.)

Finally if you remove Address, then Name/Event is still a superkey. That means Name/Address/Event is not a candidate key because it is not minimal. However, Name/Event is a candidate key because no two rows have the same Name/Event values and you can easily see neither Name nor Event is a superkey, so the pair is minimal.

You could still have a problem if one of George's other brothers, who are all named George, moves in. If they compete in the same event, you won't be able to tell them apart. Perhaps we should add a CompetitorId column to the table after all.

Note that there may be more than one superkey or candidate key in a table. In Figure 2.1, Event/NextOfKin also forms a candidate key because no two rows have the same Event and NextOfKin values. (That would probably not be the most natural way to look up rows, however. “Yes sir, I can look up your record if you give me your event and next of kin.”)

A unique key is a superkey that is used to uniquely identify the rows in a table. The difference between a unique key and any other candidate key is in how it is used. A candidate key could be used to identify rows if you wanted it to, but a unique key is used to constrain the data. In this example, if you make Name/Event be a unique key, the database will not allow you to add two rows with the same Name and Event values. A unique key is an implementation issue, not a more theoretical concept like a candidate key is.

A primary key is a superkey that is actually used to uniquely identify or find the rows in a table. A table can have only one primary key (hence the name “primary”). Again, this is more of an implementation issue than a theoretical concern. Database products generally take special action to make finding records based on their primary keys faster than finding records based on other keys.

Some databases allow alternate key fields to have missing values, whereas all of the fields in a primary key are required. For example, the Competitors table might have Name/Address/Event as a unique key and Name/Event as a primary key. Then it could contain a record with Name and Event but no Address value (although that would be a bit strange—we might want to require that all the fields have a value).

An alternate key is a candidate key that is not the primary key. Some also call this a secondary key, although others use the term secondary key to mean any set of fields used to locate records even if the fields don't define unique values.

That's a lot of keys to try to remember! The following list briefly summarizes the different flavors:

  • Compound key or composite key—A key that includes more than one field.
  • Superkey—A set of columns for which no two rows can have the exact same values.
  • Candidate key—A minimal superkey.
  • Unique key—A superkey used to require uniqueness by the database.
  • Primary key—A unique key that is used to quickly locate records by the database.
  • Alternate key—A candidate key that is not the primary key.
  • Secondary key—A key used to look up records but that may not guarantee uniqueness.

One last kind of key is the foreign key. A foreign key is used as a constraint rather than to find records in a table, so it is described a bit later in the section “Constraints.”

INDEXES

An index is a database structure that makes it quicker and easier to find records based on the values in one or more fields. Indexes are not the same as keys, although the two are related closely enough that many developers confuse the two and use the terms interchangeably.

For example, suppose you have a Customers table that holds customer information: name, address, phone number, Swiss bank account number, and so forth. The table also contains a CustomerId field that it uses as its primary key.

Unfortunately, customers usually don't remember their customer IDs (I know I don't), so you need to be able to look them up by name or phone number. If you make Name and PhoneNumber two different keys, then you can quickly locate a customer's record in three ways: by customer ID, by name, and by phone number.

Building and maintaining an index takes the database some extra time, so you shouldn't make indexes gratuitously. Place indexes on the fields that you are most likely to need to search and don't bother indexing fields like apartment number or telephone extension, which you're unlikely to need to search.

CONSTRAINTS

As you might guess from the name, a constraint places restrictions on the data allowed in a table. In formal database theory, constraints are not considered part of the database. However, in practice, constraints play such a critical role in managing the data properly that they are informally considered part of the database. (Besides, the database product enforces them!)

The following sections describe some kinds of constraints that you can place on the fields in a table.

Domain Constraints

Relational databases let you specify some simple basic constraints on a particular field. For example, you can make a field required.

The special value null represents an empty value. For example, suppose you don't know a customer's income. You can place the value null in the Income field to indicate that you don't know the correct value. This is different from placing 0 in the field, which would indicate that the customer doesn't have any income.

Making a field required means it cannot hold a null value, so this is also called a not null constraint.

The database will also prevent a field from holding a value that does not match its data type. For example, you cannot put a 20-character string in a 10-character field. Similarly, you cannot store the value “twelve” in a field that holds integers.

These types of constraints restrict the values that you can enter into a field. They help define the field's domain, so they are called domain constraints. Some database products allow you to define more complex domain constraints, often by using check constraints.

Check Constraints

A check constraint is a more complicated type of restriction that evaluates a Boolean expression (a logical expression that evaluates to true or false) to see if certain data should be allowed. If the expression evaluates to true, then the data is allowed.

A field-level check constraint validates a single column. For example, in a SalesPeople table, you could place the constraint Salary > 0 on the Salary field to mean that the field's value must be positive (even if you think that some salespeople deserve negative salaries).

A table-level check constraint can examine more than one of a record's fields to see if the data is valid. For example, the constraint (Salary > 0) OR (Commission > 0) requires that each SalesPeople record must have a positive salary or a positive commission (or both).

Primary Key Constraints

By definition, no two records can have identical values for the fields that define the table's primary key. That greatly constrains the data.

In more formal terms, this type of constraint is called entity integrity. It simply means that no two records are exact duplicates (which is true if the fields in their primary keys are not duplicates), and that all of the fields that make up the primary key have non-null values.

Unique Constraints

A unique constraint requires that the values in one or more fields be unique. Note that it only makes sense to place a uniqueness constraint on a superkey. Recall that a superkey is a group of one or more fields that cannot contain duplicate values. It wouldn't make sense to place a uniqueness constraint on fields that can validly contain duplicated values. For example, it would probably be silly to place a uniqueness constraint on a Customer table's City or State field.

Foreign Key Constraints

A foreign key is not quite the same kind of key defined previously. Instead of defining fields that you can use to locate records, a foreign key refers to a key in a different (foreign) table. The database uses it to locate records in the other table. Because it defines a reference from one table to another, this kind of constraint is also called a referential integrity constraint.

A foreign key constraint requires that a record's values in one or more fields in one table (the referencing table) must match the values in another table (the foreign or referenced table). The fields in the referenced table must form a candidate key in that table. Usually, they are that table's primary key, and most database products try to use the foreign table's primary key by default when you make a foreign key constraint.

For a simple example, suppose you want to validate the entries in the Competitors table's Event field so that the minimum wage interns who answer the phones cannot assign anyone to an event that doesn't exist.

To do this with a foreign key, create a new table named Events that has a single column called Event. Make this the new table's primary key and create records that list the possible events: Pyramid Boarding, Pyramid Luge, Camel Drafting, and Sphinx Jumping.

Next, make a foreign key that relates the Competitors table's Event field with the Events table's Event field. Now whenever someone adds a new record to the Competitors table, the foreign key constraint requires that the new record's Event value be listed in the Events table.

The database will also ensure that no one modifies a Competitors record to change the Event value to something that is not in the Events table.

Finally, the database will take special action if you try to delete a record in the Events table if its value is being used by a Competitors record (for example, if you decide to cancel the Pyramid Luge, but Dean Daring has already signed up for it). Depending on the type of database and how you have the relationship configured, the database will either refuse to remove the Events record or it will automatically delete all of the Competitors records that use it.

This example uses the Events table as a lookup table for the Competitors table. Another common use for foreign key constraints is to ensure that related records always go together. For example, you could build a NextOfKin table that contains information about the competitors’ next of kin (including name, phone number, email address, beneficiary status, and so forth). Then you could create a foreign key constraint to ensure that every Competitors record's NextOfKin value is contained in the Name fields in some NextOfKin table record. That way you know that you can always contact the next of kin for anyone in the Competitors table.

Figure 2.2 shows the Competitors, Events, and NextOfKin tables with arrows showing the relationships among their related fields.

A representation exhibits the new tables holding the original data.

FIGURE 2.2

Foreign keys define associations between tables that are sometimes called relations, relationships, or links between the tables. The fact that the formal database vocabulary uses the word relation to mean “table” sometimes leads to confusion. Fortunately, the formal and informal database people usually get invited to different parties, so the terms usually don't collide in the same conversation.

DATABASE OPERATIONS

Eight operations were originally defined for relational databases, and they form the core of modern database operations. The following list describes those original operations:

  • SelectionThis selects some or all of the records in a table. For example, you might want to select only the Competitors records where Event is Pyramid Luge so that you can know who to expect for that event (and how many ambulances to have standing by).
  • ProjectionThis drops columns from a table or selection. For example, when you make your list of Pyramid Luge competitors, you may want to list only their names and not their addresses, blood types, events (which you know is Pyramid Luge anyway), or next of kin.
  • UnionThis combines tables with similar columns and removes duplicates. For example, suppose you have another table named FormerCompetitors that contains data for people who participated in previous years’ competitions. Some of these people are competing this year and some are not. You could use the union operator to build a list of everyone in either table. (Note that the operation would remove duplicates, but for these tables you would still get the same person several times with different events.)
  • IntersectionThis finds the records that are the same in two tables. The intersection of the FormerCompetitors and Competitors tables would list those few who competed in previous years and who survived to compete again this year (i.e., the slow learners).
  • DifferenceThis selects the records in one table that are not in a second table. For example, the difference between FormerCompetitors and Competitors would give you a list of those who competed in previous years but who are not competing this year (so you can email them and ask them what the problem is).
  • Cartesian ProductThis creates a new table containing every record in a first table combined with every record in a second table. For example, if one table contains values 1, 2, 3, and a second table contains values A, B, C, then their Cartesian product contains the values 1/A, 1/B, 1/C, 2/A, 2/B, 2/C, 3/A, 3/B, and 3/C.
  • JoinThis is similar to a Cartesian product except records in one table are paired only with those in the second table if they meet some condition. For example, you might join the Competitors records with the NextOfKin records where a Competitors record's NextOfKin value matches the NextOfKin record's Name value. In this example, that gives you a list of the competitors together with their corresponding next of kin data.
  • DivideThis operation is the opposite of the Cartesian product. It uses one table to partition the records in another table. It finds all of the field values in one table that are associated with every value in another table. For example, if the first table contains the values 1/A, 1/B, 1/C, 2/A, 2/B, 2/C, 3/A, 3/B, and 3/C and a second table contains the values 1, 2, 3, then the first divided by the second gives A, B, C. (Don't worry, I think it's pretty weird and confusing, too, so it probably won't be on the final exam.)

The workhorse operation of the relational database is the join, which is often combined with selection and projection. For example, you could join Competitors records with NextOfKin records that have the correct name. Next, you could project to select only the competitors’ names, the next of kin names, and the next of kin phone numbers. You could then select only Bart Bold's records. Finally, you could select for unique records so the result would contain only a single record containing the values Bart Bold, Betty Bold, 302-288-9278. That's a result that you might actually need to use.

The following Structured Query Language (SQL) query produces this result:

 SELECT DISTINCT Competitors.Name, NextOfKin.Name, Phone
 FROM Competitors, NextOfKin
 WHERE Competitors.NextOfKin = NextOfKin.Name
   AND Competitors.Name = 'Bart Bold'

The SELECT clause performs selection, the FROM clause tells which tables to join, the first part of the WHERE clause (Competitors.NextOfKin = NextOfKin.Name) gives the join condition, the second part of the WHERE clause (Competitors.Name = 'Bart Bold') selects only Bart's records, and the DISTINCT keyword selects unique results.

The results of these operations are table-like objects that aren't permanently stored in the database. They have rows and columns so they look like tables, but their values are generated on the fly when the database operations are executed. These result objects are called views. Because they are often generated by SQL queries, they are also called query results. Because they look like tables that are generated as needed, they are sometimes called virtual tables.

Later chapters have a lot more to say about relational database operations as they are implemented in practice.

POPULAR RDBs

There are many relational database products available for you to use. All provide the same basic features, such as the ability to build tables, perform CRUD operations, carry out the eight basic relational database operations (selection, projection, union, etc.), define indexes and keys, and so forth.

They all also provide some form of SQL. SQL is a standardized language, so many queries are the same in most RDBMSs, although there are some slight differences. For example, different systems call a 4-byte integer an INT, INTEGER, NUMBER(4), or INT4. Many of these differences affect statements that modify the database (such as adding or deleting tables) rather than queries.

You can find a good SQL tutorial at www.w3schools.com/sql.

For a catalog of SQL differences on different RDBMSs, see https://en.wikibooks.org/wiki/SQL_Dialects_Reference.

You can find a list of around 100 RDBMSs at https://en.wikipedia.org/wiki/List_of_relational_database_management_systems and a similar list at https://database.guide/list-of-relational-database-management-systems-rdbms.

The following list shows the most popular RDBMSs in use as of September 2022 according to the DB-Engines page, https://db-engines.com/en/ranking/relational+dbms:

  1. Oracle
  2. MySQL
  3. Microsoft SQL Server
  4. PostgreSQL
  5. IBM Db2
  6. Microsoft Access
  7. SQLite
  8. MariaDB
  9. Snowflake
  10. Microsoft Azure SQL Database

I started writing a summary of each of the products, but found that they were so similar that it wasn't worth the effort. The following paragraphs describe some of their common features and give a few details for specific products.

Many of these products have free or community editions that have restricted features. Some restrict what you can do. For example, they place limits on table size, database size, and so on. Others restrict how long you can use the free version. The idea is that you can start with the free version, and then move to a more comprehensive (which coincidentally rhymes with “expensive”) version if you outgrow the free edition. Many of the free versions are still quite powerful, so many smaller projects can use those versions indefinitely.

Many of these can run in the cloud in some way. Some companies have separate cloud databases that are similar to their non-cloud versions. For example, Microsoft Azure SQL Database is Microsoft's cloud version of SQL Server. The cloud versions generally have their own free and non-free versions that you can try.

Although these are RDBMSs, many also include at least some NoSQL features. Some also include other features. For example, some of them have object-relational features, and PostgreSQL is really an object-relational database first.

MySQL, PostgreSQL (which is pronounced “post-gress-que-ell”), SQLite, and MariaDB are open source and freeish. The basic installations that you can obtain from the open source sites are free, but companies may provide their own versions that are not free.

For example, you can find the free community edition of MySQL at www.mysql.com/products/community, and links on that page lead to other editions that you can buy from Oracle.

You can download the free edition of PostgreSQL at www.postgresql.org. You can buy PostgreSQL in the cloud from various companies, such as Amazon, Google, Microsoft, EnterpriseDB, and others.

SQLite is a small, fast library written in C that provides a SQL database engine. It is not intended to be used as a stand-alone application, but rather as an engine embedded inside other applications. For example, it is embedded in Android, iOS, Mac OS X 10.4 and newer, and Windows 10 and newer. You can download the SQLite source code or compiled libraries at www.sqlite.org/index.html.

You can download MariaDB Community and find links to other versions at https://mariadb.com/downloads/community.

SPREADSHEETS

You've probably worked with spreadsheets before. Although they're not really relational databases, they have some analogous features. For example, a spreadsheet (analogous to a database) can contain multiple sheets (analogous to tables), which have rows and columns. Rows and columns are superficially similar to records and fields, but they are much less constrained, so a sheet's cells can contain just about anything.

Depending on the particular spreadsheet, you can add functions to cells to calculate new results or validate data entry in a way that is somewhat similar to the way a relational database can validate fields.

You can even use part of a sheet to validate the entries in another part of a sheet. For example, you could create a sheet listing product names, and then require that a cell on another sheet must contain one of those names.

Many spreadsheets also have some sort of programming interface or scripting capability. For example, you can write code behind a Microsoft Excel worksheet in Visual Basic for Applications (VBA). It's a somewhat outdated language, but it adds to the power of Excel. Similarly, Google Sheets has a developer application programming interface (API) that you can use to manipulate sheets programmatically.

Spreadsheets are quite powerful and can perform some complex calculations, automatically updating when necessary, but they are not relational databases. For example, they don't perform relational queries, have indexed fields, enforce relationships among fields, or support many simultaneous users.

If you need the features of a spreadsheet, then by all means use one. However, if you find that you're writing more and more code to do things like join data from different areas, filter results, execute prepackaged queries, and check for errors, then you should consider using a relational database instead. I've worked on a couple projects that used spreadsheets that contained tens of thousands of lines of code behind the scenes and they ended in tears.

SUMMARY

Before you can start designing and building relational databases, you need to understand some of the basics. This chapter provided an introduction to relational databases and their terminology.

In this chapter you learned about:

  • Formal relational database terms like relation, attribute, and tuple
  • Informal terms like table, row, record, column, and field
  • Several kinds of keys, including superkeys, candidate keys, and primary keys
  • Different kinds of constraints that you can place on columns or tables
  • Operations defined for relational databases

As I mentioned earlier, relational databases are popular, but they're not the only game in town. The next chapter describes a few nonrelational databases, some of which are growing more popular.

Before you move on to Chapter 3, however, use the following exercises to test your understanding of the material covered in this chapter. You can find the solutions to these exercises in Appendix A.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.189.189.220