1.3. Some Historical Background

This section briefly overviews the evolution of computing languages for information systems and then outlines the historical development of the main kinds of logical data structures used in database systems. We begin with a simple example to illustrate how the level of a language impacts how easy it is to formulate questions.

Table 1.5 summarizes how five generations of computing languages might be used to request a computer to list the name, mass, and moons (if any) of each planet, assuming the information is stored in an astronomical database. The higher the generation, the closer to natural language, and usually the less you have to say. Nowadays nobody uses machine code or assembler to access databases. Most database applications are coded using fourth generation languages (4GLs), perhaps in combination with third generation languages (3GLs).

Table 1.5. Five generations of computer languages.
GenerationLanguage exampleSample code for same task
5ConQuerPlanet that has ✓ Mass and possibly is orbited by ✓ Moon
4SQLselect X1.planetName, Xl.mass, X2.moonName from Planet as X1 left outer join Moon as X2 on Xl.planetName = X2.planetName
3PascalTwo pages of instructions like: for i:= 1 to n do begin write (planetName[i], mass[i]);
28086 AssemblerMany pages of instructions like: ADDI AX, 1
18086 machine codeMany pages of instructions like: 00000101 00000001 00000000

Third generation languages, such as C# and Java, are procedural, emphasizing the procedures used to carry out the task. With 3GLs we typically need to specify how to access data one record at a time. Fourth generation languages, such as SQL, are primarily declarative in nature: one declares what has to be done rather than how to do it. With a 4GL, a single statement can be used to perform operations on whole tables, or sets of rows, at once. Hence 4GLs are set oriented rather than record oriented.

Fifth generation languages (5GLs), such as ConQuer (an ORM query language), allow you to specify queries naturally, without knowing the underlying data structures used to store the information. The widespread use of fifth generation languages is still in the future.

The first database management systems were developed in the early 1960s, starting with simple file managers. Various logical data architectures have been proposed as a basis for specifying the structure of databases. In the hierarchic data model, the database schema is basically a tree of linked record types, where each record type has a different structure (unlike many trees where each node is of the same type).

Records may include one or more fields, each of which can hold only a single value. Record types are related by parent-child links (e.g., using pointers), where a parent may have many children but each child has only one parent. Hence the type structure is that of a tree, or hierarchy.

For example, in Figure 1.9 the parent record type Department has two child record types: Product and Employee. Each record type contains a sequence of named fields, shown here as boxes, and the parent-child links are shown as connecting lines. For discussion purposes, one record instance has been added below each record type. As an exercise, try reading off all the facts that are contained in this database before reading on.

Figure 1.9. A hierarchic database schema with sample data, and fact arcs.


To begin with, there are five facts stored in the record instances. To make these facts more obvious, Figure 1.9 includes arcs connecting the relevant fields, one arc for each fact. Although these arcs are a useful annotation, they are not part of the schema notation. If we are familiar with the business domain, we can verbalize these arcs into relationships. For example, we might verbalize the five facts as follows.

Department 10 is located in Building 69.

Department 10 has Budget 200000 USD.

Product ‘IS2’ has ProductName ‘InfoStar 2’.

Employee 357 has EmployeeName ‘Jones E’.

Employee 357 is of Sex ‘F’.

Are there more facts? Yes! The parent-child links encode the following two facts:

Department 10 develops Product ‘IS2’.

Department 10 employs Employee 357.

Hierarchic DBMSs such as IBM’s Information Management System can efficiently manage hierarchic structures (e.g., a file directory system). However, having to explicitly navigate over predefined record links to get at the facts can be somewhat challenging. The complexity rapidly rises if the application is not hierarchic in nature.

Suppose that the same product may be developed by more than one department. Conceptually, the Department develops Product association is now many:many. Since parent-child links are always l:many, a workaround is needed to handle this situation. For example, to record the facts that departments 10 and 20 both develop product TS2’ we could have the two department record instances point to separate copies of the record instance for product TS2’.

Although the type and instance link structures are still trees, the fact that product ‘IS2’ is named ‘InfoStar 2’ now appears twice in the database. Hence, we need to control this redundancy. Moreover, while retrieving products developed by a given department is easy, retrieving all the departments that developed a product is not so easy.

The network data model was developed by the Conference on Data Systems and Languages (CODASYL) Database Task Group. This model is more complex than the hierarchic model. Most of the data is stored in records, a single field of which may contain a single value, a set of values, or even a set of value groups. Record types are related by owner-member links, and the graph of these connections may be a network: a record type may have many owners and also own many record types.

As in the hierarchic model, facts are stored either in records or as record links. An owner-member link between record types is restricted to a l:many association. To handle a many:many association, such as the case discussed earlier, we might introduce a new record type (e.g., Development) with many:l associations to the other record types (in this case, Department and Product).

In general, encoding of facts in access paths such as interrecord links complicates the management of the application and makes it less flexible. For example, some new queries will have to wait until access paths have been added for them, and internal optimization efforts can be undone as the application structure evolves.

Partly to address such problems, Dr. Edgar (”Ted”) Codd introduced a simpler model: the relational data model. A year after his original 1969 IBM research report on the subject, Codd published a revised version for a wider audience (Codd, 1970) where he first argued that relations should be normalized so that each data entry would be atomic—we now call this first normal form. Other normal forms were defined later.

The most significant feature of the relational model is that all the facts are stored in tables, which are treated as mathematical relations. For example, Figure 1.10 shows the relational database for our sample application. Again, the database is annotated with arcs corresponding to the facts stored. Notice the extra deptNr columns in the Employee and Product tables. The facts that Department 10 employs Employee 357 and develops product ‘IS2’ are stored in the table rows themselves. Access paths between tables are not used to specify facts (as allowed in hierarchic or network models).

Figure 1.10. All the facts in a relational database are stored in the tables themselves.


To specify queries and constraints, table columns may be associated by name. This allows ad hoc queries to be specified at will and simplifies management of the application. Note that constraints specified between tables are not the same as access paths.

For example, in Figure 1.11, arrows “link” the deptNr column of the Employee and Product tables to the deptNr column of the Department table. However these “links” merely express the constraints that any value in the deptNr column of the Employee and Product tables must also occur as a value in the deptNr column of the Department table. These constraints do not express employment and product development facts.

Figure 1.11. “Links” between tables express constraints, not facts.


The relational model is logically cleaner than the network and hierarchic models, but it initially had poor performance, which led to its slow acceptance. However, by the late 1980s, efficient relational systems had become commonplace. Although network and hierarchic database systems are still in use today, relational DBMSs are the preferred choice for developing most new database applications. A DBMS should ideally provide an integrated data dictionary, dynamic optimization, data security, automatic recovery, and a user-friendly interface. The main query languages used with relational databases are SQL (informally known as “Structured Query Language”) and QBE (Query By Example). Many systems support both of these.

SQL has long been accepted as an international standard and is commonly used for communication of queries between different database systems. For this reason, SQL is the main query language discussed in this book.

Recently the extensible Markup Language (XML) has also become widely used for communication between different systems, but this is currently focused on sharing data for purposes such as electronic commerce and web publication. Many SQL-based DBMSs now support storing of XML data, as well as querying XML data directly using a query language such as XQuery.

Data architectures exist for object-oriented databases and deductive databases, but these have a long way to go in terms of standardization and maturity before they have a chance of widespread acceptance. Although relational systems give adequate performance for most applications, they are inefficient for some applications involving complex data structures (e.g., VLSI design).

To overcome such difficulties, many relational systems are being enhanced with object-oriented features, leading to object relational database systems. It appears that such extended relational systems will ensure the dominance of relational databases for the near future, with XML databases and object databases being their main competitors.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.138.138.144