16.7. Postrelational Databases

A conceptual schema may be mapped to various logical data models. The Rmap procedure discussed in Chapter 11 assumes that the application platform is a centralized, relational DBMS. It is a fairly simple task to specify other procedures for mapping to pre-relational systems such as hierarchic or network DBMSs. However, it has been argued that relational database systems are “out of date” and should be replaced by something better (e.g., object-oriented or deductive database systems). Moreover, there is a growing trend for databases to include special support for nontraditional kinds of data (e.g., spatial data, images, and sounds) and for databases to be decentralized in one way or another. This section outlines some issues behind these movements, and some other recent trends in database research.

Although relational DBMSs have long been dominant in the commercial marketplace, traditional systems based on the hierarchic or network data model are still in use. While purely relational DBMSs suit most business applications, they may be unsuitable for complex applications such as CASE tools, computer-aided design tools (e.g., VLSI design, mechanical engineering, architecture), document processing, spatial databases, expert systems, scientific databases (e.g., genetic engineering), and communications management. Note that most of these applications involve complex objects. Moreover, these account for at most 10% of database applications. For the other 90%, a relational DBMS is quite satisfactory.

Many reasons are cited for dissatisfaction with purely relational DBMSs for complex applications. They may be too slow—they don’t perform well with complex objects mainly because they require too many table joins. They often model objects in an unnatural way (e.g., information about a single object such as a person may be spread over several tables—this gets worse with complex objects). They are dependent on value-based identifiers. They don’t facilitate reuse (e.g., they have no direct support for subtyping). They require access to a procedural language for difficult tasks (e.g., special rules and behavior), which leads to an “impedance mismatch” with the declarative, set-based, relational query language. They might not support binary large objects (BLOBs) such as images (icons, pictures, maps etc.), sound tracks, video, and so on.

Over the last few decades, various research efforts aimed to develop a next generation of DBMS to overcome such deficiencies. Some new kinds of database that emerged are object databases, object-relational databases, deductive databases, spatial databases, temporal databases, and XML databases. We’ll briefly examine each of these in turn.

Object Orientation

An object database (ODB), or object-oriented database (OODB), incorporates various object-oriented features. Historically, ODB research drew upon related developments in four areas: programming languages, semantic data models, logical data models, and artificial intelligence.

With programming languages, the need was seen for user-definable abstract data types and for persistent data. The object-oriented programming paradigm began with Simula (1967) and Smalltalk (1972). Nowadays many object-oriented programming languages exist (e.g., Eiffel), and many traditional programming languages have been given object-oriented extensions (e.g., C++ and C#).

Some object-oriented features were taken from semantic data modeling, which as we know models reality in terms of objects and their relationships and includes notions such as subtyping (e.g., ORM, UML, and extended ER). Various ideas were also borrowed from work on logical data models (network, hierarchic, relational, and especially the nested relational model). Finally, some concepts were adapted from artificial intelligence, where structures such as frames are used for knowledge representation.

What features must a DBMS have to count as “object-oriented”? There is no commonly agreed-upon answer. A classic paper on this issue (“The OODBMS Manifesto”) was presented at the first international conference on object-oriented and deductive databases (Atkinson et al. 1989). To distinguish ODBMSs from OO programming languages, five essential features of a DBMS were identified; to these, object-oriented features were added, of which eight were considered essential and five optional (see Table 16.2). Let’s look briefly at the essential object-oriented features in this proposal.

Table 16.2. Feature list in the “OODBMS manifesto” (Atkinson et al. 1989).
DBMS features            Essential OO features          Optional OO features
Persistence              Complex objects                Multiple inheritance
Secondary storage        Object identity                Type checking and inferencing
Concurrency              Encapsulation                  Distribution
Recovery                 Types or classes               Design transactions
Ad hoc query facility    Inheritance                    Versions
                         Overriding and late binding
                         Computationally complete
                         Extensibility

Complex objects are built from simpler ones by constructors. The manifesto proposed that these constructors should include at least set, list, and tuple, and be orthogonal (i.e., they can be applied in any order, recursively). For example, one object might be a list of sets of sets of tuples. Constructors are not orthogonal in the relational model (only sets of tuples of atomic values are allowed) or the nested relational model (e.g., the top level construct must be a set). The notion of complex objects is considered important since it allows us to model a complex structure in a direct, natural way.
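To make the orthogonality idea concrete, here is a small Python sketch (illustrative only; the data values are invented) showing set, list, and tuple constructors nesting freely:

```python
# A "list of sets of tuples": collection constructors applied recursively,
# in any order, much as the manifesto requires for complex objects.
lecture_plans = [
    {("CS114", 10), ("CS114", 11)},   # a set of tuples
    {("CS115", 1)},
]

# Sets of sets need an immutable inner set (frozenset) in Python; this is
# a language-specific wrinkle, not a modeling restriction.
grouped = {frozenset({("a", 1), ("b", 2)})}   # a set of sets of tuples

print(len(lecture_plans), len(grouped))
```

By contrast, a pure relational structure would flatten everything into sets of tuples of atomic values.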

The basic idea behind object identity is that objects should be identified by system-generated object identifiers (oids) rather than by the values of their properties. This is in sharp contrast to the relational model, where for instance tuples are identified by the value of their primary key.

Among other things, oids can help keep track of objects whose external, value-based identification may change with time. This may occur because of simple renaming. For example, television channel 0 becomes channel 10, or a woman changes her family name on marriage. More drastically, the reference scheme itself may change (e.g., a student identified by a student number becomes an employee identified by an employee number).

Object identifiers in the OO sense partly overcome this problem since they are rigid identifiers (i.e., they refer to the same object throughout time). They are system generated, nonreusable, immutable, and typically hidden. In ODB systems, they might be implemented as surrogates (logical identifiers or autoincremented counters, mapped by indexes to physical addresses), as typed surrogates (which makes migration between types awkward), or as structured addresses.
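As a rough illustration of the surrogate approach, the following Python sketch (the counter mechanism and all class and attribute names are invented for illustration) generates identifiers that are immutable and nonreusable, and that survive changes to an object’s value-based properties:

```python
import itertools

# System-generated surrogate oids: assigned once, never changed or reused,
# and independent of the object's attribute values.
_next_oid = itertools.count(1)

class DbObject:
    def __init__(self, **attrs):
        self._oid = next(_next_oid)   # rigid identifier
        self.attrs = dict(attrs)

    @property
    def oid(self):
        return self._oid

s = DbObject(studentNr=861, studentName="Smith J")
old_oid = s.oid
s.attrs["studentName"] = "Jones J"    # value-based identification changes...
print(s.oid == old_oid)               # ...but the oid stays the same
```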

Encapsulation involves bundling the operations and data of an object together, with normal access to the object being through its operational interface, with implementation details hidden. For example, hiring, firing, and promoting employees are regarded as operations on Employee and are encapsulated with it. Implementations of operations are often called “methods”.

Encapsulation includes the idea that, as in a conceptual schema, objects should be classified in terms of types or classes. Moreover, some form of inheritance mechanism should be provided (e.g., so that a subtype may inherit the data and operations of its supertype(s)). A subtype may have a specialized version of a function with the same name as one of its supertype functions. In this case, the specialized version will typically override the more general version when the operation on the subtype is invoked. For example, the display procedure for a colored, equilateral triangle may differ from the general display procedure for a polygon. Various overriding options are possible.
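The triangle/polygon example can be sketched in a few lines of Python (class and method names are illustrative, not from any particular ODBMS); the specialized display is selected by late binding at run time:

```python
class Polygon:
    def display(self):
        return "drawing a general polygon"

class EquilateralTriangle(Polygon):
    def display(self):                      # overrides Polygon.display
        return "drawing an equilateral triangle"

shapes = [Polygon(), EquilateralTriangle()]
# Late binding: the same call resolves to a different method per object.
print([s.display() for s in shapes])
```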

The requirement for computational completeness means that any computable function can be expressed in the data manipulation language (if necessary, by calling programming languages). The extensibility requirement means that users may define their own types, and the system should support them just like its built-in types.

Partly in response to the OODB manifesto, a group of academic and industrial researchers proposed an alternative “3rd generation DBMS manifesto” (Stonebraker et al. 1990). Here they referred to hierarchic and network systems as first generation, and relational systems as second generation. Under this scheme, third generation DBMSs are the next generation. They specified three basic tenets and 13 detailed propositions to be adhered to by the next generation DBMSs (see Table 16.3). This proposal essentially argued for extending existing relational systems with object-oriented features and effectively laid the groundwork for today’s object-relational database systems, which we’ll discuss shortly.

Table 16.3. Next generation DBMS features proposed by Stonebraker et al. (1990).
Basic tenet 1: Besides traditional data management services, next generation DBMSs will provide support for richer object structures and rules.
   1.1 Next generation DBMSs must have a rich type system.
   1.2 Inheritance is a good idea.
   1.3 Functions, including database procedures, methods, and encapsulation, are a good idea.
   1.4 Unique identifiers (uids) for records should be assigned by the DBMS only if a user-defined primary key is not available.
   1.5 Rules (triggers, constraints) will become a major feature in future systems. They should not be associated with a specific function or collection.

Basic tenet 2: Next generation DBMSs must subsume previous generation DBMSs.
   2.1 Essentially all programmatic access to a database should be through a non-procedural, high level access language.
   2.2 There should be at least two ways to specify collections, one using enumeration of members and one using the query language to specify membership.
   2.3 Updatable views are essential.
   2.4 Performance indicators have almost nothing to do with data models and must not appear in them.

Basic tenet 3: Next generation DBMSs must be open to other subsystems.
   3.1 Next generation DBMSs must be accessible from multiple high level languages.
   3.2 Persistent X for a variety of Xs is a good idea. They will all be supported on top of a single DBMS by compiler extensions and a (more or less) complex run-time system.
   3.3 For better or worse, SQL is intergalactic dataspeak.
   3.4 Queries and their results should be the lowest level of communication between a client and a server.

Although supporting several of the object-oriented features, the group argued against implementing them in such a way as to negate key advances made by the relational approach. For example, all facts in a relational system are stored in tables. In contrast, some facts in hierarchic, network, and object database systems may be specified as links between structures, requiring navigation paths to be specified when the data is accessed.

Hence OO queries typically require specification of access paths. Although path expressions are often more compact than relational queries, their reliance on existing navigation links can make it difficult to perform ad hoc queries efficiently.

As an example, consider the UoD schematized in Figure 16.22. Students either hold a degree or are currently seeking one (or both). The exclusion constraint forbids students from re-enrolling in a degree that they already hold. Subjects are recorded only for current students, as shown by the subset constraint. Since this is not an equality constraint, students may enroll in a degree before choosing subjects. This aspect of the schema may alternatively be modeled by introducing the subtype CurrentStudent, to which the optional takes role is attached, using the definition Each CurrentStudent is a Student who seeks some Degree.

Figure 16.22. An ORM schema about university records.


Each subject is identified by its subject code, but also has a unique title. For some subjects, a lecture plan may be available. This lists the topics discussed in the various lectures. For example, lecture 10 for CS114 might discuss the following topics: relational projection, relational selection, and natural joins. The footnoted textual constraint qualifies (and implies) the external uniqueness constraint, requiring lectures to be numbered from 1 within each subject.

The conceptual schema of Figure 16.22 (minus some constraints) might be specified in an ODB schema roughly as follows, using a syntax based on the Object Definition Language (ODL), as specified in Cattell and Barry (2000).

class Student {
   attribute unsigned long studentNr;
   attribute string studentName;
   attribute set<string> degreesHeld;
   attribute string currentDegree; };

class CurrentStudent extends Student {
   relationship set<Subject> takes
      inverse Subject::is_taken_by; };

class Subject {
   attribute string subjectCode;
   attribute string title;
   attribute unsigned short credit;
   relationship set<CurrentStudent> is_taken_by
      inverse CurrentStudent::takes;
   relationship list<Lecture> has
      inverse Lecture::is_for; };

class Lecture {
   attribute unsigned short lectureNr;
   relationship Subject is_for
      inverse Subject::has;
   attribute set<string> topics; };

Although readable, this OO schema is incomplete and needs additional code for the extra constraints in the conceptual schema. We could display this on a UML class diagram, with the extra constraints in notes (e.g., exclusive-or, pair exclusion, and subset constraints). In general, OO schemas are best developed as abstractions of ORM schemas.

Notice that the OO classes correspond to the major object types abstracted from the conceptual schema. Hence they provide a useful basis for building screen forms for the application. Encapsulation involves adding generic operations as well as type-specific operations (e.g., a graduate operation might be added to Student).

One problem of the ODB approach is that it mixes too many levels together—an OO schema includes conceptual, logical, and physical elements. It also involves redundant specification of associations. For example, consider the fact type: Student takes Subject. In the OO schema this is specified twice: once on CurrentStudent and again on Subject.

In specifying the OO schema, inverses were used to declare bidirectional object links between CurrentStudent and Subject, as well as between Subject and Lecture. This enables queries in either direction to be specified using path expressions. This versatility is lost if the link is made unidirectional. Notice that no such navigational problem can exist in the relational model, although of course many queries will require joins. While joins might slow things down, there is no restriction on their use, since they do not require links to be set up beforehand. In contrast, the ODB approach obtains its efficiency by “hardwiring” in the links to be used in queries. This makes it difficult to optimize ad hoc queries.

The problem of relying on predeclared access paths in the schema itself to achieve efficiency is exacerbated when the schema evolves. We may then have to reset navigation pathways to optimize them for the new situation rather than simply relying on the system optimizer to do the job, as in current relational systems.

This is not to say that ODBs are a bad idea, or that complex objects should not be modeled as such at the implementation level. Many complex structures are awkward to model in relational terms. However, we should be able to specify such structures in a clean way, without resorting to low level mechanisms.

Unlike ODBs, relational databases are based on a single data model formally based on predicate logic. Moreover, relational DBMSs are now dominant, and many are being extended to address several of the deficiencies mentioned earlier. Such extended relational database systems are usually called object-relational database (ORDB) systems. These are essentially relational DBMSs extended by adding support for many OO features, such as extra data types (spatial, image, video, text...), constructors (arrays, sets, ...), and inheritance. Some of the major commercial systems support extended data types by allowing modules to be plugged in. Such modules may be developed by the vendor or a third party. These modules are variously called “relational extenders”, “datablades”, or “data cartridges”.

The SQL:2003 standard has also been extended significantly to include several object-oriented features (e.g., user-defined types and functions, encapsulation, support for oids, subtyping, triggered actions, and computational completeness), as well as deductive features (e.g., recursive union).

Given the massive installed base of relational DBMSs, the ongoing extensions to these products, the cost of porting applications to a new data model, and the widespread adoption of the SQL standard, it may well be that the next generation of DBMSs will evolve out of current relational products. Just as Hinduism absorbed features from other religions that threatened its existence, the relational model can probably absorb the interesting features of the object-oriented faith without being replaced by it.

It’s debatable whether all the transformations taking place in SQL-based DBMSs are actually desirable. While the use of extended data types (spatial, video, etc.) is clearly a step forward, the use of constructors (arrays, multisets, etc.) is questionable in many cases, since they can make it harder to design efficient databases for ad hoc queries and updates, and they complicate the task of writing good optimizers. For a detailed critique of the way in which many OO features have been grafted onto the relational model, see Date (2000) and Date and Darwen (1998). At any rate, relational and object-relational systems currently dominate commercially, and other contenders such as object database systems are struggling to gain any significant market share.

Although object-relational systems may well dominate the market for the near future, they may be “overkill” for some specialized or small applications. And there are things that even SQL:2003 can’t do (e.g., it doesn’t allow triggers on views) or does awkwardly (e.g., recursion).

SQL:2003 includes arrays, row types, and multisets. One challenge then is to model such collections and provide mapping algorithms to implement them efficiently. Various constructors (e.g., for sets, bags, sequences, and schemas) have been added to some versions of ORM and ER. In addition, both ORM and UML allow some kinds of collections to be expressed as mapping annotations on the conceptual schema. Although the use of collection types in conceptual models can facilitate mapping to equivalent implementation structures, it is typically much harder to model complex objects directly at the conceptual level, and they are extremely awkward to validate by verbalization and population. Complex objects also tend to be overused, and simpler solutions overlooked. For further discussion on collection types, see Section 10.4.

Nowadays many applications are built using a 3GL such as C# or Java to specify an object model for transient, in-memory storage, and SQL to specify a relational or object-relational database for persistent storage. As data then needs to be moved between the object model and the relational model, considerable effort is spent on the object-relational mapping. Using a tool such as NORMA to generate both object and relational models from ORM models simplifies the task of creating an object-relational mapping. As another way to facilitate moving data between transient and persistent models, Microsoft is incorporating LINQ (Language Integrated Query) into .NET languages such as C# and Visual Basic, allowing the programmer to use an SQL-like syntax to query databases from inside the 3GL.
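A minimal hand-written mapping layer might look as follows in Python with SQLite (the class, table, and column names are illustrative assumptions); mapping tools generate and maintain code of this kind automatically:

```python
import sqlite3
from dataclasses import dataclass

# The same Student notion lives twice: as a transient in-memory object
# and as a persistent table row, with explicit code to move data between.
@dataclass
class Student:
    student_nr: int
    student_name: str

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Student (studentNr INTEGER PRIMARY KEY, studentName TEXT)")

def save(s: Student) -> None:
    con.execute("INSERT INTO Student VALUES (?, ?)", (s.student_nr, s.student_name))

def load(student_nr: int) -> Student:
    row = con.execute(
        "SELECT studentNr, studentName FROM Student WHERE studentNr = ?",
        (student_nr,)).fetchone()
    return Student(*row)

save(Student(861, "Smith J"))
print(load(861))
```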

Other Recent Trends

Deductive databases offer elegant and powerful ways of managing complex data in a declarative way, especially for information that is derived by use of recursion. Deductive systems typically provide a declarative query language such as a logic programming language (e.g., Prolog). This gives them a strong rule enforcement mechanism with built-in backtracking and excellent support for recursive rules. For example, the ancestor relation can be derived from a base parent relation and the following two rules: X is an ancestor of Y if X is a parent of Y (basis clause); X is an ancestor of Y if X is a parent of Z and Z is an ancestor of Y (recursive clause).

In contrast, SQL-92 cannot express recursive queries at all. SQL:1999 introduced a recursive union operator, but its syntax is more complex and its execution does not enjoy built-in backtracking. Despite their elegance, however, deductive database systems have major problems to be solved (especially in the performance area) and in the short term are unlikely to achieve more than a niche market.
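The ancestor rules above translate directly into a recursive union. The following sketch runs an SQL:1999-style query on SQLite from Python; the parent facts are invented sample data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Parent (parent TEXT, child TEXT);
    INSERT INTO Parent VALUES ('Ann','Bob'), ('Bob','Cal'), ('Cal','Dee');
""")
ancestors_of_dee = [r[0] for r in con.execute("""
    WITH RECURSIVE Ancestor(anc, des) AS (
        SELECT parent, child FROM Parent                 -- basis clause
        UNION
        SELECT p.parent, a.des                           -- recursive clause
        FROM Parent p JOIN Ancestor a ON p.child = a.anc)
    SELECT anc FROM Ancestor WHERE des = 'Dee' ORDER BY anc""")]
print(ancestors_of_dee)
```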

Although purely deductive databases are not popular, there is a growing need to enforce many Event-Condition-Action (ECA) rules. For example, on the event that client X requests an upgrade to class ‘B’ on flight Y, if the condition that X.rating = ‘premier’ and count(vacant B seats on flight Y) > 0 is satisfied, then perform the following action: upgrade client X to class ‘B’ on flight Y. Most relational DBMSs effectively support ECA rules by using triggers or procedures, and triggers are included in the SQL:1999 standard.
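As a sketch of how a trigger captures this ECA rule, the following SQLite example (all table and column names are invented) fires on the insert of an upgrade request, tests the condition, and performs the upgrade action:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Client (name TEXT PRIMARY KEY, rating TEXT);
    CREATE TABLE Flight (flightNr INTEGER PRIMARY KEY, vacantBSeats INTEGER);
    CREATE TABLE UpgradeRequest (client TEXT, flightNr INTEGER);
    CREATE TABLE Upgrade (client TEXT, flightNr INTEGER);

    CREATE TRIGGER upgrade_rule
    AFTER INSERT ON UpgradeRequest                          -- event
    WHEN (SELECT rating FROM Client WHERE name = NEW.client) = 'premier'
     AND (SELECT vacantBSeats FROM Flight
          WHERE flightNr = NEW.flightNr) > 0                -- condition
    BEGIN                                                   -- action
        INSERT INTO Upgrade VALUES (NEW.client, NEW.flightNr);
        UPDATE Flight SET vacantBSeats = vacantBSeats - 1
        WHERE flightNr = NEW.flightNr;
    END;

    INSERT INTO Client VALUES ('Jones', 'premier'), ('Smith', 'standard');
    INSERT INTO Flight VALUES (101, 1);
    INSERT INTO UpgradeRequest VALUES ('Jones', 101);  -- fires: condition holds
    INSERT INTO UpgradeRequest VALUES ('Smith', 101);  -- does not fire
""")
print(list(con.execute("SELECT * FROM Upgrade")))
```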

Two specialized database varieties that have recently become popular are spatial databases and temporal databases. Spatial databases require efficient management of spatial data, such as maps (roads, land, etc.), two-dimensional designs (circuits, town planning, etc.), and three-dimensional designs (visualization of medical operations, molecular structures, flight paths, etc.). They provide built-in support for spatial data types (points, lines, polygons, etc.), spatial operators (overlap, contains, intersect, etc.) and spatial indexes (R-trees, quad trees, etc.). This allows efficient formulation and processing of spatial queries, such as: How many houses are there within 5 km of the proposed shopping center? Which flights fly over the greater LA area? Which diagrams are similar to diagram 37?
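A spatial DBMS would answer such queries with built-in distance operators backed by a spatial index such as an R-tree. As a naive illustration of what the first query means, here is a linear-scan sketch in Python over planar coordinates measured in km (the data points are invented):

```python
from math import hypot

houses = {"h1": (1.0, 2.0), "h2": (4.0, 4.0), "h3": (9.0, 1.0)}
proposed_centre = (2.0, 2.0)

def within(point, centre, radius_km):
    # Planar Euclidean distance; a real GIS would use geodetic distance
    # and avoid scanning every row by consulting a spatial index.
    return hypot(point[0] - centre[0], point[1] - centre[1]) <= radius_km

nearby = sorted(h for h, p in houses.items() if within(p, proposed_centre, 5.0))
print(nearby)
```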

Previously, special systems were used for spatial data while a relational DBMS was used for alphanumeric data. The current trend is to manage both standard and spatial data in the one system. A typical application of a geographic information system (GIS) might proceed as follows: standard data on traffic accidents is entered; the road maps are displayed to highlight the accident sites; and this display is then used to determine regions where extra precautions need to be taken (e.g., radar traps).

Historically, GIS vendors adopted three main approaches: hybrid (georelational—objects in a spatial file store are given identifiers that can be referenced in relational tables); integrated (spatial and non-spatial data are stored in relational tables); and object oriented (all data is stored in an ODB). More recently, many spatial database applications as well as other applications using non-standard data have been implemented using object-relational technology.

Although time has only one dimension, unlike space’s three dimensions, the efficient management of temporal information is no easy task. If historical rather than snapshot records need to be maintained about objects, time will feature largely in the modeling.

As discussed in Section 10.3, a variety of approaches may be adopted for modeling time. In some cases, we simply include object types such as Time and Period on the conceptual schema and map these like other object types. Distinctions may be needed between transaction time (when the system records a fact) and valid time (when the fact is true in the UoD being modeled). Often we need to make use of temporal relations (such as before and after) and temporal operations (e.g., to compute an interval between two time points).
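As a small application-level sketch of such temporal relations and operations (the names and dates are invented; the before/overlaps tests loosely follow Allen’s interval relations), valid-time periods might be handled like this in Python:

```python
from datetime import date

class Period:
    def __init__(self, start: date, end: date):
        assert start <= end
        self.start, self.end = start, end

    def before(self, other: "Period") -> bool:
        # This period ends strictly before the other begins.
        return self.end < other.start

    def overlaps(self, other: "Period") -> bool:
        # The two periods share at least one time point.
        return self.start <= other.end and other.start <= self.end

p1 = Period(date(2007, 1, 1), date(2007, 6, 30))
p2 = Period(date(2007, 6, 1), date(2008, 1, 31))
# Interval between two time points, in days:
print(p1.before(p2), p1.overlaps(p2), (p2.start - p1.start).days)
```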

The SQL standard currently includes only basic support for temporal data, and sometimes an ordinary relational database does not allow temporal aspects to be implemented efficiently. For such applications, special DBMSs known as “temporal database systems” are sometimes used; these provide in-built support for automatic time stamping and the various temporal operators.

Most work on temporal databases focuses on maintaining relevant histories of application objects through time, with the assumption that the conceptual schema itself is fixed. Moving up one level, the problem becomes more complicated if we allow the conceptual schema itself to change with time. This is one aspect of the schema evolution problem.

Moving up another level, we might allow the conceptual metaschema itself to change with time (e.g., we might decide at a later stage to allow constructors for complex object types in our conceptual schema language). The management of such higher order evolution has been addressed in research on evolving information systems. This topic provides one motivation for the next section on metamodeling.

Apart from the kind of database used, the size and spread of databases have seen a continued upward trend. Many databases are becoming very large, with users at many different sites. For this situation we need to decide whether the overall system will be centralized, distributed, or federated.

In a centralized system, the database management is controlled at a single site. Any site may send update and query requests to the central site, and results are sent back. If the sites are far apart, the communication times involved in such transactions can be very significant.

To reduce the communication overhead, a distributed database system allows the data to be spread across various sites, with most of the data relevant to a given site stored locally at that site. In the simplest case, the population of a schema might be partitioned (e.g., each branch of a bank stores data about its clients only). Typically, however, there is a need to replicate some data at more than one site, thus requiring measures to be enforced to control the redundancy. As you might guess, optimizing the performance of a distributed system requires attention to a whole new batch of problems. The research literature on distributed databases is vast, and many commercial systems provide distributed capabilities, to varying extents.

Federated databases deal with situations where there is a need for data sharing between several existing database systems, possibly heterogeneous (e.g., some relational, some hierarchic, and so on). In this framework, each individual system maintains its local autonomy and communicates with other sites on a needs basis. As the heterogeneity problem requires translation between different data models, the control of federated systems is non-trivial. The size of the problem can be reduced by supporting only partial integration; any two sites need only share the common data relevant to both of them rather than share all their data.

Different solutions have arisen to address the problems of communicating between different database systems, possibly of different types. For a long time, SQL has been used as a common language between relational DBMSs. In today’s world of eCommerce, the variety of systems that need to exchange data has grown significantly. Currently, the most popular solution to this problem is to use XML for communicating structured information between different systems.

As discussed in Section 13.9, XML is a low level, hierarchically structured, textual language that allows the specification of both schema and data. The SQL:2003 standard now includes built-in support for XML. Most major relational DBMSs also provide support for XML, including automatic conversion between relational and XML structures.

Like SQL, XML is good for communication between computer systems, but is not high level enough for humans to easily visualize and model their business domain. Hence an XML schema is best developed by first modeling in a high level language such as ORM, ER, or UML and then mapping the model to XML. For a discussion of mapping ORM to XML schema, see Bird et al. (2000).
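As a toy illustration of relational-to-XML conversion (the row-to-element mapping and all names are our own; real DBMSs provide richer built-in mappings, such as the SQL/XML publishing functions):

```python
import sqlite3
import xml.etree.ElementTree as ET

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Subject (subjectCode TEXT PRIMARY KEY, title TEXT);
    INSERT INTO Subject VALUES ('CS114', 'Databases'), ('CS115', 'Logic');
""")

# Each row becomes an element; each column becomes an attribute.
root = ET.Element("Subjects")
for code, title in con.execute(
        "SELECT subjectCode, title FROM Subject ORDER BY subjectCode"):
    ET.SubElement(root, "Subject", subjectCode=code, title=title)

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```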

One of the most exciting, and perhaps almost frightening, trends in modern computing has been the recent progress in artificial intelligence. If you’d like an insight into where computer science may well go in the next 50 years, have a read of Denning and Metcalfe (1997). For a radical view of where artificial intelligence may take us in the next 100 years, see Kurzweil (2005)—we disagree with some of his projections, which, among other things, assume a materialist philosophy, but they are worth thinking about.

The chapter notes provide some further discussion and references for the topics covered in this section.
