2.1. Four Information Levels

Advanced information systems are sometimes described as “intelligent”. Just what intelligence is, and whether machines will ever be intelligent, are debatable questions. In the Turing Test of intelligence, based on a revised version of a test proposed by Alan Turing in 1950,[1] an opaque screen is placed between a typical human (see A in Figure 2.1) and the object being tested for intelligence. The human can communicate with the object only by means of a computer (with keyboard for input and screen for output). The human may communicate in natural language about any desired topic.

[1] Turing’s original test was based on the Imitation Game, requiring only that a computer be at least as good as a man in formulating responses intended to convey the idea that the man was actually a woman. For further discussion, see http://en.wikipedia.org/wiki/Turing_test.

Figure 2.1. The Turing test: can A distinguish between B and C?


If the human can’t tell from the object’s responses whether it is an intelligent human or a machine, then arguably the object should be classified as intelligent. Futurist Ray Kurzweil (2005) wagered $10,000 that a machine will pass the Turing Test by 2029 and feels it is likely that by 2020 a $1,000 computer will reach the computational ability of the human brain (estimated at 10^16 instructions per second).[2]

[2] Kurzweil (2005) also predicts that by 2035 a personal computer will match the combined intellect of the whole human race and that by 2045 it will match one billion times this combined intellect (a profound event he calls “the singularity”)! Do you think this is likely?

Two key conditions in the test are that natural language is used and that there are no restrictions on the discussion topics. Once we place restrictions on the language and confine the discussion to a predefined topic, we can find examples where a computer has performed at the level of a human expert (e.g., chess, diagnosis of blood diseases, mineral exploration). Such systems are called “expert systems” because they perform as well as a human expert in a specific domain. Expert systems have passed “restricted Turing Tests” specific to given universes of discourse.

Expert systems use sophisticated programs, often in conjunction with large but highly specific databases. A fifth generation information system (5GIS) is like a “user-definable” expert system in that it allows the user to enter a description of the universe of discourse and then conduct a conversation about this, all in natural language. Just how well the system handles its end of the conversation depends on how powerful its user interface, database management, and inference capabilities are.

Although desirable, it is not necessary that a 5GIS always be able to operate at an expert level when we communicate with it. It must, however, allow us to communicate with it in a natural way. Natural languages such as English and Japanese are complex and subtle. It will be many years before an information system will be able to converse freely in unrestricted natural language. We should be content in the meantime if the system supports dialog in a formalized subset of natural language. There may be many “formal, natural languages”, one for English, one for Japanese, and so on. A 5GIS should be able to respond in the same language used by the human. For example, suppose we posed the query:

What is the age of Selena?

and we received the reply

sanjusai.

This would not help unless we knew that this is Japanese for “30 years old”. Even if we can translate Japanese to English, we might still misinterpret the reply, because under the traditional Japanese convention a person is assigned an age of one year at birth rather than zero. So an age of 30 years in that system corresponds to an age of 29 years in the Western system. In addition to the requirement for a common language, effective communication between two people requires that each assigns the same meaning to the words being used. To achieve this, they should (a) share the same context or universe of discourse (UoD) and (b) speak in sentences that are unambiguous with respect to this UoD.

With our example, the confusion over whether Selena’s age is 29 or 30 years results from different age conventions, one Western and one Japanese. Natural speech abounds with examples that can be disambiguated only by context. Consider “Pluto is owned by Mickey”. This is true for the world of Walt Disney’s cartoon characters. But suppose someone unfamiliar with Mickey Mouse and his dog Pluto interpreted this within an astronomical context, taking “Pluto” to refer to the minor planet Pluto—a drastic communication failure! It is essential to have a clear way of describing the UoD to the information system.

An information system may be viewed from four levels: conceptual, logical, physical, and external. The conceptual level is the most fundamental, portraying the business domain naturally in human concepts. At this level, the blueprint of the UoD is called the conceptual schema. This describes the structure or grammar of the business domain (e.g., what types of object populate it, what roles these play, and what constraints apply).

While the conceptual schema indicates the structure of the UoD, the conceptual database at any given time indicates the content or instances populating a specific state of the UoD. Since information adds meaning to the data, the term “information base” is more appropriate (van Griethuysen 1982). However, we’ll use the briefer, more popular term “database”. Conceptually, the database is a set of sentences expressing propositions taken to be true of the UoD. Since sentences may be added or deleted, the database may undergo transitions from one state to another. However, at any particular time, the sentences populating the database must individually and collectively conform to the domain-specific grammar or design plan that is the conceptual schema. To summarize:

The conceptual schema specifies the structure for all permitted states and transitions of the conceptual database.

To enforce this law we now introduce a third system component: the conceptual information processor (CIP). This supervises updates to the database by the user and answers user queries. Figure 2.2 shows the basic conceptual architecture of an information system. This diagram assumes that the conceptual schema is already stored in the system. For each application area, a different conceptual schema is entered.

Figure 2.2. Information system: Conceptual level.


Although the diagram may seem to suggest that the user is interacting directly with the CIP, the user’s interaction with the system is external rather than conceptual. The conceptual schema is not concerned with convenient user interfaces or with the physical details of how the database can be efficiently maintained. These concerns are catered for by including external and internal components within the overall architecture.

When interpreted by people, the conceptual schema and database both provide knowledge about the UoD. The combination of conceptual schema and (conceptual) database is called a conceptual model of the UoD. The model or knowledge base is a formal description of the UoD, and the CIP controls the flow of information between the model and humans. Some authors use the term “model” in the more restricted sense of “schema”. This book uses the term “model” to include both the schema and a set of facts that populate the schema.
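The relationship between schema, database, and CIP can be sketched in a few lines of Python. This is only a toy illustration under our own assumptions, not a description of any real system: the names `FactType` and `ConceptualInformationProcessor`, the fact type `has_age`, and the single uniqueness constraint are all hypothetical. The point is simply that every update passes through the CIP, which accepts it only if the resulting state conforms to the schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactType:
    """Hypothetical fact type, e.g. Person has Age."""
    name: str
    subject_type: str
    object_type: str
    unique_per_subject: bool = True  # illustrative constraint: one fact per subject


class ConceptualInformationProcessor:
    """Toy CIP: supervises updates against the schema and answers queries."""

    def __init__(self, schema):
        self.schema = {ft.name: ft for ft in schema}
        # Current database state: a set of (fact type, subject, value) sentences.
        self.database = set()

    def add(self, fact_type, subject, value):
        ft = self.schema.get(fact_type)
        if ft is None:
            return False  # sentence is not in the grammar of this UoD
        already_recorded = any(
            f[0] == fact_type and f[1] == subject for f in self.database
        )
        if ft.unique_per_subject and already_recorded:
            return False  # the new state would violate the constraint
        self.database.add((fact_type, subject, value))
        return True

    def delete(self, fact_type, subject, value):
        self.database.discard((fact_type, subject, value))

    def query(self, fact_type, subject):
        return sorted(
            v for (ft, s, v) in self.database if ft == fact_type and s == subject
        )


cip = ConceptualInformationProcessor([FactType("has_age", "Person", "Age")])
accepted = cip.add("has_age", "Selena", 30)            # conforms to the schema
rejected_duplicate = cip.add("has_age", "Selena", 29)  # second age for Selena
rejected_unknown = cip.add("owns", "Mickey", "Pluto")  # fact type not in schema
```

Here the schema is supplied once when the CIP is created, mirroring the assumption that the conceptual schema is already stored in the system. A different application area would supply a different list of fact types.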

Recall that the domain expert is a person or group of people collectively familiar with the business domain. The modeler or conceptual designer is a person or team that specifies the conceptual schema by formalizing the domain expert’s knowledge. An end user makes use of the implemented system. For a small system, the domain expert, modeler, and user might be the same person. A large system might have several partial domain experts, a team of analysts and designers, and thousands of end users.

The modeler inputs the conceptual schema to the system and has read/write access to it. Typically, most users have read-only capability for the schema, but read/write access to the database, although some users may have read-only access to the database as well. Different interfaces might be created for different users so that some users have access to only part of the model. Thus, different users may access different subschemas of the global conceptual schema. This situation is summarized in Figure 2.3.

Figure 2.3. Access to the model.


An external schema specifies the UoD design and operations accessible to a particular user or group of users. Here we specify what kind of facts may be read, added, or deleted, and how they are displayed. For security reasons, different user groups are often allocated different access rights (e.g., to ensure that sensitive information is not made public). For user convenience, views may be constructed to hide information irrelevant to a user group, or to collect and display information more efficiently. Different interfaces may be designed to cater for users with different expertise levels even when accessing the same underlying information.
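As a rough sketch of this idea, an external schema can be pictured as a filter placed over the conceptual database. The user groups (`payroll`, `reception`), the fact types, and the sample facts below are invented for illustration. Real systems express access rights and views in many different ways.

```python
# Conceptual database: a set of (fact type, subject, value) sentences.
database = {
    ("has_age", "Selena", 30),
    ("has_salary", "Selena", 90000),
    ("has_age", "Ken", 41),
}

# Hypothetical external schemas: which fact types each group may read or change.
external_schemas = {
    "payroll": {"read": {"has_age", "has_salary"}, "write": {"has_salary"}},
    "reception": {"read": {"has_age"}, "write": set()},
}


def visible_facts(group):
    """Return only the facts that the group's external schema exposes."""
    readable = external_schemas[group]["read"]
    return sorted(f for f in database if f[0] in readable)


def may_update(group, fact_type):
    """Check whether the group may add or delete facts of this type."""
    return fact_type in external_schemas[group]["write"]
```

With these definitions, `visible_facts("reception")` hides the salary fact, while `may_update("payroll", "has_salary")` is permitted. Both groups see the same underlying information, just through different windows.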

Conceptual schemas are designed for clear communication, especially between modelers and domain experts. While they give a clear picture of the UoD, they are usually converted to a lower level structure for efficient implementation. For a given application, an appropriate logical data model (e.g., relational, object-oriented) is chosen, and the conceptual schema is mapped to a logical schema expressed in terms of the abstract structures for data and operations supported in that data model. For example, in a relational schema facts are stored in tables, and constraints are expressed using primary and foreign key declarations and so on.
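To make the relational case concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`Person`, `Pet`, `personName`, `ownerName`) and the sample rows are our own assumptions, chosen to echo the earlier examples. They are not a mapping prescribed by any particular procedure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Facts are stored in tables; constraints become key declarations and checks.
conn.executescript("""
CREATE TABLE Person (
    personName TEXT PRIMARY KEY,
    age        INTEGER NOT NULL CHECK (age >= 0)
);
CREATE TABLE Pet (
    petName   TEXT PRIMARY KEY,
    ownerName TEXT NOT NULL REFERENCES Person(personName)
);
""")

conn.execute("INSERT INTO Person VALUES ('Mickey', 30)")  # the age is made up
conn.execute("INSERT INTO Pet VALUES ('Pluto', 'Mickey')")

# A pet whose owner is not a recorded person violates the foreign key.
try:
    conn.execute("INSERT INTO Pet VALUES ('Rex', 'Nobody')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

The primary key on `Person` says that each person has at most one recorded age, and the foreign key on `Pet` says that every owner must be a known person. At the conceptual level these would simply be stated as constraints on the fact types.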

The logical schema may now be realized as a physical schema in a specific DBMS. For example, a relational schema might be implemented in Microsoft SQL Server or IBM’s DB2. The physical schema includes details about the physical storage and access structures used in that system (indexes, file clustering, etc.). Different physical choices can be made for the same DBMS, and different DBMSs often differ in what choices are possible. Hence different physical schemas might be chosen for the same logical schema. Operations at the external level are converted by the system into operations at the physical level. Logical and physical schemas reside at the internal level.
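The difference between logical and physical choices can be illustrated with a small experiment. The sketch below uses SQLite (through Python's `sqlite3` module) purely as a convenient stand-in for a DBMS. The table, rows, and index name are hypothetical. Adding an index is a physical decision: it may change how the query is executed, but not what the query returns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Person (personName TEXT PRIMARY KEY, age INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO Person VALUES (?, ?)",
    [("Selena", 30), ("Ken", 41), ("Ann", 30)],
)

query = "SELECT personName FROM Person WHERE age = 30 ORDER BY personName"

# Result with no secondary index on age.
before = conn.execute(query).fetchall()

# Physical choice: add an access structure. The logical schema is unchanged.
conn.execute("CREATE INDEX Person_age_idx ON Person(age)")
after = conn.execute(query).fetchall()

same_answer = before == after
```

Because the logical schema is untouched, applications written against it need not change when such physical tuning is carried out.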

One advantage of the conceptual level is that it is the most stable of all the levels. It is unaffected by changes in user interfaces or storage and access techniques. Suppose a conceptual schema is implemented in a relational DBMS, and later we wish to use an object database instead. Unless the UoD has changed, the conceptual schema can remain the same. We need only apply a different mapping procedure and then migrate the data.

If a language is the object of study, it is said to be the object language. The language used to study it is then called the metalanguage. For example, you might use English as a metalanguage to study Japanese as an object language. An object language may be its own metalanguage; for example, English may be used to learn about English. Any conceptual schema may be expressed as a set of sentences, and hence may be viewed as a database in its own right.

This enables us to construct a metaschema for talking about conceptual schemas. This metaconceptual schema specifies the design rules that must be obeyed by any conceptual schema (e.g., each role is played by exactly one object type). CASE tools used to assist in designing conceptual schemas make use of such a metaschema to ensure that schemas entered by the modeler are well formed or “grammatical”.
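The following sketch shows the flavor of such a check. A tiny schema is written down as data, and one metaschema rule (each role is played by exactly one object type) is tested against it. The sentence format, the role names such as `has_age.subject`, and the function name are assumptions made for this example. Real CASE tools use far richer metaschemas.

```python
# A conceptual schema expressed as sentences, so it can be treated as a database.
schema_facts = [
    ("ObjectType", "Person"),
    ("ObjectType", "Age"),
    ("RolePlayedBy", "has_age.subject", "Person"),
    ("RolePlayedBy", "has_age.value", "Age"),
]


def roles_with_wrong_player_count(facts, declared_roles):
    """Return roles not played by exactly one object type (metaschema rule)."""
    players = {role: set() for role in declared_roles}
    for fact in facts:
        if fact[0] == "RolePlayedBy":
            players.setdefault(fact[1], set()).add(fact[2])
    return sorted(role for role, types in players.items() if len(types) != 1)


declared = ["has_age.subject", "has_age.value"]

# The schema above is well formed: no role breaks the rule.
well_formed = roles_with_wrong_player_count(schema_facts, declared)

# Adding a second player for one role makes the schema "ungrammatical".
broken = roles_with_wrong_player_count(
    schema_facts + [("RolePlayedBy", "has_age.value", "Person")], declared
)
```

Notice that the checker treats the schema exactly as a CIP treats an ordinary database: as a set of sentences that must conform to a higher-level grammar.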

While on the subject of grammar, let us agree to accept both “schemas” and “schemata” as plural of “schema”. Although “data” is the plural of “datum”, we’ll adopt the common practice of allowing “data” to be used in both singular and plural senses. The first exercise in the book is shown next. Answers to exercises are included in the online supplement.

Exercise 2.1

  1. Classify each of the following as A (external), B (conceptual), or C (internal).

    1. This level deals with physical details of how to efficiently store and access data.

    2. This level is concerned with providing a convenient and authorized view of the database for an individual user.

    3. This level is concerned with representing information in a fundamental way.
