Computerized applications developed for personal use, for organizations and for enterprises, access, manipulate and manage the data. The data is stored in a database for easy access and retrieval. It is essential to have an understanding of the concepts of the database system to be able to manipulate the database and manage it properly. The purpose of this chapter is to introduce you to the fundamentals of the Databases.
Databases and database systems are essential parts of our life. We have been interacting with non-computerized databases since long time, remember, looking for a word in a dictionary or finding the telephone number of your friend from a telephone directory. Computerized databases use computer to store, manipulate, and manage the database. In our daily lives we interact with the computerized databases when we go for the reservation of railway tickets and movie tickets, for the searching of a book in a library, to get the salary details, to get the balance of our account while using an ATM, to get the rate list while purchasing items from a cash and carry store and, this list can run into several pages. Lately, databases (Figure 12.1) are also used to store highly data intensive items like photographs (http://flickr.com), video clippings (http://YouTube.com), and images (http://images.google.co.in).
This chapter discusses the database approach and its advantage over the traditional file-based approach. The chapter describes the components of the database system and the various data models. A brief description of the architecture of the database system is also given. In the end, the chapter highlights the different types of database architecture and the database applications.
Figure 12.1 Databases
Database is a repository or collection of logically related, and similar data. Database stores similar kind of data that is organized in a manner that the information can be derived from it, modified, data added, or deleted to it, and used when needed. Some examples of databases in real life situations are: dictionary—a database of words organized alphabetically along with their meaning, telephone directory—a database of telephone numbers and addresses organized by the last name of people, railway timetable— a database of trains organized by train names, and, companies listed on Stock Exchange organized by names alphabetically.
A database is defined as—(1) a collection, or repository of data, (2) having an organized structure, and (3) for a specific purpose. A database stores information, which is useful to an organization. It contains data based on the kind of application for which it is required. For example, an airline database may contain data about the airplane, the routes, airline reservation, airline schedules etc.; a college database may contain data about the students, faculty, administrative staff, courses, results etc.; a database for manufacturing application may contain data about the production, inventory, supply chain, orders, sales etc.; and a student database may contain data about students, like student names, student course etc.
In the early days, data was stored in files. For an application, multiple files are required to be created. Each file stores and maintains its own related data. For example, a student information system would include files like student profile, student course, student result, student fees etc. The application is built on top of the file system. However, there are many drawbacks of using the file system, as discussed below—
Figure 12.2 Data inconsistency and data redundancy
Database approach provides solutions for handling the problems of the file system approach. The emergence of database approach has resulted in a paradigm shift, from each application defining and maintaining its own data (file-oriented approach)—to the data being defined and administered centrally (database approach). In the database approach, data is defined and stored centrally.
The main characteristics of the database approach are defined as follows:
In addition to the advantages highlighted above, there are several other implications of using the database approach like provision of multiple-user interfaces, representation of complex relationships, concurrent data access etc.
There are, however, some disadvantages of database systems over the file-based systems. The database systems are more vulnerable than file-based systems because of the centralized nature of a large integrated database. If a failure occurs, the recovery process is more complex and sometimes may result in lost transactions.
The information stored inside a database is represented using data modeling. The data model describes the structure of the database. A data model consists of components for describing the data, the relationships among them, and the semantics of data and the constraints that hold data. Many data models exist based on the way they describe the structure of database. The data models are generally divided into three categories as follows:
Schema is the logical structure of the database. A schema contains information about the descriptions of the database like the names of the record type, the data items within a record type, and constraints. A schema does not show the data in the database. The database schema does not change frequently.
Instances are the actual data contained in the database at a particular point of time. The content of the database may change from time to time.
The conceptual data model is a description of the data requirements of the user. This model is not concerned with the implementation details. It ensures that all the functional and data requirements of the users are specified, conceptually. The conceptual model is defined using terms like (1) Entity, (2) Attribute, and (3) Relationship. The Entity-Relationship model (E-R model) is an example of conceptual data model. The following subsections, briefly describe these terms and the E-R model.
An entity is the basic unit for modeling. It is a real-world object that exists physically or conceptually. An entity that exists physically is a tangible object like student, employee, room, machine, part or supplier. An object that exists conceptually is a non-tangible object like an event or job title. For e.g. student information system may consist of entities like student_profile, marks and course. A set of entities of the same type having same properties, or attributes is defined as an entity set. For example, a set of all persons who are students of the university can be defined as an entity set student. Likewise, the entity set course may represent a set of all courses offered by a University. An entity set is usually referred to by the same name as an entity. For example, Student is an entity set of all student entities in the database. Diagrammatically, an entity is represented using a rectangle. Figure 12.3 shows the entities for a database created in MS-Access 2007.
An attribute describes some property or characteristics of the entity. For e.g. student name, student address, and student age are attributes of the entity student_profile. An attribute ensures that similar information is stored about each entity in an entity set, in the database. However, each attribute of an entity may have its own value. The set of permitted values for an attribute is called the domain of the attribute. For example, the domain of the attribute student name is text string, or, the domain of student age is a positive integer ranging between 18 and 65. Diagrammatically, an attribute is represented as an ellipse connected to the entity with a line. Figure 12.3 shows the attributes for the entity “Employee” in MS-Access 2007.
Figure 12.3 Entities and attributes
An association or link between two entities is represented using a relationship. A set of relationships of the same type form a relationship set. For e.g. “student enrolls in course” is a relationship set between the entities student and course. Cardinality ratio is the number of entities to which another entity gets associated in a relationship set. The cardinality ratio of a relationship set is any one of the four kinds—(1) One-to-One, (2) One-to-Many, (3) Many-to-One, and (4) Many-to-Many. In two entity sets A and B, the different cardinality ratio imply the following:
Diagrammatically, a relationship is represented using a diamond connected to the entity by a line. Figure 12.4 shows the different kinds of relationships.
E-R model is a model of the real world. E-R model represents the entities contained in the database. The entities are further described in the database using attributes. The relation between the entities is shown using the relationships. The model also shows the cardinality constraints to which the database must adhere to. The E-R model is represented diagrammatically using an E-R diagram. Figure 12.5 shows a simple E-R diagram. The diagram shows two entities—Student and Course. The Stud_Name and Stud_RollNo are the attributes of the entity Student. The Course_Id and Course_Name are the attributes of the entity Course. The Admission relationship associates the student with the course. Database design in E-R model is converted to design in the Representation Model which is used for storage and processing.
Figure 12.4 Relationships (a) One-to-one (b) One-to-many (c) Many-to-many
Figure 12.5 E-R diagram
Object-Oriented (OO) database model use object-oriented programming. The object data model defines a database in terms of objects, classes, and methods. It is beneficial in web applications, and in specialized application areas like multimedia, manufacturing system, and engineering design, and for the storing of multimedia objects.
The Conceptual Data Model is transformed into the Representation Data Model. Representation data model uses concepts that are understood by the end-user and are also close to the way the data is organized in the computer. These models hide the details of data storage. The data models are broadly classified as traditional data models that include—(1) hierarchical, (2) relational, and (3) network data models. Object-relational data model is an emerging data model.
The Relational Database Model was proposed in 1970 by E. F. Codd. The first commercial system based on the relational model became available in early eighties. Relational database model is the most common type of database model. Table, record, field, key, and data values are the terms associated with a relational model. The data elements are stored in different tables made up of rows and columns. The data in different tables are related through the use of common data elements. We briefly define the terms as follows—
Figure 12.6 Table “Employee”
In its simplest form, a database may contain a single table stored as a file. In more complex databases, a collection of related tables may be stored together. For example, a student database may contain a table for student profile, student result, student courses etc. Figure 12.6 shows a table “Employee”, with fields, records, data values & primary key.
Several commercial products like DB2, ORACLE, SQL Server, SYBASE, and INFORMIX are relational databases. The Relational Data Model is extended to include the object database concepts thus forming the Object-Relational Data Model.
The Hierarchical DatabaseModel was developed by IBM and is the oldest database model. The hierarchical database model is defined as follows—
Figure 12.7 Hierarchical database model
The Network Database Model was formalized by DataBase Task Group (DBTG) group of Conference on Data Systems Languages (CODASYL) in the late 1960s. The Network Database Model is defined as follows:
Figure 12.8 Network database model
The hierarchical model is simple to construct. However, the network model is able to handle complex relationships easily. Both hierarchical and network models are also called Segacy models as most current systems support the Relational Model.
The Physical Data Model describes the internal storage structures, the access mechanism, and the organization of database files. Physical level describes how a record (e.g., customer) is stored. Sequential files, direct files, and indexed sequential files are the different types of file organizations at the physical level.
A bank, hospital, college, university, manufacturer, government are some examples of organizations or enterprises that are established for specific purposes. All organizations or enterprises have some basic common functions. They need to collect and store data, process data, and disseminate data for their various functions depending on the kind of organization. Some of the common functions include payroll, sales report etc.
A database system integrates the collection, storage, and dissemination of data required for the different operations of an organization, under a single administration. A database system is a computerized record keeping system. The purpose of the database system is to maintain the data and to make the information available on demand.
A database system has four main components—(1) Users, (2) Hardware, (3) Software, and (4) Data.
The architecture of a database system provides a general framework for database systems. Different database systems, small or big, may not support all aspects of the architecture; however, different systems can be matched to the framework. The architecture for database system is proposed by ANSI/SPARC study group and is called ANSI/SPARC architecture. The purpose of the architecture is to make databases more independent of the application that is using the database. Schema contains information about the description of the database.
The ANSI/SPARC architecture is divided into three levels (Figure 12.9), as follows:
Figure 12.9 Three schema architecture
The interrelated set of data that forms the database needs to be stored and managed, so that the database can be accessed for the retrieval of data, and for insertion, deletion, or updating of data. DBMS is a software system for creating, organizing and managing the database. DBMS handles all access to the database and manages the database. Managing the database implies that it provides a convenient environment to the user to perform operations on the database for creation, insertion, deletion, updating, and retrieval of data. DBMS defines the scope of the use of database. This keeps data secure from unauthorized access. The functionality of DBMS includes—(1) the database that contains interrelated data, and (2) a set of programs to access the data. The DBMS implements the three schema architecture—internal schema at internal level, conceptual schema at conceptual level, and external schema at external level. The different DBMSs may not provide a clean distinction between the three levels, but generally they follow the three schema architecture.
Mappingis a term used for transforming requests and the results between different levels of the DBMS architecture. In a DBMS, the user interacts with the DBMS through the external view, and the data in the DBMS is stored at the physical level. Any request from the user, would require that the DBMS access the data and then present it to the user as specified in the external schema. This requires transforming the request from one level to the other, or mapping.
Some of the common commercial DBMSs are—Oracle Database, IBMs DB2, Microsoft’s SQL Server and Microsoft Access. MySQL is a popular open source DBMS.
Data independence is defined as the ability to change a schema at one level without affecting the schema at another level. The mapping from the external level to conceptual level and from the conceptual level to the physical level provides two types of data independence—Logical Data Independence and Physical Data Independence.
The concept of data independence is similar to the concept of abstract data types in programming languages, where the interface is presented to user and the implementation details are hidden.
Data dictionary stores the data about the actual data stored in the database. Data dictionary contains metadata, i.e. data about the data. Metadata is the data that describe the properties or characteristics of other data. Some of these properties include data definitions, data structures and rules or constraints, the data item, the data type, length, minimum and maximum allowable values (where required) and a brief description of each data item. Metadata allow database designers and users to understand what data exist and what the data mean. Data dictionary keeps track of the following:
DBA is a person or a group of persons who have centralized control of the database. DBA coordinates all activities of the database system. DBA has a good understanding of the database and of the needs and resources of the organization where it is installed. The DBA is responsible for creating, modifying and maintaining the three levels of DBMS architecture.
The functions of DBA include—(1) defining of schema, (2) defining of storage structure and access method, (3) modification of schema and physical organization, (4) granting user authority to access the database, and (5) acting as liaison with users, monitoring performance, and responding to changes in requirements.
In a DBMS, there is a need to specify the conceptual schema and the external schema. Once the schema is defined and data is filled in the schema, there is a need to manipulate the data, i.e. insertion, deletion, and modification of the data. For all these purposes, DBMS provides a set of languages— Data Definition Language (DDL), and Data Manipulation Language (DML). DDL is used by database designers for defining the database schema. DML is used for the manipulation of data. Structured Query Language (SQL) is a relational database language that represents a combination of both DDL and DML.
DDL is a specification notation for defining the database schema. For example, a DDL command in SQL is as follows—
create table student (
stud_rollno char(10),
stud_name char(20),
stud_age integer)
The create table command creates a table named student having three fields—stud_rollno of type character and size 10, stud_name of type character and size 20, and stud_age of type integer.
The DBMS has a DDL compiler which processes the DDL statements and generates a set of tables which are then stored in a data dictionary.
DML is a query language for accessing and manipulating the data stored in database. Data Manipulation Languages are of two main types—procedural, and non-procedural language. In a procedural language, user specifies both the data that is required and how to retrieve that data. The procedural language is embedded into a general-purpose programming language. However, in a non-procedural language, user specifies only what data is required without specifying how to retrieve the data. SQL is the most widely used non-procedural query language. For example, the SQL query to find name of the student having roll no A121, in the table student is—
select student.stud_name
from student
where student.stud_rollno = ‘A121’
The DML commands (Figure 12.10) when used in a standalone interactive manner is called the query language.
Figure 12.10 DML commands
The computers has evolved from big mainframe computers to small desktop personal computers. The advances in computer and its architecture have also resulted in the advances in databases and its architecture. Here, we discuss the architecture of databases of three kinds—Centralized DBMS Architecture, Client-Server Architecture, and Distributed Databases.
Figure 12.11 Centralized database
Figure 12.12 Client-server architecture (a) 2-Tier (b) 3-Tier
Figure 12.13 Distributed databases
Databases range from those designed for a single user with a desktop computer to those on mainframe computers with thousands of users. The database applications can be for different purposes like—(1) personal databases that support one user with a stand-alone personal computer, (2) workgroup databases for a small team of people (less than 25) who work in collaboration on a project, (3) departmental databases designed to support the various functions and activities of a department (a functional unit of an organization), and (4) enterprise databases to support organization-wide operations and decision making. Data warehouse is an enterprise database.
The DBMS commonly provides tools to the application programmer for application development. Some of the tools provided by DBMS include tools for screen, menu, and report generation, application generators, compilers, and data and view definition facilities. Modern database systems provide language components that are much more powerful than those of traditional languages. For example, Developer 2000/PowerBuilder for Oracle, and Visual Basic for Microsoft SQL server.
Application Programmers |
Data inconsistency |
Instances |
Application software |
Data Independence |
Integrated data |
Attribute |
Data Integrity |
Internal (Physical) Level |
Backup and Recovery |
Data Manipulation Language |
Key |
Cardinality Ratio |
(DML) |
Logical Data Independence |
Centralized architecture |
Data Models |
Low-level Data Model |
Client-Server Architecture |
Data redundancy |
Mapping |
Conceptual Data Model |
Data Security |
Network Database Model |
Conceptual Level |
Data sharing |
Owner-member |
Data |
Data Values |
Parent-child |
Database |
Distributed Databases |
Physical Data Independence |
Database administrator |
Domain |
Primary key |
Database Administrator |
End-Users |
Record or Row |
(DBA) |
Entity |
Relational Database Model |
Database Approach |
Entity set |
Relationship |
Database Architecture |
E-R Model |
Schema |
Database Management System |
External (View) Level |
Shared data |
(DBMS) |
Field or Column |
Software |
Database system |
File-Oriented Approach |
Structured Query Language |
Data Definition Language |
Hardware |
(SQL) |
(DDL) |
Hierarchical Database |
Table |
Data dictionary |
Implementation Data Model |
User Interface |
Section 12.2—12.2.2
1. Define: (1) Database, (2) Data redundancy, (3) Data consistency, and (4) Data Sharing.
2. List the differences between file-oriented approach and database approach.
3. List the characteristics of the database approach.
4. Explain the characteristics of database approach.
5. “Data redundancy provides opportunity for data inconsistency”. Explain.
Section 12.2.3—12.2.4.4
6. Define: (1) Data model, (2) Schema, (3) Instance, (4) Entity, (5) Attribute, (6) Domain of attribute, and (7) Cardinality Ratio.
7. How are entity and attribute related?
8. Explain the different kinds of cardinality ratios.
9. Name the symbols used to represent (1) entity, (2) attribute, and (3) relationship in an E-R diagram.
10. Give an example of conceptual data model.
11. What is an E-R diagram?
Section 12.2.5–12.2.5.1
12. Name the different types of representation data models.
13. Who proposed the Relational Data Model?
14. How are the terms table, record, field, key, and data values related?
15. What is the purpose of the key in relational data model?
16. Name two commercial relational databases.
17. What is a primary key?
18. Define the terms: (1) table, (2) record, (3) field, (4) key, and (5) data values.
Section 12.2.5.2–12.2.6
19. Describe the features of Hierarchical Data Model.
20. The schema of a hierarchical database is represented using a diagram.
21. The schema of hierarchical database is based on_ relationship.
22. Name one commercial hierarchical database.
23. The hierarchical database model was developed by_
24. The schema of network database model is represented using a_ diagram.
25. The schema of network database is based on relationship
26. Describe the features of Network Data Model.
27. Name one commercial network database.
28. What is the purpose of Physical Data Model?
Section 12.3–12.3.2
29. Define a database system.
30. List the components of database system.
31. Explain in detail the components of database system.
32. Explain the role of different categories of users in a database system.
33. What is the purpose of application software and user interface software in the database system?
34. “The data in a database system is integrated and shared”. Explain.
35. Name the three levels of ANSI/SPARC architecture.
36. What is the purpose of internal level, conceptual level and external level in the ANSI/SPARC architecture?
37. Explain the three schema architecture of database system in detail.
Section 12.4–12.4.4.2
38. Define: (1) DBMS, (2) Mapping, and (3) Data independence.
39. What is the need of mapping in a DBMS?
40. Name two commercial DBMS software.
41. What is logical data independence?
42. What is physical data independence?
43. Define a data dictionary.
44. Explain the function of data dictionary.
45. What are the functions of DBA?
46. What is the need of database languages?
47. DBMS provides two database languages_ and_
48. What is the use of DDL?
49. What is the use of DML language?
50. What is a Structured Query Language?
Section 12.5
51. What is a centralized DBMS architecture?
52. Explain the client-server architecture.
53. How is the two-tier client-server architecture different from the three-tier client-server architecture?
54. What are distributed databases?
55. Name some database applications.
Extra Questions
56. Give full form of the following abbreviations:
57. Write short notes on:
58. Give differences between the following:
18.218.187.108