1

FUNDAMENTALS OF DATA REPRESENTATION

Data structure is the study of concrete implementations of frequently occurring abstract data types. An abstract data type is a set, together with a collection of operations on the elements of the set. There are several terms we need to define carefully before we proceed to different types of data structures such as arrays, stacks, linked list, and so on.

The meaning of data representation is introduced in Section 1.1. In Section 1.2 there are definitions of data types, data object, and data structure. The notion of abstraction is very important in computing. We are particularly interested in its application to data stored in a digital computer. In Section 1.3 we will introduce the concept of data abstraction and abstract data types.

A data type is an abstract concept defined by a set of logical properties. Once such an abstract data type is defined, it is important to know how to implement it in a machine. Section 1.4 describes the system-defined data types. Section 1.5 will highlight the concepts of primitive data structure. C language is used in this book since it is used globally and continues to grow in popularity.

1.1 BASIC CONCEPTS OF DATA REPRESENTATION

The study of any aspect in computer science involves the processing of information. Data is defined as a raw fact but information is called processed data. A data value is a piece of data that we can consider as a single entity. We might consider the integer value 123 as a single value. If a data value can be decomposed into component parts, we call each part a component element. An atomic data value is a piece of data that we choose to consider as a single, non-decomposable entity. For example, the integer 45923 may be considered as a single decomposable entity. If we wish to decompose it into 4, 5, 9, 2, and 3, we may do so.

A natural level at which to stop the decomposition of data values stored in a digital storage medium is the bit. Logically, we may think of a bit as a data element that must have at any time one of the two values, and we will assign it the numeric values 0 and 1. Of course, we may decompose these if we wish. If the value is stored on a magnetic disc, for example, it is represented by an electromagnetic signal which is recorded on or in the disc surface. Taking the abstract point of view, we will ignore how the values are physically stored. We might think of this point as one boundary between hardware and software.

In computers the most widely used method for storing integers is binary number system. The base of this system is 2. Each bit position represents a power of 2 with a 20 in LSB (least significant bit), 21 next to LSB, and so on. For example, 10010 represents the integer n bit x 2° x 0 + 21 x 1 + 22 x 0 + 23 x 0 + 24 x 1 = 18. In this representation a string of n bit represents integer numbers between 0 and 2n – 1. The negative binary numbers are stored in a two-complement form. Given n bits, the range of numbers that can be represented is –2(n-1) to 2(n-1) –1.

Real numbers, in computers, are stored in a floating-point notation. In this representation, a real number is expressed in two parts, mantissa and exponent. The base of an exponent is usually fixed, and the mantissa and exponent vary to represent different real numbers. For example, the decimal number 125.55 could be represented as 12,555 x l0–2. The mantissa is 12,555 and the exponent is –2. The advantage of this representation is that it can be used to represent numbers with extremely large or extremely small absolute values. Usually in a 32-bit word length, 24 bits are reserved for mantissa and 8 bits for exponent. The size of mantissa and exponent depends on the machine configuration.

Data is not always interpreted numerically but is often stored in a non-numeric form. The number of bits necessary to represent a character in a particular computer is called the byte size and a group of bits of that number is called a byte. For character representation two types of code are normally used, American Standard Code for Information Interchange (ASCII) and Extended Binary Coded Decimal Interchange Code (EBCDIC). Both use a byte to represent a character. So 256 possible characters can be represented using these codes with a size of 1 byte. For example, in ASCII the capital letter ‘A’ is represented by the decimal number 65.

In computers, the internal representation of an integer or real or character is a string of bit pattern. For example, the bit string 01100110 can be interpreted as the number 66 (in binary coded decimal), which represents the character ‘B’. A method of interpreting a bit pattern is often called a data type. We use several data types such as binary, real, and so on, in the context of their representation in the computer. In the next section, we will describe the basic concept of data types related to abstraction of data.

1.2 DATA TYPE

A data type is a collection of values along with a set of operations defined on those values. The essence of a type is that it attempts to identify qualities common to a group of individuals or objects that distinguish it as an identifiable class or kind. In a programming language, the data type of a variable is the set of values that the variable may assume. The basic data types vary from language to language.

Let us look at two classes of data types. A simple, or basic, data type is made up of values that cannot be decomposed. In ‘C’, they are int (for integers), float (for real), char (for characters), and so on. A composite data type, also called a data structure, is one in which the elements of the data type can be decomposed into either simple data types or other composite data types. Examples of composite types include the familiar array and structure in C language. In data structure the values of data types are decomposable, and we must therefore be aware of their internal construction. There are two essential ingredients to any object that can be decomposed—it must have component elements and it must have structure, the rules for relating or fitting the elements together.

The operations of a structured data type might not only act on the values of the data type, they might also act on component elements of the data structure. We now present the formal definitions of some terms that must be known to the readers.

In a programming language, a data type is a term that refers to the nature of data which variables hold. In C the data types are int, float, char, short, unsigned, double, long, and so on. These are built-in data types and type def in C can be used to construct new data types.

Data object refers to a set of elements, say F. For example, the data object ‘float’ refers to F={0, ± .5, ± .6….}.

Similary, the data type ‘int’ in C language refers to data object integers, that is, a variable of int data type can hold only integer-type data objects.

We are not only interested in the content of data objects but we also need to know the way they are related. A data structure is a data type whose values are composed of component elements that are related by some structure. Since a data structure is a data type, it must have a set of operations on its value. Further, there may be operations that act on its component elements. We can write program using the operations defined on the data and its structure. We imagine an abstract data type in our program and we can do so, being concerned with neither how the data will be represented in the computer nor the details of the code that implements the operations.

In the next section, we will present data abstraction and abstract data types.

1.3 DATA ABSTRACTION AND ABSTRACT DATA TYPES

One of the most powerful ideas in programming and problem solving is the concept of abstraction—the ability to view something as a high-level object while temporarily ignoring the enormous amount of underlying detail associated with that object. Another way to represent abstraction is viewing something only in terms of its external appearance without regard for its internal implementation. It is difficult to manage a complex system without abstraction.

We can define an abstraction more formally as an idea that concentrates on the essential properties of something rather than on concrete realization or actual cases.

In computer science the process of abstraction is to simplify by separating the essential qualities of data, their structure and operations, from the inessential details of their representation and implementation. One of the basic problems in computer science is the amount of complexity to be reduced in the software that we might wish to build. Our approach is therefore to begin the study of each data structure by considering only the specification of its abstract data type, independent of its representation and implementation. This simplifies the study of the data structure. Thus we attempt to bring the power of abstraction to bear on the study of data structure.

The abstract data type approach lends itself in a natural way for separating the specification of a data type from its implementation. The implementation can then assure that the integrity of the data structure is protected.

We can think of an abstract data type (ADT) as a mathematical model with a collection of operations defined on that model. We can define the abstract data type as follows.

An abstract data type indicates a data type that exists as a product of our imagination and concentrates on the essential properties of the data type, ignoring implementation constraints and details.

There are several important advantages associated with the study of data structure from the point of view of abstract data types. Here we will discuss some of them.

As defined earlier, an abstraction is an idea that concentrates on the essential properties rather than on concrete realizations or actual cases. The objective is thus to simplify by isolating the essential qualities of data, their structure and operations, from inessential details of their representation and implementation. This has the effect of simplifying the study of the data structure. Thus we attempt to bring the power of abstraction to bear on the study of data structures. In order to do that, we provide a template to view and discuss each data structure. This template is called an abstract data structure and consists of three basic components: (i) specification, (ii) representation, and (iii) implementation.

Our approach throughout the book is to implement data structures using modules. These modules here act as black boxes. The user has no direct access to the data structure since the data structure and the algorithms are encapsulated within the module. The integrity of the data structure is protected because the user gets control over it through operations that are separately specified and implemented. The implementation is done very carefully and in such a way as to assure preservation of the integrity of the data structure. This is an advantage to users for designing a software system.

Another advantage is maintainability. Implementation independence frees the user from non-functional details. The implementation may be changed with no effect on the way in which the program executes. It may, however, affect the performance, that is, time, space, and maintainability. For example, if we change the basic technique used to perform an operation—without changing the operation performed—then the user may see a change in the performance for the module containing that operation but will not see any change in the results produced. The user is protected from changes in the way in which operations are implemented.

If an abstract data structure is to be more than mere theoretical interest, it must be implemented. Although the user still deals with the abstract conception of the structure, and indeed, the notion of abstract data structuring is to guarantee that the user need deal with no more than abstraction. The implementor must face the problems of representation and implementation. As was told earlier, the abstraction can be treated as the functional specifications of a black box. The implementor must design the box in such a way that memory space is not wasted and the operations are performed simply and efficiently. The implementor must be familiar with the physical data type and virtual data type. We often implement our data type using some high-level language. For example, in C language we might define the variables A, B, and C as integer, real, and character data type,

         int A;
         float B;
         char C [ 100 ];

We call the above as virtual data type. Eventually any structure is stored in a physical memory to be operated on a physical machine, that is, computer. The actual physical operations that the machine can perform are limited to those in its machine language. We will call a data type at this level a physical data type. Thus abstract data types are implemented with virtual data types. Virtual data types are translated into physical data types.

In summary, we have understood the basic idea of an abstract data type. Many different modules can be written that implement the same abstract type. Advantages of abstract data types are highlighted.

1.4 SYSTEM-DEFINED DATA TYPE

In defining an abstract data type as a mathematical concept, we do not consider the implementation issue. Often no implementation, hardware or software, can model a mathematical concept completely. For example, an arbitrarily large integer cannot be represented due to the finite size of the machine's memory. Thus, it is not the data type ‘integer’ that is represented by the hardware but rather the data type ‘interger between a and b’, where a and b are the minimum and maximum integers representable by that machine. Once a representation has been chosen for objects of a particular data type and routines have been written to operate on those representations, the programmer is free to use that data type to solve a problem. The programmer need not worry about how the computer is designed and what circuitry is used to execute each instruction. The programmer needs know only what instructions are available and how those instructions can be used. The programmer must know about the data types which are available in the system.

A data type is an abstract concept by a set of logical properties. We can define a limitless number of data types but system considerations are necessary before implementing the data types. In hardware implementation, proper circuitry is necessary to perform the requisite operations and software implementation includes specifications of how such data types are to be manipulated.

Every computer system has a set of ‘native’ data types. It is the programmer's responsibility to know what data types are available in the system and how they are stored in memory.

C has the ‘usual’ simple data types: characters, integers, and numbers with fractional components. In addition, C allows variants of some of these types. Simple types include char, int, float, and double. These types differ in the sort of information they contain and in the amount of storage space allocated to them on different systems, ranging from 1 to 8 bytes.

C's character data type is known as char. C allocated 1 byte for storing a character. We can store at the most 256 values. The integer data type, int, is used to represent whole numbers within a specified range of values. Variables of type int are usually stored in 2 bytes ranging from –32,768 to 32,767. Internal representation of a whole number can be treated as a character or an integer.

The real numbers are represented by data type float. C allocated four types to represent variables of floating type. The double type is for double-precision floating-point numbers. For doubles, C allocates 8 bytes as storage space.

In addition to simple types, C provides other types, which are variations of char and int. These types are essentially for a different amount of memory to be allocated for storing a value. Signed types, where numbers can be positive or negative, are standard in most languages. C also allows us to declare integers as unsigned, so the sign bit is used as part of the number rather than as a sign indicator. Short and long are requests for versions of the int type for which different amount of storage may be allocated. The amount of space allocated for these variants depends on the implementation. For example, short reserves 1 byte while long requires 4 bytes as storage space. Finally, to represent a number as a hexadecimal constant, Ox or OX is placed before the hexadecimal representation and octal integers start with the digit 0.

1.5 PRIMITIVE DATA STRUCTURES AND THEIR REPRESENTATION

In this section we discuss the primitive data structures which are commonly used to solve problems with a computer system. Primitive data structure is defined as a structure which can be operated by machine-level instructions. These structures are the basis of all other types of data structures.

We begin the discussion of primitive data structures by examining integer and real numbers. A quantity representing an object is discrete in nature and can be represented by an integer. The integer is also used to represent whole numbers. For example, total number of students in a class, the number of passengers in a train, and so on, are all information items expressible as integers. Integers are represented in signed magnitude form, signed complement form, and signed 2's complement form. Although the first form is probably the simplest in concept, other methods such as two's complement representation are used in modem computer systems to simplify the design of computer circuitry. The data type integer provided by C can be viewed as an abstract data type whose specification is as follows.

(a) It reserves 2 bytes for int and unsigned int but 1 byte for short int.

(b) The elements are the whole numbers from – maxinteger to maxinteger. The value of maxinteger is implementation dependent.

(c) Integers are both ordered and linear.

(d) The set of operations is implementation dependent.

Real numbers can be represented by either fixed-point representation, such as in 15.75 or floating-point representation, such as ·1575 × 102. Floating-point representation is the most common storage structure used for real data. In this representation, the real number is expressed by mantissa and exponent. For example, ·1575 x 102 is expressed with a mantissa ·1575 and exponent 102, with a radix 10. The radix and the number of digit positions represented with a floating-point format vary from one computer to another. Floating-point reals in C can be viewed as an abstract data type with the following properties.

(a) The values are a finite subset of the real numbers. The actual subset is implementation dependent.

(b) The structure is ordered.

(c) Typical operations are assignment, arithmetic, relational, and so on.

(d) Four to eight bytes are required to store real numbers.

Usually, the sign is the first bit, that is, MSB (most significant bit) in a floating-point representation, and by convention 0 denotes a positive number and 1 denotes a negative number (this is also true in case of integer-number representation). The biased exponent is an expression of the exponent in a form of notation called excess notation. In a seven-digit field, we can express non-negative integers in the range 0 to 127 (in excess notation) though we are capable of representing integers in the range – 64 to + 63 in 2's complement representation. A floating-point number with an exponent of –25 would have a characteristic of – 25 + 64 = 39 in excess – 64 notation. Further, the mantissa part of a floating-point number is expressed as a normalization form, that is, there is no significant digit before the decimal point. For example, 0.4272 is in normalized mantissa whereas 4.272 is not in normalized form.

In addition to a floating-point representation of real numbers, a fixed-point storage representation is also possible. Real numbers are stored similar to the structure involving integer numbers. In C language both representations are available with different control parameters (such as %f and %e).

A wide variety of character sets or alphabets are handled by most computers. The two most widely used codes are the ASCII and the EBCDIC. Characters are used as a primitive data structure since they are useful in expressing much of the non-numeric information which can be processed by a computer. Each character is stored as a fixed number of bits in the computer's memory. A common technique for storing characters in a computer's memory is to store each character in 1 byte. One byte is a sequence of 8 bits. In many digital computers it is the smallest unit of information that can be addressed directly. Such machines operate efficiently on individual characters.

C allocates 1 byte for storing a character. This means that we can have at the most 256 character values. Generally, the byte used to store a character variable is interpreted as having values ranging from –128 to 127. Depending on the character encoding scheme used, positive values of a character variable correspond to particular characters. For example, a character value of 100 corresponds to the character ‘d’ in the ASCII character set used on most systems. Negative values generally do not have a ‘useful' interpretation. Also, integer numbers and characters can be used interchangibly with the control parameters %d and %c.

A logical data item is a primitive data structure that can assume the value of either ‘true’ or ‘false’. C has three classes of operators: arithmetic, relational and logical, and bitwise operators. The key to the concept of relational and logical operators is the idea of true and false. In C, true is any value other than 0 and false is 0. Therefore, expressions that use relational or logical operators will return 1 for true and 0 for false. The three most common logical operators are ‘AND’ (&&), ‘OR’ (!!), and ‘NOT’ (!). If A and B are logical variables, then A && B is true if A and B both have the value true, otherwise, the result is false. The result of A!!B is false if A and B have the value false, otherwise the result is true. !A is false if A is true or !A is false if A is true. Logical variables are used often to represent complex logical expressions and also as terminating conditions in the loop evaluation. Relational and logical operators always produce a result that is either 0 or 1 and bitwise operators are used to change the values of variables, not to evaluate true or false conditions.

The storage representation of logical values is dependent upon the compiler and the machine for which the compiler is designed. One bit is sufficient to represent true or false but because of the difficulty most computers have in isolating a single bit, it is common to find an entire byte. Most computers cannot address a single bit in their memory but must address atleast a byte. Therefore, 8 bits at a time are fetched into the registers. The single bit representing the boolean quantity would then have to be isolated by masking out all of the others. Most designers have chosen to sacrifice some memory space to avoid their complexity and use the whole byte to represent boolean quantities.

A pointer is a reference to a data structure. It is a word or portion of a word in memory, which, instead of containing data, contains the address of another word or byte. Pointer is a single fixed-size data item and it provides a homogenous method of referencing any data structure. Pointer permits faster addition and deletion of elements to and from a data structure. In terms of storage representation, addresses are generally assigned a word or half a word of storage in most computers. Thus, the larger the number of addresses in the computer, the larger the amount of storage needed to represent an address. In the next chapter we will describe fundamentals of data structure.

EimagesXimagesEimagesRimagesCimagesIimagesSimagesEimagesS

1. What are the two basic data types? How are they defined in C language?

2. Define the term ‘data object’. How is it different from abstract data type?

3. How many components are there in abstract data structure? Explain the term ‘maintainability’.

4. Why is system-defined data type different from primitive data types? Explain.

5. Describe the specifications of integer data type.

6. What are the different data types that are available in C language?

7. Are the following system-defined data types? Give reasons for your answers.

(a) Files

(b) Pointers

(c) Enumerated data types

8. Explain why it is not possible to support opaque types in C.

9. Describe one or two situations in your everyday life where you use the idea of abstraction to simplify large tasks that you need to perform.

10. Describe the nature of pointer data type in C.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.117.188.138