1.3 Data Hierarchy

Data items processed by computers form a data hierarchy that becomes larger and more complex in structure as we progress from the simplest data items (called “bits”) to richer ones, such as characters and fields. The following diagram illustrates a portion of the data hierarchy:

A diagram of a data hierarchy.

Bits

A bit (short for “binary digit,” a digit that can assume one of two values) is the smallest data item in a computer. It can have the value 0 or 1. Remarkably, the impressive functions performed by computers involve only the simplest manipulations of 0s and 1s: examining a bit’s value, setting a bit’s value and reversing a bit’s value (from 1 to 0 or from 0 to 1). Bits form the basis of the binary number system, which you can study in depth in our online “Number Systems” appendix.
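The three manipulations named above (examining, setting and reversing a bit) can be sketched with Python's bitwise operators. This is a minimal illustration, not part of the original text; the function names `get_bit`, `set_bit` and `flip_bit` and the sample value are hypothetical.

```python
# Hypothetical sample value; the 0b prefix writes an integer in binary.
value = 0b0101


def get_bit(number, position):
    """Examine a bit: return the bit (0 or 1) at the given position."""
    return (number >> position) & 1


def set_bit(number, position):
    """Set a bit: return number with the bit at position forced to 1."""
    return number | (1 << position)


def flip_bit(number, position):
    """Reverse a bit: return number with the bit at position toggled."""
    return number ^ (1 << position)


print(get_bit(value, 0))        # 1
print(bin(set_bit(value, 1)))   # 0b111
print(bin(flip_bit(value, 0)))  # 0b100
```

Position 0 is the rightmost (least significant) bit in this sketch.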

Characters

Working with data in the low-level form of bits is tedious. Instead, people prefer to work with decimal digits (0–9), letters (A–Z and a–z) and special symbols such as

$ @ % & * ( ) – + " : ; , ? /

Digits, letters and special symbols are known as characters. The computer’s character set contains the characters used to write programs and represent data items. Computers process only 1s and 0s, so a computer’s character set represents every character as a pattern of 1s and 0s. Python uses Unicode® characters, which the UTF-8 encoding represents as one, two, three or four bytes (8, 16, 24 or 32 bits, respectively).
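To see the one-to-four-byte patterns for yourself, you can encode a few characters with the built-in `str.encode` method. The following is a small sketch; the sample characters are chosen here only for illustration.

```python
# Sample characters written as escapes so the code points are unambiguous:
# 'A', e with acute accent, the euro sign and a smiley emoji.
samples = ['A', '\u00e9', '\u20ac', '\U0001F600']

byte_counts = []
for character in samples:
    encoded = character.encode('utf-8')  # bytes object holding the UTF-8 form
    byte_counts.append(len(encoded))
    # Show each byte as a pattern of eight 1s and 0s
    bits = ' '.join(f'{byte:08b}' for byte in encoded)
    print(f'{character!r}: {len(encoded)} byte(s): {bits}')

print(byte_counts)  # [1, 2, 3, 4]
```

The first line of output shows that `'A'` occupies a single byte, 01000001, which is also its ASCII code (65).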

Unicode contains characters for many of the world’s languages. The ASCII (American Standard Code for Information Interchange) character set is a subset of Unicode that represents letters (a–z and A–Z), digits and some common special characters. You can view the ASCII subset of Unicode at

http://unicode.org/charts/PDF/U0000.pdf

The Unicode charts for all languages, symbols, emojis and more are viewable at

http://www.unicode.org/charts/

Fields

Just as characters are composed of bits, fields are composed of characters or bytes. A field is a group of characters or bytes that conveys meaning. For example, a field consisting of uppercase and lowercase letters can be used to represent a person’s name, and a field consisting of decimal digits could represent a person’s age.

Records

Several related fields can be used to compose a record. In a payroll system, for example, the record for an employee might consist of the following fields (possible types for these fields are shown in parentheses):

  • Employee identification number (a whole number).

  • Name (a string of characters).

  • Address (a string of characters).

  • Hourly pay rate (a number with a decimal point).

  • Year-to-date earnings (a number with a decimal point).

  • Amount of taxes withheld (a number with a decimal point).

Thus, a record is a group of related fields. All the fields listed above belong to the same employee. A company might have many employees and a payroll record for each.
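One way to model such a record in Python is a data class with one attribute per field. This is only a sketch: the class name `EmployeeRecord`, the attribute names and the sample values are hypothetical, and `float` is used for the monetary fields purely to mirror “a number with a decimal point” (production payroll code would more likely use `decimal.Decimal`).

```python
from dataclasses import dataclass


@dataclass
class EmployeeRecord:
    """A record: a group of related fields for one employee."""
    employee_id: int              # a whole number
    name: str                     # a string of characters
    address: str                  # a string of characters
    hourly_pay_rate: float        # a number with a decimal point
    year_to_date_earnings: float  # a number with a decimal point
    taxes_withheld: float         # a number with a decimal point


# Hypothetical sample data
record = EmployeeRecord(
    employee_id=1001,
    name='Jane Green',
    address='1 Main Street',
    hourly_pay_rate=25.50,
    year_to_date_earnings=10200.00,
    taxes_withheld=2040.00,
)

print(record.name, record.hourly_pay_rate)
```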

Files

A file is a group of related records. More generally, a file contains arbitrary data in arbitrary formats. In some operating systems, a file is viewed simply as a sequence of bytes—any organization of the bytes in a file, such as organizing the data into records, is a view created by the application programmer. You’ll see how to do that in Chapter 9, “Files and Exceptions.” It’s not unusual for an organization to have many files, some containing billions, or even trillions, of characters of information.
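The idea that records are a view the programmer imposes on a plain sequence of bytes can be sketched with the standard `struct` module. The record layout below (a 4-byte ID followed by a 10-byte name field) is a hypothetical one invented for this illustration, and an in-memory `io.BytesIO` object stands in for a file on disk.

```python
import io
import struct

# Hypothetical layout: 4-byte unsigned integer ID, then a 10-byte name field.
# '<' requests little-endian byte order with no padding between fields.
RECORD_FORMAT = '<I10s'
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # 14 bytes per record

# Write two records; io.BytesIO stands in for a file opened in binary mode.
file_like = io.BytesIO()
for employee_id, name in [(1, 'Ann'), (2, 'Bob')]:
    file_like.write(struct.pack(RECORD_FORMAT, employee_id, name.encode('utf-8')))

raw_bytes = file_like.getvalue()  # to the system, just a sequence of bytes

# Impose the record view: step through the bytes one record at a time.
records = []
for offset in range(0, len(raw_bytes), RECORD_SIZE):
    employee_id, name_field = struct.unpack_from(RECORD_FORMAT, raw_bytes, offset)
    # struct pads short names with zero bytes; strip them before decoding
    records.append((employee_id, name_field.rstrip(b'\x00').decode('utf-8')))

print(len(raw_bytes))  # 28
print(records)         # [(1, 'Ann'), (2, 'Bob')]
```

Nothing in the 28 bytes themselves marks where one record ends and the next begins; only the program's knowledge of the layout does.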

Databases

A database is a collection of data organized for easy access and manipulation. The most popular model is the relational database, in which data is stored in simple tables. A table includes records and fields. For example, a table of students might include first name, last name, major, year, student ID number and grade-point-average fields. The data for each student is a record, and the individual pieces of information in each record are the fields. You can search, sort and otherwise manipulate the data, based on its relationship to multiple tables or databases. For example, a university might use data from the student database in combination with data from databases of courses, on-campus housing, meal plans, etc. We discuss databases in Chapter 17, “Big Data: Hadoop, Spark, NoSQL and IoT.”
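As a small taste of the relational model, the following sketch builds a table of students in an in-memory SQLite database using Python's standard `sqlite3` module, then searches and sorts it. The table name, column names and sample rows are hypothetical; SQLite is used here only because it ships with Python.

```python
import sqlite3

# ':memory:' creates a temporary database that lives only in RAM.
connection = sqlite3.connect(':memory:')

# Each column is a field; each row inserted below is a record.
connection.execute(
    'CREATE TABLE students (student_id INTEGER PRIMARY KEY, first_name TEXT, '
    'last_name TEXT, major TEXT, year INTEGER, gpa REAL)')

# Hypothetical sample records
connection.executemany(
    'INSERT INTO students VALUES (?, ?, ?, ?, ?, ?)',
    [(1, 'Ana', 'Lopez', 'Physics', 2, 3.7),
     (2, 'Ben', 'Ng', 'History', 3, 3.2),
     (3, 'Cy', 'Shah', 'Physics', 1, 3.9)])

# Search (WHERE) and sort (ORDER BY) the records.
physics_students = connection.execute(
    'SELECT first_name, gpa FROM students WHERE major = ? ORDER BY gpa DESC',
    ('Physics',)).fetchall()

print(physics_students)  # [('Cy', 3.9), ('Ana', 3.7)]
connection.close()
```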

Big Data

The table below shows some common byte measurements:

A table shows common byte measurements
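If you want to compute such measurements yourself, the sketch below assumes the binary convention in which each unit is 1024 times the previous one (1 kilobyte = 1024 bytes). Both that convention and the list of unit names are assumptions made for this illustration; some sources instead use powers of 1000.

```python
# Assumed unit names, smallest to largest
UNIT_NAMES = ['kilobyte', 'megabyte', 'gigabyte', 'terabyte',
              'petabyte', 'exabyte', 'zettabyte']

# Under the binary convention, the nth unit is 1024 ** n bytes.
byte_units = {name: 1024 ** power
              for power, name in enumerate(UNIT_NAMES, start=1)}

for name, size in byte_units.items():
    print(f'1 {name} = {size:,} bytes')
```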

The amount of data being produced worldwide is enormous and its growth is accelerating. Big data applications deal with massive amounts of data. This field is growing quickly, creating lots of opportunity for software developers. Millions of IT jobs globally already are supporting big data applications. Section 1.13 discusses big data in more depth. You’ll study big data and associated technologies in Chapter 17.

Self Check

  1. (Fill-In) A(n) ________ (short for “binary digit,” a digit that can assume one of two values) is the smallest data item in a computer.
    Answer: bit.

  2. (True/False) In some operating systems, a file is viewed simply as a sequence of bytes—any organization of the bytes in a file, such as organizing the data into records, is a view created by the application programmer.
    Answer: True.

  3. (Fill-In) A database is a collection of data organized for easy access and manipulation. The most popular model is the ________ database, in which data is stored in simple tables.
    Answer: relational.
