CHAPTER 17

Binary and Random Access Files

Chapter Objectives

By the end of the chapter, readers will be able to:

•  Discuss the overall characteristics, advantages, and disadvantages associated with binary files.

•  Explain the options available for opening and closing binary files.

•  Develop programs that read from and write to binary files.

•  Explain the characteristics of using random access files.

•  Compare and contrast sequential files and random access files.

•  Illustrate programmatically the use of various methods for manipulating random access files.

Introduction

In Chapter 11, the concepts of using files for data storage and using files for reports were introduced. Up to this point, all of the file-related code examples and programs have had two things in common: (1) they involved text files, and (2) the information in the files was accessed sequentially. In this chapter, the concepts of files and file access are expanded to allow the use of binary files as well as the ability to access information in a nonsequential manner.

17.1 Text Files Versus Binary Files

Text files, as previously discussed, are files written in a way that is readable by humans. This is accomplished by translating the information being written to the file into ASCII or Unicode characters.

Binary files are written in a way that requires no translation. The information is, therefore, virtually unreadable by humans, making binary files unusable for reports. Program executables or documents created using Microsoft Word or some other word processor are examples of binary files. To see how the lack of translation affects the readability of a file, try opening one of your executables using Notepad or some other text editor. Microsoft Visual Studio will actually show a hex dump of the binary file. The term “hex dump” refers to showing the contents of a binary file in hexadecimal format as well as offering a translation of those bytes recognizable as ASCII or Unicode characters.

17.1.1 Advantages and Disadvantages of Binary Files

Binary files have a number of advantages over traditional text, or ASCII, files. One advantage is that reading and writing data is typically faster. With a binary file, the data held in the computer's memory does not have to be translated into ASCII characters before it is written to the file; it is simply stored in the file as it appears in memory. Likewise, data read from a binary file can be placed directly into memory, again without any need for translation. A second advantage is flexibility: text files are often too limiting because they don't allow special formatting symbols or commands to be placed within the file.
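The difference between the two formats can be seen by storing the same integer both ways and comparing the file sizes. The following is a minimal sketch, not a listing from the chapter: the function names and file names are hypothetical, and it relies on .write, reinterpret_cast, and .tellg, which are covered in Sections 17.3 and 17.4.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Returns the size in bytes of the file at `path`, or -1 if it cannot be opened.
long size_of_file(const std::string& path) {
    assert(!path.empty());
    // ios::ate places the file pointer at the end, so tellg reports the size.
    std::ifstream in(path, std::ios::in | std::ios::binary | std::ios::ate);
    if (!in.is_open()) return -1;
    return static_cast<long>(static_cast<std::streamoff>(in.tellg()));
}

// Stores `value` as text: the int is translated into decimal digit characters.
long size_as_text(const std::string& path, int value) {
    std::ofstream out(path, std::ios::out | std::ios::trunc);
    out << value;
    out.close();
    return size_of_file(path);
}

// Stores `value` as binary: the bytes of the int are copied as they sit in memory.
long size_as_binary(const std::string& path, int value) {
    std::ofstream out(path, std::ios::out | std::ios::binary | std::ios::trunc);
    out.write(reinterpret_cast<const char*>(&value), sizeof(value));
    out.close();
    return size_of_file(path);
}
```

Calling size_as_text with the value 1234567 produces a file of seven bytes, one per digit, while size_as_binary produces a file of sizeof(int) bytes regardless of the value stored.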

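While there are a number of advantages to using binary files, there are also some disadvantages. For example, binary files are not directly human-readable because they contain a copy of the actual memory where the data was held. In addition, they are not always portable from one machine to another, because the size of a data type, the order of the bytes within a value, and the padding within a structure can differ between machines and compilers.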

Section 17.1 Exercises

1.  What are two disadvantages of binary files?

2.  What is the main difference between binary files and text files?

3.  What is a hex dump?

4.  For each of the following file extensions, state whether the file is typically a text file or a binary file.

a.  .txt

b.  .exe

c.  .obj

d.  .cpp

e.  .h

17.2 Opening and Closing Binary Files

Opening binary files is similar in many ways to what we demonstrated in Chapter 11 when working with text or ASCII files. As you recall, there are a couple of different options for opening files, as shown in the following syntax:

[Syntax listing: opening a file stream]

As usual, the filename must be a cString or a string. The optional second parameter represents the mode in which the file is to be opened and can include a combination of the flags presented in Table 17.2.1.

The first three modes in Table 17.2.1 were discussed in Chapter 11. Up to this point we have often been able to omit the second parameter because all files have been text files, which is the default mode. However, when working with binary streams, we are required to include the second parameter, ios::binary. Example 17.2.1 illustrates various options for opening binary files.

Mode          Description
ios::in       Open file for input.
ios::out      Open file for output.
ios::app      Open file for output. All new output is added at the end of the file.
ios::ate      Open file with the file pointer initially set at the end of the file. Data can be written anywhere within the file.
ios::binary   Open file in binary mode (the default mode is text).
ios::trunc    Open file for output. If the file already exists, its contents are discarded. If using ios::out, this is the default unless you also specify ios::app or ios::in.

Table 17.2.1 File modes

[Example 17.2.1: opening binary files]

Notice in Example 17.2.1 that multiple file mode flags can be combined using the bitwise OR operator (|). The binary data stream in the example was opened so that new information would be appended to the file.
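As a rough illustration of combining mode flags, the sketch below opens one stream for appending binary data and another for binary input, checking each time that the open succeeded. The function names and the file name used with them are hypothetical and are not taken from Example 17.2.1.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Opens (creating if necessary) a binary file so that all output is appended.
bool open_for_append(const std::string& path) {
    assert(!path.empty());
    std::ofstream out(path, std::ios::out | std::ios::app | std::ios::binary);
    const bool opened = out.is_open();   // always check before using the stream
    out.close();
    return opened;
}

// Opens an existing binary file for input; fails if the file does not exist.
bool open_for_input(const std::string& path) {
    assert(!path.empty());
    std::ifstream in(path, std::ios::in | std::ios::binary);
    return in.is_open();                 // the destructor closes the file
}
```

Because ios::app creates the file when it is missing, open_for_append normally succeeds on a new filename, whereas open_for_input returns false until the file exists.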

One additional member function that is often helpful is .clear. The .clear function clears, or resets, the I/O state flags for the stream. For example, when you have reached the end of a file, a stream state flag of eofbit is set. If you attempt to reuse a stream object, you will need to clear this flag; otherwise, any attempt to read from the file will fail because the end-of-file (EOF) marker has already been encountered. To reset this state flag, or any of the other I/O flags, use the .clear function. The signature of the .clear function is as follows; the parameter is optional, and when it is omitted all of the error flags are reset:

void clear( iostate state = goodbit );
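The effect of .clear can be sketched as follows. This is an illustrative example under assumed names (RereadResult and reread_demo are hypothetical): it reads a small file to the end, shows that a further read fails while the state flags are still set, and then shows that the stream is usable again after .clear. It uses .seekg, discussed in Section 17.4, to return to the start of the file.

```cpp
#include <cassert>
#include <fstream>
#include <string>

struct RereadResult {
    bool failed_before_clear;  // true if reading failed while the flags were set
    bool ok_after_clear;       // true if reading worked once the flags were reset
    char first;                // first character read after rewinding
};

RereadResult reread_demo(const std::string& path) {
    assert(!path.empty());
    {
        std::ofstream out(path, std::ios::out | std::ios::binary | std::ios::trunc);
        out << "abc";
    }   // leaving the block destroys `out`, which closes the file

    std::ifstream in(path, std::ios::in | std::ios::binary);
    char ch = '\0';
    while (in.get(ch)) {}      // the final, failed get sets eofbit and failbit

    RereadResult result = {};
    in.seekg(0);               // ignored: the stream is still in a failed state
    result.failed_before_clear = !in.get(ch);

    in.clear();                // reset the state flags
    in.seekg(0);               // now the file pointer really moves to the start
    result.ok_after_clear = static_cast<bool>(in.get(ch));
    result.first = ch;
    return result;
}
```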

Just as you have been doing with text files, once you have completed the necessary input and output activities associated with a binary file, always make it a point to close the file.

STYLE NOTE When naming binary data files, it is common to use the extension .dat, as illustrated in our code examples.

Section 17.2 Exercises

1.  What is wrong with the following statement?

ofstream data( "c:\\data.dat", ios::out & ios::app & ios::binary );

2.  Indicate which file mode(s) should be used to open a file in the following situations:

a.  To read from a binary file

b.  To write data at the end of a binary file

c.  To open a binary file for both reading and writing

3.  What is wrong with the following statement? Assume that you are attempting to open a binary file.

ifstream input( "binary.dat" );

4.  What header file(s) and namespace statements are required to make the following statement compile?

ifstream fin( "file.dat", ios::in );

Section 17.2 Learn by Doing Exercise

1.  Create a word-processing document that has your name in it. Write a program that opens this binary file in append mode, performing appropriate file operations such as checking that the file opened correctly and closing the file. Run your program. Reopen the file in your word processor and verify that the information is still there. Now modify your program to open the file in write mode. What should happen to your data? Verify your suspicions.

17.3 Binary File I/O

Reading from a binary file is usually accomplished with an ifstream object, while writing to a binary file typically uses an ofstream object. To facilitate reading from a binary file, we use the .read method, while output is accomplished with the .write method. The syntax for both methods is as follows:

istream & read( char * buffer, streamsize size );

ostream & write( const char * buffer, streamsize size );

The .read and .write stream member functions extract or insert a specified number of bytes, as indicated by the second parameter. The information to be written, or the buffer in which to store the information, is designated by the first parameter.

The first parameter is a pointer to the buffer: for .read, the memory that will receive the information, and for .write, the memory that holds the information to be written. Its address must be cast to a char * (i.e., a character pointer). Does this mean that we can only read or write characters? No. The first parameter is a character pointer because a character takes up a single byte of memory, so a char can stand in for a byte data type. Since both .read and .write manipulate a certain number of bytes, they expect the address of a block of memory in which every element is one byte long.

The second argument indicates how many bytes are to be read from or written to the stream. While reading, if the EOF marker is encountered before the specified number of bytes has been extracted, the read method stops and sets the eofbit and failbit state flags. The ifstream member function .gcount can be used to retrieve the number of bytes read during the last .read statement. The signature for the .gcount method follows. The return value, streamsize, is simply a typedef that equates to a signed integer.

streamsize gcount() const;
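To see .gcount at work, consider a read that asks for more bytes than the file holds. The sketch below is only an illustration; the function bytes_actually_read and its parameters are hypothetical names. It writes a known number of bytes, requests a possibly larger number, and returns what .gcount reports.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Writes `stored` bytes to `path`, then asks .read for `requested` bytes
// and returns the number of bytes that were actually extracted.
std::streamsize bytes_actually_read(const std::string& path, int stored, int requested) {
    assert(stored >= 0 && requested > 0);
    {
        std::ofstream out(path, std::ios::out | std::ios::binary | std::ios::trunc);
        const std::vector<char> data(stored, 'x');
        out.write(data.data(), stored);
    }   // leaving the block destroys `out`, which flushes and closes the file

    std::ifstream in(path, std::ios::in | std::ios::binary);
    std::vector<char> buffer(requested);
    in.read(buffer.data(), requested);   // may reach EOF before `requested` bytes
    return in.gcount();
}
```

With five bytes stored and eight requested, the function returns 5; when the file holds at least as many bytes as requested, it returns the requested count.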

It is important to remember that the contents of a block of memory can be written into a stream with simply one .write statement. Likewise, it is possible to read the entire contents of a stream with one .read statement. Retrieving or writing a block of information cuts down on the number of storage-device accesses, thus speeding up the I/O process. Example 17.3.1 shows a relatively complete program that includes functions designed to both write to and read from a binary file.

[Example 17.3.1: writing to and reading from a binary file]

Notice in Example 17.3.1 that the address of the array is typecast to a character pointer using reinterpret_cast, a form of casting that is probably unfamiliar to you. The reinterpret_cast is another form of C++ typecasting that is usually used to convert an address to a different type of address. Although discouraged by many programmers because it can be unsafe, it is a necessary evil in this case. Using a static_cast will not work: converting between unrelated pointer types with static_cast does not compile.

Another important aspect of Example 17.3.1 is that the structure contains only fixed-size data members. If the cString data member were to be replaced by a string, it would cause a number of problems. Remember that the string class contains a dynamic cString. Therefore, a member of the string class is a pointer. As we discussed in Chapter 13, writing a structure or class that has a pointer data member will create shallow copy issues. The same issues would arise if you were to use string data members in a structure used to write to a binary file—only the address would be written and not the data.
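The points above can be pulled together in a short sketch. It is not a reproduction of Example 17.3.1: the Student layout shown here and the function names are assumptions made for illustration. The structure holds only fixed-size members, the whole array is written with one .write call, and .gcount is used to work out how many complete records a single .read call retrieved.

```cpp
#include <cassert>
#include <cstring>
#include <fstream>
#include <string>

struct Student {            // hypothetical record with fixed-size members only
    char   name[20];        // a cString, not a string, so the text itself is stored
    int    id;
    double gpa;
};

Student make_student(const char* name, int id, double gpa) {
    Student s = {};                                   // zero-fill, so name stays terminated
    std::strncpy(s.name, name, sizeof(s.name) - 1);
    s.id = id;
    s.gpa = gpa;
    return s;
}

// Writes `count` records with a single .write call.
bool save_students(const std::string& path, const Student* students, int count) {
    assert(count >= 0);
    std::ofstream out(path, std::ios::out | std::ios::binary | std::ios::trunc);
    if (!out.is_open()) return false;
    out.write(reinterpret_cast<const char*>(students),
              static_cast<std::streamsize>(sizeof(Student)) * count);
    out.close();
    return !out.fail();
}

// Reads up to `capacity` records with a single .read call and
// returns the number of complete records that were read.
int load_students(const std::string& path, Student* students, int capacity) {
    assert(capacity >= 0);
    std::ifstream in(path, std::ios::in | std::ios::binary);
    if (!in.is_open()) return 0;
    in.read(reinterpret_cast<char*>(students),
            static_cast<std::streamsize>(sizeof(Student)) * capacity);
    return static_cast<int>(in.gcount() / static_cast<std::streamsize>(sizeof(Student)));
}
```

If name were declared as a string instead of a character array, the same two calls would still compile, but the file would hold the string object's internal pointer rather than the characters of the name.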

Section 17.3 Exercises

1.  True or false: Assuming that the file is open, it is possible to read an entire binary file with one statement.

2.  What does the .gcount member function do, and what is its return type?

3.  Write the statement necessary to read from a binary file and store the contents of a single structure into a variable.

Section 17.3 Learn by Doing Exercises

1.  Create a text file that has the following information: first name, last name, hourly wage, and hours worked. Write a program that reads the text file, storing this information in an array of structures, and then write the information to a binary file. Before any data from the text file is written to the binary file, write the number of records in your array. Be sure to use cStrings and not strings for the name data members.

2.  Create another program that reads the information stored in the binary file created in the previous exercise. Remember to read the number of records from the file and create a dynamic array to accommodate the data. Now read the entire data file into the dynamic array using only one read statement. Write the contents of the array to the screen.

3.  Modify the previous exercise so that the data is read one record at a time.

4.  Modify the previous exercise so that the first_name data member is five characters longer. Rerun your program and verify that the information stored in the arrays is invalid. Why did the data get corrupted?

17.4 Sequential Files Versus Random Access Files

So far our focus has been on reading and writing text files in sequential order. For example, when we read data from our files, the only way to get to the third record in the file is to physically read and process the first and second records. It is not possible to randomly jump to the fiftieth record without reading the first forty-nine. To update or modify a sequential file requires actually creating a new file into which all of the data is copied. While reading and writing records in sequential fashion works well for many applications, it is often faster and more efficient to be able to go directly to a specific location within a file. The capability to directly access a specific location within a file is the premise behind random access files. To perform random file access requires the ability to actually move the file position marker (FPM).

The following sections examine some of the options available for locating and moving the FPM. Before discussing the methods available to indicate where the FPM is currently located within a specific file, we need to provide some additional background information about stream objects. To begin with, it is important to note that all stream objects have their own internal marker for referencing a specific position within a file. For ifstream, this marker is sometimes referred to as the get pointer and simply points to the location of the data to be read. For instances of ofstream, this marker is called the put pointer and indicates the location within the file where data will be written.

17.4.1 Determining Current FPM Location

Two member functions are available to determine where the pointer is currently located within a file. The first method is called .tellg and is used with the get pointer; the second method is .tellp and is used with the put pointer. In the syntax that follows, notice that both functions return a data type called pos_type. This type is basically a long integer value indicating the current position—the number of bytes—of the FPM from the start of the file.

pos_type tellg();

pos_type tellp();

Example 17.4.1 demonstrates the use of the .tellg method. In this example, if the sizeof the Student structure is 56, the value returned by the .tellg method would be 112.

[Example 17.4.1: using the .tellg method]
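The relationship between the number of records read and the value reported by .tellg can be sketched as shown below. The Record structure and the function name are hypothetical; the idea is simply that after two records have been read, the get pointer sits two record sizes from the start of the file.

```cpp
#include <cassert>
#include <fstream>
#include <string>

struct Record {             // hypothetical fixed-size record
    char name[20];
    int  id;
};

// Writes three records, reads two of them back, and reports where
// the get pointer is located afterward.
std::streamoff position_after_two_reads(const std::string& path) {
    assert(!path.empty());
    {
        const Record records[3] = {};
        std::ofstream out(path, std::ios::out | std::ios::binary | std::ios::trunc);
        out.write(reinterpret_cast<const char*>(records), sizeof(records));
    }   // leaving the block destroys `out`, which closes the file

    std::ifstream in(path, std::ios::in | std::ios::binary);
    Record r = {};
    in.read(reinterpret_cast<char*>(&r), sizeof(Record));   // first record
    in.read(reinterpret_cast<char*>(&r), sizeof(Record));   // second record
    return in.tellg();      // bytes from the start of the file
}
```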

17.4.2 Moving the FPM

To provide random access of data in a file, the programmer must be able to move or change the position of the marker, or pointer, as needed. The member functions .seekg and .seekp allow the positioning of the pointer to any location within the file. It should come as no surprise that the .seekg method is used with the ifstream get pointer while the .seekp method is used with the ofstream put pointer.

Both .seekp and .seekg can be used with two different options. In the first option, the single parameter represents the number of bytes from the beginning of the file to where you want to move the FPM. This option follows; notice that the parameter type is the same as that used in the .tellg and .tellp methods previously discussed.

seekg( pos_type position );
seekp( pos_type position );

The second overloaded version of these methods requires two parameters. The first parameter represents the offset relative to some specified starting point, as indicated by the second parameter. This option is as follows:

seekg( off_type offset, ios_base::seekdir direction );
seekp( off_type offset, ios_base::seekdir direction );

Under the hood, off_type represents a signed integer, while the direction parameter is one of the enumerated values from the seekdir type. Table 17.4.1 shows the options available for use as the second parameter.

Example 17.4.2 shows two examples illustrating how the seek methods can be used to change the position of the pointer within a stream. In the first statement, the FPM is moved the size of two Student structures from the beginning of the stream. The second example sets the file marker 10 bytes toward the end of the file relative to the stream's current FPM.

Direction   Description
ios::beg   Seek (change the current read or write position) relative to the beginning of the file.
ios::cur   Seek relative to the current FPM position.
ios::end   Seek relative to the end of the file.

Table 17.4.1 Enumerations of seekdir
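Both forms of the seek methods can be sketched with a small file of fixed-size records. The listing that follows is an assumption-laden illustration rather than the chapter's own example: Record, kRecordSize, and the function names are hypothetical. It seeks an absolute distance from the beginning, a negative offset from the end, and 10 bytes forward from the current position.

```cpp
#include <cassert>
#include <fstream>
#include <string>

struct Record {             // hypothetical fixed-size record
    char name[20];
    int  id;
};

const std::streamoff kRecordSize = static_cast<std::streamoff>(sizeof(Record));

// Creates a file of `count` records whose ids are 100, 101, 102, ...
void write_records(const std::string& path, int count) {
    assert(count >= 0);
    std::ofstream out(path, std::ios::out | std::ios::binary | std::ios::trunc);
    for (int i = 0; i < count; ++i) {
        Record r = {};
        r.id = 100 + i;
        out.write(reinterpret_cast<const char*>(&r), sizeof(Record));
    }
}

// One-parameter form: an absolute position measured from the beginning.
int id_at_index(const std::string& path, int index) {
    assert(index >= 0);
    std::ifstream in(path, std::ios::in | std::ios::binary);
    in.seekg(kRecordSize * index);
    Record r = {};
    in.read(reinterpret_cast<char*>(&r), sizeof(Record));
    return r.id;
}

// Two-parameter form: a negative offset relative to the end of the file.
int id_of_last_record(const std::string& path) {
    std::ifstream in(path, std::ios::in | std::ios::binary);
    in.seekg(-kRecordSize, std::ios::end);
    Record r = {};
    in.read(reinterpret_cast<char*>(&r), sizeof(Record));
    return r.id;
}

// Two-parameter form: 10 bytes toward the end, relative to the current position.
std::streamoff skip_ten_from(const std::string& path, std::streamoff start) {
    std::ifstream in(path, std::ios::in | std::ios::binary);
    in.seekg(start);
    in.seekg(10, std::ios::cur);
    return in.tellg();
}
```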

Example 17.4.3 demonstrates a number of the methods discussed within this section. Notice that the instance of the fstream object was opened for both input and output. The code illustrates not only how to move the FPM to a specific record but also how to re-write a record in the file without altering or rewriting the other records.

[Example 17.4.3: random access with an fstream object]

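The fstream class is not a parent of the ifstream and ofstream classes. An ifstream is an istream and an ofstream is an ostream, whereas fstream derives from iostream, which combines the two. An fstream object therefore offers both the input members (.read, .seekg, .tellg) and the output members (.write, .seekp, .tellp). The fstream class is used because of the default modes of the other two classes: an ifstream is always opened with ios::in, and an ofstream is always opened with ios::out. To accomplish the task of both reading from and writing to the same stream in Example 17.4.3, we used the fstream class.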
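Rewriting a single record in place might look something like the following sketch, which opens an fstream for both input and output. It is not the code of Example 17.4.3; Record, create_file, read_id, and update_id are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <fstream>
#include <string>

struct Record {             // hypothetical fixed-size record
    char name[20];
    int  id;
};

// Creates a file of `count` records whose ids are 0, 1, 2, ...
void create_file(const std::string& path, int count) {
    assert(count >= 0);
    std::ofstream out(path, std::ios::out | std::ios::binary | std::ios::trunc);
    for (int i = 0; i < count; ++i) {
        Record r = {};
        r.id = i;
        out.write(reinterpret_cast<const char*>(&r), sizeof(Record));
    }
}

// Returns the id stored in record `index`, or -1 if it cannot be read.
int read_id(const std::string& path, int index) {
    std::ifstream in(path, std::ios::in | std::ios::binary);
    in.seekg(static_cast<std::streamoff>(sizeof(Record)) * index);
    Record r = {};
    if (!in.read(reinterpret_cast<char*>(&r), sizeof(Record))) return -1;
    return r.id;
}

// Reads record `index`, changes its id, and writes it back over the original.
bool update_id(const std::string& path, int index, int new_id) {
    assert(index >= 0);
    std::fstream file(path, std::ios::in | std::ios::out | std::ios::binary);
    if (!file.is_open()) return false;

    const std::streamoff offset = static_cast<std::streamoff>(sizeof(Record)) * index;
    Record r = {};
    file.seekg(offset, std::ios::beg);
    if (!file.read(reinterpret_cast<char*>(&r), sizeof(Record))) return false;

    r.id = new_id;
    file.seekp(offset, std::ios::beg);   // reposition before switching to output
    file.write(reinterpret_cast<const char*>(&r), sizeof(Record));
    file.close();
    return !file.fail();
}
```

Only the bytes of the chosen record are rewritten; the records before and after it are left untouched, which is the main attraction of random access over copying a sequential file.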

Section 17.4 Exercises

1.  Write the code necessary to move the FPM 10 bytes toward the end of the file from the current position within the file object, fin.

2.  Explain what .tellg and .tellp are used for.

3.  Explain what .seekg and .seekp are used for.

Section 17.4 Learn by Doing Exercise

1.  Rewrite the program created for Section 17.3 Learn by Doing Exercise 1 so that there is an additional data member in the structure called record_number. As the information is read from the text file into the array of structures, increment the record number so that each element of the array has a sequential value for the record number. Now create a program that accepts as input from the user the record number of the person he or she wishes to modify. Read that record only from the file and allow the user to change the data, then rewrite the information out to the file, overwriting the original data. Use the random access methods discussed in this section to accomplish these tasks.

17.5 C—The Differences

As with all I/O, there are numerous differences between C and C++. Although the syntax and functions differ, the concepts remain the same. The following sections present the syntax of these differences.

17.5.1 File Modes

In Chapter 11 we presented the syntax necessary to open files using C. The second parameter to fopen is a cString representing the file mode. The only file modes that were discussed were the r, w, and a (read, write, and append) modes. There are several other modes that can be used not only to allow access to binary files but also to allow random access techniques to be used. Table 17.5.1 shows these modes.

In addition to the random access modes listed in Table 17.5.1, a “b” can be appended to any C file mode cString to allow access to binary files. This is shown in Example 17.5.1.

Mode     Explanation
r+     Read and write. File must exist.
w+     Read and write. Will create file if possible. Existing data is destroyed.
a+     Read and append. Will create file if possible. Existing data is retained. All writing will be done at the end of the file.

Table 17.5.1 File modes

[Example 17.5.1: opening binary files in C]

17.5.2 Binary File I/O with C

The C functions used to read from and write to binary files are fread and fwrite. The syntax for these functions is as follows:

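size_t fread( void * buffer, size_t size, size_t count, FILE * stream );

size_t fwrite( const void * buffer, size_t size, size_t count, FILE * stream );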

As shown in the preceding function signatures, both of these functions require an address of the buffer in which the information is stored or will be stored. Unlike in C++, this address will not need to be typecast because the function accepts a void *, which you may remember is a pointer to anything. The next parameter is the size of one item that will be written or read. The third parameter is the number of items to be written or read, and the last parameter is the C file pointer. Both of these functions return the number of items, not bytes, completely read or written.

Example 17.5.2 demonstrates how to use fread and fwrite. This is a translation of Example 17.3.1 from C++ to C.

[Example 17.5.2: using fread and fwrite]

Identifier     Description
SEEK_CUR     Position FPM relative to the current position of the FPM.
SEEK_END     Position FPM relative to the end of the file.
SEEK_SET     Position FPM relative to the beginning of the file.

Table 17.5.2 The fseek function origin identifiers

Remember that the first parameter to fread and fwrite is the address of the data buffer. In Example 17.5.2, the buffer is an array, and the name of an array is already an address. If the buffer were a single variable rather than an array, the address-of operator (&) would need to be used.
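A minimal sketch of fread and fwrite follows. To stay consistent with the other listings in this chapter it is compiled as C++ through the cstdio header, but the calls are the standard C library functions, and in a C program the header would be stdio.h with the std:: prefixes removed. The Record structure and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdio>

struct Record {             // hypothetical fixed-size record
    char name[20];
    int  id;
};

// Writes `count` records and returns the number of items completely written.
std::size_t save_records(const char* path, const Record* records, std::size_t count) {
    assert(path != nullptr && records != nullptr);
    std::FILE* fp = std::fopen(path, "wb");          // "b" requests binary mode
    if (fp == nullptr) return 0;
    const std::size_t written = std::fwrite(records, sizeof(Record), count, fp);
    std::fclose(fp);
    return written;
}

// Reads up to `capacity` records and returns the number of items completely read.
std::size_t load_records(const char* path, Record* records, std::size_t capacity) {
    assert(path != nullptr && records != nullptr);
    std::FILE* fp = std::fopen(path, "rb");
    if (fp == nullptr) return 0;
    const std::size_t read = std::fread(records, sizeof(Record), capacity, fp);
    std::fclose(fp);
    return read;
}
```

Note that no cast is needed on the buffer argument, and that the return values count items rather than bytes.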

17.5.3 Random Access Functions

There are three C functions that are used to manipulate the FPM within binary files: rewind, ftell, and fseek. The rewind function moves the FPM back to the beginning of the file, ftell returns the number of bytes that the FPM is currently away from the beginning of the file, and fseek repositions the FPM within the file. The function signatures of these functions are as follows:

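void rewind( FILE * stream );

long ftell( FILE * stream );

int fseek( FILE * stream, long offset, int origin );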

The origin parameter of fseek can have one of three values, as shown in Table 17.5.2.

Example 17.5.3 translates Example 17.4.3 from C++ to C.

[Example 17.5.3: random access using the C functions]
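The three C random access functions can be sketched together as shown below. As with the other listings, this is an illustration under assumed names (Record, write_records, record_count, id_at, and update_id are hypothetical), compiled as C++ through cstdio although every file call is a standard C library function.

```cpp
#include <cassert>
#include <cstdio>

struct Record {             // hypothetical fixed-size record
    char name[20];
    int  id;
};

// Creates a file of `count` records whose ids are 0, 1, 2, ...
void write_records(const char* path, int count) {
    std::FILE* fp = std::fopen(path, "wb");
    if (fp == nullptr) return;
    for (int i = 0; i < count; ++i) {
        Record r = {};
        r.id = i;
        std::fwrite(&r, sizeof(Record), 1, fp);
    }
    std::fclose(fp);
}

// Uses fseek and ftell to find the file size, then rewind to return to the start.
long record_count(const char* path) {
    std::FILE* fp = std::fopen(path, "rb");
    if (fp == nullptr) return -1;
    std::fseek(fp, 0, SEEK_END);
    const long bytes = std::ftell(fp);
    std::rewind(fp);                       // FPM is back at the beginning
    std::fclose(fp);
    return bytes / static_cast<long>(sizeof(Record));
}

// Returns the id stored in record `index`, or -1 if it cannot be read.
int id_at(const char* path, long index) {
    std::FILE* fp = std::fopen(path, "rb");
    if (fp == nullptr) return -1;
    Record r = {};
    const long offset = index * static_cast<long>(sizeof(Record));
    const bool ok = std::fseek(fp, offset, SEEK_SET) == 0 &&
                    std::fread(&r, sizeof(Record), 1, fp) == 1;
    std::fclose(fp);
    return ok ? r.id : -1;
}

// Overwrites the id of record `index` in place; "r+b" allows reading and writing.
bool update_id(const char* path, long index, int new_id) {
    assert(index >= 0);
    std::FILE* fp = std::fopen(path, "r+b");
    if (fp == nullptr) return false;
    const long offset = index * static_cast<long>(sizeof(Record));
    Record r = {};
    bool ok = std::fseek(fp, offset, SEEK_SET) == 0 &&
              std::fread(&r, sizeof(Record), 1, fp) == 1;
    if (ok) {
        r.id = new_id;
        // Seek again before switching from reading to writing.
        ok = std::fseek(fp, offset, SEEK_SET) == 0 &&
             std::fwrite(&r, sizeof(Record), 1, fp) == 1;
    }
    std::fclose(fp);
    return ok;
}
```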

17.6 SUMMARY

Up to this point, all file-related I/O has involved sequential text files. This chapter focused on two additional file formats: binary files and random access files.

Unlike text files, a binary file takes the data directly from the computer's memory and writes it immediately into the stream. Likewise, when reading binary data from a file, the data can be placed directly into memory without the need to do any translation into a human-readable format.

To facilitate binary I/O, we introduced the .read and .write functions. In addition, we presented a function called .gcount, which can be used for retrieving the number of bytes read via the last read statement.

Another topic presented within this chapter centered on random access files. Random access allows the programmer to go directly to a specific location within a file. It is that ability to move the FPM as desired that provides the major framework behind this file format.

To support the ability to randomly access data, we discussed such C++ functions as .tellg and .tellp, both of which are used to determine where the file pointer is currently located. Two additional functions, .seekp and .seekg, allow the FPM to be moved to any desired location, helping to support the increase in data-access speed offered by random access files.

The capability to take a stream of data and read or write to an external device is an extremely important aspect of programming. Now that you have been exposed to a number of options for storing and accessing data, be sure to carefully evaluate the needs of your application when making decisions regarding which file organization will work the best for the task at hand.

17.7 Programming Exercise

1.  Using Microsoft Visual Studio as a model, create a hex dump utility. Prompt the user for the filename and then display the hexadecimal values as well as the translated ASCII characters.

17.8 Answers to Chapter Exercises

Section 17.1

1.  Two disadvantages of binary files are that they cannot be easily read by humans and that they are not always easily transferable from one machine to another.

2.  As the data from a text file is read or written, it is translated into either ASCII or Unicode characters, making the file easily read by humans. Binary files are written (and read) in such a way that no translation is required as the data is moved into or out of the memory or the file. However, the resulting file is not in a human-readable format.

3.  Hex dump refers to viewing the contents of a section of computer memory or of a binary file in hexadecimal format and providing the translation of those bytes that are recognizable as ASCII or Unicode characters.

4.  a.  text file

b.  binary file

c.  binary file

d.  text file

e.  text file

Section 17.2

1.  Both AND symbols ( & ) should be replaced by OR symbols ( | ).

2.  a.  ios::in | ios::binary

b.  ios::app | ios::binary

c.  ios::in | ios::out | ios::binary

3.  When opening binary files, remember to include the second parameter: ios::binary.

4.  #include <fstream>
using std::ifstream;
using std::ios;

Section 17.3

1.  True

2.  The .gcount member function retrieves the number of bytes read during the last read statement. It technically returns a streamsize, which is a typedef for a signed integer.

3.  fin.read( reinterpret_cast<char *> (&my_var), sizeof(MY_TYPE) );

Section 17.4

1.  fin.seekg( 10, ios::cur );

2.  Both .tellg and .tellp are used for determining where within the file the pointer is currently located.

3.  Both the .seekg and the .seekp methods are used to move the FPM.
