CHAPTER 12

Working with Files

If your computer could only ever process data stored within the main memory of the machine, the scope and variety of applications that you could deal with would be severely limited. Virtually all serious business applications require more data than would fit into main memory and depend on the ability to process data that's stored on an external device, such as a fixed disk drive. In this chapter, you'll explore how you can process data stored in files on an external device.

C provides a range of functions in the header file <stdio.h> for writing to and reading from external devices. The external device you would use for storing and retrieving data is typically a fixed disk drive, but not exclusively. Consistent with the philosophy of the C language, the library facilities that you'll use for working with files are device-independent, so they apply to virtually any external storage device. However, I'll assume in the examples in this chapter that we are dealing with disk files.

In this chapter you'll learn the following:

  • What a file is in C
  • How files are processed
  • How to write and read formatted files and binary files
  • How to retrieve data from a file by direct random access to the information
  • How to use temporary work files in a program
  • How to update binary files
  • How to write a file viewer program

The Concept of a File

With all the examples you've seen up to now, any data that the user enters when the program is executed is lost once the program finishes running. At the moment, if the user wants to run the program with the same data, he or she must enter it again each time. There are a lot of occasions when this is not only inconvenient, but also makes the programming task impossible.

If you want to maintain a directory of names, addresses, and telephone numbers, for instance, a program in which you have to enter all the names, addresses, and telephone numbers each time you run it is worse than useless! The answer is to store data on permanent storage that continues to be maintained after your computer is switched off. As I'm sure you know, this storage is called a file, and a file is usually stored on a hard disk.

You're probably familiar with the basic mechanics of how a disk works. If so, this can help you recognize when a particular approach to file usage is efficient and when it isn't. On the other hand, if you know nothing about disk file mechanics, don't worry at this point. There's nothing in the concept of file processing in C that depends on any knowledge of physical storage devices.

A file is essentially a serial sequence of bytes, as illustrated in Figure 12-1.


Figure 12-1 Structure of a file

Positions in a File

A file has a beginning and an end, and it has a current position, typically defined as so many bytes from the beginning, as Figure 12-1 illustrates. The current position is where any file action (a read from the file or a write to the file) will take place. You can move the current position to any other point in the file. A new current position can be specified as an offset from the beginning of the file or, in some circumstances, as a positive or negative offset from the previous current position.

File Streams

The C library provides functions for reading from and writing to data streams. A stream is an abstract representation of any external source or destination for data, so the keyboard, the command line on your display, and files on disk are all examples of streams. You therefore use the same input/output functions for reading and writing any external device that is mapped to a stream.

There are two ways of writing data to a stream that is a disk file. Firstly, you can write a file as a text file, in which case data is written as characters organized as lines, where each line is terminated by a newline character. Obviously, binary data such as values of type int or type double have to be converted to characters to allow them to be written to a text file, and you've already seen how this formatting is done with the printf() function. Secondly, you can write a file as a binary file. Data that is written to a binary file is always written as a series of bytes, exactly as it appears in memory, so a value of type double, for example, would be written as the 8 bytes that typically represent it in memory.

Of course, you can write any data you like to a file, but once a file has been written, it just consists of a series of bytes on disk. Regardless of whether you write a file as a binary file or as a text file, it ultimately ends up as just a series of bytes, whatever the data is. This means that when the file is read, the program must know what sort of data the file represents. You've seen many times now that exactly what a series of bytes represents is dependent upon how you interpret it. A sequence of 12 bytes in a binary file could be 12 characters, 12 8-bit signed integers, 12 8-bit unsigned integers, 6 16-bit signed integers, a 32-bit integer followed by an 8-byte floating-point value, and so on. All of these will be more or less valid interpretations of the data, so it's important that a program that is reading a file has the correct assumptions about how it was written.
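To make this concrete, here's a minimal sketch of how the same bytes can be read as different types. The function names bytes_as_uint32() and bytes_as_uint16() are my own inventions for this illustration, and I'm assuming the fixed-width types from <stdint.h> are available with your compiler. The values you get back depend on the byte order of your machine, which is exactly why the reader of a file has to know how it was written.

```c
#include <stdint.h>
#include <string.h>

/* Interpret the first 4 bytes of a buffer as one 32-bit unsigned integer. */
/* memcpy() is used so there are no alignment problems.                    */
uint32_t bytes_as_uint32(const unsigned char *bytes)
{
  uint32_t value = 0;
  memcpy(&value, bytes, sizeof value);
  return value;
}

/* Interpret the first 2 bytes of the same buffer as a 16-bit integer */
uint16_t bytes_as_uint16(const unsigned char *bytes)
{
  uint16_t value = 0;
  memcpy(&value, bytes, sizeof value);
  return value;
}
```

Passing the same array to both functions gives two quite different numbers, even though the bytes themselves haven't changed.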

Accessing Files

The files that are resident on your disk drive each have a name, and the rules for naming files will be determined by your operating system. When you write a program to process a file, it would not be particularly convenient if the program would only work with a specific file with a particular name. If it did, you would need to produce a different program for each file you might want to process. For this reason, when you process a file in C, your program references a file through a file pointer. A file pointer is an abstract pointer that is associated with a particular file when the program is run so that the program can work with different files on different occasions. A file pointer points to a struct that represents a stream. In the examples in this chapter, I'll use Microsoft Windows file names. If you're using a different operating system environment, such as UNIX, you'll need to adjust the names of the files appropriately.

If you want to use several files simultaneously in a program, you need a separate file pointer for each file, although as soon as you've finished using one file, you can associate the file pointer you were using with another file. So if you need to process several files, but you'll be working with them one at a time, you can do it with one file pointer.

Opening a File

You associate a specific external file name with an internal file pointer variable through a process referred to as opening a file. You open a file by calling the standard library function fopen(), which returns the file pointer for a specific external file. The function fopen() is defined in <stdio.h>, and it has this prototype:

FILE *fopen(const char *name, const char *mode);

The first argument to the function is a pointer to a string that is the name of the external file that you want to process. You can specify the name explicitly as an argument, or you can use an array, or a variable of type pointer to char that contains the address of the character string that defines the file name. You would typically obtain the file name through some external means, such as from the command line when the program is started, or you could arrange to read it in from the keyboard. Of course, you can also define a file name as a constant at the beginning of a program when the program always works with the same file.

The second argument to the fopen() function is a character string called the file mode that specifies what you want to do with the file. As you'll see, this spans a whole range of possibilities, but for the moment I'll introduce just three file modes (which nonetheless comprise the basic set of operations on a file). Table 12-1 lists these three file modes.

Table 12-1. File Modes

Mode   Description
"w"    Open a text file for write operations. If the file exists, its current contents are discarded.
"a"    Open a text file for append operations. All writes are to the end of the file.
"r"    Open a text file for read operations.

Note Notice that a file mode specification is a character string between double quotes, not a single character between single quotes.


These three modes apply only to text files, which are files that are written as characters. You can also work with binary files that are written as a sequence of bytes, and I'll discuss that in the section "Binary File Input and Output" later in this chapter. Assuming the call to fopen() is successful, the function returns a pointer of type FILE * that you can use to reference the file in further input/output operations, using other functions in the library. If the file cannot be opened for some reason, fopen() returns a null pointer.


Note The pointer returned by fopen() is referred to as a file pointer, or a stream pointer.


So a call to fopen() does two things for you: it creates a file pointer that identifies the specific file on disk that your program is going to operate on, and it determines what you can do with that file within your program.

The pointer that's returned by fopen() is of type FILE * or "pointer to FILE," where FILE specifies a structure type that has been predefined in the header file <stdio.h> through a typedef. The structure that a file pointer points to will contain information about the file. This will be such things as the open mode you specified, the address of the buffer in memory to be used for data, and a pointer to the current position in the file for the next operation. You don't need to worry about the contents of this structure in practice. It's all taken care of by the input/output functions. However, if you really want to know about the FILE structure, you can browse through the library header file.

As I mentioned earlier, when you want to have several files open at once, they must each have their own file pointer variable declared, and you open each of them with a separate call to fopen(), storing the value that is returned in a separate file pointer. There's a limit to the number of files you can have open at one time that will be determined by the value of the constant FOPEN_MAX that's defined in <stdio.h>. FOPEN_MAX is an integer that specifies the maximum number of streams that can be open at one time. The C language standard requires that the value of FOPEN_MAX be at least 8, including stdin, stdout, and stderr. Thus, as a minimum, you will be able to work with up to 5 files simultaneously.

If you want to write to an existing text file with the name myfile.txt, you would use this statement:

FILE *pfile = fopen("myfile.txt", "w");  /* Open file myfile.txt to write it */

This statement opens the file and associates the physical file specified by the file name myfile.txt with your internal pointer pfile. Because you've specified the mode as "w", you can only write to the file; you can't read from it. The string that you supply as the first argument is limited to a maximum of FILENAME_MAX characters, where FILENAME_MAX is defined in the <stdio.h> header file. This value is usually large enough that it isn't a real restriction.

If a file with the name myfile.txt does not already exist, the call to the function fopen() in the previous statement will create a new file with this name. Because you have just provided the file name without any path specification as the first argument to the fopen() function, the file is assumed to be in the current directory, and if the file is not found there, that's where it will be created. You can also specify a string that is the full path and name for the file, in which case the file will be assumed to be at that location and a new file will be created there if necessary. Note that if the directory that's supposed to contain the file doesn't exist when you specify the file path, neither the directory nor the file will be created and the fopen() call will fail. If the call to fopen() does fail for any reason, NULL will be returned. If you then attempt further operations with a NULL file pointer, the behavior is undefined and your program will most likely crash.


Note So here you have the facility to create a new text file. Simply call fopen() with mode "w" and the first argument specifying the name you want to assign to the new file.


On opening a file for writing with mode "w", the length of any existing file is truncated to zero and the file is positioned at the beginning for the first operation. This means that any data that was previously written to the file is discarded as soon as the file is opened.

If you want to add to an existing text file rather than overwrite it, you specify mode "a", which is the append mode of operation. This positions the file at the end of any previously written data. If the file specified doesn't exist, as in the case of mode "w", a new file will be created. Using the file pointer that you declared previously, to open the file to add data to the end, use the following statement:

pfile = fopen("myfile.txt", "a");      /* Open file myfile.txt to add to it */

When you open a file in append mode, every write operation takes place at the end of the data in the file. In other words, all write operations append data to the file and you cannot update the existing contents in this mode.

If you want to read a file, once you've declared your file pointer, open it using this statement:

pfile = fopen("myfile.txt", "r");

Because you've specified the mode argument as "r", indicating that you want to read the file, you can't write to this file. The file position will be set to the beginning of the data in the file.

Clearly, if you're going to read the file, it must already exist. If you inadvertently try to open a file for reading that doesn't exist, fopen() will return NULL. It's therefore a good idea to check the value returned from fopen() in an if statement, to make sure that you really are accessing the file you want.
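The check I have in mind might look something like the following sketch. The function name open_for_reading() is hypothetical, chosen just for this illustration; it wraps the call to fopen() and reports a failure so the caller only has to test for NULL.

```c
#include <stdio.h>

/* Try to open a text file for reading.                            */
/* Returns the file pointer, or NULL if the file can't be opened. */
FILE *open_for_reading(const char *filename)
{
  FILE *pfile = fopen(filename, "r");
  if(pfile == NULL)
    fprintf(stderr, "Cannot open %s for reading.\n", filename);
  return pfile;
}
```

You would call this as pfile = open_for_reading("myfile.txt"); and only go on to read from the file if pfile is not NULL.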

Renaming a File

There are many circumstances in which you'll want to rename a file. You might be updating the contents of a file by writing a new, updated file, for instance. You'll probably want to assign a temporary name to the new file while you're creating it, and then change the name to that of the old file once you've deleted the old file. Renaming a file is very easy. You just use the rename() function, which has the following prototype:

int rename(const char *oldname, const char *newname);

The integer that's returned will be 0 if the name change is successful, and nonzero otherwise. The file should be closed when you call rename(); on many systems the operation will fail if the file is still open.

Here's an example of using the rename() function:

if(rename("C:\\temp\\myfile.txt", "C:\\temp\\myfile_copy.txt"))
  printf("Failed to rename file.\n");
else
  printf("File renamed successfully.\n");

The preceding code fragment will change the name of the myfile.txt file in the temp directory on drive C to myfile_copy.txt. A message will be produced that indicates whether the name change succeeded. Obviously, if the file path is incorrect or the file doesn't exist, the renaming operation will fail.


Caution Note the double backslash in the file path string. If you forget to use the escape sequence for a backslash when specifying a Microsoft Windows file path you won't get the file name that you want.


Closing a File

When you've finished with a file, you need to tell the operating system that this is the case and free up your file pointer. This is referred to as closing a file. You do this by calling the fclose() function which accepts a file pointer as an argument and returns a value of type int, which will be EOF if an error occurs and 0 otherwise. The typical usage of the fclose() function is as follows:

fclose(pfile);                         /* Close the file associated with pfile */

The result of executing this statement is that the connection between the pointer, pfile, and the physical file name is broken, so pfile can no longer be used to access the physical file it represented. If the file was being written, the current contents of the output buffer are written to the file to ensure that data isn't lost.


Note EOF is a special value that signals end-of-file; it isn't a character. The symbol EOF is defined in <stdio.h> as a negative integer of type int and is usually equivalent to the value −1. However, this isn't necessarily always the case, so you should use EOF in your programs rather than an explicit value. EOF generally indicates that no more data is available from a stream.


It's good programming practice to close a file as soon as you've finished with it. This protects against output data loss, which could occur if an error in another part of your program caused the execution to be stopped in an abnormal fashion. This could result in the contents of the output buffer being lost, as the file wouldn't be closed properly. You must also close a file before attempting to rename it or remove it.
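Here's a small sketch of opening and closing a file with both return values checked. The function create_empty_file() is a name I've made up for the purpose; it simply creates a new, empty text file (or empties an existing one) and closes it again straight away.

```c
#include <stdio.h>

/* Create (or truncate) a text file and close it immediately.    */
/* Returns 0 on success, -1 if either fopen() or fclose() fails. */
int create_empty_file(const char *filename)
{
  FILE *pfile = fopen(filename, "w");
  if(pfile == NULL)
    return -1;                         /* The file couldn't be opened   */

  if(fclose(pfile) == EOF)
    return -1;                         /* The file couldn't be closed   */

  return 0;
}
```

Checking the value returned by fclose() matters most for files you've written, because that's the point at which the last of the buffered output reaches the file.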


Note Another reason for closing files as soon as you've finished with them is that the operating system will usually limit the number of files you may have open at one time. Closing files as soon as you've finished with them minimizes the chances of you falling afoul of the operating system in this respect.


There is a function in <stdio.h> that will force any unwritten data left in a buffer to be written to a file. This is the function fflush(), which you've already used in previous chapters to flush the input buffer. (Be aware that the C standard defines fflush() only for output streams; flushing an input stream works with some libraries but isn't portable.) With your file pointer pfile, you could force any data left in the output buffer to be written to the file by using this statement:

fflush(pfile);

The fflush() function returns a value of type int, which is normally 0 but will be set to EOF if an error occurs.

Deleting a File

Because you have the ability to create a file in your code, at some point you'll want to be able to delete a file programmatically, too. The remove() function that's declared in <stdio.h> does this. You use it like this:

remove("pfile.txt");

This will delete the file from the current directory that has the name pfile.txt. Note that the file should not be open when you call remove() to delete it. If the file is open, the effect of calling remove() is implementation-defined, so consult your library documentation.

You always need to double-check any operations on files, but you need to take particular care with operations that delete files.
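One way to take that care is to test the value that remove() returns, which is 0 when the file was deleted and nonzero otherwise. The following is just a sketch, and delete_file() is a hypothetical helper name of my own.

```c
#include <stdio.h>

/* Delete a file and report on stderr if the deletion fails. */
/* Returns 0 if the file was deleted, -1 otherwise.          */
int delete_file(const char *filename)
{
  if(remove(filename) != 0)
  {
    fprintf(stderr, "Could not delete %s.\n", filename);
    return -1;
  }
  return 0;
}
```

Calling delete_file() for a file that doesn't exist produces the message and returns -1, so your program can decide what to do next rather than carrying on regardless.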

Writing to a Text File

Once you've opened a file for writing, you can write to it any time from anywhere in your program, provided you have access to the pointer for the file that has been set by fopen(). So if you want to be able to access a file from anywhere in a program that contains multiple functions, you need to ensure the file pointer has global scope or arrange for it to be passed as an argument to any function that accesses the file.


Note As you'll recall, to ensure that the file pointer has global scope you place the declaration for it outside of all of the functions, usually at the beginning of the source file.


The simplest write operation is provided by the function fputc(), which writes a single character to a text file. It has the following prototype:

int fputc(int c, FILE *pfile);

The fputc() function writes the character specified by the first argument to the file defined by the second argument, which is a file pointer. If the write is successful, it returns the character that was written. Otherwise it returns EOF.

In practice, characters aren't written to the physical file one by one. This would be extremely inefficient. Hidden from your program and managed by the output routine, output characters are written to an area of memory called a buffer until a reasonable number have been accumulated; they are then all written to the file in one go. This mechanism is illustrated in Figure 12-2.


Figure 12-2 Writing a file

Note that the putc() function is equivalent to fputc(). It requires the same arguments and the return type is the same. The difference between them is that putc() may be implemented in the standard library as a macro, whereas fputc() is definitely a function.
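As an illustration of fputc() in use, here's a sketch of a function that writes a string to a new text file one character at a time. The name write_chars() is my own, and the function assumes that the string you pass is terminated by '\0' in the usual way.

```c
#include <stdio.h>

/* Write each character of a string to a text file using fputc().   */
/* Returns the number of characters written, or -1 if anything fails. */
long write_chars(const char *filename, const char *text)
{
  FILE *pfile = fopen(filename, "w");
  if(pfile == NULL)
    return -1;

  long count = 0;
  for(const char *p = text ; *p != '\0' ; ++p)
  {
    if(fputc(*p, pfile) == EOF)        /* EOF signals a write error    */
    {
      fclose(pfile);
      return -1;
    }
    ++count;
  }

  if(fclose(pfile) == EOF)             /* Closing flushes the buffer   */
    return -1;
  return count;
}
```

Although the loop calls fputc() once per character, the buffering just described means the disk is accessed far less often than that.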

Reading from a Text File

The fgetc() function is complementary to fputc() and reads a character from a text file that has been opened for reading. It takes a file pointer as its only argument and returns the character read as type int if the read is successful; otherwise, it returns EOF. The typical use of fgetc() is illustrated by the following statement:

mchar = fgetc(pfile);                  /* Reads a character into mchar */

You're assuming here that the variable mchar has been declared to be of type int.

Behind the scenes, the actual mechanism for reading a file is the inverse of writing to a file. A whole block of characters is read into a buffer in one go. The characters are then handed over to your program one at a time as you request them, until the buffer is empty, whereupon another block is read. This makes the process very fast, because most fgetc() operations won't involve reading the disk but simply moving a character from the buffer in main memory to the place where you want to store it.

Note that the function getc() that's equivalent to fgetc() is also available. It requires an argument of type FILE* and returns the character read as type int, so it's virtually identical to fgetc(). The only difference between them is that getc() may be implemented as a macro, whereas fgetc() is a function.
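Putting fgetc() and fputc() together, you could copy one text file to another along the lines of the following sketch. The function name copy_text_file() is hypothetical. Notice that the variable receiving the character is of type int, not char, so that EOF can be distinguished from every valid character.

```c
#include <stdio.h>

/* Copy a text file one character at a time.                      */
/* Returns the number of characters copied, or -1 if an operation */
/* on either file fails.                                          */
long copy_text_file(const char *source, const char *target)
{
  FILE *pin = fopen(source, "r");
  if(pin == NULL)
    return -1;

  FILE *pout = fopen(target, "w");
  if(pout == NULL)
  {
    fclose(pin);
    return -1;
  }

  long count = 0;
  int ch = 0;                          /* int so that EOF is detectable */
  while((ch = fgetc(pin)) != EOF)
  {
    if(fputc(ch, pout) == EOF)
    {
      fclose(pin);
      fclose(pout);
      return -1;
    }
    ++count;
  }

  fclose(pin);
  if(fclose(pout) == EOF)
    return -1;
  return count;
}
```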


Caution Don't confuse the function getc() with the function gets(). They're quite different in operation: getc() reads a single character from the stream specified by its argument, whereas gets() reads a whole line of input from the standard input stream, which is usually the keyboard. You've already used the gets() function in previous chapters for reading a string from the keyboard. Bear in mind that gets() has no way to limit the amount of input it stores and was removed from the language in the C11 standard, so fgets() is the safer choice in new code.


Writing Strings to a Text File

Analogous to the puts() function for writing a string to stdout, you have the fputs() function for writing a string to a text file. Its prototype is as follows:

int fputs(const char *pstr, FILE *pfile);

The first argument is a pointer to the character string that's to be written to the file, and the second argument is a file pointer. The operation of the function is slightly odd, in that it continues to write characters from a string until it reaches a '\0' character, which it doesn't write to the file. This can complicate reading back variable-length strings from a file that have been written by fputs(). It works this way because it's a character write operation, not a binary write operation, so it's expecting to write a line of text that has a newline character at the end. A newline character isn't required by the operation of the function, but it's very helpful when you want to read the file back (using the complementary fgets() function, as you'll see).

The fputs() function returns EOF if an error occurs, and a nonnegative value under normal circumstances. You use it in the same way as puts(), for example

fputs("The higher the fewer", pfile);

This will output the string appearing as the first argument to the file pointed to by pfile.
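Because fputs() doesn't add a newline for you, a sketch like the following can be handy when you want each string on a line of its own. The function append_line() is a hypothetical helper of mine that opens the file in append mode, writes the string followed by '\n', and closes the file again.

```c
#include <stdio.h>

/* Append a string to a text file as a complete line.  */
/* Returns 0 on success, -1 if any operation fails.    */
int append_line(const char *filename, const char *line)
{
  FILE *pfile = fopen(filename, "a");  /* Creates the file if necessary */
  if(pfile == NULL)
    return -1;

  /* Write the text, then the newline that fputs() doesn't supply */
  int ok = fputs(line, pfile) != EOF && fputc('\n', pfile) != EOF;

  if(fclose(pfile) == EOF)
    ok = 0;
  return ok ? 0 : -1;
}
```

Calling append_line("myfile.txt", "The higher the fewer"); several times leaves one copy of the text on each line of the file.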

Reading Strings from a Text File

Complementing fputs() is the function fgets() for reading a string from a text file. It has the following prototype:

char *fgets(char *pstr, int nchars, FILE *pfile);

The fgets() function has three parameters. The function will read a string into the memory area pointed to by pstr, from the file specified by pfile. Characters are read from the file until either a '\n' is read or nchars-1 characters have been read from the file, whichever occurs first.

If a newline character is read, it's retained in the string. A '\0' character will be appended to the end of the string in any event. If there is no error, fgets() will return the pointer, pstr; otherwise, NULL is returned. NULL is also returned if the end of the file is reached before any characters have been read. The second argument to this function enables you to ensure that you don't overrun the memory area that you've assigned for input in your program. To prevent the capacity of your data input area from being exceeded, just specify the length of the area or the array that will receive the input data as the second argument to the function.
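The following sketch shows fgets() reading the first line of a file into a buffer supplied by the caller. The function read_first_line() is a name I've invented for the illustration; it also removes the newline that fgets() retains, which is often what you want once the line is in memory.

```c
#include <stdio.h>
#include <string.h>

/* Read the first line of a text file into buffer, which holds size chars. */
/* Any trailing newline is removed. Returns 0 on success, -1 otherwise.    */
int read_first_line(const char *filename, char *buffer, int size)
{
  FILE *pfile = fopen(filename, "r");
  if(pfile == NULL)
    return -1;

  char *result = fgets(buffer, size, pfile);  /* At most size-1 characters */
  fclose(pfile);
  if(result == NULL)
    return -1;                         /* Error, or the file is empty   */

  char *pnewline = strchr(buffer, '\n');
  if(pnewline != NULL)
    *pnewline = '\0';                  /* Overwrite the newline         */
  return 0;
}
```

If the line is longer than the buffer, you get just the first size-1 characters, and the rest of the line remains in the file to be read by a subsequent call to fgets().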

Formatted File Input and Output

Writing characters and strings to a text file is all very well as far as it goes, but you normally have many other types of data in your programs. To write numerical data to a text file, you need something more than you've seen so far, and where the contents of a file are to be human readable, you need a character representation of the numerical data. The mechanism for doing just this is provided by the functions for formatted file input and output.

Formatted Output to a File

You already encountered the function for formatted output to a file when I discussed standard streams back in Chapter 10. It's virtually the same as the printf() function, except that there's one extra parameter and a slight name change. Its typical usage is the following:

fprintf(pfile, "%12d%12d%14f", num1, num2, fnum1);

As you can see, the function name has an additional f (for file), compared with printf(), and the first argument is a file pointer that specifies the destination of the data to be written. The file pointer obviously needs to be set through a call to fopen() first. The remaining arguments are identical to those of printf(). This example writes the values of the three variables num1, num2, and fnum1 to the file specified by the file pointer pfile, under control of the format string specified as the second argument. Therefore, the first two variables are of type int and are to be written with a field width of 12, and the third variable is of type float and is to be written to the file with a field width of 14.

Formatted Input from a File

You get formatted input from a file by using the function fscanf(). To read three variable values from a file pfile you would write this:

fscanf(pfile, "%12d%12d%14f", &num1, &num2, &fnum1);

This function works in exactly the same way as scanf() does with stdin, except that here you're obtaining input from a file specified by the first argument. The same rules govern the specification of the format string and the operation of the function as apply to scanf(). The function returns EOF if an error occurs or the end of the file is reached before any input is read; otherwise, it returns the number of values read and stored as a value of type int.
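To see the two functions working as a pair, here's a sketch that writes three values to a file and reads them back using the same format string as the statements above. The function names save_values() and load_values() are hypothetical. The sketch assumes the values fit within the field widths; a float value too large to be written in 14 characters would not be read back correctly with this format.

```c
#include <stdio.h>

/* Write two integers and a float to a text file in fixed-width fields. */
/* Returns 0 on success, -1 on failure.                                 */
int save_values(const char *filename, int num1, int num2, float fnum1)
{
  FILE *pfile = fopen(filename, "w");
  if(pfile == NULL)
    return -1;

  int written = fprintf(pfile, "%12d%12d%14f", num1, num2, fnum1);
  if(fclose(pfile) == EOF || written < 0)  /* fprintf() returns < 0 on error */
    return -1;
  return 0;
}

/* Read the three values back. Returns 0 only if all three were read. */
int load_values(const char *filename, int *pnum1, int *pnum2, float *pfnum1)
{
  FILE *pfile = fopen(filename, "r");
  if(pfile == NULL)
    return -1;

  int count = fscanf(pfile, "%12d%12d%14f", pnum1, pnum2, pfnum1);
  fclose(pfile);
  return count == 3 ? 0 : -1;          /* fscanf() returns the number stored */
}
```

Comparing the value returned by fscanf() with the number of values you expected is a simple way to detect a file that doesn't have the format you assumed.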

Dealing with Errors

The examples in this book have included minimal error checking and reporting because the code for comprehensive error checking and reporting tends to take up a lot of space in the book and make the programs look rather more complicated than they really are. In real-world programs, however, it's essential that you do as much error checking and reporting as you can.

Generally, you should write your error messages to stderr, which is automatically available to your program and is normally connected to your display screen. Even though stdout may be redirected to a file by an operating system command, stderr continues to be assigned to the screen unless it is redirected separately. It's important to check that a file you want to read does in fact exist and you have been doing this in the examples, but there's more that you can do. First of all, you can write error messages to stderr rather than stdout, for example

char *filename = "C:\\MYFILE.TXT";    /* File name    */
FILE *pfile = NULL;                    /* File pointer */

if(!(pfile = fopen(filename, "r")))
{
  fprintf(stderr, "\nCannot open %s to read it.", filename);
  exit(1);
}

The merit of writing to stderr is that the stream isn't buffered by default, so output is written to the display device immediately. This means that you'll see the output directed to stderr even if the program fails shortly afterward. The stdout stream is usually buffered, so there is the risk that data could be left in the buffer and never displayed if your program crashes. Terminating a program by calling exit() ensures that output stream buffers will be flushed so output will be written to the ultimate destination. Redirecting stdout to a file has no effect on stderr, which is a separate stream, so error messages continue to appear on the display unless stderr is redirected explicitly.

Knowing that some kind of error occurred is useful, but you can do more than this. The perror() function outputs a string that you pass as an argument plus an implementation-defined error message corresponding to the error that occurred. You could therefore rewrite the previous fragment as follows:

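if(!(pfile = fopen(filename, "r")))
{
  char message[FILENAME_MAX + 20];     /* Buffer for the complete message */
  snprintf(message, sizeof(message), "Error opening %s", filename);
  perror(message);
  exit(1);
}

This will output your message, consisting of the text "Error opening " followed by the file name, plus a system-generated message relating to the error. The message is assembled in a separate array using snprintf() because strcat() cannot safely append to a string literal. The output will be written to stderr.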

If an error occurs when you're reading a file, you can check whether the error is due to reaching the end of file. The feof() function will return a nonzero integer if the end of file has been reached, so you can check for this with statements such as these:

if(feof(pfile))
  printf("End of file reached.");

Note that I didn't write the message to stderr here because reaching the end of the file isn't necessarily an error.

The ferror() function returns a nonzero integer if an error occurs with an operation on the stream that's identified by the file pointer that you pass as the argument. Calling this function enables you to establish positively that an error did occur. The <errno.h> header file defines a value with the name errno that may indicate what kind of file error has occurred. You need to read the documentation for your C implementation to find out the specifics of this. The value of errno may be set for errors other than just file operations.
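The following sketch shows one way of using ferror() and feof() together once a read loop has ended. The function read_to_end() is hypothetical; it reads a file to the end and then establishes whether the loop stopped because of a genuine error or simply because there was no more data. Note that whether fopen() sets errno, and therefore how informative the perror() message is, depends on your implementation.

```c
#include <stdio.h>

/* Read a text file to the end, counting the characters.           */
/* Stores the count in *pcount and returns 0 if the end of file    */
/* was reached normally; returns -1 if the file can't be opened or */
/* a read error occurs.                                            */
int read_to_end(const char *filename, long *pcount)
{
  FILE *pfile = fopen(filename, "r");
  if(pfile == NULL)
  {
    perror(filename);                  /* Message content is implementation-defined */
    return -1;
  }

  long count = 0;
  while(fgetc(pfile) != EOF)           /* EOF means end of file OR an error */
    ++count;

  int status = 0;
  if(ferror(pfile))                    /* A real error stopped the loop */
  {
    fprintf(stderr, "Error reading %s.\n", filename);
    status = -1;
  }
  else if(feof(pfile))                 /* Normal end of file            */
    *pcount = count;

  fclose(pfile);
  return status;
}
```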

You should always include some basic error checking and reporting code in all of your programs. Once you've written a few programs, you'll find that including some standard bits of code for each type of operation warranting error checks is no hardship. With a standard approach, you can copy most of what you need from one program to another.

Further Text File Operation Modes

Text mode is the default mode of operation with the open modes you have seen up to now, but in earlier versions of C you could specify explicitly that a file is to be opened in text mode. You could do this by adding t to the end of the existing specifiers. This gives you the mode specifiers "wt", "rt", and "at" in addition to the original three. I am only mentioning this because you may come across it in other C programs. Although most compilers will support this, it's not specifically part of the current C standard so it is best not to use this option in your code.

You can also open a text file for update—that is, for both reading and writing—using the specifier "r+". You can also specify the open mode as "w+" if you want to both read and write a new file, or when you want to discard the original contents of an existing file before you start. Opening a file with the mode "w+" truncates the length of an existing file to zero, so only use this mode when you want to discard the current file contents. In older programs you may come across these modes written as "rt+" or "r+t" and "wt+" or "w+t".

As I've said, in update mode you can both read and write a text file. However, you can't write to the file immediately after reading it or read from the file immediately after writing it, unless the EOF has been reached or the position in the file has been changed by some means. (This involves calling a function such as rewind() or some other function that modifies the file position.) The reason for this is that writing to a file doesn't necessarily write the data to the external device. It simply transfers it to a buffer in memory that's written to the file once it's full, or when some other event causes it to be written. Similarly, the first read from a file will fill a buffer area in memory, and subsequent reads will transfer data from the buffer until it's empty, whereupon another file read to fill the buffer will be initiated. This is illustrated in Figure 12-3.


Figure 12-3 Buffered input operations

This means that if you were able to switch immediately from write mode to read mode, data would be lost because it would be left in the buffer. In the case of switching from read mode to write mode, the current position in the file may be different from what you imagine it to be, and you may inadvertently overwrite data on the file. A switch from read to write or vice versa, therefore, requires an intervening event that implicitly flushes the buffers. The fflush() function will cause the bytes remaining in an output buffer for the stream you pass as the argument to be written to an output file.
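As a minimal sketch of this rule, the following function writes a string to a file opened with "w+", calls rewind() as the intervening positioning operation, and then reads the data back through the same stream. The function name roundtrip_line() and its parameters are hypothetical names chosen for this illustration.

```c
#include <stdio.h>

/* Write text to a new file, then read it back through the same stream.
   Returns 1 on success and 0 on failure. (Hypothetical helper.) */
int roundtrip_line(const char *filename, const char *text,
                   char *buffer, int bufsize)
{
  FILE *pfile = fopen(filename, "w+");      /* Create file for update      */
  if(pfile == NULL)
    return 0;

  int ok = fputs(text, pfile) != EOF;       /* Write: data goes to buffer  */

  /* A positioning call is required between the write and the read */
  rewind(pfile);                            /* Flush, go back to the start */

  if(ok && fgets(buffer, bufsize, pfile) == NULL)
    ok = 0;

  fclose(pfile);
  remove(filename);                         /* Delete the demo file        */
  return ok;
}
```

Calling fflush(pfile) or fseek() in place of rewind() would also satisfy the requirement when switching from writing to reading, although fflush() alone leaves the position at the end of the data just written.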

Binary File Input and Output

The alternative to text mode operations on a file is binary mode. In this mode, no transformation of the data takes place, and there's no need for a format string to control input or output, so it's much simpler than text mode. The binary data as it appears in memory is transferred directly to the file. Characters such as '\n' and '\0' that have specific significance in text mode are of no consequence in binary mode.

Binary mode has the advantage that no data is transformed or precision lost, as can happen with text mode due to the formatting process. It's also somewhat faster than text mode because no transformation operations are performed. The two modes are contrasted in Figure 12-4.


Figure 12-4 Contrasting binary mode and text mode
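The point about precision can be illustrated with a small sketch. The two functions below, text_roundtrip() and binary_roundtrip(), are hypothetical helpers that each store one double value in a temporary file and read it back, the first through formatted output with "%f" and the second with fwrite() and fread(). A temporary file from tmpfile() is assumed to be available.

```c
#include <stdio.h>

/* Write value as text with "%f", read it back, return what was read */
double text_roundtrip(double value)
{
  double result = 0.0;
  FILE *pfile = tmpfile();
  if(pfile == NULL)
    return result;
  fprintf(pfile, "%f", value);              /* Formatted to 6 decimals     */
  rewind(pfile);
  if(fscanf(pfile, "%lf", &result) != 1)
    result = 0.0;
  fclose(pfile);
  return result;
}

/* Write the bytes of value with fwrite(), read them back with fread() */
double binary_roundtrip(double value)
{
  double result = 0.0;
  FILE *pfile = tmpfile();
  if(pfile == NULL)
    return result;
  fwrite(&value, sizeof(value), 1, pfile);  /* Bytes copied unchanged      */
  rewind(pfile);
  if(fread(&result, sizeof(result), 1, pfile) != 1)
    result = 0.0;
  fclose(pfile);
  return result;
}
```

With a value such as 1.0/3.0, the formatted version returns only the six decimal places that "%f" wrote, whereas the binary version returns exactly the value that was stored.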

Specifying Binary Mode

You specify binary mode by appending b to the basic open mode specifiers I introduced initially. Therefore, you have the additional open mode specifiers "wb" for writing a binary file, "rb" to read a binary file, "ab" to append data to the end of a binary file, and "rb+" to enable reading and writing of a binary file.

Because binary mode involves handling the data to be transferred to and from the file in a different way from text mode, you have a new set of functions to perform input and output.

Writing a Binary File

You use the fwrite() function to write a binary file. This is best explained with an example of its use. Suppose that you open the file to be written with the following statements:

char *filename = "myfile.bin";
FILE *pfile = fopen(filename, "wb");

The filename variable points to the string that defines the name of the file, and pfile is a variable to store a pointer to an object of type FILE as before.

You could write to the file with these statements:

long pdata[] = {2L, 3L, 4L};
size_t num_items = sizeof(pdata)/sizeof(long);
size_t wcount = fwrite(pdata, sizeof(long), num_items, pfile);

The fwrite() function operates on the principle of writing a specified number of binary data items to a file, where each item is a given number of bytes long. The first argument, pdata, is a pointer containing the starting address in memory of where the data items to be written are stored. The second argument specifies the size in bytes of each item to be written. The third argument, num_items, defines a count of the number of items to be written to the file. The file to which the data is to be transferred is identified by the last argument, pfile. The function fwrite() returns the count of the number of items actually written as a value of type size_t. If the operation is unsuccessful for some reason, this value will be less than num_items.

Note that there is no check that you opened the file in binary mode when you call the fwrite() function. The write operation will write binary data to a file that you open in text mode. Equally, there is nothing to prevent you from writing text data to a binary file. Of course, if you do this a considerable amount of confusion is likely to result.

The return value and the second and third arguments to the function are all of the same type as that returned by the sizeof operator. This is defined as type size_t, which you probably remember is an unsigned integer type.

The code fragment above uses the sizeof operator to specify the size in bytes of the objects to be transferred and also determines the number of items to be written using the expression sizeof(pdata)/sizeof(long). This is a good way of specifying these values when this is possible, because it reduces the likelihood of error. Of course, in a real context, you should also check the return value in wcount to be sure the write is successful.

The fwrite() function is geared to writing a number of binary objects of a given length to a file. You can write in units of your own structures as easily as you can write values of type int, values of type double, or sequences of individual bytes.

This doesn't mean that the values you write in any given output operation all have to be of the same type. You might allocate some memory using malloc(), for instance, into which you assemble a sequence of data items of different types and lengths. You could then write the whole block of memory in one go as a sequence of bytes. Of course, when you come to read them back, you need to know the precise sequence and types for the values in the file if you are to make sense of them.

Reading a Binary File

You use the fread() function to read a binary file once it has been opened in read mode. Using the same variables as in the example of writing a binary file, you could read the file using a statement such as this:

size_t rcount = fread(pdata, sizeof(long), num_items, pfile);

This operates exactly as the inverse of the write operation. Starting at the address specified by pdata, the function reads num_items objects, each occupying the number of bytes specified by the second argument. The function returns the count of the number of items that were read. If the read isn't completely successful, the count will be less than the number of objects requested.
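To see the write and the read together with the return values checked, here is a sketch of a round trip. The function save_and_load() and its parameter names are hypothetical; it simply writes an array of long values to the named file, reads them back into a second array, and deletes the file.

```c
#include <stdio.h>

/* Write count values of type long to a binary file, then read them back.
   Returns the number of items read, or 0 if anything fails. */
size_t save_and_load(const char *filename, const long *source,
                     long *dest, size_t count)
{
  FILE *pfile = fopen(filename, "wb");
  if(pfile == NULL)
    return 0;
  size_t wcount = fwrite(source, sizeof(long), count, pfile);
  fclose(pfile);
  if(wcount < count)                        /* Short write: report failure */
  {
    remove(filename);
    return 0;
  }

  pfile = fopen(filename, "rb");
  if(pfile == NULL)
    return 0;
  size_t rcount = fread(dest, sizeof(long), count, pfile);
  fclose(pfile);
  remove(filename);                         /* Delete the demo file        */
  return rcount;
}
```

A return value equal to count tells the caller that every item made the trip; anything less signals that the write or the read fell short.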

Moving Around in a File

For many applications, you need to be able to access data in a file other than in the sequential order you've used up to now. You can always find some information that's stored in the middle of a file by reading from the beginning and continuing in sequence until you get to what you want. But if you've written a few million items to the file, this may take some time.

Of course, to access data in random sequence requires that you have some means of knowing where the data that you would like to retrieve is stored in the file. Arranging for this is a complicated topic in general. There are many different ways of constructing pointers or indexes to make direct access to the data in a file faster and easier. The basic idea is similar to that of an index to a book. You have a table of keys that identify the contents of each record in the file you might want, and each key has an associated file position that records where the data is stored.

Let's look at the basic tools in the library that you need to enable you to deal with this kind of file input/output.


Note You cannot update a file in append mode. Regardless of any operations you may invoke to move the file position, all writes will be to the end of the existing data.


File Positioning Operations

There are two aspects to file positioning: finding out where you are at a given point in a file, and moving to a given point in a file. The former is basic to the latter: if you never know where you are, you can never decide how to get to where you want to go.

A random position in a file can be accessed regardless of whether the file concerned was opened in binary mode or in text mode. However, working with text mode files can get rather complicated in some environments, particularly Microsoft Windows. The number of characters recorded in the file can be greater than the number of characters you actually write to the file, because a newline ('\n' character) in memory can translate into two characters when written to a file in text mode (carriage return, CR, followed by linefeed, LF). Of course, your C library function for reading the information sorts everything out when you read the data back. A problem only arises when you think that a point in the file is 100 bytes from the beginning. Whether writing 100 characters to a file in text mode results in 100 bytes actually appearing in the file depends on whether the data includes newline characters. If you subsequently want to write some different data that is the same length in memory as the original data written to the file, it will only be the same length on the file if it contains the same number of '\n' characters.

Thus writing to text files randomly is best avoided. For this reason, I'll sidestep the complications of moving about in text files and concentrate the examples on the much more useful—and easier—context of randomly accessing the data in binary files.

Finding Out Where You Are

You have two functions to tell you where you are in a file, both of which are very similar but not identical. They each complement a different positioning function. The first is the function ftell(), which has the prototype

long ftell(FILE *pfile);

This function accepts a file pointer as an argument and returns a long integer value that specifies the current position in the file. This could be used with the file that's referenced by the pointer pfile that you've used previously, as in the following statement:

long fpos = ftell(pfile);

The fpos variable of type long now holds the current position in the file and, as you'll see, you can use this in a function call to return to this position at any subsequent time. For a binary file, the value is the offset in bytes from the beginning of the file.

The second function providing information on the current file position is a little more complicated. The prototype of the function is the following:

int fgetpos(FILE *pfile, fpos_t *position);

The first parameter is your old friend the file pointer. The second parameter is a pointer to a type that's defined in <stdio.h> called fpos_t. fpos_t will be a type other than an array type that is able to record every position within a file. With my library it is type long, but it need not be an integer type; some libraries define it as a structure. If you're curious about what type fpos_t is on your system, then have a look at it in <stdio.h>.

The fgetpos() function is designed to be used with the positioning function fsetpos(), which I'll come to very shortly. The function fgetpos() stores the current position and file state information for the file in position and returns 0 if the operation is successful; otherwise, it returns a nonzero integer value. You could declare a variable here to be of type fpos_t with a statement such as this:

fpos_t here;

You could now record the current position in the file with the statement

fgetpos(pfile, &here);

This records the current file position in the variable here that you have defined.


Caution Note that you must declare a variable of type fpos_t. It's no good just declaring a pointer of type fpos_t*, as there won't be any memory allocated to store the position data.


Setting a Position in a File

As a complement to ftell(), you have the function fseek(), which has the following prototype:

int fseek(FILE *pfile, long offset, int origin);

The first parameter is a pointer to the file that you're repositioning. The second and third parameters define where you want to go in the file. The second parameter is an offset from a reference point specified by the third parameter. The reference point can be one of three values that are specified by the predefined names SEEK_SET, which defines the beginning of the file; SEEK_CUR, which defines the current position in the file; and SEEK_END, which, as you might guess, defines the end of the file. Of course, all three values are defined in the header file <stdio.h>. For a text mode file, the second argument must be a value returned by ftell() if you're to avoid getting lost. The third argument for text mode files must be SEEK_SET. So for text mode files, all operations with fseek() are performed with reference to the beginning of the file.

For binary files, the offset argument is simply a relative byte count. You can therefore supply positive or negative values for the offset when the reference point is specified as SEEK_CUR.
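As a sketch of how ftell() and fseek() work together on a binary file, the two functions below are hypothetical helpers. read_long_at() jumps straight to the item at a given index in a file of long values, and file_length() uses ftell() to remember the current position, seeks to the end to find the length, and then returns to where it started. Note that the C standard does not require SEEK_END to be supported in a meaningful way for binary streams, although it works with ordinary disk files on common systems.

```c
#include <stdio.h>

/* Read the item at position index (counting from 0) in a binary file
   of long values. Returns 1 on success, 0 on failure. */
int read_long_at(FILE *pfile, long index, long *pvalue)
{
  if(fseek(pfile, index*(long)sizeof(long), SEEK_SET) != 0)
    return 0;
  return fread(pvalue, sizeof(long), 1, pfile) == 1;
}

/* Return the size of an open binary file in bytes, or -1L on failure.
   The current position is restored before returning. */
long file_length(FILE *pfile)
{
  long here = ftell(pfile);                 /* Remember where we are       */
  if(here < 0L || fseek(pfile, 0L, SEEK_END) != 0)
    return -1L;
  long length = ftell(pfile);               /* Offset of the end           */
  fseek(pfile, here, SEEK_SET);             /* Go back                     */
  return length;
}
```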

You have the fsetpos() function to go with fgetpos(). This has the rather straightforward prototype

int fsetpos(FILE *pfile, const fpos_t *position);

The first parameter is a pointer to the file opened with fopen(), and the second is a pointer of the type you can see, where the value was obtained by calling fgetpos().

You can't go far wrong with this one really. You could use it with a statement such as this:

fsetpos(pfile, &here);

The variable here was previously set by a call to fgetpos(). As with fgetpos(), a nonzero value is returned on error. Because this function is designed to work with a value that is returned by fgetpos(), you can only use it to get to a place in a file that you've been before, whereas fseek() allows you to go to any specific position.
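Here is a minimal sketch of the fgetpos() and fsetpos() pairing. The function peek_long() is a hypothetical helper that reads the next long value from a binary file and then returns to the position it started from, so that the same value can be read again.

```c
#include <stdio.h>

/* Read the next long from a binary file without moving the position.
   Returns 1 on success, 0 on failure. */
int peek_long(FILE *pfile, long *pvalue)
{
  fpos_t here;                              /* Storage for the position    */
  if(fgetpos(pfile, &here) != 0)            /* Record where we are         */
    return 0;
  int ok = fread(pvalue, sizeof(long), 1, pfile) == 1;
  if(fsetpos(pfile, &here) != 0)            /* Return to recorded position */
    return 0;
  return ok;
}
```

Because here is only ever passed to fgetpos() and fsetpos(), the code makes no assumption about what kind of type fpos_t is.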

Note that the verb seek is used to refer to operations of moving the read/write heads of a disk drive directly to a specific position in the file. This is why the function fseek() is so named.

With a file that you've opened for update by specifying the mode as "rb+" or "wb+", for example, either a read or a write may be safely carried out on the file after executing either of the file positioning functions, fsetpos() or fseek(). This is regardless of what the previous operation on the file was.

Using Temporary Work Files

Very often you need a work file just for the duration of a program. You use it only to store intermediate results and you can throw it away when the program is finished. The program that calculates primes in this chapter is a good example; you really only need the file during the calculation.

You have a choice of two functions to help with temporary file usage, and each has advantages and disadvantages.

Creating a Temporary Work File

The first function will create a temporary file automatically. Its prototype is the following:

FILE *tmpfile(void);

The function takes no arguments and returns a pointer to the temporary file. If the file can't be created for any reason—for example, if the disk is full—the function returns NULL. The file is created and opened for update, so it can be written and read, but obviously it needs to be in that order. You can only ever get out what you have put in. The file is automatically deleted on exit from your program, so there's no need to worry about any mess left behind. You'll never know what the file is called, and because it doesn't last this doesn't matter.

The disadvantage of this function is that the file will be deleted as soon as you close it. This means you can't close the file, having written it in one part of the program, and then reopen it in another part of the program to read the data. You must keep the file open for as long as you need access to the data. A simple illustration of creating a temporary file is provided by these statements:

FILE *pfile;                           /* File pointer                  */
pfile = tmpfile();                     /* Get pointer to temporary file */
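To show the write-then-read order in practice, here is a sketch that uses a temporary file as scratch storage. The function sum_via_tmpfile() is a hypothetical example: it parks an array of long values in a temporary file, rewinds, and reads the values back one at a time to add them up.

```c
#include <stdio.h>

/* Store count values in a temporary file, then read them back and
   return their sum. Returns -1L if the file can't be created or written. */
long sum_via_tmpfile(const long *values, size_t count)
{
  FILE *pfile = tmpfile();                  /* Opened for update           */
  if(pfile == NULL)
    return -1L;

  if(fwrite(values, sizeof(long), count, pfile) != count)
  {
    fclose(pfile);
    return -1L;
  }

  rewind(pfile);                            /* Back to the start to read   */
  long sum = 0L;
  long item = 0L;
  while(fread(&item, sizeof(long), 1, pfile) == 1)
    sum += item;

  fclose(pfile);                            /* The file is deleted here    */
  return sum;
}
```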

Creating a Unique File Name

The second possibility is to use a function that provides you with a unique file name. Whether this ends up as the name of a temporary file is up to you. The prototype for this function is the following:

char *tmpnam(char *filename);

If the argument to the function is NULL, the file name is generated in an internal static object, and a pointer to that object is returned. If you want the name stored in a char array that you declare yourself, it must be at least L_tmpnam characters long, where L_tmpnam is an integer constant that is defined in <stdio.h>. In this case, the file name is stored in the array that you specify as an argument, and a pointer to your array is also returned. If the function is unable to create a unique name, it will return NULL.

So to take the first possibility, you can create a unique file with the following statements:

FILE *pfile = NULL;
char *filename = tmpnam(NULL);
if(filename != NULL)
  pfile = fopen(filename, "wb+");

Here you declare your file pointer pfile and then your pointer filename that is initialized with the address of the temporary file name that the tmpnam() function returns. Because the argument to tmpnam() is NULL, the file name will be generated as an internal static object whose address will be placed in the pointer filename. As long as filename is not NULL you call fopen() to create the file with the mode "wb+". Of course, you can create temporary text files, too.

Don't be tempted to write this:

pfile = fopen(tmpnam(NULL), "wb+");    /* Wrong!! */

Apart from the fact there is a possibility that tmpnam() may return NULL, you also no longer have access to the file name, so you can't use remove() to delete the file.

If you want to create the array to hold the file name yourself, you could write this:

FILE *pfile = NULL;
char filename[L_tmpnam];
if(tmpnam(filename) != NULL)
  pfile = fopen(filename, "wb+");

Remember, the assistance that you've obtained from the standard library is just to provide a unique name. It's your responsibility to delete any files created.
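A sketch of the complete life cycle might look like the following, where use_named_workfile() is a hypothetical function name. It obtains a name, creates the file, writes to it, closes it, and then deletes it with remove(). Be aware that some toolchains issue a warning when tmpnam() is used, because another program could create a file with the same name between the call to tmpnam() and the call to fopen().

```c
#include <stdio.h>

/* Create a uniquely named work file, use it, and delete it again.
   Returns 1 if every step succeeds, 0 otherwise. */
int use_named_workfile(void)
{
  char filename[L_tmpnam];
  if(tmpnam(filename) == NULL)              /* Get a unique name           */
    return 0;

  FILE *pfile = fopen(filename, "wb+");     /* Create the file             */
  if(pfile == NULL)
    return 0;

  int value = 42;
  int ok = fwrite(&value, sizeof(value), 1, pfile) == 1;

  fclose(pfile);                            /* Closing does not delete it  */
  if(remove(filename) != 0)                 /* so delete it explicitly     */
    ok = 0;
  return ok;
}
```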


Note You'll be limited to a maximum number of unique names from this function in your program. You can access the maximum number through TMP_MAX that is defined in <stdio.h>.


Updating Binary Files

You have three open modes that provide for updating binary files:

  • The mode "r+b" (or you can write it as "rb+") opens an existing binary file for both reading and writing. With this open mode you can read or write anywhere in the file.
  • The mode "w+b" (or you can write "wb+") truncates the length of an existing binary file to zero so the contents will be lost; you can then carry out both read and write operations but, obviously because the file length is zero, you must write something before you can read the file. If the file does not exist, a new file will be created when you call fopen() with mode "w+b".
  • The third mode "a+b" (or "ab+") opens an existing file for update, or creates the file if it doesn't exist. You can read from anywhere in the file, but write operations always take place at the end of the file.

While you can write each of the open modes for updating binary files in either of two ways, I prefer to always put the + at the end because for me it is more obvious that the + is significant and means update. We can first put together an example that uses mode "wb+" to create a new file that we can then update using the other modes.
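Before building that program, the effect of the three modes can be seen in a compact sketch. The function update_modes_demo() and its parameters are hypothetical. It creates a file of three long values with "wb+", overwrites the second value in place with "rb+", appends a fourth value with "ab+", and then reads everything back.

```c
#include <stdio.h>

/* Create filename holding {1, 2, 3}, overwrite the second value with
   newvalue using "rb+", append extra using "ab+", and read all four
   values into result. Returns the number of values read. */
size_t update_modes_demo(const char *filename, long newvalue,
                         long extra, long result[4])
{
  long data[] = {1L, 2L, 3L};
  FILE *pfile = fopen(filename, "wb+");     /* New file, old content lost  */
  if(pfile == NULL)
    return 0;
  fwrite(data, sizeof(long), 3, pfile);
  fclose(pfile);

  pfile = fopen(filename, "rb+");           /* Update in place             */
  if(pfile == NULL)
    return 0;
  fseek(pfile, (long)sizeof(long), SEEK_SET);
  fwrite(&newvalue, sizeof(long), 1, pfile);
  fclose(pfile);

  pfile = fopen(filename, "ab+");           /* Writes always go to the end */
  if(pfile == NULL)
    return 0;
  fseek(pfile, 0L, SEEK_SET);               /* Has no effect on the write  */
  fwrite(&extra, sizeof(long), 1, pfile);
  rewind(pfile);                            /* Reading can start anywhere  */
  size_t count = fread(result, sizeof(long), 4, pfile);
  fclose(pfile);
  remove(filename);                         /* Delete the demo file        */
  return count;
}
```

Even though the position is moved to the start of the file before the write in append mode, the new value still ends up after the existing three.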

Changing the File Contents

We could revise and extend the previous example so that it uses the other two binary update modes. Let's add capability to update the existing records in the file as well as add records or delete the file. This program will be rather more complicated so it will be helpful to break the operations down into more functions. We will still write the file so the names are recorded as they are, so the records consisting of a name and an age will vary in length. This will provide an opportunity to see some of the complications this introduces when we want to change the contents of the file.

To give you an idea of where we are headed, let's look at the program in outline. The program will consist of the following ten functions:

main():
Controls overall operation of the program and allows the user to select from a range of operations on the file.
listfile():
Outputs the contents of the file to stdout.
writefile():
Operates in two modes: it either writes a new file with records read from stdin or appends records to the existing file.
updatefile():
Modifies an existing record in the file.
getrecord():
Reads a record from stdin.
getname():
Reads a name from stdin.
writerecord():
Writes a record to the file.
readrecord():
Reads a record from the file.
findrecord():
Finds the record in the file with a name that matches input.
duplicatefile():
Reproduces the file replacing a single updated record. This function is used to update a record when the new record will be a different length from the record being replaced.

Figure 12-5 shows the call hierarchy for the functions in the application.

The three functions called by main() implement the basic functionality of the program. The functions to the right of these three provide functionality that helps to simplify the three primary functions.

It will simplify the code if we define a structure that we can use to pass a name and age between functions:

struct Record
{
  char name[MAXLEN];
  int age;
};

We could easily write objects of type Record to the file, but this would mean the whole name array of MAXLEN elements would be written each time, so the file would contain a lot of spurious bytes for names shorter than MAXLEN characters. However, the structure will provide a very convenient way of passing a name and the associated age value to a function. There will be several functions in the new example, so let's first look at the code for each function before we put the example together. You can assemble the code for the functions into a single source file as you read through the following sections.


Figure 12-5 The hierarchy of function calls in Program 12.8

Reading a Record from the Keyboard

We can write a function that will read a name string and age value from stdin and store them in a Record object. The prototype of the function will be the following:

struct Record *getrecord(struct Record *precord);

The function requires an argument that is a pointer to an existing Record structure object and it returns the address of the same object. By returning the address of the Record object you make it possible to use a call to this function as an argument to another function that expects an argument of type struct Record *.

Here's how the implementation of the function looks:

/* Read the name and age for a record from the keyboard */
struct Record *getrecord(struct Record *precord)
{
  /* Verify the argument is good */
  if(!precord)
  {
    printf("No Record object to store input.\n");
    return NULL;
  }

  printf("\nEnter a name less than %d characters: ", MAXLEN);
  getname(precord->name);                     /* Read the name     */

  printf("Enter the age of %s: ", precord->name);
  scanf(" %d", &precord->age);                /* Read the age      */
  return precord;
}

This is a straightforward operation where the name and age that are read from stdin are stored in the appropriate members of the Record object that is pointed to by precord. The name is read by the auxiliary function getname() that you can implement like this:

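/* Read a name from the keyboard */
void getname(char *pname)
{
  size_t len = 0;
  do                                      /* Skip any empty line left   */
  {                                       /* by an earlier scanf() call */
    if(fgets(pname, MAXLEN, stdin) == NULL) /* Read the name            */
    {
      pname[0] = '\0';                    /* No input available         */
      return;
    }
    len = strlen(pname);
    if(len > 0 && pname[len-1] == '\n')   /* if there's a newline       */
      pname[--len] = '\0';                /* overwrite it               */
  } while(len == 0);
}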

The only slight complication in getname() is the need to deal with the '\n' that may be stored by the fgets() function. If the input is too long to fit in MAXLEN characters, the '\n' will still be in the input buffer and not stored in the array pointed to by pname. You'll need to read a name at more than one location in the program, so packaging the operation in the getname() function is convenient.
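If you also want to dispose of the excess characters when a line is too long, one possible approach is sketched below. The function read_line() is a hypothetical general-purpose variant that works with any input stream: it removes the newline if fgets() stored one and otherwise reads and discards the remainder of the line.

```c
#include <stdio.h>
#include <string.h>

/* Read one line from pfile into buffer (size bytes), without the newline.
   Any excess characters on an overlong line are discarded.
   Returns 1 if a line was read, 0 at end-of-file or on error. */
int read_line(FILE *pfile, char *buffer, int size)
{
  if(fgets(buffer, size, pfile) == NULL)
    return 0;

  size_t len = strcspn(buffer, "\n");       /* Index of '\n' or of '\0'    */
  if(buffer[len] == '\n')
    buffer[len] = '\0';                     /* Overwrite the newline       */
  else
  {
    int ch = 0;                             /* Line was too long:          */
    while((ch = fgetc(pfile)) != '\n' && ch != EOF)
      ;                                     /* skip the rest of it         */
  }
  return 1;
}
```

Passing stdin as the first argument reads from the keyboard in the same way as getname().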

Writing a Record to a File

You can now define a function that will write the members of a record object to a file identified by a pointer of type FILE *. The prototype would look like this:

void writerecord(struct Record *precord, FILE *pFile);

The first parameter is a pointer to a Record structure that has the name and age that are to be written to the file as members. The second argument is the file pointer.

The implementation looks like this:

/* Write a new record to the file at the current position */
void writerecord(struct Record *precord, FILE *pFile)
{
  /* Verify the arguments are good */
  if(!precord)
  {
    printf("No Record object to write to the file.\n");
    return;
  }
  if(!pFile)
  {
    printf("No stream pointer for the output file.\n");
    return;
  }

  /* Write the name & age to file */
  size_t length = strlen(precord->name);                 /* Get name length   */
  fwrite(&length, sizeof(length), 1, pFile);             /* Write name length */
  fwrite(precord->name, sizeof(char), length, pFile);    /* then the name     */
  fwrite(&precord->age, sizeof(precord->age), 1, pFile); /* then the age      */
}

The function verifies that both pointers are non-null and then writes the record at the current position in the file. It is therefore the responsibility of the calling function to ensure that the file has been opened in the correct mode and the file position has been set appropriately. The function first writes the length of the string to the file, followed by the string itself, excluding the terminating '\0'. This is to enable the code that will read the file to determine first how many characters are in the name string. Finally the age value is written to the file.

Reading a Record from a File

Here's the prototype of a function to read a single record from a file:

struct Record *readrecord(struct Record *precord, FILE *pFile);

The file to be read is identified by the second parameter, a file pointer. Purely as a convenience, the return value is the address that is passed as the first argument.

The implementation of the readrecord() function looks like this:

/* Reads a record from the file at the current position */
struct Record * readrecord(struct Record *precord, FILE *pFile)
{
  /* Verify the arguments are good */
  if(!precord)
  {
    printf("No Record object to store data from the file.");
    return NULL;
  }
  if(!pFile)
  {
    printf("No stream pointer for the input file.");
    return NULL;
  }

  size_t length = 0;                                    /* Name length      */
  fread(&length, sizeof(length), 1, pFile);             /* Read the length  */
  if(feof(pFile))                                       /* If it's end file */
    return NULL;                                        /* return NULL      */
  /* Verify the name can be accommodated */
  if(length+1>MAXLEN)
  {
    fprintf(stderr, "\nName too long. Ending program.\n");
    exit(1);
  }

  fread(precord->name, sizeof(char), length, pFile);    /* Read the name     */
  precord->name[length] = '\0';                         /* Append terminator */
  fread(&precord->age, sizeof(precord->age), 1, pFile); /* Read the age      */

  return precord;
}

Like the writerecord() function, the readrecord() function assumes the file has been opened with the correct mode specified and by default attempts to read a record from the current position. Each record starts with a length value that is read first. Of course, the file position could be at the end of the file, so you check for EOF by calling feof() with the file pointer as the argument after the read operation. If it is the end-of-file, the feof() function returns a nonzero integer value, so in this case you return NULL to signal the calling program that EOF has been reached.

The function then checks for the possibility that the length of the name exceeds the length of the name array. If it does, the program ends after outputting a message to the standard error stream.

If all is well, the name and age are read from the file and stored in the members of the record object. A '\0' has to be appended to the name string to avoid disastrous consequences when working with the string subsequently.
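The record layout, a length followed by the name characters and then the age, can be checked with a compact sketch. Everything here is an assumption made for illustration: MAXLEN is given the value 50, and record_size() and record_roundtrip() are hypothetical helpers rather than part of the program being developed. The sketch also checks every return value from fwrite() and fread().

```c
#include <stdio.h>
#include <string.h>

#define MAXLEN 50                           /* Assumed value for the sketch */

struct Record
{
  char name[MAXLEN];
  int age;
};

/* Bytes a record occupies in the file: length + name + age */
size_t record_size(const struct Record *precord)
{
  return sizeof(size_t) + strlen(precord->name) + sizeof(precord->age);
}

/* Write one record, rewind, read it back. Returns 1 on success. */
int record_roundtrip(const struct Record *pin, struct Record *pout, FILE *pfile)
{
  size_t length = strlen(pin->name);
  if(fwrite(&length, sizeof(length), 1, pfile) != 1 ||
     fwrite(pin->name, sizeof(char), length, pfile) != length ||
     fwrite(&pin->age, sizeof(pin->age), 1, pfile) != 1)
    return 0;

  rewind(pfile);
  if(fread(&length, sizeof(length), 1, pfile) != 1 || length + 1 > MAXLEN)
    return 0;
  if(fread(pout->name, sizeof(char), length, pfile) != length)
    return 0;
  pout->name[length] = '\0';                /* The file holds no terminator */
  return fread(&pout->age, sizeof(pout->age), 1, pfile) == 1;
}
```

The value that record_size() returns is the distance you would have to move back from the end of a record to reach its beginning.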

Writing a File

Here's the prototype of a function that will write an arbitrary number of records to a file, where the records are entered from the keyboard:

void writefile(char *filename, char *mode);

The first parameter is the name of the file to be written, so this implies that the function will take care of opening the file. The second parameter is the file open mode to be used. By passing "wb+" as the mode, the writefile() function will write to a file discarding any existing contents or create a new file with the specified name if it does not already exist. If the mode is specified as "ab+", records will be appended to an existing file, and a new file will be created if there isn't one already.

Here's the implementation of the function:

/* Write to a file */
void writefile(char *filename, char *mode)
{
  char answer = 'y';

  FILE *pFile = fopen(filename, mode);    /* Open the file                 */
  if(pFile == NULL)                       /* Verify file is open           */
  {
    fprintf(stderr, "\nFile open failed.\n");
    exit(1);
  }

  do
  {
    struct Record record;                 /* Stores a record name & age    */

    writerecord(getrecord(&record), pFile); /* Get record & write the file */

    printf("Do you want to enter another(y or n)?  " );
    scanf(" %c", &answer);
    int ch = 0;
    while((ch = getchar()) != '\n' && ch != EOF)
      ;                                   /* Discard the rest of the line  */
  } while(tolower(answer) == 'y');

  fclose(pFile);                          /* Close the file                */
}

After opening the file with the mode passed as the second argument, the function writes the file in the do-while loop. The read from stdin and the write to the file are done in the single statement that calls writerecord() with a call to getrecord() as the first argument. The pointer to a Record object that getrecord() returns is passed directly to the writerecord() function. The operation ends when the user enters 'n' or 'N' to indicate that no more data is to be entered. The file is closed before returning from the function.

Listing the File Contents

The prototype of a function that will list the records in a file on the standard output stream looks like this:

void listfile(char *filename);

The parameter is the name of the file, so the function will take care of opening the file initially and then closing it when the operation is complete.

Here's the implementation:

/* List the contents of the binary file */
void listfile(char *filename)
{
  /* Create the format string for names up to MAXLEN long */
  /* format array length allows up to 5 digits for MAXLEN */
  char format[20];                           /* Format string              */
  sprintf(format, "\n%%-%ds Age:%%4d", MAXLEN);

  FILE *pFile = fopen(filename, "rb");       /* Open file to read          */
  if(pFile == NULL)                          /* Check file is open         */
  {
    printf("Unable to open %s. Verify it exists.\n", filename);
    return;
  }

  struct Record record;                      /* Stores a record            */
  printf("\nThe contents of %s are:", filename);

  while(readrecord(&record, pFile) != NULL)  /* As long as we have records */
    printf(format, record.name, record.age); /* Output the record          */
  printf("\n");                              /* Move to next line          */

  fclose(pFile);                             /* Close the file             */
}

The function generates a format string that will adjust the name field width to be MAXLEN characters. The sprintf() function writes the format string to the format array.
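Building a format string at run time is worth seeing in isolation. In the sketch below, make_format() is a hypothetical helper that does the same job with snprintf(), which takes the size of the destination array as an extra argument and so cannot write beyond the end of it. Each %% in the template produces a single % in the result.

```c
#include <stdio.h>

/* Build a format such as "%-20s Age:%4d" for a given name field width.
   Returns the number of characters the complete format requires,
   as snprintf() does. */
int make_format(char *format, size_t size, int width)
{
  return snprintf(format, size, "%%-%ds Age:%%4d", width);
}
```

With a width of 20 the array receives "%-20s Age:%4d", which you can then pass to printf() as the format string, followed by a name and an age.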

The file is opened in binary read mode so the initial position will be at the beginning of the file. If the file is opened successfully, records are read from the file in the while loop by calling the readrecord() function that we defined earlier. The call to readrecord() is done in the loop condition so when NULL is returned signaling end-of-file has been detected, the loop ends. Within the loop you write the members of the Record object that was read by readrecord() to stdout using the string in the format array that was created initially. When all the records have been read, the file is closed by calling fclose() with the file pointer as the argument.

Updating the Existing File Contents

Updating existing records in the file adds a complication because of the variable length of the names in the file. You can't just arbitrarily overwrite an existing record because the chances are it won't fit in the space occupied by the record to be replaced. If the length of the new record is the same as the original, you can overwrite it. If they are different, the only solution is to write a new file. Here's the prototype of the function to update the file:

void updatefile(char *filename);

The only parameter is the file name, so the function will handle finding out which record is to be changed, as well as opening and closing the file. Here's the code:

/* Modify existing records in the file */
void updatefile(char *filename)
{
  FILE *pFile = fopen(filename, "rb+");      /* Open the file for update   */
  if(pFile == NULL)                          /* Check file is open         */
  {
    fprintf(stderr, "\nFile open for updating records failed.");
    return;
  }
  struct Record record;                      /* Stores a record            */
  int index = findrecord(&record, pFile);    /* Find the record for a name */
  if(index < 0)                              /* If the record isn't there  */
  {
    printf("\nRecord not found.");           /* output a message,          */
    fclose(pFile);                           /* close the file,            */
    return;                                  /* and we are done.           */
  }

  printf("\n%s is aged %d,", record.name, record.age);
  struct Record newrecord;                   /* Stores replacement record  */
  printf("\nYou can now enter the new name and age for %s.", record.name);
  getrecord(&newrecord);                     /* Get the new record         */

  /* Check if we can update in place */
  if(strlen(record.name) == strlen(newrecord.name))
  { /* Name lengths are the same so we can */
    /* move to start of old record         */
    fseek(pFile,
          -(long)(sizeof(size_t)+strlen(record.name)+sizeof(record.age)),
          SEEK_CUR);
    writerecord(&newrecord, pFile);          /* Write the new record       */
    fflush(pFile);                           /* Force the write            */
    fclose(pFile);                           /* Close the file             */
  }
  else
    duplicatefile(&newrecord, index, filename, pFile);  /* Closes pFile    */

  printf("File update complete.\n");
}

There's quite a lot of code in this function but it consists of a sequence of fairly simple steps:

  1. Open the file for update.
  2. Find the index (first record at index 0) for the record to be updated.
  3. Get the data for the record to replace the old record.
  4. Check if the record can be updated in place. This is possible when the lengths of the names are the same. If so, move the current position back by the length of the old record and write the new record to the old file.
  5. If the names are different lengths, duplicate the file with the new record replacing the old in the duplicate file.

After opening the file for update, the function reads the name corresponding to the record that is to be changed. The findrecord() function, which I'll get to in a moment, reads the name for the record to be updated, then returns the index value for that record, if it exists, with the first record at index 0. The findrecord() function will return −1 if the record is not found.

If the old and new names are the same length, move the file position back by the length of the old record by calling fseek(). Then write the new record to the file and flush the output buffer. Calling fflush() for the file forces the new record to be transferred from the output buffer to the file.
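The seek-back-and-overwrite step can be sketched in isolation as follows. The helper name replace_previous() is hypothetical, and it works on a plain run of bytes rather than on the Record layout used in this example, so treat it as an illustration of the technique under those assumptions.

```c
#include <stdio.h>

/* Hypothetical helper: overwrites the len bytes that end at the current */
/* position of a stream opened for update.                               */
/* Returns 0 on success, -1 on error.                                    */
int replace_previous(FILE *pFile, const void *bytes, size_t len)
{
  /* Step back over the bytes just read. The seek also satisfies the  */
  /* rule that a read on an update stream must not be followed        */
  /* directly by a write.                                             */
  if(fseek(pFile, -(long)len, SEEK_CUR) != 0)
    return -1;
  if(fwrite(bytes, 1, len, pFile) != len)
    return -1;
  return fflush(pFile) == 0 ? 0 : -1;   /* Push buffered data to the file */
}
```

This only works because the replacement is exactly as long as the data it replaces; anything longer would overwrite the start of the following record.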

If the old and new records are different lengths, call duplicatefile() to copy the file with the new record replacing the old in the copy. You can implement the function like this:

/* Duplicate the existing file replacing the record to be updated */
/* The record to be replaced is index records from the start      */
void duplicatefile(struct Record *pnewrecord, int index,
                                              char *filename, FILE *pFile)
{
  /* Create and open a new file */
  char tempname[L_tmpnam];
  if(tmpnam(tempname) == NULL)
  {
    printf("\nTemporary file name creation failed.");
    return;
  }
  char tempfile[strlen(dirpath)+strlen(tempname)+1];
  strcpy(tempfile, dirpath);                  /* Copy original file path  */
  strcat(tempfile, tempname);                 /* Append temporary name    */
  FILE *ptempfile = fopen(tempfile, "wb+");
  if(ptempfile == NULL)                       /* Check new file is open   */
  {
    printf("\nUnable to create %s", tempfile);
    return;
  }

  /* Copy first index records from old file to new file */
  rewind(pFile);                              /* Old file back to start    */
  struct Record record;                       /* Store for a record        */
  for(int i = 0 ; i<index ; i++)
    writerecord(readrecord(&record, pFile), ptempfile);

  writerecord(pnewrecord, ptempfile);         /* Write the new record      */
  readrecord(&record,pFile);                  /* Skip the old record       */

  /* Copy the rest of the old file to the new file */
  while(readrecord(&record,pFile))
    writerecord(&record, ptempfile);

  /* Close the files */
  if(fclose(pFile)==EOF)
    printf("\nFailed to close %s", filename);
  if(fclose(ptempfile)==EOF)
    printf("\nFailed to close %s", tempfile);

  /* remove() and rename() return 0 on success, nonzero on failure */
  if(remove(filename))                         /* Delete the old file       */
  {
    printf("\nRemoving the old file failed. Check file in %s", dirpath);
    return;
  }

  /* Rename the new file same as original */
  if(rename(tempfile, filename))
    printf("\nRenaming the file copy failed. Check file in %s", dirpath);
}

This function carries the update through the following steps:

  1. Create a new file with a unique name in the same directory as the old file. The dirpath variable will be a global that contains the path to the original file.
  2. Copy all records preceding the record to be changed from the old file to the new file.
  3. Write the new record to the new file and skip over the record to be updated in the old file.
  4. Write all the remaining records from the old file to the new file.
  5. Close both files.
  6. Delete the old file and rename the new file with the name of the old file.

Once the new file is created using the name generated by tmpnam(), records are copied from the original file to the new file, with the exception that the record to be updated is replaced with the new record in the new file. The copying of the first index records is done in the for loop where the pointer that is returned by readrecord() reading the old file is passed as the argument to writerecord() for the new file. The copying of the records that follow the updated record is done in the while loop. Here you have to continue copying records until the end-of-file is reached in the old file. Finally, after closing both files, delete the old file to free up its name and then rename the new file to the old. If you want to do this more safely, you can rename the old file in some way rather than deleting it, perhaps by appending "_old" to the existing file name. You can then rename the new file as you do here. This would leave a backup file in the directory that would be useful if the update goes awry.
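The safer variant described above, keeping the old file as a backup rather than deleting it, might look like the following sketch. The function name replace_with_backup() and the choice to discard any earlier backup first are assumptions for illustration. Note that remove() and rename() both return 0 when they succeed.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: keeps the original as "<filename>_old" and   */
/* gives the new file the original name.                             */
/* Returns 0 on success, -1 on error.                                */
int replace_with_backup(const char *filename, const char *newfile)
{
  const char *suffix = "_old";
  char *backup = malloc(strlen(filename) + strlen(suffix) + 1);
  if(backup == NULL)
    return -1;
  strcpy(backup, filename);
  strcat(backup, suffix);

  remove(backup);                      /* Discard any earlier backup;    */
                                       /* failure here is harmless       */
  int result = -1;
  if(rename(filename, backup) == 0)    /* 0 means the rename worked      */
  {
    if(rename(newfile, filename) == 0)
      result = 0;
    else
      rename(backup, filename);        /* Try to put the original back   */
  }
  free(backup);
  return result;
}
```

Both files must be closed before the function is called, just as in duplicatefile().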

The implementation of the findrecord() function that is called by updatefile() to find the index for the record that matches the name that is entered looks like this:

/* Find a record                          */
/* Returns the index number of the record */
/* or −1 if the record is not found.      */
int findrecord(struct Record *precord, FILE *pFile)
{
  char name[MAXLEN];
  printf("\nEnter the name for the record you wish to find: ");
  getname(name);

  rewind(pFile);                       /* Make sure we are at the start */
  int index = 0;                       /* Index of current record       */

  while(true)
  {
    readrecord(precord, pFile);
    if(feof(pFile))                     /* If end-of-file was reached    */
      return -1;                        /* record not found              */
    if(!strcmp(name, precord->name))
      break;
    ++index;
  }
  return index;                         /* Return record index           */
}

This function reads a name for the record that is to be changed, then reads records looking for a name that matches the name that was entered. If end-of-file is reached without finding the name, −1 is returned to signal to the calling program that the record is not in the file. If a name match is found, the function returns the index value of the matching record.
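The same search pattern can be tried out on its own with a simpler file layout. The sketch below assumes fixed-size records written with fwrite(), which is not the variable-length format used in this example; struct Entry and find_entry() are hypothetical names introduced only for the illustration.

```c
#include <stdio.h>
#include <string.h>

struct Entry                 /* Hypothetical fixed-size record */
{
  char name[20];
  int age;
};

/* Returns the index of the first record whose name matches, */
/* or -1 if the end of the file is reached without a match.  */
/* On a match, *pentry holds the record that was found.      */
int find_entry(FILE *pFile, const char *name, struct Entry *pentry)
{
  rewind(pFile);                               /* Search from the start */
  int index = 0;
  while(fread(pentry, sizeof *pentry, 1, pFile) == 1)
  {
    if(strcmp(name, pentry->name) == 0)
      return index;
    ++index;
  }
  return -1;                                   /* No more records       */
}
```

Here the loop ends when fread() fails to deliver a complete record, so no separate call to feof() is needed.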

You can now assemble the complete working example.

File Open Modes Summary

You probably will need a little practice before the file open mode strings come immediately to mind, so Table 12-2 contains a summary that you can refer back to when necessary.

Table 12-2. File Modes

Mode              Description
"w"               Open or create a text file for write operations. An existing file will be truncated to zero length.
"a"               Open or create a text file for append operations, adding to the end of the file.
"r"               Open an existing text file for read operations.
"wb"              Open or create a binary file for write operations. An existing file will be truncated to zero length.
"ab"              Open or create a binary file for append operations, adding to the end of the file.
"rb"              Open an existing binary file for read operations.
"w+"              Open or create a text file for update operations. An existing file will be truncated to zero length.
"a+"              Open or create a text file for update operations, adding to the end of the file.
"r+"              Open an existing text file for update operations (read and write anywhere).
"w+b" or "wb+"    Open or create a binary file for update operations. An existing file will be truncated to zero length.
"a+b" or "ab+"    Open or create a binary file for update operations, adding to the end of the file.
"r+b" or "rb+"    Open an existing binary file for update operations (read and write anywhere).

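To see how the basic mode strings differ in practice, here is a small sketch that applies "w", "a", "r+", and "r" to the same file in turn. The function name demo_modes() and the text it writes are assumptions chosen purely for the demonstration.

```c
#include <stdio.h>

/* Hypothetical demonstration: returns 0 on success and leaves the */
/* final file contents in result (at most size-1 characters).      */
int demo_modes(const char *path, char *result, size_t size)
{
  FILE *pFile = fopen(path, "w");      /* Create, or truncate if it exists */
  if(pFile == NULL) return -1;
  fputs("hello", pFile);
  fclose(pFile);

  pFile = fopen(path, "a");            /* Writes always go to the end      */
  if(pFile == NULL) return -1;
  fputs(" world", pFile);
  fclose(pFile);

  pFile = fopen(path, "r+");           /* Existing file, update anywhere   */
  if(pFile == NULL) return -1;
  fputc('J', pFile);                   /* Overwrites the first character   */
  fclose(pFile);

  pFile = fopen(path, "r");            /* Read only                        */
  if(pFile == NULL) return -1;
  char *line = fgets(result, (int)size, pFile);
  fclose(pFile);
  return line != NULL ? 0 : -1;
}
```

After a successful call, result contains "Jello world": the "w" open started the file afresh, the "a" open added to the end, and the "r+" open replaced one character without disturbing the rest.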
Designing a Program

Now that you've come to the end of this chapter, you can put what you've learned into practice with a final program. This program will be shorter than the previous example, but nonetheless it's an interesting program that you may find useful.

The Problem

The problem you're going to solve is to write a file-viewer program. This will display any file in hexadecimal representation and as characters.

The Analysis

The program will open the file as binary read-only and then display the information in two columns, the first being the hexadecimal representation of the bytes in the file, and the second being the bytes represented as characters. The file name will be supplied as a command line argument or, if that isn't supplied, the program will ask for the file name.

The stages are as follows:

  1. If the file name isn't supplied, get it from the user.
  2. Open the file.
  3. Read and display the contents of the file.

The Solution

This section outlines the steps you'll take to solve the problem.

Step 1

You can easily check to see if the file name appears at the command line by specifying that the function main() has parameters. Up until now, we have ignored the possibility of parameters being passed to main(), but here you can use it as the means of identifying the file that's to be displayed. You'll recall that when main() is called, two parameters are passed to it. The first parameter is an integer indicating the number of words in the command line, and the second is an array of pointers to strings. The first string contains the name that you use to start the program at the command line, and the remaining strings represent the arguments that follow at the command line. Of course, this mechanism allows an arbitrary number of values to be entered at the command line and passed to main().

If the value of the first argument to main() is 1, there's only the program name on the command line, so in this case you'll have to prompt for the file name to be entered:

/* Program 12.8 Viewing the contents of a file */
#include <stdio.h>
#include <string.h>

const int MAXLEN = 256;                    /* Maximum file path length      */

int main(int argc, char *argv[])
{
  char filename[MAXLEN];                   /* Stores the file path          */

  if(argc == 1)                            /* No file name on command line? */
  {
    printf("Please enter a filename: ");   /* Prompt for input              */
    fgets(filename, MAXLEN, stdin);        /* Get the file name entered     */

    /* Remove the newline if it's there */
    int len = strlen(filename);
    if(filename[len-1] == '\n')
      filename[len-1] = '\0';
  }
  return 0;
}

This allows for a maximum file path length of 255 characters, plus the terminating null character.
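If you would rather not index the string by its length, one alternative is to let strcspn() locate the newline. The following is a sketch of that approach; the helper name read_line() is hypothetical, and unlike the code above it also copes with the case where nothing at all was read.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from pIn into buffer and removes */
/* the trailing newline, if any. Returns buffer, or NULL on failure.    */
char *read_line(char *buffer, size_t size, FILE *pIn)
{
  if(size == 0 || fgets(buffer, (int)size, pIn) == NULL)
    return NULL;                         /* Nothing was read            */
  buffer[strcspn(buffer, "\n")] = '\0';  /* Cut at the first newline    */
  return buffer;
}
```

A call such as read_line(filename, sizeof filename, stdin) would then stand in for the fgets() call and the newline check.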

Step 2

If the first argument to main() isn't 1, then you have at least one more argument, which you assume is the file name. You therefore copy the string pointed to by argv[1] to the variable filename. Assuming that you have a valid file name, you can open the file and start reading it:

/* Program 12.8 Viewing the contents of a file */
#include <stdio.h>
#include <string.h>

const int MAXLEN = 256;                    /* Maximum file path length      */

int main(int argc, char *argv[])
{
  char filename[MAXLEN];                   /* Stores the file path          */
  FILE *pfile;                             /* File pointer                  */

  if(argc == 1)                            /* No file name on command line? */
  {
    printf("Please enter a filename: ");   /* Prompt for input              */
    fgets(filename, MAXLEN, stdin);        /* Get the file name entered     */

    /* Remove the newline if it's there */
    int len = strlen(filename);
    if(filename[len-1] == '\n')
      filename[len-1] = '\0';
  }
  else
    strcpy(filename, argv[1]);             /* Get 2nd command line string   */

  /* File can be opened OK? */
  if(!(pfile = fopen(filename, "rb")))
  {
    printf("Sorry, can't open %s", filename);
    return -1;
  }
  fclose(pfile);                           /* Close the file                */
  return 0;
}

You put the call to the fclose() function to close the file at the end of the program so that you don't forget about it later. Also, you use a return value of −1 for the program to indicate when an error has occurred.

Step 3

You can now output the file contents. You do this by reading the file one byte at a time and saving this data in a buffer. Once the buffer is full or the end of file has been reached, you output the buffer in the format you want. When you output the data as characters, you must first check that the character is printable, otherwise strange things may start happening on the screen. You use the function isprint(), declared in ctype.h, for this. If the character isn't printable, you'll print a period instead.

Here's the complete code for the program:

/* Program 12.8 Viewing the contents of a file */
#include <stdio.h>
#include <ctype.h>
#include <string.h>

const int MAXLEN = 256;                    /* Maximum file path length      */
const int DISPLAY = 80;                    /* Length of display line        */
const int PAGE_LENGTH = 20;                /* Lines per page                */

int main(int argc, char *argv[])
{
  char filename[MAXLEN];                   /* Stores the file path          */
  FILE *pfile;                             /* File pointer                  */
  unsigned char buffer[DISPLAY/4 - 1];     /* File input buffer             */
  int count = 0;                           /* Count of characters in buffer */
  int lines = 0;                           /* Number of lines displayed     */

  if(argc == 1)                            /* No file name on command line? */
  {
    printf("Please enter a filename: ");   /* Prompt for input              */
    fgets(filename, MAXLEN, stdin);        /* Get the file name entered     */

    /* Remove the newline if it's there */
    int len = strlen(filename);
    if(filename[len-1] == '\n')
      filename[len-1] = '\0';
  }
  else
    strcpy(filename, argv[1]);             /* Get 2nd command line string   */

  /* File can be opened OK? */
  if(!(pfile = fopen(filename, "rb")))
  {
    printf("Sorry, can't open %s", filename);
    return -1;
  }
  while(!feof(pfile))                      /* Continue until end of file    */
  {
    if(count < sizeof buffer)              /* If the buffer is not full     */
      buffer[count++] = (unsigned char)fgetc(pfile);    /* Read a character */
    else
    { /* Output the buffer contents, first as hexadecimal */
      for(count = 0; count < sizeof buffer; count++)
        printf("%02X ", buffer[count]);
      printf("| ");                        /* Output separator              */
      /* Now display buffer contents as characters */
      for(count = 0; count < sizeof buffer; count++)
        printf("%c", isprint(buffer[count]) ? buffer[count]:'.');
      printf("\n");                        /* End the line                  */
      count = 0;                           /* Reset count                   */

      if(!(++lines%PAGE_LENGTH))           /* End of page?                  */
       if(getchar()=='E')                  /* Wait for Enter                */
         return 0;                         /* E pressed                     */
    }
  }

  /* Display the last line, first as hexadecimal */
  for(int i = 0; i < sizeof buffer; i++)
    if(i < count)
      printf("%02X ", buffer[i]);          /* Output hexadecimal            */
    else
      printf("   ");                       /* Output spaces                 */
  printf("| ");                            /* Output separator              */

  /* Display last line as characters */
  for(int i = 0; i < count; i++)
    /* Output character    */
    printf("%c",isprint(buffer[i]) ? buffer[i]:'.');

  /* End the line */
  printf("\n");
  fclose(pfile);                           /* Close the file                */
  return 0;
}

The symbol DISPLAY specifies the width of a line on the screen for output, and the symbol PAGE_LENGTH specifies the number of lines per page. You arrange to display a page, and then wait for Enter to be pressed before displaying the next page, thus avoiding the whole file whizzing by before you can read it.

You declare the buffer to hold input from the file as

unsigned char buffer[DISPLAY/4 - 1];     /* File input buffer             */

The expression for the array dimension arises from the fact that you'll need four characters on the screen to display each character from the file, plus one separator. Each character will be displayed as two hexadecimal digits plus a space, and as a single character, making four characters in all.
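The arithmetic can be checked with a few lines of code. This is only a sketch; line_width() is a hypothetical helper that counts the columns one output line occupies, including the space that follows the vertical bar.

```c
#include <stddef.h>

/* Columns needed to show nbytes bytes: "XX " for each byte, the */
/* two-character "| " separator, then one character per byte.    */
size_t line_width(size_t nbytes)
{
  return 3*nbytes + 2 + nbytes;
}
```

With DISPLAY set to 80, the buffer holds 80/4 - 1 = 19 bytes and line_width(19) is 78, which fits on the line. A 20-byte buffer would need 82 columns, which is why 1 is subtracted.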

You continue reading as long as the while loop condition is true:

while(!feof(pfile))                      /* Continue until end of file    */

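The library function feof() returns a nonzero value (true) once a read operation has tried to read beyond the end of the file specified by the argument; otherwise, it returns 0 (false). Because the end-of-file indicator is set only after such a failed read, the last call to fgetc() in this loop returns EOF, and that value is stored in the buffer as the byte FF. You can see it at the end of the sample output later in this section.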
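The timing of feof() is easy to get wrong, so here is a small sketch that contrasts two ways of counting the bytes in a file. Both function names are hypothetical. The first tests the value returned by fgetc(); the second tests feof() before each read, as a loop of the form while(!feof(...)) does.

```c
#include <stdio.h>

/* Counts bytes by testing the value that fgetc() returns */
long count_checked(FILE *pFile)
{
  long n = 0;
  int ch;                              /* int, so EOF is distinguishable */
  rewind(pFile);
  while((ch = fgetc(pFile)) != EOF)
    ++n;
  return n;
}

/* Counts "bytes" by testing feof() before each read */
long count_with_feof(FILE *pFile)
{
  long n = 0;
  rewind(pFile);
  while(!feof(pFile))
  {
    (void)fgetc(pFile);                /* The last call returns EOF...   */
    ++n;                               /* ...but it is still counted     */
  }
  return n;
}
```

For a file containing three bytes, count_checked() returns 3 and count_with_feof() returns 4, because feof() only reports end-of-file after a read has already failed. Testing the return value of fgetc() avoids processing that extra value.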

You fill the buffer array with characters from the file in the if statement:

if(count < sizeof buffer)              /* If the buffer is not full       */
      buffer[count++] = (unsigned char)fgetc(pfile);   /* Read a character    */

When count reaches the capacity of buffer, the buffer is full and the else clause will be executed to output the contents of the buffer:

else
    { /* Output the buffer contents, first as hexadecimal */
      for(count = 0; count < sizeof buffer; count++)
        printf("%02X ", buffer[count]);
      printf("| ");                        /* Output separator              */

      /* Now display buffer contents as characters */
      for(count = 0; count < sizeof buffer; count++)
        printf("%c", isprint(buffer[count]) ? buffer[count]:'.');
      printf("\n");                        /* End the line                  */
      count = 0;                           /* Reset count                   */

      if(!(++lines%PAGE_LENGTH))           /* End of page?                  */
       if(getchar()=='E')                  /* Wait for Enter                */
         return 0;                         /* E pressed                     */
    }

The first for loop outputs the contents of buffer as hexadecimal characters. You then output a separator character and execute the next for loop to output the same data as characters. The conditional operator in the second argument to printf() ensures that nonprinting characters are output as a period.
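The two loops can also be combined into a single routine that builds one output line in a string, which makes the layout easy to test. This is a sketch under stated assumptions: format_line() is a hypothetical helper, not part of Program 12.8, and it pads a short line with spaces in the same way as the code that displays the last line.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: writes one display line for count bytes into */
/* line[], padding the hexadecimal column to width bytes.            */
/* line must have room for at least 4*width + 3 characters.          */
void format_line(const unsigned char *bytes, size_t count, size_t width,
                                                              char *line)
{
  char *p = line;
  for(size_t i = 0 ; i < width ; ++i)
  {
    if(i < count)
      p += sprintf(p, "%02X ", (unsigned)bytes[i]);  /* Hex digits + space */
    else
      p += sprintf(p, "   ");                        /* Keep columns aligned */
  }
  p += sprintf(p, "| ");                             /* Separator          */
  for(size_t i = 0 ; i < count ; ++i)
    *p++ = isprint(bytes[i]) ? (char)bytes[i] : '.'; /* Period if unprintable */
  *p = '\0';
}
```

For the three bytes 'H', 'i', and newline with a width of 4, the line produced is "48 69 0A    | Hi.", with the newline shown as a period.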

The if statement increments the line count, lines, and after every PAGE_LENGTH lines it waits for a character to be entered. If you press Enter, the next pageful will be displayed, but if you press E and then Enter, the program will end. This provides you with an opportunity to escape from continuing to output the contents of a file that's larger than you thought.

The final couple of for loops are similar to those you've just seen. The only difference is that spaces are output for array elements that don't contain file characters. An example of the output is as follows. It shows part of the source file for the self-same program and you can deduce from the output that the file path was entered as a command line argument.


2F 2A 20 50 72 6F 67 72 61 6D 20 31 32 2E 38 20 56 69 65 | /* Program 12.8 Vie
77 69 6E 67 20 74 68 65 20 63 6F 6E 74 65 6E 74 73 20 6F | wing the contents o
66 20 61 20 66 69 6C 65 20 2A 2F 0D 0A 23 69 6E 63 6C 75 | f a file */..#inclu
64 65 20 3C 73 74 64 69 6F 2E 68 3E 0D 0A 23 69 6E 63 6C | de <stdio.h>..#incl
75 64 65 20 3C 63 74 79 70 65 2E 68 3E 0D 0A 23 69 6E 63 | ude <ctype.h>..#inc
6C 75 64 65 20 3C 73 74 72 69 6E 67 2E 68 3E 0D 0A 0D 0A | lude <string.h>....
63 6F 6E 73 74 20 69 6E 74 20 4D 41 58 4C 45 4E 20 3D 20 | const int MAXLEN =
32 35 36 3B 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 | 256;
20 20 20 20 20 2F 2A 20 4D 61 78 69 6D 75 6D 20 66 69 6C |      /* Maximum fil
65 20 70 61 74 68 20 6C 65 6E 67 74 68 20 20 20 20 20 20 | e path length
2A 2F 0D 0A 63 6F 6E 73 74 20 69 6E 74 20 44 49 53 50 4C | */..const int DISPL
41 59 20 3D 20 38 30 3B 20 20 20 20 20 20 20 20 20 20 20 | AY = 80;
20 20 20 20 20 20 20 20 20 2F 2A 20 4C 65 6E 67 74 68 20 |          /* Length
6F 66 20 64 69 73 70 6C 61 79 20 6C 69 6E 65 20 20 20 20 | of display line
20 20 20 20 2A 2F 0D 0A 63 6F 6E 73 74 20 69 6E 74 20 50 |     */..const int P
41 47 45 5F 4C 45 4E 47 54 48 20 3D 20 32 30 3B 20 20 20 | AGE_LENGTH = 20;
20 20 20 20 20 20 20 20 20 20 20 20 20 2F 2A 20 4C 69 6E |              /* Lin
65 73 20 70 65 72 20 70 61 67 65 20 20 20 20 20 20 20 20 | es per page
20 20 20 20 20 20 20 20 2A 2F 0D 0A 0D 0A 69 6E 74 20 6D |         */....int m
61 69 6E 28 69 6E 74 20 61 72 67 63 2C 20 63 68 61 72 20 | ain(int argc, char

A lot more output follows, ending with this:


66 65 72 5B 69 5D 3A 27 2E 27 29 3B 0D 0A 20 20 2F 2A 20 | fer[i]:'.');..  /*
45 6E 64 20 74 68 65 20 6C 69 6E 65 20 20 20 20 20 20 20 | End the line
20 20 20 2A 2F 0D 0A 20 20 70 72 69 6E 74 66 28 22 5C 6E |    */..  printf("\n
22 29 3B 0D 0A 20 20 66 63 6C 6F 73 65 28 70 66 69 6C 65 | ");..  fclose(pfile
29 3B 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 | );
20 20 20 20 20 20 20 20 20 20 2F 2A 20 43 6C 6F 73 65 20 |           /* Close
74 68 65 20 66 69 6C 65 20 20 20 20 20 20 20 20 20 20 20 | the file
20 20 20 20 20 2A 2F 0D 0A 20 20 72 65 74 75 72 6E 20 30 |      */..  return 0
3B 0D 0A 7D 0D 0A 0D 0A FF                               | ;..}.....

Summary

Within this chapter I've covered all of the basic tools necessary to provide you with the ability to program the complete spectrum of file functions. The degree to which these have been demonstrated in examples has been, of necessity, relatively limited. There are many ways of applying these tools to provide more sophisticated ways of managing and retrieving information in a file. For example, it's possible to write index information into the file, either as a specific index at a known place in the file, often the beginning, or as position pointers within the blocks of data, rather like the pointers in a linked list. You should experiment with file operations until you feel confident that you understand the mechanisms involved.
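As one illustration of the indexing idea, the sketch below records where each line of a file starts so that any line can later be reached directly with fseek(). It assumes a stream opened in binary mode, so that ftell() values are byte offsets, and the name build_index() is hypothetical. A real application might write such an index into the file itself, as described above.

```c
#include <stdio.h>

/* Hypothetical helper: records the starting offset of each line in a  */
/* stream opened in binary mode. Returns the number of offsets stored. */
size_t build_index(FILE *pFile, long offsets[], size_t max)
{
  size_t n = 0;
  int ch;
  int at_start = 1;                    /* Next byte begins a new line    */
  rewind(pFile);
  while((ch = fgetc(pFile)) != EOF)
  {
    if(at_start && n < max)
      offsets[n++] = ftell(pFile) - 1; /* Position of the byte just read */
    at_start = (ch == '\n');
  }
  return n;
}
```

Once the index has been built, fseek(pFile, offsets[i], SEEK_SET) moves straight to the start of line i without reading the lines before it.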

Although the functions I discussed in this chapter cover most of the abilities you're likely to need, you'll find that the input/output library provided with your compiler offers quite a few additional functions that give you even more options for handling your file operations.

Exercises

The following exercises enable you to try out what you've learned in this chapter. If you get stuck, look back over the chapter for help. If you're still stuck, you can download the solutions from the Source Code/Download area of the Apress web site (http://www.apress.com), but that really should be a last resort.

Exercise 12-1. Write a program that will write an arbitrary number of strings to a file. The strings should be entered from the keyboard and the program shouldn't delete the file, as it will be used in the next exercise.

Exercise 12-2. Write a program that will read the file that was created by the previous exercise, and retrieve the strings one at a time in reverse sequence and write them to a new file in the sequence in which they were retrieved. For example, the program will retrieve the last string and write that to the new file, then retrieve the second to last and write that to the new file, and so on, for each string in the original file.

Exercise 12-3. Write a program that will read names and telephone numbers from the keyboard and write them to a new file if a file doesn't already exist and add them if the file does exist. The program should optionally list all the entries.

Exercise 12-4. Extend the program from the previous exercise to implement retrieval of all the numbers corresponding to a given second name. The program should allow further enquiries, adding new name/number entries and deleting existing entries.
