Chapter 4. Files and File I/O

In this chapter

  • 4.1 Introducing the Linux/Unix I/O Model page 84

  • 4.2 Presenting a Basic Program Structure page 84

  • 4.3 Determining What Went Wrong page 86

  • 4.4 Doing Input and Output page 91

  • 4.5 Random Access: Moving Around within a File page 102

  • 4.6 Creating Files page 106

  • 4.7 Forcing Data to Disk page 113

  • 4.8 Setting File Length page 114

  • 4.9 Summary page 115

  • Exercises page 115

This chapter describes basic file operations: opening and creating files, reading and writing them, moving around in them, and closing them. Along the way it presents the standard mechanisms for detecting and reporting errors. The chapter ends by describing how to set a file’s length and force file data and metadata to disk.

Introducing the Linux/Unix I/O Model

The Linux/Unix API model for I/O is straightforward. It can be summed up in four words: open, read, write, close. In fact, those are the names of the system calls: open(), read(), write(), close(). Here are their declarations:

#include <sys/types.h>                                           POSIX
#include <sys/stat.h>           /* for mode_t */
#include <fcntl.h>              /* for flags for open() */
#include <unistd.h>             /* for ssize_t */

int open(const char *pathname, int flags, mode_t mode);
ssize_t read(int fd, void *buf, size_t count);
ssize_t write(int fd, const void *buf, size_t count);
int close(int fd);

In the next and subsequent sections, we illustrate the model by writing a very simple version of cat. It’s so simple that it doesn’t even have options; all it does is concatenate the contents of the named files to standard output. It does do minimal error reporting. Once it’s written, we compare it to the V7 cat.

We present the program top-down, starting with the command line. In succeeding sections, we present error reporting and then get down to brass tacks, showing how to do actual file I/O.

Presenting a Basic Program Structure

Our version of cat follows a structure that is generally useful. The first part starts with an explanatory comment, header includes, declarations, and the main() function:

 1  /*
 2   * ch04-cat.c --- Demonstrate open(), read(), write(), close(),
 3   *                errno and strerror().
 4   */
 5
 6  #include <stdio.h>      /* for fprintf(), stderr, BUFSIZ */
 7  #include <errno.h>      /* declare errno */
 8  #include <fcntl.h>      /* for flags for open() */
 9  #include <string.h>     /* declare strerror() */
10  #include <unistd.h>     /* for ssize_t */
11  #include <sys/types.h>
12  #include <sys/stat.h>   /* for mode_t */
13
14  char *myname;
15  int process(char *file);
16
17  /* main --- loop over file arguments */
18
19  int
20  main(int argc, char **argv)
21  {
22      int i;
23      int errs = 0;
24
25      myname = argv[0];
26
27      if (argc == 1)
28          errs = process("-");
29      else
30          for (i = 1; i < argc; i++)
31              errs += process(argv[i]);
32
33      return (errs != 0);
34  }
    ... continued later in the chapter ...

The myname variable (line 14) is used later for error messages; main() sets it to the program name (argv[0]) as its first action (line 25). Then main() loops over the arguments. For each argument, it calls a function named process() to do the work.

When given the filename - (a single dash, or minus sign), Unix cat reads standard input instead of trying to open a file named -. In addition, with no arguments, cat reads standard input. ch04-cat implements both of these behaviors. The check for ’argc == 1’ (line 27) is true when there are no filename arguments; in this case, main() passes "-" to process(). Otherwise, main() loops over all the arguments, treating them as files to be processed. If one of them happens to be "-", the program then processes standard input.

If process() returns a nonzero value, it means that something went wrong. Errors are added up in the errs variable (lines 28 and 31). When main() ends, it returns 0 if there were no errors, and 1 if there were (line 33). This is a fairly standard convention, whose meaning is discussed in more detail in Section 9.1.5.1, “Defining Process Exit Status”, page 300.

The structure presented in main() is quite generic: process() could do anything we want to the file. For example (ignoring the special use of "-"), process() could just as easily remove files as concatenate them!

Before looking at the process() function, we have to describe how system call errors are represented and then how I/O is done. The process() function itself is presented in Section 4.4.3, “Reading and Writing”, page 96.

Determining What Went Wrong

 

“If anything can go wrong, it will.”
 --Murphy’s Law

“Be prepared.”
 --The Boy Scouts

Errors can occur anytime. Disks can fill up, users can enter invalid data, the server on a network from which a file is being read can crash, the network can die, and so on. It is important to always check every operation for success or failure.

The basic Linux system calls almost universally return -1 on error, and 0 or a positive value on success. This lets you know whether the operation has succeeded or failed:

int result;

result = some_system_call(param1, param2);
if (result < 0) {
    /* error occurred, do something */
}
else {
    /* all ok, proceed */
}

Knowing that an error occurred isn’t enough. It’s necessary to know what error occurred. For that, each process has a predefined variable named errno. Whenever a system call fails, errno is set to one of a set of predefined error values. errno and the predefined values are declared in the <errno.h> header file:

#include <errno.h>                                         ISO C

extern int errno;

errno itself may be a macro that acts like an int variable; it need not be a real integer. In particular, in threaded environments, each thread will have its own private version of errno. Practically speaking, though, for all the system calls and functions in this book, you can treat errno like a simple int.

Values for errno

The 2001 POSIX standard defines a large number of possible values for errno. Many of these are related to networking, IPC, or other specialized tasks. The manpage for each system call describes the possible errno values that can occur; thus, you can write code to check for particular errors and handle them specially if need be. The possible values are defined by symbolic constants. Table 4.1 lists the constants provided by GLIBC.

Table 4.1. GLIBC values for errno

Name              Meaning
----------------  ----------------------------------------------------------
E2BIG             Argument list too long.
EACCES            Permission denied.
EADDRINUSE        Address in use.
EADDRNOTAVAIL     Address not available.
EAFNOSUPPORT      Address family not supported.
EAGAIN            Resource unavailable, try again (may be the same value as EWOULDBLOCK).
EALREADY          Connection already in progress.
EBADF             Bad file descriptor.
EBADMSG           Bad message.
EBUSY             Device or resource busy.
ECANCELED         Operation canceled.
ECHILD            No child processes.
ECONNABORTED      Connection aborted.
ECONNREFUSED      Connection refused.
ECONNRESET        Connection reset.
EDEADLK           Resource deadlock would occur.
EDESTADDRREQ      Destination address required.
EDOM              Mathematics argument out of domain of function.
EDQUOT            Reserved.
EEXIST            File exists.
EFAULT            Bad address.
EFBIG             File too large.
EHOSTUNREACH      Host is unreachable.
EIDRM             Identifier removed.
EILSEQ            Illegal byte sequence.
EINPROGRESS       Operation in progress.
EINTR             Interrupted function.
EINVAL            Invalid argument.
EIO               I/O error.
EISCONN           Socket is connected.
EISDIR            Is a directory.
ELOOP             Too many levels of symbolic links.
EMFILE            Too many open files.
EMLINK            Too many links.
EMSGSIZE          Message too large.
EMULTIHOP         Reserved.
ENAMETOOLONG      Filename too long.
ENETDOWN          Network is down.
ENETRESET         Connection aborted by network.
ENETUNREACH       Network unreachable.
ENFILE            Too many files open in system.
ENOBUFS           No buffer space available.
ENODEV            No such device.
ENOENT            No such file or directory.
ENOEXEC           Executable file format error.
ENOLCK            No locks available.
ENOLINK           Reserved.
ENOMEM            Not enough space.
ENOMSG            No message of the desired type.
ENOPROTOOPT       Protocol not available.
ENOSPC            No space left on device.
ENOSYS            Function not supported.
ENOTCONN          The socket is not connected.
ENOTDIR           Not a directory.
ENOTEMPTY         Directory not empty.
ENOTSOCK          Not a socket.
ENOTSUP           Not supported.
ENOTTY            Inappropriate I/O control operation.
ENXIO             No such device or address.
EOPNOTSUPP        Operation not supported on socket.
EOVERFLOW         Value too large to be stored in data type.
EPERM             Operation not permitted.
EPIPE             Broken pipe.
EPROTO            Protocol error.
EPROTONOSUPPORT   Protocol not supported.
EPROTOTYPE        Protocol wrong type for socket.
ERANGE            Result too large.
EROFS             Read-only file system.
ESPIPE            Invalid seek.
ESRCH             No such process.
ESTALE            Reserved.
ETIMEDOUT         Connection timed out.
ETXTBSY           Text file busy.
EWOULDBLOCK       Operation would block (may be the same value as EAGAIN).
EXDEV             Cross-device link.

Many systems provide other error values as well, and older systems may not have all the errors just listed. You should check your local intro(2) and errno(2) manpages for the full story.

Note

errno should be examined only after an error has occurred and before further system calls are made. Its initial value is 0. However, nothing changes errno between errors, meaning that a successful system call does not reset it to 0. You can, of course, manually set it to 0 initially or whenever you like, but this is rarely done.

Initially, we use errno only for error reporting. There are two useful functions for error reporting. The first is perror():

#include <stdio.h>                                        ISO C

void perror(const char *s);

The perror() function prints a program-supplied string, followed by a colon, and then a string describing the value of errno:

if (some_system_call(param1, param2) < 0) {
    perror("system call failed");
    return 1;
}

We prefer the strerror() function, which takes an error value parameter and returns a pointer to a string describing the error:

#include <string.h>                                       ISO C

char *strerror(int errnum);

strerror() provides maximum flexibility in error reporting, since fprintf() makes it possible to print the error in any way we like:

if (some_system_call(param1, param2) < 0) {
    fprintf(stderr, "%s: %d, %d: some_system_call failed: %s\n",
            argv[0], param1, param2, strerror(errno));
    return 1;
}

You will see many examples of both functions throughout the book.

Error Message Style

C provides several special macros for use in error reporting. The most widely used are __FILE__ and __LINE__, which expand to the name of the source file and the current line number in that file. These have been available in C since its beginning. C99 defines an additional predefined identifier, __func__, which represents the name of the current function as a character string. The macros are used like this:

if (some_system_call(param1, param2) < 0) {
    fprintf(stderr, "%s: %s (%s %d): some_system_call(%d, %d) failed: %s\n",
            argv[0], __func__, __FILE__, __LINE__,
            param1, param2, strerror(errno));
    return 1;
}

Here, the error message includes not only the program’s name but also the function name, source file name, and line number. The full list of identifiers useful for diagnostics is provided in Table 4.2.

Table 4.2. C99 diagnostic identifiers

Identifier   C version   Meaning
----------   ---------   --------------------------------------------------------
__DATE__     C89         Date of compilation in the form "Mmm nn yyyy".
__FILE__     Original    Source-file name in the form "program.c".
__LINE__     Original    Source-file line number in the form 42.
__TIME__     C89         Time of compilation in the form "hh:mm:ss".
__func__     C99         Name of current function, as if declared
                         const char __func__[] = "name".

The use of __FILE__ and __LINE__ was quite popular in the early days of Unix, when most people had source code and could find the error and fix it. As Unix systems became more commercial, use of these identifiers gradually diminished, since knowing the source code location isn’t of much help to someone who only has a binary executable.

Today, although GNU/Linux systems come with source code, said source code often isn’t installed by default. Thus, using these identifiers for error messages doesn’t seem to provide much additional value. The GNU Coding Standards don’t even mention them.

Doing Input and Output

All I/O in Linux is accomplished through file descriptors. This section introduces file descriptors, describes how to obtain and release them, and explains how to do I/O with them.

Understanding File Descriptors

A file descriptor is an integer value. Valid file descriptors start at 0 and go up to some system-defined limit. These integers are in fact simple indexes into each process’s table of open files. (This table is maintained inside the operating system; it is not accessible to a running program.) On most modern systems, the size of the table is large. The command ’ulimit -n’ prints the value:

$ ulimit -n
1024

From C, the maximum number of open files is returned by the getdtablesize() (get descriptor table size) function:

#include <unistd.h>                                              Common

int getdtablesize(void);

This small program prints the result of the function:

/* ch04-maxfds.c --- Demonstrate getdtablesize(). */

#include <stdio.h>         /* for printf() */
#include <stdlib.h>        /* for exit() */
#include <unistd.h>        /* for getdtablesize() */

int
main(int argc, char **argv)
{
   printf("max fds: %d\n", getdtablesize());
   exit(0);
}

When compiled and run, not surprisingly the program prints the same value as printed by ulimit:

$ ch04-maxfds
max fds: 1024

File descriptors are held in normal int variables; it is typical to see declarations of the form ’int fd’ for use with I/O system calls. There is no predefined type for file descriptors.

In the usual case, every program starts running with three file descriptors already opened for it. These are standard input, standard output, and standard error, on file descriptors 0, 1, and 2, respectively. (If not otherwise redirected, each one is connected to your keyboard and screen.)

Opening and Closing Files

New file descriptors are obtained (among other sources) from the open() system call. This system call opens a file for reading or writing and returns a new file descriptor for subsequent operations on the file. We saw the declaration earlier:

#include <sys/types.h>                                       POSIX
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int open(const char *pathname, int flags, mode_t mode);

The three arguments are as follows:

const char *pathname

  • A C string, representing the name of the file to open.

int flags

  • The bitwise-OR of one or more of the constants defined in <fcntl.h>. We describe them shortly.

mode_t mode

  • The permissions mode of a file being created. This is discussed later in the chapter; see Section 4.6, “Creating Files,” page 106. When opening an existing file, omit this parameter.[1]

The return value from open() is either the new file descriptor or -1 to indicate an error, in which case errno will be set. For simple I/O, the flags argument should be one of the values in Table 4.3.

Table 4.3. Flag values for open()

Symbolic constant   Value   Meaning
-----------------   -----   ---------------------------------------------
O_RDONLY            0       Open file only for reading; writes will fail.
O_WRONLY            1       Open file only for writing; reads will fail.
O_RDWR              2       Open file for reading and writing.

We will see example code shortly. Additional values for flags are described in Section 4.6, “Creating Files,” page 106. Much early Unix code didn’t use the symbolic values. Instead, the numeric value was used. Today this is considered bad practice, but we present the values so that you’ll recognize their meanings if you see them.

The close() system call closes a file: The entry for it in the system’s file descriptor table is marked as unused, and no further operations may be done with that file descriptor. The declaration is

#include <unistd.h>                                              POSIX

int close(int fd);

The return value is 0 on success, -1 on error. There isn’t much you can do if an error does occur, other than report it. Errors closing files are unusual, but not unheard of, particularly for files being accessed over a network. Thus, it’s good practice to check the return value, particularly for files opened for writing.

If you choose to ignore the return value, specifically cast it to void, to signify that you don’t care about the result:

(void) close(fd);      /* throw away return value */

The flip side of this advice is that too many casts to void tend to clutter the code. For example, despite the “always check the return value” principle, it’s exceedingly rare to see code that checks the return value of printf() or bothers to cast it to void. As with many aspects of C programming, experience and judgment should be applied here too.

As mentioned, the number of open files, while large, is limited, and you should always close files when you’re done with them. If you don’t, you will eventually run out of file descriptors, a situation that leads to a lack of robustness on the part of your program.

The system closes all open files when a process exits, but—except for 0, 1, and 2—it’s bad form to rely on this.

When open() returns a new file descriptor, it always returns the lowest unused integer value. Always. Thus, if file descriptors 0–6 are open and the program closes file descriptor 5, then the next call to open() returns 5, not 7. This behavior is important; we see later in the book how it’s used to cleanly implement many important Unix features, such as I/O redirection and piping.

Mapping FILE * Variables to File Descriptors

The Standard I/O library functions and FILE * variables from <stdio.h>, such as stdin, stdout, and stderr, are built on top of the file-descriptor-based system calls.

Occasionally, it’s useful to directly access the file descriptor associated with a <stdio.h> file pointer if you need to do something not defined by the ISO C standard. The fileno() function returns the underlying file descriptor:

#include <stdio.h>                                              POSIX

int fileno(FILE *stream);

We will see an example later, in Section 4.4.4, “Example: Unix cat,” page 99.

Closing All Open Files

Open files are inherited by child processes from their parent processes. They are, in effect, shared. In particular, the position in the file is shared. We leave the details for discussion later, in Section 9.1.1.2, “File Descriptor Sharing,” page 286.

Since programs can inherit open files, you may occasionally see programs that close all their files in order to start out with a “clean slate.” In particular, code like this is typical:

int i;

/* leave 0, 1, and 2 alone */
for (i = 3; i < getdtablesize(); i++)
    (void) close(i);

Assume that the result of getdtablesize() is 1024. This code works, but it makes (1024 – 3) * 2 = 2042 system calls. 1020 of them are needless, since the return value from getdtablesize() doesn’t change. Here is a better way to write this code:

int i, fds;

for (i = 3, fds = getdtablesize(); i < fds; i++)
    (void) close(i);

Such an optimization does not affect the readability of the code, and it can make a difference, particularly on slow systems. In general, it’s worth looking for cases in which loops compute the same result repeatedly, to see if such a computation can’t be pulled out of the loop. In all such cases, though, be sure that you (a) preserve the code’s correctness and (b) preserve its readability!

Reading and Writing

I/O is accomplished with the read() and write() system calls, respectively:

#include <sys/types.h>                                         POSIX
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

ssize_t read(int fd, void *buf, size_t count);
ssize_t write(int fd, const void *buf, size_t count);

Each function is about as simple as can be. The arguments are the file descriptor for the open file, a pointer to a buffer to read data into or to write data from, and the number of bytes to read or write.

The return value is the number of bytes actually read or written. (This number can be smaller than the requested amount: For a read operation this happens when fewer than count bytes are left in the file, and for a write operation it happens if a disk fills up or some other error occurs.) The return value is -1 if an error occurred, in which case errno indicates the error. When read() returns 0, it means that end-of-file has been reached.

We can now show the rest of the code for ch04-cat. The process() routine uses 0 if the input filename is "-", for standard input (lines 50 and 51). Otherwise, it opens the given file:

36  /*
37   * process --- do something with the file, in this case,
38   *             send it to stdout (fd 1).
39   *             Returns 0 if all OK, 1 otherwise.
40   */
41
42  int
43  process(char *file)
44  {
45      int fd;
46      ssize_t rcount, wcount;
47      char buffer[BUFSIZ];
48      int errors = 0;
49
50      if (strcmp(file, "-") == 0)
51          fd = 0;
52      else if ((fd = open(file, O_RDONLY)) < 0) {
53          fprintf(stderr, "%s: %s: cannot open for reading: %s\n",
54                  myname, file, strerror(errno));
55          return 1;
56      }

The buffer buffer (line 47) is of size BUFSIZ; this constant is defined by <stdio.h> to be the “optimal” block size for I/O. Although the value for BUFSIZ varies across systems, code that uses this constant is clean and portable.

The core of the routine is the following loop, which repeatedly reads data until either end-of-file or an error is encountered:

58      while ((rcount = read(fd, buffer, sizeof buffer)) > 0) {
59          wcount = write(1, buffer, rcount);
60          if (wcount != rcount) {
61              fprintf(stderr, "%s: %s: write error: %s\n",
62                      myname, file, strerror(errno));
63              errors++;
64              break;
65          }
66      }

The rcount and wcount variables (line 46) are of type ssize_t, “signed size_t,” which allows them to hold negative values. Note that the count value passed to write() is the return value from read() (line 59). While we want to read fixed-size BUFSIZ chunks, it is unlikely that the file's size is an exact multiple of BUFSIZ bytes. When the final, smaller, chunk of bytes is read from the file, the return value indicates how many bytes of buffer received new data. Only those bytes should be copied to standard output, not the entire buffer.

The test ’wcount != rcount’ on line 60 is the correct way to check for write errors; if some, but not all, of the data were written, then wcount will be positive but smaller than rcount.

Finally, process() checks for read errors (lines 68–72) and then attempts to close the file. In the (unlikely) event that close() fails (line 75), it prints an error message. Avoiding the close of standard input isn’t strictly necessary in this program, but it’s a good habit to develop for writing larger programs, in case other code elsewhere wants to do something with it or if a child program will inherit it. The last statement (line 82) returns 1 if there were errors, 0 otherwise.

68      if (rcount < 0) {
69          fprintf(stderr, "%s: %s: read error: %s\n",
70                  myname, file, strerror(errno));
71          errors++;
72      }
73
74      if (fd != 0) {
75          if (close(fd) < 0) {
76              fprintf(stderr, "%s: %s: close error: %s\n",
77                  myname, file, strerror(errno));
78              errors++;
79          }
80      }
81
82      return (errors != 0);
83  }

ch04-cat checks every system call for errors. While this is tedious, it provides robustness (or at least clarity): When something goes wrong, ch04-cat prints an error message that is as specific as possible. The combination of errno and strerror() makes this easy to do. That’s it for ch04-cat, only 83 lines of code!

To sum up, there are several points to understand about Unix I/O:

I/O is uninterpreted.

  • The I/O system calls merely move bytes around. They do no interpretation of the data; all interpretation is up to the user-level program. This makes reading and writing binary structures just as easy as reading and writing lines of text (easier, really, although using binary data introduces portability problems).

I/O is flexible.

  • You can read or write as many bytes at a time as you like. You can even read and write data one byte at a time, although doing so for large amounts of data is more expensive than doing so in large chunks.

I/O is simple.

  • The three-valued return (negative for error, zero for end-of-file, positive for a count) makes programming straightforward and obvious.

I/O can be partial.

  • Both read() and write() can transfer fewer bytes than requested. Application code (that is, your code) must always be aware of this.

Example: Unix cat

As promised, here is the V7 version of cat.[2] It begins by checking for options. The V7 cat accepts a single option, -u, for doing unbuffered output.

The basic design is similar to the one shown above; it loops over the files named by the command-line arguments and reads each file, one character at a time, sending the characters to standard output. Unlike our version, it uses the <stdio.h> facilities. In many ways code using the Standard I/O library is easier to read and write, since all buffering issues are hidden by the library.

 1  /*
 2   * Concatenate files.
 3   */
 4
 5  #include <stdio.h>
 6  #include <sys/types.h>
 7  #include <sys/stat.h>
 8
 9  char    stdbuf[BUFSIZ];
10
11  main(argc, argv)                        int main(int argc, char **argv)
12  char **argv;
13  {
14      int fflg = 0;
15      register FILE *fi;
16      register c;
17      int dev, ino = -1;
18      struct stat statb;
19
20      setbuf(stdout, stdbuf);
21      for( ; argc>1 && argv[1][0]=='-'; argc--,argv++) {
22          switch(argv[1][1]) {              Process options
23          case 0:
24              break;
25          case 'u':
26              setbuf(stdout, (char *)NULL);
27              continue;
28          }
29          break;
30      }
31      fstat(fileno(stdout), &statb);         Lines 31–36 explained in Chapter 5
32      statb.st_mode &= S_IFMT;
33      if (statb.st_mode!=S_IFCHR && statb.st_mode!=S_IFBLK) {
34          dev = statb.st_dev;
35          ino = statb.st_ino;
36      }
37      if (argc < 2) {
38          argc = 2;
39          fflg++;
40      }
41      while (--argc > 0) {                   Loop over files
42          if (fflg || (*++argv)[0]=='-' && (*argv)[1]=='\0')
43              fi = stdin;
44          else {
45              if ((fi = fopen(*argv, "r")) == NULL) {
46                  fprintf(stderr, "cat: can't open %s\n", *argv);
47                  continue;
48              }
49          }
50          fstat(fileno(fi), &statb);         Lines 50–56 explained in Chapter 5
51          if (statb.st_dev==dev && statb.st_ino==ino) {
52              fprintf(stderr, "cat: input %s is output\n",
53                 fflg?"-": *argv);
54              fclose(fi);
55              continue;
56          }
57          while ((c = getc(fi)) != EOF)      Copy file contents to stdout
58              putchar(c);
59          if (fi!=stdin)
60              fclose(fi);
61      }
62      return(0);
63  }

Of note is that the program always exits successfully (line 62); it could have been written to note errors and indicate them in main()’s return value. (The mechanics of process exiting and the meaning of different exit status values are discussed in Section 9.1.5.1, “Defining Process Exit Status,” page 300.)

The code dealing with the struct stat and the fstat() function (lines 31–36 and 50–56) is undoubtedly opaque, since we haven’t yet covered these functions, and won’t until the next chapter. (But do note the use of fileno() on line 50 to get at the underlying file descriptor associated with the FILE * variables.) The idea behind the code is to make sure that no input file is the same as the output file. This is intended to prevent infinite file growth, in case of a command like this:

$ cat myfile >> myfile                       Append one copy of myfile onto itself?

And indeed, the check works:

$ echo hi > myfile                           Create a file
$ v7cat myfile >> myfile                     Attempt to append it onto itself
cat: input myfile is output

If you try this with ch04-cat, it will keep running, and myfile will keep growing until you interrupt it. The GNU version of cat does perform the check. Note that something like the following is beyond cat’s control:

$ v7cat < myfile > myfile
cat: input - is output
$ ls -l myfile
-rw-r--r-- 1 arnold devel               0 Mar 24 14:17 myfile

In this case, it’s too late because the shell truncated myfile (with the > operator) before cat ever gets a chance to examine the file!

In Section 5.4.4.2, “The V7 cat Revisited,” page 150, we explain the struct stat code.

Random Access: Moving Around within a File

So far, we have discussed sequential I/O, whereby data are read or written beginning at the front of the file and continuing until the end. Often, this is all a program needs to do. However, it is possible to do random access I/O; that is, read data from an arbitrary position in the file, without having to read everything before that position first.

The offset of a file descriptor is the position within an open file at which the next read or write will occur. A program sets the offset with the lseek() system call:

#include <sys/types.h>    /* for off_t */                        POSIX
#include <unistd.h>       /* declares lseek() and whence values */

off_t lseek(int fd, off_t offset, int whence);

The type off_t (offset type) is a signed integer type representing byte positions (offsets from the beginning) within a file. On 32-bit systems, the type is usually a long. However, many modern systems allow very large files, in which case off_t may be a more unusual type, such as a C99 int64_t or some other extended type. lseek() takes three arguments, as follows:

int fd

  • The file descriptor for the open file.

off_t offset

  • A position to which to move. The interpretation of this value depends on the whence parameter. offset can be positive or negative: Negative values move toward the front of the file; positive values move toward the end of the file.

int whence

  • Describes the location in the file to which offset is relative. See Table 4.4.

Table 4.4. whence values for lseek()

Symbolic constant   Value   Meaning
-----------------   -----   -------------------------------------------------------------------
SEEK_SET            0       offset is absolute, that is, relative to the beginning of the file.
SEEK_CUR            1       offset is relative to the current position in the file.
SEEK_END            2       offset is relative to the end of the file.

Much old code uses the numeric values shown in Table 4.4. However, any new code you write should use the symbolic values, whose meanings are clearer.

The meaning of the values and their effects upon file position are shown in Figure 4.1. Assuming that the file has 3000 bytes and that the current offset is 2000 before each call to lseek(), the new position after each call is as shown:

Figure 4.1. Offsets for lseek()

Negative offsets relative to the beginning of the file are meaningless; they fail with an “invalid argument” error.

The return value is the new position in the file. Thus, to find out where in the file you are, use

off_t curpos;
...
curpos = lseek(fd, (off_t) 0, SEEK_CUR);

The l in lseek() stands for long. lseek() was introduced in V7 Unix when file sizes were extended; V6 had a simple seek() system call. As a result, much old documentation (and code) treats the offset parameter as if it had type long, and instead of a cast to off_t, it’s not unusual to see an L suffix on constant offset values:

curpos = lseek(fd, 0L, SEEK_CUR);

On systems with a Standard C compiler, where lseek() is declared with a prototype, such old code continues to work since the compiler automatically converts the 0L from long to off_t if they are different types.

One interesting and important aspect of lseek() is that it is possible to seek beyond the end of a file. Any data that are subsequently written at that point go into the file, but with a “gap” or “hole” between the data at the previous end of the file and the new data. Data in the gap read as if they are all zeros.

The following program demonstrates the creation of holes. It writes three instances of a struct at the beginning, middle, and far end of a file. The offsets chosen (lines 16–18, the third element of each structure) are arbitrary but big enough to demonstrate the point:

 1  /* ch04-holes.c --- Demonstrate lseek() and holes in files. */
 2
 3  #include <stdio.h>      /* for fprintf(), stderr, BUFSIZ */
 4  #include <errno.h>      /* declare errno */
 5  #include <fcntl.h>      /* for flags for open() */
 6  #include <string.h>     /* declare strerror() */
 7  #include <unistd.h>     /* for ssize_t */
 8  #include <sys/types.h>  /* for off_t, etc. */
 9  #include <sys/stat.h>   /* for mode_t */
10
11  struct person {
12      char name[10];      /* first name */
13      char id[10];        /* ID number */
14      off_t pos;          /* position in file, for demonstration */
15  } people[] = {
16      { "arnold", "123456789", 0 },
17      { "miriam", "987654321", 10240 },
18      { "joe",    "192837465", 81920 },
19  };
20
21  int
22  main(int argc, char **argv)
23  {
24      int fd;
25      int i, j;
26
27      if (argc < 2) {
28          fprintf(stderr, "usage: %s file\n", argv[0]);
29          return 1;
30      }
31
32      fd = open(argv[1], O_RDWR|O_CREAT|O_TRUNC, 0666);
33      if (fd < 0) {
34          fprintf(stderr, "%s: %s: cannot open for read/write: %s\n",
35                  argv[0], argv[1], strerror(errno));
36          return 1;
37      }
38
39      j = sizeof(people) / sizeof(people[0]);      /* count of elements */

Lines 27–30 make sure that the program was invoked properly. Lines 32–37 open the named file and verify that the open succeeded.

The calculation on line 39 of j, the array element count, uses a lovely, portable trick: The number of elements is the size of the entire array divided by the size of the first element. The beauty of this idiom is that it’s always right: No matter how many elements you add to or remove from such an array, the compiler will figure it out. It also doesn’t require a terminating sentinel element; that is, one in which all the fields are set to zero, NULL, or some such.
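The idiom is often wrapped in a macro. The following is a small sketch; the macro name ARRAY_COUNT and the weekdays array are our own inventions, not part of ch04-holes.c. Note the one caveat we are sure of: the trick works only on a real array. Applied to a pointer (including an array received as a function parameter), it silently yields the wrong answer.

```c
#include <stddef.h>

/* Element count of a true array (not a pointer); the name is our own. */
#define ARRAY_COUNT(a)  (sizeof(a) / sizeof((a)[0]))

static const char *weekdays[] = { "Mon", "Tue", "Wed", "Thu", "Fri" };

/* Returns 5 today; stays correct if entries are added or removed. */
size_t weekday_count(void)
{
    return ARRAY_COUNT(weekdays);
}
```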

The work is done by a loop (lines 41–55), which seeks to the byte offset given in each structure (line 42) and then writes the structure out (line 49):

41      for (i = 0; i < j; i++) {
42          if (lseek(fd, people[i].pos, SEEK_SET) < 0) {
43              fprintf(stderr, "%s: %s: seek error: %s\n",
44                  argv[0], argv[1], strerror(errno));
45              (void) close(fd);
46              return 1;
47          }
48
49          if (write(fd, &people[i], sizeof(people[i])) != sizeof(people[i])) {
50              fprintf(stderr, "%s: %s: write error: %s\n",
51                  argv[0], argv[1], strerror(errno));
52              (void) close(fd);
53              return 1;
54          }
55      }
56
57      /* all ok here */
58      (void) close(fd);
59      return 0;
60  }

Here are the results when the program is run:

$ ch04-holes peoplelist                 Run the program
$ ls -ls peoplelist                     Show size and blocks used
  16 -rw-r--r--    1 arnold   devel     81944 Mar 23 17:43 peoplelist
$ echo 81944 / 4096 | bc -l             Show blocks if no holes
20.00585937500000000000

We happen to know that each disk block in the file uses 4096 bytes. (How we know that is discussed in Section 5.4.2, “Retrieving File Information,” page 141. For now, take it as a given.) The final bc command indicates that a file of size 81,944 bytes needs 21 disk blocks. However, the -s option to ls, which tells us how many blocks a file really uses, shows that the file uses only 16 blocks![3] The missing blocks in the file are the holes. This is illustrated in Figure 4.2.

Figure 4.2. Holes in a file
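To check the claim that data in a gap read as zeros, a function along the following lines could be used. It is only a sketch: the name gap_is_zero() and its calling convention are our own, and error reporting is reduced to a return code.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * gap_is_zero --- write one byte at offset 0 and one at offset far
 * (far must be at least 2), then read a byte from inside the gap.
 * Returns 1 if that byte is zero, 0 if it is not, -1 on any error.
 * (Hypothetical helper, for illustration only.)
 */
int gap_is_zero(const char *path, off_t far)
{
    char c = 'x';
    int result = -1;
    int fd = open(path, O_RDWR|O_CREAT|O_TRUNC, 0666);

    if (fd < 0)
        return -1;

    if (write(fd, &c, 1) == 1                       /* byte at offset 0 */
        && lseek(fd, far, SEEK_SET) == far          /* seek past the end */
        && write(fd, &c, 1) == 1                    /* file is now far + 1 bytes */
        && lseek(fd, far / 2, SEEK_SET) == far / 2  /* move into the gap */
        && read(fd, &c, 1) == 1)
        result = (c == '\0');

    (void) close(fd);
    return result;
}
```

Calling gap_is_zero("somefile", 81920) should return 1 on a system that behaves as described above.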

Note

ch04-holes.c does direct binary I/O. This nicely illustrates the beauty of random access I/O: You can treat a disk file as if it were a very large array of binary data structures.

In practice, storing live data by using binary I/O is a design decision that you should consider carefully. For example, suppose you need to move the data to a system using different byte orders for integers? Or different floating-point formats? Or to a system with different alignment requirements? Ignoring such issues can become significantly costly.
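One common way to sidestep the byte-order question for integers is to choose a byte order for the file and convert explicitly, one byte at a time, instead of writing a struct straight from memory. The sketch below uses most-significant-byte-first order. The function names are our own, and this is an illustration of the idea rather than anything ch04-holes.c does.

```c
#include <stdint.h>

/* Store v at p[0..3], most significant byte first. */
void put_u32_be(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char) (v >> 24);
    p[1] = (unsigned char) (v >> 16);
    p[2] = (unsigned char) (v >> 8);
    p[3] = (unsigned char) v;
}

/* Rebuild the value from four bytes stored by put_u32_be(). */
uint32_t get_u32_be(const unsigned char *p)
{
    return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
         | ((uint32_t) p[2] << 8)  |  (uint32_t) p[3];
}
```

The four bytes produced this way are the same on every machine, so a file built from them can be read back regardless of the host's native byte order. The cost is an explicit conversion step for every field.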

Creating Files

As described earlier, open() apparently opens existing files only. This section describes how brand-new files are created. There are two choices: creat() and open() with additional flags. Initially, creat() was the only way to create a file, but open() was later enhanced with this functionality as well. Both mechanisms require specification of the initial file permissions.

Specifying Initial File Permissions

As a GNU/Linux user, you are familiar with file permissions as printed by ’ls -l’: read, write, and execute for each of user (the file’s owner), group, and other. The various combinations are often expressed in octal, particularly for the chmod and umask commands. For example, file permissions -rw-r--r-- is equivalent to octal 0644 and -rwxr-xr-x is equivalent to octal 0755. (The leading 0 is C’s notation for octal values.)

When you create a file, you must know the protections to be given to the new file. You can do this as a raw octal number if you choose, and indeed it’s not uncommon to see such numbers in older code. However, it is better to use a bitwise OR of one or more of the symbolic constants from <sys/stat.h>, described in Table 4.5.

Table 4.5. POSIX symbolic constants for file modes

Symbolic constant   Value   Meaning
S_IRWXU             00700   User read, write, and execute permission.
S_IRUSR             00400   User read permission.
S_IREAD                     Same as S_IRUSR.
S_IWUSR             00200   User write permission.
S_IWRITE                    Same as S_IWUSR.
S_IXUSR             00100   User execute permission.
S_IEXEC                     Same as S_IXUSR.
S_IRWXG             00070   Group read, write, and execute permission.
S_IRGRP             00040   Group read permission.
S_IWGRP             00020   Group write permission.
S_IXGRP             00010   Group execute permission.
S_IRWXO             00007   Other read, write, and execute permission.
S_IROTH             00004   Other read permission.
S_IWOTH             00002   Other write permission.
S_IXOTH             00001   Other execute permission.

The following fragment shows how to create variables representing permissions -rw-r--r-- and -rwxr-xr-x (0644 and 0755 respectively):

mode_t rw_mode, rwx_mode;

rw_mode  = S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH;               /* 0644 */
rwx_mode = S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH;     /* 0755 */

Older code used S_IREAD, S_IWRITE, and S_IEXEC together with bit shifting to produce the same results:

mode_t rw_mode, rwx_mode;

rw_mode  = (S_IREAD|S_IWRITE) | (S_IREAD >> 3) | (S_IREAD >> 6); /* 0644 */
rwx_mode = (S_IREAD|S_IWRITE|S_IEXEC) |
           ((S_IREAD|S_IEXEC) >> 3) | ((S_IREAD|S_IEXEC) >> 6);  /* 0755 */

Unfortunately, neither notation is incredibly clear. The modern version is preferred since each permission bit has its own name and there is less opportunity to do the bitwise operations incorrectly.
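To see how the symbolic constants line up with the letters that ’ls -l’ prints, consider the following sketch. The function perm_string() is our own invention. It handles only the nine basic permission bits and ignores the file-type character and everything else.

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * perm_string --- fill buf (at least 10 bytes) with the nine
 * permission characters for mode, in ls -l order, e.g. "rw-r--r--".
 * (Hypothetical helper, for illustration only.)
 */
void perm_string(mode_t mode, char buf[10])
{
    static const mode_t bits[9] = {
        S_IRUSR, S_IWUSR, S_IXUSR,
        S_IRGRP, S_IWGRP, S_IXGRP,
        S_IROTH, S_IWOTH, S_IXOTH
    };
    static const char letters[] = "rwxrwxrwx";
    int i;

    /* One output character per permission bit, user first. */
    for (i = 0; i < 9; i++)
        buf[i] = (mode & bits[i]) ? letters[i] : '-';
    buf[9] = '\0';
}
```

Passing S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH should fill buf with "rw-r--r--", and passing 0755 should produce "rwxr-xr-x".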

The additional permission bits shown in Table 4.6 are available for use when you are changing a file’s permissions, but they should not be used when you initially create a file. Whether these bits may be included varies wildly by operating system. It’s best not to try; rather, you should explicitly change the permissions after the file is created. (Changing permissions is described in Section 5.5.2, “Changing Permissions: chmod() and fchmod(),” page 156. The meanings of these bits are discussed in Chapter 11, “Permissions and User and Group ID Numbers,” page 403.)

Table 4.6. Additional POSIX symbolic constants for file modes

Symbolic constant   Value   Meaning
S_ISUID             04000   Set user ID.
S_ISGID             02000   Set group ID.
S_ISVTX             01000   Save text.

When standard utilities create files, the default permissions they use are -rw-rw-rw- (or 0666). Because most users prefer to avoid having files that are world-writable, each process carries with it a umask. The umask is a set of permission bits indicating those bits that should never be allowed when new files are created. (The umask is not used when changing permissions.) Conceptually, the operation that occurs is

actual_permissions = (requested_permissions & (~umask));

The umask is usually set by the umask command in $HOME/.profile when you log in. From a C program, it’s set with the umask() system call:

#include <sys/types.h>                                      POSIX
#include <sys/stat.h>

mode_t umask(mode_t mask);

The return value is the old umask. Thus, to determine the current mask, you must set it to a value and then reset it (or change it, as desired):

mode_t mask = umask(0);       /* retrieve current mask */
(void) umask(mask);           /* restore it */

Here is an example of the umask in action, at the shell level:

$ umask                                   Show the current mask
0022
$ touch newfile                           Create a file
$ ls -l newfile                           Show permissions of new file
-rw-r--r--    1 arnold   devel         0 Mar 24 15:43 newfile
$ umask 0                                 Set mask to empty
$ touch newfile2                          Create a second file
$ ls -l newfile2                          Show permissions of new file
-rw-rw-rw-    1 arnold   devel         0 Mar 24 15:44 newfile2
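The same experiment can be sketched in C. The function below is our own illustration, not code from the standard utilities: it temporarily installs a umask, creates a file requesting 0666, and reports the permissions the file actually received. It leans on fstat(), which isn’t covered until Section 5.4.2; for now, take it as a way to ask the system for a file’s mode.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * created_mode --- create path requesting 0666 while the umask is mask.
 * Returns the permission bits the new file actually got, or
 * (mode_t) -1 on error. The file must not already exist.
 * (Hypothetical helper, for illustration only.)
 */
mode_t created_mode(const char *path, mode_t mask)
{
    struct stat sb;
    mode_t result = (mode_t) -1;
    mode_t old = umask(mask);          /* install the mask under test */
    int fd = open(path, O_WRONLY|O_CREAT|O_EXCL, 0666);

    (void) umask(old);                 /* put the caller's mask back */
    if (fd < 0)
        return result;

    if (fstat(fd, &sb) == 0)
        result = sb.st_mode & 0777;    /* keep only the rwx bits */

    (void) close(fd);
    return result;
}
```

With a mask of 022 the result should be 0644, and with a mask of 0 it should be 0666, matching the two ls listings above.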

Creating Files with creat()

The creat()[4] system call creates new files. It is declared as follows:

#include <sys/types.h>                                    POSIX
#include <sys/stat.h>
#include <fcntl.h>

int creat(const char *pathname, mode_t mode);

The mode argument represents the permissions for the new file (as discussed in the previous section). The file named by pathname is created, with the given permission as modified by the umask. It is opened for writing (only), and the return value is the file descriptor for the new file or -1 if there was a problem. In this case, errno indicates the error. If the file already exists, it will be truncated when opened.

In all other respects, file descriptors returned by creat() are the same as those returned by open(); they’re used for writing and seeking and must be closed with close():

int fd, count;

/* Error checking omitted for brevity */
fd = creat("/some/new/file", 0666);
count = write(fd, "some data\n", 10);
(void) close(fd);
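For comparison, here is one way the same fragment might look with the error checking put back. This is a sketch under our own assumptions: the function name save_text() and the wording of the messages are invented for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * save_text --- create (or truncate) path and write the string text.
 * Returns 0 on success, -1 on failure after printing a message.
 * (Hypothetical helper, for illustration only.)
 */
int save_text(const char *path, const char *text)
{
    size_t len = strlen(text);
    ssize_t count;
    int fd = creat(path, 0666);        /* the umask still applies */

    if (fd < 0) {
        fprintf(stderr, "%s: cannot create: %s\n", path, strerror(errno));
        return -1;
    }

    count = write(fd, text, len);
    if (count < 0 || (size_t) count != len) {
        fprintf(stderr, "%s: write error: %s\n", path,
                count < 0 ? strerror(errno) : "short write");
        (void) close(fd);
        return -1;
    }

    if (close(fd) < 0) {
        fprintf(stderr, "%s: close error: %s\n", path, strerror(errno));
        return -1;
    }

    return 0;
}
```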

Revisiting open()

You may recall the declaration for open():

int open(const char *pathname, int flags, mode_t mode);

Earlier, we said that when opening a file for plain I/O, we could ignore the mode argument. Having seen creat(), though, you can probably guess that open() can also be used for creating files and that the mode argument is used in this case. This is indeed true.

Besides the O_RDONLY, O_WRONLY, and O_RDWR flags, additional flags may be bitwise OR’d when open() is called. The POSIX standard mandates a number of these additional flags. Table 4.7 presents the flags that are used for most mundane applications.

Table 4.7. Additional POSIX flags for open()

Flag       Meaning
O_APPEND   Force all writes to occur at the end of the file.
O_CREAT    Create the file if it doesn’t exist.
O_EXCL     When used with O_CREAT, cause open() to fail if the file already exists.
O_TRUNC    Truncate the file (set it to zero length) if it exists.

Given O_APPEND and O_TRUNC, you can imagine how the shell might open or create files corresponding to the > and >> operators. For example:

int fd;
extern char *filename;
mode_t mode = S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH;  /* 0666 */

fd = open(filename, O_CREAT|O_WRONLY|O_TRUNC, mode);            /* for > */

fd = open(filename, O_CREAT|O_WRONLY|O_APPEND, mode);           /* for >> */

Note that the O_EXCL flag would not be used here, since for both > and >>, it’s not an error for the file to exist. Remember also that the system applies the umask to the requested permissions.

Also, it’s easy to see that, at least conceptually, creat() could be written this easily:

int creat(const char *path, mode_t mode)
{
    return open(path, O_CREAT|O_WRONLY|O_TRUNC, mode);
}

Note

If a file is opened with O_APPEND, all data will be written at the end of the file, even if the current position has been reset with lseek().
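The following sketch makes that behavior visible. The function append_after_rewind() is our own invention: it opens a file with O_APPEND, deliberately seeks back to the beginning, and then writes.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * append_after_rewind --- open path with O_APPEND, seek to offset 0,
 * and write len bytes from buf. Returns the file offset right after
 * the write (the new file size), or -1 on error.
 * (Hypothetical helper, for illustration only.)
 */
off_t append_after_rewind(const char *path, const void *buf, size_t len)
{
    off_t pos = -1;
    int fd = open(path, O_WRONLY|O_CREAT|O_APPEND, 0666);

    if (fd < 0)
        return -1;

    if (lseek(fd, (off_t) 0, SEEK_SET) == 0         /* the rewind succeeds... */
        && write(fd, buf, len) == (ssize_t) len)    /* ...but data go at the end */
        pos = lseek(fd, (off_t) 0, SEEK_CUR);

    (void) close(fd);
    return pos;
}
```

If the file already holds the three bytes "abc", then append_after_rewind(path, "XYZ", 3) should return 6 and leave the file containing "abcXYZ", not "XYZ".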

Modern systems provide additional flags whose uses are more specialized. Table 4.8 describes them briefly.

Table 4.8. Additional advanced POSIX flags for open()

Flag         Meaning
O_NOCTTY     If the device being opened is a terminal, it does not become the process’s controlling terminal. (This is a more advanced topic, discussed briefly in Section 9.2.1, page 312.)
O_NONBLOCK   Disables blocking of I/O operations in certain cases (see Section 9.4.3.4, page 333).
O_DSYNC      Ensure that data written to a file make it all the way to physical storage before write() returns.
O_RSYNC      Ensure that any data that read() would read, which may have been written to the file being read, have made it all the way to physical storage before read() returns.
O_SYNC       Like O_DSYNC, but also ensure that all file metadata, such as access times, have also been written to physical storage.

The O_DSYNC, O_RSYNC, and O_SYNC flags need some explanation. Unix systems (including Linux) maintain an internal cache of disk blocks, called the buffer cache. When the write() system call returns, the data passed to the operating system have been copied to a buffer in the buffer cache. They are not necessarily written out to the disk.

The buffer cache provides considerable performance improvement: Since disk I/O is often an order of magnitude or more slower than CPU and memory operations, programs would slow down considerably if they had to wait for every write to go all the way through to the disk. In addition, if data have recently been written to a file, a subsequent read of that same data will find the information already in the buffer cache, where it can be returned immediately instead of having to wait for an I/O operation to read it from the disk.

Unix systems also do read-ahead; since most reads are sequential, upon reading one block, the operating system will read several more consecutive disk blocks so that their information will already be in the buffer cache when a program asks for it. If multiple programs are reading the same file, they all benefit since they will all get their data from the same copy of the file’s disk blocks in the buffer cache.

All of this caching is wonderful, but of course there’s no free lunch. While data are in the buffer cache and before they have been written to disk, there’s a small, but very real, window in which disaster can strike; for example, if the power goes out. Modern disk drives exacerbate this problem: Many have their own internal buffers, so while data may have made it to the drive, they may not have made it onto the media when the power goes out! This can be a significant issue for small systems that aren’t in a data center with controlled power or that don’t have an uninterruptible power supply (UPS).[5]

For most applications, the chance that data in the buffer cache might be inadvertently lost is acceptably small. However, for some applications, any such chance is not acceptable. Thus, the notion of synchronous I/O was added to Unix systems, whereby a program can be guaranteed that if a system call has returned, the data are safely written on a physical storage device.

The O_DSYNC flag guarantees data integrity; the data and any other information that the operating system needs to find the data are written to disk before write() returns. However, metadata, such as access and modification times, may not be written to disk. The O_SYNC flag requires that metadata also be written to disk before write() returns. (Here too there is no free lunch; synchronous writes can seriously affect the performance of a program, slowing it down noticeably.)

The O_RSYNC flag is for data reads: If read() finds data in the buffer cache that were scheduled for writing to disk, then read() won’t return that data until they have been written to disk. The other two flags can affect this: In particular, O_SYNC will cause read() to wait until the file metadata have been written out as well.
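As a sketch of how a program might ask for synchronous writes, the function below appends a record to a file opened with O_SYNC. The name log_record() and the choice to combine O_SYNC with O_APPEND are our own, for illustration.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * log_record --- append one record to path, asking that write() not
 * return until data and metadata have reached storage (O_SYNC).
 * Returns 0 on success, -1 on error.
 * (Hypothetical helper, for illustration only.)
 */
int log_record(const char *path, const void *rec, size_t len)
{
    int status = 0;
    int fd = open(path, O_WRONLY|O_CREAT|O_APPEND|O_SYNC, 0666);

    if (fd < 0)
        return -1;

    if (write(fd, rec, len) != (ssize_t) len)
        status = -1;

    if (close(fd) < 0)
        status = -1;

    return status;
}
```

Replacing O_SYNC with O_DSYNC in the open() call would request only the data-integrity guarantee.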

Note

As of kernel version 2.4, Linux treats all three flags the same, with essentially the meaning of O_SYNC. Furthermore, Linux defines additional flags that are Linux specific and intended for specialized uses. Check the GNU/Linux open(2) manpage for more information.

Forcing Data to Disk

Earlier, we described the O_DSYNC, O_RSYNC, and O_SYNC flags for open(). We noted that using these flags could slow a program down since each write() does not return until all data have been written to physical media.

For a slightly higher risk level, we can have our cake and eat it too. We do this by opening a file without one of the O_xSYNC flags and then using one of the following two system calls at whatever point it’s necessary to have the data safely moved to physical storage:

#include <unistd.h>

int fsync(int fd);                                            POSIX FSC
int fdatasync(int fd);                                        POSIX SIO

The fdatasync() system call is like O_DSYNC: It forces all file data to be written to the final physical device. The fsync() system call is like O_SYNC, forcing not just file data, but also file metadata, to physical storage. The fsync() call is more portable; it has been around in the Unix world for longer and is more likely to exist across a broad range of systems.

You can use these calls with <stdio.h> file pointers by first calling fflush() and then using fileno() to obtain the underlying file descriptor. Here is an fpsync() function that can be used to wrap both operations in one call. It returns 0 on success:

/* fpsync --- sync a stdio FILE * variable */

int fpsync(FILE *fp)
{
    if (fp == NULL || fflush(fp) == EOF || fsync(fileno(fp)) < 0)
        return -1;

    return 0;
}

Technically, both of these calls are extensions to the base POSIX standard: fsync() in the “File Synchronization” extension (FSC), and fdatasync() in the “Synchronized Input and Output” extension (SIO). Nevertheless, you can use them on a GNU/Linux system without any problem.
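At the file descriptor level, the pattern of writing normally and syncing only at a chosen point might look like the following sketch. The function write_batch() is our own invention. It uses fsync() since that is the more portable of the two calls.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * write_batch --- write nlines strings to fd with ordinary (cached)
 * writes, then force everything to storage with a single fsync().
 * Returns 0 on success, -1 on error with errno set.
 * (Hypothetical helper, for illustration only.)
 */
int write_batch(int fd, const char *lines[], size_t nlines)
{
    size_t i;

    for (i = 0; i < nlines; i++) {
        size_t len = strlen(lines[i]);

        if (write(fd, lines[i], len) != (ssize_t) len)
            return -1;
    }

    return fsync(fd);   /* one sync for the whole batch */
}
```

The program pays the cost of waiting for the disk once per batch instead of once per write().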

Setting File Length

Two system calls make it possible to adjust the size of a file:

#include <unistd.h>
#include <sys/types.h>

int truncate(const char *path, off_t length);                  XSI
int ftruncate(int fd, off_t length);                           POSIX

As should be obvious from the parameters, truncate() takes a filename argument, whereas ftruncate() works on an open file descriptor. (The xxx() and fxxx() naming convention for system call pairs that work on a filename or file descriptor is common. We see several examples in this and subsequent chapters.) For both, the length argument is the new size of the file.

These system calls originated in 4.2 BSD Unix, and in early systems could only be used to shorten a file’s length, hence the name. (They were created to simplify implementation of the truncate operation in Fortran.) On modern systems, including Linux, the name is a misnomer, since it’s possible to extend the length of a file with these calls, not just shorten a file. (However, POSIX indicates that the ability to extend a file is an XSI extension.)

For these calls, the file being truncated must have write permission (for truncate()), or have been opened for writing (for ftruncate()). If the file is being shortened, any data past the new end of the file are lost. (Thus, you can’t shorten the file, lengthen it again, and expect to find the original data.) If the file is extended, as with data written after an lseek(), the data between the old end of the file and the new end of file read as zeros.

These calls are very different from ’open(file, ... |O_TRUNC, mode)’. The latter truncates a file completely, throwing away all its data. These calls simply set the file’s absolute length to the given value.
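A short sketch shows both directions at once. The function shrink_then_grow() is our own invention. It assumes a system on which ftruncate() can extend a file, as Linux allows.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * shrink_then_grow --- cut the file open on fd to keep bytes, then
 * extend it to total bytes (total >= keep). Data past keep are gone
 * for good; the extended region reads as zeros.
 * Returns 0 on success, -1 on error with errno set.
 * (Hypothetical helper, for illustration only.)
 */
int shrink_then_grow(int fd, off_t keep, off_t total)
{
    if (ftruncate(fd, keep) < 0)
        return -1;

    return ftruncate(fd, total);
}
```

If the file contains the ten bytes "0123456789", then after shrink_then_grow(fd, 4, 8) it should be eight bytes long: "0123" followed by four zero bytes. The original "4567" do not come back.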

These functions are fairly specialized; they’re used only four times in all of the GNU Coreutils code. We present an example use of ftruncate() in Section 5.5.3, “Changing Timestamps: utime(),” page 157.

Summary

  • When a system call fails, it usually returns -1, and the global variable errno is set to a predefined value indicating the problem. The functions perror() and strerror() can be used for reporting errors.

  • Files are manipulated by small integers called file descriptors. File descriptors for standard input, standard output, and standard error are inherited from a program’s parent process. Others are obtained with open() or creat(). They are closed with close(), and getdtablesize() returns the maximum number of allowed open files. The value of the umask (set with umask()) affects the permissions given to new files created with creat() or the O_CREAT flag for open().

  • The read() and write() system calls read and write data, respectively. Their interface is simple. In particular, they do no interpretation of the data; files are linear streams of bytes. The lseek() system call provides random access I/O: the ability to move around within a file.

  • Additional flags for open() provide for synchronous I/O, whereby data make it all the way to the physical storage media before write() or read() return. Data can also be forced to disk on a controlled basis with fsync() or fdatasync().

  • The truncate() and ftruncate() system calls set the absolute length of a file. (On older systems, they can only be used to shorten a file; on modern systems they can also extend a file.)

Exercises

  1. Using just open(), read(), write(), and close(), write a simple copy program that copies the file named by its first argument to the file named by its second.

  2. Enhance the copy program to accept "-" to mean “standard input” if used as the first argument and “standard output” as the second. Does ’copy - -’ work correctly?

  3. Look at the proc(5) manpage on a GNU/Linux system, in particular the fd subsection. Do an ’ls -l /dev/fd’ and examine the files in the /proc/self/fd directory. If /dev/stdin and friends had been around in the early versions of Unix, how would that have simplified the code for the V7 cat program? (Many other modern Unix systems have a /dev/fd directory or filesystem. If you’re not using GNU/Linux, see what you can discover about your Unix version.)

  4. Even though you don’t understand it yet, try to copy the code segment from the V7 cat.c that uses the struct stat and the fstat() function into ch04-cat.c so that it too reports an error for ’cat file >> file’.

  5. (Easy.) Assuming the existence of strerror(), write your own version of perror().

  6. What is the result of ’ulimit -n’ on your system?

  7. Write a simple version of the umask program, named myumask, that takes an octal mask on the command line. Use strtol() with a base of 8 to convert the character string command-line argument into an integer value. Change the umask to the new mask with the umask() system call.

    Compile and run myumask, and then examine the value of the umask with the regular umask command. Explain the results. (Hint: in Bash, enter ’type umask’.)

  8. Change the simple copy program you wrote earlier to use open() with the O_SYNC flag. Using the time command, compare the performance of the original version and the new version on a large file.

  9. For ftruncate(), we said that the file must have been opened for writing. How can a file be open for writing when the file itself doesn’t have write permission?

  10. Write a truncate program whose usage is ’truncate file length’.



[1] open() is one of the few variadic system calls.

[2] See /usr/src/cmd/cat.c in the V7 distribution. The program compiles without change under GNU/Linux.

[3] At least three of these blocks contain the data that we wrote out; the others are for use by the operating system in keeping track of where the data reside.

[4] Yes, that’s how it’s spelled. Ken Thompson, one of the two “fathers” of Unix, was once asked what he would have done differently if he had it to do over again. He replied that he would have spelled creat() with an “e.” Indeed, that is exactly what he did for the Plan 9 From Bell Labs operating system.

[5] If you don’t have a UPS and you use your system for critical work, we highly recommend investing in one. You should also be doing regular backups.
