Preface

One of the best ways to learn about programming is to read well-written programs. This book teaches the fundamental Linux system call APIs—those that form the core of any significant program—by presenting code from production programs that you use every day.

By looking at concrete programs, you can not only see how to use the Linux APIs but also examine the real-world issues (performance, portability, robustness) that arise in writing software.

While the book's title is Linux Programming by Example, everything we cover, unless otherwise noted, applies to modern Unix systems as well. In general we use “Linux” to mean the Linux kernel, and “GNU/Linux” to mean the total system (kernel, libraries, tools). Also, we often say “Linux” when we mean all of Linux, GNU/Linux and Unix; if something is specific to one system or the other, we mention it explicitly.

Audience

This book is intended for the person who understands programming and is familiar with the basics of C, at least on the level of The C Programming Language by Kernighan and Ritchie. (Java programmers wishing to read this book should understand C pointers, since C code makes heavy use of them.) The examples use both the 1990 version of Standard C and Original C.

In particular, you should be familiar with all C operators, control-flow structures, variable and pointer declarations and use, the string management functions, the use of exit(), and the <stdio.h> suite of functions for file input/output.

You should understand the basic concepts of standard input, standard output, and standard error and the fact that all C programs receive an array of character strings representing invocation options and arguments. You should also be familiar with the fundamental command-line tools, such as cd, cp, date, ln, ls, man (and info if you have it), rmdir, and rm, the use of long and short command-line options, environment variables, and I/O redirection, including pipes.

We assume that you want to write programs that work not just under GNU/Linux but across the range of Unix systems. To that end, we mark each interface as to its availability (GLIBC systems only, or defined by POSIX, and so on), and portability advice is included as an integral part of the text.

The programming taught here may be at a lower level than you're used to; that's OK. The system calls are the fundamental building blocks for higher operations and are thus low-level by nature. This in turn dictates our use of C: The APIs were designed for use from C, and code that interfaces them to higher-level languages, such as C++ and Java, will necessarily be lower level in nature and, most likely, written in C. It may help to remember that “low level” doesn't mean “bad”; it just means “more challenging.”

What You Will Learn

This book focuses on the basic APIs that form the core of Linux programming:

  • Memory management

  • File input/output

  • File metadata

  • Processes and signals

  • Users and groups

  • Programming support (sorting, argument parsing, and so on)

  • Internationalization

  • Debugging

We have purposely kept the list of topics short. We believe that it is intimidating to try to learn “all there is to know” from a single book. Most readers prefer smaller, more focused books, and the best Unix books are all written that way.

So, instead of a single giant tome, we plan several volumes: one on Interprocess Communication (IPC) and networking, and another on software development and code portability. We also have an eye toward possible additional volumes in a Linux Programming by Example series that will cover topics such as thread programming and GUI programming.

The APIs we cover include both system calls and library functions. Indeed, at the C level, both appear as simple function calls. A system call is a direct request for system services, such as reading or writing a file or creating a process. A library function, on the other hand, runs at the user level, possibly never requesting any services from the operating system. System calls are documented in section 2 of the reference manual (viewable online with the man command), and library functions are documented in section 3.

Our goal is to teach you the use of the Linux APIs by example: in particular, through the use, wherever possible, of both original Unix source code and the GNU utilities. Unfortunately, there aren't as many self-contained examples as we thought there'd be. Thus, we have written numerous small demonstration programs as well. We stress programming principles: especially those aspects of GNU programming, such as “no arbitrary limits,” that make the GNU utilities into exceptional programs.

The choice of everyday programs to study is deliberate. If you've been using GNU/Linux for any length of time, you already understand what programs such as ls and cp do; it then becomes easy to dive straight into how the programs work, without having to spend a lot of time learning what they do.

Occasionally, we present both higher-level and lower-level ways of doing things. Usually the higher-level standard interface is implemented in terms of the lower-level interface or construct. We hope that such views of what's “under the hood” will help you understand how things work; for all the code you write, you should always use the higher-level, standard interface.

Similarly, we sometimes introduce functions that provide certain functionality and then recommend (with a provided reason) that these functions be avoided! The primary reason for this approach is so that you'll be able to recognize these functions when you see them and thus understand the code using them. A well-rounded knowledge of a topic requires understanding not just what you can do, but what you should and should not do.

Finally, each chapter concludes with exercises. Some involve modifying or writing code. Others are more in the category of “thought experiments” or “why do you think...” We recommend that you do all of them—they will help cement your understanding of the material.

Small Is Beautiful: Unix Programs

 

Hoare’s law: “Inside every large program is a small program struggling to get out.”

 
 --C.A.R. Hoare

Initially, we planned to teach the Linux API by using the code from the GNU utilities. However, the modern versions of even simple command-line programs (like mv and cp) are large and many-featured. This is particularly true of the GNU variants of the standard utilities, which allow long and short options, do everything required by POSIX, and often have additional, seemingly unrelated options as well (like output highlighting).

It then becomes reasonable to ask, “Given such a large and confusing forest, how can we focus on the one or two important trees?” In other words, if we present the current full-featured program, will it be possible to see the underlying core operation of the program?

That is when Hoare's law[1] inspired us to look to the original Unix programs for example code. The original V7 Unix utilities are small and straightforward, making it easy to see what's going on and to understand how the system calls are used. (V7 was released around 1979; it is the common ancestor of all modern Unix systems, including GNU/Linux and the BSD systems.)

For many years, Unix source code was protected by copyrights and trade secret license agreements, making it difficult to use for study and impossible to publish. This is still true of all commercial Unix source code. However, in 2002, Caldera (currently operating as SCO) made the original Unix code (through V7 and 32V Unix) available under an Open Source style license (see Appendix B, “Caldera Ancient UNIX License,” page 655). This makes it possible for us to include the code from the early Unix system in this book.

Standards

Throughout the book we refer to several different formal standards. A standard is a document describing how something works. Formal standards exist for many things. For example, the shape, placement, and meaning of the holes in the electrical outlet in your wall are defined by a formal standard so that all the power cords in your country work in all the outlets.

So, too, formal standards for computing systems define how they are supposed to work; this enables developers and users to know what to expect from their software and enables them to complain to their vendor when software doesn't work.

Of interest to us here are:

  1. ISO/IEC International Standard 9899: Programming Languages—C, 1990. The first formal standard for the C programming language.

  2. ISO/IEC International Standard 9899: Programming Languages—C, Second edition, 1999. The second (and current) formal standard for the C programming language.

  3. ISO/IEC International Standard 14882: Programming Languages—C++, 1998. The first formal standard for the C++ programming language.

  4. ISO/IEC International Standard 14882: Programming Languages—C++, 2003. The second (and current) formal standard for the C++ programming language.

  5. IEEE Standard 1003.1–2001: Standard for Information Technology—Portable Operating System Interface (POSIX®). The current version of the POSIX standard; describes the behavior expected of Unix and Unix-like systems. This edition covers both the system call and library interface, as seen by the C/C++ programmer, and the shell and utilities interface, seen by the user. It consists of several volumes:

    • Base Definitions: The definitions of terms, facilities, and header files.

    • Base Definitions—Rationale: Explanations and rationale for the choice of facilities that both are and are not included in the standard.

    • System Interfaces: The system calls and library functions. POSIX terms them all “functions.”

    • Shell and Utilities: The shell language and utilities available for use with shell programs and interactively.

Although language standards aren't exciting reading, you may wish to consider purchasing a copy of the C standard: It provides the final definition of the language. Copies can be purchased from ANSI[2] and from ISO.[3] (The PDF version of the C standard is quite affordable.)

The POSIX standard can be ordered from The Open Group.[4] By working through their publications catalog to the items listed under “CAE Specifications,” you can find individual pages for each part of the standard (named “C031” through “C034”). Each one's page provides free access to the online HTML version of the particular volume.

The POSIX standard is intended for implementation on both Unix and Unix-like systems, as well as non-Unix systems. Thus, the base functionality it provides is a subset of what Unix systems have. However, the POSIX standard also defines optional extensions—additional functionality, for example, for threads or real-time support. Of most importance to us is the X/Open System Interface (XSI) extension, which describes facilities from historical Unix systems.

Throughout the book, we mark each API as to its availability: ISO C, POSIX, XSI, GLIBC only, or nonstandard but commonly available.

Features and Power: GNU Programs

Restricting ourselves to just the original Unix code would have made an interesting history book, but it would not have been very useful in the 21st century. Modern programs do not have the same constraints (memory, CPU power, disk space, and speed) that the early Unix systems did. Furthermore, they need to operate in a multilingual world—ASCII and American English aren't enough.

More importantly, one of the primary freedoms expressly promoted by the Free Software Foundation and the GNU Project[5] is the “freedom to study.” GNU programs are intended to provide a large corpus of well-written programs that journeyman programmers can use as a source from which to learn.

By using GNU programs, we want to meet both goals: show you well-written, modern code from which you will learn how to write good code and how to use the APIs well.

We believe that GNU software is better because it is free (in the sense of “freedom,” not “free beer”). But it's also recognized that GNU software is often technically better than the corresponding Unix counterparts, and we devote space in Section 1.4, “Why GNU Programs Are Better,” page 14, to explaining why.

A number of the GNU code examples come from gawk (GNU awk). The main reason is that it's a program with which we're very familiar, and therefore it was easy to pick examples from it. We don't otherwise make any special claims about it.

Summary of Chapters

Driving a car is a holistic process that involves multiple simultaneous tasks. In many ways, Linux programming is similar, requiring understanding of multiple aspects of the API, such as file I/O, file metadata, directories, storage of time information, and so on.

The first part of the book looks at enough of these individual items to enable studying the first significant program, the V7 ls. Then we complete the discussion of files and users by looking at file hierarchies and the way filesystems work and are used.

Chapter 1, “Introduction,” page 3,

  • describes the Unix and Linux file and process models, looks at the differences between Original C and 1990 Standard C, and provides an overview of the principles that make GNU programs generally better than standard Unix programs.

Chapter 2, “Arguments, Options, and the Environment,” page 23,

  • describes how a C program accesses and processes command-line arguments and options and explains how to work with the environment.

Chapter 3, “User-Level Memory Management,” page 51,

  • provides an overview of the different kinds of memory in use and available in a running process. User-level memory management is central to every nontrivial application, so it's important to understand it early on.

Chapter 4, “Files and File I/O,” page 83,

  • discusses basic file I/O, showing how to create and use files. This understanding is important for everything else that follows.

Chapter 5, “Directories and File Metadata,” page 117,

  • describes how directories, hard links, and symbolic links work. It then describes file metadata, such as owners, permissions, and so on, as well as covering how to work with directories.

Chapter 6, “General Library Interfaces — Part 1,” page 165,

  • looks at the first set of general programming interfaces that we need so that we can make effective use of a file's metadata.

Chapter 7, “Putting It All Together: ls,” page 207,

  • ties together everything seen so far by looking at the V7 ls program.

Chapter 8, “Filesystems and Directory Walks,” page 227,

  • describes how filesystems are mounted and unmounted and how a program can tell what is mounted on the system. It also describes how a program can easily “walk” an entire file hierarchy, taking appropriate action for each object it encounters.

The second part of the book deals with process creation and management, interprocess communication with pipes and signals, user and group IDs, and additional general programming interfaces. It then describes internationalization with GNU gettext, followed by several advanced APIs.

Chapter 9, “Process Management and Pipes,” page 283,

  • looks at process creation, program execution, IPC with pipes, and file descriptor management, including nonblocking I/O.

Chapter 10, “Signals,” page 347,

  • discusses signals, a simplistic form of interprocess communication. Signals also play an important role in a parent process's management of its children.

Chapter 11, “Permissions and User and Group ID Numbers,” page 403,

  • looks at how processes and files are identified, how permission checking works, and how the setuid and setgid mechanisms work.

Chapter 12, “General Library Interfaces — Part 2,” page 427,

  • looks at the rest of the general APIs; many of these are more specialized than the first general set of APIs.

Chapter 13, “Internationalization and Localization,” page 485,

  • explains how to enable your programs to work in multiple languages, with almost no pain.

Chapter 14, “Extended Interfaces,” page 529,

  • describes several extended versions of interfaces covered in previous chapters, as well as covering file locking in full detail.

We round the book off with a chapter on debugging, since (almost) no one gets things right the first time, and we suggest a final project to cement your knowledge of the APIs covered in this book.

Chapter 15, “Debugging,” page 567,

  • describes the basics of the GDB debugger, transmits as much of our programming experience in this area as possible, and looks at several useful tools for doing different kinds of debugging.

Chapter 16, “A Project that Ties Everything Together,” page 641,

  • presents a significant programming project that makes use of just about everything covered in the book.

Several appendices cover topics of interest, including the licenses for the source code used in this book.

Appendix A, “Teach Yourself Programming in Ten Years,” page 649,

  • invokes the famous saying, “Rome wasn't built in a day.” So too, Linux/Unix expertise and understanding only come with time and practice. To that end, we have included this essay by Peter Norvig, which we highly recommend.

Appendix B, “Caldera Ancient UNIX License,” page 655,

  • covers the Unix source code used in this book.

Appendix C, “GNU General Public License,” page 657,

  • covers the GNU source code used in this book.

Typographical Conventions

Like all books on computer-related topics, this one uses certain typographical conventions to convey information. Definitions or first uses of terms appear in italics, like the word “Definitions” at the beginning of this sentence. Italics are also used for emphasis, for citations of other works, and for commentary in examples. Variable items, such as arguments or filenames, appear like this. Occasionally, we use a bold font when a point needs to be made strongly.

Things that exist on a computer are in a constant-width font, such as filenames (foo.c) and command names (ls, grep). Short snippets that you type are additionally enclosed in single quotes: 'ls -l *.c'.

$ and > are the Bourne shell primary and secondary prompts and are used to display interactive examples. User input appears in a different font from regular computer output in examples. Examples look like this:

$ ls -1                Look at files. Option is digit 1, not letter l
foo
bar
baz

We prefer the Bourne shell and its variants (ksh93, Bash) over the C shell; thus, all our examples show only the Bourne shell. Be aware that quoting and line-continuation rules are different in the C shell; if you use it, you're on your own![6]

When referring to functions in programs, we append an empty pair of parentheses to the function's name: printf(), strcpy(). When referring to a manual page (accessible with the man command), we follow the standard Unix convention of writing the command or function name in italics and the section in parentheses after it, in regular type: awk(1), printf(3).

Where to Get Unix and GNU Source Code

You may wish to have copies of the programs we use in this book for your own experimentation and review. All the source code is available over the Internet, and your GNU/Linux distribution contains the source code for the GNU utilities.

Unix Code

Archives of various “ancient” versions of Unix are maintained by The UNIX Heritage Society (TUHS), http://www.tuhs.org.

Most usefully, you can browse the archive of old Unix source code on the Web. Start with http://minnie.tuhs.org/UnixTree/. All the example code in this book is from the Seventh Edition Research UNIX System, also known as “V7.”

The TUHS site is physically located in Australia, although there are mirrors of the archive around the world—see http://www.tuhs.org/archive_sites.html. This page also indicates that the archive is available for mirroring with rsync. (See http://rsync.samba.org/ if you don't have rsync: It's standard on GNU/Linux systems.)

You will need about 2–3 gigabytes of disk to copy the entire archive. To copy the archive, create an empty directory, and in it, run the following commands:

mkdir Applications 4BSD PDP-11 PDP-11/Trees VAX Other

rsync -avz minnie.tuhs.org::UA_Root .
rsync -avz minnie.tuhs.org::UA_Applications Applications
rsync -avz minnie.tuhs.org::UA_4BSD 4BSD
rsync -avz minnie.tuhs.org::UA_PDP11 PDP-11
rsync -avz minnie.tuhs.org::UA_PDP11_Trees PDP-11/Trees
rsync -avz minnie.tuhs.org::UA_VAX VAX
rsync -avz minnie.tuhs.org::UA_Other Other

You may wish to omit copying the Trees directory, which contains extractions of several versions of Unix, and occupies around 700 megabytes of disk.

You may also wish to consult the TUHS mailing list to see if anyone near you can provide copies of the archive on CD-ROM, to avoid transferring so much data over the Internet.

The folks at Southern Storm Software, Pty. Ltd., in Australia, have “modernized” a portion of the V7 user-level code so that it can be compiled and run on current systems, most notably GNU/Linux. This code can be downloaded from their web site.[7]

It's interesting to note that V7 code does not contain any copyright or permission notices in it. The authors wrote the code primarily for themselves and their research, leaving the permission issues to AT&T's corporate licensing department.

GNU Code

If you're using GNU/Linux, then your distribution will have come with source code, presumably in whatever packaging format it uses (Red Hat RPM files, Debian DEB files, Slackware .tar.gz files, etc.). Many of the examples in the book are from the GNU Coreutils, version 5.0. Find the appropriate CD-ROM for your GNU/Linux distribution, and use the appropriate tool to extract the code. Or follow the instructions in the next few paragraphs to retrieve the code.

If you prefer to retrieve the files yourself from the GNU ftp site, you will find them at ftp://ftp.gnu.org/gnu/coreutils/coreutils-5.0.tar.gz.

You can use the wget utility to retrieve the file:

$ wget ftp://ftp.gnu.org/gnu/coreutils/coreutils-5.0.tar.gz    Retrieve the distribution
... lots of output here as file is retrieved ...

Alternatively, you can use good old-fashioned ftp to retrieve the file:

$ ftp ftp.gnu.org                                     Connect to GNU ftp site
Connected to ftp.gnu.org (199.232.41.7).
220 GNU FTP server ready.
Name (ftp.gnu.org:arnold): anonymous                  Use anonymous ftp
331 Please specify the password.
Password:                                             Password does not echo on screen
230-If you have any problems with the GNU software or its downloading,
230-please refer your questions to <[email protected]>.
...                                                   Lots of verbiage deleted
230 Login successful. Have fun.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> cd /gnu/coreutils                                Change to Coreutils directory
250 Directory successfully changed.
ftp> bin
200 Switching to Binary mode.
ftp> hash                                             Print # signs as progress indicators
Hash mark printing on (1024 bytes/hash mark).
ftp> get coreutils-5.0.tar.gz                         Retrieve file
local: coreutils-5.0.tar.gz remote: coreutils-5.0.tar.gz
227 Entering Passive Mode (199, 232, 41, 7, 86, 107)
150 Opening BINARY mode data connection for coreutils-5.0.tar.gz (6020616 bytes)
#################################################################################
#################################################################################
...
226 File send OK.
6020616 bytes received in 2.03e+03 secs (2.9 Kbytes/sec)
ftp> quit                                             Log off
221 Goodbye.

Once you have the file, extract it as follows:

$ gzip -dc < coreutils-5.0.tar.gz | tar -xvpf -                Extract files
... lots of output here as files are extracted ...

Systems using GNU tar may use this incantation:

$ tar -xvpzf coreutils-5.0.tar.gz                              Extract files
... lots of output here as files are extracted ...

In compliance with the GNU General Public License, here is the Copyright information for all GNU programs quoted in this book. All the programs are “free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.” See Appendix C, “GNU General Public License,” page 657, for the text of the GNU General Public License.

Coreutils 5.0 File    Copyright dates

lib/safe-read.c       Copyright © 1993–1994, 1998, 2002
lib/safe-write.c      Copyright © 2002
lib/utime.c           Copyright © 1998, 2001–2002
lib/xreadlink.c       Copyright © 2001
src/du.c              Copyright © 1988–1991, 1995–2003
src/env.c             Copyright © 1986, 1991–2003
src/install.c         Copyright © 1989–1991, 1995–2002
src/link.c            Copyright © 2001–2002
src/ls.c              Copyright © 1985, 1988, 1990, 1991, 1995–2003
src/pathchk.c         Copyright © 1991–2003
src/sort.c            Copyright © 1988, 1991–2002
src/sys2.h            Copyright © 1997–2003
src/wc.c              Copyright © 1985, 1991, 1995–2002

Gawk 3.0.6 File       Copyright dates

eval.c                Copyright © 1986, 1988, 1989, 1991–2000

Gawk 3.1.3 File       Copyright dates

awk.h                 Copyright © 1986, 1988, 1989, 1991–2003
builtin.c             Copyright © 1986, 1988, 1989, 1991–2003
eval.c                Copyright © 1986, 1988, 1989, 1991–2003
io.c                  Copyright © 1986, 1988, 1989, 1991–2003
main.c                Copyright © 1986, 1988, 1989, 1991–2003
posix/gawkmisc.c      Copyright © 1986, 1988, 1989, 1991–1998, 2001–2003

Gawk 3.1.4 File       Copyright dates

builtin.c             Copyright © 1986, 1988, 1989, 1991–2004

GLIBC 2.3.2 File      Copyright dates

locale/locale.h       Copyright © 1991, 1992, 1995–2002
posix/unistd.h        Copyright © 1991–2003
time/sys/time.h       Copyright © 1991–1994, 1996–2003

Make 3.80 File        Copyright dates

read.c                Copyright © 1988–1997, 2002

Where to Get the Example Programs Used in This Book

The example programs used in this book can be found at http://authors.phptr.com/robbins.

About the Cover

 

“This is the weapon of a Jedi Knight ..., an elegant weapon for a more civilized age. For over a thousand generations the Jedi Knights were the guardians of peace and justice in the Old Republic. Before the dark times, before the Empire.”

 
 --Obi-Wan Kenobi

You may be wondering why we chose to put a light saber on the cover and to use it throughout the book's interior. What does it represent, and how does it relate to Linux programming?

In the hands of a Jedi Knight, a light saber is both a powerful weapon and a thing of beauty. Its use demonstrates the power, knowledge, control of the Force, and arduous training of the Jedi who wields it.

The elegance of the light saber mirrors the elegance of the original Unix API design. There, too, the studied, precise use of the APIs and the Software Tools and GNU design principles lead to today's powerful, flexible, capable GNU/Linux system. This system demonstrates the knowledge and understanding of the programmers who wrote all its components.

And, of course, light sabers are just way cool!

Acknowledgments

Writing a book is lots of work, and doing it well requires help from many people. Dr. Brian W. Kernighan, Dr. Doug McIlroy, Peter Memishian, and Peter van der Linden reviewed the initial book proposal. David J. Agans, Fred Fish, Don Marti, Jim Meyering, Peter Norvig, and Julian Seward provided reprint permission for various items quoted throughout the book. Thanks to Geoff Collyer, Ulrich Drepper, Yosef Gold, Dr. C.A.R. (Tony) Hoare, Dr. Manny Lehman, Jim Meyering, Dr. Dennis M. Ritchie, Julian Seward, Henry Spencer, and Dr. Wladyslaw M. Turski, who provided much useful general information. Thanks also to the other members of the GNITS gang: Karl Berry, Akim DeMaille, Ulrich Drepper, Greg McGary, Jim Meyering, François Pinard, and Tom Tromey, who all provided helpful feedback about good programming practice. Karl Berry, Alper Ersoy, and Dr. Nelson H.F. Beebe provided valuable technical help with the Texinfo and DocBook/XML toolchains.

Good technical reviewers not only make sure that an author gets his facts right, they also ensure that he thinks carefully about his presentation. Dr. Nelson H.F. Beebe, Geoff Collyer, Russ Cox, Ulrich Drepper, Randy Lechlitner, Dr. Brian W. Kernighan, Peter Memishian, Jim Meyering, Chet Ramey, and Louis Taber acted as technical reviewers for the entire book. Dr. Michael Brennan provided helpful comments on Chapter 15. Both the prose and many of the example programs benefited from their reviews. I hereby thank all of them. As most authors usually say here, “Any remaining errors are mine.”

I would especially like to thank Mark Taub of Pearson Education for initiating this project, for his enthusiasm for the series, and for his help and advice as the book moved through its various stages. Anthony Gemmellaro did a phenomenal job of realizing my concept for the cover, and Gail Cocker's interior design is beautiful. Faye Gemmellaro made the production process enjoyable, instead of a chore. Dmitry Kirsanov and Alina Kirsanova did the figures, page layout, and indexing; they were a pleasure to work with.

Finally, my deepest gratitude and love to my wife, Miriam, for her support and encouragement during the book's writing.

Arnold Robbins
Nof Ayalon
ISRAEL



[1] This famous statement was made at The International Workshop on Efficient Production of Large Programs in Jablonna, Poland, August 10–14, 1970.

[6] See the csh(1) and tcsh(1) manpages and the book Using csh & tcsh, by Paul DuBois, O'Reilly & Associates, Sebastopol, CA, USA, 1995. ISBN: 1-56592-132-1.
