People use Linux to mean different things. A technically accurate definition is this:
Linux is a freely distributable, Unix-like operating system kernel.
However, most people use Linux to mean an entire operating system based on the Linux kernel:
Linux is a freely distributable, Unix-like operating system that includes a kernel, system tools, applications, and a complete development environment.
In this book, we use the second definition, as you will be programming for the entire operating system, not just the kernel.
Linux (by the second definition) provides a good platform from which to port programs, because its recommended interfaces (the ones we discuss in this book) are supported by nearly every version of Unix available, as well as by most Unix clones. After you learn the contents of this book, you should be able to port your programs to nearly every Unix and Unix-like system, with little extra work.
On the other hand, after working with Linux, you may prefer to use only Linux and not bother porting.
Linux is not just another Unix. It is more than a good platform from which to port programs—it is also a good platform on which to build and run applications. It is widely used around the world, and has become a household name. It has helped to popularize the concept of Open Source or Free Software. A brief history lesson will help explain how, and why, this has happened.
This history is simplified and biased toward the most important elements in a Linux system. For longer, more even coverage, you can read an entire book, A Quarter Century of UNIX [Salus, 1994].
In the earliest days of computing, software was seen as little more than a feature of the hardware. It was the hardware that people were trying to sell, so companies gave away the software with their systems. Enhancements, new algorithms, and new ideas flowed freely among students, professors, and corporate researchers.
It did not take long for companies to recognize the value of software as intellectual property. They began enforcing copyrights on their software technologies and restricting distribution of their source code and binaries. The innovations that had been seen as public property became fiercely protected corporate assets, and the culture of computer software development changed.
Richard Stallman, at the Massachusetts Institute of Technology (MIT), did not want any part of a world in which software innovation was controlled by corporate ambitions. His answer to this development was to found the Free Software Foundation (FSF). The goal of the FSF is to encourage the development and use of freely redistributable software.
The use of the word free in this context has created great confusion, however. Richard Stallman meant free as in freedom, not free as in zero cost. He strongly believes that software and its associated documentation should be available with source code, with no restrictions placed on additional redistribution. More recently, others coined the term Open Source in an attempt to describe the same goals, without the confusion over the word free. The terms Open Source and Free Software are generally treated as synonyms.
To promote his ideal, Richard Stallman, with help from others, created the General Public License (GPL). This license has been so influential that GPL has entered the developers’ jargon lexicon as a verb; to apply the terms of the GPL to software you write is to GPL it.
The GPL has three major points:
1. Anyone who receives GPLed software has the right to obtain the source code to the software at no additional charge (beyond the cost of delivery).
2. Any software derived from GPLed software must retain the GPL as its license terms for redistribution.
3. Anyone in possession of GPLed software has the right to redistribute that software under terms that do not conflict with the GPL.
An important point to notice about these licensing terms is that they do not mention price (except that source is not allowed to be an extra-cost item). GPLed software may be sold to customers at any price. However, those customers then have the right to redistribute the software, including the source code, as they please. With the advent of the Internet, this right has the effect of keeping the price of GPLed software low—generally zero—while still allowing companies to sell GPLed software and services, such as support, designed to complement the software.
The part of the GPL that generates the most controversy is the second point: that software derived from GPLed software also must be GPLed. Although detractors refer to the GPL as a virus because of this clause, supporters insist that this clause is one of the GPL’s greatest strengths. It prevents companies from taking GPLed software, adding features, and turning the result into a proprietary package.
The major project the FSF sponsors is the GNU’s Not Unix (GNU) project, whose goal is to create a freely distributable Unix-like operating system. There was little high-quality freely distributable software available for the GNU project when it was started, so project contributors began by creating the applications and tools for the system rather than the operating system itself. As the GPL was also produced by the FSF, many of the key components of the GNU operating system are GPLed, but through the years the GNU project has adopted many other software packages, such as the X Window System, the TeX typesetting system, and the Perl language, that are freely redistributable under other licenses.
Several major packages, and a multitude of minor ones, have been produced as a result of the GNU project. Major ones include the Emacs editor, the GNU C library, the GNU Compiler Collection (gcc, which originally stood for GNU C Compiler before C++ was added), the bash shell, and gawk (GNU’s awk). Minor ones include the high-quality shell utilities and text-manipulation programs that users expect to find on a Unix system.
In 1991, Linus Torvalds, at that time a student at the University of Helsinki, started a project to teach himself about low-level Intel 80386 programming. At the time, he was running the Minix operating system, designed by Andrew Tanenbaum, so he initially kept his project compatible with the Minix system calls and on-disk file-system layout to make his work much easier. Although he released the first version of the Linux kernel to the Internet under a fairly restrictive license, he was soon convinced to change his license to the GPL.
The combination of the GPL and the early functionality of the Linux kernel convinced other developers to help develop the kernel. A C library implementation, derived from the then-dormant GNU C library project, was released, allowing developers to build native user applications. Native versions of gcc, Emacs, and bash quickly followed. In early 1992, a moderately skilled developer could install and boot Linux 0.95 on most Intel 80386 machines.
The Linux project was closely associated with the GNU project from the beginning. The GNU project’s source base became an extremely important resource for the Linux community from which to build a complete system. Although significant portions of Linux-based systems are derived from sources that include freely available Unix code from the University of California at Berkeley and the X Consortium, many important parts of a functional Linux system come directly from the GNU project.
As Linux matured, some individuals, and later, companies, focused on easing the installation and usability of Linux systems for new users by creating packages, called distributions, of the Linux kernel and a reasonably complete set of utilities that together constituted a full operating system.
In addition to the Linux kernel, a Linux distribution contains development libraries, compilers, interpreters, shells, applications, utilities, graphical operating environments, and configuration tools, along with many other components. When a Linux system is built, distribution developers collect the components from a variety of places to create a complete collection of all the software components that are necessary for a functional Linux system. Most distributions also contain custom components that ease the installation and maintenance of Linux systems.
Many Linux distributions are available. Each has its own advantages and disadvantages; however, they all share the common kernel and development libraries that distinguish Linux systems from other operating systems. This book is intended to help developers build programs for any Linux system. Because all Linux distributions use the same code to provide system services, program binaries and source code are highly compatible across distributions.
One project that has contributed to this compatibility is the Filesystem Hierarchy Standard (FHS), previously called the Linux Filesystem Standard (FSSTND), which specifies where many files should be kept and explains, in general terms, how the rest of the file system should be organized. More recently, a project called Linux Standard Base (LSB) has expanded beyond the file system layout, defining Application Program Interfaces (APIs) and Application Binary Interfaces (ABIs) intended to make it possible to compile an application once and deploy it on any system that complies with the LSB definition for that CPU architecture. These documents are available, with others, at http://freestandards.org/.
Although the major portions of Linux comprise code developed independently of traditional Unix source bases, the interfaces that Linux provides were influenced heavily by existing Unix systems.
In the early 1980s, Unix development split into two camps, one at the University of California at Berkeley, the other at AT&T’s Bell Laboratories. Each institution developed and maintained Unix operating systems that were derived from the original Unix implementation done by Bell Laboratories.
The Berkeley version of Unix became known as the Berkeley Software Distribution (BSD) and was popular in academia. The BSD system was the first to include TCP/IP networking, which contributed to its success and helped to convince Sun Microsystems to base Sun’s first operating system, SunOS, on BSD.
Bell Laboratories also worked on enhancing Unix, but, unfortunately, it did so in ways slightly different from those of the Berkeley group. The various releases from Bell Laboratories were denoted by the word System followed by a roman numeral. The final major release of Unix from Bell Laboratories was System V (or SysV); UNIX System V Release 4 (SVR4) provides the code base for most commercial Unix operating systems today. The standard document describing System V is the System V Interface Definition (SVID).
This forked development of Unix caused major differentiation in the system calls, system libraries, and basic commands of Unix systems. One of the best examples of this split is in the networking interfaces that each operating system provided to applications. BSD systems used an interface known as sockets to allow programs to talk to one another over a network. By contrast, System V provided the Transport Layer Interface (TLI), which is completely incompatible with sockets, and is officially defined in the X/Open Transport Interface (XTI). This divergent development greatly diminished the portability of programs across versions of Unix, increasing the cost and decreasing the availability of third-party products for all versions of Unix.
Another example of the incompatibilities among Unix systems is the ps command, which allows users to query the operating system’s process information. On BSD systems, ps aux gives a complete listing of all the processes running on a machine; on System V, that command is invalid, and ps -ef can be used instead. The output formats are as incompatible as the command-line arguments. (The Linux ps command attempts to recognize both styles.)
In an attempt to standardize all the aspects of Unix that had diverged because of the split development in this period (affectionately known as the Unix Wars), the Unix industry sponsored a set of standards that would define the interfaces Unix provides. The portion of the standards that deals with programming and system-tool interfaces was known as POSIX (technically, this is the IEEE Std 1003 series, composed of many separate standards and draft standards), and was issued by the Institute of Electrical and Electronics Engineers (IEEE).
The original POSIX series of standards, however, was insufficiently complete. For example, basic UNIX concepts such as processes were considered optional. A more complete standard went through several versions and names (such as the X/Open Portability Guide [XPG] series of standards) before being named the Single Unix Specification (SUS), released by The Open Group (the owner of the UNIX trademark). The SUS has gone through several revisions and now also has been adopted by the IEEE as the latest version of the POSIX standard, currently IEEE Std 1003.1-2004 [Open Group, 2002], and updated occasionally by corrigenda. IEEE Std 1003.1-2003 was also adopted as an ISO/IEC standard, ISO/IEC 9945-2003. You can read the latest version of the standard online at http://www.unix-systems.org/.
Older standards from which this newer unified standard was created include all the older IEEE Std 1003.1 (POSIX.1—the C programming interface), IEEE Std 1003.2 (POSIX.2—the shell interface), and all related POSIX standards, such as the real-time extensions specified as POSIX.4, later renamed POSIX.1b, and several draft standards.
Since “POSIX” is pronounceable and “POSIX” and “SUS” are now synonymous, we refer to the combined work as POSIX throughout this book.
“The best thing about standards is that there are so many to choose from.”[1] Linux developers had 20 years of Unix history to examine when they designed Linux, and, more important, they had high-quality standards to reference. Linux was designed primarily according to POSIX; where POSIX left off, Linux generally followed System V practice, except in networking, where both the system calls and networking utilities followed the far more popular BSD model. Now that the joint SUS/POSIX standard exists, further development is normally compatible with the newer POSIX standard, and past deviations have been largely corrected when possible.
The biggest difference between SVR4 and Linux, from a programming perspective, is that Linux does not provide quite as many duplicate programming interfaces. For example, even programmers who coded exclusively for SVR4 systems generally preferred Berkeley sockets to SysV TLI; Linux eschews the overhead of TLI and provides only sockets.
When there are insufficient available standards (formal, de jure, and informal, de facto) for an implementation, Linux sometimes has to provide its own extensions beyond POSIX. For example, the POSIX asynchronous I/O specification is widely judged as inadequate for many real applications, so Linux implements the POSIX standard by means of a wrapper around a more general, more useful implementation. Also, there is no general specification for a highly scalable I/O polling interface, so an entirely new interface called epoll was devised and added. We call out these nonstandard interfaces as such when we document them.