15. Tools: The Tactics of Development

Unix is user-friendly—it’s just choosy about who its friends are.

—Anonymous

15.1 A Developer-Friendly Operating System

Unix has a long-established reputation as a good environment to develop under. It’s well equipped with tools written by programmers for programmers. These automate away many of the grubby little tasks that would otherwise distract you from concentrating on the most important (and most enjoyable!) aspect of development—your design.

While all the tools you’ll need are there and individually well documented, they’re not knit together by an integrated development environment (IDE). Finding and assembling them into a kit that suits your needs has traditionally taken considerable effort.

If you’re used to a good IDE—the kind of GUI-driven combination of editor, configuration-manager, compiler, and debugger now common on Macintosh and Windows systems—the Unix approach may seem casual, murky, and primitive. But there’s actually method in it.

IDEs make a lot of sense for single-language programming in a tool-poor environment. If what you’re doing is confined to grinding out C or C++ code by hand and the yard, they’re quite appropriate. Under Unix, however, your languages and implementation options are a lot more varied. It’s common to use multiple code generators, custom configurators, and many other standard and custom tools.

IDEs do exist under Unix (there are several good open-source ones, including emulations of the major Macintosh and Windows IDEs). But it’s difficult to control an open-ended variety of programming tools with them, and they’re not much used. Unix encourages a more flexible style, one less exclusively centered on the edit/compile/debug loop.

In this chapter we introduce you to the tactics of development under Unix—building code, managing code configurations, profiling, debugging, and automating away a lot of the drudgery associated with these tasks so you can concentrate on the fun parts. As usual, the exposition focuses more on the architectural picture than the how-to details. When you want how-to details, most of the tools in this chapter are well described in Programming with GNU Software [Loukides-Oram].

Many of these tools automate things that you could do yourself by hand, albeit more slowly and with a higher error rate. The one-time cost of climbing the learning curve should be more than paid off by the ability to write programs more efficiently, and spend less attention on low-level details and more on design.

Unix programmers traditionally learn how to use these tools by osmosis from other programmers, and by exploration over a period of years. If you’re a novice, pay careful attention; we’re going to try to jump you over a big section of the Unix learning curve by showing you what is possible right at the outset. If you are an experienced Unix programmer in a hurry, you can skip this chapter—but maybe you shouldn’t. There might just be some bit of useful lore here that even you don’t know.

15.2 Choosing an Editor

The first and most basic tool of development is a text editor suitable for modifying and writing programs.

Literally dozens of text editors are available under Unix; writing one seems to be one of the standard finger exercises for budding open-source hackers. Most of these are ephemera, not suitable for extended use by anyone other than their authors. A few are emulations of non-Unix editors, useful as transition aids for programmers used to other operating systems. You can browse through a wide variety at SourceForge or ibiblio or any other major open-source archive.

For serious editing work, two editors completely dominate the Unix programming scene. Each is available in a couple of minor variant implementations, but has a standard version you can rely on finding on any modern Unix system. These two editors are vi and Emacs. We discussed them in Chapter 13 as part of our discussion of the right size of software.

As we noted in Chapter 13, these two editors express sharply contrasting design philosophies, but both are extremely popular and command great loyalty from identifiable core user populations. Surveys of Unix programmers consistently indicate about a 50/50 split between them, with all other editors barely registering.

In our earlier examinations of vi and Emacs, we were primarily concerned with their optional complexity and the surrounding design-philosophy issues. Many other things are worth knowing about these editors, both as a matter of practicality and of Unix cultural literacy.

15.2.1 Useful Things to Know about vi

The name of vi is an abbreviation for “visual editor” and is pronounced /vee eye/ (not /vie/ and definitely not /siks/!).

vi was not quite the earliest screen-oriented editor; that palm goes to the Rand editor, re, that ran on Version 6 Unix in the 1970s. But vi is the longest-lived screen-oriented editor built for Unix that is still in use, and is a hallowed part of Unix tradition.

The original vi was the version present in the earliest BSD software distributions beginning in 1976; it is now obsolete. Its replacement was ’new vi’ which shipped with 4.4BSD and is found on modern 4.4BSD variants such as BSD/OS, FreeBSD, and NetBSD systems. There are several variants with extended features, notably vim, vile, elvis, and xvi; of these vim is probably the most popular and is found on many Linux systems. All the variants are rather similar and share a core command set unchanged from the original vi.

Ports of vi are available for the Windows operating systems and MacOS.

Most introductory Unix books include a chapter describing basic vi usage. One place a vi FAQ is available is the Editor FAQ/vi <http://www.faqs.org/faqs/editor-faq/vi/>; you can find many other copies with a WWW keyword search for page titles including “vi” and “FAQ”.

15.2.2 Useful Things to Know about Emacs

Emacs stands for ’EDiting MACroS’ (pronounce it /ee´·maks/). It was originally written in the late 1970s as a set of macros in an editor called TECO, then reimplemented several times in different ways. In an amusing twist, modern Emacs implementations include a TECO emulation mode.

In our earlier discussion of editors and optional complexity, we noted that many people consider Emacs excessively heavyweight. However, investing the time to learn it can yield rich rewards in productivity. Emacs supports many powerful editing modes that offer help with the syntax of various programming languages and markups. We’ll see later in this chapter how Emacs can be used in combination with other development tools to give capabilities comparable to (and in many ways surpassing) those of conventional IDEs.

The standard Emacs, universally available on modern Unixes, is GNU Emacs; this is what generally runs if you type emacs to a Unix shell prompt. GNU Emacs sources and documentation are available at the Free Software Foundation archive site <ftp://gnu.org/pub/gnu>.

The only major variant is called XEmacs; it has a better X interface but otherwise quite similar capabilities (it forked from Emacs 19). XEmacs has a home page <http://www.xemacs.org>. Emacs (and Emacs Lisp) is universally available under modern Unixes. It has been ported to MS-DOS (where it works poorly) and Windows 95 and NT (where it is said to work reasonably well).

Emacs includes its own interactive tutorial and very complete on-line documentation; you’ll find instructions on how to invoke both on the default Emacs startup screen. A good introduction on paper is Learning GNU Emacs [Cameron].

The keystroke commands used in the Unix ports of Netscape/Mozilla and Internet Explorer text windows (in forms and the mailer) are copied from the stock Emacs bindings for basic text editing. These bindings are the closest thing to a cross-platform standard for editor keystrokes.

15.2.3 The Antireligious Choice: Using Both

Many people who regularly use both vi and Emacs tend to use them for different things, and find it valuable to know both.

In general, vi is best for small jobs—quick replies to mail, simple tweaks to system configuration, and the like. It is especially useful when you’re using a new system (or a remote one over a network) and don’t have your Emacs customization files handy.

Emacs comes into its own for extended editing sessions in which you have to handle complex tasks, modify multiple files, and use results from other programs during the session. For programmers using X on their console (which is typical on modern Unixes), it’s normal to start up Emacs shortly after login time in a large window and leave it running forever, possibly visiting dozens of files and even running programs in multiple Emacs subwindows.

15.3 Special-Purpose Code Generators

Unix has a long-standing tradition of hosting tools that are specifically designed to generate code for various special purposes. The venerable monuments of this tradition, which go back to Version 7 and earlier days, and were actually used to write the original Portable C Compiler back in the 1970s, are lex(1) and yacc(1). Their modern, upward-compatible successors are flex(1) and bison(1), part of the GNU toolkit and still heavily used today. These programs have set an example that is carried forward in projects like GNOME’s Glade interface builder.

15.3.1 yacc and lex

yacc and lex are tools for generating language parsers. We observed in Chapter 8 that your first minilanguage is all too likely to be an accident rather than a design. That accident is likely to have a hand-coded parser that costs you far too much maintenance and debugging time—especially if you have not realized it is a parser, and have thus failed to properly separate it from the remainder of your application code. Parser generators are tools for doing better than an accidental, ad-hoc implementation; they don’t just let you express your grammar specification at a higher level, they also wall off all the parser’s implementation complexity from the rest of your code.

If you reach a point where you are planning to implement a minilanguage from scratch, rather than by extending or embedding an existing scripting language or parsing XML, yacc and lex will probably be your most important tools after your C compiler.

lex and yacc each generate code for a single function—respectively, “get a token from the input stream” and “parse a sequence of tokens to see if it matches a grammar”. Usually, the yacc-generated parser function calls a Lex-generated tokenizer function each time it wants to get another token. If there are no user-written C callbacks at all in the yacc-generated parser, all it will do is a syntax check; the value returned will tell the caller if the input matched the grammar it was expecting.
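To make the calling convention concrete, here is a minimal driver sketch in C. It is not code from any particular project; it simply assumes a grammar and token rules have already been run through yacc and lex, so that the traditionally named yyparse() and yylex() functions are linked in.

/* Minimal yacc/lex driver -- an illustrative sketch, not production code.
 * Assumes y.tab.c (from yacc) and lex.yy.c (from lex) are compiled and
 * linked alongside this file. */
#include <stdio.h>

int yyparse(void);      /* generated by yacc; calls yylex() for each token */
int yylex(void);        /* generated by lex; returns the next token code   */

void yyerror(const char *msg)   /* error hook the generated parser expects */
{
    fprintf(stderr, "parse error: %s\n", msg);
}

int main(void)
{
    /* With no user actions in the grammar this is a pure syntax check:
     * yyparse() returns 0 if the input matched the grammar. */
    return yyparse();
}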

More usually, the user’s C code, embedded in the generated parser, populates some runtime data structures as a side-effect of parsing the input. If the minilanguage is declarative, your application can use these runtime data structures directly. If your design was an imperative minilanguage, the data structures might include a parse tree which is immediately fed to some kind of evaluation function.

yacc has a rather ugly interface, through exported global variables with the name prefix yy_. This is because it predates structs in C; in fact, yacc predates C itself; the first implementation was written in C’s predecessor B. The crude though effective algorithm yacc-generated parsers use to try to recover from parse errors (pop tokens until an explicit error production is matched) can also lead to problems, including memory leaks.

If you are building parse trees, using malloc to make nodes, and you start popping things off the stack in error recovery, you don’t get to recover (free) the storage. In general, Yacc can’t do it, since it doesn’t know enough about what’s on the stack. If the yacc parser were in C++, it could assume that the values were classes and “destruct” them. In “real” compilers, parse tree nodes are generated using an arena-based allocator, so the nodes don’t leak, but there is a logical leak anyway that needs to be thought about to make industrial-strength error recovery.

—Steve Johnson

lex is a lexical analyzer generator. It’s a member of the same functional family as grep(1) and awk(1), but more powerful because it enables you to arrange for arbitrary C code to be executed on each match. It accepts a declarative minilanguage and emits skeleton C code.

A crude but useful way to think about what a lex-generated tokenizer does is as a sort of inverse grep(1). Where grep(1) takes a single regular expression and returns a list of matches in the incoming data stream, each call to a lex-generated tokenizer takes a list of regular expressions and indicates which expression occurs next in the datastream.

Splitting input analysis into tokenizing input and parsing the token stream is a useful tactic even if you’re not using Yacc and Lex and your “tokens” are nothing like the usual ones in a compiler. More than once I’ve found that splitting input handling into two levels made the code much simpler and easier to understand, despite the complexity added by the split itself.

—Henry Spencer

lex was written to automate the task of generating lexical analyzers (tokenizers) for compilers. It turned out to have a surprisingly wide range of uses for other kinds of pattern recognition, and has since been described as “the Swiss-army knife of Unix programming”.1

1 The common latter-day description of Perl as a “Swiss-army chainsaw” is derivative.

If you are attacking any kind of pattern-recognition or state-machine problem in which all the possible input stimuli will fit in a byte, lex may enable you to generate code that will be more efficient and reliable than a hand-crafted state machine.

John Jarvis at Holmdel [an AT&T laboratory] used lex to find faults in circuit boards, by scanning the board, using a chain-encoding technique to represent the edges of areas on the board, and then using Lex to define patterns that would catch common fabrication errors.

—Mike Lesk

Most importantly, the lex specification minilanguage is much higher-level and more compact than equivalent handcrafted C. Modules are available to use flex, the open-source version, with Perl (find them with a Web search for “lex perl”), and a work-alike implementation is part of PLY in Python.

lex generates tokenizers that are up to an order of magnitude slower than hand-coded ones. This is not a good reason to hand-code, however; it’s an argument for prototyping with lex and hand-hacking only if prototyping reveals an actual bottleneck.

yacc is a parser generator. It, too, was written to automate part of the job of writing compilers. It takes as input a grammar specification in a declarative minilanguage resembling BNF (Backus-Naur Form) with C code associated with each element of the grammar. It generates code for a parser function that, when called, accepts text matching the grammar from an input stream. As each grammar element is recognized, the parser function runs the associated C code.

The combination of lex and yacc is very effective for writing language interpreters of all kinds. Though most Unix programmers never get to do the kind of general-purpose compiler-building that these tools were meant to assist, they’re extremely useful for writing parsers for run-control file syntaxes and domain-specific minilanguages.

lex-generated tokenizers are very fast at recognizing low-level patterns in input streams, but the regular-expression minilanguage that lex knows is not good at counting things, or recognizing recursively nested structures. For parsing those, you want yacc. On the other hand, while you theoretically could write a yacc grammar to do its own token-gathering, the grammar to specify that would be hugely bloated and the parser extremely slow. For tokenizing input, you want lex. Thus, these tools are symbiotic.

If you can implement your parser in a higher-level language than C (which we recommend you do; see Chapter 14 for discussion), then look for equivalent facilities like Python’s PLY (which covers both lex and yacc)2 or Perl’s PY and Parse::Yapp modules, or Java’s CUP,3 Jack,4 or Yacc/M5 packages.

2 PLY is downloadable <http://systems.cs.uchicago.edu/ply/>.

3 CUP is downloadable <http://www.cs.princeton.edu/~appel/modern/java/CUP/>.

4 Jack is downloadable <http://www.javaworld.com/javaworld/jw-12-1996/jw-12-jack.html>.

5 Yacc/M is downloadable <http://david.tribble.com/yaccm.html>.

As with macro processors, one of the problems with code generators and preprocessors is that compile-time errors in the generated code may carry line numbers that are relative to the generated code (which you don’t want to edit) rather than the generator input (which is where you need to make corrections). yacc and lex address this by generating the same #line constructs that the C preprocessor does; these set the current line number for error reporting so the numbers will come out right. Any program that generates C or C++ should do likewise.

More generally, well-designed procedural-code generators should never require the user to hand-alter or even look at the generated parts. Getting those right is the code generator’s job.

15.3.1.1 Case Study: The fetchmailrc Grammar

The canonical demonstration example that seems to have appeared in every lex and yacc tutorial ever written is a toy interactive calculator program that parses and evaluates arithmetic expressions entered by the user. We will spare you yet another repetition of this cliche; if you are interested, consult the source code of the bc(1) and dc(1) calculator implementations from the GNU project, or the paradigm example ’hoc’6 from [Kernighan-Pike84].

6 <http://cm.bell-labs.com/cm/cs/upe/>

Instead, the grammar of fetchmail’s run-control-file parser provides a good medium-sized case study in lex and yacc usage. There are a couple of points of interest here.

The lex specification, in rcfile_l.l, is a very typical implementation of a shell-like syntax. Note how two complementary rules support either single or double-quoted strings; this is a good idea in general. The rules for accepting (possibly signed) integer literals and discarding comments are also pretty generic.
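If you have never seen a lex specification, the following fragment sketches what such rules typically look like. It is a generic illustration, not an excerpt from fetchmail’s rcfile_l.l; the token names STRING and NUMBER, and the yylval fields they fill in, are assumed to be declared in the companion yacc grammar.

%{
#include <stdlib.h>
#include <string.h>
#include "y.tab.h"   /* token codes and the yylval union from the yacc grammar */
%}

%%

\"[^\"\n]*\"    { /* double-quoted string literal */
                  yylval.sval = strdup(yytext); return STRING; }
'[^'\n]*'       { /* single-quoted string literal */
                  yylval.sval = strdup(yytext); return STRING; }
-?[0-9]+        { /* (possibly signed) integer literal */
                  yylval.number = atoi(yytext); return NUMBER; }
#.*             { /* discard comments to end of line */ }
[ \t\n]+        { /* skip whitespace */ }

%%

A real tokenizer would also strip the surrounding quote characters before handing the string value to the parser.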

The yacc specification, in rcfile_y.y, is long but straightforward. It does not perform any fetchmail actions, just sets bits in a list of internal control blocks. After startup, fetchmail’s normal mode of operation is just to repeatedly walk that list, using each record to drive a retrieval session with a remote site.

15.3.2 Case Study: Glade

We looked at Glade in Chapter 8 as a good example of a declarative minilanguage. We also noted that its back end produces a result by generating code in any one of several languages.

Glade is a good modern example of an application-code generator. What makes it Unixy in spirit are the following features, which most GUI builders (especially most proprietary GUI builders) don’t have:

• Rather than being glued together as one monster monolith, the Glade GUI and Glade code generator obey the Rule of Separation (following the “separated engine and interface” design pattern).

The GUI and code generator are connected by an (XML-based) textual data file format that can be read and modified by other tools.

• Multiple target languages (as opposed to just C or C++) are supported. More could easily be added.

The design implies that it should also be possible to replace the Glade GUI editor component, should that ever become desirable.

15.4 make: Automating Your Recipes

Program sources by themselves don’t make an application. The way you put them together and package them for distribution matters, too. Unix provides a tool for semi-automating these processes: make(1). Make is covered in most introductory Unix books. For a really thorough reference, you can consult Managing Projects with Make [Oram-Talbot]. If you’re using GNU make (the most advanced make, and the one normally shipped with open-source Unixes), the treatment in Programming with GNU Software [Loukides-Oram] may be better in some respects. Most Unixes that carry GNU make will also support GNU Emacs; if yours does you will probably find a complete make manual on-line through Emacs’s info documentation system.

Ports of GNU make to DOS and Windows are available from the FSF.

15.4.1 Basic Theory of make

If you’re developing in C or C++, an important part of the recipe for building your application will be the collection of compilation and linkage commands needed to get from your sources to working binaries. Entering these commands is a lot of tedious detail work, and most modern development environments include a way to put them in command files or databases that can automatically be re-executed to build your application.

Unix’s make(1) program, the original of all these facilities, was designed specifically to help C programmers manage these recipes. It lets you write down the dependencies between files in a project in one or more ’makefiles’. Each makefile consists of a series of productions; each one tells make that some given target file depends on some set of source files, and says what to do if any of the sources are newer than the target. You don’t actually have to write down all dependencies, as the make program can deduce a lot of the obvious ones from filenames and extensions.

For example: You might put in a makefile that the binary myprog depends on three object files myprog.o, helper.o, and stuff.o. If you have source files myprog.c, helper.c, and stuff.c, make will know without being told that each .o file depends on the corresponding .c file, and supply its own standard recipe for building a .o file from a .c file.
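A complete makefile for that example could be as short as the following sketch (the shared header common.h is invented for illustration; note that the command line under a target must be indented with a tab character):

# Objects that make up the myprog binary.
OBJS = myprog.o helper.o stuff.o

# Link rule; make already knows how to build each .o from its .c file.
myprog: $(OBJS)
	$(CC) -o myprog $(OBJS)

# Extra dependency: all three modules include common.h, so changing
# that header forces the .o files to be rebuilt.
$(OBJS): common.h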

Make originated with a visit from Steve Johnson (author of yacc, etc.), storming into my office, cursing the Fates that had caused him to waste a morning debugging a correct program (bug had been fixed, file hadn’t been compiled, cc *.o was therefore unaffected). As I had spent a part of the previous evening coping with the same disaster on a project I was working on, the idea of a tool to solve it came up. It began with an elaborate idea of a dependency analyzer, boiled down to something much simpler, and turned into Make that weekend. Use of tools that were still wet was part of the culture. Makefiles were text files, not magically encoded binaries, because that was the Unix ethos: printable, debuggable, understandable stuff.

—Stuart Feldman

When you run make in a project directory, the make program looks at all productions and timestamps and does the minimum amount of work necessary to make sure derived files are up to date.

You can read a good example of a moderately complex makefile in the sources for fetchmail. In the subsections below we’ll refer to it again.

Very complex makefiles, especially when they call subsidiary makefiles, can become a source of complications rather than simplifying the build process. A now-classic warning is issued in Recursive Make Considered Harmful.7 The argument in this paper has become widely accepted since it was written in 1997, and has come near to reversing previous community practice.

7 Available on the Web <http://www.tip.net.au/~millerp/rmch/recu-make-cons-harm.html>.

No discussion of make(1) would be complete without an acknowledgement that it includes one of the worst design botches in the history of Unix. The use of tab characters as a required leader for command lines associated with a production means that the interpretation of a makefile can change drastically on the basis of invisible differences in whitespace.

Why the tab in column 1? Yacc was new, Lex was brand new. I hadn’t tried either, so I figured this would be a good excuse to learn. After getting myself snarled up with my first stab at Lex, I just did something simple with the pattern newline-tab. It worked, it stayed. And then a few weeks later I had a user population of about a dozen, most of them friends, and I didn’t want to screw up my embedded base. The rest, sadly, is history.

—Stuart Feldman

15.4.2 make in Non-C/C++ Development

make is not just useful for C/C++ recipes, however. Scripting languages like those we described in Chapter 14 may not require conventional compilation and link steps, but there are often other kinds of dependencies that make(1) can help you with.

Suppose, for example, that you actually generate part of your code from a specification file, using one of the techniques from Chapter 9. You can use make to tie the spec file and the generated source together. This will ensure that whenever you change the spec and remake, the generated code will automatically be rebuilt.
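Such a production is only a couple of lines. In the following sketch the specification file, the generator script, and the generated source are all hypothetical names:

# Regenerate protocol.c whenever the specification or the generator changes.
protocol.c: protocol.spec generate-protocol.sh
	sh generate-protocol.sh protocol.spec > protocol.c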

It’s quite common to use makefile productions to express recipes for making documentation as well as code. You’ll often see this approach used to automatically generate PostScript or other derived documentation from masters written in some markup language (like HTML or one of the Unix document-macro languages we’ll survey in Chapter 18). In fact, this sort of use is so common that it’s worth illustrating with a case study.

15.4.2.1 Case Study: make for Document-File Translation

In the fetchmail makefile, for example, you’ll see three productions that relate files named FAQ, FEATURES, and NOTES to HTML sources fetchmail-FAQ.html, fetchmail-features.html, and design-notes.html.

The HTML files are meant to be accessible on the fetchmail Web page, but all the HTML markup makes them uncomfortable to look at unless you’re using a browser. So the FAQ, FEATURES, and NOTES are flat-text files meant to be flipped through quickly with an editor or pager program by someone reading the fetchmail sources themselves (or, perhaps, distributed to FTP sites that don’t support Web access).

The flat-text forms can be made from their HTML masters by using the common open-source program lynx(1). lynx is a Web browser for text-only displays; but when invoked with the -dump option it functions reasonably well as an HTML-to-ASCII formatter.
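Each of the three productions is essentially a one-liner. A sketch of the FAQ rule, modeled on (but not copied verbatim from) the fetchmail makefile, looks like this:

# Rebuild the flat-text FAQ whenever its HTML master changes.
FAQ: fetchmail-FAQ.html
	lynx -dump fetchmail-FAQ.html > FAQ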

With the productions in place, the developer can edit the HTML masters without having to remember to manually rebuild the flat-text forms afterwards, secure in the knowledge that FAQ, FEATURES, and NOTES will be properly rebuilt whenever they are needed.

15.4.3 Utility Productions

Some of the most heavily used productions in typical makefiles don’t express file dependencies at all. They’re ways to bundle up little procedures that a developer wants to mechanize, like making a distribution package or removing all object files in order to do a build from scratch.

Nonfile productions were intentional and in there from day one. ’Make all’ and ’clean’ were my own conventions from earliest days. One of the older Unix jokes is “Make love” which results in “Don’t know how to make love”.

—Stuart Feldman

There is a well-developed set of conventions about what utility productions should be present and how they should be named. Following these will make your makefile much easier to understand and use.

all

Your all production should make every executable of your project. Usually the all production doesn’t have an explicit rule; instead it refers to all of your project’s top-level targets (and, not accidentally, documents what those are). Conventionally, this should be the first production in your makefile, so it will be the one executed when the developer types make with no argument.

test

Run the program’s automated test suite, typically consisting of a set of unit tests8 to find regressions, bugs, or other deviations from expected behavior during the development process. The ’test’ production can also be used by end-users of the software to ensure that their installation is functioning correctly.

8 A unit test is test code attached to a module to verify correct performance. Use of the term ’unit test’ suggests that the test is written concurrently with the code by the developer of the code, and implies a discipline in which module releases aren’t considered complete until they have attached test code. The term and the concept originated in the “Extreme Programming” methodology popularized by Kent Beck, but have gained wide acceptance among Unix programmers since about 2001.

clean

Remove all files (such as binary executables and object files) that are normally created when you make all. A make clean should reset the process of building the software to a good initial state.

dist

Make a source archive (usually with the tar(1) program) that can be shipped as a unit and used to rebuild the program on another machine. This target should do the equivalent of depending on all so that a make dist automatically rebuilds the whole project before making the distribution archive—this is a good way to avoid last-minute embarrassments, like not shipping derived files that are actually needed (like the flat-text README in fetchmail, which is actually generated from an HTML source).

distclean

Throw away everything but what you would include if you were bundling up the source with make dist. This may be the same as make clean but should be included as a production of its own anyway, to document what’s going on. When it’s different, it usually differs by throwing away local configuration files that aren’t part of the normal make all build sequence (such as those generated by autoconf(1); we’ll talk about autoconf(1) in Chapter 17).

realclean

Throw away everything you can rebuild using the makefile. This may be the same as make distclean, but should be included as a production of its own anyway, to document what’s going on. When it’s different, it usually differs by throwing away files that are derived but (for whatever reason) shipped with the project sources anyway.

install

Install the project’s executables and documentation in system directories so they will be accessible to general users (this typically requires root privileges). Initialize or update any databases or libraries that the executables require in order to function.

uninstall

Remove files installed in system directories by make install (this typically requires root privileges). This should completely and perfectly reverse a make install. The presence of an uninstall production implies a kind of humility that experienced Unix hands look for as a sign of thoughtful design; conversely, not having an uninstall production is at best careless, and (when, for example, an installation creates large database files) can be quite rude and thoughtless.
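Pulled together, a skeleton makefile following these conventions might look like the sketch below; every program, script, and path name in it is purely illustrative.

.PHONY: all test clean dist distclean realclean install uninstall

all: myprog

myprog: myprog.o helper.o
	$(CC) -o myprog myprog.o helper.o

test: myprog
	sh ./runtests.sh

clean:
	rm -f myprog *.o

dist: all
	tar -czf myprog-1.0.tar.gz Makefile *.c *.h README

distclean: clean
	rm -f config.status config.cache

realclean: distclean
	rm -f parse.c   # derived, but shipped in the source archive anyway

install: all
	cp myprog /usr/local/bin/myprog

uninstall:
	rm -f /usr/local/bin/myprog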

Working examples of all the standard targets are available for inspection in the fetchmail makefile. By studying all of them together you will see a pattern emerge, and (not incidentally) learn much about the fetchmail package’s structure. One of the benefits of using these standard productions is that they form an implicit roadmap of their project.

But you need not limit yourself to these utility productions. Once you master make, you’ll find yourself more and more often using the makefile machinery to automate little tasks that depend on your project file state. Your makefile is a convenient central place to put these; using it makes them readily available for inspection and avoids cluttering up your workspace with trivial little scripts.

15.4.4 Generating Makefiles

One of the subtle advantages of Unix make over the dependency databases built into many IDEs is that makefiles are simple text files—files that can be generated by programs.

In the mid-1980s it was fairly common for large Unix program distributions to include elaborate custom shellscripts that would probe their environment and use the information they gathered to construct custom makefiles. These custom configurators reached absurd sizes. I wrote one once that was 3000 lines of shell, about twice as large as any single module in the program it was configuring—and this was not unusual.

The community eventually said “Enough!” and various people set out to write tools that would automate away part or all of the process of maintaining makefiles. These tools generally tried to address two issues:

One issue is portability. Makefile generators are commonly built to run on many different hardware platforms and Unix variants. They generally try to deduce things about the local system (including everything from machine word size up to which tools, languages, service libraries, and even document formatters it has available). They then try to use those deductions to write makefiles that exploit the local system’s facilities and compensate for its quirks.

The other issue is dependency derivation. It’s possible to deduce a great deal about the dependencies of a collection of C sources by analyzing the sources themselves (especially by looking at what include files they use and share). Many makefile generators do this in order to mechanically generate make dependencies.

Each different makefile generator tackles these objectives in a slightly different way. Probably a dozen or more generators have been attempted, but most proved inadequate or too difficult to drive or both, and only a few are still in live use. We’ll survey the major ones here. All are available as open-source software on the Internet.

15.4.4.1 makedepend

Several small tools have tackled the rule automation part of the problem exclusively. This one, distributed along with the X windowing system from MIT, is the fastest and most useful and comes preinstalled under all modern Unixes, including all Linuxes.

makedepend takes a collection of C sources and generates dependencies for the corresponding .o files from their #include directives. These can be appended directly to a makefile, and in fact makedepend is defined to do exactly that.
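A typical makefile hook for it looks like the sketch below (the flags are whatever your compile rules already use); running make depend then appends the generated dependency lines to the makefile itself, below a marker comment that makedepend maintains.

# Append #include-derived dependencies for all C sources to this makefile.
depend:
	makedepend -- $(CFLAGS) -- *.c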

makedepend is useless for anything but C projects. It doesn’t try to solve more than one piece of the makefile-generation problem. But what it does it does quite well.

makedepend is sufficiently documented by its manual page. If you type man makedepend at a terminal window you will quickly learn what you need to know about invoking it.

15.4.4.2 Imake

Imake was written in an attempt to mechanize makefile generation for the X window system. It builds on makedepend to tackle both the dependency-derivation and portability problems.

The Imake system effectively replaces conventional makefiles with Imakefiles. These are written in a more compact and powerful notation which is (effectively) compiled into makefiles. The compilation uses a rules file which is system-specific and includes a lot of information about the local environment.

Imake is well suited to X’s particular portability and configuration challenges and universally used in projects that are part of the X distribution. However, it has not achieved much popularity outside the X developer community. It’s hard to learn, hard to use, hard to extend, and produces generated makefiles of mind-numbing size and complexity.

The Imake tools will be available on any Unix that supports X, including Linux. There has been one heroic effort [DuBois] to make the mysteries of Imake comprehensible to non-X-programming mortals. These are worth learning if you are going to do X programming.

15.4.4.3 autoconf

autoconf was written by people who had seen and rejected the Imake approach. It generates per-project configure shellscripts that are like the old-fashioned custom script configurators. These configure scripts can generate makefiles (among other things).

Autoconf is focused on portability and does no built-in dependency derivation at all. Although it is probably as complex as Imake, it is much more flexible and easier to extend. Rather than relying on a per-system database of rules, it generates configure shell code that goes out and searches your system for things.

Each configure shellscript is built from a per-project template that you have to write, called configure.in. Once generated, though, the configure script will be self-contained and can configure your project on systems that don’t carry autoconf(1) itself.

The autoconf approach to makefile generation is like imake’s in that you start by writing a makefile template for your project. But autoconf’s Makefile.in files are basically just makefiles with placeholders in them for simple text substitution; there’s no second notation to learn. If you want dependency derivation, you must take explicit steps to call makedepend(1) or some similar tool—or use automake(1).
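To make the substitution idea concrete, here is a fragment of a hypothetical Makefile.in. The configure script replaces each @...@ placeholder with a value discovered on (or chosen for) the local system and writes the result out as Makefile.

# Makefile.in fragment (illustrative).
CC      = @CC@
CFLAGS  = @CFLAGS@
prefix  = @prefix@

myprog: myprog.o helper.o
	$(CC) $(CFLAGS) -o myprog myprog.o helper.o

install: myprog
	cp myprog $(prefix)/bin/myprog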

autoconf is documented by an on-line manual in the GNU info format. The source scripts of autoconf are available from the FSF archive site, but are also preinstalled on many Unix and Linux versions. You should be able to browse this manual through your Emacs’s help system.

Despite its lack of direct support for dependency derivation, and despite its generally ad-hoc approach, in mid-2003 autoconf is clearly the most popular of the makefile generators, and has been for some years. It has eclipsed Imake and driven at least one major competitor (metaconfig) out of use.

A reference, GNU Autoconf, Automake and Libtool is available [Vaughan]. We’ll have more to say about autoconf, from a slightly different angle, in Chapter 17.

15.4.4.4 automake

automake is an attempt to add Imake-like dependency derivation as a layer on top of autoconf(1). You write Makefile.am templates in a broadly Imake-like notation; automake(1) compiles them to Makefile.in files, which autoconf’s configure scripts then operate on.
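A Makefile.am is typically far shorter than the Makefile.in automake expands it into. A minimal sketch (with an invented program name) looks like this:

# Makefile.am fragment (illustrative).  automake expands this into a
# Makefile.in with dependency tracking and the standard utility targets.
bin_PROGRAMS = myprog
myprog_SOURCES = myprog.c helper.c stuff.c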

automake is still relatively new technology in mid-2003. It is used in several FSF projects but has not yet been widely adopted elsewhere. While its general approach looks promising, it is as yet rather brittle—it works when used in stereotyped ways but tends to break badly if you try to do anything unusual with it.

Complete on-line documentation is shipped with automake, which can be downloaded from the FSF archive site.

15.5 Version-Control Systems

Code evolves. As a project moves from first-cut prototype to deliverable, it goes through multiple cycles in which you explore new ground, debug, and then stabilize what you’ve accomplished. And this evolution doesn’t stop when you first deliver for production. Most projects will need to be maintained and enhanced past the 1.0 stage, and will be released multiple times. Tracking all that detail is just the sort of thing computers are good at and humans are not.

15.5.1 Why Version Control?

Code evolution raises several practical problems that can be major sources of friction and drudgery—thus a serious drain on productivity. Every moment spent on these problems is a moment not spent on getting the design and function of your project right.

Perhaps the most important problem is reversion. If you make a change, and discover it’s not viable, how can you revert to a code version that is known good? If reversion is difficult or unreliable, it’s hard to risk making changes at all (you could trash the whole project, or make many hours of painful work for yourself).

Almost as important is change tracking. You know your code has changed; do you know why? It’s easy to forget the reasons for changes and step on them later. If you have collaborators on a project, how do you know what they have changed while you weren’t looking, and who was responsible for each change?

Amazingly often, it is useful to ask what you have changed since the last known-good version, even if you have no collaborators. This often uncovers unwanted changes, such as forgotten debugging code. I now do this routinely before checking in a set of changes.

—Henry Spencer

Another issue is bug tracking. It’s quite common to get new bug reports for a particular version after the code has mutated away from it considerably. Sometimes you can recognize immediately that the bug has already been stomped, but often you can’t. Suppose it doesn’t reproduce under the new version. How do you get back the state of the code for the old version in order to reproduce and understand it?

To address these problems, you need procedures for keeping a history of your project, and annotating it with comments that explain the history. If your project has more than one developer, you also need mechanisms for making sure developers don’t overwrite each others’ versions.

15.5.2 Version Control by Hand

The most primitive (but still very common) method is all hand-hacking. You snapshot the project periodically by manually copying everything in it to a backup. You include history comments in source files. You make verbal or email arrangements with other developers to keep their hands off certain files while you hack them.

The hidden costs of this hand-hacking method are high, especially when (as frequently happens) it breaks down. The procedures take time and concentration; they’re prone to error, and tend to get slipped under pressure or when the project is in trouble—that is, exactly when they are most needed.

As with most hand-hacking, this method does not scale well. It restricts the granularity of change tracking, and tends to lose metadata details such as the order of changes, who did them, and why. Reverting just a part of a large change can be tedious and time consuming, and often developers are forced to back up farther than they’d like after trying something that doesn’t work.

15.5.3 Automated Version Control

To avoid these problems, you can use a version-control system (VCS), a suite of programs that automates away most of the drudgery involved in keeping an annotated history of your project and avoiding modification conflicts.

Most VCSs share the same basic logic. To use one, you start by registering a collection of source files—that is, telling your VCS to start archive files describing their change histories. Thereafter, when you want to edit one of these files, you have to check out the file—assert an exclusive lock on it. When you’re done, you check in the file, adding your changes to the archive, releasing the lock, and entering a change comment explaining what you did.
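Under RCS, for example, that cycle looks something like this at the shell prompt (the file name is illustrative, and the comments are annotations rather than input):

ci -u parser.c      # register the file, keeping a read-only working copy
co -l parser.c      # check it out and lock it for editing
vi parser.c         # edit as usual
ci -u parser.c      # check the change in; RCS prompts for a change comment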

The history of the project is not necessarily linear. All VCSs in common use actually allow you to maintain a tree of variant versions (for ports to different machines, say) with tools for merging branches back into the main “trunk” version. This feature becomes important as the size and dispersion of the development group increases. It needs to be used with care, however; multiple active variants of the code base can be very confusing (just associating bug reports with the right version is not necessarily easy), and automated merging of branches does not guarantee that the combined code works.

Most of the rest of what a VCS does is convenience: labeling and reporting features surrounding these basic operations, and tools that let you view differences between versions, or group a given set of versions of files as a named release that can be examined or reverted to at any time without losing later changes.

VCSs have their problems. The biggest one is that using a VCS involves extra steps every time you want to edit a file, steps that developers in a hurry tend to want to skip if they have to be done by hand. Near the end of this chapter we’ll discuss a way to solve this problem.

Another problem is that some kinds of natural operations tend to confuse VCSs. Renaming files is a notorious trouble spot; it’s not easy to automatically ensure that a file’s version history will be carried along with it when it is renamed. Renaming problems are particularly difficult to resolve when the VCS supports branching.

Despite these difficulties, VCSs are a huge boon to productivity and code quality in many ways, even for small single-developer projects. They automate away many procedures that are just tedious work. They help a lot in recovering from mistakes. Perhaps most importantly, they free programmers to experiment by guaranteeing that reversion to a known-good state will always be easy.

(VCSs, by the way, are not merely good for program code; the manuscript of this book was maintained as a collection of files under RCS while it was being written.)

15.5.4 Unix Tools for Version Control

Historically, three VCSs have been of major significance in the Unix world, and we’ll survey them here. For an extended introduction and tutorial, consult Applying RCS and SCCS [Bolinger-Bronson].

15.5.4.1 Source Code Control System (SCCS)

The first was SCCS, the original Source Code Control System, developed at Bell Labs in the early 1970s and later featured in System III Unix. SCCS seems to have been the first serious attempt at a unified source-code management system; concepts that it pioneered are still found at some level in all later ones, including commercial Unix and Windows products such as ClearCase.

SCCS itself is, however, now obsolete; it was proprietary Bell Labs software. Superior open-source alternatives have since been developed, and most of the Unix world has converted to those. SCCS is still in use to manage old projects at some commercial vendors, but can no longer be recommended for new projects.

No complete open-source implementation of SCCS exists. A clone called CSSC (Compatibly Stupid Source Control) is in development under the sponsorship of the FSF.

15.5.4.2 Revision Control System (RCS)

The superior open-source alternatives began with RCS (Revision Control System), born at Purdue University in the early 1980s, about a decade after SCCS, and originally distributed with 4.3BSD Unix. It is logically similar to SCCS but has a cleaner command interface, and good facilities for grouping together entire project releases under symbolic names.

RCS is currently the most widely used version control system in the Unix world. Some other Unix version-control systems use it as a back end or underlayer. It is well suited for single-developer or small-group projects hosted at a single development shop.

The RCS sources are maintained and distributed by the FSF. Free ports are available for Microsoft operating systems and VAX VMS.

15.5.4.3 Concurrent Versions System (CVS)

CVS (Concurrent Versions System) began life as a front end to RCS developed in the early 1990s, but the model of version control it uses was different enough that it immediately qualified as a new design. Modern implementations don’t rely on RCS.

Unlike RCS and SCCS, CVS doesn’t exclusively lock files when they’re checked out. Instead, it tries to reconcile nonconflicting changes mechanically when they’re checked back in, and requests human help on conflicts. The design works because patch conflicts are much less common than one might intuitively think.

The interface of CVS is significantly more complex than that of RCS, and it needs a lot more disk space. These properties make it a poor choice for small projects. On the other hand, CVS is well suited to large multideveloper efforts distributed across several development sites connected by the Internet. CVS tools on a client machine can easily be told to direct their operations to a repository located on a different host.

The open-source community makes heavy use of CVS for projects such as GNOME and Mozilla. Typically, such CVS repositories allow anyone to check out sources remotely. Anyone can, therefore, make a local copy of a project, modify it, and mail change patches to the project maintainers. Actual write access to the repository is more limited and has to be explicitly granted by the project maintainers. A developer who has such access can perform a commit operation from his modified local copy, which will cause the local changes to get made directly to the remote repository.
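Sketched with CVS’s actual commands, that remote workflow looks roughly like this; the repository address and module name are placeholders:

cvs -d :pserver:anonymous@cvs.example.org:/cvsroot login      # once, for anonymous access
cvs -d :pserver:anonymous@cvs.example.org:/cvsroot checkout mymodule
cd mymodule
# ...edit files...
cvs update             # merge in changes others have committed meanwhile
cvs diff               # review your own changes
cvs commit -m "Fix off-by-one in header parser."    # requires write access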

You can see an example of a well-run CVS repository, accessible over the Internet, at the GNOME CVS site <http://cvs.gnome.org>. This site illustrates the use of CVS-aware browsing tools such as Bonsai, which are useful in helping a large and decentralized group of developers coordinate their work.

The social machinery and philosophy accompanying the use of CVS is as important as the details of the tools. The assumption is that projects will be open and decentralized, with code subject to peer review and inspection even by developers who are not officially members of the project group.

Just as importantly, CVS’s nonlocking philosophy means that projects can’t be blocked by a lock if a programmer disappears in the middle of making some changes. CVS thus allows developers to avoid the “single person point of failure” problem; in turn, this means that project boundaries can be fluid, casual contributions are relatively easy, and projects are not required to have an elaborate hierarchy of control.

The CVS sources are maintained and distributed by the FSF.

CVS has significant problems. Some are merely implementation bugs, but one basic problem is that your project’s file namespace is not versioned in the same way changes to files themselves are. Thus, CVS is easily confused by file renamings, deletions, and additions. Also, CVS records changes on a per-file basis, rather than as sets of changes made to files. This makes it harder to back out to specific versions, and harder to handle partial check-ins. Fortunately, none of these problems are intrinsic to the nonlocking style, and they have been successfully addressed by newer version-control systems.

15.5.4.4 Other Version-Control Systems

CVS’s design problems are sufficient to have created demand for a better open-source VCS. Several such efforts are under way as of 2003. The most notable of these are Aegis and Subversion.

Aegis <http://www.pcug.org.au/~millerp/aegis/aegis.html> has the longest history of any of these alternatives, has hosted its own development since 1991, and is a mature production system. It features a heavy emphasis on regression-testing and validation.

Subversion <http://subversion.tigris.org/> is positioned as “CVS done right”, with the known design problems fully addressed, and in 2003 probably has the best near-term prospect of replacing CVS.

The BitKeeper <http://www.bitkeeper.com> project explores some interesting design ideas related to change-sets and multiple distributed code repositories. Linus Torvalds uses BitKeeper for the Linux kernel sources. Its non-open-source license is, however, controversial, and has significantly retarded the acceptance of the product.

15.6 Runtime Debugging

Anyone who has been programming longer than a week knows that getting the syntax of your programming language right is the easy part of debugging. The hard part comes after that, when you need to understand why your syntactically correct program doesn’t behave as you expect.

The Unix tradition encourages developers to anticipate this problem by designing for transparency—in particular, designing programs in such a way that their internal data flows are readily monitored with the naked eye and simple tools, and readily mentally modeled. This is a topic we covered in detail in Chapter 6. Design for transparency is valuable both for preventing bugs and for easing the runtime-debugging task.

Design for transparency is not, however, sufficient in itself. When you are debugging a program at runtime, it’s extremely useful to be able to examine the state of your program at runtime, set breakpoints, and execute pieces of it down to the single-statement level in a controlled way. Unix has a long tradition of hosting programs to help you with this. Open-source Unixes feature a powerful one called gdb (yet another FSF project) that supports C and C++ debugging.
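A bare-bones gdb session looks something like the transcript below; the program name, breakpoint location, and variable are illustrative, and the comments are annotations rather than things you type.

cc -g -o myprog myprog.c    # compile with debugging symbols
gdb ./myprog                # start the debugger on the binary
(gdb) break main            # set a breakpoint at the start of main()
(gdb) run                   # run until the breakpoint is reached
(gdb) next                  # step over one source line
(gdb) print somevar         # examine a variable's current value
(gdb) backtrace             # show the call stack
(gdb) continue              # resume execution
(gdb) quit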

Perl, Python, Java, and Emacs Lisp all support standard packages or programs (included with their base distributions) that allow you to set breakpoints, control execution, and do general runtime-debugger things. Tcl, designed as a small language for small projects, has no such facility (though it does have a trace facility that can be used to watch variables at runtime).

Remember the Unix philosophy. Spend your time on design quality, not the low-level details, and automate away everything you can—including the detail work of runtime debugging.

15.7 Profiling

As a general rule, 90% of the execution time of your program will be spent in 10% of its code. Profilers are tools that help you identify the 10% of hot spots that constrain the speed of your program. This is a good thing for making it faster.

But in the Unix tradition, profilers have a far more important function. They enable you not to optimize the other 90%! This is good, and not just because it saves you work. The really valuable effect is that not optimizing that 90% holds down global complexity and reduces bugs.

You may recall that we quoted Donald Knuth observing “Premature optimization is the root of all evil” in Chapter 1, and that Rob Pike and Ken Thompson had a few pungent observations on the topic as well. These were the voices of experience. Do good design. Think about what’s right first. Tune for efficiency later.

Profilers help you do this. If you get in the good habit of using them, you can get rid of the bad habit of premature optimization. Profilers don’t just change the way you work; they change how you think.

Profilers for compiled languages rely on instrumenting object code, so they are even more platform-dependent than compilers. On the other hand, a compiled-language profiler doesn’t care about the source language of the programs it instruments. Under Unix, the single profiler gprof(1) handles C, C++, and all other compiled languages.
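Using gprof is a three-step cycle: compile with profiling instrumentation, run the program normally, then ask for the report. A sketch, with an illustrative program name:

cc -pg -o myprog myprog.c             # compile and link with profiling enabled
./myprog                              # run normally; writes gmon.out on exit
gprof myprog gmon.out > profile.txt   # flat profile plus call graph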

Perl, Python, and Emacs Lisp have their own profilers included in their basic distributions; these are portable across all platforms on which the host languages themselves run. Java has built-in profiling. Tcl has no profiling support as yet.

15.8 Combining Tools with Emacs

One of the things the Emacs editor is very good at is acting as a front end for other development tools (we discussed this from a philosophical angle in Chapter 13). In fact, nearly every tool we’ve discussed in this chapter can be driven from within an Emacs editor session through front ends that give them greater utility than they would have running standalone.

To illustrate this, we’ll walk you through the use of these tools with Emacs in a typical build/test/debug cycle. For details on them, see Emacs’s own on-line help system; this section just gives you an overview, to motivate you to learn more.

Read and learn—not just about Emacs, but about the mental habit of looking for synergies between programs, and creating them. Try to read this section as instruction in philosophy, not just technique.

15.8.1 Emacs and make

Make, for example, can be started with the Emacs command ESC-x compile followed by an Enter. This command will run make(1) in the current directory, capturing the output in an Emacs buffer.

This by itself wouldn’t be very useful. But Emacs’s make mode knows about the error message format (featuring a source file and line number) emitted by Unix C compilers and many other tools.

If anything run by make issues error messages, the command Ctl-X ` (control-X-backquote) will try to parse them and take you to each error location in turn, popping open a window on the appropriate file and taking the cursor to the error line.9

9 Look at processes->compile under the Emacs help menu for more information on these and related compilation-control commands.

This makes it extremely easy to step through an entire build, fixing any syntax that has been broken since the last compile.

15.8.2 Emacs and Runtime Debugging

For catching runtime errors, Emacs offers similar integration with your symbolic debugger—that is, you can use an Emacs mode to set breakpoints in your programs and examine their runtime state. You run the debugger by sending it commands through an Emacs window. Whenever the debugger stops on a breakpoint, the message the debugger ships back about the source location is parsed and used to pop up a window on the source around the breakpoint.

Emacs’s Grand Unified Debugger mode supports all the major C debuggers: gdb(1), sdb(1), dbx(1), and xdb(1). It also supports Perl symbolic debugging using the perldb module, and the standard debuggers for both Java and Python. Facilities built into Emacs Lisp itself support interactive debugging of Emacs Lisp code.

At time of writing (mid-2003) there is not yet support for Tcl debugging from within Emacs. The design of Tcl is such that it seems unlikely to be added.

15.8.3 Emacs and Version Control

Once you’ve corrected your program’s syntax and fixed its runtime bugs, you may want to save the changes into a version-controlled archive. If you’ve only tried running version-control tools from the shell, it’s hard to blame you for sloughing off this important step. Who wants to have to remember to run checkout/checkin commands around every edit operation?

Fortunately, Emacs offers help here too. Code built into Emacs implements a simple-to-use front end for SCCS, RCS, CVS, or Subversion. The single command Ctl-x v v tries to deduce the next logical version-control operation to do on the file you are visiting. The operations this includes are registering a file, checking out and locking it, and checking it back in (accepting a change comment in a pop-up buffer).10

10 See the subsection of the Emacs on-line documentation titled Version Control for more details on these and related commands.

Emacs also helps you view the change history of version-controlled files, and helps you back out changes you don’t want. It makes it easy to apply version-control operations to whole sets or project directory trees of files. In general, it does a pretty good job of making version-control operations painless.

The implications of these features are larger than you might guess before you’ve gotten used to it. You’ll find, once you get used to fast and easy version control, that it’s extremely liberating. Because you know you can always revert to a known-good state, you’ll find you feel more free to develop in a fluid and exploratory way, trying lots of changes out to see their effects.

15.8.4 Emacs and Profiling

Surprise...this is perhaps the only phase of the development cycle in which Emacs front-ending does not offer substantial help. Profiling is an intrinsically batchy operation—instrument your program, run it, view the statistics, speed-tune the code with an editor, repeat. There isn’t much room for Emacs leverage in the profiling-specific parts of this cycle.

Nevertheless, there’s a good tutorial reason for us to think about Emacs and profiling. If you found yourself analyzing a lot of profiling reports, it might pay you to write a mode in which a mouse click or keystroke on a profile report line visited the source of the relevant function. This actually would be fairly easy to do using the Emacs ’tags’ code. In fact, by the time you read this, some other reader may already have written such a mode and contributed it to the public Emacs code base.

The real point here is again a philosophical one. Don’t drudge—drudging wastes your time and productivity! If you find yourself spending a lot of time on the low-level mechanical parts of development, step back. Apply the Unix philosophy. Use your toolkit to automate or semi-automate the task.

Then give back something in return for all you’ve inherited, by posting your solution as open-source software to the Internet. Help liberate your fellow programmers from drudgery, too.

15.8.5 Like an IDE, Only Better

Earlier in this chapter we asserted that Emacs can give you capabilities resembling those of a conventional integrated development environment, only better. By now you should have enough facts in hand to see how that can be true. You can run entire development projects from inside Emacs, driving the low-level mechanics with a few keystrokes and saving yourself the mental effort and disruption of constantly switching contexts.

The Emacs-enabled development style trades away some capabilities of advanced IDEs, like graphical views of program structure. But those are frills. What Emacs gives you in return is flexibility and control. You’re not limited by the imagination of the IDE designer: you can tweak, customize, and add task-related intelligence using Emacs Lisp. Also, Emacs is better at supporting mixed-language development than conventional IDEs.

Finally, you’re not limited to accepting what one small group of IDE developers sees fit to support. By keeping an eye on the open-source community, you can benefit from the work of thousands of your peers, Emacs-using developers facing challenges much like yours. This is much more effective—and much more fun.
