16. Reuse: On Not Reinventing the Wheel

When the superior man refrains from acting, his force is felt for a thousand miles.

Tao Te Ching (as popularly mistranslated)

Reluctance to do unnecessary work is a great virtue in programmers. If the Chinese sage Lao-Tze were alive today and still teaching the way of the Tao, he would probably be mistranslated as: When the superior programmer refrains from coding, his force is felt for a thousand miles. In fact, recent translators have suggested that the Chinese term wu-wei that has traditionally been rendered as “inaction” or “refraining from action” should probably be read as “least action” or “most efficient action” or “action in accordance with natural law”, which is an even better description of good engineering practice!

Remember the Rule of Economy. Re-inventing fire and the wheel for every new project is terribly wasteful. Thinking time is precious relative to all the other inputs that go into software development; accordingly, it should be spent solving new problems rather than rehashing old ones for which known solutions already exist. This attitude gives the best return both in the “soft” terms of developing human capital and in the “hard” terms of economic return on development investment.

Reinventing the wheel is bad not only because it wastes time, but because reinvented wheels are often square. There is an almost irresistible temptation to economize on reinvention time by taking a shortcut to a crude and poorly-thought-out version, which in the long run often turns out to be false economy.

—Henry Spencer

The most effective way to avoid reinventing the wheel is to borrow someone else’s design and implementation of it. In other words, to reuse code.

Unix supports reuse at every level from individual library modules up to entire programs, which Unix helps you script and recombine. Systematic reuse is one of the most important distinguishing behaviors of Unix programmers, and the experience of using Unix should teach you a habit of trying to prototype solutions by combining existing components with a minimum of new invention, rather than rushing to write standalone code that will only be used once.

The virtuousness of code reuse is one of the great apple-pie-and-motherhood verities of software development. But many developers entering the Unix community from a basis of experience in other operating systems have never learned (or have unlearned) the habit of systematic reuse. Waste and duplicative work are rife, even though they seem to be against the interests both of those who pay for code and of those who produce it. Understanding why such dysfunctional behavior persists is the first step toward changing it.

16.1 The Tale of J. Random Newbie

Why do programmers reinvent wheels? There are many reasons, reaching all the way from the narrowly technical to the psychology of programmers and the economics of the software production system. The damage from the endemic waste of programming time reaches all these levels as well.

Consider the first, formative job experience of J. Random Newbie, a programmer fresh out of college. Let us assume that he (or she) has been taught the value of code reuse and is brimming with youthful zeal to apply it.

Newbie’s first project puts him on a team building some large application. Let’s say for the sake of example that it’s a GUI intended to help end users intelligently construct queries for and navigate through a large database. The project managers have assembled what they deem to be a suitable collection of tools and components, including not merely a development language but many libraries as well.

The libraries are crucial to the project. They package many services—from windowing widgets and network connections on up to entire subsystems like interactive help—that would otherwise require immense quantities of additional coding, with a severe impact on the project’s budget and its ship date.

Newbie is a little worried about that ship date. He may lack experience, but he’s read Dilbert and heard a few war stories from experienced programmers. He knows management has a tendency to what one might euphemistically call “aggressive” schedules. Perhaps he has read Ed Yourdon’s Death March [Yourdon], which as long ago as 1996 noted that a majority of projects are on a time and resource budget at least 50% too tight, and that the trend is for that squeeze to get worse.

But Newbie is bright and energetic. He figures his best chance of succeeding is to learn to use the tools and libraries that have been handed to him as intelligently as possible. He limbers up his typing fingers, hurls himself at the challenge...and enters hell.

Everything takes longer and is more painful than he expects. Beneath the surface gloss of their demo applications, the components he is re-using seem to have edge cases in which they behave unpredictably or destructively—edge cases his code tickles daily. He often finds himself wondering what the library programmers were thinking. He can’t tell, because the components are inadequately documented—often by technical writers who aren’t programmers and don’t think like programmers. And he can’t read the source code to learn what it is actually doing, because the libraries are opaque blocks of object code under proprietary licenses.

Newbie has to code increasingly elaborate workarounds for component problems, to the point where the net gain from using the libraries starts to look marginal. The workarounds make his code progressively grubbier. He probably hits a few places where a library simply cannot be made to do something crucially important that is theoretically within its specifications. Sometimes he is sure there is some way to actually make the black box perform, but he can’t figure out what it is.

Newbie finds that as he puts more strain on the libraries, his debugging time rises exponentially. His code is bedeviled with crashes and memory leaks that have trace paths leading into the libraries, into code he can’t see or modify. He knows most of those trace paths probably lead back out to his code, but without source it is very difficult to trace through the bits he didn’t write.

Newbie is growing horribly frustrated. He had heard in college that in industry, a hundred lines of finished code a week is considered good performance. He had laughed then, because he was many times more productive than that on his class projects and the code he wrote for fun. Now it’s not funny any more. He is wrestling not merely with his own inexperience but with a cascade of problems created by the carelessness or incompetence of others—problems he can’t fix, but can only work around.

The project schedule is slipping. Newbie, who dreamed of being an architect, finds himself a bricklayer trying to build with bricks that won’t stack properly and that crumble under load-bearing pressure. But his managers don’t want to hear excuses from a novice programmer; complaining too loudly about the poor quality of the components is likely to get him in political trouble with the senior people and managers who selected them. And even if he could win that battle, changing components would be a complicated proposition involving batteries of lawyers peering narrowly at licensing terms.

Unless Newbie is very, very lucky, he is not going to be able to get library bugs fixed within the lifetime of his project. In his saner moments, he may realize that the working code in the libraries doesn’t draw his attention the way the bugs and omissions do. He’d love to sit down for a clarifying chat with the component developers; he suspects they can’t be the idiots their code sometimes suggests, just programmers like him working within a system that frustrates their attempts to do the right thing. But he can’t even find out who they are—and if he could, the software vendor they work for probably wouldn’t let them talk to him.

In desperation, Newbie starts making his own bricks—simulating less stable library services with more stable ones and writing his own implementations from scratch. His replacement code, because he has a complete mental model of it that he can refresh by rereading, tends to work relatively well and be easier to debug than the combination of opaque components and workarounds it replaces.

Newbie is learning a lesson: the less he relies on other people’s code, the more lines of code he can get written. This lesson feeds his ego. Like all young programmers, deep down he thinks he is smarter than anyone else. His experience seems, superficially, to be confirming this. He begins building his own personal toolkit, one better fitted to his hand.

Unfortunately, the roll-your-own reflexes Newbie is acquiring are a short-term local optimization that will cause long-term problems. He may get more lines of code written, but the actual value of what he produces is likely to drop substantially relative to what it would have been if he were doing successful reuse. More code does not equal better code, not when it’s written at a lower level and largely devoted to reinventing wheels.

Newbie has at least one more demoralizing experience in store, when he changes jobs. He is likely to discover that he can’t take his toolkit with him. If he walks out of the building with code he wrote on company time, his old employers could well regard this as intellectual-property theft. His new employers, knowing this, are not likely to react well if he admits to reusing any of his old code.

Newbie could well find his toolkit is useless even if he can sneak it into the building at his new job. His new employers may use a different set of proprietary tools, languages, and libraries. It is likely he will have to learn a somewhat new set of techniques and reinvent a new set of wheels each time he changes projects.

Thus do programmers have reuse (and other good practices that go with it, like modularity and transparency) systematically conditioned out of them by a combination of technical problems, intellectual-property barriers, politics, and personal ego needs. Multiply J. Random Newbie by a hundred thousand, age him by decades, and have him grow more cynical and more used to the system year by year. There you have the state of much of the software industry, a recipe for enormous waste of time and capital and human skill—even before you factor in vendors’ market-control tactics, incompetent management, impossible deadlines, and all the other pressures that make doing good work difficult.

The professional culture that springs from J. Random Newbie’s experiences will reflect them in the large. Programming shops will have a ferocious Not Invented Here complex. They will be poisonously ambivalent about code reuse, pushing inadequate but heavily marketed vendor components on their programmers in order to meet schedule crunches, while simultaneously rejecting reuse of the programmers’ own tested code. They will churn out huge volumes of ad-hoc, duplicative software produced by programmers who know the results will be garbage but are glumly resigned to never being able to fix anything but their own individual pieces.

The closest equivalent of code reuse to emerge in such a culture will be a dogma that code once paid for can never be thrown away, but must instead be patched and kluged even when all parties know that it would be better to scrap and start anew. The products of this culture will become progressively more bloated and buggy over time even when every individual involved is trying his or her hardest to do good work.

16.2 Transparency as the Key to Reuse

We field-tested the tale of J. Random Newbie on a number of experienced programmers. If you the reader are one yourself, we expect you responded to it much as they did: with groans of recognition. If you are not a programmer but you manage programmers, we sincerely hope you found it enlightening. The tale is intended to illustrate the ways in which different levels of pressure against reuse reinforce each other to create a magnitude of problem not linearly predictable from any individual cause.

So accustomed are most of us to the background assumptions of the software industry that it can take considerable mental effort before the primary causes of this problem can be separated from the accidents of narrative. But they are not, in the end, very complex.

At the bottom of most of J. Random Newbie’s troubles (and the large-scale quality problems they imply) is transparency — or, rather, the lack of it. You can’t fix what you can’t see inside. In fact, for any software with a nontrivial API, you can’t even properly use what you can’t see inside. Documentation is inadequate not merely in practice but in principle; it cannot convey all the nuances that the code embodies.

In Chapter 6, we observed how central transparency is to good software. Object-code-only components destroy the transparency of a software system. On the other hand, the frustrations of code reuse are far less likely to bite when the code you are attempting to reuse is available for reading and modification. Well-commented source code is its own documentation. Bugs in source code can be fixed. Source can be instrumented and compiled for debugging to make probing its behavior in obscure cases easier. And if you need to change its behavior, you can do that.

There is another vital reason to demand source code. A lesson Unix programmers have learned through decades of constant change is that source code lasts, object code doesn’t. Hardware platforms change, service components like support libraries change, the operating system grows new APIs and deprecates old ones. Everything changes—but opaque binary executables cannot adapt to change. They are brittle, cannot be reliably forward-ported, and have to be supported with increasingly thick and error-prone layers of emulation code. They lock users into the assumptions of the people who built them. You need source because, even if you have neither the intention nor the need to change the software, you will have to rebuild it in new environments to keep it running.

The importance of transparency and the code-legacy problem are reasons that you should require the code you reuse to be open to inspection and modification.[1] It is not a complete argument for what is now called ‘open source’, because ‘open source’ has rather stronger implications than simply requiring code to be transparent and visible.

[1] NASA, which consciously builds software intended to have a service life of decades, has learned to insist on source-code availability for all space avionics software.

16.3 From Reuse to Open Source

In the early days of Unix, components of the operating system, its libraries, and its associated utilities were passed around as source code; this openness was a vital part of the Unix culture. We described in Chapter 2 how, when this tradition was disrupted after 1984, Unix lost its initial momentum. We have also described how, a decade later, the rise of the GNU toolkit and Linux prompted a rediscovery of the value of open-source code.

Today, open-source code is again one of the most powerful tools in any Unix programmer’s kit. Accordingly, though the explicit concept of “open source” and the most widely used open-source licenses are decades younger than Unix itself, it’s important to understand both to do leading-edge development in today’s Unix culture.

Open source relates to code reuse in much the way romantic love relates to sexual reproduction: it’s possible to explain the former in terms of the latter, but to do so is to risk overlooking much of what makes the former interesting. Open source does not reduce to merely being a tactic for supporting reuse in software development. It is an emergent phenomenon, a social contract among developers and users that tries to secure several advantages related to transparency. As such, there are several different ways to approach an understanding of it.

Our historical description earlier in this book chose one angle by focusing on causal and cultural relationships between Unix and open source. We’ll discuss the institutions and tactics of open-source development in Chapter 19. In discussing the theory and practice of code reuse, it’s useful to think of open source more specifically, as a direct response to the problems we dramatized in the tale of J. Random Newbie.

Software developers want the code they use to be transparent. Furthermore, they don’t want to lose their toolkits and their expertise when they change jobs. They get tired of being victims, fed up with being frustrated by blunt tools and intellectual-property fences and having to repeatedly re-invent the wheel.

These are the motives for open source that flow from J. Random Newbie’s painful initiatory experience with reuse. Ego needs play a part here, too; they give pervasive emotional force to what would otherwise be a bloodless argument about engineering best practices. Software developers are like every other kind of craftsman and artificer; they want, not so secretly, to be artists. They have the drives and needs of artists, including the desire to have an audience. They not only want to reuse code, they want their code to be reused. There is an imperative here that goes beyond and overrides short-term economic goal-seeking and that cannot be satisfied by closed-source software production.

Open source is a kind of ideological preemptive strike on all these problems. If the root of most of J. Random Newbie’s problems with reuse is the opacity of closed-source code, then the institutional assumptions that produce closed-source code must be smashed. If corporate territoriality is a problem, it must be attacked or bypassed until the corporations have caught on to how self-destructive their territorial reflexes are. Open source is what happens when code reuse gets a flag and an army.

Accordingly, since the late 1990s, it no longer makes any sense to try to recommend strategies and tactics for code reuse without talking about open source, open-source practices, open-source licensing, and the open-source community. Even if those issues could be separated elsewhere, they have become inextricably bound together in the Unix world.

In the remainder of this chapter, we’ll survey various issues associated with reusing open-source code: evaluation, documentation, and licensing. In Chapter 19 we’ll discuss the open-source development model more generally, and examine the conventions you should follow when you are releasing code for others to use.

16.4 The Best Things in Life Are Open

On the Internet, literally terabytes of Unix sources for systems and applications software, service libraries, GUI toolkits and hardware drivers are available for the taking. You can have most built and running in minutes with standard tools. The mantra is ./configure; make; make install; usually you have to be root to do the install part.
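As a rough illustration, the three-step mantra can be driven from a script. The sketch below is in Python; the function names build_steps and build_and_install are hypothetical, and it assumes the package ships an autoconf-style configure script. Passing a prefix under your home directory is one common way to avoid needing root for the install step, though a hand-written configure script may not honor it.

```python
import subprocess
from pathlib import Path


def build_steps(prefix=None):
    """Return the classic three commands as argument lists."""
    configure = ["./configure"]
    if prefix is not None:
        # --prefix is honored by autoconf-generated configure scripts;
        # hand-written ones may ignore or reject it (an assumption here).
        configure.append(f"--prefix={prefix}")
    return [configure, ["make"], ["make", "install"]]


def build_and_install(source_dir, prefix=None):
    """Run each step inside source_dir, stopping at the first failure."""
    for command in build_steps(prefix):
        # check=True raises CalledProcessError on a nonzero exit status.
        subprocess.run(command, cwd=Path(source_dir), check=True)
```

Unlike the semicolon-separated shell form, this version stops at the first step that fails, which is usually what you want for an unattended build.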

People from outside the Unix world (especially non-technical people) are prone to think open-source (or ‘free’) software is necessarily inferior to the commercial kind, that it’s shoddily made and unreliable and will cause more headaches than it saves. They miss an important point: in general, open-source software is written by people who care about it, need it, use it themselves, and are putting their individual reputations among their peers on the line by publishing it. They also tend to have less of their time consumed by meetings, retroactive design changes, and bureaucratic overhead. They are therefore both more strongly motivated and better positioned to do excellent work than wage slaves toiling Dilbert-like to meet impossible deadlines in the cubicles of proprietary software houses.

Furthermore, the open-source user community (those peers) is not shy about nailing bugs, and its standards are high. Authors who put out substandard work experience a lot of social pressure to fix their code or withdraw it, and can get a lot of skilled help fixing it if they choose. As a result, mature open-source packages are generally of high quality and often functionally superior to any proprietary equivalent. They may lack polish and have documentation that assumes much, but the vital parts will usually work quite well.

Besides the peer-review effect, another reason to expect better quality is this: in the open-source world developers are never forced by a deadline to close their eyes, hold their noses, and ship. A major consequent difference between open-source practice and elsewhere is that a release level of 1.0 actually means the software is ready to use. In fact, a version number of 0.90 or above is a fairly reliable signal that the code is production-ready, but the developers are not quite ready to bet their reputations on it.

If you are a programmer from outside the Unix world, you may find this claim difficult to believe. If so, consider this: on modern Unixes, the C compiler itself is almost invariably open source. The Free Software Foundation’s GNU Compiler Collection (GCC) is so powerful, so well documented, and so reliable that there is effectively no proprietary Unix compiler market left, and it has become normal for Unix vendors to port GCC to their platforms rather than do in-house compiler development.

The way to evaluate an open-source package is to read its documentation and skim some of its code. If what you see appears to be competently written and documented with care, be encouraged. If there also is evidence that the package has been around for a while and has incorporated substantial user feedback, you may bet that it is quite reliable (but test anyway).

A good gauge of maturity and the volume of user feedback is the number of people besides the original author mentioned in the README and project news or history files in the source distribution. Credits to lots of people for sending in fixes and patches are signs both of a significant user base keeping the authors on their toes, and of a conscientious maintainer who is responsive to feedback and will take corrections. It is also an indication that, if early code tends to be a minefield of bugs, there has since been a thundering herd run through it without too many recent explosions.
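One crude way to apply this gauge is to count distinct e-mail addresses in a package’s credit files. The following Python sketch does that; it assumes credits carry e-mail addresses (many do not), and both the file-name list and the function names are illustrative choices rather than any convention. Treat the number as a hint about the size of the contributor base, not a measurement.

```python
import re
from pathlib import Path

# Assumption: credits usually carry an e-mail address, so distinct addresses
# approximate the number of distinct people credited.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical list of file names worth scanning; real projects vary.
CREDIT_FILES = ("README", "NEWS", "HISTORY", "AUTHORS", "THANKS", "ChangeLog")


def credited_addresses(text):
    """Return the set of distinct e-mail addresses in text, lowercased."""
    return {match.lower() for match in EMAIL.findall(text)}


def count_credited_people(source_dir):
    """Count distinct addresses across the credit files found in source_dir."""
    found = set()
    for name in CREDIT_FILES:
        path = Path(source_dir) / name
        if path.is_file():
            found |= credited_addresses(path.read_text(errors="replace"))
    return len(found)
```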

It’s also a good omen when the software has its own Web page, on-line FAQ (Frequently Asked Questions) list, and an associated mailing list or Usenet newsgroup. These are all signs that a live and substantial community of interest has grown up around the software. On Web pages, recent updates and an extensive mirror list are reliable signs of a project with a vigorous user community. Packages that are duds just don’t get this kind of continuing investment, because they can’t reward it.

Ports to multiple platforms are also a valuable indication of a diversified user base. Project pages tend to advertise new ports precisely because they signal credibility.

Here are some examples of what Web pages associated with high-quality open-source software look like:

• GIMP <http://www.gimp.org/>

• GNOME <http://www.gnome.org>

• KDE <http://www.kde.org>

• Python <http://www.python.org>

• The Linux kernel <http://www.kernel.org>

• PostgreSQL <http://www.postgresql.org>

• XFree86 <http://xfree86.org>

• InfoZip <http://www.info-zip.org/pub/infozip/>

Looking at Linux distributions is another good way to find quality. Distribution-makers for Linux and other open-source Unixes carry a lot of specialist expertise about which projects are best-of-breed—that’s a large part of the value they add when they integrate a release. If you are already using an open-source Unix, something else to check is whether the package you are evaluating is already carried by your distribution.

16.5 Where to Look?

Because so much open source is available in the Unix world, skill at finding code to reuse can have an enormous payoff—much greater than is the case for other operating systems. Such code comes in many forms: individual code snippets and examples, code libraries, utilities to be reused in scripts. Under Unix most code reuse is not a matter of actual cut-and-paste into your program—in fact, if you find yourself doing that, there is almost certainly a more graceful mode of reuse that you are missing. Accordingly, one of the most useful skills to cultivate under Unix is a good grasp of all the different ways to glue together code, so you can use the Rule of Composition.
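As one small instance of gluing rather than rewriting, a script can hand its data to the standard sort and uniq utilities instead of reimplementing them. The Python sketch below assumes both utilities are on the PATH and that the input lines are non-empty with no leading whitespace; the function names run_filter and count_duplicates are hypothetical.

```python
import os
import subprocess


def run_filter(command, text):
    """Feed text to an external filter and return what it writes to stdout."""
    # LC_ALL=C keeps comparisons bytewise, so results don't depend on locale.
    env = {**os.environ, "LC_ALL": "C"}
    result = subprocess.run(command, input=text, capture_output=True,
                            text=True, check=True, env=env)
    return result.stdout


def count_duplicates(lines):
    """Count repeated lines with the equivalent of `sort | uniq -c`."""
    text = "".join(line + "\n" for line in lines)
    counted = run_filter(["uniq", "-c"], run_filter(["sort"], text))
    counts = {}
    for row in counted.splitlines():
        # Each row looks like "   2 pear": a count, whitespace, then the line.
        number, line = row.split(None, 1)
        counts[line] = int(number)
    return counts
```

The two utilities do all the real work here; the only new code is the glue that feeds them and reads back their output.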

To find re-usable code, start by looking under your nose. Unixes have always featured a rich toolkit of re-usable utilities and libraries; modern ones, such as any current Linux system, include thousands of programs, scripts, and libraries that may be re-usable. A simple man -k search with a few keywords often yields useful results.
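The same search can be scripted. The Python sketch below runs man -k for one keyword and splits each result into its page names and description; it assumes the common “name (section) - description” output layout, which varies somewhat between Unix variants, and the function names are hypothetical.

```python
import subprocess


def parse_apropos_line(line):
    """Split one `man -k` result into (names, description), or None."""
    names, separator, description = line.partition(" - ")
    if not separator:
        return None  # not in the assumed "names - description" layout
    return names.strip(), description.strip()


def search_man_pages(keyword):
    """Run `man -k keyword` and return the parsed results."""
    # No check=True: man -k may exit nonzero when nothing matches.
    result = subprocess.run(["man", "-k", keyword],
                            capture_output=True, text=True)
    parsed = (parse_apropos_line(line) for line in result.stdout.splitlines())
    return [entry for entry in parsed if entry is not None]
```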

To begin to grasp something of the amazing wealth of resources out there, surf to SourceForge, ibiblio, and Freshmeat.net. Other sites as important as these three may exist by the time you read this book, but all three of these have shown continuing value and popularity over a period of years, and seem likely to endure.

SourceForge <http://www.sourceforge.net> is a demonstration site for software specifically designed to support collaborative development, complete with associated project-management services. It is not merely an archive but a free development-hosting service, and in mid-2003 is undoubtedly the largest single hub of open-source activity in the world.

The Linux archives at ibiblio <http://www.ibiblio.org> were the largest in the world before SourceForge. The ibiblio archives are passive, simply a place to publish packages. They do, however, have a better interface to the World Wide Web than most passive sites (the program that creates its Web look and feel was one of our case studies in the discussion of Perl in Chapter 14). It’s also the home site of the Linux Documentation Project, which maintains many documents that are excellent resources for Unix users and developers.

Freshmeat <http://www.freshmeat.net> is a system dedicated to providing release announcements of new software, and new releases of old software. It lets users and third parties attach reviews to releases.

These three general-purpose sites contain code in many languages, but most of their content is C or C++. There are also sites specialized around some of the interpreted languages as discussed in Chapter 14.

The CPAN archive is the central repository for useful free code in Perl. It is easily reached from the Perl home page <http://www.perl.com/perl>.

The Python Software Activity makes an archive of Python software and documentation available at the Python Home Page <http://www.python.org>.

Many Java applets and pointers to other sites featuring free Java software are made available at the Java Applets page <http://java.sun.com/applets/>.

One of the most valuable ways you can invest your time as a Unix developer is to spend time wandering around these sites learning what is available for you to use. The coding time you save may be your own!

Browsing the package metadata is a good idea, but don’t stop there. Sample the code, too. You’ll get a better grasp on what the code is doing, and be able to use it more effectively.

More generally, reading code is an investment in the future. You’ll learn from it: new techniques, new ways to partition problems, different styles and approaches. Both using the code and learning from it are valuable rewards. Even if you don’t use the techniques in the code you study, the improved definition of the problem you get from looking at other people’s solutions may well help you invent a better one of your own.

Read before you write; develop the habit of reading code. There are seldom any completely new problems, so it is almost always possible to discover code that is close enough to what you need to be a good starting point. Even when your problem is genuinely novel, it is likely to be genetically related to a problem someone else has solved before, so the solution you need to develop is likely to be related to some pre-existing one as well.

16.6 Issues in Using Open-Source Software

There are three major issues in using or re-using open-source software: quality, documentation, and licensing terms. We’ve seen above that if you exercise a little judgment in picking through your alternatives, you will generally find one or more of quite respectable quality.

Documentation is often a more serious issue. Many high-quality open-source packages are less useful than they technically ought to be because they are poorly documented. Unix tradition encourages a rather hieratic style of documentation, one which (while it may technically capture all of a package’s features) assumes that the reader is intimately familiar with the application domain and reading very carefully. There are good reasons for this, which we’ll discuss in Chapter 18, but the style can present a bit of a barrier. Fortunately, extracting value from it is a learnable skill.

It is worth doing a Web search for phrases including the software package, or topic keywords, and the string “HOWTO” or “FAQ”. These queries will often turn up documentation more useful to novices than the man page.

The most serious issue in reusing open-source software (especially in any kind of commercial product) is understanding what obligations, if any, the package’s license puts upon you. In the next two sections we’ll discuss this issue in detail.

16.7 Licensing Issues

Anything that is not public domain has a copyright, possibly more than one. Under U.S. federal law, the authors of a work hold copyright even if there is no copyright notice.

Who counts as an author under copyright law can be complicated, especially for software that has been worked on by many hands. This is why licenses are important. They can authorize uses of code in ways that would be otherwise impermissible under copyright law and, drafted appropriately, can protect users from arbitrary actions by the copyright holders.

In the proprietary software world, the license terms are designed to protect the copyright. They’re a way of granting a few rights to users while reserving as much legal territory as possible for the owner (the copyright holder). The copyright holder is very important, and the license logic so restrictive that the exact technicalities of the license terms are usually unimportant.

As will be seen below, in open-source software the situation is reversed: the copyright holder typically uses the copyright to protect the license, which makes the code freely available under terms he intends to perpetuate indefinitely. Otherwise, only a few rights are reserved and most choices pass to the user. In particular, the copyright holder cannot change the terms on a copy you already have. Therefore, in open-source software the copyright holder is almost irrelevant, but the license terms are very important.

Normally the copyright holder of a project is the current project leader or sponsoring organization. Transfer of the project to a new leader is often signaled by changing the copyright holder. However, this is not a hard and fast rule; many open-source projects have multiple copyright holders, and there is no instance on record of this leading to legal problems. Some projects choose to assign copyright to the Free Software Foundation, on the theory that it has an interest in defending open source and lawyers available to do it.

16.7.1 What Qualifies as Open Source

For licensing purposes, we can distinguish several different kinds of rights that a license may convey. There are rights to copy and redistribute, rights to use, rights to modify for personal use, and rights to redistribute modified copies. A license may restrict or attach conditions to any of these rights.

The Open Source Definition <http://www.opensource.org/osd.html> is the result of a great deal of thought about what makes software “open source” or (in older terminology) “free”. It is widely accepted in the open-source community as an articulation of the social contract among open-source developers. Its constraints on licensing impose the following requirements:

• An unlimited right to copy be granted.

• An unlimited right to redistribute in unmodified form be granted.

• An unlimited right to modify for personal use be granted.

The guidelines prohibit restrictions on redistribution of modified binaries; this meets the needs of software distributors, who need to be able to ship working code without encumbrance. It allows authors to require that modified sources be redistributed as pristine sources plus patches, thus establishing the author’s intentions and an “audit trail” of any changes by others.

The OSD is the legal definition of the “OSI Certified Open Source” certification mark, and as good a definition of “free software” as anyone has ever come up with. All of the standard licenses (MIT, BSD, Artistic, GPL/LGPL, and MPL) meet it (though some, like GPL, have other restrictions which you should understand before choosing it).

Note that licenses that allow only noncommercial use do not qualify as open-source licenses, even if they are based on GPL or some other standard license. Such licenses discriminate against particular occupations, persons, and groups, a practice which the OSD’s Clause 5 explicitly forbids.

Clause 5 was written after years of painful experience. No-commercial-use licenses turn out to have the problem that there is no bright-line legal test for what sort of redistribution qualifies as ’commercial’. Selling the software as a product qualifies, certainly. But what if it were distributed at a nominal price of zero in conjunction with other software or data, and a price is charged for the whole collection? Would it make a difference whether the software were essential to the function of the whole collection?

Nobody knows. The very fact that no-commercial-use licenses create uncertainty about a redistributor’s legal exposure is a serious strike against them. One of the objectives of the OSD is to ensure that people in the distribution chain of OSD-conforming software do not need to consult with intellectual-property lawyers to know what their rights are. OSD forbids complicated restrictions against persons, groups, and occupations partly so that people dealing with collections of software will not face a combinatorial explosion of slightly differing (and perhaps conflicting) restrictions on what they can do with it.

This concern is not hypothetical, either. One important part of the open-source distribution chain is CD-ROM distributors who aggregate open-source software into useful collections ranging from simple anthology CDs up to bootable operating systems. Restrictions that would make life prohibitively complicated for CD-ROM distributors, or others trying to spread open-source software commercially, have to be forbidden.

On the other hand, the OSD has nothing to say about the laws of your jurisdiction. Some countries have laws against exporting certain restricted technologies to named ’rogue states’. The OSD cannot negate those; it only says that licensors may not add restrictions of their own.

16.7.2 Standard Open-Source Licenses

Here are the standard open-source license terms you are likely to encounter. The abbreviations listed here are in general use.

MIT <http://www.opensource.org/licenses/mit-license.html>

MIT X Consortium license (like BSD’s but with no advertising requirement)

BSD <http://www.opensource.org/licenses/bsd-license.html>

University of California at Berkeley Regents copyright (used on BSD code)

Artistic License <http://www.opensource.org/licenses/artistic-license.html>

Same terms as Perl Artistic License

GPL <http://www.gnu.org/copyleft.html>

GNU General Public License

LGPL <http://www.gnu.org/copyleft.html>

Library (or ’Lesser’) GPL

MPL <http://www.opensource.org/licenses/MPL-1.1.html>

Mozilla Public License

We’ll discuss these licenses in more detail, from a developer’s point of view, in Chapter 19. For the purposes of this chapter, the only important distinction among them is whether they are infectious or not. A license is infectious if it requires that any derivative work of the licensed software also be placed under its terms.

Under these licenses, the only kind of open-source use you should really worry about is actual incorporation of the free-software code into a proprietary product (as opposed, say, to merely using open-source development tools to make your product). If you’re prepared to include proper license acknowledgements and pointers to the source code you’re using in your product documentation, even direct incorporation should be safe provided the license is not infectious.

The GPL is both the most widely used and the most controversial infectious license. And it is clause 2(b), requiring that any derivative work of a GPLed program itself be GPLed, that causes the controversy. (Clause 3(b) requiring licensors to make source available on physical media on demand used to cause some, but the Internet explosion has made publishing source code archives as required by 3(a) so cheap that nobody worries about the source-publication requirement any more.)

Nobody is quite certain what the “contains or is derived from” in clause 2(b) means, nor what kinds of use are protected by the “mere aggregation” language a few paragraphs later. Contentious issues include library linking and inclusion of GPL-licensed header files. Part of the problem is that the U.S. copyright statutes do not define what derivation is; it has been left to the courts to hammer out definitions in case law, and computer software is an area in which this process (as of mid-2003) has barely begun.

At one end, the “mere aggregation” language certainly makes it safe to ship GPLed software on the same media with your proprietary code, provided they do not link to or call each other. They may even be tools operating on the same file formats or on-disk structures; that situation, under copyright law, would not make one a derivative of the other.

At the other end, splicing GPLed code into your proprietary code, or linking GPLed object code to yours, certainly does make your code a derivative work and requires it to be GPLed.

It is generally believed that one program may execute a second program as a subprocess without either program becoming thereby a derivative work of the other.
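The subprocess arrangement just described can be sketched in Python. This is only an illustration of the mechanism: the child program here is the Python interpreter itself, standing in for some separately licensed tool, and the name run_tool is hypothetical. The two programs share no address space and exchange data only through arguments, standard streams, and exit status.

```python
import subprocess
import sys


def run_tool(text: str) -> str:
    """Run a separate program as a subprocess and return its output.

    The child is a stand-in (an assumption for illustration): the Python
    interpreter, told to upper-case whatever arrives on standard input.
    """
    child = [
        sys.executable,
        "-c",
        "import sys; sys.stdout.write(sys.stdin.read().upper())",
    ]
    # The parent never links against the child; it only passes text in
    # on stdin and reads text back from stdout.
    result = subprocess.run(
        child, input=text, capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    print(run_tool("hello"))
```

Contrast this with importing or linking the other program's code into your own process, which is the case the following paragraphs treat as disputed.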

The case that causes dispute is dynamic linking of shared libraries. The Free Software Foundation’s position is that if a program calls another program as a shared library, then that program is a derivative work of the library. Some programmers think this claim is overreaching. There are technical, legal, and political arguments on both sides that we won’t rehash here. Since the Free Software Foundation wrote and owns the license, it would be prudent to behave as if the FSF’s position is correct until a court rules otherwise.

Some people think the 2(b) language is deliberately designed to infect every part of any commercial program that uses even a snippet of GPLed code; such people refer to it as the GPV, or “General Public Virus”. Others think the “mere aggregation” language covers everything short of mixing GPL and non-GPL code in the same compilation or linkage unit.

This uncertainty has caused enough agitation in the open-source community that the FSF had to develop the special, slightly more relaxed “Library GPL” (which they have since renamed the “Lesser GPL”) to reassure people they could continue to use runtime libraries that came with the FSF’s GNU compiler collection.

You’ll have to choose your own interpretation of clause 2(b); most lawyers will not understand the technical issues involved, and there is no case law. As a matter of empirical fact, the FSF has never (from its founding in 1984 to mid-2003, at least) sued anyone under the GPL, but it has enforced the GPL by threatening lawsuit, in all known cases successfully. And, as another empirical fact, Netscape includes the source and object of a GPLed program with the commercial distribution of its Netscape Navigator browser.

The MPL and LGPL are infectious in a more limited way than GPL. They explicitly allow linking with proprietary code without turning that code into a derivative work, provided all traffic between the GPLed and non-GPLed code goes through a library API or other well-defined interface.
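The infectious/non-infectious distinction drawn in this section can be summarized as a small lookup table. The sketch below is one reading of the text, not an authoritative classification: the names Infection, LICENSES, and incorporation_note are hypothetical, and listing MIT, BSD, and Artistic as non-infectious is an assumption based on those licenses lacking any derivative-work clause of the GPL's kind.

```python
from enum import Enum


class Infection(Enum):
    NONE = "not infectious"
    LIMITED = "infectious in a limited way"
    FULL = "infectious"


# Abbreviations follow the list of standard licenses given earlier.
# MIT, BSD, and Artistic as NONE is an assumption of this sketch.
LICENSES = {
    "MIT": Infection.NONE,
    "BSD": Infection.NONE,
    "Artistic": Infection.NONE,
    "MPL": Infection.LIMITED,
    "LGPL": Infection.LIMITED,
    "GPL": Infection.FULL,
}


def incorporation_note(license_name: str) -> str:
    """Return a one-line reminder for use in a proprietary product."""
    kind = LICENSES[license_name]
    if kind is Infection.NONE:
        return "direct incorporation with proper license acknowledgements"
    if kind is Infection.LIMITED:
        return "link only through a library API or other well-defined interface"
    return "keep separate (mere aggregation or subprocess), or release under the GPL"


if __name__ == "__main__":
    for name in LICENSES:
        print(f"{name}: {incorporation_note(name)}")
```

A table like this is only a memory aid for triage; the reminders paraphrase the cases discussed above and leave the disputed dynamic-linking case to your own interpretation of clause 2(b).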

16.7.3 When You Need a Lawyer

This section is directed to commercial developers considering incorporating software that falls under one of these standard licenses into closed-source products.

Having gone through all this legal verbiage, the expected thing for us to do at this point is to utter a somber disclaimer to the effect that we are not lawyers, and that if you have any doubts about the legality of something you want to do with open-source software, you should immediately consult a lawyer.

With all due respect to the legal profession, this would be fearful nonsense. The language of these licenses is as clear as legalese gets—they were written to be clear—and should not be at all hard to understand if you read it carefully. The lawyers and courts are actually more confused than you are. The law of software rights is murky, and case law on open-source licenses is (as of mid-2003) nonexistent; no one has ever been sued under them.

This means a lawyer is unlikely to have a significantly better insight than a careful lay reader. But lawyers are professionally paranoid about anything they don’t understand. So if you ask one, he is rather likely to tell you that you shouldn’t go anywhere near open-source software, despite the fact that he probably doesn’t understand the technical aspects or the author’s intentions anywhere near as well as you do.

Finally, the people who put their work under open-source licenses are generally not mega-corporations attended by schools of lawyers looking for blood in the water; they’re individuals or volunteer groups who mainly want to give their software away. The few exceptions (that is, large companies both issuing under open-source licenses and with money to hire lawyers) have a stake in open source and don’t want to antagonize the developer community that produces it by stirring up legal trouble. Therefore, your odds of getting hauled into court on an innocent technical violation are probably lower than your chances of being struck by lightning in the next week.

This isn’t to say you should treat these licenses as jokes. That would be disrespectful of the creativity and sweat that went into the software, and you wouldn’t enjoy being the first litigation target of an enraged author no matter how the lawsuit came out. But in the absence of definitive case law, a visible good-faith effort to meet the author’s intentions is 99% of what you can do; the additional 1% of protection you might (or might not) get by consulting a lawyer is unlikely to make a difference.
