Preface

Is it really punk rock

Like the party line?

Wilco, “Too Far Apart”

C Is Punk Rock

C has only a handful of keywords and is a bit rough around the edges, and it rocks. You can do anything with it. Like the C, G, and D chords on a guitar, you can learn the basic mechanics quickly, and then spend the rest of your life getting better. The people who don’t get it fear its power and think it too edgy to be safe. By all rankings, it is consistently the most popular language that doesn’t have a corporation or foundation spending money to promote it.1

Also, the language is about 40 years old, which makes it middle-aged. It was written by a few guys basically working against management—the perfect punk rock origins—but that was in the 1970s, and there’s been a lot of time for the language to go mainstream.

What did people do when punk rock went mainstream? In the decades since its advent in the 1970s, punk certainly has come in from the fringes: The Clash, The Offspring, Green Day, and The Strokes sold millions of albums worldwide (to name just a few), and I have heard lite instrumental versions of songs from the punk spinoff known as grunge at my local supermarket. The former lead singer of Sleater-Kinney now has a popular sketch comedy show that frequently lampoons punk rockers.2 One reaction to the continuing evolution would be to take the hard line and say that the original stuff was punk and everything else is just easy punk pop for the masses. The traditionalists can still play their albums from the ’70s, and if the grooves are worn out, they can download a digitally mastered edition. They can buy Ramones hoodies for their toddlers.

Outsiders don’t get it. Some of them hear the word punk and picture something out of the 1970s—a historic artifact about some kids that were, at the time, really doing something different. The traditionalist punks who still love and play their 1973 Iggy Pop LPs are having their fun, but they bolster the impression that punk is ossified and no longer relevant.

Getting back to the world of C, we have both the traditionalists, waving the banner of ANSI ’89, and those who will rock out to whatever works and may not even realize that the code they are writing would not have compiled or run in the 1990s. Outsiders don’t get the difference. They see still-in-print books from the 1980s and still-online tutorials from the 1990s, they hear from the hardcore traditionalists who insist on still writing like that today, and they don’t even know that the language and the rest of its users continue to evolve. That’s a shame, because they’re missing out on some great stuff.

This is a book about breaking tradition and keeping C punk rock. I don’t care to compare the code in this book to the original C specification in Kernighan & Ritchie’s 1978 book. My telephone has 512 MB of memory, so why are our C textbooks still spending pages upon pages covering techniques to shave kilobytes off of our executables? I am writing this on a bottom-of-the-line red netbook that can accommodate 3,200,000,000 instructions per second; what do I care about whether an operation requires comparing 8 bits or 16? We should be writing code that we can write quickly and that is readable by our fellow humans. We’re still writing in C, so our readable but imperfectly optimized code will still run an order of magnitude faster than if we’d written comparable code in any number of alternative, bloated languages.

Q & A (Or, the Parameters of the Book)

Q: How is this C book different from all others?

A: Some are better written, some are even entertaining, but C textbooks are a somewhat uniform bunch (I’ve read a lot of them, including C for Programmers with an Introduction to C11; Head First C; The C Programming Language, 1st Edition; The C Programming Language 2nd Edition; Programming in C; Practical C Programming; Absolute Beginner’s Guide to C; The Waite Group’s C Primer Plus; and C Programming). Most were written before the C99 standard simplified many aspects of usage, and you can tell that some of those now in their nth edition just pasted in a few notes about updates rather than seriously rethinking how to use the language. They all mention that there might be libraries that you could maybe use in writing your own code, but most predate the installation tools and ecosystem we have now, which make using those libraries reliable and reasonably portable. Those textbooks are still valid and still have value, but modern C code just doesn’t look like much of the code in many of those textbooks.

This book picks up where they left off, reconsidering the language and the ecosystem in which it lives. The storyline here is about using libraries that provide linked lists and XML parsers, not writing new ones from scratch. It is about writing code that is readable and has a friendly user interface.

Q: Who is this book for? Do I need to be a coding guru?

A: You have experience coding in any language, maybe Java or a scripting language such as Perl. I don’t have to sell you on why your code shouldn’t be one long routine with no subfunctions.

The body of the book assumes basic knowledge of C gained from time spent writing C code. If you are rusty on the details or are starting from zero, Appendix A offers a short tutorial on basic C for readers who are coming from a scripting language like Python or Ruby.

I might as well point out to you that I have also written a textbook on statistical and scientific computing, Modeling with Data (Klemens, 2008). Along with lots of details of dealing with numeric data and using statistical models for describing data, it has a longer, more standalone tutorial on C, which I naturally think overcomes many of the failings of older C tutorials.

Q: I’m an application programmer, not a kernel hacker. Why should I use C instead of a quick-to-write scripting language like Python?

A: If you are an application programmer, this book is for you. I read people asserting that C is a systems language, which impresses me as so un-punk—who are they to tell us what we’re allowed to write?

Statements along the lines of “Our language is almost as fast as C, but is easier to write” are so common that they are almost cliché. Well, C is definitely as fast as C, and the purpose of this book is to show you that C is easier to write than the textbooks from decades past imply that it is. You don’t have to call malloc and get elbow-deep in memory management half as often as the systems programmers of the 1990s did, we have facilities for easier string handling, and even the core syntax has evolved to make for more legible code.

I started writing C in earnest because I had to speed up a simulation in a scripting language, R. Like so many other scripting languages, R has a C interface and encourages the user to make use of it any time the host language is too slow. Eventually, I had so many functions jumping out from the host script to C code that I just dropped the host language entirely.

Q: It’s nice that application programmers coming from scripting languages will like this book, but I am a kernel hacker. I taught myself C in fifth grade and sometimes have dreams that correctly compile. What new material can there be for me?

A: C has evolved in the last 20 years. As I’ll discuss later, the set of things we are guaranteed that all C compilers support has changed with time, thanks to two new C standards since the original ANSI standard that defined the language for so long. Maybe have a look at Chapter 10 and see if anything there surprises you. Some sections of this book, like the chapter clarifying common misconceptions about pointers (Chapter 6), cover material that has changed little since the 1980s.

Also, the environment has advanced. Many of the tools I cover, such as make and the debugger, may already be familiar to you, but I have found that others are not as well known. Autotools has entirely changed how distribution of code happens, and Git has changed how collaborative coding happens.

Q: I can’t help but notice that about a third of this book has almost no C code in it.

A: This book intends to cover what the other C textbooks don’t, and at the top of that list are the tools and environment. If you’re not using a debugger (standalone or part of your IDE), you’re making your life much more difficult. Textbooks often relegate the debugger to an afterthought, if they mention it at all. Sharing code with others requires another set of tools, including Autotools and Git. Code doesn’t live in a vacuum, and I felt that I would be doing a disservice writing yet another textbook that pretends that all the reader needs to be productive is the syntax of the language.

Q: Out of the many tools available for C development, how did you pick the ones in this book?

A: More than most, the C community holds itself to a high standard of interoperability. There are a lot of C extensions provided by the GNU environment, IDEs that work only on Windows, and compiler extensions that exist only in LLVM. This is probably why past textbooks shied away from covering tools. But in the present day there are some systems that work on anything we commonly recognize as a computer. Many of them are from the GNU; LLVM and its associated tools are quickly making ground but are still not as prevalent. Whatever you are using, be it a Windows box, a Linux box, or an instance you just pulled up from your cloud computing provider, the tools I cover here should be easy and quick to install. I mention a few platform-specific tools, but will be explicit in those cases.

I do not cover any integrated development environments (IDEs) because few if any reliably work across all platforms (try pulling up an Amazon Elastic Compute Cloud instance and installing Eclipse and its C plug-ins), and the choice of IDE is heavily influenced by personal preference. IDEs typically have a project build system, which is usually incompatible with every other IDE’s project build system. IDE project files are therefore unusable for project distribution outside of situations (classrooms, certain offices, some computing platforms) where everybody is mandated to use the same IDE.

Q: I have the Internet and can look up commands and syntax details in a second or two, so, really, why should I read this book?

A: It’s true: you can get an operator precedence table from a Linux or Mac command prompt with man operator, so why am I going to put one here?

I’ve got the same Internet you’ve got, and I’ve spent a lot of time reading it. So I have a good idea of what isn’t being talked about, and that’s what I stick to here. When introducing a new tool, like gprof or gdb, I give you what you need to know to get your bearings and ask your search engine coherent questions, and what other textbooks missed (which is a lot).

Standards: So Many to Choose From

Unless explicitly stated otherwise, everything in this book conforms to the ISO C99 and C11 standards. To make sense of what that means, and give you some historical background, let us go through the list of major C standards (passing over the minor revisions and corrections).

K & R (circa 1978)

Dennis Ritchie, Ken Thompson, and a handful of other contributors came up with C while putting together the Unix operating system. Brian Kernighan and Dennis Ritchie eventually wrote down a description of the language in the first edition of their book, which set the first de facto standard (Kernighan, 1978).

ANSI C89

Bell Labs handed over the stewardship of the language to the American National Standards Institute (ANSI). In 1989 the institute published its standard, which made a few improvements over K & R. The second edition of K & R’s book included a full specification of the language, which meant that tens of thousands of programmers had a copy of the ANSI standard on their desks (Kernighan, 1988). The ANSI standard was adopted by the International Organization for Standardization (ISO) in 1990 with no serious changes, but ANSI ’89 seems to be the more common term (and would make a great t-shirt slogan).

A decade passed. C went mainstream, in the sense that the base code for more or less every PC and every Internet server was written in C, which is as mainstream as a human endeavor could possibly become.

During this period, C++ split off and hit it big (although not quite as big). C++ was the best thing to ever happen to C. While every other language was bolting on extra syntax to follow the object-oriented trend and whatever other new tricks came to the authors’ minds, C stuck to the standard. The people who wanted stability and portability used C, the people who wanted more and more features so they could wallow in them like moist hundred dollar bills got C++, and everybody was happy.

ISO C99

The C standard underwent a major revision a decade later. Additions were made for numeric and scientific computing, with a standard type for complex numbers and some type-generic functions. A few conveniences from C++ got lifted, including one-line comments (which originally came from one of C’s predecessor languages, BCPL) and being able to declare variables at the head of for loops. Using structures was made easier thanks to a few additions to the rules for how they can be declared and initialized, plus some notational conveniences. Things were modernized to acknowledge that security matters and that not everybody speaks English.

When you think about just how much of an impact C89 had, and how the entire globe was running on C code, it’s hard to imagine the ISO being able to put out anything that wouldn’t be widely criticized—even a refusal to make any changes would be reviled. And indeed, this standard was controversial. There are two common ways to express a complex variable (rectangular and polar coordinates)—so where does the ISO get off picking one? Why do we need a mechanism for variable-length macro inputs when all the good code got written without it? In other words, the purists accused the ISO of selling out to the pressure for more features.

As of this writing, most compilers support C99 plus or minus a few caveats; the long double type seems to cause a lot of trouble, for example. However, there is one notable exception to this broad consensus: Microsoft currently refuses to add C99 support to its Visual Studio C++ compiler. The section “Compiling C with Windows” covers some of the many ways to compile C code for Windows, so not using Visual Studio is at most an inconvenience, and having a major establishment player tell us that we can’t use ANSI- and ISO-standard C only bolsters the punk rock of it all.

C11

Self-conscious about the accusations of selling out, the ISO made few serious changes in the third edition of the standard. We got a means of writing type-generic functions, and things were modernized to further acknowledge that security matters and that not everybody speaks English.

The C11 standard came out in December of 2011, but support for the standard has been implemented by compiler authors at a surprisingly rapid pace, to the point that a number of major compilers already claim near-complete conformance. However, the standard defines behavior for both the compiler and the standard library—and library support, such as for threading and atomics, is complete on some systems but catching up on others.

The POSIX Standard

That’s the state of things as far as C itself goes, but the language coevolved with the Unix operating system, and you will see throughout the book that the interrelationship matters for day-to-day work. If something is easy on the Unix command line, it is probably because it is easy in C; Unix tools are often written to facilitate writing C code.

Unix

C and Unix were designed at Bell Labs in the early 1970s. During most of the 20th century, Bell was being investigated for monopolistic practices, and one of its agreements with the US federal government included promises that Bell would not expand its reach into software. So Unix was given away for free for researchers to dissect and rebuild. The name Unix is a trademark, originally owned by Bell Labs and subsequently traded off like a baseball card among a number of companies.

Variants of Unix blossomed as the code was looked over, reimplemented, and improved in different ways by diverse hackers. It just takes one little incompatibility to make a program or script unportable, so the need for some standardization quickly became apparent.

POSIX (Portable Operating System Interface)

This standard, first established by the Institute of Electrical and Electronics Engineers (IEEE) in 1988, provided a common basis for Unix-like operating systems. It specifies how the shell should work, what to expect from commands like ls and grep, and a number of C libraries that C authors can expect to have available. For example, the pipes that command-line users use to string together commands are specified in detail here, which means C’s popen (pipe open) function is POSIX-standard, not ISO C-standard. The POSIX standard has been revised many times; the version as of this writing is POSIX:2008, and that is what I am referring to when I say that something is POSIX-standard. A POSIX-standard system must have a C compiler available, via the command name c99.

This book will use the POSIX standard, though I’ll tell you when.

With the exception of many members of a family of OSes from Microsoft, just about every current operating system you could name is built on a POSIX-compliant base: Linux, Mac OS X, iOS, webOS, Solaris, BSD—even Windows servers offer a POSIX subsystem. And for the hold-out OSes, “Compiling C with Windows” will show you how to install a POSIX subsystem.

Finally, there are two more implementations of POSIX worth noting because of their prevalence and influence:

BSD

 After Unix was made available from Bell Labs for the public to dissect, the researchers at the University of California, Berkeley, made major improvements, eventually rewriting the entire Unix code base to produce the Berkeley Software Distribution. If you are using a computer from Apple, Inc., you are using BSD with an attractive graphical frontend. BSD goes beyond POSIX in several respects, and we’ll see some functions that are not part of the POSIX standard but are too useful to pass up (most notably the lifesaver that is asprintf).

GNU

It stands for GNU’s Not Unix, and is the other big success story in independently reimplementing and improving on the Unix environment. The great majority of Linux distributions use GNU tools throughout. There are very good odds that you have the GNU Compiler Collection (gcc) on your POSIX box—even BSD uses it. Again, the gcc defines a de facto standard that extends C and POSIX in a few ways, and I will be explicit when making use of those extensions.

Legally, the BSD license is slightly more permissive than the GNU license. Because some parties are deeply concerned with the political and business implications of the licenses, one can typically find both GNU and BSD versions of most tools. For example, both the gcc and the BSD’s clang are top-notch C compilers. The authors from both camps closely watch and learn from each other’s work, so we can expect that the differences that currently exist will tend to even out over time.

Some Logistics

The Second Edition

I used to be a cynic and think that people just wrote second editions to disrupt all the people selling used copies of the first edition. But this second edition actually could not have happened without the first being published, and could not have happened sooner (and most of the book’s readers are reading electronic copies anyway).

The big addition from the first edition is the chapter on concurrent threads, aka parallelization. It focuses on OpenMP and atomic variables and structs. OpenMP is not part of the C standard, but it is a reliable part of the C ecosystem, so it comfortably fits into this book. Atomic variables were added in the December 2011 revision of the C standard, so when this book came out less than a year later there were no compilers that supported them. We are now far enough along that I could write this chapter based both on the theory presented in the standard and the practice of a real-world implementation and tested code. See Chapter 12.

The first edition was blessed with some wonderfully pedantic readers. They caught everything that could be somehow construed as a bug, from the stupid thing I said about dashes on the command line to sentences whose wording could be misconstrued incorrectly in certain cases. Nothing in this world is bug-free, but the book is much more accurate and useful as a result of so much great reader feedback.

Other additions to this edition:

  • Appendix A provides a basic C tutorial for readers coming from another language. I was reluctant to include it in the first edition because there are so many C tutorials out there, but the book is more useful with it than without.

  • By popular demand, I expanded the discussion of how to use a debugger significantly. See “Using a Debugger”.

  • The first edition had a segment on how to write functions that take in a list of arbitrary length, so both sum(1, 2.2) and sum(1, 2.2, 3, 8, 16) would be valid. But what if you want to send multiple lists, like writing a dot-product function that multiplies two arbitrary-length vectors, like dot((2, 4), (-1, 1)) and dot((2, 4, 8, 16), (-1, 1, -1, 1))? “Multiple Lists” covers this.

  • I rewrote Chapter 11, on extending objects with new functions. The primary addition is an implementation of virtual tables.

  • I wrote a little more on the preprocessor, with an intro to the morass of test macros and their use in “Test Macros”, including a passing mention of the _Static_assert keyword.

  • I stuck to a promise I made to myself to not include a tutorial on regular expression parsing in this book (because there are hundreds online and in other books). But I did add a demo in “Parsing Regular Expressions” on the use of the POSIX regular expression parsing functions, which are in a relatively raw form compared to regex parsers in many other languages.

  • The discussion of string handling in the first edition relied heavily on asprintf, a sprintf-like function that autoallocates the required amount of memory before writing a string to the space. There is a version widely distributed by the GNU, but many readers were constrained from using it, so in this edition I added Example 9-3, showing how to write such a function from C-standard parts.

  • One of the big themes in Chapter 7 is that micromanaging numeric types can cause trouble, so the first edition made no mention of the dozens of new numeric types introduced in the C99 standard, like int_least32_t, uint_fast64_t, and so on (C99 §7.18; C11 §7.20). Several readers encouraged me to at least mention some of the more useful types, like intptr_t and intmax_t, which I now do where appropriate.

 

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic

Indicates new terms, filenames and file paths, URLs, and email addresses. Many new terms are defined in a glossary at the end of this book.

Constant width

Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width italic

Shows text that should be replaced with user-supplied values or by values determined by context.

Note

This icon signifies a tip, suggestion, or general note.

Caution

This icon indicates a warning or caution.

Using Code Examples

This book is here to help you get your job done. In general, you may use the code in this book in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

The code examples for this title can be found here: https://github.com/b-k/21st-Century-Examples.

We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “21st Century C, 2nd Edition by Ben Klemens (O’Reilly). Copyright 2014 Ben Klemens, 978-1-491-90389-6.”

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at .

Safari® Books Online

Safari Books Online (www.safaribooksonline.com) is an on-demand digital library that delivers expert content in both book and video form from the world’s leading authors in technology and business.

Technology professionals, software developers, web designers, and business and creative professionals use Safari Books Online as their primary resource for research, problem solving, learning, and certification training.

Safari Books Online offers a range of product mixes and pricing programs for organizations, government agencies, and individuals. Subscribers have access to thousands of books, training videos, and prepublication manuscripts in one fully searchable database from publishers like O’Reilly Media, Prentice Hall Professional, Addison-Wesley Professional, Microsoft Press, Sams, Que, Peachpit Press, Focal Press, Cisco Press, John Wiley & Sons, Syngress, Morgan Kaufmann, IBM Redbooks, Packt, Adobe Press, FT Press, Apress, Manning, New Riders, McGraw-Hill, Jones & Bartlett, Course Technology, and dozens more. For more information about Safari Books Online, please visit us online.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

  • O’Reilly Media, Inc.
  • 1005 Gravenstein Highway North
  • Sebastopol, CA 95472
  • 800-998-9938 (in the United States or Canada)
  • 707-829-0515 (international or local)
  • 707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at http://bit.ly/21st_century_c_2e.

To comment or ask technical questions about this book, send email to .

For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com.

Find us on Facebook: http://facebook.com/oreilly

Follow us on Twitter: http://twitter.com/oreillymedia

Watch us on YouTube: http://www.youtube.com/oreillymedia

Acknowledgments

  • Nora Albert: general support, guinea pig.
  • Jerome Benoit: Autoconf tips.
  • Bruce Fields, Dave Kitabjian, Sarah Weissman: extensive and thorough review.
  • Patrick Hall: Unicode erudition.
  • Nathan Jepson, Allyson MacDonald, Rachel Roumeliotis, and Shawn Wallace: editorial.
  • Andreas Klein: pointing out the value of intptr_t.
  • Rolando Rodríguez: testing, inquisitive use, and exploration.
  • Rachel Steely, Nicole Shelby, and Becca Freed: production.
  • Ulrik Sverdrup: pointing out that we can use repeated designated initializers to set default values.

1 This preface owes an obvious and huge debt to Punk Rock Languages: A Polemic by Chris Adamson.

2 With lyrics like “Can’t get to heaven with a three-chord song,” maybe Sleater-Kinney was post-punk? Unfortunately, there is no ISO punk standard we can look to for precise in-or-out definitions.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.145.96.86