In this chapter
12.1 Assertion Statements: assert() page 428
12.2 Low-Level Memory: The memXXX() Functions page 432
12.3 Temporary Files page 436
12.4 Committing Suicide: abort() page 445
12.5 Nonlocal Gotos page 446
12.6 Pseudorandom Numbers page 454
12.7 Metacharacter Expansions page 461
12.8 Regular Expressions page 471
12.9 Suggested Reading page 480
12.10 Summary page 481
Exercises page 482
Chapter 6, “General Library Interfaces — Part 1,” page 165, presented the first set of general-purpose library APIs. In a sense, those APIs support working with the fundamental objects that Linux and Unix systems manage: the time of day, users and groups for files, and sorting and searching.
This chapter is more eclectic; the APIs covered here are not particularly related to each other. However, all are useful for day-to-day Linux/Unix programming. Our presentation moves from simpler, more general APIs to more complicated and more specialized ones.
An assertion is a statement you make about the state of your program at certain points during its execution. The use of assertions for programming was originally developed by C.A.R. Hoare.[1] The general idea is part of “program verification”: as you design and develop a program, you can show that it’s correct by making carefully reasoned statements about the effects of your program’s code. Often, such statements are made about invariants, that is, facts about the program’s state that are supposed to remain true throughout the execution of a chunk of code.
Assertions are particularly useful for describing two kinds of invariants, preconditions and postconditions: conditions that must hold true before and after, respectively, the execution of a code segment. A simple example of preconditions and postconditions is linear search:
/* lsearch --- return index in array of value, or -1 if not found */

int lsearch(int *array, size_t size, int value)
{
    size_t i;

    /* precondition: array != NULL */
    /* precondition: size > 0 */
    for (i = 0; i < size; i++)
        if (array[i] == value)
            return i;

    /* postcondition: i == size */

    return -1;
}
This example states the conditions using comments. But wouldn’t it be better to be able to test the conditions by using code? This is the job of the assert() macro:
#include <assert.h>                                        ISO C

void assert(scalar expression);
When the scalar expression is false, the assert() macro prints a diagnostic message and exits the program (with the abort() function; see Section 12.4, “Committing Suicide: abort(),” page 445). ch12-assert.c provides the lsearch() function again, this time with assertions and a main() function:
 1  /* ch12-assert.c --- demonstrate assertions */
 2
 3  #include <stdio.h>
 4  #include <assert.h>
 5
 6  /* lsearch --- return index in array of value, or -1 if not found */
 7
 8  int lsearch(int *array, size_t size, int value)
 9  {
10      size_t i;
11
12      assert(array != NULL);
13      assert(size > 0);
14      for (i = 0; i < size; i++)
15          if (array[i] == value)
16              return i;
17
18      assert(i == size);
19
20      return -1;
21  }
22
23  /* main --- test out assertions */
24
25  int main(void)
26  {
27  #define NELEMS 4
28      static int array[NELEMS] = { 1, 17, 42, 91 };
29      int index;
30
31      index = lsearch(array, NELEMS, 21);
32      assert(index == -1);
33
34      index = lsearch(array, NELEMS, 17);
35      assert(index == 1);
36
37      index = lsearch(NULL, NELEMS, 10);  /* won't return */
38
39      printf("index = %d\n", index);
40
41      return 0;
42  }
When the program is compiled and run, the assertion on line 12 “fires”:
$ ch12-assert                                       Run the program
ch12-assert: ch12-assert.c:12: lsearch: Assertion `array != ((void *)0)' failed.
Aborted (core dumped)
The message from assert() varies from system to system. For GLIBC on GNU/Linux, the message includes the program name, the source code filename and line number, the function name, and then the text of the failed assertion. (In this case, the symbolic constant NULL shows up as its macro expansion, ’((void *)0)’.)
The ’Aborted (core dumped)’ message means that ch12-assert created a core file; that is, a snapshot of the process’s address space right before it died.[2] This file can be used later, with a debugger; see Section 15.3, “GDB Basics,” page 570. Core file creation is a purposeful side effect of assert(); the assumption is that something went drastically wrong, and you’ll want to examine the process with a debugger to determine what.
You can disable assertions by compiling your program with the command-line option ’-DNDEBUG’. When this macro is defined before <assert.h> is included, the assert() macro expands into code that does nothing. For example:
$ gcc -DNDEBUG=1 ch12-assert.c -o ch12-assert       Compile with -DNDEBUG
$ ch12-assert                                       Run it
Segmentation fault (core dumped)                    What happened?
Here, we got a real core dump! We know that assertions were disabled; there’s no “failed assertion” message. So what happened? Consider line 15 of lsearch(), when called from line 37 of main(). In this case, the array variable is NULL. Accessing memory through a NULL pointer is an error. (Technically, the various standards leave as “undefined” what happens when you dereference a NULL pointer. Most modern systems do what GNU/Linux does: they kill the process by sending it a SIGSEGV signal, which in turn produces a core dump. This process is described in Chapter 10, “Signals,” page 347.)
This case raises an important point about assertions. Frequently, programmers mistakenly use assertions instead of runtime error checking. In our case, the test for ’array != NULL’ should be a runtime check:
if (array == NULL)
    return -1;
The test for ’size > 0’ (line 13) is less problematic; if size is 0, the loop never executes and lsearch() (correctly) returns -1. (In truth, this assertion isn’t needed: the code correctly handles a size of 0, and because size_t is an unsigned type, size can never be less than 0.)
The logic behind turning off assertions is that the extra checking can slow program performance and that therefore they should be disabled for the production version of a program. C.A.R. Hoare[3] made this observation, however:
Finally, it is absurd to make elaborate security checks on debugging runs, when no trust is put in the results, and then remove them in production runs, when an erroneous result could be expensive or disastrous. What would we think of a sailing enthusiast who wears his lifejacket when training on dry land, but takes it off as soon as he goes to sea?
Given these sentiments, our recommendation is to use assertions thoughtfully: First, for any given assertion, consider whether it should instead be a runtime check. Second, place your assertions carefully so that you won’t mind leaving assertion checking enabled, even in the production version of your program.
Finally, we’ll note the following, from the “BUGS” section of the GNU/Linux assert(3) manpage:
assert() is implemented as a macro; if the expression tested has side effects, program behavior will be different depending on whether NDEBUG is defined. This may create Heisenbugs which go away when debugging is turned on.
Heisenberg’s famous Uncertainty Principle from physics indicates that the more precisely you can determine a particle’s velocity, the less precisely you can determine its position, and vice versa. In layman’s terms, it is often summarized as saying that the mere act of observing the particle affects it.
A similar phenomenon occurs in programming, not related to particle physics: The act of compiling a program for debugging, or running it with debugging enabled, can change the program’s behavior. In particular, the original bug can disappear. Such a bug is known colloquially as a heisenbug.
The manpage is warning us against putting expressions with side effects into assert() calls:
assert(*p++ == ' ');
The side effect here is that the p pointer is incremented as part of the test. When NDEBUG is defined, the expression argument disappears from the source code; it’s never executed. This can lead to an unexpected failure. However, as soon as assertions are reenabled in preparation for debugging, things start working again! Such problems are painful to track down.
Several functions provide low-level services for working with arbitrary blocks of memory. Their names all start with the prefix ’mem’:
#include <string.h> ISO C
void *memset(void *buf, int val, size_t count);
void *memcpy(void *dest, const void *src, size_t count);
void *memmove(void *dest, const void *src, size_t count);
void *memccpy(void *dest, const void *src, int val, size_t count);
int memcmp(const void *buf1, const void *buf2, size_t count);
void *memchr(const void *buf, int val, size_t count);
The memset() function copies the value val (treated as an unsigned char) into the first count bytes of buf. It is particularly useful for zeroing out blocks of dynamic memory:
void *p = malloc(count);
if (p != NULL)
    memset(p, 0, count);
However, memset() can be used on any kind of memory, not just dynamic memory. The return value is the first argument: buf.
Three functions copy one block of memory to another. The first two differ in their handling of overlapping memory areas; the third copies memory but stops upon seeing a particular value.
void *memcpy(void *dest, const void *src, size_t count)
This is the simplest function. It copies count bytes from src to dest. It does not handle overlapping memory areas. It returns dest.
void *memmove(void *dest, const void *src, size_t count)
Similar to memcpy(), it also copies count bytes from src to dest. However, it does handle overlapping memory areas. It returns dest.
void *memccpy(void *dest, const void *src, int val, size_t count)
This copies bytes from src to dest, stopping either after copying val into dest or after copying count bytes. If it finds val, it returns a pointer to the position in dest just beyond where val was placed. Otherwise, it returns NULL.
Now, what’s the issue with overlapping memory? Consider Figure 12.1.
The goal is to copy the four instances of struct xyz in data[0] through data[3] into data[3] through data[6]. The problem here is data[3]; a byte-by-byte copy moving forward in memory from data[0] will clobber data[3] before it can be safely copied into data[6]! (It’s also possible to come up with a scenario where a backwards copy through memory destroys overlapping data.)
The memcpy() function was the original System V API for copying blocks of memory; its behavior for overlapping blocks of memory wasn’t particularly defined one way or the other. For the 1989 C standard, the committee felt that this lack of defined behavior was a problem. For historical compatibility, memcpy() was left alone, with the behavior for overlapping memory specifically stated as undefined, and memmove() was invented to provide a routine that would correctly deal with problem cases.
Which one should you use in your own code? For a library function that has no knowledge of the memory areas being passed into it, you should use memmove(). That way, you’re guaranteed that there won’t be any problems with overlapping areas. For application-level code that “knows” that two areas don’t overlap, it’s safe to use memcpy().
For both memcpy() and memmove() (as for strcpy()), the destination buffer is the first argument and the source is the second one. To remember this, note that the order is the same as for an assignment statement:
dest = src;
(Many systems have manpages that don’t help, providing the prototype as ’void *memcpy(void *buf1, void *buf2, size_t n)’ and relying on the prose to explain which is which. Fortunately, the GNU/Linux manpage uses better names.)
The memcmp() function compares count bytes from two arbitrary buffers of data. Its return value is like strcmp(): negative, zero, or positive if the first buffer is less than, equal to, or greater than the second one.
You may be wondering “Why not use strcmp() for such comparisons?” The difference between the two functions is that memcmp() doesn’t care about zero bytes (the ’