The last chapter covered some topics that traditional C textbooks stressed but which may not be relevant in a current computing environment. This chapter covers some points that I have found many textbooks do not cover or only mention in passing. Like the last chapter, this chapter covers a lot of little topics, but it breaks down into three main segments:
The preprocessor often gets short mention, I think because many people think of it as auxiliary or not real C. But it’s there for a reason: there are things that macros can do that the rest of the C language can’t. Not all standards-compliant compilers offer the same facilities, and the preprocessor is also how we determine and respond to the characteristics of the environment.
In my survey of C textbooks, I found a book or two that do not even mention the static and extern keywords. So this chapter takes some time to discuss linkage, and break down the confusing uses of the static keyword.
The const keyword fits this chapter because it is too useful to not use, but it has oddities in its specification in the standard and in its implementation in common compilers.
Some situations have common trap doors that users must know to avoid, but if you can provide a macro that always dodges the trap, you have a safer user interface. Chapter 10 will present several options for making the user interface to your library friendlier and less error-inviting, and will rely heavily on macros to do it.
I read a lot of people who say that macros are themselves invitations for errors and should be avoided, but those people don’t advise that you shouldn’t use NULL, isalpha, isfinite, assert, type-generic math like log, sin, cos, or pow, or any of the dozens of other facilities defined by the GNU-standard library via macros. Those are well-written, robust macros that do what they should every time.
Macros perform text substitutions (referred to as expansions under the presumption that the substituted text will be longer), and text substitutions require a different mind-set from the usual functions, because the input text can interact with the text in the macro and other text in the source code. Macros are best used in cases where we want those interactions, and when we don’t we need to take care to prevent them.
Before getting to the rules for making macros robust, of which there are three, let me distinguish between two types of macro. One type expands to an expression, meaning that it makes sense to evaluate these macros, print their values, or in the case of numeric results, use them in the middle of an equation. The other type is a block of instructions that might appear after an if statement or in a while loop. That said, here are some rules:
Parens! It’s easy for expectations to be broken when a macro pastes text into place. Here’s an easy example:
#define double(x) 2*x    //Needs more parens.
Now, the user tries double(1+1)*8, and the macro expands it to 2*1+1*8, which equals 10, not 32. Parens make it work:
#define double(x) (2*(x))
Now (2*(1+1))*8 is what it should be. The general rule is to put all inputs in parens unless you have a specific reason not to. If you have an expression-type macro, put the macro expansion itself in parens.
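To see the rule at work, here is a minimal sketch that puts the two versions side by side. The macro names Twice_bad and Twice and the two wrapper functions are made up for this illustration (renamed from double so that the macro does not shadow the type keyword).

```c
// Hypothetical names, for illustration only.
#define Twice_bad(x) 2*x        // no parens: the input can interact with its surroundings
#define Twice(x)     (2*(x))    // the input and the whole expansion are parenthesized

// Expands to 2*1+1*8, which evaluates to 10.
int twice_bad_demo(void){ return Twice_bad(1+1)*8; }

// Expands to (2*(1+1))*8, which evaluates to 32.
int twice_good_demo(void){ return Twice(1+1)*8; }
```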
Avoid double usage. This textbook example is a little risky:
#define max(a, b) ((a) > (b) ? (a) : (b))
If the user tries int x=1, y=2; int m=max(x, y++), the expectation is that m will be 2 (the preincrement value of y), and then y will bump up to 3. But the macro expands to:
m = ((x) > (y++) ? (x) : (y++))
which will evaluate y++ twice, causing a double increment where the user expected only a single, and m=3 where the user expected m=2.
If you have a block-type macro, then you can declare a variable to take on the value of the input at the head of the block, and then use your copy of the input for the rest of the macro.
This rule is not adhered to as religiously as the parens rule (the max macro often appears in the wild), so bear in mind as a macro user that side effects inside calls to unknown macros should be kept to a minimum.
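Here is a rough sketch of two ways to dodge the double evaluation, next to the textbook macro for comparison. The names max_int and Set_to_max are hypothetical, and both assume int inputs to keep the example short.

```c
// The textbook macro: one of the arguments is evaluated twice.
#define max(a, b) ((a) > (b) ? (a) : (b))

// A function alternative (hypothetical name): each argument is evaluated
// exactly once, at the call site, before the function body runs.
static int max_int(int a, int b){ return a > b ? a : b; }

// A block-type macro (hypothetical name) that copies its inputs at the
// head of the block and works only with the copies from then on.
#define Set_to_max(out, a, b) {                                     \
    int max_a_copy = (a);                                           \
    int max_b_copy = (b);                                           \
    (out) = max_a_copy > max_b_copy ? max_a_copy : max_b_copy;      \
}
```

Starting from int x=1, y=2, the call m = max(x, y++) leaves m at 3 and y at 4, while m = max_int(x, y++) and Set_to_max(m, x, y++) each leave m at 2 and y at 3.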
Curly braces for blocks. Here’s a simple block macro:
//Needs curly braces.
#define doubleincrement(a, b)   \
    (a)++;                      \
    (b)++;
We can make it do the wrong thing by putting it after an if statement:
int x=1, y=0;
if (x>y)
    doubleincrement(x, y);
Adding some indentation to make the error obvious, this expands to:
int x=1, y=0;
if (x>y)
    (x)++;
(y)++;
Another potential pitfall: what if your macro declares a variable total, but the user defined a total already? Variables declared in the block can conflict with variables declared outside the block. Example 8-1 has the simple solution to both problems: put curly braces around your macro.
Putting the whole macro in curly braces allows us to have an intermediate variable named total that lives only inside the scope of the curly braces around the macro, and it therefore in no way interferes with the total declared in main.
#include <stdio.h>

#define sum(max, out) {             \
    int total=0;                    \
    for (int i=0; i<= max; i++)     \
        total += i;                 \
    out = total;                    \
}

int main(){
    int out;
    int total = 5;
    sum(5, out);
    printf("out= %i original total=%i\n", out, total);
}
But there is one small glitch remaining. Getting back to the simple doubleincrement macro, this code:
#define doubleincrement(a, b) { \
    (a)++;                      \
    (b)++;                      \
}

if (a>b) doubleincrement(a, b);
else return 0;
expands to this:
if (a>b) {
    (a)++;
    (b)++;
};
else return 0;
The extra semicolon just before the else confuses the compiler. Users will get a compiler error, which means that they cannot ship erroneous code, but the solution of removing the semicolon or wrapping the statement in a seemingly extraneous set of curly braces will not be apparent and makes for a not-transparent UI. To tell you the truth, there’s not much you can do about this. The common solution to this is to wrap the macro still further in a run-once do-while loop:
#define doubleincrement(a, b) do {  \
    (a)++;                          \
    (b)++;                          \
} while(0)

if (a>b) doubleincrement(a, b);
else return 0;
In this case, the problem is solved, and we have a macro that users won’t know is a macro. But what if we have a macro which has a break either built in or somehow provided by the user? Here is another assertion macro, and a usage which won’t work:
#define AnAssert(expression, action) do {   \
    if (!(expression)) action;              \
} while(0)

double an_array[100];
double total=0;
…
for (int i=0; i< 100; i++){
    AnAssert(!(isnan(an_array[i])), break);
    total += an_array[i];
}
The user is unaware that the break statement provided is embedded in an internal-to-macro do-while loop, and thus may compile and run incorrect code. In cases where a do-while wrapper would break the expected behavior of break, it is probably easier to leave off the do-while wrapper and warn users about the quirk regarding semicolons before an else.12
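For reference, here is the do-while form in a complete, compilable sketch. The macro is the one from the text; the bump_if_greater function is a hypothetical caller added so the if-else usage has somewhere to live.

```c
// The macro from the text, wrapped in a run-once loop so that the
// user's trailing semicolon completes the do-while statement.
#define doubleincrement(a, b) do {  \
    (a)++;                          \
    (b)++;                          \
} while(0)

// Hypothetical caller: returns a+b after the increments, or 0 if a <= b.
int bump_if_greater(int a, int b){
    if (a > b)
        doubleincrement(a, b);
    else
        return 0;
    return a + b;
}
```

With this wrapper the if-else compiles as written, and bump_if_greater(3, 1) returns 6.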
Using gcc -E curly.c, we see that the preprocessor expands the sum macro as shown next, and following the curly braces shows us that there’s no chance that the total in the macro’s scope will interfere with the total in the main scope. So the code would print total as 5:
int main(){
    int out;
    int total = 5;
    {
        int total=0;
        for (int i=0; i<= 5; i++)
            total += i;
        out = total;
    };
    printf("out= %i original total=%i\n", out, total);
}
Limiting a macro’s scope with curly braces doesn’t protect us from all name clashes. In the previous example, what would happen if we were to write int out, i=5; sum(i, out);?
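To make the answer concrete: with that call, the loop header expands to for (int i=0; i<= i; i++), so the macro's own i hides the caller's and the comparison never looks at the caller's value. One workaround, sketched here with made-up names, is to copy the input before the loop and give the macro's internal variables names that callers are unlikely to pick. This only lowers the odds of a clash; it does not rule one out.

```c
// Hypothetical variant of the sum macro from Example 8-1.
#define Sum_to(max, out) {                                          \
    int sum_to_limit = (max);   /* copy the input before the loop */\
    int sum_to_total = 0;                                           \
    for (int sum_to_i = 0; sum_to_i <= sum_to_limit; sum_to_i++)    \
        sum_to_total += sum_to_i;                                   \
    (out) = sum_to_total;                                           \
}

// Calls the macro with a caller variable named i.
int sum_with_i(void){
    int out, i = 5;
    Sum_to(i, out);
    return out;     // 0+1+2+3+4+5
}
```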
If you have a macro that is behaving badly, use the -E flag for gcc, Clang, or icc to only run the preprocessor, printing the expanded version of everything to stdout. Because that includes the expansion of #include <stdio.h> and other voluminous boilerplate, I usually redirect the results to a file or to a pager, with a form like gcc -E mycode.c | less, and then search the results for the macro expansion I’m trying to debug.
That’s about it for macro caveats. The basic principle of keeping macros simple still makes sense, and you’ll find that macros in production code tend to be one-liners that prep the inputs in some way and then call a standard function to do the real work. The debugger and non-C systems that can’t parse macro definitions themselves don’t have access to your macro, so whatever you write should still have a way of being usable without the macros. “Linkage with static and extern” will have one suggestion for reducing the hassle when writing down simple functions.
The token reserved for the preprocessor is the octothorp, #, and the preprocessor makes three entirely different uses of it: to mark directives, to stringize an input, and to concatenate tokens.
You know that a preprocessor directive like #define begins with a # at the head of the line.
As an aside, whitespace before the # is ignored [K&R 2nd ed. §A12, p. 228], which has some typographical utility. For example, you can put throwaway macros in the middle of a function, just before they get used, and indent them to flow with the function. According to the old school, putting the macro right where it gets used is against the “correct” organization of a program (which puts all macros at the head of the file), but having it right there makes it easy to refer to and makes the throwaway nature of the macro evident. In “OpenMP”, we’ll annotate for loops with #pragmas, and putting the # flush with the left margin would produce an unreadable mess.
The next use of the # is in a macro: it turns a macro argument into a string. Example 8-2 shows a program demonstrating a point about the use of sizeof (see the sidebar), though the main focus is on the use of the preprocessor macro.
#include <stdio.h>

#define Peval(cmd) printf(#cmd ": %g\n", cmd);

int main(){
    double *plist = (double[]){1, 2, 3};
    double list[] = {1, 2, 3};
    Peval(sizeof(plist)/(sizeof(double)+0.0));
    Peval(sizeof(list)/(sizeof(double)+0.0));
}
When you try it, you’ll see that the input to the macro is printed as plain text, and then its value is printed, because #cmd is equivalent to "cmd" as a string. So Peval(list[0]) would expand to:
printf("list[0]" ": %g\n", list[0]);
Does that look malformed to you, with the two strings "list[0]" ": %g\n" next to each other? The next preprocessor feature is that if two literal strings are adjacent, the preprocessor merges them into one: "list[0]: %g\n". And this isn’t just in macros:
printf("You can use the preprocessor's string "
       "concatenation to break long strings of text "
       "in your program. I think this is easier than "
       "using backslashes, but be careful with spacing.");
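Both features can be checked without printing anything. In this small sketch, the Label macro and the label_matches function are hypothetical names; the comparison confirms that the stringized argument and its neighboring literal end up as one string.

```c
#include <string.h> //strcmp

// Hypothetical helper: #x turns the argument's text into a string literal,
// which then sits next to a second literal.
#define Label(x) #x ": %g\n"

// Returns 1 if the merged result matches the single string written by hand.
int label_matches(void){
    return strcmp(Label(list[0]), "list[0]: %g\n") == 0;
}
```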
Conversely, you might want to join together two things that are not strings. Here, use two octothorps, which I herein dub the hexadecathorp: ##. If the value of name is LL, then when you see name ## _list, read it as LL_list, which is a valid and usable variable name.
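As a minimal sketch of the pasting step on its own, the hypothetical Declare_count macro below glues a prefix onto _count, and the resulting name is then used like any other variable.

```c
// Hypothetical macro: declares an int whose name is the given prefix plus _count.
#define Declare_count(name) int name ## _count = 0

int count_demo(void){
    Declare_count(LL);      // expands to: int LL_count = 0;
    LL_count += 3;          // the pasted name is an ordinary variable
    return LL_count;
}
```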
Gee, you comment, I sure wish every array had an auxiliary variable that gave its length. OK, Example 8-3 writes a macro that declares a local variable ending in _len for each list you tell it to care about. It’ll even make sure every list has a terminating marker, so you don’t even need the length.
That is, this macro is total overkill, and I don’t recommend it for immediate use, but it does demonstrate how you can generate lots of little temp variables that follow a naming pattern that you choose.
#include <stdio.h>
#include <math.h> //NAN

#define Setup_list(name, ...)                                   \
    double *name ## _list = (double []){__VA_ARGS__, NAN};      \
    int name ## _len = 0;                                       \
    for (name ## _len =0;                                       \
            !isnan(name ## _list[name ## _len]);                \
            ) name ## _len ++;

int main(){
    Setup_list(items, 1, 2, 4, 8);
    double sum=0;
    for (double *ptr= items_list; !isnan(*ptr); ptr++)
        sum += *ptr;
    printf("total for items list: %g\n", sum);

    #define Length(in) in ## _len

    sum=0;
    Setup_list(next_set, -1, 2.2, 4.8, 0.1);
    for (int i=0; i < Length(next_set); i++)
        sum += next_set_list[i];
    printf("total for next set list: %g\n", sum);
}
The lefthand side demonstrates the use of ## to produce a variable name following the given template. The right-hand side foreshadows Chapter 10, which demonstrates uses of variadic macros.
Generates items_len and items_list.
Here is a loop using the NaN marker.
Some systems let you query an array for its own length using a form like this.
Here is a loop using the next_set_len length variable.
As a stylistic aside, there has historically been a custom to indicate that a function is actually a macro by putting it in all caps, as a warning to be careful to watch for the surprises associated with text substitution. I think this looks like yelling, and prefer to mark macros by capitalizing the first letter. Others don’t bother with the capitalization thing at all.
The set of things that can run a C program is very diverse, from Linux PCs to Arduino microcontrollers to GE refrigerators. Your C code finds out the capabilities of the compiler and target platform via test macros, which may be defined by the compiler, -D… flags in the compilation command, or #included files listing local capabilities, like unistd.h on POSIX systems or windows.h (and the headers it calls in) on Windows.
Once you have a handle on what macros can be tested for, you can use the preprocessor to handle diverse environments.
gcc and clang will give you a list of defined macros via the -E -dM flags (-E: run only the preprocessor; -dM: dump macro values). On the box I’m writing on,
echo "" | clang -dM -E -xc -
produces 157 macros.
It would be impossible to write down a complete list of feature macros, including those defined for the hardware, the brand of standard C library, and the compiler, but Table 8-1 lists some of the more common and stable macros and their meaning. I chose macros that are relevant to this book or are broad checks for system type. The ones that begin with __STDC_… are defined by the C standard.
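As one small example of testing such a macro, this sketch reads __STDC_VERSION__, which a compiler defines in C95 mode and later. The function names stdc_version and has_c11 are invented for the illustration.

```c
// Returns the value of __STDC_VERSION__, or 0 where the compiler
// does not define it (as in strict C89 mode).
long stdc_version(void){
#ifdef __STDC_VERSION__
    return __STDC_VERSION__;
#else
    return 0L;
#endif
}

// Returns 1 if the compiler claims C11 or later; 201112L is the value
// that the C11 standard assigns to __STDC_VERSION__.
int has_c11(void){
    return stdc_version() >= 201112L;
}
```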
One of Autoconf’s key strengths is generating macros to describe capabilities. Let us say that you are using Autoconf, that your configure.ac file includes a line with this macro:
AC_CHECK_FUNCS([strcasecmp asprintf])
and that the system where ./configure was run has (POSIX-standard) strcasecmp but is missing (GNU/BSD-standard) asprintf. Then Autoconf will produce a header named config.h including these two lines:
#define HAVE_STRCASECMP 1
/* #undef HAVE_ASPRINTF */
You can then accommodate all options using the #ifdef (if defined) or #ifndef (if not defined) preprocessor directives, like:
#include "config.h"
#ifndef HAVE_ASPRINTF
[paste the source code for asprintf (Example 9-3) here.]
#endif
There are times when there is nothing to be done about a missing feature but to stop, in which case you can use the #error preprocessor directive:
#ifndef HAVE_ASPRINTF
#error "HAVE_ASPRINTF undefined. I simply refuse to " \
       "compile on a system without asprintf."
#endif
Since C11, there is also the _Static_assert keyword. A static assertion takes two arguments: the static expression to be tested, and a message to be sent to the person compiling the program. A C11-compliant assert.h header defines the less typographically awkward static_assert to expand to the _Static_assert keyword [C11 §7.2(3)]. Sample usage:
#include <limits.h> //INT_MAX
#include <assert.h>

//The message is shown when the tested expression is false.
_Static_assert(INT_MAX > 33000L, "Your compiler uses very short integers.");

#ifndef HAVE_ASPRINTF
static_assert(0, "HAVE_ASPRINTF undefined. I still refuse to "
                 "compile on a system without asprintf.");
#endif
The Ls at the end of 33000L and some of the year-month values above indicate that the given numbers should be read as a long int, in case you are on a compiler where integers this large overflow on a regular int.
This may be a more convenient form than the #if/#error/#endif form, but because it was introduced in a standard published in December 2011, it is itself a portability issue. For example, the designers of Visual Studio implement a _STATIC_ASSERT macro which only takes one argument (the assertion), and do not recognize the standard _Static_assert.14
Also, the #ifdef/#error/#endif setup and _Static_assert are largely equivalent: The C standard indicates that both check constant-expressions and print a string-literal, though one should do so in the preprocessing phase and one during compilation. [C99 §6.10.1(2) and C11 §6.10.1(3); C11 §6.7.10] So as of this writing, it is probably safest to stick to using the preprocessor to stop on missing capabilities.
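One way to hedge between the two forms, sketched here under the assumption that the compiler reports its standard via __STDC_VERSION__, is to keep the essential check in the preprocessor and use the static assertion only where C11 is claimed. The specific checks and the environment_ok function are made up for the example. Note that the static assertion can use sizeof, which an #if expression cannot.

```c
#include <limits.h> //INT_MAX, CHAR_BIT

// Preprocessor check: available in every standard-compliant compiler.
#if INT_MAX < 32767
#error "This code assumes that int holds at least 16 bits."
#endif

// Compile-time check, used only where the compiler claims C11 support.
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
_Static_assert(sizeof(int) * CHAR_BIT >= 16,
               "This code assumes that int occupies at least 16 bits.");
#endif

// Hypothetical function, so the file has something to call.
int environment_ok(void){ return 1; }
```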
What if you were to paste the same typedef for the same struct into a file? For instance, you could put
typedef struct {
    int a;
    double b;
} ab_s;

typedef struct {
    int a;
    double b;
} ab_s;
into a file named header.h.
A human can easily verify that these structs are the same, but the compiler is required to read any new struct declaration in a file as a new type [C99 §6.7.2.1(7) and C11 §6.7.2.1(8)]. So the above code won’t compile, as ab_s is redeclared to be two separate (albeit equal) types.15
We can produce the same double-declaration error by listing the typedef only once, but then including the header twice, like
#include "header.h"
#include "header.h"
Because include files frequently include other include files, this error can crop up in subtle ways involving longer chains of headers within headers. The C-standard solution to ensure that this cannot happen is generally referred to as an include guard, in which we define a variable specific to the file, and then wrap the rest of the file in an #ifndef:
#ifndef Already_included_head_h
#define Already_included_head_h 1
[paste all of header.h here]
#endif
The first time through, the variable is not defined and the file is parsed; the second time through the variable is defined and so the rest of the file is skipped.
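The skip can be seen in a single file by writing the guarded contents out twice, which is what the preprocessor sees after a double #include. This is only a sketch: the guard name Already_included_ab_h and the ab_total function are invented for it.

```c
// First "inclusion" of the hypothetical header's contents.
#ifndef Already_included_ab_h
#define Already_included_ab_h 1
typedef struct {
    int a;
    double b;
} ab_s;
#endif

// Second "inclusion": the guard macro is already defined, so the
// preprocessor skips everything up to the #endif.
#ifndef Already_included_ab_h
#define Already_included_ab_h 1
typedef struct {
    int a;
    double b;
} ab_s;
#endif

// Uses the type, which the compiler has seen exactly once.
int ab_total(void){
    ab_s x = {.a = 1, .b = 2.0};
    return x.a + (int)x.b;
}
```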
This form has been in use since forever (see K & R 2nd ed., §4.11.3), but it is slightly easier to use the once pragma. At the head of the file to be included only once, add
#pragma once
and the compiler will understand that the file is not to be double-included. Pragmas are compiler-specific, with only a few defined in the C standard. However, every major compiler, including gcc, clang, Intel, C89-mode Visual Studio, and several others, all understand #pragma once.
In this section, we write code that will tell the compiler what kind of advice it should give to the linker. The compiler works one .c file at a time, (typically) producing one .o file at a time, then the linker joins those .o files together to produce one library or executable.
What happens if there are two declarations in two separate files for the variable x? It could be that the author of one file just didn’t know that the author of the other file had chosen x, so the two xes should be stored in two separate spaces. Or perhaps the authors were well aware that they are referring to the same variable, and the linker should take all references of x to be pointing to the same spot in memory.
External linkage means that symbols that match across files should be treated as the same thing by the linker. The extern keyword will be useful to indicate external linkage (see later).16
Internal linkage indicates that a file’s instance of a variable x or a function f() is its own and matches only other instances of x or f() in the same scope (which for things declared outside of any functions would be file scope). Use the static keyword to indicate internal linkage.
It’s funny that external linkage has the extern keyword, but instead of something sensible like intern for internal linkage, there’s static. In “Automatic, Static, and Manual Memory”, I discussed the three types of memory model: static, automatic, and manual. Using the word static for both linkage and memory model is joining together two concepts that may at one time have overlapped for technical reasons, but are now distinct.
For file scope variables, static affects only the linkage:
The default linkage is external, so use the static keyword to change this to internal linkage.
Any variable in file scope will be allocated using the static memory model, regardless of whether you used static int x, extern int x, or just plain int x.
For block scope variables, static affects only the memory model:
The default linkage is internal, so the static keyword doesn’t affect linkage. You could change the linkage by declaring the variable to be extern, but this is rarely done.
The default memory model is automatic, so the static keyword changes the memory model to static.
For functions, static affects only the linkage:
Functions are only defined in file scope (though gcc offers nested functions as an extension). As with file-scope variables, the default linkage is external, but use the static keyword for internal linkage.
There’s no confusion with memory models, because functions are always static, like file-scope variables.
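The three cases can be put side by side in one short file. This is only an illustrative sketch, and every name in it (call_count, record_call, next_id, calls_so_far) is made up.

```c
// File scope: static gives internal linkage; the variable lives in
// static memory either way, and so starts at zero.
static int call_count;

// Function: static gives internal linkage, so another .c file could
// define its own record_call without a clash.
static void record_call(void){ call_count++; }

// Block scope: static changes the memory model, so the value of id
// persists from one call to the next.
int next_id(void){
    static int id = 0;
    record_call();
    return ++id;
}

// External linkage by default: callable from other files.
int calls_so_far(void){ return call_count; }
```

Two calls to next_id return 1 and then 2, after which calls_so_far returns 2.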
The norm for declaring a function to be shared across .c files is to put the header in a .h file to be reincluded all over your project, and put the function itself in one .c file (where it will have the default external linkage). This is a good norm, and is worth sticking to, but it is reasonably common for authors to want to put one- or two-line utility functions (like max and min) in a .h file to be included everywhere. You can do this by preceding the declaration of your function with the static keyword, for example:
//In common_fns.h:
static long double max(long double a, long double b){
    return (a > b) ? a : b;
}
When you #include "common_fns.h" in each of a dozen files, the compiler will produce a new instance of the max function in each of them. But because you’ve given the function internal linkage, none of the files has made public the function name max, so all dozen separate instances of the function can live independently with no conflicts. Such redeclaration might add a few bytes to your executable and a few milliseconds to your compilation time, but that’s irrelevant in typical environments.
The extern keyword is a simpler issue than static, because it is only about linkage, not memory models. The typical setup for a variable with external linkage:
In a header to be included anywhere the variable will be used, declare your variable with the extern keyword. E.g., extern int x.
In exactly one .c file, declare the variable as usual, with an optional initializer. E.g., int x=3. As with all static-memory variables, if you leave off the initial value (just int x), the variable is initialized to zero or NULL.
That’s all you have to do to use variables with external linkage.
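The two steps are shown here squeezed into a single file for the sake of a compilable sketch; in a real project the first line would live in the shared header and the second in exactly one .c file. The read_x function is a hypothetical user of the variable.

```c
// What would sit in the shared header: a declaration only, no storage.
extern int x;

// What would sit in exactly one .c file: the definition, with an initializer.
int x = 3;

// Any file that includes the header can then use the variable.
int read_x(void){ return x; }
```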
You may be tempted to put the extern declaration not in a header, but just as a loose declaration in your code. In file1.c, you have declared int x, and you realize that you need access to x in file2.c, so you throw a quick extern int x at the top of the file. This will work. Today. Next month, when you change file1.c to declare double x, the compiler’s type checking will still find file2.c to be entirely internally consistent. The linker blithely points the routine in file2.c to the location where the double named x is stored, and the routine blithely misreads the data there as an int. You can avoid this disaster by leaving all extern declarations in a header to #include in both file1.c and file2.c. If any types change anywhere, the compiler will then be able to catch the inconsistency.
Under the hood, the system is doing a lot of work to make it easy for you to declare one variable several times while allocating memory for it only once. Formally, a declaration marked as extern is a declaration (a statement of type information so the compiler can do consistency checking), and not a definition (instructions to allocate and initialize space in memory). But a declaration without the extern keyword is a tentative definition: if the compiler gets to the end of the unit (defined below) and doesn’t see a definition, then the tentative definitions get turned into a single definition, with the usual initialization to zero or NULL. The standard defines unit in that sentence as a single file, after #includes are all pasted in [a translation unit; see C99 and C11 §6.9.2(2)].
Compilers like gcc and clang typically read unit to mean the entire program, meaning that a program with several non-extern declarations and no definitions rolls all these tentative definitions up into a single definition. Even with the --pedantic flag, gcc doesn’t care whether you use the extern keyword or leave it off entirely. In practice, that means that the extern keyword is largely optional: your compiler will read a dozen declarations like int x as a single definition of a single variable with external linkage. This is technically nonstandard, but K&R (2nd ed, p 227) describe this behavior as “usual in UNIX systems and recognized as a common extension by the [ANSI ’89] Standard.” (Harbison, 1991) §4.8 documents four distinct interpretations of the rules for externs.
This means that if you want two variables with the same name in two files to be distinct, but you forget the static keyword, a compiler may link those variables together as a single variable with external linkage; subtle bugs can easily ensue. So be careful to use static for all file-scope variables intended to have internal linkage.
The const keyword is fundamentally useful, but the rules around const have several surprises and inconsistencies. This segment will point them out so they won’t be surprises anymore, which should make it easier for you to use const wherever good style advises that you do.
Early in your life, you learned that copies of input data are passed to functions, but you can still have functions that change input data by sending in a copy of a pointer to the data. When you see that an input is plain, not-pointer data, then you know that the caller’s original version of the variable won’t change. When you see a pointer input, it’s unclear. Lists and strings are naturally pointers, so the pointer input could be data to be modified, or it could just be a string.
The const keyword is a literary device for you, the author, to make your code more readable. It is a type qualifier indicating that the data pointed to by the input pointer will not change over the course of the function. It is useful information to know when data shouldn’t change, so do use this keyword where possible.
The first caveat: the compiler does not lock down the data being pointed to against all modification. Data that is marked as const under one name can be modified using a different name. In Example 8-4, a and b point to the same data, but because a is not const in the header for set_elmt, it can change an element of the b array. See Figure 8-1.
void set_elmt(int *a, int const *b){
    a[0] = 3;
}

int main(){
    int a[10] = {};
    int const *b = a;
    set_elmt(a, b);
}
So const is a literary device, not a lock on the data.
The trick to reading declarations is to read from right to left. Thus:
int const
A constant integer
int const *
A (variable) pointer to a constant integer
int * const
A constant pointer to a (variable) integer
int * const *
A pointer to a constant pointer to an integer
int const * *
A pointer to a pointer to a constant integer
int const * const *
A pointer to a constant pointer to a constant integer
You can see that the const always refers to the text to its left, just as the * does.
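The first few forms in the list can be tried out in a short sketch. The function and variable names here are hypothetical, and the lines that the compiler would reject are left as comments.

```c
// Hypothetical function exercising two of the forms in the list.
int const_forms_demo(void){
    int value = 1, other = 2;

    int const *ptr_to_const = &value;   // a (variable) pointer to a constant integer
    ptr_to_const = &other;              // OK: the pointer itself may change
    // *ptr_to_const = 10;              // would not compile: the target is const

    int * const const_ptr = &value;     // a constant pointer to a (variable) integer
    *const_ptr = 5;                     // OK: the target may change
    // const_ptr = &other;              // would not compile: the pointer is const

    return *ptr_to_const + *const_ptr;  // other + value, which is 2 + 5
}
```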
You can switch a type name and const, and so write either int const or const int (though you can’t do this switch with const and *). I prefer the int const form because it provides consistency with the more complex constructions and the right-to-left rule. There’s a custom to use the const int form, perhaps because it reads more easily in English or because that’s how it’s always been done. Either works.
In practice, you will find that const sometimes creates tension that needs to be resolved: when you have a pointer that is marked const but want to send it as an input to a function that does not have a const marker in the right place. Maybe the function author thought that the keyword was too much trouble, or believed the chatter about how shorter code is always better code, or just forgot.
Before proceeding, you’ll have to ask yourself if there is any way that the pointer could change in the const-less function being called. There might be an edge case where something gets changed, or some other odd reason. This is stuff worth knowing anyway.
If you’ve established that the function does not break the promise of const-ness that you made with your pointer, then it is entirely appropriate to cheat and cast your const pointer to a non-const for the sake of quieting the compiler.
//No const in the header this time...
void set_elmt(int *a, int *b){
    a[0] = 3;
}

int main(){
    int a[10];
    int const *b = a;
    set_elmt(a, (int*)b);   //...so add a type-cast to the call.
}
The rule seems reasonable to me. You can override the compiler’s const-checking, as long as you are explicit about it and indicate that you know what you are doing.
If you are worried that the function you are calling won’t fulfill your promise of const-ness, then you can take one step further and make a full copy of the data, not just an alias. Because you don’t want any changes in the variable anyway, you can throw out the copy afterward.
Let us say that we have a struct type (name it counter_s) and we have a function that takes in such a struct, of the form f(counter_s const *in). Can the function modify the elements of the structure?
Let’s try it: Example 8-5 generates a struct with two pointers, and in ratio, that struct becomes const, yet when we send one of the pointers held by the structure to the const-less subfunction, the compiler doesn’t complain.
#include <assert.h> //assert
#include <stdlib.h> //malloc

typedef struct {
    int *counter1, *counter2;
} counter_s;

void check_counter(int *ctr){ assert(*ctr !=0); }

double ratio(counter_s const *in){
    check_counter(in->counter2);
    return *in->counter1/(*in->counter2+0.0);
}

int main(){
    counter_s cc = {.counter1=malloc(sizeof(int)),
                    .counter2=malloc(sizeof(int))};
    *cc.counter1 = *cc.counter2 = 1;
    ratio(&cc);
}
The incoming struct is marked as const.
We send an element of the const struct to a function that takes not-const inputs. The compiler does not complain.
This is declaration via designated initializers, coming soon.
In the definition of your struct, you can specify that an element be const, though this is typically more trouble than it is worth. If you really need to protect only the lowest level in your hierarchy of types, your best bet is to put a note in the documentation.
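To see how far the const on a struct pointer reaches, consider this sketch with a made-up holder_s type. Through a holder_s const *, the member behaves like an int * const: the pointer cannot be re-aimed, but the integer it points to is still writable.

```c
// Hypothetical struct with one pointer member.
typedef struct {
    int *counter;
} holder_s;

// The struct is const here, but the data its member points to is not.
void bump(holder_s const *in){
    (*in->counter)++;           // compiles without complaint
    // in->counter = 0;         // would not compile: the member itself is const here
}

int bump_demo(void){
    int n = 1;
    holder_s h = {.counter = &n};
    bump(&h);
    return n;
}
```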
Example 8-6 is a simple program to check whether the user gave Iggy Pop’s name on the command line. Sample usage from the shell (recalling that $? is the return value of the just-run program):
iggy_pop_detector Iggy Pop; echo $? #prints 1
iggy_pop_detector Chaim Weitz; echo $? #prints 0
#include <stdbool.h>
#include <strings.h> //strcasecmp (from POSIX)

bool check_name(char const **in){
    return    (!strcasecmp(in[0], "Iggy") && !strcasecmp(in[1], "Pop"))
           || (!strcasecmp(in[0], "James") && !strcasecmp(in[1], "Osterberg"));
}

int main(int argc, char **argv){
    if (argc < 3) return 0;     //check_name reads both argv[1] and argv[2]
    return check_name(&argv[1]);
}
The check_name function takes in a pointer to constant string, because there is no need to modify the input strings. But when you compile it, you’ll find that you get a warning. clang says: “passing char ** to parameter of type const char ** discards qualifiers in nested pointer types.” In a sequence of pointers, all the compilers I could find will convert to const what you could call the top-level pointer (casting to char * const *), but complain when asked to const-ify what that pointer is pointing to (char const **, aka const char **).
Again, you’ll need to make an explicit cast: replace check_name(&argv[1]) with:
check_name((char const**)&argv[1]);
Why doesn’t this entirely sensible cast happen automatically? We need some creative setup before a problem arises, and the story is inconsistent with the rules to this point. So the explanation is a slog; I will understand if you skip it.
The code in Example 8-7 creates the three links in the diagram: the direct link from constptr -> fixed, and the two steps in the indirect link from constptr -> var and var -> fixed. In the code, you can see that two of the assignments are made explicitly: constptr -> var and constptr -> -> fixed. But because *constptr == var, that second link implicitly creates the var -> fixed link. When we assign *var=30, that assigns fixed = 30.
#include <stdio.h>

int main(){
    int *var;
    int const **constptr = &var; // the line that sets up the failure
    int const fixed = 20;
    *constptr = &fixed;          // 100% valid
    *var = 30;
    printf("x=%i y=%i\n", fixed, *var);
}
We would never allow int *var to point directly at int const fixed. We only managed it via a sleight-of-pointer where var winds up implicitly pointing to fixed without explicitly stating it.
As earlier, data that is marked as const under one name can be modified using a different name. So, really, it’s little surprise that we were able to modify the const data using an alternative name.17
I enumerate this list of problems with const so that you can surmount them. As literature goes, it isn’t all that problematic, and the recommendation that you add const to your function declarations as often as appropriate still stands; don’t just grumble about how the people who came before you didn’t provide the right headers. After all, some day others will use your code, and you don’t want them grumbling about how they can’t use the const keyword because your functions don’t have the right headers.
12 There is also the option of wrapping the block in if (1){ … } else (void)0, which again absorbs a semicolon. This technically works, but triggers warnings when the macro is itself embedded in an if-else statement when using the -Wall compiler flag, and so is also not transparent to users.
13 On the validity of blank macro arguments, see C99 and C11 §6.10.3(4), which explicitly allow “arguments consisting of no preprocessing tokens.”
14 See the Microsoft Developer Network.
15 If the types are the same, then the duplicate typedefs are not a problem, as per C11 §6.7(3): “A typedef name may be redefined to denote the same type as it currently does, provided that type is not a variably modified type.”
16 This is from C99 and C11 §6.2.3, which is actually about resolving symbols across different scopes, not just files. But trying crazy linkage tricks across different scopes within one file is generally not done.
17 The code here is a rewrite of the example in C99 and C11 §6.5.16.1(6), where the line analogous to constptr=&var is marked as a constraint violation. Whether it is a constraint violation seems to depend on how one reads “both operands [on either side of an =] are pointers to qualified or unqualified versions of compatible types” in the “constraints” section of C99 and C11 §6.5.16.1. I’m not the only one who thinks it’s ambiguous: compilers are supposed to throw an error and refuse to compile the program on constraint violations, but gcc and clang mark this form with a warning and continue.