I believe that in the end the word will break cement.
Pussy Riot, paraphrasing Aleksandr Solzhenitsyn in a statement on August 8, 2012
A string of letters is an array of indeterminate length, and automatically allocated arrays (allocated on the stack) can’t be resized, and that in a nutshell is the problem with text in C. Fortunately, many others before us have already faced this problem and produced at least partial solutions. A handful of C-standard and POSIX-standard functions are sufficient to handle many of our string-building needs.
Also, C was designed in the 1970s, before the invention of non-English languages. Again, with the right functions (and the right understanding of how language is encoded), C’s original focus on English is not a real problem.
The asprintf function allocates the amount of string space you will need, and then fills the string. That means you never really have to worry about string-allocating again.
asprintf is not part of the C standard, but it’s available on systems with the GNU or BSD standard library, which covers a big range of users. Further, the GNU Libiberty library provides a version of asprintf that you can either cut and paste into your own code base or call from the library with a -liberty flag for the linker. Libiberty ships with some systems with no native asprintf, like MSYS for Windows. And if cutting and pasting from libiberty is not an option, I’ll present a quick reimplementation using the standard vsnprintf function.
The old way made people homicidal (or suicidal, depending on temperament), because they first had to get the length of the string they were about to fill, allocate space, and then actually write to the space. Don’t forget the extra slot for the null terminator!
Example 9-1 demonstrates the painful way of setting up a string, for the purpose of using C’s system command to run an external utility. The thematically appropriate utility, strings, searches a binary for printable plain text. The get_strings function will receive argv[0], the name of the program itself, so the program searches itself for strings. This is perhaps amusing, which is all we can ask of demo code.
#include <stdio.h>
#include <string.h> //strlen
#include <stdlib.h> //malloc, free, system

void get_strings(char const *in){
    char *cmd;
    int len = strlen("strings ") + strlen(in) + 1;
    cmd = malloc(len);
    snprintf(cmd, len, "strings %s", in);
    if (system(cmd)) fprintf(stderr, "something went wrong running %s.\n", cmd);
    free(cmd);
}

int main(int argc, char **argv){
    get_strings(argv[0]);
}
Example 9-2 uses asprintf, so malloc gets called for you, which means that you also don’t need the step where you measure the length of the string.
#define _GNU_SOURCE //cause stdio.h to include asprintf
#include <stdio.h>
#include <stdlib.h> //free

void get_strings(char const *in){
    char *cmd;
    asprintf(&cmd, "strings %s", in);
    if (system(cmd)) fprintf(stderr, "something went wrong running %s.\n", cmd);
    free(cmd);
}

int main(int argc, char **argv){
    get_strings(argv[0]);
}
The actual call to asprintf looks a lot like the call to sprintf, except you need to send the location of the string, not the string itself, because new space will be malloced and the location written into the char ** you input.
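The pointer-to-pointer convention is easier to see in a smaller setting. The sketch below is not asprintf itself; copy_into is a hypothetical helper, invented here for illustration, that allocates space and writes the new location into the char ** it is given, the same way asprintf does.

```c
#include <stdlib.h> //malloc
#include <string.h> //strlen, memcpy

/* Hypothetical helper, for illustration only: allocate a copy of src and
   store its address in *out. Returns the string length, or -1 on failure. */
int copy_into(char **out, char const *src){
    size_t len = strlen(src);
    *out = malloc(len + 1);      //the caller's pointer now holds the new location
    if (!*out) return -1;
    memcpy(*out, src, len + 1);  //copy the text and its null terminator
    return (int)len;
}
```

A caller would declare char *s, call copy_into(&s, "strings"), and later free(s). Passing s instead of &s would hand the function only a copy of the pointer, so the caller would never see the newly allocated address.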
Say that, for whatever reason, the GNU asprintf isn’t available for your use. Counting the length that a printf statement and its arguments will eventually expand to is error-prone, so how can we get the computer to do it for us? The answer has been staring at us all along, in C99 §7.19.6.12(3) and C11 §7.21.6.12(3): “The vsnprintf function returns the number of characters that would have been written had n been sufficiently large, not counting the terminating null character, or a negative value if an encoding error occurred.” The snprintf function also returns a would-have-been value.
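As a small illustration of that would-have-been return value, here is a sketch that asks snprintf for a length without writing anything. The helper name command_length is made up for this example; the call relies on the C99 rule that the buffer may be NULL when the size is zero.

```c
#include <stdio.h> //snprintf

/* Hypothetical helper: report how many characters "strings <name>" needs,
   not counting the terminating null. With a NULL buffer and size 0,
   nothing is written; only the would-have-been length comes back. */
int command_length(char const *name){
    return snprintf(NULL, 0, "strings %s", name);
}
```

A caller would then allocate command_length(name) + 1 bytes, the extra one being the slot for the null terminator, and call snprintf a second time to fill the buffer.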
So if we do a test run with vsnprintf on a 1-byte string, we can get a return value with the length that the string should be. Then we can allocate the string to that length and run vsnprintf for real. We’re running the function twice, so it may take twice as long to work, but it’s worth it for the safety and convenience.
Example 9-3 presents an implementation of asprintf via this procedure of running vsnprintf twice. I wrapped it in a HAVE_ASPRINTF check to be Autoconf-friendly; see below.
asprintf (asprintf.c)
#ifndef HAVE_ASPRINTF
#define HAVE_ASPRINTF
#include <stdio.h>  //vsnprintf
#include <stdlib.h> //malloc
#include <stdarg.h> //va_start et al

/* The declaration, to put into a .h file. The __attribute__ tells the compiler to
   check printf-style type-compliance. It's not C-standard, but a lot of compilers
   support it; just remove it if yours doesn't. */
int asprintf(char **str, char *fmt, ...) __attribute__ ((format (printf,2,3)));

int asprintf(char **str, char *fmt, ...){
    va_list argp;
    va_start(argp, fmt);
    char one_char[1];
    int len = vsnprintf(one_char, 1, fmt, argp);
    va_end(argp);  //close the list on every path, including the error return
    if (len < 0){  //negative means an encoding error; zero is a valid empty string
        fprintf(stderr, "An encoding error occurred. Setting the input pointer to NULL.\n");
        *str = NULL;
        return len;
    }

    *str = malloc(len+1);
    if (!*str) {   //test the new allocation, not the caller's char **
        fprintf(stderr, "Couldn't allocate %i bytes.\n", len+1);
        return -1;
    }
    va_start(argp, fmt);
    vsnprintf(*str, len+1, fmt, argp);
    va_end(argp);
    return len;
}
#endif

#ifdef Test_asprintf
int main(){
    char *s;
    asprintf(&s, "hello, %s.", "—Reader—");
    printf("%s\n", s);
    free(s);  //each call allocates a new string, so free the old one first

    asprintf(&s, "%c", ' ');
    printf("blank string: [%s]\n", s);
    free(s);

    int i = 0;
    asprintf(&s, "%i", i++);
    printf("Zero: %s\n", s);
    free(s);
}
#endif
If you have a string of predetermined length, str, and write data of unknown length to it using sprintf, then you might find that data gets written to whatever is adjacent to str: a classic security breach. Thus, sprintf is effectively deprecated in favor of snprintf, which limits the amount of data written.
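To make the limit concrete, the following sketch writes into a caller-supplied buffer and uses the return value of snprintf to notice when the output did not fit. The helper name build_command is hypothetical, and treating truncation as a failure is a choice made for this example, not something the standard requires.

```c
#include <stdio.h> //snprintf, size_t

/* Hypothetical helper: write "strings <name>" into buf, which holds size
   bytes. Returns 1 if the whole command fit, 0 if it was truncated or an
   encoding error occurred. snprintf never writes more than size bytes. */
int build_command(char *buf, size_t size, char const *name){
    int needed = snprintf(buf, size, "strings %s", name);
    return needed >= 0 && (size_t)needed < size;
}
```

With a 10-byte buffer and the name "a.out", the buffer ends up holding the truncated text "strings a" plus its null terminator, and the function returns 0 so the caller knows not to run the command.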
Using asprintf effectively prevents this problem, because as much memory as is needed will get written. It’s not perfect: eventually, whatever mangled and improper input string will hit a