Using sscanf(3) for Conversion and Validation

The function sscanf(3) is the Swiss Army knife of C input and conversion. It is not the right tool for every conversion, but it is simple to use and provides some measure of error detection.

Applying sscanf(3) to Numeric Conversion

Listing 10.1 shows a simple program that extracts the month, day, and year from a string. The input data has been deliberately made as messy as possible (lines 15–18) with lots of whitespace.

Code Listing 10.1. sscanf.c—Extracting Date Fields Using sscanf(3)
1:   /* sscanf.c */
2:
3:   #include <stdio.h>
4:   #include <stdlib.h>
5:   #include <string.h>
6:
7:   int
8:   main(int argc,char *argv[]) {
9:       int x;
10:      char *datestr;      /* Date string to parse */
11:      int nf;             /* Number of fields converted */
12:      int n;              /* # of characters scanned */
13:      int mm, dd, yyyy;   /* Month, day and year */
14:
15:      static char *sdate[] = {
16:          "   1 /  2  /  2000  ",
17:          " 03 - 9-2001,etc."
18:      } ;
19:
20:      for ( x=0; x<2; ++x ) {
21:          datestr = sdate[x];     /* Parse this date */
22:          printf("Extracting from '%s'\n",datestr);
23:
24:          nf = sscanf(datestr,"%d %*[/-]%d %*[/-]%d%n",&mm,&dd,&yyyy,&n);
25:
26:          printf("%02d/%02d/%04d nf=%d, n=%d\n",mm,dd,yyyy,nf,n);
27:
28:          if ( nf >= 3 )
29:              printf("Remainder = '%s'\n",&datestr[n]);
30:      }
31:
32:      return 0;
33:  }

The variables used in this program are as follows:

  • Variable nf receives the number of conversions that sscanf(3) successfully completes (line 11).

  • Variable n receives the number of characters scanned so far (line 12).

  • Variables mm, dd, and yyyy are the month, day, and year extracted values, respectively (line 13).

  • The character pointer array sdate[] contains the two strings that are going to be used for extraction of the date components (lines 15–18).

Testing Numeric Conversions Using sscanf(3)

Compiling and running this program yields the following results under FreeBSD:

$ make sscanf
cc -c -D_POSIX_C_SOURCE=199309L -D_POSIX_SOURCE -Wall sscanf.c
cc sscanf.o -o sscanf
$ ./sscanf
Extracting from '   1 /  2  /  2000  '
01/02/2000 nf=3, n=18
Remainder = '  '
Extracting from ' 03 - 9-2001,etc.'
03/09/2001 nf=3, n=12
Remainder = ',etc.'
$

The first example shows how the date 01/02/2000 is successfully parsed. The second result, 03/09/2001, is parsed from a date string that uses hyphens as separators instead. This is possible because the sscanf(3) %[] format feature was used to accept either a slash or a hyphen (line 24). The full format specifier used was %*[/-]. The asterisk indicates that the extracted value is not assigned to a variable, nor is it counted in the value returned by sscanf(3). The characters it consumes are still included in the count reported by %n.

Notice that a space character precedes the %*[/-] format specification. This causes sscanf(3) to skip over preceding spaces prior to the slash or hyphen, if spaces are present.

The extracted results are reported in line 26, along with the values nf and n. Line 28 tests the value of nf before reporting the remainder string in line 29. This is necessary because sscanf(3) stores nothing in n if it stops before reaching the %n specification (at the end of line 24). Since n is not initialized in this program, its value is undefined in that case.

The remainder strings show the points where the date extractions ended in both data examples. In the last example, the parse ends just before the text ',etc.'.

Note that sscanf(3) reports only three conversions (nf=3) for the call in line 24. This is because neither the assignment-suppressed %*[/-] specifications nor the %n specification count as conversions.
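The behavior described above can be checked with a small wrapper around the same format string. This is only a sketch: the function name scan_date() is hypothetical, and presetting n to -1 is an assumption added here (it is not part of Listing 10.1) so that a caller can tell whether the %n specification was ever reached.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of Listing 10.1).
 * Returns the sscanf(3) conversion count. *n is preset to -1 so the
 * caller can detect that %n was never reached.
 */
int
scan_date(const char *s, int *mm, int *dd, int *yyyy, int *n) {
    assert(s != NULL);

    *n = -1;    /* Sentinel: stays -1 unless %n is processed */

    /* Same format as line 24: %*[/-] and %n are not counted in the result */
    return sscanf(s, "%d %*[/-]%d %*[/-]%d%n", mm, dd, yyyy, n);
}
```

With this sketch, an input such as "12/x" should return 1 and leave n at -1, while the two strings from Listing 10.1 should return 3 with n set to 18 and 12, respectively.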

Improving the sscanf(3) Conversion

One irritation that remains in our example in Listing 10.1 is that it does not skip over the trailing whitespace. This makes it difficult to test whether the entire input string was consumed when the date was extracted. Leftover data usually indicates that part of the input was not valid.

This problem is remedied by altering the sscanf(3) statement on line 24 to read

nf = sscanf(datestr,"%d %*[/-]%d %*[/-]%d %n",&mm,&dd,&yyyy,&n);

If you look carefully at the format string, you will notice that one space was inserted before the %n specifier. This coaxes sscanf(3) into skipping over more whitespace before reporting how many characters were scanned. With the whitespace skipped, the test for leftovers is simple:

if ( datestr[n] != 0 ) {
    printf("EEK! Leftovers = '%s'\n",&datestr[n]);
}

If the expression datestr[n] evaluates to a null byte after the conversion, then it is known that the entire input string was valid for the conversion.
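Putting the improved format and the leftover test together, a whole-string validation might be sketched as follows. The function name parse_whole_date() is hypothetical, and the preset of n to -1 is an assumption added here so that n is never read uninitialized when the scan stops early.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: returns 1 if s contains a date and nothing else
 * (apart from whitespace), or 0 otherwise.
 */
int
parse_whole_date(const char *s, int *mm, int *dd, int *yyyy) {
    int n = -1;     /* Stays -1 if %n is never reached */
    int nf;         /* Number of fields converted */

    assert(s != NULL);

    /* The space before %n skips any trailing whitespace */
    nf = sscanf(s, "%d %*[/-]%d %*[/-]%d %n", mm, dd, yyyy, &n);

    if ( nf < 3 || n < 0 )
        return 0;           /* Not all three fields were converted */

    return s[n] == '\0';    /* Valid only if nothing is left over */
}
```

Under these assumptions, the first string of Listing 10.1 is accepted, while the second is rejected because of the leftover text ',etc.'. Note that this checks only the form of the input. It does not check that the month and day are within range.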

The Limitations of sscanf(3)

The sscanf(3) return count indicates whether or not the conversions were successful. When the %n specifier is processed, the caller can also determine where the scanning ended. However, sscanf(3) still suffers from the limitation that, when a conversion fails, it does not tell the caller where in the string the failure occurred.
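One way to soften this limitation, offered here only as a sketch and not as the approach taken by Listing 10.1, is to place additional %n specifications at several points in the format and see which of them were reached. The function name date_scan_progress() and the checkpoint array p[] are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: returns the offset in s of the last %n checkpoint
 * that sscanf(3) reached, or 0 if not even the month was converted.
 */
int
date_scan_progress(const char *s) {
    int mm, dd, yyyy;                       /* Converted values (unused here) */
    int p[5] = { -1, -1, -1, -1, -1 };      /* One slot per %n checkpoint */
    int last = 0;
    int i;

    assert(s != NULL);

    /* A %n follows each field and each separator */
    (void) sscanf(s, "%d%n %*[/-]%n%d%n %*[/-]%n%d %n",
        &mm, &p[0], &p[1], &dd, &p[2], &p[3], &yyyy, &p[4]);

    /* Checkpoints are reached in order, so keep the last one that was set */
    for ( i = 0; i < 5; ++i )
        if ( p[i] >= 0 )
            last = p[i];

    return last;
}
```

For the input "12/x/2000", this sketch returns 3, which is the offset of the offending x. The result is only approximate, however: the offset marks the last checkpoint reached, so any whitespace that sscanf(3) skipped before the failing character is not included.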
