Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Chapter 6. General Library Interfaces — Part 1

In this chapter

6.1 Times and Dates page 166
6.2 Sorting and Searching Functions page 181
6.3 User and Group Names page 195
6.4 Terminals: isatty() page 202
6.5 Suggested Reading page 203
6.6 Summary page 203
Exercises page 205

We saw in Chapter 5, “Directories and File Metadata,” page 117, that directly reading a directory returns filenames in the order in which they’re kept in the directory. We also saw that the struct stat contains all the information about a file, except its name. However, some components of that structure are not directly usable; they’re just numeric values.

This chapter presents the rest of the APIs needed to make full use of the struct stat component values. In order, we cover the following topics: time_t values for representing times and the time formatting function; sorting and searching functions (for sorting filenames, or any other data); the uid_t and gid_t types for representing users and groups and the functions that map them to and from the corresponding user and group names; and finally, a function to test whether a file descriptor represents a terminal.

Times and Dates

Time values are kept in the type known as time_t. The ISO C standard guarantees that this is a numeric type but does not otherwise specify what it is (integer or floating-point), or the range or the precision of the values stored therein.

On GNU/Linux and Unix systems, time_t values represent “seconds since the Epoch.” The Epoch is the beginning of recorded time, which is Midnight, January 1, 1970, UTC. On most systems, a time_t is a C long int. For 32-bit systems, this means that the time_t “overflows” sometime on January 19, 2038. By then, we hope, the time_t type will be redefined to be at least 64 bits big.

Various functions exist to retrieve the current time, compute the difference between two time_t values, convert time_t values into a more usable representation, and format both representations as character strings. Additionally, a date and time representation can be converted back into a time_t, and limited time-zone information is available.

A separate set of functions provides access to the current time with a higher resolution than one second. The functions work by providing two discrete values: the time as seconds since the Epoch, and the number of microseconds within the current second. These functions are described later in the book, in Section 14.3.1, “Microsecond Times: gettimeofday(),” page 544.

Retrieving the Current Time: `time()` and `difftime()`

The time() system call retrieves the current date and time; difftime() computes the difference between two time_t values:

#include <time.h>                                   ISO C

time_t time(time_t *t);
double difftime(time_t time1, time_t time0);

time() returns the current time. If the t parameter is not NULL, then the value pointed to by t is also filled in with the current time. It returns (time_t) -1 if there was an error, and errno is set.

Although ISO C doesn’t specify what’s in a time_t value, POSIX does indicate that it represents time in seconds. Thus, it’s both common and portable to make this assumption. For example, to see if a time value represents something that is six months or more in the past, one might use code like this:

/* Error checking omitted for brevity */
time_t now, then, some_time;

time(& now);                                        Get current time
then = now - (6L * 31 * 24 * 60 * 60);              Approximately six months ago

... set some_time, for example, via stat() ...
if (some_time < then)
   /* more than 6 months in the past */
else
   /* less than 6 months in the past */

However, since strictly portable code may need to run on non-POSIX systems, the difftime() function exists to produce the difference between two times. The same test, using difftime(), would be written this way:

time_t now, some_value;
const double six_months = 6.0 * 31 * 24 * 60 * 60;

time(& now);                                       Get current time
... set some_time, for example, via stat() ...

if (difftime(now, some_time) >= six_months)
   /* more than 6 months in the past */
else
   /* less than 6 months in the past */

The return type of difftime() is a double because a time_t could possibly represent fractions of a second as well. On POSIX systems, it always represents whole seconds.

In both of the preceding examples, note the use of typed constants to force the computation to be done with the right type of math: 6L in the first instance for long integers, 6.0 in the second, for floating point.

Breaking Down Times: `gmtime()` and `localtime()`

In practice, the “seconds since the Epoch” form of a date and time isn’t very useful except for simple comparisons. Computing the components of a time yourself, such as the month, day, year, and so on, is error prone, since the local time zone (possibly with daylight-saving time) must be taken into account, leap years must be computed correctly, and so forth. Fortunately, two standard routines do this job for you:

#include <time.h>                                  ISO C

struct tm *gmtime(const time_t *timep);
struct tm *localtime(const time_t *timep);

gmtime() returns a pointer to a struct tm that represents UTC time. localtime() returns a pointer to a struct tm representing the local time; that is, it takes the current time zone and daylight-saving time into account. In effect, this is “wall-clock time,” the date and time as it would be displayed on a wall clock or on a wristwatch. (How this works is discussed later, see Section 6.1.5, “Getting Time-Zone Information,” page 178.)

Both functions return a pointer to a struct tm, which looks like this:

struct tm {
    int     tm_sec;        /* seconds */
    int     tm_min;        /* minutes */
    int     tm_hour;       /* hours */
    int     tm_mday;       /* day of the month */
    int     tm_mon;        /* month */
    int     tm_year;       /* year */
    int     tm_wday;       /* day of the week */
    int     tm_yday;       /* day in the year */
    int     tm_isdst;      /* daylight saving time */
};

The struct tm is referred to as a broken-down time, since the time_t value is “broken down” into its component parts. The component parts, their ranges, and their meanings are shown in Table 6.1.

Table 6.1. Fields in the struct tm

Member	Range	Meaning
`tm_sec`	0–60	Second within a minute. Second 60 allows for leap seconds. (C89 had the range as 0–61.)
`tm_min`	0–59	Minute within an hour.
`tm_hour`	0–23	Hour within the day.
`tm_mday`	1–31	Day of the month.
`tm_mon`	0–11	Month of the year.
`tm_year`	0–N	Year, in years since 1900.
`tm_wday`	0–6	Day of week, Sunday = 0.
`tm_yday`	0–365	Day of year, January 1 = 0.
`tm_isdst`	< 0, 0, > 0	Daylight Savings Time flag.

The ISO C standard presents most of these values as “x since y.” For example, tm_sec is “seconds since the minute,” tm_mon is “months since January,” tm_wday is “days since Sunday,” and so on. This helps to understand why all the values start at 0. (The single exception, logically enough, is tm_mday, the day of the month, which ranges from 1–31.) Of course, having them start at zero is also practical; since C arrays are zero-based, it makes using these values as indices trivial:

static const char *const days[] = {                       Array of day names
    "Sunday", "Monday", "Tuesday", "Wednesday",
    "Thursday", "Friday", "Saturday",
};
time_t now;
struct tm *curtime;

time(& now);                                              Get current time
curtime = gmtime(& now);                                  Break it down
printf("Day of the week: %s
", days[curtime->tm_wday]);  Index and print

Both gmtime() and localtime() return a pointer to a struct tm. The pointer points to a static struct tm maintained by each routine, and it is likely that these struct tm structures are overwritten each time the routines are called. Thus, it’s a good idea to make a copy of the returned struct. Reusing the previous example:

static const char *const days[] = { /* As before */ };
time_t now;
struct tm curtime;                                           Structure, not pointer

time(& now);                                                 Get current time
curtime = *gmtime(& now);                                    Break it down and copy data
printf("Day of the week: %s
", days [curtime.tm_wday]);     Index and print, use. not ->

The tm_isdst field indicates whether or not daylight-saving time (DST) is currently in effect. A value of 0 means DST is not in effect, a positive value means it is, and a negative value means that no DST information is available. (The C standard is purposely vague, indicating only zero, positive, or negative; this gives implementors the most freedom.)

Formatting Dates and Times

The examples in the previous section showed how the fields in a struct tm could be used to index arrays of character strings for printing informative date and time values. While you could write your own code to use such arrays for formatting dates and times, standard routines alleviate the work.

Simple Time Formatting: `asctime()` and `ctime()`

The first two standard routines, listed below, produce output in a fixed format:

#include <time.h>                               ISO C

char *asctime (const struct tm *tm);
char *ctime (const time_t *timep);

As with gmtime() and localtime(), asctime() and ctime() return pointers to static buffers that are likely to be overwritten upon each call. Furthermore, these two routines return strings in the same format. They differ only in the kind of argument they accept. asctime() and ctime() should be used when all you need is simple date and time information:

#include <stdio.h>
#include <time.h>

int main (void)
{
    time_t now;

    time (& now);
    printf ("%s", ctime (& now));
 }

When run, this program produces output of the form: ’Thu May 22 15:44:21 2003’. The terminating newline is included in the result. To be more precise, the return value points to an array of 26 characters, as shown in Figure 6.1.

Figure 6.1. Return string from ctime() and asctime()

Much older Unix code relies on the fact that the values have a fixed position in the returned string. When using these routines, remember that they include a trailing newline. Thus, the small example program uses a simple "%s" format string for printf(), and not "%s ", as might be expected.

ctime() saves you the step of calling localtime(); it’s essentially equivalent to

time_t now;
char *curtime;

time (& now);
curtime = asctime (localtime (& now));

Complex Time Formatting: `strftime()`

While asctime() and ctime() are often adequate, they are also limited:

The output format is fixed. There’s no way to rearrange the order of the elements.
The output does not include time-zone information.
The output uses abbreviated month and day names.
The output assumes English names for the months and days.

For these reasons, C89 introduced the strftime() standard library routine:

#include <time.h>                                                  ISO C

size_t strftime (char *s, size_t max, const char *format,
                 const struct tm *tm);

strftime() is similar to sprintf(). The arguments are as follows:

char *s

A buffer to hold the formatted string.

size_t max

The size of the buffer.

const char *format

The format string.

const struct tm *tm

A struct tm pointer representing the broken-down time to be formatted.

The format string contains literal characters, intermixed with conversion specifiers that indicate what is to be placed into the string, such as the full weekday name, the hour according to a 24-hour or 12-hour clock, a.m. or p.m. designations, and so on. (Examples coming shortly.)

If the entire string can be formatted within max characters, the return value is the number of characters placed in s, not including the terminating zero byte. Otherwise, the return value is 0. In the latter case, the contents of s are “indeterminate.” The following simple example gives the flavor of how strftime() is used:

#include <stdio.h>
#include <time.h>

int main (void)
{
    char buf[100];
    time_t now;
    struct tm *curtime;

    time (& now);
    curtime = localtime (& now);
    (void) strftime (buf, sizeof buf,
            "It is now %A, %B %d, %Y, %I:%M %p", curtime);

    printf ("%s
", buf);
    exit (0);
}

When run, this program prints something like:

It is now Thursday, May 22, 2003, 04:15 PM

Table 6.2 provides the full list of conversion specifiers, their possible alternative representations, and their meanings. In addition, the C99 standard added more specifiers to the list; those that are new in C99 are marked with a ✓ symbol.

Table 6.2. strftime() conversion format specifiers

Specifier(s)	C99	Meaning
`%a`		The locale’s abbreviated weekday name.
`%A`		The locale’s full weekday name.
`%b`		The locale’s abbreviated month name.
`%B`		The locale’s full month name.
`%c, %Ec`		The locale’s “appropriate” date and time representation.
`%C, %EC`	✓	The century `(00–99).`
`%d, %Od`		The day of the month `(01–31)`.
`%D`	✓	Same as `%m/%d/%y.`
`%e, %Oe`	✓	The day of the month. A single digit is preceded with a space (`1–31`).
`%F`	✓	Same as `%Y-%m-%d` (ISO 8601 date format).
`%g`	✓	The last two digits of week-based year (`00–99`).
`%G`	✓	The ISO 8601 week-based year.
`%h`	✓	Same as `%b`.
`%H, %OH`		The hour in a 24-hour clock `(00–23)`.
`%I, %OI`		The hour in a 12-hour clock `(01–12)`.
`%j`		The day of the year `(001–366)`.
`%m, %Om`		The month as a number `(01–12).`
`%M, %OM`		The minute as a number `(00–59).`
`%n`	✓	A newline character `(’ ’)`.
`%p`		The locale’s a.m./p.m. designation.
`%r`	✓	The locale’s 12-hour clock time.
`%R`	✓	Same as `%H:%M.`
`%S, %OS`		The second as a number `(00–60)`.
`%t`	✓	A TAB character `(’ ’)`.
`%T`	✓	Same as `%H:%M:%S` (ISO 8601 time format).
`%u, %Ou`	✓	ISO 8601 weekday number, Monday = 1 `(1–7)`
`%U, %OU`		Week number, first Sunday is first day of week 1 `(00–53)`.
`%V, %OV`	✓	ISO 8601 week number `(01–53).`
`%w, %Ow`		The weekday as a number, Sunday = 0 `(0–6).`
`%W, %OW`		Week number, first Monday is first day of week 1 `(00–53).`
`%x, %Ex`		The locale’s “appropriate” date representation.
`%X, %EX`		The locale’s “appropriate” time representation.
`%y, %Ey, %Oy`		The last two digits of the year `(00–99).`
`%Y, %EY`		The year as a number.
`%Z`		The locale’s time zone, or no characters if no time-zone information is available.
`%%`		A single `%`.

A locale is a way of describing the current location, taking into account such things as language, character set, and defaults for formatting dates, times, and monetary amounts, and so on. We deal with them in Chapter 13, “Internationalization and Localization,” page 485. For now, it’s enough to understand that the results from strftime() for the same format string can vary, according to the current locale.

The versions starting with %E and %O are for “alternative representations.” Some locales have multiple ways of representing the same thing; these specifiers provide access to the additional representations. If a particular locale does not support alternative representations, then strftime() uses the regular version.

Many Unix versions of date allow you to provide, on the command line, a format string that begins with a + character. date then formats the current date and time and prints it according to the format string:

$ date + 'It is now %A, %B %d, %Y, %I:%M %p'
It is now Sunday, May 25, 2003, 06:44 PM

Most of the new C99 specifiers come from such existing Unix date implementations. The %n and %t formats are not strictly necessary in C, since the TAB and newline characters can be directly embedded in the string. However, in the context of a date format string on the command line, they make more sense. Thus, they’re included in the specification for strftime() as well.

The ISO 8601 standard defines (among other things) how weeks are numbered within a year. According to this standard, weeks run Monday through Sunday, and Monday is day 1 of the week, not day 0. If the week in which January 1 comes out contains at least four days in the new year, then it is considered to be week 1. Otherwise, that week is the last week of the previous year, numbered 52 or 53. These rules are used for the computation of the %g, %G, and %V format specifiers. (While parochial Americans such as the author may find these rules strange, they are commonly used throughout Europe.)

Many of the format specifiers produce results that are specific to the current locale. In addition, several indicate that they produce the “appropriate” representation for the locale (for example, %x). The C99 standard defines the values for the "C" locale. These values are listed in Table 6.3.

Table 6.3. "C" locale values for certain strftime() formats

Specifier	Meaning
`%a`	The first three characters of `%A`.
`%A`	One of `Sunday, Monday, ..., Saturday`.
`%b`	The first three characters of `%B`.
`%B`	One of `January, February, ..., December`.
`%c`	Same as `%a %b %e %T %Y`.
`%p`	One of `AM` or `PM`.
`%r`	Same as `%I:%M:%S %p`.
`%x`	Same as `%m/%d/%y`.
`%X`	Same as `%T`.
`%Z`	Implementation-defined.

It should be obvious that strftime() provides considerable flexibility and control over date- and time-related output, in much the same way as printf() and sprintf() do. Furthermore, strftime() cannot overflow its buffer, since it checks against the passed-in size parameter, making it a safer routine than is sprintf().

As a simple example, consider the creation of program log files, when a new file is created every hour. The filename should embed the date and time of its creation in its name:

/* Error checking omitted for brevity */
char fname[PATH_MAX];       /* PATH_MAX is in <limits.h> */
time_t now;
struct tm *tm;
int fd;

time (& now);
tm = localtime (& now);
strftime (fname, sizeof fname, "/var/log/myapp.%Y-%m-%d-%H:%M", tm);
fd = creat (name, 0600);
...

The year-month-day-hour-minute format causes the filenames to sort in the order they were created.

Note

Some time formats are more useful than others. For example, 12-hour times are ambiguous, as are any purely numeric date formats. (What does ’9/11’ mean? It depends on where you live.) Similarly, two-digit years are also a bad idea. Use strftime() judiciously.

Converting a Broken-Down Time to a `time_t`

Obtaining seconds-since-the-Epoch values from the system is easy; that’s how date and times are stored in inodes and returned from time() and stat(). These values are also easy to compare for equality or by < and > for simple earlier-than/later-than tests.

However, dates entered by humans are not so easy to work with. For example, many versions of the touch command allow you to provide a date and time to which touch should set a file’s modification or access time (with utime(), as described in Section 5.5.3, “Changing Timestamps: utime(),” page 157).

Converting a date as entered by a person into a time_t value is difficult: Leap years must be taken into account, time zones must be compensated for, and so on. Therefore, the C89 standard introduced the mktime() function:

#include <time.h>                                   ISO C

time_t mktime (struct tm *tm);

To use mktime(), fill in a struct tm with appropriate values: year, month, day, and so on. If you know whether daylight-saving time was in effect for the given date, set the tm_isdst field appropriately: 0 for “no,” and positive for “yes.” Otherwise, use a negative value for “don’t know.” The tm_wday and tm_yday fields are ignored.

mktime() assumes that the struct tm represents a local time, not UTC. It returns a time_t value representing the passed-in date and time, or it returns (time_t) –1 if the given date/time cannot be represented correctly. Upon a successful return, all the values in the struct tm are adjusted to be within the correct ranges, and tm_wday and tm_yday are set correctly as well. Here is a simple example:

 1  /* ch06-echodate.c --- demonstrate mktime(). */
 2
 3  #include <stdio.h>
 4  #include <time.h>
 5
 6  int main (void)
 7  {
 8      struct tm tm;
 9      time_t then;
10
11      printf ("Enter a Date/time as YYYY/MM/DD HH:MM:SS : ");
12      scanf ("%d/%d/%d %d:%d:%d",
13          & tm.tm_year, & tm.tm_mon, & tm.tm_mday,
14          & tm.tm_hour, & tm.tm_min, & tm.tm_sec);
15
16      /* Error checking on values omitted for brevity. */
17      tm.tm_year -= 1900;
18      tm.tm_mon--;
19
20      tm.tm_isdst = -1; /* Don't know about DST */
21
22      then = mktime (& tm);
23
24      printf ("Got: %s", ctime (& then));
25      exit (0);
26 }

Line 11 prompts for a date and time, and lines 12–14 read it in. (Production code should check the return value from scanf().) Lines 17 and 18 compensate for the different basing of years and months, respectively. Line 22 indicates that we don’t know whether or not the given date and time represent daylight-saving time. Line 22 calls mktime(), and line 24 prints the result of the conversion. When compiled and run, we see that it works:

$ ch06-echodate
Enter a Date/time as YYYY/MM/DD HH:MM:SS : 2003/5/25 19:07:23
Got: Sun May 25 19:07:23 2003

Getting Time-Zone Information

Early Unix systems embedded time-zone information into the kernel when it was compiled. The rules for daylight-saving time conversions were generally hard-coded, which was painful for users outside the United States or in places within the United States that didn’t observe DST.

Modern systems have abstracted that information into binary files read by the C library when time-related functions are invoked. This technique avoids the need to recompile libraries and system executables when the rules change and makes it much easier to update the rules.

The C language interface to time-zone information evolved across different Unix versions, both System V and Berkeley, until finally it was standardized by POSIX as follows:

#include <time.h>                                 POSIX

extern char *tzname[2];
extern long timezone;
extern int daylight;

void tzset (void);

The tzset() function examines the TZ environment variable to find time-zone and daylight-saving time information.^[1] If that variable isn’t set, then tzset() uses an “implementation-defined default time zone,” which is most likely the time zone of the machine you’re running on.

After tzset() has been called, the local time-zone information is available in several variables:

extern char *tzname [2]

The standard and daylight-saving time names for the time zone. For example, for U.S. locations in the Eastern time zone, the time-zone names are ’EST’ (Eastern Standard Time) and ’EDT’ (Eastern Daylight Time).

extern long timezone

The difference, in seconds, between the current time zone and UTC. The standard does not explain how this difference works. In practice, negative values represent time zones east of (ahead of, or later than) UTC; positive values represent time zones west of (behind, or earlier than) UTC. If you look at this value as “how much to change the local time to make it be the same as UTC,” then the sign of the value makes sense.

extern int daylight

This variable is zero if daylight-saving time conversions should never be applied in the current time zone, and nonzero otherwise.

Note

The daylight variable does not indicate whether daylight-saving time is currently in effect! Instead, it merely states whether the current time zone can even have daylight-saving time.

The POSIX standard indicates that ctime(), localtime(), mktime(), and strftime() all act “as if” they call tzset(). This means that they need not actually call tzset(), but they must behave as if it had been called. (The wording is intended to provide a certain amount of flexibility for implementors while guaranteeing correct behavior for user-level code.)

In practice, this means that you will almost never have to call tzset() yourself. However, it’s there if you need it.

BSD Systems Gotcha: `timezone()`, Not `timezone`

Instead of the POSIX timezone variable, a number of systems derived from 4.4 BSD provide a timezone() function:

#include <time.h>                                   BSD

char *timezone (int zone, int dst);

The zone argument is the number of minutes west of GMT, and dst is true if daylight-saving time is in effect. The return value is a string giving the name of the indicated zone, or a value expressed relative to GMT. This function provides compatibility with the V7 function of the same name and behavior.

Local Time: How Does It Know?

GNU/Linux systems store time zone information in files and directories underneath /usr/share/zoneinfo:

$ cd /usr/share/zoneinfo
$ ls -FC
Africa/      Canada/  Factory    Iceland    MST7MDT   Portugal   W-SU
America/     Chile/   GB         Indian/    Mexico/   ROC        WET
Antarctica/  Cuba     GB-Eire    Iran       Mideast/  ROK        Zulu
Arctic/      EET      GMT        Israel     NZ        Singapore 
 iso3166.tab
Asia/        EST      GMT+0      Jamaica    NZ-CHAT   SystemV/  
 posix/
Atlantic/    EST5EDT  GMT-0      Japan      Navajo    Turkey    
 posixrules
Australia/   Egypt    GMT0       Kwajalein  PRC       UCT       
 right/
Brazil/      Eire     Greenwich  Libya      PST8PDT   US/       
 zone.tab
CET          Etc/     HST        MET        Pacific/  UTC
CST6CDT      Europe/  Hongkong   MST        Poland    Universal

When possible, this directory uses hard links to provide the same data by multiple names. For example, the files EST5EDT and US/Eastern are really the same:

$ ls -il EST5EDT US/Eastern
 724350 -rw-r--r--    5 root     root         1267 Sep  6  2002
 EST5EDT
 724350 -rw-r--r--    5 root     root         1267 Sep  6  2002
 US/Eastern

Part of the process of installing a system is to choose the time zone. The correct time-zone data file is then placed in /etc/localtime:

$ file /etc/localtime
/etc/localtime: timezone data

On our system, this is a standalone copy of the time-zone file for our time zone. On other systems, it may be a symbolic link to the file in /usr/share/zoneinfo. The advantage of using a separate copy is that everything still works if /usr isn’t mounted.

The TZ environment variable, if set, overrides the default time zone:

$ date                          Date and time in default time zone
Wed Nov 19 06:44:50 EST 2003
$ export TZ=PST8PDT             Change time zone to US West Coast
$ date                          Print date and time
Wed Nov 19 03:45:09 PST 2003

This function’s widespread existence makes portable use of the POSIX timezone variable difficult. Fortunately, we don’t see a huge need for it: strftime() should be sufficient for all but the most unusual needs.

Sorting and Searching Functions

Sorting and searching are two fundamental operations, the need for which arises continually in many applications. The C library provides a number of standard interfaces for performing these tasks.

All the routines share a common theme; data are managed through void * pointers, and user-provided functions supply ordering. Note also that these APIs apply to in-memory data. Sorting and searching structures in files is considerably more involved and beyond the scope of an introductory text such as this one. (However, the sort command works well for text files; see the sort(1) manpage. Sorting binary files requires that a special-purpose program be written.)

Because no one algorithm works well for all applications, there are several different sets of library routines for maintaining searchable collections of data. This chapter covers only one simple interface for searching. Another, more advanced, interface is described in Section 14.4, “Advanced Searching with Binary Trees,” page 551. Furthermore, we purposely don’t explain the underlying algorithms, since this is a book on APIs, not algorithms and data structures. What’s important to understand is that you can treat the APIs as “black boxes” that do a particular job, without needing to understand the details of how they do the job.

Sorting: `qsort()`

Sorting is accomplished with qsort():

#include <stdlib.h>                                           ISO C

void qsort(void *base, size_t nmemb, size_t size,
           int (*compare) (const void *, const void *));

The name qsort() comes from C.A.R. Hoare’s Quicksort algorithm, which was used in the initial Unix implementation. (Nothing in the POSIX standard dictates the use of this algorithm for qsort(). The GLIBC implementation uses a highly optimized combination of Quicksort and Insertion Sort.)

qsort() sorts arrays of arbitrary objects. It works by shuffling opaque chunks of memory from one spot within the array to another and relies on you, the programmer, to provide a comparison function that allows it to determine the ordering of one array element relative to another. The arguments are as follows:

void *base

The address of the beginning of the array.

size_t nmemb

The total number of elements in the array.

size_t size

The size of each element in the array. The best way to obtain this value is with the C sizeof operator.

int (*compare)(const void *, const void *)

A possibly scary declaration for a function pointer. It says that “compare points to a function that takes two ’const void *’ parameters, and returns an int.”

Most of the work is in writing a proper comparison function. The return value should mimic that of strcmp(): less than zero if the first value is “less than” the second, zero if they are equal, and greater than zero if the first value is “greater than” the second. It is the comparison function that defines the meaning of “less than” and “greater than” for whatever it is you’re sorting. For example, to compare two double values, we could use this function:

int dcomp(const void *d1p, const void *d2p)
{
    const double *d1, *d2;

    d1 = (const double *) d1p;                 Cast pointers to right type
    d2 = (const double *) d2p;

    if (*d1 < *d2)                             Compare and right value
        return -1;
    else if (*d1 > *d2)
        return 1;
    else if (*d1 == *d2)
        return 0
    else
        return -1;       /* NaN sorts before real numbers */
}

This shows the general boilerplate for a comparison function: convert the arguments from void * to pointers to the type being compared and then return a comparison value.

For floating-point values, a simple subtraction such as ’return *d1 - *d2’ doesn’t work, particularly if one value is very small or if one or both values are special “not a number” or “infinity” values. Thus, we have to do the comparison manually, including taking into account the not-a-number value (which doesn’t even compare equal to itself!).

Example: Sorting Employees

For more complicated structures, a more involved function is necessary. For example, consider the following (rather trivial) struct employee:

struct employee {
    char lastname[30];
    char firstname[30];
    long emp_id;
    time_t start_date;
};

We might write a function to sort employees by last name, first name, and ID number:

int emp_name_id_compare(const void *e1p, const void *e2p)
{
    const struct employee *e1, *e2;
    int last, first;

    e1 = (const struct employee *) e1p;                           Convert pointers
    e2 = (const struct employee *) e2p;

    if ((last = strcmp(e1->lastname, e2->lastname)) != 0)         Compare last names
        return last;                                              Last names differ

    /* same last name, check first name */
    if ((first = strcmp(e1->firstname, e2->firstname)) != 0)      Compare first names
        return first;                                             First names differ

    /* same first name, check ID numbers */
    if (e1->emp_id < e2->emp_id)                                  Compare employee ID
        return -1;
    else if (e1->emp_id == e2->emp_id)
        return 0;
    else
        return 1;
}

The logic here is straightforward, initially comparing on last names, then first names, and then using the employee ID number if the two names are the same. By using strcmp() on strings, we automatically get the right kind of negative/zero/positive value to return.

The employee ID comparison can’t just use subtraction: suppose long is 64 bits and int is 32 bits, and the two values differ only in the upper 32 bits (say the lower 32 bits are zero). In such a case, the subtraction result would automatically be cast to int, throwing away the upper 32 bits and returning an incorrect value.

Note

We could have stopped with the comparison on first names, in which case all employees with the same last and first names would be grouped, but without any other ordering.

This point is important: qsort() does not guarantee a stable sort. A stable sort is one in which, if two elements compare equal based on some key value(s), they will maintain their original ordering, relative to each other, in the final sorted array. For example, consider three employees with the same first and last names, with employee numbers 17, 42, and 81. Their order in the original array might have been 42, 81, and 17. (Meaning, employee 42 is at a lower index than employee 81, who, in turn, is at a lower index than employee 17.) After sorting, the order might be 81, 42, and 17. If this is an issue, then the comparison routine must take all important key values into consideration. (Ours does.)

Simply by using a different function, we can sort employees by seniority:

int emp_seniority_compare(const void *e1p, const void *e2p)
{
    const struct employee *e1, *e2;
    double diff;

    e1 = (const struct employee *) e1p;              Cast pointers to correct type
    e2 = (const struct employee *) e2p;

    diff = difftime(e1->start_date, e2->start_date);  Compare times
    if (diff < 0)
        return -1;
    else if (diff > 0)
        return 1;
    else
        return 0;
}

For maximum portability we have used difftime(), which returns the difference in seconds between two time_t values. For this specific case, a cast such as—

return (int) difftime(e1->start_date, e2->start_date);

—should do the trick, since time_t values are within reasonable ranges. Nevertheless, we instead use a full three-way if statement, just to be safe.

Here is a sample data file, listing five U.S. presidents:

$ cat presdata.txt
Bush George 43 980013600            Last name, first name, president number, inauguration
Clinton William 42 727552800
Bush George 41 601322400
Reagan Ronald 40 348861600
Carter James 39 222631200

ch06-sortemp.c shows a simple program that reads this file into a struct employee array and then sorts it, using the two different comparison functions just presented.

  1  /* ch06-sortemp.c --- Demonstrate qsort() with two comparison functions. */
  2
  3  #include <stdio.h>
  4  #include <stdlib.h>
  5  #include <time.h>
  6
  7  struct employee {
  8      char lastname[30];
  9      char firstname[30];
 10      long emp_id;
 11      time_t start_date;
 12  };
 13
 14  /* emp_name_id_compare --- compare by name, then by ID */
 15
 16  int emp_name_id_compare(const void *e1p, const void *e2p)
 17  {
     ...as shown previously, omitted to save space ...
 39  }
 40
 41  /* emp_seniority_compare --- compare by seniority */
 42
 43  int emp_seniority_compare(const void *e1p, const void *e2p)
 44  {
     ... as shown previously, omitted to save space ...
 58  }
 59
 60  /* main --- demonstrate sorting */
 61
 62  int main(void)
 63  {
 64  #define NPRES 10
 65      struct employee presidents[NPRES];
 66      int i, npres;
 67      char buf[BUFSIZ];
 68
 69      /* Very simple code to read data: */
 70      for (npres = 0; npres < NPRES && fgets(buf, BUFSIZ, stdin) != NULL;
 71              npres++) {
 72         sscanf(buf, "%s %s %ld %ld
",
 73             presidents[npres].lastname,
 74             presidents[npres].firstname,
 75             &presidents[npres].emp_id,
 76             &presidents[npres].start_date);
 77      }
 78
 79      /* npres is now number of actual lines read. */
 80
 81      /* First, sort by name */
 82      qsort(presidents, npres, sizeof(struct employee), emp_name_id_compare);
 83
 84      /* Print output */
 85      printf("Sorted by name:
");
 86      for (i = 0; i < npres; i++)
 87          printf("	%s %s	%d	%s",
 88              presidents[i].lastname,
 89              presidents[i].firstname,
 90              presidents[i].emp_id,
 91              ctime(& presidents[i].start_date));
 92
 93      /* Now, sort by seniority */
 94      qsort(presidents, npres, sizeof(struct employee), emp_seniority_compare);
 95
 96      /* And print again */
 97      printf("Sorted by seniority:
");
 98      for (i = 0; i < npres; i++)
 99          printf("	%s %s	%d	%s",
100              presidents[i].lastname,
101              presidents[i].firstname,
102              presidents[i].emp_id,
103              ctime(& presidents[i].start_date));
104  }

Lines 70–77 read in the data. Note that any use of scanf() requires “well behaved” input data. If, for example, any name is more than 29 characters, there’s a problem. In this case, we’re safe, but production code must be considerably more careful.

Line 82 sorts the data by name and employee ID, and then lines 84–91 print the sorted data. Similarly, line 94 re-sorts the data, this time by seniority, with lines 97–103 printing the results. When compiled and run, the program produces the following results:

$ ch06-sortemp < presdata.txt
Sorted by name:
     Bush George     41   Fri Jan 20 13:00:00 1989
     Bush George     43   Sat Jan 20 13:00:00 2001
     Carter James    39   Thu Jan 20 13:00:00 1977
     Clinton William 42   Wed Jan 20 13:00:00 1993
     Reagan Ronald   40   Tue Jan 20 13:00:00 1981
Sorted by seniority:
     Carter James    39   Thu Jan 20 13:00:00 1977
     Reagan Ronald   40   Tue Jan 20 13:00:00 1981
     Bush George     41   Fri Jan 20 13:00:00 1989
     Clinton William 42   Wed Jan 20 13:00:00 1993
     Bush George     43   Sat Jan 20 13:00:00 2001

(We’ve used 1:00 p.m. as an approximation for the time when all of the presidents started working.^[2])

One point is worth mentioning: qsort() rearranges the data in the array. If each array element is a large structure, a lot of data will be copied back and forth as the array is sorted. It may pay, instead, to set up a separate array of pointers, each of which points at one element of the array. Then use qsort() to sort the pointer array, accessing the unsorted data through the sorted pointers.

The price paid is the extra memory to hold the pointers and modification of the comparison function to use an extra pointer indirection when comparing the structures. The benefit returned can be a considerable speedup, since only a four- or eight-byte pointer is moved around at each step, instead of a large structure. (Our struct employee is at least 68 bytes in size. Swapping four-byte pointers moves 17 times less data than does swapping structures.) For thousands of in-memory structures, the difference can be significant.

Note

If you’re a C++ programmer, beware! qsort() may be dangerous to use with arrays of objects! qsort() does raw memory moves, copying bytes. It’s completely unaware of C++ constructs such as copy constructors or operator=() functions. Instead, use one of the STL sorting functions, or use the separate-array-of-pointers technique.

Example: Sorting Directory Contents

In Section 5.3, “Reading Directories,” page 132, we demonstrated that directory entries are returned in physical directory order. Most of the time, it’s much more useful to have directory contents sorted in some fashion, such as by name or by modification time. While not standardized by POSIX, several routines make it easy to do this, using qsort() as the underlying sorting agent:

#include <dirent.h>                                                Common

int scandir(const char *dir, struct dirent ***namelist,
       int (*select) (const struct dirent *),
       int (*compare) (const struct dirent **, const struct dirent **));
int alphasort(const void *a, const void *b);

int versionsort(const void *a, const void *b);                     GLIBC

The scandir() and alphasort() functions were made available in 4.2 BSD and are widely supported.^[3] versionsort() is a GNU extension.

scandir() reads the directory named by dir, creates an array of struct dirent pointers by using malloc(), and sets *namelist to point to the beginning of that array. Both the array of pointers and the pointed-to struct dirent structures are allocated with malloc(); it is up to the calling code to use free() to avoid memory leaks.

Use the select function pointer to choose entries of interest. When this value is NULL, all valid directory entries are included in the final array. Otherwise, (*select)() is called for each entry, and those entries for which it returns nonzero (true) are included in the array.

The compare function pointer compares two directory entries. It is passed to qsort() for use in sorting.

alphasort() compares filenames lexicographically. It uses the strcoll() function for comparison. strcoll() is similar to strcmp() but takes locale-related sorting rules into consideration (see Section 13.4, “Can You Spell That for Me, Please?”, page 521).

versionsort() is a GNU extension, that uses the GNU strverscmp() function to compare filenames (see strverscmp(3)). To make a long story short, this function understands common filename versioning conventions and compares appropriately.

ch06-sortdir.c shows a program similar to ch04-catdir.c. However, it uses scandir() and alphasort() to do the work.

 1  /* ch06-sortdir.c --- Demonstrate scandir(), alphasort(). */
 2
 3  #include <stdio.h>       /* for printf() etc. */
 4  #include <errno.h>       /* for errno */
 5  #include <sys/types.h>   /* for system types */
 6  #include <dirent.h>      /* for directory functions */
 7
 8  char *myname;
 9  int process(const char *dir);
10
11  /* main --- loop over directory arguments */
12
13  int main(int argc, char **argv)
14  {
15      int i;
16      int errs = 0;
17
18      myname = argv[0];
19
20      if (argc == 1)
21          errs = process(".");   /* default to current directory */
22      else
23          for (i = 1; i < argc; i++)
24              errs += process(argv[i]);
25
26      return (errs != 0);
27  }
28
29  /* nodots --- ignore dot files, for use by scandir() */
30
31  int
32  nodots(const struct dirent *dp)
33  {
34     return (dp->d_name[0] != '.'),
35  }
36
37  /*
38   * process --- do something with the directory, in this case,
39   *             print inode/name pairs on standard output.
40   *             Return 0 if all OK, 1 otherwise.
41   */
42
43  int
44  process(const char *dir)
45  {
46      DIR *dp;
47      struct dirent **entries;
48      int nents, i;
49
50      nents = scandir(dir, & entries, nodots, alphasort);
51      if (nents < 0) {
52          fprintf(stderr, "%s: scandir failed: %s
", myname,
53                  strerror(errno));
54          return 1;
55      }
56
57      for (i = 0; i < nents; i++) {
58          printf("%8ld %s
", entries[i]->d_ino, entries[i]->d_name);
59          free(entries[i]);
60      }
61
62      free(entries);
63
64      return 0;
65  }

The main() program (lines 1–27) follows the standard boilerplate we’ve used before. The nodots() function (lines 31–35) acts as the select parameter, choosing only filenames that don’t begin with a period.

The process() function (lines 43–65) is quite simple, with scandir() doing most of the work. Note how each element is released separately with free() (line 59) and how the entire array is also released (line 62).

When run, the directory contents do indeed come out in sorted order, without ’.’ and ’..’:

$ ch06-sortdir                   Default actions displays current directory
 2097176 00-preface.texi
 2097187 01-intro.texi
 2097330 02-cmdline.texi
 2097339 03-memory.texi
 2097183 03-memory.texi.save
 2097335 04-fileio.texi
 2097334 05-fileinfo.texi
 2097332 06-general1.texi
...

Binary Searching: `bsearch()`

A linear search is pretty much what it sounds like: You start at the beginning, and walk through an array being searched until you find what you need. For something simple like finding integers, this usually takes the form of a for loop. Consider this function:

/* ifind --- linear search, return index if found or -1 if not */

int ifind(int x, const int array[], size_t nelems)
{
    size_t i;

    for (i = 0; i < nelems; i++)
        if (array[i] == x) /* found it */
            return i;

        return -1;
}

The advantage to linear searching is that it’s simple; it’s easy to write the code correctly the first time. Furthermore, it always works. Even if elements are added to the end of the array or removed from the array, there’s no need to sort the array.

The disadvantage to linear searching is that it’s slow. On average, for an array containing nelems elements, a linear search for a random element does ’nelems / 2’ comparisons before finding the desired element. This becomes prohibitively expensive, even on modern high-performance systems, as nelems becomes large. Thus, you should only use linear searching on small arrays.

Unlike a linear search, binary searching requires that the input array already be sorted. The disadvantage here is that if elements are added, the array must be re-sorted before it can be searched. (When elements are removed, the rest of the array contents must still be shuffled down. This is not as expensive as re-sorting, but it can still involve a lot of data motion.)

The advantage to binary searching, and it’s a significant one, is that binary searching is blindingly fast, requiring at most log₂(N) comparisons, where N is the number of elements in the array. The bsearch() function is declared as follows:

#include <stdlib.h>                                               ISO C

void *bsearch(const void *key, const void *base, size_t nmemb,
              size_t size, int (*compare)(const void *, const void *));

The parameters and their purposes are similar to those of qsort():

const void *key

The object being searched for in the array.

const void *base

The start of the array.

size_t nmemb

The number of elements in the array.

size_t size

The size of each element, obtained with sizeof.

int (*compare)(const void *, const void *)

The comparison function. It must work the same way as the qsort() comparison function, returning negative/zero/positive according to whether the first parameter is less than/equal to/greater than the second one.

bsearch() returns NULL if the object is not found. Otherwise, it returns a pointer to the found object. If more than one array element matches key, it is unspecified which one is returned. Thus, as with qsort(), make sure that the comparison function accounts for all relevant parts of the searched data structure.

ch06-searchemp.c shows bsearch() in practice, extending the struct employee example used previously.

  1  /* ch06-searchemp.c --- Demonstrate bsearch(). */
  2
  3  #include <stdio.h>
  4  #include <errno.h>
  5  #include <stdlib.h>
  6
  7  struct employee {
  8      char lastname[30];
  9      char firstname[30];
 10      long emp_id;
 11      time_t start_date;
 12  };
 13
 14  /* emp_id_compare --- compare by ID */
 15
 16  int emp_id_compare(const void *e1p, const void *e2p)
 17  {
 18      const struct employee *e1, *e2;
 19
 20      e1 = (const struct employee *) e1p;
 21      e2 = (const struct employee *) e2p;
 22
 23      if (e1->emp_id < e2->emp_id)
 24          return -1;
 25      else if (e1->emp_id == e2->emp_id)
 26          return 0;
 27      else
 28          return 1;
 29  }
 30
 31  /* print_employee --- print an employee structure */
 32
 33  void print_employee(const struct employee *emp)
 34  {
 35      printf("%s %s	%d	%s", emp->lastname, emp->firstname,
 36          emp->emp_id, ctime(& emp->start_date));
 37  }

Lines 7–12 define the struct employee; it’s the same as before. Lines 16–29 serve as the comparison function, for both qsort() and bsearch(). It compares on employee ID number only. Lines 33–37 define print_employee(), which is a convenience function for printing the structure since this is done from multiple places.

 39  /* main --- demonstrate sorting */
 40
 41  int main(int argc, char **argv)
 42  {
 43  #define NPRES 10
 44      struct employee presidents[NPRES];
 45      int i, npres;
 46      char buf[BUFSIZ];
 47      struct employee *the_pres;
 48      struct employee key;
 49      int id;
 50      FILE *fp;
 51
 52      if (argc != 2) {
 53          fprintf(stderr, "usage: %s datafile
", argv[0]);
 54          exit(1);
 55      }
 56
 57      if ((fp = fopen(argv[1], "r")) == NULL) {
 58          fprintf(stderr, "%s: %s: could not open: %s
", argv[0],
 59                  argv[1], strerror(errno));
 60          exit(1);
 61      }
 62
 63      /* Very simple code to read data: */
 64      for (npres = 0; npres < NPRES && fgets(buf, BUFSIZ, fp) != NULL;
 65              npres++) {
 66          sscanf(buf, "%s %s %ld %ld",
 67              presidents[npres].lastname,
 68              presidents[npres].firstname,
 69              & presidents[npres].emp_id,
 70              & presidents[npres].start_date);
 71      }
 72      fclose(fp);
 73
 74      /* npres is now number of actual lines read. */
 75
 76      /* First, sort by id */
 77      qsort(presidents, npres, sizeof(struct employee), emp_id_compare);
 78
 79      /* Print output */
 80      printf("Sorted by ID:
");
 81      for (i = 0; i < npres; i++) {
 82         putchar('	'),
 83         print_employee(& presidents[i]);
 84      }
 85
 86      for (;;) {
 87         printf("Enter ID number: ");
 88         if (fgets(buf, BUFSIZ, stdin) == NULL)
 89             break;
 90
 91         sscanf(buf, "%d
", & id);
 92         key.emp_id = id;
 93         the_pres = (struct employee *) bsearch(& key, presidents, npres,
 94                 sizeof(struct employee), emp_id_compare);
 95
 96         if (the_pres != NULL) {
 97             printf("Found: ");
 98             print_employee(the_pres);
 99         } else
100             printf("Employee with ID %d not found!
", id);
101    }
102
103    putchar('
'), /* Print a newline on EOF. */
104
105    exit(0);
106  }

The main() function starts with argument checking (lines 52–55). It then reads the data from the named file (lines 57–72). Standard input cannot be used for the employee data, since that is reserved for prompting the user for the employee ID to search for.

Lines 77–84 sort the data and then print them. The program then goes into a loop, starting on line 86. It prompts for an employee ID number, exiting the loop upon end-of-file. To search the array, we use the struct employee named key. It’s enough to set just its emp_id field to the entered ID number; none of the other fields are used in the comparison (line 92).

If an entry is found with the matching key, bsearch() returns a pointer to it. Otherwise it returns NULL. The return is tested on line 96, and appropriate action is then taken. Finally, line 102 prints a newline character so that the system prompt will come out on a fresh line. Here’s a transcript of what happens when the program is compiled and run:

$ ch06-searchemp presdata.txt                          Run the program
Sorted by ID:
     Carter James     39   Thu Jan 20 13:00:00 1977
     Reagan Ronald    40   Tue Jan 20 13:00:00 1981
     Bush George      41   Fri Jan 20 13:00:00 1989
     Clinton William  42   Wed Jan 20 13:00:00 1993
     Bush George      43   Sat Jan 20 13:00:00 2001
Enter ID number: 42                                   Enter a valid number
Found: Clinton William 42 Wed Jan 20 13:00:00 1993    It's found
Enter ID number: 29      Enter an invalid number
Employee with ID 29 not found!                        It's not found
Enter ID number: 40      Try another good one
Found: Reagan Ronald 40 Tue Jan 20 13:00:00 1981      This one is found too
Enter ID number: ^D                                   CTRL-D entered for EOF
$                                                     Ready for next command

Additional, more advanced, APIs for searching data collections are described in Section 14.4, “Advanced Searching with Binary Trees,” page 551.

User and Group Names

While the operating system works with user and group ID numbers for storage of file ownership and for permission checking, humans prefer to work with user and group names.

Early Unix systems kept the information that mapped names to ID numbers in simple text files, /etc/passwd and /etc/group. These files still exist on modern systems, and their format is unchanged from that of V7 Unix. However, they no longer tell the complete story. Large installations with many networked hosts keep the information in network databases: ways of storing the information on a small number of servers that are then accessed over the network.^[4]. However, this usage is transparent to most applications since access to the information is done through the same API as was used for retrieving the information from the text files. It is for this reason that POSIX standardizes only the APIs; the /etc/passwd and /etc/group files need not exist, as such, for a system to be POSIX compliant.

The APIs to the two databases are similar; most of our discussion focuses on the user database.

User Database

The traditional /etc/passwd format maintains one line per user. Each line has seven fields, each of which is separated from the next by a colon character:

$ grep arnold /etc/passwd
arnold:x:2076:10:Arnold D. Robbins:/home/arnold:/bin/bash

In order, the fields are as follows:

The user name

This is what the user types to log in, what shows up for ’ls -l’ and in any other context that displays users.

The password field

On older systems, this is the user’s encrypted password. On newer systems, this field is likely to be an x (as shown), meaning that the password information is held in a different file. This separation is a security measure; if the encrypted password isn’t available to nonprivileged users, it is much harder to “crack.”

The user ID number

This should be unique; one number per user.

The group ID number

This is the user’s initial group ID number. As is discussed later, on modern systems processes have multiple groups associated with them.

The user’s real name

This is at least a first and last name. Some systems allow for comma-separated fields, for office location, phone number, and so on, but this is not standardized.

The login directory

This directory becomes the home directory for users when they log in ($HOME—the default for the cd command).

The login program

The program to run when the user logs in. This is usually a shell, but it need not be. If this field is left empty, the default is /bin/sh.

Access to the user database is through the routines declared in <pwd.h>:

#include <sys/types.h>                        XSI
#include <pwd.h>

struct passwd *getpwent(void);
void setpwent(void);
void endpwent(void);

struct passwd *getpwnam(const char *name);
struct passwd *getpwuid(uid_t uid);

The fields in the struct passwd used by the various API routines correspond directly to the fields in the password file:

struct passwd {
    char    *pw_name;      /* user name */
    char    *pw_passwd;    /* user password */
    uid_t   pw_uid;        /* user id */
    gid_t   pw_gid;        /* group id */
    char    *pw_gecos;     /* real name */
    char    *pw_dir;       /* home directory */
    char    *pw_shell;     /* shell program */
};

(The name pw_gecos is historical; when the early Unix systems were being developed, this field held the corresponding information for the user’s account on the Bell Labs Honeywell systems running the GECOS operating system.)

The purpose of each routine is described in the following list.

struct passwd *getpwent(void)

Returns a pointer to an internal static struct passwd structure containing the “current” user’s information. This routine reads through the entire password database, one record at a time, returning a pointer to a structure for each user. The same pointer is returned each time; that is, the internal struct passwd is overwritten for each user’s entry. When getpwent() reaches the end of the password database, it returns NULL. Thus, it lets you step through the entire database, one user at a time. The order in which records are returned is undefined.

void setpwent(void)

Resets the internal state such that the next call to getpwent() returns the first record in the password database.

void endpwent(void)

“Closes the database,” so to speak, be it a simple file, network connection, or something else.

struct passwd *getpwnam(const char *name)

Looks up the user with a pw_name member equal to name, returning a pointer to a static struct passwd describing the user or NULL if the user is not found.

struct passwd *getpwuid(uid_t uid)

Similarly, looks up the user with the user ID number given by uid, returning a pointer to a static struct passwd describing the user or NULL if the user is not found.

getpwuid() is what’s needed when you have a user ID number (such as from a struct stat) and you wish to print the corresponding user name. getpwnam() converts a name to a user ID number, for example, if you wish to use chown() or fchown() on a file. In theory, both of these routines do a linear search through the password database to find the desired information. This is true in practice when a password file is used; however, behind-the-scenes databases (network or otherwise, as on BSD systems) tend to use more efficient methods of storage, so these calls are possibly not as expensive in such a case.^[5].

getpwent() is useful when you need to go through the entire password database. For instance, you might wish to read it all into memory, sort it, and then search it quickly with bsearch(). This is very useful for avoiding the multiple linear searches inherent in looking things up one at a time with getpwuid() or getpwnam().

Note

The pointers returned by getpwent(), getpwnam(), and getpwuid() all point to internal static data. Thus, you should make a copy of their contents if you need to save the information.

Take a good look at the struct passwd definition. The members that represent character strings are pointers; they too point at internal static data, and if you’re going to copy the structure, make sure to copy the data each member points to as well.

Group Database

The format of the /etc/group group database is similar to that of /etc/passwd, but with fewer fields:

$ grep arnold /etc/group
mail:x:12:mail,postfix,arnold
uucp:x:14:uucp,arnold
floppy:x:19:arnold
devel:x:42:miriam,arnold
arnold:x:2076:arnold

Again, there is one line per group, with fields separated by colons. The fields are as follows:

The group name

This is the name of the group, as shown in ’ls -l’ or in any other context in which a group name is needed.

The group password

This field is historical. It is no longer used.

The group ID number

As with the user ID, this should be unique to each group.

The user list

This is a comma-separated list of users who are members of the group.

In the previous example, we see that user arnold is a member of multiple groups. This membership is reflected in practice in what is termed the group set. Besides the main user ID and group ID number that processes have, the group set is a set of additional group ID numbers that each process carries around with it. The system checks all of these group ID numbers against a file’s group ID number when performing permission checking. This subject is discussed in more detail in Chapter 11, “Permissions and User and Group ID Numbers,” page 403.

The group database APIs are similar to those for the user database. The following functions are declared in <grp.h>:

#include  <sys/types.h>                              XSI
#include  <grp.h>

struct  group *getgrent(void);
void setgrent(void);
void endgrent(void);

struct group *getgrnam(const char *name);
struct group *getgrgid(gid_t gid);

The struct group corresponds to the records in /etc/group:

struct group {
    char   *gr_name;      /* group name */
    char   *gr_passwd;    /* group password */
    gid_t  gr_gid;        /* group id */
    char   **gr_mem;      /* group members */
};

The gr_mem field bears some explanation. While declared as a pointer to a pointer (char **), it is best thought of as an array of strings (like argv). The last element in the array is set to NULL. When no members are listed, the first element in the array is NULL.

ch06-groupinfo.c demonstrates how to use the struct group and the gr_mem field. The program accepts a single user name on the command line and prints all group records in which that user name appears:

 1  /* ch06-groupinfo.c --- Demonstrate getgrent() and struct group */
 2
 3  #include <stdio.h>
 4  #include <sys/types.h>
 5  #include <grp.h>
 6
 7  extern void print_group(const struct group *gr);
 8
 9  /* main --- print group lines for user named in argv[1] */
10
11  int
12  main(int argc, char **argv)
13  {
14      struct group *gr;
15      int i;
16
17      if (argc != 2) {                                      Check arguments
18         fprintf(stderr, "usage: %s user
", argv[0]);
19         exit(1);
20      }
21
22      while ((gr = getgrent()) != NULL)                     Get each group record
23          for (i = 0; gr->gr_mem[i] != NULL; i++)           Look at each member
24              if (strcmp(gr->gr_mem[i], argv[1]) == 0)      If found the user...
25                 print_group(gr);                           Print the record
26
27      endgrent();
28
29      exit(0);
30 }

The main() routine first does error checking (lines 17–20). The heart of the program is a nested loop. The outer loop (line 22) loops over all the group database records. The inner loop (line 23) loops over the members of the gr_mem array. If one of the members matches the name from the command line (line 24), then print_group() is called to print the record (line 25).

32  /* print_group --- print a group record */
33
34  void
35  print_group(const struct group *gr)
36  {
37      int i;
38
39      printf("%s:%s:%ld:", gr->gr_name, gr->gr_passwd, (long) gr->gr_gid);
40
41      for (i = 0; gr->gr_mem[i] != NULL; i++) {
42          printf("%s", gr->gr_mem[i]);
43          if (gr->gr_mem[i+1] != NULL)
44              putchar(','),
45      }
46
47      putchar('
'),
48  }

The print_group() function (lines 34–48) is straightforward, with logic similar to that of main() for printing the member list. Group list members are comma separated; thus, the loop body has to check that the next element in the array is not NULL before printing a comma. This code works correctly, even if there are no members in the group. However, for this program, we know there are members, or print_group() wouldn’t have been called! Here’s what happens when the program is run:

$ ch06-groupinfo arnold
mail:x:12:mail,postfix,arnold
uucp:x:14:uucp,arnold
floppy:x:19:arnold
devel:x:42:miriam,arnold
arnold:x:2076:arnold

Terminals: `isatty()`

The Linux/Unix standard input, standard output, standard error model discourages the special treatment of input and output devices. Programs generally should not need to know, or care, whether their output is a terminal, a file, a pipe, a physical device, or whatever.

However, there are times when a program really does need to know what kind of a file a file descriptor is associated with. The stat() family of calls often provides enough information: regular file, directory, device, and so on. Sometimes though, even that is not enough, and for interactive programs in particular, you may need to know if a file descriptor represents a tty

A tty (short for Teletype, one of the early manufacturers of computer terminals) is any device that represents a terminal, that is, something that a human would use to interact with the computer. This may be either a hardware device, such as the keyboard and monitor of a personal computer, an old-fashioned video display terminal connected to a computer by a serial line or modem, or a software pseudoterminal, such as is used for windowing systems and network logins.

The discrimination can be made with isatty():

#include <unistd.h>                                  POSIX

int isatty(int desc);

This function returns 1 if the file descriptor desc represents a terminal, 0 otherwise. According to POSIX, isatty() may set errno to indicate an error; thus you should set errno to 0 before calling isatty() and then check its value if the return is 0. (The GNU/Linux isatty(3) manpage doesn’t mention the use of errno.) The POSIX standard also points out that just because isatty() returns 1 doesn’t mean there’s a human at the other end of the file descriptor!

One place where isatty() comes into use is in modern versions of ls, in which the default is to print filenames in columns if the standard output is a terminal and to print them one per line if not.

Summary

Times are stored internally as time_t values, representing “seconds since the Epoch.” The Epoch is Midnight, January 1, 1970 UTC for GNU/Linux and Unix systems. The current time is retrieved from the system by the time() system call, and difftime() returns the difference, in seconds, between two time_t values.
The struct tm structure represents a “broken-down time,” which is a much more usable representation of a date and time. gmtime() and localtime() convert time_t values into struct tm values, and mktime() goes in the opposite direction.
asctime() and ctime() do simplistic formatting of time values, returning a pointer to a fixed-size, fixed-format static character string. strftime() provides much more flexible formatting, including locale-based values.
Time-zone information is made available by a call to tzset(). Since the standard routines act as if they call tzset() automatically, it is rare to need to call this function directly.
The standard routine for sorting arrays is qsort(). By using a user-provided comparison function and being told the number of array elements and their size, qsort() can sort any kind of data. This provides considerable flexibility.
scandir() reads an entire directory into an array of struct dirent. User-provided functions can be used to select which entries to include and can provide ordering of elements within the array. alphasort() is a standard function for sorting directory entries by name; scandir() passes the sorting function straight through to qsort().
The bsearch() function works similarly to qsort(). It does fast binary searching. Use it if the cost of linear searching outweighs the cost of sorting your data. (An additional API for searching data collections is described in Section 14.4, “Advanced Searching with Binary Trees,” page 551.)
The user and group databases may be kept in local disk files or may be made available over a network. The standard API purposely hides this distinction. Each database provides both linear scanning of the entire database and direct queries for a user/group name or user/group ID.
Finally, for those times when stat() just isn’t enough, isatty() can tell you whether or not an open file represents a terminal device.

Exercises

Write a simple version of the date command that accepts a format string on the command line and uses it to format and print the current time.

When a file is more than six months old, ’ls -l’ uses a simpler format for printing the modification time. The file GNU version of ls.c uses this computation:

3043 /* Consider a time to be recent if it is within the past six
3044    months. A Gregorian year has 365.2425 * 24 * 60 * 60 ==
3045    31556952 seconds on the average. Write this value as an
3046    integer constant to avoid floating point hassles. */
3047  six_months_ago = current_time - 31556952 / 2;

Compare this to our example computation for computing the time six months in the past. What are the advantages and disadvantages of each method?

Write a simple version of the touch command that changes the modification time of the files named on the command line to the current time.
Add an option to your touch command that accepts a date and time specification on the command line and uses that value as the new modification time of the files named on the command line.
Add another option to your version of touch that takes a filename and uses the modification time of the given file as the new modification time for the files named on the command line.
Enhance ch06-sortemp.c to sort a separate array of pointers that point into the array of employees.
Add options to ch06-sortdir.c to sort by inode number, modification time, access time, and size. Add a “reverse option” such that time-based sorts make the most recent file first and other criteria (size, inode) sort by largest value first.
Write a simple version of the chown command. Its usage should be
```
chown user[:group] files ...
```
Here, user and group are user and group names representing the new user and group for the named files. The group is optional; if present it is separated from the user by a colon.
To test your version on a GNU/Linux system, you will have to work as root. Do so carefully!
Enhance your chown to allow numeric user or group numbers, as well as names.
Write functions to copy user and group structures, including pointed-to data. Use malloc() to allocate storage as needed.
Write a specialized user-lookup library that reads the entire user database into a dynamically allocated array. Provide fast lookup of users, by both user ID number and name. Be sure to handle the case in which a requested user isn’t found.
Do the same thing for the group database.
Write a stat program that prints the contents of the struct stat for each file named on the command line. It should print all the values in human-readable format: time_t values as dates and times, uid_t and gid_t values as the corresponding names (if available), and the contents of symbolic links. Print the st_mode field the same way that ls would.
Compare your program to the GNU Coreutils stat program, both by comparing outputs and by looking at the source code.

^[1]Although POSIX standardizes TZ’s format, it isn’t all that interesting, so we haven’t bothered to document it here. After all, it is tzset() that has to understand the format, not user-level code. Implementations can, and do, use formats that extend POSIX.

^[2]The output shown here is for U.S. Eastern Standard Time. You will get different results for the same program and data if you use a different time zone.

^[3]One notable exception is Sun’s Solaris, where these two functions exist only in the hard-to-use BSD compatibility library.

^[4]Common network databases include Sun Microsystems’ Network Information Service (NIS) and NIS+, Kerberos (Hesiod), MacOS X NetInfo (versions up to and including 10.2), and LDAP, the Lightweight Directory Access Protocol. BSD systems keep user information in on-disk databases and generate the /etc/passwd and /etc/group files automatically.

^[5]Unfortunately, if performance is an issue, there’s no standard way to know how your library does things, and indeed, the way it works can vary at runtime! (See the nsswitch.conf(5) manpage on a GNU/Linux system.) On the other hand, the point of the API is, after all, to hide the details.

^[6]http://www.gtk.org

^[7]http://www.gnome.org

^[8]http://www.gtk.org/rdp

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for 6. General Library Interfaces — Part 1

Create new playlist

Sign In

Sign Up

Chapter 6. General Library Interfaces — Part 1

Times and Dates

Retrieving the Current Time: time() and difftime()

Breaking Down Times: gmtime() and localtime()

Formatting Dates and Times

Simple Time Formatting: asctime() and ctime()

Complex Time Formatting: strftime()

Note

Converting a Broken-Down Time to a time_t

Getting Time-Zone Information

Note

BSD Systems Gotcha: timezone(), Not timezone

Sorting and Searching Functions

Sorting: qsort()

Example: Sorting Employees

Note

Note

Example: Sorting Directory Contents

Binary Searching: bsearch()

User and Group Names

User Database

Note

Group Database

Terminals: isatty()

Suggested Reading

Summary

Exercises

Table of Contents for
6. General Library Interfaces — Part 1

Retrieving the Current Time: `time()` and `difftime()`

Breaking Down Times: `gmtime()` and `localtime()`

Simple Time Formatting: `asctime()` and `ctime()`

Complex Time Formatting: `strftime()`

Converting a Broken-Down Time to a `time_t`

BSD Systems Gotcha: `timezone()`, Not `timezone`

Sorting: `qsort()`

Binary Searching: `bsearch()`

Terminals: `isatty()`