Chapter 7. Putting It All Together: ls

The V7 ls command nicely ties together everything we’ve seen so far. It uses almost all of the APIs we’ve covered, touching on many aspects of Unix programming: memory allocation, file metadata, dates and times, user names, directory reading, and sorting.

V7 ls Options

In comparison to modern versions of ls, the V7 ls accepted only a handful of options and the meaning of some of them is different for V7 than for current ls. The options are as follows:

-a

Print all directory entries. Without this, don’t print ’.’ and ’..’. Interestingly enough, V7 ls ignores only ’.’ and ’..’, while V1 through V6 ignore any file whose name begins with a period. This latter behavior is the default in modern versions of ls, as well.

-c

Use the inode change time, instead of the modification time, with -t or -l.

-d

For directory arguments, print information about the directory itself, not its contents.

-f

“Force” each argument to be read as a directory, and print the name found in each slot. This option disables -l, -r, -s, and -t, and enables -a. (This option apparently existed for filesystem debugging and repair.)

-g

For ’ls -l’, use the group name instead of the user name.

-i

Print the inode number in the first column along with the filename or the long listing.

-l

Provide the familiar long format output. Note, however, that V7 ls -l printed only the user name, not the user and group names together.

-r

Reverse the sort order, be it alphabetic for filenames or by time.

-s

Print the size of the file in 512-byte blocks. The V7 ls(1) manpage states that indirect blocks—blocks used by the filesystem for locating the data blocks of large files—are also included in the computation, but, as we shall see, this statement was incorrect.

-t

Sort the output by modification time, most recent first, instead of by name.

-u

Use the access time instead of the modification time with -t and/or -l.

The biggest differences between V7 ls and modern ls concern the -a option and the -l option. Modern systems omit all dot files unless -a is given, and they include both user and group names in the -l long listing. On modern systems, -g is taken to mean print only the group name, and -o means print only the user name. For what it’s worth, GNU ls has over 50 options!

V7 ls Code

The file /usr/src/cmd/ls.c in the V7 distribution contains the code. It is all of 425 lines long.

  1  /*
  2   * list file or directory
  3   */
  4
  5  #include <sys/param.h>
  6  #include <sys/stat.h>
  7  #include <sys/dir.h>
  8  #include <stdio.h>
  9
 10  #define NFILES 1024
 11  FILE    *pwdf, *dirf;
 12  char    stdbuf[BUFSIZ];
 13
 14  struct lbuf {                            Collects needed info
 15      union {
 16          char    lname[15];
 17          char    *namep;
 18      } ln;
 19      char    ltype;
 20      short   lnum;
 21      short   lflags;
 22      short   lnl;
 23      short   luid;
 24      short   lgid;
 25      long    lsize;
 26      long    lmtime;
 27  };
 28
 29  int aflg, dflg, lflg, sflg, tflg, uflg, iflg, fflg, gflg, cflg;
 30  int rflg   = 1;
 31  long   year;                            Global variables: auto init to 0
 32  int flags;
 33  int lastuid = -1;
 34  char    tbuf[16];
 35  long    tblocks;
 36  int statreq;
 37  struct  lbuf   *flist[NFILES];
 38  struct  lbuf   **lastp = flist;
 39  struct  lbuf   **firstp = flist;
 40  char    *dotp   = ".";
 41
 42  char    *makename();                     char *makename(char *dir, char *file);
 43  struct  lbuf *gstat();                   struct lbuf *gstat(char *file, int argfl);
 44  char    *ctime();                        char *ctime(time_t *t);
 45  long    nblock();                        long nblock(long size);
 46
 47  #define ISARG  0100000

The program starts with file inclusions (lines 5–8) and variable declarations. The struct lbuf (lines 14–27) encapsulates the parts of the struct stat that are of interest to ls. We see later how this structure is filled.

The variables aflg, dflg, and so on (lines 29 and 30) all indicate the presence of the corresponding option. This variable naming style is typical of V7 code. The flist, lastp, and firstp variables (lines 37–39) represent the files that ls reports information about. Note that flist is a fixed-size array, allowing no more than 1024 files to be processed. We see shortly how all these variables are used.

After the variable declarations come function declarations (lines 42–45), and then the definition of ISARG, which distinguishes a file named on the command line from a file found when a directory is read.

 49  main(argc, argv)                               int main(int argc, char **argv)
 50  char *argv[];
 51  {
 52      int i;
 53      register struct lbuf *ep, **ep1;           Variable and function declarations
 54      register struct lbuf **slastp;
 55      struct lbuf **epp;
 56      struct lbuf lb;
 57      char *t;
 58      int compar();
 59
 60      setbuf(stdout, stdbuf);
 61      time(&lb.lmtime);                          Get current time
 62      year = lb.lmtime - 6L*30L*24L*60L*60L; /* 6 months ago */

The main() function starts by declaring variables and functions (lines 52–58), setting the buffer for standard output, retrieving the time of day (lines 60–61), and computing the seconds-since-the-Epoch value for approximately six months ago (line 62). Note that all the constants have the L suffix, indicating the use of long arithmetic.

 63    if (--argc > 0 && *argv[1] == '-') {
 64       argv++;
 65       while (*++*argv) switch (**argv) {   Parse options
 66
 67       case 'a':                            All directory entries
 68           aflg++;
 69           continue;
 70
 71       case 's':                            Size in blocks
 72           sflg++;
 73           statreq++;
 74           continue;
 75
 76       case 'd':                            Directory info, not contents
 77           dflg++;
 78           continue;
 79
 80       case 'g':                            Group name instead of user name
 81           gflg++;
 82           continue;
 83
 84       case 'l':                            Long listing
 85           lflg++;
 86           statreq++;
 87           continue;
 88
 89       case 'r':                            Reverse sort order
 90           rflg = -1;
 91           continue;
 92
 93       case 't':                            Sort by time, not name
 94           tflg++;
 95           statreq++;
 96           continue;
 97
 98       case 'u':                            Access time, not modification time
 99           uflg++;
100           continue;
101
102       case 'c':                            Inode change time, not modification time
103           cflg++;
104           continue;
105
106       case 'i':                            Include inode number
107           iflg++;
108           continue;
109
110       case 'f':                            Force reading each arg as directory
111           fflg++;
112           continue;
113
114       default:                             Ignore unknown option letters
115           continue;
116       }
117       argc--;
118    }

Lines 63–118 parse the command-line options. Note the manual parsing code: getopt() hadn’t been invented yet. The statreq variable is set to true when an option requires the use of the stat() system call.

Avoiding an unnecessary stat() call on each file is a big performance win. The stat() call was particularly expensive, because it could involve a disk seek to the inode location, a disk read to read the inode, and then a disk seek back to the location of the directory contents (in order to continue reading directory entries).

Modern systems have the inodes in groups, spread out throughout a filesystem instead of clustered together at the front. This makes a noticeable performance improvement. Nevertheless, stat() calls are still not free; you should use them as needed, but not any more than that.
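For comparison, here is a minimal sketch of how the same option letters might be parsed today with the POSIX getopt() routine. The struct ls_opts type and the parse_ls_opts() function are hypothetical names introduced only for this illustration; they are not part of V7 ls. The sketch also differs in behavior: it rejects unknown letters instead of ignoring them, and getopt() accepts options spread over several arguments, whereas the V7 code looks at a single option argument.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical container for the V7 ls option flags. */
struct ls_opts {
    int aflg, cflg, dflg, fflg, gflg, iflg, lflg, sflg, tflg, uflg;
    int rflg;       /* 1 for normal order, -1 for reversed */
    int statreq;    /* nonzero if stat() is needed for every file */
};

/*
 * Parse options with getopt().  Returns the index of the first
 * non-option argument, or -1 if an unknown option letter was seen.
 */
int parse_ls_opts(int argc, char **argv, struct ls_opts *o)
{
    int c;

    assert(o != NULL);
    memset(o, 0, sizeof *o);
    o->rflg = 1;
    optind = 1;     /* restart scanning in case getopt() ran before */
    opterr = 0;     /* errors are reported through the return value */

    while ((c = getopt(argc, argv, "acdfgilrstu")) != -1) {
        switch (c) {
        case 'a': o->aflg = 1; break;
        case 'c': o->cflg = 1; break;
        case 'd': o->dflg = 1; break;
        case 'f': o->fflg = 1; break;
        case 'g': o->gflg = 1; break;
        case 'i': o->iflg = 1; break;
        case 'l': o->lflg = 1; o->statreq = 1; break;
        case 'r': o->rflg = -1; break;
        case 's': o->sflg = 1; o->statreq = 1; break;
        case 't': o->tflg = 1; o->statreq = 1; break;
        case 'u': o->uflg = 1; break;
        default:  return -1;            /* unknown option letter */
        }
    }
    if (o->fflg) {                      /* -f overrides -l, -s, -t; adds -a */
        o->aflg = 1;
        o->lflg = o->sflg = o->tflg = o->statreq = 0;
    }
    return optind;
}
```

A caller would pass main()'s argc and argv and then treat argv[result] onward as the file arguments.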

119     if (fflg) {                        -f overrides -l, -s, -t, adds -a
120         aflg++;
121         lflg = 0;
122         sflg = 0;
123         tflg = 0;
124         statreq = 0;
125     }
126     if(lflg) {                         Open password or group file
127          t = "/etc/passwd";
128          if(gflg)
129              t = "/etc/group";
130          pwdf = fopen(t, "r");
131     }
132     if (argc==0) {                     Use current dir if no args
133         argc++;
134         argv = &dotp - 1;
135     }

Lines 119–125 handle the -f option, turning off -l, -s, -t, and statreq. Lines 126–131 handle -l, setting the file to be read for user or group information. Remember that the V7 ls shows only one or the other, not both.

If no arguments are left, lines 132–135 set up argv such that it points at a string representing the current directory. The assignment ’argv = &dotp - 1’ is valid, although unusual. The ’- 1’ compensates for the ’++argv’ on line 137. This avoids special case code for ’argc == 1’ in the main part of the program.

136     for (i=0; i < argc; i++) {               Get info about each file
137         if ((ep = gstat(*++argv, 1))==NULL)
138             continue;
139         ep->ln.namep = *argv;
140         ep->lflags |= ISARG;
141     }
142     qsort(firstp, lastp - firstp, sizeof *lastp, compar);
143     slastp = lastp;
144     for (epp=firstp; epp<slastp; epp++) {     Main code, see text
145         ep = *epp;
146         if (ep->ltype=='d' && dflg==0 ||fflg) {
147             if (argc>1)
 148                 printf("\n%s:\n", ep->ln.namep);
149             lastp = slastp;
150             readdir(ep->ln.namep);
151             if (fflg==0)
152                 qsort(slastp,lastp - slastp,sizeof *lastp,compar);
153             if (lflg || sflg)
 154                 printf("total %D\n", tblocks);
155             for (ep1=slastp; ep1<lastp; ep1++)
156                pentry(*ep1);
157         } else
158             pentry(ep);
159     }
160     exit(0);
161  }                                             End of main()

Lines 136–141 loop over the arguments, gathering information about each one. The second argument to gstat() is a boolean: true if the name is a command-line argument, false otherwise. Line 140 adds the ISARG flag to the lflags field for each command-line argument.

The gstat() function adds each new struct lbuf into the global flist array (line 137). It also updates the lastp global pointer to point into this array at the current last element.

Lines 142–143 sort the array, using qsort(), and save the current value of lastp in slastp. Lines 144–159 loop over each element in the array, printing file or directory info, as appropriate.

The code for directories deserves further explication:

if (ep->ltype=='d' && dflg==0 || fflg) ...

  • Line 146. If the file type is directory and if -d was not provided or if -f was, then ls has to read the directory instead of printing information about the directory itself.

if (argc>1) printf("\n%s:\n", ep->ln.namep)

  • Lines 147–148. Print the directory name and a colon if multiple files were named on the command line.

lastp = slastp; readdir(ep->ln.namep)

  • Lines 149–150. Reset lastp from slastp. The flist array acts as a two-level stack of filenames. The command-line arguments are kept in firstp through slastp - 1. When readdir() reads a directory, it puts the struct lbuf structures for the directory contents onto the stack, starting at slastp and going through lastp - 1. This is illustrated in Figure 7.1.

    Figure 7.1. The flist array as a two-level stack

if (fflg==0) qsort(slastp,lastp - slastp,sizeof *lastp,compar)

  • Lines 151–152. Sort the subdirectory entries if -f is not in effect.

if (lflg || sflg) printf("total %D\n", tblocks)

  • Lines 153–154. Print the total number of blocks used by files in the directory, for -l or -s. This total is kept in the variable tblocks, which is reset for each directory. The %D format string for printf() is equivalent to %ld on modern systems; it means “print a long integer.” (V7 also had %ld, see line 192.)

for (ep1=slastp; ep1<lastp; ep1++) pentry(*ep1)

  • Lines 155–156. Print the information about each file in the subdirectory. Note that the V7 ls descends only one level in a directory tree. It lacks the modern -R “recursive” option.
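The two-level stack can be sketched in isolation. The following is only an illustration under simplifying assumptions: plain strings stand in for struct lbuf pointers, the array is tiny, and push() and list_dir() are hypothetical helpers, not functions from ls.c.

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 8                        /* tiny stand-in for NFILES */

static const char *slots[NSLOTS];       /* stand-in for flist */
static const char **firstp = slots;     /* bottom of the stack */
static const char **lastp = slots;      /* one past the top */

/* Push one name; mirrors '*lastp++ = rep' in gstat(). */
static void push(const char *name)
{
    assert(lastp < &slots[NSLOTS]);
    *lastp++ = name;
}

/*
 * Simulate listing one directory argument: discard the entries of the
 * previously listed directory, push this directory's entries, and
 * return how many entries now sit above slastp.
 */
size_t list_dir(const char **slastp, const char *const *contents, size_t n)
{
    size_t i;

    assert(slastp >= firstp && slastp <= &slots[NSLOTS]);
    lastp = slastp;                     /* pop the previous directory */
    for (i = 0; i < n; i++)
        push(contents[i]);
    return (size_t)(lastp - slastp);
}
```

The lower level (firstp up to slastp) stays put for the whole run, while the upper level is overwritten for each directory in turn. This is how the fixed-size array gets reused.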

163  pentry(ap)                                 void pentry(struct lbuf *ap)
164  struct lbuf *ap;
165  {
166      struct { char dminor, dmajor;};        Unused historical artifact from V6 ls
167      register t;
168      register struct lbuf *p;
169      register char *cp;
170
171      p = ap;
172      if (p->lnum == -1)
173          return;
174      if (iflg)
175          printf("%5u ", p->lnum);           Inode number
176      if (sflg)
177      printf("%4D ", nblock(p->lsize));      Size in blocks

The pentry() routine prints information about a file. Lines 172–173 check whether the lnum field is -1, and return if so. When ’p->lnum == -1’ is true, the struct lbuf is not valid. Otherwise, this field is the file’s inode number.

Lines 174–175 print the inode number if -i is in effect. Lines 176–177 print the file’s size in blocks if -s is in effect. (As we see below, this number may not be accurate.)

178     if (lflg) {                             Long listing:
179         putchar(p->ltype);                  – File type
180         pmode(p->lflags);                   – Permissions
181         printf("%2d ", p->lnl);             – Link count
182         t = p->luid;
183         if(gflg)
184             t = p->lgid;
185         if (getname(t, tbuf)==0)
186             printf("%-6.6s", tbuf);         – User or group
187         else
188             printf("%-6d", t);
189         if (p->ltype=='b' || p->ltype=='c') – Device: major and minor numbers
190             printf("%3d,%3d", major((int)p->lsize), minor((int)p->lsize));
191         else
192             printf("%7ld", p->lsize);       – Size in bytes
193         cp = ctime(&p->lmtime);
194         if(p->lmtime < year)                – Modification time
195             printf(" %-7.7s %-4.4s ", cp+4, cp+20); else
196             printf(" %-12.12s ", cp+4);
197     }
198     if (p->lflags&ISARG)                    – Filename
199         printf("%s\n", p->ln.namep);
200     else
201         printf("%.14s\n", p->ln.lname);
202  }

Lines 178–197 handle the -l option. Lines 179–181 print the file’s type, permissions, and number of links. Lines 182–184 set t to the user ID or the group ID, based on the -g option. Lines 185–188 retrieve the corresponding name and print it if available. Otherwise, the program prints the numeric value.

Lines 189–192 check whether the file is a block or character device. If it is, they print the major and minor device numbers, extracted with the major() and minor() macros. Otherwise, they print the file’s size.

Lines 193–196 print the time of interest. If it’s older than six months, the code prints the month, day, and year. Otherwise, it prints the month, day, and time (see Section 6.1.3.1, “Simple Time Formatting: asctime() and ctime(),” page 170, for the format of ctime()’s result).
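The offsets into the ctime() string are easier to see in a standalone sketch. The ls_time() function below is a hypothetical helper, not part of ls.c. As an assumption made purely so that the output does not depend on the local time zone, it uses asctime(gmtime()) where ls uses ctime(); the string layout is the same.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define SIX_MONTHS (6L*30L*24L*60L*60L)     /* same approximation as line 62 */

/* Format 'when' the way pentry() does, relative to the time 'now'. */
void ls_time(time_t when, time_t now, char *out, size_t outlen)
{
    struct tm *tm = gmtime(&when);
    const char *cp;

    assert(tm != NULL);
    cp = asctime(tm);               /* e.g. "Thu Jan  1 00:00:00 1970\n" */
    assert(strlen(cp) >= 24);

    if (when < now - SIX_MONTHS)    /* old: month, day, year */
        snprintf(out, outlen, " %-7.7s %-4.4s ", cp + 4, cp + 20);
    else                            /* recent: month, day, hh:mm */
        snprintf(out, outlen, " %-12.12s ", cp + 4);
}
```

Here cp + 4 skips the day name and cp + 20 points at the year, which is why the two precisions (7 and 4, or 12) pick out exactly the fields described above.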

Finally, lines 198–201 print the filename. For a command-line argument, we know it’s a zero-terminated string, and %s can be used. For a file read from a directory, it may not be zero-terminated, and thus an explicit precision, %.14s, must be used.
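The effect of the precision can be shown with a small sketch. The V7_DIRSIZ macro and the name_to_string() function are hypothetical names used only here; the point is that a precision bounds how many bytes printf-style functions read from a character array that may lack a terminating zero byte.

```c
#include <assert.h>
#include <stdio.h>

#define V7_DIRSIZ 14    /* hypothetical name; V7 itself calls this DIRSIZ */

/*
 * Copy a fixed-width, possibly unterminated directory name into an
 * ordinary C string.  Returns the number of characters stored.
 */
int name_to_string(const char raw[V7_DIRSIZ], char *out, size_t outlen)
{
    assert(outlen > V7_DIRSIZ);
    /* The precision stops the copy after V7_DIRSIZ bytes at most. */
    return snprintf(out, outlen, "%.*s", V7_DIRSIZ, raw);
}
```

A name shorter than 14 characters is padded with zero bytes in the directory slot, so the copy stops early in that case.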

204  getname(uid, buf)                             int getname(int uid, char buf[])
205  int uid;
206  char buf[];
207  {
208      int j, c, n, i;
209
210      if (uid==lastuid)                         Simple caching, see text
211          return(0);
212      if(pwdf == NULL)                          Safety check
213          return(-1);
214      rewind(pwdf);                             Start at front of file
215      lastuid = -1;
216      do {
217          i = 0;                                Index in buf array
218          j = 0;                                Counts fields in line
219          n = 0;                                Converts numeric value
220          while((c=fgetc(pwdf)) != '\n') {      Read lines
221              if (c==EOF)
222                  return(-1);
223              if (c==':') {                     Count fields
224                  j++;
225                  c = '0';
226              }
227              if (j==0)                         First field is name
228                  buf[i++] = c;
229              if (j==2)                         Third field is numeric ID
230                  n = n*10 + c - '0';
231         }
232   } while (n != uid);                          Keep searching until ID found
233   buf[i++] = '\0';
234   lastuid = uid;
235   return(0);
236 }

The getname() function converts a user or group ID number into the corresponding name. It implements a simple caching scheme; if the passed-in uid is the same as the global variable lastuid, then the function returns 0, for OK; the buffer will already contain the name (lines 210–211). lastuid is initialized to -1 (line 33), so this test fails the first time getname() is called.

pwdf is already open on either /etc/passwd or /etc/group (see lines 126–130). The code here checks that the open succeeded and returns -1 if it didn’t (lines 212–213).

Surprisingly, ls does not use getpwuid() or getgrgid(). Instead, it takes advantage of the facts that the format of /etc/passwd and /etc/group is identical for the first three fields (name, password, numeric ID) and that both use a colon as separator.

Lines 216–232 implement a linear search through the file. j counts the number of colons seen so far: 0 for the name and 2 for the ID number. Thus, while scanning the line, it fills in both the name and the ID number.

Lines 233–235 terminate the name buffer, set the global lastuid to the found ID number, and return 0 for OK.
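The field-counting idea can also be sketched on a single line held in memory. The parse_id_line() function below is a hypothetical helper, not part of ls.c, and it relies on strtol() rather than the digit-by-digit conversion that getname() performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Split a "name:password:id:..." line and report the name and the
 * numeric ID.  Returns 0 on success, or -1 if the line has fewer than
 * three fields, the ID is not numeric, or the name does not fit.
 */
int parse_id_line(const char *line, char *name, size_t namelen, long *id)
{
    const char *p = line;
    char *end;
    size_t i = 0;

    assert(namelen > 0);
    while (*p != '\0' && *p != ':' && *p != '\n') {  /* field 0: name */
        if (i + 1 >= namelen)
            return -1;
        name[i++] = *p++;
    }
    name[i] = '\0';
    if (*p != ':')
        return -1;
    p++;
    while (*p != '\0' && *p != ':' && *p != '\n')    /* field 1: skipped */
        p++;
    if (*p != ':')
        return -1;
    p++;
    *id = strtol(p, &end, 10);                       /* field 2: numeric ID */
    if (end == p || (*end != ':' && *end != '\n' && *end != '\0'))
        return -1;
    return 0;
}
```

Because the first three fields of /etc/passwd and /etc/group have the same layout, the same routine would serve for either file.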

238  long                                        long nblock(long size)
239  nblock(size)
240  long size;
241  {
242       return((size+511)>>9);
243  }

The nblock() function reports how many disk blocks the file uses. This calculation is based on the file’s size as returned by stat(). The V7 block size was 512 bytes—the size of a physical disk sector.

The calculation on line 242 looks a bit scary. The ’>>9’ is a right-shift by nine bits. This divides by 512, to give the number of blocks. (On early hardware, a right-shift was much faster than division.) So far, so good. Now, a file of even one byte still takes up a whole disk block. However, ’1 / 512’ comes out as zero (integer division truncates), which is incorrect. This explains the ’size+511’. By adding 511, the code ensures that the sum produces the correct number of blocks when it is divided by 512.
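The same rounding trick works for any block size. The blocks_for() function below is a hypothetical generalization written for illustration; with a block size of 512 it gives the same answers as nblock() for nonnegative sizes.

```c
#include <assert.h>

/* Number of blksize-byte blocks needed to hold size bytes, rounded up. */
long blocks_for(long size, long blksize)
{
    assert(size >= 0 && blksize > 0);
    return (size + blksize - 1) / blksize;
}
```

For example, a size of 0 needs no blocks, sizes 1 through 512 need one block, and 513 needs two.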

This calculation is only approximate, however. Very large files also have indirect blocks. Despite the claim in the V7 ls(1) manpage, this calculation does not account for indirect blocks.

Furthermore, consider the case of a file with large holes (created by seeking way past the end of the file with lseek()). Holes don’t occupy disk blocks; however, this is not reflected in the size value. Thus, the calculation produced by nblock(), while usually correct, could produce results that are either smaller or larger than the real case.

For these reasons, the st_blocks member was added into the struct stat at 4.2 BSD, and then picked up for System V and POSIX.

245  int m1[] = { 1, S_IREAD>>0, 'r', '-' };
246  int m2[] = { 1, S_IWRITE>>0, 'w', '-' };
247  int m3[] = { 2, S_ISUID, 's', S_IEXEC>>0, 'x', '-' };
248  int m4[] = { 1, S_IREAD>>3, 'r', '-' };
249  int m5[] = { 1, S_IWRITE>>3, 'w', '-' };
250  int m6[] = { 2, S_ISGID, 's', S_IEXEC>>3, 'x', '-' };
251  int m7[] = { 1, S_IREAD>>6, 'r', '-' };
252  int m8[] = { 1, S_IWRITE>>6, 'w', '-' };
253  int m9[] = { 2, S_ISVTX, 't', S_IEXEC>>6, 'x', '-' };
254
255  int *m[] = { m1, m2, m3, m4, m5, m6, m7, m8, m9};
256
257  pmode(aflag)                                   void pmode(int aflag)
258  {
259      register int **mp;
260
261      flags = aflag;
262      for (mp = &m[0]; mp < &m[sizeof(m)/sizeof(m[0])];)
263         select(*mp++);
264  }
265
266  select(pairp)                                 void select(register int *pairp)   
267  register int *pairp;
268  {
269      register int n;
270
271      n = *pairp++;
272      while (--n>=0 && (flags&*pairp++)==0)
273          pairp++;
274      putchar(*pairp);
275  }

Lines 245–275 print the file’s permissions. The code is compact and rather elegant; it requires careful study.

  • Lines 245–253: The arrays m1 through m9 encode the permission bits to check for along with the corresponding characters to print. There is one array per character to print in the file mode. The first element of each array is the number of (permission, character) pairs encoded in that particular array. The final element is the character to print in the event that none of the given permission bits are found.

    Note also how the permissions are specified as ’S_IREAD>>0’, ’S_IREAD>>3’, ’S_IREAD>>6’, and so on. The individual constants for each bit (S_IRUSR, S_IRGRP, etc.) had not been invented yet. (See Table 4.5 in Section 4.6.1, “Specifying Initial File Permissions”, page 106.)

  • Line 255: The m array points to each of the m1 through m9 arrays.

  • Lines 257–264: The pmode() function first sets the global variable flags to the passed-in parameter aflag. It then loops through the m array, passing each element to the select() function. The passed-in element represents one of the m1 to m9 arrays.

  • Lines 266–275: The select() function understands the layout of each m1 through m9 array. n is the number of pairs in the array (the first element); line 271 sets it. Lines 272–273 look for permission bits, checking the global variable flags set previously on line 261.

Note the use of the ++ operator, both in the loop test and in the loop body. The effect is to skip over pairs in the array as long as the permission bit in the first element of the pair is not found in flags.

When the loop ends, either the permission bit has been found, in which case pairp points at the second element of the pair, which is the correct character to print, or it has not been found, in which case pairp points at the default character. In either case, line 274 prints the character that pairp points to.

A final point worth noting is that in C, character constants (such as 'x') have type int, not char.[1] So there’s no problem putting such constants into an integer array; everything works correctly.
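For readers who find the pointer arithmetic hard to follow, here is a sketch of the same table-driven idea written with the modern per-bit constants. The struct mode_col type, the cols table, and the mode_string() function are hypothetical names for this illustration. It fills a buffer instead of calling putchar(), but it applies the same rule as select(): for each of the nine columns, a special bit, if present, takes precedence over the ordinary bit.

```c
#define _XOPEN_SOURCE 700   /* for S_ISVTX on strict compilers */
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* One output column: an optional special bit, then the ordinary bit. */
struct mode_col {
    mode_t special;     /* S_ISUID, S_ISGID, S_ISVTX, or 0 if none */
    char   special_ch;
    mode_t normal;
    char   normal_ch;
};

static const struct mode_col cols[9] = {
    { 0,       0,   S_IRUSR, 'r' },
    { 0,       0,   S_IWUSR, 'w' },
    { S_ISUID, 's', S_IXUSR, 'x' },
    { 0,       0,   S_IRGRP, 'r' },
    { 0,       0,   S_IWGRP, 'w' },
    { S_ISGID, 's', S_IXGRP, 'x' },
    { 0,       0,   S_IROTH, 'r' },
    { 0,       0,   S_IWOTH, 'w' },
    { S_ISVTX, 't', S_IXOTH, 'x' },
};

/* Fill out[] with the nine permission characters plus a terminator. */
void mode_string(mode_t mode, char out[10])
{
    size_t i;

    assert(out != NULL);
    for (i = 0; i < 9; i++) {
        const struct mode_col *c = &cols[i];

        if (c->special != 0 && (mode & c->special) != 0)
            out[i] = c->special_ch;     /* special bit wins, as in select() */
        else if ((mode & c->normal) != 0)
            out[i] = c->normal_ch;
        else
            out[i] = '-';               /* default character */
    }
    out[9] = '\0';
}
```

Trading the variable-length int arrays for a fixed struct costs a few unused fields but makes each column's rule visible at a glance.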

277  char *                      char *makename(char *dir, char *file)
278  makename(dir, file)
279  char *dir, *file;
280  {
281      static char dfile[100];
282      register char *dp, *fp;
283      register int i;
284
285      dp = dfile;
286      fp = dir;
287      while (*fp)
288          *dp++ = *fp++;
289      *dp++ = '/';
290      fp = file;
291      for (i=0; i<DIRSIZ; i++)
292          *dp++ = *fp++;
293      *dp = 0;
294      return(dfile);
295 }

Lines 277–295 define the makename() function. Its job is to concatenate a directory name and a filename, separated by a slash character, and produce a string. It does this in the static buffer dfile. Note that dfile is only 100 characters long and that no error checking is done.

The code itself is straightforward, copying characters one at a time. makename() is used by the readdir() function.
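A bounds-checked variant might look like the following sketch. The join_path() function and its parameters are hypothetical, introduced only to show where the missing check would go. Unlike makename(), it writes into a caller-supplied buffer and stops at the first zero byte in the name instead of always copying DIRSIZ bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Build "dir/name" in out.  name is a fixed-width field of at most
 * namemax bytes that may lack a terminating zero byte.
 * Returns 0 on success, or -1 if the result would not fit in outlen bytes.
 */
int join_path(char *out, size_t outlen, const char *dir,
              const char *name, size_t namemax)
{
    size_t dlen = strlen(dir);
    size_t nlen = 0;

    assert(out != NULL);
    while (nlen < namemax && name[nlen] != '\0')    /* bounded length */
        nlen++;
    if (dlen + 1 + nlen + 1 > outlen)               /* dir + '/' + name + NUL */
        return -1;
    memcpy(out, dir, dlen);
    out[dlen] = '/';
    memcpy(out + dlen + 1, name, nlen);
    out[dlen + 1 + nlen] = '\0';
    return 0;
}
```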

297  readdir(dir)                      void readdir(char *dir)
298  char *dir;
299  {
300      static struct direct dentry;
301      register int j;
302      register struct lbuf *ep;
303
304      if ((dirf = fopen(dir, "r")) == NULL) {
305          printf("%s unreadable\n", dir);
306          return;
307      }
308      tblocks = 0;
309    for(;;) {
310        if (fread((char *) &dentry, sizeof(dentry), 1, dirf) != 1)
311            break;
312        if (dentry.d_ino==0
313         || aflg==0 && dentry.d_name[0]=='.' && (dentry.d_name[1]=='\0'
314            || dentry.d_name[1]=='.' && dentry.d_name[2]=='\0'))
315            continue;
316        ep = gstat(makename(dir, dentry.d_name), 0);
317        if (ep==NULL)
318            continue;
319        if (ep->lnum != -1)
320            ep->lnum = dentry.d_ino;
321        for (j=0; j<DIRSIZ; j++)
322            ep->ln.lname[j] = dentry.d_name[j];
323    }
324    fclose(dirf);
325  }

Lines 297–325 define the readdir() function, whose job is to read the contents of directories named on the command line.

Lines 304–307 open the directory for reading, returning if fopen() fails. Line 308 initializes the global variable tblocks to 0. This was used earlier (lines 153–154) to print the total number of blocks used by files in a directory.

Lines 309–323 are a loop that reads directory entries and adds them to the flist array. Lines 310–311 read one entry, exiting the loop upon end-of-file.

Lines 312–315 skip uninteresting entries. If the inode number is zero, this slot isn’t used. Otherwise, if -a was not given and the filename is either ’.’ or ’..’, skip it.

Lines 316–318 call gstat() with the full name of the file, and a second argument of false, indicating that it’s not from the command line. gstat() updates the global lastp pointer and the flist array. A NULL return value indicates some sort of failure.

Lines 319–322 save the inode number and name in the struct lbuf. If ep->lnum comes back from gstat() set to -1, it means that the stat() operation on the file failed. Finally, line 324 closes the directory.
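On modern systems a program cannot read a directory with fopen() and fread(); the POSIX opendir(), readdir(), and closedir() routines are used instead. (Note that the V7 ls function happens to share its name with the modern library routine.) The following sketch shows the same skip logic for ’.’ and ’..’ in that style. The count_entries() function is a hypothetical example, not a translation of the V7 code.

```c
#include <assert.h>
#include <dirent.h>
#include <string.h>

/*
 * Count the entries in dir, skipping "." and ".." unless all is nonzero
 * (the V7 -a behavior).  Returns -1 if the directory cannot be opened.
 */
int count_entries(const char *dir, int all)
{
    DIR *dp;
    struct dirent *ent;
    int n = 0;

    assert(dir != NULL);
    if ((dp = opendir(dir)) == NULL)
        return -1;
    while ((ent = readdir(dp)) != NULL) {
        if (!all && (strcmp(ent->d_name, ".") == 0 ||
                     strcmp(ent->d_name, "..") == 0))
            continue;               /* same test as lines 312-315 */
        n++;
    }
    closedir(dp);
    return n;
}
```

There is no check for a zero inode number here, on the assumption that the library does not hand unused slots back to the caller.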

The following function, gstat() (lines 327–398), is the core function for the operation of retrieving and storing file information.

327  struct lbuf *                                 struct lbuf *gstat(char *file, int argfl)
328  gstat(file, argfl)
329  char *file;
330  {
331      extern char *malloc();
332      struct stat statb;
333      register struct lbuf *rep;
334      static int nomocore;
335
336      if (nomocore)                             Ran out of memory earlier
337          return(NULL);
338      rep = (struct lbuf *)malloc(sizeof(struct lbuf));
339      if (rep==NULL) {
340          fprintf(stderr, "ls: out of memory\n");
341          nomocore = 1;
342          return(NULL);
343      }
344      if (lastp >= &flist[NFILES]) {            Check whether too many files given
345          static int msg;
346          lastp--;
347          if (msg==0) {
348              fprintf(stderr, "ls: too many files\n");
349              msg++;
350          }
351      }
352      *lastp++ = rep;                           Fill in information
353      rep->lflags = 0;
354      rep->lnum = 0;
355      rep->ltype = '-';                         Default file type

The static variable nomocore [sic] indicates that malloc() failed upon an earlier call. Since it’s static, it’s automatically initialized to 0 (that is, false). If it’s true upon entry, gstat() just returns NULL. Otherwise, if malloc() fails, ls prints an error message, sets nomocore to true, and returns NULL (lines 334–343).

Lines 344–351 make sure that there is still room left in the flist array. If not, ls prints a message (but only once; note the use of the static variable msg), and then reuses the last slot in flist.

Line 352 stores the pointer to the new struct lbuf (rep) in the slot that lastp points to and then advances lastp, which is used for sorting in main() (lines 142 and 152). Lines 353–355 set default values for the flags, inode number, and type fields in the struct lbuf.

356    if (argfl || statreq) {
357        if (stat(file, &statb)<0) {          stat() failed
358            printf("%s not found\n", file);
359            statb.st_ino = -1;
360            statb.st_size = 0;
361            statb.st_mode = 0;
362            if (argfl) {
363                lastp--;
364                return(0);
365            }
366        }
367        rep->lnum = statb.st_ino;            stat() OK, copy info
368        rep->lsize = statb.st_size;
369        switch(statb.st_mode&S_IFMT) {
370
371        case S_IFDIR:
372            rep->ltype = 'd';
373            break;
374
375        case S_IFBLK:
376            rep->ltype = 'b';
377            rep->lsize = statb.st_rdev;
378            break;
379
380        case S_IFCHR:
381            rep->ltype = 'c';
382            rep->lsize = statb.st_rdev;
383            break;
384        }
385        rep->lflags = statb.st_mode & ~S_IFMT;
386        rep->luid = statb.st_uid;
387        rep->lgid = statb.st_gid;
388        rep->lnl = statb.st_nlink;
389        if(uflg)
390            rep->lmtime = statb.st_atime;
391        else if (cflg)
392            rep->lmtime = statb.st_ctime;
393        else
394            rep->lmtime = statb.st_mtime;
395        tblocks += nblock(statb.st_size);
396    }
397    return(rep);
398  }

Lines 356–396 handle the call to stat(). If this is a command-line argument or if statreq is true because of an option, the code fills in the struct lbuf as follows:

  • Lines 357–366: Call stat(). If it fails, print an error message and set placeholder values; for a command-line argument, also give back the flist slot and return NULL (expressed as 0).

  • Lines 367–368: Set the inode number and size fields from the struct stat if the stat() succeeded.

  • Lines 369–384: Handle the special cases of directory, block device, and character device. In all cases the code updates the ltype field. For devices, the lsize value is replaced with the st_rdev value.

  • Lines 385–388: Fill in the lflags, luid, lgid, and lnl fields from the corresponding fields in the struct stat. Line 385 removes the file-type bits, leaving the 12 permissions bits (read/write/execute for user/group/other, and setuid, setgid, and save-text).

  • Lines 389–394: Based on command-line options, use one of the three time fields from the struct stat for the lmtime field in the struct lbuf.

  • Line 395: Update the global variable tblocks with the number of blocks in the file.
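The core of these steps can be sketched with the modern file-type macros. The struct finfo type and the type_char() and get_info() functions are hypothetical names for this illustration. The sketch keeps only the type, permission bits, and size, and leaves out the owner, link count, time selection, and block total that gstat() also handles.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical, much reduced counterpart of struct lbuf. */
struct finfo {
    char   type;    /* 'd', 'b', 'c', or '-' as in V7 ls */
    mode_t perms;   /* the 12 permission bits */
    off_t  size;
};

/* Map a file mode to the type letter that V7 ls prints. */
char type_char(mode_t mode)
{
    if (S_ISDIR(mode))
        return 'd';
    if (S_ISBLK(mode))
        return 'b';
    if (S_ISCHR(mode))
        return 'c';
    return '-';                 /* default file type, as on line 355 */
}

/* Fill in *fi from stat().  Returns 0 on success, -1 if stat() fails. */
int get_info(const char *path, struct finfo *fi)
{
    struct stat sb;

    assert(fi != NULL);
    if (stat(path, &sb) < 0)
        return -1;
    fi->type = type_char(sb.st_mode);
    fi->perms = sb.st_mode & 07777;     /* drop the file-type bits */
    fi->size = sb.st_size;
    return 0;
}
```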

400  compar(pp1, pp2)                     int compar(struct lbuf **pp1,
401  struct lbuf **pp1, **pp2;                        struct lbuf **pp2)
402  {
403      register struct lbuf *p1, *p2;
404
405      p1 = *pp1;
406      p2 = *pp2;
407      if (dflg==0) {
408          if (p1->lflags&ISARG && p1->ltype=='d') {
409              if (!(p2->lflags&ISARG && p2->ltype=='d'))
410                  return(1);
411          } else {
412              if (p2->lflags&ISARG && p2->ltype=='d')
413                  return(-1);
414          }
415      }
416      if (tflg) {
417          if(p2->lmtime == p1->lmtime)
418              return(0);
419          if(p2->lmtime > p1->lmtime)
420              return(rflg);
421          return(-rflg);
422      }
423      return(rflg * strcmp(p1->lflags&ISARG? p1->ln.namep: p1->ln.lname,
424                  p2->lflags&ISARG? p2->ln.namep: p2->ln.lname));
425  }

The compar() function is dense: There’s a lot happening in little space. The first thing to remember is the meaning of the return value: A negative value means that the first file should sort to an earlier spot in the array than the second, zero means the files are equal, and a positive value means that the second file should sort to an earlier spot than the first.

The next thing to understand is that ls prints the contents of directories after it prints information about files. Thus the result of sorting should be that all directories named on the command line follow all files named on the command line.

Finally, the rflg variable helps implement the -r option, which reverses the sorting order. It is initialized to 1 (line 30). If -r is used, rflg is set to -1 (lines 89–91).

The following pseudocode describes the logic of compar(); the line numbers in the left margin correspond to those of ls.c:

407  if ls has to read directories # dflg == 0
408      if p1 is a command-line arg and p1 is a directory
409          if p2 is not both a command-line arg and a directory
410              return 1 # first comes after second
             else
                 fall through to time test
411      else
             # p1 is not a command-line directory
412          if p2 is a command-line arg and is a directory
413              return -1 # first comes before second
             else
                 fall through to time test
416  if sorting is based on time # tflg is true
         # compare times:
417      if p2's time is equal to p1's time
418          return 0
419      if p2's time > p1's time
420          return the value of rflg (positive or negative)
      # p2's time < p1's time
421      return opposite of rflg (negative or positive)

423  Multiply rflg by the result of strcmp()
424  on the two names and return the result

The arguments to strcmp() on lines 423–424 look messy. What’s going on is that different members of the ln union in the struct lbuf must be used, depending on whether the filename is a command-line argument or was read from a directory.
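The time and name comparisons can be sketched with the prototype that Standard C requires of a qsort() comparison function. The struct entry type, the by_time and order variables, and the sort_entries() wrapper are hypothetical stand-ins for struct lbuf, tflg, rflg, and the calls in main(). The sketch leaves out the first test, the one that moves command-line directories to the end.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct lbuf. */
struct entry {
    const char *name;
    long mtime;
};

static int by_time;         /* plays the role of tflg */
static int order = 1;       /* plays the role of rflg: 1 or -1 */

/* qsort() passes pointers to array elements, which are themselves pointers. */
int compare_entries(const void *a, const void *b)
{
    const struct entry *p1 = *(const struct entry *const *)a;
    const struct entry *p2 = *(const struct entry *const *)b;

    if (by_time) {
        if (p2->mtime == p1->mtime)
            return 0;
        return (p2->mtime > p1->mtime) ? order : -order;  /* newest first */
    }
    return order * strcmp(p1->name, p2->name);
}

/* Sort an array of entry pointers by name or by time, optionally reversed. */
void sort_entries(struct entry **v, size_t n, int use_time, int reverse)
{
    assert(v != NULL || n == 0);
    by_time = use_time;
    order = reverse ? -1 : 1;
    qsort(v, n, sizeof *v, compare_entries);
}
```

As in ls.c, the comparison function reads its options from file-scope variables, because qsort() offers no way to pass extra arguments through to it.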

Summary

  • The V7 ls is a relatively small program, yet it touches on many of the fundamental aspects of Unix programming: file I/O, file metadata, directory contents, users and groups, time and date values, sorting, and dynamic memory management.

  • The most notable external difference between V7 ls and modern ls is the treatment of the -a and -l options. The V7 version has many fewer options than do modern versions; a noticeable lack is the -R recursive option.

  • The management of flist is a clean way to use the limited memory of the PDP-11 architecture yet still provide as much information as possible. The struct lbuf nicely abstracts the information of interest from the struct stat; this simplifies the code considerably. The code for printing the nine permission bits is compact and elegant.

  • Some parts of ls use surprisingly small limits, such as the upper bound of 1024 on the number of files or the buffer size of 100 in makename().

Exercises

  1. Consider the getname() function. What happens if the requested ID number is 216 and the following two lines exist in /etc/passwd, in this order:

    joe:xyzzy:2160:10:Joe User:/usr/joe:/bin/sh
    jane:zzyxx:216:12:Jane User:/usr/jane:/bin/sh
    
  2. Consider the makename() function. Could it use sprintf() to make the concatenated name? Why or why not?

  3. Are lines 319–320 in readdir() really necessary?

  4. Take the stat program you wrote for the exercises in “Exercises” for Chapter 6, page 205. Add the nblock() function from the V7 ls, and print the results along with the st_blocks field from the struct stat. Add a visible marker when they’re different.

  5. How would you grade the V7 ls on its use of malloc()? (Hint: how often is free() called? Where should it be called?)

  6. How would you grade the V7 ls for code clarity? (Hint: how many comments are there?)

  7. Outline the steps you would take to adapt the V7 ls for modern systems.



[1] This is different in C++: There, character constants do have type char. This difference does not affect this particular code.
