Sparse Files

The previous sections have focused on reading, writing, and truncating files. Now turn your attention briefly to the physical makeup of UNIX regular files. UNIX regular files have a special quality, which is supported by the kernel, that permits them to be sparsely populated.

A sparse file is a lot like the sparse matrices that you may have learned about in school. The following represents a sparse matrix:

0 0 0 0 9
0 0 0 7 0
0 0 8 0 0
0 1 0 0 0
3 0 0 0 0

You can see that this matrix is made up entirely of zeros, except for one diagonal. Storing every value requires 5 * 5 = 25 cells, yet it is wasteful to use 25 cells when only 5 of them are non-zero. One form of sparse matrix might be optimized to store only the diagonal values and to supply zeros when any of the non-diagonal cells is requested.
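As a rough illustration of that idea, the following sketch stores only the five diagonal values of the matrix shown above and supplies zero for every other cell. The names N, diag, and sparse_get() are hypothetical and chosen only for this example; a real sparse matrix library would use a more general representation.

```c
#include <stddef.h>

#define N 5   /* Matrix dimension, taken from the example matrix */

/* Hypothetical compact form: one stored value per row, namely the
 * value on the diagonal running from top right to bottom left
 * (row r, column N-1-r). */
static const int diag[N] = { 9, 7, 8, 1, 3 };

/* Return the value at (row, col). Cells off the diagonal are never
 * stored, so zero is supplied for them. */
int
sparse_get(size_t row, size_t col) {
    if ( row >= N || col >= N )
        return 0;                       /* Out of range: treat as empty */
    return ( row + col == N - 1 ) ? diag[row] : 0;
}
```

Calling sparse_get(0,4) yields the stored 9 from the first row, while sparse_get(0,0) yields a zero that was never stored. Only 5 cells are kept instead of 25.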

Creating a Sparse File

Sparse files work the same way. It is possible to create a 1GB file with only a few bytes of real data in it. The simple program in Listing 4.3 does this.

Code Listing 4.3. sparse.c: Creating a Sparse File
/* sparse.c */

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

int
main(int argc,char **argv) {
    int z;                              /* Return status code */
    off_t o;                            /* Offset */
    int fd;                             /* Write file descriptor */

    /*
     * Create/truncate sparse.dat
     */
    fd = open("sparse.dat",O_CREAT|O_WRONLY|O_TRUNC,0640);
    if ( fd == -1 ) {
        fprintf(stderr,"%s: opening sparse.dat for write\n",
            strerror(errno));
        return 1;                       /* Failed */
    }

    /*
     * Seek to almost the 1GB mark :
     */
    o = lseek(fd,(off_t)1023*1024*1024,SEEK_SET); /* Seek to ~1GB */
    if ( o == (off_t)(-1) ) {
        fprintf(stderr,"%s: lseek(2)\n",strerror(errno));
        return 2;
    }

    /*
     * Write a little message :
     */
    z = write(fd,"END-OF-SPARSE-FILE",18);
    if ( z == -1 ) {
        fprintf(stderr,"%s: write(2)\n",strerror(errno));
        return 2;
    }

    close(fd);                          /* Close the file */

    return 0;
}


A compile-and-test session for this program is shown next:

$ make sparse
cc -c -D_POSIX_C_SOURCE=199309L -Wall sparse.c
cc sparse.o -o sparse
$ ./sparse
$ ls -l sparse.dat
-rw-r-----  1 me   mygrp 1072693266 Apr 17 02:36 sparse.dat
$ od -cx sparse.dat
0000000                                 
            0000    0000    0000    0000    0000    0000    0000    0000
*
7774000000    E   N   D   -   O   F   -   S   P   A   R   S   E   -   F   I
            4e45    2d44    464f    532d    4150    5352    2d45    4946
7774000020    L   E                                                       
            454c                                                       
7774000022
$

After the program is compiled and run, the ls(1) command lists the file sparse.dat that it creates. Notice its huge size of 1072693266 bytes. You may not even have that much free space left! Yet the file exists.

Next, the od(1) command is used to dump the contents of this file both in hexadecimal and, where possible, in ASCII (options -cx). This command may run for a very long time, because od(1) must read almost 1GB of zero bytes before reaching the end of the file.

Looking at the od(1) output, you can see that UNIX has provided zero bytes between the beginning of the file and the point where the seek was done, and it finally found the string "END-OF-SPARSE-FILE" that was written by the program. At the left of the output, where od(1) shows the file offset, you can see that the string was written at a very large file offset.

Although sparse.dat appears to be almost 1GB in size, only a small amount of disk space is actually allocated to it. There is no need to panic about wasted disk space, because the file system allocates only enough space to hold the 18 bytes that were written. Whenever any program reads other parts of this sparse file, which is largely one big hole, the UNIX kernel simply returns bytes with the value zero.
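One way to see the difference between apparent size and allocated space is to compare the st_size and st_blocks members returned by stat(2). The sketch below is not part of Listing 4.3, and the function name file_usage() is hypothetical. It assumes that st_blocks counts 512-byte units, which is common but is not guaranteed on every system.

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Report the logical size of a file and an estimate of the disk
 * space allocated to it. Returns 0 on success, or -1 if stat(2)
 * fails (errno is left as set by stat(2)).
 *
 * Assumption: st_blocks is expressed in 512-byte units.
 */
int
file_usage(const char *path,off_t *logical,off_t *allocated) {
    struct stat sb;                     /* File status information */

    if ( stat(path,&sb) == -1 )
        return -1;                      /* Failed */

    *logical   = sb.st_size;            /* Size that ls -l reports */
    *allocated = (off_t)sb.st_blocks * 512; /* Space actually in use */
    return 0;
}
```

On a file system that supports holes, calling file_usage() on sparse.dat would be expected to report an allocated value far smaller than the logical size. The du(1) command reports the same kind of information from the shell.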

Warning

It is probably a good idea to delete the sparse.dat file that was created by the example program. Sparse files can provide a real headache for backup programs, because many backup programs simply copy the file in question to the backup medium. If a backup is performed for your sparse.dat file, almost a gigabyte of zeros will be copied to the backup medium. For this reason, smarter backup utility programs know about sparse files and copy only the active information within them.

Sparse files can also be a problem when you copy them. If you attempt to copy your sparse.dat file to another location in your current directory, you may run out of disk space.
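A copy does not have to fill in the holes. The following sketch shows one common approach: read the source in fixed-size blocks and, whenever a block contains only zero bytes, seek forward in the destination instead of writing. The names sparse_copy(), all_zero(), and BLK are hypothetical, the 4096-byte block size is an arbitrary assumption, and a short write is simply treated as an error. Whether the destination actually ends up with holes depends on the file system.

```c
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>

#define BLK 4096    /* Assumed copy block size (arbitrary choice) */

/* Return 1 if the n bytes at buf are all zero, otherwise 0. */
static int
all_zero(const char *buf,ssize_t n) {
    ssize_t i;

    for ( i = 0; i < n; i++ )
        if ( buf[i] != 0 )
            return 0;
    return 1;
}

/*
 * Copy src to dst, seeking over all-zero blocks instead of
 * writing them. Returns 0 on success, -1 on error.
 */
int
sparse_copy(const char *src,const char *dst) {
    char buf[BLK];                      /* Copy buffer */
    ssize_t n;                          /* Bytes read */
    off_t total = 0;                    /* Bytes copied or skipped */
    int rc = -1;                        /* Assume failure */
    int in, out;                        /* File descriptors */

    in = open(src,O_RDONLY);
    if ( in == -1 )
        return -1;
    out = open(dst,O_CREAT|O_WRONLY|O_TRUNC,0640);
    if ( out == -1 ) {
        close(in);
        return -1;
    }

    while ( (n = read(in,buf,sizeof buf)) > 0 ) {
        if ( all_zero(buf,n) ) {
            /* Leave a hole in the destination */
            if ( lseek(out,(off_t)n,SEEK_CUR) == (off_t)(-1) )
                goto done;
        } else if ( write(out,buf,(size_t)n) != n ) {
            goto done;                  /* Error or short write */
        }
        total += n;
    }

    /* A trailing hole only takes effect once the length is set */
    if ( n == 0 && ftruncate(out,total) == 0 )
        rc = 0;

done:
    close(in);
    if ( close(out) == -1 )
        rc = -1;
    return rc;
}
```

For example, sparse_copy("sparse.dat","sparse2.dat") would produce a copy with the same apparent size as the original. Note that this approach still reads every zero byte from the source, so it saves disk space but not time.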

