Program: TarList (File Converter)

This program provides easy access to tar-format files using an interface similar to that used for zip archives in Section 9.19. Unix users will be familiar with the tar program, an archiver first written back in the mid-1970s. And JDK users might find the tar program syntax somewhat familiar, as it was the basis for the command-line Java Archiver (jar) program in the JDK, written 20 years later. If you’re not a Unix user, don’t be dismayed: just think of this as an example of a whole category of programs, those that need to repetitively read and write files in a special-purpose, predefined format. MS-Windows is full of special-purpose file formats, as are many other operating systems. Unlike jar, tar is just an archiver, not a combined archiver and compressor, so its format is somewhat simpler. In this section we’ll develop a program that reads a tar archive and lists the contents. The TarList program combines several reading methods with several formatting methods. So the commands:

tar -tvf demo.tar
java TarList demo.tar

should produce the same output. And indeed they do, at least for some files and some versions of tar, when run on a small tar archive:

$ java TarList demo.tar
-rwxr-xr-x ian/wheel       734 1999-10-05 19:10 TarDemo.class
-rwxr-xr-x ian/wheel       431 1999-10-05 19:10 TarList.java
-rw-r--r-- ian/wheel         0 1999-10-05 19:10 a
-rw-r--r-- ian/wheel         0 1999-10-05 19:10 b link to a
lrwxr-xr-x ian/wheel         0 1999-10-05 19:10 c -> a
$ tar -tvf demo.tar
-rwxr-xr-x ian/wheel       734 1999-10-05 19:10 TarDemo.class
-rwxr-xr-x ian/wheel       431 1999-10-05 19:10 TarList.java
-rw-r--r-- ian/wheel         0 1999-10-05 19:10 a
-rw-r--r-- ian/wheel         0 1999-10-05 19:10 b link to a
lrwxr-xr-x ian/wheel         0 1999-10-05 19:10 c -> a
$

This example archive contains five files. The last two items, b and c, represent two kinds of links, regular and symbolic. A regular link is simply an additional name for a filesystem entry. In Win32 terms, a symbolic link closely approximates a LNK file, except it is maintained by the operating system kernel instead of by a user-level programming library.

First let’s look at the main program class, TarList (Example 9-6), which is fairly simple. Its main method simply looks for a filename in the command-line arguments, passes it to the TarList constructor, and calls the list( ) method. The list( ) method delegates the presentation formatting to a method called toListFormat( ), which demonstrates several techniques. The Unix permissions, which consist of three octal digits (user, group, and other), each representing three permissions (read, write, and execute), are formatted using a simple for loop and an array of strings (see Section 7.2). A DecimalFormat (see Section 5.8) is used to format the “size” column to a fixed width. But since DecimalFormat apparently lacks the capability to do fixed-width numeric fields with leading spaces instead of leading zeros, we convert the leading zeros to spaces. A DateFormat (see Section 6.3) is used to format the date-and-time field. All of this formatting is done into a StringBuffer (see Section 3.4), which at the very end is converted into a String and returned as the value of the toListFormat( ) method.

Example 9-6. TarList.java

import java.io.*;
import java.text.*;    // only for formatting
import java.util.*;

/**
 * Demonstrate the Tar archive lister.
 */
public class TarList {
    public static void main(String[] argv) throws IOException, TarException {
        if (argv.length == 0) {
            System.err.println("Usage: TarList archive");
            System.exit(1);
        }
        new TarList(argv[0]).list(  );
    }
    /** The TarFile we are reading */
    TarFile tf;

    /** Constructor */
    public TarList(String fileName) {
        tf = new TarFile(fileName);
    }

    /** Generate and print the listing */
    public void list(  ) throws IOException, TarException {
        Enumeration list = tf.entries(  );
        while (list.hasMoreElements(  )) {
            TarEntry e = (TarEntry)list.nextElement(  );
            System.out.println(toListFormat(e));
        }
    }

    protected StringBuffer sb;
    /** Shift used in formatting permissions */
    protected static int shft[] = { 6, 3, 0 };
    /** Format strings used in permissions */
    protected static String rwx[] = {
        "---", "--x", "-w-", "-wx",
        "r--", "r-x", "rw-", "rwx"
    };
    /** NumberFormat used in formatting List form string */
    NumberFormat sizeForm = new DecimalFormat("00000000");
    /** Date used in printing mtime */
    Date date = new Date(  );
    SimpleDateFormat dateForm =
        new SimpleDateFormat ("yyyy-MM-dd HH:mm");

    /** Format a TarEntry the same way that UNIX tar does */
    public String toListFormat(TarEntry e) {
        sb = new StringBuffer(  );
        switch(e.type) {
            case TarEntry.LF_OLDNORMAL:
            case TarEntry.LF_NORMAL:
            case TarEntry.LF_CONTIG:
            case TarEntry.LF_LINK:        // hard link: same as file
                sb.append('-');    // 'f' would be sensible
                break;
            case TarEntry.LF_DIR:
                sb.append('d');
                break;
            case TarEntry.LF_SYMLINK:
                sb.append('l');
                break;
            case TarEntry.LF_CHR:        // UNIX device file
                sb.append('c');
                break;
            case TarEntry.LF_BLK:        // UNIX device file
                sb.append('b');
                break;
            case TarEntry.LF_FIFO:        // UNIX named pipe
                sb.append('p');
                break;
            default:            // Can't happen?
                sb.append('?');
                break;
        }

        // Convert e.g., 754 to rwxr-xr--
        int mode = e.getMode(  );
        for (int i=0; i<3; i++) {
            sb.append(rwx[mode >> shft[i] & 007]);
        }
        sb.append(' ');

        // owner and group
        sb.append(e.getUname()).append('/').append(e.getGname(  )).append(' ');

        // size
        // DecimalFormat can't do "%-9d", so we do part of it ourselves
        sb.append(' ');
        String t = sizeForm.format(e.getSize(  ));
        boolean digit = false;
        char c;
        for (int i=0; i<8; i++) {
            c = t.charAt(i);
            if (!digit && i<(8-1) && c == '0')
                sb.append(' ');        // leading space
            else {
                digit = true;
                sb.append(c);
            }
        }
        sb.append(' ');

        // mtime
        // copy file's mtime into Date object (after scaling
        // from "sec since 1970" to "msec since 1970"), and format it.
        date.setTime(1000*e.getTime(  ));
        sb.append(dateForm.format(date)).append(' ');

        sb.append(e.getName(  ));
        if (e.isLink(  ))
            sb.append(" link to " ).append(e.getLinkName(  ));
        if (e.isSymLink(  ))
            sb.append(" -> " ).append(e.getLinkName(  ));

        return sb.toString(  );
    }
}
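
The permission and size formatting used in toListFormat( ) can be tried in isolation. The following is a minimal sketch, not part of the TarList sources: the class name ListFormatDemo and its two helper methods are hypothetical. It also assumes a JDK with String.format (Java 5 or later), which pads with leading spaces directly, so the zeros-to-spaces loop is not needed there.

```java
/** Hypothetical standalone demo; not part of the TarList sources. */
class ListFormatDemo {
    /** Bit shifts selecting the user, group, and other permission digits. */
    private static final int[] SHIFT = { 6, 3, 0 };

    /** Display string for each possible octal digit, 0 through 7. */
    private static final String[] RWX = {
        "---", "--x", "-w-", "-wx",
        "r--", "r-x", "rw-", "rwx"
    };

    /** Convert a mode such as octal 754 to "rwxr-xr--". */
    static String permissions(int mode) {
        StringBuilder sb = new StringBuilder();
        for (int shift : SHIFT) {
            // Isolate one octal digit and use it as an index into RWX.
            sb.append(RWX[(mode >> shift) & 07]);
        }
        return sb.toString();
    }

    /** Right-justify a size in an 8-character field, padded with spaces. */
    static String size(long n) {
        return String.format("%8d", n);
    }

    public static void main(String[] args) {
        System.out.println(permissions(0755) + " " + size(734));
    }
}
```

Each octal digit holds exactly three bits, which is why a shift of 6, 3, or 0 followed by a mask of 07 is enough to pick out the user, group, and other permissions.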

“But wait,” you may be saying. “There’s no I/O here!” Well, patient reader, your waiting is rewarded. For here is class TarFile (Example 9-7). As its opening comment remarks, tar files, unlike zip files, have no central directory, so you have to read the entire archive file to be sure of having a particular file’s entry, or to know how many entries there are in the archive. I centralize this in a method called readFile( ), but for efficiency I don’t call this method until I need to; this technique is known as lazy evaluation (there are comments in the ToDo file on how to make it even lazier, at the cost of one more boolean variable). In this method I construct a RandomAccessFile (see Section 9.15) to read the data. Since I need to read the file sequentially but then may need to seek back to a particular location, I use a file that can be accessed randomly as well as sequentially. Most of the rest of the code has to do with keeping track of the files stored within the archive.

Example 9-7. TarFile.java

import java.io.*;
import java.util.*;

/**
 * Tape Archive Lister, patterned loosely after java.util.ZipFile.
 * Since, unlike Zip files, there is no central directory, you have to
 * read the entire file either to be sure of having a particular file's
 * entry, or to know how many entries there are in the archive.
 */

public class TarFile {
    /** True after we've done the expensive read. */
    protected boolean read = false;
    /** The list of entries found in the archive */
    protected Vector list;

    /** Size of header block on tape. */
    public static final int    RECORDSIZE = 512;

    /* Size of each block, in records */
    protected int        blocking;
    /* Size of each block, in bytes */
    protected int        blocksize;

    /** File containing archive */
    protected String    fileName;

    /** Construct (open) a Tar file by name */
    public TarFile(String name) {
        fileName = name;
        list = new Vector(  );
        read = false;
    }

    /** Construct (open) a Tar file by File */
    public TarFile(java.io.File name) throws IOException {
        this(name.getCanonicalPath(  ));
    }

    /** The main datastream. */
    protected RandomAccessFile is;

    /** Read the Tar archive in its entirety.
     * This is semi-lazy evaluation, in that we don't read the file
     * until we need to.
     * A future revision may use even lazier evaluation: in getEntry,
     * scan the list and, if not found, continue reading!
     * For now, just read the whole file.
     */
    protected void readFile(  ) throws IOException, TarException {
        is = new RandomAccessFile(fileName, "r");
        TarEntry hdr;
        try {
            do {
                hdr = new TarEntry(is);
                if (hdr.getSize(  ) < 0) {
                    System.out.println("Size < 0");
                    break;
                }
                // System.out.println(hdr.toString(  ));
                list.addElement(hdr);
                // Get the size of the entry
                int nbytes = hdr.getSize(  ), diff;
                // Round it up to blocksize.
                if ((diff = (nbytes % RECORDSIZE)) != 0) {
                    nbytes -= diff; nbytes += RECORDSIZE;
                }
                // And skip over the data portion.
                // System.out.println("Skipping " + nbytes + " bytes");
                is.skipBytes(nbytes);
            } while (true);
        } catch (EOFException e) {
            // OK, just stop reading.
        }
        // All done, say we've read the contents.
        read = true;
    }

    /* Close the Tar file. */
    public void close(  ) {
        try {
            is.close(  );
        } catch (IOException e) {
            // nothing to do
        }
    }

    /* Returns an enumeration of the Tar file entries. */
    public Enumeration entries(  ) throws IOException, TarException {
        if (!read) {
            readFile(  );
        }
        return list.elements(  );
    }

    /** Returns the Tar entry for the specified name, or null if not found. */
    public TarEntry getEntry(String name) {
        for (int i=0; i<list.size(  ); i++) {
            TarEntry e = (TarEntry)list.elementAt(i);
            if (name.equals(e.getName(  )))
                return e;
        }
        return null;
    }

    /** Returns an InputStream for reading the contents of the 
     * specified entry from the archive.
     * May cause the entire file to be read.
     */
    public InputStream getInputStream(TarEntry entry) {
        return null;
    }

    /** Returns the path name of the Tar file. */
    public String getName(  ) {
        return fileName;
    }

    /** Returns the number of entries in the Tar archive.
     * May cause the entire file to be read.
     */
    public int size(  ) {
        return 0;
    }
}
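
The rounding step in readFile( ) deserves a closer look: an entry's data always occupies a whole number of 512-byte records, so the reader has to skip the data size rounded up to the next multiple of RECORDSIZE. The sketch below shows the same arithmetic as a single expression. The class and method names are hypothetical, and it uses long where the listing uses int.

```java
/** Hypothetical helper illustrating the record-rounding arithmetic. */
class RecordMath {
    static final int RECORDSIZE = 512;

    /** Round nbytes up to the next multiple of RECORDSIZE (0 stays 0). */
    static long roundUpToRecord(long nbytes) {
        return (nbytes + RECORDSIZE - 1) / RECORDSIZE * RECORDSIZE;
    }

    public static void main(String[] args) {
        long[] samples = { 0, 1, 512, 513, 734 };
        for (long n : samples) {
            System.out.println(n + " bytes occupy " + roundUpToRecord(n) + " bytes in the archive");
        }
    }
}
```

Note that RandomAccessFile.skipBytes( ) takes an int and may skip fewer bytes than requested; computing the next header position as a long and calling seek( ) is one possible alternative for large entries.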

“But my patience is nearly at an end! Where’s the actual reading?” Indeed, you may well ask. But it’s not there. The actual reading code is further delegated to TarEntry’s constructor, which we’ll see in a minute. Since TarFile is patterned after ZipFile (see Section 9.19), it doesn’t extend any of the I/O classes. Like ZipFile, a TarFile is an object that lets you get at the individual elements within a tar-format archive, each represented by a TarEntry object. If you want to find whether a particular file exists in the archive, you can call the TarFile’s getEntry( ) method. Or you can ask for all the entries, as we did previously in TarList. Having obtained one entry, you can ask for all the information about it, again as we did in TarList. Or you could ask for an InputStream, as we did for zip files. However, that part of the TarFile class has been left as an exercise for the reader. Here, at last, is TarEntry (Example 9-8), whose constructor reads the archive header and stores the file’s beginning location for you, for when you get around to writing the getInputStream method.
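
As a starting point for that exercise, here is one possible shape for the reading logic. This is only a sketch under stated assumptions, not the book's solution: the class EntryDataSketch and the method openEntryData( ) are hypothetical, the headerOffset and size parameters stand in for TarEntry's fileOffset field and getSize( ) value, and the whole entry is read into memory, which is reasonable only for small files.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of reading one entry's data from a tar archive. */
class EntryDataSketch {
    static final int RECORDSIZE = 512;

    /**
     * Return a stream over an entry's data. The data is assumed to start
     * one record after the entry's header and to be size bytes long.
     */
    static InputStream openEntryData(RandomAccessFile raf, long headerOffset, int size)
            throws IOException {
        byte[] data = new byte[size];
        raf.seek(headerOffset + RECORDSIZE);   // step over the header record
        raf.readFully(data);                   // throws EOFException if truncated
        return new ByteArrayInputStream(data);
    }

    /** Build a tiny fake archive (blank header plus data) and read it back. */
    static String demo() {
        try {
            File tmp = File.createTempFile("sketch", ".tar");
            tmp.deleteOnExit();
            try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw")) {
                raf.write(new byte[RECORDSIZE]);   // stand-in for a real header
                raf.write("hello".getBytes(StandardCharsets.US_ASCII));
                InputStream in = openEntryData(raf, 0, 5);
                byte[] buf = new byte[5];
                int n = in.read(buf);
                return new String(buf, 0, n, StandardCharsets.US_ASCII);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

This is where the RandomAccessFile pays off: the archive was scanned sequentially once, and seek( ) now jumps straight back to the recorded offset.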

As mentioned, I use lazy evaluation, simply reading the bytes into some byte arrays, and don’t convert them to strings or numbers until asked to. Notice also that the filenames and user/group names are treated as byte strings and converted as ASCII characters when needed as Strings. This makes sense, because the tar file format only uses ASCII characters at present. Some Unix implementations of tar explicitly look for null characters to end some of these strings; this will need work from the Unix standards people.
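
The conversion from raw header bytes to Java values can be sketched on its own. The example below is an illustration, not the TarEntry code: the class HeaderFieldSketch and its methods are hypothetical. Unlike the listing, which relies on trim( ) to discard padding, this version stops explicitly at the first null byte and names the ASCII charset.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of decoding raw tar header fields on demand. */
class HeaderFieldSketch {
    /** Decode a field as ASCII, stopping at the first null byte. */
    static String fieldToString(byte[] field) {
        int len = 0;
        while (len < field.length && field[len] != 0) {
            len++;
        }
        return new String(field, 0, len, StandardCharsets.US_ASCII);
    }

    /** Parse a numeric field stored as ASCII octal digits; blank means 0. */
    static long parseOctal(byte[] field) {
        String s = fieldToString(field).trim();
        return s.isEmpty() ? 0 : Long.parseLong(s, 8);
    }

    public static void main(String[] args) {
        byte[] size = "00000001336\0".getBytes(StandardCharsets.US_ASCII);
        byte[] uname = "ian\0\0\0".getBytes(StandardCharsets.US_ASCII);
        // Octal 1336 is decimal 734.
        System.out.println(fieldToString(uname) + " " + parseOctal(size));
    }
}
```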

Example 9-8. TarEntry.java

import java.io.*;

/** One entry in an archive file.
 * @note
 * Tar format info taken from John Gilmore's public domain tar program,
 * @(#)tar.h 1.21 87/05/01    Public Domain, which said:
 * "Created 25 August 1985 by John Gilmore, ihnp4!hoptoad!gnu."
 * John is now [email protected], and by another path tar.h is GPL'd in GNU Tar.
 */
public class TarEntry {
    /** Where in the tar archive this entry's HEADER is found. */
    public long fileOffset = 0;

    /** The maximum size of a name */
    public static final int    NAMSIZ    = 100;
    public static final int    TUNMLEN    = 32;
    public static final int    TGNMLEN    = 32;

    // Next fourteen fields constitute one physical record.
    // Padded to TarFile.RECORDSIZE bytes on tape/disk.
    // Lazy Evaluation: just read fields in raw form, only format when asked.

    /** File name */
    byte[]    name = new byte[NAMSIZ];
    /** permissions, e.g., rwxr-xr-x? */
    byte[]    mode = new byte[8];
    /* user */
    byte[]    uid = new byte[8];
    /* group */
    byte[]    gid = new byte[8];
    /* size */
    byte[]    size = new byte[12];
    /* UNIX modification time */
    byte[]    mtime = new byte[12];
    /* checksum field */
    byte[]    chksum = new byte[8];
    byte    type;
    byte[]    linkName = new byte[NAMSIZ];
    byte[]    magic = new byte[8];
    byte[]    uname = new byte[TUNMLEN];
    byte[]    gname = new byte[TGNMLEN];
    byte[]    devmajor = new byte[8];
    byte[]    devminor = new byte[8];

    // End of the physical data fields.

    /* The magic field is filled with this if uname and gname are valid. */
    public static final byte TMAGIC[] = {
        'u', 's', 't', 'a', 'r', ' ', ' ', '\0'
    }; /* 7 chars and a null */

    /* Type value for Normal file, Unix compatibility */
    public static final int    LF_OLDNORMAL = '\0';
    /* Type value for Normal file */
    public static final int    LF_NORMAL = '0';
    /* Type value for Link to previously dumped file */
    public static final int LF_LINK =     '1';
    /* Type value for Symbolic link */
    public static final int LF_SYMLINK = '2';
    /* Type value for Character special file */
    public static final int LF_CHR = '3';
    /* Type value for Block special file */
    public static final int LF_BLK = '4';
    /* Type value for Directory */
    public static final int LF_DIR     = '5';
    /* Type value for FIFO special file */
    public static final int LF_FIFO     = '6';
    /* Type value for Contiguous file */
    public static final int LF_CONTIG = '7';

    /* Constructor that reads the entry's header. */
    public TarEntry(RandomAccessFile is) throws IOException, TarException {

        fileOffset = is.getFilePointer(  );

        // read(  ) returns -1 at EOF
        if (is.read(name) < 0)
            throw new EOFException(  );
        // Tar pads to block boundary with nulls.
        if (name[0] == '\0')
            throw new EOFException(  );
        // OK, read remaining fields.
        is.read(mode);
        is.read(uid);
        is.read(gid);
        is.read(size);
        is.read(mtime);
        is.read(chksum);
        type = is.readByte(  );
        is.read(linkName);
        is.read(magic);
        is.read(uname);
        is.read(gname);
        is.read(devmajor);
        is.read(devminor);

        // The header fields occupy fewer than 512 bytes, so skip the
        // padding that fills out the rest of the record.
        is.skipBytes((int)(TarFile.RECORDSIZE -
            (is.getFilePointer(  ) % TarFile.RECORDSIZE)));

        // TODO if checksum(  ) fails,
        //    throw new TarException("Failed to find next header");

    }

    /** Returns the name of the file this entry represents. */
    public String getName(  ) {
        return new String(name).trim(  );
    }

    public String getTypeName(  ) {
        switch(type) {
        case LF_OLDNORMAL:
        case LF_NORMAL:
            return "file";
        case LF_LINK:
            return "link w/in archive";
        case LF_SYMLINK:
            return "symlink";
        case LF_CHR:
        case LF_BLK:
        case LF_FIFO:
            return "special file";
        case LF_DIR:
            return "directory";
        case LF_CONTIG:
            return "contig";
        default:
            throw new IllegalStateException("TarEntry.getTypeName: type "   
                + type + " invalid");
        }
    }

    /** Returns the UNIX-specific "mode" (type+permissions) of the entry */
    public int getMode(  ) {
        try {
            return Integer.parseInt(new String(mode).trim(  ), 8) & 0777;
        } catch (IllegalArgumentException e) {
            return 0;
        }
    }

    /** Returns the size of the entry */
    public int getSize(  ) {
        try {
            return Integer.parseInt(new String(size).trim(  ), 8);
        } catch (IllegalArgumentException e) {
            return 0;
        }
    }

    /** Returns the name of the file this entry is a link to,
     * or an empty string if this entry is not a link.
     */
    public String getLinkName(  ) {
        // if (isLink(  ))
        //     return null;
        return new String(linkName).trim(  );
    }
    
    /** Returns the modification time of the entry */
    public long getTime(  ) {
        try {
            return Long.parseLong(new String(mtime).trim(  ),8);
        } catch (IllegalArgumentException e) {
            return 0;
        }
    }

    /** Returns the string name of the userid */
    public String getUname(  ) {
        return new String(uname).trim(  );
    }

    /** Returns the string name of the group id */
    public String getGname(  ) {
        return new String(gname).trim(  );
    }

    /** Returns the numeric userid of the entry */
    public int getuid(  ) {
        try {
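            // Like the other numeric fields, uid is stored as octal digits.
            return Integer.parseInt(new String(uid).trim(  ), 8);
        } catch (IllegalArgumentException e) {
            return -1;
        }
    }
    /** Returns the numeric gid of the entry */
    public int getgid(  ) {
        try {
            // Like the other numeric fields, gid is stored as octal digits.
            return Integer.parseInt(new String(gid).trim(  ), 8);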
        } catch (IllegalArgumentException e) {
            return -1;
        }
    }

    /** Returns true if this entry represents a file */
    boolean isFile(  ) {
        return type == LF_NORMAL || type == LF_OLDNORMAL;
    }

    /** Returns true if this entry represents a directory */
    boolean isDirectory(  ) {
        return type == LF_DIR;
    }

    /** Returns true if this is a hard link (to a file in the archive) */
    boolean isLink(  ) {
        return type == LF_LINK;
    }

    /** Returns true if this is a symbolic link */
    boolean isSymLink(  ) {
        return type == LF_SYMLINK;
    }

    /** Returns true if this entry represents some type of UNIX special file */
    boolean isSpecial(  ) {
        return type == LF_CHR || type == LF_BLK || type == LF_FIFO;
    }

    public String toString(  ) {
        return "TarEntry[" + getName(  ) + ']';
    } 
}
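
The TODO in the TarEntry constructor mentions a checksum test that is not implemented. One way it might look is sketched below. The class ChecksumSketch and its method are hypothetical. The sketch assumes the checksum rule used by the ustar format: add up all 512 header bytes as unsigned values, counting the eight bytes of the checksum field itself (at offset 148, following the name, mode, uid, gid, size, and mtime fields) as ASCII spaces. It also assumes the caller has the whole header record in one byte array, which the listing does not keep.

```java
/** Hypothetical sketch of the header checksum calculation. */
class ChecksumSketch {
    static final int RECORDSIZE = 512;
    /** Offset of chksum: name(100) + mode(8) + uid(8) + gid(8) + size(12) + mtime(12). */
    static final int CHKSUM_OFFSET = 148;
    static final int CHKSUM_LENGTH = 8;

    /** Sum the header bytes, treating the checksum field as eight spaces. */
    static int computeChecksum(byte[] header) {
        int sum = 0;
        for (int i = 0; i < RECORDSIZE; i++) {
            boolean inChksum = i >= CHKSUM_OFFSET && i < CHKSUM_OFFSET + CHKSUM_LENGTH;
            sum += inChksum ? ' ' : (header[i] & 0xff);
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] header = new byte[RECORDSIZE];   // all nulls
        header[0] = 'a';                        // a one-character file name
        System.out.println(computeChecksum(header));
    }
}
```

A reader would compare this value with the octal number stored in the chksum field and throw a TarException on a mismatch.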

See Also

The TarFile example is one of the longest in the book. One could equally well use filter subclassing to provide encryption. One could even, in theory, write a Java interface to an encrypted filesystem layer, such as CFS (see ftp://research.att.com/dist/mab/cfs.ps) or to a version-archiving system such as CVS (the Concurrent Versions System; see http://www.cvs.org). CVS is a good tool for maintaining source code; most large open source projects now use it (see http://www.openbsd.org/why-cvs.html). In fact, there is already a Java-based implementation of CVS (see http://www.jcvs.org/). Either of these would be substantially more clever than my little tarry friend, but, I suspect, contain rather more code.

For all topics in this chapter, Rusty’s book Java I/O should be considered the antepenultimate documentation. The penultimate reference is the Javadoc documentation, while the ultimate reference is, if you really need it, the source code for the Java API, to which I have not needed to make a single reference in writing this chapter.
