This program provides easy access to tar-format
files using an interface similar to that used for zip archives in
Section 9.19. Unix users will be familiar with the
tar program, an archiver first written back in
the mid-1970s. And JDK users might find the tar
program syntax somewhat familiar, as it was the basis for the
command-line Java Archiver (jar)
program in the JDK, written 20 years later. If you’re not a
Unix user, don’t be dismayed: just think of this as an example of a
whole category of programs, those that need to repetitively read and
write files in a special-purpose, predefined format. MS-Windows is
full of special-purpose file formats, as are many other operating
systems. Unlike jar, tar is
just an archiver, not a combined archiver and compressor, so its
format is somewhat simpler. In this section we’ll develop a
program that reads a tar archive and lists the contents. The TarList
program combines several reading methods with several formatting methods. So
the commands:
    tar -tvf demo.tar
    java TarList demo.tar
should produce the same output. And indeed they do, at least for some files and some versions of tar, when run on a small tar archive:
    $ java TarList demo.tar
    -rwxr-xr-x ian/wheel 734 1999-10-05 19:10 TarDemo.class
    -rwxr-xr-x ian/wheel 431 1999-10-05 19:10 TarList.java
    -rw-r--r-- ian/wheel 0 1999-10-05 19:10 a
    -rw-r--r-- ian/wheel 0 1999-10-05 19:10 b link to a
    lrwxr-xr-x ian/wheel 0 1999-10-05 19:10 c -> a
    $ tar -tvf demo.tar
    -rwxr-xr-x ian/wheel 734 1999-10-05 19:10 TarDemo.class
    -rwxr-xr-x ian/wheel 431 1999-10-05 19:10 TarList.java
    -rw-r--r-- ian/wheel 0 1999-10-05 19:10 a
    -rw-r--r-- ian/wheel 0 1999-10-05 19:10 b link to a
    lrwxr-xr-x ian/wheel 0 1999-10-05 19:10 c -> a
    $
This example archive contains five files. The last two items,
b and c, represent two kinds of
links, regular and symbolic. A regular
link is simply an additional name for a filesystem entry.
In Win-32 terms, a symbolic link closely
approximates a LNK
file, except it is maintained
by the operating system kernel instead of by a user-level programming
library.
First let’s look at the main program class,
TarList
(Example 9-6), which
is fairly simple. Its main
method simply looks for
a filename in the command-line arguments, passes it to the
TarList
constructor, and calls the list( )
method. The list( )
method delegates
the presentation formatting to a method called toListFormat( )
, which demonstrates several techniques. The Unix
permissions, which consist of three octal digits (user, group, and
other), each encoding three permission bits (read, write, and execute), are
formatted using a simple for loop and an array of
strings (see Section 7.2). A
DecimalFormat
(see Section 5.8)
is used to format the “size” column to a fixed width. But
since DecimalFormat
apparently lacks the
capability to do fixed-width numeric fields with leading spaces
instead of leading zeros, we convert the leading zeros to spaces. A
DateFormat
(see Section 6.3) is
used to format the date-and-time field. All of this formatting is
done into a StringBuffer (see Section 3.4), which at
the very end is converted into a String
and
returned as the value of the toListFormat( )
method.
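The zero-to-space conversion described above reflects the formatting APIs available when TarList was written. As a hedged aside, on Java 5 and later the same fixed-width column can be produced with String.format; the sketch below assumes that newer API, and the class name SizeColumn is purely illustrative, not part of the original program.

```java
import java.util.Locale;

/** Illustrative helper (hypothetical name): right-justify a size in 8 columns. */
class SizeColumn {
    /** Pads with leading spaces, which is what the manual loop in toListFormat achieves. */
    static String format(long size) {
        // Locale.ROOT keeps the digits as plain ASCII regardless of the default locale.
        return String.format(Locale.ROOT, "%8d", size);
    }

    public static void main(String[] args) {
        System.out.println("[" + format(734) + "]");
        System.out.println("[" + format(0) + "]");
    }
}
```

The brackets in main are only there to make the padding visible when the sketch is run.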
Example 9-6. TarList.java
import java.io.*;
import java.text.*;    // only for formatting
import java.util.*;

/**
 * Demonstrate the Tar archive lister.
 */
public class TarList {
    public static void main(String[] argv) throws IOException, TarException {
        if (argv.length == 0) {
            System.err.println("Usage: TarList archive");
            System.exit(1);
        }
        new TarList(argv[0]).list( );
    }

    /** The TarFile we are reading */
    TarFile tf;

    /** Constructor */
    public TarList(String fileName) {
        tf = new TarFile(fileName);
    }

    /** Generate and print the listing */
    public void list( ) throws IOException, TarException {
        Enumeration list = tf.entries( );
        while (list.hasMoreElements( )) {
            TarEntry e = (TarEntry)list.nextElement( );
            System.out.println(toListFormat(e));
        }
    }

    protected StringBuffer sb;
    /** Shift used in formatting permissions */
    protected static int shft[] = { 6, 3, 0 };
    /** Format strings used in permissions */
    protected static String rwx[] = {
        "---", "--x", "-w-", "-wx",
        "r--", "r-x", "rw-", "rwx"
    };
    /** NumberFormat used in formatting List form string */
    NumberFormat sizeForm = new DecimalFormat("00000000");
    /** Date used in printing mtime */
    Date date = new Date( );
    SimpleDateFormat dateForm =
        new SimpleDateFormat ("yyyy-MM-dd HH:mm");

    /** Format a TarEntry the same way that UNIX tar does */
    public String toListFormat(TarEntry e) {
        sb = new StringBuffer( );
        switch(e.type) {
            case TarEntry.LF_OLDNORMAL:
            case TarEntry.LF_NORMAL:
            case TarEntry.LF_CONTIG:
            case TarEntry.LF_LINK:        // hard link: same as file
                sb.append('-');           // 'f' would be sensible
                break;
            case TarEntry.LF_DIR:
                sb.append('d');
                break;
            case TarEntry.LF_SYMLINK:
                sb.append('l');
                break;
            case TarEntry.LF_CHR:         // UNIX device file
                sb.append('c');
                break;
            case TarEntry.LF_BLK:         // UNIX device file
                sb.append('b');
                break;
            case TarEntry.LF_FIFO:        // UNIX named pipe
                sb.append('p');
                break;
            default:                      // Can't happen?
                sb.append('?');
                break;
        }

        // Convert e.g., 754 to rwxr-xr--
        int mode = e.getMode( );
        for (int i=0; i<3; i++) {
            sb.append(rwx[mode >> shft[i] & 007]);
        }
        sb.append(' ');

        // owner and group
        sb.append(e.getUname()).append('/').append(e.getGname( )).append(' ');

        // size
        // DecimalFormat can't do "%-9d", so we do part of it ourselves
        sb.append(' ');
        String t = sizeForm.format(e.getSize( ));
        boolean digit = false;
        char c;
        for (int i=0; i<8; i++) {
            c = t.charAt(i);
            if (!digit && i<(8-1) && c == '0')
                sb.append(' ');           // leading space
            else {
                digit = true;
                sb.append(c);
            }
        }
        sb.append(' ');

        // mtime
        // copy file's mtime into Date object (after scaling
        // from "sec since 1970" to "msec since 1970"), and format it.
        date.setTime(1000*e.getTime( ));
        sb.append(dateForm.format(date)).append(' ');

        sb.append(e.getName( ));
        if (e.isLink( ))
            sb.append(" link to " ).append(e.getLinkName( ));
        if (e.isSymLink( ))
            sb.append(" -> " ).append(e.getLinkName( ));

        return sb.toString( );
    }
}
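To see the permission formatting in isolation, here is a small worked sketch of the same shift-and-mask idea used in toListFormat( ). The class and method names (ModeString, permissions) are hypothetical and chosen only for this illustration; the lookup table mirrors the rwx array in the listing.

```java
/** Illustrative only: turn the low nine mode bits into an "rwxr-xr-x" style string. */
class ModeString {
    // Index is the 3-bit value of one octal digit: 4 = read, 2 = write, 1 = execute.
    private static final String[] RWX = {
        "---", "--x", "-w-", "-wx", "r--", "r-x", "rw-", "rwx"
    };

    static String permissions(int mode) {
        StringBuilder sb = new StringBuilder(9);
        // Shift by 6, 3, then 0 to pick out the user, group, and other digits in turn.
        for (int shift = 6; shift >= 0; shift -= 3) {
            sb.append(RWX[(mode >> shift) & 07]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(permissions(0755)); // rwxr-xr-x
        System.out.println(permissions(0644)); // rw-r--r--
    }
}
```

Because each octal digit is exactly three bits, an eight-entry table covers every combination and no per-bit conditionals are needed.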
“But wait,” you may be saying. “There’s no
I/O here!” Well, patient reader, your waiting is rewarded. For
here is class TarFile
(Example 9-7). As its opening comment remarks,
tar files, unlike zip files, have no central
directory, so you have to read the entire archive file to be sure of
having a particular file’s entry, or to know how many entries
there are in the archive. I centralize this in a method
called readFile( )
, but for efficiency I
don’t call this method until I need to; this technique is known
as lazy
evaluation (there are comments in the ToDo file on how to make it
even lazier, at the cost of one more boolean variable). In this
method I construct a RandomAccessFile
(see Section 9.15) to read the data. Since I need to read the
file sequentially but then may need to seek back to a particular
location, I use a file that can be accessed randomly as well as
sequentially. Most of the rest of the code has to do with keeping
track of the files stored within the
archive.
Example 9-7. TarFile.java
import java.io.*;
import java.util.*;

/**
 * Tape Archive Lister, patterned loosely after java.util.ZipFile.
 * Since, unlike Zip files, there is no central directory, you have to
 * read the entire file either to be sure of having a particular file's
 * entry, or to know how many entries there are in the archive.
 */
public class TarFile {
    /** True after we've done the expensive read. */
    protected boolean read = false;
    /** The list of entries found in the archive */
    protected Vector list;

    /** Size of header block on tape. */
    public static final int RECORDSIZE = 512;
    /* Size of each block, in records */
    protected int blocking;
    /* Size of each block, in bytes */
    protected int blocksize;

    /** File containing archive */
    protected String fileName;

    /** Construct (open) a Tar file by name */
    public TarFile(String name) {
        fileName = name;
        list = new Vector( );
        read = false;
    }

    /** Construct (open) a Tar file by File */
    public TarFile(java.io.File name) throws IOException {
        this(name.getCanonicalPath( ));
    }

    /** The main datastream. */
    protected RandomAccessFile is;

    /** Read the Tar archive in its entirety.
     * This is semi-lazy evaluation, in that we don't read the file
     * until we need to.
     * A future revision may use even lazier evaluation: in getEntry,
     * scan the list and, if not found, continue reading!
     * For now, just read the whole file.
     */
    protected void readFile( ) throws IOException, TarException {
        is = new RandomAccessFile(fileName, "r");
        TarEntry hdr;
        try {
            do {
                hdr = new TarEntry(is);
                if (hdr.getSize( ) < 0) {
                    System.out.println("Size < 0");
                    break;
                }
                // System.out.println(hdr.toString( ));
                list.addElement(hdr);

                // Get the size of the entry
                int nbytes = hdr.getSize( ), diff;
                // Round it up to blocksize.
                if ((diff = (nbytes % RECORDSIZE)) != 0) {
                    nbytes -= diff;
                    nbytes += RECORDSIZE;
                }
                // And skip over the data portion.
                // System.out.println("Skipping " + nbytes + " bytes");
                is.skipBytes(nbytes);
            } while (true);
        } catch (EOFException e) {
            // OK, just stop reading.
        }
        // All done, say we've read the contents.
        read = true;
    }

    /* Close the Tar file. */
    public void close( ) {
        if (is == null) {
            return;     // readFile( ) was never called, so nothing is open
        }
        try {
            is.close( );
        } catch (IOException e) {
            // nothing to do
        }
    }

    /* Returns an enumeration of the Tar file entries. */
    public Enumeration entries( ) throws IOException, TarException {
        if (!read) {
            readFile( );
        }
        return list.elements( );
    }

    /** Returns the Tar entry for the specified name, or null if not found. */
    public TarEntry getEntry(String name) {
        for (int i=0; i<list.size( ); i++) {
            TarEntry e = (TarEntry)list.elementAt(i);
            if (name.equals(e.getName( )))
                return e;
        }
        return null;
    }

    /** Returns an InputStream for reading the contents of the
     * specified entry from the archive.
     * May cause the entire file to be read.
     */
    public InputStream getInputStream(TarEntry entry) {
        return null;
    }

    /** Returns the path name of the Tar file. */
    public String getName( ) {
        return null;
    }

    /** Returns the number of entries in the Tar archive.
     * May cause the entire file to be read.
     */
    public int size( ) {
        return 0;
    }
}
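The rounding step in readFile( ) is worth a worked example, since each entry's data is padded out to a whole number of 512-byte records. The sketch below is only an illustration of that arithmetic; the class and method names are hypothetical, and it uses the common add-then-truncate form rather than the subtract-and-add form in the listing (both give the same result).

```java
/** Illustrative only: how many bytes an entry's data occupies once padded to full records. */
class RecordMath {
    static final int RECORDSIZE = 512;

    static long paddedLength(long nbytes) {
        // Integer division truncates, so adding RECORDSIZE-1 first rounds up.
        return (nbytes + RECORDSIZE - 1) / RECORDSIZE * RECORDSIZE;
    }

    public static void main(String[] args) {
        System.out.println(paddedLength(734)); // 1024: two records for TarDemo.class
        System.out.println(paddedLength(431)); // 512: one record for TarList.java
        System.out.println(paddedLength(0));   // 0: empty files have no data records
    }
}
```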
“But my patience is nearly at an end! Where’s the actual
reading?” Indeed, you may well ask. But it’s not there.
The actual reading code is further delegated to
TarEntry
’s constructor, which we’ll
see in a minute. Since TarFile
is patterned after
ZipFile
(see Section 9.19), it
doesn’t extend any of the I/O classes. Like
ZipFile
, a TarFile
is an object
that lets you get at the individual elements within a
tar-format archive, each represented by a
TarEntry
object. If you want to find whether a
particular file exists in the archive, you can call the
TarFile
’s getEntry( )
method. Or you can ask for all the entries, as we did
previously in TarList
. Having obtained one entry,
you can ask for all the information about it, again as we did in
TarList
. Or you could ask for an
InputStream
, as we did for zip files. However, that
part of the TarFile
class has been left as an exercise
for the reader. Here, at last, is TarEntry
(Example 9-8), whose constructor reads the archive header
and stores the file’s beginning location for you, for when you
get around to writing the getInputStream
method.
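As a starting point for that exercise, here is one possible shape for the missing piece. This is a sketch under stated assumptions, not the author's implementation: the helper name EntryData.open is hypothetical, it takes the header offset and size as plain arguments rather than a TarEntry, and it assumes the entry's data begins exactly one 512-byte record after the start of its header (which is how readFile( ) skips over it).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;

/** Hypothetical helper: copy one entry's data out of an open archive. */
class EntryData {
    static final int RECORDSIZE = 512;

    /**
     * @param archive      the open tar archive
     * @param headerOffset where the entry's header starts (TarEntry.fileOffset in the text)
     * @param size         the entry's data length in bytes (TarEntry.getSize( ) in the text)
     */
    static InputStream open(RandomAccessFile archive, long headerOffset, int size)
            throws IOException {
        byte[] data = new byte[size];
        archive.seek(headerOffset + RECORDSIZE); // data follows the one-record header
        archive.readFully(data);                 // throws EOFException if the archive is truncated
        return new ByteArrayInputStream(data);
    }
}
```

Reading the whole entry into memory keeps the sketch short; a version meant for large entries would more likely return a stream that reads from the file on demand.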
As mentioned, I use lazy evaluation, simply reading the bytes into
some byte arrays, and don’t convert them to strings or numbers
until asked to. Notice also that the filenames and user/group names
are treated as byte strings and converted as ASCII characters when
needed as Strings
. This makes sense, because the
tar file format only uses ASCII characters at
present. Some Unix implementations of tar
explicitly look for null characters to end some of these strings;
this will need work from the Unix standards people.
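The conversions described above can be sketched in a few lines. The version below is an assumption-laden illustration rather than the code from the listing: the class and method names are hypothetical, and it stops at the first null byte instead of relying on trim( ), which is one way to cope with archives that leave stale bytes after the terminator.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative only: decode raw tar header fields on demand. */
class TarFields {
    /** Text field: ASCII bytes up to (not including) the first NUL. */
    static String text(byte[] field) {
        int len = 0;
        while (len < field.length && field[len] != 0) {
            len++;
        }
        return new String(field, 0, len, StandardCharsets.US_ASCII);
    }

    /** Numeric field: ASCII octal digits, possibly padded with spaces or NULs. */
    static long octal(byte[] field) {
        String s = text(field).trim();
        return s.isEmpty() ? 0 : Long.parseLong(s, 8);
    }
}
```

For example, a size field holding the characters 0001336 followed by a null decodes to 734, the size of TarDemo.class in the sample archive.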
Example 9-8. TarEntry.java
import java.io.*;

/** One entry in an archive file.
 * @note
 * Tar format info taken from John Gilmore's public domain tar program,
 * @(#)tar.h 1.21 87/05/01 Public Domain, which said:
 * "Created 25 August 1985 by John Gilmore, ihnp4!hoptoad!gnu."
 * John is now [email protected], and by another path tar.h is GPL'd in GNU Tar.
 */
public class TarEntry {
    /** Where in the tar archive this entry's HEADER is found. */
    public long fileOffset = 0;

    /** The maximum size of a name */
    public static final int NAMSIZ = 100;
    public static final int TUNMLEN = 32;
    public static final int TGNMLEN = 32;

    // Next fourteen fields constitute one physical record.
    // Padded to TarFile.RECORDSIZE bytes on tape/disk.
    // Lazy Evaluation: just read fields in raw form, only format when asked.

    /** File name */
    byte[] name = new byte[NAMSIZ];
    /** permissions, e.g., rwxr-xr-x? */
    byte[] mode = new byte[8];
    /* user */
    byte[] uid = new byte[8];
    /* group */
    byte[] gid = new byte[8];
    /* size */
    byte[] size = new byte[12];
    /* UNIX modification time */
    byte[] mtime = new byte[12];
    /* checksum field */
    byte[] chksum = new byte[8];
    byte type;
    byte[] linkName = new byte[NAMSIZ];
    byte[] magic = new byte[8];
    byte[] uname = new byte[TUNMLEN];
    byte[] gname = new byte[TGNMLEN];
    byte[] devmajor = new byte[8];
    byte[] devminor = new byte[8];

    // End of the physical data fields.

    /* The magic field is filled with this if uname and gname are valid. */
    public static final byte TMAGIC[] = {
        'u', 's', 't', 'a', 'r', ' ', ' ', '\0'
    }; /* 7 chars and a null */

    /* Type value for Normal file, Unix compatibility */
    public static final int LF_OLDNORMAL = '\0';
    /* Type value for Normal file */
    public static final int LF_NORMAL = '0';
    /* Type value for Link to previously dumped file */
    public static final int LF_LINK = '1';
    /* Type value for Symbolic link */
    public static final int LF_SYMLINK = '2';
    /* Type value for Character special file */
    public static final int LF_CHR = '3';
    /* Type value for Block special file */
    public static final int LF_BLK = '4';
    /* Type value for Directory */
    public static final int LF_DIR = '5';
    /* Type value for FIFO special file */
    public static final int LF_FIFO = '6';
    /* Type value for Contiguous file */
    public static final int LF_CONTIG = '7';

    /* Constructor that reads the entry's header. */
    public TarEntry(RandomAccessFile is) throws IOException, TarException {

        fileOffset = is.getFilePointer( );

        // read( ) returns -1 at EOF
        if (is.read(name) < 0)
            throw new EOFException( );
        // Tar pads to block boundary with nulls.
        if (name[0] == '\0')
            throw new EOFException( );
        // OK, read remaining fields.
        is.read(mode);
        is.read(uid);
        is.read(gid);
        is.read(size);
        is.read(mtime);
        is.read(chksum);
        type = is.readByte( );
        is.read(linkName);
        is.read(magic);
        is.read(uname);
        is.read(gname);
        is.read(devmajor);
        is.read(devminor);

        // Since the tar header is < 512, we need to skip it.
        is.skipBytes((int)(TarFile.RECORDSIZE -
            (is.getFilePointer( ) % TarFile.RECORDSIZE)));

        // TODO if checksum( ) fails,
        //    throw new TarException("Failed to find next header");
    }

    /** Returns the name of the file this entry represents. */
    public String getName( ) {
        return new String(name).trim( );
    }

    public String getTypeName( ) {
        switch(type) {
            case LF_OLDNORMAL:
            case LF_NORMAL:
                return "file";
            case LF_LINK:
                return "link w/in archive";
            case LF_SYMLINK:
                return "symlink";
            case LF_CHR:
            case LF_BLK:
            case LF_FIFO:
                return "special file";
            case LF_DIR:
                return "directory";
            case LF_CONTIG:
                return "contig";
            default:
                throw new IllegalStateException("TarEntry.getTypeName: type "
                    + type + " invalid");
        }
    }

    /** Returns the UNIX-specific "mode" (type+permissions) of the entry */
    public int getMode( ) {
        try {
            return Integer.parseInt(new String(mode).trim( ), 8) & 0777;
        } catch (IllegalArgumentException e) {
            return 0;
        }
    }

    /** Returns the size of the entry */
    public int getSize( ) {
        try {
            return Integer.parseInt(new String(size).trim( ), 8);
        } catch (IllegalArgumentException e) {
            return 0;
        }
    }

    /** Returns the name of the file this entry is a link to,
     * or null if this entry is not a link.
     */
    public String getLinkName( ) {
        // if (isLink( ))
        //     return null;
        return new String(linkName).trim( );
    }

    /** Returns the modification time of the entry */
    public long getTime( ) {
        try {
            return Long.parseLong(new String(mtime).trim( ),8);
        } catch (IllegalArgumentException e) {
            return 0;
        }
    }

    /** Returns the string name of the userid */
    public String getUname( ) {
        return new String(uname).trim( );
    }

    /** Returns the string name of the group id */
    public String getGname( ) {
        return new String(gname).trim( );
    }

    /** Returns the numeric userid of the entry */
    public int getuid( ) {
        try {
            // stored as octal text, like the other numeric header fields
            return Integer.parseInt(new String(uid).trim( ), 8);
        } catch (IllegalArgumentException e) {
            return -1;
        }
    }

    /** Returns the numeric gid of the entry */
    public int getgid( ) {
        try {
            // stored as octal text, like the other numeric header fields
            return Integer.parseInt(new String(gid).trim( ), 8);
        } catch (IllegalArgumentException e) {
            return -1;
        }
    }

    /** Returns true if this entry represents a file */
    boolean isFile( ) {
        return type == LF_NORMAL || type == LF_OLDNORMAL;
    }

    /** Returns true if this entry represents a directory */
    boolean isDirectory( ) {
        return type == LF_DIR;
    }

    /** Returns true if this a hard link (to a file in the archive) */
    boolean isLink( ) {
        return type == LF_LINK;
    }

    /** Returns true if this a symbolic link */
    boolean isSymLink( ) {
        return type == LF_SYMLINK;
    }

    /** Returns true if this entry represents some type of UNIX special file */
    boolean isSpecial( ) {
        return type == LF_CHR || type == LF_BLK || type == LF_FIFO;
    }

    public String toString( ) {
        return "TarEntry[" + getName( ) + ']';
    }
}
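The constructor's TODO mentions a checksum( ) test that the listing never supplies. A minimal sketch of the usual tar rule follows: add up all 512 header bytes as unsigned values, counting the eight bytes of the checksum field itself as spaces. The class name and the idea of passing the header as one 512-byte array are assumptions made for this illustration; the listing reads the header field by field instead, and some older tar programs are reported to have summed signed bytes.

```java
/** Illustrative only: compute the checksum a tar header is expected to carry. */
class TarChecksum {
    static final int RECORDSIZE = 512;
    // The chksum field follows name(100), mode(8), uid(8), gid(8), size(12), mtime(12).
    static final int CHKSUM_OFFSET = 100 + 8 + 8 + 8 + 12 + 12; // 148
    static final int CHKSUM_LENGTH = 8;

    static int compute(byte[] header) {
        int sum = 0;
        for (int i = 0; i < RECORDSIZE; i++) {
            boolean inChecksumField =
                i >= CHKSUM_OFFSET && i < CHKSUM_OFFSET + CHKSUM_LENGTH;
            // The stored checksum is ignored; its eight bytes count as blanks.
            sum += inChecksumField ? ' ' : (header[i] & 0xff);
        }
        return sum;
    }
}
```

A caller would compare this value with the octal number parsed from the chksum field and, on a mismatch, throw the TarException that the TODO anticipates.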
The TarFile
example is one of the longest in the
book. One could equally well use filter subclassing to provide
encryption. One could even, in theory, write a Java interface to an
encrypted filesystem layer, such as CFS (see ftp://research.att.com/dist/mab/cfs.ps
) or to
a version-archiving system such as
CVS (the Concurrent Versions
System; see http://www.cvs.org).
CVS is a good tool for maintaining source code; most large open
source projects now use it (see http://www.openbsd.org/why-cvs.html). In
fact, there is already a Java-based implementation of CVS (see http://www.jcvs.org/). Either of
these would be substantially more clever than my little tarry friend,
but, I suspect, contain rather more code.
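To make the remark about filter subclassing concrete, here is a deliberately toy sketch of the mechanism: a FilterInputStream subclass that transforms each byte as it is read. The XOR "cipher" and the class name are assumptions chosen for brevity and offer no real security; a serious version would delegate to a proper cryptographic library instead.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Toy example only: XORs every byte with a fixed key. This is NOT real encryption. */
class XorInputStream extends FilterInputStream {
    private final int key;

    XorInputStream(InputStream in, int key) {
        super(in);
        this.key = key & 0xff;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        return b < 0 ? b : (b ^ key);   // pass end-of-stream (-1) through untouched
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        for (int i = off; i < off + n; i++) {   // loop body is skipped when n is -1
            buf[i] ^= key;
        }
        return n;
    }
}
```

Because XOR with the same key is its own inverse, wrapping a stream twice with the same key returns the original bytes, which makes the filter easy to try out.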
For all topics in this chapter, Rusty’s book Java I/O should be considered the antepenultimate documentation. The penultimate reference is the Javadoc documentation, while the ultimate reference is, if you really need it, the source code for the Java API, to which I have not needed to make a single reference in writing this chapter.