Chapter 13. Simple Input Output


SCSI-wuzzy was a bus.

SCSI-wuzzy caused no fuss.

SCSI-wuzzy wasn't very SCSI, was he?

Designing input/output (henceforth “I/O”) libraries is a lot harder than it might appear. There have been some spectacular blunders in the past. One language, Algol 60, gave up on I/O altogether, leaving it out of the language specification and making it an implementation-defined detail! Pascal has the deficiency that you can’t test for end-of-file until you have read from the file, distorting all while loops that control input. The C library gets() call to read a string from a stream is probably the biggest single security hole in Unix history. It was one of the holes exploited by the November 1988 Morris Worm, which spread through several thousand of the roughly 60,000 (!) hosts then on the Internet. [1] Classic Fortran I/O is so ugly because John Backus’s team decided to reuse an existing IBM 704 assembler library instead of designing their own language support.

So what are the problems with Java I/O? There are several. First, it is a very large library, with 75 classes and interfaces. Size alone doesn’t make an interface poor, but there are too many low-value classes (like SequenceInputStream) and not enough high-value classes (like proper support for interactive I/O). Sun got the API wrong the first time and had to add many more classes in the JDK 1.1 release to properly cope with internationalization. Interactive I/O (text input-output to keyboard/screen) is particularly poorly supported. The Java I/O package is not intuitive to use, and there are a number of peculiar design choices often reflecting the use of the underlying standard C I/O library. That’s the same mistake the Fortran team committed 40 years earlier. Finally, because this library makes heavy use of wrapper classes, you need to use two or three classes to do simple file I/O.

Enough of the criticism. Let’s describe the philosophy of the java.io package. Obviously, its purpose is to conduct I/O on data and on objects. You will use this package to write your data into disk files, into sockets, into URLs, and to the system console, and to read it back again. There is some support for formatting character data, and for processing zip and jar files. Several other packages are involved in this, and the key ones are listed in Table 13-1. The point of presenting this table is to indicate the vast range of Java I/O features.

In this chapter we will cover the I/O basics. By the end of this chapter and the next, you will be versed in the use of the first five packages listed in Table 13-1. The “further reading” section at the end of each chapter will point out resources for more information on some of the other packages.

Table 13-1. Java Packages Involved with I/O

Package Name        Purpose
java.io             contains the 75 or so classes and interfaces for I/O.
java.nio            (new in JDK 1.4) an API for memory-mapped I/O, non-blocking I/O, and file locking.
java.text           for formatting text, dates, numbers, and messages according to national preferences and conventions.
java.util.regex     (new in JDK 1.4) for matching a string against a pattern or regular expression.
java.util.zip       used to read and write zip files.
java.util.logging   (new in JDK 1.4) a framework for recording and processing system or application messages to help with later problem diagnosis and resolution.
java.util.jar       used to read and write Jar files.
javax.xml.parsers   (new in JDK 1.4) an API to read in and parse XML trees. Covered in Chapter 28.
javax.imageio       (new in JDK 1.4) an API for image file I/O (jpegs, gifs, etc.) together with common operations for handling them, such as thumbnail processing, conversions between formats, and color model adjustment.
javax.print         (new in JDK 1.4) the third attempt at providing a decent, enterprise-ready printing service.
javax.comm          support for accessing serial (RS-232) and parallel (IEEE-1284) port devices. Not part of the basic JDK.
javax.sound.midi    (new in JDK 1.3) provides interfaces and classes for I/O, sequencing, and synthesis of MIDI (Musical Instrument Digital Interface) data.
javax.speech        Speech recognition and output API under development. Third-party implementations are available now. This library will have the biggest impact in Java 2 Micro Edition on telephony applications.

Design Philosophy

The design philosophy for Java I/O is based on these principles:

  • I/O is based on streams. A stream has a physical “device” at one end (like a file or a location in memory). That physical device is managed by some class, and you wrap (layer) additional logical classes on top of that for specific kinds of I/O.

  • Programs that do I/O should be portable, even though I/O has some non-portable aspects. Platform differences in file names and line terminators must be handled in a way that ensures the code runs everywhere.

  • There are lots of small classes that do one thing, instead of a few big classes that do many things. There is one class that interprets data in binary format, and another class that reads data from a file. If you want to read binary data from a file, you use both of these classes. The constructors make it convenient to use the classes together.

We’ll see examples of these principles throughout the chapter. This is a long chapter, but a worthy one. To help you get the best out of it, Figure 13-1 represents the topics that will be covered, and how they are grouped together. As you can see, many of the I/O topics are freestanding and only relate to each other in a general way. If you feel lost at some point in this chapter, refer back to this diagram.


Figure 13-1. Topics in I/O.

The first three boxes are covered in this chapter, and the next two in the following chapter on advanced I/O topics. Before looking at actual input/output classes, we’ll first cover the File and FileDescriptor classes that provide a convenient way to represent a filename in Java.

File and FileDescriptor Classes

These two classes can’t actually do any I/O! The class “java.io.File” should really be called “Filename” since most of its methods are concerned with querying and adjusting filename and pathname information, not the contents of a file. Directory, filename, and pathname information is often called “metadata,” meaning “data about data.” Methods of java.io.File allow you to access metadata to:

  • return a File object from a String containing a pathname

  • test whether a file exists, is readable/writable, or is a directory

  • delete the file, or create directory paths

  • say how many bytes are in the file and when it was last modified

  • get various forms of the file pathname.
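As a rough illustration of these metadata queries, here is a minimal sketch. The class name FileInfoDemo and the helper describe() are invented for this example; only the java.io.File calls come from the library. Note that asking about a file that does not exist is perfectly legal.

```java
import java.io.File;

class FileInfoDemo {
    // Builds a one-line report using only metadata queries; no file contents are read.
    static String describe(File f) {
        StringBuilder sb = new StringBuilder();
        sb.append("name=").append(f.getName());
        sb.append(" exists=").append(f.exists());
        sb.append(" directory=").append(f.isDirectory());
        sb.append(" length=").append(f.length());   // 0 if the file does not exist
        return sb.toString();
    }

    public static void main(String[] args) {
        // Use the pathname given on the command line, or the current directory.
        File f = new File(args.length > 0 ? args[0] : ".");
        System.out.println(describe(f));
        System.out.println("absolute path: " + f.getAbsolutePath());
        System.out.println("last modified: " + f.lastModified());
    }
}
```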

Here are all the public members of java.io.File.

Public members of java.io.File

public class File implements Serializable, Comparable {
    public static final char separatorChar ; 
    public static final String separator ; 
    public static final char pathSeparatorChar ; 
    public static final String pathSeparator ; 

// constructors: 
    public File(String path); 
    public File(String directory, String file); 
    public File(File directory, String file); 

// about the file: 
    public String getName(); 
    public String getParent(); 
    public File getParentFile(); 
    public String getPath(); 
    public String getAbsolutePath(); 
    public File getAbsoluteFile(); 
    public String getCanonicalPath() throws IOException; 
    public File getCanonicalFile() throws IOException; 

    public boolean canRead(); 
    public boolean canWrite(); 
    public boolean exists(); 
    public boolean isAbsolute(); 
    public boolean isDirectory(); 
    public boolean isFile(); 
    public boolean isHidden(); 
    public long lastModified(); 
    public long length(); 

// about the directory 
    public String[]  list(); 
    public String[] list(FilenameFilter); 
    public File[] listFiles(); 
    public File[] listFiles(FilenameFilter); 
    public File[] listFiles(FileFilter); 
    public static File[] listRoots(); 
    public boolean mkdir(); 
    public boolean mkdirs(); 
// using temporary files 
    public boolean createNewFile() throws IOException; 
    public boolean delete(); 
    public void deleteOnExit(); 
    public static File createTempFile(String, String) throws IOException; 
    public static File createTempFile(String, String, File) throws IOException; 

// miscellaneous: 
    public boolean renameTo(File); 
    public boolean setLastModified(long); 
    public boolean setReadOnly(); 
    public int compareTo(File); 
    public int compareTo(Object); 
    public boolean equals(Object); 
    public int hashCode(); 
    public String toString(); 
    public URL toURL() throws java.net.MalformedURLException; 
} 

You create a File object by giving strings for the directory and filename. The file doesn’t actually have to exist when you instantiate the File object, and you can go on to create it using createNewFile(). You only bother to instantiate a File object if one of the operations listed above is of interest to you. If you just want to do some I/O, then keep reading—that information is coming soon. Most of the method names in File give a clear indication of what they do. Here are the details on some of the less obvious ones.

public int compareTo(File); 

This method compares the pathnames of two files lexicographically, returning zero when they are equal. Filename comparisons are platform-dependent, as Microsoft Windows does not distinguish letter case in filenames. A straight alphabetic comparison would give the wrong result on Windows, but the compareTo method ensures the correct ordering for the platform.

public static File createTempFile(String prefix, String suffix) throws IOException; 

This routine creates a temporary file in the default temporary directory. A unique filename will be generated for you, and it will have the prefix and suffix that you provide. This lets you give all temporary files created by, say, your mail program a name that starts with “mail” and ends with “.tmp”. There is another form of this method that takes a third parameter, a File object, to specify the directory.
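A short sketch of the mail-program case just described might look like this. The class and method names are made up for the example; deleteOnExit() is added only so that the demonstration cleans up after itself.

```java
import java.io.File;
import java.io.IOException;

class TempFileDemo {
    // Creates an empty, uniquely named file such as mail12345.tmp
    // in the default temporary directory.
    static File newMailTemp() throws IOException {
        File tmp = File.createTempFile("mail", ".tmp");
        tmp.deleteOnExit();   // ask the JVM to remove it at normal exit
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        File tmp = newMailTemp();
        System.out.println("created " + tmp.getAbsolutePath());
    }
}
```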

public boolean createNewFile() throws IOException; 
public void deleteOnExit(); 

These two routines could be used together to provide a simple file-locking protocol, giving exclusive access to some other file or resource. The createNewFile method would either atomically create a new file, or return “false” if the file already existed. “Atomically create” means that the check for the existence of the file, and the creation of the file if it does not exist, form a single operation. If there are several copies of your program running and making the same call at the same time, only one of them will succeed in creating the file. JDK 1.4 introduces the java.nio package which provides more direct support for file locking. See Chapter 14 for details.
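The locking protocol outlined above could be sketched along these lines. This is only an illustration under simple assumptions: the names tryLock() and unlock() and the lock file name are invented here, and a crashed process can still leave a stale lock file behind.

```java
import java.io.File;
import java.io.IOException;

class LockFileDemo {
    // Returns true only for the one caller that actually created the lock file.
    static boolean tryLock(File lockFile) throws IOException {
        boolean created = lockFile.createNewFile();   // atomic check-and-create
        if (created) {
            lockFile.deleteOnExit();                  // release the lock at normal JVM exit
        }
        return created;
    }

    // Gives up the lock early by removing the lock file.
    static void unlock(File lockFile) {
        lockFile.delete();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical lock file name, chosen for this example only.
        File lock = new File(System.getProperty("java.io.tmpdir"), "lockdemo.lck");
        System.out.println("first attempt:  " + tryLock(lock));
        System.out.println("second attempt: " + tryLock(lock));
        unlock(lock);
    }
}
```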

public boolean mkdir(); 
public boolean mkdirs(); 

The first method creates just that directory. The second creates that directory plus any non-existent parent directory as needed.

public String[]  list(); 
public File[] listFiles(); 

The first method returns an array of strings representing the files and directories in the directory the method is invoked on. The second method returns the same information, but as File objects, not strings.
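Here is a small sketch that exercises the directory methods together. The directory names a, b, and c and the helper buildAndList() are arbitrary choices for the example, and the program leaves the directories it creates in the temporary directory.

```java
import java.io.File;

class DirDemo {
    // Creates root/a/b/c and returns the names found directly under root/a/b.
    static String[] buildAndList(File root) {
        File deep = new File(new File(new File(root, "a"), "b"), "c");
        boolean single = deep.mkdir();    // fails when the parent directories are missing
        boolean all = deep.mkdirs();      // creates any missing parents as well
        System.out.println("mkdir=" + single + " mkdirs=" + all);
        return deep.getParentFile().list();
    }

    public static void main(String[] args) {
        File root = new File(System.getProperty("java.io.tmpdir"),
                             "dirdemo-" + System.nanoTime());
        String[] names = buildAndList(root);
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i]);
        }
    }
}
```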

public static File[] listRoots(); 

This method lists all the available filesystem roots on this system. On Windows systems, the array will hold File objects for the root directories of the local and mapped network drives, such as “A:”, “C:”, “D:”, and so on, allowing the programmer to learn what active drives there are. On Unix, there is just one root, “/”, the root filesystem. Windows UNC pathnames (Universal Naming Convention pathnames that start with “\\”) indicate a non-local file and are not returned by this method.

FileDescriptor

The basic operating system object used to manipulate files is called a file descriptor, but you’re not expected to create them or work with them much in Java.

For interest, we will mention what descriptors are, and then move on to some tips on portable I/O. A file descriptor is a non-negative integer, 0,1,2,3... etc., that is used by the native I/O system calls to index a control structure containing data about each open file or socket. Each process has its own table of file descriptors, with each entry pointing to an entry in a system-wide file descriptor table. The size of the process descriptor table places a limit on the number of files or sockets that a process can have open simultaneously. A typical size is 128 or 256 file descriptors.

Java applications should not create their own file descriptors. The FileInputStream and FileOutputStream classes have methods that get the file descriptor for a file that you have open, and that open the file for a descriptor that you have. The class java.io.FileDescriptor is used when the operating system needs a file descriptor, perhaps for a JNI call, or as part of the runtime library.

Portability of I/O

The basic portability approach of the Java runtime library is to have the same method do slightly different things appropriate to each platform. The standard end of line sequence on Windows is “carriage return, linefeed,” while on Unix it is just “linefeed.” Any library method that writes an end of line sequence, such as System.out.println(), will output a “carriage return, linefeed” pair on Windows, a linefeed on Unix, and a carriage return when run on the Mac. In contrast, any string data where you wrote a literal '\n' (line feed) or '\r' (carriage return) in a string will be output on every platform exactly as you wrote it.
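To see which end of line sequence your own platform uses, something like the following sketch can help. The helper visible() is invented for this example; it simply makes the invisible characters printable. The system property line.separator is the standard one.

```java
class LineEndDemo {
    // Replaces carriage return and line feed with printable escape text.
    static String visible(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        // The platform's end of line sequence, as used by println().
        String eol = System.getProperty("line.separator");
        System.out.println("platform end of line: " + visible(eol));
        // A literal line feed stays a line feed on every platform.
        System.out.println("literal in a string:  " + visible("a\nb"));
    }
}
```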

There is a field in the class java.io.File that contains the filename separator, which is a backslash on Windows and a forward slash on Unix. Instead of writing a pathname as “a/b/c.txt,” you can write it using the separator character, ensuring that it will be correct on all platforms. If you insist on reducing portability by using literal strings for pathnames, remember that backslash is also the string escape character. Therefore, you have to write it twice (to escape itself) when writing literal file names for the PC. Here is an example:

String myFile = "a\\b\\c.txt"; 

You can actually use “/” to separate components in a Windows pathname in a program. The Windows file system calls all use it internally. The Windows interactive shell, COMMAND.COM, is the only part of the system that can’t handle it. This is an interesting historical artifact, dating from the origins of MS-DOS as an unauthorized port (known as QDOS) of CP/M with a few trivial changes. That port was eventually bought by Microsoft, renamed to MS-DOS, and the rest is history. Most programmers form a filename by declaring a variable with a briefer name to represent the separator character, like this:

final String s = java.io.File.separator; 
String myFile = "a" + s + "b" + s + "c.txt"; 

This helps, but does not provide 100% data file portability because it doesn’t mention the Windows drive. The form of an absolute pathname differs from system to system. One approach to minimizing this is to tell the program its data filenames at runtime, either as command line arguments or as system properties.
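One way to supply the filename at runtime is sketched below. The property name data.file and the helper resolve() are assumptions made up for this example, not names defined by the JDK.

```java
import java.io.File;

class DataFileName {
    // Picks the data file from, in order: the first command line argument,
    // the (hypothetical) system property data.file, or a relative default.
    static File resolve(String[] args) {
        if (args.length > 0) {
            return new File(args[0]);
        }
        String prop = System.getProperty("data.file");
        if (prop != null) {
            return new File(prop);
        }
        String s = File.separator;
        return new File("a" + s + "b" + s + "c.txt");
    }

    public static void main(String[] args) {
        System.out.println("using " + resolve(args).getPath());
    }
}
```

You might run it as "java DataFileName mydata.txt" or "java -Ddata.file=mydata.txt DataFileName".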

Output

Java programs access external data by instantiating a stream on the data. Most physical destinations (a place to where an output stream can flow) have a class dedicated to writing output there. For example, there is a file output stream class that opens a file for writing, a piped output stream class that opens a pipe for output to another thread, a byte array output stream class that opens a connection to a byte array in memory, and so on. Similar classes exist for each individual source of input (memory, socket, file, URL, etc.). There is only a limited number of destinations for data. The most common places to write data are:

  • a sequential file

  • a String

  • a pipe

  • the system console

  • an array of characters

  • a URL for an HTTP GET/POST

  • a random access file

  • an array of bytes

  • a socket

Some of these destinations have their own dedicated class whose constructor returns a stream. Other destinations have a getOutputStream() method that will hand you back a stream to write into. Either way, the stream object hides the low-level details of how the data is accessed, and just lets you get to it. After the information stream is opened, some other class is usually wrapped on top to actually transfer the data.

The additional wrapping classes provide logical operations, like binary I/O, printable I/O, encryption, or compression. You use the wrapping classes the same way and call the same methods regardless of whether the underlying I/O is to a socket, file, keyboard, URL, pipe, array, etc. This provides the benefit of a uniform API. Wherever you are transferring the data, you will use methods of the same two or three classes. The disadvantage is that you have to instantiate objects from two or three different classes to get anything done.

Originally, Java only had stream classes, and the streams only operated on bytes of data. However, characters in Java are two bytes wide, and byte-oriented I/O did not properly cope with internationalization. So a wider type of stream was introduced in JDK 1.1 specifically for character-based I/O. Reader classes are able to get Unicode character input two bytes at a time. Writer classes are able to do Unicode character output two bytes at a time, as shown in Figure 13-2. Input and output streams operate on data one byte at a time.


Figure 13-2. Your program outputs data into a stream or writer.

Writers are used when you want to output Unicode data, and output streams are used when you want to output ASCII or binary data. Readers are used when you want to read Unicode data, and input streams are used when you want to read ASCII or binary data. Readers and Writers are intended to replace byte-sized character I/O streams. They do the same job as Streams, and have a very similar API. In summary, Streams operate on bytes, while Readers/Writers operate on double-byte characters and therefore handle internationalization properly. We’ll start by looking at character Writers, then we’ll look at byte output Streams.

Outputting Double-byte Characters

All Writer classes output double-byte Unicode characters. Most operating systems expect characters to be one byte long, so you will only use Writers when you need the internationalization features that Unicode offers (e.g., you need to represent Cyrillic letters), or if you intend to later read the strings back in for further processing.

All Writer classes have five basic write() methods for transferring 16-bit characters. The many subclasses of Writer add more methods, but you can count on these five methods in every writer. You can write a single character, or the characters from an array, or a String, or a range of characters from an array or String. Here are the signatures of the Writer output methods:

public void write(int); 
public void write(char[]); 
public void write(char[], int from, int len); 
public void write(java.lang.String); 
public void write(java.lang.String, int from, int len); 

They are methods promised by the abstract class Writer, and all java.io classes with “Writer” in their name have them. You use Writers for outputting internationalizable text and numbers that some person will read, as opposed to binary or ASCII or object values for further computer processing.
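The five methods can be tried out on any Writer. This sketch uses a StringWriter as the destination purely because its result is easy to inspect; the class and method names are invented for the example.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

class WriteMethodsDemo {
    // Calls each of the five basic write() methods once and returns what was written.
    static String demo() {
        StringWriter dest = new StringWriter();
        Writer w = dest;
        try {
            w.write('J');                                      // a single character
            w.write(new char[] { 'a', 'v' });                  // a whole char array
            w.write(new char[] { 'x', 'a', ' ', 'y' }, 1, 2);  // 2 chars starting at index 1
            w.write("I/");                                     // a whole String
            w.write("xxOxx", 2, 1);                            // 1 char starting at index 2
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return dest.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```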

First you choose whether you want printable internationalizable output or byte output. The former means you use a Writer, the latter means you use an OutputStream. Next you decide where you want to send the output. If you originally chose a Writer, you will now choose one of these Writer subclasses accordingly (see Table 13-2).

The class FileWriter is by far the most common place to send chars with a Writer. That class opens a connection onto a file. Its constructors are shown in Table 13-2. Some of the constructors take an argument that is a File or FileDescriptor object. File or FileDescriptor objects are merely ways of referring to a file without using its string name.

An example line of code that instantiates a FileWriter for a file called “\jj4\example.txt” is

FileWriter myFW = new FileWriter( "\\jj4\\example.txt" ); 

You have to double the backslash when it appears in a string because the backslash is also the escape character in a string. For example, “\n” is a newline and “\t” is a tab. The mistake of choosing the MS-DOS pathname separator character as the string escape happened because everyone on the original Java design team was a hardcore Unix whacker. None of them had ever written much for DOS, and none of them realized that backslash already had an established use on that platform.

Table 13-2. Choose the Writer Class Based on the Output Destination

Send Output To: A file
java.io Class: FileWriter
Constructors:
    FileWriter(String fileName) throws IOException
    FileWriter(String fileName, boolean append) throws IOException
    FileWriter(File file) throws IOException
    FileWriter(File file, boolean append) throws IOException
    FileWriter(FileDescriptor fd)

Send Output To: A char array in your program
java.io Class: CharArrayWriter
Constructors:
    CharArrayWriter()
    CharArrayWriter(int initialSize)

Send Output To: A String in your program
java.io Class: StringWriter
Constructors:
    StringWriter()
    StringWriter(int initialSize)

Send Output To: A pipe to be read by a PipedReader in another thread
java.io Class: PipedWriter
Constructors:
    PipedWriter()
    PipedWriter(PipedReader sink)

That line of code above opens a connection to the destination file, and gives you the basic writing methods. There is no separate “open” method. Note that the constructor can throw an exception, so you need to place the statement in a “try” statement.

Writer classes output Unicode (16-bit) characters, but most operating systems only support 8-bit characters by default. To cope with this, you can define a character set that specifies how Unicode will be turned into bytes. The simplest such character set mapping is “discard the high order bytes,” and this is the usual default for the file system. We’ll take a longer look at character sets in the next chapter. If you are using an 8-bit (ASCII or Latin-1) codeset and you don’t care about internationalization, don’t bother with Writers at all. Do all your I/O in bytes with OutputStreams.
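As a preview of how a character set turns characters into bytes, here is a hedged sketch using OutputStreamWriter, a class this chapter has not yet introduced. The helper encodedLength() is invented for the example; the charset names ISO-8859-1 and UTF-8 are standard ones.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

class EncodingDemo {
    // Writes text through a Writer using the named character set
    // and reports how many bytes reached the underlying byte stream.
    static int encodedLength(String text, String charsetName) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Writer w = new OutputStreamWriter(bytes, charsetName);
        w.write(text);
        w.close();   // flushes the encoder into the byte array
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        String word = "caf\u00e9";   // four characters, the last one accented
        System.out.println("ISO-8859-1 bytes: " + encodedLength(word, "ISO-8859-1"));
        System.out.println("UTF-8 bytes:      " + encodedLength(word, "UTF-8"));
    }
}
```

The same four characters occupy a different number of bytes depending on the character set chosen, which is why the choice matters whenever the file will be read by another program.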

Wrapping a Writer

Those five “write” output methods common to every Writer class (listed at the start of this section) are very spartan. Therefore, you usually “wrap” the destination Writer class with another Writer class that has more output methods. You wrap a class around the Writer by passing the Writer as an argument to the wrapping class’s constructor. Here’s an example that wraps a PrintWriter around the FileWriter we declared a few lines back:

// myPrt wraps myFW 
PrintWriter myPrt = new PrintWriter(myFW); 

The PrintWriter class is the one with the methods that actually transfer data to the Writer destination as printable strings. When you call a PrintWriter method, the data goes through to the destination writer that it is wrapped around, in this case the FileWriter.

Methods of java.io.PrintWriter

public class java.io.PrintWriter extends java.io.Writer {
    public PrintWriter(java.io.Writer); 
    public PrintWriter(java.io.Writer,boolean); 
    public PrintWriter(java.io.OutputStream); 
    public PrintWriter(java.io.OutputStream,boolean); 
    public void flush(); 
    public void close(); 
    public boolean checkError(); 

    public void print(boolean); 
    public void print(char); 
    public void print(int); 
    public void print(long); 
    public void print(float); 
    public void print(double); 
    public void print(char[]); 
    public void print(java.lang.String); 
    public void print(java.lang.Object); 

   public void println(); 
     // there are also println versions of all the above print methods, e.g. 
   public void println(boolean); 
    // and so on... through to... 
   public void println(Object); 

   public void write(int); 
   public void write(char[]); 
   public void write(char[], int from, int len); 
   public void write(java.lang.String); 
   public void write(java.lang.String, int from, int len); 
} 

There is a print() method for most primitive types, and also a println() method that follows the output with the end of line sequence for that platform. There is no print(byte) because byte-oriented output is done with an OutputStream, not a Writer. The print methods are all implemented by calling the write methods. These write methods override similar methods in the parent Writer, and suppress IOExceptions. No methods in PrintWriter throw IOException. You can call the method checkError() to see if an I/O error occurred at some earlier point. The API designer should have named that method “ isError ” to follow the standard naming conventions, by the way.
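Because PrintWriter swallows IOExceptions, checkError() is the only way to learn that something went wrong. The sketch below forces a failure with a deliberately broken destination; BrokenWriter and printAndCheck() are invented for this illustration.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.Writer;

class CheckErrorDemo {
    // A destination that fails on every write, standing in for a full disk
    // or a dropped connection.
    static class BrokenWriter extends Writer {
        public void write(char[] buf, int off, int len) throws IOException {
            throw new IOException("device unavailable");
        }
        public void flush() { }
        public void close() { }
    }

    // Prints through a PrintWriter and reports whether an error was recorded.
    static boolean printAndCheck(Writer dest) {
        PrintWriter pw = new PrintWriter(dest);
        pw.println("hello");       // never throws IOException, even on failure
        return pw.checkError();    // flushes, then reports any earlier failure
    }

    public static void main(String[] args) {
        System.out.println("good destination, error? " + printAndCheck(new StringWriter()));
        System.out.println("bad destination, error?  " + printAndCheck(new BrokenWriter()));
    }
}
```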

As an aside, all the print methods are implemented in terms of the five basic write methods. Here is java.io.PrintWriter’s print(int) method in full:

public void print(int i) {
     write(String.valueOf(i)); 
} 

Writing everything in terms of the five basic write methods makes it very easy to implement wrapping. When you call myPW.print(i), that routine calls myPW.write(String.valueOf(i)), and that calls the write method of the object passed in to the constructor. The delegation continues along the line until the write request has been through all the wrappers and reaches an object that does the operation directly on a file or string in memory, etc.

Here is the code to create a file and write printable numbers and strings into it.

PrintWriter myPW = null; 
try {
          FileWriter myFW = new FileWriter( "\\jj4\\dogs.txt" ); 
          myPW = new PrintWriter( myFW ); 
          int i = 101; 
          myPW.print( i ); 
          myPW.println(" Dalmatians"); 
} catch(IOException x) {
          System.err.println("IOExcpn: " + x); 
} finally {
          // close only if the file was opened successfully 
          if (myPW != null) myPW.close(); 
} 

Notice the close() method in PrintWriter. Even though there is no separate open for files, all I/O stream classes have a close method. You should develop the habit of closing each stream as you are done with it. Java does not automatically flush and close streams just because you stop writing to them. Failing to close an output stream may leave it with some data not yet flushed to the underlying device. You should also close input streams when you are done with them, as a matter of good programming practice. Streams take up some OS resources, of which there is a limited quantity. By closing a stream when you are finished with it, you allow the JVM to give the file descriptor and buffers back to the OS. Also, for pipes/sockets/URLs, closing allows the other end of the connection to see an end-of-file, and therefore is able to gracefully terminate instead of waiting for an event that may never happen.
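One common way to make sure close() is always called is to put it in a finally clause. This is a sketch of that habit, with a method name, writeDogs(), invented for the example.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

class CloseDemo {
    // Writes one line to the target file and always closes the stream,
    // even if one of the print calls fails unexpectedly.
    static void writeDogs(File target) throws IOException {
        PrintWriter pw = new PrintWriter(new FileWriter(target));
        try {
            pw.print(101);
            pw.println(" Dalmatians");
        } finally {
            pw.close();   // flushes buffered data and frees the file descriptor
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("dogs", ".txt");
        f.deleteOnExit();
        writeDogs(f);
        System.out.println(f.length() + " bytes written to " + f.getPath());
    }
}
```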

Other Writer Wrappers

As Figure 13-3 suggests, there are two other writer classes that you can wrap around another Writer. You can wrap a BufferedWriter around it. Or you can create your own subclass of FilterWriter, and wrap that around a Writer. These two classes do specialized extra processing on the stream before it gets written to its destination.


Figure 13-3. Wrapping more Writer classes.

You decorate a FileWriter (StringWriter… etc.) with a BufferedWriter to improve performance. FileWriter by itself sends its output to the underlying stream as it receives it. Because of disk latency and system call context switch overhead, it’s always quicker to do one 512-character transfer than 512 individual 1-character transfers. Wrapping a BufferedWriter around any Writer will achieve that efficiency by saving up smaller writes until its internal buffer is full. Buffered I/O should have been the default behavior in this package.
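The effect of buffering can be observed by counting how many times the destination is actually asked to write. In this sketch CountingWriter and callsFor() are invented for the illustration, and the buffer size of 1024 is passed explicitly so the result does not depend on the library default.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Writer;

class BufferDemo {
    // A destination that discards its data and just counts write requests.
    static class CountingWriter extends Writer {
        int calls = 0;
        public void write(char[] buf, int off, int len) { calls++; }
        public void flush() { }
        public void close() { }
    }

    // Sends 512 single characters and returns how many writes reached the destination.
    static int callsFor(boolean buffered) throws IOException {
        CountingWriter sink = new CountingWriter();
        Writer w = buffered ? new BufferedWriter(sink, 1024) : sink;
        for (int i = 0; i < 512; i++) {
            w.write('x');
        }
        w.flush();   // pushes out whatever the buffer is still holding
        return sink.calls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("unbuffered writes: " + callsFor(false));
        System.out.println("buffered writes:   " + callsFor(true));
    }
}
```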

FilterWriter is an abstract class that you are meant to extend and override. It provides the opportunity to look at and modify characters as they are output. You could do the same thing by extending any of the other writer classes, but using this class makes your purpose explicit.

Here is an example program that post-processes the stream written into it and changes all “1”s to “2”s. A Filter can do other things like count lines, correct spelling mistakes, calculate checksums, or write an encrypted or compressed stream.

A Filter to Replace Chars

import java.io.*; 
class MyFilter extends java.io.FilterWriter {
        public MyFilter(Writer w) { super(w); } 

        public void write(String s, int off, int len) throws java.io.IOException {
             s = s.replace('1', '2'); 
             super.write(s, off, len); 
        } 

        public void write(char[] cbuf, int off, int len) throws IOException {
             String s= new String(cbuf); 
             this.write(s, off, len); 
        } 
} 

The lesson to take away from this is what a small amount of code is needed for such a large amount of functionality. Here is the code that uses the above filter. Try compiling and running it to see the results.

A Class that Uses a Filter

import java.io.*; 
public class Example2 {

       public static void main(String args[]) {
                PrintWriter myPW = null; 
                try {
                         FileWriter myFW = new FileWriter( "dogs.txt" ); 
                         FilterWriter filter = new MyFilter( myFW ); 
                         BufferedWriter BW = new BufferedWriter(filter); 
                         myPW = new PrintWriter( BW ); 
                         myPW.println("101 Dalmatians"); 
                } catch(IOException x) {
                         System.err.println("IOExcpn: " + x); 
                } finally {
                         // close only if the file was opened successfully 
                         if (myPW != null) myPW.close(); 
                } 
        } 
} 

In this example, the Filter overrides two of the write() methods, but you may need to override any or all of the FilterWriter methods depending on what you are doing. The example code also wraps a BufferedWriter around the filter, just to show how it is done. The API permits the BufferedWriter and the FilterWriter to be wrapped in either order. But to get the most performance benefit, you want as much of the pipeline as possible operating on buffers of data. So put the BufferedWriter as near to the start of the pipeline as possible (i.e., the BufferedWriter should be the outermost or next-outermost wrapper). This ensures that all wrapper objects downstream of the BufferedWriter are working with buffered data.

It’s common to cascade all these constructors together, like this:

PrintWriter myPW = new PrintWriter(
                                   new BufferedWriter(
                                       new MyFilter(
                                            new FileWriter( "\\jj4\\dogs.txt" ) ) ) ); 

If you run the program and look at the output file dogs.txt, you will see the output has been filtered. It now contains “202 Dalmatians”.
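If you would rather check the filtering without touching the disk, the same pipeline can end in a StringWriter. The sketch below is a variation on the filter idea, not the listing above: DigitFilter is a name invented here, and it also overrides write(int) so that single characters are filtered too.

```java
import java.io.BufferedWriter;
import java.io.FilterWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.Writer;

class FilterCheck {
    // Replaces every '1' with '2' on its way to the wrapped Writer.
    static class DigitFilter extends FilterWriter {
        DigitFilter(Writer w) { super(w); }

        public void write(int c) throws IOException {
            super.write(c == '1' ? '2' : c);
        }
        public void write(char[] cbuf, int off, int len) throws IOException {
            write(new String(cbuf, off, len), 0, len);
        }
        public void write(String s, int off, int len) throws IOException {
            super.write(s.replace('1', '2'), off, len);
        }
    }

    // Runs text through PrintWriter -> BufferedWriter -> DigitFilter -> StringWriter.
    static String filtered(String text) {
        StringWriter dest = new StringWriter();
        PrintWriter pw = new PrintWriter(new BufferedWriter(new DigitFilter(dest)));
        pw.print(text);
        pw.close();   // flushes the buffer through the filter
        return dest.toString();
    }

    public static void main(String[] args) {
        System.out.println(filtered("101 Dalmatians"));
    }
}
```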

Summary of Writers

  • Use a Writer when you want to output printable, internationalizable 16-bit characters

  • Choose a FileWriter or one of the other three destination classes, depending on where you want the chars to go

  • You can optionally wrap that in either or both of a BufferedWriter or your subclass of a FilterWriter

  • Then wrap a PrintWriter on top, and use its print methods to do the output

  • Wrapping one class by another to give it additional abilities is an example of the Decorator or Wrapper design pattern.

The next section looks at output streams, which are analogous to Writers but work on a byte at a time, not a Unicode character at a time.

Outputting ASCII Characters and Binary Values

Let’s admit right away that this section is really about the output of binary values and 8-bit characters, of which the ASCII characters are just a subset. The full 8-bit character set is known as ISO 8859-1 Latin-1, but no one has heard of that and everyone has heard of ASCII, hence the heading above. Both ASCII and 8859-1 are listed in Appendix C so you can review them.

We’ll first take a look at how to output binary values. Contrast the character output of the previous section with the binary output of this section. Character output is intended to be read by people, whereas binary output is intended for further processing by computers. The character representation of a number varies in length depending on the size of the number. The binary representation of a number is a fixed length: 4 bytes for an integer, 8 bytes for a long value. Just to make a point, the number 29,019 can be represented in a computer in several ways, as shown in Table 13-3.

Table 13-3. The Number 29,019 Stored as Three Different Types

| Type | Description | Hexadecimal Value | How It Looks When Printed |
|---|---|---|---|
| String | successive double-byte characters | 0032 0039 0030 0031 0039 | 29019 or 2 9 0 1 9 (if your design has a bug) |
| Array of ASCII bytes | successive bytes | 32 39 30 31 39 | 29019 |
| int | four byte binary integer | 0000715B | q[ |

Notice that the binary form of a number is not directly printable. Some bytes may happen to contain printable ASCII values (I chose this example so some do), but they don’t show the digits of the number. Note also that if you write Unicode into a file and then print that, you’ll usually get strange extra spacing because the OS print routines typically don’t know about Unicode and will find unexpected extra bytes everywhere.
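
To see the three representations side by side, here is a minimal sketch that produces each one in memory and prints its bytes in hex. The class and method names are made up for the illustration, and it relies on StandardCharsets and UncheckedIOException, which are newer than the JDK release this chapter describes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class Representations {
    // Render a byte array as two hex digits per byte, separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // The digits as double-byte characters (UTF-16BE: two bytes per char, no byte-order mark).
    static byte[] asUtf16Chars(int n) {
        return Integer.toString(n).getBytes(StandardCharsets.UTF_16BE);
    }

    // The digits as one ASCII byte each.
    static byte[] asAsciiBytes(int n) {
        return Integer.toString(n).getBytes(StandardCharsets.US_ASCII);
    }

    // The number as a four-byte binary int.
    static byte[] asBinaryInt(int n) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream dos = new DataOutputStream(bytes);
            dos.writeInt(n);
            dos.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hex(asUtf16Chars(29019)));  // 00 32 00 39 00 30 00 31 00 39
        System.out.println(hex(asAsciiBytes(29019)));  // 32 39 30 31 39
        System.out.println(hex(asBinaryInt(29019)));   // 00 00 71 5B
    }
}
```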

The output stream classes are used when you don’t want a double-byte character form. The API is very similar to that of the Writer classes. As with Writers, you decide where you want to send the output, and choose one of the classes accordingly (see Table 13-4).

Table 13-4. Choose the Output Stream Class Based on the Output Destination

| Send Binary Output To: | java.io Class | Constructors |
|---|---|---|
| A file | FileOutputStream | public FileOutputStream(java.lang.String) throws java.io.FileNotFoundException; |
| | | public FileOutputStream(java.lang.String,boolean) throws java.io.FileNotFoundException; |
| | | public FileOutputStream(java.io.File) throws java.io.FileNotFoundException; |
| | | public FileOutputStream(java.io.File,boolean) throws java.io.FileNotFoundException; |
| | | public FileOutputStream(java.io.FileDescriptor); |
| A byte array in your program | ByteArrayOutputStream | public ByteArrayOutputStream(); |
| | | public ByteArrayOutputStream(int); |
| A pipe to be read by a PipedInputStream in another thread | PipedOutputStream | public PipedOutputStream(java.io.PipedInputStream) throws java.io.IOException; |
| | | public PipedOutputStream(); |

There isn’t an output stream to write to a String, because you should use a Writer class, not a stream class, for that. There are also a couple of output destinations that you connect to using a method call to get the stream, rather than a constructor. As shown in Table 13-5, that is how you get a stream that writes into a socket or a URL. Sockets and URLs are treated differently because they can be read as well as written. The random access file can be open for reading and writing simultaneously too, and it also gets special treatment, as described in the next chapter.

Table 13-5. Choose the getOutputStream() Method Based on the Destination

| Send Output To: | Class | Method in That Class to Get an Output Stream |
|---|---|---|
| A socket | java.net.Socket | public OutputStream getOutputStream() throws java.io.IOException; |
| A URL | java.net.URLConnection | public OutputStream getOutputStream() throws java.io.IOException; |

Using the three classes or the two methods outlined, we can get an output stream that writes bytes into a file, a socket, a URL, a pipe, or a byte array. Writing to a URL takes a little bit of URL-specific setup (you have to open a connection, create a connection object, configure the server to allow writing, etc.), but the other four destinations are very easy to use.

Basic OutputStream Methods

All Output Streams have these three basic output methods. You can write a single byte, or the bytes from an array, or a range of bytes from an array. Here are the signatures of these output methods:

public void write(int b); 
public void write(byte[]); 
public void write(byte[], int from, int len); 

The first method accepts an int argument, although you might expect a byte. This is a concession to reduce the amount of casting you may otherwise sometimes need to do. Whatever size of value you send in, only the least significant byte will be output.
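
A tiny sketch makes the “least significant byte” rule concrete. It writes one int to a ByteArrayOutputStream and looks at what arrived; the class and method names are invented for the illustration.

```java
import java.io.ByteArrayOutputStream;

class LowByteDemo {
    // Writes one int with write(int) and returns the bytes that came out.
    static byte[] writeOne(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(value);          // only the low-order 8 bits are kept
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] result = writeOne(0x12345678);
        // prints: 1 byte written: 0x78
        System.out.println(result.length + " byte written: 0x"
                           + Integer.toHexString(result[0] & 0xFF));
    }
}
```

If you need all four bytes of the int in the output, that is a job for DataOutputStream.writeInt(), not write(int).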

These methods are promised by the abstract class OutputStream. All java.io classes with “OutputStream” in their name have these. You use OutputStreams for outputting bytes or binary values, not Unicode characters or objects. As with the Writer classes, you are expected to wrap another class around your Output Stream. One possible wrapper is a PrintStream class. Wrap that around a FileOutputStream or socket output stream, etc., when you wish to write printable bytes such as ASCII values. Another possible wrapper is the DataOutputStream class. Use DataOutputStream when you wish to do binary I/O. DataOutputStream has these output methods to write numbers in binary.

java.io.DataOutputStream for Binary Output

 public class java.io.DataOutputStream 
           extends java.io.FilterOutputStream 
                    implements java.io.DataOutput {
  // constructor 
    public DataOutputStream(java.io.OutputStream); 
    public final void writeBoolean(boolean) throws java.io.IOException; 
    public final void writeByte(int) throws java.io.IOException; 
    public final void writeShort(int) throws java.io.IOException; 
    public final void writeChar(int) throws java.io.IOException; 
    public final void writeInt(int) throws java.io.IOException; 
    public final void writeLong(long) throws java.io.IOException; 
    public final void writeFloat(float) throws java.io.IOException; 
    public final void writeDouble(double) throws java.io.IOException; 
    public final void writeBytes(java.lang.String) throws java.io.IOException; 
    public final void writeChars(java.lang.String) throws java.io.IOException; 
    public final void writeUTF(java.lang.String) throws java.io.IOException; 
    public void flush() throws java.io.IOException; 
    public synchronized void write(int) throws java.io.IOException; 

    public synchronized void write(byte[], int, int) throws java.io.IOException; 
    public final int size();  // returns number-of-bytes written so far 
} 

You will use DataOutputStream when you want to output numbers in binary format for later processing by another program. There is a write method for all primitive types, and also for Strings. Depending on which method you use, Strings will be written as 16-bit Unicode chars (writeChars), as 8-bit bytes discarding the high-order byte of each char (writeBytes), or in the UTF-encoded format where characters are 1-3 bytes in length (writeUTF) and preceded by a 16-bit length field. If you’re not sure what to use, you should write Strings using the writeBytes method. Or use a PrintStream instead. You use a PrintStream if all you need to do is output printable ASCII bytes.
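
The difference between the three String-writing methods shows up clearly in the byte counts. The following sketch (names are illustrative only) writes the same String three ways and uses size() to see how many bytes each call added.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class StringEncodings {
    // Returns the number of bytes each method emits for the same String:
    // index 0 = writeChars, 1 = writeBytes, 2 = writeUTF.
    static int[] sizes(String s) {
        try {
            int[] result = new int[3];
            DataOutputStream dos = new DataOutputStream(new ByteArrayOutputStream());
            dos.writeChars(s);                       // two bytes per char
            result[0] = dos.size();
            dos.writeBytes(s);                       // low-order byte of each char
            result[1] = dos.size() - result[0];
            dos.writeUTF(s);                         // 16-bit length, then encoded chars
            result[2] = dos.size() - result[0] - result[1];
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] n = sizes("dogs");
        System.out.println("writeChars: " + n[0] + " bytes");   // 8
        System.out.println("writeBytes: " + n[1] + " bytes");   // 4
        System.out.println("writeUTF:   " + n[2] + " bytes");   // 6
    }
}
```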

When you wrap several classes, only write from the outermost one. Otherwise, your I/O may get mixed up due to internal buffering. You will use PrintStream when you want to output ISO 8859-1 text and numbers in readable format for reading by a person, but you do not need internationalization. The class java.io.PrintStream has the following methods:

java.io.PrintStream for Printable Output

 public class java.io.PrintStream extends java.io.FilterOutputStream {
    public PrintStream(java.io.OutputStream); 
    public PrintStream(java.io.OutputStream,boolean autoFlush); 
    public PrintStream(java.io.OutputStream,boolean,String encoding)
            throws java.io.UnsupportedEncodingException; 
    public void print(boolean); 
    public void print(char); 
    public void print(int); 
    public void print(long); 
    public void print(float); 
    public void print(double); 
    public void print(char[]); 
    public void print(java.lang.String); 
    public void print(java.lang.Object); 

    public void println(); 
    // there are also println versions of all the above print methods, e.g. 
    public void println(boolean); 
    // and so on ... 

    public void flush(); 
    public void close(); 
    public boolean checkError(); 
    public void write(int); 
    public void write(byte[], int, int); 
} 

Use PrintStream to write ASCII bytes. Use PrintWriter when you need to write internationalizable Unicode characters. All characters printed by a PrintStream are converted into bytes using the platform’s default character encoding, or the encoding given as an argument String to the constructor. Some examples of this are at the end of the next chapter.
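
As a quick illustration of the encoding argument, this sketch prints the same text through two PrintStreams that differ only in the encoding name passed to the constructor, then counts the bytes each produced. The helper name encodedLength is made up for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class PrintStreamEncoding {
    // Prints text through a PrintStream using the named encoding
    // and returns how many bytes reached the underlying stream.
    static int encodedLength(String text, String encoding) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            PrintStream ps = new PrintStream(bytes, true, encoding);
            ps.print(text);
            ps.flush();
            return bytes.size();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String word = "caf\u00e9";   // e-acute is outside ASCII but inside Latin-1
        System.out.println(encodedLength(word, "ISO-8859-1"));  // 4
        System.out.println(encodedLength(word, "UTF-8"));       // 5
    }
}
```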

System.in, out, and err

On all Unix operating systems, and on Windows, three file descriptors are automatically opened by the shell that starts every process. This is even true on the Mac with OS-X (because OS-X is based on the Mach variant of Unix). The file descriptor convention is so common because it is a part of the C language API. File descriptor ’0’ is used for the standard input of the process. File descriptor ’1’ is used for the standard output of the process, and file descriptor ’2’ is used for the standard error of the process.

These three standard connections are known as “standard in,” “standard out,” and “standard err” or error. Normally, the standard input gets input from the keyboard, while standard output and standard error write data to the terminal from which the process was started. Every Java program contains two predefined PrintStreams, known as “out” and “err.” They are kept in java.lang.System, and represent the command line output and error output, respectively. There is also an input stream called System.in that is the command line input. This is also referred to as console I/O or terminal I/O.

Anytime you have written System.out.println("foo = " + foo); you have already used a PrintStream, maybe without knowing it. See? It’s easier than you thought!

You can redirect the standard error, in, or out streams to a file or another stream (such as a socket) this way:

System.setErr(PrintStream err); 
System.setIn(InputStream in); 
System.setOut(PrintStream out); 

Stdin and stdout are used for low volume interactive I/O. Stderr is intended for error messages only. That way, if the output of a program is redirected somewhere, the error messages still appear on the console.
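
Here is one way redirection might look in practice: a sketch that points System.out at an in-memory buffer, runs some code, and then restores the original stream. The capture() helper is hypothetical, and the lambda in main needs a much newer compiler than the one this chapter was written for.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

class RedirectDemo {
    // Temporarily sends System.out to an in-memory buffer and returns what was printed.
    static String capture(Runnable task) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream replacement = new PrintStream(buffer, true);
        System.setOut(replacement);
        try {
            task.run();
        } finally {
            System.setOut(original);   // always restore the real standard output
        }
        replacement.flush();
        return buffer.toString();
    }

    public static void main(String[] args) {
        String text = capture(() -> System.out.print("foo = 42"));
        // Report on stderr, which was never redirected.
        System.err.println("captured: " + text);
    }
}
```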

Writing a Binary File

Here is some sample code to create a file and write binary numbers into it.

FileOutputStream myFOS = null; 
try {
        myFOS = new FileOutputStream( "numbers.bin" ); 
        DataOutputStream myDOS = new DataOutputStream( myFOS ); 
        myDOS.writeInt(29019); 
        myDOS.writeInt(3); 
        myDOS.writeInt(5); 
        myDOS.writeInt(67); 
        myDOS.close();     // flushes, and closes the underlying file too 
} catch(IOException x) { System.err.println("IOExcpn: " + x); } 

If you look at the “numbers.bin” output file with an editor that can display the contents of a file in hexadecimal, you will see four, four-byte ints there containing the values written. Later in this chapter, we’ll develop the code to dump the contents of a file that way.

Instead of a DataOutputStream or a PrintStream, you can layer an ObjectOutputStream and write Java objects from your program out to disk or across the net on a socket. There’s a longer explanation of object I/O in the next chapter.

Output Stream Wrappers

At the beginning of the chapter, we saw how a Writer could have a BufferedWriter and/or a subclass of FilterWriter interposed between the FileWriter (or other destination) and the PrintWriter. Output Streams can be wrapped in the same way to provide more functionality. You can wrap any or all of these output streams onto your original OutputStream:

  • BufferedOutputStream

  • Your subclass of FilterOutputStream

  • OutputStreamWriter

  • java.util.zip.ZipOutputStream

  • java.util.zip.GZIPOutputStream

  • java.util.jar.JarOutputStream

  • javax.crypto.CipherOutputStream

  • java.io.ObjectOutputStream

  • various others in the release, and which you write yourself.

The OutputStreamWriter class converts an OutputStream class to a Writer class, allowing you to layer any of the Writer classes on top of that. It provides a bridge from the 8-bit byte world to the 16-bit character world. The main motivation for doing so is that you can also specify the character set when you construct an OutputStreamWriter. Please review the next chapter for more information on character sets. It’s not something frequently used.
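
A short sketch of the bridge in action: a PrintWriter is layered on a byte stream by way of an OutputStreamWriter, and the charset name decides how many bytes each character becomes. The class and method names are assumptions for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;

class BridgeDemo {
    // Layers a PrintWriter on a byte stream through an OutputStreamWriter
    // and returns the bytes produced for the given text and charset name.
    static byte[] encode(String text, String charsetName) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            PrintWriter pw = new PrintWriter(
                                 new OutputStreamWriter(bytes, charsetName));
            pw.print(text);
            pw.close();               // flushes the encoder into the byte stream
            return bytes.toByteArray();
        } catch (IOException e) {     // UnsupportedEncodingException is an IOException
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9";
        System.out.println(encode(eAcute, "ISO-8859-1").length);  // 1
        System.out.println(encode(eAcute, "UTF-8").length);       // 2
        System.out.println(encode(eAcute, "UTF-16BE").length);    // 2
    }
}
```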

The CipherOutputStream will encrypt the stream that it gets and write the encrypted bytes. You have to set it up with a Cipher object (and a key). There is more detail in the online API documentation.

The zip, gzip, and jar output streams will compress the bytes written into them using the zip, gzip, and zip algorithms, respectively. Jar format is identical to zip format, but with the addition of a manifest file listing the names of other files in the archive. An example of writing an archive of several files in Zip format is shown in the next section.

The ObjectOutputStream class allows you to save an object and all the objects it references. You can wrap this class around any of the other output streams and send the object to a file, to a socket, down a pipe, etc. The next chapter shows an example of ObjectOutputStream in use.

Example of Outputting a Zip File

Zip is a multifile archive format popularized by the PC but available on almost all systems now. The zip format offers two principal benefits: it can bundle several files into one file, and it can compress the data as it writes it to the zip archive. It’s more convenient to pass around one file than twenty separate files. Compressed files are faster and cheaper to download or e-mail than their uncompressed versions. Java Archives (.jar) files are in zip format.

Support for zip and gzip files was introduced with JDK 1.1. GZIP, an alternative to ZIP widely used on Unix, uses a different format for the data and can only hold one file (not a series of them).

Java has classes that will compress and expand files into either the gzip or the zip format. If you wrote a file out in zip format, you have to read it back in that way too. The same holds for gzip format. The formats are not interchangeable. If you have a choice, opt for zip over gzip because it does more and is much more widely used.
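
For the gzip side, a round trip might look like the sketch below. It compresses a block of repetitive text with GZIPOutputStream, expands it again with GZIPInputStream, and compares sizes. Everything happens in byte arrays, and the helper names are invented for the illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipRoundTrip {
    // Some highly repetitive sample text, which should compress well.
    static byte[] sample() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) sb.append("101 Dalmatians\n");
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    static byte[] gzip(byte[] input) {
        try {
            ByteArrayOutputStream compressed = new ByteArrayOutputStream();
            GZIPOutputStream gz = new GZIPOutputStream(compressed);
            gz.write(input);
            gz.close();               // writes the gzip trailer
            return compressed.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static byte[] gunzip(byte[] compressed) {
        try {
            GZIPInputStream gz =
                new GZIPInputStream(new ByteArrayInputStream(compressed));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            int n;
            while ((n = gz.read(buf)) != -1) {   // -1 marks the end of the data
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] original = sample();
        byte[] packed = gzip(original);
        System.out.println(original.length + " bytes in, " + packed.length + " bytes gzipped");
        System.out.println("restored: " + (gunzip(packed).length == original.length));
    }
}
```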

Files aren’t the only possible destination for zip streams (or any output, compressed or otherwise). You can equally send streams through a socket to another computer across the Internet, put them in a String or byte array for later retrieval, or send them through a pipe to another thread. The following is an example program showing how three files can be put into a zip archive. After running this program, compare the size of the zip archive with the sum of the sizes of the three files. Text strings compress well, binary data less so.

Writing a Zip Archive

import java.io.*; 
import java.util.zip.*; 
public class Example4 {

      // writing a zip archive 
      static ZipOutputStream myZOS; 

      public static void main(String args[]) throws IOException {
               myZOS = new ZipOutputStream (
                                        new BufferedOutputStream (
                                                new FileOutputStream("code.zip") ) ); 
               writeOneFile("Example1.java"); 
               writeOneFile("Example2.java"); 
               writeOneFile("Example3.java"); 
               myZOS.close(); 
      } 

      static void writeOneFile(String name) throws IOException {
               ZipEntry myZE = new ZipEntry(name); 
               myZOS.putNextEntry(myZE); 

               BufferedReader myBR = new BufferedReader(
                                         new FileReader(name) ); 
               int c; 
               while((c = myBR.read()) != -1)     // read a char until EOF 
                        myZOS.write(c);                  // write the char we just read 
               myBR.close(); 
      } 
} 

Executing this program will create a zip archive called code.zip. Each file in a zip archive is represented by an object called a ZipEntry. You can unpack the archive and recover the original source files with a ZipInputStream, or use any of the standard Zip tools like WinZip or Java’s jar command.
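
Reading an archive back follows the same pattern in reverse. The sketch below is an assumption-laden illustration rather than part of the example program: it builds a small archive in a byte array (so it does not depend on code.zip existing) and then walks it with a ZipInputStream, collecting the entry names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipListing {
    // Builds a small archive in memory: one entry per name, each holding the given text.
    static byte[] makeArchive(String[] names, String text) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ZipOutputStream zos = new ZipOutputStream(bytes);
            for (String name : names) {
                zos.putNextEntry(new ZipEntry(name));
                zos.write(text.getBytes(StandardCharsets.ISO_8859_1));
                zos.closeEntry();
            }
            zos.close();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Walks the archive with a ZipInputStream and collects the entry names.
    static List<String> entryNames(byte[] archive) {
        try {
            List<String> names = new ArrayList<String>();
            ZipInputStream zis =
                new ZipInputStream(new ByteArrayInputStream(archive));
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {   // null means no more entries
                names.add(entry.getName());
            }
            zis.close();
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] archive = makeArchive(new String[] { "a.txt", "b.txt" }, "hello");
        System.out.println(entryNames(archive));   // [a.txt, b.txt]
    }
}
```

To recover the contents as well as the names, you would call read() on the ZipInputStream after each getNextEntry(); it returns -1 at the end of the current entry.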

Summary of Output Streams

  • Use an Output Stream when you want to output ASCII or binary values

  • Choose a FileOutputStream (or one of the other destination classes) or one of the getOutputStream() methods, depending on where you want the bytes to go

  • You can optionally wrap that in an arbitrary number of OutputStream filters, buffers, compressors, encoders, etc.

  • Then wrap a DataOutputStream on top, and use its write methods to output numbers in binary.

A very common mistake in Java is to use binary I/O where Unicode or ASCII I/O was intended. The numeric values transferred will not usually be human readable, and you’ll get a different length of data and a different value of data than you were expecting. The character values transferred will be fine because an ASCII character has the same bit representation whether it was written using an ASCII method or a binary method.

Let’s finish this section by looking at how Java copes with platform differences in I/O and data. Table 13-6 shows some I/O-related platform differences. The rightmost column shows the approach Java takes to minimize these differences.

Table 13-6. Platform Differences in I/O

| | MS Windows | Unix | Apple Mac | Java Feature |
|---|---|---|---|---|
| end of line characters | \r\n | \n | \r | System.getProperty("line.separator") |
| filename separator | '\' | '/' | ':' | java.io.File.separator |
| pathnames | volume:\a\b\c\d or \\host\share\a\b\c\d | /a/b/c/d | volume:a:b:c:d | pass pathname to program as an argument |
| data byte order | little-endian | varies with hardware | big-endian | big-endian, see text in next chapter |

The end-of-line sequence is different on different platforms. Most of the time it doesn’t matter. You can write out a “\n” character and platforms will interpret it correctly. If you have some legacy code that requires the actual end-of-line sequence, you can obtain it with the following method call:

String actualEOL = java.lang.System.getProperty("line.separator"); 

That statement will put the EOL sequence used by this platform into the variable. The println() methods of PrintWriter and PrintStream also output the EOL sequence of the specific platform. The I/O API often allows a file to be identified in two parts: the directory it is in, and the filename. That allows you to split off the platform-sensitive directory pathname from the comparatively portable filename string.
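
Both ideas, the platform EOL and the two-part file name, fit in a few lines. In this sketch the directory and file names are just placeholders, and the join() helper is invented for the example.

```java
import java.io.File;

class PortablePath {
    // Joins a directory and a filename with the two-part File constructor,
    // so the code never hard-codes a separator character.
    static String join(String directory, String filename) {
        return new File(directory, filename).getPath();
    }

    public static void main(String[] args) {
        String eol = System.getProperty("line.separator");
        System.out.println("EOL is " + eol.length() + " char(s) long");
        System.out.println("separator is " + File.separator);
        System.out.println(join("jj4", "dogs.txt"));
    }
}
```

The output differs from platform to platform, which is the point: the program text stays the same.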

Input

The classes to do input are mostly the flip side of the output classes we have already seen. Java programs access external data by instantiating a stream on the data source. Each place from which an input stream can flow has a class dedicated to getting that kind of input. Input is read from a stream of data representing the file, pipe, socket, memory array, or whatever. If you want to read 16-bit characters, you use a Reader class. If you want to read binary bytes or ASCII, you use an input stream.

Inputting Double Byte Characters

As usual, first decide between binary and character I/O, then choose your class based on where the data is coming from. For reading double-byte character data, you will use one of the Reader classes shown in Table 13-7. Note the symmetry with the Writer classes.

Table 13-7. Choose the Reader Class Based on Where the Input Comes From

| Get Input From: | java.io Class | Constructors |
|---|---|---|
| A file. | FileReader | FileReader(java.lang.String) throws java.io.FileNotFoundException; |
| | | FileReader(java.io.File) throws java.io.FileNotFoundException; |
| | | FileReader(java.io.FileDescriptor); |
| A char array in your program. You read from the array passed to the constructor. | CharArrayReader | CharArrayReader(char[]); |
| | | CharArrayReader(char[],int from,int len); |
| A String in your program. You read from the String passed to the constructor. | StringReader | StringReader( String s ) |
| A pipe that is written by a PipedWriter in another thread. | PipedReader | PipedReader() |
| | | PipedReader(PipedWriter source) |

There are only four places from which you can read chars with a Reader, and FileReader is by far the most common. That class opens a connection onto a file. The constructors are shown in the table above. The constructor takes an argument that is the String pathname to the file, or a File object or FileDescriptor object.

Basic Reader Methods

All Readers give you at least these three somewhat basic input methods:

public int read() 
public int read(char[] cbuf) 
public int read(char[] cbuf, int from, int len) 

These read into, respectively, a single character, an array of characters, and a range in an array of characters. The call will not return until some data is available to read, although it won’t necessarily fill the array. The return value will be -1 if the Reader hits end of file (EOF). This is why the single char call returns a 32-bit int, even though it only reads a 16-bit character. The high-order 16 bits in the return value allow you to distinguish EOF from a character read. Those bits will be zero when a character is read, and 0xFFFF when EOF is reached. Test the return value for equality with -1 to see if you reached EOF.
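
The standard loop that goes with this convention is sketched below, using a StringReader as a stand-in for any Reader. The class name is made up. The input deliberately contains the char '\uffff' to show that even the largest 16-bit character cannot be mistaken for EOF once it is held in an int.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class ReadLoop {
    // Reads one char at a time until read() reports EOF with -1.
    static int countChars(Reader in) {
        try {
            int count = 0;
            int c;                          // must be int, not char, so -1 is representable
            while ((c = in.read()) != -1) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // '\uffff' comes back from read() as 65535, not as -1
        System.out.println(countChars(new StringReader("ab\uffffcd")));   // 5
    }
}
```

Had c been declared as a char, the comparison with -1 could never succeed and the loop would not terminate.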

An Input Problem Rears Its Ugly Head

At this point, from general symmetry, you are probably expecting to “wrap” another class on top of these Readers. That class will probably be called PrintReader, and it will have all the convenient methods for reading a String and returning a short, an int, a float, a boolean, etc. Bzzzt! Sorry, the design falls apart here. There is no such class as PrintReader. Not only that, there is no class that can give you the desired feature of being able to read back in exactly the same number values as were output using PrintWriter. The problem is an algorithmic one. Numbers in printed format vary in length, so your code has no way of telling where one number ends and the immediately adjacent one starts. For example, if you use PrintWriter to print two ints like this into a file:

myPW.print(293); 
myPW.print(19); 

The file will contain the string “29319”, but if you try to read those numbers back in from that file, because of the variable length, there is no way of telling where one int ends and the next one starts. No program in any language can deduce whether the ints were originally 2 and 9319, or 29 and 319, or 293 and 19, or 2931 and 9. This problem does not arise with binary output, because all the binary types (short, int, float, etc.) have a fixed known size. This “where does a string of digits end?” problem is the reason that the XML language (see Chapter 28 on XML) always marks the end of a field with a closing tag.

The best you can do when reading printed numbers is to read characters until you hit something that can’t be part of a number (e.g., a space), then assemble a number out of the characters that preceded it. It’s common to break up such files with end-of-line sequences, and it’s convenient to be able to read a line at a time and tokenize (bundle together the groups of) the characters on that line.

You really need something more than reading individual characters though, so a kludgey little hack has been devised. A readLine() method has been placed in class BufferedReader. You can wrap a BufferedReader around a FileReader, which you probably want to do anyway for the performance, and then read lines from it. Each line comes into your program as a String with the end-of-line sequence removed. You can then tokenize the line however you like to recover individual values. Here’s an example of reading a line that way.

import java.io.*; 
public class Example4 {
       public static void main(String args[]) {
                try {
                          FileReader myFR = new FileReader( "\\jj4\\dogs.txt" ); 
                          BufferedReader myBR = new BufferedReader(myFR); 

                          String in = myBR.readLine(); 
                          System.out.println(in); 
                          myBR.close(); 
                } catch(IOException x) { System.err.println("IOExcpn: " + x); } 
       } 
} 

Another approach is to write all numeric strings into fixed length fields, say, 20 characters long. Then you can always read in fixed length input. Another approach is to avoid reading/writing printable data; instead, do it all with binary byte streams. Input is certainly messier than output.
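
The “read a line, then tokenize it” approach might be sketched like this, assuming the numbers were written with spaces or line ends between them. A StringReader stands in for the FileReader so the sketch is self-contained; the class and method names are illustrative.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class NumberLines {
    // Reads every line, splits it on whitespace, and parses each token as an int.
    static List<Integer> readInts(Reader source) {
        try {
            List<Integer> values = new ArrayList<Integer>();
            BufferedReader br = new BufferedReader(source);
            String line;
            while ((line = br.readLine()) != null) {   // null signals EOF
                StringTokenizer tokens = new StringTokenizer(line);
                while (tokens.hasMoreTokens()) {
                    values.add(Integer.valueOf(tokens.nextToken()));
                }
            }
            return values;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The separators are what make 293 and 19 recoverable.
        System.out.println(readInts(new StringReader("293 19\n67\n")));   // [293, 19, 67]
    }
}
```

A token that is not a valid integer makes Integer.valueOf() throw NumberFormatException, so real code would want to decide how to handle malformed input.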

Reader Wrappers

There is the usual variety of wrapper classes that can wrap a Reader, shown in Figure 13-4.

Figure 13-4. Wrapping the Reader classes.

Classes That Wrap Readers

The classes that wrap a Reader are:

  • BufferedReader. This class can provide a performance boost, and also has a readLine() method. The BufferedReader needs to wrap the class that actually accesses the data (e.g., the FileReader or whatever). Other classes may be layered on top of the BufferedReader, too.

  • FilterReader. You subclass FilterReader, and your overriding methods allow you to see and modify individual characters as they come in—before the rest of your program sees them.

  • LineNumberReader. This class keeps track of the line number count on this stream. You can find out the input line you are currently on by calling getLineNumber(). This class doesn’t really offer enough value to justify its existence. It was written to support the first Java compiler and included in the API for no good reason.

  • PushbackReader. This class maintains an internal buffer that allows characters to be “pushed back” into the stream after they have been read, allowing the next read to get them again. The default buffer size is one character, but there is a constructor that lets you specify a larger size. You might use this if you were assembling successive characters into a number and you come to a character that can’t be part of a number. You will push it back into the input stream so it can be ignored, but kept available for the next read attempt.
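
The number-assembling use of PushbackReader described in the last bullet could look something like the following. It is a sketch with invented names, handling only unsigned decimal digits.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class PushbackDemo {
    // Reads leading decimal digits as a number, pushing back the first non-digit.
    static int readNumber(PushbackReader in) throws IOException {
        int value = 0;
        int c;
        while ((c = in.read()) != -1) {
            if (!Character.isDigit((char) c)) {
                in.unread(c);      // not part of the number: leave it for the next read
                break;
            }
            value = value * 10 + Character.digit((char) c, 10);
        }
        return value;
    }

    // Returns the number and the character that follows it.
    static String demo(String input) {
        try {
            PushbackReader in = new PushbackReader(new StringReader(input));
            int number = readNumber(in);
            int next = in.read();  // sees the pushed-back char again
            return number + " then " + (char) next;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("293x19"));   // 293 then x
    }
}
```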

Inputting ASCII Characters and Binary Values

You choose an input stream when you want to bring bytes into your program. As with Readers, you decide where you want to read from, and choose one of the InputStream classes accordingly (see Table 13-8).

Table 13-8. Choose the Input Stream Class Based on the Source of the Input

| Read Binary Input From: | java.io Class | Constructors |
|---|---|---|
| A file | FileInputStream | public FileInputStream(java.lang.String) throws java.io.FileNotFoundException; |
| | | public FileInputStream(java.io.File) throws java.io.FileNotFoundException; |
| | | public FileInputStream(java.io.FileDescriptor); |
| A byte array in your program | ByteArrayInputStream | public ByteArrayInputStream(byte []); |
| | | public ByteArrayInputStream(byte [], int from, int len); |
| A pipe that is written by a PipedOutputStream in another thread | PipedInputStream | public PipedInputStream(java.io.PipedOutputStream) throws java.io.IOException; |
| | | public PipedInputStream(); |
| A String object | StringBufferInputStream | this class has been deprecated, don't use it. |

As before, you connect to socket and URL input streams using a method call, rather than a constructor. The method getInputStream() returns an input stream connected to the network resource, as shown in Table 13-9.

Table 13-9. Choose the getInputStream() Method Based on the Source of the Data

| Read Input From: | Class | Method in That Class to Get an Input Stream |
|---|---|---|
| A socket | java.net.Socket | public InputStream getInputStream() throws java.io.IOException; |
| A URL connection | java.net.URLConnection | public InputStream getInputStream() throws java.io.IOException; |

Using the constructors or method calls we can get an input stream that reads bytes from a file, a socket, a URL, a pipe, or a byte array. Once you have your input stream of whatever variety, it has certain basic methods available.

Basic InputStream Methods

All InputStreams give you at least these three somewhat basic input methods:

public int read() 
public int read(byte[] b) 
public int read(byte[] b, int from, int len) 

These read into, respectively, a single byte, an array of bytes, and a range in an array of bytes. The call will not return until some data is available to read, although it won’t necessarily fill the array. The return value will be -1 if the InputStream hits end of file (EOF). This is why the single byte call returns a 32-bit int, even though it only reads an 8-bit byte. The high-order 24 bits in the return value allow you to distinguish EOF from a byte read. Those bits will be zero when a data byte is read and 0xFFFFFF when EOF is reached.
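
The byte value 0xFF is the interesting case, because as a signed Java byte it is -1. The sketch below (names invented for the example) shows that read() hands it back as 255, keeping -1 free to mean EOF.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ByteEof {
    // Returns every value read() produces for the data, including the final -1.
    static int[] readAll(byte[] data) {
        InputStream in = new ByteArrayInputStream(data);
        int[] results = new int[data.length + 1];
        try {
            for (int i = 0; i < results.length; i++) {
                results[i] = in.read();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return results;
    }

    public static void main(String[] args) {
        byte[] data = { (byte) 0xFF, 0x41 };
        for (int value : readAll(data)) {
            System.out.println(value);      // 255, then 65, then -1
        }
    }
}
```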

The ByteArrayInputStream allows you to take data from a byte array in your program using the read() methods rather than array indexing. There is a new package in JDK 1.4 called java.nio. This package supports methods that write out an array full of data to a stream with one statement. (See Chapter 14 for an example.)

All java.io classes with “InputStream” in their name have a few other methods, too, like close(), available(), and skip(). They are methods promised by the abstract class InputStream. To harp once more on the theme that is stressed throughout this chapter: you use InputStreams for reading bytes or binary values, not Unicode characters or objects.

If those three basic read() methods aren’t enough, you will wrap another class around your Input Stream. The most common wrapper is a DataInputStream class to read binary bytes. Anything written with a DataOutputStream can be read back in by a DataInputStream. The DataInputStream class has these methods for binary input.

Methods of java.io.DataInputStream

 public class java.io.DataInputStream 
         extends java.io.FilterInputStream implements java.io.DataInput {
    public DataInputStream(java.io.InputStream); 
    public final int read(byte[]) throws java.io.IOException; 
    public final int read(byte[], int, int) throws java.io.IOException; 
    public final void readFully(byte[]) throws java.io.IOException; 
    public final void readFully(byte[], int, int) throws java.io.IOException; 
    public final int skipBytes(int) throws java.io.IOException; 
    public final boolean readBoolean() throws java.io.IOException; 
    public final byte readByte() throws java.io.IOException; 
    public final int readUnsignedByte() throws java.io.IOException; 
    public final short readShort() throws java.io.IOException; 
    public final int readUnsignedShort() throws java.io.IOException; 
    public final char readChar() throws java.io.IOException; 
    public final int readInt() throws java.io.IOException; 
    public final long readLong() throws java.io.IOException; 
    public final float readFloat() throws java.io.IOException; 
    public final double readDouble() throws java.io.IOException; 
    public final String readLine() throws java.io.IOException;  // deprecated 
    public final String readUTF() throws java.io.IOException; 
    public static final String readUTF(java.io.DataInput) throws java.io.IOException; 
} 

As you can see, there is a read method for all primitive types, e.g., readInt() to read an int. A line of 8-bit bytes can be read into a String using the readLine() method, but this has been deprecated because it lacks proper byte-to-double-byte conversion. You can also read a string that was written in the UTF-encoded format where characters are 1-3 bytes in length (readUTF() method). The UTF format string is preceded by a 16-bit length field, allowing your code to scoop up the right amount of data. Notice that, although there is a DataOutputStream.writeChars(), there is no DataInputStream.readChars(). You have to read chars one at a time, and you decide when you have read the entire string.
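
A round trip shows how the write and read methods pair up. This sketch writes an int, a double, and a UTF string into a byte array and reads them back; the class name and the sample values are arbitrary.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class RoundTrip {
    // Writes three values in binary, then reads them back in the same order.
    static String roundTrip(int i, double d, String s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream dos = new DataOutputStream(bytes);
            dos.writeInt(i);
            dos.writeDouble(d);
            dos.writeUTF(s);
            dos.close();

            DataInputStream dis = new DataInputStream(
                                      new ByteArrayInputStream(bytes.toByteArray()));
            // The reads must match the writes in both order and type.
            int i2 = dis.readInt();
            double d2 = dis.readDouble();
            String s2 = dis.readUTF();
            return i2 + " " + d2 + " " + s2;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(29019, 2.5, "dogs"));   // 29019 2.5 dogs
    }
}
```

Nothing in the byte stream records which type was written where, so the reader has to know the layout in advance.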

A Word About IOExceptions

Let’s say a few words on the subject of IOExceptions. If you look at the runtime source, you’ll see that there are a dozen I/O-related exceptions. The most common I/O-related exceptions are:

  • FileNotFoundException

  • EOFException

  • InterruptedIOException

  • UTFDataFormatException

These are all subclasses of IOException. InterruptedIOException was supposed to be raised when you called the interrupt method of a thread that was blocked on I/O. It didn’t work very well, and we’ll look at the replacement in the next chapter.

The name EOFException suggests that it is thrown whenever EOF (end of file) is encountered, and that therefore this exception might be used as the condition to break out of a loop. Unhappily, it can’t always be used that way. EOFException is raised in only three classes: DataInputStream, ObjectInputStream, and RandomAccessFile (and their subclasses, of course). The EOFException would be better named UnexpectedEOFException, as it is only raised when the programmer has asked for a fixed definite amount of input, and the end of file is reached before all the requested amount of data has been obtained.

EOFException is not a universal alert that the normal end of file has been reached. In FileInputStream or FileReader, you detect EOF by checking for the -1 return value from a read, not by trying to catch EOFException. So if you want to use an EOFException to terminate a loop, make sure that the methods you are using will throw you one. The compiler will generally warn you if you try to catch an exception that is not thrown.

FileNotFoundException is self-explanatory. UTFDataFormatException is thrown when the I/O library finds an inconsistency in reading some UTF data.
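The two ways of detecting end of file can be put side by side in a short sketch. The class name EofDemo and the ten-byte test input are assumptions chosen for the illustration. The first loop uses the -1 return value of read(); the second relies on EOFException, which is only appropriate because DataInputStream.readInt() asks for a definite four bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class EofDemo {

    // Plain read(): the end of input is signalled by a return value of -1.
    static int countBytes(InputStream in) throws IOException {
        int count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count;
    }

    // DataInputStream.readInt() wants exactly four bytes, so running out of
    // input part-way through is reported by throwing EOFException.
    static int countInts(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int count = 0;
        try {
            while (true) {
                data.readInt();
                count++;
            }
        } catch (EOFException e) {
            // expected here: there are no more complete ints
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        byte[] ten = new byte[10];
        System.out.println(countBytes(new ByteArrayInputStream(ten))); // prints 10
        System.out.println(countInts(new ByteArrayInputStream(ten)));  // prints 2
    }
}
```

Ten bytes hold two whole ints plus two left-over bytes, and it is the attempt to read a third int from those two bytes that raises the exception.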

Example

Here is some sample code to read a file and dump its contents in hexadecimal. The read method of FileInputStream returns -1 when it reaches EOF, so we use that to terminate the “get more input” loop.

// This program hex dumps the contents of the file
// whose name is given as a command-line argument.
import java.io.*;
public class Dump {

    static FileInputStream myFIS = null;
    static FileOutputStream myFOS = null;
    static BufferedInputStream myBIS = null;
    static PrintStream myPOS = null;

    static public void main(String[] arg) {
        if (arg.length == 0) {
            System.out.println("usage: java Dump somefile");
            System.exit(0);
        }
        PrintStream e = System.err;
        try {
            myFIS = new FileInputStream( arg[0] );
            myFOS = new FileOutputStream( arg[0] + ".hex" );

            myBIS = new BufferedInputStream(myFIS);
            // the "true" says we want writes flushed to disk with each newline
            myPOS = new PrintStream(
                        new BufferedOutputStream(myFOS), true);
            myPOS.print("Hex dump of file " + arg[0]);
            int i;
            while ( (i = myBIS.read()) != -1 ) {
                dump( (byte) i );
            }
            // flush the final partial line and release both files
            myPOS.close();
            myBIS.close();
        } catch(IOException x) {
            e.println("Exception: " + x.getMessage() );
        }
    }

    static private long byteCount = 0;

    static private void dump(byte b) {
        if (byteCount % 16 == 0) {
            // output newline and the address every 16 bytes
            myPOS.println();
            // pad leading zeros in address.
            String addr = Long.toHexString(byteCount);
            while (addr.length() < 8) addr = "0" + addr;
            myPOS.print( addr + ":" );
        }

        // output a space every 4 bytes
        if (byteCount++ % 4 == 0) {
            myPOS.print("  ");
        }

        // dump the byte as 2 hex chars
        String s = Integer.toHexString( b & 0xFF );
        if (s.length() == 1) s = "0" + s;
        myPOS.print( s.charAt(0) );
        myPOS.print( s.charAt(1) + " " );
    }
}

If you look at the “numbers.txt” output file with an editor that can display the contents of a file in hexadecimal, you will see four four-byte ints there containing the values written. Here is the beginning of the output you get from running this program on its own class file (it will vary with different compilers):

Hex dump of file Dump.class 
00000000:  ca fe ba be   00 03 00 2d   00 8e 0a 00   2f 00 45 09 
00000010:  00 46 00 47   07 00 48 0a   00 03 00 49   09 00 2e 00 
00000020:  4a 07 00 4b   07 00 4c 0a   00 07 00 45   0a 00 07 00 
00000030:  4d 08 00 4e   0a 00 07 00   4f 0a 00 06 

The first int word of this class file is 0xCAFE 0xBABE. The first word of all Java class files is 0xCAFE 0xBABE. It is what is known as a “magic number”— a special value put at a special place in a file to confer the magic ability to distinguish this kind of file from any other. It allows Java tools like the JVM to check that they have been given a class file to execute against. If you are going to put a special value there, you might as well use one that is easy to recognize and remember!
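A tool can test for the magic number with a single readInt() call, since DataInputStream assembles the four bytes high byte first. The following is only a sketch of the idea; the class name MagicCheck and the helper looksLikeClassFile() are made up for this example, and a real JVM checks a great deal more than the first word.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

class MagicCheck {

    static final int CLASS_MAGIC = 0xCAFEBABE;

    // True if the first four bytes of the stream are ca fe ba be.
    static boolean looksLikeClassFile(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        try {
            return data.readInt() == CLASS_MAGIC;
        } catch (EOFException e) {
            return false;   // fewer than four bytes: not a class file
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java MagicCheck somefile");
            return;
        }
        InputStream in = new BufferedInputStream(new FileInputStream(args[0]));
        try {
            System.out.println(args[0]
                    + (looksLikeClassFile(in) ? " starts" : " does not start")
                    + " with the class file magic number");
        } finally {
            in.close();
        }
    }
}
```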

We wrap the FileInputStream in a BufferedInputStream as described in the next section. The program would work just as well without buffering, but may be slower for large input files. In JDK 1.4, the default size of a buffer in this class is 0.5KB. That’s much too small. For large inputs, you should do some measurements and think about moving this into the megabyte range.
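BufferedInputStream has a second constructor that takes the buffer size in bytes, so asking for a bigger buffer is a one-line change. Here is a rough sketch; the class name BigBufferDemo and the 1MB figure are assumptions for the illustration, a starting point for your own measurements rather than a recommendation.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

class BigBufferDemo {

    // Assumed starting point only; measure before settling on a size.
    static final int BUFFER_SIZE = 1024 * 1024;

    // Count the bytes in a file, reading through a buffer of the given size.
    static long countBytes(File f, int bufferSize) throws IOException {
        InputStream in =
                new BufferedInputStream(new FileInputStream(f), bufferSize);
        try {
            long n = 0;
            while (in.read() != -1) {
                n++;
            }
            return n;
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java BigBufferDemo somefile");
            return;
        }
        long start = System.currentTimeMillis();
        long n = countBytes(new File(args[0]), BUFFER_SIZE);
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(n + " bytes read in " + elapsed + " ms");
    }
}
```

Run it a few times on a large file with different values of BUFFER_SIZE and compare the timings before you trust any one number.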

Input Stream Wrappers

We have seen several times how a basic I/O class can be wrapped or “decorated” by another I/O class of the same parent class. So it should be no surprise that an InputStream can have a BufferedInputStream and/or a subclass of FilterInputStream interposed between the FileInputStream (or other data source) and the DataInputStream.

There are quite a variety of InputStreams that can decorate the basic access classes. Figure 13-5 shows some, but by no means all, of the most popular classes. You can wrap any or all of the following input streams onto your original InputStream:

Figure 13-5. Classes that wrap Input Streams.

  • BufferedInputStream. This class must directly wrap the input source (e.g., the FileInputStream) to get the most performance benefit. You want the buffering to start as early as possible. Wrap any other classes around the buffered input stream.

  • Your subclass of FilterInputStream. You will extend the class and override some or all of the read methods to filter the data on the way in.

  • LineNumberInputStream. This class keeps track of the number of newlines it has seen in the input stream.

  • PushbackInputStream. This class allows an arbitrary amount of data to be “pushed back” or returned to the input stream where it is available for rereading. You might do this when you are trying to assemble a number out of digits in the input stream and you read past the end of the number.

  • SequenceInputStream. This class provides the effect of gluing several input streams together, one after the other, so that as one stream is exhausted you seamlessly start reading from the next. You might use this when your data is spread across several data files with a similar format.

  • InputStreamReader. This class converts an InputStream class to a Reader class, allowing you to layer any of the Reader classes on top of that. It provides a bridge from the 8-bit byte world to the 16-bit character world when you have an input stream and want a Reader. Remembering that the Reader methods are poor at processing anything with more structure than a character, the most common reason for going from an input stream to a Reader is to change the character set encoding—to convert from, e.g., ASCII to EBCDIC.

  • java.util.zip.GZIPInputStream. The zip, gzip, and jar input streams will uncompress the bytes read from them, using the zip, gzip, and zip algorithms, respectively. An example of reading and expanding a file in GZip format is shown in the next section.

  • various others in the release, and which you write yourself. For example, there is a CipherInputStream that will decrypt what is given to it. This is part of the javax.crypto extension library, which you have to set up with a Cipher object. There is more detail in the online API documentation.

At the very end of the chain of wrapped classes you generally have either a DataInputStream or an ObjectInputStream. The ObjectInputStream class allows you to read back in an object and all the objects it references. You can wrap this class around any of the other input streams, and read the object from a file, a socket, up from a pipe, etc. An example of object I/O is shown in the next chapter.
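To make the wrapping concrete, here is a small sketch that layers two of the classes just described. A SequenceInputStream glues two sources together, and a PushbackInputStream on top lets the number reader return the one byte it read too far. The class name WrapperDemo, the readNumber() helper, and the in-memory test data are all assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.SequenceInputStream;

class WrapperDemo {

    // Read a run of decimal digits. The first non-digit is pushed back so
    // that whoever reads next still sees it.
    static int readNumber(PushbackInputStream in) throws IOException {
        int value = 0;
        int c;
        while ((c = in.read()) != -1) {
            if (c < '0' || c > '9') {
                in.unread(c);   // we read one byte past the number: return it
                break;
            }
            value = value * 10 + (c - '0');
        }
        return value;
    }

    public static void main(String[] args) throws IOException {
        // Two separate sources, read as if they were one stream.
        InputStream first = new ByteArrayInputStream("12".getBytes("US-ASCII"));
        InputStream second = new ByteArrayInputStream("34+5".getBytes("US-ASCII"));
        PushbackInputStream in =
                new PushbackInputStream(new SequenceInputStream(first, second));

        System.out.println(readNumber(in));     // prints 1234
        System.out.println((char) in.read());   // prints +
        System.out.println(readNumber(in));     // prints 5
    }
}
```

The number 1234 straddles the boundary between the two sources, and the reader never notices.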

GZIP Files and Streams

The word “GZip” means GNU Zip. The GNU organization (a loose organization of expert programmers founded at MIT by Richard Stallman) has specified a simpler variant of a ZIP format that has become popular on Unix. It compresses its input by using the patent-free Lempel-Ziv coding. Gzip compressed format can only hold a single file, not an archive or directory of files, as with PK-ZIP. If you have several files, you must use the Unix “tar” utility to bundle them up into a single file first, then use the GZip Unix utility to compress that one file. Unpacking is the reverse of this. Here’s the simplest example code to unpack a GZip file.

// Expand a .gz file into uncompressed form
// Peter van der Linden, August 2001

import java.io.*;
import java.util.zip.*;
public class expandgz {

   public static void main (String args[]) throws Exception {
         if (args.length == 0) {
                System.out.println("usage:  java expandgz  filename.gz");
                System.exit(0);
         }
         GZIPInputStream gzi = new GZIPInputStream(
                                                new FileInputStream( args[0] ));
         // the output file name is the input name minus its last extension
         int to = args[0].lastIndexOf('.');
         if (to == -1) {
                 System.out.println("usage:  java expandgz  filename.gz");
                 System.exit(0);
         }
         String fout = args[0].substring(0, to);
         BufferedOutputStream bos = new BufferedOutputStream(
                                                new FileOutputStream(fout) );
         System.out.println("writing " + fout);

         // copy the expanded bytes until read() reports end of file
         int b;
         do {
              b = gzi.read();
              if (b == -1) break;
              bos.write(b);
         } while (true);
         gzi.close();
         bos.close();
   }
}

Executing this program will expand the gzip file whose name (e.g., abc.gz) is given on the command line. There is a corresponding GZIPOutputStream class that you can use to write a file into compressed gzip form. It would have been better for everyone if the GNU folks had not invented a new format, but just reverse-engineered zip. The Unix world wasn’t paying enough attention to the PC world back in those days.
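Going in the other direction looks much the same. This sketch compresses a byte array with GZIPOutputStream and expands it again with GZIPInputStream, all in memory so that it is easy to try. The class name GzipRoundTrip and its two helper methods are assumptions for the illustration; to work on files you would substitute FileOutputStream and FileInputStream for the byte array streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipRoundTrip {

    // Compress a block of bytes into gzip format.
    static byte[] compress(byte[] plain) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        GZIPOutputStream gzo = new GZIPOutputStream(sink);
        gzo.write(plain);
        gzo.close();   // close() writes the gzip trailer, so don't skip it
        return sink.toByteArray();
    }

    // Expand gzip format back into the original bytes.
    static byte[] expand(byte[] packed) throws IOException {
        GZIPInputStream gzi =
                new GZIPInputStream(new ByteArrayInputStream(packed));
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = gzi.read(buf)) != -1) {
            sink.write(buf, 0, n);
        }
        gzi.close();
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] plain = new byte[10000];   // all zeros, so it compresses well
        byte[] packed = compress(plain);
        System.out.println(plain.length + " bytes packed into " + packed.length);
        System.out.println("round trip ok: "
                + Arrays.equals(plain, expand(packed)));
    }
}
```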

Suggested Use of Input Streams

  • Use an Input Stream when you want to input binary values or ASCII text.

  • Choose a FileInputStream or one of the getInputStream methods, depending on where you want the bytes to come from.

  • You can optionally wrap that in an arbitrary number of InputStream filters, buffers, expanders, decoders, etc. Then wrap a DataInputStream on top, and use its read methods to do the input. Use ObjectInputStream if you are reconstituting objects rather than reading data.

  • If you use a buffer, it should directly wrap the FileInputStream so that as much as possible of the “pipeline” of classes is buffered.
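Put together, these suggestions give a chain like the one in the following sketch: a FileInputStream as the source, a BufferedInputStream directly around it, and a DataInputStream on top to do the typed reads. The class name InputChainDemo, the helper methods, and the temporary file are assumptions made so the example can run on its own.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

class InputChainDemo {

    // Helper for the demo only: write some ints in DataOutputStream format.
    static void writeInts(File f, int[] values) throws IOException {
        DataOutputStream out = new DataOutputStream(new FileOutputStream(f));
        try {
            for (int k = 0; k < values.length; k++) {
                out.writeInt(values[k]);
            }
        } finally {
            out.close();
        }
    }

    // Read them back through the suggested chain of wrappers.
    static int[] readInts(File f, int count) throws IOException {
        DataInputStream in = new DataInputStream(   // typed reads on top
                new BufferedInputStream(            // buffer next to the source
                        new FileInputStream(f)));   // where the bytes come from
        try {
            int[] values = new int[count];
            for (int k = 0; k < count; k++) {
                values[k] = in.readInt();
            }
            return values;
        } finally {
            in.close();   // closing the outermost wrapper closes the chain
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("chain", ".bin");
        f.deleteOnExit();
        writeInts(f, new int[] {1, 2, 3, 4});
        int[] back = readInts(f, 4);
        for (int k = 0; k < back.length; k++) {
            System.out.println(back[k]);
        }
    }
}
```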

Further Reading

Here are some online resources for more information on other I/O packages:

Image I/O

java.sun.com/products/java-media/jai/whatis.html

industry.java.sun.com/javaone/99/event/0,1768,661,00.html

API docs at, e.g., file://c:/jdk1.4b/docs/api/index.html (click to package javax.imageio)

Speech

White paper at java.sun.com/marketing/collateral/speech.html Programmer’s guide at java.sun.com/products/java-media/speech/forDevelopers/jsapi-guide/Preface.html

Logging

Overview at java.sun.com/j2se/1.4/docs/guide/util/logging/overview.html

API docs at, e.g., file://c:/jdk1.4b/docs/api/index.html (click to package java.util.logging)

Communication ports

See the home page at java.sun.com/products/javacomm/

The home page has a pointer to a user guide.

Printing

Be careful not to mistakenly read information on the older print APIs.

java.sun.com/printing/

API User Guide:

java.sun.com/j2se/1.4/docs/guide/jps/spec/JPSTOC.fm.html

API docs at, e.g., file://c:/jdk1.4b/docs/api/index.html (click to package javax.print). The API docs contain a small printing example.

Exercises

  1. Measure the difference between buffered and non-buffered I/O operating with 10K 1-byte writes and one 10KB write, repeated 10,000 times in a loop. Draw a graph to illustrate your results. How do the results change with a buffer size of 128KB, 256KB, 512KB?

  2. Modify the program that does a hex dump of a file so that it also outputs any printable bytes in a set of columns to the right of the hex dump on each line. Print the character if it has a printable form, and print a “.” if it does not. This ensures that lines are the same length and columns line up.

  3. Write a Java program whose output at runtime is an exact duplicate of the program’s source code. The shortest Java program to do this is about a page of code.

  4. Write a program that prints a table of printable ISO 8859-1 characters and their bit patterns.

  5. Repeat the previous exercise using only the I/O class RandomAccessFile.

  6. Rewrite the hex dumper utility to use one or more Filter classes. The first filter can turn binary bytes into the equivalent printable hex characters. The second filter can insert the addresses and newlines at appropriate points.

  7. Rewrite the decss utility (see below) in Java. For extra credit, look up the algorithms on the web to actually carry out the decryption of an encoded DVD stream, and write Java code to do that. Does it run quickly enough to decode and play in real time? Explain why or why not.

Some Light Relief—The Illegal Prime Number!

By now everyone is familiar with DVDs—originally an acronym for “Digital Video Disc,” later changed to “Digital Versatile Disk” for pointless marketing reasons. DVDs are similar to CD-ROMs in many ways, with a crucial difference that DVDs can hold about 4.7GBytes, or about seven times as much data as a CD. The tracks and the bits in the tracks are packed closer together on a DVD, which is why DVD players can read CDs but not vice-versa. If you use a suitable compression technology, you can actually squeeze up to 133 minutes of high resolution video with several soundtracks and subtitles onto a DVD. The compression is essential, and the movie industry uses the MPEG-2 algorithm that was designed for this purpose, and which provides 40-1 compression. The more efficient MPEG-4 (DivX) compression, which provides another fivefold reduction, is also being introduced.

However, since the movie industry doesn’t want to be Napstered (have their content ripped off and broadcast for free on the Internet), they encrypt the MPEG-2 files using an algorithm called the Content Scrambling System or CSS. If you do a directory listing of a DVD, you’ll see some large .VOB files. These are Video OBjects, a fancy name for content scrambled .MPG2 files. Every maker of DVD players on the planet is supposed to license the decryption algorithm from the DVD Copy Control Association (DVD-CCA) for a fee, and they impose several restrictions on the player. DVD-CCA is believed to be a subsidiary of Matsushita, the company mainly responsible for the development of DVD and CSS. Some of its restrictions take away rights that consumers have long enjoyed under copyright law. They seem more geared towards controlling what consumers can do, rather than dealing with problems of rip-offs and piracy.

So what are the restrictions that licensed DVD players have to impose? CSS encryption allows the DVD industry to force region restrictions into all DVD players. There are six geographic regions (North America, Europe, etc.) and in 1999 they added a seventh for DVDs intended for airplanes. A player in region one will refuse to play disks labelled as belonging to any other region. Region restrictions allow the movie industry to sell the same DVD at different prices in different markets. It prevents any DVDs you buy on business trips outside your region from being played on your home system. The CSS encryption also prevents you from fast-forwarding past the copyright warning or advertisements or any other content the producer wants you to see. You can sell commercials for a much higher price if people cannot skip past them. Some people speculate that CSS is also paving the way for more restrictions such as DVDs with a limited lifetime or limited number of viewings. The movie industry blows a lot of smoke about CSS preventing large scale piracy, but CSS does nothing whatever to prevent pirates from copying DVDs. Its only effect is enforcing use limitations on end consumers.

For a long time, there was no software to play DVDs available for Linux. If you had a shelf full of DVDs that you had bought, you could play them all on your tv or Windows box, but because of the CSS restrictions, not on your Linux or Solaris system. The CSS restrictions were the equivalent of a book publisher enforcing a restriction that you could read a book under incandescent lighting but not under fluorescent lighting or daylight. No one in the Linux community had the means to pay the “CSS tax” to the DVD-CCA. Then, in October 1999, a nameless German hacker reverse-engineered CSS. The source code to decrypt DVDs was published on the web by a 15-year-old boy from Norway. The program was called “deCSS” because it reverses CSS, turning the encrypted files into ordinary MPEG-2 files.

There then followed an extraordinary game of “whack-a-mole” as the DVD-CCA and the Motion Picture Association of America (MPAA) tried to chase the source code around the web and sue it out of existence. That game continues today. As far as we know, American laws do not apply in Norway, but the 15-year-old boy was hauled off by the foolish Norwegian police who also seized his PC and his cell phone. The cell phone was a lucky guess on the part of the cops, because he did actually have a back-up copy of the source stored in it (cell phones these days are effectively quite powerful computers, and many cell phones contain a JVM).

The U.S. movie industry had the foresight (and the impudence) to get a law passed so that it is illegal to write, publish, possess, or run code like deCSS. The Digital Millennium Copyright Act (DMCA) made it illegal to circumvent a “technological protection measure” put in place by the copyright owner. That means the deCSS program is illegal. Write a program to play a DVD that you own, and you could go to jail! The DMCA is a poorly-constructed law, written by the movie industry to advance its own interests at the expense of consumers. It will eventually be replaced by something more sensible but it all takes time. This is not theoretical. A Russian software developer was arrested under the DMCA by FBI agents in Las Vegas in July 2001 one day after he publicly pointed out copyright protection weaknesses in Adobe software.

Hackers started to vie with each other for the most imaginative way to publish the deCSS code. America has very strong guarantees concerning freedom of speech, and there are long-standing precedents saying that printed text (even source code) counts as speech. Programmers embedded the code in JPEG files, put the algorithm in plain English, and one person even wrote the deCSS steps in the form of a haiku (Japanese poetry)! There is a whole gallery of these deCSS publications at www.cs.cmu.edu/∼dst/DeCSS/Gallery/ (assuming it hasn’t been sued off the net yet).

My absolute favorite deCSS code exists in the form of a prime number. Computer scientist and number theory fan Phil Carmody found a prime number which expresses the deCSS code! Phil felt strongly that the Motion Picture Association was acting in bad faith, and to oppose this he wanted to make sure that the deCSS code was archived somewhere beyond the reach of the law. Somewhere where the number would be allowed to be printed because it had some property that made it publishable, independent of whether it was “illegal” or not. Phil had done a lot of work with prime numbers and prime number proving. It can’t be illegal to possess a prime number, can it? Or can it? Basic common sense says no, but the DMCA says yes!

Phil took the deCSS source file, which contains about 100 lines of C code, and gzipped it to make it smaller. That resulted in a binary file about 600 bytes long. Then Phil considered the file as, not a 4-byte integer or an 8-byte long integer, but a ∼600-byte binary super-long integer, and he looked for a small number he could append so that the whole thing would be a prime number. In character terms, say the code gzipped to the string “100,” Phil was looking for an odd number suffix like “9” that would make the whole string (in this case “1009”) a prime number. That kind of search is quick and easy to program.
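To show how little code that kind of search needs, here is a toy version that works in character terms, like the “100” example, rather than on the bytes of a gzip file as Phil did. It tries odd decimal suffixes in turn and stops at the first candidate that BigInteger reports as probably prime. The class name PrimeSuffix, the method firstPrimeWithSuffix(), and the certainty value of 50 are assumptions for this sketch.

```java
import java.math.BigInteger;

class PrimeSuffix {

    // Append the odd numbers 1, 3, 5, ... to the prefix in turn and return
    // the first result that passes a probabilistic primality test, or null
    // if none of the suffixes up to maxSuffix works.
    static BigInteger firstPrimeWithSuffix(String prefix, int maxSuffix) {
        for (int suffix = 1; suffix <= maxSuffix; suffix += 2) {
            BigInteger candidate = new BigInteger(prefix + suffix);
            if (candidate.isProbablePrime(50)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // 1001, 1003, 1005 and 1007 are all composite; 1009 is prime.
        System.out.println(firstPrimeWithSuffix("100", 99));   // prints 1009
    }
}
```

The same loop works unchanged on a prefix hundreds of digits long; each primality test just takes longer.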

Number theory told Phil that his chances were about 1 in 1,600 of finding a one or two byte suffix that would make the entire number prime. There wasn’t a one byte suffix, so he went on to look for a two byte suffix. Even though the chances were very slim, he found one! If he had not found one, he would have simply gone on to test longer suffixes and change variable names in the code until eventually a prime number was reached. The resulting prime number is shown below. It is 1,401 digits long.

Here’s a little Java program that takes a large number stored as a string (such as the one above, hint, hint), turns that string into a super-long binary integer, and then writes that out as a gzip file. You can easily write a Java program like the example shown earlier in this chapter to expand it into a C source code file. But remember, it is illegal to have or compile or run such a source code file under American law prevailing since 1998.

Converting a Number to Binary and Writing to a File

//   Convert a big number into binary and write it out
//   Peter van der Linden, June 2001

import java.io.*;
import java.math.*;
public class togz {

       // decimal digits of the number; the placeholder text must be replaced
       // by the remaining digits before the program will run
       static String illegalPrime =
"4856507896573978293098418946942861377074420873513579240196520736" +
"plug the rest of the number in here... ";

       static BigInteger b = new BigInteger(illegalPrime);
       static final BigInteger two_five_six = new BigInteger("256");

       // one byte per decimal digit is more room than we need
       static byte[] result = new byte[illegalPrime.length()];

       public static void main (String args[]) throws Exception {
                BigInteger d_r [];

                if (b.isProbablePrime(5))
                           System.out.println("b is probably prime (good)");
                else System.out.println("b is probably not prime (bad!)");

                // peel off base-256 digits, least significant first
                int i=0;
                do {
                           d_r = b.divideAndRemainder(two_five_six);
                           b = d_r[0];     // the multiple
                           result[i++] = (byte) d_r[1].intValue();  // the remainder
                } while ( b.compareTo( two_five_six ) >= 0);

                result[i] = (byte) b.intValue();   // the most significant byte

                System.out.println("writing bytes.gz");
                FileOutputStream fos = new FileOutputStream("bytes.gz");

                // write the bytes out most significant first
                DataOutputStream dos = new DataOutputStream(fos);
                for (int j=i; j>=0; j--) {
                           dos.writeByte( result[j] );
                }
                dos.close();
       }
}

Run winzip on the resulting .gz file to unpack it, and voila, you become eligible for a prison sentence of up to 20 years. What’s wrong with this picture? The CSS descrambler that you get is just three or four utility routines, not a main program. To play DVDs on your Linux, Mac, Solaris, or even Windows box, you’ll need to download the software from one of several open source DVD players. I like the one at www.videolan.org.

Alternatively, if you like putting things together by hand, you can do a websearch for “vobdec,” which is an open source utility that lets you discover the title keys on the encrypted DVDs that you own. If you have the title key you can run the efdtt utility (do another websearch) to turn the encrypted MPEG stream into a clear one and just point it at your favorite player.

Finally, you can try rewriting any of these in Java for fun. Just don’t blame me if the FBI knocks on your door with an MPAA warrant and seizes your debugger. Illegal code! The very idea! Next thing you know, they’ll be declaring t-shirts illegal.



[1] See The Cornell Commission on Morris and the worm; T. Eisenberg, D. Gries, J. Hartmanis, D. Holcomb, M. S. Lynn and T. Santoro; Commun. ACM vol. 32, 6 (June 1989), pp. 706–709.
