Most programs need to interact with the outside world, and one common way of doing so is by reading and writing files. Files are normally on some persistent medium such as a disk drive, and, for the most part, we shall happily ignore the differences between a hard disk (and all the operating system-dependent filesystem types), a floppy or zip drive, a CD-ROM, and others. For now, they’re just files.
Java’s approach to input/output is sufficiently different from that of older languages (C, Fortran, Pascal) that people coming from those languages are often critical of Java’s I/O model. I can offer no better defense than that provided in the preface to Elliotte Rusty Harold’s book Java I/O:
Java is the first programming language with a modern, object-oriented approach to input and output. Java’s I/O model is more powerful and more suited to real-world tasks than any other major language used today. Surprisingly, however, I/O in Java has a bad reputation. It is widely believed (falsely) that Java I/O can’t handle basic tasks that are easily accomplished in other languages like C, C++, and Pascal. In particular, it is commonly said that:
-- I/O is too complicated for introductory students; or, more specifically, there’s no good way to read a number from the console.
-- Java can’t handle basic formatting tasks like printing PI with three decimal digits of precision.
[Rusty’s book shows] that not only can Java handle these two tasks with relative ease and grace; it can do anything C and C++ can do, and a whole lot more. Java’s I/O capabilities not only match those of classic languages like C and Pascal, they vastly surpass them.
The most common complaint about Java I/O among students, teachers, authors of textbooks, and posters to comp.lang.java is that there’s no simple way to read a number from the console (System.in). Many otherwise excellent introductory Java books repeat this canard. Some textbooks go to great lengths to reproduce the behavior they’re accustomed to from C or Pascal, apparently so teachers don’t have to significantly rewrite the tired Pascal exercises they’ve been using for the last 20 years. However, new books that aren’t committed to the old ways of doing things generally use command-line interfaces for basic exercises, then rapidly introduce the graphical user interfaces any real [desktop] program is going to use anyway. Apple wisely abandoned the command-line interface back in 1984, and the rest of the world is slowly catching up. Although System.in and System.out are certainly convenient for teaching and debugging, in 1999 no completed, cross-platform program should even assume the existence of a console for either input or output.
The second common complaint about Java I/O is that it can’t handle formatted output; that is, that there’s no equivalent of printf() in Java. In a very narrow sense, this is true, because Java does not support the variable length arguments lists a function like printf() requires. Nonetheless, a number of misguided souls (your author not least among them) [has] at one time or another embarked on futile efforts to reproduce printf() in Java. This may have been necessary in Java 1.0, but as of Java 1.1, it’s no longer needed. The java.text package, described in Chapter 16 [of Rusty’s book, and in Chapter 5 of the present work], provides complete support for formatting numbers. Furthermore, the java.text package goes way beyond the limited capabilities of printf(). It supports not only different precisions and widths, but also internationalization, currency formats, grouping symbols, and a lot more. It can easily be extended to handle Roman numerals, scientific or exponential notation, or any other number format you may require.
The underlying flaw in most people’s analysis of Java I/O is that they’ve confused input and output with the formatting and interpreting of data. Java is the first major language to cleanly separate the classes that read and write bytes (primarily, various kinds of input streams and output streams) from the classes that interpret this data. You often need to format strings without necessarily writing them on the console. You may also need to write large chunks of data without worrying about what they represent. Traditional languages that connect formatting and interpretation of I/O and hard-wire a few specific formats are extremely difficult to extend to other formats. In essence, you have to give up and start from scratch every time you want to process a new format.
Furthermore, C’s printf(), fprintf(), and sprintf() family only really works well on Unix (where, not coincidentally, C was invented). On other platforms the underlying assumption that every target may be treated as a file fails, and these standard library functions must be replaced by other functions from the host API.
Java’s clean separation between formatting and I/O allows you to create new formatting classes without throwing away the I/O classes, and to write new I/O classes while still using the old formatting classes. Formatting and interpreting strings are fundamentally different operations from moving bytes from one device to another. Java is the first major language to recognize and take advantage of this.
To which I can only add, “Well said, Rusty.” What Rusty doesn’t mention is an obvious corollary of this flexibility: it can often take a bit more coding to do some of the command-line, standard-in/standard-out operations. You’ll see most of these in this chapter, and you’ll see throughout the book how flexible Java I/O really is.
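As a rough illustration of that "bit more coding," here is a minimal sketch of the two tasks Rusty mentions: reading a number and printing PI with three decimal digits. The class and method names (ConsoleNumberSketch, readInt, threeDecimals) are my own inventions for this example, and the demo feeds the reader from a string so that it runs without waiting for input; for real console input you would pass new InputStreamReader(System.in) instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical class name, used only for this sketch.
class ConsoleNumberSketch {

    /** Reads one line from the given Reader and parses it as an int. */
    static int readInt(Reader in) throws IOException {
        BufferedReader br = new BufferedReader(in);
        String line = br.readLine();
        if (line == null) {
            throw new IOException("No input available");
        }
        // parseInt throws NumberFormatException if the line is not a number
        return Integer.parseInt(line.trim());
    }

    /** Formats a value with exactly three decimal digits using java.text. */
    static String threeDecimals(double value) {
        // Pinning the symbols to Locale.US keeps '.' as the decimal separator
        DecimalFormat fmt =
            new DecimalFormat("0.000", new DecimalFormatSymbols(Locale.US));
        return fmt.format(value);
    }

    public static void main(String[] args) throws IOException {
        // Formatting is independent of where the text ends up
        System.out.println("PI is about " + threeDecimals(Math.PI)); // PI is about 3.142

        // Stand-in for the console; swap in new InputStreamReader(System.in)
        Reader source = new StringReader("42\n");
        int n = readInt(source);
        System.out.println("You entered " + n); // You entered 42
    }
}
```

Note that readInt() neither knows nor cares whether its Reader is connected to the console, a file, or a string, which is the separation of I/O from interpretation that the quotation describes.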
This chapter covers all the normal input/output operations such as opening/closing and reading/writing files. Files are assumed to reside on some kind of file store or permanent storage. I don’t discuss how such a filesystem or disk I/O system works -- consult a book on operating system design for the general details, or a platform-specific book on system internals or filesystem design for such details. Network filesystems such as Sun’s Network File System (NFS, common on Unix and available for Windows through products such as Hummingbird NFS), Macintosh Appletalk File System (available for Unix via NetATalk), and SMB (MS-Windows network filesystem, available for Unix with the freeware Samba program) are assumed to work “just like” disk filesystems, except where noted. And while you could even provide your own network filesystem layer using the material covered in Chapter 16, it is exceedingly difficult to design your own network virtual filesystem, and probably better to use one of the existing ones.
Java provides two sets of classes for reading and writing. The Stream section of package java.io (see Figure 9-1) is for reading or writing bytes of data. Older languages tended to assume that a byte (which is a machine-specific collection of bits, usually eight bits on modern computers) is exactly the same thing as a “character” -- a letter, digit, or other linguistic element. However, Java is designed to be used internationally, and eight bits is simply not enough to handle the many different character sets used around the world. Script-based languages like Arabic and Indian languages, and pictographic languages like Chinese, Japanese, and Korean each have many more than 256 characters, the maximum that can be represented in an eight-bit byte. The unification of these many character code sets is called, not surprisingly, Unicode. Actually, it’s not the first such unification, but it’s the most widely used standard at this time. Both Java and XML use Unicode as their character sets, allowing you to read and write text in any of these human languages. But you have to use Readers and Writers, not Streams, for textual data.
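To make the byte/character distinction concrete, here is a small sketch that writes a four-character string through a Writer and then counts what comes back, first as raw bytes from an InputStream and then as characters from a Reader. The class name BytesVersusChars and its helper methods are hypothetical, and the code assumes Java 7 or later (for StandardCharsets and try-with-resources) and picks UTF-8 as the external encoding purely for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical class name, used only for this sketch.
class BytesVersusChars {

    /** Writes text through a Writer and returns the bytes that reach the stream. */
    static byte[] encode(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the Writer flushes any buffered characters into the stream
        try (Writer w = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            w.write(text);
        }
        return bytes.toByteArray();
    }

    /** Counts bytes by reading the stream one byte at a time. */
    static int countBytes(InputStream in) throws IOException {
        int n = 0;
        while (in.read() != -1) {
            n++;
        }
        return n;
    }

    /** Counts characters by reading the Reader one char at a time. */
    static int countChars(Reader in) throws IOException {
        int n = 0;
        while (in.read() != -1) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        String text = "caf\u00e9"; // four characters; the last one is e-acute
        byte[] data = encode(text);

        int bytes = countBytes(new ByteArrayInputStream(data));
        int chars = countChars(new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8));

        System.out.println("bytes read from the stream: " + bytes); // 5
        System.out.println("chars read from the reader: " + chars); // 4
    }
}
```

The same data yields five bytes but four characters, because the accented letter occupies two bytes in UTF-8. Code that treats each byte as a character would miscount here.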
You see, Unicode itself doesn’t solve the entire problem. Many of these human languages were used on computers long before Unicode was invented, and they didn’t all pick the same representation as Unicode. And they all have zillions of files encoded in a particular representation that isn’t Unicode. So conversion routines are needed when reading and writing to convert between Unicode String objects used inside the Java machine and the particular external representation that a user’s files are written in. These converters are packaged inside a powerful set of classes called Readers and Writers. Readers/Writers are always used instead of InputStreams/OutputStreams when you want to deal with characters instead of bytes. We’ll see more on this conversion, and how to specify which conversion, a little later in this chapter.
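As a preview of what such a conversion can look like, the following sketch reads bytes in one external encoding with an InputStreamReader and writes them out in another with an OutputStreamWriter. The class name EncodingConversion and the method transcode are hypothetical, the choice of ISO-8859-1 and UTF-8 is only an example, and the code assumes Java 7 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical class name, used only for this sketch.
class EncodingConversion {

    /** Decodes input using one charset and re-encodes it using another. */
    static byte[] transcode(byte[] input, Charset from, Charset to)
            throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // The Reader converts external bytes to Unicode chars;
        // the Writer converts Unicode chars back to external bytes.
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(input), from);
             Writer w = new OutputStreamWriter(out, to)) {
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                w.write(buf, 0, n);
            }
        }
        // The Writer has been closed (and flushed) by this point
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // "cafe" with an e-acute, as it would be stored in an ISO-8859-1 file
        byte[] latin1 = { 0x63, 0x61, 0x66, (byte) 0xE9 };

        byte[] utf8 = transcode(latin1,
                StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 length: " + latin1.length); // 4
        System.out.println("UTF-8 length: " + utf8.length);        // 5
    }
}
```

Inside the loop the data exists only as Unicode chars; the two external representations appear only at the edges, where the Reader and Writer are constructed with an explicit Charset.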
One topic not addressed here is the issue of hardcopy printing. Java includes two similar schemes for printing onto paper, both using the same graphics model as is used in AWT, the basic Window System package. For this reason, I defer discussion of printing to Chapter 12.
Another topic not covered here is that of having the read or write occur concurrently with other program activity. This requires the use of threads, or multiple flows of control within a single program. Threaded I/O is a necessity in many programs: those reading from slow devices such as tape drives, those reading from or writing to network connections, and those with a GUI. For this reason the topic is given considerable attention, in the context of multi-threaded applications, in Chapter 24.