In this chapter, we cover the Java application programming interfaces (APIs) for input and output. You will learn how to access files and directories and how to read and write data in binary and text format. This chapter also shows you the object serialization mechanism that lets you store objects as easily as you can store text or numeric data. Next, we turn to several improvements that were made in the “new I/O” package java.nio
, introduced in Java SE 1.4. We finish the chapter with a discussion of regular expressions, even though they are not actually related to streams and files. We couldn’t find a better place to handle that topic, and apparently neither could the Java team—the regular expression API specification was attached to the specification request for the “new I/O” features of Java SE 1.4.
In the Java API, an object from which we can read a sequence of bytes is called an input stream. An object to which we can write a sequence of bytes is called an output stream. These sources and destinations of byte sequences can be—and often are—files, but they can also be network connections and even blocks of memory. The abstract classes InputStream
and OutputStream
form the basis for a hierarchy of input/output (I/O) classes.
Because byte-oriented streams are inconvenient for processing information stored in Unicode (recall that Unicode uses multiple bytes per character), there is a separate hierarchy of classes for processing Unicode characters that inherit from the abstract Reader
and Writer
classes. These classes have read and write operations that are based on two-byte Unicode code units rather than on single-byte characters.
The InputStream
class has an abstract method:
abstract int read()
This method reads one byte and returns the byte that was read, or −1 if it encounters the end of the input source. The designer of a concrete input stream class overrides this method to provide useful functionality. For example, in the FileInputStream
class, this method reads one byte from a file. System.in
is a predefined object of a subclass of InputStream
that allows you to read information from the keyboard.
The InputStream
class also has nonabstract methods to read an array of bytes or to skip a number of bytes. These methods call the abstract read
method, so subclasses need to override only one method.
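As a minimal sketch of this design, the hypothetical class below (CountingInputStream is our own name, not part of java.io) overrides only the abstract read method. The array-reading method it calls is inherited from InputStream.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical stream that yields the bytes 0, 1, ..., count - 1 and then ends. */
class CountingInputStream extends InputStream
{
   private final int count; // assumed to be at most 256 so every value fits in a byte
   private int next = 0;

   CountingInputStream(int count)
   {
      this.count = count;
   }

   // The only method that a concrete subclass must supply
   @Override
   public int read() throws IOException
   {
      if (next >= count) return -1; // end of input
      return next++;
   }

   public static void main(String[] args) throws IOException
   {
      InputStream in = new CountingInputStream(5);
      byte[] data = new byte[8];
      int n = in.read(data); // inherited method, implemented in terms of read()
      System.out.println(n + " bytes read");
      in.close();
   }
}
```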
Similarly, the OutputStream
class defines the abstract method
abstract void write(int b)
which writes one byte to an output location.
Both the read
and write
methods block until the bytes are actually read or written. This means that if the stream cannot immediately be accessed (usually because of a busy network connection), the current thread blocks. This gives other threads the chance to do useful work while the method is waiting for the stream to again become available.
The available
method lets you check the number of bytes that are currently available for reading. This means a fragment like the following is unlikely to block:
int bytesAvailable = in.available();
if (bytesAvailable > 0)
{
   byte[] data = new byte[bytesAvailable];
   in.read(data);
}
When you have finished reading or writing to a stream, close it by calling the close
method. This call frees up operating system resources that are in limited supply. If an application opens too many streams without closing them, system resources can become depleted. Closing an output stream also flushes the buffer used for the output stream: any characters that were temporarily placed in a buffer so that they could be delivered as a larger packet are sent off. In particular, if you do not close a file, the last packet of bytes might never be delivered. You can also manually flush the output with the flush
method.
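The effect of buffering on delivery can be observed with in-memory streams. The following sketch (the class and method names are ours) writes three bytes through a BufferedOutputStream and checks how many have reached the underlying ByteArrayOutputStream before and after a call to flush.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class FlushDemo
{
   /** Returns the number of bytes that reached the sink before and after flushing. */
   static int[] sizesBeforeAndAfterFlush() throws IOException
   {
      ByteArrayOutputStream sink = new ByteArrayOutputStream();
      OutputStream out = new BufferedOutputStream(sink); // default-sized buffer
      out.write(new byte[] { 1, 2, 3 });
      int before = sink.size(); // the bytes are still held in the buffer
      out.flush();
      int after = sink.size(); // now they have been delivered
      out.close();
      return new int[] { before, after };
   }

   public static void main(String[] args) throws IOException
   {
      int[] sizes = sizesBeforeAndAfterFlush();
      System.out.println("before flush: " + sizes[0] + ", after flush: " + sizes[1]);
   }
}
```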
Even if a stream class provides concrete methods to work with the raw read
and write
functions, application programmers rarely use them. The data that you are interested in probably contain numbers, strings, and objects, not raw bytes.
Java gives you many stream classes derived from the basic InputStream
and OutputStream
classes that let you work with data in the forms that you usually use rather than at the byte level.
Unlike C, which gets by just fine with a single type FILE*
, Java has a whole zoo of more than 60 (!) different stream types (see Figures 1-1 and 1-2).
Let us divide the animals in the stream class zoo by how they are used. There are separate hierarchies for classes that process bytes and characters. As you saw, the InputStream
and OutputStream
classes let you read and write individual bytes and arrays of bytes. These classes form the basis of the hierarchy shown in Figure 1-1. To read and write strings and numbers, you need more capable subclasses. For example, DataInputStream
and DataOutputStream
let you read and write all the primitive Java types in binary format. Finally, there are streams that do useful stuff; for example, the ZipInputStream
and ZipOutputStream
that let you read and write files in the familiar ZIP compression format.
For Unicode text, on the other hand, you use subclasses of the abstract classes Reader
and Writer
(see Figure 1-2). The basic methods of the Reader
and Writer
classes are similar to the ones for InputStream
and OutputStream
.
abstract int read()
abstract void write(int c)
The read
method returns either a Unicode code unit (as an integer between 0 and 65535) or −1 when you have reached the end of the file. The write
method is called with a Unicode code unit. (See Volume I, Chapter 3 for a discussion of Unicode code units.)
Java SE 5.0 introduced four additional interfaces: Closeable
, Flushable
, Readable
, and Appendable
(see Figure 1-3). The first two interfaces are very simple, with methods
void close() throws IOException
and
void flush() throws IOException
respectively. The classes InputStream
, OutputStream
, Reader
, and Writer
all implement the Closeable
interface. OutputStream
and Writer
implement the Flushable
interface.
The Readable
interface has a single method
int read(CharBuffer cb)
The CharBuffer
class has methods for sequential and random read/write access. It represents an in-memory buffer or a memory-mapped file. (See “The Buffer Data Structure” on page 72 for details.)
The Appendable
interface has two methods for appending single characters and character sequences:
Appendable append(char c)
Appendable append(CharSequence s)
The CharSequence
interface describes basic properties of a sequence of char
values. It is implemented by String
, CharBuffer
, StringBuilder
, and StringBuffer
.
Of the stream zoo classes, only Writer
implements Appendable
.
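One benefit of the Appendable interface is that a method can accept either a string builder or a writer as its target. The sketch below illustrates this with a hypothetical greet method of our own.

```java
import java.io.IOException;
import java.io.StringWriter;

class AppendableDemo
{
   /** Appends a greeting to any Appendable, such as a StringBuilder or a Writer. */
   static void greet(Appendable out, String name) throws IOException
   {
      out.append("Hello, ").append(name).append('!');
   }

   public static void main(String[] args) throws IOException
   {
      StringBuilder builder = new StringBuilder();
      greet(builder, "Harry");

      StringWriter writer = new StringWriter(); // a Writer, hence an Appendable
      greet(writer, "Carl");

      System.out.println(builder);
      System.out.println(writer);
   }
}
```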
FileInputStream
and FileOutputStream
give you input and output streams attached to a disk file. You give the file name or full path name of the file in the constructor. For example,
FileInputStream fin = new FileInputStream("employee.dat");
looks in the user directory for a file named "employee.dat"
.
Because all the classes in java.io
interpret relative path names as starting with the user’s working directory, you may want to know this directory. You can get at this information by a call to System.getProperty("user.dir")
.
Like the abstract InputStream
and OutputStream
classes, these classes support only reading and writing on the byte level. That is, we can only read bytes and byte arrays from the object fin
.
byte b = (byte) fin.read();
As you will see in the next section, if we just had a DataInputStream
, then we could read numeric types:
DataInputStream din = . . .;
double s = din.readDouble();
But just as the FileInputStream
has no methods to read numeric types, the DataInputStream
has no method to get data from a file.
Java uses a clever mechanism to separate two kinds of responsibilities. Some streams (such as the FileInputStream
and the input stream returned by the openStream
method of the URL class) can retrieve bytes from files and other more exotic locations. Other streams (such as the DataInputStream
and the PrintWriter
) can assemble bytes into more useful data types. The Java programmer has to combine the two. For example, to be able to read numbers from a file, first create a FileInputStream
and then pass it to the constructor of a DataInputStream
.
FileInputStream fin = new FileInputStream("employee.dat");
DataInputStream din = new DataInputStream(fin);
double s = din.readDouble();
If you look at Figure 1-1 again, you can see the classes FilterInputStream
and FilterOutputStream
. The subclasses of these classes are used to add capabilities to raw byte streams.
You can add multiple capabilities by nesting the filters. For example, by default, streams are not buffered. That is, every call to read
asks the operating system to dole out yet another byte. It is more efficient to request blocks of data instead and put them in a buffer. If you want buffering and the data input methods for a file, you need to use the following rather monstrous sequence of constructors:
DataInputStream din = new DataInputStream(
   new BufferedInputStream(
      new FileInputStream("employee.dat")));
Notice that we put the DataInputStream
last in the chain of constructors because we want to use the DataInputStream
methods, and we want them to use the buffered read
method.
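The same nesting works for output. To keep the following sketch self-contained, it uses in-memory byte array streams in place of files (that substitution, and the class name, are our own choices). It writes a double through a buffered data stream and reads it back through the matching input chain.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class FilterChainDemo
{
   /** Writes a double through a buffered data stream and reads it back. */
   static double roundTrip(double value) throws IOException
   {
      ByteArrayOutputStream sink = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(new BufferedOutputStream(sink));
      out.writeDouble(value);
      out.close(); // closing flushes the buffer into the sink

      DataInputStream in = new DataInputStream(
         new BufferedInputStream(
            new ByteArrayInputStream(sink.toByteArray())));
      double result = in.readDouble();
      in.close();
      return result;
   }

   public static void main(String[] args) throws IOException
   {
      System.out.println(roundTrip(75000));
   }
}
```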
Sometimes you’ll need to keep track of the intermediate streams when chaining them together. For example, when reading input, you often need to peek at the next byte to see if it is the value that you expect. Java provides the PushbackInputStream
for this purpose.
PushbackInputStream pbin = new PushbackInputStream(
   new BufferedInputStream(
      new FileInputStream("employee.dat")));
Now you can speculatively read the next byte
int b = pbin.read();
and throw it back if it isn’t what you wanted.
if (b != '<') pbin.unread(b);
But reading and unreading are the only methods that apply to the pushback input stream. If you want to look ahead and also read numbers, then you need both a pushback input stream and a data input stream reference.
DataInputStream din = new DataInputStream(
   pbin = new PushbackInputStream(
      new BufferedInputStream(
         new FileInputStream("employee.dat"))));
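Here is a small, self-contained sketch of the lookahead idiom, using a byte array as the input source. The helper name consumeIfTagStart is hypothetical. Note one precaution that the short fragment above omits: the end-of-stream value −1 should not be pushed back, because unread would store it as an ordinary byte.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;

class PushbackDemo
{
   /** Consumes the next byte if it is '<'; otherwise leaves the stream as it was. */
   static boolean consumeIfTagStart(PushbackInputStream in) throws IOException
   {
      int b = in.read();
      if (b == '<') return true;
      if (b != -1) in.unread(b); // do not push back the end-of-stream marker
      return false;
   }

   public static void main(String[] args) throws IOException
   {
      PushbackInputStream in = new PushbackInputStream(
         new ByteArrayInputStream(new byte[] { 'a', '<' }));
      System.out.println(consumeIfTagStart(in)); // 'a' is read and pushed back
      System.out.println((char) in.read());      // the pushed-back byte is read again
      System.out.println(consumeIfTagStart(in)); // '<' is consumed
   }
}
```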
Of course, in the stream libraries of other programming languages, niceties such as buffering and lookahead are automatically taken care of, so it is a bit of a hassle in Java that one has to resort to combining stream filters in these cases. But the ability to mix and match filter classes to construct truly useful sequences of streams does give you an immense amount of flexibility. For example, you can read numbers from a compressed ZIP file by using the following sequence of streams (see Figure 1-4):
ZipInputStream zin = new ZipInputStream(new FileInputStream("employee.zip"));
DataInputStream din = new DataInputStream(zin);
(See “ZIP Archives” on page 32 for more on Java’s ability to handle ZIP files.)
When saving data, you have the choice between binary and text format. For example, if the integer 1234 is saved in binary, it is written as the sequence of bytes 00 00 04 D2
(in hexadecimal notation). In text format, it is saved as the string "1234"
. Although binary I/O is fast and efficient, it is not easily readable by humans. We first discuss text I/O and cover binary I/O in the section “Reading and Writing Binary Data” on page 23.
When saving text strings, you need to consider the character encoding. In the UTF-16 encoding, the string "1234"
is encoded as 00 31 00 32 00 33 00 34
(in hex). However, many programs expect that text files are encoded in a different encoding. In ISO 8859-1, the encoding most commonly used in the United States and Western Europe, the string would be written as 31 32 33 34
, without the zero bytes.
The OutputStreamWriter
class turns a stream of Unicode characters into a stream of bytes, using a chosen character encoding. Conversely, the InputStreamReader
class turns an input stream that contains bytes (specifying characters in some character encoding) into a reader that emits Unicode characters.
For example, here is how you make an input reader that reads keystrokes from the console and converts them to Unicode:
InputStreamReader in = new InputStreamReader(System.in);
This input stream reader assumes the default character encoding used by the host system, such as the ISO 8859-1 encoding in Western Europe. You can choose a different encoding by specifying it in the constructor for the InputStreamReader
, for example,
InputStreamReader in = new InputStreamReader(new FileInputStream("kremlin.dat"), "ISO8859_5");
See “Character Sets” on page 19 for more information on character encodings.
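To see the byte sequences produced by different encodings, you can attach an OutputStreamWriter to a ByteArrayOutputStream. The sketch below uses helper names of our own. It requests "UTF-16BE" rather than "UTF-16", on the assumption that you want the bare code units; Java's "UTF-16" encoder also writes a byte-order mark at the start.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

class EncodingDemo
{
   /** Encodes text by writing it through an OutputStreamWriter with the named encoding. */
   static byte[] encode(String text, String encoding) throws IOException
   {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      Writer out = new OutputStreamWriter(bytes, encoding);
      out.write(text);
      out.close(); // flushes the encoder
      return bytes.toByteArray();
   }

   /** Formats bytes as two-digit hexadecimal values separated by spaces. */
   static String hex(byte[] bytes)
   {
      StringBuilder b = new StringBuilder();
      for (byte x : bytes) b.append(String.format("%02X ", x & 0xFF));
      return b.toString().trim();
   }

   public static void main(String[] args) throws IOException
   {
      System.out.println(hex(encode("1234", "UTF-16BE")));
      System.out.println(hex(encode("1234", "ISO-8859-1")));
   }
}
```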
Because it is so common to attach a reader or writer to a file, a pair of convenience classes, FileReader
and FileWriter
, is provided for this purpose. For example, the writer definition
FileWriter out = new FileWriter("output.txt");
is equivalent to
Writer out = new OutputStreamWriter(new FileOutputStream("output.txt"));
For text output, you want to use a PrintWriter
. That class has methods to print strings and numbers in text format. There is even a convenience constructor to link a PrintWriter
with a FileWriter
. The statement
PrintWriter out = new PrintWriter("employee.txt");
is equivalent to
PrintWriter out = new PrintWriter(new FileWriter("employee.txt"));
To write to a print writer, you use the same print
, println
, and printf
methods that you used with System.out
. You can use these methods to print numbers (int
, short
, long
, float
, double
), characters, boolean
values, strings, and objects.
For example, consider this code:
String name = "Harry Hacker";
double salary = 75000;
out.print(name);
out.print(' ');
out.println(salary);
This writes the characters
Harry Hacker 75000.0
to the writer out
. The characters are then converted to bytes and end up in the file employee.txt
.
The println
method adds the correct end-of-line character for the target system ("\r\n" on Windows, "\n" on UNIX) to the line. This is the string obtained by the call System.getProperty("line.separator")
.
If the writer is set to autoflush mode, then all characters in the buffer are sent to their destination whenever println
is called. (Print writers are always buffered.) By default, autoflushing is not enabled. You can enable or disable autoflushing by using the PrintWriter(Writer out, boolean autoFlush)
constructor:
PrintWriter out = new PrintWriter(new FileWriter("employee.txt"), true); // autoflush
The print
methods don’t throw exceptions. You can call the checkError
method to see if something went wrong with the stream.
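The following sketch shows these pieces together. It prints to a StringWriter instead of a file so that the result can be inspected directly; the class and method names are our own.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class PrintWriterDemo
{
   /** Prints a name and a salary on one line and returns the characters written. */
   static String format(String name, double salary)
   {
      StringWriter buffer = new StringWriter();
      PrintWriter out = new PrintWriter(buffer);
      out.print(name);
      out.print(' ');
      out.println(salary);
      // The print methods never throw; checkError flushes and reports any failure
      if (out.checkError()) throw new IllegalStateException("write failed");
      return buffer.toString();
   }

   public static void main(String[] args)
   {
      System.out.print(format("Harry Hacker", 75000));
   }
}
```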
Java veterans might wonder whatever happened to the PrintStream
class and to System.out
. In Java 1.0, the PrintStream
class simply truncated all Unicode characters to ASCII characters by dropping the top byte. Clearly, that was not a clean or portable approach, and it was fixed with the introduction of readers and writers in Java 1.1. For compatibility with existing code, System.in
, System.out
, and System.err
are still streams, not readers and writers. But now the PrintStream
class internally converts Unicode characters to the default host encoding in the same way as the PrintWriter
does. Objects of type PrintStream
act exactly like print writers when you use the print
and println
methods, but unlike print writers, they allow you to output raw bytes with the write(int)
and write(byte[])
methods.
To write data in binary format, you use a DataOutputStream
.
To write in text format, you use a PrintWriter
.
Therefore, you might expect that there is an analog to the DataInputStream
that lets you read data in text format. The closest analog is the Scanner
class that we used extensively in Volume I. However, before Java SE 5.0, the only game in town for processing text input was the BufferedReader
class—it has a method, readLine
, that lets you read a line of text. You need to combine a buffered reader with an input source.
BufferedReader in = new BufferedReader(new FileReader("employee.txt"));
The readLine
method returns null
when no more input is available. A typical input loop, therefore, looks like this:
String line;
while ((line = in.readLine()) != null)
{
do something with line
}
However, a BufferedReader
has no methods for reading numbers. We suggest that you use a Scanner
for reading text input.
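To make the difference concrete, the sketch below sums integers from a text source in both styles. It reads from a string rather than a file, and the method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Scanner;

class TextInputDemo
{
   /** Sums one integer per line with a BufferedReader; each line is parsed by hand. */
   static int sumWithReader(String text) throws IOException
   {
      BufferedReader in = new BufferedReader(new StringReader(text));
      int sum = 0;
      String line;
      while ((line = in.readLine()) != null)
         sum += Integer.parseInt(line.trim());
      return sum;
   }

   /** Sums whitespace-separated integers with a Scanner, which parses numbers itself. */
   static int sumWithScanner(String text)
   {
      Scanner in = new Scanner(text);
      int sum = 0;
      while (in.hasNextInt())
         sum += in.nextInt();
      return sum;
   }

   public static void main(String[] args) throws IOException
   {
      String text = "1\n2\n3\n";
      System.out.println(sumWithReader(text));
      System.out.println(sumWithScanner(text));
   }
}
```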
In this section, we walk you through an example program that stores an array of Employee
records in a text file. Each record is stored in a separate line. Instance fields are separated from each other by delimiters. We use a vertical bar (|
) as our delimiter. (A colon (:
) is another popular choice. Part of the fun is that everyone uses a different delimiter.) Naturally, we punt on the issue of what might happen if a |
actually occurred in one of the strings we save.
Here is a sample set of records:
Harry Hacker|35500|1989|10|1 Carl Cracker|75000|1987|12|15 Tony Tester|38000|1990|3|15
Writing records is simple. Because we write to a text file, we use the PrintWriter
class. We simply write all fields, followed by either a |
or, for the last field, a \n. This work is done in the following writeData
method that we add to our Employee
class.
public void writeData(PrintWriter out) throws IOException
{
   GregorianCalendar calendar = new GregorianCalendar();
   calendar.setTime(hireDay);
   out.println(name + "|" + salary + "|" + calendar.get(Calendar.YEAR) + "|"
      + (calendar.get(Calendar.MONTH) + 1) + "|" + calendar.get(Calendar.DAY_OF_MONTH));
}
To read records, we read in a line at a time and separate the fields. We use a scanner to read each line and then split the line into tokens with the String.split
method.
public void readData(Scanner in)
{
   String line = in.nextLine();
   String[] tokens = line.split("\\|");
   name = tokens[0];
   salary = Double.parseDouble(tokens[1]);
   int y = Integer.parseInt(tokens[2]);
   int m = Integer.parseInt(tokens[3]);
   int d = Integer.parseInt(tokens[4]);
   GregorianCalendar calendar = new GregorianCalendar(y, m - 1, d);
   hireDay = calendar.getTime();
}
The parameter of the split
method is a regular expression describing the separator. We discuss regular expressions in more detail at the end of this chapter. As it happens, the vertical bar character has a special meaning in regular expressions, so it needs to be escaped with a \ character. That character needs to be escaped by another \, yielding the "\\|" expression.
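A short sketch of the escaped separator in action, applied to one of the sample records (the fields helper is our own name):

```java
class SplitDemo
{
   /** Splits a record into its fields at each vertical bar. */
   static String[] fields(String record)
   {
      // "\\|" in Java source is the two-character regular expression \|,
      // which matches a literal vertical bar
      return record.split("\\|");
   }

   public static void main(String[] args)
   {
      for (String field : fields("Harry Hacker|35500|1989|10|1"))
         System.out.println(field);
   }
}
```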
The complete program is in Listing 1-1. The static method
void writeData(Employee[] e, PrintWriter out)
first writes the length of the array, then writes each record. The static method
Employee[] readData(Scanner in)
first reads in the length of the array, then reads in each record. This turns out to be a bit tricky:
int n = in.nextInt();
in.nextLine(); // consume newline
Employee[] employees = new Employee[n];
for (int i = 0; i < n; i++)
{
   employees[i] = new Employee();
   employees[i].readData(in);
}
The call to nextInt
reads the array length but not the trailing newline character. We must consume the newline so that the readData
method can get the next input line when it calls the nextLine
method.
Listing 1-1. TextFileTest.java
import java.io.*;
import java.util.*;

/**
 * @version 1.12 2007-06-22
 * @author Cay Horstmann
 */
public class TextFileTest
{
   public static void main(String[] args)
   {
      Employee[] staff = new Employee[3];

      staff[0] = new Employee("Carl Cracker", 75000, 1987, 12, 15);
      staff[1] = new Employee("Harry Hacker", 50000, 1989, 10, 1);
      staff[2] = new Employee("Tony Tester", 40000, 1990, 3, 15);

      try
      {
         // save all employee records to the file employee.dat
         PrintWriter out = new PrintWriter("employee.dat");
         writeData(staff, out);
         out.close();

         // retrieve all records into a new array
         Scanner in = new Scanner(new FileReader("employee.dat"));
         Employee[] newStaff = readData(in);
         in.close();

         // print the newly read employee records
         for (Employee e : newStaff)
            System.out.println(e);
      }
      catch (IOException exception)
      {
         exception.printStackTrace();
      }
   }

   /**
    * Writes all employees in an array to a print writer
    * @param employees an array of employees
    * @param out a print writer
    */
   private static void writeData(Employee[] employees, PrintWriter out) throws IOException
   {
      // write number of employees
      out.println(employees.length);

      for (Employee e : employees)
         e.writeData(out);
   }

   /**
    * Reads an array of employees from a scanner
    * @param in the scanner
    * @return the array of employees
    */
   private static Employee[] readData(Scanner in)
   {
      // retrieve the array size
      int n = in.nextInt();
      in.nextLine(); // consume newline

      Employee[] employees = new Employee[n];
      for (int i = 0; i < n; i++)
      {
         employees[i] = new Employee();
         employees[i].readData(in);
      }
      return employees;
   }
}

class Employee
{
   public Employee()
   {
   }

   public Employee(String n, double s, int year, int month, int day)
   {
      name = n;
      salary = s;
      GregorianCalendar calendar = new GregorianCalendar(year, month - 1, day);
      hireDay = calendar.getTime();
   }

   public String getName()
   {
      return name;
   }

   public double getSalary()
   {
      return salary;
   }

   public Date getHireDay()
   {
      return hireDay;
   }

   public void raiseSalary(double byPercent)
   {
      double raise = salary * byPercent / 100;
      salary += raise;
   }

   public String toString()
   {
      return getClass().getName() + "[name=" + name + ",salary=" + salary + ",hireDay="
         + hireDay + "]";
   }

   /**
    * Writes employee data to a print writer
    * @param out the print writer
    */
   public void writeData(PrintWriter out)
   {
      GregorianCalendar calendar = new GregorianCalendar();
      calendar.setTime(hireDay);
      out.println(name + "|" + salary + "|" + calendar.get(Calendar.YEAR) + "|"
         + (calendar.get(Calendar.MONTH) + 1) + "|" + calendar.get(Calendar.DAY_OF_MONTH));
   }

   /**
    * Reads employee data from a scanner
    * @param in the scanner
    */
   public void readData(Scanner in)
   {
      String line = in.nextLine();
      String[] tokens = line.split("\\|");
      name = tokens[0];
      salary = Double.parseDouble(tokens[1]);
      int y = Integer.parseInt(tokens[2]);
      int m = Integer.parseInt(tokens[3]);
      int d = Integer.parseInt(tokens[4]);
      GregorianCalendar calendar = new GregorianCalendar(y, m - 1, d);
      hireDay = calendar.getTime();
   }

   private String name;
   private double salary;
   private Date hireDay;
}
In the past, international character sets have been handled rather unsystematically throughout the Java library. The java.nio
package—introduced in Java SE 1.4—unifies character set conversion with the introduction of the Charset
class. (Note that the s
is lower case.)
A character set maps between sequences of two-byte Unicode code units and byte sequences used in a local character encoding. One of the most popular character encodings is ISO-8859-1, a single-byte encoding of the first 256 Unicode characters. Gaining in importance is ISO-8859-15, which replaces some of the less useful characters of ISO-8859-1 with accented letters used in French and Finnish, and, more important, replaces the “international currency” character ¤ with the Euro symbol (€) in code point 0xA4
. Other examples for character encodings are the variable-byte encodings commonly used for Japanese and Chinese.
The Charset
class uses the character set names standardized in the IANA Character Set Registry (http://www.iana.org/assignments/character-sets). These names differ slightly from those used in previous versions. For example, the “official” name of ISO-8859-1 is now "ISO-8859-1"
and no longer "ISO8859_1"
, which was the preferred name up to Java SE 1.3.
An excellent reference for the “ISO 8859 alphabet soup” is http://czyborra.com/charsets/iso8859.html.
You obtain a Charset
by calling the static forName
method with either the official name or one of its aliases:
Charset cset = Charset.forName("ISO-8859-1");
Character set names are case insensitive.
For compatibility with other naming conventions, each character set can have a number of aliases. For example, ISO-8859-1 has aliases
ISO8859-1 ISO_8859_1 ISO8859_1 ISO_8859-1 ISO_8859-1:1987 8859_1 latin1 l1 csISOLatin1 iso-ir-100 cp819 IBM819 IBM-819 819
The aliases
method returns a Set
object of the aliases. Here is the code to iterate through the aliases:
Set<String> aliases = cset.aliases();
for (String alias : aliases)
   System.out.println(alias);
To find out which character sets are available in a particular implementation, call the static availableCharsets
method. Use this code to find out the names of all available character sets:
Map<String, Charset> charsets = Charset.availableCharsets();
for (String name : charsets.keySet())
   System.out.println(name);
Table 1-1 lists the character encodings that every Java implementation is required to have. Table 1-2 lists the encoding schemes that the Java Development Kit (JDK) installs by default. The character sets in Table 1-3 are installed only on operating systems that use non-European languages.
Table 1-1. Required Character Encodings
| Standard Name | Legacy Name | Description |
|---|---|---|
| US-ASCII | ASCII | American Standard Code for Information Exchange |
| ISO-8859-1 | ISO8859_1 | ISO 8859-1, Latin alphabet No. 1 |
| UTF-8 | UTF8 | Eight-bit Unicode Transformation Format |
| UTF-16 | UTF-16 | Sixteen-bit Unicode Transformation Format, byte order specified by an optional initial byte-order mark |
| UTF-16BE | UnicodeBigUnmarked | Sixteen-bit Unicode Transformation Format, big-endian byte order |
| UTF-16LE | UnicodeLittleUnmarked | Sixteen-bit Unicode Transformation Format, little-endian byte order |
Table 1-2. Basic Character Encodings
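| Standard Name | Legacy Name | Description |
|---|---|---|
| ISO-8859-2 | ISO8859_2 | ISO 8859-2, Latin alphabet No. 2 |
| ISO-8859-4 | ISO8859_4 | ISO 8859-4, Latin alphabet No. 4 |
| ISO-8859-5 | ISO8859_5 | ISO 8859-5, Latin/Cyrillic alphabet |
| ISO-8859-7 | ISO8859_7 | ISO 8859-7, Latin/Greek alphabet |
| ISO-8859-9 | ISO8859_9 | ISO 8859-9, Latin alphabet No. 5 |
| ISO-8859-13 | ISO8859_13 | ISO 8859-13, Latin alphabet No. 7 |
| ISO-8859-15 | ISO8859_15 | ISO 8859-15, Latin alphabet No. 9 |
| windows-1250 | Cp1250 | Windows Eastern European |
| windows-1251 | Cp1251 | Windows Cyrillic |
| windows-1252 | Cp1252 | Windows Latin-1 |
| windows-1253 | Cp1253 | Windows Greek |
| windows-1254 | Cp1254 | Windows Turkish |
| windows-1257 | Cp1257 | Windows Baltic |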
Table 1-3. Extended Character Encodings
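| Standard Name | Legacy Name | Description |
|---|---|---|
| Big5 | Big5 | Big5, Traditional Chinese |
| Big5-HKSCS | Big5_HKSCS | Big5 with Hong Kong extensions, Traditional Chinese |
| EUC-JP | EUC_JP | JIS X 0201, 0208, 0212, EUC encoding, Japanese |
| EUC-KR | EUC_KR | KS C 5601, EUC encoding, Korean |
| GB18030 | GB18030 | Simplified Chinese, PRC Standard |
| GBK | GBK | GBK, Simplified Chinese |
| ISCII91 | ISCII91 | ISCII91 encoding of Indic scripts |
| ISO-2022-JP | ISO2022JP | JIS X 0201, 0208 in ISO 2022 form, Japanese |
| ISO-2022-KR | ISO2022KR | ISO 2022 KR, Korean |
| ISO-8859-3 | ISO8859_3 | ISO 8859-3, Latin alphabet No. 3 |
| ISO-8859-6 | ISO8859_6 | ISO 8859-6, Latin/Arabic alphabet |
| ISO-8859-8 | ISO8859_8 | ISO 8859-8, Latin/Hebrew alphabet |
| Shift_JIS | SJIS | Shift-JIS, Japanese |
| TIS-620 | TIS620 | TIS620, Thai |
| windows-1255 | Cp1255 | Windows Hebrew |
| windows-1256 | Cp1256 | Windows Arabic |
| windows-1258 | Cp1258 | Windows Vietnamese |
| windows-31j | MS932 | Windows Japanese |
| x-EUC-CN | EUC_CN | GB2312, EUC encoding, Simplified Chinese |
| x-EUC-JP-LINUX | EUC_JP_LINUX | JIS X 0201, 0208, EUC encoding, Japanese |
| x-EUC-TW | EUC_TW | CNS11643 (Plane 1-3), EUC encoding, Traditional Chinese |
| x-MS950-HKSCS | MS950_HKSCS | Windows Traditional Chinese with Hong Kong extensions |
| x-mswin-936 | MS936 | Windows Simplified Chinese |
| x-windows-949 | MS949 | Windows Korean |
| x-windows-950 | MS950 | Windows Traditional Chinese |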
Local encoding schemes cannot represent all Unicode characters. If a character cannot be represented, it is transformed to a ?
.
Once you have a character set, you can use it to convert between Unicode strings and encoded byte sequences. Here is how you encode a Unicode string:
String str = . . .;
ByteBuffer buffer = cset.encode(str);
byte[] bytes = buffer.array();
Conversely, to decode a byte sequence, you need a byte buffer. Use the static wrap
method of the ByteBuffer
class to turn a byte array into a byte buffer. The result of the decode
method is a CharBuffer
. Call its toString
method to get a string.
byte[] bytes = . . .;
ByteBuffer bbuf = ByteBuffer.wrap(bytes, offset, length);
CharBuffer cbuf = cset.decode(bbuf);
String str = cbuf.toString();
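The following sketch combines both directions into a round trip; the class and method names are ours. As a cautious variation, it copies only the bytes that the buffer reports as remaining, since the array backing a buffer is not guaranteed to be exactly as long as the encoded content. The test string includes a Euro sign, which ISO-8859-1 cannot represent.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

class CharsetDemo
{
   /** Encodes a string, returning exactly the encoded bytes. */
   static byte[] encode(String str, Charset cset)
   {
      ByteBuffer buffer = cset.encode(str);
      byte[] bytes = new byte[buffer.remaining()];
      buffer.get(bytes); // copy only the content between position and limit
      return bytes;
   }

   /** Decodes a byte array into a string. */
   static String decode(byte[] bytes, Charset cset)
   {
      CharBuffer cbuf = cset.decode(ByteBuffer.wrap(bytes));
      return cbuf.toString();
   }

   public static void main(String[] args)
   {
      Charset cset = Charset.forName("ISO-8859-1");
      byte[] bytes = encode("caf\u00e9 \u20ac", cset); // the Euro sign is not in ISO-8859-1
      System.out.println(bytes.length + " bytes");
      System.out.println(decode(bytes, cset));
   }
}
```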
The DataOutput
interface defines the following methods for writing a number, character, boolean
value, or string in binary format:
writeChars writeByte writeInt writeShort writeLong writeFloat writeDouble writeChar writeBoolean writeUTF
For example, writeInt
always writes an integer as a 4-byte binary quantity regardless of the number of digits, and writeDouble
always writes a double
as an 8-byte binary quantity. The resulting output is not humanly readable, but the space needed will be the same for each value of a given type and reading it back in will be faster than parsing text.
There are two different methods of storing integers and floating-point numbers in memory, depending on the platform you are using. Suppose, for example, you are working with a 4-byte int
, say the decimal number 1234, or 4D2 in hexadecimal (1234 = 4 × 256 + 13 × 16 + 2). This can be stored in such a way that the first of the 4 bytes in memory holds the most significant byte (MSB) of the value: 00 00 04 D2
. This is the so-called big-endian method. Or we can start with the least significant byte (LSB) first: D2 04 00 00
. This is called, naturally enough, the little-endian method. For example, the SPARC uses big-endian; the Pentium, little-endian. This can lead to problems. When a C or C++ file is saved, the data are saved exactly as the processor stores them. That makes it challenging to move even the simplest data files from one platform to another. In Java, all values are written in the big-endian fashion, regardless of the processor. That makes Java data files platform independent.
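You can inspect both byte orders from within Java. In the sketch below (with names of our own choosing), a DataOutputStream produces the big-endian form. The little-endian form is obtained from a ByteBuffer whose byte order is set explicitly, which is one way to exchange data with programs that expect that layout.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class EndianDemo
{
   /** Returns the 4 bytes that DataOutputStream.writeInt produces (always big-endian). */
   static byte[] bigEndian(int value) throws IOException
   {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      out.writeInt(value);
      out.flush();
      return bytes.toByteArray();
   }

   /** Returns the 4 bytes of the value in little-endian order. */
   static byte[] littleEndian(int value)
   {
      ByteBuffer buffer = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN);
      buffer.putInt(value);
      return buffer.array();
   }

   public static void main(String[] args) throws IOException
   {
      for (byte b : bigEndian(1234)) System.out.printf("%02X ", b & 0xFF);
      System.out.println();
      for (byte b : littleEndian(1234)) System.out.printf("%02X ", b & 0xFF);
      System.out.println();
   }
}
```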
The writeUTF
method writes string data by using a modified version of 8-bit Unicode Transformation Format. Instead of simply using the standard UTF-8 encoding (which is shown in Table 1-4), character strings are first represented in UTF-16 (see Table 1-5) and then the result is encoded using the UTF-8 rules. The modified encoding is different for characters with code higher than 0xFFFF
. It is used for backward compatibility with virtual machines that were built when Unicode had not yet grown beyond 16 bits.
Because nobody else uses this modification of UTF-8, you should only use the writeUTF
method to write strings that are intended for a Java virtual machine; for example, if you write a program that generates bytecodes. Use the writeChars
method for other purposes.
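The difference from standard UTF-8 shows up in the byte counts. The sketch below, with hypothetical method names, measures a string holding a single character beyond 0xFFFF. Keep in mind that writeUTF also prefixes its output with a 2-byte length field.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class ModifiedUtf8Demo
{
   /** Number of bytes that writeUTF emits, including its 2-byte length prefix. */
   static int writeUtfLength(String s) throws IOException
   {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      out.writeUTF(s);
      out.flush();
      return bytes.size();
   }

   /** Number of bytes in the standard UTF-8 encoding of the string. */
   static int standardUtf8Length(String s) throws IOException
   {
      return s.getBytes("UTF-8").length;
   }

   public static void main(String[] args) throws IOException
   {
      String s = "\uD835\uDD46"; // one character (U+1D546) stored as two UTF-16 code units
      System.out.println("writeUTF: " + writeUtfLength(s) + " bytes");
      System.out.println("UTF-8:    " + standardUtf8Length(s) + " bytes");
   }
}
```

In the modified encoding, each of the two code units is encoded separately as a 3-byte sequence, whereas standard UTF-8 encodes the character as a single 4-byte sequence.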
See RFC 2279 (http://ietf.org/rfc/rfc2279.txt) and RFC 2781 (http://ietf.org/rfc/rfc2781.txt) for definitions of UTF-8 and UTF-16.
To read the data back in, use the following methods, defined in the DataInput
interface:
readInt readShort readLong readFloat readDouble readChar readBoolean readUTF
The DataInputStream
class implements the DataInput
interface. To read binary data from a file, you combine a DataInputStream
with a source of bytes such as a FileInputStream
:
DataInputStream in = new DataInputStream(new FileInputStream("employee.dat"));
Similarly, to write binary data, you use the DataOutputStream
class that implements the DataOutput
interface:
DataOutputStream out = new DataOutputStream(new FileOutputStream("employee.dat"));
The RandomAccessFile
class lets you find or write data anywhere in a file. Disk files are random access, but streams of data from a network are not. You open a random-access file either for reading only or for both reading and writing. You specify the option by using the string "r"
(for read access) or "rw"
(for read/write access) as the second argument in the constructor.
RandomAccessFile in = new RandomAccessFile("employee.dat", "r");
RandomAccessFile inOut = new RandomAccessFile("employee.dat", "rw");
When you open an existing file as a RandomAccessFile
, it does not get deleted.
A random-access file has a file pointer that indicates the position of the next byte that will be read or written. The seek
method sets the file pointer to an arbitrary byte position within the file. The argument to seek
is a long
integer between zero and the length of the file in bytes.
The getFilePointer
method returns the current position of the file pointer.
The RandomAccessFile
class implements both the DataInput
and DataOutput
interfaces. To read and write from a random-access file, you use methods such as readInt
/writeInt
and readChar/writeChar
that we discussed in the preceding section.
We now walk through an example program that stores employee records in a random access file. Each record will have the same size. This makes it easy to read an arbitrary record. Suppose you want to position the file pointer to the third record. Simply set the file pointer to the appropriate byte position and start reading.
long n = 3;
in.seek((n - 1) * RECORD_SIZE);
Employee e = new Employee();
e.readData(in);
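If you want to modify the record and then save it back into the same location, remember to set the file pointer back to the beginning of the record. Reading and writing must go through the same file object, and that file must have been opened in "rw" mode, as inOut was above:
inOut.seek((n - 1) * RECORD_SIZE);
e.writeData(inOut);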
To determine the total number of bytes in a file, use the length
method. The total number of records is the length divided by the size of each record.
long nbytes = in.length(); // length in bytes
int nrecords = (int) (nbytes / RECORD_SIZE);
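The following sketch puts the pieces together: it counts the records in a file, reads one, changes it, and writes it back in place through a single file opened with "rw". To keep it short, it uses a made-up record layout (one int followed by one double) rather than the employee record developed in this section, and the class name RecordUpdateDemo and the temporary file are likewise assumptions.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical demo: each record is one int (id) followed by one double (salary).
class RecordUpdateDemo
{
   static final int RECORD_SIZE = 4 + 8;

   // Raises the salary stored in record n (counting from 1, as in the text).
   static void raiseSalary(RandomAccessFile file, long n, double byPercent) throws IOException
   {
      long start = (n - 1) * RECORD_SIZE;
      file.seek(start);
      int id = file.readInt();
      double salary = file.readDouble();

      // reading moved the file pointer past the record, so seek back before writing
      file.seek(start);
      file.writeInt(id);
      file.writeDouble(salary + salary * byPercent / 100);
   }

   // Builds a three-record temporary file, updates the last record, and returns its new salary.
   static double demo() throws IOException
   {
      File temp = File.createTempFile("records", ".dat");
      temp.deleteOnExit();
      try (RandomAccessFile file = new RandomAccessFile(temp, "rw"))
      {
         double[] salaries = { 75000, 50000, 40000 };
         for (int i = 0; i < salaries.length; i++)
         {
            file.writeInt(i + 1);
            file.writeDouble(salaries[i]);
         }

         long nrecords = file.length() / RECORD_SIZE; // 36 bytes / 12 bytes per record
         raiseSalary(file, nrecords, 10);

         file.seek((nrecords - 1) * RECORD_SIZE + 4); // skip the id of the last record
         return file.readDouble();
      }
   }

   public static void main(String[] args) throws IOException
   {
      System.out.println(demo()); // prints 44000.0
   }
}
```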
Integers and floating-point values have a fixed size in binary format, but we have to work harder for strings. We provide two helper methods to write and read strings of a fixed size.
The writeFixedString
method writes the specified number of code units, starting at the beginning of the string. (If there are too few code units, the method pads the string, using zero values.)
public static void writeFixedString(String s, int size, DataOutput out)
   throws IOException
{
   for (int i = 0; i < size; i++)
   {
      char ch = 0;
      if (i < s.length()) ch = s.charAt(i);
      out.writeChar(ch);
   }
}
The readFixedString
method reads characters from the input stream until it has consumed size
code units or until it encounters a character with a zero value. Then, it skips past the remaining zero values in the input field. For added efficiency, this method uses the StringBuilder
class to read in a string.
public static String readFixedString(int size, DataInput in)
   throws IOException
{
   StringBuilder b = new StringBuilder(size);
   int i = 0;
   boolean more = true;
   while (more && i < size)
   {
      char ch = in.readChar();
      i++;
      if (ch == 0) more = false;
      else b.append(ch);
   }
   in.skipBytes(2 * (size - i));
   return b.toString();
}
We placed the writeFixedString
and readFixedString
methods inside the DataIO
helper class.
To write a fixed-size record, we simply write all fields in binary.
public void writeData(DataOutput out) throws IOException
{
   DataIO.writeFixedString(name, NAME_SIZE, out);
   out.writeDouble(salary);
   GregorianCalendar calendar = new GregorianCalendar();
   calendar.setTime(hireDay);
   out.writeInt(calendar.get(Calendar.YEAR));
   out.writeInt(calendar.get(Calendar.MONTH) + 1);
   out.writeInt(calendar.get(Calendar.DAY_OF_MONTH));
}
Reading the data back is just as simple.
public void readData(DataInput in) throws IOException
{
   name = DataIO.readFixedString(NAME_SIZE, in);
   salary = in.readDouble();
   int y = in.readInt();
   int m = in.readInt();
   int d = in.readInt();
   GregorianCalendar calendar = new GregorianCalendar(y, m - 1, d);
   hireDay = calendar.getTime();
}
Let us compute the size of each record. We will use 40 characters for the name strings. Therefore, each record contains 100 bytes:
40 characters = 80 bytes for the name
1 double = 8 bytes for the salary
3 int = 12 bytes for the date
The program shown in Listing 1-2 writes three records into a data file and then reads them from the file in reverse order. To do this efficiently requires random access—we need to get at the third record first.
Example 1-2. RandomFileTest.java
import java.io.*;
import java.util.*;

/**
 * @version 1.11 2004-05-11
 * @author Cay Horstmann
 */

public class RandomFileTest
{
   public static void main(String[] args)
   {
      Employee[] staff = new Employee[3];

      staff[0] = new Employee("Carl Cracker", 75000, 1987, 12, 15);
      staff[1] = new Employee("Harry Hacker", 50000, 1989, 10, 1);
      staff[2] = new Employee("Tony Tester", 40000, 1990, 3, 15);

      try
      {
         // save all employee records to the file employee.dat
         DataOutputStream out = new DataOutputStream(new FileOutputStream("employee.dat"));
         for (Employee e : staff)
            e.writeData(out);
         out.close();

         // retrieve all records into a new array
         RandomAccessFile in = new RandomAccessFile("employee.dat", "r");
         // compute the array size
         int n = (int)(in.length() / Employee.RECORD_SIZE);
         Employee[] newStaff = new Employee[n];

         // read employees in reverse order
         for (int i = n - 1; i >= 0; i--)
         {
            newStaff[i] = new Employee();
            in.seek(i * Employee.RECORD_SIZE);
            newStaff[i].readData(in);
         }
         in.close();

         // print the newly read employee records
         for (Employee e : newStaff)
            System.out.println(e);
      }
      catch (IOException e)
      {
         e.printStackTrace();
      }
   }
}

class Employee
{
   public Employee() {}

   public Employee(String n, double s, int year, int month, int day)
   {
      name = n;
      salary = s;
      GregorianCalendar calendar = new GregorianCalendar(year, month - 1, day);
      hireDay = calendar.getTime();
   }

   public String getName()
   {
      return name;
   }

   public double getSalary()
   {
      return salary;
   }

   public Date getHireDay()
   {
      return hireDay;
   }

   /**
      Raises the salary of this employee.
      @param byPercent the percentage of the raise
   */
   public void raiseSalary(double byPercent)
   {
      double raise = salary * byPercent / 100;
      salary += raise;
   }

   public String toString()
   {
      return getClass().getName()
         + "[name=" + name
         + ",salary=" + salary
         + ",hireDay=" + hireDay
         + "]";
   }

   /**
      Writes employee data to a data output
      @param out the data output
   */
   public void writeData(DataOutput out) throws IOException
   {
      DataIO.writeFixedString(name, NAME_SIZE, out);
      out.writeDouble(salary);

      GregorianCalendar calendar = new GregorianCalendar();
      calendar.setTime(hireDay);
      out.writeInt(calendar.get(Calendar.YEAR));
      out.writeInt(calendar.get(Calendar.MONTH) + 1);
      out.writeInt(calendar.get(Calendar.DAY_OF_MONTH));
   }

   /**
      Reads employee data from a data input
      @param in the data input
   */
   public void readData(DataInput in) throws IOException
   {
      name = DataIO.readFixedString(NAME_SIZE, in);
      salary = in.readDouble();
      int y = in.readInt();
      int m = in.readInt();
      int d = in.readInt();
      GregorianCalendar calendar = new GregorianCalendar(y, m - 1, d);
      hireDay = calendar.getTime();
   }

   public static final int NAME_SIZE = 40;
   public static final int RECORD_SIZE = 2 * NAME_SIZE + 8 + 4 + 4 + 4;

   private String name;
   private double salary;
   private Date hireDay;
}

class DataIO
{
   public static String readFixedString(int size, DataInput in)
      throws IOException
   {
      StringBuilder b = new StringBuilder(size);
      int i = 0;
      boolean more = true;
      while (more && i < size)
      {
         char ch = in.readChar();
         i++;
         if (ch == 0) more = false;
         else b.append(ch);
      }
      in.skipBytes(2 * (size - i));
      return b.toString();
   }

   public static void writeFixedString(String s, int size, DataOutput out)
      throws IOException
   {
      for (int i = 0; i < size; i++)
      {
         char ch = 0;
         if (i < s.length()) ch = s.charAt(i);
         out.writeChar(ch);
      }
   }
}
ZIP archives store one or more files in (usually) compressed format. Each entry in a ZIP archive has a header with information such as the name of the file and the compression method that was used. In Java, you use a ZipInputStream
to read a ZIP archive. You need to look at the individual entries in the archive. The getNextEntry
method returns an object of type ZipEntry
that describes the entry. The read
method of the ZipInputStream
is modified to return −1 at the end of the current entry (instead of just at the end of the ZIP file). You must then call closeEntry
to read the next entry. Here is a typical code sequence to read through a ZIP file:
ZipInputStream zin = new ZipInputStream(new FileInputStream(zipname));
ZipEntry entry;
while ((entry = zin.getNextEntry()) != null)
{
   analyze entry;
   read the contents of zin;
   zin.closeEntry();
}
zin.close();
To read the contents of a ZIP entry, you will probably not want to use the raw read
method; usually, you will use the methods of a more competent stream filter. For example, to read a text file inside a ZIP file, you can use the following loop:
Scanner in = new Scanner(zin);
while (in.hasNextLine())
do something with in.nextLine();
The ZIP input stream throws a ZipException
when there is an error in reading a ZIP file. Normally this error occurs when the ZIP file has been corrupted.
To write a ZIP file, you use a ZipOutputStream
. For each entry that you want to place into the ZIP file, you create a ZipEntry
object. You pass the file name to the ZipEntry
constructor; it sets the other parameters such as file date and compression method. You can override these settings if you like. Then, you call the putNextEntry
method of the ZipOutputStream
to begin writing a new file. Send the file data to the ZIP stream. When you are done, call closeEntry
. Repeat for all the files you want to store. Here is a code skeleton:
FileOutputStream fout = new FileOutputStream("test.zip");
ZipOutputStream zout = new ZipOutputStream(fout);
for all files
{
   ZipEntry ze = new ZipEntry(filename);
   zout.putNextEntry(ze);
   send data to zout;
   zout.closeEntry();
}
zout.close();
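To make the two skeletons concrete, here is a minimal sketch that writes a ZIP archive and reads it back. It is not part of the chapter's sample programs: the class ZipRoundTrip and its zip and unzip methods are hypothetical names, the archive lives in a byte array rather than in test.zip, and the entries are assumed to contain UTF-8 text.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; maps entry names to text contents.
class ZipRoundTrip
{
   // Creates an archive with one entry per map element.
   static byte[] zip(Map<String, String> files) throws IOException
   {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ZipOutputStream zout = new ZipOutputStream(bytes))
      {
         for (Map.Entry<String, String> file : files.entrySet())
         {
            zout.putNextEntry(new ZipEntry(file.getKey()));
            zout.write(file.getValue().getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
         }
      }
      return bytes.toByteArray();
   }

   // Reads every entry of the archive back into a map.
   static Map<String, String> unzip(byte[] archive) throws IOException
   {
      Map<String, String> files = new LinkedHashMap<>();
      try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(archive)))
      {
         ZipEntry entry;
         byte[] buffer = new byte[1024];
         while ((entry = zin.getNextEntry()) != null)
         {
            ByteArrayOutputStream content = new ByteArrayOutputStream();
            int n;
            while ((n = zin.read(buffer)) != -1) // -1 marks the end of the current entry
               content.write(buffer, 0, n);
            files.put(entry.getName(),
               new String(content.toByteArray(), StandardCharsets.UTF_8));
            zin.closeEntry();
         }
      }
      return files;
   }

   public static void main(String[] args) throws IOException
   {
      Map<String, String> files = new LinkedHashMap<>();
      files.put("a.txt", "first file");
      files.put("dir/b.txt", "second file");
      System.out.println(unzip(zip(files)).equals(files)); // prints true
   }
}
```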
JAR files (which were discussed in Volume I, Chapter 10) are simply ZIP files with another entry, the so-called manifest. You use the JarInputStream
and JarOutputStream
classes to read and write the manifest entry.
ZIP streams are a good example of the power of the stream abstraction. When you read the data that are stored in compressed form, you don’t worry that the data are being decompressed as they are being requested. And the source of the bytes in ZIP formats need not be a file—the ZIP data can come from a network connection. In fact, whenever the class loader of an applet reads a JAR file, it reads and decompresses data from the network.
The article at http://www.javaworld.com/javaworld/jw-10-2000/jw-1027-toolbox.html shows you how to modify a ZIP archive.
The program shown in Listing 1-3 lets you open a ZIP file. It then displays the files stored in the ZIP archive in the combo box at the bottom of the screen. If you select one of the files, the contents of the file are displayed in the text area, as shown in Figure 1-5.
Example 1-3. ZipTest.java
import java.awt.*;
import java.awt.event.*;
import java.io.*;
import java.util.*;
import java.util.List;
import java.util.zip.*;
import javax.swing.*;

/**
 * @version 1.32 2007-06-22
 * @author Cay Horstmann
 */
public class ZipTest
{
   public static void main(String[] args)
   {
      EventQueue.invokeLater(new Runnable()
         {
            public void run()
            {
               ZipTestFrame frame = new ZipTestFrame();
               frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
               frame.setVisible(true);
            }
         });
   }
}

/**
 * A frame with a text area to show the contents of a file inside a ZIP archive, a combo
 * box to select different files in the archive, and a menu to load a new archive.
 */
class ZipTestFrame extends JFrame
{
   public ZipTestFrame()
   {
      setTitle("ZipTest");
      setSize(DEFAULT_WIDTH, DEFAULT_HEIGHT);

      // add the menu and the Open and Exit menu items
      JMenuBar menuBar = new JMenuBar();
      JMenu menu = new JMenu("File");

      JMenuItem openItem = new JMenuItem("Open");
      menu.add(openItem);
      openItem.addActionListener(new ActionListener()
         {
            public void actionPerformed(ActionEvent event)
            {
               JFileChooser chooser = new JFileChooser();
               chooser.setCurrentDirectory(new File("."));
               int r = chooser.showOpenDialog(ZipTestFrame.this);
               if (r == JFileChooser.APPROVE_OPTION)
               {
                  zipname = chooser.getSelectedFile().getPath();
                  fileCombo.removeAllItems();
                  scanZipFile();
               }
            }
         });

      JMenuItem exitItem = new JMenuItem("Exit");
      menu.add(exitItem);
      exitItem.addActionListener(new ActionListener()
         {
            public void actionPerformed(ActionEvent event)
            {
               System.exit(0);
            }
         });

      menuBar.add(menu);
      setJMenuBar(menuBar);

      // add the text area and combo box
      fileText = new JTextArea();
      fileCombo = new JComboBox();
      fileCombo.addActionListener(new ActionListener()
         {
            public void actionPerformed(ActionEvent event)
            {
               loadZipFile((String) fileCombo.getSelectedItem());
            }
         });

      add(fileCombo, BorderLayout.SOUTH);
      add(new JScrollPane(fileText), BorderLayout.CENTER);
   }

   /**
    * Scans the contents of the ZIP archive and populates the combo box.
    */
   public void scanZipFile()
   {
      new SwingWorker<Void, String>()
         {
            protected Void doInBackground() throws Exception
            {
               ZipInputStream zin = new ZipInputStream(new FileInputStream(zipname));
               ZipEntry entry;
               while ((entry = zin.getNextEntry()) != null)
               {
                  publish(entry.getName());
                  zin.closeEntry();
               }
               zin.close();
               return null;
            }

            protected void process(List<String> names)
            {
               for (String name : names)
                  fileCombo.addItem(name);
            }
         }.execute();
   }

   /**
    * Loads a file from the ZIP archive into the text area
    * @param name the name of the file in the archive
    */
   public void loadZipFile(final String name)
   {
      fileCombo.setEnabled(false);
      fileText.setText("");
      new SwingWorker<Void, Void>()
         {
            protected Void doInBackground() throws Exception
            {
               try
               {
                  ZipInputStream zin = new ZipInputStream(new FileInputStream(zipname));
                  ZipEntry entry;

                  // find entry with matching name in archive
                  while ((entry = zin.getNextEntry()) != null)
                  {
                     if (entry.getName().equals(name))
                     {
                        // read entry into text area
                        Scanner in = new Scanner(zin);
                        while (in.hasNextLine())
                        {
                           fileText.append(in.nextLine());
                           fileText.append("\n");
                        }
                     }
                     zin.closeEntry();
                  }
                  zin.close();
               }
               catch (IOException e)
               {
                  e.printStackTrace();
               }
               return null;
            }

            protected void done()
            {
               fileCombo.setEnabled(true);
            }
         }.execute();
   }

   public static final int DEFAULT_WIDTH = 400;
   public static final int DEFAULT_HEIGHT = 300;
   private JComboBox fileCombo;
   private JTextArea fileText;
   private String zipname;
}
Using a fixed-length record format is a good choice if you need to store data of the same type. However, objects that you create in an object-oriented program are rarely all of the same type. For example, you might have an array called staff
that is nominally an array of Employee
records but contains objects that are actually instances of a subclass such as Manager
.
It is certainly possible to come up with a data format that allows you to store such polymorphic collections, but fortunately, we don’t have to. The Java language supports a very general mechanism, called object serialization, that makes it possible to write any object to a stream and read it again later. (You will see later in this chapter where the term “serialization” comes from.)
To save object data, you first need to open an ObjectOutputStream
object:
ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("employee.dat"));
Now, to save an object, you simply use the writeObject
method of the ObjectOutputStream
class as in the following fragment:
Employee harry = new Employee("Harry Hacker", 50000, 1989, 10, 1);
Manager boss = new Manager("Carl Cracker", 80000, 1987, 12, 15);
out.writeObject(harry);
out.writeObject(boss);
To read the objects back in, first get an ObjectInputStream
object:
ObjectInputStream in = new ObjectInputStream(new FileInputStream("employee.dat"));
Then, retrieve the objects in the same order in which they were written, using the readObject
method.
Employee e1 = (Employee) in.readObject();
Employee e2 = (Employee) in.readObject();
There is, however, one change you need to make to any class that you want to save and restore in an object stream. The class must implement the Serializable
interface:
class Employee implements Serializable { . . . }
The Serializable
interface has no methods, so you don’t need to change your classes in any way. In this regard, it is similar to the Cloneable
interface that we discussed in Volume I, Chapter 6. However, to make a class cloneable, you still had to override the clone
method of the Object
class. To make a class serializable, you do not need to do anything else.
You can write and read only objects with the writeObject/readObject
methods. For primitive type values, you use methods such as writeInt/readInt
or writeDouble/readDouble
. (The object stream classes implement the DataInput/DataOutput
interfaces.)
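The following sketch mixes both kinds of calls on one object stream. The class MixedObjectStreamDemo and its nested Point class are made up for this example, and a byte array takes the place of a file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demo class.
class MixedObjectStreamDemo
{
   // A made-up serializable class; implementing the interface is all that is required.
   static class Point implements Serializable
   {
      final int x;
      final int y;

      Point(int x, int y)
      {
         this.x = x;
         this.y = y;
      }
   }

   static String roundTrip() throws IOException, ClassNotFoundException
   {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes))
      {
         out.writeInt(2);                  // primitive value, through the DataOutput methods
         out.writeObject(new Point(3, 4)); // object
         out.writeObject("label");         // strings are objects too
      }

      try (ObjectInputStream in = new ObjectInputStream(
            new ByteArrayInputStream(bytes.toByteArray())))
      {
         // read in the same order, using the matching method for each value
         int count = in.readInt();
         Point p = (Point) in.readObject();
         String s = (String) in.readObject();
         return count + " (" + p.x + "," + p.y + ") " + s;
      }
   }

   public static void main(String[] args) throws Exception
   {
      System.out.println(roundTrip()); // prints 2 (3,4) label
   }
}
```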
Behind the scenes, an ObjectOutputStream
looks at all fields of the objects and saves their contents. For example, when writing an Employee
object, the name, date, and salary fields are written to the output stream.
However, there is one important situation that we need to consider: What happens when one object is shared by several objects as part of their state?
To illustrate the problem, let us make a slight modification to the Manager
class. Let’s assume that each manager has a secretary:
class Manager extends Employee
{
   . . .
   private Employee secretary;
}
Each Manager
object now contains a reference to the Employee
object that describes the secretary. Of course, two managers can share the same secretary, as is the case in Figure 1-6 and the following code:
harry = new Employee("Harry Hacker", . . .);
Manager carl = new Manager("Carl Cracker", . . .);
carl.setSecretary(harry);
Manager tony = new Manager("Tony Tester", . . .);
tony.setSecretary(harry);
Saving such a network of objects is a challenge. Of course, we cannot save and restore the memory addresses for the secretary objects. When an object is reloaded, it will likely occupy a completely different memory address than it originally did.
Instead, each object is saved with a serial number, hence the name object serialization for this mechanism. Here is the algorithm:
Associate a serial number with each object reference that you encounter (as shown in Figure 1-7).
When encountering an object reference for the first time, save the object data to the stream.
If it has been saved previously, just write “same as previously saved object with serial number x.”
When reading back the objects, the procedure is reversed.
When an object is specified in the stream for the first time, construct it, initialize it with the stream data, and remember the association between the serial number and the object reference.
When the tag “same as previously saved object with serial number x” is encountered, retrieve the object reference for that serial number.
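The effect of this algorithm can be observed with a short experiment. The sketch below uses stripped-down stand-ins for the Employee and Manager classes (nested inside a hypothetical SharedReferenceDemo class, with only a name and a secretary field) and checks that two managers who shared a secretary before saving still share a single secretary object after reloading.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demo class with simplified Employee and Manager classes.
class SharedReferenceDemo
{
   static class Employee implements Serializable
   {
      String name;

      Employee(String name)
      {
         this.name = name;
      }
   }

   static class Manager extends Employee
   {
      Employee secretary;

      Manager(String name, Employee secretary)
      {
         super(name);
         this.secretary = secretary;
      }
   }

   static boolean secretaryStillShared() throws IOException, ClassNotFoundException
   {
      Employee harry = new Employee("Harry Hacker");
      Manager[] staff = { new Manager("Carl Cracker", harry), new Manager("Tony Tester", harry) };

      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes))
      {
         out.writeObject(staff); // harry's data is written once, then referenced
      }

      Manager[] copy;
      try (ObjectInputStream in = new ObjectInputStream(
            new ByteArrayInputStream(bytes.toByteArray())))
      {
         copy = (Manager[]) in.readObject();
      }

      // one secretary object after reloading, and it is a new object, not the original
      return copy[0].secretary == copy[1].secretary && copy[0].secretary != harry;
   }

   public static void main(String[] args) throws Exception
   {
      System.out.println(secretaryStillShared()); // prints true
   }
}
```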
In this chapter, we use serialization to save a collection of objects to a disk file and retrieve it exactly as we stored it. Another very important application is the transmittal of a collection of objects across a network connection to another computer. Just as raw memory addresses are meaningless in a file, they are also meaningless when communicating with a different processor. Because serialization replaces memory addresses with serial numbers, it permits the transport of object collections from one machine to another. We study that use of serialization when discussing remote method invocation in Chapter 5.
Listing 1-4 is a program that saves and reloads a network of Employee
and Manager
objects (some of which share the same employee as a secretary). Note that the secretary object is unique after reloading—when newStaff[1]
gets a raise, that is reflected in the secretary
fields of the managers.
Example 1-4. ObjectStreamTest.java
import java.io.*;
import java.util.*;

/**
 * @version 1.10 17 Aug 1998
 * @author Cay Horstmann
 */
class ObjectStreamTest
{
   public static void main(String[] args)
   {
      Employee harry = new Employee("Harry Hacker", 50000, 1989, 10, 1);
      Manager carl = new Manager("Carl Cracker", 80000, 1987, 12, 15);
      carl.setSecretary(harry);
      Manager tony = new Manager("Tony Tester", 40000, 1990, 3, 15);
      tony.setSecretary(harry);

      Employee[] staff = new Employee[3];

      staff[0] = carl;
      staff[1] = harry;
      staff[2] = tony;

      try
      {
         // save all employee records to the file employee.dat
         ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("employee.dat"));
         out.writeObject(staff);
         out.close();

         // retrieve all records into a new array
         ObjectInputStream in = new ObjectInputStream(new FileInputStream("employee.dat"));
         Employee[] newStaff = (Employee[]) in.readObject();
         in.close();

         // raise secretary's salary
         newStaff[1].raiseSalary(10);

         // print the newly read employee records
         for (Employee e : newStaff)
            System.out.println(e);
      }
      catch (Exception e)
      {
         e.printStackTrace();
      }
   }
}

class Employee implements Serializable
{
   public Employee()
   {
   }

   public Employee(String n, double s, int year, int month, int day)
   {
      name = n;
      salary = s;
      GregorianCalendar calendar = new GregorianCalendar(year, month - 1, day);
      hireDay = calendar.getTime();
   }

   public String getName()
   {
      return name;
   }

   public double getSalary()
   {
      return salary;
   }

   public Date getHireDay()
   {
      return hireDay;
   }

   public void raiseSalary(double byPercent)
   {
      double raise = salary * byPercent / 100;
      salary += raise;
   }

   public String toString()
   {
      return getClass().getName() + "[name=" + name + ",salary=" + salary + ",hireDay="
         + hireDay + "]";
   }

   private String name;
   private double salary;
   private Date hireDay;
}

class Manager extends Employee
{
   /**
    * Constructs a Manager without a secretary
    * @param n the employee's name
    * @param s the salary
    * @param year the hire year
    * @param month the hire month
    * @param day the hire day
    */
   public Manager(String n, double s, int year, int month, int day)
   {
      super(n, s, year, month, day);
      secretary = null;
   }

   /**
    * Assigns a secretary to the manager
    * @param s the secretary
    */
   public void setSecretary(Employee s)
   {
      secretary = s;
   }

   public String toString()
   {
      return super.toString() + "[secretary=" + secretary + "]";
   }

   private Employee secretary;
}
Object serialization saves object data in a particular file format. Of course, you can use the writeObject/readObject
methods without having to know the exact sequence of bytes that represents objects in a file. Nonetheless, we found studying the data format to be extremely helpful for gaining insight into the object streaming process. Because the details are somewhat technical, feel free to skip this section if you are not interested in the implementation.
Every file begins with the two-byte “magic number”
AC ED
followed by the version number of the object serialization format, which is currently
00 05
(We use hexadecimal numbers throughout this section to denote bytes.) Then, it contains a sequence of objects, in the order that they were saved.
String objects are saved as
| 74 | two-byte length | characters |
For example, the string “Harry” is saved as
74 00 05 Harry
The Unicode characters of the string are saved in “modified UTF-8” format.
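You can check the header and the string encoding yourself with a few lines of code. The sketch below serializes the string "Harry" into a byte array and prints the bytes in hexadecimal; the class name StreamHeaderDemo and the hexDump helper are our own inventions for this purpose.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

// Hypothetical demo class.
class StreamHeaderDemo
{
   // Serializes one object and returns the resulting bytes as hexadecimal text.
   static String hexDump(Object obj) throws IOException
   {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes))
      {
         out.writeObject(obj);
      }

      StringBuilder hex = new StringBuilder();
      for (byte b : bytes.toByteArray())
         hex.append(String.format("%02X ", b));
      return hex.toString().trim();
   }

   public static void main(String[] args) throws IOException
   {
      // magic number AC ED, version 00 05, string code 74, length 00 05, then H a r r y
      System.out.println(hexDump("Harry"));
      // prints AC ED 00 05 74 00 05 48 61 72 72 79
   }
}
```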
When an object is saved, the class of that object must be saved as well. The class description contains
The name of the class.
The serial version unique ID, which is a fingerprint of the data field types and method signatures.
A set of flags describing the serialization method.
A description of the data fields.
The fingerprint is obtained by ordering descriptions of the class, superclass, interfaces, field types, and method signatures in a canonical way, and then applying the so-called Secure Hash Algorithm (SHA) to that data.
SHA is a fast algorithm that gives a “fingerprint” to a larger block of information. This fingerprint is always a 20-byte data packet, regardless of the size of the original data. It is created by a clever sequence of bit operations on the data that makes it essentially 100 percent certain that the fingerprint will change if the information is altered in any way. (For more details on SHA, see, for example, Cryptography and Network Security: Principles and Practice, by William Stallings [Prentice Hall, 2002].) However, the serialization mechanism uses only the first 8 bytes of the SHA code as a class fingerprint. It is still very likely that the class fingerprint will change if the data fields or methods change.
When reading an object, its fingerprint is compared against the current fingerprint of the class. If they don’t match, then the class definition has changed after the object was written, and an exception is generated. Of course, in practice, classes do evolve, and it might be necessary for a program to read in older versions of objects. We discuss this later in the section entitled “Versioning.”
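If you are curious about the fingerprint of one of your own classes, the java.io.ObjectStreamClass class can report it. The following is only a sketch: FingerprintDemo and its nested Employee class are hypothetical, and the value printed depends on the exact class definition, so no particular output is shown. Note that getSerialVersionUID returns a declared serialVersionUID field if the class has one, and the computed fingerprint otherwise.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical demo class.
class FingerprintDemo
{
   // A made-up serializable class without a declared serialVersionUID.
   static class Employee implements Serializable
   {
      private String name;
      private double salary;
   }

   // Returns the 8-byte fingerprint as 16 hex digits,
   // or null if the class cannot be serialized at all.
   static String fingerprint(Class<?> cl)
   {
      ObjectStreamClass desc = ObjectStreamClass.lookup(cl);
      if (desc == null) return null;
      return String.format("%016X", desc.getSerialVersionUID());
   }

   public static void main(String[] args)
   {
      System.out.println(fingerprint(Employee.class));
      System.out.println(fingerprint(Object.class)); // prints null: Object is not serializable
   }
}
```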
Here is how a class identifier is stored:
72
2-byte length of class name
class name
8-byte fingerprint
1-byte flag
2-byte count of data field descriptors
data field descriptors
78 (end marker)
superclass type (70 if none)
The flag byte is composed of three bit masks, defined in java.io.ObjectStreamConstants
:
static final byte SC_WRITE_METHOD = 1;
   // class has writeObject method that writes additional data
static final byte SC_SERIALIZABLE = 2;
   // class implements Serializable interface
static final byte SC_EXTERNALIZABLE = 4;
   // class implements Externalizable interface
We discuss the Externalizable
interface later in this chapter. Externalizable classes supply custom read and write methods that take over the output of their instance fields. The classes that we write implement the Serializable
interface and will have a flag value of 02
. The serializable java.util.Date
class defines its own readObject
/writeObject
methods and has a flag of 03
.
Each data field descriptor has the format:
1-byte type code
2-byte length of field name
field name
class name (if field is an object)
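The type code is one of the following:
| B | byte |
| C | char |
| D | double |
| F | float |
| I | int |
| J | long |
| L | object |
| S | short |
| Z | boolean |
| [ | array |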
When the type code is L
, the field name is followed by the field type. Class and field name strings do not start with the string code 74
, but field types do. Field types use a slightly different encoding of their names, namely, the format used by native methods.
For example, the salary field of the Employee
class is encoded as:
D 00 06 salary
Here is the complete class descriptor of the Employee
class:
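| 72 00 08 Employee | New class, string length, class name |
| 8-byte fingerprint, then 02 | Fingerprint and flags |
| 00 03 | Number of instance fields |
| D 00 06 salary | Instance field type and name |
| L 00 07 hireDay | Instance field type and name |
| 74 00 10 Ljava/util/Date; | Instance field class name: Date |
| L 00 04 name | Instance field type and name |
| 74 00 12 Ljava/lang/String; | Instance field class name: String |
| 78 | End marker |
| 70 | No superclass |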
These descriptors are fairly long. If the same class descriptor is needed again in the file, an abbreviated form is used:
| 4-byte serial number |
The serial number refers to the previous explicit class descriptor. We discuss the numbering scheme later.
An object is stored as
| class descriptor | object data |
For example, here is how an Employee
object is stored:
| Existing class |
| External storage (details later) |
As you can see, the data file contains enough information to restore the Employee
object.
Arrays are saved in the following format:
| class descriptor | 4-byte number of entries | entries |
The array class name in the class descriptor is in the same format as that used by native methods (which is slightly different from the format used for class names in other class descriptors). In this format, class names start with an L
and end with a semicolon.
For example, an array of three Employee
objects starts out like this:
| 75 | Array |
| 72 00 0B [LEmployee; | New class, string length, class name |
| 8-byte fingerprint, then 02 | Fingerprint and flags |
| 00 00 | Number of instance fields |
| 78 | End marker |
| 70 | No superclass |
| 00 00 00 03 | Number of array entries |
Note that the fingerprint for an array of Employee
objects is different from a fingerprint of the Employee
class itself.
All objects (including arrays and strings) and all class descriptors are given serial numbers as they are saved in the output file. The numbers start at 00 7E 00 00
.
We already saw that a full class descriptor for any given class occurs only once. Subsequent descriptors refer to it. For example, in our previous example, a repeated reference to the Date
class was coded as
71 00 7E 00 08
The same mechanism is used for objects. If a reference to a previously saved object is written, it is saved in exactly the same way; that is, 71
followed by the serial number. It is always clear from the context whether the particular serial reference denotes a class descriptor or an object.
Finally, a null reference is stored as
70
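Both encodings are easy to observe. The sketch below writes the same string object twice, followed by a null reference, and prints the resulting bytes in hexadecimal. The class name BackReferenceDemo and its hexDump helper are assumptions made for this illustration. Because the string is the first object in the stream, it receives the first serial number, 00 7E 00 00.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

// Hypothetical demo class.
class BackReferenceDemo
{
   // Writes the given objects to one object stream and returns the bytes as hexadecimal text.
   static String hexDump(Object... objects) throws IOException
   {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes))
      {
         for (Object obj : objects)
            out.writeObject(obj);
      }

      StringBuilder hex = new StringBuilder();
      for (byte b : bytes.toByteArray())
         hex.append(String.format("%02X ", b));
      return hex.toString().trim();
   }

   public static void main(String[] args) throws IOException
   {
      String s = "Harry";
      System.out.println(hexDump(s, s, null));
      // header:            AC ED 00 05
      // first occurrence:  74 00 05 48 61 72 72 79
      // second occurrence: 71 00 7E 00 00
      // null reference:    70
   }
}
```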
Here is the commented output of the ObjectStreamTest
program of the preceding section. If you like, run the program, look at a hex dump of its data file employee.dat
, and compare it with the commented listing. The important lines toward the end of the output show the reference to a previously saved object.
| File header |
| Array |
| New class, string length, class name |
| Fingerprint and flags |
| Number of instance fields |
| End marker |
| No superclass |
| Number of array entries |
| New class, string length, class name (serial #2) |
| Fingerprint and flags |
| Number of data fields |
| Instance field type and name |
| Instance field class name: Employee |
| End marker |
| Superclass: new class, string length, class name (serial #4) |
| Fingerprint and flags |
| Number of instance fields |
| Instance field type and name |
| Instance field type and name |
| Instance field class name: Date |
| Instance field type and name |
| Instance field class name: String |
| End marker |
| No superclass |
| New class, string length, class name (serial #8) |
| Fingerprint and flags |
| No instance variables |
| End marker |
| No superclass |
| External storage, number of bytes |
| Date |
| End marker |
| Existing class (use serial #4) |
| Existing class (use serial #8) |
| External storage, number of bytes |
| Date |
| End marker |
| Existing class (use serial #2) |
| Existing class (use serial #8) |
| External storage, number of bytes |
| Date |
| End marker |
Of course, studying these codes can be about as exciting as reading the average phone book. It is not important to know the exact file format (unless you are trying to create an evil effect by modifying the data), but it is still instructive to know that the object stream contains a detailed description of all the objects that it contains, with sufficient detail to allow reconstruction of both objects and arrays of objects.
What you should remember is this:
The object stream output contains the types and data fields of all objects.
Each object is assigned a serial number.
Repeated occurrences of the same object are stored as references to that serial number.
Certain data fields should never be serialized, for example, integer values that store file handles or handles of windows that are only meaningful to native methods. Such information is guaranteed to be useless when you reload an object at a later time or transport it to a different machine. In fact, improper values for such fields can actually cause native methods to crash. Java has an easy mechanism to prevent such fields from ever being serialized. Mark them with the keyword transient
. You also need to tag fields as transient
if they belong to nonserializable classes. Transient fields are always skipped when objects are serialized.
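As a minimal sketch of this rule, consider a hypothetical WindowInfo class (invented for this illustration) with a transient field that stands in for a native handle. After a round trip through an object stream, the transient field holds the default value for its type rather than the saved value.

```java
import java.io.*;

class TransientDemo
{
   // Writes the object to an in-memory object stream and reads it back.
   static WindowInfo roundTrip(WindowInfo w)
   {
      try
      {
         ByteArrayOutputStream bout = new ByteArrayOutputStream();
         ObjectOutputStream out = new ObjectOutputStream(bout);
         out.writeObject(w);
         out.close();

         ObjectInputStream in = new ObjectInputStream(
            new ByteArrayInputStream(bout.toByteArray()));
         WindowInfo copy = (WindowInfo) in.readObject();
         in.close();
         return copy;
      }
      catch (IOException | ClassNotFoundException e)
      {
         throw new IllegalStateException(e);
      }
   }

   public static void main(String[] args)
   {
      WindowInfo copy = roundTrip(new WindowInfo("Editor", 4711));
      System.out.println(copy.title);        // Editor
      System.out.println(copy.nativeHandle); // 0, because the field was skipped
   }
}

// Hypothetical class, for illustration only.
class WindowInfo implements Serializable
{
   String title;
   transient int nativeHandle; // meaningful only inside the process that created it

   WindowInfo(String title, int nativeHandle)
   {
      this.title = title;
      this.nativeHandle = nativeHandle;
   }
}
```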
The serialization mechanism provides a way for individual classes to add validation or any other desired action to the default read and write behavior. A serializable class can define methods with the signature
private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException;
private void writeObject(ObjectOutputStream out) throws IOException;
Then, the data fields are no longer automatically serialized, and these methods are called instead.
Here is a typical example. A number of classes in the java.awt.geom
package, such as Point2D.Double
, are not serializable. Now suppose you want to serialize a class LabeledPoint
that stores a String
and a Point2D.Double
. First, you need to mark the Point2D.Double
field as transient
to avoid a NotSerializableException
.
public class LabeledPoint implements Serializable
{
   . . .
   private String label;
   private transient Point2D.Double point;
}
In the writeObject method, we first write the object descriptor and the String field, label, by calling the defaultWriteObject method. This is a special method of the ObjectOutputStream class that can only be called from within a writeObject method of a serializable class. Then we write the point coordinates, using the standard DataOutput calls.
private void writeObject(ObjectOutputStream out) throws IOException
{
   out.defaultWriteObject();
   out.writeDouble(point.getX());
   out.writeDouble(point.getY());
}
In the readObject
method, we reverse the process:
private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException
{
   in.defaultReadObject();
   double x = in.readDouble();
   double y = in.readDouble();
   point = new Point2D.Double(x, y);
}
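Putting the pieces together, the following is one possible complete version of the LabeledPoint class with a small test driver. The constructor, the two accessor methods, and the LabeledPointDemo class are assumptions added for this sketch; only the fields and the two serialization methods come from the discussion above.

```java
import java.awt.geom.Point2D;
import java.io.*;

class LabeledPointDemo
{
   // Writes the point to an in-memory object stream and reads it back.
   static LabeledPoint roundTrip(LabeledPoint p)
   {
      try
      {
         ByteArrayOutputStream bout = new ByteArrayOutputStream();
         ObjectOutputStream out = new ObjectOutputStream(bout);
         out.writeObject(p);
         out.close();

         ObjectInputStream in = new ObjectInputStream(
            new ByteArrayInputStream(bout.toByteArray()));
         LabeledPoint copy = (LabeledPoint) in.readObject();
         in.close();
         return copy;
      }
      catch (IOException | ClassNotFoundException e)
      {
         throw new IllegalStateException(e);
      }
   }

   public static void main(String[] args)
   {
      LabeledPoint copy = roundTrip(new LabeledPoint("Home", 3, 4));
      System.out.println(copy.getLabel() + " "
         + copy.getPoint().getX() + "," + copy.getPoint().getY()); // Home 3.0,4.0
   }
}

class LabeledPoint implements Serializable
{
   private String label;
   private transient Point2D.Double point; // skipped by default serialization

   LabeledPoint(String label, double x, double y)
   {
      this.label = label;
      this.point = new Point2D.Double(x, y);
   }

   String getLabel() { return label; }
   Point2D.Double getPoint() { return point; }

   private void writeObject(ObjectOutputStream out) throws IOException
   {
      out.defaultWriteObject();      // writes the nontransient field, label
      out.writeDouble(point.getX()); // then the coordinates, by hand
      out.writeDouble(point.getY());
   }

   private void readObject(ObjectInputStream in)
      throws IOException, ClassNotFoundException
   {
      in.defaultReadObject();        // reads label
      double x = in.readDouble();    // same order as in writeObject
      double y = in.readDouble();
      point = new Point2D.Double(x, y);
   }
}
```

The reads must mirror the writes exactly; swapping the two readDouble calls would silently exchange the coordinates.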
Another example is the java.util.Date
class that supplies its own readObject
and writeObject
methods. These methods write the date as a number of milliseconds from the epoch (January 1, 1970, midnight UTC). The Date
class has a complex internal representation that stores both a Calendar
object and a millisecond count to optimize lookups. The state of the Calendar
is redundant and does not have to be saved.
The readObject
and writeObject
methods only need to save and load their data fields. They should not concern themselves with superclass data or any other class information.
Rather than letting the serialization mechanism save and restore object data, a class can define its own mechanism. To do this, a class must implement the Externalizable
interface. This in turn requires it to define two methods:
public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException;
public void writeExternal(ObjectOutput out) throws IOException;
Unlike the readObject and writeObject methods that were described in the preceding section, these methods are fully responsible for saving and restoring the entire object, including the superclass data. The serialization mechanism merely records the class of the object in the stream. When reading an externalizable object, the object stream creates an object with the public no-argument constructor and then calls the readExternal method. Here is how you can implement these methods for the Employee class:
public void readExternal(ObjectInput s) throws IOException
{
   name = s.readUTF();
   salary = s.readDouble();
   hireDay = new Date(s.readLong());
}

public void writeExternal(ObjectOutput s) throws IOException
{
   s.writeUTF(name);
   s.writeDouble(salary);
   s.writeLong(hireDay.getTime());
}
Serialization is somewhat slow because the virtual machine must discover the structure of each object. If you are concerned about performance and if you read and write a large number of objects of a particular class, you should investigate the use of the Externalizable
interface. The tech tip http://java.sun.com/developer/TechTips/2000/tt0425.html demonstrates that in the case of an employee class, using external reading and writing was about 35 to 40 percent faster than the default serialization.
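For reference, here is a self-contained sketch of an externalizable employee class. The class name ExternalEmployee, its accessors, and the driver class are assumptions made for this illustration. Note the public no-argument constructor, which the object stream calls before readExternal.

```java
import java.io.*;
import java.util.Date;

class ExternalizableDemo
{
   // Writes the employee to an in-memory object stream and reads it back.
   static ExternalEmployee roundTrip(ExternalEmployee e)
   {
      try
      {
         ByteArrayOutputStream bout = new ByteArrayOutputStream();
         ObjectOutputStream out = new ObjectOutputStream(bout);
         out.writeObject(e);
         out.close();

         ObjectInputStream in = new ObjectInputStream(
            new ByteArrayInputStream(bout.toByteArray()));
         ExternalEmployee copy = (ExternalEmployee) in.readObject();
         in.close();
         return copy;
      }
      catch (IOException | ClassNotFoundException ex)
      {
         throw new IllegalStateException(ex);
      }
   }

   public static void main(String[] args)
   {
      ExternalEmployee harry = new ExternalEmployee("Harry Hacker", 35000, new Date(0L));
      ExternalEmployee copy = roundTrip(harry);
      System.out.println(copy.getName() + " " + copy.getSalary()); // Harry Hacker 35000.0
   }
}

// Hypothetical class name; the fields follow the Employee class of the text.
class ExternalEmployee implements Externalizable
{
   private String name;
   private double salary;
   private Date hireDay;

   public ExternalEmployee() {} // required: called by the object stream when reading

   ExternalEmployee(String name, double salary, Date hireDay)
   {
      this.name = name;
      this.salary = salary;
      this.hireDay = hireDay;
   }

   String getName() { return name; }
   double getSalary() { return salary; }
   Date getHireDay() { return hireDay; }

   public void writeExternal(ObjectOutput s) throws IOException
   {
      s.writeUTF(name);
      s.writeDouble(salary);
      s.writeLong(hireDay.getTime());
   }

   public void readExternal(ObjectInput s) throws IOException
   {
      name = s.readUTF();
      salary = s.readDouble();
      hireDay = new Date(s.readLong());
   }
}
```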
You have to pay particular attention when serializing and deserializing objects that are assumed to be unique. This commonly happens when you are implementing singletons and typesafe enumerations.
If you use the enum
construct of Java SE 5.0, then you need not worry about serialization—it just works. However, suppose you maintain legacy code that contains an enumerated type such as
public class Orientation
{
   public static final Orientation HORIZONTAL = new Orientation(1);
   public static final Orientation VERTICAL = new Orientation(2);
   private Orientation(int v) { value = v; }
   private int value;
}
This idiom was common before enumerations were added to the Java language. Note that the constructor is private. Thus, no objects can be created beyond Orientation.HORIZONTAL
and Orientation.VERTICAL
. In particular, you can use the ==
operator to test for object equality:
if (orientation == Orientation.HORIZONTAL) . . .
There is an important twist that you need to remember when a typesafe enumeration implements the Serializable
interface. The default serialization mechanism is not appropriate. Suppose we write a value of type Orientation
and read it in again:
Orientation original = Orientation.HORIZONTAL;
ObjectOutputStream out = . . .;
out.writeObject(original);
out.close();
ObjectInputStream in = . . .;
Orientation saved = (Orientation) in.readObject();
Now the test
if (saved == Orientation.HORIZONTAL) . . .
will fail. In fact, the saved
value is a completely new object of the Orientation
type and not equal to any of the predefined constants. Even though the constructor is private, the serialization mechanism can create new objects!
To solve this problem, you need to define another special serialization method, called readResolve
. If the readResolve
method is defined, it is called after the object is deserialized. It must return an object that then becomes the return value of the readObject
method. In our case, the readResolve
method will inspect the value
field and return the appropriate enumerated constant:
protected Object readResolve() throws ObjectStreamException
{
   if (value == 1) return Orientation.HORIZONTAL;
   if (value == 2) return Orientation.VERTICAL;
   return null; // this shouldn't happen
}
Remember to add a readResolve
method to all typesafe enumerations in your legacy code and to all classes that follow the singleton design pattern.
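The effect of readResolve can be verified with a round trip. The sketch below is a serializable variant of the Orientation class together with a hypothetical driver class, OrientationDemo. As a design choice that differs from the snippet above, this version throws an InvalidObjectException for an unknown value instead of returning null.

```java
import java.io.*;

class OrientationDemo
{
   // Writes the constant to an in-memory object stream and reads it back.
   static Orientation roundTrip(Orientation original)
   {
      try
      {
         ByteArrayOutputStream bout = new ByteArrayOutputStream();
         ObjectOutputStream out = new ObjectOutputStream(bout);
         out.writeObject(original);
         out.close();

         ObjectInputStream in = new ObjectInputStream(
            new ByteArrayInputStream(bout.toByteArray()));
         Orientation saved = (Orientation) in.readObject();
         in.close();
         return saved;
      }
      catch (IOException | ClassNotFoundException e)
      {
         throw new IllegalStateException(e);
      }
   }

   public static void main(String[] args)
   {
      Orientation saved = roundTrip(Orientation.HORIZONTAL);
      System.out.println(saved == Orientation.HORIZONTAL); // true, thanks to readResolve
   }
}

class Orientation implements Serializable
{
   public static final Orientation HORIZONTAL = new Orientation(1);
   public static final Orientation VERTICAL = new Orientation(2);

   private final int value;

   private Orientation(int v) { value = v; }

   // Called after deserialization; the returned object replaces the new one.
   protected Object readResolve() throws ObjectStreamException
   {
      if (value == 1) return HORIZONTAL;
      if (value == 2) return VERTICAL;
      throw new InvalidObjectException("Unknown orientation value " + value);
   }
}
```

Removing the readResolve method from this sketch makes the comparison in main print false.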
If you use serialization to save objects, you will need to consider what happens when your program evolves. Can version 1.1 read the old files? Can the users who still use 1.0 read the files that the new version is now producing? Clearly, it would be desirable if object files could cope with the evolution of classes.
At first glance it seems that this would not be possible. When a class definition changes in any way, then its SHA fingerprint also changes, and you know that object streams will refuse to read in objects with different fingerprints. However, a class can indicate that it is compatible with an earlier version of itself. To do this, you must first obtain the fingerprint of the earlier version of the class. You use the stand-alone serialver
program that is part of the JDK to obtain this number. For example, running
serialver Employee
prints
Employee: static final long serialVersionUID = -1814239825517340645L;
If you start the serialver
program with the -show
option, then the program brings up a graphical dialog box (see Figure 1-8).
All later versions of the class must define the serialVersionUID
constant to the same fingerprint as the original.
class Employee implements Serializable // version 1.1
{
   . . .
   public static final long serialVersionUID = -1814239825517340645L;
}
When a class has a static data member named serialVersionUID, the serialization mechanism does not compute the fingerprint but uses that value instead.
Once that static data member has been placed inside a class, the serialization system is now willing to read in different versions of objects of that class.
If only the methods of the class change, there is no problem with reading the new object data. However, if data fields change, then you may have problems. For example, the old file object may have more or fewer data fields than the one in the program, or the types of the data fields may be different. In that case, the object stream makes an effort to convert the stream object to the current version of the class.
The object stream compares the data fields of the current version of the class with the data fields of the version in the stream. Of course, the object stream considers only the nontransient and nonstatic data fields. If two fields have matching names but different types, then the object stream makes no effort to convert one type to the other—the objects are incompatible. If the object in the stream has data fields that are not present in the current version, then the object stream ignores the additional data. If the current version has data fields that are not present in the streamed object, the added fields are set to their default (null
for objects, zero for numbers, and false
for boolean
values).
Here is an example. Suppose we have saved a number of employee records on disk, using the original version (1.0) of the class. Now we change the Employee
class to version 2.0 by adding a data field called department
. Figure 1-9 shows what happens when a 1.0 object is read into a program that uses 2.0 objects. The department field is set to null
. Figure 1-10 shows the opposite scenario: A program using 1.0 objects reads a 2.0 object. The additional department
field is ignored.
Is this process safe? It depends. Dropping a data field seems harmless—the recipient still has all the data that it knew how to manipulate. Setting a data field to null
might not be so safe. Many classes work hard to initialize all data fields in all constructors to non-null
values, so that the methods don’t have to be prepared to handle null
data. It is up to the class designer to implement additional code in the readObject
method to fix version incompatibilities or to make sure the methods are robust enough to handle null
data.
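One way to guard against a null field is a readObject method that supplies a fallback value. The sketch below uses a hypothetical VersionedEmployee class with a made-up serialVersionUID of 1L and an arbitrary fallback string. Because a single program cannot hold two versions of the same class, the driver stands in for a version 1.0 stream by serializing an object whose department is null, which is the state that the object stream would produce for a missing field.

```java
import java.io.*;

class VersionFixDemo
{
   // Writes the employee to an in-memory object stream and reads it back.
   static VersionedEmployee roundTrip(VersionedEmployee e)
   {
      try
      {
         ByteArrayOutputStream bout = new ByteArrayOutputStream();
         ObjectOutputStream out = new ObjectOutputStream(bout);
         out.writeObject(e);
         out.close();

         ObjectInputStream in = new ObjectInputStream(
            new ByteArrayInputStream(bout.toByteArray()));
         VersionedEmployee copy = (VersionedEmployee) in.readObject();
         in.close();
         return copy;
      }
      catch (IOException | ClassNotFoundException ex)
      {
         throw new IllegalStateException(ex);
      }
   }

   public static void main(String[] args)
   {
      // A null department simulates a record saved by the older class version.
      VersionedEmployee copy = roundTrip(new VersionedEmployee("Harry Hacker", null));
      System.out.println(copy.getDepartment()); // Unassigned
   }
}

// Hypothetical class, for illustration only.
class VersionedEmployee implements Serializable
{
   // Made-up value; a real class would use the number printed by serialver.
   private static final long serialVersionUID = 1L;

   private String name;
   private String department; // the field added in the newer version

   VersionedEmployee(String name, String department)
   {
      this.name = name;
      this.department = department;
   }

   String getName() { return name; }
   String getDepartment() { return department; }

   private void readObject(ObjectInputStream in)
      throws IOException, ClassNotFoundException
   {
      in.defaultReadObject();
      // Repair the state that an older stream leaves behind.
      if (department == null) department = "Unassigned";
   }
}
```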
There is an amusing use for the serialization mechanism: It gives you an easy way to clone an object provided the class is serializable. Simply serialize it to an output stream and then read it back in. The result is a new object that is a deep copy of the existing object. You don’t have to write the object to a file—you can use a ByteArrayOutputStream
to save the data into a byte array.
As Listing 1-5 shows, to get clone
for free, simply extend the SerialCloneable
class, and you are done.
You should be aware that this method, although clever, will usually be much slower than a clone method that explicitly constructs a new object and copies or clones the data fields.
Listing 1-5. SerialCloneTest.java
import java.io.*;
import java.util.*;

public class SerialCloneTest
{
   public static void main(String[] args)
   {
      Employee harry = new Employee("Harry Hacker", 35000, 1989, 10, 1);
      // clone harry
      Employee harry2 = (Employee) harry.clone();

      // mutate harry
      harry.raiseSalary(10);

      // now harry and the clone are different
      System.out.println(harry);
      System.out.println(harry2);
   }
}

/**
   A class whose clone method uses serialization.
*/
class SerialCloneable implements Cloneable, Serializable
{
   public Object clone()
   {
      try
      {
         // save the object to a byte array
         ByteArrayOutputStream bout = new ByteArrayOutputStream();
         ObjectOutputStream out = new ObjectOutputStream(bout);
         out.writeObject(this);
         out.close();

         // read a clone of the object from the byte array
         ByteArrayInputStream bin = new ByteArrayInputStream(bout.toByteArray());
         ObjectInputStream in = new ObjectInputStream(bin);
         Object ret = in.readObject();
         in.close();

         return ret;
      }
      catch (Exception e)
      {
         return null;
      }
   }
}

/**
   The familiar Employee class, redefined to extend the
   SerialCloneable class.
*/
class Employee extends SerialCloneable
{
   public Employee(String n, double s, int year, int month, int day)
   {
      name = n;
      salary = s;
      GregorianCalendar calendar = new GregorianCalendar(year, month - 1, day);
      hireDay = calendar.getTime();
   }

   public String getName()
   {
      return name;
   }

   public double getSalary()
   {
      return salary;
   }

   public Date getHireDay()
   {
      return hireDay;
   }

   public void raiseSalary(double byPercent)
   {
      double raise = salary * byPercent / 100;
      salary += raise;
   }

   public String toString()
   {
      return getClass().getName()
         + "[name=" + name
         + ",salary=" + salary
         + ",hireDay=" + hireDay
         + "]";
   }

   private String name;
   private double salary;
   private Date hireDay;
}
You have learned how to read and write data from a file. However, there is more to file management than reading and writing. The File
class encapsulates the functionality that you will need to work with the file system on the user’s machine. For example, you use the File
class to find out when a file was last modified or to remove or rename the file. In other words, the stream classes are concerned with the contents of the file, whereas the File
class is concerned with the storage of the file on a disk.
As is so often the case in Java, the File
class takes the least common denominator approach. For example, under Windows, you can find out (or set) the read-only flag for a file, but while you can find out if it is a hidden file, you can’t hide it without using a native method.
The simplest constructor for a File
object takes a (full) file name. If you don’t supply a path name, then Java uses the current directory. For example,
File f = new File("test.txt");
gives you a file object with this name in the current directory. (The “current directory” is the current directory of the process that executes the virtual machine. If you launched the virtual machine from the command line, it is the directory from which you started the java
executable.)
Because the backslash character is the escape character in Java strings, be sure to use \\ for Windows-style path names ("C:\\Windows\\win.ini"). In Windows, you can also use a single forward slash ("C:/Windows/win.ini") because most Windows file handling system calls will interpret forward slashes as file separators. However, this is not recommended: the behavior of the Windows system functions is subject to change, and on other operating systems, the file separator might be different. Instead, for portable programs, you should use the file separator character for the platform on which your program runs. It is stored in the constant string File.separator.
A call to this constructor does not create a file with this name if it doesn’t exist. Actually, creating a file from a File
object is done with one of the stream class constructors or the createNewFile
method in the File
class. The createNewFile
method only creates a file if no file with that name exists, and it returns a boolean
to tell you whether it was successful.
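The following sketch illustrates the difference between constructing a File object and creating a file. The scratch file name and the helper method create are assumptions made for this example; the helper merely wraps createNewFile so that the checked exception does not clutter the demonstration.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

class CreateFileDemo
{
   // Calls createNewFile and converts the checked exception for brevity.
   static boolean create(File f)
   {
      try
      {
         return f.createNewFile();
      }
      catch (IOException e)
      {
         throw new UncheckedIOException(e);
      }
   }

   public static void main(String[] args)
   {
      // A hypothetical scratch file in the system's temporary directory.
      File f = new File(System.getProperty("java.io.tmpdir"),
         "createfiledemo-" + System.nanoTime() + ".tmp");

      System.out.println(f.exists()); // constructing a File object creates nothing
      System.out.println(create(f));  // true if the file was created by this call
      System.out.println(create(f));  // false if a file with that name already exists
      System.out.println(f.exists());
      f.delete();                     // clean up
   }
}
```

On a machine with a writable temporary directory, we would expect the four lines of output to be false, true, false, and true.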
On the other hand, once you have a File
object, the exists
method in the File
class tells you whether a file exists with that name. For example, the following trial program would almost certainly print “false” on anyone’s machine, and yet it can print out a path name to this nonexistent file.
import java.io.*;

public class Test
{
   public static void main(String args[])
   {
      File f = new File("afilethatprobablydoesntexist");
      System.out.println(f.getAbsolutePath());
      System.out.println(f.exists());
   }
}
There are two other constructors for File
objects:
File(String path, String name)
which creates a File
object with the given name in the directory specified by the path
parameter. (If the path
parameter is null
, this constructor creates a File
object, using the current directory.)
Finally, you can use an existing File
object in the constructor:
File(File dir, String name)
where the File
object represents a directory and, as before, if dir
is null
, the constructor creates a File
object in the current directory.
Somewhat confusingly, a File
object can represent either a file or a directory (perhaps because the operating system that the Java designers were most familiar with happens to implement directories as files). You use the isDirectory
and isFile
methods to tell whether the file object represents a file or a directory. This is surprising—in an object-oriented system, you might have expected a separate Directory
class, perhaps extending the File
class.
To make an object representing a directory, you simply supply the directory name in the File
constructor:
File tempDir = new File(File.separator + "temp");
If this directory does not yet exist, you can create it with the mkdir
method:
tempDir.mkdir();
If a file object represents a directory, use list()
to get an array of the file names in that directory. The program in Listing 1-6 uses all these methods to print out the directory substructure of whatever path is entered on the command line. (It would be easy enough to change this program into a utility class that returns a list of the subdirectories for further processing.)
Always use File
objects, not strings, when manipulating file or directory names. For example, the equals
method of the File
class knows that some file systems are not case significant and that a trailing /
in a directory name doesn’t matter.
Listing 1-6. FindDirectories.java
import java.io.*;

/**
 * @version 1.00 05 Sep 1997
 * @author Gary Cornell
 */
public class FindDirectories
{
   public static void main(String[] args)
   {
      // if no arguments provided, start at the parent directory
      if (args.length == 0) args = new String[] { ".." };

      try
      {
         File pathName = new File(args[0]);
         String[] fileNames = pathName.list();

         // list returns null if the path is not a readable directory
         if (fileNames == null) return;

         // enumerate all files in the directory
         for (int i = 0; i < fileNames.length; i++)
         {
            File f = new File(pathName.getPath(), fileNames[i]);

            // if the file is again a directory, call the main method recursively
            if (f.isDirectory())
            {
               System.out.println(f.getCanonicalPath());
               main(new String[] { f.getPath() });
            }
         }
      }
      catch (IOException e)
      {
         e.printStackTrace();
      }
   }
}
Rather than listing all files in a directory, you can use a FilenameFilter object as a parameter to the list method to narrow down the list. These objects are simply instances of a class that implements the FilenameFilter interface.
All a class needs to do to implement the FilenameFilter
interface is define a method called accept
. Here is an example of a simple FilenameFilter
class that allows only files with a specified extension:
public class ExtensionFilter implements FilenameFilter
{
   public ExtensionFilter(String ext)
   {
      extension = "." + ext;
   }

   public boolean accept(File dir, String name)
   {
      return name.endsWith(extension);
   }

   private String extension;
}
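To use such a filter, pass it to the list method. The sketch below repeats the filter class so that it is self-contained and adds a hypothetical helper, listWithExtension, that sorts the result and treats a path that is not a readable directory as empty.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.util.Arrays;

class FilterDemo
{
   // Returns the sorted names in dir that end with the given extension,
   // or an empty array if dir is not a readable directory.
   static String[] listWithExtension(File dir, String ext)
   {
      String[] names = dir.list(new ExtensionFilter(ext));
      if (names == null) return new String[0];
      Arrays.sort(names);
      return names;
   }

   public static void main(String[] args)
   {
      // list the .txt files in the given directory, or in the current one
      File dir = new File(args.length > 0 ? args[0] : ".");
      for (String name : listWithExtension(dir, "txt"))
         System.out.println(name);
   }
}

class ExtensionFilter implements FilenameFilter
{
   private String extension;

   ExtensionFilter(String ext)
   {
      extension = "." + ext;
   }

   public boolean accept(File dir, String name)
   {
      return name.endsWith(extension);
   }
}
```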
When writing portable programs, it is a challenge to specify file names with subdirectories. As we mentioned earlier, it turns out that you can use a forward slash (the UNIX separator) as the directory separator in Windows as well, but other operating systems might not permit this, so we don’t recommend using a forward slash.
If you do use forward slashes as directory separators in Windows when constructing a File
object, the getAbsolutePath
method returns a file name that contains forward slashes, which will look strange to Windows users. Instead, use the getCanonicalPath
method—it replaces the forward slashes with backslashes.
It is much better to use the information about the current directory separator that the File class stores in a static field called separator. In a Windows environment, this is a backslash (\); in a UNIX environment, it is a forward slash (/). For example:
File foo = new File("Documents" + File.separator + "data.txt");
Of course, if you use the second alternate version of the File
constructor
File foo = new File("Documents", "data.txt");
then the constructor will supply the correct separator.
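A quick check, written as a small sketch with a hypothetical class name, confirms that both forms denote the same path on the platform where the program runs:

```java
import java.io.File;

class SeparatorDemo
{
   // Compares the path built with File.separator to the two-argument constructor.
   static boolean samePath()
   {
      File a = new File("Documents" + File.separator + "data.txt");
      File b = new File("Documents", "data.txt");
      return a.equals(b);
   }

   public static void main(String[] args)
   {
      File foo = new File("Documents", "data.txt");
      System.out.println(foo.getPath()); // uses the separator of the current platform
      System.out.println(samePath());    // true
   }
}
```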
The API notes that follow give you what we think are the most important remaining methods of the File
class; their use should be straightforward.
Java SE 1.4 introduced a number of features for improved input/output processing, collectively called the “new I/O,” in the java.nio
package. (Of course, the “new” moniker is somewhat regrettable because, a few years down the road, the package wasn’t new any longer.)
The package includes support for the following features:
Character set encoders and decoders
Nonblocking I/O
Memory-mapped files
File locking
We already covered character encoding and decoding in the section “Character Sets” on page 19. Nonblocking I/O is discussed in Chapter 3 because it is particularly important when communicating across a network. In the following sections, we examine memory-mapped files and file locking in detail.
Most operating systems can take advantage of the virtual memory implementation to “map” a file, or a region of a file, into memory. Then the file can be accessed as if it were an in-memory array, which is much faster than the traditional file operations.
At the end of this section, you can find a program that computes the CRC32 checksum of a file, using traditional file input and a memory-mapped file. On one machine, we got the timing data shown in Table 1-6 when computing the checksum of the 37-Mbyte file rt.jar
in the jre/lib
directory of the JDK.
As you can see, on this particular machine, memory mapping is a bit faster than using buffered sequential input and dramatically faster than using a RandomAccessFile
.
Of course, the exact values will differ greatly from one machine to another, but it is obvious that the performance gain can be substantial if you need to use random access. For sequential reading of files of moderate size, on the other hand, there is no reason to use memory mapping.
The java.nio
package makes memory mapping quite simple. Here is what you do.
First, get a channel from the file. A channel is an abstraction for disk files that lets you access operating system features such as memory mapping, file locking, and fast data transfers between files. You get a channel by calling the getChannel method that has been added to the FileInputStream, FileOutputStream, and RandomAccessFile classes.
FileInputStream in = new FileInputStream(. . .);
FileChannel channel = in.getChannel();
Then you get a MappedByteBuffer
from the channel by calling the map
method of the FileChannel
class. You specify the area of the file that you want to map and a mapping mode. Three modes are supported:
FileChannel.MapMode.READ_ONLY:
The resulting buffer is read-only. Any attempt to write to the buffer results in a ReadOnlyBufferException
.
FileChannel.MapMode.READ_WRITE:
The resulting buffer is writable, and the changes will be written back to the file at some time. Note that other programs that have mapped the same file might not see those changes immediately. The exact behavior of simultaneous file mapping by multiple programs is operating-system dependent.
FileChannel.MapMode.PRIVATE:
The resulting buffer is writable, but any changes are private to this buffer and are not propagated to the file.
Once you have the buffer, you can read and write data, using the methods of the ByteBuffer
class and the Buffer
superclass.
Buffers support both sequential and random data access. A buffer has a position that is advanced by get
and put
operations. For example, you can sequentially traverse all bytes in the buffer as
while (buffer.hasRemaining())
{
   byte b = buffer.get();
   . . .
}
Alternatively, you can use random access:
for (int i = 0; i < buffer.limit(); i++)
{
   byte b = buffer.get(i);
   . . .
}
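The steps so far (get a channel, map the file, access the buffer) can be combined into a short sketch. It maps a scratch file in READ_WRITE mode, converts lowercase ASCII letters to uppercase through the buffer, and then reads the file back with ordinary file I/O. The class name, the method name, and the scratch file name are assumptions for this illustration, and the call to force is a request to write the changes out promptly rather than a requirement of the technique.

```java
import java.io.*;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

class MapDemo
{
   // Writes text to the file, upper-cases it through a READ_WRITE mapping,
   // and returns the file content as read back with ordinary file I/O.
   static String upperCaseThroughMapping(File file, String text)
   {
      try (RandomAccessFile raf = new RandomAccessFile(file, "rw");
           FileChannel channel = raf.getChannel())
      {
         raf.setLength(0);
         raf.writeBytes(text);

         MappedByteBuffer buffer = channel.map(
            FileChannel.MapMode.READ_WRITE, 0, channel.size());

         // random access: absolute get and put do not move the position
         for (int i = 0; i < buffer.limit(); i++)
         {
            byte b = buffer.get(i);
            if (b >= 'a' && b <= 'z') buffer.put(i, (byte) (b - 'a' + 'A'));
         }
         buffer.force(); // ask the system to write the changes to the file now

         byte[] bytes = new byte[(int) raf.length()];
         raf.seek(0);
         raf.readFully(bytes);
         return new String(bytes, StandardCharsets.US_ASCII);
      }
      catch (IOException e)
      {
         throw new UncheckedIOException(e);
      }
   }

   public static void main(String[] args)
   {
      // A hypothetical scratch file in the system's temporary directory.
      File file = new File(System.getProperty("java.io.tmpdir"),
         "mapdemo-" + System.nanoTime() + ".txt");
      System.out.println(upperCaseThroughMapping(file, "hello, mapped file"));
      file.delete(); // may fail on some systems while the mapping is still alive
   }
}
```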
You can also read and write arrays of bytes with the methods
get(byte[] bytes)
get(byte[], int offset, int length)
Finally, there are methods
getInt getLong getShort getChar getFloat getDouble
to read primitive type values that are stored as binary values in the file. As we already mentioned, Java uses big-endian ordering for binary data. However, if you need to process a file containing binary numbers in little-endian order, simply call
buffer.order(ByteOrder.LITTLE_ENDIAN);
To find out the current byte order of a buffer, call
ByteOrder b = buffer.order();
To write numbers to a buffer, use one of the methods
putInt putLong putShort putChar putFloat putDouble
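To see what the byte order setting does, the following sketch stores the same int value in both orders and prints the resulting bytes. It uses an ordinary heap buffer from ByteBuffer.allocate rather than a mapped file, and the helper names encode and decode are made up for this example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

class ByteOrderDemo
{
   // Stores value in a 4-byte buffer with the given byte order.
   static byte[] encode(int value, ByteOrder order)
   {
      ByteBuffer buffer = ByteBuffer.allocate(4);
      buffer.order(order);
      buffer.putInt(value);
      return buffer.array();
   }

   // Reads an int from the first 4 bytes, interpreting them in the given order.
   static int decode(byte[] bytes, ByteOrder order)
   {
      return ByteBuffer.wrap(bytes).order(order).getInt();
   }

   public static void main(String[] args)
   {
      int value = 0x01020304;
      System.out.println(Arrays.toString(encode(value, ByteOrder.BIG_ENDIAN)));    // [1, 2, 3, 4]
      System.out.println(Arrays.toString(encode(value, ByteOrder.LITTLE_ENDIAN))); // [4, 3, 2, 1]

      byte[] little = encode(value, ByteOrder.LITTLE_ENDIAN);
      System.out.println(decode(little, ByteOrder.LITTLE_ENDIAN) == value);          // true
      System.out.println(Integer.toHexString(decode(little, ByteOrder.BIG_ENDIAN))); // 4030201
   }
}
```

The last line shows what happens when little-endian data is read with the default order: the bytes are interpreted in reverse.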
Listing 1-7 computes the 32-bit cyclic redundancy checksum (CRC32) of a file. That quantity is a checksum that is often used to determine whether a file has been corrupted. Corruption of a file makes it very likely that the checksum has changed. The java.util.zip
package contains a class CRC32
that computes the checksum of a sequence of bytes, using the following loop:
CRC32 crc = new CRC32();
while (more bytes)
   crc.update(next byte);
long checksum = crc.getValue();
For a nice explanation of the CRC algorithm, see http://www.relisoft.com/Science/CrcMath.html.
The details of the CRC computation are not important. We just use it as an example of a useful file operation.
Run the program as
java NIOTest filename
Listing 1-7. NIOTest.java
import java.io.*;
import java.nio.*;
import java.nio.channels.*;
import java.util.zip.*;

/**
 * This program computes the CRC checksum of a file. <br>
 * Usage: java NIOTest filename
 * @version 1.01 2004-05-11
 * @author Cay Horstmann
 */
public class NIOTest
{
   public static long checksumInputStream(String filename) throws IOException
   {
      InputStream in = new FileInputStream(filename);
      CRC32 crc = new CRC32();

      int c;
      while ((c = in.read()) != -1)
         crc.update(c);
      return crc.getValue();
   }

   public static long checksumBufferedInputStream(String filename) throws IOException
   {
      InputStream in = new BufferedInputStream(new FileInputStream(filename));
      CRC32 crc = new CRC32();

      int c;
      while ((c = in.read()) != -1)
         crc.update(c);
      return crc.getValue();
   }

   public static long checksumRandomAccessFile(String filename) throws IOException
   {
      RandomAccessFile file = new RandomAccessFile(filename, "r");
      long length = file.length();
      CRC32 crc = new CRC32();

      for (long p = 0; p < length; p++)
      {
         file.seek(p);
         int c = file.readByte();
         crc.update(c);
      }
      return crc.getValue();
   }

   public static long checksumMappedFile(String filename) throws IOException
   {
      FileInputStream in = new FileInputStream(filename);
      FileChannel channel = in.getChannel();

      CRC32 crc = new CRC32();
      int length = (int) channel.size();
      MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, length);

      for (int p = 0; p < length; p++)
      {
         int c = buffer.get(p);
         crc.update(c);
      }
      return crc.getValue();
   }

   public static void main(String[] args) throws IOException
   {
      System.out.println("Input Stream:");
      long start = System.currentTimeMillis();
      long crcValue = checksumInputStream(args[0]);
      long end = System.currentTimeMillis();
      System.out.println(Long.toHexString(crcValue));
      System.out.println((end - start) + " milliseconds");

      System.out.println("Buffered Input Stream:");
      start = System.currentTimeMillis();
      crcValue = checksumBufferedInputStream(args[0]);
      end = System.currentTimeMillis();
      System.out.println(Long.toHexString(crcValue));
      System.out.println((end - start) + " milliseconds");

      System.out.println("Random Access File:");
      start = System.currentTimeMillis();
      crcValue = checksumRandomAccessFile(args[0]);
      end = System.currentTimeMillis();
      System.out.println(Long.toHexString(crcValue));
      System.out.println((end - start) + " milliseconds");

      System.out.println("Mapped File:");
      start = System.currentTimeMillis();
      crcValue = checksumMappedFile(args[0]);
      end = System.currentTimeMillis();
      System.out.println(Long.toHexString(crcValue));
      System.out.println((end - start) + " milliseconds");
   }
}
When you use memory mapping, you make a single buffer that spans the entire file, or the area of the file in which you are interested. You can also use buffers to read and write more modest chunks of information.
In this section, we briefly describe the basic operations on Buffer
objects. A buffer is an array of values of the same type. The Buffer
class is an abstract class with concrete subclasses ByteBuffer
, CharBuffer
, DoubleBuffer
, FloatBuffer
, IntBuffer
, LongBuffer
, and ShortBuffer
.
In practice, you will most commonly use ByteBuffer
and CharBuffer
. As shown in Figure 1-11, a buffer has
A capacity that never changes.
A position at which the next value is read or written.
A limit beyond which reading and writing is meaningless.
Optionally, a mark for repeating a read or write operation.
These values fulfill the condition
0 ≤ mark ≤ position ≤ limit ≤ capacity
The principal purpose for a buffer is a “write, then read” cycle. At the outset, the buffer’s position is 0 and the limit is the capacity. Keep calling put
to add values to the buffer. When you run out of data or you reach the capacity, it is time to switch to reading.
Call flip
to set the limit to the current position and the position to 0. Now keep calling get
while the remaining
method (which returns limit - position) is positive. When you have read all values in the buffer, call clear
to prepare the buffer for the next writing cycle. The clear
method resets the position to 0 and the limit to the capacity.
If you want to reread the buffer, use rewind
or mark/reset
—see the API notes for details.
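The "write, then read" cycle can be traced with a small sketch that records the position and limit after each step. The class and method names are hypothetical; the buffer calls (put, flip, remaining, get, clear) are the ones described above.

```java
import java.nio.CharBuffer;

class BufferCycleDemo
{
   // Formats the buffer state as position/limit.
   static String state(CharBuffer b)
   {
      return b.position() + "/" + b.limit();
   }

   // Runs one write-then-read cycle and returns a log of the buffer states.
   static String cycle()
   {
      StringBuilder log = new StringBuilder();
      CharBuffer buffer = CharBuffer.allocate(8);
      log.append(state(buffer));              // at the outset: 0/8

      buffer.put('a').put('b').put('c');      // writing advances the position
      log.append(' ').append(state(buffer));  // 3/8

      buffer.flip();                          // limit = position, position = 0
      log.append(' ').append(state(buffer));  // 0/3

      log.append(' ');
      while (buffer.remaining() > 0)          // read everything that was written
         log.append(buffer.get());

      buffer.clear();                         // ready for the next writing cycle
      log.append(' ').append(state(buffer));  // 0/8
      return log.toString();
   }

   public static void main(String[] args)
   {
      System.out.println(cycle()); // 0/8 3/8 0/3 abc 0/8
   }
}
```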
Consider a situation in which multiple simultaneously executing programs need to modify the same file. Clearly, the programs need to communicate in some way, or the file can easily become damaged.
File locks control access to a file or a range of bytes within a file. However, file locking varies greatly among operating systems, which explains why file locking capabilities were absent from prior versions of the JDK.
File locking is not all that common in application programs. Many applications use a database for data storage, and the database has mechanisms for resolving concurrent access. If you store information in flat files and are worried about concurrent access, you might find it simpler to start using a database rather than designing complex file locking schemes.
Still, there are situations in which file locking is essential. Suppose your application saves a configuration file with user preferences. If a user invokes two instances of the application, it could happen that both of them want to write the configuration file at the same time. In that situation, the first instance should lock the file. When the second instance finds the file locked, it can decide to wait until the file is unlocked or simply skip the writing process.
To lock a file, call either the lock
or tryLock
method of the FileChannel
class:
FileLock lock = channel.lock();
or
FileLock lock = channel.tryLock();
The first call blocks until the lock becomes available. The second call returns immediately, either with the lock or null
if the lock is not available. The file remains locked until the channel is closed or the release
method is invoked on the lock.
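Here is a sketch of the configuration file scenario, in which an instance skips the write if the file is locked. The class name ConfigWriter and the method name writeIfUnlocked are hypothetical, and the example assumes Java SE 7 or later for the java.nio.file classes.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ConfigWriter {
    /**
     * Replaces the file contents with the given text if the lock is free.
     * Returns false if another program currently holds the lock.
     */
    public static boolean writeIfUnlocked(Path path, String text) throws IOException {
        try (FileChannel channel = FileChannel.open(path,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            FileLock lock = channel.tryLock(); // returns at once; null if unavailable
            if (lock == null) return false;    // skip the writing process
            try {
                channel.truncate(0);
                ByteBuffer buffer = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
                while (buffer.hasRemaining()) channel.write(buffer);
                return true;
            } finally {
                lock.release();                // closing the channel would also release it
            }
        }
    }
}
```

To wait for the lock instead of skipping the write, replace the tryLock call with lock, which blocks until the lock becomes available.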
You can also lock a portion of the file with the call
FileLock lock(long start, long size, boolean shared)
or
FileLock tryLock(long start, long size, boolean shared)
The shared
flag is false
to lock the file for both reading and writing. It is true
for a shared lock, which allows multiple processes to read from the file, while preventing any process from acquiring an exclusive lock. Not all operating systems support shared locks. You may get an exclusive lock even if you just asked for a shared one. Call the isShared
method of the FileLock
class to find out which kind you have.
If you lock the tail portion of a file and the file subsequently grows beyond the locked portion, the additional area is not locked. To lock all bytes, use a size of Long.MAX_VALUE
.
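The following sketch requests a shared lock on all bytes of a file and then checks which kind of lock was actually granted. In the FileChannel API, the third argument of lock and tryLock is the shared flag, so true asks for a shared lock. The class name SharedLockDemo and the method name describeLock are hypothetical; the try-with-resources use of FileLock assumes Java SE 7 or later.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SharedLockDemo {
    /**
     * Requests a shared lock covering the whole file, including any bytes
     * added later, and reports which kind of lock the system granted.
     */
    public static String describeLock(Path path) throws IOException {
        // A shared lock requires a channel that is open for reading.
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ);
             FileLock lock = channel.lock(0, Long.MAX_VALUE, true)) {
            // The system may have granted an exclusive lock instead.
            return lock.isShared() ? "shared" : "exclusive";
        }
    }
}
```

The result depends on the operating system, so a program should not assume that a request for a shared lock always yields one.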
Keep in mind that file locking is system dependent. Here are some points to watch for:
On some systems, file locking is merely advisory. If an application fails to get a lock, it may still write to a file that another application has currently locked.
On some systems, you cannot simultaneously lock a file and map it into memory.
File locks are held by the entire Java virtual machine. If two programs are launched by the same virtual machine (such as an applet or application launcher), then they can’t each acquire a lock on the same file. The lock
and tryLock
methods will throw an OverlappingFileLockException
if the virtual machine already holds another overlapping lock on the same file.
On some systems, closing a channel releases all locks on the underlying file held by the Java virtual machine. You should therefore avoid multiple channels on the same locked file.
Locking files on a networked file system is highly system dependent and should probably be avoided.
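The point about locks being held by the entire virtual machine can be observed with a small experiment: open two channels on the same file in one program, lock the file through the first, and try to lock it through the second. This is only a sketch; the class name OverlapDemo and the method name secondLockRejected are hypothetical.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class OverlapDemo {
    /**
     * Returns true if a second lock attempt from the same virtual machine
     * fails with an OverlappingFileLockException.
     */
    public static boolean secondLockRejected(Path path) throws IOException {
        try (FileChannel first = FileChannel.open(path, StandardOpenOption.WRITE);
             FileChannel second = FileChannel.open(path, StandardOpenOption.WRITE);
             FileLock held = first.lock()) {
            try {
                // The virtual machine already holds a lock on the whole file.
                FileLock other = second.tryLock();
                if (other != null) other.release();
                return false;
            } catch (OverlappingFileLockException e) {
                return true;
            }
        }
    }
}
```

Note that tryLock does not return null in this case. A null result means that another program holds the lock; the exception means that this virtual machine already does.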
Regular expressions are used to specify string patterns. You can use regular expressions whenever you need to locate strings that match a particular pattern. For example, one of our sample programs locates all hyperlinks in an HTML file by looking for strings of the pattern <a href="...">
.
Of course, for specifying a pattern, the ...
notation is not precise enough. You need to specify precisely what sequence of characters is a legal match. You need to use a special syntax whenever you describe a pattern.
Here is a simple example. The regular expression
[Jj]ava.+
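matches any string of the following form:
The first letter is a J or j.
The next three letters are ava.
The remainder of the string consists of one or more arbitrary characters.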
For example, the string "javanese"
matches this regular expression, but the string "Core Java"
does not.
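You can check these claims with the Pattern class from java.util.regex. The sketch below tests whether an entire string matches the pattern; the class name JavaPatternDemo and the method name matches are chosen for illustration.

```java
import java.util.regex.Pattern;

public class JavaPatternDemo {
    private static final Pattern PATTERN = Pattern.compile("[Jj]ava.+");

    /** Returns true if the entire string matches [Jj]ava.+ */
    public static boolean matches(String input) {
        return PATTERN.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("javanese"));  // true
        System.out.println(matches("Core Java")); // false: does not start with J or j
        System.out.println(matches("Java"));      // false: .+ needs at least one more character
    }
}
```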
As you can see, you need to know a bit of syntax to understand the meaning of a regular expression. Fortunately, for most purposes, a small number of straightforward constructs are sufficient.
A character class is a set of character alternatives, enclosed in brackets, such as [Jj], [0-9]
, [A-Za-z]
, or [^0-9]
. Here the -
denotes a range (all characters whose Unicode value falls between the two bounds), and ^
denotes the complement (all characters except the ones specified).
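To see ranges and complements at work, the following sketch counts how many characters of a string belong to a given character class. The class name CharClassDemo and the method name countMatches are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharClassDemo {
    /** Counts the characters in text that match the given character class. */
    public static int countMatches(String charClass, String text) {
        Matcher matcher = Pattern.compile(charClass).matcher(text);
        int count = 0;
        while (matcher.find()) count++; // each find consumes one matching character
        return count;
    }

    public static void main(String[] args) {
        String text = "Room 42B";
        System.out.println(countMatches("[0-9]", text));    // 2: the digits 4 and 2
        System.out.println(countMatches("[A-Za-z]", text)); // 5: R, o, o, m, B
        System.out.println(countMatches("[^0-9]", text));   // 6: everything except the digits
    }
}
```

The complement [^0-9] also matches the space, which is why it finds one more character than [A-Za-z].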
There are many predefined character classes such as \d
(digits) or \p{Sc}
(Unicode currency symbol). See Tables 1-7 and 1-8.
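In the regular expression syntax, these two predefined classes are written \d and \p{Sc}. In a Java string literal, each backslash must be doubled. As a sketch, the following hypothetical PriceFinder class locates a currency symbol followed by one or more digits; the sample text is made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PriceFinder {
    // A Unicode currency symbol followed by one or more digits
    private static final Pattern PRICE = Pattern.compile("\\p{Sc}\\d+");

    /** Returns all substrings of text that look like a price. */
    public static List<String> findPrices(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = PRICE.matcher(text);
        while (matcher.find()) result.add(matcher.group());
        return result;
    }

    public static void main(String[] args) {
        // \u20AC is the euro sign
        System.out.println(findPrices("Price: $42 or \u20AC39"));
    }
}
```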