Writing Objects to Disk

We've already seen how to write out strings, ints, doubles, etc., in both printable and binary forms. It may surprise you to learn that you can also write out, and later read back in, entire objects. When you serialize (write out) a single object to a data stream, it automatically saves the object, namely, all its instance data. If any of these non-static fields reference other objects, those objects are serialized too. That way, when you later deserialize (restore) the object, you get back the object and all its member fields pointing to all the things they pointed to before—everything needed to reconstitute the original object.

For example, if you serialize one element of a doubly linked list, everything it references, and everything the references reference, and so on, will be saved. For an element of a doubly linked list, that means the elements on each side of it, and the elements on each side of those (one of which is the original element—that doesn't get written out twice), and so on until the entire list has been written to disk.
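As a rough illustration of this behavior, the sketch below builds a three-element doubly linked list, serializes only the middle element, and checks that its neighbors come back too. The names LinkedListDemo, Node, buildList, and roundTrip are hypothetical, invented for this example, and the bytes go to an in-memory array rather than to disk so that the sketch stays self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class LinkedListDemo {
    // Hypothetical list element, not a class from the text.
    static class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
        Node prev, next;
        Node(String name) { this.name = name; }
    }

    // Build a <-> b <-> c and return the middle element.
    static Node buildList() {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c");
        a.next = b; b.prev = a;
        b.next = c; c.prev = b;
        return b;
    }

    // Serialize one object into memory and read it straight back.
    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(o);
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Node copy = (Node) roundTrip(buildList());
        // Only the middle element was written, yet both neighbors are restored.
        System.out.println(copy.prev.name + " <- " + copy.name + " -> " + copy.next.name); // a <- b -> c
        // The back reference points at the restored element itself, not a second copy.
        System.out.println(copy.prev.next == copy); // true
    }
}
```

The identity check at the end reflects the point made above: an object that is reached twice while walking the graph is written once, and later references to it are restored as references to that same object.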

The point of including everything that your object connects to, and all the things they connect to and so on, is to ensure that (when you read them back in) you can use those objects with the same state and contents that they had when you originally wrote them out. Doesn't this swell up the size of what you're writing until it's as big as your entire program? Objects contain references to their member fields, but members tend not to back reference the object of which they are a field. That means the links mostly go one way, and so serialized objects in practice remain a manageable size.

It's quite powerful to be able to do I/O on an entire graph of objects with one simple method call. If an object can be written to a stream, it can also be sent through a socket, compressed, encrypted, read out of a socket on another host, backed up onto a file, and later read back in again and reconstituted.
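One way to picture this composability is to wrap the object stream around a compressing stream. The sketch below is only an illustration under assumptions: the class name CompressedSerial and its two helper methods are hypothetical, and GZIP is just one of several stream filters that could be layered in the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Date;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressedSerial {
    // Serialize an object through a GZIP filter into a byte array.
    static byte[] toCompressedBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos =
                 new ObjectOutputStream(new GZIPOutputStream(bytes))) {
            oos.writeObject(obj);
        } // closing the outer stream also finishes the GZIP data
        return bytes.toByteArray();
    }

    // Reverse the process: decompress, then deserialize.
    static Object fromCompressedBytes(byte[] data)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(
                 new GZIPInputStream(new ByteArrayInputStream(data)))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Date d = new Date();
        Date restored = (Date) fromCompressedBytes(toCompressedBytes(d));
        System.out.println("same value after compression: " + d.equals(restored));
    }
}
```

Replacing the byte array streams with a socket's or a file's streams should leave the serialization calls themselves unchanged.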

To make an object serializable, all you need do is make its class implement the Serializable interface. The interface java.io.Serializable doesn't have any methods or fields. It is an example of the Marker Interface design pattern. The purpose of requiring a class to implement an empty interface is to identify to other programmers and to the run-time library that it can be serialized. Here is a class that can be serialized:

package java.util;
public class Date implements java.io.Serializable { ...

Here is how you can serialize a Date object, and save it in a file:

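// first create the file and wrap it in an object stream
ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream("serial.bin"));
java.util.Date d = new java.util.Date();
oos.writeObject(d);
// write any further objects here, then call oos.close() when you are done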

You can go on to write many more objects into the ObjectOutputStream. You need to be aware of the types that you are writing, so that you can cast them correctly back to their original type when you read them in again. Here is how you would read a serialized object back in again:

ObjectInputStream ois = new ObjectInputStream ( new FileInputStream("serial.bin") );
java.util.Date restoredDate = (java.util.Date) ois.readObject();

It usually comes as a pleasant surprise to people to see that serialization is so easy! The readObject() method is declared to return Object, so you need to cast the result back to what it actually is. A ClassCastException will be thrown if you try to cast an object to a class it doesn't belong to. In the example, we really are reading in a Date, and so the cast takes place without a problem. If you put the above lines in a main program and add a couple of println's for the date before and after serialization, you will see output similar to this:

javac Serial.java
java Serial
date written out: Fri Apr 30 13:22:22
  date read back: Fri Apr 30 13:22:22

The important thing here is that the Date object survived its journey out into the filesystem, and came back in with the same value. Reading and writing objects is thankfully very simple.
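For reference, a main program along the lines just described might look like the following sketch. The save and load helper methods are an assumption made here for tidiness (the text only says to put the lines in a main program), and the exact date text printed will depend on your clock, locale, and time zone.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Date;

public class Serial {
    // Write one Date into the given file.
    static void save(Date d, File f) throws IOException {
        try (ObjectOutputStream oos =
                 new ObjectOutputStream(new FileOutputStream(f))) {
            oos.writeObject(d);
        }
    }

    // Read the Date back; the cast is needed because readObject() returns Object.
    static Date load(File f) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois =
                 new ObjectInputStream(new FileInputStream(f))) {
            return (Date) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        File f = new File("serial.bin");
        Date d = new Date();
        save(d, f);
        System.out.println("date written out: " + d);
        System.out.println("  date read back: " + load(f));
    }
}
```

The try-with-resources blocks close each stream as soon as the work is done, even if an exception is thrown partway through.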

Make sure you close an ObjectInputStream as soon as you have got back the objects that you need. An ObjectInputStream holds references to all the objects read from it, so their memory isn't eligible for garbage collection until the stream is closed or reset.

If a class is serializable, so are its subclasses (any interfaces of a parent are always inherited by the child). You can make a class serializable even if its parent isn't, as long as the parent has a no-arg constructor that the child can access and the child takes responsibility for restoring any parent state that wasn't serialized.
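The second case can be sketched as follows. Vehicle, Car, and ParentDemo are hypothetical names made up for this example. The child saves and restores the parent's field by hand in the private writeObject and readObject methods that the serialization mechanism looks for; this is one possible approach, not the only one.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ParentDemo {
    // Hypothetical parent that is NOT serializable.
    static class Vehicle {
        protected String owner;
        Vehicle() { owner = "unknown"; }          // no-arg constructor run during deserialization
        Vehicle(String owner) { this.owner = owner; }
    }

    // The child opts in and takes responsibility for the parent's state.
    static class Car extends Vehicle implements Serializable {
        private static final long serialVersionUID = 1L;
        int doors;
        Car(String owner, int doors) { super(owner); this.doors = doors; }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();   // writes Car's own field, doors
            out.writeObject(owner);     // parent field, saved by hand
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();            // restores doors
            owner = (String) in.readObject();  // overwrites the "unknown" set by Vehicle()
        }
    }

    static Car roundTrip(Car c) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(c);
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Car) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Car copy = roundTrip(new Car("Ann", 4));
        System.out.println(copy.owner + " " + copy.doors); // Ann 4
    }
}
```

Without the two private methods, the restored Car would still have the right number of doors, but its owner would be whatever the parent's no-arg constructor sets.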

Serializing and security

Java object serialization was developed as an enabler for two other technologies: RMI and Java Beans. RMI (Remote Method Invocation) lets you make method calls across the network to processes running on other computers. Java Beans are a technique for modularizing code into components that can be manipulated in a visual tool. Beans are enjoying some success in enterprise software, but have (surprisingly) been a total flop in desktop applications so far. Maybe the performance improvements of the 1.5 release will revive interest in desktop applications and hence in desktop beans.

When serialization was still in the design stage inside Sun, there was a great deal of debate about whether Java could allow serialization as the default setting. In other words, classes would have to opt out of allowing it, rather than opt in. After a lot of soul-searching, it was decided that programmers must take some explicit step to indicate that a class can be serializable (namely, we have to state that the class implements Serializable). The reason is that there are security implications to serializing a class.

If you serialize something like a file descriptor, someone could edit the file containing it and change some of the fields. When the file descriptor is read back in and deserialized, it will now be pointing at a different entry in the OS file descriptor table, or perhaps something outside the table altogether. Even though the exploit took place using native code, it would detract from the overall high level of security that Java enjoys.

So designers need to consider the security aspects when they make a class serializable. What happens if some field is given a different value while the object is in a file? Perhaps some kind of validation can be done after the object is read back in. Fields can be cross-checked for consistency with other data. You can also take complete control over the serialization process by doing it yourself. You take this approach by implementing the Externalizable interface and providing bodies for its two methods, writeExternal and readExternal.
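Here is a minimal sketch of that do-it-yourself approach, combined with a validation step on the way back in. The Account class, its fields, and the particular plausibility checks are all assumptions invented for this illustration; a real class would apply whatever consistency rules make sense for its own data. A negative balance stands in here for a value that was tampered with while the object sat in a file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

public class ExternalDemo {
    // Hypothetical class that controls its own serialized form.
    public static class Account implements Externalizable {
        private String id;
        private long balanceCents;

        public Account() { }   // Externalizable classes need a public no-arg constructor
        public Account(String id, long balanceCents) {
            this.id = id;
            this.balanceCents = balanceCents;
        }
        public String getId() { return id; }
        public long getBalanceCents() { return balanceCents; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeUTF(id);
            out.writeLong(balanceCents);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            String readId = in.readUTF();
            long readBalance = in.readLong();
            // Cross-check the incoming data before accepting it.
            if (readId.isEmpty() || readBalance < 0) {
                throw new InvalidObjectException("implausible account data");
            }
            id = readId;
            balanceCents = readBalance;
        }
    }

    static Account roundTrip(Account a) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(a);
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Account) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Account ok = roundTrip(new Account("A-17", 2500));
        System.out.println(ok.getId() + " " + ok.getBalanceCents()); // A-17 2500
        try {
            roundTrip(new Account("A-18", -1));   // simulates bad data in the stream
        } catch (InvalidObjectException e) {
            System.out.println("rejected: " + e.getMessage()); // rejected: implausible account data
        }
    }
}
```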

Another step toward improving the security of your serialized objects is to use the “transient” keyword. Any data field that is marked “transient” will not be written out when the object is serialized. You can often mark it “private” as well. A transient field is one that has a value that depends on some current state that will not be saved. For example, “current_speed” would be transient. You can also use the transient keyword to prevent the writing out of a field that is sensitive, such as “salary.” If you do that, you will need to find some other way to restore the value after you have deserialized the object, perhaps by reading it from a secure database.
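The effect of the keyword is easy to see in a small sketch. The Employee class and the TransientDemo wrapper below are hypothetical, made up for this illustration; the salary field follows the example in the paragraph above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientDemo {
    // Hypothetical class with one ordinary field and one sensitive field.
    static class Employee implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
        private transient double salary;   // never written to the stream

        Employee(String name, double salary) {
            this.name = name;
            this.salary = salary;
        }
        double getSalary() { return salary; }
    }

    static Employee roundTrip(Employee e) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(e);
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Employee) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Employee copy = roundTrip(new Employee("Pat", 85000.0));
        System.out.println(copy.name);         // Pat
        System.out.println(copy.getSalary());  // 0.0
    }
}
```

The transient field comes back holding the default value for its type (0.0 for a double, null for a reference), which is why the value has to be restored by some other means after deserialization.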

Some entire classes are not capable of being serialized. One such example is java.lang.Thread. Threads consist largely of Java code, but they also have a significant native code part. Each Java thread has two stacks: one for Java code and one for system code (usually C code). The native stack of a thread is not managed by Java, but by native code. The Java run-time doesn't know much about the native stack of a thread, and cannot save it. Thus, trying to serialize a thread will not be successful.
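You can watch the failure happen with a few lines of code. In this sketch the class name ThreadDemo and the helper canSerialize are hypothetical; the helper simply reports whether writeObject accepted the object or threw NotSerializableException.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;

public class ThreadDemo {
    // Try to serialize an object into memory and report whether it worked.
    static boolean canSerialize(Object o) throws IOException {
        try (ObjectOutputStream oos =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            oos.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            // Thrown when the object's class does not implement Serializable.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("String: " + canSerialize("hello"));      // String: true
        System.out.println("Thread: " + canSerialize(new Thread())); // Thread: false
    }
}
```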

We already mentioned that only instance data is saved, not static data. You don't need to save the instructions in the methods of an object; that information is exactly what a class file is. So to successfully deserialize an object and use it, perhaps on another host, its class file must be accessible in the new environment.

XML support class

Just as I/O for ordinary types can be printable or binary, so too can objects be serialized into binary form or (incredibly) printable form. One of the main purposes of XML is to represent binary data in printable form. XML is described at length later in this book, and the one sentence summary is, “XML is a portable way of storing data items in character form surrounded by tags that say what type each item is.”

The Java 1.4 release introduced a class that lets you serialize objects into XML form! The documentation says that this is intended solely for Java beans, including Swing GUI components. For your other classes (they say) you should continue using java.io.ObjectOutputStream. Those of us who like the advantages of XML will be the best judge of what to use where!

Here's an example of object persistence using XML:

import java.beans.*;
import java.io.*;
public class serial {
    public static void main(String args[]) throws Exception {
        java.util.Date d = new java.util.Date();
        FileOutputStream fos = new FileOutputStream("serial.xml");
        XMLEncoder xe = new XMLEncoder( fos );
        xe.writeObject(d);
        xe.close();
    }
}

After running the program, file serial.xml contains these lines:

<?xml version="1.0" encoding="UTF-8"?>
<java version="1.5.0" class="java.beans.XMLDecoder">
 <object class="java.util.Date">
  <long>1083282712203</long>
 </object>
</java>
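Reading the object back uses the companion class java.beans.XMLDecoder. The following is a minimal sketch rather than a continuation of the program above: the class name XmlRoundTrip and its helper methods are hypothetical, and the XML is kept in memory instead of in serial.xml so that the example is self-contained.

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Date;

public class XmlRoundTrip {
    // Encode one object as XML text held in a byte array.
    static byte[] toXml(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (XMLEncoder xe = new XMLEncoder(bytes)) {
            xe.writeObject(o);
        } // close() writes the closing </java> tag
        return bytes.toByteArray();
    }

    // Decode the first object found in the XML.
    static Object fromXml(byte[] xml) {
        try (XMLDecoder xd = new XMLDecoder(new ByteArrayInputStream(xml))) {
            return xd.readObject();
        }
    }

    public static void main(String[] args) {
        Date d = new Date();
        byte[] xml = toXml(d);
        System.out.println(new String(xml, StandardCharsets.UTF_8));
        Date restored = (Date) fromXml(xml);
        System.out.println("same value: " + d.equals(restored));
    }
}
```

To read the file produced by the earlier program instead, you would pass a FileInputStream for serial.xml to the XMLDecoder constructor.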

XML makes a terrific serialization format for objects. I hope that Sun eventually blesses its use for all classes. The reason for their restriction is to maintain backwards compatibility with all the code that uses the earlier binary format. If you don't have any code that uses the earlier format, or you can convert it, then go ahead and serialize to XML.

Why would you serialize an object at all? The object may represent a customer record or a transaction that you wish to capture on backing store for archival, logging, or audit trail purposes. By writing it in XML, you are ensuring that it will be readable by any system at any time in the future, and can also be processed by a lot of automated tools that don't know about your specific data format.
