Chapter 10. Serialization

This chapter concerns the object serialization API, which provides a framework for encoding objects as byte streams and reconstructing objects from their byte-stream encodings. Encoding an object as a byte stream is known as serializing the object; the reverse process is known as deserializing it. Once an object has been serialized, its encoding can be transmitted from one running virtual machine to another or stored on disk for later deserialization. Serialization provides the standard wire-level object representation for remote communication, and the standard persistent data format for the JavaBeans component architecture.

Item 54: Implement Serializable judiciously

Allowing a class's instances to be serialized can be as simple as adding the words “implements Serializable” to its declaration. Because this is so easy to do, there is a common misconception that serialization requires little effort on the part of the programmer. The truth is far more complex. While the immediate cost to make a class serializable can be negligible, the long-term costs are often substantial.

A major cost of implementing Serializable is that it decreases the flexibility to change a class's implementation once it has been released. When a class implements Serializable, its byte-stream encoding (or serialized form) becomes part of its exported API. Once you distribute a class widely, you are generally required to support the serialized form forever, just as you are required to support all other parts of the exported API. If you do not go to the effort to design a custom serialized form, but merely accept the default, the serialized form will forever be tied to the class's original internal representation. In other words, if you accept the default serialized form, the class's private and package-private instance fields become part of its exported API, and the practice of minimizing access to fields (Item 12) loses its effectiveness as a tool for information hiding.

If you accept the default serialized form and later change the class's internal representation, an incompatible change in the serialized form may result. Clients attempting to serialize an instance using an old version of the class and deserialize it using the new version will experience program failures. It is possible to change the internal representation while maintaining the original serialized form (using ObjectOutputStream.putFields and ObjectInputStream.readFields), but it can be difficult and leaves visible warts in the source code. Therefore you should carefully design a high-quality serialized form that you are willing to live with for the long haul (Item 55). Doing so will add to the cost of development, but it is worth the effort. Even a well-designed serialized form places constraints on the evolution of a class; an ill-designed serialized form can be crippling.
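As a rough illustration of the putFields/readFields technique mentioned above, the following sketch assumes a hypothetical Temperature class whose first release stored a double field named degrees and whose current release stores thousandths of a degree in a long. The class, its fields, and the TemperatureDemo helper are invented for this example. It uses newer language features (try-with-resources, multi-catch) for brevity.

```java
import java.io.*;

// Hypothetical class: the first release stored a double field named
// "degrees"; this release keeps thousandths of a degree in a long instead.
class Temperature implements Serializable {
    private static final long serialVersionUID = 1L;

    // Declares the original serialized form, independent of the real fields
    private static final ObjectStreamField[] serialPersistentFields = {
        new ObjectStreamField("degrees", double.class)
    };

    private transient long milliDegrees; // The new internal representation

    Temperature(double degrees) { milliDegrees = Math.round(degrees * 1000); }

    double degrees() { return milliDegrees / 1000.0; }

    private void writeObject(ObjectOutputStream s) throws IOException {
        ObjectOutputStream.PutField fields = s.putFields();
        fields.put("degrees", degrees()); // Emit the field old versions expect
        s.writeFields();
    }

    private void readObject(ObjectInputStream s)
            throws IOException, ClassNotFoundException {
        ObjectInputStream.GetField fields = s.readFields();
        milliDegrees = Math.round(fields.get("degrees", 0.0) * 1000);
    }
}

class TemperatureDemo {
    // Serializes and then deserializes t in memory
    static Temperature roundTrip(Temperature t) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
                out.writeObject(t);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray()))) {
                return (Temperature) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new Temperature(21.5)).degrees());
    }
}
```

The extra writeObject and readObject methods and the serialPersistentFields declaration are the "visible warts": they exist only to keep the old encoding alive.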

A simple example of the constraints on evolution that accompany serializability concerns stream unique identifiers, more commonly known as serial version UIDs. Every serializable class has a unique identification number associated with it. If you do not specify the identification number explicitly by declaring a private static final long field named serialVersionUID, the system automatically generates it by applying a complex deterministic procedure to the class. The automatically generated value is affected by the class's name, the names of the interfaces it implements, and all of its public and protected members. If you change any of these things in any way, for example, by adding a trivial convenience method, the automatically generated serial version UID changes. If you fail to declare an explicit serial version UID, compatibility will be broken.

A second cost of implementing Serializable is that it increases the likelihood of bugs and security holes. Normally, objects are created using constructors; serialization is an extralinguistic mechanism for creating objects. Whether you accept the default behavior or override it, deserialization is a “hidden constructor” with all of the same issues as other constructors. Because there is no explicit constructor, it is easy to forget that you must ensure that deserialization guarantees all of the invariants established by real constructors and that it does not allow an attacker to gain access to the internals of the object under construction. Relying on the default deserialization mechanism can easily leave objects open to invariant corruption and illegal access (Item 56).

A third cost of implementing Serializable is that it increases the testing burden associated with releasing a new version of a class. When a serializable class is revised, it is important to check that it is possible to serialize an instance in the new release, and deserialize it in old releases, and vice versa. The amount of testing required is thus proportional to the product of the number of serializable classes and the number of releases, which can be large. These tests cannot be constructed automatically because, in addition to binary compatibility, you must test for semantic compatibility. In other words, you must ensure both that the serialization-deserialization process succeeds and that it results in a faithful replica of the original object. The greater the change to a serializable class, the greater the need for testing. The need is reduced if a custom serialized form is carefully designed when the class is first written (Item 55), but it does not vanish entirely.
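The two halves of such a test, that the process succeeds and that the result is a faithful replica, can be sketched with a small in-memory helper. RoundTripCheck and its method names are assumptions made for this example, and the check relies on the class under test having a meaningful equals method.

```java
import java.io.*;
import java.util.Date;

class RoundTripCheck {
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(o);
        }
        return bos.toByteArray();
    }

    static Object deserialize(byte[] bytes)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    // True if the round trip succeeds and yields a distinct, equal object
    static boolean isFaithfulCopy(Object original) {
        try {
            Object copy = deserialize(serialize(original));
            return copy != original && copy.equals(original);
        } catch (IOException | ClassNotFoundException e) {
            return false; // The serialization-deserialization process failed
        }
    }

    public static void main(String[] args) {
        System.out.println(isFaithfulCopy(new Date(0L)));
    }
}
```

A cross-release test would feed deserialize with bytes saved by an earlier release rather than bytes produced in the same run; this sketch covers only the single-release case.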

Implementing the Serializable interface is not a decision to be undertaken lightly. It offers real benefits: It is essential if a class is to participate in some framework that relies on serialization for object transmission or persistence. Furthermore, it greatly eases the use of a class as a component in another class that must implement Serializable. There are, however, many real costs associated with implementing Serializable. Each time you implement a class, weigh the costs against the benefits. As a rule of thumb, value classes such as Date and BigInteger should implement Serializable, as should most collection classes. Classes representing active entities, such as thread pools, should rarely implement Serializable. As of release 1.4, there is an XML-based JavaBeans persistence mechanism, so it is no longer necessary for Beans to implement Serializable.

Classes designed for inheritance (Item 15) should rarely implement Serializable, and interfaces should rarely extend it. Violating this rule places a significant burden on anyone who extends the class or implements the interface. There are times when it is appropriate to violate the rule. For example, if a class or interface exists primarily to participate in some framework that requires all participants to implement Serializable, then it makes perfect sense for the class or interface to implement or extend Serializable.

There is one caveat regarding the decision not to implement Serializable. If a class that is designed for inheritance is not serializable, it may be impossible to write a serializable subclass. Specifically, it will be impossible if the superclass does not provide an accessible parameterless constructor. Therefore you should consider providing a parameterless constructor on nonserializable classes designed for inheritance. Often this requires no effort because many classes designed for inheritance have no state, but this is not always the case.
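The failure is easy to observe. In the sketch below, Base, Derived, and NoArgConstructorDemo are hypothetical names. Derived compiles and declares itself Serializable, but because its nonserializable superclass has no parameterless constructor, the serialization runtime is expected to reject it with an InvalidClassException.

```java
import java.io.*;

// Hypothetical nonserializable superclass with no parameterless constructor
class Base {
    private final int x;
    Base(int x) { this.x = x; }
    int x() { return x; }
}

// Compiles, but cannot be deserialized: the runtime has no way to
// construct the Base portion of the object
class Derived extends Base implements Serializable {
    private static final long serialVersionUID = 1L;
    Derived(int x) { super(x); }
}

class NoArgConstructorDemo {
    // Returns true if the round trip is rejected with InvalidClassException
    static boolean roundTripRejected(Object o) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
                out.writeObject(o);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray()))) {
                in.readObject();
            }
            return false;
        } catch (InvalidClassException e) {
            return true;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTripRejected(new Derived(1)));
    }
}
```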

It is best to create objects with all of their invariants already established (Item 13). If client-provided information is required to establish these invariants, this precludes the use of a parameterless constructor. Naively adding a parameterless constructor and an initialization method to a class whose remaining constructors establish its invariants would complicate the class's state-space, increasing the likelihood of error.

Here is a way to add a parameterless constructor to a nonserializable extendable class that avoids these deficiencies. Suppose the class has one constructor:

public AbstractFoo(int x, int y) { ... }

The following transformation adds a protected parameterless constructor and an initialization method. The initialization method has the same parameters as the normal constructor and establishes the same invariants:

// Nonserializable stateful class allowing serializable subclass
public abstract class AbstractFoo {
    private int x, y; // The state
    private boolean initialized = false;

    public AbstractFoo(int x, int y) { initialize(x, y); }

    /**
     * This constructor and the following method allow subclass's
     * readObject method to initialize our internal state.
     */
    protected AbstractFoo() { }

    protected final void initialize(int x, int y) {
        if (initialized)
            throw new IllegalStateException(
                "Already initialized");
        this.x = x;
        this.y = y;
        ... // Do anything else the original constructor did
        initialized = true;
    }

    /**
     * These methods provide access to internal state so it can
     * be manually serialized by subclass's writeObject method.
     */
    protected final int getX() { return x; }
    protected final int getY() { return y; }

    // Must be called by all public instance methods
    private void checkInit() throws IllegalStateException {
        if (!initialized)
            throw new IllegalStateException("Uninitialized");
    }
    ... // Remainder omitted
}

All instance methods in AbstractFoo must invoke checkInit before going about their business. This ensures that method invocations fail quickly and cleanly if a poorly written subclass fails to initialize an instance. With this mechanism in place, it is reasonably straightforward to implement a serializable subclass:

// Serializable subclass of nonserializable stateful class
public class Foo extends AbstractFoo implements Serializable {
    private void readObject(ObjectInputStream s)
            throws IOException, ClassNotFoundException {
        s.defaultReadObject();

        // Manually deserialize and initialize superclass state
        int x = s.readInt();
        int y = s.readInt();
        initialize(x, y);
    }

    private void writeObject(ObjectOutputStream s)
            throws IOException {
        s.defaultWriteObject();

        // Manually serialize superclass state
        s.writeInt(getX());
        s.writeInt(getY());
    }

    // Constructor does not use any of the fancy mechanism
    public Foo(int x, int y) { super(x, y); }
}

Inner classes (Item 18) should rarely, if ever, implement Serializable. They use compiler-generated synthetic fields to store references to enclosing instances and to store values of local variables from enclosing scopes. How these fields correspond to the class definition is unspecified, as are the names of anonymous and local classes. Therefore, the default serialized form of an inner class is ill-defined. A static member class can, however, implement Serializable.
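The hidden reference to the enclosing instance can be seen in a small experiment. Outer, Inner, Nested, and InnerClassDemo are hypothetical names. The sketch assumes the compiler stores the enclosing instance in a synthetic field of Inner, which is why the inner class deliberately uses a field of its enclosing instance.

```java
import java.io.*;

// Hypothetical enclosing class that is not itself serializable
class Outer {
    private final int base = 10;

    // Inner class: carries a hidden reference to its Outer instance
    class Inner implements Serializable {
        private static final long serialVersionUID = 1L;
        int value() { return base + 1; } // Uses the enclosing instance
    }

    // Static member class: has no enclosing instance
    static class Nested implements Serializable {
        private static final long serialVersionUID = 1L;
        int value = 11;
    }
}

class InnerClassDemo {
    // Returns false if serialization fails with NotSerializableException
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out =
                new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new Outer().new Inner()));
        System.out.println(canSerialize(new Outer.Nested()));
    }
}
```

Serializing the Inner instance drags the nonserializable Outer instance into the object graph, while the static member class serializes on its own.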

To summarize, the ease of implementing Serializable is specious. Unless a class is to be thrown away after a short period of use, implementing Serializable is a serious commitment that should be made with care. Extra caution is warranted if a class is designed for inheritance. For such classes, an intermediate design point between implementing Serializable and prohibiting it in subclasses is to provide an accessible parameterless constructor. This design point permits, but does not require, subclasses to implement Serializable.

Item 55: Consider using a custom serialized form

When you are producing a class under time pressure, it is generally appropriate to concentrate your efforts on designing the best API. Sometimes this means releasing a “throwaway” implementation, which you know you'll replace in a future release. Normally this is not a problem, but if the class implements Serializable and uses the default serialized form, you'll never be able to escape completely from the throwaway implementation. It will dictate the serialized form forever. This is not a theoretical problem. It happened to several classes in the Java platform libraries, such as BigInteger.

Do not accept the default serialized form without first considering whether it is appropriate. Accepting the default serialized form should be a conscious decision on your part that this encoding is reasonable from the standpoint of flexibility, performance, and correctness. Generally speaking, you should accept the default serialized form only if it is largely identical to the encoding that you would choose if you were designing a custom serialized form.

The default serialized form of an object is a reasonably efficient encoding of the physical representation of the object graph rooted at the object. In other words, it describes the data contained in the object and in every object that is reachable from this object. It also describes the topology by which all of these objects are interlinked. The ideal serialized form of an object contains only the logical data represented by the object. It is independent of the physical representation.

The default serialized form is likely to be appropriate if an object's physical representation is identical to its logical content. For example, the default serialized form would be reasonable for the following class, which represents a person's name:

// Good candidate for default serialized form
public class Name implements Serializable {
    /**
     * Last name.  Must be non-null.
     * @serial
     */
    private String lastName;

    /**
     * First name.  Must be non-null.
     * @serial
     */
    private String firstName;
    /**
     * Middle initial, or '\u0000' if name lacks middle initial.
     * @serial
     */
    private char   middleInitial;

    ... // Remainder omitted
}

Logically speaking, a name consists of two strings that represent a last name and first name and a character that represents a middle initial. The instance fields in Name precisely mirror this logical content.

Even if you decide that the default serialized form is appropriate, you often must provide a readObject method to ensure invariants and security. In the case of Name, the readObject method could ensure that lastName and firstName were non-null. This issue is discussed at length in Item 56.

Note that there are documentation comments on the lastName, firstName, and middleInitial fields, even though they are private. That is because these private fields define a public API, the serialized form of the class, and this public API must be documented. The presence of the @serial tag tells the Javadoc utility to place this documentation on a special page that documents serialized forms.
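A readObject method of the kind suggested above for Name might look like the following sketch. The constructor, the toString method, and the NameDemo helper are assumptions added so that the example is self-contained; only the three fields come from the class shown earlier.

```java
import java.io.*;

class Name implements Serializable {
    private static final long serialVersionUID = 1L;

    private String lastName;      // Must be non-null
    private String firstName;     // Must be non-null
    private char   middleInitial; // '\u0000' if there is no middle initial

    Name(String lastName, String firstName, char middleInitial) {
        if (lastName == null || firstName == null)
            throw new NullPointerException("name component is null");
        this.lastName = lastName;
        this.firstName = firstName;
        this.middleInitial = middleInitial;
    }

    // Uses the default serialized form, then enforces the same
    // invariants that the constructor enforces
    private void readObject(ObjectInputStream s)
            throws IOException, ClassNotFoundException {
        s.defaultReadObject();
        if (lastName == null || firstName == null)
            throw new InvalidObjectException("name component is null");
    }

    public String toString() {
        return middleInitial == '\u0000'
            ? firstName + " " + lastName
            : firstName + " " + middleInitial + ". " + lastName;
    }
}

class NameDemo {
    // Serializes and then deserializes n in memory
    static Name roundTrip(Name n) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
                out.writeObject(n);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray()))) {
                return (Name) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new Name("Doe", "Jane", 'Q')));
    }
}
```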

Near the opposite end of the spectrum from Name, consider the following class, which represents a list of strings (ignoring for the moment that you'd be better off using one of the standard List implementations in the library):

// Awful candidate for default serialized form
public class StringList implements Serializable {
    private int size = 0;
    private Entry head = null;

    private static class Entry implements Serializable {
        String data;
        Entry  next;
        Entry  previous;
    }

    ... // Remainder omitted
}

Logically speaking, this class represents a sequence of strings. Physically, it represents the sequence as a doubly linked list. If you accept the default serialized form, the serialized form will painstakingly mirror every entry in the linked list and all the links between the entries, in both directions.

Using the default serialized form when an object's physical representation differs substantially from its logical data content has four disadvantages:

  • It permanently ties the exported API to the internal representation. In the above example, the private StringList.Entry class becomes part of the public API. If the representation is changed in a future release, the StringList class will still need to accept the linked-list representation on input and generate it on output. The class will never be rid of the code to manipulate linked lists, even if it doesn't use them any more.

  • It can consume excessive space. In the above example, the serialized form unnecessarily represents each entry in the linked list and all the links. These entries and links are mere implementation details not worthy of inclusion in the serialized form. Because the serialized form is excessively large, writing it to disk or sending it across the network will be excessively slow.

  • It can consume excessive time. The serialization logic has no knowledge of the topology of the object graph, so it must go through an expensive graph traversal. In the example above, it would be sufficient simply to follow the next references.

  • It can cause stack overflows. The default serialization procedure performs a recursive traversal of the object graph, which can cause stack overflows even for moderately sized object graphs. Serializing a StringList instance with 1200 elements causes the stack to overflow on my machine. The number of elements required to cause this problem may vary depending on the JVM implementation; some implementations may not have this problem at all.

A reasonable serialized form for StringList is simply the number of strings in the list, followed by the strings themselves. This constitutes the logical data represented by a StringList, stripped of the details of its physical representation. Here is a revised version of StringList containing writeObject and readObject methods implementing this serialized form. As a reminder, the transient modifier indicates that an instance field is to be omitted from a class's default serialized form:

// StringList with a reasonable custom serialized form
public class StringList implements Serializable {
    private transient int size   = 0;
    private transient Entry head = null;

    // No longer Serializable!
    private static class Entry {
        String data;
        Entry  next;
        Entry  previous;
    }

    // Appends the specified string to the list
    public void add(String s) { ... }

    /**
     * Serialize this <tt>StringList</tt> instance.
     *
     * @serialData The size of the list (the number of strings
     * it contains) is emitted (<tt>int</tt>), followed by all of
     * its elements (each a <tt>String</tt>), in the proper
     * sequence.
     */
    private void writeObject(ObjectOutputStream s)
            throws IOException {
        s.defaultWriteObject();
        s.writeInt(size);

        // Write out all elements in the proper order.
        for (Entry e = head; e != null; e = e.next)
            s.writeObject(e.data);
    }

    private void readObject(ObjectInputStream s)
            throws IOException, ClassNotFoundException {
        s.defaultReadObject();
        int size = s.readInt();

        // Read in all elements and insert them in list
        for (int i = 0; i < size; i++)
            add((String)s.readObject());
    }

    ... // Remainder omitted
}

Note that the writeObject method invokes defaultWriteObject and the readObject method invokes defaultReadObject, even though all of StringList's fields are transient. If all instance fields are transient, it is technically permissible to dispense with invoking defaultWriteObject and defaultReadObject, but it is not recommended. Even if all instance fields are transient, invoking defaultWriteObject affects the serialized form, resulting in greatly enhanced flexibility. The resulting serialized form makes it possible to add nontransient instance fields in a later release while preserving backward and forward compatibility. If an instance is serialized in a later version and deserialized in an earlier version, the added fields will be ignored. Had the earlier version's readObject method failed to invoke defaultReadObject, the deserialization would fail with a StreamCorruptedException.

Note that there is a documentation comment on the writeObject method, even though it is private. This is analogous to the documentation comment on the private fields in the Name class. This private method defines a public API, the serialized form, and that public API should be documented. Like the @serial tag for fields, the @serialData tag for methods tells the Javadoc utility to place this documentation on the serialized forms page.

To lend some sense of scale to the earlier performance discussion, if the average string length is ten characters, the serialized form of the revised version of StringList occupies about half as much space as the serialized form of the original. On my machine, serializing the revised version of StringList is about two and one half times as fast as serializing the original version, again with a string length of ten. Finally, there is no stack overflow problem in the revised form, hence no practical upper limit to the size of a StringList that can be serialized.
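The space difference can be checked with a rough experiment along the following lines. DefaultFormList and CustomFormList are simplified stand-ins for the two StringList versions (the tail field is an assumption added to keep add short), and SerializedSizeDemo is a hypothetical helper. The exact byte counts depend on the class names and string contents, so the sketch only compares them.

```java
import java.io.*;

// Default serialized form: the stream mirrors the doubly linked structure
class DefaultFormList implements Serializable {
    private static final long serialVersionUID = 1L;
    private int size = 0;
    private Entry head = null;
    private Entry tail = null;

    private static class Entry implements Serializable {
        private static final long serialVersionUID = 1L;
        String data; Entry next; Entry previous;
    }

    void add(String s) {
        Entry e = new Entry();
        e.data = s;
        e.previous = tail;
        if (tail == null) head = e; else tail.next = e;
        tail = e;
        size++;
    }
}

// Custom serialized form: the size followed by the strings themselves
class CustomFormList implements Serializable {
    private static final long serialVersionUID = 1L;
    private transient int size = 0;
    private transient Entry head = null;
    private transient Entry tail = null;

    private static class Entry { String data; Entry next; Entry previous; }

    void add(String s) {
        Entry e = new Entry();
        e.data = s;
        e.previous = tail;
        if (tail == null) head = e; else tail.next = e;
        tail = e;
        size++;
    }

    private void writeObject(ObjectOutputStream s) throws IOException {
        s.defaultWriteObject();
        s.writeInt(size);
        for (Entry e = head; e != null; e = e.next)
            s.writeObject(e.data);
    }

    private void readObject(ObjectInputStream s)
            throws IOException, ClassNotFoundException {
        s.defaultReadObject();
        int n = s.readInt();
        for (int i = 0; i < n; i++)
            add((String) s.readObject());
    }
}

class SerializedSizeDemo {
    // Number of bytes produced by serializing o in memory
    static int serializedSize(Object o) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
                out.writeObject(o);
            }
            return bos.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        DefaultFormList d = new DefaultFormList();
        CustomFormList c = new CustomFormList();
        for (int i = 0; i < 100; i++) {
            String s = String.format("string%04d", i); // Ten characters
            d.add(s);
            c.add(s);
        }
        System.out.println("default: " + serializedSize(d) + " bytes");
        System.out.println("custom:  " + serializedSize(c) + " bytes");
    }
}
```

The list is kept to 100 elements so that the recursive traversal of the default form stays well within the stack.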

While the default serialized form would be bad for StringList, there are classes for which it would be far worse. For StringList, the default serialized form is inflexible and performs badly, but it is correct in the sense that serializing and deserializing a StringList instance yields a faithful copy of the original object with all of its invariants intact. This is not the case for any object whose invariants are tied to implementation-specific details.

For example, consider the case of a hash table. The physical representation is a sequence of hash buckets containing key-value entries. Which bucket an entry is placed in is a function of the hash code of the key, which is not, in general, guaranteed to be the same from JVM implementation to JVM implementation. In fact, it isn't even guaranteed to be the same from run to run on the same JVM implementation. Therefore accepting the default serialized form for a hash table would constitute a serious bug. Serializing and deserializing the hash table could yield an object whose invariants were seriously corrupt.

Whether or not you use the default serialized form, every instance field that is not labeled transient will be serialized when the defaultWriteObject method is invoked. Therefore every instance field that can be made transient should be made so. This includes redundant fields, whose values can be computed from “primary data fields,” such as a cached hash value. It also includes fields whose values are tied to one particular run of the JVM, such as a long field representing a pointer to a native data structure. Before deciding to make a field nontransient, convince yourself that its value is part of the logical state of the object. If you use a custom serialized form, most or all of the instance fields should be labeled transient, as in the StringList example shown above.

If you are using the default serialized form and you have labeled one or more fields transient, remember that these fields will be initialized to their default values when an instance is deserialized: null for object reference fields, zero for numeric primitive fields, and false for boolean fields [JLS, 4.5.5]. If these values are unacceptable for any transient fields, you must provide a readObject method that invokes the defaultReadObject method and then restores transient fields to acceptable values (Item 56). Alternatively, these fields can be lazily initialized the first time they are used.
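As a small sketch of restoring a transient field, consider a hypothetical Point class that caches a label derived from its coordinates. Point, its label field, and PointDemo are invented for this example. The readObject method invokes defaultReadObject and then recomputes the field, which would otherwise be null after deserialization.

```java
import java.io.*;

// Hypothetical value class with a derived field excluded from the stream
class Point implements Serializable {
    private static final long serialVersionUID = 1L;

    private final int x;
    private final int y;
    private transient String label; // Derived from x and y

    Point(int x, int y) {
        this.x = x;
        this.y = y;
        this.label = makeLabel();
    }

    private String makeLabel() { return "(" + x + ", " + y + ")"; }

    private void readObject(ObjectInputStream s)
            throws IOException, ClassNotFoundException {
        s.defaultReadObject(); // Restores x and y; label is still null
        label = makeLabel();   // Restore the transient field
    }

    String label() { return label; }
}

class PointDemo {
    // Serializes and then deserializes p in memory
    static Point roundTrip(Point p) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
                out.writeObject(p);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray()))) {
                return (Point) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new Point(3, 4)).label());
    }
}
```

The lazy alternative would leave readObject out and have label() compute the value on first use when it finds the field null.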

Regardless of what serialized form you choose, declare an explicit serial version UID in every serializable class you write. This eliminates the serial version UID as a potential source of incompatibility (Item 54). There is also a small performance benefit. If no serial version UID is provided, an expensive computation is required to generate one at run time.

Declaring a serial version UID is simple. Just add this line to your class:

private static final long serialVersionUID = randomLongValue;

It doesn't much matter which value you choose for randomLongValue. Common practice dictates that you generate the value by running the serialver utility on the class, but it's also fine to pick a number out of thin air. If you ever want to make a new version of the class that is incompatible with existing versions, merely change the value in the declaration. This will cause attempts to deserialize serialized instances of previous versions to fail with an InvalidClassException.
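To confirm that a declared value is the one the serialization runtime will use, the class descriptor can be queried with ObjectStreamClass. In this sketch, Account, its field, the value 42L, and SerialVersionUidDemo are arbitrary choices made for illustration.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical serializable class with an explicit serial version UID
class Account implements Serializable {
    private static final long serialVersionUID = 42L; // Arbitrary value
    private String owner = "";
}

class SerialVersionUidDemo {
    // Returns the serial version UID the runtime associates with c,
    // which must be a serializable class
    static long uidOf(Class<?> c) {
        return ObjectStreamClass.lookup(c).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(uidOf(Account.class));
    }
}
```

Because the value is declared explicitly, adding a method or field to Account in a later release leaves the reported UID unchanged.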

To summarize, when you have decided that a class should be serializable (Item 54), think hard about what the serialized form should be. Only use the default serialized form if it is a reasonable description of the logical state of the object; otherwise design a custom serialized form that aptly describes the object. You should allocate as much time to designing the serialized form of a class as you allocate to designing its exported methods. Just as you cannot eliminate exported methods from future versions, you cannot eliminate fields from the serialized form; they must be preserved forever to ensure serialization compatibility. Choosing the wrong serialized form can have permanent, negative impact on the complexity and performance of a class.

Item 56: Write readObject methods defensively

Item 24 contains an immutable date-range class containing mutable private date fields. The class goes to great lengths to preserve its invariants and its immutability by defensively copying Date objects in its constructor and accessors. Here is the class:

// Immutable class that uses defensive copying
public final class Period {
    private final Date start;
    private final Date end;

    /**
     * @param start the beginning of the period.
     * @param end the end of the period; must not precede start.
     * @throws IllegalArgumentException if start is after end.
     * @throws NullPointerException if start or end is null.
     */
    public Period(Date start, Date end) {
        this.start = new Date(start.getTime());
        this.end   = new Date(end.getTime());

        if (this.start.compareTo(this.end) > 0)
            throw new IllegalArgumentException(start + " > " + end);
    }

    public Date start() { return (Date) start.clone(); }

    public Date end() { return (Date) end.clone(); }

    public String toString() { return start + " - " + end; }

    ... // Remainder omitted
}

Suppose you decide that you want this class to be serializable. Because the physical representation of a Period object exactly mirrors its logical data content, it is not unreasonable to use the default serialized form (Item 55). Therefore, it might seem that all you have to do to make the class serializable is to add the words “implements Serializable” to the class declaration. If you did so, however, the class would no longer guarantee its critical invariants.

The problem is that the readObject method is effectively another public constructor, and it demands all of the same care as any other constructor. Just as a constructor must check its arguments for validity (Item 23) and make defensive copies of parameters where appropriate (Item 24), so must a readObject method. If a readObject method fails to do either of these things, it is a relatively simple matter for an attacker to violate the class's invariants.

Loosely speaking, readObject is a constructor that takes a byte stream as its sole parameter. In normal use, the byte stream is generated by serializing a normally constructed instance. The problem arises when readObject is presented with a byte stream that is artificially constructed to generate an object that violates the invariants of its class. Assume that we simply added “implements Serializable” to the class declaration for Period. This ugly program generates a Period instance whose end precedes its start:

public class BogusPeriod {
    // Byte stream could not have come from real Period instance
    private static final byte[] serializedForm = new byte[] {
    (byte)0xac, (byte)0xed, 0x00, 0x05, 0x73, 0x72, 0x00, 0x06,
    0x50, 0x65, 0x72, 0x69, 0x6f, 0x64, 0x40, 0x7e, (byte)0xf8,
    0x2b, 0x4f, 0x46, (byte)0xc0, (byte)0xf4, 0x02, 0x00, 0x02,
    0x4c, 0x00, 0x03, 0x65, 0x6e, 0x64, 0x74, 0x00, 0x10, 0x4c,
    0x6a, 0x61, 0x76, 0x61, 0x2f, 0x75, 0x74, 0x69, 0x6c, 0x2f,
    0x44, 0x61, 0x74, 0x65, 0x3b, 0x4c, 0x00, 0x05, 0x73, 0x74,
    0x61, 0x72, 0x74, 0x71, 0x00, 0x7e, 0x00, 0x01, 0x78, 0x70,
    0x73, 0x72, 0x00, 0x0e, 0x6a, 0x61, 0x76, 0x61, 0x2e, 0x75,
    0x74, 0x69, 0x6c, 0x2e, 0x44, 0x61, 0x74, 0x65, 0x68, 0x6a,
    (byte)0x81, 0x01, 0x4b, 0x59, 0x74, 0x19, 0x03, 0x00, 0x00,
    0x78, 0x70, 0x77, 0x08, 0x00, 0x00, 0x00, 0x66, (byte)0xdf,
    0x6e, 0x1e, 0x00, 0x78, 0x73, 0x71, 0x00, 0x7e, 0x00, 0x03,
    0x77, 0x08, 0x00, 0x00, 0x00, (byte)0xd5, 0x17, 0x69, 0x22,
    0x00, 0x78 };

    public static void main(String[] args) {
        Period p = (Period) deserialize(serializedForm);
        System.out.println(p);
    }

    // Returns the object with the specified serialized form
    public static Object deserialize(byte[] sf) {
        try {
            InputStream is = new ByteArrayInputStream(sf);
            ObjectInputStream ois = new ObjectInputStream(is);
            return ois.readObject();
        } catch (Exception e) {
            throw new IllegalArgumentException(e.toString());
        }
    }
}

The byte array literal used to initialize serializedForm was generated by serializing a normal Period instance and hand-editing the resulting byte stream. The details of the stream are unimportant to the example, but if you're curious, the serialization byte stream format is described in the Java Object Serialization Specification [Serialization, 6]. If you run this program, it prints “Fri Jan 01 12:00:00 PST 1999 - Sun Jan 01 12:00:00 PST 1984.” Making Period serializable enabled us to create an object that violates its class invariants. To fix this problem, provide a readObject method for Period that calls defaultReadObject and then checks the validity of the deserialized object. If the validity check fails, the readObject method throws an InvalidObjectException, preventing the deserialization from completing:

private void readObject(ObjectInputStream s)
        throws IOException, ClassNotFoundException {
    s.defaultReadObject();

    // Check that our invariants are satisfied
    if (start.compareTo(end) > 0)
        throw new InvalidObjectException(start +" after "+ end);
}

While this fix prevents an attacker from creating an invalid Period instance, there is a more subtle problem still lurking. It is possible to create a mutable Period instance by fabricating a byte stream that begins with a byte stream representing a valid Period instance and then appends extra references to the private Date fields internal to the Period instance. The attacker reads the Period instance from the ObjectInputStream and then reads the “rogue object references” that were appended to the stream. These references give the attacker access to the objects referenced by the private Date fields within the Period object. By mutating these Date instances, the attacker can mutate the Period instance. The following class demonstrates this attack:

public class MutablePeriod {
    // A period instance
    public final Period period;

    // period's start field, to which we shouldn't have access
    public final Date start;

    // period's end field, to which we shouldn't have access
    public final Date end;

    public MutablePeriod() {
        try {
            ByteArrayOutputStream bos =
                new ByteArrayOutputStream();
            ObjectOutputStream out =
                new ObjectOutputStream(bos);

            // Serialize a valid Period instance
            out.writeObject(new Period(new Date(), new Date()));

            /*
             * Append rogue "previous object refs" for internal
             * Date fields in Period. For details, see "Java
             * Object Serialization Specification," Section 6.4.
             */
            byte[] ref = { 0x71, 0, 0x7e, 0, 5 }; // Ref #5
            bos.write(ref); // The start field
            ref[4] = 4;     // Ref # 4
            bos.write(ref); // The end field

            // Deserialize Period and "stolen" Date references
            ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()));
            period = (Period) in.readObject();
            start  = (Date)   in.readObject();
            end    = (Date)   in.readObject();
        } catch (Exception e) {
            throw new RuntimeException(e.toString());
        }
    }
}

To see the attack in action, run the following program:

public static void main(String[] args) {
    MutablePeriod mp = new MutablePeriod();
    Period p = mp.period;
    Date pEnd = mp.end;

    // Let's turn back the clock
    pEnd.setYear(78);
    System.out.println(p);

    // Bring back the 60's!
    pEnd.setYear(69);
    System.out.println(p);
}

Running this program produces the following output:

Wed Mar 07 23:30:01 PST 2001 - Tue Mar 07 23:30:01 PST 1978
Wed Mar 07 23:30:01 PST 2001 - Fri Mar 07 23:30:01 PST 1969

While the Period instance is created with its invariants intact, it is possible to modify its internal components at will. Once in possession of a mutable Period instance, an attacker might cause great harm by passing the instance on to a class that depends on Period's immutability for its security. This is not so farfetched: There are classes that depend on String's immutability for their security.

The source of the problem is that Period's readObject method is not doing enough defensive copying. When an object is deserialized, it is critical to defensively copy any field containing an object reference that a client must not possess. Therefore every serializable immutable class containing private mutable components must defensively copy these components in its readObject method. The following readObject method suffices to ensure Period's invariants and to maintain its immutability:

private void readObject(ObjectInputStream s)
        throws IOException, ClassNotFoundException {
    s.defaultReadObject();

    // Defensively copy our mutable components
    start = new Date(start.getTime());
    end   = new Date(end.getTime());

    // Check that our invariants are satisfied
    if (start.compareTo(end) > 0)
        throw new InvalidObjectException(start +" after "+ end);
}

Note that the defensive copy is performed prior to the validity check and that we did not use Date's clone method to perform the defensive copy. Both of these details are required to protect Period against attack (Item 24). Note also that defensive copying is not possible for final fields. To use the readObject method, we must make the start and end fields nonfinal. This is unfortunate, but it is clearly the lesser of two evils. With the new readObject method in place and the final modifier removed from the start and end fields, the MutablePeriod class is rendered ineffective. The above attack program now generates this output:

Thu Mar 08 00:03:45 PST 2001 - Thu Mar 08 00:03:45 PST 2001
Thu Mar 08 00:03:45 PST 2001 - Thu Mar 08 00:03:45 PST 2001
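
For readers who want to try this, the following is a minimal, self-contained sketch rather than the book's actual Period class. The class name DefensivePeriod, the serialVersionUID value, and the endIsExposed helper are assumptions introduced for illustration. The helper replays the rogue-reference trick from MutablePeriod against a class whose readObject method copies defensively, and reports whether the stolen Date still aliases the period's end:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Date;

// Sketch only: DefensivePeriod is a hypothetical stand-in for Period.
final class DefensivePeriod implements Serializable {
    private static final long serialVersionUID = 1L; // arbitrary for this sketch

    // Nonfinal so that readObject can assign the defensive copies
    private Date start;
    private Date end;

    DefensivePeriod(Date start, Date end) {
        this.start = new Date(start.getTime());
        this.end   = new Date(end.getTime());
        if (this.start.compareTo(this.end) > 0)
            throw new IllegalArgumentException(start + " after " + end);
    }

    Date end() { return new Date(end.getTime()); }

    private void readObject(ObjectInputStream s)
            throws IOException, ClassNotFoundException {
        s.defaultReadObject();

        // Defensively copy first, then validate the copies
        start = new Date(start.getTime());
        end   = new Date(end.getTime());
        if (start.compareTo(end) > 0)
            throw new InvalidObjectException(start + " after " + end);
    }

    // Hypothetical helper: returns true if mutating a Date obtained
    // through a rogue back reference changes the period's end.
    static boolean endIsExposed() {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bos);
            out.writeObject(new DefensivePeriod(new Date(), new Date()));
            out.flush();

            // Rogue back reference to the end Date (Ref #4, as in MutablePeriod)
            byte[] ref = { 0x71, 0, 0x7e, 0, 4 };
            bos.write(ref);

            ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()));
            DefensivePeriod p = (DefensivePeriod) in.readObject();
            Date stolen = (Date) in.readObject();

            long before = p.end().getTime();
            stolen.setTime(0L); // would rewrite p's end if it were aliased
            return p.end().getTime() != before;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With the defensive copies in place, the stolen reference points at the Date that was read from the stream, not at the copy held by the instance, so endIsExposed should report false.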

There is a simple litmus test for deciding whether the default readObject method is acceptable. Would you feel comfortable adding a public constructor that took as parameters the values for each nontransient field in your object and stored the values in the fields with no validation whatsoever? If you can't answer yes to this question, then you must provide an explicit readObject method, and it must perform all of the validity checking and defensive copying that would be required of a constructor.

There is one other similarity between readObject methods and constructors, concerning nonfinal serializable classes. A readObject method must not invoke an overridable method, directly or indirectly (Item 15). If this rule is violated and the method is overridden, the overriding method will run before the subclass's state has been deserialized. A program failure is likely to result.
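
The failure mode can be seen in a small sketch. The classes NamedBase and NamedSub, the describe method, and the seenDuringRead field are hypothetical names invented for this illustration. The superclass's readObject method calls an overridable method, which then runs before the subclass's name field has been restored:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical classes that illustrate the mistake; do not copy the pattern.
class NamedBase implements Serializable {
    private static final long serialVersionUID = 1L;

    // What the overridable method returned while readObject was running
    transient String seenDuringRead;

    String describe() { return "base"; }

    private void readObject(ObjectInputStream s)
            throws IOException, ClassNotFoundException {
        s.defaultReadObject();
        seenDuringRead = describe(); // BROKEN: invokes an overridable method
    }
}

class NamedSub extends NamedBase {
    private static final long serialVersionUID = 1L;

    private final String name;

    NamedSub(String name) { this.name = name; }

    // Runs during NamedBase.readObject, before name has been deserialized
    @Override String describe() { return "sub:" + name; }

    // Serializes and deserializes an instance in memory
    static NamedSub roundTrip(NamedSub original) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bos);
            out.writeObject(original);
            out.close();
            ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()));
            return (NamedSub) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After NamedSub.roundTrip(new NamedSub("x")), the copy's seenDuringRead field holds "sub:null" even though describe() returns "sub:x" once deserialization has finished.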

To summarize, any time you write a readObject method, adopt the mind-set that you are writing a public constructor that must produce a valid instance regardless of what byte stream it is given. Do not assume that the byte stream represents an actual serialized instance. While the examples in this item concern a class that uses the default serialized form, all of the issues that were raised apply equally to classes with custom serialized forms. Here, in summary form, are the guidelines for writing a bulletproof readObject method:

  • For classes with object reference fields that must remain private, defensively copy each object that is to be stored in such a field. Mutable components of immutable classes fall into this category.

  • For classes with invariants, check invariants and throw an InvalidObjectException if a check fails. The checks should follow any defensive copying.

  • If an entire object graph must be validated after it is deserialized, the ObjectInputValidation interface should be used. The use of this interface is beyond the scope of this book. A sample use may be found in The Java Class Libraries, Second Edition, Volume 1 [Chan98].

  • Do not invoke any overridable methods in the class, directly or indirectly.
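
Although a full treatment of ObjectInputValidation is out of scope here, a rough sketch may help show where it fits. The GraphNode class, its corruptParent back door, and the deserializesCleanly helper are hypothetical names invented for this illustration. Each node registers itself from readObject, and the check that spans two objects is deferred to validateObject, which the stream invokes after the whole graph has been restored:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectInputValidation;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical tree node whose invariant spans two objects:
// every node must appear in its parent's list of children.
final class GraphNode implements Serializable, ObjectInputValidation {
    private static final long serialVersionUID = 1L;

    private GraphNode parent; // null for the root
    private final List<GraphNode> children = new ArrayList<>();

    GraphNode addChild() {
        GraphNode child = new GraphNode();
        child.parent = this;
        children.add(child);
        return child;
    }

    // Back door that exists only to build an inconsistent graph for the demo
    void corruptParent(GraphNode other) { parent = other; }

    private void readObject(ObjectInputStream s)
            throws IOException, ClassNotFoundException {
        s.defaultReadObject();
        // The parent may be only partly deserialized at this point,
        // so defer the cross-object check until the graph is complete.
        s.registerValidation(this, 0);
    }

    @Override
    public void validateObject() throws InvalidObjectException {
        if (parent != null && !parent.children.contains(this))
            throw new InvalidObjectException("parent does not list this node");
    }

    // Returns false if a validateObject callback rejects the graph
    static boolean deserializesCleanly(GraphNode root) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bos);
            out.writeObject(root);
            out.close();
            new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray())).readObject();
            return true;
        } catch (InvalidObjectException e) {
            return false;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The point of the design is timing: when a child's readObject runs, the parent's children field has typically not been assigned yet, so the same check performed directly in readObject would be unreliable.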

The readResolve method may be used as an alternative to a defensive readObject method. This alternative is discussed in Item 57.

Item 57: Provide a readResolve method when necessary

Item 2 describes the Singleton pattern and gives the following example of a singleton class. This class restricts access to its constructor to ensure that only a single instance is ever created:

public class Elvis {
    public static final Elvis INSTANCE = new Elvis();

    private Elvis() {
        ...
    }

    ...  // Remainder omitted
}

As noted in Item 2, this class would no longer be a singleton if the words “implements Serializable” were added to its declaration. It doesn't matter whether the class uses the default serialized form or a custom serialized form (Item 55), nor does it matter whether the class provides an explicit readObject method (Item 56). Any readObject method, whether explicit or default, returns a newly created instance, which will not be the same instance that was created at class initialization time. Prior to the 1.2 release, it was impossible to write a serializable singleton class.

In the 1.2 release, the readResolve feature was added to the serialization facility [Serialization, 3.6]. If the class of an object being deserialized defines a readResolve method with the proper declaration, this method is invoked on the newly created object after it is deserialized. The object reference returned by this method is then returned in lieu of the newly created object. In most uses of this feature, no reference to the newly created object is retained; the object is effectively stillborn, immediately becoming eligible for garbage collection.

If the Elvis class is made to implement Serializable, the following readResolve method suffices to guarantee the singleton property:

private Object readResolve() throws ObjectStreamException {
    // Return the one true Elvis and let the garbage collector
    // take care of the Elvis impersonator.
    return INSTANCE;
}

This method ignores the deserialized object, simply returning the distinguished Elvis instance created when the class was initialized. Therefore the serialized form of an Elvis instance need not contain any real data; all instance fields should be marked transient. This applies not only to Elvis, but to all singletons.
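
The effect can be checked with a short sketch. The class names SerialElvis, NaiveElvis, and ElvisDemo are assumptions made for this illustration and are not part of the original example. One singleton declares readResolve and the other does not, and each is serialized and deserialized in memory:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;

// Hypothetical serializable singleton that preserves its instance control
final class SerialElvis implements Serializable {
    private static final long serialVersionUID = 1L;
    static final SerialElvis INSTANCE = new SerialElvis();

    private SerialElvis() { }

    private Object readResolve() throws ObjectStreamException {
        return INSTANCE; // discard the freshly deserialized impersonator
    }
}

// The same class without readResolve, for contrast
final class NaiveElvis implements Serializable {
    private static final long serialVersionUID = 1L;
    static final NaiveElvis INSTANCE = new NaiveElvis();

    private NaiveElvis() { }
}

final class ElvisDemo {
    // Serializes and deserializes an object in memory
    static Object roundTrip(Object o) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bos);
            out.writeObject(o);
            out.close();
            ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()));
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

ElvisDemo.roundTrip(SerialElvis.INSTANCE) returns the very same object, whereas ElvisDemo.roundTrip(NaiveElvis.INSTANCE) returns a second, distinct instance.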

A readResolve method is necessary not only for singletons, but for all other instance-controlled classes, in other words, for all classes that strictly control instance creation to maintain some invariant. Another example of an instance-controlled class is a typesafe enum (Item 21), whose readResolve method must return the canonical instance representing the specified enumeration constant. As a rule of thumb, if you are writing a serializable class that contains no public or protected constructors, consider whether it requires a readResolve method.
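
As a sketch of the typesafe enum case, assuming a class-based enum that assigns each constant an ordinal in declaration order: the class name CardSuit, the VALUES array, the range check, and the roundTrip helper are illustrative choices, not taken from Item 21. Only the ordinal is serialized, and readResolve maps it back to the canonical constant:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;

// Hypothetical class-based typesafe enum
final class CardSuit implements Serializable {
    private static final long serialVersionUID = 1L;

    // Must be declared before the constants so ordinals start at zero
    private static int nextOrdinal = 0;

    private final transient String name;       // not part of the serialized form
    private final int ordinal = nextOrdinal++; // the only serialized field

    private CardSuit(String name) { this.name = name; }

    @Override public String toString() { return name; }

    static final CardSuit CLUBS    = new CardSuit("clubs");
    static final CardSuit DIAMONDS = new CardSuit("diamonds");
    static final CardSuit HEARTS   = new CardSuit("hearts");
    static final CardSuit SPADES   = new CardSuit("spades");

    // Listed in ordinal order
    private static final CardSuit[] VALUES =
        { CLUBS, DIAMONDS, HEARTS, SPADES };

    private Object readResolve() throws ObjectStreamException {
        // Reject ordinals that no constant owns (a fabricated stream)
        if (ordinal < 0 || ordinal >= VALUES.length)
            throw new InvalidObjectException("bad ordinal: " + ordinal);
        return VALUES[ordinal]; // canonicalize
    }

    // Serializes and deserializes an object in memory
    static Object roundTrip(Object o) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bos);
            out.writeObject(o);
            out.close();
            ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()));
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the canonical constant is returned, comparisons with == keep working after deserialization: CardSuit.roundTrip(CardSuit.HEARTS) == CardSuit.HEARTS holds.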

A second use for the readResolve method is as a conservative alternative to the defensive readObject method recommended in Item 56. In this approach, all validity checks and defensive copying are eliminated from the readObject method in favor of the validity checks and defensive copying provided by a normal constructor. If the default serialized form is used, the readObject method may be eliminated entirely. As explained in Item 56, this allows a malicious client to create an instance with compromised invariants. However, the potentially compromised deserialized instance is never placed into active service; it is simply mined for inputs to a public constructor or static factory and discarded.

The beauty of this approach is that it virtually eliminates the extralinguistic component of serialization, making it impossible to violate any class invariants that were present before the class was made serializable. To make this technique concrete, the following readResolve method can be used in lieu of the defensive readObject method in the Period example in Item 56:

// The defensive readResolve idiom
private Object readResolve() throws ObjectStreamException {
    return new Period(start, end);
}

This readResolve method stops both of the attacks described in Item 56 dead in their tracks. The defensive readResolve idiom has several advantages over a defensive readObject. It is a mechanical technique for making a class serializable without putting its invariants at risk. It requires little code and little thought, and it is guaranteed to work. Finally, it eliminates the artificial restrictions that serialization places on the use of final fields.
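
To see the idiom in a complete class, here is a minimal sketch. The name ResolvedPeriod, the constructorCalls counter, and the roundTrip helper are hypothetical additions for illustration. The fields stay final, there is no readObject method, and the counter shows that every deserialized instance passes through the ordinary constructor:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;
import java.util.Date;

// Sketch only: ResolvedPeriod is a hypothetical stand-in for Period.
final class ResolvedPeriod implements Serializable {
    private static final long serialVersionUID = 1L; // arbitrary for this sketch

    // Instrumentation for the demonstration: counts constructor calls
    static int constructorCalls = 0;

    // The fields can remain final because no readObject reassigns them
    private final Date start;
    private final Date end;

    ResolvedPeriod(Date start, Date end) {
        constructorCalls++;
        this.start = new Date(start.getTime());
        this.end   = new Date(end.getTime());
        if (this.start.compareTo(this.end) > 0)
            throw new IllegalArgumentException(start + " after " + end);
    }

    Date start() { return new Date(start.getTime()); }
    Date end()   { return new Date(end.getTime()); }

    // The defensive readResolve idiom: the deserialized object only
    // supplies inputs to the constructor and is then discarded.
    private Object readResolve() throws ObjectStreamException {
        return new ResolvedPeriod(start, end);
    }

    // Serializes and deserializes an instance in memory
    static ResolvedPeriod roundTrip(ResolvedPeriod p) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bos);
            out.writeObject(p);
            out.close();
            ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()));
            return (ResolvedPeriod) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Each call to roundTrip increases constructorCalls by one, which reflects the extra object creation that the idiom entails.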

While the defensive readResolve idiom is not widely used, it merits serious consideration. Its major disadvantage is that it is not suitable for classes that permit inheritance outside of their own package. This is not an issue for immutable classes, as they are generally final (Item 13). A minor disadvantage of the idiom is that it slightly reduces deserialization performance because it entails creating an extra object. On my machine, it slows the deserialization of Period instances by about one percent when compared to a defensive readObject method.

The accessibility of the readResolve method is significant. If you place a readResolve method on a final class, such as a singleton, it should be private. If you place a readResolve method on a nonfinal class, you must carefully consider its accessibility. If it is private, it will not apply to any subclasses. If it is package-private, it will apply only to subclasses in the same package. If it is protected or public, it will apply to all subclasses that do not override it. If a readResolve method is protected or public and a subclass does not override it, deserializing a serialized subclass instance will produce a superclass instance, which is probably not what you want.
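
The difference between a private and a protected readResolve method can be observed with a small experiment. All class names below are hypothetical and exist only for this sketch. Each superclass declares a readResolve method that returns a fresh superclass instance, and an instance of each subclass is serialized and deserialized:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Private readResolve: not applied when a subclass instance is deserialized
class PrivateResolveBase implements Serializable {
    private static final long serialVersionUID = 1L;
    private Object readResolve() { return new PrivateResolveBase(); }
}

class PrivateResolveSub extends PrivateResolveBase {
    private static final long serialVersionUID = 1L;
}

// Protected readResolve: inherited by subclasses that do not override it
class ProtectedResolveBase implements Serializable {
    private static final long serialVersionUID = 1L;
    protected Object readResolve() { return new ProtectedResolveBase(); }
}

class ProtectedResolveSub extends ProtectedResolveBase {
    private static final long serialVersionUID = 1L;
}

final class ResolveDemo {
    // Serializes and deserializes an object in memory
    static Object roundTrip(Object o) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bos);
            out.writeObject(o);
            out.close();
            ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()));
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

ResolveDemo.roundTrip(new PrivateResolveSub()) yields a PrivateResolveSub, while ResolveDemo.roundTrip(new ProtectedResolveSub()) yields a plain ProtectedResolveBase, which is the surprise described above.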

The previous paragraph hints at the reason the readResolve method may not be substituted for a defensive readObject method in classes that permit inheritance. If the superclass's readResolve method were final, it would prevent subclass instances from being properly deserialized. If it were overridable, a malicious subclass could override it with a method returning a compromised instance.

To summarize, you must use a readResolve method to protect the “instance-control invariants” of singletons and other instance-controlled classes. In essence, the readResolve method turns the readObject method from a de facto public constructor into a de facto public static factory. The readResolve method is also useful as a simple alternative to a defensive readObject method for classes that prohibit inheritance outside their package.
