Chapter 11. Generic Types

 

The problem with people who have no vices is that generally you can be pretty sure they're going to have some pretty annoying virtues.

 
 --Elizabeth Taylor

In the Java programming language, the class Object forms the root of the class hierarchy, allowing other classes to deal with arbitrary types of objects by declaring that they work with Object. In Chapter 3 you saw a SingleLinkQueue class that used Cell objects to hold a reference to any Object you wanted in the queue. This provides great flexibility, but with some inconvenience: If you know that the elements in question are String objects, for example, then you have to cast the result of the remove method each time.

This use of Object as a generic reference is also potentially unsafe—there is nothing to stop the programmer from mistakenly adding a Number instead of a String, with the mistake only discovered at runtime when the cast applied to remove fails and a ClassCastException is thrown. To avoid these problems you could define different queue classes to store particular types of elements, such as SingleLinkStringQueue or SingleLinkNumberQueue, but this has many drawbacks: It is tedious, inefficient, and error-prone to duplicate code in this way; it is awkward to deal with an arbitrary kind of queue; and you could end up with numerous classes that are identical except for the return and parameter types of a few methods. Ideally, you would like to write a SingleLinkQueue that can be specialized to contain any particular kind of object and then, for each queue object, define which type it holds. That is where generic types come in.

To start, here's a new version of Cell declared as a generic class:

class Cell<E> {
    private Cell<E> next;
    private E element;
    public Cell(E element) {
        this.element = element;
    }
    public Cell(E element, Cell<E> next) {
        this.element = element;
        this.next = next;
    }
    public E getElement() {
        return element;
    }
    public void setElement(E element) {
        this.element = element;
    }
    public Cell<E> getNext() {
        return next;
    }
    public void setNext(Cell<E> next) {
        this.next = next;
    }
}

The class is now declared as Cell<E> (which is read as “Cell of E”). E represents the type of element that a cell object can hold. In this generic version of Cell, everywhere that Object was used in the original Cell class now uses the name E. E is known as a type variable for which a concrete type can be substituted. There's nothing special about the name E—it could be ElementType, for example—but E is a nice abbreviation for “element.” By convention, type variables have single character names: E for an element type, K for a key type, V for a value type, T for a general type, and so forth.

To create an actual Cell object you have to tell the compiler what specific type you wish to replace E with. For example, a Cell to hold a String could be constructed and referenced as follows:

Cell<String> strCell = new Cell<String>("Hello");

Note that the concrete type information is required in two places here: in the type declaration of strCell and as part of the constructor invocation.

Of course, our cells aren't used directly like this; rather, they are an internal part of the SingleLinkQueue class, which would look like this:

class SingleLinkQueue<E> {
    protected Cell<E> head;
    protected Cell<E> tail;
    public void add(E item) {
        Cell<E> cell = new Cell<E>(item);
        if (tail == null)
            head = tail = cell;
        else {
            tail.setNext(cell);
            tail = cell;
        }
    }

    public E remove() {
        if (head == null)
            return null;
        Cell<E> cell = head;
        head = head.getNext();
        if (head == null)
            tail = null;  // empty queue
        return cell.getElement();
    }
}

A queue with element type E is made up of cells of E. Again, everywhere in the original class that Object was used, the type variable E is now used. In addition, each use of Cell is replaced with Cell<E>.

You can create and use a queue of String objects as follows:

SingleLinkQueue<String> queue =
    new SingleLinkQueue<String>();
queue.add("Hello");
queue.add("World");

Now there is no need for a cast when invoking remove:

String hello = queue.remove();

Nor is it possible to add the wrong kind of element to the queue:

queue.add(25);    // INVALID: won't compile

Generic types are invaluable in defining these kind of collection classes, as you'll see in Chapter 21, but they have uses in many other situations.

Generic Type Declarations

The declaration of the class Cell<E> is an example of a generic type declaration. It declares Cell to be a generic class, where the type variable E is known as the type parameter of the generic declaration. When you use a type name such as Cell<String>, providing a specific type argument (String) to replace the generic type parameter (E), you thereby introduce a parameterized type. This has some similarities to the method invocation process where declared parameters are given the values of specific arguments, so the use of a parameterized type such as Cell<String> is also known as a generic type invocation.

A generic type declaration can contain multiple type parameters, separated by commas. For example, the Map interface (see page 587) defines a generic mapping between keys and values so it is declared as interfaceMap<K, V>, where K is the type parameter for the key type and V is the type parameter for the value type.

When you define a generic class, all invocations of that generic class are simply expressions of that one class. Declaring a variable strCell as Cell<String> tells the compiler that strCell will refer to an object of type Cell<E> where E will be String. It does not tell the compiler to create a new class Cell<String>. Cell<String> and Cell<Number> are not two separate classes. They are two generic type invocations of the one class—Cell—that are applied in two different ways to two different objects. The class Cell is known as the raw type corresponding to the generic class declaration Cell<E>.

The following code shows quite clearly that there is just one class because the value of same is true:

Cell<String> strCell = new Cell<String>("Hello");
Cell<Integer> intCell = new Cell<Integer>(25);
boolean same = (strCell.getClass() == intCell.getClass());

This may seem surprising if you thought that Cell<String> was a class. But it is only Cell that is a class. The use of <String> and <Integer> in the constructor invocations are not class definitions. They declare information about the type of object being constructed so that the compiler can check that the object is used correctly. At runtime, no generic type information is present in objects. In the example above, the Cell object that contains the element "Hello" has no knowledge that it was constructed as a Cell<String>—that knowledge has been erased; a concept we come back to a bit later in the chapter (see page 267).

The fact that there is a single class definition for a generic class, no matter how many different parameterized invocations there may be, has several consequences in regard to what you can and cannot do with generics. We point these out along the way, but they all boil down to this: There is only one class, so you cannot do anything that cannot be done with only one class.

The first consequence of this fact is that a generic class with a type parameter, say E, cannot use E in the type of a static field or anywhere within a static method or static initializer. To do so would require a different field or method for each value of E. Because there is only one class, there is only one actual static field or method, and so the code cannot depend on the value of E. The fact that static members are never dependent on a particular parameterization is reinforced by the rule that you cannot refer to a static member via a parameterized type name. For example, if SingleLinkQueue has a static merge method, it must be invoked as SingleLinkQueue.merge. SingleLinkQueue<Integer>.merge, for example, would not compile. To similarly reinforce that there is but a single class object, a class literal such as SingleLinkQueue.class can only use the raw class name.

Within a generic class definition, a type parameter can appear in any non-static declaration where you would normally put a concrete type name: in a field declaration, a method return type or parameter type declaration, a local variable declaration, or a nested type declaration. You've seen examples of most of these uses in the Cell and SingleLinkQueue classes.

Two places where you would usually use a class name but cannot use a type parameter are when creating objects and arrays. For example, here is the wrong way to try to expose the elements of a queue as an array:

class SingleLinkQueue<E> {
    // ...

    public E[] toArray() {
        int size = 0;
        for (Cell<E> c = head; c != null; c = c.getNext())
             size++;

        E[] arr = new E[size];   // INVALID: won't compile
        // ... copy in elements ...
    }
}

You cannot instantiate E or create an array of E for the same reason you can't have a static field of type E: There is a single class with a single definition of toArray, and the compiler has to know at compile time what code to generate to create any objects or arrays. There is no way for the compiler to generate newString[size] or newInteger[size] depending on what E happens to be.[1]

By now you may be wondering how it is that the compiler can do anything with type parameters? The answer to that is deceptively simple: The compiler uses the most general possible type to represent the type parameter (often Object) and uses type casting to ensure type correctness. You can think of generics as a (potentially optimized) shorthand for writing all those casts. This is explained in more detail a little later.

So how do you expose the elements of the queue as an array? You could expose the size of the queue and have the caller create, and pass to you, an array of the right size and type—which we show you a little later—or you could have the caller provide the type token for E (the Class object for the actual type argument) and use reflection to create an array of the right size—see “Genericity and Dynamic Arrays” on page 430.

Exercise 11.1Revisit the LinkedList class that you started back in Exercise 2.2 and rewrite it as a generic class.

Exercise 11.2Rewrite the Attr class from Chapter 3 as a generic class.

Exercise 11.3Was Exercise 11.2 a good idea? How does Attr being generic affect the Attributed interface defined in Chapter 4? What does it mean for Attributed objects?

Bounded Type Parameters

In the generic class declaration of Cell<E>, the type variable E can be replaced by absolutely any reference type. Sometimes it doesn't make sense to allow an arbitrary type argument to be used with a generic type. For example, a sorted collection can hold any type of object as long as that object can be sorted. As you have seen a number of times now, the way to define something that can be sorted is to have it implement the Comparable interface. Comparable itself is a generic interface, with a single type variable that identifies what type the current type can be compared with. Usually, objects of a given type should only be compared with other objects of that same type, so typically a class Value that is comparable will be declared as

class Value implements Comparable<Value> {
    // ...
}

So a sorted collection would restrict its type parameter to be a type that implements Comparable:[2]

interface SortedCollection<E extends Comparable<E>> {
    // ... sorted collection methods ...
}

Here E is restricted to be a type that “extends” Comparable<E> so that you know that no matter what type argument is supplied, it is guaranteed to support the methods of the Comparable interface. We say that Comparable is the upper bound on the type of E, and that E is a bounded type parameter. The keyword extends is used here in a very general way to mean either “extends” or “implements,” depending on whether the type that follows is a class type or an interface type. Any given class can extend only one other class, but multiple interfaces can be implemented by classes or interfaces. A type bound can express such multiple dependencies by declaring that the type parameter extends one class or interface, followed by an ampersand (&) separated list of additional interfaces that must be implemented. For example, a sorted collection of character sequences could be declared as follows:

interface SortedCharSeqCollection<E extends Comparable<E>
                                  & CharSequence> {
    // ... sorted char sequence collection methods ...
}

Such a collection could hold String objects, for example.

Nested Generic Types

A nested type can also be declared as a generic type with its own type variables. Because static members cannot refer to type variables of their own class, any type variable in a static nested type is distinct from any type variable in the outer type, even if they have the same name. For example, if Cell objects are only intended for use within SingleLinkQueue objects, then it would be reasonable for Cell to be a static nested class within SingleLinkQueue:

class SingleLinkQueue<E> {
    static class Cell<E> {
        private Cell<E> next;
        private E element;
        public Cell(E element) {
            this.element = element;
        }
        public Cell(E element, Cell<E> next) {
            this.element = element;
            this.next = next;
        }
        public E getElement() {
            return element;
        }
        /* ... rest of Cell methods as before ... */
    }

    protected Cell<E> head;
    protected Cell<E> tail;

    /* ... rest of SingleLinkQueue methods as before ... */
}

The code in Cell and SingleLinkQueue is unchanged except that Cell is declared as a static nested class within SingleLinkQueue. The type variable E is used for both classes because the element type of a cell must always match the element type of the queue it is part of, but they are distinct names. The two different E types are associated in the declarations of head and tail, where the E type parameter of the SingleLinkQueue class defines the parameterized type Cell<E>. As always, you should choose names carefully and avoid confusion. If the type variables should always represent the same type then use the same name; if they can be distinct types, then use different names.

If the nested type is an inner class, then the type variables of the outer class declaration are accessible to it and can be used directly. For example, an alternative design of the queue could use a non-generic inner class to define cell objects:

class SingleLinkQueue<E> {
    class Cell {
        private Cell next;
        private E element;
        public Cell(E element) {
            this.element = element;
        }
        public Cell(E element, Cell next) {
            this.element = element;
            this.next = next;
        }
        public E getElement() {
            return element;
        }
        /* ... rest of Cell methods ... */
    }

    protected Cell head;
    protected Cell tail;

    public void add(E item) {
        Cell cell = new Cell(item);
        if (tail == null)
            head = tail = cell;
        else {
            tail.setNext(cell);
            tail = cell;
        }
    }

    public E remove() {
        if (head == null)
            return null;
        Cell cell = head;
        head = head.getNext();
        if (head == null)
            tail = null;  // empty queue
        return cell.getElement();
    }

    /* ... rest of methods ... */
}

This time the element type of the cell is directly tied to the element type of the queue that it is in, and there is no need to declare Cell to be a generic class.

If you do choose to use generic inner classes, a type variable in the inner class hides any type variable with the same name in the outer class and, as always, hiding should be avoided.

Deeply nesting types is even more of a problem when the nested types are generic. Both nested types and generic types add an extra degree of complexity to programs, and their combination can greatly compound that complexity.

Working with Generic Types

To use a generic type you define a suitable parameterized type that provides the desired type arguments for each type parameter. For example, you previously saw:

SingleLinkQueue<String> queue =
        new SingleLinkQueue<String>();

The variable queue is of type SingleLinkQueue<String>, and the constructor invocation also declares the parameterized type SingleLinkQueue<String>. A parameterized type must supply a type argument for each type parameter declared by the generic type.

Working with generic types requires a little more planning than working with non-generic types: When you need to declare a field or local variable, or a method parameter or return type, you need to specify the appropriate parameterized type that you intend to work with. For example, should a field be of type SingleLinkQueue<String> or SingleLinkQueue<Number>, or should it be able to refer to any queue, perhaps SingleLinkQueue<Object>? This seems an easy enough question to answer—indeed it seems we did just answer it: You should specify a type argument that is assignable by all the types you expect. The problem is that the way you specify such a type is not the way we just illustrated because generic types don't form subtypes in the way you might think.

Subtyping and Wildcards

Suppose you want to write a method to sum all of the elements in a List—one of the collection interfaces of java.util that we describe in Section 21.6 on page 580. Naturally, you can only sum numbers, so you require that the list that gets passed contains only Number objects. The following code would seem to do the trick:

static double sum(List<Number> list) {
    double sum = 0.0;
    for (Number n : list)
        sum += n.doubleValue();
    return sum;
}

The intent is that sum can be passed any List object that has elements that are compatible with Number. But that is not what the parameterized type List<Number> means: It means an object compatible with List that has elements declared to be Number. If you try to invoke sum with a List<Integer>, for example, your code will not compile.

List<Integer> l = new ArrayList<Integer>();
l.add(1);
l.add(4);
l.add(9);
double sum = sum(l);   // INVALID: won't compile

The problem is that even though Integer is a subtype of Number, List<Integer> is not a subtype of List<Number>. Contrast this with arrays, where Integer[] is a subtype of Number[].

We need a way to declare a List of an arbitrary element type that is compatible with Number, and the way to do that is to use the type argument wildcard '?':

static double sum(List<? extends Number> list) {
    double sum = 0.0;
    for (Number n : list)
        sum += n.doubleValue();
    return sum;
}

In this example, the wildcard indicates that we require a List of any type, as long as that type is Number or a subclass of Number. More formally, we've used a bounded wildcard, where Number forms the upper bound on the type that is expected: Whatever type we get, it must be at least a Number.

You can also specify that a wildcard type must be the same as, or a supertype of, another. You do this by using the keyword super rather than extends. For example, you could use List<?super Integer> to match a List of Integer or any of its supertypes: List<Integer>, List<Number>, List<Serializable>, List<Comparable<Integer>>, or List<Object>. In this case the type represents the lower bound on the wildcard's type.

Note that unlike a bounded type variable, a bounded wildcard can have only a single bound—either an upper bound or a lower bound. For example, if you would like to restrict the type of list to be one containing elements that are at least of class Value and also implement Serializable, you cannot say List<?extends Value& Serializable>.

Of course, you can also have an unbounded wildcard. For example, List<?> represents a list of any kind—implicitly the upper bound is Object. But remember that List<?> is not another way of saying List<Object>; rather, it is a way of saying List<?extends Object>.

In contrast to the non-wildcard versions, parameterized types that include a bounded wildcard are related in the way you might have expected. For example, List<?extends Integer> is a subtype of List<?extends Number>, which is itself a subtype of List<?>. Similarly, List<?super Number> is a subtype of List<?super Integer>. In addition, non-wildcard parameterized types are subtypes of suitably bounded, wildcard parameterized types. For example, List<Number> is a subtype of List<?extends Number>, as is List<Integer>. While this in itself is a good property to have, the difference between the behavior with and without bounded wildcards can be a source of confusion. Perhaps the following diagram will clarify things:

Subtyping and Wildcards

These properties of wildcards make them not only useful but essential for effective use of generic types. Whenever a method takes a parameter that is a parameterized type or returns a parameterized type, that parameterized type should almost always be expressed with a wildcard of some form. Typically, parameters use lower-bounded wildcards, while return values use upper bounded wildcards—though if there is a constraint between a parameter type and the return type, it may not be possible to use a wildcard. Even a type bound that is a parameterized type will often be expressed with a wildcard. Recall the example of the SortedCollection<E> interface, in which we constrained the type parameter E such that E extends Comparable<E>. This seemed reasonable: to sort the collection we need the elements to be comparable to each other. In fact, this constraint is overly tight—for two elements of a class T to be comparable to each other, it is not necessary that T extend Comparable<T> but only that T extend Comparable<S> where S is T or any supertype of T. For example, if class Value implements Comparable<Object> then it can still correctly compare two Value objects—it just happens to be able to do more than that. Consequently, SortedCollection should be declared as follows:

interface SortedCollection<E extends Comparable<? super E>> {
    // ... sorted collection methods ...
}

Wildcards are essential for obtaining the flexibility that we needed in the sum example, but they have a limitation: Because the wildcard represents an unknown type, you can't do anything that requires the type to be known. For example, you can't add an element to a queue referenced through a variable that has an unbounded or upper-bounded wildcard type:[3]

SingleLinkQueue<?> strings =
    new SingleLinkQueue<String>();
strings.add("Hello");               // INVALID: won't compile

SingleLinkQueue<? extends Number> numbers =
    new SingleLinkQueue<Number>();
numbers.add(Integer.valueOf(25));   // INVALID: won't compile

This is a reasonable restriction because in general, if passed a queue of an unknown kind, you have no idea what type of object you can store in it. So you can't store a String (or an Object or a Number or …) in a queue that could require some other kind of element. Nor can you store a Number in a queue in which the elements are known to be Number objects or objects of a subclass of Number—because in the subclass case passing a general Number object would be incorrect. However, given strings above, you could invoke remove and assign the result to an Object reference—because the returned value must be compatible with Object. Similarly, given numbers, you could invoke remove and assign the result to a Number reference—because the returned value must be at least a Number.

In contrast, given a lower-bounded wildcard type, the wildcard is known to be the same as, or a super type of, the bound, so adding an element of the same type as the bound is always correct. For example, this method is perfectly correct:

static void addString(SingleLinkQueue<? super String> sq) {
    sq.add("Hello");
}

No matter what queue is passed to addString, you're guaranteed that it can hold String objects.

Wildcards can be used in most declarations: fields, local variables, parameter types, and return types. But you can't use them to name a class or interface in an extends or implements clause. Any parameterized types that a class (or interface) extends or implements must be concrete types in which the immediate type argument is not a wildcard. For example, you can't define a class that implements List<?>. However, the type argument may itself be a parameterized type with a type argument that is a wildcard so implementing List<List<?>> is okay.

You'll see more examples of the use of wildcards when we look at the collection classes in detail in Chapter 21.

Generic Methods and Constructors

Earlier we mentioned how the SingleLinkQueue class might want to expose the elements of the queue as an array. We noted then that it was impossible to create an array with the type variable E and that instead the caller should pass in an array of the right size and type. Here's a first attempt at defining such a method:

public E[] toArray_v1(E[] arr) { // too restrictive!
    int i = 0;
    for (Cell<E> c = head;
         c != null && i < arr.length;
         c = c.getNext())
        arr[i++] = c.getElement();
    return arr;
}

This method works. If you invoke toArray_v1 on an instance of SingleLinkQueue<String> and pass a String[], then as many elements from the queue as the array can hold will be copied into it, and then the array will be returned. The problem with this definition, as you may have already surmised from the discussion of wildcards, is that it is too restrictive. It will, in the current example, accept only a String[] and not, for example, an Object[] even though it is perfectly valid to store String objects into an Object[].

Can we use a wildcard to help us out in this situation as well? Unfortunately not. To use a wildcard we'd try writing something like this:

public ?[] toArray(?[]) { ... } // INVALID: won't compile

But this is simply not legal syntax: A wildcard represents an unknown type in a parameterized type definition and '?[]' is not a parameterized type. What you need to do is introduce another type variable, say T, and use it to describe the type of the array. And the way you introduce new type variables in a method is to declare the method as a generic method.

You declare a generic method by defining type variables between the method modifiers and the method return type, much as you would declare type variables for a generic class or interface. Here is how you would actually define toArray as a generic method:

public <T> T[] toArray(T[] arr) {
    Object[] tmp = arr;
    int i = 0;
    for (Cell<E> c = head;
         c != null && i < arr.length;
         c = c.getNext())
        tmp[i++] = c.getElement();
    return arr;
}

Notice that the type variable T is only used in the method header to constrain the parameter type and the return type to be the same—within the method body we don't really care what type T is. But also note that in the method body we actually work with a local variable of type Object[], which refers to the actual array that is passed in. This is necessary because the array is of type T, but getElement returns E, and these are not the same type—T may be Object while E is String. We could cast E to T for the cast must succeed if T and E are compatible, but such a cast would be an unchecked cast and would produce a compiler warning—see “Erasure at Runtime” on page 267—so we bypass the problem by using the Object[] variable.

By now you may be asking yourself: “Shouldn't there be some restriction between the types T and E, since they must be compatible?” Logically, there could be such a restriction: T must be E or a supertype of E. Unfortunately, there is no way to express this restriction. Only wildcards can be given a lower type bound, so you cannot write <Tsuper E> as a type variable. In any case, such a restriction is not strictly necessary. Suppose you have a SingleLinkQueue<Object> that you know you have filled only with String instances—why shouldn't you be able to pass toArray a String[]? Yet String is not a supertype of Object. So in fact there is no compile-time type checking between T and E to see if they are compatible. We instead rely on the runtime type check that always occurs when a reference is stored into an array.

Generic methods and constructors are typically used when you need to introduce a type variable to constrain the parameterized types of different parameters, or to constrain a parameter and the return type, as with toArray. For example, consider a merge operation for SingleLinkQueue that moves the elements from a source queue into a destination queue. To perform a merge, the elements of the source queue must be the same as, or a subtype of, the elements of the destination queue. You can express this constraint with a generic method by using a wildcard:

public static <E> void merge(SingleLinkQueue<E> d,
                             SingleLinkQueue<? extends E> s)
{
    // ... merge s elements into d ...
}

We could have introduced a second type variable, say S, to use instead of the wildcard to achieve the same affect. So which is preferable? The general rule is to use wildcards when you can because code with wildcards is generally more readable than code with multiple type parameters. When deciding if you need a type variable, ask yourself if that type variable is used to relate two or more parameters, or to relate a parameter type with the return type. If the answer is no, then a wildcard should suffice. In the above example, S would appear in only one place and so can trivially be replaced by a wildcard.

A generic method need not be declared in a generic type, and if it is then the type variables are distinct. As with inner generic types, if a generic method declares a type variable of the same name as that of a type variable in the method's class or interface, the outer name is hidden. Finally, note that a static method such as merge can be a generic method, though it cannot refer to any type variables of its own generic class.

Generic Invocations and Type Inference

If you invoke a generic method like toArray or merge, the compiler must ensure, as it does for all method invocations, that you pass in arguments of the right type and assign any return value to a variable of the right type. When you create an instance of a generic class, you use a parameterized type to bind a particular type argument to the type variables of the class. Similarly, you can parameterize a method invocation to supply type arguments for the method's type variables. For example, consider this generic method that simply returns its argument:

<T> T passThrough(T obj) {
    return obj;
}

You can invoke passThrough as a String method:

String s1 = "Hello";
String s2 = this.<String>passThrough(s1);

This parameterized method invocation tells the compiler that T should be treated as String and that the arguments and return value of passThrough must be verified accordingly.

Fortunately, explicitly parameterizing a method invocation is rarely needed. In the absence of a type argument, the compiler will infer what type to use from the static argument types and the way in which the return type is used. For example, this invocation of passThrough is equivalent to the previous one but relies on type inference to establish the type of T:

String s1 = "Hello";
String s2 = passThrough(s1);

In this case, the argument type is inferred to be String, which is the static type of the variable s1. This implies that the return type is also String. This is compatible with the assignment to s2 and so the invocation is valid with an inferred type of String.

The following invocations of passThrough are also valid:

String s1 = "Hello";
Object o1 = passThrough(s1);          // T => String
Object o2 = passThrough((Object) s1); // T => Object

In the first case the argument type is String and the return type is expected to be Object. The inferred type for T is again String, implying that the return type is also String. This is compatible with assignment to the Object variable o1, so the invocation is valid, with an inferred type of String. In the second case the static type of the argument is Object (due to the cast) and so the inferred type is also Object. This makes the return type Object, so again we have a compatible assignment and so a valid invocation. In general, type inference has to find the most specific type in the set of types that satisfy the constraints imposed by the type variables—a non-trivial exercise.

Type inference is based on the static type of the argument expressions that are being passed, not their dynamic types, so the following won't compile:

String s1 = "Hello";
s1 = passThrough((Object) s1); // INVALID: won't compile

The static argument type of Object requires that T, as the return type of the method, be Object (or a supertype if Object had one) and the String type of s1 requires that T be assignable to String. But there is no such type—you cannot assign an Object to a String reference. Of course, you can inform the compiler that you know the returned object actually is a String by inserting an additional cast:

s1 = (String) passThrough((Object) s1);

The actual type inference process and the rules controlling it are extremely complex, but for the most part you don't need to care what the actual inferred type is as long as the code compiles. For example, if a generic method took two parameters of type T and you invoked it passing a List<String> and a List<Number>, then the inferred type could be List<?>. Inference is also used as part of determining which method to invoke, as you will see shortly.

The inference process may result in simple types like Object or String, or more complex types for which no single class or interface declaration exists—such as a type that is both Runnable and Cloneable. These complex types are generally known as intersection types. For each constraint on a type variable there will be a set of types that meet that constraint. The intersection of those sets contains the inferred type.

Finally, note that if you do wish to make a parameterized method invocation, you cannot parameterize an invocation that uses just a simple method name:

String s1 = "Hello";
String s2 = <String>passThrough(s1); // INVALID: won't compile

Instead you must qualify the method name appropriately, such as by using this or super for instance methods, or the class name for static methods.

Wildcard Capture

Wildcards represent an unknown type, but whenever a variable that has a wildcard type is used, the compiler must treat it as having some specific type so that it can check for correct usage. This specific (but still unknown) type is referred to as the capture of the wildcard. The place you will most commonly come across the capture of a wildcard is in the error messages the compiler produces when you use a parameterized type the wrong way. For example, recall the incorrect attempt to add a String object to a queue accessed through an unbounded wildcard reference:

SingleLinkQueue<?> strings =
    new SingleLinkQueue<String>();
strings.add("Hello");               // INVALID: won't compile

The error message this produced from the compiler we used was:

add(capture of ?) in SingleLinkQueue<capture of ?> cannot be applied to (java.lang.String)

This is telling us that when the wildcard reference strings is used, the type of queue is SingleLinkQueue<captureof ?>, so the type of the parameter to add is also “captureof ?”. Because String is not compatible with “captureof ?” the call is not allowed. Even if the wildcard is bounded, as in the earlier example with the queue of numbers (“?extends Number), the type of the parameter to add is “captureof ?extends Number” which is still not compatible with Integer, even though Integer is a subtype of Number.

If a wildcard is always represented by its capture, it would seem that once you have a wildcard type you can only use it wherever a wildcard type is expected. Indeed this is a basic rule that ensures the integrity of the type system.

However, as it stands this basic rule imposes a rather annoying restriction. A wildcard represents any type. Similarly, the type variable of a generic method represents any type. But under the basic rule a wildcard type can only be used where a wildcard type is expected. This precludes passing wildcard types to generic methods that have a parameter defined by a type variable. Consider the synchronizedList utility method from the Collections class—part of the java.util package discussed in more detail in Chapter 21. Its job is to take an arbitrary list object and return a new list object that is backed by the original, but one for which all the methods are synchronized to make access to the list thread-safe (see Chapter 14). The method is declared as follows:

static <T> List<T> synchronizedList(List<T> list) { ... }

There is no logical reason why we should not be able to pass it a list referenced through a wildcard type:

static void processList(List<?> list) {
    List<?> slist = Collections.synchronizedList(list);
    // ... process the list
}

To do so is quite safe—and quite convenient!

Although in general you cannot use a List<?> where a List<T> is expected because the actual type represented by the wildcard is not known to be compatible with T, a special rule bridges the gap that would otherwise exist between wildcard types and the type variables of generic methods. This rule allows (under the right circumstances) the capture of the wildcard to be represented as some unknown generic type X and then to infer in the call that T is X. You don't actually need to know what X is—whatever it is, the resulting use of the list is guaranteed to be type-safe.

This conversion of the capture of the wildcard to the type T is known as capture conversion and it complements the other type conversions that we discussed in Section 9.4 on page 216.

For capture conversion to apply, there must be a unique mapping between the capture of the wildcard and the type variable involved. This leads to some general restrictions on when capture conversion can apply.

First, capture conversion won't apply if the type parameter is used with more than one method parameter. Consider a utility method that merges two lists:

static <T> List<T> merge(List<T> first, List<T> second) {
    /* ... */
}

If you try to pass an argument of type List<?> for first and second it will fail—even if you passed the same reference. The problem is that the process for resolving the method call will essentially replace the type of the first argument with X, the type of the second argument with Y, and then see if T is uniquely determined. But because X is not the same as Y this will not be the case.

Second, you can only apply capture conversion if the type variable is defined at the top-level of the generic type. For example, suppose you had the method

static <T> void processListOfLists(List<List<T>> list) {
    /* ... */
}

and you tried to invoke it with an argument of type List<List<?>>. The capture of the wildcard would not uniquely determine an element type for the outer List. The method requires a List with elements that are all the same type, but a List with an element type of List<?> could contain a List<String>, a List<Object>, and so forth, so capture conversion does not apply in this case.

Finally, as usual you cannot use a wildcard reference anywhere that the type of the wildcard needs to be known. For example, given the method

static <T> void addToList(List<T> list, T t) {
    /* ... */
}

you cannot pass a reference of type List<?> because the inferred type of T will be “captureof ?” and there is no type compatible with “captureof ?” that you could possibly pass for the parameter t. Or looking at it the other way, T will be determined by whatever you pass for t, and whatever it is, it cannot be “captureof ?”, so the type of T would not be uniquely determined.

Under the Hood: Erasure and Raw Types

As we have stated a number of times, for each generic type there is only one class, no matter how many different parameterized types are formed from it. This raises the question: What exactly is that type? Given a class like Cell<E>, what type do Cell<String> and Cell<Number> share?

To determine the answer the compiler uses a process called erasure (because the compiler essentially erases all generic type information from the compiled class). The erasure of a generic or parameterized type is simply the unadorned type name. This is also known as the raw type (see “Raw Types, “Unchecked” Warnings, and Bridge Methods” on page 745). For example, the erasure of Cell<E> is just Cell. The erasure of a type variable is the erasure of its first bound. For example, the erasure of E in <E> is Object—the implicit upper bound. The erasure of E in <Eextends Number> is the explicit upper bound of Number, as it is in <Eextends Number& Cloneable>, because Number is the first bound. The compiler generates a class definition for the erasure of a generic type by effectively replacing each type variable with its erasure. When a parameterized type is used and the type information from the erasure of the generic type doesn't match what is expected, the compiler inserts a cast. For example, if you invoke remove on an object created as a SingleLinkQueue<String> and assign it to a String reference, the compiler will insert a cast to String because the erasure of the type variable in SingleLinkQueue<E> is just Object.

You need to understand the erasure process because it impacts your programs in two key areas:

  • The runtime actions that can involve generic types

  • Method overloading and overriding

We start with the runtime issues.

Erasure at Runtime

The runtime impact of erasure is simple to state: Nothing that needs to know the value of a type argument at runtime is allowed. This has the following consequences, some of which we have already discussed:

  • You cannot instantiate a type represented only as a type parameter, nor can you create an array of such a type. So for a type variable T you can't do newT() or newT[n] .

  • You cannot create an array whose element type is a parameterized type, with the exception of a parameterized type for which all the type arguments are unbounded wildcards. For example, “newList<String>[1] ” is invalid, whereas “newList<?>[1] ” is okay.

  • You cannot use instanceof to see if an object is an instance of a parameterized type, again with the exception of a parameterized type whose type arguments are all unbounded wildcards. The instanceof test is a runtime test, and at runtime the parameterized type has been replaced by its erasure. Since this is unlikely to perform the test you expected, the compiler rejects it. You can replace the parameterized type by its erasure if that is what you really wanted to test.

  • Casts involving type parameters or parameterized types are replaced with casts to the erasure of that type (see discussion below).

  • A catch clause cannot declare that it catches an exception represented by a type variable—even if the bound on that type variable is an exception type. The exact type of the exception must be known at compile time. (You can throw an exception so referenced, however, and you can declare the type variable in a method's throw clause.)

  • A generic class is not allowed to directly or indirectly extend Throwable. Given that you can't catch generic exceptions there is little point in being able to declare them.

  • You cannot use a parameterized type in a class literal expression (such as SingleLinkQueue<String>.class). This reinforces that there is but one class object.

The restriction on creating arrays of a parameterized type can be limiting but is essential. Consider, for example, an implementation of List that provides a split method that returns the contents of the list split across a number of sub-lists. It would seem reasonable to return the sublists as an array of lists. So given a class that implements List<T> the split method would return List<T>[]. You could actually declare such a method, but you would have trouble implementing it. You cannot create an array with component type List<T>, and any attempt to return some less specific array type (say List[]) would result in an “unchecked” warning (see below). The reason you cannot create a generic array such as List<T>[] is that correct use of the array cannot be enforced by the type system. This is because array store checks are based on the erasure of the component type not on their actual parameterized type, so you would be allowed to store any kind of List in an array that is supposed to have a component type of List<String>, for example. Not allowing you to create generic arrays avoids this problem. From the programmers perspective, it is better to try to work with a suitable collection of a parameterized type rather than an array. For example, the split method could be defined to return List<List<T>> instead.

Using unbounded wildcards in array creation and instanceof is allowed for parameterized types because such types are effectively equivalent to their raw types. For example, creating a List<?>[1] is effectively the same as creating a List[1]—an array whose component type is a raw type. However, there is one significant difference between these two forms: The raw type permits unsafe operations but causes the compiler to issue a warning, whereas the wildcard form prohibits unsafe operations, causing a compile-time error.

The absence of generic type information at runtime also means that casts involving type parameters or parameterized types may not have the desired effect: You cannot check whether an object is an instance of a parameterized type, so a cast to that type can not be checked at runtime—except if all type parameters are unbounded wildcards. In an ideal world in which everyone used generics, such casts would simply be disallowed (as is use of instanceof). Some degree of interoperability between generic code and non-generic code must be provided, so the casts are allowed but are replaced by casts to the erasure of the type variable or parameterized type, usually causing the compiler to emit an “unchecked” warning (see page 745). This warns you that the cast can neither be checked at runtime nor guaranteed to be type-safe at compile time. For example, a method with a SingleLinkQueue<?> parameter could cast it to SingleLinkQueue<String> and add a String. This would cause the compiler to emit an “unchecked” warning. At runtime, the actual cast to SingleLinkQueue would succeed even if the queue was actually of type SingleLinkQueue<Number>, and the String would be added, violating the type integrity of the queue. This violation would only be detected when a remove invocation tried to cast the returned instance to Number, resulting in a runtime ClassCastException. Note, however, that if the cast does not involve a change to the type parameter, then the cast is quite safe and there is no warning. For example, a cast from Collection<T> to List<T> doesn't require checking anything about T.

Casts involving parameterized types are unavoidable in many circumstances, such as when a method returns a (possibly non-generic) supertype, and the actual object will be of a generic subtype that you need to cast to that subtype. In such circumstances you should use casts that only involve unbounded wildcards, and assign the results to variables that also declare unbounded wildcards. The fact that you need a cast means that you have lost valuable type information and there is no way to restore that information at runtime, so casting to a parameterized type should always involve unbounded wildcards. For example, consider a non-generic version of the passThrough method:

Object passThrough(Object o) {
    return o;
}

and its subsequent use with an instance of List<String>:

List<String> l = new ArrayList<String>();
Object o = passThrough(l);

List<String> list1 = (List<String>) o; // unchecked
List<String> list2 = (List) o;         // unchecked; raw type
List<?>      list3 = (List) o;         // OK: but raw type
List<?>      list4 = (List<?>) o;      // OK

The attempt to cast to List<String> is effectively a cast to List<?> (or equivalently the raw type List). At runtime we may have a list of some type but we can not be certain it is a List<String>, so the use of the cast gives an “unchecked” warning. The cast to the raw List type is itself okay, but the assignment to list2 raises an “unchecked” warning because again there is no guarantee that the object is a List<String>. You can remove the “unchecked” warning by assigning to a variable of type List<?>, as with list3 above—the cast checks that the object is a list of some kind and that list3 can hold a reference to a list of some kind. You should cast to List<?> instead of simply to List because it clearly expresses that List is a generic type. The raw types exist for integration with legacy code that predates generics; when you are writing code, it's better to consistently stick with the generic expression of the concept instead of the legacy raw type.

The only places where use of a raw type is unavoidable are within a class literal expression and when accessing static members. This reinforces the fact that what you are accessing is independent of any parameterization of the generic type.

In some rare circumstances you may know for certain that a value has a particular parameterized type,[4] even though the type system cannot verify that at runtime, and a cast to that parameterized type (with its ensuing “unchecked” warning) is essential to allow your code to compile. As noted above, the fact that you need the cast means that type information was lost, so your first response to this should be to avoid the loss of type information in the first place. If you cannot do this you should clearly document why the “unchecked” warning is occurring and why you are certain that the lack of a runtime type check is not a problem.

Overloading and Overriding

In Chapter 2 we defined overloaded methods to be methods having the same name but different signatures. Later, in Chapter 3, we defined an overriding method to be a method in a subclass with the same name and the same signature as an accessible method in the supertype. When a method signature involves type variables, these definitions have to be adapted slightly.

First, the definition of “same signature” for two generic methods requires that they have the same number of type variables with the same corresponding bounds. Further, after all the uses of the type variables in the second method are renamed with the names of the type variables from the first method, the formal parameter types must be the same.

Second, rather than requiring the same (or different) signatures we talk about override-equivalent signatures: Two methods have override-equivalent signatures if their signatures are the same, or if the erasures of their signatures are the same.

Using the new definitions, two methods are overloaded if they have the same name and do not have override-equivalent signatures. Consider this class:

class Base<T> {
    void m(int x)    {}
    void m(T t)      {}
    void m(String s) {}
    <N extends Number> void m(N n) {}
    void m(SingleLinkQueue<?> q)   {}
}

This defines five different overloads of the method m. Replacing each method's signature by its erasure, we would get the following corresponding set of methods:

void m(int x)    {}
void m(Object t) {}
void m(String s) {}
void m(Number n) {}
void m(SingleLinkQueue q) {}

It is an error for a class or interface to declare two methods with the same name and the same signature erasure. Consequently, trying to define any of the following versions of m in Base would be an error:

void m(Object o) {}
void m(Number n) {}
<G extends String> void m(G g) {}
void m(SingleLinkQueue<Object> q) {}

In each case, the erasure of the signature matches the erasure of one of the other signatures.

A method in a subtype potentially overrides an accessible method in a super type if the two methods have the same name and have override-equivalent signatures. We say “potentially overrides” because there is an additional requirement to be met: The signature of the subclass method must be the same as that of the superclass method, or it must be the same as the erasure of the signature of the superclass method. This constraint makes overriding a “one-way street”: A method without generic types can override a method with generic types, but not the other way around. You are allowed to do this so that you can generify an existing class without breaking previously existing subclasses that overrode methods.

Continuing with the previous example we can extend Base as follows:

class Derived<T> extends Base<T> {
    void m(Integer i){}  // new overload
    void m(Object t) {}  // overrides m(T t)
    void m(Number n) {}  // overrides m(N n)
}

This introduces one new overloaded form of m and defines two overriding implementations of m.

The simple rule to remember when thinking about overloading and overriding is to always consider the erasure of the method signatures and to remember that you can't add genericity to an inherited method.

Finding the Right Method — Revisited

In “Finding the Right Method” on page 224, we outlined the basic algorithm used by the compiler to determine which form of a method should be invoked for a given method invocation expression. We can now consider how generic types and generic methods (and constructors) affect that basic algorithm.

The changes to the algorithm can be summarized as follows:

  1. If the method invocation includes explicit type arguments, then any potentially applicable generic methods must have the same number of type parameters. Non-generic methods may also be potentially applicable, in which case the actual type arguments are ignored.

  2. If there are no explicit type arguments in the invocation, and a generic method might be applicable, then first the type arguments for the generic method are inferred based on the static types of the argument expressions used in the method invocation. If no types that satisfy all the constraints on the type parameters can be inferred, then the method is not applicable.

  3. At each phase, once the set of potentially applicable methods is determined, applicable methods are searched for as previously described. For generic methods, either the explicit (if present) or the inferred type arguments establish whether the argument type is compatible with the formal parameter type.

  4. When the most specific method is searched for, type inference is again used when generic methods are being considered. However, this time the type inference is not based on the actual argument expression types; rather, the initial constraint is between the formal parameters of the generic method and the formal parameters of the method against which it is being tested for being most specific. In simple terms, the most specific test considers only the declared parameter types of the methods concerned, not the type of the arguments used in the method invocation. This is clarified in an example below.

  5. The search for the maximally specific method is not based on “same or different signature” but rather on whether the signatures are override-equivalent (as previously described) or not.

  6. If the maximally specific method is a generic method, then the type of the method invocation expression is the inferred return type of that generic method as determined by the type inference applied as in point 2 above.

For example, consider these two overloads of a method m:

void m(String key, Object val) {
    // ...
}

<S, T extends Number> void m(S key, T val) {
    // ...
}

Now consider this invocation of m:

m("hello", "world");

Both forms of m are potentially applicable because they have the right name and the right number of parameters. Both arguments are reference types and neither method has a variable number of parameters, so only phase 1 for finding the applicable methods will occur. First, the type arguments of the generic method must be inferred. Attempting to do so, however, requires that String extend Number; it does not, so the generic form is not applicable. The non-generic form is applicable because it matches in the first argument type exactly, and in the second a String can be assigned to an Object. Since there is only one applicable method, it is selected.

Now consider this invocation of m:

m(new Object(), 29);

Again both forms are potentially applicable. The type of the generic method is inferred to be <Object,Integer> after applying boxing conversion to the argument 29. The non-generic form fails to match in the first argument because an Object is not compatible with a String, so it is not applicable. The generic method matches in the first argument exactly, and matches in the second argument after a boxing conversion (phase 2). Consequently, the generic method is the only applicable method.

In contrast, this invocation of m is ambiguous:

m("hello", Integer.valueOf(29));

Both forms are potentially applicable, and again only phase 1 will come into play. The inferred type of the generic method call is <String,Integer> . The non-generic form matches exactly in the first argument type and matches in the second argument because an Integer is an Object. The generic form matches exactly on both argument types. So there are two applicable methods. The next step is to determine which is the most specific method. First, we see if the non-generic method is more specific than the generic method. This is done as follows:

  1. Since our second method is generic, type inference is performed, but this time with the initial constraint that String is convertible to S and that Object is convertible to T. This infers the types <String,Object> for the generic method. Note that the bound on T is not considered at this point.

  2. A check is made to see if each parameter of the first method is a subtype of the (inferred) type of the corresponding parameter in the second method. In this case the parameter types of the first method and the inferred parameter types of the second method are the same, so the check passes.

  3. Since the second method is generic, a second check is then made to see if the types of the parameters of the first method are subtypes of the bounds of the generic type parameters of the second method. Considering the first parameter, the bound of S is Object and so we are testing whether String is a subtype of Object—which it is. The bound of T, however, is Number and the test is if Object is a subtype of Number—which it is not.

Because the check has failed, the non-generic method is not more specific than the generic method. Now we must see if the generic method is more specific than the non-generic one. This proceeds as follows:

  1. The second method is not generic this time, so no type inference occurs.

  2. A check is made to see if each parameter of the first method is a subtype of the type of the corresponding parameter in the second method. In this case the check is to see whether S is a subtype of String and T is a subtype of Object. A type variable is a subtype only of its bounds (and their supertypes) so S is a subtype of Object and T is subtype of Number. Because S is a subtype of Object and not String, the check fails.

The check failed, so the generic method is not more specific than the non-generic method. Since neither method is more specific than the other the call is ambiguous and the compiler rejects it as such.

Informally, one method is more specific than another if all calls to the first method could be handled by the second. Looking at the two methods here, the most general signature of the generic method would have parameters of type Object and Number. So if we consider the first parameter, anything we pass to the non-generic method must be a String, and a String is always an Object, so the generic method could accept all first arguments to the first method. So considering just the first parameter, the non-generic method is more specific than the generic method. But conversely, the generic method can accept first arguments that are not String objects, so the generic method is not more specific than the non-generic method. Considering the second parameter, you can pass arbitrary objects to the non-generic method but only Number objects to the generic method. Consequently, the non-generic method is not more specific at the second parameter and so it is not more specific. As neither is more specific, the call is ambiguous.

You can remove the ambiguity in two ways. You can cast the first argument to Object so that the non-generic method is no longer applicable and so the generic version must be called:

m((Object) "hello", Integer.valueOf(29));

The alternative is to cast the second argument to Object so that the generic method is no longer applicable and so the non-generic version must be called:

m("hello", (Object) Integer.valueOf(29));

Note that parameterizing the invocation itself does not automatically exclude the non-generic method—its applicability is determined without consideration of the type arguments.

Overloading, even without involving generics, should be used judiciously and only to improve the resulting clarity of the program. With generics added to the mix, it is even easier to create programs in which the intent can only be established by detailed recourse to the language specification. Avoid mixing generic and non-generic overloads of a method unless you have an extremely compelling reason.

Class Extension and Generic Types

Generic types add an extra dimension to the type system, and force you to think more deeply about how to define classes and interfaces and how to use existing classes and interfaces. You can extend a non-generic type to produce either a generic or a non-generic subtype. You can extend a generic type to produce a generic subtype, or you can extend a specific parameterized type to yield a non-generic subtype, or you can combine both. This degree of flexibility can be overwhelming, but the key thing to focus on is what abstraction your new class or interface represents and what abstraction the class or interface you are inheriting from represents.

For example, the List<E> interface is generic and represents a list of elements of some kind. If you implement that interface then you may be providing a general-purpose implementation that can deal with arbitrary element types, and in that case your class would also be a generic class, such as

class GeneralList<E> implements List<E> { /* ... */ }

Or you may be providing a specialized implementation that has been hand-crafted to deal with a particular kind of element, say String, and so it won't be generic itself and will implement the parameterized interface List<String>:

    class StringList implements List<String> { /* ... */ }

An existing non-generic class, such as AbstractEventService, may provide base functionality that you have to extend to use a framework within your application. You might define a concrete implementation that has no need of generics:

class RemoteEventService extends AbstractEventService {
    /* ... */
}

Or you might define a generic service for different kinds of events:

class LocalEventService<T extends Event> extends
                          AbstractEventService { /* ... */ }

The possibilities really depend on the semantics of the types involved.

When choosing to extend a parameterized type, you need to carefully consider the implications, particularly if you are implementing an interface, because a class cannot inherit two interface types that are different parameterizations of the same interface. For example, consider the Comparable interface. If you define a comparable class Value then it is natural to declare it as:

class Value implements Comparable<Value> {
    // ...
}

Suppose you now extend Value to add additional state. This ExtendedValue class should define objects that are comparable only to ExtendedValue objects, so you would expect to be able to define it so:

class ExtendedValue extends Value
    implements Comparable<ExtendedValue> { // INVALID!
    // ...
}

But this won't compile because ExtendedValue is attempting to implement both Comparable<Value> and Comparable<ExtendedValue> and that is not permitted. Once you extend or implement a parameterized type you fix that parameterization for the current class and all subclasses. So you need to think carefully about the implications of that. In the case of Comparable you have little choice but to accept the restriction, and although you can't change the type parameterization, you can override the implementation of compareTo to ensure that a subclass instance is accepted for comparison while a superclass instance is rejected. Recall from the SortedCollection example on page 258 that if you want to accept a Comparable object, the correct approach is to declare Comparable<?super T> rather than Comparable<T>. This is not only more general (as previously explained), it also allows for the fact that a comparable class T may not actually be able to implement Comparable<T>.

Generic types are a powerful tool for writing your programs more effectively, but they are a tool that is difficult to master and can easily be misused. A full treatment of generic types requires an understanding of type theory—in particular the characteristics of covariant and contravariant typing—that is beyond the scope of this book. If you are interested there is additional material on the subject listed in “Further Reading” on page 755.

 

Nearly all men can stand adversity, but if you want to test a man's character, give him power.

 
 --Abraham Lincoln


[1] This is a consequence of there being no parameterized type information within objects at runtime—see Section 11.5 on page 267.

[2] You'll see a little later (page 258) that it's better to actually declare this slightly differently, but for now we keep it simple.

[3] Since the reference literal null has no type, the type-system will allow you to invoke add passing null, but this is an uninteresting boundary case with little practical significance.

[4] Implementing a clone method is one such circumstance—see Section A.3.2 on page 747.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.117.78.237