Lesson 14. Generics

I introduced you to parameterized types in Lesson 2. This lesson covers parameterized types, or generics, in depth. You will learn the many rules behind creating parameterized types. Creating parameterized types is probably tied with multithreading as the most complex topic in core Java programming.

You will learn about:

• creating parameterized types

• multiple type parameters

• the erasure scheme

• upper bounds (extends)

• wildcards

• generic methods

• lower bounds (super)

• additional bounds

• raw types

• checked collections

Parameterized Types

You generally develop a collection so that it can contain objects of any type. However, in most cases, collections are useful only if you constrain them to hold objects of a single (set of) types: a list of students, a words-to-definitions map, and so on. You could consider this to be the Single-Responsibility Principle applied to collections: Collections should hold one kind of thing and one kind of thing only. You rarely want your student list to contain a stray professor. And if you included a word-to-picture entry in your map, it would create significant headaches for your dictionary application.

Sun originally developed the base collection classes in Java's class library to support storing and returning objects of any type (i.e., Object or any subclass). This meant that you could add a String object to a collection intended to store Student objects. While this sounds like an unlikely mistake, pulling an object of an unexpected type out of a collection is a frequent source of application defects. The reason is that code that stores objects into the collection is often distant from code that retrieves objects from the collection.

With the advent of J2SE 5.0, Sun introduced the concept of parameterized types, also known as generics. You can now associate, or bind, a collection instance to a specific type. You can specify that an ArrayList can only hold Student objects. Under earlier Java versions, you would receive a ClassCastException error at runtime when retrieving a String inadvertently stored in the ArrayList. Now, with parameterized types, you receive a compilation error at the point where code attempts to insert a String in the list.

Collection Framework

Sun has parameterized all collection classes in the Collections Framework. A language-based test demonstrates a simple use of the parameterized type ArrayList:

image

Both the implementation class ArrayList and the interface List are bound to the String class (line 1). You can add String objects to names (2). When retrieving from names and assigning to a String reference (3), you need not cast to String.
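A minimal sketch of the usage just described. The class name NamesExample and the sample value are illustrative assumptions, not the original listing.

```java
import java.util.ArrayList;
import java.util.List;

class NamesExample {
   // Stores one name in a list bound to String and reads it back.
   static String firstName() {
      List<String> names = new ArrayList<String>(); // (1) both types bound to String
      names.add("Cary");                            // (2) adding a String compiles
      String retrievedName = names.get(0);          // (3) no cast needed on retrieval
      return retrievedName;
   }
}
```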

If you attempt to insert an object of any other type:

names.add(new Date()); // this won't compile!

the compiler will present you with an error:

image

Multiple Type Parameters

If you look at the API documentation for java.util.List and java.util.ArrayList, you will see that they are declared as List<E> and ArrayList<E>, respectively. The <E> is known as the type parameter list. The List interface and ArrayList class declarations each contain a single type parameter in the type parameter list. Therefore you must supply a single bind type when using List or ArrayList in your code.¹

¹ You are not required to supply any bind types; however, this is not recommended. See the section in this lesson entitled Raw Types.

The class HashMap represents a collection of key-value pairs. A key can be of one type and the value could be another. For example, you might store an appointment book in a HashMap, where the key is a Date and the value is a String description of an event occurring on that date.

image

The Map interface and HashMap class are declared as Map<K,V> and HashMap<K,V>, respectively. You must supply two bind types when using these—one for the key (K) and one for the value (V).
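As a rough sketch of the appointment book idea, a Map bound to two types might be used as follows. The class name, method name, and event description are assumptions made for illustration.

```java
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

class AppointmentBookExample {
   // Stores one event under the given date and looks it up again.
   static String describeEventOn(Date date) {
      Map<Date,String> events = new HashMap<Date,String>(); // K is Date, V is String
      events.put(date, "dentist appointment");
      return events.get(date);                              // returns a String, no cast
   }
}
```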

Creating Parameterized Types

You will develop a parameterized MultiHashMap. A MultiHashMap is similar to a HashMap, except that it allows you to associate multiple values with a given key. Let's expand on the calendar example: Some people lead particularly interesting lives, in which two events might occur on one day.

A few starter tests will get you off the ground. Remember that you should be developing incrementally, one test method and one assertion at a time.

image

image

The method getSoleEvent is of some interest. After retrieving the events collection stored at a date using the Map method get, you must ensure that it contains only one element. To retrieve the sole element, you can create an iterator and return the first element it points to. You must bind the Iterator object to the same type that you bound the collection to (String in this example).

From an implementation standpoint, there is more than one way to build a MultiHashMap. The easiest is to use a HashMap where each key is associated with a collection of values. For this example, you will define MultiHashMap to encapsulate and use a HashMap.

In order to support the existing tests, the implementation of MultiHashMap is simplistic. Each method size, put, and get will delegate to the encapsulated HashMap:

image

The put method first extracts a list from map using the key passed in. If there is no entry at the key, the method constructs a new ArrayList bound to the value type V and puts this list into map. Regardless, value is added to the list.

Like HashMap, the type parameter list contains two type parameters, K and V. Throughout the definition for MultiHashMap, you will see these symbols where you might expect to see type names. For example, the get method returns a List bound to V instead of, say, an unparameterized List. Each use of a type parameter symbol within the type declaration is known as a naked type variable.

At compile time, each occurrence of a naked type variable is replaced with the appropriate type of the corresponding type parameter. You'll see how this actually works in the next section, Erasure.

In the definition of the map field (highlighted in bold), you construct a HashMap object and bind it to <K,List<V>>. The key for map is the same type as the one to which the key of MultiHashMap is bound (K). The value for map is a List bound to type V, the value type to which the MultiHashMap is bound.

Bind parameters correspond to type parameters. The test contains this instantiation of MultiHashMap in its setUp method:

events = new MultiHashMap<Date,String>();

The Date type corresponds to the type parameter K, and the String type corresponds to the type parameter V. Thus, the embedded map field would be a HashMap<Date,List<String>>—a hash map whose key is bound to a date and whose value is bound to a list of strings.
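The following is a minimal sketch of a MultiHashMap along the lines described in this section, not the original listing. It assumes that get returns the list of values stored at a key (or null when the key is absent).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MultiHashMap<K,V> {
   // The encapsulated map: each key is associated with a list of values.
   private Map<K,List<V>> map = new HashMap<K,List<V>>();

   public int size() {
      return map.size();
   }

   public void put(K key, V value) {
      List<V> values = map.get(key);
      if (values == null) {
         // First value for this key: create its list, bound to V.
         values = new ArrayList<V>();
         map.put(key, values);
      }
      values.add(value);
   }

   // Assumption: returns the list stored at key, or null if there is none.
   public List<V> get(K key) {
      return map.get(key);
   }
}
```

Binding this class with new MultiHashMap<Date,String>() makes the encapsulated field behave as a HashMap<Date,List<String>>.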

Erasure

There is more than one way that Sun might have chosen to implement support for parameterized types. One possible way would have been to create a brand-new type definition for each type to which you bind a parameterized class. In binding to a type, each occurrence of a naked type variable in the source would be replaced with the bind type. This technique is used by C++.

For example, were Java to use this scheme, binding MultiHashMap to <Date,String> would result in the following code being created behind the scenes:

image

Binding MultiHashMap to another pair of types, such as <String,String>, would result in the compiler creating another version of MultiHashMap. Using this scheme could potentially result in the compiler creating dozens of class variants for MultiHashMap.

Java uses a different scheme called erasure. Instead of creating separate type definitions, Java erases the parameterized type information to create a single equivalent type. Each type parameter is associated with a constraint known as its upper bound, which is java.lang.Object by default. Client bind information is erased and replaced with casts as appropriate. The MultiHashMap class would translate to:

image

Knowing how parameterized types work behind the scenes is critical to being able to understand and use them effectively. Java contains a number of restrictions on the use of parameterized types that exist because of the erasure scheme. I'll discuss these limitations throughout this chapter. Each possible scheme for implementing generics has its downsides. Sun chose the erasure scheme for its ability to provide ultimate backward compatibility.
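One observable consequence of erasure can be sketched in a few lines: two lists bound to different types share a single runtime class. The class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ErasureExample {
   // Returns true: both lists are instances of the same erased ArrayList class.
   static boolean shareOneRuntimeClass() {
      List<String> strings = new ArrayList<String>();
      List<Integer> integers = new ArrayList<Integer>();
      return strings.getClass() == integers.getClass();
   }
}
```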

Upper Bounds

As mentioned, every type parameter has a default upper bound of Object. You can constrain a type parameter to a different upper bound. For example, you might want to supply an EventMap class, where the key must be bound to a Date type—either java.util.Date or java.sql.Date (which is a subclass of java.util.Date). A simple test:

image

The EventMap class itself has no different behavior, only additional constraints on the type parameter K:

image

You use the extends keyword to specify the upper bound for a type parameter. In this example, the K type parameter for EventMap has an upper bound of java.util.Date. Code using an EventMap must bind the key to a java.util.Date or a subclass of java.util.Date (such as java.sql.Date). If you attempt to do otherwise:

EventMap<String,String> map = new EventMap<String,String>();

you will receive compile-time errors:

image

The compiler replaces naked type variables in generated code with the upper bound type. This gives you the ability to send more specific messages to naked type objects within the generic class.

You want the ability to extract all event descriptions from the EventMap for events where the date has passed. Code the following test in EventMapTest.

image

Within EventMap, you can presume that objects of type K are Date objects:

image

In order to accomplish this, you'll have to add a method to MultiHashMap to return the entrySet from the encapsulated map:

protected Set<Map.Entry<K,List<V>>> entrySet() {
   return map.entrySet();
}

The entrySet method returns a set bound to the Map.Entry type. Each Map.Entry object in the Set is in turn bound to the key type (K) and a list of the value type (List<V>).
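A sketch of how EventMap might use its upper bound, under stated assumptions: the method name getPastEvents is hypothetical, and MultiHashMap is reduced to the two methods the sketch needs so that the block stands alone.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class MultiHashMap<K,V> {
   private Map<K,List<V>> map = new HashMap<K,List<V>>();

   public void put(K key, V value) {
      List<V> values = map.get(key);
      if (values == null) {
         values = new ArrayList<V>();
         map.put(key, values);
      }
      values.add(value);
   }

   protected Set<Map.Entry<K,List<V>>> entrySet() {
      return map.entrySet();
   }
}

class EventMap<K extends Date,V> extends MultiHashMap<K,V> {
   // Collects the descriptions of all events dated before the current time.
   public List<V> getPastEvents() {
      List<V> events = new ArrayList<V>();
      Date now = new Date();
      for (Map.Entry<K,List<V>> entry: entrySet()) {
         K date = entry.getKey();
         if (date.before(now))   // legal only because K is known to be a Date
            events.addAll(entry.getValue());
      }
      return events;
   }
}
```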

Wildcards

Sometimes you'll write a method where you don't care about the type to which a parameter is bound. Suppose you need a utility method that creates a single string by concatenating elements in a list, separating the printable representation of each element with a new line. The StringUtilTest method testConcatenateList demonstrates this need:

image

In the StringUtil method concatenate, you will append each list element's string representation to a StringBuilder. You can get the string representation of any object, provided by the toString method, without knowing or caring about its type. Thus, you want to be able to pass a List that is bound to any type as the argument to concatenate.

You might think that you can bind the list parameter to Object:

image

By binding list to Object, you constrain it to hold objects only of type Object—and not of any Object subclasses. You cannot assign a List<String> reference to a List<Object> reference. If you could, client code could add an Object to list using the List<Object> reference. Code that then attempted to extract from list using the List<String> reference would unexpectedly retrieve an Object.

Instead, Java allows you to use a wildcard character (?) to represent any possible type:

image

Within the concatenate method body, you cannot use the ? directly as a naked type variable. But since list can contain any type of object, you can assign each of its elements to an Object reference in the for-each loop.
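A minimal sketch of a wildcard-based concatenate, assuming that each element is followed by the platform line separator. The exact output format of the original listing is not known.

```java
import java.util.List;

class StringUtil {
   static final String NEWLINE = System.getProperty("line.separator");

   // Accepts a List bound to any type; elements are only read, as Object.
   static String concatenate(List<?> list) {
      StringBuilder builder = new StringBuilder();
      for (Object element: list) {
         builder.append(element);   // append calls toString via String.valueOf
         builder.append(NEWLINE);
      }
      return builder.toString();
   }
}
```

Because the parameter is a List<?>, the same method accepts a List<String>, a List<Date>, or a list bound to any other type.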

Additionally, you can constrain a wildcard to an upper bound using the extends clause.

image

For a second string utility method, you need to be able to concatenate a list of numerics, whether they are BigDecimal objects or Integer objects. Several tests drive out the minor differences between decimal output and integral output:

image

The implementation in StringUtil:

image

You'll need to import java.math.* to get the above code to compile.

The declaration of the list parameter in concatenateNumbers specifies that it is a List bound to either Number or a subclass of Number. The code in concatenateNumbers can then assign each element in list to a Number reference.
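The following sketch shows one way a method could read from a List<? extends Number>. The formatting rule (BigDecimal values rounded to a fixed number of places, everything else printed as an integral value) and the decimalPlaces parameter are assumptions. The class is shown on its own so that the block stands alone.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.List;

class StringUtil {
   static final String NEWLINE = System.getProperty("line.separator");

   // Accepts a List bound to Number or to any subclass of Number.
   static String concatenateNumbers(List<? extends Number> list, int decimalPlaces) {
      StringBuilder builder = new StringBuilder();
      for (Number element: list) {   // every element is at least a Number
         if (element instanceof BigDecimal) {
            BigDecimal decimal = (BigDecimal)element;
            builder.append(
               decimal.setScale(decimalPlaces, RoundingMode.HALF_UP).toPlainString());
         }
         else
            builder.append(element.longValue());
         builder.append(NEWLINE);
      }
      return builder.toString();
   }
}
```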

Implications of Using Wildcards

The downside of using a wildcard on a reference is that you cannot call methods on it that take an object of the type parameter. For example, suppose you create a pad utility method that can add an element n times to the end of a list.

image

The pad method declares the list parameter to be a List that can be bound to any type:

image

The error you receive indicates that the compiler doesn't recognize an appropriate add method.

image

The problem is that the wildcard ? designates an unknown type. Suppose that the List is bound to a Date:

List<Date> list = new ArrayList<Date>();

Then you attempt to pass a String to the pad method:

ListUtil.pad(list, "abc", count);

The pad method doesn't know anything about the specific type of the object being passed in, nor does it know anything about the type to which the list is bound. It can't guarantee that a client isn't trying to subvert the type safety of the list. Java just won't allow you to do this.

There is still a problem even if you specify an upper bound on the wildcard.

image

The problem is that a client could bind the list to java.sql.Date:

List<java.sql.Date> list = new ArrayList<java.sql.Date>();

Since java.util.Date is a superclass of java.sql.Date, you cannot add a java.util.Date object to a list that is constrained to hold only objects of type java.sql.Date or a subclass of java.sql.Date. Due to erasure, Java has no way of figuring out whether or not a given operation is safe, so it must prohibit them all. The general rule of thumb is that you can use bounded wildcards only when reading from a data structure.

Generic Methods

How do you solve the above problem? You can declare the pad method as a generic method. Just as you can specify type parameters for a class, you can specify type parameters that exist for the scope of a method:

image

The compiler can extract, or infer, the type for T based on the arguments passed to pad. It uses the most specific type that it can infer from the arguments. Your test should now pass.

Generic method type parameters can also have upper bounds.

You will need to use generic methods when you have a dependency between an argument to a method and either another argument or the return type. Otherwise, you should prefer the use of wildcards. In the pad method declaration, the object parameter's type is dependent upon the type of the list parameter.
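A sketch of pad declared as a generic method. The parameter names follow the prose above; the body is an assumption about how the padding would be done.

```java
import java.util.List;

class ListUtil {
   // T ties the element type of the list to the type of the object added.
   static <T> void pad(List<T> list, T object, int count) {
      for (int i = 0; i < count; i++)
         list.add(object);   // safe: object is known to be a T
   }
}
```

With this declaration, passing a List<Date> together with a String no longer compiles, because no single type T satisfies both arguments.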

Wildcard Capture

The technique of introducing a generic method to solve the above problem is known as wildcard capture. Another example is a simple method that swaps the elements of a list from front to back.

image

The wildcard capture involves calling a new generic method, swap:

image

By doing so, you are giving the wildcard a name.
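Wildcard capture can be sketched as a wildcard-typed entry point that delegates to a private generic helper. The names ListReverser and inPlaceReverse are hypothetical.

```java
import java.util.List;

class ListReverser {
   // Entry point: callers may pass a list bound to any type.
   static void inPlaceReverse(List<?> list) {
      int size = list.size();
      for (int i = 0; i < size / 2; i++)
         swap(list, i, size - 1 - i);
   }

   // Generic helper: T names the unknown type captured from the wildcard,
   // which makes it legal to write elements back into the list.
   private static <T> void swap(List<T> list, int i, int j) {
      T temp = list.get(i);
      list.set(i, list.get(j));
      list.set(j, temp);
   }
}
```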

Super

Upper-bounded wildcards, which you specify using extends, are useful for reading from a data structure. You can support writing to a data structure using lower-bounded wildcards.

image

You want to be able to create a new MultiHashMap from an existing MultiHashMap. The new map is a subset of the existing map. You obtain this subset by applying a filter to the values in the existing MultiHashMap. The test shown here creates a multimap of meetings. A meeting can be a one-time event or it can be recurrent. You want to create a new multimap that consists only of meetings that occur on Mondays.

Further, the original meetings multimap consists of java.sql.Date objects. Perhaps it was directly loaded from a database application. You want the new multimap to consist of java.util.Date objects.

image

You can use lower-bounded wildcards on the MultiHashMap filter method to allow transforming from java.sql.Date values to java.util.Date values. In the implementation, the target argument of the filter method is a MultiHashMap whose value type (V) has a lower bound of V. This is expressed as ? super V. This means that the value type (V) of the target MultiHashMap can be the same as V or a supertype of V. The meetings example works because java.util.Date is a supertype of java.sql.Date.

image
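A sketch of a filter method that uses a lower-bounded wildcard. The nested Filter interface, its apply method, and the choice of an instance method are assumptions for illustration. MultiHashMap is trimmed to what the sketch needs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MultiHashMap<K,V> {
   // Hypothetical callback that decides which values are copied.
   interface Filter<T> {
      boolean apply(T item);
   }

   private Map<K,List<V>> map = new HashMap<K,List<V>>();

   public void put(K key, V value) {
      List<V> values = map.get(key);
      if (values == null) {
         values = new ArrayList<V>();
         map.put(key, values);
      }
      values.add(value);
   }

   public List<V> get(K key) {
      return map.get(key);
   }

   // The target's value type may be V or any supertype of V,
   // so writing V objects into it is always safe.
   public void filter(MultiHashMap<K, ? super V> target, Filter<V> filter) {
      for (K key: map.keySet())
         for (V value: map.get(key))
            if (filter.apply(value))
               target.put(key, value);
   }
}
```

In the meetings example, the source would be a MultiHashMap whose values are java.sql.Date objects and the target one whose values are java.util.Date objects.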

Additional Bounds

It is possible to specify additional bounds, or more than one type, in an extends clause. While the first constraint can be either a class or interface, subsequent constraints must be interface types. For example:

public static
   <T extends Iterable&Comparable<T>> void iterateAndCompare(T t)

By using additional bounds, you are constraining the parameter passed to implement more than one interface. In the example, the object passed in must implement both the Iterable and Comparable interfaces. This is not normally something you want to do; additional bounds were introduced largely to solve issues of backward compatibility (see below).

If a method requires that an object behave as two different types, it intends for that method to do two different kinds of things to that object. In most cases, this will be a violation of the Single-Responsibility Principle, which is as applicable to methods as it is to classes. There will always be exceptions, of course, but before using additional bounds for this reason, see if you can decompose the method in question.

The more legitimate reason to use additional bounds is demonstrated by code in the Collections class. The Collections class contains static utility methods for operating on collection objects; it includes a method named max that returns the largest element in a collection. Under earlier versions of Java, the signature of max was:

public static Object max(Collection c)

The passed-in collection could hold any type of object, but the max method depended upon the collection holding objects that implemented the Comparable interface. An initial stab at a solution using generics could insist that the collection hold Comparable objects:

public static
   <T extends Comparable<? super T>> T max(Collection<? extends T> c)

Paraphrased, this signature says: max takes a collection of objects of any type, provided that type implements the Comparable interface, which in turn must be bound to the type, or a supertype, of the elements stored in the collection. The problem is that after erasure, this results in a different signature for the max method:

public static Comparable max(Collection c)

Instead of returning an Object reference, max would now return a Comparable reference. The change in signature would break existing compiled code that used the max method, killing compatibility. Using additional bounds effectively solves the problem:

public static
   <T extends Object&Comparable<? super T>>
   T max(Collection<? extends T> c)

Per J2SE specification, a type variable gets erased to its leftmost bound—Object in this example, and not Comparable. The max method now returns an Object reference but also defines the additional constraint that the collection objects must implement the appropriate Comparable interface.
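To see the effect of the leftmost bound, here is a hand-written max with the same signature, plus a small reflective check of its erased return type. MaxExample and erasedReturnType are hypothetical names, and the loop is a sketch rather than the library's source.

```java
import java.util.Collection;
import java.util.Iterator;

class MaxExample {
   // T erases to Object, its leftmost bound, yet must still be Comparable.
   static <T extends Object & Comparable<? super T>>
         T max(Collection<? extends T> c) {
      Iterator<? extends T> iterator = c.iterator();
      T candidate = iterator.next();   // throws NoSuchElementException if c is empty
      while (iterator.hasNext()) {
         T next = iterator.next();
         if (next.compareTo(candidate) > 0)
            candidate = next;
      }
      return candidate;
   }

   // Looks up the compiled (erased) method and reports its return type.
   static Class<?> erasedReturnType() {
      try {
         return MaxExample.class
            .getDeclaredMethod("max", Collection.class).getReturnType();
      }
      catch (NoSuchMethodException e) {
         throw new IllegalStateException(e);
      }
   }
}
```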

Raw Types

If you work with an existing system, written in J2SE 1.4 or earlier, it will use collection classes that do not support parameterization. You'll have plenty of code such as:

List list = new ArrayList();
list.add("a");

You can still use this code under J2SE 5.0. When you use a generic type without binding it to a type parameter, you refer to the type as a raw type. The use of raw types is inherently not typesafe: You can add objects of any type to a raw collection and encounter runtime exceptions when retrieving an object of unexpected type. This is the fundamental reason why Sun introduced parameterized types.

Any such unsafe operation is met with a warning by the compiler. Using the default compiler options, the above two lines of code that add to a raw ArrayList result in the following compiler message:

Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

Do what the message says and recompile using the compiler switch -Xlint:unchecked:

javac -source 1.5 -Xlint:unchecked *.java

Under Ant, supply an additional nested element on the javac task:

image

Recompiling will present you with more specific details on the source of each unchecked warning:

image

The Java compiler will generate an unchecked warning whenever it cannot guarantee the type safety of an operation on a parameterized type. If you are developing on an existing 1.4 or older application, you'll likely receive gobs of warnings. While it may not be possible or desirable to fix all the warnings at once, you should never treat warnings lightly. If you modify a section of legacy code, attempt to introduce generic types. You should be extremely leery of warnings that arise from any new code that you add.
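The runtime failure that raw types allow can be sketched as follows. The class and method names are hypothetical. Compiling this block produces the unchecked warning discussed above.

```java
import java.util.ArrayList;
import java.util.List;

class RawTypeExample {
   // Returns true if retrieving the misplaced element fails at runtime.
   static boolean failsOnRetrieval() {
      List ages = new ArrayList();   // raw type: no bind type supplied
      ages.add("17");                // compiles; the compiler only warns
      try {
         Integer age = (Integer)ages.get(0);
         return false;
      }
      catch (ClassCastException expected) {
         return true;                // the failure surfaces at retrieval, not at the add
      }
   }
}
```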

Checked Collections

Suppose you're working with legacy code. In fact, most systems you encounter will include code written for older versions of Java. This is the reality of Java development. I still encounter systems that make pervasive use of Vector and Hashtable—even though Sun recommended use of collections framework classes (List/ArrayList, etc.) instead as of Java 1.2. That was over six years ago!

You will likely encounter raw collections for some time to come. You can quickly add some type safety to your code by using a checked wrapper. Suppose your existing mess of a system creates a list intended to hold Integer wrapper objects:

List ages = new ArrayList();

Elsewhere in the code, in a method nested a few classes away, another bozo developer adds a new line of code:

ages.add("17");

And even further away, code in another class is responsible for extracting age values from the collection:

int age = ((Integer)ages.get(0)).intValue();

Oops! That line of legacy code results in a ClassCastException, since the first element in ages is a String, not an Integer. Sure, you will find the problem “real soon now,” but you will have wasted valuable time in the process.

The best solution would be to parameterize references to ages in each of the three classes in question. Then the second developer wouldn't even be able to compile the code that inserts a string. But changing those classes might be beyond your control or there might be dozens of affected classes and not three.²

² And of course none of those dozens of classes will have tests. The vast majority of legacy systems have no tests.

Sun introduced a new set of methods to the Collections class that can help solve this problem with type safety. These methods create checked wrapper objects. A checked wrapper object delegates off to the actual collection object, much like the unmodifiable and synchronized wrappers. A checked wrapper ensures that an object passed to the collection is of the appropriate type, preventing inappropriate objects from being inserted into the collection.

Using a checked collection will at least constrain the exception to the point of error. In other words, the code that adds the bad data will generate the exception, not the code that attempts to extract it. This will make for quicker debugging efforts. You can make the change in one place:

List ages = Collections.checkedList(new ArrayList(), Integer.class);

When the Java VM then executes the line of code that adds the string, ages.add("17"), you receive an exception:

java.lang.ClassCastException: Attempt to insert class java.lang.String element into collection
with element type class java.lang.Integer

A language unit test demonstrates this in toto:

image

Checked collections are not at all magical. The first argument to checkedList is the ArrayList object you are creating; it is bound to the Integer type. The second argument is a Class reference to the Integer type. Each time you invoke the add method, it uses the class reference to determine whether or not the passed parameter is of that type. If not, it throws a ClassCastException.
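A minimal sketch of the behavior just described. The raw reference stands in for legacy code that ignores the bind type, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CheckedListExample {
   // Returns true if the checked wrapper rejects the String at insertion time.
   static boolean rejectsStringOnInsert() {
      List<Integer> ages =
         Collections.checkedList(new ArrayList<Integer>(), Integer.class);
      List rawAges = ages;           // legacy-style raw reference to the same list
      try {
         rawAges.add("17");          // the wrapper checks the element's class here
         return false;
      }
      catch (ClassCastException expected) {
         return ages.isEmpty();      // nothing was inserted
      }
   }
}
```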

Checked wrappers require the redundancy of having to pass in a type (Integer.class) reference in addition to specifying a bind type (<Integer>). This is required because of erasure: the bind type information is not available to the list object at runtime. You can encapsulate checked wrappers in your own parameterized types, but you will then need to require clients to pass in the Class reference.

The Collections class provides checked collection wrappers for the types Collection, List, Map, Set, SortedMap, and SortedSet.

Even if you are writing J2SE 5.0 code, using checked collections can protect you from loose and fast developers beyond your sphere of influence. Some cowboy and cowgirl developers enjoy subverting the rules whenever they can. In the case of parameterized types, Java lets them.

You can cast an object of a parameterized type to a raw type. This allows you to store an object of any type in the collection. You will receive a compilation warning, but you can ignore the warning. Do so at your own peril. The results will generally be disastrous. This sort of strategy falls under the category of “don't do that!”

Using checked collections in 5.0 code can help solve this problem. The exceptions thrown will come from the sneaky code, letting you pinpoint the source of trouble.

Arrays

You cannot create arrays of bounded parameterized types:

List<String>[] names = new List<String>[100]; // this does not compile

Attempting to do so generates the compilation error:

arrays of generic types are not allowed

Again, the problem is that you could easily assign the parameterized type reference to another type reference. You could then add an inappropriate object, causing a ClassCastException upon attempted extraction:

image

Java does allow you to create arrays of parameterized types that are unbounded (i.e., where you use only the wildcard character ?):

image

The chief distinction is that you must acknowledge the potential for problems by casting, since the List is a collection of objects of unknown type.
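A sketch of an array of unbounded parameterized types, with the cast the text mentions. The names and sample value are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

class WildcardArrayExample {
   static String firstNameInFirstList() {
      List<?>[] lists = new List<?>[100];   // legal: unbounded wildcard
      List<String> names = new ArrayList<String>();
      names.add("Cary");
      lists[0] = names;
      // Elements come back as Object, so the cast acknowledges the risk.
      return (String)lists[0].get(0);
   }
}
```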

Additional Limitations

Due to the erasure scheme, an object of a parameterized type has no information on the bind type. This has many implications for what you can and cannot do.

You cannot create new objects using a naked type variable:

image

This code generates the compile error:

image

Erasure means that a naked type variable erases to its upper bound, Number in this example. In most cases, the upper bound is an abstract class such as Number or Object (the default), so creating objects of that type would not be useful. Java simply prohibits it.

You can cast to a naked type variable, but for the same reason, it is not often useful to do so. You cannot use a naked type variable as the target of the instanceof operator.

You can use type variables in a generic static method. But you cannot use the type variables defined by an enclosing class in static variables, static methods, or static nested classes. Because of erasure, there is only one class definition that is shared by all instances, regardless of the type to which they are bound. The class definition is created at compile time. This means that sharing static elements wouldn't work. Each client that binds the parameterized type to something different would expect to have the static member constrained to the type they specified.

Reflection

The reflection package has been retrofitted to support providing parameter information for parameterized types and methods. For example, you can send the message getTypeParameters to a class in order to retrieve an array of TypeVariable objects; each TypeVariable object provides you with enough information to be able to reconstruct the type parameters.

To support these changes, Sun altered the Java byte code specification. Class files now store additional information about type parameters. Most significant, the Class class has been modified to be a parameterized type, Class<T>. The following assignment works:

Class<String> klass = String.class;

If you're interested in how you might use this, take a look at the source for the CheckedCollection class. It's a static inner class of java.util.Collections.

The reflection modifications provide you with information on the declaration of parameterized types and methods. What you will not get from reflection is information about the binding of a type variable. If you bind an ArrayList to a String, that information is not known to the ArrayList object because of the erasure scheme. Thus reflection has no way of providing it to you. It would be nice to be able to code

image

but it just won't work.
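What reflection does offer can be sketched with getTypeParameters: you get back the declared type parameters, never the bind types. The class and method names below are hypothetical.

```java
import java.lang.reflect.TypeVariable;
import java.util.ArrayList;
import java.util.List;

class TypeParameterExample {
   // Joins the names of the type parameters a class declares.
   static String declaredTypeParameters(Class<?> klass) {
      StringBuilder builder = new StringBuilder();
      for (TypeVariable<?> parameter: klass.getTypeParameters())
         builder.append(parameter.getName()).append(' ');
      return builder.toString().trim();
   }

   // Asks the same question through an instance bound to String.
   static String parametersSeenThroughInstance() {
      List<String> names = new ArrayList<String>();
      return declaredTypeParameters(names.getClass());
   }
}
```

Asking through an ArrayList bound to String yields only the declared name E; nothing in the result mentions String.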

Final Notes

As you've seen, knowing how the erasure scheme works is key to being able to understand and implement parameterized types. From a client perspective, using parameterized types is relatively easy and imparts the significant benefit of type safety. From the perspective of a developer creating parameterized types, it can be a fairly complex adventure.

If you find yourself struggling with how to understand or define a parameterized type, distill the definition and sample use of the parameterized type into its erased equivalent. Also, take a look at some of the uses in the J2SE 5.0 source code. The collections framework classes and interfaces, such as Map and HashMap, provide some good examples. For more complex examples, take a look at java.util.Collections.

Exercises

  1. Create a new parameterized collection type—a Ring. A Ring is a circular list that maintains knowledge of a current element. A client can retrieve and remove the current element. The Ring must support advancing or backing up the current pointer by one position. The method add adds an element after the current element. The Ring class should support client iteration through all elements, starting at the current pointer, using for-each. The Ring class should throw appropriate exceptions on any operation that is invalid because the ring is empty.

    Do not use another data structure to store the ring (e.g., a java.util.LinkedList). Create your own link structure using a nested node class. Each node, or entry, should contain three things: the data element added, a reference to the next node in the circle, and a reference to the previous node in the circle.
