Chapter 16. Collections

  • Collection API

  • List, LinkedList, and ArrayList

  • Set, HashSet, and SortedSet

  • The Collections Helper Class

  • Wildcard Parameters and Generic Methods

  • Wildcarding a Generic Parameter

  • Generic Methods

  • Summary of Collection

  • Map, HashMap, and TreeMap

  • Exercises

  • Some Light Relief—Early Names for Java

We come now to a most important part of Java: the java.util and related packages that provide off-the-shelf data structures for your programs.

Computer science has a couple of dozen standard data structures (such as the linked list, the binary tree, the stack, etc). Java provides many of these standard data structures directly as library classes, and makes it easy for you to implement additional ones in a consistent way. These data structures hold collections of objects, called elements, and the library is known as the Java Collection Framework.

The Java Collection Framework has been updated to use generics starting in JDK 1.5, and you need to read Chapter 15 to understand this chapter. Collections are expected to be homogeneous: an instance of a collection class should only hold elements of one type. The code change is that you use generics to tell the compiler what that type is. The compiler will then detect attempts to add an object of the wrong type into a collection.

Collection API

We mentioned there are a couple of dozen basic data structures in computer science. Java provides the most important of these (including all three mentioned earlier) in a set of library classes. There is a common interface, java.util.Collection, giving the method signatures for insertion, deletion, and other common operations. Whatever java.util data structures you use, you can access them in a uniform way.

There are three main data structure classes that implement java.util.Collection:

  • List. Lists can contain duplicate elements. Lists are kept in the order that elements are added. A list might contain an element for each tire sold in a tire store on one day. In this example, if someone buys four “Road Hugger” tires, four identical elements will get added to the list.

  • Set. Sets cannot contain duplicate elements. Sets have their own idea of order, which is not the order that elements are added. A Set might contain “all customer objects who have not bought from us in the last six months”.

  • Queue.

    image

    Queues were added in JDK 1.5. They are intended for holding elements prior to processing. Queues came into Java to help Threads. They typically deliver elements in a FIFO (first in, first out) order, but don't have to. You might use a queue to hold service requests while waiting for a server thread to become free.

On top of these three basic structures, there are many additional flavors or variations:

  • Lists that are kept in an array (allowing fast random access)

  • Lists in which each element points to adjacent elements rather than being stored next to them (enabling lists of arbitrary size)

  • Sets specialized for fast access to enum objects

  • Sets that support “copy on write” operations for sharing mostly read-only data between threads,

  • Blocking queues of various types that wait for space before completing an add().

There are also a few simple data structures from JDK 1.1, updated to implement Collection: Vector and Stack.

There is a fourth basic data structure in java.util. It doesn't implement the Collection interface because it's about pairs of related objects.

  • Map. Lists and Sets just hold individual objects. Maps are for pairs of objects, where one is the key (such as “person name”), and the other is a value associated with that key (e.g. phone number). Maps have lots of variations too, depending on the underlying data structure: HashMap, TreeMap, WeakHashMap, HashTable and others. A map is a mathematical term for how one group of things (the keys) is related to another group of things (the values).

You choose List, Set, Queue, or Map depending on the characteristics of your data. Then you drill down onto a specific concrete class that gives the performance or other quality you need. You don't typically extend the Collection classes. You declare an object of one of the classes. Then you add your objects into your Collection object. Later, you iterate through the collection and process those objects, perhaps updating or sorting or removing some of them. There are umpteen other secondary operations too, like getting the count of the number of objects, and adding or removing a complete other collection from this one.

Most texts launch into an involved and highly detailed description of the individual data structures at this point. You can quickly get overwhelmed with low level information on weakly referenced hash maps, or abstract sequential lists. Instead, let's take a look at characteristics of collections, and some examples.

The Collection interface

The class java.util.Collection is just an interface that defines a dozen or so methods for adding to, removing from, and querying a data structure. Now all the individual java.util data structures (and others that you write) can implement this interface, and everyone adds and retrieves data with the same method signatures. Collection has a single generic parameter, E, so you can specify what Element type it is a collection of. It looks like this (simplified for ease of reading):

public interface java.util.Collection <E> {
     // basic operations
     public int      size();
     public boolean isEmpty();
     public boolean contains(E element);
     public boolean add(E element);
     public boolean remove(E element);
     public Iterator<E> iterator();

     // bulk operations on an entire collection
     public boolean addAll(Collection c);
     public boolean removeAll(Collection c);
     public boolean retainAll(Collection c);
     public boolean containsAll(Collection c);
     public void    clear();

     // put the collection into an array
     public Object[] toArray();
     public <T> T[] toArray(T[] a);

     // a reminder that you may need to override these
     public boolean equals(Object o);
     public int      hashCode();
}

Again, generic parameter E says what type you're going to keep in this instance. Here's how you would declare a collection that will hold Double objects (refer to the previous chapter if you are wondering about <Double>):

Collection <Double> myReadings;

Of course, that only gives you a place to hold a reference to the collection object. Say we decide to hold our readings in a PriorityQueue. The whole thing will look like this:

Collection<Double> myReadings = new PriorityQueue<Double>();

Let's take a closer look at the methods that the Collection interface promises. All of the basic methods should be self-explanatory, with the possible exception of iterator(), which we will get to shortly.

add() and remove() will add or remove the argument element and return true if that operation changed the Collection. In other words, if you try to remove something that is not in the collection, no exception is thrown, but the method call returns false. If the argument object actually was part of the collection, that element is removed from it and the method returns true. If you try to add something that is already in the collection, the result depends on what kind of collection it is. Some collections allow duplicate elements, and will add it again quite happily. Other collections do not allow duplicates. They will notice that they are already holding the element, and will return false to indicate that the operation did not change the collection.

There is no get() method to obtain an individual object back from a collection! The older way uses an iterator, which will give you all the objects one-by-one. You can stop iterating when you reach one you particularly like, if you wish. In JDK 1.5 use the for statement. There is a get() method on some of the classes that implement Collection.

The retainAll() method does set intersection. It goes through this Collection and compares it with the Collection passed as the argument. It keeps all the elements that are in both collections and removes the others from this Collection.

The addAll() method adds all the elements of the argument collection to this collection. The removeAll() method removes all elements from this collection that are also contained in the argument collection. After this call returns, this collection will hold no elements in common with the argument collection.

All concrete Collection classes have a constructor that takes a Collection as an argument. The collection must be of the same type or a subtype of E. There is no way for the Collection interface to specify the existence of these constructors. It's just a generally accepted protocol. This constructor allows you to get a view of an existing Collection in one of the other Collection data structures. No cloning takes place.

Collections only ever hold references to objects, so sorting, or adding, or other operations are quick—just rearrange a few pointers. But if you change an object through some other pointer, the collection gets changed too. That might or might not be what you intend.

There is a helper class, Collections, that consists exclusively of static functions to do useful things to a Collection argument (get the maximum element, reverse a collection, sort it, do a binary search, etc.). We come back to that later with an example. Next we'll take a look at the way you add elements to any collection, and how you can use an iterator to visit all the elements.

Adding to a Collection

One of the concrete classes that implements Collection is java.util.HashSet. This class implements Set interface, so we know it cannot have duplicates. It is implemented using a hash table, and it acts like a Set, hence HashSet. Here is the code that shows how you would populate a HashSet Collection object and invoke some common methods on it:

     Collection<Double> c = new HashSet<Double>();

     for(int i = 1; i <= 5; i++)
         c.add( (double)i );  // add element to collection

     if ( ! c.isEmpty() )
         System.out.println("c has " + c.size() + " elements");


     for(Double d : c )     // iterate through a collection!
         System.out.println("c has " + d );
}

If you put those statements in a main routine and add an “import java.util.*” at the top of the class, you'll be able to compile and run the program with this result:

% javac -source 1.5 foo.java
% java foo
c has 5 elements
c has 3.0
c has 4.0
c has 5.0
c has 1.0
c has 2.0

HashSet is one of the collections that uses an order that suits itself, hence the elements come out in a seemingly funny order. To see why, you have to understand how hash tables work, and in particular what is going on with the methods equals() and hashcode() that are defined in java.lang.Object, and hence present in every Object. We'll defer this discussion to a little later in the chapter.

Foreach statement—iterating through a Collection

There are two things to note here. First, Sets don't generally give you back the elements in the order in which you add them. Second, iterating through a collection is now quick and simple to write. Just mention the element type, a loop variable, and the collection you are taking it from:

for(Double d : c )     // iterating through a collection!

Let's take a more detailed look at iterators, the older way of traversing a collection

public interface Iterator <E>

Iterator is an interface that allows you to visit all the elements in a collection without having to know the details of how or where they're stored. The collection class knows all those details, and it implements the Iterator interface. The Iterator interface looks like this:

public interface java.util.Iterator<E> {
    public boolean hasNext();
    public E next();
    public void remove();
}

The interface java.util.Iterator replaces an earlier attempt to do the same thing, java.util.Enumerator (mnemonic: each improvement makes the name shorter: Enumerator, Iterator, foreach). Use foreach when you can. Use Iterators instead of Enumerators because:

  • Iterators provide a safe way to remove an element that you have just arrived at, whereas Enumerator does not have this feature.

  • The names are shorter in Iterator (next() versus nextElement()).

  • Iterators have been updated to take a generic parameter, so that the next() method returns something of the correct type, not type Object.

The Collection interface promises that each implementing class will have a method that returns an Iterator:

public Iterator<E> iterator()

Calling a collection's iterator() method returns you back an object that fulfills the Iterator (with a capital “I”) interface. The obvious implementation for iterator() is to return an inner class that implements Iterator, and is local to the method, possibly anonymous[1]. For an array, iteration is simply going from a[0] to a[n-1], but for a binary tree a little more effort is needed. The Iterator interface allows that effort to be encapsulated, kept in the Collection, and hidden from user classes.

The method hasNext() allows you to test whether there is a next element in the collection. The methods hasNext() and next() are robust. You can call them in any order, and they are not tied to each other at all. The method hasNext() can be called multiple times without moving to the next element, and it will return the correct answer.

The method next() returns the first then subsequent elements of the iteration. If there is no next element, it will throw the exception NoSuchElementException. When you visit an element with next(), you can then call remove() on it to delete it from the collection. This is much simpler than having every individual Java programmer trying to implement remove() for a linked list or whatever the structure is, for themselves.

Assume we are using the Collection created in the previous example. We can work our way through each element with an explicit Iterator like this:

Collection<Double> c = new HashSet<Double>();

//  code to fill the collection, omitted.

for (Iterator i = c.iterator(); i.hasNext(); )
    System.out.println("c has " + i.next() );

The new syntax of the for statement is a lot easier and quicker to write:

for(Double d : c )
    System.out.println("c has " + d );

Iterators will drop out of use, as the 1.5 release grows in popularity. You'll only see Iterators in old code, and in loops where you need to remove elements from the collection.

Let's go on to look at another class that implements Collection: LinkedList.


[1] If that sentence seemed like mumbo-jumbo, please review Chapter 12 and tackle the collections questions in the exercises at the end of this chapter.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.119.142.85