CHAPTER 5: The Collection ADT

© Ake13bk/Shutterstock

This chapter introduces one of the most fundamental ADTs, the Collection ADT. This ADT provides the ability to store information and then access the stored information. As you can imagine, storing and retrieving information is often at the heart of computer processing. There are other ADTs that provide these same operations, perhaps with certain restrictions on the information stored or the order in which it can be accessed. In fact, you might recall that we have referred to both of our previously defined ADTs, the Stack and the Queue, as “access-controlled” collections.

Typical Collection ADTs (unlike the Stack and the Queue) allow retrieval of information based on the contents of the information. Therefore, we return to the topic of object comparison in this chapter, considering what it means for two objects to be “equal” or for one object to be “less than” or “greater than” another object. Being able to compare objects allows us to store them in sorted order, which, in turn, permits relatively fast retrieval options. This allows us, for example, to analyze more information during processing—we demonstrate this ability using a text processing application developed with our Collection ADT.

5.1 The Collection Interface

Stacks and queues are useful ADTs allowing us to solve many different types of problems. Stacks are perfect for situations where we need to process the most recent data first, such as matching grouping symbols in arithmetic expressions and queues make good buffers for storing information that needs to be handled in a first in first out sequence. The restrictions on how to insert and remove information when using stacks and queues ensure that when we use them in these situations we properly access the data.

But what about problems where we need to retrieve information regardless of the order in which it is stored—for example obtaining bank account information based on an ID number or determining if a specific item is in stock at a warehouse? In these situations we need a structure that allows us to store information and retrieve it later based on some key part of the information. Many solutions to this problem exist—lists, maps, tables, search trees, and databases just to name a few. This chapter presents the most basic ADT that provides the required functionality—the Collection ADT.

Our Collection ADT allows us to collect together information for later access. The order in which we add information to a collection has no effect on when we can retrieve it. Once an object is added to a collection we can retrieve it and/or remove it based on its contents.

To provide content-based access the Collection ADT uses the equals method. All classes provide an equals method—if they do not implement the method directly then they inherit it. We know this because the equals method is implemented in the Object class that is the root of the inheritance tree (see Section 1.2, “Organizing Classes”). The equals method of the Object class only considers two objects to be equal if they are aliases of each other. Usually we want equality to be related to the contents of the objects, not their memory address. Therefore, objects stored in our collections should override the equals method of the Object class. We return to this topic in Section 5.4, “Comparing Objects Revisited.” For now we will use the String class in our examples—the String class does provide a content-based equals method.

Assumptions for Our Collections

Many variations of collections are possible. Here we define a basic collection, a content-based repository that supports addition, removal, and retrieval of elements.

Our collections allow duplicate elements. Element first is considered to be a duplicate of element second if first.equals(second) returns true. When an operation involves “finding” such an element, it can “find” any one of the duplicates. There is no distinction among duplicate elements in these cases.

As with our stacks and queues, collections do not support null elements. As a general precondition for all of our collection methods, null elements cannot be used as arguments. Rather than stating this precondition for each method, we state it once in the lead comment of the interface. This is a standard precondition when defining ADTs. It simplifies coding, which removes the need for testing for null pointers every time we use an argument within a method.

Other than prohibiting null elements, our collection classes have minimal preconditions for operations. For example, it is possible to specify a remove operation that requires a matching object to be present in the collection. We do not do this. Instead, we define a remove operation that returns a boolean value indicating whether the operation succeeded. This approach provides greater flexibility to applications. Similarly we could require that the add operation not be invoked on a full collection, perhaps stating that in such a situation it throws a “Collection Overflow” exception. Instead we use the same approach as with removeadd returns a boolean indicating success or failure of the operation. An attempt to add to a full collection would return false.

The Interface

As we did with stacks and queues, we capture the formal specifications of our Collection ADT using the Java interface construct. Our collections are generic—the type of object held by any particular collection is indicated by the client at the time the list is instantiated. It is the client’s responsibility to use a class whose equals method definition suits their needs. Here is the code for our CollectionInterface. Study the comments and method signatures to learn more about the details. As you can see, this interface is part of the ch05.collections package. The implementation classes are also stored in the same package.


//---------------------------------------------------------------------------
// CollectionInterface.java       by Dale/Joyce/Weems               Chapter 5
//
// Interface for a class that implements a collection of T.
// A collection allows addition, removal, and access of elements.
// Null elements are not allowed. Duplicate elements are allowed.
//---------------------------------------------------------------------------
package ch05.collections;

public interface CollectionInterface<T>
{
  boolean add(T element);
  // Attempts to add element to this collection.
  // Returns true if successful, false otherwise.

  T get(T target);
  // Returns an element e from this collection such that e.equals(target).
  // If no such element exists, returns null.

  boolean contains(T target);
  // Returns true if this collection contains an element e such that
  // e.equals(target); otherwise returns false.

  boolean remove (T target);
  // Removes an element e from this collection such that e.equals(target)
  // and returns true. If no such element exists, returns false.

  boolean isFull();
  // Returns true if this collection is full; otherwise, returns false.

  boolean isEmpty();
  // Returns true if this collection is empty; otherwise, returns false.


  int size();
  // Returns the number of elements in this collection.
}

Figure 5.4, in this chapter’s summary, shows the relationships among the primary classes and interfaces created to support our Collection ADT.

5.2 Array-Based Collection Implementation

This section develops an array-based implementation of our Collection ADT. We look at another array-based solution in Section 5.5, “Sorted Array-Based Collection Implementation,” and discuss a link-based solution in Section 5.6, “Link-Based Collection Implementation.”

Our basic approach is simple: If a collection has N elements, we hold the elements in the first N locations of the array, locations 0 to N-1. We maintain an instance variable, numElements, to hold the current number of elements in the collection. This example shows how we would represent a collection of country abbreviations:

When an element is added to the collection, we simply place it in the next available slot, the slot indicated by numElements, and increment numElements by one. The result of adding Columbia (“COL”) to the above collection is as follows:

When an element is removed we could shift all the elements at higher indices one position lower, to fill the gap left by the removal. Instead we take advantage of the unsorted nature of the array and simply move the element at the end of the collection into the position occupied by the element to be removed. Let us remove Mexico from the above collection:

The ArrayCollection class implements the CollectionInterface using the basic approach described above. The implementation of the ArrayCollection class is straightforward, especially if you are familiar with the array-based implementations of the Stack ADT from Chapter 2 and of the Queue ADT from Chapter 4.

As was the case with our array-based generic stacks and queues, the instantiation of an array as an array of Object, and subsequent casting of it as an array of T, typically generates a compiler warning. Even though this approach is somewhat awkward and results in a compiler warning, it is how we must create generic collections using arrays in Java. Two constructors are provided: one that instantiates a collection based on a default capacity of 100, and another that allows the client to indicate the capacity.

For our ArrayCollection class we place the code for array search in a protected helper method, called find, which is invoked by each method that needs to search for an element. The find method sets the values of the protected instance variables found and location, thereby indicating the results of the search. The algorithm used here for searching the array for an element is the Sequential Search algorithm presented in Section 1.6, “Comparing Algorithms: Order of Growth Analysis.” The find method is an example of procedural abstraction. Its existence removes the need to create similar code across the methods remove, contains, and get, and greatly simplifies the code for each of those methods.

Here is the code for the ArrayCollection class:

//---------------------------------------------------------------------------
// ArrayCollection.java          by Dale/Joyce/Weems                Chapter 5
//
// Implements the CollectionInterface using an array.
//
// Null elements are not allowed. Duplicate elements are allowed.
//
// Two constructors are provided: one that creates a collection of a default
// capacity, and one that allows the calling program to specify the capacity.
//---------------------------------------------------------------------------
package ch05.collections;

public class ArrayCollection<T> implements CollectionInterface<T>
{
  protected final int DEFCAP = 100; // default capacity
  protected T[] elements;           // array to hold collection's elements
  protected int numElements = 0;    // number of elements in this collection

  // set by find method
  protected boolean found;  // true if target found, otherwise false
  protected int location;   // indicates location of target if found

  public ArrayCollection()
  {
    elements = (T[]) new Object[DEFCAP];
  }

  public ArrayCollection(int capacity)
  {
    elements = (T[]) new Object[capacity];
  }

  protected void find(T target)
  // Searches elements for an occurrence of an element e such that
  // e.equals(target). If successful, sets instance variables found to true
  // and location to the index of e. If not successful, sets found to false.
  {
    location = 0; found = false;
    while (location < numElements)
    {
      if (elements[location].equals(target))
      {
        found = true;
        return;
      }
      else
        location++;
    }
  }

  public boolean add(T element)
  // Attempts to add element to this collection.
  // Returns true if successful, false otherwise.
  {
    if (isFull())
      return false;
    else
    {
      elements[numElements] = element;
      numElements++;
      return true;
    }
  }

  public boolean remove (T target)
  // Removes an element e from this collection such that e.equals(target)
  // and returns true; if no such element exists, returns false.
  {
    find(target);
    if (found)

    {
      elements[location] = elements[numElements - 1];
      elements[numElements - 1] = null;
      numElements--;
    }
    return found;
  }

  public boolean contains (T target)
  // Returns true if this collection contains an element e such that
  // e.equals(target); otherwise, returns false.
  {
     find(target);
     return found;
  }

  public T get(T target)
  // Returns an element e from this collection such that e.equals(target);
  // if no such element exists, returns null.
  {
    find(target);
    if (found)
      return elements[location];
    else
      return null;
  }

  public boolean isFull()
  // Returns true if this collection is full; otherwise, returns false.
  {
    return (numElements == elements.length);
  }

  public boolean isEmpty()
  // Returns true if this collection is empty; otherwise, returns false.
  {
    return (numElements == 0);
  }

  public int size()
  // Returns the number of elements in this collection.
  {
    return numElements;
  }
}

5.3 Application: Vocabulary Density

The vocabulary density of a text is the total number of words in the text divided by the number of unique words in the text. In this section we develop an application that computes the vocabulary density of a given text.

The problem is clear—read the text and count how many words it contains and also the number of unique words it contains. We define a word to be a sequence of letter characters (A through Z) plus the apostrophe character ( ’ ). The apostrophe is included so as to treat words such as “it’s” and “Molly’s” as a single word. We ignore the case of letters when determining uniqueness—for example, “The” and “the” are considered to be the same word.

To count the number of unique words we need to keep track of all the words encountered so far while reading through the text. Our Collection ADT is the perfect tool to accomplish this. The algorithm is simple—read through the text word by word, checking each word to see if it is already in the collection and if not, add it to the collection. When finished, the size of the collection provides us the number of unique words. As we proceed through the words we also keep track of the total number of words processed. We use the total number of words and the number of unique words to calculate the vocabulary density. More formally:

Input to our application will reside in a text file. The name/location of the file should be indicated as a command line argument to the program. Here is the application—the parts of the code directly related to the use of the Collection ADT are emphasized:

//---------------------------------------------------------------------------
// VocabularyDensity.java       by Dale/Joyce/Weems                 Chapter 5
//
// Displays the number of total words, unique words in the input text file,
// and the resulting vocabulary density.
// Input file indicated by a command line argument.
//---------------------------------------------------------------------------

package ch05.apps;

import java.io.*;
import java.util.*;
import ch05.collections.*;

public class VocabularyDensity
{
  public static void main(String[] args) throws IOException
  {
    final int CAPACITY = 1000;   // capacity of collection
    String fname = args[0];      // input file of text
    String word;                 // current word
    int numWords = 0;            // total number of words
    int uniqWords;               // number of unique words
    double density;              // vocabulary density

    CollectionInterface<String> words =
                                new ArrayCollection<String>(CAPACITY);

    // Set up file reading
    FileReader fin = new FileReader(fname);
    Scanner wordsIn = new Scanner(fin);
    wordsIn.useDelimiter("[^a-zA-Z']+"); // delimiters are nonletters,'

    while (wordsIn.hasNext())      // while more words to process
    {
      word = wordsIn.next();
      word = word.toLowerCase();
      if (!words.contains(word))
        words.add(word);
      numWords++;
    }

    density = (double)numWords/words.size();
    System.out.println("Analyzed file " + fname);
    System.out.println("
	Total words:  " + numWords);
    if (words.size() == CAPACITY)
      System.out.println("	Unique words: at least " + words.size());
    else

    {
      System.out.println("	Unique words: " + words.size());
      System.out.printf("
	Vocabulary density: %.2f", density);
    }
  }
}

Some observations about the application:

  • The code is short and simple—this is because so much of the complexity is handled by the ArrayCollection class. This is a good example of the power and utility of abstraction.

  • This program demonstrates how to read input from a file. The FileReader constructor will throw IOException if there are issues finding/opening the indicated file. Because that is a checked exception, we must either catch it or rethrow it. We chose the latter approach as you can see on the line declaring the main method. The FileReader fin is passed as an argument to a Scanner, allowing us to use the scanner to read the file.

  • We use Java’s Scanner to break the input file into words. It allows us to know when “there are more words to process” and allows us to “get the next word.” The Scanner class permits us to define the set of delimiters used to separate the tokens found in its input source by invoking its useDelimiter method. This method accepts a regular expression as an argument. In our program we indicate that the delimiters are “[^a-zA-Z’]+”; this regular expression means any sequence of one or more characters that are not a letter (“a-zA-Z” indicates all the characters between ‘a’ and ‘z’ plus all the characters between ‘A’ and ‘Z’) or an apostrophe (note the ’ that follows the ‘Z’ in the expression)—the ‘^’ symbol acts to negate that set of characters, and the + sign indicates “one or more” of the characters in the bracketed set that proceeds it. Thus, “one or more characters that are not a letter or an apostrophe.” See documentation about the Java Pattern class for more information.

  • The toLowerCase method of the String class changes all letters of a word to lowercase before checking if the word is in the collection, fulfilling the specification to ignore the case of letters when determining uniqueness.

  • The constant CAPACITY is used to instantiate the ArrayCollection variable words. Therefore, the words collection can hold at most CAPACITY words, in our case 1,000. If the capacity is reached we display the phrase “at least” because we cannot determine the exact number of unique words in the file. In that case we do not display the density as it would be meaningless.

  • Once the capacity is reached the add method no longer adds words to the collection—instead it simply returns the value false over and over as the program continues to process the input file. Exercise 5.12a asks you to modify the program so that in the case of reaching a full capacity it stops processing and outputs an appropriate message.

The reader is encouraged to test this program and to use it to analyze some text files of their choice. Here are the results of running the program on three related, important, historical documents—the 1215 British Magna Carta (translated to English from the original Latin but minus the list of those in attendance), the 1776 U.S. Declaration of Independence (minus the list of signatures), and the 1930 Indian Purna Swaraj:

Analyzed file BritishMagnaCarta.txt Analyzed file USAdeclaration.txt Analyzed file IndiaPurnaSwaraj.txt
Total words: 3310 Total words: 1341 Total words: 594
Unique words: 698 Unique words: 540 Unique words: 288
Vocabulary density: 4.74 Vocabulary density: 2.48 Vocabulary density: 2.06

5.4 Comparing Objects Revisited

With stacks and queues, we access only the ends of our structures. We push or pop the top element of a stack, or we enqueue a value at the tail of a queue and dequeue it from the head. We do not access elements stored at other places within the structure. With a collection, however, access is permitted to elements anywhere within the structure. Therefore, to understand collections we need to understand how an element within a structure is identified—we need to understand equality of objects.

In the next section we consider an approach to storing the elements of a collection that improves the efficiency of finding any given element. If we can store the elements in “increasing order” then we can use the binary search algorithm to locate them. But what does it mean to store elements “in order”? To understand this we need to understand object comparison.

This section reviews both the equals method, used to determine equality, and the compareTo method (along with the Comparable interface), used to determine order.

The equals Method

Section 1.5, “Basic Structuring Mechanisms,” discussed comparing objects using the comparison operator (==). Recall that when using ==, the comparison is actually made between the contents of the two reference variables that point to the objects, and not between the contents of the objects themselves. This is demonstrated in Figure 5.1 (which replicates Figure 1.6).

The comparison operator does not compare the contents of objects. What else can we do? How do we compare the actual objects? One option is to use the equals method. Because this method is exported from the Object class that is the root of the Java inheritance tree, it can be used with objects of any Java class. If c1 and c2 are objects of the class Circle, then we can compare them using:

c1.equals(c2)

Figure 5.1 Comparing primitive and nonprimitive variables

The equals method, as defined in the Object class, acts in much the same way as the comparison operator. It returns true if and only if the two variables reference the same object. To circumvent this problem we can, within a class, redefine the equals method to fit the purpose of the class.

Suppose we have a Circle class that features a radius attribute of type int. A reasonable definition for equality of Circle objects is that they are equal if they have the same radii. To implement this approach, we include the following method in the Circle class:

@Override
public boolean equals(Object obj)
// Returns true if obj is a Circle with same radius as this Circle,
// otherwise returns false.
{
   if (obj == this)
      return true;
   else
   if (obj == null || obj.getClass() != this.getClass())
      return false;
   else
   {
      Circle c = (Circle) obj;
      return (this.radius == c.radius);
   }
}

This design of the equals method reflects standard good practice. A few notes about the design follow:

  • The @Override notation indicates to the compiler that we are overriding an ancestor’s method. The compiler double-checks our syntax and ensures that we are in fact overriding a method.

  • The first if clause checks to see if we are comparing a circle to itself. In that case we return true.

  • The second if clause checks to see if the argument is null. If so we return false. Due to the short-circuit nature of the Java || operator the second part of the or expression is only evaluated if the first part is false, protecting us from a null pointer exception. In the second part of the expression we check to make sure the two objects being compared are of the same type—if not we return false.

  • If processing reaches the final else clause we can safely cast the obj argument as a Circle and compare the two radii.

Now when a statement such as

c1.equals(c2)

is encountered, the customized equals method of the Circle class is used, rather than the equals method of the Object class. Even though c1 and c2 may reference different Circle objects, if those objects have equal radii, the equals method returns true. As discussed previously in Section 5.1, “The Collection Interface,” we sometimes refer to the attribute, or combination of attributes, of an object used by the equals method to determine equality as the key of the class. The key of a class, from the point of view of an application, is the set of attributes that are used to determine the identity of an object of the class, for that application. For example, if an application uses the Circle class equals method to determine identity, then the radius of a circle is the key.

The FamousPerson class

Let us look at another example, one that we will use in several applications. This example should help clarify what is meant by the key attributes of a class. Consider a class, FamousPerson, whose objects will represent famous people. The information we wish to capture about a famous person is their name (first name and last name), the year they were born, and some short interesting fact about the person.

The FamousPerson class can be found in the support package. It includes protected variables firstName, lastName, yearOfBirth, and fact to hold its attributes, all of which are initialized through its constructor. The class also includes getters plus equals, compareTo, and toString methods. Here we show the equals method that demonstrates that equality can sometimes depend on a subset of the attributes, in this case the first and last name attributes. We say that the combination of firstName and lastName is the key of the class.


@Override
public boolean equals(Object obj)
// Returns true if 'obj' is a FamousPerson with same first and last
// names as this FamousPerson, otherwise returns false.
{
   if (obj == this)
      return true;
   else
   if (obj == null || obj.getClass() != this.getClass())
       return false;
   else
   {
      FamousPerson fp = (FamousPerson) obj;
      return (this.firstName.equals(fp.firstName) &&
              this.lastName.equals(fp.lastName));
   }
}

Let us create an application that uses our Collection ADT and the FamousPerson class. A text file named FamousCS.txt is included in the input folder—it contains information about famous computer scientists, one scientist per line, in the format: first name, last name, year of birth, and fact. We make the simplifying assumptions that the only commas in the file are those used to separate the information, and that the scientists’ names do not contain “white space.”

Our application will read the information from the file and store it in a collection of FamousPerson. It will then interact with the user, displaying the information about the scientists based on the user’s queries. The application is easy to write because the more complex parts are already implemented by the ArrayCollection and FamousPerson classes. We call the application CSInfo, representing Computer Scientist Information.

//---------------------------------------------------------------------------
// CSInfo.java              by Dale/Joyce/Weems                     Chapter 5
//
// Reads information about famous computer scientists from the file
// FamousCS.txt. Allows user to enter names and provides them the information
// from the file that matches the name.
//---------------------------------------------------------------------------
package ch05.apps;

import java.io.*;
import java.util.*;
import ch05.collections.*;
import support.FamousPerson;

public class CSInfo
{
  public static void main(String[] args) throws IOException
  {
    // instantiate collection
    final int CAPACITY = 300;
    ArrayCollection<FamousPerson> people
          = new ArrayCollection<FamousPerson>(CAPACITY);

    // set up file reading
    FileReader fin = new FileReader("input/FamousCS.txt");
    Scanner info = new Scanner(fin);
    info.useDelimiter("[,\n]");  // delimiters are commas, line feeds

    Scanner scan = new Scanner(System.in);
    FamousPerson person;
    String fname, lname, fact;
    int year;

    // read the info from the file and put in collection
    while (info.hasNext())
    {
      fname = info.next();   lname = info.next();
      year = info.nextInt(); fact = info.next();
      person = new FamousPerson(fname, lname, year, fact);
      people.add(person);
    }

    // interact with user, getting names and displaying info
    final String STOP = "X";
    System.out.println("Enter names of computer scientists.");
    System.out.println("Enter: firstname lastname (" + STOP + " to exit)
");
    fname = null; lname = null;
    while (!STOP.equals(fname))
    {
      System.out.print("Name> ");
      fname = scan.next();
      if (!STOP.equals(fname))
      {
         lname = scan.next();
         person = new FamousPerson(fname, lname, 0, "");
         if (people.contains(person))

         {
           person = people.get(person);
           System.out.println(person);
         }
         else
           System.out.println("No information available
");
      }
    }
  }
}

Here is a sample run of the application:

Enter names of computer scientists.
Enter: firstname lastname (X to exit)

Name> Ada Lovelace
Ada Lovelace(Born 1815): Considered by many to be first computer programmer.

Name> Molly Joyce
No information available

Name> Edsger Dijkstra
Edsger Dijkstra(Born 1930): 1972 Turing Award winner.

Name> X

As you study the code of the application, take particular note of the following line:

person = people.get(person);

It might seem strange to be passing person to the collection to get back a FamousPerson object and assign it to person. This is an example of how the key attributes of an object let us retrieve information from a collection. Suppose the user enters “Ada Lovelace” in response to the name prompt. In that case the line

person = new FamousPerson(fname, lname, 0, "");

creates a new Person object with attributes

firstName: "Ada" lastName: "Lovelace" yearOfBirth: 0
fact: ""

When the get method is invoked with this object as an argument, the object is compared to each of the objects held in the collection—based on the way equals is defined for FamousPerson objects, a match is found for the Person object with attributes

firstName: "Ada" lastName: "Lovelace" yearOfBirth: 1815
fact: "Considered by many to be first computer programmer."

It is this matched object that is returned, that is a reference to it is returned, and assigned to the person variable. Now the person variable references the object that is in the collection, the object with the complete information, and it is this information that is used in the println statement on the following line. As long as we provide a partial object that contains the correct key information, we can retrieve the complete object from the collection.

The Comparable Interface

The equals method allows us to check whether a particular element is in a collection. But in addition to checking objects for equality, we need another type of comparison. We often need to be able to tell when one object is less than, equal to, or greater than another object. The Java library provides an interface, called Comparable, that can be used to ensure that a class provides this functionality.

The Comparable interface consists of exactly one abstract method:

public int compareTo(T o);
// Returns a negative integer, zero, or a positive integer as this object
// is less than, equal to, or greater than the specified object.

The compareTo method returns an integer value that indicates the relative “size” relationship between the object upon which the method is invoked and the object passed to the method as an argument. As we see in the next section, this information can be used to order the objects of a collection within the internal representation structure, facilitating faster search. By convention the compareTo method of a class should support the standard order of a class—lexicographic order for strings, numeric order for numbers, low score to high score for bowlers, high score to low score for golfers, and so on. We call the order established by a class compareTo method the natural order of the class.

Any class that implements the Comparable interface defines its own compareTo method, with a signature that matches the abstract method defined in the interface. After all, the implementer of the class is in the best position to define how objects of the class should be compared—to know the natural order.

The Java Comparable interface is able to work with generics. Use of generic types with the compareTo method helps ensure that comparison takes place only between compatible objects.

Let us return to our famous person example. Recalling that people are compared on their first and last names when determining equality, we use the same two fields when determining relative order. Following standard procedure we first compare last names using the String class equals method—only if the two last names under consideration are equal do we turn our attention to the first names. When a key consists of multiple attributes, starting with the most significant attribute, first is the best approach. Here is the compareTo method implemented within our FamousPerson class (our FamousPerson class does implement the Comparable interface):


public int compareTo(FamousPerson other)
// Precondition: 'other' is not null
//
// Compares this FamousPerson with 'other' for order. Returns a
// negative integer, zero, or a positive integer as this object
// is less than, equal to, or greater than 'other'.
{
  if (!this.lastName.equals(other.lastName))
    return this.lastName.compareTo(other.lastName);
  else
    return this.firstName.compareTo(other.firstName);
}

Our compareTo method uses the compareTo method of the String class. This makes sense because the attributes upon which we are basing the comparison are strings. Notice that the equals method described previously and the compareTo method described here are consistent with each other. In other words, the equals method returns true for two people if and only if the compareTo method returns 0 for the two people. It is good programming practice to ensure consistency between these two methods. Unlike with the equals method it is common to assume the argument passed to compareTo is not null.

5.5 Sorted Array-Based Collection Implementation

In Section 5.2, “Array-Based Collection Implementation,” we implemented a collection class that used an array. The ArrayCollection implementation featured a constant time add, but all the other basic operations—get, contains, and remove—require O(N) time. Consider an application that builds a large collection and then repeatedly accesses the collection to see if it contains various elements—perhaps a spell-checker application. Although such an application can make use of the fast add operation, it might benefit even more from a fast contains operation. Is there some way we can structure our stored data so as to facilitate a faster search?

In Section 1.6, “Comparing Algorithms: Order of Growth Analysis,” and Section 3.3, “Recursive Processing of Arrays,” we studied the binary search algorithm—an O(log2N) algorithm for finding an element in a sorted array. If we use an array as the internal representation for a collection implementation, and keep that array in sorted order, then the binary search algorithm can be used to find elements in the collection. Not only will this significantly speed up the contains operation but since get and remove both require that we first find the element in question, those operations can also benefit. In this section we take this approach. We call our implementation the SortedArrayCollection.

Our goal with this new Collection ADT implementation is to increase the efficiency of the operations. Efficiency is typically only a concern for a collection when the collection holds a large number of elements. It makes sense, therefore, to make this implementation of the Collection ADT unbounded. We do this using the same approach we did with our array-based unbounded queue implementation—we include a protected enlarge method that can be invoked internally by the public add method when needed.

Comparable Elements

As elements are added to our collection we intend to keep the protected array sorted according to the natural order of the elements—therefore we intend to use the compareTo method associated with the elements. In order to do this we must ensure that only classes that implement the Comparable interface are used with our collection.

Is it possible to force applications to use Comparable classes when providing an argument for a generic type? Yes it is. We can use what is called a “bounded type.” For example, we could design a class Duo that holds a pair of elements and that provides a largest operation to return the larger of the two elements:

public class Duo<T extends Comparable<T>>
{
  protected T first;
  protected T second;

  public Duo(T f, T s)
  {
    first = f; second = s;
  }

  public T larger()
  {
    if (first.compareTo(second) > 0)
       return first;
    else
       return second;
  }
}

Note that in the class declaration instead of indicating the generic parameter as <T> we used <T extends Comparable <T>>. This indicates that the generic argument must be of a class that implements the Comparable interface. If we instantiate a Duo object using a class that does not implement Comparable, a syntax error is generated.

Although it appears we can use the above approach to solve our problem of assuring that only Comparable classes are used with our SortedArrayCollection classes, it turns out that is not the case. Because we wish to use an array as our internal representation structure, we cannot use this approach due to type safety issues and differences between the way arrays and generic types are treated by Java. Attempting to declare an array of <T extends Comparable<T>> results in a syntax error. So we need to find a different approach.

Rather than try to enforce the use of Comparable classes using syntactic means we will use our “programming by contract” approach. We specify as a precondition of the add method that its argument is comparable to the previous objects added to the collection. As the add method is the only way for an element to get into the collection, we are guaranteed that all elements held in our SortedArrayCollection can be compared to each other. If a client class ignores the contract then it is likely that an exception will be thrown when they use the class.

The Implementation

The SortedArrayCollection class implements the CollectionInterface using an array. For this class we are concerned about the order in which we keep elements stored—the array must be kept sorted. Nevertheless, the implementation of the SortedArrayCollection class has much in common with the implementation of the ArrayCollection class developed in Section 5.2 “Array-Based Collection Implementation,” so we do not provide complete code listings here. The code for the new class can be found in the ch05.collections package. A few notes follow.

  • As already mentioned, this implementation is unbounded, using a protected enlarge method as needed to increase the size of the internal representation array by the original capacity.

  • The class includes a protected find method that serves the same function as the find method of the ArrayCollection class—search the collection for an occurrence of an element that is equal to the target element, and if successful set the instance variable found to true and set the instance variable location to the index where the target was found. For this class, however, the find method makes use of the binary search algorithm to provide excellent efficiency. The recursive approach to the binary search algorithm described in Section 3.3, “Recursive Processing of Arrays,” is used.

  • During the search process the compareTo operation is used. Therefore, within the find method we must cast elements as Comparable, so that the Java compiler will accept our use of the compareTo method. Some compilers will generate a warning message regarding an unchecked call to compareTo, because there is no way for the compiler to verify that the generic type T will actually implement Comparable. Because we understand the reason for the warning message and because our precondition prohibits the use of elements that do not implement Comparable, we can safely ignore the compiler warning. An application that ignores the precondition and adds elements to the SortedArrayCollection that are not Comparable will cause a type mismatch exception to be thrown—which is exactly the result we desire because incomparable elements should not be added to a sorted collection.

  • Unlike with the ArrayCollection class we cannot simply add a new element in the next available location of the array. So that the add method can also make use of the find method, to determine where the new element should be inserted, the find method of the SortedArrayCollection class has additional functionality. In the case of an "unsuccessful" search, the find method sets the instance variable found to false and sets the instance variable location to the index where the target should be inserted.

  • Once the index where the addition of the new element is identified, the add method creates room for the new element by shifting all of the elements from that index through the end of the collection, one index higher, and then inserts the new element in the free slot that is created:

    The code for the add method is as follows:

    public boolean add(T element)
    // Precondition:  element is Comparable to previously added objects
    //
    // Adds element to this collection.
    {
      if (numElements == elements.length)
        enlarge();
    
      find(element); // sets location to index where element belongs
    
      for (int index = numElements; index > location; index--)
        elements [index] = elements[index - 1];
    
      elements [location] = element;
      numElements++;
      return true;
    }

    Note that since the SortedArrayCollection is unbounded the add method always returns true—it should always be able to add the given element successfully.

  • Similarly, the remove method shifts elements one index lower in order to remove the targeted element without disrupting the already sorted array.

Implementing ADTs “by Copy” or “by Reference”

When designing an ADT, such as for a stack, queue, or collection, we have a choice about how to handle the elements—“by copy” or “by reference.”

By Copy

With this approach, the ADT manipulates copies of the data used in the client program. When the ADT is presented with a data element to store, it makes a copy of the element and stores that copy. Making a valid copy of an object can be a complicated process, especially if the object is composed of other objects. Valid copies of an object are typically created using the object’s clone method. Classes that provide a clone method must indicate this fact to the run-time system by implementing the Cloneable interface. In the examples that follow, we assume the object classes provide rigorous clone methods and implement the Cloneable interface. In that case, code for an unsorted collection add operation might be

public void add (T element)
// Adds a copy of the element to this collection
{
  elements[numElements] = element.clone();
  numElements++;
}

In Java, if the collection elements are objects, then it is really a reference to a copy of the element that is stored—because all Java objects are manipulated by reference. The key distinction here is that it is a reference to a copy of the element, and not a reference to the element itself, that is stored.

Similarly, when an ADT returns an element using the “by copy” approach, it actually returns a reference to a copy of the element. As an example, consider the code for a collection get operation:

public T get(T target)
// Returns a copy of element e from this collection such that e.equals(target);
// if no such element exists, returns null.
{
  find(target);
  if (found)
    return elements[location].clone();
  else
    return null;
}

This approach provides strong information hiding. In effect, the ADT is providing a separate repository for a copy of the client’s data.

By Reference

For this approach, an ADT manipulates references to the actual elements passed to it by the client program. For example, code for an unsorted collection add operation might be

public void add (T element)
// Adds an element to this collection
{
  elements[numElements] = element;
  numElements++;
}

Because the client program retains a reference to the element, we have exposed the contents of the collection ADT to the client program. The ADT still hides the way the data is organized—for example, the use of an array of objects—but it allows direct access to the individual elements of the collection by the client program through the client program’s own references. In effect, the ADT provides an organization for the original client data.

The “by reference” method is the most commonly used approach and the one we use throughout this text. It has the advantage that it takes less time and space than the “by copy” method. Copying objects takes time, especially if the objects are large and require complicated deep-copying methods. Storing extra copies of objects also requires extra memory. Thus, the “by reference” approach is an attractive strategy.

When we use the “by reference” approach, we create aliases of our elements, so we must deal with the potential problems associated with aliases. If our data elements are immutable, then no problems will occur. If the elements can be changed, however, problems can arise.

If an element is accessed and changed through one alias, it could disrupt the status of the element when it is accessed through the other alias. This situation is especially dangerous if the client program can change an attribute that the ADT uses to determine the internal organization of the elements. For example, if the client changes an attribute that determines an object’s position within a sorted collection, then the object may no longer be in the proper place. Because the change did not go through a method of the sorted collection class, the class has no way of knowing that it should correct this situation. A subsequent get operation on this collection would likely fail.

An Example

The diagrams in Figure 5.2 show the ramifications of both approaches. Suppose we have objects that hold a person’s name and weight. Further suppose that we have a collection of these objects sorted by the variable weight. We add three objects into the collection, and then transform one of the objects with a diet method that changes the weight of the object. The left side of the figure models the approach of storing references to copies of the objects—the “by copy” approach. The right side models the approach of storing references to the original objects—the “by reference” approach.

The middle section of the figure, showing the state of things after the objects have been inserted into the collections, clearly demonstrates the differences in the internal implementations. The “by copy” approach creates copies and the collection elements reference them; these copies take up space that is not required in the “by reference” approach. It is also clear from the right side of the figure that the “by reference” approach creates aliases for the objects, as we can see more than one reference to an object. In both approaches, the collection elements are kept sorted by weight.

The situation becomes more interesting when we modify one of the objects. When the person represented by the S1 object loses some weight, the diet method is invoked to decrease the weight of the object. In this scenario, both approaches display problems. In the “by copy” approach, we see that the S1 object has been updated. The copy of the S1 object maintained in the collection is clearly out of date. It holds the old weight value. A programmer must remember that such a collection stores only the values of objects as they existed at the time of the add operation and that changes to those objects are not reflected in the objects stored on the collection. The programmer must design the code to update the collection, if appropriate.

In the “by reference” approach, the object referred to by the collection contains the up-to-date weight information, because it is the same object referred to by the S1 variable. The collection, however, is no longer sorted by the weight attribute. Because the update to weight took place without any collection activity, the collection objects remain in the same order as before. The collection structure is now corrupt, and calls to the collection methods may behave unpredictably. Instead of directly updating the S1 object, the program should have removed the object from the collection, updated the object, and then reinserted it into the collection.

Figure 5.2 Store “by copy” versus store “by reference”

Summation

Which approach is better? That depends. If processing time and space are issues, and if we are comfortable counting on the application programs to behave properly, then the “by reference” approach is probably the best choice. If we are not overly concerned about time and space usage (maybe our collection objects are not too large), but we are concerned with maintaining careful control over the access to and integrity of our collections, then the “by copy” approach is probably the best choice. The suitability of either approach depends on how the collection is used.

Sample Application

To provide evidence that our new collection implementation works correctly we return to the vocabulary density problem introduced in Section 5.3, “Application: Vocabulary Density.” The application developed in that section used the ArrayCollection class:

CollectionInterface<String> words =
                             new ArrayCollection<String>(CAPACITY);

The only thing we need to do, to use our new implementation, is to change the above line to use the SortedArrayCollection class:

CollectionInterface<String> words =
                             new SortedArrayCollection<String>(CAPACITY);

To compare the two approaches in terms of time efficiency we used a suite of text files, most obtained from the Project Gutenberg website, as input to both versions of the application. We timed the programs using the currentTimeMillis method of the Java Library System class. Although this approach to measuring execution time is not extremely precise, it is good enough for our purposes here.

See Table 5.1 for the results of our experiment. The text files are presented in the table from smallest to largest. The “Linux Word File” is an alphabetical list of words used by the Linux Operating System’s spell-checker program. It has the lowest density, essentially 1 (due to some words, for example “unix,” being repeated in the file with different capitalization schemes, the number of words and number of unique words are not exactly the same). Other than that special case, the density tends to increase as the size of the file increases, as is expected. The “Mashup” file is a concatenation of 121 files from the Gutenberg site consisting of all the other files on the list (except the Linux Word File) plus other novels, “complete works,” dictionaries, and technical manuals including Spanish, French, and German language texts in addition to the English.

Before running the program on a particular text file we made sure that the instantiated collection would be large enough to hold all of the unique words in the file—this way we guaranteed that the bounded ArrayCollection was able to handle the problem and we did not “penalize” the unbounded SortedArrayCollection for time spent enlarging its internal storage.

Table 5.1 Results of Vocabulary Density Experiment1

Text File Size Results Array-Collection Sorted-Array-Collection
Shakespeare’s 18th Sonnet 1 KB words:
unique:
density:
114
83
1.37
20 msecs 23 msecs
Shakespeare’s Hamlet 177 KB words:
unique:
density:
32,247
4,790
6.73
236 msecs 128 msecs
Linux Word File 400 KB words:
unique:
density:
45,404
45,371
1.00
9,100 msecs 182 msecs
Melville’s Moby-Dick 1,227 KB words:
unique:
density:
216,113
17,497
12.35
2,278 msecs
or
2.3 seconds
382 msecs
The Complete Works of William Shakespeare 5,542 KB words:
unique:
density:
900,271
26,961
33.39
9.7 seconds 1.2 seconds
Webster’s Unabridged Dictionary 28,278 KB words:
unique:
density:
4,669,130
206,981
22.56
4.7 minutes 9.5 seconds
11th Edition of the Encyclopaedia Britannica 291,644 KB words:
unique:
density:
47,611,399
695,531
68.45
56.4 minutes 2.5 minutes
Mashup 608,274 KB words:
unique:
density:
102,635,256
1,202,099
85.38
10 hours 7.2 minutes

The benefits of the sorted array approach are clear from the table. As the size of the input increases, the fast performance of the binary search used by the SortedArrayCollection as compared to the sequential search used by the ArrayCollection become obvious. This is most evident in the comparison of times for the large Mashup file, with the array-based approach requiring over 10 hours and the sorted array approach taking only 7 minutes, on the author’s desktop computer.

5.6 Link-Based Collection Implementation

In this section we design a link-based implementation of our Collection ADT. The internal representation structure will be an unsorted linked list. You will discover that most of the detailed work is already finished as we will be able to reuse code from previous ADT implementations. Throughout the discussion we compare and contrast the implementation of the operations within the previously defined ArrayCollection class and the LinkedCollection class currently under consideration. To study the code details not shown here please see the LinkedCollection class in the ch05.collections package.

The Internal Representation

As we did for our link-based stacks and queues, we again use the LLNode class from the support package to provide our nodes. The information attribute of a node contains the collection element; the link attribute contains a reference to the node holding the next collection element. We maintain a variable, head of type LLNode<T> that references the first node in the linked list. This linked list acts as the internal representation for the collection. It is an unsorted linked list. The start of the class is

package ch05.collections;
import support.LLNode;

public class LinkedCollection<T> implements CollectionInterface<T>
{
  protected LLNode<T> head;        // head of the linked list
  protected int numElements = 0;   // number of elements in this collection

  // set by find method
  protected boolean found;         // true if target found, else false
  protected LLNode<T> location;    // node containing target, if found
  protected LLNode<T> previous;    // node preceding location

  public LinkedCollection()
  {
    numElements = 0;
    head = null;
  }

There is only one constructor provided for LinkedCollection. There is no need to deal with capacity as there was with the array-based implementations. The constructor sets the instance variables numElements and head, essentially constructing an empty collection.

The Operations

Why do we need a numElements instance variable? Recall that in the array-based approach, the numElements variable was used to indicate the first empty array location. Because we are now using a linked list, we do not have array locations. Nevertheless, to support the size operation we still maintain numElements, incrementing it whenever an element is added to the collection and decrementing it whenever an element is removed. The size method simply returns the value of numElements just as it does in ArrayCollection. The isEmpty method also remains unchanged from the array-based approach and the isFull method just trivially returns false because our linked structure is never full.

The add method is passed an argument of class T, and adds it to the collection. Because the internal linked list is unsorted, and order of the elements is not important, we can just add new elements to the front of the linked list. This is the easiest, most efficient approach, because head already provides a reference to this location. The code for the add method is very similar to that of the push method in the LinkedStack class developed in Section 2.8, “A Link-Based Stack.” Note that since our LinkedCollection is unbounded, the add method always returns the value true—it should always be able to add the given element successfully.

public boolean add(T element)
// Adds element to this collection.
{
  LLNode<T> newNode = new LLNode<T>(element);
  newNode.setLink(head);
  head = newNode;
  numElements++;
  return true;
}

As we did for both the unsorted and sorted array-based implementation approaches, we create a protected helper method for our linked approach and call it find. Recall that this method sets the values of the instance variables found and location, which can then be used by the other methods. Given that the find method works correctly, coding both the contains and get methods become easy—in fact they are exactly the same as their counterparts in the array-based implementations:

public boolean contains (T target)      public T get(T target)
{                                       {
  find(target);                           find(target);
  return found;                           if (found)
}                                           return location.getInfo();
                                          else
                                            return null;
                                        }

The protected find method is also used by the remove method—we discuss remove below. For now, let us consider find in more detail. It follows the same algorithm as the find method of the array-based implementation: Walk through the internal representation of elements until the target element is found or the end of the representation is reached. The only difference is the use of linked list statements instead of array-related statements. Here is the code for both methods, for comparison. In the array-based approach, location is of type int, indicating the array index of the target element; in the link-based approach, location is an LLNode, indicating the node containing the target element.

Array-Based
protected void find(T target)
{
  location = 0;
  found = false;
  while (location < numElements)
  {
    if (elements[location].equals(target))
    {
      found = true;
      return;
    }
    else
      location++;
  }
}
Link-Based
protected void find(T target)
{
  location = head;
  found = false;
  while (location != null)
  {
    if (location.getInfo().equals(target))
     {
      found = true;
      return;
    }
    else
    {
      previous = location;
      location = location.getLink();
    }
  }
}

Figure 5.3 Removing element at location from a linked collection

Actually, we lied. There is another difference between the two find implementations. Do you see it? The link-based implementation assigns a value to a variable named previous that is not mentioned in the array-based implementation. This variable is used by the LinkedCollection remove method. We will take a closer look now at this method.

To remove the target, we must first find it. We do so using the find method, which sets the location variable to indicate the target element. As shown in Figure 5.3, however, to actually remove the node referenced by location, we must change the reference in the previous node. That is, we must change the link of the previous node to reference the node following the one being removed. We “jump over” the removed node. This is where we use the previous variable.

Not only does find set location, but it also sets previous so that the remove method has access to the previous node and can implement the “jump over” step. This is accomplished by remove with the statement:

previous.setLink(location.getLink());

Removing the first node must be treated as a special case because the main reference to the linked list (head) must be changed. We handle that special case with an if statement at the beginning of the code for remove. Is removing the last node a special case that requires additional code? No. The link within the last node is null. Therefore, in the case of removing the last node the above statement correctly sets the value of the link of the previous node to null, indicating that it is now the end of the list. Here is remove:

public boolean remove (T target)
// Removes an element e from this collection such that e.equals(target)
// and returns true; if no such element exists, returns false.
{
  find(target);
  if (found)
  {
    if (head == location)
      head = head.getLink();    // remove first node
    else
      previous.setLink(location.getLink());  // remove node at location

    numElements--;
  }
  return found;
}

Comparing Collection Implementations

We have now looked at several different implementations of the Collection ADT. How do they compare? Let us briefly review each implementation—a summary of this discussion is found in Table 5.2.

All three implementation approaches, the ArrayCollection, SortedArrayCollection, and LinkedCollection, maintain an instance variable numElements that holds the current size of the collection. This allows simple O(1) implementations of the size, isEmpty, and isFull operations.

Our ArrayCollection implementation is a bounded implementation. It is the responsibility of the client to indicate the capacity of an ArrayCollection object at instantiation. If an add operation is invoked on a full collection the add operation returns false and there is no change to the collection. A client must therefore ensure the initial capacity is large enough to handle the problem they are addressing. Being an array-based implementation the constructor is an O(N) operation, where N indicates the capacity—for any particular collection object we only execute the constructor once. The array in this implementation is unsorted—meaning the add operation is trivial and is O(1), whereas to locate a target element in the array for the contains, get, and remove operations requires a O(N) sequential search.

Table 5.2 Comparison of Collection Implementations

Storage Structure ArrayCollection Unsorted array SortedArrayCollection Sorted array LinkedCollection Unsorted linked list
Space Bounded by original capacity Unbounded—invokes enlarge method as needed Unbounded—grows and shrinks
Class constructor O(N) O(N) O(1)
size
isEmpty
isFull
O(1) O(1) O(1)
contains O(N) O(log2N) O(N)
get O(N) O(log2N) O(N)
add O(1) O(N) O(1)
remove O(N) O(N) O(N)

The SortedArrayCollection is also an array-based implementation; however, in this case we decided to make it an unbounded array-based implementation. If an add operation is invoked when the array is full then additional space is created using the enlarge method. This approach relieves the client of the responsibility of predetermining the needed capacity. Keeping the internal array sorted requires a more complex add method with a cost of O(N). However, it also means that the cost of locating a target element is now O(log2N) as we can use the binary search algorithm. This significantly lowers the time cost of both contains and get.

Capacity is not a concern for the LinkedCollection. In this unbounded implementation the amount of space used to hold the collection grows and shrinks along with the collection. As we pointed out when discussing linked implementations previously, we do require two reference variables for each element in the collection—one to hold the link and one to reference the collection element. The constructor for the LinkedCollection simply initializes the instance variables. Other than the constructor, the efficiency of the unsorted linked list approach is the same as that of the unsorted array approach. For both approaches it makes no difference where we add an element so the add method is fast. On the other hand, searching for a target element requires traversing the collection element by element.

5.7 Collection Variations

If you reflect on our Collection ADT you will realize it offers simple but crucial functionality–the ability to store and retrieve information. This functionality sits at the heart of information processing. Data structures, file systems, memory/storage, databases, the Cloud, the Internet all involve, at their core, storing and retrieving information.

Given the central importance of collections, this section about variations could go on and on! The lists, search trees, maps, hash tables, and priority queues we study in the upcoming chapters can all be considered forms of collections. Even the stacks and queues that we covered previously are collections, albeit collections with specific rules about addition and removal of elements. We will first briefly review the role of collections within the Java library. We then take a look at two commonly used collections not covered elsewhere in the text—bags and sets.

The Java Collections Framework

It is not surprising that the Java platform includes a "Collections Framework"—a unified architecture of classes and interfaces that provides powerful, flexible tools for storing and retrieving information. At the center of the framework is the Collection interface, found in the java.util package of the library. This interface supports 11 subinterfaces including Deque, List, and Set and has 33 implementing classes including the somewhat familiar ArrayList, LinkedList, Stack, and PriorityQueue and exotic-sounding classes such as BeanContextServicesSupport and RoleUnresolvedList.

There are 15 abstract methods defined within the Java Collection interface, including methods that mirror our isEmpty, size, add, remove, and contains methods. Note, however, that the interface does not define anything analogous to our get method. So how is one supposed to retrieve information from a collection? The answer is that the designers of the Collections Framework do not intend for anyone to directly implement the Collection interface. Instead it is used as the root of an inheritance tree of interfaces as previously mentioned, and it is those interfaces that are implemented. Various methods similar to our get method are defined within those interfaces.

It is instructive to look at some of the additional method definitions from the Collection interface:

toArray Returns an array containing all of the elements of the collection
clear Removes all elements
equals Takes an Object argument, returning true if it is equal to the current collection and false otherwise
addAll Takes a Collection argument and adds its contents to the current collection; returns a boolean indicating success or failure
retainAll Takes a Collection argument and removes any elements from the current collection that are not in the argument collection; returns a boolean indicating success or failure
removeAll Takes a Collection argument and removes any elements from the current collection that are also in the argument collection

 

Methods such as equals, addAll, retainAll, and removeAll allow clients to use collections as more than just separate repositories—collections can be combined and compared. The Set ADT we define below is a collection intended to be used in that fashion.

As mentioned in Chapter 2, in this text we do not cover the Collections Framework in great detail. This text is meant to teach you about the fundamental nature of data structures and to demonstrate how to define, implement, and use them. It is not an exploration of how to use Java’s specific library architecture of similar structures. If you are interested in learning more about the Java Collections Framework, you can study the extensive documentation available at Oracle’s website.

The Bag ADT

We want to define an ADT that fits our everyday notion of a bag—a bag of marbles, a bag of groceries, a bag of money! What assumptions should we make about our bag? What operations should we include?

Think about a bag of marbles. Suppose marbles are identified by their color. It certainly is possible to place more than one red marble into our bag—we decide that our Bag ADT should allow duplicate elements.

We can put marbles into the bag, look into the bag to see if it contains a particular marble (“is there a green marble in there?”) and reach in and remove a specific marble. These activities correspond to our Collection ADT’s add, contains, and remove operations. We can check to see if the bag is full or is empty and we could look inside and count how many marbles it contains—our isFull, isEmpty, and size operations. With a bit of a stretch of imagination we can think about reaching into the bag and pointing to a marble (which is essentially what our get operation does). In short, all of the operations we defined for our Collection ADT also make sense for our Bag ADT.

What other operations can we include? We could close our eyes and reach into the bag and draw out a random marble (“I hope I get the blue one”). We could peek into the bag and count all marbles of a specified color (“How many red ones are there?”) or remove all marbles of a specified color (“I don’t like the purple ones”). We could turn the bag upside down and empty it (“Watch out, here they come!”). We will add grab, count, removeAll, and clear to our Bag operations. Their formal definitions are included in our BagInterface listed below. As with our Collection ADT, our Bag ADT requires a valid equals method to be defined for the objects that it contains.

//---------------------------------------------------------------------------
// BagInterface.java           by Dale/Joyce/Weems                  Chapter 5
//
// Interface for a class that implements a bag of T.
// A bag is a collection that supports a few extra operations.
//---------------------------------------------------------------------------
package ch05.collections;

public interface BagInterface<T> extends CollectionInterface<T>
{
  T grab();
  // If this bag is not empty, removes and returns a random element of the
  // bag; otherwise returns null.

  int count(T target);
  // Returns a count of all elements e in this collection such that
  // e.equals(target).

  int removeAll(T target);
  // Removes all elements e from this collection such that e.equals(target)
  // and returns the number of elements removed.

  void clear();
  // Empties this bag so that it contains zero elements.
}

As you can see, the BagInterface is located in the ch05.collections package. The interface extends our CollectionInterface. Any class that implements BagInterface must provide concrete methods for the add, get, contains, remove, isFull, isEmpty, and size operations of the CollectionInterface in addition to the grab, count, removeAll, and clear operations that are explicitly listed in the interface.

The Bag ADT is suited for use by applications that need to be able to count specific elements in a collection or remove all of a specific element from a collection—perhaps applications that manage inventories or allow distribution of resources. Additionally, the unique grab operation allows the Bag ADT to be used for applications that require randomness, for example, games or tutorial programs. Implementation of the Bag ADT is left as an exercise.

The Set ADT

A feature of each of our collection ADTs so far is that they allow duplicate elements. If we change our approach, that is, if we disallow duplicate elements, we have a collection commonly known as a Set. The Set ADT models the mathematical set that is typically defined as a collection of distinct objects.

We can implement a Set class by copying and changing the code from one of our collection implementations—the only method we need to change is the add method. The new add method could be designed to check if the element argument is not already in the collection, and if not it would add the element and return true. Otherwise, of course, it returns false.

Instead of copying the code from one of our implementations is there another way we can reuse it? Yes, in fact, there are two common approaches. We can extend the implementation class or we can wrap an object of the class within the new implementation. Let us investigate each approach.

The BasicSet1 class below extends the LinkedCollection. Therefore, it inherits all of the methods of the LinkedCollection. To ensure that duplicate elements are not entered into the collection we redefine the add method as discussed above. Applications that instantiate a BasicSet1 object and use it to invoke the add method will use the safe add method, that prevents duplicate elements from being added, rather than the overridden method from the LinkedCollection class, which allows duplicates.

//---------------------------------------------------------------------------
// BasicSet1.java              by Dale/Joyce/Weems                  Chapter 5
//
// Implements the CollectionInterface by extending LinkedCollection.
// Overrides add method so that duplicate elements are not added.
//
// Null elements are not allowed.
// One constructor is provided, one that creates an empty collection.
//---------------------------------------------------------------------------
package ch05.collections;

public class BasicSet1<T> extends LinkedCollection<T>
                          implements CollectionInterface<T>
{
  public BasicSet1()
  {
    super();
  }

  @Override
  public boolean add(T element)
  // If element is not already contained in this collection adds element to
  // this collection and returns true; otherwise returns false.
  {
    if (!this.contains(element))
    {
      super.add(element);
      return true;
    }
    else
      return false;
  }
}

As you can see, the BasicSet1 class code is short and was easy to create owing to the effort already expended in the design and creation of the LinkedCollection class.

A drawback of the above approach is that future changes to the LinkedCollection implementation could invalidate the unique element provision of its BasicSet1 subclass. For example, suppose the LinkedCollection was enhanced with the following method:

public void addAll(LinkedCollection elements)
// Adds all the contents of the elements collection to this collection.

Because BasicSet1 inherits from LinkedCollection, this new method would be available to BasicSet1 objects. If the maintenance programmer does not update BasicSet1 to override addAll, then the integrity of BasicSet1 would be compromised as it is possible that addAll could be used to add duplicate elements.

Let us look at another reuse approach that does not suffer from this drawback. Rather than extending LinkedCollection we can use a LinkedCollection object inside our new basic set implementation. This approach is sometimes called “wrapping” because, similar to the Java wrapper classes such as Integer, we hold an object of one type within an object of another type and delegate calls to it. Here is the code:

//---------------------------------------------------------------------------
// BasicSet2.java             by Dale/Joyce/Weems                   Chapter 5
//
// Implements the CollectionInterface by wrapping a LinkedCollection.
// Ensures that duplicate elements are not added.
//
// Null elements are not allowed.
// One constructor is provided, one that creates an empty collection.
//---------------------------------------------------------------------------
package ch05.collections;

public class BasicSet2<T> implements CollectionInterface<T>
{
  LinkedCollection<T> set;

  public BasicSet2()
  {
    set = new LinkedCollection<T>();
  }
  public boolean add(T element)
  // If element is not already contained in this collection adds element to
  // this collection and returns true; otherwise returns false.
  {
    if (!this.contains(element))
    {
      set.add(element);
      return true;
    }
    else
      return false;
  }

  public int size(){return set.size();}
  public boolean contains (T target){return set.contains(target);}
  public boolean remove (T target){return set.remove(target);}
  public T get(T target){return set.get(target);}
  public boolean isEmpty(){return set.isEmpty();}
  public boolean isFull(){return set.isFull();}
}

The only instance variable in the BasicSet2 class is the variable set of type LinkedCollection. In the case of every method provided, except for the add method, the method just invokes the corresponding method of LinkedCollection through the set variable. In the case of add the corresponding LinkedCollection add is also invoked but only if it is safe to do so. Although BasicSet2 is longer than BasicSet1, it does not require more work to create since the only method requiring extra code is the add method—as was also the case for BasicSet1. Furthermore, BasicSet2 does not suffer from the maintenance issue discussed above for BasicSet1. Our BasicSet2 class is a good example of a hierarchical implementation. It is built on top of the LinkedCollection class that in turn uses the LLNode class.

It is possible to define a more complex Set ADT, one that supports the set operations we learned in mathematics classes such as union and intersection. Exercise 36 asks the reader to define and implement such an ADT.

Summary

We introduced the concept of a collection—an object that collects together other objects for later retrieval. The Collection ADT acts as the basis for many other structures. Following our standard approach, we defined the ADT using the Java interface construct.

Three separate implementations of the Collection ADT were developed—an unsorted array, a sorted array, and a linked list. Figure 5.4 shows the relationships among the primary classes and interfaces we created to support this ADT. The diagram follows the same conventions we listed on page 146 for the Stack ADT diagram.

All collection implementations require checking elements for equality, and the sorted array implementation requires the ability to determine the relative order of elements. Therefore, we reviewed comparison of objects, considering both the equals and compareTo methods.

The primary application developed in this chapter to demonstrate the usefulness of the Collection ADT was a vocabulary density calculator. Using both the unsorted and sorted array-based implementations of the collection allowed us to demonstrate the difference in efficiency between the two approaches when analyzing large text files.

In Section 5.7, “Collection Variations,” we reviewed the Java Library Collections Framework and defined two specific collection variants: the Bag and the Set. Implementations for a basic Set ADT using two approaches were developed, each dependent on reusing the LinkedCollection class. One approach demonstrated reuse through inheritance and the other demonstrated reuse by wrapping.

Figure 5.4 The primary classes and interfaces created to support the Collection ADT

Exercises

5.1 The Collection Interface

  1. For each of the methods declared in CollectionInterface, identify what type of operation it is (observer, transformer, or both).

  2. CollectionInterface defines seven abstract methods. Define at least three more operations that might be useful for a Collection ADT to provide.

  3. Identify which attributes would make a good key for determining equality of objects for each of the following classes and explain your reasoning:

    1. A Golfer class that supports finding the latest score of your favorite golfer.

    2. A Student class for use by a university accounting office that includes name, ID number, and payment attributes.

    3. A Rectangle class with attributes length, width, area, and perimeter.

    4. A Book class for an online store that includes book name, author name, edition number, cost, and number of pages attributes.

  4. Discuss the ramification of dropping the “null elements are not allowed” general precondition of CollectionInterface.

  5. Discuss the difference between an operation that pushes an element onto a full (bounded) stack as defined by our StackInterface (Section 2.4, “The Stack Interface”) and one that adds an element to a full collection.

5.2 Array-Based Collection Implementation

  1. Show the values contained in the instance variables elements and numElements of the sample collection after the following sequence of operations:

    ArrayCollection<String> sample = new ArrayCollection<String>;
    sample.add("A"); sample.add("B"); sample.add("C"); sample.add("D");
    sample.remove("B");
  2. Describe the ramifications of each of the following changes to ArrayCollection:

    1. the first three lines of the add method are deleted

    2. the numElements++ statement is deleted from the add method

    3. in the find method “protected” is changed to “public”

    4. in the find method “<” is changed to “<=”

    5. the elements[numElements – 1] = null statement is deleted from the remove method

    6. the “==” in the isEmpty method is changed to “=

    7. in the get method, “found” is changed to “!found

  3. Add the following methods to the ArrayCollection class, and create a test driver for each to show that they work correctly. In order to practice your array coding skills, code each of these methods by accessing the internal variables of the ArrayCollection, not by calling the previously defined public methods of the class.

    1. String toString() creates and returns a string that correctly represents the current collection. Such a method could prove useful for testing and debugging the class and for testing and debugging applications that use the class. Assume each stored element already provides its own reasonable toString method.

    2. int count(T target) returns a count of the number of elements e in the collection such that e.equals(target) is true.

    3. void removeAll(T target) removes all elements e from the collection such that e.equals(target) is true.

    4. ArrayCollection<T> combine(ArrayCollection<T> other) creates and returns a new ArrayCollection object that is a combination of this object and the argument object.

  4. What would be the result of using the ArrayCollection class to hold objects of a class that has not overridden the equals method of the Object class?

5.3 Application: Vocabulary Density

  1. The Vocabulary Density application allows you to analyze text, reporting on the total number of words, the number of unique words, and the vocabulary density. Use it to analyze one or more texts of your choice. Create a report on the results. This is a wide open exercise—we leave it up to you to figure out something interesting to analyze.

  2. Study the Vocabulary Density application and answer the following questions:

    1. How is the name and location of the input file provided to the program?

    2. What happens if the input text contains more than CAPACITY unique words?

    3. What would be the effect of removing the ’ (apostrophe) from the delimiters definition?

    4. What is the effect of removing the “word = word.toLowerCase()” statement?

  3. Revise the Vocabulary Density application so that

    1. if the collection becomes full, then the application no longer continues processing—instead it displays a suitable message and then ends.

    2. it includes a constant THRESHOLD of type int and ignores words whose length is less than the threshold value—thus the word analysis would not include “short” words.

    3. it permits multiple filenames to be passed as command line arguments, and will proceed to separately analyze and report on each of the files.

    4. expand on the previous revision so that in addition to the separate analysis of the files it also performs a combined analysis, as if the combined files all represented a single text.

  4. The file Animals.txt found in the input folder contains a long list of animal names, one per line. Create an application that reads that file and creates a collection of animal names. Use the ArrayCollection class. Your application should then generate a random character and challenge the user to repeatedly enter an animal name that begins with that character, reading the names entered by the user until they either enter a name that does not begin with the required character or is not in the collection, or they enter a name they used before. Finally, your application reports how many names they successfully entered.

  5. The file Keywords.txt found in the input folder contains all the Java keywords. Create an application that accepts the name of a Java program file as a command line argument and displays a count of the total number of keywords the program contains. For example, if you use the VocabularyDensity.java program as your input, the application should display

    
    VocabularyDensity.java contains 24 Java keywords

    As part of your solution you should create a collection of keywords using the information in the Keywords.txt file. Do not worry about the fact that you might be counting keywords contained within comments or strings.

5.4 Comparing Objects Revisited

  1. Based on the equals method for Circle objects defined in Section 5.4, “Comparing Objects Revisited,” what is the output of the following code sequence?

    Circle c1 = new Circle(5);
    Circle c2 = new Circle(5);
    Circle c3 = new Circle(15);
    Circle c4 = null;
    System.out.println(c1 == c1);
    System.out.println(c1 == c2);
    System.out.println(c1 == c3);
    System.out.println(c1 == c4);
    System.out.println(c1.equals(c1));
    System.out.println(c1.equals(c2));
    System.out.println(c1.equals(c3));
    System.out.println(c1.equals(c4));
  2. An equals method is supposed to provide an equivalence relation among the objects of a class. This means that if a, b, and c are non-null objects of the class, then

    1. a.equals(a) is true.

    2. a.equals(b) has the same value as b.equals(a).

    3. If a.equals(b) is true and b.equals(c) is true, then a.equals(c) is true.

    State whether the following definitions of equals are valid. If they are not, explain why not.

    1. Two circles are equal if they have the same area.

    2. Two circles are equal if their radii are within 10% of each other.

    3. Two integers are equal if they have the same remainder when divided by a specific integer, for example, when divided by 3.

    4. Two integers are equal if the second integer is a multiple of the first.

  3. Suppose we have a Rectangle class that includes length and width attributes, of type int, both set by the constructor. Define an equals method for this class so that two rectangle objects are considered equal if

    1. They have the exact same length and width.

    2. They have the same dimensions—that is, they are congruent.

    3. They have the same shape—that is, they are similar.

    4. They have the same perimeter.

    5. They have the same area.

  4. Which of the definitions of equals listed in Exercise 17 is valid, based on the criteria listed in Exercise 16?

  5. Based on the compareTo method for FamousPerson objects defined in Section 5.4, “Comparing Objects Revisited,” what is the output of the following code sequence? Answers may include phrases such as “a negative integer.”

    FamousPerson f1 = new FamousPerson("Peter","Parker",1962,"Spiderman");
    FamousPerson f2 = new FamousPerson("Bonnie","Parker",1910,"Criminal");
    FamousPerson f3 = new FamousPerson("Clark","Kent",1938,"Superman");
    FamousPerson f4 = new FamousPerson("Clark","Kent",1938,"Reporter");
    System.out.println(f1.compareTo(f1));
    System.out.println(f1.compareTo(f2));
    System.out.println(f2.compareTo(f1));
    System.out.println(f1.compareTo(f3));
    System.out.println(f3.compareTo(f4));
    System.out.println(f4.compareTo(f3));
  6. In Exercise 16 we stated some rules for the behavior of the equals method. What similar rule or rules should the compareTo method follow?

  7. Suppose we have a Rectangle class that includes length and width attributes of type int, both set by the constructor. Create a compareTo method for this class so that rectangle objects are ordered based on their

    1. Perimeter.

    2. Area.

  8. How many “known” classes in the Java library implement the Comparable interface? List five such classes that you have used before, or at least have studied.

5.5 Sorted Array-Based Collection Implementation

  1. Show the values contained in the instance variables elements and numElements of the sample collection after the following sequence of operations:

    SortedArrayCollection<String> sample
                = new SortedArrayCollection<String>;
    sample.add("A"); sample.add("D"); sample.add("B"); sample.add("C");
    sample.remove("B");
  2. Fill in the following table with the order of growth of the indicated operations for the given implementation approach—assume the collection holds N elements:

    operation ArrayCollection SortedArrayCollection add
    get
    contains
    remove
    isEmpty
  3. Describe the ramifications of each of the following changes to SortedArrayCollection:

    1. in the enlarge method "origCap" is changed to "DEFCAP"

    2. the first two statements are deleted from the find method

    3. the if (result > 0) location++ statement is deleted from the recFind method

  4. Add the following methods to the SortedArrayCollection class, and create a test driver for each to show that they work correctly. Code each of these methods by accessing the internal variables of the SortedArrayCollection, not by calling the previously defined methods of the class.

    1. String toString() creates and returns a string that correctly represents the current collection. Such a method could prove useful for testing and debugging the class and for testing and debugging applications that use the class. Assume each stored element already provides its own reasonable toString method.

    2. T smallest() returns null if the collection is empty, otherwise returns the smallest element of the collection.

    3. int greater(T element) returns a count of the number of elements e in the collection that are greater than element, that is such that e.compareTo (element) is > 0.

    4. SortedArrayCollection<T> combine(SortedArrayCollection<T> other) creates and returns a new SortedArrayCollection object that is a combination of this object and the argument object.

  5. Describe the functional difference between the following two sections of code:

    CollectionInterface<String> c = new ArrayCollection<String>(10);
    c.add("Tom"); c.add("Julie"); c.add("Molly");
    System.out.println(c.contains("Kathy");
    
    CollectionInterface<String> c = new SortedArrayCollection<String>(10);
    c.add("Tom"); c.add("Julie"); c.add("Molly");
    System.out.println(c.contains("Kathy");

5.6 Link-Based Collection Implementation

  1. Show the values contained in the instance variables head and numElements of the sample collection after the following sequence of operations:

    LinkedCollection<String> sample = new LinkedCollection<String>;
    sample.add("A"); sample.add("B"); sample.add("C"); sample.add("D");
    sample.remove("B");
  2. Describe the ramifications of each of the following changes to LinkedCollection:

    1. the statement newNode.setLink(head) is deleted from the add method

    2. the statement location = location.getLink() is deleted from the find method

    3. the statement head = head.getLink() within the remove method is changed to head = location.getLink()

  3. Add the following methods to the LinkedCollection class, and create a test driver for each to show that they work correctly. Code each of these methods by accessing the internal variables of the LinkedCollection, not by calling the previously defined methods of the class.

    1. String toString() creates and returns a string that correctly represents the current collection. Such a method could prove useful for testing and debugging the class and for testing and debugging applications that use the class. Assume each stored element already provides its own reasonable toString method.

    2. int count(T target) returns a count of the number of elements e in the collection such that e.equals(target) is true.

    3. void removeAll(T target) removes all elements e from the collection such that e.equals(target) is true.

    4. LinkedCollection<T> combine(LinkedCollection<T> other) creates and returns a new SortedArrayCollection object that is a combination of this object and the argument object.

  4. Create a new collection class named SortedLinkedCollection that implements a collection using a sorted linked list. Include a toString method as described in Exercise 30a. Include a test driver application that demonstrates your class works correctly.

  5. Recreate the contents of Table 5.2 (no peeking).

5.7 Collection Variations

  1. Create a class that implements the BagInterface, plus a test driver that shows that your class operates correctly:

    1. Using an unsorted array (if you like, you can extend the ArrayCollection class)

    2. Using a sorted array (if you like, you can extend the SortedArrayCollection class)

    3. Using a linked list (if you like, you can extend the LinkedCollection class)

  2. Create an application that creates a Bag of FamousPerson using one of your implementation classes from Exercise 33. Your application should use the information in the input/FamousCS.txt file (see Section 5.4, “Comparing Objects Revisited”) to create the Bag. Next your application will repeat the following five times:

    • selects a random “famous person” from the bag

    • tells the user the year of birth and the fact about the person

    • asks the user to enter the last name of the famous person and reads their response

    • display all the information about the famous person

    Finally, your application tells the user how many they “got right” out of the five chances, that is, how many famous people they correctly identified.

  3. Describe the three approaches discussed in the section to reuse the LinkedCollection class in the creation of a Basic Set ADT implementation.

  4. An Advanced Set includes all the operations of a Basic Set plus operations for the union, intersection, and difference of sets.

    1. Define an Advanced Set interface.

    2. Implement the Advanced Set using an unsorted array; include a test driver that demonstrates your implementation works correctly.

    3. Implement the Advanced Set using a sorted array; include a test driver that demonstrates your implementation works correctly.

    4. Implement the Advanced Set using a linked list; include a test driver that demonstrates your implementation works correctly.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.133.124.21