© Ake13bk/Shutterstock
This chapter introduces one of the most fundamental ADTs, the Collection ADT. This ADT provides the ability to store information and then access the stored information. As you can imagine, storing and retrieving information is often at the heart of computer processing. There are other ADTs that provide these same operations, perhaps with certain restrictions on the information stored or the order in which it can be accessed. In fact, you might recall that we have referred to both of our previously defined ADTs, the Stack and the Queue, as “access-controlled” collections.
Typical Collection ADTs (unlike the Stack and the Queue) allow retrieval of information based on the contents of the information. Therefore, we return to the topic of object comparison in this chapter, considering what it means for two objects to be “equal” or for one object to be “less than” or “greater than” another object. Being able to compare objects allows us to store them in sorted order, which, in turn, permits relatively fast retrieval options. This allows us, for example, to analyze more information during processing—we demonstrate this ability using a text processing application developed with our Collection ADT.
Stacks and queues are useful ADTs allowing us to solve many different types of problems. Stacks are perfect for situations where we need to process the most recent data first, such as matching grouping symbols in arithmetic expressions and queues make good buffers for storing information that needs to be handled in a first in first out sequence. The restrictions on how to insert and remove information when using stacks and queues ensure that when we use them in these situations we properly access the data.
But what about problems where we need to retrieve information regardless of the order in which it is stored—for example obtaining bank account information based on an ID number or determining if a specific item is in stock at a warehouse? In these situations we need a structure that allows us to store information and retrieve it later based on some key part of the information. Many solutions to this problem exist—lists, maps, tables, search trees, and databases just to name a few. This chapter presents the most basic ADT that provides the required functionality—the Collection ADT.
Our Collection ADT allows us to collect together information for later access. The order in which we add information to a collection has no effect on when we can retrieve it. Once an object is added to a collection we can retrieve it and/or remove it based on its contents.
To provide content-based access the Collection ADT uses the equals
method. All classes provide an equals
method—if they do not implement the method directly then they inherit it. We know this because the equals
method is implemented in the Object
class that is the root of the inheritance tree (see Section 1.2, “Organizing Classes”). The equals
method of the Object
class only considers two objects to be equal if they are aliases of each other. Usually we want equality to be related to the contents of the objects, not their memory address. Therefore, objects stored in our collections should override the equals
method of the Object
class. We return to this topic in Section 5.4, “Comparing Objects Revisited.” For now we will use the String
class in our examples—the String
class does provide a content-based equals
method.
Many variations of collections are possible. Here we define a basic collection, a content-based repository that supports addition, removal, and retrieval of elements.
Our collections allow duplicate elements. Element first
is considered to be a duplicate of element second
if first.equals(second)
returns true
. When an operation involves “finding” such an element, it can “find” any one of the duplicates. There is no distinction among duplicate elements in these cases.
As with our stacks and queues, collections do not support null
elements. As a general precondition for all of our collection methods, null
elements cannot be used as arguments. Rather than stating this precondition for each method, we state it once in the lead comment of the interface. This is a standard precondition when defining ADTs. It simplifies coding, which removes the need for testing for null
pointers every time we use an argument within a method.
Other than prohibiting null
elements, our collection classes have minimal preconditions for operations. For example, it is possible to specify a remove
operation that requires a matching object to be present in the collection. We do not do this. Instead, we define a remove
operation that returns a boolean
value indicating whether the operation succeeded. This approach provides greater flexibility to applications. Similarly we could require that the add
operation not be invoked on a full collection, perhaps stating that in such a situation it throws a “Collection Overflow” exception. Instead we use the same approach as with remove
—add
returns a boolean
indicating success or failure of the operation. An attempt to add
to a full collection would return false
.
As we did with stacks and queues, we capture the formal specifications of our Collection ADT using the Java interface construct. Our collections are generic—the type of object held by any particular collection is indicated by the client at the time the list is instantiated. It is the client’s responsibility to use a class whose equals
method definition suits their needs. Here is the code for our CollectionInterface
. Study the comments and method signatures to learn more about the details. As you can see, this interface is part of the ch05.collections
package. The implementation classes are also stored in the same package.
//--------------------------------------------------------------------------- // CollectionInterface.java by Dale/Joyce/Weems Chapter 5 // // Interface for a class that implements a collection of T. // A collection allows addition, removal, and access of elements. // Null elements are not allowed. Duplicate elements are allowed. //--------------------------------------------------------------------------- package ch05.collections; public interface CollectionInterface<T> { boolean add(T element); // Attempts to add element to this collection. // Returns true if successful, false otherwise. T get(T target); // Returns an element e from this collection such that e.equals(target). // If no such element exists, returns null. boolean contains(T target); // Returns true if this collection contains an element e such that // e.equals(target); otherwise returns false. boolean remove (T target); // Removes an element e from this collection such that e.equals(target) // and returns true. If no such element exists, returns false. boolean isFull(); // Returns true if this collection is full; otherwise, returns false. boolean isEmpty(); // Returns true if this collection is empty; otherwise, returns false. int size(); // Returns the number of elements in this collection. }
Figure 5.4, in this chapter’s summary, shows the relationships among the primary classes and interfaces created to support our Collection ADT.
This section develops an array-based implementation of our Collection ADT. We look at another array-based solution in Section 5.5, “Sorted Array-Based Collection Implementation,” and discuss a link-based solution in Section 5.6, “Link-Based Collection Implementation.”
Our basic approach is simple: If a collection has N elements, we hold the elements in the first N locations of the array, locations 0 to N-1. We maintain an instance variable, numElements
, to hold the current number of elements in the collection. This example shows how we would represent a collection of country abbreviations:
When an element is added to the collection, we simply place it in the next available slot, the slot indicated by numElements
, and increment numElements
by one. The result of adding Columbia (“COL”) to the above collection is as follows:
When an element is removed we could shift all the elements at higher indices one position lower, to fill the gap left by the removal. Instead we take advantage of the unsorted nature of the array and simply move the element at the end of the collection into the position occupied by the element to be removed. Let us remove Mexico from the above collection:
The ArrayCollection
class implements the CollectionInterface
using the basic approach described above. The implementation of the ArrayCollection
class is straightforward, especially if you are familiar with the array-based implementations of the Stack ADT from Chapter 2 and of the Queue ADT from Chapter 4.
As was the case with our array-based generic stacks and queues, the instantiation of an array as an array of Object
, and subsequent casting of it as an array of T
, typically generates a compiler warning. Even though this approach is somewhat awkward and results in a compiler warning, it is how we must create generic collections using arrays in Java. Two constructors are provided: one that instantiates a collection based on a default capacity of 100, and another that allows the client to indicate the capacity.
For our ArrayCollection
class we place the code for array search in a protected helper method, called find
, which is invoked by each method that needs to search for an element. The find
method sets the values of the protected instance variables found
and location
, thereby indicating the results of the search. The algorithm used here for searching the array for an element is the Sequential Search algorithm presented in Section 1.6, “Comparing Algorithms: Order of Growth Analysis.” The find
method is an example of procedural abstraction. Its existence removes the need to create similar code across the methods remove
, contains
, and get
, and greatly simplifies the code for each of those methods.
Here is the code for the ArrayCollection
class:
//--------------------------------------------------------------------------- // ArrayCollection.java by Dale/Joyce/Weems Chapter 5 // // Implements the CollectionInterface using an array. // // Null elements are not allowed. Duplicate elements are allowed. // // Two constructors are provided: one that creates a collection of a default // capacity, and one that allows the calling program to specify the capacity. //--------------------------------------------------------------------------- package ch05.collections; public class ArrayCollection<T> implements CollectionInterface<T> { protected final int DEFCAP = 100; // default capacity protected T[] elements; // array to hold collection's elements protected int numElements = 0; // number of elements in this collection // set by find method protected boolean found; // true if target found, otherwise false protected int location; // indicates location of target if found public ArrayCollection() { elements = (T[]) new Object[DEFCAP]; } public ArrayCollection(int capacity) { elements = (T[]) new Object[capacity]; } protected void find(T target) // Searches elements for an occurrence of an element e such that // e.equals(target). If successful, sets instance variables found to true // and location to the index of e. If not successful, sets found to false. { location = 0; found = false; while (location < numElements) { if (elements[location].equals(target)) { found = true; return; } else location++; } } public boolean add(T element) // Attempts to add element to this collection. // Returns true if successful, false otherwise. { if (isFull()) return false; else { elements[numElements] = element; numElements++; return true; } } public boolean remove (T target) // Removes an element e from this collection such that e.equals(target) // and returns true; if no such element exists, returns false. { find(target); if (found) { elements[location] = elements[numElements - 1]; elements[numElements - 1] = null; numElements--; } return found; } public boolean contains (T target) // Returns true if this collection contains an element e such that // e.equals(target); otherwise, returns false. { find(target); return found; } public T get(T target) // Returns an element e from this collection such that e.equals(target); // if no such element exists, returns null. { find(target); if (found) return elements[location]; else return null; } public boolean isFull() // Returns true if this collection is full; otherwise, returns false. { return (numElements == elements.length); } public boolean isEmpty() // Returns true if this collection is empty; otherwise, returns false. { return (numElements == 0); } public int size() // Returns the number of elements in this collection. { return numElements; } }
The vocabulary density of a text is the total number of words in the text divided by the number of unique words in the text. In this section we develop an application that computes the vocabulary density of a given text.
The problem is clear—read the text and count how many words it contains and also the number of unique words it contains. We define a word to be a sequence of letter characters (A through Z) plus the apostrophe character ( ’ ). The apostrophe is included so as to treat words such as “it’s” and “Molly’s” as a single word. We ignore the case of letters when determining uniqueness—for example, “The” and “the” are considered to be the same word.
To count the number of unique words we need to keep track of all the words encountered so far while reading through the text. Our Collection ADT is the perfect tool to accomplish this. The algorithm is simple—read through the text word by word, checking each word to see if it is already in the collection and if not, add it to the collection. When finished, the size of the collection provides us the number of unique words. As we proceed through the words we also keep track of the total number of words processed. We use the total number of words and the number of unique words to calculate the vocabulary density. More formally:
Input to our application will reside in a text file. The name/location of the file should be indicated as a command line argument to the program. Here is the application—the parts of the code directly related to the use of the Collection ADT are emphasized:
//--------------------------------------------------------------------------- // VocabularyDensity.java by Dale/Joyce/Weems Chapter 5 // // Displays the number of total words, unique words in the input text file, // and the resulting vocabulary density. // Input file indicated by a command line argument. //--------------------------------------------------------------------------- package ch05.apps; import java.io.*; import java.util.*; import ch05.collections.*; public class VocabularyDensity { public static void main(String[] args) throws IOException { final int CAPACITY = 1000; // capacity of collection String fname = args[0]; // input file of text String word; // current word int numWords = 0; // total number of words int uniqWords; // number of unique words double density; // vocabulary density CollectionInterface<String> words = new ArrayCollection<String>(CAPACITY); // Set up file reading FileReader fin = new FileReader(fname); Scanner wordsIn = new Scanner(fin); wordsIn.useDelimiter("[^a-zA-Z']+"); // delimiters are nonletters,' while (wordsIn.hasNext()) // while more words to process { word = wordsIn.next(); word = word.toLowerCase(); if (!words.contains(word)) words.add(word); numWords++; } density = (double)numWords/words.size(); System.out.println("Analyzed file " + fname); System.out.println(" Total words: " + numWords); if (words.size() == CAPACITY) System.out.println(" Unique words: at least " + words.size()); else { System.out.println(" Unique words: " + words.size()); System.out.printf(" Vocabulary density: %.2f", density); } } }
Some observations about the application:
The code is short and simple—this is because so much of the complexity is handled by the ArrayCollection
class. This is a good example of the power and utility of abstraction.
This program demonstrates how to read input from a file. The FileReader
constructor will throw IOException
if there are issues finding/opening the indicated file. Because that is a checked exception, we must either catch it or rethrow it. We chose the latter approach as you can see on the line declaring the main
method. The FileReader fin
is passed as an argument to a Scanner
, allowing us to use the scanner to read the file.
We use Java’s Scanner
to break the input file into words. It allows us to know when “there are more words to process” and allows us to “get the next word.” The Scanner
class permits us to define the set of delimiters used to separate the tokens found in its input source by invoking its useDelimiter
method. This method accepts a regular expression as an argument. In our program we indicate that the delimiters are “[^a-zA-Z’]+”; this regular expression means any sequence of one or more characters that are not a letter (“a-zA-Z” indicates all the characters between ‘a’ and ‘z’ plus all the characters between ‘A’ and ‘Z’) or an apostrophe (note the ’ that follows the ‘Z’ in the expression)—the ‘^’ symbol acts to negate that set of characters, and the + sign indicates “one or more” of the characters in the bracketed set that proceeds it. Thus, “one or more characters that are not a letter or an apostrophe.” See documentation about the Java Pattern
class for more information.
The toLowerCase
method of the String
class changes all letters of a word to lowercase before checking if the word is in the collection, fulfilling the specification to ignore the case of letters when determining uniqueness.
The constant CAPACITY
is used to instantiate the ArrayCollection
variable words
. Therefore, the words
collection can hold at most CAPACITY
words, in our case 1,000. If the capacity is reached we display the phrase “at least” because we cannot determine the exact number of unique words in the file. In that case we do not display the density as it would be meaningless.
Once the capacity is reached the add
method no longer adds words to the collection—instead it simply returns the value false
over and over as the program continues to process the input file. Exercise 5.12a asks you to modify the program so that in the case of reaching a full capacity it stops processing and outputs an appropriate message.
The reader is encouraged to test this program and to use it to analyze some text files of their choice. Here are the results of running the program on three related, important, historical documents—the 1215 British Magna Carta (translated to English from the original Latin but minus the list of those in attendance), the 1776 U.S. Declaration of Independence (minus the list of signatures), and the 1930 Indian Purna Swaraj:
Analyzed file BritishMagnaCarta.txt | Analyzed file USAdeclaration.txt | Analyzed file IndiaPurnaSwaraj.txt |
---|---|---|
Total words: 3310 | Total words: 1341 | Total words: 594 |
Unique words: 698 | Unique words: 540 | Unique words: 288 |
Vocabulary density: 4.74 | Vocabulary density: 2.48 | Vocabulary density: 2.06 |
With stacks and queues, we access only the ends of our structures. We push or pop the top element of a stack, or we enqueue a value at the tail of a queue and dequeue it from the head. We do not access elements stored at other places within the structure. With a collection, however, access is permitted to elements anywhere within the structure. Therefore, to understand collections we need to understand how an element within a structure is identified—we need to understand equality of objects.
In the next section we consider an approach to storing the elements of a collection that improves the efficiency of finding any given element. If we can store the elements in “increasing order” then we can use the binary search algorithm to locate them. But what does it mean to store elements “in order”? To understand this we need to understand object comparison.
This section reviews both the equals
method, used to determine equality, and the compareTo
method (along with the Comparable
interface), used to determine order.
Section 1.5, “Basic Structuring Mechanisms,” discussed comparing objects using the comparison operator (==). Recall that when using ==, the comparison is actually made between the contents of the two reference variables that point to the objects, and not between the contents of the objects themselves. This is demonstrated in Figure 5.1 (which replicates Figure 1.6).
The comparison operator does not compare the contents of objects. What else can we do? How do we compare the actual objects? One option is to use the equals
method. Because this method is exported from the Object
class that is the root of the Java inheritance tree, it can be used with objects of any Java class. If c1
and c2
are objects of the class Circle
, then we can compare them using:
c1.equals(c2)
The equals
method, as defined in the Object
class, acts in much the same way as the comparison operator. It returns true if and only if the two variables reference the same object. To circumvent this problem we can, within a class, redefine the equals
method to fit the purpose of the class.
Suppose we have a Circle
class that features a radius
attribute of type int
. A reasonable definition for equality of Circle
objects is that they are equal if they have the same radii. To implement this approach, we include the following method in the Circle
class:
@Override public boolean equals(Object obj) // Returns true if obj is a Circle with same radius as this Circle, // otherwise returns false. { if (obj == this) return true; else if (obj == null || obj.getClass() != this.getClass()) return false; else { Circle c = (Circle) obj; return (this.radius == c.radius); } }
This design of the equals
method reflects standard good practice. A few notes about the design follow:
The @Override
notation indicates to the compiler that we are overriding an ancestor’s method. The compiler double-checks our syntax and ensures that we are in fact overriding a method.
The first if clause checks to see if we are comparing a circle to itself. In that case we return true
.
The second if clause checks to see if the argument is null
. If so we return false
. Due to the short-circuit nature of the Java ||
operator the second part of the or expression is only evaluated if the first part is false, protecting us from a null pointer exception. In the second part of the expression we check to make sure the two objects being compared are of the same type—if not we return false
.
If processing reaches the final else clause we can safely cast the obj
argument as a Circle
and compare the two radii.
Now when a statement such as
c1.equals(c2)
is encountered, the customized equals
method of the Circle
class is used, rather than the equals
method of the Object
class. Even though c1
and c2
may reference different Circle
objects, if those objects have equal radii, the equals
method returns true
. As discussed previously in Section 5.1, “The Collection Interface,” we sometimes refer to the attribute, or combination of attributes, of an object used by the equals
method to determine equality as the key of the class. The key of a class, from the point of view of an application, is the set of attributes that are used to determine the identity of an object of the class, for that application. For example, if an application uses the Circle
class equals
method to determine identity, then the radius of a circle is the key.
Let us look at another example, one that we will use in several applications. This example should help clarify what is meant by the key attributes of a class. Consider a class, FamousPerson
, whose objects will represent famous people. The information we wish to capture about a famous person is their name (first name and last name), the year they were born, and some short interesting fact about the person.
The FamousPerson
class can be found in the support
package. It includes protected variables firstName
, lastName
, yearOfBirth
, and fact
to hold its attributes, all of which are initialized through its constructor. The class also includes getters plus equals
, compareTo
, and toString
methods. Here we show the equals method that demonstrates that equality can sometimes depend on a subset of the attributes, in this case the first and last name attributes. We say that the combination of firstName
and lastName
is the key of the class.
@Override public boolean equals(Object obj) // Returns true if 'obj' is a FamousPerson with same first and last // names as this FamousPerson, otherwise returns false. { if (obj == this) return true; else if (obj == null || obj.getClass() != this.getClass()) return false; else { FamousPerson fp = (FamousPerson) obj; return (this.firstName.equals(fp.firstName) && this.lastName.equals(fp.lastName)); } }
Let us create an application that uses our Collection ADT and the FamousPerson
class. A text file named FamousCS.txt
is included in the input
folder—it contains information about famous computer scientists, one scientist per line, in the format: first name, last name, year of birth, and fact. We make the simplifying assumptions that the only commas in the file are those used to separate the information, and that the scientists’ names do not contain “white space.”
Our application will read the information from the file and store it in a collection of FamousPerson
. It will then interact with the user, displaying the information about the scientists based on the user’s queries. The application is easy to write because the more complex parts are already implemented by the ArrayCollection
and FamousPerson
classes. We call the application CSInfo
, representing Computer Scientist Information.
//--------------------------------------------------------------------------- // CSInfo.java by Dale/Joyce/Weems Chapter 5 // // Reads information about famous computer scientists from the file // FamousCS.txt. Allows user to enter names and provides them the information // from the file that matches the name. //--------------------------------------------------------------------------- package ch05.apps; import java.io.*; import java.util.*; import ch05.collections.*; import support.FamousPerson; public class CSInfo { public static void main(String[] args) throws IOException { // instantiate collection final int CAPACITY = 300; ArrayCollection<FamousPerson> people = new ArrayCollection<FamousPerson>(CAPACITY); // set up file reading FileReader fin = new FileReader("input/FamousCS.txt"); Scanner info = new Scanner(fin); info.useDelimiter("[,\n]"); // delimiters are commas, line feeds Scanner scan = new Scanner(System.in); FamousPerson person; String fname, lname, fact; int year; // read the info from the file and put in collection while (info.hasNext()) { fname = info.next(); lname = info.next(); year = info.nextInt(); fact = info.next(); person = new FamousPerson(fname, lname, year, fact); people.add(person); } // interact with user, getting names and displaying info final String STOP = "X"; System.out.println("Enter names of computer scientists."); System.out.println("Enter: firstname lastname (" + STOP + " to exit) "); fname = null; lname = null; while (!STOP.equals(fname)) { System.out.print("Name> "); fname = scan.next(); if (!STOP.equals(fname)) { lname = scan.next(); person = new FamousPerson(fname, lname, 0, ""); if (people.contains(person)) { person = people.get(person); System.out.println(person); } else System.out.println("No information available "); } } } }
Here is a sample run of the application:
Enter names of computer scientists. Enter: firstname lastname (X to exit) Name> Ada Lovelace Ada Lovelace(Born 1815): Considered by many to be first computer programmer. Name> Molly Joyce No information available Name> Edsger Dijkstra Edsger Dijkstra(Born 1930): 1972 Turing Award winner. Name> X
As you study the code of the application, take particular note of the following line:
person = people.get(person);
It might seem strange to be passing person
to the collection to get back a FamousPerson
object and assign it to person
. This is an example of how the key attributes of an object let us retrieve information from a collection. Suppose the user enters “Ada Lovelace” in response to the name prompt. In that case the line
person = new FamousPerson(fname, lname, 0, "");
creates a new Person
object with attributes
firstName: "Ada" lastName: "Lovelace" yearOfBirth: 0 fact: ""
When the get
method is invoked with this object as an argument, the object is compared to each of the objects held in the collection—based on the way equals
is defined for FamousPerson
objects, a match is found for the Person
object with attributes
firstName: "Ada" lastName: "Lovelace" yearOfBirth: 1815 fact: "Considered by many to be first computer programmer."
It is this matched object that is returned, that is a reference to it is returned, and assigned to the person
variable. Now the person
variable references the object that is in the collection, the object with the complete information, and it is this information that is used in the println
statement on the following line. As long as we provide a partial object that contains the correct key information, we can retrieve the complete object from the collection.
The equals
method allows us to check whether a particular element is in a collection. But in addition to checking objects for equality, we need another type of comparison. We often need to be able to tell when one object is less than, equal to, or greater than another object. The Java library provides an interface, called Comparable
, that can be used to ensure that a class provides this functionality.
The Comparable
interface consists of exactly one abstract method:
public int compareTo(T o); // Returns a negative integer, zero, or a positive integer as this object // is less than, equal to, or greater than the specified object.
The compareTo
method returns an integer value that indicates the relative “size” relationship between the object upon which the method is invoked and the object passed to the method as an argument. As we see in the next section, this information can be used to order the objects of a collection within the internal representation structure, facilitating faster search. By convention the compareTo
method of a class should support the standard order of a class—lexicographic order for strings, numeric order for numbers, low score to high score for bowlers, high score to low score for golfers, and so on. We call the order established by a class compareTo
method the natural order of the class.
Any class that implements the Comparable
interface defines its own compareTo
method, with a signature that matches the abstract method defined in the interface. After all, the implementer of the class is in the best position to define how objects of the class should be compared—to know the natural order.
The Java Comparable
interface is able to work with generics. Use of generic types with the compareTo
method helps ensure that comparison takes place only between compatible objects.
Let us return to our famous person example. Recalling that people are compared on their first and last names when determining equality, we use the same two fields when determining relative order. Following standard procedure we first compare last names using the String
class equals
method—only if the two last names under consideration are equal do we turn our attention to the first names. When a key consists of multiple attributes, starting with the most significant attribute, first is the best approach. Here is the compareTo
method implemented within our FamousPerson
class (our FamousPerson
class does implement the Comparable
interface):
public int compareTo(FamousPerson other) // Precondition: 'other' is not null // // Compares this FamousPerson with 'other' for order. Returns a // negative integer, zero, or a positive integer as this object // is less than, equal to, or greater than 'other'. { if (!this.lastName.equals(other.lastName)) return this.lastName.compareTo(other.lastName); else return this.firstName.compareTo(other.firstName); }
Our compareTo
method uses the compareTo
method of the String
class. This makes sense because the attributes upon which we are basing the comparison are strings. Notice that the equals
method described previously and the compareTo
method described here are consistent with each other. In other words, the equals
method returns true
for two people if and only if the compareTo
method returns 0 for the two people. It is good programming practice to ensure consistency between these two methods. Unlike with the equals
method it is common to assume the argument passed to compareTo
is not null
.
In Section 5.2, “Array-Based Collection Implementation,” we implemented a collection class that used an array. The ArrayCollection
implementation featured a constant time add,
but all the other basic operations—get
, contains
, and remove
—require O(N) time. Consider an application that builds a large collection and then repeatedly accesses the collection to see if it contains various elements—perhaps a spell-checker application. Although such an application can make use of the fast add
operation, it might benefit even more from a fast contains
operation. Is there some way we can structure our stored data so as to facilitate a faster search?
In Section 1.6, “Comparing Algorithms: Order of Growth Analysis,” and Section 3.3, “Recursive Processing of Arrays,” we studied the binary search algorithm—an O(log2N) algorithm for finding an element in a sorted array. If we use an array as the internal representation for a collection implementation, and keep that array in sorted order, then the binary search algorithm can be used to find elements in the collection. Not only will this significantly speed up the contains
operation but since get
and remove
both require that we first find the element in question, those operations can also benefit. In this section we take this approach. We call our implementation the SortedArrayCollection
.
Our goal with this new Collection ADT implementation is to increase the efficiency of the operations. Efficiency is typically only a concern for a collection when the collection holds a large number of elements. It makes sense, therefore, to make this implementation of the Collection ADT unbounded. We do this using the same approach we did with our array-based unbounded queue implementation—we include a protected enlarge
method that can be invoked internally by the public add
method when needed.
As elements are added to our collection we intend to keep the protected array sorted according to the natural order of the elements—therefore we intend to use the compareTo
method associated with the elements. In order to do this we must ensure that only classes that implement the Comparable
interface are used with our collection.
Is it possible to force applications to use Comparable
classes when providing an argument for a generic type? Yes it is. We can use what is called a “bounded type.” For example, we could design a class Duo
that holds a pair of elements and that provides a largest
operation to return the larger of the two elements:
public class Duo<T extends Comparable<T>> { protected T first; protected T second; public Duo(T f, T s) { first = f; second = s; } public T larger() { if (first.compareTo(second) > 0) return first; else return second; } }
Note that in the class declaration instead of indicating the generic parameter as <T>
we used <T extends Comparable <T>>.
This indicates that the generic argument must be of a class that implements the Comparable
interface. If we instantiate a Duo
object using a class that does not implement Comparable
, a syntax error is generated.
Although it appears we can use the above approach to solve our problem of assuring that only Comparable
classes are used with our SortedArrayCollection
classes, it turns out that is not the case. Because we wish to use an array as our internal representation structure, we cannot use this approach due to type safety issues and differences between the way arrays and generic types are treated by Java. Attempting to declare an array of <T extends Comparable<T>>
results in a syntax error. So we need to find a different approach.
Rather than try to enforce the use of Comparable
classes using syntactic means we will use our “programming by contract” approach. We specify as a precondition of the add
method that its argument is comparable to the previous objects added to the collection. As the add
method is the only way for an element to get into the collection, we are guaranteed that all elements held in our SortedArrayCollection
can be compared to each other. If a client class ignores the contract then it is likely that an exception will be thrown when they use the class.
The SortedArrayCollection
class implements the CollectionInterface
using an array. For this class we are concerned about the order in which we keep elements stored—the array must be kept sorted. Nevertheless, the implementation of the SortedArrayCollection
class has much in common with the implementation of the ArrayCollection
class developed in Section 5.2 “Array-Based Collection Implementation,” so we do not provide complete code listings here. The code for the new class can be found in the ch05.collections
package. A few notes follow.
As already mentioned, this implementation is unbounded, using a protected enlarge
method as needed to increase the size of the internal representation array by the original capacity.
The class includes a protected find
method that serves the same function as the find
method of the ArrayCollection
class—search the collection for an occurrence of an element that is equal to the target element, and if successful set the instance variable found
to true
and set the instance variable location
to the index where the target was found. For this class, however, the find
method makes use of the binary search algorithm to provide excellent efficiency. The recursive approach to the binary search algorithm described in Section 3.3, “Recursive Processing of Arrays,” is used.
During the search process the compareTo
operation is used. Therefore, within the find
method we must cast elements as Comparable
, so that the Java compiler will accept our use of the compareTo
method. Some compilers will generate a warning message regarding an unchecked call to compareTo
, because there is no way for the compiler to verify that the generic type T
will actually implement Comparable
. Because we understand the reason for the warning message and because our precondition prohibits the use of elements that do not implement Comparable
, we can safely ignore the compiler warning. An application that ignores the precondition and adds elements to the SortedArrayCollection
that are not Comparable
will cause a type mismatch exception to be thrown—which is exactly the result we desire because incomparable elements should not be added to a sorted collection.
Unlike with the ArrayCollection
class we cannot simply add a new element in the next available location of the array. So that the add
method can also make use of the find
method, to determine where the new element should be inserted, the find
method of the SortedArrayCollection
class has additional functionality. In the case of an "unsuccessful" search, the find
method sets the instance variable found
to false
and sets the instance variable location
to the index where the target should be inserted.
Once the index where the addition of the new element is identified, the add
method creates room for the new element by shifting all of the elements from that index through the end of the collection, one index higher, and then inserts the new element in the free slot that is created:
The code for the add
method is as follows:
public boolean add(T element) // Precondition: element is Comparable to previously added objects // // Adds element to this collection. { if (numElements == elements.length) enlarge(); find(element); // sets location to index where element belongs for (int index = numElements; index > location; index--) elements [index] = elements[index - 1]; elements [location] = element; numElements++; return true; }
Note that since the SortedArrayCollection
is unbounded the add
method always returns true
—it should always be able to add the given element
successfully.
Similarly, the remove method shifts elements one index lower in order to remove the targeted element without disrupting the already sorted array.
When designing an ADT, such as for a stack, queue, or collection, we have a choice about how to handle the elements—“by copy” or “by reference.”
With this approach, the ADT manipulates copies of the data used in the client program. When the ADT is presented with a data element to store, it makes a copy of the element and stores that copy. Making a valid copy of an object can be a complicated process, especially if the object is composed of other objects. Valid copies of an object are typically created using the object’s clone
method. Classes that provide a clone
method must indicate this fact to the run-time system by implementing the Cloneable
interface. In the examples that follow, we assume the object classes provide rigorous clone
methods and implement the Cloneable
interface. In that case, code for an unsorted collection add
operation might be
public void add (T element)
// Adds a copy of the element to this collection
{
elements[numElements] = element.clone();
numElements++;
}
In Java, if the collection elements are objects, then it is really a reference to a copy of the element that is stored—because all Java objects are manipulated by reference. The key distinction here is that it is a reference to a copy of the element, and not a reference to the element itself, that is stored.
Similarly, when an ADT returns an element using the “by copy” approach, it actually returns a reference to a copy of the element. As an example, consider the code for a collection get
operation:
public T get(T target) // Returns a copy of element e from this collection such that e.equals(target); // if no such element exists, returns null. { find(target); if (found) return elements[location].clone(); else return null; }
This approach provides strong information hiding. In effect, the ADT is providing a separate repository for a copy of the client’s data.
For this approach, an ADT manipulates references to the actual elements passed to it by the client program. For example, code for an unsorted collection add
operation might be
public void add (T element)
// Adds an element to this collection
{
elements[numElements] = element;
numElements++;
}
Because the client program retains a reference to the element, we have exposed the contents of the collection ADT to the client program. The ADT still hides the way the data is organized—for example, the use of an array of objects—but it allows direct access to the individual elements of the collection by the client program through the client program’s own references. In effect, the ADT provides an organization for the original client data.
The “by reference” method is the most commonly used approach and the one we use throughout this text. It has the advantage that it takes less time and space than the “by copy” method. Copying objects takes time, especially if the objects are large and require complicated deep-copying methods. Storing extra copies of objects also requires extra memory. Thus, the “by reference” approach is an attractive strategy.
When we use the “by reference” approach, we create aliases of our elements, so we must deal with the potential problems associated with aliases. If our data elements are immutable, then no problems will occur. If the elements can be changed, however, problems can arise.
If an element is accessed and changed through one alias, it could disrupt the status of the element when it is accessed through the other alias. This situation is especially dangerous if the client program can change an attribute that the ADT uses to determine the internal organization of the elements. For example, if the client changes an attribute that determines an object’s position within a sorted collection, then the object may no longer be in the proper place. Because the change did not go through a method of the sorted collection class, the class has no way of knowing that it should correct this situation. A subsequent get
operation on this collection would likely fail.
The diagrams in Figure 5.2 show the ramifications of both approaches. Suppose we have objects that hold a person’s name
and weight
. Further suppose that we have a collection of these objects sorted by the variable weight
. We add three objects into the collection, and then transform one of the objects with a diet
method that changes the weight
of the object. The left side of the figure models the approach of storing references to copies of the objects—the “by copy” approach. The right side models the approach of storing references to the original objects—the “by reference” approach.
The middle section of the figure, showing the state of things after the objects have been inserted into the collections, clearly demonstrates the differences in the internal implementations. The “by copy” approach creates copies and the collection elements reference them; these copies take up space that is not required in the “by reference” approach. It is also clear from the right side of the figure that the “by reference” approach creates aliases for the objects, as we can see more than one reference to an object. In both approaches, the collection elements are kept sorted by weight.
The situation becomes more interesting when we modify one of the objects. When the person represented by the S1
object loses some weight, the diet
method is invoked to decrease the weight of the object. In this scenario, both approaches display problems. In the “by copy” approach, we see that the S1
object has been updated. The copy of the S1
object maintained in the collection is clearly out of date. It holds the old weight value. A programmer must remember that such a collection stores only the values of objects as they existed at the time of the add
operation and that changes to those objects are not reflected in the objects stored on the collection. The programmer must design the code to update the collection, if appropriate.
In the “by reference” approach, the object referred to by the collection contains the up-to-date weight information, because it is the same object referred to by the S1
variable. The collection, however, is no longer sorted by the weight
attribute. Because the update to weight
took place without any collection activity, the collection objects remain in the same order as before. The collection structure is now corrupt, and calls to the collection methods may behave unpredictably. Instead of directly updating the S1
object, the program should have removed the object from the collection, updated the object, and then reinserted it into the collection.
Which approach is better? That depends. If processing time and space are issues, and if we are comfortable counting on the application programs to behave properly, then the “by reference” approach is probably the best choice. If we are not overly concerned about time and space usage (maybe our collection objects are not too large), but we are concerned with maintaining careful control over the access to and integrity of our collections, then the “by copy” approach is probably the best choice. The suitability of either approach depends on how the collection is used.
To provide evidence that our new collection implementation works correctly we return to the vocabulary density problem introduced in Section 5.3, “Application: Vocabulary Density.” The application developed in that section used the ArrayCollection
class:
CollectionInterface<String> words = new ArrayCollection<String>(CAPACITY);
The only thing we need to do, to use our new implementation, is to change the above line to use the SortedArrayCollection
class:
CollectionInterface<String> words = new SortedArrayCollection<String>(CAPACITY);
To compare the two approaches in terms of time efficiency we used a suite of text files, most obtained from the Project Gutenberg website, as input to both versions of the application. We timed the programs using the currentTimeMillis
method of the Java Library System
class. Although this approach to measuring execution time is not extremely precise, it is good enough for our purposes here.
See Table 5.1 for the results of our experiment. The text files are presented in the table from smallest to largest. The “Linux Word File” is an alphabetical list of words used by the Linux Operating System’s spell-checker program. It has the lowest density, essentially 1 (due to some words, for example “unix,” being repeated in the file with different capitalization schemes, the number of words and number of unique words are not exactly the same). Other than that special case, the density tends to increase as the size of the file increases, as is expected. The “Mashup” file is a concatenation of 121 files from the Gutenberg site consisting of all the other files on the list (except the Linux Word File) plus other novels, “complete works,” dictionaries, and technical manuals including Spanish, French, and German language texts in addition to the English.
Before running the program on a particular text file we made sure that the instantiated collection would be large enough to hold all of the unique words in the file—this way we guaranteed that the bounded ArrayCollection
was able to handle the problem and we did not “penalize” the unbounded SortedArrayCollection
for time spent enlarging its internal storage.
Table 5.1 Results of Vocabulary Density Experiment1
Text | File Size | Results | Array-Collection |
Sorted-Array-Collection |
|
---|---|---|---|---|---|
Shakespeare’s 18th Sonnet | 1 KB | words: unique: density: |
114 83 1.37 |
20 msecs | 23 msecs |
Shakespeare’s Hamlet | 177 KB | words: unique: density: |
32,247 4,790 6.73 |
236 msecs | 128 msecs |
Linux Word File | 400 KB | words: unique: density: |
45,404 45,371 1.00 |
9,100 msecs | 182 msecs |
Melville’s Moby-Dick | 1,227 KB | words: unique: density: |
216,113 17,497 12.35 |
2,278 msecs or 2.3 seconds |
382 msecs |
The Complete Works of William Shakespeare | 5,542 KB | words: unique: density: |
900,271 26,961 33.39 |
9.7 seconds | 1.2 seconds |
Webster’s Unabridged Dictionary | 28,278 KB | words: unique: density: |
4,669,130 206,981 22.56 |
4.7 minutes | 9.5 seconds |
11th Edition of the Encyclopaedia Britannica | 291,644 KB | words: unique: density: |
47,611,399 695,531 68.45 |
56.4 minutes | 2.5 minutes |
Mashup | 608,274 KB | words: unique: density: |
102,635,256 1,202,099 85.38 |
10 hours | 7.2 minutes |
The benefits of the sorted array approach are clear from the table. As the size of the input increases, the fast performance of the binary search used by the SortedArrayCollection
as compared to the sequential search used by the ArrayCollection
become obvious. This is most evident in the comparison of times for the large Mashup file, with the array-based approach requiring over 10 hours and the sorted array approach taking only 7 minutes, on the author’s desktop computer.
In this section we design a link-based implementation of our Collection ADT. The internal representation structure will be an unsorted linked list. You will discover that most of the detailed work is already finished as we will be able to reuse code from previous ADT implementations. Throughout the discussion we compare and contrast the implementation of the operations within the previously defined ArrayCollection
class and the LinkedCollection
class currently under consideration. To study the code details not shown here please see the LinkedCollection
class in the ch05.collections
package.
As we did for our link-based stacks and queues, we again use the LLNode
class from the support
package to provide our nodes. The information attribute of a node contains the collection element; the link attribute contains a reference to the node holding the next collection element. We maintain a variable, head
of type LLNode<T>
that references the first node in the linked list. This linked list acts as the internal representation for the collection. It is an unsorted linked list. The start of the class is
package ch05.collections; import support.LLNode; public class LinkedCollection<T> implements CollectionInterface<T> { protected LLNode<T> head; // head of the linked list protected int numElements = 0; // number of elements in this collection // set by find method protected boolean found; // true if target found, else false protected LLNode<T> location; // node containing target, if found protected LLNode<T> previous; // node preceding location public LinkedCollection() { numElements = 0; head = null; }
There is only one constructor provided for LinkedCollection
. There is no need to deal with capacity as there was with the array-based implementations. The constructor sets the instance variables numElements
and head
, essentially constructing an empty collection.
Why do we need a numElements
instance variable? Recall that in the array-based approach, the numElements
variable was used to indicate the first empty array location. Because we are now using a linked list, we do not have array locations. Nevertheless, to support the size
operation we still maintain numElements
, incrementing it whenever an element is added to the collection and decrementing it whenever an element is removed. The size
method simply returns the value of numElements
just as it does in ArrayCollection
. The isEmpty
method also remains unchanged from the array-based approach and the isFull
method just trivially returns false because our linked structure is never full.
The add
method is passed an argument of class T
, and adds it to the collection. Because the internal linked list is unsorted, and order of the elements is not important, we can just add new elements to the front of the linked list. This is the easiest, most efficient approach, because head
already provides a reference to this location. The code for the add
method is very similar to that of the push
method in the LinkedStack
class developed in Section 2.8, “A Link-Based Stack.” Note that since our LinkedCollection
is unbounded, the add
method always returns the value true
—it should always be able to add the given element
successfully.
public boolean add(T element)
// Adds element to this collection.
{
LLNode<T> newNode = new LLNode<T>(element);
newNode.setLink(head);
head = newNode;
numElements++;
return true;
}
As we did for both the unsorted and sorted array-based implementation approaches, we create a protected helper method for our linked approach and call it find
. Recall that this method sets the values of the instance variables found
and location
, which can then be used by the other methods. Given that the find
method works correctly, coding both the contains
and get
methods become easy—in fact they are exactly the same as their counterparts in the array-based implementations:
public boolean contains (T target) public T get(T target) { { find(target); find(target); return found; if (found) } return location.getInfo(); else return null; }
The protected find
method is also used by the remove
method—we discuss remove
below. For now, let us consider find
in more detail. It follows the same algorithm as the find
method of the array-based implementation: Walk through the internal representation of elements until the target element is found or the end of the representation is reached. The only difference is the use of linked list statements instead of array-related statements. Here is the code for both methods, for comparison. In the array-based approach, location
is of type int
, indicating the array index of the target element; in the link-based approach, location
is an LLNode
, indicating the node containing the target element.
protected void find(T target) { location = 0; found = false; while (location < numElements) { if (elements[location].equals(target)) { found = true; return; } else location++; } }
protected void find(T target) { location = head; found = false; while (location != null) { if (location.getInfo().equals(target)) { found = true; return; } else { previous = location; location = location.getLink(); } } }
Actually, we lied. There is another difference between the two find
implementations. Do you see it? The link-based implementation assigns a value to a variable named previous
that is not mentioned in the array-based implementation. This variable is used by the LinkedCollection remove
method. We will take a closer look now at this method.
To remove the target, we must first find it. We do so using the find
method, which sets the location
variable to indicate the target element. As shown in Figure 5.3, however, to actually remove the node referenced by location
, we must change the reference in the previous node. That is, we must change the link of the previous node to reference the node following the one being removed. We “jump over” the removed node. This is where we use the previous
variable.
Not only does find
set location
, but it also sets previous
so that the remove
method has access to the previous node and can implement the “jump over” step. This is accomplished by remove
with the statement:
previous.setLink(location.getLink());
Removing the first node must be treated as a special case because the main reference to the linked list (head
) must be changed. We handle that special case with an if statement at the beginning of the code for remove
. Is removing the last node a special case that requires additional code? No. The link within the last node is null
. Therefore, in the case of removing the last node the above statement correctly sets the value of the link of the previous node to null
, indicating that it is now the end of the list. Here is remove
:
public boolean remove (T target) // Removes an element e from this collection such that e.equals(target) // and returns true; if no such element exists, returns false. { find(target); if (found) { if (head == location) head = head.getLink(); // remove first node else previous.setLink(location.getLink()); // remove node at location numElements--; } return found; }
We have now looked at several different implementations of the Collection ADT. How do they compare? Let us briefly review each implementation—a summary of this discussion is found in Table 5.2.
All three implementation approaches, the ArrayCollection
, SortedArrayCollection
, and LinkedCollection
, maintain an instance variable numElements
that holds the current size of the collection. This allows simple O(1) implementations of the size
, isEmpty
, and isFull
operations.
Our ArrayCollection
implementation is a bounded implementation. It is the responsibility of the client to indicate the capacity of an ArrayCollection
object at instantiation. If an add
operation is invoked on a full collection the add
operation returns false
and there is no change to the collection. A client must therefore ensure the initial capacity is large enough to handle the problem they are addressing. Being an array-based implementation the constructor is an O(N) operation, where N indicates the capacity—for any particular collection object we only execute the constructor once. The array in this implementation is unsorted—meaning the add
operation is trivial and is O(1), whereas to locate a target element in the array for the contains
, get
, and remove
operations requires a O(N) sequential search.
Table 5.2 Comparison of Collection Implementations
Storage Structure | ArrayCollection Unsorted array |
SortedArrayCollection Sorted array |
LinkedCollection Unsorted linked list |
---|---|---|---|
Space | Bounded by original capacity | Unbounded—invokes enlarge method as needed |
Unbounded—grows and shrinks |
Class constructor | O(N) | O(N) | O(1) |
size isEmpty isFull |
O(1) | O(1) | O(1) |
contains |
O(N) | O(log2N) | O(N) |
get |
O(N) | O(log2N) | O(N) |
add |
O(1) | O(N) | O(1) |
remove |
O(N) | O(N) | O(N) |
The SortedArrayCollection
is also an array-based implementation; however, in this case we decided to make it an unbounded array-based implementation. If an add
operation is invoked when the array is full then additional space is created using the enlarge
method. This approach relieves the client of the responsibility of predetermining the needed capacity. Keeping the internal array sorted requires a more complex add
method with a cost of O(N). However, it also means that the cost of locating a target element is now O(log2N) as we can use the binary search algorithm. This significantly lowers the time cost of both contains
and get
.
Capacity is not a concern for the LinkedCollection
. In this unbounded implementation the amount of space used to hold the collection grows and shrinks along with the collection. As we pointed out when discussing linked implementations previously, we do require two reference variables for each element in the collection—one to hold the link and one to reference the collection element. The constructor for the LinkedCollection
simply initializes the instance variables. Other than the constructor, the efficiency of the unsorted linked list approach is the same as that of the unsorted array approach. For both approaches it makes no difference where we add an element so the add
method is fast. On the other hand, searching for a target element requires traversing the collection element by element.
If you reflect on our Collection ADT you will realize it offers simple but crucial functionality–the ability to store and retrieve information. This functionality sits at the heart of information processing. Data structures, file systems, memory/storage, databases, the Cloud, the Internet all involve, at their core, storing and retrieving information.
Given the central importance of collections, this section about variations could go on and on! The lists, search trees, maps, hash tables, and priority queues we study in the upcoming chapters can all be considered forms of collections. Even the stacks and queues that we covered previously are collections, albeit collections with specific rules about addition and removal of elements. We will first briefly review the role of collections within the Java library. We then take a look at two commonly used collections not covered elsewhere in the text—bags and sets.
It is not surprising that the Java platform includes a "Collections Framework"—a unified architecture of classes and interfaces that provides powerful, flexible tools for storing and retrieving information. At the center of the framework is the Collection
interface, found in the java.util
package of the library. This interface supports 11 subinterfaces including Deque
, List
, and Set
and has 33 implementing classes including the somewhat familiar ArrayList
, LinkedList
, Stack
, and PriorityQueue
and exotic-sounding classes such as BeanContextServicesSupport
and RoleUnresolvedList
.
There are 15 abstract methods defined within the Java Collection
interface, including methods that mirror our isEmpty
, size
, add
, remove
, and contains
methods. Note, however, that the interface does not define anything analogous to our get
method. So how is one supposed to retrieve information from a collection? The answer is that the designers of the Collections Framework do not intend for anyone to directly implement the Collection interface. Instead it is used as the root of an inheritance tree of interfaces as previously mentioned, and it is those interfaces that are implemented. Various methods similar to our get
method are defined within those interfaces.
It is instructive to look at some of the additional method definitions from the Collection
interface:
toArray |
Returns an array containing all of the elements of the collection |
clear |
Removes all elements |
equals |
Takes an Object argument, returning true if it is equal to the current collection and false otherwise |
addAll |
Takes a Collection argument and adds its contents to the current collection; returns a boolean indicating success or failure |
retainAll |
Takes a Collection argument and removes any elements from the current collection that are not in the argument collection; returns a boolean indicating success or failure |
removeAll |
Takes a Collection argument and removes any elements from the current collection that are also in the argument collection |
Methods such as equals
, addAll
, retainAll
, and removeAll
allow clients to use collections as more than just separate repositories—collections can be combined and compared. The Set ADT we define below is a collection intended to be used in that fashion.
As mentioned in Chapter 2, in this text we do not cover the Collections Framework in great detail. This text is meant to teach you about the fundamental nature of data structures and to demonstrate how to define, implement, and use them. It is not an exploration of how to use Java’s specific library architecture of similar structures. If you are interested in learning more about the Java Collections Framework, you can study the extensive documentation available at Oracle’s website.
We want to define an ADT that fits our everyday notion of a bag—a bag of marbles, a bag of groceries, a bag of money! What assumptions should we make about our bag? What operations should we include?
Think about a bag of marbles. Suppose marbles are identified by their color. It certainly is possible to place more than one red marble into our bag—we decide that our Bag ADT should allow duplicate elements.
We can put marbles into the bag, look into the bag to see if it contains a particular marble (“is there a green marble in there?”) and reach in and remove a specific marble. These activities correspond to our Collection ADT’s add
, contains
, and remove
operations. We can check to see if the bag is full or is empty and we could look inside and count how many marbles it contains—our isFull
, isEmpty
, and size
operations. With a bit of a stretch of imagination we can think about reaching into the bag and pointing to a marble (which is essentially what our get
operation does). In short, all of the operations we defined for our Collection ADT also make sense for our Bag ADT.
What other operations can we include? We could close our eyes and reach into the bag and draw out a random marble (“I hope I get the blue one”). We could peek into the bag and count all marbles of a specified color (“How many red ones are there?”) or remove all marbles of a specified color (“I don’t like the purple ones”). We could turn the bag upside down and empty it (“Watch out, here they come!”). We will add grab
, count
, removeAll
, and clear
to our Bag operations. Their formal definitions are included in our BagInterface
listed below. As with our Collection ADT, our Bag ADT requires a valid equals
method to be defined for the objects that it contains.
//--------------------------------------------------------------------------- // BagInterface.java by Dale/Joyce/Weems Chapter 5 // // Interface for a class that implements a bag of T. // A bag is a collection that supports a few extra operations. //--------------------------------------------------------------------------- package ch05.collections; public interface BagInterface<T> extends CollectionInterface<T> { T grab(); // If this bag is not empty, removes and returns a random element of the // bag; otherwise returns null. int count(T target); // Returns a count of all elements e in this collection such that // e.equals(target). int removeAll(T target); // Removes all elements e from this collection such that e.equals(target) // and returns the number of elements removed. void clear(); // Empties this bag so that it contains zero elements. }
As you can see, the BagInterface
is located in the ch05.collections
package. The interface extends our CollectionInterface
. Any class that implements BagInterface
must provide concrete methods for the add
, get
, contains
, remove
, isFull
, isEmpty
, and size
operations of the CollectionInterface
in addition to the grab
, count
, removeAll
, and clear
operations that are explicitly listed in the interface.
The Bag ADT is suited for use by applications that need to be able to count specific elements in a collection or remove all of a specific element from a collection—perhaps applications that manage inventories or allow distribution of resources. Additionally, the unique grab
operation allows the Bag ADT to be used for applications that require randomness, for example, games or tutorial programs. Implementation of the Bag ADT is left as an exercise.
A feature of each of our collection ADTs so far is that they allow duplicate elements. If we change our approach, that is, if we disallow duplicate elements, we have a collection commonly known as a Set. The Set ADT models the mathematical set that is typically defined as a collection of distinct objects.
We can implement a Set class by copying and changing the code from one of our collection implementations—the only method we need to change is the add
method. The new add
method could be designed to check if the element
argument is not already in the collection, and if not it would add the element
and return true
. Otherwise, of course, it returns false
.
Instead of copying the code from one of our implementations is there another way we can reuse it? Yes, in fact, there are two common approaches. We can extend the implementation class or we can wrap an object of the class within the new implementation. Let us investigate each approach.
The BasicSet1
class below extends the LinkedCollection
. Therefore, it inherits all of the methods of the LinkedCollection
. To ensure that duplicate elements are not entered into the collection we redefine the add
method as discussed above. Applications that instantiate a BasicSet1
object and use it to invoke the add
method will use the safe add
method, that prevents duplicate elements from being added, rather than the overridden method from the LinkedCollection
class, which allows duplicates.
//--------------------------------------------------------------------------- // BasicSet1.java by Dale/Joyce/Weems Chapter 5 // // Implements the CollectionInterface by extending LinkedCollection. // Overrides add method so that duplicate elements are not added. // // Null elements are not allowed. // One constructor is provided, one that creates an empty collection. //--------------------------------------------------------------------------- package ch05.collections; public class BasicSet1<T> extends LinkedCollection<T> implements CollectionInterface<T> { public BasicSet1() { super(); } @Override public boolean add(T element) // If element is not already contained in this collection adds element to // this collection and returns true; otherwise returns false. { if (!this.contains(element)) { super.add(element); return true; } else return false; } }
As you can see, the BasicSet1
class code is short and was easy to create owing to the effort already expended in the design and creation of the LinkedCollection
class.
A drawback of the above approach is that future changes to the LinkedCollection
implementation could invalidate the unique element provision of its BasicSet1
subclass. For example, suppose the LinkedCollection
was enhanced with the following method:
public void addAll(LinkedCollection elements)
// Adds all the contents of the elements collection to this collection.
Because BasicSet1
inherits from LinkedCollection,
this new method would be available to BasicSet1
objects. If the maintenance programmer does not update BasicSet1
to override addAll
, then the integrity of BasicSet1
would be compromised as it is possible that addAll
could be used to add duplicate elements.
Let us look at another reuse approach that does not suffer from this drawback. Rather than extending LinkedCollection
we can use a LinkedCollection
object inside our new basic set implementation. This approach is sometimes called “wrapping” because, similar to the Java wrapper classes such as Integer
, we hold an object of one type within an object of another type and delegate calls to it. Here is the code:
//--------------------------------------------------------------------------- // BasicSet2.java by Dale/Joyce/Weems Chapter 5 // // Implements the CollectionInterface by wrapping a LinkedCollection. // Ensures that duplicate elements are not added. // // Null elements are not allowed. // One constructor is provided, one that creates an empty collection. //--------------------------------------------------------------------------- package ch05.collections; public class BasicSet2<T> implements CollectionInterface<T> { LinkedCollection<T> set; public BasicSet2() { set = new LinkedCollection<T>(); } public boolean add(T element) // If element is not already contained in this collection adds element to // this collection and returns true; otherwise returns false. { if (!this.contains(element)) { set.add(element); return true; } else return false; } public int size(){return set.size();} public boolean contains (T target){return set.contains(target);} public boolean remove (T target){return set.remove(target);} public T get(T target){return set.get(target);} public boolean isEmpty(){return set.isEmpty();} public boolean isFull(){return set.isFull();} }
The only instance variable in the BasicSet2
class is the variable set
of type LinkedCollection
. In the case of every method provided, except for the add
method, the method just invokes the corresponding method of LinkedCollection
through the set
variable. In the case of add
the corresponding LinkedCollection add
is also invoked but only if it is safe to do so. Although BasicSet2
is longer than BasicSet1
, it does not require more work to create since the only method requiring extra code is the add
method—as was also the case for BasicSet1
. Furthermore, BasicSet2
does not suffer from the maintenance issue discussed above for BasicSet1
. Our BasicSet2
class is a good example of a hierarchical implementation. It is built on top of the LinkedCollection
class that in turn uses the LLNode
class.
It is possible to define a more complex Set ADT, one that supports the set operations we learned in mathematics classes such as union and intersection. Exercise 36 asks the reader to define and implement such an ADT.
We introduced the concept of a collection—an object that collects together other objects for later retrieval. The Collection ADT acts as the basis for many other structures. Following our standard approach, we defined the ADT using the Java interface
construct.
Three separate implementations of the Collection ADT were developed—an unsorted array, a sorted array, and a linked list. Figure 5.4 shows the relationships among the primary classes and interfaces we created to support this ADT. The diagram follows the same conventions we listed on page 146 for the Stack ADT diagram.
All collection implementations require checking elements for equality, and the sorted array implementation requires the ability to determine the relative order of elements. Therefore, we reviewed comparison of objects, considering both the equals
and compareTo
methods.
The primary application developed in this chapter to demonstrate the usefulness of the Collection ADT was a vocabulary density calculator. Using both the unsorted and sorted array-based implementations of the collection allowed us to demonstrate the difference in efficiency between the two approaches when analyzing large text files.
In Section 5.7, “Collection Variations,” we reviewed the Java Library Collections Framework and defined two specific collection variants: the Bag and the Set. Implementations for a basic Set ADT using two approaches were developed, each dependent on reusing the LinkedCollection
class. One approach demonstrated reuse through inheritance and the other demonstrated reuse by wrapping.
For each of the methods declared in CollectionInterface,
identify what type of operation it is (observer, transformer, or both).
CollectionInterface
defines seven abstract methods. Define at least three more operations that might be useful for a Collection ADT to provide.
Identify which attributes would make a good key for determining equality of objects for each of the following classes and explain your reasoning:
A Golfer class that supports finding the latest score of your favorite golfer.
A Student class for use by a university accounting office that includes name, ID number, and payment attributes.
A Rectangle class with attributes length, width, area, and perimeter.
A Book class for an online store that includes book name, author name, edition number, cost, and number of pages attributes.
Discuss the ramification of dropping the “null
elements are not allowed” general precondition of CollectionInterface
.
Discuss the difference between an operation that pushes an element onto a full (bounded) stack as defined by our StackInterface
(Section 2.4, “The Stack Interface”) and one that adds an element to a full collection.
Show the values contained in the instance variables elements
and numElements
of the sample
collection after the following sequence of operations:
ArrayCollection<String> sample = new ArrayCollection<String>; sample.add("A"); sample.add("B"); sample.add("C"); sample.add("D"); sample.remove("B");
Describe the ramifications of each of the following changes to ArrayCollection
:
the first three lines of the add
method are deleted
the numElements++
statement is deleted from the add
method
in the find method “protected” is changed to “public”
in the find method “<” is changed to “<=”
the elements[numElements – 1] = null
statement is deleted from the remove
method
the “==
” in the isEmpty method is changed to “=
”
in the get method, “found
” is changed to “!found
”
Add the following methods to the ArrayCollection
class, and create a test driver for each to show that they work correctly. In order to practice your array coding skills, code each of these methods by accessing the internal variables of the ArrayCollection
, not by calling the previously defined public methods of the class.
String toString()
creates and returns a string that correctly represents the current collection. Such a method could prove useful for testing and debugging the class and for testing and debugging applications that use the class. Assume each stored element already provides its own reasonable toString
method.
int count(T target)
returns a count of the number of elements e
in the collection such that e.equals(target)
is true
.
void removeAll(T target)
removes all elements e
from the collection such that e.equals(target)
is true
.
ArrayCollection<T> combine(ArrayCollection<T> other)
creates and returns a new ArrayCollection
object that is a combination of this
object and the argument object.
What would be the result of using the ArrayCollection
class to hold objects of a class that has not overridden the equals
method of the Object
class?
The Vocabulary Density application allows you to analyze text, reporting on the total number of words, the number of unique words, and the vocabulary density. Use it to analyze one or more texts of your choice. Create a report on the results. This is a wide open exercise—we leave it up to you to figure out something interesting to analyze.
Study the Vocabulary Density application and answer the following questions:
How is the name and location of the input file provided to the program?
What happens if the input text contains more than CAPACITY
unique words?
What would be the effect of removing the ’ (apostrophe) from the delimiters definition?
What is the effect of removing the “word = word.toLowerCase()
” statement?
Revise the Vocabulary Density application so that
if the collection becomes full, then the application no longer continues processing—instead it displays a suitable message and then ends.
it includes a constant THRESHOLD
of type int
and ignores words whose length is less than the threshold value—thus the word analysis would not include “short” words.
it permits multiple filenames to be passed as command line arguments, and will proceed to separately analyze and report on each of the files.
expand on the previous revision so that in addition to the separate analysis of the files it also performs a combined analysis, as if the combined files all represented a single text.
The file Animals.txt
found in the input
folder contains a long list of animal names, one per line. Create an application that reads that file and creates a collection of animal names. Use the ArrayCollection
class. Your application should then generate a random character and challenge the user to repeatedly enter an animal name that begins with that character, reading the names entered by the user until they either enter a name that does not begin with the required character or is not in the collection, or they enter a name they used before. Finally, your application reports how many names they successfully entered.
The file Keywords.txt
found in the input
folder contains all the Java keywords. Create an application that accepts the name of a Java program file as a command line argument and displays a count of the total number of keywords the program contains. For example, if you use the VocabularyDensity.java
program as your input, the application should display
VocabularyDensity.java contains 24 Java keywords
As part of your solution you should create a collection of keywords using the information in the Keywords.txt
file. Do not worry about the fact that you might be counting keywords contained within comments or strings.
Based on the equals
method for Circle
objects defined in Section 5.4, “Comparing Objects Revisited,” what is the output of the following code sequence?
Circle c1 = new Circle(5); Circle c2 = new Circle(5); Circle c3 = new Circle(15); Circle c4 = null; System.out.println(c1 == c1); System.out.println(c1 == c2); System.out.println(c1 == c3); System.out.println(c1 == c4); System.out.println(c1.equals(c1)); System.out.println(c1.equals(c2)); System.out.println(c1.equals(c3)); System.out.println(c1.equals(c4));
An equals
method is supposed to provide an equivalence relation among the objects of a class. This means that if a, b, and c are non-null objects of the class, then
a.equals(a)
is true.
a.equals(b)
has the same value as b.equals(a)
.
If a.equals(b)
is true and b.equals(c)
is true, then a.equals(c)
is true.
State whether the following definitions of equals
are valid. If they are not, explain why not.
Two circles are equal if they have the same area.
Two circles are equal if their radii are within 10% of each other.
Two integers are equal if they have the same remainder when divided by a specific integer, for example, when divided by 3.
Two integers are equal if the second integer is a multiple of the first.
Suppose we have a Rectangle
class that includes length
and width
attributes, of type int
, both set by the constructor. Define an equals
method for this class so that two rectangle objects are considered equal if
They have the exact same length
and width
.
They have the same dimensions—that is, they are congruent.
They have the same shape—that is, they are similar.
They have the same perimeter.
They have the same area.
Which of the definitions of equals
listed in Exercise 17 is valid, based on the criteria listed in Exercise 16?
Based on the compareTo
method for FamousPerson
objects defined in Section 5.4, “Comparing Objects Revisited,” what is the output of the following code sequence? Answers may include phrases such as “a negative integer.”
FamousPerson f1 = new FamousPerson("Peter","Parker",1962,"Spiderman"); FamousPerson f2 = new FamousPerson("Bonnie","Parker",1910,"Criminal"); FamousPerson f3 = new FamousPerson("Clark","Kent",1938,"Superman"); FamousPerson f4 = new FamousPerson("Clark","Kent",1938,"Reporter"); System.out.println(f1.compareTo(f1)); System.out.println(f1.compareTo(f2)); System.out.println(f2.compareTo(f1)); System.out.println(f1.compareTo(f3)); System.out.println(f3.compareTo(f4)); System.out.println(f4.compareTo(f3));
In Exercise 16 we stated some rules for the behavior of the equals
method. What similar rule or rules should the compareTo
method follow?
Suppose we have a Rectangle
class that includes length
and width
attributes of type int
, both set by the constructor. Create a compareTo
method for this class so that rectangle objects are ordered based on their
Perimeter.
Area.
How many “known” classes in the Java library implement the Comparable
interface? List five such classes that you have used before, or at least have studied.
Show the values contained in the instance variables elements
and numElements
of the sample
collection after the following sequence of operations:
SortedArrayCollection<String> sample = new SortedArrayCollection<String>; sample.add("A"); sample.add("D"); sample.add("B"); sample.add("C"); sample.remove("B");
Fill in the following table with the order of growth of the indicated operations for the given implementation approach—assume the collection holds N elements:
operation | ArrayCollection |
SortedArrayCollection add |
get |
||
contains |
||
remove |
||
isEmpty |
Describe the ramifications of each of the following changes to SortedArrayCollection
:
in the enlarge
method "origCap
" is changed to "DEFCAP
"
the first two statements are deleted from the find
method
the if (result > 0) location++
statement is deleted from the recFind
method
Add the following methods to the SortedArrayCollection
class, and create a test driver for each to show that they work correctly. Code each of these methods by accessing the internal variables of the SortedArrayCollection
, not by calling the previously defined methods of the class.
String toString()
creates and returns a string that correctly represents the current collection. Such a method could prove useful for testing and debugging the class and for testing and debugging applications that use the class. Assume each stored element already provides its own reasonable toString
method.
T smallest()
returns null
if the collection is empty, otherwise returns the smallest element of the collection.
int greater(T element)
returns a count of the number of elements e
in the collection that are greater than element
, that is such that e.compareTo (element)
is > 0.
SortedArrayCollection<T> combine(SortedArrayCollection<T> other)
creates and returns a new SortedArrayCollection
object that is a combination of this
object and the argument object.
Describe the functional difference between the following two sections of code:
CollectionInterface<String> c = new ArrayCollection<String>(10); c.add("Tom"); c.add("Julie"); c.add("Molly"); System.out.println(c.contains("Kathy"); CollectionInterface<String> c = new SortedArrayCollection<String>(10); c.add("Tom"); c.add("Julie"); c.add("Molly"); System.out.println(c.contains("Kathy");
Show the values contained in the instance variables head
and numElements
of the sample
collection after the following sequence of operations:
LinkedCollection<String> sample = new LinkedCollection<String>; sample.add("A"); sample.add("B"); sample.add("C"); sample.add("D"); sample.remove("B");
Describe the ramifications of each of the following changes to LinkedCollection
:
the statement newNode.setLink(head)
is deleted from the add
method
the statement location = location.getLink()
is deleted from the find
method
the statement head = head.getLink()
within the remove
method is changed to head = location.getLink()
Add the following methods to the LinkedCollection
class, and create a test driver for each to show that they work correctly. Code each of these methods by accessing the internal variables of the LinkedCollection
, not by calling the previously defined methods of the class.
String toString()
creates and returns a string that correctly represents the current collection. Such a method could prove useful for testing and debugging the class and for testing and debugging applications that use the class. Assume each stored element already provides its own reasonable toString
method.
int count(T target)
returns a count of the number of elements e
in the collection such that e.equals(target)
is true
.
void removeAll(T target)
removes all elements e
from the collection such that e.equals(target)
is true
.
LinkedCollection<T> combine(LinkedCollection<T> other)
creates and returns a new SortedArrayCollection
object that is a combination of this
object and the argument object.
Create a new collection class named SortedLinkedCollection
that implements a collection using a sorted linked list. Include a toString
method as described in Exercise 30a. Include a test driver application that demonstrates your class works correctly.
Recreate the contents of Table 5.2 (no peeking).
Create a class that implements the BagInterface
, plus a test driver that shows that your class operates correctly:
Using an unsorted array (if you like, you can extend the ArrayCollection
class)
Using a sorted array (if you like, you can extend the SortedArrayCollection
class)
Using a linked list (if you like, you can extend the LinkedCollection
class)
Create an application that creates a Bag of FamousPerson
using one of your implementation classes from Exercise 33. Your application should use the information in the input/FamousCS.txt
file (see Section 5.4, “Comparing Objects Revisited”) to create the Bag. Next your application will repeat the following five times:
selects a random “famous person” from the bag
tells the user the year of birth and the fact about the person
asks the user to enter the last name of the famous person and reads their response
display all the information about the famous person
Finally, your application tells the user how many they “got right” out of the five chances, that is, how many famous people they correctly identified.
Describe the three approaches discussed in the section to reuse the LinkedCollection
class in the creation of a Basic Set ADT implementation.
An Advanced Set includes all the operations of a Basic Set plus operations for the union, intersection, and difference of sets.
Define an Advanced Set interface.
Implement the Advanced Set using an unsorted array; include a test driver that demonstrates your implementation works correctly.
Implement the Advanced Set using a sorted array; include a test driver that demonstrates your implementation works correctly.
Implement the Advanced Set using a linked list; include a test driver that demonstrates your implementation works correctly.
3.133.124.21