Lesson 9. Maps and Equality

In this lesson you will learn more about the hash table data structure classes in Java. You will also learn about the fundamental concept of equality. Understanding these two fundamental, related topics is critical to your ability to master Java. Yet many developers with years of Java experience do not fully understand these concepts. Insidious defects and severe performance issues can arise from this lack of education. Before we start on these important topics, however, you'll learn about logical operators, an equally important topic.

You will learn about:

• logical operators

• hash tables

• equality

toString

Map andSet implementations

Strings and equality

Logical Operators

Suppose you want to execute a line of code only if two separate conditions both hold true. You can use cascading if statements to guard that line of code:

if (isFullTime)
   if (isInState)
      rate *= 0.9;

Using logical operators, you can combine multiple conditionals into a single complex boolean expression:

if (isFullTime && isInState)
   rate *= 0.9;

The double ampersands symbol (&&) is the logical and operator. It is a binary operator—it separates two boolean expressions—the operands. If the result of both operands is true, the result of the entire conditional is true. In any other case, the result of the entire conditional is false. Here is the test-driven truth table (TDTT1) that represents the possible combinations of the logical and operator:

1 A fabricated but cute acronym.

assertTrue(true && true);
assertFalse(true && false);
assertFalse(false && true);
assertFalse(false && false);

The logical or operator (||), also a binary operator, returns true if either one of the two operands returns true or if both operands return true. The logical or operator returns false only if both operands are false. The TDTT for logical or:

assertTrue(true || true);
assertTrue(true || false);
assertTrue(false || true);
assertFalse(false || false);

The logical not operator (!) is a unary operator that reverse the boolean value of an expression. If the source expression is true, the not operator results in the entire expression returning false, and vice versa. The TDTT:

assertFalse(!true);
assertTrue(!false);

The logical exclusive-or (xor) operator (^) is a less frequently used binary operator that returns false if both operands result in the same boolean value. The TDTT:

assertFalse(true ^ true);
assertTrue(true ^ false);
assertTrue(false ^ true);
assertFalse(false ^ false);

You can combine multiple logical operators. If you do not provide parentheses, the order of precedence is !, then &&, then ^^, then ||.

Short-Circuiting

The && and || logical operators just shown are known as short-circuited logical operators. When the Java VM executes an expression using a short-circuited logical operator, it evaluates the left operand (the expression to the left of the operator) of the expression first. The result of the left operand may immediately determine the result of the entire expression.

In the case of or (||), if the left operand is true, there is no need to evaluate the right operand. The entire expression will always be true. The Java VM saves time by executing only half of the expression. With respect to and (&&), if the left operand is false, the entire expression will always be false. The Java VM will not evaluate the right operand.

You may have a rare need to always execute both sides of the expression. The code on the right-hand side may do something that you need to always happen. This is poor design. Lesson 4 points out that methods should either effect actions or return information, but not both. Likewise, operands should be viewed as atomic operations that are either commands or queries. You should be able to restructure the code so that the action occurs before executing the complex conditional.

If you still feel compelled to always execute both operands, you can use non-short-circuited logical operators. The non-short-circuited and operator is a single ampersand (&). The non-short-circuited or operator is a single pipe (|).

Note that xor (^) is always non-short-circuited. Java cannot discern the result of an xor expression from only one operand. Refer to the TDTT above to see why.

Hash Tables

image

You need to create a student directory to contain students enrolled in the university. The system must allow each student to be assigned a unique identifier, or ID. The system must allow you to later retrieve each student from the directory using its ID.

To build this story, you will create a StudentDirectory class. While you could use an ArrayList to store all students, you will instead use a Map. You learned a bit about maps in Lesson 6. As a reminder, Map is the interface for a data structure that allows elements to be stored and received based on a key. A map is a collection of key-value pairs. You store, or put, an element at a certain key in a map. You later retrieve, or get, that element by supplying the proper key.

In Lesson 6, you learned about the EnumMap implementation of the Map interface. If the keys in a Map are of enum type, you should use an EnumMap. Otherwise you will usually want to use the workhorse implementation of Map, java.util.HashMap. HashMap is a general-use map based on a data structure known as a hash table. Most Map code you encounter will use the HashMap implementation (or the legacy Hashtable implementation—see Lesson 7).

A hash table provides a specific way to implement a map. It is designed for rapid insertion and retrieval.

To build the student directory, start with the test:

image

image

The test, testStoreAndRetrieve, uses a for loop to construct and insert a number of students into the StudentDirectory. The test verifies, again using a for loop, that each student can be found in the directory using the Student directory method findById.

The StudentDirectory code is brief.

image

A StudentDirectory object encapsulates a HashMap, students. When client code invokes the add method, the code in add inserts the student argument into the HashMap. The key for the inserted student is the student's ID.

The ID must be unique. If you put a new value at an ID for which an entry already exists, the old value is replaced.

Later in this lesson, you will revisit hash tables. First, however, you will learn about equality. And before you can learn about equality, you have a story to implement.

Courses

image

A Session represents the teaching of a course for a given start date. Most courses are taught once per term. Each course may thus be represented by many Session instances.2

2 This is a small school, so we're ignoring the possibility that there may need to be two separate sections of a session in order to accommodate a large number of students.

Currently the Session class contains the department and number information for the associated course, as well as its number of credits. If there can be multiple sessions of a course, this implementation is inefficient and troublesome, since the course information will be repeated throughout each Session object.

What you want instead is a way of tying many session objects back to a single Course object. The UML diagram in Figure 9.1 shows this many-to-one relationship from Session to a new class named Course.

Figure 9.1. A Course Has Many Sessions

image

You will refactor your code to the above relationship. Start by creating a simple Course class that captures the department, number, and number of credits. A simple test declares that department and number are required to create a Course. The department and number combined represent the unique key for a Course. Two separate courses cannot have the same department and number.

package sis.studentinfo;

import junit.framework.*;

public class CourseTest extends TestCase {
   public void testCreate() {
      Course course = new Course("CMSC", "120");
      assertEquals("CMSC", course.getDepartment());
      assertEquals("120", course.getNumber());
   }
}

Course starts out as a simple data object. It contains a constructor and two getters for the key fields.

package sis.studentinfo;

public class Course {
   private String department;
   private String number;

   public Course(String department, String number) {
      this.department = department;
      this.number = number;
   }

   public String getDepartment() {
      return department;
   }

   public String getNumber() {
      return number;
   }
}

Refactoring Session

Here's an overview of the steps you'll undertake in this refactoring:

  1. Construct session objects using a Course object. Impacts: SessionTest, Session, CourseSessionTest, SummerCourseSessionTest, RosterReporterTest, CourseReportTest.
  2. Change the creation methods in CourseSession and SummerCourse-Session to take a Course as a parameter.
  3. Modify CourseSession and SummerCourseSession constructors to take a Course.
  4. Modify the Session constructor to take a Course.
  5. Store a Course reference in Session instead of department and number strings.

You will make these changes incrementally. With each change, you'll use the compiler to help you find problems, then run your tests to ensure that you didn't break anything.

Starting in SessionTest:

Currently, the abstract method createSession constructs Session objects using department and number strings:

abstract protected Session createSession(
   String department, String number, Date startDate);

You want to change this creation method to reflect the need to construct Session objects using a Course object:

abstract public class SessionTest extends TestCase {
   ...
   abstract protected Session createSession(
      Course course, Date startDate);

This change will necessitate quite a few other changes, but overall it shouldn't take more than 5 minutes to make them. In SessionTest, you will need to change both the setUp method and testComparable. Here's the existing setUp method in SessionTest (I've boldfaced the code that will need to change):

public void setUp() {
   startDate = createDate(2003, 1, 6);
   session = createSession("ENGL", "101", startDate);
   session.setNumberOfCredits(CREDITS);
}

I've shown the change to setUp here; changes to testComparable are similar:

protected void setUp() {
   startDate = new Date();
   session = createSession(new Course("ENGL", "101"), startDate);
   session.setNumberOfCredits(CREDITS);
}

Use the compiler to guide you through making these changes. As you fix one problem, the compiler can take you to the next problem to fix.

You will need to change both SummerCourseSessionTest and CourseSessionTest. In the interest of incrementalism, try to make the smallest amount of changes that will get you to a green bar. You can modify both CourseSessionTest and SummerCourseSessionTest without having to touch CourseSession or SummerCourseSession.

Here are the changes to the tests. This time I've included the old lines of code from the previous version of the code and struck them out so you could see the difference line to line. I'll use this approach for the remainder of the refactoring.

image

image

Tests should pass at this point. Now modify the Session subclass creation methods to take a Course directly.

image

This change impacts RosterReporterTest:

image

And also CourseReportTest3:

3 I've taken some liberties by updating both RosterReporterTest and CourseReportTest with Java features you have learned since originally coding them. For example, the code now assigns a CourseSession object to a Session reference. Doing so will also force you to change to RosterReporter and CourseReport.

image

image

Change the creation methods in CourseSession and SummerCourseSession to take a Course as a parameter, but—for the moment—continue to pass along the department and number to the constructor.

image

Tests pass. Now push the course into the constructor and rerun the tests.

image

Tedious? Perhaps. Simple? Each one of the steps in this refactoring is very basic and takes only seconds to execute. Safe? Extremely. The overall time expenditure should only be a few minutes before everything is working. You get to see a green bar several times during the gradual code transformation. The foremost authority on refactoring, Martin Fowler, suggests that you can never run your tests too frequently.4

4 [Fowler2000], p. 94.

The alternative would be to make the changes all at once and hope they work. You might save a couple of minutes, but there is a likely possibility that you will make a mistake. Correcting the mistake will take you more time than you saved.

You're almost there. Push the Course object into the superclass Session constructor.

image

Finally, you can change the Session class to store a Course reference instead of the separate department and number String references.

image

Equality

image

In an upcoming lesson, you will build a course catalog. One requirement for the course catalog will be to ensure that duplicate courses are not added to the catalog. Courses are semantically equal if they have the same department and course number.

Now that you have a functional Course class, you will want to override the equals method to supply a semantic definition for equality. If you have two Course objects and the department and number fields in both objects contain the same (corresponding) String, comparing the two should return true.

// in CourseTest.java:
public void testEquality() {
   Course courseA = new Course("NURS", "201");
   Course courseAPrime = new Course("NURS", "201");
   assertEquals(courseA, courseAPrime);
}

This test fails. Remember from Lesson 1 that the object courseA is a separate object in memory, distinct from courseAPrime. The assertEquals method translates roughly5 to the underlying lines of code:

5 The actual code is slightly more involved.

if (!courseA.equals(courseAPrime))
   fail();

In other words, the comparison between two objects in assertEquals uses the equals method defined on the receiver. The receiver is the first parameter, which JUnit refers to as the expected value. The receiver in this example is courseA.

You have not yet defined an equals method in Course. Java therefore searches up the hierarchy chain to find one. The only superclass of Course is the implicit superclass Object. In Object you will find the default definition of equals:

package java.lang;
public class Object {
   // ...
   public boolean equals(Object obj) {
      return (this == obj);
   }
   // ...
}

The equals implementation in Object compares the reference—the memory location—of the receiver to the reference of the object being passed in.

image

If you do not provide a definition for equals in your subclass, you get the default definition, memory equivalence.

Memory equivalence exists if two references point to the exact same object in memory.

You will want to override the equals method in the Course object. The Course definition of equals will override the Object definition of equals. The equals method should return true if the receiver object is semantically equal to the parameter object. For Course, equals should return true if the department and course number of the receiver is equal to that of the parameter Course.

Let's code this slowly, one step at a time. For now, code the equals method in Course to most simply pass the test:

@Override
public boolean equals(Object object) {
   return true;
}

The equals method must match the signature that assertEquals expects. The assertEquals method expects to call an equals method that takes an Object as parameter. If you were to specify that equals took a Course object as parameter, the assertEquals method would not find it:

@Override
public boolean equals(Course course) {// THIS WILL NOT WORK
   return true;
}

The assertEquals method would instead invoke the Object definition of equals.

Add a comparison to the test that should fail (since equals always returns true):

image

Once you see that this test fails, update your equals method:

image

You must first cast the object argument to a Course reference so that you can refer to its instance variables. The local variable name you give to the argument helps differentiate between the receiver (this) and the argument (that). The return statement compares this Course's department and number to the department and number of that Course. If both (&&) are equal, the equals method returns true.

The Contract for Equality

The Java API documentation for the equals method in Object provides a list of what defines an equivalence relation between two objects:

image

The above contract should hold true for all non-null values of x and y. You can implement each of these rules in your unit test for equality.6

6 A nice set of JUnit extensions, located at http://sourceforge.net/projects/junit-addons, provides an automated means of testing the equality contract.

image

When you run your tests, everything will pass but the final assertion in testEquality. You will receive a NullPointerException: The that argument is null and the equals code attempts to access its fields (department and object). You can add a guard clause to return false if the argument is null:

@Override
public boolean equals(Object object) {
   if (object == null)
      return false;
   Course that = (Course)object;
   return
      this.department.equals(that.department) &&
      this.number.equals(that.number);
}

You might choose to represent each of the “qualities of equality” (reflexivity, transitivity, symmetry, consistency, and comparison to null) in a separate test. There are differing views on approaches to TDD. Some developers feel that you should structure your code so that each test contains a single assertion.7 I take the viewpoint that the goal of a test is to prove a single piece of functionality. Doing so may require many assertions/postconditions. In this example, the many assertions you just coded represent a complete contract for a single piece of functionality—equality.

7 [Astels2004].

Apples and Oranges

Since the equals method takes an Object as parameter, you can pass anything to it (and the code will still compile). Your equals method should handle this situation:

// apples & oranges
assertFalse(courseA.equals("CMSC-120"));

Even though the String value matches the department and number of courseA, a Course is not a String. You would expect this comparison to return false. When Java executes the equals method, however, you receive a ClassCastException. The line of the equals method that executes the cast is the cause:

@Override
public boolean equals(Object object) {
   if (object == null)
      return false;
   Course that = (Course)object;
   return
      this.department.equals(that.department) &&
      this.number.equals(that.number);
}

The code attempts to cast the String parameter to a Course reference. This invalid cast generates the ClassCastException.

You need a guard clause that immediately returns false if someone throws an orange at your equals method. The guard clause ensures that the type of the parameter matches the type of the receiver.

@Override
public boolean equals(Object object) {
   if (object == null)
      return false;
   if (this.getClass() != object.getClass())
      return false;
   Course that = (Course)object;
   return
      this.department.equals(that.department) &&
      this.number.equals(that.number);
}

You learned about the getClass method in Lesson 8. It returns a class constant. For the example comparison that threw the ClassCastException, the class constant of the receiver is Course.class, and the class constant of the parameter is String.class. Class constants are unique—there is only one “instance” of the Class object Course.class. This uniqueness allows you to compare class constants using the operator != (not equal to) instead of comparing them using equals.

Some developers choose to use the instanceof operator. The instanceof operator returns true if an object is an instance of, or subclass of, the target class. Here is the equals method rewritten to use instanceof.

@Override
public boolean equals(Object object) {
   if (object == null)
      return false;
   if (!(object instanceof Course))
      return false;
   Course that = (Course)object;
   return
      this.department.equals(that.department) &&
      this.number.equals(that.number);
}

You should initially prefer to compare the classes of the receiver and argument, instead of using instanceof. The class comparison only returns true if both objects are of the exact same type. Introduce instanceof later if you need to compare objects irrespective of their position in an inheritance hierarchy. See Lesson 12 for more information on instanceof.

JUnit and Equality

Sometimes I will code equals tests to be a bit more explicit. Instead of saying:

assertEquals(courseA, courseB);

I'll express the comparison as:

assertTrue(courseA.equals(courseB));

You should be clear on the fact that assertEquals uses the equals method for reference comparisons. But it can be helpful to explicitly emphasize that fact, and show that the equals method is what you're actually testing.

You may want to compare two references to see if they are the same. In other words, do they point to the same object in memory? You could code this comparison as:

assertTrue(courseA == courseB);

or as:

assertSame(courseA, courseB);

You should prefer the second assertion form, since it supplies a better error message upon failure. As before, though, you might choose to explicitly show the == comparison if for purposes of building an equals method test.

Collections and Equality

The student information system will need to create a course catalog that contains all of the Course objects. You will thus need to be able to store a Course in a collection, then verify that the collection contains the object.

image

The contains method defined in ArrayList loops through all contained objects, comparing each to the parameter using the equals method. This test should pass with no further changes. In fact, the test for containment is more of a language test. It tests the functionality of ArrayList, by demonstrating that ArrayList uses the equals method for testing containment. The containment test does not add to the equality contract. Feel free to remove it.

What about using the Course as a key in a HashMap object?

image

The above bit of test demonstrates a Map method, containsKey. This method returns true if a matching key exists in the map. Even though a Map is a collection, just as a List is, this test fails! To understand why, you must understand how the HashMap class is implemented.

Hash Tables

An ArrayList encapsulates an array, a contiguous block of space in memory. Finding an object in an array requires serially traversing the array in order to find an element. The class java.util.HashMap is built around the hash table construct mentioned at the beginning of this lesson. It too is based on a contiguous block of space in memory.

I can pictorially represent a hash table as a bunch of slots (Figure 9.2).

Figure 9.2. A Hash Table with Ten Slots

image

When inserting an element into a hash table, the code rapidly determines a slot for the element by first asking it for its hash code. A hash code is simply an integer, ideally as unique as possible. The contract for hash code is based on the definition of equality for a class: If two objects are equal, their hash code should be equal. If two objects are not equal, then their hash codes are ideally (but not necessarily) unequal.

Once the hash code is available, a bit of arithmetic determines the slot:

hash code % table size = slot number

(Remember that the modulus operator, %, returns the remainder of the equivalent integer division.) For example, if you insert a Course object into a hash table of size 10 and the hash code value of the course is 65, it would go into the fifth slot:

65 % 10 = 5

Just as in an array, Java calculates the memory address for the start of a slot using the formula:

offset + (slot size * slot number)

Figure 9.3 shows a Course element slotted into a hash table.

Figure 9.3. Hash Table with Course in the Fifth Slot

image

The hash code is an int that is retrieved by sending the message hashCode to any Object. The class java.lang.Object supplies a default definition of hashCode. It returns a unique number based upon the memory address of the object.

To retrieve an object from a hash table, you recalculate the slot by asking for the object's hashCode again. You can then do the same offset calculation to immediately retrieve the object.

So what goes in a hashCode method? It must return an int, and as mentioned, two equal objects must return the same hashCode. The simplest, but worst, solution would be to return a constant, such as the number 1.

Collisions

The hash code for an object should be as unique as possible. If two unequal objects return the same hash code, they will resolve to the same slot in the hash table. This is known as a collision. The implication of a collision is extra logic and extra time to maintain a list of collided objects. A few solutions for resolving collisions exist. The simplest is manage a list of collided objects for each slot. Figure 9.4 shows a pictorial representation.

Figure 9.4. Hash Table Collisions

image

Since the possibility of collisions exists, when retrieving an object, the hash table code must also compare the object retrieved from a slot to ensure that it is the one expected. If not, the code must traverse the list of collided objects to find the match.

If all objects were to return the same hashCode, such as 1, then all objects will collide to the same slot in a HashMap. The end result would be that every insertion and deletion would require serially traversing the list of collided objects. This list would include every object inserted. In this degenerate case, you might as well use an ArrayList.

If the hash codes of all objects generate different slots, you have an optimally mapped hash table. Performance is the best possible: All insertions to and retrievals from the table execute in constant time. There is no need to serially traverse a collision list.

An Ideal Hash Algorithm

Java already provides a good hashCode implementation for the class String, which is more often than not the key type for a HashMap. The integral numeric wrapper classes (Integer, Long, Byte, Char, Short) are also often used as a HashMap key; their hash code is simply the number they wrap.

The fields that uniquely identify your object are typically one of these object types (String or numeric). To obtain a decent hash code that satisfies the equality condition, you can simply return the hash code of the unique key. If more than one field is used to represent the unique key, you will need to devise an algorithm to generate the hash code.

The following code, to be added to the Course class, generates a hashCode based on arithmetically combining the hashCodes of department and number. Hash code algorithms typically use prime numbers. The use of primes tends to give better distribution when combined with the modulus operation against the table size.

@Override
public int hashCode() {
   final int hashMultiplier = 41;
   int result = 7;
   result = result * hashMultiplier + department.hashCode();
   result = result * hashMultiplier + number.hashCode();
   return result;
}

The tests should pass once you implement this hashCode method.8 Separate the hash code tests into a new test method that will enforce the complete contract for hashCode.

8 This algorithm was obtained from the example at http://mindprod.com/jgloss/hashcode.html.

public void testHashCode() {
   Course courseA = new Course("NURS", "201");
   Course courseAPrime = new Course("NURS", "201");

   Map<Course, String> map = new HashMap<Course, String>();
   map.put(courseA, "");
   assertTrue(map.containsKey(courseAPrime));

   assertEquals(courseA.hashCode(), courseAPrime.hashCode());
   // consistency
   assertEquals(courseA.hashCode(), courseA.hashCode());
}

The test proves that two semantically equal courses produce the same hash code. It also demonstrates consistency. The return value from hashCode is the same (consistent) with each call to it, as long as the key values for the courses do not change.

The part of the test that demonstrates stuffing the Course into a hash map can be removed, since the contract assertions for hashCode are all that is required. The final set of equality and hash code contract tests:

image

These tests contain a good amount of code! Again, consider JUnit-addons (see footnote 6 above) as a way of automatically enforcing equality and hash code contracts. If you choose to not use JUnit-addons, make sure you refactor to eliminate duplication when writing these tests.

A Final Note on hashCode

image

If you code an equals method for a class, also code a hashCode method.

In the absence of hashCode, the compiler won't complain, and you may not encounter any problems. However, unexpected behavior will likely result if you put objects of that class into a hash table based collection (the class java.util.Set also uses a hash table implementation). Figuring out that there is a missing hashCode method can waste a considerable amount of time. Get in the habit of coding hashCode if you code equals!

If performance is a consideration, you may want to write unit tests for the hash code. One technique is to load a HashSet with a large number of elements, testing the performance to ensure that the insertions execute in a reasonable time. If not, the hash table probably managed a high number of collisions.

public void testHashCodePerformance() {
   final int count = 10000;
   long start = System.currentTimeMillis();
   Map<Course, String> map = new HashMap<Course, String>();
   for (int i = 0; i < count; i++) {
      Course course = new Course("C" + i, "" + i);
      map.put(course, "");
   }
   long stop = System.currentTimeMillis();
   long elapsed = stop - start;
   final long arbitraryThreshold = 200;
   assertTrue("elapsed time = " + elapsed,
      elapsed < arbitraryThreshold);
}

There are many ways to test performance, but the above test provides a simple, adequate mechanism. The class java.lang.System contains a static method, currentTimeMillis, that returns the milliseconds representing the current time stamp. Before starting, you store the current timestamp in a local variable (start). You execute the desired code n times using a for loop. When the loop completes, you capture the timestamp (stop). You obtain the elapsed time by subtracting the start time from the stop time. The assertion ensures that the loop completed in a reasonable amount of time (arbitraryThreshold).

The assertTrue method call contains a String as its first parameter. The String gets printed by JUnit as part of the error message only if the assertion fails. In this example, the failure will show how long the loop actually took to execute.

You may want to isolate performance tests in a separate suite. You will still want to run them regularly (perhaps daily, or after you have completed a significant task). The nature of performance tests makes them slow, however, so you may not want the constant negative impact to the performance of your functional unit test suite.

If you find you need more comprehensive unit testing, either build your own framework by eliminating duplication or consider a tool such as JunitPerf.9 JUnit itself contains some rudimentary hooks to help you with repetitive testing.

9 See http://www.clarkware.com/software/JUnitPerf.html.

To demonstrate the usefulness of the example performance test, change the hashCode implementation in Course to return a constant number:

@Override
public int hashCode() {
   return 1;
}

Even with only 10,000 entries into the HashMap, the test should fail miserably. Revert the hashCode to its previous form; the test should now pass.

Another technique for testing hashCode is to ensure that the variance that a number of hash codes produces is reasonably high. The variance is a measure of how spread out a distribution of numbers is. You calculate the variance as the average of the squared deviation of each number from the mean.

The problem with both of these testing techniques is that they are based on arbitrary measures of what is acceptable. As a developer, you can determine the value of this kind of testing and choose to defer it until you can pinpoint a true performance problem to the hash table.

Minimally, you should at least write enough tests to prove that the contract for hashCode holds true.

More on Using HashMaps

Like a list, you will often want to iterate through the contents of a hash table. Your needs may include iterating through only the keys, through only the values or through both simultaneously.

In Lesson 6, you built a ReportCard class. You used ReportCard to store a map of grades to corresponding messages for printing on a report card. As a reminder, here is the source for ReportCard:

image

The following three new tests for ReportCard demonstrate how to iterate through each of the three possibilities. (These are unnecessary tests, and act more like language tests.) Each of the three tests show different but valid testing approaches.

image

The first test, testKeys, provides one technique for ensuring that you iterate through all expected values in a collection. You first create a set and populating it with the objects (grades in this example) you expect to receive. You then create a secondary set and add to it each element that you iterate through. After iteration, you compare the two sets to ensure that they are equal.

Perhaps you remember sets from grade school. (Remember Venn diagrams?) If not, no big deal. A set is an unordered collection that contains unique elements. If you attempt to add a duplicate element to a set, the set rejects it. In Java, the java.util.Set interface declares the behavior of a set. The class java.util.HashSet is the workhorse implementation of the Set interface. The HashSet class compares objects using equals to determine if a candidate object is a duplicate.

Based on your understanding of how a hash table works, it should be clear that you cannot guarantee that its keys will appear in any specific order. It should also be clear that each key is unique. You cannot have two entries in a hash table with the same key. As such, when you send the message keySet to a HashMap object, it returns the keys to you as a set.

As shown in testKeys, you iterate a Set much as you iterate a List, using a for-each loop.

In contrast, the values in a hash table can be duplicates. For example, you might choose to print the same message (“get help”) for a student with a D or an F. A HashMap thus cannot return the values as a Set. Instead, it returns the values as a java.util.Collection.

Collection is an interface implemented by both HashSet and ArrayList. The Collection interface declares basic collection functionality, including the abilities to add objects to a collection and obtain an iterator from a collection. The Collection interface does not imply any order to a collection, so it does not declare methods related to indexed access as does the List interface. Both java.util.Set and java.util.List interfaces extend the Collection interface.

public void testValues() {
   List<String> expectedMessages = new ArrayList<String>();
   expectedMessages.add(ReportCard.A_MESSAGE);
   expectedMessages.add(ReportCard.B_MESSAGE);
   expectedMessages.add(ReportCard.C_MESSAGE);
   expectedMessages.add(ReportCard.D_MESSAGE);
   expectedMessages.add(ReportCard.F_MESSAGE);

   Collection<String> messages = card.getMessages().values();
   for (String message: messages)
      assertTrue(expectedMessages.contains(message));
   assertEquals(expectedMessages.size(), messages.size());
}

The test, testValues, uses a slightly different technique. You again create a Set to represent expected values, then add to it each possible value. You obtain the hash table values by sending the message values to the HashMap object.

As you iterate through the values, you verify that the list of expected values contains each element. The contains method uses the equals method to determine if the collection includes an object. Finally, you must also test the size of the secondary set to ensure that it does not contain more elements than expected.

The third test, testEntries, shows how you can iterate through the keys and associated values (the maps) simultaneously. If you send the message entrySet to a HashMap, it returns a Set of key-value pairs. You can iterate through this set using a for-each loop. For each pass through the loop you get a Map.Entry reference that stores a key-value pair. You can retrieve the key and value from a Map.Entry by using the methods getKey and getValue. The Map.Entry is bound to the same key type (Student.Grade in the example) and value type (String) as the HashMap object.

public void testEntries() {
   Set<Entry> entries = new HashSet<Entry>();

   for (Map.Entry<Student.Grade,String> entry:
         card.getMessages().entrySet())
      entries.add(
         new Entry(entry.getKey(), entry.getValue()));

   Set<Entry> expectedEntries = new HashSet<Entry>();
   expectedEntries.add(
      new Entry(Student.Grade.A, ReportCard.A_MESSAGE));
   expectedEntries.add(
      new Entry(Student.Grade.B, ReportCard.B_MESSAGE));
   expectedEntries.add(
      new Entry(Student.Grade.C, ReportCard.C_MESSAGE));
   expectedEntries.add(
      new Entry(Student.Grade.D, ReportCard.D_MESSAGE));
   expectedEntries.add(
      new Entry(Student.Grade.F, ReportCard.F_MESSAGE));

   assertEquals(expectedEntries, entries);
}

You need to test that the for-each loop iterates over all the entries. You can create a secondary set of expected entries, like in the previous tests, and compare this expected set to the set populated by iteration. To the expected set you need to add the appropriate key-value entries, but unfortunately, Map.Entry is an interface. No concrete class to store a key-value pair is available to you.

Instead, you will create your own Entry class to store a key-value pair. The process of iteration will instantiate Entry objects and add them to the “actuals” set. Also, you will populate the expected set with Entry objects. You can then compare the two sets to ensure that they contain equal Entry objects.

image

The class Entry is a quick-and-dirty class used solely for testing. I therefore chose to write no tests for it. The Entry class contains an equals method and its companion method hashCode. Since this is a quick-and-dirty test class, I might have chosen to omit the hashCode method, but a successful test requires it. Try running the tests without hashCode, and make sure you understand why the test fails without it.

Additional Hash Table and Set Implementations

The package java.util includes several additional classes that implement the Map and Set interfaces. I'll briefly overview some of these. Consult the Java API documentation for the package java.util for more details on their use.

EnumSet

You learned about EnumMap in Lesson 6. The EnumSet class works a little differently. It is defined as an abstract class. You create an instance of an (unnamed) EnumSet subclass by calling one of almost a dozen EnumSet class methods. I summarize these factory methods in Table 9.1.

Table 9.1. EnumSet Factory Methods

image

Using EnumSet, you can simplify the ReportCardTest method testKeys (above) as follows:

public void testKeys() {
   Set<Student.Grade> expectedGrades =
      EnumSet.allOf(Student.Grade.class);
   Set<Student.Grade> grades =
      EnumSet.noneOf(Student.Grade.class);
   for (Student.Grade grade: card.getMessages().keySet())
      grades.add(grade);
   assertEquals(expectedGrades, grades);
}

TreeSet and TreeMap

A TreeSet maintains the order of its elements based on their natural sorting order (that you define using Comparable) or a sort order you define using a Comparator object. A TreeMap maintains the order of its keys in a similar manner. As an example, you might choose to ensure that the report card messages stay ordered by student grade. You pay for this behavior. Instead of almost immediate insertion and retrieval time, as you expect when using a HashMap or HashSet, operation times increase logarithmically as the size of the collection increases.

LinkedHashSet and LinkedHashMap

A LinkedHashSet maintains the order of its elements chronologically: It remembers the order in which you inserted elements. It does so by maintaining a linked list behind the scenes. Its performance is comparable to that of HashSet: add, contains, and remove all execute in constant time but with slightly more overhead. Iterating against a LinkedHashSet can actually execute faster than iterating against a regular HashSet. The Map analog is LinkedHashMap.

IdentityHashMap

An IdentityHashMap uses reference equality (==) to compare keys instead of equality (equals). You would use this Map implementation if you needed keys to represent unique instances. This class is not a general-purpose map implementation since it violates the contract that Maps should compare keys using equals.

toString

You have seen how you can send the toString message to a StringBuilder object in order to extract its contents as a String.

The toString method is one of the few methods that the Java designers chose to define in java.lang.Object. Its primary purpose is to return a printable (String) representation of an object. You could use this printable representation as part of an end-user application, but don't. The toString method is more useful for developers, to help in debugging or deciphering a JUnit message.

You want objects of the Course class to display a useful string when you print them out in a JUnit error message. One way would be to extract its elements individually and produce a concatenated string. Another way is to use the toString method.

public void testToString() {
   Course course = new Course("ENGL", "301");
   assertEquals("ENGL 301", course.toString());
}

Since toString is inherited from Object, you get a default implementation that fails the test:

expected: <ENGL 301> but was: <studentinfo.Course@53742e9>

This is the implementation that passes the test:

@Override
public String toString() {
   return department + " " + number;
}

Many IDEs will display objects in watch or inspection windows by sending them the toString message. JUnit uses toString to display an appropriate message upon failure of an assertEquals method. If you code:

assertEquals(course, new Course("SPAN", "420"));

then JUnit will now display the following ComparisonFailure message:

expected: <ENGL 301> but was: <SPAN 420>

You can concatenate an object to a String by using the + operator. Java sends the toString message to the object in order to “convert” it to a String object. For example:

assertEquals("Course: ENGL 301", "Course: " + course);

The default implementation of toString is rarely useful. You will probably want to supply a more useful version in most of your subclasses. If you only use the toString implementation for the purpose of debugging or understanding code, writing a test is not necessary.

In fact, toString's definition is often volatile. It depends upon what a developer wants to see as a summary string at a given time. Creating an unnecessary toString test means that you have made the production class a little harder to change.

One of the main reason you write tests is to allow you to change code with impunity, without fear. For each test that is missing, your confidence in modifying code diminishes. Writing tests enables change. The irony is that the more tests you have, the more time-consuming it will be to make changes.

You should never use the difficulty of maintaining tests as an excuse to not write tests. But you should also not write tests for things that are unnecessary to test. You should also code your tests so that they are as resistant as possible to change. Finding the right balance is a matter of experience and good judgment.

Ultimately, it's your call. My usual recommendation is to start out with more tests than necessary. Cut back only when they become a problem that you can't fix through other means.

Strings and Equality

The String class is optimized for heavy use. In most applications, the number of String objects created vastly outweighs the number of any other type being created.

The current version of the Java VM provides a set of command-line switches to turn on various types of profiling. You can enter:

java -agentlib:hprof=help

at the command line for further details.10

10 This feature is not guaranteed to be part of future versions of Java.

I ran a heap profile against the JUnit run of the current suite of almost 60 tests. The Java VM created about 19,250 objects in the fraction of the second required to run the tests. Out of those objects, over 7,100 were String objects. The object count for the next most frequently instantiated type was around 2,000. Counts dropped off dramatically after that.

These numbers demonstrate the key role that the String class plays in Java. String objects will account for a third to over a half of all objects created during a typical application execution. As such, it is critical that the String class perform well. It is also critical that the creation of so many strings does not bloat the memory space.

To minimize the amount of memory used, the String class uses a literal pool. The main idea is that if two String objects contain the same characters, they share the same memory space (the “pool”). The String literal pool is an implementation of a design pattern called Flyweight.11 Flyweight is based on sharing and is designed for efficient use of large numbers of fine-grained objects.

11 [Gamma1995].

To demonstrate:

String a = "we have the technology";
String b = "we have the technology";
assertTrue(a.equals(b));
assertTrue(a == b);

The strings a and b are semantically equal (.equals): Their character content is the same. The string references also have memory equivalence (==): They point to the same String object in memory.

Because of this optimization, as a beginning Java programmer it is easy to think you should compare String objects using ==.

if (name == "David Thomas")

But this is almost always a mistake. It is possible for two strings to contain the same characters but not be stored in the same memory location. This occurs because Java cannot optimize Strings that are constructed through the use of StringBuilder or concatenation.12

12 When compiled, code using the + for String concatenation translates to equivalent code that uses the StringBuffer class.

image

Compare Strings using equals, not ==.

String c = "we have";
c += " the technology";
assertTrue(a.equals(c));
assertFalse(a == c);

This test passes. The Java VM stores c in a different memory location than a.

You might be able to compare String objects using == in certain controlled circumstances in your code. Doing so can provide a boost in performance. However, you should use == for a String comparison only as a performance optimization and document it as such with explicit comments. Being clever and using memory equivalence comparisons for String in other circumstances is a great way to invite insidious bugs that are difficult to track down.

Exercises

  1. Create a String literal using the first two sentences of this exercise. You will create a WordCount class to parse through the text and count the number of instances of each word. Punctuation should not appear in the word list, either as part of a word or as a separate “word.” Use a map to store the frequencies. The WordCount class should be able to return a set of strings, each showing the word and its frequency. Track the frequency without respect to case. In other words, two words spelled the same but with different case are the same word. Hint: You can use the regular expression W+ in conjunction with the String split method in order to extract the words from the String.
  2. Create a class, Name, that declares a single string field. Create an equals method on the class that uses the string for comparisons. Do not create a hashCode method. Build a test that verifies the contract of equality for Name.
  3. Create a Set<Name> object that contains a variety of Name objects, including new Name("Foo"). Verify that the Set contains method returns false if you ask the Set whether or not it contains the Name("Foo") instance. Show that if you create:

    Name foo = new Name("Foo");

    the set does contain foo.

  4. Modify the test for Name (see the previous question) to pass only if the list does contain new Name("Foo").
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.136.234.229