Chapter 15

A Collection of Useful Classes

WHAT YOU WILL LEARN IN THIS CHAPTER

  • How to use the static methods in the Arrays class for filling, copying, comparing, sorting, and searching arrays
  • How to use the Observable class and the Observer interface to communicate between objects
  • What facilities the Random class provides
  • How to create and use Date and Calendar objects
  • What regular expressions are and how you can create and use them
  • What a Scanner class does and how you use it

In this chapter you look at some more useful classes in the java.util package, but this time they are not collection classes — just a collection of classes. You also look at the facilities provided by classes in the java.util.regex package that implement regular expressions in Java. Support for regular expressions is a very powerful and important feature of Java.

UTILITY METHODS FOR ARRAYS

The java.util.Arrays class defines a set of static methods for operating on arrays. You have methods for sorting and searching arrays, as well as methods for comparing arrays of elements of a basic type. You also have methods for filling arrays with a given value. Let’s look at the simplest method first, the fill() method for filling an array.

Filling an Array

The need to fill an array with a specific value arises quite often, and in Chapter 4 you already met the static fill() method that is defined in the Arrays class. The fill() method comes in a number of overloaded versions of the form

fill(type[] array, type value)
 

Here type is a placeholder for the types supported by various versions of the method. The method stores value in each element of array. The return type is void so there is no return value. There are versions supporting type as any of the following:

boolean, byte, char, short, int, long, float, double, and Object

Here’s how you could fill an array of integers with a particular value:

long[] values = new long[1000];
java.util.Arrays.fill(values, 888L);  // Every element as 888
 

It’s quite easy to initialize multidimensional arrays, too. To initialize a two-dimensional array, for example, you treat it as an array of one-dimensional arrays. For example:

int[][] dataValues = new int[10][20];
for(int[] row : dataValues) {
  Arrays.fill(row, 99);
}
 

The for loop sets every element in the dataValues array to 99. The loop iterates over the 10 arrays of 20 elements that make up the dataValues array. If you want to set the rows in the array to different values, you could do it like this:

int initial = 0;
int[][] dataValues = new int[10][20];
for(int[] row : dataValues) {
  Arrays.fill(row, ++initial);
}
 

This results in the first row of 20 elements being set to 1, the second row of 20 elements to 2, and so on through to the last row of 20 elements that is set to 10.

The version of fill() that accepts an argument of type Object[] processes an array of any class type. You could fill an array of Person objects like this:

Person[] people = new Person[100];
java.util.Arrays.fill(people, new Person("John", "Doe"));
 

This inserts a reference to the object passed as the second argument to the fill() method in every element of the people array. Note that there is only one Person object that all the array elements reference.

Another version of fill() accepts four arguments. This is of the form:

fill(type[] array, int fromIndex, int toIndex, type value)
 

This fills part of array with value, starting at array[fromIndex] up to and including array[toIndex-1]. There are versions of this method for the same range of types as the previous set of fill() methods. This variety of fill() throws an IllegalArgumentException if fromIndex is greater than toIndex. It also throws an ArrayIndexOutOfBoundsException if fromIndex is negative or toIndex is greater than array.length. Here’s an example of using this form of the fill() method:

Person[] people = new Person[100];
java.util.Arrays.fill(people, 0, 50, new Person("Jane", "Doe"));
java.util.Arrays.fill(people, 50, 100, new Person("John", "Doe"));
 

This sets the first 50 elements to reference one Person object and the second 50 elements to reference another.
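The range form of fill() and the shared-reference behavior can be checked with a small sketch. The FillDemo class and its helper methods are hypothetical names invented for this illustration, and StringBuilder stands in for the Person class so that the fragment is self-contained.

```java
import java.util.Arrays;

class FillDemo {
  // Fills elements from..to-1 of a new int array of the given length with value
  static int[] filledRange(int length, int from, int to, int value) {
    int[] data = new int[length];        // All elements start as 0
    Arrays.fill(data, from, to, value);  // Sets data[from] through data[to-1]
    return data;
  }

  // Returns true if fill() stored the same reference in every element
  static boolean sharesOneObject() {
    StringBuilder[] items = new StringBuilder[3];
    Arrays.fill(items, new StringBuilder("shared"));
    return items[0] == items[1] && items[1] == items[2];
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(filledRange(6, 1, 4, 7)));  // [0, 7, 7, 7, 0, 0]
    System.out.println(sharesOneObject());                         // true
    try {
      filledRange(6, 4, 1, 7);           // fromIndex is greater than toIndex
    } catch (IllegalArgumentException e) {
      System.out.println("Caught IllegalArgumentException");
    }
  }
}
```

Because every element refers to the same object, a change made through one element would be visible through all the others.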

Copying an Array

You can copy an array of any type using the static copyOf() method in the Arrays class. Here’s an example:

String[] decisions = {"yes", "no", "maybe", "definitely not"};
String[] copyDecisions = Arrays.copyOf(decisions, decisions.length);
 

copyDecisions references a new array that is returned by the copyOf() method that is a duplicate of decisions because the second argument specifies the same length as decisions. If the second argument to copyOf() is negative, a NegativeArraySizeException is thrown. If the first argument is null, a NullPointerException is thrown. Both exceptions have RuntimeException as a base class and therefore need not be caught.

You can arrange for the array copy to be truncated, or to have an increased number of elements by specifying a different value for the second argument. For example:

String[] copyDecisions1 = Arrays.copyOf(decisions, decisions.length - 2);
String[] copyDecisions2 = Arrays.copyOf(decisions, decisions.length + 5);
 

Here copyDecisions1 has just two elements corresponding to the first two elements of decisions. The copyDecisions2 array has nine elements. The first four are copies of the elements of the first argument, decisions, and the last five are set to null.

You can also create a new array from part of an existing array. For example:

String[] copyDecisions = Arrays.copyOfRange(decisions, 1, 3);
 

The new array that is created contains two elements, "no" and "maybe". The second and third arguments specify the index of the first element to be copied and one beyond the last element to be copied, respectively. The second argument must be between zero and the length of the array that is the first argument; otherwise an ArrayIndexOutOfBoundsException is thrown. If the second argument is greater than the third argument, an IllegalArgumentException is thrown; if the two are equal, the result is an empty array. The third argument can be greater than the length of the array being copied, in which case the excess elements are set to the default value for the element type: null for an array containing objects, zero for numerical elements, and false for boolean elements. If the first argument is null, a NullPointerException is thrown. All the exceptions that the copyOfRange() method can throw are subclasses of RuntimeException, so you are not obliged to catch them.
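The following sketch exercises copyOfRange() at its boundaries. The CopyRangeDemo class and its slice() helper are hypothetical names used only for this illustration.

```java
import java.util.Arrays;

class CopyRangeDemo {
  static final String[] DECISIONS = {"yes", "no", "maybe", "definitely not"};

  // Copies elements from..to-1; positions beyond the end of the source are padded
  static String[] slice(int from, int to) {
    return Arrays.copyOfRange(DECISIONS, from, to);
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(slice(1, 3)));  // [no, maybe]
    System.out.println(Arrays.toString(slice(2, 6)));  // [maybe, definitely not, null, null]
    System.out.println(slice(2, 2).length);            // 0

    // For a numerical array the padding is zero rather than null
    int[] padded = Arrays.copyOfRange(new int[] {1, 2, 3}, 1, 5);
    System.out.println(Arrays.toString(padded));       // [2, 3, 0, 0]
  }
}
```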

Comparing Arrays

There are nine overloaded versions of the static equals() method for comparing arrays defined in the Arrays class, one for each of the types that apply to the fill() method. All versions of equals() are of the form:

boolean equals(type[] array1, type[] array2)
 

The method returns true if array1 is equal to array2 and false otherwise. The two arrays are equal if they contain the same number of elements and the values of all corresponding elements in the two arrays are equal. If array1 and array2 are both null, they are also considered to be equal.

When floating-point arrays are compared, 0.0 is considered to be unequal to -0.0, and elements that contain NaN are considered to be equal to each other. In both cases this is the opposite of what the == operator reports for the individual values. Array elements of a class type are compared by calling their equals() method. If you have not implemented the equals() method in your own classes, then the version inherited from the Object class is used. This compares references, not objects, and so returns true only if both references refer to the same object.

Here’s how you can compare two arrays:

String[] numbers = {"one", "two", "three", "four" };
String[] values = {"one", "two", "three", "four" };
if(java.util.Arrays.equals(numbers, values)) {
  System.out.println("The arrays are equal");
} else {
  System.out.println("The arrays are not equal");
}
 

In this fragment both arrays are equal so the equals() method returns true.
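The treatment of floating-point values and of classes that do not override equals() is easy to get wrong, so here is a small sketch that checks each case. ArrayEqualsDemo and its nested Token class are hypothetical names introduced for this illustration.

```java
import java.util.Arrays;

class ArrayEqualsDemo {
  // A class that does not override equals(), so the Object version applies
  static class Token {
    final String text;
    Token(String text) { this.text = text; }
  }

  // NaN elements are treated as equal to each other
  static boolean nanArraysEqual() {
    return Arrays.equals(new double[] {Double.NaN}, new double[] {Double.NaN});
  }

  // 0.0 and -0.0 are treated as different values
  static boolean zeroArraysEqual() {
    return Arrays.equals(new double[] {0.0}, new double[] {-0.0});
  }

  // Distinct Token objects with the same text are compared by reference
  static boolean tokenArraysEqual() {
    return Arrays.equals(new Token[] {new Token("a")}, new Token[] {new Token("a")});
  }

  public static void main(String[] args) {
    System.out.println(nanArraysEqual());    // true
    System.out.println(zeroArraysEqual());   // false
    System.out.println(tokenArraysEqual());  // false
  }
}
```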


Sorting Arrays

The static sort() method in the Arrays class sorts the elements of an array that you pass as the argument into ascending sequence. The method is overloaded for eight of the nine types (boolean is excluded) you saw for the fill() method, for each of two versions of sort():

void sort(type[] array)
void sort(type[] array, int fromIndex, int toIndex)
 

The first variety sorts the entire array into ascending sequence. The second sorts the elements from array[fromIndex] to array[toIndex-1] into ascending sequence. This throws an IllegalArgumentException if fromIndex is greater than toIndex. It throws an ArrayIndexOutOfBoundsException if fromIndex is negative or toIndex is greater than array.length.
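As a brief sketch of the second form, the following sorts only part of an array. The SortRangeDemo class and the sortedRange() helper are hypothetical names; the helper works on a copy so that the original array is left unchanged.

```java
import java.util.Arrays;

class SortRangeDemo {
  // Sorts only elements from..to-1 of a copy of source
  static int[] sortedRange(int[] source, int from, int to) {
    int[] copy = Arrays.copyOf(source, source.length);
    Arrays.sort(copy, from, to);
    return copy;
  }

  public static void main(String[] args) {
    int[] values = {9, 7, 5, 3, 1, 8};
    // Elements at index positions 1 to 4 are sorted; the first and last stay put
    System.out.println(Arrays.toString(sortedRange(values, 1, 5)));  // [9, 1, 3, 5, 7, 8]
  }
}
```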

You can pass an array of elements of any class type to the versions of the sort() method that have the first parameter as type Object[]. If you are using either variety of the sort() method to sort an array of objects, then the objects must support the Comparable<> interface because the sort() method uses the compareTo() method.

Here’s how you can sort an array of strings:

String[] numbers = {"one", "two", "three", "four", "five",
                    "six", "seven", "eight"};
java.util.Arrays.sort(numbers);
 

After executing these statements, the elements of the numbers array contain:

"eight" "five" "four" "one" "seven" "six" "three" "two"
 

Two additional versions of the sort() method that sort arrays of objects are parameterized methods. These are for sorting arrays in which the order of elements is determined by an external comparator object. The class type of the comparator object must implement the java.util.Comparator<> interface. One advantage of using an external comparator is that you can have several comparators that can impose different orderings depending on the circumstances. For example, in some cases you might want to sort a name file ordering by first name within second name. On other occasions you might want to sort by second name within first name. You can’t do this using the Comparable<> interface alone because a class has only one compareTo() implementation. The first version of the sort() method that makes use of a comparator is:

<T> void sort(T[] array, Comparator<? super T> comparator)
 

This sorts all the elements of array using the comparator you pass as the second argument.

The second version of the sort() method using a comparator is:

<T> void sort(T[] array, int fromIndex, int toIndex, Comparator<? super T> comparator)
 

This sorts the elements of array from index position fromIndex up to but excluding the element at index position toIndex.

The wildcard parameter to the Comparator<> type specifies that the type argument to the comparator can be T or any superclass of T. This implies that the sort() method can sort an array of elements of type T using a Comparator<> object that can compare objects of type T or objects of any superclass of T. To put this in a specific context, this means that you can use an object of type Comparator<Person> to sort an array of objects of type Manager, where Manager is a subclass of Person.
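To make the wildcard concrete, here is a minimal sketch in which a Comparator<Person> sorts an array of Manager objects. The nested Person and Manager classes are simplified stand-ins invented for this illustration, not the Person class used elsewhere in the chapter, and WildcardSortDemo, BySurname, and sortedSurnames() are hypothetical names.

```java
import java.util.Arrays;
import java.util.Comparator;

class WildcardSortDemo {
  static class Person {
    final String surname;
    Person(String surname) { this.surname = surname; }
  }

  static class Manager extends Person {
    Manager(String surname) { super(surname); }
  }

  // Compares any Person objects, including subclass objects, by surname
  static class BySurname implements Comparator<Person> {
    public int compare(Person p1, Person p2) {
      return p1.surname.compareTo(p2.surname);
    }
  }

  // Sorts Manager objects with a Comparator<Person> and returns the surnames in order
  static String[] sortedSurnames(String... surnames) {
    Manager[] managers = new Manager[surnames.length];
    for (int i = 0; i < surnames.length; ++i) {
      managers[i] = new Manager(surnames[i]);
    }
    Arrays.sort(managers, new BySurname());  // T is Manager; Person is a superclass of T
    String[] result = new String[managers.length];
    for (int i = 0; i < managers.length; ++i) {
      result[i] = managers[i].surname;
    }
    return result;
  }

  public static void main(String[] args) {
    // [Clancy, Grisham, Steel]
    System.out.println(Arrays.toString(sortedSurnames("Steel", "Clancy", "Grisham")));
  }
}
```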

The Comparator<T> interface declares two methods. First is the compare() method, which the sort() method uses for comparing elements of the array of type T[]. The method compares two objects of type T that are passed as arguments, so it’s of the form:

int compare(T obj1, T obj2)
 

The method returns a negative integer, zero, or a positive integer, depending on whether obj1 is less than, equal to, or greater than obj2. The method throws a ClassCastException if the types of the argument you pass are such that they cannot be compared by the comparator.

The second method in the Comparator<T> interface is equals(), which is used for comparing Comparator<> objects for equality. The method is of the form:

boolean equals(Object comparator)
 

This compares the current Comparator<> object with another object of a type that also implements the Comparator<> interface that you pass as the argument. It returns a boolean value indicating whether the current comparator object and the argument impose the same ordering on a collection of objects. I think it would be a good idea to see how sorting using a Comparator<> object works in practice.

TRY IT OUT: Sorting an Array Using a Comparator

You can borrow the version of the Person class that implements the Comparable<> interface from the TryVector example in Chapter 14 for this example. Copy the Person.java file to the directory you set up for this example. The comparator needs access to the first name and the surname for a Person object to make comparisons, so you need to add methods to the Person class to allow that:

public class Person implements Comparable<Person> {
  public String getFirstName() {
    return firstName;
  }
  public String getSurname() {
    return surname;
  }
  // Rest of the class as before...
}
 

Directory "TrySortingWithComparator"

You can now define a class for a comparator that applies to Person objects:

import java.util.Comparator;
 
public class ComparePersons implements Comparator<Person> {
  // Method to compare Person objects - order is descending
  public int compare(Person person1, Person person2) {
    int result = -person1.getSurname().compareTo(person2.getSurname());
    return result == 0 ?
          -person1.getFirstName().compareTo(person2.getFirstName()) : result;
  }
 
  // Method to compare with another comparator
  public boolean equals(Object comparator) {
    if(this == comparator) {           // If argument is the same object
      return true;                     // then it must be equal
    }
    if(comparator == null) {           // If argument is null
      return false;                    // then it can't be equal
    }
    // Class must be the same for equal
    return getClass() == comparator.getClass(); 
  }
}
 

Directory "TrySortingWithComparator"

Just to make it more interesting and to demonstrate that it’s this comparator and not the compareTo() method in the Person class that’s being used by the sort() method, this comparator establishes a descending sequence of Person objects. By switching the sign of the value that the compareTo() method returns, you invert the sort order. Thus, sorting using this comparator sorts Person objects in descending alphabetical order by surname and then by first name within surname.

You can try this out with the following program:

import java.util.Arrays;
 
public class TrySortingWithComparator {
  public static void main(String[] args) {
    Person[] authors = {
      new Person("Danielle", "Steel"),    new Person("John", "Grisham"),
      new Person("Tom", "Clancy"),        new Person("Christina", "Schwartz"),
      new Person("Patricia", "Cornwell"), new Person("Bill", "Bryson")
                       };
 
    System.out.println("Original order:");
    for(Person author : authors) {
      System.out.println(author);
    }
 
    Arrays.sort(authors, new ComparePersons()); // Sort using comparator
 
    System.out.println("\nOrder after sorting using comparator:");
    for(Person author : authors) {
      System.out.println(author);
    }
 
    Arrays.sort(authors);                       // Sort using compareTo() method
 
    System.out.println("\nOrder after sorting using compareTo() method:");
    for(Person author : authors) {
      System.out.println(author);
    }
  }
}
 

Directory "TrySortingWithComparator"

This example produces the following output:

Original order:
Danielle Steel
John Grisham
Tom Clancy
Christina Schwartz
Patricia Cornwell
Bill Bryson
 
Order after sorting using comparator:
Danielle Steel
Christina Schwartz
John Grisham
Patricia Cornwell
Tom Clancy
Bill Bryson
 
Order after sorting using compareTo() method:
Bill Bryson
Tom Clancy
Patricia Cornwell
John Grisham
Christina Schwartz
Danielle Steel
 

How It Works

After defining the authors array of Person objects, you sort them with the statement:

    Arrays.sort(authors, new ComparePersons());  // Sort using comparator
 

The second argument is an instance of the ComparePersons class, which is a comparator for Person objects because it implements the Comparator<Person> interface. The sort() method calls the compare() method to establish the order between Person objects, and you defined this method like this:

  public int compare(Person person1, Person person2) {
    int result = -person1.getSurname().compareTo(person2.getSurname());
    return result == 0 ?
               -person1.getFirstName().compareTo(person2.getFirstName()):
               result;
  }
 

The primary comparison is between surnames and returns a result that is the opposite of that produced by the compareTo() method for String objects. Because the order established by the compareTo() method is ascending, your compare() method establishes a descending sequence. If the surnames are equal, the order is determined by the first names, again inverting the sign of the value returned by the compareTo() method to maintain descending sequence. Of course, you could have coded this method by switching the arguments, person1 and person2, instead of reversing the sign:

  public int compare(Person person1, Person person2) {
    int result = person2.getSurname().compareTo(person1.getSurname());
    return result == 0 ? 
                person2.getFirstName().compareTo(person1.getFirstName()) : result;
  }
 

This would establish a descending sequence for Person objects.

You call the sort() method a second time with the statement:

    Arrays.sort(authors);                     // Sort using compareTo() method
 

Because you have not supplied a comparator, the sort() method expects the class type of the elements to be sorted to have implemented the Comparable<> interface. Fortunately your Person class does, so the authors get sorted. This time the result is in ascending sequence because that’s what the compareTo() method establishes.

Searching Arrays

The static binarySearch() method in the Arrays class searches the elements of a sorted array for a given value using the binary search algorithm. This works only if the elements of the array are in ascending sequence, so if they are not, you should call the sort() method before calling binarySearch(). The binary search algorithm works by repeatedly subdividing the sequence of elements to find the target element value, as illustrated in Figure 15-1.

[Figure 15-1: two binary searches of a sorted array of integers]

The figure shows two searches of an array of integers. The first step is always to compare the target with the element at the approximate center of the array. The second step is to examine the element at the approximate center of the left or right half of the array, depending on whether the target is less than or greater than the element. This process of subdividing and examining the element at the middle of the interval continues until the target is found, or the interval consists of a single element that is different from the target. When the target is found, the result is the index position of the element that is equal to the target. You should be able to see that the algorithm implicitly assumes that the elements are in ascending order.
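The subdivision process can be sketched by hand for an array of type int[]. This is an illustrative version only, not the library's source; BinarySearchSketch and search() are hypothetical names. When the target is absent it returns -(low + 1), which follows the convention that the Arrays.binarySearch() documentation describes for a failed search.

```java
class BinarySearchSketch {
  // Returns the index of target in sorted, or -(insertion point) - 1 if it is absent.
  // The array must already be in ascending sequence.
  static int search(int[] sorted, int target) {
    int low = 0;
    int high = sorted.length - 1;
    while (low <= high) {
      int mid = (low + high) >>> 1;   // Approximate center of the current interval
      if (sorted[mid] < target) {
        low = mid + 1;                // Target can only be in the right half
      } else if (sorted[mid] > target) {
        high = mid - 1;               // Target can only be in the left half
      } else {
        return mid;                   // Found the target
      }
    }
    return -(low + 1);                // Not found; low is where it would be inserted
  }

  public static void main(String[] args) {
    int[] numbers = {2, 4, 6, 8, 10};
    System.out.println(search(numbers, 8));  // 3
    System.out.println(search(numbers, 7));  // -4
  }
}
```

Each pass through the loop halves the interval, so the number of comparisons grows with the logarithm of the array length rather than with the length itself.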

You have eight overloaded versions of the binarySearch() method, one for each of the types that you saw with fill() except boolean:

binarySearch(type[] array, type value)
 

The second argument is the value you are searching for. You have an additional version of the method for searching an array of type T[] for which you can supply a reference to a Comparator<? super T> object as the third argument; the second argument is the value sought. You also have a version that will search a part of an array:

binarySearch(T[] array, int from, int to, T value, Comparator<? super T> c)

This searches the elements from array[from] to array[to-1] inclusive for value.

All versions of the binarySearch() method return a value of type int that is the index position in array where value was found. Of course, it is possible that the value is not in the array. In this case a negative integer is returned. This is produced by taking the value of the index position of the first element that is greater than the value, reversing its sign, and subtracting 1. For example, suppose you have an array of integers containing the element values 2, 4, 6, 8, and 10:

int[] numbers = {2, 4, 6, 8, 10};
 

You could search for the value 7 with the following statement:

int position = java.util.Arrays.binarySearch(numbers, 7);
 

The value of position is -4 because the element at index position 3 is the first element that is greater than 7. The return value is calculated as -3-1, which is -4. This mechanism guarantees that if the value sought is not in the array then the return value is always negative, so you can always tell whether a value is in the array by examining the sign of the result. When the search fails, you can recover the insertion point, which is the index position where you could insert the value you were looking for and still maintain the order of the elements in the array, as -(position + 1). Here that is -(-4 + 1), which is 3.
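A failed search result can be put to work directly. The sketch below converts a negative return value back into an insertion index with -(position + 1) and builds a new array with the value in place. InsertionPointDemo and insertSorted() are hypothetical names used for this illustration.

```java
import java.util.Arrays;

class InsertionPointDemo {
  // Returns a new sorted array with value inserted, or the original if value is present
  static int[] insertSorted(int[] sorted, int value) {
    int position = Arrays.binarySearch(sorted, value);
    if (position >= 0) {
      return sorted;                         // Already present; nothing to insert
    }
    int insertAt = -(position + 1);          // Recover the insertion point
    int[] result = new int[sorted.length + 1];
    System.arraycopy(sorted, 0, result, 0, insertAt);
    result[insertAt] = value;
    System.arraycopy(sorted, insertAt, result, insertAt + 1, sorted.length - insertAt);
    return result;
  }

  public static void main(String[] args) {
    int[] numbers = {2, 4, 6, 8, 10};
    System.out.println(Arrays.toString(insertSorted(numbers, 7)));  // [2, 4, 6, 7, 8, 10]
  }
}
```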

Unless you are using a method that uses a comparator for searching arrays of objects, the class type of the array elements must implement the Comparable<> interface. Here’s how you could search for a string in an array of strings:

String[] numbers = {"one", "two", "three", "four", "five", "six", "seven"};
 
java.util.Arrays.sort(numbers);
int position = java.util.Arrays.binarySearch(numbers, "three");
 

You must sort the numbers array; otherwise, the binary search doesn’t work. After executing these statements the value of position is 5.
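When an array has been sorted with a comparator, the search needs to use the same comparator so that both operations agree on the ordering. Here is a minimal sketch with a descending order; ComparatorSearchDemo, Descending, and findDescending() are hypothetical names invented for this illustration.

```java
import java.util.Arrays;
import java.util.Comparator;

class ComparatorSearchDemo {
  // Orders strings in descending sequence by swapping the operands
  static class Descending implements Comparator<String> {
    public int compare(String s1, String s2) {
      return s2.compareTo(s1);
    }
  }

  // Sorts a copy of words into descending order, then searches with the same comparator
  static int findDescending(String[] words, String target) {
    String[] sorted = Arrays.copyOf(words, words.length);
    Comparator<String> order = new Descending();
    Arrays.sort(sorted, order);
    return Arrays.binarySearch(sorted, target, order);
  }

  public static void main(String[] args) {
    String[] numbers = {"one", "two", "three", "four", "five", "six", "seven"};
    // Descending order is: two three six seven one four five
    System.out.println(findDescending(numbers, "three"));  // 1
    System.out.println(findDescending(numbers, "zero"));   // -1
  }
}
```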

TRY IT OUT: In Search of an Author

You could search the authors array from the previous example. Copy the source file for the Person class from the previous example to a new directory for this example. Here’s the code to try a binary search:

import java.util.Arrays;
 
public class TryBinarySearch {
  public static void main(String[] args) {
    Person[] authors = {
          new Person("Danielle", "Steel"), new Person("John", "Grisham"),
          new Person("Tom", "Clancy"),     new Person("Christina", "Schwartz"),
          new Person("Patricia", "Cornwell"), new Person("Bill", "Bryson")
                       };
 
    Arrays.sort(authors);                       // Sort using compareTo() method
 
    System.out.println("\nOrder after sorting into ascending sequence:");
    for(Person author : authors) {
      System.out.println(author);
    }
 
    // Search for authors
    Person[] people = {
         new Person("Christina", "Schwartz"), new Person("Ned", "Kelly"),
         new Person("Tom", "Clancy"),         new Person("Charles", "Dickens")
                      };
    int index = 0;
    System.out.println("\nIn search of authors:");
    for(Person person : people) {
      index = Arrays.binarySearch(authors, person);
      if(index >= 0) {
        System.out.println(person + " was found at index position " + index);
      } else {
        System.out.println(person + " was not found. Return value is " + index);
      }
    }
  }
}
 

Directory "TryBinarySearch"

This example produces the following output:

Order after sorting into ascending sequence:
Bill Bryson
Tom Clancy
Patricia Cornwell
John Grisham
Christina Schwartz
Danielle Steel
 
In search of authors:
Christina Schwartz was found at index position 4
Ned Kelly was not found. Return value is -5
Tom Clancy was found at index position 1
Charles Dickens was not found. Return value is -4
 

How It Works

You create and sort the authors array in the same way as you did in the previous example. The elements in the authors array are sorted into ascending sequence because you use the sort() method without supplying a comparator, and the Comparable<> interface implementation in the Person class imposes ascending sequence on objects.

You create the people array containing Person objects that might or might not be authors. You use the binarySearch() method to check whether the elements from the people array appear in the authors array in a loop:

    for(Person person : people) {
      index = Arrays.binarySearch(authors, person);
      if(index >= 0) {
        System.out.println(person + " was found at index position " + index);
      } else {
        System.out.println(person + " was not found. Return value is " + index);
      }
    }
 

The person variable references each of the elements in turn. If the person object appears in the authors array, the index is non-negative, and the first output statement in the if executes; otherwise, the second output statement executes. You can see from the output that everything works as expected.

Array Contents as a String

The Arrays class defines several static overloads of the toString() method that return the contents of an array you pass as the argument to the method as a String object. There are overloads of this method for each of the primitive types and for type Object. The string that the methods return is the string representation of each of the array elements separated by commas, between square brackets. This is very useful when you want to output an array in this way. For example:

int[] numbers = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
System.out.println(Arrays.toString(numbers));
 

Executing this code fragment produces the output:

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
 

For presenting a multidimensional array as a string, the Arrays class defines the static deepToString() method that has a parameter of type Object[]. The method works for arrays of two or more dimensions with elements of any type, including primitive types, because each row of such an array is itself an object. If the array elements are references, the string representation of the element is produced by calling its toString() method.

Here’s an example:

String[][] folks = {
                  {"Ann", "Arthur",  "Arnie"},
                  { "Bill", "Barbara", "Ben", "Brenda", "Brian"},
                  {"Charles", "Catherine"}};
System.out.println(Arrays.deepToString(folks));
 

The output produced by this fragment is:

[[Ann, Arthur, Arnie], [Bill, Barbara, Ben, Brenda, Brian], [Charles, Catherine]]
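The same approach works for a two-dimensional array of a primitive type, as this small sketch suggests. DeepToStringDemo and deep() are hypothetical names used only here.

```java
import java.util.Arrays;

class DeepToStringDemo {
  // An int[][] can be passed as Object[] because each row is an int[] object
  static String deep(int[][] grid) {
    return Arrays.deepToString(grid);
  }

  public static void main(String[] args) {
    int[][] grid = {{1, 2}, {3, 4, 5}};
    System.out.println(deep(grid));             // [[1, 2], [3, 4, 5]]

    // toString() does not descend into the rows, so each row appears as
    // type-and-hash text rather than as its contents
    System.out.println(Arrays.toString(grid));
  }
}
```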
 

OBSERVABLE AND OBSERVER OBJECTS

The Observable class provides an interesting mechanism for communicating a change in one class object to a number of other class objects. One use for this mechanism is in graphical user interface (GUI) programming where you often have one object representing all the data for the application — a text document, for example, or a geometric model of a physical object — and several other objects that represent views of the data displayed in separate windows, where each shows a different representation or perhaps a subset of the data. This is referred to as the document/view architecture for an application, or sometimes the model/view architecture. This is a contraction of something referred to as the model/view/controller architecture, and I come back to this when I discuss creating GUIs. The document/view terminology is applied to any collection of application data — geometry, bitmaps, or whatever. It isn’t restricted to what is normally understood by the term document. Figure 15-2 illustrates the document/view architecture.

When the Document object changes, all the views need to be notified that a change has occurred, because they may well need to update what they display. The document is observable, and the views are observers. This is exactly what the Observable class is designed to achieve when used in combination with the Observer interface. A document can be considered to be an Observable object, and a view can be thought of as an Observer object. This enables a view to respond to changes in the document.

The document/view architecture portrays a many-to-many relationship. A document may have many observers, and a view may observe many documents.

Defining Classes of Observable Objects

You use the java.util.Observable class in the definition of a class of objects that may be observed. You simply derive the class for objects to be monitored — Document, say — from the Observable class.

Any class that may need to be notified when a Document object changes must implement the Observer interface. This doesn’t in itself cause the Observer objects to be notified when a change in an observed object occurs; it just establishes the potential for this to happen. You need to do something else to link the observers to the observable, which I come to in a moment.

The definition of the class for observed objects could be of the form:

public class Document extends Observable {
 
  // Details of the class definitions...
}
 

The Document class inherits methods that operate the communications to the Observer objects from the Observable class.

A class for observers could be defined as the following:

public class View implements Observer {
  // Method for the interface
  public void update(Observable theObservableObject, Object arg) {
    // This method is called when the observed object changes
  }
 
  // Rest of the class definition...
}
 

To implement the Observer interface, you need to define just one method, update(). This method is called when an associated Observable object changes. The first argument that is passed to the update() method is a reference to the Observable object that changed and caused the method to be called. This enables the View object to access public methods in the associated Observable object that would be used to access the data to be displayed. The second argument to update() conveys additional information to the Observer object.

Observable Class Methods

The Observable class maintains an internal record of all the Observer objects related to the object to be observed. Your class, derived from Observable, inherits the data members that deal with this. Your class of observable objects also inherits nine methods from the Observable class:

  • void addObserver(Observer o): Adds the object you pass as the argument to the internal record of observers. Only Observer objects in the internal record are notified when a change in the Observable object occurs.
  • void deleteObserver(Observer o): Deletes the object you pass as the argument from the internal record of observers.
  • void deleteObservers(): Deletes all observers from the internal record of observers.
  • void notifyObservers(Object arg): Calls the update() method for all of the Observer objects in the internal record if the current object has been set as changed. The current object is set as changed by calling the setChanged() method. The current object and the argument passed to the notifyObservers() method are passed to the update() method for each Observer object. The clearChanged() method for the Observable is called to reset its status.
  • void notifyObservers(): Calling this method is equivalent to calling the previous method with a null argument.
  • int countObservers(): Returns the number of Observer objects for the current object.
  • void setChanged(): Sets the current object as changed. You must call this method before calling the notifyObservers() method. Note that this method is protected.
  • boolean hasChanged(): Returns true if the object has been set as changed and returns false otherwise.
  • void clearChanged(): Resets the changed status of the current object to unchanged. Note that this method is also protected. This method is called automatically after notifyObservers() is called.

It’s fairly easy to see how these methods are used to manage the relationship between an Observable object and its observers. To connect an observer to an Observable object, the Observer object must be registered with the Observable object by calling its addObserver() method. The Observer is then notified when changes to the Observable object occur. An Observable object is responsible for adding Observer objects to its internal record through the addObserver() method. In practice, the Observer objects are typically created as objects that are dependent on the Observable object, and then they are added to the record, so there’s an implied ownership relationship.
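The interplay between setChanged(), notifyObservers(), and hasChanged() can be seen in a minimal sketch. The ObservableMethodsDemo, Counter, and Recorder classes and the run() helper are hypothetical names invented for this illustration. The java.util.Observable class is deprecated as of Java 9 but is still available, so the sketch suppresses the deprecation warning.

```java
import java.util.Observable;
import java.util.Observer;

@SuppressWarnings("deprecation")    // Observable is deprecated as of Java 9
class ObservableMethodsDemo {
  // A minimal observable object holding one value
  static class Counter extends Observable {
    private int count;

    void increment() {
      ++count;
      setChanged();                 // Marks the object as changed
      notifyObservers(count);       // The count becomes the second update() argument
    }

    void incrementSilently() {
      ++count;
      notifyObservers(count);       // Changed flag is not set, so no observer is called
    }
  }

  // Records how often it was notified and the last argument it received
  static class Recorder implements Observer {
    int updates;
    Object lastArg;

    public void update(Observable source, Object arg) {
      ++updates;
      lastArg = arg;
    }
  }

  // Returns {updates received, last argument, observer count, hasChanged() after notifying}
  static Object[] run() {
    Counter counter = new Counter();
    Recorder recorder = new Recorder();
    counter.addObserver(recorder);
    counter.increment();            // Observer is notified with the value 1
    counter.incrementSilently();    // No notification takes place
    return new Object[] {recorder.updates, recorder.lastArg,
                         counter.countObservers(), counter.hasChanged()};
  }

  public static void main(String[] args) {
    Object[] r = run();
    System.out.println(r[0] + " " + r[1] + " " + r[2] + " " + r[3]);  // 1 1 1 false
  }
}
```

The final false shows that the changed status is reset once the observers have been notified.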

This makes sense if you think about how the mechanism is often used in an application using the document/view architecture. A document has permanence because it represents the data for an application. A view is a transient presentation of some or all of the data in the document, so a document object should naturally create and own its view objects. A view is responsible for managing the interface to the application’s user, but the update of the underlying data in the document object would be carried out by methods in the document object, which would then notify other view objects that a change has occurred.

Of course, you’re in no way limited to using the Observable class and the Observer interface in the way in which I described here. You can use them in any context where you want changes that occur in one class object to be communicated to others. We can exercise the process in a frightening example.

TRY IT OUT: Observing the Observable

We first define a class for an object that can exhibit change:

import java.util.Observable;
 
public class JekyllAndHyde extends Observable {
  public void drinkPotion() {
    name = "Mr.Hyde";
    setChanged();
    notifyObservers();
  }
 
  public String getName() {
    return name;
  }
 
  private String name = "Dr. Jekyll";
}
 

Directory "Horrific"

Now we can define the class of person who’s looking out for this kind of thing:

import java.util.Observer;
import java.util.Observable;
 
public class Person implements Observer {
  // Constructor
  public Person(String name, String says) {
    this.name = name;
    this.says = says;
  }
 
  // Called when observing an object that changes
  public void update(Observable thing, Object o) {
    System.out.println("It's " + ((JekyllAndHyde)thing).getName() +
                       "\n" + name + ": " + says);
  }
 
  private String name;                           // Person's identity
  private String says;                           // What they say when startled
}
 

Directory "Horrific"

We can gather a bunch of observers to watch Dr. Jekyll with the following class:

// Try out observers
import java.util.Observer;
 
public class Horrific {
  public static void main(String[] args) {
    JekyllAndHyde man = new JekyllAndHyde();     // Create Dr. Jekyll
 
    Observer[] crowd = {
     new Person("Officer","What's all this then?"),
     new Person("Eileen Backwards", "Oh, no, it's horrible - those teeth!"),
     new Person("Phil McCavity", "I'm your local dentist - here's my card."),
     new Person("Slim Sagebrush", "What in tarnation's goin' on here?"),
     new Person("Freaky Weirdo", "Real cool, man. Where can I get that stuff?")
                       };
 
    // Add the observers
    for(Observer observer : crowd) {
      man.addObserver(observer);
    }
    man.drinkPotion();                           // Dr. Jekyll drinks up
  }
}
 

Directory "Horrific"

If you compile and run this, you should get the following output:

It's Mr.Hyde
Freaky Weirdo: Real cool, man. Where can I get that stuff?
It's Mr.Hyde
Slim Sagebrush: What in tarnation's goin' on here?
It's Mr.Hyde
Phil McCavity: I'm your local dentist - here's my card.
It's Mr.Hyde
Eileen Backwards: Oh, no, it's horrible - those teeth!
It's Mr.Hyde
Officer: What's all this then?
 

How It Works

JekyllAndHyde is a very simple class with just two methods. The drinkPotion() method encourages Dr. Jekyll to do his stuff and change into Mr. Hyde, and the getName() method enables anyone who is interested to find out who he now is. The class extends the Observable class, so you can add observers for a JekyllAndHyde object.

The revamped Person class implements the Observer interface, so an object of this class can observe an Observable object. When notified of a change in the object being observed, the update() method is called. Here, it just outputs who the person is and what he or she says.

In the Horrific class, after defining Dr. Jekyll in the variable man, you create an array, crowd, of type Observer[] to hold the observers — which are of type Person, of course. You can use an array of type Observer[] because the Person class implements the Observer interface. You pass two arguments to the Person class constructor: a name and a string indicating what the person says when he sees a change in Dr. Jekyll. You add each of the observers for the man object in the for loop.

Calling the drinkPotion() method for man causes the internal name to be changed, the setChanged() method to be called for the man object, and the notifyObservers() method that is inherited from the Observable class to be called. This results in the update() method for each of the registered observers being called, which generates the output. If you comment out the setChanged() call in the drinkPotion() method, and compile and run the program again, you get no output. Unless setChanged() is called, the observers aren’t notified.

Now let’s move on to look at the java.util.Random class.

GENERATING RANDOM NUMBERS

You have already used the Random class a little, but let’s investigate this in more detail. The Random class enables you to create multiple random number generators that are independent of one another. Each object of the class is a separate random number generator. Any Random object can generate pseudo-random numbers of types int, long, float, or double. These numbers are created using an algorithm that takes a seed and grows a sequence of numbers from it. Initializing the algorithm twice with the same seed produces the same sequence because the algorithm is deterministic.

The integer values that are generated are uniformly distributed over the complete range for the type, and the floating-point values are uniformly distributed over the range 0.0 to 1.0 for both types. You can also generate numbers of type double with a Gaussian (or normal) distribution that has a mean of 0.0 and a standard deviation of 1.0. This is the typical bell-shaped curve that represents the probability distribution for many random events. Figure 15-3 illustrates the principal flavors of random number generators that you can define.

In addition to the methods shown in Figure 15-3, nextBoolean() returns either true or false with equal probability, and nextBytes() fills the byte[] array you supply as the argument with a sequence of bytes with random values; a NullPointerException is thrown if the argument is null.

There are two constructors for a Random object. The default constructor creates an object that uses the current time from your computer clock as the seed value for generating pseudo-random numbers. The other constructor accepts an argument of type long that is used as the seed.

Random lottery = new Random();                   // Sequence not repeatable
Random repeatable = new Random(997L);            // Repeatable sequence
 

If you use the default constructor, the sequence of numbers that is generated is different each time a program is run, although beware of creating two generators in the same program with the default constructor. The time resolution can be down to nanoseconds but it is only guaranteed to be no greater than 1 millisecond. If you create two objects in successive statements they may generate the same sequence because the times used for the starting seed values were identical.

Random objects that you create using the same seed always produce the same sequence, which can be very important when you are testing a program. Testing a program where the output is not repeatable can be a challenge! A major feature of random number generators you create using a given seed in Java is that not only do they always produce the same sequence of pseudo-random numbers, but they also do so on totally different computers.
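The repeatability described here is easy to check for yourself. The following is a small sketch; the class name SameSeedSketch and the method sameSequence() are made up for illustration.

```java
import java.util.Random;

class SameSeedSketch {
  // True if two generators created with the same seed agree on their first count values
  static boolean sameSequence(long seed, int count) {
    Random first = new Random(seed);
    Random second = new Random(seed);
    for (int i = 0; i < count; ++i) {
      if (first.nextInt() != second.nextInt()) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(sameSequence(997L, 1000));   // prints true
  }
}
```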

Random Operations

The public methods provided by a Random object are the following:

  • int nextInt(): Returns a pseudo-random integer. Values are uniformly distributed across the range of possible values of type int.
  • int nextInt(int limit): Returns a pseudo-random integer that is greater than or equal to 0, and less than limit, which is very useful for creating random array index values.
  • long nextLong(): The same as nextInt() except values are of type long.
  • float nextFloat(): Returns a pseudo-random float value. Values are uniformly distributed across the range 0.0f to 1.0f, excluding 1.0f.
  • double nextDouble(): Returns a pseudo-random number of type double. Values are uniformly distributed across the range 0.0 to 1.0, excluding 1.0.
  • double nextGaussian(): Returns a pseudo-random number selected from a Gaussian distribution. Values generated have a mean of 0.0 and a standard deviation of 1.0.
  • void nextBytes(byte[] bytes): Fills the array, bytes, with pseudo-random values.
  • boolean nextBoolean(): Returns a pseudo-random boolean value with true and false being equally probable.
  • void setSeed(long seed): Resets the random number generator to generate values using the value passed as an argument as a starting seed for the algorithm.

To produce a pseudo-random number of a particular type, you just call the appropriate method for a Random object. You can repeat a sequence of numbers that has been generated by a Random object with a given seed by calling the setSeed() method with the original seed value as the argument.
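Here is one way you might replay a sequence with setSeed(). This is only a sketch, and the names ReplaySketch, throwDice(), and replayMatches() are invented for the purpose.

```java
import java.util.Arrays;
import java.util.Random;

class ReplaySketch {
  // Draws count die values, each from 1 to 6, from the generator
  static int[] throwDice(Random generator, int count) {
    int[] values = new int[count];
    for (int i = 0; i < count; ++i) {
      values[i] = 1 + generator.nextInt(6);
    }
    return values;
  }

  // True if resetting the seed reproduces the original values
  static boolean replayMatches(long seed) {
    Random generator = new Random(seed);
    int[] firstRun = throwDice(generator, 10);
    generator.setSeed(seed);               // Back to the start of the sequence
    int[] secondRun = throwDice(generator, 10);
    return Arrays.equals(firstRun, secondRun);
  }

  public static void main(String[] args) {
    System.out.println(replayMatches(997L));        // prints true
  }
}
```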

You can give the Random class an outing with a simple program that simulates throwing a pair of dice. The program allows you six throws to try to get a double six.

TRY IT OUT: Using Random Objects

Here’s the program:

import java.util.Random;
 
public class Dice {
  public static void main(String[] args) {
    System.out.println("You have six throws of a pair of dice.\n" +
               "The objective is to get a double six. Here goes...\n");
 
    Random diceValues = new Random();               // Random number generator
    String[] goes = {"First",  "Second", "Third",
                     "Fourth", "Fifth",  "Sixth"};
    int die1 = 0;                                   // First die value
    int die2 = 0;                                   // Second die value
 
    for(String go : goes)  {
      die1 = 1 + diceValues.nextInt(6);             // Number from 1 to 6
      die2 = 1 + diceValues.nextInt(6);             // Number from 1 to 6
      System.out.println(go + " throw: " + die1 + ", " + die2);
 
      if(die1 + die2 == 12) {                       // Is it double 6?
        System.out.println("    You win!!");        // Yes !!!
        return;
      }
    }
    System.out.println("Sorry, you lost...");
    return;
  }
}

Dice.java

If you compile and run this program you should get output that looks something like this:

You have six throws of a pair of dice.
The objective is to get a double six. Here goes...
 
First throw: 3, 2
Second throw: 1, 1
Third throw: 1, 2
Fourth throw: 5, 3
Fifth throw: 2, 2
Sixth throw: 6, 4
Sorry, you lost...
 

How It Works

You use a random number generator that you create using the default constructor, so it is seeded with the current time and produces a different sequence of values each time the program is run. You simulate throwing the dice in the for loop. For each throw you need a random number between 1 and 6 to be generated for each die. The easiest way to produce this is to add 1 to the value returned by the nextInt() method when you pass 6 as the argument. If you want to make a meal of it, you could obtain the same result by using the following statement:

die1 = 1 + Math.abs(diceValues.nextInt() % 6);      // Number from 1 to 6
 

Remember that the pseudo-random integer values that you get from the no-argument version of the nextInt() method are uniformly distributed across the whole range of possible values for type int, positive and negative. The remainder after dividing such a value by 6 is therefore between -5 and 5. That’s why you need to use the abs() method from the Math class here to make sure you end up with a positive die value: it converts the remainder to a value between 0 and 5, and adding 1 to this produces the result you want. Take the remainder before you call abs() rather than after, because Math.abs(Integer.MIN_VALUE) is still negative.


NOTE Remember that the chance of throwing a double six is 1 in 36 (odds of 35:1 against), so with six throws per run you succeed only about once in every six or seven times you run the example.
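If you want to check the arithmetic, the chance of at least one double six in six throws is 1 minus the chance of missing six times in a row. A short sketch, using the invented names DoubleSixOdds and chanceOfWin():

```java
class DoubleSixOdds {
  // Probability of at least one double six in the given number of throws
  static double chanceOfWin(int throwsAllowed) {
    double missOnce = 35.0 / 36.0;         // Chance that one throw is not a double six
    return 1.0 - Math.pow(missOnce, throwsAllowed);
  }

  public static void main(String[] args) {
    // The value is roughly 0.156, a little under one run in six
    System.out.printf("Chance of winning: %.4f%n", chanceOfWin(6));
  }
}
```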

DATES AND TIMES

Quite a few classes in the java.util package are involved with dates and times, including the Date class, the Calendar class, and the GregorianCalendar class. In spite of the class name, a Date class object actually defines a particular instant in time to the nearest millisecond, measured from January 1, 1970, 00:00:00 GMT. Because it is relative to a particular instant in time, it also corresponds to a date. The Calendar class is the base class for GregorianCalendar, which represents the sort of day/month/year calendar everybody is used to and also provides methods for obtaining day, month, and year information from a Date object. A Calendar object is always set to a particular date — a particular instant on a particular date to be precise — but you can change it by various means. From this standpoint a GregorianCalendar object is more like one of those desk calendars that just show one date, and you can flip over the days, months, or years to show another date.

You also have the TimeZone class that defines a time zone that can be used in conjunction with a calendar and that you can use to specify the rules for clock changes due to daylight saving time. The ramifications of handling dates and times are immense so you are only able to dabble here, but at least you get the basic ideas. Let’s look at Date objects first.

The Date Class

A Date class object represents a given date and time. You have two Date constructors:

  • Date() creates an object based on the current time of your computer clock to the nearest millisecond.
  • Date(long time) creates an object based on time, which is the number of milliseconds since 00:00:00 GMT on January 1, 1970.

With either constructor, you create an object that represents a specific instant in time to the nearest millisecond. Carrying dates around as the number of milliseconds since the dawn of the year 1970 won’t grab you as being incredibly user-friendly — but I come back to how you can better interpret a Date object in a moment. The Date class provides three methods for comparing objects that return a boolean value:

  • after(Date earlier) returns true if the current object represents a date that’s later than the date represented by the argument and returns false otherwise.
  • before(Date later) returns true if the current object represents a date that’s earlier than the date represented by the argument and returns false otherwise.
  • equals(Object aDate) returns true if the current object and the argument represent the same date and time and returns false otherwise.

The Date class implements the Comparable<Date> interface so you have the compareTo() method available. As you’ve seen in other contexts, this method returns a negative integer, zero, or a positive integer depending on whether the current object is less than, equal to, or greater than the argument. The presence of this method in the class means that you can use the sort() method in the Arrays class to sort an array of Date objects, or the sort() method in the Collections class to sort a collection of dates. Because the hashCode() method is also implemented for the class, you have all you need to use Date objects as keys in a hash map.
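The comparison methods and the sorting support can be tried with a few Date objects created from explicit millisecond values. This is a sketch only; DateOrderSketch and sortedSample() are names chosen for illustration.

```java
import java.util.Arrays;
import java.util.Date;

class DateOrderSketch {
  // Sorts three dates into ascending order using Date.compareTo()
  static Date[] sortedSample() {
    Date[] dates = { new Date(2000L), new Date(0L), new Date(1000L) };
    Arrays.sort(dates);
    return dates;
  }

  public static void main(String[] args) {
    Date epoch = new Date(0L);             // 00:00:00 GMT on January 1, 1970
    Date later = new Date(1000L);          // One second later

    System.out.println(later.after(epoch));             // prints true
    System.out.println(later.before(epoch));            // prints false
    System.out.println(epoch.equals(new Date(0L)));     // prints true
    System.out.println(epoch.compareTo(later) < 0);     // prints true
    System.out.println(sortedSample()[0].getTime());    // prints 0
  }
}
```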

Interpreting Date Objects

The DateFormat class is an abstract class that you can use to create meaningful String representations of Date objects. It isn’t in the java.util package though — it’s defined in the package java.text. You have four standard representations for the date and the time that are identified by constants defined in the DateFormat class. The effects of these vary in different countries, because the representation for the date and the time reflects the conventions of those countries. The constants in the DateFormat class defining the four formats are shown in Table 15-1.

TABLE 15-1: Date Formats

DATE FORMAT   DESCRIPTION
SHORT         A completely numeric representation for a date or a time, such as 2/2/97 or 4:15 a.m.
MEDIUM        A longer representation than SHORT, such as 5-Dec-97
LONG          A longer representation than MEDIUM, such as December 5, 1997
FULL          A comprehensive representation of the date or the time, such as Friday, December 5, 1997 AD or 4:45:52 PST (Pacific Standard Time)

A java.util.Locale object identifies information that is specific to a country, a region, or a language. You can define a Locale object for a specific country, for a specific language, for a country and a language, or for a country and a language and a variant, the latter being a vendor- or browser-specific code such as WIN or MAC. When you are creating a Locale object, you use ISO codes to specify the language or the country (or both). The language codes are defined by ISO-639. Countries are specified by the country codes in the standard ISO-3166. You can find the country codes on the Internet at www.iso.org/iso/country_codes.

You can also get a list of the country codes as an array of String objects by calling the static getISOCountries() method. For example:

String[] countryCodes = java.util.Locale.getISOCountries();
 

You can find the language codes at www.loc.gov/standards/iso639-2/php/English_list.php.

You can also get the language codes that are defined by the standard as an array of String objects:

String[] languages = java.util.Locale.getISOLanguages();
 

For some countries, the easiest way to specify the locale, if you don’t have the ISO codes on the tip of your tongue, is to use one of the static Locale objects defined within the Locale class:

image

The Locale class also defines static Locale objects that represent languages:

image

Because the DateFormat class is abstract, you can’t create objects of the class directly, but you can obtain DateFormat objects by using static methods that are defined in the class, each of which returns a value of type DateFormat. A DateFormat object encapsulates a Locale and an integer date style. The style is defined by one of the constants defined in the DateFormat class, SHORT, MEDIUM, LONG, or FULL that you saw earlier.

Each Locale object has a default style that matches conventions that apply for the country or language it represents.

You can create DateFormat instances that can format a Date object as a time, as a date, or as a date and a time. The static methods that create DateFormat objects of various kinds are getTimeInstance() that returns a time formatter, getDateInstance() that returns a date formatter, and getDateTimeInstance() that returns an object that can format the date and the time. The first two come in three flavors, a no-arg version where you get a formatter for the default locale and style, a single argument version where you supply the style for the default locale, and a version that accepts a style argument and a Locale argument. The getDateTimeInstance() also comes in three versions, the no-arg version that creates a formatter for the default locale and the default date and time style, a version where you supply two arguments that are the date style and the time style, and a version that requires three arguments that are the date style, the time style, and a Locale object for the locale.
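The different flavors of these factory methods might be exercised as in the following sketch. The class name FormatterFlavors and the method describe() are invented here, and the strings you get depend on your default locale and on the version of Java you are running, so no particular output is shown.

```java
import java.text.DateFormat;
import java.util.Date;
import java.util.Locale;

class FormatterFlavors {
  // Formats one instant with four differently configured formatters
  static String[] describe(Date when) {
    // No-arg version: default style and default locale
    DateFormat time = DateFormat.getTimeInstance();
    // One argument: the style, with the default locale
    DateFormat date = DateFormat.getDateInstance(DateFormat.LONG);
    // Two arguments: the style and the locale
    DateFormat french = DateFormat.getDateInstance(DateFormat.LONG, Locale.FRANCE);
    // Three arguments: date style, time style, and locale
    DateFormat both = DateFormat.getDateTimeInstance(DateFormat.SHORT,
                                                     DateFormat.MEDIUM,
                                                     Locale.UK);
    return new String[] { time.format(when), date.format(when),
                          french.format(when), both.format(when) };
  }

  public static void main(String[] args) {
    for (String text : describe(new Date())) {
      System.out.println(text);
    }
  }
}
```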

When you’ve obtained a DateFormat object for the country and the style that you want, and the sort of data you want to format — the date or the time or both — you’re ready to produce a String from the Date object.

You just pass the Date object to the format() method for the DateFormat object. For example:

Date today = new Date();               // Object for now - today's date
DateFormat fmt = DateFormat.getDateTimeInstance(DateFormat.FULL,  // Date style
                                                DateFormat.FULL,  // Time style
                                                Locale.US);       // Locale
String formatted = fmt.format(today);
 

The first statement creates a Date object that represents the instant in time when the call to the Date constructor executes. The second statement creates a DateFormat object that can format the date and time encapsulated by a Date object. In this case you specify the formatting style for the date and the time to be the same, the FULL constant in the DateFormat class. This provides the most detailed specification of the date and time. The third argument, Locale.US, determines that the formatting should correspond to that required for the United States. The Locale class defines constants for other major countries and languages. The third statement applies the format() method of the fmt object to the Date object. After executing these statements, the String variable formatted contains a full representation of the date and the time when the Date object today was created.

You can try out some dates and formats in an example.

TRY IT OUT: Producing Dates and Times

This example shows the four different date formats for four countries:

// Trying date formatting
import java.util.Locale;
import java.text.DateFormat;
import java.util.Date;
import static java.util.Locale.*;        // Import names of constants
import static java.text.DateFormat.*;    // Import names of constants
 
public class TryDateFormats {
  public enum Style {FULL, LONG, MEDIUM, SHORT}
 
  public static void main(String[] args) {
    Date today = new Date();
    Locale[] locales = {US, UK, GERMANY, FRANCE};
 
    // Output the date for each locale in four styles
    DateFormat fmt = null;
    for(Locale locale : locales) {
      System.out.println("\nThe Date for " +
                         locale.getDisplayCountry() + ":");
      for (Style style : Style.values()) {
        fmt = DateFormat.getDateInstance(style.ordinal(), locale);
        System.out.println( "  In " + style +
                            " is " + fmt.format(today));
      }
    }
  }
}
 

TryDateFormats.java

When I compiled and ran this it produced the following output:

The Date for United States:
  In FULL is Monday, June 27, 2011
  In LONG is June 27, 2011
  In MEDIUM is Jun 27, 2011
  In SHORT is 6/27/11
 
The Date for United Kingdom:
  In FULL is Monday, 27 June 2011
  In LONG is 27 June 2011
  In MEDIUM is 27-Jun-2011
  In SHORT is 27/06/11
 
The Date for Germany:
  In FULL is Montag, 27. Juni 2011
  In LONG is 27. Juni 2011
  In MEDIUM is 27.06.2011
  In SHORT is 27.06.11
 
The Date for France:
  In FULL is lundi 27 juin 2011
  In LONG is 27 juin 2011
  In MEDIUM is 27 juin 2011
  In SHORT is 27/06/11
 

How It Works

By statically importing the constants from the Locale and DateFormat classes, you obviate the need to qualify the constants in the program and thus make the code a little less cluttered. The nested Style enum defines the four possible styles. The program creates a Date object for the current date and time and an array of Locale objects for four countries using values defined in the Locale class.

The output is produced in the nested for loops. The outer loop iterates over the countries, and the inner loop is a collection-based for loop that iterates over the styles for each country in the Style enum. The ordinal() method for an enum value returns the ordinal for the value in the enumeration. You use this to specify the style as the first argument to the getDateInstance() method. A DateFormat object is created for each style and country combination. Calling the format() method for the DateFormat object produces the date string in the inner call to println().

You could change the program in a couple of ways. You could initialize the locales[] array with DateFormat.getAvailableLocales(). This returns an array of type Locale containing all of the supported locales, but be warned: there are a lot of them. You may find that the characters won’t display for some countries because your machine doesn’t support the country-specific character set. You could also use the method getTimeInstance() or getDateTimeInstance() instead of getDateInstance() to see what sort of output they generate.

Under the covers, a DateFormat object contains a DateFormatSymbols object that contains all the strings for the names of days of the week and other fixed information related to time and dates. This class is also in the java.text package. Normally you don’t use the DateFormatSymbols class directly, but it can be useful when all you want are the days of the week.

Obtaining a Date Object from a String

The parse() method for a DateFormat object interprets a String object argument as a date and time, and returns a Date object corresponding to the date and the time. The parse() method throws a ParseException if the String object can’t be converted to a Date object, so you must call it within a try block.

The String argument to the parse() method must correspond to the country and style that you used when you obtained the DateFormat object. For example, the following code parses the string properly:

Date aDate;
DateFormat fmt = DateFormat.getDateInstance(DateFormat.FULL, Locale.US);
try {
  aDate = fmt.parse("Saturday, July 4, 1998 ");
  System.out.println("The Date string is: " + fmt.format(aDate));
 
} catch(java.text.ParseException e) {
  System.out.println(e);
}
 

This works because the string is what would be produced by the locale and style. If you omit the day from the string, or you use the LONG style or a different locale, a ParseException is thrown.
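To see the difference between a string that matches the formatter and one that doesn’t, you could wrap the call to parse() in a helper that reports success or failure. The following is a sketch under that assumption; ParseSketch and parses() are hypothetical names.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.util.Locale;

class ParseSketch {
  // True if the formatter can interpret the text as a date
  static boolean parses(DateFormat fmt, String text) {
    try {
      fmt.parse(text);
      return true;
    } catch (ParseException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    DateFormat full = DateFormat.getDateInstance(DateFormat.FULL, Locale.US);

    // Matches the FULL style for the US locale
    System.out.println(parses(full, "Saturday, July 4, 1998"));
    // The day of the week is missing, so this does not match the FULL style
    System.out.println(parses(full, "July 4, 1998"));
  }
}
```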

Gregorian Calendars

The Gregorian calendar is the calendar generally in use today in the western world and is represented by an object of the GregorianCalendar class. A GregorianCalendar object encapsulates time zone information, as well as date and time data. You have no fewer than seven constructors for GregorianCalendar objects, from the default that creates a calendar with the current date and time in the default locale for your machine through to a constructor specifying the year, month, day, hour, minute, and second. The default suits most situations:

GregorianCalendar calendar = new GregorianCalendar();
 

This object is set to the current instant in time, and you can access this as a Date object by calling its getTime() method:

Date now = calendar.getTime();
 

You can create a GregorianCalendar object encapsulating a specific date and/or time with any of the following constructors:

GregorianCalendar(int year, int month, int day)
GregorianCalendar(int year, int month, int day, int hour, int minute)
GregorianCalendar(int year, int month, int day, int hour, int minute, int second)
 

The day argument is the day within the month, so the value can be from 1 to 28, 29, 30, or 31, depending on the month and whether it’s a leap year or not. The month value is zero-based so January is 0 and December is 11.

The GregorianCalendar class is derived from the abstract Calendar class from which it inherits a large number of methods and static constants for use with these methods. The constants include month values with the names JANUARY to DECEMBER so you could create a calendar object with the statement:

GregorianCalendar calendar = new GregorianCalendar(1967, Calendar.MARCH, 10);
 

If you statically import the constant members of the Calendar class, you are able to use constants such as MARCH and DECEMBER without the need to qualify them with the class name. The time zone and locale are the defaults for the computer on which this statement executes. If you want to specify a time zone, there is a GregorianCalendar constructor that accepts an argument of type java.util.TimeZone. You can get the default TimeZone object by calling the static getDefault() method, but if you are going to the trouble of specifying a time zone, you probably want something other than the default. To create a particular time zone you need to know its ID. This is a string specifying a region or country plus a location. For example, here are some time zone IDs:

"Europe/Stockholm" "Asia/Novosibirsk"
"Pacific/Guam" "Antarctica/Palmer"
"Atlantic/South_Georgia" "Africa/Accra"
"America/Chicago" "Europe/London"

To obtain a reference to a TimeZone object corresponding to a given time zone ID, you pass the ID to the static getTimeZone() method. For example, we could create a Calendar object for the Chicago time zone like this:

GregorianCalendar calendar =
      new GregorianCalendar(TimeZone.getTimeZone("America/Chicago"));
 

If you want to know what all the time zone IDs are, you could list them like this:

String[] ids = TimeZone.getAvailableIDs();
for(String id : ids) {
  System.out.println(id);
}
 

Be prepared for a lot of output though. There are well over 500 time zone IDs.

The calendar created from a TimeZone object has the default locale. If you want to specify the locale explicitly, you have a constructor that accepts a Locale reference as the second argument. For example:

GregorianCalendar calendar =
      new GregorianCalendar(TimeZone.getTimeZone("America/Chicago"), Locale.US);
 

You can also create a Calendar object from a locale:

GregorianCalendar calendar = new GregorianCalendar(Locale.UK);
 

This creates a calendar set to the current time in the default time zone for your machine, using the conventions of the UK locale.
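The effect of the time zone is easiest to see by viewing one instant through calendars in two different zones. In this sketch the names ZoneSketch and hourIn() are invented, and the instant chosen is the zero point for Date objects.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;

class ZoneSketch {
  // Hour of the day for the given instant as seen in the given time zone
  static int hourIn(String zoneId, Date instant) {
    GregorianCalendar calendar =
        new GregorianCalendar(TimeZone.getTimeZone(zoneId));
    calendar.setTime(instant);
    return calendar.get(Calendar.HOUR_OF_DAY);
  }

  public static void main(String[] args) {
    Date epoch = new Date(0L);             // 00:00:00 GMT on January 1, 1970

    System.out.println(hourIn("GMT", epoch));               // prints 0
    // Chicago is six hours behind GMT in winter, so it is still the previous evening
    System.out.println(hourIn("America/Chicago", epoch));   // prints 18
  }
}
```

The Date object is the same in both cases; only the calendar’s interpretation of it changes.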

Setting the Date and Time

If you have a Date object available, you can pass it to the setTime() method for a GregorianCalendar object to set it to the time specified by the Date object:

GregorianCalendar calendar = new GregorianCalendar();
calendar.setTime(date);
 

More typically you will want to set the date and/or time with explicit values such as day, month, and year, and you have several overloaded versions of the set() method for setting various components of the date and time. These are inherited in the GregorianCalendar class from its superclass, the Calendar class. You can set a GregorianCalendar object to a particular date like this:

GregorianCalendar calendar = new GregorianCalendar();
calendar.set(1995, 10, 29);                  // Date set to 29th November 1995
 

The three arguments to the set() method here are the year, the month, and the day as type int. You need to take care with this method because it’s easy to forget that the month is zero-based, with January specified by 0. Note that the fields reflecting the time setting within the day are not changed. They remain at whatever they were. You can reset all fields for a GregorianCalendar object to undefined by calling its clear() method.
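If you want the time fields to start from a clean slate rather than retaining the current time, one approach is to call clear() before set(). The following sketch assumes that is what you want; SetSketch and november29() are illustrative names.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

class SetSketch {
  // Builds a calendar for 29th November 1995 with no time of day carried over
  static GregorianCalendar november29() {
    GregorianCalendar calendar = new GregorianCalendar();
    calendar.clear();                              // Discard the current date and time
    calendar.set(1995, Calendar.NOVEMBER, 29);     // Same as set(1995, 10, 29)
    return calendar;
  }

  public static void main(String[] args) {
    GregorianCalendar calendar = november29();
    System.out.println(calendar.get(Calendar.YEAR));          // prints 1995
    System.out.println(calendar.get(Calendar.MONTH));         // prints 10
    System.out.println(calendar.get(Calendar.DAY_OF_MONTH));  // prints 29
  }
}
```

Using the named constant Calendar.NOVEMBER instead of 10 avoids the off-by-one mistake that the zero-based month invites.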

The other versions of the set() method are:

set(int year, int month, int day, int hour, int minute)
set(int year, int month, int day, int hour, int minute, int second)
set(int field, int value)
 

It’s obvious what the first two of these do. In each case the fields not explicitly set are left at their original values. The third version of set() sets a field specified by one of the integer constants defined in the Calendar class for this purpose (shown in Table 15-2):

TABLE 15-2: Calendar Field Setting Options

FIELD                  VALUE
AM_PM                  Can have the values AM or PM, which correspond to values of 0 and 1
DAY_OF_WEEK            Can have the values SUNDAY, MONDAY, and so on, through to SATURDAY, which correspond to values of 1 to 7
DAY_OF_WEEK_IN_MONTH   Ordinal number for the day of the week in the current month
DAY_OF_YEAR            Can be set to a value from 1 to 366
MONTH                  Can be set to a value of JANUARY, FEBRUARY, and so on, through to DECEMBER, corresponding to values of 0 to 11
DAY_OF_MONTH or DATE   Can be set to a value from 1 to 31
WEEK_OF_MONTH          Can be set to a value from 1 to 6
WEEK_OF_YEAR           Can be set to a value from 1 to 54
HOUR_OF_DAY            A value from 0 to 23
HOUR                   A value from 0 to 11 representing the current hour in the a.m. or p.m.; noon and midnight are represented by 0, not 12
MINUTE                 The current minute in the current hour, a value from 0 to 59
SECOND                 The second in the current minute, 0 to 59
MILLISECOND            The millisecond in the current second, 0 to 999
YEAR                   The current year, for example, 2011
ERA                    Can be set to either GregorianCalendar.BC or GregorianCalendar.AD (both values being defined in the GregorianCalendar class)
ZONE_OFFSET            A millisecond value indicating the offset from GMT
DST_OFFSET             A millisecond value indicating the offset for daylight saving time in the current time zone

Qualifying the names of these constants with the class name GregorianCalendar can make the code look cumbersome but you can use static import for the constants to simplify things:

import static java.util.Calendar.*;
import static java.util.GregorianCalendar.*;
 

The field, month, and day constants are defined in the Calendar class, whereas BC and AD are defined in the GregorianCalendar class. Importing from both classes gives you access to all the constants you can use with the GregorianCalendar class and makes it clear where each name originates.

With these two import statements in effect, you can write statements like this:

GregorianCalendar calendar = new GregorianCalendar();
calendar.set(DAY_OF_WEEK, TUESDAY);
 

Getting Date and Time Information

You can get information such as the day, the month, and the year from a GregorianCalendar object by using the get() method and specifying what you want by the argument. The possible arguments to the get() method are those defined in the earlier table of constants identifying calendar fields. All values returned are of type int. For example, you could get the day of the week with the statement:

int day = calendar.get(Calendar.DAY_OF_WEEK);
 

You could now test this for a particular day using the constants defined in the class:

if(day == Calendar.SATURDAY)
  // Go to game...
 

Because the values for day are integers, you could equally well use a switch statement:

switch(day) {
  case Calendar.MONDAY:
  // do the washing...
  break;
  case Calendar.TUESDAY:
  // do something else...
  break;
  // etc...
}
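
As a minimal sketch of get() in use, the following class reads the day of the week for a fixed date and classifies it with a switch. The class name DayOfWeekDemo and both of its helper methods are invented for illustration; the sketch assumes the month is supplied as 1 to 12 and converts it to the 0-based value that the constructor expects.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

// Hypothetical helper class, not one of the chapter's examples
class DayOfWeekDemo {
  // Returns a Calendar day constant (SUNDAY = 1 through SATURDAY = 7).
  // The month argument is 1 to 12, so it is adjusted for the constructor.
  static int dayOfWeek(int year, int month, int day) {
    GregorianCalendar calendar = new GregorianCalendar(year, month - 1, day);
    return calendar.get(Calendar.DAY_OF_WEEK);
  }

  // Classifies a day constant using a switch on the int value
  static String describe(int day) {
    switch(day) {
      case Calendar.SATURDAY:
      case Calendar.SUNDAY:
        return "weekend";
      default:
        return "weekday";
    }
  }

  public static void main(String[] args) {
    int day = dayOfWeek(2011, 12, 5);             // 5 December 2011
    System.out.println(day == Calendar.MONDAY);   // true
    System.out.println(describe(day));            // weekday
  }
}
```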
 

Modifying Dates and Times

Of course, you might want to alter the current instant in the calendar, and for this you have the add() method. The first argument determines what units you are adding in, and you specify this argument using the same field designators as in the previous list. For example, you can add 14 to the year with the statement:

calendar.add(Calendar.YEAR, 14);  // 14 years into the future
 

To go into the past, you just make the second argument negative:

calendar.add(Calendar.MONTH, -6);  // Go back 6 months
 

You can increment or decrement a field of a calendar by 1 using the roll() method. This method modifies the field specified by the first argument by +1 or −1, depending on whether the second argument is true or false. For example, to decrement the current month in the object calendar, you would write the following:

calendar.roll(Calendar.MONTH, false);  // Go back a month
 

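Unlike add(), the roll() method does not change larger fields. If the original month were January, rolling it back by one would make the date December of the same year, not the previous year. Smaller fields can be affected, though: the day of the month is adjusted to the closest valid value if it does not exist in the new month.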

Another version of the roll() method allows you to roll a field by a specified signed integer amount as the second argument. A negative value rolls down and a positive value rolls up.

Of course, having modified a GregorianCalendar object, you can get the current instant back as a Date object using the getTime() method that we saw earlier. You can then use a DateFormat object to present this in a readable form.
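
The difference between add() and roll() is easiest to see side by side. The following sketch steps back one month from 15 January using each method and reports the resulting year. The class name AddVersusRoll and its two methods are assumptions made for this illustration; the behavior relies on the documented rule that roll() changes a field without changing larger fields.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

// Hypothetical class for comparing add() and roll()
class AddVersusRoll {
  // Year after going back one month from 15 January using add()
  static int yearAfterAdd(int year) {
    GregorianCalendar calendar = new GregorianCalendar(year, Calendar.JANUARY, 15);
    calendar.add(Calendar.MONTH, -1);       // Carries into the YEAR field
    return calendar.get(Calendar.YEAR);
  }

  // Year after going back one month from 15 January using roll()
  static int yearAfterRoll(int year) {
    GregorianCalendar calendar = new GregorianCalendar(year, Calendar.JANUARY, 15);
    calendar.roll(Calendar.MONTH, false);   // MONTH wraps around; YEAR is untouched
    return calendar.get(Calendar.YEAR);
  }

  public static void main(String[] args) {
    System.out.println(yearAfterAdd(2011));   // 2010
    System.out.println(yearAfterRoll(2011));  // 2011
  }
}
```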

Comparing Calendars

Checking the relationship between dates represented by Calendar objects is a fairly fundamental requirement and you have four methods available for comparing them (shown in Table 15-3):

TABLE 15-3: Methods for Comparing Calendar Objects

METHOD DESCRIPTION
before() Returns true if the current object corresponds to a time before that of the Calendar object passed as an argument. Note that this implies a true return can occur if the date is the same but the time is different.
after() Returns true if the current object corresponds to a time after that of the Calendar object passed as an argument.
equals() Returns true if the current object corresponds to a time that is identical to that of the Calendar object passed as an argument.
compareTo(Calendar c) Returns a value of type int that is negative, zero, or positive depending on whether the time value for the current object is less than, equal to, or greater than the time value for the argument.

These are very simple to use. To determine whether the object thisDate defines a time that precedes the time defined by the object today, you could write:

if(thisDate.before(today)) {
  // Do something...
}
 

Alternatively you could write the same thing as:

if(today.after(thisDate)) {
  // Do something...
}
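
The comparison methods can be sketched together as follows. The class name CompareCalendars and the relation() helper are hypothetical; the dates are fixed so the result does not depend on when the code runs.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

// Hypothetical class showing before(), after(), and compareTo()
class CompareCalendars {
  // Describes how the first calendar's time relates to the second's
  static String relation(Calendar first, Calendar second) {
    int result = first.compareTo(second);
    if(result < 0) {
      return "before";
    } else if(result > 0) {
      return "after";
    }
    return "same instant";
  }

  public static void main(String[] args) {
    Calendar thisDate = new GregorianCalendar(2011, Calendar.JANUARY, 1);
    Calendar today = new GregorianCalendar(2011, Calendar.JUNE, 15);
    System.out.println(thisDate.before(today));      // true
    System.out.println(today.after(thisDate));       // true
    System.out.println(relation(thisDate, today));   // before
  }
}
```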
 

It’s time to look at how we can use calendars.

TRY IT OUT: Using a Calendar

This example deduces important information about when you were born. It uses the FormattedInput class from Chapter 8 to get input from the keyboard, so copy this class and the source file for the InvalidUserInputException class to a new directory for the source files for this example. Here’s the code:

import java.util.GregorianCalendar;
import java.text.DateFormatSymbols;
import static java.util.Calendar.*;
 
class TryCalendar {
  public static void main(String[] args) {
    FormattedInput in = new FormattedInput();
 
    // Get the date of birth from the keyboard
    int day = 0, month = 0, year = 0;
    System.out.println("Enter your birth date as dd mm yyyy: ");
    try {
      day = in.readInt();
      month = in.readInt();
      year = in.readInt();
    } catch(InvalidUserInputException e) {
      System.out.println("Invalid input - terminating...");
      System.exit(1);
    }
 
    // Create birth date calendar - month is 0 to 11
    GregorianCalendar birthdate = new GregorianCalendar(year, month-1,day);
    GregorianCalendar today = new GregorianCalendar();        // Today's date
 
    // Create this year's birthday
    GregorianCalendar birthday = new GregorianCalendar(
                                        today.get(YEAR),
                                        birthdate.get(MONTH),
                                        birthdate.get(DATE));
 
    int age = today.get(YEAR) - birthdate.get(YEAR);
 
    String[] weekdays = new DateFormatSymbols().getWeekdays(); // Get day names
 
    System.out.println("You were born on a " +
                        weekdays[birthdate.get(DAY_OF_WEEK)]);
    System.out.println("This year you " +
                        (birthday.after(today)   ?"will be " : "are ") +
                        age + " years old.");
    System.out.println("In " + today.get(YEAR) + " your birthday " +
                       (today.before(birthday)? "will be": "was") +
                       " on a "+ weekdays[birthday.get(DAY_OF_WEEK)] +".");
  }
}

TryCalendar.java

I got the following output:

Enter your birth date as dd mm yyyy: 
05 12 1974
You were born on a Thursday
This year you will be 37 years old.
In 2011 your birthday will be on a Monday.

How It Works

You start by prompting for the day, month, and year for a date of birth to be entered through the keyboard as integers. You then create a GregorianCalendar object corresponding to this date. Note the adjustment of the month — the constructor expects January to be specified as 0. You need a GregorianCalendar object for today’s date so you use the default constructor for this. To compute the age this year, you just have to subtract the year of birth from this year, both of which you get from the GregorianCalendar objects.

To get at the strings for the days of the week, you create a DateFormatSymbols object and call its getWeekdays() method. This returns an array of eight String objects, the first of which is empty to make it easy to index using day numbers from 1 to 7. The second element in the array contains "Sunday". You can also get the month names using the getMonths() method.

To display the day of the week for the date of birth, you call the get() method for the GregorianCalendar object birthdate and use the result to index the weekdays[] array. To determine the appropriate text in the next two output statements, you use the after() and before() methods for Calendar objects to compare today with the birthday date this year.
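
The layout of the array returned by getWeekdays() can be checked with a short sketch. The class name WeekdayNames is invented for illustration, and the sketch assumes an English locale is requested explicitly so that the names do not depend on the machine's default locale.

```java
import java.text.DateFormatSymbols;
import java.util.Calendar;
import java.util.Locale;

// Hypothetical class for inspecting the weekday name array
class WeekdayNames {
  // Requesting Locale.ENGLISH is an assumption made to keep the output predictable
  static String[] weekdays() {
    return new DateFormatSymbols(Locale.ENGLISH).getWeekdays();
  }

  // Looks up the name for a Calendar day constant such as Calendar.SUNDAY
  static String nameOf(int dayConstant) {
    return weekdays()[dayConstant];
  }

  public static void main(String[] args) {
    System.out.println(weekdays().length);            // 8
    System.out.println(nameOf(Calendar.SUNDAY));      // Sunday
    System.out.println(nameOf(Calendar.SATURDAY));    // Saturday
  }
}
```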

REGULAR EXPRESSIONS

You saw some elementary capability for searching strings when I discussed the String class in Chapter 4. You have much more sophisticated facilities for analyzing strings using search patterns known as regular expressions. Regular expressions are not unique to Java. Perl is perhaps better known for its support of regular expressions and C++ supports them too. Many word processors, especially on UNIX, support regular expressions, and there are specific utilities for regular expressions, too. Many IDEs such as JCreator and Microsoft Visual Studio also support regular expressions.

So what is a regular expression? A regular expression is simply a string that describes a pattern that you use to search for matches within some other string. It’s not simply a passive sequence of characters to be matched, though. A regular expression is essentially a mini-program for a specialized kind of computer called a state-machine. This isn’t a real machine but a piece of software specifically designed to interpret a regular expression and analyze a given string based on the operations implicit in a regular expression.

The regular expression capability in Java is implemented through two classes in the java.util.regex package: the Pattern class, which defines objects that encapsulate regular expressions, and the Matcher class, which defines an object that encapsulates a state-machine that can search a particular string using a given Pattern object. The java.util.regex package also defines the PatternSyntaxException class, which defines exception objects that are thrown when a syntax error is found when compiling a regular expression to create a Pattern object.

Using regular expressions in Java is basically very simple:

1. You create a Pattern object by passing a string containing a regular expression to the static compile() method in the Pattern class.

2. You then obtain a Matcher object, which can search a given string for the pattern, by calling the matcher() method for the Pattern object with the string that is to be searched as the argument.

3. You call the find() method (or some other methods, as you later see) for the Matcher object to search the string.

4. If the pattern is found, you query the Matcher object to discover the whereabouts of the pattern in the string and other information relating to the match.
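
The four steps above can be sketched in a few lines. The class name FirstMatch and the firstIndexOf() helper are hypothetical names chosen for this illustration, and returning -1 when nothing is found is simply a convention assumed here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class that walks through the four steps
class FirstMatch {
  // Returns the index of the first match of regex in text, or -1 if there is none
  static int firstIndexOf(String regex, String text) {
    Pattern pattern = Pattern.compile(regex);    // Step 1: compile the expression
    Matcher matcher = pattern.matcher(text);     // Step 2: obtain a Matcher for the text
    if(matcher.find()) {                         // Step 3: search for the pattern
      return matcher.start();                    // Step 4: query the match position
    }
    return -1;
  }

  public static void main(String[] args) {
    System.out.println(firstIndexOf("had", "Jones had had it"));   // 6
    System.out.println(firstIndexOf("had", "Jones has it"));       // -1
  }
}
```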

Although this is a straightforward process that is easy to code, the hard work is in defining the pattern to achieve the result that you want. This is an extensive topic because in their full glory regular expressions are immensely powerful and can be very complicated. There are books devoted entirely to this, so my aim is to give you enough of a bare-bones understanding of how regular expressions work that you are in a position to look into the subject in more depth if you need to. Although regular expressions can look quite fearsome, don’t be put off. They are always built step-by-step, so although the result may look complicated and obscure, they are not necessarily difficult to put together. Regular expressions are a lot of fun and a sure way to impress your friends and maybe confound your enemies.

Defining Regular Expressions

You may not have heard of regular expressions before reading this book and, therefore, may think you have never used them. If so, you are almost certainly wrong. Whenever you search a directory for files of a particular type, "*.java", for example, you are using a form of regular expression. However, to say that regular expressions can do much more than this is something of an understatement. To get an understanding of what you can do with regular expressions, you start at the bottom with the simplest kind of operation and work your way up to some of the more complex problems they can solve.

Creating a Pattern

In its most elementary form, a regular expression just does a simple search for a substring. For example, if you want to search a string for the word had, the regular expression is exactly that. The string that defines this particular regular expression is "had". Let’s use this as a vehicle for understanding the programming mechanism for using regular expressions. You create a Pattern object for the expression "had" like this:

Pattern had = Pattern.compile("had");
 

The static compile() method in the Pattern class returns a reference to a Pattern object that contains the compiled regular expression. The method throws a PatternSyntaxException if the argument is invalid. You don’t have to catch this exception as it is a subclass of RuntimeException and therefore is unchecked, but it is a good idea to do so to make sure the regular expression pattern is valid. The compilation process stores the regular expression in a Pattern object in a form that is ready to be processed by a Matcher state-machine.
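
Catching the exception might look like the following sketch. The class name SafeCompile and the choice to return null for an invalid expression are assumptions made for illustration; a real program might prefer to report the error differently.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical class that guards against an invalid regular expression
class SafeCompile {
  // Returns the compiled pattern, or null if regex is not a valid expression
  static Pattern compileOrNull(String regex) {
    try {
      return Pattern.compile(regex);
    } catch(PatternSyntaxException e) {
      System.out.println("Invalid regular expression: " + e.getDescription());
      return null;
    }
  }

  public static void main(String[] args) {
    System.out.println(compileOrNull("had") != null);    // true
    System.out.println(compileOrNull("h[aio") != null);  // false: the class is never closed
  }
}
```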

A further version of the compile() method enables you to control more closely how the pattern is applied when looking for a match. The second argument is a value of type int that specifies one or more of the following flags that are defined in the Pattern class (shown in Table 15-4):

TABLE 15-4: Flags Controlling Pattern Operation

FLAG DESCRIPTION
CASE_INSENSITIVE Matches ignoring case, but assumes only US-ASCII characters are being matched.
MULTILINE Enables the beginning or end of lines to be matched anywhere. Without this flag only the beginning and end of the entire sequence is matched.
UNICODE_CASE When this is specified in addition to CASE_INSENSITIVE, case-insensitive matching is consistent with the Unicode standard.
DOTALL Makes the expression "." (which you see shortly) match any character, including line terminators.
LITERAL Causes the string specifying a pattern to be treated as a sequence of literal characters, so escape sequences, for example, are not recognized as such.
CANON_EQ Matches taking account of canonical equivalence of combined characters. For example, some characters that have diacritics may be represented as a single character with a diacritic or as a base character followed by a diacritic character. This flag treats these as a match.
COMMENTS Allows whitespace and comments in a pattern. Comments in a pattern start with # so from the first # to the end of the line is ignored.
UNIX_LINES Enables UNIX lines mode, where only '\n' is recognized as a line terminator.
UNICODE_CHARACTER_CLASS Enables the Unicode version of predefined character classes.

All these flags are unique single-bit values within a value of type int so you can combine them by ORing them together or by simple addition. For example, you can specify the CASE_INSENSITIVE and the UNICODE_CASE flags with the following expression:

Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE
 

Or you can write this as:

Pattern.CASE_INSENSITIVE + Pattern.UNICODE_CASE
 

Beware of using addition when you want to add a flag to a variable representing an existing set of flags. If the flag already exists, addition produces the wrong result because adding the two corresponding bits results in a carry to the next bit. ORing always produces the correct result.

If you want to match "had" ignoring case, you could create the pattern with the following statement:

Pattern had = Pattern.compile("had", Pattern.CASE_INSENSITIVE);
 

In addition to the exception thrown by the first version of the method, this version throws an IllegalArgumentException if the second argument has bit values set that do not correspond to any of the flag constants defined in the Pattern class.
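
A short sketch can show both the effect of the flags and why ORing is safer than addition. The class name PatternFlags and its helper methods are hypothetical; the sketch relies only on each flag being a distinct single-bit value.

```java
import java.util.regex.Pattern;

// Hypothetical class for experimenting with pattern flags
class PatternFlags {
  // Searches text for regex, ignoring case according to the Unicode standard
  static boolean containsIgnoringCase(String regex, String text) {
    Pattern pattern = Pattern.compile(regex,
                          Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    return pattern.matcher(text).find();
  }

  // Tests whether a particular flag bit is set in a set of flags
  static boolean hasFlag(int flags, int flag) {
    return (flags & flag) != 0;
  }

  public static void main(String[] args) {
    System.out.println(containsIgnoringCase("had", "Had I known"));   // true

    int flags = Pattern.CASE_INSENSITIVE;
    int ored = flags | Pattern.CASE_INSENSITIVE;    // Flag already present: no change
    int added = flags + Pattern.CASE_INSENSITIVE;   // Carries into the next bit
    System.out.println(hasFlag(ored, Pattern.CASE_INSENSITIVE));      // true
    System.out.println(hasFlag(added, Pattern.CASE_INSENSITIVE));     // false
  }
}
```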

Creating a Matcher

After you have a Pattern object, you can create a Matcher object that can search a specified string, like this:

String sentence = "Smith, where Jones had had 'had', had had 'had had'.";
Matcher matchHad = had.matcher(sentence);
 

The first statement defines the string sentence that you want to search. To create the Matcher object, you call the matcher() method for the Pattern object with the string to be analyzed as the argument. This returns a Matcher object that can analyze the string that was passed to it. The parameter for the matcher() method is actually of type CharSequence. This is an interface that is implemented by the String, StringBuffer, and StringBuilder classes so you can pass a reference of any of these types to the method. The java.nio.CharBuffer class also implements CharSequence, so you can pass the contents of a CharBuffer to the method, too. This means that if you use a CharBuffer to hold character data you have read from a file, you can pass the data directly to the matcher() method to be searched.

An advantage of Java’s implementation of regular expressions is that you can reuse a Pattern object to create Matcher objects to search for the pattern in a variety of strings. To use the same pattern to search another string, you just call the matcher() method for the Pattern object with the new string as the argument. You then have a new Matcher object that you can use to search the new string.

You can also change the string that a Matcher object is to search by calling its reset() method with a new string as the argument. For example:

matchHad.reset("Had I known, I would not have eaten the haddock.");
 

This replaces the previous string, sentence, in the Matcher object, so it is now capable of searching the new string. Like the matcher() method in the Pattern class, the parameter type for the reset() method is CharSequence, so you can pass a reference of type String, StringBuffer, StringBuilder, or CharBuffer to it.
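
Reusing a Pattern and resetting a Matcher can be sketched as follows. The class name ReuseMatcher and the countMatches() helper are invented for illustration; the helper uses the find() method, which the next section describes in detail.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class showing one Matcher applied to two pieces of text
class ReuseMatcher {
  // Counts the matches in whatever text the matcher currently holds
  static int countMatches(Matcher matcher) {
    int count = 0;
    while(matcher.find()) {
      ++count;
    }
    return count;
  }

  public static void main(String[] args) {
    Pattern had = Pattern.compile("had");
    Matcher matchHad = had.matcher(
                 "Smith, where Jones had had 'had', had had 'had had'.");
    System.out.println(countMatches(matchHad));   // 7

    // Same Matcher, new text; a StringBuilder is a CharSequence too
    matchHad.reset(new StringBuilder(
                 "Had I known, I would not have eaten the haddock."));
    System.out.println(countMatches(matchHad));   // 1: only the "had" in haddock
  }
}
```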

Searching a String

Now that you have a Matcher object, you can use it to search the string. Calling the find() method for the Matcher object searches the string for the next occurrence of the pattern. If it finds the pattern, the method stores information about where it was found in the Matcher object and returns true. If it doesn’t find the pattern, it returns false. When the pattern has been found, calling the start() method for the Matcher object returns the index position in the string where the first character in the pattern was found. Calling the end() method returns the index position following the last character in the pattern. Both index values are returned as type int. Therefore, you could search for the first occurrence of the pattern like this:

if(m.find()) {
  System.out.println("Pattern found. Start: " + m.start() + " End: " + m.end());
} else {
  System.out.println("Pattern not found.");
}
 

Note that you must not call start() or end() for the Matcher object before you have succeeded in finding the pattern. Until a pattern has been matched, the Matcher object is in an undefined state and calling either of these methods results in an exception of type IllegalStateException being thrown.

You usually want to find all occurrences of a pattern in a string. When you call the find() method, searching starts either at the beginning of this matcher’s region, or at the first character not searched by a previous call to find(). Thus, you can easily find all occurrences of the pattern by searching in a loop like this:

while(m.find()) {
  System.out.println(" Start: " + m.start() + " End: " + m.end());
}
 

At the end of this loop the index position is at the character following the last occurrence of the pattern in the string. If you want to reset the index position to zero, you just call an overloaded version of reset() for the Matcher object that has no arguments:

m.reset();                             //Reset this matcher
 

This resets the Matcher object to its state before any search operations were carried out. To make sure you understand the searching process, let’s put it all together in an example.
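
Before the full example, here is a minimal sketch of the search loop that collects the start positions in a list, together with a check that calling start() before any match has been found throws IllegalStateException. The class name FindAll and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class that gathers every match position
class FindAll {
  // Returns the start index of every match of regex in text
  static List<Integer> startsOf(String regex, String text) {
    Matcher m = Pattern.compile(regex).matcher(text);
    List<Integer> starts = new ArrayList<>();
    while(m.find()) {
      starts.add(m.start());
    }
    return starts;
  }

  // Confirms that start() cannot be called before a successful match
  static boolean startBeforeFindThrows() {
    Matcher m = Pattern.compile("had").matcher("had");
    try {
      m.start();                       // No match has been attempted yet
      return false;
    } catch(IllegalStateException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    String str = "Smith, where Jones had had 'had', had had 'had had'.";
    System.out.println(startsOf("had", str));     // [19, 23, 28, 34, 38, 43, 47]
    System.out.println(startBeforeFindThrows());  // true
  }
}
```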

TRY IT OUT: Searching for a Substring

Here’s a complete example to search a string for a pattern:

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.Arrays;
 
class TryRegex {
  public static void main(String args[]) {
    //  A regex and a string in which to search are specified
    String regEx = "had";
    String str = "Smith, where Jones had had 'had', had had 'had had'.";
 
    // The matches in the output will be marked (fixed-width font required)
    char[]  marker = new char[str.length()];
    Arrays.fill(marker,' ');
    //  So we can later replace spaces with marker characters
 
    //  Obtain the required matcher
    Pattern pattern = Pattern.compile(regEx);
    Matcher m = pattern.matcher(str);
 
    // Find every match and mark it
    while( m.find() ){
      System.out.println(
                  "Pattern found at Start: " + m.start() + " End: " + m.end());
      Arrays.fill(marker,m.start(),m.end(),'^');
    }
 
    // Show the object string with matches marked under it
    System.out.println(str);
    System.out.println(marker);
  }
}

TryRegex.java

This produces the following output:

Pattern found at Start: 19 End: 22
Pattern found at Start: 23 End: 26
Pattern found at Start: 28 End: 31
Pattern found at Start: 34 End: 37
Pattern found at Start: 38 End: 41
Pattern found at Start: 43 End: 46
Pattern found at Start: 47 End: 50
Smith, where Jones had had 'had', had had 'had had'.
                   ^^^ ^^^  ^^^   ^^^ ^^^  ^^^ ^^^
 

How It Works

You first define a string, regEx, containing the regular expression, and a string, str, that you search:

    String regEx = "had";
    String str = "Smith, where Jones had had 'had', had had 'had had'.";
 

You also create an array, marker, of type char[] with the same number of elements as str, that you use to indicate where the pattern is found in the string:

    char[]  marker = new char[str.length()];
 

You fill the elements of the marker array with spaces initially using the static fill() method from the Arrays class:

    Arrays.fill(marker,' ');
 

Later you replace some of the spaces in the array with '^' to indicate where the pattern has been found in the original string.

After compiling the regular expression regEx into a Pattern object, pattern, you create a Matcher object, m, from pattern, which applies to the string str:

    Pattern pattern = Pattern.compile(regEx);
    Matcher m = pattern.matcher(str);
 

You then call the find() method for m in the while loop condition:

    while( m.find() ){
      System.out.println(
             "Pattern found at Start: " + m.start() + " End: " + m.end());
      Arrays.fill(marker, m.start(), m.end(), '^');
    }
 

This loop continues as long as the find() method returns true. On each iteration you output the index values returned by the start() and end() methods, which reflect the index position where the first character of the pattern was found, and the index position following the last character. You also insert the '^' character in the marker array at the index positions where the pattern was found — again using the fill() method. The loop ends when the find() method returns false, implying that there are no more occurrences of the pattern in the string.

When the loop ends you have found all occurrences of the pattern, so you output str with the contents of the marker array immediately below it. As long as the command-line output uses a fixed-width font, the '^' characters mark the positions where the pattern appears in the string.

You reuse this example as you delve into further options for regular expressions by plugging in different definitions for regEx and the string that is searched, str. The output is more economical if you delete or comment out the statement in the while loop that outputs the start and end index positions.

Matching an Entire String

On some occasions you want to try to match a pattern against an entire string — in other words, you want to establish that the complete string you are searching is a match for the pattern. Suppose you read an input value into your program as a string. This might be from the keyboard or possibly through a dialog box managing the entry data in the graphical user interface for an application. You might want to be sure that the input string is an integer, for example. If input should be of a particular form, you can use a regular expression to determine whether it is correct or not.

The matches() method for a Matcher object tries to match the entire input string with the pattern and returns true only if there is a match. The following code fragment demonstrates how this works:

String input = null;
// Read into input from some source...
 
Pattern yes = Pattern.compile("yes");
Matcher m = yes.matcher(input);
 
if(m.matches()) {                           // Check if input matches "yes"
  System.out.println("Input is yes.");
} else {
  System.out.println("Input is not yes.");
}
 

Of course, this illustration is trivial, but later you see how you can define more sophisticated patterns that can check for a range of possible input forms.
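
The distinction between matches() and find() can be sketched as follows. The class name WholeMatch and its two methods are invented for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical class contrasting a whole-string match with a search
class WholeMatch {
  private static final Pattern YES = Pattern.compile("yes");

  // True only if the complete input is exactly "yes"
  static boolean isYes(String input) {
    return YES.matcher(input).matches();
  }

  // True if "yes" appears anywhere in the input
  static boolean containsYes(String input) {
    return YES.matcher(input).find();
  }

  public static void main(String[] args) {
    System.out.println(isYes("yes"));                // true
    System.out.println(isYes("yes please"));         // false: extra characters
    System.out.println(containsYes("yes please"));   // true
  }
}
```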

Defining Sets of Characters

A regular expression can be made up of ordinary characters, which are upper- and lowercase letters and digits, plus sequences of meta-characters, which are characters that have a special meaning. The pattern in the previous example was just the word "had", but what if you wanted to search a string for occurrences of "hid" or "hod" as well as "had", or even any three-letter word beginning with "h" and ending with "d"?

You can deal with any of these possibilities with regular expressions. One option is to specify the middle character as a wildcard by using a period; a period is one example of a meta-character. This meta-character matches any character except end-of-line, so the regular expression "h.d" represents any sequence of three characters that starts with "h" and ends with "d". Try changing the definitions of regEx and str in the previous example to

  String regEx = "h.d";
  String str = "Ted and Ned Hodge hid their hod and huddled in the hedge.";
 

If you recompile and run the example again, the last two lines of output are

Ted and Ned Hodge hid their hod and huddled in the hedge.
                  ^^^       ^^^     ^^^            ^^^
 

You can see that you didn’t find "Hod" in Hodge because of the capital "H", but you found all the other 3-letter sequences beginning with "h" and ending with "d".

Of course, the regular expression "h.d" would also have found "hzd" or "hNd" if they had been present, which is not what you want. You can limit the possibilities by replacing the period with just the collection of characters you are looking for between square brackets, like this:

  String regEx = "h[aio]d";
 

The [aio] sequence of meta-characters defines what is called a simple class of characters, consisting in this case of "a", "i", or "o". Here the term class is used in the sense of a set of characters, not a class that defines a type. If you try this version of the regular expression in the previous example, the last two lines of output are:

Ted and Ned Hodge hid their hod and huddled in the hedge.
                  ^^^       ^^^
 

The regular expression now matches all 3-letter sequences that begin with "h" and end with "d" and have a middle letter of "a" or "i" or "o".

You can define character classes in a regular expression in a variety of ways. Table 15-5 gives some examples of the more useful forms:

TABLE 15-5: Character Classes in a Regular Expression

CLASS DESCRIPTION
[aeiou] This is a simple class that any of the characters between the square brackets match — in this example, any lowercase vowel. You used this form in the earlier code fragment to search for variations on "had".
[^aeiou] This represents any character except those appearing to the right of the ^ character between the square brackets. Thus, here you have specified any character that is not a lowercase vowel. Note this is any character, not any letter, so the expression "h[^aeiou]d" looks for "h!d" or "h9d" as well as "hxd" or "hWd". Of course, it rejects "had" or "hid" or any other form with a lowercase vowel as the middle letter.
[a-e] This defines an inclusive range — any of the letters "a" to "e" in this case. You can also specify multiple ranges. For example:
[a-cs-zA-E]
This corresponds to any of the characters from "a" to "c", from "s" to "z", or from "A" to "E".
If you want to specify that a position must contain a digit, you could use [0-9]. To specify that a position can be a letter or a digit, you could express it as [a-zA-Z0-9].

You can use any of these in combination with ordinary characters to form a regular expression. For example, suppose you want to search some text for any sequence beginning with "b", "c", or "d", with "a" as the middle letter, and ending with "d" or "t". You could define the regular expression to do this as:

String regEx = "[b-d]a[dt]";
 

This expression matches any occurrence of "bad", "cad", "dad", "bat", "cat", or "dat".
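
To see which sequences a character class actually picks out, you can collect the matched text rather than the positions. The following is a sketch only: the class name ClassMatch and the sample sentence are invented, and it uses the group() method of the Matcher class, which returns the text of the most recent match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class that lists the text of every match
class ClassMatch {
  static List<String> matchesIn(String regex, String text) {
    Matcher m = Pattern.compile(regex).matcher(text);
    List<String> found = new ArrayList<>();
    while(m.find()) {
      found.add(m.group());          // Text of the match just found
    }
    return found;
  }

  public static void main(String[] args) {
    String str = "Ted and Ned Hodge hid their hod and huddled in the hedge.";
    System.out.println(matchesIn("h.d", str));       // [hid, hod, hud, hed]
    System.out.println(matchesIn("h[aio]d", str));   // [hid, hod]
    System.out.println(matchesIn("[b-d]a[dt]",
                          "The bad cat sat on dad's hat"));   // [bad, cat, dad]
  }
}
```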

Logical Operators in Regular Expressions

You can use the && operator to combine classes that define sets of characters. This is particularly useful when you use it combined with the negation operator, ^, that appears in the second row of the table in the preceding section. For example, if you want to specify that any lowercase consonant is acceptable, you could write the expression that matches this as:

"[b-df-hj-np-tv-z]"
 

However, this can much more conveniently be expressed as the following pattern:

"[a-z&&[^aeiou]]"
 

This produces the intersection (in other words, the characters common to both sets) of the set of characters "a" through "z" with the set that is not a lowercase vowel. To put it another way, the lowercase vowels are subtracted from the set "a" through "z" so you are left with just the lowercase consonants.
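
You can convince yourself that the two forms are equivalent with a sketch like the following, which tries both patterns against every ASCII character. The class name Consonants and its methods are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical class comparing the two ways of writing the consonant class
class Consonants {
  static final Pattern BY_RANGES = Pattern.compile("[b-df-hj-np-tv-z]");
  static final Pattern BY_INTERSECTION = Pattern.compile("[a-z&&[^aeiou]]");

  // True if ch is a lowercase consonant according to the intersection form
  static boolean isConsonant(char ch) {
    return BY_INTERSECTION.matcher(String.valueOf(ch)).matches();
  }

  // True if both patterns accept and reject exactly the same ASCII characters
  static boolean patternsAgree() {
    for(char ch = 0; ch < 128; ++ch) {
      String s = String.valueOf(ch);
      if(BY_RANGES.matcher(s).matches() != BY_INTERSECTION.matcher(s).matches()) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(isConsonant('d'));   // true
    System.out.println(isConsonant('e'));   // false
    System.out.println(patternsAgree());    // true
  }
}
```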

The | operator is a logical OR that you use to specify alternatives. A regular expression to find "hid", "had", or "hod" could be written as "hid|had|hod". You can try this in the previous example by changing the definition of regEx to:

    String regEx = "hid|had|hod";
 

Note that the | operation means either the whole expression to the left of the operator or the whole expression to the right, not just the characters on either side as alternatives.

You could also use the | operator to define an expression to find sequences beginning with an uppercase or lowercase "h", followed by a lowercase vowel, and ending in "d", like this:

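    String regEx = "(h|H)[aeiou]d";

The parentheses enclose the choice of "h" or "H", limiting the | operator to those two alternatives. The pair of square brackets determines that the next character is any lowercase vowel. The last character must always be "d". With this as the regular expression in the example, the "Hod" in Hodge is found as well as the other variations. Note that you should not put the | inside square brackets: within a character class it is an ordinary character, so [h|H] would match "|" as well as "h" and "H". The class [hH] is an equivalent way to express the same choice.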
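
One point is worth checking for yourself: inside square brackets the | character is treated as an ordinary character rather than as an operator. The following sketch compares a parenthesized choice, a plain character class, and a class containing |. The class name AlternationDemo is invented; Pattern.matches() is the static convenience method that compiles an expression and matches it against an entire string.

```java
import java.util.regex.Pattern;

// Hypothetical class comparing ways of writing a choice of characters
class AlternationDemo {
  // True if the whole of input matches regex
  static boolean matches(String regex, String input) {
    return Pattern.matches(regex, input);
  }

  public static void main(String[] args) {
    System.out.println(matches("hid|had|hod", "hod"));      // true
    System.out.println(matches("(h|H)[aeiou]d", "Hod"));    // true
    System.out.println(matches("[hH][aeiou]d", "Hod"));     // true
    System.out.println(matches("(h|H)[aeiou]d", "|ad"));    // false
    System.out.println(matches("[h|H][aeiou]d", "|ad"));    // true: | is literal in a class
  }
}
```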

Predefined Character Sets

You also have a number of predefined character classes that provide you with a shorthand notation for commonly used sets of characters. Table 15-6 gives some that are particularly useful:

TABLE 15-6: Predefined Character Classes

CHARACTER CLASS DESCRIPTION
. This represents any character, as you have already seen.
\d This represents any digit and is therefore shorthand for [0-9].
\D This represents any character that is not a digit. It is therefore equivalent to [^0-9].
\s This represents any whitespace character. A whitespace character is a space, a tab '\t', a newline character '\n', a form feed character '\f', a carriage return '\r', or a vertical tab '\x0B'.
\S This represents any non-whitespace character and is therefore equivalent to [^\s].
\w This represents a word character, which corresponds to an upper- or lowercase letter, a digit, or an underscore. It is therefore equivalent to [a-zA-Z_0-9].
\W This represents any character that is not a word character, so it is equivalent to [^\w].

Note that when you are using any of the sequences that start with a backslash in a regular expression, you need to keep in mind that Java treats a backslash as the beginning of an escape sequence. Therefore, you must specify the backslash in the regular expression as \\. For example, to find a sequence of three digits, the regular expression would be "\\d\\d\\d". This is peculiar to Java because of the significance of the backslash in Java strings, so it doesn’t necessarily apply to other environments that support regular expressions, such as Perl.

Obviously, you may well want to include a period, or any of the other meta-characters, as part of the character sequence you are looking for. To do this you can use an escape sequence starting with a backslash in the expression to define such characters. Because Java strings interpret a backslash as the start of a Java escape sequence, the backslash itself has to be represented as \\, the same as when using the predefined character sets that begin with a backslash. Thus, the regular expression to find the sequence "had." would be "had\\.".
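
The effect of the doubled backslash can be seen in a sketch like the following, where each backslash intended for the regular expression is written as two backslashes in the Java string literal. The class name EscapeDemo and the sample strings are invented for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical class showing escaped sequences in Java string literals
class EscapeDemo {
  // True if regex is found anywhere in text
  static boolean found(String regex, String text) {
    return Pattern.compile(regex).matcher(text).find();
  }

  public static void main(String[] args) {
    System.out.println(found("\\d\\d\\d", "Room 101"));    // true: three digits
    System.out.println(found("\\d\\d\\d", "Room 42"));     // false: only two digits
    System.out.println(found("had.", "haddock"));          // true: period is a wildcard
    System.out.println(found("had\\.", "haddock"));        // false: needs a real period
    System.out.println(found("had\\.", "All he had."));    // true
  }
}
```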

The earlier search you tried with the expression "h.d" found embedded sequences such as "hud" in the word huddled. You could use the \s set that corresponds to any whitespace character to prevent this by defining regEx like this:

    String regEx = "\\sh.d\\s";
 

This searches for a five-character sequence that starts and ends with any whitespace character. The output from the example is now:

Ted and Ned Hodge hid their hod and huddled in the hedge.
                 ^^^^^     ^^^^^
 

You can see that the marker array shows the five-character sequences that were found. The embedded sequences are now no longer included, as they don’t begin and end with a whitespace character.

To take another example, suppose you want to find hedge or Hodge as words in the sentence, bearing in mind that there’s a period at the end. You could do this by defining the regular expression as:

    String regEx = "\\s[h|H][e|o]dge[\\s|\\.]";
 

The first character is defined as any whitespace by \\s. The next character is defined as either "h" or "H" by [h|H]. This can be followed by either "e" or "o" specified by [e|o]. This is followed by plaintext dge with either a whitespace character or a period at the end, specified by [\\s|\\.]. This doesn’t cater to all possibilities. Sequences at the beginning of the string are not found, for example, nor are sequences followed by a comma. You see how to deal with these next.
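As a quick check of this pattern, the sketch below lists what it matches in the sample sentence. The class HedgeDemo and its matches() helper are made up for this illustration. One point worth knowing: inside square brackets the | character is an ordinary member of the set rather than an OR operator, so [h|H] also accepts a literal |. The second pattern in the sketch, [hH][eo]dge[\\s.], is an assumed tighter spelling that gives the same result for this sentence:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HedgeDemo {
  // Hypothetical helper: collects every substring of text that matches regEx
  static List<String> matches(String regEx, String text) {
    List<String> found = new ArrayList<>();
    Matcher m = Pattern.compile(regEx).matcher(text);
    while (m.find()) {
      found.add(m.group());
    }
    return found;
  }

  public static void main(String[] args) {
    String str = "Ted and Ned Hodge hid their hod and huddled in the hedge.";

    // Pattern from the text: finds " Hodge " and " hedge."
    System.out.println(matches("\\s[h|H][e|o]dge[\\s|\\.]", str));

    // Same sets written without the | characters: same two matches
    System.out.println(matches("\\s[hH][eo]dge[\\s.]", str));

    // The | inside brackets is a literal, so this odd input also matches
    System.out.println(matches("\\s[h|H][e|o]dge[\\s|\\.]", "a |edge here"));
  }
}
```

Each match includes the surrounding whitespace or period because those characters are part of the pattern.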

Matching Boundaries

So far you have been trying to find the occurrence of a pattern anywhere in a string. In many situations you will want to be more specific. You may want to look for a pattern that appears at the beginning of a line in a string but not anywhere else, or maybe just at the end of any line. As you saw in the previous example, you may want to look for a word that is not embedded — you want to find the word "cat" but not the "cat" in "cattle" or in "Popacatapetl", for example. The previous example worked for the string you were searching but would not produce the right result if the word you were looking for was followed by a comma or appeared at the end of the text. However, you have other options for specifying the pattern. You can use a number of special sequences in a regular expression when you want to match a particular boundary. For example, those presented in Table 15-7 are especially useful:

TABLE 15-7: Boundary Matching in a Regular Expression

SEQUENCE BOUNDARY MATCHED
^ Specifies the beginning of a line. For example, to find the word Java at the beginning of any line you could use the expression "^Java".
$ Specifies the end of a line. For example, to find the word Java at the end of any line you could use the expression "Java$". Of course, if you were expecting a period at the end of a line the expression would be "Java\\.$".
\b Specifies a word boundary. To find three-letter words beginning with 'h' and ending with 'd', you could use the expression "\\bh.d\\b".
\B Specifies a non-word boundary, the complement of \b.
\A Specifies the beginning of the string being searched. To find the word The at the very beginning of the string being searched, you could use the expression "\\AThe\\b". The \\b at the end of the regular expression is necessary to avoid finding Then or There at the beginning of the input.
\z Specifies the end of the string being searched. To find the word hedge followed by a period at the end of a string, you could use the expression "\\bhedge\\.\\z".
\Z Specifies the end of input except for the final terminator. A final terminator is a newline character ('\n') if Pattern.UNIX_LINES is set. Otherwise, it can also be a carriage return ('\r'), a carriage return followed by a newline character, a next-line character ('\u0085'), a line separator ('\u2028'), or a paragraph separator ('\u2029').
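The following sketch tries a few of these boundary sequences on the sentence used earlier. The class BoundaryDemo, its matches() helper, and the "cat" test string are assumptions made for the illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
  // Hypothetical helper: collects every substring of text that matches regEx
  static List<String> matches(String regEx, String text) {
    List<String> found = new ArrayList<>();
    Matcher m = Pattern.compile(regEx).matcher(text);
    while (m.find()) {
      found.add(m.group());
    }
    return found;
  }

  public static void main(String[] args) {
    String str = "Ted and Ned Hodge hid their hod and huddled in the hedge.";

    // Whole three-letter words only: "hud" in huddled is rejected
    System.out.println(matches("\\bh.d\\b", str));            // [hid, hod]

    // A word at the very beginning of the input
    System.out.println(matches("\\ATed\\b", str));            // [Ted]

    // A word followed by a period at the very end of the input
    System.out.println(matches("\\bhedge\\.\\z", str));       // [hedge.]

    // "cat" as a word, whether followed by a comma or by the end of input
    System.out.println(
        matches("\\bcat\\b", "cat, cattle, Popacatapetl and a cat")); // [cat, cat]
  }
}
```

Unlike the \\s version, a boundary consumes no characters, so the matches contain just the words themselves.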

Although you have moved quite a way from the simple search for a fixed substring offered by the String class methods, you still can’t search for sequences that may vary in length. If you wanted to find all the numerical values in a string, which might be sequences such as 1234 or 23.45 or 999.998, for example, you don’t yet have the ability to do that. You can fix that now by taking a look at quantifiers in a regular expression and what they can do for you.

Using Quantifiers

A quantifier following a subsequence of a pattern determines the possibilities for how that subsequence of a pattern can repeat. Let’s take an example. Suppose you want to find any numerical values in a string. If you take the simplest case, you can say an integer is an arbitrary sequence of one or more digits. The quantifier for one or more is the meta-character "+". You have also seen that you can use \d as shorthand for any digit (remembering, of course, that it becomes \\d in a Java String literal), so you could express any sequence of digits as the regular expression:

"\\d+"
 

Of course, a number may also include a decimal point and may be optionally followed by further digits. To indicate something can occur just once or not at all, as is the case with a decimal point, you can use the ? quantifier. You can write the pattern for a sequence of digits followed by an optional decimal point as:

"\\d+\\.?"
 

To add the possibility of further digits, you can append \\d+ to what you have so far to produce the expression:

"\\d+\\.?\\d+"
 

This is a bit untidy, and because it now demands at least two digits it no longer matches a single-digit number. You can rewrite this as an integral part followed by an optional fractional part by putting parentheses around the bit for the fractional part and adding the ? operator:

"\\d+(\\.\\d+)?"
 

However, this isn’t quite right. You can have 2. as a valid numerical value, for example, so you want to specify zero or more appearances of digits in the fractional part. The * quantifier expresses that, so maybe you should use:

"\\d+(\\.\\d*)?"
 

You are still missing something, though. What about the value .25 or the value -3? The optional sign in front of a number is easy, so let’s deal with that first. To express the possibility that + or - can appear, you can use [+|-], and because this either appears or it doesn’t, you can extend it to [+|-]?. So to add the possibility of a sign, you can write the expression as:

"[+|-]?\\d+(\\.\\d*)?"
 

You have to be careful how you allow for numbers beginning with a decimal point. You can’t allow a sign followed by a decimal point or just a decimal point by itself to be interpreted as a number, so you can’t say a number starts with zero or more digits or that the leading digits are optional. You could define a separate expression for numbers without leading digits like this:

"[+|-]?\\.\\d+"
 

Here then is an optional sign followed by a decimal point and at least one digit. With the other expression there is also an optional sign, so you can combine these into a single expression to recognize either form, like this:

"[+|-]?(\\d+(\\.\\d*)?)|(\\.\\d+)"
 

This regular expression is meant to identify substrings with an optional plus or minus sign followed by either a substring defined by "\\d+(\\.\\d*)?" or a substring defined by "\\.\\d+". You might be tempted to use square brackets instead of parentheses here, but this would be quite wrong as square brackets define a set of characters, so any single character from the set is a match.

That was probably a bit more work than you anticipated, but it’s often the case that things that look simple at first sight can turn out to be a little tricky. Let’s try that out in an example.

TRY IT OUT: Finding Numbers

This is similar to the code we have used in previous examples except that here we just list each substring that is found to correspond to the pattern:

import java.util.regex.Pattern;
import java.util.regex.Matcher;
 
public class FindingNumbers {
  public static void main(String args[]) {
    String regEx = "[+|-]?(\\d+(\\.\\d*)?)|(\\.\\d+)";
    String str = "256 is the square of 16 and -2.5 squared is 6.25 " +
                                            "and -.243 is less than 0.1234.";
    Pattern pattern = Pattern.compile(regEx);
    Matcher m = pattern.matcher(str);
    while(m.find()) {
      System.out.println(m.group());            // Output the substring matched
    }
  }
}
 

FindingNumbers.java

This produces the following output:

256
16
-2.5
6.25
.243
0.1234
 

How It Works

Well, you found all the numbers in the string, which you can’t do with the methods in the String class. Notice, though, that -.243 was reported as .243, without its sign. The only new code item here is the method, group(), that you call in the while loop for the Matcher object, m. This method returns a reference to a String object that contains the subsequence corresponding to the last match of the entire pattern. Calling the group() method for the Matcher object m is equivalent to the expression str.substring(m.start(), m.end()).
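The value -.243 loses its sign in the output because the | operator has the lowest precedence in a regular expression, so the optional sign is attached to the first alternative only. If you want the sign kept in both cases, one possible adjustment is to put both alternatives inside a single pair of parentheses that follows the sign. This is offered as a sketch and is not part of the original example. The class SignedNumbers and its numbers() helper are invented for the illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SignedNumbers {
  // Hypothetical helper: collects every substring of text that matches regEx
  static List<String> numbers(String regEx, String text) {
    List<String> found = new ArrayList<>();
    Matcher m = Pattern.compile(regEx).matcher(text);
    while (m.find()) {
      found.add(m.group());
    }
    return found;
  }

  public static void main(String[] args) {
    String str = "256 is the square of 16 and -2.5 squared is 6.25 "
               + "and -.243 is less than 0.1234.";

    // Pattern from the example: the sign belongs to the first alternative only
    System.out.println(numbers("[+|-]?(\\d+(\\.\\d*)?)|(\\.\\d+)", str));

    // Regrouped: the optional sign now precedes either alternative
    System.out.println(numbers("[+-]?(\\d+(\\.\\d*)?|\\.\\d+)", str));
  }
}
```

With the regrouped pattern the fifth value should come back as -.243. Note that regrouping also changes the capturing group numbers, so code that relies on group indexes would need to be adjusted.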

Tokenizing a String

You saw in Chapter 4 that you could tokenize a string using the split() method for a String object. As I mentioned then, the split() method does this by applying a regular expression — in fact, the first argument to the method is interpreted as a regular expression. This is because the expression text.split(str, limit), where text is a String variable, is equivalent to the expression:

Pattern.compile(str).split(text, limit)
 

This means that you can apply all of the power of regular expressions to the identification of delimiters in the string. To demonstrate that this is the case, I will repeat the example from Chapter 4, but modify the first argument to the split() method so only the words in the text are included in the set of tokens.

TRY IT OUT: Extracting the Words from a String

Here’s the code for the modified version of the example:

public class StringTokenizing {
  public static void main(String[] args) {
    String text =
              "To be or not to be, that is the question."; // String to segment
    String delimiters = "[^\\w]+";
 
    // Analyze the string
    String[] tokens = text.split(delimiters);
 
    // Output the tokens
    System.out.println("Number of tokens: " + tokens.length);
    for(String token : tokens) {
      System.out.println(token);
    }
  }
}

StringTokenizing.java

Now you should get the following output:

Number of tokens: 10
To
be
or
not
to
be
that
is
the
question
 

How It Works

The program produces 10 tokens in the output, which is the number of words in the text. The original version in Chapter 4 treated a comma followed by a space as two separate delimiters and produced an empty token as a result. The pattern "[^\\w]+" matches one or more characters that are not word characters; that is, not uppercase or lowercase letters, digits, or underscores. This means a delimiter can be any run of one or more spaces, commas, periods, exclamation marks, question marks, and so on, so all the words in text are found.
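The equivalence between the String and Pattern versions of split() mentioned earlier can be checked with a short sketch. The class SplitDemo and its words() helper are assumptions for the illustration; the limit values 0 and 3 are arbitrary choices:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitDemo {
  // Hypothetical helper: splits text at runs of non-word characters
  static String[] words(String text, int limit) {
    return Pattern.compile("[^\\w]+").split(text, limit);
  }

  public static void main(String[] args) {
    String text = "To be or not to be, that is the question.";

    // Both routes should produce identical arrays
    String[] viaString = text.split("[^\\w]+", 0);
    String[] viaPattern = words(text, 0);
    System.out.println(Arrays.equals(viaString, viaPattern));   // true

    // A positive limit caps the number of tokens; the last one holds the rest
    System.out.println(Arrays.toString(words(text, 3)));
    // [To, be, or not to be, that is the question.]
  }
}
```

With a limit of 0, trailing empty strings are discarded, which is why the final period does not produce an eleventh, empty token.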

Search and Replace Operations

You can implement a search and replace operation very easily using regular expressions. Each time a call to the find() method for a Matcher object returns true, you can call the appendReplacement() method to replace the subsequence that was matched. You create a revised version of the original string in a new StringBuffer object that you supply to the method. The arguments to the appendReplacement() method are a reference to the StringBuffer object that is to contain the new string, and the replacement string for the matched text. You can see how this works by considering a specific example.

Suppose you define a string to be searched as:

String joke = "My dog hasn't got any nose.\n" +
              "How does your dog smell then?\n" +
              "My dog smells horrible.\n";
 

You now want to replace each occurrence of "dog" in the string by "goat". You first need a regular expression to find "dog":

String regEx = "dog";
 

You can compile this into a pattern and create a Matcher object for the string joke:

Pattern doggone = Pattern.compile(regEx);
Matcher m = doggone.matcher(joke);
 

You are going to assemble a new version of joke in a StringBuffer object that you can create like this:

StringBuffer newJoke = new StringBuffer();
 

This is an empty StringBuffer object ready to receive the revised text. You can now find and replace instances of "dog" in joke by calling the find() method for m and calling appendReplacement() each time it returns true:

while(m.find()) {
  m.appendReplacement(newJoke, "goat");
}

Each call of appendReplacement() copies characters from joke to newJoke starting at the position following the previous match (or at the beginning of joke for the first match) and ending at the character preceding the first character matched: at m.start()-1, in other words. The method then appends the string specified by the second argument to newJoke. This process is illustrated in Figure 15-4.

The find() method returns true three times, once for each occurrence of "dog" in joke. When the three steps shown in the diagram have been completed, the find() method returns false on the next iteration, terminating the loop. This leaves newJoke in the state shown in the last box in Figure 15-4. All you now need to complete newJoke is a way to copy the text from joke that comes after the last subsequence that was found. The appendTail() method for the Matcher object does that:

m.appendTail(newJoke);
 

This copies the text starting with the m.end() index position from the last successful match through to the end of the string. Thus this statement copies the segment " smells horrible.\n" from joke to newJoke. You can put all that together and run it.

TRY IT OUT: Search and Replace

Here’s the code I have just discussed assembled into a complete program:

import java.util.regex.Pattern;
import java.util.regex.Matcher;
 
class SearchAndReplace {
  public static void main(String args[]) {
    String joke = "My dog hasn't got any nose.\n"
                 +"How does your dog smell then?\n"
                 +"My dog smells horrible.\n";
    String regEx = "dog";
 
    Pattern doggone = Pattern.compile(regEx);
    Matcher m = doggone.matcher(joke);
 
    StringBuffer newJoke = new StringBuffer();
    while(m.find()) {
      m.appendReplacement(newJoke, "goat");
    }
    m.appendTail(newJoke);
    System.out.println(newJoke);
  }
}

SearchAndReplace.java

When you compile and execute this you should get the following output:

My goat hasn't got any nose.
How does your goat smell then?
My goat smells horrible.
 

How It Works

Each time the find() method in the while loop condition returns true, you call the appendReplacement() method for the Matcher object m. This copies characters from joke to newJoke, starting at the index position where find() started searching, and ending at the character preceding the first character in the match, which is at m.start()-1. The method then appends the replacement string, "goat", to newJoke.

After the loop finishes, the appendTail() method copies characters from joke to newJoke, starting with the character following the last match at m.end() through to the end of joke. Thus, you end up with a new string similar to the original, but which has each instance of "dog" replaced by "goat".

You can use the search and replace capability to solve some string manipulation problems very easily. For example, if you want to make sure that any sequence of one or more whitespace characters is replaced by a single space, you can define the regular expression as "\\s+" and the replacement string as a single space " ". To eliminate all whitespace at the beginning of each line, you can use the expression "^\\s+" and define the replacement string as empty, "". You must specify Pattern.MULTILINE as the flag for the compile() method for this to work.
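Both of these clean-up operations can be sketched in a few lines. The version below uses the replaceAll() method of the Matcher class as a shortcut for the find(), appendReplacement(), and appendTail() sequence shown above. The class WhitespaceDemo, the two helper methods, and the sample strings are invented for the illustration:

```java
import java.util.regex.Pattern;

class WhitespaceDemo {
  // Hypothetical helper: replaces each run of whitespace by a single space
  static String collapse(String text) {
    return Pattern.compile("\\s+").matcher(text).replaceAll(" ");
  }

  // Hypothetical helper: removes whitespace at the start of every line.
  // Caveat: \s also matches newlines, so a blank line is removed as well.
  static String trimLineStarts(String text) {
    return Pattern.compile("^\\s+", Pattern.MULTILINE)
                  .matcher(text)
                  .replaceAll("");
  }

  public static void main(String[] args) {
    System.out.println(collapse("My   dog\t has\n fleas"));   // My dog has fleas

    // Prints one, two, and three on separate lines with no indentation
    System.out.println(trimLineStarts("  one\n\t two\nthree"));
  }
}
```

Without the Pattern.MULTILINE flag, ^ would match only at the very beginning of the input, so only the first line would lose its indentation.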

Using Capturing Groups

Earlier you used the group() method for a Matcher object to retrieve the subsequence matched by the entire pattern defined by the regular expression. The entire pattern represents what is called a capturing group because the Matcher object captures the subsequence corresponding to the pattern match. Regular expressions can also define other capturing groups that correspond to parts of the pattern. Each pair of parentheses in a regular expression defines a separate capturing group in addition to the group that the whole expression defines. In the earlier example, you defined the regular expression by the following statement:

    String regEx = "[+|-]?(\\d+(\\.\\d*)?)|(\\.\\d+)";
 

This defines three capturing groups other than the whole expression: one for the subexpression (\\d+(\\.\\d*)?), one for the subexpression (\\.\\d*), and one for the subexpression (\\.\\d+). The Matcher object stores the subsequence that matches the pattern defined by each capturing group, and what’s more, you can retrieve them.

To retrieve the text matching a particular capturing group, you need a way to identify the capturing group that you are interested in. To this end, capturing groups are numbered. The capturing group for the whole regular expression is always number 0. The other groups are numbered by counting their opening parentheses from the left in the regular expression. Thus, the first opening parenthesis from the left corresponds to capturing group 1, the second corresponds to capturing group 2, and so on for as many opening parentheses as there are in the whole expression. Figure 15-5 illustrates how the groups are numbered in an arbitrary regular expression.

As you see, it’s easy to number the capturing groups as long as you can count left parentheses. Group 1 is the same as Group 0 because the whole regular expression is parenthesized. The other capturing groups in sequence are defined by (B), (C(D)), (D), and (E).

To retrieve the text matching a particular capturing group after the find() method returns true, you call the group() method for the Matcher object with the group number as the argument. The groupCount() method for the Matcher object returns a value of type int that specifies the number of capturing groups within the pattern — that is, excluding group 0, which corresponds to the whole pattern. Therefore, you have all you need to access the text corresponding to any or all of the capturing groups in a regular expression.

TRY IT OUT: Capturing Groups

Let’s modify our earlier example to output the text matching each group:

import java.util.regex.Pattern;
import java.util.regex.Matcher;
 
public class TryCapturingGroups {
  public static void main(String args[]) {
    String regEx = "[+|-]?(\\d+(\\.\\d*)?)|(\\.\\d+)";
    String str = "256 is the square of 16 and -2.5 squared is 6.25 " +
                                            "and -.243 is less than 0.1234.";
    Pattern pattern = Pattern.compile(regEx);
    Matcher m = pattern.matcher(str);
    while(m.find()) {
      for(int i = 0; i <= m.groupCount() ; i++) {
        System.out.println(
                        "Group " + i + ": " + m.group(i)); // Group i substring
      }
    }
  }
}
 

TryCapturingGroups.java

This produces the following output:

Group 0: 256
Group 1: 256
Group 2: null
Group 3: null
Group 0: 16
Group 1: 16
Group 2: null
Group 3: null
Group 0: -2.5
Group 1: 2.5
Group 2: .5
Group 3: null
Group 0: 6.25
Group 1: 6.25
Group 2: .25
Group 3: null
Group 0: .243
Group 1: null
Group 2: null
Group 3: .243
Group 0: 0.1234
Group 1: 0.1234
Group 2: .1234
Group 3: null
 

How It Works

The regular expression here defines four capturing groups:

  • Group 0: The whole expression
  • Group 1: The subexpression "(\\d+(\\.\\d*)?)"
  • Group 2: The subexpression "(\\.\\d*)"
  • Group 3: The subexpression "(\\.\\d+)"

After each successful call of the find() method for m, you output the text captured by each group by passing the index value for the group to the group() method. Note that because you want to output group 0 as well as the other groups, you start the loop index from 0 and allow it to equal the value returned by groupCount() so as to index over all the groups.

You can see from the output that group 1 corresponds to numbers beginning with a digit, and group 3 corresponds to numbers starting with a decimal point, so either one or the other of these is always null. Group 2 corresponds to the sub-pattern within group 1 that matches the fractional part of a number that begins with a digit, so the text for this can be non-null only when the text for group 1 is non-null and the number has a decimal point.

Juggling Captured Text

Because you can get access to the text corresponding to each capturing group, you can move such blocks of text around. The appendReplacement() method has special provision for recognizing references to capturing groups in the replacement text string. If $n, where n is an integer, appears in the replacement string, it is interpreted as the text corresponding to group n. You can therefore replace the text matched to a complete pattern by any sequence of your choosing of the subsequences corresponding to the capturing groups in the pattern. That’s hard to describe in words, so let’s demonstrate it with an example.

TRY IT OUT: Rearranging Captured Group Text

I'm sure you remember that the Math.pow() method requires two arguments; the second argument is the power to which the first argument must be raised. Thus, to calculate 16³ you can write:

double result = Math.pow(16.0, 3.0);
 

Let’s suppose a weak programmer on your team has written a Java program in which the two arguments have mistakenly been switched, so in trying to compute 16³ the programmer has written:

double result = Math.pow(3.0, 16.0);
 

Of course, this computes 3¹⁶, which is not quite the same thing. Let’s suppose further that this sort of error is strewn throughout the source code and in every case the arguments are the wrong way round. You would need a month of Sundays to go through manually and switch the argument values, so let’s see if regular expressions can rescue the situation.

What you need to do is find each occurrence of Math.pow() and switch the arguments around. The intention here is to understand how you can switch things around, so I’ll keep it simple and assume that the argument values to Math.pow() are always a numerical value or a variable name.

The key to the whole problem is to devise a regular expression with capturing groups for the bits you want to switch — the two arguments. Be warned: This is going to get a little messy, not difficult though — just messy.

You can define the first part of the regular expression that finds the sequence "Math.pow(" at any point, and where you want to allow an arbitrary number of whitespace characters, you can use the sequence \\s*. Recall that \\s in a Java string specifies the predefined character class \s, which is whitespace. The * quantifier specifies zero or more of them. If you allow for whitespace between Math.pow and the opening parenthesis for the arguments, and some more whitespace after the opening parenthesis, the regular expression is:

"(Math.pow)\\s*\\(\\s*"
 

You have to specify the opening parenthesis by "\\(". An opening parenthesis is a meta-character, so you have to write it as an escape sequence.

The opening parenthesis is followed by the first argument, which I said could be a number or a variable name. You created a regular expression to identify a number earlier:

"[+|-]?(\\d+(\\.\\d*)?)|(\\.\\d+)"

To keep things simple, you assume that a variable name is just any sequence of letters, digits, or underscores that begins with a letter or an underscore. This avoids getting involved with qualified names. You can match a variable name with the expression:

"[a-zA-Z_]\\w*"
 

You can therefore match either a variable name or a number with the pattern:

"(([a-zA-Z_]\\w*)|([+|-]?(\\d+(\\.\\d*)?)|(\\.\\d+)))"
 

This just ORs the two possibilities together and parenthesizes the whole thing so it is a capturing group.

A comma that may be surrounded by zero or more whitespace characters on either side follows the first argument. You can match that with the pattern:

"\\s*,\\s*"
 

The pattern to match the second argument will be exactly the same as the first:

"(([a-zA-Z_]\\w*)|([+|-]?(\\d+(\\.\\d*)?)|(\\.\\d+)))"
 

Finally, this must be followed by a closing parenthesis that may or may not be preceded by whitespace:

"\\s*\\)"
 

You can put all this together to define the entire regular expression as the value for a String variable:

String regEx = "(Math.pow)" +                                      // Math.pow
    "\\s*\\(\\s*" +                                                // Opening (
    "(([a-zA-Z_]\\w*)|([+|-]?(\\d+(\\.\\d*)?)|(\\.\\d+)))" +       // First argument
    "\\s*,\\s*" +                                                  // Comma
    "(([a-zA-Z_]\\w*)|([+|-]?(\\d+(\\.\\d*)?)|(\\.\\d+)))" +       // Second argument
    "\\s*\\)";                                                     // Closing )
 

Here you assemble the string literal for the regular expression by concatenating six separate string literals. Each of these corresponds to an easily identified part of the method call. If you count the left parentheses, excluding the escaped parenthesis of course, you can also see that capturing group 1 corresponds with the method name, group 2 is the first method argument, and group 8 is the second method argument.
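If you would rather not count parentheses by hand, you can let the Matcher object confirm the numbering. The sketch below is an assumed helper, not part of the chapter's code: the class GroupNumbers, the constants ARG and REG_EX, and the keyGroups() method are names made up for the illustration. The regular expression itself is the one just assembled:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumbers {
  // One argument: a simple variable name or a number (pattern from the text)
  static final String ARG =
      "(([a-zA-Z_]\\w*)|([+|-]?(\\d+(\\.\\d*)?)|(\\.\\d+)))";

  static final String REG_EX =
      "(Math.pow)\\s*\\(\\s*" + ARG + "\\s*,\\s*" + ARG + "\\s*\\)";

  // Returns groups 1, 2, and 8 for the first match in code, or null if none
  static String[] keyGroups(String code) {
    Matcher m = Pattern.compile(REG_EX).matcher(code);
    if (!m.find()) {
      return null;
    }
    return new String[] { m.group(1), m.group(2), m.group(8) };
  }

  public static void main(String[] args) {
    String call = "Math.pow( 3.0, 16.0)";

    // One group for the method name plus six for each argument
    Matcher m = Pattern.compile(REG_EX).matcher(call);
    System.out.println("Groups: " + m.groupCount());          // Groups: 13

    String[] g = keyGroups(call);
    System.out.println("Group 1: " + g[0]);                   // Group 1: Math.pow
    System.out.println("Group 2: " + g[1]);                   // Group 2: 3.0
    System.out.println("Group 8: " + g[2]);                   // Group 8: 16.0
  }
}
```

Each argument pattern contributes six capturing groups, which is why the second argument lands on group 8 rather than group 3.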

You can put this in the following example:

import java.util.regex.Pattern;
import java.util.regex.Matcher;
 
public class RearrangeText {
  public static void main(String args[]) {
    String regEx = "(Math.pow)"                                     // Math.pow
    + "\\s*\\(\\s*"                                                 // Opening (
    + "(([a-zA-Z_]\\w*)|([+|-]?(\\d+(\\.\\d*)?)|(\\.\\d+)))"        // First argument
    + "\\s*,\\s*"                                                   // Comma
    + "(([a-zA-Z_]\\w*)|([+|-]?(\\d+(\\.\\d*)?)|(\\.\\d+)))"        // Second argument
    + "\\s*\\)";                                                    // Closing )
 
    String oldCode =
          "double result = Math.pow( 3.0, 16.0);\n" +
          "double resultSquared = Math.pow(2 ,result );\n" +
          "double hypotenuse = Math.sqrt(Math.pow(2.0, 30.0)+Math.pow(2 , 40.0));\n";
    Pattern pattern = Pattern.compile(regEx);
    Matcher m = pattern.matcher(oldCode);
 
    StringBuffer newCode = new StringBuffer();
    while(m.find()) {
      m.appendReplacement(newCode, "$1\\($8,$2\\)");
    }
    m.appendTail(newCode);
 
    System.out.println("Original Code:\n" + oldCode);
    System.out.println("New Code:\n" + newCode);
  }
}

RearrangeText.java

You should get the following output:

Original Code:
double result = Math.pow( 3.0, 16.0);
double resultSquared = Math.pow(2 ,result );
double hypotenuse = Math.sqrt(Math.pow(2.0, 30.0)+Math.pow(2 , 40.0));
 
New Code:
double result = Math.pow(16.0,3.0);
double resultSquared = Math.pow(result,2);
double hypotenuse = Math.sqrt(Math.pow(30.0,2.0)+Math.pow(40.0,2));
 

How It Works

You have defined the regular expression so that separate capturing groups identify the method name and both arguments. As you saw earlier, the method name corresponds to group 1, the first argument to group 2, and the second argument to group 8. You therefore define the replacement string to the appendReplacement() method as "$1\\($8,$2\\)". The effect of this is to replace the text for each method call that is matched by the following items in Table 15-8, in sequence:

TABLE 15-8: Matching a Method Call

ITEM TEXT THAT IS MATCHED
$1 The text matching capturing group 1: the method name
\\( A left parenthesis
$8 The text matching capturing group 8: the second argument
, A comma
$2 The text matching capturing group 2: the first argument
\\) A right parenthesis

The call to appendTail() is necessary to ensure that any text left at the end of oldCode following the last match for regEx gets copied to newCode.

In the process, you have eliminated any superfluous whitespace that was lying around in the original text.
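When every match is replaced in the same way, the replaceAll() method of the Matcher class should give the same result as the find(), appendReplacement(), and appendTail() sequence in a single call. The sketch below is an alternative to the loop in the example, not the example's own code; the class SwapArgs and its swap() method are hypothetical names. Only $ and \ are special in a replacement string, so the parentheses are written here without escapes:

```java
import java.util.regex.Pattern;

class SwapArgs {
  // One argument: a simple variable name or a number (pattern from the text)
  static final String ARG =
      "(([a-zA-Z_]\\w*)|([+|-]?(\\d+(\\.\\d*)?)|(\\.\\d+)))";

  static final Pattern POW_CALL = Pattern.compile(
      "(Math.pow)\\s*\\(\\s*" + ARG + "\\s*,\\s*" + ARG + "\\s*\\)");

  // Hypothetical helper: swaps the two arguments of every Math.pow() call
  static String swap(String code) {
    return POW_CALL.matcher(code).replaceAll("$1($8,$2)");
  }

  public static void main(String[] args) {
    System.out.println(swap("double result = Math.pow( 3.0, 16.0);"));
    // double result = Math.pow(16.0,3.0);

    System.out.println(swap("double resultSquared = Math.pow(2 ,result );"));
    // double resultSquared = Math.pow(result,2);
  }
}
```

The explicit loop remains the better choice when the replacement has to be computed separately for each match.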

USING A SCANNER

The java.util.Scanner class defines objects that use regular expressions to scan character input from a variety of sources and present the input as a sequence of tokens of various primitive types or as strings. For example, you can use a Scanner object to read data values of various types from a file or a stream, including the standard stream System.in. Indeed, using a Scanner object would have saved you the trouble of developing the FormattedInput class in Chapter 8 — still, it was good practice, wasn’t it?

The facilities provided by the Scanner class are quite extensive, so I’m not able to go into all of them in detail because of space limitations. I just provide you with an idea of how the scanner mechanisms you are likely to find most useful can be applied. After you have a grasp of the basics, I’m sure you’ll find the other facilities quite easy to use.

Creating Scanner Objects

A Scanner object can scan a source of text and parse it into tokens using regular expressions. You can create a Scanner object by passing an object that encapsulates the source of the data to a Scanner constructor. You can construct a Scanner from any of the following types:

InputStream  File  Path  ReadableByteChannel  Readable  String
 

The Scanner object that is created is able to read data from whichever source you supply as the argument to the constructor. Readable is an interface implemented by objects of types such as BufferedReader, CharBuffer, InputStreamReader, and a number of other readers, so you can create a Scanner object that scans any of these. For input from an external source, such as an InputStream or a file identified by a Path object or a File object, bytes are converted into characters either using the default charset in effect or using a charset that you specify as a second argument to the constructor.

Of course, read operations for Readable sources may also result in an IOException being thrown. If this occurs, the Scanner object interprets this as signaling that the end of input has been reached and does not rethrow the exception. You can test whether an IOException has been thrown when reading from a source by calling the ioException() method for the Scanner object; the method returns the exception object if the source has thrown an IOException.

Let’s take the obvious example of a source from which you might want to interpret data. To obtain a Scanner object that scans input from the keyboard, you could use the following statement:

java.util.Scanner keyboard = new java.util.Scanner(System.in);
 

Creating a Scanner object to read from a file is a little more laborious because of the exception that might be thrown:

Path file = Paths.get("TryScanner.java");
try (Scanner fileScan = new Scanner(file)){
  // Scan the input...
} catch(IOException e) {
  e.printStackTrace();
  System.exit(1);
}
 

This creates a Scanner object that you can use to scan the file TryScanner.java. The Scanner class implements AutoCloseable so you can create it in the form of a try block with resources.
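A String is the easiest source to experiment with because the constructor throws no checked exception. The following sketch is an illustration under assumptions: the class StringScan and its sum() method are invented names, and it relies on the Scanner methods hasNextInt() and nextInt() to read whitespace-delimited integers:

```java
import java.util.Scanner;

class StringScan {
  // Hypothetical helper: adds up the leading run of int tokens in source
  static int sum(String source) {
    int total = 0;
    try (Scanner scan = new Scanner(source)) {   // Closed automatically
      while (scan.hasNextInt()) {                // Is the next token an int?
        total += scan.nextInt();
      }
    }
    return total;
  }

  public static void main(String[] args) {
    System.out.println(sum("10 20\n30"));        // 60
  }
}
```

The loop stops at the first token that is not an integer, or at the end of the input, whichever comes first.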

Getting Input from a Scanner

By default, a Scanner object reads tokens assuming they are delimited by whitespace. Whitespace corresponds to any character for which the isWhitespace() method in the Character class returns true. Reading a token therefore involves skipping over any delimiter characters until a non-delimiter character is found and then attempting to interpret the sequence of non-delimiter characters in the way you have requested. You can read tokens of primitive types from the scanner source using the methods found in Table 15-9.

TABLE 15-9: Scanner Methods for Reading Primitive Values

METHOD DESCRIPTION
nextByte() Reads and returns the next token as type byte
nextShort() Reads and returns the next token as type short
nextInt() Reads and returns the next token as type int
nextLong() Reads and returns the next token as type long
nextFloat() Reads and returns the next token as type float
nextDouble() Reads and returns the next token as type double
nextBoolean() Reads and returns the next token as type boolean

The first four methods each have an overloaded version that accepts an argument of type int specifying the radix to be used in the interpretation of the value. All of these methods throw a java.util.InputMismatchException if the input does not match the regular expression for the input type being read or a java.util.NoSuchElementException if the input is exhausted. Note that type NoSuchElementException is a superclass of type InputMismatchException, so you must put a catch clause for the latter first if you intend to catch both types of exceptions separately. The methods can also throw an exception of type IllegalStateException if the scanner is closed.

If the input read does not match the token you are trying to read, the invalid input is left in the input buffer, so you have an opportunity to try an alternative way of matching it. Of course, if it is simply erroneous input, you should skip over it before continuing. In this case you can call the next() method for the Scanner object, which reads the next token up to the next delimiter in the input and returns it as a String object.
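The skipping technique can be sketched as follows, assuming you want only the integer tokens from a source and are happy to discard everything else. The SkipBadTokens class and its readInts() method are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.InputMismatchException;
import java.util.List;
import java.util.Scanner;

class SkipBadTokens {
  // Hypothetical helper: collects the int tokens in text, discarding the rest
  static List<Integer> readInts(String text) {
    List<Integer> values = new ArrayList<>();
    try (Scanner scan = new Scanner(text)) {
      while (scan.hasNext()) {
        try {
          values.add(scan.nextInt());
        } catch (InputMismatchException e) {
          scan.next();   // The bad token is still in the buffer, so skip it
        }
      }
    }
    return values;
  }

  public static void main(String[] args) {
    System.out.println(readInts("12 abc 7 4x 30"));   // [12, 7, 30]
  }
}
```

Without the call to next() in the catch block the loop would never end, because the same invalid token would be presented to nextInt() on every iteration.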

The Scanner class also defines nextBigInteger() and nextBigDecimal() methods that read the next token as a java.math.BigInteger object or a java.math.BigDecimal object, respectively. The BigInteger class defines objects that encapsulate integers with an arbitrary number of digits and provides the methods you need to work with such values. The BigDecimal class does the same thing for non-integral values.
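Here is a brief sketch of reading both kinds of token from a string. The class name BigNumberTokens and the readAndAdd() helper are hypothetical, and the call to useLocale(Locale.US) is an assumption made so that the decimal point is recognized regardless of the default locale on your machine.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Locale;
import java.util.Scanner;

class BigNumberTokens {
  // Hypothetical helper: reads a BigInteger token followed by a BigDecimal
  // token and returns their exact sum.
  static BigDecimal readAndAdd(String text) {
    try (Scanner scan = new Scanner(text).useLocale(Locale.US)) {
      BigInteger whole = scan.nextBigInteger();
      BigDecimal fraction = scan.nextBigDecimal();
      return new BigDecimal(whole).add(fraction);
    }
  }

  public static void main(String[] args) {
    // The first token is far too large for type long
    System.out.println(readAndAdd("123456789012345678901234567890 0.5"));
  }
}
```

The sum retains every digit of the 30-digit integer, which would not be possible if you read the values as type long or type double.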

You have enough knowledge to try out a scanner, so let’s do it.

TRY IT OUT: Using a Scanner

Here’s a simple example that just reads a variety of input from the standard input stream and displays what was read from the keyboard:

import java.util.Scanner;
import java.util.InputMismatchException;
 
public class TryScanner {
  public static void main(String[] args) {
    Scanner kbScan = new Scanner(System.in);     // Create the scanner
    int selectRead = 1;                          // Selects the read operation
    final int MAXTRIES = 3;                       // Maximum attempts at input
    int tries = 0;                               // Number of input attempts
 
    while(tries < MAXTRIES) {
      try {
        switch(selectRead) {
          case 1:
          System.out.print("Enter an integer: ");
          System.out.println("You entered: " + kbScan.nextLong());
          ++selectRead;                          // Select next read operation
          tries = 0;                             // Reset count of tries
 
          case 2:
          System.out.print("Enter a floating-point value: ");
          System.out.println("You entered: " + kbScan.nextDouble());
          ++selectRead;                          // Select next read operation
          tries = 0;                             // Reset count of tries
 
          case 3:
          System.out.print("Enter a boolean value(true or false): ");
          System.out.println("You entered: " + kbScan.nextBoolean());
        }
        break;
      } catch(InputMismatchException e) {
        String input = kbScan.next();            // Skip over the invalid token
        System.out.println("\"" + input + "\" is not valid input.");
        if(++tries<MAXTRIES) {                   // Count this failed attempt
          System.out.println("Try again.");
        } else {
          System.out.println(" Terminating program.");
          System.exit(1);
        }
      }
    }
  }
}
 

TryScanner.java

Depending on your compiler settings (for example, if you compile with the -Xlint:fallthrough option), you may get a warning about possible fall-through in the switch case statements, but it is intentional. With my limited typing skills, I got the following output:

Enter an integer: 1$
"1$" is not valid input.
Try again.
Enter an integer: 14
You entered: 14
Enter a floating-point value: 2e1
You entered: 20.0
Enter a boolean value(true or false): tree
"tree" is not valid input.
Try again.
Enter a boolean value(true or false): true
You entered: true
 

How It Works

You use a scanner to read values of three different types from the standard input stream. The read operations take place in a loop to allow multiple attempts at correct input. Within the loop you have a rare example of a switch statement that doesn’t require a break statement after each case. In this case you want each case to fall through to the next. The selectRead variable that selects a switch case provides the means by which you manage subsequent attempts at correct input, because it records the case label currently in effect.

If you enter invalid input, an InputMismatchException is thrown by the Scanner method that is attempting to read a token of a particular type. In the catch block, you call the next() method for the Scanner object to retrieve and thus skip over the input that was not recognized. You then continue with the next while loop iteration to allow a further attempt at reading the token.

Testing for Tokens

The hasNext() method for a Scanner object returns true if another token is available from the input source. You can use this in combination with the next() method to read a sequence of tokens of any type from a source, delimited by whitespace. For example:

Path file = Paths.get("TryScanner.java");
try (Scanner fileScan = new Scanner(file)){
  String token = null;
  while(fileScan.hasNext()) {
    token = fileScan.next();
    // Do something with the token read...
  }
} catch(IOException e) {
  e.printStackTrace();
  System.exit(1);
}
 

Here you are just reading an arbitrary number of tokens as strings. In general, the next() method can throw an exception of type NoSuchElementException, but this cannot happen here because you use the hasNext() method to establish that there is another token to be read before you call the next() method.

The Scanner object can do better than this. In addition to the hasNext() method that checks whether a token of any kind is available, you have methods such as hasNextInt() and hasNextDouble() for testing for the availability of any of the types that you can read with methods such as nextInt() and nextDouble(). This enables you to code so that you can process tokens of various types, even when you don’t know ahead of time the sequence in which they will be received. For example:

while(fileScan.hasNext()) {
  if(fileScan.hasNextInt()) {
  // Process integer input...
 
  } else if(fileScan.hasNextDouble()) {
  // Process floating-point input...
 
  } else if(fileScan.hasNextBoolean()) {
  // Process boolean input...
 
  }
}
 

The while loop continues as long as there are tokens of any kind available from the scanner. The if statements within the loop decide how the next token is to be processed, assuming it is one of the ones that you are interested in. If you want to skip tokens that you don’t want to process within the loop, you call the next() method for fileScan.
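A complete version of this kind of loop might look like the following sketch, which scans a string rather than a file so that it is self-contained. The MixedTokens class and the classify() method are hypothetical names, and useLocale(Locale.US) is an assumption made so that floating-point tokens use a period as the decimal separator.

```java
import java.util.Locale;
import java.util.Scanner;

class MixedTokens {
  // Hypothetical helper: labels each token in text with the type it was read as
  static String classify(String text) {
    StringBuilder result = new StringBuilder();
    try (Scanner scan = new Scanner(text).useLocale(Locale.US)) {
      while (scan.hasNext()) {
        if (scan.hasNextInt()) {              // Test for int before double
          result.append("int:").append(scan.nextInt());
        } else if (scan.hasNextDouble()) {
          result.append("double:").append(scan.nextDouble());
        } else if (scan.hasNextBoolean()) {
          result.append("boolean:").append(scan.nextBoolean());
        } else {
          result.append("skipped:").append(scan.next());  // Not of interest
        }
        result.append(' ');
      }
    }
    return result.toString().trim();
  }

  public static void main(String[] args) {
    System.out.println(classify("42 3.5 true maybe 7"));
    // int:42 double:3.5 boolean:true skipped:maybe int:7
  }
}
```

The order of the tests matters here. A token such as 42 also satisfies hasNextDouble(), so you test for an integer first. The final else clause is essential too, because a token that is never read would cause the loop to continue indefinitely.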

Defining Your Own Patterns for Tokens

The Scanner class provides a way for you to specify how a token should be recognized. You use one of two overloaded versions of the next() method to do this. One version accepts an argument of type Pattern that you produce by compiling a regular expression in the way you saw earlier in this chapter. The other accepts an argument of type String that specifies a regular expression that identifies the token. In both cases the token is returned as type String.

There are also overloaded versions of the hasNext() method that accept either a Pattern argument, or a String object containing a regular expression that identifies a token. You use these to test for tokens of your own specification. You can see these in action in an example that scans a string for a token specified by a simple pattern.

TRY IT OUT: Scanning a String

This example scans a string looking for occurrences of "had":

import java.util.Scanner;
import java.util.regex.Pattern;
 
public class ScanString {
  public static void main(String[] args) {
    String str = "Smith , where Jones had had 'had', had had 'had had'.";
    String regex = "had";
    System.out.println("String is:\n" + str + "\nToken sought is: " + regex);
 
    Pattern had = Pattern.compile(regex);
    Scanner strScan = new Scanner(str);
    int hadCount = 0;
    while(strScan.hasNext()) {
      if(strScan.hasNext(had)) {
        ++hadCount;
        System.out.println("Token found!: " + strScan.next(had));
      } else {
        System.out.println("Token is    : " + strScan.next());
      }
    }
    System.out.println(hadCount + " instances of \"" + regex +  "\" were found.");
  }
}

ScanString.java

This program produces the following output:

String is:
Smith , where Jones had had 'had', had had 'had had'.
Token sought is: had
Token is    : Smith
Token is    : ,
Token is    : where
Token is    : Jones
Token found!: had
Token found!: had
Token is    : 'had',
Token found!: had
Token found!: had
Token is    : 'had
Token is    : had'.
4 instances of "had" were found.
 

How It Works

After defining the string to be scanned and the regular expression that defines the form of a token, you compile the regular expression into a Pattern object. Passing a Pattern object to the hasNext() method (or the next() method) is more efficient than passing the original regular expression when you are calling the method more than once. When you pass a regular expression as a String object to the hasNext() method, the method must compile it to a pattern before it can use it. If you compile the regular expression first and pass the Pattern object as the argument, the compile operation occurs only once.

You scan the string, str, in the while loop. The loop continues as long as there is another token available from the string. Within the loop, you check for the presence of a token defined by regex by calling the hasNext() method with the had pattern as the argument:

      if(strScan.hasNext(had)) {
        ++hadCount;
        System.out.println("Token found!: " + strScan.next(had));
      } else {
        System.out.println("Token is    : " + strScan.next());
      }
 

If hasNext() returns true, you increment hadCount and output the token returned by next() with had as the argument. Of course, you could just as well have used the next() method with no argument here. If the next token does not correspond to had, you read it anyway with the next() method. Finally, you output the number of times your token was found.

From the output you can see that only four instances of "had" were found. This is because the scanner assumes the delimiter is one or more whitespace characters. If you don’t like this you can specify another regular expression that the scanner should use for the delimiter:

    strScan.useDelimiter("[^\\w*]");
 

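The useDelimiter() method expects an argument of type String that specifies a regular expression for recognizing delimiters. In this case the expression is a character class that matches any single character that is not a word character (a letter, a digit, or an underscore) and not an asterisk, because a * inside square brackets is a literal character rather than a quantifier. Because each delimiter is a single character, the scanner may return an empty token between adjacent delimiters, such as the space and comma after "Smith", and the program reports these as ordinary tokens. If you add this statement following the creation of the Scanner object the program should find all the "had" tokens.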
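To see the effect of the delimiter on the count, here is a sketch that scans the same string twice. The CountHad class and the countToken() method are hypothetical names, and the delimiter \W+ is my own choice for the illustration rather than the expression used above. It treats each run of non-word characters as one delimiter, so no empty tokens arise.

```java
import java.util.Scanner;
import java.util.regex.Pattern;

class CountHad {
  // Hypothetical helper: counts the tokens in text that match regex when
  // tokens are separated by delimiterRegex.
  static int countToken(String text, String regex, String delimiterRegex) {
    Pattern token = Pattern.compile(regex);   // Compile once, use many times
    int count = 0;
    try (Scanner scan = new Scanner(text)) {
      scan.useDelimiter(delimiterRegex);
      while (scan.hasNext()) {
        if (scan.hasNext(token)) {
          ++count;
        }
        scan.next();                          // Consume the token either way
      }
    }
    return count;
  }

  public static void main(String[] args) {
    String str = "Smith , where Jones had had 'had', had had 'had had'.";
    System.out.println(countToken(str, "had", "\\s+"));   // 4
    System.out.println(countToken(str, "had", "\\W+"));   // 7
  }
}
```

With whitespace as the delimiter, tokens such as 'had', include the punctuation and so do not match. With \W+ the quotes, commas, and period are part of the delimiters, and all seven occurrences are counted.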

SUMMARY

This chapter has been a brief canter through some of the interesting and useful classes available in the java.util package. The ones I chose to discuss are those that seem to me to be applicable in a wide range of application contexts, but there’s much more to this package than I have had the space to discuss here. You should find it is a rewarding exercise to delve into the contents of this package a little further.

EXERCISES

You can download the source code for the examples in the book and the solutions to the following exercises from www.wrox.com.

1. Define a static method to fill an array of type char[] with a given value passed as an argument to the method.

2. For the adventurous gambler — use a stack and a Random object in a program to simulate a game of Blackjack for one player using two decks of cards.

3. Write a program to display the sign of the Zodiac corresponding to a birth date entered through the keyboard.

4. Write a program using regular expressions to remove spaces from the beginning and end of each line in a file.

5. Write a program using a regular expression to reproduce a file with a sequential line number starting at "0001" inserted at the beginning of each line in the original file. You can use a copy of your Java source file as the input to test this.

6. Write a program using a regular expression to eliminate any line numbers that appear at the beginning of lines in a file. You can use the output from the previous exercise as a test for your program.


WHAT YOU LEARNED IN THIS CHAPTER

TOPIC CONCEPTS
The Arrays Class The java.util.Arrays class provides static methods for sorting, searching, filling, copying, and comparing arrays.
The Random Class Objects of type java.util.Random can generate pseudo-random numbers of type int, long, float, and double. The integers are uniformly distributed across the range of the type int or long. The floating-point numbers are between 0.0 and 1.0. You can also generate numbers of type double with a Gaussian distribution with a mean of 0.0 and a standard deviation of 1.0 and random boolean values.
The Observable Class Classes derived from the java.util.Observable class can signal changes to classes that implement the Observer interface. You define the Observer objects that are to be associated with an Observable class object by calling the addObserver() method. This is primarily intended to be used to implement the document/view architecture for applications in a GUI environment.
The Date Class You can create java.util.Date objects to represent a date and time that you specify in milliseconds since January 1, 1970, 00:00:00 GMT or as the current date and time from your computer clock.
The DateFormat Class You can use a java.text.DateFormat object to format the date and time for a Date object as a string. The format is determined by the style and the locale that you specify.
The GregorianCalendar Class A java.util.GregorianCalendar object represents a calendar set to an instant in time on a given date.
Regular Expressions A regular expression defines a pattern that is used for searching text.
Patterns and Matchers In Java, a regular expression is compiled into a java.util.regex.Pattern object that you can then use to obtain a java.util.regex.Matcher object that scans a given string looking for the pattern.
Making Pattern Substitutions The appendReplacement() method for a Matcher object enables you to make substitutions for patterns found in the input text.
Capturing Groups A capturing group in a regular expression records the text that matches a sub-pattern. By using capturing groups you can rearrange the sequence of substrings in a string matching a pattern.
The Scanner Class A java.util.Scanner object uses a regular expression to segment data from a variety of sources into tokens.