Chapter 8. Text and Core Utilities

If you’ve been reading this book sequentially, you’ve read all about the core Java language constructs, including the object-oriented aspects of the language and the use of threads. Now it’s time to shift gears and start talking about the Java Application Programming Interface (API), the collection of classes that compose the standard Java packages and come with every Java implementation. Java’s core packages are one of its most distinguishing features. Many other object-oriented languages have similar features, but none has as extensive a set of standardized APIs and tools as Java does. This is both a reflection of and a reason for Java’s success.

Strings

We’ll start by taking a closer look at the Java String class (or, more specifically, java.lang.String). Because working with Strings is so fundamental, it’s important to understand how they are implemented and what you can do with them. A String object encapsulates a sequence of Unicode characters. Internally, these characters are stored in a regular Java array, but the String object guards this array jealously and gives you access to it only through its own API. This is to support the idea that Strings are immutable; once you create a String object, you can’t change its value. Lots of operations on a String object appear to change the characters or length of a string, but what they really do is return a new String object that copies or internally references the needed characters of the original. Java implementations make an effort to consolidate identical strings used in the same class into a shared-string pool and to share parts of Strings where possible.
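
As a minimal sketch of what immutability means in practice, the following example calls a method that appears to change a string and then shows that the original is untouched. The class name ImmutableDemo and the helper shout() are made up for illustration.

```java
public class ImmutableDemo {
    // Returns an upper-cased copy; the argument itself is never modified.
    public static String shout(String s) {
        return s.toUpperCase();
    }

    public static void main(String[] args) {
        String quote = "To be or not to be";
        String loud = shout(quote);

        System.out.println(quote); // To be or not to be
        System.out.println(loud);  // TO BE OR NOT TO BE
    }
}
```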

The original motivation for all of this was performance. Immutable Strings can save memory and be optimized for speed by the Java VM. The flip side is that a programmer should have a basic understanding of the String class in order to avoid creating an excessive number of String objects in places where performance is an issue. That was especially true in the past, when VMs were slow and handled memory poorly. Nowadays, string usage is not usually an issue in the overall performance of a real application.1

Constructing Strings

Literal strings, defined in your source code, are declared with double quotes and can be assigned to a String variable:

    String quote = "To be or not to be";

Java automatically converts the literal string into a String object and assigns it to the variable.

Strings keep track of their own length, so String objects in Java don’t require special terminators. You can get the length of a String with the length() method. You can also test for a zero length string by using isEmpty():

    int length = quote.length();
    boolean empty = quote.isEmpty();

Strings can take advantage of the only overloaded operator in Java, the + operator, for string concatenation. The following code produces equivalent strings:

    String name = "John " + "Smith";
    String name = "John ".concat("Smith");

Literal strings can’t (yet2) span lines in Java source files, but we can concatenate lines to produce the same effect:

    String poem =
        "'Twas brillig, and the slithy toves\n" +
        "   Did gyre and gimble in the wabe:\n" +
        "All mimsy were the borogoves,\n" +
        "   And the mome raths outgrabe.\n";

Embedding lengthy text in source code is not normally something you want to do. In Chapter 11, we’ll talk about ways to load Strings from files and URLs.

In addition to making strings from literal expressions, you can construct a String directly from an array of characters:

    char [] data = new char [] { 'L', 'e', 'm', 'm', 'i', 'n', 'g' };
    String lemming = new String( data );

You can also construct a String from an array of bytes:

    byte [] data = new byte [] { (byte)97, (byte)98, (byte)99 };
    String abc = new String(data, "ISO8859_1");

In this case, the second argument to the String constructor is the name of a character-encoding scheme. The String constructor uses it to convert the raw bytes in the specified encoding to the internally used encoding chosen by the runtime. If you don’t specify a character encoding, the default encoding scheme on your system is used.3
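
If you would rather not depend on an encoding name being spelled correctly, one option is to pass a Charset constant instead of a string. The sketch below assumes the java.nio.charset.StandardCharsets constants (available since Java 7); the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class BytesToString {
    // Decodes raw bytes as ISO-8859-1 (Latin-1) text.
    public static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] data = { 97, 98, 99 };
        String abc = decodeLatin1(data);
        System.out.println(abc); // abc

        // Going the other way: encode the String back into bytes.
        byte[] roundTrip = abc.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(roundTrip.length); // 3
    }
}
```

Unlike the constructor that takes an encoding name, this form does not throw a checked UnsupportedEncodingException.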

Conversely, the charAt() method of the String class lets you access the characters of a String in an array-like fashion:

    String s = "Newton";
    for ( int i = 0; i < s.length(); i++ )
        System.out.println( s.charAt( i ) );

This code prints the characters of the string one at a time.

The notion that a String is a sequence of characters is also codified by the String class implementing the interface java.lang.CharSequence, which prescribes the methods length() and charAt() as well as a way to get a subset of the characters.
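
To illustrate why the CharSequence interface is useful, here is a small sketch of a method that accepts any character sequence, not just a String. The countVowels() helper is invented for this example.

```java
public class CharSequenceDemo {
    // Works on any CharSequence: String, StringBuilder, and so on.
    public static int countVowels(CharSequence text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if ("aeiouAEIOU".indexOf(text.charAt(i)) >= 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countVowels("Newton"));                     // 2
        System.out.println(countVowels(new StringBuilder("Lemming"))); // 2

        // subSequence() is the "subset of the characters" method.
        CharSequence firstThree = "Newton".subSequence(0, 3);
        System.out.println(firstThree); // New
    }
}
```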

Strings from Things

Objects and primitive types in Java can be turned into a default textual representation as a String. For primitive types like numbers, the string should be fairly obvious; for object types, it is under the control of the object itself. We can get the string representation of an item with the static String.valueOf() method. Various overloaded versions of this method accept each of the primitive types:

    String one = String.valueOf( 1 ); // integer, "1"
    String two = String.valueOf( 2.384f );  // float, "2.384"
    String notTrue = String.valueOf( false ); // boolean, "false"

All objects in Java have a toString() method that is inherited from the Object class. For many objects, this method returns a useful result that displays the contents of the object. For example, a java.util.Date object’s toString() method returns the date it represents formatted as a string. For objects that do not provide a representation, the string result is just a unique identifier that can be used for debugging. The String.valueOf() method, when called for an object, invokes the object’s toString() method and returns the result. The only real difference in using this method is that if you pass it a null object reference, it returns the String “null” for you, instead of producing a NullPointerException:

    Date date = new Date();
    // Equivalent, e.g., "Fri Dec 19 05:45:34 CST 1969"
    String d1 = String.valueOf( date );
    String d2 = date.toString();

    date = null;
    d1 = String.valueOf( date );  // "null"
    d2 = date.toString();  // NullPointerException!

String concatenation uses the valueOf() method internally, so if you “add” an object or primitive using the plus operator (+), you get a String:

    String today = "Today's date is :" + date;

You’ll sometimes see people use the empty string and the plus operator (+) as shorthand to get the string value of an object. For example:

    String two = "" + 2.384f;
    String today = "" + new Date();

Comparing Strings

The standard equals() method compares strings for equality: two strings are equal if they contain exactly the same characters in the same order. You can use a different method, equalsIgnoreCase(), to check the equivalence of strings in a case-insensitive way:

    String one = "FOO";
    String two = "foo";

    one.equals( two );             // false
    one.equalsIgnoreCase( two );   // true

A common mistake for novice programmers in Java is to compare strings with the == operator when they intend to use the equals() method. Remember that strings are objects in Java, and == tests for object identity; that is, whether the two arguments being tested are the same object. In Java, it’s easy to make two strings that have the same characters but are not the same string object. For example:

    String foo1 = "foo";
    String foo2 = String.valueOf( new char [] { 'f', 'o', 'o' }  );

    foo1 == foo2         // false!
    foo1.equals( foo2 )  // true

This mistake is particularly dangerous because it often works for the common case in which you are comparing literal strings (strings declared with double quotes right in the code). The reason for this is that Java tries to manage strings efficiently by combining them. At compile time, Java finds all the identical strings within a given class and makes only one object for them. This is safe because strings are immutable and cannot change. You can coalesce strings yourself in this way at runtime using the String intern() method. Interning a string returns an equivalent string reference that is unique across the VM.
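
The following sketch shows the effect of intern() on the two strings from the previous example. The helper identical() is a hypothetical name that simply wraps the == test.

```java
public class InternDemo {
    // True only if both references point at the same object.
    public static boolean identical(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String foo1 = "foo";                                       // literal, pooled
        String foo2 = String.valueOf(new char[] { 'f', 'o', 'o' }); // new object

        System.out.println(identical(foo1, foo2));          // false
        System.out.println(identical(foo1, foo2.intern())); // true
        System.out.println(foo1.equals(foo2));              // true
    }
}
```

Even so, equals() remains the right tool for comparing string contents; interning is an optimization, not a substitute.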

The compareTo() method compares the lexical value of the String to another String, determining whether it sorts alphabetically earlier than, the same as, or later than the target string. It returns an integer that is less than, equal to, or greater than zero:

    String abc = "abc";
    String def = "def";
    String num = "123";

    if ( abc.compareTo( def ) < 0 )         // true
    if ( abc.compareTo( abc ) == 0 )        // true
    if ( abc.compareTo( num ) > 0 )         // true

The compareTo() method compares strings strictly by their characters’ positions in the Unicode specification. This works for simple text but does not handle all language variations well. The Collator class, discussed next, can be used for more sophisticated comparisons.
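
One visible consequence of comparing by Unicode position is that digits sort before uppercase letters, which in turn sort before lowercase letters. The sketch below demonstrates this with Arrays.sort(), which uses compareTo() for strings; the sorted() helper is made up for the example.

```java
import java.util.Arrays;

public class CompareDemo {
    // Returns a sorted copy, ordered by String.compareTo().
    public static String[] sorted(String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sorted("apple", "Zebra", "123")));
        // [123, Zebra, apple]

        System.out.println("Zebra".compareTo("apple") < 0);           // true
        System.out.println("Zebra".compareToIgnoreCase("apple") > 0); // true
    }
}
```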

Searching

The String class provides several simple methods for finding fixed substrings within a string. The startsWith() and endsWith() methods compare an argument string with the beginning and end of the String, respectively:

    String url = "http://foo.bar.com/";
    if ( url.startsWith("http:") )  // true

The indexOf() method searches for the first occurrence of a character or substring and returns the starting character position, or -1 if the substring is not found:

    String abcs = "abcdefghijklmnopqrstuvwxyz";
    int i = abcs.indexOf( 'p' );     // 15
    int j = abcs.indexOf( "def" );   // 3
    int k = abcs.indexOf( "Fang" );  // -1

Similarly, lastIndexOf() searches backward through the string for the last occurrence of a character or substring.
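
A typical use of lastIndexOf() is to pull the final component out of a path. This is only a sketch: the path is invented, and the fileName() helper is a hypothetical name.

```java
public class LastIndexDemo {
    // Returns everything after the last '/', or the whole string if
    // there is no slash (lastIndexOf() returns -1, so we start at 0).
    public static String fileName(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        String path = "/usr/local/lib/tools.jar";
        System.out.println(path.indexOf('/'));     // 0
        System.out.println(path.lastIndexOf('/')); // 14
        System.out.println(fileName(path));        // tools.jar
        System.out.println(fileName("tools.jar")); // tools.jar
    }
}
```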

The contains() method handles the very common task of checking to see whether a given substring is contained in the target string:

    String log = "There is an emergency in sector 7!";
    if  ( log.contains("emergency") ) pageSomeone();

    // equivalent to
    if ( log.indexOf("emergency") != -1 ) ...

For more complex searching, you can use the Regular Expression API, which allows you to look for and parse complex patterns. We’ll talk about regular expressions later in this chapter.

String Method Summary

Table 8-1 summarizes the methods provided by the String class. We’ve included several methods we have not discussed in this chapter to make sure you’re aware of other String capabilities. Feel free to try these methods out in jshell or look up the documentation online.

Table 8-1. String methods
Method              Functionality
------------------  -------------
charAt()            Gets a particular character in the string
compareTo()         Compares the string with another string
concat()            Concatenates the string with another string
contains()          Checks whether the string contains another string
copyValueOf()       Returns a string equivalent to the specified character array
endsWith()          Checks whether the string ends with a specified suffix
equals()            Compares the string with another string
equalsIgnoreCase()  Compares the string with another string, ignoring case
getBytes()          Copies characters from the string into a byte array
getChars()          Copies characters from the string into a character array
hashCode()          Returns a hashcode for the string
indexOf()           Searches for the first occurrence of a character or substring in the string
intern()            Fetches a unique instance of the string from a global shared-string pool
isBlank()           Returns true if the string is zero length or contains only whitespace
isEmpty()           Returns true if the string is zero length
lastIndexOf()       Searches for the last occurrence of a character or substring in a string
length()            Returns the length of the string
lines()             Returns a stream of lines separated by line terminators
matches()           Determines if the whole string matches a regular expression pattern
regionMatches()     Checks whether a region of the string matches the specified region of another string
repeat()            Returns a concatenation of this string repeated a given number of times
replace()           Replaces all occurrences of a character in the string with another character
replaceAll()        Replaces all occurrences of a regular expression pattern with a pattern
replaceFirst()      Replaces the first occurrence of a regular expression pattern with a pattern
split()             Splits the string into an array of strings using a regular expression pattern as a delimiter
startsWith()        Checks whether the string starts with a specified prefix
strip()             Removes leading and trailing whitespace as defined by Character.isWhitespace()
stripLeading()      Removes leading whitespace similar to strip() above
stripTrailing()     Removes trailing whitespace similar to strip() above
substring()         Returns a substring from the string
toCharArray()       Returns the array of characters from the string
toLowerCase()       Converts the string to lowercase
toString()          Returns the string value of an object
toUpperCase()       Converts the string to uppercase
trim()              Removes leading and trailing whitespace defined here as any character with a codepoint less than or equal to 32 (the ‘space’ character)
valueOf()           Returns a string representation of a value
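
Here is a short sketch that exercises a few of the methods from the table that we did not cover in the text. It assumes Java 11 or later for strip(), repeat(), and isBlank(); the banner() helper is invented for illustration.

```java
public class StringSampler {
    // Builds a simple text banner: a rule, the title in caps, a rule.
    public static String banner(String title) {
        String clean = title.strip();
        String rule = "=".repeat(clean.length());
        return rule + "\n" + clean.toUpperCase() + "\n" + rule;
    }

    public static void main(String[] args) {
        System.out.println(banner("  core utilities  "));

        System.out.println("   ".isBlank());           // true
        System.out.println("   ".isEmpty());           // false
        System.out.println("Newton".substring(0, 3));  // New
        System.out.println("a-b-c".replace('-', '+')); // a+b+c
    }
}
```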

Things from Strings

Parsing and formatting text is a large, open-ended topic. So far in this chapter, we’ve looked at only primitive operations on strings—creation, searching, and turning simple values into strings. Now we’d like to move on to more structured forms of text. Java has a rich set of APIs for parsing and printing formatted strings, including numbers, dates, times, and currency values. We’ll cover most of these topics in this chapter, but we’ll wait to discuss date and time formatting below in “Local Dates and Times”.

We’ll start with parsing—reading primitive numbers and values as strings and chopping long strings into tokens. Then we’ll take a look at regular expressions, the most powerful text-parsing tool Java offers. Regular expressions let you define your own patterns of arbitrary complexity, search for them, and parse them from text.

Parsing Primitive Numbers

In Java, numbers, characters, and booleans are primitive types—not objects. But for each primitive type, Java also defines a primitive wrapper class. Specifically, the java.lang package includes the following classes: Byte, Short, Integer, Long, Float, Double, Character, and Boolean. We talked about these in “Wrappers for Primitive Types”, but we bring them up now because these classes hold static utility methods that know how to parse their respective types from strings. Each of these primitive wrapper classes has a static “parse” method that reads a String and returns the corresponding primitive type. For example:

    byte b = Byte.parseByte("16");
    int n = Integer.parseInt( "42" );
    long l = Long.parseLong( "99999999999" );
    float f = Float.parseFloat( "4.2" );
    double d = Double.parseDouble( "99.99999999" );
    boolean b = Boolean.parseBoolean("true");

Alternatively, the java.util.Scanner class provides a single API for not only parsing individual primitive types from strings, but reading them from a stream of tokens. This example shows how to use it in place of the preceding wrapper classes:

    byte b = new Scanner("16").nextByte();
    int n = new Scanner("42").nextInt();
    long l = new Scanner("99999999999").nextLong();
    float f = new Scanner("4.2").nextFloat();
    double d = new Scanner("99.99999999").nextDouble();
    boolean b = new Scanner("true").nextBoolean();
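
Real input is not always well formed. The parse methods throw a NumberFormatException when the text is not a valid number, so a defensive wrapper can be handy. The following is a minimal sketch; parseOrDefault() is a hypothetical helper, not part of the Java API.

```java
import java.util.Scanner;

public class SafeParse {
    // Parses an int, returning the fallback if the text is not a number.
    public static int parseOrDefault(String text, int fallback) {
        try {
            return Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrDefault("42", -1));        // 42
        System.out.println(parseOrDefault("forty-two", -1)); // -1

        // With a Scanner, you can test before you read instead.
        Scanner scanner = new Scanner("forty-two");
        System.out.println(scanner.hasNextInt()); // false
    }
}
```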

Tokenizing Text

A common programming task involves parsing a string of text into words or “tokens” that are separated by some set of delimiter characters, such as spaces or commas. The first example contains words separated by single spaces. The second, more realistic problem involves comma-delimited fields.

    Now is the time for all good men (and women)...

    Check Number, Description,      Amount
    4231,         Java Programming, 1000.00

Java has several (unfortunately overlapping) APIs for handling situations like this. The most powerful and useful are the String split() and Scanner APIs. Both utilize regular expressions to allow you to break the string on arbitrary patterns. We haven’t talked about regular expressions yet, but in order to show you how this works we’ll just give you the necessary magic and explain in detail later in this chapter. We’ll also mention a legacy utility, java.util.StringTokenizer, which uses simple character sets to split a string. StringTokenizer is not as powerful, but doesn’t require an understanding of regular expressions.

The String split() method accepts a regular expression that describes a delimiter and uses it to chop the string into an array of Strings:

    String text = "Now is the time for all good men";
    String [] words = text.split("\\s");
    // words = "Now", "is", "the", "time", ...

    String text = "4231,         Java Programming, 1000.00";
    String [] fields = text.split("\\s*,\\s*");
    // fields = "4231", "Java Programming", "1000.00"

In the first example, we used the regular expression \s, which matches a single whitespace character (space, tab, or carriage return). The split() method returned an array of eight strings. In the second example, we used a more complicated regular expression, \s*,\s*, which matches a comma surrounded by any number of contiguous spaces (possibly zero). This reduced our text to three nice, tidy fields.
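
For readers who want to try this, here is a self-contained sketch of both splits. Note that inside a Java string literal each regex backslash is doubled. The class name and the fields() helper are made up; the last two lines show a detail worth knowing: a single-character delimiter produces empty strings when delimiters repeat, which the + operator (covered later) avoids.

```java
import java.util.Arrays;

public class SplitDemo {
    // Splits a comma-delimited line, discarding whitespace around commas.
    public static String[] fields(String line) {
        return line.split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        String[] words = "Now is the time for all good men".split("\\s");
        System.out.println(words.length); // 8

        String[] f = fields("4231,         Java Programming, 1000.00");
        System.out.println(Arrays.toString(f));
        // [4231, Java Programming, 1000.00]

        System.out.println("a  b".split("\\s").length);  // 3 ("a", "", "b")
        System.out.println("a  b".split("\\s+").length); // 2 ("a", "b")
    }
}
```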

With the new Scanner API, we could go a step further and parse the numbers of our second example as we extract them:

    String text = "4231,         Java Programming, 1000.00";
    Scanner scanner = new Scanner( text ).useDelimiter("\\s*,\\s*");
    int checkNumber = scanner.nextInt(); // 4231
    String description = scanner.next(); // "Java Programming"
    float amount = scanner.nextFloat();  // 1000.00

Here, we’ve told the Scanner to use our regular expression as the delimiter and then called it repeatedly to parse each field as its corresponding type. The Scanner is convenient because it can read not only from Strings but directly from stream sources (more in Chapter 11) such as InputStreams, Files, and Channels:

    Scanner fileScanner = new Scanner( new File("spreadsheet.csv") );
    fileScanner.useDelimiter( "\\s*,\\s*" );
    // ...

Another thing that you can do with the Scanner is to look ahead with the “hasNext” methods to see if another item is coming:

    while( scanner.hasNextInt() ) {
      int n = scanner.nextInt();
      ...
    }
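
As a complete, runnable sketch of this look-ahead loop, the following method adds up the integers at the start of a string and stops at the first token that is not an integer. The name sumLeadingInts() is invented for the example, and the Scanner uses its default whitespace delimiter.

```java
import java.util.Scanner;

public class SumInts {
    // Sums integer tokens until a non-integer token (or the end) is reached.
    public static int sumLeadingInts(String text) {
        Scanner scanner = new Scanner(text);
        int sum = 0;
        while (scanner.hasNextInt()) {
            sum += scanner.nextInt();
        }
        return sum;
    }

    public static void main(String[] args) {
        // "done" is not an int, so the loop stops before reaching 40.
        System.out.println(sumLeadingInts("10 20 30 done 40")); // 60
    }
}
```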

StringTokenizer

Even though the StringTokenizer class that we mentioned is now a legacy item, it’s good to know that it’s there because it’s been around since the beginning of Java and is used in a lot of code. StringTokenizer allows you to specify a delimiter as a set of characters and matches any number or combination of those characters as a delimiter between tokens. The following snippet reads the words of our first example:

    String text = "Now is the time for all good men (and women)...";
    StringTokenizer st = new StringTokenizer( text );

    while ( st.hasMoreTokens() )  {
        String word = st.nextToken();
        ...
    }

We invoke the hasMoreTokens() and nextToken() methods to loop over the words of the text. By default, the StringTokenizer class uses standard whitespace characters (space, tab, newline, carriage return, and form feed) as delimiters. You can also specify your own set of delimiter characters in the StringTokenizer constructor. Any contiguous combination of the specified characters that appears in the target string is skipped between tokens:

    String text = "4231,     Java Programming, 1000.00";
    StringTokenizer st = new StringTokenizer( text, "," );

    while ( st.hasMoreTokens() )  {
       String word = st.nextToken();
       // word = "4231", "     Java Programming", " 1000.00"
    }

This isn’t as clean as our regular expression example. Here we used a comma as the delimiter so we get extra leading whitespace in our description field. If we had added space to our delimiter string, the StringTokenizer would have broken our description into two words, “Java” and “Programming,” which is not what we wanted. A solution here would be to use trim() to remove the leading and trailing space on each element.
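
The trim() approach might look like the following sketch. The fields() helper and the use of a List to collect the results are our own choices for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerTrim {
    // Splits on commas, then trims the whitespace left on each token.
    public static List<String> fields(String text) {
        List<String> result = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, ",");
        while (st.hasMoreTokens()) {
            result.add(st.nextToken().trim());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fields("4231,     Java Programming, 1000.00"));
        // [4231, Java Programming, 1000.00]
    }
}
```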

Regular Expressions

Now it’s time to take a brief detour on our trip through Java and enter the land of regular expressions. A regular expression, or regex for short, describes a text pattern. Regular expressions are used with many tools—including the java.util.regex package, text editors, and many scripting languages—to provide sophisticated text-searching and powerful string-manipulation capabilities.

If you are already familiar with the concept of regular expressions and how they are used with other languages, you may wish to skim through this section. At the very least, you’ll need to look at “The java.util.regex API” later in this chapter, which covers the Java classes necessary to use them. On the other hand, if you’ve come to this point on your Java journey with a clean slate on this topic and you’re wondering exactly what regular expressions are, then pop open your favorite beverage and get ready. You are about to learn about the most powerful tool in the arsenal of text manipulation and what is, in fact, a tiny language within a language, all in the span of a few pages.

Regex Notation

A regular expression describes a pattern in text. By pattern, we mean just about any feature you can imagine identifying in text from the literal characters alone, without actually understanding their meaning. This includes features such as words, word groupings, lines and paragraphs, punctuation, case, and more generally, strings and numbers with a specific structure to them, such as phone numbers, email addresses, and quoted phrases. With regular expressions, you can search the dictionary for all the words that have the letter “q” without its pal “u” next to it, or words that start and end with the same letter. Once you have constructed a pattern, you can use simple tools to hunt for it in text or to determine if a given string matches it. A regex can also be arranged to help you dismember specific parts of the text it matched, which you could then use as elements of replacement text if you wish.

Write once, run away

Before moving on, we should say a few words about regular expression syntax in general. At the beginning of this section, we casually mentioned that we would be discussing a new language. Regular expressions do, in fact, constitute a simple form of programming language. If you think for a moment about the examples we cited earlier, you can see that something like a language is going to be needed to describe even simple patterns—such as email addresses—that have some variation in form.

A computer science textbook would classify regular expressions at the bottom of the hierarchy of computer languages, in terms of both what they can describe and what you can do with them. They are still capable of being quite sophisticated, however. As with most programming languages, the elements of regular expressions are simple, but they can be built up in combination to arbitrary complexity. And that is where things start to get sticky.

Since regexes work on strings, it is convenient to have a very compact notation that can be easily wedged between characters. But compact notation can be very cryptic, and experience shows that it is much easier to write a complex statement than to read it again later. Such is the curse of the regular expression. You may find that in a moment of late-night, caffeine-fueled inspiration, you can write a single glorious pattern to simplify the rest of your program down to one line. When you return to read that line the next day, however, it may look like Egyptian hieroglyphics to you. Simpler is generally better: if you can break your problem down and do it more clearly in several steps, maybe you should.

Escaped characters

Now that you’re properly warned, we have to throw one more thing at you before we build you back up. Not only can the regex notation get a little hairy, but it is also somewhat ambiguous with ordinary Java strings. An important part of the notation is the escaped character: a character with a backslash in front of it. For example, the escaped d character, \d (backslash ‘d’), is shorthand that matches any single digit character (0-9). However, you cannot simply write \d as part of a Java string, because you might recall that Java uses the backslash for its own special characters and to specify Unicode character sequences (\uxxxx). Fortunately, Java gives us a replacement: an escaped backslash, which is two backslashes (\\), means a literal backslash. The rule is, when you want a backslash to appear in your regex, you must escape it with an extra one:

    "\\d" // Java string that yields backslash "d"

And just to make things crazier, because regex notation itself uses backslash to denote special characters, it must provide the same “escape hatch” as well, allowing you to double up backslashes if you want a literal backslash. So if you want to specify a regular expression that includes a single literal backslash, it looks like this:

    "\\\\"  // Java string yields two backslashes; regex yields one

Most of the “magic” operator characters you read about in this section operate on the character that precedes them, so these also must be escaped if you want their literal meaning. This includes such characters as ., *, +, braces {}, and parentheses ().

If you need to create part of an expression that has lots of literal characters in it, you can use the special delimiters \Q and \E to help you. Any text appearing between \Q and \E is automatically escaped. (You still need the Java String escapes: double backslashes for each backslash, but not quadruple.) There is also a static method Pattern.quote(), which does the same thing, returning a properly escaped version of whatever string you give it.
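
The sketch below contrasts a quoted and an unquoted search for a price that contains the regex metacharacters $ and dot. It leans on the Pattern and Matcher classes from java.util.regex ahead of their formal introduction; the sample text and the containsLiteral() helper are invented for illustration.

```java
import java.util.regex.Pattern;

public class QuoteDemo {
    // Searches for the literal text, with all regex metacharacters escaped.
    public static boolean containsLiteral(String text, String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "The special today is $1.99 per stem.";

        System.out.println(containsLiteral(text, "$1.99")); // true

        // The same thing using \Q and \E (backslashes doubled for Java).
        System.out.println(text.matches(".*\\Q$1.99\\E.*")); // true

        // Unquoted, $ means "end of input", so nothing can follow it.
        System.out.println(Pattern.compile("$1.99").matcher(text).find()); // false
    }
}
```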

Beyond that, our only suggestion to help maintain your sanity when working with these examples is to keep two copies—a comment line showing the naked regular expression and the real Java string, where you must double up all backslashes. And don’t forget about jshell! It can be a very powerful playground for testing and tweaking your patterns.

Characters and character classes

Now, let’s dive into the actual regex syntax. The simplest form of a regular expression is plain, literal text, which has no special meaning and is matched directly (character for character) in the input. This can be a single character or more. For example, in the following string, the pattern “s” can match the character s in the words rose and is:

    "A rose is $1.99."

The pattern “rose” can match only the literal word rose. But this isn’t very interesting. Let’s crank things up a notch by introducing some special characters and the notion of character “classes.”

Any character: dot (.)

The special character dot (.) matches any single character. The pattern “.ose” matches rose, nose, _ose (space followed by ose) or any other character followed by the sequence ose. Two dots match any two characters (“prose”, “close”, etc.), and so on. The dot operator is not discriminating; it normally stops only for an end-of-line character (and, optionally, you can tell it not to; we discuss that later).

We can consider “.” to represent the group or class of all characters. And regexes define more interesting character classes as well.

Whitespace or nonwhitespace character: \s, \S

The special character \s matches a literal-space character or one of the following characters: \t (tab), \r (carriage return), \n (newline), \f (formfeed), and \x0B (vertical tab). The corresponding special character \S does the inverse, matching any character except whitespace.

Digit or nondigit character: \d, \D

\d matches any of the digits 0-9. \D does the inverse, matching all characters except digits.

Word or nonword character: \w, \W

\w matches a “word” character, including upper- and lowercase letters A-Z, a-z, the digits 0-9, and the underscore character (_). \W matches everything except those characters.
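
You can experiment with these classes using the String matches() method from Table 8-1, which tests whether the whole string fits a pattern. In this sketch, classify() is a hypothetical helper that reports the first predefined class a character belongs to.

```java
public class CharClassDemo {
    // Checks digit first, because digits are also "word" characters.
    public static String classify(char c) {
        String s = String.valueOf(c);
        if (s.matches("\\d")) return "digit";
        if (s.matches("\\s")) return "whitespace";
        if (s.matches("\\w")) return "word";
        return "other";
    }

    public static void main(String[] args) {
        System.out.println(classify('7'));  // digit
        System.out.println(classify('\t')); // whitespace
        System.out.println(classify('_'));  // word
        System.out.println(classify('$'));  // other

        System.out.println("-".matches("\\W")); // true
        System.out.println("x".matches("\\D")); // true
    }
}
```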

Custom character classes

You can define your own character classes using the notation […]. For example, the following class matches any of the characters a, b, c, x, y, or z:

    [abcxyz]

The special x-y range notation can be used as shorthand for the alphanumeric characters. The following example defines a character class containing all upper- and lowercase letters:

    [A-Za-z]

Placing a caret (^) as the first character inside the brackets inverts the character class. This example matches any character except uppercase A-F:

    [^A-F]    //  G, H, I, ..., a, b, c, ... etc.

Nesting character classes simply adds them:

    [A-F[G-Z]\s]   // A-Z plus whitespace

The && logical AND notation can be used to take the intersection (characters in common):

    [a-p&&[l-z]]  // l, m, n, o, p
    [A-Z&&[^P]]  // A through Z except P
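
A quick way to convince yourself of these rules is to test single characters against each class, as in the following sketch. The isHexDigit() helper is our own addition to show a practical custom class.

```java
public class CustomClassDemo {
    // A custom class combining three ranges: 0-9, A-F, and a-f.
    public static boolean isHexDigit(char c) {
        return String.valueOf(c).matches("[0-9A-Fa-f]");
    }

    public static void main(String[] args) {
        System.out.println("G".matches("[^A-F]"));        // true
        System.out.println("B".matches("[^A-F]"));        // false

        System.out.println("G".matches("[A-F[G-Z]\\s]")); // true (union)
        System.out.println(" ".matches("[A-F[G-Z]\\s]")); // true

        System.out.println("m".matches("[a-p&&[l-z]]"));  // true (intersection)
        System.out.println("a".matches("[a-p&&[l-z]]"));  // false
        System.out.println("P".matches("[A-Z&&[^P]]"));   // false

        System.out.println(isHexDigit('c')); // true
        System.out.println(isHexDigit('g')); // false
    }
}
```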

Position markers

The pattern “[Aa] rose” (including an upper- or lowercase A) matches three times in the following phrase:

    "A rose is a rose is a rose"

Position characters allow you to designate the relative location of a match. The most important are ^ and $, which match the beginning and end of a line, respectively:

    ^[Aa] rose  // matches "A rose" at the beginning of line
    [Aa] rose$  // matches "a rose" at end of line

To be a little more precise, ^ and $ match the beginning and end of “input,” which is often a single line. If you are working with multiple lines of text and wish to match the beginnings and endings of lines within a single large string, you can turn on “multiline” mode with a flag as described later in “Special options”.

The position markers \b and \B match a word boundary or nonword boundary, respectively. For example, the following pattern matches rose and rosemary, but not primrose:

    \brose
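
Here is a small sketch of that word-boundary pattern in action. It uses the Pattern and Matcher classes (covered later in the chapter) because we want to search within the text rather than match all of it; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class BoundaryDemo {
    private static final Pattern ROSE_AT_WORD_START = Pattern.compile("\\brose");

    // True if some word in the text begins with "rose".
    public static boolean hasWordStartingWithRose(String text) {
        return ROSE_AT_WORD_START.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(hasWordStartingWithRose("a rose"));   // true
        System.out.println(hasWordStartingWithRose("rosemary")); // true
        System.out.println(hasWordStartingWithRose("primrose")); // false

        // \B is the inverse: "rose" must NOT start at a word boundary.
        System.out.println(Pattern.compile("\\Brose").matcher("primrose").find()); // true
    }
}
```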

Iteration (multiplicity)

Simply matching fixed character patterns would not get us very far. Next, we look at operators that count the number of occurrences of a character (or more generally, of a pattern, as we’ll see in “Pattern”):

Any (zero or more iterations): asterisk (*)

Placing an asterisk (*) after a character or character class means “allow any number of that type of character”—in other words, zero or more. For example, the following pattern matches a digit with any number of leading zeros (possibly none):

    0*\d   // match a digit with any number of leading zeros

Some (one or more iterations): plus sign (+)

The plus sign (+) means “one or more” iterations and is equivalent to XX* (pattern followed by pattern asterisk). For example, the following pattern matches a number with one or more digits, plus optional leading zeros:

    0*\d+   // match a number (one or more digits) with optional leading
            // zeros

It may seem redundant to match the zeros at the beginning of an expression because zero is a digit and is thus matched by the \d+ portion of the expression anyway. However, we’ll show later how you can pick apart the string using a regex and get at just the pieces you want. In this case, you might want to strip off the leading zeros and keep only the digits.

Optional (zero or one iteration): question mark (?)

The question mark operator (?) allows exactly zero or one iteration. For example, the following pattern matches a credit-card expiration date, which may or may not have a slash in the middle:

    \d\d/?\d\d  // match four digits with an optional slash in the middle

Range (between x and y iterations, inclusive): {x,y}

The {x,y} curly-brace range operator is the most general iteration operator. It specifies a precise range to match. A range takes two arguments: a lower bound and an upper bound, separated by a comma. This regex matches any word with five to seven characters, inclusive:

    \b\w{5,7}\b  // match words with at least 5 and at most 7 characters

At least x or more iterations (y is infinite): {x,}

If you omit the upper bound, simply leaving a dangling comma in the range, the upper bound becomes infinite. This is a way to specify a minimum of occurrences with no maximum.
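To see the iteration operators side by side, here is a minimal sketch that tests a few strings against each pattern. The class and method names are ours, chosen only for illustration. Note that every regex backslash is doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

// Illustrative only: the class and method names are not from the text.
class IterationDemo {
    // Pattern.matches() requires the whole input to match the regex.
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(whole("0*\\d", "0007"));            // true: leading zeros, then a digit
        System.out.println(whole("0*\\d+", "00042"));          // true: one or more digits
        System.out.println(whole("\\d\\d/?\\d\\d", "1225"));   // true: the slash is optional
        System.out.println(whole("\\d\\d/?\\d\\d", "12/25"));  // true
        System.out.println(whole("\\w{5,7}", "regex"));        // true: 5 word characters
        System.out.println(whole("\\w{5,7}", "java"));         // false: only 4
        System.out.println(whole("\\w{5,}", "expressions"));   // true: no upper bound
    }
}
```

The word-boundary markers are left out here because Pattern.matches() already insists on consuming the entire input.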

Alternation

The vertical bar (|) operator denotes the logical OR operation, also called alternation or choice. The | operator does not operate on individual characters but instead applies to everything on either side of it. It splits the expression in two unless constrained by parentheses grouping. For example, a slightly naive approach to parsing dates might be the following:

    \w+, \w+ \d+ \d+|\d\d/\d\d/\d\d  // pattern 1 or pattern 2

In this expression, the left alternative matches text such as Fri, Oct 12 2001, and the right matches dates such as 10/12/01.

The following regex might be used to match email addresses with one of three domains (net, edu, and gov):

    \w+@[\w.]*\.(net|edu|gov)  // email address ending in .net, .edu, or .gov

Special options

There are several special options that affect the way the regex engine performs its matching. These options can be applied in two ways:

  • You can pass in one or more flags during the Pattern.compile() step (discussed in the next section).

  • You can include a special block of code in your regex.

We’ll show the latter approach here. To do this, include one or more flags in a special block (?x), where x is the flag for the option we want to turn on. Generally, you do this at the beginning of the regex. You can also turn off flags by adding a minus sign (?-x), which allows you to apply flags to select parts of your pattern.

The following flags are available:

Case-insensitive: (?i)

The (?i) flag tells the regex engine to ignore case while matching, for example:

    (?i)yahoo   // match Yahoo, yahoo, yahOO, etc.

Dot all: (?s)

The (?s) flag turns on “dot all” mode, allowing the dot character to match anything, including end-of-line characters. It is useful if you are matching patterns that span multiple lines. The s stands for “single-line mode,” a somewhat confusing name derived from Perl.

Multiline: (?m)

By default, ^ and $ don’t really match the beginning and end of lines (as defined by carriage return or newline combinations); they instead match the beginning or end of the entire input text. In many cases, “one line” is synonymous with the entire input. If you have a big block of text to process, you’ll often break that block up into separate lines for other reasons. Checking any given line against a regular expression is then straightforward, and ^ and $ behave as expected. However, if you want to use a regex with an entire input string containing multiple lines (separated by those carriage return or newline combinations), you can turn on multiline mode with (?m). This flag causes ^ and $ to match the beginning and end of the individual lines within the block of text as well as the beginning and end of the entire block. Specifically, this means the spot before the first character, the spot after the last character, and the spots just before and after line terminators inside the string.

Unix lines: (?d)

The (?d) flag limits the definition of the line terminator for the ^, $, and . special characters to Unix-style newline only (\n). By default, carriage return newline (\r\n) is also allowed.
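The effect of these flags is easiest to see by counting matches with and without them. The following is a small sketch, not a definitive recipe; the class name, the count() helper, and the sample text are our own. It also shows the compile-time constant Pattern.CASE_INSENSITIVE as the alternative spelling of (?i).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: FlagDemo and count() are hypothetical names.
class FlagDemo {
    // Counts how many times the pattern is found in the input.
    static int count(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int n = 0;
        while (m.find()) n++;
        return n;
    }

    public static void main(String[] args) {
        String text = "Yahoo\nyahoo\nYAHOO";

        // An inline flag and a compile-time flag are two spellings of the same option.
        System.out.println(count(Pattern.compile("(?i)yahoo"), text));                       // 3
        System.out.println(count(Pattern.compile("yahoo", Pattern.CASE_INSENSITIVE), text)); // 3

        // Without (?m), ^ matches only at the very start of the input.
        System.out.println(count(Pattern.compile("(?i)^yahoo"), text));   // 1
        System.out.println(count(Pattern.compile("(?im)^yahoo"), text));  // 3

        // Without (?s), the dot stops at line terminators.
        System.out.println(Pattern.matches("Yahoo.*YAHOO", text));      // false
        System.out.println(Pattern.matches("(?s)Yahoo.*YAHOO", text));  // true
    }
}
```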

The java.util.regex API

Now that we’ve covered the theory of how to construct regular expressions, the hard part is over. All that’s left is to investigate the Java API for applying these expressions.

Pattern

As we’ve said, the regex patterns that we write as strings are, in actuality, little programs describing how to match text. At runtime, the Java regex package compiles these little programs into a form that it can execute against some target text. Several simple convenience methods accept strings directly to use as patterns. More generally, however, Java allows you to explicitly compile your pattern and encapsulate it in an instance of a Pattern object. This is the most efficient way to handle patterns that are used more than once, because it eliminates needlessly recompiling the string. To compile a pattern, we use the static method Pattern.compile():

    Pattern urlPattern = Pattern.compile("\\w+://[\\w/]*");

Once you have a Pattern, you can ask it to create a Matcher object, which associates the pattern with a target string:

    Matcher matcher = urlPattern.matcher( myText );

The matcher executes the matches. We’ll talk about that next. But before we do, we’ll just mention one convenience method of Pattern. The static method Pattern.matches() simply takes two strings—a regex and a target string—and determines if the target matches the regex. This is very convenient if you want to do a quick test once in your application. For example:

    Boolean match = Pattern.matches( "\\d+\\.\\d+f?", myText );

This line of code can test if the string myText contains a Java-style floating-point number such as “42.0f.” Note that the string must match completely in order to be considered a match. If you want to see if a small pattern is contained within a larger string but don’t care about the rest of the string, you have to use a Matcher as described in “The Matcher” below.

Let’s try another (simplified) pattern that we could use in our game once we start letting multiple players compete against each other. Many login systems use email addresses as the user identifier. Such systems aren’t perfect, of course, but an email address will work great for our needs. We would like to invite the user to input their email address but we want to make sure it looks valid before using it. A regular expression can be a quick way to perform such a validation.4

Much like writing algorithms to solve programming problems, designing a regular expression requires you to break down your pattern matching problem into bite-sized pieces. If we think about email addresses, there are a few patterns that stand out right away. The most obvious is the @ in the middle of every address. A naive (but better than nothing!) pattern relying on that fact could be built like this:

    String sample = "[email protected]";
    Boolean validEmail = Pattern.matches(".*@.*", sample);

But that pattern is too permissive. It will certainly recognize valid email addresses, but it will also recognize many invalid ones like "bad.address@" or "@also.bad" or even "@@". (Test these out in a jshell and maybe cook up a few more bad examples of your own!) How can we make better matches? One quick adjustment would be to use the + modifier instead of the *. The upgraded pattern now requires at least one character on each side of the @. But we know a few other things about email addresses. For example, the left “half” of the address (the name portion) cannot contain the @ character. For that matter, neither can the domain portion. We can use a custom character class for this next upgrade.

    String sample = "[email protected]";
    Boolean validEmail = Pattern.matches("[^@]+@[^@]+", sample);

This pattern is better, but still allows several invalid addresses such as "still@bad" since domain names have at least a name followed by a period (.) followed by a top-level domain (TLD) such as “oreilly.com”. So maybe a pattern like this:

    String sample = "[email protected]";
    Boolean validEmail = Pattern.matches("[^@]+@[^@]+\\.(com|org)", sample);

That pattern fixes our issue with an address like "still@bad" but we’ve gone a bit too far the other way. There are many, many TLDs—too many to reasonably list even if we ignore the problem of maintaining that list as new TLDs are added.5 So let’s step back a little. We’ll keep the “dot” in the domain portion, but remove the specific TLD and just accept a simple run of letters:

    String sample = "[email protected]";
    Boolean validEmail = Pattern.matches("[^@]+@[^@]+\\.[a-z]+", sample);

Much better. We can add one last tweak to make sure we don’t worry about the case of the address, since email addresses are treated as case-insensitive in practice. Just tack on a flag:

    String sample = "[email protected]";
    Boolean validEmail = Pattern.matches("(?i)[^@]+@[^@]+\\.[a-z]+", sample);

Again, this is by no means a perfect email validator, but it is definitely a good start and will suffice for our simple login system once we add networking. If you want to tinker around with the validation pattern and expand or improve it, remember you can “reuse” lines in jshell with the keyboard arrow keys. Use the up arrow to retrieve the previous line. Indeed, you can use up arrow and down arrow to navigate all of your recent lines. Within a line, use the left arrow and right arrow to move around and delete/add/edit your command. Then just hit the return key to run the newly altered command—you do not need to move the cursor to the end of the line before hitting return.

jshell> Pattern.matches("(?i)[^@]+@[^@]+\\.[a-z]+", "[email protected]")
$1 ==> true

jshell> Pattern.matches("(?i)[^@]+@[^@]+\\.[a-z]+", "[email protected]")
$2 ==> true

jshell> Pattern.matches("(?i)[^@]+@[^@]+\\.[a-z]+", "oreilly.com")
$3 ==> false

jshell> Pattern.matches("(?i)[^@]+@[^@]+\\.[a-z]+", "bad@oreilly@com")
$4 ==> false

jshell> Pattern.matches("(?i)[^@]+@[^@]+\\.[a-z]+", "[email protected]")
$5 ==> true

jshell> Pattern.matches("[^@]+@[^@]+\\.[a-z]+", "[email protected]")
$6 ==> false

In the examples above, we only typed in the full Pattern.matches(…) line once. After that it was a simple up arrow and then edit and then return for the subsequent five lines. Can you see why the final match test failed?

The Matcher

A Matcher associates a pattern with a string and provides tools for testing, finding, and iterating over matches of the pattern against it. The Matcher is “stateful.” For example, the find() method tries to find the next match each time it is called. But you can clear the Matcher and start over by calling its reset() method.

If you’re just interested in “one big match”—that is, you’re expecting your string to either match the pattern or not—you can use matches() or lookingAt(). These correspond roughly to the methods equals() and startsWith() of the String class. The matches() method asks if the string matches the pattern in its entirety (with no string characters left over) and returns true or false. The lookingAt() method does the same, except that it asks only whether the string starts with the pattern and doesn’t care if the pattern uses up all the string’s characters.
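As a rough illustration of the equals()/startsWith() analogy, the sketch below wraps matches() and lookingAt() in two small helpers. The class and helper names are ours, not part of the regex API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: WholeOrPrefix and its helpers are hypothetical names.
class WholeOrPrefix {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Like String.equals(): the entire input must be digits.
    static boolean isAllDigits(String s) {
        return DIGITS.matcher(s).matches();
    }

    // Like String.startsWith(): returns the length of the leading run of
    // digits, or -1 if the input does not start with a digit.
    static int leadingDigits(String s) {
        Matcher m = DIGITS.matcher(s);
        return m.lookingAt() ? m.end() : -1;
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("2024"));          // true
        System.out.println(isAllDigits("2024-05-04"));    // false: "-05-04" is left over
        System.out.println(leadingDigits("2024-05-04"));  // 4
        System.out.println(leadingDigits("May 4"));       // -1
    }
}
```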

More generally, you’ll want to be able to search through the string and find one or more matches. To do this, you can use the find() method. Each call to find() returns true or false for the next match of the pattern and internally notes the position of the matching text. You can get the starting and ending character positions with the Matcher start() and end() methods, or you can simply retrieve the matched text with the group() method. For example:

    import java.util.regex.*;

    String text="A horse is a horse, of course of course...";
    String pattern="horse|course";

    Matcher matcher = Pattern.compile( pattern ).matcher( text );
    while ( matcher.find() )
      System.out.println(
        "Matched: '"+matcher.group()+"' at position "+matcher.start() );

The previous snippet prints the starting location of the words “horse” and “course” (four in all):

    Matched: 'horse' at position 2
    Matched: 'horse' at position 13
    Matched: 'course' at position 23
    Matched: 'course' at position 33

The method to retrieve the matched text is called group() because it refers to capture group zero (the entire match). You can also retrieve the text of other numbered capture groups by giving the group() method an integer argument. You can determine how many capture groups you have with the groupCount() method:

    for (int i=1; i <= matcher.groupCount(); i++)
        System.out.println( matcher.group(i) );
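For a more complete picture of numbered groups, here is a minimal sketch that pulls the month, day, and year out of a date string. The pattern and the names GroupDemo and parts() are assumptions made for this example. Because groupCount() does not count group zero, the loop runs from 1 through groupCount() inclusive.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: GroupDemo and parts() are hypothetical names.
class GroupDemo {
    // Three capture groups: month, day, and four-digit year.
    private static final Pattern DATE = Pattern.compile("(\\d\\d)/(\\d\\d)/(\\d{4})");

    // Returns {month, day, year} for input like 10/12/2001, or null if it does not match.
    static String[] parts(String date) {
        Matcher m = DATE.matcher(date);
        if (!m.matches()) return null;
        String[] result = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {  // group 0 is the whole match
            result[i - 1] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" | ", parts("10/12/2001")));  // 10 | 12 | 2001
    }
}
```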

Splitting and tokenizing strings

A very common need is to parse a string into a bunch of fields based on some delimiter, such as a comma. It’s such a common problem that the String class contains a method for doing just this. The split() method accepts a regular expression and returns an array of substrings broken around that pattern. Consider the following string and split() calls:

    String text = "Foo, bar ,   blah";
    String[] badFields = text.split(",");
    String[] goodFields = text.split( "\\s*,\\s*" );

The first split() returns a String array, but the naive use of “,” to separate the string means the whitespace in our text variable remains stuck to the more interesting characters. We get Foo as a single word as expected, but then we get <space>bar<space> and finally <space><space><space>blah. Yikes! The second split() also yields a String array, but this time containing the expected Foo, bar (with no leading or trailing space), and blah (with no leading spaces).

If you are going to use an operation like this more than a few times in your code, you should probably compile the pattern and use its split() method, which is identical to the version in String. The String split() method is equivalent to:

    Pattern.compile(pattern).split(string);
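A sketch of the compile-once approach might look like the following. The class name, the COMMA constant, and the extra trim() call (which drops whitespace at the very ends of the line) are our own additions for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Illustrative only: SplitDemo, COMMA, and fields() are hypothetical names.
class SplitDemo {
    // Compiled once and reused for every line we split.
    private static final Pattern COMMA = Pattern.compile("\\s*,\\s*");

    static String[] fields(String line) {
        // trim() handles whitespace before the first and after the last field
        return COMMA.split(line.trim());
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fields("Foo, bar ,   blah")));  // [Foo, bar, blah]
        System.out.println(Arrays.toString(fields("  a ,b,c  ")));         // [a, b, c]
    }
}
```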

As we noted before, there is a lot to learn about regular expressions above and beyond the specific regex capabilities provided by Java. Revisit using jshell (“Pattern”) to play around with expressions and splitting. This is definitely a topic that benefits from practice.

Math Utilities

Java supports integer and floating-point arithmetic directly in the language. Higher-level math operations are supported through the java.lang.Math class. As you may have seen by now, wrapper classes for primitive data types allow you to treat them as objects. Wrapper classes also hold some methods for basic conversions.

First, a few words about built-in arithmetic in Java. Java handles errors in integer arithmetic by throwing an ArithmeticException:

    int zero = 0;

    try {
        int i = 72 / zero;
    } catch ( ArithmeticException e ) {
        // division by zero
    }

To generate the error in this example, we created the intermediate variable zero. The compiler is somewhat crafty and would have caught us if we had blatantly tried to perform division by a literal zero.

Floating-point arithmetic expressions, on the other hand, don’t throw exceptions. Instead, they take on the special out-of-range values shown in Table 8-2.

Table 8-2. Special floating-point values
Value               Mathematical representation
POSITIVE_INFINITY   1.0/0.0
NEGATIVE_INFINITY   -1.0/0.0
NaN                 0.0/0.0

The following example generates an infinite result:

    double zero = 0.0;
    double d = 1.0/zero;

    if ( d == Double.POSITIVE_INFINITY )
        System.out.println( "Division by zero" );

The special value NaN (not a number) indicates the result of dividing zero by zero. This value has the special mathematical distinction of not being equal to itself (NaN != NaN evaluates to true). Use Float.isNaN() or Double.isNaN() to test for NaN.
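The following short sketch shows why the isNaN() test is needed: an ordinary == comparison against NaN can never succeed. The class name and the divide() helper are ours, for illustration only.

```java
// Illustrative only: NaNDemo and divide() are hypothetical names.
class NaNDemo {
    // Plain floating-point division; never throws, even for a zero divisor.
    static double divide(double a, double b) {
        return a / b;
    }

    public static void main(String[] args) {
        double nan = divide(0.0, 0.0);

        System.out.println(nan == nan);          // false: NaN is not equal to itself
        System.out.println(nan != nan);          // true
        System.out.println(Double.isNaN(nan));   // true: the reliable test

        System.out.println(divide(1.0, 0.0) == Double.POSITIVE_INFINITY);   // true
        System.out.println(divide(-1.0, 0.0) == Double.NEGATIVE_INFINITY);  // true
        System.out.println(Double.isInfinite(divide(1.0, 0.0)));            // true
    }
}
```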

The java.lang.Math Class

The java.lang.Math class is Java’s math library. It holds a suite of static methods covering all of the usual mathematical operations like sin(), cos(), and sqrt(). The Math class isn’t very object-oriented (you can’t create an instance of Math). Instead, it’s really just a convenient holder for static methods that are more like global functions. As we saw in Chapter 5, it’s possible to use the static import functionality to import the names of static methods and constants like this directly into the scope of our class and use them by their simple, unqualified names.

Table 8-3 summarizes the methods in java.lang.Math.

Table 8-3. Methods in java.lang.Math
Method              Argument type(s)           Functionality
Math.abs(a)         int, long, float, double   Absolute value
Math.acos(a)        double                     Arc cosine
Math.asin(a)        double                     Arc sine
Math.atan(a)        double                     Arc tangent
Math.atan2(a,b)     double                     Angle part of rectangular-to-polar coordinate transform
Math.ceil(a)        double                     Smallest whole number greater than or equal to a
Math.cbrt(a)        double                     Cube root of a
Math.cos(a)         double                     Cosine
Math.cosh(a)        double                     Hyperbolic cosine
Math.exp(a)         double                     Math.E to the power a
Math.floor(a)       double                     Largest whole number less than or equal to a
Math.hypot(a,b)     double                     Precision calculation of the sqrt() of a² + b²
Math.log(a)         double                     Natural logarithm of a
Math.log10(a)       double                     Log base 10 of a
Math.max(a, b)      int, long, float, double   The value a or b closer to Long.MAX_VALUE
Math.min(a, b)      int, long, float, double   The value a or b closer to Long.MIN_VALUE
Math.pow(a, b)      double                     a to the power b
Math.random()       None                       Random-number generator
Math.rint(a)        double                     Converts double value to integral value in double format
Math.round(a)       float, double              Rounds to whole number
Math.signum(a)      double, float              Get the sign of the number as 1.0, -1.0, or 0
Math.sin(a)         double                     Sine
Math.sinh(a)        double                     Hyperbolic sine
Math.sqrt(a)        double                     Square root
Math.tan(a)         double                     Tangent
Math.tanh(a)        double                     Hyperbolic tangent
Math.toDegrees(a)   double                     Convert radians to degrees
Math.toRadians(a)   double                     Convert degrees to radians

log(), pow(), and sqrt() do not throw exceptions for out-of-range arguments; like other floating-point operations, they return NaN (for example, Math.sqrt(-1.0) and Math.log(-1.0) both yield NaN). abs(), max(), and min() are overloaded for all the scalar values, int, long, float, or double, and return the corresponding type. Versions of Math.round() accept either float or double and return int or long, respectively. The rest of the methods operate on and return double values:

    double irrational = Math.sqrt( 2.0 ); // 1.414...
    int bigger = Math.max( 3, 4 );  // 4
    long one = Math.round( 1.125798 ); // 1

And just to highlight the convenience of that static import option, we can try these simple functions in jshell:

jshell> import static java.lang.Math.*

jshell> double irrational = sqrt(2.0)
irrational ==> 1.4142135623730951

jshell> int bigger = max(3,4)
bigger ==> 4

jshell> long one = round(1.125798)
one ==> 1
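The rounding-related methods are easy to mix up, so here is a small sketch comparing floor(), ceil(), rint(), and round() on the same inputs. The class name is ours; the values in the comments follow from the documented behavior of each method (rint() sends ties to the even neighbor, while round() sends ties toward positive infinity).

```java
// Illustrative only: RoundingDemo is a hypothetical name.
class RoundingDemo {
    public static void main(String[] args) {
        System.out.println(Math.floor(2.5));   // 2.0
        System.out.println(Math.ceil(2.5));    // 3.0
        System.out.println(Math.rint(2.5));    // 2.0 (ties go to the even neighbor)
        System.out.println(Math.rint(3.5));    // 4.0
        System.out.println(Math.round(2.5));   // 3   (a long, not a double)
        System.out.println(Math.round(-2.5));  // -2  (ties go toward positive infinity)
        System.out.println(Math.floor(-2.5));  // -3.0
    }
}
```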

Math also contains the static final double values E and PI:

    double circumference = diameter * Math.PI;

Math in Action

We’ve already touched on using the Math class and its static methods in “Accessing Fields and Methods”. We can use it again in making our game a little more fun by randomizing where the trees appear. The Math.random() method returns a random double greater than or equal to 0 and less than 1. Add in a little arithmetic and rounding or truncating and you can use that value to create random numbers in any range you need. In particular, converting this value into a desired range follows this formula:

    int randomValue = min + (int)(Math.random() * (max - min));

Try it! Try to generate a random 4-digit number in jshell. You could set the min to 1000 and the max to 10000, like so:

jshell> int min = 1000
min ==> 1000

jshell> int max = 10000
max ==> 10000

jshell> int fourDigit = min + (int)(Math.random() * (max - min))
fourDigit ==> 9603

jshell> fourDigit = min + (int)(Math.random() * (max - min))
fourDigit ==> 9178

jshell> fourDigit = min + (int)(Math.random() * (max - min))
fourDigit ==> 3789
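If you need this calculation in more than one place, it could be wrapped in a helper method along these lines. This is only a sketch; the names RandomRange and randomBetween() are ours. Note that the upper bound is exclusive, so simulating a six-sided die means passing 7 as the max.

```java
// Illustrative only: RandomRange and randomBetween() are hypothetical names.
class RandomRange {
    // Returns a value in [min, max): min is possible, max is not.
    static int randomBetween(int min, int max) {
        return min + (int)(Math.random() * (max - min));
    }

    public static void main(String[] args) {
        // Roll a six-sided die five times; each value is between 1 and 6.
        for (int i = 0; i < 5; i++) {
            System.out.print(randomBetween(1, 7) + " ");
        }
        System.out.println();
    }
}
```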

To place our trees, we’ll need two random numbers for the x and y coordinates. We can set a range that will keep the trees on the screen by thinking about a margin around the edges. For the x coordinate, one way to do that might look like this:

private int goodX() {
    // Left margin: at least half the width of the tree plus a few pixels
    int leftMargin = Field.TREE_WIDTH_IN_PIXELS / 2 + 5;
    // Right margin: mirror the left margin from the far edge of the field
    int rightMargin = FIELD_WIDTH - leftMargin;

    // Return a random number between the left and right margins
    return leftMargin + (int)(Math.random() * (rightMargin - leftMargin));
}

Set up a similar method for finding a y value and you should start to see something like the image shown in Figure 8-1. You could even get fancy and use the isTouching() method we discussed back in Chapter 5 to avoid placing any trees in direct contact with our physicist. Here’s our upgraded tree setup loop:

for (int i = field.trees.size(); i < Field.MAX_TREES; i++) {
    Tree t = new Tree();
    t.setPosition(goodX(), goodY());
    // Trees can be close to each other and overlap,
    // but they shouldn't intersect our physicist
    while(player1.isTouching(t)) {
        // We do intersect this tree, so let's try again
        t.setPosition(goodX(), goodY());
        System.err.println("Repositioning an intersecting tree...");
    }
    field.addTree(t);
}
Figure 8-1. Randomly distributed trees

Try quitting the game and launching it again. You should see the trees in different places each time you run the application.

Big/Precise Numbers

If the long and double types are not large or precise enough for you, the java.math package provides two classes, BigInteger and BigDecimal, that support arbitrary-precision numbers. These full-featured classes have a bevy of methods for performing arbitrary-precision math and precisely controlling rounding of remainders. In the following example, we use BigDecimal to add two very large numbers and then create a fraction with a 100-digit result:

    import java.math.BigDecimal;
    import java.math.RoundingMode;

    long l1 = 9223372036854775807L; // Long.MAX_VALUE
    long l2 = 9223372036854775807L;
    System.out.println( l1 + l2 ); // -2 ! Not good.

    try {
        BigDecimal bd1 = new BigDecimal( "9223372036854775807" );
        BigDecimal bd2 = new BigDecimal( 9223372036854775807L );
        System.out.println( bd1.add( bd2 ) ); // 18446744073709551614

        BigDecimal numerator = new BigDecimal(1);
        BigDecimal denominator = new BigDecimal(3);
        // RoundingMode.UP replaces the deprecated BigDecimal.ROUND_UP constant
        BigDecimal fraction =
            numerator.divide( denominator, 100, RoundingMode.UP );
        // 100 digit fraction = 0.333333 ... 3334
    }
    catch (NumberFormatException nfe) { /* malformed numeric string */ }
    catch (ArithmeticException ae) { /* e.g., division by zero */ }

If you implement cryptographic or scientific algorithms for fun, BigInteger is crucial. BigDecimal, in turn, can be found in applications dealing with currency and financial data. Other than that, you’re not likely to need these classes.
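To make the currency point concrete, the sketch below adds two prices first as double values and then as BigDecimal values, and also computes a power of two that would overflow a long. The class and method names are ours, for illustration only.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Illustrative only: BigNumbersDemo and its methods are hypothetical names.
class BigNumbersDemo {
    // Sums prices given as decimal strings.
    static BigDecimal total(String... prices) {
        BigDecimal sum = BigDecimal.ZERO;
        for (String p : prices) {
            sum = sum.add(new BigDecimal(p));  // the String constructor keeps the exact decimal value
        }
        return sum;
    }

    static BigInteger powerOfTwo(int exponent) {
        return BigInteger.valueOf(2).pow(exponent);
    }

    public static void main(String[] args) {
        System.out.println(0.10 + 0.20);            // 0.30000000000000004
        System.out.println(total("0.10", "0.20"));  // 0.30
        System.out.println(powerOfTwo(100));        // 1267650600228229401496703205376
    }
}
```

Building each BigDecimal from a String rather than a double matters here: a double literal such as 0.10 already carries the binary rounding error before BigDecimal ever sees it.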

Dates and Times

Working with dates and times without the proper tools can be a chore. Prior to Java 8, you had access to three classes that handled most of the work for you. The java.util.Date class encapsulates a point in time. The java.util.GregorianCalendar class, which extends the abstract java.util.Calendar, translates between a point in time and calendar fields like month, day, and year. Finally, the java.text.DateFormat class knows how to generate and parse string representations of dates and times in many languages.

While the Date and Calendar classes covered many use cases, they lacked granularity and were missing other features. Those gaps led to the creation of several third-party libraries, all aimed at making it easier for developers to work with dates, times, and time durations. Java 8 provided much-needed improvements in this area with the addition of the java.time package. We will explore this new package, but you will still encounter many, many Date and Calendar examples in the wild, so it’s useful to know they exist. As always, the online docs are an invaluable source for reviewing parts of the Java API we don’t tackle here.

Local Dates and Times

The java.time.LocalDate class represents a date without time information for your local region. Think of a holiday such as May 4, 2019. Similarly, java.time.LocalTime represents a time without any date information. Perhaps your alarm clock goes off at 7:15 every morning. The java.time.LocalDateTime stores both date and time values for things like appointments with your eye doctor so you can keep reading books on Java. All of these classes offer static methods for creating new instances using either appropriate numeric values with of() or by parsing strings with parse(). Let’s pop into jshell and try creating a few examples.

jshell> import java.time.*

jshell> LocalDate.of(2019,5,4)
$2 ==> 2019-05-04

jshell> LocalDate.parse("2019-05-04")
$3 ==> 2019-05-04

jshell> LocalTime.of(7,15)
$4 ==> 07:15

jshell> LocalTime.parse("07:15")
$5 ==> 07:15

jshell> LocalDateTime.of(2019,5,4,7,0)
$6 ==> 2019-05-04T07:00

jshell> LocalDateTime.parse("2019-05-04T07:15")
$7 ==> 2019-05-04T07:15

Another great static method for creating these objects is now() which provides the current date or time or date-and-time as you might expect:

jshell> LocalTime.now()
$8 ==> 15:57:24.052935

jshell> LocalDate.now()
$9 ==> 2019-12-12

jshell> LocalDateTime.now()
$10 ==> 2019-12-12T15:57:37.909038

Great! After importing the java.time package, we can create instances of each of the Local… classes for specific moments or for “right now”. You may have noticed the objects created with now() include seconds and nanoseconds. You can supply those values to the of() and parse() methods if you want or need them. Not much exciting there, but once you have these objects, you can do a lot with them. Read on!
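Before moving on, here is a brief sketch of supplying seconds and nanoseconds to of() and parse(). The class name is ours; the overloads used are the standard java.time ones.

```java
import java.time.LocalDateTime;
import java.time.LocalTime;

// Illustrative only: PreciseTimes is a hypothetical name.
class PreciseTimes {
    public static void main(String[] args) {
        LocalTime t1 = LocalTime.of(7, 15, 30);               // hour, minute, second
        LocalTime t2 = LocalTime.of(7, 15, 30, 500_000_000);  // plus nanoseconds (half a second)
        LocalTime t3 = LocalTime.parse("07:15:30.5");         // same instant of the day as t2
        LocalDateTime dt = LocalDateTime.parse("2019-05-04T07:15:30");

        System.out.println(t1);             // 07:15:30
        System.out.println(t2);             // 07:15:30.500
        System.out.println(t2.equals(t3));  // true
        System.out.println(dt);             // 2019-05-04T07:15:30
    }
}
```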

Comparing and Manipulating Dates and Times

One of the big advantages of using java.time classes is the consistent set of methods you have available for comparing and changing dates and times. For example, many chat applications will show you “how long ago” a message was sent. The java.time.temporal subpackage has just what we need: the ChronoUnit enum. It contains several date and time units such as MONTHS, DAYS, HOURS, MINUTES, etc. These units can be used to calculate differences. For example, we could calculate how long it takes us to create two example date-times in jshell using the between() method:

jshell> LocalDateTime first = LocalDateTime.now()
first ==> 2019-12-12T16:03:21.875196

jshell> LocalDateTime second = LocalDateTime.now()
second ==> 2019-12-12T16:03:33.175675

jshell> import java.time.temporal.*

jshell> ChronoUnit.SECONDS.between(first, second)
$12 ==> 11

A visual spot check shows that it did indeed take about 11 seconds to type in the line that created our second variable. You should check out the docs for ChronoUnit for a complete list of units available, but you get the full range from nanoseconds up to millennia.
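Returning to the chat example, a “how long ago” label could be sketched by trying units from largest to smallest. The class name, the ago() method, and the wording of the labels are assumptions made for this illustration. Keep in mind that between() counts only complete units, so 2 days and 22 hours is reported as 2 days.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

// Illustrative only: HowLongAgo and ago() are hypothetical names.
class HowLongAgo {
    // Picks the largest unit that yields a nonzero count of complete units.
    static String ago(LocalDateTime sent, LocalDateTime now) {
        long days = ChronoUnit.DAYS.between(sent, now);
        if (days > 0) return days + " day(s) ago";
        long hours = ChronoUnit.HOURS.between(sent, now);
        if (hours > 0) return hours + " hour(s) ago";
        long minutes = ChronoUnit.MINUTES.between(sent, now);
        if (minutes > 0) return minutes + " minute(s) ago";
        return "just now";
    }

    public static void main(String[] args) {
        // A fixed "now" keeps the output repeatable.
        LocalDateTime now = LocalDateTime.parse("2019-12-12T16:03:33");
        System.out.println(ago(LocalDateTime.parse("2019-12-12T16:03:21"), now));  // just now
        System.out.println(ago(LocalDateTime.parse("2019-12-12T15:20:00"), now));  // 43 minute(s) ago
        System.out.println(ago(LocalDateTime.parse("2019-12-12T09:00:00"), now));  // 7 hour(s) ago
        System.out.println(ago(LocalDateTime.parse("2019-12-09T18:00:00"), now));  // 2 day(s) ago
    }
}
```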

Those units can also help you manipulate dates and times with the plus() and minus() methods. To set a reminder for one week from today, for example:

jshell> LocalDate today = LocalDate.now()
today ==> 2019-12-12

jshell> LocalDate reminder = today.plus(1, ChronoUnit.WEEKS)
reminder ==> 2019-12-19

Neat! But this reminder example brings up another bit of manipulation you may need to perform from time to time. You might want the reminder at a particular time on the 19th. You can convert between dates or times and date-times easily enough with the atDate() or atTime() methods:

jshell> LocalDateTime betterReminder = reminder.atTime(LocalTime.of(9,0))
betterReminder ==> 2019-12-19T09:00

Now we’ll get that reminder at 9 o’clock AM. Except, what if we set that reminder in Atlanta and then flew to San Francisco? When would the alarm go off? LocalDateTime is, well, local! So the T09:00 portion is still 9 o’clock AM wherever we are when we run the program. But if we are handling something like a shared calendar and scheduling a meeting, we cannot ignore the different time zones involved. Fortunately the java.time package has thought of that, too.

Time Zones

The authors of the new java.time package certainly encourage you to use the local variations of the time and date classes where possible. Adding support for time zones means adding complexity to your app—they want you to avoid that complexity if possible. But there are many scenarios where support for time zones is unavoidable. You can work with “zoned” dates and times using the ZonedDateTime and OffsetDateTime classes. The zoned variant understands named time zones and things like daylight saving adjustments. The offset variant is a constant, simple numeric offset from UTC/Greenwich.

Most user-facing uses of dates and times will use the named zone approach, so let’s look at creating a zoned date-time. To attach a zone, we use the ZoneId class, which has the common of() static method for creating new instances. You can supply a region zone as a String to get your zoned value.

jshell> LocalDateTime piLocal = LocalDateTime.parse("2019-03-14T01:59")
piLocal ==> 2019-03-14T01:59

jshell> ZonedDateTime piCentral = piLocal.atZone(ZoneId.of("America/Chicago"))
piCentral ==> 2019-03-14T01:59-05:00[America/Chicago]

And now you can do things like make sure your friends in Paris are able to join you at the correct moment using the verbose but aptly named withZoneSameInstant() method.

jshell> ZonedDateTime piAlaMode = piCentral.withZoneSameInstant(ZoneId.of("Europe/Paris"))
piAlaMode ==> 2019-03-14T07:59+01:00[Europe/Paris]

If you have other friends who aren’t conveniently located in a major metropolitan region but you want them to join as well, you can use the systemDefault() method of ZoneId to pick up their time zone programmatically.

jshell> ZonedDateTime piOther = piCentral.withZoneSameInstant(ZoneId.systemDefault())
piOther ==> 2019-03-14T02:59-04:00[America/New_York]

In our case, jshell was running on a laptop set to the Eastern time zone of the United States, and piOther comes out exactly as hoped. (The -04:00 offset reflects daylight saving time, which was in effect in New York on March 14, 2019, regardless of when we ran the code.) The systemDefault() zone ID is a very handy way to quickly tailor date-times from some other zone to match what your user’s clock and calendar are most likely to say. In commercial applications you may want to let the user tell you their preferred zone, but systemDefault() is usually a good guess.
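The difference between the zoned and offset variants shows up when daylight saving time comes into play. The following sketch, with a class name and sample dates of our own choosing, attaches the same named zone and the same fixed offset to a winter and a summer date-time.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Illustrative only: OffsetVsZone and offsetIn() are hypothetical names.
class OffsetVsZone {
    // Reports the UTC offset a named zone uses at a given local date-time.
    static ZoneOffset offsetIn(ZoneId zone, LocalDateTime when) {
        return when.atZone(zone).getOffset();
    }

    public static void main(String[] args) {
        LocalDateTime winter = LocalDateTime.parse("2019-01-14T09:00");
        LocalDateTime summer = LocalDateTime.parse("2019-07-14T09:00");
        ZoneId chicago = ZoneId.of("America/Chicago");
        ZoneOffset minusSix = ZoneOffset.ofHours(-6);

        // A named zone changes its offset with daylight saving time...
        System.out.println(winter.atZone(chicago));  // 2019-01-14T09:00-06:00[America/Chicago]
        System.out.println(summer.atZone(chicago));  // 2019-07-14T09:00-05:00[America/Chicago]

        // ...while a fixed offset never moves.
        System.out.println(winter.atOffset(minusSix));  // 2019-01-14T09:00-06:00
        System.out.println(summer.atOffset(minusSix));  // 2019-07-14T09:00-06:00
    }
}
```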

Parsing and Formatting Dates and Times

For creating and showing our local and zoned date-times using strings, we’ve been relying on the default formats, which follow ISO values and generally work wherever we need to accept or display dates and times. But as every programmer knows, “generally” is not “always”. Fortunately, the utility class java.time.format.DateTimeFormatter can help with both parsing input and formatting output.

The core of DateTimeFormatter centers on building a format string that governs both parsing and formatting. You build up your format with the pieces in Table 8-4. We are only listing a portion of the options available here, but these should get you through the bulk of the dates and times you will encounter. Note that case matters when using the characters mentioned!

Table 8-4. Popular DateTimeFormatter Elements
Character   Description                  Example
y           year-of-era                  2004; 04
M           month-of-year                7; 07
L           month-of-year                Jul; July; J
d           day-of-month                 10
E           day-of-week                  Tue; Tuesday; T
a           am-pm-of-day                 PM
h           clock-hour-of-am-pm (1-12)   12
K           hour-of-am-pm (0-11)         0
k           clock-hour-of-day (1-24)     24
H           hour-of-day (0-23)           0
m           minute-of-hour               30
s           second-of-minute             55
S           fraction-of-second           033954
z           time-zone name               Pacific Standard Time; PST
Z           zone-offset                  +0000; -0800; -08:00

To put together a common US short format, for example, you could use the ‘M’, ‘d’, and ‘y’ characters. You build the formatter using the static ofPattern() method. Now the formatter can be used (and reused) with the parse() method of any of the date or time classes.

jshell> import java.time.format.DateTimeFormatter

jshell> DateTimeFormatter shortUS = DateTimeFormatter.ofPattern("MM/dd/yy")
shortUS ==> Value(MonthOfYe ... (YearOfEra,2,2,2000-01-01)

jshell> LocalDate valentines = LocalDate.parse("02/14/19", shortUS)
valentines ==> 2019-02-14

jshell> LocalDate piDay = LocalDate.parse("03/14/19", shortUS)
piDay ==> 2019-03-14

And as we mentioned earlier, the formatter works in both directions. Just use the format() method of your formatter to produce a string representation of your date or time.

jshell> LocalDate today = LocalDate.now()
today ==> 2019-12-14

jshell> shortUS.format(today)
$30 ==> "12/14/19"

jshell> shortUS.format(piDay)
$31 ==> "03/14/19"

Of course, formatters work for times and date-times as well!

jshell> DateTimeFormatter military = DateTimeFormatter.ofPattern("HHmm")
military ==> Value(HourOfDay,2)Value(MinuteOfHour,2)

jshell> LocalTime sunset = LocalTime.parse("2020", military)
sunset ==> 20:20

jshell> DateTimeFormatter basic = DateTimeFormatter.ofPattern("h:mm a")
basic ==> Value(ClockHourOfAmPm)':'Value(MinuteOfHour,2)' 'Text(AmPmOfDay,SHORT)

jshell> basic.format(sunset)
$42 ==> "8:20 PM"

jshell> DateTimeFormatter appointment = DateTimeFormatter.ofPattern("h:mm a MM/dd/yy z")
appointment ==> Value(ClockHourOfAmPm)':'Value(MinuteOfHour,2)' ' ... 0-01-01)' 'ZoneText(SHORT)

jshell> ZonedDateTime dentist = ZonedDateTime.parse("10:30 AM 11/01/19 EST", appointment)
dentist ==> 2019-11-01T10:30-04:00[America/New_York]

jshell> ZonedDateTime nowEST = ZonedDateTime.now()
nowEST ==> 2019-12-14T09:55:58.493006-05:00[America/New_York]

jshell> appointment.format(nowEST)
$47 ==> "9:55 AM 12/14/19 EST"

Notice in the ZonedDateTime portion above that we put the time zone identifier (the z character) at the end—probably not where you were expecting it! We wanted to illustrate the power of these formats. You can design a format to accommodate a very wide range of input or output styles. Legacy data and poorly designed web forms come to mind as direct examples of where DateTimeFormatter can help you retain your sanity.
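As a rough sketch of the legacy-data case, suppose an old system wrote timestamps such as "01-Nov-2019 at 10.30". That layout is invented for this example, as are the class name LegacyDates and its methods. Literal text in a pattern goes inside single quotes, and one formatter can read the odd layout while another writes a friendlier one.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class LegacyDates {
    // Assumed legacy layout (made up for illustration): 01-Nov-2019 at 10.30
    // The quoted 'at' is literal text; Locale.US fixes the month abbreviations.
    static final DateTimeFormatter LEGACY =
        DateTimeFormatter.ofPattern("dd-MMM-yyyy 'at' HH.mm", Locale.US);

    // A short US-style output format built from the same pattern letters
    static final DateTimeFormatter SHORT_US =
        DateTimeFormatter.ofPattern("MM/dd/yy h:mm a", Locale.US);

    static LocalDateTime parseLegacy(String text) {
        return LocalDateTime.parse(text, LEGACY);
    }

    static String toShortUS(LocalDateTime value) {
        return SHORT_US.format(value);
    }

    public static void main(String[] args) {
        LocalDateTime dentist = parseLegacy("01-Nov-2019 at 10.30");
        System.out.println(dentist);            // 2019-11-01T10:30
        System.out.println(toShortUS(dentist)); // 11/01/19 10:30 AM
    }
}
```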

Parsing Errors

Even with all this parsing power at your fingertips, things will sometimes go wrong. And regrettably, the exceptions you see are often too vague to be immediately useful. Consider the following attempt to parse a time with hours, minutes, and seconds:

jshell> DateTimeFormatter withSeconds = DateTimeFormatter.ofPattern("hh:mm:ss")
withSeconds ==> Value(ClockHourOfAmPm,2)':'Value(MinuteOfHour,2)':'Value(SecondOfMinute,2)

jshell> LocalTime.parse("03:14:15", withSeconds)
|  Exception java.time.format.DateTimeParseException: Text '03:14:15' could not be parsed: Unable to obtain LocalTime from TemporalAccessor: {MinuteOfHour=14, MilliOfSecond=0, SecondOfMinute=15, NanoOfSecond=0, HourOfAmPm=3, MicroOfSecond=0},ISO of type java.time.format.Parsed
|        at DateTimeFormatter.createError (DateTimeFormatter.java:2020)
|        at DateTimeFormatter.parse (DateTimeFormatter.java:1955)
|        at LocalTime.parse (LocalTime.java:463)
|        at (#33:1)
|  Caused by: java.time.DateTimeException: Unable to obtain LocalTime from TemporalAccessor: {MinuteOfHour=14, MilliOfSecond=0, SecondOfMinute=15, NanoOfSecond=0, HourOfAmPm=3, MicroOfSecond=0},ISO of type java.time.format.Parsed
|        at LocalTime.from (LocalTime.java:431)
|        at Parsed.query (Parsed.java:235)
|        at DateTimeFormatter.parse (DateTimeFormatter.java:1951)
|        ...

Yikes! A DateTimeParseException will be thrown anytime the string input cannot be parsed. It will also be thrown in cases like our example above: the fields were correctly parsed from the string, but they did not supply enough information to create a LocalTime object. It may not be obvious, but our time, “3:14:15”, could be either mid-afternoon or very, very early in the morning. Our choice of the hh pattern for the hours turns out to be the culprit. We can either pick an hour pattern that uses an unambiguous 24-hour scale or add an explicit AM/PM element:

jshell> DateTimeFormatter valid1 = DateTimeFormatter.ofPattern("hh:mm:ss a")
valid1 ==> Value(ClockHourOfAmPm,2)':'Value(MinuteOfHour,2)' ... 2)' 'Text(AmPmOfDay,SHORT)

jshell> DateTimeFormatter valid2 = DateTimeFormatter.ofPattern("HH:mm:ss")
valid2 ==> Value(HourOfDay,2)':'Value(MinuteOfHour,2)':'Value(SecondOfMinute,2)

jshell> LocalTime piDay1 = LocalTime.parse("03:14:15 PM", valid1)
piDay1 ==> 15:14:15

jshell> LocalTime piDay2 = LocalTime.parse("03:14:15", valid2)
piDay2 ==> 03:14:15

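So if you ever get a DateTimeParseException but your input looks like a correct match for the format, double-check that your format itself includes everything necessary to create your date or time. One parting thought on these exceptions: you may need to use the non-mnemonic ‘u’ character (year) rather than ‘y’ (year-of-era) for parsing years. A year-of-era is only complete when an era accompanies it, and a formatter set to strict resolution will refuse to guess one.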
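Outside of jshell you will usually want to catch the exception rather than let it end your program. The following sketch wraps parsing in a hypothetical tryParse() helper (our own name, not part of java.time) and uses it to compare the ‘y’ and ‘u’ year characters. It assumes you have opted into strict resolution with withResolverStyle(); the default resolver is more forgiving about a missing era.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.Optional;

class SafeParsing {
    // Hypothetical helper: an empty Optional instead of an exception on bad input
    static Optional<LocalDate> tryParse(String text, DateTimeFormatter formatter) {
        try {
            return Optional.of(LocalDate.parse(text, formatter));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Strict resolution: 'y' is year-of-era and no era is supplied
        DateTimeFormatter strictY = DateTimeFormatter.ofPattern("yyyy-MM-dd")
            .withResolverStyle(ResolverStyle.STRICT);
        // Strict resolution: 'u' is a plain year, so no era is needed
        DateTimeFormatter strictU = DateTimeFormatter.ofPattern("uuuu-MM-dd")
            .withResolverStyle(ResolverStyle.STRICT);

        System.out.println(tryParse("2019-03-14", strictY).isPresent()); // false
        System.out.println(tryParse("2019-03-14", strictU).isPresent()); // true
        // Strict mode also rejects dates that do not exist
        System.out.println(tryParse("2019-02-30", strictU).isPresent()); // false
    }
}
```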

There are many, many more details on DateTimeFormatter. More than most utility classes, it’s worth a trip to read the docs online.

Timestamps

One other popular date-time concept that java.time understands is the notion of a timestamp. In any situation where you need to track the flow of information, you’ll need a record of exactly when the information is produced or modified. You will still see the java.util.Date class used to store these moments in time, but the java.time.Instant class carries everything you need for a timestamp and comes with all the benefits of the other classes in the java.time package.

jshell> Instant time1 = Instant.now()
time1 ==> 2019-12-14T15:38:29.033954Z

jshell> Instant time2 = Instant.now()
time2 ==> 2019-12-14T15:38:46.095633Z

jshell> time1.isAfter(time2)
$54 ==> false

jshell> time1.plus(3, ChronoUnit.DAYS)
$55 ==> 2019-12-17T15:38:29.033954Z
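Two common chores with timestamps are measuring the gap between them and trading values with older code that still uses java.util.Date. The sketch below reuses the two instants from the session above as fixed values so the results are repeatable. The class name TimestampSketch and its helper methods are our own inventions for illustration.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Date;

class TimestampSketch {
    // Whole seconds elapsed between two timestamps
    static long secondsBetween(Instant start, Instant end) {
        return Duration.between(start, end).getSeconds();
    }

    // Round trip through the legacy Date class, which only keeps milliseconds
    static Instant viaLegacyDate(Instant instant) {
        Date legacy = Date.from(instant);
        return legacy.toInstant();
    }

    public static void main(String[] args) {
        Instant time1 = Instant.parse("2019-12-14T15:38:29.033954Z");
        Instant time2 = Instant.parse("2019-12-14T15:38:46.095633Z");

        System.out.println(secondsBetween(time1, time2)); // 17
        System.out.println(viaLegacyDate(time1));         // 2019-12-14T15:38:29.033Z
        // The microseconds are lost in the round trip
        System.out.println(
            viaLegacyDate(time1).equals(time1.truncatedTo(ChronoUnit.MILLIS))); // true
    }
}
```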

If dates or times appear in your work, the java.time package makes for a welcome addition to Java. You now have a mature, well-designed set of tools for dealing with this data—no third-party libraries needed!

Other Useful Utilities

We’ve looked at some of Java’s building blocks including strings and numbers as well as one of the most popular combinations of those strings and numbers—dates—in the LocalDate and LocalTime classes. Hopefully this range of utilities has given you a sense of how Java works with many simple or common elements you are likely to encounter when solving real world problems. Be sure to read the documentation on the java.util, java.text, and java.time packages for more utilities that may come in handy. For example, you could look into using java.util.Random for generating the random coordinates of the trees we saw in Figure 8-1. It is also important to point out that sometimes “utility” work is actually complex and requires careful attention to detail. You can often search online to find code examples or even complete libraries written by other developers that may speed up your own efforts.
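As a starting point for that Random suggestion, here is a minimal sketch that produces x and y coordinates inside a rectangular field. The class name RandomTrees, the method randomPositions(), and the field dimensions are all assumptions made for this example and are not taken from the code behind the figure. Passing a seed makes the layout repeatable, which is handy while testing.

```java
import java.util.Random;

class RandomTrees {
    // Returns count pairs of {x, y}, with 0 <= x < width and 0 <= y < height.
    // The same seed always yields the same sequence of positions.
    static int[][] randomPositions(int count, int width, int height, long seed) {
        Random random = new Random(seed);
        int[][] positions = new int[count][2];
        for (int i = 0; i < count; i++) {
            positions[i][0] = random.nextInt(width);   // x coordinate
            positions[i][1] = random.nextInt(height);  // y coordinate
        }
        return positions;
    }

    public static void main(String[] args) {
        // Hypothetical 400 x 300 field with five trees
        for (int[] tree : randomPositions(5, 400, 300, 42L)) {
            System.out.println("Tree at (" + tree[0] + ", " + tree[1] + ")");
        }
    }
}
```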

Next up, we want to start building on these more fundamental concepts. Java remains as popular as it is because it includes support for more advanced techniques in addition to the basics. One of those advanced techniques that played an important role in Java’s early success is the “thread” support baked right in. Threads give the programmer better access to modern, powerful systems, keeping your applications performant even while handling many complex tasks. Let’s dig in to see how you can take advantage of this signature support.

1 When in doubt, measure it! If your String-manipulating code is clean and easy to understand, don’t rewrite it until someone proves to you that it is too slow. Chances are that they will be wrong. And don’t be fooled by relative comparisons. A millisecond is 1,000 times slower than a microsecond, but it still may be negligible to your application’s overall performance.

2 Java 13 has a preview of multiline string literals: https://openjdk.java.net/jeps/355

3 On most platforms the default encoding is UTF-8. You can get more details on character sets, default sets and standard sets supported by Java in the official Javadoc for the java.nio.charset.Charset class at docs.oracle.com.

4 Validation of email addresses turns out to be much trickier than we can address here. Regular expressions can cover most valid addresses, but if you are doing validation for a commercial or other professional application, you may want to investigate third-party libraries such as those available from Apache Commons.

5 You are welcome to apply for your own, custom global TLD if you have a few (hundred) thousand dollars lying around.
