CHAPTER 15

Generics and Regular Expressions

Readers who already know Java may very well ask why these two topics are together in the same chapter. The answer is that they both involve pattern matching. A generic specifier (also known as a type parameter) is a pattern that code must match in order to use a particular block of code (which might be an interface, a class, a method, or other things). Regular expressions, on the other hand, use patterns to select substrings within strings. In both cases, the pattern restricts the available selections. Both have additional benefits as well, which we'll get to next.

Generics

Generics offer a way to specify the kind of objects that a class, variable, or method can use. The most common use of generics is to specify what kind of object can go into a collection (such as a list or a tree or a hashmap). Another use is to allow a type that has yet to be specified to be used where the generic is specified. In that sense, the type is generic, which is where the name of this idea comes from. We'll see examples of both kinds of generics as we proceed through this section.

Prior to Java 5, Java had no mechanism for specifying generics. That lack led to a number of problems, including being able to assign unexpected types of objects to a collection (leading to run-time errors), the necessity of casting objects from one type to another, and overly verbose and complex code. Fortunately, we now have a way to avoid those problems.

The syntax of generic specifiers relies on the angle bracket characters (< and >). To create a collection with a generic specifier, add the generic expression at the end of the type specifier for the collection, as shown in Listing 15-1.

Listing 15-1. A Simple Generic

LinkedList<JPanel> panelList = new LinkedList<JPanel>();

That line of code declares a LinkedList that can only contain JPanel objects. Java 7 introduces a nice shorthand that slightly reduces the amount of code. In particular, you can leave out the type declaration when creating an instance of a parameterized object, provided the compiler can infer the type from elsewhere in the line. Thus, the code in Listing 15-1 could be replaced with the code in Listing 15-2.

Listing 15-2. A Simplified Generic

LinkedList<JPanel> panelList = new LinkedList<>();

Notice that the constructor for our LinkedList object indicates that it's a generic but doesn't provide the type of the objects that can go in the list. We can leave that out because the compiler can infer a type of JPanel from the type declaration portion of this variable declaration. By the way, the <> expression is often called "the diamond."

An object can have multiple parameters, provided a matching class exists to define that object. Listing 15-3 shows an example.

Listing 15-3. A Generic with Multiple Parameters

package com.bryantcs.examples;

public class GenericRole<Actor, Role> {

        private Actor actor;
        private Role role;
        
        public GenericRole(Actor p, Role a) {
                actor = p;
                role = a;
        }

        public Actor getActor() {
                return actor;
        }

        public Role getRole() {
                return role;
        }
}

A significant feature of this class is that you do not have to create an Actor class or a Role class. Actor and Role are type parameters: placeholder names for types that are supplied later. Because the class uses generics, the declarations that create instances of GenericRole must specify the actual types that stand in for Actor and Role.

As I mentioned at the beginning of this section, this arrangement is where the word "generic" comes from. These objects can be undeclared at this point, so they are, in a sense, generic.
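To make that idea concrete, here is a minimal sketch that creates two GenericRole objects with different type arguments. It nests a compact copy of the class from Listing 15-3 so that it runs on its own. The GenericRoleSketch class, the describe helper, and the use of an Integer as a "billing order" are assumptions made for illustration, not part of the original example.

```java
class GenericRoleSketch {

    // Compact copy of GenericRole from Listing 15-3, nested to keep the sketch self-contained.
    static class GenericRole<Actor, Role> {
        private final Actor actor;
        private final Role role;

        GenericRole(Actor actor, Role role) {
            this.actor = actor;
            this.role = role;
        }

        Actor getActor() { return actor; }
        Role getRole() { return role; }
    }

    // Hypothetical helper: A and R are placeholders, so any pair of types is accepted.
    static <A, R> String describe(GenericRole<A, R> pairing) {
        return pairing.getActor() + " as " + pairing.getRole();
    }

    public static void main(String[] args) {
        // Here both type parameters stand for String.
        GenericRole<String, String> named = new GenericRole<>("Humphrey Bogart", "Sam Spade");
        // Here Role stands for Integer (a hypothetical billing order).
        GenericRole<String, Integer> billed = new GenericRole<>("Humphrey Bogart", 1);

        String role = named.getRole();   // no cast needed: the compiler knows this is a String
        int order = billed.getRole();    // the compiler knows this is an Integer and unboxes it

        System.out.println(describe(named) + " (role name: " + role + ")");
        System.out.println(describe(billed) + " (billing order: " + order + ")");
    }
}
```

The same class definition serves both declarations; only the types named between the angle brackets change.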

So let's look at a class that does something with our GenericRole class. Consider Listing 15-4.

Listing 15-4. Using a Multiple-Parameter Generic

package com.bryantcs.examples;

import java.util.LinkedList;

public class GenericRoleProgram {
        public static void main(String[] args) {
                LinkedList<GenericRole<String, String>> roleMap =
                        new LinkedList<>();
                
                roleMap.add(new GenericRole<String, String>("Humphrey Bogart",
                                "Sam Spade"));
                System.out.println(roleMap.getFirst().getActor() +
                                " appeared on screen as " + roleMap.getFirst().getRole());
        }
}

Notice that we now have parameters nested within parameters (in LinkedList<GenericRole<String, String>> roleMap). When a parameterized (that is, generic) collection consists of a type of object that itself has parameters, you get nested generic specifiers. It may seem odd, but it's common practice once you start using generics.

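You can also declare a variable (or a method parameter) that accepts collections of several related element types, provided that those types all extend the same class or implement the same interface. To do so, Java includes a wildcard (in the form of the question mark character) that you can use with generic specifiers. For an example, consider Listing 15-5.

Listing 15-5. Using the Generic Wildcard

LinkedList<? extends JPanel> panels = new LinkedList<>();

That declaration says that panels can refer to a list whose element type is JPanel or any class that extends JPanel. Remember all the times I extended JPanel in the chapters about animation, video games, and recursion? A list of any one of those panel types can be assigned to this variable, and every element we read from it is guaranteed to be a JPanel. The trade-off is that the compiler no longer knows the exact element type, so it rejects any attempt to add an element (other than null) through this variable. If I ever need a single list to hold a mix of those different panels, a plain LinkedList<JPanel> does the job nicely, because a list of JPanel objects already accepts objects of any class that extends JPanel.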

The extends keyword works with interfaces, too. So, if I wanted a variable that can refer to a list of any type that implements the MouseListener interface, I could use the declaration shown in Listing 15-6.

Listing 15-6. A Generic for an Interface

LinkedList<? extends MouseListener> mouseListeners = new LinkedList<>();

Listing 15-7. A Generic with the Super Keyword

LinkedList<? super JPanel> panelAncestors = new LinkedList<>();

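The super keyword, shown in Listing 15-7, works in the opposite direction: the variable can refer to a list whose element type is JPanel or any class that JPanel extends, directly or indirectly. For this list, that means javax.swing.JPanel, javax.swing.JComponent, java.awt.Container, java.awt.Component, or Object. Because every one of those lists can safely hold a JPanel, we can add JPanel objects through this variable, but anything we read back is known only to be an Object. The super keyword is probably most useful when you want to ensure that the objects that satisfy a parameter are comparable (so that they can be sorted), as in <T extends Comparable<? super T>>. In my experience, it's not often used, so I won't dive into it any further.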
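The difference between the two wildcards is easiest to see in method parameters. The following is a minimal sketch, not a definitive recipe. It uses Number and Integer in place of the Swing classes from Listings 15-5 through 15-7 so that it runs without a user interface, and the class and method names (WildcardSketch, sum, addOneTwoThree) are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class WildcardSketch {

    // "? extends Number" accepts a List<Integer>, a List<Double>, a List<Number>, and so on.
    // Elements can be read as Number, but nothing (except null) can be added.
    static double sum(List<? extends Number> numbers) {
        double total = 0;
        for (Number n : numbers) {
            total += n.doubleValue();
        }
        return total;
    }

    // "? super Integer" accepts a List<Integer>, a List<Number>, or a List<Object>.
    // Integer values can be added, but elements read back are known only to be Object.
    static void addOneTwoThree(List<? super Integer> target) {
        target.add(1);
        target.add(2);
        target.add(3);
    }

    public static void main(String[] args) {
        List<Integer> integers = new LinkedList<>();
        List<Number> numbers = new ArrayList<>();

        addOneTwoThree(integers);   // a List<Integer> satisfies "? super Integer"
        addOneTwoThree(numbers);    // so does a List<Number>
        numbers.add(0.5);           // a List<Number> holds a mix of Number subclasses

        System.out.println(sum(integers));  // prints 6.0
        System.out.println(sum(numbers));   // prints 6.5
    }
}
```

A common way to remember the rule: use extends when the code only reads from the collection, and super when it only writes to it.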

Now that we've seen the syntax for generics, let's talk about why you want to use them. The biggest benefit is earlier error detection. It's a truism in software development (and many other professions) that the earlier you catch an error, the less expensive it is to fix. If we can catch an error at coding time, fixing it is just a matter of re-writing code we're already working on; the cost is trivial. If an error makes it to the testers (assuming we have testers—not all software companies have test teams), it's more expensive. The test team has to find it and tell the developer about it, the developer (who has moved on to some other task) has to re-open and modify that code, and then the test team has to verify that the fixed code works correctly. The worst result is when an error gets all the way to the customer; we get all the added expense of communicating with the test team and the customer, with the added (and usually more important) cost that the customer now thinks less of our software and our company. So let's adopt techniques, including generics, that catch errors early.

Listing 15-8 and Listing 15-9 demonstrate why generics promote early error detection.

Listing 15-8. An Ordinary List

List integerList = new LinkedList();
integerList.add(new Integer(0));
integerList.add("here's a problem"); // perfectly legal and very wrong

In Listing 15-8, integerList can contain any object. I can pass objects of type String into that list. The variable's name is just a name and, while it reveals the intent (which is good practice), it doesn't offer any protection against someone passing things other than objects of type Integer into the list. Consequently, when someone does pass something other than an Integer object, we get a run-time error when we try to get Integer objects out of this list.
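To see that run-time error for yourself, consider the following sketch. It builds the list from Listing 15-8 and then reads the elements back as Integer objects. The class name RawListSketch and the method failsAtRunTime are hypothetical, and Integer.valueOf stands in for new Integer only because newer compilers warn about the constructor.

```java
import java.util.LinkedList;
import java.util.List;

class RawListSketch {

    // Returns true if reading the list back as Integer objects fails at run time.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean failsAtRunTime() {
        List integerList = new LinkedList();       // raw type: no element type is recorded
        integerList.add(Integer.valueOf(0));
        integerList.add("here's a problem");       // compiles without complaint

        try {
            for (Object element : integerList) {
                Integer value = (Integer) element; // the cast that a raw list forces on us
                System.out.println(value);
            }
            return false;
        } catch (ClassCastException e) {
            // The String in the list cannot be cast to Integer.
            System.out.println("Run-time failure: " + e.getClass().getSimpleName());
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsAtRunTime());
    }
}
```

Nothing in this code looks wrong to the compiler; the mistake surfaces only when the program runs and reaches the second element.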

So let's see how generics prevent the testers or, worse, the customer from ever seeing our error.

Listing 15-9. A Generic List

List<Integer> integerList = new LinkedList<Integer>();
integerList.add(new Integer(0));
integerList.add("here's a problem");

The generic expression (or parameter) on the List declaration in the second example specifies that this list can contain only Integer objects. The Integer parameter prevents anyone from passing a String object (or anything but an Integer object) to integerList. When some other programmer tries to add an object that isn't of type Integer to integerList, they get a compiler error. Their code can't be compiled, and there's no chance the customer will ever see an error because some sloppy coder confuses a String with an Integer. Figure 15-1 shows the error that Eclipse produces when I try it.

Figure 15-1. Type match error from trying to misuse a generic list

Notice how it says the proper argument for the add method is an Integer object. The List interface declares the method as add(E e), where E is the list's type parameter. Because we declared integerList as List<Integer>, the compiler substitutes Integer for E and checks every call to add against that type. Consequently, no one can compile code that violates the intention of our generic list. That prevents all the problems that might occur at run-time and prevents our fellow programmers, the testers, and ultimately our customers from thinking we must be idiots.

Personally, I also find this kind of code to be easier to read and to write. Casting always feels like clutter to me. Purist that I am, I also much prefer to have the proper type in the first place and not need to cast.

Finally, ensuring that your collections contain only the types you expect is one aspect of defensive programming (another good practice every programmer should adopt). If you ensure that no other programmer (including yourself at a later date) can pass bad values to your code, you ensure less trouble for your users. It's a thankless task, as no one (except possibly your co-workers) will ever realize you did it, but it's a good idea all the same. If you wish to think of it in more positive terms, think of it as ensuring that the developers who use your code are more likely to write error-free code. One of my co-workers (Matt Hinze, who also writes books about MVC when not coding) calls it “pushing our customers into a pit of success.” However you phrase it, limiting the possibilities for errors to creep into the system is the epitome of good software development practice.

Regular Expressions

If you've ever worked with files from the command line on your computer, you may very well have used something very much like a regular expression without realizing it. For example, I recently wanted a list of all the HTML files in a directory (on a Windows 7 system). In a command window, I typed dir *.htm and got the list I wanted. *.htm is a wildcard pattern (a simpler cousin of a regular expression) that means all the files with an extension of htm. Suppose I had wanted all the HTML files with names that start with “s”. The command would have been dir s*.htm. Regular expressions in Java work on the same principle, except that the syntax is different and you can specify much more complex patterns.
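One caution before going further: the command-line wildcard syntax and the Java regular expression syntax are not interchangeable. The following sketch, with the hypothetical class name WildcardVersusRegex and helper compiles, shows that *.htm is rejected by Pattern.compile and offers regular expressions that express roughly the same intent. The file names are invented for the example.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class WildcardVersusRegex {

    // Returns true if the text compiles as a Java regular expression.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // In a regular expression, * repeats the preceding item; here it has nothing to repeat.
        System.out.println(compiles("*.htm"));                         // prints false

        // Same intent as a regular expression: any characters, a literal dot, then htm.
        Pattern htmFiles = Pattern.compile(".*\\.htm");
        System.out.println(htmFiles.matcher("index.htm").matches());   // prints true
        System.out.println(htmFiles.matcher("notes.txt").matches());   // prints false

        // Names that start with "s", in the spirit of dir s*.htm.
        Pattern sFiles = Pattern.compile("s.*\\.htm");
        System.out.println(sFiles.matcher("summary.htm").matches());   // prints true
    }
}
```

The doubled backslash is needed because the backslash is also an escape character inside a Java string literal.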

The Java regular expression package is java.util.regex. It contains the MatchResult interface, the Matcher class (which implements the MatchResult interface), the Pattern class, and the PatternSyntaxException class. You can't directly instantiate the Matcher and Pattern classes, as they have no public constructors. In other words, new Matcher and new Pattern don't work. Instead, the pattern for using them is to get a Pattern object by calling one of the compile methods within the Pattern class. Then you get a Matcher object by calling the matcher method within the Pattern class. Finally, to find the substrings that match your pattern, you call the find method within the Matcher class. Let's create a class that will let us experiment with the Pattern and Matcher classes. Listing 15-10 shows one possible implementation of such a class.

Listing 15-10. RegexTester Class

package com.bryantcs.examples;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTester {

  public static void main(String[] args) {
    Pattern pattern = Pattern.compile(args[0]);
    Matcher matcher = pattern.matcher(args[1]);
    while(matcher.find()) {
      String groupText = matcher.group();
      int matchBegin = matcher.start();
      int matchEnd  = matcher.end();
      StringBuilder sb = new StringBuilder("Found a match for ");
      sb.append(groupText);
      sb.append(" beginning at ");
      sb.append(matchBegin);
      sb.append(" and ending at ");
      sb.append(matchEnd);
      System.out.println(sb);
    }
  }
}

matcher.group gives us the text that matched (that's essentially the result of applying the pattern we specify in the first argument). matcher.start gives us the starting position of the matched text within the input string (the second argument). matcher.end gives us the position just past the last matched character. I used a StringBuilder object to avoid a really long line, which is awkward to read in a book. I often use StringBuilder objects within my production code, too, for the sake of performance (repeated use of the concatenation operator, especially inside a loop, is an inefficient way to build a String object).
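The relationship between group, start, and end can be checked with a short sketch. It assumes a hypothetical helper named firstMatch in a class named MatchPositions; only the Pattern, Matcher, and String calls come from the standard library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchPositions {

    // Returns "text@start-end" for the first match, or null when nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (!matcher.find()) {
            return null;
        }
        // start() is the index of the first matched character and end() is the index
        // just past the last one, so substring(start, end) equals group().
        String viaSubstring = input.substring(matcher.start(), matcher.end());
        return viaSubstring + "@" + matcher.start() + "-" + matcher.end();
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("Spade", "Sam Spade;Yosemite Sam;"));   // prints Spade@4-9
        System.out.println(firstMatch("Bogart", "Sam Spade;Yosemite Sam;"));  // prints null
    }
}
```

Because the end position is exclusive, subtracting start from end gives the length of the match.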

Before we plunge into the syntax of regular expressions, let's cover how to pass values to the RegexTester program. In doing so, we'll also run it for the first time. To set up arguments for RegexTester, follow these steps.

  1. From the Run menu, choose Run Configurations. The Run Configurations window appears, as shown in Figure 15-2.

    Figure 15-2. Run configurations for RegexTester

  2. In the right pane, click the Arguments tab. The Arguments tab appears, as shown in Figure 15-3.

    Figure 15-3. Empty parameters for RegexTester

  3. To set up our first test data, type the following text (including the quotation marks) into the Program arguments: field:
         "Sam" "Sam Spade;Yosemite Sam;Sam Merlotte;Samwise Gamgee;"

    When you're done typing, the window should look like the window shown in Figure 15-4.

    Figure 15-4. Populated parameters for RegexTester

  4. To run the RegexTester program, click Run. The output of the program appears in the Eclipse console, as shown in Figure 15-5.

    Figure 15-5. Output of RegexTester in the Eclipse console

I won't show all those steps for each test, but I thought it might help you to see them for the first test. For the subsequent tests, I'll just show what to type in the Arguments tab and what appears in the console.

Now that we have a testing program and know how to use it, we need to focus on the regular expression syntax that Java supports. Regular expression syntax is almost a language unto itself, so we'll focus on the basics and some of the more commonly used advanced bits. The whole thing is worthy of a book (and such books exist).

Our simple test case uses a string literal. A string literal is just a piece of text. In the example we just ran, "Sam" is a string literal. "Spade" is another string literal. If we replace "Sam" with "Spade" in the program arguments, we get the following output in the console:

Found a match for Spade beginning at 4 and ending at 9

We won't be able to accomplish much with just string literals. We can find all the instances of a particular string, but we can't find anything that matches a pattern. To create a pattern, we have to dive into the key component of regular expressions—metacharacters.

Metacharacters are characters that create patterns. Rather than represent a single literal character, a metacharacter represents a set of characters. Some metacharacters work by themselves, while other metacharacters are meaningless in the absence of other metacharacters. Table 15-1 describes the metacharacters supported by the Java regular expression syntax.

Table 15-1. Metacharacters Supported by the Java Regular Expression Syntax (table image not reproduced)
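As a small illustration, the following sketch tries out a handful of commonly used metacharacters. It is a sampler rather than a reproduction of Table 15-1, and the class name MetacharacterSampler, the helper found, and the test strings are assumptions chosen for the example.

```java
import java.util.regex.Pattern;

class MetacharacterSampler {

    // Returns true if the regular expression matches somewhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // . matches any one character
        System.out.println(found("S.m", "Sam Spade"));               // prints true
        // ^ anchors the match to the start of the input
        System.out.println(found("^Sam", "Yosemite Sam"));           // prints false
        // $ anchors the match to the end of the input
        System.out.println(found("Sam$", "Yosemite Sam"));           // prints true
        // [A-Z] and [a-z] are character classes; + means one or more
        System.out.println(found("[A-Z][a-z]+", "spade"));           // prints false
        // \d matches a digit; {3} means exactly three times
        System.out.println(found("\\d{3}", "Room 101"));             // prints true
        // | separates alternatives
        System.out.println(found("Spade|Gamgee", "Samwise Gamgee")); // prints true
    }
}
```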

From all those examples, I bet you're beginning to get an idea of how powerful regular expressions can be. In truth, though, describing the metacharacters is just scratching the surface of regular expressions. There's lots more to it than what I've shown here. Let's learn a little more by looking at examples.

Returning to our example involving fictional characters named Sam, suppose we want to get the whole name (including the separator, which is a semicolon). We might try something like the following:

(Sam).*;

The output of that is:

Found a match for Sam Spade;Yosemite Sam;Sam Merlotte;Samwise Gamgee; beginning at 0 and ending at 51

That's not going to work. The trouble is that the .* pattern matches everything it can (that's called a greedy match). In this case, it matches the whole line. Fortunately, the Java regular expression syntax includes a way to make a pattern not be greedy (regular expression programmers would say it's reluctant). To make a match be reluctant, we can append the question mark character (?) to the pattern, as follows:

(Sam).*?;

The output of that regular expression is:

Found a match for Sam Spade; beginning at 0 and ending at 10
Found a match for Sam; beginning at 19 and ending at 23
Found a match for Sam Merlotte; beginning at 23 and ending at 36
Found a match for Samwise Gamgee; beginning at 36 and ending at 51
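If you would rather compare the greedy and reluctant patterns side by side in code than through the Arguments tab, a sketch along these lines should work. The class name GreedyVersusReluctant and the helper allMatches are hypothetical; the input string is the one used throughout this section.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVersusReluctant {

    // Collects the text of every match that find() reports, in order.
    static List<String> allMatches(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "Sam Spade;Yosemite Sam;Sam Merlotte;Samwise Gamgee;";

        // Greedy: .* runs to the last semicolon, so the whole line is one match.
        System.out.println(allMatches("(Sam).*;", input).size());    // prints 1

        // Reluctant: .*? stops at the first semicolon it can, giving four matches.
        System.out.println(allMatches("(Sam).*?;", input));
        // prints [Sam Spade;, Sam;, Sam Merlotte;, Samwise Gamgee;]
    }
}
```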

We're getting closer, but what happened to the “Yosemite” in “Yosemite Sam”? Well, the expression starts with (Sam), so it will match only bits that start with “Sam”, which doesn't include “Yosemite Sam”. The solution is to use the .*? pattern at the beginning as well as at the end, as follows:

.*?(Sam).*?;

Notice that the leading pattern must be reluctant, too, or we get the whole line again. Now the output is:

Found a match for Sam Spade; beginning at 0 and ending at 10
Found a match for Yosemite Sam; beginning at 10 and ending at 23
Found a match for Sam Merlotte; beginning at 23 and ending at 36
Found a match for Samwise Gamgee; beginning at 36 and ending at 51

In this fashion, we've parsed a line containing multiple records. We could then add code to write each match to a separate line in a file or otherwise manipulate each of the matching values. This kind of parsing is a common task in software development, and regular expressions offer one good way to do it.

As I have indicated, regular expressions can get a lot more complicated. The following regular expression matches each entry but leaves out the leading “Sam” from each entry that starts with “Sam”:

S(?!am)|(?<!S)a|a(?!m)|(?<!Sa)m|[^Sam](.*?;)

Its output is:

Found a match for  Spade; beginning at 3 and ending at 10
Found a match for Yosemite Sam; beginning at 10 and ending at 23
Found a match for  Merlotte; beginning at 26 and ending at 36
Found a match for wise Gamgee; beginning at 39 and ending at 51

The code to also remove the “Sam” in “Yosemite Sam” would be even more complex. As it happens, negating a group is one thing that regular expressions don't make easy. In those cases, it's often best to mix regular expressions with other String operations and to pass the result of one expression to another regular expression (a process known as chaining). Those techniques let you manage the complexity of your regular expressions and may offer better performance than a single complex regular expression.
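As one possible illustration of mixing approaches, the following sketch removes “Sam” from every entry, including “Yosemite Sam”, by splitting the line first and then applying a plain String operation to each record. The class name ChainedCleanup and the method withoutSam are hypothetical, and this is only one of several ways to divide the work.

```java
import java.util.ArrayList;
import java.util.List;

class ChainedCleanup {

    // Splits the line into records, then removes every "Sam" from each record.
    static List<String> withoutSam(String line) {
        List<String> cleaned = new ArrayList<>();
        for (String record : line.split(";")) {       // step 1: split takes a (tiny) regular expression
            cleaned.add(record.replace("Sam", ""));   // step 2: replace works on literal text
        }
        return cleaned;
    }

    public static void main(String[] args) {
        String input = "Sam Spade;Yosemite Sam;Sam Merlotte;Samwise Gamgee;";
        for (String entry : withoutSam(input)) {
            // Brackets make the leading and trailing spaces visible.
            System.out.println("[" + entry + "]");
        }
    }
}
```

Each step is simple enough to read at a glance, which is the point of breaking up a complicated expression.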

If you want to know more about regular expressions, start with the official Regular Expression Tutorial at http://download.oracle.com/javase/tutorial/essential/regex/index.html

Summary

This chapter covered two features that rely on pattern matching: generics and regular expressions. About generics, we learned that:

  • We can specify the kind of content that goes into a collection.
  • Thanks to an improvement introduced in Java 7, we can use the diamond specifier (<>) to shorten our code a bit, so long as the compiler can determine the type from earlier in the line.
  • Generics can have multiple parameters.
  • We can nest generic parameters to ensure we get the proper kinds of objects at any depth.
  • We can use wildcards within generic parameters, to accommodate similar objects (any object that extends a particular class or implements a particular set of interfaces or both).
  • Generics let us catch problems at coding time rather than at run time, saving time and embarrassment.

About regular expressions, we learned:

  • How to instantiate the member classes (Matcher and Pattern) of the java.util.regex package.
  • What each of the metacharacters does.
  • How to combine the metacharacters in a number of useful ways.
  • How to make a pattern be reluctant (match the fewest possible characters) rather than greedy (match the most possible characters).
  • That regular expressions can become very complex and a bit about how to manage that complexity.

This chapter covered two language features that I hope you will find useful as you develop your own programs. I especially hope that you'll use generics any time you use a collection, as you should embrace best practices whenever you can. As for regular expressions, remember that they are supposed to make things simpler. If you find that a regular expression is too hard to figure out, break it up with other String operations and use multiple regular expressions rather than one big one.
