Avoid Compute-Intense Operations During Iteration

    class Inventory {

        private List<Supply> supplies = new ArrayList<>();

        List<Supply> find(String regex) {
            List<Supply> result = new LinkedList<>();
            for (Supply supply : supplies) {
                if (Pattern.matches(regex, supply.toString())) {
                    result.add(supply);
                }
            }
            return result;
        }
    }

When you iterate over a data structure, you need to be careful about the kind of operations you perform. If you do something that is compute-intense, it can easily turn into a performance pitfall. The code above shows a typical example of this: the method find() locates Supply objects with a regular expression.

In Java, as in most other programming languages, you can build query strings using regular expressions, or regexes for short. A regex enables efficient queries on large sets of textual data.

To get familiar with regexes, we recommend you look at java.util.regex.Pattern[21] in the Java API. This class is Java’s representation of regular expressions, and it offers a variety of methods for building and executing them. The most straightforward way is probably the one you see in the code snippet above: you just call the static method matches() and provide it with a regex String and a String to match against it. This is handy, but it’s a performance pitfall. During execution, Java takes the expression String, regex, and constructs a special-purpose automaton from it. This automaton accepts Strings that follow the pattern and rejects all others.

In Pattern.matches(regex, supply.toString()) we both compile such an automaton and try to match supply.toString() to it. Compiling a regex automaton consumes time and processing power, just as the compilation of a class takes time. Usually, it’s a one-time effort, but here the regex is compiled on every iteration.

Be aware that some other very popular methods in the Java API, such as String.replaceAll(), behave the same way!

So how can you prevent the compilation of a single regular expression over and over?

    class Inventory {

        private List<Supply> supplies = new ArrayList<>();

        List<Supply> find(String regex) {
            List<Supply> result = new LinkedList<>();
            Pattern pattern = Pattern.compile(regex);
            for (Supply supply : supplies) {
                if (pattern.matcher(supply.toString()).matches()) {
                    result.add(supply);
                }
            }
            return result;
        }
    }

The solution to the potential performance pitfall is very simple: make sure that the computation-intense operation takes place as rarely as possible.

Here, you should compile the regex only once per method call. After all, the expression string doesn’t change between iterations of the loop.

Luckily, we can get a single compilation easily with the Pattern API. To do so, we need to separate the two operations that are bundled in the call of Pattern.matches(). The first is the compilation of the expression; the second is its execution on the search string.
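The two bundled operations can be made visible with a minimal sketch. The class MatchSteps and its method names are hypothetical and exist only for illustration; the equivalence of the two forms is what the Javadoc of Pattern.matches() describes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class MatchSteps {

    // Both operations bundled in one call: the regex is compiled
    // and then matched against the input.
    static boolean bundled(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // The same result with the two operations written out separately.
    static boolean separated(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);  // operation 1: compilation
        Matcher matcher = pattern.matcher(input);  // bind the compiled regex to the input
        return matcher.matches();                  // operation 2: execution
    }
}
```

Both methods return the same answer for any regex and input. Only the separated form lets you keep the Pattern around and reuse it.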

We can extract the first step through a call to Pattern.compile(), which creates a compiled regular expression, an instance of Pattern. This is the computation-intense step, and we can store its result in a local variable.

The second step, the execution of the compiled expression, is the easy and quick one. It’s also the one that we need to execute for every instance of Supply, so it needs to go into the body of the loop.

Here, we first create a Matcher for the String we want to search. This is a handle for searching in different fashions, even iteratively. In this case, we need to check only if a supply matches the regular expression at all, which we can do by calling matches().
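To illustrate the difference between matching a whole String and searching iteratively, here is a small sketch. The class MatcherModes and its methods are hypothetical names chosen for this example; matches(), find(), and group() are the standard Matcher methods.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class MatcherModes {

    // True only if the entire text matches the compiled pattern.
    static boolean matchesWhole(Pattern pattern, String text) {
        return pattern.matcher(text).matches();
    }

    // Collects every non-overlapping occurrence of the pattern in the text
    // by calling find() repeatedly on the same Matcher.
    static List<String> findAll(Pattern pattern, String text) {
        List<String> hits = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            hits.add(matcher.group());
        }
        return hits;
    }
}
```

With a pattern compiled from "\\d+", matchesWhole() returns true for "42" but false for "42 bolts", while findAll() on "42 bolts, 7 nuts" yields the two hits 42 and 7.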

To sum up, with regexes, a little modification can mean a big difference in performance!
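The same separation applies to String.replaceAll(), mentioned earlier. The following is a minimal sketch, assuming a hypothetical class SupplyNames that collapses runs of whitespace in a list of names; it is not part of the Inventory example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class SupplyNames {

    // String.replaceAll() compiles its regex argument on every call,
    // so "\\s+" is compiled once per loop iteration here.
    static List<String> normalizeSlow(List<String> names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            result.add(name.replaceAll("\\s+", " "));
        }
        return result;
    }

    // The regex is compiled once per method call and reused in the loop.
    static List<String> normalizeFast(List<String> names) {
        List<String> result = new ArrayList<>();
        Pattern whitespace = Pattern.compile("\\s+");
        for (String name : names) {
            result.add(whitespace.matcher(name).replaceAll(" "));
        }
        return result;
    }
}
```

Both methods produce the same list; they differ only in how often the expression is compiled.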
