Searching strings with regular expressions

The regular expression API has been part of the JDK since version 1.4. The Pattern, Matcher, and String classes provide regular expression matching and replacement, but how to use them is not always obvious, especially for complex use cases.

Luckily, Groovy adds syntactic sugar and extra functionality that support regular expressions in a more native fashion, as this recipe demonstrates.

Getting ready

We assume that you are already familiar with regular expressions. This recipe focuses only on the features that Groovy adds to the already rich regular expression infrastructure offered by Java.

How to do it...

To begin with, Groovy offers a simple way to create Pattern objects using the ~/pattern/ notation:

import java.util.regex.Pattern
Pattern pattern = ~/^.*?groovy.*$/

The previous pattern matches any line of text that contains the string groovy (by default, the dot does not match line terminators).

In fact, slashes at the beginning and the end of a pattern represent an alternative way to define strings in Groovy. The following code will have the same effect:

Pattern pattern = ~'^.*?groovy.*$'

The important difference between using slashes and quotes is that, in the former case, you do not have to escape backslashes and some other special characters. For example, "\\s\\d" can just be written as /\s\d/.
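As a quick check of the point above, the following minimal sketch compares the two notations. The variable names and sample strings are illustrative only, and a single-quoted string is used for the quoted form so that no interpolation is involved.

```groovy
import java.util.regex.Pattern

// Quoted form: each backslash has to be escaped
Pattern quoted = ~'\\s\\d'
// Slashy form: backslashes are written as-is
Pattern slashy = ~/\s\d/

// Both notations compile to the same regular expression
assert quoted.pattern() == slashy.pattern()

// The result is a plain java.util.regex.Pattern, usable as in Java
assert slashy.matcher('page 7').find()
assert !slashy.matcher('page7').find()
```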

You can use Pattern objects in the same way you would do in Java, but there are other goodies that Groovy offers. There is the ==~ operator that you can use to directly match your input string against a regular expression:

def input = 'Probably the easiest way to get groovy' +
            ' is to try working with collections.'
if (input ==~ /^.*?groovy.*$/) {
  println 'Found groovy'
}

In fact, the ==~ operator is equivalent to calling the matches method of the String class, so the whole input must match the expression.

Another operator that you can use in a similar way is =~ (note the single = sign):

if (input =~ /^.*?groovy.*$/) {
  println 'Found groovy'
}

Actually, the input =~ /^.*?groovy.*$/ expression just creates a Matcher object under the hood. According to Groovy Truth, a Matcher object evaluates to true if it finds at least one match. That's why the body of the if statement is executed when the input string contains the word groovy. That's also why you can assign this expression to a variable:

def matcher = input =~ /^.*?groovy.*$/

You can use all the standard JDK methods of the Matcher class as well as the additional ones added by the Groovy JDK.
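To make the difference between the two operators concrete, here is a small sketch that reuses the input string from above with an unanchored pattern. The variable names are illustrative, and the Matcher methods used (find, start, and group) are the standard JDK ones.

```groovy
def input = 'Probably the easiest way to get groovy' +
            ' is to try working with collections.'

// ==~ succeeds only if the whole input matches the expression
assert !(input ==~ /groovy/)
assert input ==~ /^.*groovy.*$/

// =~ creates a Matcher, which is true as soon as one match is found
assert input =~ /groovy/

// The Matcher can be stored and queried with standard JDK methods
def matcher = input =~ /groovy/
assert matcher.find()
assert matcher.start() == input.indexOf('groovy')
assert matcher.group() == 'groovy'
```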

You can also refer to a matcher's occurrences and their capture groups using array index notation:

def matcher =
  'The Groovy Cook Book contains Groovy recipes' =~ /(.oo.)\s/

println "<${matcher[0][0]}>"
println "<${matcher[0][1]}>"
println "<${matcher[1][0]}>"
println "<${matcher[1][1]}>"

The previous code will print:

<Cook >
<Cook>
<Book >
<Book>

To better explain the previous example, we can rewrite this code using closures and the each method:

matcher.each { match ->
  match.each { group ->
    println "<$group>"
  }
}

This produces the same output as the previous snippet. As you can see, the first dimension of matcher holds the occurrences of the regular expression, and the second dimension is the list of groups within each occurrence. The group at index 0 is the fully matched string, and the subsequent indexes refer to the capture groups.
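The two dimensions described above can also be explored with ordinary collection idioms. The following is a minimal sketch, assuming the same input string and the pattern /(.oo.)\s/; the names full, word, and words are made up for illustration.

```groovy
def matcher =
  'The Groovy Cook Book contains Groovy recipes' =~ /(.oo.)\s/

// Two occurrences are found: "Cook " and "Book "
assert matcher.count == 2

// Each occurrence is a list: the full match followed by its groups
def (full, word) = matcher[0]
assert full == 'Cook '
assert word == 'Cook'

// Collect the first capture group of every occurrence
def words = matcher.collect { it[1] }
assert words == ['Cook', 'Book']
```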

There's more...

A more advanced way to use regular expressions is to use them with replacement patterns. That's where Groovy provides very interesting extensions. For example, you can apply a closure to matching strings:

def input = 'The Groovy Cook Book contains Groovy recipes'
println input.replaceAll(/\w*oo\w*/) { match ->
  match.toUpperCase()
}

The previous code will print:

The GROOVY COOK BOOK contains GROOVY recipes

We just called the toUpperCase method on every string that matched the /\w*oo\w*/ expression in the original input. As you can guess, the regular expression in the snippet matches any word containing a double lowercase "o". The quantifiers are greedy on purpose: with reluctant ones (/\w*?oo\w*?/), each match would stop right after the "oo", and only part of each word would be uppercased.
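When the expression contains capture groups, the closure can declare one parameter for the full match plus one per group. The following is a minimal sketch of that idea; the pattern, the parameter names, and the replacement text are made up for illustration.

```groovy
def input = 'The Groovy Cook Book contains Groovy recipes'

// full is the whole match; before and after are the two capture groups
def result = input.replaceAll(/(\w)oo(\w)/) { full, before, after ->
  "${before}00${after}"
}

assert result == 'The Gr00vy C00k B00k contains Gr00vy recipes'
```

Only the characters covered by the match are replaced, so the rest of each word is left untouched.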
