Chapter 13. Tips and tricks

 

The competent programmer is fully aware of the limited size of his own skull. He therefore approaches his task with full humility, and avoids clever tricks like the plague.

 
 --Edsger Dijkstra

Learning language features and library APIs is one thing; using a language for your everyday programming tasks has its own challenges. As the saying goes, “In theory, practice and theory are one and the same. In practice, they’re not.” This chapter attempts to bridge the gap, giving some insight into what it’s like to use Groovy for real, and (we hope) steering you clear of some of the potholes others (including us) have run into.

Closures are a good example of the gulf between practice and theory. They may appear unfamiliar and difficult in the language description, but they turn out to be simple and straightforward in everyday use. Other concepts may appear simple but have certain consequences that the programmer needs to be aware of to avoid typical pitfalls. This is covered in section 13.1.

Furthermore, the features of Groovy often suggest a certain way of approaching a task that is different from Java or other languages. In such cases, although there is a certain comfort level in staying with what you know, you will generally become more productive if you follow the Groovy idioms. This isn’t because “Groovy knows best” (although of course we believe that the Groovy way is usually the best way), but because it’s generally easier to go with the flow of a language than to fight against it. We show a few pieces of Groovy idiom in sections 13.2, 13.3, and 13.4.

Software development consists of more than just the programming; it also includes debugging, profiling, and setting up the working environment to make programming easier. Section 13.5 gives hints for organizing your work.

Things to remember

The following sections should remind you about some Groovy idiosyncrasies that result from its language design and dynamic nature. Take this as a checklist of topics that have been presented earlier in this book and that you should not forget. It’s also handy as a list of potential “gotchas” to run down if your code isn’t behaving as you expect it to.

Equality versus identity

The distinction between equality and identity is one of the first things you learn about Groovy, and it has some consequences you should be aware of. Table 13.1 compares the Groovy and Java idioms for equality and identity.

Table 13.1. Equality and identity in Groovy compared to Java

 

            Groovy      Java
Equality    a == b      a.equals(b)
Identity    a.is(b)     a == b

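Groovy equality isn’t necessarily symmetric (commutative): it isn’t guaranteed that a == b gives the same result as b == a. Because == delegates to the left operand’s equals method (or compareTo, for Comparable objects), a programmer can override equals in a way that breaks symmetry, even though the equals contract says you shouldn’t.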
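The following sketch shows how an overridden equals can make == asymmetric. The classes Strict and Loose are hypothetical, invented for this illustration; Loose deliberately violates the equals contract.

```groovy
// Hypothetical classes: Strict only equals other Strict instances,
// while Loose (badly) equals any object with a matching 'id' property.
class Strict {
    int id
    boolean equals(Object other) { other instanceof Strict && other.id == id }
    int hashCode() { id }
}

class Loose {
    int id
    boolean equals(Object other) { other != null && other.hasProperty('id') && other.id == id }
    int hashCode() { id }
}

def strict = new Strict(id: 1)
def loose = new Loose(id: 1)
assert loose == strict       // Groovy calls loose.equals(strict): true
assert !(strict == loose)    // Groovy calls strict.equals(loose): false
```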

Furthermore, Groovy allows null in equality checks:

null == null // is true
null == 1    // is false

You cannot do that in Java, because null.equals(b) would throw a NullPointerException.
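A few assertions that contrast equality with identity and show that == tolerates null on either side:

```groovy
def a = [1, 2, 3]
def b = [1, 2, 3]

assert a == b          // equality: equal content
assert !a.is(b)        // identity: two distinct list objects
assert a.is(a)

assert null == null    // no NullPointerException in Groovy
assert a != null       // null on the right is fine
assert null != a       // and so is null on the left
```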

Using parentheses wisely

When in doubt, you can use the Java style of using parentheses: always putting parentheses around method arguments. On the other hand, leaving out parentheses can often enhance readability by focusing the eye of the reader on the guts of the code. You have the choice between the following:

println 'hi'
println('hi')

However, if no arguments are given, the parentheses are mandatory to distinguish method calls from property access:

println()  // ok
println    // <- fails with MissingPropertyException

The MissingPropertyException is thrown because with no arguments and no parentheses given, Groovy assumes you are looking for the println property and would call getPrintln if there were such a method.
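To see this failure without stopping a script, you can catch the exception. This is only a small sketch; the variable name caught is arbitrary.

```groovy
println()                      // with parentheses: a method call that prints an empty line

def caught = null
try {
    println                    // no parentheses, no arguments: looked up as a property
} catch (MissingPropertyException e) {
    caught = e
}
assert caught != null
assert caught.message.contains('println')
```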

Note that this is different from other languages with optional elements of syntax, such as Ruby. Another difference is that parentheses can be omitted only for method calls that are top-level statements. In other words, parentheses are mandatory for method calls that are used in expressions:

'abc'.substring 1,3         // ok
x = 'abc'.substring 1,3     // assignment expression -> parser error
println 'abc'.substring 1,3 // argument expression -> parser error

Finally, putting symbols in parentheses forces the Groovy parser to resolve the symbol as an expression. This can be helpful when specifying keys in maps. Consider a map like

map = [x:1]

which is equivalent to

map = ['x':1]

Now, what if you have a variable x in scope, and you would like to use its content as a key in the map? You can enforce that with parentheses around x:

def x = 'a'
assert ['a':1] == [(x):1]
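A few more assertions show how the key of a map literal is determined, including a computed key:

```groovy
def x = 'a'
assert [x: 1].keySet() == (['x'] as Set)     // bare name: the string 'x'
assert [(x): 1].keySet() == (['a'] as Set)   // parenthesized: the value of x
assert [(x + 'b'): 1] == [ab: 1]            // any expression can compute a key
```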

This trick is also described in section 14.3.

Returning from methods and closures

Remember that inside a closure, return returns from the closure, not from the method the closure was passed to as an argument, nor from any method surrounding the closure definition. Suppose you run a line like

[1,2,3].each { print it; return }

This prints 123, not 1 as some might expect, because return returns from the closure, not from the each method. The closure is called three times, and each time it is left via return. Compare this to

for (it in [1,2,3]) { print it; return }

which prints 1, because return now leaves the enclosing method (here, the script itself), not just the loop body. With this difference in mind, you can guess what this snippet does:

def myMethod() {
    [1,2,3].each { print it; return }
}
myMethod()

Right, it prints 123. Again, the return keyword only leaves the closure, not the surrounding myMethod.
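The same difference can be checked with assertions instead of printed output. The two method names below are made up for this sketch; each collects the elements it visits.

```groovy
def collectWithLoop() {
    def seen = []
    for (item in [1, 2, 3]) {
        seen << item
        return seen          // leaves the whole method after the first item
    }
    seen
}

def collectWithEach() {
    def seen = []
    def numbers = [1, 2, 3]
    numbers.each {
        seen << it
        return               // leaves only the closure; each carries on
    }
    seen
}

assert collectWithLoop() == [1]
assert collectWithEach() == [1, 2, 3]
```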

So, how can you write closure code that leaves a method prematurely? There currently is only one way—by throwing an exception:

def myMethod() {
    [1,2,3].each { print it; throw new RuntimeException() }
}
try {myMethod()} catch (Exception e){}

This prints 1. However, this code is really ugly. Alternatives are in the works but not yet available at the time of writing. See also section 5.6.

The groovier way to leave an iteration prematurely is different. If possible, you should attempt to iterate over the right set of elements to start with, rather than aborting the iteration early. The methods find, findAll, and grep and the subscript operator with indexes or ranges are your friends here. The following lines show some alternatives:

list[0..1]                .each { processing(it) }
list.find{ it == 2 }      .each { processing(it) }
list.findAll{ it % 2 == 0}.each { processing(it) }
list.grep(~/\d/)          .each { processing(it) }

In essence, you’re using a GPath to restrict the work items declaratively rather than using control structures in a procedural way. This course of action isn’t always available, but it should be used where it is both possible and elegant. When you follow this style, you have the additional benefit of separating the concerns of selecting items and processing them.
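Here is a runnable version of those alternatives. The list content and the processing closure are assumptions made for the sketch; processing just records what it receives.

```groovy
def list = [1, 2, 3, 4, 12]
def handled = []
def processing = { item -> handled << item }   // hypothetical stand-in for real work

list[0..1].each { processing(it) }             // select by index range
assert handled == [1, 2]

handled.clear()
list.find { it == 2 }.each { processing(it) }  // find returns one element; each still works
assert handled == [2]

handled.clear()
list.findAll { it % 2 == 0 }.each { processing(it) }
assert handled == [2, 4, 12]

handled.clear()
list.grep(~/\d/).each { processing(it) }       // the pattern must match the whole value
assert handled == [1, 2, 3, 4]
```

Note that grep with a pattern matches against each element’s string form as a whole, which is why 12 is skipped.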

Calling methods in builder code

Suppose you are going to build nodes with NodeBuilder such that you get an outer node containing a nested middle node with an inner node like this:

outer() {
  middle() {
    inner()
  }
}

The usual code for producing this structure with NodeBuilder is straightforward:

new NodeBuilder().outer {
    middle {
        inner()
    }
}

Now, suppose you would like to extract the production of the middle and inner nodes to a method. You might want to do this because the production is complicated or you use that production logic in multiple places. Let’s call the new method produce.

You cannot implement it as

def produce(){
    middle {    // fails - no such method!
        inner()
    }
}

and call it like this:

new NodeBuilder().outer {
    produce()
}

Groovy will complain because it can’t find the middle method. Within the scope of the produce method, you have to make the builder known to the first method call on the builder:

builder = new NodeBuilder()   // no 'def': a binding variable, so produce() can see it

builder.outer {
    produce()
}
def produce(){
    builder.middle(){ // needs the builder reference
        inner()       // now it's known
    }
}

Alternatively, you can pass the builder to the method, which avoids a shared variable. This is useful if, for example, your production code lives in a different class from the one that declares the builder:

def builder = new NodeBuilder()

builder.outer {
    produce(builder)
}
def produce(builderContext){
    builderContext.middle(){
        inner()
    }
}

In both cases, you make the reference to the NodeBuilder available in the produce method in order to get back into the context of the builder. Once the first method call has been made, the builder will set the delegate of the closure to the builder, which is why the call to inner doesn’t need to be made explicitly on the builder.
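The following self-contained sketch runs the parameter-passing variant and checks the resulting structure through groovy.util.Node’s name() and children() methods and its property-style lookup of child nodes:

```groovy
def produce(builderContext) {
    builderContext.middle {   // the first call goes explicitly to the builder...
        inner()               // ...inside, the builder is the closure's delegate
    }
}

def builder = new NodeBuilder()
def root = builder.outer {
    produce(builder)
}

assert root.name() == 'outer'
assert root.children()*.name() == ['middle']
assert root.middle[0].children()*.name() == ['inner']
```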

Apart from the builder reference, there is another issue to keep in mind when using methods from within builder code: how the produce method is looked up. Why does the preceding code call the produce method we’ve defined rather than creating a new node called produce?

Before doing any builder-specific handling of a method call, the builder first tries calling the method on the owner. The builder handles the method call (by building nodes, for example) only if the owner doesn’t handle the method.

Consequently, in the preceding code, there would be a conflict if there were another method called inner within the script.
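A small sketch of such a conflict follows. The script method inner is hypothetical and exists only to clash with the node name; because the owner is asked first, no inner node gets built.

```groovy
def inner() { 'handled by the script method' }   // hypothetical clashing method

def root = new NodeBuilder().outer {
    middle {
        inner()          // resolved on the owner (the script) before the builder sees it
    }
}

def middleNode = root.middle[0]
assert middleNode.children().isEmpty()   // no 'inner' node was created
```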

All builders that come with the Groovy distribution obey this rule. All builders that subclass BuilderSupport also have this behavior by default. However, the priority of local method lookup in builders cannot be guaranteed for all possible builders, because a pathological builder implementation may choose to override it, even though this is not advised.

Qualifying access to “this”

When referring to fields or methods from within the same class, most of the time it’s optional to prefix the name of the field or method with the this. qualifier. This behavior is equivalent to that of Java.

Disambiguation in Java

Even in Java, this prefix is sometimes used for disambiguation. The typical use is to distinguish between a local variable and an instance variable, either in a constructor or in a property setter, for example:

MyClass (Object myField) {         // Java constructor example
    this.myField = myField;
}

void setMyField (Object myField) { // Java property setter example
    this.myField = myField;
}

Disambiguation in Groovy

In Groovy, the need for distinction goes beyond that. Listing 13.1 combines examples for using the this prefix to differentiate between local variables, fields, and properties.

Example 13.1. Using this to distinguish between property and field access


It goes without saying that it is always good practice to avoid such name clashes. But they sometimes occur accidentally, such as when performing a renaming refactoring.

If you ever find yourself unsure about what’s going on but do want to make a lookup against this, it’s worth qualifying it, even if you decide that would be the default behavior anyway. There’s no need to make a maintenance engineer jump through the same hoops as you to work out the behavior.

A reference prefix like this is always needed when denoting method closures. A reference like &myMethod will never work; only using a reference like this.&myMethod works. The same is true for field access with the @ sign, as in this.@zero, which cannot be used without a preceding reference.
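A short sketch of both forms; the class Greeter is hypothetical:

```groovy
class Greeter {
    private String prefix = 'Hello'

    String greet(String name) { "$prefix, $name" }

    Closure greeting() { this.&greet }     // method closure: needs the 'this.' reference
    String rawPrefix() { this.@prefix }    // direct field access: needs it, too
}

def g = new Greeter()
def greet = g.greeting()
assert greet('Groovy') == 'Hello, Groovy'
assert g.rawPrefix() == 'Hello'
```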

Considering number types

Groovy shines at capturing business logic in a declarative style. In the financial business and in scientific research, this often means lots of formulas and calculations.

In this scenario, you need to remember that the division operator returns a BigDecimal whenever both operands are integral or BigDecimal values (only float or double operands produce a Double), and that BigDecimal math is slow compared to other number types.

When calculations are used extensively, it is profitable to avoid full (non-integer) division operations where possible. For example, you may want to calculate monetary values in cents instead of dollars and use intdiv for division. This proved to be useful in the first big commercial project that was fully implemented in Groovy.
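The following assertions illustrate the division rule and the cents approach. The amounts are made-up values for illustration.

```groovy
assert (7 / 2) instanceof BigDecimal      // integral operands: BigDecimal result
assert 7 / 2 == 3.5
assert (7.0d / 2) instanceof Double       // a double operand gives a Double
assert 7.intdiv(2) == 3                   // integer division

// Splitting an amount held in cents, without any fractional arithmetic:
long totalCents = 1000                    // $10.00
int people = 3
long share = totalCents.intdiv(people)    // 333 cents each
long leftover = totalCents % people       // 1 cent left over
assert share * people + leftover == totalCents
```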

Although Groovy relieves the programmer of tinkering with number types in a lot of places, there are remaining areas that need attention. Suppose we need to print the sine values from zero to 2π at every increment of π/2. Expected values would be close to 0, 1, 0, -1. The following solution would be straightforward, but wrong:

0.step(Math.PI*2, Math.PI/2){       // wrong!
  println "$it : ${Math.sin(it)}"
}

When called on an Integer, step works with an integer upper bound, so the preceding code is like using 0.step(6, Math.PI/2). The correct version needs to call the step method on a non-integer, such as the BigDecimal 0.0:

0.0G.step(Math.PI*2, Math.PI/2){ println "$it : ${Math.sin(it)}" }

Note that the G suffix is optional, but helps to make it obvious which dot is part of the number and which dot is involved in the method call. Whenever you encounter unexpected values with your calculations, check the number types being used and the method signatures.
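To check the corrected version without eyeballing printed output, you can collect the sine values and compare them against 0, 1, 0, and -1 with a small tolerance, because π has no exact floating-point representation:

```groovy
def sines = []
0.0G.step(Math.PI * 2, Math.PI / 2) { sines << Math.sin(it) }   // BigDecimal receiver

assert sines.size() == 4               // 0, π/2, π, 3π/2: the upper bound is exclusive
def expected = [0, 1, 0, -1]
expected.eachWithIndex { value, i ->
    assert Math.abs(sines[i] - value) < 1e-9
}
```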

Leveraging Ant

Groovy and Ant make a power duo. From within a Groovy script, all Ant capabilities are easily accessible via AntBuilder. From within an Ant script, all Groovy capabilities are easily accessible via the <groovy> task. You can take the best of both worlds, mixing and matching as needed.

Using Ant from Groovy

In section 8.4, you saw how to use AntBuilder. This is a valuable possibility to keep in mind. There are so many well-engineered Ant tasks that you will often find a good solution there.

But there are more reusable components in the Ant distribution than just the tasks. For example, the Ant fileScanner allows you to get all the File objects of one or multiple filesets, as shown in listing 13.2. The example scans all the listings in the current directory and—in our usual self-checking manner—asserts that the result contains our example script.

Example 13.2. Ant fileScanner example

def files = new AntBuilder().fileScanner {
    fileset(dir: '.') {
        include(name: 'Listing*.groovy')
    }
}
def scriptName = getClass().name + '.groovy'
assert files.collect{ it.name }.contains(scriptName)

The files variable refers to a FileScanner object. Because it has an iterator method, it supports all Groovy object iteration methods such as collect, which we use here.

A special task that is useful for calling a full Ant script file from Groovy is Ant’s ant task. It is straightforward to use. To call the build.xml Ant script, use it like

new AntBuilder().ant(antfile:'build.xml')

You can see this as a way to include an Ant build script into a Groovy script. This is also possible in the opposite direction: You can use Groovy scripts from Ant.

Using Groovy from Ant

Although Ant is extremely powerful, it can’t cater to every eventuality. It uses a declarative paradigm that is great for many tasks but can get in the way on occasion. As an example, you may have a classpath that is specified as a property in a compressed form—for instance, as a list of library names (dom4j, hibernate, spring) instead of as a full list of jar files. The code required (even with the AntContrib library) to build a classpath from such a list is horrendous, whereas in Groovy it can be specified very cleanly.
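As a minimal sketch of the Groovy side of such a task, the hypothetical method below expands a compressed library list into a classpath string. It assumes jars are named library-version.jar and that the available jar names are passed in; inside a <groovy> task, the result could then be stored in an Ant property through the properties binding variable described below.

```groovy
// Hypothetical helper: expands 'dom4j, hibernate' into full jar paths,
// assuming jars are named <library>-<version>.jar.
String expandClasspath(String libraryNames, List<String> jarNames, String libDir = 'lib') {
    def wanted = libraryNames.split(',')*.trim()
    jarNames.findAll { jar -> wanted.any { lib -> jar.startsWith(lib + '-') } }
            .collect { jar -> "$libDir/$jar" }
            .join(File.pathSeparator)
}

// Made-up jar names for illustration only.
def jars = ['dom4j-1.6.1.jar', 'hibernate-3.2.0.jar', 'log4j-1.2.14.jar', 'spring-2.0.jar']
println expandClasspath('dom4j, hibernate, spring', jars)
```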

The <groovy> Ant task allows you to run Groovy code directly from an Ant file, either using a script file that is specified as a parameter to the task, or inline as the text content of the task. Listing 13.3 shows a simple Ant build file that calls a Groovy script included in the body of the build file.

Example 13.3. A simple Ant script running some Groovy code

<?xml version="1.0" ?>
<project name="groovy-test" default="test" >

  <taskdef name="groovy"
    classname="org.codehaus.groovy.ant.Groovy"
    classpath="groovy-all-1.0.jar"/>

  <target name="test">
    <groovy>
      println "Running in Groovy"
    </groovy>
  </target>

</project>

The easiest way to make Groovy available to Ant is with a <taskdef> that refers to the embeddable Groovy jar file. The Groovy script runs within a binding that knows about various Ant-specific properties, as shown in table 13.2.

Table 13.2. The properties available in the binding when running a Groovy script from Ant with the <groovy> task

Name in binding    Description
ant                An AntBuilder with knowledge of the current project
project            The project currently being built
properties         The current properties (can be modified)
target             The currently executing target
task               The task wrapping Groovy

Using the ant variable from the binding allows you to use an AntBuilder that is transparently aware of the enclosing Ant project and shares its properties, such as the basedir. Therefore it’s easy to use tasks such as copy, move, and delete, and to use filesets in general:

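<groovy>
    def dirMap = ['old1': 'new1', 'old2': 'new2']
    // 'new' is a keyword, so it can't be a parameter name; copy takes its sources from a fileset
    dirMap.each { oldDir, newDir -> ant.copy(todir: newDir) { fileset(dir: oldDir) } }
</groovy>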

The project variable from the binding can also be useful, because project provides access to a number of interesting features, such as properties, references,[1] build listeners,[2] and task definitions. Suppose we want to implement a RulePrinter task in Groovy and add it to the project:

<groovy>
    import org.apache.tools.ant.Task

    class RulePrinter extends Task {
        int size = 40          // typed, so Ant converts the attribute strings
        String symbol = '*'
        public void execute() { println symbol * size }
    }
    project.addTaskDefinition('ruler', RulePrinter)
</groovy>

<ruler/>
<ruler symbol="--8&lt;-" size="10"/>

The usual way of implementing such an Ant custom task would have been with Ant’s <scriptdef> task. However, our solution is more elegant and demonstrates again Groovy’s seamless integration with any Java-based technology.

The properties property is particularly useful, because it allows you to use the same means of parameterization within your script as in the rest of your build file. Note that it lets you set the value of properties, which can be useful when setting the value involves applying some logic:

properties.'out' = properties.'user.dir'+System.currentTimeMillis()

Ant usually doesn’t modify the value of properties during the run of a build. Although doing so is technically possible, it is better to avoid this, in order to comply with the Ant property contract.

When writing code inline, you need to be careful about characters that have special meaning within XML—particularly angle brackets. It’s often easiest to use CDATA sections to avoid even having to worry about it. For example:

<groovy><![CDATA[
  println (Math.random() < 0.5 ? "Lower" : "Higher")
]]></groovy>

An alternative to using the <groovy> task directly is to use Ant’s own <script> task. Consult the Ant documentation for the options available. The language should be specified as "groovy".

Using Groovy in your Ant scripts gives you a way to execute arbitrary logic without resorting to compiling extra Ant tasks from Java. Without it, the build would first have to build its own tools before it could do its actual work, so it’s nice to have an ace up your sleeve such as Groovy.

Scripts are classes but different

One of the biggest misconceptions about Groovy is that a Groovy script will be interpreted line-by-line. This is not the case.

When a Groovy script gets executed, it is transformed into a class, and then the class is executed. This transformation happens transparently to the developer. However, the process has some consequences that are helpful to be aware of.

Script naming

First and foremost, a class must have a name. Groovy chooses to name your class by the filename (without the .groovy extension of course). So if you create a script with content

println x

and save it to a file named x.groovy, executing the script via groovy x gives you

class x

This can be surprising. You can lower the risk of such surprises by naming your script files like classes, with Pascal-cased names such as FileNameFinder.groovy rather than findFileNames.groovy.
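You can watch this script-to-class transformation happen with GroovyShell.parse, which compiles script text into a Script subclass whose name is derived from the file name you pass. The script text and file name below are made up for this sketch.

```groovy
def shell = new GroovyShell()
// Compile script text as if it had been read from FileNameFinder.groovy
def script = shell.parse('def answer = 6 * 7\nanswer', 'FileNameFinder.groovy')

assert script instanceof Script                      // every script becomes a Script subclass
assert script.getClass().name == 'FileNameFinder'    // class name taken from the file name
assert script.run() == 42                            // run() executes the script body
```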

As soon as you have the file x.groovy on the classpath, using the undeclared variable x in any of your scripts can produce some odd behavior, because x will then refer to the class x.

Script inclusion

Only in the simplest possible cases does all script code reside in one file. What is the Groovy way of including dependent files into a script? There are no include or require directives, unlike in some other scripting languages.

The compilation process from scripts to classes would make it difficult to allow a directive that does a literal inclusion of code that is stored in a dependent script file. The concept doesn’t fit into the Java world.

Instead, Groovy offers two alternatives:

  • Make your dependent script a declared class.

  • Evaluate the dependent script via evaluate(file).

We’ve used the first alternative many times throughout the book, even in the earliest examples. Do you remember the Book example that we started the whole Groovy adventure with? The Book class was declared in a file called Book.groovy. We then called it from a script using code such as

Book gina = new Book('Groovy in Action')

and called methods on the reference stored in the gina variable. No special directive is needed for finding the Book.groovy file. The only prerequisite is that Groovy must be able to find the file on the classpath and must be able to compile its content. If the lookup or the compilation fails, you’ll encounter a ClassNotFoundException.[3]

The second alternative is using the evaluate method that all scripts inherit from their base class, groovy.lang.Script, which hands the work to a GroovyShell. The overloads for this method include one that takes a File object to evaluate (see chapter 11). The evaluation of the file will work on the current binding: It can use variables from the binding, read and change their values, and add new variables to the binding. The evaluate method returns the value of the script’s last evaluated expression.
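A minimal sketch of this binding sharing: it writes a throwaway dependent script to a temporary directory and evaluates it. The file name Settings.groovy and the variable names userName and greeting are made up for the example.

```groovy
import java.nio.file.Files

// A throwaway dependent script; its name and content are assumptions of this sketch.
def dependent = new File(Files.createTempDirectory('evaluate-demo').toFile(), 'Settings.groovy')
dependent.deleteOnExit()
dependent.text = '''
greeting = 'Hello, ' + userName   // reads userName from the binding, writes greeting into it
return [language: 'Groovy']       // the value handed back by evaluate
'''

userName = 'Dierk'                // no 'def': goes into the binding shared with the dependent script
def settings = evaluate(dependent)

assert settings.language == 'Groovy'
assert greeting == 'Hello, Dierk'
```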

Let’s assume we have a smart configuration as in listing 13.4 to store a person’s preferences in nodes, dynamically constructed with NodeBuilder and mixed with iteration logic to assemble the hours when this person is supposed to appear at work. We save that script to a file named Preferences.groovy.

Example 13.4. Preferences.groovy as a smart configuration

def builder = new NodeBuilder()
builder.prefs(name:'Dierk') {
    language('Groovy')
    conference('http://www.waterfall2006.com')
    for (i in 9..17) {
        workingHour(i)
    }
}

The script does not have an explicit return statement, because that would be atypical for a script. The last evaluated expression serves this purpose, which is the prefs node. Of course, it is also valid to use an explicit return statement.

We have a second script in listing 13.5 that makes use of this smart configuration. It does so by using the evaluate method. Some assertions show how to access information from the smart configuration.

Example 13.5. Including the configuration as a dependent script

def prefs = evaluate(new File('Preferences.groovy'))

assert prefs.'@name' == 'Dierk'
assert prefs.workingHour*.value().contains(16)

For successful execution of listing 13.5, Preferences.groovy must be saved to the working directory. Because the filename is used to find the dependent script, this solution gets brittle in more complex scenarios. As soon as you have multiple scripts depending on each other, scripts being stored in subdirectories and so on, you are better off relying on declared classes and the classpath.

For the Geeks

If you are keen to work with dependent files but seek more flexibility, look at the JDK File API to resolve the dependent file relative to a parent directory, or use ClassLoader.getResourceAsStream to read the dependent file as a stream from the classpath and pass its content to the evaluate method.

Now you know a few problems to avoid—but more positive examples are called for as well. In the next section, we will provide some pieces of code that are self-contained and can give you ideas during your own product development. They also give you opportunities for experimentation and enhancement.

Useful snippets

Here are some code snippets that you may find useful when programming in Groovy. They are meant to be idiomatic. We will show you a novel use of closures, a neat way to modify text with regular expressions, a way of indicating progress in command-line applications, a tool that displays execution results line by line, and some advanced uses of GStrings.

Shuffling a collection

Suppose you have a collection—a list, for example—and you would like to shuffle the content. For instance, you may have track numbers for your Groovy MP3 player and wish to create a random playlist. The Groovy variant of a solution that is often suggested for scripting languages is

[1, 2, 3, 4, 5].sort { Math.random() } // very questionable solution

This works the following way: when the closure passed to the sort method does not take two parameters (in which case it would have been used as a Comparator), sort applies the closure to each element before comparing. Because we return random numbers, each comparison has a random outcome.

Although this works, it is neither efficient nor guaranteed to be stable with all sort algorithms, nor does it deliver good results.

Programming in Groovy means you have the wealth of Java at your disposal, and thus you can use the shuffle method of java.util.Collections:

def list = [1,2,3,4,5]
Collections.shuffle(list)
println list

This solution is efficient and stable, and it leads to an even distribution of the shuffled objects: each item has an equal probability of ending up at any given index.
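If you need a repeatable order, for example in tests, Collections.shuffle also accepts a java.util.Random. The seed 42 below is arbitrary, and the list is copied first so that the original track order survives.

```groovy
def tracks = (1..10).toList()
def playlist = new ArrayList(tracks)            // copy: keep the original track order
Collections.shuffle(playlist, new Random(42))   // seeded Random: same order on every run

assert playlist.sort(false) == tracks           // same tracks, only the order differs

def again = new ArrayList(tracks)
Collections.shuffle(again, new Random(42))
assert again == playlist                        // reproducible for a given seed
```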

We will reuse this functionality in the next example.

Scrambling text with regular expressions

You may have heard about the experiment where text remains readable even though the words in the text are scrambled, as long as the first and last character don’t change. Look at the following scrambled text. Can you read what it means?

Sarbmlce the inner crharatces of words
laenvig the text sltil reabldae for poeple but
not for cutoermps.

Listing 13.6 implements this scrambling process in Groovy.

Example 13.6. Scrambling the inner character of words


We use a regular expression to find all inner word characters. Then, replaceAll replaces all occurrences with the result of a closure that is fed the corresponding match. The match is converted to a list, shuffled, converted to a string, and returned. The regular expression for finding the inner characters of a word places a non-word-boundary (\B) on both sides of one or more word characters (\w+), so a match can neither start at the first character of a word nor end at its last.
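A minimal sketch of that approach follows. It is written from the description above, so details may differ from the listing; the method name scramble is our own.

```groovy
// Scrambles the inner characters of each word, keeping first and last in place.
String scramble(String text) {
    text.replaceAll(/\B\w+\B/) { String inner ->
        def chars = inner.toList()      // e.g. 'crambl' becomes ['c', 'r', 'a', 'm', 'b', 'l']
        Collections.shuffle(chars)
        chars.join('')
    }
}

println scramble('Scramble the inner characters of words')
```

Because the result is random, the checks below only verify that every word keeps its first letter, its last letter, and its set of characters.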

The ability to use a closure to build the replacement value for a regular expression match is often very useful.

We proceed with other helpful examples of closures.

Console progress bar

Suppose you have a time-consuming task that you need to apply to every file in a directory. It would be helpful to get some information about the progress: how much has already been processed, how much is still left to do, and which file is currently being processed.

The output should not be longer than a single line on the console, showing updated information on-the-fly.

When started on the directory containing this book’s listings, this line may for example read

::::::::: AthleteDAO.groovy

in between be refreshed to

####::::: Mailman.groovy

and finally be

######### x.groovy

Note: This is all one single displayed line that is updated over time, like a normal progress bar. If you have used the wget command-line tool for fetching web content, you have seen the same kind of display there.

The processFiles method in listing 13.7 takes a closure argument called notify. This closure is notified whenever a new file starts being processed, which is a simple form of the Observer pattern.[4]

The processFiles method is called with a closure that updates the progress bar whenever it receives a notification. For simplicity, our processing only consists of sleeping a little, and processing is done for files in the current directory only.

Example 13.7. Printing a progress bar on the console


Of course, this snippet could be extended in a number of ways. However, even running this simple version on the console is fun and worthwhile.
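A minimal sketch along the lines described above might look like this. The helper progressLine, the 9-character bar width, and the padding to 60 characters are our assumptions; the '\r' character returns the cursor to the start of the line so that each update overwrites the previous one.

```groovy
// Builds one progress line: '#' for files already done, ':' for files still to do.
String progressLine(int done, int total, String fileName, int width = 9) {
    int filled = total ? (width * done).intdiv(total) : width
    '#' * filled + ':' * (width - filled) + ' ' + fileName
}

// Calls the observer before each file; sleeping stands in for the real work.
int processFiles(File dir, Closure notify) {
    def files = dir.listFiles().findAll { it.isFile() }
    files.eachWithIndex { File file, int index ->
        notify.call(index, files.size(), file.name)
        sleep 20
    }
    files.size()
}

processFiles(new File('.')) { done, total, name ->
    // padding hides leftovers from longer file names on the same console line
    print '\r' + progressLine(done, total, name).padRight(60)
}
println()
```

Because the notification happens before each file, the bar never quite fills up here; notifying once more after the loop would complete it.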

We will look into more cool things you can do with the console in the next example.

Self-commenting single-steps

How about a snippet that reads a codebase and prints it to the console with an indication of what each line evaluates to? Example output could look like this:

data = [0,1,2,3]         //-> [0, 1, 2, 3]
data[1..2]               //-> [1, 2]
data.collect { it / 2 }  //-> [0, 0.5, 1, 1.5]

Saving this output back to the original file would mean we have written a piece of code that is able to write comments about itself.

Listing 13.8 reveals how to achieve this. We split the code by line, ignore empty lines, print each line, and finally evaluate the line and print the result.

Example 13.8. Evaluating and printing line-by-line


But wait: didn’t we say that you cannot evaluate Groovy code line by line? Yes, and the example works only because data has no declaration, which Groovy takes as a hint to put it into the current binding. Each line is evaluated separately, but the binding is passed on to the GroovyShell that conducts the evaluation. The first line adds data to the binding; the second line reads data from the binding when getting the 1..2 range from it.

What would happen if the first line read List data = [0,1,2,3]? At that point, data would be a local variable in the script and so would not be added to the binding. The first line would still evaluate correctly, but the second line would fail because data would not be known in the scope of the GroovyShell that evaluates the second line.
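A minimal sketch of such a single-step printer, written from the description above (the method name annotate and the column width of 25 are our own choices), also demonstrates the failure with a declared variable:

```groovy
// Evaluates code line by line, sharing one binding, and appends each result as a comment.
List<String> annotate(String code) {
    def shell = new GroovyShell(new Binding())
    code.readLines()
        .findAll { it.trim() }                                   // ignore empty lines
        .collect { line -> line.padRight(25) + '//-> ' + shell.evaluate(line) }
}

def source = '''
data = [0,1,2,3]
data[1..2]
data.collect { it / 2 }
'''
annotate(source).each { println it }

try {
    annotate('List data = [0,1,2,3]\ndata[1..2]')
    assert false : 'expected the second line to fail'
} catch (MissingPropertyException expected) {
    // data was a local variable of the first line's script, not a binding variable
}
```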

That means that the applicability of our single-step printer is very restricted. However, it makes a good example to sharpen your understanding of scripts being classes rather than sequences of evaluated lines.

Advanced GString usage

In the majority of cases, GStrings are used for simple formatting with the placeholders resolved immediately, as in

println "Now is ${new Date()}"

GStrings have a special way in which they resolve any contained placeholders. At the time of the GString creation, they evaluate each placeholder and store a reference to the result of that evaluation within the GString object. At the time of transformation into a java.lang.String, each reference is asked for its string representation in order to construct the fully concatenated result.

In other words, although the placeholder resolution is eager, writing out the referenced objects is lazy. The interesting point comes when a placeholder reference refers to an object that changes its string representation over time, especially after the GString was constructed. There are a number of objects that behave like this, such as lists and maps that base their string representation on their current content. Listing 13.9 uses a list to demonstrate this behavior and a typical Groovy object that writes itself lazily: a writable closure.

Example 13.9. Writing GString content lazily


Note how the stanza GString first works on the current values of count and data but changes its string representation when count and data change.
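A minimal sketch of this behavior follows. Instead of a separate writable closure, it uses a GString placeholder holding a closure that takes a Writer, which GStrings call each time they are rendered; the variable names are our own.

```groovy
def count = 0
def data = ['a']

def eager = "count: $count"                               // Integer value captured now
def stanza = "count: ${ w -> w << count }, data: $data"   // closure and list: read at render time

assert stanza.toString() == 'count: 0, data: [a]'

count = 1
data << 'b'

assert eager.toString() == 'count: 0'                     // the captured Integer never changes
assert stanza.toString() == 'count: 1, data: [a, b]'      // re-rendered from the current state
```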

This behavior enables GStrings to be used as a lightweight alternative to Groovy’s template engines (see section 9.4).

One word of caution: You need to be extremely careful when using such dynamic GStrings as elements of a HashSet or as keys in a HashMap. In general, you should avoid doing so, because the hash code of the GString will change if its string representation changes. If the hash code changes after the GString has been inserted into a map, the map cannot find the entry again, even if you present it with the exact same GString reference.

Writing idiomatic Groovy is one side of working with the language instead of fighting against it. Another side is using the tools provided as effectively as possible. In the next section, we will give more information on the groovy tool used to run scripts and classes.

Using groovy on the command line

While working through the book, you have used the groovy command to execute Groovy programs and scripts. It has additional options for working on the command line or running as a client-server program. We will explore evaluating short scripts specified on the command line, processing text files line by line, setting up very simple servers, and performing in-place file modifications.

Table 13.3 lists the command-line options for the groovy command.

Table 13.3. Command-line options for the groovy tool

Option            Argument            Meaning
-c, --encoding    Character encoding  Specify the encoding of the files
-d, --debug                           Debug mode will print out full stack traces
-e                Text to execute     Specify an in-line command-line script
-h, --help                            Usage information
-i                Extension           Modify files in place
-l                Port                Listen on a port, and process inbound lines
-n                                    Process files line by line
-p                                    Process files line by line, and print the result
-v, --version                         Display the Groovy and JVM versions

The -c/--encoding, -d/--debug, and -v/--version options are self-explanatory. The other options will be demonstrated by example. But first, let’s try running a short script.

Evaluating a command-line script

The -e option (e stands for evaluate) lets you pass one-line scripts to groovy on the command line as well as pipe output from one command or script as input to the groovy command. It is similar to the -e option in Perl, Ruby, and other languages.

A simple one-liner using -e follows. This script prints the vendor of the JVM in which Groovy is running, using the java.lang.System class to retrieve the java.vendor System property value:

> groovy -e "println System.properties.'java.vendor'"

Sun Microsystems Inc.

Note the enclosing quotes around the script: when using -e to pass scripts to groovy on the command line, enclose the script in single or double quotes so that the command shell you are running in (cmd on Windows or bash on UNIX, for example) does not interpret the contents of your Groovy script as commands or wildcards of its own.

Here is an example demonstrating piping the output of one Groovy script to another Groovy script that takes it as input and transforms the characters to uppercase. Enter the whole command on one line:

> groovy -e "println System.properties.'java.vendor'" |
  groovy -e "println System.in.text.toUpperCase()"

SUN MICROSYSTEMS INC.

You can also do this with native operating system commands, of course.

If you pass additional arguments on the command line, they are available to the script in the args variable. That means you can, for example, count lines in a file like so:

> groovy -e "println new File(args[0]).readLines().size()" jokes.txt

1024

Alternatively, you could print a random joke:

> groovy -e "lines = new File(args[0]).readLines();
  println lines[(int)(lines.size()*Math.random())]" jokes.txt

A horse goes into a bar ... "Hey buddy, why the long face?"

So far, so good—but we’re not making particularly extensive use of the piping feature of most shells, where the result of one operation can be the input to the next. That’s just one of the uses for the options we deal with next.

Using print and line options

The -e option becomes more interesting when combined with other options. The -p (print) and -n (line) options tell groovy to create an implicit variable named line from each line of input the groovy command receives. The input may come from standard input (a pipe or a shell redirection) or from files given as trailing command-line arguments.

The line variable is useful when you want to do something for each line of input rather than for the text of the input stream as a whole.

Assume there is a file example.txt in the subdirectory data containing

line one
line two
line three

You can cut off the line prefix with

> groovy -pe "line-'line '" data/example.txt
one
two
three

The -p option is essentially the same as -n, except it ensures that the result of processing each line is printed to the console (it is an implicit println for each line processed), whereas with -n you need to explicitly specify print or println for anything you want to output.
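To make the per-line semantics concrete, here is a rough emulation using GroovyShell. It is our own sketch (emulateDashP is a hypothetical name), not how the groovy tool is implemented: it compiles the script once, binds each input line to the variable line, runs the script, and collects the results that -p would print.

```groovy
// Rough emulation of "groovy -p -e SCRIPT": run SCRIPT once per input line
// with the current line bound to the variable 'line', collecting each result.
// (With -n, nothing would be collected; the script has to print for itself.)
List emulateDashP(String scriptText, List<String> input) {
    def script = new GroovyShell().parse(scriptText)   // compile once
    input.collect { String current ->
        script.binding = new Binding(line: current)    // fresh binding per line
        script.run()                                    // value of the last expression
    }
}

def result = emulateDashP("line - 'line '", ['line one', 'line two', 'line three'])
result.each { println it }
```

Running it prints the same three lines as the -pe command above, which shows that the one-liner is simply String minus applied to every line.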

This can be helpful when filtering, for example, the directory entries for a given date:

> dir | groovy -ne "if (line.contains('05.02.06')) println line"
05.02.06  17:48    <DIR>          .
05.02.06  17:48    <DIR>          ..
05.02.06  14:16               272 BraceCounter.groovy

Here’s a second example for system administrators, which uses your command shell’s input redirection with the < sign. In a Cygwin shell, you might do something like this:

> groovy -ne 'if (line =~ /dierk/) println line' < /etc/passwd
dierk:unused_by_nt/2000/xp:...:/home/dierk:/bin/bash

Note how the examples read from different input sources: from a file given on the command line, from the piped output stream of the dir command, and from streams redirected by the shell. On the command line, you always have a close interaction with your command shell.

Using the listen mode

The -l (listen) option lets you run a Groovy script in client-server mode. You execute a script (using -e or specifying a file to execute), and Groovy starts a simple server on port 1960 (by default; you may override the port setting if you choose). You can then connect to that server, for example with a telnet application, and send it lines of input; the script processes each line and returns the result to your client.

Note

Case in point: Jeremy Rayner, one of the core Groovy developers, wrote a simple HTTP server[5] in fewer than 75 lines of Groovy code!

Here is an example of a tiny script that looks up and returns the IP address of any hostname it receives. You will need two console windows for this example, one for the server and one for the client. First start the server. By default, the server will start on port 1960, but you can specify any unused port on the command line after the -l option. We’re using port 5000 here:

> groovy -l 5000 -e "println 'ip address: ' +
  InetAddress.getByName(line).hostAddress"

groovy is listening on port 5000

Now the server is running, has opened a socket, and is listening for input on port 5000. Run a telnet client to connect to the server, and send it some hostnames to look up:

> telnet localhost 5000
Trying ::1...
Connected to localhost.
Escape character is '^]'.
localhost
ip address: 127.0.0.1
java.sun.com
ip address: 209.249.116.141
manning.com
ip address: 64.49.223.143

Line-oriented client-server programming could hardly be simpler.

In-place editing from the command line

Finally, the -i (in-place edit) option is used when you want your Groovy script to iterate over a file or list of files, modifying them in place and, optionally, saving backups of the original files. Here is an example that goes through all *.java files in the current directory and replaces author tags in the Javadoc such that Dierk’s full name appears instead of his nickname. For every file, a backup is generated with a .bak extension:

> groovy -p -i .bak -e
  "line.replaceAll('@author Mittie','@author Dierk Koenig')" *.java

If you do not provide a backup extension, no visible backup file will be generated. The “visible” part is necessary for accuracy’s sake because behind the scenes, Groovy creates a backup anyway in your personal temporary folder and deletes it when finished normally. So, in the worst case, such as when your power supply is interrupted in the middle of such an operation and your working file is corrupted, you can still recover it from the temporary folder. However, providing a backup extension is the safer choice.
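What -p combined with -i does to a single file can be emulated in a few lines. The sketch below is our own (editInPlace is a hypothetical helper, and the sample file is made up). It is not groovy's implementation: it simply appends the extension to the file name for the backup, then rewrites the file with each line transformed.

```groovy
import java.nio.file.Files

// Rough emulation of "groovy -p -i EXT -e SCRIPT file": keep an optional
// backup copy, then rewrite the file with each line transformed.
void editInPlace(File file, String backupExtension, Closure transform) {
    def original = file.text                                   // read everything first
    if (backupExtension) {
        new File(file.path + backupExtension).text = original  // visible backup copy
    }
    file.withWriter { writer ->
        original.readLines().each { writer.writeLine(transform(it)) }
    }
}

// Hypothetical sample file in a temporary directory
def dir  = Files.createTempDirectory('inplace').toFile()
def src  = new File(dir, 'Foo.java')
src.text = '/** @author Mittie */\nclass Foo {}\n'

editInPlace(src, '.bak') { it.replaceAll('@author Mittie', '@author Dierk Koenig') }
println src.text
```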

Note

You can collapse option sequences, such as -p -e into -pe, as long as at most the last of these options takes an additional parameter. So groovy -pie will not work as expected, because the trailing e is interpreted as the extension for -i. Additional parameters can be appended with or without whitespace, so -i.bak and -i .bak are both valid.

That’s it for the numerous options that groovy can be started with. If you come from Ruby or Perl, they probably look familiar.

Now that you can write useful scripts, you can use them to handle minor chores you have to perform time and time again. Our next section helps to smooth the process of automating away annoyance.

Writing automation scripts

A software developer’s range of responsibilities includes many activities that require monitoring either constantly or on a repetitive schedule. Is the web server still running? Is the latest state on the build server OK? Is there so much data in the spam folder that it needs to be cleaned up? Did some prospect download an evaluation copy of our product?

You can easily feel like a juggler who spins as many plates as possible and merely keeps them from falling down. Figure 13.1 suggests that life would be easier if there were some device that would take care of keeping the plates spinning without our constant attention.


Figure 13.1. Keeping the plates spinning with lots of scheduled scripts

Groovy is well suited to writing those little “house-elf” scripts that automate our daily work. We will go through some issues that are special to command-line scripts, explore the support provided by Groovy, and visit a series of examples. In particular, we examine the simple processing of command-line options, starting Java programs with the minimum of fuss, and scheduling tasks for delayed or repeated execution.

Supporting command-line options consistently

Helper scripts are often started automatically from a scheduler such as cron or at, or as a service. Therefore, they have no graphical user interface but receive all necessary configuration on the command line. Starting a script generally looks like this:

> groovy MyScript -o value

where -o value stands for assigning value to the o option. This is a standard way of dealing with command-line options that users expect nowadays, and Groovy supports it in its libraries.

The standard option handling

An option can have a short name and a long name, where the short name consists of only one character. Short options are tagged on the command line with a single dash, such as -h; long names use two dashes, such as --help. Most options are optional, but certain options may be required.

Options may have zero, one, or multiple trailing arguments, such as filename in -f filename. Multiple arguments may be separated by a separator character; when that character is a comma, this looks like --lines 1,2,3.

When the user enters an invalid command line, it is good practice to report the error and print a usage statement. Options may be given in any order, but when an option takes multiple arguments, the order of those arguments matters.

If you had to re-implement the option-parsing logic for every script, you would probably shy away from the work. Luckily, there’s an easy way to achieve the standard behavior.

Declaring command-line options

Groovy provides special support for dealing with command-line options. The Groovy distribution comes with the Jakarta Commons command-line interface (CLI).[6] Groovy provides a specialized wrapper around it.

The strategy is to specify what options should be supported by the current script and let the CLI do the work of parsing, validating, error handling, and capturing the option values for later access in the script.

The specification is done with CliBuilder. With this builder, you specify an option by calling its short name as a method on the builder, provide a map of additional properties, and provide a help message. You specify a help option, for example, via

def cli = new CliBuilder()
cli.h(longOpt: 'help', 'usage information')

Table 13.4 contains the properties that you can use to specify an option with CliBuilder.

Table 13.4. CliBuilder option properties

Property name    Type      Meaning
argName          String    Name of the option's argument as shown in the usage statement, such as <address>
longOpt          String    The long name for the option as used with doubled dashes
required         boolean   Whether the option is required; default: false
args             int       Number of arguments for this option; default: 0
optionalArg      boolean   Whether there is an optional argument; default: false
type             Object    Type of the argument
valueSeparator   char      The character to use for separating multiple arguments

When the options are specified to the builder, the Groovy command-line support has all the information it needs to achieve the standard behavior. CliBuilder exposes two special methods:

  • parse(args) to parse the command line

  • usage() to print the usage statement

We will explain each of these before embarking on a full example.

Working with options

Letting CliBuilder parse the command-line arguments is easy. Just use its parse method, and pass it the arguments the script was called with. Groovy puts the command-line arguments (a String array) into the binding of the script under the name args. Therefore, the call reads

def options = cli.parse(args)

with options being an OptionAccessor that encapsulates what options the user requested on the command line. When parsing fails, it prints the usage statement and returns null. If parsing succeeds, you can ask options whether a certain option was given on the command line (for example, whether -h was requested) and print the usage statement if requested:

if (options.h) cli.usage()

The options object is a clever beast. For any option x, the property options.x returns the argument that was given with -x somearg. If no argument was supplied with -x, it returns true. If -x was not on the command line at all, it returns false.

If a longOpt such as extra was specified for the x option, then options.x and options.extra return the same value. The argName, in contrast, only names the argument in the usage statement, like <address> in the example that follows.

If the x option is specified to have multiple arguments, the list of values can be obtained by appending an s character to the property name, for example, options.xs or options.extras.

Finally, options has a method arguments to return a list of all arguments that were trailing after all options on the command line.
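Here is a small sketch that puts parse, option lookup, and arguments() together. The script name Report and its options are hypothetical, and the import assumes Groovy 2.5 or later, where CliBuilder is found in groovy.cli.picocli; in older versions it is groovy.util.CliBuilder and needs no import. A fixed array stands in for the script's args so the result is predictable.

```groovy
import groovy.cli.picocli.CliBuilder

def cli = new CliBuilder(usage: 'groovy Report [options] files')   // hypothetical script
cli.h(longOpt: 'help', 'usage information')
cli.o(longOpt: 'output', args: 1, argName: 'file', 'write the report to <file>')
cli.v(longOpt: 'verbose', 'print progress messages')

// In a real script you would pass the script's own args here.
def options = cli.parse(['-o', 'report.txt', 'a.log', 'b.log'] as String[])

if (options.h) cli.usage()
println "output : ${options.o}"             // the value given with -o
println "verbose: ${options.v}"             // falsy: -v was not given
println "files  : ${options.arguments()}"   // everything after the options
```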

Let’s go through an example to see how all this fits together.

The Mailman example

Assume we set out to provide a Groovy command-line script that sends a message via email on our behalf. Our Mailman script should be reusable, and therefore it cannot hard-wire all the details. On the command line, it expects to get information about the mail server, the mail addresses it should use, the text to send, and optionally the mail subject.

Here is how a casual user can request the information about the script and its options:

> groovy Mailman -h
error: sft
usage: groovy Mailman -sft[mh] "text"
 -f,--from <address>     from mail address (like [email protected])
 -h,--help               usage information
 -m,--subject <matter>   subject matter (default: no subject)
 -s,--smtp <host>        smtp host name
 -t,--to <address>       to address (like [email protected])

The user will also see this output whenever they pass options and arguments that are incomplete or otherwise insufficient.

Listing 13.10 implements the script, starting with a specification of its command-line options. It proceeds by parsing the given arguments and using them to configure the Ant task that finally delivers the mail.

Example 13.10. Mailman.groovy script using CliBuilder


Listing 13.10 shows how the compact declarative style of CliBuilder not only simplifies the code but also improves the documentation: better for the user because the usage statement is instantly available, and better for the programmer because the specification documents itself.

The multiple uses for documentation, parsing, and validation pay off after the initial investment in the specification. With this support in place, you are likely to produce professional command-line interfaces more often.
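The listing is not reproduced here, but the option part can be sketched from the usage statement shown earlier. The declarations below are modeled on that output and may differ in detail from listing 13.10. The Ant mail step is left out, the host and addresses are made up, and the import again assumes Groovy 2.5 or later.

```groovy
import groovy.cli.picocli.CliBuilder

// Option declarations modeled on the Mailman usage statement
def cli = new CliBuilder(usage: 'groovy Mailman -sft[mh] "text"')
cli.h(longOpt: 'help', 'usage information')
cli.s(longOpt: 'smtp',    args: 1, argName: 'host',    required: true, 'smtp host name')
cli.f(longOpt: 'from',    args: 1, argName: 'address', required: true, 'from mail address')
cli.t(longOpt: 'to',      args: 1, argName: 'address', required: true, 'to address')
cli.m(longOpt: 'subject', args: 1, argName: 'matter',  'subject matter (default: no subject)')

// Example command line (hypothetical host and addresses)
def options = cli.parse(['-s', 'smtp.example.com', '-f', 'me@example.com',
                         '-t', 'you@example.com', 'Hello from Groovy'] as String[])

def text    = options.arguments().join(' ')
def subject = options.m ?: 'no subject'     // optional -m falls back to a default
println "via ${options.s}: ${options.f} -> ${options.t} [$subject] $text"
```

Leaving out any of -s, -f, or -t makes parse print an error plus the usage statement and return null, which is why the script can rely on these values being present.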

Providing command-line options is one part of starting a program, but you won’t get very far if the program can’t find all the classes it requires. Next, you will see how Groovy helps you with that perennial Java bugbear, the classpath.

Expanding the classpath with RootLoader

Suppose you’d like to start a script using groovy MyScript but your script code depends on libraries that are not on the default classpath (<GROOVY_HOME>/lib/*.jar and <USER_HOME>/.groovy/lib/*.jar).

In this case, you’d need to set the classpath before calling the script, just like you need to do for any Java program.

Starting Java is considered tricky

When starting a Java program, you have to either set your CLASSPATH environment variable correctly for this specific program or pass the -classpath command-line option to the java executable.

Either way is cumbersome, requires a lot of typing, and is hard to get right. The common solution to this problem is to write a shell script for the startup. This works but requires knowledge of yet another language: your shell script language (Windows command script or bash).

Java is platform independent, but this value is lost if you cannot start your program on all platforms. When trying to provide startup scripts for all popular systems (Windows in its various versions, Cygwin, Linux, Solaris), things get complex. For examples, look at Ant’s various starter scripts in <ANT_HOME>/bin.

All the work is required only because a Java program cannot easily expand the classpath programmatically to locate the classes it needs. But Groovy can.

Groovy starters

Groovy comes with a so-called RootLoader, which is available as a property on the current classloader whenever the Groovy program was started by the groovy starter. It is not guaranteed to be available for Groovy code that is evaluated from Java code.

That means the RootLoader can be accessed as

def loader = this.class.classLoader.rootLoader

The trick with this is that it has an addURL(url) method that allows you to add a URL at runtime that points to the classpath entry to add, for example, the URL of a jar file:

loader.addURL(new File('lib/mylib.jar').toURI().toURL())

Sometimes it is also useful to know what URLs are currently contained in the RootLoader, such as for debugging classloading problems:

loader.URLs.each{ println it }

With this, you can easily write a platform-independent starter script in Groovy. Let’s go through a small example.

We need a Groovy script that depends on an external library. For the fun of it, we shall use JFugue, an open-source Java library that allows us to play music as defined in strings. Download jfugue.jar from http://www.jfugue.org, and copy it into a subdirectory named lib.

Listing 13.11 contains an example that uses the JFugue library to play a theme from Star Wars. Save it to file StarWars.groovy.

Example 13.11. StarWars.groovy uses the JFugue external library

import org.jfugue.*

def darthVaderTheme = new Pattern('T160 I[Cello] '+
     'G3q G3q G3q Eb3q Bb3i G3qi Eb3q Bb3i G3hi')

new Player().play(darthVaderTheme)

To start this script, we would normally need to set the classpath from the outside to contain lib/jfugue.jar. Listing 13.12 calls the StarWars script by making up the classpath. It adds all jar files from the lib subdirectory to the RootLoader before evaluating StarWars.groovy.

Example 13.12. Starting JFugue by adding all *.jar files from lib to RootLoader

def loader = this.class.classLoader.rootLoader

def dir = new File('lib')
dir.eachFileMatch(~/.*\.jar$/) {         // escaped dot: only names ending in .jar
    loader.addURL(it.toURI().toURL())    // File.toURL() is deprecated
}
evaluate(new File('StarWars.groovy'))
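Listing 13.12 assumes that a RootLoader exists and that the lib directory is present. A slightly more defensive variant, our own sketch with a hypothetical addJars helper, checks both and returns the URLs it added. The rootLoader property yields null when the script was not started by the groovy command.

```groovy
import java.util.regex.Pattern

// Only names ending in ".jar"; the dot is escaped, so "notajar" does not match.
Pattern jarFilter = ~/.*\.jar$/

// Adds all matching jars in dir to the RootLoader and returns their URLs.
// Returns an empty list if there is no RootLoader or dir does not exist.
List<URL> addJars(ClassLoader classLoader, File dir, Pattern filter) {
    def root = classLoader.rootLoader
    if (root == null || !dir.isDirectory()) return []
    def added = []
    dir.eachFileMatch(filter) { File jar ->
        URL url = jar.toURI().toURL()
        root.addURL(url)
        added << url
    }
    added
}

def added = addJars(this.class.classLoader, new File('lib'), jarFilter)
println "added ${added.size()} jar(s) to the RootLoader"
// evaluate(new File('StarWars.groovy'))   // then start the real script, as in listing 13.12
```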

With this functionality in place, you can easily distribute your automated player together with the libraries it depends on. There is no need for the user to install libraries in their <USER_HOME>/.groovy/lib directory or change any environment variables.

Also, everything is self-contained, and the user is less likely to run into version conflicts with the external libraries.

If you use dependency resolution packages such as Maven[7] or Ivy,[8] you can directly refer to their downloaded artifacts. Groovy may provide even more sophisticated support for this scenario in the future.

We’ve been trying to lower the difficulty level of starting Groovy programs, and we’ve made it simple to start them from the command line. The next obvious step is to make programs so simple to run that the user doesn’t even need to use the command line.

Scheduling scripts for execution

Automation scripts really shine when running unattended on a background schedule. As the saying goes, “They claim it’s automatic, but actually you have to press this button.”

There are numerous ways to schedule your automation scripts:

  • Your operating system may provide tools for scheduled execution. The standard mechanisms are the cron scheduler for UNIX/Linux/Solaris systems and the at service on Windows platforms. The downsides with these solutions are that you might not be authorized to use the system tools and that you cannot ship a system-independent scheduling mechanism with your application.

  • The Java platform supports scheduling with the Timer class. It uses an implementation based on Java threads and their synchronization features. Although this cannot give any real-time guarantees, it is good enough for many scenarios and scales well.

  • There are also several third-party scheduler libraries for Java, both open-source and commercial. The Quartz scheduler is a well-known example, and one that is supported in Spring. It’s available from http://www.opensymphony.com/quartz/. Of course, the cost of using advanced features tends to be higher complexity.

  • Roll your own scheduler with the simplest possible means.
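For the Timer option in the list above, a minimal sketch might look like this. It coerces a closure to java.util.TimerTask and uses a 200-millisecond period (our choice, so that the demo finishes quickly); a real automation script would use something like 10 minutes and would not stop after three runs.

```groovy
import java.util.concurrent.CountDownLatch
import java.util.concurrent.TimeUnit

def timer     = new Timer(true)              // daemon thread: does not keep the JVM alive
def remaining = new CountDownLatch(3)        // stop the demo after three executions

def task = {
    println "execution called at ${new Date()}"
    // call execution here
    remaining.countDown()
} as TimerTask

timer.scheduleAtFixedRate(task, 0L, 200L)    // first run now, then every 200 ms
boolean finished = remaining.await(5, TimeUnit.SECONDS)
timer.cancel()                               // discard all scheduled tasks
```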

In a lot of scenarios, it is sufficient to schedule an execution like so:

while(true) {
    println "execution called at ${new Date().toGMTString()}"
    // call execution here
    sleep 1000
}

Remember that unlike in Java, the Groovy sleep method really sleeps for at least the given number of milliseconds (here, one second), even if interrupted (see section 9.1.2).

Listing 13.13 extends this simple scheduling to a real-life[9] scenario. A task should be scheduled to run all working days (Monday through Friday) at office hours (08:00 a.m. to 06:00 p.m.). Within this timeframe, the task is to be started every 10 minutes.

Example 13.13. Scheduling a task for every 10 minutes during office hours

def workDays    = 1..5     // Monday..Friday as numbered by Date.getDay() (0 = Sunday)
def officeHours = 8..18

while(true) {
    def now = new Date()
    if (
        workDays.contains(now.day)       &&
        officeHours.contains(now.hours)  &&
        0 == now.minutes % 10
    ) {
        println "execution called at ${now.toGMTString()}"
        // call execution here
        sleep 31 * 1000
    }
    sleep 31 * 1000
}

The purpose of sleeping 31 seconds is to make sure the check is performed at least once per minute. The extra sleep after execution is needed to avoid a second execution within the same minute.

The solution in listing 13.13 is certainly not suited for scheduling at the granularity of milliseconds. It is also not perfect, because it uses deprecated Date methods.[10] However, it is sufficient for the majority of scheduling tasks, such as checking the source code repository for changes every 10 minutes, generating a revenue report every night, or cleaning the database every Sunday at 4:00 a.m.
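Because the check in listing 13.13 reads the current time, it is awkward to test, and it relies on deprecated Date methods. One way around both is to pull the condition into a function over java.util.Calendar and feed it fixed moments. This is sketched below with our own helper names, isDue and at. Calendar numbers weekdays starting at SUNDAY = 1, so MONDAY..FRIDAY is the right range here. The sketch also treats 18:00 as the end of office hours, so the last run of the day is at 17:50.

```groovy
// True if the given moment is on a working day (Monday to Friday),
// within office hours (08:00 to 17:59), and on a 10-minute boundary.
boolean isDue(Calendar cal) {
    def workDays    = Calendar.MONDAY..Calendar.FRIDAY   // Calendar numbering: SUNDAY = 1
    def officeHours = 8..<18
    cal.get(Calendar.DAY_OF_WEEK) in workDays &&
        cal.get(Calendar.HOUR_OF_DAY) in officeHours &&
        cal.get(Calendar.MINUTE) % 10 == 0
}

// Builds a fixed moment; month is 0-based, as everywhere in Calendar.
Calendar at(int year, int month, int day, int hour, int minute) {
    new GregorianCalendar(year, month, day, hour, minute)
}

// Inside the scheduling loop, you would call isDue(Calendar.instance).
println isDue(at(2024, Calendar.JANUARY, 1, 9, 30))     // Monday 09:30 -> true
```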

We’ve examined how to make scripts easy to run and easy to schedule, but we’ve said little about the kinds of things you might want such a script to do. Our next section gives a few examples to whet your appetite.

Example automation tasks

We couldn’t possibly tell you what your automation needs are. However, many tasks have similar flavors. By giving you a few examples, we hope we’ll set some sparks going in your imagination. You may have a moment where you spot that a repetitive task that has been getting under your skin could easily be automated in Groovy. If that’s the case, feel free to rush straight to your nearest computer before you lose inspiration. We’ll wait until you’ve finished.

Still here? Let’s roll up our sleeves and get groovy.

Scraping HTML pages

The web is not only full of endless information, but it is also full of interesting new and updated information. Regularly visiting your favorite pages for updated content is one of the plates you need to keep spinning. It’s easy to delegate this task to a Groovy script.

The script needs to

  1. Connect to a URL.

  2. Read the HTML content.

  3. Find the interesting information in the HTML.

Finding the information of interest is the tricky part, because HTML source code can be complex. Also, our script should be forgiving in terms of whitespace, attribute order, quoting of attribute values, and so on. In other words, we cannot reliably use regular expressions to cut the information out of the source code.

If we could work in XML rather than HTML, we could use an XML parser and GPath or XPath expressions to scrape off the interesting parts reliably.

By the Way

The term scraping stems from olden times when users were faced with a 25x80 character terminal screen. New automation features could be added by reading characters off this screen. This technique was called screen scraping.

The good news is that there are free open-source parsers that read HTML and expose the content as SAX events such that Groovy’s XML parsers can work with it. The popular NekoHTML parser can be found at http://people.apache.org/~andyc/neko/doc/index.html. Download it, and copy its jar file to the classpath.

As an example, consider analyzing the HTML page of http://java.sun.com as captured in figure 13.2. Let’s assume we’re interested in the news items, or everything that appears as links in bold type. For the screen shown in figure 13.2, our script should print

Developing Web Services Using JAX-WS
More Enhancements in Java SE 6 (Mustang)
"Get Java" Software Button Now Available
Gosling T-Shirt Hurling Contest

Figure 13.2. Screenshot of http://java.sun.com to scrape information off

The links in bold appear in the page’s HTML source like this (pretty-printed):

<B>
  <A href="http://logos.sun.com/spreadtheword/">
    "Get Java" Software Button Now Available
  </A>
</B>

Listing 13.14 shows the surprisingly compact solution to extract this data.

Example 13.14. Scraping news off the Java homepage

import org.cyberneko.html.parsers.SAXParser

def url = 'http://java.sun.com'

def html = new XmlSlurper(new SAXParser()).parse(url)

def bolded = html.'**'.findAll{ it.name() == 'B' }
def out = bolded.A*.text().collect{ it.trim() }
out.removeAll([''])
out[2..5].each{ println it }

We only need to wrap the NekoHTML SAXParser with the Groovy XmlSlurper. With the help of the slurper, we find all B nodes and their nested A nodes. Finally, we trim surrounding whitespace and remove empty links for nicer output.
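To try the GPath part without network access or NekoHTML, the same expressions can run against a small, made-up page that is already well formed. Upper-case element names mimic what NekoHTML reports. The import assumes Groovy 3 or later, where XmlSlurper lives in groovy.xml; earlier versions use groovy.util.XmlSlurper without an import.

```groovy
import groovy.xml.XmlSlurper

// Hypothetical page standing in for NekoHTML's output
def page = '''<HTML><BODY>
  <P>Intro <B><A href="/ws">  Developing Web Services  </A></B></P>
  <B>bold, but not a link</B>
  <P><B><A href="/logo"><IMG src="logo.gif"/></A></B></P>
  <P><B><A href="/button">Get Java Button</A></B> <A href="/other">plain link</A></P>
</BODY></HTML>'''

def html   = new XmlSlurper().parseText(page)
def bolded = html.'**'.findAll { it.name() == 'B' }   // every B element, at any depth
def out    = bolded.A*.text().collect { it.trim() }   // text of the links nested in B
out.removeAll([''])                                   // drop links without text (image links)
out.each { println it }
```

The bold text that is not a link and the plain link outside any B element are ignored, and the image link disappears because its text is empty.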

Of course, if the web site offers XML data feeds such as RSS or Atom, or even web services, it’s more reliable to use those. See chapter 12 for more details. But think about all those web pages that have no such luxury, but still convey important information: webmail clients, web server administration pages, web-based planning tools, calendaring systems, conference pages, project build information, and so forth. The list is literally endless.

In combination with a task scheduler, you can use this approach to regularly check whether your server is alive and kicking. If it doesn’t respond in a timely manner or contains an error indication in the page, you can send a notification to the admin.

Reading HTML is nice, but how about clicking links and submitting forms? We’ll show that next.

Automating web actions

HTML-based web applications are perfect candidates for automating all the actions that you would do manually otherwise. Think about the steps you repeatedly take in web applications: filling in your daily timesheet, updating the project plan, synchronizing with the address database, posting your current location to the corporate intranet, and so on.

To automate these steps, you can download HtmlUnit[11] from http://htmlunit.sourceforge.net/ and put its jars on the classpath. HtmlUnit was originally designed for testing web applications and therefore provides all the means to operate them. We will only use the operation controls here.

Our example of an interactive web interface is the ubiquitous Google search form, as shown in figure 13.3, with search results for “Groovy” in figure 13.4.


Figure 13.3. The Google search form when searching for “Groovy”


Figure 13.4. Top three Google search results for “Groovy”

Our example is a basic interaction, but nevertheless it contains all the steps for automated web actions:

  • Starting at an initial page

  • Filling an input field in a web form

  • Submitting the form

  • Working on the results

From the results, we filter the top three hits and report them as follows:

http://groovy.codehaus.org/    : Groovy - Home
http://www.groovy.de/          : Groovy.de - Headshop Growshop...
http://www.jeronimogroovy.com/ : JERONIMO GROOVY RADIO

Listing 13.15 uses HtmlUnit to achieve this. With a newly constructed WebClient, it gets the starting page with the Google URL. From the page, it reads the form and input field by the names they are tagged with. The input field is filled and the form submitted. The form submission returns the result page with the main result anchors having the class attribute 'l'.

Example 13.15. Finding the top three hits in Google

import com.gargoylesoftware.htmlunit.WebClient

def client = new WebClient()
def page   = client.getPage('http://www.google.com')
def input  = page.forms[0].getInputByName('q')
input.valueAttribute = 'Groovy'
page       = page.forms[0].submit()

def hits   = page.anchors.grep { it.classAttribute == 'l' } [0..2]
hits.each  { println it.hrefAttribute.padRight(30) + ' : ' +
             it.asText() }

HtmlUnit offers a lot of sophisticated features. It also includes NekoHTML and can deliver the current pages asXml, allowing Groovy to fully leverage its XML support. It can also deal with a wide range of JavaScript content and present the DOM for XPath processing. See its API documentation for details.

All this makes it an ideal companion to Groovy when implementing a remote control for web applications.

Inspecting version control

One nice feature of version-control systems such as Concurrent Versions System (CVS) or Subversion (SVN) is that they come with command-line clients. This makes them ideal candidates for inspection by Groovy scripts.

Let’s go through a CVS example. CVS comes with a command-line client that supports a variety of options. You can achieve almost everything with these options, but sometimes you need a little more. For example, when trying to find out who accessed the repository since a certain date, you can use the history command:

cvs history -a -e -D 2006-02-04

But that prints too much and in a rather cryptic way, with countless lines such as

M 2006-02-03 23:52 +0000 denis 1.8 website_base.css ...

It would be nice if a Groovy script could consolidate the output into something that displays the information as a summary:

2006-02-04   cruise  update: delete
2006-02-04   denis   commit: modified
2006-02-04   marc    update: delete
2006-02-04   paul    commit: add
2006-02-04   paul    commit: modified

This summary tells you that the user named cruise[12] updated from the repository, deleting a local file as a result. You can see who accessed the repository and the resulting operations.

The output is the result of running listing 13.16 against the CVS repository of the open-source Canoo WebTest project. It issues the cvs command and processes the output line by line. Each line is split on whitespace, and the fields of interest are extracted and joined to a string. Each string is put into a HashSet, which has the effect of removing duplicates. The result set is finally printed in a sorted order.

Example 13.16. Summarizing cvs command output for access surveillance


Some aspects of listing 13.16 are particularly Groovy in style. The command is executed in an extremely simple and readable fashion, and the output of the resulting process is processed line by line. Using list and string operations, including a literal list, we transform the raw output into the more readable format.

Note that we could have replaced the last two data extraction lines with a single line of code:

result << "${fields[1]}\t${fields[4]}\t${codes[fields[0]]}"

This would have saved one line. However, it doesn’t read as declaratively, because it mixes the concerns of field selection and presentation. In the original code, changing either the fields to be displayed or the way they are formatted is a simple task that requires no duplication.
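The code of listing 13.16 is not reproduced in this copy. What follows is a minimal sketch of the approach the text describes, not the original listing: the summarize method name is hypothetical, and the readable labels for record types other than A, M, and W are assumed wording. It keeps field selection (the data list) and presentation (the join) on separate lines, in the spirit of the previous paragraph, and takes the raw lines as a parameter so it can be tried without a CVS server.

```groovy
// Readable labels for CVS history record types (wording is an assumption)
def codes = [
    A: 'commit: add',      M: 'commit: modified', R: 'commit: remove',
    U: 'update: copy',     P: 'update: patch',    G: 'update: merge',
    C: 'update: conflict', W: 'update: delete',   O: 'checkout',
    E: 'export',           F: 'release',          T: 'rtag'
]

// Condenses raw 'cvs history' lines into unique, sorted summary rows
List<String> summarize(List<String> lines, Map codes) {
    def result = new HashSet<String>()           // the set drops duplicates
    lines.each { line ->
        def fields = line.trim().split(/\s+/)
        if (fields.size() < 5) return            // skip blank or short lines
        // selection: date, user, readable operation
        def data = [fields[1], fields[4], codes[fields[0]] ?: fields[0]]
        // presentation: one tab-separated row
        result << data.join('\t')
    }
    result.sort()                                // returns a sorted List
}

// Typical use against a real repository:
// def raw = 'cvs history -a -e -D 2006-02-04'.execute().text.readLines()
// summarize(raw, codes).each { println it }
```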

Pragmatic code analysis

In our consulting work, we’re asked to do code reviews every now and then. We even review and analyze code of our own projects regularly. In the course of this activity, we’ve learned to value pragmatic tools that work on any codebase.

When reviewing, you need a starting point. A good move is to assemble some statistical data, such as the number of files per directory, file sizes, and the line count per file. It’s amazing how much you can tell about a project from this data. Put it in a spreadsheet and generate charts for the various dimensions; you will soon spot the hot candidates for review.
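Collecting such numbers takes only a few lines of Groovy. The following is a minimal sketch, assuming Groovy 1.7.1 or later for groovy.io.FileType; the collectStats and printReport names are hypothetical. It counts files per directory and lines per file, and prints tab-separated rows that paste straight into a spreadsheet.

```groovy
import groovy.io.FileType

// Hypothetical helper: walks a source tree and gathers raw review numbers
Map collectStats(File root) {
    def filesPerDir  = [:].withDefault { 0 }     // directory path -> file count
    def linesPerFile = [:]                       // file path -> line count
    root.eachFileRecurse(FileType.FILES) { File file ->
        filesPerDir[file.parentFile.path]++
        linesPerFile[file.path] = file.readLines().size()
    }
    [filesPerDir: filesPerDir, linesPerFile: linesPerFile]
}

// One tab-separated row per file (path, size in bytes, line count),
// files with the most lines first
void printReport(File root) {
    collectStats(root).linesPerFile.sort { -it.value }.each { path, lines ->
        println "$path\t${new File(path).length()}\t$lines"
    }
}

// Typical call: printReport(new File('src'))
```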

There are some helpful measures (we wouldn’t dare to call them metrics) that you can assemble with the help of Groovy. For example, it helps to know the revision number of each file in the version-control system. Unusually high revision numbers can indicate a problematic area, just like files with the most conflicts (see the previous section).

Listing 13.17 points to another interesting measure: maximum nesting depth of braces. It’s a pragmatic approach, because it doesn’t use a real parser for the language and may thus be slightly off when braces occur in comments or strings. However, the solution can be applied to a wide range of languages and gives a good indication of complexity.

Example 13.17. Finding the maximum brace nesting depth

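// read the source file named on the command line
def source = new File(args[0]).text

def nesting = 0          // current brace depth
def maxnest = 0          // deepest depth seen so far

for (c in source) {      // walks the text one character at a time
    switch (c) {
        case '{' : nesting++
                   if (nesting > maxnest) maxnest++
                   break
        case '}' : nesting--
                   break
    }
}
println maxnest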

When applying this measure to Groovy code, you can expect higher numbers than for Java, due to the usage of braces in builders, closures, and GPath expressions. On the other hand, the line count is likely to be significantly lower!
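To compare the two styles directly, the loop of listing 13.17 can be wrapped in a method and fed strings instead of a file. This is a minimal sketch; the maxNesting name and the two snippets are hypothetical examples chosen to do the same job, one with a Java-style for loop and one with nested closures.

```groovy
// The counting loop of listing 13.17, wrapped so it accepts any string
int maxNesting(String source) {
    int nesting = 0
    int maxnest = 0
    for (c in source) {
        switch (c) {
            case '{': nesting++
                      maxnest = Math.max(maxnest, nesting)
                      break
            case '}': nesting--
                      break
        }
    }
    maxnest
}

// Hypothetical snippets: same task, Java style versus closure style
def javaStyle   = 'class Report { void print(List items) { for (Object item : items) { System.out.println(item); } } }'
def groovyStyle = 'class Report { void print(items) { items.each { item -> item.lines.each { println it } } } }'

println "Java style:   ${maxNesting(javaStyle)}"
println "Groovy style: ${maxNesting(groovyStyle)}"
```

Each closure adds a level, which is why the Groovy snippet scores one deeper even though it is shorter.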

More points of interest

There is a huge list of external libraries that are particularly helpful in combination with Groovy scripts.

First, automation often sends notifications. There are Ant tasks for this purpose, but for fine-grained control you can use the JavaMail[13] API (javax.mail), which comes as a package external to the JRE. Via mail gateways, you can also send text messages to a cell phone.

When automation is used for periodic reporting, libraries for producing graphs and charts are useful. You will find many of these on the Web, such as Snip-Graph, JCCKit, and JFreeChart. They all work from a textual representation of data, and using them with Groovy is therefore easy. Groovy templates can make the production of such text input files much simpler.
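As an illustration of the template approach, the following minimal sketch uses SimpleTemplateEngine (bundled with the standard Groovy distribution) to turn a map into a plain "label value" data file. Both the figures and the output format are assumptions; adapt the template to whatever textual input your charting library actually expects.

```groovy
import groovy.text.SimpleTemplateEngine

// Hypothetical daily figures to be charted
def data = [Mon: 12, Tue: 15, Wed: 9]

// One 'label value' line per entry; the format is an assumption
def text = '''<% rows.each { label, value -> %>${label} ${value}
<% } %>'''

def output = new SimpleTemplateEngine()
        .createTemplate(text)
        .make(rows: data)
        .toString()

print output
```

Keeping the format in a template rather than in string concatenation means a switch to another charting tool only touches the template text.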

Reporting can also mean producing Microsoft Office documents. When running on a Windows platform, you can relay such tasks to Groovy’s Scriptom module, which we will describe in chapter 15. There are also platform-independent solutions with restricted functionality that may nonetheless be sufficient for your needs: POI (http://jakarta.apache.org/poi) for Office documents, and for spreadsheets in particular, JExcelApi (http://jexcelapi.sourceforge.net).

A variety of projects implement customized Groovy support. You can find the list at http://groovy.codehaus.org/Related+Projects. There is, for example, special support for the Lucene search engine. Running its indexer repeatedly would be a typical automation task.

When reports are to be published on the Web, using a Groovy-enabled Wiki can be handy, because the pages can contain Groovy code to update themselves. Currently, the known implementations[14] are Biscuit, SnipSnap, and XWiki.

The Groovy developers provide specialized modules for making particularly interesting libraries more groovy. Have a look at the modules section at http://groovy.codehaus.org. For example, you will find Groovy support for Google’s calendaring package, allowing constructions such as

import org.codehaus.groovy.runtime.TimeCategory

use(TimeCategory) {
    Date reminder = 1.week.from.now
}

and expressions such as 2.days + 10.hours, basically allowing convenient definitions of dates, timestamps, and durations as usual Java objects. Over time, such a module may be promoted to the Groovy distribution.
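Later Groovy releases ship a category of this kind in the distribution as groovy.time.TimeCategory. A minimal sketch, assuming such a release, that exercises both expressions mentioned above:

```groovy
import groovy.time.TimeCategory

use(TimeCategory) {
    Date reminder = 1.week.from.now        // a Date seven days from now
    def span = 2.days + 10.hours           // a duration object
    assert reminder.after(new Date())
    // 2 days and 10 hours expressed in milliseconds
    assert span.toMilliseconds() == (2 * 24 + 10) * 60 * 60 * 1000L
    println "Reminder at $reminder"
}
```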

Because you can use any Java library, the list of possibilities is endless. Our goal was to spark your curiosity and make you think about the wide range of applicability.

Next, we will go through various aspects of making your life as a Groovy programmer easier.

Laying out the workspace

When your work with Groovy only encompasses writing a few little scripts, it is sufficient to use an all-purpose text editor. Groovy doesn’t force you to use big programs for small tasks.

However, as soon as you start developing more elaborate programs in Groovy, you will benefit from using one of the IDEs mentioned in section 1.5. The benefit comes not only from the available Groovy plug-ins but also from the general Java programming support: integration of version-control clients, local versioning, browsing dependent Java libraries, search and replace, classpath management, and so on.

This section collects some hints for how to make your daily programming life with Groovy easier using the features of any Java IDE. It explains how to create a comfortable environment for working with Groovy, describes how to use Java debuggers and profilers with Groovy code, and discusses the current state of the Groovy refactoring landscape.

IDE setup

As soon as you step into serious Groovy programming, you should look at the available IDE plug-ins and select the one of your choice.

Make sure your JDK is configured to include the JDK source and its API documentation. Unlike Java IDE support, Groovy plug-ins cannot always provide instant code completion, so you will look up JDK classes and methods more often than you are used to when programming Java. With a proper setup, such a lookup by name can still be efficient.

Most IDEs support the notion of a library that assembles jar files, classes, resources, source code, and API documentation of a common purpose. Create such a library for Groovy, including the Groovy source tree. This enables you to quickly look up important information such as GDK methods. For example, you could run a search for method definitions of the name eachFile*.

Note that Groovy comes with a comprehensive suite of unit tests. Most of these are written in Groovy. This is also a good source of information to have around when programming.

What is true for the JDK and the Groovy distribution is also true for any other external library. The better your setup and the more complete your local information, the less time you will spend scanning through external documentation.

If your IDE can integrate a Java decompiler such as JAD, get one. It helps a lot when inspecting the class files that groovyc generates. Make sure the decompiler writes into a directory that will not be picked up as a source directory for your next run or compile operation.

IDEs often support a mechanism to break the whole source tree into modules or projects with an option to define their dependencies. In case of a mixed Groovy/Java project, you can use this feature to avoid compile problems with mutual dependencies. For example, you can have three modules: a Groovy-only module, a Java-only module, and a module of shared Java interfaces.

Figure 13.5 illustrates how the Groovy and Java modules depend on the shared interface module.


Figure 13.5. Groovy and Java source modules depending on a shared interface module

This setup ensures that you can compile either module and the whole project easily. Once you have your code compiling, you’ll want to run it sooner or later—and sometimes that will mean running it in a debugger.

Debugging

Debugging is the act of removing bugs from the code. Some people claim that this implies that programming is the act of putting them in.

The best advice we can possibly give about debugging is advice on how to avoid it. The need for debugging is drastically reduced when solid unit testing is in place and when code is created in a test-first manner.

The next best approach is to make wise use of assertions throughout your code, so that it fails early and for obvious reasons.
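A minimal sketch of this idea, using an isPrime check like the one in listing 13.18: the assertion rejects nonsensical input at the point of the call, and the explicit x > 1 condition excludes 1, which is not prime by definition. The guard and its message are illustrative choices. Note that Groovy's assert statement is always enabled, unlike Java's, which needs the -ea switch.

```groovy
// isPrime with an input guard: fails early, and with a clear message
boolean isPrime(int x) {
    assert x > 0 : "isPrime expects a positive number, got $x"
    x > 1 && !(2..<x).any { y -> x % y == 0 }
}

assert !isPrime(1)
assert isPrime(2)
assert isPrime(97)
assert !isPrime(91)   // 91 = 7 * 13
```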

The debugging tool that everybody uses every day is putting println statements in the code under development. This is certainly helpful, and Groovy makes it a workable way of debugging, because transparent compiling and instant class-reloading lead to quick coding cycles.

However, don’t fall into the trap of leaving println statements in the code after debugging is done. Don’t even put them in comments. Depending on the purpose of the line, you can change it into an assertion or into a log statement.

By the Way

Consistent use of logging makes debugging much easier. In Groovy, you can use the same logging mechanisms as for any other Java code running on JDK 1.4 or later, such as the java.util.logging package. See the JDK documentation for details.
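A minimal sketch of turning debugging printlns into a log statement and an assertion with java.util.logging. The logger name is hypothetical, and the in-memory CollectingHandler merely stands in for a real console or file handler so the recorded messages are easy to inspect.

```groovy
import java.util.logging.Handler
import java.util.logging.Level
import java.util.logging.LogRecord
import java.util.logging.Logger

// Keeps log messages in memory; a stand-in for a file or console handler
class CollectingHandler extends Handler {
    List<String> messages = []
    void publish(LogRecord record) { messages << record.message }
    void flush() {}
    void close() {}
}

// 'debugging.sample' is a hypothetical logger name
def log = Logger.getLogger('debugging.sample')
log.level = Level.FINE
def handler = new CollectingHandler()
log.addHandler(handler)

int total = 0
(1..3).each { i ->
    total += i
    log.fine "running total is $total"     // was: println "running total is $total"
}
assert total == 6                          // was: println total
```

Unlike a println, the log statement can stay in the code: raising the logger level to INFO silences it without an edit.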

Generally, debugging offers a chance to learn something new about your code and improve it. After finding the bug, you can ask yourself how you could have found it earlier or could have avoided it altogether: what log statement would have helped, what assertion, what unit test, and how could the wrong behavior have been more visible in the code?

Before you can do that, though, you have to locate the bug.

Exploiting groovyc

When you get errors from Groovy scripts, precompilation with groovyc can provide more detailed error messages, especially when you’re working with multiple dependent scripts. In this case, use groovyc on all scripts.

When things get really tricky and you suspect Groovy is parsing your code incorrectly or producing bad constructions from it, you can use the properties listed in table 13.5 to make groovyc produce more artifacts. Set the environment variable JAVA_OPTS to the appropriate value before calling groovyc.

Table 13.5. System properties that groovyc is sensitive to

JAVA_OPTS             Purpose                                        Destination (for Foo.groovy)
-Dantlr.ast=groovy    Pretty printing                                Foo.groovy.pretty.groovy
-Dantlr.ast=html      Writing a colored version of the source, with  Foo.groovy.html
                      each AST node as a rollover tooltip
-Dantlr.ast=mindmap   Writing the AST as a mind map (view with       Foo.groovy.mm
                      http://freemind.sf.net)

The pretty-printer gives a first indication of possible misconceptions about the nesting structure of the code. These can easily occur when you’re using control statements that govern a single statement without braces, such as

if (true)
    println "that's really true"

If you later add more lines to the if-block but forget to add the braces that are then needed, you end up with an error that you can spot by looking at the pretty-print.
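A minimal sketch of this pitfall, with a hypothetical verbose flag: indentation suggests that both lines belong to the if, but without braces only the first one does.

```groovy
def verbose = false
def actions = []

if (verbose)
    actions << 'log details'
    actions << 'report'          // indented like the line above, yet runs unconditionally

assert actions == ['report']
```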

The pretty-printer is an obvious candidate for integration into the Groovy IDE plug-ins.

The HTML and mindmap options in table 13.5 are two other interesting views on the Abstract Syntax Tree (AST) that the Groovy parser creates from your source code. The HTML view is rather conventional, but the mindmap allows you to navigate and expand/collapse the AST nodes. Figure 13.6 shows the AST mindmap for the brace-matching analyzer in listing 13.17.


Figure 13.6. Mindmap AST of listing 13.17 expanded on a for loop

So far, we have assessed only the static aspects of the code. We will now look into the live execution of the code.

Groovy runtime inspection

Groovy’s MetaClass concept opens up a whole new range of options for inspecting code when debugging. Because all method calls, method dispatch, dynamic name resolution, and property access are funneled through the MetaClass, it is an ideal point of interception.

You came across the usage of MetaClasses and the TracingInterceptor in section 7.6.3. The TracingInterceptor can be engaged by attaching it to a ProxyMetaClass that acts as a decorator over the original one. This results in a non-intrusive tracing facility—one that doesn’t change the code under inspection.

Revisit the code in chapter 7 for more examples.
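A minimal sketch of that setup, assuming a hypothetical Primes class modeled on the isPrime check of listing 13.18, with the trace redirected into a StringWriter instead of the console. The exact layout of the trace lines may vary between Groovy versions, so treat the printed output as indicative.

```groovy
// Hypothetical class under inspection
class Primes {
    boolean isPrime(int x) { x > 1 && !(2..<x).any { y -> x % y == 0 } }
}

def trace  = new StringWriter()
def tracer = new TracingInterceptor(writer: trace)

// Decorate the original MetaClass of Primes with the tracing interceptor
def proxy = ProxyMetaClass.getInstance(Primes)
proxy.interceptor = tracer

// Calls inside use { } are traced; Primes itself is left unchanged
proxy.use {
    def primes = new Primes()
    assert primes.isPrime(7)
}
print trace
```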

Another groovy way to do live-system debugging is integration of an inspection capability. Examples of this are Grash, a shell-like inspection utility,[15] and the ULC Admin Console.[16] Although there are security implications in providing these capabilities in your applications, the potential for diagnosing problems that may occur in the field is immense.

Using debugging tools

Groovy runs inside the Java Virtual Machine as ordinary Java bytecode. The bytecode is constructed such that it contains all information required by the Java Platform Debugger Architecture (JPDA). In other words, you can use any JPDA-compliant Java debugger with Groovy and get Groovy source-level debugging!

The debugger that ships with your preferred Java IDE is most likely JPDA compliant. For graphical standalone debuggers, good experiences have been reported with JSwat,[17] which shines when it comes to Groovy source-level debugging.

One debugging tool that ships with every JDK is the jdb Java command-line debugger. The JDK documentation describes it in detail: the various ways of starting it, the commands it understands, and how to use it for in-process and remote debugging.

Let’s go through a sample usage with the script in listing 13.18. It prints the numbers from 1 to 100, stating whether each number is a prime number. An integer x greater than 1 is prime if no integer y between 2 and x-1 divides x without a remainder. Note how the isPrime method implements this specification in a declarative way; it omits the x > 1 condition, though, so it reports 1 as prime.

Example 13.18. Primer.groovy for printing prime number information

boolean isPrime(int x) {
    ! (2..<x).any { y ->  x % y == 0 }
}

for (i in 1..100) {
    println "$i : ${isPrime(i)}"
}

To work with this script in jdb, you need to set the CLASSPATH environment variable to include the current directory (.) and all jars in the GROOVY_HOME/lib directory.

For ease of use, place a file named jdb.ini in your USER_HOME or current directory to set up defaults when using jdb. When working with Groovy, it is convenient to make it contain the line

exclude groovy.*,java.*,org.codehaus.*,sun.*

to avoid stepping through the Java and Groovy internals.

Jdb session transcript

With this preparation in place, you can start as follows:

> %JAVA_HOME%\bin\jdb groovy.lang.GroovyShell Primer
Initializing jdb ...
*** Reading commands from ...jdb.ini

Let’s set a breakpoint at the second line for inspecting the call. You do this before running the program. Otherwise, it would complete so quickly that you couldn’t see anything.

There are various ways to set breakpoints. Type help to see them. We use a simple one with a classname and a line number. Note that scripts compile to classes, with the name of the script file becoming the classname:

> stop at Primer:2
Deferring breakpoint Primer:2.
It will be set after the class is loaded.

Now, start the program:

> run
run groovy.lang.GroovyShell Primer
Set uncaught java.lang.Throwable
Set deferred uncaught java.lang.Throwable
>
VM Started: Set deferred breakpoint Primer:2
Breakpoint hit: "thread=main", Primer.isPrime(), line=2 bci=13
2        ! (2..<x).any { y ->  x % y == 0 }

Jdb has started the program, told you that it will stop on any uncaught Throwable, and reported the breakpoint where it stopped.

It’s good to see the line of the breakpoint, but you can hardly understand it without seeing the surrounding lines. The list command shows the neighborhood:

main[1] list
1    boolean isPrime(int x) {
2 =>     ! (2..<x).any { y ->  x % y == 0 }
3    }
4
5    for (i in 1..100) {
6        println "$i : ${isPrime(i)}"
7    }

Let’s see what local variables you have at this point:

main[1] locals
Method arguments:
Local variables:
x = instance of groovy.lang.Reference(id=726)

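The x variable is not a simple int but a groovy.lang.Reference object: because x is used inside the closure, Groovy wraps it in a Reference that the method and the closure share, and jdb shows you Groovy code through Java glasses. Reference objects have a get method that returns their value. You can eval this method call: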

main[1] eval x.get()
 x.get() = "1"

That’s what you expected. You are done with the isPrime method. Let’s ask jdb to bring you back to the caller of this method:

main[1] step out
>
Step completed: "thread=main", Primer.run(), line=6 bci=76
6        println "$i : ${isPrime(i)}"

Time to end the jdb session:

main[1] exit

Dear passengers, thank you very much for flying with jdb airlines.

Debugging gives you control over the execution so you can make the code run as slowly as you need it to in order to understand it. Profiling helps you do the reverse—with the aid of a profiler, you can usually make your code run faster and more efficiently. Although Groovy code is rarely used when absolute performance is important, it can nevertheless be instructive to see where your code is spending the most time or what the most memory is being used for.

Profiling

Profiling is the task of analyzing a run of your program for memory and CPU time consumption. Groovy code can be profiled with any ordinary Java profiling tool. Profiling the Primer script from listing 13.18 is easy with the profiling support that comes with the Java Runtime Environment, which the JDK documentation covers extensively. In short, you can run a compact command-line profiler with

java -Xprof groovy.lang.GroovyShell Primer

A more sophisticated solution is available with

java -agentlib:hprof groovy.lang.GroovyShell Primer

This second way of starting the JRE profiler writes extensive data to a file named java.hprof.txt. There are a lot of options that you can set when profiling this way. For a list of options, type

java -agentlib:hprof=help

The extensive output of the JRE profiler takes some time to understand, which is why commercial profiling solutions are often preferred. Figure 13.7 shows profiling data from a Primer run as presented by the commercial YourKit profiler, which grants a free license to the Groovy committers.

From the profiling analysis in figure 13.7, you can tell (line 2) that we started the profiling run from within the IntelliJ IDEA IDE. A special YourKit plug-in for that IDE makes profiling setup easy.


Figure 13.7. Profiling data from YourKit for the Primer script

Because scripts are compiled into classes with main and run methods, you see these method calls in lines 4 and 6. Line 8 shows that calls to the isPrime method took almost no time. Almost all the time was used for writing the resulting GString to the console. This is not surprising, because I/O operations are always expensive compared to mere calculations.

Between the calls to Primer, you see calls to the Groovy runtime system. A special icon marks these lines as filtered; in other words, each represents a series of hidden calls. Setting such filters is important to make the interesting parts of the stack stand out.

Profiling and debugging are expert activities, and it takes some time to get proficient with the tools and their usage. But this is not particular to Groovy. It is also true for Java.

What is particular to Groovy is that you will be faced with lots of the internals of the Groovy runtime system: how classes are constructed, what objects get created, how the method dispatch works, and so on.

Refactoring

Refactoring is the activity of improving the design of existing code. The internal structure of the code changes, but the external behavior remains unchanged.

The classic book on refactoring is Refactoring: Improving the Design of Existing Code by Martin Fowler. All the listed refactorings and their mechanics can be applied to Groovy exactly as shown for Java code in the book. Where the mechanics suggest compiling the changed code, do so for Groovy code too, but also run the unit tests, because the Groovy compiler cannot catch as many errors in dynamic code as javac catches in Java.

For the Java world, lots of the standard refactorings such as Extract Method, Introduce Explaining Variable, Pull Members Up, and so on have been automated for use from inside the IDE.

For Groovy, refactoring support is currently not as complete. The future will show which IDE vendor or open source project will be able to provide a compelling solution.

Summary

Groovy is a unique language. It has a lot of similarities with Java and is fully integrated into the Java runtime architecture. This can sometimes lead us to forget about the differences. On the other hand, if you have a background in scripting languages such as Perl, Ruby, or Python, the biggest difference that you need to be aware of is—again—the Java runtime. Having gone through section 13.1, you are less likely to fall for the most common traps.

The uniqueness of Groovy leads to its own style of tackling programming tasks. Groovy still lets you write code in a procedural or Java-like way, but the idiomatic solutions as shown in section 13.2 have their own appeal.

Groovy is a good friend for all kinds of ad-hoc command-line scripting and serious automation solutions. This ranges from the groovy command and its various options for one-liners to scheduled execution of complex automation actions. Whether you want to surf the Web automatically or play some music, Groovy can do it for you. The important point is that Groovy can use any Java library to fulfill these tasks.

Finally, everyday programming work needs good organization to make it an efficient and satisfying experience. With the information provided in section 13.6, you are now able to make the best possible use of the existing Groovy and Java tools.

Although we have largely avoided making assumptions about your development process, one practice that is becoming more and more widely used is unit testing. The developers of Groovy believe strongly in the merits of unit testing (as do we), so it would be strange if Groovy didn’t have good support for it. The next chapter shows that our expectations are met once again.



[1] Projects can store arbitrary objects in references.

[2] You can register Groovy listener objects that get notified whenever a task is started and ended; see http://ant.apache.org/manual/listeners.html.

[3] When a ClassNotFoundException is encountered, it helps to explicitly compile the dependent .groovy file with groovyc to get a more detailed error message from the compiler. This advice is sometimes misconstrued as “dependent scripts need to be compiled,” which is of course not true.

[4] See Erich Gamma et al., Design Patterns: Elements of Reusable Object-Oriented Software (Addison-Wesley, 1995) for an explanation.

[7] Maven is a project build tool including dependency resolution: http://maven.apache.org.

[8] Ivy is a dependency resolution tool: http://jayasoft.org/ivy. Note: This is not JavaSoft!

[9] Canoo has a corporate client that has run such a schedule for over two years now.

[10] Using the day/hours/minutes properties of Date has been deprecated since JDK 1.1. However, correctly using Calendar methods here would distract from the focus of the example.

[11] The examples use HtmlUnit version 1.9.

[12] This is not the famous actor but the technical user for the cruisecontrol continuous integration service.
