Chapter 12. Working with XML

 

Perfection is achieved not when you have nothing more to add, but when you have nothing left to take away.

 
 --Antoine de Saint-Exupery

XML, the eXtensible Markup Language, is a reasonably young innovation. It’s just becoming a teenager, but we use it so commonly these days that it’s hard to believe there were times without it. The World Wide Web Consortium (W3C) standardized the first version of XML in 1998.

The widespread use of XML and worldwide adoption of Java took place at about the same time. This may be one of the reasons why the Java platform developed such excellent support for working with XML. Not only are there the built-in SAX and DOM APIs, but many other libraries have appeared over time for parsing and creating XML and for working with it using standards such as XPath.

The topic of XML has the unusual property of being simple and complex at the same time. XML is straightforward until you bring in namespaces, entities, and the like. Similarly, although it’s feasible to demonstrate one way of working with XML fairly simply, giving a good overview of all (or even most) of the ways of working with XML would require more space than we have in this book. We will concentrate on the new capabilities that Groovy brings, as well as mention the enhanced support for the DOM API. Even limiting ourselves to these topics doesn’t let us explore every nook and cranny.

This chapter is broadly divided into three parts. First, you’ll see the different techniques available for parsing XML in Groovy. Second, you will learn some tricks about processing and transforming XML. Finally, we will examine the Groovy support for web services—one of the most common uses of XML in business today.

We assume you already have a reasonable understanding of XML. If you find yourself struggling with any of the XML concepts we use in this chapter, please refer to one of the many available XML books.[1]

XML processing typically starts with reading an XML document, which is our first topic.

Reading XML documents

When working with XML, we have to somehow read it to begin with. This section will lead you through the many options available in Groovy for parsing XML: the normal DOM route, enhanced by Groovy; Groovy’s own XmlParser and XmlSlurper classes; SAX event-based parsing; and the recently introduced StAX pull-parsers.

Let’s suppose we have a little datastore in XML format for planning our Groovy self-education activities. In this datastore, we capture how many hours per week we can invest in this training, what tasks need to be done, and how many hours each task will eat up in total. To keep track of our progress, we will also store how many hours are “done” for each task.

Listing 12.1 shows our XML datastore as it resides in a file named data/plan.xml.

Example 12.1. The example datastore data/plan.xml

<plan>
    <week capacity="8">
        <task done="2" total="2" title="read XML chapter"/>
        <task done="3" total="3" title="try some reporting"/>
        <task done="1" total="2" title="use in current project"/>
    </week>
    <week capacity="8">
        <task done="0" total="1" title="re-read DB chapter"/>
        <task done="0" total="3" title="use DB/XML combination"/>
    </week>
</plan>

We plan for two weeks, with eight hours for education each week. Three tasks are scheduled for the current week: reading this chapter (two hours for a quick reader), playing with the newly acquired knowledge (three hours of real fun), and using it in the real world (one hour done and one still left).

This will be our running example for most of the chapter.

For reading such a datastore, we will present several different approaches: first using technologies built into the JRE, and then using the Groovy parsers. We’ll start with the more familiar DOM parser.

Working with a DOM parser

Why do we bother with Java’s classic DOM parsers? Shouldn’t we restrict ourselves to showing only Groovy specifics here?

Well, first of all, even in Groovy code, we sometimes need DOM objects for further processing, for example when applying XPath expressions to an object as we will explain in section 12.2.3. For that reason, we show the Groovy way of retrieving the DOM representation of our datastore with the help of Java’s DOM parsers. Second, there is basic Groovy support for dealing with DOM NodeLists, and Groovy also provides extra helper classes to simplify common tasks within DOM. Finally, it’s much easier to appreciate how slick the Groovy parsers are after having seen the “old” way of reading XML.

We start by loading a DOM tree into memory.

Getting the document

Not surprisingly, the Document Object Model is based around the central abstraction of a document, realized as the Java interface org.w3c.dom.Document. An object of this type will hold our datastore.

The Java way of retrieving a document is through the parse method of a DocumentBuilder (the parser). This method takes an InputStream to read the XML from. So a first attempt at reading is

def doc = builder.parse(new FileInputStream('data/plan.xml'))

Now, where does builder come from? We are working slowly backward to find a solution. The builder must be of type DocumentBuilder. Instances of this type are delivered from a DocumentBuilderFactory, which has a factory method called newDocumentBuilder:

def builder = fac.newDocumentBuilder()
def doc     = builder.parse(new FileInputStream('data/plan.xml'))

Now, where does this factory come from? Here it is:

import javax.xml.parsers.DocumentBuilderFactory

def fac     = DocumentBuilderFactory.newInstance()
def builder = fac.newDocumentBuilder()
def doc     = builder.parse(new FileInputStream('data/plan.xml'))

Java’s XML handling API is designed with flexibility in mind.[2] A downside of this flexibility is that for our simple example, we have a few hoops to jump through in order to retrieve our file. It’s not too bad, though, and now that we have it we can dive into the document.

Walking the DOM

The document object is not yet the root of our datastore. In order to get the top-level element, which is plan in our case, we have to ask the document for its documentElement property:

def plan = doc.documentElement

We can now work with the plan variable. It’s of type org.w3c.dom.Node and so it can be asked for its nodeType and nodeName. The nodeType is Node.ELEMENT_NODE, and nodeName is plan.

The design of such DOM nodes is a bit strange (to put it mildly). Every node has the same properties, such as nodeType, nodeName, nodeValue, childNodes, and attributes (to name only a few; see the API documentation for the full list). However, what is stored in these properties and how they behave depends on the value of the nodeType property.

We will deal with types ELEMENT_NODE, ATTRIBUTE_NODE, and TEXT_NODE (see the API documentation for the exhaustive list).

It is not surprising that XML elements are stored in nodes of type ELEMENT_NODE, but it is surprising that attributes are also stored in node objects (of nodeType ATTRIBUTE_NODE). To make things even more complex, each value of an attribute is stored in an extra node object (with nodeType TEXT_NODE). This complexity is a large part of the reason why simpler APIs such as JDOM, dom4j, and XOM have become popular.

As an example, the nodes and their names, types, and values are depicted in figure 12.1 for the first week element in the datastore.

Figure 12.1. Example of a DOM object model (excerpt) for element, attribute, and text nodes
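The node structure described above can be checked directly in code. The following is a minimal sketch, not part of the original listings: it parses a one-element stand-in for the first week (the inline XML string is an assumption made to keep the example self-contained) and inspects the element, its attribute node, and the text node that carries the attribute value.

```groovy
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Node

// Assumption: a one-element stand-in for the first week of data/plan.xml
def xml = '<week capacity="8"/>'
def builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
def week = builder.parse(new ByteArrayInputStream(xml.getBytes('UTF-8'))).documentElement

// The element itself
assert week.nodeType == Node.ELEMENT_NODE
assert week.nodeName == 'week'

// The attribute is a node of its own ...
def capacity = week.attributes.getNamedItem('capacity')
assert capacity.nodeType == Node.ATTRIBUTE_NODE
assert capacity.nodeName == 'capacity'

// ... and its value sits in a text node below it
def valueNode = capacity.firstChild
assert valueNode.nodeType == Node.TEXT_NODE
assert valueNode.nodeValue == '8'
```

The attribute's nodeValue property returns the same string, so in everyday code you rarely need to descend to the text node.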

The fact that node objects behave differently with respect to their nodeType leads to code that needs to work with this distinction. For example, when reading information from a node, we need a method such as this:

import org.w3c.dom.Node

String info(node) {
    switch (node.nodeType) {
        case Node.ELEMENT_NODE:
             return 'element: '+ node.nodeName
        case Node.ATTRIBUTE_NODE:
             return "attribute: ${node.nodeName}=${node.nodeValue}"
        case Node.TEXT_NODE:
             return 'text: '+ node.nodeValue
    }
    return 'some other type: '+ node.nodeType
}

With this helper method, we have almost everything we need to read information from our datastore. Two pieces of information are not yet explained: the types of the childNodes and attributes properties.

The childNodes property is of type org.w3c.dom.NodeList. Unfortunately, it doesn’t extend the java.util.List interface but provides its own methods, getLength and item(index). This makes it inconvenient to work with. However, as you saw in section 9.1.3, Groovy makes its object iteration methods (each, find, findAll, and so on) available on that type.

The attributes property is of type org.w3c.dom.NamedNodeMap, which doesn’t extend java.util.Map either. We will use its getNamedItem(name) method.

Listing 12.2 puts all this together and reads our plan from the XML datastore, walking into the first task of the first week.

Example 12.2. Reading plan.xml with the classic DOM parser

Note how we use the object iteration method find to access the first week element under plan. We use indexed access to reach the first task child node. But why is the index one and not zero? Because in our XML document, there is a line break between week and task. The DOM parser generates a text node containing this line break (and surrounding whitespace) and adds it as the first child node of week (at index zero). The task node floats to the second position with index one.
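The listing itself is not reproduced in this text. A minimal sketch that follows the description above might look like the code below. It is an approximation rather than the original listing: it reads an in-memory copy of the first week (an assumption, so that the example runs without data/plan.xml) and reuses the info helper shown earlier.

```groovy
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Node

// Assumption: in-memory excerpt of data/plan.xml, line breaks included
def xml = '''<plan>
    <week capacity="8">
        <task done="2" total="2" title="read XML chapter"/>
    </week>
</plan>'''

String info(node) {
    switch (node.nodeType) {
        case Node.ELEMENT_NODE:   return 'element: ' + node.nodeName
        case Node.ATTRIBUTE_NODE: return "attribute: ${node.nodeName}=${node.nodeValue}"
        case Node.TEXT_NODE:      return 'text: ' + node.nodeValue
    }
    return 'some other type: ' + node.nodeType
}

def builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
def doc  = builder.parse(new ByteArrayInputStream(xml.getBytes('UTF-8')))
def plan = doc.documentElement
assert info(plan) == 'element: plan'

// find is available on the NodeList through Groovy's object iteration methods
def week = plan.childNodes.find { it.nodeName == 'week' }
assert info(week) == 'element: week'
assert info(week.attributes.getNamedItem('capacity')) == 'attribute: capacity=8'

// Index 0 is the whitespace text node; the first task sits at index 1
assert week.childNodes.item(0).nodeType == Node.TEXT_NODE
def task = week.childNodes.item(1)
assert info(task) == 'element: task'
assert info(task.attributes.getNamedItem('title')) == 'attribute: title=read XML chapter'
```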

Making DOM groovier

Groovy wouldn’t be groovy without a convenience method for the lengthy parsing prework:

def doc  = groovy.xml.DOMBuilder.parse(
                new FileReader('data/plan.xml'))
def plan = doc.documentElement

Note

The DOMBuilder is not only for convenient parsing. As the name suggests, it is a builder and can be used like any other builder (see chapter 8). It returns a tree of org.w3c.dom.Node objects just as if they’d been parsed from an XML document. You can add it to another tree, write it to XML, or query it using XPath (see section 12.2.3).
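To illustrate the builder side of DOMBuilder, here is a small sketch that constructs part of our plan in code instead of parsing it. The element and attribute names mirror the datastore; the choice of which nodes to build is ours and serves only as an illustration.

```groovy
import groovy.xml.DOMBuilder

// Build a DOM tree with the usual builder syntax
def builder = DOMBuilder.newInstance()
def plan = builder.plan {
    week(capacity: '8') {
        task(done: '2', total: '2', title: 'read XML chapter')
    }
}

// The result is an ordinary org.w3c.dom node tree
assert plan instanceof org.w3c.dom.Node
assert plan.nodeName == 'plan'

def tasks = plan.getElementsByTagName('task')
assert tasks.length == 1
assert tasks.item(0).getAttribute('title') == 'read XML chapter'
```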

Dealing with child nodes and attributes as in listing 12.2 doesn’t feel groovy either. Therefore, Groovy provides a DOMCategory that you can use for simplified access. With this, you can index child nodes via the subscript operator or via their node name. You can refer to attributes by getting the @attributeName property:

use(groovy.xml.dom.DOMCategory){
    assert 'plan' == plan.nodeName
    assert 'week' == plan[1].nodeName
    assert 'week' == plan.week.nodeName
    assert '8'    == plan[1].'@capacity'
}

Although not shown in the example, DOMCategory has recently been improved to provide additional syntax shortcuts such as name, text, children, iterator, parent, and attributes. We explain these shortcuts later in this chapter, because they originated in Groovy’s purpose-built XML parsing classes. Consult the online Groovy documentation for more details.

This was a lot of work to get the DOM parser to read our data, and we had to face some surprises along the way. We will now do the same task using the Groovy parser with less effort and fewer surprises.

Reading with a Groovy parser

The Groovy way of reading the plan datastore is so simple, we’ll dive headfirst into the solution as presented in listing 12.3.

Example 12.3. Reading plan.xml with Groovy’s XmlParser

def plan = new XmlParser().parse(new File('data/plan.xml'))

assert 'plan' == plan.name()
assert 'week' == plan.week[0].name()
assert 'task' == plan.week[0].task[0].name()
assert 'read XML chapter' == plan.week[0].task[0].'@title'

No fluff, just stuff. The parsing is only a one-liner. Because Groovy’s XmlParser resides in package groovy.util, we don’t even need an import statement for that class. The parser can work directly on File objects and other input sources, as you will see in table 12.2. The parser returns a groovy.util.Node. You already came across this type in section 8.2. That means we can easily use GPath expressions to walk through the tree, as shown with the assert statements.

Up to this point, you have seen that Groovy’s XmlParser provides all the functionality you first saw with the DOM parser. But there is more to come. In addition to the XmlParser, Groovy comes with the XmlSlurper. Let’s explore the commonalities and differences between those two before considering more advanced usages of each.

Commonalities between XmlParser and XmlSlurper

Let’s start with the commonalities of XmlParser and XmlSlurper: They both reside in package groovy.util and provide the constructors listed in table 12.1.

Table 12.1. Common constructors of XmlParser and XmlSlurper

Parameter list                                Note
--------------------------------------------  --------------------------------------------------
()                                            Parameterless constructor.
(boolean validating, boolean namespaceAware)  Depending on the two flags, the document is validated against a declared DTD, and namespace declarations are taken into account.
(XMLReader reader)                            If you already have an org.xml.sax.XMLReader available, it can be reused.
(SAXParser parser)                            If you already have a javax.xml.parsers.SAXParser available, it can be reused.

Besides sharing constructors with the same parameter lists, the types share parsing methods with the same signatures. The only difference is that the parsing methods of XmlParser return objects of type groovy.util.Node whereas XmlSlurper returns GPathResult objects. Table 12.2 lists the uniform parse methods.

Table 12.2. Parse methods common to XmlParser and XmlSlurper

Signature                  Note
-------------------------  ---------------------------------------------------------------
parse(InputSource input)   Reads from an org.xml.sax.InputSource
parse(File file)           Reads from a java.io.File
parse(InputStream input)   Reads from a java.io.InputStream
parse(Reader in)           Reads from a java.io.Reader
parse(String uri)          Reads the resource that the uri points to after connecting to it
parseText(String text)     Uses the text as input

These are the most commonly used methods on XmlParser and XmlSlurper. The description of additional methods (such as for using specialized DTD handlers and entity resolvers) is in the API documentation.
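The following sketch exercises three of these methods on both parsers. The imports are an assumption about your setup: in Groovy 3 and later the parsers are also available in package groovy.xml, whereas with the groovy.util classes described in this chapter the imports can be dropped. The inline XML string is likewise only a stand-in for the datastore.

```groovy
// Assumption: Groovy 3 or later; omit these imports when using groovy.util.XmlParser/XmlSlurper
import groovy.xml.XmlParser
import groovy.xml.XmlSlurper

def text = '<plan><week capacity="8"><task done="2" total="2" title="read XML chapter"/></week></plan>'

// parseText(String): same input for both parsers ...
def node = new XmlParser().parseText(text)
def path = new XmlSlurper().parseText(text)

// ... but different result types
assert node instanceof groovy.util.Node
assert !(path instanceof groovy.util.Node)

// parse(Reader) and parse(InputStream) work the same way
def fromReader = new XmlParser().parse(new StringReader(text))
def fromStream = new XmlSlurper().parse(new ByteArrayInputStream(text.getBytes('UTF-8')))

assert node.name() == 'plan'
assert fromReader.week[0].'@capacity' == '8'
assert fromStream.week.task.'@title'.text() == 'read XML chapter'
```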

The result of the parse method is either a Node (for XmlParser) or a GPathResult (for XmlSlurper). Table 12.3 lists the common available methods for both result types. Note that because both types understand the iterator method, all object iteration methods are also instantly available.

Table 12.3. Common methods of groovy.util.Node and GPathResult

Node method            GPathResult method        Shortcut
---------------------  ------------------------  --------
Object name()          String name()
String text()          String text()
String toString()      String toString()
Node parent()          GPathResult parent()      '..'
List children()        GPathResult children()    '*'
Map attributes()       Map attributes()[a]
Iterator iterator()    Iterator iterator()
List depthFirst()      Iterator depthFirst()     '**'
List breadthFirst()    Iterator breadthFirst()

[a] Strictly speaking, attributes() is a method of NodeChild, not GPathResult, but this is transparent in most usages.

GPathResult and groovy.util.Node provide additional shortcuts for method calls to the parent object and all descendant objects. Such shortcuts make reading a GPath expression more like other declarative path expressions such as XPath or Ant paths.[3]

Objects of type Node and GPathResult can access both child elements and attributes as if they were properties of the current object. Table 12.4 shows the syntax and how the leading @ sign distinguishes attribute names from nested element names.

Table 12.4. Element and attribute access in groovy.util.Node and GPathResult

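Node (XmlParser)      GPathResult (XmlSlurper)   Meaning
--------------------  -------------------------  ------------------------------------------
['elementName']       ['elementName']            All child elements of that name
.elementName          .elementName               All child elements of that name
[index]               [index]                    Child element by index
['@attributeName']    ['@attributeName']         The attribute value stored under that name
.'@attributeName'     .'@attributeName'          The attribute value stored under that name
                      .@attributeName            The attribute value stored under that name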

Listing 12.4 plays with various method calls and uses GPath expressions to work on objects of type Node and GPathResult alike. It uses XmlParser to return Node objects and XmlSlurper to return a GPathResult. To make the similarities stand out, listing 12.4 shows doubled lines, one using Node, one using GPathResult.

Example 12.4. Using common methods of groovy.util.Node and GPathResult

Note that the GPath expression node.week.task first collects all child elements named week, and then, for each of those, collects all their child elements named task (compare the second row in table 12.4). In the case of node.week.task, we have a list of task nodes that we can ask for its size. In the case of path.week.task, we have a GPathResult that we can ask for its size. The interesting thing here is that the GPathResult can determine the size without collecting intermediate results (such as week and task nodes) in a temporary datastructure such as a list. Instead, it stores whatever iteration logic is needed to determine the result and then executes that logic and returns the result (the size in this example).

The listing also shows that in GPath, attribute access has the same effect as access to child elements; node.week.task.'@done' results in a list of all values of the done attribute of all tasks of all weeks. We use the spread-dot operator (see section 7.5.1) to apply the toInteger method to all strings in that list, returning a list of integers. We finally use the GDK method sum on that list.

The assertion that uses every can be read as: “Assert that the done attribute in every task of week[1] is '0'.” What’s new here is using indexed access and the object iteration method every. Because indexing starts at zero, week[1] means the second week.
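The listing itself is not reproduced in this text. The sketch below follows the description above, with doubled lines for Node and GPathResult; treat it as an approximation rather than the original listing. It parses an in-memory copy of the datastore, and the imports assume Groovy 3 or later (drop them when using the groovy.util classes).

```groovy
// Assumption: Groovy 3 or later; omit these imports when using groovy.util.XmlParser/XmlSlurper
import groovy.xml.XmlParser
import groovy.xml.XmlSlurper

// Assumption: in-memory copy of data/plan.xml
def xml = '''<plan>
    <week capacity="8">
        <task done="2" total="2" title="read XML chapter"/>
        <task done="3" total="3" title="try some reporting"/>
        <task done="1" total="2" title="use in current project"/>
    </week>
    <week capacity="8">
        <task done="0" total="1" title="re-read DB chapter"/>
        <task done="0" total="3" title="use DB/XML combination"/>
    </week>
</plan>'''

def node = new XmlParser().parseText(xml)    // groovy.util.Node
def path = new XmlSlurper().parseText(xml)   // GPathResult

assert 'plan' == node.name()
assert 'plan' == path.name()

assert 2 == node.children().size()
assert 2 == path.children().size()

// All tasks of all weeks
assert 5 == node.week.task.size()
assert 5 == path.week.task.size()

// Attribute access plus spread-dot: total hours done
assert 6 == node.week.task.'@done'*.toInteger().sum()
assert 6 == path.week.task.'@done'*.toInteger().sum()

// Nothing is done yet in the second week
assert node.week[1].task.every { it.'@done' == '0' }
assert path.week[1].task.every { it.'@done' == '0' }
```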

This example should serve as an appetizer for your own experiences with applying GPath expressions to XML documents.

In addition to the convenient GPath notation, you might also wish to make use of traversal methods; for example, we could add the following lines to listing 12.4:

assert 'plan->week->week->task->task->task->task->task' ==
        node.breadthFirst()*.name().join('->')

assert 'plan->week->task->task->task->week->task->task' ==
        node.depthFirst()*.name().join('->')

So far, you have seen that XmlParser and XmlSlurper can be used in a similar fashion to produce similar results. But there would be no need for two separate classes if there weren’t a difference. That’s what we cover next.

Differences between XmlParser and XmlSlurper

Despite the similarities between XmlParser and XmlSlurper when used for simple reading purposes, there are differences when it comes to more advanced reading tasks and when processing XML documents into other formats.

XmlParser uses the groovy.util.Node type and its GPath expressions result in lists of nodes. That makes working with XmlParser feel like there always is a tangible object representation of elements—something that we can inspect via toString, print, or change in-place. Because GPath expressions return lists of such elements, we can apply all our knowledge of the list datatype (see section 4.2).

This convenience comes at the expense of additional up-front processing and extra memory consumption. The GPath expression node.week.task.'@done' generates three lists: a temporary list of weeks[4] (two entries), a temporary list of tasks (five entries), and a list of done attribute values (five strings) that is finally returned. This is reasonable for our small example but hampers processing large or deeply nested XML documents.
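To make the three lists visible, the following sketch evaluates the GPath expression one step at a time. The variable names weeks, tasks, and done are ours, the XML is an in-memory copy of the datastore, and the import assumes Groovy 3 or later (drop it when using groovy.util.XmlParser).

```groovy
// Assumption: Groovy 3 or later; omit this import when using groovy.util.XmlParser
import groovy.xml.XmlParser

def node = new XmlParser().parseText('''<plan>
  <week capacity="8">
    <task done="2" total="2" title="read XML chapter"/>
    <task done="3" total="3" title="try some reporting"/>
    <task done="1" total="2" title="use in current project"/>
  </week>
  <week capacity="8">
    <task done="0" total="1" title="re-read DB chapter"/>
    <task done="0" total="3" title="use DB/XML combination"/>
  </week>
</plan>''')

// Each step of node.week.task.'@done' materializes a list of its own
def weeks = node.week        // temporary list of week nodes
def tasks = weeks.task       // temporary list of task nodes
def done  = tasks.'@done'    // the list that is finally returned

assert weeks instanceof List && weeks.size() == 2
assert tasks instanceof List && tasks.size() == 5
assert done == ['2', '3', '1', '0', '0']
```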

XmlSlurper in contrast does not store intermediate results when processing information after a document has been parsed. It avoids the extra memory hit when processing. Internally, XmlSlurper uses iterators instead of extra collections to reflect every step in the GPath. With this construction, it is possible to defer processing until the last possible moment.

Note

This does not mean that XmlSlurper would work without storing the parsed information in memory. It still does, and the memory consumption rises with the size of the XML document. However, for processing that stored information via GPath, XmlSlurper does not need extra memory.

Table 12.5 lists the methods unique to Node. When using XmlParser, you can use these methods in your processing.

Table 12.5. XmlParser: methods of groovy.util.Node not available in GPathResult

Method                         Note
-----------------------------  ----------------------------------------------------------------------
Object value()                 Retrieves the payload of the node, either the children() or the text()
void setValue(Object value)    Changes the payload
Object attribute(Object key)   Shortcut to attributes().get(key)
NodeList getAt(QName name)     Provides namespace support for selecting child elements by their groovy.xml.QName
void print(PrintWriter out)    Pretty-printing with NodePrinter

Table 12.6 lists the methods that are unique to or are optimized in GPathResult. As an example, we could add the following line to listing 12.4 to use the optimized findAll in GPathResult:

assert 2 == path.week.task.findAll{ it.'@title' =~ 'XML' }.size()

Table 12.6. XmlSlurper: methods of GPathResult not available in groovy.util.Node

Method                                                 Note
-----------------------------------------------------  ------------------------------------------------------------
GPathResult parents()                                  Represents all parent elements on the path from the current element up to the root
GPathResult declareNamespace(Map newNamespaceMapping)  Registers namespace prefixes and their URIs
List list()                                            Converts a GPathResult into a list of groovy.util.slurpersupport.Node objects for list-friendly processing
int size()                                             The number of result elements (memory optimized implementation)
GPathResult find(Closure closure)                      Overrides the object iteration method find
GPathResult findAll(Closure closure)                   Overrides the object iteration method findAll
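As a small illustration of find, findAll, and size on a GPathResult, consider the following sketch. The variable names xmlTasks and firstOpen and the criterion "done is less than total" are our own choices for the example; the import assumes Groovy 3 or later (drop it when using groovy.util.XmlSlurper).

```groovy
// Assumption: Groovy 3 or later; omit this import when using groovy.util.XmlSlurper
import groovy.xml.XmlSlurper

def path = new XmlSlurper().parseText('''<plan>
  <week capacity="8">
    <task done="2" total="2" title="read XML chapter"/>
    <task done="3" total="3" title="try some reporting"/>
    <task done="1" total="2" title="use in current project"/>
  </week>
  <week capacity="8">
    <task done="0" total="1" title="re-read DB chapter"/>
    <task done="0" total="3" title="use DB/XML combination"/>
  </week>
</plan>''')

// findAll yields another GPathResult that can be queried further
def xmlTasks = path.week.task.findAll { it.'@title' =~ 'XML' }
assert xmlTasks.size() == 2
assert xmlTasks.collect { it.'@title'.text() } ==
        ['read XML chapter', 'use DB/XML combination']

// find stops at the first match: the first task that still has hours left
def firstOpen = path.week.task.find {
    it.'@done'.toInteger() < it.'@total'.toInteger()
}
assert firstOpen.'@title'.text() == 'use in current project'
```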

Additionally, some classes may only work on one type or the other; for example, there is groovy.util.XmlNodePrinter with method print(Node) but no support for GPathResult. As the name suggests, XmlNodePrinter pretty-prints a Node tree to a PrintWriter in XML format.
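A short sketch of XmlNodePrinter in action follows. It prints into a StringWriter so that the result can be inspected; the inline XML and the variable names are assumptions for the example, and the imports assume Groovy 3 or later (with older versions, drop them and rely on the groovy.util classes).

```groovy
// Assumption: Groovy 3 or later; omit these imports when using the groovy.util classes
import groovy.xml.XmlNodePrinter
import groovy.xml.XmlParser

def node = new XmlParser().parseText(
    '<plan><week capacity="8"><task done="2" total="2" title="read XML chapter"/></week></plan>')

// Pretty-print the Node tree into an in-memory writer
def out = new StringWriter()
new XmlNodePrinter(new PrintWriter(out)).print(node)
def printed = out.toString()

// The printed form is XML again and can be parsed back
def copy = new XmlParser().parseText(printed)
assert copy.week[0].task[0].'@title' == 'read XML chapter'
```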

You have seen that there are a lot of similarities and some slight differences when reading XML via XmlParser or XmlSlurper. The real, fundamental differences become apparent when processing the parsed information. Coming up in section 12.2, we will look at these differences in more detail by exploring two examples: processing with direct in-place data manipulation and processing in a streaming scenario. However, first we are going to look at event style parsing and how it can be used with Groovy. This will help us better position some of Groovy’s powerful XML features in our forthcoming more-detailed examples.

Reading with a SAX parser

In addition to the original Java DOM parsing you saw earlier, Java also supports what is known as event-based parsing. The original and most common form of event-based parsing is called SAX. SAX is a push-style event-based parser because the parser pushes events to your code.

When using this style of processing, no memory structure is constructed to store the parsed information; instead, the parser notifies a handler about parsing events. We implement such a handler interface in our program to perform processing relevant to our application’s needs whenever the parser notifies us.

Let’s explore this for our simple plan example. Suppose we wish to display a quick summary of the tasks that are underway and those that are upcoming; we aren’t interested in completed activities for the moment. Listing 12.5 shows how to receive start element events using SAX and perform our business logic of printing out the tasks of interest.

Example 12.5. Using a SAX parser with Groovy

Note that with this style of processing, we have more work to do. When our startElement method is called, we are provided with SAX event information including the name of the element (along with a namespace, if provided) and all the attributes. It’s up to us to work out whether we need this information and process or store it as required during this method call. The parser won’t do any further storage for us. This minimizes memory overhead of the parser, but the implication is that we won’t be able to do GPath-style processing and we aren’t in a position to manipulate a tree-like data structure. We’ll have more to say about SAX event information when we explore XmlSlurper in more detail in section 12.2.
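The listing itself is not reproduced in this text. The sketch below shows the general shape of such a handler; it is an approximation under our own assumptions, not the original code. The class name PlanHandler and its underway and upcoming lists are hypothetical, and the tasks are collected rather than printed so that the outcome can be checked.

```groovy
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.Attributes
import org.xml.sax.helpers.DefaultHandler

// Hypothetical handler: keeps only what the summary needs
class PlanHandler extends DefaultHandler {
    def underway = []
    def upcoming = []

    void startElement(String namespaceURI, String localName,
                      String qName, Attributes attrs) {
        if (qName != 'task') return
        def title = attrs.getValue('title')
        def done  = attrs.getValue('done').toInteger()
        def total = attrs.getValue('total').toInteger()
        if (done == 0)         upcoming << title
        else if (done < total) underway << title
        // completed tasks are ignored
    }
}

// Assumption: in-memory copy of data/plan.xml
def xml = '''<plan>
  <week capacity="8">
    <task done="2" total="2" title="read XML chapter"/>
    <task done="3" total="3" title="try some reporting"/>
    <task done="1" total="2" title="use in current project"/>
  </week>
  <week capacity="8">
    <task done="0" total="1" title="re-read DB chapter"/>
    <task done="0" total="3" title="use DB/XML combination"/>
  </week>
</plan>'''

def handler = new PlanHandler()
def parser = SAXParserFactory.newInstance().newSAXParser()
parser.parse(new ByteArrayInputStream(xml.getBytes('UTF-8')), handler)

assert handler.underway == ['use in current project']
assert handler.upcoming == ['re-read DB chapter', 'use DB/XML combination']
```

Everything the program knows after parsing is what the handler chose to store in its two lists; the parser itself keeps nothing.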

Reading with a StAX parser

In addition to the push-style SAX parsers supported by Java, a recent trend in processing XML with Java is to use pull-style event-based parsers. The most common of these are called StAX-based parsers.[5] With such a parser, you are still interested in events, but you ask the parser for events (you pull events as needed) during processing[6], instead of waiting to be informed by methods being called.

Listing 12.6 shows how you can use StAX with Groovy. You will need a StAX parser in your classpath to run this example. If you have already set up Groovy-SOAP, which we explore further in section 12.3, you may already have everything you need.

Example 12.6. Using a StAX parser with Groovy

Note that this style of parsing is similar to SAX-style parsing except that we are running the main control loop ourselves rather than having the parser do it. This style has advantages for certain kinds of processing where the code becomes simpler to write and understand.

Suppose you have to respond to many parts of the document differently. With push models, your code has to maintain extra state to know where you are and how to react. With a pull model, you can decide what parts of the document to process at any point within your business logic. The flow through the document is easier to follow, and the code feels more natural.
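The listing itself is not reproduced in this text. As a rough sketch of the pull style, the code below uses the javax.xml.stream API, which has shipped with the JDK since Java 6, so this variant needs no extra library. The upcoming list and the decision to collect only untouched tasks are our own choices for the example.

```groovy
import javax.xml.stream.XMLInputFactory
import javax.xml.stream.XMLStreamConstants

// Assumption: in-memory copy of data/plan.xml
def xml = '''<plan>
  <week capacity="8">
    <task done="2" total="2" title="read XML chapter"/>
    <task done="3" total="3" title="try some reporting"/>
    <task done="1" total="2" title="use in current project"/>
  </week>
  <week capacity="8">
    <task done="0" total="1" title="re-read DB chapter"/>
    <task done="0" total="3" title="use DB/XML combination"/>
  </week>
</plan>'''

def reader = XMLInputFactory.newInstance()
        .createXMLStreamReader(new StringReader(xml))
def upcoming = []

// We run the control loop ourselves and pull one event at a time
while (reader.hasNext()) {
    def event = reader.next()
    if (event == XMLStreamConstants.START_ELEMENT && reader.localName == 'task') {
        if (reader.getAttributeValue(null, 'done') == '0') {
            upcoming << reader.getAttributeValue(null, 'title')
        }
    }
}
reader.close()

assert upcoming == ['re-read DB chapter', 'use DB/XML combination']
```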

We have now explored the breadth of parsing options available in Groovy. Next we explore the advantages of the Groovy-specific parsing options in more detail.

Processing XML

Many situations involving XML call for more than just reading the data and then navigating to a specific element or node. XML documents often require transformation, modification, or complex querying. When we look at the characteristics of XmlParser and XmlSlurper when processing XML data in these ways, we see the biggest differences between the two. Let’s start with a simple but perhaps surprising analogy: heating water.

There are essentially two ways of boiling water, as illustrated in figure 12.2. You can pour water into a tank (called a boiler), heat it up, and get the hot water from the outlet. The second way of boiling is with the help of a continuous-flow heater, which heats up the water while it streams from the cold-water inlet through the heating coil until it reaches the outlet. The heating happens only when requested, as indicated by opening the outlet tap.

Figure 12.2. Comparing the strategies of boiling vs. continuous-flow heating

How does XML processing relate to boiling water? Well, processing XML means you are not just using bits of the stored information, but retrieving it, adding some new quality to it (making it hot in our analogy), and outputting the whole thing. Just like boiling water, this can be done in two ways: by storing the information in memory and processing it in-place, or by retrieving information from an input stream, processing it on the fly, and streaming it to an output device.

In general, processing XML with XmlParser (and groovy.util.Node) is more like using a boiler, whereas XmlSlurper can serve as a source in a streaming scenario analogous to continuous-flow heating.

We’re going to start by looking at the “boiling” strategy of in-place modification and processing and then proceed to explore streamed processing and combinations with XPath.

In-place processing

In-place processing is the conventional means of XML processing. It uses the XmlParser to retrieve a tree of nodes. These nodes reside in memory and can be rearranged, copied, or deleted, and their attributes can be changed. We will use this approach to generate an HTML report for keeping track of our Groovy learning activities.

Suppose the report should look like figure 12.3. You can see that new information is derived from existing data: tasks and weeks have a new property that we will call status with the possible values of scheduled, in progress, and finished.

Figure 12.3. An HTML progress report of Groovy learning activities

For tasks, the value of the status property is determined by looking at the done and total attributes. If done is zero, the status is considered scheduled; if done is equal to or exceeds total, the status is finished; otherwise, the status is in progress.

Weeks are finished when all contained tasks are finished. They are in progress when at least one contained task is in progress.

This sounds like we are going to do lots of number comparisons with the done and total attributes. Unfortunately these attributes are stored as strings, not numbers. These considerations lead to a three-step “heating” process:

  1. Convert all string attribute values to numbers where suitable.

  2. Add a new attribute called status to all tasks, and determine the value.

  3. Add a new attribute called status to all weeks, and determine the value.
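The three steps above can be sketched as follows. The method names numberfy, taskStatus, and weekStatus are the ones used in this section, but the bodies are our own reconstruction under stated assumptions, not the original listing. In particular, the fallback status of scheduled for a week that is neither finished nor in progress is an assumption. The import assumes Groovy 3 or later (drop it when using groovy.util.XmlParser).

```groovy
// Assumption: Groovy 3 or later; omit this import when using groovy.util.XmlParser
import groovy.xml.XmlParser

def plan = new XmlParser().parseText('''<plan>
  <week capacity="8">
    <task done="2" total="2" title="read XML chapter"/>
    <task done="3" total="3" title="try some reporting"/>
    <task done="1" total="2" title="use in current project"/>
  </week>
  <week capacity="8">
    <task done="0" total="1" title="re-read DB chapter"/>
    <task done="0" total="3" title="use DB/XML combination"/>
  </week>
</plan>''')

// Step 1: convert numeric attribute values, recursing through the tree
void numberfy(node) {
    def atts = node.attributes()
    ['capacity', 'total', 'done'].each { key ->
        if (atts.containsKey(key)) atts[key] = atts[key].toInteger()
    }
    node.children().findAll { it instanceof Node }.each { numberfy(it) }
}

// Step 2: derive the task status from done and total
void taskStatus(task) {
    def atts = task.attributes()
    if (atts.done == 0)               atts.status = 'scheduled'
    else if (atts.done >= atts.total) atts.status = 'finished'
    else                              atts.status = 'in progress'
}

// Step 3: derive the week status from the status of its tasks
void weekStatus(week) {
    week.task.each { taskStatus(it) }
    def atts = week.attributes()
    atts.status = 'scheduled'   // assumed fallback
    if (week.task.every { it.'@status' == 'finished' })  atts.status = 'finished'
    if (week.task.any { it.'@status' == 'in progress' }) atts.status = 'in progress'
}

numberfy(plan)
plan.week.each { weekStatus(it) }

assert plan.week[0].'@capacity' == 8
assert plan.week.task.'@status' ==
        ['finished', 'finished', 'in progress', 'scheduled', 'scheduled']
assert plan.week[0].'@status' == 'in progress'
assert plan.week[1].'@status' == 'scheduled'
```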

With such an improved data representation, it is finally straightforward to use MarkupBuilder to produce the HTML report.

We have to produce HTML source like

<html>
  <head>
    <title>Current Groovy progress</title>
    <link href='style.css' type='text/css' rel='stylesheet' />
  </head>
  <body>
    <h1>Week No. 0: in progress</h1>
    <dl>
      <dt class='finished'>read XML chapter</dt>
      <dd>(2/2): finished</dd>
...
    </dl>
  </body>
</html>

where the stylesheet style.css contains the decision of how a task is finally displayed according to its status. It can for example use the following lines for that purpose:

dt          { font-weight:bold }
dt.finished { font-weight:normal; text-decoration:line-through }

Listing 12.7 contains the full solution. The numberfy method implements the string-to-number conversion for those attributes that we expect to be of integer content. It also shows how to work recursively through the node tree.

The methods weekStatus and taskStatus make the new status attribute available on the corresponding node, where weekStatus calls taskStatus for all its contained tasks to make sure it can work on their status inside GPath expressions.

The final htmlReport method is the conventional way of building HTML. Thanks to the “heating” prework, there is no logic needed in the report. The report uses the status attribute to assign a stylesheet class of the same value.
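To show how little logic the report needs once the data is prepared, here is a minimal sketch of the report step in isolation. It is not the original listing: the input already carries the status attributes (an assumption that stands in for the prework), and only an excerpt of the first week is used. The XmlParser import assumes Groovy 3 or later.

```groovy
import groovy.xml.MarkupBuilder
// Assumption: Groovy 3 or later; omit this import when using groovy.util.XmlParser
import groovy.xml.XmlParser

// Assumption: data as it would look after the status attributes have been added
def plan = new XmlParser().parseText('''<plan>
  <week status="in progress">
    <task done="2" total="2" title="read XML chapter" status="finished"/>
    <task done="1" total="2" title="use in current project" status="in progress"/>
  </week>
</plan>''')

def writer = new StringWriter()
def builder = new MarkupBuilder(writer)

builder.html {
    head {
        title('Current Groovy progress')
        link(href: 'style.css', type: 'text/css', rel: 'stylesheet')
    }
    body {
        plan.week.eachWithIndex { week, i ->
            h1("Week No. $i: ${week.'@status'}")
            dl {
                week.task.each { task ->
                    // the status doubles as the stylesheet class
                    dt(class: task.'@status', task.'@title')
                    dd("(${task.'@done'}/${task.'@total'}): ${task.'@status'}")
                }
            }
        }
    }
}

def report = writer.toString()
assert report.contains('<h1>Week No. 0: in progress</h1>')
assert report.contains("<dt class='finished'>read XML chapter</dt>")
assert report.contains('<dd>(2/2): finished</dd>')
```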

Example 12.7. Generating an HTML report with in-memory data preparation

After the careful prework, the code in listing 12.7 is not surprising. What’s a bit unconventional is having a lot of closing braces on one line at the end of htmlReport. This is not only for compact typesetting in the book. We also sometimes use this style in our everyday code. We find it nicely reveals what levels of indentation are to be closed and still allows us to check brace-matching by column. It would be great to have IDE support for toggling between this and conventional code layout.

Now that you have seen how to use the in-memory “boiler,” let’s investigate the streaming scenario.

Streaming processing

In order to demonstrate the use of streaming, let’s start with the simplest kind of processing that we can think of: pumping out what comes in without any modification. Even this simple example is hard to understand when you first encounter it. We recommend that if you find it confusing, keep reading, but don’t worry too much about the details. It’s definitely worth coming back later for a second try, though—in many situations, the benefits of stream-based processing are well worth the harder conceptual model.

Unmodified piping

You use XmlSlurper to parse the original XML. Because the final output format is XML again, you need some device that can generate XML in a streaming fashion. The groovy.xml.StreamingMarkupBuilder class is specialized for outputting markup on demand—in other words, when an information sink requests it. Such a sink is an operation that requests a Writable—for example, the leftshift operator call on streams or the evaluation of GStrings. The trick that StreamingMarkupBuilder uses to achieve this effect is similar to the approach of template engines. StreamingMarkupBuilder provides a bind method that returns a WritableClosure. This object is a Writable and a closure at the same time. Because it is a Writable, you can use it wherever the final markup is requested. Because it is a closure, the generation of this markup can be done lazily on-the-fly, without storing intermediate results.

Listing 12.8 shows this in action. The bind method also needs the information about what logic is to be applied to produce the final markup. Wherever logic is needed, closures are the first candidate, and so it is with bind. We pass a closure to the bind method that describes the markup logic.

For our initial example of pumping the path through, we use a special feature of StreamingMarkupBuilder that allows us to yield the markup generation logic to a Buildable, an object that knows how to build itself. It happens that a GPathResult (and thus path) is buildable. In order to yield the building logic to it, we use the yield method. However, we cannot use it unqualified because we would produce a <yield/> markup if we did. The special symbol mkp marks our method call as belonging to the namespace of markup keywords.

Example 12.8. Pumping an XML stream without modification

import groovy.xml.StreamingMarkupBuilder

def path = new XmlSlurper().parse(new File('data/plan.xml'))

def builder = new StreamingMarkupBuilder()
def copier = builder.bind{ mkp.yield(path) }
def result = "$copier"

assert result.startsWith('<plan><week ')
assert result.endsWith('</week></plan>')

There is a lot going on in only a few lines of code. The result variable, for example, refers to a GString with one value: a reference to copier. Note that we didn’t call it “copy” because it is not a thing but an actor.

When we call the startsWith method on result, the string representation of the GString is requested, and because the one GString value copier is a Writable, its writeTo method is called. The copier was constructed by the builder such that writeTo relays to path.build().
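To make the laziness tangible, here is a minimal sketch that counts how often the markup closure runs. It assumes Groovy 3 or later (where XmlSlurper lives in groovy.xml) and uses a small inline document as a stand-in for data/plan.xml; the counter name runs and the sample data are ours, not part of listing 12.8.

```groovy
import groovy.xml.StreamingMarkupBuilder
import groovy.xml.XmlSlurper   // assumption: Groovy 3+; older versions use groovy.util.XmlSlurper

// Inline stand-in for data/plan.xml (made-up sample data)
def xml  = '<plan><week capacity="8"><task done="2" total="2" title="read XML chapter"/></week></plan>'
def path = new XmlSlurper().parseText(xml)

int runs = 0
def copier = new StreamingMarkupBuilder().bind {
    runs++               // counts each execution of the markup logic
    mkp.yield(path)      // hand the building over to the Buildable path
}
int runsBeforeRequest = runs     // binding alone should not run the closure

// Requesting the text is what triggers the generation
def viaToString = copier.toString()
int runsAfterToString = runs

// writeTo is the Writable method that a sink ends up calling
def writer = new StringWriter()
copier.writeTo(writer)
def viaWriter = writer.toString()
```

Every request for the markup runs the closure again, so nothing is cached between the toString call and the writeTo call.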

Figure 12.4 summarizes this streaming behavior.

Figure 12.4. UML sequence diagram for streamed building

Note how in figure 12.4, the processing doesn’t start before the values are requested. Only after the GString’s toString method is called does the copier start running and is the path iterated upon. Until then, the path isn’t touched! No memory representation has been created for the purpose of markup or iteration. This is a simplification of what is going on. XmlSlurper does have memory requirements. It stores the SAX event information you saw in section 12.1.3 but doesn’t process or store it in the processing-friendly Node objects.

Calling startsWith is like opening the outlet tap to draw the markup from the copier, which in turn draws its source information from the path inlet. Any code before that point is only the plumbing.

As a variant of listing 12.8, you can also directly write the markup onto the console. Use the following:

System.out << copier

Remember that System.out is an OutputStream that understands the leftshift operator with a Writable argument.

For this simple example, we could have used the SAX or StAX approaches you saw earlier. They would be even more streamlined solutions. Not only would they not need to process and store the tree-like data structures that XmlParser creates for you, but they also wouldn’t need to store the SAX event information. The same isn’t true for the more complicated scenarios that follow. As is common in many XML processing scenarios, the remaining examples have processing requirements that span multiple elements. Such scenarios benefit greatly from the ability to use GPath-style expressions.

Heating up to HTML

Until now, we copied only the “cold” input. It’s time to light our heater. The goal is to produce the same GUI as in figure 12.3.

We start with the basis of listing 12.8 but enhance the markup closure that gets bound to the builder. In listing 12.9, building looks almost the same as in the “boiling” example of listing 12.7; only the evaluation of the week and task status needs to be adapted. We do not calculate the status in advance and store it for later reference, but do the classification on-the-fly when the builder lazily requests it.

Example 12.9. Streamed heating from XML to HTML

The cool thing here is that at first glance it looks similar to listing 12.7, but it works very differently:

  • All evaluation is done lazily.

  • Memory consumption for GPath operations is minimized.

  • No in-memory assembly of HTML representation is built before outputting.

This allows us to produce lots of output, because it is not assembled in memory but directly streamed to the output as the building logic demands. However, because of the storage of SAX event information on the input, this approach will not allow input documents as large as would be possible with SAX or StAX.
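As a rough illustration of classifying on-the-fly, the following sketch builds a small HTML fragment with StreamingMarkupBuilder and computes each task's status only when the builder asks for it. It is not a reproduction of listing 12.9: the sample data, the statusOf closure, and the class names scheduled and inprogress are assumptions of ours (only finished appears in the stylesheet shown earlier), and the import assumes Groovy 3 or later.

```groovy
import groovy.xml.StreamingMarkupBuilder
import groovy.xml.XmlSlurper   // assumption: Groovy 3+; older versions use groovy.util.XmlSlurper

// Made-up sample data in the shape of the plan document
def plan = new XmlSlurper().parseText('''<plan>
  <week capacity="8">
    <task done="2" total="2" title="read XML chapter"/>
    <task done="1" total="3" title="try some reporting"/>
    <task done="0" total="2" title="use in current project"/>
  </week>
</plan>''')

// Hypothetical classification, evaluated per task at building time
def statusOf = { task ->
    int done  = task.@done.toInteger()
    int total = task.@total.toInteger()
    done == 0 ? 'scheduled' : (done < total ? 'inprogress' : 'finished')
}

def page = new StreamingMarkupBuilder().bind {
    html {
        body {
            plan.week.eachWithIndex { week, i ->
                h1("Week No. $i")
                dl {
                    week.task.each { task ->
                        // status is computed here, not stored on the node beforehand
                        dt('class': statusOf(task), task.@title.text())
                    }
                }
            }
        }
    }
}

// Nothing has been built so far; asking for the text starts the generation
def report = page.toString()
```

In a real report you would write page to a stream or writer instead of calling toString, so that the HTML never has to exist in memory as one string.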

Figure 12.5 sketches the differences between both processing approaches with respect to processing requirements and memory usage. The process goes from left to right either in the top row (for “boiling”) or in the bottom row (for “streaming”). Either process encompasses parsing, evaluating, building, and serializing to HTML, where evaluating and building are not necessarily in strict sequence. This is also where the differences are: working on intermediate data structures (trees of lists and nodes) or on lightweight objects that encapsulate logic (iterators and closures).

Figure 12.5. Memory usage characteristics for the “boiling” vs. “streaming” strategies

That’s it for the basics of processing XML with the structures provided by the Groovy XML parsers.

In section 12.1.1, you saw that classic Java DOM parsers return objects of type org.w3c.dom.Node, which differs from what the Groovy parsers return. The Java way of processing such nodes is with the help of XPath. The next section shows how Java XPath and Groovy XML processing can be used in combination.

Combining with XPath

XPath is for XML what SQL select statements are for relational databases or what regular expressions are for plain text. It’s a means to select parts of the whole document and to do so in a descriptive manner.

Understanding XPath

An XPath is an expression that appears in Java or Groovy as a string (exactly like regex patterns or SQL statements do). A full introduction to XPath is beyond the scope of this book, but here is a short introduction from a Groovy programmer’s point of view.[7]

Just like a GPath, an XPath selects nodes. Where GPath uses dots, XPath uses slashes. For example

/plan/week/task

selects all task nodes of all weeks below plan. The leading slash indicates that the selection starts at the root element. In this expression, plan, week, and task are each called a node test. Each node test may be preceded with an axis specifier from table 12.7 and a double colon.

Table 12.7. XPath axis specifiers

Axis                Selects nodes                                              Shortcut
------------------  ---------------------------------------------------------  ------------
child               Directly below                                             nothing or *
parent              Directly above                                             ..
self                The node itself (use for further references)               .
ancestor            All above
ancestor-or-self    All above including self
descendant          All below
descendant-or-self  All below including self                                   //
following           All after the node in document order, except descendants
following-sibling   All with the same parent trailing in the XML document
preceding           All before the node in document order, except ancestors
preceding-sibling   All with the same parent preceding in the XML document
attribute           The attribute node                                         @
namespace           The namespace node

With these specifiers, you can select all task elements via

/descendant-or-self::task

With the shortcut syntax, you can select all total attribute nodes of all tasks via

//task/@total
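If you want to try these expressions right away, the following sketch evaluates them with the javax.xml.xpath API that ships with the JDK. The sample document is made up in the shape of the plan data, and the select helper is our own convenience closure.

```groovy
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.XPathConstants
import javax.xml.xpath.XPathFactory

// Made-up sample data in the shape of the plan document
def xml = '''<plan>
  <week capacity="8">
    <task done="2" total="2" title="read XML chapter"/>
    <task done="1" total="3" title="try some reporting"/>
  </week>
  <week capacity="10">
    <task done="0" total="2" title="use in current project"/>
  </week>
</plan>'''

def builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
def doc     = builder.parse(new ByteArrayInputStream(xml.getBytes('UTF-8')))
def xpath   = XPathFactory.newInstance().newXPath()

// Helper (ours): evaluate an expression against the document as a node-set
def select = { String expr -> xpath.evaluate(expr, doc, XPathConstants.NODESET) }

def byFullPath  = select('/plan/week/task')            // explicit steps from the root
def byAxis      = select('/descendant-or-self::task')  // axis specifier plus node test
def totalAttrs  = select('//task/@total')              // shortcut syntax, attribute axis
def totalValues = (0..<totalAttrs.length).collect { totalAttrs.item(it).nodeValue }.sort()
```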

A node test can have a trailing predicate in square brackets to constrain the result. A predicate is an expression made up from path expressions, functions, and operators for the datatypes node-set, string, number, and boolean. Table 12.8 lists what’s possible.[8]

Table 12.8. XPath predicate expression cheat sheet

Category              Appearance                              Note
--------------------  --------------------------------------  ----------------------------------------------
Path operators        /, //, @, [], *, .., .                  As above
Union operator        |                                       Union of two node-sets
Boolean operators     and, or, not()                          not() is a function
Arithmetic operators  +, -, *, div, mod
Comparison operators  =, !=, <, >, <=, >=
String functions      concat(), substring(), contains(),      See the docs for exact meanings and parameters
                      substring-before(), substring-after(),
                      translate(), normalize-space(),
                      string-length()
Number functions      sum(), round(), floor(), ceiling()
Node functions        name(), local-name(), namespace-uri()
Context functions     position(), last()                      [n] is short for [position()=n]
Conversion functions  string(), number(), boolean()

Table 12.9 shows some examples.

Table 12.9. XPath examples

XPath                           Meaning                        Note
------------------------------  -----------------------------  -----------------------------
/plan/week[1]                   First[a] week node             Indexing starts at one
//task[@done<@total]            All unfinished tasks           Auto-conversion to a number
//task[@done<@total][@done>0]   All tasks in progress          Implicit and between brackets
sum(//week[1]/task/@total)      Total hours in the first week  Returns a number

[a] More specifically: the week node at position 1 below plan.
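The four expressions can be checked against a small document. This sketch again uses the JDK's javax.xml.xpath API; the sample data and the helper closures nodes and number are assumptions of ours.

```groovy
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.XPathConstants
import javax.xml.xpath.XPathFactory

// Made-up sample data in the shape of the plan document
def xml = '''<plan>
  <week capacity="8">
    <task done="2" total="2" title="read XML chapter"/>
    <task done="1" total="3" title="try some reporting"/>
  </week>
  <week capacity="10">
    <task done="0" total="2" title="use in current project"/>
  </week>
</plan>'''

def builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
def doc     = builder.parse(new ByteArrayInputStream(xml.getBytes('UTF-8')))
def xpath   = XPathFactory.newInstance().newXPath()

// Helpers (ours): evaluate as a node-set or as a number
def nodes  = { String expr -> xpath.evaluate(expr, doc, XPathConstants.NODESET) }
def number = { String expr -> xpath.evaluate(expr, doc, XPathConstants.NUMBER) }

def firstWeek  = nodes('/plan/week[1]')                  // indexing starts at one
def unfinished = nodes('//task[@done<@total]')           // attribute values compared as numbers
def inProgress = nodes('//task[@done<@total][@done>0]')  // both predicates must hold
def firstTotal = number('sum(//week[1]/task/@total)')    // a number, not a node-set
```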

The next obvious question is how to use such XPath expressions in Groovy code.

Using the XPath API

Groovy comes with all the support you need for using XPath expressions in your code. This is because of the xml-apis*.jar and xerces*.jar files in your GROOVY_HOME/lib dir. In case you are running Groovy in an embedded scenario, make sure these jars are on your classpath.

We will use XPath through the convenience methods in org.apache.xpath.XPathAPI. This class provides a lot of static helper methods that are easy to use even though the implementation is not always efficient.[9] We will use

Node     selectSingleNode(Node contextNode, String xpath)
NodeList selectNodeList  (Node contextNode, String xpath)
XObject  eval            (Node contextNode, String xpath)

where XObject wraps the XPath datatype that eval returns. For converting it into a Groovy datatype, we can use the methods num, bool, str, and nodelist.

In practice, we may want to do something with all weeks. We select the appropriate list of nodes via XPathAPI.selectNodeList(plan,'week'). Because this returns a NodeList, we can use the object iteration methods on it to get hold of each week:

XPathAPI.selectNodeList(plan, 'week').eachWithIndex{ week, i ->
    // do something with week
}

For each week, we want to print the sum of the total and done attributes with the help of XPath. Each week node becomes the new context node for the XPath evaluation:

XPathAPI.selectNodeList(plan, 'week').eachWithIndex{ week, i ->
    println "\nWeek No. $i\n"
    println XPathAPI.eval(week, 'sum(task/@total)').num()
    println XPathAPI.eval(week, 'sum(task/@done)').num()
}
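In case the Xalan classes are not on your classpath, the same idea of evaluating relative to a context node can be sketched with the JDK's javax.xml.xpath API. Note that this is a different API from XPathAPI: it returns plain Java values instead of XObject wrappers. The sample data and variable names are ours.

```groovy
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.XPathConstants
import javax.xml.xpath.XPathFactory

// Made-up sample data in the shape of the plan document
def xml = '''<plan>
  <week capacity="8">
    <task done="2" total="2" title="read XML chapter"/>
    <task done="1" total="3" title="try some reporting"/>
  </week>
  <week capacity="10">
    <task done="0" total="2" title="use in current project"/>
  </week>
</plan>'''

def builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
def doc     = builder.parse(new ByteArrayInputStream(xml.getBytes('UTF-8')))
def xpath   = XPathFactory.newInstance().newXPath()

def weeks = xpath.evaluate('/plan/week', doc, XPathConstants.NODESET)
def sums = (0..<weeks.length).collect { i ->
    def week = weeks.item(i)   // each week becomes the context node
    // relative paths such as task/@total are resolved against that week
    [total: xpath.evaluate('sum(task/@total)', week, XPathConstants.NUMBER),
     done : xpath.evaluate('sum(task/@done)',  week, XPathConstants.NUMBER)]
}

sums.eachWithIndex { sum, i ->
    println "Week No. $i: total ${sum.total}, done ${sum.done}"
}
```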

Listing 12.10 puts all this together with a little reporting functionality that produces a text report for each week, stating the capacity, the total hours planned, and the progress in hours done.

Example 12.10. XPath to text reporting

XPath is used in two ways here: the querying capability is used to select all the week elements, and then the attributes total and done are extracted with the eval method. We mix and match ways of accessing attributes, using DOMCategory to access the capacity attribute with the node.@attributeName syntax.

Such a text report is fine to start with, but it would certainly be nicer to show the progress in a chart. Figure 12.6 suggests an HTML solution. In a normal situation, we would use colors in such a report, but they would not be visible in the print of this book. Therefore, we use only a simple box representation of the numbers.

Figure 12.6. Screenshot of an HTML based reporting

Each box is made from the border of a styled div element. The style also determines the width of each box.

This kind of HTML production task calls for a templating approach, because there are multiple recurring patterns for HTML fragments: for the boxes, for each attribute row, and for each week. We will use template engines, GPath, and XPath in combination to make this happen.

Listing 12.11 presents the template that we are going to use. It is a simple template as introduced in section 9.4. It assumes the presence of two variables in the binding: a scale, which is needed to make visible box sizes from the attribute values, and weeks, which is a list of week maps. Each week map contains the keys 'capacity', 'total', and 'done' with integer values.

The template resides in a separate file. We like to name such files with the word template in the name and ending in the usual file extension for the format they produce. For example, the name GroovyPlans.template.html reveals the nature of the file, and we can still use it with an HTML editor.

Example 12.11. HTML reporting layout in data/GroovyPlans.template.html

<html>
  <head>
    <title>Current Groovy progress</title>
  </head>
  <body>
    <% weeks.eachWithIndex{ week, i -> %>
    <h1>Week No. $i</h1>
    <table cellspacing="5" >
        <tbody>
             <% ['capacity','total','done'].each{ attr -> %>
             <tr>
               <td>$attr</td>
               <td>${week[attr]} </td>
                 <td>
                     <div style=
"border: thin solid #000000; width: ${week[attr]*scale}px">
                        &nbsp;</div>
                </td>
            </tr>
            <% } // end of attribute %>
      </tbody>
    </table>
    <% } // end of week %>
  </body>
</html>

This template looks like a JSP file, but it isn’t. The contained logic is expressed in Groovy, not plain Java. Instead of being processed by a JSP engine, it will be evaluated by Groovy’s SimpleTemplateEngine as shown in listing 12.12. We use XPath expressions to prepare the values for binding. A special application of GPath comes into play when calculating the scaling factor.

Scaling is required so that the longest capacity bar is of length 200, so we have to find the maximum capacity for the calculation. Because we have already put these values in the binding, we can use a GPath to get a list of those and play our GDK tricks with it (calling max).

Example 12.12. Using XPath, GPath, and templating in combination for HTML reporting

The code did not change dramatically between the text reporting in listing 12.10 and the HTML reporting in listing 12.12. However, listing 12.12 provides a more general solution, because we can also get a text report from it solely by changing the template.
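To illustrate the point about swapping templates, here is a minimal sketch that feeds a plain-text template to SimpleTemplateEngine. The week maps are made-up values of the kind the binding holds, and the text layout is our own invention. The scale is calculated with a GPath over the list of maps, as described above.

```groovy
import groovy.text.SimpleTemplateEngine

// Made-up week maps, as the XPath preparation step might produce them
def weeks = [
    [capacity: 8,  total: 5, done: 3],
    [capacity: 10, total: 2, done: 0]
]

// GPath over a list of maps: weeks.capacity yields [8, 10]
def scale = 200 / weeks.capacity.max()

// Plain-text layout (ours) that expects the same binding as the HTML template
def textTemplate = '''<% weeks.eachWithIndex { week, i -> %>Week No. $i
<% ['capacity', 'total', 'done'].each { attr -> %>  $attr: ${week[attr]} (bar ${week[attr] * scale}px)
<% } %><% } %>'''

def report = new SimpleTemplateEngine().
        createTemplate(textTemplate).
        make(weeks: weeks, scale: scale).
        toString()

println report
```

Only the template text changes; the binding with weeks and scale stays the same for both kinds of report.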

The kind of transformation from XML to HTML that we achieve with listing 12.12 is classically addressed with Extensible Stylesheet Language Transformations (XSLT), which is a powerful technology. It uses stylesheets in XML format to describe a transformation mapping, also using XPath and templates. Its logical means are equivalent to those of a functional programming language.

Although XSLT is suitable for mapping tree structures, we often find it easier to use the Groovy approach when the logic is the least bit complex. XPath, templates, builders, and the Groovy language make a unique combination that allows for elegant and concise solutions. There may be people who are able to look at significant amounts of XSLT for more than a few minutes at a time without risking their mental stability, but they are few and far between. Using the technologies you’ve encountered, you can play to your strengths of understanding Groovy instead of using a different language with a fundamentally different paradigm.

Leveraging additional Java XML processing technologies

Before wrapping up our introduction of processing XML with Groovy, we should mention that although we think that you will find Groovy’s built-in XML features are suitable for many of your processing needs, you are not locked into using just those APIs. Because of Groovy’s Java heritage, many libraries and technologies are available for you to consider. We have already mentioned StAX and Jaxen. Here are a few more of our favorites:[10]

  • Although XmlParser, XmlSlurper, and of course the Java DOM and SAX should meet most of your needs, you can always consider JDOM, dom4j, or XOM.

  • If you need to compare two XML fragments for differences, consider XMLUnit.

  • If you wish to process XML using XQuery, consider Saxon.

  • If you need to persist your XML, consider JAXB or XmlBeans.

  • If you need to do high-performance streaming, consider Nux.

Our introduction to Groovy XML could finish at this point, because you have seen all the basics of XML manipulation. You should now be able to write Groovy programs that read, process, and write XML in a basic way. You will need more detailed documentation when the need arises to deal with more advanced issues such as namespaces, resolving entities, and handling DTDs in a customized way.

The final section of this chapter deals not with the details of XML but with one of its most important modern applications: exchanging data between systems, and talking to web services in particular.

Distributed processing with XML

XML describes data in a system-independent way. This makes it an obvious candidate for exchanging data across a network. Interconnected systems can be heterogeneous. They may be written in different languages, run on different platforms (think .NET vs. Java), use different operating systems, and run on different hardware architectures. But no matter how different these systems are, they can exchange data through XML, so long as both sides have some idea of how to interpret the XML they are given.

At a simple level, sharing data happens every time you surf the Web. With the help of your browser, you request a URL. The server responds with an HTML document that your browser knows how to display. The server and the browser are interconnected through the Hypertext Transfer Protocol (HTTP) that implements the request-response model, and they use HTML as the data-exchange format.

Now imagine a program that surfs the Web on your behalf. Such a program could visit a list of URLs to check for updates, browse a list of news providers for new information about your favorite topics (we suggest “Groovy”), access a stock ticker to see whether your shares have exceeded the target price, and check the local weather service to warn you about upcoming thunderstorms.

Such a program would have significant difficulties to overcome if it had to find the requested information in the HTML of each web site. The HTML describes not only what the data is, but also how it should broadly be presented. A change to the presentation aspect of the HTML could easily break the program that was trying to understand the data. Instead of dealing with the two aspects together, it would be more reliable if there were an XML description of the pure content. This is what web services are about.

A full description of all web service formats and protocols is beyond the scope of this book, but we will show how you can use some of them with Groovy. We cover reading XML resources via RSS and ATOM, then using REST and Groovy’s special XML-RPC support on the client and server side, and finally requesting SOAP services from Groovy as well as writing a simple web service using Groovy.

In case REST and SOAP make it sound like we’re talking about having a bath instead of accessing web services, you’ll be pleased to hear we’re starting with a brief description of some of these protocols and conventions.

An overview of web services

Web service solutions cover a spectrum of approaches from the simple to what some regard as extremely complex. Perhaps the simplest approach is to use the stateless HTTP protocol to request a resource via a URL. This is the basis of the Representational State Transfer (REST) architecture. The term REST has also been used more widely as a synonym for any mechanism for exposing content on the Web via simple XML.

The REST architecture is popular for making content of weblogs available. Two of the most commonly used formats in this area are Really Simple Syndication[11] (RSS) and ATOM (RFC-4287). The next logical extension from using a URL to request a resource is to use simple XML embodied within a normal HTTP POST request. This also can be regarded as a REST solution. We will examine an XML API of this nature as part of our REST tour.

When the focus is not on the remote resource but on triggering an operation on the remote system, the XML Remote Procedure Call (XML-RPC) can be used. XML-RPC uses HTTP but adds context, which makes it a stateful protocol (as opposed to REST).

The SOAP[12] protocol extends the concept of XML-RPC to support not only remote operations but even remote objects. Web service enterprise features that build upon SOAP provide other functionality such as security, transactions, and reliable messaging, to name a few of the many advanced features available.

Now that you have your bearings, let’s look at how Groovy can access two of the most popular web service formats in use today.

Reading RSS and ATOM

Let’s start our day by reading the news. The BBC broadcasts its latest news on an RSS channel. Because we are busy programmers, we are interested only in the top three headlines. A little Groovy program fetches them and prints them to the console. What we would like to see is the headline, a short description, and a URL pointing to the full article in case a headline catches our interest.

Here is some sample output:

The top three news items today:
Three Britons kidnapped in Gaza

http://news.bbc.co.uk/go/rss/-/1/hi/world/middle_east/4564586.stm
Three British citizens have been kidnapped by unidentified gunmen in southern
   Gaza, police say.
----
Geldof defends Tory adviser role
http://news.bbc.co.uk/go/rss/-/1/hi/uk_politics/4564130.stm
Bob Geldof promises to stay politically "non-partisan" after agreeing to
  advise the Tories on global poverty.
----
Glitter 'pays money to accusers'
http://news.bbc.co.uk/go/rss/-/1/hi/world/asia-pacific/4563542.stm
Former singer Gary Glitter paid his alleged victims' families "for co-
   operation", his Vietnamese lawyer says.
----

Listing 12.13 implements this newsreader. It requests the web resource that contains the news as XML. It finds the resource by its URL. Passing the URL to the parse method implicitly fetches it from the Web. The remainder of the code can directly work on the node tree using GPath expressions.

Example 12.13. A simple RSS newsreader

def base = 'http://news.bbc.co.uk/rss/newsonline_uk_edition/'
def url  = base +'front_page/rss091.xml'

println 'The top three news items today:'
def items = new XmlParser().parse(url).channel[0].item
for (item in items[0..2]) {
    println item.title.text()
    println item.link.text()
    println item.description.text()
    println '----'
}

Of course, for writing such code, we need to know what elements and attributes are available in the RSS format. In listing 12.13, we assumed that at least the following structure is available:

<rss ...>
  <channel>
    ...
    <item>
      <title>...      </title>
      <description>...</description>
      <link>...       </link>
      ...

This is only a small subset of the full information. You can find a full description of the RSS and ATOM formats and their various versions in RSS and ATOM in Action.[13]
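Because the feed content changes by the hour, it can help to exercise the GPath expressions against a fixed snippet first. The following sketch parses an inline document with the structure shown above; the feed content is invented, and the import assumes Groovy 3 or later.

```groovy
import groovy.xml.XmlParser   // assumption: Groovy 3+; older versions use groovy.util.XmlParser

// Invented feed content with the element structure assumed by listing 12.13
def rss = '''<rss version="0.91">
  <channel>
    <title>Sample channel</title>
    <item>
      <title>First headline</title>
      <description>First description</description>
      <link>http://example.org/first</link>
    </item>
    <item>
      <title>Second headline</title>
      <description>Second description</description>
      <link>http://example.org/second</link>
    </item>
  </channel>
</rss>'''

def items = new XmlParser().parseText(rss).channel[0].item

// take(3) returns at most three items, even if the feed is shorter
def top = items.take(3).collect { item ->
    [title      : item.title.text(),
     link       : item.link.text(),
     description: item.description.text()]
}

top.each { println "${it.title}\n${it.link}\n${it.description}\n----" }
```

Using take(3) instead of the range items[0..2] avoids an exception when a feed carries fewer than three items.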

Reading an ATOM feed is equally easy, as shown in listing 12.14. It reads the weblog of David M. Johnson, one of the fathers of the weblog movement. At the time of writing this chapter, it prints

Sun portal 7 to include JSPWiki, hey what about LGPL?
ApacheCon Tuesday
ApacheCon Tuesday: Tim Bray's keynote
...

One thing that’s new in listing 12.14 is the use of XML namespaces. The ATOM format makes use of namespaces like so:

<feed xmlns="http://www.w3.org/2005/Atom">
    ...
    <entry>
        <title>Sun portal ...</title>
        ...

In order to traverse nodes that are bound to namespaces with GPath expressions, qualified names (QName objects) are used. A QName object can be retrieved from a Namespace object by requesting the property of the corresponding element name.

Example 12.14. Reading an ATOM feed

import groovy.xml.Namespace

def url = 'http://rollerweblogger.org/atom/roller?catname=/Java'

def atom   = new Namespace('http://www.w3.org/2005/Atom')
def titles = new XmlParser().parse(url)[atom.entry][atom.title]

println titles*.text().join("\n")
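The namespace handling can be tried offline as well. This sketch applies the same Namespace and QName lookup to an inline feed; the feed content is invented, and the import of XmlParser assumes Groovy 3 or later.

```groovy
import groovy.xml.Namespace
import groovy.xml.XmlParser   // assumption: Groovy 3+; older versions use groovy.util.XmlParser

// Invented feed content bound to the ATOM namespace
def feed = '''<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Sample feed</title>
  <entry><title>First entry</title></entry>
  <entry><title>Second entry</title></entry>
</feed>'''

def atom = new Namespace('http://www.w3.org/2005/Atom')
def root = new XmlParser().parseText(feed)

// atom.entry and atom.title are QName objects requested as properties
def titles = root[atom.entry][atom.title]
def titleTexts = titles*.text()

println titleTexts.join("\n")
```

The title of the feed itself is not part of the result, because the path goes through entry first.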

That was all fairly easy, right? The next topic, REST, will be more elaborate but covers a wider area of applicability, because it is a more general approach.

Using a REST-based API

Although most web services are bound to a standard, REST is an open concept rather than a standard. The common denominator of REST services is that

  • XML is used for exchanging data between client and server.

  • Communication is done on a stateless request/response model over HTTP(S).

  • Resources or services are addressed by a URL.

No binding standard describes the structure of the XML that is sent around. You need to look into the documentation of each REST service to find out what information is requested and provided.

For an example, we will look into the REST services of the BackPack web application. BackPack is an online authoring system based on the Wiki[14] concept: It publishes web pages that the author can edit through the browser. You can find it at http://www.backpackit.com. If you want to run the examples from this section, you need to create a free account. You will receive a user-id and a 40-character token for identification. In the following examples, we will use the user-id user and *** as the token. When trying the examples, you need to replace these placeholders with your personal values.

Occasionally, it’s helpful to update the published information programmatically through the REST API. Suppose you have published information about your favorite books’ selling rank, your corporate web site’s Alexa[15] rating, or your current project’s tracking status. With the REST API, you can update such information automatically.

BackPack describes its REST API under http://www.backpackit.com/api. You will find 32 operations together with the XML structure they expect in the request and the XML they respond with.

For example, the create new page operation is available under the URL

http://user.backpackit.com/ws/pages/new

It expects this XML in the request:

<request>
  <token>***</token>
  <page>
    <title>new page title</title>
    <description>initial page body</description>
  </page>
</request>

If the operation is successful, it returns

<response success='true'>
  <page title='new page title' id='1234' />
</response>

Now, how do you get this running from Groovy? You need some way to connect to the URL and send the request XML. You can do this with a URLConnection and the POST method. The API additionally demands that you set the request header 'X-POST_DATA_FORMAT' to 'xml'. It would be nice to put all the infrastructure code in one place and provide your own little Groovy-friendly API.
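The two ingredients, building the request XML and posting it, might look roughly like this. It is a sketch under assumptions, not the BackPack class itself: the closure names requestXml and post and their parameter names are ours, and post is only defined here, not called, because calling it needs a live account.

```groovy
import groovy.xml.StreamingMarkupBuilder

// Builds the request document for the create new page operation
def requestXml = { String tokenValue, String pageTitle, String pageBody ->
    new StreamingMarkupBuilder().bind {
        request {
            token(tokenValue)
            page {
                title(pageTitle)
                description(pageBody)
            }
        }
    }.toString()
}

// Sends XML with HTTP POST and returns the response text (hypothetical helper)
def post = { String address, String xml ->
    def conn = new URL(address).openConnection()
    conn.requestMethod = 'POST'
    conn.doOutput = true
    conn.setRequestProperty('X-POST_DATA_FORMAT', 'xml')   // header demanded by the API
    conn.outputStream.withWriter('UTF-8') { it << xml }
    conn.inputStream.getText('UTF-8')
}

def xml = requestXml('***', 'new page title', 'initial page body')
println xml
// With real credentials, a call could look like:
// def response = new XmlSlurper().parseText(post('http://user.backpackit.com/ws/pages/new', xml))
```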

To use this API to create a new page, update the content, and finally delete it, the code should be as simple as in listing 12.15.

Example 12.15. BackPack page manipulation through the Groovy REST API

def bp = new BackPack(account:"user", key:"***")

def response = bp.newPage("Page Title", "Page Description")
def pageId = response.page.@id
println  "created page $pageId"

response   = bp.updateBody(pageId, "new Body")
println  "updating body ok: ${response.@success}"

response   = bp.destroyPage(pageId)
println  "destroying page ok: ${response.@success}"

When every operation succeeds (and when you have the appropriate API in place), listing 12.15 prints

created page 383655
updating body ok: true
destroying page ok: true

The infrastructure class BackPack that implements the Groovy API to the BackPack REST API was written by John Wilson, the grandmaster of Groovy XML, and the full version is available at http://www.wilson.co.uk/Groovy/BackpackAPI.txt.

Listing 12.16 shows a stripped-down version of the original, not covering all operations and without proper error handling. This implementation makes the code in listing 12.15 run, shows the infrastructure code needed for using the HTTP POST method, uses the typical Groovy trick of overriding invokeMethod to make a nice API, and is another compelling example of using builders and parsers with streams.

Example 12.16. BackPack infrastructure class that implements the Groovy REST API

A call to bp.newPage will be handled by invokeMethod, which looks up the name newPage in the methods map. The methods map stores a closure under that name, which invokeMethod immediately calls, relaying all parameters (title and description) to it.

The closure calls the makeRemoteCall method, providing the distinctive part of the URL that locates the service and a markup closure that is used to build the request XML.
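The dispatching trick on its own can be sketched without any networking. The class below is a simplified stand-in of ours, not the BackPack class: the name PageService is hypothetical, and makeRemoteCall merely returns what it would send, taking a map where the real class takes a markup closure.

```groovy
class PageService {
    // operation name -> closure that implements the operation
    def methods = [
        newPage: { String title, String body ->
            makeRemoteCall('pages/new', [title: title, description: body])
        }
    ]

    // Called by Groovy for methods that the class does not declare
    def invokeMethod(String name, Object args) {
        Closure operation = methods[name]
        if (operation == null) {
            throw new MissingMethodException(name, PageService, args as Object[])
        }
        operation.call(*args)   // relay all parameters to the stored closure
    }

    // Stand-in for the HTTP POST: returns the data instead of sending it
    def makeRemoteCall(String path, Map payload) {
        [path: path, payload: payload]
    }
}

def service = new PageService()
def sent = service.newPage('Page Title', 'Page Description')

// Names that are not in the methods map still fail in the usual way
def unknownFails = false
try {
    service.noSuchOperation()
} catch (MissingMethodException e) {
    unknownFails = true
}
```

Adding an operation then means adding one entry to the map, which keeps the growing list of remote operations in a single place.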

When using a REST API, it is often beneficial to create an infrastructure class like BackPack in listing 12.16. It is hardly possible to provide a more general solution that can be used with every REST service, because there is no standard that you can build upon.

You will see how useful such a standard is when we look into XML-RPC in the next section.

Using XML-RPC

The XML-RPC specification is almost as old as XML. It is extremely simple and concise. See http://www.xmlrpc.com for all details.

Thanks to this specification, Groovy can provide a general implementation for many of the infrastructure details that you have to write for REST. This general implementation comes with the Groovy distribution.[16] There is nothing extra you have to do or install to make this easy distributed processing environment work.

Perhaps the best way to convince you of its merits is by example. Suppose you have a simple XML-RPC server running on your local machine on port 8080 that exposes an echo operation that returns whatever it receives. Using this service from a Groovy client is as simple as

import groovy.net.xmlrpc.XMLRPCServerProxy as Proxy

def remote = new Proxy('http://localhost:8080/')

assert 'Hello world!' == remote.echo('Hello world!')
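The proxy hides the request document that the XML-RPC specification prescribes. For illustration only, the following sketch builds by hand the kind of methodCall document that corresponds to remote.echo('Hello world!'). How the proxy produces it internally is not shown in this chapter; we use StreamingMarkupBuilder merely as a convenient way to write the XML down.

```groovy
import groovy.xml.StreamingMarkupBuilder

// Element names follow the XML-RPC specification; the variable name payload is ours
def payload = new StreamingMarkupBuilder().bind {
    methodCall {
        methodName('echo')
        params {
            param {
                value { string('Hello world!') }
            }
        }
    }
}.toString()

println payload
```

Such a document travels in the body of an HTTP POST request, and the server answers with a methodResponse document of similar structure.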

Installing a server that implements the echo operation is equally easy. Create a server instance, and assign a closure to its echo property:

import groovy.net.xmlrpc.XMLRPCServer as Server

def server = new Server()

server.echo = { return it }

Finally, the server must be started on a ServerSocket before the client can call it, and it must be stopped afterward. Listing 12.17 installs the echo server, starts it, requests the echo operation, and stops it at the end.

Example 12.17. Self-contained XML-RPC server and client for the echo operation

[Listing 12.17: code not available in this text.]

Having client and server together as shown in listing 12.17 is useful for testing purposes, but in production these two parts usually run on different systems.

XML-RPC also defines fault handling, which in Groovy XML-RPC is available through the XMLRPCCallFailureException with the properties faultString and faultCode.
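On the wire, a fault is a response that carries a struct with the two members faultCode and faultString. The following sketch extracts both from a sample fault document using the standard Java XPath API. The sample values are made up, and how the Groovy module maps them onto the exception is not shown here; we only assume that it reads the same two members.

```groovy
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.XPathFactory

// A fault response laid out as in the XML-RPC specification;
// the code and message values are made up for this example.
def faultXml = '''<?xml version="1.0"?>
<methodResponse>
  <fault>
    <value>
      <struct>
        <member>
          <name>faultCode</name>
          <value><int>4</int></value>
        </member>
        <member>
          <name>faultString</name>
          <value><string>Too many parameters.</string></value>
        </member>
      </struct>
    </value>
  </fault>
</methodResponse>'''

def builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
def doc     = builder.parse(new ByteArrayInputStream(faultXml.getBytes('UTF-8')))
def xpath   = XPathFactory.newInstance().newXPath()

// Return the text of the struct member with the given name
def member = { String name ->
    def expr = "/methodResponse/fault/value/struct/member[name='" + name + "']/value"
    xpath.evaluate(expr, doc).trim()
}

def faultCode   = member('faultCode').toInteger()
def faultString = member('faultString')
println "fault $faultCode: $faultString"
```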

The areas of applicability for XML-RPC are so wide that any list we could come up with would be necessarily incomplete. It is used for reading and posting to blogs, connecting to instant messaging systems (over the Jabber protocol for systems such as GoogleTalk[17]), news feeds, search engines, continuous integration servers, bug-tracking systems, and so on.

It’s appealing because it is powerful and simple at the same time. Let’s find out, for example, some information about the projects managed at Codehaus.[18] Codehaus provides the JIRA[19] bug-tracking system for its hosted projects.

Printing all project names can be done easily with the following code:

import groovy.net.xmlrpc.XMLRPCServerProxy as Proxy

def remote = new Proxy('http://jira.codehaus.org/rpc/xmlrpc')

def loginToken = remote.jira1.login('user','***')
def projects   = remote.jira1.getProjects(loginToken)
projects.each { println it.name }

It’s conventional for operations exposed via XML-RPC to have a dot-notation like jira1.login. Groovy’s XML-RPC support can deal with that.

However, if you call a lot of methods, the repeated remote.jira1. prefix gets in the way of readability. It would be nicer to avoid that. Listing 12.18 has a solution. Calls to proxy methods can always optionally take a closure. Inside that closure, method names are resolved against the proxy. We extend this behavior with a specialized JiraProxy that adds the jira1. prefix to every method call.

To make things a bit more interesting this time, we print some information about the Groovy project in the Codehaus JIRA.

Example 12.18. Using the JIRA XML-RPC API on the Groovy project

import groovy.net.xmlrpc.XMLRPCServerProxy as Proxy

class JiraProxy extends Proxy {
    JiraProxy (url) { super ( url ) }
    Object invokeMethod(String methodname, args) {
        super.invokeMethod('jira1.'+methodname, args)
    }
}

def jira = new JiraProxy('http://jira.codehaus.org/rpc/xmlrpc')

jira.login('user','***') { loginToken ->
   def projects = getProjects(loginToken)
   def groovy   = projects.find { it.name == 'groovy' }
   println groovy.key
   println groovy.description
   println groovy.lead
}

This prints

GROOVY
Groovy JVM language.
guillaume

Note the simplicity of the code. Unlike with REST, you don’t need to work on XML nodes, either in the request or in the response. You can just use Groovy datatypes such as strings (user), lists (projects), and maps (groovy). Who can ask for more?
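The prefixing trick in JiraProxy does not depend on XML-RPC; it relies only on Groovy routing calls to undefined methods through invokeMethod. The following sketch shows that mechanism in isolation. RecordingProxy and PrefixingProxy are hypothetical stand-ins of ours that record operation names instead of contacting a server; they are not classes from the XML-RPC module.

```groovy
// Hypothetical stand-in for a remote proxy: it records the operation
// name instead of sending anything over the network.
class RecordingProxy {
    List calls = []
    Object invokeMethod(String name, Object args) {
        calls << name
        return 'result of ' + name
    }
}

// Same idea as JiraProxy: prepend a fixed prefix, then delegate upward.
class PrefixingProxy extends RecordingProxy {
    Object invokeMethod(String name, Object args) {
        super.invokeMethod('jira1.' + name, args)
    }
}

def proxy = new PrefixingProxy()
def token = proxy.login('user', '***')   // no login method exists on the class
proxy.getProjects(token)                 // so both calls reach invokeMethod

println proxy.calls
```

Because neither login nor getProjects is declared, both calls end up in invokeMethod, where the subclass gets the chance to rewrite the name before the parent handles it.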

There is a book’s worth more to say about XML-RPC and its Groovy module, especially about implementing the server side. But this book has only so many pages, so please refer to the online documentation for more details and usage scenarios.

You now have the basic information to start your work with XML-RPC. Try it! Of all the distributed processing approaches, this is the one that feels the most groovy to us.

We will close our tour through the various options for distributed processing with the all-embracing solution: SOAP.

Applying SOAP

SOAP is the successor of XML-RPC and follows the approach of providing a binding standard. This standard is maintained by the W3C; see http://www.w3.org/TR/soap/.

The SOAP standard extends the XML-RPC standard in multiple dimensions. One extension is datatypes. Where XML-RPC allows only a small fixed set of datatypes, SOAP provides means to define new service-specific datatypes. Other frameworks, including CORBA, DCOM, and Java RMI, provide functionality similar to that of SOAP, but SOAP messages are written entirely in XML and are therefore platform and language independent. The general approach of SOAP is to allow a web service to describe its public API: where it is located, what operations are available, and the request and response formats (called messages). A SOAP service makes this information available via the Web Services Description Language (WSDL).

SOAP has been widely adopted by the industry, and numerous free services are available, ranging from online shops through financial data, maps, music, payment systems, online auctions, order tracking, blogs, news, picture galleries, weather services, credit card validation—the list is endless.

Numerous programming languages and platforms provide excellent support for SOAP. Popular SOAP stack implementations on the Java platform include Jakarta Axis (http://ws.apache.org/axis/) and XFire (http://xfire.codehaus.org/). Built-in SOAP support for Groovy is still in its infancy, but it’s already in use for production projects. First, we will explore how you can use SOAP with pure Groovy in an effective yet concise manner.

Doing SOAP with plain Groovy

Our example uses a web service at http://www.webservicex.net, which provides a lot of interesting public web services. First, we fetch the service description for its currency converter like so:

import groovy.xml.Namespace

def url = 'http://www.webservicex.net/CurrencyConvertor.asmx?WSDL'

def wsdl = new Namespace('http://schemas.xmlsoap.org/wsdl/','wsdl')
def doc  = new XmlParser().parse(url)

println doc[wsdl.portType][wsdl.operation].'@name'

This prints the available operations:

["ConversionRate", "ConversionRate", "ConversionRate"]

The service exposes three operations named ConversionRate with different characteristics.[20] We are interested in one that takes FromCurrency and ToCurrency as input parameters and returns the current conversion rate. Currencies can be expressed using a format like 'USD' or 'EUR'.
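A repeated operation name usually means that the description declares the same operation under several portType elements. The following sketch groups operation names by portType for a heavily trimmed WSDL fragment. The fragment and its portType names are our assumption (one portType per access style), not a copy of the real service description, and the code uses the standard namespace-aware DOM API so that it runs without network access.

```groovy
import javax.xml.parsers.DocumentBuilderFactory

// Heavily trimmed, hypothetical WSDL fragment: one portType per access style.
def wsdlText = '''<wsdl:definitions xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/">
  <wsdl:portType name="CurrencyConvertorSoap">
    <wsdl:operation name="ConversionRate"/>
  </wsdl:portType>
  <wsdl:portType name="CurrencyConvertorHttpGet">
    <wsdl:operation name="ConversionRate"/>
  </wsdl:portType>
  <wsdl:portType name="CurrencyConvertorHttpPost">
    <wsdl:operation name="ConversionRate"/>
  </wsdl:portType>
</wsdl:definitions>'''

def wsdlNs  = 'http://schemas.xmlsoap.org/wsdl/'
def factory = DocumentBuilderFactory.newInstance()
factory.namespaceAware = true
def doc = factory.newDocumentBuilder()
                 .parse(new ByteArrayInputStream(wsdlText.getBytes('UTF-8')))

// Collect descendant elements with the given local name in the WSDL namespace
def elements = { node, String localName ->
    def list = node.getElementsByTagNameNS(wsdlNs, localName)
    (0..<list.length).collect { list.item(it) }
}

// Map each portType name to the names of its operations
def operations = elements(doc, 'portType').collectEntries { portType ->
    [portType.getAttribute('name'),
     elements(portType, 'operation').collect { it.getAttribute('name') }]
}
operations.each { name, ops -> println "$name: $ops" }
```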

SOAP uses something called an envelope format for the request. The details are beyond the scope of this chapter—see the specifications for details. Our envelope looks like this:

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <ConversionRate xmlns="http://www.webserviceX.NET/">
      <FromCurrency>${from}</FromCurrency>
      <ToCurrency>${to}</ToCurrency>
    </ConversionRate>
  </soap:Body>
</soap:Envelope>

As you see from the ${} notation, this envelope is a template that we can use with a Groovy template engine.
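Filling such a template takes only a few lines with SimpleTemplateEngine. The following is a minimal sketch, assuming the template is held in a string rather than read from a file; the variable names are ours, and the envelope is shortened to the parts that matter here.

```groovy
import groovy.text.SimpleTemplateEngine

// Single-quoted string: ${from} and ${to} stay literal placeholders
// until the template engine replaces them.
def templateText = '''<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <ConversionRate xmlns="http://www.webserviceX.NET/">
      <FromCurrency>${from}</FromCurrency>
      <ToCurrency>${to}</ToCurrency>
    </ConversionRate>
  </soap:Body>
</soap:Envelope>'''

def engine   = new SimpleTemplateEngine()
def template = engine.createTemplate(templateText)

// The binding map supplies one value per placeholder
def envelope = template.make(from: 'USD', to: 'EUR').toString()
println envelope
```

The compiled template can be reused with a different binding for every currency pair.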

Listing 12.19 reads this template, fills it with parameters for US dollar to euro conversion, and adds it to a POST request to the service URL. The request needs some additional request headers—for example, the SOAPAction to make the server understand it. We explicitly use UTF-8 character encoding to avoid any cross-platform encoding problems.
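The POST step can be tried without the remote service. The following sketch is not listing 12.19; it posts a placeholder body to a local stand-in server built on the JDK's com.sun.net.httpserver package, so that the request headers can be inspected. The postSoap closure, the stand-in server, and the SOAPAction value are assumptions of ours.

```groovy
import com.sun.net.httpserver.HttpHandler
import com.sun.net.httpserver.HttpServer

// Local stand-in for the remote service (our assumption, for offline use):
// it records the request and answers with a fixed body.
def received = [:]
def server = HttpServer.create(new InetSocketAddress('localhost', 0), 0)
server.createContext('/', { exchange ->
    received.action      = exchange.requestHeaders.getFirst('SOAPAction')
    received.contentType = exchange.requestHeaders.getFirst('Content-Type')
    received.body        = exchange.requestBody.getText('UTF-8')
    byte[] reply = '<reply/>'.getBytes('UTF-8')
    exchange.sendResponseHeaders(200, reply.length)
    exchange.responseBody.write(reply)
    exchange.close()
} as HttpHandler)
server.start()

// POST an envelope with the headers a SOAP 1.1 endpoint expects
def postSoap = { String url, String soapAction, String envelope ->
    def conn = new URL(url).openConnection()
    conn.requestMethod = 'POST'
    conn.doOutput = true
    conn.setRequestProperty('Content-Type', 'text/xml; charset=utf-8')
    conn.setRequestProperty('SOAPAction', soapAction)
    conn.outputStream.withWriter('UTF-8') { it << envelope }
    conn.inputStream.getText('UTF-8')
}

def response
try {
    def url = 'http://localhost:' + server.address.port + '/'
    // The SOAPAction value is assumed: service namespace plus operation name
    response = postSoap(url, 'http://www.webserviceX.NET/ConversionRate', '<request/>')
} finally {
    server.stop(0)
}
println "server saw SOAPAction ${received.action}"
```

Writing and reading with an explicit UTF-8 encoding on both sides keeps the exchange independent of the platform default.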

The service responds with a SOAP result envelope. We know it contains a node named ConversionRateResult belonging to the service’s namespace. We locate the first such node in the response and get the conversion rate as its text value.

Example 12.19. Using the ConversionRate SOAP service

[Listing 12.19: code not available in this text.]

At the time of writing, it prints

Current USD to EUR conversion rate: 0.8449

This is straightforward in terms of each individual step, but taken as a whole, the code is fairly cumbersome. One point to note about the implementation is hidden in locating the result in the response envelope. We use the serv namespace and ask it for its ConversionRateResult property, which returns a QName. We assign it to the result variable and make use of the fact that QName implements the equals method with strings such that we find the proper node.
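Locating the result can also be tried offline against a canned response. The following sketch swaps in the standard namespace-aware DOM API instead of XmlParser and QName, so it shows the same lookup by namespace and local name but not the QName comparison described above. The response layout is an assumption modeled on the request envelope, not captured service output.

```groovy
import javax.xml.parsers.DocumentBuilderFactory

// Canned response modeled on the request envelope; the element layout
// is an assumption, not output captured from the service.
def response = '''<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <ConversionRateResponse xmlns="http://www.webserviceX.NET/">
      <ConversionRateResult>0.8449</ConversionRateResult>
    </ConversionRateResponse>
  </soap:Body>
</soap:Envelope>'''

def factory = DocumentBuilderFactory.newInstance()
factory.namespaceAware = true   // needed for lookups by namespace URI
def doc = factory.newDocumentBuilder()
                 .parse(new ByteArrayInputStream(response.getBytes('UTF-8')))

// Find the first result node in the service namespace and read its text
def serviceNs = 'http://www.webserviceX.NET/'
def results   = doc.getElementsByTagNameNS(serviceNs, 'ConversionRateResult')
def rate      = results.item(0).textContent

println "Current USD to EUR conversion rate: $rate"
```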

SOAP is verbose compared to other approaches. It is verbose in the code it demands for execution and, more important, in its message format. It is not unusual for a SOAP message to contain 10 times more XML markup than payload.

However, the SOAP standard makes it possible to provide general tools for dealing with its complexity.

Simplifying SOAP access with the GroovySOAP module

One of these tools is the GroovySOAP module, which eases the process of using web services. Download the required jar files as outlined at http://groovy.codehaus.org/Groovy+SOAP, and drop them into your GROOVY_HOME/lib directory. As an example of what you get from GroovySOAP, listing 12.20 implements the SOAP client for the conversion rate service with a minimum of effort.

Example 12.20. Using the SoapClient from the GroovySOAP module

import groovy.net.soap.SoapClient

def url = 'http://www.webservicex.net/CurrencyConvertor.asmx?WSDL'
def remote = new SoapClient(url)

println 'USD to EUR rate: '+remote.ConversionRate('USD', 'EUR')

Now, that’s a lot groovier! Should your server be using a complex datatype in its response, GroovySOAP will unmarshall it and define a variable in your script. This can be demonstrated using the weather forecast located at webservicex.net. Using a place name located in the USA as an input, the web service replies with a one-week weather forecast in a complex document. Listing 12.21 nicely presents the data with the help of GroovySOAP.

Example 12.21. Using complex data types with the SoapClient

import groovy.net.soap.SoapClient

def url = 'http://www.webservicex.net/WeatherForecast.asmx?WSDL'
def proxy = new SoapClient(url)
def result = proxy.GetWeatherByPlaceName("Seattle")

println result.latitude
println result.details.weatherData[0].weatherImage

Here’s the output:

47.6114349
http://www.nws.noaa.gov/weather/images/fcicons/sct.jpg

Publishing a SOAP service with GroovySOAP

Suppose now that you want to develop your own server. GroovySOAP allows the construction of such a service from a simple Plain Old Groovy Object (POGO) representing your business logic. If you wanted to set up a small math server,[21] you could have a script that looks like listing 12.22.

Example 12.22. The Groovy SOAP service script MathService.groovy

  double add(double op1, double op2) {
    return (op1 + op2)
  }

  double square(double op1) {
    return (op1 * op1)
  }

Note that there is nothing about the script that suggests it has anything to do with a web service. Listing 12.23 exposes this POGO as a web service.

Example 12.23. Using the SoapServer from the GroovySOAP module

import groovy.net.soap.SoapServer

def server = new SoapServer("localhost", 6990)
server.setNode("MathService")
System.out.println("start Math Server")
server.start()

This little bit of magic is possible thanks to the delegation pattern and introspection, which enable GroovySOAP to generate the web service interface automatically by filtering out the methods inherited from the GroovyObject interface.
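The filtering idea can be sketched with plain reflection. The following code is not GroovySOAP's implementation; it is our own approximation, assuming the business logic lives in a class named MathService, and it lists the public methods that remain once everything contributed by GroovyObject and by the compiler is removed.

```groovy
import java.lang.reflect.Modifier

// Class form of the math service, assumed here for illustration
class MathService {
    double add(double op1, double op2) { op1 + op2 }
    double square(double op1) { op1 * op1 }
}

// Names of the methods every Groovy object gets from GroovyObject
def inherited = GroovyObject.methods.collect { it.name } as Set

// Keep public, hand-written methods only: drop GroovyObject methods,
// synthetic methods, and compiler-generated helpers containing '$'
def operations = MathService.declaredMethods.findAll { method ->
    Modifier.isPublic(method.modifiers) &&
        !method.synthetic &&
        !(method.name in inherited) &&
        !method.name.contains('$')
}.collect { it.name }.unique().sort()

println operations
```

What remains is exactly the business API, which is the information a tool needs to describe the service to its clients.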

It’s worth paying attention to this area of ongoing Groovy development. We anticipate that before long, new SOAP tools will arise and provide more functionality for using web services with Groovy.

Summary

XML is such a big topic that we cannot possibly touch all bases in an introductory book on Groovy. We have covered the most important aspects in enough detail to provide a good basis for experimentation and further reading. When pushing the limits with Groovy XML, you will probably encounter topics that are not covered in this chapter. Don’t hesitate to consult the online resources.

At this point, you have a solid basis for understanding the different ways of working with XML in Groovy.

Using the familiar Java DOM parsers in Groovy enables you to work on the standard org.w3c.dom.Node objects whenever the situation calls for it. Such nodes can be retrieved from the DOMBuilder, conveniently accessed with the help of DOMCategory, and investigated with XPath expressions. Groovy makes life with the DOM easier, but it can’t rectify some of the design decisions that give surprises or involve extra work for no benefit.

Groovy’s internal XmlParser and XmlSlurper provide access to XML documents in a Groovy-friendly way that supports GPath expressions for working on the document. XmlParser provides an in-memory representation for in-place manipulation of nodes, whereas XmlSlurper is able to work in a more stream-like fashion. For even further memory reductions, you can also use SAX and StAX.

Finally, it’s easy to send XML around the world to make networked computers work together, sharing information and computing power. XML-RPC and SOAP have support in the Groovy libraries, although that support is likely to change significantly over time. REST can’t benefit from such support as easily (not even in the dynamic world of Groovy) due to a lack of standardization, but you have seen how the use of builders can make the development of an API for a specific REST service straightforward.

Whatever your XML-based activity, Groovy is likely to have something that will ease your work. By now, that shouldn’t come as a surprise.



[1] We recommend XML Made Simple by Deane and Henderson (Made Simple, 2003) as an introductory text and XML 1.1 Bible by Elliotte Rusty Harold (Wiley, 2004) for more comprehensive coverage.

[2] The DocumentBuilderFactory can be augmented in several ways to deliver various DocumentBuilder implementations. See its API documentation for details.

[4] This is short for: a list of references to objects of type groovy.util.Node with name()=='week'.

[5] See http://www.xml.com/pub/a/2003/09/17/stax.html for a tutorial introduction.

[6] This is the main event-based style supported by .NET and will also be included with Java 6.

[7] For a full description of the standard, see http://www.w3.org/TR/xpath; and for a tutorial, see http://www.w3schools.com/xpath/.

[8] This covers only XPath 1.0 because XPath 2.0 is not yet finalized at the time of writing.

[9] When performance is crucial, consider using the Jaxen XPath library which is used by JDOM, dom4j, and XOM for their processing needs as well as being useful on its own.

[11] Also called Rich Site Summary (RSS 0.9x) or Resource Description Framework (RDF) Site Summary (RSS 1.0).

[12] SOAP used to stand for Simple Object Access Protocol, but this meaning has been dropped since version 1.2 because SOAP does more than access objects and the word simple was questionable from the start.

[13] Dave Johnson, RSS and ATOM in Action (Manning, 2006).

[14] Bo Leuf and Ward Cunningham, The Wiki Way: Quick Collaboration on the Web (Addison-Wesley Professional, 2001).

[15] www.alexa.com is a rating service for the popularity of web sites.

[16] It’s not in groovy-all-*.jar but in the GROOVY_HOME/lib directory.

[17] See Guillaume’s excellent article on how to use GoogleTalk through Groovy at http://glaforge.free.fr/weblog/index.php?itemid=142.

[18] www.codehaus.org is the open source platform that hosts popular open source projects such as Groovy and Maven.

[19] Find information about the JIRA XML-RPC methods at http://confluence.atlassian.com/display/JIRA/JIRA+XML-RPC+Overview.

[20] For advice on how to read a WSDL service description, refer to http://www.w3.org/TR/wsdl.

[21] Simple calculations and currency conversions have become the “hello world” of web service examples.
