Reading XML using XmlSlurper

The eXtensible Markup Language, or simply XML, is a widely used data format for exchanging information between computer systems. The first two recipes of this chapter show how to parse XML using Groovy. Two parsers are available in the groovy.util package: XmlParser and XmlSlurper. They expose similar APIs, but each one suits different use cases. In this recipe, we look at how to read XML with XmlSlurper and at its main peculiarities.

Getting ready

For the examples in the rest of this recipe, we will work with an XML document (shown in the following code) containing a list of works by William Shakespeare. The document is named shakespeare.xml:

<?xml version="1.0" ?>
<bib:bibliography xmlns:bib="http://bibliography.org" xmlns:lit="http://literature.org">
  <bib:author>William Shakespeare</bib:author>
  <lit:play>
    <lit:year>1589</lit:year>
    <lit:title>The Two Gentlemen of Verona.</lit:title>
  </lit:play>
  <lit:play>
    <lit:year>1594</lit:year>
    <lit:title>Love's Labour's Lost.</lit:title>
  </lit:play>
  <lit:play>
    <lit:year>1594</lit:year>
    <lit:title>Romeo and Juliet.</lit:title>
  </lit:play>
  <lit:play>
    <lit:year>1595</lit:year>
    <lit:title>A Midsummer-Night's Dream.</lit:title>
  </lit:play>
</bib:bibliography>

How to do it...

Let's go through the process of parsing the previously mentioned XML file:

  1. One way to read XML data using XmlSlurper is to create an instance of the class and pass a java.io.File object, which references the file we want to read, into the parse method:
    def xmlSource = new File('shakespeare.xml')
    def bibliography = new XmlSlurper().parse(xmlSource)
  2. The parse method returns an implementation of groovy.util.slurpersupport.GPathResult, which can be used to navigate the XML element tree. For example, the following code will print the text representation of the author element:
    println bibliography.author
  3. Deeper elements and element collections can also be referenced with the help of the "." operator. In addition, a set of finder and iterator methods is available to build complex search expressions:
    bibliography.play
            .findAll { it.year.toInteger() > 1592 }
            .each { println it.title }

    The expressions that are used to navigate (and eventually also modify) the XML tree are referred to as GPath expressions. More examples of those expressions can be found in the Searching in XML with GPath recipe.

  4. The output of the script should be as follows:
    William Shakespeare
    Love's Labour's Lost.
    Romeo and Juliet.
    A Midsummer-Night's Dream.
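
The steps above can also be tried without a file on disk. The following is a minimal sketch, not the recipe's own listing: it embeds the same document in a string and uses XmlSlurper's parseText method instead of parse. The variable names xml, author, and recentTitles are introduced here for illustration only, and the script assumes a Groovy version (2.x or 3.x) in which groovy.util.XmlSlurper is imported by default; in Groovy 4 the class is found in groovy.xml and needs an explicit import.

```groovy
// Sketch: same document as shakespeare.xml, held in a string (assumption:
// Groovy 2.x/3.x, where groovy.util.XmlSlurper needs no import).
def xml = '''<?xml version="1.0" ?>
<bib:bibliography xmlns:bib="http://bibliography.org" xmlns:lit="http://literature.org">
  <bib:author>William Shakespeare</bib:author>
  <lit:play>
    <lit:year>1589</lit:year>
    <lit:title>The Two Gentlemen of Verona.</lit:title>
  </lit:play>
  <lit:play>
    <lit:year>1594</lit:year>
    <lit:title>Love's Labour's Lost.</lit:title>
  </lit:play>
  <lit:play>
    <lit:year>1594</lit:year>
    <lit:title>Romeo and Juliet.</lit:title>
  </lit:play>
  <lit:play>
    <lit:year>1595</lit:year>
    <lit:title>A Midsummer-Night's Dream.</lit:title>
  </lit:play>
</bib:bibliography>'''

// parseText takes the XML as a String; parse takes a File, as in step 1.
def bibliography = new XmlSlurper().parseText(xml)

// text() turns a GPathResult into a plain String.
def author = bibliography.author.text()

// Keep only the plays dated after 1592 and collect their titles as Strings.
def recentTitles = bibliography.play
        .findAll { it.year.toInteger() > 1592 }
        .collect { it.title.text() }

println author
recentTitles.each { println it }
```

Collecting the titles into a list of strings, rather than printing inside each, makes the result easier to reuse or to check in a test.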
    

How it works...

The previous example selects all the plays written after 1592 and prints their titles.

Groovy's XmlSlurper resides in the groovy.util package, which is imported automatically by Groovy. That's why we do not need an import statement for that class.

XmlSlurper is a SAX-based parser. It loads the full document into memory, but navigating the document with GPath requires little additional memory: GPath expressions are evaluated lazily, so intermediate results are not built until they are actually needed. XmlSlurper is also null-safe: accessing an attribute or a node that doesn't exist yields an empty result whose text is an empty string, rather than null or an exception.
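
The null-safe behavior can be checked with a few assertions. This is a small sketch under stated assumptions: the one-element document, the year attribute, and the isbn, publisher, and city names are all hypothetical and do not appear in shakespeare.xml. As before, it assumes a Groovy version in which groovy.util.XmlSlurper is imported by default.

```groovy
// Hypothetical document used only to illustrate null-safety.
def play = new XmlSlurper().parseText(
        '<play year="1594"><title>Romeo and Juliet.</title></play>')

// An attribute and an element that do exist.
assert play.@year.text() == '1594'
assert play.title.text() == 'Romeo and Juliet.'

// An attribute and a nested element that do NOT exist:
// no exception is thrown and no null is returned.
def missingAttribute = play.@isbn
def missingNode = play.publisher.city

assert missingAttribute.text() == ''
assert missingNode.text() == ''
assert missingNode.isEmpty()
assert missingNode.size() == 0

println 'All null-safety checks passed'
```

Because a missing node is an empty result rather than null, a long GPath chain does not need the safe-navigation operator at every step; checking isEmpty() or size() at the end is usually enough.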

As a rule of thumb, use XmlSlurper when you intend to process only a small part of the document, and XmlParser, which is more efficient in that case, when you have to process the whole document.
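
For comparison, here is a rough sketch of the same kind of query written against XmlParser. It is only an illustration of how similar the two APIs look, not a benchmark of either parser. The document is a simplified, namespace-free variant invented for this example, and the names root and titles are illustrative. It assumes a Groovy version (2.x or 3.x) in which groovy.util.XmlParser is imported by default.

```groovy
// Simplified, namespace-free document (an assumption made for this sketch).
def xml = '''<bibliography>
  <play><year>1589</year><title>The Two Gentlemen of Verona.</title></play>
  <play><year>1594</year><title>Romeo and Juliet.</title></play>
</bibliography>'''

// XmlParser builds a tree of groovy.util.Node objects up front,
// instead of returning a lazily evaluated GPathResult.
def root = new XmlParser().parseText(xml)
assert root instanceof groovy.util.Node

// root.play is a NodeList; text() must be called before converting the year.
def titles = root.play
        .findAll { it.year.text().toInteger() > 1592 }
        .collect { it.title.text() }

println titles
```

The navigation syntax is almost identical; the visible difference is that XmlParser hands back ordinary nodes and lists, so values are read with text() before being converted.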
