Reading XML using XmlParser

In the previous recipe, Reading XML using XmlSlurper, we learned how to read an XML document using the XmlSlurper provided by Groovy. Now it's time to look at the other parser available in Groovy, groovy.util.XmlParser. Its internal implementation differs from groovy.util.XmlSlurper, but it exposes a very similar API when it comes to document parsing, navigation, and modification.

In this recipe, we will cover the essential usage scenarios for the XmlParser class and its differences from XmlSlurper.

How to do it...

Let's use the same shakespeare.xml file we used in the Reading XML using XmlSlurper recipe.

  1. Reading XML data with XmlParser is very similar to reading it with XmlSlurper. You need to create an instance of XmlParser and pass a file reference to its parse method, as shown:
    def xmlSource = new File('shakespeare.xml')
    def bibliography = new XmlParser().parse(xmlSource)
  2. As with XmlSlurper, GPath expressions (see the Searching in XML with GPath recipe for more advanced examples) are also possible with XmlParser. For example, the code to print the author's name, followed by the titles of all plays written after 1592, would be as follows:
    // Print the text of the namespaced author element
    println bibliography.'bib:author'.text()

    // Keep only the plays written after 1592 and print their titles
    bibliography.'lit:play'
            .findAll { it.'lit:year'
                         .text().toInteger() > 1592 }
            .each { println it.'lit:title'.text() }
  3. The output of the script will be the same as in the previous recipe:
    William Shakespeare
    Love's Labour's Lost.
    Romeo and Juliet.
    A Midsummer-Night's Dream.
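
Since the contents of shakespeare.xml are not reproduced in this recipe, the following is a minimal, self-contained sketch of the same steps using parseText on an inline document. The namespace URIs, the author name, and the play data are placeholders invented for illustration; only the bib: and lit: prefixes and the element names come from the recipe. The import assumes Groovy 3 or later.

```groovy
// Groovy 3+ location of the class; on Groovy 2.x remove this import,
// because groovy.util.XmlParser is imported by default there.
import groovy.xml.XmlParser

// Hypothetical stand-in for shakespeare.xml (URIs and data are made up)
def xml = '''\
<bib:bibliography xmlns:bib="http://example.org/bib" xmlns:lit="http://example.org/lit">
  <bib:author>Jane Doe</bib:author>
  <lit:play>
    <lit:year>1590</lit:year>
    <lit:title>First Play</lit:title>
  </lit:play>
  <lit:play>
    <lit:year>1595</lit:year>
    <lit:title>Second Play</lit:title>
  </lit:play>
</bib:bibliography>
'''

// parseText accepts a String instead of a File
def bibliography = new XmlParser().parseText(xml)

// Same GPath navigation as in step 2, collecting the results instead of printing them directly
def author = bibliography.'bib:author'.text()
def recentTitles = bibliography.'lit:play'
        .findAll { it.'lit:year'.text().toInteger() > 1592 }
        .collect { it.'lit:title'.text() }

println author
recentTitles.each { println it }
```

Collecting the titles into a list before printing makes the result easy to check or reuse; the each call from step 2 would work just as well.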
    

How it works...

Navigating XML data with XmlParser differs slightly from XmlSlurper: to find a namespaced element, you need to use its qualified name, including the exact prefix. Since our XML example uses the bib: prefix for the author element, we need to refer to the author's data as bib:author (or as *:author to avoid depending on a particular prefix).
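
The effect of the prefix can be sketched with a small inline document. The namespace URI and the author name below are placeholders, and the import assumes Groovy 3 or later. The sketch compares a prefixed lookup with an unprefixed one and inspects the qualified name stored on the node.

```groovy
// Groovy 3+ location of the class; on Groovy 2.x remove this import,
// because groovy.util.XmlParser is imported by default there.
import groovy.xml.XmlParser

// Hypothetical document with a single namespaced child element
def xml = '''\
<bib:bibliography xmlns:bib="http://example.org/bib">
  <bib:author>Jane Doe</bib:author>
</bib:bibliography>
'''
def bibliography = new XmlParser().parseText(xml)

// Lookup with the exact prefix finds the element
def qualified = bibliography.'bib:author'

// Lookup without a prefix does not match a prefixed element
def unqualified = bibliography.author

// For namespaced elements, name() returns a QName rather than a plain String
def elementName = qualified[0].name()

println "prefix=${elementName.prefix}, localPart=${elementName.localPart}"
println "qualified matches: ${qualified.size()}, unqualified matches: ${unqualified.size()}"
```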

If the elements in our XML example were not namespace-qualified, we could refer to them in much the same way as we did with XmlSlurper, for example, bibliography.author.
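
As a short sketch of the unprefixed case, assume a hypothetical document with the same element names but no namespaces (the data is made up, and the import assumes Groovy 3 or later). Plain property names are then enough:

```groovy
// Groovy 3+ location of the class; on Groovy 2.x remove this import,
// because groovy.util.XmlParser is imported by default there.
import groovy.xml.XmlParser

// Hypothetical namespace-free variant of the bibliography
def plain = new XmlParser().parseText('''\
<bibliography>
  <author>Jane Doe</author>
  <play><year>1595</year><title>Second Play</title></play>
</bibliography>
''')

// No prefixes are needed when the elements are not namespaced
def author = plain.author.text()

// The spread operator calls text() on every matching title node
def titles = plain.play.title*.text()

println author
println titles
```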

In step 2, you may have noticed that we used the text method to get the textual content of the author element. That's because XmlParser returns instances of groovy.util.Node, whose toString method does not return the element's textual content. There is also the attribute method, which accepts a name and returns the value of the given attribute. If you ask for an attribute that doesn't exist, attribute returns null (unlike XmlSlurper, which returns an empty string).
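
The following sketch illustrates text, toString, and attribute on a single node. The play element and its year attribute are hypothetical and are not taken from shakespeare.xml; the import assumes Groovy 3 or later.

```groovy
// Groovy 3+ location of the class; on Groovy 2.x remove this import,
// because groovy.util.XmlParser is imported by default there.
import groovy.xml.XmlParser

// Hypothetical element with one attribute and one child
def play = new XmlParser().parseText('<play year="1595"><title>Second Play</title></play>')

// Indexing a GPath result yields a single Node
def title = play.title[0]

// text() returns the textual content of the element
def titleText = title.text()

// toString() describes the node (name, attributes, value) instead of returning the text
def asString = title.toString()

// attribute() returns the attribute value, or null when the attribute is absent
def year = play.attribute('year')
def missing = play.attribute('genre')

println titleText
println asString
println "year=${year}, genre=${missing}"
```

Because a missing attribute yields null, code that chains further calls on the result may want the safe navigation operator (?.) or an explicit null check.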

The main difference between XmlParser and XmlSlurper is that the former builds a tree of groovy.util.Node objects, and its GPath expressions return lists of nodes, which are easy to manipulate using what we already know about lists and collections. Compared to XmlSlurper, XmlParser consumes more memory because it has to create an intermediate data structure to represent the node tree, but it makes XML tree queries a bit faster. So, it's up to developers to decide which implementation better suits their needs.
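
To illustrate the point about lists, here is a small sketch that treats a GPath result as an ordinary collection. The document and its data are hypothetical, and the import assumes Groovy 3 or later.

```groovy
// Groovy 3+ location of the class; on Groovy 2.x remove this import,
// because groovy.util.XmlParser is imported by default there.
import groovy.xml.XmlParser

// Hypothetical, namespace-free list of plays in no particular order
def catalog = new XmlParser().parseText('''\
<plays>
  <play><year>1595</year><title>Second Play</title></play>
  <play><year>1590</year><title>First Play</title></play>
  <play><year>1599</year><title>Third Play</title></play>
</plays>
''')

// The GPath result is a list of Node objects
def plays = catalog.play
def isList = plays instanceof List

// Standard collection methods apply; sort(false) leaves the original list untouched
def titlesByYear = plays
        .sort(false) { it.year.text().toInteger() }
        .collect { it.title.text() }

// max with a closure returns the node that has the largest year
def latest = plays.max { it.year.text().toInteger() }.title.text()

println titlesByYear
println latest
```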
