CHAPTER 22


XML Processing

XML APIs have long been available to the Java developer, usually supplied as third-party libraries that could be added to the runtime classpath. Java 7, however, includes the Java API for XML Processing (JAXP), the Java Architecture for XML Binding (JAXB), and even the Java API for XML Web Services (JAX-WS) in its core runtime libraries.

The most fundamental XML processing tasks that you will encounter involve only a few use cases: writing and reading XML documents, validating those documents, and using JAXB to assist in marshalling/unmarshalling Java objects. This chapter provides recipes for these common tasks.

Note The source code for this chapter’s examples is available in the org.java7recipes.chapter22 package. Please see the introductory chapters for instructions on how to find and download this book’s sample source code.

22-1. Writing an XML File

Problem

You want to create an XML document to store application data.

Solution

To write an XML document, use the javax.xml.stream.XMLStreamWriter interface. The following code iterates over an array of Patient objects and writes their data to an .xml file. This sample code comes from the org.java7recipes.chapter22.DocWriter example:

import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.math.BigInteger;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public void run(String outputFile) throws FileNotFoundException, XMLStreamException, IOException {
    // new Patient[3] holds three null references, so create each object first
    Patient[] patients = new Patient[3];
    for (int i = 0; i < patients.length; i++) {
        patients[i] = new Patient();
    }
    patients[0].setId(BigInteger.valueOf(1));
    patients[0].setName("John Smith");
    patients[0].setDiagnosis("Common Cold");
    patients[1].setId(BigInteger.valueOf(2));
    patients[1].setName("Jane Doe");
    patients[1].setDiagnosis("Broken Ankle");
    patients[2].setId(BigInteger.valueOf(3));
    patients[2].setName("Jack Brown");
    patients[2].setDiagnosis("Food Allergy");
    XMLOutputFactory factory = XMLOutputFactory.newFactory();
    try (FileOutputStream fos = new FileOutputStream(outputFile)) {
        XMLStreamWriter writer = factory.createXMLStreamWriter(fos, "UTF-8");
        writer.writeStartDocument();
        writer.writeCharacters("\n");          // line breaks and indentation are written explicitly
        writer.writeStartElement("patients");
        writer.writeCharacters("\n");
        for (Patient p : patients) {
            writer.writeCharacters("    ");
            writer.writeStartElement("patient");
            writer.writeAttribute("id", String.valueOf(p.getId()));
            writer.writeCharacters("\n        ");
            writer.writeStartElement("name");
            writer.writeCharacters(p.getName());
            writer.writeEndElement();          // </name>
            writer.writeCharacters("\n        ");
            writer.writeStartElement("diagnosis");
            writer.writeCharacters(p.getDiagnosis());
            writer.writeEndElement();          // </diagnosis>
            writer.writeCharacters("\n    ");
            writer.writeEndElement();          // </patient>
            writer.writeCharacters("\n");
        }
        writer.writeEndElement();              // </patients>
        writer.writeEndDocument();
        writer.close();                        // the try statement closes fos
    }
}

The previous code writes the following file contents:

<?xml version="1.0" ?>
<patients>
    <patient id="1">
        <name>John Smith</name>
        <diagnosis>Common Cold</diagnosis>
    </patient>
    <patient id="2">
        <name>Jane Doe</name>
        <diagnosis>Broken Ankle</diagnosis>
    </patient>
    <patient id="3">
        <name>Jack Brown</name>
        <diagnosis>Food Allergy</diagnosis>
    </patient>
</patients>

How It Works

Java 7 provides several XML APIs. The older Simple API for XML (SAX) is a push-style parsing API: it reads documents but provides no document writer. The newer Streaming API for XML (StAX), defined in the javax.xml.stream package, is simpler to use and can both read and write documents; this recipe uses it. Writing an XML document with StAX takes only five steps:

  1. Create a file output stream.
  2. Create an XML output factory.
  3. Use the factory to create an XML stream writer that wraps the file stream.
  4. Use the XML stream writer’s write methods to create the document and write XML elements.
  5. Close the XML stream writer and the file stream.

Create a file output stream using the java.io.FileOutputStream class. You can use a try-with-resources statement to open this stream and close it automatically. Learn more about the new try-with-resources syntax in Chapter 6.

The javax.xml.stream.XMLOutputFactory class provides a static newFactory() method that creates an output factory. Use the factory’s createXMLStreamWriter() method to create a javax.xml.stream.XMLStreamWriter.

Pass the file stream (and, optionally, an encoding name) to createXMLStreamWriter() so that the writer sends its output to the file. You then use the writer’s various write methods to create the document’s elements and attributes, and close the writer when you finish. Some of the more useful methods of the XMLStreamWriter instance are these:

  • writeStartDocument()
  • writeStartElement()
  • writeEndElement()
  • writeEndDocument()
  • writeAttribute()

After creating the file stream and the XMLStreamWriter, always begin the document by calling the writeStartDocument() method. Then write individual elements by pairing writeStartElement() and writeEndElement() calls; elements can, of course, be nested. You are responsible for calling these methods in the proper sequence to create a well-formed document. Use the writeAttribute() method to add an attribute name and value to the current element, and call it immediately after writeStartElement(), before writing any content for that element. Finally, signal the end of the document with the writeEndDocument() method and close the XMLStreamWriter. Note that XMLStreamWriter.close() does not close the underlying output stream, which is why the example also relies on the try-with-resources statement to close the FileOutputStream.

Note that XMLStreamWriter does not format its output. Unless you explicitly call writeCharacters() to output spaces and newline characters, the whole document is written as a single unformatted line. This doesn’t invalidate the resulting XML, but it does make the file difficult for a human to read. If people will read the document, use writeCharacters() to add indentation and line breaks as needed; otherwise you can safely omit them. Either way, the document is well formed as long as you call the write methods in the correct order.
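The following sketch illustrates the point by writing a one-patient document to an in-memory java.io.StringWriter without any writeCharacters() whitespace calls, so every element ends up on one line. The class name CompactWriterDemo is hypothetical, and the exact text of the XML declaration may differ slightly between StAX implementations.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class CompactWriterDemo {

    // Writes one patient with no writeCharacters() whitespace calls at all.
    public static String writeCompact() throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newFactory().createXMLStreamWriter(out);
        writer.writeStartDocument();
        writer.writeStartElement("patients");
        writer.writeStartElement("patient");
        writer.writeAttribute("id", "1");   // must follow writeStartElement("patient")
        writer.writeStartElement("name");
        writer.writeCharacters("John Smith");
        writer.writeEndElement();           // </name>
        writer.writeEndElement();           // </patient>
        writer.writeEndElement();           // </patients>
        writer.writeEndDocument();
        writer.flush();
        writer.close();                     // does not close the StringWriter
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        // Everything after the XML declaration is on a single line
        System.out.println(writeCompact());
    }
}
```

Compare this with the recipe’s file writer: the only difference in the markup is the absence of the explicit whitespace calls.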

The command-line usage pattern for this example code is this:

java org.java7recipes.chapter22.DocWriter <outputXmlFile>

Invoke this application to create a file named patients.xml in the following way:

java org.java7recipes.chapter22.DocWriter patients.xml

22-2. Reading an XML File

Problem

You need to parse an XML document, retrieving known elements and attributes.

Solution 1

Use the javax.xml.stream.XMLStreamReader interface to read documents. With this API, your code pulls XML events one at a time through a cursor-like interface, much like a database cursor in SQL. The following code snippet from org.java7recipes.chapter22.DocReader demonstrates how to read the patients.xml file from the previous recipe:

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public void cursorReader(String xmlFile)
        throws FileNotFoundException, IOException, XMLStreamException {
    XMLInputFactory factory = XMLInputFactory.newFactory();
    // Deliver each text node as a single CHARACTERS event rather than several chunks
    factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
    try (FileInputStream fis = new FileInputStream(xmlFile)) {
        XMLStreamReader reader = factory.createXMLStreamReader(fis);
        boolean inName = false;
        boolean inDiagnosis = false;
        String id = null;
        String name = null;
        String diagnosis = null;

        while (reader.hasNext()) {
            int event = reader.next();    // advance the cursor to the next parsing event
            switch (event) {
                case XMLStreamConstants.START_ELEMENT:
                    String elementName = reader.getLocalName();
                    switch (elementName) {
                        case "patient":
                            // Look the attribute up by name rather than by position
                            id = reader.getAttributeValue(null, "id");
                            break;
                        case "name":
                            inName = true;
                            break;
                        case "diagnosis":
                            inDiagnosis = true;
                            break;
                        default:
                            break;
                    }
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    if (reader.getLocalName().equals("patient")) {
                        System.out.printf("Patient: %s Name: %s Diagnosis: %s%n", id, name, diagnosis);
                        id = name = diagnosis = null;
                        inName = inDiagnosis = false;
                    }
                    break;
                case XMLStreamConstants.CHARACTERS:
                    if (inName) {
                        name = reader.getText();
                        inName = false;
                    } else if (inDiagnosis) {
                        diagnosis = reader.getText();
                        inDiagnosis = false;
                    }
                    break;
                default:
                    break;
            }
        }
        reader.close();    // does not close fis; the try-with-resources statement does
    }
}

Solution 2

Use the javax.xml.stream.XMLEventReader interface to read and process events with an event-oriented API, also called the iterator API. The following code is much like that of Solution 1, except that it uses the event-oriented API instead of the cursor API. This code snippet is also available from the same org.java7recipes.chapter22.DocReader class used in Solution 1:

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.EndElement;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

public void eventReader(String xmlFile)
        throws FileNotFoundException, IOException, XMLStreamException {
    XMLInputFactory factory = XMLInputFactory.newFactory();
    // Deliver each text node as a single CHARACTERS event rather than several chunks
    factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
    try (FileInputStream fis = new FileInputStream(xmlFile)) {
        XMLEventReader reader = factory.createXMLEventReader(fis);
        try {
            boolean inName = false;
            boolean inDiagnosis = false;
            String id = null;
            String name = null;
            String diagnosis = null;

            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();    // each event is a self-contained object
                String elementName;
                switch (event.getEventType()) {
                    case XMLEvent.START_ELEMENT:
                        StartElement startElement = event.asStartElement();
                        elementName = startElement.getName().getLocalPart();
                        switch (elementName) {
                            case "patient":
                                // Attributes are read from the StartElement event
                                id = startElement.getAttributeByName(new QName("id")).getValue();
                                break;
                            case "name":
                                inName = true;
                                break;
                            case "diagnosis":
                                inDiagnosis = true;
                                break;
                            default:
                                break;
                        }
                        break;
                    case XMLEvent.END_ELEMENT:
                        EndElement endElement = event.asEndElement();
                        elementName = endElement.getName().getLocalPart();
                        if (elementName.equals("patient")) {
                            System.out.printf("Patient: %s Name: %s Diagnosis: %s%n", id, name, diagnosis);
                            id = name = diagnosis = null;
                            inName = inDiagnosis = false;
                        }
                        break;
                    case XMLEvent.CHARACTERS:
                        String value = event.asCharacters().getData();
                        if (inName) {
                            name = value;
                            inName = false;
                        } else if (inDiagnosis) {
                            diagnosis = value;
                            inDiagnosis = false;
                        }
                        break;
                    default:
                        break;
                }
            }
        } finally {
            reader.close();    // does not close fis; the outer try statement does
        }
    }
}

How It Works

Java 7 provides several ways to read XML documents. One is StAX, a streaming, pull-based model. Unlike the older SAX API, StAX lets you both read and write XML documents. StAX does not give you the random access to the whole document that a DOM API provides, but because it never loads the entire document into memory, it is efficient and much less taxing on memory resources.

StAX provides two APIs for reading XML documents: a cursor-oriented API and an iterator-based, event-oriented API. The event-oriented iterator API is often preferred over the cursor API because it provides XMLEvent objects with the following benefits:

  • The XMLEvent objects are immutable and can persist even though the StAX parser has moved on to subsequent events. You can pass these XMLEvent objects to other processes or store them in lists, arrays, and maps.
  • You can create your own specialized event types by implementing the XMLEvent interface.
  • You can modify the incoming event stream by adding or removing events, which is more flexible than the cursor API.
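As a sketch of the first and third benefits, the hypothetical EventListDemo class below wraps an event reader in a filtered reader, created with XMLInputFactory.createFilteredReader() and an EventFilter, that passes only start-element events. It then stores those events in a List and inspects them after parsing has finished. It reads from a string rather than a file so that it stays self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.EventFilter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

public class EventListDemo {

    // Keeps only start-element events; all other events are dropped from the stream.
    private static final EventFilter START_ELEMENTS_ONLY = new EventFilter() {
        @Override
        public boolean accept(XMLEvent event) {
            return event.isStartElement();
        }
    };

    public static List<XMLEvent> collectStartElements(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        XMLEventReader raw = factory.createXMLEventReader(new StringReader(xml));
        XMLEventReader filtered = factory.createFilteredReader(raw, START_ELEMENTS_ONLY);
        List<XMLEvent> events = new ArrayList<>();
        while (filtered.hasNext()) {
            events.add(filtered.nextEvent());   // safe to keep: events outlive the parser position
        }
        filtered.close();
        return events;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<patients><patient id=\"1\"><name>John Smith</name></patient></patients>";
        // Parsing is complete here, but the stored events are still usable.
        for (XMLEvent e : collectStartElements(xml)) {
            StartElement start = e.asStartElement();
            System.out.println(start.getName().getLocalPart());
        }
    }
}
```

For the sample document, main() prints patients, patient, and name, one per line.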

To read a document with the event API, create an XML event reader on your file input stream. Check that events are still available with the hasNext() method, and read each event with the nextEvent() method. The nextEvent() method returns a specific type of XMLEvent, such as a StartElement, EndElement, or Characters event; attributes are not delivered as separate events but are read from the StartElement event. When you finish, close both the reader and the file stream, because XMLEventReader.close() does not close the underlying input source.
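The reading steps above can be condensed into a small sketch. The hypothetical PatientNameReader class below parses an in-memory string with the event API, reads the id attribute from each patient StartElement, and uses XMLEventReader.getElementText() to collect the text of each name element. Because getElementText() reads a text-only element’s content in one call, it avoids the inName flag bookkeeping used in the solutions.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

public class PatientNameReader {

    // Maps each patient's id attribute to the text of its <name> child element.
    public static Map<String, String> readNames(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));
        Map<String, String> names = new LinkedHashMap<>();
        String currentId = null;
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (!event.isStartElement()) {
                    continue;                      // only start tags matter here
                }
                StartElement start = event.asStartElement();
                String local = start.getName().getLocalPart();
                if (local.equals("patient")) {
                    // Attributes come from the StartElement, not from separate events
                    Attribute id = start.getAttributeByName(new QName("id"));
                    currentId = (id == null) ? null : id.getValue();
                } else if (local.equals("name") && currentId != null) {
                    // Reads the element's text and leaves the reader at its end tag
                    names.put(currentId, reader.getElementText());
                }
            }
        } finally {
            reader.close();                        // does not close the StringReader
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<patients><patient id=\"1\"><name>John Smith</name></patient></patients>";
        System.out.println(readNames(xml));        // prints {1=John Smith}
    }
}
```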

You can invoke the example application like this, using the patients.xml file as your <xmlFile> argument:

java org.java7recipes.chapter22.DocReader <xmlFile>

22-3. Transforming XML

Problem

You want to convert an XML document to another format, for example, HTML.

Solution

Use the javax.xml.transform package to transform an XML document to another document format.

The following code demonstrates how to read a source document, apply an Extensible Stylesheet Language (XSL) transform file, and produce the transformed, new document. Use the sample code from the org.java7recipes.chapter22.TransformXml class to read the patients.xml file and create a patients.html file. The following snippet shows the important pieces of this class:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public void run(String xmlFile, String xslFile, String outputFile)
        throws FileNotFoundException, TransformerConfigurationException, TransformerException {
    // Read the XSL stylesheet and compile it into a Transformer
    InputStream xslInputStream = new FileInputStream(xslFile);
    Source xslSource = new StreamSource(xslInputStream);
    TransformerFactory factory = TransformerFactory.newInstance();
    Transformer transformer = factory.newTransformer(xslSource);
    // Wrap the XML input and the output file, then run the transformation
    InputStream xmlInputStream = new FileInputStream(xmlFile);
    StreamSource in = new StreamSource(xmlInputStream);
    StreamResult out = new StreamResult(new File(outputFile));
    transformer.transform(in, out);
    …
}

How It Works

The javax.xml.transform package contains the classes you need to transform an XML document into another format, such as HTML, plain text, or a different XML vocabulary. The most common use case is converting data-oriented XML documents into user-readable HTML documents.

Transforming from one document type to another requires three files:

  • An XML source document
  • An XSL transformation document that maps XML elements to your new document elements
  • A target output file

The XML source document is, of course, your source data file. It will most often contain data-oriented content that is easy to parse programmatically. However, people don’t easily read XML files, especially complex, data-rich files. Instead, people are much more comfortable reading properly rendered HTML documents.

The XSL transformation document specifies how an XML document should be transformed into a different format. An XSL file will usually contain an HTML template that specifies dynamic fields that will hold the extracted contents of a source XML file.

In this example’s source code, you’ll find two source documents:

  • resources/patients.xml
  • resources/patients.xsl

The patients.xml file is short, containing the following data:

<?xml version="1.0" encoding="UTF-8"?>
<patients>
    <patient id="1">
        <name>John Smith</name>
        <diagnosis>Common Cold</diagnosis>
    </patient>
    <patient id="2">
        <name>Jane Doe</name>
        <diagnosis>Broken ankle</diagnosis>
    </patient>
    <patient id="3">
        <name>Jack Brown</name>
        <diagnosis>Food allergy</diagnosis>
    </patient>
</patients>

The patients.xml file defines a root element called patients. It has three nested patient elements. The patient element contains three pieces of data:

  • Patient identifier, provided as the id attribute of the patient element
  • Patient name, provided as the name subelement
  • Patient diagnosis, provided as the diagnosis subelement

The transformation XSL document (patients.xsl) is quite small as well, and it simply maps the patient data to a more user-readable, HTML format using XSL:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="html"/>
    <xsl:template match="/">
        <html>
        <head>
            <title>Patients</title>
        </head>
        <body>
            <table border="1">
                <tr>
                    <th>Id</th>
                    <th>Name</th>
                    <th>Diagnosis</th>
                </tr>
                <xsl:for-each select="patients/patient">
                <tr>
                    <td><xsl:value-of select="@id"/></td>
                    <td><xsl:value-of select="name"/></td>
                    <td><xsl:value-of select="diagnosis"/></td>
                </tr>
                </xsl:for-each>
            </table>
        </body>
        </html>
    </xsl:template>
</xsl:stylesheet>

Using this stylesheet, the sample code transforms the XML into an HTML table containing all the patients and their data. Rendered in a browser, the HTML table should look like the one in Figure 22-1.


Figure 22-1. A common rendering of an HTML table

The process for using this XSL file to convert the XML file to an HTML file is straightforward, but every step can be enhanced with additional error checking and processing. For this example, refer to the previous code in the solution section.

The most basic transformation steps are these:

  1. Read the XSL document into your Java application as a Source object.
  2. Create a Transformer instance from a TransformerFactory, providing your XSL Source instance for it to use during its operation.
  3. Create a StreamSource that represents the source XML contents.
  4. Create a StreamResult instance for your output document, which is an HTML file in this case.
  5. Use the Transformer object’s transform() method to perform the conversion.
  6. Close all the relevant streams and file instances as needed.
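To try these steps without any files, the following sketch applies a small stylesheet held in a string to an in-memory document. The class name InMemoryTransform and its stylesheet are illustrative assumptions, not part of the sample code; the stylesheet uses the text output method and <xsl:text> elements so that the result contains no stray whitespace.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class InMemoryTransform {

    // A tiny stylesheet that emits plain text of the form "id=name;" for each patient.
    public static final String XSL =
          "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/\">"
        + "<xsl:for-each select=\"patients/patient\">"
        + "<xsl:value-of select=\"@id\"/><xsl:text>=</xsl:text>"
        + "<xsl:value-of select=\"name\"/><xsl:text>;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    public static String transform(String xml, String xsl) throws TransformerException {
        // Steps 1-2: read the stylesheet as a Source and compile it into a Transformer
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        // Steps 3-4: wrap the XML input in a StreamSource and the output in a StreamResult
        StringWriter out = new StringWriter();
        // Step 5: perform the conversion
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        // Step 6: in-memory readers and writers need no closing
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xml = "<patients><patient id=\"1\"><name>John Smith</name></patient>"
                   + "<patient id=\"2\"><name>Jane Doe</name></patient></patients>";
        System.out.println(transform(xml, XSL));
    }
}
```

For the two-patient sample in main(), the result is 1=John Smith;2=Jane Doe;. Swapping in the patients.xsl stylesheet and a StreamResult backed by a file gives you the recipe’s HTML output instead.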

If you choose to execute the sample code, you should invoke it in the following way, using patients.xml, patients.xsl, and patients.html as arguments:

java org.java7recipes.chapter22.TransformXml <xmlFile> <xslFile> <outputFile>

22-4. Validating XML

Problem

You want to confirm that your XML is valid, conforming to a known document definition or schema.

Solution

Validate that your XML conforms to a specific schema by using the javax.xml.validation package. The following code snippet from org.java7recipes.chapter22.ValidateXml demonstrates how to validate against an XML schema file:

import java.io.File;
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public void run(String xmlFile, String validationFile) {
    boolean valid = true;
    SchemaFactory sFactory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
    try {
        Schema schema = sFactory.newSchema(new File(validationFile));
        Validator validator = schema.newValidator();
        Source source = new StreamSource(new File(xmlFile));
        validator.validate(source);    // throws SAXException on the first validation error
    } catch (SAXException | IOException | IllegalArgumentException ex) {
        valid = false;
        System.out.println(ex.getMessage());    // report why validation failed
    }
    System.out.printf("XML file is %s.%n", valid ? "valid" : "invalid");
}

How It Works

The javax.xml.validation package provides the classes needed to validate an XML file against a schema. The schema languages you are most likely to use for XML validation are identified by constant URIs in the XMLConstants class:

  • XMLConstants.W3C_XML_SCHEMA_NS_URI
  • XMLConstants.RELAXNG_NS_URI

Begin by creating a SchemaFactory for a specific schema language. A SchemaFactory knows how to parse schemas written in that language and prepare them for validation. Use the SchemaFactory instance to create a Schema object, a thread-safe, in-memory representation of the schema grammar. Use the Schema instance to create a Validator that understands this grammar. Finally, call the validate() method to check your XML. If the document is invalid, validate() throws a SAXException describing the error; if the file cannot be read, it throws an IOException. Otherwise, validate() returns quietly, and you can continue to use the XML file. Note that the JDK’s built-in SchemaFactory supports W3C XML Schema; SchemaFactory.newInstance() throws an IllegalArgumentException for a schema language, such as RELAX NG, that has no installed implementation.
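The validation steps can also be exercised entirely in memory. The sketch below, using a hypothetical StringValidator class, compiles a condensed copy of the patients.xsd schema from recipe 22-5 and reports whether a document conforms. Because that schema declares the id attribute as required, a patient element without an id is rejected.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class StringValidator {

    // Condensed copy of the patients.xsd schema: id is a required integer attribute.
    public static final String PATIENTS_XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"patients\"><xs:complexType><xs:sequence>"
        + "<xs:element maxOccurs=\"unbounded\" name=\"patient\" type=\"Patient\"/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "<xs:complexType name=\"Patient\"><xs:sequence>"
        + "<xs:element name=\"name\" type=\"xs:string\"/>"
        + "<xs:element name=\"diagnosis\" type=\"xs:string\"/>"
        + "</xs:sequence>"
        + "<xs:attribute name=\"id\" type=\"xs:integer\" use=\"required\"/>"
        + "</xs:complexType></xs:schema>";

    // Returns true if xml conforms to the W3C XML Schema given in xsd.
    public static boolean isValid(String xml, String xsd) throws SAXException, IOException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // newSchema() itself throws SAXException if the schema is malformed
        Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;    // validate() returned quietly, so the document is valid
        } catch (SAXException invalid) {
            // With no ErrorHandler set, a validation error is thrown as an exception
            System.out.println("Invalid: " + invalid.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws SAXException, IOException {
        String doc = "<patients><patient id=\"1\"><name>John Smith</name>"
                   + "<diagnosis>Common Cold</diagnosis></patient></patients>";
        System.out.println(isValid(doc, PATIENTS_XSD));
    }
}
```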

Note XML Schema was the first schema language to receive “Recommendation” status from the World Wide Web Consortium (W3C), in 2001. Competing schema languages have since become available. One of them is the Regular Language for XML Next Generation (RELAX NG). RELAX NG may be simpler to use, and its specification also defines a non-XML, compact syntax. This recipe’s example uses XML Schema.

Run the example code using the following command-line syntax, preferably with the sample resources/patients.xml file and the patients.xsd schema file shown in recipe 22-5:

java org.java7recipes.chapter22.ValidateXml <xmlFile> <validationFile>

22-5. Creating Java Bindings for an XML Schema

Problem

You would like to generate a set of Java classes (Java bindings) that represent the objects within an XML schema.

Solution

The JDK provides a tool that can turn schema documents into representative Java class files. Use the <JDK_HOME>/bin/xjc command-line tool to generate Java bindings for your XML schemas. To create the Java classes for the patients.xsd schema file, you could issue the following command from a console:

xjc -p org.java7recipes.chapter22 patients.xsd

This command will process the patients.xsd file and create all the classes needed to process an XML file that validates with this schema. For this example, the patients.xsd file looks like the following:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
    <xs:element name="patients">
        <xs:complexType>
            <xs:sequence>
                <xs:element maxOccurs="unbounded" name="patient" type="Patient"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
    <xs:complexType name="Patient">
        <xs:sequence>
            <xs:element name="name" type="xs:string"/>
            <xs:element name="diagnosis" type="xs:string"/>
        </xs:sequence>
        <xs:attribute name="id" type="xs:integer" use="required"/>
    </xs:complexType>
</xs:schema>

Executed on the previous xsd file, the xjc command creates the following files in the org.java7recipes.chapter22 package:

  • ObjectFactory.java
  • Patients.java
  • Patient.java

How It Works

The JDK includes the <JDK_HOME>/bin/xjc utility, a command-line application that creates Java bindings from schema files. XML Schema is its primary input format; xjc also accepts RELAX NG and DTD input, although that support is marked experimental.

The xjc command has several options for performing its work. The most common ones let you name the source schema file, set the package of the generated Java binding classes (-p), and choose the output directory that receives them (-d).

You can get detailed descriptions of all the command-line options by using the tool’s -help option:

xjc -help

A Java binding is a class whose fields carry JAXB annotations, such as @XmlRootElement, @XmlElement, and @XmlAttribute, that correspond to the declarations in the XML Schema file. These annotations mark the schema’s root element and all of its subelements and attributes. The next step of XML processing, either unmarshalling or marshalling these bindings, relies on this mapping.

22-6. Unmarshalling XML to a Java Object

Problem

You want to unmarshall an XML file and create its corresponding Java object tree.

Solution

JAXB provides an unmarshalling service that parses an XML file and generates Java objects from the bindings you created in recipe 22-5. The following code reads the patients.xml file and, using the bindings in the org.java7recipes.chapter22 package, creates a Patients root object and its list of Patient objects:

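import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Unmarshaller;

public void run(String xmlFile, String context)
        throws JAXBException, FileNotFoundException, IOException {
    // context is a package name, such as org.java7recipes.chapter22
    JAXBContext jc = JAXBContext.newInstance(context);
    Unmarshaller u = jc.createUnmarshaller();
    try (FileInputStream fis = new FileInputStream(xmlFile)) {
        Patients patients = (Patients) u.unmarshal(fis);
        for (Patient p : patients.getPatient()) {
            System.out.printf("ID: %s%n", p.getId());
            System.out.printf("NAME: %s%n", p.getName());
            System.out.printf("DIAGNOSIS: %s%n%n", p.getDiagnosis());
        }
    }
}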

If you run the sample code on the resources/patients.xml file and use the org.java7recipes.chapter22 context, the application will print the following to the console as it iterates over the Patient object list:

ID: 1
NAME: John Smith
DIAGNOSIS: Common Cold

ID: 2
NAME: Jane Doe
DIAGNOSIS: Broken ankle

ID: 3
NAME: Jack Brown
DIAGNOSIS: Food allergy

Note The previous output comes directly from instances of the Java Patient class that were created from XML representations. The code does not print the contents of the XML file directly. Instead, it prints the contents of the Java bindings after the XML has been unmarshalled into the appropriate Java binding instances.

How It Works

Unmarshalling an XML file into its Java object representation has at least two requirements:

  • A well-formed and valid XML file
  • A set of corresponding Java bindings

The Java bindings don’t have to be generated by the xjc command. Once you’ve gained some experience with Java bindings and the annotation features, you may prefer to control every aspect of the binding by handcrafting your Java classes. Whatever your preference, Java’s unmarshalling service uses the bindings and their annotations to map XML elements to target Java classes, and XML subelements and attributes to target object fields.

Execute the example application for this recipe using this syntax, substituting patients.xml and org.java7recipes.chapter22 for the respective parameters:

java org.java7recipes.chapter22.UnmarshalPatients <xmlfile> <context>

22-7. Building an XML Document with JAXB

Problem

You need to write an object’s data to an XML representation.

Solution

Assuming you have created Java binding files for your XML schema as described in recipe 22-5, use a JAXBContext instance to create a Marshaller object. Use the Marshaller object to serialize your Java object tree to an XML document. The following code demonstrates this:

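import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.math.BigInteger;
import java.util.List;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Marshaller;

public void run(String xmlFile, String context)
        throws JAXBException, FileNotFoundException, IOException {
    Patients patients = new Patients();
    // getPatient() returns the live list held by the Patients object
    List<Patient> patientList = patients.getPatient();
    Patient p = new Patient();
    p.setId(BigInteger.valueOf(1));
    p.setName("John Doe");
    p.setDiagnosis("Schizophrenia");
    patientList.add(p);

    JAXBContext jc = JAXBContext.newInstance(context);
    Marshaller m = jc.createMarshaller();
    // Optional: m.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
    try (FileOutputStream fos = new FileOutputStream(xmlFile)) {
        m.marshal(patients, fos);
    }
}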

The previous code produces an unformatted but well-formed and valid XML document. For readability, the XML document is formatted here:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<patients>
    <patient id="1">
        <name>John Doe</name>
        <diagnosis>Schizophrenia</diagnosis>
    </patient>
</patients>

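Note The getPatient() method in the previous code returns a List of Patient objects instead of a single patient. This naming oddity comes from the JAXB code generation for this XSD schema: xjc names the accessor after the repeating patient element and generates no setter for it, so you add Patient objects to the live list that getPatient() returns.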

How It Works

A Marshaller object understands JAXB annotations. As it processes your objects, it reads the JAXB annotations on their classes to determine which XML elements and attributes to write and how to nest them in the resulting XML tree.

You can run the previous code from the org.java7recipes.chapter22.MarshalPatients application using the following command line:

java org.java7recipes.chapter22.MarshalPatients <xmlfile> <context>

The context argument refers to the package of the Java classes that you will marshal. In the previous example, because the code marshals a Patients object tree, the correct context is the package name of the Patients class. In this case, the context is org.java7recipes.chapter22.
