Chapter 27. XML and Java

  • XML Versus HTML

  • Some Rules of XML

  • The Document Type Definition (DTD)

  • What Is XML Used For?

  • XML Versions and Glossary

  • JAXP Library Contents

  • Reading XML With DOM Parsers

  • A Program That Uses a DOM Parser

  • Reading an XML File—SAX Parsers

  • A Program That Uses a SAX Parser

  • The Factory Design Pattern

  • Design Pattern Summary

  • Other Java XML Notes

  • Further Reading

  • Exercises

  • Some Light Relief—“View Source” on Kevin's Life

This chapter is in three parts. The first part describes XML, what it's for, and how you use it. It's straightforward and takes only a couple of sections. The largest part of the chapter describes Java support for XML, covering how you access and update XML documents. The XML world defines two different algorithms for accessing XML documents (“everything at once” versus “piece by piece”), and Java supports them both. We put together a Java program that uses each of these algorithms. The third part of the chapter explains how to use the Java library for XML so you can try running the code for yourself.

XML Versus HTML

You'll probably be relieved to hear that the basics of XML can be learned in a few minutes, though it takes a while longer to master the accompanying tools and standards. XML is a set of rules, guidelines, and conventions for describing structured data in a plain text editable file. The abbreviation XML stands for “eXtensible Mark-up Language.”

XML is related to the HTML used to write web pages, and has a similar appearance of text with mark-up tags sprinkled through it.

  • HTML mark-up tags are things like <br> (break to a new line), <table> (start a table), and <li> (make an entry in a list). In HTML the set of mark-up tags is fixed in advance, and the only purpose of most of them is to guide the way something looks on the screen.

  • With XML, you define your own tags and attributes (and thus it is “extensible”) and you give them meaning, and that meaning goes way beyond minor points like the font size to use when printing something out.

XML advantages over HTML

Don't make the mistake of thinking that XML is merely “HTML on steroids.” Although we approach it from HTML to make it easy to explain, XML does much more than HTML does. XML offers the following advantages:

  • It is an archival representation of data. Because the format is plain text and the tags that describe it are carried around with the data, the description of the format cannot get separated from the data and lost. That contrasts with binary representations of a file, which all too easily become outdated. If this were all it did, it would be enough to justify its existence.

  • It provides a way to web-publish files that can be directly processed by computer, rather than merely human-readable text and pictures.

  • It is plain text, so it can be read by people without special tools.

  • It can easily be transformed into HTML, or PDF, or data structures internal to a program, or any other format yet to be dreamed up, so it is “future-proof.”

  • It's portable, open, and a standard, which makes it a great fit with Java.

We will see these benefits as we go through this chapter. XML holds the promise of taking web-based systems to the next level by supporting data interchange everywhere. The web made everyone into a publisher of human-readable HTML files. XML lets everyone become a publisher or consumer of computer-readable data files.

Keep that concept in mind as we go through this example.

HTML—a good display format and not much else

We'll start with HTML because it's a good way to get into XML. Let's say you have an online business selling CDs of popular music. You'll probably have a catalog of your inventory online, so that customers know what's in stock. Amazon.com works exactly like this. One possibility for storing your inventory is to put it in an HTML table. Each row will hold information on a particular CD title, and each column will be the details you keep about a CD—the title, artist, price, number in stock, and so on. The HTML for some of your online inventory might look like this:

<table>
<tr> <th>title</th>   <th>artist</th>   <th>price</th>  <th>stock</th>  </tr>

<tr> <td>The Tubes</td>   <td>The Tubes</td>   <td>22</td>  <td>3</td>  </tr>

<tr> <td>Some Girls</td>   <td>Rolling Stones</td>   <td>25</td>  <td>5</td>  </tr>

<tr> <td>Tubthumper</td>   <td>Chumbawamba</td>   <td>17</td>  <td>6</td>  </tr>
</table>

We are using tags like <tr> to define table rows. When you display it in a web page, it looks like Figure 27-1.

Figure 27-1. HTML table displayed in a browser

The HTML table is a reasonable format for displaying data, but it's no help for all the other things you might want to do with your data, like search it, update it, or share it with others.

Say we want to find all CDs by some particular artist. We can look for that string in the HTML file, but HTML doesn't have any way to restrict the search to the “artist” column. When we find the string, we can't easily tell if it's in the title column or the artist column or somewhere else again.

HTML tables aren't very useful for holding data with a variable number of elements. Say imported CDs have additional fields relating to country, genre, non-discount status, and so on. With HTML, we have to add those fields to all CDs, or put imported CDs in a special table of their own, or find some other hack.
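The search problem can be seen in a few lines of Java. This is only a sketch: the class name HtmlSearchDemo and the helper findAll are made up for illustration, and the HTML string is two rows copied from the table above. A plain text search for “The Tubes” reports two hits, and nothing in the result says which one is the title and which is the artist.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of any library.
class HtmlSearchDemo {
    // Two rows of the inventory table from the text.
    static final String HTML =
        "<tr> <td>The Tubes</td> <td>The Tubes</td> <td>22</td> <td>3</td> </tr>\n"
      + "<tr> <td>Some Girls</td> <td>Rolling Stones</td> <td>25</td> <td>5</td> </tr>\n";

    // Returns the character offset of every occurrence of needle in text.
    static List<Integer> findAll(String text, String needle) {
        List<Integer> hits = new ArrayList<>();
        for (int i = text.indexOf(needle); i >= 0; i = text.indexOf(needle, i + 1)) {
            hits.add(i);
        }
        return hits;
    }

    public static void main(String[] args) {
        // The search finds the string twice in the first row. The offsets
        // alone do not tell us which hit is a title and which is an artist;
        // to know that we would have to count <td> cells by hand.
        List<Integer> hits = findAll(HTML, "The Tubes");
        System.out.println("Number of hits: " + hits.size());
    }
}
```

To recover the column from a hit, a program would have to rely on the position of each <td> within its row, which breaks as soon as the table layout changes.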

XML does things that HTML cannot

This is where XML comes in. The basic idea is that you represent your data in character form, and each field (or “element,” as it is properly called) has tags that say what it is. It really is that straightforward! Just as with HTML, XML tags consist of an opening angle bracket followed by a string and a closing angle bracket. The XML version of your online CD catalog might look like this:

<cd> <title>The Tubes</title>    <artist>The Tubes</artist>
    <price>22</price>          <qty>3</qty>  </cd>

<cd> <title>Some Girls</title>   <artist>Rolling Stones</artist>
    <price>25</price>  <qty>5</qty>  </cd>

<cd> <title>Tubthumper</title>   <artist>Chumbawamba</artist>
    <price>17</price>  <qty>6</qty>  </cd>

It looks trivial, but the simple act of storing everything as character data and wrapping it with a pair of labels saying what it is opens up some powerful possibilities that we will get into shortly. XML is intended for uses entirely different from display in a browser. In fact, most browsers ignore tags that they don't recognize, so if you browse an XML file you'll just get the embedded text without tags (unless the browser recognizes XML, as recent versions of Microsoft's Internet Explorer do).
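As a first taste of those possibilities, here is a minimal sketch of the “find all CDs by an artist” search that HTML could not do cleanly. It uses the DOM parser from the standard javax.xml.parsers package, which is covered later in this chapter. Two things here are assumptions and not part of the catalog as shown: the class and method names (CdCatalogSearch, titlesByArtist) are made up, and the records are wrapped in a <catalog> root element, because a well-formed XML document needs exactly one root.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of any library.
class CdCatalogSearch {
    // The catalog from the text, wrapped in an assumed <catalog> root element.
    static final String XML =
        "<catalog>"
      + "<cd><title>The Tubes</title><artist>The Tubes</artist>"
      + "<price>22</price><qty>3</qty></cd>"
      + "<cd><title>Some Girls</title><artist>Rolling Stones</artist>"
      + "<price>25</price><qty>5</qty></cd>"
      + "<cd><title>Tubthumper</title><artist>Chumbawamba</artist>"
      + "<price>17</price><qty>6</qty></cd>"
      + "</catalog>";

    // Returns the titles of all CDs whose <artist> element matches exactly.
    static List<String> titlesByArtist(String xml, String artist) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        List<String> titles = new ArrayList<>();
        NodeList cds = doc.getElementsByTagName("cd");
        for (int i = 0; i < cds.getLength(); i++) {
            Element cd = (Element) cds.item(i);
            // Only the <artist> element is compared, so a matching
            // title can never be mistaken for a matching artist.
            String found = cd.getElementsByTagName("artist").item(0).getTextContent();
            if (found.equals(artist)) {
                titles.add(cd.getElementsByTagName("title").item(0).getTextContent());
            }
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(titlesByArtist(XML, "Rolling Stones"));
    }
}
```

Because every value is labeled, the search looks only at the artist field. A record with extra elements, such as a country for an imported CD, would not disturb this code at all.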

Don't double-wrap data; transform it

Should we also wrap HTML around the XML so it can be displayed in a browser? You could do that, but it is not the usual approach. XML is usually consumed by data-processing programs, not by a browser. The purpose of XML is to make it easy for enterprise programs to pass around data together with their structure.

It's much more common to keep the data as records in a database, extract and convert it into XML on demand, pass the XML around, then have a servlet or JSP program read the XML and transform it into HTML on the fly as it sends the data to a browser. The Java XSLT library, in package javax.xml.transform, does exactly that. Let us go on to make a few perhaps obvious remarks about the rules of XML.
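The transform step might look something like the sketch below, which uses the standard javax.xml.transform classes to turn the CD catalog back into the HTML table we started with. Treat it as an illustration under assumptions: the class name CatalogToHtml, the method toHtml, the <catalog> root element, and the small stylesheet are all invented for this example, and a real servlet would write to its response stream where this code writes to a string.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; not part of any library.
class CatalogToHtml {
    // Two catalog records inside an assumed <catalog> root element.
    static final String XML =
        "<catalog>"
      + "<cd><title>Some Girls</title><artist>Rolling Stones</artist>"
      + "<price>25</price><qty>5</qty></cd>"
      + "<cd><title>Tubthumper</title><artist>Chumbawamba</artist>"
      + "<price>17</price><qty>6</qty></cd>"
      + "</catalog>";

    // An illustrative XSLT 1.0 stylesheet: one table row per <cd> element.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html'/>"
      + "<xsl:template match='/catalog'>"
      + "<table>"
      + "<tr><th>title</th><th>artist</th><th>price</th><th>stock</th></tr>"
      + "<xsl:for-each select='cd'>"
      + "<tr>"
      + "<td><xsl:value-of select='title'/></td>"
      + "<td><xsl:value-of select='artist'/></td>"
      + "<td><xsl:value-of select='price'/></td>"
      + "<td><xsl:value-of select='qty'/></td>"
      + "</tr>"
      + "</xsl:for-each>"
      + "</table>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the XML and returns the resulting HTML text.
    static String toHtml(String xml, String xslt) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toHtml(XML, XSLT));
    }
}
```

The data stays in one neutral format, and the presentation lives entirely in the stylesheet. Swapping in a different stylesheet would produce a different page from the same XML without touching the Java code.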
