Creating Valid XML

As you have seen, XML validators recognize well-formed XML, which is very useful for picking up syntax errors in your document. Unfortunately, a well-formed, syntactically correct XML document may still contain semantic errors. For example, a job in Listing 16.4 with no location or skills does not make sense, yet a document that omits these elements is still well-formed. It is simply not valid.

What is required is a set of rules or constraints that define what is a valid structure for an XML document. There are two common methods for specifying XML rules—the Document Type Definition (DTD) and schemas.

Document Type Definitions

A DTD provides a template that defines the occurrence and arrangement of elements and attributes in an XML document. Using a DTD, you can define

  • Element ordering and hierarchy

  • Which attributes are associated with an element

  • Default values and enumeration values for attributes

  • Any entity references used in the document (internal constants, external files, and parameters)

Note

Entity references are covered in Appendix C, “An Overview of XML,” on the CD-ROM.


DTDs originated with SGML and have some disadvantages when compared with XML Schemas, which were developed explicitly for XML. One of these disadvantages is that a DTD is not written in XML, which means you have to learn another syntax to define a DTD.

DTD rules can be included in the XML document as document type declarations, or they can be stored in an external document. The syntax is the same in both cases.

If a DTD is being used, the XML document must include a DOCTYPE declaration, which is followed by the name of the root element for the XML document. If an external DTD is being used, the declaration also includes the word SYSTEM followed by a system identifier (the URI that identifies the location of the DTD file). For example

<!DOCTYPE jobSummary SYSTEM "jobSummary.dtd">

specifies that the root element for this XML document is jobSummary and the remainder of the DTD rules are in the file called jobSummary.dtd in the same directory.

An external identifier can also include a public identifier. The public identifier precedes the system identifier and is denoted by the word PUBLIC. An XML processor can use the public identifier to try to generate an alternative URI. If the document is unavailable by this method, the system identifier will be used.

<!DOCTYPE web-app PUBLIC '-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN' 'http://java.sun.com/dtd/web-app_2_3.dtd'>

Note

DOCTYPE, SYSTEM and PUBLIC must appear in capitals to be recognized.
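
The way a processor can act on a public identifier can be sketched in Java with the JAXP classes bundled in the JDK. This is only an illustration under stated assumptions: the class name DoctypeSketch, the public identifier, and the one-line stand-in DTD are all invented for the example. The entity resolver plays the role of the lookup described above. It serves a local copy of the DTD when it recognizes the public identifier and otherwise leaves the parser to use the system identifier.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

class DoctypeSketch {
    // Hypothetical public identifier, invented for this illustration.
    static final String PUBLIC_ID = "-//Example//DTD Job Summary 1.0//EN";

    static final String XML =
        "<?xml version=\"1.0\"?>\n"
      + "<!DOCTYPE jobSummary PUBLIC \"" + PUBLIC_ID + "\" \"jobSummary.dtd\">\n"
      + "<jobSummary/>";

    // Returns "<root element name> | <public identifier>" as read from the DOCTYPE.
    static String describeDoctype(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Serve a local stand-in DTD when the public identifier is recognized;
            // returning null tells the parser to fall back to the system identifier.
            builder.setEntityResolver((publicId, systemId) ->
                PUBLIC_ID.equals(publicId)
                    ? new InputSource(new StringReader("<!ELEMENT jobSummary EMPTY>"))
                    : null);
            DocumentType doctype =
                builder.parse(new InputSource(new StringReader(xml))).getDoctype();
            return doctype.getName() + " | " + doctype.getPublicId();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describeDoctype(XML));
    }
}
```

Because the resolver answers for the public identifier, no jobSummary.dtd file needs to exist for this sketch to run.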


Element Type Declarations

The DTD defines every element in the XML document with element type declarations. Each element type declaration takes the following form:

<!ELEMENT name ( content ) >

For example, for the jobSummary XML document in Listing 16.4, the jobSummary root element is defined as

<!ELEMENT jobSummary ( job* )>

The * sign indicates that the jobSummary element may consist of zero or more job elements. There are other symbols used to designate rules for combining elements and these are listed in Table 16.3.

Table 16.3. Occurrence Characters Used in DTD Definitions
Character   Meaning
*           Zero or more (not required)
+           One or more (at least one required)
?           Element is optional (if present can only appear once)
|           Alternate elements
,           Sequence of elements (in the order listed)
()          Group of elements

The following defines an XML job element that must include one location, an optional description, and at least one skill, in that order.

<!ELEMENT job (location, description?, skill+)>
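
To see the occurrence characters at work, the following Java sketch checks small job fragments against a content model. It assumes the model (location, description?, skill+), in which ? makes the description optional, and it places the declarations in an internal DTD so that no external file is needed. The class name ContentModelSketch and the sample fragments are invented for the illustration. The parser is the JAXP SAX parser bundled with the JDK.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class ContentModelSketch {
    // Internal DTD: one location, an optional description, at least one skill.
    static final String DTD =
        "<!DOCTYPE job [\n"
      + "  <!ELEMENT job (location, description?, skill+)>\n"
      + "  <!ELEMENT location (#PCDATA)>\n"
      + "  <!ELEMENT description (#PCDATA)>\n"
      + "  <!ELEMENT skill (#PCDATA)>\n"
      + "]>\n";

    // Returns the validity errors reported for one job element (empty list = valid).
    static List<String> validityErrors(String jobXml) {
        final List<String> errors = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // check against the DTD, not just well-formedness
            factory.newSAXParser().parse(
                new InputSource(new StringReader(DTD + jobXml)),
                new DefaultHandler() {
                    @Override
                    public void error(SAXParseException e) {
                        errors.add(e.getMessage()); // validity errors arrive here
                    }
                });
        } catch (Exception e) {
            throw new IllegalStateException("document is not well-formed", e);
        }
        return errors;
    }

    public static void main(String[] args) {
        // No description: allowed, because of the ? character.
        System.out.println(validityErrors(
            "<job><location>London</location><skill>Critic</skill></job>"));
        // No skill: rejected, because + requires at least one.
        System.out.println(validityErrors("<job><location>London</location></job>"));
    }
}
```

The exact wording of the error messages depends on the parser, so the sketch only distinguishes between an empty and a non-empty list.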

Defining the Element Content

Elements can contain other elements, text content, or a mixture of both. The jobSummary element in Listing 16.4 contains other elements but no text body, whereas the location element has a text body but does not contain any elements.

To define an element that has a text body, use the reference #PCDATA (Parsed Character DATA). For example, the location element in Listing 16.4 is defined by

<!ELEMENT location (#PCDATA)>

An element can also have no content (the <br> tag in HTML is such an example). This tag would be defined with the EMPTY keyword as

<!ELEMENT br EMPTY>

You will also see elements defined with contents of ANY. The ANY keyword denotes that the element can contain all possible elements, as well as PCDATA. The use of ANY should be avoided. If your data is so unstructured that it cannot be defined explicitly, there probably is no point in creating a DTD in the first place.

Defining Attributes

In Listing 16.4, the job element has two attributes—customer and reference. Attributes are defined in an ATTLIST that has the following form:

<!ATTLIST element attribute type default-value>

The element is the name of the element and attribute is the name of the attribute. The type defines the kind of attribute that is expected. A type is either one of the defined constants described in Table 16.4, or it is an enumerated type where the permitted values are given in a bracketed list.

Table 16.4. DTD Attribute Types
Type        Attribute Is a…
CDATA       Character string.
NMTOKEN     Name token (a string made up of XML name characters).
NMTOKENS    Multiple name tokens, separated by whitespace.
ID          Unique identifier.
IDREF       Reference to an element found elsewhere in the document. The value for IDREF must match the ID of another element.
ENTITY      Name of an external unparsed entity, such as a binary data file (for example, a GIF image).
ENTITIES    Multiple external unparsed entities.
NOTATION    Name of a declared notation, typically identifying a helper program.

The ATTLIST default-value component defines a value that will be used if one is not supplied. For example

<!ATTLIST button visible (true | false) "true">

defines that the element button has an attribute called visible that can be either true or false. If the attribute is not supplied, because of the default value it will be assumed to be true.

The default-value item can also be used to specify that the attribute is #REQUIRED, #FIXED, or #IMPLIED. The meaning of these values is given in Table 16.5.

Table 16.5. DTD Attribute Default Values
Default Value   Meaning
#REQUIRED       Attribute must be provided.
#FIXED          Effectively a constant declaration. The attribute must be set to the given value or the XML is not valid.
#IMPLIED        The attribute is optional and the processing application is allowed to use any appropriate value if required.
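
A short Java sketch can show the effect of an attribute default. It assumes the button declaration discussed above, wrapped in an internal DTD together with a hypothetical EMPTY declaration for the button element itself. The class name AttributeDefaultSketch is likewise invented. The JAXP DOM parser bundled with the JDK reads the internal DTD and supplies the default when the attribute is missing.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class AttributeDefaultSketch {
    // Internal DTD declaring the enumerated attribute with a default of "true".
    static final String DTD =
        "<!DOCTYPE button [\n"
      + "  <!ELEMENT button EMPTY>\n"
      + "  <!ATTLIST button visible (true | false) \"true\">\n"
      + "]>\n";

    // Parses a single button element and returns the value of its visible attribute.
    static String visibleValue(String buttonXml) {
        try {
            Element button = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(DTD + buttonXml)))
                .getDocumentElement();
            return button.getAttribute("visible");
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        // Attribute omitted: the parser fills in the default from the DTD.
        System.out.println(visibleValue("<button/>"));
        // Attribute supplied: the explicit value is used.
        System.out.println(visibleValue("<button visible=\"false\"/>"));
    }
}
```

Running main should print true followed by false. The application never sees a button without a visible attribute.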

Example DTD

Listing 16.5 is the DTD for the jobSummary XML document. Create the DTD in a file called jobSummary.dtd in the same directory as your jobSummary XML document.

Listing 16.5. DTD for jobSummary XML
<!ELEMENT jobSummary (job*)>
<!ELEMENT job (location, description, skill+)>
<!ATTLIST job customer CDATA #REQUIRED>
<!ATTLIST job reference CDATA #REQUIRED>
<!ELEMENT location (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT skill (#PCDATA)>

Don't forget to add the following line to the jobSummary XML at line 2 (following the PI):

<!DOCTYPE jobSummary SYSTEM "jobSummary.dtd">

View the jobSummary.xml document in your XML browser or other XML validator.

If the browser cannot find the DTD, it will generate an error. Edit jobSummary.xml, remove the customer attribute, and check that your XML validator generates an appropriate error (such as “Required attribute 'customer' is missing”).
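
If you prefer to run the same check from Java rather than a browser, the following sketch uses the JAXP DOM parser bundled with the JDK. To stay self-contained it embeds the declarations from Listing 16.5 as an internal DTD instead of reading jobSummary.dtd, and it uses a single sample job. The class name JobSummaryDtdSketch and the helper method are assumptions made for the illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class JobSummaryDtdSketch {
    // The rules of Listing 16.5, embedded as an internal DTD.
    static final String PROLOG =
        "<?xml version=\"1.0\"?>\n"
      + "<!DOCTYPE jobSummary [\n"
      + "  <!ELEMENT jobSummary (job*)>\n"
      + "  <!ELEMENT job (location, description, skill+)>\n"
      + "  <!ATTLIST job customer CDATA #REQUIRED>\n"
      + "  <!ATTLIST job reference CDATA #REQUIRED>\n"
      + "  <!ELEMENT location (#PCDATA)>\n"
      + "  <!ELEMENT description (#PCDATA)>\n"
      + "  <!ELEMENT skill (#PCDATA)>\n"
      + "]>\n";

    // Sample data for the illustration.
    static final String JOBS =
        "<jobSummary>"
      + "<job customer=\"winston\" reference=\"Cigar Trimmer\">"
      + "<location>London</location>"
      + "<description>Must like to talk and smoke</description>"
      + "<skill>Cigar maker</skill><skill>Critic</skill>"
      + "</job></jobSummary>";

    // Returns the validity errors for a complete document (empty list = valid).
    static List<String> validationErrors(String document) {
        final List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // enable DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }
                @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
                @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(document)));
        } catch (Exception e) {
            throw new IllegalStateException("document is not well-formed", e);
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(validationErrors(PROLOG + JOBS));
        // Remove the required customer attribute and validate again.
        String withoutCustomer = JOBS.replace(" customer=\"winston\"", "");
        System.out.println(validationErrors(PROLOG + withoutCustomer));
    }
}
```

The first call should report no errors. The second should report at least one, although the exact message text depends on the parser.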

Namespaces

When an individual designs an XML structure for some data, he or she is free to choose tag names that are appropriate for the data. Consequently, there is nothing to stop two individuals from using the same tag name for different purposes or in different ways. Consider the job agency that deals with two contract companies, each of which uses a different form of job description (such as those in Listings 16.3 and 16.4). How can an application differentiate between these different types of job descriptions?

The answer is to use namespaces. XML provides namespaces that can be used to impose a hierarchical structure on XML tag names in the same way that Java packages provide a naming hierarchy for Java classes. You can define a unique namespace with which you can qualify your tags to avoid them being confused with those from other XML authors.

An attribute called xmlns (XML Name Space) is added to an element tag in a document and is used to define the namespace. For example, line 2 in Listing 16.6 indicates that the tags for the whole of this document are scoped within the agency namespace.

Listing 16.6. XML Document with Namespace
 1: <?xml version ="1.0"?>
 2: <jobSummary xmlns="agency">
 3:   <job customer="winston" reference="Cigar Trimmer">
 4:     <location>London</location>
 5:     <description>Must like to talk and smoke</description>
 6:     <skill>Cigar maker</skill>
 7:     <skill>Critic</skill>
 8:   </job>
 9:   <job customer="george" reference="Tree pruner">
10:     <location>Washington</location>
11:     <description>Must be honest</description>
12:     <skill>Tree surgeon</skill>
13:   </job>
14: </jobSummary>
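
The scoping of a default namespace can be observed from Java. The sketch below is an illustration only: the class name DefaultNamespaceSketch and its helper methods are invented, and the document is a shortened copy of Listing 16.6. It uses the JAXP DOM parser bundled with the JDK, which must be told explicitly to be namespace aware.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DefaultNamespaceSketch {
    // Shortened copy of Listing 16.6.
    static final String XML =
        "<jobSummary xmlns=\"agency\">"
      + "<job customer=\"winston\" reference=\"Cigar Trimmer\">"
      + "<location>London</location>"
      + "</job></jobSummary>";

    static Element firstElement(String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP parsers ignore namespaces unless asked
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            // "*" matches the local name in any namespace.
            return (Element) doc.getElementsByTagNameNS("*", localName).item(0);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    // Namespace of the first element with the given local name.
    static String elementNamespace(String localName) {
        return firstElement(localName).getNamespaceURI();
    }

    // Namespace of an attribute on the first element with the given local name.
    static String attributeNamespace(String elementName, String attributeName) {
        return firstElement(elementName).getAttributeNode(attributeName).getNamespaceURI();
    }

    public static void main(String[] args) {
        System.out.println(elementNamespace("jobSummary"));
        System.out.println(elementNamespace("location"));
        System.out.println(attributeNamespace("job", "customer"));
    }
}
```

Both elements should report the agency namespace, because a default namespace applies to the element that declares it and to its descendants. The customer attribute should report null: a default namespace does not apply to attributes that have no prefix.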

The xmlns attribute can be added to any element in the document to enable scoping of elements, and multiple namespaces can be defined in the same document using a prefix. For example, Listing 16.7 has two namespaces—ad and be. All the tags have been prefixed with the appropriate namespace and now two different forms of the job tag (one with attributes and one without) can coexist in the same file.

Listing 16.7. XML Document with Namespaces
<?xml version ="1.0"?>
<jobSummary xmlns:ad="ADAgency" xmlns:be="BEAgency">
  <ad:job customer="winston" reference="Cigar Trimmer">
    <ad:location>London</ad:location>
    <ad:description>Must like to talk and smoke</ad:description>
    <ad:skill>Cigar maker</ad:skill>
    <ad:skill>Critic</ad:skill>
  </ad:job>
  <be:job>
    <be:customer>george</be:customer>
    <be:reference>Tree pruner</be:reference>
    <be:location>Washington</be:location>
    <be:description>Must be honest</be:description>
    <be:skill>Tree surgeon</be:skill>
  </be:job>
</jobSummary>
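
An application can use the namespace, rather than the prefix, to tell the two kinds of job apart. The following Java sketch counts elements by namespace and local name in a copy of the document from Listing 16.7. The class name PrefixedNamespaceSketch and the count method are assumptions made for the illustration. The parser is the JAXP DOM parser bundled with the JDK.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class PrefixedNamespaceSketch {
    // Copy of the document in Listing 16.7.
    static final String XML =
        "<jobSummary xmlns:ad=\"ADAgency\" xmlns:be=\"BEAgency\">"
      + "<ad:job customer=\"winston\" reference=\"Cigar Trimmer\">"
      + "<ad:location>London</ad:location>"
      + "<ad:description>Must like to talk and smoke</ad:description>"
      + "<ad:skill>Cigar maker</ad:skill>"
      + "<ad:skill>Critic</ad:skill>"
      + "</ad:job>"
      + "<be:job>"
      + "<be:customer>george</be:customer>"
      + "<be:reference>Tree pruner</be:reference>"
      + "<be:location>Washington</be:location>"
      + "<be:description>Must be honest</be:description>"
      + "<be:skill>Tree surgeon</be:skill>"
      + "</be:job>"
      + "</jobSummary>";

    // Counts the elements with the given local name in the given namespace.
    static int count(String namespaceUri, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the *NS lookup methods
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            return doc.getElementsByTagNameNS(namespaceUri, localName).getLength();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("ADAgency jobs: " + count("ADAgency", "job"));
        System.out.println("BEAgency jobs: " + count("BEAgency", "job"));
        System.out.println("ADAgency skills: " + count("ADAgency", "skill"));
    }
}
```

The lookup uses the namespace name (ADAgency or BEAgency), so the code would keep working if a document author chose different prefixes for the same namespaces.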
						

Enforcing Document Structure with an XML Schema

As already stated, DTDs predate XML, and they have some limitations:

  • A DTD cannot define type information other than characters.

  • DTDs were not designed to support namespaces and, although it is possible to add namespaces to a DTD, how to do so is beyond the scope of this book.

  • DTDs are not easily extended.

  • You can only have one DTD per document, so you cannot have different definitions of an element in a single document and have them validated with a DTD.

  • The syntax for DTDs is not XML. Tools and developers must understand the DTD syntax as well as XML.

To address these issues, a new structure definition mechanism was developed by the W3C to fulfil the role of DTDs while addressing the previously listed limitations. This mechanism is called an XML Schema. It uses XML to represent structure and type information.

The XML Schema standard is split into two parts:

  • Specifying the structure and constraints on an XML document

  • A way of defining data types, including a set of pre-defined types

Because it is a more powerful and flexible mechanism than DTDs, the syntax for defining an XML schema is slightly more involved. An example of an XML schema for the jobSummary XML shown in Listing 16.4 can be seen in Listing 16.8.

Tip

The World Wide Web Consortium provides an online XML schema validator. It can be accessed via www.w3.org/2001/03/webdata/xsv. If your schema is not accessible via the Web, you will have to upload the file to the W3C site.


Listing 16.8. XML Schema for Job Agency JobSummary XML Document
 1: <?xml version="1.0"?>
 2:   <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
 3:
 4:     <xsd:element name="jobSummary">
 5:       <xsd:complexType>
 6:         <xsd:sequence>
 7:           <xsd:element name="job" type="jobType" minOccurs="0" maxOccurs="unbounded"/>
 8:         </xsd:sequence>
 9:       </xsd:complexType>
10:     </xsd:element>
11:
12:     <xsd:complexType name="jobType">
13:       <xsd:sequence>
14:         <xsd:element name="location" type="xsd:string"/>
15:         <xsd:element name="description" type="xsd:string"/>
16:         <xsd:element name="skill" type="xsd:string" minOccurs="1" maxOccurs="unbounded"/>
17:       </xsd:sequence>
18:         <xsd:attribute name="customer" type="xsd:string" use="required"/>
19:         <xsd:attribute name="reference" type="xsd:string" use="required"/>
20:     </xsd:complexType>
21:   </xsd:schema>
						

The first thing to notice is that this schema exists within a namespace as defined on line 2. The string xsd is used by convention for a schema namespace, but any prefix can be used.

Schema Type Definitions and Element and Attribute Declarations

Elements that have sub-elements and/or attributes are defined as complex types. In addition to complex types, there are a number of built-in simple types. Examples of a few simple types are

  • string: Any combination of characters

  • integer: Whole number

  • float: Floating point number

  • boolean: true/false or 1/0

  • date: yyyy-mm-dd
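
The built-in simple types can be tried out with a few lines of Java. This sketch relies on the javax.xml.validation package, which is part of the JAXP version bundled with current JDKs rather than something this chapter covers. The class name SimpleTypeSketch and the element name value are invented for the illustration. The method builds a one-element schema for the requested type and reports whether a piece of text is acceptable.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SimpleTypeSketch {
    // Returns true if the text is a legal value of the named built-in type.
    static boolean isValid(String builtInType, String text) {
        // A one-element schema; "value" is a hypothetical element name.
        String xsd =
            "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
          + "<xsd:element name=\"value\" type=\"xsd:" + builtInType + "\"/>"
          + "</xsd:schema>";
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
            schema.newValidator().validate(
                new StreamSource(new StringReader("<value>" + text + "</value>")));
            return true;
        } catch (SAXException e) {
            // Thrown for an invalid value (and also for an unknown type name).
            return false;
        } catch (IOException e) {
            throw new IllegalStateException("could not read in-memory document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("integer", "42"));       // whole number
        System.out.println(isValid("integer", "4.2"));      // not a whole number
        System.out.println(isValid("date", "2002-03-15"));  // yyyy-mm-dd
        System.out.println(isValid("date", "15/03/2002"));  // wrong layout
    }
}
```

Running main should print true, false, true, false. This is the kind of type checking that a DTD cannot express.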

A complex type element (one with attributes or sub-elements) has to be defined in the schema and will typically contain a set of element declarations, element references, and attribute declarations. Line 12 of Listing 16.8 is the start of the definition for the job tag complex type, which contains three elements (location, description, and skill) and two attributes (customer and reference).

In a schema, like a DTD, elements can be made optional or required. The job element on line 7 is optional because the value of the minOccurs attribute is 0. In general, an element is required to appear when the value of minOccurs is 1 or more. Similarly, the maximum number of times an element can appear is determined by the value of maxOccurs. This value can be a positive integer or the term unbounded to indicate there is no maximum number of occurrences. The default value for both the minOccurs and the maxOccurs attributes is 1. If you do not specify the number of occurrences, the element must be present and must only occur once.

Element attributes (like those on lines 18 and 19) can be declared with a use attribute to indicate whether the element attribute is required, optional, or even prohibited.
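
The occurrence and use rules can be exercised from Java with the javax.xml.validation package. That package belongs to the JAXP version bundled with current JDKs and is an assumption here, not something this chapter covers. The sketch embeds the schema from Listing 16.8 (without its line numbers) and validates small documents against it. The class name JobSummarySchemaSketch and the sample documents are invented for the illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class JobSummarySchemaSketch {
    // The schema from Listing 16.8.
    static final String XSD =
        "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\" elementFormDefault=\"qualified\">"
      + "<xsd:element name=\"jobSummary\">"
      + "<xsd:complexType><xsd:sequence>"
      + "<xsd:element name=\"job\" type=\"jobType\" minOccurs=\"0\" maxOccurs=\"unbounded\"/>"
      + "</xsd:sequence></xsd:complexType>"
      + "</xsd:element>"
      + "<xsd:complexType name=\"jobType\">"
      + "<xsd:sequence>"
      + "<xsd:element name=\"location\" type=\"xsd:string\"/>"
      + "<xsd:element name=\"description\" type=\"xsd:string\"/>"
      + "<xsd:element name=\"skill\" type=\"xsd:string\" minOccurs=\"1\" maxOccurs=\"unbounded\"/>"
      + "</xsd:sequence>"
      + "<xsd:attribute name=\"customer\" type=\"xsd:string\" use=\"required\"/>"
      + "<xsd:attribute name=\"reference\" type=\"xsd:string\" use=\"required\"/>"
      + "</xsd:complexType>"
      + "</xsd:schema>";

    // Returns true if the document conforms to the schema.
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the validator throws on the first validity error by default
        } catch (IOException e) {
            throw new IllegalStateException("could not read in-memory document", e);
        }
    }

    public static void main(String[] args) {
        // No job elements at all: allowed, because minOccurs is 0.
        System.out.println(isValid("<jobSummary/>"));
        // A job with no skill: rejected, because skill has minOccurs of 1.
        System.out.println(isValid(
            "<jobSummary><job customer=\"george\" reference=\"Tree pruner\">"
          + "<location>Washington</location><description>Must be honest</description>"
          + "</job></jobSummary>"));
    }
}
```

Running main should print true followed by false. Leaving out the customer or reference attribute is rejected in the same way, because both are declared with use="required".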

There are more aspects to schemas than it is possible to cover in this book on J2EE. Visit the W3C Web site (www.w3.org) for more information on XML schemas and all other aspects of XML.

How XML Is Used in J2EE

XML is portable data, and the Java platform is portable code. Add Java APIs for XML that make it easy to use XML and, together, you have the ideal combination:

  • Portability of data

  • Portability of code

  • Ease of use

The J2EE platform bundles all these advantages together.

Enterprises are rapidly discovering the benefits of using J2EE for developing Web Services that use XML for the dissemination and integration of data, particularly because XML eases both the sharing of legacy data internally among departments and the sharing of any data with other enterprises.

J2EE includes the Java API for XML Processing (JAXP), which makes it easy to process XML data with applications written in Java. JAXP supports the following standards:

  • Simple API for XML (SAX) for parsing XML as a stream.

  • Document Object Model (DOM) to build an in-memory tree representation of an XML document.

  • Extensible Stylesheet Language Transformations (XSLT) to control the presentation of the data and convert it to other XML documents or to other formats, such as HTML. XSLT is covered on Day 17, “Transforming XML Documents.”

JAXP also provides namespace support, allowing you to work with multiple XML documents that might otherwise cause naming conflicts.
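
The difference between the SAX and DOM styles can be sketched in a few lines. The example below is an illustration under stated assumptions: the class name JaxpParsingSketch, its two methods, and the in-memory sample document are invented. It uses only the JAXP factories bundled with the JDK.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class JaxpParsingSketch {
    // Sample data for the illustration.
    static final String XML =
        "<jobSummary>"
      + "<job customer=\"winston\" reference=\"Cigar Trimmer\"><location>London</location></job>"
      + "<job customer=\"george\" reference=\"Tree pruner\"><location>Washington</location></job>"
      + "</jobSummary>";

    // SAX: the parser pushes events to a handler as it streams through the document.
    static List<String> customersViaSax(String xml) {
        final List<String> customers = new ArrayList<>();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes attributes) {
                        if ("job".equals(qName)) {
                            customers.add(attributes.getValue("customer"));
                        }
                    }
                });
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
        return customers;
    }

    // DOM: the whole document is loaded into a tree that is queried afterwards.
    static List<String> locationsViaDom(String xml) {
        List<String> locations = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("location");
            for (int i = 0; i < nodes.getLength(); i++) {
                locations.add(nodes.item(i).getTextContent());
            }
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
        return locations;
    }

    public static void main(String[] args) {
        System.out.println(customersViaSax(XML));
        System.out.println(locationsViaDom(XML));
    }
}
```

SAX suits large documents that are read once from start to finish, because nothing is kept in memory unless the handler stores it. DOM is more convenient when the application needs to navigate or modify the document after parsing.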

Internally, J2EE also uses XML to store configuration information about applications. You will have seen the deployment descriptor on many occasions while working through this book.
