Creating Valid XML

As you have seen, XML validators recognize well-formed XML, which is very useful for picking up syntax errors in your document. Unfortunately, a well-formed, syntactically correct XML document may still contain semantic errors. For example, a job in Listing 16.4 with no location or skills does not make sense, yet a document that omits these elements is still well-formed; it is simply not valid.

What is required is a set of rules or constraints that define a valid structure for an XML document. There are two common methods for specifying XML rules—the Document Type Definition (DTD) and XML Schemas.

Document Type Definitions

A DTD provides a template that defines the occurrence and arrangement of elements and attributes in an XML document. Using a DTD, you can define

  • Element ordering and hierarchy

  • Which attributes are associated with an element

  • Default values and enumeration values for attributes

  • Any entity references used in the document (internal constants, external files, and parameters)

NOTE

Entity references are covered in Appendix A, “An Overview of XML.”


DTDs originated with SGML and have some disadvantages when compared with XML Schemas, which were developed explicitly for XML. One of these disadvantages is that a DTD is not written in XML, which means you have to learn another syntax to define a DTD. Another disadvantage is that DTDs are not as comprehensive as XML Schemas and therefore cannot constrain an XML document as tightly.

DTD rules can be included in the XML document as document type declarations, or they can be stored in an external document. The syntax is the same in both cases.

If a DTD is being used, the XML document must include a DOCTYPE declaration, which is followed by the name of the root element for the XML document. If an external DTD is being used, the declaration also includes the word SYSTEM followed by a system identifier (the URI that identifies the location of the DTD file). For example

<!DOCTYPE jobSummary SYSTEM "jobSummary.dtd">

specifies that the root element for this XML document is jobSummary and the remainder of the DTD rules are in the file called jobSummary.dtd in the same directory.
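For comparison, the following is a minimal sketch of the other option mentioned above, where the rules are held inside the document itself. The memo element and its text are hypothetical and are not part of the job agency example; the element type declaration between the square brackets uses the syntax explained later in this section.

```xml
<?xml version="1.0"?>
<!-- Hypothetical document: the memo element is for illustration only -->
<!DOCTYPE memo [
  <!-- Internal rules sit between the square brackets -->
  <!ELEMENT memo (#PCDATA)>
]>
<memo>Interviews start on Monday.</memo>
```

Because the rules travel with the document, no separate DTD file has to be located when the document is validated.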

An external identifier can also include a public identifier. The public identifier precedes the system identifier and is denoted by the word PUBLIC. An XML processor can use the public identifier to try to generate an alternative URI. If the document is unavailable by this method, the system identifier will be used.

<!DOCTYPE web-app
 PUBLIC '-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN'
 'http://java.sun.com/dtd/web-app_2_3.dtd'>

NOTE

DOCTYPE, SYSTEM and PUBLIC must appear in capitals to be recognized.


Element Type Declarations

The DTD defines every element in the XML document with element type declarations. Each element type declaration takes the following form:

<!ELEMENT name ( content ) >

For example, for the jobSummary XML document in Listing 16.4, the jobSummary root element is defined as

<!ELEMENT jobSummary ( job* )>

The * sign indicates that the jobSummary element may consist of zero or more job elements. There are other symbols used to designate rules for combining elements and these are listed in Table 16.3.

Table 16.3. Occurrence Characters Used in DTD Definitions
Character   Meaning
*           Zero or more (not required)
+           One or more (at least one required)
?           Element is optional (if present can only appear once)
|           Alternate elements
()          Group of elements
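As a sketch of how the alternate and group characters combine, the following self-contained document declares a hypothetical contact element (the contact, name, phone, and email elements are assumptions made for illustration and do not appear in Listing 16.4). The comma separates elements that must appear in the order given.

```xml
<?xml version="1.0"?>
<!DOCTYPE contact [
  <!-- One name, followed by one or more phone or email elements in any mix -->
  <!ELEMENT contact (name, (phone | email)+)>
  <!ELEMENT name  (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
]>
<contact>
  <name>A. Recruiter</name>
  <email>recruiter@example.com</email>
  <phone>555-0100</phone>
</contact>
```

A contact with a name but neither a phone nor an email would be well-formed but not valid, because the group must occur at least once.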

The following defines an XML job element that must include one location, an optional description, and at least one skill:

<!ELEMENT job (location, description?, skill+)>
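To make the rule concrete, here is a sketch of two job elements checked against that declaration alone. The element values are made up, and the job attributes are omitted for brevity.

```xml
<jobSummary>
  <!-- Matches the rule: one location, no description, two skills -->
  <job>
    <location>London</location>
    <skill>Java</skill>
    <skill>XML</skill>
  </job>

  <!-- Breaks the rule: skill+ requires at least one skill -->
  <job>
    <location>London</location>
    <description>Hypothetical job with no skills listed</description>
  </job>
</jobSummary>
```

Both job elements are well-formed; only the first satisfies the content rule.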

Defining the Element Content

Elements can contain other elements, text content, or a mixture of both. The jobSummary element in Listing 16.4 contains other elements but no text body, whereas the location element has a text body but does not contain any elements.

To define an element that has a text body, use the reference #PCDATA (Parsed Character DATA). For example, the location element in Listing 16.4 is defined by

<!ELEMENT location (#PCDATA)>

An element can also have no content (the <br> tag in HTML is such an example). This tag would be defined with the EMPTY keyword as

<!ELEMENT br EMPTY>

You will also see elements defined with contents of ANY. The ANY keyword denotes that the element can contain any of the elements declared in the DTD, as well as PCDATA. The use of ANY should be avoided. If your data is so unstructured that it cannot be defined explicitly, there is probably no point in creating a DTD in the first place.
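An element that mixes text and sub-elements can be declared more precisely than with ANY. The following is a sketch using hypothetical para and emphasis elements (neither appears in Listing 16.4). In this form, #PCDATA must be listed first and the group must end with *.

```xml
<?xml version="1.0"?>
<!DOCTYPE para [
  <!-- Mixed content: text interleaved with any number of emphasis elements -->
  <!ELEMENT para (#PCDATA | emphasis)*>
  <!ELEMENT emphasis (#PCDATA)>
]>
<para>Must have <emphasis>five years</emphasis> of experience.</para>
```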

Defining Attributes

In Listing 16.4, the job element has two attributes—customer and reference. Attributes are defined in an ATTLIST that has the following form:

<!ATTLIST element attribute type default-value>

The element is the name of the element and attribute is the name of the attribute. The type defines the kind of attribute that is expected. A type is either one of the defined constants described in Table 16.4, or it is an enumerated type where the permitted values are given in a bracketed list.

Table 16.4. DTD Attribute Types
Type        Attribute Is a…
CDATA       Character string.
NMTOKEN     Valid XML name token.
NMTOKENS    Multiple XML name tokens, separated by whitespace.
ID          Unique identifier.
IDREF       Reference to an element found elsewhere in the document. The value for IDREF must match the ID of another element.
ENTITY      Name of an external unparsed entity, such as a binary data file (for example, a GIF image).
ENTITIES    Multiple external unparsed entities.
NOTATION    Name of a declared notation, typically identifying a helper program.

The ATTLIST default-value component defines a value that will be used if one is not supplied. For example

<!ATTLIST button visible (true | false) "true">

declares that the button element has an attribute called visible that can be either true or false. Because a default value is given, the attribute is treated as true whenever it is not supplied.
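The effect of the default can be sketched with a small self-contained document. The enclosing panel element is hypothetical and exists only to hold several buttons.

```xml
<?xml version="1.0"?>
<!DOCTYPE panel [
  <!ELEMENT panel (button*)>
  <!ELEMENT button EMPTY>
  <!ATTLIST button visible (true | false) "true">
]>
<panel>
  <!-- No attribute supplied: a validating parser reports visible as "true" -->
  <button/>
  <!-- An explicit value overrides the default -->
  <button visible="false"/>
</panel>
```

A value outside the enumerated list, such as visible="maybe", would make the document not valid.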

The default-value item can also be used to specify that the attribute is #REQUIRED, #FIXED, or #IMPLIED. The meaning of these values is given in Table 16.5.

Table 16.5. DTD Attribute Default Values
Default Value   Meaning
#REQUIRED       Attribute must be provided.
#FIXED          Effectively a constant declaration. If the attribute is supplied, it must be set to the given value or the XML is not valid.
#IMPLIED        The attribute is optional and the processing application is allowed to use any appropriate value if required.
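The three keywords can be seen side by side in the following sketch. The vacancy element and its reference, currency, and contact attributes are hypothetical names chosen for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE vacancy [
  <!ELEMENT vacancy EMPTY>
  <!-- Must always be supplied -->
  <!ATTLIST vacancy reference CDATA #REQUIRED>
  <!-- Constant: if supplied, the value must be "GBP" -->
  <!ATTLIST vacancy currency CDATA #FIXED "GBP">
  <!-- Optional, with no default value -->
  <!ATTLIST vacancy contact CDATA #IMPLIED>
]>
<vacancy reference="REF-001"/>
```

This document is valid even though currency and contact are omitted; removing reference would make it not valid.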

Example DTD

Listing 16.7 is the DTD for the jobSummary XML document. Create the DTD in a file called jobSummary.dtd in the same directory as your jobSummary XML document.

Listing 16.7. DTD for jobSummary XML
<!ELEMENT jobSummary (job*)>
<!ELEMENT job (location, description, skill+)>
<!ATTLIST job customer CDATA #REQUIRED>
<!ATTLIST job reference CDATA #REQUIRED>
<!ELEMENT location (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT skill (#PCDATA)>

Don't forget to add the following line to the jobSummary XML at line 2 (following the XML declaration):

<!DOCTYPE jobSummary SYSTEM "jobSummary.dtd">

View the jobSummary.xml document in your XML browser or other XML validator.

If the browser cannot find the DTD, it will generate an error. Edit jobSummary.xml, remove the customer attribute, and check that your XML validator generates an appropriate error (such as “Required attribute 'customer' is missing”).
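For reference, a document of the following shape should validate against the DTD in Listing 16.7. This is a sketch only: the attribute and element values are invented for illustration and are not taken from Listing 16.4.

```xml
<?xml version="1.0"?>
<!DOCTYPE jobSummary SYSTEM "jobSummary.dtd">
<jobSummary>
  <!-- Both attributes are #REQUIRED in the DTD -->
  <job customer="ExampleCo" reference="REF-001">
    <location>London</location>
    <description>Hypothetical job used for illustration</description>
    <!-- skill+ : at least one skill must be present -->
    <skill>Java</skill>
    <skill>XML</skill>
  </job>
</jobSummary>
```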

XML Schemas

As already stated, DTDs have some limitations:

  • A DTD cannot define type information other than characters.

  • DTDs were not designed to support namespaces and, although it is possible to add namespaces to a DTD, how to do so is beyond the scope of this book.

  • DTDs are not easily extended.

  • You can only have one DTD per document, so you cannot have different definitions of an element in a single document and have them validated with a DTD.

  • The syntax for DTDs is not XML. Tools and developers must understand the DTD syntax as well as XML.

To address these issues, the W3C developed the XML Schema structure definition mechanism, which fulfills the role of DTDs without the limitations just listed. XML Schemas are themselves XML documents.

The XML Schema standard is split into two parts:

  • A way of specifying the structure of, and constraints on, an XML document

  • A way of defining data types, including a set of pre-defined types

Because it is a more powerful and flexible mechanism than DTDs, the syntax for defining an XML schema is slightly more involved. An example of an XML schema for the jobSummary XML shown in Listing 16.4 can be seen in Listing 16.8.

TIP

The World Wide Web Consortium Web site provides access to a number of XML schema tools, including XML schema browsers and validators. These tools can be found at http://www.w3.org/XML/Schema.


Listing 16.8. XML Schema for Job Agency JobSummary XML Document
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            elementFormDefault="qualified">

  <xsd:element name="jobSummary">
   <xsd:complexType>
    <xsd:sequence>
     <xsd:element name="job" type="jobType"
                  minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
   </xsd:complexType>
  </xsd:element>

  <xsd:complexType name="jobType">
   <xsd:sequence>
    <xsd:element name="location" type="xsd:string"/>
    <xsd:element name="description" type="xsd:string"/>
    <xsd:element name="skill" type="xsd:string"
                 minOccurs="1" maxOccurs="unbounded"/>
   </xsd:sequence>
   <xsd:attribute name="customer" type="xsd:string" use="required"/>
   <xsd:attribute name="reference" type="xsd:string" use="required"/>
  </xsd:complexType>
</xsd:schema>

The first thing to notice is that this schema exists within a namespace as defined on the second line. The string xsd is used by convention for a schema namespace, but any prefix can be used.
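An XML document can point a validator at a schema from its root element. The sketch below assumes that Listing 16.8 has been saved as jobSummary.xsd in the same directory (the file name is an assumption, as are the element values). Because the schema declares no target namespace, the noNamespaceSchemaLocation attribute is the appropriate hint; a processor is free to locate the schema by other means.

```xml
<?xml version="1.0"?>
<jobSummary xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:noNamespaceSchemaLocation="jobSummary.xsd">
  <job customer="ExampleCo" reference="REF-001">
    <location>London</location>
    <description>Hypothetical job used for illustration</description>
    <skill>Java</skill>
  </job>
</jobSummary>
```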

Schema Type Definitions and Element and Attribute Declarations

Elements that have sub-elements and/or attributes are defined as complex types. In addition to complex types, there are a number of built-in simple types. Examples of a few simple types are

  • string: Any combination of characters

  • integer: Whole number

  • float: Floating point number

  • boolean: true/false or 1/0

  • date: yyyy-mm-dd
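As a sketch of how these simple types are applied, the following small schema declares a hypothetical vacancy element (the element names are assumptions and are not part of Listing 16.8), with one sub-element for each of the types listed above.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="vacancy">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="title"      type="xsd:string"/>
        <!-- Rejects values such as "three" or "2.5" -->
        <xsd:element name="positions"  type="xsd:integer"/>
        <xsd:element name="hourlyRate" type="xsd:float"/>
        <!-- Accepts true, false, 1, or 0 -->
        <xsd:element name="remote"     type="xsd:boolean"/>
        <!-- Must be written as yyyy-mm-dd -->
        <xsd:element name="startDate"  type="xsd:date"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

This kind of type checking is what a DTD cannot express, because a DTD treats all such content as character data.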

A complex type element (one with attributes or sub-elements) has to be defined in the schema and will typically contain a set of element declarations, element references, and attribute declarations. Listing 16.8 contains the definition of jobType, the complex type used by the job element, which contains three elements (location, description, and skill) and two attributes (customer and reference).

In a schema, like a DTD, elements can be made optional or required. The job element in Listing 16.8 is optional because the value of the minOccurs attribute is 0. In general, an element is required to appear when the value of minOccurs is 1 or more. Similarly, the maximum number of times an element can appear is determined by the value of maxOccurs. This value can be a positive integer or the term unbounded to indicate there is no maximum number of occurrences. The default value for both the minOccurs and the maxOccurs attributes is 1. If you do not specify the number of occurrences, the element must be present and must occur only once.
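The occurrence rules can be sketched with a variation on the jobType definition. The changes from Listing 16.8 (an optional description and an upper limit of five skills) are assumptions made purely for illustration.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="jobType">
    <xsd:sequence>
      <!-- No occurrence attributes: must appear exactly once -->
      <xsd:element name="location" type="xsd:string"/>
      <!-- minOccurs="0": optional, at most once (maxOccurs defaults to 1) -->
      <xsd:element name="description" type="xsd:string" minOccurs="0"/>
      <!-- Between one and five skills -->
      <xsd:element name="skill" type="xsd:string"
                   minOccurs="1" maxOccurs="5"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```

Note that an explicit upper limit such as five cannot be expressed with the DTD occurrence characters, which offer only zero or one, zero or more, and one or more.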

Attributes can be declared with a use attribute to indicate whether the attribute is required, optional, or even prohibited.
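The following sketch shows required and optional attributes together. The priority attribute and its default value are hypothetical additions and are not part of Listing 16.8. When use is omitted, an attribute is optional.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="jobType">
    <!-- Must be present on every element of this type -->
    <xsd:attribute name="customer"  type="xsd:string" use="required"/>
    <xsd:attribute name="reference" type="xsd:string" use="required"/>
    <!-- May be omitted; the default value then applies -->
    <xsd:attribute name="priority"  type="xsd:string"
                   use="optional" default="normal"/>
  </xsd:complexType>
</xsd:schema>
```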

There are more aspects to schemas than it is possible to cover in this book. Visit the W3C Web site (www.w3.org) for more information on XML schemas and all other aspects of XML.
