If XML is used to transfer information between applications, there needs to be a mechanism for ensuring that the XML is not only syntactically correct (well-formed) but also structurally correct (valid). Two common mechanisms are available for this:
Document Type Definitions
XML Schemas
A Document Type Definition (DTD) is a way of defining the structure of an XML document. DTD declarations can be included in the XML document itself or in a separate external document. The syntax used to write a DTD is different from XML syntax.
The following is an example DTD that describes the jobSummary XML:
<!DOCTYPE jobSummary>
<!ELEMENT jobSummary (job*)>
<!ELEMENT job (location, description?, skill*)>
<!ATTLIST job customer CDATA #REQUIRED>
<!ATTLIST job reference CDATA #REQUIRED>
<!ELEMENT location (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT skill (#PCDATA)>
The !DOCTYPE declaration must include the name of the root element. If the remaining declarations are stored in an external file, the !DOCTYPE declaration takes the following form:
<!DOCTYPE root_element SYSTEM "external_filename">
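The following minimal Java sketch shows the external form in action. It validates documents with the JAXP parser bundled with the JDK. It illustrates the idea under stated assumptions and is not part of the original example. The file name jobSummary.dtd and the job data are hypothetical. Instead of reading a real file, an org.xml.sax.EntityResolver hands the parser the DTD text from memory. The external file holds only the !ELEMENT and !ATTLIST declarations; the !DOCTYPE declaration itself stays in the XML document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class ExternalDtdDemo {
    // Hypothetical contents of the external file jobSummary.dtd (no !DOCTYPE line).
    static final String DTD_FILE =
          "<!ELEMENT jobSummary (job*)>\n"
        + "<!ELEMENT job (location, description?, skill*)>\n"
        + "<!ATTLIST job customer CDATA #REQUIRED>\n"
        + "<!ATTLIST job reference CDATA #REQUIRED>\n"
        + "<!ELEMENT location (#PCDATA)>\n"
        + "<!ELEMENT description (#PCDATA)>\n"
        + "<!ELEMENT skill (#PCDATA)>\n";

    // Prolog of an instance document: names the root element and the external DTD.
    static final String PROLOG =
        "<?xml version='1.0'?>\n<!DOCTYPE jobSummary SYSTEM 'jobSummary.dtd'>\n";

    /** Returns true if xml is well-formed and valid against jobSummary.dtd. */
    static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // check the document against its DTD
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Serve jobSummary.dtd from memory; returning null for any other
            // system ID lets the parser resolve it in its default way.
            builder.setEntityResolver((publicId, systemId) ->
                systemId != null && systemId.endsWith("jobSummary.dtd")
                    ? new InputSource(new StringReader(DTD_FILE))
                    : null);
            // Validity errors are only reported by default, so rethrow them.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXParseException { throw e; }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) { // configuration, I/O, well-formedness or validity failure
            return false;
        }
    }
}
```

Catching every Exception keeps the sketch short. Real code would usually separate configuration problems from invalid documents.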
If the definitions are included in the XML document itself, the !DOCTYPE declaration must appear in the document prolog, before the root element. In this case, the !DOCTYPE declaration encloses all the DTD declarations in square brackets, using the following syntax:
<!DOCTYPE jobSummary [
  <!ELEMENT jobSummary (job*)>
  <!ELEMENT job (location, description?, skill*)>
  <!ATTLIST job customer CDATA #REQUIRED>
  <!ATTLIST job reference CDATA #REQUIRED>
  <!ELEMENT location (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT skill (#PCDATA)>
]>
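The following Java sketch parses a document that carries this internal subset, with DTD validation switched on through JAXP. It is an illustration and is not from the original text. It also shows why the root element name in !DOCTYPE matters: a document whose root element is job rather than jobSummary is reported as invalid. The attribute values acme and J1 are hypothetical sample data.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class InternalDtdDemo {
    // The internal DTD subset from the text, placed in the document prolog.
    static final String DOCTYPE =
          "<!DOCTYPE jobSummary [\n"
        + "  <!ELEMENT jobSummary (job*)>\n"
        + "  <!ELEMENT job (location, description?, skill*)>\n"
        + "  <!ATTLIST job customer CDATA #REQUIRED>\n"
        + "  <!ATTLIST job reference CDATA #REQUIRED>\n"
        + "  <!ELEMENT location (#PCDATA)>\n"
        + "  <!ELEMENT description (#PCDATA)>\n"
        + "  <!ELEMENT skill (#PCDATA)>\n"
        + "]>\n";

    /** Returns true if xml is well-formed and valid against its internal DTD. */
    static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // enforce the declarations in the DOCTYPE
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Turn reported validity errors into exceptions.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXParseException { throw e; }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) { // any parse or validation failure
            return false;
        }
    }
}
```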
The remaining declarations, !ELEMENT and !ATTLIST, are described in the rest of this section.
Element declarations take the following form:
<!ELEMENT element_name (content)>
where element_name is the name of the element and content is one or more of the values shown in Table C.2.
Note
#PCDATA limits the content of the element to character data only; nested elements are not allowed. Do not confuse it with CDATA sections in XML, which are used to present large areas of uninterpreted text.
The characters in Table C.3 can be used to combine multiple element content types to define more complex elements.
The following is a declaration for the job element:
<!ELEMENT job (location, description?, skill*)>
The job element consists of, in order, exactly one location, an optional description, and zero or more skill elements.
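One way to check this reading of the content model is to give a validating parser a few hypothetical job elements. The following Java sketch uses JAXP, and its sample data is made up. It declares job as the root element purely to keep the test documents short. Only child orderings that match (location, description?, skill*) are accepted.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class JobContentModelDemo {
    // job is the root element here only to keep the sample documents small.
    static final String DOCTYPE =
          "<!DOCTYPE job [\n"
        + "  <!ELEMENT job (location, description?, skill*)>\n"
        + "  <!ATTLIST job customer CDATA #REQUIRED>\n"
        + "  <!ATTLIST job reference CDATA #REQUIRED>\n"
        + "  <!ELEMENT location (#PCDATA)>\n"
        + "  <!ELEMENT description (#PCDATA)>\n"
        + "  <!ELEMENT skill (#PCDATA)>\n"
        + "]>\n";

    /** Wraps the given child elements in a job element and validates the result. */
    static boolean isValidJob(String children) {
        String xml = DOCTYPE + "<job customer='acme' reference='J1'>" + children + "</job>";
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Make content-model violations fatal instead of merely reported.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXParseException { throw e; }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}
```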
Attribute declarations take the following form:
<!ATTLIST element_name
          attribute_1_name (type) default-value
          attribute_2_name (type) default-value>
An attribute type can be any one of the types shown in Table C.4, though CDATA (text) is the most common.
The default-value item can also be used to specify that the attribute is #REQUIRED, #FIXED, or #IMPLIED. The meanings of these values are presented in Table C.5.
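A validating JAXP parser makes these default values observable. The following Java sketch relies on standard DTD semantics:

- A #REQUIRED attribute must be present.
- An #IMPLIED attribute may be omitted and then has no value.
- A #FIXED attribute may only take its declared value.
- A quoted literal supplies a default when the attribute is omitted.

The notes, status, and version attributes are hypothetical additions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class AttributeDefaultsDemo {
    // notes, status and version are hypothetical attributes added for illustration.
    static final String DOCTYPE =
          "<!DOCTYPE job [\n"
        + "  <!ELEMENT job (#PCDATA)>\n"
        + "  <!ATTLIST job\n"
        + "            customer CDATA #REQUIRED\n"
        + "            notes    CDATA #IMPLIED\n"
        + "            status   CDATA 'open'\n"
        + "            version  CDATA #FIXED '1.0'>\n"
        + "]>\n";

    /** Parses xml with DTD validation; any validity error becomes an exception. */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) throws SAXParseException { throw e; }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    static boolean isValid(String xml) {
        try {
            parse(xml);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    /** Value of the named attribute on the root element, including DTD-supplied defaults. */
    static String attribute(String xml, String name) {
        try {
            // getAttribute returns "" when the attribute is absent and has no default.
            return parse(xml).getDocumentElement().getAttribute(name);
        } catch (Exception e) {
            throw new IllegalArgumentException("invalid document", e);
        }
    }
}
```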
Another DTD declaration not mentioned so far is the entity declaration, which defines content that can be inserted with an entity reference. Entities come in more than one form. The first, an internal general entity, provides shorthand for often-used text and has the following format:
<!ENTITY name "replacement text">
Note
This is, in fact, how the predefined special-character entities work. The character entity &amp; is defined as <!ENTITY amp "&#38;#38;">.
An entity called name is referenced in the XML document as &name;, as shown in the following:
<!DOCTYPE book [
  ...
  <!ENTITY copyright "Copyright 2002 by Sams Publishing">
]>
<book title="J2EE in 21 Days">A very useful book &copyright;</book>
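The following Java sketch assumes the JDK's built-in JAXP DOM parser. It parses the book example with the elided "..." declarations left out, and shows that the parser replaces &copyright; with the entity's replacement text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class GeneralEntityDemo {
    // The book example, without the elided declarations.
    static final String XML =
          "<!DOCTYPE book [\n"
        + "  <!ENTITY copyright 'Copyright 2002 by Sams Publishing'>\n"
        + "]>\n"
        + "<book title='J2EE in 21 Days'>A very useful book &copyright;</book>";

    /** Parses XML and returns the book element's text with entities expanded. */
    static String bookText() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Replace &copyright; by its text instead of keeping an EntityReference
            // node (this is already the factory's default, set here for clarity).
            factory.setExpandEntityReferences(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Here bookText() yields "A very useful book Copyright 2002 by Sams Publishing".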
The second form, an external entity, provides a mechanism for including data from external sources in the document's contents. It has the following format:
<!ENTITY name SYSTEM "URI">
For example, suppose the file Copy.xml, which can be retrieved from the Sams Web site, contains the following XML fragment:
<copyright>
  <date>2002</date>
  <publisher>Sams Publishing</publisher>
</copyright>
This fragment can then be referenced in any XML document as follows:
<!DOCTYPE book [
  ...
  <!ENTITY copyright SYSTEM "http://www.samspublishing.com/xml/Copy.xml">
]>
<book>
  <title>J2EE in 21 Days</title>
  &copyright;
  <synopsis>All you need to know about J2EE</synopsis>
</book>
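In Java, the JAXP parser fetches external entities itself unless an org.xml.sax.EntityResolver supplies them. The following sketch is an illustration, not a definitive recipe. It serves a hypothetical in-memory copy of Copy.xml for the Sams URI so that it runs without network access, and it omits the elided "..." declarations.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class ExternalEntityDemo {
    static final String COPY_URI = "http://www.samspublishing.com/xml/Copy.xml";

    // Hypothetical stand-in for the contents of Copy.xml, served from memory.
    static final String COPY_XML =
        "<copyright><date>2002</date><publisher>Sams Publishing</publisher></copyright>";

    static final String BOOK =
          "<!DOCTYPE book [\n"
        + "  <!ENTITY copyright SYSTEM '" + COPY_URI + "'>\n"
        + "]>\n"
        + "<book>\n"
        + "  <title>J2EE in 21 Days</title>\n"
        + "  &copyright;\n"
        + "  <synopsis>All you need to know about J2EE</synopsis>\n"
        + "</book>";

    /** Parses BOOK, resolving &copyright; to COPY_XML, and returns the publisher text. */
    static String publisher() {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Intercept the external entity's system ID instead of going to the Web.
            builder.setEntityResolver((publicId, systemId) ->
                systemId != null && systemId.endsWith("/xml/Copy.xml")
                    ? new InputSource(new StringReader(COPY_XML))
                    : null);
            Document doc = builder.parse(new InputSource(new StringReader(BOOK)));
            // The entity's elements now appear inside book like any other content.
            Element publisher = (Element) doc.getElementsByTagName("publisher").item(0);
            return publisher.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

External entities can pull arbitrary files or URLs into a document. For that reason, code that parses untrusted XML commonly restricts or disables external entity resolution.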
Like DTDs, an XML Schema can be used to specify the structure of an XML document. In addition, it has many advantages over DTDs:
Schemas have a way of defining data types, including a set of pre-defined types.
A schema is namespace aware.
It is possible to precisely specify the number of occurrences of an element (as opposed to a DTD's imprecise use of ?, *, and +) with the minOccurs and maxOccurs attributes.
The values that can be assigned to predefined types can be restricted.
A schema is written in XML.
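The following Java sketch illustrates restricting a predefined type. It compiles a small schema with the JAXP javax.xml.validation API. The yearsExperience element and its 0 to 50 range are hypothetical. The restriction applies the xsd:minInclusive and xsd:maxInclusive facets to xsd:integer.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class RestrictionDemo {
    // Hypothetical element whose value is an integer restricted to 0..50.
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + " <xsd:element name='yearsExperience'>"
        + "  <xsd:simpleType>"
        + "   <xsd:restriction base='xsd:integer'>"
        + "    <xsd:minInclusive value='0'/>"
        + "    <xsd:maxInclusive value='50'/>"
        + "   </xsd:restriction>"
        + "  </xsd:simpleType>"
        + " </xsd:element>"
        + "</xsd:schema>";

    static final Schema SCHEMA = compile(XSD);

    /** Compiles a W3C XML Schema; a broken schema is treated as a programming error. */
    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema does not compile", e);
        }
    }

    /** Returns true if xml conforms; the validator's default handler throws on errors. */
    static boolean isValid(String xml) {
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```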
The following is a schema to define the jobSummary XML:
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            elementFormDefault="qualified">
  <xsd:element name="jobSummary">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="job" type="jobType"
                     minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:complexType name="jobType">
    <xsd:sequence>
      <xsd:element name="location" type="xsd:string"/>
      <xsd:element name="description" type="xsd:string"/>
      <xsd:element name="skill" type="xsd:string"
                   minOccurs="1" maxOccurs="unbounded"/>
    </xsd:sequence>
    <xsd:attribute name="customer" type="xsd:string" use="required"/>
    <xsd:attribute name="reference" type="xsd:string" use="required"/>
  </xsd:complexType>
</xsd:schema>
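The following Java sketch validates instance documents against this schema using javax.xml.validation; the job data is hypothetical. Unlike the DTD shown earlier, this schema requires a description and at least one skill in every job. As a result, a job that the DTD accepts can fail here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class JobSummarySchemaDemo {
    // The jobSummary schema from the text (single quotes avoid Java escapes).
    static final String XSD =
          "<?xml version='1.0'?>"
        + "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema' elementFormDefault='qualified'>"
        + " <xsd:element name='jobSummary'>"
        + "  <xsd:complexType><xsd:sequence>"
        + "   <xsd:element name='job' type='jobType' minOccurs='0' maxOccurs='unbounded'/>"
        + "  </xsd:sequence></xsd:complexType>"
        + " </xsd:element>"
        + " <xsd:complexType name='jobType'>"
        + "  <xsd:sequence>"
        + "   <xsd:element name='location' type='xsd:string'/>"
        + "   <xsd:element name='description' type='xsd:string'/>"
        + "   <xsd:element name='skill' type='xsd:string' minOccurs='1' maxOccurs='unbounded'/>"
        + "  </xsd:sequence>"
        + "  <xsd:attribute name='customer' type='xsd:string' use='required'/>"
        + "  <xsd:attribute name='reference' type='xsd:string' use='required'/>"
        + " </xsd:complexType>"
        + "</xsd:schema>";

    static final Schema SCHEMA = compile(XSD);

    /** Compiles a W3C XML Schema; a broken schema is treated as a programming error. */
    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema does not compile", e);
        }
    }

    /** Returns true if xml conforms; the validator's default handler throws on errors. */
    static boolean isValid(String xml) {
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```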
In schemas, elements can have a type attribute that can be one of the following:
There are considerably more predefined simple data types. A full list can be obtained from the W3C Web site.
Alternatively, an element can have a complex type, which combines child elements, or child elements and text.
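The following Java sketch shows predefined simple types and a complex type together. It validates a hypothetical contract element whose startDate child is an xsd:date and whose days child is an xsd:positiveInteger. The element names are illustrative assumptions; the two types come from the W3C XML Schema built-in set.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SimpleTypesDemo {
    // Hypothetical complex type built from two predefined simple types.
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + " <xsd:element name='contract'>"
        + "  <xsd:complexType>"
        + "   <xsd:sequence>"
        + "    <xsd:element name='startDate' type='xsd:date'/>"
        + "    <xsd:element name='days' type='xsd:positiveInteger'/>"
        + "   </xsd:sequence>"
        + "  </xsd:complexType>"
        + " </xsd:element>"
        + "</xsd:schema>";

    static final Schema SCHEMA = compile(XSD);

    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema does not compile", e);
        }
    }

    /** Builds a contract instance from raw sample text (no escaping is done). */
    static String contract(String startDate, String days) {
        return "<contract><startDate>" + startDate + "</startDate><days>"
                + days + "</days></contract>";
    }

    /** Returns true if xml conforms; type errors make the validator throw. */
    static boolean isValid(String xml) {
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```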
The number of times an element can appear is controlled by two attributes:
minOccurs
maxOccurs
For example, the following skill element must appear at least once and can occur any number of times.
<xsd:element name="skill" type="xsd:string" minOccurs="1" maxOccurs="unbounded"/>
Elements can be made optional by setting the value of the minOccurs attribute to 0.
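The following Java sketch uses a hypothetical schema and the JAXP javax.xml.validation API. It makes skill optional with minOccurs="0" and caps it at three occurrences with maxOccurs="3". The DTD indicators ?, *, and + cannot state that limit directly.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class OccursDemo {
    // Hypothetical schema: a job may contain between zero and three skills.
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + " <xsd:element name='job'>"
        + "  <xsd:complexType>"
        + "   <xsd:sequence>"
        + "    <xsd:element name='skill' type='xsd:string' minOccurs='0' maxOccurs='3'/>"
        + "   </xsd:sequence>"
        + "  </xsd:complexType>"
        + " </xsd:element>"
        + "</xsd:schema>";

    static final Schema SCHEMA = compile(XSD);

    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema does not compile", e);
        }
    }

    /** Builds a job element containing n skill children. */
    static String jobWithSkills(int n) {
        StringBuilder xml = new StringBuilder("<job>");
        for (int i = 0; i < n; i++) {
            xml.append("<skill>skill").append(i).append("</skill>");
        }
        return xml.append("</job>").toString();
    }

    /** Returns true if xml conforms; occurrence violations make the validator throw. */
    static boolean isValid(String xml) {
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```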
Attribute declarations can include a use attribute to indicate whether the attribute is required, optional, or even prohibited.
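The following Java sketch shows the JAXP validator enforcing use="required" and use="optional". The job element and its priority attribute are hypothetical. The prohibited case is left out because it mainly matters when a type is derived by restriction.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class AttributeUseDemo {
    // Hypothetical schema: customer must be present, priority may be omitted.
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + " <xsd:element name='job'>"
        + "  <xsd:complexType>"
        + "   <xsd:attribute name='customer' type='xsd:string' use='required'/>"
        + "   <xsd:attribute name='priority' type='xsd:positiveInteger' use='optional'/>"
        + "  </xsd:complexType>"
        + " </xsd:element>"
        + "</xsd:schema>";

    static final Schema SCHEMA = compile(XSD);

    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema does not compile", e);
        }
    }

    /** Returns true if xml conforms; attribute violations make the validator throw. */
    static boolean isValid(String xml) {
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```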
A declaration of a complex type generally includes one of the following that specifies how the elements appear in the document: