The XML Schema specification has two parts: Structures and Data Types. The Structures specification describes a replacement syntax for describing XML documents at a finer granularity than is possible with a document type definition (DTD, the current method standardized with the XML 1.0 recommendation). The Data Types specification defines primitive data types that can be used in XML Schema and in other XML specifications such as XSL and RDF.
Note
At the time of this writing, the XML Schema specification is still a working draft. The latest specification can be found at http://www.w3c.org/TR.
The purpose of an XML Schema is to define and describe a class of XML documents by using XML-compliant markup to constrain and document the meaning, usage, and relationships of the document's datatypes; elements and their content; attributes and their values; entities and their contents; and notations.
The XML Schema:Structures formalism will allow a useful level of constraint checking to be described and validated for a wide spectrum of XML applications.
XML Schema:Structures depends on the data typing mechanisms defined in its companion document, XML Schema:Datatypes, which was published simultaneously.
These are key definitions in the specification:
Instance— An XML document whose structure conforms to some schema. Documents are associated with the schema to which they conform.
Schema— A set of rules for constraining the structure and articulating the information set of XML documents.
The key idea behind XML Schema is to define the vocabulary and content model of a markup language using the rules of XML. The basic features of XML Schema are listed in Table 4.7.
XML 1.0 does not provide any facility for rigorous type checking of data elements in an XML-compliant document. This specification defines standard data types for constraining values in element content and attribute values.
The current specification concerns itself with scalar datatypes. A scalar is a single constrained value (formally, a value described in its entirety by magnitude).
Future versions of this specification will also cover aggregate data types like sets and bags (collections).
In this specification, a datatype has a set of distinct values, called its value space, and is characterized by facets or properties of those values and by operations on or resulting in those values. Further, each datatype is characterized by a space consisting of valid lexical representations for each value in the value space. A value space is an abstract collection of permitted values for the datatype. The lexical space for a datatype consists of a set of valid literals. Each value in the datatype's value space maps to one or more valid literals in its lexical space.
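The difference between the two spaces is easy to see outside XML. The following Python sketch uses the standard library's Decimal type as a stand-in for a decimal datatype (my assumption for illustration; it is not the parser defined by the specification) to show several literals from the lexical space mapping to one value in the value space.

```python
from decimal import Decimal

# Three different literals (members of the lexical space)...
literals = ["1.0", "1.00", "+1"]

# ...all denote the same abstract value (a member of the value space).
# A set keeps only distinct values, so equal values collapse into one entry.
values = {Decimal(lit) for lit in literals}

print(len(literals), "literals map to", len(values), "value")
```

The literals differ as strings, yet comparing them as values treats them as identical, which is the mapping from lexical space to value space described above.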
Datatypes can be broken down into several dichotomies. The first of these is atomic versus aggregate:
Atomic datatypes are those having values that are intrinsically indivisible.
Aggregate datatypes are those having values that can be decomposed into two or more component values.
Next is primitive versus generated:
Primitive datatypes are those that are not defined in terms of other datatypes.
Generated datatypes are those that are defined in terms of other datatypes.
Finally, built-in versus user-generated:
Built-in datatypes are those that are entirely defined in the XML Schema:Datatypes specification and can be either primitive or generated.
User-generated datatypes are those generated datatypes whose base types are built-in datatypes or user-generated datatypes and are defined by individual schema designers by giving values to constraining facets.
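To make the idea of generating a datatype by giving values to constraining facets concrete, here is a minimal Python sketch. The helper names (derive, is_valid) and the facet parameter names are hypothetical and chosen only for illustration, and Python's int() merely stands in for a built-in integer datatype.

```python
def derive(base_check, min_inclusive=None, max_inclusive=None):
    """Return a checker for a new type: the base type narrowed by facets."""
    def check(literal):
        value = base_check(literal)  # raises ValueError if not in the base type
        if min_inclusive is not None and value < min_inclusive:
            raise ValueError(f"{literal!r} is below {min_inclusive}")
        if max_inclusive is not None and value > max_inclusive:
            raise ValueError(f"{literal!r} is above {max_inclusive}")
        return value
    return check

# Stand-in for a built-in datatype (assumption: int() plays that role here).
integer = int

# A user-generated type whose base type is a built-in type.
percentage = derive(integer, min_inclusive=0, max_inclusive=100)

# A user-generated type whose base type is another user-generated type.
passing_grade = derive(percentage, min_inclusive=60)

def is_valid(check, literal):
    """True if the literal belongs to the type represented by check."""
    try:
        check(literal)
        return True
    except ValueError:
        return False

print(is_valid(percentage, "42"), is_valid(passing_grade, "42"))
```

Each derived type accepts a subset of what its base type accepts, so a value rejected by the base type is rejected by every type generated from it.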
Table 4.8 shows a description of primitive and generated datatypes.
Strings can be constrained using either picture elements (from COBOL) or regular expressions.
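As a rough illustration of constraining a string with a regular expression, the Python sketch below checks a ZIP code. The pattern and the function name is_zip are my own assumptions for the example; Python's regular expression syntax is also not necessarily the dialect the specification adopts.

```python
import re

# Hypothetical pattern: five digits, optionally followed by a hyphen and four digits.
ZIP_PATTERN = re.compile(r"[0-9]{5}(-[0-9]{4})?")

def is_zip(literal):
    """True if the whole literal matches the pattern, not just a prefix of it."""
    return ZIP_PATTERN.fullmatch(literal) is not None

for candidate in ["20171", "20171-1234", "2017", "20171-12"]:
    print(candidate, is_zip(candidate))
```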
To demonstrate schemas and compare them with DTDs, I present a schema for our Address Book Markup Language (ABML), for which we created a DTD in Chapter 1:
<schema targetNameSpace="http://www.gosynergy.com/abml"
        xmlns="http://www.w3.org/TR/1999/WD-xmlschema-1-19991217"
        xmlns:abml="http://www.gosynergy.com/abml">
  <element name="ADDRESS_BOOK" type="ADDRESS_BOOK_TYPE" />
  <type name="ADDRESS_BOOK_TYPE">
    <element name="ADDRESS" type="ADDRESS_TYPE" minOccurs="1" maxOccurs="*" />
  </type>
  <type name="ADDRESS_TYPE">
    <element name="NAME" type="string" />
    <element name="STREET" type="string" />
    <element name="CITY" type="string" />
    <element name="STATE" type="string" />
    <element name="ZIP" type="string" />
  </type>
</schema>
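To give a feel for what checking an instance against this schema involves, here is a small hand-written Python sketch. It is not a schema processor: it hard-codes the ABML rules, assuming that ADDRESS_BOOK holds one or more ADDRESS elements and that each ADDRESS is meant to carry NAME, STREET, CITY, STATE, and ZIP in that order. It also ignores namespaces for brevity, and the sample document and function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Expected children of ADDRESS, in order (taken from the ABML schema).
ADDRESS_CHILDREN = ["NAME", "STREET", "CITY", "STATE", "ZIP"]

def check_address_book(xml_text):
    """Return a list of problems; an empty list means the instance conforms."""
    root = ET.fromstring(xml_text)
    if root.tag != "ADDRESS_BOOK":
        return [f"root element is {root.tag}, expected ADDRESS_BOOK"]
    problems = []
    children = list(root)
    if not children:  # minOccurs="1"
        problems.append("ADDRESS_BOOK needs at least one ADDRESS")
    for i, child in enumerate(children, start=1):
        if child.tag != "ADDRESS":
            problems.append(f"child {i}: unexpected element {child.tag}")
            continue
        found = [e.tag for e in child]
        if found != ADDRESS_CHILDREN:
            problems.append(f"ADDRESS {i}: found {found}, expected {ADDRESS_CHILDREN}")
    return problems

# Hypothetical instance document, invented for this example.
SAMPLE = """<ADDRESS_BOOK>
  <ADDRESS>
    <NAME>Jane Doe</NAME>
    <STREET>1 Main Street</STREET>
    <CITY>Springfield</CITY>
    <STATE>VA</STATE>
    <ZIP>22150</ZIP>
  </ADDRESS>
</ADDRESS_BOOK>"""

print(check_address_book(SAMPLE))
```

A real schema processor derives these checks from the schema document itself rather than from hard-coded rules, which is what lets one validator serve many vocabularies.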