XML Schemas

There are two parts to the XML Schema specification: Structures and Datatypes. The Structures specification describes a replacement syntax for describing XML documents at a finer granularity than is possible with a document type definition (DTD), the current method standardized with the XML 1.0 recommendation. The Datatypes specification defines primitive data types that can be used in XML Schema and in other XML specifications such as XSL and RDF.

Note

At the time of this writing, the XML Schema specification is still a working draft. The latest specification can be found at http://www.w3c.org/TR.


The purpose of an XML Schema is to define and describe a class of XML documents by using XML-compliant markup to constrain and document the meaning, usage, and relationships of the document's datatypes; elements and their content; attributes and their values; entities and their contents; and notations.

The XML Schema: Structures formalism will allow a useful level of constraint checking to be described and validated for a wide spectrum of XML applications.

XML Schema: Structures depends on the data typing mechanisms defined in its companion document, XML Schema: Datatypes, which was published simultaneously.

These are key definitions in the specification:

  • Instance— An XML document whose structure conforms to some schema. Documents are associated with the schema to which they conform.

  • Schema— A set of rules for constraining the structure and articulating the information set of XML documents.

Schema Structures

The key idea behind XML Schema is to define the vocabulary and content model of a markup language using the rules of XML. The basic features of XML Schema are listed in Table 4.7.

Table 4.7. XML Schema Features
Feature | Definition
Schema | All definitions and declarations are contained within a Schema element. Uses <schema ...> </schema>.
Simple Type Definition | The mechanisms for typing character data for either attribute values or element contents. Rules for this are specified in XML Schemas:Datatypes specification.
Complex Type Definition | A complete set of constraints for elements in a document. Uses <type> </type>.
Element Type Declaration | Associates an element name with a type. Uses <element ...> </element>.
Attribute Declaration | Associates an attribute name and a data type. Uses <attribute ...> </attribute>.
Content Type | Either a simple type or a content model.
Element Content Model | A type that constrains the contents of an element. Has specifications for sequences and grouping.
Attribute Group Definition | Ability to group a set of attributes under a name for reusability.
Deriving Type Definitions | A type may be based on another type and acquire content type and attributes from the other type.
References to Schema Components Across Namespaces | Integrates definitions and declarations defined elsewhere into the schema as if they were defined/declared locally.
Unique Key and Key Reference Constraints | Provides powerful uniqueness and intradocument reference mechanisms.

Schema Datatypes

XML 1.0 does not provide any facility for rigorous type checking of data elements in an XML-compliant document. This specification defines standard data types for constraining the values of element content and attributes.

The current specification concerns itself with scalar datatypes. A scalar is a single constrained value (formally, a value described in its entirety by magnitude).

Future versions of this specification will also cover aggregate data types like sets and bags (collections).

In this specification, a datatype has a set of distinct values, called its value space, and is characterized by facets or properties of those values and by operations on or resulting in those values. Further, each datatype is characterized by a space consisting of valid lexical representations for each value in the value space. A value space is an abstract collection of permitted values for the datatype. The lexical space for a datatype consists of a set of valid literals. Each value in the datatype's value space maps to one or more valid literals in its lexical space.
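The distinction between value space and lexical space can be sketched in a few lines of Python. This is only an illustration: the pattern below is an assumed, simplified lexical space for a toy integer type, not the grammar given in the specification, and the names INTEGER_LITERAL and to_value are hypothetical.

```python
import re

# Assumed lexical space for a toy integer datatype: an optional sign
# followed by one or more ASCII digits. Not taken from the specification.
INTEGER_LITERAL = re.compile(r"[+-]?[0-9]+")


def to_value(literal: str) -> int:
    """Map a literal from the lexical space to its value in the value space."""
    if INTEGER_LITERAL.fullmatch(literal) is None:
        raise ValueError(f"{literal!r} is not in the lexical space")
    return int(literal)


# Three different literals that all denote the same value.
literals = ["100", "+100", "0100"]
values = {to_value(lit) for lit in literals}
print(values)
```

The set collapses to a single member because every literal in the list maps to the same value, which is the "one or more valid literals" relationship described above.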

Datatypes can be broken down into several dichotomies. The first of these is atomic versus aggregate:

  • Atomic datatypes are those having values that are intrinsically indivisible.

  • Aggregate datatypes are those having values that can be decomposed into two or more component values.

Next is primitive versus generated:

  • Primitive datatypes are those that are not defined in terms of other datatypes.

  • Generated datatypes are those that are defined in terms of other datatypes.

Finally, built-in versus user-generated:

  • Built-in datatypes are those that are entirely defined in the XML Schemas:Datatypes specification and can be either primitive or generated.

  • User-generated datatypes are those generated datatypes whose base types are built-in datatypes or user-generated datatypes and are defined by individual schema designers by giving values to constraining facets.
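One way to picture a generated datatype is as a base type plus narrowed facets. The Python sketch below models only the minInclusive and maxInclusive facets; the IntegerType class and the percentage type are hypothetical names introduced for illustration, and nothing here reflects how a real schema processor is implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IntegerType:
    """Hypothetical model of an integer datatype with two bounding facets."""
    name: str
    min_inclusive: Optional[int] = None
    max_inclusive: Optional[int] = None

    def derive(self, name, min_inclusive=None, max_inclusive=None):
        """Create a generated type; facets may narrow the base range, never widen it."""
        lo = self.min_inclusive if min_inclusive is None else min_inclusive
        hi = self.max_inclusive if max_inclusive is None else max_inclusive
        if self.min_inclusive is not None and lo < self.min_inclusive:
            raise ValueError("minInclusive may not widen the base type")
        if self.max_inclusive is not None and hi > self.max_inclusive:
            raise ValueError("maxInclusive may not widen the base type")
        return IntegerType(name, lo, hi)

    def accepts(self, value: int) -> bool:
        """Report whether a value lies inside this type's value space."""
        if self.min_inclusive is not None and value < self.min_inclusive:
            return False
        if self.max_inclusive is not None and value > self.max_inclusive:
            return False
        return True


# Unbounded starting point for this sketch.
integer = IntegerType("integer")
# Mirrors the built-in derivation: minInclusive fixed at 0.
non_negative = integer.derive("nonNegativeInteger", min_inclusive=0)
# A user-generated type: a schema designer supplies a facet value.
percentage = non_negative.derive("percentage", max_inclusive=100)

print(percentage)
```

Here percentage inherits minInclusive from its base and adds its own maxInclusive, so it accepts 0 through 100 and nothing else.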

Table 4.8 describes the primitive and generated datatypes.

Table 4.8. Primitive and Generated Datatypes
Datatype | Description
string | A finite sequence of UCS characters.
boolean | A binary-state value.
binary | Sequence of bytes.
uriReference | A Uniform Resource Identifier (URI) reference.
language | Represents natural language identifiers as defined by RFC 1766.
ID | From XML 1.0 spec.
IDREF | From XML 1.0 spec.
IDREFS | From XML 1.0 spec.
ENTITY | From XML 1.0 spec.
ENTITIES | From XML 1.0 spec.
NMTOKEN | From XML 1.0 spec.
NMTOKENS | From XML 1.0 spec.
NOTATION | From XML 1.0 spec.
name | An XML name as defined by the XML 1.0 spec.
QName | A qualified XML name as defined by the XML Namespace recommendation.
NCName | An XML "non-colonized" name as defined by the XML Namespace recommendation.
integer | Whole numbers.
positiveInteger | Derived from nonNegativeInteger by fixing the value of minInclusive to be 1.
nonPositiveInteger | Integers less than or equal to 0; the value of maxInclusive is fixed at 0.
negativeInteger | Negative integers; the value of maxInclusive is fixed at -1.
nonNegativeInteger | Derived from integer by fixing the value of minInclusive to be 0.
long | Derived from integer by fixing the values of maxInclusive to be 9223372036854775807 and minInclusive to be -9223372036854775808.
int | Derived from long by fixing the values of maxInclusive to be 2147483647 and minInclusive to be -2147483648.
short | Derived from int by fixing the values of maxInclusive to be 32767 and minInclusive to be -32768.
byte | Derived from short by fixing the values of maxInclusive to be 127 and minInclusive to be -128.
unsignedLong | Derived from nonNegativeInteger by fixing the value of maxInclusive to be 18446744073709551615.
unsignedInt | Derived from unsignedLong by fixing the value of maxInclusive to be 4294967295.
unsignedShort | Derived from unsignedInt by fixing the value of maxInclusive to be 65535.
unsignedByte | Derived from unsignedShort by fixing the value of maxInclusive to be 255.
decimal | Numbers with an exact fractional part.
real | Floating-point numbers expressed with a mantissa and an exponent.
float | IEEE single-precision 32-bit floating-point type.
double | IEEE double-precision 64-bit floating-point type.
date | Date as a string as defined in ISO 8601.
month | A timePeriod that starts at the midnight that starts the first day of the month and ends at the midnight that ends the last day of the month.
year | A timePeriod that starts at the midnight that starts the first day of the year and ends at the midnight that ends the last day of the year.
century | A timePeriod that starts at the midnight that starts the first day of the century and ends at the midnight that ends the last day of the century.
time | Time as a string as defined in ISO 8601.
timeInstant | Represents a specific instant of time.
timePeriod | A period of time as a string as defined in ISO 8601.
timeDuration | Represents a duration of time as defined in ISO 8601.
recurringDay | A specific day that recurs within a specific timeDuration.
recurringDate | A specific date that recurs.

Strings can be constrained using either picture elements (from COBOL) or regular expressions.
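As a rough illustration of the regular-expression approach, the Python sketch below constrains a string to the shape of a U.S. ZIP code. The pattern and the is_valid_zip helper are assumptions made for this example; the regular-expression dialect used by the specification may differ from Python's.

```python
import re

# Hypothetical pattern: five digits, optionally followed by a hyphen
# and four more digits (for example 22150 or 22150-1234).
ZIP_PATTERN = re.compile(r"[0-9]{5}(-[0-9]{4})?")


def is_valid_zip(text: str) -> bool:
    """Accept the string only if the whole of it matches the pattern."""
    return ZIP_PATTERN.fullmatch(text) is not None


for candidate in ["22150", "22150-1234", "2215", "22150-12", "ABCDE"]:
    print(candidate, is_valid_zip(candidate))
```

The sketch uses fullmatch rather than search because a constraint on a datatype has to hold for the entire literal, not just for some substring of it.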

A Sample Schema

To demonstrate schemas and compare them with DTDs, I present a schema for our Address Book Markup Language (ABML), for which we created a DTD in Chapter 1:

<schema targetNameSpace="http://www.gosynergy.com/abml"
        xmlns = "http://www.w3.org/TR/1999/WD-xmlschema-1-19991217"
    xmlns:abml = "http://www.gosynergy.com/abml" >

<element name="ADDRESS_BOOK" type = "ADDRESS_BOOK_TYPE" />

<type name="ADDRESS_BOOK_TYPE">
    <!-- Each ADDRESS uses ADDRESS_TYPE, defined below; one or more may occur. -->
    <element name="ADDRESS" type="ADDRESS_TYPE" minOccurs="1"
             maxOccurs="*" />
</type>

<type name="ADDRESS_TYPE" >
    <element name="NAME" type="string" />
    <element name="STREET" type="string" />
    <element name="CITY" type="string" />
    <element name="STATE" type="string" />
    <element name="ZIP" type="string" />
</type>
</schema>
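To make the constraints concrete, the following Python sketch checks a document by hand against the content model this schema appears to describe: an ADDRESS_BOOK containing one or more ADDRESS elements, each holding NAME, STREET, CITY, STATE, and ZIP in that order. It assumes each ADDRESS is meant to carry the five fields of ADDRESS_TYPE. It is not a schema processor, the sample instance and its values are made up, and namespaces are ignored for simplicity.

```python
import xml.etree.ElementTree as ET

# Child elements expected inside each ADDRESS, in order.
ADDRESS_FIELDS = ["NAME", "STREET", "CITY", "STATE", "ZIP"]

# Hypothetical instance document; the values are invented.
INSTANCE = """\
<ADDRESS_BOOK>
  <ADDRESS>
    <NAME>Jane Doe</NAME>
    <STREET>1 Main Street</STREET>
    <CITY>Springfield</CITY>
    <STATE>VA</STATE>
    <ZIP>22150</ZIP>
  </ADDRESS>
</ADDRESS_BOOK>
"""


def check_address_book(xml_text: str) -> list:
    """Return a list of problems; an empty list means the structure matches."""
    root = ET.fromstring(xml_text)
    if root.tag != "ADDRESS_BOOK":
        return [f"root element is {root.tag}, expected ADDRESS_BOOK"]
    problems = []
    children = list(root)
    if not children:
        problems.append("ADDRESS_BOOK needs at least one ADDRESS")
    for index, child in enumerate(children, start=1):
        if child.tag != "ADDRESS":
            problems.append(f"child {index} is {child.tag}, expected ADDRESS")
            continue
        tags = [field.tag for field in child]
        if tags != ADDRESS_FIELDS:
            problems.append(f"ADDRESS {index} has children {tags}")
    return problems


print(check_address_book(INSTANCE))
```

A schema-aware validator would derive these checks from the schema itself; the point of the sketch is only to show what "conforming to the schema" means for an instance document.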
