Chapter 21. Schema design and documentation

It is fairly easy to create a schema once you know the syntax. It is harder to design one well. This chapter focuses on a strategy for designing schemas that are accurate, durable, and easy to implement. Carefully planning your schema design strategy is especially important when creating a complex set of schemas, or a standard schema that is designed to be used and extended by others.

21.1. The importance of schema design

Schemas are a fundamental part of many XML-based applications, whether XML is being used in temporary messages for information sharing or as an enduring representation of content (e.g., in publishing). Enterprise architects, DBAs, and software developers often devote a lot of time to data design: They create enterprise data models, data dictionaries with strict naming, and documentation standards, and carefully design and optimize relational databases. Unfortunately, software designers and implementers often do not pay as much attention to good design when it comes to XML messages.

There are several reasons for this. Some people feel that with transitory XML messages, it is not important how they are structured. Some decide that it is easier to use whatever schema is generated for them by a toolkit. Others decide to use an industry-standard XML vocabulary, but fail to figure out how their data really fits into that standard, or to come up with a strategy for customizing it for their needs.

As with any data design, there are many ways to organize XML messages. For example, decisions must be made about how many levels of elements to include, whether the elements should represent generic or more specific concepts, how to represent relationships, and how far to break down data into separate elements. In addition, there are multiple ways to express the same XML structure in XML Schema. Decisions must be made about whether to use global versus local declarations, whether to use named versus anonymous types, whether to achieve reuse through type extension or through named model groups, and how schemas should be broken down into separate schema documents.

The choices you make when designing a schema can have a significant impact on the ease of implementation, ease of maintenance, and even the ongoing relevance of the system itself. Failure to take into account design goals such as reuse, graceful versioning, flexibility, and tool support can have serious financial impacts on software development projects.

21.2. Uses for schemas

When designing a schema, it is first important to understand what it will be used for. Schemas actually play several roles.

Validation. Validation is the purpose that is most often associated with schemas. Given an XML document, you can use a schema to automatically determine whether that document is valid or not. Are all of the required elements there, in the right order? Do they contain valid values according to their data types? Schema validation does a good job of checking the basic structure and content of elements.

A service contract. A schema serves as part of the understanding between two parties. The document provider and the document consumer can both use the schema as a machine-enforceable set of rules describing an interface between two systems or services.

Documentation. Schemas are used to document the XML structure for the developers and end users that will be implementing or using it. Narrative human-readable annotations can be added to schema components to further document them. Although schemas themselves are not particularly human-readable, they can be viewed by less technical users in a graphical XML editor tool. In addition, there are a number of tools that will generate HTML documentation from schemas, making them more easily understood.

Providing type information. Schemas contain information about the data types that can affect how the information is processed. For example, if the schema tells an XSLT 2.0 stylesheet that a value is an integer, it will know to sort it and compare to other values as an integer instead of a string.

Assisted editing. For documents that will be hand-modified by human users, a schema can be used by XML editing software to provide context-sensitive validation, help, and content completion.

Code generation. Schemas are also commonly used, particularly in web services and other structured data interfaces, to generate classes and interfaces that read and write the XML message payloads. When a schema is designed first, classes can be generated automatically from the schema definitions, ensuring that they match. Other software artifacts can also be generated from schemas, for example, data entry forms.

Debugging. Schemas can assist in the debugging and testing processes for applications that will process the XML. For example, importing a schema into a XSLT 2.0 stylesheet or an XQuery query can help identify invalid paths and type errors in the code that, otherwise, may not have been found during testing.

As you can see, schemas are an important part of an XML implementation, and can be involved at both design time and run time. Although it is certainly possible to use XML without schemas, valuable functionality would be lost. You would be forced to use a nonstandard method to validate your messages, document your system interfaces, and generate code for web services. You also would not be able to take advantage of the many schema-based tools that implement this functionality at low cost.

The various roles of schemas should be taken into account when designing them. For example, use of obscure schema features can make code generation difficult, and not adequately documenting schemas can impact the usefulness of generated documentation.

21.3. Schema design goals

Designing schemas well is a matter of paying attention to certain important design considerations: flexibility and extensibility, reusability, clarity and simplicity, support for versioning, interoperability, and tool support. This section takes a closer look at each of these design goals.

21.3.1. Flexibility and extensibility

Schema design often requires a balancing act between flexibility, on the one hand, versus rigidity on the other. For example, suppose I am selling digital cameras that have a variety of features, such as resolution, battery type, and screen size. Each camera model has a different set of features, and the types of features change over time as new technology is developed. When designing a message that incorporates these camera descriptions, I want enough flexibility to handle variations in feature types, without having to redesign my message every time a new feature comes along. On the other hand, I want to be able to accurately and precisely specify these features.

To allow for total flexibility in the camera features, I could declare a features element whose type contains an element wildcard, which means that any well-formed XML is allowed. This would have the advantage of being extremely versatile and adaptable to change. The disadvantage is that the message structure is very poorly defined. A developer trying to write an application to process the message would have no idea what features to expect and what format they might have.

On the other hand, I can declare highly constrained elements for each feature, with no opportunity for variation. This has the benefit of making the features well defined, easy to validate, and much more predictable. Validation is more effective because certain features can be required and their values can be constrained by specific data types. However, the schema is brittle because it must be changed every time a new feature is introduced. When the schema changes, the applications that process the documents must also often change.

The ideal design is usually somewhere in the middle. A balanced approach in the case of the camera features might be to create a repeating feature element that contains the name of the feature as an attribute and the value of the feature as its content. This eliminates the brittleness while still providing a predictable structure for implementers.

21.3.2. Reusability

Reuse is an important goal in the design of any software. Schemas that reuse XML components across multiple kinds of documents are easier for developers and users to learn, are more consistent, and save development and maintenance time that could be spent writing redundant software components.

Using XML Schema, reuse can be achieved in a number of ways.

Reusing types. It is highly desirable to reuse complex and simple types in multiple element and attribute declarations. For example, you can define a complex type named AddressType that represents a mailing address, and then use it for both BillingAddress and ShippingAddress elements. Only named, global types can be reused, so types in XML Schema should generally be named.

Type inheritance. In XML Schema, complex types can be specialized from other types using type extensions. For example, I can create a more generic type ProductType and derive types named CameraType and LensType from it. This is a form of reuse because CameraType and LensType inherit a shared set of properties from ProductType.

Named model groups and attribute groups. Through the use of named model groups, it is possible to define reusable pieces of content models. This is a useful alternative to type inheritance for types that are semantically different but just happen to share some properties with other types.

Reusing schema documents. Entire schema documents can be reused by taking advantage of the include and import mechanisms of XML Schema. This is useful for defining components that might be used in several different contexts or services. In order to plan for reuse, schema documents should be broken down into logical components by subject area. Having schema documents that are too large and all-encompassing tends to inhibit reuse because it forces other schema documents to take all or nothing when importing them. It is also good practice to create a “core components” schema that has low-level building blocks, such as types for Address and Quantity, that are imported by all other schema documents.

21.3.3. Clarity and simplicity

When human users are creating and updating XML documents, clarity is of the utmost importance. If users have difficulty understanding the document structure, it will take far more time to edit a document, and the editing process will be much more prone to errors. Even when XML documents are both written and read by software applications, they still should be designed so that they are easy to conceptualize and process. Implementers on both sides—those who create XML documents and those who consume them—are writing and maintaining applications to process these messages, and they must understand them. Overly complex message designs lead to overly complex applications that create and process them, and both are hard to learn and maintain.

21.3.3.1. Naming and documentation

Properly and consistently naming schema components—elements, attributes, types, groups—can go a long way toward making the documents comprehensible. Using a common set of terms rather than multiple synonymous terms is good practice, as is the avoidance of obscure acronyms. In XML Schema, it is helpful to identify the kind of component in its name, for example by using the word “Type” at the end of type names. Namespaces should also be consistently and meaningfully named.

Of course, good documentation is very important to achieving clarity. XML Schema allows components to be documented using annotations. While you probably have other documentation that describes your system, having human-readable definitions of the components in your schema is very useful for people who maintain and use that schema. It also allows you to use tools that automatically generate schema documentation more effectively.

21.3.3.2. Clarity of structure

Consistent structure can also help improve clarity. For example, if many different types have child elements Identifier and Name, put them first and always in the same order. Reuse of components helps to ensure consistent structure.

It is often difficult to determine how many levels of elements to put in a message. Using intermediate elements that group together related properties can help with understanding. For example, embedding all address-related elements (street, city, etc.) inside an Address child element, not directly inside a Customer element, is an obvious choice. It makes the components of the address clearly contained and allows you to make the entire address optional or repeating.

It is also often useful to use intermediate elements to contain lists of list-like elements. For example, it is a good idea to embed a repeating sequence of OrderedItem elements inside an OrderedItems (plural) container, rather than directly inside a PurchaseOrder element. These container elements can make messages easier to process and often work better with code generation tools.

However, there is such a thing as excessive use of intermediate elements. XML messages that are a dozen levels deep can become unwieldy and difficult to process.

21.3.3.3. Simplicity

It is best to minimize the number of ways a particular type of data or content can be expressed. Having multiple ways to represent a particular kind of data or content in your XML documents may seem like a good idea because it is more flexible. However, allowing too many choices is confusing to users, puts more of a burden on applications that process the documents, and can lead to interoperability problems.

21.3.4. Support for graceful versioning

Systems will change over time. Schemas should be designed with a plan for how to handle changes in a way that causes minimum impact on the systems that create and process XML documents.

A typical schema versioning strategy differentiates between major versions and minor versions. Major versions, such as 1.0, 2.0, or 3.0, are by definition disruptive and not backward-compatible; at times this is an unavoidable part of software evolution. On the other hand, minor versions, such as 1.1, 1.2, or 1.3, are backward-compatible. They involve changes to schemas that will still allow old message instances to be valid according to the new schema. For example, a version 1.2 message can be valid according to a version 1.3 schema if the version 1.3 limits itself to backward-compatible changes.

21.3.5. Interoperability and tool compatibility

Schemas are used heavily by tools—not just for validation but also for the generation of code and documentation. In an ideal world, all schema parsers and toolkits would support the exact same schema language, and all schemas would be interoperable. The unfortunate reality is that tools, especially code generation tools, vary in their support for XML Schema, for several reasons.

• Some toolkits incorrectly implement features of XML Schema because the recommendation is complex and in some cases even ambiguous.

• Some web services toolkits deliberately do not support certain features of XML Schema because they do not find them to be relevant or useful to a particular use case, such as data binding.

• Some XML Schema concepts do not map cleanly onto object-oriented concepts. Even if a toolkit attempts to support these features, it may do so in a less than useful way.

In general, it is advisable to stick to a subset of the XML Schema language that is well supported by the kinds of toolkits you will be using in your environment. For example, features of XML Schema to avoid in a web services environment where data-binding toolkits are in use include

• Mixed content (elements that allow text content as well as children)

choice and all model groups

• Complex content models with nested model groups

• Substitution groups

• Dynamic type substitution using the xsi:type attribute

• Default and fixed values for elements or attributes

• Redefinition of schema documents

It is advisable to test your schemas against a variety of toolkits to be sure that they can handle them gracefully.

21.4. Developing a schema design strategy

Many organizations that are implementing medium- to large-scale XML vocabularies develop enterprise-wide guidelines for schema design, taking into account the considerations described in this chapter. Sometimes these guidelines are organized into documents that are referred to as Naming and Design Rules (NDR) documents.

Having a cohesive schema design strategy has a number of benefits.

• It promotes a standard approach to schema development that improves consistency and therefore clarity.

• It ensures that certain strategies, such as how to approach versioning, are well thought out before too much investment has been made in development.

• It allows the proposed approach to be tested with toolkits in use in the organization to see if they generate manageable code.

• It serves as a basis for design reviews, which are a useful way for centralized data architects to guide or even enforce design standards within an organization.

A schema design strategy should include the following topics:

Naming standards: standard word separators, upper versus lower case names, a standard glossary of terms, special considerations for naming types and groups. Naming standards are discussed in Section 21.6 on p. 559.

Namespaces: what they should be named, how many to have, how many schema documents to use per namespace, how they should be documented. See Section 21.7 on p. 564 for namespace guidelines.

Schema structure strategy: how many schema documents to have, recommended folder structure, global versus local components. Section 21.5 on p. 550 covers these topics.

Documentation standards: the types of documentation required for schema components, where they are to be documented. Schema documentation is covered in Section 21.8 on p. 580.

XML Schema features: a list of allowed (or prohibited) XML Schema features, limited to promote simplicity, better tool support, and interoperability.

Versioning strategy: whether to require forward compatibility (and if so how to accomplish it), rules for backward compatibility of releases, patterns for version numbering. All of Chapter 23 is devoted to versioning, with particular attention paid to developing a versioning strategy in Section 23.4.1 on p. 636.

Reuse strategy: recommended methods of achieving reuse, an approach for a common component library. Reuse is covered in Section 22.1 on p. 596.

Extension strategy: which external standards are approved for use, description of the correct way to incorporate or extend them, how other standards should extend yours and under what conditions. Section 22.2 on p. 599 compares and contrasts six methods for extending schemas.

These considerations are covered in the rest of this chapter and the next two chapters.

21.5. Schema organization considerations

There are a number of design decisions that affect the way a schema is organized, without impacting validation of instances. They include whether to use global or local declarations and how to modularize your schemas.

21.5.1. Global vs. local components

Some schema components can be either global or local. Element and attribute declarations, for example, can be scoped entirely with a complex type (local) or at the top level of the schema document (global). Type definitions (both simple and complex) can be scoped to a particular element or attribute declaration, in which case they are anonymous, or at the top level of the schema document, in which case they are named. Sections 6.1.3 on p. 95, 7.2.3 on p. 119, and 8.2.3 on p. 133 cover the pros and cons of global versus local components.

It is possible to decide individually for each component whether it should be global or local, but it is better to have a consistent strategy that is planned in advance. Table 21–1 provides an overview of the four possible approaches to the global/local decision. The names associated with the approaches (with the exception of Garden of Eden) were developed as the result of a discussion on XML-DEV led by Roger Costello, who wrote them up as a set of best practices at www.xfront.com/GlobalVersusLocal.pdf.

Table 21–1. Schema structure patterns

Image

This section provides an overview of the advantages and disadvantages of each approach. All four of these approaches will validate the same instance, so the question is more one of schema design than XML document design.

In all four approaches, the attribute declarations are locally declared. This follows the recommended practice of allowing unqualified attribute names when the attributes are part of the vocabulary being defined by the schema.

21.5.1.1. Russian Doll

The Russian Doll approach is characterized by all local definitions, with the exception of the root element declaration. All types are anonymous, and all element and attribute declarations are local. Example 21–1 is a Russian Doll schema.

The main disadvantage of this approach is that neither the elements nor the types are reusable. This can result in code that is redundant and hard to maintain. It can also be cumbersome to read. With all the indenting it is easy to lose track of where you are in the hierarchy.

There are a few advantages of this approach but they are less compelling.

• Since the elements are locally declared, it is possible to have more than one element with the same name but a different type or other characteristics. For example, there can be a number child of product that has a format different from a number child of order.

Example 21–1. Schema for Russian Doll approach


<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod"
           elementFormDefault="qualified">
  <xs:element name="catalog">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="product" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="number" type="xs:integer"/>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="size">
                <xs:simpleType>
                  <xs:restriction base="xs:integer">
                    <xs:minInclusive value="2"/>
                    <xs:maxInclusive value="18"/>
                  </xs:restriction>
                </xs:simpleType>
              </xs:element>
            </xs:sequence>
            <xs:attribute name="dept" type="xs:string"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>


• Since the elements are locally declared, their names can be unqualified in the instance. However, this practice is not recommended, as described in Section 21.7.3.6 on p. 579.

• There is only one global element declaration, so it is obvious which one is the root.

• Since there is no reuse, it is easier to see the impact of a change: You simply look up the hierarchy.

21.5.1.2. Salami Slice

The Salami Slice uses global element declarations but anonymous (local) types. This places importance on the element as the unit of reuse. Example 21–2 is a Salami Slice schema.

Example 21–2. Schema for Salami Slice approach


<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod"
           elementFormDefault="qualified">
  <xs:element name="catalog">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="product" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="product">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="number"/>
        <xs:element ref="name"/>
        <xs:element ref="size"/>
      </xs:sequence>
      <xs:attribute name="dept" type="xs:string"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="number" type="xs:integer"/>
  <xs:element name="name" type="xs:string"/>
  <xs:element name="size">
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="2"/>
        <xs:maxInclusive value="18"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>


The disadvantage of this approach is that the types are not reusable by multiple element declarations. Often you will have multiple element names that have the same structure, such as billingAddress and shippingAddress with the same address structure. Using this model, the entire address structure would need to be respecified each time, or put into a named model group. Anonymous types also cannot be used in derivation—another form of reuse and sometimes an important expression of an information model. Although you can reuse the element declarations, this might mean watered-down element names, such as address instead of a more specific kind of address. Since elements are globally declared, it is not possible to have more than one element with the same name but a different type or other characteristics; all element names in the entire schema must be unique.

This approach does have some advantages over Russian Doll, namely that it is more readable and does allow some degree of reuse through element declarations. Unlike Russian Doll, it does allow the use of substitution groups, which require global element declarations.

21.5.1.3. Venetian Blind

The Venetian Blind approach has local element declarations but named global types. Example 21–3 is a Venetian Blind schema.

The significant advantage to this approach is that the types are reusable. The advantages of named types over anonymous types are compelling and make this approach more flexible and better defined than either of the previous two approaches. Since elements are locally declared, it is possible to have more than one element with the same name but a different type or other characteristics, which also improves flexibility.

The disadvantage of this approach is that element declarations are not reused, so if they have complex constraints such as type alternatives or identity constraints, these need to be respecified in multiple places. Also, element declarations cannot participate in substitution groups.

If substitution groups aren’t needed, this is the author’s preferred approach and is the style used in most of this book. It allows for full reuse through types, but also allows the flexibility of varying element names. It maps very cleanly onto an object-oriented model, where the complex types are analogous to classes and the element declarations are analogous to instance variables that have that class.

Example 21–3. Schema for Venetian Blind approach


<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod"
           elementFormDefault="qualified">
  <xs:element name="catalog" type="CatalogType"/>
  <xs:complexType name="CatalogType">
    <xs:sequence>
      <xs:element name="product" type="ProductType"
                  maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="size" type="SizeType"/>
    </xs:sequence>
    <xs:attribute name="dept" type="xs:string"/>
  </xs:complexType>
  <xs:simpleType name="SizeType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="2"/>
      <xs:maxInclusive value="18"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>


21.5.1.4. Garden of Eden

The Garden of Eden approach has all global (named) types and global element declarations. Example 21–4 is a Garden of Eden schema.

The only disadvantage of this approach, other than its verbosity, is that since elements are globally declared, it is not possible to have more than one element with the same name but a different type or other characteristics. All element names in the entire schema must be unique. However, some would consider unique, meaningful element names to be an advantage, and indeed it can simplify processing of the document using technologies like XSLT and SAX.

Example 21–4. Schema for Garden of Eden approach


<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod"
           elementFormDefault="qualified">
  <xs:element name="catalog" type="CatalogType"/>
  <xs:complexType name="CatalogType">
    <xs:sequence>
      <xs:element ref="product" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="product" type="ProductType"/>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element ref="number"/>
      <xs:element ref="name"/>
      <xs:element ref="size"/>
    </xs:sequence>
    <xs:attribute name="dept" type="xs:string"/>
  </xs:complexType>
  <xs:element name="number" type="xs:integer"/>
  <xs:element name="name" type="xs:string"/>
  <xs:element name="size" type="SizeType"/>
  <xs:simpleType name="SizeType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="2"/>
      <xs:maxInclusive value="18"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>


Garden of Eden is a very viable approach to schema design. Its big advantage is that it allows both element declarations and types to be referenced from multiple components, maximizing their reuse potential. In addition, compared to Venetian Blind, it allows the use of substitution groups. If substitution groups are needed, and it is acceptable to force the uniqueness of element names, this approach is the right choice. Many standard XML vocabularies use this approach.

Overall, the Garden of Eden and Venetian Blind, depending on your requirements, are the recommended approaches. The Russian Doll approach has obvious limitations in terms of reuse, and the Salami Slice approach does not benefit from the very significant advantages of named types over anonymous types.

21.5.2. Modularizing schema documents

Another decision related to schema structure is how to modularize your schema documents. Consider a project that involves orders for retail products. The order will contain information from several different domains. It will contain general information that applies to the order itself, such as the order number and date. It may also contain customer information, such as customer name, number, and address. Finally, it may contain product information, such as product number, name, description, and size.

Do you want one big schema document, or three schema documents, one for each of the subject areas (order, customer, and product)? There are a number of advantages to composing your schema representation from multiple schema documents.

Easier reuse. If schema documents are small and focused, they are more likely to be reused. For example, a product catalog application might want to reuse the definitions from the product schema. This is much more efficient if the product catalog application is not forced to include everything from the order application.

Ease of maintenance. Smaller schema documents are more readable and manageable.

Reduced chance of name collisions. If different namespaces are used for the different schema documents, name collisions are less likely.

Versioning granularity. It is helpful to separate components that change more frequently, or change on a different schedule, into a separate schema document. This creates less disruption when a new version is released. Code lists (simple types with enumerations) are often placed into individual schema documents so they can be versioned separately.

Access control granularity. Security can be managed per schema document, allowing more granular access control.

Dividing up your schema documents too much, though, can make them hard to manage. For example, having one element declaration or type definition per schema document would necessitate the use of dozens of imports or includes in your schema documents. If different namespaces are used in these schema documents, that compounds the complexity, because an instance document will have to declare dozens of namespaces.

There are several ways to distribute your components among schema documents, for example:

Subject area. If your instance will contain application data, this could mean one schema document per application or per database. If the instances incorporate XML documents of different types, such as test reports and product specifications, and each is defined by its own root element declaration, it would be logical to use a separate schema document for each document type.

General/specific. There may be a base set of components that can be extended for a variety of different purposes. For example, you may create a schema document that contains generic (possibly abstract) definitions for purchase orders and invoices, and separate schema documents for each set of industry-specific extensions.

Basic/advanced. Suppose you have a core set of components that are used in all instances, plus a number of optional components. You may want to define these optional components in a separate schema document. This allows an instance to be validated against just the core set of components or the enhanced set, depending on the application.

Governance. Schemas should be divided so that a single schema document is not governed by more than one group of people.

Versioning Schedule. As mentioned above, it is helpful to separate components that are versioned frequently or on different schedules, such as code lists.

Another issue when you break up schema documents is whether to use the same namespace for all of them, or break them up into separate namespaces. This issue is covered in detail in Section 21.7.2 on p. 565.

21.6. Naming considerations

This section provides detailed recommendations for choosing and managing names for XML elements and attributes, as well as for other XML Schema components. It discusses naming guidelines, the use of qualified and unqualified names, and organizing a namespace.

Consistency in naming can be as important as the names themselves. Consistent names are easier to understand, remember, and maintain. This section provides guidelines for defining an XML naming standard to ensure the quality and consistency of names.

These guidelines apply primarily to the names that will appear in an instance—namely, element, attribute, and notation names. However, much of this section is also applicable to the names used within the schema, such as type, group, and identity constraint names.

21.6.1. Rules for valid XML names

Names in XML must start with a letter or underscore (_), and can contain only letters, digits, underscores (_), colons (:), hyphens (-), and periods (.). Colons should be reserved for use with namespace prefixes. In addition, an XML name cannot start with the letters xml in either upper or lower case.

Names in XML are always case-sensitive, so accountNumber and AccountNumber are two different element names.

Since schema components have XML names, these name restrictions apply not only to the element and attribute names that appear in instances, but also to the names of the types, named model groups, attribute groups, identity constraints, and notations you define in your schemas.

21.6.2. Separators

If a name is made up of several terms, such as “account number,” you should decide on a standard way to separate the terms. It can be done through capitalization (e.g., accountNumber) or through punctuation (e.g., account-number).

Some programming languages, database management systems, and other technologies do not allow hyphens or other punctuation in the names they use. Therefore, if you want to directly match your element names, for example, with variable names or database column names, you should use capitalization to separate terms.

If you choose to use capitalization, the next question is whether to use mixed case (e.g., AccountNumber) or camel case (e.g., accountNumber). In some programming languages, it is a convention to use mixed case for class names and camel case for instance variables. In XML, this maps roughly to using mixed case for type names and camel case for element names. This is the convention used in this book.

Regardless of which approach you choose, the most important thing is being consistent.

21.6.3. Name length

There is no technical limit to the length of an XML name. However, the ideal length of a name is somewhere between four and twelve characters. Excessively long element names, such as HazardousMaterialsHandlingFeeDomestic, can, if used frequently, add dramatically to the size of the instance. They are also difficult to type, hard to distinguish from other long element names, and not very readable. On the other hand, very short element names, such as b, can be too cryptic for people who are not familiar with the vocabulary.

21.6.4. Standard terms and abbreviations

In order to encourage consistency, it is helpful to choose a standard set of terms that will be used in your names. Table 21–2 shows a sample list of standard terms. This term list is used in the examples in this book. Synonyms are included in the list to prevent new terms from being created that have the same meaning as other terms.

Table 21–2. Terms

Image

These consistent terms are then combined to form element and attribute names. For example, “product number” might become productNumber.

In some cases, the name will be too long if all of the terms are concatenated together. Therefore, it is useful to have a standard abbreviation associated with each term. Instead of productNumber, prodNumber may be more manageable by being shorter.

21.6.5. Use of object terms

In some contexts, using an element name such as prodNumber may be redundant. In Example 21–5, it is obvious from the context that the number is a product number as opposed to some other kind of number.

Example 21–5. Repetition of terms


<product>
  <prodNumber>557</prodNumber>
  <prodName>Short-Sleeved Linen Blouse</prodName>
  <prodSize sizeSystem="US-DRESS">10</prodSize>
</product>


In this case, it may be clearer to leave off the prod term on the child elements, as shown in Example 21–6.

Example 21–6. No repetition of terms


<product>
  <number>557</number>
  <name>Short-Sleeved Linen Blouse</name>
  <size system="US-DRESS">10</size>
</product>


There may be other cases where the object is not so obvious. In Example 21–7, there are two names: a customer name and a product name. If we took out the terms cust and prod, we would not be able to distinguish between the two names. In this case, it should be left as shown.

Example 21–7. Less clear context


<letter>Dear <custName>Priscilla Walmsley</custName>,
Unfortunately, we are out of stock of the
<prodName>Short-Sleeved Linen Blouse</prodName> in size
<prodSize>10</prodSize> that you ordered...</letter>


When creating element and attribute names, it is helpful to list two names: one to be used when the object is obvious, and one to be used in other contexts. This is illustrated in Table 21–3.

Table 21–3. Element or attribute names

Image

21.7. Namespace considerations

21.7.1. Whether to use namespaces

Some designers of XML vocabularies wonder whether they should even use namespaces. In general, using namespaces has a lot of advantages:

• It indicates clear ownership of the definitions in that namespace.

• The namespace name, if it is a URL, provides a natural place to locate more information about that XML vocabulary.

• It allows the vocabulary to be combined with other XML vocabularies with a clear separation and without the risk of name collision.

• It is an indication to processing software what kind of document it is.

The downside of using namespaces is the complexity, or perceived complexity. You have to declare them in your schemas and your instances, and the code you write to process the documents has to pay attention to them. Another possible disadvantage is their limited support in DTDs. If you are writing both a DTD and a schema for your vocabulary, it requires special care to flexibly allow namespace declarations and prefixes in the DTD.1

Despite some negative perceptions about namespaces, it is fairly unusual for standard, reusable vocabularies not to use namespaces. If you are writing a one-off XML vocabulary that is internal to a single organization or application, it is probably fine not to use namespaces. However, if you are planning for your vocabulary to be used by a variety of organizations, or combined with other vocabularies, namespaces are highly recommended.

The complexity of namespaces can be somewhat mitigated by choosing a straightforward namespace strategy. For example, using fewer namespaces, using qualified local element names, and using conventional prefixes consistently can make namespaces seem less cumbersome.

21.7.2. Organizing namespaces

Consider a project that involves orders for retail products. An order will contain information from several different domains. It will contain general information that applies to the order itself, such as order number and date. It may also contain customer information, such as customer name, number, and address. Finally, it may contain product information, such as product number, name, description, and size. Is it best to use the same namespace for all of them, or break them up into separate namespaces? There are three approaches:

1. Same namespace: Use the same namespace for all of the schema documents.

2. Different namespaces: Use multiple namespaces, perhaps a different one for each schema document.

3. Chameleon namespaces: Use a namespace for the parent schema document, but no namespaces for the included schema documents.

21.7.2.1. Same namespace

It is possible to give all the schema documents the same target namespace and use include to assemble them to represent a schema with that namespace. This is depicted in Figure 21–1.

Image

Figure 21–1. Same namespace

Example 21–8 shows our three schema documents using this approach. They all have the same target namespace, and ord.xsd includes the other two schema documents.

Example 21–8. Same namespace in a schema


ord.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/all"
           targetNamespace="http://datypic.com/all"
           elementFormDefault="qualified">
  <xs:include schemaLocation="prod.xsd"/>
  <xs:include schemaLocation="cust.xsd"/>
  <xs:element name="order" type="OrderType"/>
  <xs:complexType name="OrderType">
    <xs:sequence>
      <xs:element name="customer" type="CustomerType"/>
      <xs:element name="items" type="ItemsType"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

prod.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/all"
           targetNamespace="http://datypic.com/all"
           elementFormDefault="qualified">
  <xs:complexType name="ItemsType">
    <xs:sequence maxOccurs="unbounded">
      <xs:element name="product" type="ProductType"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

cust.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/all"
           targetNamespace="http://datypic.com/all"
           elementFormDefault="qualified">
  <xs:complexType name="CustomerType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>


Example 21–9 shows an instance that conforms to the schema. Since there is only one namespace for all of the elements, a default namespace declaration is used.

The advantages of this approach are that it is uncomplicated and the instance is not cluttered by prefixes. The disadvantage is that you cannot have multiple global components with the same name in the same namespace, so you will have to be careful of name collisions.

Example 21–9. Same namespace in an instance


<order xmlns="http://datypic.com/all"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://datypic.com/all ord.xsd">
  <customer>
    <name>Priscilla Walmsley</name>
  </customer>
  <items>
    <product>
      <number>557</number>
    </product>
  </items>
</order>


This approach assumes that you have control over all the schema documents. If you are using elements from a namespace over which you have no control, such as the XHTML namespace, you should use the approach described in the next section.

This approach is best within a particular application where you have control over all the schema documents involved.

21.7.2.2. Different namespaces

It is also possible to give each schema document a different target namespace and use an import (or other method) to assemble the multiple schema documents. This is depicted in Figure 21–2.

Image

Figure 21–2. Different namespaces

Example 21–10 shows our three schema documents using this approach. They all have different target namespaces, and ord.xsd imports the other two schema documents.

Example 21–10. Different namespaces in a schema


ord.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:prod="http://datypic.com/prod"
           xmlns:cust="http://datypic.com/cust"
           xmlns="http://datypic.com/ord"
           targetNamespace="http://datypic.com/ord"
           elementFormDefault="qualified">
  <xs:import schemaLocation="prod.xsd"
             namespace="http://datypic.com/prod"/>
  <xs:import schemaLocation="cust.xsd"
             namespace="http://datypic.com/cust"/>
  <xs:element name="order" type="OrderType"/>
  <xs:complexType name="OrderType">
    <xs:sequence>
      <xs:element name="customer" type="cust:CustomerType"/>
      <xs:element name="items" type="prod:ItemsType"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

prod.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod"
           elementFormDefault="qualified">
  <xs:complexType name="ItemsType">
    <xs:sequence maxOccurs="unbounded">
      <xs:element name="product" type="ProductType"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

cust.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/cust"
           targetNamespace="http://datypic.com/cust"
           elementFormDefault="qualified">
  <xs:complexType name="CustomerType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>


Example 21–11 shows an instance that conforms to this schema. You are required to declare all three namespaces in the instance and to prefix the element names appropriately. However, since ord.xsd imports the other two schema documents, you are not required to specify xsi:schemaLocation pairs for all three schema documents, just the “main” one.

Example 21–11. Different namespaces in an instance


<order xmlns="http://datypic.com/ord"
       xmlns:prod="http://datypic.com/prod"
       xmlns:cust="http://datypic.com/cust"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://datypic.com/ord ord.xsd">
  <customer>
    <cust:name>Priscilla Walmsley</cust:name>
  </customer>
  <items>
    <prod:product>
      <prod:number>557</prod:number>
    </prod:product>
  </items>
</order>


To slightly simplify the instance, different default namespace declarations could appear at different levels of the document, resulting in the instance shown in Example 21–12. It could be simplified even further by the use of unqualified local element names, as discussed in Section 21.7.3.2 on p. 576.

Example 21–12. Different namespaces in an instance, with default namespaces


<order xmlns="http://datypic.com/ord"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://datypic.com/ord ord.xsd">
  <customer>
    <name xmlns="http://datypic.com/cust">Priscilla Walmsley</name>
  </customer>
  <items>
    <product xmlns="http://datypic.com/prod">
      <number>557</number>
    </product>
  </items>
</order>


This is an obvious approach when you are using namespaces over which you have no control—for example, if you want to include XHTML elements in your product description. There is no point trying to copy all the XHTML element declarations into a new namespace. This would create maintenance problems and would not be very clear to users. In addition, applications that process XHTML may require that the elements be in the XHTML namespace.

The advantage to this approach is that the source and context of an element are very clear. In addition, it allows different groups to be responsible for different namespaces. Finally, you can be less concerned about name collisions, because the names must only be unique within a namespace.

The disadvantage of this approach is that instances are more complex, requiring prefixes for multiple different namespaces. Also, you cannot use the redefine or override feature on these components, since they are in a different namespace.

21.7.2.3. Chameleon namespaces

The third possibility is to specify a target namespace only for the “main” schema document, not the included schema documents. The included components then take on the target namespace of the including document. In our example, all of the definitions and declarations in both prod.xsd and cust.xsd would take on the http://datypic.com/ord namespace once they are included in ord.xsd. This is depicted in Figure 21–3.

Image

Figure 21–3. Chameleon namespaces

Example 21–13 shows our three schema documents using this approach. Neither prod.xsd nor cust.xsd has a target namespace, while ord.xsd does. ord.xsd includes the other two schema documents and changes their namespace as a result.

The instance in this case would look similar to that of the samenamespace approach shown in Example 21–9, since all the elements would be in the same namespace. The only difference is that in this case the namespace would be http://datypic.com/ord instead of http://datypic.com/all.

Example 21–13. Chameleon namespaces in a schema


ord.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/ord"
           targetNamespace="http://datypic.com/ord"
           elementFormDefault="qualified">
  <xs:include schemaLocation="prod.xsd"/>
  <xs:include schemaLocation="cust.xsd"/>
  <xs:element name="order" type="OrderType"/>
  <xs:complexType name="OrderType">
    <xs:sequence>
      <xs:element name="customer" type="CustomerType"/>
      <xs:element name="items" type="ItemsType"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

prod.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">
  <xs:complexType name="ItemsType">
    <xs:sequence maxOccurs="unbounded">
      <xs:element name="product" type="ProductType"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

cust.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">
  <xs:complexType name="CustomerType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>


The advantage of this approach is its flexibility. Components can be included in multiple namespaces, and redefined or overridden whenever desired.

The disadvantage is that the risk of name collisions is even more serious. If the non-namespace schema documents grow over time, the risk increases that there will be name collisions with the schema documents that include them. If they were in their own namespace, unexpected collisions would be far less likely.

Another disadvantage is that the chameleon components lack an identity. Namespaces can be well-defined containers that provide a recognizable context as well as specific semantics, documentation, and application code.

21.7.3. Qualified vs. unqualified forms

Many instances include elements from more than one namespace. This can potentially result in instances with a large number of different prefixes, one for each namespace. However, when an element declaration is local—that is, when it isn’t at the top level of a schema document—you have the choice of using either qualified or unqualified element names in instances, a concept that was introduced in Section 6.3 on p. 98. Let’s recap the two alternatives, this time looking at more complex multinamespace documents.

21.7.3.1. Qualified local names

Example 21–14 shows an instance where all element names are qualified. Every element name has a prefix that maps it to either the product namespace or the order namespace.

Example 21–14. Qualified local names


<ord:order xmlns:ord="http://datypic.com/ord"
           xmlns:prod="http://datypic.com/prod">
  <ord:number>123123</ord:number>
  <ord:items>
    <prod:product>
      <prod:number>557</prod:number>
    </prod:product>
  </ord:items>
</ord:order>


This instance is very explicit about which namespace each element is in. There will be no confusion about whether a particular number element is in the order namespace or in the product namespace. However, the application or person that generates this instance must be aware which elements are in the order namespace and which are in the product namespace.

21.7.3.2. Unqualified local names

Example 21–15, on the other hand, shows an instance where only the root element name, order, is qualified. The other element names have no prefix, and since there is no default namespace provided, they are not in any namespace.

This instance has the advantage of looking slightly less complicated and not requiring the instance author to care about what namespace each element belongs in. In fact, the instance author does not even need to know of the existence of the product namespace.

Example 21–15. Unqualified local names


<ord:order xmlns:ord="http://datypic.com/ord">
  <number>123123</number>
  <items>
    <product>
      <number>557</number>
    </product>
  </items>
</ord:order>


21.7.3.3. Using form in schemas

Let’s look at the schemas that would describe these two instances. Example 21–16 shows how to represent the schema for the instance in Example 21–14, which has qualified element names. The representation is made up of two schema documents: ord.xsd, which defines components in the order namespace, and prod.xsd, which defines components in the product namespace.

Both schema documents have elementFormDefault set to qualified. As a result, locally declared elements must use qualified element names in the instance. In this example, the declaration for order is global and the declarations for product, items, and number are local.

Example 21–16. Schema for qualified local element names


ord.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:prod="http://datypic.com/prod"
           xmlns="http://datypic.com/ord"
           targetNamespace="http://datypic.com/ord"
           elementFormDefault="qualified">
  <xs:import schemaLocation="prod.xsd"
             namespace="http://datypic.com/prod"/>
  <xs:element name="order" type="OrderType"/>
  <xs:complexType name="OrderType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="items" type="prod:ItemsType"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

prod.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod"
           elementFormDefault="qualified">
  <xs:complexType name="ItemsType">
    <xs:sequence maxOccurs="unbounded">
      <xs:element name="product" type="ProductType"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>


To create a schema for the instance in Example 21–15, which has unqualified names, you can simply change the value of elementFormDefault in both schema documents to unqualified. Since the default value is unqualified, you could alternatively simply omit the attribute. In this case, globally declared elements still must use qualified element names, hence the use of ord:order in the instance.

21.7.3.4. Form and global element declarations

An important thing to notice is that the choice between qualified and unqualified names applies only to local element declarations. All globally declared elements must have qualified element names in the instance; there is no way to override this. In our example, all of the element declarations except order are local. If the product declaration had been global, the product elements would have to use qualified element names, regardless of the value of elementFormDefault.

This can cause confusion if you choose to use unqualified local element names, and you want to mix global and local element declarations. Not only would an instance author be required to know what namespace each element is in, but he or she would also need to know whether it was globally or locally declared in the schema.

This can be avoided by making all element declarations local, except for the declaration for the root elements. However, you may not have this choice if you import element declarations from namespaces that are not under your control. Also, if you plan to use substitution groups, the participating element declarations must be global.

21.7.3.5. Default namespaces and unqualified names

Default namespaces do not mix well with unqualified element names. The instance in Example 21–17 declares the order namespace as the default namespace. However, this will not work with a schema document where elementFormDefault is set to unqualified, because it will be looking for the elements items, product, and number in the order namespace, in which they are not—they are not in any namespace.

Example 21–17. Invalid mixing of unqualified names and default namespace


<order xmlns="http://datypic.com/ord">
  <number>123ABBCC123</number>
  <items>
    <product>
      <number>557</number>
    </product>
  </items>
</order>


21.7.3.6. Qualified vs. unqualified element names

Whether to use qualified or unqualified local element names is a matter of style. The advantages of using qualified local element names are:

• You can tell by looking at the document which namespace a name is in. If you see that a b element is in the XHTML namespace, you can more quickly understand its meaning.

• There is no ambiguity to a person or application what namespace an element belongs in. In our example, there was a number element in each of the namespaces. In most cases, you can determine from its position in the instance whether it is an order number or a product number, but not always.

• Certain applications or processors, for example an XHTML processor, might be expecting the element names to be qualified with the appropriate namespace.

• You can mix global and local element declarations without affecting the instance authors. You may be forced to make some element declarations global because you are using substitution groups, or because you are importing a schema document over which you have no control. If you use unqualified local names, the instance author has to know which element declarations are global and which are local.

The advantages of using unqualified local element names are:

• The instance author does not have to be aware of which namespace each element name is in. If many namespaces are used in the instance, this can simplify creation of instances.

• The lack of prefixes and namespace declarations makes the instance look less cluttered.

In general, it is best to use qualified element names, for the reasons stated above. If consistent prefixes are used, they just become part of the name, and authors get used to writing prod:number rather than just number. Most XML editors assist instance authors in choosing the right element names, prefixed or not.

21.7.3.7. Qualified vs. unqualified attribute names

The decision about qualified versus unqualified forms is simpler for attributes than elements. Qualified attribute names should only be used for attributes that apply to a variety of elements in a variety of namespaces, such as xml:lang or xsi:type. Such attributes are almost always declared globally. For locally declared attributes, whose scope is the type definition in which they appear, prefixes add extra text without any additional meaning.

The best way to handle qualification of attribute names is to ignore the form and attributeFormDefault attributes completely. Then, globally declared attributes will have qualified names, and locally declared attributes will have unqualified names, which makes sense.

21.8. Schema documentation

XML Schema is a full-featured language for describing the structure of XML documents. However, it cannot express everything there is to know about an instance or the data it contains. This section explains how you can extend XML Schema to include additional information for users and applications, using two methods: annotations and non-native attributes.

21.8.1. Annotations

Annotations are represented by annotation elements, whose syntax is shown in Table 21–4. An annotation may appear in almost any element in the schema, with the exception of annotation itself and its children, appinfo and documentation. The schema, override, and redefine elements can contain multiple annotation elements anywhere among their children. All other elements may only contain one annotation, and it must be their first child.

Table 21–4. XSD Syntax: annotation

Image

The content model for annotation allows two types of children: documentation and appinfo. A documentation element is intended to be human-readable user documentation, and appinfo is machine-readable for applications. A single annotation may contain multiple documentation and appinfo elements, in any order, which may serve different purposes.

21.8.2. User documentation

User documentation is represented by the documentation element. Sometimes it will consist of simple text content used for a description of a component. However, as it can contain child elements, it can be used for a more complex structure. Often, it is preferable to store more detail than just a simple description. The types of user information that you might add to a schema include:

• Descriptive information about what the component means. The name and structure of a type definition or element declaration can explain a lot, but they cannot impart all the semantics of the schema component.

• An explanation of why a component is structured in a particular way, or why certain XML Schema mechanisms are used.

• Metadata such as copyright information, who is responsible for the schema component, its version, and when it was last changed.

• Internationalization and localization parameters, including language translations.

• Examples of valid instances.

21.8.2.1. Documentation syntax

The syntax for a documentation element is shown in Table 21–5. It uses an element wildcard for its content model, which specifies that it may contain any number of elements from any namespace (or no namespace), in any order. Its content is mixed, so it may contain character data as well as children.

Table 21–5. XSD Syntax: documentation

Image

The source attribute can contain a URI reference that points to further documentation. The schema processor does not dereference this URI during validation.

Example 21–18 shows the use of documentation to document the product element declaration.

Example 21–18. Documentation


<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:doc="http://datypic.com/doc">
  <xs:element name="product" type="ProductType">
    <xs:annotation>
      <xs:documentation xml:lang="en"
                     source="http://datypic.com/prod.html#product">
        <doc:description>This element represents a product.
        </doc:description>
      </xs:documentation>
    </xs:annotation>
  </xs:element>
  <!--...-->
</xs:schema>


Although you can put character data content directly in documentation or appinfo, it is preferable to structure it using at least one child element. This allows the type of information (e.g., description) to be uniquely identified in the case of future additions.

Instead of (or in addition to) including the information in the annotation, you can also provide links to one or more external documents. To do this, you can either use the source attribute or other mechanisms, such as XLink. This allows reuse of documentation that may apply to more than one schema component.

21.8.2.2. Data element definitions

When creating reusable schema documents such as type libraries, it is helpful to have a complete definition of each component declared or defined in it. This ensures that schema authors are reusing the correct components, and allows you to automatically generate human-readable documentation about the components. Example 21–19 shows a schema document that includes a complete definition of a simple type CountryType in its documentation. The example is roughly based on ISO 11179, the ISO standard for the specification and standardization of data elements.

Example 21–19. ISO 11179-based type definition


<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:doc="http://datypic.com/doc">

  <xs:simpleType name="CountryType">
    <xs:annotation>
      <xs:documentation>
        <doc:name>Country identifier</doc:name>
        <doc:identifier>3166</doc:identifier>
        <doc:version>1990</doc:version>
        <doc:registrationAuthority>ISO</doc:registrationAuthority>
        <doc:definition>A code for the names of countries of the
          world.</doc:definition>
        <doc:keyword>geopolitical entity</doc:keyword>
        <doc:keyword>country</doc:keyword>
        <!--...-->
      </xs:documentation>
    </xs:annotation>
    <xs:restriction base="xs:token">
      <!--...-->
    </xs:restriction>
  </xs:simpleType>

</xs:schema>


21.8.2.3. Code documentation

Another type of user documentation is code control information, such as when it was created and by whom, its version, and its dependencies. This is illustrated in Example 21–20. The element names used in the example are similar to the keywords in Javadoc.

Example 21–20. Code documentation


<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:doc="http://datypic.com/doc">

  <xs:simpleType name="CountryType">
    <xs:annotation>
      <xs:documentation>
        <doc:author>Priscilla Walmsley</doc:author>
        <doc:version>1.1</doc:version>
        <doc:since>1.0</doc:since>
        <doc:see>
          <doc:label>Country Code Listings</doc:label>
          <doc:link>http://datypic.com/countries.html</doc:link>
        </doc:see>
        <doc:deprecated>false</doc:deprecated>
      </xs:documentation>
    </xs:annotation>
    <xs:restriction base="xs:token">
      <!--...-->
    </xs:restriction>
  </xs:simpleType>

</xs:schema>


21.8.2.4. Section comments

There is a reason that schema, override, and redefine elements can have multiple annotations anywhere in their content. These annotations can be used to break a schema document into sections and provide comments on each section. Example 21–21 shows annotations that serve as section comments. Although they are more verbose than regular XML comments (which are also permitted), they are more structured. This means that they can be used, for example, to generate XHTML documentation for the schema.

Example 21–21. Section identifiers


<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:annotation><xs:documentation><sectionHeader>
    ********* Product-Related Element Declarations ***************
  </sectionHeader></xs:documentation></xs:annotation>
  <xs:element name="product" type="ProductType"/>
  <xs:element name="size" type="SizeType"/>

  <xs:annotation><xs:documentation><sectionHeader>
    ********* Order-Related Element Declarations *****************
  </sectionHeader></xs:documentation></xs:annotation>
  <xs:element name="order" type="OrderType"/>
  <xs:element name="items" type="ItemsType"/>

  <!--...-->
</xs:schema>


21.8.3. Application information

There is a wide variety of use cases for adding application information to schemas. Some of the typical kinds of application information to include are:

• Extra validation rules, such as co-constraints. XML Schema alone cannot express every constraint you might want to impose on your instances.

• Mappings to other structures, such as databases or EDI messages. These mappings are a flexible way to tell an application where to store or extract individual elements.

• Mapping to XHTML forms or other user input mechanisms. The mappings can include special presentation information for each data element, such as translations of labels to other languages.

• Formatting information, such as a stylesheet fragment that can convert the instance element to XHTML, making it presentable to the user.

The syntax for appinfo, shown in Table 21–6, is identical to that of documentation, minus the xml:lang attribute.

Table 21–6. XSD Syntax: application information

Image

Example 21–22 shows the use of appinfo to map product to a database table.

Example 21–22. Application information


<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:app="http://datypic.com/app">
  <xs:element name="product" type="ProductType">
    <xs:annotation>
      <xs:appinfo>
        <app:dbmapping>
            <app:tb>PRODUCT_MASTER</app:tb>
        </app:dbmapping>
      </xs:appinfo>
    </xs:annotation>
  </xs:element>
  <!--...-->
</xs:schema>


In this example, we declare a namespace http://datypic.com/app for the dbmapping and tb elements used in the annotation. This is not required; appinfo can contain elements with no namespace. However, it is preferable to use a namespace because it makes the extension easily distinguishable from other information that may be included for use by other applications.

21.8.4. Non-native attributes

In addition to annotations, all schema elements are permitted to have additional attributes. These attributes are known as non-native attributes, since they must be in a namespace other than the XML Schema Namespace. Example 21–23 shows an element declaration that has the non-native attribute description.

Example 21–23. Non-native attributes


<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:doc="http://datypic.com/doc"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://datypic.com/doc doc.xsd">
  <xs:element name="product" type="ProductType"
           doc:description="This element represents a product."/>
  <!--...-->
</xs:schema>


As with appinfo and documentation contents, the non-native attributes are validated through a wildcard with lax validation. If attribute declarations can be found for the attributes, they will be validated, otherwise the processor will ignore them. In this case, the xsi:schemaLocation attribute points to a schema document for the additional attributes. Example 21–24 shows a schema that might be used to validate the non-native attributes.

The schema does not include new declarations for schema elements. Rather, it contains global declarations of any non-native attributes.

Example 21–24. A schema for non-native attributes


<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/doc"
           targetNamespace="http://datypic.com/doc">
  <xs:attribute name="description" type="xs:string"/>
</xs:schema>


21.8.4.1. Design hint: Should I use annotations or non-native attributes?

This is roughly the same as the general question of whether to use elements or attributes. Both can convey additional information, and both are made available to the application by the schema processor. Non-native attributes are less verbose, and perhaps more clear because they are closer to the definitions to which they apply. However, they have drawbacks: they cannot be used more than once in a particular element, they cannot be extended in the future, and they cannot contain other elements. For example, if you decide that you want the descriptions to be expressed in XHTML, this cannot be done if the description is an attribute. For more information on attributes versus elements, see Section 7.1 on p. 113.

21.8.5. Documenting namespaces

It is generally a good idea to use URLs for namespace names and to put a resource at the location referenced by the URL. There are many reasons not to require your application to dereference a namespace name at runtime, including security, performance, and network availability. However, a person might want to dereference the namespace in order to find out more information about it.

It might seem logical to put a schema document at that location. Having a namespace name resolve to a schema document, though, is not ideal because:

• Many schemas may describe that namespace. Which one do you choose?

• A variety of documents in other formats may also describe that namespace, including DTDs, human-readable documentation, schemas written in other schema languages, and stylesheets. Each may be applicable in different circumstances.

• Schema documents are not particularly human-readable, even by humans who write them!

A better choice is a resource directory, which lists all the resources related to a namespace. Such a directory can be both human- and application-readable. It can also allow different resources to be used depending on the application or purpose.

One language that can be used to define a resource directory is RDDL (Resource Directory Description Language). RDDL is an extension of XHTML that is used to define a resource directory. It does not only apply to namespaces, but it is an excellent choice for documenting a namespace. Example 21–25 shows an RDDL document that might be placed at the location http://datypic.com/prod.

Example 21–25. RDDL for the product catalog namespace


<?xml version='1.0'?>
<!DOCTYPE html PUBLIC "-//XML-DEV//DTD XHTML RDDL 1.0//EN"
                      "rddl/rddl-xhtml.dtd">
<html xml:lang="en" xmlns="http://www.w3.org/1999/xhtml"
      xmlns:xlink="http://www.w3.org/1999/xlink"
      xmlns:rddl="http://www.rddl.org/">
<head><title>Product Catalog</title></head>
<body><h1>Product Catalog</h1>
  <div id="toc"><h2>Table of Contents</h2>
    <ol>
      <li><a href="#intro">Introduction</a></li>
      <li><a href="#related.resources">Resources</a></li>
    </ol>
  </div>
  <div id="intro"><h2>Introduction</h2>
    <p>This document describes the <a href="#xmlschemap1">Product
      Catalog</a> namespace and contains a directory of links to
      related resources.</p>
  </div>
  <div id="related.resources">
    <h2>Related Resources for the Product Catalog Namespace</h2>
    <!-- start resource definitions -->

  <div class="resource" id="DTD">
    <rddl:resource xlink:title="DTD for validation"
     xlink:arcrole="http://www.rddl.org/purposes#validation"
     xlink:role="http://www.isi.edu/in-
                 notes/iana/assignments/media-types/text/xml-dtd"
     xlink:href="prod.dtd">
      <h3>DTD</h3>
      <p>A <a href="prod.dtd">DTD</a> for the Product Catalog.</p>
    </rddl:resource>
  </div>

  <div class="resource" id="xmlschema">
    <rddl:resource xlink:title="Products schema"
     xlink:role="http://www.w3.org/2001/XMLSchema"
     xlink:arcrole="http://www.rddl.org/purposes#schema-validation"
     xlink:href="prod.xsd">
      <h3>XML Schema</h3>
      <p>An <a href="prod.xsd">XML Schema</a> for the Product
         Catalog.</p>
    </rddl:resource>
  </div>

  <div class="resource" id="documentation">
    <rddl:resource xlink:title="Application Documentation"
     xlink:role="http://www.w3.org/TR/html4/"
     xlink:arcrole="http://www.rddl.org/purposes#reference"
     xlink:href="prod.html">
      <h3>Application Documentation</h3>
      <p><a href="prod.html">Application documentation</a> for
          the Product Catalog application.</p>
    </rddl:resource>
  </div>
  </div>
</body></html>


This document defines three related resources. Each resource has a role, which describes the nature of the resource (e.g., schema, DTD, stylesheet), and an arcrole, which indicates the purpose of the resource (e.g., validation, reference). An application that wants to do schema validation, for example, can read this document and extract the location of the schema document to be used for validation. A person could also read this document in a browser, as shown in Figure 21–4.

Image

Figure 21–4. Viewing a RDDL document in a web browser

For more information on RDDL, see www.rddl.org.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.227.111.58