16.2. The PSVI

The various kinds of infoset classes defined in the Infoset Recommendation are specific classes providing specific properties. If you consider additional properties, what you have is no longer that infoset but something else. In the case of schema processing, what we have is just that: properties added to the infoset classes and new classes added. The result is the PSVI (the “post-schema-validation infoset”). In this book, the infoset structure defined in the Infoset Recommendation is called the “basic infoset.”

A qualitative difference exists between the PSVI and the basic infoset. Specifically, the basic infoset contains almost no information about the results of validation against a DTD other than attribute values that are implied because they have defaults and are not specified in the start-tag or empty-element tag of the element involved. Most of the additions in the PSVI, on the other hand, make available in the infoset a lot of information determined during schema processing.

Warning

The Schema Recommendation describes the PSVI and recommends that the information in it be made accessible to the application. But it does not define a standard programming interface for that access, nor does it make that recommendation mandatory for conformance. Until standard APIs are defined and supporting products become available, carefully check what information each schema processor you consider using actually makes available. A future DOM revision may address this (http://www.w3.org/DOM/).


16.2.1. The Basic Infoset

Eleven distinct information item classes are defined in the Infoset Recommendation:

  • Document information item

  • Element information item

  • Attribute information item

  • Character information item

  • Processing instruction information item

  • Unexpanded entity reference information item

  • Comment information item

  • Document type declaration information item

  • Unparsed entity information item

  • Notation information item

  • Namespace information item

Of these, only element, attribute, character, and notation information items have anything at all to do with the PSVI; the latter two only marginally. All but notation information items are described in Section 5.2.

A notation information item plays the same role as a notation component in a schema: it’s just some information to be passed on to the application. Schema processing does nothing with notations except check that the local notation name is in fact associated with a notation. (Compare the description of notation components in Section 15.6; for more information about handling notations, see Section 7.10.)

Note

“Notation” is not something that is ever accurately defined. There are allusions to the idea that it identifies a markup language; various markup languages for text and graphics have all been called “notations.” But as far as XML is concerned, a notation is just a “something” identified by a system or public identifier (or both). XML processing passes the identifier(s) on to the application, which has to be programmed to recognize them and implement them properly for its purposes.

Schema processing (or DTD processing, for that matter) does only one thing with notations: It ensures that any QName used as the value of a notation-datatyped element or attribute matches a notation component that associates the local QName with the system and/or public identifier the application is expected to be able to recognize and act upon.


16.2.2. PSVI-added Properties

The PSVI adds properties to only two basic-infoset information items: element and attribute. All else is added via these properties, some of which have new PSVI-added information items as values.

16.2.2.1. The PSVI Element Information Item

These are the properties of basic element information items:

  • Namespace name

  • Local name

  • Prefix

  • Children

  • Attributes

  • Namespace attributes

  • In-scope namespaces

  • Base URI

  • Parent

These properties are discussed in Section 5.2.2.

Twenty-four more properties of element information items are added for the PSVI:

  • Properties applicable only to the validation root element:

    • ID/IDREF table

      A set of ID/IDREF bindings. (These information items are discussed in Section 16.2.4.)

    • Identity-constraint table

      A set of identity-constraint bindings. (These information items are discussed in Section 16.2.4.)

    • Schema information

      A set of namespace schema information information items. (These information items are discussed in Section 16.2.4.)

  • “Real information”: If the element has a valid attribute whose structure type specifies the notation datatype, there will be either a notation property or a system identifier and/or public identifier on the element information item. The value of notation is a notation component (see Sections 16.2.4 and 15.6); the value of either identifier is a character string. Specifically, these are the possible properties; either the first or the other two must be present:

    • Notation

    • Notation public

    • Notation system

    If the element has simple content, the datatype specifies one of several normalization algorithms (involving whitespace modification) to be applied to the raw character string. The schema-normalized value is the value of this property. For elements, schema normalization begins with the raw data character string that is the element’s content; for attributes, the starting value has already been normalized according to rules in the XML Recommendation.

    • Schema normalized value

      The normalized data: a character string (or ABSENT if the element does not have simple content or does have an xsi:nil attribute whose value is ‘true’).

    • Schema specified

      Either SCHEMA or INFOSET. SCHEMA specifies that the schema normalized value came from a schema-specified default rather than from raw data present in the basic infoset.

    (Some might consider the nil property to be real information, but the Schema Recommendation considers it to be parallel to the element declaration property, which is clearly type information. You’ll find nil described under “type information”, following.)

  • Validity information:

    • Validity

      One of VALID, INVALID, or NOTKNOWN.

    • Schema error code

      A list of outcomes as prescribed in Appendix C of the Schema Recommendation; ABSENT unless validity is INVALID.

    • Validation attempted

      One of FULL, PARTIAL, or NONE.

    • Validation context

      A back-pointer to the validation root.

  • Type information:

    • Schema default

      The canonical lexical representation of the default value, if the governing element type provides one.

    There are two options: Either schema processing provides full copies of the structure type components or it provides name information identifying the structure type.

    Providing full structure type components will give values to these properties:

    • Element declaration

      The element type component (or “element declaration schema component”) against which this element was validated, if any. (It may have been validated only against a structure type specified by the xsi:type “escape hatch.”)

    • Type definition

      The structure type prescribed by the element type (or the structure type specified in the element itself by using the xsi:type “escape hatch”).

    • Member type definition

      If the structure type is a simple type that is a union, the string that is the content matches a string in the lexical space of this member of the union (otherwise ABSENT).

    Providing only name information identifying the structure type will give values to these properties:

    • nil

      TRUE if the element was explicitly nilled by specifying xsi:nil to be ‘true’; otherwise, FALSE. (The attribute is also reflected in the attributes set.)

      The Schema Recommendation considers nil to be parallel to element declaration.

    • Type definition anonymous

      A Boolean; FALSE if the type definition has a name.

    • Type definition name

      The name of the applicable structure type, if the type definition anonymous property value is FALSE; otherwise, ABSENT or a system-generated unique identifier that distinguishes between distinct unnamed structure types.

    • Type definition namespace

      A namespace name (a URI) or ABSENT; the target namespace of the applicable structure type.

    • Type definition type

      Either SIMPLE or COMPLEX.

    If the structure type is a simple type that is a union, the following three additional properties provide information about the member type of the union that actually validated the element’s content, similar to that provided by the three similarly-named properties just described:

    • Member type definition anonymous

    • Member type definition name

    • Member type definition namespace
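
To make the property groups above more concrete, here is a minimal Python sketch of how an application might model a handful of the PSVI element properties, using the name-only option for type information. Every class and field name here (`PsviElement`, `TypeNameInfo`, `check`, and so on) is hypothetical; no standard API defines them, and a real schema processor may expose this information quite differently. The `check` method encodes only two rules stated in the text: schema error code is ABSENT unless validity is INVALID, and schema normalized value is ABSENT for a nilled element.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Validity(Enum):
    VALID = "valid"
    INVALID = "invalid"
    NOTKNOWN = "notKnown"


class ValidationAttempted(Enum):
    FULL = "full"
    PARTIAL = "partial"
    NONE = "none"


@dataclass
class TypeNameInfo:
    """Hypothetical name-only description of a structure type."""
    type_definition_type: str          # "SIMPLE" or "COMPLEX"
    anonymous: bool                    # type definition anonymous
    name: Optional[str] = None         # type definition name
    namespace: Optional[str] = None    # type definition namespace


@dataclass
class PsviElement:
    """Hypothetical holder for a few PSVI element properties; None models ABSENT."""
    local_name: str
    validity: Validity = Validity.NOTKNOWN
    validation_attempted: ValidationAttempted = ValidationAttempted.NONE
    schema_error_codes: Optional[list] = None
    schema_normalized_value: Optional[str] = None
    schema_specified: str = "INFOSET"  # "SCHEMA" or "INFOSET"
    nil: bool = False
    type_info: Optional[TypeNameInfo] = None

    def check(self):
        """Return a list of conflicts with the rules described in the text."""
        problems = []
        if self.schema_error_codes is not None and self.validity is not Validity.INVALID:
            problems.append("schema error code must be ABSENT unless validity is INVALID")
        if self.nil and self.schema_normalized_value is not None:
            problems.append("schema normalized value must be ABSENT for a nilled element")
        return problems
```

An application could populate one such object per element from whatever its schema processor reports and call `check()` as a sanity test before relying on the values.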

16.2.2.2. The PSVI Attribute Information Item

These are the properties of basic attribute information items:

  • Namespace name

  • Local name

  • Prefix

  • Normalized value

  • Specified

  • Attribute type

  • References

  • Owner element

These properties are discussed in Section 5.2.3.

Note

Unlike elements, which get no direct indication of their structure type in the basic infoset, attributes do get an indication of their DTD-prescribed structure type. When schema processing occurs, much more information is made directly available in the PSVI, as it is for elements.


Seventeen more properties of attribute information items are added for the PSVI:

  • “Real information”: The (simple) structure type of the attribute specifies a datatype to which it must conform. That datatype specifies one of several normalization algorithms (mostly involving whitespace modification) to be applied to the raw character string. This schema-normalized value is the value of this property. For elements, schema normalization begins with the raw data character string that is the element’s content. For attributes, the value has already been normalized according to rules in the XML Recommendation; it is this already-normalized value (the only available value from the basic infoset) that is further normalized according to the datatype requirements. (For elements, the only prior normalization is the replacement of character references by the characters referenced.)

    • Schema normalized value

      The normalized data (a character string, even if the attribute’s structure type specifies a non-string datatype).

    • Schema specified

      Either SCHEMA or INFOSET. SCHEMA specifies that the schema normalized value came from a schema-specified default rather than from a specified attribute.

  • Validity information:

    • Validity

      One of VALID, INVALID, or NOTKNOWN.

    • Schema error code

      A list of outcomes as prescribed in Appendix C of the Schema Recommendation; ABSENT unless validity is INVALID.

    • Validation attempted

      Either FULL or NONE; PARTIAL does not apply to attributes.

    • Validation context

      A back-pointer to the validation root.

  • Type information:

    • Schema default

      The canonical lexical representation of the default value, if the governing attribute type provides one.

    There are two options: Either schema processing provides full copies of the structure type components or it provides name information identifying the type.

    Providing full structure type components will give values to these properties:

    • Attribute declaration

      The attribute type component against which this attribute was validated, if any.

    • Type definition

      The structure type prescribed by the attribute type.

    • Member type definition

      If the structure type is a union, the string that is the value matches a string in the lexical space of this member of the union (otherwise ABSENT).

    Providing only name information identifying the structure type will give values to these properties:

    • Type definition anonymous

      A Boolean; FALSE if the structure type has a name.

    • Type definition name

      The name of the applicable structure type if the type definition anonymous property value is FALSE; if not, may be ABSENT or a system-generated unique identifier that distinguishes between distinct unnamed structure types.

    • Type definition namespace

      A namespace name (a URI) or ABSENT; the target namespace of the applicable structure type.

    • Type definition type

      Always SIMPLE for attributes.

    If the structure type is a simple type that is a union, the following three additional properties provide information about the member type of the union that actually validated the attribute’s value, similar to that provided by the three similarly-named properties just described:

    • Member type definition anonymous

    • Member type definition name

    • Member type definition namespace
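
The datatype-driven normalization mentioned under “Real information” can be sketched in a few lines of Python. This sketch assumes the three whitespace treatments that the Schema Recommendation's datatypes part associates with the whiteSpace facet (preserve, replace, and collapse); the function name `schema_normalize` is hypothetical and is not part of any processor's API. For an attribute, the input would be the value already normalized under the XML Recommendation's rules; for an element, it would be the raw content string.

```python
import re


def schema_normalize(value: str, white_space: str) -> str:
    """Apply one of the three whitespace treatments to a character string."""
    if white_space == "preserve":
        # No change at all.
        return value
    # "replace": each tab, line feed, and carriage return becomes one space.
    replaced = re.sub(r"[\t\n\r]", " ", value)
    if white_space == "replace":
        return replaced
    if white_space == "collapse":
        # After replacement, runs of spaces shrink to a single space and
        # leading and trailing spaces are dropped.
        return re.sub(r" +", " ", replaced).strip(" ")
    raise ValueError(f"unknown whiteSpace value: {white_space!r}")


print(repr(schema_normalize("  12\t 34\n", "preserve")))  # '  12\t 34\n'
print(repr(schema_normalize("  12\t 34\n", "replace")))   # '  12  34 '
print(repr(schema_normalize("  12\t 34\n", "collapse")))  # '12 34'
```

Note that the result is still a character string in every case, which matches the description of schema normalized value above: the property holds normalized text, not a typed value.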

16.2.3. PSVI-added Information Items

A number of new kinds of information items are added in the PSVI. They occur as values of added properties on the element and attribute information items.

Unlike the basic infoset, the PSVI loads in a lot of information about the results of processing the XML instance against the schema. This includes pointers to (or copies of) components of the schema, Booleans, and other indications of whether particular elements and attributes were validated, which validation algorithm (strict, lax or skip) was used, how validation failed (if it did), what schema was used, what particular structure type was used for each element and attribute, where validation was attempted, and so on.

The PSVI not only includes the name of the structure type—it can also include the entire structure type component and any other components that might be appended to it via values of properties. As a result, many kinds of schema components can wind up in any given PSVI.

Notation Note

The Schema Recommendation does not want to call schema components “information items” and does not want to pollute infosets with specialized objects that are not information items. Accordingly, the Recommendation introduces the notion of information items “isomorphic to” schema components: objects with the same properties and values thereof, except that all of the values that are themselves schema components are replaced by corresponding “isomorphic” information items. It’s as though you took a schema, painted all the components green, and declared that the green components (with green child components) are really information items suitable for inclusion in information sets. There’s really no difference between the two.

In this book, they’re called “components,” even when they’re used as information items. The Schema Recommendation’s isomorphism is, at least for this book’s purposes, the identity.


Four brand-new kinds of information items are also added by the PSVI:

  • ID/IDREF binding

  • Identity-constraint binding

  • Namespace schema information

  • Schema document

Of these, the first two play similar roles with respect to the ID/IDREF binding capabilities of DTDs mimicked by schemas and the more complex binding capabilities available only via schemas. The third and fourth serve to identify the various schema documents that make up the schema, grouped by target namespace. In fact, one property (to be given a value “if available”) has as its value the document information item: the entire schema document in “abstract” form.

16.2.4. The New PSVI Information Items

In addition to the information items considered isomorphic to schema components, four totally new kinds of information items are used in schema processing. Two relate to binding by ID/IDREF constraints and schema-specific identity constraints. The other two relate to identifying relevant schema parts by target namespace and schema-document source. All occur only at the validation root element: the first three as members of the values of the ID/IDREF table, identity-constraint table, and schema information properties (each of which is a set), and schema document information items as members of a property of namespace schema information items.

ID/IDREF bindings provide information linking IDs with IDREFs. Identity-constraint bindings provide similar information for links determined by schema-based identity constraints (discussed in Chapter 13).

Both ID/IDREF bindings and identity-constraint bindings are used during validation to determine the existence and uniqueness of targets for corresponding pointers. Neither is required to be made visible by the processor to the using application. Accordingly, their structure is not important. The structure of ID/IDREF bindings is described here because it is relatively simple; the structure of identity-constraint bindings is not. For any details of the identity-constraint binding mechanism’s structure not available in Chapter 13, see the Schema Recommendation itself.

Here follow descriptions of each new information item’s properties.

  • The ID/IDREF binding information item

    These two properties are provided by ID/IDREF binding:

    • id

      A name used somewhere as an ID or IDREF.

    • binding

      The set of all element information items given that ID. (Validation will require this set to have exactly one member. The set will be empty if the name is used as an IDREF but never as an ID; it will have more than one member if two elements are given the same ID.)

  • The identity-constraint binding information item

    These two properties are provided by identity-constraint binding:

    • Definition

    • Node table
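
The ID/IDREF binding described above lends itself to a short illustration. The following Python sketch builds a table that maps each name used as an ID or IDREF to the list of elements given that ID, then reports the names whose binding does not have exactly one member. The function names and the string labels standing in for element information items are hypothetical; a real processor is not required to expose this table at all.

```python
def build_id_idref_table(ids, idrefs):
    """Build a mapping from each name to the elements given that ID.

    ids: iterable of (element_label, id_value) pairs.
    idrefs: iterable of names used as IDREF values.
    """
    table = {}
    for element_label, id_value in ids:
        table.setdefault(id_value, []).append(element_label)
    for name in idrefs:
        # A name used only as an IDREF gets an empty binding.
        table.setdefault(name, [])
    return table


def binding_problems(table):
    """Report names whose binding does not have exactly one member."""
    problems = {}
    for name, binding in table.items():
        if len(binding) == 0:
            problems[name] = "IDREF with no matching ID"
        elif len(binding) > 1:
            problems[name] = "duplicate ID"
    return problems


table = build_id_idref_table(
    ids=[("chapter1", "c1"), ("chapter2", "c2"), ("appendix", "c2")],
    idrefs=["c1", "c3"],
)
print(binding_problems(table))
# {'c2': 'duplicate ID', 'c3': 'IDREF with no matching ID'}
```

Here `c1` passes because exactly one element carries it, `c2` fails because two elements share it, and `c3` fails because it is referenced but never assigned.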

Namespace schema information information items give information about that part of a schema that relates to one namespace. They occur only as members of the schema information property (of the validation root); that entire set ultimately contains information from the entire schema, sorted by target namespace.

Schema document information items in turn occur only as members of the schema documents property of namespace schema information items.

  • The namespace schema information information item

    These three properties are provided by namespace schema information:

    • Schema namespace

      A namespace name (a URI), naming the namespace about which this information item provides information.

    • Schema components

      A set of schema components; all the components whose target namespace is this object’s schema namespace.

    • Schema documents

      A set of schema document information items, one for each schema document that contributed components to the schema components set.

Note

A schema can in theory be created by some means other than being built from schema documents, in which case the schema documents property will be an empty set.


  • The schema document information item

    These two properties are provided by schema document:

    • Document location

      A URI pointing to the schema document.

    • Document

      The document information item that is the abstract version of the schema document.
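
As a rough illustration of how the schema information property is organized, the Python sketch below groups components and schema documents by target namespace, producing one namespace schema information record per namespace. All names here are hypothetical (`NamespaceSchemaInformation`, `SchemaDocument`, `collect_schema_information`), as are the namespace URIs and file names in the usage example; plain strings stand in for schema components.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SchemaDocument:
    document_location: str            # URI pointing to the schema document
    document: Optional[object] = None  # abstract document, if available


@dataclass
class NamespaceSchemaInformation:
    schema_namespace: Optional[str]
    schema_components: list = field(default_factory=list)
    schema_documents: list = field(default_factory=list)


def collect_schema_information(components, documents):
    """Group (namespace, component) and (namespace, location) pairs by namespace."""
    info = {}
    for namespace, component in components:
        entry = info.setdefault(namespace, NamespaceSchemaInformation(namespace))
        entry.schema_components.append(component)
    for namespace, location in documents:
        entry = info.setdefault(namespace, NamespaceSchemaInformation(namespace))
        entry.schema_documents.append(SchemaDocument(location))
    return info


info = collect_schema_information(
    components=[("urn:a", "order"), ("urn:a", "item"), ("urn:b", "address")],
    documents=[("urn:a", "a.xsd")],
)
print(info["urn:a"].schema_components)  # ['order', 'item']
print(info["urn:b"].schema_documents)   # []
```

The empty schema documents list for `urn:b` mirrors the case noted above, in which components were created by some means other than a schema document.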
