Simple type derivations

A major feature of this standard is the ability it gives schema authors to create new simple data types that are specific to the document model being defined. A new data type is 'derived' from an existing type by first referencing an existing simple data type, then creating a new one that is either a restricted version or an extended version of that type (later it is shown how complex types can also be derived).

For example, it is possible to create a new numeric type that is based on the 'integer' type, but is constrained to a value between '5' and '50'. Conversely, the original type can be extended to allow a list of values, or to allow the value to conform to one of several possible types. Once created, the new type can be referenced in the usual way from element and attribute definitions. In the following example, the element Score will hold a value of type 'scoreRange', which is a restricted or extended version of a simpler data type, such as 'integer':

<element name="score" type="DOC:scoreRange" />

Because 'scoreRange' is not a built-in data type, it is given the namespace prefix of the target namespace, in this case 'DOC:'. It would have no prefix if the target namespace were the default namespace.

The creation of new simple data types is not limited to the creation of custom types from those types built into the standard. It is also possible to create new types from other custom types. There is also a way to prevent this, either entirely or only for specific kinds of derivation (using techniques described toward the end of this chapter).

Simple type definitions

New simple data types are created using the SimpleType element, and named using a Name attribute. In the following example, a new type called 'scoreRange' is being defined:

<simpleType name="scoreRange">...</simpleType>

This element can begin with an Annotation element, and must then contain exactly one of the following: a List element to define a list type, a Union element to create a new type that is a combination of two or more existing types, or a Restriction element to create a restricted version of an existing data type.
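As a sketch of a complete definition, the 'scoreRange' type described earlier (an 'integer' constrained to values between '5' and '50') might be written as follows, with an Annotation element followed by a Restriction element. The MinInclusive and MaxInclusive facets used here are described later in this chapter, and the documentation text is illustrative only:

<simpleType name="scoreRange">
  <annotation>
    <documentation>A score from 5 to 50 inclusive.</documentation>
  </annotation>
  <restriction base="integer">
    <minInclusive value="5" />
    <maxInclusive value="50" />
  </restriction>
</simpleType>

With this definition, '<score>23</score>' would be valid, but '<score>4</score>' and '<score>51</score>' would not.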

Lists

The List element is used to create lists of values of a given type. The ItemType attribute identifies the data type that the values in the list must conform to:

<simpleType name="scoresList">
  <list itemType="integer" />
</simpleType>

Whitespace (typically a single space) separates the items in a list. The following example shows multiple integer values in a Scores element that has adopted the type defined above:

<scores>57 123 19 87</scores>

It follows that it would be dangerous to base a list type on the 'string' data type, because strings can contain spaces. It would not be possible to distinguish between embedded spaces and item-separating spaces. A string with one space in it would be treated as two list items.

As an alternative to referencing an existing data type, the List element can contain a complete simple type definition, including unions and restrictions:

<simpleType name="scores">
  <list><simpleType>...</simpleType></list>
</simpleType>
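For instance, a list whose items are integers in the range '0' to '100' could be defined with an anonymous restriction embedded in the List element. This is a sketch: the type name 'percentList' and the element name Percents are hypothetical. When the item type is defined inline like this, the ItemType attribute must be omitted:

<simpleType name="percentList">
  <list>
    <simpleType>
      <restriction base="integer">
        <minInclusive value="0" />
        <maxInclusive value="100" />
      </restriction>
    </simpleType>
  </list>
</simpleType>

   <percents>0 55 100</percents>
   <percents>20 101</percents> <!-- NOT VALID -->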

The List element itself has no attribute for preventing further derivation. To stop other types from being derived from a list type, the enclosing SimpleType element can be given a Final attribute (for example, final="#all").

Unions

It is possible to create a new data type that is an amalgamation of two or more existing types. Values are considered valid when they conform to the constraints of any of the types concerned. The Union element refers to two or more data types, possibly using the MemberTypes attribute to refer to the types to be included. The following example creates a new data type called 'ScoreOrNoScore', from a union of the 'integer' type and the 'NoScore' type (the 'enumeration' element is discussed later):

<!-- value is 'none' only -->
<simpleType name="NoScore">
  <restriction base="NMTOKEN">
    <enumeration value="none"/>
  </restriction>
</simpleType>

<!-- value is either 'none', or a number -->
<simpleType name="ScoreOrNoScore">
  <union memberTypes="integer DOC:NoScore" />
</simpleType>

A Score element that adopts this type can have values such as:

<score>44</score>
<score>none</score>
<score>9</score>

But it is not necessary to reference existing data types. Instead, one or more of the types to be merged can be created within the Union element itself. This approach requires another, embedded SimpleType element, usually containing a Restriction element to create a restricted version of an existing data type (described later).
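The following sketch rewrites the 'ScoreOrNoScore' type in this way, defining the 'none' value inline instead of through the separately named 'NoScore' type. Types referenced by the MemberTypes attribute and embedded SimpleType elements can be combined in the same Union element. The name 'ScoreOrNoScore2' is hypothetical and is used only to avoid a clash with the earlier definition:

<simpleType name="ScoreOrNoScore2">
  <union memberTypes="integer">
    <simpleType>
      <restriction base="NMTOKEN">
        <enumeration value="none"/>
      </restriction>
    </simpleType>
  </union>
</simpleType>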

Combinations of unions and lists

A list can be created from a union type. For example:

<!-- value is either 'none' or a number, and
     is repeatable -->
<simpleType name="ScoreOrNoScoreList">
  <list itemType="DOC:ScoreOrNoScore" />
</simpleType>

A Scores element that adopts this type could have a value as follows:

<scores>44 none 9</scores>

Conversely, a union can include a list as one (or more) of its components. In the following example, the scores list type is combined with a simple boolean type:

<simpleType name="ScoreOrGamePlayed">
  <union memberTypes="DOC:ScoreOrNoScoreList boolean" />
</simpleType>

But this construct can be confusing. It is important to understand that it does not allow the two component types to be mixed within a single value just because one of them happens to be a list type. Instead, a value can either be a list of values from the list type, or a single value from the other type. An element called ScoresOrGamePlayed that adopted this type could have the following values:

<!-- game was played -->
<scoresOrGamePlayed>true</scoresOrGamePlayed>

<!-- game was not played -->
<scoresOrGamePlayed>false</scoresOrGamePlayed>

<!-- game was played, and here are the scores -->
<scoresOrGamePlayed>44 none 9</scoresOrGamePlayed>

Restrictions

A simple way to form a new data type is to create a restricted version of an existing type. The previous chapter briefly showed how this concept is used to restrict an attribute value to one of a pre-defined set of possible values. In that example, the Attribute definition must use a simple type (attributes cannot contain complex types). This is done by embedding a SimpleType element (or possibly referencing one defined elsewhere). Within this element, a Restriction element first identifies, in its Base attribute, the existing data type that is to be the basis of the new, more restricted type (in this case, 'NMTOKEN'), then holds Enumeration elements to identify each possible value:

<attribute name="security">
  <simpleType>
    <restriction base="NMTOKEN">
      <enumeration value="normal"/>
      <enumeration value="secret"/>
      <enumeration value="topSecret"/>
    </restriction>
  </simpleType>
</attribute>

Note that, because this attribute is defining its own data type, it must not include either a Type attribute or a Ref attribute (it is creating the type, not referring to an existing one, and also not referencing an attribute declaration elsewhere).
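When the same set of values is needed by several attributes, the type can instead be defined once, given a name, and referenced from the Type attribute of each attribute definition. A sketch, using the hypothetical type name 'securityLevel' in the 'DOC:' target namespace:

<simpleType name="securityLevel">
  <restriction base="NMTOKEN">
    <enumeration value="normal"/>
    <enumeration value="secret"/>
    <enumeration value="topSecret"/>
  </restriction>
</simpleType>

<attribute name="security" type="DOC:securityLevel" />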

Facets

As shown above, a restriction is created by first identifying an existing data type, then specifying the constraints to be applied to this data type in order to form the new data type. These constraints are known as facets. The facets available vary somewhat, depending on the data type. For example, the 'NMTOKEN' type has an 'enumeration' facet, and indeed this facet applies to all data types except the 'boolean' type. Other facets are less ubiquitous. For example, the 'length' facet cannot be used with the number-based types, such as the 'integer' type.

The most basic facet types are:

  • length ('length')

  • minimum length ('minLength')

  • maximum length ('maxLength')

  • pattern ('pattern')

  • enumeration ('enumeration')

  • whitespace ('whiteSpace').

The following table shows which simple types can have these facets ('-' means the facet does not apply):

Simple type         length  minLength  maxLength  pattern  enumeration  whiteSpace
string              Y       Y          Y          Y        Y            Y
normalizedString    Y       Y          Y          Y        Y            Y
token               Y       Y          Y          Y        Y            Y
byte                -       -          -          Y        Y            Y
unsignedByte        -       -          -          Y        Y            Y
base64Binary        Y       Y          Y          Y        Y            Y
hexBinary           Y       Y          Y          Y        Y            Y
integer             -       -          -          Y        Y            Y
positiveInteger     -       -          -          Y        Y            Y
negativeInteger     -       -          -          Y        Y            Y
nonNegativeInteger  -       -          -          Y        Y            Y
nonPositiveInteger  -       -          -          Y        Y            Y
int                 -       -          -          Y        Y            Y
unsignedInt         -       -          -          Y        Y            Y
long                -       -          -          Y        Y            Y
unsignedLong        -       -          -          Y        Y            Y
short               -       -          -          Y        Y            Y
unsignedShort       -       -          -          Y        Y            Y
decimal             -       -          -          Y        Y            Y
float               -       -          -          Y        Y            Y
double              -       -          -          Y        Y            Y
boolean             -       -          -          Y        -            Y
time                -       -          -          Y        Y            Y
dateTime            -       -          -          Y        Y            Y
duration            -       -          -          Y        Y            Y
date                -       -          -          Y        Y            Y
gMonth              -       -          -          Y        Y            Y
gYear               -       -          -          Y        Y            Y
gYearMonth          -       -          -          Y        Y            Y
gDay                -       -          -          Y        Y            Y
gMonthDay           -       -          -          Y        Y            Y
Name                Y       Y          Y          Y        Y            Y
QName               Y       Y          Y          Y        Y            Y
NCName              Y       Y          Y          Y        Y            Y
anyURI              Y       Y          Y          Y        Y            Y
language            Y       Y          Y          Y        Y            Y
ID                  Y       Y          Y          Y        Y            Y
IDREF               Y       Y          Y          Y        Y            Y
IDREFS              Y       Y          Y          Y        Y            Y
ENTITY              Y       Y          Y          Y        Y            Y
ENTITIES            Y       Y          Y          Y        Y            Y
NOTATION            Y       Y          Y          Y        Y            Y
NMTOKEN             Y       Y          Y          Y        Y            Y
NMTOKENS            Y       Y          Y          Y        Y            Y

Numeric and date/time types have additional facets. These types can be constrained to a specific range of values, using minimum and maximum settings. For the decimal-based types, the total number of digits in the value, or in just its fractional part, can also be specified. The additional facets are:

  • maximum inclusive ('maxInclusive')

  • maximum exclusive ('maxExclusive')

  • minimum inclusive ('minInclusive')

  • minimum exclusive ('minExclusive')

  • total digits ('totalDigits')

  • fractional digits ('fractionDigits').

The following table shows which types can have these facets:

Simple type         maxInclusive  maxExclusive  minInclusive  minExclusive  totalDigits  fractionDigits
byte                Y             Y             Y             Y             Y            Y
unsignedByte        Y             Y             Y             Y             Y            Y
integer             Y             Y             Y             Y             Y            Y
positiveInteger     Y             Y             Y             Y             Y            Y
negativeInteger     Y             Y             Y             Y             Y            Y
nonNegativeInteger  Y             Y             Y             Y             Y            Y
nonPositiveInteger  Y             Y             Y             Y             Y            Y
int                 Y             Y             Y             Y             Y            Y
unsignedInt         Y             Y             Y             Y             Y            Y
long                Y             Y             Y             Y             Y            Y
unsignedLong        Y             Y             Y             Y             Y            Y
short               Y             Y             Y             Y             Y            Y
unsignedShort       Y             Y             Y             Y             Y            Y
decimal             Y             Y             Y             Y             Y            Y
float               Y             Y             Y             Y             -            -
double              Y             Y             Y             Y             -            -
time                Y             Y             Y             Y             -            -
dateTime            Y             Y             Y             Y             -            -
duration            Y             Y             Y             Y             -            -
date                Y             Y             Y             Y             -            -
gMonth              Y             Y             Y             Y             -            -
gYear               Y             Y             Y             Y             -            -
gYearMonth          Y             Y             Y             Y             -            -
gDay                Y             Y             Y             Y             -            -
gMonthDay           Y             Y             Y             Y             -            -

A separate element represents each facet. These are the Length, MinLength, MaxLength, Pattern, Enumeration, WhiteSpace, MaxInclusive, MaxExclusive, MinInclusive, MinExclusive, TotalDigits and FractionDigits elements. All of these elements are empty apart from an optional Annotation element, and all have a Value attribute that specifies the constraint for that particular facet. For example, the 'length' facet may be used to constrain a value to 13 characters, with or without an annotation:

<length value="13" />

<length value="13"><annotation>...</annotation></length>

The Fixed attribute can be set to 'true' in order to ensure that any data type that uses the current data type as a base cannot override the facet value (except on the Enumeration and Pattern elements, where this attribute is not permitted):

<length value="13" fixed="true"/>
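To illustrate, the following sketch (with the hypothetical type names 'code13' and 'code12') defines a base type whose length is fixed, then a derived type that attempts to change it. A schema processor should reject the second definition, because it tries to override a fixed facet value:

<simpleType name="code13">
  <restriction base="string">
    <length value="13" fixed="true"/>
  </restriction>
</simpleType>

<!-- ERROR: attempts to override a fixed facet -->
<simpleType name="code12">
  <restriction base="DOC:code13">
    <length value="12"/>
  </restriction>
</simpleType>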

Length facets

The 'length' facet constrains the value to a set number of characters. The 'minimum length' and 'maximum length' facets constrain the value to a minimum and maximum number of characters respectively. These two facets can be used together, but it would not make sense to use either of them along with the 'length' facet, which already sets both the minimum and the maximum to the same value. The Length, MinLength and MaxLength elements are used (the two restrictions below are equivalent):

<restriction base="string">
  <length value="13" />
</restriction>
<restriction base="string">
  <minLength value="13" />
  <maxLength value="13" />
</restriction>

Pattern facet

The 'pattern' facet defines a pattern of characters (a template), against which a value is compared. For example, a pattern can be used to specify that a value must begin with three letters, followed by four digits. A value of 'abc1234' would match this pattern, and so would 'xyz9876', but 'ab12345' would fail to match because it has too few letters and too many digits. Patterns can be much more complex, however. For example, 'ab?c(x^x|[d-w-[m]]|zz\p{IsGothic})+' is a valid pattern (and the pattern language is explained in detail below). The Pattern element is used:

<pattern value="ab?c(x^x|[d-w-[m]]|zz\p{IsGothic})+"/>
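The 'three letters followed by four digits' rule described above could be sketched as follows, assuming that 'letters' means only the unaccented letters 'a' to 'z' in either case. A pattern is always matched against the entire value, so no start or end markers are needed:

<restriction base="string">
  <pattern value="[a-zA-Z]{3}[0-9]{4}"/>
</restriction>

   <code>abc1234</code>
   <code>xyz9876</code>
   <code>ab12345</code> <!-- NOT VALID -->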

Enumeration facet

The 'enumeration' facet defines a fixed value. Typically, there will be several Enumeration elements, between them defining a range of options. Note that an enumeration value can have a space in it (unlike a DTD enumerated type), but this is not advisable if there is a possibility that a new list type will be derived from this type, because spaces are used in list types to separate the items. The Enumeration element is used:

<restriction base="string"> <!-- abc or def -->
  <enumeration value="abc" />
  <enumeration value="def" />
</restriction>


   <code>abc</code>
   <code>def</code>
   <code>abcdef</code> <!-- NOT VALID -->
   <code>xyz</code>    <!-- NOT VALID -->

Whitespace facet

The 'whitespace' facet affects the whitespace characters (space, tab, line-feed and carriage-return) in a value. It can be used to 'replace' line-feed, carriage-return and tab characters with space characters, or go further and 'collapse' the value by then also removing leading, trailing and multiple embedded spaces. The value 'collapse' is assumed and fixed for data types that are not string-based, and the same is true for all list types. At the other extreme, the value 'preserve' applies to the 'string' data type itself. For types derived from 'string', the WhiteSpace element can be used to select any of the three options (the default being 'preserve'). However, a derived type can only make whitespace handling stricter, never looser: a type derived from 'token', which already collapses whitespace, cannot revert to 'preserve'.

Note that, uniquely, this facet type is not used to validate a target value, but to modify that value (though the standard seems to be confused on this point, as it also suggests that it is used to create the 'normalizedString' type from the 'string' type, and therefore acts as a validation rather than a transformation instruction):

  <!-- ORIGINAL -->
   <para> This is a
      paragraph.  </para>

<restriction base="string">
  <whiteSpace value="replace" />
</restriction>

   <!-- REPLACED -->
   <para> This is a   paragraph.  </para>

<restriction base="string">
  <whiteSpace value="collapse" />
</restriction>

   <!-- COLLAPSED -->
   <para>This is a paragraph.</para>

Numeric value limitation facets

The 'minimum inclusive' facet specifies a minimum allowed value, such as '15'. For a value to be legal, it must be at least this amount. The term 'inclusive' means that the specified value is allowed, as well as all higher values. A 'minimum exclusive' value, on the other hand, excludes the specified value. A value of '15' means that the actual value must be higher than this. How close it can get to this value depends on the data type. For an integer type, the lowest possible value would be '16', but for a decimal type it could be '15.0000000001', or closer still.

The MinInclusive, MinExclusive, MaxInclusive and MaxExclusive elements are used to set these limits on values:

<!-- 1 - 99 -->
<restriction base="integer">
  <minInclusive value="1" />
  <maxInclusive value="99" />
</restriction>

<!-- greater than 0 and less than 1 (for example, 0.00001 or 0.99999) -->
<restriction base="decimal">
  <minExclusive value="0" />
  <maxExclusive value="1" />
</restriction>
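These limits are not restricted to numbers. The date and time types accept the same four facets, so a type covering every date in the year 2001 might be sketched as follows (the element name Issued is hypothetical):

<!-- any date in 2001 -->
<restriction base="date">
  <minInclusive value="2001-01-01" />
  <maxExclusive value="2002-01-01" />
</restriction>

   <issued>2001-06-15</issued>
   <issued>2001-12-31</issued>
   <issued>2002-01-01</issued> <!-- NOT VALID -->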

Number of digits facets

The 'total digits' facet specifies the maximum number of digits in a numeric value. This is not the same as the length constraint, as other symbols may appear in a number, such as a leading '+' or '-' sign, and a decimal point. The value '6' simply states that there may be no more than six digits in the value. It is further possible to control the number of these digits that follow a decimal point, using the 'fraction digits' facet. The value of this facet type is the maximum number of digits allowed in the fractional part of a decimal number. The TotalDigits and FractionDigits elements are used:

<restriction base="decimal">
  <totalDigits value="4"/>
  <fractionDigits value="2"/>
</restriction>


   <amount>1</amount>
   <amount>1.2</amount>
   <amount>12.3</amount>
   <amount>12.34</amount>
   <amount>123.45</amount> <!-- TOO MANY DIGITS -->
   <amount>1.234</amount>  <!-- TOO MANY FRACTION DIGITS -->
   <amount>12.345</amount> <!-- BOTH CONSTRAINTS BROKEN -->

Facets in list types

List data types can be constrained using the following facets:

  • length

  • minLength

  • maxLength

  • enumeration.

The three length-related facets are, in this case, used to constrain the number of items in the list, rather than the length of each of these items:

<!-- value is either 'none', or a number and
     is repeatable -->
<simpleType name="ScoreOrNoScoreList">
  <list itemType="DOC:ScoreOrNoScore" />
</simpleType>

<!-- no more than 10 scores in list -->
<simpleType name="ScoreOrNoScoreListLimits">
  <restriction base="DOC:ScoreOrNoScoreList">
    <maxLength value="10" />
  </restriction>
</simpleType>
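As a further sketch (the type name 'OneToThreeScores' is hypothetical), the MinLength and MaxLength elements can be combined to require between one and three items in the list:

<simpleType name="OneToThreeScores">
  <restriction base="DOC:ScoreOrNoScoreList">
    <minLength value="1" />
    <maxLength value="3" />
  </restriction>
</simpleType>

   <scores>44</scores>
   <scores>44 none 9</scores>
   <scores>44 none 9 12</scores> <!-- NOT VALID -->
   <scores></scores>             <!-- NOT VALID -->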

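The 'enumeration' facet applies to the list as a whole: each Enumeration element specifies a complete permitted list value (such as '44 none 9'), rather than an individual item. To restrict the values of the individual items, restrict the item type instead.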

Facet usage limitations

While the Pattern and Enumeration elements may repeat (several Pattern elements in the same restriction are treated as alternatives, and several Enumeration elements define a list of options), it is not possible to repeat any of the other facets. It makes no sense, for example, to specify two lengths, or two minimum values.
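As a sketch of repeated Pattern elements (the patterns are hypothetical), the following restriction accepts either three lower-case letters followed by four digits, or four digits alone, because a value need only match one of the patterns given in the same restriction:

<restriction base="string">
  <pattern value="[a-z]{3}[0-9]{4}"/>
  <pattern value="[0-9]{4}"/>
</restriction>

   <code>abc1234</code>
   <code>1234</code>
   <code>abc</code> <!-- NOT VALID -->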

Many combinations of facets make no sense, and are therefore mutually exclusive. For example, it is not possible to set a minimum inclusive value while also setting a minimum exclusive value.
