Element declarations

An element declaration is used to define a new element and specify its allowed content. The keyword 'ELEMENT' introduces an element declaration. The name of the element being declared follows (recall the constraint to an initial letter, underscore character, '_', or colon, ':', and thereafter also digits, '.' and '-'), separated by at least one space character:

<!ELEMENT title ..... >

A statement of the legal content of the element is the final required part of the declaration. An element may have no content at all, may have content of only child elements, of only text, or of a mixture of elements and text. If the element can hold no child elements, and also no text, then it is known as an empty element. The keyword 'EMPTY' is used to denote this. In the example below, the Image element is declared to be an empty element, as it is used only to indicate the position of the image. When child elements are allowed, the declaration may contain either the keyword 'ANY', or a model group. An element declared to have a content of 'ANY' may contain text and any of the elements declared in the DTD, in any order (in practice, however, this approach is rarely used because it allows too much freedom, and therefore undermines the benefits that derive from defining document structures):

<!ELEMENT p ANY>
<!ELEMENT image EMPTY>


   <book><p>An image <image.../> in the text.</p></book>

Note that an element that is allowed to hold child elements, text or both, may just happen to have no content at all. In this case it is legal to employ both a start-tag and end-tag, or to use an empty element tag:

<title></title>                        <title/>

Likewise, an element declared to be empty may be represented by a start-tag, immediately followed by an end-tag, though there must be no elements or text between these tags.
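
As a minimal sketch of these equivalences, the document below declares a Title element that may hold text and an Image element that is empty, then writes each with no content in both forms. The wrapper element 'doc' and its model are assumptions introduced only to make the example complete.

```xml
<?xml version="1.0"?>
<!DOCTYPE doc [
  <!-- 'doc' is a hypothetical wrapper element for this sketch -->
  <!ELEMENT doc   (title, title, image, image)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT image EMPTY>
]>
<doc>
  <title></title>   <!-- start-tag and end-tag with nothing between -->
  <title/>          <!-- empty-element tag: equivalent to the line above -->
  <image/>          <!-- usual form for an element declared EMPTY -->
  <image></image>   <!-- also valid, provided nothing lies between the tags -->
</doc>
```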

A model group is used to describe enclosed elements and text. The structure of a model group can be complex, and is explained fully in the next section.

<!ELEMENT book (para*, chapter+)>

SGML Notes

The optional minimization codes, '- -' and their variants, never appear in an XML DTD because minimization is not supported (though a DTD can be compliant with both SGML and XML by replacing these characters with a parameter entity, which in the XML version must have an empty replacement value). It is not possible to embed comments within other declarations. Also, an element cannot be declared to have CDATA or RCDATA content.


Model groups

A model group is used to define an element that has mixed content or element content. An element defined to have element content may contain only child elements. An element defined to have mixed content may contain a mixture of child elements and free text. In a document, however, a mixed content element may equally contain only text, only child elements or a mixture of the two, and it is not possible to specify the order in which text and elements may intermix.

A model group is bounded by brackets, and contains at least one token. The token may be the name of an included element. In this way document hierarchies are built. For example, a model group used in the declaration for a Book element may refer to embedded Front Matter and Body elements. The declarations for these elements may in turn specify the inclusion of further elements, such as Title and Chapter.

When a model group contains more than one content token, the child elements can be organized in different ways. The organization of elements is controlled using two logical connector operators: ',' (sequence connector) and '|' (choice connector):

(token, token, token)


(token | token | token)

SGML Note

The 'and' connector, '&', is not available. The reason for not including this connector type is related to the added complexity it introduces to document structure models, which complicates development of parser software. At the loss of some flexibility for document authors, it can simply be replaced by the sequence connector.


Sequence control

The sequence rule '(a, b, c)' indicates that element A is followed by element B, which in turn is followed by element C.

Note that other markup, such as comments and processing instructions, may be inserted between these elements. Such markup is not part of the document structure. In an article, for example, it may be important for the title to appear first, followed by the author's name, then a summary:

 ... (title, author, summary)...


<article>
  <!-- this is an article -->
  <title>Article Title</title>
  <?PAGE-BREAK?>
  <author>J. Smith</author>
  <summary>This is an article about XML.</summary>
  ...
</article>

Choice control

The choice rule '(a | b | c)' indicates a choice between the elements A, B and C (only one can be selected).

For example, an article in a magazine may be factual or fictional:

 ... (fact | fiction)...

<article>
  <fact>...</fact>
</article>

<article>
  <fiction>...</fiction>
</article>

Embedded model groups

It is not legal to mix these operators within a single group because this would introduce ambiguity in the model. The rule '(a, b, c | d)' is invalid, for example, because it may indicate that 'D' is an alternative to all the other elements, or that 'D' is an alternative only to element C (A and B still being required). To take a more realistic example, consider the need to place a title before a factual or fictional article. The following model would not make it clear that the title is required regardless of whether the article is factual or fictional:

... (title, fact | fiction) ...

The solution to this problem is to use enclosed model groups. Further brackets are placed according to the meaning required: '((a, b, c) | d)' indicates the first meaning, whereas '(a, b, (c | d))' indicates the latter. In this way, operators are not actually mixed in the same group. In the last example, the outer model group makes use of the sequence connector and the inner model group makes use of the choice connector. The article example is also clarified:

... (title, (fact | fiction)) ...
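
The sequence, choice and nested-group rules can be brought together in one small document. The sketch below assumes, purely for illustration, that the Title, Fact and Fiction elements hold only text; their content is not stated above.

```xml
<?xml version="1.0"?>
<!DOCTYPE article [
  <!-- title is always required; exactly one of fact or fiction follows it -->
  <!ELEMENT article (title, (fact | fiction))>
  <!-- assumed text-only content for the three child elements -->
  <!ELEMENT title   (#PCDATA)>
  <!ELEMENT fact    (#PCDATA)>
  <!ELEMENT fiction (#PCDATA)>
]>
<article>
  <title>Article Title</title>
  <fiction>Once upon a time...</fiction>
</article>
```

Under this model an article with no title, with both a Fact and a Fiction element, or with the title placed last would fail validation.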

Quantity control

The DTD author can also dictate how often an element can appear at each location. If the element is required and may not repeat, no further information is required. All of the previous examples indicated a required presence (except where the '|' connector specified a choice of elements). It is also a simple matter to specify a fixed number of occurrences of an element. For example, if every article in a magazine had three authors, the following model would ensure that three names are present:

... author, author, author, ...

But it is also possible to make an element optional, to allow it to repeat any number of times, and even to make it both optional and repeatable. These occurrence rules are governed using quantity indicators (the symbols '?', '*' and '+').

Optional element

If an element is optional, and cannot repeat, it is followed by a question mark, '?'. For example, '(a, b?)' indicates that element B is optional.

In an article, the Title element may be required, but the Author element may be absent.
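
A minimal sketch of this rule follows, assuming text-only Title and Author elements (their content is not stated above). Both article fragments satisfy the model.

```xml
<!ELEMENT article (title, author?)>
<!ELEMENT title   (#PCDATA)>   <!-- assumed content -->
<!ELEMENT author  (#PCDATA)>   <!-- assumed content -->

<!-- valid: author present -->
<article>
  <title>Article Title</title>
  <author>J. Smith</author>
</article>

<!-- also valid: author omitted -->
<article>
  <title>Article Title</title>
</article>
```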

Required and repeatable element

If an element is required and may repeat, the element name is followed by a plus, '+'. For example, '(a, b+)' indicates that element B must appear, but may also repeat.

For example, a Book element may require the presence of at least one embedded Chapter element.

To take another example, a list has a number of items so a List element would have repeatable Item child elements. At least one must occur, but it may then be repeated:

<list>                   <list>
  <item>...</item>         <item>...</item>
  <item>...</item>       </list>
  <item>...</item>
</list>

Minimum occurrences

The DTD author can ensure that an element appears at least twice. For example, a list that contains a single item should not be a list at all. A List element may therefore be obliged to hold more than one Item element. This can be achieved using the model '(item, item+)', though care must be taken to place the '+' occurrence symbol after the second Item, as the alternative would be ambiguous, for reasons described below.
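
The two list models might be declared as sketched below. An element may be declared only once in a DTD, so the stricter alternative is shown inside a comment; the text-only content of Item is an assumption.

```xml
<!-- at least one item -->
<!ELEMENT list (item+)>

<!-- alternative: at least two items
<!ELEMENT list (item, item+)>
-->

<!ELEMENT item (#PCDATA)>   <!-- assumed content -->
```

With the second model a list holding a single Item element would be rejected, while a list of two or more items remains valid.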

Optional and repeatable element

If an element is optional, and also repeatable, the element name is followed by an asterisk, '*'. The '*' may be seen as equivalent to the (illegal) combination '?+'. For example, '(a, b*)' indicates that element B may occur any number of times, and may also be absent.

An Article element may contain any number of Author elements, including none.

To take another example, a chapter may have preliminary paragraphs, but (as in this book) may not always do so:

<chapter>                   <chapter>
  <para>...</para>            <section>...</section>
  <para>...</para>          </chapter>
  <para>...</para>
  <section>...</section>
</chapter>
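
One model consistent with the two chapters shown above is sketched below. It assumes that at least one Section is required and that Para and Section hold only text; neither point is stated in the text.

```xml
<!-- zero or more leading paragraphs, then one or more sections (assumed) -->
<!ELEMENT chapter (para*, section+)>
<!ELEMENT para    (#PCDATA)>   <!-- assumed content -->
<!ELEMENT section (#PCDATA)>   <!-- assumed content -->
```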

Model group occurrences

A model group may itself have an occurrence indicator. The entire group may be optional, required or repeatable. The example '(a, b)?' indicates that the elements A and B must either occur in sequence, or both be absent. Similarly, the example '(a, b)*' indicates that the sequence A then B may be absent, but if present may also repeat any number of times. The example '(a, b)+' indicates that the sequence A then B must occur at least once, but may also then repeat.
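
To make a repeatable group concrete, the sketch below uses hypothetical Glossary, Term and Definition elements, none of which appear in the text above.

```xml
<!-- each term must be immediately followed by its definition;
     the pair must occur at least once and may repeat -->
<!ELEMENT glossary   (term, definition)+>
<!ELEMENT term       (#PCDATA)>
<!ELEMENT definition (#PCDATA)>

<glossary>
  <term>DTD</term>
  <definition>Document Type Definition</definition>
  <term>XML</term>
  <definition>Extensible Markup Language</definition>
</glossary>
```

Changing the '+' to '*' would also permit an empty glossary, and changing it to '?' would allow at most one term and definition pair.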

Note

When creating a DTD, there may be several ways to achieve a required effect. The shortest representation possible should always be used for the sake of clarity. For example, the rule '(a+)?' is more simply defined as '(a*)', though '(a+, b+)' should not be confused with '(a, b)+', which is a very different model.


Text

The locations where document text is allowed are indicated by the keyword 'PCDATA' (Parsed Character Data), which must be preceded by a reserved name indicator, '#', to avoid confusion with an element that has the same name (as unlikely as this seems). This keyword represents zero or more characters. An element that may contain only text would be defined as follows:

<!ELEMENT emph (#PCDATA)>

<emph>This element contains text.</emph>

There are strict rules which must be applied when an element is allowed to contain both text and child elements. The PCDATA keyword must be the first token in the group, and the group must be a choice group. Finally, the group must be optional and repeatable. This is known as a mixed content model:

<!ELEMENT emph  (#PCDATA | sub | super)*>
<!ELEMENT sub   (#PCDATA)>
<!ELEMENT super (#PCDATA)>

<emph>H<sub>2</sub>O is water.</emph>
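
The three rules can be checked against a few candidate declarations. In the sketch below only the first is a legal mixed content model; the others are kept inside comments because an XML parser would reject them.

```xml
<!-- legal: #PCDATA first, choice connector, whole group followed by '*' -->
<!ELEMENT emph (#PCDATA | sub | super)*>

<!-- not legal: #PCDATA is not the first token
<!ELEMENT emph (sub | #PCDATA | super)*>
-->

<!-- not legal: sequence connector used with #PCDATA
<!ELEMENT emph (#PCDATA, sub, super)*>
-->

<!-- not legal: group is repeatable but not optional
<!ELEMENT emph (#PCDATA | sub | super)+>
-->
```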

SGML Note

These strict rules are to avoid the ambiguities that alternative arrangements have caused in SGML DTDs. Most SGML DTD authors have adopted these restrictions as an informal rule.


Model group ambiguities

Some care should be taken when creating model groups as it is possible to confuse the parser. There are several ways to inadvertently create an ambiguous content model.

Ambiguities arise when the element encountered in the data stream matches more than one token in the model. The example below illustrates such a case. On encountering an Item element, the parser cannot tell whether it corresponds to the first token in the group (the optional item) or to the second (the required item). If the parser assumes the first case then discovers no more Item elements in the data, an error will result (because the second item is required). If the parser assumes the second case, then encounters another item, an error will also result (because no more Item elements are allowed). The parser is not expected to look ahead to see which situation is relevant, because some examples of this problem would require the parser to search a long way (possibly to the end of the document), complicating the process and hindering efficiency. The model shown below could be made valid simply by moving the '?' to the second token, giving '(item, item?)':

(item?, item)

If alternative model groups contain the same initial element, the parser cannot determine which model group is being followed:

((surname, employee) | (surname, customer ))

On encountering a Surname element, the parser is unsure which model group is active and, as before, will not look ahead to determine which is in use. Such problems can be resolved by redefining the model groups as follows:

(surname, (employee | customer ))
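
Both ambiguities and their repairs can be set side by side. In this sketch the element names Pair and Person are hypothetical containers introduced only to give the model groups somewhere to live; the ambiguous forms are kept inside comments.

```xml
<!-- ambiguous: an item could match either token
<!ELEMENT pair (item?, item)>
-->
<!ELEMENT pair (item, item?)>

<!-- ambiguous: both alternatives start with surname
<!ELEMENT person ((surname, employee) | (surname, customer))>
-->
<!ELEMENT person (surname, (employee | customer))>
```

Each repaired model accepts the same documents as its ambiguous counterpart, but every element can now be matched to a single token without looking ahead.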

One severe cause of ambiguity in mixed content models has been avoided by only allowing the choice connector in such models. This decision was made in response to the problems that using other connector types in SGML has raised in the past.
