Element hierarchies

A key feature of container elements is that they are often permitted to contain other elements. For example, a Book element can be expected to contain Chapter elements, and Chapter elements to contain Section elements. This arrangement is termed an element hierarchy.

The document element hierarchy may be visualized as boxes within boxes, or as branches of a tree. A tree representation can be drawn in any direction, but left-to-right or top-to-bottom is perhaps the most natural way to view it:



Ultimately, a complete document must be enclosed by a single element. This element lies at the root of the tree, and is informally termed a root element for this reason, though it is properly called the document element. The Book element in the example above is therefore a document element.
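
The single-document-element rule can be observed with any XML parser. The following is a minimal sketch that assumes Python and its standard xml.etree.ElementTree module (neither is prescribed by this chapter); the function name document_element_name is a hypothetical helper introduced only for illustration:

```python
import xml.etree.ElementTree as ET


def document_element_name(xml_text):
    """Return the name of the document element, or None if parsing fails."""
    try:
        return ET.fromstring(xml_text).tag
    except ET.ParseError:
        # Raised, for example, when a second top-level element follows the first.
        return None


single_root = "<book><chapter><section>...</section></chapter></book>"
two_roots = "<book>...</book><book>...</book>"

print(document_element_name(single_root))  # book
print(document_element_name(two_roots))    # None
```

The second string is rejected because nothing but comments, processing instructions and white space may follow the end-tag of the document element.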

Layout of element hierarchies

Embedded elements can be placed on the same line:

<book><chapter><section>...</section></chapter></book>

For increased clarity, they can be placed on different lines instead:

<book>

<chapter>
<section>...</section>
<section>...</section>
</chapter>

<chapter>
<section>...</section>
<section>...</section>
</chapter>

</book>

For even greater clarity, it is common practice (at least in example documents) to indent embedded elements. In the following example, the start and end of each chapter is easy to see at a glance:

<book>
  <chapter>
    <section>...</section>
    <section>...</section>
  </chapter>
  <chapter>
    <section>...</section>
    <section>...</section>
  </chapter>
</book>

The following example shows a fictional XML application for handling quotations. In this example, it is easy to see that the publication and author details are both part of the citation:

<quotation>
  <quoteText>The surest way to make a monkey of a man
  is to quote him</quoteText>
  <citation>
    <publication>Quick Quotations</publication>
    <author>
      <name>Robert Benchley</name>
      <born>1889</born>
      <died>1945</died>
    </author>
  </citation>
</quotation>
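
Indentation is only a visual aid; the hierarchy itself is carried by the tags. As a rough illustration, the sketch below (assuming Python's standard xml.etree.ElementTree module, with outline as a hypothetical helper name) parses the quotation example and rebuilds an indented outline purely from the parsed structure:

```python
import xml.etree.ElementTree as ET

QUOTATION = """<quotation>
  <quoteText>The surest way to make a monkey of a man
  is to quote him</quoteText>
  <citation>
    <publication>Quick Quotations</publication>
    <author>
      <name>Robert Benchley</name>
      <born>1889</born>
      <died>1945</died>
    </author>
  </citation>
</quotation>"""


def outline(element, depth=0):
    """Return one indented line per element, in document order."""
    lines = ["  " * depth + element.tag]
    for child in element:
        # Each child element is listed one level deeper than its container.
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(ET.fromstring(QUOTATION))))
```

The same outline would be produced if the whole document were written on a single line.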

Mixed content

Sometimes, an element is permitted to contain both text and other elements. This is called mixed content, though in some instances the content will happen to be just elements or just text. In the following example, a paragraph contains both text and Name elements:

<para>The <name>XML</name> standard was
released today by the <name>W3C</name>.</para>

Line-ending codes are significant in text content. The example above shows the ideal, or safe, way to format the content of a mixed content element. An application that is formatting the text for display or print should treat the line-end code after the word 'was' as equivalent to a space character (there is much more information on line-ending and space significance in Chapter 8).
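
To make this concrete, here is a small sketch of how one parser exposes mixed content. It assumes Python's standard xml.etree.ElementTree module, which stores the text before the first child in the container's text attribute and the text following each child in that child's tail attribute; display_text is a hypothetical helper, and collapsing white space in this way is one possible formatting policy rather than a requirement of XML:

```python
import xml.etree.ElementTree as ET

PARA = ("<para>The <name>XML</name> standard was\n"
        "released today by the <name>W3C</name>.</para>")

para = ET.fromstring(PARA)
first_name = para[0]

# Text directly inside <para>, before the first <name> element.
print(repr(para.text))        # 'The '
# Text that follows the first <name> element; the line end is preserved.
print(repr(first_name.tail))  # ' standard was\nreleased today by the '


def display_text(element):
    """Join all text in the element, collapsing each run of white space
    (including line ends) to a single space."""
    return " ".join("".join(element.itertext()).split())


print(display_text(para))  # The XML standard was released today by the W3C.
```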

Element content

An element that does not directly contain text, but does contain other elements, is said to have element content. For example, it would not usually be reasonable for a Book element to directly contain text. Instead, it may contain Title, Preamble and Chapter elements.

Unless a DTD is in use, it is not possible to know for certain that an element has only element content. A human reader may make reasonable deductions from the name of the element, but software cannot reach such conclusions so easily. The absence of text between the elements in one document does not mean that text could never appear there. Whether this matters depends on a number of factors, mainly concerned with how an application might interpret line-end codes. It also has implications for advanced hypertext linking schemes.
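
The limitation can be seen in a simple check such as the one sketched below, which assumes Python's standard xml.etree.ElementTree module and uses a hypothetical helper name. It can only report what one particular element instance happens to contain; it says nothing about what the element type is permitted to contain:

```python
import xml.etree.ElementTree as ET


def looks_like_element_content(element):
    """True if this instance has child elements and no non-white-space
    text directly inside it. This is an observation about one instance,
    not a guarantee about the element type."""
    if len(element) == 0:
        return False
    # Text before the first child, plus the text following each child.
    chunks = [element.text] + [child.tail for child in element]
    return all(chunk is None or chunk.strip() == "" for chunk in chunks)


book = ET.fromstring("<book>\n  <title>T</title>\n  <chapter>...</chapter>\n</book>")
para = ET.fromstring("<para>The <name>XML</name> standard</para>")

print(looks_like_element_content(book))  # True
print(looks_like_element_content(para))  # False
```

Note that the line ends and indentation inside the Book element are still present as text; the check merely chooses to ignore them.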

Recursion

Some hierarchical structures may be recursive. This means that an element may directly or indirectly contain other instances of the same type. The term nested element is also used to describe an element that is embedded within another element of the same type. In a typical example, a list consists of a number of items, and one of the items contains a further complete sub-list. Some of the List and Item elements are therefore nested:



However, this leads to the possibility of unlimited nesting depth, which may cause problems for processing or publishing software. Once the DTD allows recursion at all, it cannot limit the depth of that recursion:

<book>
  <chapter>

    <list>
      <item>...</item>
      <item>

        <list>
          <item>...</item>
          <item>

            <list>
              <item>...</item>
              <item>...</item>
              <item>

                <list>
                  ...
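
Because the document structure itself cannot cap the depth, software that needs a limit has to measure the nesting for itself. The sketch below assumes Python's standard xml.etree.ElementTree module; max_nesting is a hypothetical helper, and the sample document is a shortened, completed variant of the truncated fragment above rather than a copy of it:

```python
import xml.etree.ElementTree as ET

NESTED = (
    "<book><chapter>"
    "<list><item>...</item><item>"
    "<list><item>...</item><item>"
    "<list><item>...</item></list>"
    "</item></list>"
    "</item></list>"
    "</chapter></book>"
)


def max_nesting(element, tag, depth=0):
    """Return the deepest level at which `tag` occurs inside itself
    (1 means it occurs but is never nested, 0 means it never occurs)."""
    if element.tag == tag:
        depth += 1
    deepest = depth
    for child in element:
        deepest = max(deepest, max_nesting(child, tag, depth))
    return deepest


root = ET.fromstring(NESTED)
print(max_nesting(root, "list"))     # 3
print(max_nesting(root, "chapter"))  # 1
```

An application could compare the result against whatever limit its formatting software can handle and report documents that exceed it.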

Contextual significance

A book such as this one contains many titles. Apart from the title of the book itself, each chapter, section and sub-section also has one. A different element type could be defined for each usage, with names such as BookTitle, ChapterTitle, SectionTitle and SubSectionTitle.

But this approach is both unwieldy and unnecessary. Document authors, in particular, should not have to learn so many element types (though readers familiar with stylesheets in DTP software and word-processors will be familiar with this requirement).

The presence of hierarchical and recursive structures allows the meaning of elements to be at least partially defined by their location in the document. For example, the content of a Title element may be processed or formatted differently, depending on whether the element occurs directly within a book, author, chapter, section, table or illustration:

<book>
  <author><title>Mr</title>...</author>
  <title>Book Title</title>
  <chapter>
    <title>Chapter Title</title>
    <section>
      <title>Section Title</title>
      ...
      <table><title>Table Title</title>...</table>
      ...
      <figure><title>Figure Title</title>...</figure>
    </section>
    ...
  </chapter>
  ...
</book>

This makes it possible, for instance, to target and extract a list of chapter titles to create a table of contents.
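
Selecting titles by context might look like the following sketch. It assumes Python's standard xml.etree.ElementTree module and its limited path syntax, and it uses a trimmed copy of the example above with the elided lines left out:

```python
import xml.etree.ElementTree as ET

BOOK = """<book>
  <author><title>Mr</title>...</author>
  <title>Book Title</title>
  <chapter>
    <title>Chapter Title</title>
    <section>
      <title>Section Title</title>
      <table><title>Table Title</title>...</table>
      <figure><title>Figure Title</title>...</figure>
    </section>
  </chapter>
</book>"""

book = ET.fromstring(BOOK)

# Only Title elements that sit directly inside a Chapter of the Book.
chapter_titles = [t.text for t in book.findall("chapter/title")]
# Only the Title element that sits directly inside the Book.
book_titles = [t.text for t in book.findall("title")]
# Every Title element, regardless of context, in document order.
all_titles = [t.text for t in book.iter("title")]

print(chapter_titles)  # ['Chapter Title']
print(book_titles)     # ['Book Title']
print(len(all_titles))  # 6
```

The element name is the same in every case; only the path from the document element distinguishes a chapter title from the author's title or a table title.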

Structure constraints

Hierarchical structures are strictly enforced. A document is not well-formed if the structure is broken for any reason. An element must be completely embedded within another element, or must be completely outside of that other element.

For example, a section may not straddle two chapters:



Those familiar with HTML tags may be aware that Web browsers would not object to the following fragment, where the bold and italic text ranges overlap:



This is illegal in XML documents. A document that contained this structure would not be considered well-formed. In this simple case, it is only necessary to rearrange the end-tags to make it well-formed:



However, the following example could not be rectified so easily:



Here, it is necessary to split the range of italic text into two separate elements. One of these elements must be inside the bold element, and the other outside of it:



These constraints may appear to be inconvenient and unnecessary, but are required to build a strict, hierarchical structure. Hierarchies are very useful structures. They give each element an unambiguous contextual location within the document. This is useful for finding, controlling and manipulating XML document fragments (as later chapters will show).

However, techniques have been developed to work around this constraint, involving pairs of empty elements (see Chapter 6).
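
The nesting rule described in this section is enforced by every conforming XML parser. As a minimal sketch, assuming Python's standard xml.etree.ElementTree module, the code below tests two fragments. Both fragments, the element names p, b and i, and the helper is_well_formed are hypothetical illustrations, not reproductions of the examples in the text:

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# The italic range starts inside the bold range but ends outside it.
overlapping = "<p>Normal <b>bold <i>bold italic</b> italic</i> normal</p>"

# The italic range is split in two: one part inside <b>, one part outside.
split_range = "<p>Normal <b>bold <i>bold italic</i></b><i> italic</i> normal</p>"

print(is_well_formed(overlapping))  # False
print(is_well_formed(split_range))  # True
```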

Terminology

It is often necessary to discuss a particular element in an XML document, and relate it to other, nearby elements. When describing the relationship between elements the terminology of the family tree is often adopted (an analogy that clearly fits a tree-like view of structures).

From the perspective of a specific Chapter element, for example, adjacent Chapter elements are siblings, like brothers or sisters, the Book element is its parent, and any contained sections are its children:



This concept can be further illustrated with an example XML document that happens to contain appropriate element names with respect to the element named 'target':

<parent>
  <sibling>...</sibling>
  <target>
    <child>...</child>
    <child>...</child>
    <child>...</child>
  </target>
  <sibling>...</sibling>
</parent>
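
These relationships can also be computed from a parsed document. The sketch below assumes Python's standard xml.etree.ElementTree module, whose elements do not record their own parent, so a lookup table (the hypothetical name parent_of) is built first:

```python
import xml.etree.ElementTree as ET

FAMILY = """<parent>
  <sibling>...</sibling>
  <target>
    <child>...</child>
    <child>...</child>
    <child>...</child>
  </target>
  <sibling>...</sibling>
</parent>"""

root = ET.fromstring(FAMILY)

# Map every element (except the document element) to its container.
parent_of = {child: parent for parent in root.iter() for child in parent}

target = root.find("target")
parent = parent_of[target]
# Siblings share the same parent; the target itself is excluded.
siblings = [e for e in parent if e is not target]
# Children are the elements directly contained by the target.
children = list(target)

print(parent.tag)                 # parent
print([s.tag for s in siblings])  # ['sibling', 'sibling']
print([c.tag for c in children])  # ['child', 'child', 'child']
```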

Taking this analogy further, all elements directly or indirectly enclosed by the Chapter element are descendants of that element (including its children), and the Book element can be described as its ancestor (as well as its parent). If the Book element were part of a collection of books in the same XML document, then the Collection element would also be an ancestor:



Again, an example with appropriate element names demonstrates this concept:

<ancestor>
  <ancestor>
    <target>
      <descendant>...</descendant>
      <descendant>
        <descendant>...</descendant>
        <descendant>...</descendant>
        <descendant>...</descendant>
      </descendant>
      <descendant>...</descendant>
    </target>
  </ancestor>
</ancestor>
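
Ancestors and descendants can be gathered in a similar way. This is a sketch only, again assuming Python's standard xml.etree.ElementTree module; the helper names ancestors and descendants are hypothetical:

```python
import xml.etree.ElementTree as ET

TREE = """<ancestor>
  <ancestor>
    <target>
      <descendant>...</descendant>
      <descendant>
        <descendant>...</descendant>
        <descendant>...</descendant>
        <descendant>...</descendant>
      </descendant>
      <descendant>...</descendant>
    </target>
  </ancestor>
</ancestor>"""

root = ET.fromstring(TREE)
# Map every element (except the document element) to its container.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestors(element):
    """Return the chain of enclosing elements, nearest first."""
    chain = []
    while element in parent_of:
        element = parent_of[element]
        chain.append(element)
    return chain


def descendants(element):
    """Return every element enclosed by `element`, directly or indirectly."""
    return [e for e in element.iter() if e is not element]


target = root.find(".//target")
print([a.tag for a in ancestors(target)])  # ['ancestor', 'ancestor']
print(len(descendants(target)))            # 6
```

The first entry returned by ancestors is the parent, which shows why the parent is also counted as an ancestor, just as the children are counted among the descendants.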

However, terminology based on the concept of the family tree has its limitations. First, the plural term 'parents' has no meaning, because XML elements can only have one parent. Also, an element that has no child elements is not 'childless', but is termed a 'leaf' element (just as the only element with no parent is called the 'root' element).
