Case study (this book)

Imagine that this book is part of a collection of 'companion' books that conform to a standard style and layout. Editors of existing books and authors of new books are to use XML-sensitive word processors to ensure consistency of structure and style. All new issues will then be paginated automatically. The initial analysis focuses on one book in the range: The XML Companion.

Note

This book was actually produced using FrameMaker+SGML, which (as the name implies) is compatible with SGML (the older brother of XML), and includes a semi-controlled authoring environment based on an SGML DTD. The following example is based on this DTD, but is simplified for XML compatibility.


Book structure

This book obviously follows general structure conventions, including division into three main segments: front-matter, body and back-matter:

<!ELEMENT book  (front, body, back)>
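As a rough sketch of how a document in the series might begin, assuming the declarations are kept in an external file (the name companion.dtd is hypothetical and is not taken from the original DTD):

```xml
<?xml version="1.0"?>
<!-- "companion.dtd" is a hypothetical file name used only for illustration -->
<!DOCTYPE book SYSTEM "companion.dtd">
<book>
  <front>...</front>
  <body>...</body>
  <back>...</back>
</book>
```

The three child elements must appear exactly once each and in this order, because the content model uses the sequence connector (the comma) with no occurrence indicators.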

Front-matter

The front-matter segment contains, amongst other items, the title and edition, author name and contact details, publisher name and address, and the date of publication:

<!ELEMENT front     (title, edition, author,
                     publisher)>

<!ELEMENT title     (#PCDATA)>

<!ELEMENT edition   (#PCDATA)>

<!ELEMENT author    (first, second, e-mail?)>

<!ELEMENT first     (#PCDATA)>

<!ELEMENT second    (#PCDATA)>

<!ELEMENT e-mail    (#PCDATA)>

<!ELEMENT publisher (pubName, address)>

<!ELEMENT pubName   (#PCDATA)>

<!ELEMENT address   (#PCDATA)>

  <front>
    <title>The XML Companion</title>
    <edition>Third Edition</edition>
    <author>
      <first>Neil</first>
      <second>Bradley</second>
      <e-mail>[email protected]</e-mail>
    </author>
    <publisher>
      <pubName>Pearson Education Limited</pubName>
      <address>Edinburgh Gate, Harlow, CM20 2JE,
      United Kingdom</address>
    </publisher>
  </front>

Body

At first sight, the body of the book appears to be a simple sequence of chapters. However, close study of the contents list reveals a higher level of structure. Although not named, this can be thought of as major book divisions. It may be decided that this layer is not mandatory in the series of books:

<!ELEMENT body     ((chapter*, division+) | chapter+)>
<!ELEMENT division (title, chapter+)>

  <body>
    <division>
      <title>The XML standard</title>
      ...
    </division>
    <division>
      <title>Extension standards</title>
      ...
    </division>
    ...
    ...
  </body>
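One point worth checking with the target parser: both alternatives in the Body model above can begin with a Chapter element, and XML asks for deterministic content models, so some validating parsers may report the declaration as ambiguous. A possible rewrite, offered as a sketch rather than as part of the original DTD, accepts the same three arrangements (chapters only, divisions only, or chapters followed by divisions):

```xml
<!-- Alternative sketch: each branch starts with a different element,
     so a parser never has to look ahead to choose between them -->
<!ELEMENT body     ((chapter+, division*) | division+)>
```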

Note that the Division element contains a Title element, which has already been defined and used to title the book. The purpose of a particular Title instance is dependent on its context, so it is easy to distinguish a book title, which appears in one style in the first pages, from a division title that appears in the contents. The Title element is also used in other contexts:

<!ELEMENT chapter  (title, (...))>

Sections

Most chapters contain headings, some larger than others. In fact, there are two levels of heading, and it may be tempting to define elements called Header-one and Header-two, which surround only the heading text. However, another way to look at this is to recognize that the headings are identifying a block of text. In this case, the whole block of text should be isolated, perhaps by an element named Section, and the heading text itself is then identified by an embedded Title element. The smaller headings identify sub-sections. The advantage of this approach is that it becomes possible to extract the sections and sub-sections for possible reuse in other publications. In addition, a hypertext link to a section may return the entire section to the browser:

<!ELEMENT chapter    (title, quote, ..., section*)>

<!ELEMENT section    (title, (..., subSection*))>

<!ELEMENT subSection (title, (...))>

Blocks

At the next level down in the book structure, there are miscellaneous 'block'-level structures. They are called blocks because they do not share horizontal space on the page with other elements. The most obvious block-level element is the Paragraph element. In addition, there are List, Table, Graphic and Markup Paragraph elements (and a PageBreak element for forcing page-breaks where appropriate). As all these block structures may be used in various places, it is appropriate to create an entity to hold the content model that groups them:

<!ENTITY % Blocks    "(para | list | markupPara |
                       graphic | table | pageBreak)*" >

Blocks can be used as introductory material in a chapter and section, and form the content of sub-sections:

<!ELEMENT chapter    (title, quote, %Blocks;, section*)>

<!ELEMENT section    (title, %Blocks;, subSection*)>

<!ELEMENT subSection (title, %Blocks;)>
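To illustrate how these three declarations nest, here is a small hypothetical instance. The text content is invented, and the Quote and Paragraph elements are assumed to hold plain text because their declarations are not shown above:

```xml
<chapter>
  <title>Chapter title</title>
  <quote>An opening quotation</quote>
  <!-- Blocks used as introductory material for the chapter -->
  <para>Introductory paragraph for the chapter.</para>
  <section>
    <title>Section title</title>
    <!-- Blocks used as introductory material for the section -->
    <para>Introductory paragraph for the section.</para>
    <subSection>
      <title>Sub-section title</title>
      <!-- Blocks form the whole content of a sub-section -->
      <para>Paragraph within the sub-section.</para>
    </subSection>
  </section>
</chapter>
```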

The Markup Paragraph is used to hold multi-line fragments of XML example data. The content appears in a monospaced font, and is indented (as in the fragment above this paragraph). So that the author can control line-break positioning in the example data, each line is enclosed by a Markup Line element:

<!ELEMENT markupPara  (markupLine*)>

<!ELEMENT markupLine  (#PCDATA | ... )*>

  <markupPara>
    <markupLine>Line one of markup</markupLine>
    <markupLine>Line two of markup</markupLine>
  </markupPara>

The List element contains further block-type elements, called Item, which contain the text of each item in the list:

<!ELEMENT list  (item+)>

<!ELEMENT item  (...)>

  <list>
    <item>Item One</item>
    <item>Item Two</item>
    <item>Item Three</item>
  </list>

The List element contains a Type attribute to specify whether it is a numbered or random (bulleted) list. It defaults to random, as this is the most common type used:

<!ATTLIST list       type   (number|random)  "random">
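As a brief illustration, and assuming the Type attribute is attached to the List element as the paragraph above describes, a numbered list overrides the default while a bulleted list can simply omit the attribute (the item text is invented):

```xml
<!-- Numbered list: the default value is overridden explicitly -->
<list type="number">
  <item>First step</item>
  <item>Second step</item>
</list>

<!-- Bulleted list: type is omitted, so the default "random" applies -->
<list>
  <item>An unordered point</item>
</list>
```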

For tables, the popular CALS model is used. This is to take advantage of the capabilities of some SGML-sensitive typesetting and DTP software. In future, this model may be replaced by the HTML table model.

The Graphic element is empty because it is a placeholder for an image. It contains an Identifier attribute, which holds an entity name. An entity declaration is required for each picture in the book:

<!ELEMENT graphic      EMPTY>
<!ATTLIST graphic      id     ID       #IMPLIED
                       ident  ENTITY   #REQUIRED>
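A sketch of what such an entity declaration might look like follows. The notation name, entity name and file path are hypothetical; the only requirement taken from the declaration above is that the Ident attribute must name an unparsed entity:

```xml
<!-- In the DTD or internal subset: a notation for the image format,
     then one unparsed entity per picture (names and path are illustrative) -->
<!NOTATION gif SYSTEM "image/gif">
<!ENTITY   figure1 SYSTEM "images/figure1.gif" NDATA gif>

<!-- In the document: the attribute value is the bare entity name,
     written without the & and ; delimiters -->
<graphic ident="figure1"/>
```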

In-line elements

There are various classes of in-line element, which may be used in varying combinations within the block-level elements. To help describe these classifications, three entities are defined:

<!ENTITY % SuperSub "sup | sub" >

<!ENTITY % Hilite  "markup | emphStrong | emphWeak" >

<!ENTITY % Inline  "(#PCDATA | %Hilite; |
                     %SuperSub; | xRef)*" >

The Superscript/Subscript entity refers to superscript and subscript text, such as 'H₂O'. The Hilite entity refers to the Markup, Emphasis Strong and Emphasis Weak elements, which are used to enclose example fragments within a paragraph. Typically, different fonts would be used to identify them in the text. In this book, the Markup element content is presented in a monospaced font, as in 'this is markup', the Emphasis Weak element content is presented in italic typeface, for 'important terms', and the Emphasis Strong element content is presented in bold typeface, for 'key terms'. The Inline entity includes the previous entities and adds the #PCDATA token and the Cross Reference element (XRef). Each element definition is carefully designed to avoid including itself:

<!ELEMENT markup     (#PCDATA | %SuperSub; |
                      emphStrong | emphWeak)*>

<!ELEMENT emphStrong (#PCDATA | markup | emphWeak |
                      %SuperSub; | xRef)*>

<!ELEMENT emphWeak   (#PCDATA | markup |
                      emphStrong | %SuperSub; | xRef)*>

<!ELEMENT sup        (#PCDATA)>

<!ELEMENT sub        (#PCDATA)>

<!ELEMENT xRef       (#PCDATA)>
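The fragment below sketches how these elements might combine in running text. It assumes that the Paragraph element takes the Inline entity as its content model, which is a guess because the Paragraph declaration is not shown; the sentences are invented:

```xml
<!-- Assumed declaration, not part of the original fragment -->
<!ELEMENT para %Inline;>

<!-- Example paragraph mixing text with in-line elements -->
<para>Water is written H<sub>2</sub>O, and an area is given in
m<sup>2</sup>. A <emphWeak>content model</emphWeak> is declared with
<markup>&lt;!ELEMENT ...&gt;</markup>, and every element
<emphStrong>must</emphStrong> have one.</para>
```

Because none of the in-line elements lists itself in its own content model, a construct such as an Emphasis Weak element directly inside another Emphasis Weak element would be rejected by a validating parser.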

Examples

The MarkupLine element can, in addition, contain a Presented element. This is used to show the published output, using a sans-serif font:

<!ELEMENT markupPara   (markupLine*)>

<!ELEMENT markupLine   (#PCDATA | ... | presented)*>

  <markupPara>
    <markupLine>XML fragment</markupLine>
    <markupLine>
      <presented>XML fragment</presented>
    </markupLine>
  </markupPara>

    XML fragment


    XML fragment

Some markup fragments may be quite large, in which case it is likely that a page-break would naturally appear somewhere within it. The Splitable attribute is used to specify whether or not the block can be split across pages. The default value of 'loose' means that a page-break may appear within the block. The alternative value of 'together' means that the lines must be kept together (even at the expense of leaving whitespace at the bottom of the page):

<!ATTLIST markupPara  splitable
                        (loose | together) "loose">
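For instance, a short fragment that would be confusing if divided across two pages could be marked as follows (the content is illustrative):

```xml
<!-- splitable="together" asks the formatter to keep all lines on one page -->
<markupPara splitable="together">
  <markupLine>&lt;list&gt;</markupLine>
  <markupLine>  &lt;item&gt;Item One&lt;/item&gt;</markupLine>
  <markupLine>&lt;/list&gt;</markupLine>
</markupPara>
```

Omitting the attribute has the same effect as writing splitable="loose", since that is the declared default.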

Back-matter

The back-matter consists of only the Glossary element. The Glossary element is a simplified version of a Chapter. There is no Title element, because the title 'Glossary' can be assumed, and can therefore be generated automatically:

<!ELEMENT back         (glossary)>

<!ELEMENT glossary     (para*, section*)>
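A minimal instance that satisfies these two declarations might look like the sketch below. How sections are used inside the glossary is not stated in the text, so grouping entries into one section per letter is purely an assumption, as is the wording of the entries:

```xml
<back>
  <glossary>
    <!-- No title element: the heading 'Glossary' is generated automatically -->
    <para>Introductory note on the terms that follow.</para>
    <section>
      <!-- One section per letter is an assumed convention -->
      <title>D</title>
      <para>DTD: Document Type Definition.</para>
    </section>
  </glossary>
</back>
```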

The index is generated automatically, so no data or tags are required.
