Markup minimization techniques

When SGML was defined, memory and disk space were vastly more expensive than they are today. With the limited memory available to computers in the early 1980s, a document could be too large, or too complex, for the parser to handle, so keeping markup to a minimum was seen as essential. There was also no expectation that WYSIWYG editing tools would be made available to authors. SGML document authors were expected to type every character in every tag, so techniques that reduced the effort of keying markup constructs were of prime importance. For these two reasons, the SGML specification includes a number of markup minimization techniques. Markup tags may be shortened or omitted without disrupting the document structure, but only where the context permits, where the DTD rules allow, and when the relevant optional SGML features are enabled on the local system.

HTML Note

Some of these minimization techniques found their way into HTML. In particular, some end-tags may be omitted, and some attribute names are not required (or indeed expected) as in '<ul compact>'.


Normalized SGML

Minimization in SGML was always an optional feature. When SGML documents are created without using any of the minimization techniques described below, they are considered to be fully normalized. With only a few very minor exceptions, fully normalized SGML looks identical to XML. The following fragment is a fully normalized structure identifying a company employee:

<!-- SGML -->
<employee>
   <name>J. Smith</name>
   <number>9876</number>
   <title>XML Developer</title>
</employee>

For the sake of the following examples, assume that all the elements shown above are required by the DTD, in the strict order given.

Omitted elements

Though it may seem a confusing contradiction, an element that is officially required to be present may in fact be omitted from the document, simply because its presence can be implied.

The DTD may contain switches within element declarations that specify whether the start-tag or the end-tag may be omitted from the document (provided that their presence can be implied by context). When this feature is enabled (by the SGML declaration), two single-character tokens appear immediately after the element name. Each token must be either '-' (required) or 'o' (may be omitted); the first applies to the start-tag and the second to the end-tag. In the following example, the end-tag of the Paragraph element may be omitted:

<!-- SGML -->
<!ELEMENT para - o (.....)>

<!-- SGML -->
<chapter>
<para>This is a paragraph.
<para>This is another paragraph.
</chapter>

Omit start-tag

In some cases, the start-tag may be omitted. This is an option where it is obvious by context that the element must have started. In this case, the presence of the Employee start-tag is enough to signify the start of its first required child element, Name, and the presence of the Name end-tag similarly signifies the start of its next sibling element, Number:

<!-- SGML -->
<employee>
   J. Smith</name>
   9876</number>
   XML Developer</title>
</employee>

Omit end-tag

In some cases, as shown below, the end-tag may be omitted. This is an option where it is obvious by context that the element has ended. In this case, the Number start-tag is enough to signify the end of the Name element, and the Employee end-tag similarly signifies the end of the embedded Title element:

<!-- SGML -->
<employee>
   <name>J. Smith
   <number>9876
   <title>XML Developer
</employee>
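
The inference a parser makes here can be illustrated with a minimal Python sketch. It is not an SGML parser: the function name `close_omitted_end_tags` and the `omissible` set are hypothetical, tags are assumed to carry no attributes, and every element listed in `omissible` is assumed to contain only text, so that any following tag implies its end. A real parser would consult the content models in the DTD instead.

```python
import re

# Matches a start-tag, an end-tag, or a run of character data.
# Attributes are deliberately not supported in this sketch.
TOKEN = re.compile(r"<(/?)([A-Za-z][\w.-]*)>|([^<]+)")

def close_omitted_end_tags(text, omissible):
    """Insert the end-tags that were left out for elements in `omissible`."""
    out, stack = [], []
    for slash, name, data in TOKEN.findall(text):
        if data:
            out.append(data)
            continue
        if slash:
            # An end-tag for an outer element closes any open text-only children.
            while stack and stack[-1] != name and stack[-1] in omissible:
                out.append(f"</{stack.pop()}>")
            if stack and stack[-1] == name:
                stack.pop()
            out.append(f"</{name}>")
        else:
            # A new start-tag implies the end of an open text-only element.
            while stack and stack[-1] in omissible:
                out.append(f"</{stack.pop()}>")
            stack.append(name)
            out.append(f"<{name}>")
    return "".join(out)

minimized = "<employee><name>J. Smith<number>9876<title>XML Developer</employee>"
print(close_omitted_end_tags(minimized, {"name", "number", "title"}))
```

Run on the single-line form of the fragment above, the sketch reproduces the fully normalized structure shown earlier.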

Empty end-tag

Alternatively, the end-tags may be present, but may omit the element names. These are known as empty end-tags:

<!-- SGML -->
<employee>
   <name>J. Smith</>
   <number>9876</>
   <title>XML Developer</>
</employee>
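
Resolving an empty end-tag only requires knowing which element was opened most recently. The following Python sketch shows this with a stack. The function name `expand_empty_end_tags` is hypothetical, and the sketch assumes tags without attributes and properly nested input.

```python
import re

# Matches '<name>', '</name>', '</>' and '<>' (no attributes in this sketch).
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)?>")

def expand_empty_end_tags(text):
    """Replace each '</>' with the end-tag of the most recently opened element."""
    stack = []

    def repl(match):
        slash, name = match.group(1), match.group(2)
        if not slash:
            if name is not None:
                stack.append(name)   # remember the open element
            return match.group(0)    # start-tags pass through unchanged
        if not stack:
            return match.group(0)    # nothing open: leave the tag as found
        opened = stack.pop()
        return f"</{name or opened}>"

    return TAG.sub(repl, text)

print(expand_empty_end_tags("<employee><name>J. Smith</><number>9876</></employee>"))
# <employee><name>J. Smith</name><number>9876</number></employee>
```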

Null end-tag

Another technique abbreviates both the start-tag and end-tag. The end-tag is a null end-tag, '/', and the start-tag is a net-enabling start-tag (net = null end-tag), which ends with the same character, '/':

<!-- SGML -->
<employee>
   <name/J. Smith/
   <number/9876/
   <title/XML Developer/
</employee>

This feature can be convenient when the element content is restricted to text and is likely to be brief:

<!-- SGML -->
Water is H<sub/2/O

   Water is H₂O
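
Because the null end-tag form is so regular, it can be expanded with a single substitution. This Python sketch assumes the element has no attributes and that its content contains neither '/' nor '<'. The function name `expand_net` is hypothetical.

```python
import re

# '<name/' opens the element, the next '/' closes it.
NET = re.compile(r"<([A-Za-z][\w.-]*)/([^/<]*)/")

def expand_net(text):
    """Rewrite '<name/content/' as '<name>content</name>'."""
    return NET.sub(r"<\1>\2</\1>", text)

print(expand_net("Water is H<sub/2/O"))
# Water is H<sub>2</sub>O
```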

Empty start-tag

The start-tag may be empty if it is the same as the previous start-tag. This is known as an empty start-tag:

<!-- SGML -->
<title>XML Developer</title><>Java Developer</title>

Here, the '<>' tag has an implied meaning of '<title>', as this is the previous element in the data stream.
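
Following the simplified rule given here (the empty start-tag repeats the previous start-tag), the expansion can be sketched in Python as below. The function name `expand_empty_start_tags` is hypothetical, attributes are not handled, and the full rules in the SGML standard are more detailed than this sketch.

```python
import re

# Matches '<name>' or '<>' but not end-tags.
START = re.compile(r"<([A-Za-z][\w.-]*)?>")

def expand_empty_start_tags(text):
    """Replace each '<>' with a copy of the previous named start-tag."""
    last = None

    def repl(match):
        nonlocal last
        name = match.group(1)
        if name is None:
            # Empty start-tag: reuse the previous name if there is one.
            return f"<{last}>" if last else match.group(0)
        last = name
        return match.group(0)

    return START.sub(repl, text)

print(expand_empty_start_tags("<title>XML Developer</title><>Java Developer</title>"))
# <title>XML Developer</title><title>Java Developer</title>
```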

Short references

Markup can be implied from the contextual use of normal text characters using short reference mappings. This can be considered the ultimate minimization technique, as it may involve the insertion of no extra characters. For example, the use of the quotation mark character to surround quoted text could be interpreted as markup, in which case the following fragments would be considered equivalent:

<!-- SGML -->
<p>Alice in Wonderland thought "What is the
use of a book without pictures or
conversations?".</p>

<!-- SGML -->
<p>Alice in Wonderland thought <quote>What
is the use of a book without pictures or
conversations?</quote>.</p>

The mapping of strings or individual characters to element tags can be made context-sensitive. Taking the example above, a quotation mark found within a Paragraph element is mapped to the start-tag '<quote>', whereas a quotation mark found within a (now opened) Quote element is mapped to the end-tag '</quote>'.
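
The context-sensitive behaviour can be imitated in Python with a flag that records whether a Quote element is currently open. This is only an illustration of the idea, not of SGML's short reference declarations, and the function name `map_quotes` is hypothetical.

```python
def map_quotes(text):
    """Map each quotation mark to <quote> or </quote> depending on context."""
    out = []
    in_quote = False  # True while a Quote element is open
    for ch in text:
        if ch == '"':
            out.append("</quote>" if in_quote else "<quote>")
            in_quote = not in_quote
        else:
            out.append(ch)
    return "".join(out)

print(map_quotes('<p>Alice thought "What is the use?".</p>'))
# <p>Alice thought <quote>What is the use?</quote>.</p>
```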

This technique is especially suitable for tabular material, where the line-end codes are mapped to Row elements and tab or comma characters are mapped to Entry elements:

<!-- SGML -->
Red [TAB] 1 [TAB] Danger [CR][LF]
Yellow [TAB] 2 [TAB] Alert [CR][LF]
Green [TAB] 3 [TAB] Normal [CR][LF]
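
The effect of such a mapping on the table above can be sketched in Python as follows. The tag names `row` and `entry` are assumed spellings of the Row and Entry elements, and the function name `table_to_markup` is hypothetical.

```python
def table_to_markup(text):
    """Turn tab-separated lines into Row elements containing Entry elements."""
    rows = []
    for line in text.splitlines():       # handles CR LF line ends
        if not line.strip():
            continue                     # skip blank lines
        entries = "".join(
            f"<entry>{cell.strip()}</entry>" for cell in line.split("\t")
        )
        rows.append(f"<row>{entries}</row>")
    return "\n".join(rows)

print(table_to_markup("Red\t1\tDanger\r\nYellow\t2\tAlert\r\nGreen\t3\tNormal\r\n"))
```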

Attribute minimization

When an attribute value is a single word or number, the quotes are not necessary. This is because the next space or closing chevron ('>') unambiguously ends the value (though this feature may be disabled in some systems):

<!-- SGML -->
<list offset=yes indent=15>

An attribute value may be further restricted to one word from a group of words, termed a name group (as in XML). In the following example, the Offset attribute value is restricted to a value of either 'yes' or 'no'. Any other value would be illegal (though this is not obvious from the example - the limitation is defined in and controlled by the DTD). In this situation, the attribute name and the value indicator ('=') may also be absent:

<!-- SGML -->
<list yes>
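
Both forms of attribute minimization can be undone mechanically once the name groups are known. In this Python sketch the `name_groups` table stands in for what a parser would read from the DTD, and the function name `normalize_start_tag` is hypothetical. Values containing spaces are not handled.

```python
def normalize_start_tag(tag, name_groups):
    """Restore attribute names and quotes in a minimized start-tag."""
    parts = tag.strip()[1:-1].split()
    element, tokens = parts[0], parts[1:]
    attrs = []
    for token in tokens:
        if "=" in token:
            # Unquoted (or quoted) value: keep the name, add quotes.
            name, value = token.split("=", 1)
            value = value.strip("\"'")
        else:
            # Bare value: look up which attribute owns it.
            value = token
            name = name_groups[element][token]
        attrs.append(f'{name}="{value}"')
    return "<" + " ".join([element] + attrs) + ">"

# Assumed DTD information: 'yes' and 'no' belong to the Offset attribute.
groups = {"list": {"yes": "offset", "no": "offset"}}
print(normalize_start_tag("<list yes>", groups))
# <list offset="yes">
print(normalize_start_tag("<list offset=yes indent=15>", groups))
# <list offset="yes" indent="15">
```

The lookup works only because a value may appear in one name group per element, which is why the attribute name can be implied from the value alone.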

Attribute inheritance

An attribute value may be inherited from the value of the previous occurrence of that attribute. This is a useful technique when its value is likely to switch only occasionally. The attribute declaration includes the #CURRENT keyword to indicate this behaviour. In the following example, the second and third paragraphs inherit the value 'English', whereas the fifth inherits 'French':

<!-- SGML -->
<!-- English paragraphs -->
<para lang="English"> ... English ... </para>
<para> ... English ... </para>
<para> ... English ... </para>

<!-- French paragraphs -->
<para lang="French"> ... Francais ... </para>
<para> ... Francais ... </para>
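
The way a #CURRENT value carries forward can be modelled in Python with a short loop. The sketch represents each Paragraph element only by its Lang value, with `None` meaning the attribute was not given. The function name `resolve_current` is hypothetical.

```python
def resolve_current(values):
    """Fill in omitted values with the most recently specified one."""
    resolved = []
    current = None  # no value has been seen yet
    for value in values:
        if value is not None:
            current = value      # an explicit value becomes the new default
        resolved.append(current)
    return resolved

# The five paragraphs from the example above.
print(resolve_current(["English", None, None, "French", None]))
# ['English', 'English', 'English', 'French', 'French']
```

Note that the sketch yields `None` if the first paragraph has no value, whereas SGML requires the first occurrence of a #CURRENT attribute to specify one.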
