Architectural forms

When an application is tuned to a specific DTD or XML Schema, it directly understands the significance of each element and attribute. For example, an HTML-sensitive Web browser recognizes and responds appropriately to each occurrence of the Image, Table and Form elements it encounters in a document. But when an application must perform specific tasks on data that conform to a variety of different models, the names of the significant elements and attributes are unlikely to be the same across all the models involved.

Harmonizing different models

Software that requires specific information from documents that conform to various models should expect to find the information in elements or attributes with different names.

For example, an indexing application may need to identify the author and title of each document in a collection drawn from many different sources, and place them in a database table. In one model the title may be tagged with an element called 'Title' and the author name with an element called 'Author', but in another model these elements may be named 'Tel' and 'Auto'. Foreign language models further widen the range of possibilities, with names such as 'Titel' and 'Verfasser' (the German equivalents of 'title' and 'author').
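
To make the problem concrete, the following Python sketch shows what an indexing application is forced to do when it has no architectural form to rely on: it must carry a hand-maintained lookup table of element names for every model it may encounter. The root element names ('book', 'buch'), the sample documents and the NAMES_BY_MODEL table are all hypothetical, invented here for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup table, keyed on the document element name:
# (name of the title element, name of the author element).
# Every new model means another entry to maintain by hand.
NAMES_BY_MODEL = {
    "book": ("Title", "Author"),
    "buch": ("Titel", "Verfasser"),
}

def index_entry(xml_text):
    """Return a (title, author) pair for one document."""
    root = ET.fromstring(xml_text)
    title_name, author_name = NAMES_BY_MODEL[root.tag]
    return (root.findtext(title_name), root.findtext(author_name))

# Two invented sample documents conforming to different models.
docs = [
    "<book><Title>Emma</Title><Author>Jane Austen</Author></book>",
    "<buch><Titel>Faust</Titel><Verfasser>Goethe</Verfasser></buch>",
]

table = [index_entry(d) for d in docs]
```

A document from a model that is missing from the table raises a KeyError, which is exactly the maintenance burden that the rest of this chapter sets out to remove.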



Avoiding the problem

Naturally, the ideal solution would be to avoid this problem entirely, and harmonize the models (the Namespaces standard, discussed in Chapter 10, provides a suitable approach to the problem). But this can be impractical to enforce when the models were produced by different authorities, or were developed primarily for other purposes.

Assuming that the problem cannot be avoided, the issue of how to tell the software what the significant element names are in each document type must be addressed. The solution is an architectural form. An architectural form is a mechanism that enables standard templates to be added, as an extra layer of meaning, to documents that conform to diverse models.

Standardized forms

It would be useful if all applications that perform identical functions could understand documents conforming to various models, without any further preparation by the user of a particular application. If all applications adopt the architectural form mechanism, and an independent group devises an appropriate architectural form model, this laudable goal can be achieved.

One obvious example is hypertext linking, where each browser must be made aware of the linking elements and attributes in order to provide active linking. The HyTime standard (ISO/IEC 10744) takes this approach, though XLink and XPointer (Chapter 27 and Chapter 28) now offer a better way forward.

Reserved attributes

Architectural forms work by storing the roles that each significant element or attribute plays in the DTD or XML Schema itself, so that no additional data files need to be maintained, and document authors are not affected.

Typically, an architectural form involves the use of significant, or 'reserved' attribute names. When a model is analysed, and some elements are found to contain these attributes, the application can match its capabilities to documents conforming to this model. For example, the application may have the capability to create a simple database table of names and works, as shown above, and assign significance to elements that contain attributes named 'IndexTitle' and 'IndexName'.

To keep the number of reserved attributes to a minimum, fixed attribute values may be used to distinguish roles. For example, a single attribute called 'WorkIndex' could be defined, with possible values of 'TITLE' and 'NAME', and a different fixed value declared for each element:

<!ATTLIST title       workIndex CDATA #FIXED "TITLE">
<!ATTLIST author      workIndex CDATA #FIXED "NAME">

The reason for using fixed attributes is to prevent document authors from changing the values. In fact, document authors can completely ignore these attributes.
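
As a rough illustration of how an application might exploit these declarations, the Python sketch below builds an index entry by looking only at the workIndex role, never at the element names. It assumes the declarations sit in the internal DTD subset of each document, where the expat parser in the Python standard library can read them and supply the fixed values. The German-named model and the sample content are hypothetical.

```python
import xml.parsers.expat

def index_entry(xml_text):
    """Return a dict keyed by workIndex role ('TITLE', 'NAME')."""
    entry = {}
    stack = []  # workIndex role (or None) of each currently open element

    def start(name, attrs):
        # expat reports attribute defaults declared in the internal DTD
        # subset, so the #FIXED value is visible here even though the
        # document instance never mentions the attribute.
        stack.append(attrs.get("workIndex"))

    def end(name):
        stack.pop()

    def chars(data):
        role = stack[-1] if stack else None
        if role:
            entry[role] = entry.get(role, "") + data

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return entry

# Two invented documents whose models use different element names
# but assign the same roles through the reserved attribute.
ENGLISH = """<!DOCTYPE book [
<!ATTLIST title  workIndex CDATA #FIXED "TITLE">
<!ATTLIST author workIndex CDATA #FIXED "NAME">
]>
<book><title>Emma</title><author>Jane Austen</author></book>"""

GERMAN = """<!DOCTYPE buch [
<!ATTLIST titel     workIndex CDATA #FIXED "TITLE">
<!ATTLIST verfasser workIndex CDATA #FIXED "NAME">
]>
<buch><titel>Faust</titel><verfasser>Goethe</verfasser></buch>"""

english_entry = index_entry(ENGLISH)
german_entry = index_entry(GERMAN)
```

Neither document instance contains a workIndex attribute, which is the point: authors can ignore it, and the application needs no per-model table of names.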

Unfortunately, it is possible for a reserved attribute, such as 'WorkIndex', to conflict accidentally with an attribute of the same name already residing in the DTD. In some cases, it is hoped, this can be avoided by assigning very specific names, such as 'HyTime' and 'SDARULE' (see below). A more secure workaround is to define a single required reserved attribute, which is used to override the default names for other reserved attributes.

ISO standard

A standard for the use of architectural forms, released by the ISO under the designation ISO/IEC 10744 (the HyTime standard mentioned above), is aimed at their use in SGML documents. Applications that are expected to process documents containing unknown architectural form markup need some indication that one or more forms are present, and which ones they are. A processing instruction of the following form is specified:

<?IS10744:arch name=MyForm ?>

<!ATTLIST book myForm NMTOKEN #FIXED "MyForm-Document">
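
An application could discover which forms a document announces by watching for this processing instruction. The Python sketch below is one possible approach, using the expat parser from the standard library; it takes the instruction exactly as printed above, and the way it splits the content into a name=value pair is an assumption made for illustration, not a rule taken from the standard.

```python
import xml.parsers.expat

def declared_forms(xml_text):
    """Return the form names announced by IS10744:arch instructions."""
    forms = []

    def pi(target, data):
        if target != "IS10744:arch":
            return
        # Assumed layout: whitespace-separated name=value tokens,
        # with optional quotes around the value.
        for token in data.split():
            key, _, value = token.partition("=")
            if key == "name":
                forms.append(value.strip("\"'"))

    parser = xml.parsers.expat.ParserCreate()
    parser.ProcessingInstructionHandler = pi
    parser.Parse(xml_text, True)
    return forms

# Invented sample document carrying the instruction shown above.
DOC = "<?IS10744:arch name=MyForm ?>\n<book/>"

forms = declared_forms(DOC)
```

Once the form name is known, the application can look for the attribute that carries it (here, myForm on the Book element) and ignore documents that announce forms it does not support.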

Case study (the ICADD initiative)

Some practical issues regarding the use of architectural forms can be covered through analysis of a real application: an attempt to help print-impaired readers by making documents accessible in large print, Braille or voice synthesis forms.

The ICADD (International Committee for Accessible Document Design) organization produced a suitable SGML DTD for use with software that can re-publish information in these forms.

Developed for use with SGML, and having played a fleeting role in some earlier versions of HTML, the ICADD DTD should now be viewed as an example, rather than a recommendation.

The software developed by ICADD to process documents relies upon a custom DTD, so that both generic and formatting tags can be unambiguously translated (in the same way that Web browsers have relied upon conformance to the HTML element set in order to present material). The ICADD DTD is relatively simple, and contains the following basic elements:

Anchor (mark spot on page)     Lhead (list heading)
Au (author)                    List
B (bold)                       Litem (list item)
Book (document element)        Note
Box (sidebar information)      Other (emphasize)
Fig (figure title)             Para
Fn (footnote)                  Pp (print page number)
H1–H6 (header levels)          Term (or keyword)
Ipp (Ink print page)           Ti (title of book)
It (italic)                    Xref (cross reference)
Lang (language)

In order to make a wide variety of information available via this means, it would normally be necessary to either impose the ICADD DTD on all contributors, or translate information between DTDs. But both options have drawbacks. The first is impossible to enforce, so is clearly impractical. The second is very costly because such translations can rarely be performed without human guidance.

For these reasons, the SDA (SGML Document Access) architectural form was developed. When SDA rules are embedded in a DTD, a special converter application can learn from them how to map document instances into the ICADD DTD format, without manual intervention.



A number of special attributes are defined. They are SdaRule, SdaForm, SdaBdy, SdaPart, SdaPref and SdaSuff. Each dictates a different action to be taken by the transformation software. For example, the SdaForm attribute is used to map an element to an ICADD DTD element. In the following example, the Title element is mapped to the Ti element:

   <title>The Title</title>

<!ATTLIST  title  SDAFORM CDATA #FIXED "ti">

   <ti>The Title</ti>

The SdaPref attribute is used to specify a prefix to be generated. In the following example, the original meaning of the Abstract element is kept intact by specifying that the paragraph is to be preceded by an appropriate header:

   <abstract>The Abstract</abstract>

<!ATTLIST  abstract
              SDAFORM CDATA #FIXED "para"
              SDAPREF CDATA #FIXED "<h1>Abstract</h1>">

   <h1>Abstract</h1>
   <para>The Abstract</para>
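
The two mappings above can be sketched as a very small converter. The Python code below is an illustration only, not the ICADD software: it handles just SDAFORM and SDAPREF, treats each SDAFORM value as a bare target element name, and ignores the other SDA attributes. It also assumes an XML document with the declarations in its internal DTD subset, so the prefix markup has to be written with &lt; (XML, unlike SGML, does not allow a literal '<' in an attribute value). The Doc document element and its mapping to Book are hypothetical.

```python
import xml.parsers.expat
from xml.sax.saxutils import escape

def to_icadd(xml_text):
    """Rename elements per SDAFORM and emit any SDAPREF prefix."""
    out = []
    stack = []  # output name of each currently open element

    def start(name, attrs):
        # The #FIXED values come from the DTD, not the instance.
        out.append(attrs.get("SDAPREF", ""))     # prefix markup, if any
        target = attrs.get("SDAFORM", name)      # unmapped names pass through
        stack.append(target)
        out.append(f"<{target}>")

    def end(name):
        out.append(f"</{stack.pop()}>")

    def chars(data):
        out.append(escape(data))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return "".join(out)

# Invented source document combining the two examples above.
DOC = """<!DOCTYPE doc [
<!ATTLIST doc      SDAFORM CDATA #FIXED "book">
<!ATTLIST title    SDAFORM CDATA #FIXED "ti">
<!ATTLIST abstract SDAFORM CDATA #FIXED "para"
                   SDAPREF CDATA #FIXED "&lt;h1>Abstract&lt;/h1>">
]>
<doc><title>The Title</title><abstract>The Abstract</abstract></doc>"""

converted = to_icadd(DOC)
```

Because the mapping rules travel with the DTD, the same converter works unchanged for any model that carries SDA attributes.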
