Appropriate names

There are few restrictions on the naming of elements, but some guidelines are worth considering. A number of software and markup language conventions are available to choose from.

Letter-case

A coherent and easily remembered policy on the use of upper-case and lower-case letters is essential, as element names are case-sensitive. The most obvious options are all lower-case ('myelement') and all upper-case ('MYELEMENT'). Lower-case letters are generally considered to be better, for two reasons. First, they are easier on the eye, whether viewing elements in XML documents or choosing options from an XML editor menu (and tests have shown that upper-case words take 30% longer to read). Second, they make documents that are compressed for transfer smaller: compression works by replacing repeated character sequences, and a lower-case name is more likely to match, character for character, the same word where it appears in the document text.

However, mixed-case ('MyElement') has the benefit of clearly distinguishing each part of a name that is derived from multiple words:

<CompanyPresident>J. Smith</CompanyPresident>
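Because names are case-sensitive, a start-tag and end-tag that differ only in letter-case do not match, and the document is not well-formed. The following is a minimal sketch of that check using the standard Java XML parser; the class name 'CaseSensitivityCheck' and the method 'isWellFormed' are illustrative names, not part of any standard.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class CaseSensitivityCheck {

    // Returns true if the given text parses as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Raised for well-formedness errors such as mismatched tag names.
            return false;
        } catch (Exception e) {
            throw new IllegalStateException("Parser could not be set up", e);
        }
    }

    public static void main(String[] args) {
        // Start-tag and end-tag use the same letter-case.
        System.out.println(
                isWellFormed("<CompanyPresident>J. Smith</CompanyPresident>")); // true
        // End-tag differs only in letter-case, so it does not match.
        System.out.println(
                isWellFormed("<CompanyPresident>J. Smith</companypresident>")); // false
    }
}
```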

Software conventions

When developing software to process XML data, one common approach is to use the same name for an element or attribute as the variable that holds its value while in memory.

For example, Java conventions for variable names include the use of lower-case for the first letter, and capitals for the start of each embedded word, such as 'theTag' (this convention is generally used throughout this book):

   String companyPresident = "J. Smith";

<companyPresident>J. Smith</companyPresident>
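As a rough sketch of how this pairing might look in practice, the following reads the element into a variable of the same name using the standard Java DOM parser. The class name 'SameNameExample', the helper 'readElementText' and the enclosing 'company' element are assumptions made purely for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class SameNameExample {

    // Returns the text content of the first element with the given name.
    // An unparsable document or a missing element is reported as an
    // IllegalStateException.
    static String readElementText(String xml, String elementName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(elementName).item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not read <" + elementName + ">", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical document; only 'companyPresident' comes from the text.
        String xml = "<company>"
                + "<companyPresident>J. Smith</companyPresident>"
                + "</company>";

        // The variable shares its name with the element it was read from.
        String companyPresident = readElementText(xml, "companyPresident");
        System.out.println(companyPresident); // prints J. Smith
    }
}
```

With matching names, a reader of the code can tell at a glance which part of the document each variable came from.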

HTML and XHTML conventions

Another common approach is to adopt HTML tag naming conventions. A number of recent standards have taken HTML as a starting-point, simply removing unwanted elements and adding new required ones. For example, both the WML (Wireless Markup Language) and OEB (Open eBook) standards adopt this practice. The first benefit is familiarity: learning the new standards is relatively simple, as people already know what element names such as 'P' (paragraph), 'UL' (unordered list) and 'TR' (table row) mean. The second benefit is that HTML is currently used as the core storage format for a huge range of information, so the ability to extract and copy HTML-based text into documents that conform to other standards, with a minimum of fuss, is clearly valuable.

The HTML standards are based on SGML rather than XML, and SGML is not (by default) case-sensitive. This means that 'p' and 'P' are both valid ways of identifying the HTML Paragraph element. To distinguish the elements that originate in HTML from the new domain-specific ones, the HTML elements could be made upper-case, and the others mixed- or lower-case:

<P>Company president:
<companyPresident>J. Smith</companyPresident>.</P>
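If such a convention is adopted, software can separate the two groups of elements by letter-case alone. The sketch below assumes, for illustration only, that any name containing no lower-case letters is HTML-derived; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class NameOrigin {

    // Assumed convention: a name with no lower-case letters is HTML-derived.
    static boolean isHtmlDerived(String elementName) {
        return elementName.equals(elementName.toUpperCase(Locale.ROOT));
    }

    // Returns only the names that are not HTML-derived, in their original order.
    static List<String> domainSpecificNames(List<String> elementNames) {
        List<String> result = new ArrayList<>();
        for (String name : elementNames) {
            if (!isHtmlDerived(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("P", "companyPresident", "UL", "PriceCode");
        System.out.println(domainSpecificNames(names)); // prints [companyPresident, PriceCode]
    }
}
```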

XHTML, by contrast, is an XML application, so its names are case-sensitive, and all of its element and attribute names are lower-case.

Length of names

The other factor to consider is the length of the name. Unfortunately, there are two conflicting aims to keep in mind. The desire to create unambiguous, self-describing structures suggests the need for longer names. Clearly, the name 'PriceCode' is more meaningful than 'PC', or even 'PriCd'. Set against this is the need to minimize document size, so as to increase the speed of transfer over networks.

One practical solution is to use short names for commonly used elements, and long names for infrequently used elements. This approach addresses both problems, as document authors will use the shorter named elements so frequently that memorizing their meaning is not an issue, and, at the same time, document size is not greatly affected by the increased length of a few, rarely used elements.
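The trade-off can be put into numbers. An element with no attributes costs one start-tag ('<name>', the name length plus 2 characters) and one end-tag ('</name>', the name length plus 3), so each occurrence adds twice the name length plus 5 characters of markup. The following sketch applies this to the three names above; the figure of 10,000 occurrences is an arbitrary assumption, and the counts are characters before any compression.

```java
public class TagOverhead {

    // Characters used by one start-tag/end-tag pair with no attributes:
    // "<name>" is length + 2 and "</name>" is length + 3.
    static long tagPairLength(String name) {
        return 2L * name.length() + 5;
    }

    // Total markup characters for the given number of occurrences.
    static long totalOverhead(String name, long occurrences) {
        return tagPairLength(name) * occurrences;
    }

    public static void main(String[] args) {
        long occurrences = 10_000; // hypothetical count, for illustration only
        for (String name : new String[] {"PriceCode", "PriCd", "PC"}) {
            System.out.println(name + ": "
                    + totalOverhead(name, occurrences) + " characters");
        }
        // prints:
        // PriceCode: 230000 characters
        // PriCd: 150000 characters
        // PC: 90000 characters
    }
}
```

The same long name used only ten times would add just 230 characters, which is why reserving long names for rarely used elements costs so little.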

Naming of attributes should follow the same considerations, with a reasonable balance between clarity and brevity, perhaps also taking into account the likely number of occurrences of both the element and of the attribute itself.

Note that HTML tends to follow these rules, with 'P' standing for 'paragraph', a commonly used element, and the longer but less common 'FRAMESET' representing an entire document containing frames.

Consistency

Consistency is particularly important, regardless of which convention is chosen. If an underscore character is used to separate words within one element name, this character should be used for the same purpose in all compound element names. Confidence in a model is easily undermined if there is little or no consistency.
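Consistency of this kind is easy to check mechanically. As a minimal sketch, assuming the chosen convention is the lower-case-first, mixed-case style described earlier, the following reports any names that break it. The pattern, class name and sample names are illustrative assumptions; a different convention would need a different pattern.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class NamingConsistency {

    // Lower-case first letter, then letters and digits, with a capital
    // letter at the start of each embedded word. No separator characters.
    static final Pattern LOWER_CAMEL =
            Pattern.compile("[a-z][a-z0-9]*([A-Z][a-z0-9]*)*");

    // Returns the names that do not follow the convention, in input order.
    static List<String> findInconsistentNames(List<String> names) {
        List<String> offenders = new ArrayList<>();
        for (String name : names) {
            if (!LOWER_CAMEL.matcher(name).matches()) {
                offenders.add(name);
            }
        }
        return offenders;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "companyPresident", "priceCode", "company_address", "OrderDate");
        System.out.println(findInconsistentNames(names)); // prints [company_address, OrderDate]
    }
}
```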
