Create Well-Formed XML with JavaScript

Use JavaScript to ensure that you write correct, well-formed XML in web pages.

Sometimes you need to create some XML from within a browser. It is easy to write bad XML without realizing it. Writing correct XML with all its bells and whistles is not easy, but in this type of scenario you usually only need to write basic XML.

XML output can be ranked in a rough hierarchy of difficulty:

  1. Basic: Elements only; no attributes, entities, character references, escaped characters, or encoding issues

  2. Plain: Basic plus attributes

  3. Plain/escaped: Plain with special XML characters escaped

  4. Plain/advanced: Plain/escaped with CDATA sections and processing instructions

The list continues with increasing levels of sophistication (and difficulty).

This hack covers the basic and plain styles (with some enhancements), and you can adapt the techniques to move several more steps up the ladder if you like.

The main issues in writing basic XML are closing the elements properly and keeping the code simple. Here is how.

The Element Function

Here is a JavaScript function for writing elements:

// Bare bones XML writer - no attributes
function element(name,content){
    var xml
    if (!content){
        xml='<' + name + '/>'
    }
    else {
        xml='<'+ name + '>' + content + '</' + name + '>'
    }
    return xml
}

This basic hack even writes the empty-element form when there is no element content. What is especially nice about this hack is that you can nest calls to it, like this:

var xml = element('p', 'This is ' + 
    element('strong','Bold Text') + 'inline')

Both inner and outer elements are guaranteed to be closed properly. You can display the result for testing like this:

alert(xml)

You can build up your entire XML document by combining bits like these, and all the elements will be properly nested and closed.
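As a sketch of that kind of composition, the snippet below assembles a small document entirely from nested element() calls. The element names (order, customer, item, note) are made up for illustration; the function is the bare-bones writer shown above.

```javascript
// Bare-bones writer from above: empty-element form when there is no content.
function element(name, content) {
  if (!content) {
    return '<' + name + '/>';
  }
  return '<' + name + '>' + content + '</' + name + '>';
}

// Hypothetical document: every tag comes from element(), so each start tag
// is matched by its end tag and the nesting follows the nesting of the calls.
var doc = element('order',
  element('customer', 'Jane Doe') +
  element('item', 'Widget') +
  element('item', 'Gadget') +
  element('note')
);

console.log(doc);
// <order><customer>Jane Doe</customer><item>Widget</item><item>Gadget</item><note/></order>
```

Forgetting to close an element here would mean leaving a parenthesis unbalanced, which JavaScript rejects as a syntax error.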

The element() function does not do any pretty-printing, because it has no way to know where line breaks should go. If that is important to you, just create a variant function:

function elementNL(name, content) {
    return element(name, content) + '\n'
}

More sophisticated variations are possible but rarely needed.
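As an illustration of elementNL(), the sketch below puts each child element on its own line. It repeats the bare-bones element() so it runs on its own; starting the parent's content with a '\n' is just one possible convention, not something the functions require.

```javascript
// Bare-bones writer from above.
function element(name, content) {
  if (!content) {
    return '<' + name + '/>';
  }
  return '<' + name + '>' + content + '</' + name + '>';
}

// Variant that appends a line break after the element.
function elementNL(name, content) {
  return element(name, content) + '\n';
}

// Begin the parent's content with a newline so each child starts a line.
var list = element('list', '\n' +
  elementNL('item', 'one') +
  elementNL('item', 'two'));

console.log(list);
// <list>
// <item>one</item>
// <item>two</item>
// </list>
```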

Adding Attributes

At the next level up, the most pressing problems are to format the attribute string properly, to escape single and double quotes embedded in the attribute values, and to do the least amount of quote escaping so that the result will be as readable as possible.

We modify the element() function to optionally accept an associative array containing the attribute names and values. In other languages, an associative array may be called a dictionary or a hash.

// XML writer with attributes and smart attribute quote escaping 
function element(name,content,attributes){
    var att_str = ''
    if (attributes) { // tests false if this arg is missing!
        att_str = formatAttributes(attributes)
    }
    var xml
    if (!content){
        xml='<' + name + att_str + '/>'
    }
    else {
        xml='<' + name + att_str + '>' + content + '</'+name+'>'
    }
    return xml
}

The function formatAttributes() handles formatting and escaping the attributes.

To fix up the quotes, we use the following algorithm if there are embedded quotes (single or double):

  1. Whichever type of quote occurs first in the string, use the other kind to enclose the attribute value.

  2. Only escape occurrences of the kind of quote used to enclose the attribute value. We don’t need to escape the other kind.
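The two rules can be restated as a small standalone function. In this sketch chooseQuote() and quoteValue() are hypothetical names, and the escaping uses split() and join() rather than the RegExp that formatAttributes() builds; as in formatAttributes(), a value with no quotes at all gets single quotes.

```javascript
// Hypothetical helper: pick the quote character that encloses the value.
function chooseQuote(value) {
  var apos = value.indexOf("'");
  var quot = value.indexOf('"');
  if (apos === -1) return "'";        // no apostrophe: single quotes never need escaping
  if (quot === -1) return '"';        // apostrophes only: double quotes never need escaping
  return quot < apos ? "'" : '"';     // both present: the first one seen decides
}

// Hypothetical helper: escape only the enclosing quote, then wrap the value.
function quoteValue(value) {
  var q = chooseQuote(value);
  var entity = q === "'" ? '&apos;' : '&quot;';
  return q + value.split(q).join(entity) + q;
}

console.log(quoteValue('say "hi"'));  // 'say "hi"'
console.log(quoteValue("it's"));      // "it's"
```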

Here is the code:

var APOS = "'", QUOTE = '"'
var ESCAPED_QUOTE = {  }
ESCAPED_QUOTE[QUOTE] = '&quot;'
ESCAPED_QUOTE[APOS] = '&apos;'
   
/*
   Format a dictionary of attributes into a string suitable
   for inserting into the start tag of an element.  Be smart
   about escaping embedded quotes in the attribute values.
*/
function formatAttributes(attributes) {
    var att_value
    var apos_pos, quot_pos
    var use_quote, escape, quote_to_escape
    var att_str
    var re
    var result = ''
   
    for (var att in attributes) {
        att_value = attributes[att]
        
        // Find first quote marks if any
        apos_pos = att_value.indexOf(APOS)
        quot_pos = att_value.indexOf(QUOTE)
       
        // Determine which quote type to use around 
        // the attribute value
        if (apos_pos == -1 && quot_pos == -1) {
            att_str = ' ' + att + "='" + att_value +  "'"
            result += att_str
            continue
        }
        
        // Prefer the single quote unless forced to use double:
        // use it when a double quote comes first or there is no apostrophe
        if (quot_pos != -1 && (apos_pos == -1 || quot_pos < apos_pos)) {
            use_quote = APOS
        }
        else {
            use_quote = QUOTE
        }
   
        // Figure out which kind of quote to escape
        // Use nice dictionary instead of yucky if-else nests
        escape = ESCAPED_QUOTE[use_quote]
        
        // Escape only the right kind of quote
        re = new RegExp(use_quote,'g')
        att_str = ' ' + att + '=' + use_quote + 
            att_value.replace(re, escape) + use_quote
        result += att_str
    }
    return result
}

Here is code to test everything we’ve seen so far:

function test() {   
    var atts = {att1:"a1", 
        att2:'This is in "double quotes" and this is ' +
         "in 'single quotes'",
        att3:"This is in 'single quotes' and this is in " +
         '"double quotes"'}
    
    // Basic XML example
    alert(element('elem','This is a test'))
   
    // Nested elements
    var xml = element('p', 'This is ' + 
    element('strong','Bold Text') + 'inline')
    alert(xml)
   
    // Attributes with all kinds of embedded quotes
    alert(element('elem','This is a test', atts))
   
    // Empty element version
    alert(element('elem','', atts))    
}

Open the file jswriter.html (Example 7-18) in a browser that supports JavaScript (the script is also stored in jswriter.js so you can easily include it in any HTML or XHTML document).

Example 7-18. jswriter.html

<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Testing the Well-formed XML Hack</title>
<script type='text/javascript'>
// XML writer with attributes and smart attribute quote escaping 
function element(name,content,attributes){
    var att_str = ''
    if (attributes) { // tests false if this arg is missing!
        att_str = formatAttributes(attributes)
    }
    var xml
    if (!content){
        xml='<' + name + att_str + '/>'
    }
    else {
        xml='<' + name + att_str + '>' + content + '</'+name+'>'
    }
    return xml
}
var APOS = "'", QUOTE = '"'
var ESCAPED_QUOTE = {  }
ESCAPED_QUOTE[QUOTE] = '&quot;'
ESCAPED_QUOTE[APOS] = '&apos;'
   
/*
   Format a dictionary of attributes into a string suitable
   for inserting into the start tag of an element.  Be smart
   about escaping embedded quotes in the attribute values.
*/
function formatAttributes(attributes) {
    var att_value
    var apos_pos, quot_pos
    var use_quote, escape, quote_to_escape
    var att_str
    var re
    var result = ''
   
    for (var att in attributes) {
        att_value = attributes[att]
        
        // Find first quote marks if any
        apos_pos = att_value.indexOf(APOS)
        quot_pos = att_value.indexOf(QUOTE)
       
        // Determine which quote type to use around 
        // the attribute value
        if (apos_pos == -1 && quot_pos == -1) {
            att_str = ' ' + att + "='" + att_value +  "'"
            result += att_str
            continue
        }
        
        // Prefer the single quote unless forced to use double:
        // use it when a double quote comes first or there is no apostrophe
        if (quot_pos != -1 && (apos_pos == -1 || quot_pos < apos_pos)) {
            use_quote = APOS
        }
        else {
            use_quote = QUOTE
        }
   
        // Figure out which kind of quote to escape
        // Use nice dictionary instead of yucky if-else nests
        escape = ESCAPED_QUOTE[use_quote]
        
        // Escape only the right kind of quote
        re = new RegExp(use_quote,'g')
        att_str = ' ' + att + '=' + use_quote + 
            att_value.replace(re, escape) + use_quote
        result += att_str
    }
    return result
}
function test() {   
    var atts = {att1:"a1", 
        att2:'This is in "double quotes" and this is ' +
         "in 'single quotes'",
        att3:"This is in 'single quotes' and this is in " +
         '"double quotes"'}
    
    // Basic XML example
    alert(element('elem','This is a test'))
   
    // Nested elements
    var xml = element('p', 'This is ' + 
    element('strong','Bold Text') + 'inline')
    alert(xml)
   
    // Attributes with all kinds of embedded quotes
    alert(element('elem','This is a test', atts))
   
    // Empty element version
    alert(element('elem','', atts))    
}   
</script>
</head>
   
<body onload='test()'>
</body>
</html>

When the page loads, you will see the following in four successive alert boxes, as shown in Figure 7-1. The lines have been wrapped for readability.

First alert:

<elem>This is a test</elem>

Second alert:

<p>This is <strong>Bold Text</strong>inline</p>

Third alert:

<elem att1='a1'
att2='This is in "double quotes" and this is
in &apos;single quotes&apos;'
att3="This is in 'single quotes' and this is in
&quot;double quotes&quot;">This is a test</elem>

Fourth alert:

<elem att1='a1'
att2='This is in "double quotes" and this is in
&apos;single quotes&apos;'
att3="This is in 'single quotes' and this is in
&quot;double quotes&quot;"/>


Figure 7-1. jswriter.html in Firefox

Extending the Hack

You may want to escape the other special XML characters, notably & and <. You can do this by adding calls such as:

content = content.replace(/</g, '&lt;')

Take care not to replace the quotes in attribute values, since formatAttributes() already handles them. Because element() works with plain strings and formatAttributes() with a dictionary of string values, both are easy to adapt as you like.
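One way to place such calls is sketched below, using a hypothetical helper named escapeText(). It escapes raw text before that text is handed to element(), not inside element() itself, because element()'s content argument may already hold markup produced by nested calls, and escaping there would turn that markup into literal text. The & must be replaced first, or the & in an already-produced &lt; would be escaped a second time. Escaping > is optional in most places but harmless. The same helper could be applied to attribute values before they reach formatAttributes(), since it leaves quotes alone and the &apos; or &quot; entities are added afterward.

```javascript
// Hypothetical helper: escape character data before it is wrapped.
// Replace '&' first so that the entities added below are not escaped again.
function escapeText(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Bare-bones writer from above.
function element(name, content) {
  if (!content) {
    return '<' + name + '/>';
  }
  return '<' + name + '>' + content + '</' + name + '>';
}

// Escape only the raw text at the leaves; the nested <em> markup stays intact.
var xml = element('p',
  escapeText('Fish & chips < 5 ') + element('em', escapeText('R&D')));

console.log(xml); // <p>Fish &amp; chips &lt; 5 <em>R&amp;D</em></p>
```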

Creating Large Chunks of XML

If you create long strings of XML, say with more than a few hundred string fragments, you may find performance to be slow. That’s normal: JavaScript strings are immutable, so, as in most other languages, every concatenation may have to allocate memory for a new string.

The standard way around this is to accumulate the fragments in a list, then join the list back to a string at the end. This process is generally very fast, even for very large results.

Here is how you can do it:

var results = [  ]
results.push(element("p","This is some content"))
results.push(element('p', 'This is ' + 
    element('strong','Bold Text') + 'inline'))
// ... Append more bits
   
var end_result = results.join(' ')
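The same pattern fits naturally inside a loop. In this sketch buildRows() and the rows/row element names are hypothetical; the array gathers one fragment per value, join() concatenates them once at the end, and an outer element() call wraps the result.

```javascript
// Bare-bones writer from above.
function element(name, content) {
  if (!content) {
    return '<' + name + '/>';
  }
  return '<' + name + '>' + content + '</' + name + '>';
}

// Collect one fragment per value, then concatenate once with join().
function buildRows(values) {
  var parts = [];
  for (var i = 0; i < values.length; i++) {
    parts.push(element('row', values[i]));
  }
  // join('') adds no separator; join(' ') as in the text works as well.
  return element('rows', parts.join(''));
}

var table = buildRows(['a', 'b', 'c']);
console.log(table); // <rows><row>a</row><row>b</row><row>c</row></rows>
```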

See Also

  • JavaScript: The Definitive Guide, by David Flanagan (O’Reilly)

Tom Passin
