22
XML in JavaScript

WHAT'S IN THIS CHAPTER?

  • Examining XML DOM support in browsers
  • Understanding XPath in JavaScript
  • Using XSLT processors

At one point in time, XML was the standard for structured data storage and transmission over the Internet. The evolution of XML closely mirrored the evolution of web technologies, as the DOM was developed for use not just in web browsers but also in desktop and server applications for dealing with XML data structures. Many developers started writing their own XML parsers in JavaScript to deal with the lack of built-in solutions. Since that time, all browsers have introduced native support for XML, the XML DOM, and many related technologies.

XML DOM SUPPORT IN BROWSERS

Because browser vendors began implementing XML solutions before formal standards were created, each offers not only different levels of support but also different implementations. DOM Level 2 was the first specification to introduce the concept of dynamic XML DOM creation. This capability was expanded in DOM Level 3 to include parsing and serialization. By the time DOM Level 3 was finalized, however, most browsers had implemented their own solutions.

DOM Level 2 Core

As mentioned in Chapter 12, DOM Level 2 introduced the createDocument() method of document.implementation. You may recall that it's possible to create a blank XML document using the following syntax:

let xmldom = document.implementation.createDocument(namespaceUri, root, doctype); 

When dealing with XML in JavaScript, the root argument is typically the only one that is used because this defines the tag name of the XML DOM's document element. The namespaceUri argument is used sparingly because namespaces are difficult to manage from JavaScript. The doctype argument is rarely, if ever, used.

To create a new XML document with document element of <root>, you can use the following code:

let xmldom = document.implementation.createDocument("", "root", null);
      
console.log(xmldom.documentElement.tagName); // "root"
      
let child = xmldom.createElement("child");
xmldom.documentElement.appendChild(child);

This example creates an XML DOM document with no default namespace and no doctype. Note that even though a namespace and doctype aren't needed, the arguments must still be passed in. An empty string is passed as the namespace URI so that no namespace is applied, and null is passed as the doctype. The xmldom variable contains an instance of the DOM Level 2 Document type, complete with all of the DOM methods and properties discussed in Chapter 12. In this example, the document element's tag name is displayed and then a new child element is created and added.

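You can check to see if DOM Level 2 XML support is enabled in a browser by using the following line of code. Note that hasFeature() is deprecated, and current browsers return true regardless of the arguments, so the check is meaningful only in legacy environments: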

let hasXmlDom = document.implementation.hasFeature("XML", "2.0");

In practice, it is rare to create an XML document from scratch and then build it up systematically using DOM methods. It is much more likely that an XML document needs to be parsed into a DOM structure, or vice versa. Because DOM Level 2 didn't provide for such functionality, a couple of de facto standards emerged.

The DOMParser Type

Firefox introduced the DOMParser type specifically for parsing XML into a DOM document, and it was later adopted by all other browser vendors. To use it, you must first create an instance of DOMParser and then call the parseFromString() method. This method accepts two arguments: the XML string to parse and a content type, which should always be "text/xml". The return value is an instance of Document. Consider the following example:

let parser = new DOMParser();
let xmldom = parser.parseFromString("<root><child/></root>", "text/xml");
      
console.log(xmldom.documentElement.tagName); // "root"
console.log(xmldom.documentElement.firstChild.tagName); // "child"
      
let anotherChild = xmldom.createElement("child");
xmldom.documentElement.appendChild(anotherChild);
      
let children = xmldom.getElementsByTagName("child");
console.log(children.length);  // 2

In this example, a simple XML string is parsed into a DOM document. The DOM structure has <root> as the document element with a single <child> element as its child. You can then interact with the returned document using DOM methods.

When used with the "text/xml" content type, the DOMParser can parse only well-formed XML; it does not fall back to lenient HTML parsing. Unfortunately, browsers behave differently when a parsing error occurs. When a parsing error occurs in Firefox, Opera, Safari, and Chrome, a Document object is still returned from parseFromString(), but it contains a <parsererror> element whose content is a description of the parsing error. Here is an example:

<parsererror xmlns="http://www.mozilla.org/newlayout/xml/parsererror.xml">XML
Parsing Error: no element found Location: file:///I:/My%20Writing/My%20Books/
Professional%20JavaScript/Second%20Edition/Examples/Ch15/DOMParserExample2.js Line
Number 1, Column 7:<sourcetext><root> ------^</sourcetext></parsererror>

Firefox and Opera both return documents in this format. Safari and Chrome return a document that has a <parsererror> element embedded at the point where the parsing error occurred. Early Internet Explorer versions throw a parsing error at the point where parseFromString() is called. Because of these differences, the best way to determine if a parsing error has occurred is to use a try-catch block, and if there's no error, look for a <parsererror> element anywhere in the document via getElementsByTagName(), as shown here:

let parser = new DOMParser(),
 xmldom, 
 errors;
try {
 xmldom = parser.parseFromString("<root>", "text/xml");
 errors = xmldom.getElementsByTagName("parsererror");
 if (errors.length > 0) {
  throw new Error("Parsing error!");
 }
} catch (ex) {
 console.log("Parsing error!");
}

In this example, the string to be parsed is missing a closing </root> tag, which causes a parse error. In early Internet Explorer versions, this throws an error that the catch block handles. In Firefox and Opera, the <parsererror> element will be the document element, whereas it's the first child of <root> in Chrome and Safari. The call to getElementsByTagName("parsererror") covers both cases. If any elements are returned by this method call, then an error has occurred and a message is logged to the console. You could go one step further and extract the error information from the element as well.
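Extracting the error information could look like the following minimal sketch. The parseXml() function and its ParserType parameter are hypothetical names chosen for illustration; the parameter exists only so that a different parser implementation can be supplied, and it defaults to the browser's DOMParser.

```javascript
// Sketch: parse an XML string and surface the parser's own error text.
// parseXml and ParserType are hypothetical names, not part of any browser API.
function parseXml(xmlString, ParserType = globalThis.DOMParser) {
  const parser = new ParserType();
  let xmldom;

  try {
    xmldom = parser.parseFromString(xmlString, "text/xml");
  } catch (ex) {
    // Some older browsers throw instead of returning a document.
    throw new Error(`XML parsing error: ${ex.message}`);
  }

  // Finds <parsererror> whether it is the document element or nested deeper.
  const errors = xmldom.getElementsByTagName("parsererror");
  if (errors.length > 0) {
    throw new Error(`XML parsing error: ${errors[0].textContent.trim()}`);
  }

  return xmldom;
}
```

Callers can then wrap parseXml() in a single try-catch block and read the description from the message property of the caught error, regardless of which browser produced it.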

The XMLSerializer Type

As a companion to DOMParser, Firefox also introduced the XMLSerializer type to provide the reverse functionality: serializing a DOM document into an XML string. Since that time, the XMLSerializer has been adopted by all major browser vendors.

To serialize a DOM document, you must create a new instance of XMLSerializer and then pass the document into the serializeToString() method, as in this example:

let serializer = new XMLSerializer();
let xml = serializer.serializeToString(xmldom);
console.log(xml);

The value returned from serializeToString() is a string that is not pretty-printed, so it may be difficult to read with the naked eye.

The XMLSerializer is capable of serializing any valid DOM object, which includes individual nodes and HTML documents. When an HTML document is passed into serializeToString(), it is treated as an XML document, and so the resulting code is well-formed.
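Because individual nodes can be serialized, it is possible to serialize part of a document rather than all of it. The following is a minimal sketch; serializeChildren() and its SerializerType parameter are hypothetical names used only for illustration, with the parameter defaulting to the browser's XMLSerializer.

```javascript
// Sketch: serialize each child of a node separately.
// serializeChildren and SerializerType are hypothetical names, not part of any API.
function serializeChildren(parent, SerializerType = globalThis.XMLSerializer) {
  const serializer = new SerializerType();

  // serializeToString() accepts any node, not only a complete document.
  return Array.from(parent.childNodes, (node) => serializer.serializeToString(node));
}
```

In a browser, calling serializeChildren(xmldom.documentElement) returns an array containing one XML string per child node of the document element.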

XPATH SUPPORT IN BROWSERS

XPath was created as a way to locate specific nodes within a DOM document, so it's important to XML processing. An API for XPath wasn't part of a specification until DOM Level 3, which introduced the DOM Level 3 XPath recommendation. Many browsers chose to implement this specification, but Internet Explorer decided to implement support in its own way.

DOM Level 3 XPath

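The DOM Level 3 XPath specification defines interfaces to use for evaluating XPath expressions in the DOM. To determine if the browser supports DOM Level 3 XPath, use the following JavaScript code (keeping in mind that hasFeature() is deprecated and returns true in current browsers regardless of the arguments):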

let supportsXPath = document.implementation.hasFeature("XPath", "3.0");

Although there are several types defined in the specification, the two most important ones are XPathEvaluator and XPathResult. The XPathEvaluator is used to evaluate XPath expressions within a specific context. This type has the following three methods:

  • createExpression(expression, nsresolver)—Compiles the XPath expression and accompanying namespace information into an XPathExpression, which is a compiled version of the query. This is useful if the same query is going to be run multiple times.
  • createNSResolver(node)—Creates a new XPathNSResolver object based on the namespace information of node. An XPathNSResolver object is required when evaluating against an XML document that uses namespaces.
  • evaluate(expression, context, nsresolver, type, result)—Evaluates an XPath expression in the given context and with specific namespace information. The additional arguments indicate how the result should be returned.

The Document type is typically implemented with the XPathEvaluator interface. So you can either create a new instance of XPathEvaluator or use the methods located on the Document instance (for both XML and HTML documents).
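When the same query runs many times, createExpression() can be used to compile it once. The following is a minimal sketch under a few assumptions: compileXPath() is a hypothetical helper name, the evaluator argument may be either a Document or an XPathEvaluator instance, and the numeric constant is written out (it is the value of XPathResult.ORDERED_NODE_ITERATOR_TYPE) so that the function does not depend on a global XPathResult object.

```javascript
// Same value as XPathResult.ORDERED_NODE_ITERATOR_TYPE.
const ORDERED_NODE_ITERATOR_TYPE = 5;

// Sketch: compile an XPath query once, then run it against any context node.
// compileXPath is a hypothetical helper name, not part of the XPath API.
function compileXPath(evaluator, expression, nsresolver = null) {
  const compiled = evaluator.createExpression(expression, nsresolver);

  return function run(contextNode) {
    // XPathExpression.evaluate() takes the context node, result type, and
    // an optional XPathResult to reuse.
    const result = compiled.evaluate(contextNode, ORDERED_NODE_ITERATOR_TYPE, null);
    const nodes = [];
    let node = result.iterateNext();
    while (node) {
      nodes.push(node);
      node = result.iterateNext();
    }
    return nodes;
  };
}
```

In a browser, this might be used as let findNames = compileXPath(xmldom, "employee/name"); followed by repeated calls such as findNames(xmldom.documentElement), each of which returns an array of matching nodes.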

Of the three methods, evaluate() is the most frequently used. This method takes five arguments: the XPath expression, a context node, a namespace resolver, the type of result to return, and an XPathResult object to fill with the result (usually null, because the result is also returned as the function value). The third argument, the namespace resolver, is necessary only when the XML code uses an XML namespace. If namespaces aren't used, this should be set to null. The fourth argument, the type of result to return, is one of the following 10 constant values:

  • XPathResult.ANY_TYPE—Returns the type of data appropriate for the XPath expression.
  • XPathResult.NUMBER_TYPE—Returns a number value.
  • XPathResult.STRING_TYPE—Returns a string value.
  • XPathResult.BOOLEAN_TYPE—Returns a Boolean value.
  • XPathResult.UNORDERED_NODE_ITERATOR_TYPE—Returns a node set of matching nodes, although the order may not match the order of the nodes within the document.
  • XPathResult.ORDERED_NODE_ITERATOR_TYPE—Returns a node set of matching nodes in the order in which they appear in the document. This is the most commonly used result type.
  • XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE—Returns a node set snapshot, capturing the nodes outside of the document so that any further document modification doesn't affect the node set. The nodes in the node set are not necessarily in the same order as they appear in the document.
  • XPathResult.ORDERED_NODE_SNAPSHOT_TYPE—Returns a node set snapshot, capturing the nodes outside of the document so that any further document modification doesn't affect the result set. The nodes in the result set are in the same order as they appear in the document.
  • XPathResult.ANY_UNORDERED_NODE_TYPE—Returns a single matching node, which is not necessarily the first matching node in document order.
  • XPathResult.FIRST_ORDERED_NODE_TYPE—Returns a single node, which is the first matching node in the document.

The type of result you specify determines how to retrieve the value of the result. Here's a typical example:

let result = xmldom.evaluate("employee/name", xmldom.documentElement, null, 
         XPathResult.ORDERED_NODE_ITERATOR_TYPE, null);
      
if (result !== null) {
 let element = result.iterateNext();
 while(element) {
  console.log(element.tagName);
  element = result.iterateNext();
 }
}

This example uses the XPathResult.ORDERED_NODE_ITERATOR_TYPE result, which is the most commonly used result type. In current browsers, evaluate() returns an XPathResult object even when no nodes match the XPath expression, so the null check serves only as a defensive guard. The XPathResult has properties and methods for retrieving results of specific types. If the result is a node iterator, whether it be ordered or unordered, the iterateNext() method must be used to retrieve each matching node in the result. When there are no further matching nodes, or none matched at all, iterateNext() returns null.

If you specify a snapshot result type (either ordered or unordered), you must use the snapshotItem() method and snapshotLength property, as in the following example:

let result = xmldom.evaluate("employee/name", xmldom.documentElement, null, 
         XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
if (result !== null) {
 for (let i = 0, len=result.snapshotLength; i < len; i++) {
  console.log(result.snapshotItem(i).tagName);
 }
}

In this example, snapshotLength returns the number of nodes in the snapshot, and snapshotItem() returns the node in a given position in the snapshot (similar to length and item() in a NodeList).
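A snapshot is the safer choice when the loop itself changes the document, because the snapshot is unaffected by later modifications, whereas an iterator result becomes invalid once the document is mutated. The following minimal sketch removes every matching node; removeMatches() is a hypothetical helper name, and the numeric constant is written out (it is the value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE) so the function does not depend on a global XPathResult object.

```javascript
// Same value as XPathResult.ORDERED_NODE_SNAPSHOT_TYPE.
const ORDERED_NODE_SNAPSHOT_TYPE = 7;

// Sketch: remove every node matching an XPath expression.
// removeMatches is a hypothetical helper name, not part of the XPath API.
function removeMatches(doc, expression, contextNode = doc.documentElement) {
  const result = doc.evaluate(expression, contextNode, null,
    ORDERED_NODE_SNAPSHOT_TYPE, null);

  for (let i = 0; i < result.snapshotLength; i++) {
    const node = result.snapshotItem(i);
    // The snapshot stays intact even though the document changes here.
    if (node.parentNode) {
      node.parentNode.removeChild(node);
    }
  }

  // Report how many nodes the expression matched.
  return result.snapshotLength;
}
```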

Single Node Results

The XPathResult.FIRST_ORDERED_NODE_TYPE result returns the first matching node, which is accessible through the singleNodeValue property of the result. For example:

let result = xmldom.evaluate("employee/name", xmldom.documentElement, null, 
         XPathResult.FIRST_ORDERED_NODE_TYPE, null);
      
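if (result !== null && result.singleNodeValue !== null) {
 console.log(result.singleNodeValue.tagName);
}

When no node matches, the singleNodeValue property is null, so check it before reading properties from the node. If a node is returned, it is accessed using the singleNodeValue property. This is the same for XPathResult.ANY_UNORDERED_NODE_TYPE, except that the node returned is not necessarily the first match in document order.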

Simple Type Results

It's possible to retrieve simple, nonnode data types from XPath as well, using the XPathResult types of Boolean, number, and string. These result types return a single value using the booleanValue, numberValue, and stringValue properties, respectively. For the Boolean type, the evaluation typically returns true if at least one node matches the XPath expression and returns false otherwise. Consider the following:

let result = xmldom.evaluate("employee/name", xmldom.documentElement, null, 
         XPathResult.BOOLEAN_TYPE, null);
console.log(result.booleanValue);

In this example, if any nodes match "employee/name", the booleanValue property is equal to true.

For the number type, the XPath expression must use an XPath function that returns a number, such as count(), which counts all the nodes that match a given pattern. Here's an example:

let result = xmldom.evaluate("count(employee/name)", xmldom.documentElement, 
         null, XPathResult.NUMBER_TYPE, null);
console.log(result.numberValue);

This code outputs the number of nodes that match "employee/name" (which is 2). If you try using this method without one of the special XPath functions, numberValue is equal to NaN.

For the string type, the evaluate() method finds the first node matching the XPath expression and returns its string value. For an element, this is the concatenated text of its descendant text nodes, so an element containing a single text node yields that text. If no node matches, the result is an empty string. Here is an example:

let result = xmldom.evaluate("employee/name", xmldom.documentElement, null, 
         XPathResult.STRING_TYPE, null);
console.log(result.stringValue);

In this example, the code outputs the text contained in the first element matching "employee/name".

Default Type Results

All XPath expressions automatically map to a specific result type. Setting the specific result type limits the output of the expression. You can, however, use the XPathResult.ANY_TYPE constant to allow the automatic result type to be returned. Typically, the result type ends up as a Boolean value, a number value, a string value, or an unordered node iterator. To determine which result type has been returned, use the resultType property on the evaluation result, as shown in this example:

let result = xmldom.evaluate("employee/name", xmldom.documentElement, null, 
         XPathResult.ANY_TYPE, null);
      
if (result !== null) {
 switch(result.resultType) {
  case XPathResult.STRING_TYPE:
   // handle string type
   break;
      
  case XPathResult.NUMBER_TYPE:
   // handle number type
   break;
      
  case XPathResult.BOOLEAN_TYPE:
   // handle boolean type
   break;
      
  case XPathResult.UNORDERED_NODE_ITERATOR_TYPE:
   // handle unordered node iterator type
   break;
      
  default:
   // handle other possible result types
      
 }
}

Using the XPathResult.ANY_TYPE constant allows more natural use of XPath but may also require extra processing code after the result is returned.
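The extra processing can be gathered into one function that converts any XPathResult into a plain JavaScript value. The following is a minimal sketch; unwrapResult() and the RESULT_TYPES object are hypothetical names, and the numeric values are those assigned to the corresponding XPathResult constants, written out so the function does not depend on a global XPathResult object.

```javascript
// Numeric values of the corresponding XPathResult constants.
const RESULT_TYPES = {
  NUMBER_TYPE: 1,
  STRING_TYPE: 2,
  BOOLEAN_TYPE: 3,
  UNORDERED_NODE_ITERATOR_TYPE: 4,
  ORDERED_NODE_ITERATOR_TYPE: 5
};

// Sketch: turn an XPathResult into a number, string, Boolean, or array of nodes.
// unwrapResult is a hypothetical helper name, not part of the XPath API.
function unwrapResult(result) {
  switch (result.resultType) {
    case RESULT_TYPES.NUMBER_TYPE:
      return result.numberValue;

    case RESULT_TYPES.STRING_TYPE:
      return result.stringValue;

    case RESULT_TYPES.BOOLEAN_TYPE:
      return result.booleanValue;

    case RESULT_TYPES.UNORDERED_NODE_ITERATOR_TYPE:
    case RESULT_TYPES.ORDERED_NODE_ITERATOR_TYPE: {
      const nodes = [];
      let node = result.iterateNext();
      while (node) {
        nodes.push(node);
        node = result.iterateNext();
      }
      return nodes;
    }

    default:
      throw new TypeError(`Unhandled XPath result type: ${result.resultType}`);
  }
}
```

Snapshot and single-node results are deliberately left to the default branch here, because XPathResult.ANY_TYPE does not normally produce them.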

Namespace Support

For XML documents that make use of namespaces, the XPathEvaluator must be informed of the namespace information in order to make a proper evaluation. There are a number of ways to accomplish this. Consider the following XML code:

<?xml version="1.0" ?>
<wrox:books xmlns:wrox="http://www.wrox.com/">
 <wrox:book>
  <wrox:title>Professional JavaScript for Web Developers</wrox:title>
  <wrox:author>Nicholas C. Zakas</wrox:author>
 </wrox:book>
 <wrox:book>
  <wrox:title>Professional Ajax</wrox:title>
  <wrox:author>Nicholas C. Zakas</wrox:author>
  <wrox:author>Jeremy McPeak</wrox:author>
  <wrox:author>Joe Fawcett</wrox:author>
 </wrox:book>
</wrox:books>

In this XML document, all elements are part of the http://www.wrox.com/ namespace, identified by the wrox prefix. If you want to use XPath with this document, you need to define the namespaces being used; otherwise the evaluation will fail.

The first way to handle namespaces is to create an XPathNSResolver object via the createNSResolver() method. This method accepts a single argument, which is a node in the document that contains the namespace definition. In the previous example, this node is the document element <wrox:books>, which has the xmlns:wrox attribute defining the namespace. This node can be passed into createNSResolver(), and the result can then be used in evaluate() as follows:

let nsresolver = xmldom.createNSResolver(xmldom.documentElement);
      
let result = xmldom.evaluate("wrox:book/wrox:author", 
               xmldom.documentElement, nsresolver,
               XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
      
console.log(result.snapshotLength);

When the nsresolver object is passed into evaluate(), it ensures that the wrox prefix used in the XPath expression will be understood appropriately. Attempting to use this same expression without using an XPathNSResolver will result in an error.

The second way to deal with namespaces is by defining a function that accepts a namespace prefix and returns the associated URI, as in this example:

let nsresolver = function(prefix) {
 switch(prefix) {
  case "wrox": return "http://www.wrox.com/";
  // others here
 }
};
      
let result = xmldom.evaluate("count(wrox:book/wrox:author)", 
        xmldom.documentElement, nsresolver, XPathResult.NUMBER_TYPE, null);
      
console.log(result.numberValue);

Defining a namespace-resolving function is helpful when you're not sure which node of a document contains the namespace definitions. As long as you know the prefixes and URIs, you can define a function to return this information and pass it in as the third argument to evaluate().
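When several prefixes are involved, the resolver function can be generated from a lookup table instead of a switch statement. The following is a minimal sketch; createPrefixResolver() is a hypothetical helper name. Unlike the switch version, it returns null rather than undefined for an unknown prefix.

```javascript
// Sketch: build a namespace-resolving function from a prefix-to-URI table.
// createPrefixResolver is a hypothetical helper name, not part of the XPath API.
function createPrefixResolver(namespaces) {
  // A Map avoids accidental hits on inherited object properties
  // such as "toString".
  const lookup = new Map(Object.entries(namespaces));

  return function nsresolver(prefix) {
    return lookup.has(prefix) ? lookup.get(prefix) : null;
  };
}

const nsresolver = createPrefixResolver({
  wrox: "http://www.wrox.com/"
});

console.log(nsresolver("wrox")); // "http://www.wrox.com/"
console.log(nsresolver("other")); // null
```

The returned function can be passed as the third argument to evaluate() in the same way as the handwritten resolver.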

XSLT SUPPORT IN BROWSERS

XSLT is a companion technology to XML that makes use of XPath to transform one document representation into another. Unlike XML and XPath, XSLT has no formal API associated with it and is not represented in the formal DOM at all. This left browser vendors to implement support in their own way. The first browser to add XSLT processing in JavaScript was Internet Explorer.

The XSLTProcessor Type

Mozilla implemented JavaScript support for XSLT in Firefox by creating a new type. The XSLTProcessor type allows developers to transform XML documents by using XSLT in a manner similar to the XSL processor in Internet Explorer. Since it was first implemented, all major browsers have copied the implementation, making XSLTProcessor into a de facto standard for JavaScript-enabled XSLT transformations.

As with the Internet Explorer implementation, the first step is to load two DOM documents, one with the XML and the other with the XSLT. After that, create a new XSLTProcessor and use the importStylesheet() method to assign the XSLT to it, as shown in this example:

let processor = new XSLTProcessor();
processor.importStylesheet(xsltdom);

The last step is to perform the transformation. This can be done in two different ways. If you want to return a complete DOM document as the result, call transformToDocument(). You can also get a document fragment object as the result by calling transformToFragment(). Generally speaking, the only reason to use transformToFragment() is if you intend to add the results to another DOM document.

When using transformToDocument(), just pass in the XML DOM and use the result as another completely different DOM. Here's an example:

let result = processor.transformToDocument(xmldom);
console.log(new XMLSerializer().serializeToString(result));

The transformToFragment() method accepts two arguments: the XML DOM to transform and the document that should own the resulting fragment. This ensures that the new document fragment is valid in the destination document. You can, therefore, create the fragment and add it to the page by passing in document as the second argument. Consider the following example:

let fragment = processor.transformToFragment(xmldom, document);
let div = document.getElementById("divResult");
div.appendChild(fragment);

Here, the processor creates a fragment owned by the document object. This enables the fragment to be added to a <div> element that exists in the page.

When the output format for an XSLT style sheet is either "xml" or "html", creating a document or document fragment makes perfect sense. When the output format is "text", however, you typically just want the text result of the transformation. Unfortunately, there is no method that returns text directly. Calling transformToDocument() when the output is "text" results in a full XML document being returned, but the contents of that document are different from browser to browser. Safari, for example, returns an entire HTML document, whereas Opera and Firefox return a one-element document with the output as the element's text.

The solution is to call transformToFragment(), which returns a document fragment that has a single child node containing the result text. You can, therefore, get the text by using the following code:

let fragment = processor.transformToFragment(xmldom, document);
let text = fragment.firstChild.nodeValue;
console.log(text);

This code works the same way for each of the supporting browsers and correctly returns just the text output from the transformation.
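The steps for a text transformation can be wrapped in one function. The following is a minimal sketch; transformToText() and its ProcessorType parameter are hypothetical names, with the parameter defaulting to the browser's XSLTProcessor. The guard for a missing fragment or missing child node is a defensive assumption rather than documented behavior.

```javascript
// Sketch: run an XSLT transformation whose output method is "text"
// and return the resulting string.
// transformToText and ProcessorType are hypothetical names, not part of any API.
function transformToText(xmldom, xsltdom, ownerDocument,
  ProcessorType = globalThis.XSLTProcessor) {
  const processor = new ProcessorType();
  processor.importStylesheet(xsltdom);

  const fragment = processor.transformToFragment(xmldom, ownerDocument);

  // Defensive guard: treat a missing fragment or empty output as "".
  if (!fragment || !fragment.firstChild) {
    return "";
  }
  return fragment.firstChild.nodeValue;
}
```

In a browser, the call would typically be transformToText(xmldom, xsltdom, document), where xmldom and xsltdom are the two DOM documents loaded earlier.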

Using Parameters

The XSLTProcessor also allows you to set XSLT parameters using the setParameter() method, which accepts three arguments: a namespace URI, the parameter local name, and the value to set. Typically, the namespace URI is null, and the local name is simply the parameter's name. This method must be called prior to transformToDocument() or transformToFragment(). Here's an example:

let processor = new XSLTProcessor();
processor.importStylesheet(xsltdom);
processor.setParameter(null, "message", "Hello World!");
let result = processor.transformToDocument(xmldom);

Two other methods are related to parameters, getParameter() and removeParameter(); they are used to get the current value of a parameter and remove the parameter value, respectively. Each method takes the namespace URI (once again, typically null) and the local name of the parameter. For example:

let processor = new XSLTProcessor();
processor.importStylesheet(xsltdom);
processor.setParameter(null, "message", "Hello World!");
      
console.log(processor.getParameter(null, "message")); // outputs "Hello World!"
processor.removeParameter(null, "message");
      
let result = processor.transformToDocument(xmldom);

These methods aren't used often and are provided mostly for convenience.
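When a style sheet takes several parameters, they can be applied from a plain object in one pass. The following is a minimal sketch; setParameters() is a hypothetical helper name, and it assumes that all parameters share the same namespace URI (null by default).

```javascript
// Sketch: apply several XSLT parameters to a processor at once.
// setParameters is a hypothetical helper name, not part of XSLTProcessor.
function setParameters(processor, params, namespaceUri = null) {
  for (const [name, value] of Object.entries(params)) {
    // Must run before transformToDocument() or transformToFragment().
    processor.setParameter(namespaceUri, name, value);
  }
  return processor;
}
```

For example, setParameters(processor, { message: "Hello World!", author: "Nicholas" }) makes one setParameter() call per property before the transformation is performed.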

Resetting the Processor

Each XSLTProcessor instance can be reused multiple times for multiple transformations with different XSLT style sheets. The reset() method removes all parameters and style sheets from the processor, allowing you to once again call importStylesheet() to load a different XSLT style sheet, as in this example:

let processor = new XSLTProcessor();
processor.importStylesheet(xsltdom);
      
// do some transformations
      
processor.reset();
processor.importStylesheet(xsltdom2);
      
// do more transformations

Reusing a single XSLTProcessor saves memory when using multiple style sheets to perform transformations.
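The reuse pattern can be expressed as a loop over several style sheets. The following is a minimal sketch; transformWithEach() is a hypothetical helper name, and the processor argument is expected to be an XSLTProcessor instance created by the caller.

```javascript
// Sketch: transform one XML document with several style sheets,
// reusing a single processor.
// transformWithEach is a hypothetical helper name, not part of XSLTProcessor.
function transformWithEach(xmldom, stylesheets, processor) {
  return stylesheets.map((xsltdom) => {
    // Clear any previous style sheet and parameters before loading the next.
    processor.reset();
    processor.importStylesheet(xsltdom);
    return processor.transformToDocument(xmldom);
  });
}
```

In a browser, transformWithEach(xmldom, [xsltdom, xsltdom2], new XSLTProcessor()) returns an array with one result document per style sheet, in the same order as the input array.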

SUMMARY

There is a great deal of support for XML and related technologies in JavaScript. Unfortunately, because of an early lack of specifications, there are several different implementations for common functionality. DOM Level 2 provides an API for creating empty XML documents but not for parsing or serialization. Browsers implemented two new types to deal with XML parsing and serialization as follows:

  • The DOMParser type is a simple object that parses an XML string into a DOM document.
  • The XMLSerializer type performs the opposite operation, serializing a DOM document into an XML string.

DOM Level 3 introduced a specification for an XPath API that has been implemented by all major browsers. The API enables JavaScript to run any XPath query against a DOM document and retrieve the result regardless of its data type.

The last related technology is XSLT, which has no public specification defining an API for its usage. Firefox created the XSLTProcessor type to handle transformations via JavaScript.
