10. Using XML

Now that the hype has faded, the Extensible Markup Language (XML) is used almost everywhere. One application that receives a lot of buzz is Web Services, a technology that is covered in detail in Chapter 9, “Working with Other Databases.” However, XML can be used elsewhere as well; it is a good format for storing any kind of structured data.

The tasks behind using XML are always the same: reading data from XML and writing data into it. So, this chapter focuses on these tasks and shows how to implement them.

Unfortunately, PHP 4’s XML support was somewhat limited, and some extensions did not prove to be very stable. This changed drastically with PHP 5 and its revamped XML support. Therefore, we completely omit PHP 4 in this chapter (and in the rest of this book). PHP 5.1 and 5.2 added some new features that are also covered in this chapter.

The sample XML file used throughout this chapter reuses the quotes database example from the preceding chapter. As the following listing shows, <quotes> is the root element, and each quote (including its author and the year the phrase was coined) is contained in a <quote> element.

<?xml version="1.0" encoding="ISO-8859-1" ?>
<quotes>
  <quote year="1991">
    <phrase>Hasta la vista, baby!</phrase>
    <author>Arnold Schwarzenegger</author>
  </quote>
</quotes>

The Sample XML File (quotes.xml; excerpt)

Parsing XML with SAX

$sax = xml_parser_create();

$sax = xml_parser_create();
xml_parser_set_option($sax, XML_OPTION_CASE_FOLDING, false);
xml_parser_set_option($sax, XML_OPTION_SKIP_WHITE, true);
xml_set_element_handler($sax, 'sax_start', 'sax_end');
xml_set_character_data_handler($sax, 'sax_cdata');
xml_parse($sax, file_get_contents('quotes.xml'), true);
xml_parser_free($sax);

Parsing XML with SAX (sax.php; excerpt)

Simple API for XML (SAX) is an approach for parsing XML documents, but not for validating them.

You create a SAX parser using xml_parser_create(), optionally providing the encoding as an argument. This parser can look at an XML file and react upon various events. The following three events are the most important ones:

• Beginning of an element

• End of an element

• CDATA blocks

You can then define handler functions for these events and use them to transform the XML into something else, for instance Hypertext Markup Language (HTML). The code in this section does this and outputs the contents of the XML file as a bulleted HTML list, as shown in Figure 10.1. The function xml_set_element_handler() sets the handlers for the beginning and end of an element, whereas xml_set_character_data_handler() sets the handler for CDATA blocks. With xml_parser_set_option(), you can configure the parser, for instance, to ignore whitespace and to treat tag names as case sensitive (so that tag names are not converted into uppercase letters automatically). The following listing contains the handler functions:

function sax_start($sax, $tag, $attr) {
  if ($tag == 'quotes') {
    echo '<ul>';
  } elseif ($tag == 'quote') {
    echo '<li>' . htmlspecialchars($attr['year']) . ': ';
  } elseif ($tag == 'phrase') {
    echo '"';
  } elseif ($tag == 'author') {
    echo ' (';
  }
}
function sax_end($sax, $tag) {
  if ($tag == 'quotes') {
    echo '</ul>';
  } elseif ($tag == 'quote') {
    echo '</li>';
  } elseif ($tag == 'phrase') {
    echo '"';
  } elseif ($tag == 'author') {
    echo ') ';
  }
}
function sax_cdata($sax, $data) {
  echo htmlspecialchars($data);
}

Figure 10.1. HTML created from XML
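
The same three events can also be used to collect data instead of printing it right away. The following is a minimal sketch of that idea, not the book's sax.php: the inline $xmlString, the $quotes array, and the closure handlers are assumptions made for illustration. It also checks the return value of xml_parse() so that malformed input does not go unnoticed.

```php
<?php
// Hypothetical inline sample that mirrors the structure of quotes.xml.
$xmlString = '<quotes><quote year="1991">'
  . '<phrase>Hasta la vista, baby!</phrase>'
  . '<author>Arnold Schwarzenegger</author>'
  . '</quote></quotes>';

$quotes  = [];    // collected result, one entry per <quote>
$current = null;  // name of the element currently being read

$sax = xml_parser_create();
// Keep tag names as written instead of converting them to uppercase.
xml_parser_set_option($sax, XML_OPTION_CASE_FOLDING, false);

xml_set_element_handler(
  $sax,
  function ($parser, $tag, $attr) use (&$quotes, &$current) {
    if ($tag == 'quote') {
      $quotes[] = ['year' => $attr['year'], 'phrase' => '', 'author' => ''];
    }
    $current = $tag;
  },
  function ($parser, $tag) use (&$current) {
    $current = null;
  }
);
xml_set_character_data_handler(
  $sax,
  function ($parser, $data) use (&$quotes, &$current) {
    // Text may arrive in several chunks, so append instead of assigning.
    if ($current == 'phrase' || $current == 'author') {
      $quotes[count($quotes) - 1][$current] .= $data;
    }
  }
);

if (!xml_parse($sax, $xmlString, true)) {
  die('XML error: ' . xml_error_string(xml_get_error_code($sax)));
}

foreach ($quotes as $q) {
  echo htmlspecialchars("{$q['author']}: {$q['phrase']} ({$q['year']})"), "\n";
}
```

Because the handlers only record state, the output step is separated from parsing, which tends to be easier to test than echoing from within the handlers.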

Parsing XML with XMLReader

$xml = new XMLReader();

echo '<ul>';
$xml = new XMLReader();
$xml->open('quotes.xml');
while ($xml->read()) {
  if ($xml->nodeType == XMLReader::ELEMENT) {
    if ($xml->localName == 'phrase') {
      $xml->read();
      echo '<li>' . htmlspecialchars($xml->value) . '</li>';
    }
  }
}
echo '</ul>';

Parsing XML with XMLReader (xmlreader.php)

XMLReader, a clone of the XmlTextReader interface in Microsoft .NET, became part of the PHP distribution with version 5.1.0 and is activated by default. At first glance, it looks similar to the SAX parser, but a fundamental difference exists: SAX is an event-based parser, whereas XMLReader uses a cursor that iterates over all elements in an XML file.

To use XMLReader, you instantiate the XMLReader class and then open a file (open() method). The common pattern is to iterate over all nodes using a while loop and the read() method.

Within that loop, you can determine the type of node the cursor has reached (nodeType property) and then act accordingly (for instance, by reading out tag names [localName property] and accessing the value of text nodes [value property]).

The preceding code again reads in all phrases from the XML document and outputs them as a bulleted list.
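
The cursor can also read attributes of the element it is positioned on. The following sketch assumes an inline XML string (loaded with the XML() method rather than open()) and uses getAttribute() and readString(); the variable names are illustrative and not taken from xmlreader.php.

```php
<?php
// Hypothetical inline sample that mirrors the structure of quotes.xml.
$xmlString = '<quotes><quote year="1991">'
  . '<phrase>Hasta la vista, baby!</phrase>'
  . '<author>Arnold Schwarzenegger</author>'
  . '</quote></quotes>';

$reader = new XMLReader();
$reader->XML($xmlString);  // load from a string instead of a file

$items = [];
$year  = '';
while ($reader->read()) {
  if ($reader->nodeType == XMLReader::ELEMENT) {
    if ($reader->localName == 'quote') {
      // Remember the attribute of the enclosing <quote> element.
      $year = $reader->getAttribute('year');
    } elseif ($reader->localName == 'phrase') {
      // readString() returns the text content of the current element.
      $items[] = $year . ': ' . $reader->readString();
    }
  }
}
$reader->close();

echo '<ul>';
foreach ($items as $item) {
  echo '<li>' . htmlspecialchars($item) . '</li>';
}
echo '</ul>';
```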

Using DOM to Read XML

$dom = new DOMDocument();

<?php
  $dom = new DOMDocument();
  $dom->load('quotes.xml');
  echo '<ul>';
  foreach ($dom->getElementsByTagName('quote') as $element) {
    $year = htmlspecialchars($element->getAttribute('year'));
    foreach (($element->childNodes) as $e) {
      if ($e instanceof DOMElement) {
        if ($e->tagName == 'phrase') {
          $phrase = htmlspecialchars($e->textContent);
        } elseif ($e->tagName == 'author') {
          $author = htmlspecialchars($e->textContent);
        }
      }
    }
    echo "<li>$author: &quot;$phrase&quot; ($year)</li>";
  }
  echo '</ul>';
?>

Parsing XML with DOM (dom-read.php; excerpt)

The W3C’s Document Object Model (DOM) defines a unified way to access elements in an XML structure. Therefore, accessing elements in an HTML page using JavaScript’s DOM access and accessing elements in an XML file using PHP’s DOM access are quite similar.

The DOM extension is bundled with PHP, so no installation is required. First, you instantiate a DOMDocument object, and then you load() a file or loadXML() a string. The new object supports, among other functionality, the two methods getElementsByTagName() and getElementById(), which return all nodes with a certain tag name and a specific element identified by its ID, respectively. Each node then exposes properties such as the following:

firstChild—First child node

lastChild—Last child node

nextSibling—Next node

previousSibling—Previous node

nodeValue—Value of the node

The preceding code uses DOM to access all quotes in the XML file and outputs them.

Note that the listings use instanceof so that the tag names are only evaluated in nodes of the type DOMElement. This is because whitespace is considered as a DOM node (however, of type DOMText).
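
The navigation properties and the whitespace behavior can be seen together in a short sketch. It assumes an inline XML string loaded with loadXML(); the counters and variable names are illustrative only.

```php
<?php
$dom = new DOMDocument();
// Hypothetical inline sample; the indentation is deliberate.
$dom->loadXML('<quotes>
  <quote year="1991">
    <phrase>Hasta la vista, baby!</phrase>
    <author>Arnold Schwarzenegger</author>
  </quote>
</quotes>');

$quote = $dom->getElementsByTagName('quote')->item(0);

$elements        = [];
$whitespaceNodes = 0;
// Walk the children of <quote> from firstChild via nextSibling.
for ($node = $quote->firstChild; $node !== null; $node = $node->nextSibling) {
  if ($node instanceof DOMElement) {
    $elements[] = $node->tagName . '=' . $node->nodeValue;
  } elseif ($node instanceof DOMText && trim($node->nodeValue) === '') {
    // Line breaks and indentation show up as DOMText nodes.
    $whitespaceNodes++;
  }
}

echo implode(', ', $elements), "\n";
echo "Whitespace-only text nodes: $whitespaceNodes\n";
```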

Using DOM to Write XML

$dom->save('quotes.xml');

<?php
  $dom = new DOMDocument();
  $dom->load('quotes.xml');
  $quote = $dom->createElement('quote');
  $quote->setAttribute('year', $_POST['year']);
  $phrase = $dom->createElement('phrase');
  $phraseText = $dom->createTextNode($_POST['quote']);
  $phrase->appendChild($phraseText);
  $author = $dom->createElement('author');
  $authorText = $dom->createTextNode($_POST['author']);
  $author->appendChild($authorText);
  $quote->appendChild($phrase);
  $quote->appendChild($author);
  $dom->documentElement->appendChild($quote);
  $dom->save('quotes.xml');
  echo 'Quote saved.';
?>

Creating XML with DOM (dom-write.php; excerpt)

Apart from the read access, it is also possible to build complete XML documents from the ground up using PHP’s DOM support. This might look a bit clumsy, but it works very well when you have to automatically parse a lot of data.

The createElement() method creates a new element. You can set its content by appending a new text node (created with createTextNode()) and add attributes with setAttribute(). Then, you access the root element of the XML file with documentElement and call appendChild() to attach the new element. Finally, save() writes the whole XML file to disk.

The preceding code saves author, quote, and year in an XML document, appending to the data already there.
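
Building a document from the ground up works the same way, except that you create the root element yourself. The following is a minimal sketch under that assumption; it uses saveXML() to return the result as a string instead of writing a file, and the formatOutput setting is only there to make the output readable.

```php
<?php
// Start with an empty document instead of loading quotes.xml.
$dom = new DOMDocument('1.0', 'ISO-8859-1');
$dom->formatOutput = true;  // indent the serialized output

$root = $dom->createElement('quotes');
$dom->appendChild($root);

$quote = $dom->createElement('quote');
$quote->setAttribute('year', '1991');

$phrase = $dom->createElement('phrase');
$phrase->appendChild($dom->createTextNode('Hasta la vista, baby!'));
$author = $dom->createElement('author');
$author->appendChild($dom->createTextNode('Arnold Schwarzenegger'));

$quote->appendChild($phrase);
$quote->appendChild($author);
$root->appendChild($quote);

// saveXML() returns the document as a string.
$xmlString = $dom->saveXML();
echo $xmlString;
```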


Note

PHP 5’s DOM extension does not offer something like set_content() (which was available in PHP 4), so you have to define the text values of the nodes using the createTextNode() method, as shown in the preceding code.


Using XMLWriter to Write XML

$xml = new XMLWriter();

<?php
  header('Content-type: text/xml; charset=ISO-8859-1');

  $xml = new XMLWriter();
  $xml->openMemory();
  $xml->startDocument('1.0', 'ISO-8859-1');
  $xml->startElement('quotes');
    $xml->startElement('quote');
    $xml->writeAttribute('year', '1991');
      $xml->startElement('phrase');
      $xml->text('Hasta la vista, baby!');
      $xml->endElement();
      $xml->startElement('author');
      $xml->text('Arnold Schwarzenegger');
      $xml->endElement();
    $xml->endElement();
  $xml->endElement();
  $xml->endDocument();
  echo $xml->outputMemory();
?>

Creating XML with XMLWriter (xmlwriter.php)

XMLWriter, the sibling of XMLReader, was introduced into the PHP distribution in version 5.1.2. It provides a structured API to create XML files and is especially valuable when you are dynamically creating XML data (for instance, when processing data from a database).

The API is quite simple: After opening an XML document in memory, you have specific functions to start the document, to start and end an element, and to write attributes and text nodes. The preceding code creates the XML file from the beginning of this chapter, except for the indentation; see Figure 10.2 for the result.

Figure 10.2. XML created with XMLWriter
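
If you want the indentation as well, XMLWriter can add it for you. The following sketch assumes the same sample data and uses setIndent() and setIndentString(); it also uses writeElement() as a shortcut for an element that contains only text. Note that no header() call is made here, so the snippet can be run from the command line.

```php
<?php
$xml = new XMLWriter();
$xml->openMemory();
// Indentation must be configured after the output target is opened.
$xml->setIndent(true);
$xml->setIndentString('  ');

$xml->startDocument('1.0', 'ISO-8859-1');
$xml->startElement('quotes');
  $xml->startElement('quote');
  $xml->writeAttribute('year', '1991');
    // writeElement() combines startElement(), text(), and endElement().
    $xml->writeElement('phrase', 'Hasta la vista, baby!');
    $xml->writeElement('author', 'Arnold Schwarzenegger');
  $xml->endElement();  // </quote>
$xml->endElement();    // </quotes>
$xml->endDocument();

$result = $xml->outputMemory();
echo $result;
```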

Using SimpleXML

$xml = simplexml_load_file('quotes.xml');

<?php
  $xml = simplexml_load_file('quotes.xml');
  echo '<ul>';
  foreach ($xml->quote as $quote) {
    $year = htmlspecialchars($quote['year']);
    $phrase = htmlspecialchars($quote->phrase);
    $author = htmlspecialchars($quote->author);
    echo "<li>$author: &quot;$phrase&quot; ($year)</li>";
  }
  echo '</ul>';
?>

Parsing XML with SimpleXML (simplexml-read.php)

One of the greatest new features in PHP 5 is the SimpleXML extension, an idea borrowed from a Perl module in CPAN. The approach is as simple as it is ingenious. The most intuitive way to access XML is probably via an object-oriented programming (OOP) approach: Subnodes are properties of their parent nodes/objects, and XML attributes are read from the object with array syntax. This makes accessing XML very easy, including full iterator support, so foreach can be used.

This code loads a file using simplexml_load_file()—you can also use simplexml_load_string() for strings—and then reads all information in.

Compare this to the DOM approach. SimpleXML may be slower in some instances than DOM, but the coding is so much quicker.
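
The string variant works the same way. The following sketch assumes an inline XML string and shows two details that are easy to miss: SimpleXML values are objects, so a (string) cast is useful when you need plain strings, and a failed load returns false. The call to libxml_use_internal_errors() is an assumption made here to keep parser warnings out of the output.

```php
<?php
// Collect parser errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical inline sample that mirrors the structure of quotes.xml.
$xml = simplexml_load_string(
  '<quotes><quote year="1991">'
  . '<phrase>Hasta la vista, baby!</phrase>'
  . '<author>Arnold Schwarzenegger</author>'
  . '</quote></quotes>'
);

$first  = $xml->quote[0];
$year   = (string) $first['year'];   // attribute via array syntax
$phrase = (string) $first->phrase;   // subnode via property syntax
$count  = count($xml->quote);        // number of <quote> elements

echo "$count quote(s); first: $phrase ($year)\n";

// Malformed input yields false rather than an object.
$broken = simplexml_load_string('<quotes><quote>');
echo $broken === false ? "Second document could not be parsed.\n" : '';
```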


Note

Writing can be done easily, as well. However, it is not possible to append elements without any external help (for instance, by using DOM and loading this DOM into SimpleXML using simplexml_import_dom()).
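
The DOM detour mentioned in this note might look like the following sketch. The second quote is made-up placeholder data, and the inline XML string stands in for quotes.xml; the new element is appended with DOM, and the document is then handed to simplexml_import_dom().

```php
<?php
$dom = new DOMDocument();
// Hypothetical inline sample that mirrors the structure of quotes.xml.
$dom->loadXML(
  '<quotes><quote year="1991">'
  . '<phrase>Hasta la vista, baby!</phrase>'
  . '<author>Arnold Schwarzenegger</author>'
  . '</quote></quotes>'
);

// Append a new <quote> with DOM (placeholder values for illustration).
$quote = $dom->createElement('quote');
$quote->setAttribute('year', '2000');
$phrase = $dom->createElement('phrase');
$phrase->appendChild($dom->createTextNode('Example phrase'));
$author = $dom->createElement('author');
$author->appendChild($dom->createTextNode('Example author'));
$quote->appendChild($phrase);
$quote->appendChild($author);
$dom->documentElement->appendChild($quote);

// Continue working with the convenient SimpleXML syntax.
$xml   = simplexml_import_dom($dom);
$count = count($xml->quote);
$last  = (string) $xml->quote[$count - 1]->phrase;

echo "$count quotes, last phrase: $last\n";
```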


Using XPath with SimpleXML

$xml->xpath()

<?php
  $xml = simplexml_load_file('quotes.xml');
  foreach ($xml->xpath('*/phrase') as $phrase) {
    echo '<p>' . htmlspecialchars((string) $phrase) . '</p>';
  }
?>

Using XPath with SimpleXML (xpath.php)

One of the best-kept secrets of SimpleXML is that the extension has built-in support for XPath, the XML query language. Using it is easy: After creating a SimpleXML object, you can use the xpath() method and get all matching nodes in return as an array.
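
XPath becomes more useful once you add predicates. The following sketch assumes an inline XML string and an absolute path expression that filters on the year attribute; the expression and variable names are illustrative, not taken from xpath.php.

```php
<?php
// Hypothetical inline sample that mirrors the structure of quotes.xml.
$xml = simplexml_load_string(
  '<quotes><quote year="1991">'
  . '<phrase>Hasta la vista, baby!</phrase>'
  . '<author>Arnold Schwarzenegger</author>'
  . '</quote></quotes>'
);

// Select the phrases of all quotes coined in 1991.
$phrases = $xml->xpath('/quotes/quote[@year="1991"]/phrase');

// A year without any quotes simply yields an empty array.
$none = $xml->xpath('/quotes/quote[@year="1900"]/phrase');

foreach ($phrases as $phrase) {
  echo '<p>' . htmlspecialchars((string) $phrase) . '</p>';
}
```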

Transforming XML with XSL

$xslt = new XSLTProcessor();

<?php
  $xml = new DOMDocument();
  $xml->load('quotes.xml');
  $xsl = new DOMDocument();
  $xsl->load('quotes.xsl');
  $xslt = new XSLTProcessor();
  $xslt->importStylesheet($xsl);
  $result = $xslt->transformToDoc($xml);
  echo $result->saveXML();
?>

Using XSLT with PHP (xslt.php)

Transforming XML into another format is usually done with XSLT (XSL Transformations). In PHP, XSLT is handled by libxslt and can be enabled using php_xsl.dll (in php.ini) on Windows and the switch --with-xsl on other platforms. Writing the XSL file may be hard, but the PHP code afterward is quite simple: Load both the XML and the XSLT (which is an XML document, as well) into DOM objects, and then instantiate an XSLTProcessor object. Call importStylesheet(), and then transformToDoc().

The preceding phrase contains the code for these steps; the file quotes.xsl in the download repository contains markup that transforms the quotes’ XML into the well-known HTML bulleted list.
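
To try the transformation without the download files, you can keep both documents inline. The stylesheet in the following sketch is a hypothetical stand-in written for this example, not the contents of quotes.xsl, and the code assumes that the XSL extension is enabled. It uses transformToXml(), which returns the result as a string.

```php
<?php
// Hypothetical inline sample that mirrors the structure of quotes.xml.
$xml = new DOMDocument();
$xml->loadXML(
  '<quotes><quote year="1991">'
  . '<phrase>Hasta la vista, baby!</phrase>'
  . '<author>Arnold Schwarzenegger</author>'
  . '</quote></quotes>'
);

// Illustrative stylesheet: turns each <quote> into a list item.
$xslString = <<<'XSL'
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" />
  <xsl:template match="/quotes">
    <ul>
      <xsl:for-each select="quote">
        <li><xsl:value-of select="author" />: <xsl:value-of select="phrase" /> (<xsl:value-of select="@year" />)</li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>
XSL;

$xsl = new DOMDocument();
$xsl->loadXML($xslString);

$xslt = new XSLTProcessor();
$xslt->importStylesheet($xsl);

// transformToXml() returns the transformed document as a string.
$html = $xslt->transformToXml($xml);
echo $html;
```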

Validating XML

$dom->relaxNGValidate('quotes.rng')

<?php
  $dom = new DOMDocument;
  $dom->load('quotes.xml');
  echo 'Validation ' .
    (($dom->relaxNGValidate('quotes.rng')) ? 'succeeded.' : 'failed.');
?>

Validating XML against relaxNG (validate-rng.php)

PHP can validate XML against three types of files: Document Type Definitions (DTDs), Schemas (.xsd), and relaxNG. For the last two, the following four methods of the DOM object are available:

schemaValidate('file.xsd')—Validates against a Schema file

schemaValidateSource('...')—Validates against a Schema string

relaxNGValidate('file.rng')—Validates against a relaxNG file

relaxNGValidateSource('...')—Validates against a relaxNG string

The preceding code uses relaxNGValidate() to validate a (well-formed) XML file against a nonmatching relaxNG file. If you change <element name="person"> to <element name="author"> in the file quotes.rng, the validation succeeds.
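
The effect of that change can be reproduced with relaxNGValidateSource() and an inline grammar. The relaxNG grammar in the following sketch is an assumption written for this example (it is not the contents of quotes.rng); the second validation swaps author for person to provoke a failure.

```php
<?php
// Collect validation messages internally instead of emitting warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
// Hypothetical inline sample that mirrors the structure of quotes.xml.
$dom->loadXML(
  '<quotes><quote year="1991">'
  . '<phrase>Hasta la vista, baby!</phrase>'
  . '<author>Arnold Schwarzenegger</author>'
  . '</quote></quotes>'
);

// Illustrative relaxNG grammar for the quotes format.
$rng = <<<'RNG'
<element name="quotes" xmlns="http://relaxng.org/ns/structure/1.0">
  <oneOrMore>
    <element name="quote">
      <attribute name="year"><text/></attribute>
      <element name="phrase"><text/></element>
      <element name="author"><text/></element>
    </element>
  </oneOrMore>
</element>
RNG;

$matching = $dom->relaxNGValidateSource($rng);

// Same grammar, but expecting <person> instead of <author>.
$wrongRng    = str_replace('name="author"', 'name="person"', $rng);
$nonMatching = $dom->relaxNGValidateSource($wrongRng);

echo 'Matching grammar: ' . ($matching ? 'succeeded.' : 'failed.') . "\n";
echo 'Nonmatching grammar: ' . ($nonMatching ? 'succeeded.' : 'failed.') . "\n";
```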


Tip

Creating a relaxNG file can be quite difficult; the Java tool Trang, available at http://thaiopensource.com/relaxng/trang.html, can read in an XML file and create a relaxNG, Schema, or DTD file out of it.


Validating a Schema is similar and is shown in the file validate-xsd.php in the download repository. When it comes to validating DTDs, you have to patch the XML a bit. The DTD file must be included in the file or referenced like this:

<!DOCTYPE quotes SYSTEM "quotes.dtd">

Then, just load the XML document into a DOM object and call validate(). The following contains the appropriate code; the file referenced in the code repository contains an intentional error in the DTD (month rather than year):

<?php
  $dom = new DOMDocument();
  $dom->load('quotes-dtd.xml');
  echo 'Validation ' .
    (($dom->validate()) ? 'succeeded.' : 'failed.');
?>

Validating XML against a DTD (validate-dtd.php)
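
The other option, including the DTD in the XML file itself, can be sketched as follows. The DTD shown here is an assumption written for this example and not the contents of quotes.dtd; note that the name after <!DOCTYPE has to match the root element.

```php
<?php
// Collect validation messages internally instead of emitting warnings.
libxml_use_internal_errors(true);

// Hypothetical document with the DTD embedded as an internal subset.
$xmlString = <<<'XML'
<!DOCTYPE quotes [
  <!ELEMENT quotes (quote+)>
  <!ELEMENT quote (phrase, author)>
  <!ATTLIST quote year CDATA #REQUIRED>
  <!ELEMENT phrase (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
]>
<quotes>
  <quote year="1991">
    <phrase>Hasta la vista, baby!</phrase>
    <author>Arnold Schwarzenegger</author>
  </quote>
</quotes>
XML;

$dom = new DOMDocument();
$dom->loadXML($xmlString);
$valid = $dom->validate();

// Renaming the attribute to month violates the DTD, as in the book's example.
$broken = new DOMDocument();
$broken->loadXML(str_replace('<quote year=', '<quote month=', $xmlString));
$invalid = $broken->validate();

echo 'Validation ' . ($valid ? 'succeeded.' : 'failed.') . "\n";
echo 'Validation ' . ($invalid ? 'succeeded.' : 'failed.') . "\n";
```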


What Does PEAR Offer?

As of this writing, the XML section of the PHP Extension and Application Repository (PEAR) contains 35 packages, too many to mention. Here are some of them:

XML_Beautifier formats XML documents so that they are prettier.

XML_DTD allows parsing of DTDs, even with PHP 4.

XML_Parser2 provides an advanced XML parser.

XML_Serializer converts XML files into data structures and vice versa.

XML_Util contains a wealth of helper functions for working with XML.

