SimpleXML

In PHP 5 all XML support is now provided by the libxml2 XML toolkit. By default PHP 5 supports SimpleXML, but if libxml2 is not installed on your machine or the version number is lower than 2.5.10, go to www.xmlsoft.org and download the latest version. (You can use the PHP function phpinfo to check which version of libxml is running on your server.) Without going into too many details, suffice it to say that support for XML has been brought into line with the standards defined by the World Wide Web Consortium (W3C). Unified treatment of XML under libxml2 makes for a more efficient and more easily maintained implementation of XML support.
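
If you would rather check from a script than read through the phpinfo output, something like the following sketch may do. It assumes the LIBXML_DOTTED_VERSION constant is available (it is defined by PHP's libxml extension from PHP 5.1 onward); the variable names are ours and purely illustrative.

```php
<?php
// Minimum libxml2 version quoted in the text above.
$minimum = '2.5.10';

// True when the SimpleXML extension is loaded in this PHP build.
$hasSimpleXml = extension_loaded('simplexml');

// LIBXML_DOTTED_VERSION holds the libxml2 version as a dotted string.
// The defined() guard covers builds where libxml is not compiled in.
$libxmlOk = defined('LIBXML_DOTTED_VERSION')
    && version_compare(LIBXML_DOTTED_VERSION, $minimum, '>=');

echo 'SimpleXML loaded: ' . ($hasSimpleXml ? 'yes' : 'no') . "\n";
echo 'libxml2 recent enough: ' . ($libxmlOk ? 'yes' : 'no') . "\n";
```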

Support for XML is much improved in PHP 5, in terms of both performance and functionality. The SimpleXML extension makes full use of the libxml2 toolkit to provide easy access to XML and a quick way of converting XML documents to PHP data types.
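
To give a feel for what "converting XML documents to PHP data types" means, here is a minimal sketch. The XML fragment and its element names are invented for illustration and have nothing to do with the feed we use later.

```php
<?php
// Hypothetical XML fragment, used only for illustration.
$xml = '<track><title>Moonlight Sonata</title><minutes>15</minutes></track>';
$track = simplexml_load_string($xml);

// Child elements come back as SimpleXMLElement objects;
// cast them to obtain native PHP strings and integers.
$title = (string) $track->title;
$minutes = (int) $track->minutes;

echo "$title runs about $minutes minutes\n";
```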

XML

Since an RSS document is an XML document, you need some understanding of the basics of XML if you want to be able to read a feed. XML is a markup language that is similar in many ways to HTML—this should come as no surprise given that both HTML and XML have a common heritage in Standard Generalized Markup Language (SGML). As a web developer, even if you have never seen an XML file before, it will look familiar, especially if you are coding to the XHTML standard. XML makes use of tags or elements enclosed by angle brackets. Just as in HTML, a closing tag is differentiated from an opening tag by preceding the element name with a forward slash. Also like HTML, tags can have attributes. The major difference between XML tags and HTML tags is that HTML tags are predefined; in XML you can define your own tags. It is this capability that puts the "extensible" in XML. The best way to understand XML is by examining an XML document. Before doing so, let me say a few words about RSS documents.

RSS

Unfortunately there are numerous versions of RSS. Let's take a pragmatic approach and ignore the details of RSS's tortuous history. With something new it's always best to start with a simple example, and the simplest version of RSS is version 0.91. This version has officially been declared obsolete, but it is still widely used, and knowledge of its structure provides a firm basis for migrating to version 2.0, so your efforts will not be wasted. I'll show you an example of a version 0.91 RSS file—in fact, it is the very RSS feed that we are going to use to display news items in a web page.

Structure of an RSS File

As we have done earlier with our own code, let's walk through the RSS code, commenting where appropriate.

The very first component of an XML file is the version declaration. This declaration shows a version number and, like the following example, may also contain information about character encoding.

<?xml version="1.0" encoding="iso-8859-1"?>

After the XML version declaration, the next line of code begins the very first element of the document. The name of this element defines the type of XML document. For this reason, this element is known as the document element or root element. Not surprisingly, our document type is RSS. This opening element defines the RSS version number and has a matching closing tag that terminates the document in much the same way that <html> and </html> open and close a web page.

<rss version="0.91">

A properly formatted RSS document requires a single channel element. This element will contain metadata about the feed as well as the actual data that makes up the feed. A channel element has three required sub-elements: a title, a link, and a description. In our code we will extract the channel title element to form a header for our web page.

  <channel>
    <title>About Classical Music</title>
    <link>http://classicalmusic.about.com/</link>
    <description>Get the latest headlines from the About.com Classical Music Guide Site.</description>

The language, pubDate, and image sub-elements all contain optional metadata about the channel.

    <language>en-us</language>
    <pubDate>Sun, 19 March 2006 21:25:29 -0500</pubDate>
    <image>
        <title>About.com</title>
        <url>http://z.about.com/d/lg/rss.gif</url>
        <link>http://about.com/</link>
        <width>88</width>
        <height>31</height>
    </image>

The item element that follows is what we are really interested in. The three required elements of an item are the ones that appear here: the title, link, and description. This is the part of the RSS feed that will form the content of our web page. We'll create an HTML anchor tag using the title and link elements, and follow this with the description.

    <item>
        <title>And the Oscar goes to...</title>
        <link>http://classicalmusic.about.com/b/a/249503.htm</link>
        <description>Find out who won this year's Oscar for Best Music...
        </description>
    </item>

Only one item is shown here, but any number may appear. It is common to find about 20 items in a typical RSS feed.

</channel>
</rss>

Termination of the channel element is followed by the termination of the rss element. These tags are properly nested one within the other, and each tag has a matching end tag, so we may say that this XML document is well-formed.
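
Well-formedness is something the parser can check for us. The sketch below assumes we are happy to treat "simplexml_load_string returned false" as "not well-formed"; the helper name isWellFormed and both sample strings are ours, not part of any library or of the feed above.

```php
<?php
// Collect parser errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical fragments: the second one omits the closing </channel> tag.
$good = '<rss version="0.91"><channel><title>Demo</title></channel></rss>';
$bad  = '<rss version="0.91"><channel><title>Demo</title></rss>';

// Returns true when the string parses as XML, false otherwise.
function isWellFormed($xml)
{
    libxml_clear_errors();
    // Compare strictly against false: that is the documented failure value.
    return simplexml_load_string($xml) !== false;
}

var_dump(isWellFormed($good));
var_dump(isWellFormed($bad));
```

Note that this says nothing about whether the document is a sensible RSS feed; it only confirms that the tags nest and close properly.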

Reading the Feed

In order to read this feed we'll pass its URI to the simplexml_load_file function and create a SimpleXMLElement object. This object has four built-in methods and as many properties or data members as its XML source file.

<?php
//point to an xml file
$feed = "http://z.about.com/6/g/classicalmusic/b/index.xml";
//create object of SimpleXMLElement class
$sxml = simplexml_load_file($feed);

We can use the attributes method to extract the RSS version number from the root element.

foreach ($sxml->attributes() as $key => $value){
  echo "RSS $key $value";
}
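
If you only want one attribute, a loop is more than you need. As a standalone sketch (the inline document below is a stand-in for the real feed), an attribute can also be read with array syntax:

```php
<?php
// Hypothetical inline document standing in for the feed.
$sxml = simplexml_load_string('<rss version="0.91"><channel/></rss>');

// Array syntax reads a single attribute; cast it to get a plain string.
$version = (string) $sxml['version'];

echo "RSS version $version\n";
```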

The channel title can be referenced in an OO fashion as a nested property. Please note, however, that we cannot reference $sxml->channel->title directly inside a double-quoted string: PHP's simple interpolation syntax reads only the first property, $sxml->channel, and treats ->title as literal text. The alternative syntax using curly braces is shown in the comment below.

echo "<h2>" . $sxml->channel->title . "</h2>\n";
//below won't work
//echo "<h2>$sxml->channel->title</h2>\n";
//may use the syntax below
//echo "<h2>{$sxml->channel->title}</h2>\n";
echo "<p>\n";

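The difference between the three forms is easy to see in a standalone sketch. The inline document here is only a stand-in for the real feed, and the variable names are ours.

```php
<?php
// Hypothetical inline document standing in for the feed.
$sxml = simplexml_load_string(
    '<rss version="0.91"><channel><title>About Classical Music</title></channel></rss>'
);

// Simple syntax: only $sxml->channel is interpolated; "->title" stays literal.
$simple = "<h2>$sxml->channel->title</h2>";

// Curly braces let the parser see the whole nested expression.
$braces = "<h2>{$sxml->channel->title}</h2>";

// Plain concatenation gives the same result as the curly-brace form.
$concat = '<h2>' . $sxml->channel->title . '</h2>';

echo $simple, "\n", $braces, "\n", $concat, "\n";
```
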
As you might expect, a SimpleXMLElement supports iteration.

//iterate through items as though an array
foreach ($sxml->channel->item as $item){
  $strtemp = "<a href=\"$item->link\">".
    "$item->title</a> $item->description<br /><br />\n";
  echo $strtemp;
}
?>
</p>
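
One caution: the loop above writes feed content straight into the page. If you don't fully trust the feed, you may prefer to escape each value first. The following is a sketch only, using a made-up inline feed; note that escaping the description this way will also display any HTML it contains as plain text, which may or may not be what you want.

```php
<?php
// Hypothetical inline feed; a real one would come from simplexml_load_file().
$sxml = simplexml_load_string(
    '<rss version="0.91"><channel>'
    . '<item><title>Bach &amp; Sons</title>'
    . '<link>http://example.com/?a=1&amp;b=2</link>'
    . '<description>A short note.</description></item>'
    . '</channel></rss>'
);

$html = '';
foreach ($sxml->channel->item as $item) {
    // Escape each value so characters such as & and " cannot break the markup.
    $link  = htmlspecialchars((string) $item->link, ENT_QUOTES);
    $title = htmlspecialchars((string) $item->title, ENT_QUOTES);
    $desc  = htmlspecialchars((string) $item->description, ENT_QUOTES);
    $html .= "<a href=\"$link\">$title</a> $desc<br /><br />\n";
}

echo $html;
```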

I told you it was going to be easy, but I'll bet you didn't expect so few lines of code. With only a basic understanding of the structure of an RSS file we were able to embed an RSS feed into a web page.

The SimpleXML extension excels in circumstances such as this where the file structure is known beforehand. We know we are dealing with an RSS file, and we know that if the file is a valid RSS document it must contain certain elements. On the other hand, if we don't know the file format we're dealing with, the SimpleXML extension won't be able to do the job. A SimpleXMLElement cannot query an XML file in order to determine its structure. Living up to its name, SimpleXML is the easiest XML extension to use. For more complex interactions with XML files you'll have to use the Document Object Model (DOM) or the Simple API for XML (SAX) extensions. In any case, by providing the SimpleXML extension, PHP 5 has stayed true to its origins and provided an easy way to perform what might otherwise be a fairly complex task.
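
For a taste of what the DOM route looks like, here is a rough sketch that walks a document whose structure we pretend not to know and lists its element names by nesting depth. The sample XML and the listElements helper are invented for illustration; only DOMDocument and its loadXML method come from PHP's DOM extension.

```php
<?php
// Hypothetical document whose structure we pretend not to know.
$xml = '<catalog><cd id="1"><composer>Bach</composer></cd><note>demo</note></catalog>';

$dom = new DOMDocument();
$dom->loadXML($xml);

// Recursively collect element names, indented two spaces per nesting level.
function listElements(DOMNode $node, $depth = 0, array &$out = array())
{
    // Skip text nodes and anything else that is not an element.
    if ($node->nodeType !== XML_ELEMENT_NODE) {
        return $out;
    }
    $out[] = str_repeat('  ', $depth) . $node->nodeName;
    foreach ($node->childNodes as $child) {
        listElements($child, $depth + 1, $out);
    }
    return $out;
}

$names = listElements($dom->documentElement);
echo implode("\n", $names), "\n";
```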
