XHTML Checklist

The W3C has a number of requirements for documents before they can be called true XHTML documents. Here's the list of requirements that documents must meet:

  • The document must successfully validate against one of the W3C XHTML DTDs.

  • The document element must be <html>.

  • The document element, <html>, must set an XML namespace for the document, using the xmlns attribute. This namespace must be "http://www.w3.org/1999/xhtml".

  • There must be a <!DOCTYPE> declaration, and it must appear before the document element.
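
Most of the checklist above can be checked mechanically. Here's a minimal Python sketch, using only the standard library, that tests three of the four items: that the document parses as XML, that the document element is <html> in the XHTML namespace, and that a <!DOCTYPE> declaration is present. The function name check_xhtml_basics and the SAMPLE document are my own, chosen for illustration. The sketch does not validate against a DTD, so the first item on the checklist still needs a real validator.

```python
import re
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# A small sample document, used here only for illustration.
SAMPLE = """\
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Welcome to my page</title></head>
  <body><h1>Welcome to XHTML!</h1></body>
</html>
"""


def check_xhtml_basics(text):
    """Return a list of checklist problems found in text (empty if none)."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        # Nothing else can be checked if the document is not well-formed.
        return [f"not well-formed XML: {err}"]

    problems = []
    # ElementTree reports namespaced tags as "{namespace}localname".
    if root.tag != f"{{{XHTML_NS}}}html":
        problems.append("document element is not <html> in the XHTML namespace")
    # In well-formed XML a DOCTYPE declaration can only come before the
    # document element, so a simple presence check is enough for this sketch.
    if not re.search(r"<!DOCTYPE\s+html\b", text):
        problems.append("no <!DOCTYPE html ...> declaration found")
    return problems


if __name__ == "__main__":
    print(check_xhtml_basics(SAMPLE) or "basic checks passed")
```

An empty list means that the three mechanical checks passed; it does not mean that the document is valid XHTML.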

XHTML is designed to be displayed in today's browsers, and it works well (largely because those browsers ignore markup that they don't understand, such as the <?xml?> declaration and <!DOCTYPE>). However, because XHTML is also XML, a number of differences exist between legal HTML and legal XHTML.

XHTML Versus HTML

As you know, XML is more particular about many aspects of writing documents than HTML is. For example, you need to place all attribute values in quotes in XML, although HTML documents can use unquoted values because HTML browsers will accept that. One of the problems the W3C is trying to solve with XHTML, in fact, is the thicket of nonstandard HTML that's out there on the Web, mostly because browsers support it. Some observers estimate that half of the code in browsers is there to handle nonstandard use of HTML, and that discourages any but the largest companies from creating HTML browsers. XHTML is supposed to be different—if a document isn't in perfect XHTML, the browser is supposed to quit loading it and display an error, not guess what the document author was trying to do. Hopefully, that will make it easier to write XHTML browsers.

Here are some of the major differences between HTML and XHTML:

  • XHTML documents must be well-formed XML documents.

  • Element and attribute names must be in lowercase.

  • Elements that aren't empty need end tags; end tags can't be omitted as they can sometimes in HTML.

  • Attribute values must always be quoted.

  • You cannot use "standalone" attributes that are not assigned values. Instead, give the attribute its own name as its value, as in checked="checked".

  • Empty elements must end with the /> characters. In practice, this does not seem to be a problem for the major browsers, which is a lucky thing for XHTML because it's definitely not standard HTML.

  • The <a> element cannot contain other <a> elements.

  • The <pre> element cannot contain the <img>, <object>, <big>, <small>, <sub>, or <sup> elements.

  • The <button> element cannot contain the <input>, <select>, <textarea>, <label>, <button>, <form>, <fieldset>, <iframe>, or <isindex> elements.

  • The <label> element cannot contain other <label> elements.

  • The <form> element cannot contain other <form> elements.

  • You must use the id attribute, not the name attribute, even on elements that have traditionally taken a name attribute. In XHTML 1.0, the name attribute of the <a>, <applet>, <form>, <frame>, <iframe>, <img>, and <map> elements is formally deprecated. In practice, this is a little difficult in browsers such as Netscape that support name and not id; in that case, you should use both attributes in the same element, giving them the same value. Deprecated does not mean removed: the XHTML 1.0 transitional and frameset DTDs still accept name on these elements.

  • You must escape sensitive characters. For example, when an attribute value contains an ampersand (&), the ampersand must be expressed as a character entity reference, as &amp;.

As we'll see in the next chapter, there are some additional requirements—for example, if you use < characters in your scripts, you should either escape such characters as &lt;, or, if the browser can't handle that, place the script in an external file. (The W3C's suggestion—to place scripts in CDATA sections—is definitely not understood by any major browser today.)
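
Several of the differences listed above are ordinary XML well-formedness rules, so any XML parser will catch them. The following Python sketch pairs a few common HTML habits with their XHTML equivalents and confirms that a standard-library XML parser rejects the first of each pair and accepts the second. The fragments and the name is_well_formed are mine, made up for illustration. Rules that go beyond well-formedness, such as lowercase names and the nesting restrictions on <a>, <pre>, <button>, <label>, and <form>, are not covered here; they need validation against a DTD.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def is_well_formed(fragment):
    """Return True if fragment parses as a well-formed XML element."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Each pair is (HTML habit an XML parser rejects, XHTML-style fix).
EXAMPLES = [
    ('<p>one<br>two</p>', '<p>one<br />two</p>'),          # empty element needs />
    ('<ul><li>item</ul>', '<ul><li>item</li></ul>'),        # end tag required
    ('<td colspan=2>x</td>', '<td colspan="2">x</td>'),     # quoted attribute value
    ('<input checked />', '<input checked="checked" />'),  # no standalone attribute
    ('<a href="x.cgi?a=1&b=2">go</a>',
     '<a href="x.cgi?a=1&amp;b=2">go</a>'),                 # escaped ampersand
]

if __name__ == "__main__":
    for html, xhtml in EXAMPLES:
        print(f"{html!r}: {is_well_formed(html)}  {xhtml!r}: {is_well_formed(xhtml)}")
    # quoteattr escapes the ampersand and adds the surrounding quotes.
    print(quoteattr("x.cgi?a=1&b=2"))
```

When you generate attribute values from a program, a helper such as quoteattr saves you from escaping ampersands by hand.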

Automatic Conversion from HTML to XHTML

You may already have a huge Web site full of HTML pages, and you might be reading all this with some trepidation—how are you going to convert all those pages to the far more strict XHTML? In fact, a utility out there can do it for you—the Tidy utility, created by Dave Raggett. This utility is available for a wide variety of platforms, and you can download it for free from http://www.w3.org/People/Raggett/tidy. There's also a complete set of instructions on that page.

Here's an example: I'll use Tidy in Windows to convert a file from HTML to XHTML. In this case, I'll use the example HTML file we developed earlier, as saved in a file named index.html:

<HTML>
    <HEAD>
        <TITLE>
            Welcome to my page
        </TITLE>
    </HEAD>

    <BODY>
        <H1>
            Welcome to XHTML!
        </H1>
    </BODY>
</HTML>

After downloading Tidy, you run it at the command prompt. Here are the command-line switches, or options, that you can use with Tidy:

Switch             Description
-config file       Use the configuration file named file
-indent or -i      Indent element content
-omit or -o        Omit optional end tags
-wrap 72           Wrap text at column 72 (default is 68)
-upper or -u       Force tags to uppercase (default is lowercase)
-clean or -c       Replace font, nobr, and center tags with CSS
-raw               Don't substitute entities for characters 128 to 255
-ascii             Use ASCII for output, and Latin-1 for input
-latin1            Use Latin-1 for both input and output
-utf8              Use UTF-8 for both input and output
-iso2022           Use ISO2022 for both input and output
-numeric or -n     Output numeric rather than named entities
-modify or -m      Modify original files
-errors or -e      Show only error messages
-quiet or -q       Suppress nonessential output
-f file            Write errors to file
-xml               Use this when input is in XML
-asxml             Convert HTML to XML
-slides            Burst into slides on h2 elements
-help              List command-line options
-version           Show release date

In this example, I'll use three switches:

  • -m indicates that I want Tidy to modify the file I pass to it, which will be index.html.

  • -i indicates that I want it to indent the resulting XHTML elements.

  • -config indicates that I want to use a configuration file, which I've named configuration.txt.

Here's how I use Tidy from the command line:

%tidy -m -i -config configuration.txt index.html

Tidy is actually a utility that cleans up HTML, as you might gather from its name. To make it create XHTML, you must use a configuration file, which I've named configuration.txt here. You can see all the configuration file options on the Tidy Web site. Here are the contents of configuration.txt, which I'll use to convert index.html to XHTML:

output-xhtml: yes
add-xml-pi: yes
doctype: loose

Here, output-xhtml indicates that I want Tidy to create XHTML output. Using add-xml-pi indicates that the output should also include an XML declaration, and doctype: loose means that I want to use the transitional XHTML DTD. If you don't specify what DTD to use, Tidy will guess, based on your HTML.

Here's the resulting XHTML document:

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta name="generator" content="HTML Tidy, see www.w3.org" />

    <title>Welcome to my page</title>
  </head>

  <body>
    <h1> Welcome to XHTML!</h1>
  </body>
</html>
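
If you have many pages to convert, you might drive Tidy from a script rather than typing the command each time. Here's a rough Python sketch of that idea, assuming that a tidy executable is on your path and accepts the same switches and configuration options I used above. The function names config_text, tidy_command, and convert_in_place are hypothetical helpers of my own, not part of Tidy.

```python
import subprocess
from pathlib import Path

# The same three options used in configuration.txt above.
TIDY_OPTIONS = {"output-xhtml": "yes", "add-xml-pi": "yes", "doctype": "loose"}


def config_text(options):
    """Render options in Tidy's "name: value" configuration-file format."""
    return "".join(f"{name}: {value}\n" for name, value in options.items())


def tidy_command(config, html_file):
    """Build the command line shown above: modify in place, indent, use config."""
    return ["tidy", "-m", "-i", "-config", str(config), str(html_file)]


def convert_in_place(html_file, config="configuration.txt"):
    """Write the configuration file, run Tidy on html_file, return its exit code.

    Assumes a tidy executable is installed. Tidy conventionally exits with 1
    for warnings and 2 for errors, so a nonzero code is returned, not raised.
    """
    Path(config).write_text(config_text(TIDY_OPTIONS))
    return subprocess.run(tidy_command(config, html_file)).returncode
```

Calling convert_in_place("index.html") would then do the same job as the command line above. Because -m overwrites the original file, work on a copy until you trust the results.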

You can even teach Tidy about new XHTML tags that you've added. If you're ever stuck and want a quick way of translating HTML into XHTML, check out Tidy; it's fast, it's effective, and it's free.

Validating Your XHTML Document

The W3C has a validator you can use to check the validity of your XHTML document, and you can find this validator at http://validator.w3.org. To use the XHTML validator, you just enter the URI of your document and click the Validate This Page button. The W3C validator checks the document and gives you a full report. Here's an example response:

Congratulations, this document validates as XHTML1.0 Transitional!
To show your readers that you have taken the care to create an
interoperable Web page, you may display this icon on any page that
validates. Here is the HTML you could use to add this icon to your
Web page:
  <p>
    <a href="http://validator.w3.org/check/referer"><img
        src="http://validator.w3.org/images/vxhtml10"
        alt="Valid XHTML 1.0!" height="31" width="88" /></a>
  </p>

In this case, the document I tested validated properly, and the W3C validator says that I can add the official W3C XHTML 1.0 Transitional logo to the document. That logo appears in Figure 16.2.

Figure 16.2. The W3C transitional XHTML logo.


Actually, the W3C XHTML validator does not do a complete job—it doesn't check to see if values are supplied for required attributes, for example, or make sure that child elements are allowed to be nested inside the particular type of their parents. However, it does a reasonably good job.
