HTML is the language of the Web. It is used to encode embedded directions (tags) that indicate to a Web browser how to display the contents of a document.
The HTML standard is under the authority of the World Wide Web Consortium (W3C). Unlike HTTP, which is implemented consistently across products, HTML has been implemented differently by different browser writers, who have also added their own proprietary extensions. As a result, even different versions of the same browser may handle the same HTML tag differently.
Tip
In your servlet code, you are advised to restrict the use of HTML to well-established tags and features. All the HTML covered here will work in the most popular browsers, although the output may look different from one browser to another.
An HTML document has a well-defined structure consisting of required and optional HTML elements.
An HTML tag consists of a tag name followed by an optional list of attributes, all enclosed in angle brackets (<...>). Tag names and attribute names are not case sensitive and cannot contain a space, tab, or return character. Most HTML tags come in pairs: a start tag and an end tag. The end tag is the same as the start tag but has a forward slash character preceding the tag name. For example, an HTML document begins with <HTML> and ends with </HTML>.
Tags are nested. This means that you must end the most recent tag before ending a preceding one. Apart from this restriction, the actual layout is completely free format. An indented layout can be used to aid readability but is not required.
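The nesting rule can be sketched in plain Java. The class name TagDemo and the helper element() are hypothetical names chosen for illustration, and the P and B tags are used only as familiar examples. Because each call closes its own tag before returning, the inner tag is always ended before the enclosing one.

```java
class TagDemo {
    // Wraps content in a start tag and the matching end tag.
    static String element(String name, String content) {
        return "<" + name + ">" + content + "</" + name + ">";
    }

    public static void main(String[] args) {
        // The inner B element is ended before the enclosing P element.
        String html = element("P", "This is " + element("B", "bold") + " text.");
        System.out.println(html); // prints <P>This is <B>bold</B> text.</P>
    }
}
```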
Each HTML document has an optional HEAD and a BODY. The HEAD is where you pass information to the browser about the document; text in the HEAD is not displayed as the content of the document. The BODY includes the information (tags and text) that defines the document's content. A well-formed (if a little basic) HTML document is shown in Listing 12.3; the output, as displayed in Microsoft's Internet Explorer, is shown in Figure 12.2.
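As a rough illustration of the HEAD and BODY structure (a sketch, not a reproduction of Listing 12.3), the following Java class builds a small document as a string. The class name MinimalPage, the method render(), and the sample title and text are assumptions made for this example. The indentation in the generated HTML is there only for readability.

```java
class MinimalPage {
    // Builds a small HTML document with a HEAD and a BODY.
    static String render(String title, String bodyText) {
        return String.join("\n",
            "<HTML>",
            "  <HEAD>",
            "    <TITLE>" + title + "</TITLE>",   // information about the document, not page content
            "  </HEAD>",
            "  <BODY>",
            "    <P>" + bodyText + "</P>",        // the content the browser displays
            "  </BODY>",
            "</HTML>");
    }

    public static void main(String[] args) {
        System.out.println(render("My Page", "Hello from a generated page."));
    }
}
```

In a servlet, a string like this would typically be written to the response's writer instead of standard output.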
All tags have a name, and some tags may also have one or more attributes that are used to add extra information. Attribute values can be case sensitive and should be enclosed in quotes if they include any space or special characters (if in doubt, quote). Either single or double quotes can be used.
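One way to follow the "if in doubt, quote" advice when generating HTML from Java is to quote every attribute value. The sketch below is illustrative only: AttributeDemo and attribute() are hypothetical names, and the escaping shown (for the ampersand and the double quote) is a minimal assumption about what a quoted value needs, not a complete escaping routine.

```java
class AttributeDemo {
    // Renders NAME="value", escaping characters that would end or corrupt the quoted value.
    static String attribute(String name, String value) {
        String escaped = value.replace("&", "&amp;").replace("\"", "&quot;");
        return name + "=\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        // The VALUE contains spaces and special characters, so it must be quoted.
        String tag = "<INPUT " + attribute("TYPE", "submit") + " "
                + attribute("VALUE", "Save & \"Close\"") + ">";
        System.out.println(tag);
        // prints <INPUT TYPE="submit" VALUE="Save &amp; &quot;Close&quot;">
    }
}
```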
A little confusingly, a few common HTML tags do not normally come in pairs because the end tag can be omitted. These include <IMG>, which inserts a graphic image, and <BR>, which causes a line break.
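A small sketch of a tag used without an end tag: the hypothetical helper withBreaks() below joins lines of text with <BR>, and no </BR> is ever emitted. The class and method names are assumptions made for this example.

```java
class LineBreakDemo {
    // Joins the given lines with <BR>, which is written without an end tag.
    static String withBreaks(String... lines) {
        return String.join("<BR>\n", lines);
    }

    public static void main(String[] args) {
        System.out.println(withBreaks("First line", "Second line", "Third line"));
    }
}
```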
Table 12.3 shows a list of HTML tags and attributes that can be used to format a simple HTML document. Only tags from this list are used in today's lesson. This is not a full list of HTML tags, nor does it show every attribute of the tags listed. For a definitive list, see the latest HTML specification available from www.w3.org.
Listing 12.4 is an HTML document that illustrates the use of some of these tags. It contains an input form with a button and outputs data in the form of a table.
The output of this code in Microsoft's Internet Explorer version 6 is shown in Figure 12.3. The page will look similar, but not necessarily exactly the same, in other browsers.
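For readers who want to experiment, the following Java sketch generates a page of the same general kind: an input form with a button, followed by a table of data. It is not Listing 12.4. The class name FormAndTableDemo, the ACTION URL "/lookup", the field name "name", and the sample rows are all hypothetical values chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FormAndTableDemo {
    // Builds a page with a one-field form and a two-column table of the given rows.
    // Cell text is inserted as-is; real code should escape <, > and & first.
    static String render(Map<String, String> rows) {
        StringBuilder html = new StringBuilder();
        html.append("<HTML>\n<HEAD><TITLE>Form and table</TITLE></HEAD>\n<BODY>\n");
        // The ACTION URL and the field name are placeholders for this sketch.
        html.append("<FORM METHOD=\"GET\" ACTION=\"/lookup\">\n");
        html.append("  <INPUT TYPE=\"text\" NAME=\"name\">\n");
        html.append("  <INPUT TYPE=\"submit\" VALUE=\"Look up\">\n");
        html.append("</FORM>\n");
        html.append("<TABLE BORDER=\"1\">\n");
        html.append("  <TR><TH>Name</TH><TH>Value</TH></TR>\n");
        for (Map.Entry<String, String> row : rows.entrySet()) {
            html.append("  <TR><TD>").append(row.getKey())
                .append("</TD><TD>").append(row.getValue()).append("</TD></TR>\n");
        }
        html.append("</TABLE>\n</BODY>\n</HTML>\n");
        return html.toString();
    }

    public static void main(String[] args) {
        Map<String, String> rows = new LinkedHashMap<>(); // keeps insertion order
        rows.put("Alice", "42");
        rows.put("Bob", "17");
        System.out.print(render(rows));
    }
}
```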
This completes the discussion of HTML. For more information, see the latest standard and related documents on the W3C Web site (www.w3.org).