HTML is the language of the Web. An HTML document contains embedded directions (tags) that tell a Web browser how to display the document's contents.
The HTML standard is maintained by the World Wide Web Consortium (W3C). Browser developers have nevertheless implemented HTML inconsistently and have added their own proprietary HTML extensions, to the point that different versions of the same browser may handle the same HTML tag differently.
TIP
In your servlet code, restrict your use of HTML to well-established tags and features. All the HTML covered here works in all the popular browsers.
An HTML document has a well-defined structure consisting of required and optional HTML elements.
An HTML tag consists of a tag name followed by an optional list of attributes, all enclosed in angle brackets (<...>). Tag names and attribute names are not case sensitive and cannot contain a space, tab, or return character. Most HTML elements are marked by a pair of tags: a start tag and an end tag, with the element's content between them. The end tag repeats the tag name, preceded by a forward slash, without any attributes. For example, an HTML document begins with <HTML> and ends with </HTML>.
Tags must be properly nested: you must close the most recently opened tag before closing any tag opened before it. For example, <B><I>text</I></B> is correct, but <B><I>text</B></I> is not. Apart from this restriction, the layout is completely free format. Indentation can be used to aid readability but is not required.
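The nesting rule can be checked mechanically with a stack: each start tag is pushed, and each end tag must match (and pop) the tag on top of the stack. The following is a minimal sketch, not a real HTML parser; the class name TagNestingCheck is hypothetical, and the set of tags treated as having no end tag (such as <BR> and <IMG>, described later in this section) is an assumed, non-exhaustive list. Because every other tag is treated as requiring an end tag, the check is stricter than HTML itself, which lets you omit some end tags such as </P>. Tag names are compared case-insensitively, matching the rule above.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagNestingCheck {
    // Assumed list of tags that never take an end tag (not exhaustive).
    private static final Set<String> NO_END_TAG =
            new HashSet<>(Arrays.asList("BR", "IMG", "HR", "INPUT", "LINK", "META"));

    // Matches a start or end tag; group 1 is the optional slash, group 2 the tag name.
    private static final Pattern TAG = Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>");

    static boolean isProperlyNested(String html) {
        Deque<String> open = new ArrayDeque<>();
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            // Tag names are not case sensitive, so normalize them to upper case.
            String name = m.group(2).toUpperCase(Locale.ROOT);
            boolean isEndTag = !m.group(1).isEmpty();
            if (!isEndTag) {
                if (!NO_END_TAG.contains(name)) {
                    open.push(name); // remember the tag until its end tag appears
                }
            } else if (open.isEmpty() || !open.pop().equals(name)) {
                // An end tag must close the most recently opened tag.
                return false;
            }
        }
        // Every start tag must have been closed.
        return open.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(isProperlyNested("<B><I>bold italic</I></B>"));   // true
        System.out.println(isProperlyNested("<B><I>bold italic</B></I>"));   // false
        System.out.println(isProperlyNested("<p>line one<BR>line two</P>")); // true
    }
}
```

A stack fits here because the rule "close the most recent tag first" is exactly last-in, first-out ordering.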
Each HTML document has an optional HEAD and a BODY. The HEAD is where you pass information about the document to the browser; text in the HEAD is not displayed as part of the document's content. The BODY contains the tags and text that define the document's content. Listing 12.3 shows a well-formed (if rather basic) HTML document; Figure 12.2 shows its output as displayed in Microsoft Internet Explorer.
<HTML>
  <HEAD>
    <TITLE>My Very First HTML Document</TITLE>
  </HEAD>
  <BODY>
    <H1>Here is a H1 header</H1>
    <P>and here is some text – hopefully it looks different from the header</P>
  </BODY>
</HTML>
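In servlet code, a page like Listing 12.3 is typically produced by writing the markup line by line to a PrintWriter; in a real servlet that writer comes from the response object's getWriter() method. The sketch below substitutes a StringWriter for the servlet response so that it runs as an ordinary Java program. The class and method names (FirstPage, writePage, render) are hypothetical.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class FirstPage {
    // Writes the document from Listing 12.3, one element per line with
    // indentation that mirrors the nesting of the tags.
    static void writePage(PrintWriter out) {
        out.println("<HTML>");
        out.println("  <HEAD>");
        out.println("    <TITLE>My Very First HTML Document</TITLE>");
        out.println("  </HEAD>");
        out.println("  <BODY>");
        out.println("    <H1>Here is a H1 header</H1>");
        // \u2013 is the en dash used in the listing's paragraph text.
        out.println("    <P>and here is some text \u2013 hopefully it looks different from the header</P>");
        out.println("  </BODY>");
        out.println("</HTML>");
    }

    // Stand-in for a servlet response: collects the output in a String.
    static String render() {
        StringWriter buffer = new StringWriter();
        try (PrintWriter out = new PrintWriter(buffer)) {
            writePage(out);
        } // closing the PrintWriter flushes everything into the buffer
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.print(render());
    }
}
```

Keeping writePage independent of where the output goes means the same method could be passed the writer from a real servlet response unchanged.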
All tags have a name, and some tags may also have one or more attributes that add extra information. Modern style is to always enclose attribute values in single or double quotes, as XML documents require (see Day 16, “Integrating XML with J2EE”). Unlike XML, HTML allows simple attribute values (those not containing spaces or other special characters) to be left unquoted. For example, <P ALIGN=center> is legal HTML, but <P ALIGN="center"> is the preferred form.
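When servlet code builds tags from data, it is easiest to always quote attribute values and to replace any character that could end the value or be mistaken for markup. The following is a minimal sketch of such a helper; the class and method names (Attributes, attribute) are hypothetical. It uses the standard HTML entities &amp;, &quot;, &lt;, and &gt;, and escapes one character at a time so that an already-produced entity is never escaped a second time.

```java
class Attributes {
    // Returns ' NAME="value"' with the value always double-quoted and the
    // characters that could end the value or start markup replaced by entities.
    static String attribute(String name, String value) {
        StringBuilder sb = new StringBuilder(" ").append(name).append("=\"");
        for (char c : value.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;  // '&' would otherwise start an entity
                case '"': sb.append("&quot;"); break; // '"' would otherwise end the value
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default:  sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        // Prints: <IMG SRC="logo.gif" ALT="Tom &amp; Jerry say &quot;hi&quot;">
        System.out.println("<IMG" + attribute("SRC", "logo.gif")
                + attribute("ALT", "Tom & Jerry say \"hi\"") + ">");
    }
}
```

Quoting every value, even simple ones such as logo.gif, keeps the generated markup consistent with the XML-friendly style described above.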
A little confusingly, a few common HTML tags do not come in pairs because the elements they mark have no content and therefore no end tag. These include <IMG>, which inserts a graphic image, and <BR>, which causes a line break.
Table 12.3 lists HTML tags and attributes that can be used to format a simple HTML document. Only tags from this list are used in today's lesson. The table is not a full list of HTML tags, nor does it show every attribute each tag supports. For a definitive list, see the latest HTML specification, available from www.w3.org.
Listing 12.4 is an HTML document that illustrates the use of some of these tags. It contains an input form with a button and displays data in a table.
NOTE
Many of the examples in this and subsequent chapters use a <LINK> element to associate the Web page with a cascading stylesheet (CSS) that defines its formatting requirements. In theory, different browsers should display Web pages formatted using a stylesheet in the same manner, but in practice there are usually minor differences. You do not need to know how stylesheets work in order to understand the basic principles of J2EE Web components, such as servlets, so they are not discussed in detail here.
The output of the page in Listing 12.4 from Microsoft's Internet Explorer version 6 is shown in Figure 12.3. The page will look similar, but not necessarily exactly the same, in other browsers.
This completes the brief discussion of HTML. For more information, see the latest standard and related documents on the W3C Web site (http://www.w3.org).