CHAPTER THREE: HTML for Content Structure

Chapter opener image: © Godruma/ShutterStock

Overview and Objectives

The time has now come to learn the essentials of the latest widely used incarnation of the HyperText Markup Language, HTML5. This language will not only help you create the structure of most of the web pages you see every day, but it will also provide the foundation for many of the technologies you will be using later on in this book. Our immediate goal will be to extend our simple website by incorporating some commonly used HTML markup, adding more pages, and introducing the notion of web page validation. Then we will show how to extract the common parts of our web pages and place them in one location, from there to be included everywhere they are needed, thus allowing us to make changes in one place that will “show up everywhere”.

So, in this chapter we will discuss the following topics:

  • A brief history of HTML

  • The importance of maintaining both a conceptual and physical separation of the structure and presentation of our web page content

  • HTML tags and elements

  • The basic structure of every web page

  • The DOCTYPE declaration and web page validity

  • Some basic HTML markup, including the head, title, and body of a page, as well as headings, paragraphs, line breaks, tables, images, comments, tag attributes, and HTML entities

  • Multipage websites and hyperlinks connecting the various pages

  • A mechanism for inclusion of common material in several different website documents

  • The new HTML5 semantic elements and how they got their names

3.1 The Long Road to HTML5, the New Norm

It has been a long and tortuous journey getting to HTML5, the current standard for web page markup, and the journey is not yet over. The first documents on the World Wide Web were created using the original, and very primitive, version of the HyperText Markup Language (HTML). The history of HTML goes back to 1980, when Tim Berners-Lee, a physicist, began to create a system for sharing documents among fellow physicists. The first specification for HTML was eventually published in 1991 by Berners-Lee, and HTML subsequently evolved through several standards, until its development was essentially “frozen” at version 4.01. Then along came XHTML, which was essentially HTML rewritten to comply with the much stricter standards of XML (see Chapter 11), and for a while this became a commonly accepted standard for use by all web browsers and developers. XHTML is very similar to HTML, and many people regarded it as an improvement over HTML because it required developers to be more careful and consistent when writing markup code. Unfortunately, as time went on the web community decided it wanted less strict standards, rather than more, so further development of the XHTML standard stalled, and work on XHTML 2 was eventually halted altogether, in late 2009. See TABLE 3.1 for a brief summary of HTML and XHTML history.

TABLE 3.1 Some major milestones in the “tortured” history of HTML.

Date          | Version   | Notes
1991          | original  | Tim Berners-Lee first publicly describes HTML.
1995          | HTML 2.0  | First standard.
1996          |           | W3C assumes control of HTML specs.
January 1997  | HTML 3.2  | First version developed and standardized by W3C.
December 1997 | HTML 4.0  |
December 1999 | HTML 4.01 | HTML “frozen” at this version; everyone supposed to start using XHTML.
January 2000  | XHTML 1.0 | This was HTML rewritten as an XML (more strict) application.
May 2001      | XHTML 1.1 | Included some relatively minor changes to XHTML 1.0.
January 2008  | HTML5     | HTML5 published as a “Working Draft” by the W3C.
2009          | XHTML 2   | Work discontinued on XHTML.
May 2011      | HTML5     | W3C establishes clear milestones for HTML5.
October 2014  | HTML5     | W3C declares HTML5 standard complete (which does not mean that every browser, or even any browser, has implemented all features).

Although XHTML is no longer the standard that you should follow, it had many “rules” that you can still use as “guidelines” when you write your HTML markup, and doing so will result in much “better” web pages. We will provide more details on this approach later, and we recommend that all new web page markup code (including yours) be written using the guidelines we will present. And if, by chance, you have already written some XHTML markup using its much stricter rules, there is virtually nothing you will have to “unlearn”.

3.2 A Very Important Distinction: Structure vs. Presentation

The purpose of HTML is to describe the structure of a web page. That is, HTML is used to indicate which parts of the page content should be treated as headings, which should be ordinary paragraphs of text, and what should be placed in the rows and columns of a table, for example. Note that this notion of structure does not include any mention of such things as what fonts should be used for the various items of text, nor in what color that text should appear. These are aspects of the presentation of the page content, and you will find it convenient to keep these two aspects of each web page separate in your mind, as well as in the actual documents that make up your web pages. For one thing, doing so will make it much easier for you to change the “look and feel” of your page design, should you decide to do so.

Unfortunately, during the early days of web development, browser developers got impatient and “jumped the gun” by inventing HTML tags such as <font> to deal with presentational aspects of their web pages. This was not always done consistently from browser to browser, which was one problem, and it was eventually recognized as a bad idea in any case. The solution was to describe the presentational aspects of a web page using Cascading Style Sheets (CSS), which we will discuss in the next chapter.
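To make the distinction concrete, here is a small sketch contrasting the two approaches. The sentence being marked up, and the particular font and color values, are made up for illustration and are not taken from any of our sample files.

```html
<!-- Presentational markup of the kind now discouraged: the obsolete
     font element mixes appearance into the page structure. -->
<p><font face="Arial" color="green">Welcome to our store!</font></p>

<!-- Structural markup only: the p element says what the text is,
     and its appearance is left to a style sheet. -->
<p>Welcome to our store!</p>
```

In the second version, nothing in the markup would need to change if we later decided on a different font or color.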

3.3 HTML Tags and Elements

An HTML document, such as the one shown in FIGURE 3.1, contains nothing but text. In fact, it will contain “ordinary text” that you want to display on your web page, as well as markup (i.e., “formatting commands” or “structure-indicating commands”) that describes the structure of your document. This markup is specified using HTML tags. An HTML tag is a single letter or “keyword” enclosed in angle brackets < >, as in <html> or <p>. This “keyword” may or may not be an actual “word”. More often than not it is just some mnemonic sequence of one or more alphanumeric characters. When referring to an HTML tag, we may or may not include the angle brackets, depending on context. For example, we may refer to “the <p> tag”, or simply “the p tag”.

Usually, but not always, HTML tags come in pairs. Each pair encloses the content to which that pair applies, and the tag pair together with its content is called an HTML element. Thus we can also refer to a “p element” and mean some particular <p>...</p> tag pair and its content.

Often it is convenient to think of the opening and closing tags of the element as indicating the beginning and end of a formatting instruction for the enclosed text. Both tags in the pair use the same keyword. However, the tag that indicates the end of the element has a forward slash (/) before its keyword.

FIGURE 3.1 ch03/first.html

The HTML markup for the first version of a single-file website for Nature’s Source.

For example, we have a tag <html> that indicates the beginning of an HTML document. It is paired with the tag </html> that indicates the end of the HTML document. Everything included in this pair is the HTML document, or, looked at another way, the entire HTML document is an html element. Similarly, a <p> tag indicates the beginning of a paragraph, the next </p> tag indicates the end of that paragraph, and the two tags, together with the actual text of the paragraph between them, comprise a p element.

HTML is not case-sensitive, so any HTML tag may contain all lowercase letters, all uppercase letters, or a mixture of both, but we (and most developers) recommend using all lowercase. This is one example of an XHTML rule that we recommend as an HTML guideline.
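The following short sketch, which uses made-up paragraph text, shows a single p element and then the same kind of element written with uppercase tags. A browser should treat the two in the same way, but only the first follows our recommendation.

```html
<!-- One p element: opening tag, content, closing tag. -->
<p>This sentence is the content of the paragraph.</p>

<!-- Also accepted by browsers, since tag names are not case-sensitive,
     but not the style we recommend. -->
<P>This sentence is the content of another paragraph.</P>
```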

An HTML document file will typically have an extension of either .html or .htm. Thus our first HTML document file (shown in Figure 3.1) is called first.html. Note that the content of first.html is just the content of first.txt from Chapter 2 with the addition of some HTML tags. Note as well that we have for the moment deliberately left the text formatted as it was in first.txt, and have placed all HTML tags at the left margin, except for the closing </title> tag.

With our next example we shall begin paying more attention to formatting to make our HTML more readable (to human readers).

3.4 The Basic Structure of Every Web Page

Figure 3.1 gives us some idea of what goes into the making of an HTML document. Although it is not strictly necessary, it is a very good idea (what we might call a “best practice”) to make sure that every HTML document has the following general form:

<html>
  <head>
    <title> ... </title>
  </head>
  <body>
    ...
  </body>
</html>

Here the outermost <html>...</html> tag pair indicates the nature of the entire document. In this “skeletal” code we show as well, for the first time, the indentation level (two spaces) we will use to help make our code more readable.

Note that each HTML document should be separated into a head element and a body element by the <head>...</head> and <body>...</body> tag pairs. The information enclosed in the <head>...</head> tag pair is general information about the web page that is not shown in the display area of the browser window, while the information enclosed in the <body>...</body> tag pair is the actual page content and is shown in the display area of the browser window.
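As an illustration, here is the same skeleton with its placeholders filled in. The title and paragraph text are hypothetical and are not copied from first.html.

```html
<html>
  <head>
    <!-- General information about the page; not shown in the display area. -->
    <title>A Sample Page Title</title>
  </head>
  <body>
    <!-- The actual page content; shown in the display area. -->
    <p>This text appears in the browser window.</p>
  </body>
</html>
```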

Before continuing the discussion, let’s take a moment to look at FIGURE 3.2, which shows how first.html will look in a web browser.

FIGURE 3.2 graphics/ch03/displayFirstHtml.jpg

A display of ch03/first.html in the Firefox browser.

Notice that the top bar (the “title bar”) of the browser window shows the name of the browser itself, and, depending on your browser and settings, it may show the “title” of the web page. This title is the one specified using the tag pair <title>...</title> that appears inside the <head>...</head> tag pair (in other words, it’s the content of the title element).

Although the <body>...</body> tag pair contains the content that is displayed in the browser window, note that all the “formatting” we were so careful to insert in our textfile first.txt (see Figure 2.1) has disappeared. That formatting was preserved when the file was displayed in the browser as a simple textfile, as shown in Figure 2.2, but it has been lost now that the file is being viewed by the browser as an HTML file. The reason for this is that there is a different MIME type at play here (text/html rather than text/plain, because of the .html file extension), and the browser has its own ideas about how the content of such a file, containing ordinary text along with some markup, should be formatted. We can now see that this is different from how it thinks plain text, in a file with a .txt extension, should be formatted.

This is why we need to tell the browser how it should view the “structure” of our file, which we do using appropriate tags. In fact, we look at how to structure the content in the body of first.html in the next section.

3.5 Some Basic Markup: Headings, Paragraphs, Line Breaks, and Lists

The body of our first HTML document, shown in Figure 3.1, contains just two HTML elements.

The first tag pair, <h1>...</h1>, is used to specify a first-level heading element. The number 1 that follows h indicates the heading level. An h1 heading provides the largest display of its content, while h2 provides the second largest display of its content, and so on, down to h6.
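A brief sketch of the heading levels follows. The heading text is invented for illustration, and the relative sizes you see will depend on your browser's default styles.

```html
<h1>Main heading of the page</h1>
<h2>Heading of a major section</h2>
<h3>Heading of a subsection</h3>
<!-- h4 and h5 follow the same pattern. -->
<h6>Lowest-level heading</h6>
```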

The second tag pair in our first HTML document is <p>...</p>. The <p> opening tag indicates the beginning of a paragraph and the </p> closing tag marks the end of that paragraph. Even though we have typed the contents of the paragraph in the file first.html in Figure 3.1 with a format that seems reasonable, its display shown in Figure 3.2 completely ignores the formatting specified using our manually inserted line breaks and indentations. The paragraph appears as “running text”. If we want to break lines and provide an indented and itemized list, we will have to insert additional tags to indicate the desired structure.
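You can see this behavior for yourself with a small experiment along the following lines. The paragraph text is made up and does not come from first.html.

```html
<p>
Line breaks    and extra   spaces
      typed into the source
are collapsed by the browser.
</p>
```

When this paragraph is displayed, the browser treats each run of spaces and line breaks as a single space, so the text appears as one continuous sentence that wraps only where the browser window requires it.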

In the ch03 subdirectory of the book’s website files you will find a version of the file modified in just this way. It is called second.html and contains the required additional formatting tags. The file itself is shown in FIGURE 3.3. Its display in a web browser appears in FIGURE 3.4.

The first thing to note about Figure 3.3 is the first line, which illustrates an HTML comment. Comments in HTML have the following syntax:

<!-- Text of the comment goes here. -->

Such comments can be single-line, multi-line, or end-of-line, but cannot be nested. In other words, one comment cannot be placed inside another.
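The three placements mentioned above might look like this in practice (the comment text and the paragraph are, of course, only examples):

```html
<!-- A single-line comment. -->

<!-- A comment that
     spans several lines. -->

<p>Some text.</p> <!-- An end-of-line comment. -->
```

If you did try to place one comment inside another, the first --> encountered would end the comment, and whatever followed would be treated as ordinary page content.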

FIGURE 3.3 ch03/second.html

The HTML markup for the second version of a single-file website for Nature’s Source.

FIGURE 3.4 graphics/ch03/displaySecondHtml.jpg

A display of ch03/second.html in the Firefox browser.

The second thing to note about the file is that it has been formatted using our particular formatting choices, which we will attempt to keep consistent throughout the text. There are many different formatting styles you could use for HTML “source code” (a term you quite often hear, but since it’s not really “code” in the usual sense, a much better term is HTML markup). Ours is, of course, just one of those styles. It is a reasonable one, however, and you could do worse than imitate it. The main features of our formatting style include an indentation level of two spaces, and the fact that when one element is nested inside another, the inner element is indented a further level with respect to the outer. Most decent editors and IDEs can be configured to perform this sort of formatting automatically, but no matter what software you use you often need to do a little format-tweaking after the fact to get things just the way you want them. Whatever style you use, or are required to use, be sure to use it consistently.

You also see three new HTML tags in second.html. The first of these new tags is br (line 11), which is used to insert a line break into the text. This <br> element is different from previous elements you have seen. It does not come as a tag pair, and has no closing tag. Elements like this are also called empty elements (or void elements, in HTML5 terminology), since they have no content.

The second new tag is ul. The <ul>...</ul> pair (lines 13 and 18) lets us create an unordered list by enclosing all the items in our list. Such a list is one in which the list items are not numbered; instead, each list item is preceded by the same symbol, which is often a “bullet” by default. Every item in the list is, in turn, enclosed within a <li>...</li> tag pair, which illustrates li, the third new tag.
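Here is a minimal sketch that uses all three of these tags together. The text is hypothetical, and the line numbers cited above refer to second.html, not to this fragment.

```html
<p>
  First line of an address<br>
  Second line of the same address
</p>
<ul>
  <li>First item</li>
  <li>Second item</li>
  <li>Third item</li>
</ul>
```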

As you can see in Figure 3.4, the use of <h1>, <p>, <br>, <ul>, and <li> makes it possible for us to create a reasonably formatted web page, which is already much improved over the display of plain text.

But there is more to a web page than well-formatted text. We will soon introduce more HTML tags by adding to our web page a table with rows and columns, as well as images. But before doing that we need to discuss a very important aspect of web development: how to ensure that our web pages are valid, and why we should do so.

3.6 What Does It Mean for a Web Page to Be Valid?

So far we have been creating web pages more or less “on the fly”, trying as we go to show you some useful HTML features in context. You may have begun to wonder if we know what we’re talking about. Our pages do show up in browser windows and have a reasonable appearance, but how confident does that make you feel?

Historically, browsers have been written to be very “forgiving” when displaying web pages. In other words, even if web designers were not very careful when designing their web pages, and didn’t follow all the rules, a browser would often still be able to do a reasonable job of displaying such pages. That is one reason why the web contains so many very badly constructed web pages: Folks could be sloppy and get away with it; they were and they did.

In any case, you have a right to expect that a browser will be able to display any one of your web pages, so long as it is a valid web page. But what exactly does this mean? Simply put, it means that if you are using HTML to mark up your web pages, then you should “do the right thing” when creating those pages. Thus a web developer should know what the rules and guidelines are, and should try to follow them.

This is not as bad or onerous as it may seem at first. As we have already mentioned, HTML5 is much less strict than XHTML, but we think it is still worthwhile to follow some XHTML rules, even if they are not, strictly speaking, “rules” in HTML5. We will not provide all the rules we eventually want to use at this point, just a short but useful subset to get us started:

  1. Use lowercase for all tag names (required for XHTML, recommended for HTML5).

  2. Use both opening and closing tags for all elements with content (required for XHTML, recommended for HTML5).

  3. Make sure tag pairs are properly nested (required for XHTML, and just a really good idea in any case).

These are three guidelines that we recommend you follow when writing your HTML markup. There are many ways in which you could not follow these guidelines, and still have a web page that displays fine and even validates as HTML5, but adherence to the guidelines will make your markup much more readable, modifiable, and thus maintainable.
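To see the difference, compare the two versions of the same list in the sketch below (the list items are made up). As far as we are aware, both are acceptable to an HTML5 validator, since HTML5 allows uppercase tag names and permits the closing </li> tag to be omitted, but only the second follows our guidelines.

```html
<!-- Permitted by HTML5, but harder to read and maintain. -->
<UL>
  <LI>First item
  <LI>Second item
</UL>

<!-- The same list, following the three guidelines. -->
<ul>
  <li>First item</li>
  <li>Second item</li>
</ul>
```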

The last of the above guidelines, the “proper nesting of tags”, is one we have in fact followed, but not mentioned explicitly. It’s easiest to show what this means by giving an example where tags are not properly nested. Suppose we had created a file in which the tag order was like this:

<head>
<title>
...
</head>
</title>

Then we would be violating this rule because the title element must be completely contained within the head element. In general, if the opening tag <tag2> follows the opening tag <tag1>, then the closing tag </tag2> must precede the closing tag </tag1>. This applies whether the elements in question are contained in either the head element or the body element of the page.
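For comparison, here is the same head fragment with the tags properly nested, followed by a small illustrative example (with made-up content) of the same rule applied within the body:

```html
<!-- Properly nested: title opens after head and closes before it. -->
<head>
  <title> ... </title>
</head>

<!-- The same rule in the body: each li closes before its enclosing ul. -->
<ul>
  <li>An item</li>
</ul>
```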

We will say more about HTML rules and guidelines as time goes on, but for now you will be safe (and your pages will be valid) if you simply emulate the practices you observe in our sample files.

3.7 How Can We Determine if a Web Page Is Valid?

It is one thing to feel confident that you have followed all the necessary rules for inserting HTML markup code into your web page files, but is there a way you can be certain you have done so?

The answer, fortunately, is yes. You can submit your file to an online validator, of which there are several. To do this you browse to the online validator site, enter the URL of the web page you wish to validate into the validator, and click a button that starts the validation process. The validator will then provide a feedback report telling you either that your page is fine, or that you have violated one or more of the rules.

FIGURE 3.5 ch03/third.html

The HTML markup for the third version of a single-file website for Nature’s Source. Same content as second.html, but new or modified lines 1, 3, and 5 now contain information that a validator will use during the validation process.

Since online validators are usually capable of validating not only HTML5 web pages, but also very early and more recent versions of HTML, as well as XHTML, it is necessary to tell a validator what version of HTML (or XHTML) has been used in the construction of the web page you are validating, as well as some information about the encoding scheme (character set) you are using. To do this you need to add some informational lines at the beginning of your file. In particular, although not everything we list here may be strictly necessary, we recommend that you do each of the following:

  1. Add a DOCTYPE declaration to indicate the version of the markup language being used. This is actually very important, since it tells the validator what markup version you’re using, information it will need to perform a validation. (See the first line of FIGURE 3.5, which shows the DOCTYPE for HTML5.)

  2. Modify the opening html tag by giving it a lang attribute that indicates what (natural) language the web page is using. (See line 3 of Figure 3.5, which indicates that the language of our web page is English. Tag attributes will be discussed later.)

  3. Place a meta element within the document head element to indicate the encoding scheme being used. (See line 5 of Figure 3.5, which indicates that we are using the utf-8 character set, a very commonly used web page standard.)

This may sound a little scary, but all of this is much simpler now with HTML5 than it used to be with XHTML, and fortunately you can easily turn it into a ritual, then stop thinking about it. The file third.html is a copy of second.html modified to contain this information for the validator. Figure 3.5 shows this file, which now contains all of the information mentioned above.
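Put together, the three additions give a document that begins along the following lines. This is only a sketch of the general pattern: the line positions here need not match those of Figure 3.5, and the title and body are left as placeholders.

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title> ... </title>
  </head>
  <body>
    ...
  </body>
</html>
```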

Having to do this for each file that you wish to have validated is the bad news. The good news is that you can treat the lines containing all of this additional information as “boilerplate” markup that you simply copy and insert into any new file you are creating. Then, as long as you “follow the guidelines” in the rest of the document (for the kind of document you said it would be, in our case HTML5), that document should validate without any problem.

Now, once you have prepared your document and inserted the validation information, how do you actually validate it? There are a number of validation sites on the Internet, including this one:

http://validator.w3.org

This site is maintained by the World Wide Web Consortium itself and is very easy to use. At the time of this writing this site seemed to be in transition, so you might find a recommendation to go elsewhere if you use this site. Not to worry. You will always be able to find a site or a tool to validate your web pages, and you should always do so.

To use the above site for validating one of your web pages, first go to the site and enter the URL of the web page you wish to have validated into the window provided for that purpose. Then click on the button marked Check. You will then either be told that your page is valid, or be given an itemized report on the ways it violates the specifications of the document type (DOCTYPE) against which it was validated.

For example, FIGURE 3.6 shows a (partial) validator display just as the file third.html is about to be validated, and in FIGURE 3.7 we see a display (again partial) of the validator showing a successful validation of that file.

3.8 Validating with the Firefox Web Developer Add-on

As you have just seen, it is possible to perform web page validation just by entering the URL of the page to be validated into the validator. However, given the frequency with which you should be validating your pages during the development process, this is a lot of effort that you do not want to repeat as often as would be necessary.

To our rescue comes the Web Developer add-on tool for Firefox. Notice we say it comes to “our” rescue, and it can come to your rescue as well, if you are using Firefox as your browser of choice. However, all browsers have either built-in features, or available add-ons, that can be very helpful to web developers. We will describe how we use the Firefox Web Developer add-on to help us quickly validate web pages, but the tool can be used for many other things as well, and you should explore the possibilities if you install it.

FIGURE 3.6 graphics/ch03/displayThirdHtmlToValidate.jpg

A Firefox browser display just before clicking the Check button to validate the file ch03/third.html.

First, you do need to install it, and we should say here we are referring to the Web Developer Firefox add-on available from chrispederick.com. This add-on is also available for the Chrome and Opera browsers, but in our judgement it is more convenient to use in Firefox. Once you have it installed and activated you have an extensive toolbar extending across your browser window underneath your address bar. One of the menu items is Tools, and clicking on it reveals a dropdown list of choices, one of which is Validate HTML. Clicking on that submenu option will cause the current page to be validated at the W3C site discussed earlier. Note that there is also a keystroke shortcut for this operation, namely Ctrl+Shift+V. The Web Developer add-on about to perform a validation is illustrated in FIGURE 3.8.

FIGURE 3.7 graphics/ch03/displayThirdHtmlValidated.jpg

A Firefox browser display showing a successful validation of the file ch03/third.html.

By the way, the keystroke combination Ctrl+Shift+V may not be built in, but it (or another keystroke combination) can be set under the Web Developer’s Options menu if no keystroke combination is defined for this action when you install the add-on. Click first on Options, then Options . . ., finally Tools, where you can click on the Validate option you want to edit. In addition to editing (or setting) the keystroke shortcut, you can also choose the site you want to use for validating, if you wish. Try to choose a keystroke combination that is not used for anything else in your local context.

FIGURE 3.8 graphics/ch03/WebDeveloper.jpg

A Firefox browser display showing the activated Web Developer toolbar and the Tools dropdown menu about to validate the file ch03/third.html.

Also, do not confuse the Validate HTML option with the Validate Local HTML option. The former is for validation of a page that has been loaded into your browser from a web server, while the latter (but not the former) can be used if you have simply opened the web page file from a directory on your personal computer.

3.9 Tables, Images, and Tag Attributes

The images and other multimedia files that appear on web pages are in fact stored in separate files and embedded into the appropriate web pages during the display process. When dealing with several such files, it is important to set up certain conventions for file organization. We have created a subdirectory called nature1 under the directory ch03 of our website files, which contains the web page and related files used in this section. If you look in that directory, you will notice that there are two items in the directory, a file called index.html and a subdirectory called images. The filename index.html has a special meaning. We are telling the web server that users should access the contents of the directory nature1 through the file index.html. Let us find out what happens when we try to access the directory nature1 on our website with this URL:

http://cs.smu.ca/webbook2e/ch03/nature1/

If you enter the above URL into your web browser, the web server will return, and your browser will display, the web page produced by the file index.html from the directory ch03/nature1. This file is shown in FIGURE 3.9. Note that we do not have to specify the filename index.html, although we could do so. This is because the Apache web server looks for a file with this name “by default” when you browse to a directory. In fact, most any web server can be configured to look for a specific file (or for several different specific files) when the user browses to a directory, and though often the name of such a file is specified to be index.html, it may therefore be something else, such as home.htm, or even default.asp.

FIGURE 3.9 ch03/nature1/index.html

The HTML markup for the home page of this chapter’s first version of a multi-file website for Nature’s Source, using a simple (and temporary) table layout, and containing links to two image files.

FIGURE 3.10 graphics/ch03/nature1/displayIndexHtml.jpg

A display of ch03/nature1/index.html in the Firefox browser. Photo: © AlexBrylov/iStockphoto

The actual browser display of this file is shown in FIGURE 3.10. You can also load this file by opening the file nature1/index.html in the ch03 folder, if you have copied the textbook files to your computer.

Let’s examine Figure 3.9 a little more closely to see what’s new in nature1/index.html. The display of this file in the web browser, as shown in Figure 3.10, shows that the contents of the web page are arranged in a table format. This table contains a total of four cells. The top-left cell contains the logo of our company. In the top-right cell, we have the address and other contact information for the company. General information about the company is in the bottom-left cell. Finally, an image that may convey what the company is all about is embedded in the bottom-right cell. For example, in our case, this (subliminal) message might be, “Use our products to get fit and enjoy the great outdoors!”

There are four new tags in the HTML markup of the file nature1/index.html, shown in Figure 3.9. Three of them are related to the table used to lay out the cells on our web page:

  1. The table element itself is specified by the <table>...</table> tag pair and the complete contents of the table (its rows and columns) are the content of this element.

  2. Each row of the table is specified by a <tr>...</tr> tag pair. Since this particular table has two rows, it has two tr elements.

  3. Similarly, each cell within a row is specified by a <td>...</td> tag pair, and you can think of td as standing for “table data”. In this table each row contains two cells (one for each column), so each row has two td elements “nested” within it.
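The three table-related tags fit together as in the following sketch of a table with two rows and two columns. The cell contents are placeholders and are not those of nature1/index.html.

```html
<table>
  <tr>
    <td>Row 1, column 1</td>
    <td>Row 1, column 2</td>
  </tr>
  <tr>
    <td>Row 2, column 1</td>
    <td>Row 2, column 2</td>
  </tr>
</table>
```

Note how the indentation mirrors the nesting: each td element sits inside a tr element, which in turn sits inside the table element.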

This is also the first of our HTML documents to display an image on its web page. This is done using an <img> element. This is another empty element, like the <br> element. Of course, if we are going to display an image that is contained in a file, we have to specify the name of the image file, and possibly the full path to that file, if it is not in the same directory as the page that uses it. We have placed the two images in a subdirectory called images. One of these images is our Nature’s Source logo, and the other is the fourth of a group of six business-related images we will use in a later chapter when we implement a “rotating image” feature on our website. It is common practice to put all the images to be used in a given context in a single directory. While the directory name can be anything, images seems like an appropriate name and is a popular convention among web developers.

Now, if the <img> tag is not a pair, obviously we can’t make the path to the file the content of the pair. So, what do we do? Well, now we get to see another aspect of HTML tags, the tag attribute. Study the img tag from lines 12 and 13 in the index.html file of Figure 3.9:

<img src="images/naturelogo.gif" alt="Nature's Source Logo" width="608" height="90">

This line shows four attributes of the img tag. These are src, alt, width, and height. Note the syntax: First, attributes always follow the tag name inside the opening tag (although in this case there is only an opening tag), and second, each attribute is followed by an equals sign (=) and then by its attribute value, which is enclosed in quotes.

Not every HTML tag attribute needs to have a value. But if an attribute does have a value, you may choose to enclose that value in quotes, or not, unless the value has a space in it, in which case the quotes are necessary. However, it’s just simpler and safer to quote attribute values always. Both single quotes and double quotes are permitted, but be consistent. We also recommend using all lowercase for tag attribute names, just as we did for tag names. This is another XHTML rule that we recommend following as an HTML5 guideline.
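The sketch below illustrates these points. The first tag is the one from our own file; the single-quoted variant with its shortened alt text, and the paragraph carrying the hidden attribute, are our own illustrations and do not appear in nature1/index.html.

```html
<!-- Double-quoted values, as used throughout this text. -->
<img src="images/naturelogo.gif" alt="Nature's Source Logo" width="608" height="90">

<!-- Single quotes are also permitted, but then the value itself
     cannot contain an unescaped single quote. -->
<img src='images/naturelogo.gif' alt='Company logo' width='608' height='90'>

<!-- An attribute with no value: hidden is a standard HTML5 attribute
     that is either present or absent. -->
<p hidden>This paragraph is not displayed.</p>
```

The first form is a good illustration of why consistent double quotes are the safer habit: the alt text contains an apostrophe, which would end a single-quoted value prematurely.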

The value of the src attribute of this img tag is the path to the file that contains the logo image we want to appear in the first column of the first row of our table, namely images/naturelogo.gif. In other words, this image is the content of the first td element in that row (lines 12–13). In a similar way, the image in the file images/outdoor4.jpg is made to appear in the second column of the second row (lines 40–41). The src attribute of the <img> tag is therefore mandatory, because the browser has to know where the image file is located. In this case it is in the images subdirectory, which itself is in the same location as the index.html file.

Another mandatory attribute for the img tag is alt, whose value specifies the text to be displayed in case the image itself cannot be displayed for any reason. An image might not be displayed for any number of reasons. For example, the image file may be inaccessible, the browser may not support graphics, or a user with a visual disability may have the graphical display turned off. The text from the alt attribute value may also appear in a popup text box when your mouse hovers over the image (or may not appear, depending on the browser).

By “mandatory” in this context, we mean that a web page containing an <img> tag will not validate as HTML5 if either attribute is omitted. The web page will still display if the alt attribute is omitted; it just won’t validate as HTML5. Of course, the image won’t display, nor will the page validate, if the src attribute is omitted.

The width and height attributes of the img tag give the dimensions of the image as we want it to appear on our page. The values of these two attributes are in pixels, a unit of “screen real estate” about which we will say more in the next chapter on CSS. For now just think of a pixel as a single point in an on-screen image. The word pixel is short for “picture element”, and its actual size depends on your screen resolution. This is generally the default unit in HTML whenever you do not specify the unit to be used for some measurement. The abbreviation for pixel is px, but the unit should be omitted in this context, as it is the assumed default and, in fact, must be omitted if the page is to validate. This is also new in HTML5.

The width and height attributes are not mandatory, but without them the browser will display the image using its actual size, which may not be what we want. The ideal situation is that an image is the exact size you want to display, so the browser does not have to do any extra work to “scale” the image to meet your requested dimensions. You should still tell the browser what those dimensions are, however, since this allows the browser to “set aside” exactly the right amount of screen space for the image, even before downloading it. If an image is not the size you want it to be for display on your web page, it is better to use an image editing program to crop or scale the image to the exact size (or something close to the exact size) you want, to improve the loading speed of your page.

Actually, most HTML tags can have attributes, and the four we have just discussed for the img tag are only one example. The reason we haven’t discussed attributes before is that, unlike two of the four we have seen for the img tag, most tag attributes are optional. All tags have “default behavior”, and attributes can be used to supply necessary or helpful information for the tag, or to alter tag behavior in some useful way.

3.10 HTML Entities

Another new HTML feature we see in nature1/index.html is the use of the special character code &amp; (see line 28 of Figure 3.9), which produces the ampersand character (&) when the web page is displayed.

Some characters have special meanings in HTML (the angle brackets enclosing tags, for example) and so if you simply want to display one of these characters you cannot just enter it into your file because it will be interpreted as having its special meaning rather than just appear as itself, thus causing the browser no end of confusion. Such special characters are often called metacharacters in general, and HTML entities in our current context.

You need a special code to tell the browser to display such a character. Each such code starts with an ampersand (&), ends with a semicolon (;), and in between has a word or character string indicating the character in question. The fact that the ampersand is used in this way is what makes the ampersand itself a metacharacter. A short list of commonly used HTML entities (metacharacters) is shown in TABLE 3.2.
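As a brief illustration of how such codes are used in practice, here is a small sketch. The paragraph text is invented for this example and does not come from our website files, but each entity used appears in Table 3.2.

```html
<!-- Displays as: Vitamins & Supplements -->
<p>Vitamins &amp; Supplements</p>

<!-- Displays as: A paragraph starts with <p> and ends with </p>. -->
<p>A paragraph starts with &lt;p&gt; and ends with &lt;/p&gt;.</p>

<!-- Displays as: Copyright © Nature's Source -->
<p>Copyright &copy; Nature's Source</p>
```

Without the entities in the second paragraph, the browser would try to treat the displayed tag names as real tags rather than show them as text.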

3.11 Adding More Web Pages to Our Site and Connecting Them with Hyperlinks

A fundamental characteristic of the World Wide Web is the ease with which you can navigate from one web page to another page, either on the same site or on another site anywhere in the world, just by clicking on a link to a second page that appears in the first page. We are now ready to develop a more elaborate website consisting of multiple pages, with links (or, more formally, hyperlinks) connecting them.

TABLE 3.2 HTML entities, also called “special characters” or “metacharacters”.

HTML Entity   Meaning                  Actual Symbol
&amp;         ampersand                &
&gt;          greater than             >
&lt;          less than                <
&divide;      divide                   ÷
&plusmn;      plus/minus               ±
&cent;        cent                     ¢
&euro;        euro                     €
&pound;       British pound sterling   £
&yen;         Japanese yen             ¥
&copy;        copyright                ©
&reg;         registered               ®
&trade;       trademark                ™

We have placed the files for this version of our website in a subdirectory called ch03/nature2. Once again the index.html file constitutes the “home page” of this simple website. The HTML markup for this version of our index.html file is shown in FIGURE 3.11, and the browser display of the file appears in FIGURE 3.12. The browser display is not fancy, but it shows a typical website structure for a company, with the company logo at the top, followed by a row of menu links, then the main content area, and finally a footer, which in this case contains two additional menu links.

In this new version of our index.html file we continue to use a table for layout, this time a table with four rows and five columns. Note that the first row (the logo row) and third row (the main content display area) contain the same content as the previous version of our index.html file, except that the contact information that appeared to the right of the logo has been removed and now appears on the page available under the Locations link, which you can reach, in turn, by clicking the Contact Us menu item on our “home page”.

FIGURE 3.11 ch03/nature2/index.html

The HTML markup for the home page of this chapter’s second version of a multi-file website for Nature’s Source, to which a main menu row and a footer row have been added.

FIGURE 3.12 graphics/ch03/nature2/displayIndexHtml.jpg

A display of ch03/nature2/index.html in the Firefox browser. Photo: © AlexBrylov/iStockphoto

3.11.1 A Menu of Hyperlinks

Row two (the main menu) and row four (the footer, with two additional menu items) are new to this version of our home page. The menu items we have chosen and placed in row two, and which will appear on each of the pages in this version of our website, include the following:

  • Home, which will always return the user to the home page, and therefore links to the index.html file, so that clicking it will again display the view shown in Figure 3.12. (Of course, if you are already at the home page of the website and click this link, you will see no change. If you are looking at any other page and click this link, you will be returned to the home page.)

  • e-store, which will eventually link to our complete e-commerce setup

  • Products+Services, which will take the user to information about the products and services provided by our business

  • Your Health, a business-specific link (Since the business we have chosen is related to health, it is only natural to have more information on that topic available via a link like this one.)

  • About Us, which takes the user to a page giving information about the company

The footer, which will also now appear on each of the pages of our site, contains some typical copyright information and two additional links that might be of interest to a site visitor: Contact Us and Site Map.

Begin your examination of the HTML markup in Figure 3.11 by looking first at the second row of the table (the second tr element), in lines 16 to 22. This second row is the only row that has all five columns (five td elements). Each of these columns contains one of our menu “links”, link being the usual shorter term for the more formal hyperlink.

Links are specified using the <a>...</a> tag pair. Each a tag, often called an anchor tag, has an href attribute, which an opening a tag must have if it is to act as a link. The value of the href attribute specifies which URL to open when a user clicks on the link. In our case, all menu item links are to files that reside in a subdirectory called pages that is located in the same directory as index.html. Note that the Home link refers to index.html itself, but each of the other links has the form pages/filename.html, which reflects the fact that each of the other files is in the subdirectory pages. We can have more elaborate values for href such as http://mypyramid.gov/ if we wish to link to some external site. The text between <a> and </a> is what is displayed in the browser. For example, with the HTML code

<a href="pages/estore.html">e-store</a>

the browser displays the text e-store as a link, and clicking on that e-store link takes us to the file pages/estore.html, which will then be displayed in the browser window, replacing the display of index.html.
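The three kinds of href value mentioned above can be compared in the following sketch. The link text for the external site is our own invention for this illustration; the other two links are the Home and e-store links already described.

```html
<!-- A file in the same directory as the page containing the link. -->
<a href="index.html">Home</a>

<!-- A file in the pages subdirectory (a relative path). -->
<a href="pages/estore.html">e-store</a>

<!-- An external site (a full URL); the link text here is hypothetical. -->
<a href="http://mypyramid.gov/">Nutrition information</a>
```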

Now take a look at the first row of the table (the first tr element), in lines 10 to 15. Even though the table has five columns, we only specify one column in this first row. This column actually “spans” all five columns that appear in the second row. This is accomplished by using the attribute colspan with a value of “5” for the td tag of this column, which specifies how many columns of the table this particular column (td element) is to span (or “extend over”).

In the third row (lines 23–43) we have only two td elements, the first of which spans three of the five table columns and the second of which spans the remaining two columns. In the fourth and final row (lines 44–49) we have three td elements, with the first one spanning the first three columns of that row, and the remaining two occupying one column each.

As you might guess, the td tag also has a rowspan attribute (which is not needed or illustrated here), in case we need to have a table cell span more than one row.
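To see colspan and rowspan working together, consider the following minimal sketch of a three-column table. This is not the markup of Figure 3.11, and the cell contents are placeholders chosen only to show which cells each td element occupies.

```html
<table>
  <tr>
    <!-- One td element that extends over all three columns. -->
    <td colspan="3">Logo row</td>
  </tr>
  <tr>
    <!-- This cell extends down into the next row as well. -->
    <td rowspan="2">Submenu</td>
    <td>Cell A</td>
    <td>Cell B</td>
  </tr>
  <tr>
    <!-- Column one is already taken by the rowspan cell above,
         so this cell covers the remaining two columns. -->
    <td colspan="2">Cell C</td>
  </tr>
</table>
```

In each row the spans add up to three columns once the cell carried down by rowspan is counted, which is a useful check to make whenever a table does not display as expected.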

3.11.2 Our Site Now Has Many Pages

There are quite a few other page files that belong to our nature2 website, and you will find them all in the pages subdirectory of ch03/nature2. In fact, there is a different page corresponding to each link on our home page, so all of the links on our home page are “live”. These pages, and others reachable by clicking on new links that appear on these other pages, form a kind of “skeleton” of the website we plan to develop. The links that appear on additional pages will appear in the first column of the main content area row, under the Home link of the main menu, and will lead to pages related in some way to that content area.

We will not examine all pages of this site in detail here, but we will look at two additional page documents that we can use to further our discussion. You should, however, take time to visit each of the pages of this version of our website, and note how the “look and feel” is maintained from one page to the next. Keep in mind that for the moment we are not concerned with the presentational style of our pages, but only their structure and (abbreviated) content as we try to get a feel for what our site will eventually provide for our users.

If you click on the About Us link on the home page, you will be taken to the page about.html, a display of which is shown in FIGURE 3.13. This is one of the pages with additional links of the kind mentioned above, which are seen to the left of the main content. These additional links provide, in effect, a submenu for the menu item About Us. In a later chapter of the text we will use JavaScript to create dropdown menus as a better alternative to this kind of submenu. All the HTML markup we have used in the file about.html is based on the concepts we have already discussed, and you should study the file to confirm this.

The final page document from this group that you will look at now is sitemap.html, which comes up if you click on the Site Map link at the bottom-right corner of any page on the site. Every website should have a site map, since users may not know how to find their way through a labyrinth of menus and submenus to find what they are looking for. A site map should give them a good, concise high-level view of the structure and content of the site.

The browser display of the file sitemap.html is shown in FIGURE 3.14, and we can see that this page can be viewed as having four “rows”, and that the top two rows of the web page, as well as the footer, are the same as all the other page displays on the site.

However, the third row of the markup, which you see in FIGURE 3.15, is different in that it contains a numbered (ordered) list, some of whose items contain a “nested” unordered list. Each item of a nested unordered list is marked by a small empty circle, rather than by the solid bullet character used as the item marker when an unordered list is not nested. These are browser defaults in action.

FIGURE 3.13 graphics/ch03/nature2/displayAboutHtml.jpg

A display of ch03/nature2/pages/about.html in the Firefox browser.

The ordered list is specified with the tag pair <ol>...</ol>. Each item in the ordered list is specified, in turn, with the tag pair <li>...</li>, just as we did previously for unordered lists. We nest an unordered list under four of the seven ordered list items using the <ul>...</ul> tag pair we discussed before.

Note that the row of the table containing the ordered list is in two columns (two td elements), with items 1 to 4 in the first column (the first td element) and items 5 to 7 in the second column (the second td element). This is actually achieved by having two ordered lists, with the second one having a start attribute with value “5” on its opening ol tag.
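The technique of splitting one numbered list across two table columns can be sketched as follows. This is a simplified illustration rather than the markup of Figure 3.15: the colspan values are omitted, and the item text is made up, with only a few of the seven items carrying a nested list.

```html
<tr>
  <td>
    <!-- First list: numbered 1 to 4 by default. -->
    <ol>
      <li>Home</li>
      <li>e-store</li>
      <li>Products+Services
        <!-- A nested unordered list belongs inside the li element. -->
        <ul>
          <li>First submenu page</li>
          <li>Second submenu page</li>
        </ul>
      </li>
      <li>Your Health</li>
    </ol>
  </td>
  <td>
    <!-- Second list: the start attribute makes numbering resume at 5. -->
    <ol start="5">
      <li>About Us</li>
      <li>Contact Us</li>
      <li>Site Map</li>
    </ol>
  </td>
</tr>
```

Note that the nested ul element is placed inside the li element it belongs to, before that item's closing tag, and not between two li elements.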

3.11.3 Beware the “Legacy Fix”!

You can also see another browser default in action on this sitemap page. Note that items 5 to 7 of the ordered list have some extra space above them. This is caused by the fact that whatever content is in a td element is, by default, vertically centered within that element. Hence the fact that items 5 to 7 of the ordered list in the second column of that row occupy less space than items 1 to 4 in the first column gives the observed effect. We could fix this with the valign attribute of the td tag, but this would cause our page to fail HTML5 validation, so once again we choose to wait till we have a CSS solution. You will encounter this kind of situation often as you develop web pages. Many of the “legacy solutions” to this sort of “problem” will still work if you choose to use them, but they will not validate. Because we always want to have valid pages, the advice is obvious: Avoid this kind of “legacy fix”!

FIGURE 3.14 graphics/ch03/nature2/displaySitemapHtml.jpg

A display of ch03/nature2/pages/sitemap.html in the Firefox browser.

FIGURE 3.15 ch03/nature2/pages/sitemap.html (partial)

The third row of the table element in this file. Note the two td elements with different colspan values, and the nested lists.

3.12 Using Server-Side Includes (SSI) to Make Common Markup Available to Multiple Documents

Note that the version of our Nature’s Source website discussed in this section will not display properly unless it is being “served” by an SSI-aware web server. In particular, it will not display properly if you have just copied the files to your computer and try to view the website from there.

This section discusses a very important principle to keep in mind when you are developing your website, and one that we can illustrate quite nicely by taking a closer look at our website in ch03/nature2, and how we transform it into the next version of our website, given in ch03/nature3.

3.12.1 The “Maintenance Nightmare” Problem

First, note that our ch03/nature2 website now contains 19 page files in our pages subdirectory, and they all contain quite a bit of information that is the same in each file. That is, every one of those page files of this version of the website, as well as the index file, contains four “rows” of information (the logo row, the menu row, the main content row, and the footer row), and only the main content row differs from page to page. Of course, the markup for all four rows appears in each file, even though that markup is exactly the same for the first, second, and fourth rows in each case.

If we carry on like this, we are potentially leaving ourselves wide open to a very serious problem down the road. Suppose that at some point we want to change the wording of a menu option or add a new menu item, or change one of the names in the footer. What does this mean? It means that every single one of those 20 files will have to be edited and modified. And what are the chances that all the changes that have to be made will be made correctly and consistently across all affected files? And then, if another change is required, we will have to do it all over again. If this happens, and it almost certainly will, you will have what is often called a maintenance nightmare on your hands, and it should be clear why it has this name.

3.12.2 Identifying and Extracting Common Markup

So, what to do? The central idea is that you want to eliminate, as far as possible, duplicate markup, so that if changes are necessary you only have to make them in one place. To solve this problem we are going to employ a very useful technology called Server-Side Includes (SSI), and also introduce a new HTML tag, the base tag. As part of this process, we also restructure the content of the nature2 version of our website to look like this for the nature3 version:

nature3
  index.html
  /images
    naturelogo.gif
    outdoor4.jpg
  /common
    document_head.html
    logo_row.html
    mainmenu_row.html
    footer_row.html
  /pages
    about.html
    sitemap.html
    (... and the other seventeen page files)

The most obvious, and major, difference between the directory structure of ch03/nature2 and that of ch03/nature3 is the appearance of a new subdirectory called common in ch03/nature3. This subdirectory contains these four files: document_head.html, logo_row.html, mainmenu_row.html, and footer_row.html. Each of these files contains markup that is common to the file index.html, as well as to the 19 other files in the pages subdirectory. The actual contents of these files are shown in FIGURES 3.17, 3.18, 3.19, and 3.20. With the common markup removed from all the files where it occurs and placed in these four files, the next question is this: How do we get that common markup into any file where it is needed before the file is sent to the browser for display?

3.12.3 Using SSI to Include Common Markup Where Needed

To answer this question, look at FIGURE 3.16, which shows the third version of our index.html file for this chapter. Note that lines 1, 5, 6, and 28 have a similar syntax. For example, here is line 1:

<!--#include virtual="common/document_head.html"-->

This line, whose syntax must be exactly as shown (and note that there is only a single blank space), causes the Apache server to include the contents of the file document_head.html into this latest version of our index.html file, in place of this line, before index.html is sent to the browser for display. This action of the server software, namely the “including” of the text of an external file on the server before sending it to the browser, is the reason for the terminology used to describe the process: SSI. Of course, each file in the pages subdirectory will also have its common markup sections replaced by the same four “virtual include” lines that we see in index.html.

The contents of the file document_head.html are shown in Figure 3.17, and as you can see from that figure, the file contains the initial lines of markup needed by index.html, as well as by each of the other pages in our pages subdirectory. This common piece of markup includes everything from the DOCTYPE declaration down to the end of the head element. It also includes the promised new HTML base element, which we will explain shortly.

FIGURE 3.16 ch03/nature3/index.html

The HTML markup for the home page of this chapter’s third version of a multi-file website for Nature’s Source, which now contains the “virtual include” directives that cause the markup common to all pages in this version of the site to be included from external files.

FIGURE 3.17 ch03/nature3/common/document_head.html

The file containing the initial HTML page markup, which is included in the file ch03/nature3/index.html, as well as in each file in the ch03/nature3/pages subdirectory.

FIGURE 3.18 ch03/nature3/common/logo_row.html

The file containing the company logo, which is included in the file ch03/nature3/index.html, as well as in each file in the ch03/nature3/pages subdirectory.

FIGURE 3.19 ch03/nature3/common/mainmenu_row.html

The file containing the main menu, which is included in the file ch03/nature3/index.html, as well as in each file in the ch03/nature3/pages subdirectory.

FIGURE 3.20 ch03/nature3/common/footer_row.html

The file containing the footer information, which is included in the file ch03/nature3/index.html, as well as in each file in the ch03/nature3/pages subdirectory.

In a similar way, the contents of the logo row, the main menu row, and the footer row of our previous index.html file (from ch03/nature2) are “included” (by SSI) into our ch03/nature3/index.html file by lines 5, 6, and 28 of that file.

By placing common content in four separate files in a subdirectory called common, and including their content in index.html at the appropriate places, we have accomplished what we set out to do. There is one possible “gotcha”, however, which we now point out. While in our index.html file the line that includes the document_head.html file is

<!--#include virtual="common/document_head.html"-->

the corresponding line in each of the files in the pages subdirectory is different:

<!--#include virtual="../common/document_head.html"-->

In the above line we need the ../ in front of common because the value of virtual must be the path to the file to be included, which in this case is the relative path from the pages subdirectory to the common subdirectory (we go up one level from pages to its parent, and then down into common).

Now each page document on our site can include all necessary markup from the four files in the common subdirectory, and if any changes need to be made to anything in any one of these four files, the changes need only be made in the files in that subdirectory, and those changes will appear in each page of the site the next time the page is displayed. The remaining content of each page document file on our website (the files in pages) is specific to that file.
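Putting these pieces together, a page file in the pages subdirectory might be organized along the following lines. This is only a sketch and not a copy of any of our actual files: it assumes that document_head.html supplies everything from the DOCTYPE declaration through the end of the head element (including the opening html tag), that the other three included files each supply one table row, and that the main content shown here is a placeholder.

```html
<!--#include virtual="../common/document_head.html"-->
<body>
  <table>
    <!--#include virtual="../common/logo_row.html"-->
    <!--#include virtual="../common/mainmenu_row.html"-->
    <tr>
      <!-- Only this row differs from one page to the next. -->
      <td colspan="5">Main content specific to this page goes here.</td>
    </tr>
    <!--#include virtual="../common/footer_row.html"-->
  </table>
</body>
</html>
```

The closing html tag has no matching opening tag in this file because, under the assumption stated above, the opening tag arrives with the included document_head.html. Viewed on its own, without an SSI-aware server, the file is therefore incomplete.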

3.12.4 One Thing Leads to Another: A Second Problem

However, we are not quite finished. When we solved this one problem by extracting all the common markup and localizing it in the common subdirectory, we inadvertently created another problem. The problem arises because, by default, when the href value of a link contains a relative path, that path is relative to the directory where the file containing the link is located. We can see what this problem is by looking at mainmenu_row.html in Figure 3.19, which contains the markup for our main menu. When that file is included in index.html, all the menu links will work fine, because all the menu links are relative to the nature3 directory, the same directory where index.html is located. But when that same file is included in any one of the files in the pages subdirectory of nature3, none of the menu links will work, because none of the menu links is relative to the pages subdirectory.

3.12.5 The base Tag Solves Our Second Problem

This is where we introduce the HTML base tag, which gets us out of this conundrum. Take a look at line 7 in the file document_head.html in Figure 3.17:

<base href="http://cs.smu.ca/~webbook2e/ch03/nature3/">

This line defines the href attribute for the base element of the website for our text. Once we have done this, every relative path in the links on our site will be appended to the value of the href attribute in our base tag before being used. This means (for example) that it doesn’t matter whether the file mainmenu_row.html has been included in index.html or in pages/about.html: when we click on the e-store menu link, we are activating the following link:

http://cs.smu.ca/~webbook2e/ch03/nature3/pages/estore.html

Note that all of the submenu links that appear on any of the page files in the pages subdirectory have to be modified in the same way as the main menu links that appear on all pages.

And remember this: If you have installed our textbook files on your own server, you will need to change the value of the href attribute of the base tag in this file (and in the corresponding file in later chapters) to the appropriate value for your local situation.
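The effect of the base tag can be seen in the following self-contained sketch. It is not the content of Figure 3.17: the title text and the lang and charset settings are assumptions made for this illustration, while the base URL is the one quoted above.

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Nature's Source</title>
    <!-- Every relative URL that follows is resolved against this value. -->
    <base href="http://cs.smu.ca/~webbook2e/ch03/nature3/">
  </head>
  <body>
    <!-- Resolves to http://cs.smu.ca/~webbook2e/ch03/nature3/pages/estore.html
         no matter which directory this document was served from. -->
    <a href="pages/estore.html">e-store</a>

    <!-- Image paths are affected in the same way:
         http://cs.smu.ca/~webbook2e/ch03/nature3/images/naturelogo.gif -->
    <img src="images/naturelogo.gif" alt="Nature's Source Logo" width="608" height="90">
  </body>
</html>
```

Keep in mind that the base tag is interpreted by the browser, whereas the “virtual include” lines are processed earlier, by the server. That is why the include lines in the pages subdirectory still need their ../ prefix even though the links do not.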

3.12.6 Our Revised Site Looks and Behaves Exactly Like the Previous Version

To summarize, our ch03/nature3 website is a revision of our ch03/nature2 website, in which each page on the site has been revised to include the markup common to each from four separate files. This, of course, is a “behind the scenes” effect, and if you start by displaying the index.html file for this version and then view any or all pages of the site you should see exactly what you saw in the previous version. We sometimes describe this kind of scenario as something that is “transparent to the user”.

3.13 The New HTML5 Semantic Elements

In this chapter we have introduced, discussed, and used a sufficient number of HTML elements and some of their attributes to allow us to create some simple but functional web pages. The HTML elements we have chosen to study so far could be described as “legacy” elements, in that they have always been available in HTML.

However, with HTML5 we have access to many new elements of various kinds. We will want to use some of these elements in the next chapter when we begin to discuss CSS, so we mention here some of the more useful ones. Before doing that, however, we need to say a bit more about some “legacy” HTML features and elements, which will help us to understand how some of these new HTML5 elements came to be.

3.13.1 Block-level Elements and Inline-level Elements

An important distinction to be aware of when you are placing HTML elements into your web documents is the difference between block-level elements and inline-level elements.

Block-level elements occupy their own “vertical space” on the page, and generally cause the browser to place extra space both before and after the element (how much space depends on the element and the browser). Examples of block-level elements that we have seen include the heading elements (h1, etc.) and the paragraph element p.

Inline elements, such as the img element, do not cause any additional space to appear either before or after them.

Another aspect of nesting in HTML is that some (but not all) block-level elements permit other block-level elements to be nested inside them, but you cannot nest a block-level element inside an inline element.

This idea of a distinction between block-level elements and inline-level elements remains a useful one, but it has been greatly expanded in HTML5 into a much more complex content model, the details of which need not concern you here, and a discussion of which would take us too far afield.

3.13.2 Semantic Elements and Non-Semantic Elements

Another useful distinction to make among HTML elements is that of a semantic element vs. a non-semantic element. We can also refer, equivalently, to semantic tags and non-semantic tags.

A semantic element is an “element whose tag name has meaning”. Examples that we have seen are table and img, since each of these tag names indicates what kind of information is associated with the corresponding element.

A non-semantic element is one whose tag name suggests nothing about its content. Two of the most important such elements are the ones we introduce next.

3.13.3 Two More Legacy Elements: div and span

If you look “behind the scenes” at almost any web page on the Internet, you are likely to see a lot of div elements, many of them with an id attribute, especially if that page has been there for a while.

The div element has been an HTML “workhorse” for many years. It is a block-level element that allows a developer to group any number of other block-level and inline elements together and treat them as a single unit, for styling with CSS or for applying some action via JavaScript, for example. The id attribute of a div element is used to identify the element for styling or some other purpose.

The span element has a similar purpose, except that it is an inline-level element and so must be used to group or enclose only inline elements or information. Words or phrases within a paragraph often appear as content in a span element, for example.

Note that both div and span are non-semantic elements, and that leads directly into the following discussion of some new HTML5 elements.
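A typical use of these two elements might look like the following sketch. The id values and the text are hypothetical, chosen only to show a div grouping block-level elements and a span enclosing a few words within a paragraph.

```html
<!-- A div groups several block-level elements into one unit. -->
<div id="header">
  <h1>Nature's Source</h1>
  <!-- A span marks off part of the text inside a paragraph. -->
  <p>Your <span id="tagline">natural</span> choice for better health.</p>
</div>
```

By themselves neither element changes what the browser displays, apart from the div starting on a new line. Their usefulness only becomes apparent when CSS or JavaScript is applied to them.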

3.13.4 New Semantic Elements in HTML5

HTML5 has quite a large number of new semantic elements, and here is an alphabetical listing of their tags: article, aside, details, figcaption, figure, footer, header, main, mark, nav, section, summary, time. This list contains more new elements than we will need, but we will use some of them in our development (main, header, nav, article, footer). We will not discuss them in detail until the following chapter, when we will combine their use with CSS. However, if you have read what we said above about what semantic elements are, you should agree that these are, in fact, semantic elements. That is, each one of these tags has a name that suggests what kind of information is likely to be associated with its element. For some the distinction might be a bit fuzzy; what’s the difference between an “article” and a “section”, for example? This is the kind of thing that is treated at length by the new HTML5 content model mentioned earlier, but a full discussion is beyond your needs.

It’s important to note that an element like the header element, for example, does nothing “extra” for you, in other words nothing that a div element with id="header" did not do, except greatly improve the conceptual structure of your HTML pages by specifically identifying those parts of your pages that are headers with an “official” semantic tag designed for just that purpose. Browsers will generally treat the header element as a block-level element, just as they do a div element, but even that is not guaranteed, and if that’s what you want you should say so using CSS, as you shall see.

The “back story” of how the names of these semantic tags were chosen is quite interesting. A lot of web pages were examined, and it was found that, for example, many thousands of them had a div element with an id="header" attribute. Even if some other attribute value, such as "head" or "heading", was used, the intent of such a div was clear. Since for many purposes on the web, it is much easier to deal with tags than with tag attributes, this analysis suggested that a new tag named header, whose content would be “header information” for a web page, was warranted, and so one was created. Similar analysis led to the addition to HTML5 of the other tags in the above list.
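To make the contrast with the div-based approach concrete, here is a sketch of a page skeleton built from the five semantic elements we plan to use. It is not one of our website files: the title, the placeholder text, and the lang and charset settings are assumptions made for this illustration.

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Nature's Source</title>
  </head>
  <body>
    <!-- Formerly: <div id="header"> ... </div> -->
    <header>
      <img src="images/naturelogo.gif" alt="Nature's Source Logo" width="608" height="90">
    </header>
    <!-- Formerly a div holding the menu links. -->
    <nav>
      <a href="index.html">Home</a>
      <a href="pages/estore.html">e-store</a>
    </nav>
    <main>
      <article>
        <h1>Welcome</h1>
        <p>The main content of the page goes here.</p>
      </article>
    </main>
    <!-- Formerly: <div id="footer"> ... </div> -->
    <footer>
      <p>Copyright &copy; Nature's Source</p>
    </footer>
  </body>
</html>
```

Displayed without any CSS, this page looks much the same as a version built from div elements would. What improves is the readability of the markup itself.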

Summary

In this chapter you first learned that HTML was the first widely used markup language on the web, that it was eventually rewritten as XHTML, which did not “catch on”, and that it then returned to its roots in a new incarnation, HTML5.

We stressed the importance of maintaining a distinction between the structure of the content on a web page and its presentation. HTML should deal only with structure. We distinguished between HTML tags and HTML elements (tag pairs and their content).

Even though the XHTML standard is now behind us, many of its “rules” are still valuable and we recommended they continue to be used as “guidelines” when writing your HTML5 markup:

  1. Use only lowercase letters for both tag names and attribute names.

  2. Ensure elements with content have both an opening tag and a closing tag.

  3. Ensure that tag pairs are properly nested.

  4. Enclose attribute values in quotes.
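
The short fragment below is a sketch of what these four guidelines look like in practice. The file name about.html and the link text are made up for illustration. The commented-out line shows the kind of markup the guidelines are meant to discourage, even though a browser may well tolerate it.

```html
<!-- Discouraged: uppercase names, unquoted value, bad nesting, no closing p tag -->
<!-- <P>Visit our <A HREF=about.html><EM>About</A></EM> page -->

<!-- Preferred: lowercase names, quoted value, proper nesting, closing tags present -->
<p>Visit our <a href="about.html"><em>About</em></a> page.</p>
```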

You learned that the basic structure of any HTML document should include at least the following four elements: html, head, title, and body.
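
A minimal skeleton containing these four elements might look like the following sketch. The title and paragraph text are placeholders, and we have assumed the usual HTML5 DOCTYPE, a lang attribute on the html tag, and a meta element giving the character encoding, all of which help a page to validate.

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>A Minimal Page</title>
  </head>
  <body>
    <p>All visible page content goes inside the body element.</p>
  </body>
</html>
```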

You saw how to apply some simple markup to the content in the body of an HTML document so that a browser can identify the structural divisions of the web page content and display them accordingly. HTML elements allow you to mark such things as headings, paragraphs, and lists. An element can also be empty (have no content), like the br element for a line break, and element tags can have attributes that alter the display of the element content or the effect of the element.

Some characters, such as the tag delimiters < and >, cannot appear in an HTML document except in their tag-delimiter context, so if you wish to have such a character in your document you have to use an HTML entity, such as &lt; for <.
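
For instance, the following one-line sketch uses the entities &lt;, &gt;, and &amp; (the last of these stands for the ampersand, which also needs an entity because it begins every entity). The sentence itself is invented for illustration.

```html
<!-- Displays as: In HTML, a paragraph starts with <p> & ends with </p>. -->
<p>In HTML, a paragraph starts with &lt;p&gt; &amp; ends with &lt;/p&gt;.</p>
```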

You saw how the table element can be used for page layout, even though it should no longer be so used, but with CSS not yet at your disposal you had little choice. Fortunately, you also saw a bona fide use of a table (on our site map page).

You saw how to add images to your web pages, and how to link one web page to another when your site has multiple pages. Furthermore, when multiple pages have content in common, you saw how SSI can be used to avoid the maintenance nightmare of trying to keep duplicate code consistent when updating takes place.
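
As a reminder of the general shape of the technique, here is a minimal sketch of a page body that pulls in shared markup with SSI. It assumes an Apache-style server on which SSI is enabled for the file in question, and the two file names are hypothetical rather than the ones used in our nature3 site.

```html
<body>
  <!-- The server replaces each directive with the contents of the named file -->
  <!--#include virtual="common/banner.html" -->
  <p>Content specific to this page goes here.</p>
  <!--#include virtual="common/footer.html" -->
</body>
```

Because the banner and footer markup now live in one file each, a change made there appears on every page that includes them the next time the page is served.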

We distinguished between block-level HTML elements and inline-level elements, as well as semantic and non-semantic elements, and pointed out how the new HTML5 semantic elements such as main, header, and footer allow you to create “better” web pages than you could using the traditional approach with div elements.

Finally, you now know what a valid web page is, and how to determine if a given web page is in fact valid.

Quick Questions to Test Your Basic Knowledge

  1. Who was Tim Berners-Lee and what part did he play in the origin of the World Wide Web?

  2. What is the relationship between HTML and XHTML?

  3. What was the last version of HTML before the (temporary) shift to XHTML?

  4. HTML was designed to describe web page structure. Some browser vendors tried to make it do more, but they should not have done so. Can you explain what we mean by these two statements?

  5. What is the difference between an HTML tag and an HTML element, and how are they related?

  6. What is an empty HTML element, and what is its general syntax? Give an example.

  7. What is, in your opinion, the best reason for keeping web page “structure” and web page “presentation” separate?

  8. What should the high-level structure of every HTML web page document look like?

  9. How would you briefly describe what it means for HTML tags to be “properly nested”?

  10. Why do we recommend retaining some XHTML “rules” for our HTML “guidelines”?

  11. What is the purpose of a DOCTYPE declaration?

  12. For what did we use a meta element?

  13. What attribute should the opening html tag of any web page have?

  14. What does it mean for a web page (an HTML document) to be valid?

  15. How do you determine if an HTML document is valid?

  16. What have tables been used for in HTML that they should no longer be used for?

  17. What are the HTML comment delimiters?

  18. What is a pixel, and what is the abbreviation for it?

  19. Why do we need HTML entities?

  20. What is the syntax of an HTML entity? Give an example.

  21. What are the two required attributes of the img tag?

  22. What are two recommended attributes for the img tag?

  23. What tag is used for hyperlinks, and what is its only required attribute?

  24. We think it’s always a good idea to put attribute values in quotes, but when must you do so?

  25. Why should every nontrivial website have a site map?

  26. What is the term we used to refer to the problems that almost always arise when you have the same markup in many different pages on your website and have to change something in that markup?

  27. What does the acronym SSI stand for, and for what is this technology used?

  28. How many files did we use to hold the common markup in our nature3 website files, and what did each contain?

  29. What is the syntax of the “virtual include” directive that we used?

  30. The path of the file in a “virtual include” directive must be relative to what?

  31. For what purpose did we use an HTML base element?

  32. What is the difference between an HTML block-level element and an inline-level element? Give an example of each.

  33. What is meant by a “semantic element” in HTML? Give a “legacy example” (one from HTML prior to HTML5) and another example that is new in HTML5.

Short Exercises to Improve Your Basic Understanding

In these and subsequent exercises, we may sometimes explicitly ask you to make a copy of a file from the text and modify it in some way. However, it is worth pointing out that even in those cases when we do not explicitly ask you to make copies, whenever we ask you to make a change to a file from the text, it should be understood that we really mean for you to first make your own copy of that file and then do whatever is asked to the copy. That way, you can always go back to the original for a fresh copy if necessary.

  1. Load the file first.html from Figure 3.1 into your browser. It will probably not look exactly like the display we have shown in Figure 3.2. Try changing the size of the browser window to see if you can make it look more like that display. The main thing to note as you do this is how the text in both the heading and the following single paragraph “flows” to conform to the size of the display window.

  2. Make a copy of the file first.txt from Chapter 2 and call it first.html. Note that this first.html file will not be the same as the first.html file from the beginning of this chapter, since there will be no markup in it. In this case make no changes to your copy other than the file extension. Load both first.txt and this new first.html into your browser and take careful note of what should be some considerable difference between how the two files are displayed. Explain any differences you see in terms of MIME types. Repeat the exercise with one or more additional browsers to confirm that they all exhibit similar behavior, as should be the case.

  3. In the file second.html of Figure 3.3 make the line following the line break a separate paragraph, and to the p tag of that paragraph add the align attribute with a value of "center". Load the revised file into your browser and note how the new paragraph is centered above the list. Then repeat the exercise, this time giving the align attribute the value "right". Be aware that although doing this “works”, and you will often see it done this way on the web, it is not the way you should accomplish these tasks. It’s the kind of “legacy fix” we warned you about, and you should use instead techniques provided by CSS, the topic of the next chapter.

  4. In the file second.html of Figure 3.3 change the ul tag to ol, so that you get a numbered list instead of a bulleted list. Note that by default your list items are numbered 1, 2, 3, ... and so on, just as the unnumbered list had a bullet marker as the default. This default style can be changed by setting the value of the type attribute. Do a little “research” to find out what the alternatives are. This raises an obvious question: Can you change the default bullet marker for an unordered list in the same way? Find the answer to this question as well.

  5. Browse to the file third.html of Figure 3.5. Make a copy of its URL; then open up another window in your browser and go to http://validator.w3.org, paste in the URL, and click on the Check button. You should get a response highlighted in green saying the file validates. Make some changes in the file that will cause it not to validate.

  6. In the file of Figure 3.9 change the name of both image files so that the browser cannot find them, and then reload the page. In each case you should see the text of the alt attribute value in the place where the image would otherwise appear. Try this in more than one browser as well.

  7. Go to the W3Schools website (www.w3schools.com) and study the HTML table element, its associated th, tr, td, and caption elements and any relevant attributes (particularly colspan and rowspan). We have used the table element temporarily, of necessity, for page layout (not a good idea for the long term, as we have repeatedly pointed out), but tables are very useful for displaying data that is “tabular” in nature, and that is what they should be used for. Experiment with the table element and note in particular the defaults that are used for displaying text within a th element and a td element.

  8. Create a web page whose content is a table containing the data in Table 3.1.

  9. Create a web page whose content is a table containing the data in Table 3.2.

Exercises on the Parallel Project

In these exercises we ask you to replicate, for your previously chosen business, the kinds of web pages we have discussed in the text for our own sample health product business. Since the layout required is a table with two rows and two columns, you may use, as we have done in this chapter, the HTML table tag for your page layout, but only because we do not yet have an alternate approach.

  1. Before you begin, make sure you have thoroughly explored each of the three versions of our sample website: the single-file version (nature1), the version in which each page document has the duplicate markup embedded within it (nature2), and the final version in which the common markup has been extracted into the four separate files (nature3).

  2. Your first task is to produce a single-page website for your business that looks like the web page for Nature’s Source as shown in the markup of ch03/nature1/index.html and the browser display of Figure 3.10, according to the following specifications for the content of each table cell:

    1. The logo for your business goes in the top-left corner of the page. If you have a paint program, and are artistically inclined, you can produce your own logo for this purpose. Even a simple program like the standard Paint program on Windows can be used. Otherwise, at least for the moment, you can either search for a suitable logo on the Internet (googling “free logos” should turn up something) and hope you can find a suitable one to use, or simply place text in this cell for the time being. In any case, the name of your business must appear here.

    2. Your business address and other contact information goes in the top-right corner.

    3. Some general information about your business must appear in the bottom-left “content” part of the page. This should be appropriate reading for someone coming to your site for the first time and should therefore be designed to catch the attention of visitors and make them want to explore the rest of your site as it develops.

    4. Finally, a photographic image relevant to your business must go into the bottom-right corner of your page. You can take such a photo yourself and upload it, or download one from the Internet, provided you do not violate any copyright laws in so doing.

  3. Your second task is to revise and extend the single-page website you created in the previous exercise. The goal is to have it “parallel” either our nature2 website or our nature3 website. Recall that nature3 made use of SSI, while nature2 did not. You should make sure that SSI is enabled on your server; otherwise, you run the risk of getting into the kind of “maintenance nightmare” situation we described earlier in this chapter.

    If this exercise is being assigned in a course, your instructor may ask you to produce a website that is “parallel” to either nature2, or to nature3, or better yet, produce two different websites, one that “parallels” each of nature2 and nature3. It is a very useful exercise to produce two different websites that look and behave exactly the same, but are constructed quite differently behind the scenes, and that is what you would be doing. Your instructor may or may not insist that your pages validate as HTML5 for this exercise, but you should try to make sure they do.

    So, in any case, your home page should have an appearance analogous to the display in Figure 3.12. Recall that this display would be the same for both versions of our multi-file site (i.e., for both nature2/index.html and nature3/index.html). The same is true of any other two files having the same name in the subdirectories nature2/pages and nature3/pages. For simplicity and consistency, make your menu links the same as the (generic) ones we have used (at least for the time being), except that our link called Your Health clearly must be replaced with a link more specific to, and appropriate for, your own business.

    All links on your home page must be active, that is, each one must link to an actual page and not be a “broken link”, and some of those pages must also have their own “submenu” links to other pages in a column at the left, in the manner of our sample site. One thing to note in this context is that if you click on one of these submenu links you get another page in the general context of the main menu link. For example, if you click on the Your Health menu option, there are four submenu links at the left of the resulting page. If you then click on Ask An Expert, you get that page, but the submenu links at the left now include only the other three that were there before. This behavior should be consistent throughout our site, and your site should emulate this behavior as well.

    You need not have exactly the same number of files as our sample site, and of course the content of these additional pages will depend on the nature of your chosen business. Many of these pages can contain a short “coming soon” message, similar to that found on many of the pages of our sample site. But that message should be at least a short paragraph of one or two sentences saying something about just what is “coming soon”.

What Else You May Want or Need to Know

  1. In keeping with our need-to-know approach in learning new material, in this chapter we have introduced you to only a very small selection of the HTML tags that are available to you when you are constructing the HTML document for one of your web pages. You are likely already curious to see what else is available, and to begin experimenting with various other tags and their attributes. There is no better place to do this than at the W3Schools site, several links to which are given in the References section that follows. This is a wonderful site to explore. You will find both reference material and tutorials, as well as examples that you can modify and experiment with right there on the site. Here, however, we give a summary list of tags that includes all those we have discussed, some that are closely related to those we discussed, and some new ones. You should explore as many of these as you can find time for on your own by going to the website mentioned above, because you will find them useful for constructing your own web pages.

    • html, head, title, body The “infrastructure” elements used to set up any web page.

    • link The (empty) tag to place in the head element of your document if it needs to link to an external document (such as a CSS style sheet, discussed in the next chapter).

    • meta The (empty) tag to place in the head element of your document if your document needs to make available to some external processing agent some high-level information about itself.

    • base The tag that allows us to set the value of a “base” URL to use for our website, and to which any other relative href value on our site will be appended before that href value is used as a link destination.

    • h1, h2, h3, h4, h5, h6 The heading tags that give progressively smaller text.

    • p The ubiquitous paragraph tag, one of the most frequently used.

    • ul, ol, li The tags for bulleted (unnumbered) or numbered generic lists (which can be nested), and their items.

    • dl, dt, dd The tags for a special kind of list, a definition list (called a description list in HTML5), which is convenient when you are defining or explaining a sequence of terms.

    • table, tr, td, th, caption The tags for tables, with rows and columns. The th is a new tag (for us) that is often used for the first table cell of a row or the top table cell of a column if the content of that cell is to be used as the label for the rest of the corresponding row or column. Use th instead of td if you want the text content to appear in bold and centered within the table cell. The caption element, which we did not use, allows us to provide a caption for any of our tables.

    • br The (empty) line-break tag that moves the text following it in a paragraph to the next line, without adding any vertical space.

    • hr The (empty) horizontal-line tag that creates a horizontal line on a web page, often used for separation purposes.

    • img The (empty) image tag that permits you to place images on your web pages.

    • strong, em Tags that emphasize text by making it (usually) bold or italic, respectively. There are also b and i tags for bold and italic, but strong and em are preferred instead, since they provide “logical emphasis”, which can let a browser decide how they should be rendered, as opposed to the “physical emphasis” of b and i, which insist on bold and italic.

    • small A tag used to make text “smaller” (than the surrounding text). There used to be a big tag as well, but for subtle reasons we need not try to explain here it is no longer supported in HTML5, while small does continue to be supported.

    • pre The tag to use if you want your text to retain the format you used when you typed it into your web page document.

    • blockquote, q The tags for two kinds of quotations: blockquote if you want your text to have extra space before and after it, and indented margins, and q if you just want quotation marks around it.

    • address, dfn, var, cite Tags to designate an address, a definition, a variable, and a citation, generally rendered in italics.

    • code, samp, kbd Tags to designate computer code, sample code or data, and keyboard input, generally rendered in monospace.

    • div, span Tags for designating parts of a page for processing of some kind (such as CSS styling or responding to JavaScript events).

    • article, aside, details, figcaption, figure, footer, header, main, mark, nav, section, summary, time Semantic tags new to HTML5, which of course contains many other kinds of new tags. For example, there is a new video tag, which you will see in the next chapter.

    • applet, basefont, big, center, dir, font, frame, frameset, isindex, menu, s, strike, u, xmp Tags that are deprecated or obsolete (should no longer be used in new web pages), but are nevertheless still supported by many browsers. Note, however, that s and u are exceptions: HTML5 retained both, with redefined meanings.
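
    To give a feel for two of the groups in this list that we did not fully exercise in the chapter, here is a small sketch of a data table with a caption and th cells, followed by a definition list. The product name and price are made-up sample data.

    ```html
    <table>
      <caption>Sample Product Prices</caption>
      <tr>
        <th>Product</th>
        <th>Price</th>
      </tr>
      <tr>
        <td>Vitamin C</td>
        <td>$9.99</td>
      </tr>
    </table>

    <dl>
      <dt>HTML</dt>
      <dd>HyperText Markup Language</dd>
      <dt>SSI</dt>
      <dd>Server-Side Includes</dd>
    </dl>
    ```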

  2. As you have seen, HTML tags can have attributes. These attributes fall into several categories:

    1. Some attributes are required, such as the src attribute and the alt attribute for the img tag, for example.

    2. Some attributes are optional, such as the width attribute and the height attribute for the img tag, for example. Note that sometimes, as in the case of these two, even though an attribute is optional it may not be a good idea to omit it.

    3. Some attributes are called core attributes (or standard attributes) because they can be used with virtually any HTML tag. These attributes are optional. They include the id and class attributes that you will meet in the next chapter in the context of CSS, as well as the title and style attributes.

    4. Some attributes are called event attributes, because they can be used to fire up a JavaScript script under certain conditions (when certain “events” happen). For example, the onclick attribute might have as its value a JavaScript script that would run when the element with that attribute was clicked on by the user. We will look at event attributes in a later chapter on JavaScript. These attributes are also optional.
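
    The following sketch shows one example from each of these categories. The image file name, the id and class values, and the alert message are all invented for illustration.

    ```html
    <!-- src and alt are required; width and height are optional but recommended -->
    <img src="images/logo.gif" alt="Company logo" width="200" height="100">

    <!-- id, class, and title are core attributes usable on almost any element -->
    <p id="intro" class="highlight" title="Introductory paragraph">Welcome!</p>

    <!-- onclick is an event attribute whose value is JavaScript code -->
    <p onclick="alert('You clicked this paragraph.');">Click this text.</p>
    ```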

  3. One of the reasons we often do not provide attributes for our tags when they are optional is that optional attributes have default values that tend to be what we want most of the time. It is helpful to become familiar with the default values of commonly used tag attributes.

    For example, the p tag has an attribute called align, which may take any of the values “left”, “right”, “center”, or “justify”. Fortunately, the default value is left, since we want our paragraphs to be left-justified most of the time. Furthermore, this attribute is actually deprecated, so you should use CSS to get any alignment effect other than left that you would like to achieve, rather than get it by using this attribute.

  4. You know that clicking on a link will usually take you to the beginning of the page at the end of that link, but sometimes the href value of a link will look like

    http://mysite.com/mypage.html#markedspot
    

    and in this case clicking on the corresponding link will take you, as usual, to mypage.html on mysite.com, but rather than displaying that page from the beginning, the browser will start its display of the page at the place on the page identified by an a tag having an id attribute with a value of "markedspot". Such a location “somewhere down the page” is often called a bookmark. For example, if you wanted to go to a certain h1 heading on the page mypage.html by clicking on the above link, you could identify that h1 heading like this:

    <h1><a id="markedspot">This is the certain heading</a></h1>
    
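    Putting the two pieces together, a sketch of the linking markup might look like the following. The link text is invented, and the last fragment relies on the fact that HTML5 allows an id attribute on any element, so the wrapping a tag shown above is not strictly required.

    ```html
    <!-- On some other page: a link that jumps to the marked spot -->
    <a href="http://mysite.com/mypage.html#markedspot">Go to the heading</a>

    <!-- Within mypage.html itself, the fragment alone is enough -->
    <a href="#markedspot">Jump down this page</a>

    <!-- In mypage.html: the id can sit directly on the heading -->
    <h1 id="markedspot">This is the certain heading</h1>
    ```
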
  5. HTML5 became an accepted standard in 2014, but you should not think that’s the end of anything. The World Wide Web and its “standards” are all moving targets, and you should try to keep your finger on the pulse of developments that are taking place.

  6. We mentioned briefly in this chapter the new HTML5 content model, when we introduced the older notion of block-level and inline-level elements. Studying the details of this model would be overkill in our context, but it is something you should be aware of and something that you might find useful as time goes on. Check out the link on this subject in the following References section, and come back to it from time to time until you get a “feel” for what it’s all about. You will probably design and write better web pages as a result.

  7. When you want to validate one of your web pages, one of the things you need to ensure is that it contains a meta element like the following:

    <meta charset="utf-8">
    

    There are many character sets available for use on the web and elsewhere, but as of the time of this writing utf-8 was the dominant one in use on the web by a very wide margin. UTF is an acronym for Unicode Transformation Format, and this particular character encoding is so useful because it is capable of representing pretty much any character you might want to use, in virtually any natural language. We provide a link in the References section if you wish to read more about it.

References

  • 1. Many of our references here and in the rest of the text will refer you to particular pages on the W3Schools site, since the site is very useful from both a tutorial and reference perspective, but you may want to start at the home page and explore, so here it is:

    http://www.w3schools.com/
    
  • 2. Another tutorial site that you may find useful now and later is this one:

    http://www.tizag.com/
    
  • 3. For further information on HTML and XHTML, including their history and the relationship between them, check the following Wikipedia links (the HTML page has a nice picture of Tim Berners-Lee):

    http://en.wikipedia.org/wiki/HTML
    http://en.wikipedia.org/wiki/XHTML
    
  • 4. You will find lists of available HTML entities here:

    http://www.w3schools.com/html/html_entities.asp
    https://dev.w3.org/html5/html-author/charref
    
  • 5. Anytime you look up an HTML tag on the W3Schools site, you will also find information on its attributes and their values, but for a general overview of tag attributes see the following links:

    http://www.w3schools.com/tags/ref_standardattributes.asp
    http://www.w3schools.com/tags/ref_eventattributes.asp
    
  • 6. For further information on the differences between HTML and XHTML, see the following link:

    http://www.w3schools.com/html/html_xhtml.asp
    
  • 7. For further information on which elements can be nested inside other elements, see:

    http://www.cs.tut.fi/~jkorpela/html/nesting.html
    
  • 8. To keep an eye on what’s happening with HTML5, you can check out the actual specification at

    http://www.w3.org/TR/html5/
    
    and a more web-author-friendly version of the specification here

    http://dev.w3.org/html5/spec-author-view/

    as well as a useful page summarizing all markup tags (with a clear indication of which tags are new and which have altered semantics) at this location:

    http://dev.w3.org/html5/markup/

    The W3C itself seems to be in a bit of a promotional mood with respect to HTML5, and you too can get on board at this site:

    http://www.w3.org/html/logo/

    To read about the differences between HTML5 and HTML4 look here:

    http://www.w3.org/TR/html5-diff/

    The Web Hypertext Application Technology Working Group (WHATWG, pronounced “What-Wig”) is a community of folks interested in the “practical” evolution of the web. It was this group whose work eventually convinced the W3C that continued development of the XHTML standard would not be a good idea. Both the W3C and the WHATWG are now fully behind the HTML5 effort, though they remain separate entities. For more information, go to this site and check out the FAQ (Frequently Asked Questions) section in particular:

    http://www.whatwg.org/
    
  • 9. For further information on the HTML5 content model, check this link:

    http://www.w3.org/TR/2011/WD-html5-20110525/content-models.html
    
  • 10. Here’s a link to the Wikipedia article on UTF-8:

    https://en.wikipedia.org/wiki/UTF-8
    
  • 11. An interesting history of some web developments, including some perhaps-not-so-well-known details, and featuring a number of fascinating photos, can be found here:

    http://www.w3.org/People/Raggett/book4/ch02.html
    