As you learned earlier, web servers and browsers communicate using the Hypertext Transfer Protocol (HTTP). HTTP (1.1) is described in RFC 2616, and later documents have extended HTTP functionality. The purpose of HTTP is to support the transfer of HTML documents. HTTP is an application-level protocol. The HTTP client and server applications use the reliable TCP transport protocol to establish a connection.
HTTP does the following:
Establishes a connection between the browser (the client) and the server
Negotiates settings and establishes parameters for the session
Closes the connection with the server
Although the nature of Web communication has become extremely complex, most of that complexity relates to how the server builds the HTML content and what the browser does with the content it receives. The actual process of transferring the content through HTTP is relatively uncluttered.
When you enter a URL into the browser window, the browser first checks the scheme of the URL to determine the protocol. (Most web browsers support other protocols besides HTTP.) If the browser determines that the URL refers to a resource on an HTTP site, it extracts the DNS name from the URL and initiates the name resolution process. The client computer sends the DNS lookup request to a name server and receives the server’s IP address. The browser then uses the server’s IP address to initiate a TCP connection with the server. (See Hour 6, “The Transport Layer,” for more on TCP.)
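The steps above (check the scheme, extract the DNS name, resolve it, open a TCP connection) can be sketched in Python as follows. This is a minimal illustration, not how any particular browser is implemented; the function names `split_http_url` and `connect_to_server` and the host `www.example.com` are hypothetical, and the default port numbers are the conventional ones for each scheme.

```python
import socket
from urllib.parse import urlsplit

# Conventional default ports, used when the URL does not name one.
DEFAULT_PORTS = {"http": 80, "https": 443}

def split_http_url(url):
    """Return (scheme, host, port, path) extracted from a URL."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    path = parts.path or "/"          # an empty path means the site root
    if parts.query:
        path += "?" + parts.query
    return scheme, parts.hostname, port, path

def connect_to_server(url):
    """Resolve the host name and open a TCP connection (needs a network)."""
    scheme, host, port, _ = split_http_url(url)
    if scheme != "http":
        raise ValueError("this sketch handles only the http scheme")
    # Name resolution: ask the resolver for the server's address.
    family, socktype, proto, _, sockaddr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    # TCP connection to the resolved address.
    sock = socket.socket(family, socktype, proto)
    sock.connect(sockaddr)
    return sock

print(split_http_url("http://www.example.com/watergate/tapes/transcript"))
```

Only `split_http_url` is called here, so the example runs without network access; `connect_to_server` shows where name resolution and the TCP connection would take place.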
By the Way
In older versions of HTTP (before version 1.1), the client and server opened a new TCP connection for each item transferred. Recent versions of HTTP allow the client and server to maintain a persistent connection.
After the TCP connection is established, the browser uses the HTTP GET command to request the web page from the server. The GET command contains the URL of the resource the browser is requesting and the version of HTTP the browser wants to use for the transaction. In most cases, the browser can send the relative URL with the GET request (rather than the full URL) because the connection with the server has already been established:
GET /watergate/tapes/transcript HTTP/1.1
Several other optional field:value pairs might follow the GET command, specifying settings such as the language, browser type, and acceptable file types.
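As a rough sketch of what such a request looks like on the wire, the following Python function assembles a GET request line followed by field:value pairs. The function name `build_get_request`, the host name, and the header values are assumptions made for illustration. Note that HTTP 1.1 requires a Host field, which is how the server learns the site name when the request line carries only the relative URL.

```python
def build_get_request(host, path, extra_fields=None):
    """Build the bytes of a simple HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",   # method, relative URL, HTTP version
        f"Host: {host}",          # mandatory in HTTP/1.1
    ]
    # Optional field:value pairs (language, browser type, file types...).
    for name, value in (extra_fields or {}).items():
        lines.append(f"{name}: {value}")
    # Each line ends with CRLF; a blank line marks the end of the header.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

request = build_get_request(
    "www.example.com",
    "/watergate/tapes/transcript",
    {
        "Accept-Language": "en",
        "User-Agent": "ExampleBrowser/1.0",   # hypothetical browser name
        "Accept": "text/html",
    },
)
print(request.decode("ascii"))
```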
The server response consists of a header followed by the requested document. The format of the response header is
HTTP/1.1 status_code reason-phrase
field:value
field:value...
The status code is a three-digit number describing the status of the request. The reason-phrase is a brief description of the status. Some common status codes are shown in Table 17.3. As you can see, the leftmost digit of the code identifies a general category. The 100s are informational; the 200s denote success; the 300s specify redirection; the 400s show a client error; and the 500s specify a server error. You might be familiar with the famous 404 code, which often appears in response to a missing page or a mistyped URL. Like the client request, the server response can also include a number of optional field:value pairs. Some of the header fields are shown in Table 17.4. Any field that is not understood by the browser is ignored.
Code | Reason-Phrase | Description |
---|---|---|
100 | Continue | Request is in process. |
200 | OK | Request is successful. |
202 | Accepted | Request accepted for processing but not finished. |
301 | Moved Permanently | Resource has a new address. |
302 | Found | Resource has a new temporary address (called Moved Temporarily in HTTP 1.0). |
400 | Bad Request | Server doesn’t recognize the request. |
401 | Unauthorized | Authorization failed. |
404 | Not Found | Resource requested doesn’t exist. |
406 | Not Acceptable | Content will not be acceptable to browser. |
500 | Internal Server Error | Server encountered error. |
503 | Service Unavailable | Server is overloaded or not working. |
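
To illustrate how the leftmost digit of the status code identifies the general category, here is a small Python sketch that splits a status line into its parts and classifies the code. The function name `parse_status_line` and the category labels are assumptions chosen for this example.

```python
# General category for each leading digit of the status code.
CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}

def parse_status_line(line):
    """Return (version, code, reason_phrase, category) for a status line."""
    parts = line.rstrip("\r\n").split(" ", 2)
    version = parts[0]
    code = int(parts[1])
    # The reason-phrase may contain spaces, or may be missing entirely.
    reason = parts[2] if len(parts) > 2 else ""
    category = CATEGORIES.get(code // 100, "unknown")
    return version, code, reason, category

print(parse_status_line("HTTP/1.1 404 Not Found\r\n"))
```

A client that meets an unfamiliar code can still fall back on the category, which is why the sketch derives it from the first digit rather than from a lookup of individual codes.
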
Field | Value Must Be | Description |
---|---|---|
Content-Length | integer | Size of the content object in octets |
Content-Encoding | x-compress or x-gzip | Value representing the type of encoding associated with the message |
Date | Standard date format defined in RFC 850 | Date in Greenwich Mean Time when the object was created |
Last-Modified | Standard date format defined in RFC 850 | Date in Greenwich Mean Time when the object was last modified |
Content-Language | Language code per ISO 639 | The language in which the object was written |
As you can see from Table 17.4, some of the header fields are purely informational. Other header fields might contain information used to parse and process the incoming HTML document.
By the Way
The header field format used with HTTP is borrowed from the email header format specified in RFC 822.
The Content-Length field is particularly important. In the earlier HTTP version 1.0, each request/response cycle required a new TCP connection. The client opened a connection and initiated a request. The server fulfilled the request and then closed the connection. In that situation, the client knew when the server had stopped sending data because the server closed the TCP connection. Unfortunately, this process required the increased overhead necessary for continually opening and closing connections. HTTP 1.1 allows the client and server to maintain the connection for longer than a single transmission. In that case, the client needs some way of knowing when a single response is finished. The Content-Length field specifies the length of the HTML object associated with the response. If the server doesn’t know the length of the object it is sending—a situation increasingly common with the appearance of Dynamic HTML—the server sends the header field Connection:close to notify the browser that the server will specify the end of the data by closing the connection.
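The role of Content-Length on a persistent connection can be shown with a short Python sketch. It reads a response from a byte stream: if the header includes Content-Length, it reads exactly that many octets and leaves the stream ready for the next response; otherwise it reads until the stream ends, which corresponds to the server closing the connection. The function name `read_response` is hypothetical, an in-memory `io.BytesIO` stream stands in for a real TCP connection, and the sketch deliberately ignores other ways of marking the end of a message.

```python
import io

def read_response(stream):
    """Read one HTTP response from a binary stream.

    Returns (status_line, fields, body).
    """
    status_line = stream.readline().decode("latin-1").rstrip("\r\n")
    fields = {}
    while True:
        line = stream.readline().decode("latin-1").rstrip("\r\n")
        if not line:                      # blank line ends the header
            break
        name, _, value = line.partition(":")
        fields[name.strip().lower()] = value.strip()
    if "content-length" in fields:
        # Length is known: read exactly that many octets and stop.
        body = stream.read(int(fields["content-length"]))
    else:
        # Length unknown: the end of the data is the end of the stream,
        # that is, the point where the server closes the connection.
        body = stream.read()
    return status_line, fields, body

# Two responses sent back to back over one simulated connection.
raw = (b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"
       b"HTTP/1.1 200 OK\r\nConnection: close\r\n\r\nrest of data")
connection = io.BytesIO(raw)
first = read_response(connection)
second = read_response(connection)
print(first[2], second[2])
```

Without the Content-Length field on the first response, the client would have no way to tell where `hello` ends and the second response begins.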
HTTP also supports a negotiation phase in which the server and browser agree to common settings for certain format and preference options.