Understanding HTTP

As you learned earlier, web servers and browsers communicate using the Hypertext Transfer Protocol (HTTP). HTTP (1.1) is described in RFC 2616, and later documents have extended HTTP functionality. The purpose of HTTP is to support the transfer of HTML documents. HTTP is an application-level protocol. The HTTP client and server applications use the reliable TCP transport protocol to establish a connection.

HTTP does the following:

  • Establishes a connection between the browser (the client) and the server

  • Negotiates settings and establishes parameters for the session

  • Provides for the orderly transfer of HTML content

  • Closes the connection with the server
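The four steps above can be sketched with Python's standard socket module. This is a minimal illustration, not a complete HTTP client: the function names exchange and fetch are hypothetical, and the Host and Connection: close fields are assumptions added here (HTTP 1.1 requires a Host field, and Connection: close asks the server to end the session after one response).

```python
import socket


def exchange(sock, host, path):
    """Send one GET request on an already-connected socket; return the raw reply."""
    # Step 2: the request line and header fields carry the session settings.
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    sock.sendall(request.encode("ascii"))

    # Step 3: read the response until the server closes its side.
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks)


def fetch(host, path="/", port=80):
    """Open a TCP connection, perform one exchange, and close the connection."""
    # Step 1: establish the TCP connection to the server.
    with socket.create_connection((host, port), timeout=10) as sock:
        reply = exchange(sock, host, path)
    # Step 4: leaving the with-block closes the connection.
    return reply
```

Calling fetch with a real host name returns the raw bytes of the response, header and content together.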

Although the nature of Web communication has become extremely complex, most of that complexity relates to how the server builds the HTML content and what the browser does with the content it receives. The actual process of transferring the content through HTTP is relatively uncluttered.

When you enter a URL into the browser window, the browser first checks the scheme of the URL to determine the protocol. (Most web browsers support other protocols besides HTTP.) If the browser determines that the URL refers to a resource on an HTTP site, it extracts the DNS name from the URL and initiates the name resolution process. The client computer sends the DNS lookup request to a name server and receives the server’s IP address. The browser then uses the server’s IP address to initiate a TCP connection with the server. (See Hour 6, “The Transport Layer,” for more on TCP.)
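The scheme check and name resolution described above might look like the following sketch. The helper names split_url and resolve are hypothetical, the host www.example.com is a placeholder, and the default ports (80 for http, 443 for https) are assumptions not stated in the text.

```python
import socket
from urllib.parse import urlsplit


def split_url(url):
    """Return (scheme, host, port, path) for a URL."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname
    # Fall back to the conventional port when the URL does not name one.
    port = parts.port or {"http": 80, "https": 443}.get(scheme)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return scheme, host, port, path


def resolve(host, port):
    """Ask the system resolver (and so DNS) for the host's IP addresses."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry ends with a socket address whose first item is the IP.
    return [info[4][0] for info in infos]
```

A browser-like client would call split_url first, confirm that the scheme is one it supports, and then pass the host and port to resolve before opening the TCP connection.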

By the Way

In older versions of HTTP (before version 1.1), the client and server opened a new TCP connection for each item transferred. Recent versions of HTTP allow the client and server to maintain a persistent connection.


After the TCP connection is established, the browser uses the HTTP GET command to request the web page from the server. The GET command contains the URL of the resource the browser is requesting and the version of HTTP the browser wants to use for the transaction. In most cases, the browser can send the relative URL with the GET request (rather than the full URL) because the connection with the server has already been established:

GET /watergate/tapes/transcript HTTP/1.1

Several other optional field:value pairs might follow the GET command, specifying settings such as the language, browser type, and acceptable file types.
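As a rough sketch, the request line and its field:value pairs can be assembled as follows. The function name build_get_request is hypothetical, and the Host field is an addition not shown in the example above (HTTP 1.1 requires it because the request line carries only the relative URL). Accept-Language and Accept are the standard fields for language and acceptable file types.

```python
def build_get_request(path, host, fields=None):
    """Build the bytes of a GET request: request line, fields, blank line."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    # Optional field:value pairs follow the request line, one per line.
    for name, value in (fields or {}).items():
        lines.append(f"{name}: {value}")
    # Every line ends with CRLF, and an empty line marks the end of the header.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_get_request(
    "/watergate/tapes/transcript",
    "www.example.com",  # placeholder host name
    {"Accept-Language": "en", "Accept": "text/html"},
)
```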

The server response consists of a header followed by the requested document. The format of the response header is

HTTP/1.1 status_code reason-phrase
field:value
field:value...

The status code is a three-digit number describing the status of the request. The reason-phrase is a brief description of the status. Some common status codes are shown in Table 17.3. As you can see, the leftmost digit of the code identifies a general category. The 100s are informational; the 200s denote success; the 300s specify redirection; the 400s show a client error; and the 500s specify a server error. You might be familiar with the famous 404 code, which often appears in response to a missing page or a mistyped URL. Like the client request, the server response can also include a number of optional field:value pairs. Some of the header fields are shown in Table 17.4. Any field that is not understood by the browser is ignored.
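To make the header format concrete, here is a small sketch that splits a response header into its status line and field:value pairs and maps the leftmost digit of the status code to its category. The name parse_response_head and the sample header text are illustrative assumptions.

```python
CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_response_head(head):
    """Return (version, status, reason, category, fields) for a response header."""
    lines = head.split("\r\n")
    # Status line: HTTP-version, three-digit code, then the reason-phrase.
    version, _, rest = lines[0].partition(" ")
    code, _, reason = rest.partition(" ")
    status = int(code)

    fields = {}
    for line in lines[1:]:
        if not line:  # an empty line ends the header
            break
        name, _, value = line.partition(":")
        fields[name.strip()] = value.strip()

    category = CATEGORIES.get(status // 100, "unknown")
    return version, status, reason, category, fields


head = (
    "HTTP/1.1 404 Not Found\r\n"
    "Date: Tue, 15 Nov 1994 08:12:31 GMT\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)
version, status, reason, category, fields = parse_response_head(head)
```

Because the fields end up in a dictionary, a client can simply look up the ones it understands and ignore the rest.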

Table 17.3. Some Common HTTP Status Codes

Code  Reason-Phrase          Description
100   Continue               Request is in process.
200   OK                     Request is successful.
202   Accepted               Request accepted for processing but not finished.
301   Moved Permanently      Resource has a new address.
302   Found                  Resource has a new temporary address (called Moved Temporarily in HTTP 1.0).
400   Bad Request            Server doesn’t recognize the request.
401   Unauthorized           Authorization failed.
404   Not Found              Resource requested doesn’t exist.
406   Not Acceptable         Content will not be acceptable to browser.
500   Internal Server Error  Server encountered error.
503   Service Unavailable    Server is overloaded or not working.

Table 17.4. Examples of HTTP Header Fields

Field             Value Must Be                            Description
Content-Length    Integer                                  Size of the content object in octets
Content-Encoding  x-compress or x-gzip                     Value representing the type of encoding associated with the message
Date              Standard date format defined in RFC 850  Date in Greenwich Mean Time when the object was created
Last-Modified     Standard date format defined in RFC 850  Date in Greenwich Mean Time when the object was last modified
Content-Language  Language code per ISO 639                The language in which the object was written

As you can see from Table 17.4, some of the header fields are purely informational. Other header fields might contain information used to parse and process the incoming HTML document.

By the Way

The header field format used with HTTP is borrowed from the email header format specified in RFC 822.


The Content-Length field is particularly important. In the earlier HTTP version 1.0, each request/response cycle required a new TCP connection. The client opened a connection and initiated a request. The server fulfilled the request and then closed the connection. In that situation, the client knew when the server had stopped sending data because the server closed the TCP connection. Unfortunately, this process required the increased overhead necessary for continually opening and closing connections. HTTP 1.1 allows the client and server to maintain the connection for longer than a single transmission. In that case, the client needs some way of knowing when a single response is finished. The Content-Length field specifies the length of the HTML object associated with the response. If the server doesn’t know the length of the object it is sending—a situation increasingly common with the appearance of Dynamic HTML—the server sends the header field Connection:close to notify the browser that the server will specify the end of the data by closing the connection.
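The following sketch shows how a client might use Content-Length to find the end of a response on a persistent connection and fall back to reading until the connection closes when the field is absent. The name read_response is hypothetical, and io.BytesIO stands in for the byte stream of a TCP connection. The sketch ignores chunked transfer encoding, another HTTP 1.1 mechanism that the text does not cover.

```python
import io


def read_response(stream):
    """Read one response (status line, fields, body) from a binary stream."""
    status_line = stream.readline().decode("iso-8859-1").rstrip("\r\n")

    fields = {}
    while True:
        line = stream.readline().decode("iso-8859-1").rstrip("\r\n")
        if not line:  # an empty line ends the header
            break
        name, _, value = line.partition(":")
        fields[name.strip().lower()] = value.strip()

    if "content-length" in fields:
        # Persistent connection: read exactly the announced number of octets.
        body = stream.read(int(fields["content-length"]))
    else:
        # No length given: the end of the stream marks the end of the data.
        body = stream.read()
    return status_line, fields, body


# Two responses arriving back to back on one connection (simulated).
wire = io.BytesIO(
    b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nfirst"
    b"HTTP/1.1 200 OK\r\nConnection: close\r\n\r\nsecond and last"
)
first = read_response(wire)
second = read_response(wire)
```

Without the Content-Length value on the first response, the client would have no way to tell where the first body stops and the second status line begins.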

HTTP also supports a negotiation phase in which the server and browser agree to common settings for certain format and preference options.
