HTTP

HTTP, the Hypertext Transfer Protocol, is the standard protocol for communication between web browsers and web servers. HTTP specifies how a client and server establish a connection, how the client requests data from the server, how the server responds to that request, and finally how the connection is closed. HTTP connections use the TCP/IP protocol for data transfer.

HTTP 1.0 is the currently accepted version of the protocol. It uses MIME to encode data. The basic protocol defines a sequence of four steps for each request from a client to the server:

  1. Making the connection. The client establishes a TCP connection to the server, on port 80 by default; other ports may be specified in the URL.

  2. Making a request. The client sends a message to the server requesting the page at a specified URL. The format of this request is typically something like:

    GET /index.html HTTP/1.0

    GET is a keyword. /index.html is a relative URL to a file on the server. The file is assumed to be on the machine that receives the request, so there is no need to prefix it with http://www.thismachine.com/. HTTP/1.0 is the version of the protocol that the client understands. The request is terminated with two carriage return/linefeed pairs (\r\n\r\n in Java parlance) regardless of how lines are terminated on the client or server platform.

    Although the GET line is all that is required, a client request can include other information as well. This takes the following form:

    Keyword: Value

    The most common such keyword is Accept, which tells the server what kinds of data the client can handle (though servers often ignore this). For example, the following line says that the client can handle four MIME types, corresponding to HTML documents, plain text, and JPEG and GIF images:

    Accept: text/html, text/plain, image/gif, image/jpeg

    User-Agent is another common keyword that lets the server know what browser is being used. This allows the server to send files optimized for the particular browser type. The line below says that the request comes from Version 2.4 of the Lynx browser:

    User-Agent: Lynx/2.4 libwww/2.1.4

    Finally, the request is terminated with a blank line; that is, two carriage return/linefeed pairs, \r\n\r\n. A complete request might look like:

    GET /index.html HTTP/1.0
    Accept: text/html
    Accept: text/plain
    User-Agent: Lynx/2.4 libwww/2.1.4

    In addition to GET, there are several other request types. HEAD retrieves only the header for the file, not the actual data. This is commonly used to check the modification date of a file, to see whether a copy stored in the local cache is still valid. POST sends form data to the server, and PUT uploads a file to the server.

  3. The response. The server sends a response to the client. The response begins with a response code, followed by MIME header information, then a blank line, then the requested document or an error message. Assuming the requested file is found, a typical response looks like this:

    HTTP/1.0 200 OK
    Server: NCSA/1.4.2
    MIME-version: 1.0
    Content-type: text/html
    Content-length: 107
    
    <html>
    <Head>
    <Title>
    A Sample HTML file
    </Title>
    </Head>
    <body>
    The rest of the document goes here
    </body>
    </html>

    The first line indicates the protocol the server is using (HTTP 1.0), followed by a response code. 200 OK is the most common response code, indicating that the request was successful. Table 3-1 is a complete list of the response codes used by HTTP 1.0; HTTP 1.1 adds many more to this list. The other header lines identify the server software (the NCSA server, Version 1.4.2), the version of MIME in use, the MIME content type, and the length of the document delivered (not counting this header); in this case, 107 bytes.

  4. Closing the connection. Either the client or the server or both close the connection. Thus, a separate network connection is used for each request. If the client reconnects, the server retains no memory of the previous connection or its results. A protocol that retains no memory of past requests is called stateless; in contrast, a stateful protocol such as FTP can process many requests before the connection is closed. The lack of state is both a strength and a weakness of HTTP.
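The request and response formats described in steps 2 and 3 can be sketched in Java without opening a socket. The following is a minimal illustration, not a complete client: the class name HttpExchangeSketch and its method names are hypothetical, and the response string is a shortened stand-in for what a server might send.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; the names here are illustrative only.
class HttpExchangeSketch {

    // Builds an HTTP/1.0 GET request: request line, header lines, blank line.
    static String buildGet(String path, Map<String, String> headers) {
        StringBuilder request = new StringBuilder();
        request.append("GET ").append(path).append(" HTTP/1.0\r\n");
        for (Map.Entry<String, String> header : headers.entrySet()) {
            request.append(header.getKey()).append(": ")
                   .append(header.getValue()).append("\r\n");
        }
        request.append("\r\n"); // the blank line that terminates the request
        return request.toString();
    }

    // Extracts the numeric code from a status line such as "HTTP/1.0 200 OK".
    static int parseStatusCode(String statusLine) {
        String[] parts = statusLine.split(" ", 3);
        return Integer.parseInt(parts[1]);
    }

    // Returns everything after the first blank line, or "" if there is none.
    static String body(String response) {
        int split = response.indexOf("\r\n\r\n");
        return split < 0 ? "" : response.substring(split + 4);
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Accept", "text/html, text/plain");
        headers.put("User-Agent", "Lynx/2.4 libwww/2.1.4");
        System.out.print(buildGet("/index.html", headers));

        // A shortened, made-up response used only to exercise the parsing code.
        String response =
            "HTTP/1.0 200 OK\r\nContent-type: text/html\r\n\r\n<html></html>";
        String statusLine = response.substring(0, response.indexOf("\r\n"));
        System.out.println(parseStatusCode(statusLine));
        System.out.println(body(response));
    }
}
```

In a real client the request string would be written to the output stream of a TCP connection to the server, and the response would be read back from the input stream of the same connection.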

Table 3-1. HTTP 1.0 Response Codes

Response Code

Meaning

2xx Successful

Response codes between 200 and 299 indicate that the request was received, understood, and accepted.

200 OK

This is the most common response code. If the request used GET or POST, then the requested data is contained in the response, along with the usual headers. If the request used HEAD, then only the header information is included.

201 Created

The server has created a data file at a URL specified in the body of the response. The web browser should now attempt to load that URL. This is sent only in response to POST requests.

202 Accepted

This rather uncommon response indicates that a request (generally from POST) is being processed, but the processing is not yet complete so no response can be returned. The server should return an HTML page that explains the situation to the user, provides an estimate of when the request is likely to be completed, and, ideally, has a link to a status monitor of some kind.

204 No Content

The server has successfully processed the request but has no information to send back to the client. This is usually the result of a poorly written form-processing CGI program that accepts data but does not return a response to the user indicating that it has finished.

3xx Redirection

Response codes from 300 to 399 indicate that the web browser needs to go to a different page.

300 Multiple Choices

The page requested is available from one or more locations. The body of the response includes a list of locations from which the user or web browser can pick the most appropriate one. If the server prefers one of these choices, the URL of this choice is included in a Location header, which web browsers can use to load the preferred page.

301 Moved Permanently

The page has moved to a new URL. The web browser should automatically load the page at this URL and update any bookmarks that point to the old URL.

302 Moved Temporarily

This unusual response code indicates that a page is temporarily at a new URL but that the document’s location will change again in the foreseeable future, so bookmarks should not be updated.

304 Not Modified

The client has performed a GET request but used the If-Modified-Since header to indicate that it wants the document only if it has been recently updated. This status code is returned because the document has not been updated. The web browser will now load the page from a cache.

4xx Client Error

Response codes from 400 to 499 indicate that the client has erred in some fashion, though this may as easily be the result of an unreliable network connection as it is of a buggy or nonconforming web browser. The browser should stop sending data to the server as soon as it receives a 4xx response. Unless it is responding to a HEAD request, the server should explain the error status in the body of its response.

400 Bad Request

The client request to the server used improper syntax. This is rather unusual, though it is likely to happen if you’re writing and debugging a client.

401 Unauthorized

Authorization, generally username and password controlled, is required to access this page. Either the username and password have not yet been presented or the username and password are invalid.

403 Forbidden

The server understood the request but is deliberately refusing to process it. Authorization will not help. One reason this occurs is that the client asks for a directory listing but the server is not configured to provide it, as shown in Figure 3.1.

404 Not Found

This most common error response indicates that the server cannot find the requested page. It may indicate a bad link, a page that has moved with no forwarding address, a mistyped URL, or something similar.

5xx Server Error

Response codes from 500 to 599 indicate that something has gone wrong with the server, and the server cannot fix the problem.

500 Internal Server Error

An unexpected condition occurred that the server does not know how to handle.

501 Not Implemented

The server does not have the feature that is needed to fulfill this request. A server that cannot handle POST requests might send this response to a client that tried to POST form data to it.

502 Bad Gateway

This response is applicable only to servers that act as proxies or gateways. It indicates that the proxy received an invalid response from a server it was connecting to in an effort to fulfill the request.

503 Service Unavailable

The server is temporarily unable to handle the request, perhaps because of overloading or maintenance.

HTTP 1.1 more than doubles the number of responses. However, a response code from 200 to 299 always indicates success; a response code from 300 to 399 always indicates redirection; one from 400 to 499 always indicates a client error; and one from 500 to 599 indicates a server error.
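Because the class of a response depends only on the range its code falls in, a client can react sensibly even to a code it has never seen. A minimal sketch follows; the class name ResponseClassSketch and the returned labels are assumptions made for illustration, and codes outside the four ranges named above are simply reported as unrecognized.

```java
// Hypothetical helper class; maps a response code to its general class.
class ResponseClassSketch {

    static String classify(int code) {
        if (code >= 200 && code <= 299) return "Successful";
        if (code >= 300 && code <= 399) return "Redirection";
        if (code >= 400 && code <= 499) return "Client Error";
        if (code >= 500 && code <= 599) return "Server Error";
        // Ranges not discussed in the text above are not interpreted here.
        return "Unrecognized";
    }

    public static void main(String[] args) {
        int[] samples = {200, 204, 301, 404, 503};
        for (int code : samples) {
            System.out.println(code + " " + classify(code));
        }
    }
}
```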

HTTP 1.0 is documented in the informational RFC 1945; it is not an official Internet standard because it was primarily developed outside the IETF by early browser and server vendors. HTTP 1.1 is a proposed standard being developed by the W3C and the HTTP working group of the IETF. It provides for much more flexible and powerful communication between the client and the server. It’s also a lot more scalable.

The primary improvement in HTTP 1.1 is the persistent connection. HTTP 1.0 opens a new connection for every request. In practice, the time taken to open and close all the connections opened in a typical web session can outweigh the time taken to transmit the data, especially for sessions with many small documents. HTTP 1.1 allows a browser to send many different requests over a single connection; the connection remains open until it is explicitly closed. Requests can also be pipelined: a browser doesn't need to wait for a response to its first request before sending a second or a third, although the server returns its responses in the order the requests were sent. The protocol itself remains stateless, and it remains tied to the basic pattern of a client request, followed by a server response that consists of a series of headers, followed by a blank line, followed by MIME-encoded data.
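To make the idea of several requests on one connection concrete, the sketch below builds the text a client could write to a single open connection. It is an illustration under stated assumptions: the class name PersistentConnectionSketch, the host www.example.com, and the paths are all hypothetical, and the code only builds the requests rather than sending them. The Host header is included because HTTP 1.1 requests carry it, and the last request adds Connection: close to ask that the connection be closed after its response.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; builds requests but does not open a socket.
class PersistentConnectionSketch {

    // Returns several GET requests back to back, as they could be written
    // to one connection without waiting for the responses in between.
    static String buildRequests(String host, List<String> paths) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < paths.size(); i++) {
            out.append("GET ").append(paths.get(i)).append(" HTTP/1.1\r\n");
            out.append("Host: ").append(host).append("\r\n");
            if (i == paths.size() - 1) {
                // Ask for the connection to be closed after the last response.
                out.append("Connection: close\r\n");
            }
            out.append("\r\n"); // blank line ends each request
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList("/index.html", "/logo.gif");
        System.out.print(buildRequests("www.example.com", paths));
    }
}
```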

There are a lot of other smaller improvements in HTTP 1.1. Requests include a Host MIME header so that one web server can easily serve different sites at different URLs. Servers and browsers can exchange compressed files and particular byte ranges of a document, both of which can decrease network traffic. And HTTP 1.1 is designed to work much better with proxy servers. Although HTTP 1.1 isn’t quite finished, it is relatively stable, and most major web servers implement at least some parts of it. Web clients (that is, browsers) are a little further behind, but the more recent browsers implement parts as well. HTTP 1.1 is a strict superset of HTTP 1.0, so HTTP 1.1 web servers have no trouble interacting with older browsers that speak only HTTP 1.0.
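A request that uses some of these features might be assembled as in the following sketch. The class name Http11RequestSketch, the host name, and the byte range are hypothetical; Host, Accept-Encoding, and Range are standard HTTP 1.1 request headers, but whether a given server honors the compression or range request is up to that server.

```java
// Hypothetical helper class; builds one HTTP/1.1 request as a string.
class Http11RequestSketch {

    // Builds a GET request that names the site being addressed (Host),
    // says compressed data is acceptable (Accept-Encoding), and asks for
    // only part of the document (Range).
    static String buildGet(String host, String path,
                           long firstByte, long lastByte) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Accept-Encoding: gzip\r\n"
             + "Range: bytes=" + firstByte + "-" + lastByte + "\r\n"
             + "\r\n";
    }

    public static void main(String[] args) {
        // Request the first 500 bytes of a document from a made-up host.
        System.out.print(buildGet("www.example.com", "/index.html", 0, 499));
    }
}
```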
