HTTP, the Hypertext Transfer Protocol, is the standard protocol for communication between web browsers and web servers. HTTP specifies how a client and server establish a connection, how the client requests data from the server, how the server responds to that request, and finally how the connection is closed. HTTP connections use the TCP/IP protocol for data transfer.
HTTP 1.0 is the currently accepted version of the protocol. It uses MIME to encode data. The basic protocol defines a sequence of four steps for each request from a client to the server:
Making the connection. The client establishes a TCP connection to the server, on port 80 by default; other ports may be specified in the URL.
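As a minimal sketch of the default-port rule, the following Java fragment works out which port a client would connect to for a given URL. The class name PortSketch, the method portFor, and the example.com URLs are hypothetical names chosen for illustration; only java.net.URI comes from the standard library.

```java
import java.net.URI;

// Hypothetical helper: picks the TCP port an HTTP client would connect to.
final class PortSketch {

    // Port 80 is the default for HTTP when the URL does not name a port.
    static final int DEFAULT_HTTP_PORT = 80;

    static int portFor(String url) {
        // URI.getPort() returns -1 when the URL contains no explicit port.
        int port = URI.create(url).getPort();
        return port == -1 ? DEFAULT_HTTP_PORT : port;
    }

    public static void main(String[] args) {
        System.out.println(portFor("http://www.example.com/index.html"));
        System.out.println(portFor("http://www.example.com:8080/index.html"));
    }
}
```

The resulting port, together with the host name, is what a client would hand to a java.net.Socket constructor to open the TCP connection.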
Making a request. The client sends a message to the server requesting the page at a specified URL. The format of this request is typically something like:
GET /index.html HTTP/1.0
GET is a keyword. /index.html is a relative URL to a file on the server. The file is assumed to be on the machine that receives the request, so there is no need to prefix it with http://www.thismachine.com/. HTTP/1.0 is the version of the protocol that the client understands. The request is terminated with two carriage return/linefeed pairs (\r\n\r\n in Java parlance) regardless of how lines are terminated on the client or server platform.
Although the GET line is all that is required, a client request can include other information as well. This takes the following form:
Keyword: Value
The most common such keyword is Accept, which tells the server what kinds of data the client can handle (though servers often ignore this). For example, the following line says that the client can handle four MIME types, corresponding to HTML documents, plain text, and GIF and JPEG images:
Accept: text/html, text/plain, image/gif, image/jpeg
User-Agent is another common keyword that lets the server know what browser is being used. This allows the server to send files optimized for the particular browser type. The line below says that the request comes from Version 2.4 of the Lynx browser:
User-Agent: Lynx/2.4 libwww/2.1.4
Finally, the request is terminated with a blank line; that is, two carriage return/linefeed pairs, \r\n\r\n. A complete request might look like:
GET /index.html HTTP/1.0
Accept: text/html
Accept: text/plain
User-Agent: Lynx/2.4 libwww/2.1.4
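The request format described above can be sketched in Java as follows. This is an illustration under stated assumptions rather than a complete client: the class name HttpRequestSketch and the method buildRequest are hypothetical, and the code only assembles the request text without opening a connection.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: assembles the text of an HTTP/1.0 request.
final class HttpRequestSketch {

    // HTTP lines end with carriage return + linefeed on every platform.
    static final String CRLF = "\r\n";

    static String buildRequest(String method, String path, Map<String, String> headers) {
        StringBuilder request = new StringBuilder();
        // Request line: keyword, relative URL, protocol version.
        request.append(method).append(' ').append(path).append(" HTTP/1.0").append(CRLF);
        // Optional "Keyword: Value" lines, one per header.
        for (Map.Entry<String, String> header : headers.entrySet()) {
            request.append(header.getKey()).append(": ").append(header.getValue()).append(CRLF);
        }
        // A blank line terminates the request.
        request.append(CRLF);
        return request.toString();
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Accept", "text/html, text/plain");
        headers.put("User-Agent", "Lynx/2.4 libwww/2.1.4");
        System.out.print(buildRequest("GET", "/index.html", headers));
    }
}
```

To send the request, a client would write the bytes of this string to the output stream of a socket connected to the server. Because the map holds one value per header name, several Accept types are joined in a single comma-separated value here instead of being repeated on separate lines.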
In addition to GET, there are several other request types. HEAD retrieves only the header for the file, not the actual data. This is commonly used to check the modification date of a file, to see whether a copy stored in the local cache is still valid. POST sends form data to the server, and PUT uploads a file to the server.
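Seen from the server's side, the first line of a request has to be split into its three parts before anything else can happen. The sketch below shows one way that might look in Java; RequestLineSketch, its Method enum, and responseIncludesDocument are hypothetical names, and only the four request types named in the text are recognized.

```java
// Hypothetical helper: parses a request line such as "GET /index.html HTTP/1.0".
final class RequestLineSketch {

    // The four request types discussed in the text.
    enum Method { GET, HEAD, POST, PUT }

    final Method method;
    final String path;
    final String version;

    private RequestLineSketch(Method method, String path, String version) {
        this.method = method;
        this.path = path;
        this.version = version;
    }

    static RequestLineSketch parse(String line) {
        // Keyword, URL, and version are separated by whitespace.
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Malformed request line: " + line);
        }
        // Method.valueOf throws IllegalArgumentException for an unknown keyword.
        return new RequestLineSketch(Method.valueOf(parts[0]), parts[1], parts[2]);
    }

    // HEAD asks for the header only; the other types also get a document back.
    boolean responseIncludesDocument() {
        return method != Method.HEAD;
    }
}
```

A server loop might call RequestLineSketch.parse on the first line it reads and use responseIncludesDocument to decide whether to send the file contents after the header.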
The response. The server sends a response to the client. The response begins with a response code, followed by MIME header information, then a blank line, then the requested document or an error message. Assuming the requested file is found, a typical response looks like this:
HTTP/1.0 200 OK
Server: NCSA/1.4.2
MIME-version: 1.0
Content-type: text/html
Content-length: 107

<html>
<Head>
<Title>
A Sample HTML file
</Title>
</Head>
<body>
The rest of the document goes here
</body>
</html>
The first line indicates the protocol the server is using (HTTP/1.0), followed by a response code. 200 OK is the most common response code, indicating that the request was successful. Table 3-1 lists the response codes used by HTTP 1.0; HTTP 1.1 adds many more to this list. The other header lines identify the server software (the NCSA server, Version 1.4.2), the version of MIME in use, the MIME content type, and the length of the document delivered (not counting this header); in this case, 107 bytes.
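A client has to take such a response apart again: status line first, then the header lines up to the blank line, then the document. The following Java sketch shows one possible approach, assuming the whole response is already available as a string; HttpResponseSketch and its fields are hypothetical names, and real clients would read from a stream and handle character encodings more carefully.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: splits a raw HTTP response into its parts.
final class HttpResponseSketch {
    final String version;
    final int statusCode;
    final String reason;
    final Map<String, String> headers;
    final String body;

    private HttpResponseSketch(String version, int statusCode, String reason,
                               Map<String, String> headers, String body) {
        this.version = version;
        this.statusCode = statusCode;
        this.reason = reason;
        this.headers = headers;
        this.body = body;
    }

    static HttpResponseSketch parse(String raw) {
        // The blank line (two CRLF pairs in a row) separates header from document.
        int headerEnd = raw.indexOf("\r\n\r\n");
        if (headerEnd < 0) {
            throw new IllegalArgumentException("No blank line after the header");
        }
        String[] lines = raw.substring(0, headerEnd).split("\r\n");

        // Status line: protocol version, numeric code, optional reason text.
        String[] status = lines[0].split(" ", 3);
        if (status.length < 2) {
            throw new IllegalArgumentException("Malformed status line: " + lines[0]);
        }
        int code = Integer.parseInt(status[1]);
        String reason = status.length == 3 ? status[2] : "";

        // Header names are case-insensitive, so store them in lowercase.
        Map<String, String> headers = new LinkedHashMap<>();
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':');
            if (colon < 0) {
                continue; // skip lines that are not "Keyword: Value"
            }
            String name = lines[i].substring(0, colon).trim().toLowerCase(Locale.ROOT);
            headers.put(name, lines[i].substring(colon + 1).trim());
        }
        return new HttpResponseSketch(status[0], code, reason, headers,
                raw.substring(headerEnd + 4));
    }
}
```

After parsing, a caller could look up headers.get("content-length") to learn how many bytes of document to expect.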
Closing the connection. Either the client or the server or both close the connection. Thus, a separate network connection is used for each request. If the client reconnects, the server retains no memory of the previous connection or its results. A protocol that retains no memory of past requests is called stateless ; in contrast, a stateful protocol such as FTP can process many requests before the connection is closed. The lack of state is both a strength and a weakness of HTTP.
Table 3-1. HTTP 1.0 Response Codes
Response Code | Meaning |
---|---|
2xx Successful | Response codes between 200 and 299 indicate that the request was received, understood, and accepted. |
200 OK | This is the most common response code. If the request used |
201 Created | The server has created a data file at a URL specified in the body of the response. The web browser should now attempt to load that URL. This is sent only in response to |
202 Accepted | This rather uncommon response indicates that a request (generally from |
204 No Content | The server has successfully processed the request but has no information to send back to the client. This is usually the result of a poorly written form-processing CGI program that accepts data but does not return a response to the user indicating that it has finished. |
3xx Redirection | Response codes from 300 to 399 indicate that the web browser needs to go to a different page. |
300 Multiple Choices | The page requested is available from one or more locations. The body of the response includes a list of locations from which the user or web browser can pick the most appropriate one. If the server prefers one of these choices, the URL of this choice is included in a |
301 Moved Permanently | The page has moved to a new URL. The web browser should automatically load the page at this URL and update any bookmarks that point to the old URL. |
302 Moved Temporarily | This unusual response code indicates that a page is temporarily at a new URL but that the document’s location will change again in the foreseeable future, so bookmarks should not be updated. |
304 Not Modified | The client has performed a |
4xx Client Error | Response codes from 400 to 499 indicate that the client has erred in some fashion, though this may as easily be the result of an unreliable network connection as it is of a buggy or nonconforming web browser. The browser should stop sending data to the server as soon as it receives a 4xx response. Unless it is responding to a |
400 Bad Request | The client request to the server used improper syntax. This is rather unusual, though it is likely to happen if you’re writing and debugging a client. |
401 Unauthorized | Authorization, generally username and password controlled, is required to access this page. Either the username and password have not yet been presented or the username and password are invalid. |
403 Forbidden | The server understood the request but is deliberately refusing to process it. Authorization will not help. One reason this occurs is that the client asks for a directory listing but the server is not configured to provide it, as shown in Figure 3.1. |
404 Not Found | This most common error response indicates that the server cannot find the requested page. It may indicate a bad link, a page that has moved with no forwarding address, a mistyped URL, or something similar. |
5xx Server Error | Response codes from 500 to 599 indicate that something has gone wrong with the server, and the server cannot fix the problem. |
500 Internal Server Error | An unexpected condition occurred that the server does not know how to handle. |
501 Not Implemented | The server does not have the feature that is needed to fulfill this request. A server that cannot handle |
502 Bad Gateway | This response is applicable only to servers that act as proxies or gateways. It indicates that the proxy received an invalid response from a server it was connecting to in an effort to fulfill the request. |
503 Service Unavailable | The server is temporarily unable to handle the request, perhaps because of overloading or maintenance. |
HTTP 1.1 more than doubles the number of responses. However, a response code from 200 to 299 always indicates success; a response code from 300 to 399 always indicates redirection; one from 400 to 499 always indicates a client error; and one from 500 to 599 indicates a server error.
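Because the first digit alone determines the category, a client can classify any response code, including ones it has never seen, with simple integer division. A minimal Java sketch follows; StatusClassSketch and its Category enum are hypothetical names, and codes outside the four ranges named above are reported as UNKNOWN here.

```java
// Hypothetical helper: maps a response code to the category of its range.
final class StatusClassSketch {

    enum Category { SUCCESS, REDIRECTION, CLIENT_ERROR, SERVER_ERROR, UNKNOWN }

    static Category classify(int code) {
        // Integer division by 100 leaves only the leading digit of a three-digit code.
        switch (code / 100) {
            case 2:  return Category.SUCCESS;       // 200 to 299
            case 3:  return Category.REDIRECTION;   // 300 to 399
            case 4:  return Category.CLIENT_ERROR;  // 400 to 499
            case 5:  return Category.SERVER_ERROR;  // 500 to 599
            default: return Category.UNKNOWN;       // anything else
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(200));
        System.out.println(classify(404));
    }
}
```

This lets a client written against the HTTP 1.0 codes react sensibly to a newer code it does not recognize, for example by treating an unfamiliar 4xx code like a generic client error.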
HTTP 1.0 is documented in the informational RFC 1945; it is not an official Internet standard because it was primarily developed outside the IETF by early browser and server vendors. HTTP 1.1 is a proposed standard being developed by the W3C and the HTTP working group of the IETF. It provides for much more flexible and powerful communication between the client and the server. It’s also a lot more scalable.
The primary improvement in HTTP 1.1 is connection reuse. HTTP 1.0 opens a new connection for every request. In practice, the time taken to open and close all the connections opened in a typical web session can outweigh the time taken to transmit the data, especially for sessions with many small documents. HTTP 1.1 allows a browser to send many different requests over a single connection; the connection remains open until it is explicitly closed. Requests can also be pipelined: a browser doesn’t need to wait for a response to its first request before sending a second or a third, although the server returns the responses in the order in which the requests were sent. However, HTTP 1.1 remains tied to the basic pattern of a client request, followed by a server response that consists of a series of headers, followed by a blank line, followed by MIME-encoded data.
There are a lot of other smaller improvements in HTTP 1.1. Requests
include a Host header so that one web server
can easily serve different sites at different URLs. Servers and
browsers can exchange compressed files and particular byte ranges of
a document, both of which can decrease network traffic. And HTTP 1.1
is designed to work much better with proxy servers. Although HTTP 1.1
isn’t quite finished, it is relatively stable, and most major
web servers implement at least some parts of it. Web clients (that
is, browsers) are a little further behind, but the more recent
browsers implement parts as well. HTTP 1.1 is a strict superset of
HTTP 1.0, so HTTP 1.1 web servers have no trouble interacting with
older browsers that speak only HTTP 1.0.
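To make the HTTP 1.1 changes concrete, the sketch below assembles several GET requests intended for one connection, each carrying a Host header. It is an illustration under assumptions, not a full client: PersistentRequestsSketch, buildRequests, and the example.com host are hypothetical, and the Connection: close header on the last request is one common way for a client to signal that it is finished with the connection.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds the text of several HTTP/1.1 requests
// that a client could write, back to back, to a single connection.
final class PersistentRequestsSketch {

    static final String CRLF = "\r\n";

    static String buildRequests(String host, List<String> paths) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < paths.size(); i++) {
            out.append("GET ").append(paths.get(i)).append(" HTTP/1.1").append(CRLF);
            // The Host header tells a server hosting several sites which one is meant.
            out.append("Host: ").append(host).append(CRLF);
            // Only the final request asks the server to close the connection.
            if (i == paths.size() - 1) {
                out.append("Connection: close").append(CRLF);
            }
            out.append(CRLF); // blank line ends each request
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(buildRequests("www.example.com",
                Arrays.asList("/index.html", "/logo.gif")));
    }
}
```

Writing this text to one socket replaces the two separate connections an HTTP 1.0 client would have needed for the same two files.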