Introduction to HTTP

Before looking at Java servlets in more detail, you will need an understanding of the Web protocol, HTTP (Hypertext Transfer Protocol), and how a browser interprets HTML (Hypertext Markup Language) to display a Web page. If you are comfortable with these topics, feel free to skip to the next section, titled “The Servlet Environment.”

HTTP is a protocol standard specified by the Internet Engineering Task Force (IETF). HTTP/1.1, the version used in the examples that follow, is defined in RFC 2616, available from http://www.ietf.org/rfc.html.

HTTP Request Structure

HTTP is a stateless protocol, which has the advantage that the server does not have the overhead of tracking client connections. This was completely satisfactory when the primary use of the Web was to transfer static data. Most dynamic Web applications, however, now require interaction between the client and the server, and information about the client state must be retained between page requests. Later, you will learn how a servlet can overcome the stateless nature of HTTP by tracking client state using session information stored in the URL, hidden fields, or cookies.

An HTTP transaction consists of a request and a response. Regardless of which type it is, every HTTP message has three parts:

  • A single request or response line: A client request line consists of an HTTP method (usually GET or POST), followed by a document address and the HTTP version number being used. For example,

    GET /contents.html HTTP/1.1
    

    uses the HTTP GET method to request the document contents.html using HTTP version 1.1. The response line contains an HTTP status code that indicates whether the request was successful (understood and satisfied) or if not, why not.

  • The HTTP headers: A set of fields used to exchange information between the client and the server. For example, the following tells the server that the client will accept the ISO-8859-5 and Unicode 1.1 character sets:

    Accept-Charset: iso-8859-5, unicode-1-1
    

    The client uses the headers to tell the server about its configuration and the document types it will accept. The server, in turn, uses the headers to return information about the requested document, such as its age and location.

  • The HTTP body: The client optionally uses the body to send additional information (see the POST method, described later). The server uses the body to return the requested document.
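The three parts described above can be sketched in Java as follows. This is only an illustration, not how a servlet container parses requests: the class name HttpRequestParts is hypothetical, and the sketch assumes CRLF line endings and ignores header values folded across several lines.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper (not part of any servlet API) that splits a raw HTTP request into its three parts. */
final class HttpRequestParts {
    final String method;
    final String uri;
    final String version;
    final Map<String, String> headers = new LinkedHashMap<>();
    final String body;

    HttpRequestParts(String raw) {
        // A blank line (CRLF CRLF) separates the headers from the optional body.
        int blank = raw.indexOf("\r\n\r\n");
        String head = blank >= 0 ? raw.substring(0, blank) : raw;
        body = blank >= 0 ? raw.substring(blank + 4) : "";

        // Part 1, the request line: method, document address, HTTP version.
        String[] lines = head.split("\r\n");
        String[] requestLine = lines.length > 0 ? lines[0].split(" ") : new String[0];
        if (requestLine.length != 3) {
            throw new IllegalArgumentException("Malformed request line");
        }
        method = requestLine[0];
        uri = requestLine[1];
        version = requestLine[2];

        // Part 2, the headers: "Name: value" lines. Header names are
        // case-insensitive, so they are stored in lowercase here.
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':');
            if (colon > 0) {
                String name = lines[i].substring(0, colon).trim().toLowerCase(Locale.ROOT);
                headers.put(name, lines[i].substring(colon + 1).trim());
            }
        }
    }

    public static void main(String[] args) {
        HttpRequestParts request = new HttpRequestParts(
                "GET /contents.html HTTP/1.1\r\n"
                + "Accept-Charset: iso-8859-5, unicode-1-1\r\n"
                + "\r\n");
        System.out.println(request.method + " " + request.uri + " " + request.version);
        System.out.println(request.headers.get("accept-charset"));
    }
}
```

For a GET request such as this one, the body field is simply the empty string.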

Listing 12.1 shows a sample GET request. A GET request does not have a body, so this example only contains the request line and headers.

Listing 12.1. An Example HTTP GET Request
GET /some/url.html HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg,application/vnd.ms-excel,
 application/msword, application/vnd.ms-powerpoint, */*
Referer: http://www.somewhere.com/search?sourceid=navclient&q=http+request+
Accept-Language: en-gb
Accept-Encoding: gzip
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
Host: localhost:8000
Connection: Keep-Alive
						

Uniform Resource Identifiers (URIs)

The term Uniform Resource Locator (URL) is well known. A URL identifies a resource (such as an electronic document, an image, or a mailbox) by its location and the protocol used to access it. A resource can also be identified by its name, in which case a Uniform Resource Name (URN) is used. (That is the theory; as yet, a method of providing a universal namespace has not been achieved.) Both URLs and URNs are kinds of Uniform Resource Identifier (URI).

NOTE

A URL is therefore also a URI and, because URNs are not commonly used, you will often see the terms URL and URI used synonymously.


Using a simple syntax, URIs make use of a variety of naming schemes and access methods, such as HTTP, FTP, and Internet mail, to identify online resources.

The syntax of an HTTP URL is as follows:

http_URL = "http://" host [ ":" port] [ path ]

where

  • host is a legal Internet host domain name or IP address (in dotted-decimal form).

  • port is the port number (also known as the socket or service number) to connect to on the host. The default port number for HTTP is 80.

  • path is the path to the document on the host.
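To see how the host, port, and path map onto a real address, you can take a URL apart with the standard java.net.URI class. The following is a minimal sketch; the class name HttpUrlParts and the helper effectivePort are made up for illustration, and the example URL is simply a local test address.

```java
import java.net.URI;

/** Hypothetical helper showing how the parts of an HTTP URL can be read with java.net.URI. */
final class HttpUrlParts {
    static final int DEFAULT_HTTP_PORT = 80;

    /** Returns the port named in the URL, or 80 when the URL omits it. */
    static int effectivePort(URI url) {
        // URI.getPort() returns -1 when the URL string contains no port.
        return url.getPort() == -1 ? DEFAULT_HTTP_PORT : url.getPort();
    }

    public static void main(String[] args) {
        URI url = URI.create("http://localhost:8000/j2ee?day=12");
        System.out.println(url.getHost());      // host
        System.out.println(effectivePort(url)); // port, defaulting to 80
        System.out.println(url.getPath());      // path
    }
}
```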

The term URL is used more often than URI when referring to the HTTP address string and, for this reason, URL is the term that will be used for the rest of today's material.

NOTE

Space characters should be avoided in URLs because they may not be handled correctly on all platforms.
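When a value that contains spaces has to go into a URL, it is normally encoded first. The sketch below uses the standard java.net.URLEncoder class, which applies HTML form encoding (a space becomes +). The class and method names are hypothetical, and turning + into %20 for use in a path is a common approximation rather than a complete path encoder.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper showing two common ways of encoding a space in a URL. */
final class UrlSpaceEncoding {
    /** Encodes a value for use in a query string; a space becomes '+'. */
    static String encodeQueryValue(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    /** Approximates encoding for a path segment, where a space is written as %20. */
    static String encodePathSegment(String value) {
        // A literal '+' in the input has already become %2B, so any '+' left is a space.
        return encodeQueryValue(value).replace("+", "%20");
    }

    public static void main(String[] args) {
        System.out.println(encodeQueryValue("http request"));  // http+request
        System.out.println(encodePathSegment("my file.html")); // my%20file.html
    }
}
```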


HTTP GET and POST Methods

A Web browser client typically communicates with the server using one of two HTTP methods, GET or POST. In general, these methods are used as follows:

  • GET is used to request information from the server.

  • POST is used to send data to the server.

But as with many things, it is not quite that simple. The GET method can also be used to pass information in the form of a query string in the URL, and POST can be used for requests.

The following URL with a query string (the data following the ?) is passed by the GET method and sets a parameter called day to the value 12 (you will learn more about parameters later when you code some real servlets).

http://localhost:8000/j2ee?day=12

Because the query string is added to the end of the URL, information that is sent as part of a GET request is visible to the client and can be bookmarked and therefore re-run later. You will have seen examples of this many times when browsing the Web, especially when using search engines.
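The way a query string breaks down into named parameters can be sketched in plain Java. In a servlet the container does this work for you, so the class QueryStringParser below is purely a hypothetical illustration; it assumes UTF-8 encoding and keeps only the last value if a name is repeated.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that turns the query string of a URL into name/value pairs. */
final class QueryStringParser {
    static Map<String, String> parse(String rawQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        // Parameters are separated by '&'; each name is separated from its value by '='.
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            params.put(decode(name), decode(value));
        }
        return params;
    }

    private static String decode(String text) {
        try {
            // Reverses form encoding: '+' becomes a space, %XX becomes the encoded character.
            return URLDecoder.decode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        URI url = URI.create("http://localhost:8000/j2ee?day=12");
        // getRawQuery() returns the text after the '?', still encoded.
        System.out.println(parse(url.getRawQuery()).get("day")); // 12
    }
}
```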

In contrast, the POST method sends its data directly after the HTTP headers, in the body of the message, and does not append data to the URL (so even if you bookmark the page, the data is not available later). Some older browsers, servers, and proxies may not handle URLs longer than 255 bytes (RFC 2616 advises caution about depending on longer ones); therefore, when sending large amounts of information (such as a complex HTML form), the POST request should be used.
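To make the difference concrete, the following sketch assembles the text of a POST request that carries the same day=12 parameter in the body instead of the URL. The class name PostRequestBuilder is hypothetical, and the host and path are just the local test address used earlier; a real browser would add more headers.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that builds the text of a simple form POST request. */
final class PostRequestBuilder {
    static String build(String host, String path, String formData) {
        // Content-Length counts the bytes of the body; form-encoded data is plain ASCII.
        int length = formData.getBytes(StandardCharsets.US_ASCII).length;
        return "POST " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Content-Type: application/x-www-form-urlencoded\r\n"
                + "Content-Length: " + length + "\r\n"
                + "\r\n"      // the blank line ends the headers
                + formData;   // the data travels in the body, not in the URL
    }

    public static void main(String[] args) {
        System.out.println(build("localhost:8000", "/j2ee", "day=12"));
    }
}
```

Notice that the request line contains only /j2ee, so nothing about the parameter would be saved in a bookmark.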

In most other respects, the GET and POST methods can be thought of as the same. They both interact with the server and can be used to update or change the current Web page and change server-side properties. As a developer, you usually code your servlets to handle both GET and POST requests.

Other HTTP Methods

The following HTTP methods are used less often, but are covered here for completeness.

  • HEAD: This method can be used if the client wants information about a document but does not want the document to be returned. Following a HEAD request, the server responds with the HTTP headers only; no HTTP body is sent.

  • PUT: Requests the server to store the body of the request at a specified URL.

  • DELETE: Requests the removal of data at a URL.

  • OPTIONS: Requests information about the communications options available.

  • TRACE: Used for debugging. The server simply returns the request it received in the body of the response.

HTTP Response Structure

After processing a request, an HTTP server sends back an HTTP response to the client. The response has a single response line indicating the status of the response, followed by optional header lines and the requested resource. The response headers will look something like Listing 12.2.

Listing 12.2. HTTP Response Header
HTTP/1.1 200 OK
Date: Tue, 20 Nov 2001 09:23:44 GMT
Server: Netscape-Enterprise/3.5.1G
Last-modified: Mon, 12 Nov 2001 15:31:26 GMT
Content-type: text/html
Content-length: 2048
Page-Completion-Status: Normal

The server sends back a status code as part of the first line of the response, followed by header fields describing the document. A blank line separates the headers from the document itself.
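The layout just described can be sketched in Java by splitting a response into its status line and its document. The class HttpResponseParts is hypothetical and assumes CRLF line endings; it skips the header fields to keep the example short.

```java
/** Hypothetical helper that extracts the status line and the document from a raw HTTP response. */
final class HttpResponseParts {
    final String version;
    final int statusCode;
    final String reasonPhrase;
    final String body;

    HttpResponseParts(String raw) {
        // The blank line (CRLF CRLF) separates the headers from the document itself.
        int blank = raw.indexOf("\r\n\r\n");
        String head = blank >= 0 ? raw.substring(0, blank) : raw;
        body = blank >= 0 ? raw.substring(blank + 4) : "";

        // The first line is the response line: HTTP version, status code, reason phrase.
        int endOfLine = head.indexOf("\r\n");
        String statusLine = endOfLine >= 0 ? head.substring(0, endOfLine) : head;
        // Limit the split to 3 so that a phrase such as "Not Found" stays in one piece.
        String[] parts = statusLine.split(" ", 3);
        if (parts.length < 2) {
            throw new IllegalArgumentException("Malformed response line: " + statusLine);
        }
        version = parts[0];
        statusCode = Integer.parseInt(parts[1]);
        reasonPhrase = parts.length == 3 ? parts[2] : "";
    }

    public static void main(String[] args) {
        HttpResponseParts response = new HttpResponseParts(
                "HTTP/1.1 200 OK\r\n"
                + "Content-type: text/html\r\n"
                + "\r\n"
                + "<html></html>");
        System.out.println(response.statusCode + " " + response.reasonPhrase);
        System.out.println(response.body);
    }
}
```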

Most of the time, the status code is handled by the browser, but you will be familiar with one or two that are reported to the end user. In particular, you will have no doubt seen the ubiquitous 404 Not Found error that is sent when the server was unable to find the requested URL.

To aid in coding (and debugging) your servlets, it is useful to have a knowledge of the HTTP status codes. Status codes are grouped as shown in Table 12.1.

Table 12.1. HTTP Status Code Groups
Code       Description
100-199    Information indicating that the request has been received and is being processed.
200-299    Request was successful.
300-399    Further action is required.
400-499    Client error: the request is incomplete, incorrect, or cannot be fulfilled.
500-599    Server error has occurred.
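Because the group is determined by the first digit of the code, you can find it with a simple integer division. The sketch below is illustrative only; the class name StatusCodeGroup and the short group labels are my own shorthand rather than official names.

```java
/** Hypothetical helper that maps an HTTP status code to its group. */
final class StatusCodeGroup {
    static String describe(int code) {
        // Integer division by 100 leaves just the first digit of a three-digit code.
        switch (code / 100) {
            case 1: return "Informational";
            case 2: return "Successful";
            case 3: return "Further action required";
            case 4: return "Client error";
            case 5: return "Server error";
            default:
                throw new IllegalArgumentException("Not an HTTP status code: " + code);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(200)); // Successful
        System.out.println(describe(404)); // Client error
    }
}
```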

The handling of status codes is browser specific, but some status codes you may see include those shown in Table 12.2.

Table 12.2. HTTP Status Codes
Code   Error                   Description
400    Bad Request             The server detected a syntax error in the request.
401    Unauthorized            The request did not have the correct authorization.
403    Forbidden               The server understood the request but refuses to fulfill it.
404    Not Found               The document was not found.
500    Internal Server Error   Usually indicates that part of the server (probably your servlet) has crashed.
501    Not Implemented         The server does not support the functionality needed to perform the requested action.

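Most of these errors are outside your control as a servlet writer. A 501 error, however, can be generated by the server when a servlet is sent an HTTP request that it does not handle. For example, if you write your servlet to handle only GET requests but it receives a POST request, an error status is returned instead of a page. The exact code depends on the servlet API and the protocol version: the standard HttpServlet base class answers an unhandled POST over HTTP/1.1 with 405 Method Not Allowed, and reserves 501 for methods it does not recognize.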

As part of the response headers, a Content-Type field indicates the format of the data being sent in the response. The value of this field is a Multipurpose Internet Mail Extensions (MIME) type; MIME types are also used to describe the contents of an email.

NOTE

You can find out more about MIME by reading RFCs 2045 through 2049, which you can access at http://www.ietf.org/rfc.html.


Some self-explanatory MIME content types are as follows:

  • text/html

  • text/plain

  • image/gif

  • application/pdf

The browser can also use the Accept request header to specify the MIME types that it will accept, as in Listing 12.1.
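A server-side check of whether the browser will accept a given MIME type might look like the following sketch. The class AcceptHeader is hypothetical, and the matching is deliberately simplified: it understands wildcards such as */* and image/*, but it ignores the optional quality values (for example ;q=0.5) that a browser can attach to each type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper that checks a MIME type against the value of an Accept header. */
final class AcceptHeader {
    /** Splits the header value into its comma-separated types, dropping any parameters. */
    static List<String> mediaRanges(String headerValue) {
        List<String> ranges = new ArrayList<>();
        for (String item : headerValue.split(",")) {
            int semicolon = item.indexOf(';');
            String range = (semicolon >= 0 ? item.substring(0, semicolon) : item)
                    .trim().toLowerCase(Locale.ROOT);
            if (!range.isEmpty()) {
                ranges.add(range);
            }
        }
        return ranges;
    }

    /** Returns true if the header lists the type itself or a wildcard that covers it. */
    static boolean accepts(String headerValue, String mimeType) {
        String type = mimeType.toLowerCase(Locale.ROOT);
        int slash = type.indexOf('/');
        if (slash < 0) {
            throw new IllegalArgumentException("Not a MIME type: " + mimeType);
        }
        String wholeFamily = type.substring(0, slash) + "/*"; // for example text/*
        for (String range : mediaRanges(headerValue)) {
            if (range.equals("*/*") || range.equals(type) || range.equals(wholeFamily)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String accept = "image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*";
        System.out.println(accepts(accept, "text/html"));                  // true, because of */*
        System.out.println(accepts("image/gif, image/jpeg", "text/html")); // false
    }
}
```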
