When designing an architecture that
would allow them to build a self-extensible browser, the engineers at
Sun divided the problem into two parts: handling protocols and
handling content. Handling a protocol means taking care of the
interaction between a client and a server: generating requests in the
correct format, interpreting the headers that come back with the
data, acknowledging that the data has been received, etc. Handling
the content means converting the raw data into a format Java
understands, for example, an InputStream
or an
AudioClip
. These two problems, handling protocols
and handling content, are distinct. The software that displays a GIF
image doesn’t care whether the image was retrieved via FTP,
HTTP, gopher, or some new protocol. Likewise, the protocol handler,
which manages the connection and interacts with the server,
doesn’t care if it’s receiving an HTML file or an MPEG
movie file; at most, it will extract a content type from the headers
to pass along to the content handler.
Java divides the task of handling protocols into a number of pieces.
As a result, there is no single class called
ProtocolHandler
. Instead, pieces of the protocol
handler mechanism are implemented by four different classes in the
java.net
package: URL
,
URLStreamHandler
,
URLConnection
, and
URLStreamHandlerFactory
. URL
is
the only concrete class in this group;
URLStreamHandler
and
URLConnection
are both abstract classes, and
URLStreamHandlerFactory
is an interface.
Therefore, if you are going to implement a new protocol handler, you
have to write concrete subclasses for the
URLStreamHandler
and the
URLConnection
. To use these classes, you may also
have to write a class that implements the
URLStreamHandlerFactory
interface.
The way the URL
,
URLStreamHandler
, URLConnection
, and
URLStreamHandlerFactory
classes work together can
be confusing. Everything starts with a URL, which represents a
pointer to a particular Internet resource. Each URL specifies the
protocol used to access the resource; typical values for the protocol
include mailto
, http
, and
ftp
. When you construct a URL
object from the URL’s string representation, the constructor
strips the protocol field and passes it to the
URLStreamHandlerFactory
. The factory’s job
is to take the protocol, locate the right subclass of
URLStreamHandler
for the protocol, and create a
new instance of that stream handler, which is stored as a field
within the URL
object. Each application has at
most one URLStreamHandlerFactory
; once the factory
has been installed, attempting to install another will throw an
Error
.
Now that the URL
object
has a stream handler, it asks the stream handler to finish parsing
the URL string and create a subclass of
URLConnection
that knows how to talk to servers using
this protocol. URLStreamHandler
subclasses and
URLConnection
subclasses always come in pairs; the
stream handler for a protocol always knows how to find an appropriate
URLConnection
for its protocol. It is worth noting
that the stream handler does most of the work of parsing the URL. The
format of the URL, although it is standard, depends on the protocol;
therefore, it must be parsed by a
URLStreamHandler
, which knows about a particular
protocol, and not by the URL
object, which is
generic and thus should have no knowledge of specific protocols. This
also means that if you are writing a new stream handler, you can
define a new URL format that’s appropriate to your task.
The URLConnection
class, which you learned about
in the previous chapter, represents an active connection to the
Internet resource. It is responsible for interacting with the server.
A URLConnection
knows how to generate requests and
interpret the headers that the server returns. The output from a
URLConnection
is the raw data requested with all
traces of the protocol (headers, etc.) stripped, ready for processing
by a content handler.
In most applications, you don’t need to worry about
URLConnection
objects and stream handlers; they
are hidden by the URL
class, which provides a
simple interface to the methods you need. When you call the
openStream( ) and getContent( ) methods of the URL class, you are really calling the
getInputStream( ) and getContent( ) methods of the URLConnection
class. We have seen
that interacting directly with a URLConnection
can
be convenient when you need a little more control over communication
with a server, most commonly when downloading binary files.
However, the URLConnection
and
URLStreamHandler
classes are even more important
when you need to add new protocols. By writing subclasses of these
classes, you can add support for standard protocols such as finger,
whois, or NTP that Java doesn’t support out of the box.
Furthermore, you’re not limited to established protocols with
well-known services. You can create new protocols that perform
database queries, search across multiple Internet search engines,
view pictures from binary newsgroups, and more. You can add new kinds
of URLs as needed to represent the new types of resources.
In addition, Java applications can be built so that they can load new
protocol handlers at runtime. Unlike current browsers such as Mozilla
and Internet Explorer, which contain explicit knowledge of all the
protocols and content types they can handle, a Java browser can be a
relatively lightweight skeleton that loads new handlers as needed.
Supporting a new protocol just means adding some new classes in
predefined locations, not writing an entirely new release of the
browser.
What’s involved in adding support
for a new protocol? As I said earlier, you need to write two new
classes: a subclass of URLConnection
and a
subclass of URLStreamHandler
. You may also need to
write a class that implements the
URLStreamHandlerFactory
interface. Your
URLConnection
subclass handles the interaction
with the server, converts anything the server sends into an
InputStream
, and converts anything the client
sends into an OutputStream
. This subclass must
implement the abstract method connect( )
; it may
also override the concrete methods getInputStream( )
, getOutputStream( )
, and
getContentType( )
.
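As a rough illustration of such a subclass, here is a minimal sketch. The class name DemoURLConnection and the behavior of fabricating a response from the URL's path are assumptions made for this example; a real connection class would open a socket in connect( ) and speak its protocol to a server.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical connection class for illustration only. It never touches the
// network: the "server response" is built locally from the URL's path.
class DemoURLConnection extends URLConnection {

    private byte[] body; // filled in by connect()

    DemoURLConnection(URL url) {
        super(url);
    }

    // connect() is the one abstract method every subclass must implement.
    @Override
    public synchronized void connect() throws IOException {
        if (connected) {
            return; // already connected; nothing to do
        }
        body = ("You asked for " + url.getPath()).getBytes(StandardCharsets.UTF_8);
        connected = true; // protected flag inherited from URLConnection
    }

    // Overridden because the inherited version only throws an exception.
    @Override
    public InputStream getInputStream() throws IOException {
        connect();
        return new ByteArrayInputStream(body);
    }

    @Override
    public String getContentType() {
        return "text/plain";
    }

    public static void main(String[] args) throws IOException {
        // Any well-formed URL will do here, since nothing is actually fetched.
        URLConnection c = new DemoURLConnection(URI.create("http://example.com/hello").toURL());
        System.out.println(c.getContentType());
        System.out.println(new String(c.getInputStream().readAllBytes(), StandardCharsets.UTF_8));
    }
}
```

Calling connect( ) from getInputStream( ) lets callers read from the connection without connecting explicitly first.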
The URLStreamHandler
subclass parses the string
representation of the URL into its separate parts and creates a new
URLConnection
object that understands that
URL’s protocol. This subclass must implement the abstract
openConnection( )
method, which returns the new
URLConnection
to its caller. If the
String
representation of the URL doesn’t
look like a standard http URL,
then you should also override the parseURL( )
and
toExternalForm( )
methods.
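A stream handler can be very small when its URLs follow the usual scheme://host/path layout. The sketch below assumes an invented demo protocol and a hypothetical class named DemoHandler; the connection it returns is an empty placeholder. It relies on the inherited parseURL( ) method, and it attaches the handler through the three-argument URL constructor (deprecated in recent JDK releases, but still available) so that no factory or system property is needed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;

// Hypothetical handler for an invented "demo" protocol.
class DemoHandler extends URLStreamHandler {

    // openConnection() is the abstract method every stream handler must implement.
    @Override
    protected URLConnection openConnection(URL u) throws IOException {
        // Placeholder connection: a real handler would return its own
        // URLConnection subclass that talks to a server.
        return new URLConnection(u) {
            @Override
            public void connect() {
                connected = true;
            }

            @Override
            public InputStream getInputStream() {
                return new ByteArrayInputStream(new byte[0]);
            }
        };
    }

    public static void main(String[] args) throws IOException {
        // Passing the handler explicitly bypasses the normal handler lookup.
        URL u = new URL(null, "demo://example.com/readme", new DemoHandler());
        System.out.println(u.getProtocol() + " " + u.getHost() + " " + u.getPath());
        System.out.println(u.openConnection().getURL());
    }
}
```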
Finally, you may need to create a class that implements the
URLStreamHandlerFactory
interface. The
URLStreamHandlerFactory
helps the application find
the right protocol handler for each type of URL. The
URLStreamHandlerFactory
interface has a single
method, createURLStreamHandler( )
, which returns a
URLStreamHandler
object. This method must find the
appropriate subclass of URLStreamHandler
given
only the protocol (e.g., ftp);
that is, it must understand whatever package and class naming
conventions you use for your stream handlers. Since
URLStreamHandlerFactory
is an interface, you can
place your createURLStreamHandler( )
method in any
convenient class, perhaps the main class of your application.
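A factory can be as simple as a test on the protocol name. The following sketch assumes a hypothetical demo protocol and a hypothetical class named DemoHandlerFactory; the handler it returns is a stub whose openConnection( ) always fails. Returning null for any other protocol leaves the lookup to Java's remaining steps.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;

// Hypothetical factory that recognizes only the invented "demo" protocol.
class DemoHandlerFactory implements URLStreamHandlerFactory {

    @Override
    public URLStreamHandler createURLStreamHandler(String protocol) {
        if ("demo".equalsIgnoreCase(protocol)) {
            // Stub handler: a real one would return a working URLConnection.
            return new URLStreamHandler() {
                @Override
                protected URLConnection openConnection(URL u) throws IOException {
                    throw new IOException("demo protocol is not implemented in this sketch");
                }
            };
        }
        return null; // unknown protocol: fall back to the default lookup
    }

    public static void main(String[] args) {
        URLStreamHandlerFactory factory = new DemoHandlerFactory();
        System.out.println(factory.createURLStreamHandler("demo") != null);
        System.out.println(factory.createURLStreamHandler("ftp") != null);
    }
}
```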
When it first encounters a protocol, Java looks for
URLStreamHandler
classes in this order:
First, Java checks to see whether a
URLStreamHandlerFactory
is installed. If it is,
the factory is asked for a URLStreamHandler
for
the protocol.
If a URLStreamHandlerFactory
isn’t installed
or if Java can’t find a URLStreamHandler
for
the protocol, then Java looks in the packages named in the
java.protocol.handler.pkgs
system property for a
sub-package that shares the protocol name and a class called
Handler
. The value of this property is a list of
package names separated by a vertical bar (|
).
Thus, to indicate that Java should seek protocol handlers in the
com.macfaq.net.www
and
org.cafeaulait.protocols
packages, you would add
this line to your properties file:
java.protocol.handler.pkgs=com.macfaq.net.www|org.cafeaulait.protocols
Then to find an FTP protocol handler (for example), Java would look
first for the class
com.macfaq.net.www.ftp.Handler
. If that
weren’t found, Java would next try to instantiate
org.cafeaulait.protocols.ftp.Handler
.
Finally, if all else fails, Java looks for a URLStreamHandler named
sun.net.www.protocol.name.Handler, where name is replaced by the name of the
protocol; for example, sun.net.www.protocol.ftp.Handler.
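The naming convention in the last two steps can be modeled with a few lines of string handling. This sketch only computes the class names that would be tried, in order; it does not load anything, and the class HandlerSearchPath and its candidates( ) method are hypothetical names invented for the illustration. The actual lookup inside the JDK may differ between releases.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that lists, in search order, the handler class names
// implied by a java.protocol.handler.pkgs value and a protocol name.
class HandlerSearchPath {

    static List<String> candidates(String pkgsProperty, String protocol) {
        List<String> names = new ArrayList<>();
        if (pkgsProperty != null && !pkgsProperty.isEmpty()) {
            // Package prefixes are separated by vertical bars.
            for (String prefix : pkgsProperty.split("\\|")) {
                String p = prefix.trim();
                if (!p.isEmpty()) {
                    names.add(p + "." + protocol + ".Handler");
                }
            }
        }
        // The default location is tried last.
        names.add("sun.net.www.protocol." + protocol + ".Handler");
        return names;
    }

    public static void main(String[] args) {
        String pkgs = "com.macfaq.net.www|org.cafeaulait.protocols";
        for (String name : candidates(pkgs, "ftp")) {
            System.out.println(name);
        }
    }
}
```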
In the early days of Java (circa 1995), Sun promised that protocols could be installed at runtime from the server that used them. For instance, in 1996, James Gosling and Henry McGilton wrote: “The HotJava Browser is given a reference to an object (a URL). If the handler for that protocol is already loaded, it will be used. If not, the HotJava Browser will search first the local system and then the system that is the target of the URL.” [27] However, loading protocol handlers from web sites was never implemented, and Sun no longer talks about it much.
Most of the time, an end user who wants to permanently install an
extra protocol handler in a program such as HotJava will place the
necessary classes in the program’s class path and add the
package prefix to the java.protocol.handler.pkgs
property. However, a programmer who just wants to add a custom protocol
handler to her program at compile time will write and install a
URLStreamHandlerFactory
that knows how to find her
custom protocol handlers. The factory can tell an application to look
for URLStreamHandler
classes in any place
that’s convenient: on a web site, in the same directory as the
application, or somewhere in the user’s class path.
When each of these classes has been written and compiled,
you’re ready to write an application that uses your new
protocol handler. Assuming that you’re using a
URLStreamHandlerFactory
, pass the factory object
to the static URL.setURLStreamHandlerFactory( )
method like this:
URL.setURLStreamHandlerFactory(new MyURLStreamHandlerFactory( ));
This method can be called only once in the lifetime of an
application. If it is called a second time, it will throw an
Error
. Untrusted applets will generally not be
allowed to install factories or change the
java.protocol.handler.pkgs
property. Consequently,
protocol handlers are primarily of use to standalone applications
such as HotJava; Netscape and Internet Explorer use their own native
C code instead of Java to handle protocols, so they’re limited
to a fixed set of protocols.
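Because a second call throws an Error, code that cannot be sure whether a factory is already in place may want to guard the call. The sketch below is one possible approach, not an established idiom: FactoryInstaller and its install( ) method are hypothetical names, and the factory used in main( ) is a do-nothing placeholder that returns null for every protocol.

```java
import java.net.URL;
import java.net.URLStreamHandlerFactory;

// Hypothetical helper that reports whether the factory was actually installed.
class FactoryInstaller {

    // Returns true if this call installed the factory, or false if the JVM
    // rejected it because a factory had already been set.
    static boolean install(URLStreamHandlerFactory factory) {
        try {
            URL.setURLStreamHandlerFactory(factory);
            return true;
        } catch (Error alreadyDefined) {
            // Catching Error is broad; this sketch assumes the only Error
            // raised here is the one signaling an existing factory.
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder factory: null defers every protocol to Java's defaults.
        URLStreamHandlerFactory noOp = protocol -> null;
        System.out.println("first attempt installed: " + install(noOp));
        System.out.println("second attempt installed: " + install(noOp));
    }
}
```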
To summarize, here’s the sequence of events:
1. The program constructs a URL object.
2. The constructor uses the arguments it’s passed to determine the protocol part of the URL, e.g., http.
3. The URL( ) constructor tries to find a URLStreamHandler for the given protocol like this:
   a. If the protocol has been used before, then the URLStreamHandler object is retrieved from a cache.
   b. Otherwise, if a URLStreamHandlerFactory has been set, then the protocol string is passed to the factory’s createURLStreamHandler( ) method.
   c. If the protocol hasn’t been seen before and there’s no URLStreamHandlerFactory, then the constructor attempts to instantiate a URLStreamHandler object named protocol.Handler in one of the packages listed in the java.protocol.handler.pkgs property.
   d. Failing that, the constructor attempts to instantiate a URLStreamHandler object named protocol.Handler in the sun.net.www.protocol package.
   If any of these attempts succeeds in retrieving a URLStreamHandler object, the URL constructor sets the URL object’s handler field. If none of the attempts succeeds, the constructor throws a MalformedURLException.
4. The program calls the URL object’s openConnection( ) method.
5. The URL object asks the URLStreamHandler to return a URLConnection object appropriate for this URL. If there’s any problem, an IOException is thrown. Otherwise, a URLConnection object is returned.
6. The program uses the methods of the URLConnection class to interact with the remote resource.
Instead of calling openConnection( ) in step 4, the program can call the
URL object’s getContent( ) or openStream( ) method. In this case, the
URLStreamHandler still instantiates a URLConnection object of the appropriate class.
However, instead of returning the URLConnection object itself, the URL object
returns the result of the URLConnection’s getContent( ) or getInputStream( )
method.
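The two routes can be compared side by side. This sketch assumes an invented echo protocol whose hypothetical EchoHandler returns the URL's path as the response body, so it runs without a network; the handler is attached directly through the three-argument URL constructor (deprecated in recent JDK releases) instead of through the lookup sequence.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.nio.charset.StandardCharsets;

class EchoDemo {

    // Explicit route: obtain the URLConnection, then ask it for the stream.
    static String viaConnection(URL u) throws IOException {
        URLConnection c = u.openConnection();
        try (InputStream in = c.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Shortcut route: the URL opens the connection and returns its stream.
    static String viaShortcut(URL u) throws IOException {
        try (InputStream in = u.openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        URL u = new URL(null, "echo://localhost/hello", new EchoHandler());
        System.out.println(viaConnection(u));
        System.out.println(viaShortcut(u));
    }
}

// Hypothetical handler: the "response" is simply the path of the URL.
class EchoHandler extends URLStreamHandler {
    @Override
    protected URLConnection openConnection(URL u) {
        return new URLConnection(u) {
            @Override
            public void connect() {
                connected = true;
            }

            @Override
            public InputStream getInputStream() {
                return new ByteArrayInputStream(getURL().getPath().getBytes(StandardCharsets.UTF_8));
            }
        };
    }
}
```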
[27] James Gosling and Henry McGilton, The Java Language Environment, A White Paper, May 1996, http://java.sun.com/docs/white/langenv/HotJava.doc1.html.