Chapter 16. Protocol Handlers

When designing an architecture that would allow them to build a self-extensible browser, the engineers at Sun divided the problem into two parts: handling protocols and handling content. Handling a protocol means taking care of the interaction between a client and a server: generating requests in the correct format, interpreting the headers that come back with the data, acknowledging that the data has been received, etc. Handling the content means converting the raw data into a format Java understands, for example, an InputStream or an AudioClip. These two problems, handling protocols and handling content, are distinct. The software that displays a GIF image doesn’t care whether the image was retrieved via FTP, HTTP, gopher, or some new protocol. Likewise, the protocol handler, which manages the connection and interacts with the server, doesn’t care if it’s receiving an HTML file or an MPEG movie file; at most, it will extract a content type from the headers to pass along to the content handler.

Java divides the task of handling protocols into a number of pieces. As a result, there is no single class called ProtocolHandler. Instead, pieces of the protocol handler mechanism are implemented by four different classes in the java.net package: URL, URLStreamHandler, URLConnection, and URLStreamHandlerFactory. URL is the only concrete class in this group; URLStreamHandler and URLConnection are both abstract classes, and URLStreamHandlerFactory is an interface. Therefore, if you are going to implement a new protocol handler, you have to write concrete subclasses for the URLStreamHandler and the URLConnection. To use these classes, you may also have to write a class that implements the URLStreamHandlerFactory interface.

What Is a Protocol Handler?

The way the URL, URLStreamHandler, URLConnection, and URLStreamHandlerFactory classes work together can be confusing. Everything starts with a URL, which represents a pointer to a particular Internet resource. Each URL specifies the protocol used to access the resource; typical values for the protocol include mailto, http, and ftp. When you construct a URL object from the URL’s string representation, the constructor strips the protocol field and passes it to the URLStreamHandlerFactory. The factory’s job is to take the protocol, locate the right subclass of URLStreamHandler for the protocol, and create a new instance of that stream handler, which is stored as a field within the URL object. Each application has at most one URLStreamHandlerFactory; once the factory has been installed, attempting to install another will throw an Error.

Now that the URL object has a stream handler, it asks the stream handler to finish parsing the URL string and create a subclass of URLConnection that knows how to talk to servers using this protocol. URLStreamHandler subclasses and URLConnection subclasses always come in pairs; the stream handler for a protocol always knows how to find an appropriate URLConnection for its protocol. It is worth noting that the stream handler does most of the work of parsing the URL. The format of the URL, although it is standard, depends on the protocol; therefore, it must be parsed by a URLStreamHandler, which knows about a particular protocol, and not by the URL object, which is generic and thus should have no knowledge of specific protocols. This also means that if you are writing a new stream handler, you can define a new URL format that’s appropriate to your task.

The URLConnection class, which you learned about in the previous chapter, represents an active connection to the Internet resource. It is responsible for interacting with the server. A URLConnection knows how to generate requests and interpret the headers that the server returns. The output from a URLConnection is the raw data requested with all traces of the protocol (headers, etc.) stripped, ready for processing by a content handler.

In most applications, you don’t need to worry about URLConnection objects and stream handlers; they are hidden by the URL class, which provides a simple interface to the methods you need. When you call the openStream( ) and getContent( ) methods of the URL class, you are really calling the getInputStream( ) and getContent( ) methods of the URLConnection class. We have seen that interacting directly with a URLConnection can be convenient when you need a little more control over communication with a server, most commonly when downloading binary files.
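As a minimal sketch of that delegation, the following example reads a temporary local file through a file: URL twice: once with the URL class’s openStream( ) shortcut and once through an explicit URLConnection. The class name UrlDelegationDemo and the temporary file are assumptions made only for illustration; no network access is involved.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class UrlDelegationDemo {

    // Drains a stream into a UTF-8 string.
    static String drain(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // The shortcut: the URL opens the connection behind the scenes.
    static String readViaUrl(URL url) throws IOException {
        try (InputStream in = url.openStream()) {
            return drain(in);
        }
    }

    // The same work done explicitly through the URLConnection.
    static String readViaConnection(URL url) throws IOException {
        URLConnection connection = url.openConnection();
        try (InputStream in = connection.getInputStream()) {
            return drain(in);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("delegation", ".txt");
        try {
            Files.write(file, "hello".getBytes(StandardCharsets.UTF_8));
            URL url = file.toUri().toURL();
            // Both routes should yield identical content.
            System.out.println(readViaUrl(url).equals(readViaConnection(url)));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

The explicit route is the one to reach for when you need the connection object itself, for example to inspect headers before reading the data.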

However, the URLConnection and URLStreamHandler classes are even more important when you need to add new protocols. By writing subclasses of these classes, you can add support for standard protocols such as finger, whois, or NTP that Java doesn’t support out of the box. Furthermore, you’re not limited to established protocols with well-known services. You can create new protocols that perform database queries, search across multiple Internet search engines, view pictures from binary newsgroups, and more. You can add new kinds of URLs as needed to represent the new types of resources. Furthermore, Java applications can be built so that they can load new protocol handlers at runtime. Unlike current browsers such as Mozilla and Internet Explorer, which contain explicit knowledge of all the protocols and content types they can handle, a Java browser can be a relatively lightweight skeleton that loads new handlers as needed. Supporting a new protocol just means adding some new classes in predefined locations, not writing an entirely new release of the browser.

What’s involved in adding support for a new protocol? As I said earlier, you need to write two new classes: a subclass of URLConnection and a subclass of URLStreamHandler. You may also need to write a class that implements the URLStreamHandlerFactory interface. Your URLConnection subclass handles the interaction with the server, converts anything the server sends into an InputStream, and converts anything the client sends into an OutputStream. This subclass must implement the abstract method connect( ); it may also override the concrete methods getInputStream( ), getOutputStream( ), and getContentType( ).

The URLStreamHandler subclass parses the string representation of the URL into its separate parts and creates a new URLConnection object that understands that URL’s protocol. This subclass must implement the abstract openConnection( ) method, which returns the new URLConnection to its caller. If the String representation of the URL doesn’t look like a standard http URL, then you should also override the parseURL( ) and toExternalForm( ) methods.
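The two subclasses can be sketched together as follows. Everything here is an assumption made for illustration: the echo scheme is invented, the class names are hypothetical, and the connection contacts no server but simply returns the path portion of the URL as its content. The sketch relies on the inherited parseURL( ) method, which treats the text after echo: as the path. The handler is passed directly to the URL constructor so that no factory or system property is required.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.nio.charset.StandardCharsets;

// Hypothetical connection for an invented "echo" scheme.
class EchoURLConnection extends URLConnection {

    EchoURLConnection(URL url) {
        super(url);
    }

    @Override
    public void connect() throws IOException {
        // A real handler would open a socket to the server here;
        // this sketch only records that the connection is "open".
        connected = true;
    }

    @Override
    public InputStream getInputStream() throws IOException {
        if (!connected) {
            connect();
        }
        // The "response" is just the URL's path, with no protocol headers.
        byte[] data = url.getPath().getBytes(StandardCharsets.UTF_8);
        return new ByteArrayInputStream(data);
    }

    @Override
    public String getContentType() {
        return "text/plain";
    }
}

// The matching stream handler: its one required job is openConnection().
class EchoHandler extends URLStreamHandler {
    @Override
    protected URLConnection openConnection(URL u) throws IOException {
        return new EchoURLConnection(u);
    }
}

class EchoDemo {
    // Builds an echo: URL with an explicit handler and reads its content.
    static String fetch(String spec) throws IOException {
        URL url = new URL(null, spec, new EchoHandler());
        try (InputStream in = url.openStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fetch("echo:hello"));
    }
}
```

A handler for a real protocol would do its socket work in connect( ) and would strip any protocol headers before handing back the stream.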

Finally, you may need to create a class that implements the URLStreamHandlerFactory interface. The URLStreamHandlerFactory helps the application find the right protocol handler for each type of URL. The URLStreamHandlerFactory interface has a single method, createURLStreamHandler( ), which returns a URLStreamHandler object. This method must find the appropriate subclass of URLStreamHandler given only the protocol (e.g., ftp); that is, it must understand whatever package and class naming conventions you use for your stream handlers. Since URLStreamHandlerFactory is an interface, you can place your createURLStreamHandler( ) method in any convenient class, perhaps the main class of your application.
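One possible shape for such a factory is sketched below. Instead of a package naming convention, it uses an explicit registry that maps protocol names to handler suppliers; the class name RegistryHandlerFactory, the register( ) method, and the placeholder handler are all hypothetical. Returning null for an unregistered protocol lets Java continue with its other lookup mechanisms.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical factory backed by a protocol-name registry.
class RegistryHandlerFactory implements URLStreamHandlerFactory {

    private final Map<String, Supplier<URLStreamHandler>> registry =
            new ConcurrentHashMap<>();

    // Associates a protocol name (case-insensitive) with a handler supplier.
    void register(String protocol, Supplier<URLStreamHandler> supplier) {
        registry.put(protocol.toLowerCase(Locale.ROOT), supplier);
    }

    @Override
    public URLStreamHandler createURLStreamHandler(String protocol) {
        Supplier<URLStreamHandler> supplier =
                registry.get(protocol.toLowerCase(Locale.ROOT));
        // null means "not mine"; Java then tries its remaining lookups.
        return supplier == null ? null : supplier.get();
    }
}

class RegistryFactoryDemo {

    // Placeholder handler: a real one would return a working URLConnection.
    static URLStreamHandler placeholderHandler() {
        return new URLStreamHandler() {
            @Override
            protected URLConnection openConnection(URL u) throws IOException {
                throw new IOException("not implemented in this sketch");
            }
        };
    }

    public static void main(String[] args) {
        RegistryHandlerFactory factory = new RegistryHandlerFactory();
        factory.register("echo", RegistryFactoryDemo::placeholderHandler);
        System.out.println(factory.createURLStreamHandler("echo") != null);
        System.out.println(factory.createURLStreamHandler("ftp") == null);
    }
}
```

A registry keeps the mapping visible in one place; a factory that builds class names from the protocol and loads them with Class.forName( ) is an equally valid design when handlers follow a fixed naming convention.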

When it first encounters a protocol, Java looks for URLStreamHandler classes in this order:

  1. First, Java checks to see whether a URLStreamHandlerFactory is installed. If it is, the factory is asked for a URLStreamHandler for the protocol.

  2. If a URLStreamHandlerFactory isn’t installed or if Java can’t find a URLStreamHandler for the protocol, then Java looks in the packages named in the java.protocol.handler.pkgs system property for a sub-package that shares the protocol name and a class called Handler. The value of this property is a list of package names separated by a vertical bar (|). Thus, to indicate that Java should seek protocol handlers in the com.macfaq.net.www and org.cafeaulait.protocols packages, you would add this line to your properties file:

    java.protocol.handler.pkgs=com.macfaq.net.www|org.cafeaulait.protocols

    Then to find an FTP protocol handler (for example), Java would look first for the class com.macfaq.net.www.ftp.Handler. If that weren’t found, Java would next try to instantiate org.cafeaulait.protocols.ftp.Handler.

  3. Finally, if all else fails, Java looks for a URLStreamHandler named sun.net.www.protocol.name.Handler, where name is replaced by the name of the protocol; for example, sun.net.www.protocol.ftp.Handler.
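The class names tried in steps 2 and 3 can be sketched as a simple string computation. The following illustrates the naming scheme described above rather than reproducing the JDK’s own code: it loads no classes, it leaves out the factory check in step 1, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class HandlerLookupOrder {

    // Returns the handler class names that would be tried, in order, for a
    // protocol, given the value of java.protocol.handler.pkgs (may be null).
    static List<String> candidateClassNames(String protocol, String pkgsProperty) {
        List<String> names = new ArrayList<>();
        if (pkgsProperty != null) {
            // Package prefixes are separated by vertical bars.
            for (String prefix : pkgsProperty.split("\\|")) {
                prefix = prefix.trim();
                if (!prefix.isEmpty()) {
                    names.add(prefix + "." + protocol + ".Handler");
                }
            }
        }
        // The default location is tried last.
        names.add("sun.net.www.protocol." + protocol + ".Handler");
        return names;
    }

    public static void main(String[] args) {
        String pkgs = "com.macfaq.net.www|org.cafeaulait.protocols";
        for (String name : candidateClassNames("ftp", pkgs)) {
            System.out.println(name);
        }
        // Prints:
        // com.macfaq.net.www.ftp.Handler
        // org.cafeaulait.protocols.ftp.Handler
        // sun.net.www.protocol.ftp.Handler
    }
}
```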

Note

In the early days of Java (circa 1995), Sun was promising that protocols could be installed at runtime from the server that used them. For instance, in 1996, James Gosling and Henry McGilton wrote: “The HotJava Browser is given a reference to an object (a URL). If the handler for that protocol is already loaded, it will be used. If not, the HotJava Browser will search first the local system and then the system that is the target of the URL.” [27] However, the loading of protocol handlers from web sites was never implemented, and Sun doesn’t talk about it much anymore.

Most of the time, an end user who wants to permanently install an extra protocol handler in a program such as HotJava will place the necessary classes in the program’s class path and add the package prefix to the java.protocol.handler.pkgs property. However, a programmer who just wants to add a custom protocol handler to her program at compile time will write and install a URLStreamHandlerFactory that knows how to find her custom protocol handlers. The factory can tell an application to look for URLStreamHandler classes in any place that’s convenient: on a web site, in the same directory as the application, or somewhere in the user’s class path.

When each of these classes has been written and compiled, you’re ready to write an application that uses your new protocol handler. Assuming that you’re using a URLStreamHandlerFactory, pass the factory object to the static URL.setURLStreamHandlerFactory( ) method like this:

URL.setURLStreamHandlerFactory(new MyURLStreamHandlerFactory(  ));

This method can be called only once in the lifetime of an application. If it is called a second time, it will throw an Error. Untrusted applets will generally not be allowed to install factories or change the java.protocol.handler.pkgs property. Consequently, protocol handlers are primarily of use to standalone applications such as HotJava; Netscape and Internet Explorer use their own native C code instead of Java to handle protocols, so they’re limited to a fixed set of protocols.
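Because a second call throws an Error rather than an exception, code that cannot be sure it runs first may want to guard the installation. The following is a minimal sketch under that assumption; the FactoryInstaller class and its installOnce( ) method are hypothetical, and the demonstration factory returns null for every protocol so that Java’s normal lookup is left undisturbed.

```java
import java.net.URL;
import java.net.URLStreamHandlerFactory;

class FactoryInstaller {

    // Returns true if this call installed the factory, or false if a
    // factory had already been installed for this JVM.
    static boolean installOnce(URLStreamHandlerFactory factory) {
        try {
            URL.setURLStreamHandlerFactory(factory);
            return true;
        } catch (Error e) {
            // Only swallow a plain Error; subclasses such as
            // OutOfMemoryError are unrelated and must propagate.
            if (e.getClass() != Error.class) {
                throw e;
            }
            return false;
        }
    }

    public static void main(String[] args) {
        // A do-nothing factory: null tells Java to use its other lookups.
        URLStreamHandlerFactory factory = protocol -> null;
        System.out.println(installOnce(factory));
        // A repeated attempt is rejected instead of crashing the program.
        System.out.println(installOnce(factory));
    }
}
```

Catching Error is normally discouraged, so a guard like this is best confined to a single startup routine.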

To summarize, here’s the sequence of events:

  1. The program constructs a URL object.

  2. The constructor uses the arguments it’s passed to determine the protocol part of the URL, e.g., http.

  3. The URL( ) constructor tries to find a URLStreamHandler for the given protocol like this:

    1. If the protocol has been used before, then the URLStreamHandler object is retrieved from a cache.

    2. Otherwise, if a URLStreamHandlerFactory has been set, then the protocol string is passed to the factory’s createURLStreamHandler( ) method.

    3. If the protocol hasn’t been seen before and there’s no URLStreamHandlerFactory, then the constructor attempts to instantiate a URLStreamHandler object named protocol.Handler in one of the packages listed in the java.protocol.handler.pkgs property.

    4. Failing that, the constructor attempts to instantiate a URLStreamHandler object named protocol.Handler in the sun.net.www.protocol package.

    5. If any of these attempts succeeds in retrieving a URLStreamHandler object, the URL constructor sets the URL object’s handler field. If none of the attempts succeeds, the constructor throws a MalformedURLException.

  4. The program calls the URL object’s openConnection( ) method.

  5. The URL object asks the URLStreamHandler to return a URLConnection object appropriate for this URL. If there’s any problem, an IOException is thrown. Otherwise, a URLConnection object is returned.

  6. The program uses the methods of the URLConnection class to interact with the remote resource.

Instead of calling openConnection( ) in step 4, the program can call the URL object’s getContent( ) or openStream( ) method. In this case, the URLStreamHandler still instantiates a URLConnection object of the appropriate class. However, instead of returning the URLConnection object itself, the URL object returns the result of the URLConnection’s getContent( ) or getInputStream( ) method.
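The failure case in step 3 is easy to observe from a program. The sketch below assumes that no factory or package property supplies a handler for the invented scheme nosuchprotocol; the class and method names are hypothetical. Constructing a URL does not open a connection, so the check involves no network traffic.

```java
import java.net.MalformedURLException;
import java.net.URL;

class HandlerAvailability {

    // Reports whether Java can find a stream handler for the URL's protocol.
    // The URL constructor throws MalformedURLException when none is found
    // (or when the string is not a well-formed URL at all).
    static boolean hasHandler(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // http is among the protocols the JDK supports out of the box.
        System.out.println(hasHandler("http://www.example.com/"));
        // An invented scheme with no handler installed.
        System.out.println(hasHandler("nosuchprotocol://www.example.com/"));
    }
}
```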



[27] James Gosling and Henry McGilton, The Java Language Environment, A White Paper, May 1996, http://java.sun.com/docs/white/langenv/HotJava.doc1.html.
