Communicating with CGIs and Servlets Through GET

The URL class makes it easy for Java applets and applications to communicate with server-side CGI programs and servlets that use the GET method. (CGI programs and servlets that use the POST method require the URLConnection class and will be discussed in Chapter 15.) All you need to do is determine what combination of names and values the program expects to receive, then cook up a URL with a query string that provides the requisite names and values. All names and values must be x-www-form-url-encoded, for instance with the URLEncoder.encode( ) method discussed in the last section.
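As a rough illustration of "cooking up" a query string, the sketch below encodes each name and value separately and joins the pairs with ampersands. The class GetQuery and its encode method are hypothetical names; the sketch assumes UTF-8 and the URLEncoder.encode(String, Charset) overload available since Java 10.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for illustration; not part of java.net.
final class GetQuery {

  /** Encodes each pair as name=value and joins the pairs with '&'. */
  static String encode(Map<String, String> pairs) {
    StringJoiner query = new StringJoiner("&");
    for (Map.Entry<String, String> pair : pairs.entrySet()) {
      // Encode names and values separately so that the '=' and '&'
      // delimiters added here are the only unencoded ones.
      query.add(URLEncoder.encode(pair.getKey(), StandardCharsets.UTF_8)
          + "=" + URLEncoder.encode(pair.getValue(), StandardCharsets.UTF_8));
    }
    return query.toString();
  }
}
```

Passing a LinkedHashMap keeps the pairs in insertion order. For last=Hart, first=Jane, and note="a&b c", the result is last=Hart&first=Jane&note=a%26b+c: the ampersand inside the value is escaped, so the server can't mistake it for a pair separator.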

There are a number of ways to determine the exact syntax for a query string that talks to a particular CGI or servlet. If you’ve written the server-side program yourself, you already know what name-value pairs it expects. If you’ve installed a third-party program on your own server, the documentation for that program should tell you what it expects.

On the other hand, if you’re talking to a program on a third-party server, matters are a little trickier. You can always ask the people who run the remote server to provide you with the specifications for talking to their CGI programs. However, even if they don’t mind you doing this, there’s probably no one person whose job description includes “telling third-party hackers with whom we have no business relationship exactly how to access our servers”. Thus, unless you happen upon a particularly friendly or bored individual who has nothing better to do with her time than write long emails detailing exactly how to access her server, you’re going to have to do a little reverse engineering.

Many CGI programs are designed to process form input. If this is the case, it’s straightforward to figure out what input the CGI program expects. The method the form uses is the value of the METHOD attribute of the FORM element. This value should be either GET, in which case you use the process described here for talking to CGIs, or POST, in which case you use the process described in Chapter 15. The part of the URL that precedes the query string is given by the value of the ACTION attribute of the FORM element. Note that this may be a relative URL, in which case you’ll need to resolve it against the URL of the page containing the form to get the corresponding absolute URL. Finally, the names in the name-value pairs are simply the NAME attributes of the INPUT elements, except for any INPUT elements whose TYPE attribute has the value submit.
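Turning a relative ACTION into an absolute URL can be sketched with java.net.URI, whose resolve method applies the standard relative-reference rules. The class FormAction and its method are hypothetical names, and the sketch assumes the page has no BASE element that would change the base URL.

```java
import java.net.URI;

// Hypothetical helper; ignores any BASE element in the page.
final class FormAction {

  /** Resolves a form's ACTION attribute against the URL of the page that contains the form. */
  static URI absoluteAction(String pageUrl, String action) {
    // An absolute ACTION comes back unchanged; a relative one such as
    // "../cgi-bin/query" is merged with the page URL and normalized.
    return URI.create(pageUrl).resolve(action);
  }
}
```

For a form on http://www.example.com/forms/search.html, an ACTION of ../cgi-bin/query resolves to http://www.example.com/cgi-bin/query, while an absolute ACTION such as http://search.metalab.unc.edu:8765/query.html is returned as is.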

For example, consider this HTML form for the local search engine on my Cafe con Leche site. You can see that it uses the GET method. The CGI program that processes the form is found at the URL http://search.metalab.unc.edu:8765/query.html. It has 20 separate name-value pairs, most of which have default values:

<FORM NAME="seek" METHOD="GET" 
 ACTION="http://search.metalab.unc.edu:8765/query.html">
<INPUT TYPE="hidden" NAME="col" VALUE="metalab"></INPUT>
<INPUT TYPE="hidden" NAME="op0" VALUE="+"></INPUT>
<INPUT TYPE="hidden" NAME="fl0" VALUE="url:"></INPUT>
<INPUT TYPE="hidden" NAME="ty0" VALUE="w"></INPUT>
<INPUT TYPE="hidden" NAME="tx0" size="50" VALUE="xml/"></INPUT>
<INPUT TYPE="hidden" NAME="op1" VALUE="+"></INPUT>
<INPUT TYPE="hidden" NAME="fl1" VALUE=""></INPUT>
<INPUT TYPE="hidden" NAME="ty1" VALUE="w"></INPUT>
<INPUT TYPE="text" NAME="tx1" size="20" VALUE="" maxlength="2047"></INPUT>
<INPUT TYPE="hidden" NAME="qp" VALUE=""></INPUT>
<INPUT TYPE="hidden" NAME="qs" VALUE=""></INPUT>
<INPUT TYPE="hidden" NAME="qc" VALUE=""></INPUT>
<INPUT TYPE="hidden" NAME="ws" VALUE="0"></INPUT>
<INPUT TYPE="hidden" NAME="qm" VALUE="0"></INPUT>
<INPUT TYPE="hidden" NAME="st" VALUE="1"></INPUT>
<INPUT TYPE="hidden" NAME="nh" VALUE="10"></INPUT>
<INPUT TYPE="hidden" NAME="lk" VALUE="1"></INPUT>
<INPUT TYPE="hidden" NAME="rf" VALUE="0"></INPUT>
<INPUT TYPE="hidden" NAME="oq" VALUE=""></INPUT>
<INPUT TYPE="hidden" NAME="rq" VALUE="0"></INPUT>
<br />
<INPUT TYPE="submit" VALUE="Search"></input>
</FORM>

What matters is the name of each INPUT field and the value you give it, not the field’s type: it makes no difference whether it’s a set of checkboxes, a pop-up list, or a text field. The single exception is a submit input, which only tells the web browser when to send the data and gives the server no extra information. In some cases, you may find hidden INPUT fields that must be sent with particular default values. This form is almost nothing but hidden INPUT fields.
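To list a form’s name-value pairs without reading the HTML by eye, a crude regex-based sketch like the following can pull NAME and VALUE attributes out of INPUT tags and skip submit buttons, per the rule above. The class FormFields is hypothetical; it handles only double-quoted or unquoted attribute values, ignores SELECT and TEXTAREA elements, and does not model checkboxes or radio buttons, which are only submitted when checked. A real tool should use an HTML parser.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, regex-based sketch; not a substitute for an HTML parser.
final class FormFields {

  private static final Pattern INPUT =
      Pattern.compile("<input\\b([^>]*)>", Pattern.CASE_INSENSITIVE);
  // NAME="value" or NAME=value; single-quoted values are not handled.
  private static final Pattern ATTRIBUTE =
      Pattern.compile("(\\w+)\\s*=\\s*(?:\"([^\"]*)\"|([^\\s\"'>]+))");

  /** Returns the default name-value pairs of the INPUT elements, in document order. */
  static Map<String, String> defaults(String html) {
    Map<String, String> fields = new LinkedHashMap<>();
    Matcher input = INPUT.matcher(html);
    while (input.find()) {
      Map<String, String> attributes = new LinkedHashMap<>();
      Matcher attribute = ATTRIBUTE.matcher(input.group(1));
      while (attribute.find()) {
        String value = attribute.group(2) != null ? attribute.group(2) : attribute.group(3);
        attributes.put(attribute.group(1).toLowerCase(Locale.ROOT), value);
      }
      String name = attributes.get("name");
      String type = attributes.getOrDefault("type", "text");
      // Skip submit buttons (per the rule above) and inputs with no NAME.
      if (name == null || type.equalsIgnoreCase("submit")) {
        continue;
      }
      fields.put(name, attributes.getOrDefault("value", ""));
    }
    return fields;
  }
}
```

Run over the Cafe con Leche form, this would report the 20 pairs from col through rq, with tx1 empty because that is the field the user fills in.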

In some cases, the CGI may not be able to handle arbitrary text strings for values of particular inputs. However, since the form is meant to be read and filled in by human beings, it should provide sufficient clues to figure out what input is expected; for instance, that a particular field is supposed to be a two-letter state abbreviation or a phone number.
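When a form hints that a field has a restricted format, a client can check values before building the URL rather than relying on the CGI to cope. The class FieldChecks and both formats are hypothetical: it only checks shape (two letters for a state, ten digits for a U.S. phone number), not whether the state or number actually exists.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical pre-flight checks; the accepted formats are assumptions.
final class FieldChecks {

  private static final Pattern STATE = Pattern.compile("[A-Z]{2}");
  private static final Pattern PHONE_DIGITS = Pattern.compile("\\d{10}");

  /** True if the value looks like a two-letter state abbreviation, in either case. */
  static boolean isStateAbbreviation(String value) {
    return STATE.matcher(value.trim().toUpperCase(Locale.ROOT)).matches();
  }

  /** True if the value contains exactly ten digits once common punctuation is removed. */
  static boolean isPhoneNumber(String value) {
    // Strip whitespace, parentheses, dots, and hyphens, as in "(919) 555-1212".
    String digits = value.replaceAll("[\\s().-]", "");
    return PHONE_DIGITS.matcher(digits).matches();
  }
}
```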

A CGI that doesn’t respond to a form is much harder to reverse engineer. For example, at http://metalab.unc.edu/nywc/bios.phtml, you’ll find a lot of links to a CGI that talks to a database to retrieve a list of musical works by a particular named composer. However, there’s no form anywhere that corresponds to this CGI. It’s all done by hardcoded URLs. In this case, the best you can do is look at as many of those URLs as possible and see whether you can guess what the server expects. If the designer hasn’t tried to be too devious, this generally isn’t all that hard. For example, these URLs are all found on that page:

http://metalab.unc.edu/nywc/compositionsbycomposer.phtml?last=Anderson&first=Beth&middle=
http://metalab.unc.edu/nywc/compositionsbycomposer.phtml?last=Austin&first=Dorothea&middle=
http://metalab.unc.edu/nywc/compositionsbycomposer.phtml?last=Bliss&first=Marilyn&middle=
http://metalab.unc.edu/nywc/compositionsbycomposer.phtml?last=Hart&first=Jane&middle=Smith

Looking at these, you can probably guess that this particular CGI program expects three inputs named first, middle, and last, whose values are the first, middle, and last names of a composer, respectively. Sometimes the inputs may not have such obvious names. In this case, you’ll just have to do some experimenting: first copy some existing values, then tweak them to see which values are and aren’t accepted. You don’t need to do this in a Java program; you can do it simply by editing the URL in the Address or Location bar of your web browser window.
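Once the pattern is guessed, it helps to check the guess by regenerating URLs you already have. The sketch below rebuilds URLs in the same shape as the links above. The class ComposerQuery is hypothetical, the parameter order simply mirrors those links, and encoding with UTF-8 is an assumption about what the server accepts.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the URL pattern is inferred from the example links, not documented.
final class ComposerQuery {

  private static final String BASE =
      "http://metalab.unc.edu/nywc/compositionsbycomposer.phtml";

  /** Builds a lookup URL from a composer's last, first, and middle names. */
  static String urlFor(String last, String first, String middle) {
    return BASE
        + "?last=" + encode(last)
        + "&first=" + encode(first)
        + "&middle=" + encode(middle); // an empty middle name yields "middle="
  }

  private static String encode(String s) {
    return URLEncoder.encode(s, StandardCharsets.UTF_8);
  }
}
```

urlFor("Hart", "Jane", "Smith") reproduces the last link above exactly, which is a good sign the guess is right; tweaking the arguments from there is the programmatic version of editing the URL in a browser.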

Note

The likelihood that other hackers may experiment with your own CGIs and servlets in such a fashion is a good reason to make them extremely robust against unexpected input.

Regardless of how you determine the set of name-value pairs the CGI or servlet expects, actually communicating with the program once you know them is simple. All you have to do is create a query string that includes the necessary name-value pairs, then form a URL that includes that query string. You send the query string to the server and read its response using the same methods you use to connect to a server and retrieve a static HTML page. There’s no special protocol to follow once the URL is constructed. (There is a special protocol to follow for the POST method, which is why discussion of that method will have to wait until Chapter 15.)
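Since there is no special protocol, a GET request amounts to two small steps: append the encoded query string to the program’s URL, then read the response like any page. In this minimal sketch the class GetRequest and its methods withQuery and fetch are hypothetical names; the query string is assumed to be encoded already, and the response is assumed to be UTF-8.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration: a GET request is just a URL with a query string.
final class GetRequest {

  /** Appends an already-encoded query string to the program's base URL. */
  static URL withQuery(String base, String encodedQuery) {
    try {
      // URI.create rejects illegal characters such as raw spaces,
      // which catches query strings that were never encoded.
      return URI.create(base + "?" + encodedQuery).toURL();
    } catch (MalformedURLException e) {
      throw new IllegalArgumentException("Not a usable URL: " + base, e);
    }
  }

  /** Reads the response exactly as you would read a static HTML page. */
  static String fetch(URL url) throws IOException {
    try (InputStream in = url.openStream()) {
      // Assumes a UTF-8 response; a sturdier client would check the Content-Type header.
      return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }
  }
}
```

The split keeps URL construction, which is easy to test offline, apart from the network read, which uses the same openStream( ) call as any other page.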

Let’s demonstrate this procedure by writing a very simple command-line program to look up topics in the Netscape Open Directory (http://dmoz.org/). The Open Directory’s search interface, shown in Figure 7-3, has the advantage of being really simple.


Figure 7-3. The basic user interface for the Open Directory

The basic Open Directory interface is a simple form with one input field named search; input typed in this field is sent to a CGI program at http://search.dmoz.org/cgi-bin/search, which does the actual search. The HTML for the form looks like this:

<form method=get action="http://search.dmoz.org/cgi-bin/search">
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <input size=30 name=search>
<input type=submit value="Search">
<a href="http://search.dmoz.org/cgi-bin/search?a.x=0"><small><i>advanced</i>     
    </small></a>
</form>

Thus, to submit a search request to the Open Directory, you just need to collect the search string, encode it in a query string, and send it to http://search.dmoz.org/cgi-bin/search. For example, to search for “java”, you would open a connection to the URL http://search.dmoz.org/cgi-bin/search?search=java and read the resulting input stream. Example 7-10 does exactly this.

Example 7-10. Do an Open Directory Search

import com.macfaq.net.*;
import java.net.*;
import java.io.*;


public class DMoz {

  public static void main(String[] args) {
  
    // Join the command-line arguments into a single search string.
    String target = String.join(" ", args).trim(  );

    // QueryString (from com.macfaq.net) x-www-form-url-encodes
    // the name and value of the search parameter.
    QueryString query = new QueryString("search", target);
    try {
      URL u = new URL("http://search.dmoz.org/cgi-bin/search?" + query);
      // Read the response exactly as you would a static HTML page;
      // try-with-resources closes the stream when the loop finishes.
      try (Reader theHTML = new InputStreamReader(
             new BufferedInputStream(u.openStream(  )))) {
        int c;
        while ((c = theHTML.read(  )) != -1) {
          System.out.print((char) c);
        }
      }
    }
    catch (MalformedURLException e) {
      System.err.println(e);
    }
    catch (IOException e) {
      System.err.println(e);
    }
    
  }

}

Of course, much more effort would be required if you actually wanted to parse or display the results. But notice how little code it takes to talk to this CGI. Aside from the funky-looking URL and the slightly greater likelihood that some pieces of it need to be x-www-form-url-encoded, talking to a CGI that uses GET is no harder than retrieving any other HTML page.
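If you don’t have the com.macfaq.net package that Example 7-10 imports, the search URL can be built with URLEncoder directly. This sketch assumes UTF-8 encoding; the class DMozUrl and its searchUrl method are hypothetical names. It joins the words with spaces, as Example 7-10 does, and shows why encoding matters: a search for C++ must not send raw plus signs, which the server would read as spaces.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the QueryString-based URL building in Example 7-10.
final class DMozUrl {

  private static final String SEARCH = "http://search.dmoz.org/cgi-bin/search";

  /** Joins the words with spaces, then encodes them as the value of "search". */
  static String searchUrl(String... words) {
    String target = String.join(" ", words).trim();
    return SEARCH + "?search=" + URLEncoder.encode(target, StandardCharsets.UTF_8);
  }
}
```

Here searchUrl("java", "network", "programming") yields http://search.dmoz.org/cgi-bin/search?search=java+network+programming, and searchUrl("C++") yields ...?search=C%2B%2B.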
