The URLEncoder and URLDecoder Classes

One of the problems that the designers of the Web faced was differences between local operating systems. These differences can cause problems with URLs: for example, some operating systems allow spaces in filenames; some don’t. Most operating systems won’t complain about a # sign in a filename; in a URL, a # sign means that the filename has ended, and a named anchor follows. Similar problems are presented by other special characters, nonalphanumeric characters, etc., all of which may have a special meaning inside a URL or on another operating system. To solve these problems, characters used in URLs must come from a fixed subset of ASCII, in particular:

  • The capital letters A-Z

  • The lowercase letters a-z

  • The digits 0-9

  • The punctuation characters - _ . ! ~ * ' ( and )

The characters : / & ? @ # ; $ + = % and , may also be used, but only for their specified purposes. If these characters occur as part of a filename, then they, like all other characters outside the sets listed above, should be encoded.

The encoding used is very simple. Any characters that are not ASCII numerals, letters, or the punctuation marks specified earlier are represented by a percent sign followed by two hexadecimal digits giving the value for that character. Spaces are a special case because they’re so common. Besides being encoded as %20, they can be encoded as a plus sign (+). The plus sign itself is encoded as %2B. The / # = & and ? characters should be encoded when they are used as part of a name, and not as a separator between parts of the URL.
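
As a rough illustration of the rules just described, the following sketch encodes an ASCII string by hand. The class name PercentEncodeSketch and the decision to reject non-ASCII input are assumptions made for this example. It is not how java.net.URLEncoder is implemented, and unlike that class it leaves all of the punctuation marks listed earlier unencoded.

```java
public class PercentEncodeSketch {

  // Punctuation that the list above allows to appear unencoded.
  private static final String SAFE_PUNCTUATION = "-_.!~*'()";

  // Letters, digits, and the safe punctuation pass through unchanged,
  // a space becomes '+', and every other ASCII character becomes '%'
  // followed by two uppercase hexadecimal digits.
  public static String encode(String s) {
    StringBuilder result = new StringBuilder();
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c > 127) {
        // Assumption: this sketch handles only ASCII input.
        throw new IllegalArgumentException("Non-ASCII character at index " + i);
      }
      boolean isLetter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
      boolean isDigit = c >= '0' && c <= '9';
      if (isLetter || isDigit || SAFE_PUNCTUATION.indexOf(c) >= 0) {
        result.append(c);
      } else if (c == ' ') {
        result.append('+');
      } else {
        result.append('%');
        result.append(Character.toUpperCase(Character.forDigit(c >> 4, 16)));
        result.append(Character.toUpperCase(Character.forDigit(c & 0xF, 16)));
      }
    }
    return result.toString();
  }

  public static void main(String[] args) {
    System.out.println(encode("Java I/O"));  // prints Java+I%2FO
    System.out.println(encode("a+b=c"));     // prints a%2Bb%3Dc
  }
}
```

Note that the plus sign in the second call comes out as %2B, so a decoder can tell it apart from an encoded space.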

Note

This scheme doesn’t work well (or really at all) for multibyte character sets. This is a distinct shortcoming of the current URI specification that should be addressed in the future.

Java 1.0 and later provides a URLEncoder class to encode strings in this format. Java 1.2 adds a URLDecoder class that can decode strings in this format. Neither of these classes is meant to be instantiated; each provides a single static method to do its work:

public class URLDecoder extends Object
public class URLEncoder extends Object

URLEncoder

The java.net.URLEncoder class contains a single static method called encode( ) that encodes a String according to these rules:

public static String encode(String s)

URLEncoder.encode( ) changes any nonalphanumeric characters except the space, underscore, hyphen, period, and asterisk characters into % sequences. The space is converted into a plus sign. This method is a little overly aggressive in that it also converts tildes, single quotes, exclamation points, and parentheses to percent escapes even though they don’t absolutely have to be. (In Java 1.0, URLEncoder was even more aggressive and also encoded asterisks and periods.) However, this isn’t forbidden by the URL specification, so web browsers will deal reasonably with these excessively encoded URLs. There’s no reason encode( ) couldn’t have been included in the URL class, but it wasn’t. The method returns a new, suitably encoded String. Example 7-8 uses this method to print various encoded strings.

Example 7-8. x-www-form-urlencoded Strings

import java.net.*;


public class EncodeTest {

  public static void main(String[] args) {

      System.out.println(URLEncoder.encode("This string has spaces"));
      System.out.println(URLEncoder.encode("This*string*has*asterisks"));
      System.out.println(URLEncoder.encode(
		"This%string%has%percent%signs"));
      System.out.println(URLEncoder.encode("This+string+has+pluses"));
      System.out.println(URLEncoder.encode("This/string/has/slashes"));
      System.out.println(URLEncoder.encode(
		"This\"string\"has\"quote\"marks"));
      System.out.println(URLEncoder.encode("This:string:has:colons"));
      System.out.println(URLEncoder.encode("This~string~has~tildes"));
      System.out.println(URLEncoder.encode(
		"This(string)has(parentheses)"));
      System.out.println(URLEncoder.encode("This.string.has.periods"));
      System.out.println(URLEncoder.encode(
		"This=string=has=equals=signs"));
      System.out.println(URLEncoder.encode("This&string&has&ampersands"));

  }

}

Here is the output:

% java EncodeTest
This+string+has+spaces
This*string*has*asterisks
This%25string%25has%25percent%25signs
This%2Bstring%2Bhas%2Bpluses
This%2Fstring%2Fhas%2Fslashes
This%22string%22has%22quote%22marks
This%3Astring%3Ahas%3Acolons
This%7Estring%7Ehas%7Etildes
This%28string%29has%28parentheses%29
This.string.has.periods
This%3Dstring%3Dhas%3Dequals%3Dsigns
This%26string%26has%26ampersands

Notice in particular that this method does encode the forward slash, the ampersand, the equals sign, and the colon. It does not attempt to determine how these characters are being used in a URL. Consequently, you have to encode your URLs piece by piece, rather than encoding an entire URL in one method call. This is an important point, because the primary use of URLEncoder is in preparing query strings for communicating with CGI programs that use GET. For example, suppose you want to encode this query string used for an AltaVista search:

pg=q&kl=XX&stype=stext&q=+"Java+I/O"&search.x=38&search.y=3

This code fragment encodes it:

String query = URLEncoder.encode(
 "pg=q&kl=XX&stype=stext&q=+\"Java+I/O\"&search.x=38&search.y=3");
System.out.println(query);

Unfortunately, the output is:

pg%3Dq%26kl%3DXX%26stype%3Dstext%26q%3D%2B%22Java%2BI%2FO%22%26search
.x%3D38%26search.y%3D3

The problem is that URLEncoder.encode( ) encodes blindly. It can’t distinguish between special characters used as part of the URL or query string, like & and = in the previous string, and characters that need to be encoded. Consequently, URLs need to be encoded a piece at a time like this:

String query = URLEncoder.encode("pg");
query += "=";
query += URLEncoder.encode("q");
query += "&";
query += URLEncoder.encode("kl");
query += "=";
query += URLEncoder.encode("XX");
query += "&";
query += URLEncoder.encode("stype");
query += "=";
query += URLEncoder.encode("stext");
query += "&";
query += URLEncoder.encode("q");
query += "=";
query += URLEncoder.encode("+\"Java I/O\"");
query += "&";
query += URLEncoder.encode("search.x");
query += "=";
query += URLEncoder.encode("38");
query += "&";
query += URLEncoder.encode("search.y");
query += "=";
query += URLEncoder.encode("3");
System.out.println(query);

The output of this is what you actually want:

pg=q&kl=XX&stype=stext&q=%2B%22Java+I%2FO%22&search.x=38&search.y=3

Example 7-9 is a QueryString class that uses the URLEncoder to encode successive name and value pairs in a Java object, which will be used for sending data to CGI programs. When you create a QueryString, you can supply the first name-value pair to the constructor; the arguments are a pair of objects, which are converted to strings using their toString( ) methods and then encoded. To add further pairs, call the add( ) method, which also takes two objects as arguments, converts them to Strings, and encodes them. The QueryString class supplies its own toString( ) method, which simply returns the accumulated list of name-value pairs. toString( ) is called implicitly whenever you add a QueryString to another string or print it on an output stream.

Example 7-9. The QueryString Class

package com.macfaq.net;

import java.net.URLEncoder;

public class QueryString {

  private String query;

  public QueryString(Object name, Object value) {
    query = URLEncoder.encode(name.toString(  )) + "=" + 
      URLEncoder.encode(value.toString(  ));
  }
  
  public QueryString(  ) {
    query = "";
  }
  
  public synchronized void add(Object name, Object value) {
  
    if (!query.trim(  ).equals("")) query += "&" ; 
    query += URLEncoder.encode(name.toString(  )) + "=" + 
     URLEncoder.encode(value.toString(  ));
  
  }
  
  public String toString(  ) {
    return query;
  }

}

Using this class, we can now encode the previous example like this:

QueryString qs = new QueryString("pg", "q");
qs.add("kl", "XX");
qs.add("stype", "stext");
qs.add("q", "+\"Java I/O\"");
qs.add("search.x", "38");
qs.add("search.y", "3");
String url = "http://www.altavista.com/cgi-bin/query?" + qs;
System.out.println(url);

URLDecoder

Java 1.2 adds a corresponding URLDecoder class. This has a single static method that decodes any string encoded in x-www-form-urlencoded format. That is, it converts all plus signs to spaces and all percent escapes to their corresponding characters. Its signature is:

public static String decode(String s) throws Exception

An IllegalArgumentException is thrown if the string contains a percent sign that isn’t followed by two hexadecimal digits. Since this method passes all non-escaped characters along as is, you can pass an entire URL to it, rather than splitting it into pieces first. For example:

String input = "http://www.altavista.com/cgi-bin/" + 
"query?pg=q&kl=XX&stype=stext&q=%2B%22Java+I%2FO%22&search.x=38&search.y=3";
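try {
  String output = URLDecoder.decode(input);
  System.out.println(output);
}
catch (Exception e) {
  // decode( ) is declared to throw Exception, so a handler is required
  System.err.println(e);
}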
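
The decoding rules described above can also be sketched by hand. The class name PercentDecodeSketch is hypothetical, and this is only an illustration under the assumption that each percent escape stands for a single character; it is not the implementation of java.net.URLDecoder.

```java
public class PercentDecodeSketch {

  // Converts '+' to a space and each %XX escape to the character with
  // that hexadecimal value. All other characters pass through unchanged.
  public static String decode(String s) {
    StringBuilder result = new StringBuilder();
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c == '+') {
        result.append(' ');
      } else if (c == '%') {
        // A percent sign must be followed by exactly two hex digits.
        if (i + 2 >= s.length()) {
          throw new IllegalArgumentException("Incomplete escape at index " + i);
        }
        int high = Character.digit(s.charAt(i + 1), 16);
        int low = Character.digit(s.charAt(i + 2), 16);
        if (high < 0 || low < 0) {
          throw new IllegalArgumentException("Malformed escape at index " + i);
        }
        result.append((char) (high * 16 + low));
        i += 2;  // skip the two hex digits just consumed
      } else {
        result.append(c);
      }
    }
    return result.toString();
  }

  public static void main(String[] args) {
    System.out.println(decode("%2B%22Java+I%2FO%22"));  // prints +"Java I/O"
  }
}
```

Because unescaped characters are copied through as is, a whole URL can be passed to this method, just as with URLDecoder.decode( ).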
               
               
               