Google Web Services

The first step is to download the beta kit (or the actual web services package if this has turned into a product by the time you're reading this). Go to http://www.google.com/apis/, read the license, and click to download the zip file.

The license makes clear this is for personal use; you're not allowed to build this into commercial products. You're also restricted to fewer than 1000 queries a day, and not more frequently than one a second. Google may well offer a different license in future. You've got to respect their trademarks and agree this is beta software which might not work. The download is less than one MB, so only takes a few seconds.
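The two rate limits in the license can be enforced in your own code before a request ever leaves your machine. The following is a minimal sketch, not part of the Google kit: the class name QueryThrottle is hypothetical, and it assumes that a "day" is a fixed 24-hour window starting when the object is created (the license text quoted here does not say how Google counts).

```java
import java.util.function.LongSupplier;

/** Hypothetical client-side guard for the license limits described above. */
final class QueryThrottle {
    // "Fewer than 1000 queries a day" is read here as at most 999 (an assumption).
    static final int MAX_PER_DAY = 999;
    // "Not more frequently than one a second."
    static final long MIN_GAP_MS = 1000L;
    static final long DAY_MS = 24L * 60 * 60 * 1000;

    private final LongSupplier clock;   // supplies the current time in milliseconds
    private long windowStart;           // start of the current 24-hour window
    private int usedInWindow;           // queries granted in the current window
    private long lastQuery;             // time of the most recent granted query
    private boolean anyGranted;         // false until the first query is granted

    QueryThrottle(LongSupplier clock) {
        this.clock = clock;
        this.windowStart = clock.getAsLong();
    }

    /** Returns true if a query may be sent now, and records it if so. */
    synchronized boolean tryAcquire() {
        long now = clock.getAsLong();
        if (now - windowStart >= DAY_MS) {   // a new window begins
            windowStart = now;
            usedInWindow = 0;
        }
        if (usedInWindow >= MAX_PER_DAY) {
            return false;                    // daily allowance used up
        }
        if (anyGranted && now - lastQuery < MIN_GAP_MS) {
            return false;                    // less than a second since the last query
        }
        usedInWindow++;
        lastQuery = now;
        anyGranted = true;
        return true;
    }

    public static void main(String[] args) {
        QueryThrottle throttle = new QueryThrottle(System::currentTimeMillis);
        System.out.println("first query allowed:  " + throttle.tryAcquire());
        System.out.println("second query allowed: " + throttle.tryAcquire());
    }
}
```

Passing the clock in as a parameter is a design choice that makes the class easy to test without waiting for real time to pass. In a real program you would call tryAcquire() just before each request and sleep or skip the request when it returns false.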

Before you can go any further, you need to create a Google account. This is really just a free registration of your email address and a password. Google will send you an email, and when you click on the link in it, you're able to proceed. Your Google account can also be used to post to Usenet through the Google servers, to access Google mail, and a couple of other things.

Google will then send you a license key, also called a client key, by email. This is a 32-character (not 32-bit) string of mixed-case letters, digits, and punctuation. The following isn't a valid key, but a key will look something like this: hNpM%kKY6+k;j1hxkO3KnwQmso+/UH2g

You have to include your individual client key in all program interactions with the Google web service. That lets Google keep track of who is doing what, and selectively disable the service if necessary.

Contents of the Google Beta Kit

After you have downloaded the beta kit, unzip it. It's well behaved, and will unpack into a directory called “googleapi” that contains:

  • A brief program written in Java that demonstrates how to call the web service.

  • An HTML document that explains in detail the semantics of the function calls you can make using the Google Web APIs service. If you didn't already know how to filter a Google search by date-published, or only get details from one site, this will give you the inside information.

  • A jar file that you link against. This jar file does all the heavy lifting of XML formulation, SOAP communication, and parsing of the returned result. Seriously, this is so easy to use that you might get the wrong impression that all web services are this easy.

  • The WSDL file that describes the Google web services in XML.

  • Javadoc for the Google library contained in the jar file.

  • A readme file describing the above, and telling you how to run the demo immediately.

Let's look at some of these individual pieces.

The googleapi.jar library

The jar file googleapi.jar contains the items shown in Table 28-1.

Table 28-1. Contents of file googleapi.jar

File                                              Description
The jar file for package com.google.soap.search   Google's Java wrapper for the API SOAP calls.
activation.jar                                    Jar file for the JavaBeans Activation Framework. The Framework is a library for determining the MIME type of a file (is it a GIF, a JPEG, an audio file, etc.).
mailapi.jar                                       The JavaMail library. This is used for its character encoding features. UTF-8 is used everywhere, not Latin-1. ASCII is not affected, but clients will get accented characters as two bytes not one, and must handle that.
apache-soap-22.jar                                The Apache SOAP 2.2 library.
crimson.jar                                       The Apache Crimson 1.1.3 library. This is an XML parser.

The library has Google's SOAP endpoint address http://api.google.com/search/beta2 built into it. You don't actually need to know anything about these libraries, except the first one. You don't need to understand SOAP or UDDI or WSDL or even XML to use the Google web services API!
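The note in Table 28-1 about accented characters taking two bytes can be checked with nothing but the standard library. This is a small illustration, independent of the Google kit; the class name AccentBytes and the sample word are made up for the example.

```java
import java.nio.charset.StandardCharsets;

/** Illustration only: byte counts for the same text in UTF-8 and Latin-1. */
final class AccentBytes {

    static byte[] encodeUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    static int utf8Length(String s) {
        return encodeUtf8(s).length;
    }

    static int latin1Length(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    public static void main(String[] args) {
        String plain = "honey";        // pure ASCII
        String accented = "caf\u00e9"; // "cafe" with an acute accent on the e

        // ASCII text is the same size in both encodings.
        System.out.println(utf8Length(plain) + " " + latin1Length(plain));       // 5 5
        // The accented letter takes two bytes in UTF-8, one in Latin-1.
        System.out.println(utf8Length(accented) + " " + latin1Length(accented)); // 5 4
        // Decoding with the matching charset restores the original string.
        System.out.println(decodeUtf8(encodeUtf8(accented)).equals(accented));   // true
    }
}
```

The practical point is that a client should decode the bytes it receives as UTF-8 rather than assuming one byte per character.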

When you compile and when you run your own search programs, you need to make sure that library googleapi.jar is on your classpath for both the compiler and the JVM.

The Google WSDL file

The WSDL file provides a standard description of Google's search services. The file is included with the beta kit, and is also at http://api.google.com/GoogleSearch.wsdl. This XML file is about 200 lines long, and the first few lines look like the following example.

<?xml version="1.0"?>
<!-- WSDL description of the Google Web APIs.
     The Google Web APIs are in beta release. All interfaces are subject to
     change as we refine and extend our APIs. Please see the terms of use
     for more information. -->

<!-- Revision 2002-08-16 -->

<definitions name="GoogleSearch"
             targetNamespace="urn:GoogleSearch"
             xmlns:typens="urn:GoogleSearch"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema"
             xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
             xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"
             xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
             xmlns="http://schemas.xmlsoap.org/wsdl/">

  <!-- Types for search - result elements, directory categories -->

  <types>
    <xsd:schema xmlns="http://www.w3.org/2001/XMLSchema"
                targetNamespace="urn:GoogleSearch">

      <xsd:complexType name="GoogleSearchResult">
        <xsd:all>
          <xsd:element name="documentFiltering"           type="xsd:boolean"/>
          <xsd:element name="searchComments"              type="xsd:string"/>
          <xsd:element name="estimatedTotalResultsCount"  type="xsd:int"/>
          <xsd:element name="estimateIsExact"             type="xsd:boolean"/>
          <xsd:element name="resultElements"
                           type="typens:ResultElementArray"/>
          <xsd:element name="searchQuery"                 type="xsd:string"/>
          <xsd:element name="startIndex"                  type="xsd:int"/>
          <xsd:element name="endIndex"                    type="xsd:int"/>
          <xsd:element name="searchTips"                  type="xsd:string"/>
          <xsd:element name="directoryCategories"
                           type="typens:DirectoryCategoryArray"/>
          <xsd:element name="searchTime"                  type="xsd:double"/>
        </xsd:all>
      </xsd:complexType>

      <xsd:complexType name="ResultElement">
        <xsd:all>
          <xsd:element name="summary" type="xsd:string"/>
          <xsd:element name="URL" type="xsd:string"/>
          <xsd:element name="snippet" type="xsd:string"/>
          <xsd:element name="title" type="xsd:string"/>
          <xsd:element name="cachedSize" type="xsd:string"/>
          <xsd:element name="relatedInformationPresent" type="xsd:boolean"/>
          <xsd:element name="hostName" type="xsd:string"/>
          <xsd:element name="directoryCategory" type="typens:DirectoryCategory"/>
          <xsd:element name="directoryTitle" type="xsd:string"/>
        </xsd:all>
      </xsd:complexType>

The WSDL file describes the web services in a form that software can understand. You don't need to use it when you use the Google web services through the supplied library. You would need to use other people's Web Services Description Language files if you wanted your programs to connect to arbitrary web services dynamically.

Running the example program

The very first thing to try is running the example program that Google provides. You can do that by typing this command line, using your actual client key and any search term you want. If you have several search terms, enclose them in double quotes.

java -cp googleapi.jar com.google.soap.search.GoogleAPIDemo <key> search  honey

Here, “honey” is the search term we give Google, and <key> represents your client key. You'll either need to cd to the directory with the googleapi.jar file, or give its full pathname in the above command. When you do that successfully, the demo program will echo the parameters you have given it:

Parameters:
Client key = <your key>
Directive  = search
Args       = honey

There will be a pause of a second or two while the request is formulated in XML, wrapped in a SOAP bar, put on the wire, serviced by Google, and the XML response sent back to your system.

Then you'll see an answer starting like this:

Google Search Results:
======================
{
TM = 0.129773
Q  = "honey"
CT = ""
TT = ""
CATs =
  {
  {SE="", FVN="Top/Business/Industries/Food_and_Related_Products/Sweeteners/Honey"},
  {SE="", FVN="Top/Shopping/Food/Sweeteners/Honey"}
  }
Start Index = 1
End   Index = 10
Estimated Total Results Number = 3760000
Document Filtering = true
Estimate Correct = false
Rs =
  {

  [
  URL  = "http://www.honey-movie.com/"
  Title = "<b>Honey</b> DVD :: Hip Hop Dance Movie Stars Jessica Alba, Missy Elliot <b>...</b>"
  Snippet = ""
  Directory Category = {SE="", FVN="Top/Arts/Movies/Titles/H/Honey_-_2003"}
  Directory Title = "<b>Honey</b>"
  Summary = "Official site from Universal Pictures. Contains synopsis, trailer, photographs, cast and crew"
  Cached Size = "8k"
  Related information present = true
  Host Name = ""
  ],

  [
  URL  = "http://www.honey.com/"
  Title = "<b>Honey</b>.com - The <b>Honey</b> Expert"
  Snippet = "<b>Honey</b>.com is your source for <b>honey</b> information<br> and recipes. <b>Honey</b>.com -- the <b>honey</b> expert. <b>...</b>  "
  Directory Category = {SE="", FVN=""}
  Directory Title = ""
  Summary = ""
  Cached Size = "18k"
  Related information present = true
  Host Name = ""
  ],

You get up to ten results returned in this beta system. The Google web service gives you the results back in the form of an object, not in the form of XML. You call methods of that object to drill down on individual finds. The results returned will be the same as the first ten results you would get from an interactive query made in a browser at the same time.

Coding your own Google search

You can review the javadoc description of Google's web api in the directory created when you unzipped the download. It's in googleapi/javadoc/index.html.

You'll see that you can make three kinds of requests:

  • Search for web pages containing a term you provide.

  • Ask for spelling correction on a word you provide (did you know Google could do that?).

  • Ask for a web page from Google's cache, identified by the URL you provide. This is for looking at web pages that are not currently on the web, either because that web server is swamped, or because the owner has removed them.

You construct a request, and set the two mandatory attributes like this:

GoogleSearch s = new GoogleSearch();
s.setKey("your key goes here");
s.setQueryString("honey");

You initiate the web service request like this:

GoogleSearchResult r = s.doSearch();

This call needs to be inside a try statement that catches the GoogleSearchFault exception. The four lines above are enough to search, but you can also set a number of other attributes on the query (such as restricting results to the English language).

A short complete program looks like this:

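import com.google.soap.search.*;

// Minimal client: runs one Google search and prints the result.
public class search {
  public static void main(String[] args) {

    // Build the request and set the two mandatory attributes.
    GoogleSearch s = new GoogleSearch();
    s.setKey("your key goes here");
    s.setQueryString("honey");
    try {
        // Send the request to Google and wait for the result object.
        GoogleSearchResult r = s.doSearch();

        // toString() renders the whole result in human-readable form.
        System.out.println(r.toString());
    } catch (GoogleSearchFault f) {
        System.out.print("The call to the Google Web APIs failed: ");
        System.out.println(f.toString());
    }
  }
}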

When you compile and run the search program, you have to be careful to provide all the classpaths you need. The compiler needs to see the googleapi.jar class libraries:

javac -cp googleapi.jar  search.java

The JVM needs to see the library, and your search.class class file, so you probably want to include a “.” to represent current directory in the class path:

java -cp googleapi.jar;.  search

(On Unix, use a “:” not a “;” as the path separator, of course.) As you will see, you can just do a print() on the search results object returned to you. It has a pretty comprehensive toString() method that turns it into human readable output. Inside a program you'll want to call methods on the GoogleSearchResult object to get to the individual elements, and to get the fields out of the elements.
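Once you have pulled a title or snippet out of a result element, you will notice from the sample output earlier that these fields carry HTML markup such as <b> and <br>. If you want plain text, one simple approach is to strip the tags. This is a sketch under stated assumptions: the class ResultText and method stripTags are hypothetical helpers (not part of googleapi.jar), and the code only removes tags; it does not decode HTML entities.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: turns a marked-up title or snippet into plain text. */
final class ResultText {
    // Matches anything that looks like an HTML tag, e.g. <b>, </b>, <br>.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    static String stripTags(String html) {
        return TAG.matcher(html).replaceAll("").trim();
    }

    public static void main(String[] args) {
        // Title string copied from the sample output shown earlier.
        String title = "<b>Honey</b>.com - The <b>Honey</b> Expert";
        System.out.println(stripTags(title)); // Honey.com - The Honey Expert
    }
}
```

A regular expression is adequate here because the markup in these fields is simple; for arbitrary HTML a real parser would be the safer choice.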

Google searches can return millions of web pages. Some limits have been set on the Beta implementation to keep it real. A search string cannot be more than 2KB, or have more than 10 words in the query. Each query will return you no more than 10 results. The javadoc html and the googleapi/APIs_Reference.html pages have more details on this. The Google API is not constrained to a particular version of the JDK.
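You can check a query against these two limits before sending it, which saves one of your daily requests when the query would be rejected anyway. The sketch below is not part of the Google kit. The class name QueryLimits is hypothetical, and two details are assumptions: that "2KB" means 2048 bytes measured in UTF-8, and that words are separated by whitespace.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical pre-flight check for the beta query limits described above. */
final class QueryLimits {
    static final int MAX_BYTES = 2048; // "2KB", assumed to be counted in UTF-8 bytes
    static final int MAX_WORDS = 10;

    /** Number of whitespace-separated words in the query. */
    static int wordCount(String query) {
        String trimmed = query.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /** Size of the query when encoded as UTF-8. */
    static int byteLength(String query) {
        return query.getBytes(StandardCharsets.UTF_8).length;
    }

    /** True if the query is within both limits. */
    static boolean fitsBetaLimits(String query) {
        return byteLength(query) <= MAX_BYTES && wordCount(query) <= MAX_WORDS;
    }

    public static void main(String[] args) {
        System.out.println(fitsBetaLimits("honey"));                            // true
        System.out.println(fitsBetaLimits("a b c d e f g h i j k"));            // false: 11 words
    }
}
```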

What a delightful surprise the Google web service API turned out to be. It's surprisingly hard to make software easy to use. The Google folks obviously put some real thought into hiding the underlying complexities, and they did an excellent job. Let's go on and look at the Amazon web services.
