HTTP and Web Browsing: Retrieving HTTP Pages

Here is an example of interacting with an HTTP server to retrieve a web page from a system on the network. This shows how easy it is to post information to HTML forms. Forms are covered in more depth in Chapter 26, and you can peek ahead if you want.

HTML forms allow you to type some information in your browser, which is sent back to the server for processing. The information may be encoded as part of the URL, or sent separately in name/value pairs.
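As a rough illustration of the name/value encoding, the sketch below builds a query string with java.net.URLEncoder. The class name QueryString, the helper encode(), and the field names are made up for this example; a real form defines its own field names.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryString {

    // Hypothetical helper: joins name/value pairs as name=value&name=value,
    // escaping characters that are not allowed in a URL.
    static String encode(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] a) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("s", "abcd");        // illustrative field
        fields.put("note", "a b&c");    // illustrative field with special characters
        System.out.println(encode(fields));   // prints s=abcd&note=a+b%26c
    }
}
```

When the pairs are encoded as part of the URL, this string is what follows the question mark. When they are sent separately, the same string travels in the body of the request.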

The Yahoo site is a wide-ranging access portal. It offers online stock quotes that you can read in your browser. I happen to know (by looking at the URL field of my browser) that a request for a stock quote for ABCD is translated to a request for this URL:

http://finance.yahoo.com/q?s=abcd

That's equivalent to opening a socket on port 80 of finance.yahoo.com and sending “GET /q?s=abcd” (the method name is case-sensitive, so it must be in capitals). You can make that same request yourself, in either of two ways. You can open a socket connection to port 80, the HTTP port. Or you can open a URL connection, which offers a simpler, higher-level interface. We'll show both of these here.
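To make the mapping from URL to socket request concrete, here is a small sketch that splits the URL into the host to connect to, the port, and the text that follows GET. It uses java.net.URI only to take the string apart; the class name UrlParts and its helper methods are invented for illustration.

```java
import java.net.URI;

public class UrlParts {

    // The machine to open the socket to.
    static String host(String address) {
        return URI.create(address).getHost();
    }

    // getPort() returns -1 when the URL names no port; this sketch
    // assumes an http URL and falls back to the standard HTTP port, 80.
    static int port(String address) {
        int p = URI.create(address).getPort();
        return (p == -1) ? 80 : p;
    }

    // The part that follows "GET": the path plus any query string.
    static String target(String address) {
        URI u = URI.create(address);
        String query = u.getRawQuery();
        return u.getRawPath() + (query == null ? "" : "?" + query);
    }

    public static void main(String[] a) {
        String address = "http://finance.yahoo.com/q?s=abcd";
        System.out.println(host(address));     // prints finance.yahoo.com
        System.out.println(port(address));     // prints 80
        System.out.println(target(address));   // prints /q?s=abcd
    }
}
```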

Here's the stock finder done with sockets:

import java.io.*;
import java.net.*;
public class Stock {

    public static void main(String a[]) throws Exception {
        if (a.length!=1) {
            System.out.println("usage:  java Stock <symbol> ");
            System.exit(0);
        }

        String yahoo = "finance.yahoo.com";
        final int httpd = 80;
        Socket sock = new Socket(yahoo, httpd);

        PrintStream out =
               new PrintStream( sock.getOutputStream() );

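        // An HTTP/1.0 request: the request line, a Host header, then a
        // blank line. Each line ends in CRLF; without the final blank
        // line the server keeps waiting for the rest of the request.
        String cmd = "GET /q?s=" + a[0] + " HTTP/1.0\r\n"
                   + "Host: " + yahoo + "\r\n\r\n";
        out.print(cmd);
        out.flush();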

        BufferedReader in =new BufferedReader( 
          new InputStreamReader( sock.getInputStream() ) );
        String s=null;
        int i, j;
        // Pick out the stock price from the pile of HTML.
        // It's in big bold, so get the number following "<big><b>".
        while ( (s=in.readLine()) != null)  {
             if ((i=s.indexOf("<big><b>")) < 0) continue;
             j = s.indexOf("</b>", i);   // look for the close tag after the open tag
             if (j < 0) continue;        // close tag is not on this line
             s=s.substring(i+8,j);
             System.out.println(a[0] +" is at "+s);
             break;
        }
    }
}

The Yahoo page that returns stock quotes contains thousands of characters of hrefs to ads and formatting information. Luckily it's fairly easy to pull out the stock price. From inspecting the output, it's on a line bracketed by <big><b> ... </b>, which is HTML formatting to print the number in bold face. This type of program is called a “screenscraper,” and the technique has largely been replaced by XML markup, as we'll see in Chapter 27, “XML and Java.” Screenscrapers are horribly unreliable and break as soon as the web page appearance changes. Check the book website for the latest.
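The extraction step can be tried without a network connection. The following sketch isolates it in a helper that returns the text between an opening and a closing marker, or null when either marker is missing. The class name Scrape, the method between(), and the sample line are made up for illustration; they are not taken from the real Yahoo page.

```java
public class Scrape {

    // Returns the text between the first 'open' marker and the next
    // 'close' marker that follows it, or null if either is absent.
    static String between(String line, String open, String close) {
        int start = line.indexOf(open);
        if (start < 0) return null;
        start += open.length();
        // Search from 'start' so that an earlier close marker is ignored.
        int end = line.indexOf(close, start);
        if (end < 0) return null;
        return line.substring(start, end);
    }

    public static void main(String[] a) {
        String sample = "<td><big><b>88.19</b></big></td>";   // invented sample line
        System.out.println(between(sample, "<big><b>", "</b>"));   // prints 88.19
    }
}
```

Starting the second search after the opening marker avoids an exception when a stray </b> appears earlier on the same line.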

Given all that, running the program provides this output:

java Stock ibm
ibm is at 88.19

It was a whole lot more fun running this program in the ancient Bubbylonian era (spring 2000) than it is today. Here is the same program, rewritten to use the classes URL and URLConnection. Obviously, URL represents a URL, and URLConnection represents a socket connection to that URL. The code that does the same work as before, but using URLConnection, is:

import java.io.*;
import java.net.*;
public class Stock2 {

    public static void main(String a[]) throws Exception {
        if (a.length!=1) {
            System.out.println("usage:  java Stock2 <symbol> ");
            System.exit(0);
        }

        String yahoo = "http://finance.yahoo.com/q?s=" + a[0];

        URL url = new URL(yahoo);
        URLConnection conn = url.openConnection();

        BufferedReader in = new BufferedReader( new InputStreamReader(
                                                  conn.getInputStream()));
        String s=null;
        int i=0,j=0;
        while ( (s=in.readLine()) != null)  {
             if ((i=s.indexOf("<big><b>")) < 0) continue;
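             j = s.indexOf("</b>", i);   // look for the close tag after the open tag
             if (j < 0) continue;        // close tag is not on this line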
             s=s.substring(i+8,j);
             System.out.println(a[0] +" is at "+s);
             break;
        }
    }
}

The main difference here is that we form a URL for the site and file (script) that we want to reference. We finish up as before, reading what the socket writes back and extracting the characters of interest.

Clearly, both programs will stop working when Yahoo changes the format of the page, but they demonstrate how we can use a URL and URLConnection for a slightly higher-level interface than a socket connection. We could even go one step further and use the class HttpURLConnection, which is a subclass of URLConnection. Please look at the HTML documentation for information on these classes.
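A minimal sketch of that further step might look like the following. For an http URL, openConnection() returns an object that can be cast to HttpURLConnection, which adds HTTP-specific calls such as setRequestMethod() and getResponseCode(). The class name StockStatus, the helper prepare(), and the five-second timeouts are assumptions chosen for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;

public class StockStatus {

    // Hypothetical helper: configures a connection but sends nothing yet.
    static HttpURLConnection prepare(String address) {
        try {
            URL url = URI.create(address).toURL();
            // openConnection() only creates the object; the request is
            // sent later, when the response is first asked for.
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("GET");
            conn.setConnectTimeout(5000);   // milliseconds, arbitrary choice
            conn.setReadTimeout(5000);      // milliseconds, arbitrary choice
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] a) throws IOException {
        if (a.length != 1) {
            System.out.println("usage:  java StockStatus <symbol> ");
            System.exit(1);
        }
        HttpURLConnection conn = prepare("http://finance.yahoo.com/q?s=" + a[0]);
        int status = conn.getResponseCode();   // this call performs the request
        System.out.println("HTTP status: " + status);
        conn.disconnect();
    }
}
```

The status code lets a program tell a missing page or a redirect apart from a successful reply before it tries to scrape anything.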

How to find the IP address given to a machine name

The class java.net.InetAddress represents IP addresses and about one dozen common operations on them. The class should have been called IP or IPAddress, but was not (presumably because such a name does not match the coding conventions for classnames). Common operations on IP addresses are things like: turning an IP address into the characters that represent the corresponding domain name, turning a host name into an IP address, determining if a given address belongs to the system you are currently executing on, and so on.
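One of those operations, deciding whether an address belongs to the machine the program is running on, can be sketched as below. This is one common approach rather than the only one: it treats loopback and wildcard addresses as local and otherwise asks java.net.NetworkInterface whether any interface carries the address. The class name LocalCheck and the method isLocal() are invented for this example.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;

public class LocalCheck {

    // Returns true if 'addr' appears to belong to this machine.
    static boolean isLocal(InetAddress addr) {
        // Loopback (127.0.0.1, ::1) and wildcard (0.0.0.0, ::) addresses
        // always refer to the local system.
        if (addr.isAnyLocalAddress() || addr.isLoopbackAddress()) return true;
        try {
            // getByInetAddress() returns null when no local network
            // interface has this address bound to it.
            return NetworkInterface.getByInetAddress(addr) != null;
        } catch (SocketException e) {
            return false;   // interfaces could not be queried; assume not local
        }
    }

    public static void main(String[] a) {
        InetAddress loop = InetAddress.getLoopbackAddress();
        System.out.println(loop.getHostAddress() + " local? " + isLocal(loop));
    }
}
```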

InetAddress has two subclasses:

Inet4Address: The class that represents classic, version 4, 32-bit IP addresses

Inet6Address: The class that represents version 6, 128-bit IP addresses

Your programs will not use these classes directly very much, as you can create sockets using domain and host names. Further, in most of the places where a hostname is expected (such as in a URL), a String that contains an IP address will work equally well. However, if native code passes you an IP address, these classes give you a way to work on it.
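As a small illustration of passing an IP address in a String, the sketch below hands literal addresses to InetAddress.getByName() and reports which subclass comes back. For a literal address, getByName() only checks the format and performs no name lookup. The class name AddrKind and the helper kindOf() are hypothetical.

```java
import java.net.Inet4Address;
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

public class AddrKind {

    // Hypothetical helper: names the subclass that a literal address maps to.
    static String kindOf(String literal) {
        try {
            InetAddress addr = InetAddress.getByName(literal);
            if (addr instanceof Inet4Address) return "IPv4";
            if (addr instanceof Inet6Address) return "IPv6";
            return "other";
        } catch (UnknownHostException e) {
            return "unresolved";   // not a valid literal or a resolvable name
        }
    }

    public static void main(String[] a) {
        System.out.println(kindOf("10.0.10.175"));   // prints IPv4
        System.out.println(kindOf("::1"));           // prints IPv6
    }
}
```

If the string holds a host name instead of a literal address, the same call triggers a real name lookup.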

The InetAddress class does not have any public constructors. Applications should use the methods getLocalHost(), getByName(), or getAllByName() to create a new InetAddress instance. The program that follows shows examples of each of these.

This code will be able to find the IP address of all computers it knows about. That may mean all systems that have an entry in the local hosts table, or (if it is served by a name server) the domain of the name server, which could be as extensive as a large subnet or the entire organization.

import java.io.*;
import java.net.*;
public class addr {

    public static void main(String a[]) throws Exception {

        InetAddress me = InetAddress.getByName("localhost");
        PrintStream o = System.out;      
        o.println("localhost by name =" + me );

        InetAddress me2 = InetAddress.getLocalHost();
        o.println("localhost by getLocalHost =" + me2 );

        InetAddress[] many = InetAddress.getAllByName("microsoft.com");
        for (int i=0; i<many.length; i++) 
               o.println( "Microsoft: " + many[i] );
    }
}

Run it with:

java addr

localhost by name =localhost/127.0.0.1
localhost by getLocalHost =zap/10.0.10.175
Microsoft: microsoft.com/207.46.230.218
Microsoft: microsoft.com/207.46.230.219
Microsoft: microsoft.com/207.46.197.100
Microsoft: microsoft.com/207.46.197.101
Microsoft: microsoft.com/207.46.197.102

The getAllByName() method reports all the IP addresses associated with a domain name. You can see from the output above that microsoft.com, like most big sites, is served by multiple IP addresses, on two different subnets (probably for fault tolerance). Each of those five IP addresses probably represents load balancer hardware fanning out to dozens of server nodes.
