Some Useful Programs

You now know everything there is to know about the java.net.InetAddress class. The tools in this class alone let you write some genuinely useful programs. Here we’ll look at two: one that queries your domain name server interactively and another that can improve the performance of your web server by processing log files offline.

HostLookup

nslookup is a Unix utility that converts hostnames to IP addresses and IP addresses to hostnames. It has two modes: interactive and command line. If you enter a hostname on the command line, nslookup prints the IP address of that host. If you enter an IP address on the command line, nslookup prints the hostname. If no hostname or IP address is entered on the command line, nslookup enters interactive mode, in which it reads hostnames and IP addresses from standard input and echoes back the corresponding IP addresses and hostnames until you type “exit”. Example 6-9 is a simple character-mode application called HostLookup, which emulates nslookup. It doesn’t implement any of nslookup’s more complex features, but it does enough to be useful.

Example 6-9. An nslookup Clone

import java.net.*;
import java.io.*;

public class HostLookup {

  public static void main (String[] args) {

    if (args.length > 0) { // use command line
      for (int i = 0; i < args.length; i++) {
        System.out.println(lookup(args[i]));
      }
    }
    else {
      BufferedReader in = new BufferedReader(
                           new InputStreamReader(System.in));
      System.out.println(
       "Enter names and IP addresses. Enter \"exit\" to quit.");
      try {
        while (true) {
          String host = in.readLine(  );
          // readLine() returns null at end of input, so check for that too
          if (host == null || host.equals("exit")) break;
          System.out.println(lookup(host));
        }
      }
      catch (IOException e) {
        System.err.println(e);
      }

   }

  } /* end main */


  private static String lookup(String host) {

    InetAddress thisComputer;
    byte[] address;

    // get the bytes of the IP address
    try {
      thisComputer = InetAddress.getByName(host);
      address = thisComputer.getAddress(  );
    }
    catch (UnknownHostException e) {
      return "Cannot find host " + host;
    }

    if (isHostName(host)) {
      // Print the IP address
      String dottedQuad = "";
      for (int i = 0; i < address.length; i++) {
        int unsignedByte = address[i] < 0 ? address[i] + 256 : address[i];
        dottedQuad += unsignedByte;
        if (i != address.length-1) dottedQuad += ".";
      }
      return dottedQuad;
    }
    else {  // this is an IP address
      return thisComputer.getHostName(  );
    }

  }  // end lookup

  private static boolean isHostName(String host) {

    char[] ca = host.toCharArray(  );
    // if we see a character that is neither a digit nor a period
    // then host is probably a host name
    for (int i = 0; i < ca.length; i++) {
      if (!Character.isDigit(ca[i])) {
        if (ca[i] != '.') return true;
      }
    }

    // Everything was either a digit or a period
    // so host looks like an IP address in dotted quad format
    return false;

   }  // end isHostName

 } // end HostLookup

Here’s some sample output; input typed by the user is in bold:

% java HostLookup utopia.poly.edu
128.238.3.21
% java HostLookup 128.238.3.21
utopia.poly.edu
% java HostLookup
Enter names and IP addresses. Enter "exit" to quit.
cs.nyu.edu
128.122.80.78
199.1.32.90
star.blackstar.com
localhost
127.0.0.1
cs.cmu.edu
128.2.222.173
rtfm.mit.edu
18.181.0.29
star.blackstar.com
199.1.32.90
cs.med.edu
Cannot find host cs.med.edu
exit

The HostLookup program is built using three methods: main( ), lookup( ), and isHostName( ). The main( ) method determines whether there are command-line arguments. If there are, main( ) calls lookup( ) to process each one. If there are none, it chains a BufferedReader to an InputStreamReader chained to System.in and reads input from the user with the readLine( ) method. (The warning in Chapter 4 about this method doesn’t apply here because we’re reading from the console, not a network connection.) If the line is “exit”, the program exits. Otherwise, the line is assumed to be a hostname or IP address and is passed to the lookup( ) method.

The lookup( ) method uses InetAddress.getByName( ) to find the requested host, regardless of the input’s format; remember that getByName( ) doesn’t care if its argument is a name or a dotted quad address. If getByName( ) fails, then lookup( ) returns a failure message. Otherwise, it gets the address of the requested system. Then lookup( ) calls isHostName( ) to determine whether the input string host is a hostname like cs.nyu.edu or a dotted quad format IP address like 128.122.153.70. isHostName( ) looks at each character of the string; if all the characters are digits or periods, isHostName( ) guesses that the string is a numeric IP address and returns false. Otherwise, isHostName( ) guesses that the string is a hostname and returns true. What if the string is neither? That is very unlikely, since if the string is neither a hostname nor an address, getByName( ) won’t be able to do a lookup and will throw an exception. However, it would not be difficult to add a test making sure that the string looks valid; this is left as an exercise for the reader. If the user types a hostname, lookup( ) returns the corresponding dotted quad address; we have already saved the address in the byte array address[], and the only complication is making sure that we don’t treat byte values from 128 to 255 as negative numbers. If the user types an IP address, then we use the getHostName( ) method to look up the hostname corresponding to the address, and return it.
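The two details mentioned above, widening signed bytes and checking that a string really looks like a dotted quad, can be sketched as follows. This is only an illustration, not part of HostLookup: the class name AddressText and both method names are hypothetical, the code assumes a current Java version, and the check is deliberately stricter than getByName( ), which also accepts some shorter numeric forms.

```java
final class AddressText {

  // Convert the raw bytes of an address to dotted quad text.
  // Masking with 0xFF widens each signed byte to an int in the range 0..255.
  static String format(byte[] address) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < address.length; i++) {
      if (i > 0) sb.append('.');
      sb.append(address[i] & 0xFF);
    }
    return sb.toString();
  }

  // Stricter test than isHostName(): exactly four decimal parts,
  // each one to three digits long and no larger than 255.
  static boolean looksLikeDottedQuad(String host) {
    String[] parts = host.split("\\.", -1);  // -1 keeps trailing empty parts
    if (parts.length != 4) return false;
    for (String part : parts) {
      if (part.isEmpty() || part.length() > 3) return false;
      for (int i = 0; i < part.length(); i++) {
        char c = part.charAt(i);
        if (c < '0' || c > '9') return false;
      }
      if (Integer.parseInt(part) > 255) return false;
    }
    return true;
  }

  public static void main(String[] args) {
    byte[] raw = {(byte) 128, (byte) 238, 3, 21};
    System.out.println(format(raw));                        // 128.238.3.21
    System.out.println(looksLikeDottedQuad("256.1.1.1"));   // false
  }
}
```

The mask does the same job as the conditional that adds 256 to negative values in lookup( ); which form to use is mostly a matter of taste.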

Processing Web Server Log Files

Web server logs track the hosts that access a web site. By default, the log reports the IP addresses of the hosts that connect to the server. However, you can often get more information from the names of those hosts than from their IP addresses. Most web servers have an option to store hostnames instead of IP addresses, but this can hurt performance because the server needs to make a DNS request for each hit. It is much more efficient to log the IP addresses and convert them to hostnames at a later time. This task can be done when the server isn’t busy or even on another machine completely. Example 6-10 is a program called Weblog that reads a web server log file and prints each line with IP addresses converted to hostnames.

Most web servers have standardized on the common log file format, although there are exceptions; if your web server is one of those exceptions, you’ll have to modify this program. A typical line in the common log file format looks like this:

205.160.186.76 unknown - [17/Jun/1999:22:53:58 -0500] "GET /bgs/greenbg.gif HTTP 1.0" 200 50

This means that a web browser at IP address 205.160.186.76 requested the file /bgs/greenbg.gif from this web server at 10:53 P.M. (and 58 seconds) on June 17, 1999. The file was found (response code 200), and 50 bytes of data were successfully transferred to the browser.

The first field is the IP address or, if DNS resolution is turned on, the hostname from which the connection was made. This is followed by a space. Therefore, for our purposes, parsing the log file is easy: everything before the first space is the IP address, and everything after it does not need to be changed.
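As a small illustration of that split, here is a sketch of a helper that cuts a log entry at its first space. The class and method names are hypothetical, and the guard for a line with no space at all is an addition for safety rather than something the log format requires.

```java
final class LogLineSplitter {

  // Returns {firstField, rest}. The rest keeps its leading space, so the
  // two pieces can be concatenated back together unchanged.
  static String[] split(String entry) {
    int index = entry.indexOf(' ');
    if (index < 0) {
      // no space on the line: treat the whole line as the first field
      return new String[] {entry, ""};
    }
    return new String[] {entry.substring(0, index), entry.substring(index)};
  }

  public static void main(String[] args) {
    String line = "205.160.186.76 unknown - [17/Jun/1999:22:53:58 -0500] "
        + "\"GET /bgs/greenbg.gif HTTP 1.0\" 200 50";
    String[] parts = split(line);
    System.out.println(parts[0]);   // 205.160.186.76
    System.out.println(parts[1]);   // the rest of the line, leading space included
  }
}
```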

The dotted quad format IP address is converted into a hostname using the usual methods of java.net.InetAddress. Example 6-10 shows the code.

Example 6-10. Process Web Server Log Files

import java.net.*;
import java.io.*;
import java.util.*;
import com.macfaq.io.SafeBufferedReader; 


public class Weblog {

  public static void main(String[] args) {

    Date start = new Date(  );
    try {
      FileInputStream fin =  new FileInputStream(args[0]);
      Reader in = new InputStreamReader(fin);
      SafeBufferedReader bin = new SafeBufferedReader(in);
      
      String entry = null;
      while ((entry = bin.readLine(  )) != null) {
        
        // separate out the IP address
        int index = entry.indexOf(' ', 0);
        String ip = entry.substring(0, index);
        String theRest = entry.substring(index, entry.length(  ));
        
        // find the host name and print it out
        try {
          InetAddress address = InetAddress.getByName(ip);
          System.out.println(address.getHostName(  ) + theRest);
        }
        catch (UnknownHostException e) {
          System.out.println(entry);
        }
        
      } // end while
    }
    catch (IOException e) {
      System.out.println("Exception: " + e);
    }
    
    Date end = new Date(  );
    long elapsedTime = (end.getTime()-start.getTime(  ))/1000;
    System.out.println("Elapsed time: " + elapsedTime + " seconds");

  }  // end main

}

The name of the file to be processed is passed to Weblog as the first argument on the command line. A FileInputStream fin is opened from this file, and an InputStreamReader is chained to fin. This InputStreamReader is buffered by chaining it to an instance of the SafeBufferedReader class developed in Chapter 4. The file is processed line by line in a while loop.

Each pass through the loop places one line in the String variable entry. entry is then split into two substrings: ip, which contains everything before the first space, and theRest, which runs from the first space to the end of the line. The position of the first space is determined by entry.indexOf(' ', 0). ip is converted to an InetAddress object using getByName( ). The hostname is then looked up by getHostName( ). Finally, the hostname and the rest of the line (theRest, which still begins with the space) are printed on System.out. Output can be sent to a new file through the standard means for redirecting output.

Weblog is more efficient than you might expect. Most web browsers generate multiple log file entries per page served, since there’s an entry in the log not just for the page itself but for each graphic on the page. And many visitors request multiple pages while visiting a site. DNS lookups are expensive, and it simply doesn’t make sense to look up the same host every time it appears in the log file. The InetAddress class caches the addresses it has looked up. If the same address is requested again, it can be retrieved from the cache much more quickly than from DNS.

Nonetheless, this program could certainly be faster. In my initial tests, it took more than a second per log entry. (Exact numbers depend on the speed of your network connection, the speed of both local and remote DNS servers you access, and network congestion when the program is run.) It spends a huge amount of time just sitting and waiting for DNS requests to return. Of course, this is exactly the problem multithreading is designed to solve. One main thread can read the log file and pass off individual entries to other threads for processing.

A thread pool is absolutely necessary here. Over the space of a few days, even low volume web servers can easily generate a log file with hundreds of thousands of lines. Trying to process such a log file by spawning a new thread for each entry would rapidly bring even the strongest virtual machine to its knees, especially since the main thread can read log file entries much faster than individual threads can resolve domain names and die. Consequently, reusing threads is essential here. The number of threads is stored in a tunable parameter, numberOfThreads, so that it can be adjusted to fit the VM and network stack. (Launching too many simultaneous DNS requests can also cause problems.)
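For comparison only, the same idea of a fixed number of reusable threads can be sketched with the java.util.concurrent executor framework, which is a different technique from the hand-built pool this chapter develops. Everything named here (ExecutorWeblogSketch, convert, process, dnsName) is hypothetical. The resolver is passed in as a function so the sketch can be tried without touching DNS, and because it collects one Future per entry it returns results in input order at the cost of holding the whole log in memory.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ExecutorWeblogSketch {

  // Rewrites the first field of one log entry using the supplied resolver.
  static String convert(String entry, Function<String, String> resolver) {
    int index = entry.indexOf(' ');
    if (index < 0) return entry;   // nothing to split; leave the line alone
    return resolver.apply(entry.substring(0, index)) + entry.substring(index);
  }

  // Converts every entry on a pool of numberOfThreads reusable threads.
  static List<String> process(List<String> entries, int numberOfThreads,
      Function<String, String> resolver) {
    ExecutorService pool = Executors.newFixedThreadPool(numberOfThreads);
    try {
      List<Future<String>> futures = new ArrayList<>();
      for (String entry : entries) {
        Callable<String> task = () -> convert(entry, resolver);
        futures.add(pool.submit(task));
      }
      List<String> results = new ArrayList<>();
      for (Future<String> future : futures) {
        try {
          results.add(future.get());   // waits for that entry's lookup
        } catch (InterruptedException | ExecutionException e) {
          throw new IllegalStateException(e);
        }
      }
      return results;
    } finally {
      pool.shutdown();   // let the worker threads exit
    }
  }

  // A resolver backed by InetAddress; falls back to the original text.
  static String dnsName(String ip) {
    try {
      return InetAddress.getByName(ip).getHostName();
    } catch (UnknownHostException e) {
      return ip;
    }
  }
}
```

A call such as ExecutorWeblogSketch.process(lines, 100, ExecutorWeblogSketch::dnsName) would use real DNS lookups; substituting a lambda for the last argument makes the sketch easy to exercise offline.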

This program is now divided into two classes. The first class, PooledWeblog, shown in Example 6-11, contains the main( ) method and the processLogFile( ) method. It also holds the resources that need to be shared among the threads. These are the pool, implemented as a synchronized LinkedList from the Java Collections API, and the output log, implemented as a BufferedWriter named out. Individual threads will have direct access to the pool but will have to pass through PooledWeblog’s log( ) method to write output.

The key method is processLogFile( ). As before, this method reads from the underlying log file. However, each entry is placed in the entries pool rather than being immediately processed. Because this method is likely to run much more quickly than the threads that have to access DNS, it yields after reading each entry. Furthermore, it goes to sleep if there are more entries in the pool than threads available to process them. The amount of time it sleeps depends on the number of threads. This will avoid using excessive amounts of memory for very large log files. When the last entry is read, the finished flag is set to true to tell the threads that they can die once they’ve completed their work.
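The throttling and shutdown behavior described above can also be expressed with a bounded blocking queue, which is an alternative to the sleep-and-flag approach rather than what this chapter's example does. In this sketch, the names BoundedPoolSketch, run, and DONE are hypothetical, and prefixing each entry with "done:" is a stand-in for the DNS lookup. The put( ) call blocks the reader while the pool is full, and one sentinel per thread replaces the finished flag.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class BoundedPoolSketch {

  // Sentinel object telling a worker that no more input is coming.
  // It is compared by identity, so it cannot collide with a real log line.
  private static final String DONE = new String("DONE");

  static List<String> run(List<String> lines, int numberOfThreads) {
    BlockingQueue<String> entries = new ArrayBlockingQueue<>(numberOfThreads);
    List<String> output = Collections.synchronizedList(new ArrayList<>());
    List<Thread> workers = new ArrayList<>();

    for (int i = 0; i < numberOfThreads; i++) {
      Thread t = new Thread(() -> {
        try {
          while (true) {
            String entry = entries.take();   // blocks while the pool is empty
            if (entry == DONE) return;       // identity check on the sentinel
            output.add("done:" + entry);     // stand-in for the DNS lookup
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
      t.start();
      workers.add(t);
    }

    try {
      for (String line : lines) entries.put(line);   // blocks while the pool is full
      for (int i = 0; i < numberOfThreads; i++) entries.put(DONE);
      for (Thread t : workers) t.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return output;
  }
}
```

As with the hand-built pool, the order of the output depends on thread scheduling and is not guaranteed to match the input.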

Example 6-11. PooledWeblog

import java.io.*;
import java.util.*;


public class PooledWeblog {

  private BufferedReader in;
  private BufferedWriter out;
  private int numberOfThreads;
  private List entries = Collections.synchronizedList(new LinkedList(  ));
  private boolean finished = false;


  public PooledWeblog(InputStream in, OutputStream out, 
   int numberOfThreads) {
    this.in = new BufferedReader(new InputStreamReader(in));
    this.out = new BufferedWriter(new OutputStreamWriter(out));
    this.numberOfThreads = numberOfThreads;
  }
  
  public boolean isFinished(  ) {
    return this.finished; 
  }
  
  public int getNumberOfThreads(  ) {
    return numberOfThreads; 
  }
  
  public void processLogFile(  ) {
  
    for (int i = 0; i < numberOfThreads; i++) {
      Thread t = new LookupThread(entries, this);
      t.start(  );
    }
    
    try {

      String entry = null;
      while ((entry = in.readLine(  )) != null) {
        
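        // If the pool is full, wait for the lookup threads to catch up
        // before adding this entry. (Skipping ahead with continue here
        // would silently drop the entry that was just read.)
        while (entries.size(  ) > numberOfThreads) {
          try {
            Thread.sleep((long) (1000.0/numberOfThreads));
          }
          catch (InterruptedException e) {}
        }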

        synchronized (entries) {
          entries.add(0, entry);
          entries.notifyAll(  ); 
        }
        
        Thread.yield(  );
        
      } // end while
      
    }
    catch (IOException e) {
      System.out.println("Exception: " + e);
    }
    
    this.finished = true;
    
    // finish any threads that are still waiting
    synchronized (entries) {
      entries.notifyAll(  ); 
    }

  }
  
  public void log(String entry) throws IOException {
    out.write(entry + System.getProperty("line.separator", "\r\n"));
    out.flush(  );
  }
  
  public static void main(String[] args) {

    try {
      PooledWeblog tw = new PooledWeblog(new FileInputStream(args[0]), 
       System.out, 100);
      tw.processLogFile(  );
    }
    catch (FileNotFoundException e) {
      System.err.println("Usage: java PooledWeblog logfile_name");
    }
    catch (ArrayIndexOutOfBoundsException e) {
      System.err.println("Usage: java PooledWeblog logfile_name");
    }
    catch (Exception e) {
      System.err.println(e);
      e.printStackTrace(  );
    }

  }  // end main

}

The detailed work of converting IP addresses to hostnames in the log entries is handled by the LookupThread class, shown in Example 6-12. The constructor provides each thread with a reference to the entries pool it will retrieve work from and a reference to the PooledWeblog object it’s working for. The latter reference allows callbacks to the PooledWeblog so that the thread can log converted entries and check to see when the last entry has been processed. It does so by calling the isFinished( ) method in PooledWeblog when the entries pool is empty (has size 0). Neither an empty pool nor isFinished( ) returning true is sufficient by itself. isFinished( ) returns true after the last entry is placed in the pool, which is, at least for a small amount of time, before the last entry is removed from the pool. And entries may be empty while there are still many entries remaining to be read, if the lookup threads outrun the main thread reading the log file.

Example 6-12. LookupThread

import java.net.*; 
import java.io.*;
import java.util.*;

public class LookupThread extends Thread {

  private List entries;
  PooledWeblog log;   // used for callbacks
  
  public LookupThread(List entries, PooledWeblog log) {
    this.entries = entries;
    this.log = log;
  }
  
  public void run(  ) {
  
    String entry;

    while (true) {
    
      synchronized (entries) {
        while (entries.size(  ) == 0) {
          if (log.isFinished(  )) return;
          try {
            entries.wait(  );
          }
          catch (InterruptedException e) {
          }
        }       
        entry = (String) entries.remove(entries.size(  )-1);
      }
      
      int index = entry.indexOf(' ', 0);
      String remoteHost = entry.substring(0, index);
      String theRest = entry.substring(index, entry.length(  ));

      try {
        remoteHost = InetAddress.getByName(remoteHost).getHostName(  );
      }
      catch (Exception e) {
        // remoteHost remains in dotted quad format
      }

      try {
        log.log(remoteHost + theRest);
      }
      catch (IOException e) {
      } 
      Thread.yield(  );  // yield( ) is static, so call it on the class
      
    }

  }

}

Using threads like this lets the entries in a log file be processed in parallel. This is a huge time saving. In my unscientific tests, the threaded version is 10 to 50 times faster than the sequential version.

The biggest disadvantage to the multithreaded approach is that it reorders the log file. The output entries aren’t necessarily in the same order as the input entries. For simple hit counting, this doesn’t matter. However, some log analysis tools mine a log file to determine the paths users followed through a site. These could well get confused if the log is out of sequence. If that’s an issue, you’d need to attach a sequence number to each log entry. As the individual threads return log entries to the main program, the log( ) method in the main program would store any that arrive out of order until their predecessors appear. This is in some ways reminiscent of how network software reorders TCP packets that arrive out of order.
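A minimal sketch of that sequence-number idea follows. It is not part of PooledWeblog: the class OrderedLog and its methods are hypothetical, it assumes the reader numbers entries from 0 as it reads them, and it collects output in a list where a real version would write to the BufferedWriter.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class OrderedLog {

  // Entries that arrived before their predecessors, keyed by sequence number.
  private final Map<Integer, String> pending = new HashMap<>();
  private final List<String> written = new ArrayList<>();
  private int next = 0;   // sequence number of the next entry to write

  // Called by the lookup threads, in any order.
  synchronized void log(int sequence, String entry) {
    pending.put(sequence, entry);
    // Write out every entry that is now contiguous with what was written.
    String ready;
    while ((ready = pending.remove(next)) != null) {
      written.add(ready);
      next++;
    }
  }

  // Snapshot of what has been written so far, in input order.
  synchronized List<String> written() {
    return new ArrayList<>(written);
  }

  public static void main(String[] args) {
    OrderedLog log = new OrderedLog();
    log.log(1, "second");
    log.log(0, "first");
    System.out.println(log.written());   // [first, second]
  }
}
```

The cost of this design is memory: one slow lookup early in the file forces every later entry to wait in the pending map until it completes.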
