Identifying the country, the operating system type, and the browser type from the logfile

This section explains how to identify a user's country, operating system, and browser by analyzing a server log line. Knowing the country tells us which locations bring the most attention to our site and which bring the least. Perform the following steps to identify the country name, operating system type, and browser type from the Apache log line:

  1. We will use the open source geoip library to identify the country name from the IP address. Add the following dependency to the pom.xml file:
        <dependency>
          <groupId>org.geomind</groupId>
          <artifactId>geoip</artifactId>
          <version>1.2.8</version>
        </dependency>
  2. Add the following repository to the pom.xml file:
        <repository>
          <id>geoip</id>
          <url>http://snambi.github.com/maven/</url>
        </repository>
  3. We will create an IpToCountryConverter class in the com.learningstorm.stormlogprocessing package. This class has a parameterized constructor that takes the location of the GeoLiteCity.dat file. You can find the GeoLiteCity.dat file in the Resources folder of the stormlogprocessing project. The file must be at the same location on every Storm node. GeoLiteCity.dat is the database we use to look up the country name for a given IP address. The following is the source code of the IpToCountryConverter class:
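    package com.learningstorm.stormlogprocessing;

    import com.maxmind.geoip.Location;
    import com.maxmind.geoip.LookupService;

    /**
     * This class contains logic to identify
     * the country name from the IP address
    */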
    public class IpToCountryConverter {
    
      private static LookupService cl = null;
    
      /**
       * A parameterized constructor which would take 
       * the location of the GeoLiteCity.dat file as input.
       * 
       * @param pathTOGeoLiteCityFile
       */
      public IpToCountryConverter(String pathTOGeoLiteCityFile) {
        try {
          // Load the GeoLiteCity database from the given path into memory.
          cl = new LookupService(pathTOGeoLiteCityFile,
              LookupService.GEOIP_MEMORY_CACHE);
        } catch (Exception exception) {
          // Keep the original exception as the cause for easier debugging.
          throw new RuntimeException(
              "Error occurred while initializing IpToCountryConverter class",
              exception);
        }
      }
    
      /**
       * This method takes an IP address as input and
       * converts it into a country name.
       * 
       * @param ip the IP address to look up
       * @return the country name, or "NA" if it cannot be determined
       */
      public String ipToCountry(String ip) {
        Location location = cl.getLocation(ip);
        if (location == null) {
          return "NA";
        }
        if (location.countryName == null) {
          return "NA";
        }
        return location.countryName;
      }
    }
  4. Now, download the UserAgentTools class from https://code.google.com/p/ndt/source/browse/branches/applet_91/Applet/src/main/java/edu/internet2/ndt/UserAgentTools.java?r=856.

    This class contains the logic to identify the operating system and the browser type from the user agent string. You can also find the UserAgentTools class in the stormlogprocessing project.
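
To give a feel for what this kind of detection involves, the following is a minimal sketch that classifies a user agent string by looking for well-known tokens. It is not the UserAgentTools implementation: the UserAgentSketch class, its method names, and its token lists are hypothetical and far less thorough than the real class.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical, simplified user agent classifier for illustration only.
 * This is NOT the logic of the UserAgentTools class.
 */
class UserAgentSketch {

  // Tokens are checked in insertion order. More specific tokens come first
  // because Chrome user agents also contain "Safari", and Android user
  // agents also contain "Linux".
  private static final Map<String, String> BROWSERS = new LinkedHashMap<>();
  private static final Map<String, String> SYSTEMS = new LinkedHashMap<>();

  static {
    BROWSERS.put("Firefox", "Firefox");
    BROWSERS.put("Chrome", "Chrome");
    BROWSERS.put("Safari", "Safari");
    BROWSERS.put("MSIE", "Internet Explorer");

    SYSTEMS.put("Windows", "Windows");
    SYSTEMS.put("Android", "Android");
    SYSTEMS.put("Mac OS X", "Mac OS X");
    SYSTEMS.put("Linux", "Linux");
  }

  static String detectBrowser(String userAgent) {
    return firstMatch(userAgent, BROWSERS);
  }

  static String detectOs(String userAgent) {
    return firstMatch(userAgent, SYSTEMS);
  }

  // Returns the label of the first token found in the user agent,
  // or "NA" when the input is null or no token matches.
  private static String firstMatch(String userAgent, Map<String, String> tokens) {
    if (userAgent == null) {
      return "NA";
    }
    for (Map.Entry<String, String> entry : tokens.entrySet()) {
      if (userAgent.contains(entry.getKey())) {
        return entry.getValue();
      }
    }
    return "NA";
  }

  public static void main(String[] args) {
    String ua = "Mozilla/5.0 (X11; Linux x86_64; rv:25.0) Gecko/20100101 Firefox/25.0";
    // prints "Firefox on Linux"
    System.out.println(detectBrowser(ua) + " on " + detectOs(ua));
  }
}
```

The ordering of the token lists matters in this sketch, which is one reason a maintained class such as UserAgentTools is preferable to ad hoc matching in a real topology.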

  5. Let's write the UserInformationGetterBolt class in the com.learningstorm.stormlogprocessing package as follows. This bolt uses the UserAgentTools and IpToCountryConverter classes to identify the country name, the operating system type, and the browser type:
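    package com.learningstorm.stormlogprocessing;

    import java.util.Map;

    // Storm 0.9.x package names; Storm 1.0 and later use org.apache.storm.
    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;

    /**
     * This class uses the IpToCountryConverter and
     * UserAgentTools classes to identify
     * the country, OS, and browser from a log line.
    */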
    public class UserInformationGetterBolt extends BaseRichBolt {
    
      private static final long serialVersionUID = 1L;
      private IpToCountryConverter ipToCountryConverter = null;
      private UserAgentTools userAgentTools = null;
      public OutputCollector collector;
      private String pathTOGeoLiteCityFile;
    
      public UserInformationGetterBolt(String pathTOGeoLiteCityFile) {
        // set the path of the GeoLiteCity.dat file.
        this.pathTOGeoLiteCityFile = pathTOGeoLiteCityFile;
      }
    
      public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("ip", "dateTime", "request", "response",
            "bytesSent", "referrer", "useragent", "country", "browser",
            "os"));
      }
      public void prepare(Map stormConf, TopologyContext context,
          OutputCollector collector) {
        this.collector = collector;
        this.ipToCountryConverter = new IpToCountryConverter(
            this.pathTOGeoLiteCityFile);
        this.userAgentTools = new UserAgentTools();
    
      }
    
      public void execute(Tuple input) {
        String ip = input.getStringByField("ip");
        String userAgent = input.getStringByField("useragent");

        // Identify the country using the IP address.
        Object country = ipToCountryConverter.ipToCountry(ip);

        // Identify the browser using the user agent.
        Object browser = userAgentTools.getBrowser(userAgent)[1];

        // Identify the OS using the user agent.
        Object os = userAgentTools.getOS(userAgent)[1];

        // Forward the seven input fields followed by the three derived ones.
        collector.emit(new Values(input.getString(0), input.getString(1),
            input.getString(2), input.getString(3), input.getString(4),
            input.getString(5), input.getString(6), country, browser, os));
      }
    }

    The output of the UserInformationGetterBolt class contains ten fields. These fields are ip, dateTime, request, response, bytesSent, referrer, useragent, country, browser, and os.
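
Storm pairs emitted values with declared field names purely by position, so the order of the arguments passed to Values has to follow the order of the names passed to Fields. The following sketch illustrates that pairing with plain Java collections instead of Storm classes. The OutputFieldsSketch class, the toNamedTuple method, and the sample values are hypothetical and only serve the illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of positional field/value pairing. */
class OutputFieldsSketch {

  // Same names, in the same order, as declareOutputFields in the bolt.
  static final List<String> FIELDS = Arrays.asList(
      "ip", "dateTime", "request", "response", "bytesSent",
      "referrer", "useragent", "country", "browser", "os");

  /** Pairs each value with the field name at the same position. */
  static Map<String, Object> toNamedTuple(List<Object> values) {
    if (values.size() != FIELDS.size()) {
      throw new IllegalArgumentException(
          "Expected " + FIELDS.size() + " values but got " + values.size());
    }
    Map<String, Object> tuple = new LinkedHashMap<>();
    for (int i = 0; i < FIELDS.size(); i++) {
      tuple.put(FIELDS.get(i), values.get(i));
    }
    return tuple;
  }

  public static void main(String[] args) {
    // Made-up sample values: seven parsed log fields, then three derived ones.
    List<Object> values = Arrays.<Object>asList(
        "203.0.113.7", "25/Dec/2013:10:15:32 +0000", "GET /index.html HTTP/1.1",
        "200", "1024", "-", "Mozilla/5.0 (X11; Linux x86_64) Firefox/25.0",
        "NA", "Firefox", "Linux");
    Map<String, Object> tuple = toNamedTuple(values);
    System.out.println(tuple.get("browser") + " / " + tuple.get("os"));
  }
}
```

If a value is added, removed, or reordered in the emit call without the same change in the declaration, downstream bolts that read fields by name receive the wrong data, so it helps to change both places together.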
