This section explains how to identify a user's country, operating system type, and browser type by analyzing a server log line. Knowing the country tells us which locations our site attracts more attention from and which it attracts less attention from. Perform the following steps to identify the country name, operating system type, and browser type from an Apache log line:
We will use the geoip library to identify the country name from the IP address. Add the following dependency to the pom.xml file:

    <dependency>
      <groupId>org.geomind</groupId>
      <artifactId>geoip</artifactId>
      <version>1.2.8</version>
    </dependency>

Add the following repository to the pom.xml file:

    <repository>
      <id>geoip</id>
      <url>http://snambi.github.com/maven/</url>
    </repository>

Create the IpToCountryConverter class in the com.learningstorm.stormlogprocessing package. This class has a parameterized constructor that takes the location of the GeoLiteCity.dat file. You can find the GeoLiteCity.dat file in the Resources folder of the stormlogprocessing project. The location of the GeoLiteCity.dat file must be the same on all Storm nodes. The GeoLiteCity.dat file is the database we will use to identify the country name for a given IP address. The following is the source code of the IpToCountryConverter class:

    package com.learningstorm.stormlogprocessing;

    import com.maxmind.geoip.Location;
    import com.maxmind.geoip.LookupService;

    /**
     * This class contains the logic to identify
     * the country name from the IP address.
     */
    public class IpToCountryConverter {

        private static LookupService cl = null;

        /**
         * A parameterized constructor that takes the location
         * of the GeoLiteCity.dat file as input.
         *
         * @param pathTOGeoLiteCityFile path to the GeoLiteCity.dat file
         */
        public IpToCountryConverter(String pathTOGeoLiteCityFile) {
            try {
                // Load the database from the given path and cache it in memory.
                cl = new LookupService(pathTOGeoLiteCityFile,
                        LookupService.GEOIP_MEMORY_CACHE);
            } catch (Exception exception) {
                throw new RuntimeException(
                        "Error occurred while initializing IpToCountryConverter class: "
                                + exception.getMessage(), exception);
            }
        }

        /**
         * This method takes an IP address as input and
         * converts it into a country name.
         *
         * @param ip the IP address to look up
         * @return the country name, or "NA" if it cannot be determined
         */
        public String ipToCountry(String ip) {
            Location location = cl.getLocation(ip);
            if (location == null || location.countryName == null) {
                return "NA";
            }
            return location.countryName;
        }
    }

Download the UserAgentTools class from https://code.google.com/p/ndt/source/browse/branches/applet_91/Applet/src/main/java/edu/internet2/ndt/UserAgentTools.java?r=856. This class contains the logic to identify the operating system and the browser type from the user agent string. You can also find the UserAgentTools class in the stormlogprocessing project.
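
To give a feel for the kind of matching such a class performs, here is a minimal sketch of substring-based user agent detection. It is not the UserAgentTools implementation: the class name UserAgentSketch, its two methods, and the small list of recognized tokens are assumptions made for illustration, and real user agent strings need many more cases than this.

```java
import java.util.Locale;

/**
 * Illustrative only: a simplified stand-in for user agent parsing.
 * This is NOT the UserAgentTools implementation; the class name,
 * method names, and token list are assumptions made for this sketch.
 */
public class UserAgentSketch {

    /** Returns a coarse browser name, or "NA" when nothing matches. */
    public static String detectBrowser(String userAgent) {
        if (userAgent == null) {
            return "NA";
        }
        String ua = userAgent.toLowerCase(Locale.ROOT);
        // Order matters: Chrome user agents also contain "safari",
        // and newer Opera user agents also contain "chrome".
        if (ua.contains("opr/") || ua.contains("opera")) return "Opera";
        if (ua.contains("firefox")) return "Firefox";
        if (ua.contains("msie") || ua.contains("trident/")) return "Internet Explorer";
        if (ua.contains("chrome")) return "Chrome";
        if (ua.contains("safari")) return "Safari";
        return "NA";
    }

    /** Returns a coarse operating system name, or "NA" when nothing matches. */
    public static String detectOs(String userAgent) {
        if (userAgent == null) {
            return "NA";
        }
        String ua = userAgent.toLowerCase(Locale.ROOT);
        // Android user agents also contain "linux", and iPhone/iPad user
        // agents contain "like mac os x", so the narrower tokens go first.
        if (ua.contains("android")) return "Android";
        if (ua.contains("windows")) return "Windows";
        if (ua.contains("iphone") || ua.contains("ipad")) return "iOS";
        if (ua.contains("mac os x")) return "Mac OS X";
        if (ua.contains("linux")) return "Linux";
        return "NA";
    }

    public static void main(String[] args) {
        String ua = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
                + "(KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36";
        // Prints "Chrome on Windows"
        System.out.println(detectBrowser(ua) + " on " + detectOs(ua));
    }
}
```

The order of the checks is the main design point: several tokens appear in more than one browser's user agent, so the most specific token has to be tested first.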
Add the UserInformationGetterBolt class to the com.learningstorm.stormlogprocessing package as follows. This bolt uses the UserAgentTools and IpToCountryConverter classes to identify the country name, the operating system type, and the browser type:

    package com.learningstorm.stormlogprocessing;

    import java.util.Map;

    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;

    /**
     * This class uses the IpToCountryConverter and
     * UserAgentTools classes to identify the country,
     * OS, and browser from a log line.
     */
    public class UserInformationGetterBolt extends BaseRichBolt {

        private static final long serialVersionUID = 1L;
        private IpToCountryConverter ipToCountryConverter = null;
        private UserAgentTools userAgentTools = null;
        public OutputCollector collector;
        private String pathTOGeoLiteCityFile;

        public UserInformationGetterBolt(String pathTOGeoLiteCityFile) {
            // Set the path of the GeoLiteCity.dat file.
            this.pathTOGeoLiteCityFile = pathTOGeoLiteCityFile;
        }

        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("ip", "dateTime", "request",
                    "response", "bytesSent", "referrer", "useragent",
                    "country", "browser", "os"));
        }

        public void prepare(Map stormConf, TopologyContext context,
                OutputCollector collector) {
            this.collector = collector;
            // Create the helpers here so that they are built on the worker.
            this.ipToCountryConverter = new IpToCountryConverter(
                    this.pathTOGeoLiteCityFile);
            this.userAgentTools = new UserAgentTools();
        }

        public void execute(Tuple input) {
            String ip = input.getStringByField("ip");
            String userAgent = input.getStringByField("useragent");

            // Identify the country using the IP address.
            Object country = ipToCountryConverter.ipToCountry(ip);
            // Identify the browser using the user agent.
            Object browser = userAgentTools.getBrowser(userAgent)[1];
            // Identify the OS using the user agent.
            Object os = userAgentTools.getOS(userAgent)[1];

            // Pass the seven input fields through and append the three new ones.
            collector.emit(new Values(input.getString(0), input.getString(1),
                    input.getString(2), input.getString(3), input.getString(4),
                    input.getString(5), input.getString(6), country, browser, os));
        }
    }

The output of the UserInformationGetterBolt class contains ten fields: ip, dateTime, request, response, bytesSent, referrer, useragent, country, browser, and os.
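
As a rough illustration of how these ten fields could be used to see where a site attracts attention, the following sketch counts tuples per country in plain Java, outside Storm. The class name CountryCountSketch, the row helper, and all sample values are made up for this example; only the field order comes from the list above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Illustrative only: counts enriched log tuples per country.
 * The class, the helper, and the sample data are assumptions for this sketch.
 */
public class CountryCountSketch {

    // Position of "country" in the ten-field order:
    // ip, dateTime, request, response, bytesSent, referrer,
    // useragent, country, browser, os
    static final int COUNTRY_INDEX = 7;

    /** Counts how many tuples were seen for each country name. */
    public static Map<String, Integer> countByCountry(List<List<String>> tuples) {
        // TreeMap keeps the countries sorted by name for stable output.
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> tuple : tuples) {
            counts.merge(tuple.get(COUNTRY_INDEX), 1, Integer::sum);
        }
        return counts;
    }

    /** Hypothetical helper that builds a ten-field tuple with placeholder values. */
    static List<String> row(String ip, String country, String browser, String os) {
        return Arrays.asList(ip, "dateTime", "request", "response",
                "bytesSent", "referrer", "useragent", country, browser, os);
    }

    public static void main(String[] args) {
        // Made-up sample tuples; the IP addresses are documentation ranges.
        List<List<String>> tuples = Arrays.asList(
                row("192.0.2.10", "India", "Chrome", "Windows"),
                row("198.51.100.7", "India", "Firefox", "Linux"),
                row("203.0.113.5", "NA", "Chrome", "Android"));
        // Prints {India=2, NA=1}
        System.out.println(countByCountry(tuples));
    }
}
```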