What we are looking for

The information extracted from a web server access log is very rich and gives us material for many kinds of study. At the simplest level, we can count the number of requests that our web server receives just by counting the lines in the access log. But we can expand our analysis and, for example, measure the average data traffic in bytes over time.

In recent years, application performance management (APM) systems have become some of the most widely used services. They are commonly offered as software as a service, and their main goal is to give us a view of an application's performance and health.

APMs are a good example of what can be analyzed from the access log, because a good part of the information that APMs generate is based on it.

Attention! I am not saying that an APM works based only on the access log, but a good part of the information generated by APMs can be extracted from the access log. Okay?

As said at the beginning of this chapter, we do not intend to code or create an entire system, but we will show in practice how we can keep the access log information in MongoDB for later analysis.

Following the example of APMs, we will structure our example around an analysis of web server resource throughput. It is possible to perform this analysis using only the information contained in the web server access log. To do so, what data do we need in our access log? And should we use the combined format?

Measuring the traffic on the web server

The throughput of our web server will be estimated from the number of requests in a given period of time, that is, requests per day, per hour, per minute, or per second. The number of requests per minute is a very reasonable measure for real-time monitoring.

The throughput is calculated by counting the requests processed by our web server. Because of this, we do not need any specific data from the access log beyond one line per request and its timestamp. Nevertheless, to make a richer analysis of our data possible later, we will create a specific log format that collects request information such as the HTTP status code, the request processing time, and the request length.
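As a rough illustration of counting requests per minute, the following Python sketch groups log lines by the minute in their timestamp. It assumes that each line carries a bracketed timestamp in Nginx's `$time_local` style, such as `[29/Mar/2015:18:35:26 -0400]`. The function name `requests_per_minute` and the sample lines are invented for this example; they do not come from a real log.

```python
import re
from collections import Counter
from datetime import datetime

# Assumption: each line contains a bracketed timestamp such as
# [29/Mar/2015:18:35:26 -0400]
TIMESTAMP = re.compile(r"\[([^\]]+)\]")


def requests_per_minute(lines):
    """Count log lines per minute, keyed by the timestamp truncated to the minute."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP.search(line)
        if match is None:
            continue  # skip lines without a recognizable timestamp
        moment = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        counts[moment.replace(second=0)] += 1
    return counts


# Made-up sample lines, for illustration only
sample = [
    '191.32.254.162 [29/Mar/2015:18:35:26 -0400] "GET / HTTP/1.1" 200 0.755 802',
    '191.32.254.162 [29/Mar/2015:18:35:40 -0400] "GET /a HTTP/1.1" 200 0.120 640',
    '191.32.254.162 [29/Mar/2015:18:36:02 -0400] "GET /b HTTP/1.1" 404 0.010 512',
]

for minute, total in sorted(requests_per_minute(sample).items()):
    print(minute.isoformat(), total)
```

Only the timestamp is needed for this count, which is why the throughput itself does not depend on any other field of the log.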

Both Apache HTTP Server and Nginx allow us to customize the existing access log or to create a new log file with a custom format. The second option suits us well, because it leaves the default log untouched. Before we start to configure our web server, we will create our log format using the variables previously explained. Just remember that we are working on an Nginx web server.

$remote_addr [$time_local] "$request" $status $request_time $request_length

Now that we have defined our log format, we can configure our Nginx web server. To do so, let's perform the following steps:

  1. First of all, to define this new format in Nginx, we need to edit the nginx.conf file, adding a new entry in the http block with the new log format:
    log_format custom_format '$remote_addr [$time_local] "$request" $status $request_time $request_length';
  2. Now we need to add another entry in the nginx.conf file that defines the file to which the new custom log will be written:
    access_log /var/log/nginx/custom_access.log custom_format;
  3. To apply our changes, reload the Nginx web server by executing the following command in a terminal:
    /usr/sbin/nginx -s reload
    
  4. After reloading the Nginx server, we can look at our new log file, /var/log/nginx/custom_access.log, and check that its lines look like the following:
    191.32.254.162 [29/Mar/2015:18:35:26 -0400] "GET / HTTP/1.1" 200 0.755 802
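To check that the new format is easy to process later, a line like the one above can be split into named fields. The following Python sketch is one possible way to do this, assuming the lines follow the custom_format defined in step 1 exactly. The regular expression and the helper name `parse_line` are illustrative and are not part of Nginx or MongoDB.

```python
import re

# One named group per variable in custom_format:
# $remote_addr [$time_local] "$request" $status $request_time $request_length
LINE = re.compile(
    r'(?P<remote_addr>\S+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) '
    r'(?P<request_time>\d+\.\d+) '
    r'(?P<request_length>\d+)'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LINE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Convert numeric fields so they can be aggregated later
    record["status"] = int(record["status"])
    record["request_time"] = float(record["request_time"])      # seconds
    record["request_length"] = int(record["request_length"])    # bytes
    return record


line = '191.32.254.162 [29/Mar/2015:18:35:26 -0400] "GET / HTTP/1.1" 200 0.755 802'
print(parse_line(line))
```

A dictionary of this shape maps naturally onto a document, although the actual schema may well differ from these field names.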

Log format configured, web server set up; it is time to design our schema.
