We are going to explore Apache web logs from an online store named Unicorn Fashion. The logs were produced by the Apache web server and uploaded to HDFS. You'll see how to read Apache logs out of the box. Here is an example log line:
135.51.156.129 - - [02/Dec/2013:13:52:29] "POST /product.screen?productName=SHORTS&JSESSIONID=CA10MO9AZ5USANA4955 HTTP 1.1" 200 2334 "http://www.yahoo.com" "Opera/9.01 (Windows NT 5.1; U; en)" 167
It's a standard Apache combined access log. We can build reports, dashboards, and alerts on top of this data.
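To make the structure of such a line concrete, here is a minimal Python sketch that splits the example line into named fields. It is only an illustration, not how Hunk itself extracts fields. The names clientip, status, referer, and useragent follow the field names used in the searches in this section; ident, user, bytes, and extra are hypothetical names chosen here, and the meaning of the trailing number is not assumed.

```python
import re

# Combined-log-style pattern. Field names ident, user, bytes and extra are
# hypothetical labels for this sketch; the others mirror the search fields.
LOG_PATTERN = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<useragent>[^"]*)"'
    r'(?: (?P<extra>\S+))?'  # optional trailing field, meaning not assumed
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = (
        '135.51.156.129 - - [02/Dec/2013:13:52:29] '
        '"POST /product.screen?productName=SHORTS&JSESSIONID=CA10MO9AZ5USANA4955 HTTP 1.1" '
        '200 2334 "http://www.yahoo.com" "Opera/9.01 (Windows NT 5.1; U; en)" 167'
    )
    print(parse_line(sample))
```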
You already know how to create a virtual index; the following screenshot shows the index configuration:
Let's create some reports to get acquainted with the basic functionality of Hunk and the Search Processing Language (SPL).
Let's get the top five browsers used by online store visitors. We need to start the Explore data wizard:
Select the digital_analytics virtual index. Your screen should look like this:
Open http://quickstart.cloudera:8000/en-US/app/launcher/home and click on Search & Reporting. Then run the following search:
index="digital_analytics" | top 5 useragent
The search interface in front of you should look like this:
Let's create one more report. We are going to display the sources of site traffic. Go back to the search bar and use the following expression to get referrers:
index="digital_analytics" referer != *unicorn* | top referer percentfield=percent
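You should read the expression in this way:
Take the events from the digital_analytics index.
Keep only the events whose referer does not contain the unicorn substring.
Show the most frequent referer values, with the percentage of each written to the percent field.
You can read more about the top command here: http://docs.splunk.com/Documentation/Splunk/6.2.2/SearchReference/Top.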
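The logic of this search can be sketched in a few lines of Python: filter out internal referrers, count the remaining values, and report each count as a share of the filtered total. This is only an illustration of the idea, not how Hunk executes the search. The referer values are made up, the function name top_values is hypothetical, and the case-insensitive substring match is an assumption of this sketch.

```python
from collections import Counter


def top_values(values, limit=10):
    """Return (value, count, percent) rows, most frequent first."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return []
    return [
        (value, count, round(100.0 * count / total, 2))
        for value, count in counts.most_common(limit)
    ]


if __name__ == "__main__":
    # Made-up referer values, for illustration only.
    referers = (
        ["http://www.yahoo.com"] * 3
        + ["http://www.google.com"]
        + ["http://www.unicorn-fashion.example/cart"] * 4
    )
    # Drop referers containing "unicorn" (case-insensitive by assumption).
    external = [r for r in referers if "unicorn" not in r.casefold()]
    for row in top_values(external):
        print(row)
```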
You'll see a job report that offers insight into why the search ran slowly or quickly. It also links to a log file, which you might need later if you encounter errors:
Let's perform naive analytics to count the errors occurring on our site.
index="digital_analytics" | chart count by status
Use bars to visualize the result:
We see various status codes. Let's pay special attention to code=500, which indicates an error on the server side.
Use the eval expression:
index="digital_analytics" | eval errorRatio = if (status ==500, "ERROR", "ELSE") | timechart count by errorRatio | sort -errorRatio
The idea of the expression is to:
Read events from the digital_analytics index.
Create a new field, errorRatio. If the status field in the index has the value 500, then errorRatio = "ERROR"; otherwise, the errorRatio field gets the value "ELSE".
Count errorRatio occurrences over time and sort by the count of errorRatio.
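The same idea can be sketched in Python: label each event by its status, then count the labels per time bucket. This is an illustration only, not Hunk's implementation. The hourly bucket size and the function names are assumptions of this sketch, and the timestamp format is taken from the example log line.

```python
from collections import Counter, defaultdict
from datetime import datetime


def label_status(status):
    """Mirror the eval step: 500 becomes "ERROR", anything else "ELSE"."""
    return "ERROR" if int(status) == 500 else "ELSE"


def count_by_label_per_hour(events):
    """Count labels per hour; events are (timestamp, status) pairs.

    The one-hour bucket is an assumption made for this sketch.
    """
    buckets = defaultdict(Counter)
    for timestamp, status in events:
        moment = datetime.strptime(timestamp, "%d/%b/%Y:%H:%M:%S")
        hour = moment.replace(minute=0, second=0)
        buckets[hour][label_status(status)] += 1
    # Return the buckets in chronological order.
    return dict(sorted(buckets.items()))


if __name__ == "__main__":
    # Made-up events, for illustration only.
    events = [
        ("02/Dec/2013:13:52:29", "200"),
        ("02/Dec/2013:13:10:00", "500"),
        ("02/Dec/2013:14:01:00", "500"),
    ]
    for hour, counts in count_by_label_per_hour(events).items():
        print(hour, dict(counts))
```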
Hunk can issue alerts when a condition is met. Let's configure an alert when the error count threshold is reached.
Search for events with status=500, which signifies an error on the server side:
index="digital_analytics" status=500 | stats count
It's hard to emulate scheduled activity right now; let's pick the simplest case and see how it works in general.
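Conceptually, such an alert boils down to counting the matching events and comparing the count with a threshold. The Python sketch below shows only that comparison. The function name, the "greater than" condition, and the threshold value are all hypothetical, since the actual alert settings are configured in the Hunk UI.

```python
def should_trigger_alert(statuses, threshold):
    """Return (triggered, error_count) for a list of HTTP status codes.

    The "count greater than threshold" condition is an assumption of this
    sketch; the real condition is whatever is chosen in the alert settings.
    """
    error_count = sum(1 for status in statuses if int(status) == 500)
    return error_count > threshold, error_count


if __name__ == "__main__":
    # Made-up status codes and a hypothetical threshold of 1.
    print(should_trigger_alert(["200", "500", "500"], threshold=1))
```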
The following screenshot shows the settings for saving a new alert:
The following screenshot is an overview of the created alert:
Now it's time to see how dashboards work. Let's find the regions where visitors run into problems (status = 500) while using our online store:
index="digital_analytics" status=500 | iplocation clientip | geostats latfield=lat longfield=lon count by Country
You should see a map showing errors by country:
Now let's save this as a dashboard. Click on Save As and select Dashboard Panel from the drop-down menu:
The following screenshot shows the values for fields in the Save form:
You should get a new dashboard with a single panel and our report on it. We have several previously created reports. Let's add them to the newly created dashboard using separate panels. Click on Edit | Edit Panels:
Select Add new panel | New from report and add one of our reports: