Exploring data

We are going to explore Apache web logs from an online store called Unicorn Fashion. The logs were collected from the Apache web server and uploaded to HDFS. You'll see that Hunk can read Apache logs out of the box. Here is an example log line:

135.51.156.129 - - [02/Dec/2013:13:52:29] "POST /product.screen?productName=SHORTS&JSESSIONID=CA10MO9AZ5USANA4955 HTTP 1.1" 200 2334 "http://www.yahoo.com" "Opera/9.01 (Windows NT 5.1; U; en)" 167
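
To make the structure of this line concrete, here is a minimal Python sketch that splits it into named fields. The regular expression is an assumption made for illustration and is not how Hunk extracts fields. The group names clientip, status, referer, and useragent match the field names used in the queries later in this section; ident, user, method, uri, protocol, bytes, and other are hypothetical names.

```python
import re

# Sample line copied from the text above.
LINE = (
    '135.51.156.129 - - [02/Dec/2013:13:52:29] '
    '"POST /product.screen?productName=SHORTS&JSESSIONID=CA10MO9AZ5USANA4955 HTTP 1.1" '
    '200 2334 "http://www.yahoo.com" "Opera/9.01 (Windows NT 5.1; U; en)" 167'
)

# Illustrative pattern for a combined-style access log line (an assumption,
# not Hunk's own extraction rules). The trailing number is captured as
# "other" because the text does not say what it means.
LOG_PATTERN = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<useragent>[^"]*)"'
    r'(?: (?P<other>\d+))?$'
)

def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

fields = parse_line(LINE)
```

Fields such as status and useragent are the ones the reports below group and filter on.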

The example line follows the standard Apache combined access log format. We can build reports, dashboards, and alerts on top of this data. You will:

  • Learn the basics of SPL to create queries
  • Learn about the visualization capabilities
  • Drill down from the aggregated report to the underlying detailed data
  • Check the job details used to prepare report data
  • Create alerts and see a simple alert use-case
  • Create a dashboard presenting web analytics reports on a single page
  • Create a virtual index

You already know how to create a virtual index; the following screenshot shows the index configuration:

[Screenshot: virtual index configuration]

Creating reports

Let's create some reports to get acquainted with the basic functionality of Hunk and the Search Processing Language (SPL).

The top five browsers report

Let's get the top five browsers used by online store visitors. We need to start the Explore data wizard:

  1. Go to Virtual indexes.
  2. Click on Explore Data.
  3. Pick the Hadoop provider and our digital_analytics virtual index.
  4. Select a file and click on Next.
  5. Select Web | Access combined as the type for the logs.
  6. Complete the context settings.
  7. Choose search in App Context and select App under Sharing Context.
  8. Review the settings and click on Finish. Now we are ready to create our first report.
  9. Open http://quickstart.cloudera:8000/en-US/app/launcher/home and click on Search & Reporting.
  10. Use a query to get the top five browsers:
    index="digital_analytics" | top 5 useragent

  11. Save the report by selecting Save As | Report from the drop-down list.
  12. Fill in the settings for the report to be saved.
  13. The report will use the Column visualization and include the Time Range Picker control.
  14. To see the final result, go to the Reports page.
  15. Select the report named Report top 5 browsers that we have just created.
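
For readers new to SPL, the effect of top 5 useragent can be sketched in a few lines of Python: count each distinct value and keep the five most frequent. The sample values and the function name top_values are hypothetical; this only illustrates the aggregation, not how Hunk executes the search.

```python
from collections import Counter

# Hypothetical useragent values standing in for events from the index.
useragents = [
    "Opera/9.01", "Mozilla/5.0", "Opera/9.01", "Googlebot/2.1",
    "Mozilla/5.0", "Opera/9.01", "Safari/5.0", "MSIE 6.0", "Chrome/19.0",
]

def top_values(values, limit=5):
    """Count each distinct value and return the `limit` most frequent."""
    return Counter(values).most_common(limit)

top5 = top_values(useragents)
```

Each entry in top5 is a (value, count) pair, ordered from most to least frequent.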

Top referrers

Let's create one more report. We are going to display the sources of site traffic. Go back to the query search and use the following expression to get referrers:

index="digital_analytics" referer != *unicorn* | top referer percentfield=percent

You should read the expression in this way:

  1. Use the digital_analytics index.
  2. Exclude lines where the referer field contains the unicorn substring.
  3. Group by the referer field value, count the lines that have the same referer value, and order the counts in descending order.

    Note

    You can read more about the top command here: http://docs.splunk.com/Documentation/Splunk/6.2.2/SearchReference/Top.

  4. Check your search result page.
  5. Check the job statistics.

    Note

    Hunk provides you with a nice way to get access to job counters and logs. This could be useful later when you are interested in fine-tuning performance.

  6. Click on Job and select Inspect Job from the drop-down list.

    You'll see a job report that gives insight into why a search is slow or fast. There is also a link to a log file, which you might need later if you encounter errors.

  7. Select a visualization for the report. Click on the Visualization tab and select a pie chart from the Drilldown option.
  8. You can click on the pie to get detailed information. Click on any sector.
  9. Hunk will automatically create a query for you to display the detailed data.
  10. Save the top referrer report.
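
The way the referrer expression reads (filter, group, count, sort) can be sketched in Python as follows. The sample referer values, the unicorn-fashion.example domain, and the function name top_referers are hypothetical. The limit of 10 mirrors the default of the top command, and the percent key corresponds to percentfield=percent.

```python
from collections import Counter

# Hypothetical referer values; the store's own pages contain "unicorn".
referers = [
    "http://www.yahoo.com", "http://www.google.com",
    "http://www.unicorn-fashion.example/cart", "http://www.google.com",
    "http://www.bing.com", "http://www.google.com",
]

def top_referers(values, exclude="unicorn", limit=10):
    """Drop values containing `exclude`, then count, sort, and add a percent."""
    kept = [v for v in values if exclude not in v.lower()]
    total = len(kept)
    return [
        {"referer": value, "count": count, "percent": 100.0 * count / total}
        for value, count in Counter(kept).most_common(limit)
    ]

rows = top_referers(referers)
```

The percentages are computed over the events that remain after the filter, so internal navigation within the store does not dilute the share of external sources.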

Site errors report

Let's perform naive analytics to count the errors occurring on our site.

  1. See the statuses logged by the Apache server:
    index="digital_analytics" | chart count by status

    Visualize the result as a bar chart.

    We see various status codes. Let's pay special attention to status 500, which indicates an error on the server side.

  2. Calculate the error ratio using the eval expression:
    index="digital_analytics" | eval errorRatio=if(status==500, "ERROR", "ELSE") | timechart count by errorRatio | sort -errorRatio

    The idea of the expression is to:

    1. Use the digital_analytics index.
    2. Calculate a field named errorRatio. If the status field has the value 500, errorRatio is set to "ERROR"; otherwise, it is set to "ELSE".
    3. Count errorRatio occurrences over time and sort by the count of errorRatio.
  3. Save the report.
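
The labelling and time bucketing performed by the eval and timechart steps can be sketched in Python. The sample events, the hourly bucket size, and the function name count_errors_over_time are assumptions made for illustration; Hunk chooses the actual time span based on the search time range.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical (timestamp, status) pairs standing in for indexed events.
events = [
    ("02/Dec/2013:13:52:29", 200),
    ("02/Dec/2013:13:58:01", 500),
    ("02/Dec/2013:14:03:12", 404),
    ("02/Dec/2013:14:10:45", 500),
    ("02/Dec/2013:14:30:00", 500),
]

def count_errors_over_time(rows):
    """Label each event ERROR or ELSE, then count labels per hourly bucket."""
    buckets = defaultdict(lambda: {"ERROR": 0, "ELSE": 0})
    for timestamp, status in rows:
        when = datetime.strptime(timestamp, "%d/%b/%Y:%H:%M:%S")
        bucket = when.replace(minute=0, second=0)  # hourly span is an assumption
        label = "ERROR" if status == 500 else "ELSE"  # mirrors the eval/if step
        buckets[bucket][label] += 1
    return dict(sorted(buckets.items()))

series = count_errors_over_time(events)
```

Each key in series is the start of an hour, and each value holds the ERROR and ELSE counts for that hour, which is the shape a time chart plots as two series.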

Creating alerts

Hunk can issue alerts when a condition is met. Let's configure an alert that fires when the error count reaches a threshold.

  1. Use a query to count the events with status 500, which signifies an error on the server side:
    index="digital_analytics" status=500 | stats count

  2. This query returns the error count. Select Save As | Alert.

    It's hard to emulate scheduled activity right now, so let's pick the simplest case and see how it works in general.

    Note

    You should definitely choose the scheduled type of alert in production. The idea is to run the query periodically and issue an alert when the condition is met. The alert can optionally be sent as an email so that the operator can react appropriately.

    The following screenshot shows the settings for saving a new alert:

    [Screenshot: settings for saving a new alert]
  3. We've chosen Per-Result to get an alert each time the report returns something.
  4. Set the alert to be listed under Activity | Triggered Alerts.

    The following screenshot is an overview of the created alert:

    [Screenshot: overview of the created alert]
  5. Go to Activity | Triggered Alerts and confirm that the alert has been published.
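
The scheduled variant recommended in the note boils down to running the count periodically and comparing it with a threshold. Here is a minimal Python sketch of that check. The sample status codes, the threshold value, and the function name check_error_alert are hypothetical, and the sketch does not reflect how Hunk evaluates trigger conditions internally.

```python
# Hypothetical status codes returned by one scheduled run of the search.
statuses = [200, 500, 404, 500, 200, 500]

def check_error_alert(status_codes, threshold):
    """Return (triggered, error_count) for a simple count-based condition."""
    error_count = sum(1 for code in status_codes if code == 500)
    return error_count >= threshold, error_count

triggered, error_count = check_error_alert(statuses, threshold=3)
```

In a production setup, the triggered flag would be the point at which an email or another alert action is fired.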

Creating a dashboard

Now it's time to see how dashboards work. Let's find the regions where visitors run into problems (status=500) while using our online store:

index="digital_analytics" status=500 | iplocation clientip | geostats latfield=lat longfield=lon count by Country

You should see a map showing the error count by country.
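
To give a rough idea of what the map aggregates, the following Python sketch groups events that already carry lat, lon, and Country fields into grid cells and counts them per country. The sample coordinates, the 10-degree cell size, and the function name bin_by_cell are assumptions made for illustration; the real geostats command uses its own bin spans, and iplocation needs a GeoIP database that is not modelled here.

```python
from collections import Counter
import math

# Hypothetical events that already carry the fields an IP lookup would add.
located_errors = [
    {"Country": "United States", "lat": 37.77, "lon": -122.42},
    {"Country": "United States", "lat": 40.71, "lon": -74.01},
    {"Country": "Germany", "lat": 52.52, "lon": 13.40},
    {"Country": "United States", "lat": 37.34, "lon": -121.89},
]

def bin_by_cell(events, lat_span=10.0, lon_span=10.0):
    """Count events per (grid cell, Country); cell sizes are illustrative."""
    counts = Counter()
    for event in events:
        # Snap each coordinate down to the south-west corner of its cell.
        cell = (
            math.floor(event["lat"] / lat_span) * lat_span,
            math.floor(event["lon"] / lon_span) * lon_span,
        )
        counts[(cell, event["Country"])] += 1
    return counts

cells = bin_by_cell(located_errors)
```

Each cell would correspond to one marker on the map, with the per-country counts determining its size and segments.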

Now let's save this as a dashboard. Click on Save As and select Dashboard Panel from the drop-down menu.

The following screenshot shows the values for fields in the Save form:

[Screenshot: values for the fields in the Save form]

You should get a new dashboard with a single panel and our report on it. We have several previously created reports. Let's add them to the newly created dashboard using separate panels. Click on Edit | Edit Panels.

Select Add new panel | New from report and add one of our reports.

At the end, you should have a single page with four reports.
