Finding the most accessed web pages

One of the data samples we loaded in Chapter 1, Play Time – Getting Data In, contained access logs from our web server. These have a Splunk sourcetype of access_combined and detail all pages accessed by the users of our web application. We are particularly interested in knowing which pages are being accessed the most, as this information provides great insight into how our e-commerce web application is being used. It could also help influence changes to our web application such that rarely visited pages are removed, or our application is redesigned to be more efficient.

In this recipe, we will write a Splunk search to find the most accessed web pages over a given period of time.

Getting ready

To step through this recipe, you will need a running Splunk Enterprise server, with the sample data loaded from Chapter 1, Play Time – Getting Data In. You should be familiar with the Splunk search bar and the time range picker to the right of it.

How to do it…

Follow the given steps to search for the most accessed web pages:

  1. Log in to your Splunk server.
  2. Select the Search & Reporting application.
  3. Set the time range picker to Last 24 hours and type the following search into the Splunk search bar. Then, click on Search or hit Enter.
    index=main sourcetype=access_combined | stats count by uri_path | sort - count
  4. Splunk will return a list of pages, and a new field named count displays the total number of times a page has been accessed.
  5. Save this search by clicking on Save As and then on Report. Give the report the name cp02_most_accessed_webpages and click on Save. On the next screen, click on Continue Editing to return to the search.

How it works…

Let's break down the search piece by piece:

Search fragment

Description

index=main

All the data in Splunk is held in one or more indexes. While not strictly necessary, it is good practice to specify the index(es) to search, as this narrows the search to the data we actually want.

sourcetype=access_combined

This tells Splunk to search only the data associated with the access_combined sourcetype, which, in our case, is the web access logs.

| stats count by uri_path

Using the stats command, we take the result of the search to the left-hand side of the pipe and tell Splunk to count the instances of each uri_path. The uri_path field holds the path of the web page that was requested.

| sort - count

Using the sort command, we take the count field generated by stats and tell Splunk to sort the results of the previous command in descending (-) order, such that the most visited web page appears at the top of the results.
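
To make the logic of the pipeline concrete, the following is a minimal Python sketch of what stats count by uri_path followed by sort - count computes. It is only an illustration and not how Splunk implements these commands; the sample events and the most_accessed function name are hypothetical.

```python
from collections import Counter

# Hypothetical sample events; only the uri_path field matters here.
events = [
    {"uri_path": "/home"},
    {"uri_path": "/cart"},
    {"uri_path": "/home"},
    {"uri_path": "/checkout"},
    {"uri_path": "/home"},
    {"uri_path": "/cart"},
]

def most_accessed(events):
    """Count events per uri_path, then order by count, highest first."""
    # Equivalent in spirit to: stats count by uri_path
    counts = Counter(e["uri_path"] for e in events if "uri_path" in e)
    # Equivalent in spirit to: sort - count
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)

results = most_accessed(events)
for uri_path, count in results:
    print(uri_path, count)
```

Each tuple in results pairs a page with the number of times it was requested, with the most visited page first.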

There's more…

We can further build upon the base search to provide different variations of the results.

Searching for the top 10 accessed web pages

We can modify the search from this recipe and replace the stats command with the top command. By default, this will display the top 10 web pages:

sourcetype=access_combined index=main | top uri_path

To get the top 20 web pages instead, we can specify a limit value, as follows:

sourcetype=access_combined index=main | top limit=20 uri_path
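
As a rough illustration of what top does with a limit, here is a small Python sketch that keeps only the most frequent values of a field and reports a count and a percentage for each. The function name, the sample data, and the assumption that the percentage is taken over events containing the field are ours for illustration, not a description of Splunk's internals.

```python
from collections import Counter

def top(events, field, limit=10):
    """Return the `limit` most common values of `field` with count and percent."""
    values = [e[field] for e in events if field in e]
    total = len(values)
    return [
        {field: value, "count": count, "percent": 100.0 * count / total}
        for value, count in Counter(values).most_common(limit)
    ]

# Hypothetical sample: 5 hits on /home, 3 on /cart, 2 on /checkout.
events = (
    [{"uri_path": "/home"}] * 5
    + [{"uri_path": "/cart"}] * 3
    + [{"uri_path": "/checkout"}] * 2
)

top_two = top(events, "uri_path", limit=2)
for row in top_two:
    print(row)
```

Raising or lowering limit changes only how many rows are kept; the counting itself is unchanged.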

Searching for the most accessed pages by user

We can modify the search from this recipe to use the distinct count (dc) function of the stats command and display a list of users along with the number of unique pages each one visited:

sourcetype=access_combined index=main | stats dc(uri_path) as unique_pages by user | sort - unique_pages

The distinct count function ensures that if a user visits the same page multiple times, it is only counted as one visit. The user who visited the greatest number of unique pages will be at the top of the list, as we used a descending sort on the distinct count.
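
The idea behind a distinct count per user can be sketched in Python by collecting each user's pages into a set, so that repeat visits collapse into one entry. This is an illustrative sketch under our own assumptions; the function name and sample events are hypothetical.

```python
from collections import defaultdict

def unique_pages_by_user(events):
    """Count distinct uri_path values per user, highest count first."""
    pages = defaultdict(set)
    for e in events:
        if "user" in e and "uri_path" in e:
            # A set ignores repeat visits to the same page.
            pages[e["user"]].add(e["uri_path"])
    per_user = [(user, len(paths)) for user, paths in pages.items()]
    return sorted(per_user, key=lambda item: item[1], reverse=True)

# Hypothetical sample events.
events = [
    {"user": "alice", "uri_path": "/home"},
    {"user": "alice", "uri_path": "/home"},
    {"user": "alice", "uri_path": "/cart"},
    {"user": "bob", "uri_path": "/home"},
    {"user": "alice", "uri_path": "/checkout"},
    {"user": "bob", "uri_path": "/home"},
]

ranking = unique_pages_by_user(events)
for user, unique_pages in ranking:
    print(user, unique_pages)
```

Here alice requested four pages but only three distinct ones, so she is counted as three.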

Note

For more information on the various functions that can be used with the stats command, check out http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/CommonStatsFunctions.

See also

Also refer to the following recipes for more information:

  • The Finding the most used web browsers recipe
  • The Identifying the top-referring websites recipe
  • The Charting web page response codes recipe