One of the data samples we loaded in Chapter 1, Play Time – Getting Data In, contained access logs from our web server. These have a Splunk sourcetype of access_combined and detail all pages accessed by the users of our web application. We are particularly interested in knowing which pages are accessed the most, as this shows how our e-commerce web application is actually being used. It could also influence changes to the application, such as removing rarely visited pages or redesigning the application to be more efficient.
In this recipe, we will write a Splunk search to find the most accessed web pages over a given period of time.
To step through this recipe, you will need a running Splunk Enterprise server, with the sample data loaded from Chapter 1, Play Time – Getting Data In. You should be familiar with the Splunk search bar and the time range picker to the right of it.
Follow the given steps to search for the most accessed web pages:
index=main sourcetype=access_combined | stats count by uri_path | sort - count
cp02_most_accessed_webpages
and click on Save. On the next screen, click on Continue Editing to return to the search.
Let's break down the search piece by piece:
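As a sketch of that breakdown, the search can be built up one pipe at a time and run at each stage. The three searches below are meant to be run separately, and the descriptions summarize standard stats and sort behavior rather than the recipe's own wording. The first returns the raw web access events from the main index, the second reduces them to one row per uri_path with a count field holding the number of events, and the third orders those rows by count, highest first.

```
index=main sourcetype=access_combined

index=main sourcetype=access_combined | stats count by uri_path

index=main sourcetype=access_combined | stats count by uri_path | sort - count
```

Running the stages in turn makes it easier to see where an unexpected result is introduced.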
We can further build upon the base search to provide different variations of the results.
We can modify the search from this recipe and replace the stats command with the top command. By default, this will display the top 10 web pages:
sourcetype=access_combined index=main | top uri_path
Here, the top command takes the place of both stats and sort: it counts the events for each uri_path value and returns the 10 most common values by default. If we want to get the top 20 web pages, we can specify a limit value, as follows:
sourcetype=access_combined index=main | top limit=20 uri_path
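Two further variations follow, offered as sketches rather than as part of the recipe. The top command adds a count and a percent column to its results, and the first search below suppresses the percentage with the showperc option. The second reproduces the top 20 listing using stats, sort, and head, which may be more convenient when other stats functions are needed in the same search.

```
sourcetype=access_combined index=main | top limit=20 showperc=false uri_path

sourcetype=access_combined index=main | stats count by uri_path | sort - count | head 20
```

Both searches should list the same pages, although the ordering of pages with equal counts may differ.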
We can also modify the search from this recipe to use the distinct count (dc) function of the stats command to display a list of users and the number of unique pages each of them visited:
sourcetype=access_combined index=main | stats dc(uri_path) AS unique_pages by user | sort - unique_pages
The distinct count function ensures that if a user visits the same page multiple times, the page is only counted once. Because the sort is descending on the distinct count (renamed unique_pages here so that sort can refer to it), the user who visited the most unique pages will be at the top of the list.
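If the page names themselves are wanted alongside the count, the values function of stats can be added. This is a sketch that goes slightly beyond the recipe; unique_pages and pages are field names chosen here purely for illustration.

```
sourcetype=access_combined index=main | stats dc(uri_path) AS unique_pages values(uri_path) AS pages by user | sort - unique_pages
```

The values function returns each distinct uri_path once per user as a multivalue field, so the number of entries in pages should match unique_pages.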
For more information on the various functions that can be used with the stats command, check out http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/CommonStatsFunctions.