We are going to learn about the not-so-common searching and reporting techniques that have been used in real-world scenarios in order to present data both accurately and in the pretty format that leadership asks us for. There are also some caveats to the Search Processing Language that we need to understand when performing a search, not only for presentation purposes, but also for presenting accurate data.
We will learn about:
Of course, we can search for whatever we want in Splunk, using it on our log files much as we use Google, but there are ways to make searching itself faster and more efficient. There are a few things to understand when making your queries more efficient, and I will cover a few that are commonly overlooked.
This may be more editorial than technical, but it may help to quickly describe the three components of a Splunk search before explaining core search. This seems to help a lot of people understand searching as a concept:
Let's see an example of them:
index=test_index sourcetype=bookstuff action=purchase
index=test_index sourcetype=bookstuff action=purchase | stats avg(latency) as Delay by host
index=test_index sourcetype=bookstuff action=purchase | stats avg(latency) as Delay by host | table host Delay
Now that the layout is clear, it should be easier to explain each component in more depth.
There are a few fields that Splunk writes to disk by default, and they can go quite far in helping you decrease search times. Using as many of these fields as you can as part of your root search will help you decrease your query time.
Time is always the most effective filter when performing searches. If you can limit the time you are searching, you limit the amount of data that Splunk has to look through, in order to find your results.
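As a minimal sketch of limiting time in the query itself, the search below uses the earliest and latest time modifiers to look only at the last four hours. The index and sourcetype names are reused from the earlier example and are placeholders; the four-hour window is an arbitrary assumption for illustration.

```spl
index=test_index sourcetype=bookstuff action=purchase earliest=-4h latest=now
```

The same restriction can be applied with the time range picker. Writing it inline simply keeps the window attached to the search when it is saved or shared.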
Pointing Splunk to a specific index at search time begins filtering out the data that you don't want to see. The index field restricts the rest of the query to that dataset (not including subsearches, joins, and so on) and will bring down the search time considerably.
The sourcetype field limits the data searched to a specific subset of the whole. Often, when data is brought into Splunk, each log type is treated as a different dataset and assigned a separate sourcetype, such as IIS, firewall, or syslog, in order to define that dataset.
The host field is written to disk at indexing time and also limits the amount of data that Splunk has to search through to get you the results you're looking for. If you have 500 machines sending data to Splunk, and you only care about four hosts for your specific search, just use the OR operator to search those hosts.
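A hedged sketch of the four-host case might look like the following. The host names web01 through web04 are hypothetical, and the parentheses are there only to make the grouping explicit to the reader.

```spl
index=access_combined sourcetype=iis (host=web01 OR host=web02 OR host=web03 OR host=web04)
| stats count by host
```

On Splunk versions that support the IN operator in the search command, host IN (web01, web02, web03, web04) should express the same filter more compactly.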
The source field is often the actual file path of the log file being ingested, and if you can use it, it also helps to reduce the time your searches take. Using the source field is often not practical; however, when it makes sense in your query, it will help your search return results faster as well.
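For illustration only, a search narrowed by source could look like this. The file path is a made-up example, not one taken from a real deployment, and the wildcard assumes the log files share a common naming pattern.

```spl
index=access_combined sourcetype=iis source="/var/log/iis/u_ex*.log"
```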
The moral of using these fields is that the more specific you can be with Splunk, the faster it will return the results, and specifically the results you want to see. With Splunk, the devil is in the detail, and the details in Splunk are so granular that we have to train our eyes to look for the smallest things that could affect our results. It can be as simple as a capital T where it should be lowercase t. A single character can throw off your result set by an amazing amount, so the more info you can give Splunk the better.
As a best practice, if you keep only the fields you care about after your root search, you will decrease your search times as well.
Take, for instance, iis data:
If this is your original search:
index=access_combined sourcetype=iis
And you wanted to decrease your search times, you could keep only the fields you care about, like this:
index=access_combined sourcetype=iis | fields host cs_host time_taken cs_ip sc_ip User
And you should see the results returned more quickly.
Case sensitivity doesn't matter too much in Splunk, but it can make the difference between getting results and getting No results found. Case sensitivity is generally understood in SPL, but I've only seen it written down in a few places, so I will present those cases here.
Here is a table that should explain most of what is case sensitive, and what's not:
Case Sensitive | Example
Boolean operators |
Field names |
Lookup field values |
Regular expressions |
if and case functions (eval/where commands) |
CASE() |
Tags |
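To make a few of these rows concrete, here are three short sketches. The index, sourcetype, and User field come from the chapter's own examples, while the host names and the values Error and Admin are assumptions chosen for illustration. The first search uses an uppercase OR, since a lowercase or is treated as a literal search term rather than an operator. The second uses the CASE() directive to match the term Error only in that exact case. The third compares a field value in where, where string comparison is case sensitive, so Admin and admin are different values.

```spl
index=access_combined sourcetype=iis host=host1 OR host=host2

index=access_combined sourcetype=iis CASE(Error)

index=access_combined sourcetype=iis | where User=="Admin"
```

If a case-insensitive comparison is wanted in the third search, wrapping the field in lower(), as in where lower(User)=="admin", is one common approach.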
On that note, inclusive searches are generally better than exclusive searches, meaning that using terms to include the data you want is going to be faster than attempting to exclude the data you don't want to see. Sometimes exclusion is unavoidable, but it is still worth knowing the difference. It is also worth noting that the AND operator is implied between search terms in Splunk, so writing index=access_combined AND sourcetype=iis is not necessary; index=access_combined sourcetype=iis means the same thing.
As an example:
index=access_combined sourcetype=iis host!=host1 host!=host5 | fields host cs_host time_taken cs_ip sc_ip User
Will generally be slower at returning results than:
index=access_combined sourcetype=iis host=host2 OR host=host3 OR host=host4 | fields host cs_host time_taken cs_ip sc_ip User