Chapter 6. Searching and Reporting

In this chapter we will look at some less common searching and reporting techniques that have been used in real-world scenarios to present data both accurately and in the polished format that leadership asks for. There are also some caveats in the Search Processing Language (SPL) that we need to understand when running a search, not only for presentation purposes but also to make sure the data we present is accurate.

We will learn about:

  • General practices (efficiencies):
    • Core fields
    • Case sensitivity
    • Inclusive versus exclusive

  • Search modes:
    • Fast Mode
    • Smart Mode
    • Verbose Mode

  • Advanced charting: 
    • Overlay
    • Xyseries
    • Appending results
    • Day-over-day overlay

General practices

Of course, we can search for whatever we want in Splunk, using it on our log files in much the same way we use Google, but there are ways to make searching itself faster and more efficient. A few practices help here, and I will cover some that are commonly overlooked.

This may be more editorial than technical, but before explaining the core search it helps to quickly describe the three components of a Splunk search. This seems to help a lot of people understand searching as a concept:

  • Core search (what data will be included in the search?)
  • Function or calculation
  • Formatting or presentation

Let's see an example of them:

  1. Perform a core search:
            index=test_index sourcetype=bookstuff action=purchase

  2. Then add a function or calculation:
            index=test_index sourcetype=bookstuff action=purchase
            | stats avg(latency) as Delay by host

  3. Then tell Splunk how to present the results:
            index=test_index sourcetype=bookstuff action=purchase
            | stats avg(latency) as Delay by host | table host Delay

Now that the layout is clear, it should be easier to explain each component in more depth.

Core fields (root search)

There are a few fields that Splunk writes to disk by default, and they can go quite far in helping you decrease search times. Using as many of these fields as you can in your root search will help you decrease your query time.

_time

Time is always the most effective filter when performing searches. If you can limit the time you are searching, you limit the amount of data that Splunk has to look through, in order to find your results.
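
As a rough sketch of a time-bounded search, the query below reuses the index and sourcetype from the IIS example in this chapter and limits the search to the last four hours with the earliest and latest time modifiers. The four-hour window is only an illustrative assumption; the same restriction can usually be applied through the time range picker instead.

index=access_combined sourcetype=iis earliest=-4h latest=now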

Index

Pointing Splunk to a specific index at search time immediately filters out data that you don't want to see. The index field restricts the rest of the query to that dataset (not including subsearches, joins, and so on) and helps bring down the search time considerably.

Sourcetype

The sourcetype field is a way to limit the amount of data searched to a specific subset of the whole. Often, when data is brought into Splunk, each log type is a different dataset and is assigned its own sourcetype, such as IIS, firewall, or syslog, to define that dataset.

Host

The host field is written to disk at indexing time and also limits the amount of data that Splunk has to search through in order to get you the results you're looking for. If you have 500 machines dumping data to Splunk, and you only care about four hosts for your specific search, just use the OR clause to search those hosts.
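
A minimal sketch of that kind of host filter might look like the following. The host names web01 through web04 are hypothetical, and the index and sourcetype are borrowed from the IIS example used elsewhere in this chapter:

index=access_combined sourcetype=iis (host=web01 OR host=web02 OR host=web03 OR host=web04)

The parentheses are there for readability, so it is obvious at a glance which terms belong to the OR group.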

Source

The source field is often the actual file path of the log file being ingested. Filtering on it is not always practical, but where it makes sense to include it in your query, it also helps your search return results faster.
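
For illustration only, a search narrowed by source could be written as shown here. The index name, sourcetype, and file path are all assumptions chosen for the example, not values from a real deployment:

index=main sourcetype=syslog source="/var/log/messages"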

The moral of using these fields is that the more specific you can be with Splunk, the faster it will return the results, and specifically the results you want to see. With Splunk, the devil is in the detail, and the details in Splunk are so granular that we have to train our eyes to look for the smallest things that could affect our results. It can be as simple as a capital T where it should be lowercase t. A single character can throw off your result set by an amazing amount, so the more info you can give Splunk the better.

As a best practice, if you keep only the fields you care about after your root search, you will decrease your search times as well.

Take IIS data, for instance:

If this is your original search:

index=access_combined sourcetype=iis 

And you wanted to decrease your search times, you could keep only the fields you care about, like this:

index=access_combined sourcetype=iis | fields host cs_host time_taken cs_ip sc_ip User 

And you should see results returned faster.

Case sensitivity

Case sensitivity does not matter everywhere in Splunk, but where it does, getting it wrong in a search can leave you looking at No results found. The rules are generally understood among SPL users, but I have only seen them written down in a few places, so I figured I would present them here.

Here is a table that covers most of what is case sensitive:

Case sensitive                                 Example
Boolean operators                              AND, OR, NOT
Field names                                    ipAddr versus ipaddr
Lookup field values                            vendorName=Verizon versus vendorName=verizon
Regular expressions                            \d\d\d versus \D\D\D
if and case functions (eval/where commands)    eval action=if(action=="login",...)
                                               where action="login"
                                               stats count(eval(action="login")) as...
CASE()                                         CASE(login)
Tags                                           infosec versus INFOSEC
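
Since comparisons in where and eval are case sensitive, one common way to avoid missing events is to normalize the field before comparing it. The sketch below assumes the User field from the IIS example and a hypothetical value of admin; lower() converts the field value to lowercase so that Admin, ADMIN, and admin all match:

index=access_combined sourcetype=iis | where lower(User)="admin"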

Inclusive versus exclusive

On a related note, inclusive searches are generally better than exclusive searches: using terms to include the data you want is going to be faster than attempting to exclude the data you don't want to see. Sometimes exclusion is unavoidable, but it is worth knowing the trade-off. Note also that Splunk implies the AND operator between search terms, so writing it explicitly, as in host!=host1 AND host!=host5, is not necessary.

As an example:

index=access_combined sourcetype=iis host!=host1 host!=host5 | fields host cs_host time_taken cs_ip sc_ip User 

Will generally be slower at returning results than:

index=access_combined sourcetype=iis (host=host2 OR host=host3 OR host=host4) | fields host cs_host time_taken cs_ip sc_ip User