Improving the data input process

Data input is a critical process that precedes generating insights and visualizations from data. It is therefore important that the data is indexed, parsed, processed, and segmented properly. The first approach or setting a user applies may not be the best one, and some trial and error may be needed to find the right settings for data types that Splunk does not support out of the box.

It is always advisable to first upload a small amount of data to a test index on a Splunk development server. Once the data appears on Splunk in the correct event format, so that queries produce the required visualizations, the input can be forwarded to the correct index and source on the production server.
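For a quick test, a sample file can be indexed once from the CLI. The following is a minimal sketch, assuming a hypothetical file path, index, and sourcetype; splunk add oneshot indexes a single file one time, which is convenient for trying out event-breaking and timestamp settings:

splunk add oneshot /tmp/satellite_sample.log -index test -sourcetype satellite_data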

When testing, you may try to upload the same file more than once with different event-configuration settings. Splunk may refuse to index the file again because, to avoid redundancy, it keeps track of filenames and file contents it has already indexed. In such scenarios, the index can be cleaned, deleted, or disabled using the following commands, respectively:

  • Cleaning an index: splunk clean eventdata -index <index_name>
  • Deleting an index: splunk remove index <index_name>
  • Disabling an index: splunk disable index <index_name>
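Note that cleaning event data requires splunkd to be stopped first. The following is a minimal sketch of the cleaning workflow, assuming a test index named test and a standard $SPLUNK_HOME installation:

# Stop Splunk first; clean refuses to run while splunkd is up
$SPLUNK_HOME/bin/splunk stop
# -f skips the interactive confirmation prompt
$SPLUNK_HOME/bin/splunk clean eventdata -index test -f
$SPLUNK_HOME/bin/splunk start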

If a stream of data is sent directly to Splunk over TCP or UDP, it is advisable to write that data to a file and then configure Splunk to monitor the file. This avoids data loss when Splunk or the network is down, and it is also helpful if you ever need to delete and reindex the data on Splunk. When a forwarder, TCP, UDP, or scripted data input is used, enable persistent queues to buffer the data; the queue stores incoming data on disk if any issue blocks indexing.
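Persistent queues are configured per input stanza in inputs.conf. The following is a minimal sketch, assuming a hypothetical TCP input on port 9001 and arbitrarily chosen queue sizes:

# inputs.conf
[tcp://:9001]
index = test
# In-memory queue; once full, data spills to the persistent queue
queueSize = 1MB
# On-disk queue that buffers incoming data during indexing issues
persistentQueueSize = 5MB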

It is advisable to use Splunk forwarders when data is to be uploaded to Splunk Enterprise remotely. A forwarder sends a heartbeat to the indexer every 30 seconds, and on connectivity loss it holds the data until the connection is restored.
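On the forwarder side, the indexer destination and the heartbeat interval are set in outputs.conf. The following is a minimal sketch, assuming a hypothetical indexer address; heartbeatFrequency defaults to 30 seconds and is shown here explicitly:

# outputs.conf (on the forwarder)
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = 10.1.2.3:9997
# Send a heartbeat to the indexer every 30 seconds (the default)
heartbeatFrequency = 30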

When the data that is to be uploaded to Splunk does not have a timestamp and Splunk is configured to use the upload time as the timestamp, timestamp searching should be disabled. Disabling timestamp searching on data that has no timestamp at all speeds up processing considerably. To disable it, add the DATETIME_CONFIG attribute with a value of NONE to the relevant stanza in props.conf (here, the [host::SatelliteData] stanza).

Refer to the following example for better clarity:

[host::SatelliteData]
DATETIME_CONFIG = NONE

Data input is a crucial process in Splunk. The following points should be considered while setting up the input process:

  • Identify which input methods will be used to upload data to Splunk, and how
  • Use a universal forwarder if required
  • Browse the Splunk app store (Splunkbase) and utilize any technology add-on that fits the requirement
  • Apply the Common Information Model (CIM), which specifies the standard fields, event types, and tags that Splunk uses when processing most IT data; see the sketch after this list
  • Always test the upload in a test environment first and only then proceed to the production deployment
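CIM compliance is usually achieved by defining event types and tagging them so that events populate the relevant data model. The following is a minimal sketch, assuming a hypothetical sourcetype and event type name; the network and communicate tags map events into the CIM Network Traffic data model:

# eventtypes.conf
[satellite_traffic]
search = sourcetype=satellite_data

# tags.conf
[eventtype=satellite_traffic]
network = enabled
communicate = enabled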