This chapter takes you through important advanced data analytics commands for creating reports, detecting anomalies, and correlating data. It also covers the commands for prediction, trending, and machine learning on Splunk. Each command is illustrated with examples that show how to use it on Splunk to gain detailed insight into the data.
In this chapter, we will cover the following topics:
You will now learn the reporting commands that format data so that it can be displayed using the various visualizations available on Splunk. Reporting commands are transforming commands: they transform the event data returned by a search into tables that can be used for visualizations.
The Splunk command makecontinuous is used to make the x-axis field continuous so that it can be plotted. This command adds empty buckets for the periods where no data is available. Once the specified field is made continuous, the chart/stats/timechart commands can be used for graphical visualizations.
The syntax for the makecontinuous command is as follows:
… | makecontinuous Field_name bin_options
The parameter description of the makecontinuous command is as follows:
Field_name: The name of the field that is to be plotted on the x axis.
Bin_Options: The options for discretization. This is a required parameter and can take values such as bins (the maximum number of bins), span (the size of each bin), or start/end (the minimum and maximum extents of the bins).
Refer to the following example for better clarity:
| inputcsv datanse.csv | eval _time=strptime(date, "%e-%b-%y") | table _time DAX | makecontinuous span=1w DAX
The following screenshot describes the makecontinuous command:
The preceding screenshot shows the output of the search result on the Splunk web console under the Visualization tab. The following screenshot shows the output of the same result under the Statistics tab:
The preceding example output graph shows how a line chart breaks where the data is non-continuous. The makecontinuous command adds empty bins for the periods when no data is available, thus making the graph continuous. The second, tabular image shows the time field along with the DAX field, which is specified to be made continuous and has values of 0. The span parameter is set to one week (1w), which is the size/range of each bin created to make the data continuous.
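The gap-filling idea can be sketched outside Splunk in a few lines of plain Python. This is only a rough analogue, not Splunk's implementation: the helper make_continuous, the sample values, and the choice to leave the added buckets without a DAX value are all assumptions made for illustration.

```python
def make_continuous(rows, field, span):
    """Return rows with one entry per bucket of width `span` on `field`.

    rows: list of dicts whose `field` holds an integer x-axis value
    (for example, epoch seconds). Buckets with no data get a placeholder
    row that carries only the x-axis field.
    """
    if not rows:
        return []
    by_bucket = {}
    for row in rows:
        # Snap each value down to the start of its bucket.
        bucket = (row[field] // span) * span
        # Keep the first row seen for a bucket (a simplification).
        by_bucket.setdefault(bucket, dict(row, **{field: bucket}))
    start, end = min(by_bucket), max(by_bucket)
    # Walk every bucket from first to last, filling the gaps.
    return [by_bucket.get(b, {field: b}) for b in range(start, end + span, span)]


WEEK = 7 * 24 * 3600  # span=1w expressed in seconds

# Made-up sample data: weeks 2 and 3 have no events.
events = [
    {"_time": 0 * WEEK, "DAX": 9500.0},
    {"_time": 1 * WEEK, "DAX": 9600.0},
    {"_time": 4 * WEEK, "DAX": 9400.0},
]

filled = make_continuous(events, "_time", WEEK)
for row in filled:
    print(row["_time"] // WEEK, row.get("DAX"))
```

The three input rows become five output rows, one per week, so a chart drawn from the result has an evenly spaced x axis.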
The Splunk addtotals command computes the total of all the numeric fields, or of the specified numeric fields, for all the events in the result. The totals can be calculated for each row, for each column, or for both.
The syntax for the addtotals command is as follows:
… | addtotals row=true/false col=true/false labelfield=Field_name label=Label_name fieldname=Field_name Field_list
Refer to the following list for a description of the options of the addtotals command:
row: The default value for this argument is true, which means that addtotals calculates, for each event, the sum of all the numeric fields or of the fields specified in Field_list. The result is stored in a new field named Total by default, or in the field named by the fieldname parameter. Since the default value is true, this parameter only needs to be specified when the total of each row is not required; in that case, it is set to false.
col: This parameter, if set to true, creates a new event, called the summary event, at the bottom of the list of events. The summary event holds the total of each column in the table. The default value for this parameter is false.
labelfield: The Field_name of the field in which the label of the summary event is placed. This parameter is used when col is set to true.
label: The Label_name used to label the summary event in the field given by labelfield. The default label is Total.
fieldname: The name of the field that holds the row total when row is set to true. The default name is Total.
Field_list: One or more numeric fields, delimited by a space, for which the sum is to be calculated. If no fields are specified, the total of all the numeric fields is calculated.
Take a look at the following example of the addtotals command:
|inputcsv datanse.csv | table EM EU | addtotals col=true
The output of the preceding query will be similar to the following screenshot:
The Splunk addtotals command computed the arithmetic total of the fields EM and EU for each row and stored it in the field Total. The parameter col is set to true, which means that the total of each column is also calculated and included in the output.
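The difference between row totals and column totals can be sketched in plain Python. This is an analogue written for illustration, not Splunk's code: the function add_totals and the sample numbers are hypothetical, and the sketch assumes that the summary row does not get a value in the row-total field, which may differ from what Splunk displays.

```python
def add_totals(rows, row=True, col=False, fieldname="Total",
               label="Total", labelfield=None, fields=None):
    """Add a per-row total field and/or a summary row of column totals."""

    def numeric_keys(r):
        # Numeric fields only, optionally restricted to `fields`.
        return [k for k, v in r.items()
                if isinstance(v, (int, float)) and not isinstance(v, bool)
                and (fields is None or k in fields)]

    out = [dict(r) for r in rows]  # do not modify the caller's rows

    summary = {}
    if col:
        # Column totals are taken before the row-total field is added.
        for r in out:
            for k in numeric_keys(r):
                summary[k] = summary.get(k, 0) + r[k]

    if row:
        for r in out:
            r[fieldname] = sum(r[k] for k in numeric_keys(r))

    if col:
        if labelfield is not None:
            summary[labelfield] = label  # label the summary event
        out.append(summary)
    return out


# Made-up sample values for the EM and EU fields.
rows = [{"EM": 10, "EU": 20}, {"EM": 5, "EU": 7}]

result = add_totals(rows, col=True)
for r in result:
    print(r)
```

With col=True, the last row of the result is the summary event holding the sum of each column, while every other row gains a Total field with its own sum.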
The Splunk xyseries command is used to convert data into a format that is compatible with Splunk visualizations. In other words, tabular data is converted into a format that can be displayed as a line chart, bar graph, area chart, pie chart, scatter chart, and so on. This command is very useful for formatting data to build visualizations of multiple data series.
Refer to the following query block for the syntax:
xyseries grouped=true / false x_axis_fieldname y_axis_fieldname y_axis_data_fieldname
The description of the parameters of the preceding query is as follows:
grouped: This parameter, if set to true, indicates that the input is sorted by the value of x_axis_fieldname and allows multifile input
x_axis_fieldname: The fieldname that is to be set as the x axis in the output
y_axis_fieldname: The fieldname that is to be used as a label for the data series
y_axis_data_fieldname: The field or list of fields containing the data to be plotted
Refer to the following example for better clarity:
| inputcsv abc.csv | stats sum(Hits) AS Hits by Date UserID Transaction | eval temp=Date+"##"+UserID | table temp Transaction Hits | xyseries temp, Transaction Hits | fillnull | rex field=temp "(?<Date>.*)##(?<UserID>.*)" | fields - temp | table Date, UserID *
The output of the preceding query would look similar to the following screenshot:
The preceding screenshot is the sample data image, which shows the data points on which we will run the xyseries command. The following screenshot shows the output of the search result on the given dataset:
The first screenshot displays the data, which logs the type of transaction and the number of hits with respect to UserID and Time. When the user wants a summary of, for example, all the transactions done by each user on each date in the dataset, the xyseries command can be used. In the xyseries example, the stats command is first used to create a statistical output by calculating the sum of hits by Date, UserID, and Transaction. Then, the eval command concatenates Date and UserID, separated by ##, into a temporary field named temp. The xyseries command then reshapes the result so that each value of temp becomes a row and each Transaction becomes a column holding its Hits, and the fillnull command fills the empty cells with 0. Finally, the rex command expands temp back into its original fields, that is, Date and UserID. Hence, you get the required result (shown in the second screenshot). Thus, the xyseries command can be used to plot data visualizations for multiple data series.
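The reshaping steps of this search can be mirrored in plain Python to make the data flow easier to follow. This is a sketch under assumptions, not Splunk's implementation: the xyseries function below is a hypothetical stand-in, and the sample rows are invented to play the role of the stats output.

```python
def xyseries(rows, x_field, series_field, data_field):
    """Pivot long rows into one row per x value, one column per series."""
    table = {}
    for r in rows:
        out_row = table.setdefault(r[x_field], {x_field: r[x_field]})
        out_row[r[series_field]] = r[data_field]
    return list(table.values())


# Made-up rows standing in for the output of the stats command.
stats = [
    {"Date": "01-Jan", "UserID": "u1", "Transaction": "login", "Hits": 3},
    {"Date": "01-Jan", "UserID": "u1", "Transaction": "purchase", "Hits": 1},
    {"Date": "01-Jan", "UserID": "u2", "Transaction": "login", "Hits": 2},
]

# eval temp=Date+"##"+UserID, then keep only temp, Transaction, and Hits.
keyed = [{"temp": r["Date"] + "##" + r["UserID"],
          "Transaction": r["Transaction"],
          "Hits": r["Hits"]} for r in stats]

wide = xyseries(keyed, "temp", "Transaction", "Hits")

series = sorted({r["Transaction"] for r in stats})
final = []
for r in wide:
    # The rex step: split temp back into Date and UserID.
    date, user = r["temp"].rsplit("##", 1)
    row = {"Date": date, "UserID": user}
    for name in series:
        row[name] = r.get(name, 0)  # the fillnull step: empty cells become 0
    final.append(row)

for row in final:
    print(row)
```

The composite temp key is needed because the pivot groups on a single x field; joining Date and UserID keeps one output row per user per date, and splitting the key afterwards restores the two original columns.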