Chapter 5. Advanced Data Analytics

This chapter covers the advanced data analytics commands used to create reports, detect anomalies, and correlate data. It also covers the commands for prediction, trending, and machine learning on Splunk. Each command is illustrated with examples that show how to run it on Splunk to gain detailed insight into the data.

In this chapter, we will cover the following topics:

  • Reports
  • Geography and location
  • Anomalies
  • Prediction and trending
  • Correlation
  • Machine learning

Reports

You will now learn the reporting commands that format data so that it can be displayed using the various visualizations available on Splunk. Reporting commands are transforming commands: they transform the event data returned by a search into tables that can be used for visualizations.

The makecontinuous command

The Splunk makecontinuous command makes the x-axis field continuous so that it can be plotted. It adds empty buckets for the periods where no data is available. Once the specified field is continuous, the chart, stats, and timechart commands can be used for graphical visualizations.

The syntax for the makecontinuous command is as follows:

… | makecontinuous
    Field_name
    bin_options

The parameter description of the makecontinuous command is as follows:

  • Field_name: The name of the field that is to be plotted on the x axis.
  • Bin_Options: The options for discretization. This is a required parameter and can take the values bins / span / start-end, described as follows:
    • bins: The number of bins into which the data is to be discretized.
    • span: The size of each bin, based on a time span or a log-based span.
    • start-end: The minimum and maximum extents of the numerical bins used for discretization.

Refer to the following example for better clarity:

| inputcsv datanse.csv | eval _time=strptime(date, "%e-%b-%y") | table _time DAX | makecontinuous span=1w DAX

The following screenshot describes the makecontinuous command:

The makecontinuous command

The preceding screenshot shows the output of the search result on the Splunk web console under the Visualization tab. The following screenshot shows the output of the same result under the Statistics tab:

The makecontinuous command

The first screenshot shows the breaks that appear in a line chart when the data is not continuous. The makecontinuous command adds empty bins for the periods when no data is available, thus making the graph continuous. The second screenshot, the tabular view, shows the time field along with the DAX field, which is specified to be made continuous and has values of 0. The span parameter is set to one week (1w), which is the size of each bin created to make the data continuous.
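
The gap-filling idea described above can be sketched outside Splunk. The following Python snippet is only an emulation of the concept, not Splunk's implementation: the make_continuous helper, its fill parameter, and the sample values are hypothetical, and it assumes an integer x-axis field and a fixed span.

```python
def make_continuous(rows, field, span, fill=None):
    """Return rows ordered by `field`, adding a row for every empty bucket.

    Hypothetical helper: emulates the gap-filling idea only, not Splunk itself.
    """
    if not rows:
        return []
    # Snap each value down to the start of its bucket. If two rows share a
    # bucket, the later one wins (a simplification made for this sketch).
    bucketed = {}
    for row in rows:
        start = (row[field] // span) * span
        bucketed[start] = {**row, field: start}
    other_fields = [name for name in rows[0] if name != field]
    result = []
    for start in range(min(bucketed), max(bucketed) + span, span):
        if start in bucketed:
            result.append(bucketed[start])
        else:
            # Bucket with no data: emit a placeholder row.
            result.append({field: start, **{name: fill for name in other_fields}})
    return result


# Made-up sample: one reading per week, with weeks 2 and 3 missing.
rows = [
    {"day": 0, "DAX": 9400.0},
    {"day": 7, "DAX": 9512.5},
    {"day": 28, "DAX": 9388.1},
]
filled = make_continuous(rows, "day", span=7)
for row in filled:
    print(row)
```

With a span of 7, the sketch returns one row per week from day 0 to day 28, and the two weeks without data appear as placeholder rows, which is what removes the breaks from a line chart.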

The addtotals command

The Splunk addtotals command computes the total of all the numeric fields, or of the specified numeric fields, for the events in the result. The totals can be calculated for each row, for each column, or for both.

The syntax for the addtotals command is as follows:

… | addtotals
    row=true / false
    col=true / false
    labelfield=Field_name
    label=Label_name
    fieldname=Field_name
    Field_list

Refer to the following list for the description of the options of the addtotals command:

  • row: The default value for this argument is true, which means that the addtotals command calculates, for each event, the sum of all the numeric fields or of the fields given in Field_list. The result is stored in a new field named Total by default, or in the field named in the fieldname parameter. Since the default value is true, this parameter is only specified when the total of each row is not required. In that case, it is set to false.
  • col: This parameter, if set to true, creates a new event, called the summary event, at the bottom of the list of events. The summary event holds the total of each column in the table. The default value for this parameter is false.
  • labelfield: The name of the field in which the label of the summary event is placed. It is used when the col parameter is set to true. If the field does not already exist in the results, it is added.
  • label: The label text for the summary event, that is, for the row of column totals. It is shown in the field named by labelfield and defaults to Total.
  • fieldname: The name of the field that holds the row total. It applies when row is set to true and defaults to Total.
  • Field_list: The list of fields, delimited by a space, for which the sum is to be calculated. If this list is not specified, the total of all the numeric fields is calculated.

Take a look at the following example of the addtotals command:

|inputcsv datanse.csv | table EM EU | addtotals col=true

The output of the preceding query will be similar to the following screenshot:

The addtotals command

The Splunk addtotals command computed the arithmetic total of the EM and EU fields for each row and stored it in the field named Total. The col parameter is set to true, which means that the total of each column is also calculated and appended to the output.
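
To make the difference between row totals and column totals concrete, here is a minimal Python sketch of the same arithmetic. It is an emulation under stated assumptions, not Splunk's implementation: the add_totals function and the sample values are hypothetical, its parameter names simply mirror the options discussed above, and the summary row in this sketch totals only the original fields.

```python
def add_totals(rows, field_list=None, row=True, col=False,
               fieldname="Total", labelfield=None, label="Total"):
    """Hypothetical emulation of row and column totals over numeric fields."""

    def numeric_fields(r):
        # Use the requested fields, or every field, and keep the numeric ones.
        names = field_list if field_list is not None else list(r)
        return [n for n in names
                if isinstance(r.get(n), (int, float))
                and not isinstance(r.get(n), bool)]

    out = []
    summary = {}
    for r in rows:
        names = numeric_fields(r)
        new_row = dict(r)
        if row:
            # Row total: sum across the fields of a single event.
            new_row[fieldname] = sum(r[n] for n in names)
        for n in names:
            # Column total: running sum of each field across all events.
            summary[n] = summary.get(n, 0) + r[n]
        out.append(new_row)
    if col:
        if labelfield is not None:
            summary[labelfield] = label
        out.append(summary)  # summary event at the bottom
    return out


# Made-up values for the EM and EU fields.
events = [{"EM": 10, "EU": 20}, {"EM": 5, "EU": 15}]
result = add_totals(events, col=True)
labeled = add_totals(events, row=False, col=True, labelfield="Series", label="Sum")
for r in result:
    print(r)
```

In the first call, each event gains a Total field and a summary row is appended. In the second call, row totals are switched off and the summary row is labeled through the labelfield and label arguments.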

The xyseries command

The Splunk xyseries command converts data into a format that is compatible with Splunk visualizations. In other words, tabular data is restructured so that it can be displayed using visualization formats such as line chart, bar graph, area chart, pie chart, scatter chart, and so on. This command is very useful for formatting data to build visualizations of multiple data series.

Refer to the following query block for the syntax:

… | xyseries
    grouped=true / false
    x_axis_fieldname
    y_axis_fieldname
    y_axis_data_fieldname

The description of the parameters of the preceding query is as follows:

  • grouped: This parameter, if set to true, indicates that the input is sorted by the value of x_axis_fieldname and allows multifile input
  • x_axis_fieldname: The fieldname that is to be set as x axis in the output
  • y_axis_fieldname: The fieldname that is to be used as a label for the data series
  • y_axis_data_fieldname: The field or list of fields containing the data to be plotted

Refer to the following example for better clarity:

| inputcsv abc.csv | stats sum(Hits) AS Hits by Date UserID Transaction | eval temp=Date+"##"+UserID | table temp Transaction Hits | xyseries temp Transaction Hits | fillnull | rex field=temp "(?<Date>.*)##(?<UserID>.*)" | fields - temp | table Date, UserID *

The following screenshot shows the sample data on which the preceding query is run:

The xyseries command

The preceding screenshot shows the sample data, that is, the data points on which we will run the xyseries command. The following screenshot shows the output of the search result on the given dataset:

The xyseries command

The first screenshot displays the data, which logs the type of transaction and the number of hits with respect to UserID and Date. When the user wants a summary of, for example, all the transactions done by each user on each date in the dataset, the xyseries command can be used. In the xyseries example, the stats command is first used to calculate the sum of hits by Date, UserID, and Transaction. Then, the eval command combines Date and UserID into a temporary field named temp, because xyseries accepts only one field for the x axis. The xyseries command then creates one column per transaction type, fillnull fills the empty cells, and the rex command splits temp back into its original fields, that is, Date and UserID. Hence, you get the result as required (shown in the second screenshot). Thus, the xyseries command can be used to plot data visualizations for multiple data series.
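
The reshaping performed by this query can be sketched in Python to make each step easier to follow. This is an illustrative emulation rather than Splunk's implementation: the xyseries function below is a hypothetical stand-in, the sample rows are made up, and filling the empty cells with 0 is an assumption standing in for the fillnull step.

```python
def xyseries(rows, x_field, y_name_field, y_data_field, fill=0):
    """Pivot long rows into one row per x value and one column per series."""
    series = sorted({r[y_name_field] for r in rows})
    table = {}
    for r in rows:
        # One output row per distinct x value, in order of first appearance.
        cells = table.setdefault(r[x_field], {})
        cells[r[y_name_field]] = r[y_data_field]
    result = []
    for x_value, cells in table.items():
        out = {x_field: x_value}
        for name in series:
            out[name] = cells.get(name, fill)  # stand-in for fillnull
        result.append(out)
    return result


# Made-up output of the stats step: one row per Date, UserID, Transaction.
stats = [
    {"Date": "01-Jan", "UserID": "u1", "Transaction": "login", "Hits": 3},
    {"Date": "01-Jan", "UserID": "u1", "Transaction": "purchase", "Hits": 1},
    {"Date": "01-Jan", "UserID": "u2", "Transaction": "login", "Hits": 2},
]

# eval step: combine the two key fields into a single temporary field.
for r in stats:
    r["temp"] = r["Date"] + "##" + r["UserID"]

pivoted = xyseries(stats, "temp", "Transaction", "Hits")

# rex step: split the temporary field at the last "##", as the greedy
# pattern in the query does, and drop it.
summary = []
for r in pivoted:
    date, user = r.pop("temp").rsplit("##", 1)
    summary.append({"Date": date, "UserID": user, **r})

for r in summary:
    print(r)
```

The composite temp key is what lets two grouping fields pass through a pivot that accepts only one x-axis field. The separator must be a string that never occurs in the data.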
