Big data analytics is a fast-growing trend, and business users increasingly want to explore their big data with intuitive, user-friendly tools, because querying data stored in Hadoop or NoSQL data stores directly is a challenging task. Fortunately, Hunk removes much of this complexity for analysts and business users, and it offers features that let us handle big data in just a few mouse clicks. This is possible with Hunk knowledge objects.
In the previous chapter, we created virtual indexes based on web logs for the international fashion retailer Unicorn Fashion. We created queries and reports using the Search Processing Language (SPL). Moreover, we created a web operations dashboard and learnt how to create alerts.
In this chapter, we will explore Hunk knowledge objects, which will help us achieve better results with less effort. Moreover, we will become familiar with data models and pivots, learning to work with Hunk much as we would with a traditional Business Intelligence (BI) tool.
Hunk shares Splunk's core capabilities; as a result, we can create various knowledge objects that help us explore big data and make it more accessible.
To work with knowledge objects, go to the KNOWLEDGE menu under Settings:
There are various knowledge objects available in Hunk. We encountered SPL, reports, dashboards, and alerts in the previous chapter. Let's expand our knowledge of Hunk and explore additional knowledge objects.
For more information about knowledge objects, see: http://docs.splunk.com/Documentation/Splunk/latest/Knowledge/WhatisSplunkknowledge.
Field aliases help us normalize data across several sources. We can create multiple aliases for one field.
Let's create a new alias using the following steps:
1. Enter the Name as Web Browser, type the sourcetype as access_combined, and create this alias under Field aliases: useragent = web_browser. Click Save, as shown in the following screenshot.
2. Run the search index="digital_analytics" and check the field list for the new field web_browser.

Moreover, we can create the same alias web_browser for any other data source. For example, we could have other logs where, instead of useragent, we simply have agent. In this case, we can create a new alias that maps agent to web_browser. As a result, we have one alias for two different fields from various data sources.
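To make the idea concrete, here is a minimal Python sketch of what an alias accomplishes conceptually (this is an illustration, not Hunk's internals): two sourcetypes with differently named fields are normalized onto one common field name. The second sourcetype name and the sample events are made up for the example.

```python
# Illustrative sketch: field aliases map differently named raw fields
# from several sourcetypes onto one normalized field name.
ALIASES = {
    "access_combined": {"useragent": "web_browser"},
    "other_logs": {"agent": "web_browser"},  # hypothetical second sourcetype
}

def apply_aliases(sourcetype, event):
    """Return a copy of the event with aliased field names added."""
    aliased = dict(event)
    for src, dst in ALIASES.get(sourcetype, {}).items():
        if src in event:
            aliased[dst] = event[src]
    return aliased

e1 = apply_aliases("access_combined", {"useragent": "Mozilla/5.0"})
e2 = apply_aliases("other_logs", {"agent": "Chrome/43.0"})
# Both events now expose the same normalized field:
print(e1["web_browser"], e2["web_browser"])
```

Note that the original field names are kept alongside the alias, which mirrors how an alias adds a field rather than renaming the source data.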
A calculated field acts as a shortcut for performing repetitive, long, or complex transformations using the eval
command.
For example, say we want to monitor bandwidth usage in megabytes but we have all our data in bytes. Let's create a new field to convert bytes to megabytes:
1. Set the sourcetype to access_combined, the Name to bandwidth, and the Eval expression to bytes/1024/1024. Click on Save.
2. Run the following search:

index="digital_analytics" | iplocation clientip | stats sum(bandwidth) by Country | sort - sum(bandwidth)
As a result, we get the top countries by bandwidth in megabytes, using the new calculated field in the search just like any other extracted field.
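The arithmetic behind this search can be sketched in a few lines of Python: the calculated field applies bytes/1024/1024 to each event, and the stats/sort pipeline sums per country and orders descending. The sample events below are made up for the illustration.

```python
# Illustrative sketch of the calculated field plus the stats search:
# bandwidth = bytes/1024/1024, summed by country, sorted descending.
events = [
    {"country": "USA", "bytes": 3145728},  # 3 MB
    {"country": "UK", "bytes": 1048576},   # 1 MB
    {"country": "USA", "bytes": 2097152},  # 2 MB
]

def bandwidth_mb(event):
    # Same arithmetic as the Eval expression: bytes/1024/1024
    return event["bytes"] / 1024 / 1024

totals = {}
for e in events:
    totals[e["country"]] = totals.get(e["country"], 0) + bandwidth_mb(e)

top = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
print(top)  # [('USA', 5.0), ('UK', 1.0)]
```

The key point is that the conversion happens once, in the calculated field definition, rather than being repeated in every search.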
Field extraction is a special utility that helps us create custom fields. It generates a regular expression that pulls those fields from similar events. Using the Interactive Field Extractor (IFX), we can extract fields that are static and often needed in searches.
Let's try to extract new fields from our digital data set:

1. Run the search: index="digital_analytics"
2. Extract the new field browser_name using the IFX.
3. Validate the new field: index="digital_analytics" | stats count by browser_name
Moreover, there is another way to extract fields during a search: the rex and erex commands.
You can learn more about the rex and erex commands, with examples, at: http://docs.splunk.com/Documentation/Splunk/6.2.2/SearchReference/Erex and http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Rex.
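Under the hood, both IFX and the rex command boil down to a regular expression with named capture groups applied to the raw event. The sketch below shows the idea in Python; the log-line format here is a made-up example, not the actual access_combined layout.

```python
import re

# Illustrative sketch of a rex-style extraction: pull a named field out
# of a raw event with a regular expression and a named capture group.
line = 'GET /checkout HTTP/1.1" 200 useragent="Mozilla/5.0"'

match = re.search(r'useragent="(?P<web_browser>[^"]+)"', line)
if match:
    print(match.group("web_browser"))  # Mozilla/5.0
```

IFX generates a pattern like this for you from sample events; rex lets you supply the pattern yourself at search time, while erex infers one from example values.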
Tags are like nicknames that you create for related field/value pairs. They can make our data more understandable and less ambiguous. It is possible to create several tags for any field/value combination.
Let's create tags for our data set:
index="digital_analytics"
index="digital_analytics" tag="Checkout"
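Conceptually, a tag is a label attached to a field/value pair, and searching tag="Checkout" matches any event containing a pair that carries that label. Here is a minimal Python sketch of that lookup (an illustration only; the tagged pairs below are hypothetical examples, not values from the data set).

```python
# Illustrative sketch: tags are nicknames attached to field/value pairs;
# a search on a tag matches every pair carrying that label.
TAGS = {
    ("uri_path", "/checkout"): ["Checkout"],        # hypothetical pair
    ("action", "purchase"): ["Checkout", "Sales"],  # hypothetical pair
}

def event_tags(event):
    """Collect all tags whose field/value pair occurs in the event."""
    found = set()
    for (field, value), tags in TAGS.items():
        if event.get(field) == value:
            found.update(tags)
    return found

print(event_tags({"uri_path": "/checkout", "action": "purchase"}))
```

Because several pairs can share one tag, and one pair can carry several tags, tags give us a many-to-many layer of business vocabulary over raw field values.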
An event type is a method of categorizing events based on a search; in other words, we can create a group of events based on common values. Let's look at the following example in order to better understand how this works.
For example, say the sales team wants to track monthly online sales. They want to easily identify purchases that are categorized by item. Let's create a new event type for coats:
1. Run the following search:

index="digital_analytics" action=purchase productName=COATS

2. Save it as an event type with the Name Purchase Coats. In addition, we can create a new tag and choose the color and priority. Then click on Save.
3. Run the search index="digital_analytics" action=purchase and check the eventtype field. As a result, we can group our events into custom groups using tags and event types.

Workflow actions launch from fields and events in our search results in order to interact with external resources or narrow our search. The possible actions are GET, POST, and search workflow actions.
For example, organizations often need to track ongoing attempts by external sources trying to log in with invalid credentials. We can use a GET workflow action that will open a new browser window with information about the source IP address.
For more information about workflow actions in the Splunk knowledgebase with detailed explanations and examples, see: http://docs.splunk.com/Documentation/Splunk/latest/Knowledge/CreateworkflowactionsinSplunkWeb.
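The mechanics of a GET workflow action are simple: Hunk substitutes a field value from the selected event into a URL template and opens the result in a new browser window. The sketch below illustrates that substitution in Python; the lookup site is a hypothetical placeholder, not a real service.

```python
from urllib.parse import quote

# Illustrative sketch of a GET workflow action: a field value (here the
# source IP) is substituted into a URL template before the browser opens it.
URL_TEMPLATE = "http://example-ip-lookup.test/search?q=$clientip$"

def build_workflow_url(template, event):
    """Replace each $field$ placeholder with the URL-encoded event value."""
    url = template
    for field, value in event.items():
        url = url.replace(f"${field}$", quote(str(value)))
    return url

print(build_workflow_url(URL_TEMPLATE, {"clientip": "203.0.113.7"}))
# http://example-ip-lookup.test/search?q=203.0.113.7
```

For the failed-login scenario above, the template would point at an IP reputation or whois-style service and the clientip field of the suspicious event would fill the placeholder.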
Macros are useful when we frequently run searches with similar syntax. A macro can be a full search string or a portion of a search that can be reused in multiple places. In addition, macros allow us to define one or more arguments within the search segment.
Let's create macros with an argument:
1. Create a new macro with the Name activitybycategory(2).
2. Set the Definition as:

index="digital_analytics" action=$action1$ AND productName=$Name1$ | stats count by productName

3. Set the Arguments as action1, Name1.
4. Run the macro in a search (macros are invoked between backticks): `activitybycategory(purchase,COATS)`. We should get the following expanded search:

index="digital_analytics" action=purchase AND productName=COATS | stats count by productName
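The expansion step can be sketched in Python: each $name$ placeholder in the definition is replaced by the corresponding argument, producing the final search string. This is an illustration of the substitution, not Hunk's actual parser.

```python
# Illustrative sketch of macro expansion: substitute each $name$
# placeholder in the definition with the matching argument value.
DEFINITION = ('index="digital_analytics" action=$action1$ '
              "AND productName=$Name1$ | stats count by productName")
ARG_NAMES = ["action1", "Name1"]

def expand_macro(definition, arg_names, args):
    search = definition
    for name, value in zip(arg_names, args):
        search = search.replace(f"${name}$", value)
    return search

print(expand_macro(DEFINITION, ARG_NAMES, ["purchase", "COATS"]))
```

Running it with the arguments purchase and COATS yields exactly the expanded search shown above.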
A data model is a hierarchically structured data set that generates searches and drives a pivot. (A pivot is an interface in which we can create reports based on data models. Soon we will explore pivots more closely.) In other words, data models provide a more meaningful representation of underlying raw machine data.
Data models are designed to make it easy to share and reuse domain knowledge. The idea is that admins or power users create data models for non-technical users, who interact with the data via a user-friendly pivot UI.
Let's create a data model for our digital data set.
1. Enter the Title as Unicorn Fashion Digital Analytics and click on Create. A new data model will be created.
2. Add a root event with the Name Digital Data and Constraints as index=digital_analytics sourcetype=access_combined. Click on Save.

We successfully added a root event, and now we can add fields that Hunk can extract automatically. Let's do it:
There are four types of attribute in Hunk:
Moreover, there are also four types of attribute flag:
In order to add GeoIP attributes we should have a latitude and longitude lookup table or GeoIP mapping fields.
Let's add GeoIP attributes:

1. Map the corresponding fields as longitude and latitude respectively. Click on Save.

As a result, new fields will be added to our data model.
Hunk offers us other methods of adding attributes, such as:

- An eval expression
- A lookup
- A regular expression

For more information about the eval expression, see: http://docs.splunk.com/Documentation/Splunk/latest/Knowledge/Addanevalexpressionattribute.
For more information about lookup in data models, see: http://docs.splunk.com/Documentation/Splunk/latest/Knowledge/Addalookupattribute.
For more information about regular expressions in data models, see: http://docs.splunk.com/Documentation/Splunk/latest/Knowledge/Addaregularexpressionattribute.