Chapter 3. Exploring the Visualize Page

The Visualize page is one of the most important pages in Kibana 4: it helps you visualize the data you analyzed using the Discover page. On this page you can create the different types of visualization required for the data present in Elasticsearch. It is crucial from a business perspective, because visualizations provide a simple, easy way to understand the stored data. Using it, you can create different types of data visualization, save them, and use them individually or combine several visualizations into a dashboard. This chapter gives you a full overview of the different types of visualization provided, how to create a new visualization from a new or saved search, and how to design visualizations as per your requirements.

The Kibana Visualize page is where you can create, modify, and view your own custom visualizations. There are several different types of visualization, including Vertical Bar Chart, Area Chart, Line Chart, Pie Chart, Tile Map (for displaying data on a map), and Data Table. Visualizations can also be shared with other users who have access to your Kibana instance.

Visualizations are the core component that makes Kibana such rich and useful software. They rely on the underlying aggregation framework of Elasticsearch to aggregate and summarize data. For a better understanding, let's first explore the basic usage of the aggregations used in Elasticsearch.

In this chapter, we are going to have a look at the following topics:

  • Basic concepts of bucket aggregations
  • Basic concepts of metric aggregations
  • Steps for designing visualization
  • Creating various visualizations

Understanding aggregations

Aggregations are collections of data that are grouped into buckets. They grew out of the facets module of Elasticsearch and allow fast querying and easy summarization of data. Aggregations are used for building analytical information over the stored documents and for real-time data analysis. Each type of aggregation serves a specific purpose and produces a specific output, and they can be classified into the following categories.

Bucket aggregations

In this type of aggregation, buckets are created to group the stored documents; every bucket is associated with a key and a document criterion. The decision about which bucket a document belongs to can be based either on the value of a specific field or on any other parameter. Whenever an aggregation runs, the criterion of every bucket is evaluated against each document to decide which bucket the document fits into. This process continues until all documents have been segregated into buckets according to the matching criteria, so at the end of the process every document has been placed into one of the buckets created.

Every bucket has a criterion that decides whether a document fits into it or not. Bucket aggregations also always compute and return the total number of documents that fall into each bucket. The different bucket aggregators in Kibana 4 follow different bucketing strategies: some define a single bucket, some define multiple buckets, and some create buckets dynamically during the aggregation process. Bucket aggregations are very powerful because they can be combined with other types of aggregation as sub-aggregations. In a sub-aggregation, the aggregation is computed for each bucket generated by the parent aggregation. The different types of bucket aggregation are as follows.

Date histogram

This aggregation runs on date/time values, which Kibana automatically extracts from the documents. Kibana detects the date type fields, and you specify an interval, such as 5 minutes, 30 minutes, and so on. Each bucket then collects all the documents whose value for the date field falls within the same interval.

The interval expressions available in Kibana are year, quarter, month, week, day, hour, minute, second, and auto, out of which only day, hour, minute, and second are allowed to contain fractional values. With the auto interval, Kibana automatically chooses the time interval on which the graphs are based, so that a reasonable number of buckets is created.

For example, a date histogram can be used on a field containing date/time values with an interval of one hour. A bucket is created for every hour, and each bucket stores the documents that fall within that hour; if a document was created in the 5th hour, it fits into the bucket that contains only documents created in the 5th hour.
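
Under the hood, Kibana issues an Elasticsearch date_histogram aggregation. The following minimal sketch shows an equivalent request using the official elasticsearch Python client; the localhost address, the twitter index name, and the created_at field are assumptions made for illustration.

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")  # assumed local instance

    resp = es.search(index="twitter", body={     # "twitter" index is assumed
        "size": 0,  # return only aggregation results, not raw hits
        "aggs": {
            "tweets_per_hour": {
                # one bucket per hour of the created_at field; newer
                # Elasticsearch versions use calendar_interval instead
                "date_histogram": {"field": "created_at", "interval": "hour"}
            }
        }
    })
    for bucket in resp["aggregations"]["tweets_per_hour"]["buckets"]:
        print(bucket["key_as_string"], bucket["doc_count"])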

Histogram

This aggregation runs on a numeric field, which Kibana automatically reads and extracts from the documents. It creates buckets dynamically based on the interval specified, and you can define any numeric value as the interval. Each bucket then collects all the documents whose value for the numeric field falls within the same interval.

Note

Histogram is similar to date histogram aggregation except that date histogram is used for a date/time field, whereas histogram is used for a numeric value field.

For example, if the documents contain a numeric field (quantity) holding values from 1 to 100, dynamic buckets are created by specifying an interval of 10. When the aggregation takes place, the quantity field of each document is computed and rounded down to the nearest bucket key, meaning that if the quantity is 52 and the bucket interval is 10, it is rounded down to 50 and the document therefore fits into the bucket associated with the key 50.
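
A sketch of the equivalent histogram request, under the same assumptions as the earlier sketch (a local Elasticsearch instance and a twitter index containing the quantity field from the example above):

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    resp = es.search(index="twitter", body={
        "size": 0,
        "aggs": {
            "quantity_buckets": {
                # bucket keys are 0, 10, 20, ...; a quantity of 52 is
                # rounded down and lands in the bucket with key 50
                "histogram": {"field": "quantity", "interval": 10}
            }
        }
    })
    for bucket in resp["aggregations"]["quantity_buckets"]["buckets"]:
        print(bucket["key"], bucket["doc_count"])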

Range

This aggregation lets you specify a set of ranges manually, where each range represents a bucket. It is used for aggregating on numeric or date/time fields and is similar to a manual histogram or date histogram aggregation. Because the range sizes are specified by hand, it helps you analyze a subset of the complete data. Each range consists of a from value and a to value.

For example, the documents contain a numeric field (user.statuses_count) for which ranges such as 1,000-3,000, 3,000-5,000, and 5,000-10,000 are specified. When the aggregation takes place, the value extracted from every document is checked against every bucket range, and the document fits into the matching bucket. This produces three buckets containing the documents of users who have posted statuses within the aforementioned ranges of 1,000-3,000, 3,000-5,000, and 5,000-10,000. This is very useful for clustering data during analysis, such as clustering users who tweet frequently or users who are popular.

Note

This aggregation includes the from value in the bucket, but excludes the to value for range.

Also, an upper or lower boundary can be omitted to create an open range, such as 10,000-*, whose bucket will contain all the documents of users who have posted statuses more than 10,000 times.
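
The same example, including the open range, expressed as a raw range aggregation might look as follows; a sketch under the same assumptions as the earlier sketches (local Elasticsearch, a twitter index):

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    resp = es.search(index="twitter", body={
        "size": 0,
        "aggs": {
            "status_ranges": {
                "range": {
                    "field": "user.statuses_count",
                    "ranges": [
                        {"from": 1000, "to": 3000},   # from is included,
                        {"from": 3000, "to": 5000},   # to is excluded
                        {"from": 5000, "to": 10000},
                        {"from": 10000}               # open range: 10,000 and up
                    ]
                }
            }
        }
    })
    for bucket in resp["aggregations"]["status_ranges"]["buckets"]:
        print(bucket["key"], bucket["doc_count"])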

Date range

This aggregation lets you specify ranges in date format, where each range represents a bucket. It is used for aggregating on date/time fields. The range sizes are specified manually, which helps you analyze a subset of the complete data. Each range consists of a from value and a to value.

For example, the documents contain a date field (created_at) for which ranges such as from now-2M/M to now-1M/M and from now-1M/M to now are specified (the /M suffix rounds the date down to the start of the month). When the aggregation takes place, the value extracted from every document is checked against every bucket range, and the document fits into the matching bucket. This produces two buckets: bucket 1 contains the documents from two months ago up to one month ago, and bucket 2 contains the documents from one month ago up to the current date.
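
A sketch of the equivalent date_range request, under the same assumptions as the earlier sketches:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    resp = es.search(index="twitter", body={
        "size": 0,
        "aggs": {
            "recent_months": {
                "date_range": {
                    "field": "created_at",
                    "ranges": [
                        # /M rounds down to the start of the month
                        {"from": "now-2M/M", "to": "now-1M/M"},
                        {"from": "now-1M/M", "to": "now"}
                    ]
                }
            }
        }
    })
    for bucket in resp["aggregations"]["recent_months"]["buckets"]:
        print(bucket["key"], bucket["doc_count"])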

IPv4 range

This aggregation lets you specify ranges in IP address format, where each range represents a bucket. Each range consists of a from value and a to value.

For example, the documents contain an IP field (host_address) for which ranges such as from 192.168.1.1 to 192.168.1.100 and from 192.168.1.100 to 192.168.1.150 are specified. When the aggregation takes place, the value extracted from every document is checked against every bucket range, and the document fits into the matching bucket.
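
A sketch of the equivalent ip_range request; here the logs index name is an assumption for illustration, alongside the usual local-instance assumption:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    resp = es.search(index="logs", body={   # "logs" index is assumed
        "size": 0,
        "aggs": {
            "host_buckets": {
                "ip_range": {
                    "field": "host_address",
                    "ranges": [
                        {"from": "192.168.1.1",   "to": "192.168.1.100"},
                        {"from": "192.168.1.100", "to": "192.168.1.150"}
                    ]
                }
            }
        }
    })
    for bucket in resp["aggregations"]["host_buckets"]["buckets"]:
        print(bucket["key"], bucket["doc_count"])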

Terms

This aggregation creates buckets dynamically based on the values of a field, much like the GROUP BY statement in SQL. You specify a field, a bucket is created for each distinct value that exists in that field, and every document holding that value is placed into the corresponding bucket.

For example, use a terms aggregation on the user.languages field, which holds the languages in which users tweet. It creates a bucket for each language (en, jp, ru, and so on), and each bucket contains all the documents tweeted in that specific language; the en bucket will contain all the documents tweeted in English, and so on.
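
A sketch of the equivalent terms request, under the same assumptions as the earlier sketches:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    resp = es.search(index="twitter", body={
        "size": 0,
        "aggs": {
            "by_language": {
                # one bucket per distinct field value, like GROUP BY in SQL
                "terms": {"field": "user.languages", "size": 10}
            }
        }
    })
    for bucket in resp["aggregations"]["by_language"]["buckets"]:
        print(bucket["key"], bucket["doc_count"])   # e.g. en 5123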

Filters

Filters are written exactly like the queries covered in the Using the Search Bar section of Chapter 2, Exploring the Discover Page. This is a flexible yet powerful aggregation that helps you create visualizations based on search queries: you specify a filter for each bucket, and the documents matching a filter fit into that bucket.

For example, use a filters aggregation with the query user.languages: (en OR jp), which creates a bucket into which all the documents containing tweets in English or Japanese fit. If we add another filter query, user.statuses_count:[5000 TO *], there will be two buckets: one containing the documents of tweets in English or Japanese, and another containing the documents of users who have posted statuses more than 5,000 times.
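
As a sketch, the two named filters become two buckets in a filters aggregation (same assumptions as the earlier sketches; the bucket names en_or_jp and prolific are made up for illustration):

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    resp = es.search(index="twitter", body={
        "size": 0,
        "aggs": {
            "tweet_filters": {
                "filters": {
                    "filters": {
                        # each named filter becomes one bucket
                        "en_or_jp": {"query_string": {
                            "query": "user.languages: (en OR jp)"}},
                        "prolific": {"query_string": {
                            "query": "user.statuses_count:[5000 TO *]"}}
                    }
                }
            }
        }
    })
    for name, bucket in resp["aggregations"]["tweet_filters"]["buckets"].items():
        print(name, bucket["doc_count"])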

Note

Filters aggregation is slower in execution than other aggregations.

Significant terms

This aggregation is used to find the uncommonly common terms in your data. It uses a foreground set and a background set: the foreground set contains the search results matched by a query (filter), while the background set contains all the data in the index or indices. A term is significant when its frequency changes measurably between the foreground set and the background set, which makes this aggregation useful for carving out subsets of the data and analyzing uncommon behaviors or scenarios.

If a term exists in 10 out of 10,000 indexed documents, but appears in 8 of the 50 documents returned by the search query, then that term is significant.

The foreground set can be constructed either by using a query (filter) or by using any other bucket aggregation first on all documents and then choosing significant terms as a sub-aggregation. The size property specifies how many buckets are constructed, that is, how many significant terms are calculated.

For example, use a filter aggregation with the query user.location: India and select significant terms on a language field as the bucket aggregation, specifying a size of 5. It will return the top five significant terms for the search query, such as en, hi, and so on.

Let's understand how these results were obtained. The search query user.location: India returned 270 documents. Within those, the significant terms aggregation counted 22 documents with the language hi. Searching for the language hi across all documents gives a count of 160 out of a total of 92,004 documents. Therefore, 22/270 (8.15%) compared to 160/92,004 (0.17%) is a significant difference, which tells us how much more common hi is within the user.location: India search results than across all the documents.
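
A sketch of the equivalent significant_terms request, under the same assumptions as the earlier sketches; the response exposes both the foreground and background counts used in the calculation above:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    resp = es.search(index="twitter", body={
        "size": 0,
        # the query defines the foreground set
        "query": {"query_string": {"query": "user.location: India"}},
        "aggs": {
            "significant_languages": {
                # the whole index acts as the background set
                "significant_terms": {"field": "user.languages", "size": 5}
            }
        }
    })
    for bucket in resp["aggregations"]["significant_languages"]["buckets"]:
        # doc_count is the foreground count, bg_count the background count
        print(bucket["key"], bucket["doc_count"], bucket["bg_count"])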

Note

Significant terms is used to detect outliers and find anomalies. Typical use cases are finding trending topics on Twitter (country-wise), detecting credit card fraud, and powering recommendation engines.

GeoHash

This aggregation groups geo_point values into buckets, where each bucket represents a cell of a grid laid over the map. The buckets are created dynamically. You specify a geo_point field, which Kibana reads automatically, along with a precision. The lower the precision, the larger the area covered by each bucket.

For example, use a GeoHash aggregation on a location field to create buckets containing tweets from users who are geographically close to each other.

Note

GeoHash is used with the Tile Map visualizations, which help to easily visualize the data on a map.
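
Under the hood this is Elasticsearch's geohash_grid aggregation. A minimal sketch under the same assumptions as the earlier sketches, where location is an assumed geo_point field name:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    resp = es.search(index="twitter", body={
        "size": 0,
        "aggs": {
            "tweet_cells": {
                # "location" is an assumed geo_point field; precision
                # ranges from 1 (large cells) to 12 (very small cells)
                "geohash_grid": {"field": "location", "precision": 4}
            }
        }
    })
    for bucket in resp["aggregations"]["tweet_cells"]["buckets"]:
        print(bucket["key"], bucket["doc_count"])   # e.g. tsz4 87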

Metric aggregations

Metric aggregations compute metrics over a set of documents. They are used after a bucket aggregation has grouped the documents into buckets: the metric aggregation then runs on each bucket and returns a single value per bucket.

In visualizations, the bucket aggregation determines the first dimension of the chart, and the value calculated by the metric aggregation forms the second dimension.

Note

Metric aggregations always run on buckets, so a visualization that uses a metric aggregation always contains a bucket aggregation as well.

The different types of metric aggregations are as follows.

Count

This aggregation returns, as its value, the number of documents contained within every bucket. It does not require a specific field, because it simply counts the documents that fell into each bucket.

For example, to find out how many tweets exist in each language, use a terms aggregation on the user.languages field, which creates one bucket per language. Then use a count metric aggregation, which displays the number of tweets for each language bucket.
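
In the raw Elasticsearch response, this count is simply each bucket's doc_count; no separate metric clause is needed. A sketch under the same assumptions as the earlier sketches:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    resp = es.search(index="twitter", body={
        "size": 0,
        "aggs": {"by_language": {"terms": {"field": "user.languages"}}}
    })
    for bucket in resp["aggregations"]["by_language"]["buckets"]:
        # doc_count is what Kibana's Count metric displays per bucket
        print(bucket["key"], bucket["doc_count"])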

Sum

This aggregation is used to calculate the sum of a numeric field stored in every bucket. The result for every bucket will be the sum of all the values in that field.

Average

This aggregation is used to calculate the average value of a numeric field stored in every bucket. The result for every bucket will be the average of all the values in that field.

For example, to find out the average number of statuses of Twitter users per language, use a terms aggregation on the user.languages field. Then use an average metric aggregation on the user.statuses_count field, which displays the average number of statuses for each language bucket.
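
As a sketch, the average becomes a sub-aggregation nested inside the terms bucket aggregation (same assumptions as the earlier sketches):

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    resp = es.search(index="twitter", body={
        "size": 0,
        "aggs": {
            "by_language": {
                "terms": {"field": "user.languages"},
                "aggs": {
                    # computed once per language bucket
                    "avg_statuses": {"avg": {"field": "user.statuses_count"}}
                }
            }
        }
    })
    for bucket in resp["aggregations"]["by_language"]["buckets"]:
        print(bucket["key"], bucket["avg_statuses"]["value"])

Sum, min, and max work identically, with "sum", "min", or "max" in place of "avg"; the retweet example later in this section maps to a sub-aggregation of "max" on the retweet.retweet_count field.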

Min

This aggregation is used to calculate the minimum value of a numeric field stored in every bucket. The result for every bucket will be the minimum value found for that field among the documents stored in it.

Max

This aggregation is used to calculate the maximum value of a numeric field stored in every bucket. The result for every bucket will be the maximum value found for that field among the documents stored in it.

For example, to find out the maximum number of retweets in each language, use a terms aggregation on the user.languages field, which creates one bucket per language. Then use a max metric aggregation on the retweet.retweet_count field, which displays the maximum number of retweets for each language bucket.

Unique count

This aggregation counts the number of unique values that exist for a field within every bucket. The result for every bucket will be the total number of unique values found for that field among the documents stored in it.

For example, the documents contain a numeric field (user.statuses_count) that is split into range buckets such as 1,000-3,000, 3,000-5,000, and 5,000-10,000. A unique count metric aggregation on the user.languages field then displays, for each status-count range, the number of different languages used for posting statuses.
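
Kibana's unique count corresponds to Elasticsearch's cardinality aggregation. A sketch nesting it under the range buckets from the example (same assumptions as the earlier sketches):

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    resp = es.search(index="twitter", body={
        "size": 0,
        "aggs": {
            "status_ranges": {
                "range": {
                    "field": "user.statuses_count",
                    "ranges": [{"from": 1000, "to": 3000},
                               {"from": 3000, "to": 5000},
                               {"from": 5000, "to": 10000}]
                },
                "aggs": {
                    # distinct languages per status-count range
                    "languages_used": {"cardinality": {"field": "user.languages"}}
                }
            }
        }
    })
    for bucket in resp["aggregations"]["status_ranges"]["buckets"]:
        print(bucket["key"], bucket["languages_used"]["value"])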

Percentile

This aggregation calculates percentiles over a numeric field stored in the buckets. It differs from the other metric aggregations in that it returns multiple values per bucket, so it falls into the category of multi-value metric aggregations. When specifying this aggregation, you provide a numeric field along with multiple percentage values. For each specified percentage, the result is the value below which that percentage of documents falls.

For example, use a percentiles aggregation on the user.statuses_count field and specify the percentile values 5, 50, 75, and 95. This results in four aggregated values for every bucket. Suppose we have a single bucket and the 5th percentile result is 24: this means that 5% of all the tweets in this bucket have a user status count of 24 or below. A 50th percentile result of 175 means that 50% of all the tweets in this bucket have a user status count of 175 or below; a 75th percentile result of 845 means that 75% have a user status count of 845 or below; and a 95th percentile result of 18,500 means that 95% have a user status count of 18,500 or below.
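
A sketch of the same request in the aggregation DSL (same assumptions as the earlier sketches); the response keys are the requested percentages:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    resp = es.search(index="twitter", body={
        "size": 0,
        "aggs": {
            "status_percentiles": {
                "percentiles": {
                    "field": "user.statuses_count",
                    "percents": [5, 50, 75, 95]
                }
            }
        }
    })
    # with the figures above, this would print something like
    # {'5.0': 24.0, '50.0': 175.0, '75.0': 845.0, '95.0': 18500.0}
    print(resp["aggregations"]["status_percentiles"]["values"])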

Note

Both unique count and percentile are approximate calculations. They sacrifice accuracy for speed.

Percentile ranks

This aggregation calculates single or multiple percentile ranks over a numeric field extracted from the documents and stored in buckets. It falls into the category of multi-value metric aggregations and is used to display the percentage of observed values that lie below a certain specific value. If a value is greater than or equal to 75% of the observed values, it is said to be at the 75th percentile rank.
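
Percentile ranks invert the previous question: instead of asking for the values at given percentages, you supply values and get back percentages. A sketch under the same assumptions as the earlier sketches, where the thresholds 1,000 and 10,000 are arbitrary illustrative values:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    resp = es.search(index="twitter", body={
        "size": 0,
        "aggs": {
            "status_ranks": {
                "percentile_ranks": {
                    "field": "user.statuses_count",
                    "values": [1000, 10000]   # assumed example thresholds
                }
            }
        }
    })
    # prints, for each threshold, the percentage of documents at or below it
    print(resp["aggregations"]["status_ranks"]["values"])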

Now that we have understood all the aggregations provided in Kibana, let's see how to use these aggregations in visualizations.
