The Visualize page is one of the most important pages in Kibana 4. It helps you visualize the data you analyzed on the Discover page, and it is where you create the different types of visualization for the data stored in Elasticsearch. From a business perspective, this page is crucial: visualizations provide a simple, intuitive way to understand the stored data. Using it, you can create different types of data visualization, save them, and use them individually or combine them into a dashboard. This chapter gives you an overview of the different visualization types provided, how to create a new visualization from a new or saved search, and how to design visualizations to match your requirements.
The Kibana Visualize page is where you can create, modify, and view your own custom visualizations. There are several different types of visualization, including Vertical Bar Chart, Area Chart, Line Chart, Pie Chart, Tile Map (for displaying data on a map), and Data Table. Visualizations can also be shared with other users who have access to your Kibana instance.
Visualizations are the core component that makes Kibana such rich and useful software. They rely on the underlying aggregation capabilities of Elasticsearch to summarize and visualize data. For a better understanding, let's first explore the basics of the aggregations used in Elasticsearch.
In this chapter, we will look at the different types of aggregation provided by Elasticsearch and how to use them to build visualizations in Kibana.
Aggregations collect data into buckets. They grew out of the facets module of Elasticsearch, which allowed fast querying and easy summarization of data. Aggregations build analytical information over the stored documents and are used for real-time data analysis. There are different types of aggregation, each with a specific purpose and output; they can be classified into the following categories.
In this type of aggregation, buckets are created to group the stored documents; every bucket is associated with a key and a document criterion. The decision about which bucket a document falls into can be based either on the value of a specific field or on some other parameter. When the aggregation runs, every bucket criterion is evaluated to decide which documents match it and therefore belong in that bucket. This process continues until all documents have been segregated into buckets according to the matching criteria, so that at the end of the process each document has been placed into the bucket (or buckets) whose criterion it matches.
Every bucket has a criterion that decides whether a document fits into it. Bucket aggregations always compute and return the total number of documents that fall into each bucket. The different bucket aggregators in Kibana 4 follow different bucketing strategies: some define a single bucket, some define a fixed set of buckets, and some create buckets dynamically during the aggregation process. Bucket aggregations are very powerful because they can be combined with other aggregations as sub-aggregations; a sub-aggregation is computed for each bucket generated by its parent aggregation. The different types of bucket aggregation are as follows.
This aggregation operates on date/time values, which Kibana automatically extracts from the documents. Kibana detects the date-type fields, and you specify an interval such as 5 minutes or 30 minutes. Each bucket then holds all documents whose value for the date field lies within the same interval.
The interval expressions available in Kibana are year, quarter, month, week, day, hour, minute, second, and auto, out of which only days, hours, minutes, and seconds are allowed to contain fractional values. The auto interval lets Kibana choose the time interval automatically, so that a reasonable number of buckets is created for the graph.
For example, a date histogram can be used on a field that contains a date/time, with an interval of an hour. A bucket is created for every hour, and each bucket stores the documents that fall within that hour: a document created in the 5th hour fits into the bucket that contains only documents created in the 5th hour.
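The bucketing logic of a date histogram can be sketched in a few lines of Python. This is only an illustration of the idea, not Kibana's or Elasticsearch's implementation, and the sample timestamps are made up:

```python
from collections import defaultdict
from datetime import datetime, timedelta

def date_histogram(timestamps, interval_seconds=3600):
    """Group timestamps into fixed-width time buckets (default: one hour)."""
    epoch = datetime(1970, 1, 1)
    buckets = defaultdict(list)
    for ts in timestamps:
        seconds = int((ts - epoch).total_seconds())
        start = seconds - (seconds % interval_seconds)  # round down to bucket start
        buckets[epoch + timedelta(seconds=start)].append(ts)
    return buckets

docs = [datetime(2015, 6, 1, 5, 10), datetime(2015, 6, 1, 5, 45),
        datetime(2015, 6, 1, 6, 5)]
for start, members in sorted(date_histogram(docs).items()):
    print(start, len(members))  # two docs land in the 05:00 bucket, one in 06:00
```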
This aggregation operates on a numeric field, which Kibana automatically reads from the documents. It creates buckets dynamically based on the interval specified, which can be any numeric value. Each bucket then holds all documents whose value for the numeric field lies within the same interval.
For example, if the documents contain a numeric field (quantity) holding values from 1-100, we can create dynamic buckets by specifying an interval of 10. When the aggregation runs, the quantity field of each document is read and rounded down to the nearest multiple of the interval; if the quantity is 52 and the interval is 10, it is rounded down to 50, and so the document fits into the bucket associated with the key 50.
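This rounding-down behaviour can be sketched as follows (an illustration with made-up values, not the Elasticsearch implementation):

```python
from collections import defaultdict

def histogram(values, interval):
    """Bucket numeric values by rounding each down to a multiple of interval."""
    buckets = defaultdict(int)
    for v in values:
        key = (v // interval) * interval  # 52 // 10 * 10 -> 50
        buckets[key] += 1
    return dict(buckets)

print(histogram([3, 12, 52, 57, 99], 10))  # {0: 1, 10: 1, 50: 2, 90: 1}
```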
This aggregation lets you specify a set of ranges, where each range represents a bucket. It is used for aggregation on numeric or date/time fields, and is similar to a manual Histogram or Date Histogram aggregation. The ranges have to be specified manually, which helps to analyze a subset of the complete data. Each range consists of from and to values.
For example, the documents contain a numeric field (user.statuses_count) with ranges such as 1,000-3,000, 3,000-5,000, and 5,000-10,000. When the aggregation runs, the value extracted from each document is checked against every bucket range, and the document fits into the matching bucket. This creates three buckets containing the documents of users who have posted statuses within the ranges 1,000-3,000, 3,000-5,000, and 5,000-10,000 respectively. It is very useful for clustering data, such as grouping users who tweet frequently or users who are popular.
Also, an upper or lower boundary can be left open to create an open-ended range, such as 10000-*, in which case the bucket contains the documents of all users who have posted statuses more than 10,000 times.
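The range check described above can be sketched like this; the data is made up, and, as in Elasticsearch, the from bound is treated as inclusive and the to bound as exclusive:

```python
def range_buckets(values, ranges):
    """Count values per (from, to) range; `to` may be None for an open range.
    `from` is inclusive and `to` is exclusive, as in Elasticsearch."""
    counts = {r: 0 for r in ranges}
    for v in values:
        for lo, hi in ranges:
            if v >= lo and (hi is None or v < hi):
                counts[(lo, hi)] += 1
    return counts

ranges = [(1000, 3000), (3000, 5000), (5000, 10000), (10000, None)]
print(range_buckets([1500, 3200, 9000, 25000], ranges))
```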
This aggregation lets you specify a set of date ranges, where each range represents a bucket. It is used for aggregation on date/time fields. The ranges have to be specified manually, which helps to analyze a subset of the complete data. Each range consists of from and to values.
For example, the documents contain a date field (created_at) with ranges such as from now-2M/M to now-1M/M and from now-1M/M to now. When the aggregation runs, the value extracted from each document is checked against every bucket range, and the document fits into the matching bucket. This creates two buckets: bucket 1 contains the documents whose date lies between two months ago and one month ago, and bucket 2 contains the documents whose date lies between one month ago and the current date.
This aggregation lets you specify a set of IP address ranges, where each range represents a bucket. Each range consists of from and to values.
For example, the documents contain an IP field (host_address) with ranges such as from 192.168.1.1 to 192.168.1.100 and from 192.168.1.100 to 192.168.1.150. When the aggregation runs, the value extracted from each document is checked against every bucket range, and the document fits into the matching bucket.
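The same check can be sketched with Python's standard ipaddress module, which makes IP addresses comparable; the addresses here are made-up examples, and the to bound is treated as exclusive:

```python
from ipaddress import ip_address

def ip_range_buckets(addresses, ranges):
    """Count IP addresses per (from, to) range; `from` is inclusive,
    `to` is exclusive."""
    counts = {r: 0 for r in ranges}
    for addr in addresses:
        ip = ip_address(addr)
        for lo, hi in ranges:
            if ip_address(lo) <= ip < ip_address(hi):
                counts[(lo, hi)] += 1
    return counts

ranges = [("192.168.1.1", "192.168.1.100"), ("192.168.1.100", "192.168.1.150")]
print(ip_range_buckets(["192.168.1.5", "192.168.1.100", "192.168.1.120"], ranges))
```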
This aggregation creates buckets dynamically based on the values of a field, similar to a GROUP BY statement in SQL. You specify a field, and a bucket is created for each distinct value of that field; every document that has that value in the field is placed into the corresponding bucket.
For example, use a terms aggregation on a user.languages field, which contains the languages in which users tweet. It creates one bucket for each language (en, jp, ru, and so on), and each bucket contains all the documents tweeted in that language; the en bucket will contain all documents tweeted in English, and so on.
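The grouping-and-counting behaviour of a terms aggregation can be sketched with a Counter; the documents here are made up:

```python
from collections import Counter

def terms_aggregation(docs, field):
    """Count documents per distinct value of a field (like SQL GROUP BY)."""
    return Counter(doc[field] for doc in docs if field in doc)

tweets = [{"user.languages": "en"}, {"user.languages": "jp"},
          {"user.languages": "en"}, {"user.languages": "ru"}]
print(terms_aggregation(tweets, "user.languages"))
```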
Filters are expressed exactly like queries, which were covered in the Using Search Bar section of Chapter 2, Exploring the Discover Page. This is a flexible yet powerful aggregation that helps to create visualizations based on search queries. In this aggregation, you specify one filter per bucket, and the documents matching a filter are placed into that bucket.
For example, use a filters aggregation with the query user.languages:(en OR jp), which creates one bucket holding all the documents containing tweets in English or Japanese. If we add another filter query, user.statuses_count:[5000 TO *], there will be two buckets: one containing the documents of tweets in English or Japanese, and another containing the documents of users who have posted statuses more than 5,000 times.
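The idea of one bucket per filter can be sketched with arbitrary predicates; the documents and filter names below are hypothetical:

```python
def filters_aggregation(docs, filters):
    """One bucket per named filter; a document joins every bucket whose
    predicate it matches (unlike terms buckets, filters may overlap)."""
    return {name: [d for d in docs if pred(d)] for name, pred in filters.items()}

tweets = [
    {"lang": "en", "statuses_count": 120},
    {"lang": "jp", "statuses_count": 8000},
    {"lang": "ru", "statuses_count": 6000},
]
buckets = filters_aggregation(tweets, {
    "en_or_jp": lambda d: d["lang"] in ("en", "jp"),
    "prolific": lambda d: d["statuses_count"] >= 5000,
})
print({name: len(docs) for name, docs in buckets.items()})
```

Note that the jp document lands in both buckets; filter buckets are independent of each other.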
This aggregation is used to find the uncommonly common terms in the data. It uses a foreground set and a background set to find them, which is useful for creating subsets of the data to analyze uncommon behaviors or scenarios. The foreground set contains the search results matched by a query (filter), and the background set contains all the data in the index or indices. A term is significant if its frequency has changed measurably between the foreground set and the background set.
If a term exists in 10 documents out of 10,000 indexed documents, but appears in 8 of the 50 documents returned by the search query, then that term is significant.
The foreground set can be constructed by either using a query (filter) or using any other bucket aggregation, first on all documents and then choosing significant terms as sub-aggregation. The size property is used to specify how many buckets are to be constructed, meaning how many significant terms should be calculated.
For example, use a filter aggregation with the query user.location:India and select significant terms on a language field as the bucket aggregation, specifying a size of 5. It will return the top five significant terms for that search query, such as en, hi, and so on.
Let's understand how these results were obtained. The search query user.location:India returned 270 documents. The significant terms aggregation reported hi with a count of 22 out of those 270 documents, while searching for the language hi across all documents gave a count of 160 out of a total of 92,004 documents. Therefore, 22/270 (8.15%) compared with 160/92,004 (0.17%) is a significant ratio, which shows how much more common hi is within the user.location:India search results than in the documents overall.
This aggregation is used to create buckets based on geo_point fields, grouping nearby points into buckets. The buckets are created dynamically. For this aggregation, you specify the geo_point field, which is automatically read by Kibana, along with a precision. The smaller the precision, the larger the area covered by each bucket.
Use GeoHash aggregation on location fields to create a bucket containing tweets from users who are close to each other.
Metric aggregations are used for computing metrics over a set of documents. This aggregation is used after creating a bucket aggregation which has buckets with documents stored in it. Metric aggregation is then specified to calculate the value of each bucket, so this aggregation runs on each bucket and provides a single value result per bucket.
In visualizations, the bucket aggregation determines the first dimension of the chart, and the value calculated by the metric aggregation provides the second dimension.
The different types of metric aggregations are as follows.
This aggregation returns the number of documents contained within each bucket as its value.
For example, to find out how many tweets are in each language, use a terms aggregation on the user.languages field, which creates one bucket per language, and then a count metric aggregation, which displays the number of tweets in each language bucket.
This aggregation is used to calculate the sum of a numeric field stored in every bucket. The result for every bucket will be the sum of all the values in that field.
This aggregation is used to calculate the average value of a numeric field stored in every bucket. The result for every bucket will be the average of all the values in that field.
For example, to find out the average number of statuses of Twitter users per language, use a terms aggregation on the user.languages field, and then an average metric aggregation on the status count field, which displays the average number of statuses tweeted for each language bucket.
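Count, sum, and average metrics over terms buckets can all be sketched with one small function; the field names and documents below are made-up examples:

```python
from collections import defaultdict

def metrics_per_bucket(docs, bucket_field, value_field):
    """Group docs by bucket_field, then compute count, sum, and average
    of value_field for each bucket (one value per metric per bucket)."""
    groups = defaultdict(list)
    for d in docs:
        groups[d[bucket_field]].append(d[value_field])
    return {k: {"count": len(v), "sum": sum(v), "avg": sum(v) / len(v)}
            for k, v in groups.items()}

tweets = [{"lang": "en", "statuses": 100}, {"lang": "en", "statuses": 300},
          {"lang": "jp", "statuses": 50}]
print(metrics_per_bucket(tweets, "lang", "statuses"))
```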
This aggregation is used to calculate the minimum value of a numeric field stored in every bucket. The result for every bucket will be the minimum value for that field found in documents stored.
This aggregation is used to calculate the maximum value of a numeric field stored in every bucket. The result for every bucket will be the maximum value for that field found in documents stored.
For example, to find out the maximum number of retweets in each language, use a terms aggregation on the user.languages field, which creates one bucket per language, and then a maximum metric aggregation on the retweet.retweet_count field, which displays the maximum number of retweets for each language bucket.
This aggregation is used to count the number of unique values that exist for a field stored in every bucket. The result for every bucket will be the total number of unique values for that field found in documents stored.
For example, the documents contain a numeric field (user.statuses_count) for which range buckets such as 1,000-3,000, 3,000-5,000, and 5,000-10,000 are created. A unique count metric aggregation is then used on the user.languages field, which displays, for each status-count range, the number of distinct languages used for posting statuses.
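The distinct-count idea can be sketched with sets; note that Elasticsearch actually computes this value approximately (via its cardinality aggregation), whereas this made-up example counts exactly:

```python
from collections import defaultdict

def unique_count(docs, bucket_field, value_field):
    """Per bucket, count the number of distinct values of value_field."""
    uniques = defaultdict(set)
    for d in docs:
        uniques[d[bucket_field]].add(d[value_field])
    return {k: len(v) for k, v in uniques.items()}

docs = [{"range": "1000-3000", "lang": "en"},
        {"range": "1000-3000", "lang": "en"},
        {"range": "1000-3000", "lang": "hi"},
        {"range": "5000-10000", "lang": "jp"}]
print(unique_count(docs, "range", "lang"))
```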
This aggregation is used to calculate percentiles over a numeric field stored in buckets. It differs from the other metric aggregations in that it returns multiple values per bucket, so it falls under the category of multi-value metric aggregations. When specifying this aggregation, you choose a numeric field along with multiple percentage values. The result is, for each specified percentage, the value below which that percentage of documents falls.
For example, use a percentiles aggregation on the user.statuses_count field and specify the percentiles 5, 50, 75, and 95. This results in four aggregated values for every bucket. Suppose the 5th percentile result is 24: this means that 5% of all the tweets in this bucket have a user status count of 24 or below. Likewise, a 50th percentile result of 175 means that 50% of the tweets in this bucket have a user status count of 175 or below; a 75th percentile result of 845 means that 75% have a status count of 845 or below; and a 95th percentile result of 18,500 means that 95% have a status count of 18,500 or below.
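A percentile can be sketched with an exact nearest-rank computation on a small, made-up list; Elasticsearch itself computes percentiles approximately on large data sets:

```python
def percentile(values, pct):
    """Exact nearest-rank percentile: the smallest value at or below which
    at least pct percent of the data falls."""
    ordered = sorted(values)
    rank = max(1, -(-pct * len(ordered) // 100))  # ceil(pct/100 * n)
    return ordered[rank - 1]

counts = [5, 24, 24, 90, 175, 300, 845, 900, 18500, 20000]
for p in (5, 50, 75, 95):
    print(p, percentile(counts, p))
```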
This aggregation is used to calculate one or more percentile ranks over a numeric field extracted from the documents and stored in buckets. It also falls under the category of multi-value metric aggregations. It displays the percentage of occurring values that fall below a certain specified value: if a value is greater than or equal to 75% of the occurring values, it is said to be at the 75th percentile rank.
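A percentile rank is the inverse of a percentile and can be sketched in one line; the data is a made-up example:

```python
def percentile_rank(values, target):
    """Percentage of values that are less than or equal to target."""
    return sum(1 for v in values if v <= target) / len(values) * 100

counts = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(percentile_rank(counts, 75))  # 70.0 -> 75 sits at the 70th percentile rank
```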
Now as we have understood all the aggregations provided in Kibana, let's understand how to use these aggregations with visualizations.