Sorting your data

By default, Elasticsearch sorts search results by a relevance score, which is computed using Lucene's TF/IDF-based scoring formula. This score is a floating-point value returned with each search result in the _score field, and results are sorted in descending order of _score.

Note

Sorting on nested and geo-point fields will be covered in the upcoming chapters.

See the following query for an example:

{
  "query": {
    "match": {
      "text": "data analytics"
    }
  }
}

This query searches for tweets that contain the term data or analytics in their text field. In some cases, however, we do not want the results to be sorted by _score. Elasticsearch provides several other ways to sort documents. Let's explore how this can be done.

Sorting documents by field values

This section covers the sorting of documents based on the fields that contain a single value such as created_at, or followers_count. Please note that we are not talking about sorting string-based fields here.

Suppose we want to sort tweets that contain data or analytics in their text field based on their creation time in ascending order:

{
  "query":{
    "match":{"text":"data analytics"}
  },
  "sort":[
    {"created_at":{"order":"asc"}}
    ]
}

In the response of the preceding query, max_score and _score will have null as values. They are not calculated because _score is not used for sorting. You will see an additional field, sort. This field contains the date value in the long format, which has been used for sorting.
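As a minimal sketch of reading such a response in Python: the hit below is hypothetical sample data (not output from a real cluster), and the helper name sort_value_to_datetime is an assumption introduced for illustration. It expresses the preceding request body as a dict and converts the long sort value, assumed here to be milliseconds since the epoch, back into a readable date.

```python
import json
from datetime import datetime, timezone

# Request body from the preceding query, expressed as a Python dict.
search_body = {
    "query": {"match": {"text": "data analytics"}},
    "sort": [{"created_at": {"order": "asc"}}],
}

# Hypothetical hit, shaped like one entry of a search response.
sample_hit = {
    "_id": "1",
    "_score": None,           # null: not calculated when sorting on a field
    "_source": {"text": "data analytics rocks"},
    "sort": [1420070400000],  # created_at, assumed to be epoch milliseconds
}


def sort_value_to_datetime(millis):
    """Convert an epoch-milliseconds sort value to a UTC datetime."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


print(json.dumps(search_body, indent=2))
print(sort_value_to_datetime(sample_hit["sort"][0]).isoformat())
```

The sort array holds one entry per sort clause, so a request that sorts on two fields would return two values per hit.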

Sorting on more than one field

In scenarios where documents must be sorted on more than one field, list one sort clause per field, in order of priority:

"sort": [
  {"created_at":{"order":"asc"}},
  {"followers_count":{"order":"asc"}}
]

With the above query, the results will be sorted first using tweet creation time, and if two tweets have the same tweet creation time, then they will be sorted using the followers count.
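The tie-breaking behavior can be illustrated with a small local emulation in Python. This is only a sketch of the ordering semantics, not how Elasticsearch implements sorting; the tweets list and the apply_sort helper are hypothetical.

```python
# Hypothetical tweets; field names follow the examples in this section.
tweets = [
    {"id": "a", "created_at": 1420070400000, "followers_count": 50},
    {"id": "b", "created_at": 1420070400000, "followers_count": 10},
    {"id": "c", "created_at": 1420000000000, "followers_count": 99},
]

# One clause per field, most significant first, mirroring the "sort" array.
sort_clauses = [
    {"created_at": {"order": "asc"}},
    {"followers_count": {"order": "asc"}},
]


def apply_sort(docs, clauses):
    """Emulate multi-field sorting: later clauses only break ties."""
    ordered = list(docs)
    # Python's sort is stable, so sorting from the least significant
    # clause to the most significant one yields the combined ordering.
    for clause in reversed(clauses):
        (field, options), = clause.items()
        descending = options.get("order", "asc") == "desc"
        ordered.sort(key=lambda doc: doc[field], reverse=descending)
    return ordered


print([t["id"] for t in apply_sort(tweets, sort_clauses)])
```

Tweets a and b share the same creation time, so followers_count alone decides their relative order; tweet c is older and comes first regardless of its follower count.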

Sorting multivalued fields

Multivalued fields, such as arrays of dates, contain more than one value, and you cannot specify which of those values to sort on. In this case, a single value must first be calculated using the mode parameter, which takes min, max, avg, median, or sum as its value. For example, in the following query, sorting is done on the maximum value inside the price field of each document:

"sort":[
    {"price":{"order":"asc","mode":"max"}}
    ]
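To make the effect of mode concrete, here is a small local emulation in Python. It is a sketch under assumptions: the documents and the sort_by_mode helper are hypothetical, and the reducers only approximate what Elasticsearch does (for example, its handling of rounding for avg on integer fields may differ).

```python
from statistics import median

# Reducers corresponding to the mode values listed above.
MODES = {
    "min": min,
    "max": max,
    "sum": sum,
    "avg": lambda values: sum(values) / len(values),
    "median": median,
}

# Hypothetical documents with a multivalued price field.
docs = [
    {"id": "x", "price": [5, 40]},
    {"id": "y", "price": [10, 20, 30]},
    {"id": "z", "price": [25]},
]


def sort_by_mode(documents, field, mode, order="asc"):
    """Reduce each document's values to one number, then sort on it."""
    reduce_values = MODES[mode]
    return sorted(
        documents,
        key=lambda doc: reduce_values(doc[field]),
        reverse=(order == "desc"),
    )


print([d["id"] for d in sort_by_mode(docs, "price", "max")])
print([d["id"] for d in sort_by_mode(docs, "price", "min")])
```

With mode set to max, document x sorts last in ascending order because its largest price (40) is the highest; with min, the same document sorts first because its smallest price (5) is the lowest.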

Sorting on string fields

Analyzed string fields are also multivalued fields, since they contain multiple tokens. For this reason, and because of performance considerations, do not sort on analyzed fields.

The string field on which sorting is to be done must be not_analyzed or keyword tokenized, so that the field contains only a single token.
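The difference can be sketched in Python. The two functions below are rough stand-ins, not real Elasticsearch analyzers: analyzed_tokens assumes an analyzer that lowercases and splits on whitespace, and the user_name field in the mapping fragment is a hypothetical name chosen for illustration.

```python
def analyzed_tokens(value):
    """Rough stand-in for an analyzer: lowercase and split on whitespace."""
    return value.lower().split()


def keyword_tokens(value):
    """Stand-in for not_analyzed or keyword tokenization: one token, unchanged."""
    return [value]


# Mapping fragment for a sortable string field (hypothetical field name).
sortable_field_mapping = {
    "user_name": {"type": "string", "index": "not_analyzed"}
}

name = "Data Analytics Weekly"
print(analyzed_tokens(name))  # several tokens: behaves like a multivalued field
print(keyword_tokens(name))   # exactly one token: safe to sort on
```

Because the analyzed form yields several tokens per document, a sort on it would have to pick one of them, which rarely matches the ordering a reader expects from the original string.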

Note

Sorting is an expensive process. All the values of the field on which sorting is performed are loaded into memory, so the node needs ample memory to perform sorting. The data type of the field should also be chosen carefully when creating the mapping. For example, short can be used in place of integer or long if the values will always fit within its range.
