Sorting results

When searching, the most common criterion for sorting results in Elasticsearch is relevance to a text query.

Real-world applications often need to control the sorting criteria in scenarios such as the following:

  • Sorting users by last name and first name
  • Sorting items by stock symbol or price (ascending or descending)
  • Sorting documents by size, file type, or source

Getting ready

You need an up-and-running Elasticsearch installation, as described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via the command line, you need to install curl for your operating system.

To correctly execute the following commands, you will need an index populated with the chapter_05/populate_query.sh script available in the online code.

How to do it...

In order to sort the results, we will perform the following steps:

  1. Add a sort section to your query as follows:
            curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_search?pretty' -d'{
              "query": {
                "match_all": {}
              },
              "sort": [
                {
                  "price": {
                    "order": "asc",
                    "mode": "avg",
                    "unmapped_type": "double",
                    "missing": "_last"
                  }
                },
                "_score"
              ]
            }'
    
  2. The returned result should be similar to the following:
            ..., 
              "hits" : { 
                "total" : 3, 
                "max_score" : null, 
                "hits" : [ { 
                  "_index" : "test-index", 
                  "_type" : "test-type", 
                  "_id" : "1", 
                  "_score" : 1.0, "_source" :{ ... "price":4.0}, 
                  "sort" : [ 
                      4.0, 
                      1.0 
                    ]  
                }, { 
                  .... 
    

The sort result is special: each hit contains an extra sort field that collects the values used for sorting, one value for each entry in the sort list.
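As a sketch, the request body from step 1 can be assembled as a plain Python dictionary with the standard json module, and the sort values can be read back from each hit. The response dictionary and the sort_values helper below are hypothetical: the response is a trimmed copy modeled on the output above, not real server output.

```python
import json

# Request body from step 1: sort by price ascending, then by relevance score.
body = {
    "query": {"match_all": {}},
    "sort": [
        {
            "price": {
                "order": "asc",
                "mode": "avg",
                "unmapped_type": "double",
                "missing": "_last",
            }
        },
        "_score",
    ],
}

# Serialized payload, suitable for curl -d or any HTTP client.
payload = json.dumps(body, indent=2)

# Hypothetical, trimmed response modeled on the output shown above.
response = {
    "hits": {
        "total": 3,
        "max_score": None,
        "hits": [
            {"_id": "1", "_score": 1.0, "_source": {"price": 4.0}, "sort": [4.0, 1.0]},
        ],
    }
}


def sort_values(resp):
    """Return (id, sort values) pairs; one sort value per entry in the sort list."""
    return [(hit["_id"], hit["sort"]) for hit in resp["hits"]["hits"]]


for doc_id, values in sort_values(response):
    print(doc_id, values)  # prints: 1 [4.0, 1.0]
```

The first sort value (4.0) comes from the price entry and the second (1.0) from _score, mirroring the order of the sort list.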

How it works...

The sort parameter can be defined as a list that can contain both simple strings and JSON objects.

The sort strings are the names of the fields (such as field1, field2, field3, and so on) used for sorting, similar to the SQL ORDER BY clause.
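The ORDER BY analogy can be sketched locally: a sort list such as ["last_name", "first_name"] orders hits by the first field and uses the next field only to break ties. The documents and field names below are hypothetical, and the snippet only mimics the comparison order, not how Elasticsearch executes the sort.

```python
# Hypothetical documents; the field names are illustrative only.
people = [
    {"first_name": "Joe", "last_name": "Tester"},
    {"first_name": "Bill", "last_name": "Clinton"},
    {"first_name": "Ann", "last_name": "Clinton"},
]

# Equivalent of "sort": ["last_name", "first_name"] (both ascending):
# compare by last_name first, then by first_name to break ties.
ordered = sorted(people, key=lambda p: (p["last_name"], p["first_name"]))

for p in ordered:
    print(p["last_name"], p["first_name"])
# Clinton Ann
# Clinton Bill
# Tester Joe
```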

The JSON object lets users set extra parameters, as follows:

  • order (asc/desc): This defines whether the order is ascending or descending. The default is asc when sorting on a field and desc when sorting on _score.
  • unmapped_type (long/int/double/string/...): This defines the type to assume for the sort field in indices where the field has no mapping. Without it, sorting on a field that is not mapped in every searched index fails with an error, so it's best practice to define it.
  • missing (_last/_first): This defines how to manage documents with missing values: whether to put them at the end (_last, the default) or at the start (_first) of the results. A custom value can also be given to be used for those documents.
  • mode: This defines how to manage multi-value fields. The default is min for ascending order and max for descending order. Possible values are:
    • min: The minimum value is chosen (for example, if an item has several prices, the lowest one is used for comparison).
    • max: The maximum value is chosen.
    • sum: The sort value will be computed as the sum of all the values. This mode is only available on numeric array fields.
    • avg: The sort value will be the average of all the values. This mode is only available on numeric array fields.
    • median: The sort value will be the median of all the values. This mode is only available on numeric array fields.
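How order, mode, and missing interact on a multi-valued field can be sketched with a small local model. This is a hypothetical illustration, not the engine's implementation: sort_docs is an illustrative name, it assumes the defaults described above (mode min for ascending, max for descending, missing values last), and it supports only _last/_first for missing.

```python
import statistics

# Reducers for each mode; sum, avg, and median only make sense for numbers.
MODES = {
    "min": min,
    "max": max,
    "sum": sum,
    "avg": statistics.mean,
    "median": statistics.median,
}


def sort_docs(docs, field, order="asc", mode=None, missing="_last"):
    """Sort docs on a possibly multi-valued field, mimicking order/mode/missing."""
    if mode is None:
        # Default mode: min for ascending order, max for descending order.
        mode = "min" if order == "asc" else "max"
    reducer = MODES[mode]

    present, absent = [], []
    for doc in docs:
        values = doc.get(field)
        if values is None or values == []:
            absent.append(doc)
            continue
        if not isinstance(values, list):
            values = [values]
        present.append((reducer(values), doc))

    present.sort(key=lambda pair: pair[0], reverse=(order == "desc"))
    ordered = [doc for _, doc in present]
    # Documents without a value go to the end (_last) or the start (_first).
    return ordered + absent if missing == "_last" else absent + ordered


items = [
    {"id": "a", "price": [4.0, 10.0]},
    {"id": "b", "price": 6.0},
    {"id": "c"},  # no price at all
]

print([d["id"] for d in sort_docs(items, "price")])              # ['a', 'b', 'c']
print([d["id"] for d in sort_docs(items, "price", mode="avg")])  # ['b', 'a', 'c']
print([d["id"] for d in sort_docs(items, "price", order="desc", missing="_first")])  # ['c', 'a', 'b']
```

With the default min mode, item a sorts on 4.0 and comes first; with avg it sorts on 7.0 and falls behind item b.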

Tip

If we want to add the relevance score value to the sort list, we must use the special sort field _score.

If you are sorting on a field of a nested object, two extra parameters can be used, as follows:

  • nested_path: This defines the nested object to be used for sorting. The sort field must belong to this nested object and is still referenced by its full path. If nested_path is not defined, the sort field is resolved from the document root.
  • nested_filter: This defines a filter that excludes non-matching nested documents from the extraction of sort values. This filter allows a finer selection of the values used in sorting.

For example, if an address object is nested in a person document and we want to sort by the city name, we can use either of the following:

  • address.city.name without defining nested_path
  • address.city.name with nested_path set to address

Tip

The sorting process requires the sort field values of all the documents matched by the query to be fetched and compared. To prevent high memory usage, it's better to sort on numeric fields and, for string sorting, to choose short text fields processed with an analyzer that doesn't tokenize the text.

There's more...

If you are using sort, be careful with tokenized fields: a document's sort value is its lowest token for ascending order and its highest token for descending order. For tokenized fields, this behavior differs from a common sort, because sorting is done at the term level.

For example, to sort by the name field in descending order, we use the following:

    curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_search?sort=name:desc'

In the preceding example, the results are as follows:

... 
"hits" : [ { 
      "_index" : "test-index", 
      "_type" : "test-type", 
      "_id" : "1", 
      "_score" : null, "_source" : {"position": 1, "parsedtext": "Joe Testere nice guy", "name": "Joe Tester", "uuid": "11111"}, 
      "sort" : [ "tester" ] 
    }, { 
      "_index" : "test-index", 
      "_type" : "test-type", 
      "_id" : "3", 
      "_score" : null, "_source" : {"position": 3, "parsedtext": "Bill is not nice guy", "name": "Bill Clinton", "uuid": "33333"}, 
      "sort" : [ "clinton" ] 
    }, { 
      "_index" : "test-index", 
      "_type" : "test-type", 
      "_id" : "2", 
      "_score" : null, "_source" : {"position": 2, "parsedtext": "Bill Testere nice guy", "name": "Bill Baloney", "uuid": "22222"}, 
      "sort" : [ "bill" ] 
    } 

The results you would expect from an SQL-style sort can be obtained by using a non-tokenized field, in this case name.raw, as follows:

    curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_search?sort=name.raw:desc'

The results are as follows:

{ 
  ... 
  "hits" : { 
    "total" : 3, 
    "max_score" : null, 
    "hits" : [ 
      { 
        ..."_id" : "1",... 
        "sort" : [ 
          "Joe Tester" 
        ] 
      }, 
      { 
        ..."_id" : "3", 
        ..."sort" : [ 
          "Bill Clinton" 
        ] 
      }, 
      { 
        ..."_id" : "2", 
        ..."sort" : [ 
          "Bill Baloney" 
        ] 
      } 
    ] 
  } 
} 
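The difference between the two requests can be reproduced locally. This sketch assumes that name is analyzed into lowercase, whitespace-separated tokens (a simplification of the real analyzer) and that name.raw keeps the original string. For a descending sort, each document is represented by its highest token, which matches the sort values shown in the first response.

```python
names = {"1": "Joe Tester", "2": "Bill Baloney", "3": "Bill Clinton"}


def tokens(text):
    """Simplified analyzer (an assumption): lowercase and split on whitespace."""
    return text.lower().split()


# Tokenized field, descending: each document sorts on its highest token.
tokenized = sorted(names, key=lambda i: max(tokens(names[i])), reverse=True)
print([(i, max(tokens(names[i]))) for i in tokenized])
# [('1', 'tester'), ('3', 'clinton'), ('2', 'bill')]

# Non-tokenized field (like name.raw), descending: the whole string is compared.
raw = sorted(names, key=lambda i: names[i], reverse=True)
print([(i, names[i]) for i in raw])
# [('1', 'Joe Tester'), ('3', 'Bill Clinton'), ('2', 'Bill Baloney')]
```

With this small dataset the document order happens to coincide, but the sort values differ: single terms for the tokenized field versus whole names for the raw field.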

There are two special sorting types: geo distance and scripting.

Geo distance sorting uses the distance from a geo point (location) as the metric to compute the ordering. A sorting example could be as follows:

... 
"sort" : [ 
        { 
            "_geo_distance" : { 
                "pin.location" : [-70, 40], 
                "order" : "asc", 
                "unit" : "km" 
            } 
        } 
    ], ... 

It accepts special parameters such as the following:

  • unit: This defines the unit in which the distance is expressed (for example, km).
  • distance_type (sloppy_arc/arc/plane): This defines how the distance is computed.

The name _geo_distance for the sort entry is mandatory.
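To illustrate the idea behind _geo_distance sorting, the sketch below orders hypothetical documents by their great-circle distance in kilometers from the reference point used in the example, [-70, 40], which is in [lon, lat] order. It assumes the haversine formula on a sphere with a mean radius of 6371 km, which approximates the arc distance type; Elasticsearch's own computation may differ slightly. The haversine_km helper and the document names are illustrative only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption for this sketch


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two (lon, lat) points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


origin = [-70, 40]  # same [lon, lat] array as "pin.location" in the example

# Hypothetical documents with a pin.location stored as [lon, lat].
docs = {"near": [-71, 40], "far": [-70, 42], "here": [-70, 40]}

# Equivalent of "order": "asc", "unit": "km": closest documents first.
ranked = sorted(docs, key=lambda d: haversine_km(*origin, *docs[d]))
for d in ranked:
    print(d, round(haversine_km(*origin, *docs[d]), 1))
```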

The point of reference for the sorting can be defined in several ways as we have already discussed in the Mapping a geo point field recipe in Chapter 3, Managing Mappings.

Sorting with scripts will be discussed in the Sorting data using scripts recipe in Chapter 9, Scripting, after we introduce the scripting capabilities of Elasticsearch.

See also

  • The Mapping a GeoPoint field recipe in Chapter 3, Managing Mappings, explains how to correctly create a mapping for a geo point field
  • The Sorting with scripts recipe in Chapter 9, Scripting, explains how to use custom scripts to compute the values to sort on