Sorting data using scripts

Elasticsearch provides scripting support for sorting functionality. In real-world applications, there is often a need to modify the default sort by match score using an algorithm that depends on the context and some external variables. Some common scenarios are as follows:

  • Sorting places near a point
  • Sorting by most read articles
  • Sorting items by custom user logic
  • Sorting items by revenue

Tip

Because computing scores on a large dataset is very CPU-intensive, if you use scripting, it's better to run it on a small dataset: use standard score queries to detect the top documents, and then rescore only that top subset.
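
As a rough illustration of this tip, the following Python sketch builds a search body in which a cheap standard query selects the candidates and a script only rescores the top window of hits on each shard. The helper name build_rescore_search, the name field, and the window size of 50 are assumptions made for the example; the "inline" key follows the script syntax used in this recipe (newer Elasticsearch releases use "source" instead). Rescoring works on _score, so this sketch leaves out the sort clause.

```python
import json


def build_rescore_search(base_query, script_source, params, window_size=50):
    """Return a search body: standard query first, script only on the top window."""
    return {
        "query": base_query,
        "rescore": {
            # Number of top hits per shard that the script is applied to.
            "window_size": window_size,
            "query": {
                "rescore_query": {
                    "function_score": {
                        "script_score": {
                            "script": {"inline": script_source, "params": params}
                        }
                    }
                },
                "query_weight": 1.0,
                "rescore_query_weight": 1.0,
            },
        },
    }


body = build_rescore_search(
    {"match": {"name": "example"}},  # hypothetical field and search term
    "doc['price'].value * params.factor",
    {"factor": 1.1},
)
print(json.dumps(body, indent=2))
```

The printed JSON can be passed to curl with -d in the same way as the other requests in this recipe.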

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To run the commands from the command line, you need to install curl for your operating system.

To correctly execute the following commands, you need an index populated with the chapter_09/populate_for_scripting.sh script available in the online code.

How to do it...

For sorting using scripting, we will perform the following steps:

  1. If we want to order our documents by the square root of the price field multiplied by a factor parameter (for example, a sales tax), the search will be as shown in the following code:
            curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_search?pretty&size=3' -d '{
                "query":
                {
                  "match_all": {}
                },
                "sort":
                {
                  "_script":
                  {
                    "script":
                    {
                      "inline": "Math.sqrt(doc[\"price\"].value * params.factor)",
                      "params":
                      {
                        "factor": 1.1
                      }
                    },
                    "type": "number",
                    "order": "asc"
                  }
                }
            }'
    

    In this case, we have used a match_all query and a sort script. In real-world applications, the set of documents to be sorted should be kept small, because the script is executed for every matching document.

  2. If everything's correct, the result returned by Elasticsearch should be as shown in the following code:
            { 
              "took" : 7, 
              "timed_out" : false, 
              "_shards" : { 
                "total" : 5, 
                "successful" : 5, 
                "failed" : 0 
              }, 
              "hits" : { 
                "total" : 1000, 
                "max_score" : null, 
                "hits" : [ { 
                  "_index" : "test-index", 
                  "_type" : "test-type", 
                  "_id" : "161", 
                  "_score" : null, "_source" : ... truncated ..., 
                  "sort" : [ 0.0278578661440021 ] 
                }, { 
                  "_index" : "test-index", 
                  "_type" : "test-type", 
                  "_id" : "634", 
                  "_score" : null, "_source" : ... truncated ..., 
                  "sort" : [ 0.08131364254827411 ] 
                }, { 
                  "_index" : "test-index", 
                  "_type" : "test-type", 
                  "_id" : "465", 
                  "_score" : null, "_source" : ... truncated ..., 
                  "sort" : [ 0.1094966959069832 ] 
                } ] 
              } 
            } 
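
To make the sort values in the preceding response easier to read, here is a small local Python sketch of what the script computes for each hit. The three documents and their prices are invented for illustration and do not come from the recipe's index; only the formula and the factor of 1.1 are taken from the request.

```python
import math

# Hypothetical documents; the real index is filled by the recipe's populate script.
docs = [
    {"_id": "a", "price": 4.0},
    {"_id": "b", "price": 0.5},
    {"_id": "c", "price": 9.0},
]
factor = 1.1  # same value as "params.factor" in the request


def sort_value(doc, factor):
    """Local mirror of the script: Math.sqrt(doc["price"].value * params.factor)."""
    return math.sqrt(doc["price"] * factor)


# "order": "asc" corresponds to an ascending sort on the computed value.
hits = sorted(docs, key=lambda d: sort_value(d, factor))
ordered_ids = [d["_id"] for d in hits]

for doc in hits:
    # Each hit carries its computed value in a one-element "sort" array.
    print(doc["_id"], [sort_value(doc, factor)])
```

Because the square root is monotonic for non-negative inputs, the resulting order is the same as sorting by price multiplied by the factor; only the values reported in the sort array differ.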
    

How it works...

The sort parameter, which we discussed in Chapter 5, Search, can be extended with the help of scripting.

A script-based sort accepts several parameters, such as:

  • order ("asc" or "desc", default "asc"): This determines whether the order must be ascending or descending.
  • type: This defines the type of the value returned by the script (for example, number).
  • script: This contains the script object to be executed.
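
The parameters listed above can be assembled into a sort clause as in the following Python sketch. The script_sort helper is hypothetical, not part of any Elasticsearch client, and the set of accepted type values ("number" and "string") is an assumption about the server version in use.

```python
import json

VALID_ORDERS = {"asc", "desc"}
VALID_TYPES = {"number", "string"}  # assumed set of supported script sort types


def script_sort(source, sort_type, order="asc", params=None):
    """Build a "_script" sort clause from the order, type, and script parameters."""
    if order not in VALID_ORDERS:
        raise ValueError("order must be 'asc' or 'desc'")
    if sort_type not in VALID_TYPES:
        raise ValueError("type must be 'number' or 'string'")
    script = {"inline": source}
    if params:
        script["params"] = dict(params)
    return {"_script": {"script": script, "type": sort_type, "order": order}}


clause = script_sort(
    'Math.sqrt(doc["price"].value * params.factor)',
    "number",
    order="desc",
    params={"factor": 1.1},
)
print(json.dumps({"query": {"match_all": {}}, "sort": clause}, indent=2))
```

Using json.dumps also takes care of escaping the double quotes inside the script string, which is easy to get wrong when writing the request by hand.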

Extending the sort with scripting gives you much more flexibility in how your hits are ordered.

Tip

Elasticsearch scripting lets you express your own logic in code, so you can create custom, complex algorithms for scoring your documents.

There's more...

Painless and Groovy provide many built-in functions (mainly taken from the Java Math class) that can be used in scripts, such as the following:

Function                Description
time()                  The current time in milliseconds
sin(a)                  Returns the trigonometric sine of an angle
cos(a)                  Returns the trigonometric cosine of an angle
tan(a)                  Returns the trigonometric tangent of an angle
asin(a)                 Returns the arc sine of a value
acos(a)                 Returns the arc cosine of a value
atan(a)                 Returns the arc tangent of a value
toRadians(angdeg)       Converts an angle measured in degrees to an approximately equivalent angle measured in radians
toDegrees(angrad)       Converts an angle measured in radians to an approximately equivalent angle measured in degrees
exp(a)                  Returns Euler's number raised to the power of a value
log(a)                  Returns the natural logarithm (base e) of a value
log10(a)                Returns the base 10 logarithm of a value
sqrt(a)                 Returns the correctly rounded positive square root of a value
cbrt(a)                 Returns the cube root of a double value
IEEEremainder(f1, f2)   Computes the remainder operation on two arguments as prescribed by the IEEE 754 standard
ceil(a)                 Returns the smallest (closest to negative infinity) value that is greater than or equal to the argument and is equal to a mathematical integer
floor(a)                Returns the largest (closest to positive infinity) value that is less than or equal to the argument and is equal to a mathematical integer
rint(a)                 Returns the value that is closest in value to the argument and is equal to a mathematical integer
atan2(y, x)             Returns the angle theta from the conversion of rectangular coordinates (x, y) to polar coordinates (r, theta)
pow(a, b)               Returns the value of the first argument raised to the power of the second argument
round(a)                Returns the closest integer to the argument
random()                Returns a random double value
abs(a)                  Returns the absolute value of a value
max(a, b)               Returns the greater of two values
min(a, b)               Returns the smaller of two values
ulp(d)                  Returns the size of the unit in the last place of the argument
signum(d)               Returns the signum function of the argument
sinh(x)                 Returns the hyperbolic sine of a value
cosh(x)                 Returns the hyperbolic cosine of a value
tanh(x)                 Returns the hyperbolic tangent of a value
hypot(x, y)             Returns sqrt(x^2 + y^2) without intermediate overflow or underflow
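
Some of these functions are easy to confuse, so the following sketch checks a few of them locally. It uses Python's math module as a stand-in for the Java Math class, which is an assumption made only so that the example can be run without a cluster; inside a script you would call the Math functions listed above.

```python
import math

# Python's math module stands in for java.lang.Math here (illustration only).

# IEEEremainder(5, 3): the quotient is rounded to the nearest integer (2),
# so the result can be negative, unlike the usual % operator.
ieee = math.remainder(5.0, 3.0)
plain = 5.0 % 3.0

# rint(2.5) resolves ties to the even integer, while round(2.5) in Java
# rounds ties upward. Python's round() behaves like rint for this input.
rint_like = round(2.5)
round_like = math.floor(2.5 + 0.5)

# hypot(3, 4) is the length of the hypotenuse; atan2(1, 1) is the polar angle.
r = math.hypot(3.0, 4.0)
theta = math.atan2(1.0, 1.0)

print(ieee, plain, rint_like, round_like, r, theta)
```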

If you want to retrieve records in a random order, you can use a script with a random method as shown in the following code:

    curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_search?pretty&size=3' -d '{
      "query": {
        "match_all": {}
      },
      "sort": {
        "_script": {
          "script": {
            "inline": "Math.random()"
          },
          "type": "number",
          "order": "asc"
        }
      }
    }'

In this example, the sort value for every hit is computed by executing the scripting function Math.random(), so each request returns the documents in a different order.
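
The mechanism can be reproduced locally with a short Python sketch: each hit receives one random sort value, and the hits are then ordered by that value. The document IDs and the seeded generators are assumptions made for the example; they only make the behavior repeatable here and do not correspond to any option of the request shown above.

```python
import random

doc_ids = ["161", "634", "465", "12", "90"]  # hypothetical hit IDs


def random_order(ids, rng):
    """Assign one random sort value per hit, then sort ascending by it."""
    keyed = [(rng.random(), doc_id) for doc_id in ids]
    keyed.sort()
    return [doc_id for _, doc_id in keyed]


# Two generators with different seeds play the role of two separate requests.
first = random_order(doc_ids, random.Random(1))
second = random_order(doc_ids, random.Random(2))
print(first)
print(second)
```

Every document still appears exactly once in each result; only the order changes between runs.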
