Painless scripting

Painless is a simple, secure scripting language available in Elasticsearch by default. It was designed by the Elasticsearch team specifically for use with Elasticsearch and can safely be used for both inline and stored scripts. Its syntax is similar to Groovy.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To run the commands from the command line, you need curl installed on your operating system.

To use regular expressions in Painless scripts, you need to enable them by adding the following line to your elasticsearch.yml:

    script.painless.regex.enabled: true

To correctly execute the following commands, you need an index populated with the chapter_09/populate_for_scripting.sh script available in the online code.

How to do it...

We'll use a Painless script to compute the score. Script code embedded directly in a curl command requires careful escaping of special characters and quickly becomes hard to read, so we'll save the request in a separate file to avoid complex escaping and newline management:

  1. We create the painless_script_score.json file, which contains the script:
            { 
              "query": { 
                "function_score": { 
                  "script_score": { 
                    "script": { 
                      "lang": "painless", 
                      "inline": "doc['price'].value * 1.2" 
                    } 
                  } 
                } 
              } 
            } 
    
  2. We can now execute the script via curl:
           curl -XPOST 'http://127.0.0.1:9200/test-index/test-type/_search?pretty&size=2' -d @painless_script_score.json
    
  3. If everything works correctly, the result will be as follows:
            { 
              "took" : 26, 
              "timed_out" : false, 
              "_shards" : {...}, 
              "hits" : { 
                "total" : 1000, 
                "max_score" : 119.97963, 
                "hits" : [ 
                  { 
                    "_index" : "test-index", 
                    "_type" : "test-type", 
                    "_id" : "857", 
                    "_score" : 119.97963, 
                    "_source" : { 
                      ... 
                      "price" : 99.98302508488757, 
                      ... 
                    } 
                  }, 
                  { 
                    "_index" : "test-index", 
                    "_type" : "test-type", 
                    "_id" : "136", 
                    "_score" : 119.90164, 
                    "_source" : { 
                      ... 
                      "price" : 99.91804048691392, 
                      ... 
                    } 
                  }   
                ] 
              } 
            } 
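
The same request can also be sent from a short program instead of curl. The following is a minimal sketch using only the Python standard library; the helper names build_search_request and run_search are hypothetical, and the URL simply mirrors the one used in the curl command above. Building the request is kept separate from sending it, so the request can be inspected without a running cluster.

```python
import json
import urllib.request

# Assumed local endpoint, mirroring the curl command in this recipe.
SEARCH_URL = "http://127.0.0.1:9200/test-index/test-type/_search?pretty&size=2"


def build_search_request(path="painless_script_score.json", url=SEARCH_URL):
    """Read the query file and wrap it in a POST request (nothing is sent yet)."""
    with open(path, "rb") as handle:
        body = handle.read()
    # Fail early if the file is not valid JSON.
    json.loads(body.decode("utf-8"))
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_search(request):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling run_search(build_search_request()) against a running node should return the same JSON structure shown in step 3.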
    

How it works...

Painless is a scripting language developed for Elasticsearch with two goals: fast data processing and security (it is sandboxed to prevent malicious code injection).

The syntax is based on Groovy, and it's provided by default in every installation.

Painless is marked as "experimental" by the Elasticsearch team, because some features may change in the future, but it is the preferred language for scripting.

Elasticsearch processes the scripting language in two steps:

  1. The script code is compiled into an object that can be used in a script call. If the script code is invalid, an exception is raised.
  2. For every element, the script is called and the result is collected. If the script fails on some elements, the search/computation may fail.
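
The two steps can be illustrated with a small analogy. The sketch below is not how Elasticsearch is implemented: Python's built-in compile() and eval() merely stand in for the script engine, and run_script is a hypothetical helper. It shows that a syntax error surfaces once, at compile time, while a runtime error (here, a missing field) surfaces only when the offending document is reached.

```python
def run_script(source, docs):
    """Compile an expression once, then evaluate it for every document."""
    # Step 1: compile once. Invalid source raises SyntaxError here,
    # before any document is processed.
    code = compile(source, "<script>", "eval")
    results = []
    # Step 2: call the compiled object for each document and collect results.
    # A document that lacks the field raises KeyError at this point.
    for doc in docs:
        results.append(eval(code, {"__builtins__": {}}, {"doc": doc}))
    return results


docs = [{"price": 10.0}, {"price": 20.0}]
scores = run_script("doc['price'] * 1.2", docs)
```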

Tip

Scripting is a powerful Elasticsearch feature, but it is expensive in terms of memory and CPU cycles. Where possible, the best practice is to optimize how the data is indexed for searching or aggregating and to avoid scripting altogether.

The way to define a script in Elasticsearch is always the same. The script is contained in a script object, which accepts several parameters:

  • inline/id/name: This is the reference for the script, which can be:
    • inline, if it's provided with the call
    • id, if it's stored in the cluster
    • name, if it's stored on the filesystem
  • params (an optional JSON object): This defines the parameters to be passed to the script, which are available inside the script through the params variable
  • lang (default painless): This defines the scripting language to be used
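
As an illustration of how these parameters fit together, the following sketch assembles a script object in Python and wraps it in the same function_score query used earlier in this recipe. The function name build_script_score_query and the factor parameter are hypothetical, chosen only for this example; the multiplier that was hard-coded as 1.2 in the recipe is moved into params.

```python
import json


def build_script_score_query(source, params=None, lang="painless"):
    """Wrap an inline script and its optional params in a function_score query."""
    script = {"lang": lang, "inline": source}
    if params:
        # Values placed here are read inside the script via the params variable.
        script["params"] = params
    return {"query": {"function_score": {"script_score": {"script": script}}}}


# "factor" is an illustrative parameter name, not part of the original recipe.
body = build_script_score_query(
    "doc['price'].value * params.factor",
    params={"factor": 1.2},
)
request_json = json.dumps(body, indent=2)
```

Keeping variable values in params rather than in the script text means the script source stays identical between calls, so only the parameter values change from one request to the next.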

There's more

Painless is the preferred choice if the script is not too complex; otherwise, a native plugin provides a better environment for implementing complex logic and data management.

Document properties are accessed in Painless scripts in the same way as in other scripting languages:

  • doc._score: This stores the document score. It's generally available in searching, sorting and aggregations.
  • doc._source: This allows access to the source of the document. Use it wisely because it requires the entire source to be fetched, which is very CPU- and memory-intensive.
  • _fields['field_name'].value: This allows you to load the value from a stored field (in the mapping, the field has the store: true parameter).
  • doc['field_name']: This extracts the document field value from the doc values of the field. In Elasticsearch, doc values are automatically stored for every field that is not of type text.
  • doc['field_name'].value: This extracts the value of the field_name field from the document. If the value is an array, or if you want to extract the value as an array, you can use doc['field_name'].values.
  • doc['field_name'].empty: This returns true if the field_name field has no value in the document.
  • doc['field_name'].multivalue: This returns true if the field_name field contains multiple values.

Tip

For performance, the fastest access method for a field value is by doc value, then stored field, and finally, from the source.
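
A script that reads a doc value can use the empty accessor to guard against documents that have no value for the field. The sketch below builds such a request body in Python; guarded_price_query and the parameter names factor and fallback are hypothetical, and the Painless ternary expression is an assumption about how one might write the guard, not code taken from the recipe.

```python
import json


def guarded_price_query(field="price", factor=1.2, fallback=0.0):
    """Build a script_score query that falls back when the field has no value."""
    # Use the fallback score for documents without a value, otherwise
    # multiply the doc value by the factor.
    source = (
        "doc['{0}'].empty ? params.fallback "
        ": doc['{0}'].value * params.factor"
    ).format(field)
    script = {
        "lang": "painless",
        "inline": source,
        "params": {"factor": factor, "fallback": fallback},
    }
    return {"query": {"function_score": {"script_score": {"script": script}}}}


guarded_body = guarded_price_query()
guarded_json = json.dumps(guarded_body, indent=2)
```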

If the field contains a GeoPoint value, additional methods are available, such as:

  • doc['field_name'].lat: This returns the latitude of a GeoPoint. If you need the value as an array, you can use doc['field_name'].lats.
  • doc['field_name'].lon: This returns the longitude of a GeoPoint. If you need the value as an array, you can use doc['field_name'].lons.
  • doc['field_name'].distance(lat,lon): This returns the plane distance in meters from a lat/lon point.
  • doc['field_name'].arcDistance(lat,lon): This returns the arc distance in meters given a lat/lon point.
  • doc['field_name'].geohashDistance(geohash): This returns the distance in meters given a GeoHash value.

By using these helper methods, it is possible to create advanced scripts that boost a document by distance, which can be very handy for developing geospatial-centered applications.
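
To make the idea of an arc distance and a distance-based boost concrete, here is a standalone Python sketch. It uses the standard haversine formula with an assumed mean Earth radius and reports meters; the formula Elasticsearch uses internally may differ, and both function names and the scale_m parameter are hypothetical.

```python
import math

# Assumed mean Earth radius in meters; Elasticsearch may use another constant.
EARTH_RADIUS_M = 6371008.8


def arc_distance_m(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    # Clamp to 1.0 to guard against rounding error near antipodal points.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def distance_boost(score, distance_m, scale_m=1000.0):
    """Reduce a score as distance grows: halved at scale_m, unchanged at 0."""
    return score / (1.0 + distance_m / scale_m)
```

A script that boosts by distance would follow the same pattern: compute the distance from a reference point, then scale the score so that nearer documents rank higher.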

See also
