Using a Boolean query

Most people who use a search engine have at some time used the plus (+) and minus (-) syntax to include or exclude query terms. The Boolean query lets the user programmatically define which queries must match (must), must not match (must_not), may optionally match (should), or act as filters (filter).

This kind of query is one of the most important because it lets the user combine many of the simple queries and filters covered in this chapter into a single complex one.

Two main concepts are important in searches: query and filter. A query scores the matched results using an internal Lucene scoring algorithm; a filter matches results without scoring them. Because a filter doesn't need to compute the score, it is generally faster and can be cached.
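To make the query/filter distinction concrete, the following minimal Python sketch builds two request bodies around the same term clause: one places it in must (scored), the other in filter (not scored). The helper name term and the variable names are assumptions made for illustration; the field and value are taken from this recipe. Nothing is sent to Elasticsearch here.

```python
import json

def term(field, value):
    """Build a term clause for a single field (illustrative helper)."""
    return {"term": {field: value}}

# Query context: the clause takes part in computing each hit's score.
scored_body = {"query": {"bool": {"must": [term("parsedtext", "joe")]}}}

# Filter context: the same clause only decides which documents are kept.
filtered_body = {"query": {"bool": {"filter": [term("parsedtext", "joe")]}}}

print(json.dumps(scored_body, indent=2))
print(json.dumps(filtered_body, indent=2))
```

The two bodies differ only in the key that holds the clause, which is what decides whether the clause contributes to relevance.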

Getting ready

You will need an up-and-running Elasticsearch installation as used in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via a command line, you need to install curl for your operating system.

To correctly execute the following commands, you will need an index populated with the chapter_05/populate_query.sh script available in the online code.

How to do it...

To execute a Boolean query, we will perform the following steps:

  1. We execute a Boolean query from the command line as follows:
            curl -XPOST 'http://127.0.0.1:9200/test-index/test-type/_search?pretty' -d '{
                "query": {
                    "bool": {
                        "must": [{
                            "term": { "parsedtext": "joe" }
                        }],
                        "must_not": [{
                            "range": {
                                "position": { "from": 10, "to": 20 }
                            }
                        }],
                        "should": [
                            {
                                "term": { "uuid": "11111" }
                            },
                            {
                                "term": { "uuid": "22222" }
                            }
                        ],
                        "filter": [{
                            "term": { "parsedtext": "joe" }
                        }],
                        "minimum_number_should_match": 1,
                        "boost": 1.0
                    }
                }
            }'
    
  2. The result returned by Elasticsearch is similar to the previous recipes, but in this case it should return one record (id:1).
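The same request body can also be assembled programmatically. The sketch below is a minimal example that only builds and prints the JSON; build_bool_query is a hypothetical helper written for this illustration, not part of any Elasticsearch client, and the clause values are copied from the curl command above.

```python
import json

def build_bool_query(must=(), must_not=(), should=(), filter=(), **options):
    """Assemble a bool query body, omitting any clause list that is empty.

    Hypothetical helper: extra keyword arguments (for example boost) are
    copied into the bool object unchanged.
    """
    bool_body = {}
    for name, clauses in (("must", must), ("must_not", must_not),
                          ("should", should), ("filter", filter)):
        if clauses:
            bool_body[name] = list(clauses)
    bool_body.update(options)
    return {"query": {"bool": bool_body}}

body = build_bool_query(
    must=[{"term": {"parsedtext": "joe"}}],
    must_not=[{"range": {"position": {"from": 10, "to": 20}}}],
    should=[{"term": {"uuid": "11111"}}, {"term": {"uuid": "22222"}}],
    filter=[{"term": {"parsedtext": "joe"}}],
    minimum_number_should_match=1,
    boost=1.0,
)

# The printed JSON can be passed to curl with -d, as in step 1.
print(json.dumps(body, indent=2))
```

Building the body from lists keeps each simple query separate, so clauses can be added or removed without editing a long JSON string by hand.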

How it works...

The bool query is one of the most used queries because it lets the user compose a large query from many simpler ones. It is built from the following four parts, at least one of which should be provided:

  • must: A list of queries that must be satisfied. A document is returned only if every must query matches, so the list behaves like an AND over its subqueries.
  • must_not: A list of queries that must not match. A document that matches any of them is excluded, so the list behaves like a NOT applied to each subquery.
  • should: A list of queries that may match. The minimum number of them that must match is controlled by minimum_number_should_match. With no must or filter clauses, at least one should query has to match (the default of 1); alongside must or filter clauses in a scoring query, the should queries are optional unless this value is set.
  • filter: A list of queries to be used as the filter. They allow the user to filter out results without changing the score and relevance. The filter queries are faster than standard ones because they don't need to compute the score.
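The way the four parts combine can be sketched with a toy in-memory matcher. This is a simplified model written for illustration, not how Elasticsearch or Lucene evaluates queries: it supports only term and range clauses, ignores scoring, and runs against three hypothetical documents invented for the example.

```python
def clause_matches(clause, doc):
    """Evaluate a toy term or range clause against a dict document."""
    kind, spec = next(iter(clause.items()))
    field, cond = next(iter(spec.items()))
    value = doc.get(field)
    if kind == "term":
        # A list value stands in for the tokens of an analyzed field.
        return cond in value if isinstance(value, list) else value == cond
    if kind == "range":
        return value is not None and cond["from"] <= value <= cond["to"]
    raise ValueError("unsupported clause type: " + kind)

def bool_matches(bool_query, doc):
    """Return True if doc satisfies the must, must_not, should and filter parts."""
    must = bool_query.get("must", [])
    must_not = bool_query.get("must_not", [])
    should = bool_query.get("should", [])
    filters = bool_query.get("filter", [])
    if not all(clause_matches(c, doc) for c in must + filters):
        return False
    if any(clause_matches(c, doc) for c in must_not):
        return False
    if not should:
        return True
    # Assumed default: 1 when there are no must/filter clauses, otherwise 0.
    default_minimum = 0 if (must or filters) else 1
    minimum = bool_query.get("minimum_number_should_match", default_minimum)
    return sum(clause_matches(c, doc) for c in should) >= minimum

query = {
    "must": [{"term": {"parsedtext": "joe"}}],
    "must_not": [{"range": {"position": {"from": 10, "to": 20}}}],
    "should": [{"term": {"uuid": "11111"}}, {"term": {"uuid": "22222"}}],
    "filter": [{"term": {"parsedtext": "joe"}}],
    "minimum_number_should_match": 1,
}

# Hypothetical documents; the real index contents are not shown in this recipe.
docs = [
    {"uuid": "11111", "position": 1, "parsedtext": ["joe", "nice"]},
    {"uuid": "22222", "position": 15, "parsedtext": ["joe"]},
    {"uuid": "33333", "position": 3, "parsedtext": ["joe"]},
]

matched = [d["uuid"] for d in docs if bool_matches(query, d)]
print(matched)  # ['11111']
```

In this toy data set the second document is excluded by must_not (its position falls in the range) and the third fails the should requirement, so only the first document remains.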

Tip

The Boolean filter is much faster than a group of And/Or/Not queries because it is optimized for executing fast Boolean bitwise operations on document bitmap results.
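The idea of bitwise operations on document bitmaps can be illustrated with plain Python integers used as bitsets, where bit i is set when document i matches a clause. This is only a conceptual sketch under that assumption; it does not reproduce Lucene's actual data structures, and the document ids are made up.

```python
def bitmap(doc_ids):
    """Pack a collection of document ids into an int used as a bitset."""
    bits = 0
    for doc_id in doc_ids:
        bits |= 1 << doc_id
    return bits

def to_doc_ids(bits):
    """Unpack a bitset into a sorted list of document ids."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]

joe = bitmap([0, 1, 2, 5])   # documents matching the must/filter clause
in_range = bitmap([1, 4])    # documents matching the must_not clause
uuid_a = bitmap([0])         # documents matching the first should clause
uuid_b = bitmap([5, 6])      # documents matching the second should clause

# must AND NOT must_not AND (should_1 OR should_2), one machine-level
# operation per combination instead of a per-document loop.
result = joe & ~in_range & (uuid_a | uuid_b)
print(to_doc_ids(result))  # [0, 5]
```

Each Boolean combination is a single bitwise operation over all documents at once, which is the intuition behind the speed advantage described in the tip.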
