Using a term query

Searching or filtering for a particular term is a very common operation. Term queries work on exact value matches and are generally very fast.

A term query can be compared to the equality (=) query of the SQL world (for non-tokenized fields).

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via the command line, you need to install curl for your operating system.

To correctly execute the following commands, you need an index populated with the chapter_05/populate_query.sh script available in the online code.

How to do it...

To execute a term query, we will perform the following steps:

  1. We execute a term query from the command line:
            curl -XPOST 'http://127.0.0.1:9200/test-index/test-type/_search?pretty=true' -d '{
              "query": {
                "term": {
                  "uuid": "33333"
                }
              }
            }'
    
  2. The result returned by Elasticsearch, if everything is alright, should be as follows:
            {
              "took" : 58,
              "timed_out" : false,
              "_shards" : {
                "total" : 5,
                "successful" : 5,
                "failed" : 0
              },
              "hits" : {
                "total" : 1,
                "max_score" : 0.30685282,
                "hits" : [ {
                  "_index" : "test-index",
                  "_type" : "test-type",
                  "_id" : "3",
                  "_score" : 0.30685282,
                  "_source" : {
                    "position" : 3,
                    "parsedtext" : "Bill is not nice guy",
                    "name" : "Bill Clinton",
                    "uuid" : "33333"
                  }
                } ]
              }
            }
    
  3. To execute a term query as a filter, we need to wrap it in a bool query. The preceding term query will be executed in this way:
            curl -XPOST 'http://127.0.0.1:9200/test-index/test-type/_search?pretty=true' -d '{
              "query": {
                "bool": {
                  "filter": {
                    "term": {
                      "uuid": "33333"
                    }
                  }
                }
              }
            }'
    
  4. The result returned by Elasticsearch, if everything is alright, should be:
            {
              "took" : 46,
              "timed_out" : false,
              "_shards" : {
                "total" : 5,
                "successful" : 5,
                "failed" : 0
              },
              "hits" : {
                "total" : 1,
                "max_score" : 0.0,
                "hits" : [
                  {
                    "_index" : "test-index",
                    "_type" : "test-type",
                    "_id" : "3",
                    "_score" : 0.0,
                    "_source" : {
                      "hit" : 4,
                      "price" : 6.0,
                      "name" : "Bill Clinton",
                      "position" : 3,
                      "parsedtext" : "Bill is not nice guy",
                      "uuid" : "33333"
                    }
                  }
                ]
              }
            }
    

The result is a standard query result as we have seen in the Executing a Search recipe in Chapter 5, Search.

How it works...

Thanks to its inverted index, Lucene is one of the fastest engines at searching for a term/value in a field.

Every field that is indexed in Lucene is converted into a fast search structure appropriate to its type:

  • Text is split into tokens, if analyzed, or stored as a single token
  • Numeric fields are converted into their fastest binary representation
  • Date and datetime fields are converted into binary forms

In Elasticsearch, all these conversion steps are managed automatically: regardless of the field type, searching for a term is executed by Elasticsearch using the correct internal format for the field.
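Because these conversions are handled for you, a term query against a numeric field uses exactly the same syntax as one against a string field. The following is a minimal sketch against the sample index from the populate script (the position field appears in the results above), matching documents whose position is exactly 3:

    curl -XPOST 'http://127.0.0.1:9200/test-index/test-type/_search?pretty=true' -d '{
      "query": {
        "term": {
          "position": 3
        }
      }
    }'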

Internally, during the execution of a term query, all the documents matching the term are collected, and then they are sorted by score (the score depends on the Lucene similarity algorithm chosen; the default is BM25. For more details about Elasticsearch similarity algorithms, see https://www.elastic.co/guide/en/elasticsearch/reference/5.x/index-modules-similarity.html).

If we look at the results of the previous searches, the hit of the term query has a score of 0.30685282, while the filtered version has 0.0. With a sample this small, the time required for scoring is negligible, but if you have thousands or millions of documents, it takes much more time.

Tip

If the score is not important, prefer using the term query as a filter.
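Besides wrapping the term query in a bool filter, as shown in step 3, another common way to skip scoring is the constant_score query, which wraps a filter and assigns the same fixed score (1.0 by default) to every matching document. Here is a minimal sketch against the same sample index:

    curl -XPOST 'http://127.0.0.1:9200/test-index/test-type/_search?pretty=true' -d '{
      "query": {
        "constant_score": {
          "filter": {
            "term": {
              "uuid": "33333"
            }
          }
        }
      }
    }'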

The filter is preferred to the query when the score is not important. The typical scenarios are:

  • Filtering permissions
  • Filtering numerical values
  • Filtering ranges (see the sketch after this list)
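For example, a range filter follows the same bool/filter pattern as the term filter in step 3. The following sketch, assuming the position field of the sample index is mapped as a numeric type, filters documents whose position lies between 2 and 5 without computing any score:

    curl -XPOST 'http://127.0.0.1:9200/test-index/test-type/_search?pretty=true' -d '{
      "query": {
        "bool": {
          "filter": {
            "range": {
              "position": {
                "gte": 2,
                "lte": 5
              }
            }
          }
        }
      }
    }'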

Tip

In a bool query, the filter is applied first, narrowing down the number of documents to be matched against the query; then the query is applied.
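To see this in action, you can combine a scored query with a filter in the same bool query. In the following sketch, which reuses the fields of the sample documents, the term filter conceptually restricts the candidate documents, and only those are then scored against the match query:

    curl -XPOST 'http://127.0.0.1:9200/test-index/test-type/_search?pretty=true' -d '{
      "query": {
        "bool": {
          "must": {
            "match": {
              "parsedtext": "nice guy"
            }
          },
          "filter": {
            "term": {
              "uuid": "33333"
            }
          }
        }
      }
    }'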

There's more...

Matching a term is the basis of Lucene and Elasticsearch. To correctly use these queries, you need to pay attention to how the field is indexed.

As we saw in Chapter 3, Managing Mappings, the terms of an indexed field depend on the analyzer used to index it. To better understand this concept, the following table shows how the phrase Peter's house is big is represented by several analyzers:

| Mapping index      | Analyzer         | Tokens                               |
|--------------------|------------------|--------------------------------------|
| "index": false     | (No index)       | (No tokens)                          |
| "type": "keyword"  | KeywordAnalyzer  | ["Peter's house is big"]             |
| "type": "text"     | StandardAnalyzer | ["peter", "s", "house", "is", "big"] |

The common pitfalls in searching are related to misunderstanding the analyzer/mapping configuration.

KeywordAnalyzer, which is used as the default for non-tokenized fields, stores the string unchanged as a single token.

StandardAnalyzer, the default for fields of type "text", tokenizes on whitespace and punctuation, and converts every token to lowercase. The same analyzer used for indexing should be used to analyze the query (this is the default behavior).

In the preceding example, if the phrase is analyzed with StandardAnalyzer, you cannot search for the term "Peter", but rather for "peter", because StandardAnalyzer lowercases all terms.
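A quick way to check which tokens end up in the index is the _analyze API. The following sketch asks the standard analyzer to process our example phrase and returns the tokens that a term query would have to match (note the shell escaping of the apostrophe):

    curl -XGET 'http://127.0.0.1:9200/_analyze?pretty=true' -d '{
      "analyzer": "standard",
      "text": "Peter'\''s house is big"
    }'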

Tip

When the same field requires two or more search strategies, you need to use the fields property to index it with the different analyzers you need.
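As a minimal sketch, the mapping below (the index name myindex, the type mytype, and the sub-field name raw are hypothetical) indexes the same string both as analyzed text for full-text search and as an untouched keyword, so that exact term queries can be run on name.raw:

    curl -XPUT 'http://127.0.0.1:9200/myindex' -d '{
      "mappings": {
        "mytype": {
          "properties": {
            "name": {
              "type": "text",
              "fields": {
                "raw": {
                  "type": "keyword"
                }
              }
            }
          }
        }
      }
    }'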
