Searching or filtering for a particular term is very frequent. Term queries work with exact value matches and are generally very fast.
Term queries can be compared to the equals ("=") operator in SQL (for non-tokenized fields).
You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.
To execute curl via the command line, you need to install curl for your operating system.
To correctly execute the following commands, you need an index populated with the chapter_05/populate_query.sh script available in the online code.
To execute a term query, we will perform the following steps:
curl -XPOST 'http://127.0.0.1:9200/test-index/test-type/_search?pretty=true' -d '{ "query": { "term": { "uuid": "33333" } } }'
{ "took" : 58, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 1, "max_score" : 0.30685282, "hits" : [ { "_index" : "test-index", "_type" : "test-type", "_id" : "3", "_score" : 0.30685282, "_source" : {"position": 3, "parsedtext": "Bill is not nice guy", "name": "Bill Clinton", "uuid": "33333"} } ] } }
curl -XPOST 'http://127.0.0.1:9200/test-index/test-type/_search?pretty=true' -d '{ "query": { "bool": { "filter": { "term": { "uuid": "33333" } } } } }'
{ "took" : 46, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 1, "max_score" : 0.0, "hits" : [ { "_index" : "test-index", "_type" : "test-type", "_id" : "3", "_score" : 0.0, "_source" : { "hit" : 4, "price" : 6.0, "name" : "Bill Clinton", "position" : 3, "parsedtext" : "Bill is not nice guy", "uuid" : "33333" } } ] } }
The result is a standard query result as we have seen in the Executing a Search recipe in Chapter 5, Search.
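The two request bodies used in the preceding steps differ only in how the term clause is wrapped. As a minimal sketch, the following Python helpers build both bodies. The function names term_query_body and term_filter_body are hypothetical, and the snippet only prints JSON that could be passed to the -d option of curl; it does not contact Elasticsearch.

```python
import json


def term_query_body(field, value):
    """Scored term query, mirroring the first request body in the steps."""
    return {"query": {"term": {field: value}}}


def term_filter_body(field, value):
    """The same term clause wrapped in bool/filter, as in the second request."""
    return {"query": {"bool": {"filter": {"term": {field: value}}}}}


# Print both bodies as JSON strings suitable for curl's -d option.
print(json.dumps(term_query_body("uuid", "33333")))
print(json.dumps(term_filter_body("uuid", "33333")))
```

Keeping the term clause identical in both helpers makes it easy to switch between the scored and the filtered form of the same search.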
Lucene, due to its inverted index, is one of the fastest engines at searching for a term/value in a field.
Every field that is indexed in Lucene is converted into a fast search structure suited to its particular type.
In Elasticsearch, all these conversion steps are managed automatically. A search for a term, regardless of the type of its value, is handled by Elasticsearch using the correct format for the field.
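To illustrate why looking up a term is fast, here is a toy inverted index in Python. It is only a sketch of the general idea (a map from each exact value to the documents containing it), not Lucene's actual data structures. The helper names are hypothetical, and documents 1 and 2 are invented for illustration; only document 3 mirrors the response shown earlier.

```python
from collections import defaultdict


def build_inverted_index(docs, field):
    """Map each exact field value (term) to the set of ids of documents holding it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        if field in doc:
            index[doc[field]].add(doc_id)
    return index


def term_lookup(index, term):
    """Return the sorted ids of documents whose field equals the term exactly."""
    return sorted(index.get(term, set()))


docs = {
    "1": {"uuid": "11111"},  # hypothetical document
    "2": {"uuid": "22222"},  # hypothetical document
    "3": {"uuid": "33333"},  # mirrors the hit returned in the steps above
}

uuid_index = build_inverted_index(docs, "uuid")
print(term_lookup(uuid_index, "33333"))
print(term_lookup(uuid_index, "99999"))
```

The lookup is a single dictionary access, so its cost does not grow with the number of documents that do not contain the term.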
Internally, during the execution of a term query, all the documents matching the term are collected and then sorted by score. The scoring depends on the Lucene similarity algorithm chosen; the default is BM25. For more details about Elasticsearch similarity algorithms, see https://www.elastic.co/guide/en/elasticsearch/reference/5.x/index-modules-similarity.html.
If we look at the results of the previous searches, the hit for the term query has a score of 0.30685282, while the hit for the filter has 0.0. When the sample is very small, the time required for scoring is negligible, but with thousands or millions of documents it takes much more time.
The filter is preferred to the query when the score is not important. The typical scenarios are:
Matching a term is the basis of Lucene and Elasticsearch. To correctly use these queries, you need to pay attention to how the field is indexed.
As we saw in Chapter 3, Managing Mappings, the terms of an indexed field depend on the analyzer used to index it. To better understand this concept, the following table shows how a phrase is tokenized by several analyzers. For the standard string analyzers, given the phrase Peter's house is big, the results will be similar to the following table:
| Mapping index | Analyzer | Tokens |
| | (No index) | (No tokens) |
| | | |
| | | |
The common pitfalls in searching are related to misunderstanding the analyzer/mapping configuration.
KeywordAnalyzer, which is used as the default for non-tokenized fields, saves the string unchanged as a single token.
StandardAnalyzer, the default for type="text" fields, tokenizes on whitespace and punctuation; every token is converted to lowercase. You should analyze the query with the same analyzer that was used for indexing (the default setting).
In the preceding example, if the phrase is analyzed with StandardAnalyzer, you cannot search for the term "Peter", but rather for "peter", because StandardAnalyzer lowercases the terms.
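The effect of the analyzer on term matching can be sketched in a few lines of Python. The two functions below are simplified stand-ins written for this example, not the Lucene implementations: keyword_analyze keeps the string as a single token, and simple_standard_analyze splits on anything that is not a letter or digit and lowercases the result, following the description above. Real analyzers apply more elaborate rules (apostrophes, for instance, may be handled differently), so the actual tokens for a field are best checked with the Elasticsearch _analyze API.

```python
import re


def keyword_analyze(text):
    """Simplified keyword behavior: the whole string is one unchanged token."""
    return [text]


def simple_standard_analyze(text):
    """Simplified standard-like behavior: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def term_matches(term, tokens):
    """A term query matches only if the term equals an indexed token exactly."""
    return term in tokens


phrase = "Peter's house is big"
keyword_tokens = keyword_analyze(phrase)
standard_tokens = simple_standard_analyze(phrase)

print(term_matches("Peter", standard_tokens))  # capitalized term vs lowercased tokens
print(term_matches("peter", standard_tokens))  # lowercase term
print(term_matches(phrase, keyword_tokens))    # whole string vs single keyword token
print(term_matches("Peter", keyword_tokens))   # partial string vs single keyword token
```

Because a term query does not analyze its input, the term has to be written in exactly the form in which the tokens were stored at index time.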