Elasticsearch is an open-source search engine built on top of Apache Lucene™, a full-text search-engine library. Lucene is arguably the most advanced, high-performance, and fully featured search engine library in existence today—both open source and proprietary.
But Lucene is just a library. To leverage its power, you need to work in Java and to integrate Lucene directly with your application. Worse, you will likely require a degree in information retrieval to understand how it works. Lucene is very complex.
Elasticsearch is also written in Java and uses Lucene internally for all of its indexing and searching, but it aims to make full-text search easy by hiding the complexities of Lucene behind a simple, coherent, RESTful API.
However, Elasticsearch is much more than just Lucene and much more than “just” full-text search. It can also be described as follows:
A distributed real-time document store where every field is indexed and searchable
A distributed search engine with real-time analytics
Capable of scaling to hundreds of servers and petabytes of structured and unstructured data
And it packages up all this functionality into a standalone server that your application can talk to via a simple RESTful API, using a web client from your favorite programming language, or even from the command line.
It is easy to get started with Elasticsearch. It ships with sensible defaults and hides complicated search theory away from beginners. It just works, right out of the box. With minimal understanding, you can soon become productive.
Elasticsearch can be downloaded, used, and modified free of charge. It is available under the Apache 2 license, one of the most flexible open source licenses available.
As your knowledge grows, you can leverage more of Elasticsearch’s advanced features. The entire engine is configurable and flexible. Pick and choose from the advanced features to tailor Elasticsearch to your problem domain.
The easiest way to understand what Elasticsearch can do for you is to play with it, so let’s get started!
The only requirement for installing Elasticsearch is a recent version of Java. Preferably, you should install the latest version of the official Java from www.java.com.
You can download the latest version of Elasticsearch from elasticsearch.org/download.
curl -L -O http://download.elasticsearch.org/PATH/TO/VERSION.zip
unzip elasticsearch-$VERSION.zip
cd elasticsearch-$VERSION
Fill in the URL for the latest version available on elasticsearch.org/download.
When installing Elasticsearch in production, you can use the method described previously, or the Debian or RPM packages provided on the downloads page. You can also use the officially supported Puppet module or Chef cookbook.
Marvel is a management and monitoring tool for Elasticsearch, which is free for development use. It comes with an interactive console called Sense, which makes it easy to talk to Elasticsearch directly from your browser.
Many of the code examples in the online version of this book include a View in Sense link. When clicked, it will open up a working example of the code in the Sense console. You do not have to install Marvel, but it will make this book much more interactive by allowing you to experiment with the code samples on your local Elasticsearch cluster.
Marvel is available as a plug-in. To download and install it, run this command in the Elasticsearch directory:
./bin/plugin -i elasticsearch/marvel/latest
You probably don’t want Marvel to monitor your local cluster, so you can disable data collection with this command:
echo 'marvel.agent.enabled: false' >> ./config/elasticsearch.yml
Elasticsearch is now ready to run. You can start it up in the foreground with this:
./bin/elasticsearch
Add -d if you want to run it in the background as a daemon.
Test it out by opening another terminal window and running the following:
curl 'http://localhost:9200/?pretty'
You should see a response like this:
{
    "status": 200,
    "name": "Shrunken Bones",
    "version": {
        "number": "1.4.0",
        "lucene_version": "4.10"
    },
    "tagline": "You Know, for Search"
}
This means that your Elasticsearch cluster is up and running, and we can start experimenting with it.
A node is a running instance of Elasticsearch. A cluster is a group of nodes with the same cluster.name that are working together to share data and to provide failover and scale, although a single node can form a cluster all by itself.
You should change the default cluster.name to something appropriate to you, like your own name, to stop your nodes from trying to join another cluster on the same network with the same name!
You can do this by editing the elasticsearch.yml file in the config/ directory and then restarting Elasticsearch. When Elasticsearch is running in the foreground, you can stop it by pressing Ctrl-C; otherwise, you can shut it down with the shutdown API:
curl -XPOST 'http://localhost:9200/_shutdown'
If you installed the Marvel management and monitoring tool, you can view it in a web browser by visiting http://localhost:9200/_plugin/marvel/.
You can reach the Sense developer console either by clicking the “Marvel dashboards” drop-down in Marvel, or by visiting http://localhost:9200/_plugin/marvel/sense/.
How you talk to Elasticsearch depends on whether you are using Java.
If you are using Java, Elasticsearch comes with two built-in clients that you can use in your code:
The node client joins a local cluster as a non-data node. In other words, it doesn’t hold any data itself, but it knows what data lives on which node in the cluster, and can forward requests directly to the correct node.
The lighter-weight transport client can be used to send requests to a remote cluster. It doesn’t join the cluster itself, but simply forwards requests to a node in the cluster.
Both Java clients talk to the cluster over port 9300, using the native Elasticsearch transport protocol. The nodes in the cluster also communicate with each other over port 9300. If this port is not open, your nodes will not be able to form a cluster.
The Java client must be from the same version of Elasticsearch as the nodes; otherwise, they may not be able to understand each other.
More information about the Java clients can be found in the Java API section of the Guide.
All other languages can communicate with Elasticsearch over port 9200 using a RESTful API, accessible with your favorite web client. In fact, as you have seen, you can even talk to Elasticsearch from the command line by using the curl command.
A request to Elasticsearch consists of the same parts as any HTTP request:
curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'
The parts marked with < > above are:
VERB
The appropriate HTTP method or verb: GET, POST, PUT, HEAD, or DELETE.
PROTOCOL
Either http or https (if you have an https proxy in front of Elasticsearch).
HOST
The hostname of any node in your Elasticsearch cluster, or localhost for a node on your local machine.
PORT
The port running the Elasticsearch HTTP service, which defaults to 9200.
PATH
The API endpoint being requested (for example, _count).
QUERY_STRING
Any optional query-string parameters (for example, ?pretty will pretty-print the JSON response to make it easier to read).
BODY
A JSON-encoded request body (if the request needs one).
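The same parts can also be assembled in code. The following is a minimal sketch in Python using only the standard library. The helper name build_request and its default arguments are assumptions made for illustration, and the request is only constructed, not sent, so no running node is required.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_request(verb, path, body=None, query=None,
                  protocol="http", host="localhost", port=9200):
    """Assemble (but do not send) an HTTP request for Elasticsearch."""
    url = f"{protocol}://{host}:{port}/{path.lstrip('/')}"
    if query:
        url += "?" + urlencode(query)
    headers = {}
    data = None
    # A JSON-encoded body is attached only when the request needs one.
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(url, data=data, headers=headers, method=verb)


req = build_request("GET", "_count",
                    body={"query": {"match_all": {}}},
                    query={"pretty": "true"})
print(req.get_method(), req.full_url)
```

Passing the resulting object to urllib.request.urlopen would send it to whichever node the URL points at.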
For instance, to count the number of documents in the cluster, we could use this:
curl -XGET 'http://localhost:9200/_count?pretty' -d '
{
    "query": {
        "match_all": {}
    }
}
'
Elasticsearch returns an HTTP status code like 200 OK and (except for HEAD requests) a JSON-encoded response body. The preceding curl request would respond with a JSON body like the following:
{
    "count": 0,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    }
}
We don’t see the HTTP headers in the response because we didn’t ask curl to display them. To see the headers, use the curl command with the -i switch:
curl -i -XGET 'localhost:9200/'
For the rest of the book, we will show these curl examples using a shorthand format that leaves out all the bits that are the same in every request, like the hostname and port, and the curl command itself. Instead of showing a full request like
curl -XGET 'localhost:9200/_count?pretty' -d '
{
    "query": {
        "match_all": {}
    }
}'
we will show it in this shorthand format:
GET /_count
{
    "query": {
        "match_all": {}
    }
}
In fact, this is the same format that is used by the Sense console that we installed with Marvel. If you are reading the online version of this book, you can open and run this code example in Sense by clicking the View in Sense link above.
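To make the relationship between the two formats concrete, here is a rough sketch in Python that expands a shorthand request back into a curl command line. The function name shorthand_to_curl and the default host are assumptions for illustration; the quoting is left to the standard shlex module.

```python
import shlex


def shorthand_to_curl(shorthand, host="localhost:9200"):
    """Expand 'VERB /path' plus an optional JSON body into a curl command."""
    first_line, _, body = shorthand.strip().partition("\n")
    verb, path = first_line.split(maxsplit=1)
    command = ["curl", f"-X{verb}", f"{host}/{path.lstrip('/')}"]
    if body.strip():
        command += ["-d", body.strip()]
    # shlex.quote adds shell quoting only where an argument needs it.
    return " ".join(shlex.quote(part) for part in command)


print(shorthand_to_curl('GET /_count\n{"query": {"match_all": {}}}'))
```

The conversion is purely textual: the verb and path come from the first line, and everything after it is passed as the request body.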
Objects in an application are seldom just a simple list of keys and values. More often than not, they are complex data structures that may contain dates, geo locations, other objects, or arrays of values.
Sooner or later you’re going to want to store these objects in a database. Trying to do this with the rows and columns of a relational database is the equivalent of trying to squeeze your rich, expressive objects into a very big spreadsheet: you have to flatten the object to fit the table schema—usually one field per column—and then have to reconstruct it every time you retrieve it.
Elasticsearch is document oriented, meaning that it stores entire objects or documents. It not only stores them, but also indexes the contents of each document in order to make them searchable. In Elasticsearch, you index, search, sort, and filter documents—not rows of columnar data. This is a fundamentally different way of thinking about data and is one of the reasons Elasticsearch can perform complex full-text search.
Elasticsearch uses JavaScript Object Notation, or JSON, as the serialization format for documents. JSON serialization is supported by most programming languages, and has become the standard format used by the NoSQL movement. It is simple, concise, and easy to read.
Consider this JSON document, which represents a user object:
{
    "email": "[email protected]",
    "first_name": "John",
    "last_name": "Smith",
    "info": {
        "bio": "Eco-warrior and defender of the weak",
        "age": 25,
        "interests": [ "dolphins", "whales" ]
    },
    "join_date": "2014/05/01"
}
Although the original user object was complex, the structure and meaning of the object have been retained in the JSON version. Converting an object to JSON for indexing in Elasticsearch is much simpler than the equivalent process for a flat table structure.
Almost all languages have modules that will convert arbitrary data structures or objects into JSON for you, but the details are specific to each language. Look for modules that handle JSON serialization or marshalling. The official Elasticsearch clients all handle conversion to and from JSON for you automatically.
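As one illustration of such a module, the sketch below uses Python's standard json and dataclasses modules to serialize a nested user object similar to the one above. The class names User and Info are hypothetical, and the email field is left out for brevity.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Info:
    bio: str
    age: int
    interests: list = field(default_factory=list)


@dataclass
class User:
    first_name: str
    last_name: str
    info: Info
    join_date: str


user = User("John", "Smith",
            Info("Eco-warrior and defender of the weak", 25,
                 ["dolphins", "whales"]),
            "2014/05/01")

# asdict() walks nested dataclasses, so the structure survives intact.
document = json.dumps(asdict(user), indent=2)
print(document)
```

The nested info object and the interests array come through unchanged, with no flattening step.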
To give you a feel for what is possible in Elasticsearch and how easy it is to use, let’s start by walking through a simple tutorial that covers basic concepts such as indexing, search, and aggregations.
We’ll introduce some new terminology and basic concepts along the way, but it is OK if you don’t understand everything immediately. We’ll cover all the concepts introduced here in much greater depth throughout the rest of the book.
So, sit back and enjoy a whirlwind tour of what Elasticsearch is capable of.
We happen to work for Megacorp, and as part of HR’s new “We love our drones!” initiative, we have been tasked with creating an employee directory. The directory is supposed to foster employer empathy and real-time, synergistic, dynamic collaboration, so it has a few business requirements:
Enable data to contain multivalue tags, numbers, and full text.
Retrieve the full details of any employee.
Allow structured search, such as finding employees over the age of 30.
Allow simple full-text search and more-complex phrase searches.
Return highlighted search snippets from the text in the matching documents.
Enable management to build analytic dashboards over the data.
The first order of business is storing employee data. This will take the form of an employee document: a single document represents a single employee. The act of storing data in Elasticsearch is called indexing, but before we can index a document, we need to decide where to store it.
In Elasticsearch, a document belongs to a type, and those types live inside an index. You can draw some (rough) parallels to a traditional relational database:
Relational DB ⇒ Databases ⇒ Tables ⇒ Rows      ⇒ Columns
Elasticsearch ⇒ Indices   ⇒ Types  ⇒ Documents ⇒ Fields
An Elasticsearch cluster can contain multiple indices (databases), which in turn contain multiple types (tables). These types hold multiple documents (rows), and each document has multiple fields (columns).
So for our employee directory, we are going to do the following:
Index a document per employee, which contains all the details of a single employee.
Each document will be of type employee.
That type will live in the megacorp index.
That index will reside within our Elasticsearch cluster.
In practice, this is easy (even though it looks like a lot of steps). We can perform all of those actions in a single command:
PUT /megacorp/employee/1
{
    "first_name": "John",
    "last_name": "Smith",
    "age": 25,
    "about": "I love to go rock climbing",
    "interests": [ "sports", "music" ]
}
Notice that the path /megacorp/employee/1 contains three pieces of information:
megacorp
The index name
employee
The type name
1
The ID of this particular employee
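The three pieces of the path can be modelled as a small value object. This is only an illustrative sketch in Python; the class DocumentAddress and the helper parse_address are hypothetical names, not part of any Elasticsearch client.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentAddress:
    index: str
    doc_type: str
    doc_id: str

    def path(self):
        """Render the address in the /index/type/id form used in requests."""
        return f"/{self.index}/{self.doc_type}/{self.doc_id}"


def parse_address(path):
    """Split '/index/type/id' into its three named pieces."""
    index, doc_type, doc_id = path.strip("/").split("/")
    return DocumentAddress(index, doc_type, doc_id)


address = parse_address("/megacorp/employee/1")
print(address.index, address.doc_type, address.doc_id)
```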
The request body—the JSON document—contains all the information about this employee. His name is John Smith, he’s 25, and enjoys rock climbing.
Simple! There was no need to perform any administrative tasks first, like creating an index or specifying the type of data that each field contains. We could just index a document directly. Elasticsearch ships with defaults for everything, so all the necessary administration tasks were taken care of in the background, using default values.
Before moving on, let’s add a few more employees to the directory:
PUT /megacorp/employee/2
{
    "first_name": "Jane",
    "last_name": "Smith",
    "age": 32,
    "about": "I like to collect rock albums",
    "interests": [ "music" ]
}

PUT /megacorp/employee/3
{
    "first_name": "Douglas",
    "last_name": "Fir",
    "age": 35,
    "about": "I like to build cabinets",
    "interests": [ "forestry" ]
}
Now that we have some data stored in Elasticsearch, we can get to work on the business requirements for this application. The first requirement is the ability to retrieve individual employee data.
This is easy in Elasticsearch. We simply execute an HTTP GET request and specify the address of the document: the index, type, and ID. Using those three pieces of information, we can return the original JSON document:
GET /megacorp/employee/1
And the response contains some metadata about the document, and John Smith’s original JSON document as the _source field:
{
    "_index": "megacorp",
    "_type": "employee",
    "_id": "1",
    "_version": 1,
    "found": true,
    "_source": {
        "first_name": "John",
        "last_name": "Smith",
        "age": 25,
        "about": "I love to go rock climbing",
        "interests": [ "sports", "music" ]
    }
}
A GET is fairly simple: you get back the document that you ask for. Let’s try something a little more advanced, like a simple search!
The first search we will try is the simplest search possible. We will search for all employees, with this request:
GET /megacorp/employee/_search
You can see that we’re still using index megacorp and type employee, but instead of specifying a document ID, we now use the _search endpoint. The response includes all three of our documents in the hits array. By default, a search will return the top 10 results.
{
    "took": 6,
    "timed_out": false,
    "_shards": { ... },
    "hits": {
        "total": 3,
        "max_score": 1,
        "hits": [
            {
                "_index": "megacorp",
                "_type": "employee",
                "_id": "3",
                "_score": 1,
                "_source": {
                    "first_name": "Douglas",
                    "last_name": "Fir",
                    "age": 35,
                    "about": "I like to build cabinets",
                    "interests": [ "forestry" ]
                }
            },
            {
                "_index": "megacorp",
                "_type": "employee",
                "_id": "1",
                "_score": 1,
                "_source": {
                    "first_name": "John",
                    "last_name": "Smith",
                    "age": 25,
                    "about": "I love to go rock climbing",
                    "interests": [ "sports", "music" ]
                }
            },
            {
                "_index": "megacorp",
                "_type": "employee",
                "_id": "2",
                "_score": 1,
                "_source": {
                    "first_name": "Jane",
                    "last_name": "Smith",
                    "age": 32,
                    "about": "I like to collect rock albums",
                    "interests": [ "music" ]
                }
            }
        ]
    }
}
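Once a response like this has been parsed from JSON, the original documents can be pulled out of the nested hits array. The sketch below is a small Python illustration; the helper name sources is hypothetical, and the response dictionary is an abbreviated copy of the one shown above.

```python
def sources(search_response):
    """Return the original documents from a search response, in hit order."""
    return [hit["_source"] for hit in search_response["hits"]["hits"]]


# Abbreviated copy of the response shown above.
response = {
    "hits": {
        "total": 3,
        "hits": [
            {"_id": "3", "_score": 1, "_source": {"first_name": "Douglas"}},
            {"_id": "1", "_score": 1, "_source": {"first_name": "John"}},
            {"_id": "2", "_score": 1, "_source": {"first_name": "Jane"}},
        ],
    }
}
print([doc["first_name"] for doc in sources(response)])
```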
Next, let’s try searching for employees who have “Smith” in their last name. To do this, we’ll use a lightweight search method that is easy to use from the command line. This method is often referred to as a query-string search, since we pass the search as a URL query-string parameter:
GET /megacorp/employee/_search?q=last_name:Smith
We use the same _search endpoint in the path, and we add the query itself in the q= parameter. The results that come back show all Smiths:
{
    ...
    "hits": {
        "total": 2,
        "max_score": 0.30685282,
        "hits": [
            {
                ...
                "_source": {
                    "first_name": "John",
                    "last_name": "Smith",
                    "age": 25,
                    "about": "I love to go rock climbing",
                    "interests": [ "sports", "music" ]
                }
            },
            {
                ...
                "_source": {
                    "first_name": "Jane",
                    "last_name": "Smith",
                    "age": 32,
                    "about": "I like to collect rock albums",
                    "interests": [ "music" ]
                }
            }
        ]
    }
}
Query-string search is handy for ad hoc searches from the command line, but it has its limitations (see “Search Lite”). Elasticsearch provides a rich, flexible query language called the query DSL, which allows us to build much more complicated, robust queries.
The domain-specific language (DSL) is specified using a JSON request body. We can represent the previous search for all Smiths like so:
GET /megacorp/employee/_search
{
    "query": {
        "match": {
            "last_name": "Smith"
        }
    }
}
This will return the same results as the previous query. You can see that a number of things have changed. For one, we are no longer using query-string parameters, but instead a request body. This request body is built with JSON, and uses a match query (one of several types of queries, which we will learn about later).
Let’s make the search a little more complicated. We still want to find all employees with a last name of Smith, but we want only employees who are older than 30. Our query will change a little to accommodate a filter, which allows us to execute structured searches efficiently:
GET /megacorp/employee/_search
{
    "query": {
        "filtered": {
            "filter": {
                "range": {
                    "age": { "gt": 30 }
                }
            },
            "query": {
                "match": {
                    "last_name": "smith"
                }
            }
        }
    }
}
The filter portion of the query is a range filter, which will find all ages older than 30; gt stands for greater than.
The query portion is the same match query that we used before.
Don’t worry about the syntax too much for now; we will cover it in great detail later. Just recognize that we’ve added a filter that performs a range search, and reused the same match query as before. Now our results show only one employee who happens to be 32 and is named Jane Smith:
{
    ...
    "hits": {
        "total": 1,
        "max_score": 0.30685282,
        "hits": [
            {
                ...
                "_source": {
                    "first_name": "Jane",
                    "last_name": "Smith",
                    "age": 32,
                    "about": "I like to collect rock albums",
                    "interests": [ "music" ]
                }
            }
        ]
    }
}
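Because the query DSL is plain JSON, a request body like the filtered query above can be built from ordinary data structures. The following Python sketch shows one way to do that; the function name smiths_older_than is hypothetical, and the body it produces simply mirrors the structure shown earlier.

```python
import json


def smiths_older_than(age):
    """Build the filtered-query body: a match query plus a range filter."""
    return {
        "query": {
            "filtered": {
                "filter": {"range": {"age": {"gt": age}}},
                "query": {"match": {"last_name": "smith"}},
            }
        }
    }


print(json.dumps(smiths_older_than(30), indent=2))
```

Keeping the age as a parameter makes it easy to reuse the same structure with a different threshold.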
The searches so far have been simple: single names, filtered by age. Let’s try a more advanced, full-text search—a task that traditional databases would really struggle with.
We are going to search for all employees who enjoy rock climbing:
GET /megacorp/employee/_search
{
    "query": {
        "match": {
            "about": "rock climbing"
        }
    }
}
You can see that we use the same match query as before to search the about field for “rock climbing.” We get back two matching documents:
{
    ...
    "hits": {
        "total": 2,
        "max_score": 0.16273327,
        "hits": [
            {
                ...
                "_score": 0.16273327,
                "_source": {
                    "first_name": "John",
                    "last_name": "Smith",
                    "age": 25,
                    "about": "I love to go rock climbing",
                    "interests": [ "sports", "music" ]
                }
            },
            {
                ...
                "_score": 0.016878016,
                "_source": {
                    "first_name": "Jane",
                    "last_name": "Smith",
                    "age": 32,
                    "about": "I like to collect rock albums",
                    "interests": [ "music" ]
                }
            }
        ]
    }
}
By default, Elasticsearch sorts matching results by their relevance score, that is, by how well each document matches the query. The first and highest-scoring result is obvious: John Smith’s about field clearly says “rock climbing” in it.
But why did Jane Smith come back as a result? Her document was returned because the word “rock” was mentioned in her about field. Because only “rock” was mentioned, and not “climbing,” her _score is lower than John’s.
This is a good example of how Elasticsearch can search within full-text fields and return the most relevant results first. This concept of relevance is important to Elasticsearch, and is a concept that is completely foreign to traditional relational databases, in which a record either matches or it doesn’t.
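The intuition behind this ordering can be shown with a deliberately crude stand-in for relevance: counting how many of the query words appear in each about field. This Python sketch is not how Elasticsearch computes _score (the real scores shown above come from Lucene); the function name overlap and the whitespace tokenization are assumptions made for illustration only.

```python
ABOUT = {
    "John": "I love to go rock climbing",
    "Jane": "I like to collect rock albums",
    "Douglas": "I like to build cabinets",
}


def overlap(query, text):
    """Count how many distinct query words appear in the text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


scores = {name: overlap("rock climbing", text) for name, text in ABOUT.items()}
# Keep only documents that matched at least one word, best match first.
ranked = sorted((name for name in scores if scores[name] > 0),
                key=lambda name: scores[name], reverse=True)
print(ranked)
```

Even this toy measure puts John ahead of Jane and drops Douglas, which matches the order of the results above.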
Finding individual words in a field is all well and good, but sometimes you want to match exact sequences of words or phrases. For instance, we could perform a query that will match only employee records that contain both “rock” and “climbing” and that display the words next to each other in the phrase “rock climbing.”
To do this, we use a slight variation of the match query called the match_phrase query:
GET /megacorp/employee/_search
{
    "query": {
        "match_phrase": {
            "about": "rock climbing"
        }
    }
}
This, to no surprise, returns only John Smith’s document:
{
    ...
    "hits": {
        "total": 1,
        "max_score": 0.23013961,
        "hits": [
            {
                ...
                "_score": 0.23013961,
                "_source": {
                    "first_name": "John",
                    "last_name": "Smith",
                    "age": 25,
                    "about": "I love to go rock climbing",
                    "interests": [ "sports", "music" ]
                }
            }
        ]
    }
}
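The difference between matching individual words and matching a phrase can be sketched in a few lines of Python. This is a simplified illustration of the idea (the words must appear consecutively and in order), not Elasticsearch's implementation; the function name contains_phrase and the whitespace tokenization are assumptions.

```python
def contains_phrase(text, phrase):
    """True if the words of `phrase` occur consecutively, in order, in `text`."""
    words = text.lower().split()
    wanted = phrase.lower().split()
    # Slide a window of len(wanted) words across the text.
    return any(words[i:i + len(wanted)] == wanted
               for i in range(len(words) - len(wanted) + 1))


print(contains_phrase("I love to go rock climbing", "rock climbing"))
print(contains_phrase("I like to collect rock albums", "rock climbing"))
```

John's text contains the two words side by side, while Jane's contains only one of them, so only John's document qualifies.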
Many applications like to highlight snippets of text from each search result so the user can see why the document matched the query. Retrieving highlighted fragments is easy in Elasticsearch.
Let’s rerun our previous query, but add a new highlight parameter:
GET /megacorp/employee/_search
{
    "query": {
        "match_phrase": {
            "about": "rock climbing"
        }
    },
    "highlight": {
        "fields": {
            "about": {}
        }
    }
}
When we run this query, the same hit is returned as before, but now we get a new section in the response called highlight. This contains a snippet of text from the about field with the matching words wrapped in <em></em> HTML tags:
{
    ...
    "hits": {
        "total": 1,
        "max_score": 0.23013961,
        "hits": [
            {
                ...
                "_score": 0.23013961,
                "_source": {
                    "first_name": "John",
                    "last_name": "Smith",
                    "age": 25,
                    "about": "I love to go rock climbing",
                    "interests": [ "sports", "music" ]
                },
                "highlight": {
                    "about": [
                        "I love to go <em>rock</em> <em>climbing</em>"
                    ]
                }
            }
        ]
    }
}
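An application that receives this response might want to know which words were highlighted. The sketch below, in Python, reads the highlight section of a single hit; the helper name highlighted_terms is hypothetical, and it assumes the default <em> tags shown above.

```python
import re


def highlighted_terms(hit, field):
    """List the words wrapped in <em> tags in a hit's highlight snippets."""
    snippets = hit.get("highlight", {}).get(field, [])
    return [term
            for snippet in snippets
            for term in re.findall(r"<em>(.*?)</em>", snippet)]


hit = {"highlight": {"about": ["I love to go <em>rock</em> <em>climbing</em>"]}}
print(highlighted_terms(hit, "about"))
```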
You can read more about the highlighting of search snippets in the highlighting reference documentation.
Finally, we come to our last business requirement: allow managers to run analytics over the employee directory. Elasticsearch has functionality called aggregations, which allows you to generate sophisticated analytics over your data. It is similar to GROUP BY in SQL, but much more powerful.
For example, let’s find the most popular interests enjoyed by our employees:
GET /megacorp/employee/_search
{
    "aggs": {
        "all_interests": {
            "terms": { "field": "interests" }
        }
    }
}
Ignore the syntax for now and just look at the results:
{
    ...
    "hits": { ... },
    "aggregations": {
        "all_interests": {
            "buckets": [
                {
                    "key": "music",
                    "doc_count": 2
                },
                {
                    "key": "forestry",
                    "doc_count": 1
                },
                {
                    "key": "sports",
                    "doc_count": 1
                }
            ]
        }
    }
}
We can see that two employees are interested in music, one in forestry, and one in sports. These aggregations are not precalculated; they are generated on the fly from the documents that match the current query. If we want to know the popular interests of people called Smith, we can just add the appropriate query into the mix:
GET /megacorp/employee/_search
{
    "query": {
        "match": {
            "last_name": "smith"
        }
    },
    "aggs": {
        "all_interests": {
            "terms": {
                "field": "interests"
            }
        }
    }
}
The all_interests aggregation has changed to include only documents matching our query:
...
"all_interests": {
    "buckets": [
        {
            "key": "music",
            "doc_count": 2
        },
        {
            "key": "sports",
            "doc_count": 1
        }
    ]
}
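To see what these buckets represent, the same counts can be reproduced locally over the three employee documents. This Python sketch only mimics the result of the terms aggregation, with and without the last-name restriction; it says nothing about how Elasticsearch computes it, and the function name interest_counts is hypothetical.

```python
from collections import Counter

EMPLOYEES = [
    {"last_name": "Smith", "age": 25, "interests": ["sports", "music"]},
    {"last_name": "Smith", "age": 32, "interests": ["music"]},
    {"last_name": "Fir", "age": 35, "interests": ["forestry"]},
]


def interest_counts(employees, last_name=None):
    """Count documents per interest, optionally restricted to one last name."""
    matching = [e for e in employees
                if last_name is None or e["last_name"] == last_name]
    return Counter(i for e in matching for i in e["interests"])


print(interest_counts(EMPLOYEES))
print(interest_counts(EMPLOYEES, last_name="Smith"))
```

Restricting the input documents first and counting afterward is what makes the forestry bucket disappear in the second call.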
Aggregations allow hierarchical rollups too. For example, let’s find the average age of employees who share a particular interest:
GET /megacorp/employee/_search
{
    "aggs": {
        "all_interests": {
            "terms": { "field": "interests" },
            "aggs": {
                "avg_age": {
                    "avg": { "field": "age" }
                }
            }
        }
    }
}
The aggregations that we get back are a bit more complicated, but still fairly easy to understand:
...
"all_interests": {
    "buckets": [
        {
            "key": "music",
            "doc_count": 2,
            "avg_age": {
                "value": 28.5
            }
        },
        {
            "key": "forestry",
            "doc_count": 1,
            "avg_age": {
                "value": 35
            }
        },
        {
            "key": "sports",
            "doc_count": 1,
            "avg_age": {
                "value": 25
            }
        }
    ]
}
The output is basically an enriched version of the first aggregation we ran. We still have a list of interests and their counts, but now each interest has an additional avg_age, which shows the average age for all employees having that interest.
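The arithmetic behind each avg_age value can be checked by hand, or with a few lines of Python over the same three employees. As before, this is a local re-computation for illustration, not a description of Elasticsearch internals; the function name average_age_by_interest is hypothetical.

```python
from collections import defaultdict
from statistics import mean

EMPLOYEES = [
    {"first_name": "John", "age": 25, "interests": ["sports", "music"]},
    {"first_name": "Jane", "age": 32, "interests": ["music"]},
    {"first_name": "Douglas", "age": 35, "interests": ["forestry"]},
]


def average_age_by_interest(employees):
    """Group ages by interest, then average each group."""
    ages = defaultdict(list)
    for employee in employees:
        for interest in employee["interests"]:
            ages[interest].append(employee["age"])
    return {interest: mean(values) for interest, values in ages.items()}


print(average_age_by_interest(EMPLOYEES))
```

The music bucket holds John (25) and Jane (32), so its average is (25 + 32) / 2 = 28.5, while the single-member buckets simply report that member's age.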
Even if you don’t understand the syntax yet, you can easily see how complex aggregations and groupings can be accomplished using this feature. The sky is the limit as to what kind of data you can extract!
Hopefully, this little tutorial was a good demonstration of what is possible in Elasticsearch. It is really just scratching the surface, and many features (such as suggestions, geolocation, percolation, fuzzy and partial matching) were omitted to keep the tutorial short. But it did highlight just how easy it is to start building advanced search functionality. No configuration was needed: just add data and start searching!
It’s likely that the syntax left you confused in places, and you may have questions about how to tweak and tune various aspects. That’s fine! The rest of the book dives into each of these issues in detail, giving you a solid understanding of how Elasticsearch works.
At the beginning of this chapter, we said that Elasticsearch can scale out to hundreds (or even thousands) of servers and handle petabytes of data. While our tutorial gave examples of how to use Elasticsearch, it didn’t touch on the mechanics at all. Elasticsearch is distributed by nature, and it is designed to hide the complexity that comes with being distributed.
The distributed aspect of Elasticsearch is largely transparent. Nothing in the tutorial required you to know about distributed systems, sharding, cluster discovery, or dozens of other distributed concepts. It happily ran the tutorial on a single node living inside your laptop, but if you were to run the tutorial on a cluster containing 100 nodes, everything would work in exactly the same way.
Elasticsearch tries hard to hide the complexity of distributed systems. Here are some of the operations happening automatically under the hood:
Partitioning your documents into different containers or shards, which can be stored on a single node or on multiple nodes
Balancing these shards across the nodes in your cluster to spread the indexing and search load
Duplicating each shard to provide redundant copies of your data, to prevent data loss in case of hardware failure
Routing requests from any node in the cluster to the nodes that hold the data you’re interested in
Seamlessly integrating new nodes as your cluster grows or redistributing shards to recover from node loss
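The partitioning and routing steps in the list above can be pictured as hashing a document's ID to choose a shard. The Python sketch below is a simplified illustration under stated assumptions, not Elasticsearch's actual routing code: the function name pick_shard is hypothetical, CRC32 is a stand-in hash chosen only because it is deterministic, and the shard count of 5 is borrowed from the _shards total shown earlier in this chapter.

```python
import zlib


def pick_shard(doc_id, number_of_shards):
    """Map a document ID to a shard number in [0, number_of_shards)."""
    # crc32 is a stand-in hash; any deterministic hash would serve here.
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


for doc_id in ("1", "2", "3"):
    print(doc_id, "-> shard", pick_shard(doc_id, 5))
```

Because the mapping depends only on the ID and the shard count, any node can work out where a document lives without consulting the others.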
As you read through this book, you’ll encounter supplemental chapters about the distributed nature of Elasticsearch. These chapters will teach you about how the cluster scales and deals with failover (Chapter 2), handles document storage (Chapter 4), executes distributed search (Chapter 9), and what a shard is and how it works (Chapter 11).
These chapters are not required reading—you can use Elasticsearch without understanding these internals—but they will provide insight that will make your knowledge of Elasticsearch more complete. Feel free to skim them and revisit at a later point when you need a more complete understanding.
By now you should have a taste of what you can do with Elasticsearch, and how easy it is to get started. Elasticsearch tries hard to work out of the box with minimal knowledge and configuration. The best way to learn Elasticsearch is by jumping in: just start indexing and searching!
However, the more you know about Elasticsearch, the more productive you can become. The more you can tell Elasticsearch about the domain-specific elements of your application, the more you can fine-tune the output.
The rest of this book will help you move from novice to expert. Each chapter explains the essentials, but also includes expert-level tips. If you’re just getting started, these tips are probably not immediately relevant to you; Elasticsearch has sensible defaults and will generally do the right thing without any interference. You can always revisit these chapters later, when you are looking to improve performance by shaving off any wasted milliseconds.