We have seen how Elasticsearch makes it easy to start developing a new application without requiring any advance planning or setup. However, it doesn’t take long before you start wanting to fine-tune the indexing and search process to better suit your particular use case. Almost all of these customizations relate to the index, and the types that it contains. In this chapter, we introduce the APIs for managing indices and type mappings, and the most important settings.
Until now, we have created a new index by simply indexing a document into it. The index is created with the default settings, and new fields are added to the type mapping by using dynamic mapping. Now we need more control over the process: we want to ensure that the index has been created with the appropriate number of primary shards, and that analyzers and mappings are set up before we index any data.
To do this, we have to create the index manually, passing in any settings or type mappings in the request body, as follows:
PUT /my_index
{
    "settings": { ... any settings ... },
    "mappings": {
        "type_one": { ... any mappings ... },
        "type_two": { ... any mappings ... },
        ...
    }
}
In fact, if you want to, you can prevent the automatic creation of indices by adding the following setting to the config/elasticsearch.yml file on each node:

action.auto_create_index: false
Later, we discuss how you can use “Index Templates” to preconfigure automatically created indices. This is particularly useful when indexing log data: log events are written to an index whose name includes the date and, as midnight rolls over, a new properly configured index automatically springs into existence.
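As a preview, here is a minimal sketch of what such a template might look like (the template name my_logs and the index pattern logs-* are illustrative):

PUT /_template/my_logs
{
    "template": "logs-*",
    "settings": {
        "number_of_shards": 1
    }
}

Any index created later whose name matches logs-* (for example, logs-2014-10-01) would pick up these settings automatically.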
There are many, many knobs that you can twiddle to customize index behavior, which you can read about in the Index Modules reference documentation, but two of the most important settings are as follows:
number_of_shards
The number of primary shards that an index should have, which defaults to 5. This setting cannot be changed after index creation.

number_of_replicas
The number of replica shards (copies) that each primary shard should have, which defaults to 1. This setting can be changed at any time on a live index.
For instance, we could create a small index—just one primary shard and no replica shards—with the following request:
PUT /my_temp_index
{
    "settings": {
        "number_of_shards":   1,
        "number_of_replicas": 0
    }
}
Later, we can change the number of replica shards dynamically using the update-index-settings API as follows:
PUT /my_temp_index/_settings
{
    "number_of_replicas": 1
}
The third important index setting is the analysis section, which is used to configure existing analyzers or to create new custom analyzers specific to your index.
In “Analysis and Analyzers”, we introduced some of the built-in analyzers, which are used to convert full-text strings into an inverted index, suitable for searching.
The standard analyzer, which is the default analyzer used for full-text fields, is a good choice for most Western languages. It consists of the following:
- The standard tokenizer, which splits the input text on word boundaries
- The standard token filter, which is intended to tidy up the tokens emitted by the tokenizer (but currently does nothing)
- The lowercase token filter, which converts all tokens into lowercase
- The stop token filter, which removes stopwords—common words that have little impact on search relevance, such as a, the, and, is
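To see these stages in action, you could run a short sentence through the analyze API (the sentence is arbitrary; results abbreviated to the token text):

GET /_analyze?analyzer=standard
The quick & brown fox

This would emit the tokens the, quick, brown, and fox: the & is dropped by the tokenizer and the terms are lowercased. Note that the survives, for the reason explained next.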
By default, the stopwords filter is disabled. You can enable it by creating a custom analyzer based on the standard analyzer and setting the stopwords parameter. Either provide a list of stopwords or tell it to use a predefined stopwords list from a particular language.
In the following example, we create a new analyzer called the es_std analyzer, which uses the predefined list of Spanish stopwords:
PUT /spanish_docs
{
    "settings": {
        "analysis": {
            "analyzer": {
                "es_std": {
                    "type":      "standard",
                    "stopwords": "_spanish_"
                }
            }
        }
    }
}
The es_std analyzer is not global—it exists only in the spanish_docs index where we have defined it. To test it with the analyze API, we must specify the index name:
GET /spanish_docs/_analyze?analyzer=es_std
El veloz zorro marrón
The abbreviated results show that the Spanish stopword El has been removed correctly:
{
    "tokens" : [
        { "token" : "veloz",  "position" : 2 },
        { "token" : "zorro",  "position" : 3 },
        { "token" : "marrón", "position" : 4 }
    ]
}
While Elasticsearch comes with a number of analyzers available out of the box, the real power comes from the ability to create your own custom analyzers by combining character filters, tokenizers, and token filters in a configuration that suits your particular data.
In “Analysis and Analyzers”, we said that an analyzer is a wrapper that combines three functions into a single package, which are executed in sequence:
Character filters are used to “tidy up” a string before it is tokenized. For instance, if our text is in HTML format, it will contain HTML tags like <p> or <div> that we don’t want to be indexed. We can use the html_strip character filter to remove all HTML tags and to convert HTML entities like &Aacute; into the corresponding Unicode character Á.
An analyzer may have zero or more character filters.
An analyzer must have a single tokenizer. The tokenizer breaks up the string into individual terms or tokens. The standard tokenizer, which is used in the standard analyzer, breaks up a string into individual terms on word boundaries, and removes most punctuation, but other tokenizers exist that have different behavior.

For instance, the keyword tokenizer outputs exactly the same string as it received, without any tokenization. The whitespace tokenizer splits text on whitespace only. The pattern tokenizer can be used to split text on a matching regular expression.
After tokenization, the resulting token stream is passed through any specified token filters, in the order in which they are specified.
Token filters may change, add, or remove tokens. We have already mentioned the lowercase and stop token filters, but there are many more available in Elasticsearch. Stemming token filters “stem” words to their root form. The ascii_folding filter removes diacritics, converting a term like "très" into "tres". The ngram and edge_ngram token filters can produce tokens suitable for partial matching or autocomplete.
In Part II, we discuss examples of where and how to use these tokenizers and filters. But first, we need to explain how to create a custom analyzer.
In the same way as we configured the es_std analyzer previously, we can configure character filters, tokenizers, and token filters in their respective sections under analysis:
PUT /my_index
{
    "settings": {
        "analysis": {
            "char_filter": { ... custom character filters ... },
            "tokenizer":   { ... custom tokenizers ... },
            "filter":      { ... custom token filters ... },
            "analyzer":    { ... custom analyzers ... }
        }
    }
}
As an example, let’s set up a custom analyzer that will do the following:
1. Strip out HTML by using the html_strip character filter.

2. Replace & characters with " and ", using a custom mapping character filter:

   "char_filter": {
       "&_to_and": {
           "type":     "mapping",
           "mappings": [ "&=> and "]
       }
   }

3. Tokenize words, using the standard tokenizer.

4. Lowercase terms, using the lowercase token filter.

5. Remove a custom list of stopwords, using a custom stop token filter:

   "filter": {
       "my_stopwords": {
           "type":      "stop",
           "stopwords": [ "the", "a" ]
       }
   }
Our analyzer definition combines the predefined tokenizer and filters with the custom filters that we have configured previously:
"analyzer"
:
{
"my_analyzer"
:
{
"type"
:
"custom"
,
"char_filter"
:
[
"html_strip"
,
"&_to_and"
],
"tokenizer"
:
"standard"
,
"filter"
:
[
"lowercase"
,
"my_stopwords"
]
}
}
To put it all together, the whole create-index request looks like this:
PUT /my_index
{
    "settings": {
        "analysis": {
            "char_filter": {
                "&_to_and": {
                    "type":     "mapping",
                    "mappings": [ "&=> and "]
                }
            },
            "filter": {
                "my_stopwords": {
                    "type":      "stop",
                    "stopwords": [ "the", "a" ]
                }
            },
            "analyzer": {
                "my_analyzer": {
                    "type":        "custom",
                    "char_filter": [ "html_strip", "&_to_and" ],
                    "tokenizer":   "standard",
                    "filter":      [ "lowercase", "my_stopwords" ]
                }
            }
        }
    }
}
After creating the index, use the analyze API to test the new analyzer:
GET /my_index/_analyze?analyzer=my_analyzer
The quick & brown fox
The following abbreviated results show that our analyzer is working correctly:
{
    "tokens" : [
        { "token" : "quick", "position" : 2 },
        { "token" : "and",   "position" : 3 },
        { "token" : "brown", "position" : 4 },
        { "token" : "fox",   "position" : 5 }
    ]
}
The analyzer is not much use unless we tell Elasticsearch where to use it. We can apply it to a string field with a mapping such as the following:
PUT /my_index/_mapping/my_type
{
    "properties": {
        "title": {
            "type":     "string",
            "analyzer": "my_analyzer"
        }
    }
}
A type in Elasticsearch represents a class of similar documents. A type consists of a name—such as user or blogpost—and a mapping. The mapping, like a database schema, describes the fields or properties that documents of that type may have, the datatype of each field—such as string, integer, or date—and how those fields should be indexed and stored by Lucene.
In “What Is a Document?”, we said that a type is like a table in a relational database. While this is a useful way to think about types initially, it is worth explaining in more detail exactly what a type is and how they are implemented on top of Lucene.
A document in Lucene consists of a simple list of field-value pairs. A field must have at least one value, but any field can contain multiple values. Similarly, a single string value may be converted into multiple values by the analysis process. Lucene doesn’t care if the values are strings or numbers or dates—all values are just treated as opaque bytes.
When we index a document in Lucene, the values for each field are added to the inverted index for the associated field. Optionally, the original values may also be stored unchanged so that they can be retrieved later.
Elasticsearch types are implemented on top of this simple foundation. An index may have several types, each with its own mapping, and documents of any of these types may be stored in the same index.
Because Lucene has no concept of document types, the type name of each document is stored with the document in a metadata field called _type. When we search for documents of a particular type, Elasticsearch simply uses a filter on the _type field to restrict results to documents of that type.
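In other words, a search restricted to a type, such as GET /my_index/my_type/_search, behaves much like the following sketch (an illustration of the concept, not what Elasticsearch literally executes internally):

GET /my_index/_search
{
    "query": {
        "filtered": {
            "filter": { "term": { "_type": "my_type" }}
        }
    }
}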
Lucene also has no concept of mappings. Mappings are the layer that Elasticsearch uses to map complex JSON documents into the simple flat documents that Lucene expects to receive.
For instance, the mapping for the name field in the user type may declare that the field is a string field, and that its value should be analyzed by the whitespace analyzer before being indexed into the inverted index called name:
"name"
:
{
"type"
:
"string"
,
"analyzer"
:
"whitespace"
}
The fact that documents of different types can be added to the same index introduces some unexpected complications.
Imagine that we have two types in our index: blog_en for blog posts in English, and blog_es for blog posts in Spanish. Both types have a title field, but one type uses the english analyzer and the other type uses the spanish analyzer.
The problem is illustrated by the following query:
GET /_search
{
    "query": {
        "match": {
            "title": "The quick brown fox"
        }
    }
}
We are searching in the title field in both types. The query string needs to be analyzed, but which analyzer does it use: spanish or english? It will use the analyzer for the first title field that it finds, which will be correct for some docs and incorrect for the others.
We can avoid this problem either by naming the fields differently—for example, title_en and title_es—or by explicitly including the type name in the field name and querying each field separately:
GET /_search
{
    "query": {
        "multi_match": {
            "query":  "The quick brown fox",
            "fields": [ "blog_en.title", "blog_es.title" ]
        }
    }
}
Our new query uses the english analyzer for the field blog_en.title and the spanish analyzer for the field blog_es.title, and combines the results from both fields into an overall relevance score.
This solution can help when both fields have the same datatype, but consider what would happen if you indexed these two documents into the same index:
Type: user
{ "login": "john_smith" }

Type: event
{ "login": "2014-06-01" }
Lucene doesn’t care that one field contains a string and the other field contains a date. It will happily index the byte values from both fields.
However, if we now try to sort on the event.login field, Elasticsearch needs to load the values in the login field into memory. As we said in “Fielddata”, it loads the values for all documents in the index regardless of their type.

It will try to load these values either as a string or as a date, depending on which login field it sees first. This will either produce unexpected results or fail outright.
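For instance, a request like the following sketch (assuming the two documents above live in an index called my_index) could misbehave: even though the search is limited to the event type, fielddata for login is loaded for every document in the index:

GET /my_index/event/_search
{
    "sort": "login"
}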
The uppermost level of a mapping is known as the root object. It may contain the following:
- A properties section, which lists the mapping for each field that a document may contain
- Various metadata fields, all of which start with an underscore, such as _type, _id, and _source
- Settings, which control how the dynamic detection of new fields is handled, such as analyzer, dynamic_date_formats, and dynamic_templates
- Other settings, which can be applied both to the root object and to fields of type object, such as enabled, dynamic, and include_in_all
We have already discussed the three most important settings for document fields or properties in “Core Simple Field Types” and “Complex Core Field Types”:
type
The datatype that the field contains, such as string or date

index
Whether a field should be searchable as full text (analyzed), searchable as an exact value (not_analyzed), or not searchable at all (no)

analyzer
Which analyzer to use for a full-text field, both at index time and at search time
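Put together, a field definition using all three settings might look like this sketch (the field name and choice of analyzer are illustrative):

"title": {
    "type":     "string",
    "index":    "analyzed",
    "analyzer": "english"
}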
We will discuss other field types such as ip, geo_point, and geo_shape in the appropriate sections later in the book.
By default, Elasticsearch stores the JSON string representing the document body in the _source field. Like all stored fields, the _source field is compressed before being written to disk.
This is almost always desired functionality because it means the following:
- The full document is available directly from the search results—no need for a separate round-trip to fetch the document from another data store.
- Partial update requests will not function without the _source field.
- When your mapping changes and you need to reindex your data, you can do so directly from Elasticsearch instead of having to retrieve all of your documents from another (usually slower) data store.
- Individual fields can be extracted from the _source field and returned in get or search requests when you don’t need to see the whole document.
- It is easier to debug queries, because you can see exactly what each document contains, rather than having to guess their contents from a list of IDs.
That said, storing the _source field does use disk space. If none of the preceding reasons is important to you, you can disable the _source field with the following mapping:
PUT /my_index
{
    "mappings": {
        "my_type": {
            "_source": {
                "enabled": false
            }
        }
    }
}
In a search request, you can ask for only certain fields by specifying the _source parameter in the request body:
GET /_search
{
    "query":   { "match_all": {}},
    "_source": [ "title", "created" ]
}
Values for these fields will be extracted from the _source field and returned instead of the full _source.
In “Search Lite”, we introduced the _all field: a special field that indexes the values from all other fields as one big string. The query_string query clause (and searches performed as ?q=john) defaults to searching in the _all field if no other field is specified.
The _all field is useful during the exploratory phase of a new application, while you are still unsure about the final structure that your documents will have. You can throw any query string at it and you have a good chance of finding the document you’re after:
GET /_search
{
    "query": {
        "match": {
            "_all": "john smith marketing"
        }
    }
}
As your application evolves and your search requirements become more exacting, you will find yourself using the _all field less and less. The _all field is a shotgun approach to search. By querying individual fields, you have more flexibility, power, and fine-grained control over which results are considered to be most relevant.
One of the important factors taken into account by the relevance algorithm is the length of the field: the shorter the field, the more important. A term that appears in a short title field is likely to be more important than the same term that appears somewhere in a long content field. This distinction between field lengths disappears in the _all field.
If you decide that you no longer need the _all field, you can disable it with this mapping:
PUT /my_index/_mapping/my_type
{
    "my_type": {
        "_all": { "enabled": false }
    }
}
Inclusion in the _all field can be controlled on a field-by-field basis by using the include_in_all setting, which defaults to true. Setting include_in_all on an object (or on the root object) changes the default for all fields within that object.
You may find that you want to keep the _all field around to use as a catchall full-text field just for specific fields, such as title, overview, summary, and tags. Instead of disabling the _all field completely, disable include_in_all for all fields by default, and enable it only on the fields you choose:
PUT /my_index/my_type/_mapping
{
    "my_type": {
        "include_in_all": false,
        "properties": {
            "title": {
                "type":           "string",
                "include_in_all": true
            },
            ...
        }
    }
}
Remember that the _all field is just an analyzed string field. It uses the default analyzer to analyze its values, regardless of which analyzer has been set on the fields where the values originate. And like any string field, you can configure which analyzer the _all field should use:
PUT /my_index/my_type/_mapping
{
    "my_type": {
        "_all": { "analyzer": "whitespace" }
    }
}
There are four metadata fields associated with document identity:
_id
The string ID of the document

_type
The type name of the document

_index
The index where the document lives

_uid
The _type and _id concatenated together as type#id
By default, the _uid field is stored (can be retrieved) and indexed (searchable). The _type field is indexed but not stored, and the _id and _index fields are neither indexed nor stored, meaning they don’t really exist.
In spite of this, you can query the _id field as though it were a real field. Elasticsearch uses the _uid field to derive the _id. Although you can change the index and store settings for these fields, you almost never need to do so.
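For example, a lookup like the following sketch (the index name and ID are illustrative) works even though the _id field itself is not indexed:

GET /my_index/_search
{
    "query": {
        "term": { "_id": "123" }
    }
}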
The _id field does have one setting that you may want to use. The path setting tells Elasticsearch that it should extract the value for the _id from a field within the document itself:
PUT /my_index
{
    "mappings": {
        "my_type": {
            "_id": {
                "path": "doc_id"
            },
            "properties": {
                "doc_id": {
                    "type":  "string",
                    "index": "not_analyzed"
                }
            }
        }
    }
}
Then, when you index a document:
POST /my_index/my_type
{
    "doc_id": "123"
}
the _id value will be extracted from the doc_id field in the document body:
{
    "_index":   "my_index",
    "_type":    "my_type",
    "_id":      "123",
    "_version": 1,
    "created":  true
}
While the path setting is convenient, it does have a slight performance impact on bulk requests (see “Why the Funny Format?”). The node handling the request can no longer use the optimized bulk format to parse just the metadata line in order to decide which shard should receive the request. Instead, it has to parse the document body as well.
When Elasticsearch encounters a previously unknown field in a document, it uses dynamic mapping to determine the datatype for the field and automatically adds the new field to the type mapping.
Sometimes this is the desired behavior and sometimes it isn’t. Perhaps you don’t know what fields will be added to your documents later, but you want them to be indexed automatically. Perhaps you just want to ignore them. Or—especially if you are using Elasticsearch as a primary data store—perhaps you want unknown fields to throw an exception to alert you to the problem.
Fortunately, you can control this behavior with the dynamic setting, which accepts the following options:

true
Add new fields dynamically—the default

false
Ignore new fields

strict
Throw an exception if an unknown field is encountered
The dynamic setting may be applied to the root object or to any field of type object. You could set dynamic to strict by default, but enable it just for a specific inner object:
PUT /my_index
{
    "mappings": {
        "my_type": {
            "dynamic": "strict",
            "properties": {
                "title": { "type": "string" },
                "stash": {
                    "type":    "object",
                    "dynamic": true
                }
            }
        }
    }
}
The my_type object will throw an exception if an unknown field is encountered, whereas the stash object will create new fields dynamically.
With this mapping, you can add new searchable fields into the stash object:
PUT /my_index/my_type/1
{
    "title": "This doc adds a new field",
    "stash": { "new_field": "Success!" }
}
But trying to do the same at the top level will fail:
PUT /my_index/my_type/1
{
    "title":     "This throws a StrictDynamicMappingException",
    "new_field": "Fail!"
}
Setting dynamic to false doesn’t alter the contents of the _source field at all. The _source will still contain the whole JSON document that you indexed. However, any unknown fields will not be added to the mapping and will not be searchable.
If you know that you are going to be adding new fields on the fly, you probably want to leave dynamic mapping enabled. At times, though, the dynamic mapping “rules” can be a bit blunt. Fortunately, there are settings that you can use to customize these rules to better suit your data.
When Elasticsearch encounters a new string field, it checks to see if the string contains a recognizable date, like 2014-01-01. If it looks like a date, the field is added as type date. Otherwise, it is added as type string.
Sometimes this behavior can lead to problems. Imagine that you index a document like this:
{ "note": "2014-01-01" }
Assuming that this is the first time that the note field has been seen, it will be added as a date field. But what if the next document looks like this:
{ "note": "Logged out" }
This clearly isn’t a date, but it is too late. The field is already a date field and so this “malformed date” will cause an exception to be thrown.
Date detection can be turned off by setting date_detection to false on the root object:
PUT /my_index
{
    "mappings": {
        "my_type": {
            "date_detection": false
        }
    }
}
With this mapping in place, a string will always be a string. If you need a date field, you have to add it manually.

Elasticsearch’s idea of which strings look like dates can be altered with the dynamic_date_formats setting.
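For instance, here is a sketch that would restrict date detection to two explicit formats (the formats shown are just examples):

PUT /my_index
{
    "mappings": {
        "my_type": {
            "dynamic_date_formats": [ "yyyy-MM-dd", "dd/MM/yyyy" ]
        }
    }
}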
With dynamic_templates, you can take complete control over the mapping that is generated for newly detected fields. You can even apply a different mapping depending on the field name or datatype.
Each template has a name, which you can use to describe what the template does, a mapping to specify the mapping that should be applied, and at least one parameter (such as match) to define which fields the template should apply to.
Templates are checked in order; the first template that matches is applied. For instance, we could specify two templates for string fields:

- es: Field names ending in _es should use the spanish analyzer.
- en: All others should use the english analyzer.

We put the es template first, because it is more specific than the catchall en template, which matches all string fields:
PUT /my_index
{
    "mappings": {
        "my_type": {
            "dynamic_templates": [
                { "es": {
                      "match":              "*_es",
                      "match_mapping_type": "string",
                      "mapping": {
                          "type":     "string",
                          "analyzer": "spanish"
                      }
                }},
                { "en": {
                      "match":              "*",
                      "match_mapping_type": "string",
                      "mapping": {
                          "type":     "string",
                          "analyzer": "english"
                      }
                }}
            ]
        }
    }
}
The match_mapping_type setting allows you to apply the template only to fields of the specified type, as detected by the standard dynamic mapping rules (for example, string or long).
The match parameter matches just the field name, and the path_match parameter matches the full path to a field in an object, so the pattern address.*.name would match a field like this:
{
    "address": {
        "city": {
            "name": "New York"
        }
    }
}
The unmatch and path_unmatch patterns can be used to exclude fields that would otherwise match.
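For instance, the following sketch (the template name index_as_keywords and the *_text pattern are illustrative) would map every new string field as an exact value, except those whose names end in _text:

{ "index_as_keywords": {
      "match":              "*",
      "unmatch":            "*_text",
      "match_mapping_type": "string",
      "mapping": {
          "type":  "string",
          "index": "not_analyzed"
      }
}}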
More configuration options can be found in the reference documentation for the root object.
Often, all types in an index share similar fields and settings. It can be more convenient to specify these common settings in the _default_ mapping, instead of having to repeat yourself every time you create a new type. The _default_ mapping acts as a template for new types. All types created after the _default_ mapping will include all of these default settings, unless explicitly overridden in the type mapping itself.
For instance, we can disable the _all field for all types, using the _default_ mapping, but enable it just for the blog type, as follows:
PUT /my_index
{
    "mappings": {
        "_default_": {
            "_all": { "enabled": false }
        },
        "blog": {
            "_all": { "enabled": true }
        }
    }
}
The _default_ mapping can also be a good place to specify index-wide dynamic templates.
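For instance, here is a sketch that reuses the catchall en template from earlier in the _default_ mapping, so that it applies to every type in the index:

PUT /my_index
{
    "mappings": {
        "_default_": {
            "dynamic_templates": [
                { "en": {
                      "match":              "*",
                      "match_mapping_type": "string",
                      "mapping": {
                          "type":     "string",
                          "analyzer": "english"
                      }
                }}
            ]
        }
    }
}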
Although you can add new types to an index, or add new fields to a type, you can’t add new analyzers or make changes to existing fields. If you were to do so, the data that had already been indexed would be incorrect and your searches would no longer work as expected.
The simplest way to apply these changes to your existing data is to reindex: create a new index with the new settings and copy all of your documents from the old index to the new index.
One of the advantages of the _source field is that you already have the whole document available to you in Elasticsearch itself. You don’t have to rebuild your index from the database, which is usually much slower.
To reindex all of the documents from the old index efficiently, use scan-and-scroll to retrieve batches of documents from the old index, and the bulk API to push them into the new index.
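A sketch of the first step, assuming the old index is called old_index: open a scrolling scan request that keeps the scroll window alive for one minute and returns up to 1,000 documents per shard per batch. Each subsequent call to the scroll endpoint with the returned _scroll_id fetches the next batch, which is then written to the new index with the bulk API:

GET /old_index/_search?search_type=scan&scroll=1m
{
    "query": { "match_all": {}},
    "size":  1000
}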
The problem with the reindexing process described previously is that you need to update your application to use the new index name. Index aliases to the rescue!
An index alias is like a shortcut or symbolic link, which can point to one or more indices, and can be used in any API that expects an index name. Aliases give us an enormous amount of flexibility. They allow us to do the following:
- Switch transparently between one index and another on a running cluster
- Group multiple indices (for example, last_three_months)
- Create “views” on a subset of the documents in an index
We will talk more about the other uses for aliases later in the book. For now we will explain how to use them to switch from an old index to a new index with zero downtime.
There are two endpoints for managing aliases: _alias for single operations, and _aliases to perform multiple operations atomically.
In this scenario, we will assume that your application is talking to an index called my_index. In reality, my_index will be an alias that points to the current real index. We will include a version number in the name of the real index: my_index_v1, my_index_v2, and so forth.
To start off, create the index my_index_v1, and set up the alias my_index to point to it:
PUT /my_index_v1
PUT /my_index_v1/_alias/my_index
You can check which index the alias points to:
GET /*/_alias/my_index
Or which aliases point to the index:
GET /my_index_v1/_alias/*
Both of these return the following:
{
    "my_index_v1" : {
        "aliases" : {
            "my_index" : { }
        }
    }
}
Later, we decide that we want to change the mappings for a field in our index. Of course, we can’t change the existing mapping, so we have to reindex our data. To start, we create my_index_v2 with the new mappings:
PUT /my_index_v2
{
    "mappings": {
        "my_type": {
            "properties": {
                "tags": {
                    "type":  "string",
                    "index": "not_analyzed"
                }
            }
        }
    }
}
Then we reindex our data from my_index_v1 to my_index_v2, following the process described in “Reindexing Your Data”. Once we are satisfied that our documents have been reindexed correctly, we switch our alias to point to the new index.
An alias can point to multiple indices, so we need to remove the alias from the old index at the same time as we add it to the new index. The change needs to be atomic, which means that we must use the _aliases endpoint:
POST /_aliases
{
    "actions": [
        { "remove": { "index": "my_index_v1", "alias": "my_index" }},
        { "add":    { "index": "my_index_v2", "alias": "my_index" }}
    ]
}
Your application has switched from using the old index to the new index transparently, with zero downtime.
Even when you think that your current index design is perfect, it is likely that you will need to make some change later, when your index is already being used in production.
Be prepared: use aliases instead of indices in your application. Then you will be able to reindex whenever you need to. Aliases are cheap and should be used liberally.