Understanding the schema-less approach

Certainly, one of the most important features of Elasticsearch is its ability to be schema-less, but this claim needs to be digested carefully.

Yes, as stated previously, Elasticsearch does not require definitions such as the index, type, and field types before the indexing process, and when an object with a new property is indexed later, that property is automatically added to the mapping definitions.
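
The following is a minimal sketch of this behavior; the books index, the book type, and the title and pages fields are made up purely for illustration:

curl -XPUT localhost:9200/books/book/1 -d '{"title": "Elasticsearch"}'
curl -XPUT localhost:9200/books/book/2 -d '{"title": "Lucene", "pages": 300}'
curl -XGET localhost:9200/books/book/_mapping?pretty

The first request creates the books index, the book type, and the title field on the fly; the second request introduces the pages field. The mapping returned by the last request contains both fields, although neither was defined beforehand.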

So, is the claim about "Elasticsearch stands for the schema-free model" always true?

Recall that types are created according to the mapping information, and a mapping is actually a schema definition. Therefore, Elasticsearch expects the mapping and the documents being indexed to be compatible.

Now let's examine the following example:

curl -XPUT localhost:9200/my_index/document/1 -d '{"value": "a"}'
{"_index":"my_index","_type":"document","_id":"1","_version":1,"created":true}

curl -XPUT localhost:9200/my_index/document/2 -d '{"value": 1}'
{"_index":"my_index","_type":"document","_id":"2","_version":1,"created":true}

Everything seems fine. Let's now request the mapping for the document type. This gives us the following result:

curl -XGET localhost:9200/my_index/document/_mapping?pretty
{
  "my_index" : {
    "mappings" : {
      "document" : {
        "properties" : {
          "value" : {
            "type" : "string"
          }
        }
      }
    }
  }
}

The response shows that the value field was recognized as a field of type string by Elasticsearch because the first value sent was a string value (that is, a); remember, dynamic mapping was used here, not an explicit mapping. In this case, when the second document was indexed, Elasticsearch converted the numeric value into a string value.
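
One way to observe this conversion is to search the value field for the string form of the number. This is a quick sketch against the index above, assuming the default standard analyzer:

curl -XGET 'localhost:9200/my_index/_search?q=value:1&pretty'

The second document is returned because the numeric 1 was indexed into the string-typed value field as the term 1.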

Okay, now we will delete my_index and index the documents in reverse order:

curl -XDELETE localhost:9200/my_index
{"acknowledged":true}

curl -XPUT localhost:9200/my_index/document/1 -d '{"value": 1}'
{"_index":"my_index","_type":"document","_id":"1","_version":1,"created":true}

So far so good. Let's continue:

curl -XPUT localhost:9200/my_index/document/2 -d '{"value": "a"}'
{"error":"MapperParsingException[failed to parse [value]]; nested: NumberFormatException[For input string: "a"]; ","status":400}

Oops, we have a big problem. As you can see, the server returns a 400 Bad Request when we submit the second document. Let's request the mapping for the document type again:

curl -XGET localhost:9200/my_index/document/_mapping?pretty
{
  "my_index" : {
    "mappings" : {
      "document" : {
        "properties" : {
          "value" : {
            "type" : "long"
          }
        }
      }
    }
  }
}

As you can see, the value field was recognized as a field of type long by Elasticsearch because the first value sent was a numeric value (that is, 1); remember again, dynamic mapping was used. In this case, when the second document was indexed, Elasticsearch tried to parse the string value a as a number and threw a NumberFormatException because this string cannot be parsed numerically.
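
Note that what matters here is whether the value can be parsed as a number, not the JSON type itself. For example, the following sketch (assuming the default coercion behavior) would succeed, because the string 2 can be converted to a long:

curl -XPUT localhost:9200/my_index/document/3 -d '{"value": "2"}'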

We cannot solve this problem by deleting the first document, because deleting a document does not change the mapping information. Keep in mind that once a field has been added to the mapping, its type cannot be changed.
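
You can verify this with a quick sketch against the index above: even after the first document is deleted, indexing a string value still fails, because the mapping still defines value as long:

curl -XDELETE localhost:9200/my_index/document/1
curl -XPUT localhost:9200/my_index/document/2 -d '{"value": "a"}'

The second request is rejected with the same MapperParsingException as before.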

To sum up, Elasticsearch is schema-less in that you do not need to define fields in advance, but it requires that the fields in documents being indexed are compatible with the mapping. You can add new fields anytime, but once a field is defined, you cannot change its type.
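
If you do need a different type for such a field, the usual way out is to create a new index whose mapping is defined explicitly before any document is indexed, and to index the documents into it again. The following is a minimal sketch; the index name my_new_index is made up for illustration:

curl -XPUT localhost:9200/my_new_index -d '{
  "mappings": {
    "document": {
      "properties": {
        "value": { "type": "string" }
      }
    }
  }
}'
curl -XPUT localhost:9200/my_new_index/document/1 -d '{"value": 1}'
curl -XPUT localhost:9200/my_new_index/document/2 -d '{"value": "a"}'

Because value was explicitly declared as string, both documents are accepted this time.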
