Mapping

I mentioned earlier that Elasticsearch doesn't require a data structure to be defined for the document types. However, Elasticsearch internally figures out the structure of the data that we insert. We have the ability to define this structure manually but don't necessarily need to. When Elasticsearch uses its own guess of the data structure, it's said to be using a dynamic mapping for the document type. Let's look at what Elasticsearch guessed for our product document type. Using the command line, make the following request with curl:

> curl 'http://localhost:9200/daintree/products/_mapping?pretty'
{
  "daintree" : {
    "mappings" : {
      "products" : {
        "properties" : {
          "category" : {
            "type" : "string"
          },
          "name" : {
            "type" : "string"
          },
          "price" : {
            "type" : "long"
          },
          "tags" : {
            "type" : "string"
          }
        }
      }
    }
  }
}

Elasticsearch has done a pretty good job of guessing our document structure. As you can see, it correctly guessed the type of every field. Notice, however, that the tags field is mapped as a string, even though the document we retrieved earlier had an array of strings for tags. What's going on here?

Well, in Elasticsearch, an array doesn't have any special mapping. Each field can have one or more values; thus, each field can be an array without having to map it as such. One important implication of this is that arrays in Elasticsearch can only have one type of data. Thus, you can't have an array that contains both date values and strings. If you try to insert something like that, Elasticsearch will just go ahead and store the date as a string.
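To make this concrete, here is a purely illustrative sketch (not part of the book's code) of how Elasticsearch conceptually treats every field as potentially multi-valued, which is why a scalar and an array index under the same mapping:

```python
# Illustrative sketch: Elasticsearch treats every field as potentially
# multi-valued, so a scalar value and a list of values are both valid
# for a field mapped as a plain string.

def as_multi_value(value):
    """Normalize a field value to a list, mirroring how Elasticsearch
    conceptually treats all field values."""
    if isinstance(value, list):
        return value
    return [value]

# Both documents are valid for a mapping where "tags" is a string field:
doc_scalar = {"name": "oak table", "tags": "wood"}
doc_array = {"name": "oak table", "tags": ["wood", "furniture"]}

print(as_multi_value(doc_scalar["tags"]))  # ['wood']
print(as_multi_value(doc_array["tags"]))   # ['wood', 'furniture']
```

The `as_multi_value` helper is hypothetical; it only exists to illustrate why no separate "array" type appears in the mapping output.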

You might be wondering: if Elasticsearch is intelligent enough to figure out our data structure, why do we care about the mapping? Well, the library we are using to work with Elasticsearch, elasticsearch_dsl, requires custom mappings to be defined in order to insert documents into the index.

It is also a good idea to be explicit about the kind of data you will be inserting into the index. Defining your own mapping lets you set a number of options, such as declaring a field to be an integer. That way, even if you insert the value "123", Elasticsearch will convert it to an integer before indexing the document, and raise an error if it can't. This gives you data validation. Certain types of data, such as dates in a format different from the one Elasticsearch uses by default, can only be indexed correctly if you have set a custom mapping.
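As a rough sketch of this coercion behavior (a hypothetical helper, not Elasticsearch's actual code), an integer-mapped field behaves somewhat like this:

```python
# Hypothetical sketch of how a field mapped as an integer type coerces
# incoming values, similar in spirit to what Elasticsearch does when
# indexing a document against an explicit mapping.

def coerce_long(value):
    """Coerce a value to an integer, raising ValueError if impossible."""
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        return int(value)  # "123" -> 123; "abc" raises ValueError
    raise ValueError("cannot coerce %r to a long" % (value,))

print(coerce_long("123"))  # 123
try:
    coerce_long("not a number")
except ValueError:
    print("rejected")      # prints "rejected"
```

The point is the contract, not the implementation: with an explicit mapping, bad data is rejected at index time rather than silently stored with the wrong type.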

Defining a mapping

To define a mapping with elasticsearch_dsl, we create a DocType subclass. This is similar to how a Django database model is defined. Create a new main/es_docs.py file and type in the following code:

from elasticsearch_dsl import DocType, Long, String


class ESProduct(DocType):
    name = String(required=True)
    description = String()
    price = Long(required=True)

    category = String(required=True)
    tags = String(multi=True)

    class Meta:
        doc_type = 'products'

There shouldn't be any surprises here, as the syntax is pretty self-explanatory. I prefer to prefix my document type classes with ES to differentiate an ES doc type class from the Django model of the same name. Note that we explicitly specified the document type name. If we hadn't, elasticsearch_dsl would have come up with a name automatically based on the class name, ESProduct. However, as we just want to define a mapping for an existing document type, we set the doc_type attribute in the Meta class.

Notice that our data types are the same as the ones that we saw before when we asked Elasticsearch about the mapping. There is a reason for this. You can't change the data type for an existing field. Otherwise, the existing documents would have the wrong data type and the search would return inconsistent results. While this mapping already exists in our Elasticsearch, let's see how we would use this class to define a new document type mapping. Open up the Django shell again and type in the following:

> python manage.py shell
> from elasticsearch_dsl.connections import connections
> from main.es_docs import ESProduct
> connections.create_connection()
<Elasticsearch([{}])>
> ESProduct.init(index='daintree')

We use the ESProduct.init(index='daintree') method to create the mapping in Elasticsearch. As our mapping already existed and was exactly the same, this function didn't change anything. However, if we were creating a new mapping, this function would have configured Elasticsearch with the new document type.
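As a sketch for reference, the mapping described by our ESProduct class corresponds to JSON along these lines. The field types mirror the curl output we saw earlier; note that it also covers the description field our class adds, and that actual output can vary by Elasticsearch version:

```python
import json

# Sketch of the "products" document type mapping that our ESProduct
# class describes, matching the types from the earlier curl output.
products_mapping = {
    "products": {
        "properties": {
            "name": {"type": "string"},
            "description": {"type": "string"},
            "price": {"type": "long"},
            "category": {"type": "string"},
            "tags": {"type": "string"},
        }
    }
}

print(json.dumps(products_mapping, indent=2, sort_keys=True))
```

Because the types here match what Elasticsearch already inferred dynamically, applying this mapping is a no-op for the existing fields.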

Note that this time we didn't pass any parameters to the connections.create_connection() method, so it used the default host list, which assumes a locally running Elasticsearch instance on the default port of 9200. As our Elasticsearch is running locally on that port, we can skip the hosts argument to create_connection().
