Inserting documents into Elasticsearch from Python

Now that we have a DocType subclass and have seen how to create the mapping, all that's left to look at is inserting documents into Elasticsearch. This section assumes that you have loaded the fixture data that I provided with the code drop.

Open the Django shell again and type the following commands:

> python manage.py shell
> from elasticsearch_dsl.connections import connections
> from main.es_docs import ESProduct
> from main.models import Product
> connections.create_connection()
<Elasticsearch([{}])>
> p = Product.objects.get(pk=200)
> esp = ESProduct(meta={'id':p.pk}, name=p.name, description=p.description, price=p.price, category=p.category.name)
> for tag in p.tags.all():
>     esp.tags.append(tag.name)
>
> esp.save(index='daintree')
True

Note

Note the empty line after the for loop body. In the shell, this empty line is required to tell the interactive shell that the loop body is finished and it can go ahead and execute the loop.

Everything up to the point where we fetch the product with ID 200 from the database should look familiar. I chose ID 200 arbitrarily, knowing that a product with that ID exists in your database once you have loaded the fixtures I provided.

Next, we create a new ESProduct instance and assign it values from our Django model. The ID field needs to be assigned a value using the special meta keyword argument because that is part of the metadata of the document in Elasticsearch and not part of the document body. If we didn't provide an ID, Elasticsearch would automatically generate a random one for us. We specify it explicitly so that we can tie our database models to our Elasticsearch documents.

Next, we loop over all the tags in our Product object and append each tag's name to the tags field of our ESProduct object. We didn't need to set the tags field to an empty list first. When we defined the tags field, we passed the multi=True argument to the constructor. In elasticsearch_dsl, a field declared with multi=True defaults to an empty list. Thus, in our loop, we could be sure that esp.tags is a list that we can append to.
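The mapping from model attributes to document fields described above can be factored into a small helper, so that the shell session and any later script build documents the same way. What follows is only a minimal sketch and not part of the book's code: product_to_es_kwargs is a hypothetical name, and the stand-in objects at the bottom merely mimic the Product attributes used in this section so that the example runs without a database.

```python
from types import SimpleNamespace


def product_to_es_kwargs(product):
    """Build keyword arguments for ESProduct from a Product-like object.

    Hypothetical helper: it only assumes the attributes used in this
    section (pk, name, description, price, category.name, tags.all()).
    """
    return {
        # The ID goes under meta because it is document metadata,
        # not part of the document body.
        'meta': {'id': product.pk},
        'name': product.name,
        'description': product.description,
        'price': product.price,
        'category': product.category.name,
        # Collect the tag names into a plain list.
        'tags': [tag.name for tag in product.tags.all()],
    }


# Stand-ins for a Django model instance (illustrative values only).
tags = [SimpleNamespace(name='books'), SimpleNamespace(name='fiction')]
fake_product = SimpleNamespace(
    pk=200, name='Sample', description='A sample product', price=10,
    category=SimpleNamespace(name='Media'),
    tags=SimpleNamespace(all=lambda: tags),
)
kwargs = product_to_es_kwargs(fake_product)
```

In the Django shell, the result could presumably be passed straight to the document class as ESProduct(**product_to_es_kwargs(p)), followed by the same save call as before.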

After we have set up our ESProduct model instance with the correct values, we call the save method, passing the index name in which to insert it. Once the save call returns, Elasticsearch will hold our new data. We can test it using curl to retrieve this new document:

> curl 'http://localhost:9200/daintree/products/_search?pretty'

In the output for this command, you should now see three products instead of the two that we originally inserted.
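The same check can be scripted instead of read by eye. The sketch below assumes that the search response carries the usual hits.total entry, which is a plain integer in older Elasticsearch releases and an object with a value key in newer ones. The name hit_count is hypothetical, and the sample response is made up for illustration rather than captured from a real cluster.

```python
import json


def hit_count(response):
    """Return the number of matching documents from a _search response."""
    total = response['hits']['total']
    # Newer Elasticsearch versions wrap the count in an object.
    if isinstance(total, dict):
        return total['value']
    return total


# Made-up response body, trimmed to the part we care about.
sample = json.loads('{"hits": {"total": 3, "hits": []}}')
print(hit_count(sample))
```

To use it against a live node, you would decode the JSON that the curl command above returns and pass the resulting dictionary to hit_count.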

Getting all our data into Elasticsearch

We can't go around inserting data into Elasticsearch from the console all the time. We need an automated way of doing so. As we've seen before, Django management commands are a good fit for this kind of script. Create the folders that will hold our command file, main/management/commands. Then create an empty __init__.py file in both main/management and main/management/commands, and add the following code to main/management/commands/index_all_data.py:

from django.core.management import BaseCommand
from elasticsearch_dsl.connections import connections

from main.es_docs import ESProduct
from main.models import Product


class Command(BaseCommand):
    help = "Index all data to Elasticsearch"

    def handle(self, *args, **options):
        # Register the default connection used by save() below.
        connections.create_connection()

        for product in Product.objects.all():
            # Use the database primary key as the Elasticsearch document ID.
            esp = ESProduct(meta={'id': product.pk}, name=product.name, description=product.description,
                            price=product.price, category=product.category.name)
            for tag in product.tags.all():
                esp.tags.append(tag.name)

            esp.save(index='daintree')

There isn't anything new here. We just loop over all the product objects in our database and add them to Elasticsearch. Run it as follows:

> python manage.py index_all_data

If all goes well, the command finishes without printing anything, and you should now have all your documents in Elasticsearch. To confirm this, we can get the stats for our daintree index from Elasticsearch. Run the following command from your shell:

> curl 'localhost:9200/daintree/_stats?pretty=1'

This prints a lot of data about the daintree index. Scroll back up through the output to find the total document count, which should look similar to this:

.
.
.
"total" : {
        "docs" : {
          "count" : 1000,
          "deleted" : 0
        },
.
.
.
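Rather than scrolling through the output, the count can also be pulled out of the stats response in Python. This is a rough sketch under a few assumptions: that the response nests per-index figures under indices, then the index name, then total.docs.count (matching the fragment shown above), and that Elasticsearch listens on localhost:9200. The names index_doc_count and fetch_stats are hypothetical, and the sample dictionary is made up for illustration.

```python
import json
from urllib.request import urlopen


def index_doc_count(stats, index='daintree'):
    """Return the total document count for one index from a _stats response."""
    return stats['indices'][index]['total']['docs']['count']


def fetch_stats(base_url='http://localhost:9200', index='daintree'):
    """Fetch the _stats document; this needs a running Elasticsearch node."""
    with urlopen('{}/{}/_stats'.format(base_url, index)) as resp:
        return json.loads(resp.read().decode('utf-8'))


# Made-up response, trimmed to the keys the helper reads.
sample_stats = {
    'indices': {
        'daintree': {
            'total': {'docs': {'count': 1000, 'deleted': 0}},
        },
    },
}
print(index_doc_count(sample_stats))
```

With a node running, index_doc_count(fetch_stats()) would report the live figure, which you can compare with Product.objects.count() in the Django shell.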

As you can see, all our data is now indexed. Next, we will add search to our home page using Elasticsearch.
