Exploring Elasticsearch

Before we integrate Elasticsearch with our Django application, let's take some time to explore it. We'll look at how to get data into Elasticsearch and how to use its search features to get back the results we want. We won't go into much detail about search here, as we'll return to it when we build the search page for our application, but we will get a basic overview of how Elasticsearch works and how it can be useful to us.

To start, download Elasticsearch from https://www.elastic.co/downloads/elasticsearch. The output shown in this chapter comes from version 2.2.0, so a newer release may respond differently in places. You will need Java installed on your system to run Elasticsearch, so install that as well if you don't already have it. You can get Java from https://java.com/en/download/. Once you have downloaded Elasticsearch, extract the files from the compressed archive into a folder, open a new terminal session, and cd to this folder. Next, cd into the bin folder and run the following command:

> ./elasticsearch
.
.
.
[2016-03-06 17:53:53,091][INFO ][http                     ] [Marvin Flumm] publish_address {127.0.0.1:9200}, bound_addresses {[fe80::1]:9200}, {[::1]:9200}, {127.0.0.1:9200}
[2016-03-06 17:53:53,092][INFO ][node                     ] [Marvin Flumm] started
[2016-03-06 17:53:53,121][INFO ][gateway                  ] [Marvin Flumm] recovered [0] indices into cluster_state

Running the Elasticsearch binary should produce a lot of output, and it will be different from what I've pasted here. However, you should still see the two messages "started" and "recovered [0] indices into cluster_state" at the end of the output. This means that Elasticsearch is now running on your system. That wasn't so hard! Of course, running Elasticsearch in production is a bit different, and the Elasticsearch documentation provides a lot of information about how to deploy it for different use cases.

We only cover the basics of Elasticsearch in this chapter, as our focus is on the integration between Django and Elasticsearch. If you ever find yourself stuck or need some questions answered, do take a look at the documentation; it really is quite extensive and thorough. You can find it at https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html. There is also a book-style guide available at https://www.elastic.co/guide/en/elasticsearch/guide/current/index.html if you want to spend more time learning Elasticsearch.

First steps with Elasticsearch

Now that we have Elasticsearch running, what can we do with it? Well, for starters, you need to know that Elasticsearch exposes its functionality over a simple HTTP API, so you don't need any special libraries to communicate with it. Most programming languages, including Python, provide the means to make HTTP requests. However, there are a couple of libraries that add another layer of abstraction over HTTP and make working with Elasticsearch easier. We'll get into those later.

For now, let's open up this URL in our browsers:

http://localhost:9200/?pretty

This should give you an output similar to this:

{
  "name" : "Marvin Flumm",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "2.2.0",
    "build_hash" : "8ff36d139e16f8720f2947ef62c8167a888992fe",
    "build_timestamp" : "2016-01-27T13:32:39Z",
    "build_snapshot" : false,
    "lucene_version" : "5.4.1"
  },
  "tagline" : "You Know, for Search"
}

While most of the values will be different, the structure of the response should roughly be the same. This simple test lets us know that Elasticsearch is working properly on our system.
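Since the API is plain HTTP, the same check can be made from Python using only the standard library. The following is a minimal sketch rather than a recommended client: it assumes Elasticsearch is listening on the default http://localhost:9200 address shown in the startup log, and the function names (build_info_request, fetch_json, describe_node) are made up for this illustration.

```python
import json
import urllib.request

# Assumed address: the default host and port from the startup log.
ES_URL = "http://localhost:9200"


def build_info_request(base_url=ES_URL):
    """Build a GET request for the root endpoint, which reports node info."""
    return urllib.request.Request(base_url + "/", method="GET")


def fetch_json(request, timeout=5):
    """Send the request and decode the JSON response body into a dict."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def describe_node(info):
    """Summarize the node-info response in a single line."""
    return "{} is running Elasticsearch {}".format(
        info["name"], info["version"]["number"]
    )
```

With a node running, describe_node(fetch_json(build_info_request())) should produce a one-line summary built from the same name and version number that the browser displayed.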

Now we'll do a quick walkthrough where we insert, retrieve, and search for a couple of products. I won't go into a lot of details, but if you are interested, you should look at the documentation of Elasticsearch that I mentioned before.

Note

You will need to have a working copy of the curl command-line utility installed on your machine to perform the steps in this section. It should be available by default on Linux and Unix platforms, including Mac OS X. If you're on Windows, you can get a copy from https://curl.haxx.se/download.html.

Open a new terminal window as our current one has Elasticsearch running in it. Next, type in the following:

> curl -XPUT http://localhost:9200/daintree/products/1 -d '{"name": "Django Blueprints", "category": "Book", "price": 50, "tags": ["django", "python", "web applications"]}'
{"_index":"daintree","_type":"products","_id":"1","_version":1,"_shards":{"total":2,"successful":1,"failed":0},"created":true}
> curl -XPUT http://localhost:9200/daintree/products/2 -d '{"name": "Elasticsearch Guide", "category": "Book", "price": 100, "tags": ["elasticsearch", "java", "search"]}'
{"_index":"daintree","_type":"products","_id":"2","_version":1,"_shards":{"total":2,"successful":1,"failed":0},"created":true}

Most of the Elasticsearch APIs accept JSON objects. Here, we are asking Elasticsearch to PUT two documents, with IDs 1 and 2, into its storage. It may look complicated, but let me explain what's happening here.

In a database server, you have databases, tables, and rows. Your database is like a namespace where all of your tables live. Tables define the overall shape of the data that you want to store, and each row is one unit of that data. Elasticsearch has a slightly different way of working with data.

In place of a database, Elasticsearch has an index. Tables are called document types and live inside indexes. Finally, the rows, or documents as Elasticsearch calls them, are stored inside the document type. In our preceding example, we told Elasticsearch to PUT a document with ID 1 in the products document type, which lives in the daintree index. One thing that we didn't do here is define the document structure. That's because Elasticsearch doesn't require a set structure. It will dynamically update the structure of its tables (the document types) as you insert new documents.
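To make the mapping from index, document type, and ID to a URL concrete, here is a rough Python equivalent of the first curl command. It is only a sketch: the helper names are invented for this example, the address is the assumed default, and the Content-Type header is an addition that the curl commands in this chapter do not send (it is harmless here, and newer Elasticsearch releases require it). The code builds the request without sending it.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed default local address


def document_url(index, doc_type, doc_id, base_url=ES_URL):
    """Map index / document type / ID onto the path Elasticsearch expects."""
    return "{}/{}/{}/{}".format(base_url, index, doc_type, doc_id)


def build_index_request(index, doc_type, doc_id, document):
    """Build a PUT request that stores `document` at the given location."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        document_url(index, doc_type, doc_id),
        data=body,
        method="PUT",
        # Not part of the original curl command; added as an assumption.
        headers={"Content-Type": "application/json"},
    )


book = {
    "name": "Django Blueprints",
    "category": "Book",
    "price": 50,
    "tags": ["django", "python", "web applications"],
}
request = build_index_request("daintree", "products", 1, book)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would send it to a running node. Note that nothing in the document dictionary declares field types; the structure is inferred from the data, as described above.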

Let's try retrieving the first document that we inserted. Run this command:

> curl -XGET 'http://localhost:9200/daintree/products/1?pretty=true'
{
  "_index" : "daintree",
  "_type" : "products",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "name" : "Django Blueprints",
    "category" : "Book",
    "price" : 50,
    "tags" : [ "django", "python", "web applications" ]
  }
}

As you can see, the API for Elasticsearch is simple and intuitive. We used a PUT HTTP request when we wanted to insert a document. When we want to retrieve one, we use a GET HTTP request with the same path that we used when inserting the document. We get back a bit more information than we inserted. Our document is in the _source field, and the rest of the fields are metadata that Elasticsearch stores with each document.
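The split between the document and its metadata is easy to see once the response is parsed. The sketch below works on the response text copied from the GET request above, so it runs without a server; split_response is a hypothetical helper name, and raising LookupError when found is false is a choice made for this example.

```python
import json

# Response body copied from the GET request shown earlier.
RAW_RESPONSE = """
{
  "_index" : "daintree",
  "_type" : "products",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "name" : "Django Blueprints",
    "category" : "Book",
    "price" : 50,
    "tags" : [ "django", "python", "web applications" ]
  }
}
"""


def split_response(response):
    """Separate the stored document from the metadata fields around it."""
    if not response.get("found", False):
        raise LookupError("document not found")
    metadata = {
        key: value for key, value in response.items() if key != "_source"
    }
    return metadata, response["_source"]


metadata, document = split_response(json.loads(RAW_RESPONSE))
print(document["name"], metadata["_version"])
```

Here document is exactly the JSON object we sent with the PUT request, while metadata holds everything Elasticsearch added around it.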

Now we look at the star of the show: searching! Let's see how to do a simple search for products with the word Django in their name. Run the following command:

> curl -XGET 'http://localhost:9200/daintree/products/_search?q=name:Django&pretty'
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.19178301,
    "hits" : [ {
      "_index" : "daintree",
      "_type" : "products",
      "_id" : "1",
      "_score" : 0.19178301,
      "_source" : {
        "name" : "Django Blueprints",
        "category" : "Book",
        "price" : 50,
        "tags" : [ "django", "python", "web applications" ]
      }
    } ]
  }
}

The result is what you would have expected for this search. Elasticsearch returned only the one document that had the term Django in its name and skipped the other one. This is called the lite search or query-string search, as our query is sent as part of the query string parameters. However, this method quickly gets difficult to use for complicated queries with multiple parameters. For those queries, Elasticsearch provides a full query DSL, which uses JSON to specify the query. Let's take a look at how we could do this same search using the query DSL:
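When building a query-string search from code, the main thing to get right is escaping the query parameters. The following sketch builds the URL used by the lite search above with the standard library; build_lite_search_url is an invented name, and the base address is the assumed default.

```python
from urllib.parse import urlencode

ES_URL = "http://localhost:9200"  # assumed default local address


def build_lite_search_url(index, doc_type, field, term, base_url=ES_URL):
    """Build a query-string ("lite") search URL for a single field and term."""
    # urlencode percent-encodes reserved characters, so the colon in
    # "name:Django" becomes %3A; the server decodes it back on arrival.
    query = urlencode({"q": "{}:{}".format(field, term), "pretty": "true"})
    return "{}/{}/{}/_search?{}".format(base_url, index, doc_type, query)


url = build_lite_search_url("daintree", "products", "name", "Django")
print(url)
```

Every extra condition has to be squeezed into the single q parameter, which is part of why this style becomes awkward for larger queries.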

> curl -XGET 'http://localhost:9200/daintree/products/_search?pretty' -d '{"query": {"match": {"name": "Django"}}}'
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.19178301,
    "hits" : [ {
      "_index" : "daintree",
      "_type" : "products",
      "_id" : "1",
      "_score" : 0.19178301,
      "_source" : {
        "name" : "Django Blueprints",
        "category" : "Book",
        "price" : 50,
        "tags" : [ "django", "python", "web applications" ]
      }
    } ]
  }
}

This time, instead of passing a query parameter, we send a body with the GET request. The body is the JSON query that we wish to execute. I won't be explaining the query DSL here. It has a lot of features and is quite powerful, and it would take another book to explain it properly. In fact, a couple of books have been written that explain the DSL fully. However, for simple usages like this, you can easily guess what's happening. If you want further details, I again suggest taking a look at the Elasticsearch documentation.
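Because the query DSL is just JSON, a query can be assembled in Python as an ordinary dictionary. The sketch below builds the same match query and pulls the interesting fields out of a search response. The helper names are made up for this example, the address is the assumed default, and the sample response is a trimmed copy of the output above so that the code runs without a server.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed default local address


def build_match_request(index, doc_type, field, text, base_url=ES_URL):
    """Build a search request whose body is a query DSL match query."""
    query = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        "{}/{}/{}/_search".format(base_url, index, doc_type),
        data=json.dumps(query).encode("utf-8"),
        method="GET",  # mirrors the curl command, which sends a body with GET
    )


def summarize_hits(response):
    """Return (id, score, name) for every hit in a search response."""
    return [
        (hit["_id"], hit["_score"], hit["_source"]["name"])
        for hit in response["hits"]["hits"]
    ]


# Trimmed copy of the search response shown above.
sample_response = {
    "hits": {
        "total": 1,
        "max_score": 0.19178301,
        "hits": [
            {
                "_index": "daintree",
                "_type": "products",
                "_id": "1",
                "_score": 0.19178301,
                "_source": {"name": "Django Blueprints", "price": 50},
            }
        ],
    }
}

request = build_match_request("daintree", "products", "name", "Django")
print(request.data.decode("utf-8"))
print(summarize_hits(sample_response))
```

The matching documents live in the nested hits list, each carrying the same metadata and _source fields that a direct GET returns, plus a relevance score.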
