Chapter 6. Document Relationships in NoSQL World

We have all grown up learning about relational data and databases. However, relational databases have their limitations, especially when it comes to full-text search. Because of these limitations, the world is quickly adopting NoSQL solutions, and although there are many NoSQL databases on the market, Elasticsearch has the upper hand because it combines the handling of relationships among different entities with powerful full-text search.

In this chapter, we will cover the following topics:

  • Managing relational data in Elasticsearch
  • Working with nested objects
  • Introducing parent-child relationships
  • Considerations for using document relationships

Relational data in the document-oriented NoSQL world

Relational databases run into many problems when dealing with massive amounts of data. Be it speed, efficient processing, effective parallelization, scalability, or cost, relational databases struggle as the volume of data grows. The other challenge is that relationships and schemas must be defined upfront. To overcome these problems, people started denormalizing data, dropping constraints, and relaxing transactional guarantees. Eventually, by compromising on these features, relational databases started to resemble NoSQL products. NoSQL is a combination of two terms, No and SQL. Some people say that it means no relational or no RDBMS, whereas others say that it means "not only SQL". Whatever the meaning, one thing is for sure: NoSQL is all about not following the rules of relational databases.

There is no doubt that document-oriented NoSQL databases have gone a long way toward overcoming the issues faced in relational databases, but one thing cannot be ignored while working with any kind of data: relationships.

Managing relational data in Elasticsearch

Elasticsearch is also a NoSQL document data store. However, despite being a NoSQL data store, Elasticsearch offers a lot of help in managing relational data to an extent. It does not provide full SQL-style joins across arbitrary documents, but it does offer join-like functionality and works very well with nested and related data entities.

For blog posts and their comments, or an employee and their work experience, the data is inherently relational. With Elasticsearch, you can preserve the associations between such entities while still getting powerful full-text search and analytics. Elasticsearch makes this possible by introducing two types of document relation models:

  • Nested relationships
  • Parent-child relationships

Both types of relationship follow the same one-to-many model: there is one root (parent) object that can have one or more child objects.

The following image is a visual representation of how nested and parent-child documents look in Elasticsearch:

[Figure: nested and parent-child documents in Elasticsearch]

As shown in the preceding image, in a nested relationship there is one root object, which is the main document, and it contains an array of sub-documents called nested documents. Nested documents can themselves contain nested documents, so a root object can hold several levels of nesting. For example, look at the following JSON with multilevel nesting:

{
  "location_id": "axdbyu",
  "location_name": "gurgaon",
  "company": [
    {
      "name": "honda",
      "modelName": [
        { "name": "honda cr-v", "price": "2 million" }
      ]
    },
    {
      "name": "bmw",
      "modelName": [
        { "name": "BMW 3 Series", "price": "2 million"},
        { "name": "BMW 1 Series", "price": "3 million" }
      ]
    }
  ]
}

The preceding example shows data in which each location can have multiple companies and each company has different models. Indexing this kind of data without a nested type will not serve our purpose if we need to find a particular model, by name or price, for a particular company at a given location. This type of relational data with a one-to-many relationship can be handled in Elasticsearch using nested types.
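To see why the nested type matters here, the following is a minimal Python sketch of what happens when an array of objects is indexed without it: the inner objects are flattened into multi-valued fields and the link between a company and its own models is lost. The `flatten`, `flat_match`, and `nested_match` helpers are hypothetical names written for this illustration; they only imitate the behavior and are not part of Elasticsearch.

```python
from collections import defaultdict


def flatten(doc, prefix=""):
    """Collapse inner objects into dotted field names holding flat value lists."""
    fields = defaultdict(list)
    for key, value in doc.items():
        path = prefix + key
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                # Recurse into the inner object and merge its values.
                for sub_path, sub_values in flatten(item, path + ".").items():
                    fields[sub_path].extend(sub_values)
            else:
                fields[path].append(item)
    return dict(fields)


doc = {
    "location_name": "gurgaon",
    "company": [
        {"name": "honda",
         "modelName": [{"name": "honda cr-v", "price": "2 million"}]},
        {"name": "bmw",
         "modelName": [{"name": "BMW 3 Series", "price": "2 million"},
                       {"name": "BMW 1 Series", "price": "3 million"}]},
    ],
}

flat = flatten(doc)


def flat_match(flat_doc, company, price):
    """Match against flattened fields: object boundaries are ignored."""
    return (company in flat_doc["company.name"]
            and price in flat_doc["company.modelName.price"])


def nested_match(source, company, price):
    """Match per company object, the way a nested query would."""
    return any(
        c["name"] == company
        and any(m["price"] == price for m in c["modelName"])
        for c in source["company"]
    )


# honda has no model priced at "3 million", yet the flattened form matches.
print(flat_match(flat, "honda", "3 million"))
print(nested_match(doc, "honda", "3 million"))
```

The flattened check succeeds because "honda" and "3 million" both appear somewhere in the document, even though they belong to different companies. Keeping each object as its own unit is what prevents this kind of false match.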

Nested fields are used to index arrays of objects so that each object can be queried (with the nested query) as an independent document. Internally, however, the root document and all of its nested documents are stored together in the same Lucene block. This makes joins at query time fast, but it comes at a cost on the storage side: because nested documents are stored as part of their root, they cannot be added, changed, or removed without reindexing the whole root document.
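As a rough illustration, the location document shown earlier could be mapped and queried along the following lines. The request bodies are built as plain Python dictionaries so that they can be inspected without a running cluster. The field types (`keyword`, `text`) assume a recent Elasticsearch version, and the `nested_query` helper is a hypothetical convenience function, not a client API.

```python
import json

# Mapping sketch: "company" and "company.modelName" are declared as nested,
# so every company and every model is indexed as its own hidden document.
mapping = {
    "properties": {
        "location_id": {"type": "keyword"},
        "location_name": {"type": "text"},
        "company": {
            "type": "nested",
            "properties": {
                "name": {"type": "text"},
                "modelName": {
                    "type": "nested",
                    "properties": {
                        "name": {"type": "text"},
                        "price": {"type": "text"},
                    },
                },
            },
        },
    }
}


def nested_query(path, inner):
    """Wrap an inner query clause in a nested query for the given path."""
    return {"nested": {"path": path, "query": inner}}


# Find locations where the company "bmw" has a model priced at "3 million".
search_body = {
    "query": nested_query(
        "company",
        {
            "bool": {
                "must": [
                    {"match": {"company.name": "bmw"}},
                    nested_query(
                        "company.modelName",
                        {"match": {"company.modelName.price": "3 million"}},
                    ),
                ]
            }
        },
    )
}

print(json.dumps(search_body, indent=2))
```

Each level of nesting needs its own nested clause with the full path, which is why the model condition is wrapped a second time inside the company condition.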


The parent-child relational model overcomes the storage problems of the nested model because related documents are not stored in the same Lucene block; they only need to live in the same shard. The parent and child are completely separate documents. Elasticsearch maintains an internal data structure that maps child document IDs to parent document IDs (similar to the foreign keys we define in relational databases).
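The idea can be pictured with a small conceptual model in Python. This is only a sketch of the two properties described above (children live on their parent's shard, and a child-to-parent ID map plays the role of a foreign key); it does not reflect Elasticsearch's actual internals. The `ParentChildIndex` class, the shard count, the document IDs, and the CRC32 hash are all assumptions made for illustration.

```python
import zlib

NUM_SHARDS = 5  # hypothetical shard count


def shard_for(routing_key):
    """Pick a shard from a routing key (CRC32 is a stand-in hash function)."""
    return zlib.crc32(routing_key.encode("utf-8")) % NUM_SHARDS


class ParentChildIndex:
    """Toy index: separate documents, linked by a child-to-parent ID map."""

    def __init__(self):
        self.docs = {}       # doc_id -> (shard, source)
        self.parent_of = {}  # child_id -> parent_id, like a foreign key

    def index_parent(self, doc_id, source):
        self.docs[doc_id] = (shard_for(doc_id), source)

    def index_child(self, doc_id, parent_id, source):
        # Route by the parent's ID so the child lands on the parent's shard.
        self.docs[doc_id] = (shard_for(parent_id), source)
        self.parent_of[doc_id] = parent_id

    def shard_of(self, doc_id):
        return self.docs[doc_id][0]

    def children_of(self, parent_id):
        return sorted(c for c, p in self.parent_of.items() if p == parent_id)


index = ParentChildIndex()
index.index_parent("axdbyu", {"location_name": "gurgaon"})
index.index_child("company-honda", "axdbyu", {"name": "honda"})
index.index_child("company-bmw", "axdbyu", {"name": "bmw"})

# A child can be reindexed on its own; the parent document is untouched.
index.index_child("company-bmw", "axdbyu", {"name": "bmw india"})

print(index.children_of("axdbyu"))
print(index.shard_of("company-bmw") == index.shard_of("axdbyu"))
```

Because parent and children are independent documents, updating one child does not force the parent or its siblings to be reindexed, which is the storage advantage over the nested model.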
