Modeling relationships

We saw in the previous sections how to model and store products and run various queries on them. The product data was partly structured and partly textual. What if we also had detailed features of the products available to us? We may have many different types of products, and each product may have completely different types of detailed features. For example, for products that fall into the Laptops category, we would have features such as screen size, processor type, and processor clock speed.

At the same time, products in the GPS Navigation Systems category may have features such as screen size, whether the GPS can speak street names, or whether free lifetime map updates are available.

Because we may have tens of thousands of products in hundreds of product categories, we may have tens of thousands of features. One solution might be to create one field for each feature. As you may remember from our earlier analogy between a field and a database column, the resulting data would look very sparse if we tried to show it in tabular format:

Title	Category	Screen Size	Processor Type	Clock Speed	Speaks Street Names	Map Updates
ThinkPad X1	Laptops	14 inches	core i5	2.3 GHz
Acer Predator	Laptops	15.6 inches	core i7	2.6 GHz
Trucker 600	GPS Navigation Systems	6 inches			Yes	Yes
RV Tablet 70	GPS Navigation Systems	7 inches			Yes	Yes

As you can see, products in the Laptops category populate one set of columns (the features related to that category), and products in the GPS Navigation Systems category populate a different set. If we were to model all products of all categories like this, we may end up with tens of thousands of fields (imagine a very wide table with tens of thousands of columns). With the data modeled in this way, certain types of queries would be hard to write, because the features are spread across so many different fields.

Instead, we could model this relationship between a product and its features as a one-to-many relationship as we would in a relational database. Let's see how we would have modeled it in a relational database.

The product table would be modeled as follows:

ProductID	Title	Category	Description	Other Product Columns...
c0001	ThinkPad X1	Laptops	...	...
c0002	Acer Predator	Laptops	...	...
c0003	Trucker 600	GPS Navigation Systems	...	...
c0004	RV Tablet 70	GPS Navigation Systems	...	...

The features would be modeled as a separate table, where ProductID and Feature together may form a composite primary key:

ProductID	Feature	FeatureValue
c0001	Screen Size	14 inches
c0001	Processor Type	core i5
c0001	Clock Speed	2.3 GHz
c0002	Screen Size	15.6 inches
...	...	...

To model a similar relationship in Elasticsearch, we can use the join datatype (https://www.elastic.co/guide/en/elasticsearch/reference/7.x/parent-join.html). To import the data, follow the steps mentioned in chapter-03/products_with_features_data.

Here, we want to establish a relationship between products and features. When using the join datatype, we still need to index everything into a single Elasticsearch index within a single Elasticsearch type. Remember, we can't have more than one type within a single index. The join datatype mapping that establishes the relationship is defined as follows:

PUT /amazon_products_with_features
{
  ...
  "mappings": {
    "doc": {
      "properties": {
        ...
        "product_or_feature": {
          "type": "join",
          "relations": {
            "product": "feature"
          }
        },
        ...
      }
    }
  }
}

The product_or_feature field is of type join, and it defines the relationship between product and feature. In the relations entry, product is on the left-hand side, which means that the product is the parent of the feature(s).

When indexing product records, we use the following syntax:

PUT /amazon_products_with_features/doc/c0001
{
  "description": "The Lenovo ThinkPad X1 Carbon 20K4002UUS has a 14 inch IPS Full HD LED display which makes each image and video appear sharp and crisp. The Thinkpad has an Intel Core i5 6200U 2.3 GHz Dual-core processor with Intel HD 520 graphics and 8 GB LPDDR3 SDRAM that gives lag free experience. It has a 180 GB SSD which makes all essential data and entertainment files handy. It supports 802.11ac and Bluetooth 4.1 and runs on Windows 7 Pro 64-bit downgrade from Windows 10 Pro 64-bit operating system. The ThinkPad X1 Carbon has two USB 3.1 Gen 1 ports which enables 10 times faster file transfer and has Gigabit ethernet for network communication. This notebook comes with 3 cell Lithium ion battery which gives upto 15.5 hours of battery life.",
  "price": "699.99",
  "id": "c0001",
  "title": "Lenovo ThinkPad X1 Carbon 20K4002UUS",
  "product_or_feature": "product",
  "manufacturer": "Lenovo"
}

The value of product_or_feature, which is set to product, indicates that this document is a product.

A feature record is indexed as follows:

PUT /amazon_products_with_features/doc/c0001_screen_size?routing=c0001
{
  "product_or_feature": {
    "name": "feature",
    "parent": "c0001"
  },
  "feature_key": "screen_size",
  "feature_value": "14 inches",
  "feature": "Screen Size"
}

Notice that, while indexing a feature, we need to specify the document's parent inside product_or_feature. We also need to set a routing parameter equal to the document ID of the parent, so that the child document gets indexed in the same shard as the parent. Please follow the instructions at chapter-03/products_with_features_data to load some sample data that has products and features.
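
Because a child document lives on the shard selected by its routing value, the same routing parameter is also needed when fetching that document by ID. The following is a minimal sketch that reuses the index, type, and document ID from the feature indexing request shown earlier:

```
GET /amazon_products_with_features/doc/c0001_screen_size?routing=c0001
```

Without the routing parameter, the lookup may be sent to a different shard and report the document as not found, depending on how many shards the index has.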

Once we have some products and features populated, we can query from products while joining the data from features. For example, you may want to get all of the products that have a certain feature.
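
One way to express such a query is the has_child query, which returns parent documents whose children match an inner query. The following is a sketch rather than a definitive query: the mappings of feature_key and feature_value are elided in the mapping shown earlier, so it assumes that feature_key can be matched exactly with a term query and that feature_value can be searched with a match query. The feature values are taken from the sample feature document indexed earlier.

```
GET /amazon_products_with_features/_search
{
  "query": {
    "has_child": {
      "type": "feature",
      "query": {
        "bool": {
          "must": [
            { "term": { "feature_key": "screen_size" } },
            {
              "match": {
                "feature_value": {
                  "query": "14 inches",
                  "operator": "and"
                }
              }
            }
          ]
        }
      }
    }
  }
}
```

The type inside has_child is the child relation name (feature) from the join mapping, not the index's document type. The hits are product documents; adding "inner_hits": {} inside has_child should also return the matching feature documents alongside each product.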
