Modeling relationships

In the previous sections, we saw how to model and store products and how to run various queries on them. The product data was partly structured and partly textual. What if detailed product features were also available to us? We may have many different types of products, and each type may have a completely different set of detailed features. For example, products in the Laptops category would have features such as screen size, processor type, and processor clock speed.

At the same time, products in the GPS Navigation Systems category may have features such as screen size, whether the GPS can speak street names, and whether free lifetime map updates are available.

Because we may have tens of thousands of products in hundreds of product categories, we may have tens of thousands of features. One solution might be to create one field for each feature. As you may remember from our earlier analogy between a field and a database column, the resulting data would look very sparse if we showed it in tabular format:

| Title | Category | Screen Size | Processor Type | Clock Speed | Speaks Street Names | Map Updates |
|---|---|---|---|---|---|---|
| ThinkPad X1 | Laptops | 14 inches | core i5 | 2.3 GHz | | |
| Acer Predator | Laptops | 15.6 inches | core i7 | 2.6 GHz | | |
| Trucker 600 | GPS Navigation Systems | 6 inches | | | Yes | Yes |
| RV Tablet 70 | GPS Navigation Systems | 7 inches | | | Yes | Yes |

As you can see, products in the Laptops category populate one set of columns (the features related to that category), and products in the GPS Navigation Systems category populate a different set. If we modeled all products of all categories like this, we might end up with tens of thousands of fields (imagine a table tens of thousands of columns wide). Data modeled this way would make many queries hard to write, because every feature lives in its own field and a query has to name each field it touches.
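The following Python sketch puts a number on the sparsity. It treats each row of the table above as a dictionary in which a missing key stands for an empty cell. It then counts how many cells of a one-field-per-feature layout stay empty. The data comes from the table, and the `sparsity` helper is only an illustration.

```python
# Rows of the sparse table above; a missing key stands for an empty cell.
rows = [
    {"Title": "ThinkPad X1", "Category": "Laptops",
     "Screen Size": "14 inches", "Processor Type": "core i5", "Clock Speed": "2.3 GHz"},
    {"Title": "Acer Predator", "Category": "Laptops",
     "Screen Size": "15.6 inches", "Processor Type": "core i7", "Clock Speed": "2.6 GHz"},
    {"Title": "Trucker 600", "Category": "GPS Navigation Systems",
     "Screen Size": "6 inches", "Speaks Street Names": "Yes", "Map Updates": "Yes"},
    {"Title": "RV Tablet 70", "Category": "GPS Navigation Systems",
     "Screen Size": "7 inches", "Speaks Street Names": "Yes", "Map Updates": "Yes"},
]

def sparsity(rows):
    """Return (column count, fraction of empty cells) for a one-field-per-feature layout."""
    columns = {key for row in rows for key in row}   # union of all fields used by any row
    total_cells = len(rows) * len(columns)
    filled_cells = sum(len(row) for row in rows)
    return len(columns), (total_cells - filled_cells) / total_cells

columns, empty = sparsity(rows)
print(columns, round(empty, 2))   # 7 0.29
```

For these four products, 7 columns give 28 cells, and 8 of them are empty (about 29%). Every new category with its own features adds columns that stay empty for all the products outside it, so the empty fraction grows along with the catalog.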

Instead, we could model the relationship between a product and its features as a one-to-many relationship, as we would in a relational database. Let's see how we would model it in a relational database.

The product table would be modeled as follows:

| ProductID | Title | Category | Description | Other Product Columns... |
|---|---|---|---|---|
| c0001 | ThinkPad X1 | Laptops | ... | ... |
| c0002 | Acer Predator | Laptops | ... | ... |
| c0003 | Trucker 600 | GPS Navigation Systems | ... | ... |
| c0004 | RV Tablet 70 | GPS Navigation Systems | ... | ... |

The features would be modeled as a separate table, in which ProductID and Feature together could form a composite primary key:

| ProductID | Feature | FeatureValue |
|---|---|---|
| c0001 | Screen Size | 14 inches |
| c0001 | Processor Type | core i5 |
| c0001 | Clock Speed | 2.3 GHz |
| c0002 | Screen Size | 15.6 inches |
| ... | ... | ... |
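You can try out the two tables with Python's built-in sqlite3 module. The sketch below makes a few assumptions for illustration: it uses snake_case column names in place of ProductID, Title, Category, Feature, and FeatureValue, and it fills in feature rows for c0002 through c0004 using the values from the first table. It answers the question "which products have a given feature value?" with a join. The schema and the `products_with_feature` helper are illustrations, not a prescribed design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces REFERENCES only when enabled
conn.executescript("""
CREATE TABLE product (
    product_id TEXT PRIMARY KEY,
    title      TEXT NOT NULL,
    category   TEXT NOT NULL
);
CREATE TABLE feature (
    product_id    TEXT NOT NULL REFERENCES product(product_id),
    feature       TEXT NOT NULL,
    feature_value TEXT NOT NULL,
    PRIMARY KEY (product_id, feature)   -- composite key: one value per feature per product
);
""")
conn.executemany("INSERT INTO product VALUES (?, ?, ?)", [
    ("c0001", "ThinkPad X1", "Laptops"),
    ("c0002", "Acer Predator", "Laptops"),
    ("c0003", "Trucker 600", "GPS Navigation Systems"),
    ("c0004", "RV Tablet 70", "GPS Navigation Systems"),
])
conn.executemany("INSERT INTO feature VALUES (?, ?, ?)", [
    ("c0001", "Screen Size", "14 inches"),
    ("c0001", "Processor Type", "core i5"),
    ("c0001", "Clock Speed", "2.3 GHz"),
    ("c0002", "Screen Size", "15.6 inches"),
    ("c0002", "Processor Type", "core i7"),
    ("c0002", "Clock Speed", "2.6 GHz"),
    ("c0003", "Screen Size", "6 inches"),
    ("c0003", "Speaks Street Names", "Yes"),
    ("c0003", "Map Updates", "Yes"),
    ("c0004", "Screen Size", "7 inches"),
    ("c0004", "Speaks Street Names", "Yes"),
    ("c0004", "Map Updates", "Yes"),
])

def products_with_feature(feature, value):
    """Titles of products having the given feature value (one-to-many join)."""
    rows = conn.execute(
        """SELECT p.title
           FROM product AS p
           JOIN feature AS f ON f.product_id = p.product_id
           WHERE f.feature = ? AND f.feature_value = ?
           ORDER BY p.title""",
        (feature, value),
    )
    return [title for (title,) in rows]

print(products_with_feature("Speaks Street Names", "Yes"))   # ['RV Tablet 70', 'Trucker 600']
```

The composite primary key makes the one-to-many shape explicit. A product can have many features but only one row per (product, feature) pair, so inserting a second Screen Size for c0001 fails with `sqlite3.IntegrityError`.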

 

To model a similar relationship in Elasticsearch, we can use the join datatype (https://www.elastic.co/guide/en/elasticsearch/reference/7.x/parent-join.html). To import the data, follow the steps in chapter-03/products_with_features_data.

Here, we want to establish a relationship between products and features. When using the join datatype, we still need to index everything into a single Elasticsearch index within a single Elasticsearch type. Remember, we can't have more than one type within a single index. The join datatype mapping that establishes the relationship is defined as follows:

PUT /amazon_products_with_features
{
  ...
  "mappings": {
    "doc": {
      "properties": {
        ...
        "product_or_feature": {
          "type": "join",
          "relations": {
            "product": "feature"
          }
        },
        ...
      }
    }
  }
}

The product_or_feature field is of type join, and it defines the relationship between product and feature. The product relation appears on the left-hand side, which makes it the parent of the feature relation on the right-hand side. In other words, a product is the parent of its feature(s).
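The following toy Python model makes the parent/child reading of `relations` concrete. It assumes only what this mapping and the indexing examples that follow show. A relation maps a parent name to a child name, and Elasticsearch also accepts a list of child names. A parent document may give its relation as a plain string such as "product". A child document uses an object with `name` and `parent`. The helper names are hypothetical, and this is not how Elasticsearch validates documents internally.

```python
# Relations map from the mapping: parent relation name -> child relation name(s).
RELATIONS = {"product": "feature"}

def children_of(parent, relations=RELATIONS):
    """Child relation names of `parent` (a value may be one name or a list)."""
    value = relations.get(parent, [])
    return [value] if isinstance(value, str) else list(value)

def parent_of(child, relations=RELATIONS):
    """Parent relation of `child`, or None if `child` is not on a right-hand side."""
    for parent in relations:
        if child in children_of(parent, relations):
            return parent
    return None

def read_join_value(value, relations=RELATIONS):
    """Return (relation name, parent ID or None) for a product_or_feature value."""
    if isinstance(value, str):        # shorthand form, as in the product example
        name, parent_id = value, None
    else:                             # object form, as in the feature example
        name, parent_id = value["name"], value.get("parent")
    if name not in relations and parent_of(name, relations) is None:
        raise ValueError(f"unknown relation {name!r}")
    if parent_of(name, relations) is not None and parent_id is None:
        raise ValueError(f"{name!r} is a child relation and needs a parent ID")
    return name, parent_id

print(parent_of("feature"))                                      # product
print(read_join_value({"name": "feature", "parent": "c0001"}))   # ('feature', 'c0001')
```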

When indexing product records, we use the following syntax:

PUT /amazon_products_with_features/doc/c0001
{
  "description": "The Lenovo ThinkPad X1 Carbon 20K4002UUS has a 14 inch IPS Full HD LED display which makes each image and video appear sharp and crisp. The Thinkpad has an Intel Core i5 6200U 2.3 GHz Dual-core processor with Intel HD 520 graphics and 8 GB LPDDR3 SDRAM that gives lag free experience. It has a 180 GB SSD which makes all essential data and entertainment files handy. It supports 802.11ac and Bluetooth 4.1 and runs on Windows 7 Pro 64-bit downgrade from Windows 10 Pro 64-bit operating system. The ThinkPad X1 Carbon has two USB 3.1 Gen 1 ports which enables 10 times faster file transfer and has Gigabit ethernet for network communication. This notebook comes with 3 cell Lithium ion battery which gives upto 15.5 hours of battery life.",
  "price": "699.99",
  "id": "c0001",
  "title": "Lenovo ThinkPad X1 Carbon 20K4002UUS",
  "product_or_feature": "product",
  "manufacturer": "Lenovo"
}

The value of product_or_feature, which is set to product, indicates that this document is a product.

A feature record is indexed as follows:

PUT /amazon_products_with_features/doc/c0001_screen_size?routing=c0001
{
  "product_or_feature": {
    "name": "feature",
    "parent": "c0001"
  },
  "feature_key": "screen_size",
  "feature_value": "14 inches",
  "feature": "Screen Size"
}

Notice that, when indexing a feature, we need to specify the document's parent within product_or_feature. We also need to set the routing parameter to the document ID of the parent, so that the child document is indexed in the same shard as its parent. Elasticsearch can only join parent and child documents that live on the same shard. Please follow the instructions at chapter-03/products_with_features_data to load some sample data containing products and features.
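A simplified Python sketch can illustrate the routing rule. By default, Elasticsearch picks a shard from a hash of the document's routing value, which is the document `_id` unless a `routing` parameter is given. If the feature document were routed by its own ID, c0001_screen_size, it could land on a different shard than c0001. Passing `routing=c0001` makes the parent and the child hash the same value.

The `feature_request` helper is hypothetical and rebuilds the request shown above. Its "<parent>_<feature_key>" ID convention is an assumption drawn from that single example. `shard_for` uses CRC32 from the standard library as a stand-in for the Murmur3-based hash that Elasticsearch actually uses, so its shard numbers will not match a real cluster.

```python
import zlib

INDEX = "amazon_products_with_features"

def feature_request(parent_id, feature, value):
    """Return (path, body) for indexing one feature as a child of parent_id.

    Assumption: the child ID is "<parent_id>_<feature_key>", mirroring the
    c0001_screen_size example; the join field itself does not require this.
    """
    feature_key = feature.lower().replace(" ", "_")   # "Screen Size" -> "screen_size"
    path = f"/{INDEX}/doc/{parent_id}_{feature_key}?routing={parent_id}"
    body = {
        "product_or_feature": {"name": "feature", "parent": parent_id},
        "feature_key": feature_key,
        "feature_value": value,
        "feature": feature,
    }
    return path, body

def shard_for(routing_value, num_primary_shards):
    """Simplified shard selection: hash(routing) modulo the number of primary shards.

    Stand-in only: CRC32 replaces Elasticsearch's Murmur3-based routing hash.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards

path, body = feature_request("c0001", "Screen Size", "14 inches")
child_routing = path.split("routing=")[1]   # the value passed as ?routing=
print(path)   # /amazon_products_with_features/doc/c0001_screen_size?routing=c0001
# The parent c0001 is routed by its _id; the child by ?routing=c0001.
print(shard_for("c0001", 5) == shard_for(child_routing, 5))   # True
```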

Once some products and features are populated, we can query products while joining in data from their features. For example, you may want to get all of the products that have a certain feature.
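One way to express "all products that have a certain feature" in Elasticsearch is a `has_child` query, whose `type` names the child relation from the join mapping. The Python sketch below builds such a request body. So that the logic can be checked without a cluster, it also evaluates the same condition over an in-memory dictionary of documents.

The sketch assumes feature_key and feature_value are mapped as keyword fields, so `term` queries match exact values; the mapping above elides them. The helper names are hypothetical, and the sample documents are illustrations with values taken from this section's examples. The body could be sent with any HTTP client to `POST /amazon_products_with_features/_search`.

```python
import json

def has_child_query(feature_key, feature_value):
    """Search body: parents with at least one matching "feature" child."""
    return {
        "query": {
            "has_child": {
                "type": "feature",   # child relation name from the join mapping
                "query": {
                    "bool": {
                        "filter": [
                            {"term": {"feature_key": feature_key}},
                            {"term": {"feature_value": feature_value}},
                        ]
                    }
                },
            }
        }
    }

def products_having(docs, feature_key, feature_value):
    """In-memory stand-in for has_child: sorted IDs of products with a matching feature."""
    parents = {
        doc["product_or_feature"]["parent"]
        for doc in docs.values()
        if isinstance(doc["product_or_feature"], dict)
        and doc["product_or_feature"]["name"] == "feature"
        and doc.get("feature_key") == feature_key
        and doc.get("feature_value") == feature_value
    }
    return sorted(
        doc_id for doc_id, doc in docs.items()
        if doc["product_or_feature"] == "product" and doc_id in parents
    )

docs = {
    "c0001": {"product_or_feature": "product", "title": "Lenovo ThinkPad X1 Carbon 20K4002UUS"},
    "c0001_screen_size": {"product_or_feature": {"name": "feature", "parent": "c0001"},
                          "feature_key": "screen_size", "feature_value": "14 inches"},
    "c0002": {"product_or_feature": "product", "title": "Acer Predator"},
    "c0002_screen_size": {"product_or_feature": {"name": "feature", "parent": "c0002"},
                          "feature_key": "screen_size", "feature_value": "15.6 inches"},
}

print(json.dumps(has_child_query("screen_size", "14 inches"), indent=2))
print(products_having(docs, "screen_size", "14 inches"))   # ['c0001']
```

Putting the inner conditions in `filter` rather than `must` skips scoring them. That is enough here, because we only care whether a product has the feature.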
