Join datatypes

As you may recall from the Field datatype section of Chapter 4, Mapping APIs, Elasticsearch supports a special datatype called join, which creates a single-level or multiple-level parent/child relationship between documents in the same index. You define the set of possible relationships in the mapping by naming a parent and a child. We can establish a one-to-one relationship with the announcement, which is an array of dividend records, or a one-to-many relationship with each individual dividend record. According to the recommendations, you should only use the join datatype when your data contains a one-to-many relationship and the number of entities on the "many" side significantly exceeds the number on the other side. Since there is a kind of ETF called a dividend-paying ETF, which pays dividends at a higher frequency, we will use the join datatype to expand our system.

To make this easier to understand, we use ACWF as the sample ETF, which has seven dividend records. The aforementioned GitHub repository contains four files that you can download to practice with the join datatype: cf_etf_dividend_join_mappings.json, cf_etf_acwf_join.json, cf_etf_dividend_join_bulk.json, and cf_etf_dividend_join_bulk_index.sh. Running the bash shell file creates the cf_etf_dividend_join index with mappings that define a cf_etf_dividend relation between the cf_etf and dividend entities, as shown in the following screenshot:
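As a rough sketch of what such a mapping could look like, the following Python snippet builds a request body with a join field. It assumes that the join field itself is named cf_etf_dividend and that cf_etf is the parent of dividend; the actual content of cf_etf_dividend_join_mappings.json may differ and will contain the other ETF fields as well.

```python
import json

INDEX_NAME = "cf_etf_dividend_join"


def build_join_mappings(join_field="cf_etf_dividend", parent="cf_etf", child="dividend"):
    """Return an index-creation body that declares one parent/child relation.

    The join field name is an assumption made for this sketch.
    """
    return {
        "mappings": {
            "properties": {
                join_field: {
                    "type": "join",
                    # Key is the parent relation name, value is the child name.
                    "relations": {parent: child},
                }
            }
        }
    }


mappings = build_join_mappings()

# The body would be sent as: PUT /cf_etf_dividend_join
print(f"PUT /{INDEX_NAME}")
print(json.dumps(mappings, indent=2))
```

The relations object maps each parent name to its child name, so adding further levels or additional children means adding entries to this one object.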

For simplicity, we use only one ETF, ACWF, for the parent document. After the cf_etf_dividend_join index is created, the bash shell file will index the parent document (cf_etf_acwf_join.json), as shown in the following screenshot:

Because we use a bash script to perform the indexing operation, we index the parent document with a known document identifier, 12345678, so that the same identifier can be used to refer to the parent document when indexing the child documents. This is not necessary when you index programmatically, because the identifier generated for the parent can be read from the indexing response.
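To illustrate the parent indexing step, here is a minimal sketch of the request path and body. It assumes the join field is named cf_etf_dividend; the symbol field is a hypothetical stand-in for the real ETF fields stored in cf_etf_acwf_join.json.

```python
import json

INDEX_NAME = "cf_etf_dividend_join"
PARENT_ID = "12345678"  # the known identifier used in the text


def build_parent_request(symbol):
    """Return the (path, body) pair for indexing a parent ETF document."""
    path = f"/{INDEX_NAME}/_doc/{PARENT_ID}"
    body = {
        "symbol": symbol,  # hypothetical field, for illustration only
        # A parent document only names its side of the relation.
        "cf_etf_dividend": {"name": "cf_etf"},
    }
    return path, body


path, body = build_parent_request("ACWF")

# The body would be sent as: PUT /cf_etf_dividend_join/_doc/12345678
print("PUT", path)
print(json.dumps(body))
```

Because the identifier appears in the request path, the script does not need to parse any response before indexing the children.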

After the parent document is indexed, we index the child documents (cf_etf_dividend_join_bulk.json). There are seven dividend records for the ACWF ETF; therefore, there are seven indexing operations in the _bulk operation. The following screenshot shows the _bulk operation for indexing these seven dividend records with the routing value:
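The shape of such a _bulk request can be sketched as follows. Each child needs an action line carrying the routing value and a document line whose join field names the child relation and the parent identifier. The join field name cf_etf_dividend and the record_no field are assumptions for this sketch; the placeholder records are not the real ACWF dividend data.

```python
import json

INDEX_NAME = "cf_etf_dividend_join"
PARENT_ID = "12345678"


def build_bulk_body(records):
    """Return an NDJSON _bulk body that indexes each record as a child."""
    lines = []
    for record in records:
        # Action line: route the child to the shard that holds the parent.
        action = {"index": {"_index": INDEX_NAME, "routing": PARENT_ID}}
        # Document line: the join field names the relation and the parent.
        doc = dict(record)
        doc["cf_etf_dividend"] = {"name": "dividend", "parent": PARENT_ID}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The _bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


# Seven placeholder records standing in for the seven dividend records.
records = [{"record_no": n} for n in range(1, 8)]
bulk_body = build_bulk_body(records)

# The body would be sent as: POST /_bulk
print(bulk_body, end="")
```

Seven records produce fourteen lines, one action line and one document line per indexing operation.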

Since the parent and child documents must be indexed on the same shard, the routing value is required. Recall that the default routing value is the document identifier, so the parent document was routed by its own identifier, 12345678; each child document must therefore specify 12345678 as its routing value.
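The reason a shared routing value keeps the documents together can be shown with a simplified model of shard selection, in which the shard is a hash of the routing value modulo the number of primary shards. This is only a sketch: Elasticsearch uses its own hash function and additional factors, so zlib.crc32 and the shard count of 5 below are stand-ins chosen for illustration.

```python
import zlib


def pick_shard(routing, num_primary_shards):
    """Simplified shard selection: hash(routing) modulo the shard count.

    zlib.crc32 is a stand-in hash, not the one Elasticsearch uses.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


NUM_SHARDS = 5  # assumed number of primary shards
PARENT_ID = "12345678"

# The parent is routed by its own identifier (the default routing value).
parent_shard = pick_shard(PARENT_ID, NUM_SHARDS)

# Each of the seven children passes the parent identifier as its routing value.
child_shards = {pick_shard(PARENT_ID, NUM_SHARDS) for _ in range(7)}

print("parent shard:", parent_shard)
print("child shards:", sorted(child_shards))
```

Since the hash input is identical for the parent and every child, they all map to the same shard regardless of which hash function is used. A child routed by its own auto-generated identifier would have no such guarantee.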
