Adding comment data with Elasticsearch aggregation

With a blog stored in the system, a user can now add a comment to it. As discussed, the Comment document type is defined as a nested type in the blog document. This means the blog document contains an array of comment objects, forming a one-to-many relationship. We also need a comment model class, as follows:

import java.util.Date;

import com.fasterxml.jackson.annotation.JsonFormat;

public class Comment {
  private String id;
  private String blogId;
  private String parentId;
  private int childSequence;
  private String position;
  private String status;
  private int level;
  private String user;
  private String emailAddress;
  private String commentText;

  // Serialize the creation date as a formatted string rather than a timestamp
  @JsonFormat(shape = JsonFormat.Shape.STRING, pattern = "MM-dd-yyyy'T'HH:mm:ss")
  private Date createdDate;

//Getter and Setter methods
.....
}

Since Comment is nested within a blog, it does not need the @Document annotation, because it is not mapped to a document type of its own. When adding a comment, the following metadata needs to be maintained:

Comments support replies. When a user replies to a comment, the reply is added one level down as a child comment. To track this, we use the level attribute, which records the level at which the comment is placed.
The blogId attribute holds the ID of the blog with which the comment is associated. Because a comment is a nested object, it usually does not need the ID of its parent document. However, we are going to show the comment list to an admin user for moderation and replies, so we store blogId in the comment to keep comment administration simple.
The parentId attribute holds the ID of the parent comment if the comment is a reply; otherwise, it is zero.
The childSequence attribute holds the sequence number of the comment at a particular level. For example, if there are already two replies (at the second level) and a user adds a third reply (at the second level), then its childSequence attribute will be three. This attribute is used to construct the value of the position attribute.
The position attribute is a combination of level and childSequence. It is used to sort the comments so that they are displayed in the correct order for a given blog.
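The text does not say exactly how level and childSequence are combined into position. The following is a minimal sketch under an assumed format of "<level>.<childSequence>", with level one as the top level; the CommentOrderSketch class, its Entry type, and both helper methods are hypothetical and only illustrate how such a value could be built and ordered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper; not part of the Blogpress source.
class CommentOrderSketch {

    // Trimmed-down stand-in for Comment that keeps only the ordering fields.
    static final class Entry {
        final int level;
        final int childSequence;
        final String position;

        Entry(int level, int childSequence) {
            this.level = level;
            this.childSequence = childSequence;
            this.position = buildPosition(level, childSequence);
        }
    }

    // Assumed format "<level>.<childSequence>"; the source does not specify one.
    static String buildPosition(int level, int childSequence) {
        return level + "." + childSequence;
    }

    // Orders numerically by level, then childSequence, so "2.10" follows "2.9".
    static List<String> sortedPositions(List<Entry> entries) {
        List<Entry> copy = new ArrayList<>(entries);
        copy.sort(Comparator.comparingInt((Entry e) -> e.level)
                .thenComparingInt(e -> e.childSequence));
        List<String> positions = new ArrayList<>();
        for (Entry e : copy) {
            positions.add(e.position);
        }
        return positions;
    }

    public static void main(String[] args) {
        List<Entry> entries = Arrays.asList(
                new Entry(2, 10), new Entry(1, 2), new Entry(2, 9), new Entry(1, 1));
        System.out.println(sortedPositions(entries)); // prints [1.1, 1.2, 2.9, 2.10]
    }
}
```

The sketch sorts on the numeric parts rather than on the position string itself, because a plain string comparison would place "2.10" before "2.9".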

Since a comment is a nested type of blog, there is no method that saves a comment on its own. Instead, we need to fetch the blog with all of its comments, add the new comment to it, and then save the whole blog. Everything is straightforward except getting the value of childSequence. We can get the maximum childSequence at a given level with the following aggregate query:

GET blog/blog/_search
{
  "query": {
    "match": {
      "_id": "1huEWWYB1CjEZ-A9sjir"
    }
  },
  "aggs": {
    "aggChild": {
      "nested": {
        "path": "comments"
      },
      "aggs": {
        "filterParentId": {
          "filter": {
            "nested": {
              "path": "comments",
              "query": {
                "match": {
                  "comments.parentId": "0"
                }
              }
            }
          },
          "aggs": {
            "maxChildSeq": {
              "max": {
                "field": "comments.childSequence"
              }
            }
          }
        }
      }
    }
  }
}

Before we can understand the query, we need to look at what an aggregation is. In Elasticsearch, an aggregation is a mechanism that provides aggregated data based on a search query. Aggregations can be combined to build complex summaries of the data. They fall into four categories, as follows:

Bucketing
Metric
Matrix
Pipeline
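As a rough in-memory analogy for the first two categories (not Elasticsearch code), a bucketing aggregation groups documents into buckets, and a metric aggregation computes a number over a set of documents. The sketch below groups hypothetical comment items by level and computes a count and a maximum childSequence inside each group; the AggregationAnalogySketch class and its Item type are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical analogy using Java streams; not an Elasticsearch API.
class AggregationAnalogySketch {

    static final class Item {
        final int level;
        final int childSequence;

        Item(int level, int childSequence) {
            this.level = level;
            this.childSequence = childSequence;
        }
    }

    // groupingBy plays the role of a bucket; summarizingInt plays the role of
    // metrics (count, max, and so on) computed inside each bucket.
    static Map<Integer, IntSummaryStatistics> statsPerLevel(List<Item> items) {
        return items.stream().collect(Collectors.groupingBy(
                (Item item) -> item.level,
                TreeMap::new,
                Collectors.summarizingInt((Item item) -> item.childSequence)));
    }

    public static void main(String[] args) {
        List<Item> items = Arrays.asList(
                new Item(1, 1), new Item(1, 3), new Item(2, 1), new Item(2, 2));
        statsPerLevel(items).forEach((level, stats) -> System.out.println(
                "level " + level + ": count=" + stats.getCount() + ", max=" + stats.getMax()));
    }
}
```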

Each of these aggregation types can be nested, meaning that one aggregation can be used as a sub-aggregation of another to answer very complex questions. Now, let's go back to the query that finds childSequence and understand it.

The first query criterion matches the blog ID (_id). Any attribute given in the root-level query criteria is matched against the attributes of the blog document. Next comes the aggregate query, which is applied to the nested comments document. Each aggregate query has a name; the first one is named aggChild.

Going further, the next aggregate query, named filterParentId, matches parentId, which is the ID of the parent comment. It is required to find the childSequence under a given parent comment. For top-level comments, this must be zero. The last aggregate query, named maxChildSeq, finds the maximum value of childSequence by using the max aggregation. Each nested aggregate query applies its criteria to the results produced by the preceding aggregate query. The results of this query will be similar to the following:

  "aggregations": {
    "aggChild": {
      "doc_count": 4,
      "filterParentId": {
        "doc_count": 2,
        "maxChildSeq": {
          "value": 3
        }
      }
    }
  }

The query result contains other information, but we will focus only on the aggregations. The result shows a document count for each aggregate query. A maxChildSeq value of three means that the highest childSequence among the top-level comments (level one) is three, so when a user adds a new top-level comment, its childSequence will be four.
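To make the logic of the three aggregation steps concrete, here is a minimal in-memory sketch that does the same work on a plain list: keep the comments with a given parentId, take the maximum childSequence, and add one. The ChildSequenceSketch class and its CommentRef type are hypothetical stand-ins, and the sample data is chosen to match the counts in the result shown above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory equivalent of the aggregate query; not Elasticsearch code.
class ChildSequenceSketch {

    // Stand-in for a nested comment, keeping only the fields the query uses.
    static final class CommentRef {
        final String parentId;
        final int childSequence;

        CommentRef(String parentId, int childSequence) {
            this.parentId = parentId;
            this.childSequence = childSequence;
        }
    }

    // filter(...) mirrors filterParentId, max() mirrors maxChildSeq.
    // When no comment matches, the maximum defaults to 0, so the result is 1.
    static int nextChildSequence(List<CommentRef> comments, String parentId) {
        int max = comments.stream()
                .filter(c -> parentId.equals(c.parentId))
                .mapToInt(c -> c.childSequence)
                .max()
                .orElse(0);
        return max + 1;
    }

    public static void main(String[] args) {
        // Four comments in total, two of them top-level (parentId "0").
        List<CommentRef> comments = Arrays.asList(
                new CommentRef("0", 1), new CommentRef("0", 3),
                new CommentRef("c1", 1), new CommentRef("c1", 2));
        System.out.println(nextChildSequence(comments, "0")); // prints 4
    }
}
```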

This was the REST-based query. For the Blogpress application, we need to execute similar queries from a Java class. Elasticsearch provides Java APIs that can do anything that can be done through a REST query. When we add the Elasticsearch starter to a Spring Boot project, the required Elasticsearch JAR files are available on the classpath. To write the preceding query with the Java APIs, we need to write a custom fetch method in our Elasticsearch repository.

Spring Data is an extensible framework that allows us to provide a customized repository implementation on top of what it offers out of the box. So first, we will extend the Elasticsearch repository with the following steps.

Define a custom repository interface called BlogRepositoryCustom.
The BlogRepository interface that we created initially should extend this interface, along with ElasticsearchRepository<Blog, String>, as follows:

public interface BlogRepository extends ElasticsearchRepository<Blog, String>, BlogRepositoryCustom {
}

Define the custom repository implementation class that implements the BlogRepositoryCustom interface as follows:

@Repository
public class BlogRepositoryCustomImpl implements BlogRepositoryCustom {

  private static final Logger logger = LoggerFactory.getLogger(BlogRepositoryCustomImpl.class);
  
  @Autowired
  private ElasticsearchTemplate elasticsearchTemplate;
  
  ....
  
}

This class must be declared with the @Repository annotation. We can define any custom method in this class. We want to write a method with an Elasticsearch Java API to find the maximum child sequence at a given level, so we will write it in this class as follows:

public int getCurrentChildSequence(String blogId, String parentCommentId) {
  int currentChildSeq = 0;

  // Filter the nested comments by parent comment ID, then take the max childSequence
  TermQueryBuilder termQueryBuilder = new TermQueryBuilder("comments.parentId", parentCommentId);
  NestedAggregationBuilder aggregationBuilder = AggregationBuilders.nested("aggChild", "comments")
      .subAggregation(AggregationBuilders.filter("filterParentId", termQueryBuilder)
          .subAggregation(AggregationBuilders.max("maxChildSeq").field("comments.childSequence")));

  // Root-level query: restrict the search to the given blog
  TermQueryBuilder rootTermQueryBuilder = new TermQueryBuilder("_id", blogId);
  SearchResponse response = elasticsearchTemplate.getClient().prepareSearch("blog")
      .setTypes("blog")
      .setQuery(rootTermQueryBuilder)
      .addAggregation(aggregationBuilder)
      .execute().actionGet();

  if (response != null && response.getAggregations() != null) {
    List<Aggregation> aggLst = response.getAggregations().asList();
    // Guard against an empty list before reading the first aggregation
    if (aggLst != null && !aggLst.isEmpty() && aggLst.get(0) != null) {
      // getMaxChildSequenceFromJson parses the JSON to get the max child sequence
      currentChildSeq = getMaxChildSequenceFromJson(aggLst.get(0).toString());
    }
  }
  // Add one to get the next sequence
  return currentChildSeq + 1;
}

The AggregationBuilders class is used to construct an aggregate query. The Elasticsearch Java API is self-explanatory and simple. You can easily relate this Java API query with a REST query. We first create a nested aggregate query and then add a filter aggregate query as a sub-aggregation followed by a max aggregation.

The value of blogId is added with a TermQueryBuilder class. Finally, we get an Elasticsearch client from elasticsearchTemplate and initiate the search by providing an index name (blog), a document type (blog), and a root-level query (for blogId), and then adding the aggregation. This Java API returns the same aggregation JSON that the REST query returned, which you can process with a JSON API to get the desired result.
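The getMaxChildSequenceFromJson helper is called in the method but not listed in the text. The following is one possible sketch of it, assuming the aggregation JSON has the same shape as the REST result shown earlier. It uses a regular expression from the standard library only to stay dependency-free; the MaxChildSequenceJsonSketch class name and the pattern are assumptions, and a real JSON parser would be a more robust choice in the application.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical implementation of the helper; the source does not show the original.
class MaxChildSequenceJsonSketch {

    // Matches: "maxChildSeq" : { "value" : <number>
    private static final Pattern MAX_VALUE = Pattern.compile(
            "\"maxChildSeq\"\\s*:\\s*\\{\\s*\"value\"\\s*:\\s*(-?\\d+(?:\\.\\d+)?)");

    // Returns the max child sequence found in the JSON, or 0 when no numeric
    // value is present (for example, when the value is null).
    static int getMaxChildSequenceFromJson(String aggregationJson) {
        if (aggregationJson == null) {
            return 0;
        }
        Matcher matcher = MAX_VALUE.matcher(aggregationJson);
        if (!matcher.find()) {
            return 0;
        }
        // The value may be written as 3 or 3.0, so parse it as a double first.
        return (int) Double.parseDouble(matcher.group(1));
    }

    public static void main(String[] args) {
        String json = "{\"aggChild\":{\"doc_count\":4,\"filterParentId\":"
                + "{\"doc_count\":2,\"maxChildSeq\":{\"value\":3.0}}}}";
        System.out.println(getMaxChildSequenceFromJson(json)); // prints 3
    }
}
```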
