Reading comment data with Elasticsearch aggregation

Once comments are added, they must be visible when the user opens the blog. This scenario is straightforward. Since comments are nested objects of a blog, when we read a blog with the following API, all its comments are also available as part of the blog object:

Optional<Blog> blogObj = blogRepository.findById(blogId);
if (blogObj.isPresent()) {
    return blogObj.get();
} else {
    return null;
}

The findById method is provided out of the box by the default repository implementation, which is available at runtime. We pass blogId, and it fetches all details of the blog along with its comments (as nested objects).

The second scenario for reading comments is when the admin user opens the manage-comment page, where all comments are displayed for moderation purposes. In this case, the system shows the comments added to any of the blogs, so it needs to fetch all comments from all blogs.

The first way of achieving this is to fetch all blogs, take their comments, and append them to build the comments list. But this is not an ideal solution, as it requires a lot of manual work. We can use Elasticsearch aggregation queries instead. By default, nested objects cannot be fetched directly on their own the way a parent object can, so an aggregation is required:

GET blog/blog/_search
{
  "aggs": {
    "aggChild": {
      "nested": {
        "path": "comments"
      },
      "aggs": {
        "aggSortComment": {
          "top_hits": {
            "sort": [
              {
                "comments.createdDate": {
                  "order": "desc"
                }
              }
            ],
            "from": 0,
            "size": 10
          }
        }
      }
    }
  }
}

This query uses the top_hits aggregation, which simply lists the nested objects. We need the data in descending order of createdDate (the most recently added comments on top), so a sort criterion is added. The from and size parameters are used for pagination: from is the offset from the first record, while size is the number of records per page.

By default, top_hits returns three records if you do not provide a size value. Also, the maximum allowed value of from plus size is 100 by default (controlled by the index.max_inner_result_window index setting), so you have to paginate when using top_hits.
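To make the sort, from, and size semantics concrete, the following plain-Java sketch does by hand roughly what the aggregation does on the server: it flattens the comments of every blog, orders them newest first, and cuts out one page. The Blog and Comment classes here are simplified, hypothetical stand-ins for the application's models, and the names ManualCommentPager, page, and fromForPage are made up for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class ManualCommentPager {

    // Hypothetical, trimmed-down stand-ins for the Blog and Comment models.
    static class Comment {
        final String id;
        final long createdDate; // epoch milliseconds

        Comment(String id, long createdDate) {
            this.id = id;
            this.createdDate = createdDate;
        }
    }

    static class Blog {
        final List<Comment> comments;

        Blog(Comment... comments) {
            this.comments = Arrays.asList(comments);
        }
    }

    // Converts a zero-based page number into the "from" offset.
    static int fromForPage(int page, int size) {
        return page * size;
    }

    // Flattens the comments of every blog, sorts them newest first,
    // skips "from" records and keeps at most "size" records.
    static List<Comment> page(List<Blog> blogs, int from, int size) {
        return blogs.stream()
                .flatMap(blog -> blog.comments.stream())
                .sorted(Comparator.comparingLong((Comment c) -> c.createdDate).reversed())
                .skip(from)
                .limit(size)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Blog> blogs = Arrays.asList(
                new Blog(new Comment("c1", 100L), new Comment("c2", 300L)),
                new Blog(new Comment("c3", 200L)));

        // Second page with two comments per page: from = 2,
        // so only the oldest comment is left.
        for (Comment c : page(blogs, fromForPage(1, 2), 2)) {
            System.out.println(c.id);
        }
    }
}
```

Doing this in application code means loading every blog document first, which is the manual work the aggregation query avoids.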

The aggregation query returns a response whose aggregation data is shown in the following snippet:

"aggregations": {
  "aggChild": {
    "doc_count": 7,
    "aggSortComment": {
      "hits": {
        "total": 7,
        "max_score": null,
        "hits": [
          {
            "_index": "blog",
            "_type": "blog",
            "_id": "Bsz2Y2YBksR0CLn0e37E",
            "_nested": {
              "field": "comments",
              "offset": 2
            },
            "_score": null,
            "_source": {
              "id": "e7EqiPJHsj1539275565438",
              "blogId": "Bsz2Y2YBksR0CLn0e37E",
              "parentId": "0",
              "childSequence": 2,
              "position": "1.2",
              "status": "M",
              "level": 1,
              "user": "Nilang Patel",
              "emailAddress": "[email protected]",
              "commentText": "installatin of java. great blog",
              "createdDate": "10-11-2018T16:32:45"
            },
            "sort": [
              1539275565000
            ]
          },
          {
            .... Other JSON objects, each representing comment data.
          }...
        ]
      }
    }
  }
}
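Each hit carries a sort array holding the value it was ordered by. For a date field such as createdDate, that value is the date expressed in epoch milliseconds. The sketch below converts the sample value back into a readable date; the MM-dd-yyyy'T'HH:mm:ss pattern and the UTC zone are assumptions inferred from the sample document, and the class and method names are purely illustrative.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class SortValueDemo {

    // Assumed date pattern, inferred from the createdDate value in the sample hit.
    static final DateTimeFormatter CREATED_DATE_FORMAT =
            DateTimeFormatter.ofPattern("MM-dd-yyyy'T'HH:mm:ss");

    // Converts an epoch-millisecond sort value into the createdDate text form (UTC assumed).
    static String toCreatedDate(long sortValue) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(sortValue), ZoneOffset.UTC)
                .format(CREATED_DATE_FORMAT);
    }

    public static void main(String[] args) {
        // Sort value taken from the sample response.
        System.out.println(toCreatedDate(1539275565000L)); // prints 10-11-2018T16:32:45
    }
}
```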

You can write the previous query with the Elasticsearch Java API as follows:

public List<Comment> getAllComments(int from, int size) {

    NestedAggregationBuilder aggregation = AggregationBuilders.nested("aggChild", "comments")
            .subAggregation(AggregationBuilders.topHits("aggSortComment")
                    .sort("comments.createdDate", SortOrder.DESC)
                    .from(from)
                    .size(size));

    SearchResponse response = elasticsearchTemplate.getClient().prepareSearch("blog")
            .setTypes("blog")
            .addAggregation(aggregation)
            .execute().actionGet();

    List<Aggregation> responseAgg = response.getAggregations().asList();
    // getAllCommentsFromJson processes the JSON and returns the desired data.
    return getAllCommentsFromJson(responseAgg.get(0).toString());
}

Again, this is self-explanatory. First, we are creating a nested aggregation query with AggregationBuilders and adding the sub-aggregation of the top_hits type, along with sorting criteria with the from and size settings. The process of getting a response is identical to what we used in the method to get the maximum child sequence.

In case we need to display comments with a specific status value, we can use the following query:

GET blog/blog/_search
{
  "_source": false,
  "aggs": {
    "aggChild": {
      "nested": {
        "path": "comments"
      },
      "aggs": {
        "aggStatsComment": {
          "terms": {
            "field": "comments.status",
            "include": "K"
          },
          "aggs": {
            "aggSortComment": {
              "top_hits": {
                "sort": [
                  {
                    "comments.createdDate": {
                      "order": "desc"
                    }
                  }
                ],
                "from": 0,
                "size": 10
              }
            }
          }
        }
      }
    }
  }
}

A terms aggregation has been added, which filters on the value of the status field. The include parameter takes a regular expression rather than a simple wildcard; for example, A.* will match all statuses starting with A. The equivalent Java API appears as follows:

public List<Comment> getCommentsForStatus(String status, int from, int size) {

    // Include only terms matching the given status; exclude nothing.
    IncludeExclude includeExclude = new IncludeExclude(status, null);

    NestedAggregationBuilder aggregation = AggregationBuilders.nested("aggChild", "comments")
            .subAggregation(AggregationBuilders.terms("aggStatsComment")
                    .field("comments.status")
                    .includeExclude(includeExclude)
                    .subAggregation(AggregationBuilders.topHits("aggSortComment")
                            .sort("comments.createdDate", SortOrder.DESC)
                            .from(from)
                            .size(size)));

    SearchResponse response = elasticsearchTemplate.getClient().prepareSearch("blog")
            .setTypes("blog")
            .addAggregation(aggregation)
            .execute().actionGet();

    List<Aggregation> responseAgg = response.getAggregations().asList();

    return getAllCommentsWithStatusFromJson(responseAgg.get(0).toString());
}
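
The include value of a terms aggregation is matched as a regular expression against the whole term, so it can help to try a pattern locally before sending it. The following sketch uses java.util.regex as a stand-in, which is an assumption: Elasticsearch uses Lucene's regular expression syntax, which differs in places but behaves the same way for simple patterns like these. The status values other than K and M, and the class and method names, are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class IncludePatternDemo {

    // Keeps only the statuses that the include pattern matches in full,
    // mimicking how a terms aggregation filters its buckets.
    static List<String> included(List<String> statuses, String include) {
        Pattern pattern = Pattern.compile(include);
        return statuses.stream()
                .filter(status -> pattern.matcher(status).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> statuses = Arrays.asList("A", "AP", "K", "M");

        System.out.println(included(statuses, "K"));   // exact status: [K]
        System.out.println(included(statuses, "A.*")); // starts with A: [A, AP]
        System.out.println(included(statuses, "A*"));  // only repeated A characters: [A]
    }
}
```

Note that A* on its own means "zero or more A characters", so a prefix match needs A.* instead.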