Chapter 3. NLP in e-Commerce

Today’s new marketplaces must nurture and manage perfect competition to thrive.

- Jeff Jordan, Andreessen Horowitz

In today’s world, e-commerce has become synonymous with shopping. A richer customer experience than what a physical retail store offers has fueled this growth of e-commerce. Worldwide retail e-commerce sales in 2018 were $2.8 trillion and are projected to reach $4.8 trillion by 2021 [1]. Recent advancements in machine learning and NLP have played a major role in this rapid growth.

Open the home page of any e-retailer, and you will find a lot of information in the form of text and images. A major portion of this information is text: product descriptions, reviews, and so on. Retailers strive to use this information intelligently to delight customers and build competitive advantage. An e-commerce company faces a range of text-related problems that can be solved with NLP techniques. We saw different kinds of NLP problems and solutions in the previous chapters of this book (Chapters 4-8). In this chapter, we will give an overview of how NLP problems in the e-commerce domain can be addressed using what we have learned so far. We will discuss some of the key NLP tasks in this domain, including search, building a product catalog, analyzing reviews, and providing recommendations. Let us start with an overview of these four tasks.

Search

Search systems in e-commerce are different from general search engines such as Google, Bing, and Yahoo. An e-commerce search engine is closely tied to the products available and the different kinds of information associated with them. For instance, a general search engine largely deals with free-form text such as news articles or blogs, whereas an e-commerce search engine deals with structured product, sales, and review data. You might search for ‘red checkered shirt for a wedding,’ and the e-commerce search engine should be able to fetch it. A similarly focused form of search can also be seen on travel websites for flight and hotel bookings, such as Airbnb and TripAdvisor. The specific nature of the information associated with each type of e-commerce business calls for a customized pipeline of information processing, extraction, and search.

Building an e-commerce catalog

Any large e-commerce enterprise needs an easy-to-access catalog. Better product descriptions with relevant information help the customer choose the right product through this catalog. Such information also helps in product recommendation and user personalization. Imagine a recommendation engine automatically knowing that you like the color orange! That is not possible unless the engine notices that most of your recent purchases or searches were for orange apparel. To achieve this, the first step is identifying that ‘orange’ is associated with those products as a color attribute. Extracting such information automatically is called attribute extraction. Attribute extraction from product descriptions helps ensure that all the relevant product information is properly indexed and displayed for each product, improving product discoverability.

Review analysis

The most noticeable part of an e-commerce platform is the user reviews section for its products. Reviews provide a perspective of the product that cannot be obtained from the product attributes alone, such as usability, comparisons with other products, and delivery feedback. Reviews are mainly textual in nature and thus warrant deep text analysis. However, not all reviews are useful, and some may not come from trusted users. Further, it is hard to manually process the many reviews a given product receives. NLP techniques provide an overall perspective across all reviews by performing tasks such as sentiment analysis, review summarization, and identifying review helpfulness. We saw one example of NLP for review analysis in Chapter 5 when we discussed keyphrase extraction. We will see other use cases later in this chapter.

Recommendation for e-commerce

Without a recommendation engine, any e-commerce platform would be incomplete. Customers like it when the platform intelligently understands their choices and suggests the next products to buy. It helps them organize their thoughts about shopping and get better utility out of it. Recommendations of discounted items, same-brand products, or products with favorite attributes can engage customers on the website and make them spend more time, which directly increases the possibility of them buying those products. Apart from transaction-based recommendations, there is a rich set of algorithms built on product content information and reviews, which are textual in nature. NLP is used to build such recommendation systems.

With this overview, we are all set to explore the role of NLP in e-commerce in more detail! Let us start with how it is useful for search.

Search in E-Commerce

Customers visit an e-commerce website to find and purchase their desired products quickly. Ideally, a search box should enable the customer to quickly reach the right product. The search needs to be fast and precise and fetch results that closely match the customer’s needs. A good search mechanism positively impacts the conversion rate, which directly affects the revenue of the e-retailer. In Chapter 7, we discussed how general search engines work and where NLP is useful. However, for e-commerce, the search engine needs to be more fine-tuned to the business needs. Search in e-commerce is closed-domain, i.e., the search engine typically fetches items from within the product range of the business, rather than a generic set of documents or content on the open web. Product information is usually stored in a common schema with multiple levels of information that together define the product. The kind of search that happens in e-commerce is generally called ‘faceted search,’ which we will focus on in this section.

Faceted search is a specialized variant of search that allows the customer to navigate results using filters. For example, if you are planning to buy a TV, you might look for filters like brand, price, and screen size. In e-commerce websites, users are presented with a set of search filters depending on the product. Figure 3-1 below illustrates search in e-commerce through the websites of Amazon and Walmart.

Figure 3-1. Faceted search at Amazon.com (top) and Walmart.com (bottom)

The leftmost section of both screenshots shows a set of filters (alternatively, ‘facets’) that allows customers to guide their search in a manner that satisfies their buying needs. The Amazon example in Figure 3-1 shows a search for television models, hence the filters cover aspects such as resolution and display size. Along with such category-specific filters, there are also general filters that apply to most product searches, such as brand, price range, and mode of shipping, as seen in the Walmart example.

Faceted search is guided navigation that uses specific attributes of the product the user is looking for as filters. These filters are explicit dimensions along which the product can be perceived. The same search query (what you generally type to search for a product) can generate different search results based on the combination of filters used. This mechanism enables users to arrange the search results on their own and get more control over shopping, rather than relying on a pre-determined taxonomic order.
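
To make the idea concrete, here is a minimal sketch of facet-based filtering over an in-memory product list. The product records, facet names, and values are made up for illustration; a production system would push such filters down into its search engine (e.g., as filter clauses or aggregations) rather than filtering in application code.

    # Toy product records; attribute names and values are assumed for illustration.
    products = [
        {"title": "49-inch 4K Smart LED TV", "brand": "Sony", "resolution": "4K", "price": 549.0},
        {"title": "32-inch HD Ready LED TV", "brand": "TCL", "resolution": "720p", "price": 149.0},
        {"title": "55-inch 4K QLED TV", "brand": "Samsung", "resolution": "4K", "price": 799.0},
    ]

    def matches(item, name, constraint):
        value = item.get(name)
        # A facet can be a set of allowed values or a predicate (e.g., a price-range check).
        if callable(constraint):
            return value is not None and constraint(value)
        return value in constraint

    def faceted_filter(items, facets):
        """Keep only items that satisfy every selected facet."""
        return [item for item in items if all(matches(item, n, c) for n, c in facets.items())]

    # "4K TVs under $600" expressed as facet selections.
    print(faceted_filter(products, {"resolution": {"4K"}, "price": lambda p: p < 600}))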

As we now know, the filters are the key element that defines faceted search. However, these filters may not always be readily available for all products. Here are some reasons for that:

  • It is possible that the seller did not upload all required information while listing the product on the e-commerce website. This is typically the case when a new e-commerce business is ramping up and aggressively promotes quick onboarding of various sellers and their products. To achieve this, they often allow sellers to list products without quality checks on the product metadata.

  • Some filters are difficult to obtain, or the seller may not have the complete information to provide. An example is the calorific value of a food product, which is typically derived from the nutrient information printed on the packaging. E-retailers do not expect the seller to provide this information, yet it is crucial because it captures important customer signals that are directly related to the conversion of that product’s sale.

Apart from search algorithms, there are many such nuances associated with faceted search, and we will focus on these aspects for the rest of this chapter. The issues mentioned above relate to the problem we will discuss in the next section: building an e-commerce catalog.

Building an E-Commerce Catalog

As we saw earlier in this chapter, building an informative catalog is one of the primary problems in e-commerce. It can be split into several sub-problems, as shown below:

  • Attribute extraction

  • Product categorization and Taxonomy creation

  • Product enrichment

  • Product duplication and matching

Attribute Extraction

Attributes are properties that define a product. For example, in Figure 3-1, we saw brand, resolution, and TV size as relevant attributes. An accurate display of these attributes provides a complete overview of the product on the e-commerce website so that the customer can make an informed choice. A rich set of attributes directly relates to improved clicks and click-through rates, which influence the product’s sales. Figure 3-2 shows an example of a product described by a set of filters or attributes.

Figure 3-2. Product described by a set of filters or attributes

As you can see, attributes like {clothing, color, size} are basically what define this product for a customer. Each of these attributes can have multiple values, as seen in the figure; in this example, color takes seven values. However, directly obtaining attributes from the sellers for all products is difficult. Moreover, the quality of the attributes obtained should be consistent enough to give a customer correct and relevant information about a product.

Traditionally, e-commerce websites employed crowdsourcing techniques to obtain the attributes. Crowdsourcing is typically done through third-party companies or crowdsourcing platforms, where specific questions about each product are asked and crowd workers are expected to answer them. Sometimes the questions are framed as multiple choice to restrict the answers to a set of values. But generally, this is expensive and does not scale with the increase in the volume of products. That is where techniques from machine learning step in. This is a challenging task because it requires understanding the context of the information present in the product data. For example, look at the two product descriptions in Figure 3-3.

Figure 3-3. Cases where ‘pink’ is the attribute value for two different attributes

Pink is a popular brand with younger women. At the same time, pink is a very common apparel color. Hence, in the first case, Pink is a brand attribute, whereas in the second case, pink is just a color. In the figure, the backpack is from the brand ‘Pink’ and has the color neon red, whereas the sweatshirt is pink in color. Cases like these are highly prevalent and pose a challenging task for a computer to solve.

There are structured ways in which one can solve this problem with acceptable performance. If a dictionary of attributes is prepared, or each product is listed with proper attribute values, then the search mechanism can access them to populate the results accurately according to customer needs. The set of algorithms that extract attributes from various product information is generally called attribute extraction algorithms. These algorithms take a collection of textual data as input and produce {attribute, value} pairs as output. There are two types of attribute extraction algorithms: direct and derived [4]. Typically, these algorithms take text as input (e.g., the product title or product description).

Direct attribute extraction algorithms assume the presence of the attribute value in the input text. For example, the title ‘Sony XBR49X900E 49-Inch 4K Ultra HD Smart LED TV (2017 Model)’ explicitly contains the brand ‘Sony.’ A brand is typically an attribute that is expected to be present in the product title in most cases. On the other hand, derived attribute extraction algorithms do not assume that the extracted attribute is present in the input text; they derive that information from the context. For example, gender is an attribute that is usually not present in the product title, but from the input text, the algorithm can identify whether the product is specifically for men or women. Consider this example: ‘YunJey Short Sleeve Round Neck Triple Color Block Stripe T-Shirt Casual Blouse.’ The product is for women, but the gender ‘women’ is not explicitly mentioned in the product description or title. In this case, the gender has to be inferred from the text, for instance, from the product description.

Typically, direct attribute extraction is modeled as a sequence labeling problem. Recall that in Chapter 5, we discussed supervised sequence models for named entity recognition (NER) tasks. A sequence labeling model takes a sequence as input and outputs another sequence of the same length. In Chapter 5, we also discussed how to create input and output datasets with various encoding schemes, which is applicable here as well.

Let us say we have the product title ‘The Green Pet Shop Self Cooling Dog Pad’, where ‘The Green Pet Shop’ is the brand. In a BIO encoding scheme, this will look as shown in Figure 3-4 below:

Figure 3-4. BIO encoding scheme for direct attribute extraction

We now know various tokenization techniques from Chapter 2, and we can tokenize the input text into tokens (words here). As we want to preserve the sequential information of the input text, we keep this sequence of tokens as the input sequence. Getting such labeled data is crucial for any attribute extraction process. Especially in e-commerce, supervised attribute extraction algorithms need data that represents various categories with significant variation in content. Sampling strategies for collecting annotated titles should therefore be stratified, with the strata selected based on categories.
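
As a small illustration, here is a sketch of how a known attribute span can be converted into BIO tags for a tokenized title. The helper name bio_tags is hypothetical, and the matching is a simple exact-span search; real annotation pipelines are usually built around human-labeled spans.

    def bio_tags(tokens, attribute_tokens, label="BRAND"):
        """Tag a token sequence with B/I/O labels for one known attribute span."""
        tags = ["O"] * len(tokens)
        n = len(attribute_tokens)
        for start in range(len(tokens) - n + 1):
            if tokens[start:start + n] == attribute_tokens:
                tags[start] = "B-" + label
                for i in range(start + 1, start + n):
                    tags[i] = "I-" + label
                break
        return tags

    tokens = "The Green Pet Shop Self Cooling Dog Pad".split()
    print(list(zip(tokens, bio_tags(tokens, "The Green Pet Shop".split()))))
    # [('The', 'B-BRAND'), ('Green', 'I-BRAND'), ('Pet', 'I-BRAND'), ('Shop', 'I-BRAND'),
    #  ('Self', 'O'), ('Cooling', 'O'), ('Dog', 'O'), ('Pad', 'O')]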

For such labeled data, a rich set of features [4, 46] needs to be extracted to train a machine learning model. Ideally, the input features should capture characteristic, locational, and contextual information. Here is a list of some features that capture these three aspects. One can develop more complex features along similar lines and analyze whether they significantly improve performance.

Characteristic features:

These are typically token-based features:

  • the character composition of the token

  • letter case

  • token length in terms of number of characters

Locational features:

These features capture the positional aspect of the token in the input sequence:

  • number of tokens in the input sequence before the given token

  • ratio of the token position to the total length of the sequence

Contextual features:

These features mostly encode information about the neighbouring tokens:

  • the identity of the preceding/succeeding token

  • whether the preceding/succeeding token is capitalized

  • the bigram consisting of the token and its predecessor/successor

  • whether the preceding token is a conjunction

  • part-of-speech tag of the token

Once the features are generated and the output tags are encoded properly, we get the sequence pairs for training the model. We can directly use some of the models we discussed in Chapter 5, such as conditional random fields (CRFs) and hidden Markov models (HMMs) [4]. Even though the pipeline looks simple and similar to NER systems, there are challenges with these feature generation schemes and modeling techniques. For attributes like ingredients, simple features do not capture the context around them.
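
Here is a minimal sketch of this pipeline using the open source sklearn-crfsuite library: a hand-built feature function covering the characteristic, locational, and contextual features listed above, trained on a single toy BIO-labeled title. The feature names and the tiny training set are illustrative only; a real model needs many annotated titles sampled across categories.

    import sklearn_crfsuite  # pip install sklearn-crfsuite

    def token_features(tokens, i):
        tok = tokens[i]
        feats = {
            "lower": tok.lower(),                 # characteristic: token identity
            "is_title": tok.istitle(),            # characteristic: letter case
            "length": len(tok),                   # characteristic: number of characters
            "position": i,                        # locational: tokens before this one
            "rel_position": i / len(tokens),      # locational: position / sequence length
        }
        if i > 0:                                 # contextual: preceding token and bigram
            feats["prev_lower"] = tokens[i - 1].lower()
            feats["prev_is_title"] = tokens[i - 1].istitle()
            feats["bigram"] = tokens[i - 1].lower() + "_" + tok.lower()
        else:
            feats["BOS"] = True
        if i < len(tokens) - 1:                   # contextual: succeeding token
            feats["next_lower"] = tokens[i + 1].lower()
        else:
            feats["EOS"] = True
        return feats

    # One toy BIO-labeled title; a real model needs many annotated titles per category.
    titles = [["The", "Green", "Pet", "Shop", "Self", "Cooling", "Dog", "Pad"]]
    labels = [["B-BRAND", "I-BRAND", "I-BRAND", "I-BRAND", "O", "O", "O", "O"]]
    X_train = [[token_features(toks, i) for i in range(len(toks))] for toks in titles]

    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
    crf.fit(X_train, labels)
    print(crf.predict(X_train))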

To deal with the data sparsity and feature incompleteness issues of such handcrafted pipelines, some approaches suggest using a sequence of word embeddings as the input. This can be handled by advanced methods such as deep recurrent sequence-to-sequence models: the input sequence is passed to the model as is, and the model predicts the output sequence. Recent efforts use deep recurrent structures like RNNs and LSTMs to perform this sequence labeling task [47, 48]. One state-of-the-art technique uses the distributed word representations we discussed in Chapter 2. As you can see from the architecture shown in Figure 3-5, it uses both word-level and character-level representations [44]. The character-level representation helps cover unknown, out-of-vocabulary words.

Figure 3-5. LSTM architecture for attribute extraction as a seq2seq task

Bidirectional LSTMs can be used to encode the sequential information present in the input; LSTMs are known to capture contextual information well. Bidirectional recurrent structures are needed to encode the sequential information from both directions of the input sequence. Such deep learning models often work better than the typical machine learning models discussed above, because they are not limited by the expressiveness of handcrafted features.
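
Below is a minimal Keras sketch of a BiLSTM token tagger in this spirit, trained on random dummy data just to show the shapes. It omits the character-level representation and CRF layer used in the referenced architecture [44], and the vocabulary size, sequence length, and tag set are assumed values.

    import numpy as np
    from tensorflow.keras import layers, models

    VOCAB_SIZE, MAX_LEN, N_TAGS = 5000, 20, 3   # toy sizes; tags: B-BRAND, I-BRAND, O

    model = models.Sequential([
        layers.Embedding(VOCAB_SIZE, 64, mask_zero=True),                    # word-level representation
        layers.Bidirectional(layers.LSTM(64, return_sequences=True)),        # context from both directions
        layers.TimeDistributed(layers.Dense(N_TAGS, activation="softmax")),  # one tag per token
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

    # Dummy batch of integer-encoded, zero-padded titles and their tag-id sequences.
    X = np.random.randint(1, VOCAB_SIZE, size=(32, MAX_LEN))
    y = np.random.randint(0, N_TAGS, size=(32, MAX_LEN))
    model.fit(X, y, epochs=1, verbose=0)
    print(model.predict(X[:1]).shape)   # (1, MAX_LEN, N_TAGS): tag probabilities per token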

For derived (indirect) attributes, classification frameworks are generally used. Recall the example of ‘YunJey Short Sleeve Round Neck Triple Color Block Stripe T-Shirt Casual Blouse.’ In this case, we embed the whole input string using a sentence representation method (e.g., averaging the individual word vectors, as covered in Chapter 4) or create informative features such as the presence of class-specific words, character n-grams, and word n-grams, and pass them to a classifier where the derived attribute values are used as classes. In this example, to extract ‘gender’ as an attribute, one could use men, women, unisex, and child as the class labels [4, 44].
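
A possible sketch of such a classifier with scikit-learn is shown below, combining word and character n-gram features in one pipeline. The handful of labeled titles (other than the YunJey example from the text) are made up for illustration; a real model would be trained on a large, stratified sample.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline, make_union

    # Toy labeled titles; real training data would have thousands of examples per class.
    titles = [
        "YunJey Short Sleeve Round Neck Triple Color Block Stripe T-Shirt Casual Blouse",
        "Men's ComfortSoft Long Sleeve Crewneck T-Shirt",
        "Baby Girls' 2-Pack Cotton Bodysuits",
        "Floral Print Summer Maxi Dress with Ruffle Sleeves",
    ]
    genders = ["women", "men", "child", "women"]

    # Word n-grams plus character n-grams as features, as described above.
    features = make_union(
        TfidfVectorizer(analyzer="word", ngram_range=(1, 2)),
        TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5)),
    )
    clf = make_pipeline(features, LogisticRegression(max_iter=1000))
    clf.fit(titles, genders)
    print(clf.predict(["Chiffon Blouse with Lace Sleeves for Ladies"]))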

Note
For models that use deep recurrent structures, the amount of data needed is typically much larger than for less complex ML models such as CRFs or HMMs. The more data, the better deep models learn. This is common to all deep learning models, but in e-commerce, getting a large, well-sampled set of annotated data is very expensive, so this should be considered before using sophisticated models.
Figure 3-6. Characteristic performance improvement in the LSTM framework (Majumder et al. [59])

So far, we have discussed attribute extraction from textual data, mainly product titles. There are also recent approaches that extend this to multimodal attribute extraction, incorporating details such as the title, description, images, and reviews of the product [43].

In the next sections, we discuss how techniques similar to the ones we applied to product attributes extend to other facets of products and e-commerce.

Product Categorization and Taxonomy

Product categorization is the process of dividing products into groups. These groups can be defined based on similarity: products of the same brand can be grouped together, as can products of the same type. E-commerce platforms generally define broad categories of products, such as electronics, beauty products, and food. Once a new product arrives, it needs to be assigned to a group before it is put in the catalog. One can further define successively smaller groups for tighter definitions of products, such as GMO and non-GMO food products inside the food category.

This categorization has traditionally been manual, but at a large scale, it is almost impossible to categorize products by hand. Categorization is typically framed as a classification process: the algorithm takes information from a variety of sources and applies classification techniques to solve it [49, 50].

Specifically, there are approaches where algorithms take the title or description as input and classify the product into a suitable category, when all the categories are known in advance. This again falls under typical text classification, so the categorization process can be automated. Once the category is determined, it feeds directly into the relevant attribute extraction process we discussed: logically, a product is passed to the attribute extraction process only after its category has been determined.
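
The sketch below frames this as a plain text classification problem over product titles with scikit-learn. The four training titles and category labels are toy data; a production categorizer would be trained on the full catalog and a much richer label set.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # Toy examples; the titles and categories are assumed for illustration.
    titles = [
        "Sony 49-Inch 4K Ultra HD Smart LED TV",
        "Apple iPhone X 64GB Space Gray",
        "Organic Extra Virgin Olive Oil 500ml",
        "Waterproof Mascara, Black, 2-Pack",
    ]
    categories = ["electronics", "electronics", "food", "beauty"]

    categorizer = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB())
    categorizer.fit(titles, categories)
    print(categorizer.predict(["Samsung 55-inch QLED Television"]))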

There are also multimodal approaches to this problem. Accuracy can be improved when both images and text are used: images can be passed through a convolutional neural network to generate image embeddings, while the text sequence can be encoded by an LSTM [43]; the two representations can then be concatenated and passed to a classifier for the final output.

Beyond defining categories for different products, e-commerce companies have made good efforts to streamline them. The streamlined catalog, or category tree, is called a taxonomy. We call it a tree since it starts from high-level categories like electronics, which are considered roots, and branches out to smaller subcategories such as laptops, mobiles, etc.

Importance of Taxonomy

A good taxonomy and properly linked products can be critical for any e-commerce system [10]. It is important for:

  • Showing products similar to the product searched

  • Better recommendations

  • Selecting a proper bundle of products to offer the customer a better deal

  • Replacing old products with new ones

  • Price comparison across different products in the same category

Figure 3-7. A typical category hierarchy: the taxonomy of a product [10]

Building a taxonomic tree is an elaborate process in which text classification is widely used. Once attribute values are obtained from the attribute extraction procedure, placing products at the right level can be done via hierarchical text classification. Since establishing a taxonomic tree is a lengthy process, simple rule-based classification methods are often used to reduce the overhead. Complex cases that require context to determine the right taxonomic level are handled by ML classification techniques such as SVMs or decision trees [51].

Figure 3-8. Taxonomic tree with different levels [51]

For new e-commerce platforms, creating a product taxonomy via correct product categorization is a daunting task. It requires a large amount of relevant data, manual intervention, and category experts’ decisions to build such rich content. All of these can be expensive for nascent e-commerce platforms; however, there are APIs that can help. These APIs typically access the large catalog content of various big retailers and build in the intelligence to categorize a product by scanning its unique product code. Small-scale e-commerce businesses can utilize such APIs for taxonomy creation and categorization and, once a significant amount of product information is gathered, move to their own rule-based systems. Some of these APIs are offered by Semantics3, eBay, and Lucidworks, to name a few. Figure 3-9 below shows a snapshot of one such API from Semantics3 [52], one of the leading API providers for retail and e-commerce solutions. Their AI-driven API helps categorize a product from its name.

Figure 3-9. Semantics3 terminal snapshot [52]

Product Enrichment

A product is generally identifiable through its Unique Product Code (UPC). But for a better customer experience, it is important to provide other relevant information, as we discussed in the attribute extraction section. There are several potential sources of this information: short titles, long titles, product images, and product descriptions. Extracting relevant information from these sources to fill the catalog is important and requires machine learning algorithms. To enrich the information of a product, it is first important to categorize the product correctly and link it to its proper place in the taxonomy. Sometimes a misleading title can hamper faceted search in an e-commerce platform; Figure 3-10 shows one such example. Improving the product title not only enhances the click-through rate, but also improves the conversion rate.

In the example in Figure 3-10, the product title is too long and contains keywords like iPad, iPhone, and Samsung that can easily mislead the search. This is an ideal case that requires determining the product category, the right taxonomy, and the attributes, which can then be used to frame a better product title. We discuss various aspects of product enrichment in this section.

Figure 3-10. Example of a clumsy product title - an ideal case for product enrichment

Product enrichment is typically a larger and more continuous process than just improving product titles in an online retail setup. Automatic processes are usually set up to generate reports on how much product information has been enriched. This ‘how much’ can be defined in detail: the product content is typically divided into levels based on its taxonomic definitions, and at every level, the ratio of present attribute values to required attribute values gives an indication of how enriched the product content is. For instance, for a t-shirt, information such as brand, size, color, and material is important to have. It is also possible that brand is a higher-level taxonomic attribute than attributes like color or material, where higher denotes more generic information, followed by more specific, lower-level taxonomic attributes such as material. The product enrichment reporting process can look into each of these levels, detect how many values are present, and calculate the ratio against how many attribute values are required to be present.
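
The ratio itself is simple arithmetic; the short sketch below computes it for one assumed set of required attributes at a single taxonomy level.

    required = {"brand", "size", "color", "material"}             # required at this taxonomy level (assumed)
    present = {"brand": "YunJey", "color": "pink", "size": None}  # values currently on the listing

    filled = {attr for attr, value in present.items() if attr in required and value}
    print(f"enrichment: {len(filled)}/{len(required)} = {len(filled) / len(required):.0%}")
    # enrichment: 2/4 = 50%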

Apart from taxonomic levels, there are many other ways to define enrichment levels, most of them based on the importance of the information; one such scheme is described in [51].

Once an acceptable enrichment proportion is achieved, a closer look can be given to the product title and description. Filling in missing attribute values as part of enrichment is essentially the attribute extraction task we discussed in the previous section.

The problem depicted in Figure 3-10 can be addressed as a string matching NLP task. Once the different taxonomic and enrichment levels for that product are filled, at least to an acceptable threshold (typically defined by the retail platform itself), an attempt can be made to make the product title more expressive and accurate.

The process can start with direct string matching: all the attribute values that are present can be checked for an exact match against the words or phrases in the existing product title. If one or more attribute values are not present in the existing product title, they have to be incorporated. It is also necessary to filter out tokens that are not part of the product’s attribute values. In the example in Figure 3-10, iPad and iPhone are not part of the product’s attribute values. Such tokens are misleading and can affect faceted search, so they should be removed from the product title unless they are essential to set up the context for the product.

Ideally, a predefined template for product titles helps maintain consistency across them. A wise approach is to build a template composed of attribute values in a top-down order of the taxonomic tree. The product category or type can be the first token in the product title, followed by other attributes in decreasing order of importance, such as brand, pack size, and color. Attributes at the lowest level of importance (enrichment level) can be omitted from the product title; these values can instead appear as part of the longer product description, which typically appears after the product title and price on the product webpage.
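
A trivial sketch of such a template is shown below; the attribute names and their ordering are assumptions, and missing values are simply skipped.

    ATTRIBUTE_ORDER = ["product_type", "brand", "pack_size", "color"]   # assumed importance order

    def build_title(attrs):
        """Compose a title from attribute values, most important first; skip missing values."""
        return " ".join(str(attrs[a]) for a in ATTRIBUTE_ORDER if attrs.get(a))

    print(build_title({"product_type": "T-Shirt", "brand": "YunJey",
                       "pack_size": None, "color": "Pink"}))
    # T-Shirt YunJey Pink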

Product Duplication and Matching

There are many cases where two listings actually refer to the same product, with the product information or the product images being essentially the same; this is termed product duplication. Apart from product categorization and attribute extraction, detecting duplication is also an important aspect of e-commerce, and identifying duplicate products is a challenging task that relies on a product matching procedure. The following describes one way to handle this problem, combining three steps: attribute match, title match, and image match.

Attribute match

If the attributes have already been extracted, we can look at the corresponding attributes of both products and compare their values; for example, if the brand is extracted for both products, the brands can be compared. Matching attributes is a string matching task [53]. Two strings can be matched via exact character match or using string similarity metrics. String similarity metrics are typically built to tolerate slight spelling mistakes, abbreviations, etc. Customized metrics can also be built to handle the very specific characteristics of e-commerce keywords. Abbreviations are a big problem in product-related data: the same word can be represented by multiple accepted abbreviations. They should either be mapped to a consistent form (normalization) or handled by form-agnostic rules. An intuitive rule for matching an abbreviation against a full word is to check that the first and last characters match and that the characters of the shorter word appear in the longer one.
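
One possible implementation of that abbreviation heuristic is sketched below. The in-order (subsequence) check is our interpretation of the rule, and the function name is hypothetical.

    def abbrev_match(word_a, word_b):
        """Heuristic abbreviation check: first and last characters agree, and every character
        of the shorter string appears, in order, in the longer one (a subsequence check)."""
        short, long_ = sorted((word_a.lower(), word_b.lower()), key=len)
        if not short or short[0] != long_[0] or short[-1] != long_[-1]:
            return False
        remaining = iter(long_)
        return all(ch in remaining for ch in short)   # consuming the iterator enforces order

    print(abbrev_match("Pkt", "Packet"))   # True
    print(abbrev_match("Ct", "Count"))     # True
    print(abbrev_match("Oz", "Ounce"))     # False: last characters differ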

Title match

Different sellers often list the same product with titles that have almost identical words or word sequences. Below are some examples:

Garmin nuvi 2699LMTHD GPS Device

nuvi 2699LMTHD Automobile Portable GPS Navigator

Garmin nuvi 2699LMTHD — GPS navigator — automotive 6.1 in

Garmin Nuvi 2699lmthd Gps Device

Garmin nuvi 2699LMT HD 6” GPS with Lifetime Maps and HD Traffic (010–01188–00)

These are examples of various wordings of the same product by different sellers. The e-commerce platform would certainly like to retrieve them all for a given search on this product. There are multiple methods to identify that all these titles refer to the same product; the simplest is to compare bigrams and trigrams. It is also possible to generate title-level features (mostly based on bigrams and trigrams) and then calculate the Euclidean distance between them. The problem with generating features is twofold: creating the right, significant features is difficult, and the patterns of matching may vary drastically across categories. Recent efforts have shown that using sentence-level embeddings and learning a distance metric over a pair of textual phrases simultaneously helps in matching [54]. This is typically done with a neural network architecture called a Siamese network [55], which takes both sequences simultaneously and learns to generate embeddings such that similar sequences end up closer together and dissimilar ones farther apart.
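
As a baseline, the simple n-gram comparison mentioned above can be sketched as a Jaccard overlap of word bigrams and trigrams; the example titles are taken from the list above.

    def ngrams(title, n):
        tokens = title.lower().split()
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    def title_similarity(a, b):
        """Average Jaccard overlap of word bigrams and trigrams between two titles."""
        scores = []
        for n in (2, 3):
            ga, gb = ngrams(a, n), ngrams(b, n)
            if ga or gb:
                scores.append(len(ga & gb) / len(ga | gb))
        return sum(scores) / len(scores) if scores else 0.0

    t1 = "Garmin nuvi 2699LMTHD GPS Device"
    t2 = "Garmin Nuvi 2699lmthd Gps Device"
    t3 = "Garmin nuvi 2699LMT HD GPS with Lifetime Maps and HD Traffic"
    print(title_similarity(t1, t2))   # 1.0: identical token sequences after lowercasing
    print(title_similarity(t1, t3))   # much lower: tokenization differs ('2699LMT HD' vs '2699LMTHD')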

Image Match

Pixel-to-pixel matching, feature map matching, or even advanced image matching techniques (such as Siamese networks) are popular for image matching [56]. Duplication can be reduced by efficient image matching techniques. Most of these algorithms are based on computer vision approaches and depend on image quality and other size-related particulars.

Note
Running A/B tests is important after attribute extraction, product enrichment, and matching procedures are in place. These tests measure the impact in terms of direct or indirect sales, click-through rates, time spent on a web page, etc.

Review Analysis

Reviews are considered one of the integral elements of the e-commerce domain. They reflect direct feedback from customers about the performance of products. It is very important to leverage this abundant information and extract important signals to feed back into the e-commerce system so that it can further improve the customer experience. Moreover, reviews are visible to all customers and directly affect the sales of products. One of the most common actions e-commerce platforms take is to perform sentiment analysis on the reviews. In this section, we will delve deeper into the different aspects of review sentiment analysis.

Sentiment Analysis

Figure 3-11 shows a screenshot of customer reviews of a product on Amazon.com; in this case, the ratings available for the iPhone X. Most of us are familiar with seeing such screens on e-commerce websites.

Figure 3-11. Analysis of customer reviews - ratings, keywords, sentiments

As you can see, 67% of the reviews have a rating of 5 stars, i.e., the highest, but 22% of the reviews have the lowest rating. For an e-commerce company, it is important to know what led customers to give the lowest ratings. Figure 3-12 shows two extreme examples of reviews of the same product.

Figure 3-12. A positive and a negative review

Certainly, both of these reviews contain information about the phone that gives the retailer cues about what customers are thinking. Negative reviews, in particular, are important to understand. Look at the first review, where the customer states that there are issues with the phones being shipped, mostly related to a defective screen that the retailer should take care of. Hence, it is crucial to fully understand the reviews. By nature, they are textual and mostly in a very unstructured format, full of unforced errors such as spelling mistakes, wrong sentence constructions, incomplete words, and abbreviations. This makes review analysis even more challenging.

In Chapter 4, we discussed basic sentiment analysis as a classification problem. Starting from a dictionary-based approach, we discussed several different ways of performing text classification, which can all be applied to sentiment analysis.

Note

Typically, a review contains more than one sentence. It is advisable to break a review into sentences and pass each sentence as one data point. This also enables other tasks such as sentence-wise aspect tagging and aspect-wise sentiment analysis.
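
A small sketch of this sentence-level treatment is shown below, using NLTK’s sentence tokenizer and its VADER lexicon-based sentiment scorer as one possible off-the-shelf choice; the review text is made up for illustration.

    import nltk
    from nltk.sentiment import SentimentIntensityAnalyzer

    nltk.download("punkt")           # sentence tokenizer model
    nltk.download("vader_lexicon")   # VADER sentiment lexicon

    # A made-up review used only for illustration.
    review = ("The display is gorgeous and Face ID works well. "
              "However, the battery drains quickly and the screen arrived scratched.")

    sia = SentimentIntensityAnalyzer()
    for sentence in nltk.sent_tokenize(review):
        score = sia.polarity_scores(sentence)["compound"]   # -1 (negative) to +1 (positive)
        print(f"{score:+.2f}  {sentence}")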

Ratings are considered to be directly proportional to the overall sentiment of a review. There are cases where a user mistakenly rates the product badly but gives a positive review; understanding sentiment directly from the text helps retailers rectify such anomalies during analysis. We have already discussed how to get the overall sentiment of a review. But typically, a review does not talk about just one aspect of the product; it generally tries to cover most aspects of the product, and all of that is reflected in the review rating.

Have another look at the iPhone X review snapshot from Amazon.com in Figure 3-11.

Look at the section that reads ‘Read reviews that mention.’ These are the important keywords that Amazon has chosen to display, which may help customers navigate better while skimming through the reviews. This clearly indicates that there are certain aspects customers talk about; it could be user experience, manufacturing quality, price, or something else. How could we know the emotions or feedback of the customer along each of these aspects? Our discussion so far has only produced a high-level sentiment score for the whole review and cannot dig deeper. This necessitates an aspect-level understanding of the reviews. These aspects could be pre-defined or extracted from the review data itself; based on that, the approaches are supervised or unsupervised, respectively.

Aspect-Level Sentiment Analysis

Before we start discussing various techniques for aspect-level sentiment analysis, we need to understand what an aspect is. An aspect is a semantically rich, concept-centric collection of words that indicates certain properties or characteristics of the product. For instance, in Figure 3-13, we see the kind of aspects a travel website might have: location, value, and cleanliness.

Aspects are not constrained to the inherent attributes of the product; they can cover anything related to its supply, presentation, delivery, returns, quality, and more. Typically, drawing clear boundaries between these aspects is difficult unless they are defined in advance.

When the retailer has a very clear understanding of the aspects that may be discussed around the product, finding aspects falls under the supervised category of algorithms. A common technique is to use seed words or seed lexicons, which essentially list the crucial tokens that could appear under a particular aspect. For example, for user experience as an aspect of the iPhone X, seed words could be screen resolution, touch, response time, and so on. Again, it is completely up to the retailer what level of granularity they would like to operate at; for example, screen quality alone could be a more granular aspect.

Supervised Approach

Supervised approaches mainly depend on the seed words. They try to identify the presence of these seed words in each sentence of a review; if a particular seed word is found in a sentence, the sentence is tagged with the corresponding aspect. Once all the sentences are tagged with aspects, sentiment analysis is done at the sentence level. Since we now have an aspect tag for each sentence, sentences with a given tag can be filtered out and their sentiments aggregated to understand the customer feedback for that aspect. For example, all review sentences related to screen quality, touch, and response time can be grouped together.
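
A minimal sketch of this seed-word tagging is shown below. The aspect names and seed lexicon are assumed, and a production system would use lemmatization and fuzzier matching rather than exact word overlap.

    # Seed lexicon (assumed, defined by the retailer).
    ASPECT_SEEDS = {
        "screen": {"screen", "display", "resolution", "touch"},
        "battery": {"battery", "charge", "drain", "drains"},
        "delivery": {"shipping", "delivery", "arrived", "package"},
    }

    def tag_aspects(sentence):
        words = set(sentence.lower().replace(".", "").split())
        return [aspect for aspect, seeds in ASPECT_SEEDS.items() if words & seeds]

    sentences = [
        "The display and touch response are excellent.",
        "Battery drains noticeably after a week.",
        "It arrived two days late and the package was damaged.",
    ]
    for s in sentences:
        print(tag_aspects(s), "->", s)
    # Sentence-level sentiment scores can then be aggregated per aspect tag.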

For a change, let’s look at a direct example from a travel website in Figure 3-13, where aspect-level sentiment analysis is apparent. As you can see, there are specific ratings for location, check-in, value, and cleanliness: semantic concepts extracted from the data to present a more detailed view of the reviews.

Figure 3-13. Aspect level ratings on reviews given in a travel website

Unsupervised Approach

As noted, curating a good-quality seed lexicon is difficult, hence there are also unsupervised ways of detecting aspects.

Topic modeling is a very useful technique for identifying the latent topics present in a document, and we can think of these topics as aspects in our case. Imagine being able to group sentences that talk about the same aspect; that is exactly what a topic modeling algorithm enables. One of the most popular topic modeling approaches is Latent Dirichlet Allocation (LDA). LDA takes a collection of documents (here, sentences) and returns, for each sentence, a distribution over a previously specified number of topics. Once the probabilities are available, it is easy to perform an argmax operation to find the topic assignment for each sentence.

In this fashion, we can predefine the number of aspects we expect from the set of sentences. The topic modeling algorithm also outputs the probability of each word belonging to each topic (aspect, here). Hence, it is also possible to group words that have a very high chance of belonging to a certain aspect and call them the characteristic words for that aspect. This finally helps in assigning suitable names to the otherwise unlabeled aspects.
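
The sketch below runs scikit-learn’s LDA implementation over a handful of made-up review sentences, prints each sentence’s most likely aspect, and lists characteristic words per aspect; the sentences and the choice of three topics are assumptions.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    # Toy review sentences; each sentence is one "document" for LDA.
    sentences = [
        "the screen is bright and sharp",
        "display resolution is stunning",
        "battery dies before the end of the day",
        "charging is slow and the battery overheats",
        "delivery was quick and packaging was neat",
        "arrived late and the box was damaged",
    ]

    vectorizer = CountVectorizer(stop_words="english")
    X = vectorizer.fit_transform(sentences)

    lda = LatentDirichletAllocation(n_components=3, random_state=0)   # 3 expected aspects
    doc_topics = lda.fit_transform(X)
    print(doc_topics.argmax(axis=1))          # aspect assignment for each sentence

    terms = vectorizer.get_feature_names_out()
    for k, weights in enumerate(lda.components_):
        top = [terms[i] for i in weights.argsort()[-4:][::-1]]
        print(f"aspect {k}:", top)            # characteristic words per latent aspect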

Beyond this, another unsupervised approach is to create sentence representations and perform clustering instead of LDA; clustering has sometimes been seen to give better results when the number of review sentences is small.

Note

Defining aspects is a difficult task, and one should keep the business needs in mind while the aspects are retrieved. For example, if the seller on the e-commerce platform is a manufacturer, you would want to identify aspects like screen quality and hardware performance for a mobile phone. For third-party sellers, on the other hand, one may focus on aspects like user experience and screen size. The same review can be tagged with multiple aspects as needed.

Connecting Overall Ratings to Aspects - Latent Rating Regression (LARA)

We have already seen how to detect the sentiment for each aspect. Typically, users also give an overall rating; the idea here is to connect that rating to the individual aspect-level sentiments. The details of the LARA implementation are outside the scope of this book, but Figure 3-14 shows an example of such a system generating aspect-level ratings for a hotel review [5]. You can delve into the details by looking at the reference.

Figure 3-14. Aspect wise sentiment prediction using LARA

We can assume that the final rating is a weighted combination of the individual aspect-level sentiments. The objective is then to estimate the weights as well as the aspect-level sentiments together. It is also possible to perform these two operations sequentially, i.e., first determine the aspect-level sentiments and then the weights.
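
A toy sketch of the sequential variant is shown below: with aspect-level sentiment scores treated as given, the per-aspect weights are estimated by least squares. The numbers are invented, and the actual LARA model estimates sentiments and weights jointly [5].

    import numpy as np

    # Rows: reviews; columns: aspect-level sentiment scores in [-1, 1] (toy numbers).
    aspect_sentiment = np.array([
        [ 0.8,  0.6,  0.9],
        [-0.5,  0.2, -0.7],
        [ 0.1, -0.9,  0.4],
        [ 0.7,  0.8,  0.2],
    ])
    overall_rating = np.array([5.0, 2.0, 3.0, 4.5])

    # Fit the overall rating as a weighted combination of aspect sentiments (plus an intercept).
    X = np.hstack([aspect_sentiment, np.ones((len(overall_rating), 1))])
    weights, *_ = np.linalg.lstsq(X, overall_rating, rcond=None)
    print("aspect weights:", weights[:-1], "intercept:", weights[-1])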

The learned weights indicate, on top of the sentiment expressed for each aspect, how much importance a reviewer places on that specific aspect. A customer may be extremely unhappy about some aspect that is nevertheless not a priority for them. This information is crucial for e-tailers before they take any direct action.

Note

User information is also key in handling reviews. Imagine a scenario where a very popular user writes a good review, as opposed to a less popular user doing so: the user matters! While performing review analysis, a ‘user weight’ can be defined for all users based on their ratings (generally given by their peers) and used in all calculations to discount reviewer bias.

Understanding Aspects in a Better Way

Once we derive all the aspects and tag each sentence with them, it is possible to group the sentences by aspect. But given the huge volume of reviews an e-commerce website receives, there will still be a lot of sentences under each aspect. Here, a summarization algorithm may save the day. Think about a situation where you need to take action regarding an aspect but do not have the capacity to go through all the sentences for that particular aspect: you need an automatic algorithm that can pick the best representative sentences for the aspect.

LexRank [6] is an algorithm similar to PageRank: it treats each sentence as a node and connects nodes via sentence similarity. It then picks the most central sentences and presents them as an extractive summary of the sentences under an aspect.
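
The sketch below builds a LexRank-style centrality ranking with TF-IDF cosine similarities and PageRank over a few invented sentences for one aspect; the full LexRank algorithm additionally thresholds the similarity matrix.

    import networkx as nx
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # Toy sentences already grouped under the "battery" aspect.
    sentences = [
        "The battery barely lasts half a day with normal use.",
        "Battery life is the weakest point of this phone.",
        "I love the camera in low light.",
        "Charging the battery takes far too long.",
    ]

    tfidf = TfidfVectorizer().fit_transform(sentences)
    similarity = cosine_similarity(tfidf)            # sentence-to-sentence similarity matrix

    graph = nx.from_numpy_array(similarity)          # nodes = sentences, weighted edges = similarity
    scores = nx.pagerank(graph, weight="weight")     # centrality scores, as in LexRank
    best = max(scores, key=scores.get)
    print("most central sentence:", sentences[best])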

Figure 3-15. The complete flow chart of review analysis - overall sentiments, aspect level sentiments, and aspect wise significant reviews
Note

A complete understanding of a product can only be achieved through both user reviews and editorial reviews. Editorial reviews are generally provided by expert users or domain experts; these reviews are more reliable and can be shown at the top of the review section. On the other hand, general user reviews reveal the true picture of the product experience from all users’ perspectives. Hence, melding editorial reviews with general user reviews is important. That may be achieved by mixing both kinds of reviews in the top section and ranking them accordingly.

Recommendations for e-Commerce

We have already seen various recommendation techniques in Chapter 9. Here, in the context of e-commerce, a comprehensive study of which algorithms typically work in various scenarios is presented in [8].

Figure 3-16. Recommendation techniques and their usage in e-commerce

Figure 3-16 shows an analysis of various recommendation algorithms employed on e-commerce websites. As can be seen, when no metadata or transaction data is present, e-commerce websites typically recommend best-selling products to the user. Beyond that, there are two main intelligent ways in which recommendations are usually made: neighborhood-based methods and association rule mining.

Think about a scenario where a product has been viewed or bought by a customer. The question is which products should be shown as recommendations. Obviously, products related to it are likely candidates. This can be achieved by neighborhood-based methods, which look for similar products (in terms of attributes, purchase history, customers who bought them, etc.).

On the other hand, association rule mining is based solely on transaction patterns [58]. It is possible to calculate the probability of one product being bought together with another product, and this probability governs which other products should be recommended alongside it.
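
The core quantity here is easy to compute directly from transaction data; the sketch below estimates the confidence of a rule, P(consequent | antecedent), over a few made-up baskets. Libraries such as mlxtend offer full Apriori-based rule mining if you need support and lift as well.

    from itertools import combinations
    from collections import Counter

    # Toy transaction history: each set is one customer's basket.
    baskets = [
        {"phone", "case", "screen protector"},
        {"phone", "case"},
        {"phone", "charger"},
        {"case", "screen protector"},
    ]

    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(frozenset(pair) for basket in baskets
                          for pair in combinations(sorted(basket), 2))

    def confidence(antecedent, consequent):
        """Estimated P(consequent in basket | antecedent in basket)."""
        return pair_counts[frozenset((antecedent, consequent))] / item_counts[antecedent]

    print(confidence("phone", "case"))   # 2/3: two of the three phone baskets also contain a case
    print(confidence("case", "phone"))   # also 2/3 here, but confidence is not symmetric in general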

Typical recommender systems take the transaction history into account to capture user behavior, based on which recommendations can be made. Transaction history is mainly numerical data, and matrix factorization methods are very effective in building such recommender engines. Apart from that, the inherent properties of a product also drive customers to purchase it. These can only be included in the recommendation algorithm if it considers the product attribute information, which is typically textual. In this section, we mainly focus on approaches that take textual information into account.

Note

Recommendation engines deal with information from various sources. Proper matching of the various data tables and consistency of the information across data sources are very important to maintain. For example, while collating information about product attributes and product transaction history, the consistency of the information should be carefully checked. Complement and substitute data also give an indication of data quality. One should check for anomalous behavior while working with multifarious data sources, which is the case for e-commerce recommendation.

Apart from product attributes, reviews are also a rich source of data about the product. Reviews can be intelligently used to understand the interrelationships between products, and based on that understanding, new products can be recommended to the customer. Here, we will go through a case study on identifying substitute and complementary products for a given product by extracting signals from raw reviews.

Case Study - Substitutes and Complements

Recommender systems are built on the idea of ‘similar’ products. This similarity can be defined in many ways, such as content-based or user-based. There is another way of identifying item interrelationships, specific to the e-commerce setting.

We know there are products that are typically bought together; these can be termed complements. On the other hand, there are pairs where one is bought in lieu of the other; these are termed substitutes. Even though the economic definitions are much more rigorous, these lines of thought capture the behavioral aspect of product purchases well enough.

Consumer behavior is quirky, but consumers’ joint behavior can provide ample information about product interrelationships. There are many ways in which substitute and complementary pairs can be detected. We will focus on one approach that primarily relies on the textual information present around products to infer their interrelationships.

McAuley et al. [10] present a comprehensive framework for understanding product interrelationships: given a query product, the framework returns ranked lists of products that are substitutes and complements. We will discuss this application as a case study in the light of e-commerce (refer to Chapter 8 for a dedicated focus on other recommendation techniques).

Typically, as we have discussed, reviews do contain specific information about product attributes. Attribute extraction is a procedure in itself, and we have seen ways to perform it. This approach shows an alternative way of capturing product attributes, mostly in a latent manner, and making inferences from them.

Figure 3-17. Substitutes and Complements based on product reviews [10]

Getting the topic

Each product is associated with reviews. A review can discuss or mention various topics related to the product. Note that these topics are latent and cannot be distinctly identified, but the share of discussion that each topic receives in the reviews of a particular product can be obtained. A topic modeling algorithm can produce this topic representation of a product: we run LDA [9] on the available reviews and then obtain the topic distribution for each review.

Thus, we obtain a vector, which we call a topic vector, that tells how a particular product has been discussed in its reviews. This topic vector can be obtained for every product.

How two products are linked

The next task is to understand how two products are linked. We know the topic vectors for all products, which latently capture the intrinsic properties of each product; note that these vectors capture product attribute information without explicitly extracting it. The idea is to create a joint feature vector out of the topic vectors of a product pair and then predict if there is any relationship between the two products. This can be viewed as a binary classification problem where the features are obtained from the individual topic vectors of the product pair. We call this process ‘link prediction.’

To ensure that the topic vector is expressive enough to predict a link or relationship between a product pair, the objective of obtaining topic vectors and link prediction can be solved jointly rather than sequentially.

Here, we try to learn topic vectors for each product as well as the feature weights of the link classifier, predicting whether the pair has a relationship (1) or not (0).
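
A much-simplified, sequential sketch of this idea is shown below: topic vectors come from LDA over per-product review text, and a logistic regression predicts links from an elementwise product of each pair’s topic vectors. The corpus, the pair labels, and the feature combination are all assumptions; the original work learns the topics and the link predictor jointly [10].

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.linear_model import LogisticRegression

    # One concatenated review text per product (toy corpus).
    product_reviews = {
        "laptop": "great battery and keyboard for work and travel",
        "laptop_charger": "charges the laptop battery quickly and the cable is long",
        "mouse": "smooth wireless mouse good for travel and work",
        "blender": "crushes ice well and smoothies are easy to make",
    }

    # Step 1: topic vectors per product from their review text.
    vec = CountVectorizer()
    X = vec.fit_transform(product_reviews.values())
    topic_vectors = LatentDirichletAllocation(n_components=3, random_state=0).fit_transform(X)
    topic_of = dict(zip(product_reviews, topic_vectors))

    def pair_features(a, b):
        """Joint feature for a product pair: elementwise product of topic vectors (one simple choice)."""
        return topic_of[a] * topic_of[b]

    # Step 2: link prediction with hypothetical labels (1 = related, 0 = unrelated).
    pairs = [("laptop", "laptop_charger", 1), ("laptop", "mouse", 1),
             ("laptop", "blender", 0), ("mouse", "blender", 0)]
    X_pairs = np.array([pair_features(a, b) for a, b, _ in pairs])
    y = np.array([label for *_, label in pairs])

    link_model = LogisticRegression().fit(X_pairs, y)
    print(link_model.predict_proba([pair_features("laptop_charger", "mouse")])[:, 1])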

Figure 3-18. Topic vector and topic hierarchy expresses how different taxonomic identities and relations are captured in reviews

This shows how a topic vector becomes expressive enough to capture the intrinsic attributes of a product. In the example in Figure 3-18, the product is a laptop charger, and the model clearly depicts the topical similarity across product types in the hierarchy.

Summary

The e-commerce industry is seeing immense success, and a primary driver behind it has been massive data collection, utilization, and analysis, along with the adoption of data-driven decisions. Textual data is a major part of e-commerce data. In this chapter, we have seen a handful of tasks in e-commerce that deal primarily with textual data and discussed the customized NLP solutions required to solve them. On one hand, we have the task of catalog building: product content information needs to be accurate, and for that we need tasks like text classification and NER. On the other hand, from reviews, there are pipelines that can tell the story behind customers’ feelings about a product; text classification for sentiment analysis and text summarization for extracting the essence of large volumes of review text are a few of the many. The recommendation engine is another backbone of e-commerce operations and clearly needs information from both the user and the product catalog. There are complex techniques that account for textual information along with transactional information to define the recommendation algorithms.

Summing up, the penetration of NLP techniques in modern-day e-commerce platforms is exemplary, and they have been core drivers of this immense revenue growth. That said, it always warrants a closer look at the data sources and data types, and at finding a suitable algorithm that fits the business needs. This chapter aims to scratch that surface and seed the possibilities that can lead to a successful, user-friendly e-commerce business.

  • You are now familiar with various problems in e-commerce that can be addressed using NLP techniques.

  • You also know the typical sources of textual data in an e-commerce context.

  • You know how to build a 360-degree view of a product, which is a basic element of an e-commerce system.

  • You know how to apply attribute extraction algorithms, step by step, to extract the product attributes that are crucial for faceted search.

  • You now have a better understanding of product enrichment and taxonomy building, and you have the pointers to build your own.

  • You now know the nitty-gritty of product matching and its various aspects.

  • You now know that reviews are one of the most important textual data sources in e-commerce.

  • You know how to apply a classification framework for sentiment analysis.

  • You know how to develop an aspect extraction framework and how to analyze aspects through sentiment analysis and importance determination.

  • You know how to leverage text for recommendations in e-commerce, and you have a good understanding of how product substitutes and complements can be determined with the help of reviews.

References

[1] E-commerce in the United States - Statistics & Facts, Statista.com, https://www.statista.com/statistics/379046/worldwide-retail-e-commerce-sales/

[2] Ben-Yitzhak, Ori, et al. “Beyond basic faceted search.” Proceedings of the 2008 International Conference on Web Search and Data Mining. ACM, 2008.

[3] Hearst, Marti. “Design recommendations for hierarchical faceted search interfaces.” ACM SIGIR workshop on faceted search. 2006.

[4] More, Ajinkya.: Attribute Extraction from Product Titles in eCommerce. EI-KDD (2016)

[5] Wang, Hongning, Yue Lu, and Chengxiang Zhai. “Latent aspect rating analysis on review text data: a rating regression approach.” Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2010.

[6] Erkan, Günes, and Dragomir R. Radev. “Lexrank: Graph-based lexical centrality as salience in text summarization.” journal of artificial intelligence research 22 (2004): 457-479.

[8] Sarwar, Badrul, George Karypis, Joseph Konstan, and John Riedl. “Analysis of recommendation algorithms for e-commerce.” In Proceedings of the 2nd ACM conference on Electronic commerce, pp. 158-167. ACM, 2000.

[9] Blei, David M., Andrew Y. Ng, and Michael I. Jordan. “Latent dirichlet allocation.” Journal of machine Learning Research 3.Jan (2003): 993-1022.

[10] McAuley, Julian, Rahul Pandey, and Jure Leskovec. “Inferring networks of substitutable and complementary products.” Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2015.

[30] History of the Sears Catalog - http://www.searsarchives.com/catalogs/history.htm

[31] https://en.wikipedia.org/wiki/Timeline_of_Amazon.com

[32] Streitfeld, David; Kantor, Jodi (August 17, 2015). “Jeff Bezos Says Amazon Won’t Tolerate ‘Callous’ Management Practices”. The New York Times. ISSN 0362-4331

[33] The history of e-bay - http://www.cs.brandeis.edu/~magnus/ief248a/eBay/history.html

[34] The history of Walmart - http://help.walmart.com/app/answers/detail/a_id/6/~/walmart.coms-history-and-mission

[35] Weissmann, Jordan (March 13, 2014). “Amazon Is Jacking Up the Cost of Prime, and It’s Still Cheap”. Slate.com. The Slate Group. Retrieved May 9, 2014

[36] “Amazon has 100 Million Prime Members”. Engadget. Retrieved 18 April 2018

[37] http://help.walmart.com/app/answers/detail/a_id/1544/~/free-2-day-shipping

[38] https://news.walmart.com/2017/01/31/walmart-launches-free-two-day-shipping-on-more-than-two-million-items-no-membership-required

[39] Jannach, Dietmar, and Malte Ludewig. “Investigating personalized search in e-commerce.” FLAIRS, 2017.

[40] Tunkelang, Daniel. “Faceted search.” Synthesis lectures on information concepts, retrieval, and services 1.1 (2009): 1-80.

[41] Ben-Yitzhak, Ori, Nadav Golbandi, Nadav Har’El, Ronny Lempel, Andreas Neumann, Shila Ofek-Koifman, Dafna Sheinwald, Eugene Shekita, Benjamin Sznajder, and Sivan Yogev. “Beyond basic faceted search.” In Proceedings of the 2008 international conference on web search and data mining, pp. 33-44. ACM, 2008.

[42] Fishburn, Peter C. Utility theory for decision making. No. RAC-R-105. Research analysis corp McLean VA, 1970.

[43] Logan, I. V., Robert, L., Humeau, S., & Singh, S. (2017). Multimodal Attribute Extraction. arXiv preprint arXiv:1711.11118.

[44] Majumder, Bodhisattwa Prasad, Aditya Subramanian, Abhinandan Krishnan, Shreyansh Gandhi, and Ajinkya More. “Deep Recurrent Neural Networks for Product Attribute Extraction in eCommerce.” arXiv preprint arXiv:1803.11284 (2018).

[45] https://en.wikipedia.org/wiki/Pink_(Victoria%27s_Secret)

[46] Tkachenko, Maksim, and Andrey Simanovsky. “Named entity recognition: Exploring features.” In KONVENS, pp. 118-127. 2012.

[47] Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. “Sequence to sequence learning with neural networks.” In Advances in neural information processing systems, pp. 3104-3112. 2014.

[48] Huang, Zhiheng, Wei Xu, and Kai Yu. “Bidirectional LSTM-CRF models for sequence tagging.” arXiv preprint arXiv:1508.01991 (2015).

[49] A.-M. Popescu and O. Etzioni, “Extracting product features and opinion from reviews”, Human Language Technology and Empirical Methods in Natural Language Processing, Vancouver, British Columbia, pp. 339–346, 2005

[50] Wang, Tao, Yi Cai, Ho-fung Leung, Raymond YK Lau, Qing Li, and Huaqing Min. “Product aspect extraction supervised with online domain knowledge.” Knowledge-Based Systems 71 (2014): 86-100.

[51] Trietsch, R. C. “Product attribute value classification from unstructured text in e-commerce.” (2016).

[52] https://www.semantics3.com/blog/product-classification-with-ai-how-machine-learning-is-helping-clean-up-the-messy-underside-of-4b911a2414bb

[53] Cheatham, Michelle, and Pascal Hitzler. “String similarity metrics for ontology alignment.” In International Semantic Web Conference, pp. 294-309. Springer, Berlin, Heidelberg, 2013.

[54] Bilenko, Mikhail, and Raymond J. Mooney. “Adaptive duplicate detection using learnable string similarity measures.” In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 39-48. ACM, 2003.

[55] Neculoiu, Paul, Maarten Versteegh, and Mihai Rotaru. “Learning text similarity with siamese recurrent networks.” In Proceedings of the 1st Workshop on Representation Learning for NLP, pp. 148-157. 2016.

[56] Zagoruyko, Sergey, and Nikos Komodakis. “Learning to compare image patches via convolutional neural networks.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4353-4361. 2015.

[57] Morinaga, Satoshi, Kenji Yamanishi, Kenji Tateishi, and Toshikazu Fukushima. “Mining product reputations on the web.” In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 341-349. ACM, 2002.

[58] Agrawal, R., Imieliński, T. and Swami, A., 1993, June. Mining association rules between sets of items in large databases. In Acm sigmod record (Vol. 22, No. 2, pp. 207-216). ACM.

[59] Majumder, B. P., Subramanian, A., Krishnan, A., Gandhi, S., & More, A. (2018). Deep Recurrent Neural Networks for Product Attribute Extraction in eCommerce. arXiv preprint arXiv:1803.11284.
