Content-based filtering

As we mentioned earlier, content-based filtering systems provide users with recommendations based on their past behavior as well as the characteristics of items that are positively rated or liked by the given user. We can also take into account the items that were disliked by the given user. An item is generally represented by several discrete attributes. These attributes are analogous to the input variables or features of a classification or linear regression based machine learning model.

For example, suppose we want to build a recommendation system that uses content-based filtering to recommend online products to its users. Each product can be characterized and identified by several known characteristics, and users can provide a rating for each characteristic of every product. The feature values of the products can have values between 0 and 10, and the ratings provided by users for the products will have values in the range of 0 to 5. We can visualize the sample data for this recommendation system in a tabular representation, as follows:

[Table: sample feature values of each product and the ratings given to the products by each user]

In the preceding table, the system has $N$ products and $U$ users. Each product is defined by $n$ features, each of which has a value in the range of 0 to 10, and each product is also rated by a user. Let the rating of each product $i$ by a user $u$ be represented as $y_{u,i}$. Using the input values $x_{i,1}, \ldots, x_{i,n}$, or rather the input vector $X_i$, and the rating $y_{u,i}$ of a user $u$, we can estimate a parameter vector $\beta_u$ that we can use to predict a user's rating. Thus, content-based filtering in fact applies a copy of linear regression to each user's ratings and each product's feature values to estimate a regression model that can in turn be used to estimate the user's rating for some unrated products. In effect, we learn the parameter $\beta_u$ using the independent variables $X_i$ and the dependent variable $y_{u,i}$ for all the users of the system. Using the estimated parameter $\beta_u$ and some given values for the independent variables, we can predict the value of the dependent variable for any given user. The optimization problem for content-based filtering can thus be expressed as follows:

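$$\arg\min_{\beta_u} \left[ \sum_{i \,:\, r_{i,u} = 1} \left( \beta_u^{T} X_i - y_{u,i} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n} \beta_{u,j}^2 \right]$$
where $X_i$ is the feature vector of product $i$, $y_{u,i}$ is the rating given by user $u$ to product $i$, $\beta_u$ is the parameter vector of user $u$ with components $\beta_{u,j}$, $n$ is the number of features, $\lambda$ is the regularization parameter, and $r_{i,u} = 1$ if user $u$ has rated product $i$.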
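As a minimal sketch of the per-user problem, the following Python code fits one user's parameter vector by regularized least squares over the products that user has rated. It assumes the objective is the sum of squared errors plus half the regularization parameter times the squared parameters, and solves the resulting normal equations directly. The function names (solve, fit_user, predict) and the sample data are hypothetical and chosen only for illustration.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[r][:] + [b[r]] for r in range(n)]  # augmented matrix [A | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def fit_user(features, ratings, lam=0.1):
    """Estimate one user's parameter vector.

    features: list of product feature vectors (values assumed in 0..10)
    ratings:  dict mapping product index -> rating, rated products only
    lam:      regularization parameter (assumed > 0)
    """
    n = len(features[0])
    # Setting the gradient of the objective to zero gives
    # (2 * sum(x x^T) + lam * I) beta = 2 * sum(y * x).
    A = [[lam if j == k else 0.0 for k in range(n)] for j in range(n)]
    b = [0.0] * n
    for i, y in ratings.items():
        x = features[i]
        for j in range(n):
            b[j] += 2.0 * y * x[j]
            for k in range(n):
                A[j][k] += 2.0 * x[j] * x[k]
    return solve(A, b)


def predict(beta, x):
    """Predicted rating: dot product of the parameters and the product's features."""
    return sum(bj * xj for bj, xj in zip(beta, x))


# Hypothetical sample data: four products with two features each.
features = [[9.0, 1.0], [8.0, 2.0], [1.0, 9.0], [2.0, 8.0]]
ratings = {0: 5.0, 1: 4.5, 2: 0.5}  # this user has not rated product 3

beta = fit_user(features, ratings, lam=0.1)
print("estimated parameters:", beta)
print("predicted rating for product 3:", predict(beta, features[3]))
```

The sketch omits an intercept term and does not clip predictions to the 0 to 5 rating range; both would be reasonable additions in practice.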

The optimization problem defined in the preceding equation can be applied to all users of the system to produce the following optimization problem for U users:

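$$\arg\min_{\beta_1, \ldots, \beta_U} \left[ \sum_{u=1}^{U} \sum_{i \,:\, r_{i,u} = 1} \left( \beta_u^{T} X_i - y_{u,i} \right)^2 + \frac{\lambda}{2} \sum_{u=1}^{U} \sum_{j=1}^{n} \beta_{u,j}^2 \right]$$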
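The combined objective can be sketched in Python as a sum of per-user costs. The sketch assumes each user's cost is the squared prediction error over the products that user has rated plus a regularization penalty on that user's parameters. The names user_cost and total_cost, and the sample values, are hypothetical.

```python
def user_cost(features, ratings, beta, lam):
    """Cost for one user: squared error over rated products plus the penalty."""
    squared_error = 0.0
    for i, y in ratings.items():
        prediction = sum(b * x for b, x in zip(beta, features[i]))
        squared_error += (prediction - y) ** 2
    penalty = (lam / 2.0) * sum(b * b for b in beta)
    return squared_error + penalty


def total_cost(features, ratings_by_user, betas, lam):
    """Cost for all users: the sum of the individual user costs."""
    return sum(
        user_cost(features, ratings_by_user[u], betas[u], lam)
        for u in ratings_by_user
    )


# Hypothetical data: two products with two features each, and two users.
features = [[2.0, 0.0], [0.0, 3.0]]
ratings_by_user = {"u1": {0: 4.0}, "u2": {0: 2.0, 1: 3.0}}
betas = {"u1": [1.0, 0.0], "u2": [1.0, 1.0]}

print(total_cost(features, ratings_by_user, betas, lam=2.0))  # 5.0 + 2.0 = 7.0
```

Because each user's parameter vector appears only in that user's own terms, minimizing the total is equivalent to solving one independent regression problem per user.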

In simple terms, the parameter vector $\beta_u$ tries to scale or transform the input variables to match the output variable of the model. The second term is added for regularization. Interestingly, the optimization problem defined in the preceding equation is analogous to that of linear regression, and thus content-based filtering can be considered an extension of linear regression.
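To make the analogy with linear regression concrete, suppose the cost for user $u$ is the sum of squared errors over the products that user has rated plus $\lambda/2$ times the sum of squared parameters, and call it $J(\beta_u)$. The text does not name an optimizer, so the gradient descent update below, with learning rate $\alpha$, is an assumption made for illustration. Here $x_{i,j}$ is the $j$-th feature of product $i$, $y_{u,i}$ is the rating given by user $u$ to product $i$, and $r_{i,u} = 1$ when that rating exists.

```latex
\frac{\partial J}{\partial \beta_{u,j}}
  = 2 \sum_{i \,:\, r_{i,u} = 1} \left( \beta_u^{T} X_i - y_{u,i} \right) x_{i,j}
  + \lambda \, \beta_{u,j}

\beta_{u,j} \leftarrow \beta_{u,j}
  - \alpha \left[ 2 \sum_{i \,:\, r_{i,u} = 1} \left( \beta_u^{T} X_i - y_{u,i} \right) x_{i,j}
  + \lambda \, \beta_{u,j} \right]
```

This is the same update used for regularized linear regression, applied separately to each user with only that user's rated products as training examples.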

The key issue with content-based filtering is whether a given recommendation system can learn from a user's preferences or ratings. Direct feedback can be collected by asking users to rate the items in the system that they like, although these ratings can also be inferred from a user's past behavior. Also, a content-based filtering system that is trained for a set of users and a specific category of items cannot be used to predict the same users' ratings for a different category of items. For example, it is difficult to use a user's preference for news to predict that user's liking for online shopping products.
