CBF methods

This class of methods relies on data that describes the items, from which the users' features (their preferences) are then extracted. In our MovieLens example, each movie j has a set of G binary fields indicating whether it belongs to each of the following genres: unknown, action, adventure, animation, children's, comedy, crime, documentary, drama, fantasy, film noir, horror, musical, mystery, romance, sci-fi, thriller, war, or western.

Based on these features (genres), each movie is described by a binary vector mj with G dimensions (the number of movie genres), with entries equal to 1 for every genre that movie j belongs to and 0 otherwise. Given the dataframe dfout that stores the utility matrix (see the earlier Utility matrix section), these binary vectors mj are collected from the MovieLens database into a dataframe using the following script:

[Code listing: script that builds the movies content matrix; not preserved in this copy]

The movies content matrix has been saved in the movies_content.csv file ready to be used by the CBF methods.
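As an illustration only, the content matrix could be assembled roughly as follows. This is a minimal sketch, not the original script: it assumes the pipe-separated layout of the MovieLens 100k `u.item` file (five descriptive columns followed by the 19 genre flags), and the names `build_movies_content`, `GENRES`, `ITEM_COLUMNS`, and the `titles` parameter are hypothetical.

```python
import pandas as pd

# Genre names in the order listed in the text (G = 19 binary fields per movie).
GENRES = ["unknown", "action", "adventure", "animation", "children's",
          "comedy", "crime", "documentary", "drama", "fantasy", "film noir",
          "horror", "musical", "mystery", "romance", "sci-fi", "thriller",
          "war", "western"]

# Assumed column layout of the MovieLens 100k `u.item` file.
ITEM_COLUMNS = ["movie_id", "title", "release_date", "video_release_date",
                "imdb_url"] + GENRES


def build_movies_content(item_file, titles=None):
    """Return one row per movie: its title followed by the G genre flags."""
    items = pd.read_csv(item_file, sep="|", header=None, names=ITEM_COLUMNS,
                        encoding="latin-1")
    content = items[["title"] + GENRES]
    if titles is not None:
        # Keep only the movies that also appear in the utility matrix.
        content = content[content["title"].isin(titles)]
    return content.reset_index(drop=True)

# Possible usage, assuming the columns of dfout are movie titles:
# build_movies_content("u.item", titles=dfout.columns).to_csv(
#     "movies_content.csv", index=False)
```

Passing the titles of the utility matrix keeps the content matrix restricted to the movies that users have actually rated, so the two tables can be matched by title.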

The goal of a content-based recommendation system is to generate a profile for each user with the same fields as the items, indicating how much the user likes each genre. The drawback of this approach is that a content description of the items is not always available, so the technique cannot always be employed in an e-commerce environment. The advantage is that the recommendations for a specific user are independent of the other users' ratings, so the method does not suffer from cold start problems caused by an insufficient number of user ratings for particular items.

Two approaches are discussed. The first simply builds each user's profile from the average ratings that the user gave to the movies of each genre, and then uses the cosine similarity to find the movies most similar to the user's preferences. The second is a regularized linear regression model that learns each user's profile features from the ratings and the movie features, so that the ratings of the movies not yet seen by each user can be predicted from these profiles.

Item features average method

The approach is simple, and we explain it using the features that describe the movies in the MovieLens example discussed previously. The objective of the method is to generate the movie genres' preferences vector $v_i$ for each user i (length equal to G). This is done by calculating the user's average rating $\bar{r}_i$ and then each genre entry g: $v_{ig}$ is given by the sum, over the movies seen by user i ($M_i$) that contain genre g, of the ratings minus the average $\bar{r}_i$, divided by the number of those movies containing genre g:

$$v_{ig} = \frac{\sum_{k \in M_i} \left(r_{ik} - \bar{r}_i\right) I_{kg}}{\sum_{k \in M_i} I_{kg}}$$

Here, Ikg is 1 if the movie k contains genre g; otherwise it is 0.
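As a quick check of this averaging rule, consider a hypothetical user who has rated three movies: a comedy with 5, a comedy-drama with 3, and a drama with 1. Writing $\bar{r}_i$ for the user's average rating and $v_{ig}$ for the preference for genre g (notation assumed here), the two non-empty genre entries work out as:

```latex
\begin{aligned}
\bar{r}_i &= \tfrac{1}{3}(5 + 3 + 1) = 3 \\
v_{i,\text{comedy}} &= \frac{(5-3)\cdot 1 + (3-3)\cdot 1 + (1-3)\cdot 0}{1 + 1 + 0} = 1 \\
v_{i,\text{drama}} &= \frac{(5-3)\cdot 0 + (3-3)\cdot 1 + (1-3)\cdot 1}{0 + 1 + 1} = -1
\end{aligned}
```

A positive entry means the user rates that genre above their own average, and a negative entry means below it.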

The preference vectors $v_i$ are then compared to the binary vectors mj using the cosine similarity, and the movies with the highest similarity values are recommended to user i. The implementation of the method is given by the following Python class:

[Code listing: Python class implementing the item features average method; not preserved in this copy]

The constructor stores the list of movie titles in Movieslist and the movie features in the Movies vector. The GetRecMovies function generates the user's genre preferences vector $v_i$ (applying the preceding formula), called features_u, and returns the items most similar to this vector.
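The class described above might look roughly like the following sketch. It reuses the names Movies, Movieslist, GetRecMovies, and features_u from the text, but the class name `AverageProfileCBF`, the helpers `cosine_sim` and `user_profile`, and the convention that a rating of 0 means "not rated" are assumptions made for illustration. It also assumes the user has rated at least one movie.

```python
import numpy as np


def cosine_sim(a, b):
    """Cosine similarity, taken to be 0 when either vector has zero norm."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


class AverageProfileCBF:  # hypothetical class name
    def __init__(self, Movies, Movieslist):
        self.Movies = np.asarray(Movies, dtype=float)  # (n_movies, G) genre flags
        self.Movieslist = np.asarray(Movieslist)       # titles, in the same order

    def user_profile(self, u_vec):
        """Genre preferences: mean-centred ratings averaged per genre."""
        u_vec = np.asarray(u_vec, dtype=float)
        rated = u_vec > 0                              # assumption: 0 = not rated
        diff_u = np.where(rated, u_vec - u_vec[rated].mean(), 0.0)
        sums = diff_u @ self.Movies                    # per-genre sum of centred ratings
        counts = self.Movies[rated].sum(axis=0)        # rated movies per genre
        # Genres the user never rated keep a preference of 0.
        return np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)

    def GetRecMovies(self, u_vec):
        """Return the unrated movie titles, most similar to the profile first."""
        rated = np.asarray(u_vec, dtype=float) > 0
        features_u = self.user_profile(u_vec)
        sims = np.array([cosine_sim(features_u, m) for m in self.Movies])
        sims[rated] = -np.inf                          # never recommend rated movies
        order = np.argsort(sims)[::-1]
        return self.Movieslist[order[: int((~rated).sum())]]
```

Masking the rated movies with negative infinity before sorting pushes them to the end of the ordering, so truncating to the number of unrated movies leaves only new recommendations.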

Regularized linear regression method

The method learns the movie preferences of the users as the parameters $\theta_i \in \mathbb{R}^G$ of a linear model, with $i \in \{1, \ldots, N\}$, where N is the number of users and G is the number of features (movie genres) of each item. We add an intercept entry to the user parameters $\theta_i$ ($\theta_{i0} = 1$) and to the movie vector mj, which has the same value $m_{j0} = 1$, and so $\theta_i, m_j \in \mathbb{R}^{G+1}$. To learn the parameter vectors $\theta_i$, we solve the following regularized minimization problem:

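$$\min_{\theta_1, \ldots, \theta_N} \; \frac{1}{2} \sum_{i=1}^{N} \sum_{j} I_{ij} \left(\theta_i^T m_j - r_{ij}\right)^2 + \frac{\lambda}{2} \sum_{i=1}^{N} \sum_{k=1}^{G} \theta_{ik}^2$$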

Here, Iij is 1 if user i watched movie j and 0 otherwise, and λ is the regularization parameter (see Chapter 3, Supervised Machine Learning).

The solution is given by applying gradient descent (see Chapter 3, Supervised Machine Learning). For each user i:

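  • $\theta_{i0} \leftarrow \theta_{i0} - \alpha \sum_{j} I_{ij} \left(\theta_i^T m_j - r_{ij}\right) m_{j0}$ (k=0)
  • $\theta_{ik} \leftarrow \theta_{ik} - \alpha \left( \sum_{j} I_{ij} \left(\theta_i^T m_j - r_{ij}\right) m_{jk} + \lambda \theta_{ik} \right)$ (k>0)

Here, α is the learning rate.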

Since we add one entry to both the movie and user vectors, the distinction between learning the intercept parameter (k=0) and the others is necessary (there is no possibility of overfitting on the intercept, so there is no need to regularize it). After the parameters $\theta_i$ are learned, the recommendation is performed by simply applying the formula $r_{ij} = \theta_i^T m_j$ to any missing rating rij.
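The update rules can be sketched in vectorized form as below. This is a minimal illustration under stated assumptions rather than the book's implementation: the function names `fit_user_parameters` and `predict_ratings`, the default values of the learning rate `alpha`, the regularization strength `lam`, and the iteration count `n_iters` are all hypothetical, a rating of 0 is taken to mean "not rated", and the intercept is simply started at 1.

```python
import numpy as np


def add_intercept(M):
    """Prepend the constant entry m_j0 = 1 to every movie vector."""
    M = np.asarray(M, dtype=float)
    return np.hstack([np.ones((M.shape[0], 1)), M])


def fit_user_parameters(R, M, alpha=0.01, lam=0.1, n_iters=500):
    """Batch gradient descent on the regularized squared error.

    R: (N, n_movies) ratings, with 0 meaning 'not rated' (assumption).
    M: (n_movies, G) binary movie features, without the intercept column.
    Returns theta with shape (N, G + 1); column 0 is the intercept.
    """
    R = np.asarray(R, dtype=float)
    M1 = add_intercept(M)
    I = (R > 0).astype(float)               # indicator of observed ratings
    theta = np.zeros((R.shape[0], M1.shape[1]))
    theta[:, 0] = 1.0                       # starting value for the intercept
    for _ in range(n_iters):
        err = I * (theta @ M1.T - R)        # residuals on observed ratings only
        grad = err @ M1                     # sum over j of err_ij * m_jk
        grad[:, 1:] += lam * theta[:, 1:]   # no regularization on k = 0
        theta -= alpha * grad
    return theta


def predict_ratings(theta, M):
    """Predicted rating for every (user, movie) pair: theta_i . m_j."""
    return theta @ add_intercept(M).T
```

Computing the gradient for all users at once works because each user's parameters only interact with that user's own ratings, so the rows of `theta` are updated independently.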

The method is implemented by the following code:

[Code listing: Python class CBF_regression; not preserved in this copy]

The constructor of the class CBF_regression just performs the gradient descent to find the parameters θi (called Pmatrix) while the function CalcRatings finds the most similar rating vector in the stored utility matrix R (in case the user is not present in the utility matrix) and then it uses the corresponding parameters' vector to predict the missing ratings.
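The lookup step described for CalcRatings could be sketched as follows. The text does not say which similarity measure is used to find the closest stored rating vector, so cosine similarity is an assumption here, as are the function name `calc_ratings`, its argument order, and the convention that 0 means "not rated". The matrix `Pmatrix` is assumed to hold one learned parameter vector per row of R, with the intercept in column 0.

```python
import numpy as np


def calc_ratings(u_vec, R, Pmatrix, M):
    """Fill the missing ratings of u_vec using the closest stored user's parameters.

    Returns the completed rating vector and the index of the matched row of R.
    """
    u_vec = np.asarray(u_vec, dtype=float)
    R = np.asarray(R, dtype=float)
    Pmatrix = np.asarray(Pmatrix, dtype=float)
    M = np.asarray(M, dtype=float)

    # Cosine similarity between u_vec and each stored rating vector
    # (assumed measure; rows with zero norm get similarity 0).
    norms = np.linalg.norm(R, axis=1) * np.linalg.norm(u_vec)
    sims = np.divide(R @ u_vec, norms, out=np.zeros(len(R)), where=norms > 0)
    nearest = int(np.argmax(sims))

    # Predict every rating with the matched user's parameters: theta . m_j.
    M1 = np.hstack([np.ones((len(M), 1)), M])
    predicted = M1 @ Pmatrix[nearest]

    # Keep the ratings the user already gave; fill in only the missing ones.
    return np.where(u_vec > 0, u_vec, predicted), nearest
```

Borrowing the parameters of the most similar stored user avoids rerunning gradient descent for every new rating vector, at the cost of predictions that are only as good as the match.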
