CBF methods

This class of methods relies on the data that describes the items, which is then used to extract the features of the users. In our MovieLens example, each movie j has a set of G binary fields indicating whether it belongs to each of the following genres: unknown, action, adventure, animation, children's, comedy, crime, documentary, drama, fantasy, film noir, horror, musical, mystery, romance, sci-fi, thriller, war, and western.

Based on these features (genres), each movie is described by a binary vector m_j with G dimensions (the number of movie genres), whose entries are 1 for all the genres contained in movie j and 0 otherwise. Given the dataframe dfout that stores the utility matrix (see the Utility matrix section earlier), these binary vectors m_j are collected from the MovieLens database into a dataframe using the following script:

The movies content matrix has been saved in the movies_content.csv file ready to be used by the CBF methods.
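As a rough, stdlib-only illustration of this step (not the author's script), the sketch below assumes the pipe-delimited u.item layout of the MovieLens 100k release, where each line holds the movie id, title, release date, video release date and IMDb URL, followed by the 19 genre flags in the order listed above. The function names parse_movie_genres and write_content_matrix, and the CSV column layout, are hypothetical.

```python
import csv
import io

# Order of the 19 binary genre flags (assumed to match the u.item layout).
GENRES = ["unknown", "action", "adventure", "animation", "children's",
          "comedy", "crime", "documentary", "drama", "fantasy", "film noir",
          "horror", "musical", "mystery", "romance", "sci-fi", "thriller",
          "war", "western"]


def parse_movie_genres(lines):
    """Return (titles, vectors) with one binary genre vector m_j per movie."""
    titles, vectors = [], []
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 5 + len(GENRES):
            continue  # skip empty or malformed lines
        titles.append(fields[1])
        # The last G fields are the 0/1 genre flags of movie j.
        vectors.append([int(flag) for flag in fields[-len(GENRES):]])
    return titles, vectors


def write_content_matrix(titles, vectors, out):
    """Write one CSV row per movie: its title followed by its G genre flags."""
    writer = csv.writer(out)
    writer.writerow(["title"] + GENRES)
    for title, vec in zip(titles, vectors):
        writer.writerow([title] + vec)


# Usage with one made-up line (a comedy-drama); a real run would read u.item.
flags = ["0"] * len(GENRES)
flags[GENRES.index("comedy")] = "1"
flags[GENRES.index("drama")] = "1"
line = "|".join(["1", "Movie A (1995)", "01-Jan-1995", "", "http://example.com/a"] + flags)
titles, vectors = parse_movie_genres([line])
buffer = io.StringIO()  # an open file such as movies_content.csv works too
write_content_matrix(titles, vectors, buffer)
```

Writing to a file-like object keeps the sketch testable; passing a handle opened with open("movies_content.csv", "w", newline="") would write the file to disk instead.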

The goal of a content-based recommendation system is to build each user's profile with the same fields as the items, indicating how much the user likes each genre. The drawback of this method is that a content description of the items is not always available, so the technique cannot always be employed in an e-commerce environment. The advantage is that the recommendations for a specific user are independent of the other users' ratings, so the method does not suffer from the cold start problem caused by an insufficient number of ratings for particular items. Two approaches are discussed here. The first simply builds each user's profile from the ratings the user gave, genre by genre, to the movies they have seen, and uses cosine similarity to find the movies closest to the user's preferences. The second uses a regularized linear regression model to learn the user's profile features from the ratings and the movie features, so that the ratings of the movies the user has not yet seen can be predicted from that profile.

Item features average method

The approach is simple, and we explain it using the features that describe the movies in the MovieLens example, as discussed previously. The objective of the method is to generate the movie genres' preferences vector v_i for each user i (of length G). This is done by first calculating the user's average rating r̄_i; each genre entry v_ig is then given by the sum, over the movies seen by user i (the set M_i) that contain genre g, of the ratings minus the average r̄_i, divided by the number of those movies containing genre g:

$$v_{ig} = \frac{\sum_{k \in M_i} (r_{ik} - \bar{r}_i)\, I_{kg}}{\sum_{k \in M_i} I_{kg}}$$

Here, r_ik is the rating given by user i to movie k, and I_kg is 1 if movie k contains genre g and 0 otherwise.
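A small worked example with made-up numbers shows how the formula behaves. Write v_ig for entry g of user i's preference vector, and suppose G = 3 and that user i has rated three movies: a (genre 1) with 5, b (genres 1 and 2) with 3, and c (genres 2 and 3) with 1. The average rating is r̄_i = (5 + 3 + 1)/3 = 3, so:

```latex
\begin{aligned}
v_{i1} &= \frac{(5-3)\cdot 1 + (3-3)\cdot 1}{1 + 1} = 1 \\
v_{i2} &= \frac{(3-3)\cdot 1 + (1-3)\cdot 1}{1 + 1} = -1 \\
v_{i3} &= \frac{(1-3)\cdot 1}{1} = -2
\end{aligned}
```

Genres whose movies the user rated above their own average receive positive weights, and genres rated below it receive negative ones.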

These preference vectors are then compared with the binary movie vectors m_j using cosine similarity, and the movies with the highest similarity values are recommended to user i. The implementation of the method is given by the following Python class:

The constructor stores the list of movie titles in Movieslist and the movie features in the Movies vector. The GetRecMovies function applies the preceding formula to generate the user's genre preferences vector, called features_u, and returns the items most similar to this vector.
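A minimal, stdlib-only sketch of what such a class could look like follows; it is not the author's code. Movieslist, Movies, GetRecMovies and features_u mirror the names in the text, while the class name CBFAverage, the nrecs parameter, the convention that an unseen movie has rating 0, and the choice to exclude already-seen movies from the recommendations are all assumptions.

```python
import math


class CBFAverage:
    """Hypothetical sketch of the item features average method."""

    def __init__(self, Movieslist, Movies):
        self.Movieslist = Movieslist  # movie titles, aligned with Movies
        self.Movies = Movies          # binary genre vectors m_j (length G)

    @staticmethod
    def _cosine(a, b):
        """Cosine similarity; 0.0 if either vector is all zeros."""
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    def GetRecMovies(self, u_ratings, nrecs=5):
        """u_ratings[j] is the user's rating of movie j, or 0 if unseen."""
        seen = [j for j, r in enumerate(u_ratings) if r > 0]
        if not seen:
            return []
        mean = sum(u_ratings[j] for j in seen) / len(seen)
        G = len(self.Movies[0])
        # Mean-centred average rating per genre over the movies seen (M_i).
        features_u = []
        for g in range(G):
            num = sum((u_ratings[j] - mean) * self.Movies[j][g] for j in seen)
            den = sum(self.Movies[j][g] for j in seen)
            features_u.append(num / den if den else 0.0)
        # Rank the unseen movies by cosine similarity to the profile.
        scores = [(self._cosine(features_u, self.Movies[j]), self.Movieslist[j])
                  for j in range(len(self.Movies)) if u_ratings[j] == 0]
        scores.sort(key=lambda s: s[0], reverse=True)
        return [title for _, title in scores[:nrecs]]
```

For example, with five movies whose genre vectors are [1,0,0], [1,1,0], [0,1,1], [1,0,0] and [0,0,1], and ratings [5, 3, 1, 0, 0], the profile is [1, -1, -2], so the fourth movie (genre 1 only) ranks above the fifth (genre 3 only).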

Regularized linear regression method

The method learns the movie preferences of the users as the parameters of a linear model, with θ_i ∈ ℝ^G for i = 1, ..., N, where N is the number of users and G is the number of features (movie genres) of each item. We add an intercept parameter θ_i0 to each user vector θ_i and a matching constant entry m_j0 = 1 to each movie vector m_j, so that θ_i, m_j ∈ ℝ^(G+1). To learn the parameter vectors θ_i, we solve the following regularized minimization problem for each user i:

$$\min_{\theta_i} \frac{1}{2} \sum_{j} I_{ij} \left( \theta_i^T m_j - r_{ij} \right)^2 + \frac{\lambda}{2} \sum_{k=1}^{G} \theta_{ik}^2$$

Here, I_ij is 1 if user i has watched (rated) movie j and 0 otherwise, and λ is the regularization parameter (see Chapter 3, Supervised Machine Learning).
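To make the objective concrete, the sketch below evaluates it for a single user: half the sum of squared prediction errors over the movies the user rated, plus λ/2 times the sum of the squared parameters, skipping the intercept. The function name user_cost and the convention that a rating of 0 means "not rated" (I_ij = 0) are assumptions for illustration.

```python
def user_cost(theta, movies, ratings, lam):
    """Regularized squared-error cost J(theta_i) for a single user i.

    theta:   [theta_i0, ..., theta_iG], theta_i0 being the intercept
    movies:  movie vectors m_j with the constant entry m_j0 = 1 prepended
    ratings: ratings[j] = r_ij, or 0 when user i has not rated movie j
    lam:     regularization parameter lambda
    """
    squared_error = 0.0
    for m_j, r_ij in zip(movies, ratings):
        if r_ij == 0:
            continue  # I_ij = 0: unrated movies do not contribute
        prediction = sum(t * m for t, m in zip(theta, m_j))
        squared_error += (prediction - r_ij) ** 2
    # The penalty sums over k = 1..G, so the intercept theta_i0 is not penalized.
    penalty = sum(t * t for t in theta[1:])
    return 0.5 * squared_error + 0.5 * lam * penalty
```

With theta = [1, 2, 0], movies [[1, 1, 0], [1, 0, 1]] and ratings [3, 0], the single rated movie is predicted exactly, so with lam = 1 only the penalty 0.5 · (2² + 0²) = 2 remains.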

The solution is given by applying gradient descent (see Chapter 3, Supervised Machine Learning). For each user i:

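$$\theta_{ik} := \theta_{ik} - \alpha \sum_{j} I_{ij} \left( \theta_i^T m_j - r_{ij} \right) m_{jk} \qquad (k = 0)$$

$$\theta_{ik} := \theta_{ik} - \alpha \left( \sum_{j} I_{ij} \left( \theta_i^T m_j - r_{ij} \right) m_{jk} + \lambda \theta_{ik} \right) \qquad (k > 0)$$

Here, α is the learning rate.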

Since we add one extra entry to both the movie and the user vectors, the intercept parameter (k=0) must be learned separately from the others: the intercept does not meaningfully contribute to overfitting, so it is not regularized. After the parameters θ_i are learned, the recommendation is performed by predicting each missing rating r_ij as r_ij = θ_i^T m_j.
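The update rules can be sketched as plain batch gradient descent for one user: at each step the prediction errors on the rated movies are accumulated into a gradient, the intercept (k = 0) is updated without the λ term, and every other parameter also receives the λθ_ik term. The function names train_user and predict, the default values of lam, alpha and iters, and the "0 means not rated" convention are assumptions, not the book's code.

```python
def train_user(movies, ratings, lam=0.1, alpha=0.01, iters=2000):
    """Batch gradient descent for one user's parameter vector theta_i.

    movies:  movie vectors with the constant entry m_j0 = 1 prepended
    ratings: ratings[j] = r_ij, or 0 if user i has not rated movie j
    """
    n = len(movies[0])  # G + 1 parameters, including the intercept
    theta = [0.0] * n
    rated = [(m_j, r) for m_j, r in zip(movies, ratings) if r > 0]
    for _ in range(iters):
        grad = [0.0] * n
        for m_j, r_ij in rated:
            err = sum(t * m for t, m in zip(theta, m_j)) - r_ij
            for k in range(n):
                grad[k] += err * m_j[k]
        # k = 0: intercept, no regularization; k > 0: add lambda * theta_ik.
        theta = [theta[k] - alpha * (grad[k] + (lam * theta[k] if k > 0 else 0.0))
                 for k in range(n)]
    return theta


def predict(theta, m_j):
    """Predicted rating r_ij = theta_i^T m_j."""
    return sum(t * m for t, m in zip(theta, m_j))
```

For instance, with one genre flag, movies [[1, 0], [1, 1], [1, 0], [1, 1]] and ratings [2, 4, 2, 4], lam = 0 recovers θ_i ≈ (2, 2), while lam = 0.1 shrinks the genre weight slightly to about 1.82 and raises the intercept to about 2.09.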

The method is implemented by the following code:

The constructor of the CBF_regression class simply performs gradient descent to find the parameters θ_i (stored in Pmatrix), while the CalcRatings function finds the most similar rating vector in the stored utility matrix R (needed when the user is not present in the utility matrix) and then uses the corresponding parameter vector to predict the missing ratings.
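The prediction step described for CalcRatings can be sketched as follows. The text does not say which similarity measure is used to find the closest stored rating vector, so cosine similarity is an assumption here, as are the function name calc_ratings, the list-based layout of R and Pmatrix, and the "0 means not rated" convention.

```python
import math


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def calc_ratings(u_ratings, R, Pmatrix, movies):
    """Fill the missing (0) entries of a user's rating vector.

    R:       stored utility matrix, one rating list per known user
    Pmatrix: learned parameter vectors theta_i, aligned with the rows of R
    movies:  movie vectors with the constant entry m_j0 = 1 prepended
    """
    # Pick the stored user whose rating vector is most similar to u_ratings.
    best = max(range(len(R)), key=lambda i: cosine(u_ratings, R[i]))
    theta = Pmatrix[best]
    # Keep known ratings; predict the missing ones as theta^T m_j.
    return [r if r > 0 else sum(t * m for t, m in zip(theta, m_j))
            for r, m_j in zip(u_ratings, movies)]
```

Borrowing the parameters of the nearest stored user avoids retraining the model for every new user, at the cost of giving that user someone else's profile.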
