Using the mlpack library

The mlpack library is a general-purpose machine learning library that provides many different algorithms, along with command-line tools for processing data and training these algorithms without explicit programming. It is built on top of the Armadillo linear algebra library, which it uses for mathematical calculations. None of the other libraries we've used in previous chapters provide implementations of collaborative filtering algorithms.

To load the MovieLens dataset, use the same loading helper function as in the previous section. After the data is loaded, convert it to a format suitable for an object of the mlpack::cf::CFType type. This type implements a collaborative filtering algorithm and can be configured with different matrix factorization approaches. It accepts either a dense or a sparse rating matrix. A dense matrix should have three rows: the first row holds user IDs, the second row holds item IDs, and the third row holds the ratings, so each column describes a single rating. This structure is called the coordinate list format. A sparse matrix should be a regular rating table; however, mlpack expects its rows to correspond to items and its columns to correspond to users, which is the transpose of the (user, item) layout used in the previous example. So, let's define the sparse matrix for ratings. It should have the arma::SpMat<DataType> type from the Armadillo library, as illustrated in the following code block:

arma::SpMat<DataType> ratings_matrix(movies.size(), ratings.size());
std::vector<std::string> movie_titles;
{
  // fill matrix: rows are items (movies), columns are users
  movie_titles.resize(movies.size());

  size_t user_idx = 0;
  for (auto& r : ratings) {
    for (auto& m : r.second) {
      auto mi = movies.find(m.first);
      auto movie_idx =
          static_cast<size_t>(std::distance(movies.begin(), mi));
      movie_titles[movie_idx] = mi->second;
      ratings_matrix(movie_idx, user_idx) =
          static_cast<DataType>(m.second);
    }
    ++user_idx;
  }
}
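For comparison, the same ratings could be passed to CFType as a dense coordinate list. The following standard-library sketch shows how such a three-row layout could be built; the Ratings alias, the movie_index map, and the ToCoordinateList function are hypothetical stand-ins for the containers produced by the loading helper. With Armadillo, the three vectors would become the three rows of an arma::mat.

```cpp
#include <array>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical input layout: user ID -> (movie ID -> rating).
using Ratings = std::map<int, std::map<int, double>>;

// A coordinate list with three rows: row 0 holds user indices, row 1 holds
// item indices, and row 2 holds ratings; column k describes one rating.
using CoordinateList = std::array<std::vector<double>, 3>;

CoordinateList ToCoordinateList(const Ratings& ratings,
                                const std::map<int, std::size_t>& movie_index) {
  CoordinateList coo;
  std::size_t user_idx = 0;  // users are numbered in map iteration order
  for (const auto& [user_id, user_ratings] : ratings) {
    (void)user_id;  // only the position of the user matters here
    for (const auto& [movie_id, rating] : user_ratings) {
      coo[0].push_back(static_cast<double>(user_idx));
      coo[1].push_back(static_cast<double>(movie_index.at(movie_id)));
      coo[2].push_back(rating);
    }
    ++user_idx;
  }
  return coo;
}
```

Each rating becomes one column, so the number of columns equals the total number of ratings rather than the number of users times the number of movies.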

Now, we can initialize the mlpack::cf::CFType class object. Its constructor takes the following parameters: the rating matrix, the matrix decomposition policy, the number of neighbors, the number of target factors (the rank of the factorization), the maximum number of iterations, and the minimum residue, below which the algorithm stops.
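The last two parameters act as stopping criteria for the iterative factorization. To show how such criteria typically work, the following self-contained sketch runs a hypothetical rank-1 non-negative alternating least squares loop: each half-step solves the least-squares problem for one factor with the other fixed and then clips negative values to zero. The residue definition used here (the relative change of the Frobenius norm of w hᵀ between iterations) is an assumption for illustration, not a statement of mlpack's exact formula, and all names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // row-major ratings table

// Factors of a rank-1 approximation V ~ w * h^T.
struct RankOneFactors {
  std::vector<double> w;       // one entry per row of V
  std::vector<double> h;       // one entry per column of V
  std::size_t iterations = 0;  // number of completed ALS iterations
};

// Sum of squares of a vector.
double SquaredNorm(const std::vector<double>& x) {
  double s = 0.0;
  for (double v : x) s += v * v;
  return s;
}

// Stops after max_iterations, or when the (assumed) residue falls below
// min_residue.
RankOneFactors FactorizeRankOne(const Matrix& v, std::size_t max_iterations,
                                double min_residue) {
  const std::size_t rows = v.size();
  const std::size_t cols = v.empty() ? 0 : v[0].size();
  RankOneFactors f{std::vector<double>(rows, 0.0),
                   std::vector<double>(cols, 1.0)};
  double old_norm = 0.0;
  while (f.iterations < max_iterations) {
    const double hh = SquaredNorm(f.h);
    if (hh == 0.0) break;  // h collapsed to zero; nothing left to solve
    for (std::size_t r = 0; r < rows; ++r) {  // w = max(0, V h / (h^T h))
      double s = 0.0;
      for (std::size_t c = 0; c < cols; ++c) s += v[r][c] * f.h[c];
      f.w[r] = std::max(0.0, s / hh);
    }
    const double ww = SquaredNorm(f.w);
    if (ww == 0.0) break;  // w collapsed to zero
    for (std::size_t c = 0; c < cols; ++c) {  // h = max(0, V^T w / (w^T w))
      double s = 0.0;
      for (std::size_t r = 0; r < rows; ++r) s += v[r][c] * f.w[r];
      f.h[c] = std::max(0.0, s / ww);
    }
    ++f.iterations;
    // ||w h^T||_F = ||w|| * ||h|| for a rank-1 product.
    const double norm = std::sqrt(SquaredNorm(f.w) * SquaredNorm(f.h));
    const double residue =
        old_norm > 0.0 ? std::fabs(old_norm - norm) / old_norm : 1.0;
    old_norm = norm;
    if (residue < min_residue) break;
  }
  return f;
}
```

For an exactly rank-1 input such as {{3, 1, 2}, {6, 2, 4}}, the first iteration already reproduces the matrix, so the second iteration changes nothing, the residue drops to zero, and the loop stops after two iterations even though max_iterations is larger.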

This object performs the nearest neighbor search only on the H matrix. This avoids calculating the full rating matrix, based on the observation that if the rating matrix is X = W H, then the following applies:

distance(X.col(i), X.col(j)) = distance(W * H.col(i), W * H.col(j))

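For the Euclidean distance, this expression can be seen as a nearest neighbor search on the H matrix with a Mahalanobis distance, because ||W h_i - W h_j||^2 = (h_i - h_j)^T W^T W (h_i - h_j), where h_i and h_j are the i-th and j-th columns of H; the matrix W^T W acts as the weighting matrix of the Mahalanobis distance. The following code block configures and trains the CFType object: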

// factorization rank
size_t n_factors = 100;
// number of neighbors used to compute similarity
size_t neighborhood = 50;

mlpack::cf::NMFPolicy decomposition_policy;

// stopping criteria
size_t max_iterations = 20;
double min_residue = 1e-3;

// the constructor also trains the model
mlpack::cf::CFType<mlpack::cf::NMFPolicy> cf(ratings_matrix,
                                              decomposition_policy,
                                              neighborhood,
                                              n_factors,
                                              max_iterations,
                                              min_residue);
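The distance identity behind this nearest neighbor search can be checked numerically. The following standard-library sketch compares the squared Euclidean distance between two columns of X = W H with the squared Mahalanobis distance between the corresponding columns of H, using the weighting matrix M = W^T W. The Matrix and Vector aliases and the function names are hypothetical and are not part of mlpack.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // row-major, rows x cols
using Vector = std::vector<double>;

// y = A * x
Vector Multiply(const Matrix& a, const Vector& x) {
  Vector y(a.size(), 0.0);
  for (std::size_t r = 0; r < a.size(); ++r)
    for (std::size_t c = 0; c < x.size(); ++c) y[r] += a[r][c] * x[c];
  return y;
}

// Squared Euclidean distance between the columns W*h_i and W*h_j of X = W*H.
double SquaredDistanceInRatingSpace(const Matrix& w, const Vector& hi,
                                    const Vector& hj) {
  const Vector xi = Multiply(w, hi), xj = Multiply(w, hj);
  double d = 0.0;
  for (std::size_t r = 0; r < xi.size(); ++r)
    d += (xi[r] - xj[r]) * (xi[r] - xj[r]);
  return d;
}

// Squared Mahalanobis distance (h_i - h_j)^T M (h_i - h_j) with M = W^T W,
// evaluated in the low-dimensional factor space only.
double SquaredMahalanobisInFactorSpace(const Matrix& w, const Vector& hi,
                                       const Vector& hj) {
  const std::size_t k = hi.size();
  Matrix m(k, Vector(k, 0.0));  // M = W^T W, a k x k matrix
  for (std::size_t a = 0; a < k; ++a)
    for (std::size_t b = 0; b < k; ++b)
      for (std::size_t r = 0; r < w.size(); ++r) m[a][b] += w[r][a] * w[r][b];
  Vector diff(k);
  for (std::size_t a = 0; a < k; ++a) diff[a] = hi[a] - hj[a];
  const Vector md = Multiply(m, diff);
  double d = 0.0;
  for (std::size_t a = 0; a < k; ++a) d += diff[a] * md[a];
  return d;
}
```

For W = {{1, 2}, {0, 1}, {3, 1}}, h_i = (1, 0), and h_j = (0, 1), both functions give 6. This sketch recomputes M on every call for clarity; in practice M would be computed once, since it is only k x k for a rank-k factorization.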

Notice that an object of the mlpack::cf::NMFPolicy type was used as the decomposition policy. This policy implements non-negative matrix factorization with the alternating least squares (ALS) approach. The mlpack library contains several other decomposition algorithms; for example, batch SVD decomposition is implemented in the mlpack::cf::BatchSVDPolicy type. The constructor of the CFType object also performs the complete training, so after it returns, we can use the object to get recommendations. Recommendations can be retrieved with the GetRecommendations method. This method takes the number of recommendations you want, the output matrix for the recommendations, and the list of IDs of the users you want recommendations for, as shown in the following code block:

arma::Mat<size_t> recommendations;
// Get 5 recommendations for the specified users
// (user indices in the rating matrix).
arma::Col<size_t> users = {1, 2, 3};

cf.GetRecommendations(5, recommendations, users);

// each column of recommendations holds the item IDs for one user
for (size_t u = 0; u < recommendations.n_cols; ++u) {
  std::cout << "User " << users(u) << " recommendations are: ";
  for (size_t i = 0; i < recommendations.n_rows; ++i) {
    std::cout << movie_titles[recommendations(i, u)] << ";";
  }
  std::cout << std::endl;
}
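Conceptually, the final step of producing recommendations ranks a user's predicted ratings and skips items that the user has already rated. The following is a simplified, hypothetical sketch of only that ranking step, written with the standard library; it is not mlpack's implementation, which, as described above, predicts ratings from neighboring users found in the factorized space. The TopNUnrated name and its parameters are assumptions made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns the indices of the n highest predicted ratings among items the user
// has not rated yet, best first. `predicted` holds one predicted rating per
// item; `already_rated[i]` marks items the user has already rated.
std::vector<std::size_t> TopNUnrated(const std::vector<double>& predicted,
                                     const std::vector<bool>& already_rated,
                                     std::size_t n) {
  std::vector<std::size_t> candidates;
  for (std::size_t i = 0; i < predicted.size(); ++i)
    if (!already_rated[i]) candidates.push_back(i);
  n = std::min(n, candidates.size());
  // Partially sort so the first n candidates are the best-scoring ones;
  // ties are broken by the smaller item index.
  std::partial_sort(candidates.begin(),
                    candidates.begin() + static_cast<std::ptrdiff_t>(n),
                    candidates.end(), [&](std::size_t a, std::size_t b) {
                      if (predicted[a] != predicted[b])
                        return predicted[a] > predicted[b];
                      return a < b;
                    });
  candidates.resize(n);
  return candidates;
}
```

For predicted ratings {4.5, 3.0, 5.0, 4.9, 1.0} where item 2 is already rated, the two best recommendations are items 3 and 0; the highest-scoring item is skipped because the user has already seen it.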

Notice that the GetRecommendations method returns item IDs as its output, which is why we can use them directly to index movie_titles. As we can see, using this library to implement a recommender system is much easier than writing one from scratch. The mlpack library also offers many more configuration options for building such systems; for example, we can configure the neighbor search policy and the distance measure to use. Choosing these options to fit your particular task can significantly improve the quality of the resulting system.
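The idea of a configurable distance measure can be illustrated with compile-time policies. The sketch below is loosely modeled on the policy-based design that mlpack uses for its metrics, but the EuclideanDistance and ManhattanDistance structs and the NearestNeighbor function are hypothetical names written only for this example, and the search is a plain brute-force scan.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical distance policies exposing a static Evaluate() function.
struct EuclideanDistance {
  static double Evaluate(const std::vector<double>& a,
                         const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(s);
  }
};

struct ManhattanDistance {
  static double Evaluate(const std::vector<double>& a,
                         const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += std::fabs(a[i] - b[i]);
    return s;
  }
};

// Brute-force search for the point nearest to `query`, excluding index `skip`
// (typically the query user itself). The metric is chosen at compile time.
// Returns points.size() if no candidate exists.
template <typename Metric>
std::size_t NearestNeighbor(const std::vector<std::vector<double>>& points,
                            const std::vector<double>& query, std::size_t skip) {
  std::size_t best = points.size();
  double best_distance = 0.0;
  for (std::size_t i = 0; i < points.size(); ++i) {
    if (i == skip) continue;
    const double d = Metric::Evaluate(points[i], query);
    if (best == points.size() || d < best_distance) {
      best = i;
      best_distance = d;
    }
  }
  return best;
}
```

For the query (0, 0) and the candidates (3, 0) and (2, 2), the Euclidean policy selects (2, 2) (distance about 2.83), whereas the Manhattan policy selects (3, 0) (distance 3 versus 4). The choice of metric alone can therefore change a user's neighborhood and, with it, the recommendations.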
