Dividing the DataFrame

Another prerequisite is to divide the dataset into training and test subsets. These subsets will be used in later sections to build our recommendation systems and to measure their performance. The evaluationScheme() function from the recommenderlab library can be used to perform this split, and it accepts a number of user-specified parameters. In the following code, the Jester5k realRatingMatrix is split 80/20 into training and test users (method="split", train=0.8). The given=20 argument specifies that, for each test user, 20 randomly chosen ratings are given to the recommender as known input, while that user's remaining ratings are withheld and used to evaluate the predictions. Furthermore, goodRating=0 specifies that any rating greater than or equal to 0 is to be considered a positive rating, which is the midpoint of the predefined [-10, 10] rating scale. The Jester5k dataset can be divided into the training and test datasets with the following code:

# split the data into the training and the test set
Jester5k_es <- evaluationScheme(Jester5k, method="split", train=0.8, given=20, goodRating=0)
# print the evaluation scheme to verify that the train/test split was created
print(Jester5k_es)

This will result in the following output:

Evaluation scheme with 20 items given
Method: ‘split’ with 1 run(s).
Training set proportion: 0.800
Good ratings: >=0.000000
Data set: 5000 x 100 rating matrix of class ‘realRatingMatrix’ with 362106 ratings.

From the output of the evaluationScheme() function, we can observe that the function yielded a single R object containing both the training and test subsets. This object will be used to define and evaluate a variety of recommender models.
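
To see what the evaluation scheme object actually holds, its subsets can be pulled out with recommenderlab's getData() function. The following is a minimal sketch rather than part of the original walkthrough: the variable names (es, train_set, known_set, unknown_set) and the seed value are assumptions chosen for illustration, and the seed only serves to make the random split repeatable.

```r
library(recommenderlab)
data(Jester5k)

# assumption: an arbitrary seed so that the random split is reproducible
set.seed(1)
es <- evaluationScheme(Jester5k, method = "split", train = 0.8,
                       given = 20, goodRating = 0)

# ratings of the training users, used to fit a recommender
train_set <- getData(es, "train")
# the 20 ratings per test user that are given to the recommender
known_set <- getData(es, "known")
# the withheld ratings of the test users, used for evaluation
unknown_set <- getData(es, "unknown")

# users x items in each subset
dim(train_set)
dim(known_set)
dim(unknown_set)

# number of given ratings per test user
summary(rowCounts(known_set))
```

With an 80/20 split of 5,000 users, the training subset should hold 4,000 users and the known and unknown subsets 1,000 users each, with every row of the known subset containing exactly 20 ratings.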
