Although this method is not often used in commercial recommendation systems, association rule learning is worth knowing for historical reasons, and it can be applied to a wide range of real-world problems. The main idea is to find relationships among items based on a statistical measure of how often the items occur in a database of transactions T (for example, a transaction could be the movies seen by a user i or the products bought by i). More formally, a rule has the form {item1,item2} => {item3}, that is, the presence of one set of items ({item1,item2}) implies the presence of another set ({item3}). Two quantities, support and confidence, are used to characterize each X=>Y rule.
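In the standard textbook notation, support and confidence can be written as follows. This is a sketch of the usual definitions, assuming supp(X) denotes the fraction of transactions in T that contain every item of X; the symbols supp and conf are introduced here for illustration.

```latex
\mathrm{supp}(X) = \frac{\left|\{\, t \in T : X \subseteq t \,\}\right|}{|T|},
\qquad
\mathrm{supp}(X \Rightarrow Y) = \mathrm{supp}(X \cup Y),
\qquad
\mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)}
```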
Support represents the frequency of a rule in the transaction database, while confidence indicates the probability that set Y occurs given that set X is present. In practice, the minimum support value is chosen to limit the number of rules mined from the database (the higher the support threshold, the fewer rules satisfy it), while the confidence can be thought of as a similarity measure between sets X and Y. In the case of the movie recommendation system, the transaction database can be generated from the utility matrix R by considering the movies each user likes, and we look for rules in which the sets X and Y each contain only one item (movie). These rules are collected in a matrix, ass_matrix, in which each entry ass_matrix_ij represents the confidence of the rule i => j. The recommendations for a given user are obtained by simply multiplying the ass_matrix by the user's ratings vector u_vec and sorting the resulting values from largest (the most recommended movie) to smallest (the least recommended). Therefore, this method does not predict ratings but produces a ranked list of movie recommendations; however, it is fast and it also works well with a sparse utility matrix. Note that to find all the possible combinations of items forming sets X and Y as fast as possible, two algorithms have been developed in the literature: Apriori and FP-growth (not discussed here since we only require rules with one item per set X and Y).
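The single-item case described above can be worked through on a toy example. The following is a minimal sketch, not the book's listing: the utility matrix R, the thresholds, and the choice to treat a rating greater than or equal to like_threshold as a "like" are assumptions made for illustration, and the score for movie j is taken to be the sum over i of u_vec[i] * ass_matrix[i, j].

```python
import numpy as np

# Hypothetical utility matrix: rows are users, columns are movies, 0 = not rated.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 0, 0],
    [0, 4, 5, 0],
    [5, 0, 4, 2],
])
like_threshold = 3      # assumption: rating >= 3 counts as a "like"
min_support = 0.1       # assumed thresholds, mirroring the defaults in the text
min_confidence = 0.1

# One transaction per user: liked[u, i] is 1.0 if user u likes movie i.
liked = (R >= like_threshold).astype(float)
n_users = liked.shape[0]

# counts[i, j] = number of users who like both i and j (diagonal: users who like i).
counts = liked.T @ liked
support = counts / n_users
item_counts = np.diag(counts)[:, None]

# confidence[i, j] = counts[i, j] / counts[i, i]; rows of never-liked movies stay 0.
confidence = np.divide(counts, item_counts,
                       out=np.zeros_like(counts), where=item_counts > 0)
np.fill_diagonal(confidence, 0.0)   # drop the trivial rules i => i

# Keep only the rules that pass both thresholds.
keep = (support >= min_support) & (confidence >= min_confidence)
ass_matrix = np.where(keep, confidence, 0.0)

# Rank unseen movies for a user who has rated only movie 0.
u_vec = np.array([5.0, 0.0, 0.0, 0.0])
scores = u_vec @ ass_matrix
scores[u_vec > 0] = -np.inf         # never recommend an already-rated movie
ranked = [int(j) for j in np.argsort(-scores) if scores[j] > 0]
print(ranked)
```

Computing all pairwise co-occurrence counts with one matrix product is only practical because each rule involves a single item on each side; larger itemsets are where Apriori or FP-growth become necessary.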
The class that implements the method is as follows:
The class constructor takes as input parameters the utility matrix Umatrix, the movie titles list Movieslist, the support and confidence thresholds min_support and min_confidence (default 0.1), and likethreshold, which is the minimum rating value for a movie to be included in a transaction (default 3). The function combine_lists finds all the possible rules, while filterSet reduces the rules to the subset that satisfies the minimum support threshold. calc_confidence_matrix fills the ass_matrix with the confidence values that satisfy the minimum threshold (entries are otherwise set to 0 by default), and GetRecItems returns the list of recommended movies given the user ratings u_vec.
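A class with the structure described above might look like the following. This is an illustrative sketch under stated assumptions, not the original listing: the class name AssociationRules, the method signatures, the internal data layout, the >= comparison against likethreshold, and the example data are all hypothetical; only the parameter and method names come from the text.

```python
from itertools import permutations

import numpy as np


class AssociationRules:
    """Single-item association rules i => j (illustrative sketch only)."""

    def __init__(self, Umatrix, Movieslist, min_support=0.1,
                 min_confidence=0.1, likethreshold=3):
        self.Movieslist = list(Movieslist)
        self.min_support = min_support
        self.min_confidence = min_confidence
        Umatrix = np.asarray(Umatrix)
        n_items = Umatrix.shape[1]
        # One transaction per user: indices of movies rated >= likethreshold
        # (the >= comparison is an assumption).
        self.transactions = [set(np.flatnonzero(row >= likethreshold).tolist())
                             for row in Umatrix]
        # Support of single movies, then of candidate rules built from them.
        item_support = self.filterSet([(i,) for i in range(n_items)])
        frequent = [items[0] for items in item_support]
        rule_support = self.filterSet(self.combine_lists(frequent))
        self.ass_matrix = np.zeros((n_items, n_items))
        self.calc_confidence_matrix(item_support, rule_support)

    def combine_lists(self, items):
        # All candidate rules i => j with i != j, as ordered pairs (i, j).
        return list(permutations(items, 2))

    def filterSet(self, candidates):
        # Map each candidate to its support, keeping those above the threshold.
        n = len(self.transactions)
        kept = {}
        for cand in candidates:
            count = sum(1 for t in self.transactions if t.issuperset(cand))
            if count and count / n >= self.min_support:
                kept[cand] = count / n
        return kept

    def calc_confidence_matrix(self, item_support, rule_support):
        # conf(i => j) = supp({i, j}) / supp({i}); entries below threshold stay 0.
        for (i, j), supp in rule_support.items():
            conf = supp / item_support[(i,)]
            if conf >= self.min_confidence:
                self.ass_matrix[i, j] = conf

    def GetRecItems(self, u_vec, n_recs=None):
        # Score each movie and return titles of unseen movies, best first.
        u_vec = np.asarray(u_vec, dtype=float)
        scores = u_vec @ self.ass_matrix
        order = np.argsort(-scores, kind="stable")
        recs = [self.Movieslist[j] for j in order
                if u_vec[j] == 0 and scores[j] > 0]
        return recs[:n_recs]


# Hypothetical data: 4 users, 4 movies, 0 = not rated.
R = np.array([[5, 4, 0, 1],
              [4, 5, 0, 0],
              [0, 4, 5, 0],
              [5, 0, 4, 2]])
titles = ["Movie A", "Movie B", "Movie C", "Movie D"]
model = AssociationRules(R, titles)
recs = model.GetRecItems([5, 0, 0, 0])
print(recs)
```

In this sketch filterSet is reused for both single movies and candidate pairs, so rules are only generated from movies that already meet the support threshold, which is the same pruning idea Apriori applies to larger itemsets.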