We will now study the Slope One algorithm for collaborative filtering. Also, we will demonstrate how we can implement it concisely in Clojure.
The Slope One algorithm is one of the simplest forms of item-based collaborative filtering, which is essentially a collaborative filtering technique in which the users explicitly rate each item they like (for more information, refer to Slope One Predictors for Online Rating-Based Collaborative Filtering). Generally, item-based collaborative filtering techniques will use the user's ratings and past behavior of users to estimate a simple regression model for each user. Thus, we estimate a function for all users in the system.
Slope One algorithm uses a simpler predictor to model the regression pattern of a user's behavior, and is thus less computationally expensive. The parameter can be estimated by calculating the differences in user ratings between two items. Since the definition of the Slope One algorithm is simple, it can be implemented easily and efficiently. Interestingly, this algorithm is less susceptible to overfitting than other collaborative filtering techniques.
Consider a simple recommendation system with two items and two users. We can visualize this sample data with the following table:
In the data shown in the preceding table, the difference in the ratings of Item A and Item B can be found by using the ratings provided by User 1. This difference is found to be . Thus, we can add this difference to the rating of Item A by User 2 to predict his/her rating of Item B, which is equal to .
Let's extend the preceding example to three items and three users. The table for this data can be visualized as follows:
In this example, the average difference in ratings between Item A and Item B for User 2 (-1) and User 1 (+2) is . Hence, on average, Item A is rated better than Item B by . Similarly, the average rating difference between Item A and Item C is . We can predict the rating of User 3 for Item A using his/her rating for Item B and the average difference of ratings for Item A and Item B. This value comes out to .
We will now describe a concise implementation the Slope One algorithm in Clojure. First off, we need to define our sample data. This can be done using a nested map, as shown in the following code:
(def ? nil) (def data {"User 1" {"Item A" 5 "Item B" 3 "Item C" 2 "Item D" ?} "User 2" {"Item A" 3 "Item B" 4 "Item C" ? "Item D" 4} "User 3" {"Item A" ? "Item B" 2 "Item C" 5 "Item D" 3} "User 4" {"Item A" 4 "Item B" ? "Item C" 3 "Item D" ?}})
In the preceding code shown, we bind the value nil
to the ?
symbol, and use it to define a nested map data
, in which each key represents a user and its value represents a map of the user's ratings with the item names as the keys. We will define some of the following utility methods to help us manipulate the nested map represented by data
:
(defn flatten-to-vec [coll] (reduce #(apply conj %1 %2) [] coll))
The flatten-to-vec
function defined in the preceding code simply converts a map to a flat vector using the reduce
and conj
functions. We can also define flatten-to-vec
, by using a functional composition of the standard vec
, flatten
, and seq
functions, as (def flatten-to-vec (comp vec flatten seq))
. Since we are dealing with maps, we can define some of the following functions to map any function to the values of these maps:
(defn map-vals [f m] (persistent! (reduce (fn [m [k v]] (assoc! m k (f k v))) (transient m) m))) (defn map-nested-vals [f m] (map-vals (fn [k1 inner-map] (map-vals (fn [k2 val] (f [k1 k2] val)) inner-map)) m))
The map-vals
function defined in the preceding code can be used to mutate the values of a given map. This function uses the assoc!
function to replace the value stored by a given key in the map, and it uses the reduce
function to compose and apply the assoc!
function to all the key-value pairs in the map. In Clojure, most collections, including maps, are persistent and immutable. Note the use of the transient
function to convert a persistent and immutable map into a mutable one and the use of the persistent!
function that converts a transient mutable collection to a persistent one. By isolating mutation, the performance of this function is improved while retaining the guarantee of immutability for the code that uses this function. The map-nested-vals
function defined in the preceding code simply applies the map-vals
function to the second level values in a nested map.
We can examine the behavior of the map-vals
and map-nested-vals
functions in the REPL as follows:
user> (map-vals #(inc %2) {:foo 1 :bar 2}) {:foo 2, :bar 3} user> (map-nested-vals (fn [keys v] (inc v)) {:foo {:bar 2}}) {:foo {:bar 3}}
As shown the preceding REPL output, the inc
function is applied to the values of maps {:foo 1 :bar 2}
and {:foo {:bar 3}}
. We now define a function to produce a trained model from the sample data by using the Slope One algorithm, as follows:
(defn train [data] (let [diff-map (for [[user preferences] data] (for [[i u-i] preferences [j u-j] preferences :when (and (not= i j) u-i u-j)] [[i j] (- u-i u-j)])) diff-vec (flatten-to-vec diff-map) update-fn (fn [[freqs-so-far diffs-so-far] [item-pair diff]] [(update-in freqs-so-far item-pair (fnil inc 0)) (update-in diffs-so-far item-pair (fnil + 0) diff)]) [freqs total-diffs] (reduce update-fn [{} {}] diff-vec) differences (map-nested-vals (fn [item-pair diff] (/ diff (get-in freqs item-pair))) total-diffs)] {:freqs freqs :differences differences}))
The train function defined in the preceding code first finds the differences between the ratings of all the items in the model using the for
macro, and then it adds the frequencies of ratings of the items and the differences in their ratings using the update-fn
closure.
The update-fn
function uses the update-in
function to replace the value of a key in a map. Note the use of the fnil
function, which essentially returns a function that checks for the value nil
and replaces it with the second argument. This is used to treat the values represented by the ?
symbol that has the value nil
in the nested map data. Lastly, the train
function applies the map-nested-vals
and get-in
functions to the map of rating differences returned in the previous step. Finally, it returns a map with the keys :freqs
and :differences
, which contain maps that represent the frequencies of items and differences in ratings with respect to other items in the model respectively. We can now use this trained model to predict the ratings of the given items by various users. To do this, we will implement a function in the following code that uses the value returned by the train
function defined in the preceding code:
(defn predict [{:keys [differences freqs] :as model} preferences item] (let [get-rating-fn (fn [[num-acc denom-acc] [i rating]] (let [freqs-ji (get-in freqs [item i])] [(+ num-acc (* (+ (get-in differences [item i]) rating) freqs-ji)) (+ denom-acc freqs-ji)]))] (->> preferences (filter #(not= (first %) item)) (reduce get-rating-fn [0 0]) (apply /))))
The predict
function defined in the preceding code uses the get-in
function to retrieve the sum of frequencies and differences of each item in the maps returned by the train
function. This function then averages these rating differences by using a composition of the reduce
and /
(division) functions. The behavior of the predict
function can be examined in the REPL, as shown in the following code:
user> (def trained-model (train data)) #'user/trained-model user> (predict trained-model {"Item A" 2} "Item B") 3/2
As shown in the preceding REPL output, the predict
function used the value returned by the train
function to predict the rating of Item B
by a user who has given Item A
a rating of 2
. The predict
function estimates the rating of Item B
as 3/2
. We can now implement a function in the following code that wraps around the predict
function to find the ratings of all items in our model:
(defn mapmap ([vf s] (mapmap identity vf s)) ([kf vf s] (zipmap (map kf s) (map vf s)))) (defn known-items [model] (-> model :differences keys)) (defn predictions ([model preferences] (predictions model preferences (filter #(not (contains? preferences %)) (known-items model)))) ([model preferences items] (mapmap (partial predict model preferences) items)))
The mapmap
function defined in the preceding code simply applied two functions to a given sequence and returns a map with keys that are created using the first function kf
and with a value generated by the second function vf
. If only a single function is passed to the mapmap
function, it uses the identity
function to generate keys in the map returned by it. The known-items
function defined in the preceding code will determine all items in a model using the keys function on the map represented by the :differences
key in the value returned by the train
function. Finally, the predictions
function uses the value returned by the train
and known-items
functions to determine all items in the model and then predict all unrated items for a particular user. The function also takes an optional third argument, which is a vector of the item names whose ratings are to be predicted, in order to return the predictions of all items with names present in the vector items
.
Now, we can examine the behavior of the preceding function in the REPL, as follows:
user> (known-items trained-model) ("Item D" "Item C" "Item B" "Item A")
As shown in the preceding output, the known-items
function returns the names of all items in the model. We can now try out the predictions function, as follows:
user> (predictions trained-model {"Item A" 2} ["Item C" "Item D"]) {"Item D" 3, "Item C" 0} user> (predictions trained-model {"Item A" 2}) {"Item B" 3/2, "Item C" 0, "Item D" 3}
Note that when we skip the last optional parameter of the predictions
function, the map returned by this function will contain all items that are not previously rated by a particular user. This can be asserted in the REPL by using the keys
function, as follows:
user> (keys (predictions trained-model {"Item A" 2})) ("Item B" "Item C" "Item D")
To conclude, we have demonstrated how we can implement the Slope One algorithm using nested maps and standard Clojure functions.
18.118.28.179