Using the Slope One algorithm

We will now study the Slope One algorithm for collaborative filtering. Also, we will demonstrate how we can implement it concisely in Clojure.

The Slope One algorithm is one of the simplest forms of item-based collaborative filtering, which is essentially a collaborative filtering technique in which the users explicitly rate each item they like (for more information, refer to Slope One Predictors for Online Rating-Based Collaborative Filtering). Generally, item-based collaborative filtering techniques will use the user's ratings and past behavior of users to estimate a simple regression model for each user. Thus, we estimate a function Using the Slope One algorithm for all users Using the Slope One algorithm in the system.

Slope One algorithm uses a simpler predictor Using the Slope One algorithm to model the regression pattern of a user's behavior, and is thus less computationally expensive. The parameter Using the Slope One algorithm can be estimated by calculating the differences in user ratings between two items. Since the definition of the Slope One algorithm is simple, it can be implemented easily and efficiently. Interestingly, this algorithm is less susceptible to overfitting than other collaborative filtering techniques.

Consider a simple recommendation system with two items and two users. We can visualize this sample data with the following table:

Using the Slope One algorithm

In the data shown in the preceding table, the difference in the ratings of Item A and Item B can be found by using the ratings provided by User 1. This difference is found to be Using the Slope One algorithm. Thus, we can add this difference to the rating of Item A by User 2 to predict his/her rating of Item B, which is equal to Using the Slope One algorithm.

Let's extend the preceding example to three items and three users. The table for this data can be visualized as follows:

Using the Slope One algorithm

In this example, the average difference in ratings between Item A and Item B for User 2 (-1) and User 1 (+2) is Using the Slope One algorithm. Hence, on average, Item A is rated better than Item B by Using the Slope One algorithm. Similarly, the average rating difference between Item A and Item C is Using the Slope One algorithm. We can predict the rating of User 3 for Item A using his/her rating for Item B and the average difference of ratings for Item A and Item B. This value comes out to Using the Slope One algorithm.

We will now describe a concise implementation the Slope One algorithm in Clojure. First off, we need to define our sample data. This can be done using a nested map, as shown in the following code:

(def ? nil)
(def data
  {"User 1" {"Item A" 5 "Item B" 3 "Item C" 2 "Item D" ?}
   "User 2" {"Item A" 3 "Item B" 4 "Item C" ? "Item D" 4}
   "User 3" {"Item A" ? "Item B" 2 "Item C" 5 "Item D" 3}
   "User 4" {"Item A" 4 "Item B" ? "Item C" 3 "Item D" ?}})

In the preceding code shown, we bind the value nil to the ? symbol, and use it to define a nested map data, in which each key represents a user and its value represents a map of the user's ratings with the item names as the keys. We will define some of the following utility methods to help us manipulate the nested map represented by data:

(defn flatten-to-vec [coll]
  (reduce #(apply conj %1 %2)
          []
          coll))

The flatten-to-vec function defined in the preceding code simply converts a map to a flat vector using the reduce and conj functions. We can also define flatten-to-vec, by using a functional composition of the standard vec, flatten, and seq functions, as (def flatten-to-vec (comp vec flatten seq)). Since we are dealing with maps, we can define some of the following functions to map any function to the values of these maps:

(defn map-vals [f m]
  (persistent!
    (reduce (fn [m [k v]]
              (assoc! m k (f k v)))
            (transient m) m)))

(defn map-nested-vals [f m]
  (map-vals
   (fn [k1 inner-map]
     (map-vals
      (fn [k2 val] (f [k1 k2] val)) inner-map)) m))

The map-vals function defined in the preceding code can be used to mutate the values of a given map. This function uses the assoc! function to replace the value stored by a given key in the map, and it uses the reduce function to compose and apply the assoc! function to all the key-value pairs in the map. In Clojure, most collections, including maps, are persistent and immutable. Note the use of the transient function to convert a persistent and immutable map into a mutable one and the use of the persistent! function that converts a transient mutable collection to a persistent one. By isolating mutation, the performance of this function is improved while retaining the guarantee of immutability for the code that uses this function. The map-nested-vals function defined in the preceding code simply applies the map-vals function to the second level values in a nested map.

We can examine the behavior of the map-vals and map-nested-vals functions in the REPL as follows:

user> (map-vals #(inc %2) {:foo 1 :bar 2})
{:foo 2, :bar 3}

user> (map-nested-vals (fn [keys v] (inc v)) {:foo {:bar 2}})
{:foo {:bar 3}}

As shown the preceding REPL output, the inc function is applied to the values of maps {:foo 1 :bar 2} and {:foo {:bar 3}}. We now define a function to produce a trained model from the sample data by using the Slope One algorithm, as follows:

(defn train [data]
  (let [diff-map      (for [[user preferences] data]
                        (for [[i u-i] preferences
                              [j u-j] preferences
                              :when (and (not= i j)
                                         u-i u-j)]
                          [[i j] (- u-i u-j)]))
        diff-vec      (flatten-to-vec diff-map)
        update-fn     (fn [[freqs-so-far diffs-so-far]
                           [item-pair diff]]
                        [(update-in freqs-so-far
                                    item-pair (fnil inc 0))
                         (update-in diffs-so-far
                                    item-pair (fnil + 0) diff)])
        [freqs
         total-diffs] (reduce update-fn
                              [{} {}] diff-vec)
        differences   (map-nested-vals
                       (fn [item-pair diff]
                         (/ diff (get-in freqs item-pair)))
                       total-diffs)]
    {:freqs freqs
     :differences differences}))

The train function defined in the preceding code first finds the differences between the ratings of all the items in the model using the for macro, and then it adds the frequencies of ratings of the items and the differences in their ratings using the update-fn closure.

Note

The main difference between a function and a macro is that a macro doesn't evaluate its parameters while being executed. Also, macros are resolved and expanded at compile time and functions are called at runtime.

The update-fn function uses the update-in function to replace the value of a key in a map. Note the use of the fnil function, which essentially returns a function that checks for the value nil and replaces it with the second argument. This is used to treat the values represented by the ? symbol that has the value nil in the nested map data. Lastly, the train function applies the map-nested-vals and get-in functions to the map of rating differences returned in the previous step. Finally, it returns a map with the keys :freqs and :differences, which contain maps that represent the frequencies of items and differences in ratings with respect to other items in the model respectively. We can now use this trained model to predict the ratings of the given items by various users. To do this, we will implement a function in the following code that uses the value returned by the train function defined in the preceding code:

(defn predict [{:keys [differences freqs]
                :as model}
               preferences
               item]
  (let [get-rating-fn (fn [[num-acc denom-acc]
                           [i rating]]
                        (let [freqs-ji (get-in freqs [item i])]
                          [(+ num-acc
                              (* (+ (get-in differences [item i])
                                    rating)
                                 freqs-ji))
                           (+ denom-acc freqs-ji)]))]
    (->> preferences
         (filter #(not= (first %) item))
         (reduce get-rating-fn [0 0])
         (apply /))))

The predict function defined in the preceding code uses the get-in function to retrieve the sum of frequencies and differences of each item in the maps returned by the train function. This function then averages these rating differences by using a composition of the reduce and / (division) functions. The behavior of the predict function can be examined in the REPL, as shown in the following code:

user> (def trained-model (train data))
#'user/trained-model
user> (predict trained-model {"Item A" 2} "Item B")
3/2

As shown in the preceding REPL output, the predict function used the value returned by the train function to predict the rating of Item B by a user who has given Item A a rating of 2. The predict function estimates the rating of Item B as 3/2. We can now implement a function in the following code that wraps around the predict function to find the ratings of all items in our model:

(defn mapmap
  ([vf s]
     (mapmap identity vf s))
  ([kf vf s]
     (zipmap (map kf s)
             (map vf s))))

(defn known-items [model]
  (-> model :differences keys))

(defn predictions
  ([model preferences]
     (predictions
      model
      preferences
      (filter #(not (contains? preferences %))
              (known-items model))))
  ([model preferences items]
     (mapmap (partial predict model preferences)
             items)))

The mapmap function defined in the preceding code simply applied two functions to a given sequence and returns a map with keys that are created using the first function kf and with a value generated by the second function vf. If only a single function is passed to the mapmap function, it uses the identity function to generate keys in the map returned by it. The known-items function defined in the preceding code will determine all items in a model using the keys function on the map represented by the :differences key in the value returned by the train function. Finally, the predictions function uses the value returned by the train and known-items functions to determine all items in the model and then predict all unrated items for a particular user. The function also takes an optional third argument, which is a vector of the item names whose ratings are to be predicted, in order to return the predictions of all items with names present in the vector items.

Now, we can examine the behavior of the preceding function in the REPL, as follows:

user> (known-items trained-model)
("Item D" "Item C" "Item B" "Item A")

As shown in the preceding output, the known-items function returns the names of all items in the model. We can now try out the predictions function, as follows:

user> (predictions trained-model {"Item A" 2} ["Item C" "Item D"])
{"Item D" 3, "Item C" 0}
user> (predictions trained-model {"Item A" 2})
{"Item B" 3/2, "Item C" 0, "Item D" 3}

Note that when we skip the last optional parameter of the predictions function, the map returned by this function will contain all items that are not previously rated by a particular user. This can be asserted in the REPL by using the keys function, as follows:

user> (keys  (predictions trained-model {"Item A" 2}))
("Item B" "Item C" "Item D")

To conclude, we have demonstrated how we can implement the Slope One algorithm using nested maps and standard Clojure functions.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.118.28.179