An example - alternating least squares

A recommender system tries to predict the items that a user might be interested in, based on the interaction history of that user and of other users.

So let's consider a so-called item-user or product-customer matrix, as illustrated here:

This is a so-called sparse matrix because only a small fraction of its cells are populated with non-zero values, each indicating an interaction between a user u and an item i. The value in a cell can simply be a one, or any other number, for example, the number of times user u bought item i or the rating that user u gave to item i. Let's call this matrix rui, where u stands for user and i for item.
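To make this concrete, here is a small, purely made-up example of such a matrix with four users (rows) and four items (columns); empty cells stand for zeros, that is, interactions we haven't observed:

\[
r = \begin{pmatrix}
5 &   &   & 1 \\
  & 3 &   &   \\
4 &   & 2 &   \\
  &   &   & 5
\end{pmatrix}
\]

Most of the cells are empty, which is exactly the sparsity property we just described.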

Those of you familiar with linear algebra might know that such a matrix can be approximated by the product of two smaller, low-rank matrices. This means that you have to find two matrices pu and qi that, when multiplied with each other, reconstruct the original matrix rui; let's call the reconstruction rui'. The goal is to find pu and qi such that rui' doesn't differ too much from rui on the known cells. This is done by minimizing a sum of squared errors objective.
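Written as a formula, the objective looks as follows; note that the regularization term with weight \(\lambda\) is an assumption on our part (practical ALS implementations typically add it, and the plain sum of squared errors is just the special case \(\lambda = 0\)):

\[
\min_{p,q} \sum_{(u,i)\ \mathrm{observed}} \left( r_{ui} - p_u^{\top} q_i \right)^2 + \lambda \left( \sum_u \lVert p_u \rVert^2 + \sum_i \lVert q_i \rVert^2 \right)
\]

Only the observed cells enter the sum; this is what keeps the problem well defined despite the sparsity of rui.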

The following figure illustrates this and the sparsity property of the matrix:

Once we've found good factors pu and qi, we can construct rui', and the cells that were empty in rui now contain predicted values; these become the new predicted product suggestions. In case you haven't understood all the details, don't worry, as we don't need too much of this example to understand the rest of this chapter.

A common algorithm to find pu and qi is called alternating least squares (ALS); alternating because in each iteration it keeps one of the two factors fixed and solves an ordinary least squares problem for the other, then switches roles, going from pu to qi and vice versa. Don't worry about the mathematical details too much; the important point is that, in Apache Spark MLlib, running ALS is just a single line of Scala code:
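The following is a minimal sketch of that line in context; the input RDD named ratings as well as the concrete values for the rank, the number of iterations, and the regularization parameter are illustrative assumptions:

    import org.apache.spark.mllib.recommendation.{ALS, Rating}

    // ratings is assumed to be an RDD[Rating], that is, (user, product, rating) triples
    // 10 latent factors (rank), 10 iterations, regularization parameter 0.01
    val model = ALS.train(ratings, 10, 10, 0.01)

Once trained, the resulting model can, for example, produce the top five product suggestions for a given user with model.recommendProducts(userId, 5).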

So what's wrong with this? Before we answer that, let's take a look at how ALS is implemented in a statistical programming language such as R:

Again, don't worry if you don't understand each line; the purpose of this figure is to show you that, in R, this algorithm can be expressed in only 27 lines of code. If we now take a look at the ALS implementation in MLlib, we'll see that it has more than 800 lines. You can find this implementation at https://github.com/apache/spark/tree/master/mllib/src/main/scala/org/apache/spark/mllib/recommendation.

So why do we need more than 800 lines in Scala on Spark, but only 27 in R? The difference is performance optimization: more than 50% of the MLlib implementation is code that optimizes performance rather than expressing the algorithm itself. So what if we could do the following?

  • Get rid of all performance optimizations in our algorithm implementation
  • Port our R code 1:1 to some parallel framework
  • In case of changes, just change our R implementation

This is where Apache SystemML comes in: it supports all of this. Apache SystemML's DSL (domain-specific language) is a subset of R syntax, so you can take the previous example and run it 1:1, without any modification, on top of Apache SystemML. In addition, a cost-based performance optimizer generates a physical execution plan on top of Apache Spark that minimizes execution time based on the size properties of your data. So let's find out how this works.
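To give you a first impression, the following is a minimal sketch of how an R-like script is handed over to Apache SystemML from Spark using its MLContext API; the three-line DML script in the string is just a stand-in for a real script such as the ALS implementation shown previously, and the SparkContext sc is assumed to exist:

    import org.apache.sysml.api.mlcontext.MLContext
    import org.apache.sysml.api.mlcontext.ScriptFactory.dml

    val ml = new MLContext(sc)
    // an R-like DML script: create a random matrix and print the sum of its cells
    val script = dml(
      """
        x = rand(rows=100, cols=10)
        s = sum(x)
        print("sum: " + s)
      """)
    ml.execute(script)

SystemML parses the script and its optimizer then decides, based on the sizes of the matrices involved, how to execute it on the cluster.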
