The history of Apache SystemML

Apache SystemML is already ten years old. It has gone through multiple refactorings over that time and is now a state-of-the-art machine learning library, and one of the fastest available. Deep learning support has also been added recently, which we will cover briefly in the following chapter on deep learning:

As you can see in the preceding figure, a lot of research has gone into Apache SystemML. It is two years older than Apache Spark, and in 2017 it left incubator status and became a top-level Apache project. Even when SystemML was started, the researchers at IBM Research Almaden realized that out-of-the-box machine learning algorithms very often perform poorly on large datasets.

As a result, the data analysis pipeline had to be tuned after a small-scale version of it had been prototyped. The following figure illustrates this:

This means that the data scientist prototypes the application in a programming language of their choice, most likely MATLAB, R, or Python. A systems programmer then picks up the prototype and re-implements it in a JVM language such as Java or Scala, which usually provides better performance and also scales linearly on a data-parallel framework such as Apache Spark.

The scaled version of the prototype returns results on the whole dataset, the data scientist is again in charge of modifying the prototype, and the whole cycle begins again. The staff at IBM Research Almaden are not the only ones who have experienced this; our team has seen it as well. So let's use Apache SystemML to make the systems programmer redundant (or at least needed only to take care of our Apache Spark jobs).
