Summary

You've learned that there is room for additional machine learning frameworks and libraries on top of Apache Spark, and that a cost-based optimizer, similar to the one Catalyst already uses, can speed things up tremendously. In addition, separating performance optimization code from the algorithm code facilitates further improvements on the algorithm side without having to care about performance at all.

Additionally, these execution plans adapt to the size of the data as well as to the available hardware configuration, based on main memory size and potential accelerators such as GPUs. Apache SystemML dramatically improves the life cycle of machine learning applications, especially when machine learning algorithms are not used out of the box, but an experienced data scientist works on their low-level details in a mathematical or statistical programming language.

In Apache SystemML, this low-level, mathematical code can be used out of the box, without any manual transformation or translation into other programming languages, and it can be executed on top of Apache Spark.
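As a reminder of how this looks in practice, the following is a minimal sketch of running a small DML script on Spark through SystemML's MLContext API. It assumes SystemML is on the classpath and that a SparkSession is available; the DML snippet itself (a random matrix and its column means) is just an illustrative placeholder, not an algorithm from this book.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.sysml.api.mlcontext.{MLContext, ScriptFactory}

object SystemMLSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SystemMLSketch")
      .getOrCreate()

    // Create an MLContext backed by the underlying SparkContext
    val ml = new MLContext(spark.sparkContext)

    // A tiny DML script: build a random matrix and print its column means
    val dml =
      """
        |X = rand(rows=1000, cols=10)
        |means = colMeans(X)
        |print(toString(means))
      """.stripMargin

    // The DML source is executed as-is; SystemML compiles it into an
    // execution plan suited to the data size and cluster resources
    val script = ScriptFactory.dml(dml)
    ml.execute(script)

    spark.stop()
  }
}
```

The point of the sketch is that the mathematical script stays untouched; SystemML's optimizer decides how to execute it on the cluster.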

In the next chapter, we'll cover deep learning and how it can be used on top of Apache Spark. This is one of the hottest topics in 2017, so stay tuned!
