Chapter 16. Parallelism in Scala and Akka

Data analysts, scientists, and software engineers have been facing a serious challenge: the explosion of the amount of data required to build reliable models. After all, how valuable is a data mining application if the model does not scale?

The challenge of big data is addressed through a two-facet strategy: improving the efficiency of existing data mining and machine learning solutions, and leveraging scalable infrastructure (frameworks, programming languages, GPUs, and so on).

This chapter covers the Scala parallel collections, the Actor model, and the Akka framework. The next chapter introduces the Apache Spark framework and its collection of machine learning algorithms.

The following are the topics addressed in this chapter:

  • Introduction to Scala parallel collections
  • Evaluation of the performance of a parallel collection on a multicore CPU
  • The Actor model and reactive systems
  • Clustered and reliable distributed computing using Akka
  • Design of computational workflow using Akka routers

Overview

Support for distributed and concurrent processing is provided by a stack of frameworks and libraries. The Scala concurrent and parallel collection classes leverage the threading capabilities of the Java Virtual Machine. Akka.io implements a reliable Actor model, which was originally introduced as part of the Scala standard library. The Akka framework supports remote Actors, routing, and load-balancing protocols; dispatchers, clusters, events, and configurable mailbox management; and different transport modes, supervisory strategies, and typed Actors.

The following stack representation illustrates the interdependencies between frameworks:

Stack representation of scalable frameworks using Scala

The next chapter introduces the Apache Spark framework.

Each layer adds new functionality to the previous one to increase scalability. The Java Virtual Machine (JVM) runs as a process within a single host. Scala concurrent classes support effective deployment of an application by leveraging multicore CPU capabilities without the need to write multithreaded applications. Akka extends the Actor paradigm to clusters with advanced messaging and routing options. Finally, Apache Spark leverages Scala higher-order collection methods and the Akka implementation of the Actor model to provide large-scale data processing systems with better performance and reliability, through its resilient distributed datasets and in-memory persistence.
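
As an illustration of how the Scala concurrent classes use several cores without explicit thread management, the following minimal sketch splits a computation into chunks and evaluates each chunk in a Future from the standard scala.concurrent package. The object ParallelSumSketch, the helper sumOfSquares, and the parameter numChunks are hypothetical names introduced for this example only; the use of the default global execution context and the 10-second timeout are likewise assumptions, not recommendations from this chapter.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelSumSketch {
  // Hypothetical helper: sum of squares over one chunk of the data
  def sumOfSquares(chunk: Seq[Double]): Double = chunk.map(x => x * x).sum

  // Splits the data into at most numChunks chunks, evaluates each chunk
  // in its own Future, then adds up the partial results.
  def parallelSumOfSquares(data: Seq[Double], numChunks: Int): Double = {
    require(numChunks > 0, "numChunks must be positive")
    val chunkSize = math.max(1, math.ceil(data.size.toDouble / numChunks).toInt)

    // toVector forces eager creation, so every Future is started up front
    val partials: Vector[Future[Double]] =
      data.grouped(chunkSize).toVector.map(chunk => Future(sumOfSquares(chunk)))

    // Blocks the caller until all partial sums are available (assumed timeout)
    Await.result(Future.sequence(partials), 10.seconds).sum
  }

  def main(args: Array[String]): Unit = {
    val data = (1 to 1000).map(_.toDouble)
    val cores = Runtime.getRuntime.availableProcessors
    println(parallelSumOfSquares(data, cores))
  }
}
```

The threads are created and scheduled by the execution context rather than by the application code; the caller only describes the tasks and how to combine their results. Blocking with Await keeps the sketch short, whereas a production application would more likely compose the Futures without blocking.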
