Distributed Processing

In the last chapter, we introduced the concept of parallel processing and learned how to leverage multicore processors and GPUs. Now, we can step up our game a bit and turn our attention to distributed processing, which involves executing tasks across multiple machines to solve a problem.

In this chapter, we will illustrate the challenges, use cases, and examples of how to run code on a cluster of computers. Python offers easy-to-use and reliable packages for distributed processing, which will allow us to implement scalable and fault-tolerant code with relative ease.

The list of topics for this chapter is as follows:

  • Distributed computing and the MapReduce model
  • Directed Acyclic Graphs with Dask
  • Writing parallel code with Dask's array, Bag, and DataFrame data structures
  • Distributing parallel algorithms with Dask Distributed
  • An introduction to PySpark
  • Spark's Resilient Distributed Datasets and DataFrame
  • Scientific computing with mpi4py
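Before diving into the tools above, it helps to see the MapReduce model itself in miniature. The following is a toy sketch in pure Python: a word count expressed as a map phase, a shuffle (group-by-key) phase, and a reduce phase. Real engines such as Spark distribute these phases across machines; here everything runs locally, and the sample documents are purely illustrative:

```python
from functools import reduce
from itertools import groupby

documents = ["the cat sat", "the dog sat"]  # illustrative input

# Map phase: emit a (word, 1) pair for every word in every document.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle phase: group the pairs by key (the word).
mapped.sort(key=lambda pair: pair[0])
grouped = {key: [count for _, count in pairs]
           for key, pairs in groupby(mapped, key=lambda pair: pair[0])}

# Reduce phase: combine the counts for each word.
counts = {word: reduce(lambda a, b: a + b, values)
          for word, values in grouped.items()}

print(counts)  # {'cat': 1, 'dog': 1, 'sat': 2, 'the': 2}
```

Because each phase only needs local information (a single document in map, a single key's values in reduce), the work can be split across many machines, which is exactly what the frameworks covered in this chapter automate.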