Multithreading versus multiprocessing

Now that we have come to the end of our discussion on multiprocessing, it is a good time to compare and contrast the scenarios where one needs to choose between scaling using threads in a single process or using multiple processes in Python.

Here are some guidelines.

Use multithreading in the following cases:

  1. The program needs to maintain a lot of shared state, especially mutable state. Thanks to the GIL, individual operations on Python's built-in data structures, such as lists and dictionaries, are thread-safe (compound operations still require a lock), so it costs much less to maintain mutable shared state using threads than via processes.
  2. The program needs to keep a low memory footprint.
  3. The program spends a lot of time doing I/O. Since threads release the GIL while performing I/O, the GIL doesn't affect the time the threads spend waiting on I/O.
  4. The program doesn't have a lot of data-parallel operations that it can scale across multiple processes.
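Points 1 and 3 above can be illustrated with a short sketch. It simulates blocking I/O with `time.sleep` (which, like real I/O calls, releases the GIL) and lets threads write to a shared dictionary; the task function, delay, and values are illustrative assumptions, not code from this book.

```python
import threading
import time

def io_task(results, index, delay=0.2):
    """Simulate a blocking I/O call; the GIL is released while sleeping."""
    time.sleep(delay)              # stands in for a network or disk call
    results[index] = index * 2     # write to shared mutable state

results = {}
threads = [threading.Thread(target=io_task, args=(results, i))
           for i in range(5)]
start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start

# The five 0.2-second "I/O waits" overlap, so the total elapsed time is
# close to 0.2 s rather than the 1.0 s a serial version would take.
print(results, round(elapsed, 1))
```

Note that each thread writes to a distinct key, so no explicit lock is needed here; threads updating the same key would need one.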

Use multiprocessing in these scenarios:

  • The program performs a lot of CPU-bound heavy computing such as byte-code operations, number crunching, and the like on reasonably large inputs.
  • The program has inputs which can be parallelized into chunks and whose results can be combined afterwards—in other words, the input of the program yields well to data-parallel computations.
  • The program doesn't have any limitations on memory usage, and you are on a modern machine with a multicore CPU and large enough RAM.
  • There is not much shared mutable state between processes that needs to be synchronized; synchronizing such state can slow down the system and offset any benefits gained from multiple processes.
  • Your program is not heavily dependent on I/O—file or disk I/O or socket I/O.

Concurrency in Python – Asynchronous Execution

We have seen two different ways to perform concurrent execution: using multiple threads and using multiple processes. We saw different examples of using threads and their synchronization primitives. We also saw a couple of examples using multiprocessing, with slightly varied outcomes.

Apart from these two ways to do concurrent programming, another common technique is that of asynchronous programming or asynchronous I/O.

In an asynchronous model of execution, a scheduler picks tasks from a queue and executes them in an interleaved manner. There is no guarantee that the tasks will be executed in any specific order. The order of execution depends upon how much processing time a task is willing to yield to other tasks in the queue. In other words, asynchronous execution happens through cooperative multitasking.

Asynchronous execution usually happens in a single thread. This means no true data parallelism or true parallel execution can happen. Instead, the model only provides a semblance of parallelism.

As execution happens out of order, asynchronous systems need a way to return the results of function execution to the callers. This usually happens via callbacks, functions that are invoked when the results are ready, or via special objects that receive the results, often called futures.
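Both mechanisms can be shown in a small asyncio sketch: a producer coroutine sets a result on a future, the caller awaits it, and a callback registered on the future fires when the result is ready. The coroutine names, delay, and result value are illustrative assumptions.

```python
import asyncio

def on_done(future):
    """Callback invoked by the event loop once the result is ready."""
    print('callback received:', future.result())

async def produce(future):
    await asyncio.sleep(0.1)     # yield control to the loop while "working"
    future.set_result(42)        # hand the result back through the future

async def main():
    loop = asyncio.get_running_loop()
    future = loop.create_future()
    future.add_done_callback(on_done)   # the callback route
    await produce(future)
    return await future                 # the future route

result = asyncio.run(main())
```

Here the caller never blocks a thread waiting for the value; it simply suspends at `await` and the event loop resumes it when the future completes.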

Python 3 provides support for this kind of execution via its asyncio module using coroutines. Before we go on to discuss this, we will spend some time understanding pre-emptive multitasking versus cooperative multitasking, and how we can implement a simple cooperative multitasking scheduler in Python using generators.
