Processes and threads

Threads can be compared to lightweight processes, in the sense that they offer advantages similar to those of processes without requiring the typical inter-process communication techniques. Threads allow you to split the main control flow of a program into multiple concurrently running flows of control. Processes, by contrast, have their own address space and their own resources. It follows that parts of code running in different processes can only communicate through appropriate management mechanisms, including pipes, named pipes (FIFOs), mailboxes, shared memory areas, and message passing. Threads, on the other hand, allow the creation of concurrent parts of the program that all share the same address space, variables, and constants.
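As a minimal sketch of this difference (not part of the example files used below), the following snippet sends data from a child process back to its parent through a multiprocessing.Queue, one of the message-passing mechanisms just mentioned; with threads, the same result could be obtained by simply appending to a shared list:

import multiprocessing

def producer(queue):
    # The child process cannot see the parent's variables directly,
    # so it sends its result back through the queue (message passing).
    queue.put([x * x for x in range(5)])

if __name__ == "__main__":
    queue = multiprocessing.Queue()
    worker = multiprocessing.Process(target=producer, args=(queue,))
    worker.start()
    print(queue.get())  # [0, 1, 4, 9, 16], received from the child process
    worker.join()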

The following table summarizes the main differences between threads and processes:

Threads | Processes
Share memory. | Do not share memory.
Creation and switching are computationally less expensive. | Creation and switching are computationally expensive.
Require fewer resources (lightweight processes). | Require more computational resources.
Need synchronization mechanisms to handle data correctly. | No memory synchronization is required.
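To illustrate the synchronization requirement listed in the table, here is a minimal sketch (again, not part of the example files used below) in which two threads increment a shared counter under a threading.Lock; without the lock, the two read-modify-write sequences could interleave and lose updates:

import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        # The lock makes the read-modify-write on the shared variable
        # atomic with respect to the other thread.
        with lock:
            counter += 1

if __name__ == "__main__":
    workers = [threading.Thread(target=increment, args=(100000,)) for _ in range(2)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    print(counter)  # always 200000 thanks to the lock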

 

After this brief introduction, we can finally show how processes and threads operate.

In particular, we want to compare the serial, multithreaded, and multiprocess execution times of the following function, do_something, which performs some basic work: it builds a list of randomly generated numbers (in the do_something.py file):

import random

def do_something(count, out_list):
    for i in range(count):
        out_list.append(random.random())

Next, there is the serial (serial_test.py) implementation. Let's start with the relevant imports:

from do_something import *
import time

Note the import of the time module, which will be used to evaluate the execution time of this serial implementation of the do_something function. The size of the list to build is equal to 10000000, while the do_something function will be executed 10 times:

if __name__ == "__main__":
    start_time = time.time()
    size = 10000000
    n_exec = 10
    for i in range(0, n_exec):
        out_list = list()
        do_something(size, out_list)

    print("List processing complete.")
    end_time = time.time()
    print("serial time=", end_time - start_time)

Next, we have the multithreaded implementation (multithreading_test.py).

Import the relevant libraries:

from do_something import *
import time
import threading

Note the import of the threading module, which provides Python's multithreading capabilities.

Here is the multithreaded execution of the do_something function. We will not comment in depth on the instructions in the following code, as they are discussed in more detail in Chapter 2, Thread-Based Parallelism.

However, note that in this case, too, the length of the list is the same as in the serial case (size = 10000000), while the number of threads defined is 10 (threads = 10), which is also the number of times the do_something function is executed:

if __name__ == "__main__":
    start_time = time.time()
    size = 10000000
    threads = 10
    jobs = []
    for i in range(0, threads):

Note also how each single thread is constructed through the threading.Thread class:

        out_list = list()
        thread = threading.Thread(target=do_something, args=(size, out_list))
        jobs.append(thread)

The sequence of loops in which we start the threads and then wait for them to finish (join) is as follows:

    for j in jobs:
        j.start()
    for j in jobs:
        j.join()

    print("List processing complete.")
    end_time = time.time()
    print("multithreading time=", end_time - start_time)

Finally, there is the multiprocessing implementation (multiprocessing_test.py).

We start by importing the necessary modules and, in particular, the multiprocessing library, whose features will be explained in depth in Chapter 3, Process-Based Parallelism:

from do_something import *
import time
import multiprocessing

As in the previous cases, the length of the list to build (size = 10000000) and the number of executions of the do_something function remain the same (procs = 10):

if __name__ == "__main__":
    start_time = time.time()
    size = 10000000
    procs = 10
    jobs = []
    for i in range(0, procs):
        out_list = list()

Here, each single process is created through the multiprocessing.Process class, as follows:

        process = multiprocessing.Process(target=do_something, args=(size, out_list))
        jobs.append(process)

Next, the sequence of loops in which we start the processes and then wait for them to finish (join) is executed as follows:

    for j in jobs:
        j.start()

    for j in jobs:
        j.join()

    print("List processing complete.")
    end_time = time.time()
    print("multiprocesses time=", end_time - start_time)

Then, we open a command shell and run the three scripts described previously.

Go to the folder where the scripts have been copied and type the following:

> python serial_test.py

The result, obtained on a machine with an Intel i7 CPU and 8 GB of RAM, is as follows:

List processing complete.
serial time= 25.428767204284668

In the case of the multithreading implementation, we have the following:

> python multithreading_test.py

The output is as follows:

List processing complete.
multithreading time= 26.168917179107666

Finally, there is the multiprocessing implementation:

> python multiprocessing_test.py

Its result is as follows:

List processing complete.
multiprocesses time= 18.929869890213013

As can be seen, the results of the serial implementation (that is, serial_test.py) are similar to those obtained with the multithreaded implementation (multithreading_test.py): because of CPython's Global Interpreter Lock, the threads essentially run one after the other, each yielding to the next until the end, so no speedup is obtained. By contrast, we do get a benefit in terms of execution time by using Python's multiprocessing capability (multiprocessing_test.py).
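For completeness, the same comparison can also be expressed with the concurrent.futures module from the standard library. The following is only a sketch of an alternative formulation, not one of the three scripts above; it reuses the same do_something function and workload, and its relative timings should follow the same pattern:

from do_something import *
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
import time

if __name__ == "__main__":
    size = 10000000
    for executor_class in (ThreadPoolExecutor, ProcessPoolExecutor):
        start_time = time.time()
        with executor_class(max_workers=10) as executor:
            # Each submit() schedules one execution of do_something;
            # result() waits for that execution to complete.
            futures = [executor.submit(do_something, size, list()) for _ in range(10)]
            for f in futures:
                f.result()
        print(executor_class.__name__, "time=", time.time() - start_time)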
