Synchronization and locks

Even though multiprocessing uses processes (each with its own independent memory), it lets you define certain variables and arrays as shared memory. You can define a shared variable using multiprocessing.Value, passing its data type as a type-code string ('i' for integer, 'd' for double, 'f' for float, and so on). You can read and update the content of the variable through its value attribute, as shown in the following code snippet:

    import multiprocessing

    shared_variable = multiprocessing.Value('f')  # shared C float
    shared_variable.value = 0
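The following is a minimal sketch of a parent and a child process sharing both a multiprocessing.Value and a multiprocessing.Array. The function names fill_shared and demo are hypothetical. It assumes the shared objects are passed to the child as Process arguments, which works with every start method.

```python
import multiprocessing


def fill_shared(scalar, array):
    """Runs in a child process and writes into shared memory."""
    scalar.value = 3.5              # 'd' type code: C double
    for i in range(len(array)):
        array[i] = i * i            # 'i' type code: C int


def demo():
    scalar = multiprocessing.Value('d', 0.0)   # shared double, initially 0.0
    array = multiprocessing.Array('i', 5)      # shared array of 5 ints, zero-filled
    child = multiprocessing.Process(target=fill_shared, args=(scalar, array))
    child.start()
    child.join()
    # The parent sees the child's writes because the memory is shared.
    return scalar.value, array[:]


if __name__ == "__main__":
    print(demo())  # (3.5, [0, 1, 4, 9, 16])
```

The `if __name__ == "__main__":` guard matters on platforms that start children with the 'spawn' method, because each child re-imports the main module.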

When using shared memory, you should be aware of concurrent accesses. Imagine that you have a shared integer variable and that each process increments its value multiple times. We can define a process class as follows:

    import multiprocessing

    class Process(multiprocessing.Process):

        def __init__(self, counter):
            super(Process, self).__init__()
            self.counter = counter

        def run(self):
            for i in range(1000):
                self.counter.value += 1  # read, increment, write: not atomic

You can initialize the shared variable in the main program and pass it to four processes, as shown in the following code:

    def main():
        counter = multiprocessing.Value('i', lock=True)
        counter.value = 0

        processes = [Process(counter) for i in range(4)]
        for p in processes:
            p.start()
        for p in processes:
            p.join()  # wait until all processes are done
        print(counter.value)

    if __name__ == '__main__':
        main()

If you run this program (shared.py in the code directory), you will notice that the final value of counter is not 4000. Instead, it varies from run to run (on my machine, it falls between 2000 and 2500). Assuming the arithmetic itself is correct, we can conclude that the problem lies in the parallelization.

What happens is that multiple processes try to access the same shared variable at the same time. The situation is best explained by looking at the following figure. In a serial execution, the first process reads the value (0), increments it, and writes the new value (1). The second process then reads the new value (1), increments it, and writes it back (2).

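In the parallel execution, the two processes both read the value (0), both increment it, and both write the value (1) at the same time. One of the increments is therefore lost, and the result is wrong. Passing lock=True to multiprocessing.Value does not prevent this. That built-in lock protects each individual read or write of value, but not the read-increment-write sequence as a whole.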
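The two schedules described above can be replayed step by step in plain Python, without any real processes. This is a hypothetical single-threaded simulation, and the helper name run_schedule is ours. Each "process" performs a read step followed by a write step, and the order of the list fixes the interleaving.

```python
def run_schedule(schedule):
    """Replay a read-increment-write schedule on a shared counter.

    Each step is (process_id, action), where action is "read" or "write".
    A "read" copies the shared value into the process's private register;
    a "write" stores register + 1 back into the shared value.
    """
    shared = 0
    registers = {}
    for pid, action in schedule:
        if action == "read":
            registers[pid] = shared
        else:
            shared = registers[pid] + 1
    return shared


serial = [("P1", "read"), ("P1", "write"), ("P2", "read"), ("P2", "write")]
interleaved = [("P1", "read"), ("P2", "read"), ("P1", "write"), ("P2", "write")]

print(run_schedule(serial))       # 2
print(run_schedule(interleaved))  # 1: one increment is lost
```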


To solve this problem, we need to synchronize access to the variable so that only one process at a time can read, increment, and write the shared value. This feature is provided by the multiprocessing.Lock class. A lock can be acquired and released through its acquire and release methods, or by using the lock as a context manager. Since a lock can be held by only one process at a time, it prevents multiple processes from executing the protected section of code simultaneously.
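The following minimal sketch compares the two equivalent ways of using a multiprocessing.Lock. The first calls acquire and release explicitly, with try/finally so the lock is released even if the protected code raises. The second uses the lock as a context manager. The function names are hypothetical, and the lock is passed to each worker as an argument.

```python
import multiprocessing


def increment_explicit(lock, counter, n):
    for _ in range(n):
        lock.acquire()
        try:
            counter.value += 1  # critical section
        finally:
            lock.release()      # always released, even on error


def increment_with(lock, counter, n):
    for _ in range(n):
        with lock:              # acquired on entry, released on exit
            counter.value += 1


def run(worker, nprocs=4, n=1000):
    lock = multiprocessing.Lock()
    counter = multiprocessing.Value('i', 0)  # our lock guards the whole +=
    procs = [multiprocessing.Process(target=worker, args=(lock, counter, n))
             for _ in range(nprocs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value


if __name__ == "__main__":
    print(run(increment_explicit), run(increment_with))  # 4000 4000
```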

We can define a global lock and use it as a context manager to restrict access to the counter, as shown in the following code snippet:

    import multiprocessing

    lock = multiprocessing.Lock()  # inherited by children with the 'fork' start method

    class Process(multiprocessing.Process):

        def __init__(self, counter):
            super(Process, self).__init__()
            self.counter = counter

        def run(self):
            for i in range(1000):
                with lock:  # acquire the lock
                    self.counter.value += 1
                # the lock is released here
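The global lock above works only when child processes inherit it, which happens with the 'fork' start method. Under 'spawn' (the default on Windows and macOS) or 'forkserver', each child re-imports the main module and creates its own, unrelated lock. A more portable sketch uses the lock that multiprocessing.Value already carries, available through get_lock(). This is the pattern the official documentation shows for counter.value += 1. The class name LockedCounterProcess is hypothetical.

```python
import multiprocessing


class LockedCounterProcess(multiprocessing.Process):
    """Hypothetical variant of the Process class using the counter's own lock."""

    def __init__(self, counter):
        super().__init__()
        self.counter = counter

    def run(self):
        for _ in range(1000):
            # get_lock() returns the lock created by Value(..., lock=True);
            # holding it makes the whole read-increment-write atomic.
            with self.counter.get_lock():
                self.counter.value += 1


def count_with_value_lock():
    counter = multiprocessing.Value('i', 0)  # lock=True is the default
    processes = [LockedCounterProcess(counter) for _ in range(4)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    return counter.value


if __name__ == "__main__":
    print(count_with_value_lock())  # 4000
```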

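Synchronization primitives such as locks are essential for solving many problems, but they should be kept to a minimum. Every acquisition adds overhead, and code protected by a lock runs serially, one process at a time, which limits the speedup of your program.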
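One common way to keep synchronization to a minimum is to let each process accumulate its result in a private variable and touch the shared counter only once. This is sketched here as an illustration rather than as the author's method, and the worker name add_locally_then_publish is hypothetical. With four processes, the lock is acquired four times instead of 4,000 times.

```python
import multiprocessing


def add_locally_then_publish(counter, n):
    """Hypothetical worker: do the work privately, lock once at the end."""
    local = 0
    for _ in range(n):
        local += 1                 # private variable: no lock needed
    with counter.get_lock():       # one acquisition per process instead of n
        counter.value += local


def run_batched(nprocs=4, n=1000):
    counter = multiprocessing.Value('i', 0)
    procs = [multiprocessing.Process(target=add_locally_then_publish,
                                     args=(counter, n))
             for _ in range(nprocs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value


if __name__ == "__main__":
    print(run_batched())  # 4000
```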

The multiprocessing module includes other communication and synchronization tools; you can refer to the official documentation at http://docs.python.org/3/library/multiprocessing.html for a complete reference.