Race conditions

In the context of multithreading, a race condition is a situation where two or more threads try to modify a shared data structure at the same time but, because of the way the threads are scheduled and interleaved, leave that data structure in an inconsistent state.

Is this statement confusing? No worries, let's try to understand it with an example:

Consider our previous example of the JSON to YAML converter, and assume that we did not use locks when writing the converted YAML output to the file. We have two threads, named writer-1 and writer-2, both of which are responsible for writing to the common YAML file. Both threads start their write operations and, based on how the operating system schedules them, writer-1 begins writing to the file first. While writer-1 is still writing, the operating system decides that the thread has used up its quota of time and swaps it out for writer-2. Note that writer-1 had not finished writing all of its data when it was swapped out. writer-2 now executes and writes all of its data to the YAML file. Once writer-2 completes, the OS resumes writer-1, which writes its remaining data to the YAML file and then finishes.

Now, when we open the YAML file, we see data from the two writer threads interleaved with each other, which leaves the file in an inconsistent state. A problem such as the one that occurred between the writer-1 and writer-2 threads is known as a race condition.
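To make this concrete, here is a minimal sketch of the same situation; it is not the converter's actual code, and the file name, thread names, and sleep interval are illustrative assumptions. Two writer threads append multi-line records to the same file with no lock, and the short sleep between lines encourages the scheduler to switch threads mid-record:

import threading
import time

def writer(name, path, lines):
    # No lock is taken here, so nothing stops the scheduler from
    # interleaving this writer's lines with the other writer's lines.
    with open(path, "a") as f:
        for i in range(lines):
            f.write(f"{name}: line {i}\n")
            f.flush()
            time.sleep(0.001)  # give the other thread a chance to run

threads = [
    threading.Thread(target=writer, args=("writer-1", "output.yaml", 5)),
    threading.Thread(target=writer, args=("writer-2", "output.yaml", 5)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

Inspecting output.yaml after a run will typically show lines from writer-1 and writer-2 mixed together rather than two contiguous blocks, which is exactly the inconsistent state described above.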

Race conditions fall into the category of problems that are very hard to debug, since the order in which threads execute varies from machine to machine and from OS to OS. A problem that occurs on one deployment may therefore not occur on another.

So, how do we avoid race conditions? We already have the answers to this question, and we used them only recently. Let's take a look at some of the ways in which race conditions can be prevented:

  • Utilizing locks in critical regions: Critical regions are those areas of code where a shared variable is being modified by a thread. To prevent race conditions in critical regions, we can use locks. A lock essentially causes all threads to block except the thread that holds it; all other threads that need to modify the shared resource will execute only once the thread currently holding the lock releases it. Some of the categories of locks that can be used are mutex locks, which can only be held by a single thread at a time; re-entrant locks, which allow a recursive function to take multiple locks on the same shared resource; and condition objects, which can be used to synchronize execution in producer-consumer type environments (see the sketch after this list).
  • Utilizing thread-safe data structures: Another way of preventing race conditions is to use thread-safe data structures. A thread-safe data structure automatically manages modifications made to it by multiple threads and serializes their operations. One thread-safe shared data structure provided by Python is a queue, which can be used whenever multiple threads need to hand data to one another safely (also shown in the sketch after this list).
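The following is a minimal sketch of both approaches applied to the two-writer scenario above, assuming the same illustrative file name and record format; the function and variable names are not taken from the converter's actual code:

import threading
import queue

file_lock = threading.Lock()

def locked_writer(name, path, lines):
    # Critical region: only one thread at a time may hold file_lock,
    # so each writer's record reaches the file as one contiguous block.
    with file_lock:
        with open(path, "a") as f:
            for i in range(lines):
                f.write(f"{name}: line {i}\n")

# Alternative: a thread-safe queue serializes the records, and a single
# consumer thread performs all of the writes.
records = queue.Queue()

def producer(name, lines):
    for i in range(lines):
        records.put(f"{name}: line {i}\n")

def consumer(path):
    with open(path, "a") as f:
        while True:
            line = records.get()
            if line is None:  # sentinel: no more records are coming
                break
            f.write(line)

producers = [threading.Thread(target=producer, args=(f"writer-{i}", 5)) for i in (1, 2)]
file_writer = threading.Thread(target=consumer, args=("output.yaml",))
file_writer.start()
for p in producers:
    p.start()
for p in producers:
    p.join()
records.put(None)  # signal the consumer to stop
file_writer.join()

With the lock, the writers still write to the file themselves but can no longer interleave; with the queue, only one thread ever touches the file, so the question of interleaved writes disappears entirely.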

Now, we have an idea about what race conditions are, how they happen, and how they can be avoided. With this in mind, let's take a look at one of the other pitfalls that can arise due to the way we prevent race conditions from happening.
