Concurrency – A High-Level Overview

For many who don’t work with concurrent programs (and for some who do), concurrency means the same thing as parallelism. In colloquial speech, people don’t usually distinguish between the two. But there are some clear reasons why computer scientists and software engineers make a big deal out of differentiating concurrency and parallelism. This chapter is about what concurrency is (and what it is not) and some of the foundational concepts of concurrency.

Specifically, we’ll cover the following main topics in this chapter:

  • Concurrency and parallelism
  • Shared memory versus message passing
  • Atomicity, race, deadlocks, and starvation

By the end of this chapter, you will have a high-level understanding of concurrency and parallelism, basic concurrent programming models, and some of the fundamental concepts of concurrency.

Technical Requirements

This chapter requires some familiarity with the Go language. Some of the examples use goroutines, channels, and mutexes.

Concurrency and parallelism

There was probably a time when concurrency and parallelism meant the same thing in computer science. That time is long gone. Many people will tell you what concurrency is not ("concurrency is not parallelism"), but when it comes to saying what concurrency is, a simple definition is usually elusive. Different definitions capture different aspects of the concept, because concurrency is not how the real world works: the real world works in parallel. I will try to summarize some of the core ideas behind concurrency, hoping that you will understand its abstract nature well enough to apply it to practical problems.

Many things around us act independently at the same time. There are probably people around you minding their own business, and sometimes, they interact with you and with each other. All these things happen in parallel, so parallelism is the natural way of thinking about multiple independent things interacting with each other. If you observe a single person’s behavior in a group of people, things are much more sequential: that person does things one after the other and may interact with others in the group, all in an orderly sequence. This is quite similar to multiple programs interacting with each other in a distributed system, or multiple threads of a program interacting with each other in a multi-threaded program.

In computer science, it is widely accepted that the study of concurrency started with the work of Edsger Dijkstra – in particular, the one-page paper titled Solution of a Problem in Concurrent Programming Control in 1965. This paper deals with a mutual exclusion problem involving N computers sharing memory. This is a very clever description that highlights the difference between concurrency and parallelism: it talks about “concurrent programming” and “parallel execution.” Concurrency relates to how programs are written. Parallelism relates to how programs run.

Even though this was mostly an academic exercise at the time, the field of concurrency grew over the years and branched into many different but related topics, including hardware design, distributed systems, embedded systems, databases, cloud computing, and more. It is now one of the core skills every software engineer needs, thanks to advances in hardware technology. Nowadays, multi-core processors, which are essentially multiple processors packed on a single chip and usually sharing memory, are the norm. They are used in the data centers that power cloud-based applications, in which someone can provision hundreds of computers connected via a network within minutes and destroy them once a workload has been completed. The same concurrency principles apply to applications running on multiple machines in a distributed system, to applications running on a multi-core processor in a laptop, and to applications running on a single-core system. Thus, any serious software developer must know these principles to develop correct and safe programs that can scale.

Over the years, several mathematical models have been developed to analyze and validate the behavior of concurrent systems. Communicating Sequential Processes (CSP) is one such model that influenced the design of Go. In CSP, systems are composed of multiple sequential processes running in parallel. These processes can communicate with each other synchronously, which means that a process sending a message can only continue once the receiving process has received it (this is exactly how unbuffered channels behave in Go).
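To make this synchronous behavior concrete, here is a minimal Go sketch (not from the original text): the sending goroutine cannot proceed past its send until the receiver takes the value from the unbuffered channel.

package main

import "fmt"

func main() {
    ch := make(chan string) // unbuffered: a send blocks until a receiver is ready

    go func() {
        ch <- "hello" // blocks here until main executes <-ch
    }()

    msg := <-ch // this receive and the send above synchronize with each other
    fmt.Println(msg)
}

If the channel were created with a buffer, for example make(chan string, 1), the send could complete without waiting for a receiver, and the communication would no longer be synchronous.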

The validation aspect of such formal frameworks is most intriguing because they are developed with the promise of proving certain properties of complex systems. These properties can be things such as “can the system deadlock?”, which may have life-threatening implications for mission-critical systems. You don’t want your auto-pilot software to stop working mid-flight. Most validation activities boil down to proving properties about the states of the program. That’s what makes proving properties about concurrent systems so difficult: when multiple systems run together, the possible states of the composite system grow exponentially.

The state of a sequential system captures the history of the system at a certain point in time. For a sequential program, the state can be defined as the values in memory together with the current execution location of that program. Given these two, you can determine what the next state will be. As the program executes, it modifies the values of variables and advances the execution location so that the program changes its state. To illustrate this concept, look at the following simple program written in pseudo-code:

1: increment x
2: if x<3 goto 1
3: terminate

The program starts with loc=1 and x=0. When the statement at location 1 is executed, x becomes 1 and location becomes 2. When the statement at location 2 is executed, x stays the same, but the location goes back to 1. This goes on, incrementing x every time the statement at location 1 runs until x reaches 3. Once x is 3, the program terminates. The sequence in Figure 1.1 shows the states of the program:

Figure 1.1 – Sequence of the states of the program
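To trace these states yourself, the following small Go program (a direct translation of the pseudo-code, not part of the original text) simulates it and prints each (loc, x) pair it passes through; it visits the same seven states shown in Figure 1.1.

package main

import "fmt"

func main() {
    x, loc := 0, 1
    for loc != 3 {
        fmt.Printf("state: loc=%d x=%d\n", loc, x)
        switch loc {
        case 1: // increment x
            x++
            loc = 2
        case 2: // if x<3 goto 1
            if x < 3 {
                loc = 1
            } else {
                loc = 3
            }
        }
    }
    fmt.Printf("state: loc=%d x=%d (terminated)\n", loc, x)
}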

When multiple processes are running in parallel, the state of the whole system is the combination of the states of its components. For example, if there are two instances of this program running, then there are two instances of the x variable, which are x1 and x2, and two locations, loc1 and loc2, pointing to the next line to run. At every state, the possible next states branch based on which copy of the system runs first. Figure 1.2 illustrates some of the states of this system:

Figure 1.2 – States of the parallel program

In this diagram, the arrows are labeled with the index of the process that runs in that step. A particular run of the composite program is one of the paths in the diagram. Several observations can be made about these diagrams:

  • Each sequential process has seven distinct states.
  • Each sequential process goes through the same sequence of states at every run, but the states of the two instances of the program interleave in different ways on each path.
  • In a particular run, the composite system goes through 13 different states. That is, the length of any path from the start state to the end state in the composite state diagram is 13 states. (Each process makes six transitions, so every path consists of 12 transitions and therefore visits 13 composite states.)
  • Every run of the composite system can go through one of the possible paths.
  • There are 49 distinct states in the composite system. (For each of the 7 states of system 1, system 2 can be in 7 distinct states, so 7 × 7 = 49.)
  • No matter which path is taken, the end state is the same.

In general, for a system with n states, m copies of that system running in parallel will have n^m distinct states.

That’s one of the reasons why it is so hard to analyze concurrent programs: the independent components of a concurrent program can run in any order, which makes exhaustive state analysis practically infeasible.

It is now time to introduce a definition of concurrency:

Concurrency is the ability of different parts of a program to be executed out-of-order or in partial order without affecting the result.

This is an interesting definition, especially for those who are new to the field of concurrency. For one thing, it does not talk about doing multiple things at the same time, but about executing parts of a program “out of order.” The phrase doing multiple things at the same time describes parallelism. Concurrency is about how the program is written: as Rob Pike, one of the creators of the Go language, puts it, concurrency is about “dealing with” multiple things at the same time, whereas parallelism is about “doing” multiple things at the same time.

Now, a few words on “ordering” things. There are “total orders,” such as the less-than relationship for integers: given any two distinct integers, one is always less than the other. For sequential programs, we can define a “happened-before” relationship, which is a total order: for any two distinct events that happen within a sequential process, one event happens before the other. If two events happen in different processes, how can a happened-before relationship be defined? A globally synchronized clock could be used to order events happening in isolated processes, but a clock with sufficient precision does not usually exist in typical distributed systems. Another possibility is to use the causal relationships between processes: if a process sends a message to another process, then anything that happened in the sender before the message was sent also happened before anything that happens in the receiver after the message is received. This is illustrated in Figure 1.3:

Figure 1.3 – a and b happened before c

Here, event a happened before c, and b happened before c, but nothing can be said about the order of a and b: they happened “concurrently.” In a concurrent program, not every pair of events is comparable; thus, the happened-before relationship is a partial order.

Let’s revisit the famous “dining philosophers problem” to explore the ideas of concurrency, parallelism, and out-of-order execution. The problem was first formulated by Dijkstra and later brought to its current form by C.A.R. Hoare. It is defined as follows: five philosophers are dining together at the same round table. There are five plates, one in front of each philosopher, and one fork between each pair of adjacent plates, five forks in total. The dish they are eating requires two forks, one on the left side and one on the right side of each plate. Each philosopher thinks for a random interval and then eats for a while. To eat, a philosopher must acquire both forks – the one on the left side and the one on the right side of the philosopher’s plate:

Figure 1.4 – Dining philosophers’ problem – some of the possible states

The goal is to devise a concurrent framework that keeps the philosophers well-fed while allowing them to think. We will revisit this problem in detail later. For this chapter, we are interested in the possible states, some of which are illustrated in Figure 1.4. From left to right, the first figure shows all philosophers thinking. The second figure shows two philosophers who have picked up the fork on their left-hand side, so one of them is waiting for the other to finish. The third figure shows the state in which one of the philosophers is eating while the others are thinking. The philosophers next to the one that is eating are waiting for their turn to use the forks. The fourth figure shows the state in which two philosophers are eating at the same time. You can see that this is the maximum number of philosophers that can eat at once because there are not enough resources (forks) for one more philosopher to eat. The last figure shows the state where each philosopher holds one fork, so they are all waiting to acquire the second fork to eat. This situation can only be resolved if at least one of the philosophers gives up and puts the fork back onto the table so that another one can pick it up.

Now, let’s change the problem setup a little bit. Instead of five philosophers sitting at the table, suppose we have a single philosopher who prefers walking when she is thinking. When she gets hungry, she randomly chooses a plate, places the adjacent forks on that plate one by one, and then starts eating. When she is done, she places the forks back on the table one by one and goes back to thinking while walking around the table. She may, however, get distracted during this process and get up at any point, neglecting to put one or both forks back on the table.

When the philosopher chooses a plate, one of the following is possible:

  • Both forks are on the table. Then, she picks them up and starts eating.
  • One of the forks is on the table, and the other one is on the next plate. Realizing that she cannot eat with a single fork, she gets up and chooses another plate. She may or may not put the fork back on the table.
  • One of the forks is on the table, and the other one is on the selected plate. She picks up the second fork and starts eating.
  • None of the forks are on the table, because they are both on adjacent plates. Realizing that she cannot eat without a fork, she gets up and chooses another plate.
  • Both forks are on the selected plate. She starts eating.

Even though the modified problem has only one philosopher, its possible states are identical to those of the original. The five states depicted in the preceding figure are still some of the possible states of the modified problem. The original problem, where five processors (philosophers) perform a computation (eating and thinking) using shared resources (forks), illustrates the parallel execution of a concurrent program. In the modified problem, there is only one processor (philosopher) performing the same computation using shared resources by dividing her time (time sharing) to fulfill the roles of the missing philosophers. The underlying algorithms (the behavior of the philosophers) are the same. So, concurrent programming is about organizing a problem into computational units that can run using time sharing or that can run in parallel. In that sense, concurrency is a programming model, like object-oriented programming or functional programming. Object-oriented programming divides a problem into logically related structural components that interact with each other. Functional programming divides a problem into functional components that call each other. Concurrent programming divides a problem into temporal components that send messages to each other and that can be interleaved or run in parallel.

Time sharing means sharing a computing resource among multiple users or processes. In concurrent programming, the shared resource is the processor itself. When a program creates multiple threads of execution, the processor runs one thread for some time, then switches to another thread, and so on. This is called context switching. The context of an execution thread contains its stack and the state of the processor registers at the moment the thread was stopped. This way, the processor can quickly switch from stack to stack, saving and restoring the processor state at each switch. The exact location in the code where the switch happens depends on the underlying implementation. In preemptive threading, a running thread can be stopped at any point during its execution. In non-preemptive (or cooperative) threading, a running thread voluntarily gives up execution by performing a blocking operation, a system call, or some other yielding action.

For a long time (until Go version 1.14), the Go runtime used a cooperative scheduler. That meant that in the following program, once the first goroutine started running, there was no way to stop it. If you build this program with a Go version earlier than 1.14 and run it multiple times with a single OS thread (for example, with GOMAXPROCS=1), some runs will print Hello, while others will not. This is because if the first goroutine starts working before the second one, it will never let the second goroutine run:

package main

import "fmt"

func main() {
    ch := make(chan bool)
    go func() {
        for {
        }
    }()
    go func() {
        fmt.Println("Hello")
    }()
    <-ch
}

This is no longer the case for more recent Go versions. Now, the Go runtime uses a preemptive scheduler that can run other goroutines, even if one of them is trying to consume all processor cycles.

As a developer of concurrent systems, you have to be aware of how threads/goroutines are scheduled. This understanding is the key to identifying the possible ways in which a concurrent system can behave. At a high level, the states a thread/goroutine can be in are shown in the state diagram in Figure 1.5:

Figure 1.5 – Thread state diagram

When a thread is created, it is in the Ready state. When the scheduler assigns it to a processor, it moves to the Running state and starts running. A running thread can be preempted and moved back into the Ready state. When the thread performs an I/O operation or blocks waiting for a lock or a channel operation, it moves to the Blocked state. When the I/O operation completes, the lock is unlocked, or the channel operation completes, the thread moves back to the Ready state, waiting to be scheduled.

The first thing you should notice here is that a thread waiting for something to happen in a blocked state may not immediately start running when it is unblocked. This fact is usually overlooked when designing and analyzing concurrent programs. What is the meaning of this for your Go programs? It means that unlocking a mutex doesn’t mean one of the goroutines waiting for that mutex will start running immediately. Similarly, writing to a channel does not mean the receiving goroutine will immediately start running. They will be ready to run, but they may not be scheduled immediately.

You will see different variations of this thread state diagram. Every operating system and every language runtime has its own way of scheduling execution threads. For example, a threading system may differentiate between being blocked by an I/O operation and being blocked by a mutex. This is only a high-level depiction that almost all thread implementations share.

Shared memory versus message passing

If you have been developing with Go for some time, you have probably heard the phrase “Do not communicate by sharing memory. Instead, share memory by communicating.” Sharing memory among the concurrent blocks of a program creates vast opportunities for subtle bugs that are hard to diagnose. These problems manifest themselves randomly, usually under load that cannot be simulated in a controlled test environment, and they are hard or impossible to reproduce. What cannot be reproduced cannot be tested, so finding such problems is usually a matter of luck. Once found, they are usually easy to fix with very minor changes. That adds insult to injury. Go supports both shared memory and message-passing models, so we will spend some time looking at what the shared memory and message-passing paradigms are.

In a shared memory system, there can be multiple processors or cores with multiple execution threads that use the same memory. In a Uniform Memory Access (UMA) system, all processors have equal access to memory. This can reduce throughput because the same memory access bus is shared by multiple processors. In a Non-Uniform Memory Access (NUMA) system, processors may have dedicated access to certain blocks of memory. In such a system, the operating system can allocate the memory of the processes that run in a processor to the memory region dedicated to that processor, increasing the memory throughput. Almost all systems also use faster cache memory to improve memory access time, and they employ cache coherence protocols to make sure that processes do not read stale values (values that are updated in the cache but not written to the main memory) from the main memory.

Within a single program, shared memory simply means that multiple execution threads access the same part of memory. Writing programs using shared memory comes naturally in most situations. In most modern languages, such as Go, execution threads have unrestricted access to the memory space of the whole process. So, any variable accessible to a thread can be read and modified by that thread. If there were no compiler or hardware optimizations, this would be just fine. When executing programs, modern processors do not usually wait for memory-read or memory-write operations to complete; they execute instructions in a pipeline. So, while an instruction is waiting for the completion of a read/write operation, the processor can start running the next instruction of the program. If the second instruction completes before the first one, the results of that instruction may be written to memory “out of order.”

Let’s look at this very simple program as an example:

var locked, y int

func f() {
    locked = 1
    y = 2
    ...
}

In this program, suppose the f() function starts with both locked and y initialized to 0. Then, locked is set to 1, presumably to implement some sort of locking scheme, and then y is set to 2. If we take a snapshot of the memory at that point, we may see one of the following:

  • locked=0, y=0: The processor ran the assignment operations, but neither update has been written to memory yet
  • locked=0, y=2: The update to y has been written to memory, but the update to locked has not
  • locked=1, y=0: The update to locked has been written to memory, but the update to y has not
  • locked=1, y=2: Both updates have been written to memory

Reordering instructions is not limited to processors. Compilers can also move statements around without affecting the result of the computation within a sequential program. This is usually done during the optimization phase of compilation. In the preceding program, nothing prevents the compiler from reversing the order of the two assignments. Based on the rest of the code, the compiler may decide that executing locked=1 after y=2 is better.

In short, there is no guarantee for one thread to see the correct values of variables modified by other threads without additional mechanisms.

To make shared memory applications work correctly, we need a way to tell the compiler and the processor to commit all changes to memory. The low-level facility used for this purpose is a memory barrier. A memory barrier is an instruction that forces an ordering constraint on the processor and the compiler: all memory operations issued before the barrier are guaranteed to complete before the operations issued after it. In the preceding program, a memory barrier after the assignment to y ensures that a snapshot taken after the barrier will show locked=1 and y=2. So, a memory barrier is a crucial low-level feature we need to make shared memory applications run correctly.

You may be wondering how this is useful to you. When you are dealing with concurrent programs that use shared memory, the effects of the operations performed in one concurrent block are only guaranteed to be visible to the other blocks after a memory barrier. In Go, certain operations include such barriers: atomic reads and writes (the functions in the sync/atomic package) and channel sends and receives. The other synchronization operations (mutexes, wait groups, and condition variables) use these atomic operations and channels, and thus also include memory barriers.
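As an illustration, the earlier locked/y snippet can be rewritten so that the flag is published with an atomic store. This is only a minimal sketch (not the original code): the order of the two assignments is swapped so that the write to y is made visible to any goroutine that observes the flag.

package main

import (
    "fmt"
    "sync/atomic"
)

var (
    locked atomic.Int32 // the flag, accessed atomically by both goroutines
    y      int          // written before the flag is set
)

func f() {
    y = 2           // plain write
    locked.Store(1) // the atomic store publishes the write to y
}

func main() {
    go f()
    if locked.Load() == 1 { // atomic load; may observe 0 if f has not run yet
        fmt.Println(y) // if the flag was observed as 1, this is guaranteed to print 2
    }
}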

Note

A quick note about debugging concurrent programs is necessary here. A common practice for debugging concurrent programs is to print the values of some variables at a critical point in the code to capture information about the state of the program when some unexpected behavior occurs. This practice usually makes the unexpected behavior disappear, because printing to the console or writing a log message to a file usually involves mutexes and atomic operations. So, such bugs commonly show up in production, where logging is turned off, rather than during development, where there is plenty of logging.

Now, a few words about the message-passing model. For many languages, concurrency is an add-on feature, with libraries defining functions to create and manage concurrent blocks (threads) of execution. Go takes a different approach by making concurrency a core feature of the language. The concurrency features of Go are inspired by CSP, where multiple isolated processes communicate by sending and receiving messages. This has also been the basic process model of Unix/Linux systems for a long time. The Unix motto has been “make each program do one thing well,” and then compose these programs to perform more complicated tasks. Most traditional Unix/Linux programs read from their standard input and write to their standard output, so they can be connected one after the other in a pipeline, where each program consumes the output of the previous one through interprocess communication mechanisms. This setup also resembles a distributed system with multiple interconnected computers: each computer has dedicated memory where it carries out its computations, sends its results to other computers on the network, and receives results from them.

Message passing is one of the core ideas in establishing happened-before relationships in distributed systems. When a system receives a message from another system, you can be sure that any event that happened before the sender system sent that message happened before the receiving system received it. Without such causal references, it is usually not possible to identify which event happened when in a distributed system. The same idea applies to concurrent systems. In a message-passing system, happened-before relationships are established using the messages. In shared memory systems, such happened-before relationships are established using locks.
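The following minimal sketch (not from the original text) shows the same idea with a Go channel: the send happens before the corresponding receive completes, so the receiver is guaranteed to observe the write that preceded the send.

package main

import "fmt"

var result int

func main() {
    done := make(chan struct{})
    go func() {
        result = 42        // happens before the send below
        done <- struct{}{} // the send happens before the receive completes
    }()
    <-done
    fmt.Println(result) // guaranteed to print 42
}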

There are some advantages to having both models available – for instance, many algorithms are much easier to implement on a shared memory system, whereas a message-passing system is free of certain types of data races. There are some disadvantages as well. For instance, in a hybrid model, it is easy to share memory unintentionally, creating data races. A common theme to prevent such unintentional sharing is to be mindful of data ownership: when you send a data object to another goroutine via a channel, the ownership of that data object is transferred to the receiving goroutine, and the sending goroutine must not access that object again without ensuring mutual exclusion. Sometimes, this is hard to achieve.

For example, even though the following snippet uses a channel to communicate between goroutines, the object sent via the channel is a map, which is essentially a pointer to a complicated data structure. Continuing to use the same map after the channel send operation will probably result in data corruption and panics:

// Compute some result
data := computeData()
m := make(map[string]Data)
// Put the result in a map
m["result"] = data
// Send the map to another goroutine
c <- m
// Here, m is shared between the other goroutine and this one
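One way to follow the ownership discipline described above is sketched in the following snippet; it reuses the hypothetical computeData, Data, and c from the previous fragment, and either relinquishes the map after sending it or sends a copy instead.

// Option 1: transfer ownership and stop using the map afterward.
data := computeData()
m := make(map[string]Data)
m["result"] = data
c <- m  // the receiving goroutine now owns m
m = nil // drop our reference so this goroutine cannot mutate it by accident

// Option 2: keep using the original map and send a copy instead.
orig := map[string]Data{"result": computeData()}
cp := make(map[string]Data, len(orig))
for k, v := range orig {
    cp[k] = v // note: this is a shallow copy; pointers inside Data are still shared
}
c <- cp // the receiver owns the copy; orig remains safe to use here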

Atomicity, race, deadlocks, and starvation

To write and analyze concurrent programs successfully, you have to be aware of some key concepts: atomicity, race conditions, deadlocks, and starvation. Atomicity is a property you have to exploit carefully for safe and correct operation. A race is a natural condition related to the timing of events in a concurrent system and can create subtle, irreproducible bugs. Deadlocks must be avoided at all costs. Starvation is usually related to scheduling algorithms, but it can also be caused by bugs in the program.

A race condition is a condition in which the outcome of a program depends on the sequence or timing of concurrent executions. A race condition is a bug when at least one of the outcomes is undesirable. Consider the following data type representing a bank account:

type Account struct {
    Balance int
}

func (acct *Account) Withdraw(amt int) error {
    if acct.Balance < amt {
        return ErrInsufficient
    }
    acct.Balance -= amt
    return nil
}

The Account type has a Withdraw method that checks the balance to see if there are sufficient funds for withdrawal, and then either fails with an error or reduces the balance. Now, let’s call this method from two goroutines:

acct := Account{Balance: 10}
go func() { acct.Withdraw(6) }() // Goroutine 1
go func() { acct.Withdraw(5) }() // Goroutine 2

The logic should not allow the account balance to go below zero. One of the goroutines should run successfully, while the other one should fail because of insufficient funds. Depending on which goroutine runs first, the account should have either 5 or 4 for the final balance. However, these goroutines may interleave, resulting in many possible runs, some of which are shown here:

Possible run 1 (acct.Balance starts at 10; goroutine 1 has amt=6, goroutine 2 has amt=5)

  1. Goroutine 1: if acct.Balance < amt    (acct.Balance: 10, check passes)
  2. Goroutine 2: if acct.Balance < amt    (acct.Balance: 10, check passes)
  3. Goroutine 2: acct.Balance -= amt      (acct.Balance: 5)
  4. Goroutine 1: acct.Balance -= amt      (acct.Balance: -1)

Possible run 2 (acct.Balance starts at 10)

  1. Goroutine 1: if acct.Balance < amt    (acct.Balance: 10, check passes)
  2. Goroutine 2: if acct.Balance < amt    (acct.Balance: 10, check passes)
  3. Goroutine 1: acct.Balance -= amt      (acct.Balance: 4)
  4. Goroutine 2: acct.Balance -= amt      (acct.Balance: -1)

Possible run 3 (acct.Balance starts at 10)

  1. Goroutine 2: if acct.Balance < amt    (acct.Balance: 10, check passes)
  2. Goroutine 2: acct.Balance -= amt      (acct.Balance: 5)
  3. Goroutine 1: if acct.Balance < amt    (acct.Balance: 5, check fails)
  4. Goroutine 1: return ErrInsufficient   (acct.Balance: 5)

Possible run 4 (acct.Balance starts at 10)

  1. Goroutine 1: if acct.Balance < amt    (acct.Balance: 10, check passes)
  2. Goroutine 1: acct.Balance -= amt      (acct.Balance: 4)
  3. Goroutine 2: if acct.Balance < amt    (acct.Balance: 4, check fails)
  4. Goroutine 2: return ErrInsufficient   (acct.Balance: 4)

Possible runs 1 and 2 leave the account with a negative balance, even though the logic prevents this case for a single-threaded program. Possible runs 3 and 4 are also race conditions but they leave the system in a consistent state. Two goroutines compete for the funds; one of them is successful while the other fails. This makes race conditions very hard to diagnose: they only happen rarely, and they are not reproducible, even if the conditions under which they are detected are replicated.

The race condition shown here is also a data race. A data race is a special type of race condition where all of the following occur (a small driver that triggers this data race is sketched after the list):

  • Two or more threads are accessing the same memory location
  • At least one of the threads writes to that location
  • There is no synchronization or locks to coordinate the ordering of operations between the two threads
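For reference, a small driver like the following (a sketch using the unsynchronized Account and Withdraw from earlier; it also needs the fmt and sync imports) makes the race concrete. Running it with the race detector enabled, for example with go run -race, typically reports a data race on acct.Balance.

func main() {
    acct := Account{Balance: 10}
    var wg sync.WaitGroup
    wg.Add(2)
    go func() { defer wg.Done(); acct.Withdraw(6) }() // Goroutine 1
    go func() { defer wg.Done(); acct.Withdraw(5) }() // Goroutine 2
    wg.Wait()
    fmt.Println("final balance:", acct.Balance) // may print 5, 4, or even -1
}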

Note that in runs 3 and 4, first, the function ran in its entirety in one goroutine, then it ran in its entirety in the other goroutine. Thus, Withdraw ran atomically. An atomic operation contains a sequence of sub-operations where either none of them happen, or all of them happen.

Now, let’s consider a more realistic scenario: the effects of the sequential operations of one goroutine may look out-of-order to another goroutine. The following is a possible run:

Possible run 5 (acct.Balance starts at 10; each goroutine has its own view of the balance)

  1. Goroutine 2: if acct.Balance < amt    (goroutine 1 sees 10, goroutine 2 sees 10)
  2. Goroutine 2: acct.Balance -= amt      (goroutine 1 sees 10, goroutine 2 sees 5)
  3. Goroutine 1: if acct.Balance < amt    (goroutine 1 sees 10, goroutine 2 sees 5)
  4. Goroutine 1: acct.Balance -= amt      (goroutine 1 sees 4, goroutine 2 sees 4)

In run 5, the compiler and the processor provide a consistent sequential view of the memory for each goroutine, but that view may not be the same for all goroutines. In the preceding example, the memory-write operation (acct.Balance -= amt) performed by goroutine 2 is delayed, so goroutine 1 decides there are enough funds. Eventually, goroutine 2 observes that the update made by goroutine 1 was written to memory, and the update made by goroutine 2 is lost.

Such data races are possible in any shared memory program where multiple execution threads can access the same object concurrently. If the operations changing the contents of the shared object are performed atomically, then they appear to be instantaneous. One way to implement atomicity is by using critical sections. A critical section is a protected section of a program in which only one process or thread can make modifications to a shared resource. When only one thread is allowed to enter the critical section, the mutual exclusion property is satisfied, and the operation performed in the critical section is atomic. A subtle point we need to emphasize here is that a critical section is defined for a specific shared resource. Multiple processes can be in their critical sections and still satisfy the mutual exclusion property if they do not share a resource.

A sync.Mutex can be used for mutual exclusion:

type Account struct {
    sync.Mutex
    ID      string
    Balance int
}

func (acct *Account) Withdraw(amt int) error {
    acct.Lock()
    defer acct.Unlock()
    if acct.Balance < amt {
        return ErrInsufficient
    }
    acct.Balance -= amt
    return nil
}

Let’s analyze how this program works: one of the goroutines arrives at acct.Lock(), attempts to lock the mutex, and succeeds. If any other goroutine arrives at the same point, its attempt to lock the mutex blocks until the first goroutine unlocks it. The first goroutine, having entered the critical section successfully, completes the function and unlocks the mutex. The unlock operation unblocks the other goroutines waiting to lock the mutex. One of the waiting goroutines is selected; it locks the mutex and takes its turn in the critical section. The mutex implementation contains a memory barrier, so all the modifications performed by the first goroutine are visible to the second goroutine. An important point to note here is the use of the shared acct.Mutex: atomicity is only guaranteed if every access to acct.Balance happens while the shared mutex is held. The second goroutine sees the effect of the first goroutine’s call to Withdraw() as an instantaneous execution because the second goroutine is blocked while waiting for the first one.

You may have already noticed that all concurrent operations using a read-decide-update scheme are prone to race conditions unless the read-decide-update block is inside a critical section. For example, if you are working with a database, reading an object and then updating it with another database invocation is a race condition, because another process may change the object after you read it and before you write it back. If you are dealing with a file system, creating a file after checking that it does not exist is a race condition, because another process may create the file right after you check for its existence and before you create it, causing you to overwrite an existing file. If you are working with a variable in memory, another thread may modify the variable after one thread reads it, so the first thread’s write overwrites that modification. Most systems come with a way to define critical sections: databases have transaction management systems to ensure atomicity and mutual exclusion, and shared memory systems have mutexes.
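For the in-memory case, one alternative to a mutex-protected critical section is a compare-and-swap retry loop from the sync/atomic package. The following is only a sketch of that idea, with a hypothetical withdrawCAS function and the balance stored in an atomic.Int64 rather than in the Account struct used above.

// withdrawCAS retries the read-decide-update sequence until it can apply the
// update to a value that no other goroutine has changed in the meantime.
func withdrawCAS(balance *atomic.Int64, amt int64) bool {
    for {
        cur := balance.Load() // read
        if cur < amt {        // decide
            return false // insufficient funds
        }
        if balance.CompareAndSwap(cur, cur-amt) { // update, only if still unchanged
            return true
        }
        // Another goroutine modified the balance between our Load and
        // CompareAndSwap; loop and try again with the fresh value.
    }
}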

The use of mutexes can get complicated when multiple shared objects need to interact. Let’s say we have two accounts, and we want to write a function to transfer money from one to the other. The operation has to be atomic to ensure the total amount of money in the system is consistent. The following is our first attempt:

func Transfer(from, to *Account, amt int) error {
    from.Lock()
    defer from.Unlock()
    to.Lock()
    defer to.Unlock()
    if from.Balance < amt {
        return ErrInsufficient
    }
    from.Balance -= amt
    to.Balance += amt
    return nil
}

As we did previously, we call this function from two goroutines:

acct1 := &Account{Balance: 10}
acct2 := &Account{Balance: 15}
go func() { Transfer(acct1, acct2, 5) }()
go func() { Transfer(acct2, acct1, 10) }()

The following is a possible execution of this function:

Possible run 1

  1. Goroutine 1 (from acct1 to acct2): acct1.Lock()
  2. Goroutine 2 (from acct2 to acct1): acct2.Lock()
  3. Goroutine 1: acct2.Lock() // blocked
  4. Goroutine 2: acct1.Lock() // blocked

After locking acct1, goroutine 1 attempts to lock acct2, which was locked by goroutine 2 just before. Goroutine 1 blocks. Meanwhile, goroutine 2 attempts to lock acct1 and blocks as well, because acct1 is locked by goroutine 1. This is a deadlock. A deadlock is a situation in which every member of a group of processes is waiting for a resource held by another member of the same group. There are four necessary and sufficient conditions for a deadlock to happen (these are called the Coffman conditions):

  1. At least one member must hold a resource exclusively.
  2. At least one member must be waiting for a resource being held by another member while holding another one exclusively.
  3. Only the member holding a resource can release that resource.
  4. There is a set of members, P1, P2, …, Pn, such that P1 is waiting for a resource held by P2, P2 is waiting for a resource held by P3, …, and Pn is waiting for a resource held by P1.

It is usually not possible to change the resource requirements of the algorithm: such a transfer function will need to lock both accounts. So, one way to fix this deadlock problem is to remove the fourth condition: the cyclic wait. Here, goroutine 1 is waiting for a lock held by goroutine 2, while goroutine 2 is waiting for a lock held by goroutine 1. A simple way to resolve this is to impose a consistent ordering of locks whenever multiple objects must be locked. The Account object has a unique identifier, so we can order the locking of accounts by their identifiers:

func Transfer(from, to *Account, amt int) error {
    if from.ID < to.ID {
        from.Lock()
        defer from.Unlock()
        to.Lock()
        defer to.Unlock()
    } else {
        to.Lock()
        defer to.Unlock()
        from.Lock()
        defer from.Unlock()
    }
    ...
}

This is not always easy, especially if the algorithm involves conditional locking of objects. Enforcing a consistent order of locking for multiple resources usually prevents many deadlocks, but not all.

As a side note, the Go runtime uses the term “deadlock” more liberally. If the runtime detects that all goroutines are blocked, it calls this a deadlock. This is quite easy for the scheduler to detect: it knows all the goroutines and which of them are runnable, and if none are, it panics. For example, none of the four deadlock conditions apply to the following program, but the Go runtime still calls it a deadlock:

func main() {
     c := make(chan bool)
     <-c
}

The Go runtime does not detect deadlocks that occur in a proper subset of the goroutines. That’s the case if some goroutines are deadlocked in a cyclic wait while other goroutines are still running. The deadlocked goroutines will continue waiting until the program ends. This can be a critical problem for long-running programs such as servers.
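As an illustration, in the following sketch the two goroutines wait on each other’s channels forever, but because the main goroutine keeps running, the runtime never reports a deadlock.

package main

import (
    "fmt"
    "time"
)

func main() {
    a, b := make(chan int), make(chan int)

    go func() {
        <-a // waits for a value that only the other goroutine could send
        b <- 1
    }()
    go func() {
        <-b // waits for a value that only the other goroutine could send
        a <- 1
    }()

    // main keeps running, so the runtime never reports a deadlock,
    // even though the two goroutines above are blocked forever.
    for i := 0; i < 3; i++ {
        fmt.Println("main is still running")
        time.Sleep(100 * time.Millisecond)
    }
}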

Starvation is a situation in which a process is denied access to a resource for a long time or indefinitely. It is usually the result of a simplistic or incorrect scheduling algorithm, but it can also be caused by bugs in the program. For example, let’s consider two goroutines where the first one locks a mutex for a long time, and the second one locks the same mutex only briefly:

var mutex sync.Mutex

go func() {
    for {
        mutex.Lock()
        // Work for a long time
        mutex.Unlock()
        // Do something quick
    }
}()

go func() {
    for {
        mutex.Lock()
        // Do something quick
        mutex.Unlock()
        // Work for a long time
    }
}()

The first goroutine keeps the mutex locked for a long time while the second goroutine waits for it. When the first goroutine releases the mutex, both goroutines compete for it again. A scheduler that gives the goroutines an equal chance will select the first goroutine half of the time. Also, whenever the second goroutine releases the lock, it is almost certain that the first goroutine will already be waiting to acquire it, because the second goroutine has a long task to complete before it needs the lock again. So, a simple “fair” scheduler would unfairly “starve” the second goroutine. Recent versions of the Go runtime deal with this problem by favoring goroutines that have been waiting in the queue, giving both goroutines a fair chance.

Starvation is also the key idea behind denial-of-service attacks. The idea of a denial-of-service attack is to starve a computer with useless computations to the point where it cannot perform any useful ones. For example, a web service may accept and parse an XML document containing nested entity definitions, resulting in a document with billions of entries (this is called a Billion Laughs attack because a small XML document expands to a billion “lol” strings when processed). As a rule of thumb for services, never trust the data you receive from a client: always impose size limits and do not blindly expand the data you receive.

The Go runtime helps detect deadlocks where all goroutines are asleep, but it fails to detect situations where goroutines are starved. Such situations can be difficult to detect. Take a look at the following example:

func Producer() <-chan string {
    c := make(chan string)
    go func() {
        defer func() {
            if r := recover(); r != nil {
                log.Println("Recovered", r)
            }
        }()
        for i := 0; i < 10; i++ {
            c <- produceValue()
        }
        close(c)
    }()
    return c
}

func httpHandler(w http.ResponseWriter, req *http.Request) {
    c := Producer()
    for x := range c {
        w.Write([]byte(x))
    }
}

This is a web service handler that responds to a request by returning the strings produced by a producer function. When a request comes in, the handler creates a Producer and expects to receive some values from it through a channel. When the producer is done generating values, it closes the channel so that the for loop can terminate. Now, suppose the produceValue function panics. That will terminate the goroutine without closing the channel. Panics propagate up the stack of the goroutine they happen in. The HTTP handler, running in a separate goroutine, never receives that panic and is now blocked indefinitely. The client who issued the request may eventually time out, but the goroutine created for the HTTP handler will never terminate (the goroutine will leak). This will not be detected by the runtime, though it may be observed in the logs of this server. If this panic happens sufficiently often, the server will soon run out of resources. (This example has an easy fix: instead of closing the channel at the end of the goroutine, close it in a defer statement at the beginning. That will close the channel even in the case of a panic.)
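Here is a sketch of what that fix could look like, keeping the rest of the Producer unchanged (produceValue and the log import are assumed as before): close(c) is deferred at the top of the goroutine, so the channel is closed even when produceValue panics and the handler’s loop can terminate.

func Producer() <-chan string {
    c := make(chan string)
    go func() {
        defer close(c) // runs even if produceValue panics, so the consumer is never stuck
        defer func() {
            if r := recover(); r != nil {
                log.Println("Recovered", r)
            }
        }()
        for i := 0; i < 10; i++ {
            c <- produceValue()
        }
    }()
    return c
}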

A livelock is a situation that looks like a deadlock, but none of the processes are blocked waiting on another. It is a kind of starvation in the sense that even though processes work, they don’t do anything useful. A typical example of a livelock is two people walking in opposite directions in a narrow hallway. When they meet, both being equally polite, each wants to yield to the other, but they both make the same decisions at the same time, moving left and right in synchrony while failing to clear the way for the other. The runtime cannot offer much help for a livelock because as far as the runtime is concerned, the processes are running. In a livelock situation, processes keep attempting to acquire a resource and fail because the other process is also doing the same. It is easy to fall into a livelock using a try-lock:

go func() {
    for {
        if mutex1.TryLock() {
            if mutex2.TryLock() {
                // ...
                mutex2.Unlock()
            }
            mutex1.Unlock()
        }
    }
}()

go func() {
    for {
        if mutex2.TryLock() {
            if mutex1.TryLock() {
                // ...
                mutex1.Unlock()
            }
            mutex2.Unlock()
        }
    }
}()

A livelock is very difficult to diagnose because the situation may resolve itself after several iterations. There are several ways you can prevent a livelock: if you fail to acquire a resource and intend to retry it (such as by using TryLock), retry for a finite number of times, and possibly with random delays between retries. This can be generalized to work queues and pipelines: if you acquire a piece of work and fail to complete it, record the number of times that piece of work has been retried so that you don’t keep rescheduling the same work indefinitely.
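A sketch of that advice, as it might appear inside one of the goroutines above (mutex1 and mutex2 are the same hypothetical mutexes, and the math/rand and time packages are needed): retry a bounded number of times and sleep for a random interval between attempts so that the two goroutines fall out of lockstep.

const maxAttempts = 10

for attempt := 0; attempt < maxAttempts; attempt++ {
    if mutex1.TryLock() {
        if mutex2.TryLock() {
            // ... do the work that needs both locks ...
            mutex2.Unlock()
            mutex1.Unlock()
            break // success: stop retrying
        }
        mutex1.Unlock()
    }
    // Back off for a random interval before the next attempt.
    time.Sleep(time.Duration(rand.Intn(10)) * time.Millisecond)
}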

These concepts can be used to define a framework for specifying certain properties of concurrent programs. There are usually two kinds of properties of interest:

  • Safety properties specify that something bad never happens. The absence of a deadlock is an important safety property that states no execution of a program can deadlock. Mutual exclusion guarantees that no two processes sharing a resource are in their critical sections at the same time. Correctness properties guarantee that if a program starts at a valid state, it should end at a valid state.
  • Liveness properties specify that something good eventually happens. One of the most studied liveness properties is that the program should eventually terminate. This does not usually hold for long-running programs such as servers. For such programs, important liveness properties include that every request must eventually be answered, a message sent will eventually be received, and a process will eventually enter its critical section.

There have been many studies on providing safety and liveness properties. For this book, our focus will be on practical applications of such properties. For example, the liveness of applications can usually be tested by heartbeat services. This can detect if the application became completely unresponsive, but cannot detect if part of the application is busy spinning in a livelock. For some mission-critical software systems, formal proof of safety and liveness properties is necessary. For many others, runtime evaluation and detection of these properties is sufficient. With careful observation of the internal workings of a concurrent program, it is usually possible to test certain safety and liveness properties. A safety or liveness problem can sometimes be remedied by terminating that component and creating another instance of it. This approach, in a way, gives up on the idea that all such problems must be identified before deploying the system and accepts that the programs will fail. What is important is that when programs fail, they should be recognized quickly, and fixed by automatically replacing the component. This is one of the core tenets of cloud computing and the microservice architecture: individual components will fail but the overall system must self-heal by detecting and replacing failing components automatically.

Summary

The main theme in this chapter was that concurrency is not parallelism. Parallelism is an intuitive concept people are used to because the real world works in parallel. Concurrency is a mode of computation where blocks of code may or may not run in parallel. The key here is to make sure we get the correct result no matter how the program is run.

We also talked about the two main concurrent programming paradigms: message passing and shared memory. Go supports both, which makes it easy to program but equally easy to make mistakes. The last part of this chapter was about fundamental concepts of concurrent programming: race conditions, atomicity, deadlocks, starvation, and livelocks. The important point to note here is that these are not theoretical concepts; they are real situations that affect how programs run and how they fail.

We tried to avoid Go specifics in this chapter as much as possible. The next chapter will cover Go concurrency primitives.

Question

We looked at the dining philosophers problem with a single philosopher who walks while thinking. What problems can you foresee if there are two philosophers?

Further reading

The literature on concurrency is very rich. These are only some of the seminal works in the field of concurrency and distributed computing that are related to the topics we discussed in this chapter. Every serious software practitioner should at least have a basic understanding of these.

The following paper is easy to read and short. It defines mutual exclusion and critical sections: E. W. Dijkstra. 1965. Solution of a problem in concurrent programming control. Commun. ACM 8, 9 (Sept. 1965), 569. https://doi.org/10.1145/365559.365617.

This is the CSP book; it defines CSP as a formal language: Hoare, C. A. R. Communicating Sequential Processes. Prentice Hall International, 1985; electronic edition (2004) available at Usingcsp.com.

The following paper talks about the ordering of events in a distributed system: Leslie Lamport. Time, Clocks, and the Ordering of Events in a Distributed System. Communications of the ACM 21, 7 (July 1978), 558-565.
