Identifying race conditions with race detection

If you've ever written an application that depends on the exact timing and sequencing of functions or methods to create a desired output, you're already quite familiar with race conditions.

These are particularly common anytime you deal with concurrency and far more so when parallelism is introduced. We've actually encountered a few of them in the first few chapters, specifically with our incrementing number function.

The most commonly used educational example of race conditions is that of a bank account. Assume that you start with $1,000 and attempt 200 $5 transactions. Each transaction requires a query on the current balance of the account. If the balance is sufficient, the transaction is approved and $5 is removed from the balance. If it isn't, the transaction is declined and the balance remains unchanged.

This is all well and good until the query happens at some point during a concurrent transaction (in most cases in another thread). If, for example, a thread asks "Do you have $5 in your account?" as another thread is in the process of removing $5 but has not yet completed, you can end up with an approved transaction that should have been declined.

Tracking down the cause of race conditions can be—to say the least—a gigantic headache. With Version 1.1 of Go, Google introduced a race detection tool that can help you locate potential issues.

Let's take a very basic example of a multithreaded application with a race condition and see how Go can help us debug it. In this example, we'll build a bank account that starts with $1,000 and runs 100 transactions, each for a random amount under $25.

Each transaction will be run in its own goroutine, as follows:

package main

import (
  "fmt"
  "math/rand"
  "runtime"
  "sync"
  "time"
)

var balance int
var transactionNo int

func main() {
  rand.Seed(time.Now().Unix())
  runtime.GOMAXPROCS(2)
  var wg sync.WaitGroup

  tranChan := make(chan bool)

  balance = 1000
  transactionNo = 0
  fmt.Println("Starting balance: $", balance)

  wg.Add(1)
  for i := 0; i < 100; i++ {
    go func(ii int, trChan chan bool) {
      transactionAmount := rand.Intn(25)
      transaction(transactionAmount)
      if ii == 99 {
        trChan <- true
      }
    }(i, tranChan)
  }

  go transaction(0)
  select {
  case <-tranChan:
    fmt.Println("Transactions finished")
    wg.Done()
  }

  wg.Wait()
  close(tranChan)
  fmt.Println("Final balance: $", balance)
}

func transaction(amt int) bool {
  // balance and transactionNo are read and written here by many goroutines
  // at once with no synchronization; this is the race condition we want the
  // detector to find.
  approved := false
  if (balance - amt) < 0 {
    approved = false
  } else {
    approved = true
    balance = balance - amt
  }

  approvedText := "declined"
  if approved {
    approvedText = "approved"
  }
  transactionNo = transactionNo + 1
  fmt.Println(transactionNo, "Transaction for $", amt, approvedText)
  fmt.Println("\tRemaining balance $", balance)
  return approved
}

Depending on your environment (and whether you enable multiple processors), the previous goroutines might run successfully and leave a final balance of $0 or more. On the other hand, you might end up with transactions that exceed the balance at the time they are approved, resulting in a negative balance.

So how do we know for sure?

For most applications and languages, this process often involves a lot of running, rerunning, and logging. It's not unusual for race conditions to present a daunting and laborious debugging process. Google knows this and has given us a race condition detection tool. To test this, simply use the -race flag when testing, building, or running your application, as shown:

go run -race race-test.go

When run on the previous code, Go will execute the application and then report any possible race conditions, as follows:

>> Final balance: $0
>> Found 2 data race(s)

Here, Go is telling us there are two potential race conditions with data. It isn't telling us that these will surely create data consistency issues, but if you run into such problems, this may give you some clue as to why.

If you look at the top of the output, you'll get more detailed notes on what's causing a race condition. In this example, the details are as follows:

==================
WARNING: DATA RACE
Write by goroutine 5:
  main.transaction()
      /var/go/race.go:75 +0xbd
  main.func·001()
      /var/go/race.go:31 +0x44

Previous write by goroutine 4:
  main.transaction()
      /var/go/race.go:75 +0xbd
  main.func·001()
      /var/go/race.go:31 +0x44

Goroutine 5 (running) created at:
  main.main()
      /var/go/race.go:36 +0x21c

Goroutine 4 (finished) created at:
  main.main()
      /var/go/race.go:36 +0x21c

We get a detailed, full trace of where our potential race conditions exist. Pretty helpful, huh?

The race detector is guaranteed not to produce false positives, so you can take its results as strong evidence that there is a potential problem in your code. Potential is stressed here because a race condition can go undetected under normal conditions for a very long time; an application may work as expected for days, months, or even years before one surfaces.

Tip

We've mentioned logging, and if you aren't intimately familiar with Go's core language, your mind might go in a number of directions: stdout, file logs, and so on. So far, we've stuck to stdout, but you can use the standard library to handle this logging. Go's log package allows you to write to stdout or to any io.Writer, as shown:

  messageOutput := os.Stdout
  logOut := log.New(messageOutput, "Message: ", log.Ldate|log.Ltime|log.Llongfile)
  logOut.Println("This is a message from the application!")

This will produce the following output:

Message: 2014/01/21 20:59:11 /var/go/log.go:12: This is a message from the application!

So, what's the advantage of the log package versus rolling your own? In addition to being standardized, this package also synchronizes its output, so messages written by concurrent goroutines won't be interleaved mid-line.
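Because log.New accepts any io.Writer, the same pattern covers file logging too. The following is a minimal sketch (the /tmp/app.log path is just a placeholder):

package main

import (
  "log"
  "os"
)

func main() {
  // Placeholder log file path; any io.Writer works here.
  logFile, err := os.Create("/tmp/app.log")
  if err != nil {
    log.Fatal(err)
  }
  defer logFile.Close()

  fileLog := log.New(logFile, "Message: ", log.Ldate|log.Ltime|log.Llongfile)
  fileLog.Println("This message goes to the file instead of stdout")
}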

So what now? Well, there are a few options. You can use your channels, buffered or unbuffered, to ensure data integrity, or you can use the sync.Mutex struct to lock your data.

Using mutual exclusions

Mutual exclusion is typically considered a low-level, well-known approach to synchronization; in Go, you should usually be able to address data consistency through communication between your channels. However, there will be instances where you need to truly block reads and writes on a value while you work with it.

At the CPU level, a mutex boils down to atomic operations on an integer flag, such as compare-and-swap, used to acquire and release a lock. We'll deal with something at a much higher level, of course.
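To make that low-level idea a little more concrete, here is a deliberately naive spinlock sketch built on sync/atomic's compare-and-swap. It's for illustration only and not a replacement for sync.Mutex:

package main

import (
  "fmt"
  "runtime"
  "sync"
  "sync/atomic"
)

// spinLock is a toy lock: 0 means unlocked, 1 means locked.
type spinLock struct {
  state int32
}

func (s *spinLock) Lock() {
  // Keep attempting an atomic 0 -> 1 swap until we win it.
  for !atomic.CompareAndSwapInt32(&s.state, 0, 1) {
    runtime.Gosched() // yield so other goroutines can make progress
  }
}

func (s *spinLock) Unlock() {
  atomic.StoreInt32(&s.state, 0)
}

func main() {
  var lock spinLock
  var wg sync.WaitGroup
  counter := 0

  for i := 0; i < 100; i++ {
    wg.Add(1)
    go func() {
      defer wg.Done()
      lock.Lock()
      counter++ // protected by the toy lock
      lock.Unlock()
    }()
  }
  wg.Wait()
  fmt.Println("Counter:", counter)
}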

We're already familiar with the sync package from our use of the WaitGroup struct, but the package also contains the condition variable type Cond, Once (which performs an action exactly once), and the mutual exclusion locks Mutex and RWMutex. As the name RWMutex implies, it allows multiple readers to hold the lock at the same time while writers take exclusive access; there is more on this later in this chapter and in Chapter 5, Locks, Blocks, and Better Channels.
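As a quick taste of the package, sync.Once guarantees that a piece of initialization runs exactly once, no matter how many goroutines ask for it. The following is a minimal sketch:

package main

import (
  "fmt"
  "sync"
)

func main() {
  var once sync.Once
  var wg sync.WaitGroup

  for i := 0; i < 5; i++ {
    wg.Add(1)
    go func() {
      defer wg.Done()
      // The function passed to Do runs only on the first call;
      // the other four goroutines skip it.
      once.Do(func() {
        fmt.Println("initializing exactly once")
      })
    }()
  }
  wg.Wait()
}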

All of these, as the package name implies, empower you to prevent race conditions on data that may be accessed by any number of goroutines and/or threads. Using the package does not automatically make your data and structures atomic, but it does give you the tools to manage atomicity effectively. Let's look at a few ways we can solidify our account balance in a concurrent, thread-safe application.

As mentioned previously, we can coordinate data changes at the channel level, whether that channel is buffered or unbuffered. Let's offload the logic and data manipulation to the channel and see what the -race flag presents.

If we modify our main loop, as shown in the following code, to utilize messages received by the channel to manage the balance value, we will avoid race conditions:

package main

import (
  "fmt"
  "math/rand"
  "runtime"
  "sync"
  "time"
)

var balance int
var transactionNo int

func main() {
  rand.Seed(time.Now().Unix())
  runtime.GOMAXPROCS(2)
  var wg sync.WaitGroup
  balanceChan := make(chan int)
  tranChan := make(chan bool)

  balance = 1000
  transactionNo = 0
  fmt.Println("Starting balance: $", balance)

  wg.Add(1)
  for i := 0; i < 100; i++ {
    go func(ii int) {
      transactionAmount := rand.Intn(25)
      // Send the amount along the channel instead of touching balance directly.
      balanceChan <- transactionAmount

      if ii == 99 {
        fmt.Println("Should be quittin time")
        tranChan <- true
        close(balanceChan)
        wg.Done()
      }
    }(i)
  }

  go transaction(0)

  breakPoint := false
  for {
    if breakPoint {
      break
    }
    select {
    case amt := <-balanceChan:
      fmt.Println("Transaction for $", amt)
      if (balance - amt) < 0 {
        fmt.Println("Transaction failed!")
      } else {
        balance = balance - amt
        fmt.Println("Transaction succeeded")
      }
      fmt.Println("Balance now $", balance)

    case status := <-tranChan:
      if status {
        fmt.Println("Done")
        breakPoint = true
        close(tranChan)
      }
    }
  }

  wg.Wait()

  fmt.Println("Final balance: $", balance)
}

func transaction(amt int) bool {
  approved := false
  if (balance - amt) < 0 {
    approved = false
  } else {
    approved = true
    balance = balance - amt
  }

  approvedText := "declined"
  if approved {
    approvedText = "approved"
  }
  transactionNo = transactionNo + 1
  fmt.Println(transactionNo, "Transaction for $", amt, approvedText)
  fmt.Println("\tRemaining balance $", balance)
  return approved
}

This time, we let the channel manage the data entirely. Let's look at what we're doing:

transactionAmount := rand.Intn(25)
balanceChan <- transactionAmount

This still generates a random integer between 0 and 24, but instead of passing it to a function, we pass the data along the channel. Channels allow you to control the ownership of data neatly. We then see the select/listener, which largely mirrors the transaction() function we defined earlier in this chapter:

case amt := <-balanceChan:
  fmt.Println("Transaction for $", amt)
  if (balance - amt) < 0 {
    fmt.Println("Transaction failed!")
  } else {
    balance = balance - amt
    fmt.Println("Transaction succeeded")
  }
  fmt.Println("Balance now $", balance)

To test whether we've averted a race condition, we can run go run with the -race flag again and see no warnings.

Channels can be seen as the sanctioned, go-to way of handling synchronized data.

Using sync.Mutex

As mentioned, having a built-in race detector is a luxury not afforded to developers in most languages, and having it allows us to test methodologies and get real-time feedback on each.

We noted that using an explicit mutex is discouraged in favor of channels and goroutines. This isn't always exactly true because there is a right time and place for everything, and mutexes are no exception. It's worth noting that Go's channels are themselves implemented internally using mutexes. As previously mentioned, you can use explicit channels to handle reads and writes and juggle the data between them.

However, this doesn't mean there is no use for explicit locks. An application that performs many reads and very few writes might benefit from an explicit lock around writes; this doesn't mean the reads will be dirty reads, but it can allow faster and/or more concurrent execution.
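For that read-heavy case, sync.RWMutex is the usual tool: any number of readers can hold the lock at once, while a writer takes exclusive access. A minimal sketch along the lines of our balance example (the balanceLock name is just illustrative) might look like this:

package main

import (
  "fmt"
  "sync"
)

var balance = 1000
var balanceLock sync.RWMutex

// readBalance can run concurrently with any number of other readers.
func readBalance() int {
  balanceLock.RLock()
  defer balanceLock.RUnlock()
  return balance
}

// writeBalance takes the exclusive write lock, briefly blocking readers.
func writeBalance(amt int) {
  balanceLock.Lock()
  defer balanceLock.Unlock()
  balance = balance - amt
}

func main() {
  var wg sync.WaitGroup
  for i := 0; i < 10; i++ {
    wg.Add(1)
    go func() {
      defer wg.Done()
      _ = readBalance() // many concurrent readers, no exclusive lock needed
    }()
  }
  writeBalance(5) // the single writer excludes readers only while it runs
  wg.Wait()
  fmt.Println("Balance now $", readBalance())
}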

For the sake of demonstration, let's remove our race condition using an explicit lock. Our -race flag tells us where it encounters read/write race conditions, as shown:

Read by goroutine 5: main.transaction()   /var/go/race.go:62 +0x46

The previous line is just one among several others we'll get from the race detection report. If we look at line 62 in our code, we'll find a reference to balance. We'll also find a reference to transactionNo, our second race condition. The easiest way to address both is to place a mutual exclusion lock around the contents of the transaction function as this is the function that modifies the balance and transactionNo variables. The transaction function is as follows:

func transaction(amt int) bool {
  // Hold the lock for the whole transaction so the balance check, the
  // withdrawal, and the transaction count update happen as one unit.
  mutex.Lock()
  defer mutex.Unlock()

  approved := false
  if (balance - amt) < 0 {
    approved = false
  } else {
    approved = true
    balance = balance - amt
  }

  approvedText := "declined"
  if approved {
    approvedText = "approved"
  }
  transactionNo = transactionNo + 1
  fmt.Println(transactionNo, "Transaction for $", amt, approvedText)
  fmt.Println("\tRemaining balance $", balance)

  return approved
}

We also need to define mutex as a global variable at the top of our application, as shown:

var mutex sync.Mutex

If we run our application now with the -race flag, we get no warnings.

For practical purposes, the mutex complements the WaitGroup struct: the WaitGroup acts as a conditional synchronization mechanism that waits for goroutines to finish, while the mutex guards the data they share. Channels combine both ideas; data that moves along a channel is contained and isolated between goroutines, and a channel effectively works as a first-in, first-out conduit, with the safety of that data provided by lower-level locking inside the runtime.

Another worthwhile thing to note is the versatility of channels: a single channel can be shared among any number of goroutines for receiving and/or sending data, and because channels are first-class values, we can pass them to (and return them from) functions.
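Because channels are first-class values, we can also restrict their direction in a function signature, documenting at compile time whether a function only sends or only receives. Here is a small sketch:

package main

import "fmt"

// producer may only send on results.
func producer(results chan<- int) {
  for i := 0; i < 3; i++ {
    results <- i * 10
  }
  close(results)
}

// consumer may only receive from results.
func consumer(results <-chan int) {
  for r := range results {
    fmt.Println("received", r)
  }
}

func main() {
  results := make(chan int)
  go producer(results)
  consumer(results)
}

The compiler will reject an attempt to receive inside producer or send inside consumer, which makes the ownership of the data explicit.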

Exploring timeouts

Another noteworthy thing we can do with channels is stop waiting on them after a specified amount of time; in other words, time the operation out. This is a bit more involved if you decide to handle mutual exclusion manually.

The ability to abandon a long-running routine through a channel is extremely helpful; consider a network-dependent operation that should be restricted to a short time period. In other words, you want to offer the process a few seconds to complete, but if it runs longer than that, our application should know that something has gone wrong and stop attempting to listen or send on that channel. The following code demonstrates using a timeout channel in a select call:

package main

import (
  "fmt"
  "time"
)

func main() {
  ourCh := make(chan string, 1)

  go func() {
    // This goroutine intentionally does nothing; nothing is ever sent
    // on ourCh, so the timeout case below is the one that fires.
  }()

  select {
  case <-time.After(10 * time.Second):
    fmt.Println("Enough's enough")
    close(ourCh)
  }
}

If we run the previous simple application, we'll see that our goroutine is allowed to do nothing for 10 seconds, after which the timeout safeguard kicks in and bails us out.

You can see this as being particularly useful in network applications; even in the days of blocking and thread-dependent servers, timeouts like these were implemented to prevent a single misbehaving request or process from gumming up the entire server. This is the very basis of a classic web server problem that we'll revisit in more detail later.
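For instance, the standard library's net/http package exposes this idea directly as http.TimeoutHandler, which answers with a 503 if a handler runs too long. The following is a minimal sketch (the two-second limit, the slow handler, and the :8080 port are placeholders):

package main

import (
  "fmt"
  "log"
  "net/http"
  "time"
)

// slowHandler simulates a misbehaving request that takes too long.
func slowHandler(w http.ResponseWriter, r *http.Request) {
  time.Sleep(5 * time.Second)
  fmt.Fprintln(w, "finally done")
}

func main() {
  // Requests taking longer than two seconds get a 503 response with the
  // supplied message instead of tying up the connection indefinitely.
  handler := http.TimeoutHandler(http.HandlerFunc(slowHandler), 2*time.Second, "request timed out")
  http.Handle("/", handler)
  log.Fatal(http.ListenAndServe(":8080", nil))
}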

Importance of consistency

In our example, we'll build an events scheduler. If we are available for a meeting and we get two concurrent requests for a meeting invite, a race condition could leave us double-booked. Alternatively, data locked across two goroutines may cause both requests to be denied, or may even result in an actual deadlock.

We want to guarantee that any request for availability is consistent: there should be no double-booking, and no request for an event should be blocked incorrectly (because two concurrent or parallel routines lock the data simultaneously).
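The heart of that guarantee is making the availability check and the booking itself a single, indivisible step. As a rough sketch of the idea (the Scheduler type and its fields here are placeholders, not the scheduler we'll actually build):

package main

import (
  "errors"
  "fmt"
  "sync"
)

// Scheduler is a placeholder type: a set of booked time slots guarded by a mutex.
type Scheduler struct {
  mu     sync.Mutex
  booked map[string]bool
}

// Book checks availability and records the booking under one lock, so two
// concurrent requests cannot both see the same slot as free.
func (s *Scheduler) Book(slot string) error {
  s.mu.Lock()
  defer s.mu.Unlock()
  if s.booked[slot] {
    return errors.New("slot already booked")
  }
  s.booked[slot] = true
  return nil
}

func main() {
  s := &Scheduler{booked: make(map[string]bool)}
  var wg sync.WaitGroup
  for i := 0; i < 2; i++ {
    wg.Add(1)
    go func(id int) {
      defer wg.Done()
      if err := s.Book("Monday 10:00"); err != nil {
        fmt.Println("request", id, "declined:", err)
      } else {
        fmt.Println("request", id, "approved")
      }
    }(i)
  }
  wg.Wait()
}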
