Chapter 9. Concurrency with Akka

Much of this book focusses on taking advantage of multicore and distributed architectures. In Chapter 4, Parallel Collections and Futures, you learned how to use parallel collections to distribute batch processing problems over several threads and how to perform asynchronous computations using futures. In Chapter 7, Web APIs, we applied this knowledge to query the GitHub API with several concurrent threads.

Concurrency abstractions such as futures and parallel collections simplify the enormous complexity of concurrent programming by limiting what you can do. Parallel collections, for instance, force you to phrase your parallelization problem as a sequence of pure functions on collections.

Actors offer a different way of thinking about concurrency. Actors are very good at encapsulating state. Managing state shared between different threads of execution is probably the most challenging part of developing concurrent applications, and, as we will discover in this chapter, actors make it manageable.

GitHub follower graph

In the previous two chapters, we explored the GitHub API, learning how to query the API and parse the results using json-4s.

Let's imagine that we want to extract the GitHub follower graph: we want a program that will start from a particular user, extract this user followers, and then extract their followers until we tell it to stop. The catch is that we don't know ahead of time what URLs we need to fetch: when we download the login names of a particular user's followers, we need to verify whether we have fetched these users previously. If not, we add them to a queue of users whose followers we need to fetch. Algorithm aficionados might recognize this as breadth-first search.

Let's outline how we might write this in a single-threaded way. The central components are a set of visited users and queue of future users to visit:

val seedUser = "odersky" // the origin of the network

// Users whose URLs need to be fetched 
val queue = mutable.Queue(seedUser) 

// set of users that we have already fetched 
// (to avoid re-fetching them)
val fetchedUsers = mutable.Set.empty[String] 

while (queue.nonEmpty) {
  val user = queue.dequeue
  if (!fetchedUsers(user)) {
    val followers = fetchFollowersForUser(user)
    followers foreach { follower =>           
      // add the follower to queue of people whose 
      // followers we want to find.
      queue += follower
    }
    fetchedUsers += user
  }
}

Here, the fetchFollowersForUser method has signature String => Iterable[String] and is responsible for taking a login name, transforming it into a URL in the GitHub API, querying the API, and extracting a list of followers from the response. We will not implement it here, but you can find a complete example in the chap09/single_threaded directory of the code examples for this book (https://github.com/pbugnion/s4ds). You should have all the tools to implement this yourself if you have read Chapter 7, Web APIs.

While this works, it will be painfully slow. The bottleneck is clearly the fetchFollowersForUser method, in particular, the part that queries the GitHub API. This program does not lend itself to the concurrency constructs that we have seen earlier in the book because we need to protect the state of the program, embodied by the user queue and set of fetched users, from race conditions. Note that it is not just a matter of making the queue and set thread-safe. We must also keep the two synchronized.

Actors offer an elegant abstraction to encapsulate state. They are lightweight objects that each perform a single task (possibly repeatedly) and communicate with each other by passing messages. The internal state of an actor can only be changed from within the actor itself. Importantly, actors only process messages one at a time, effectively preventing race conditions.

By hiding program state inside actors, we can reason about the program more effectively: if a bug is introduced that makes this state inconsistent, the culprit will be localized entirely in that actor.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.217.206.112