Chapter 5: Routers

Up to this point in the book, we've looked at simple uses of actors. While you can certainly build complicated chains of actors cooperating via message passing, you haven't really seen the components that allow for building more interesting and complex actor topologies. This is where routers (and dispatchers, covered in the next chapter) enter the picture.

Routers are a mechanism Akka provides to handle the flow of messages to actors. They allow for grouping actors together and sending messages to different actors based upon a variety of different rules. Dispatchers, on the other hand, are more concerned with the actual management of the actors' execution. In fact, they are themselves instances of ExecutionContext, which we briefly looked at earlier in the context of futures.

It's easy to get confused about whether a router or a dispatcher is the right choice for a given problem. Hopefully, by the end of this chapter you will have begun to build an intuitive sense of when each of these provides an appropriate solution.

The basics of routers

The basic purpose of a router is to determine which actor, out of a group of possible actors, receives each message. You can think of a router as being similar to a load balancer in front of a typical web application. In fact, one of the most common use cases for routers in Akka is to balance load across a set of actors. This can be used quite effectively in situations that involve limited or costly resources, such as a pool of database connections or connections to other external dependencies.

It's important right from the start to understand that, while routers are internally implemented as actors, they possess certain properties and behaviors that are not always comparable to normal actors. For instance, they do not actually use the same mailbox and message handling scheme used by other actors. Akka specifically bypasses these to make routing efficient and simple. There is a cost, if you're actually implementing a router, but as we don't plan to cover that here, you can safely ignore it for now.

One of the consequences of routers being actors is that they are part of the actor hierarchy, so the path used to address them is based on the name given to the router, rather than the names of the actors used by the router. This also has implications for how responses are sent when an actor behind a router communicates with another actor. Normally, when a routed actor sends a message, the receiving actor's sender reference points at that routed actor, so any reply goes straight back to it rather than being routed back through the router. Depending upon the scenario, this might not be ideal. You can override this behavior and have any responses sent back through the router using a variation on the normal message sending syntax (this actually uses the tell method, to which ! is effectively aliased behind the scenes). This form tells the actor on the other end to use the current actor's parent, which is the router, as the sender:

 // within an actor that is behind a router
sender.tell(SomeResponseMessage(), context.parent)

Built-in router types

Akka provides a variety of routers to use in your application.

RoundRobinRouter

The RoundRobinRouter is one of the simplest routers you'll encounter, but it is nonetheless very useful. In very simple terms, it sends each message in turn to the next actor, based upon the ordered sequence of available routees. Don't assume any particular order here; just assume that the order will be consistent across calls.

This router works quite well for simple scenarios where you want to spread tasks across a set of actors, but where there is not likely to be a large variation of time spent performing those tasks and where mailbox backlogs are not a primary concern. The reason for this is straightforward. If the tasks have a significant variance in time incurred, the likelihood of ending up with one or more actors backing up rises as the rate of messages being sent increases. You might end up with a number of messages that were sent early in the sequence of events, but which end up sitting unattended to for long periods of time while other messages sent much later are handled after only a short delay.
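To make the backlog problem concrete, here is a small plain-Scala sketch (no Akka involved) that hands tasks to routees in strict rotation and adds up the work each one receives. The RoundRobinSketch object and the task costs are invented for illustration; this is not how Akka implements the router.

```scala
// Toy model of round-robin assignment. Names and numbers are hypothetical.
object RoundRobinSketch {
  // Each task is represented only by its cost. Task i goes to routee
  // i % routeeCount, which is what strict rotation amounts to.
  def totalWorkPerRoutee(taskCosts: Seq[Int], routeeCount: Int): Vector[Int] = {
    require(routeeCount > 0, "need at least one routee")
    taskCosts.zipWithIndex.foldLeft(Vector.fill(routeeCount)(0)) {
      case (totals, (cost, i)) =>
        val routee = i % routeeCount
        totals.updated(routee, totals(routee) + cost)
    }
  }
}
```

With costs of 10, 1, 1, 10, 1, 1 spread across three routees, the first routee ends up with 20 units of work while the other two get 2 each, even though every routee received the same number of messages.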

SmallestMailboxRouter

The SmallestMailboxRouter is potentially useful for helping with the situation described in the previous section, where you want to avoid having messages sit unhandled simply because they were sent to a busy actor while another actor is sitting without messages to process. The SmallestMailboxRouter looks at the available routees and selects whichever one has the smallest (possibly empty) mailbox. It's not a cure-all, though. Even if incoming messages are always sent to the smallest mailbox, there is no way to predict whether that mailbox holds messages that will actually take longer to process than a much larger mailbox full of messages awaiting a different actor. Also, this router does not have the ability to view the mailbox size for remote actors (remote actors are covered in chapter 8). Given this limitation, remote actors are given the lowest priority in the selection algorithm.
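The selection rule can be sketched in a few lines of plain Scala. This is a simplification under my own assumptions, not Akka's actual algorithm: the Routee case class is hypothetical, and a mailbox size of None stands in for a routee whose mailbox can't be inspected, such as a remote actor.

```scala
// Simplified illustration of "pick the smallest mailbox". Hypothetical types.
object SmallestMailboxSketch {
  // mailboxSize is None when the size cannot be observed (e.g. a remote routee).
  final case class Routee(name: String, mailboxSize: Option[Int])

  def select(routees: Seq[Routee]): Routee = {
    require(routees.nonEmpty, "need at least one routee")
    // Unknown sizes are treated as the largest possible value, so those
    // routees are chosen only when nothing with a known size is available.
    routees.minBy(_.mailboxSize.getOrElse(Int.MaxValue))
  }
}
```

Note that the sketch only compares message counts. As described above, a short mailbox full of slow messages can still be the worse choice.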

BroadcastRouter

BroadcastRouter is a handy utility for the case where you need to send the same messages to a set of actors. There are two common use cases where this functionality can be worthwhile. The most obvious scenario is where you have a set of tasks that need to be performed in parallel using the exact same sequence of messages in each case.

The other scenario where this router can be useful is when you have a set of existing actors and you need a simple mechanism for broadcasting the same message to each. In this case, you would use a routees parameter to specify the actors to be included when you create the router instance.

RandomRouter

This router simply sends messages randomly to its routees. There's not much more to it than that. If you're asking yourself why you would ever use this, you're not alone. It is perhaps not very clear when it makes sense to choose this router. The best advice is probably that there will be cases where no other router is an obvious choice and the order in which you send messages to your routees doesn't matter. In these cases, a RandomRouter can be a useful choice.

ScatterGatherFirstCompletedRouter

The ScatterGatherFirstCompletedRouter is a very special router that behaves quite differently from the other routers described here. Like the BroadcastRouter, it sends any received message to all of its routees, but it's intended to be used with futures and the ask pattern: the caller gets back a single response, which is whichever routee's reply arrives first.

This router essentially wraps together a common usage of futures, which is to handle precisely this scatter-gather pattern.
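The underlying futures idiom looks roughly like the following sketch, which uses only the Scala standard library. The two promises are stand-ins I've made up for routee replies; in a real system those futures would come from the ask pattern rather than being completed by hand.

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical stand-in for scatter-gather: several pending replies,
// and the caller takes whichever one completes first.
object ScatterGatherSketch {
  def firstReply(): String = {
    val slow = Promise[String]()
    val fast = Promise[String]()

    // Completes as soon as any one of the given futures completes.
    val first = Future.firstCompletedOf(Seq(slow.future, fast.future))

    fast.success("reply from fast")
    val result = Await.result(first, 2.seconds)

    // A later reply is simply ignored; the result has already been decided.
    slow.success("reply from slow")
    result
  }
}
```

Calling ScatterGatherSketch.firstReply() returns "reply from fast", since that is the only reply available when the combined future completes.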

ConsistentHashingRouter

A chapter alone could be dedicated to describing consistent hashing in depth, but in essence it's a means of mapping keys to slots such that adding new slots to the hash table results in minimal remapping of keys. This can be very useful, for instance, when you want to determine which set of servers to send a given request to. If the mappings were to change wholesale each time a new server was added, any such addition would carry a very high cost. With consistent hashing, this is minimized, and thus there's a largely predictable mapping of a given request to a target server or resource. This exact technique is used in a number of popular distributed data storage services.
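To give a feel for the remapping property, here is a minimal hash ring in plain Scala. It is a teaching sketch under my own assumptions (the Ring type, the 100 virtual positions per node, and the use of MurmurHash3 are all choices I made for illustration) and says nothing about how Akka's ConsistentHashingRouter is built.

```scala
import scala.collection.immutable.TreeMap
import scala.util.hashing.MurmurHash3

// Minimal consistent-hash ring. All names here are hypothetical.
object ConsistentHashSketch {
  def hash(s: String): Int = MurmurHash3.stringHash(s)

  // The ring maps positions (hash values) to the node that owns them.
  final case class Ring(positions: TreeMap[Int, String]) {
    // Each node is placed on the ring at several positions to even out the load.
    def add(node: String, replicas: Int = 100): Ring =
      Ring((0 until replicas).foldLeft(positions) { (acc, i) =>
        acc.updated(hash(s"$node#$i"), node)
      })

    // A key belongs to the first node at or after its hash,
    // wrapping around to the start of the ring if there is none.
    def nodeFor(key: String): String = {
      require(positions.nonEmpty, "ring has no nodes")
      val it = positions.iteratorFrom(hash(key))
      if (it.hasNext) it.next()._2 else positions.head._2
    }
  }

  val empty: Ring = Ring(TreeMap.empty[Int, String])

  // Keys whose owner changes when newNode joins the ring.
  def movedKeys(before: Ring, newNode: String, keys: Seq[String]): Seq[String] = {
    val after = before.add(newNode)
    keys.filter(k => before.nodeFor(k) != after.nodeFor(k))
  }
}
```

If you build a ring from a few nodes and then add one more, every key that changes owner moves to the new node. Keys never shuffle between the existing nodes, which is the property that keeps the cost of growing the pool low.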

The actual use of this router is complicated enough to be beyond the scope of this book, but it's a useful facility to be aware of when designing distributed systems. It can be of particular use for cases where you are caching or memoizing data within stateful actors and you need to reduce the need to refresh that cached data. 

Others

The routers described above are just the existing routers provided as part of Akka's library, but it's not difficult to create custom routers if need be. Reading the source code for the built-in routers will give you perhaps the best indication of what's involved and, of course, the Akka docs include good material on this subject.

Using routers

There are two mechanisms to specify how a router instance should be created: one is purely programmatic and the other uses the Akka configuration. Both have their uses, and it's important to understand the reasons to choose one over the other, which we'll cover as we look at each approach.

Configuration-based router creation

Creating routers via configuration is very simple and makes it easy to adjust the runtime routing strategy without needing to make code changes and push out a new build:

akka.actor.deployment {
  /configBasedRouter {
    router = round-robin
    nr-of-instances = 3
  }
}

Assuming you have an actor called PooledActor, you would then use this by adding the following code. Note that the Props object is still given the actor type you intend as your routee, but then you modify the Props by way of the withRouter method call:

import akka.actor._
import akka.routing.FromConfig

val router = context.actorOf(Props[PooledActor].withRouter(FromConfig()),
  name = "configBasedRouter")
router ! SomeMessage()

All routers available within Akka allow for resizable pools. You can set this via code, but since we're focusing on configuration-based router definition, let's look at some of the options available:

akka.actor.deployment {
  /resizableRouter {
    router = smallest-mailbox
    resizer = {
      lower-bound = 2
      upper-bound = 20
      messages-per-resize = 20
    }
  }
}

The configuration defined here specifies a starting size of 2 routees, in addition to ensuring that we never drop below 2 routees in the pool. Further, we'll never have more than 20 routees and the router will only try to resize, if necessary, after every 20 messages. This option is useful for ensuring that the router does not spend an excessive amount of time trying to resize the pool.

There are a handful of other configuration parameters available for the resizer and they can get a bit confusing on your first encounter with them, so here's a brief summary of each and how they are used:

  • pressure-threshold is used to define how the resizer determines how many actors are currently busy. If this value is set to 0, then it uses the number of actors currently processing a message. If the value is set to the default value of 1, it uses the number of actors which are processing messages and have one or more messages in their mailbox. If the number is greater than 1, it will only include actors which have at least that number of messages in their mailbox.
     
  • rampup-rate is used to determine how much to increase the routee pool size by when all current routees are considered to be busy (based on the pressure-threshold). This value is a ratio and defaults to 0.2, which means that when the resizer determines that it needs to create more routees, it will attempt to increase the pool size by 20%.
     
  • backoff-threshold is used for reducing the size of the pool. This value is also a ratio and defaults to 0.3, which means that fewer than 30% of the current routees must be busy before the pool is shrunk.
     
  • backoff-rate is essentially the inverse of rampup-rate, since it determines how much to decrease the pool size when that is called for. The default rate of 0.1 means that it will be decreased by 10% when needed.
     
  • stop-delay is used to provide a small delay before a PoisonPill message is sent to the routees that are being removed from the pool, in order to shut them down. The delay, defaulting to 1s, is provided to allow some time for in-flight messages to be placed into the routees' mailboxes before sending them the message to terminate.
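The way these parameters interact is easier to see as arithmetic. The following plain-Scala sketch models the rules as summarized in the list above; it is my own simplification rather than Akka's resizer code, and the rounding (up when growing, down when shrinking) is an assumption. RouteeState, Settings, busyCount, and delta are hypothetical names.

```scala
// Toy model of the resizer rules described above. Not Akka's implementation.
object ResizerSketch {
  final case class RouteeState(processing: Boolean, queued: Int)

  // The rate defaults mirror the values quoted in the list above.
  final case class Settings(
    lowerBound: Int,
    upperBound: Int,
    rampupRate: Double = 0.2,
    backoffThreshold: Double = 0.3,
    backoffRate: Double = 0.1)

  // How many routees count as busy for a given pressure-threshold.
  def busyCount(routees: Seq[RouteeState], pressureThreshold: Int): Int =
    routees.count { r =>
      if (pressureThreshold == 0) r.processing
      else if (pressureThreshold == 1) r.processing && r.queued >= 1
      else r.queued >= pressureThreshold
    }

  // Suggested change in pool size: positive grows, negative shrinks.
  def delta(busy: Int, poolSize: Int, s: Settings): Int = {
    require(poolSize > 0, "pool must not be empty")
    val proposed =
      if (busy >= poolSize) math.ceil(s.rampupRate * poolSize).toInt // everyone is busy
      else if (busy.toDouble / poolSize < s.backoffThreshold)
        -math.floor(s.backoffRate * poolSize).toInt                  // mostly idle
      else 0
    // Keep the resulting size within [lowerBound, upperBound].
    val target = math.max(s.lowerBound, math.min(s.upperBound, poolSize + proposed))
    target - poolSize
  }
}
```

With the default rates and bounds of 2 and 20, a pool of 10 routees that are all busy grows by 2, a pool of 10 with only 2 busy routees shrinks by 1, and a pool that is already at 20 stays put no matter how busy it is.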

You can easily spend a huge amount of time just trying to get the perfect configuration, but I'd advise against spending excessive effort on this when you're first building your system. It takes careful testing to evaluate these changes adequately, so working with the defaults is often a good place to start.

Programmatic router creation

Creating a router in code is also quite simple. If you choose to use this approach, you can always redefine the configuration using the methods specified previously, assuming you've given the router a name you can use to reference it in the configuration:

val randomRouter = context.actorOf(Props[MyActor].withRouter(
  RandomRouter(nrOfInstances = 100)),
  name = "randomlyRouted")

As an alternative to configuration-driven pool sizing (either using a fixed nr-of-instances or by defining a resizer), you can pass a collection of existing actors in to your router. This can be very useful when you need to perform more complex setup of each actor instance than the Props factory-based approach allows. A very simple example of this would be setting a specific name for each actor. It's notable that using this approach obviates the use of nr-of-instances or any sort of resizing:

val namedActors = Vector[ActorRef](
  context.actorOf(Props[MyActor], name = "i-am-number-one"),
  context.actorOf(Props[MyActor], name = "i-am-number-two")
)
val router = context.actorOf(Props().withRouter(
  SmallestMailboxRouter(routees = namedActors)),
  "smallestMailboxRouter")

Routers and supervision

Routers, like other parts of your actor hierarchy, should generally be considered in light of supervision and failure handling. By default, all routers escalate any errors thrown by their routees, which can lead to some unexpected behavior. For example, suppose the actor that creates the router has a policy of restarting all of its children on an exception. When one of your routees encounters an error, the router escalates it, and all of the children of the router's parent are restarted. This is not a good thing. Thankfully, you can override the strategy used by the router easily enough, as the following example demonstrates:

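import akka.actor.{OneForOneStrategy, Props, SupervisorStrategy}
import akka.routing.RoundRobinRouter

// Restart only the failing routee instead of escalating to the router's parent
val router = context.actorOf(Props[MyActor].withRouter(
  RoundRobinRouter(nrOfInstances = 20,
    supervisorStrategy = OneForOneStrategy() {
      case _: DomainException => SupervisorStrategy.Restart
    }
  )
))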

Continuing our example application

In the second chapter, I showed you a very simple example of using a router to both load balance across requests to a database and to provide fault handling. We then changed the fault handling mechanism in the last chapter to use a special intermediate actor to provide a supervisor for our routed actors. Let's expand this a bit further with what we've learned here to make it more adaptive to changing load-handling needs.

First, we make a fairly simple change in Bookmarker, removing a bit more code we placed there earlier:

import org.eclipse.jetty.server.Server
import org.eclipse.jetty.servlet.{ServletHolder, ServletContextHandler}
import java.util.UUID
import akka.actor.{Props, ActorSystem}
import akka.routing.FromConfig

object Bookmarker extends App {
  val system = ActorSystem("bookmarker")
  val database = Database.connect[Bookmark, UUID]("bookmarkDatabase")
  val bookmarkStore =
    system.actorOf(Props(new BookmarkStore(database)).withRouter(FromConfig()),
      name = "bookmarkStore")

  val bookmarkStoreGuardian =
    system.actorOf(Props(new BookmarkStoreGuardian(bookmarkStore)))
  val server = new Server(8080)
  val root = new ServletContextHandler(ServletContextHandler.SESSIONS)
  root.addServlet(new ServletHolder(new BookmarkServlet(system,
    bookmarkStoreGuardian)), "/")

  server.setHandler(root)
  server.start()
  server.join()
}

The primary change to note is that we now create the BookmarkStore router using the FromConfig call to tell Akka to load the router settings from the application.conf file. Here's the minimal configuration used here:

akka.actor.deployment {
  /bookmarkStore {
    router = round-robin
    nr-of-instances = 10
    resizer {
      lower-bound = 5
      upper-bound = 50
      pressure-threshold = 0
      rampup-rate = 0.1
      backoff-threshold = 0.4
    }
  }
}

I'm making some assumptions here that I should explain. Of course, this is a semi-imaginary example, given that I'm using a mock database interface. Even with a real database or other external data store, determining the settings to use here would take a bit of analysis. In this case, I'm assuming that at minimum, I want to have a collection of 5 BookmarkStore actors to interact with the database but ramping up to 50 in times of peak load. Further, I'm setting the pressure-threshold to 0 based upon the understanding that calls to an external system are expensive, so having an actor currently processing a message is enough to consider it busy. However, I don't want to ramp up too quickly, so I've set the rampup-rate to only increase the pool size by 10% at a time. I also want to shrink it back down quickly, tending towards a small pool, so I've set the backoff-threshold to drop the pool size when fewer than 40% of my routees are busy.

Wrap-up

This whirlwind tour of routers has hopefully given you an idea of the flexibility Akka gives you for creating robust configurations that can handle very different types of workflow, depending upon your needs. There is a huge range of choices available to you, so you might feel overwhelmed, but it's generally best to start with the minimum you need to get things working. Then, by watching performance and profiling under real workloads, you can get a better sense of where to apply these different tools and understand how they might impact your overall system.
