Actor supervision

Now, let's explain another peculiarity of the actor path that you are probably wondering about: the leading /user part in the actor paths we've seen before. This part is Akka's answer to the question we posed at the beginning of this chapter: how is the very first actor created?

In Akka, the very first actor is created by the library itself. It represents a root actor and is called the root guardian accordingly (we'll explain the guardian part in a moment).

In fact, Akka creates three guardian actors for each actor system, as shown in the following diagram.

The / root guardian is the parent of the two other guardians and thus an ancestor of every other actor in the system.

The /user guardian is the root actor for all user-created actors in the system. Thus, every actor created by any user of the Akka library has at least two ancestors in the hierarchy (the user guardian and the root guardian) and therefore has /user/ as a prefix in its path.

The /system guardian is the root actor for internal actors that are created by the system itself.

Let's extend our actor diagram with the guardian actors we just learned about:

We instantiate all of our actors except mixers by using system.actorOf. Because of this, they are created as children of the user guardian. The root guardian is at the top of the hierarchy and has the user guardian and the system guardian as its children.
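To see the /user prefix appear, here is a minimal sketch that creates one top-level actor and reads back its path. It assumes the classic akka-actor library is on the classpath; PathDemoActor and PathDemo are hypothetical names used only for this illustration, and no remoting is configured, so the path uses the plain akka:// scheme rather than akka.tcp://.

```scala
import akka.actor.{Actor, ActorSystem, Props}

// Hypothetical stand-in for the chapter's Chef; it handles no messages.
class PathDemoActor extends Actor {
  override def receive: Receive = Actor.emptyBehavior
}

object PathDemo {
  // Creates a top-level actor named "Chef" and returns its full path.
  def topLevelPath(): String = {
    val system = ActorSystem("Bakery")
    try system.actorOf(Props(new PathDemoActor), "Chef").path.toString
    finally system.terminate() // shut the system down once the path is read
  }
}
```

Calling PathDemo.topLevelPath() returns akka://Bakery/user/Chef: the actor was requested from the system, so Akka placed it directly under the user guardian.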

The guardians are a part of another important feature of Akka—supervision. To understand what supervision is and why it is important, let's finally run our application.

The following is the sanitized output in the console:

...
[INFO] Remoting now listens on addresses: [akka.tcp://Bakery@<host>:2552]
[INFO] [akka.tcp://Bakery@<host>:2552/user/Chef] Sent jobs to 24 mixers
[INFO] [akka.tcp://Bakery@<host>:2552/user/Chef] Ready to accept new mixing jobs
[INFO] [akka.tcp://Bakery@<host>:2552/user/Manager] Cookies are ready: Cookies(12)
[ERROR] [akka.actor.LocalActorRefProvider(akka://Bakery)] guardian failed, shutting down system
java.lang.AssertionError: assertion failed
 at scala.Predef$.assert(Predef.scala:204)
 ...
...
[akka.tcp://Bakery@<host>:2552/system/remoting-terminator] Remoting shut down.

What happened? The actor system started and communication between the actors began, but then an AssertionError was thrown and the whole system terminated!

The reason for this exception is a trivial programming error in the Baker actor we described previously:

override def receive: Receive = {
  ...
  case c: Cookies =>
    context.actorSelection("../Manager") ! c
    assert(timer.isEmpty)
    if (queue > 0) timer = sendToOven() else timer = None
}

The assertion that the timer is empty is wrong, and so it throws an AssertionError at runtime. The error is not caught and leads to termination of the program. Obviously, in this case, the rules of the actor model (as described at the beginning of this chapter) are not respected. One actor affects all other actors and the system as a whole without sending any messages.

In fact, this is not a deficiency of Akka. The reason our application behaves as it does is that we ignored a very important aspect of actor-based systems: supervision.

Supervision in Akka means that any actor creating a child is responsible for its management in case problems occur.

An actor detecting erroneous conditions is expected to suspend all of its descendants and itself and report a failure to its parent. This failure reporting takes the form of throwing an exception.

By convention, expected erroneous conditions, for example, the absence of a record in the database, are modelled at the protocol level via messages, whereas errors of a technical nature, such as an unavailable database connection, are modelled with exceptions. To better differentiate between erroneous conditions, developers are encouraged to define a rich set of exception classes, similar to that of normal message classes.
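The following sketch illustrates this convention in plain Scala, without any Akka dependency. All of the names here (LookupResult, RecordFound, RecordNotFound, DatabaseUnavailableException, RecordStore) are hypothetical and do not come from our bakery example; the database is modelled as an optional Map purely for illustration.

```scala
// Expected outcomes are part of the protocol: ordinary messages.
sealed trait LookupResult
final case class RecordFound(id: Int, payload: String) extends LookupResult
final case class RecordNotFound(id: Int) extends LookupResult

// A technical failure is an exception, left for the supervisor to handle.
final class DatabaseUnavailableException(msg: String) extends Exception(msg)

object RecordStore {
  // `db` is None when the (hypothetical) connection is down.
  def lookup(db: Option[Map[Int, String]], id: Int): LookupResult = db match {
    case None =>
      throw new DatabaseUnavailableException("no connection")
    case Some(records) =>
      records.get(id).map(RecordFound(id, _)).getOrElse(RecordNotFound(id))
  }
}
```

An actor wrapping such a lookup would reply with RecordNotFound as a normal message, while a DatabaseUnavailableException would escape from its receive method and reach its supervisor.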

Exceptions thrown by the child actor are delivered to the parent, who then needs to handle the situation in one of four possible ways:

Resume the child and let it continue processing the messages in its mailbox, starting from the next one. The message that caused the actor to fail is lost.
Restart the child. This will clean up its internal state and recursively stop all of its descendants.
Stop the child completely. This is also recursively propagated to the descendants.
Propagate the failure to its own parent. By doing this, the supervisor is failing itself with the same cause as the subordinate.

Before delving into the technical aspects of defining a supervision strategy, let's revisit our actor structure. Currently, all of our actors (with the exception of the dynamic mixers) are created as direct children of the user guardian. This forces us to define the supervision strategy in one place for the whole actor hierarchy. This is a clear violation of the principle of separation of concerns, and is known in Akka as the Flat Actor Hierarchy anti-pattern. What we should aim for instead is a structure where failure handling happens close to where the error occurred, in the actor that is most capable of handling such errors.

With this goal in mind, let's restructure our application so that the Baker actor is responsible for the supervision of the Oven and the Manager is responsible for all of the actors in the system. This structure is represented in the following diagram:

Now, we have a sane hierarchical structure where each supervisor has the best knowledge about possible failures of its children and how to deal with them.

On a technical level, the supervision strategy is defined by overriding the supervisorStrategy member of the corresponding actor. To demonstrate how this is done, let's extend our Mixer actor with the capability to report different hardware failures. First, we must define a rich set of exceptions in the companion object:

class MotorOverheatException extends Exception
class SlowRotationSpeedException extends Exception
class StrongVibrationException extends Exception

And then we need to throw them randomly during message processing:

class Mixer extends Actor with ActorLogging {
  override def receive: Receive = {
    case Groceries(eggs, flour, sugar, chocolate) =>
      val rnd = Random.nextInt(10)
      if (rnd == 0) {
        log.info("Motor Overheat")
        throw new MotorOverheatException
      }
      if (rnd < 3) {
        log.info("Slow Speed")
        throw new SlowRotationSpeedException
      }
      ...
  }
}

Now, we'll override a supervision strategy in the Chef actor:

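// Requires: import scala.concurrent.duration._
//           import akka.actor.OneForOneStrategy
//           import akka.actor.SupervisorStrategy._
override val supervisorStrategy: OneForOneStrategy =
  OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1.minute) {
    case _: MotorOverheatException ⇒
      self ! Dough(0)      // stands in for the reply the stopped Mixer will never send
      Stop
    case _: SlowRotationSpeedException ⇒
      sender() ! message   // sender() is the failed child; `message` is defined outside this snippet
      Restart
    case _: StrongVibrationException ⇒
      sender() ! message
      Resume
    case _: Exception ⇒ Escalate
  }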

The OneForOneStrategy instructs the Chef to deal with any children's failures on an individual basis.

For MotorOverheatException, we decide to stop the failing Mixer. The Chef sends an empty Dough message to itself which is counted as the response from the broken child.

The SlowRotationSpeedException means that something went wrong during the placement of groceries into the Mixer. The original message was lost by the Mixer at the moment it threw an Exception, so we're resending this message and restarting the Mixer.

We can tolerate StrongVibrationException, so we just compensate for the lost message by resending it and resuming the child.

In the case of any other exception, the Chef has no knowledge of how to handle it and just propagates the failure to the Manager. The Manager does not define a supervisorStrategy of its own, so it handles the escalated failure with the default strategy. Only failures that the default strategy escalates in turn are ultimately propagated to the user guardian.

The user guardian handles exceptions as specified by the default strategy. The default strategy is the same for all actors in the userspace if not overridden, and is defined as follows:

ActorInitializationException: Stops the failing child actor
ActorKilledException: Stops the failing child actor
DeathPactException: Stops the failing child actor
Exception: Restarts the failing child actor
Throwable: Escalates to the parent actor
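These rules can be checked directly against Akka's SupervisorStrategy.defaultDecider, which is a partial function from Throwable to a directive. The sketch below assumes the classic akka-actor library is on the classpath; DefaultDeciderDemo is a hypothetical helper, and falling back to Escalate for unmatched causes is our reading of how a supervisor applies a decider.

```scala
import akka.actor.SupervisorStrategy
import akka.actor.SupervisorStrategy.{Directive, Escalate}

object DefaultDeciderDemo {
  // Apply the default decider; causes it does not match are escalated.
  def decide(cause: Throwable): Directive =
    SupervisorStrategy.defaultDecider.applyOrElse(cause, (_: Throwable) => Escalate)
}
```

With this helper, DefaultDeciderDemo.decide(new IllegalStateException("boom")) yields Restart, whereas DefaultDeciderDemo.decide(new AssertionError("boom")) yields Escalate, because an AssertionError is not an Exception.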

The root guardian is configured with SupervisorStrategy.stoppingStrategy, which differentiates between Exception and other throwables. An Exception leads to the termination of the failing actor (which effectively means all of the actors in the /user or /system space), while any other throwable is propagated further and leads to the termination of the actor system. This is what happened when our earlier implementation threw an AssertionError.
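The distinction matters because a failed assert does not produce an Exception at all. The following plain-Scala sketch reproduces the faulty assertion outside of any actor; AssertionDemo and the timer value are hypothetical stand-ins for the Baker's state, and the sketch assumes assertions are not disabled at compile time.

```scala
import scala.util.Try

object AssertionDemo {
  // Stand-in for the Baker's timer, which is still set when Cookies arrive.
  val timer: Option[String] = Some("scheduled")

  // Try captures non-fatal throwables, which includes AssertionError.
  val attempt: Try[Unit] = Try(assert(timer.isEmpty))
  val cause: Throwable = attempt.failed.get

  val isException: Boolean = cause.isInstanceOf[Exception] // false
  val isError: Boolean = cause.isInstanceOf[Error]         // true
}
```

Since the cause is a java.lang.AssertionError, which extends Error rather than Exception, none of the Exception cases in the strategies on the way up the hierarchy apply to it.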

The supervision strategy for the user guardian can be overridden by using its configuration property. Let's demonstrate how to use it to handle the occasional LazyWorkerException, which could be thrown by any actor in the system. First, we augment application.conf:

akka {
 actor {
  guardian-supervisor-strategy = ch11.GuardianSupervisorStrategyConfigurator
 }
}

And then we implement the configured strategy, as follows:

class GuardianSupervisorStrategyConfigurator
    extends SupervisorStrategyConfigurator {
  override def create(): SupervisorStrategy = AllForOneStrategy() {
    case _: LazyWorkerException ⇒
      println("Lazy workers. Let's try again with another crew!")
      Restart
  }
}

Laziness is contagious, so we use AllForOneStrategy to replace the whole team by restarting all of the children of the user guardian.
