Building GenServers for OTP

Though our counter is an oversimplification, the basic approach has been used for over thirty years to manage both concurrent state and behavior for most important Erlang applications. The library encapsulating that approach is called OTP, and the abstraction is called a generic server, or GenServer. Let’s use OTP to build our counter instead.

We don’t need to change too much. Instead of writing our own functions to handle inc, dec, and val, we use the corresponding OTP abstractions. Update your counter.ex file with these contents:

 1: defmodule InfoSys.Counter do
      use GenServer

      def inc(pid), do: GenServer.cast(pid, :inc)
 5:
      def dec(pid), do: GenServer.cast(pid, :dec)

      def val(pid) do
        GenServer.call(pid, :val)
10:   end

      def start_link(initial_val) do
        GenServer.start_link(__MODULE__, initial_val)
      end
15:
      def init(initial_val) do
        {:ok, initial_val}
      end

20:   def handle_cast(:inc, val) do
        {:noreply, val + 1}
      end

      def handle_cast(:dec, val) do
25:     {:noreply, val - 1}
      end

      def handle_call(:val, _from, val) do
        {:reply, val, val}
30:   end
    end

We’ve changed the terminology some, but not the implementation. When we want to send asynchronous messages such as our inc and dec messages, we use GenServer.cast, as you can see on line 4. Notice that these functions don’t send a return reply. When we want to send synchronous messages that return the state of the server, we use GenServer.call, as we do on line 8. Notice the _from in the function head. An argument beginning with an underscore works just like the _ wildcard match, but it lets us describe the argument explicitly while still ignoring its contents.

On the server side, the implementation is much the same: we use one handle_cast clause for :inc and one for :dec, each returning :noreply alongside the new state, and we use handle_call to handle :val, specifying the return value. We explicitly tell OTP when to send a reply and when not to. We also tweak start_link to start a GenServer, giving it the current module name and the initial counter value. This function spawns a new process and invokes the InfoSys.Counter.init function inside that new process to set up its initial state.

Let’s take that much for a spin:

 iex> alias InfoSys.Counter
 InfoSys.Counter
 iex> {:ok, counter} = Counter.start_link(10)
 {:ok, #PID<0.96.0>}
 iex> Counter.dec(counter)
 :ok
 iex> Counter.dec(counter)
 :ok
 iex> Counter.val(counter)
 8

Our first counter was split into client and server code. This segregation remains when we write our GenServer. init, handle_call, and handle_cast run in the server. All other functions are part of the client.

Our OTP counter server works exactly as before, but we’ve gained much by moving it to a GenServer. First, we no longer need to worry about setting up references for synchronous messages; GenServer.call takes care of that for us. Second, the GenServer module is now in control of the receive loop, allowing it to provide great features like code upgrading and handling of system messages, which will be useful when we introspect our system with Observer later on. A GenServer is one of many OTP behaviours. We’ll continue exploring them as we build our information system.
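
As a small taste of those system messages, you can poke at a running counter with Erlang's :sys module in iex. This isn't part of the book's counter; the PID and output below are only illustrative:

 iex> {:ok, counter} = InfoSys.Counter.start_link(10)
 {:ok, #PID<0.110.0>}
 iex> :sys.get_state(counter)
 10

:sys.get_state/1 works precisely because GenServer, not our code, owns the receive loop and answers system messages on our behalf.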

Adding Failover

The benefits of OTP go beyond managing concurrent state and behavior; OTP also handles the linking and supervision of processes. Now let’s explore how process supervision works by supervising our new counter.

Though our counter is a trivial service, we’ll use it to play with supervision strategies. Our supervisor needs to be able to restart each service the right way, according to the policies that are best for the application. For example, if a database dies, you might want to automatically kill and restart the associated connection pool. This policy decision should not impact code that uses the database. If we replace a simple supervisor process with a supervision tree, we can build much more robust fault tolerance and recovery into our software.

In Phoenix, you didn’t see much code attempting to deal with the fallout of every possible exception. Instead, we trust the error reporting to log errors so that we can fix what’s broken, and in the meantime we automatically restart services in the last good state. The beauty of OTP is that it captures these clean abstractions in a coherent library, allowing us to declare the supervision properties that most interest us without bogging down the meaning of each individual application. With a supervision tree and a configurable policy, you can build robust self-healing software without writing complex self-healing code yourself.

We’ll manage the configuration of the supervision policies in a single location. Since we’re under an umbrella, we’ll use the application.ex file for our info_sys. Let’s add our Counter server to our application’s supervision tree. In lib/info_sys/application.ex, add your new server as a child of your supervisor, like this:

 children = [
   {InfoSys.Counter, 5}, # new counter worker
 ]

To specify the children an Elixir application will start, we define a child spec. In this case, we add our new counter to the existing list of children that our application already defines. Each element is either a two-tuple containing the module you want to start and the value that will be passed to that module’s start_link, or just a module name, in which case the default value [] is used.

For our Counter, we pass a tuple containing the module and the argument for the child’s start_link/1. In our case, that argument is the initial state, the number 5.
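
If you're curious what that tuple turns into, you can expand it yourself with Supervisor.child_spec/2. The exact map may vary slightly between Elixir versions, but it should look roughly like this:

 iex> Supervisor.child_spec({InfoSys.Counter, 5}, [])
 %{id: InfoSys.Counter, start: {InfoSys.Counter, :start_link, [5]}}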

In opts, you can see the policy that our application will use if something goes wrong. OTP calls this policy the supervision strategy. In this case, we’re using the :one_for_one strategy. This strategy means that if a child dies, only that child will be restarted. If all resources depended on some common service, we could have specified :one_for_all to kill and restart all child processes if any child dies. We’ll explore those strategies later on.
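
For reference, the surrounding start/2 function in lib/info_sys/application.ex presumably looks roughly like this after the change; your generated file may differ in its details:

 def start(_type, _args) do
   children = [
     {InfoSys.Counter, 5}, # new counter worker
   ]

   opts = [strategy: :one_for_one, name: InfoSys.Supervisor]
   Supervisor.start_link(children, opts)
 end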

Now if we fire up our application with iex -S mix, we don’t see anything in particular happen, since our counter is running but we aren’t interacting with it.

Let’s add a periodic tick to our counter to see it work in action in our supervision tree.

Modify your Counter’s init function and add a new handle_info callback, like this:

 def init(initial_val) do
   Process.send_after(self(), :tick, 1000)
   {:ok, initial_val}
 end

 def handle_info(:tick, val) do
   IO.puts("tick #{val}")
   Process.send_after(self(), :tick, 1000)
   {:noreply, val - 1}
 end

We tweak init so the counter process sends itself a :tick message after 1,000 milliseconds, and then we add a clause to process each tick, simulating a countdown. As with channels, out-of-band messages are handled inside the handle_info callback, which here schedules the next tick and decrements the state.

Now you can fire our application back up with iex -S mix and see our counter worker in action:

 iex> tick 5
 tick 4
 tick 3
 tick 2
 tick 1
 ^C

This isn’t terribly exciting, but it gets interesting when we deal with our workers crashing.

Let’s crash our counter if it ticks below a certain value:

 def handle_info(:tick, val) when val <= 0, do: raise "boom!"

 def handle_info(:tick, val) do
   IO.puts("tick #{val}")
   Process.send_after(self(), :tick, 1000)
   {:noreply, val - 1}
 end

We add a :tick clause that matches when the value reaches zero or below and raises an error, crashing our process. Let’s fire up iex -S mix again and see what happens:

 iex> tick 5
 tick 4
 tick 3
 tick 2
 tick 1
 [error] GenServer #PID<0.119.0> terminating
 ** (RuntimeError) boom!
  (info_sys) lib/info_sys/counter.ex:22: InfoSys.Counter.handle_info/2
  (stdlib) gen_server.erl:616: :gen_server.try_dispatch/4
  (stdlib) gen_server.erl:686: :gen_server.handle_msg/6
  (stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3
 Last message: :tick
 State: 0
 tick 5
 tick 4
 tick 3
 tick 2
 tick 1
 ^C

As expected, our server crashed—but then it restarted! That’s the magic of supervision. When our counter crashed, it was restarted with its initial state of 5. In short, our program crashed, the supervisor identified the crash, and then it restarted the process in a known good state. We don’t have to add any extra code to fully supervise every process. We need only configure a policy to tell OTP how to handle each crash.

The basic building blocks of isolated application processes and a supervision structure to manage them have been the cornerstone of Erlang reliability—whether you’re running a trivial counter, a server with a million processes, or a worldwide distributed application with tens of millions of processes. The principles are the same, and they’ve been proven to work.

To apply these principles, you need to know how to tell Elixir what supervision behavior you expect. Here are the basics.

Restart Strategies

The first decision you need to make is what should happen if your process crashes. Think of these details as a software policy for dealing with failure. If we want anything beyond the module to start and the initial value for the OTP server, we need a way to specify those options. That specification is called a child spec, and it configures the policy for an OTP restart.

You have a couple of ways to define those options. First, you can do it within the children definition in application.ex by using the Supervisor.child_spec function. For example, if we wanted to explicitly specify a :permanent restart strategy, we’d do so like this:

 children = [
   Supervisor.child_spec({InfoSys.Counter, 5}, restart: :permanent)
 ]

That’s fine for a single child spec, but having to specify the supervision values every time we list our server would be repetitive and error prone. Fortunately, Elixir allows us to also define those values directly in the Counter module, like this:

 defmodule InfoSys.Counter do
   use GenServer, restart: :permanent
   ...
 end

Behind the scenes, this code works because use GenServer defines a child_spec(arg) function, which returns the child specification. Most of the time, passing options to use GenServer like this is enough. When you need more, you can always define your own child_spec(arg) function, like this:

 defmodule InfoSys.Counter do
   ...

   def child_spec(arg) do
     %{
       id: __MODULE__,
       start: {__MODULE__, :start_link, [arg]},
       restart: :temporary,
       shutdown: 5000,
       type: :worker
     }
   end

   ...
 end
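
To confirm what the supervisor actually receives, you could call the function yourself in iex; given the custom child_spec/1 above, the result is simply its literal map (shown here with iex's usual key ordering):

 iex> InfoSys.Counter.child_spec(5)
 %{id: InfoSys.Counter, restart: :temporary, shutdown: 5000,
   start: {InfoSys.Counter, :start_link, [5]}, type: :worker}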

The keys listed here are the child’s id, the module, function, and arguments to call when starting and restarting the server, the restart configuration, a shutdown timeout in milliseconds, and the type of child. We’ll go into some of these options in more detail throughout the chapter. See the Elixir documentation for child_spec[32] for a complete list of options and more details. For now, let’s focus on the restart option. Child specifications support the following restart values:

:permanent

The child is always restarted (default).

:temporary

The child is never restarted.

:transient

The child is restarted only if it terminates abnormally, with an exit reason other than :normal, :shutdown, or {:shutdown, term}.

:permanent is the default restart strategy, and the remaining child spec options all have sensible defaults, so to specify a :permanent counter with an initial value of 5, we can simply use {InfoSys.Counter, 5}.
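
One nuance worth noting about :transient: it only skips the restart when the exit is clean. As a hedged sketch (not part of the book's counter), a callback clause that stops the process with reason :normal would leave a :transient child down, while a raise would still trigger a restart:

 # Hypothetical clause: stop the counter cleanly instead of crashing it.
 # Under restart: :transient, this child would not be restarted.
 def handle_info(:done, val) do
   {:stop, :normal, val}
 end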

Let’s say we have a situation in which mostly dead isn’t good enough. When a counter dies, we want it to really die. Perhaps restarting the server would cause harm. Let’s try changing our restart strategy to :temporary and observe the crash:

 children = [
   Supervisor.child_spec({InfoSys.Counter, 5}, restart: :temporary)
 ]

Now let’s fire our project back up with iex -S mix:

 iex> tick 5
 tick 4
 tick 3
 tick 2
 tick 1
 [error] GenServer #PID<0.306.0> terminating
 ** (RuntimeError) boom!
  (info_sys) lib/info_sys/counter.ex:22: InfoSys.Counter.handle_info/2
  (stdlib) gen_server.erl:616: :gen_server.try_dispatch/4
  (stdlib) gen_server.erl:686: :gen_server.handle_msg/6
  (stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3
 Last message: :tick
 State: 0

As you’d expect, when our counter dies it stays dead. The :temporary strategy is useful when a restart is unlikely to resolve the problem, or when restarting doesn’t make sense based on the flow of the application.

Sometimes, you may want OTP to retry an operation a few times before failing. You can do exactly that with a pair of supervisor options called max_restarts and max_seconds. OTP will only restart a child max_restarts times within max_seconds before giving up and reporting the error up the supervision tree. By default, Elixir allows 3 restarts in 5 seconds, but you can configure these values to whatever you want. In general, you’ll use the restart strategies your specific application requires.
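
As a sketch, those limits are passed alongside the supervision strategy in application.ex; the values below simply restate the defaults:

 opts = [
   strategy: :one_for_one,
   name: InfoSys.Supervisor,
   max_restarts: 3,  # default
   max_seconds: 5    # default
 ]
 Supervisor.start_link(children, opts)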

Supervision Strategies

Just as child workers have different restart strategies, supervisors have configurable supervision strategies. The most basic, and the default for new Phoenix applications, is :one_for_one. When a :one_for_one supervisor detects a crash, it restarts a worker of the same type without any other consideration. Most of the time, :one_for_one is enough, but sometimes processes depend on one another. When such a process dies, more than one process must restart. That’s why Elixir supports more than one supervision strategy.

Let’s look at the ones that are available:

:one_for_one

If a child terminates, a supervisor restarts only that process.

:one_for_all

If a child terminates, a supervisor terminates all children and then restarts all children.

:rest_for_one

If a child terminates, a supervisor terminates all child processes defined after the one that dies. Then the supervisor restarts all terminated processes.

These strategies are all relatively straightforward. To get a taste of them, let’s start multiple counters and see how the termination of one of them affects the others. Back in lib/info_sys/application.ex, change the start function to this:

 children = [
   {InfoSys.Counter, 15},
   {InfoSys.Counter, 5},
   {InfoSys.Counter, 10},
 ]

 opts = [strategy: :one_for_all, name: InfoSys.Supervisor] # new strategy
 Supervisor.start_link(children, opts)

Now when you boot your application via $ iex -S mix, you’ll notice that it won’t even start, failing with this reason:

 ** (Mix) Could not start application info_sys:
  InfoSys.Application.start(:normal, []) returned an error: bad child spec,
  more than one child specification has the id: InfoSys.Counter.
 
 If using maps as child specifications, make sure the :id keys are unique.
 If using a module or {module, arg} as child, use Supervisor.child_spec/2
 to change the :id, for example:
 
  children = [
  Supervisor.child_spec({MyWorker, arg}, id: :my_worker_1),
  Supervisor.child_spec({MyWorker, arg}, id: :my_worker_2)
  ]

The error message shows us exactly what we need to do. We’re starting multiple counters, but they all have the same ID. We need to pass a distinct ID in each child_spec call, so let’s do that:

 children = [
   Supervisor.child_spec({InfoSys.Counter, 15}, id: :long),
   Supervisor.child_spec({InfoSys.Counter, 5}, id: :short),
   Supervisor.child_spec({InfoSys.Counter, 10}, id: :medium)
 ]

Restart the application with $ iex -S mix once more and you should see all servers counting down at the same time. As soon as the “short” counter reaches 0, it terminates, and then we can see all counters restarting from scratch. Feel free to play with the other supervision strategies and see how the system will behave.
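
For example, swapping in :rest_for_one is a one-line change to opts. With the children ordered long, short, medium as above, we'd expect a crash in the "short" counter to also restart the "medium" one while the "long" counter keeps counting; that expectation follows from the strategy's definition, so verify it yourself:

 opts = [strategy: :rest_for_one, name: InfoSys.Supervisor]
 Supervisor.start_link(children, opts)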

Once the counter experiments are over, change lib/info_sys/application.ex back to the original supervision tree and supervision strategy:

 def start(_type, _args) do
   children = [
   ]

   opts = [strategy: :one_for_one, name: InfoSys.Supervisor]
   Supervisor.start_link(children, opts)
 end

The GenServer is the foundation of many different abstractions throughout Elixir and Phoenix. Knowing these small details will make you a much better programmer. Let’s see a couple more examples.

Using Agents

It turns out that a still simpler abstraction has many of the benefits of a GenServer. It’s called an agent. With an agent, you have only five main functions: start_link initializes the agent, stop stops the agent, update changes the state of the agent, get retrieves the agent’s current value, and get_and_update performs the last two operations simultaneously. Here’s what our counter would look like with an agent:

 iex> import Agent
 nil
 iex> {:ok, agent} = start_link(fn -> 5 end)
 {:ok, #PID<0.57.0>}
 iex> update(agent, &(&1 + 1))
 :ok
 iex> get(agent, &(&1))
 6
 iex> stop(agent)
 :ok

To initialize an agent, you pass a function returning the state you want. To update the agent, you pass a function taking the current state and returning the new state. That’s all there is to it. Behind the scenes, this agent is an OTP GenServer, and plenty of options are available to customize it as needed. One such option is called :name.
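
To keep those anonymous functions out of calling code, you could wrap the agent in a module, much like our GenServer's client API. Here is a minimal sketch, with a hypothetical module name:

 defmodule InfoSys.AgentCounter do
   # Thin wrapper: each function delegates to Agent with the right closure.
   def start_link(initial_val) do
     Agent.start_link(fn -> initial_val end)
   end

   def inc(agent), do: Agent.update(agent, &(&1 + 1))
   def dec(agent), do: Agent.update(agent, &(&1 - 1))
   def val(agent), do: Agent.get(agent, & &1)
 end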

Registering Processes

With OTP, we can register a process by name with the :name option in start_link. After we register a process by name, we can send messages to it using the registered name instead of the pid.

Let’s rewrite the previous example using a named agent:

 iex> import Agent
 nil
 iex> {:ok, agent} = start_link(fn -> 5 end, name: MyAgent)
 {:ok, #PID<0.57.0>}
 iex> update(MyAgent, &(&1 + 1))
 :ok
 iex> get(MyAgent, &(&1))
 6
 iex> stop(MyAgent)
 :ok

If a process already exists with the registered name, we can’t start the agent:

 iex> import Agent
 nil
 iex> {:ok, agent} = start_link(fn -> 5 end, name: MyAgent)
 {:ok, #PID<0.57.0>}
 iex> {:ok, agent} = start_link(fn -> 5 end, name: MyAgent)
 ** (MatchError) no match of right hand side value:
    {:error, {:already_started, #PID<0.57.0>}}
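
If starting twice is a legitimate possibility in your code, one common pattern (not shown in the book) is to match on the error tuple instead of letting the match fail:

 # Reuse the running agent when the name is already taken.
 case Agent.start_link(fn -> 5 end, name: MyAgent) do
   {:ok, pid} -> {:ok, pid}
   {:error, {:already_started, pid}} -> {:ok, pid}
 end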

Agents are one of the many constructs built on top of OTP. You’ve already seen another, the Phoenix.Channel. Let’s take a look.

OTP and Channels

If we were building a supervisor for a couple of application components, the simple default :one_for_one strategy might be enough. The goal for Phoenix channels is bigger, though. To us, channels aren’t just tiny isolated services; channels are core infrastructure. We intentionally build all of our infrastructure with a tree of supervisors, where each node of the tree knows how to restart any major service if it fails.

When you coded your channels in the previous chapter, you might not have known it, but you were building an OTP application. Each new channel was a process built to serve a single user in the context of a single conversation on a topic. Though Phoenix is new, we’re standing on the shoulders of giants. Erlang’s OTP has been around as long as Erlang has been popular—we know that it works. Much of the world’s text-messaging traffic runs on OTP infrastructure, and WhatsApp runs on Erlang to process tens of billions of messages every day. You can count on this infrastructure always being up and available because it’s built on a reliable foundation.
