Automatic crash recovery

When we create a site with Node, server and site logic are tied up in one process. On other platforms, the server code is already in place: if our site code has bugs, the server is very unlikely to crash, so in many cases the site can stay active even if one part of it is broken.

With a Node-based website, a small bug can crash the entire process, and this bug may only be triggered once in a blue moon.

As a hypothetical example, the bug could be related to character encoding on POST requests. When someone like Felix Geisendörfer completes and submits a form, suddenly our entire server crashes because it can't handle umlauts.
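The effect of a single uncaught exception can be seen in isolation with a minimal sketch. It assumes a Node version that provides child_process.spawnSync, and the one-line script and its error message are made up for illustration. A throwaway Node process throws, nothing catches the error, and the process exits with a non-zero code:

```javascript
var spawnSync = require('child_process').spawnSync;

// Hypothetical one-line script standing in for a rarely triggered bug.
var buggyScript = 'throw new Error("rare bug triggered");';

// Run the script in a separate Node process so this one survives the crash.
var result = spawnSync(process.execPath, ['-e', buggyScript], { encoding: 'utf8' });

console.log('exit code: ' + result.status);
console.log('stderr mentions the error: ' + /rare bug triggered/.test(result.stderr));
```

In a web server the same thing happens to the process that is serving every visitor, which is why one bad request can take the whole site down.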

In this recipe, we'll look at using Upstart, an event-driven init service available for Linux servers, which isn't based upon Node, but is nevertheless a very handy accomplice.

Getting ready

We will need Upstart installed on our server. http://upstart.ubuntu.com contains instructions on how to download and install it. If we're already using an Ubuntu or Fedora remote server, Upstart will already be integrated.

How to do it...

Let's make a new server that purposefully crashes when we access it via HTTP:

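var http = require('http');

http.createServer(function (req, res) {
  res.end("Oh oh! Looks like I'm going to crash...");
  // crashAhoy is deliberately undefined, so this throws a ReferenceError
  // that nothing catches, and the process exits.
  throw crashAhoy;
}).listen(8080);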

After the first page loads, the server crashes and the site goes offline.

Let's call this code server.js, placing it on our remote server under /var/www/crashingserver.

Now we create our Upstart configuration file, saving it on our server as /etc/init/crashingserver.conf.

start on started network-services

respawn
respawn limit 100 5

setuid www-data

exec /usr/bin/node /var/www/crashingserver/server.js >> /var/log/crashingserver.log 2>&1

post-start exec echo "Server was (re)started on $(date)" | mail -s "Crashing Server (re)starting" [email protected]

Finally, we initialize our server as follows:

start crashingserver

When we access http://nodecookbook.com:8080 and refresh the page, our site is still accessible. A quick look at /var/log/crashingserver.log reveals that the server did indeed crash. We could also check our inbox to find the server restart notification.

How it works...

The name of the Upstart service is taken from the particular Upstart configuration filename. We initiate the /etc/init/crashingserver.conf Upstart service with start crashingserver.

The first line of the configuration ensures our web server comes back up automatically when the operating system on our remote server is restarted (for example, after a power failure or a required reboot).

respawn is declared twice: once to turn on respawning, and once to set a respawn limit of at most 100 restarts within 5 seconds. The limit should be set according to our own scenario. If the website is low-traffic, it might be lowered to, say, 10 restarts in 8 seconds.

We want to stay alive if at all possible, but if an issue is persistent we can take that as a red flag that a bug is having a detrimental effect on user experience or system resources.
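To get a feel for how a count and an interval interact, the following is a rough model of a respawn limit, not Upstart's actual implementation. The function name exceedsRespawnLimit and the sliding-window interpretation are assumptions made for illustration. Timestamps are given in seconds:

```javascript
// Hypothetical model: returns true if more than `count` respawns fall
// within any window of `intervalSeconds`. Upstart's real bookkeeping may differ.
function exceedsRespawnLimit(respawnTimes, count, intervalSeconds) {
  var sorted = respawnTimes.slice().sort(function (a, b) { return a - b; });
  var windowStart = 0;

  for (var i = 0; i < sorted.length; i++) {
    // Drop respawns that are too old to share a window with respawn i.
    while (sorted[i] - sorted[windowStart] > intervalSeconds) {
      windowStart += 1;
    }
    // Number of respawns currently inside the window.
    if (i - windowStart + 1 > count) {
      return true;
    }
  }
  return false;
}

// Three respawns within two seconds against a limit of "2 in 5 seconds".
console.log(exceedsRespawnLimit([0, 1, 2], 2, 5));
// The same three respawns spread ten seconds apart.
console.log(exceedsRespawnLimit([0, 10, 20], 2, 5));
```

Under this model, a tight limit stops a persistently crashing server quickly, while a generous one such as 100 in 5 seconds only trips when the process dies almost as soon as it starts.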

The next two lines run our server as the www-data user and append its output to /var/log/crashingserver.log.

The final line sends out an email just after our server has been started, or restarted. This is so we can be notified that there are probably issues to address with our server.

There's more...

Let's implement another Upstart script that notifies us if the server crashes beyond its respawn limit, plus we'll look at another way to keep our server alive.

Detecting a respawn limit violation

If our server exceeds the respawn limit, it's likely there is a serious issue that should be solved as soon as possible. We need to know about it immediately. To achieve this in Upstart, we can create another Upstart configuration file that monitors the crashingserver daemon, sending an email if the respawn limit is transgressed.

task

start on stopped crashingserver PROCESS=respawn

script
  if [ "$JOB" != '' ]
    then echo "Server $JOB has crashed on $(date)" | mail -s "$JOB site down!!" [email protected]
  fi
end script

Let's save this to /etc/init/sitedownmon.conf.

Then we do:

start crashingserver
start sitedownmon

We define this Upstart process as a task (it only has one thing to do, after which it exits). We don't want it to stay alive after our server has crashed.

The task is performed when the crashingserver daemon has stopped during a respawn (that is, when the respawn limit has been exceeded).

Our script stanza (directive) contains a small shell script that checks whether the JOB environment variable is set (in our case, it would be set to crashingserver) and then sends an email accordingly. If we don't check for it, sitedownmon seems to trigger false positives when it is first started, sending an email with an empty JOB variable.

We could later extend this script to include more web servers, simply by adding one line to sitedownmon.conf per server:

start on stopped anotherserver PROCESS=respawn

Staying up with forever

There is a simpler Node-based alternative to Upstart called forever:

npm -g install forever

If we simply initiate our server with forever as follows:

forever server.js

And then access our site, some of the terminal output will contain the following:

warn: Forever detected script exited with code: 1
warn: Forever restarting script for 1 time

But we'll still be able to access our site (although it will have crashed and been restarted).
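The restart behavior shown in that output can be approximated in a few lines of Node. This is only a sketch of the general idea, not forever's actual code. The keepAlive helper, its parameters, and its log message are all hypothetical:

```javascript
var spawn = require('child_process').spawn;

// Hypothetical helper: run `command`, and start it again each time it exits
// with a non-zero code, up to `maxRestarts` times. Then call `done`.
function keepAlive(command, args, maxRestarts, done) {
  var restarts = 0;

  function launch() {
    var child = spawn(command, args, { stdio: 'inherit' });

    child.on('exit', function (code) {
      if (code === 0 || restarts >= maxRestarts) {
        // Clean exit, or too many crashes: stop supervising.
        return done(code, restarts);
      }
      restarts += 1;
      console.warn('script exited with code ' + code + ', restart #' + restarts);
      launch();
    });
  }

  launch();
}

// Example usage (assumes server.js is in the current directory):
// keepAlive(process.execPath, ['server.js'], 10, function (code, restarts) {
//   console.error('giving up after ' + restarts + ' restarts, last exit code ' + code);
// });
```

A supervisor like this is itself just another Node process, so nothing restarts it if it dies. That gap is part of the trade-off between forever and an init-level service.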

To deploy our site on a remote server, we log in to our server via SSH, install forever, and run:

forever start server.js

While this technique is certainly less complex, it's also less robust. Upstart runs as the system's init process (PID 1) and is therefore system critical. If Upstart fails, the kernel panics and the whole server restarts.

Nevertheless, forever is used widely in production on Nodejitsu's PaaS stack, and its attractive simplicity may be viable for less mission-critical production environments.

See also

  • Deploying to a server environment discussed in this chapter
  • Hosting with a Platform as a Service provider discussed in this chapter
  • Continuous deployment discussed in this chapter