Chapter 6 Politics

In November of 2008, I presented “Building Distributed Applications” at RubyConf. The day after my presentation, I saw a gentleman by the name of Mike Perham 1 present a project he was working on, called “Politics.” I remember thinking that it seemed like an interesting idea, but I didn’t quite get what he was trying to do at the time. I squirreled it away as something to keep my eye on and eventually try to figure out.

Fast-forward several months, and I’m at work discussing with a colleague a problem we are having. We have identical application server instances running in production, using a popular “cloud” hosting provider. Each one of these instances is born of the same base image, meaning that each instance is loaded with the same software and is configured identically. With autoscaling, the automated launching of a new instance in response to load, we are not always in charge of launching an instance; the “cloud” does it for us.

The problem we were running into was this: How do we configure just one machine to run our background jobs, scheduled tasks, and other queue-type processing? We didn’t want each machine to perform these tasks for a couple of reasons. First, we didn’t want to worry about these processes stepping on each other’s toes by processing the same task. Second, we didn’t want to add load to every instance when we could burden just one.

The question then became, how do we do this? How do we configure one instance to perform these tasks, and not the others, when they are all identical instances? Also, what if the instance we configure crashes or goes down? Do we then have to manually configure another instance to be the one that performs the tasks?

We chewed on these questions over lunch, and we couldn’t come up with a decent solution that made us both happy. A few days later I was watching a popular Sunday morning political roundtable talk show, and I remembered Perham’s Politics.2 I sat down, started to play with the software, and realized two things: It might solve our problems, and it was easy to use.

Politics provides modules that allow us to build a self-repairing worker class that can be run on all our instances, yet it designates only one instance at a time to do the work specified for a given time period. Politics calls this worker a token worker.

Under the covers Politics uses three different technologies to maintain order in all the worker classes. One of them we already know—DRb. The other two technologies it uses are Memcached 3 and mDNS 4 (also known as Multicast DNS, Zero Configuration Networking, or Bonjour).5

Memcached

Over the past few years, Memcached has become an industry-standard mechanism used to cache—well, pretty much everything. It is nearly impossible to be a developer and not hear the word Memcached mentioned as a must-have technology.

Originally designed to cache database queries, Memcached is now used to cache everything from HTML to whole objects. Its easy setup and straightforward clients, written in nearly every modern language, have helped ensure its continued success.

Memcached, at its most basic, is a high-performance Hash. Written in C, Memcached allows the setting and retrieval of key/value pairs, along with basic expiration data. Memcached automatically expires objects at their set expiration time. If an expiration time is not set, it retains that object for the life of the Memcached instance. The only exception to this rule is that when Memcached runs out of memory, it evicts the least recently used objects to make space for more often-used ones.
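To make the "Hash with expiration" idea concrete, here is a toy, in-process sketch. It is not a Memcached client and says nothing about how Memcached is implemented; the ExpiringHash class, its set and get methods, and the injectable clock are all hypothetical names chosen for illustration. It also leaves out the clearing of old data under memory pressure.

```ruby
# A toy stand-in for the behaviour described above. NOT a Memcached client;
# ExpiringHash and its methods are hypothetical.
class ExpiringHash
  Entry = Struct.new(:value, :expires_at)

  # The clock is injectable so expiry can be shown without sleeping.
  def initialize(clock = -> { Time.now.to_f })
    @clock = clock
    @entries = {}
  end

  # Store a value. ttl is in seconds; nil means "never expires".
  def set(key, value, ttl = nil)
    expires_at = ttl ? @clock.call + ttl : nil
    @entries[key] = Entry.new(value, expires_at)
    value
  end

  # Return the value, or nil if the key is missing or has expired.
  def get(key)
    entry = @entries[key]
    return nil unless entry
    if entry.expires_at && @clock.call >= entry.expires_at
      @entries.delete(key)
      return nil
    end
    entry.value
  end
end

now = 0.0
cache = ExpiringHash.new(-> { now })
cache.set('greeting', 'hello', 5)   # expires 5 seconds after "now"
cache.set('forever', 'still here')  # no expiration set
cache.get('greeting')               # => "hello"
now += 6                            # pretend six seconds have passed
cache.get('greeting')               # => nil
cache.get('forever')                # => "still here"
```

The fake clock is only there to keep the example fast; with the default clock the same calls work against real time.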

In an effort to keep Memcached as lean and mean as it can be, the developers have kept its feature set incredibly light. For example, you can’t even get a list of all the keys currently stored within Memcached. Although the lack of a large feature set is a negative for some, that same lack of features allows for a highly stable and high-performance code base.

mDNS: a Horse of Many Colors

Whatever name you prefer for it, mDNS was developed so that familiar DNS interfaces and techniques can be used on a small network where no DNS server has been installed. This technology is commonly encountered when trying to set up a network printer or storage device.

As briefly mentioned earlier in this book, Rinda uses this technology to find available services on the local network. Politics uses mDNS for the same reason one would use Rinda—to discover available services. It is unclear to me why the author of Politics, Mike Perham, decided to use mDNS and not Rinda for the discovery portion of Politics.

Installation

To use Politics, you must first have both Memcached and mDNS installed on your system. If you are using Mac OS X, you already have mDNS installed. The same goes for most flavors of UNIX. If your system does not have mDNS installed already, consult the package management system for your operating system to find out how to install it. Memcached will need to be installed on most machines. Instructions on how to do this can be found at http://www.danga.com/memcached/.

After you have installed, or confirmed the installation of, Memcached and mDNS, you can easily install Politics using RubyGems:

$ gem install mperham-politics -s http://gems.github.com

You should see a message similar to the following:

Successfully installed mperham-politics-0.2.5

With that, your installation should be complete, and you should be ready to go.

Where’s “Hello World”?

This is the only chapter that doesn’t start with a simple “Hello World” example. Throughout this book I have tried to keep the format of the beginning of each chapter the same. This gives each library and tool a common point for evaluation. Plus, it helps ensure that everything is installed and working correctly.

I struggled quite a lot with Politics, trying to find a way to make it fit the model I had in mind for each chapter. The problem, it turns out, is that Politics is very different from the other libraries and tools we look at in this book.

It is not a tool that allows effective communication between different services, so it is not easy to just set up a simple “Hello World” example. Instead, Politics, as you will see, is a set of utilities that lets you solve problems involved with using some of the other tools in this book, such as making sure that different processes work in harmony.

Because of how different Politics is from the other libraries discussed in this book, I decided to treat it as such. I wanted you, the reader, to see Politics for what it is and show you how it can be leveraged with other technologies in this book to further enhance your distributed applications.

Working with Politics

As I mentioned earlier, one of the problems I was having was making sure that all the processes I had running didn’t step on each other’s toes and spend cycles doing the same work. This is a common problem when dealing with distributed programming. A typical example occurs when we are working with distributed message queues, which Part III of this book, “Distributed Message Queues,” discusses in greater detail. For now, simply know that what we discuss in this section can make processing those queues extremely effective and powerful.

For our example, let’s pretend we have just received a big juicy government contract. The government has a queue that is constantly being filled with how much pork-barrel spending is happening. Because so much spending is occurring, the government wants to ensure that the system is constantly running and self-repairing. Should any of the workers die, or be taken offline, another should take its place and continue to keep track of the pork that is being spent.

To help us solve this problem, Politics offers the Politics::TokenWorker module. What does this module offer us? The Politics::TokenWorker module, when included into a worker class, allows us to create a class that will either process the work we give it and act as the “leader,” or patiently sit and wait for its turn to be the “leader” while not doing any work. The beautiful part of what the Politics::TokenWorker module offers is that no extra coding is needed to figure out if the current instance of the class is a “worker” or a “leader.”

Right now you are probably wondering what the difference is between a leader and a worker. When we launch our classes that include the Politics::TokenWorker module, they are all the same—they are workers. Workers, in this case, are the opposite of what you might believe them to be. Workers actually don’t do any work. Instead, workers sit and wait for their chance to be a leader. Leaders, on the other hand, do all the work. I know—it seems a bit confusing, doesn’t it? I probably would have chosen slightly different names, but that’s just me.

When a worker or workers launch, they all connect to the instance of Memcached that they are configured to communicate with. When they do that, one of the instances gets a token (hence the name token worker) and becomes a leader for a specified interval. It is then the job of that leader to complete the tasks assigned during the given time period, or iteration length, as it is known in Politics. When the specified time frame has elapsed, all the instances again connect to Memcached, where one of them is again elected leader, and the cycle continues.
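The election cycle can be simulated in plain Ruby. I am assuming, purely for illustration, that the token is claimed with an "add only if absent" operation whose entry expires after the iteration length; the text does not spell out the mechanism, and Politics may differ in its details. TokenStore, SketchWorker, and the pork_token key are hypothetical names, and the Hash stands in for the shared Memcached instance.

```ruby
# Hypothetical stand-in for the shared Memcached instance: add succeeds
# only when the key is absent or its previous entry has expired.
class TokenStore
  def initialize
    @tokens = {}
  end

  # Returns true if the caller claimed the key, false if it is still held.
  def add(key, owner, ttl, now)
    holder = @tokens[key]
    return false if holder && holder[:expires_at] > now
    @tokens[key] = { owner: owner, expires_at: now + ttl }
    true
  end
end

# Hypothetical worker that tries to claim the token once per iteration.
class SketchWorker
  attr_reader :name

  def initialize(name, store, iteration_length)
    @name = name
    @store = store
    @iteration_length = iteration_length
  end

  # True when this worker won the token and leads the current iteration.
  def leader?(now)
    @store.add('pork_token', @name, @iteration_length, now)
  end
end

store = TokenStore.new
workers = %w[a b c].map { |n| SketchWorker.new(n, store, 10) }

# Iteration starting at t = 0: every worker asks, exactly one wins.
leaders = workers.select { |w| w.leader?(0) }.map(&:name)
# leaders => ["a"]

# At t = 10 the token has expired, so a fresh election takes place.
# The workers happen to ask in a different order this time.
next_leaders = workers.reverse.select { |w| w.leader?(10) }.map(&:name)
# next_leaders => ["c"]
```

Whichever worker reaches the store first after the token expires wins the next iteration, which is one way to see why the same instance is likely, but not certain, to stay leader.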

Let’s look at an example of how this works in practice. To start, we need a queue to serve all the pork that needs to be processed. The following class is that queue:

The PorkSpendingQueue should not be used in production (for that, I highly recommend one of the great queues discussed in Part III of this book), but it should certainly serve our simple needs. The pop method finds the smallest piece of pork that is stored on the file system, reads the contents of the file, deletes that file, and returns the file’s value. Should we run out of pork (which is highly unlikely), nil will be returned.

The start_spending method quite generously creates a bunch of pork for us in our queue. To make sure you have pork for the following examples, you should execute the start_spending method to fill your queue.
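A queue that behaves this way might look like the following sketch. It is reconstructed from the description here rather than copied from an original listing, so the directory, the .pork file naming, the amounts written, and the count argument are all assumptions.

```ruby
require 'fileutils'
require 'tmpdir'

# A sketch reconstructed from the prose description, not the original code.
class PorkSpendingQueue
  # ASSUMPTION: each piece of pork is one file in this directory.
  PORK_DIR = File.join(Dir.tmpdir, 'pork_spending_queue')

  class << self
    # Remove and return the smallest piece of pork, or nil when none is left.
    def pop
      files = Dir.glob(File.join(PORK_DIR, '*.pork'))
      return nil if files.empty?
      # ASSUMPTION: "smallest" is decided by the number in the file name.
      smallest = files.min_by { |f| File.basename(f, '.pork').to_i }
      value = File.read(smallest)
      File.delete(smallest)
      value
    end

    # Fill the queue with count pieces of pork, one file each.
    def start_spending(count = 100)
      FileUtils.mkdir_p(PORK_DIR)
      1.upto(count) do |n|
        File.write(File.join(PORK_DIR, "#{n}.pork"), (n * 1000).to_s)
      end
    end
  end
end
```

Calling PorkSpendingQueue.start_spending once fills the directory, after which each call to PorkSpendingQueue.pop hands back one amount as a String until the queue is empty.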

Now that we have a queue we can access, let’s see what our token worker class will look like:

First we need to require a couple of Politics’ files, politics and politics/token_worker. When we have those, we require our pork_spending_queue file so that we can access our PorkSpendingQueue class that we built earlier.

In the definition of our PorkSpendingTokenWorker, we include the Politics::TokenWorker module to give us access to the methods and hooks we need to make our class into a token worker.

In our initialize method we need to connect this class to the system. We do that by calling the register_worker method we got when we included the Politics::TokenWorker module. The register_worker method takes two parameters. The first is the name of the group this worker should belong to. The second parameter is a Hash of options. Currently, only two options are supported.

The first is :iteration_length. This option is how long, in seconds, you want an iteration to be. An iteration is the span of time between when a leader is elected and when another leader needs to be elected. It is important to pick the correct length of time for your iteration. You want to make sure that it is long enough to actually process the tasks given to it. If you set this option too short, an exception is raised and the instance of that worker dies, never to be heard from again.

However, if you set this option too long, there are other issues to deal with. First, if you set your worker up to process only so many tasks in the iteration length, then it will just sit there doing nothing after it finishes processing the tasks while it waits for the iteration to end. Second, if there is a problem with the current leader, a new one won’t be elected until the next iteration. So if you set your length to be an hour and your leader dies five minutes in, then you will have to wait another 55 minutes for the next leader to be elected. Although this might work for you in some circumstances, it is probably not going to work for you in most situations. The moral of this story is to take some time, do some benchmarking, and try to determine the most appropriate time for your system.

The second option is :servers. This option is an Array of Memcached server locations, presented in the format host:port.
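The trade-off is simple arithmetic, and a couple of throwaway helpers can make it easier to reason about candidate values. These helpers are hypothetical and not part of Politics; the ArgumentError is my own way of flagging an iteration that is too short, not the exception Politics raises.

```ruby
# Hypothetical helpers for reasoning about :iteration_length (all in seconds).

# Time the leader sits idle after finishing its batch of tasks.
def idle_time(iteration_length, tasks_per_iteration, seconds_per_task)
  work = tasks_per_iteration * seconds_per_task
  raise ArgumentError, 'iteration too short for the batch' if work > iteration_length
  iteration_length - work
end

# Time without a leader if the current one dies died_at seconds
# into the iteration.
def failover_wait(iteration_length, died_at)
  iteration_length - died_at
end

idle_time(10, 5, 1)             # => 5
failover_wait(60 * 60, 5 * 60)  # => 3300 seconds, i.e. 55 minutes
```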

With our PorkSpendingTokenWorker registered with the system, we need to tell it what it should do if it becomes a leader. We define this within our start method. There is no requirement that this method be called start. In theory, you do not even need a separate method; you could put the code in your initialize method. But I prefer to have my code a bit cleaner than that, hence the separate method.

In our start method, we call the process method provided by the Politics::TokenWorker module. The process method is provided a block that tells the leader what to work on when the time comes. In our block we tell the class to process five pieces of pork per iteration. We call the pop method on the PorkSpendingQueue class. If a piece of pork is returned to us (not a nil value), we print a message to the screen. It tells us which PID, or process, has just spent the pork, and how much pork has been spent. After we have processed a piece of pork, we sleep for 1 second.
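Pulling the pieces of this walkthrough together, a token worker might be sketched as follows. It relies only on the calls named in the text (register_worker and process). Several things are assumptions of mine: the 'pork_spending' group name, the 10-second iteration, the localhost:11211 server (11211 being Memcached's default port), the queue being passed in rather than reached through PorkSpendingQueue directly, and the pause argument. The fallback module in the rescue branch is a stand-in so the sketch can be loaded without the gem, Memcached, or mDNS; it performs no election at all.

```ruby
begin
  # The two requires named in the text.
  require 'politics'
  require 'politics/token_worker'
rescue LoadError
  # ASSUMPTION: stand-in used only when the gem is not installed. It does
  # no election: every instance acts as leader and process runs its
  # block exactly once.
  module Politics
    module TokenWorker
      def register_worker(name, options = {})
        @worker_name = name
        @worker_options = options
      end

      def process
        yield
      end
    end
  end
end

class PorkSpendingTokenWorker
  include Politics::TokenWorker

  # queue: any object whose pop returns nil when empty (hypothetical injection)
  # pause: seconds to sleep after each piece of pork
  def initialize(queue, pause: 1)
    @queue = queue
    @pause = pause
    register_worker('pork_spending',
                    iteration_length: 10,
                    servers: ['localhost:11211'])
  end

  # Hand over the block to run whenever this instance is the leader.
  def start
    process { spend_batch }
  end

  # Process up to five pieces of pork; returns the amounts actually spent.
  def spend_batch
    spent = []
    5.times do
      pork = @queue.pop
      next unless pork
      puts "PID #{Process.pid} spent #{pork} in pork"
      spent << pork
      sleep @pause
    end
    spent
  end
end
```

Splitting spend_batch out of start is a design choice of this sketch: it lets the work be exercised on its own, without any election happening.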

At the bottom of our file you will notice these two lines:

p = PorkSpendingTokenWorker.new
p.start

These lines are there so that if you execute the file, it will create a new instance of the PorkSpendingTokenWorker class and call its start method.

Now if we execute this class, we should see something similar to the following get printed to our screen:

Now if you were to start a second instance of the PorkSpendingTokenWorker class, you would see that it will, most likely, just sit there idle. There is a possibility that at the end of the iteration it might become a leader, but I have found that in most cases the original stays the leader until it is killed. This is not always the case, however, and you certainly should not count on it when developing these classes.

Caveats

The documentation that ships with Politics points out a few caveats that also should be mentioned here.

The first caveat is that Politics is not designed to work with multiple workers within the same Ruby VM. This means that we cannot fire up two instances of the PorkSpendingTokenWorker class in the same Ruby VM. If you want more than one instance of the class working on the same machine or virtual instance, you need to create a new Ruby VM for each instance.

The reason given for not being able to run more than one instance of a Politics class in a single VM is that the algorithm used to choose a leader is only designed to pick a leader from a set of processes, not multiple instances within a single process.

The second caveat was briefly touched on in the main text. The algorithm that selects the next leader is not guaranteed to choose the previous leader for the next iteration. It has been my experience that most of the time, when a leader is elected, it tends to remain the leader for a long time; however, this is not certain.

You should not architect your classes to assume that there is a guaranteed order to the selection of a leader. Classes should be designed with the idea that they will be a leader only once. This means that you should avoid keeping state information pertinent to the leader in a particular Ruby VM. If you need to store such information, you should place it in a database, Memcached, or other such storage that can be accessed by any instance that is chosen to be the leader.
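As a sketch of that advice, the example below keeps a running total in a shared store instead of in the worker itself, so whichever instance is elected next can carry on. SharedStore and TallyingLeader are hypothetical names, and the Hash inside SharedStore merely stands in for a database or Memcached so that the example runs on its own.

```ruby
# Hypothetical stand-in for a database or Memcached. A plain Hash is used
# only so the example is self-contained.
class SharedStore
  def initialize
    @data = {}
  end

  def read(key)
    @data[key]
  end

  def write(key, value)
    @data[key] = value
  end
end

# Hypothetical leader that keeps no state of its own.
class TallyingLeader
  def initialize(store)
    @store = store
  end

  # Add one amount to the running total held in the shared store.
  def record(amount)
    total = (@store.read('pork_total') || 0) + amount
    @store.write('pork_total', total)
    total
  end
end

store = SharedStore.new
first_leader = TallyingLeader.new(store)
first_leader.record(100)
first_leader.record(250)

# A different instance becomes leader and continues from the same total.
second_leader = TallyingLeader.new(store)
second_leader.record(50)  # => 400
```

Had the total lived in an instance variable of first_leader, the second leader would have started again from zero.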

If we were to kill the instance that is currently the leader, we should see the other, still-running instance elected leader at the end of the current iteration:

As you can see, a different process took over as leader and continued to work on processing our pork queue.

Conclusion

Earlier in the chapter I described a problem I was having. I had identical server instances, but I wanted only one to work on background tasks at a time. I also needed to make sure that if a new instance was brought online, either by me or automatically by the system, it didn’t start processing background tasks. I also had the rather common problem of making sure that if the instance that was doing the background tasks died, another instance would pick up the mantle and start processing the background tasks.

Politics managed to solve those problems for me. By writing a simple class I was able to start instances of that class on each server instance when it starts, and I could rest assured that only one instance of the class would be processing tasks at any given time. I also could take comfort in the knowledge that should one of my instances die, or need to be taken offline for any reason, another instance would kick in to do the processing.

There certainly are other ways to solve this problem. A technology such as Delayed Job (Chapter 10, “Delayed Jobs”) also could be used, but it takes a different approach to solving these problems. I will say that eventually I did stop using Politics in favor of Delayed Job. That was not a slight on Politics but rather a desire to have a unified system for dealing with processing tasks. You might be wondering why I still chose to write about Politics, despite having stopped using it myself. The reason is that it is a good demonstration of using token workers for managing distributed programming tasks, plus there definitely are places where it is a more appropriate technology than something else like Delayed Job.

The library has a few shortcomings; the biggest one for me is the use of mDNS and not Rinda for its autodiscovery backend. However, it is well written, and it definitely gets the job done. Later in the book, when we look at tools such as distributed message queues, I want you to keep Politics in mind. Think about how using Politics will allow you to keep on top of your queues in an easy-to-use, easy-to-configure fashion.