Chapter 6 Politics

In November of 2008, I presented “Building Distributed Applications” at RubyConf. The day after my presentation, I saw a gentleman by the name of Mike Perham[1] present a project he was working on, called “Politics.” I remember thinking that it seemed like an interesting idea, but I didn’t quite get what he was trying to do at the time. I squirreled it away as something to keep my eye on and eventually try to figure out.

Fast-forward several months, and I’m at work discussing with a colleague a problem we are having. We have identical application server instances running in production, using a popular “cloud” hosting provider. Each one of these instances is born of the same base image, meaning that each instance is loaded with the same software and is configured identically. With autoscaling, the automated launching of a new instance in response to load, we are not always in charge of launching an instance; the “cloud” does it for us.

The problem we were running into was this: How do we configure just one machine to run our background jobs, scheduled tasks, and other queue-type processing? We didn’t want every machine to perform these tasks, for a couple of reasons. First, we didn’t want to worry about these processes stepping on each other’s toes by processing the same task. Second, we didn’t want to add load to every instance when we could burden just one.

The question then became: How do we do this? How do we configure one instance to perform these tasks, and not the others, when they are all identical instances? Also, what if the instance we configure crashes or goes down? Do we then have to manually configure another instance to be the one that performs the tasks?

We chewed on these questions over lunch, and we couldn’t come up with a decent solution that made us both happy. A few days later I was watching a popular Sunday morning political roundtable talk show, and I remembered Perham’s Politics.[2] I sat down, started to play with the software, and realized two things: It might solve our problems, and it was easy to use.

Politics provides modules that allow us to build a self-repairing worker class that can be run on all our instances, yet it designates only one instance at a time to do the work for a given time period. Politics calls this worker a token worker.

Under the covers, Politics uses three different technologies to maintain order among all the worker classes. One of them we already know: DRb. The other two are Memcached[3] and mDNS[4] (also known as Multicast DNS, Zero Configuration Networking, or Bonjour).[5]

Installation

To use Politics, you must first have both Memcached and mDNS installed on your system. If you are using Mac OS X, you already have mDNS installed. The same goes for most flavors of UNIX. If your system does not have mDNS installed already, consult the package management system for your operating system to find out how to install it. Memcached will need to be installed on most machines. Instructions on how to do this can be found at http://www.danga.com/memcached/.

After you have installed, or confirmed the installation of, Memcached and mDNS, you can easily install Politics using RubyGems:

$ gem install mperham-politics -s http://gems.github.com

You should see a message similar to the following:

Successfully installed mperham-politics-0.2.5

With that, your installation should be complete, and you should be ready to go.
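Before moving on, it can be worth confirming that Memcached is actually reachable, because the workers have nothing to coordinate through if it is not. The following is a minimal sketch that uses only Ruby’s standard library and Memcached’s text-protocol version command. The memcached_version helper is a name I made up for illustration, and the default host and port are assumptions about a local installation.

```ruby
require 'socket'
require 'timeout'

# Hypothetical helper (not part of Politics): asks a Memcached server for its
# version over the plain-text protocol. Returns the version string, or nil if
# the server cannot be reached or does not answer like Memcached.
def memcached_version(host = 'localhost', port = 11211, seconds = 2)
  Timeout.timeout(seconds) do
    TCPSocket.open(host, port) do |sock|
      sock.write("version\r\n")           # Memcached replies "VERSION x.y.z"
      line = sock.gets
      line && line[/\AVERSION (\S+)/, 1]
    end
  end
rescue SystemCallError, SocketError, Timeout::Error
  nil
end
```

Calling memcached_version with no arguments checks localhost on port 11211. A nil result suggests Memcached is not running there, or is listening somewhere else.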

Working with Politics

As I mentioned earlier, one of the problems I was having was making sure that all the processes I had running didn’t step on each other’s toes and spend cycles doing the same work. This is a common problem when dealing with distributed programming. A typical example occurs when we are working with distributed message queues. Part III of this book, “Distributed Message Queues,” discusses distributed message queues in greater detail. For now, simply know that what we discuss in this section can make processing those queues extremely effective and powerful.

For our example, let’s pretend we have just received a big juicy government contract. The government has a queue that is constantly being filled with how much pork-barrel spending is happening. Because so much spending is occurring, the government wants to ensure that the system is constantly running and self-repairing. Should any of the workers die, or be taken offline, another should take its place and continue to keep track of the pork that is being spent.

To help us solve this problem, Politics offers the Politics::TokenWorker module. What does this module offer us? The Politics::TokenWorker module, when included into a worker class, allows us to create a class that will either process the work we give it and act as the “leader,” or patiently sit and wait for its turn to be the “leader” while not doing any work. The beautiful part of what the Politics::TokenWorker module offers is that no extra coding is needed to figure out if the current instance of the class is a “worker” or a “leader.”

Right now you are probably wondering what the difference is between a leader and a worker. When we launch our classes that include the Politics::TokenWorker module, they are all the same—they are workers. Workers, in this case, are the opposite of what you might believe them to be. Workers actually don’t do any work. Instead, workers sit and wait for their chance to be a leader. Leaders, on the other hand, do all the work. I know—it seems a bit confusing, doesn’t it? I probably would have chosen slightly different names, but that’s just me.

When a worker or workers launch, they all connect to the instance of Memcached that they are configured to communicate with. When they do that, one of the instances gets a token (hence the name token worker) and becomes a leader for a specified interval. It is then the job of that leader to complete the tasks assigned during the given time period, or iteration length, as it is known in Politics. When the specified time frame has elapsed, all the instances again connect to Memcached, where one of them is again elected leader, and the cycle continues.
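To make the election cycle a little more concrete, here is a toy simulation of it. This is not how Politics is implemented, and none of these class names come from the library. It assumes only that the shared store offers an “add if absent” operation with an expiry time, which is what Memcached’s add command provides. Time is passed in explicitly so the sketch runs instantly.

```ruby
# Toy stand-in for the shared store: `add` succeeds only when the key is
# absent or the value stored under it has expired.
class ToyTokenStore
  def initialize
    @entries = {} # key => [value, expires_at]
  end

  # Returns true if the caller stored the value, false if a live value exists.
  def add(key, value, ttl, now)
    _, expires_at = @entries[key]
    return false if expires_at && now < expires_at
    @entries[key] = [value, now + ttl]
    true
  end
end

# Hypothetical candidate; every instance starts out as a plain worker.
class Candidate
  attr_reader :name

  def initialize(name, store, iteration_length)
    @name = name
    @store = store
    @iteration_length = iteration_length
  end

  # Try to grab the token at time `now`; true means "I am the leader".
  def claim_token(now)
    @store.add('token', @name, @iteration_length, now)
  end
end

# Every candidate tries to claim the token; at most one can succeed.
def elect(candidates, now)
  winners = candidates.shuffle.select { |c| c.claim_token(now) }
  winners.first && winners.first.name
end

store = ToyTokenStore.new
candidates = %w[a b c].map { |n| Candidate.new(n, store, 10) }

first  = elect(candidates, 0)   # one candidate wins the token
again  = elect(candidates, 5)   # nil: the token is still held mid-iteration
second = elect(candidates, 10)  # the token has expired, so a new election runs
```

Because the token simply expires at the end of the iteration, nothing special has to happen when a leader disappears. The remaining candidates compete for the token again at the next election.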

Let’s look at an example of how this works in practice. To start, we need a queue to serve all the pork that needs to be processed. The following class is that queue:

[Code listing: the PorkSpendingQueue class]

Although the PorkSpendingQueue should not be used in production (I highly recommend one of the great queues discussed in Part III of this book instead), it should certainly serve our simple needs. The pop method finds the smallest piece of pork that is stored on the file system, reads the contents of the file, deletes that file, and returns the file’s value. Should we run out of pork (which is highly unlikely), nil will be returned.

The start_spending method quite generously creates a bunch of pork for us in our queue. To make sure you have pork for the following examples, you should execute the start_spending method to fill your queue.
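A queue that behaves as just described might be sketched as follows. Treat this as a hypothetical reconstruction rather than the original listing. The storage directory, the .pork file extension, the dollar amounts, and the decision to name each file after its amount are all assumptions of mine. Only the pop and start_spending method names and their described behavior come from the text.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical reconstruction: each piece of pork is a file whose name and
# contents are a dollar amount.
class PorkSpendingQueue
  class << self
    # Directory that holds the pork files (an assumed location).
    attr_writer :dir

    def dir
      @dir ||= File.join(Dir.tmpdir, 'pork')
    end

    # Fill the queue with `count` pieces of pork worth 1000, 2000, and so on.
    def start_spending(count = 100)
      FileUtils.mkdir_p(dir)
      1.upto(count) do |i|
        amount = i * 1000
        File.write(File.join(dir, "#{amount}.pork"), amount.to_s)
      end
    end

    # Remove and return the smallest piece of pork, or nil if none is left.
    def pop
      files = Dir.glob(File.join(dir, '*.pork'))
      path = files.min_by { |f| File.basename(f, '.pork').to_i }
      return nil unless path
      amount = File.read(path)
      File.delete(path)
      amount
    end
  end
end
```

With this sketch, calling PorkSpendingQueue.start_spending once fills the directory, and each call to PorkSpendingQueue.pop hands back the next amount as a String. Nothing here guards against two processes popping the same file at once, which is exactly the kind of collision the token worker is meant to prevent.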

Now that we have a queue we can access, let’s see what our token worker class will look like:

[Code listing: the PorkSpendingTokenWorker class]

First we need to require a couple of Politics’ files, politics and politics/token_worker. When we have those, we require our pork_spending_queue file so that we can access our PorkSpendingQueue class that we built earlier.

In the definition of our PorkSpendingTokenWorker, we include the Politics::TokenWorker module to give us access to the methods and hooks we need to make our class into a token worker.

In our initialize method we need to connect this class to the system. We do that by calling the register_worker method we got when we included the Politics::TokenWorker module. The register_worker method takes two parameters. The first is the name of the group this worker should belong to. The second parameter is a Hash of options. Currently, only two options are supported.

The first is :iteration_length. This option is how long, in seconds, you want an iteration to be. An iteration is the span of time between when a leader is elected and when another leader needs to be elected. It is important to pick the correct length of time for your iteration. You want to make sure that it is long enough to actually process the tasks given to it. If you set this parameter too short, an exception is raised and the instance of that worker dies, never to be heard from again. However, if you set this parameter too long, there are other issues to deal with. First, if you set your worker up to process only so many tasks in the iteration length, then it will just sit there doing nothing after it finishes processing the tasks while it waits for the iteration to end. Second, if there is a problem with the current leader, a new one won’t be elected until the next iteration. So if you set your length to be an hour and your leader dies five minutes in, then you will have to wait another 55 minutes for the next leader to be elected. Although this might work for you in some circumstances, it is probably not going to work for you in most situations. The moral of this story is to take some time, do some benchmarking, and try to determine the most appropriate time for your system.

The second option is :servers. This option is an Array of Memcached server locations, each presented in the format host:port.

With our PorkSpendingTokenWorker registered with the system, we need to tell it what it should do if it becomes a leader. We define this within our start method. There is no requirement that this method be named start. In theory, you do not even need a separate method; you could put the code in your initialize method. But I prefer to have my code a bit cleaner than that, hence the separate method.
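To put rough numbers on the :iteration_length trade-off, a small benchmarking helper can time one batch of work and suggest a length with some headroom. Both helper names below are hypothetical, and the 1.5 safety factor is an arbitrary assumption rather than anything Politics prescribes. The second helper just restates the failover arithmetic from the hour-long example.

```ruby
require 'benchmark'

# Hypothetical helper: time the given block (one iteration's worth of work)
# and suggest an iteration length, in whole seconds, with some headroom.
def suggest_iteration_length(safety_factor = 1.5)
  elapsed = Benchmark.realtime { yield }
  (elapsed * safety_factor).ceil
end

# If a leader dies `died_at` units into an iteration, the other instances
# wait until the iteration ends before a new leader is elected.
def wait_for_next_election(iteration_length, died_at)
  iteration_length - died_at
end

# Stand-in workload: five short tasks.
length = suggest_iteration_length { 5.times { sleep 0.01 } }

# The example from the text, in minutes: an hour-long iteration whose
# leader dies five minutes in leaves a 55-minute wait.
wait_minutes = wait_for_next_election(60, 5)
```

In practice you would pass a block that processes a realistic batch from your own queue, and run it several times, before settling on a value.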

In our start method, we call the process method provided by the Politics::TokenWorker module. The process method is provided a block that tells the leader what to work on when the time comes. In our block we tell the class to process five pieces of pork per iteration. We call the pop method on the PorkSpendingQueue class. If a piece of pork is returned to us (not a nil value), we print a message to the screen. It tells us which PID, or process, has just spent the pork, and how much pork has been spent. After we have processed a piece of pork, we sleep for 1 second.
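Putting the pieces described so far together, the worker might look something like the following sketch. It is a reconstruction from the description, not the original listing. The group name, the iteration length, the server address, and the wording of the printed message are my assumptions. So that the sketch runs on its own, it falls back to a tiny stand-in for Politics::TokenWorker when the gem is not installed, and to a tiny in-memory queue when PorkSpendingQueue has not been loaded. Neither stand-in is part of Politics. The optional pause argument to start is also my addition.

```ruby
# Use the real gem when it is installed; otherwise fall back to a minimal
# stand-in (NOT part of Politics) so the sketch can run by itself.
begin
  require 'politics'
  require 'politics/token_worker'
rescue LoadError
  module Politics
    module TokenWorker
      # Stand-in: just remembers the group name and options.
      def register_worker(name, options = {})
        @group_name = name
        @worker_options = options
      end

      # Stand-in: acts as if this instance were leader for one iteration.
      def process
        yield
      end
    end
  end
end

# In-memory stand-in for the file-backed queue, used only when the real
# PorkSpendingQueue has not been loaded.
unless defined?(PorkSpendingQueue)
  class PorkSpendingQueue
    @pork = %w[1000 2000 3000]

    def self.pop
      @pork.shift
    end
  end
end

class PorkSpendingTokenWorker
  include Politics::TokenWorker

  def initialize
    # Group name, iteration length, and server address are assumed values.
    register_worker('pork-spenders',
                    :iteration_length => 10,
                    :servers => ['localhost:11211'])
  end

  # `pause` defaults to the 1 second described in the text.
  def start(pause = 1)
    process do
      # Work through at most five pieces of pork per iteration.
      5.times do
        pork = PorkSpendingQueue.pop
        next unless pork
        puts "PID #{Process.pid} just spent: $#{pork}"
        sleep pause
      end
    end
  end
end
```

Five pieces at one second each fit comfortably inside the assumed ten-second iteration, which leaves some slack in case a piece of pork takes longer than expected to process.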

At the bottom of our file you will notice these two lines:

p = PorkSpendingTokenWorker.new
p.start

These lines are there so that if you execute the file, it will create a new instance of the PorkSpendingTokenWorker class and call its start method.

If we execute this file, we should see something similar to the following printed to our screen:

[Output listing: a single PorkSpendingTokenWorker instance processing pork]

Now if you were to start a second instance of the PorkSpendingTokenWorker class, you would see that it will, most likely, just sit there idle. There is a possibility that at the end of the iteration it might become a leader, but I have found that in most cases the original stays the leader until it is killed. This is not always the case, however, and you certainly should not count on it when developing these classes.

If we were to kill the instance that is currently the leader, we should see the other, still-running instance get elected leader at the end of the current iteration:

[Output listing: the second instance taking over as leader]

As you can see, a different process took over as leader and continued to work on processing our pork queue.

Conclusion

Earlier in the chapter I described a problem I was having. I had identical server instances, but I wanted only one to work on background tasks at a time. I also needed to make sure that if a new instance was brought online, either by me or automatically by the system, it didn’t start processing background tasks. I also had the rather common problem of making sure that if the instance that was doing the background tasks died, another instance would pick up the mantle and start processing the background tasks.

Politics managed to solve those problems for me. By writing a simple class I was able to start instances of that class on each server instance when it starts, and I could rest assured that only one instance of the class would be processing tasks at any given time. I also could take comfort in the knowledge that should one of my instances die, or need to be taken offline for any reason, another instance would kick in to do the processing.

There certainly are other ways to solve this problem. A technology such as Delayed Job (Chapter 10, “Delayed Jobs”) also could be used, but it takes a different approach to solving these problems. I will say that eventually I did stop using Politics in favor of Delayed Job. That was not a slight on Politics but rather a desire to have a unified system for dealing with processing tasks. You might be wondering why I still chose to write about Politics, despite having stopped using it myself. The reason is that it is a good demonstration of using token workers for managing distributed programming tasks, plus there definitely are places where it is a more appropriate technology than something else like Delayed Job.

The library has a few shortcomings; the biggest one for me is the use of mDNS and not Rinda for its autodiscovery backend. However, it is well written, and it definitely gets the job done. Later in the book, when we look at tools such as distributed message queues, I want you to keep Politics in mind. Think about how using Politics will allow you to keep on top of your queues in an easy-to-use, easy-to-configure fashion.

Endnotes

1. http://www.mikeperham.com/

2. http://github.com/mperham/politics/tree/master

3. http://www.danga.com/memcached/

4. http://en.wikipedia.org/wiki/MDNS, http://www.multicastdns.org/

5. http://en.wikipedia.org/wiki/Bonjour_(software)
