Distributed Programming with Ruby

Mark Bates

Upper Saddle River, NJ • Boston • Indianapolis • San Francisco
New York • Toronto • Montreal • London • Munich • Paris • Madrid
Capetown • Sydney • Tokyo • Singapore • Mexico City

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals.

The author and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.

The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact:

U.S. Corporate and Government Sales
800-382-3419
[email protected]

For sales outside the United States, please contact:

International Sales
[email protected]

Visit us on the web: informit.com/ph

Editor-in-Chief
Mark Taub

Acquisitions Editor
Debra Williams Cauley

Development Editor
Songlin Qiu

Managing Editor
Kristy Hart

Senior Project Editor
Lori Lyons

Copy Editor
Gayle Johnson

Indexer
Brad Herriman

Proofreader
Apostrophe Editing Services

Publishing Coordinator
Kim Boedigheimer

Cover Designer
Chuti Prasertsith

Compositor
Nonie Ratcliff

Library of Congress Cataloging-in-Publication Data:

Bates, Mark, 1976-
  Distributed programming with Ruby/Mark Bates.
         p. cm.
  Includes bibliographical references and index.
  ISBN 978-0-321-63836-6 (pbk. : alk. paper) 1. Ruby (Computer program language)
2. Electronic data processing—Distributed processing. 3. Object-oriented methods
(Computer science) I. Title.
  QA76.73.R83B38 2010
  005.1’17—dc22
                                                                       2009034095

Copyright © 2010 Pearson Education, Inc.

All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, write to:

Pearson Education, Inc.
Rights and Contracts Department
501 Boylston Street, Suite 900
Boston, MA 02116
Fax: 617-671-3447

ISBN-13: 978-0-321-63836-6
ISBN-10: 0-321-63836-0
Text printed in the United States on recycled paper at RR Donnelley and Sons in
Crawfordsville, Indiana
First printing November 2009

To Rachel, Dylan, and Leo.

Thanks for letting Daddy hide away until the wee hours of the morning and be absent most weekends. I love you all so very much, and I couldn’t have done this without the three of you.

Contents

Foreword

Preface

Part I Standard Library

1 Distributed Ruby (DRb)

Hello World

Proprietary Ruby Objects

Security

Access Control Lists (ACLs)

DRb over SSL

ID Conversion

Built-in ID Converters

Building Your Own ID Converter

Using Multiple ID Converters

Conclusion

Endnotes

2 Rinda

“Hello World” the Rinda Way

Understanding Tuples and TupleSpaces

Writing a Tuple to a TupleSpace

Reading a Tuple from a TupleSpace

Taking a Tuple from a TupleSpace

Reading All Tuples in a TupleSpace

Callbacks and Observers

Understanding Callbacks

Implementing Callbacks

Security with Rinda

Access Control Lists (ACLs)

Using Rinda over SSL

Selecting a RingServer

Renewing Rinda Services

Using a Numeric to Renew a Service

Using nil to Renew a Service

Using the SimpleRenewer Class

Custom Renewers

Conclusion

Endnotes

Part II Third-Party Frameworks and Libraries

3 RingyDingy

Installation

Getting Started with RingyDingy

“Hello World” the RingyDingy Way

Building a Distributed Logger with RingyDingy

Letting RingyDingy Shine

Conclusion

4 Starfish

Installation

Getting Started with Starfish

“Hello World” the Starfish Way

Using the Starfish Binary

Saying Goodbye to the Starfish Binary

Building a Distributed Logger with Starfish

Letting Starfish Shine

MapReduce and Starfish

Using Starfish to MapReduce ActiveRecord

Using Starfish to MapReduce a File

Conclusion

Endnotes

5 Distribunaut

Installation

Blastoff: Hello, World!

Building a Distributed Logger with Distribunaut

Avoiding Confusion of Services

Borrowing a Service with Distribunaut

Conclusion

Endnotes

6 Politics

Installation

Working with Politics

Conclusion

Endnotes

Part III Distributed Message Queues

7 Starling

What Is a Distributed Message Queue?

Installation

Getting Started with Starling

“Hello World” the Starling Way

Building a Distributed Logger with Starling

Persisted Queues

Getting Starling Stats

Conclusion

Endnotes

8 AMQP/RabbitMQ

What Is AMQP?

Installation

“Hello World” the AMQP Way

Building a Distributed Logger with AMQP

Persisted AMQP Queues

Subscribing to a Message Queue

Topic Queues

Fanout Queues

Conclusion

Endnotes

Part IV Distributed Programming with Ruby on Rails

9 BackgrounDRb

Installation

Offloading Slow Tasks with BackgrounDRb

Configuring BackgrounDRb

Persisting BackgrounDRb Tasks

Caching Results with Memcached

Conclusion

Endnotes

10 Delayed Job

Installation

Sending It Later with Delayed Job

Custom Workers and Delayed Job

Who’s on First, and When Does He Steal Second?

Configuring Delayed Job

Conclusion

Endnotes

Index

Foreword

Mark’s career in programming parallels mine to a certain degree. We both started developing web applications in 1996 and both did hard time in the Java world before discovering Ruby and Rails in 2005, and never looking back.

At RubyConf 2008 in Orlando, I toasted Mark on his successful talk as we sipped Piña Coladas and enjoyed the “fourth track” of that conference: the lazy river and hot tub. The topic of our conversation? Adding a title to the Professional Ruby Series in which Mark would draw from his experience building Mack, a distributed web framework, as well as his long career doing distributed programming. But most important, he would let his enthusiasm for the potentially dry subject draw in the reader while being educational. I sensed a winner, but not only as far as finding the right author. The timing was right, too.

Rails developers around the world are progressing steadily beyond basic web programming as they take on large, complex systems that traditionally would be done on Java or Microsoft platforms. As a system grows in scale and complexity, one of the first things you need to do is to break it into smaller, manageable chunks. Hence all the interest in web services. Your initial effort might involve cron jobs and batch processing. Or you might implement some sort of distributed job framework, before finally going with a full-blown messaging solution.

Of course, you don’t want to reinvent anything you don’t need to, but Ruby’s distributed programming landscape can be confusing. In the foreground is Ruby’s DRb technology, part of the standard library and relatively straightforward to use, especially for those of us familiar with parallel technologies in other languages, such as Java’s RMI. But does that approach scale? And is it reliable? If DRb is not suitable for your production use, what is? If we cast our view further along the landscape, we might ask: “What about newer technologies like AMQP and RabbitMQ? And how do we tie it all together with Rails in ways that make sense?”

Mark answers all those questions in this book. He starts with some of the deepest documentation on DRb and Rinda that anyone has ever read. He then follows with coverage of the various Ruby libraries that depend on those building blocks, always keeping in mind the practical applications of all of them. He covers assembling cloud-based servers to handle background processing, one of today’s hottest topics in systems architecture. Finally, he covers the Rails-specific libraries BackgrounDRb and Delayed Job and teaches you when and how to use each.

Ultimately, one of my most pleasant surprises and one of the reasons that I think Mark is an up-and-coming superstar of the Ruby community is the hard work, productivity, and fastidiousness that he demonstrated while writing this book. Over the course of the spring and summer of this year, Mark delivered chapters and revisions week after week with clockwork regularity. All with the utmost attention to detail and quality. All packed with knowledge. And most important, all packed with strong doses of his winning personality. It is my honor to present to you the latest addition to our series, Distributed Programming with Ruby.

Obie Fernandez, Series Editor
September 30, 2009

Preface

I first found a need for distributed programming back in 2001. I was looking for a way to increase the performance of an application I was working on. The project was a web-based email client, and I was struggling with a few performance issues. I wanted to keep the email engine separate from the client front end. That way, I could have a beefier box handle all the processing of the incoming email and have a farm of smaller application servers handling the front end of it. That seems pretty easy and straightforward, doesn’t it? Well, the language I was using at the time was Java, and the distributed interface was RMI (remote method invocation). Easy and straightforward are not words I would use to describe my experiences with RMI.

Years later I was working on a completely different project, but I had a not-too-dissimilar problem—performance. The application this time was a large user-generated content site built using Ruby on Rails. When a user wrote, edited, or deleted an article for the site, it needed to be indexed by our search engine, our site map needed to be rebuilt, and the article needed to be injected into the top of our rating engine system. As you can imagine, none of this was quick and simple. You can also probably guess that our CEO wanted all of this to happen as close to real time as possible, but without the end user’s having to wait for everything to get done. To further complicate matters, we had limited system resources and millions of articles that needed to be processed.

I didn’t want to burden our already-overworked application server boxes with these tasks, so I had to offload the processing to another machine. The question became how I could best offload this work. The first idea was to use the database as the transfer mechanism. I could store all the information in the database that these systems would need. Then the machine that was to do the processing could poll the database at a regular interval, find any pending tasks, pull them out of the database, create the same heavy objects I already had, and then start processing them. The problem, as you most likely already know, is that I’m now placing more load on the database. I would be polling it continually, regardless of whether it contained any tasks. If it did have tasks, I would have to pull those records out of the database and use more system resources transforming the records back into those same heavy Ruby objects I already had.

What I really wanted to do was just send the fully formed Ruby objects I had already created to the other machine and let it do the processing. This would lessen the burden all around. In addition to the lighter load on the database, memory, and system resources, the machine doing the processing would work only when it was told to, and it wouldn’t waste resources by continually polling the database. Plus, without polling, the parts of the application the CEO wanted updated in near real time would get updated faster.

After I realized that what I wanted to do was to use some sort of distributed mechanism, that’s when I decided to see what sort of RMI-esque features Ruby had. I was already impressed with Ruby for being a terse language, but when I found the DRb (Distributed Ruby, also known as dRuby) package, I became a believer. I found that writing distributed applications in Ruby could be simple, and dare I say fun.
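To give a rough feel for that idea, here is a minimal sketch of handing an already-built Ruby object to a DRb service. The `Article` struct and `Indexer` class are hypothetical names invented for this illustration; they are not taken from the project described above. To keep the sketch self-contained, the service and the caller live in the same process, so DRb may dispatch the call locally; in a real deployment they would run on separate machines, and the URI would name the remote host.

```ruby
require 'drb'

# Hypothetical value object standing in for a "heavy" Ruby object.
Article = Struct.new(:id, :title)

# Hypothetical worker; a real one would update a search index.
class Indexer
  # Receives a fully formed Article and returns a status string.
  def index(article)
    "indexed article #{article.id}: #{article.title}"
  end
end

# Expose the worker over DRb. Port 0 asks the operating system
# for any free port; DRb.uri then reports the address in use.
DRb.start_service('druby://localhost:0', Indexer.new)
uri = DRb.uri

# The caller only needs the URI to obtain a proxy for the worker.
indexer = DRbObject.new_with_uri(uri)
result = indexer.index(Article.new(42, 'Distributed Ruby'))
puts result

DRb.stop_service
```

The caller never polls anything: the worker does its job only when `index` is invoked, which is the behavior the polling approach could not provide.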

Who Is This Book For?

This book is quite simply written for the intermediate to advanced Ruby developer who wants to start developing distributed applications. This book assumes that you have good knowledge of Ruby, at least at the intermediate developer level. Although we will touch on some parts of the Ruby language—particularly those that might be confusing when dealing with distributed applications—we will not be going into the language in depth.

Although you should know Ruby, this book assumes that you probably do not understand distributed programming and that this is your first venture into this world. If you have done distributed programming before, this book will help you quickly understand how to do it in Ruby. If you haven’t, this book will help you understand what distributed programming is and isn’t.

How Is This Book Organized?

This book is split into four parts. Part I examines what ships with the standard library in Ruby 1.8.x and beyond. We look, in depth, at understanding how DRb (dRuby or Distributed Ruby) and Rinda work. We will build some simple applications in a variety of ways and use those examples to talk about the libraries. We examine the pros and cons of DRb and Rinda. By the end of Part I, “Standard Library,” you should feel comfortable and ready to build your distributed applications using these libraries.

Part II, “Third-Party Frameworks and Libraries,” looks at a variety of third-party tools, libraries, and frameworks designed to make distributed programming in Ruby easy, fun, and robust. Some of these libraries build on the DRb and Rinda libraries we learned about in Part I, and others don’t. Some are based on executing arbitrary code on another machine. Others are based on running code in the background to elevate performance.

Part III, “Distributed Message Queues,” takes a close look at some of the leading distributed message queues available to the Ruby community. These queues can help facilitate communication and tasks between your applications. Distributed message queues can help increase your applications’ performance by queuing up work to be done at a later date instead of at runtime.

Finally, Part IV, “Distributed Programming with Ruby on Rails,” looks at a few libraries that are designed to work exclusively with the Ruby on Rails web framework. These libraries might already be familiar to you if you have been using Ruby on Rails for several years. But there is always something to be learned, and that’s what the chapters in this part of this book will help you with.

During the course of the book, we will examine a breadth of different technologies; however, this book is not necessarily a how-to guide. Instead, you will use these different technologies to help understand the complex problems associated with distributed programming and several different ways you can solve these problems. You’ll use these technologies to learn about RMI, message queues, and MapReduce, among others.

How to Run the Examples

I have tried to make this book as easy to use and follow as possible. When a new technology is referenced or introduced, I give you a link to find out more about it and/or its developer(s). When you see a code sample, unless otherwise stated, I present that sample in its entirety. I have also taken extra effort to make sure that you can easily run each of those code samples as is. Unless otherwise stated, you should be able to take any code sample, copy it into a Ruby file, and run it using the ruby command, like this:

$ ruby foo.rb

There are times when a file needs to be named something specific or has to be run with a special command. In particular, Chapter 4, “Starfish,” covers this issue. At that time I will call your attention to these details so that you can run the examples without hassle.

In some chapters, such as Chapters 2, “Rinda,” and 8, “AMQP/RabbitMQ,” background servers need to be run for the examples to run correctly. It is highly recommended that you restart these background servers between each set of examples that are presented in these chapters. A lot of these chapters iteratively build on a piece of software, and restarting the servers between runs helps eliminate potentially confusing results.

Acknowledgments

Writing a book isn’t easy. I know that’s an obvious statement, but sometimes I think people just don’t quite get what goes into writing a book. I didn’t think it would be this difficult. Thankfully, though, I have somehow made it out the other side. I’m a little (more like a lot) battered, bruised, and very tired, but it was definitely worth it.

However, I couldn’t have done this without a lot of help from a lot of different people. As with a good Oscar speech, I’ll try to keep this brief, and I’m sure, as with an Oscar speech, I’ll leave out some people. If I’ve left you out, I apologize. Now, let’s see if I can get through this before the orchestra plays me off.

First, and foremost, I have to thank my family. Rachel, my beautiful wife, has been so supportive and understanding, not just with this book, but with everything I do. I know that she would’ve loved to have had me spend my weekend afternoons going for walks with her. Or to have me do the stuff around the house that needs to get done. Instead, she let me hide away in my office/studio, diligently (sometimes) working on my book. The same goes for Dylan, my son. I’m sure he would’ve preferred to have Daddy playing with him all day. I’m all yours now, little buddy. And to little Leo: This book and you share a very similar timeline—only two days separate your birth and this book going to print. Welcome, son! Your mother and big brother will tell you this hasn’t been easy, and you’re better for having slept through the whole thing.

Before I get off the subject of family, I would like to thank my parents. The reasons are obvious. They brought me into this world. (And, from what I’ve been told, they can take me out as well.) They have always supported me and have made me the man I am today. Because of them I am not afraid to take risks. I’m not afraid to fail. In general, I’m not afraid. Except for dogs. I’m afraid of dogs, but I don’t think that’s my parents’ fault.

I would also like to quickly thank the rest of my friends, family, and coworkers. Mostly I’m thanking them for not telling me to shut up whenever I started talking about my book, which, let me tell you, was a lot. Even I got tired of hearing about it!

In November 2008, I gave a presentation titled “Building Distributed Applications” at RubyConf in Florida. After my presentation I was approached by a couple of gentlemen telling me how much they enjoyed my talk. They wanted to know where they could find out more about DRb and Rinda. I told them that unfortunately very little documentation on the subject existed—just a few blog posts here and there, and the code itself. They told me I should write a book about distributed programming with Ruby, adding that they would order it in a heartbeat. I thought it was a great idea. Shortly before I sent my manuscript to the publisher, I received an email from one of these gentlemen, Ali Rizvi. He had stumbled across one of my blog posts on a completely unrelated subject (the iPhone), and he realized who I was and that I was writing this book. He dropped me a quick note to say hi and that he was looking forward to reading the book. So Ali, now that I know your name, thank you for the idea!

At that same conference I found myself having a few drinks in the hot tub with none other than Obie Fernandez, the Professional Ruby Series editor for Addison-Wesley. He told me how much he enjoyed my presentation earlier that day. I used the opportunity to pitch him my book idea—the one I’d had only an hour before. He loved the idea and told me he thought it would be a great book, and he would love to be a part of it. A few weeks later I received an email from Debra Williams Cauley at Addison-Wesley, wanting to talk to me about the book. The rest, as they say, is history.

Obie and Debra have been my guiding light with this book. Obie has given me great advice and guidance on writing it. His direction as a series editor has been invaluable. Thank you, Obie, for your mentoring, and thank you for helping me get this opportunity.

Debra, thank you. Thank you so much. Debra managed this book. She answered all my questions (some good, some bad); she was always there with an answer. She never told me a request was too outrageous. She helped guide me through the treacherous waters of book writing, and it’s because of her that I managed to make it through to the other end mostly unscathed. I can’t say enough great things about Debra, and I know I can never thank her as much as she deserves to be thanked with regard to this book. Thank you, Debra.

I would like to thank Songlin Qiu. Songlin’s amazing technical editing is, quite frankly, what made this book readable. She constantly kept me on my toes and made sure not only that the book was consistent, but also that it was well written and worth reading. I’m pretty sure she also fixed a million misuses of the “its” that appeared in the book. Thank you, Songlin.

Gayle Johnson also deserves a thank you here for her copy editing. She is the one who turned my words into poetry. Well, maybe poetry is an exaggeration, but trust me: this book is a lot more enjoyable to read because of her. She turned my Guinness-soaked ramblings into coherent English. Thank you, Gayle.

Lori was my project editor on this book. She helped to guide me through the murky waters that are the copy editing/pre-production phase of writing a book. Thank you, Lori, for helping me take my book to the printer.

I would like to acknowledge another group of people—technical reviewers. They read the book and told me all the things they don’t like about it. Just kidding—sort of. They are my peers. Their job is to read the book and give me feedback on what they liked, disliked, and were indifferent to. Their comments ranged from “Why didn’t you talk about such-and-such?” to “I like how you flow from this subject to that one.” Some of these people I came to absolutely love, either because they offered me great advice or because they liked what I had done. Others I came to be frustrated with, either because I didn’t like their comments or because they were right, and I don’t like being wrong. Either way, all the feedback was extremely helpful. So with that said, here is a list of those people, in no particular order: Gregg Pollack, Robert P.J. Day, Jennifer Lindner, and Ilya Grigorik. Thank you all so very much.

I want to thank everyone at Addison-Wesley who worked on this book. Thank you to those who dedicated their time to making my dream into a reality. I know there are people working hard in the background whom I am unaware of. To all of you, from the cover art, to the technical editing, to the page layout, to the technical reviewers, to the person who corrects my spelling: thank you.

Finally, thank you. Thank you for spending your time and your money on this book. I appreciate it very, very much.

About the Author

Mark Bates has been developing web applications of one kind or another since 1996. He has spent an ungodly amount of time programming Java, but thankfully he discovered Ruby in late 2005, and life has been much nicer since.

Since discovering Ruby, Mark has become a prominent member of the community. He has developed various open-source projects, such as Configatron, Cachetastic, Genosaurus, APN on Rails, and the Mack Framework, just to name a few. The Mack Framework brought Mark to the forefront of distributed programming in the Ruby community. Mack was a web framework designed from the ground up to aid in the development of distributed applications.

Mark has taught classes on both Ruby and Ruby on Rails. He has spoken at several Ruby gatherings, including 2008’s RubyConf, where he spoke about building distributed applications.

Mark has an honors degree in music from the Liverpool Institute for Performing Arts. He still likes to rock out on the weekends, but set times are now 10 p.m., not 2 a.m.

He lives just outside of Boston with his wife Rachel and their son Dylan, both of whom he missed very much when writing this book.

Mark can be found at http://www.markbates.com and http://github.com/markbates.
