Most of this book has emphasized that understanding how to use tools effectively is just as important as having a nice collection of tools. However, that’s not to say that knowing where to find the right tool for the job isn’t an invaluable skill to have. In this appendix, we’ll take a look at a small sampling of Ruby’s vast standard library. What you will find is that it is essentially a treasure chest of goodies designed to make your Ruby programs more enjoyable to write.
Because of RubyGems, we tend to leverage a lot of third-party software. For this reason, we are often more likely to resort to a Google search instead of a search of Ruby’s API documentation when we want to solve a problem that isn’t immediately handled in core Ruby. This isn’t necessarily a bad thing, but it is important not to overlook the benefits that come with using a standard library when it is available. When all else is equal, the gains you’ll get from using standard Ruby are easy to enumerate:
Ruby standard libraries are typically distributed with Ruby itself, which means that no extra software needs to be installed to make them work.
Standard libraries don’t change rapidly. Their APIs tend to be stable and mature, and will likely outlast your application’s development cycle. This removes the need for frequent compatibility updates that you might experience with third-party software.
With a few obvious exceptions, Ruby standard libraries are guaranteed to run anywhere Ruby runs, avoiding platform-specific issues.
Using standard libraries improves the understandability of your code, as they are available to everyone who uses Ruby. For open source projects, this might make contributions to your project easier, for the same reason.
These reasons are compelling enough to encourage us to check Ruby’s standard library before doing a Google search for third-party libraries. However, it might be more convincing if you have some practical examples of what can be accomplished without resorting to dependency code.
I’ve handpicked 10 of the libraries I use day in and day out. This isn’t necessarily meant to be a “best of” sampling, nor is it meant to point out the top 10 libraries you need to know about. We’ve implicitly and explicitly covered many standard libraries throughout the book, and some of those may be more essential than what you’ll see here. However, I’m fairly certain that after reading this appendix, you’ll find at least a few useful tricks in it, and you’ll also get a clear picture of how diverse Ruby’s standard library is.
Be sure to keep in mind that while we’re looking at 10 examples here, there are more than 100 standard libraries packaged with Ruby, about half of which are considered mature. These vary in complexity from simple tools to solve a single task to full-fledged frameworks. Even though you certainly won’t need to be familiar with every last package and what it does, it’s important to be aware of the fact that what we’re about to discuss is just the tip of the iceberg.
Now, on to the fun. I’ve included this appendix because, to me, it embodies a big part of the joy of Ruby programming. I hope that when reading through the examples, you feel the same way.
As I mentioned before, Ruby standard libraries run the gamut from extreme simplicity to deep complexity. I figured we’d kick things off with something in the former category.
If you’re reading this book, you’ve certainly made use of Kernel#p
during debugging. This handy method,
which calls #inspect
on an object and
then prints out its result, is invaluable for basic debugging needs.
However, reading its output for even relatively modest objects can be
daunting:
friends = [ { first_name: "Emily", last_name: "Laskin"  },
            { first_name: "Nick",  last_name: "Mauro"   },
            { first_name: "Mark",  last_name: "Maxwell" } ]

me = { first_name: "Gregory", last_name: "Brown", friends: friends }

p me

# Outputs:
#
# {:first_name=>"Gregory", :last_name=>"Brown", :friends=>[{:first_name=>
# "Emily", :last_name=>"Laskin"}, {:first_name=>"Nick", :last_name=>"Mauro"},
# {:first_name=>"Mark", :last_name=>"Maxwell"}]}
We don’t typically write our code this way, because the structure of
our objects actually means something to us. Luckily, the
pp standard library understands this and provides
much nicer human-readable output. The changes to use pp
instead of p
are fairly simple:
require "pp"

friends = [ { first_name: "Emily", last_name: "Laskin"  },
            { first_name: "Nick",  last_name: "Mauro"   },
            { first_name: "Mark",  last_name: "Maxwell" } ]

me = { first_name: "Gregory", last_name: "Brown", friends: friends }

pp me

# Outputs:
#
# {:first_name=>"Gregory",
#  :last_name=>"Brown",
#  :friends=>
#   [{:first_name=>"Emily", :last_name=>"Laskin"},
#    {:first_name=>"Nick", :last_name=>"Mauro"},
#    {:first_name=>"Mark", :last_name=>"Maxwell"}]}
Like when you use p
, there is a
hook to set up pp
overrides for your
custom objects. This is called pretty_print
. Here’s a simple implementation
that shows how you might use it:
require "pp"

class Person
  def initialize(first_name, last_name, friends)
    @first_name, @last_name, @friends = first_name, last_name, friends
  end

  def pretty_print(printer)
    printer.text "Person <#{object_id}>:\n" <<
                 "  Name: #@first_name #@last_name\n  Friends:\n"
    @friends.each do |f|
      printer.text "    #{f[:first_name]} #{f[:last_name]}\n"
    end
  end
end

friends = [ { first_name: "Emily", last_name: "Laskin"  },
            { first_name: "Nick",  last_name: "Mauro"   },
            { first_name: "Mark",  last_name: "Maxwell" } ]

person = Person.new("Gregory", "Brown", friends)

pp person

#=> outputs:
#
# Person <1013900>:
#   Name: Gregory Brown
#   Friends:
#     Emily Laskin
#     Nick Mauro
#     Mark Maxwell
As you can see here, pretty_print
takes an argument, which is an instance of the current pp
object. Because pp
inherits from PrettyPrint
, a class provided by Ruby’s
prettyprint standard library, it
provides a whole host of formatting helpers for indenting, grouping, and wrapping structured
data output. We’ve stuck with the raw text()
call here, but it’s worth mentioning that
there is a lot more available to you if you need it.
A benefit of indirectly displaying your output through a printer
object is that it allows pp
to give you an
inspect
-like method that returns a string. Try person.pretty_print_inspect
to see how this works. The string represents exactly what would be printed
to the console, just like obj.inspect
would. If you wish to use pretty_print_inspect
as your default inspect
method (and therefore make p
and pp
work the same), you can do so easily with an alias:
class Person
  # other code omitted

  alias_method :inspect, :pretty_print_inspect
end
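To see the effect of the alias in isolation, here is a minimal, self-contained sketch (the Point class is invented purely for illustration): once inspect is aliased to pretty_print_inspect, p and pp agree on the output.

```ruby
require "pp"

# Hypothetical class, just to demonstrate the alias trick on a small scale.
class Point
  def initialize(x, y)
    @x, @y = x, y
  end

  # Custom pretty_print hook used by pp
  def pretty_print(printer)
    printer.text "Point(#{@x}, #{@y})"
  end

  # Make p (which calls inspect) produce the same output as pp
  alias_method :inspect, :pretty_print_inspect
end

p Point.new(1, 2)   #=> Point(1, 2)
```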
Generally speaking, pp
does a
pretty good job of rendering the debugging output for even relatively
complex objects, so you may not need to customize its behavior often.
However, if you do have a need for specialized output, you’ll find that
the pretty_print
hook provides
something that is actually quite a bit more powerful than Ruby’s default
inspect
hook, and that can really come
in handy for certain needs.
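To hint at what those formatting helpers look like in practice, here is a small sketch (the Tree class is made up for illustration) using a few of the printer methods beyond text(): group, breakable, and recursive pp calls. Nested structures stay on one line when they fit within the output width and wrap otherwise.

```ruby
require "pp"

# Invented Tree class showing a few of the helpers the printer provides.
class Tree
  def initialize(label, children = [])
    @label, @children = label, children
  end

  def pretty_print(printer)
    # group manages wrapping: children stay on one line if they fit,
    # and break onto new lines otherwise
    printer.group(1, "(", ")") do
      printer.text @label
      @children.each do |child|
        printer.breakable   # a space that can become a newline
        printer.pp child    # recurse into the child's pretty_print
      end
    end
  end
end

tree = Tree.new("root", [Tree.new("left"), Tree.new("right")])
pp tree   #=> (root (left) (right))
```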
Like most other modern programming languages, Ruby ships with
libraries for working with some of the most common network protocols,
including FTP and HTTP. However, the Net::FTP
and Net::HTTP
libraries are designed primarily for
heavy lifting at the low level. They are great for this purpose, but they
leave something to be desired when all that is needed is to grab a
remote file or do some basic web scraping. This is where open-uri
shines.
The way open-uri
works is by
patching Kernel#open
to accept URIs.
This means we can directly open remote files and work with them. For
example, here’s how we’d print out Ruby’s license using open-uri
:
require "open-uri"

puts open("http://www.ruby-lang.org/en/LICENSE.txt").read

#=> "Ruby is copyrighted free software by Yukihiro Matsumoto <[email protected]>.
#    You can redistribute it and/or modify it under either the terms of the GPL
#    (see COPYING.txt file), or the conditions below: ..."
If we encounter an HTTP error, an OpenURI::HTTPError
will be raised, including the
relevant error code:
>> open("http://majesticseacreature.com/a_totally_missing_document")
OpenURI::HTTPError: 404 Not Found
        from /usr/local/lib/ruby/1.8/open-uri.rb:287:in 'open_http'
        from /usr/local/lib/ruby/1.8/open-uri.rb:626:in 'buffer_open'
        from /usr/local/lib/ruby/1.8/open-uri.rb:164:in 'open_loop'
        from /usr/local/lib/ruby/1.8/open-uri.rb:162:in 'catch'
        from /usr/local/lib/ruby/1.8/open-uri.rb:162:in 'open_loop'
        from /usr/local/lib/ruby/1.8/open-uri.rb:132:in 'open_uri'
        from /usr/local/lib/ruby/1.8/open-uri.rb:528:in 'open'
        from /usr/local/lib/ruby/1.8/open-uri.rb:30:in 'open'
        from (irb):10
        from /usr/local/lib/ruby/1.8/uri/generic.rb:250

>> open("http://prism.library.cornell.edu/control/authBasic/authTest/")
OpenURI::HTTPError: 401 Authorization Required
        from /usr/local/lib/ruby/1.8/open-uri.rb:287:in 'open_http'
        from /usr/local/lib/ruby/1.8/open-uri.rb:626:in 'buffer_open'
        from /usr/local/lib/ruby/1.8/open-uri.rb:164:in 'open_loop'
        from /usr/local/lib/ruby/1.8/open-uri.rb:162:in 'catch'
        from /usr/local/lib/ruby/1.8/open-uri.rb:162:in 'open_loop'
        from /usr/local/lib/ruby/1.8/open-uri.rb:132:in 'open_uri'
        from /usr/local/lib/ruby/1.8/open-uri.rb:528:in 'open'
        from /usr/local/lib/ruby/1.8/open-uri.rb:30:in 'open'
        from (irb):7
        from /usr/local/lib/ruby/1.8/uri/generic.rb:250
The previous example was a small hint about another feature of open-uri: HTTP basic authentication. Notice what happens when we provide a username and password while accessing the same URI:
>> open("http://prism.library.cornell.edu/control/authBasic/authTest/",
?>      :http_basic_authentication => ["test", "this"])
=> #<StringIO:0x2d1810>
Success! You can see here that open-uri
represents the returned file as a StringIO
object,
which is why we can call read
to get
its contents. Of course, we can use most other I/O operations as well, but I won’t get into
that here.
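To give a flavor of what those I/O operations look like, here is a quick sketch using a plain StringIO in place of a live HTTP response (the content is made up, but the interface is identical to what open-uri returns for bodies that fit in memory):

```ruby
require "stringio"

# Simulating the object open-uri would hand back for a small response.
io = StringIO.new("line one\nline two\nline three\n")

p io.gets            #=> "line one\n"
p io.read            #=> "line two\nline three\n" (everything that's left)

io.rewind            # and we can seek back to the beginning
p io.readlines.size  #=> 3
```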
As I mentioned before, open-uri also wraps
Net::FTP
, so you could even do
something like download Ruby with it:
open("ftp://ftp.ruby-lang.org/pub/ruby/1.9/ruby-1.9.1-p0.tar.bz2") do |o|
  File.open(File.basename(o.base_uri.path), "w") { |f| f << o.read }
end
Here we see that even though the object returned by open()
is a StringIO
object, it includes some extra
metadata, such as the base_uri
of your
request. These helpers are provided by the OpenURI::Meta
module, and are worth looking over
if you need to get more than just the contents of a file back.
Although there are some advanced features to open-uri
, it is most useful for the simple cases
shown here. Because it returns a StringIO
object, any reasonably flexible interface can be extended to support
remote file downloads. For a
practical example, we can take a look at Prawn’s image embedding, which
assumes only that the object you pass to it responds to #read
:
Prawn::Document.generate("remote_images.pdf") do
  image open("http://prawn.majesticseacreature.com/media/prawn_logo.png")
end
This feature was accidentally enabled when we allowed the image()
method to accept
Tempfile
objects. Because open-uri
smoothly integrates with the rest of Ruby, you might find situations where
it can come in handy in a similar way in your own applications.
Core Ruby has a Time
class, but
we will encounter a lot of situations where we also need to work with
dates, or combinations of dates and times. Ruby’s
date standard library gives us Date
and DateTime
, and extends Time
with conversion methods for each of them.
This library comes packed with a powerful parser that can handle all sorts
of date formats, and a solid date formatting engine to output data based
on a template. Here are just a couple of trivial examples to give you a
sense of its flexibility:
>> Date.strptime("12/08/1985", "%m/%d/%Y").strftime("%Y-%m-%d")
=> "1985-12-08"
>> Date.strptime("1985-12-08", "%Y-%m-%d").strftime("%m/%d/%Y")
=> "12/08/1985"
>> Date.strptime("December 8, 1985", "%b%e, %Y").strftime("%m/%d/%Y")
=> "12/08/1985"
Date objects can also be queried for all sorts of information, as well as
manipulated to produce new dates:
>> date = Date.today
=> #<Date: 2009-02-09 (4909743/2,0,2299161)>
>> date + 1
=> #<Date: 2009-02-10 (4909745/2,0,2299161)>
>> date << 1
=> #<Date: 2009-01-09 (4909681/2,0,2299161)>
>> date >> 1
=> #<Date: 2009-03-09 (4909799/2,0,2299161)>
>> date.year
=> 2009
>> date.month
=> 2
>> date.day
=> 9
>> date.wday
=> 1
>> date + 36
=> #<Date: 2009-03-17 (4909815/2,0,2299161)>
Here we’ve just scratched the surface, but in the interest of
keeping a quick pace, we’ll dive right into an example. So far, we’ve been
looking at Date
, but now we’re going to
work with DateTime
. The two are
basically the same, except that the latter can hold time values as
well:
>> dtime = DateTime.now
=> #<DateTime: 2009-02-09T03:56:17-05:00 (...)>
>> [:month, :day, :year, :hour, :minute, :second].map { |attr| dtime.send(attr) }
=> [2, 9, 2009, 3, 56, 17]
>> dtime.between?(DateTime.now - 1, DateTime.now + 1)
=> true
>> dtime.between?(DateTime.now - 1, DateTime.now - 1)
=> false
What follows is a simplified look at a common problem: event scheduling. Basically, we want an object that provides functionality like this:
sched = Scheduler.new

sched.event "2009.02.04 10:00", "2009.02.04 11:30", "Eat Snow"
sched.event "2009.02.03 14:00", "2009.02.04 14:00", "Wear Special Suit"

sched.display_events_at '2009.02.04 10:20'
When given a specific date and time, display_events_at
would look up all the events
that were happening at that exact moment:
Events occurring around 10:20 on 02/04/2009
--------------------------------------------
14:00 (02/03) - 14:00 (02/04): Wear Special Suit
10:00 (02/04) - 11:30 (02/04): Eat Snow
If we look a little later in the day, we can see that in this particular example, although eating snow is a short-lived experience, the passion for wearing a special suit carries on:
sched.display_events_at '2009.02.04 11:45'

## OUTPUTS ##

Events occurring around 11:45 on 02/04/2009
--------------------------------------------
14:00 (02/03) - 14:00 (02/04): Wear Special Suit
As it turns out, implementing the Scheduler
class is pretty straightforward,
because DateTime
objects can be used as
endpoints in a Ruby range object. So when we look at these two events,
what we’re really doing is something similar to this:
>> a = DateTime.parse("2009.02.03 14:00") .. DateTime.parse("2009.02.04 14:00")
>> a.cover?(DateTime.parse("2009.02.04 11:45"))
=> true
>> b = DateTime.parse("2009.02.04 10:00") .. DateTime.parse("2009.02.04 11:30")
>> b.cover?(DateTime.parse("2009.02.04 11:45"))
=> false
When you combine this with things we’ve already discussed, like flexible date parsing and formatting, you end up with a fairly vanilla implementation:
require "date"

class Scheduler
  def initialize
    @events = []
  end

  def event(from, to, message)
    @events << [DateTime.parse(from) .. DateTime.parse(to), message]
  end

  def display_events_at(datetime)
    datetime = DateTime.parse(datetime)
    puts "Events occurring around #{datetime.strftime("%H:%M on %m/%d/%Y")}"
    puts "--------------------------------------------"
    events_at(datetime).each do |range, message|
      puts "#{time_abbrev(range.first)} - #{time_abbrev(range.last)}: #{message}"
    end
  end

  private

  def time_abbrev(datetime)
    datetime.strftime("%H:%M (%m/%d)")
  end

  def events_at(datetime)
    @events.each_with_object([]) do |event, matched|
      matched << event if event.first.cover?(datetime)
    end
  end
end
We can start by looking at how event()
works:
def event(from, to, message)
  @events << [DateTime.parse(from) .. DateTime.parse(to), message]
end
Here we see that each event is simply a tuple consisting of two
elements: a datetime Range
, and a
message. We parse the strings on the fly using DateTime.parse
. This method should typically be used with
caution, as it is much more reliable to use Date.strptime
, and much faster to construct
a DateTime
manually than it is to
attempt to guess the date format. That having been said, there is no
substitute when you cannot rely on a standardized date format, and it does
a good job of providing a flexible interface when one is needed.
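To make that tradeoff concrete, here is a quick comparison (the dates are arbitrary): parse guesses the format for us, while strptime demands an explicit format string, which is both faster and safer when the format is known in advance.

```ruby
require "date"

# DateTime.parse guesses the format on our behalf...
guessed = DateTime.parse("2009-02-04 10:20")

# ...while strptime requires an explicit format string.
exact = DateTime.strptime("2009.02.04 10:20", "%Y.%m.%d %H:%M")

p guessed.day   #=> 4
p exact.hour    #=> 10

# Ambiguous input is where parse can surprise you: it has to pick one
# interpretation of a string like "04/02/2009" for you.
p Date.parse("04/02/2009")
```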
As this fairly pedestrian code completely covers storing events, what remains to be shown is how they are selectively retrieved and displayed. We’ll start with the helper method that looks up what events are going on at a particular time:
def events_at(datetime)
  @events.each_with_object([]) do |event, matched|
    matched << event if event.first.cover?(datetime)
  end
end
Here, we build up our list of matching events by simply iterating
over the event list and including only those events in which the
datetime Range covers the time in
question. This, along with the
self-explanatory time_abbrev
code, is
used to keep display_events_at
nice and clean:
def display_events_at(datetime)
  datetime = DateTime.parse(datetime)
  puts "Events occurring around #{datetime.strftime("%H:%M on %m/%d/%Y")}"
  puts "--------------------------------------------"
  events_at(datetime).each do |range, message|
    puts "#{time_abbrev(range.first)} - #{time_abbrev(range.last)}: #{message}"
  end
end
Here, we’re doing little more than parsing the date and time passed
in as a string to get us a DateTime
object, and then displaying the results of events_at
. We take advantage of strftime
for that, and recover the endpoints of
the range to include in our output, to show exactly when an event starts
and stops. There’s really not much more to it.
Although this example is obviously a bit oversimplified, you’ll find
that similar problems crop up again and again. The key thing to remember
is to take advantage of the ability of DateTime
objects to be used within ranges, and
whenever possible, to avoid parsing dates yourself. If you need finer
granularity, use strptime()
, but for
many needs parse()
will do the trick
while providing a more flexible interface to your users.
We’ve covered some of the most common uses of Ruby’s standard date library here, but there are of course plenty of other features for edge cases. As with the other topics in this appendix, hit up the API documentation if you need to know more.
Although Ruby’s String
object provides many
powerful features that rely on regular expressions, it can be cumbersome
to build any sort of parser with them. Most operations that you can do
directly on strings work on the whole string at once, providing MatchData
that can be used
to index into the original content. This is great when a single pattern
fits the bill, but when you want to consume some text in chunks, switching
up strategies as needed along the way, things get a little more hairy.
This is where the strscan library comes in.
When you require strscan, it provides a class
called StringScanner
. The underlying
purpose of using this object is that it keeps track of where you are in
the string as you consume parts of it via regex patterns. Just to clear up
what this means, we can take a look at the example used in the
RDoc:
s = StringScanner.new('This is an example string')
s.eos?            # -> false

p s.scan(/\w+/)   # -> "This"
p s.scan(/\w+/)   # -> nil
p s.scan(/\s+/)   # -> " "
p s.scan(/\s+/)   # -> nil
p s.scan(/\w+/)   # -> "is"
s.eos?            # -> false

p s.scan(/\s+/)   # -> " "
p s.scan(/\w+/)   # -> "an"
p s.scan(/\s+/)   # -> " "
p s.scan(/\w+/)   # -> "example"
p s.scan(/\s+/)   # -> " "
p s.scan(/\w+/)   # -> "string"
s.eos?            # -> true

p s.scan(/\s+/)   # -> nil
p s.scan(/\w+/)   # -> nil
From this simple example, it’s easy to see that the index is
advanced only when a match is made. Once the end of the string is reached,
there is nothing left to match. Although this may seem a little simplistic
at first, it forms the essence of what StringScanner
does for us. We can see that by
looking at how it is used in the context of something a little more
real.
We’re about to look at how to parse JSON (JavaScript Object
Notation), but the example we’ll use is primarily for educational
purposes, as it demonstrates an elegant use of
StringScanner
. If you have a real need for this
functionality, be sure to look at the json standard
library that ships with Ruby, as that is designed to provide the kind of
speed and robustness you’ll need in production.
In Ruby Quiz #155, James Gray builds up a JSON parser by
hand-rolling a recursive descent parser using StringScanner
. He actually covers the full
solution in depth on the Ruby Quiz website, but this
abridged version focuses specifically on his use of StringScanner
. To keep things simple, we’ll
discuss roughly how he manages to get this small set of assertions to
pass:
def test_array_parsing
  assert_equal(Array.new, @parser.parse(%Q{[]}))

  assert_equal( ["JSON", 3.1415, true],
                @parser.parse(%Q{["JSON", 3.1415, true]}) )

  assert_equal([1, [2, [3]]], @parser.parse(%Q{[1, [2, [3]]]}))
end
We can see by the general outline how this parser works:
require "strscan"

class JSONParser
  AST = Struct.new(:value)

  def parse(input)
    @input = StringScanner.new(input)
    parse_value.value
  ensure
    @input.eos? or error("Unexpected data")
  end

  private

  def parse_value
    trim_space
    parse_object or parse_array or parse_string or
      parse_number or parse_keyword or error("Illegal JSON value")
  ensure
    trim_space
  end

  # ...
end
Essentially, a StringScanner
object is built up using the original JSON string. Then, the parser
recursively walks down through the structure and parses the data types it
encounters. Once the parsing completes, we expect that we’ll be at the end
of the string, otherwise some data was left unparsed, indicating
corruption.
Looking at the way parse_value
is
implemented, we see the benefit of using StringScanner
. Before an actual value is
parsed, whitespace is trimmed on both ends using the trim_space
helper. This is exactly as simple as
you might expect it to be:
def trim_space
  @input.scan(/\s+/)
end
Of course, to make things a little more interesting, and to continue
our walkthrough, we need to peel back the covers on parse_array
:
def parse_array
  if @input.scan(/\[\s*/)                              # 1
    array = Array.new
    more_values = false
    while contents = parse_value rescue nil            # 2
      array << contents.value
      more_values = @input.scan(/\s*,\s*/) or break
    end
    error("Missing value") if more_values
    @input.scan(/\s*\]\s*/) or error("Unclosed array") # 3
    AST.new(array)
  else
    false
  end
end
The beauty of JSON (and this particular parsing solution) is that
it’s very easy to see what’s going on. On a successful parse, this code
takes three simple steps. First, it detects the opening [
, indicating the start of a JSON array. If it
finds that, it creates a Ruby array to populate. Then, the second step is
to parse out each value, separated by commas and optional whitespace. To
do this, the parser simply calls parse_value
again, taking advantage of recursion
as we mentioned before. Finally, the third step is to seek a closing
]
, which, when found, ends this stage
of parsing and returns a Ruby array wrapped in the AST struct this parser
uses.
Going back to our three assertions, we can trace them one by one. The first one was meant to test parsing an empty array:
assert_equal(Array.new, @parser.parse(%Q{[]}))
This one is predictably the simplest to trace. When parse_value
is called to capture the contents in
the array, it will error out, because no JSON objects start with ]
. James is using a clever trick that banks on a
failed parse, because that allows him to short-circuit processing the
contents. This error is swallowed, leaving the contents empty. The string
is then scanned for the closing ]
,
which is found, and an AST-wrapped empty Ruby array is returned.
The second assertion is considerably more tricky:
assert_equal( ["JSON", 3.1415, true],
              @parser.parse(%Q{["JSON", 3.1415, true]}) )
Here, we need to rely on parse_value
’s ability to parse strings, numbers,
and booleans. All three of these are done using techniques similar to
those shown so far, but a string is a little hairy due to some tricky edge
cases. However, to give you a few extra samples, we can take a look at the
other two:
def parse_number
  @input.scan(/-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?/) and
    AST.new(eval(@input.matched))
end

def parse_keyword
  @input.scan(/(?:true|false|null)/) and
    AST.new(eval(@input.matched.sub("null", "nil")))
end
In both cases, James takes advantage of the similarities between
Ruby and JSON when it comes to numbers and keywords, and essentially just
eval
s the results after a bit of massaging. The numeric
pattern is a little hairy, and you don’t necessarily need to understand
it. Instead, the interesting thing to note about these two examples is
their use of StringScanner#matched
. As
the name suggests, this method returns the actual string that was just
matched by the pattern. This is a common way to extract values while
conditionally scanning for matches.
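In isolation, matched behaves like this (a throwaway example, unrelated to the JSON parser):

```ruby
require "strscan"

s = StringScanner.new("count = 42")

s.scan(/\w+/)        # advance past the first token...
p s.matched          #=> "count" -- the string the last pattern consumed

s.scan(/\s*=\s*/)    # consume the separator
s.scan(/\d+/) and p s.matched.to_i  #=> 42
```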
This pretty much wraps up the interesting bits about getting the
second assertion to pass. Here, the parser just keeps attempting to pull
off new values if it can, while the array code wipes out any intermediate
commas. Once the values are exhausted, the ]
is then searched for, as before.
The third and final case for array parsing may initially seem complicated:
assert_equal([1, [2, [3]]], @parser.parse(%Q{[1, [2, [3]]]}))
However, if you recall that the way parse_array
works is to repeatedly call parse_value
until all its elements are consumed,
it’s clear what is going on. Because an array can be parsed by parse_value
just the same as any other job, the
nested arrays have no trouble repeating the same process to find their
elements, which can also be arrays. At some point, this process bottoms
out, and the whole structure is built up. That means that we actually get
to pass this third assertion for free, as the
implementation already uses recursive calls through parse_value
.
Although this doesn’t cover 100% of how James’s parser works, it
gives you a good sense of when StringScanner
might be a good tool to have
around. You can see how powerful it is to keep a single reference to a
StringScanner
and use it in a number of
different methods to consume a string part by part. This allows better
decomposition of your program, and simplifies the code by removing some of
the low-level plumbing from the equation. So next time you want to do
regular expression processing on a string chunk by chunk rather than all
at once, you might want to give StringScanner
a try.
Though it might not be something we do every day, having easy access to the common cryptographic hash functions can be handy for all sorts of things. The digest standard library provides several options, including MD5, SHA1, and SHA2. We’ll cover three simple use cases here: calculating the checksum of a file, uniquely hashing files based on their content, and encrypted password storage.
I won’t get into the details about the differences between various hashing algorithms or their limitations. Though they all have a potential risk for what is known as a collision, where two distinct content keys are hashed to the same value, this is rare enough to not need to worry about in most practical scenarios. Of course, if you’re new to encryption in general, you will want to read up on these techniques elsewhere before attempting to use them for anything nontrivial. Assuming that you accept this responsibility, we can move on to see how these hashing functions can be used in your Ruby applications.
We’ll start with checksums, because these are pretty easy to find in the wild. If you’ve downloaded open source software before, you’ve probably seen MD5 or SHA256 hashes before. I’ll be honest: most of the time I just ignore these, but they do come in handy when you want to verify that an automated download completed correctly. They’re also useful if you have a tendency toward paranoia and want to be sure that the file you are receiving is really what you think it is. Using the Ruby 1.9.1 release notes themselves as an example, we can see what a digitally signed file download looks like:
== Location

* ftp://ftp.ruby-lang.org/pub/ruby/1.9/ruby-1.9.1-p0.tar.bz2

  SIZE:   7190271 bytes
  MD5:    0278610ec3f895ece688de703d99143e
  SHA256: de7d33aeabdba123404c21230142299ac1de88c944c9f3215b816e824dd33321
Once we’ve downloaded the file, pulling the relevant hashes is trivial. Here’s how we’d grab the MD5 hash:
>> require "digest/md5"
=> true
>> Digest::MD5.hexdigest(File.binread("ruby-1.9.1-p0.tar.bz2"))
=> "0278610ec3f895ece688de703d99143e"
If we preferred the slightly more secure SHA256 hash, we don’t need to work any harder:
>> require "digest/sha2"
=> true
>> Digest::SHA256.hexdigest(File.binread("ruby-1.9.1-p0.tar.bz2"))
=> "de7d33aeabdba123404c21230142299ac1de88c944c9f3215b816e824dd33321"
As both of these match the release notes, we can be reasonably sure that nothing nefarious is going on, and also that our file integrity has been preserved. That’s the most common use of this form of hashing.
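If you find yourself doing this often, the check is easy to wrap in a helper. Here is a sketch (the method name is my own invention) that compares a file on disk against a published checksum:

```ruby
require "digest/sha2"

# Hypothetical helper: does the file at `path` hash to the checksum
# we were told to expect (e.g., one published in release notes)?
def verified?(path, expected_sha256)
  Digest::SHA256.hexdigest(File.binread(path)) == expected_sha256
end
```

Returning a boolean keeps the policy decision, whether to warn, abort, or retry the download, in the caller's hands.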
Of course, in addition to identifying a particular file uniquely,
cryptographic hashes allow us to identify the uniqueness of a file’s
content. If you’ve used the revision control system git
, you may have noticed that the revisions are
actually identified by SHA1 hashes that describe the changesets. We can do
similar things in our Ruby applications.
For example, in Prawn, we support embedding images in PDF documents.
Because these images can be from any number of sources ranging from a temp
file to a directly downloaded image from the Web, we cannot rely on unique
filenames mapping to unique images. Processing images can be pretty
costly, especially when we do things like split out alpha channels for
PNGs, so we want to avoid reprocessing images whenever possible. The
solution to this problem is simple: we use SHA1 to generate a
hexdigest
for the image content and then use that as a
key into a hash. A rough approximation of what we’re doing looks like
this:
require "digest/sha1"

def image_registry(raw_image_data)
  img_sha1 = Digest::SHA1.hexdigest(raw_image_data)
  @image_registry[img_sha1] ||= build_image_obj(raw_image_data)
end
This technique clearly isn’t limited to PDF generation. To name just a couple of other use cases, I use a similar hashing technique to make sure the content of my blog has changed before reuploading the static files it generates, so it uploads only the files it needs. I’ve also seen this used in the context of web applications to prevent identical content from being copied again and again to new files. Fundamentally, these ideas are nearly identical to the previous code sample, so I won’t illustrate them explicitly.
However, while we’re on the topic of web applications, we can work our way into our last example: secure password storage.
It should be pretty obvious that even if we restrict access to our databases, we should not store passwords in clear text. We have a responsibility to offer users a reasonable amount of privacy, and through cryptographic hashing, even administrators can be kept in the dark about what individual users’ passwords actually are. Using the techniques already shown, we get most of the way to a solution.
The following example is from an ActiveRecord model as part of a Rails application, but it is fairly easily adaptable to any system in which the user information is remotely stored outside of the application itself. Regardless of whether you are familiar with ActiveRecord, the code should be fairly straightforward to follow with a little explanation. Everything except the relevant authentication code has been omitted, to keep things well focused:
class User < ActiveRecord::Base
  def password=(pass)
    @password = pass
    salt = [Array.new(6){rand(256).chr}.join].pack("m").chomp
    self.password_salt, self.password_hash =
      salt, Digest::SHA256.hexdigest(pass + salt)
  end

  def self.authenticate(username, password)
    user = find_by_username(username)

    # check for the user's existence first, so that we never call
    # password_salt on nil when the username is unknown
    if user.blank? || user.password_hash !=
                      Digest::SHA256.hexdigest(password + user.password_salt)
      raise AuthenticationError, "Username or password invalid"
    end

    user
  end
end
Here we see two functions: one for setting an individual user’s password, and another for authenticating and looking up a user by username and password. We’ll start with setting the password, as this is the most crucial part:
def password=(pass)
  @password = pass
  salt = [Array.new(6){ rand(256).chr }.join].pack("m").chomp
  self.password_salt, self.password_hash =
    salt, Digest::SHA256.hexdigest(pass + salt)
end
Here we see that the password is hashed using Digest::SHA256, in a similar fashion to our earlier examples. However, the password isn't hashed directly; instead, it is combined with a salt to make it more difficult to guess. This technique has been shown in many Ruby cookbooks and tutorials, so you may have encountered it before. Essentially, what you are seeing here is that for each user in our database, we generate a random six-byte sequence and then pack it into a base64-encoded string, which gets appended to the password before it is hashed. This makes several common attacks much harder to execute, at a minimal complexity cost to us.
An important thing to notice is that what we store is the fingerprint of the password after it has been salted rather than the password itself, which means that we never store the original content and it cannot be recovered. So although we can tell whether a given password matches this fingerprint, the original password cannot be retrieved from the data we are storing.
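To see why the salt matters, consider two users who happen to pick the same password. The following standalone sketch (outside of ActiveRecord, using the same salt-generation trick as the model above) shows that identical passwords still produce different fingerprints, which defeats precomputed tables of common password hashes:

```ruby
require "digest"

# Same salt construction as in the User model: six random bytes,
# base64-encoded via Array#pack("m").
def generate_salt
  [Array.new(6) { rand(256).chr }.join].pack("m").chomp
end

password = "secret"

salt_a = generate_salt
salt_b = generate_salt

hash_a = Digest::SHA256.hexdigest(password + salt_a)
hash_b = Digest::SHA256.hexdigest(password + salt_b)

# Identical passwords, but (almost certainly) different fingerprints.
puts hash_a == hash_b
```

Because only the salt and the salted hash are stored, verifying a login means redoing the same computation with the stored salt, exactly as `authenticate` does.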
If this more or less makes sense to you, the authenticate method will be easy to follow now:
def self.authenticate(username, password)
  user = find_by_username(username)

  if user.blank? ||
     Digest::SHA256.hexdigest(password + user.password_salt) != user.password_hash
    raise AuthenticationError, "Username or password invalid"
  end

  user
end
Here, we first retrieve the user from the database. Assuming that the username is valid, we then look up the stored salt and append it to the bare password string. Because we never stored the actual password, only its salted hash, we call hexdigest again and compare the result to the hash stored in the database. If they match, we return our user object and all is well; if they don't, an error is raised. This completes the cycle of secure password storage and authentication and demonstrates the role that cryptographic hashes play in it.
With that, we've probably talked enough about digest for now. There are some more advanced features available, but as long as you know that Digest::MD5, Digest::SHA1, and Digest::SHA256 exist and how to call hexdigest() on each of them, you have all you'll need to know for most occasions. Hopefully, the examples here have illustrated some of the common use cases, and helped you think of your own in the process.
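As a quick reference, here is what calling hexdigest on each of the three classes looks like. One easy way to tell them apart is digest length: the hex output is two characters per digest byte, giving 32, 40, and 64 characters for MD5, SHA1, and SHA256, respectively:

```ruby
require "digest"

message = "hello"

md5    = Digest::MD5.hexdigest(message)
sha1   = Digest::SHA1.hexdigest(message)
sha256 = Digest::SHA256.hexdigest(message)

# Hex output is two characters per byte of digest:
puts md5.length    #=> 32  (128-bit digest)
puts sha1.length   #=> 40  (160-bit digest)
puts sha256.length #=> 64  (256-bit digest)
```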
The mathn standard library, when combined with the core Math module, serves to make mathematical operations more pleasant in Ruby. The main purpose of mathn is to pull in other standard libraries and integrate them with the rest of Ruby's numeric system. You'll notice this right away when doing basic arithmetic:
>> 1 / 2
=> 0
>> Math.sqrt(-1)
Errno::EDOM: Numerical argument out of domain - sqrt
..
>> require "mathn"
=> true
>> 1 / 2
=> 1/2
>> 1 / 2 + 5 / 7
=> 17/14
>> Math.sqrt(-1)
=> (0+1i)
As you can see, integer division gives way when mathn is loaded, in favor of returning Rational objects. These behave like the fractions you learned in grade school, and keep values in exact terms rather than expressing them as floats where possible. Numbers also gracefully extend into the Complex field, without error. Although this sort of behavior might seem unnecessary for day-to-day programming needs, it can be very helpful for mathematical applications.
In addition to changing the way basic arithmetic works, mathn pulls in a few higher-level mathematical constructs. For those interested in enumerating prime numbers (for whatever fun reason you might have in mind), a Prime class is provided. To give you a peek at how it works, we can do things like ask for the first 10 primes, or count how many primes fall below a given number:
>> Prime.first(10)
=> [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
>> Prime.find_index { |e| e > 1000 }
=> 168
In addition to enabling Prime, mathn also allows you to use Matrix and Vector without an explicit require:
>> Vector[1,2,3] * 3
=> Vector[3, 6, 9]
>> Matrix[[5,1,2],[3,1,4]].transpose
=> Matrix[[5, 3], [1, 1], [2, 4]]
These classes can do all sorts of useful linear algebra functions, but I don’t want to overwhelm the casual Rubyist with mathematical details. Instead, we’ll look at a practical use of them and leave the theory as a homework assignment. Consider the simple drawing in Figure B-1.
We see two rather exciting triangles, nuzzling up against each other at a single point. As it turns out, the smaller triangle is nothing more than a clone of the larger one, reflected, rotated, and scaled to fit. Here’s the code that makes all that happen:[20]
include Math

Canvas.draw("triangles.pdf") do
  points = Matrix[[0,0], [100,200], [200,100]]
  paint_triangle(*points)

  # reflect across y-axis
  points *= Matrix[[-1, 0],[0,1]]

  # rotate so bottom is flush with x-axis
  theta = -atan(1/2)
  points *= Matrix[[cos(theta), -sin(theta)],
                   [sin(theta),  cos(theta)]]

  # scale down by 50%
  points *= 1/2

  paint_triangle(*points)
end
You don't need to worry about the graphics-drawing code. Instead, focus on the use of Matrix manipulations here, and watch what happens to the points in each step. We start off with our initial triangle's coordinates, as such:
>> points = Matrix[[0,0], [100,200], [200,100]]
=> Matrix[[0, 0], [100, 200], [200, 100]]
Then, we multiply by a 2×2 matrix that performs a reflection across the y-axis:
>> points *= Matrix[[-1, 0],[0,1]]
=> Matrix[[0, 0], [-100, 200], [-200, 100]]
Notice here that the x values are inverted, while the y values are left untouched. This is what translates our points from the right side to the left. Our next task uses a bit of trigonometry to rotate our triangle so that it lies flat along the x-axis. Notice that we use the arctangent of 1/2, because the bottom edge of the triangle on the right rises halfway toward the upper boundary before terminating. If you aren't familiar with how this calculation works, don't worry; just observe its results:
>> theta = -atan(1/2)
=> -0.463647609000806
>> points *= Matrix[[cos(theta), -sin(theta)],
?>                  [sin(theta),  cos(theta)]]
=> Matrix[[0.0, 0.0], [-178.885438199983, 134.164078649987],
   [-223.606797749979, 0.0]]
The numbers got a bit ugly after this calculation, but there is a key observation to make here. The triangle’s dimensions were preserved, but two of the points now lie on the x-axis. This means our rotation was successful.
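You can check the "dimensions were preserved" claim yourself without loading mathn, using the plain matrix standard library. This sketch uses atan(0.5) rather than atan(1/2), since integer division is not altered here, and verifies that the side length between two vertices is unchanged by the rotation:

```ruby
require "matrix"
include Math

points = Matrix[[0, 0], [100, 200], [200, 100]]

theta    = -atan(0.5)
rotation = Matrix[[cos(theta), -sin(theta)],
                  [sin(theta),  cos(theta)]]

rotated = points * rotation

# Side length between the first two vertices, before and after.
# Vector#r returns the vector's magnitude (Euclidean norm).
before = (points.row(1)  - points.row(0)).r
after  = (rotated.row(1) - rotated.row(0)).r

puts before                        #=> 223.60679...
puts (before - after).abs < 1e-9   # rotations preserve distances
```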
Finally, we do scalar multiplication to drop the whole triangle down to half its original size:
>> points *= 1/2
=> Matrix[[0.0, 0.0], [-89.4427190999915, 67.0820393249935],
   [-111.803398874989, 0.0]]
This completes the transformation and shows how the little triangle was developed simply by manipulating the larger one. Although this is certainly a bit of an abstract example, it hopefully serves as sufficient motivation for learning a bit more about matrixes. Although they can certainly be used for more hardcore calculations, simple linear transformations such as the ones shown in this example come cheap and easy and demonstrate an effective way to do some interesting graphics work.
Although truly hardcore math might be better suited to a more special-purpose language, Ruby is full-featured enough to write interesting math programs with. As this particular topic runs far deeper than I have time to discuss, I will leave further investigation to the interested reader. The key thing to remember is that mathn puts Ruby in a sort of "math mode" by including some of the most helpful standard libraries and modifying the way that Ruby does its basic arithmetic. This feature is so useful that irb includes a special switch, -m, which essentially requires mathn and then includes the Math module at the top level.
A small caveat to keep in mind when working with mathn is that it is fairly aggressive about the changes it makes. If you are building a Ruby library, you may want to be more conservative and load the individual packages it provides one by one, rather than risk breaking code that relies on behaviors such as integer division.
All that having been said, if you’re working on your math homework, or building a specialized mathematical application in Ruby, feel free to go wild with all that mathn has to offer.
If you need to represent a data table in plain-text format, CSV (comma-separated value) files are about as simple as you can get. These files can easily be processed by almost any programming language, and Ruby is no exception. The csv standard library is fast for pure Ruby, internationalized, and downright pleasant to work with.
In the simplest cases, it would be hard to make things easier. For example, say you had a CSV file (payments.csv) that looked like this:
name,payment
Gregory Brown,100
Joe Comfort,150
Jon Juraschka,200
Gregory Brown,75
Jon Juraschka,250
Jia Wu,25
Gregory Brown,50
Jia Wu,75
If you just want to slurp this into an array of arrays, it couldn't be easier:
>> require "csv"
=> true
>> CSV.read("payments.csv")
=> [["name", "payment"], ["Gregory Brown", "100"], ["Joe Comfort", "150"],
    ["Jon Juraschka", "200"], ["Gregory Brown", "75"], ["Jon Juraschka", "250"],
    ["Jia Wu", "25"], ["Gregory Brown", "50"], ["Jia Wu", "75"]]
Of course, slurping files isn’t a good idea if you want to handle only a subset of data, but csv makes row-by-row handling easy. Here’s an example of how you’d capture only my records:
>> data = []
=> []
>> CSV.foreach("payments.csv") { |row| data << row if row[0] == "Gregory Brown" }
=> nil
>> data
=> [["Gregory Brown", "100"], ["Gregory Brown", "75"], ["Gregory Brown", "50"]]
A common convention is to have the first row of a CSV file represent header data. csv can give you nicer accessors in this case:
>> data = []
=> []
>> CSV.foreach("payments.csv", :headers => true) do |row|
?>   data << row if row['name'] == "Gregory Brown"
>> end
=> nil
>> data
=> [#<CSV::Row "name":"Gregory Brown" "payment":"100">,
    #<CSV::Row "name":"Gregory Brown" "payment":"75">,
    #<CSV::Row "name":"Gregory Brown" "payment":"50">]
CSV::Row is a sort of hash/array hybrid. The primary feature that distinguishes it from a hash is that it allows for duplicate field names. Here's an example of how that works. Given a simple file with nonunique column names like this (phone_numbers.csv):
applicant,phone_number,spouse,phone_number
James Gray,555 555 5555,Dana Gray,123 456 7890
Gregory Brown,098 765 4321,Jia Wu,222 222 2222
we can extract both "phone_number" fields, as shown here:
>> data = CSV.read("phone_numbers.csv", :headers => true)
=> #<CSV::Table mode:col_or_row row_count:3>
>> data.map { |r| [r["phone_number"], r["phone_number", 2]] }
=> [["555 555 5555", "123 456 7890"], ["098 765 4321", "222 222 2222"]]
We see that CSV::Row#[] takes an optional second argument that is an offset from which to begin looking for a field name. For this particular data, r["phone_number", 0] and r["phone_number", 1] would resolve as the first phone number field; an index of 2 or 3 would look up the second phone number. If we know the names of the columns near each phone number, we can do this in a bit of a smarter way:
>> data.map { |r| [ r["phone_number", r.index("applicant")],
?>                  r["phone_number", r.index("spouse")] ] }
=> [["555 555 5555", "123 456 7890"], ["098 765 4321", "222 222 2222"]]
Although this still depends on ordinal positioning to some extent, it allows us to do a relative index lookup. If we know that “phone number” is always going to be next to “applicant” and “spouse,” it doesn’t matter which column they start at. Whenever you can take advantage of this sort of flexibility, it’s a good idea to do so.
So far, we’ve talked about reading files, but the csv library handles writing as well. Rather than continue with our irb-based exploration, I’ll just combine the features that we’ve already gone over with a couple new ones, so that we can look at a tiny but fully functional script.
Our task is to convert the previously mentioned payments.csv file into a summary report (payment_summary.csv), which will look like this:
name,total payments
Gregory Brown,225
Joe Comfort,150
Jon Juraschka,450
Jia Wu,100
Here, we’ve done a grouping on name and summed up the payments. If you thought this might be a complicated process, you thought wrong. Here’s all that needs to be done:
require "csv"

@totals = Hash.new(0)

csv_options = { :headers => true, :converters => :numeric }

CSV.foreach("payments.csv", csv_options) do |row|
  @totals[row['name']] += row['payment']
end

CSV.open("payment_summary.csv", "w") do |csv|
  csv << ["name", "total payments"]
  @totals.each { |row| csv << row }
end
The core mechanism for doing the grouping-and-summing operation is just a hash with default values of zero for unassigned keys. If you haven’t seen this technique before, be sure to make a note of it, because you’ll see it all over Ruby scripts. The rest of the work is easy once we exploit this little trick.
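In isolation, the trick looks like this; any key that has not been seen yet starts at zero, so there is no need to check for a key's existence before adding to it:

```ruby
payments = [["Gregory Brown", 100], ["Jia Wu", 25],
            ["Gregory Brown", 75], ["Jia Wu", 75]]

# Hash.new(0) makes 0 the default value for unknown keys,
# so += works on the very first occurrence of a name.
totals = Hash.new(0)

payments.each { |name, amount| totals[name] += amount }

p totals #=> {"Gregory Brown"=>175, "Jia Wu"=>100}
```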
As far as processing the initial CSV file goes, the only new trick we've added is to specify the option :converters => :numeric. This tells csv to check each cell to see whether it contains a valid Ruby number, converting it to a Fixnum or Float if there's a match. This lets us normalize our data as soon as it's loaded, rather than litter our code with to_i and to_f calls.
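You can see the converter in action on a single line of input; without it, every cell comes back as a string:

```ruby
require "csv"

line = "Gregory Brown,100"

# By default, every field is a String:
p CSV.parse_line(line)                          #=> ["Gregory Brown", "100"]

# With the numeric converter, number-like fields become Integers/Floats:
p CSV.parse_line(line, :converters => :numeric) #=> ["Gregory Brown", 100]
```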
For this reason, the body of the foreach loop is nothing more than simple addition, keying each name to a running total of payments.
Finally, we get to the writing side of things. We use CSV.open in a similar way to how we might use File.open, and we populate a CSV object by shoving arrays that represent rows into it. This code is a little prettier than it might be in the general case, as we're working with only two columns, but you should be able to see that the process is relatively straightforward nonetheless.
Here we see a useful little script based on the csv library weighing in at around 10 lines of code. As impressive as that might be, we haven’t even scratched the surface on this one, so be sure to dig deeper if you have a need for processing tabular datafiles. One thing that I didn’t show at all is that csv handles reading and writing from strings just as well as it does files, which may be useful in web applications and other places where there is a need to stream files rather than work directly with the filesystem. Another is dealing with different column and row record separators. Luckily, the csv library is comparably well documented, so all of these things are just a quick API documentation search away.
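The string-based side of the library mirrors the file-based one: CSV.parse reads from a string, and CSV.generate builds one up. A quick sketch:

```ruby
require "csv"

# Parsing from a string instead of a file:
table = CSV.parse("name,payment\nGregory Brown,100\n", :headers => true)
p table[0]["payment"] #=> "100"

# Generating CSV into a string instead of a file:
output = CSV.generate do |csv|
  csv << ["name", "payment"]
  csv << ["Jia Wu", 25]
end

p output #=> "name,payment\nJia Wu,25\n"
```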
A great thing about the newly revamped csv standard library is that you don't necessarily need to upgrade to Ruby 1.9 to use it. It started as a third-party alternative to Ruby 1.8's CSV standard library, under the name FasterCSV, and that project is still supported under Ruby 1.8.6. So if you like what you see here and want to use it in some of your legacy Ruby code, you can always install the fastercsv gem and be up and running.
PStore provides a simple, transactional database for storing Ruby objects within a file. This gives you a persistence layer without relying on any external resources, which can be very handy. Using PStore is so simple that I can forgo most of the details and jump right into some code that I use for a Sinatra microapp at work. What follows is a very simple backend for storing replies to an anonymous survey:
require "pstore"

class SuggestionBox
  def initialize(filename="suggestions.pstore")
    @filename = filename
  end

  def store
    @store ||= PStore.new(@filename)
  end

  def add_reply(reply)
    store.transaction do
      store[:replies] ||= []
      store[:replies] << reply
    end
  end

  def replies
    store.transaction(true) do # read-only transaction
      store[:replies]
    end
  end

  def clear_replies
    store.transaction do
      store[:replies] = []
    end
  end
end
In our application, the usage of this object for storing responses is quite simple. Given that box is just a SuggestionBox in the following code, we have a single call to add_reply that looks something like this:
box.add_reply(:question1 => params["question1"],
              :question2 => params["question2"])
Later, when we need to generate a PDF report of all the replies in the suggestion box, we do something like this to extract the responses to the two questions:
question1_replies = []
question2_replies = []

box.replies.each do |reply|
  question1_replies << reply[:question1]
  question2_replies << reply[:question2]
end
So that covers the usage, but let's go back over the implementation in a little more detail. You can see that our store is initialized by constructing a new PStore object and passing it a filename:
def store
  @store ||= PStore.new(@filename)
end
Then, when it comes to using our PStore, it basically looks like we're dealing with a hash-like object, but all of our interactions with it happen through these transaction blocks:
def add_reply(reply)
  store.transaction do
    store[:replies] ||= []
    store[:replies] << reply
  end
end

def replies
  store.transaction(true) do # read-only transaction
    store[:replies]
  end
end

def clear_replies
  store.transaction do
    store[:replies] = []
  end
end
So the real question here is what do we gain? The answer is, predictably, a whole lot.[21]
By using PStore, we can be sure that only one write-mode transaction is open at a time, preventing issues with partial reads/writes in multiprocess applications. This means that if we attempt to produce a report against a SuggestionBox while a new suggestion is being written, our report will wait until the write operation completes before it processes. However, when all transactions are read-only, they will not block each other, allowing them to run concurrently.
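A minimal standalone sketch of the read-only mode (using a temporary file so it can run anywhere): passing true to transaction requests a read-only transaction, and any attempt to write inside one is rejected with a PStore::Error:

```ruby
require "pstore"
require "tmpdir"

path  = File.join(Dir.mktmpdir, "demo.pstore")
store = PStore.new(path)

# A normal (read-write) transaction:
store.transaction { store[:count] = 1 }

# A read-only transaction; these do not block one another:
value = store.transaction(true) { store[:count] }
puts value #=> 1

# Writes inside a read-only transaction raise PStore::Error:
begin
  store.transaction(true) { store[:count] = 2 }
rescue PStore::Error => e
  puts "write rejected: #{e.message}"
end
```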
Every transaction reloads the file at its start, keeping things up-to-date and synchronized. Every write checks the MD5 sum of the contents to avoid unnecessary writes for unchanging data. If something goes wrong during a write, all the write operations in the transaction are rolled back and an exception is raised. In short, PStore provides a fairly robust persistence framework that is suitable for use across multiple applications or threads.
Of course, though it is great for what it does, PStore has notable limitations. Because it loads the entire dataset on every read, and writes the whole dataset on every write, it is very I/O-intensive. Therefore, it's not meant to handle very high load or large datasets. Additionally, as it is essentially nothing more than a file-based Hash object, it cannot serve as a substitute for a SQL server when dealing with relational data that needs to be efficiently queried. Finally, because it uses the core utility Marshal to serialize objects to disk,[22] PStore cannot be used to store certain objects. These include anonymous classes and Proc objects, among other things.
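The limitation is easy to demonstrate directly with Marshal, the serializer PStore relies on; plain data round-trips fine, but a Proc does not:

```ruby
# Plain data structures serialize and deserialize without trouble:
data = { :replies => ["a", "b"], :count => 2 }
copy = Marshal.load(Marshal.dump(data))
p copy == data #=> true

# But a Proc cannot be dumped, so it cannot be stored in a PStore either:
begin
  Marshal.dump(lambda { |x| x * 2 })
rescue TypeError => e
  puts "cannot store: #{e.message}"
end
```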
Despite these limitations, PStore has a very wide sweet spot in which it is the right way to go. Whenever you need to persist a modest amount of nonrelational data and possibly share it across processes or threads, it is usually the proper tool for the job. Although it is possible to code up your own persistence solutions on top of Ruby's raw serialization support, PStore solves most of the common needs in a rather elegant way.
JavaScript Object Notation (JSON) is an object serialization format that has been gaining a ton of steam lately. With the rise of a service-oriented Web, the need for a simple, language-independent data serialization format has become more and more apparent.
Historically, XML has been used for interoperable data serialization. However, using XML for this is a bit like going bird hunting with a bazooka: it’s just way more firepower than the job requires. JSON aims to do one thing and do it well, and these constraints give rise to a human-readable, human-editable, easy-to-parse, and easy-to-produce data interchange format.
The primitive constructs provided by JSON are limited to hashes (called objects in JSON), arrays, strings, numbers, and the conditional triumvirate of true, false, and nil (called null in JSON). It's easy to see that these concepts trivially map to core Ruby objects. Let's take a moment to see how each of these data types is represented in JSON:
require "json"

hash = { "Foo" => [Math::PI, 1, "kittens"],
         "Bar" => [false, nil, true],
         "Baz" => { "X" => "Y" } }

#...

puts hash.to_json
#=> Outputs {"Foo":[3.14159265358979,1,"kittens"],"Bar":[false,null,true],"Baz":{"X":"Y"}}
There isn’t really much to it. In fact, JSON is somewhat syntactically similar to Ruby. Though the similarity is only superficial, it is nice to be able to read and write structures in a format that doesn’t feel completely alien.
If we go the other direction, from JSON into Ruby, you’ll see that the transformation is just as easy:
require "json"

json_string = '{"Foo":[3.14159265358979,1,"kittens"],' +
              '"Bar":[false,null,true],"Baz":{"X":"Y"}}'

hash = JSON.parse(json_string)

p hash["Bar"] #=> Outputs [false, nil, true]
p hash["Baz"] #=> Outputs {"X"=>"Y"}
Without knowing much more about Ruby’s json standard library, you can move on to building useful things. As long as you know how to navigate the nested hash and array structures, you can work with pretty much any service that exposes a JSON interface as if it were responding to you with Ruby structures. As an example, we can look at a fairly simple interface that does a web search and processes the JSON dataset it returns:
require "json"
require "open-uri"
require "cgi"

module GSearch
  extend self

  API_BASE_URI = "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q="

  def show_results(query)
    results = response_data(query)
    results["responseData"]["results"].each do |match|
      puts CGI.unescapeHTML(match["titleNoFormatting"]) + ": " + match["url"]
    end
  end

  def response_data(query)
    data = open(API_BASE_URI + URI.escape(query),
                "Referer" => "http://rubybestpractices.com").read
    JSON.parse(data)
  end
end
Here we're using json and the open-uri library, which was discussed earlier in this appendix, to wrap a simple Google web search. When run, this code will print out a few page titles and their URLs for any query you enter. Here's a sample of GSearch in action:
GSearch.show_results("Ruby Best Practices")

# OUTPUTS

Ruby Best Practices: Rough Cuts Version | O'Reilly Media: http://oreilly.com/catalog/9780596156749/
Ruby Best Practices: The Book and Interview with Gregory Brown: http://www.rubyinside.com/ruby-best-practices-gregory-brown-interview-1332.html
On Ruby: A 'Ruby Best Practices' Blogging Contest: http://on-ruby.blogspot.com/2008/12/ruby-best-practices-blogging-contest.html
Gluttonous : Rails Best Practices, Tips and Tricks: http://glu.ttono.us/articles/2006/02/06/rails-best-practices-tips-and-tricks
Maybe by the time this book comes out, we'll have nabbed all four of the top spots, but that's beside the point. You'll want to notice that the GSearch module interacts with the json library in a sum total of one line, in order to convert the dataset into Ruby. After that, the rest is business as usual.
The interesting thing about this particular example is that I didn't read any of the documentation for the search API. Instead, I tried a sample query, converted the JSON to Ruby, and then used Hash#keys to tell me what attributes were available. From there I continued to use ordinary Ruby reflection and inspection techniques straight from irb to figure out which fields were needed to complete this example. By thinking of JSON datasets as nothing more than the common primitive Ruby objects, you can accomplish a lot using the skills you're already familiar with.
After seeing how easy it is to consume JSON, you might be wondering how you'd go about producing it from your own custom objects. As it turns out, there really isn't much to it. Say, for example, you had a Point class that was responsible for doing some calculations, but at its essence was basically just an ordered pair representing an [x,y] coordinate. Producing the JSON to match is easy:
require "json"

class Point
  def initialize(x, y)
    @x, @y = x, y
  end

  def distance_to(point)
    Math.hypot(point.x - x, point.y - y)
  end

  attr_reader :x, :y

  def to_json(*args)
    [x, y].to_json(*args)
  end
end

point_a = Point.new(1, 2)
puts point_a.to_json #=> "[1,2]"

point_data = JSON.parse('[4,6]')
point_b = Point.new(*point_data)
puts point_b.distance_to(point_a) #=> 5.0
Here, we have simply represented our core data in primitives and then wrapped our object model around it. In many cases, this is the most simple, implementation-independent way to represent one of our objects.
However, in some cases you may wish to let the object internally interpret the structure of a JSON document and do the wrapping for you. The Ruby json library provides a simple hook that depends on a bit of metadata to convert our parsed JSON into a customized higher-level Ruby object. If we rework our example, we can see how it works:
require "json"

class Point
  def initialize(x, y)
    @x, @y = x, y
  end

  def distance_to(point)
    Math.hypot(point.x - x, point.y - y)
  end

  attr_reader :x, :y

  def to_json(*args)
    { 'json_class' => self.class.name,
      'data'       => [@x, @y] }.to_json(*args)
  end

  def self.json_create(obj)
    new(*obj['data'])
  end
end

point_a = Point.new(1, 2)
puts point_a.to_json #=> {"json_class":"Point","data":[1,2]}

point_b = JSON.parse('{"json_class":"Point","data":[4,6]}')
puts point_b.distance_to(point_a) #=> 5.0
Although a little more work needs to be done here, we can see that the underlying mechanism for a direct Ruby→JSON→Ruby round trip is simple. The JSON library depends on the attribute json_class, which points to a string that represents a Ruby class name. If this class has a method called json_create, the parsed JSON data is passed to that method and its return value is returned by JSON.parse. Although this approach involves rolling up our sleeves a bit, it is nice to see that there is not much magic to it.
Although this adds some extra noise to our JSON output, it does not add any new constructs, so there is not an issue with other programming languages being able to parse it and use the underlying data. It simply comes with the added benefit of Ruby applications being able to simply map raw primitive data to the classes that wrap them.
Depending on your needs, you may prefer one technique over the other. The benefit of providing a simple to_json hook that produces raw primitive values is that it keeps your serialized data completely implementation-agnostic. This will come in handy if you need to support a wide range of clients. The benefit of using the "json_class" attribute is that you do not have to think about manually building up high-level objects from your object data. This is most beneficial when you are serving up data to primarily Ruby clients, or when your data is complex enough that manually constructing objects would be painful.
No matter what your individual needs are, it’s safe to say that JSON is something to keep an eye on moving forward. Ruby’s implementation is fast and easy to work with. If you need to work with or write your own web services, this is definitely a tool you will want to familiarize yourself with.
Code generation can be useful for dynamically generating static files based on a template. When we need this sort of functionality, we can turn to the erb standard library. ERB stands for Embedded Ruby, which is ultimately exactly what the library facilitates.
In the most basic case, a simple ERB template[23] might look like this:
require 'erb'

x = 42
template = ERB.new("The value of x is: <%= x %>")
puts template.result(binding)
The resulting text looks like this:
The value of x is: 42
If you’ve not worked with ERB before, you may be wondering how this differs from ordinary string interpolation, such as this:
x = 42
puts "The value of x is: #{x}"
The key difference to recognize here is the way the two strings are evaluated. When we use string interpolation, our values are substituted immediately. When we evaluate an ERB template, the expression inside the <%= ... %> is not actually evaluated until we call ERB#result. That means that although this code does not work at all:
string = "The value of x is: #{x}"
x = 42
puts string
the following code will work without any problems:
require 'erb'

template = ERB.new("The value of x is: <%= x %>")
x = 42
puts template.result(binding)
This is the main reason why ERB can be useful to us. We can write templates ahead of time, referencing variables and methods that may not exist yet, and then bind them just before rendering time using binding.
We can also include some logic in our files, to determine what should be printed:
require "erb"

class A
  def initialize(x)
    @x = x
  end

  attr_reader :x
  public :binding

  def eval_template(string)
    ERB.new(string, 0, '<>').result(binding)
  end
end

template = <<-EOS
<% if x == 42 %>
You have stumbled across the Answer to the Life,
the Universe, and Everything
<% else %>
The value of x is <%= x %>
<% end %>
EOS

foo = A.new(10)
bar = A.new(21)
baz = A.new(42)

[foo, bar, baz].each { |e| puts e.eval_template(template) }
Here, we run the same template against three different objects, evaluating it within the context of each of their bindings. The more complex ERB.new call here sets the safe_level the template is executed in to 0 (the default), but this is just because we want to provide the third argument, trim_mode. When trim_mode is set to "<>", newlines are omitted for lines starting with <% and ending with %>. As this is useful for embedding logic, we need to turn it on to keep the generated string from having ugly stray newlines. The final output of the script looks like this:
The value of x is 10
The value of x is 21
You have stumbled across the Answer to the Life,
the Universe, and Everything
As you can see, ERB does not emit text when it is within a conditional block that is not satisfied. This means a lot in the way of building up dynamic output, as you can use all your normal control structures to determine what text should and should not be rendered.
The documentation for the erb library is quite good, so I won’t attempt to dig much deeper here. Of course, no mention of ERB would be complete without an example of HTML templating. Although the API documentation goes into much more complicated examples, I can’t resist showing the template that I use for rendering entries in my blog engine.[24] This doesn’t include the site layout, but just the code that gets run for each entry:
<h2><%= title %></h2>

<%= entry %>

<div align="right">
  <p><small>Written by <%= Blaag::AUTHOR %>
     on <%= published_date.strftime("%Y.%m.%d") %>
     at <%= published_date.strftime("%H:%M") %> |
     <%= related %>
  </small></p>
</div>
This code is evaluated in the context of a Blaag::Entry object, which, as you can clearly see, does most of the heavy lifting through helper methods. Although this example might be boring and a bit trivial, it shows something worth keeping in mind. Just because you can do all sorts of logic in your ERB templates doesn't mean that you should. The fact that these templates can be evaluated in a custom binding means that you are able to keep your code where it should be while avoiding messy string interpolation. If your ERB templates start to look more like Ruby scripts than templates, you'll want to clean things up before you start pulling your hair out.
Using ERB for templating can really come in handy. Whether you need to generate a form letter or just plug some values into a complicated data format, the ability to late-bind data to a template and generate dynamic content on the fly from static templates is powerful indeed. Just be sure to keep in mind that ERB is meant to supplement your ordinary code rather than replace it, and you'll be able to take advantage of this useful library without creating a big mess.
Hopefully these examples have shown the diversity you can come to expect from Ruby’s standard library. What you have probably noticed by now is that there really is a lot there—so much so that it might be a little overwhelming at first. Another observation you may have made is that there seems to be little consistency in interface between the libraries. This is because many were written by different people at different stages in Ruby’s evolution.
However, assuming that you can tolerate the occasional wart, a solid working knowledge of what is available in Ruby’s standard libraries is a key part of becoming a masterful Rubyist. It goes beyond simply knowing about and using these tools, though. Many of the libraries discussed here are written in pure Ruby, which means that you can actually learn a lot by grabbing a copy of Ruby’s source and reading through their implementations. I know I’ve learned a lot in this manner, so I wholeheartedly recommend it as a way to test and polish your Ruby chops.
[19] A DateTime is also similar to Ruby's Time object, but it is not constrained to representing only those dates that can be represented in Unix time. See http://en.wikipedia.org/wiki/Unix_time.
[20] To run this example, you’ll need my Prawn-based helpers. These are included in this book’s git repository.
[21] The following paragraph summarizes a ruby-talk post from Ara T. Howard, found at http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/177666.
[22] The yaml/store library will allow you to use YAML in place of Marshal, but with many of the same limitations.
[23] Credit: this is the first example in the ERB API documentation.
[24] See http://github.com/sandal/blaag.