Iterators and Enumerable Objects

Although while, until, and for loops are a core part of the Ruby language, it is probably more common to write loops using special methods known as iterators. Iterators are one of the most noteworthy features of Ruby, and examples such as the following are common in introductory Ruby tutorials:

3.times { puts "thank you!" }  # Express gratitude three times
data.each {|x| puts x }        # Print each element x of data
[1,2,3].map {|x| x*x }         # Compute squares of array elements
factorial = 1                  # Compute the factorial of n
2.upto(n) {|x| factorial *= x }

The times, each, map, and upto methods are all iterators, and they interact with the block of code that follows them. The complex control structure behind this is yield. The yield statement temporarily returns control from the iterator method to the method that invoked the iterator. Specifically, control flow goes from the iterator to the block of code associated with the invocation of the iterator. When the end of the block is reached, the iterator method regains control and execution resumes at the first statement following the yield. In order to implement some kind of looping construct, an iterator method will typically invoke the yield statement multiple times. Figure 5-1 illustrates this complex flow of control. Blocks and yield are described in detail in Blocks below; for now, we focus on the iteration itself rather than the control structure that enables it.

An iterator yielding to its invoking method

Figure 5-1. An iterator yielding to its invoking method

As you can see from the previous examples, blocks may be parameterized. Vertical bars at the start of a block are like parentheses in a method definition—they hold a list of parameter names. The yield statement is like a method invocation; it is followed by zero or more expressions whose values are assigned to the block parameters.

Numeric Iterators

The core Ruby API provides a number of standard iterators. The Kernel method loop behaves like an infinite loop, running its associated block repeatedly until the block executes a return, break, or other statement that exits from the loop.

The Integer class defines three commonly used iterators. The upto method invokes its associated block once for each integer between the integer on which it is invoked and the integer which is passed as an argument. For example:

4.upto(6) {|x| print x}   # => prints "456"

As you can see, upto yields each integer to the associated block, and it includes both the starting point and the end point in the iteration. In general, n.upto(m) runs its block m-n+1 times.

The downto method is just like upto but iterates from a larger number down to a smaller number.

When the Integer.times method is invoked on the integer n, it invokes its block n times, passing values 0 through n-1 on successive iterations. For example:

3.times {|x| print x }    # => prints "012"

In general, n.times is equivalent to 0.upto(n-1).

If you want to do a numeric iteration using floating-point numbers, you can use the more complex step method defined by the Numeric class. The following iterator, for example, starts at 0 and iterates in steps of 0.1 until it reaches Math::PI:

0.step(Math::PI, 0.1) {|x| puts Math.sin(x) }

Enumerable Objects

Array, Hash, Range, and a number of other classes define an each iterator that passes each element of the collection to the associated block. This is perhaps the most commonly used iterator in Ruby; as we saw earlier, the for loop only works for iterating over objects that have each methods. Examples of each iterators:

[1,2,3].each {|x| print x }   # => prints "123"
(1..3).each  {|x| print x }   # => prints "123" Same as 1.upto(3)

The each iterator is not only for traditional “data structure” classes. Ruby’s IO class defines an each iterator that yields lines of text read from the Input/Output object. Thus, you can process the lines of a file in Ruby with code like this:

File.open(filename) do |f|       # Open named file, pass as f
  f.each {|line| print line }    # Print each line in f
end                              # End block and close file

Most classes that define an each method also include the Enumerable module, which defines a number of more specialized iterators that are implemented on top of the each method. One such useful iterator is each_with_index, which allows us to add line numbering to the previous example:

File.open(filename) do |f|
  f.each_with_index do |line,number|
    print "#{number}: #{line}"
  end
end

Some of the most commonly used Enumerable iterators are the rhyming methods collect, select, reject, and inject. The collect method (also known as map) executes its associated block for each element of the enumerable object, and collects the return values of the blocks into an array:

squares = [1,2,3].collect {|x| x*x}   # => [1,4,9]

The select method invokes the associated block for each element in the enumerable object, and returns an array of elements for which the block returns a value other than false or nil. For example:

evens = (1..10).select {|x| x%2 == 0} # => [2,4,6,8,10]

The reject method is simply the opposite of select; it returns an array of elements for which the block returns nil or false. For example:

odds = (1..10).reject {|x| x%2 == 0}  # => [1,3,5,7,9]

The inject method is a little more complicated than the others. It invokes the associated block with two arguments. The first argument is an accumulated value of some sort from previous iterations. The second argument is the next element of the enumerable object. The return value of the block becomes the first block argument for the next iteration, or becomes the return value of the iterator after the last iteration. The initial value of the accumulator variable is either the argument to inject, if there is one, or the first element of the enumerable object. (In this case, the block is invoked just once for the first two elements.) Examples make inject more clear:

data = [2, 5, 3, 4]
sum = data.inject {|sum, x| sum + x }      # => 14    (2+5+3+4)
floatprod = data.inject(1.0) {|p,x| p*x }  # => 120.0 (1.0*2*5*3*4)
max = data.inject {|m,x| m>x ? m : x }     # => 5     (largest element)

See Enumerable Objects for further details on the Enumerable module and its iterators.

Writing Custom Iterators

The defining feature of an iterator method is that it invokes a block of code associated with the method invocation. You do this with the yield statement. The following method is a trivial iterator that just invokes its block twice:

def twice
  yield
  yield
end

To pass argument values to the block, follow the yield statement with a comma-separated list of expressions. As with method invocation, the argument values may optionally be enclosed in parentheses. The following simple iterator shows a use of yield:

# This method expects a block. It generates n values of the form
# m*i + c, for i from 0..n-1, and yields them, one at a time, 
# to the associated block.
def sequence(n, m, c)
  i = 0
  while(i < n)      # Loop n times
    yield m*i + c   # Invoke the block, and pass a value to it
    i += 1          # Increment i each time
  end
end

# Here is an invocation of that method, with a block.
# It prints the values 1, 6, and 11
sequence(3, 5, 1) {|y| puts y }

Here is another example of a Ruby iterator; it passes two arguments to its block. It is worth noticing that the implementation of this iterator uses another iterator internally:

# Generate n points evenly spaced around the circumference of a 
# circle of radius r centered at (0,0). Yield the x and y coordinates
# of each point to the associated block.
def circle(r,n)
  n.times do |i|    # Notice that this method is implemented with a block
    angle = Math::PI * 2 * i / n
    yield r*Math.cos(angle), r*Math.sin(angle)
  end
end

# This invocation of the iterator prints:
# (1.00, 0.00) (0.00, 1.00) (-1.00, 0.00) (-0.00, -1.00)
circle(1,4) {|x,y| printf "(%.2f, %.2f) ", x, y }

Using the yield keyword really is a lot like invoking a method. (See Chapter 6 for complete details on method invocation.) Parentheses around the arguments are optional. You can use * to expand an array into individual arguments. yield even allows you to pass a hash literal without the curly braces around it. Unlike a method invocation, however, a yield expression may not be followed by a block. You cannot pass a block to a block.

If a method is invoked without a block, it is an error for that method to yield, because there is nothing to yield to. Sometimes you want to write a method that yields to a block if one is provided but takes some default action (other than raising an error) if invoked with no block. To do this, use block_given? to determine whether there is a block associated with the invocation. block_given?, and its synonym iterator?, are Kernel methods, so they act like global functions. Here is an example:

# Return an array with n elements of the form m*i+c
# If a block is given, also yield each element to the block
def sequence(n, m, c)
  i, s = 0, []                  # Initialize variables
  while(i < n)                  # Loop n times
    y = m*i + c                 # Compute value
    yield y if block_given?     # Yield, if block
    s << y                      # Store the value
    i += 1
  end
  s                             # Return the array of values
end

Enumerators

An enumerator is an Enumerable object whose purpose is to enumerate some other object. To use enumerators in Ruby 1.8, you must require 'enumerator'. In Ruby 1.9 (and also 1.8.7), enumerators are built-in and no require is necessary. (As we’ll see later, the built-in enumerators have substantially more functionality than that provided by the enumerator library.)

Enumerators are of class Enumerable::Enumerator. Although this class can be instantiated directly with new, this is not how enumerators are typically created. Instead, use to_enum or its synonym enum_for, which are methods of Object. With no arguments, to_enum returns an enumerator whose each method simply calls the each method of the target object. Suppose you have an array and a method that expects an enumerable object. You don’t want to pass the array object itself, because it is mutable, and you don’t trust the method not to modify it. Instead of making a defensive deep copy of the array, just call to_enum on it, and pass the resulting enumerator instead of the array itself. In effect, you’re creating an enumerable but immutable proxy object for your array:

# Call this method with an Enumerator instead of a mutable array.
# This is a useful defensive strategy to avoid bugs.
process(data.to_enum)  # Instead of just process(data)

You can also pass arguments to to_enum, although the enum_for synonym seems more natural in this case. The first argument should be a symbol that identifies an iterator method. The each method of the resulting Enumerator will invoke the named method of the original object. Any remaining arguments to enum_for will be passed to that named method. In Ruby 1.9, the String class is not Enumerable, but it defines three iterator methods: each_char, each_byte, and each_line. Suppose we want to use an Enumerable method, such as map, and we want it to be based on the each_char iterator. We do this by creating an enumerator:

s = "hello"
s.enum_for(:each_char).map {|c| c.succ }  # => ["i", "f", "m", "m", "p"]

In Ruby 1.9 (and 1.8.7), it is usually not even necessary to use to_enum or enum_for explicitly as we did in the previous examples. This is because the built-in iterator methods of Ruby 1.9 (which include the numeric iterators times, upto, downto, and step, as well as each and related methods of Enumerable) automatically return an enumerator when invoked with no block. So, to pass an array enumerator to a method rather than the array itself, you can simply call the each method:

process(data.each_char)  # Instead of just process(data)

This syntax is even more natural if we use the chars alias in place of each_char. To map the characters of a string to an array of characters, for example, just use .chars.map:

"hello".chars.map {|c| c.succ }  # => ["i", "f", "m", "m", "p"]

Here are some other examples that rely on enumerator objects returned by iterator methods. Note that it is not just iterator methods defined by Enumerable that can return enumerator objects; numeric iterators like times and upto do the same:

enumerator = 3.times             # An enumerator object
enumerator.each {|x| print x }   # Prints "012"

# downto returns an enumerator with a select method
10.downto(1).select {|x| x%2==0}  # => [10,8,6,4,2]

# each_byte iterator returns an enumerator with a to_a method
"hello".each_byte.to_a            # => [104, 101, 108, 108, 111]

You can duplicate this behavior in your own iterator methods by returning self.to_enum when no block is supplied. Here, for example, is a version of the twice iterator shown earlier that can return an enumerator if no block is provided:

def twice
  if block_given?
    yield
    yield
  else
    self.to_enum(:twice)    
  end
end

In Ruby 1.9, enumerator objects define a with_index method that is not available in the Ruby 1.8 enumerator module. with_index simply returns a new enumerator that adds an index parameter to the iteration. For example, the following returns an enumerator that yields the characters of a string and their index within the string:

enumerator = s.each_char.with_index

Finally, keep in mind that enumerators, in both Ruby 1.8 and 1.9, are Enumerable objects that can be used with the for loop. For example:

for line, number in text.each_line.with_index
  print "#{number+1}: #{line}"
end

External Iterators

Our discussion of enumerators has focused on their use as Enumerable proxy objects. In Ruby 1.9, (and 1.8.7, though the implementation is not as efficient) however, enumerators have another very important use: they are external iterators. You can use an enumerator to loop through the elements of a collection by repeatedly calling the next method. When there are no more elements, this method raises a StopIteration exception:

iterator = 9.downto(1)             # An enumerator as external iterator
begin                              # So we can use rescue below
  print iterator.next while true   # Call the next method repeatedly
rescue StopIteration               # When there are no more values
  puts "...blastoff!"              # An expected, nonexceptional condition
end

External iterators are quite simple to use: just call next each time you want another element. When there are no more elements left, next will raise a StopIteration exception. This may seem unusual—an exception is raised for an expected termination condition rather than an unexpected and exceptional event. (StopIteration is a descendant of StandardError and IndexError; note that it is one of the only exception classes that does not have the word “error” in its name.) Ruby follows Python in this external iteration technique. By treating loop termination as an exception, it makes your looping logic extremely simple; there is no need to check the return value of next for a special end-of-iteration value, and there is no need to call some kind of next? predicate before calling next.

To simplify looping with external iterators, the Kernel.loop method includes (in Ruby 1.9) an implicit rescue clause and exits cleanly when StopIteration is raised. Thus, the countdown code shown earlier could more easily be written like this:

iterator = 9.downto(1)
loop do                 # Loop until StopIteration is raised
  print iterator.next   # Print next item
end
puts "...blastoff!"

Many external iterators can be restarted by calling the rewind method. Note, however, that rewind is not effective for all enumerators. If an enumerator is based on an object like a File which reads lines sequentially, calling rewind will not restart the iteration from the beginning. In general, if new invocations of each on the underlying Enumerable object do not restart the iteration from the beginning, then calling rewind will not restart it either.

Once an external iteration has started (i.e., after next has been called for the first time), an enumerator cannot be cloned or duplicated. It is typically possible to clone an enumerator before next is called, or after StopIteration has been raised or rewind is called.

Normally, enumerators with next methods are created from Enumerable objects that have an each method. If, for some reason, you define a class that provides a next method for external iteration instead of an each method for internal iteration, you can easily implement each in terms of next. In fact, turning an externally iterable class that implements next into an Enumerable class is as simple as mixing in (with include—see Modules) a module like this:

module Iterable
  include Enumerable          # Define iterators on top of each
  def each                    # And define each on top of next
    loop { yield self.next }
  end
end

Another way to use an external iterator is to pass it to an internal iterator method like this one:

def iterate(iterator)
  loop { yield iterator.next }
end

iterate(9.downto(1)) {|x| print x }

The earlier quote from Design Patterns alluded to one of the key features of external iterators: they solve the parallel iteration problem. Suppose you have two Enumerable collections and need to iterate their elements in pairs: the first elements of each collection, then the second elements, and so on. Without an external iterator, you must convert one of the collections to an array (with the to_a method defined by Enumerable) so that you can access its elements while iterating the other collection with each.

Example 5-1 shows the implementation of three iterator methods. All three accept an arbitrary number of Enumerable objects and iterate them in different ways. One is a simple sequential iteration using only internal iterators; the other two are parallel iterations and can only be done using the external iteration features of Ruby 1.9.

Example 5-1. Parallel iteration with external iterators

# Call the each method of each collection in turn.
# This is not a parallel iteration and does not require enumerators.
def sequence(*enumerables, &block)
  enumerables.each do |enumerable|
    enumerable.each(&block)
  end
end

# Iterate the specified collections, interleaving their elements.
# This can't be done efficiently without external iterators.
# Note the use of the uncommon else clause in begin/rescue.
def interleave(*enumerables)
  # Convert enumerable collections to an array of enumerators.
  enumerators = enumerables.map {|e| e.to_enum }
  # Loop until we don't have any more enumerators.
  until enumerators.empty?
    begin
      e = enumerators.shift   # Take the first enumerator
      yield e.next            # Get its next and pass to the block
    rescue StopIteration      # If no more elements, do nothing
    else                      # If no exception occurred
      enumerators << e        # Put the enumerator back
    end
  end
end

# Iterate the specified collections, yielding tuples of values,
# one value from each of the collections. See also Enumerable.zip.
def bundle(*enumerables)
  enumerators = enumerables.map {|e| e.to_enum }
  loop { yield enumerators.map {|e| e.next} }
end

# Examples of how these iterator methods work
a,b,c = [1,2,3], 4..6, 'a'..'e'
sequence(a,b,c) {|x| print x}   # prints "123456abcde"
interleave(a,b,c) {|x| print x} # prints "14a25b36cde"
bundle(a,b,c) {|x| print x}     # '[1, 4, "a"][2, 5, "b"][3, 6, "c"]'

The bundle method of Example 5-1 is similar to the Enumerable.zip method. In Ruby 1.8, zip must first convert its Enumerable arguments to arrays and then use those arrays while iterating through the Enumerable object it is called on. In Ruby 1.9, however, the zip method can use external iterators. This makes it (typically) more efficient in space and time, and also allows it to work with unbounded collections that could not be converted into an array of finite size.

Iteration and Concurrent Modification

In general, Ruby’s core collection of classes iterate over live objects rather than private copies or “snapshots” of those objects, and they make no attempt to detect or prevent concurrent modification to the collection while it is being iterated. If you call the each method of an array, for example, and the block associated with that invocation calls the shift method of the same array, the results of the iteration may be surprising:

a = [1,2,3,4,5]
a.each {|x| puts "#{x},#{a.shift}" }  # prints "1,1
3,2
5,3"

You may see similarly surprising behavior if one thread modifies a collection while another thread is iterating it. One way to avoid this is to make a defensive copy of the collection before iterating it. The following code, for example, adds a method each_in_snapshot to the Enumerable module:

module Enumerable
  def each_in_snapshot &block
    snapshot = self.dup    # Make a private copy of the Enumerable object
    snapshot.each &block   # And iterate on the copy
  end
end


[*] Within the Japanese Ruby community, the term “iterator” has fallen out of use because it implies an iteration that is not actually required. A phrase like “method that expects an associated block” is verbose but more precise.

[*] Design Patterns: Elements of Reusable Object-Oriented Software, by Gamma, Helm, Johnson, and Vlissides (Addison-Wesley).

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.145.179.225