Puzzler 5

The Missing List

Scala's rich collections library provides many functions and operations on many different collection types. Sometimes, though, you may want to add your own "utility" operations.

This example represents an attempt (suboptimal, but more on that later) to sum the sizes of multiple collection instances. For example, the sum of the sizes of Vector("a"), List("b", "c"), and Array("d", "e", "f") should be 1 + 2 + 3 = 6:

  scala> sumSizes(Seq(Vector("a"), List("b""c"), 
           Array("d""e""f")))
  res4: Int = 6

You may try to make such utility functions work on base collection types, such as Iterable, to ensure they are widely applicable. Obviously, the intention is that the operation behave in the same way regardless of the concrete collections passed in.

What is the result of executing the following code?

  def sumSizes(collections: Iterable[Iterable[_]]): Int =
    collections.map(_.size).sum
  
sumSizes(List(Set(12), List(34))) sumSizes(Set(List(12), Set(34)))

Possibilities

  1. Prints:
      Int = 4
      Int = 4
    
  2. Prints:
      Int = 4
      Int = 2
    
  3. Prints:
      Int = 2
      Int = 4
    
  4. Prints:
      Int = 2
      Int = 2
    

Explanation

You may wonder whether sumSizes is somehow counting the number of collections rather than their sizes, which would cause both statements to print 2, as in answer 4. As it happens, though, the correct answer is number 2:

  scala> sumSizes(List(Set(12), List(34)))
  res5: Int = 4
  
scala> sumSizes(Set(List(12), Set(34))) res6: Int = 2

To understand what is going on, you need to recall another feature of the Scala collections library: its operations generally preserve the input collection type!

Developers coming from a Java background might expect that transforming an Iterable would produce a result that conforms to the Iterable interface, but could be any underlying implementation type. The Scala collections go one step further, though, and return an Iterable with the same type as the input type.[1]

This means the result type of the following statement is also a set and thus cannot contain multiple instances of the same item:

  Set(List(12), Set(34)).map(_.size)

As a result, this intermediate value contains only one element:

  scala> Set(List(12), Set(34)).map(_.size)
  res7: scala.collection.immutable.Set[Int] = Set(2)

This behavior of sets explains the observed result.

Adding to the puzzlement here is the fact that Set does not appear anywhere in the type declaration of sumSizes. There is no obvious warning flag to make either the author or caller of the function think about the potential impact of the uniqueness constraint of Sets, or indeed other constraints with similar unintended consequences of other subtypes of Iterable.

Discussion

How can you avoid such surprises? One possible choice in this case would be to convert the outer collection to a known type, e.g., using toSeq:

  def sumSizes(collections: Iterable[Iterable[_]]): Int =
    collections.toSeq.map(_.size).sum
  
scala> sumSizes(List(Set(12), List(34))) res0: Int = 4
scala> sumSizes(Set(List(12), Set(34))) res1: Int = 4

In this particular case, you can do even better. By switching to an implementation using fold, you can avoid the problem and eliminate one of the iterations through the outer collection:

  def sumSizes(collections: Iterable[Iterable[_]]): Int =
    collections.foldLeft(0) {
      (sumOfSizes, collection) => sumOfSizes + collection.size
  }
  
scala> sumSizes(List(Set(12), List(34))) res8: Int = 4
scala> sumSizes(Set(List(12), Set(34))) res9: Int = 4
image images/moralgraphic117px.png Pay close attention to the possible input types to your methods that operate on collections. If you do not need to preserve the input type, consider constructing your own intermediate types with known characteristics.

Footnotes for Chapter 5:

[1] See the Scaladoc for CanBuildFrom. [EPF]

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.191.147.77