Foldable

The monoid identity property allows us to handle empty collections in a general way. So, instead of having the following: 

  def reduceLeft(op: (A, A) => A): A

We'll have a definition that takes an identity element as another parameter. By convention, this approach is called fold:

  def foldLeft(identity: A)(op: (A, A) => A): A

The reason for the name foldLeft is that the identity element is used as the initial argument and the collection is reduced from left to right, which leads to the following sequence of calls:

op(op(op(op(identity, a1), a2), a3), a4) ...

Alternatively, it can be written in infix notation:

(((identity op a1) op a2) op a3) ...

This is, well, kind of like folding the collection, starting with the identity and its first element.
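
For example (just an illustration, not part of our code base), folding a list of numbers with 0 as the identity of addition expands exactly as shown above:

val numbers = List(1, 2, 3, 4)
// expands to op(op(op(op(0, 1), 2), 3), 4)
val total = numbers.foldLeft(0)((acc, n) => acc + n) // 10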

The associativity of the operation, together with the identity element, tells us that another approach is also possible: starting from the identity and the last element of the collection and then moving toward its head:

(a1 op (a2 op (a3 op identity))) ...

Folding in this direction is naturally called foldRight:

 def foldRight(identity: A)(op: (A, A) => A): A
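
Sticking with the same illustrative numbers, associativity guarantees that folding from the right gives the same answer:

// (1 + (2 + (3 + (4 + 0)))) == (((0 + 1) + 2) + 3) + 4
val totalRight = List(1, 2, 3, 4).foldRight(0)((n, acc) => n + acc) // 10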

The same properties also give us the ability to fold the collection, starting from any place! This is particularly useful in a balanced fold, which works from both ends:

(a1 op (a2 op identity)) op ((a3 op identity) op a4)

Interestingly, both sides can be dealt with recursively, in that we can split each of them into two parts and use a balanced fold again. Even more interestingly, as the left and right sides are folded independently from each other, the folding can be done in parallel!
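
As a quick sanity check with the same illustrative numbers, a balanced grouping agrees with the left-to-right fold:

val balancedSum = (1 + 2) + (3 + 4)                   // 10
val leftSum     = List(1, 2, 3, 4).foldLeft(0)(_ + _) // 10
assert(balancedSum == leftSum)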

In the same way as we did for Reducible, we can group these functions into yet another abstraction, MonoidFoldable:


trait MonoidFoldable[A, F[_]] {
  def foldRight(as: F[A]): A
  def foldLeft(as: F[A]): A
  def foldBalanced(as: F[A]): A
}

This time, we define it as a type class that is capable of folding collections of type F with elements of type A. For most of the existing collections, instances of this type class should be able to delegate the foldLeft and foldRight implementations to F itself. Let's demonstrate this by defining an instance for List, listMonoidFoldable:

implicit def listMonoidFoldable[A : Monoid]: MonoidFoldable[A, List] =
  new MonoidFoldable[A, List] {
    private val m = implicitly[Monoid[A]]

    override def foldRight(as: List[A]): A = as.foldRight(m.identity)(m.op)

    override def foldLeft(as: List[A]): A = as.foldLeft(m.identity)(m.op)

    override def foldBalanced(as: List[A]): A = as match {
      case Nil => m.identity
      case List(one) => one
      case _ =>
        val (l, r) = as.splitAt(as.length / 2)
        m.op(foldBalanced(l), foldBalanced(r))
    }
  }

First of all, we require that type A is a monoid. Then, we obtain an instance of it by using the usual approach of calling implicitly. Next, we implement foldRight and foldLeft by calling the corresponding methods on the underlying List. Finally, we implement foldBalanced recursively: the implementation splits the list into two halves and folds them independently, exactly as we reasoned before. It is not done in parallel, though.
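
Here is a hedged usage sketch of the instance we just defined; the additive Monoid[Int] below is an assumption made for the example and is not part of the chapter's code:

// Assumed monoid for the sake of the example: addition over Int with identity 0
implicit val intAdditionMonoid: Monoid[Int] = new Monoid[Int] {
  override def identity: Int = 0
  override def op(l: Int, r: Int): Int = l + r
}

val ints = (1 to 10).toList
val foldable = implicitly[MonoidFoldable[Int, List]]
foldable.foldLeft(ints)     // 55
foldable.foldBalanced(ints) // 55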

We can improve on the lack of parallelism by utilizing the Future we discussed in the previous chapter. We'll introduce a new method, foldPar, which takes an additional implicit ExecutionContext and returns its result wrapped in a Future:

def foldPar(as: F[A])(implicit ec: ExecutionContext): Future[A]

The execution context needs to be available at the moment we create a Future for our parallel computation. The structure of the method is similar to that of foldBalanced, since we have to split the collection into two parts and fold them recursively. This time, we set a lower limit on the number of items to be folded in parallel, because it might be more computationally expensive to create a Future than to fold a handful of elements sequentially:

private val parallelLimit = 8

override def foldPar(as: List[A])(implicit ec: ExecutionContext): Future[A] =
  if (as.length < parallelLimit) Future(foldBalanced(as))
  else {
    val (l, r) = as.splitAt(as.length / 2)
    Future.reduceLeft(List(foldPar(l), foldPar(r)))(m.op)
  }

For simplicity, we've hardcoded the minimal number of elements eligible for parallel computation, but it could also be passed in as a parameter. In the method itself, we either spawn a balanced fold in a separate thread, if the collection is shorter than the limit, or we initiate two parallel folds the same way we did before, this time using Future.reduceLeft to combine them once both computations have finished.

We expect foldPar to be quicker than the other folds. Let's write a property for that. This endeavor will be more involved than before. The reason for this is that the monoids we've built so far are very simple and, because of that, very fast. As a result, we wouldn't be able to see the advantages of parallelization: the price of spawning a Future would outweigh the folding itself. For our purposes, we'll make our monoid a bit slower by adding a small delay to it:

implicit val slowPoisonMonoid: Monoid[Fish] = new Monoid[Fish] {
  override def identity: Fish = ZeroFish
  override def op(l: Fish, r: Fish): Fish = {
    Thread.sleep(1)
    if (l.poisonousness > r.poisonousness) l else r
  }
}

Another point is that the list should be long enough; otherwise, we won't really be testing the parallelization feature. We need to create a dedicated generator for lists between 100 and 1,000 elements in length:

val bucketOfFishGen: Gen[List[Fish]] = for {
  n   <- Gen.choose(100, 1000)
  gen <- Gen.listOfN(n, fishGen)
} yield gen

implicit val arbBucketOfFish: Arbitrary[Bucket[Fish]] = Arbitrary(bucketOfFishGen)

We also need a helper method to measure the execution time of the code block:

def withTime(block: => Fish): (Fish, Long) = {
  val start = System.nanoTime()
  val result = block
  (result, (System.nanoTime() - start) / 1000000) // elapsed time in milliseconds
}

For foldPar, we also need an implicit execution context. We'll use the global one as it is good enough for our purposes:

import scala.concurrent.ExecutionContext.Implicits.global

With all the preparations out of the way, we can formulate our property:

property("foldPar is the quickest way to fold a list") = {
  import Monoid.slowPoisonMonoid
  val foldable = MonoidFoldable.listMonoidFoldable[Fish]

  forAllNoShrink((as: Bucket[Fish]) => {
    ...
  })
}

In the body, we'll first measure the execution time of different folding approaches:

val (left, leftRuntime) = withTime(foldable.foldLeft(as))
val (right, rightRuntime) = withTime(foldable.foldRight(as))
val (balanced, balancedRuntime) = withTime(foldable.foldBalanced(as))
val (parallel, parallelRuntime) = withTime(Await.result(foldable.foldPar(as), 5.seconds))

foldPar returns a Future[Fish] as its result, so we wait for it to complete. Finally, we check that the results of all the folds are the same and that the parallel folding does not take more time than the other approaches. We label these properties appropriately:

s"${as.size} fishes: $leftRuntime, $rightRuntime, $balancedRuntime, $parallelRuntime millis" |: all(
  "all results are equal" |: all(left == right, left == balanced, left == parallel),
  "parallel is quickest" |: parallelRuntime <= List(leftRuntime, rightRuntime, balancedRuntime).min
)

Now, we can run our test in an SBT session and check that our implementation lives up to our expectations:

+ MonoidFoldable.foldPar is the quickest way to fold a list: OK, passed 100 tests.

It turns out that it does!

Needless to say, it is possible to define other instances of the MonoidFoldable type class. Moreover, the collection does not need to be linear. As long as it is possible to iterate over its elements, any structure will do: a binary tree, a map, a bag, or anything even more sophisticated. This ability to abstract over different data structures and to parallelize computations is what makes monoids especially useful in big data and distributed scenarios.
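
As a brief illustration of that point, here is a hedged sketch of folding a hand-rolled binary tree with a monoid; the Tree type is hypothetical and is not defined anywhere in this chapter's code:

// A toy binary tree, shown only to illustrate that non-linear structures fold as well
sealed trait Tree[A]
final case class Leaf[A](value: A) extends Tree[A]
final case class Branch[A](left: Tree[A], right: Tree[A]) extends Tree[A]

// The subtrees are folded independently of each other, so, just like foldBalanced,
// this could also be parallelized
def foldTree[A](t: Tree[A])(implicit m: Monoid[A]): A = t match {
  case Leaf(a)      => a
  case Branch(l, r) => m.op(foldTree(l), foldTree(r))
}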

Having said that, we need to emphasize that MonoidFoldable can be made even more flexible and general-purpose. To understand how this can be done, we need to look once again at the definition of the fold process we gave earlier:

(((identity op a1) op a2) op a3) ...

We can see from this that the folding operation takes two arguments and returns a result that becomes the first argument for the next iteration. This observation leads to the conclusion that the folding function does not need to have both arguments of the same type. As long as the return type is the same as the type of the first argument and the type of the initial (identity) element, any function will do for folding. This allows us to define a more generic Foldable abstraction that can convert the elements of the collection and combine them at the same time:

trait Foldable[F[_]] {
  def foldLeft[A, B](as: F[A])(z: B)(f: (B, A) => B): B
  def foldRight[A, B](as: F[A])(z: B)(f: (A, B) => B): B
}

This approach allows us to use any function, not only monoids, for folding. Of course, it is still possible to use existing monoid definitions, for example, by defining a method that would accept a monoid as an implicit parameter:

  def foldMap[A, B : Monoid](as: F[A])(f: A => B): B = {
    val m = implicitly[Monoid[B]]
    foldLeft(as)(m.identity)((b, a) => m.op(b, f(a)))
  }

This definition relies on the existence of some complementary function, f, which converts elements of the collection into the appropriate type before they can be combined using the monoid operation.
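
To illustrate, here is a hedged sketch of a Foldable[List] instance together with a foldMap call; the listFoldable name is our own, and we reuse the assumed additive Monoid[Int] from the earlier usage sketch:

// Illustrative instance: delegate both folds to the standard library's List methods
implicit val listFoldable: Foldable[List] = new Foldable[List] {
  override def foldLeft[A, B](as: List[A])(z: B)(f: (B, A) => B): B = as.foldLeft(z)(f)
  override def foldRight[A, B](as: List[A])(z: B)(f: (A, B) => B): B = as.foldRight(z)(f)
}

// f converts every String into an Int, and the monoid then adds the lengths up
listFoldable.foldMap(List("a", "bb", "ccc"))(_.length) // 6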

Looking in the opposite direction, abstract algebraic structures do not end with monoids. In fact, there are many more advanced structures out there, such as groups, abelian groups, and rings.

We'll take a short look at the implementation of the group just to reinforce our understanding of the topic of algebraic structures.
