Chapter 13. Collections

Topics in This Chapter A2

In this chapter, you will learn about the Scala collections library from a library user’s point of view. In addition to arrays and maps, which you have already encountered, you will see other useful collection types. There are many methods that can be applied to collections, and this chapter presents them in an orderly way.

The key points of this chapter are:

  • All collections extend the Iterable trait.

  • The three major categories of collections are sequences, sets, and maps.

  • Scala has mutable and immutable versions of most collections.

  • A Scala list is either empty, or it has a head and a tail which is again a list.

  • Sets are unordered collections.

  • Use a LinkedHashSet to retain the insertion order or a SortedSet to iterate in sorted order.

  • + adds an element to an unordered collection; +: and :+ prepend or append to a sequence; ++ concatenates two collections; - and -- remove elements.

  • The Iterable and Seq traits have dozens of useful methods for common operations. Check them out before writing tedious loops.

  • Mapping, folding, and zipping are useful techniques for applying a function or operation to the elements of a collection.

  • You can consider an Option to be a collection of size 0 or 1.

  • A lazy list is evaluated on demand and can hold an unbounded number of elements.

13.1 The Main Collections Traits

Figure 13–1 shows the most important traits that make up the Scala collections hierarchy.

Images

Figure 13–1 Key traits in the Scala collections hierarchy

An Iterable is any collection that can yield an Iterator with which you can access all elements in the collection:


val coll = ... // some Iterable
val iter = coll.iterator
while iter.hasNext do
  process(iter.next())

In a Seq, the elements are ordered, and the iterator visits the elements in order. An IndexedSeq allows fast random access through an integer index. For example, an ArrayBuffer is indexed but a linked list is not. A LinearSeq allows fast head and tail operations. A Buffer allows fast insertion and removal at the end. An ArrayBuffer is both a Buffer and, as already mentioned, an IndexedSeq.

A Set is an unordered collection of values. In a SortedSet, elements are always visited in sorted order.

A Map is a set of (key, value) pairs. A SortedMap visits the entries as sorted by the keys. See Chapter 4 for more information.

This hierarchy is similar to that in Java, with a couple of welcome improvements:

  1. Maps are a part of the hierarchy and not a separate hierarchy.

  2. IndexedSeq is the supertype of arrays but not of lists, allowing you to tell the two apart.

Images Note

In Java, both ArrayList and LinkedList implement a common List interface, making it difficult to write efficient code when random access is preferred, for example when searching in a sorted sequence. This was a flawed design decision in the original Java collections framework. In a later version, a marker interface RandomAccess was added to deal with this problem.

Each Scala collection trait has a companion object with an apply method for constructing an instance of the collection. For example,


Iterable(0xFF, 0xFF00, 0xFF0000)
Set(Color.RED, Color.GREEN, Color.BLUE)
Map(Color.RED -> 0xFF0000, Color.GREEN -> 0xFF00, Color.BLUE -> 0xFF)
SortedSet("Hello", "World")

This is called the “uniform creation principle.”

There are methods toSeq, toSet, toMap, and so on, as well as a generic to method, that you can use to translate between collection types.


val coll = Seq(1, 1, 2, 3, 5, 8, 13)
val set = coll.toSet
val buffer = coll.to(ArrayBuffer)

Images Note

You can use the == operator to compare any sequence, set, or map with another collection of the same kind. For example, Seq(1, 2, 3) == (1 to 3) yields true. But comparing different kinds, for example Seq(1, 2, 3) == Set(1, 2, 3), is a compilation error. In that case, use the sameElements method.

13.2 Mutable and Immutable Collections

Scala supports both mutable and immutable collections. An immutable collection can never change, so you can safely share a reference to it, even in a multithreaded program. For example, there is a scala.collection.mutable.Map and a scala.collection.immutable.Map. Both have a common supertype scala.collection.Map (which, of course, contains no mutation operations).

Images Note

When you have a reference to a scala.collection.immutable.Map, you know that nobody can change the map. If you have a scala.collection.Map, then you can’t change it, but someone else might.

Scala gives a preference to immutable collections. The scala package and the Predef object, which are always imported, have type aliases Seq, IndexedSeq, List, Set, and Map that refer to the immutable traits. For example, Predef.Map is the same as scala.collection.immutable.Map.

Images Tip

With the statement

import scala.collection.mutable

you can get an immutable map as Map and a mutable one as mutable.Map.

If you had no prior experience with immutable collections, you may wonder how you can do useful work with them. The key is that you can create new collections out of old ones. For example, if numbers is an immutable set, then

numbers + 9

is a new set containing the numbers together with 9. If 9 was already in the set, you just get a reference to the old set. This is particularly natural in recursive computations. For example, here we compute the set of all digits of an integer:

def digits(n: Int): Set[Int] =
  if n < 0 then digits(-n)
  else if n < 10 then Set(n)
  else digits(n / 10) + (n % 10)

This method starts out with a set containing a single digit. At each step, another digit is added. However, adding the digit doesn’t mutate a set. Instead, in each step, a new set is constructed.

Images Note

In addition to the scala.collection.immutable and scala.collection.mutable packages, there is a scala.collection.concurrent package. It has a Map trait with methods for atomic processing: putIfAbsent, remove, replace, getOrElseUpdate, updateWith. A TrieMap implementation is provided. You can also adapt a Java ConcurrentHashMap to this trait—see Section 13.13, “Interoperability with Java Collections,” on page 204.

13.3 Sequences

Figure 13–2 shows the most important immutable sequences.

Images

Figure 13–2 Immutable sequences

A Vector is the immutable equivalent of an ArrayBuffer: an indexed sequence with fast random access. Vectors are implemented as trees where each node has up to 32 children. For a vector with one million elements, one needs four layers of nodes. (Since 103 ≈ 210, 106 ≈ 324.) Accessing an element in such a list will take 4 hops, whereas in a linked list it would take an average of 500,000.

s

A Range represents an integer sequence, such as 0,1,2,3,4,5,6,7,8,9 or 10,20,30. Of course a Range object doesn’t store all sequence values but only the start, end, and increment. You construct Range objects with the to and until methods, as described in Chapter 2.

We discuss lists in the next section, and lazy lists in Section 13.12, “Lazy Lists,” on page 202.

See Figure 13–3 for the most useful mutable sequences.

We discussed array buffers in Chapter 3. Stacks, queues, and priority queues are standard data structures that are useful for implementing certain algorithms. If you are familiar with these structures, the Scala implementations won’t surprise you.

Images

Figure 13–3 Mutable sequences

13.4 Lists

In Scala, a list is either Nil (that is, empty) or an object with a head element and a tail that is again a list. For example, consider the list

val digits = List(4, 2)

The value of digits.head is 4, and digits.tail is List(2). Moreover, digits.tail.head is 2 and digits.tail.tail is Nil.

The :: operator makes a new list from a given head and tail. For example,

9 :: List(4, 2)

is List(9, 4, 2). You can also write that list as

9 :: 4 :: 2 :: Nil

Note that :: is right-associative. With the :: operator, lists are constructed from the end:

9 :: (4 :: (2 :: Nil))

Images Note

There is a similar right-associative operator to build tuples:

1 *: 2.0 *: "three" *: EmptyTuple

yields the tuple (1, 2.0, "three").

In Java or C++, one uses an iterator to traverse a linked list. You can do this in Scala as well, but it is often more natural to use recursion. For example, the following function computes the sum of all elements in a linked list of integers:

def sum(lst: List[Int]): Int =
  if lst == Nil then 0 else lst.head + sum(lst.tail)

Or, if you prefer, you can use pattern matching:

def sum(lst: List[Int]): Int = lst match
  case Nil => 0
  case h :: t => h + sum(t) // h is lst.head, t is lst.tail

Note the :: operator in the second pattern. It “destructures” the list into head and tail.

Images Note

Recursion works so naturally because the tail of a list is again a list.

Of course, for this particular example, you do not need to use recursion at all. The Scala library already has a sum method:

List(9, 4, 2).sum // Yields 15

If you want to mutate list elements in place, you can use a ListBuffer, a data structure that is backed by a linked list with references to the first and last node. This makes it efficient to add or remove elements at either end of the list.

However, adding or removing elements in the middle is not efficient. For example, suppose you want to remove every second element of a mutable list. With a Java LinkedList, you use an iterator and call remove after every second call to next. There is no analogous operation on a ListBuffer. Of course, removing multiple elements by their index positions is very inefficient in a linked list. Your best bet is to generate a new list with the result (see Exercise 3 on page 206).

13.5 Sets

A set is a collection of distinct elements. Trying to add an existing element has no effect. For example,

Set(2, 0, 1) + 1

is the same as Set(2, 0, 1).

Unlike lists, sets do not retain the order in which elements are inserted. By default, sets are implemented as hash sets in which elements are organized by the value of the hashCode method. (In Scala, as in Java, every object has a hashCode method.)

For example, if you iterate over

Set(1, 2, 3, 4, 5, 6)

the elements are visited in the order

5 1 6 2 3 4

You may wonder why sets don’t retain the element order. It turns out that you can find elements much faster if you allow sets to reorder their elements. Finding an element in a hash set is much faster than in an array or list.

A linked hash set remembers the order in which elements were inserted. It keeps a linked list for this purpose. For example,

val weekdays = scala.collection.mutable.LinkedHashSet("Mo", "Tu", "We", "Th", "Fr")

If you want to iterate over elements in sorted order, use a sorted set:

val numbers = scala.collection.mutable.SortedSet(1, 2, 3, 4, 5, 6)

A bit set is an implementation of a set of non-negative integers as a sequence of bits. The ith bit is 1 if i is present in the set. This is an efficient implementation as long as the maximum element is not too large. Scala provides both mutable and immutable BitSet classes.

The contains method checks whether a set contains a given value. The subsetOf method checks whether all elements of a set are contained in another set.

val digits = Set(1, 7, 2, 9)
digits.contains(0) // false
Set(1, 2).subsetOf(digits) // true

The union, intersect, and diff methods carry out the usual set operations. If you prefer, you can write them as |, &, and &~. You can also write union as ++ and difference as --. For example, if we have the set

val primes = Set(2, 3, 5, 7)

then digits.union(primes) is Set(1, 2, 3, 5, 7, 9), digits & primes is Set(2, 7), and digits -- primes is Set(1, 9).

13.6 Operators for Adding or Removing Elements

When you want to add or remove an element, or a number of elements, the operators to use depend on the collection type. Table 13–1 provides a summary.

Table 13–1 Operators for Adding and Removing Elements

Operator

Description

Collection type

coll :+ elem

elem +: coll

A collection of the same type as coll to which elem has been appended or prepended.

Seq

coll + elem

coll - elem

A collection of the same type as coll with the given element added or removed.

Immutable Set, Map

coll ++ coll2

A collection of the same type as coll, containing the elements of both collections.

Iterable

coll -- coll2

A collection of the same type as coll from which the elements of coll2 have been removed. (For sequences, use diff.)

Immutable Set, Map

elem :: lst

lst2 ::: lst

A list with the element or given list prepended to lst. Same as +: and ++:.

List

set | set2

set & set2

set &~ set2

Set union, intersection, difference. | is the same as ++, and &~ is the same as --.

Set

coll += elem

coll ++= coll2

coll -= elem

coll --= coll2

Modifies coll by adding or removing the given elements.

Mutable collections

elem +=: coll

coll2 ++=: coll

Modifies coll by prepending the given element or collection, alias for prepend.

Buffer

Note that + is used for adding an element to an unordered immutable collection, while +: and :+ add an element to the beginning or end of an ordered collection.

Vector(1, 2, 3) :+ 5 // Yields Vector(1, 2, 3, 5)
0 +: 1 +: Vector(1, 2, 3) // Yields Vector(0, 1, 1, 2, 3)

Note that +:, like all operators ending in a colon, is right-associative, and that it is a method of the right operand.

These operators return new collections (of the same type as the original ones) without modifying the original. Mutable collections have a += operator that mutates the left-hand side. For example,

val numberBuffer = ArrayBuffer(1, 2, 3)
numberBuffer += 5 // Adds 5 to numberBuffer

With an immutable collection, you can use += or :+= with a var, like this:

var numberSet = Set(1, 2, 3)
numberSet += 5 // Sets numberSet to the immutable set numberSet + 5
var numberVector = Vector(1, 2, 3)
numberVector :+= 5 // += does not work since vectors don’t have a + operator

To remove an element, use the - operator:

Set(1, 2, 3) - 2 // Yields Set(1, 3)

You can add multiple elements with the ++ operator:

coll ++ coll2

yields a collection of the same type as coll that contains both coll and coll2. Similarly, the -- operator removes multiple elements.

Images Note

For lists, you can use +: instead of :: for consistency, with one exception: Pattern matching (case h::t) does not work with the +: operator.

Images CAUTION

The + and - operators are deprecated for mutable collections, as well as collections of unknown mutability (such as scala.collection.Set). They don’t mutate the collection but compute new collections with elements added or removed. If you really want to do this with a (potentially) mutable collection, use the ++ and -- operators.

Images CAUTION

The operators coll += (e1, e2, ...) and coll -= (e1, e2, ...) with multiple arguments are deprecated, since infix operators with more than two arguments are now discouraged.

As you can see, Scala provides many operators for adding and removing elements. Here is a cheat sheet:

  1. Append (:+) or prepend (+:) to a sequence.

  2. Add (+) to an unordered collection.

  3. Remove with the - operator.

  4. Use ++ and -- for bulk add and remove.

  5. Mutations are += ++= -= --=.

  6. For lists, many Scala programmers prefer the :: and ::: operators to +: and ++.

  7. Stay away from deprecated += (e1, e2, ...), -= (e1, e2, ...), ++: and enigmatic +=:, ++=:.

13.7 Common Methods

Table 13–2 gives a brief overview of the most important methods of the Iterable trait, sorted by functionality.

Table 13–2 Important Methods of the Iterable Trait

Methods

Description

head, last, headOption, lastOption

Returns the first or last element; or, the first or last element as an Option.

tail, init

Returns everything but the first or last element.

length, isEmpty

Returns the length, or true if the length is zero.

map(f), flatMap(f), foreach(f), mapInPlace(f)collect(pf)

Applies a function to all elements; see Section 13.8.

reduceLeft(op), reduceRight(op), foldLeft(init)(op), foldRight(init)(op)

Applies a binary operation to all elements in a given order; see Section 13.9.

reduce(op), fold(init)(op), aggregate(init)(op, combineOp)

Applies a binary operation to all elements in arbitrary order.

sum, product, max, min, maxBy(f), minBy(f), maxOption, minOption, maxByOption(f), minByOption(f)

Returns the sum or product (provided the element type can be implicitly converted to the Numeric trait), or the maximum or minimum. The max and min functions require that the element type has an Ordering. The maxBy and minBy functions measure the elements by applying the function f. The methods ending in Option return an Option so that they can be safely applied to empty collections.

count(pred), forall(pred), exists(pred)

Returns the count of elements fulfilling the predicate; true if all elements do, or at least one element does.

filter(pred), filterNot(pred), partition(pred)

Returns all elements fulfilling or not fulfilling the predicate; the pair of both.

takeWhile(pred), dropWhile(pred), span(pred)

Returns the first elements fulfilling pred; all but those elements; the pair of both.

take(n), drop(n), splitAt(n)

Returns the first n elements; everything but the first n elements; the pair of both.

takeRight(n), dropRight(n)

Returns the last n elements; everything but the last n elements.

slice(from, to), view(from, to)

Returns the elements in the range from until to, or a view thereto.

zip(coll2), zipAll(coll2, fill, fill2), lazyZip(coll2), zipWithIndex

Returns pairs of elements from this collection and another; see Section 13.10.

grouped(n), sliding(n)

Returns iterators of subcollections of length n; grouped yields elements with index 0 until n, then with index n until 2 * n, and so on; sliding yields elements with index 0 until n, then with index 1 until n + 1, and so on.

groupBy(k), groupMap(k, m), groupMapReduce(k, m, r)

The groupBy method yields a map whose keys are k(x) for all elements x. The value for each key is the collection of elements with that key. The groupMap method further applies the function m to each value, and groupMapReduce applies m and then reduces the collection with r.

mkString(before, between, after), addString(sb, before, between, after)

Makes a string of all elements, adding the given strings before the first, between each, and after the last element. The second method appends that string to a string builder.

toIterable, toSeq, toIndexedSeq, toArray, toBuffer, toList, toSet, toVector, toMap, to

Converts the collection to a collection of the specified type.

The Seq trait adds several methods to the Iterable trait. Table 13–3 shows the most important ones.

Table 13–3 Important Methods of the Seq Trait

Methods

Description

coll(k)
(i.e., coll.apply(k))

The kth sequence element.

contains(elem), containsSlice(seq), startsWith(seq), endsWith(seq)

Returns true if this sequence contains the given element or sequence; if it starts or ends with the given sequence.

indexOf(elem), lastIndexOf(elem), indexOfSlice(seq), lastIndexOfSlice(seq)

Returns the index of the first or last occurrence of the given element or element sequence.

indexWhere(pred)

Returns the index of the first element fulfilling pred.

prefixLength(pred), segmentLength(pred, n)

Returns the length of the longest sequence of elements fulfilling pred, starting with 0 or n.

padTo(n, fill)

Returns a copy of this sequence, with fill appended until the length is n.

intersect(seq), diff(seq)

Returns the “multiset” intersection or difference of the sequences. For example, if a contains five 1s and b contains two, then a intersect b contains two (the smaller count), and a diff b contains three (the difference).

reverse

The reverse of this sequence.

sorted, sortWith(less), sortBy(f)

The sequence sorted using the element ordering, the binary less function, or a function f that maps each element to an ordered type.

permutations, combinations(n)

Returns an iterator over all permutations or combinations (subsequences of length n).

Images Note

These methods never mutate a collection. If their result is a collection, the type is of the same type as the original, or as close as possible to it. (With types such as Range and BitSet, it can happen that the result is a more general sequence or set.) This is sometimes called the “uniform return type” principle.

13.8 Mapping a Function

You often want to transform all elements of a collection. The map method applies a function to a collection and yields a collection of the results. For example, given a list of strings

val names = List("Peter", "Paul", "Mary")

you get a list of the uppercased strings as

names.map(_.toUpperCase) // List("PETER", "PAUL", "MARY")

This is exactly the same as

n <- names yield n.toUpperCase

If the function yields a collection instead of a single value, you may want to concatenate all results. In that case, use flatMap. For example, consider

def ulcase(s: String) = Vector(s.toUpperCase, s.toLowerCase)

Then names.map(ulcase) is

List(Vector("PETER", "peter"), Vector("PAUL", "paul"), Vector("MARY", "mary"))

but names.flatMap(ulcase) is

List("PETER", "peter", "PAUL", "paul", "MARY", "mary")

Images Tip

If you use flatMap with a function that returns an Option, the resulting collection contains all values v for which the function returns Some(v).

For example, given a list of keys and a map, here is a list of the matching values that are actually present:

val scores = Map("Alice" -> 10, "Bob" -> 3, "Cindy" -> 8)
val keys = Array("Alice", "Cindy", "Eloïse")
keys.flatMap(k => scores.get(k)) // Array(10, 8)

Images Note

The map and flatMap methods are important because they are used for translating for expressions. For example, the expression

for i <- 1 to 10 yield i * i

is translated to

(1 to 10).map(i => i * i)

and

for i <- 1 to 10; j <- 1 to i yield i * j

becomes

(1 to 10).flatMap(i => (1 to i).map(j => i * j))

Why flatMap? See Exercise 9 on page 206.

The mapInPlace method is the in-place equivalent of map. It applies to mutable collections, and replaces each element with the result of a function. For example, the following code changes all buffer elements to uppercase:

val buffer = ArrayBuffer("Peter", "Paul", "Mary")
buffer.mapInPlace(_.toUpperCase)

If you just want to apply a function for its side effect and don’t care about the function values, use foreach:

names.foreach(println)

The collect method works with partial functions—functions that may not be defined for all inputs. It yields a collection of all function values of the arguments on which it is defined. For example,

"-3+4".collect({ case '+' => 1 ; case '-' => -1 }) // Vector(-1, 1)

The groupBy method yields a map whose keys are the function values, and whose values are the collections of elements whose function value is the given key. For example,

val words = ...
val map = words.groupBy(_.substring(0, 1).toUpperCase)

builds a map that maps "A" to all words starting with A, and so on.

13.9 Reducing, Folding, and Scanning (A3)

The map method applies a unary function to all elements of a collection. The methods that we discuss in this section combine elements with a binary function. The call c.reduceLeft(op) applies op to successive elements, like this:

            .
            .
           .
          op
         /  
        op   coll(3)
       /  
      op   coll(2)
     /  
coll(0)  coll(1)

For example,

List(1, 7, 2, 9).reduceLeft(_ - _)

is

            -
           /   
          -     9
        /   
       -     2
     /   
    1     7

or

((1 - 7) - 2) - 9 = 1 - 7 - 2 - 9 = -17

The reduceRight method does the same, but it starts at the end of the collection. For example,

List(1, 7, 2, 9).reduceRight(_ - _)

is

1 - (7 - (2 - 9)) = 1 - 7 + 2 - 9 = -13

Images Note

Reducing assumes that the collection has at least one element. To avoid an exception for potentially empty collections, use reduceLeftOption and reduceRightOption.

Often, it is useful to start the computation with an initial element other than the initial element of a collection. The call coll.foldLeft(init)(op) computes

          .
          .
         .
        op
       /  
      op   coll(2)
     /  
    op   coll(1)
   /  
init   coll(0)

For example,

List(1, 7, 2, 9).foldLeft(0)(_ - _)

is

0 - 1 - 7 - 2 - 9 = -19

Images Note

The initial value and the operator are separate “curried” parameters so that Scala can use the type of the initial value for type inference in the operator. For example, in List(1, 7, 2, 9).foldLeft("")(_ + _), the initial value is a string, so the operator must be a function (String, Int) => String.

There is a foldRight variant as well, computing

     .
       .
        .
        op
       /  
coll(n-3)  op
          /  
   coll(n-2)  op
             /  
      coll(n-1)  init

These examples don’t seem to be very useful. Of course, coll.reduceLeft(_ + _) or coll.foldLeft(0)(_ + _) computes the sum, but you can get that directly with coll.sum.

Folding is sometimes attractive as a replacement for a loop. Suppose, for example, we want to count the frequencies of the letters in a string. One way is to visit each letter and update a mutable map.

val freq = scala.collection.mutable.Map[Char, Int]()
for c <- "Mississippi" freq(c) = freq.getOrElse(c, 0) + 1
// Now freq is Map('i' -> 4, 'M' -> 1, 's' -> 4, 'p' -> 2)

Here is another way of thinking about this process. At each step, combine the frequency map and the newly encountered letter, yielding a new frequency map. That’s a fold:

               .
               .
              .
             op
            /  
           op   's'
          /  
         op   'i'
        /  
empty map  'M'

What is op? The left operand is the partially filled map, and the right operand is the new letter. The result is the augmented map. It becomes the input to the next call to op, and at the end, the result is a map with all counts. The code is

"Mississippi".foldLeft(Map[Char, Int]())((m, c) => m + (c -> (m.getOrElse(c, 0) + 1)))

Note that this is an immutable map. We compute a new map at each step.

Images Note

It is possible to replace any while loop with a fold. Build a data structure that combines all variables updated in the loop, and define an operation that implements one step through the loop. I am not saying that this is always a good idea, but you may find it interesting that loops and mutations can be eliminated in this way.

Finally, the scanLeft and scanRight methods combine folding and mapping. You get a collection of all intermediate results. For example,

(1 to 10).scanLeft(0)(_ + _)

yields all partial sums:

Vector(0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55)

13.10 Zipping

The methods of the preceding section apply an operation to adjacent elements in the same collection. Sometimes, you have two collections, and you want to combine corresponding elements. For example, suppose you have a list of product prices and corresponding quantities:

val prices = List(5.0, 20.0, 9.95)
val quantities = List(10, 2, 1)

The zip method lets you combine them into a list of pairs. For example,

prices.zip(quantities)

is a List[(Double, Int)]:

List[(Double, Int)] = List((5.0, 10), (20.0, 2), (9.95, 1))

The method is called “zip” because it combines the two collections like the teeth of a zipper.

Now it is easy to apply a function to each pair.

prices.zip(quantities).map(_ * _)

It is perhaps surprising that you can provide a function of type (Double, Int) => Double when the collection contains (Double, Int) pairs. Exercise 7 on page 206 explores how this “tupling” works.

The result is a list of prices:

List(50.0, 40.0, 9.95)

The total price of all items is then

prices.zip(quantities).map(_ * _).sum

If one collection is shorter than the other, the result has as many pairs as the shorter collection. For example,

List(5.0, 20.0, 9.95).zip(List(10, 2))

is

List((5.0, 10), (20.0, 2))

The zipAll method lets you specify defaults for the shorter list:

List(5.0, 20.0, 9.95).zipAll(List(10, 2), 0.0, 1)

is

List((5.0, 10), (20.0, 2), (9.95, 1))

The zipWithIndex method returns a list of pairs where the second component is the index of each element. For example,

"Scala".zipWithIndex

is

Vector(('S', 0), ('c', 1), ('a', 2), ('l', 3), ('a', 4))

This can be useful if you want to compute the index of an element with a certain property. For example,

"Scala".zipWithIndex.max

is ('l', 3), giving you both the maximum and the position where it occurs.

13.11 Iterators

You can obtain an iterator from a collection with the iterator method. This isn’t as common as in Java or C++ because you can usually get what you need more easily with one of the methods from the preceding sections.

However, iterators are useful for collections that are expensive to construct fully. For example, Source.fromFile yields an iterator because it might not be efficient to read an entire file into memory. There are a few Iterable methods that yield an iterator, such as permutations, grouped, or sliding.

When you have an iterator, you can iterate over the elements with the next and hasNext methods and stop when you have seen enough:

val iter = ... // some Iterator
var done = false
while !done && iter.hasNext do
  done = process(iter.next())

Be aware that the iterator operations move the iterator. There is no way to reset it to the start of the iteration.

If the stopping condition is simple, you can avoid the loop. The Iterator class has many methods that work identically to the methods on collections. In particular, all Iterable methods listed in Section 13.7, “Common Methods,” on page 193 are available, except for head, headOption, last, lastOption, tail, init, takeRight, and dropRight. After calling a method such as map, filter, count, sum, or even length, the iterator is at the end of the collection, and you can’t use it again. With other methods, such as find or take, the iterator is past the found element or the taken ones.

val result = iter.take(500).toList
val result2 = iter.takeWhile(isNice).toList
// isNice is some function returning Boolean.

Sometimes, you want to be able to look at the next element before deciding whether to consume it. In that case, use the buffered method to turn an Iterator into a BufferedIterator. The head method yields the next element without advancing the iterator.

val iter = scala.io.Source.fromFile(filename).buffered
while iter.hasNext && iter.head == '#' do
  while iter.hasNext && iter.head != '
' do
    iter.next
// Now iter points to the first line not starting with #

Images Tip

If you find it too tedious to work with an iterator, and it doesn’t produce a huge number of elements, just dump the elements into a collection by calling toSeq, toArray, toBuffer, toSet, or toMap to copy the values into a collection.

13.12 Lazy Lists (A3)

In the preceding section, you saw that an iterator is a “lazy” alternative to a collection. You get the elements as you need them. If you don’t need any more elements, you don’t pay for the expense of computing the remaining ones.

However, iterators are fragile. Each call to next mutates the iterator. Lazy lists offer an immutable alternative. A lazy list is an immutable list in which the elements are computed lazily—that is, only when you ask for them.

Here is a typical example:

def numsFrom(n: BigInt): LazyList[BigInt] = { log(n) ; n } #:: numsFrom(n + 1)

The #:: operator is like the :: operator for lists, but it constructs a lazy list.

You couldn’t do this with a regular list. You would get a stack overflow when the numsFrom function calls itself recursively. With a lazy list, the expressions to the left and right of #:: are only executed when needed.

When you call

val tenOrMore = numsFrom(10)

you get a lazy list that is displayed as

LazyList(<not computed>)

The elements are unevaluated, and the log shows nothing. If you call

tenOrMore.tail.tail.tail.head

the result is 13 and the log shows

10 11 12 13

The lazy list caches the evaluated head values. If you call

tenOrMore.tail.tail.tail.head

one more time, you get the result without another call to log.

When you apply a method to a lazy list that yields another list, such as map or filter, the result is also lazy. For example,

val squares = numsFrom(1).map(x => x * x)

yields

LazyList(<not computed>)

You have to call squares.head to force evaluation of the first entry.

If you want to get more than one answer, you can invoke take followed by force, which forces evaluation of all values. For example,

squares.take(5).force

produces LazyList(1, 4, 9, 16, 25).

Of course, you don’t want to call

squares.force // No!

That call would attempt to evaluate all members of an infinite list, eventually causing an OutOfMemoryError.

You can construct a lazy list from an iterator. For example, the Source.getLines method returns an Iterator[String]. With that iterator, you can only visit the lines once. A lazy list caches the visited lines so you can revisit them:

val words = Source.fromFile("/usr/share/dict/words").getLines.to(LazyList)

You just saw that methods such as map and filter are computed on demand when applied to a lazy list. You can get a similar effect with other collections by applying the view method. For example,

val palindromicSquares = (1 to 1000000).view
  .map(x => x * x)
  .filter(x => x.toString == x.toString.reverse)

yields a collection that is unevaluated. When you call

palindromicSquares.take(10).mkString(",")

then enough squares are generated until ten palindromes have been found, and then the computation stops.

Unlike lazy lists, views do not cache any values. If you call palindromicSquares.take(10).mkString(",") again, the computation starts over.

13.13 Interoperability with Java Collections

At times you may need to use a Java collection, and you will likely miss the rich set of methods that you get with Scala collections. Conversely, you may want to build up a Scala collection and then pass it to Java code. The CollectionConverters object provides a set of conversions between Scala and Java collections.

For example,

import scala.jdk.CollectionConverters.*

Table 13–4 shows the conversions from Scala to Java collections.

Table 13–4 Conversions between Scala Collections and Java Collections

Conversion function

Type in scala.collection

Type (generally in java.util)

asJava/asScala

Iterable

java.lang.Iterable

asJavaCollection/asScala

Iterable

Collection

asJava/asScala

Iterator

Iterator

asJavaEnumeration/asScala

Iterator

Enumeration

asJava/asScala

mutable.Buffer

List

asJava

Seq, mutable.Seq

List

asJava/asScala

mutable.Set

Set

asJava

Set

Set

asJava/asScala

mutable.Map

Map

asJava

Map

Map

asJavaDictionary/asScala

Map

Dictionary

asJava/asScala

concurrent.Map

concurrent.ConcurrentMap

Note that the conversions yield wrappers that let you use the target interface to access the original type. For example, if you use

val props = System.getProperties.asScala

then props is a wrapper whose methods call the methods of the underlying Java object. If you call

props("com.horstmann.scala") = "impatient"

then the wrapper calls put("com.horstmann.scala", "impatient") on the underlying Properties object.

To convert from a Scala collection to a Java stream (sequential or parallel), start with the statement

import scala.jdk.StreamConverters.*

Then use one of the following methods:

  • asJavaSeqStream, asJavaParStream for sequences, maps, or strings

  • asJavaParKeyStream, asJavaParValueStream for maps

  • asJavaSeqCodePointStream, asJavaParCodePointStream for code points of strings

  • asJavaSeqCharStream, asJavaParCharStream for UTF-16 code units of strings

If the element type is a primitive type, the resulting Java stream is a DoubleStream, IntStream, or LongStream.

To convert from a Java stream to a Scala collection, use the toScala method and pass the collection type.

val lineBuffer = Files.lines(Path.of(filename)).toScala(Buffer)

Exercises

1. Write a function that, given a string, produces a map of the indexes of all characters. For example, indexes("Mississippi") should return a map associating 'M' with the set {0}, 'i' with the set {1, 4, 7, 10}, and so on. Use a mutable map of characters to mutable sets. How can you ensure that the set is sorted?

2. Repeat the preceding exercise, using an immutable map of characters to lists.

3. Write a function that removes every second element from a ListBuffer. Try it two ways. Call remove(i) for all even i starting at the end of the list. Copy every second element to a new list. Compare the performance.

4. Write a function that receives a collection of strings and a map from strings to integers. Return a collection of integers that are values of the map corresponding to one of the strings in the collection. For example, given Array("Tom", "Fred", "Harry") and Map("Tom" -> 3, "Dick" -> 4, "Harry" -> 5), return Array(3, 5). Hint: Use flatMap to combine the Option values returned by get.

5. Implement a function that works just like mkString, using reduceLeft.

6. Given a list of integers lst, what is lst.foldRight(List[Int]())(_ :: _)? lst.foldLeft(List[Int]())(_ :+ _)? How can you modify one of them to reverse the list?

7. In Section 13.10, “Zipping,” on page 200, it is not obvious why the expression prices.zip(quantities).map(_ * _) works. Shouldn’t that be prices.zip(quantities).map(t => t(0) * t(1))? After all, the parameter of map is a function that takes a single parameter. Verify this by passing a function

((Double, Int)) => Double

The Scala compiler converts the expression _ * _ to a function with a single tuple parameter. Explore in which contexts this happens. Does it work for function literals with explicit parameters (x: Double, y: Int) => x * y? Can you assign a literal to a variable of type ((Double, Int)) => Double?

Conversely, what happens when you pass a variable f of type (Double, Int) => Double, that is, prices.zip(quantities).map(f)? How can you fix that? (Hint: tupled.) Does tupled work with function literals?

8. Write a function that turns an array of Double values into a two-dimensional array. Provide a parameter for the number of columns. For example, with Array(1, 2, 3, 4, 5, 6) and three columns, return Array(Array(1, 2, 3), Array(4, 5, 6)). Use the grouped method.

9. The Scala compiler transforms a for/yield expression

for i <- 1 to 10; j <- 1 to i yield i * j

to invocations of flatMap and map, like this:

(1 to 10).flatMap(i => (1 to i).map(j => i * j))

Explain the use of flatMap. Hint: What is (1 to i).map(j => i * j) when i is 1, 2, 3?

What happens when there are three generators in the for/yield expression?

10. Write a function that computes the sum of the non-None values in a List[Option[Int]]. Hint: flatten.

11. The method java.util.TimeZone.getAvailableIDs yields time zones such as Africa/Cairo and Asia/Chungking. Which continent has the most time zones? Hint: groupBy.

12. Produce a lazy list of random numbers. Hint: continually.

13. A fixed point of a function f is an argument x such that f(x) == x. Sometimes, it is possible to find fixed points by starting with a value a and then computing f(a), f(f(a)), f(f(f(a))), and so on, until the sequence converges to a fixed point x. I happen to know this from personal experience. Bored in math class, I kept hitting the cosine key of my calculator in radian mode, and pretty soon got a value for which cos(x) == x.

Produce a lazy list of iterated function applications and search for identical consecutive values. Hint: sliding(2).

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.224.58.122