Chapter 3. Collections and the Joy of Immutability

In this chapter, we're going to explore Scala's collections classes and how to use them. Most Scala collection classes are immutable, meaning that once they are instantiated, the instances cannot be changed. You're used to immutability, as Java Strings are immutable. The conjunction of collections being immutable and providing powerful iteration features leads to more concise, higher-performance code that does extremely well in multicore, multithreaded concurrency situations.

Thinking Immutably

In Java, the most commonly used types are immutable. Once an instance is created, it cannot be changed. In Java, String, int, long, double, and boolean are all immutable data types. Of these, String is a subclass of Object. Once a String is created, it cannot be changed. This has lots of benefits. You don't have to synchronize access to a String, even if it is shared by many threads, because there's no chance that it will be modified while another thread is accessing it. You don't have to keep a private copy of a String in case another method modifies it out from under you. When you pass String and other immutable types around in a Java program, you don't have to be defensive about using the instance. You can store it without fear that another method or thread will toLowerCase it.

Using immutable data structures means less defensive programming, fewer defects, and, in most cases, better performance. So, you ask, why doesn't Java have a lot more immutable data structures?

There are two ends of the programming spectrum: the how end and the what end. Assembly language is at the far end of the how part of the spectrum. When you program in assembly language, you direct the CPU to move bytes around memory, perform arithmetic operations and tests, and change the program counter. These directions—imperatives if you will—direct the computer's operation (in other words, we tell it how to do its tasks). C is a thin layer on top of assembly language and continues to be a language oriented toward directing the steps that the CPU will take.

Spreadsheets are far at the what end of the spectrum. Spreadsheets contain formulas that define the relationship between cells (so we tell the computer what we want to do). The order of evaluating the cells, the cells that are calculated based on changes in other cells, and so on, are not specified by the user but are inferred by the spreadsheet engine based on the relationships among the cells (the computer does the how part for us). In a C program, one always thinks about changing memory. In a spreadsheet (which is a program; Excel is the most popular programming language in the world), one thinks about altering input (changing nonformula cells) and seeing the output (what is recalculated).

Java evolved from imperative roots and spouted mostly mutable data structures in its standard libraries. The number of mutable classes in java.util.* far outnumber the immutable classes. By default, variables in Java are mutable, and you have to explicitly declare them as final if you want them to be assign-once. Despite that I've written a couple of commercial spreadsheets,[13] and should have understood the benefits of the functional "what" approach, until I spent a lot of time with Scala, I did not think about immutable data structures. I thought about flow of control. It took over a year of practicing immutability and avoiding flow-of-control imperatives in my code before I really grokked immutability and what-oriented coding. So, why doesn't Java have more immutable data structures? Because it's not obvious that Java needs them until you code with them for a while, and very few Java developers I know spent a lot of time with Lisp, ML, or Haskell. But immutability is better.

With a good garbage collector like the one in the JVM, immutable data structures tend to perform better than mutable data structures. For example, Scala's List, which is an immutable linked list, tends to perform better than Java's ArrayList using real-world data. This advantage exists for a couple of reasons. ArrayList pre-allocates an internal array of 10 slots to put items in. If you store only two or three items in the ArrayList, seven or eight slots are wasted. If you exceed the default 10 slots, there's an O(n) copy operation to move the references from the old array to the new array. Contrast this with Scala's List, which is a linked list. Adding elements is a constant-time, O(1), operation. The only memory consumed for storing items is the number of items being stored. If you have hundreds of items or if you're going to do random access on the collection, an Array is a better way to store data. But, most real-world applications are moving two to five items around in a collection and accessing them in order. In this case, a linked list is better.

Immutable data structures are part of the formula for more stable applications. As you start thinking about immutable data structures, you also start reducing the amount of state that is floating around in your application. There will be less and less global state. There will be fewer things that can be changed or mutated. Your methods will rely less and less on setting global state or changing the state of parameters, and your methods will become more and more transformative. In other words, your methods will transform the input values to output values without referring to or modifying external state. These methods are much easier to test using automated test tools such as ScalaCheck. Additionally, they fail less frequently.

One common failure mode for mutable state programs is that a new team member changes program state in an unpredictable way. There may be some setters that create state for an object. It's implied (and probably quite logically) that once the object is handed off to the "do work" method and goes beyond a certain barrier, its setters are not to be called. But along comes a developer who doesn't know about the implied barrier and uses a setter that causes some program logic to fail.

At this point, you may be resisting and saying, "Ten million Java, C++, C, and C# developers can't be wrong." I thought that way when I first started coding in Scala. But I set some goals for myself to learn and understand immutability. Over time, I came to appreciate that many of the defects that I was used to dealing with in Javaland and Rubyland went away as I used more and more immutable data structures and worked to isolate the state in my application from the logic of my application.

Scala List, Tuple, Option, and Map Classes

Scala has a wide variety of collections classes. Collections are containers of things. Those containers can be sequenced, linear sets of items (e.g., List):

scala> val x = List(1,2,3,4)
x: List[Int] = List(1, 2, 3, 4)
scala> x.filter(a => a % 2 == 0)
res14: List[Int] = List(2, 4)
scala> x
res15: List[Int] = List(1, 2, 3, 4)

They may be indexed items where the index is a zero-based Int (e.g., Array) or any other type (e.g., Map).

scala> val a = Array(1,2,3)
a: Array[Int] = Array(1, 2, 3)
scala> a(1)
res16: Int = 2
scala> val m = Map("one" -> 1, "two" -> 2, "three" -> 3)
m: ... Map[java.lang.String,Int] = Map(one -> 1, two -> 2, three -> 3)
scala> m("two")
res17: Int = 2

The collections may have an arbitrary number of elements or be bounded to zero or one element (e.g., Option ). Collections may be strict or lazy.

Lazy collections have elements that may not consume memory until they are accessed (e.g., Range). Let's create a Range:

scala> 0 to 10
res0: Range.Inclusive = Range(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

The nifty thing about Ranges is that the actual elements in the Range are not instantiated until they are accessed. So we can create a Range for all positive Integers but take only the first five elements. This code runs without consuming many gigabytes of RAM because only the elements that are needed are created.

scala> (1 to Integer.MAX_VALUE - 1).take(5)
res18: RandomAccessSeq[Int] = RandomAccessSeq(1, 2, 3, 4, 5)

Collections may be mutable (the contents of the reference can change) or immutable (the thing that a reference refers to is never changed). Note that immutable collections may contain mutable items.

In this chapter, we'll be focusing on List, Option, and Map. These immutable data structures form the backbone of most of the programs I write.

List[T]

Scala's List[T] is a linked list of type T. That means it's a sequential list of any type, including Java's primitives (Int, Float, Double, Boolean, Char) because Scala takes care of boxing (turning primitives into objects) for you. Internally, List is made up of a "cons" cell (the scala.:: class [yes, that's two colons]) with a tail that refers to another cons cell or the Nil object. It's easy to create a List:

scala> 1 :: 2 :: 3 :: Nil
res20: List[Int] = List(1, 2, 3)

The previous code creates three cons cells, each with an Int in it. Anything that looks like an operator with a : (colon) as the first character is evaluated right to left. Thus, the previous code is evaluated just like the following:

scala> new ::(1, new ::(2, new ::(3, Nil)))
res21: ::[Int] = List(1, 2, 3)

:: takes a "head" which is a single element and a "tail" which is another List. The expression on the left of the :: is the head, and the expression on the right is the tail. To create a List using ::, we must always put a List on the right side. That means that the right-most element has to be a List, and in this case, we're using an empty List, Nil.

We can also create a List using the List object's apply method (which is defined as def apply[T](param: T*): List[T], which translates to "the apply method of type T takes zero or more parameters of type T and returns a List of type T"):

scala> List(1,2,3)
res22: List[Int] = List(1, 2, 3)

The type inferencer is pretty good at figuring out the type of the List, but sometimes you need to help it along:

scala> List(1, 44.5, 8d)
res27: List[AnyVal] = List(1, 44.5, 8.0)
scala> List[Number](1, 44.5, 8d)
res28: List[java.lang.Number] = List(1, 44.5, 8.0)

If you want to prepend an item to the head of the List, you can use ::, which actually creates a new cons cell with the old list as the tail:

scala> val x = List(1,2,3)
scala> 99 :: x
res0: List[Int] = List(99, 1, 2, 3)

Note that the list referred to by the variable x is unchanged, but a new List is created with a new head and the old tail. This is a very fast, constant-time, O(1), operation.

You can also merge two lists to form a new List. This operation is O(n) where n is the number of elements in the first List:

scala> val x = List(1,2,3)
scala> val y = List(99, 98, 97)
scala> x ::: y
res3: List[Int] = List(1, 2, 3, 99, 98, 97)

Getting Functional

The power of List and other collections in Scala come when you mix functions with the collection operators. Let's say we want to find all the odd numbers in a List. It's easy:

scala> List(1,2,3).filter(x => x % 2 == 1)
res4: List[Int] = List(1, 3)

The filter method iterates over the collection and applies the function, in this case, an anonymous function, to each of the elements. If the function returns true, the element is included in the resulting collection. If the function returns false, the element is not included in the resulting collection. The resulting collection is the same type of collection that filter was invoked on. If you invoke filter on a List[Int], you get a List[Int]. If you invoke filter on an Array[String], you get an Array[String] back. In this case, we've written a function that performs mod 2 on the parameter and tests to see whether the result is 1, which indicates that the parameter is odd. There's a corresponding remove method, which removes elements that match the test function:

scala> List(1,2,3).remove(x => x % 2 == 1)
res5: List[Int] = List(2)

We can also write a method called isOdd and pass the isOdd method as a parameter (Scala will promote the method to a function):

scala> def isOdd(x: Int) = x % 2 == 1
isOdd: (Int)Boolean
scala> List(1,2,3,4,5).filter(isOdd)
res6: List[Int] = List(1, 3, 5)

filter works with any collections that contain any type. For example:

scala> "99 Red Balloons".toList.filter(Character.isDigit)
res9: List[Char] = List(9, 9)

In this case, we're converting a String to a List[Char] using the toList method and filtering the numbers. The Scala compiler promotes the isDigit static method on Character to a function, thus demonstrating interoperability with Java and that Scala methods are not magic.

Another useful method for picking the right elements out of a List is takeWhile, which returns all the elements until it encounters an element that causes the function to return false. For example, let's get all the characters up to the first space in a String:

scala> "Elwood eats mice".takeWhile(c => c != ' ')
res12: Seq[Char] = ArrayBuffer(E, l, w, o, o, d)

Contrast with Java

I grew up writing machine code and later assembly language. When I wrote this code, I was telling the machine exactly what to do: load this register with this value, test the value, branch if some condition was met, and so on. I directed the steps that the CPU took in order to perform my task. Contrast this with writing formula functions in Excel. In Excel, we describe how to solve some problem using the formula functions and cell addresses, and it's up to Excel to determine what cells need to be recalculated and the order for the recalculation.

Directing the machine's steps is termed imperative coding. Imperative coding describes, as we saw earlier, the "how." Writing functions describing the "what"—the goal to be achieved—and allowing the computer to figure out the "how" is termed functional programming. Scala allows you to express code in a way that's further toward the functional end of the coding spectrum. Let's do a little exploring of the differences between the two.

Let's compare filter in Scala to the Java implementation shown in Listing 3-1.

Example 3.1. Java Implementation of Odd Filtering

int[] x = {1,2,3,4,5};
ArrayList<Integer> res = new ArrayList<Integer>();
for (int v : x) {
  if (v % 2 == 1) res.add(new Integer(v));
}

In this code, the logic gets lost in the boilerplate. There are two pieces of logic that we are concerned with: what operation is being performed and the formula for that logic. When you first write the previous code, you know what the intent is. If you didn't comment your code with // filtering the odd array elements, you, or someone who picks up the code in a year or two, will have to puzzle about the meaning of the loop. The filter logic is buried in the middle of the code. Compare the Java code to the line of Scala code: filter(v => v % 2 == 1), where the very essence of the operation appears in the code and nothing else.

As the complexity of your code increases and the time between writing and maintaining a particular module increases, removing boilerplate while maintaining visible business logic makes code easier to maintain and decreases defects. Let's look at the takeWhile Java translation in Listing 3-2.

Example 3.2. Java Implementation of Take Until First Space

char[] x = "Elwood Eats Mice".toCharArray();
ArrayList<Character> res = new ArrayList<Character>();
for (char c : x) {
  if (c != ' ') res.add(new Character(c));
  else break;
}

Once again, one line in Scala expresses what takes many lines in Java. This example also demonstrates the mental shift that is common in imperative code. In one line in the loop, we mutate or change the res variable. In the next line, in the else, we have the break flow of control statement. Your brain has to think about two distinct concepts: "what variable is being mutated" and "what's the next statement the program is going to execute" all rolled into two lines.

When I was programming in Java, I never gave constructs like this a second thought. After programming in Ruby and Scala, I find that thinking about "what's happening to my data" and "what's the flow of control in my program" at the same time is very challenging. My brain has morphed into thinking about "what is business logic for transforming input to output?" I find this focuses me on the business logic task at hand. As we continue to explore transformations, let's look at other ways to transform a List.

Transformation

The map method on List (and Seq), transforms each element of a collection based on a function. For example, if we have a List[String] and want to convert it to all lowercase:

scala> List("A", "Cat").map(s => s.toLowerCase)
res29: List[java.lang.String] = List(a, cat)

We can shorten the function so the code reads:

scala> List("A", "Cat").map(_.toLowerCase)
res30: List[java.lang.String] = List(a, cat)

The number of elements in the returned collection is the same as the number of elements in the original collection, but the types may be different. If the function passed into map returns a different type, then the resulting collection is a collection of the type returned from the function. For example, we can take a List[String] and calculate the length of each String, which will result in a List[Int]:

scala> List("A", "Cat").map(_.length)
res31: List[Int] = List(1, 3)

map provides a very powerful and uniform way to transform data from one type to another. We can transform our Strings to lowercase, to a List of their length, and we can extract data from a collection of complex objects. For example, if we have a database query that returns records of type Person defined as having a first method that returns a String containing the person's first name, we can create a List of the first names of the people in the List:

scala> trait Person {def first: String }
defined trait Person
scala> val d = new Person {def first = "David" }
scala> val e = new Person {def first = "Elwood"}
scala> val a = new Person {def first = "Archer"}
scala> List(a, d, e).map(_.first)
res35: List[String] = List(Archer, David, Elwood)

Or, if we're writing a web app, we can create an <li> (an HTML list element) containing the first name of each Person in our List:

scala> List(a,d,e).map(n => <li>{n.first}</li>)
List(<li>Archer</li>, <li>David</li>, <li>Elwood</li>)

List also has a sort method, which takes a function that compares the two parameters. We can sort a List[Int]:

scala> List(99, 2, 1, 45).sort(_ < _)
res47: List[Int] = List(1, 2, 45, 99)

We can sort a List[String]:

scala> List("b", "a", "elwood", "archer").sort(_ < _)
res48: List[java.lang.String] = List(a, archer, b, elwood)

And we can sort longest to shortest:

scala> List("b", "a", "elwood", "archer").
       sort(_.length > _.length)
res49: List(archer, elwood, a, b)

We can combine the operations. Let's update our Person trait:

trait Person {
      def age: Int
      def first: String
      def valid: Boolean
}

Now we can write the code shown in Listing 3-3 to find all the valid Person records, sort by age, and return the first names.

Example 3.3. First Name of Valid Persons, Sorted by Age

def validByAge(in: List[Person]) =
  in.filter(_.valid).
  sort(_.age < _.age).
  map(_.first)

Transformation vs. Mutation

While sometimes you may do complex logic like this in the database, other times you may have a collection of records in memory and you need to perform different transformations on those records. Let's, once again, compare the Scala code in Listing 3-3 to the Java implementation in Listing 3-4.

Example 3.4. Java Implementation of First Name of Valid Persons, Sorted by Age

public ArrayList<String> validByAge(ArrayList<Person> in) {
  ArrayList<Person> valid = new ArrayList<Person>();
  for (Person p: in) {
    if (p.valid()) valid.add(p);
  }

  Person[] people = valid.toArray(new Person[0]);

 Arrays.sort(people, new Comparator<Person>() {
    public int compare(Person a, Person b) {
      return a.age() - b.age();
    } } );

  ArrayList<String> ret = new ArrayList<String>();

  for (Person p: people) {
    ret.add(p.first());
  }

  return ret;
}

Our code expanded from 4 lines to 16 lines (not including blank lines). While we can still discern the intent of the Java code, it requires mentally filtering out the boilerplate loops and looking inside them. You have to look past the how in order to understand the what. On the other hand, even the Scala beginner should be able to read the lines in the Scala code to understand the meaning. We even see a hint of functional programming in the Java code with the construction of Comparator, which consumes four lines of code rather than Scala's one line of code.

Interestingly, under the covers, Scala is creating anonymous inner classes for each of the functions and passing them to filter, map, and sort. Just as it makes sense in Java not to have lots of sorting routines floating around code, so there's a set of sort methods (and other helper methods) on Arrays, Scala puts the looping constructs into map, filter, and so on as well so that your code contains more logic and less boilerplate.

Reduxio

Scala has other abstractions for common collections operations. reduceLeft allows you to perform an operation on adjacent elements of the collection where the result of the first operation is fed into the next operation. For example, if we want to find the biggest number in a List[Int]:

scala> List(8, 6, 22, 2).reduceLeft(_ max _)
res50: Int = 22

In this case, reduceLeft takes 8 and 6 and feeds them into our function, which returns the maximum value of the two numbers: 8. Next, reduceLeft feeds 8 (the output of the last iteration) and 22 into the function, resulting in 22. Next, reduceLeft feeds 22 and 2 into the function, resulting in 22. Because there are no more elements, reduceLeft returns 22.

We can use reduceLeft to find the longest word:

scala> List("moose", "cow", "A", "Cat").
       reduceLeft((a, b) => if (a.length > b.length) a else b)
res41: java.lang.String = moose

Because Scala's if expression works like Java's ternary operator, the if in the previous code returns a if it's longer than b. We can also find the shortest word:

scala> List("moose", "cow", "A", "Cat").
       reduceLeft((a, b) => if (a.length < b.length) a else b)
res42: java.lang.String = A

reduceLeft will throw an exception on an Nil (empty) List. This is correct behavior as there is no way to apply the function on the members of the List as a Nil List has no elements.

foldLeft is similar to reduceLeft, but it starts with a seed value. The return type of the function and the return type of foldLeft must be the same type as the seed. The first example is summing up List[Int]:

scala> List(1,2,3,4).foldLeft(0) (_ + _)
res43: Int = 10

In this case, the seed value is 0. Its type is Int. foldLeft feeds the seed and the first element of the List, 1, into the function, which returns 1. Next, foldLeft feeds 1 (the result of the previous iteration) and 2 (the next element) into the function, resulting in 3. The process continues, and the sum of the List[Int] is generated: 10. We can generate the product of the List the same way:

scala> List(1,2,3,4).foldLeft(1) (_ * _)
res44: Int = 24

But because the return type of foldLeft is the type of the seed, not the type of the List, we can figure out the total length of a List[String]:

scala> List("b", "a", "elwood", "archer").foldLeft(0)(_ + _.length)
res51: Int = 14

I find that sometimes I need to work with more than one collection at a time. For example, if we want to generate the List of products of the numbers from 1 to 3:

scala> val n = (1 to 3).toList
n: List[Int] = List(1, 2, 3)
scala> n.map(i => n.map(j => i * j))
res53: List[List[Int]] = List(List(1, 2, 3), List(2, 4, 6), List(3, 6, 9))

We have nested map invocations, and that results in a List[List[Int]]. In some cases, this may be what we want. In other cases, we want the results in a single List[Int]. In order to nest the map operations but flatten the results of nested operations, we use the flatMap method:

scala> n.flatMap(i => n.map(j => i * j))
res58: List[Int] = List(1, 2, 3, 2, 4, 6, 3, 6, 9)

Look Ma, No Loops

So far, we've written a bunch of code that manipulates collections without explicit looping. By passing functions, that is, logic, to methods that control the looping, we let the library writers define the looping, and we define the logic in our app. However, syntactically, nested map, flatMap, and filter can get ugly. For example, if we want to find the product of the odd numbers from 1 to 10 times the even numbers from 1 to 10, we could write the following:

scala> def isOdd(in: Int) = in % 2 == 1
scala> def isEven(in: Int) = !isOdd(in)
scala> val n = (1 to 10).toList
scala> n.filter(isEven).flatMap(i => n.filter(isOdd).map(j => i * j))
res60: List[Int] = List(2, 6, 10, 14, 18, ... 10, 30, 50, 70, 90)

Scala provides the for comprehension, which provides syntactically pleasing nesting of map, flatMap, and filter. We can convert the nested statements from the previous example into a syntactically pleasing statement:

scala> for {i <- n if isEven(i); j <- n if isOdd(j)} yield i * j
res59: List[Int] = List(2, 6, 10, 14, 18, ... 10, 30, 50, 70, 90)

The for comprehension is not a looping construct but is a syntactic construct that the compiler reduces to map, flatMap, and filter. In fact, the two lines

n.filter(isEven).flatMap(i => n.filter(isOdd).map(j => i * j))

and

for {i <- n if isEven(i); j <- n if isOdd(j)} yield i * j

result in the same bytecode. The for comprehension can be used with any class, including user-generated classes, that implement map, flatMap, filter, and foreach. This means you can create your own classes that work with the for comprehension.

Lists also work well with Scala's pattern matching and recursive programming. We'll be exploring pattern matching in depth in Chapter 5. For this example, pattern matching is a lot like Java's switch statement, but it can be used to compare things that are more complex than Ints, and Scala's pattern matching allows you to match some elements and extract, or capture, others into variables.

The pattern-matching syntax is the same as List construction syntax. For example, if we are matching against List[Int], case 1 :: Nil => will match List(1). case 1 :: 2 :: Nil => will match List(1,2). case 1 :: rest => will match any List that starts with 1 and will put the tail of the List into the variable rest.

Our example will convert a List[Char] of Roman numerals to their Arabic numeral equivalent. The code matches the List to a series of patterns. Based on the matched pattern, a value is returned. The patterns are matched in order of appearance. However, the compiler may optimize the patterns by eliminating duplicate tests.[14] The code to convert from Roman numerals to Int is in Listing 3-5.

Example 3.5. Roman Numerals

def roman(in: List[Char]): Int = in match {
    case 'I' :: 'V' :: rest => 4 + roman(rest)
    case 'I' :: 'X' :: rest => 9 + roman(rest)
    case 'I' :: rest => 1 + roman(rest)
    case 'V' :: rest => 5 + roman(rest)
    case 'X' :: 'L' :: rest => 40 + roman(rest)
    case 'X' :: 'C' :: rest => 90 + roman(rest)
    case 'X' :: rest => 10 + roman(rest)
    case 'L' :: rest => 50 + roman(rest)
    case 'C' :: 'D' :: rest => 400 + roman(rest)
    case 'C' :: 'M' :: rest => 900 + roman(rest)
    case 'C' :: rest => 100 + roman(rest)
    case 'D' :: rest => 500 + roman(rest)
    case 'M' :: rest => 1000 + roman(rest)
    case _ => 0
}

case 'I' :: 'V' :: rest => 4 + roman(rest) tests the first two characters, and if they are IV, the method returns 4 plus the Roman numeral conversion of the rest of the List[Char]. If the test falls through to case _ => 0, there are no more Roman numerals, 0 is returned, and there's no more recursion—no more calls back into the roman() method. Without explicit looping or length testing or explicit branching logic, we've written a concise, readable method.

Scala's List and other sequential collections provide powerful ways to define business logic in a concise, maintainable way. In the next section, we're going to explore Tuples, which are fixed-length collections where each element can be a different type.

Tuples

Have you ever written a method that returns two or three values? Let's write a method that takes a List[Double] and returns the count, the sum, and the sum of squares returned in a three-element Tuple, a Tuple3[Int, Double, Double]:

def sumSq(in: List[Double]): (Int, Double, Double) =
    in.foldLeft((0, 0d, 0d))((t, v) =>  (t._1 + 1, t._2 + v, t._3 + v * v))

The sumSq method takes a List[Double] as input and returns a Tuple3[Int, Double, Double]. The compiler desugars (Int, Double, Double) into Tuple3[Int, Double, Double]. The compiler will treat a collection of elements in parenthesis as a Tuple. We seed the foldLeft with (0, 0d, 0d), which the compiler translates to a Tuple3[Int, Double, Double]. The function takes two parameters: t and v. t is a Tuple3, and v is a Double. The function returns a new Tuple3 by adding 1 to the first element of the Tuple, adding v to the second element of the Tuple, and adding the square of v to the third element of the Tuple. Using Scala's pattern matching, we can make the code a little more readable:

def sumSq(in: List[Double]) : (Int, Double, Double) =
    in.foldLeft((0, 0d, 0d)){
    case ((cnt, sum, sq), v) =>  (cnt + 1, sum + v, sq + v * v)}

You can create Tuples using a variety of syntax:

scala> Tuple2(1,2) == Pair(1,2)
scala> Pair(1,2) == (1,2)
scala> (1,2) == 1 -> 2

The last example, 1 -> 2, is a particularly helpful and syntactically pleasing way for passing pairs around. Pairs appear in code very frequently, including name/value pairs for creating Maps.

Map[K, V]

A Map is a collection of key/value pairs. Any value can be retrieved based on its key. Keys are unique in the Map, but values need not be unique. In Java, Hashtable and HashMap are common Map classes. The default Scala Map class is immutable. This means that you can pass an instance of Map to another thread, and that thread can access the Map without synchronizing. The performance of Scala's immutable Map is indistinguishable from the performance of Java's HashMap.

We can create a Map:

scala> var p = Map(1 -> "David", 9 -> "Elwood")
p: ... Map[Int,String] = Map(1 -> David, 9 -> Elwood)

We create a new Map by passing a set of Pair[Int, String] to the Map object's apply method. Note that we created a var p rather than a val p. This is because the Map is immutable, so when we alter the contents on the Map, we have to assign the new Map back to p.

We can add an element to the Map:

scala>  p + 8 -> "Archer"
res4: ... Map[Int,String] = Map(1 -> David, 9 -> Elwood, 8 -> Archer)

But we haven't changed the immutable Map:

scala> p
res5: ... Map[Int,String] = Map(1 -> David, 9 -> Elwood)

In order to update p, we have to assign the new Map back to p:

scala> p = p + 8 -> "Archer"

or:

scala> p += 8 -> "Archer"

And we can see that p is updated:

scala> p
res7: Map[Int,String] = Map(1 -> David, 9 -> Elwood, 8 -> Archer)

We can get elements out of the Map:

scala> p(9)
res12: java.lang.String = Elwood

What happens when we ask for an element that doesn't exist?

scala> p(88)
java.util.NoSuchElementException: key not found: 88

This is mighty inconvenient. If you try to get an element that's not in the Map, you get an exception. That's kind of jarring. So far, we haven't seen much in Scala that results in exceptions being thrown, but it makes logical sense. If you request something that doesn't exist, that's an exceptional situation. Java's Map classes handle this situation by returning null, which has two drawbacks. First, you have to null-test the result of every Map access. Second, it means you can't store a null in a Map. Scala has a kinder and gentler mechanism for dealing with this situation. The get() method on Map returns an Option (Some or None) that contains the result:

scala> p.get(88)
res10: Option[java.lang.String] = None
scala> p.get(9)
res11: Option[java.lang.String] = Some(Elwood)

You can return a default value if the key is not found:

scala> p.getOrElse(99, "Nobody")
res55: java.lang.String = Nobody
scala> p.getOrElse(1, "Nobody")
res56: java.lang.String = David

We can also use flatMap with Options to find all the values with keys between 1 and 5:

scala> 1 to 5 flatMap(p.get)
res53: Seq.Projection[java.lang.String] = RangeG(David)

In this case, we create a range of numbers from 1 to 5. We flatMap this collection, passing in a function, p.get. Wait, you say, p.get isn't a function, it's a method, but you didn't include the parameter. Scala is very cool, because if it's expecting a function with parameters of a particular type and you pass a method that takes those parameters, Scala will promote the method with its missing parameters to a function. We'll explore Options in the next subsection.

Let's continue exploring Map. We can remove elements from our Map:

scala> p -= 9
scala> p
res20: Map[Int,String] = Map(1 -> David, 8 -> Archer)

We can test the Map to see whether it contains a particular key:

scala> p.contains(1)
res21: Boolean = true

We can operate on the collection of keys. We get a collection of keys from our Map and use reduceLeft to find the largest key:

scala> p.keys.reduceLeft(_ max _)
res22: Int = 8

And we can use reduceLeft on the collection of values to find the largest String:

scala> p.values.reduceLeft((a, b) => if (a > b) a else b)
res23: java.lang.String = David

We can test whether any of the values contains the letter "z":

scala> p.values.exists(_.contains("z"))
res28: Boolean = false

You can also add a bunch of elements to a Map using the ++ method:

scala> p ++= List(5 -> "Cat", 6 -> "Dog")
p: Map[Int,String] = Map(1 -> David, 8 -> Archer, 5 -> Cat, 6 -> Dog)

And you can remove a bunch of keys with the — method:

scala> p --= List(8, 6)
res40: Map[Int, String] = Map(1 -> David, 5 -> Cat)

Maps are Scala collections and have collection manipulation methods. This means we can use methods including map, filter, and foldLeft. One of the tricky parts of using Java's immutable collections is iterating over the collection and simultaneously removing elements. In my code, I have to create an accumulator for the keys I'm going to remove, loop over the collection, find all the keys to remove, and then iterate over the collection of keys to remove and remove them from the collection. Not only that, but I frequently forget how brittle Hashtable is and inevitably forget this sequence and get some nasty runtime errors. In Scala, it's much easier. But there's a simpler way to remove unwanted elements from a Map:

def removeInvalid(in: Map[Int, Person]) = in.filter(kv => kv._2.valid)

Pretty cool, huh? Map has a filter method that works just like List's filter method. The kv variable is a Pair representing the key/value pair. The filter method tests each key/value pair by calling the function and constructs a new Map that contains only the elements that passed the filter test.

Let's finish up our exploration of some of Scala's immutable data types by examining Option.

Option[T]

Option[T] provides a container for zero or one element of a given type. Option provides a very powerful alternative to Java's null. An Option[T] can be either Some[T] or None. None is an object. There is a single instance of None in your Scala program, so it's kind of like null. But None has methods on it, so you can invoke map, flatMap, filter, foreach, and so on no matter whether you have Some or None.

Let's say we have a method that retrieves a record from the database based on a primary key:

def findPerson(key: Int): Option[Person]

The method will return Some[Person] if the record is found but None if the record is not found. We can then build a method that returns the age from the primary key:

def ageFromKey(key: Int): Option[Int] = findPerson(key).map(_.age)

If the record is found in the database, ageFromKey will return Some[Int], otherwise it will return None. We can cascade mapping/flatMapping of Option without explicitly testing for None. For example, let's say we have a Map that contains parameters passed as part of a web request and a couple of helper methods. Let's make a start implementing this:

scala> import java.lang.{Boolean => JBool}

This imports java.lang.Boolean but renames it locally to JBool so it doesn't conflict with Scala's Boolean class.

scala> def tryo[T](f: => T): Option[T] = try {Some(f)} catch {case _ => None}

We define a method that wraps an operation in a try/catch block. If the operation succeeds, we wrap the result in a Some instance, otherwise return None.

scala> def toInt(s: String): Option[Int] = tryo(s.toInt)
scala> def toBool(s: String) = tryo(JBool.parseBoolean(s))

We define methods that convert String to Int or Boolean. If the String can be converted, Some will be returned, otherwise None will be returned.

With these helpers, we can define our method that converts from the parameters to a Person instance. This is shown in Listing 3-6.

Example 3.6. Convert a Map to a Person

def personFromParams(p: Map[String, String]): Option[Person] =
  for {name <- p.get("name")
       ageStr <- p.get("age")
       age <- toInt(ageStr)
       validStr <- p.get("valid")
       valid <- toBool(validStr)}
   yield new Person(name, age, valid)

In my code, any method that can logically fail (e.g., looking something up in a Map, converting a String to an Int) returns an Option. It is up to the calling code to determine what to do. Let's compare this to the Java implementation shown in Listing 3-7.

Example 3.7. Java Implementation of Convert a Map to a Person

public Person personFromParams(Map<String, String> p) {
  try {
    if (!p.containsKey("name")) return null;
    String name = p.get("name");
    if (!p.containsKey("age")) return null;
    String ageStr = p.get("age");
    int age = Integer.parseInt(ageStr);
    if (!p.containsKey("valid")) return null;
    String validStr = p.get("valid");
    bool valid = Boolean.parseBoolean(validStr);
    return new Person(name, age, valid);
  } catch (Exception e) {
    return null;
  }
}

The Java code has to explicitly test each key. This increases the number of lines of code and reduces the readibility. The Java code has multiple return paths. This makes it more difficult to understand under what conditions the code will continue to flow. At a glance, it's difficult to determine the conditions under which the method will return non-null.

Option has some methods to allow chaining. For example, if the parameter name that maps to "age" used to be called "years", we can express the code as ageStr <- p.get("age") orElse p.get("years"). If p.get("age") returns None, p.get("years") will be called. You can chain orElse just like chaining || in the if conditional. The right side of the orElse expression will be evaluated only if the left side is None.

You can retrieve the contents of an Option with the get method:

scala> Some(3).get
res57: Int = 3

But be careful, because if the Option is None, an exception will be raised:

scala> None.get
java.util.NoSuchElementException: None.get

Like Map, Option has a getOrElse method that returns a default value if the contents are undefined:

scala> Some(3).getOrElse(44)
res59: Int = 3
scala> None.getOrElse(44)
res60: Int = 44

Option is a very useful class for passing or returning values that may or may not be defined. Because Option has map, flatMap, filter, and foreach methods, it can be used in for comprehensions. Option provides a great way to avoid null problems and has methods that work conveniently with other Scala collections.

Wrapping Up List, Tuple, Map, and Option

We've covered a lot of ground in this section. We've seen some of Scala's foundation collection classes and the methods that allow for simple and powerful manipulation of these collections. These classes form the basis for much of my development. They also form the basis for other classes in Scala. For web development, XML is a very important mechanism for data exchange. Scala's immutable collections form the basis for Scala's XML support.

XML Creation and Manipulation

Scala has language-level support for defining XML constants. XML in Scala is immutable, just like Strings in Java. This means you can cache XML without worrying about copying it so it's not modified out from under you. In this section, we'll explore creating XML, parsing XML, and mutating XML. I'm covering XML in this chapter because it's an immutable data structure and because Scala treats XML as a collection on Nodes, so it gives us a chance to explore some more of the cool collections stuff we've been working on.

XML in Your Scala Code

Scala has XML literals built into the language syntax, just like Java has String literals built in. That means you can write

scala> <b>Hello World</b>
res0: scala.xml.Elem = <b>Hello World</b>

You can include attributes:

scala> <b id="greeting">Hello World</b>
res1: scala.xml.Elem = <b id="greeting">Hello World</b>

And the XML can span multiple lines:

scala> <b id="greeting">
       <span>Hello</span> World!
       </b>
res2: scala.xml.Elem =
<b id="greeting">
       <span>Hello</span> World!
       </b>

The XML elements can contain prefixes:

scala> <ns:b>Hello World from a namespace</ns:b>
res0: scala.xml.Elem = <ns:b>Hello World from a namespace</ns:b>

You can even have prefixes on the attributes:

scala> <b ns:hi='hello'>Hello</b>
res12: scala.xml.Elem = <b ns:hi="hello">Hello</b>

You can assign XML to a variable:

scala> val x = <b>Hello World!</b>
x: scala.xml.Elem = <b>Hello World!</b>

Scala represents XML as a Seq[Node], and Node is a superclass of NodeSeq, which is a subclass of Seq[Node]. That means all the collections methods that we've been exploring are available on XML collections including map, flatMap, filter, and foreach. This also means that XML can be used in for comprehensions. We'll explore that in the next subsection.

We can define a len method that takes a Seq of any type and returns the length:

scala> def len(seq: Seq[_]) = seq.length
len: (Seq[_])Int

and call it with an XML literal:

scala> len(<b>Hello</b>)
res1: Int = 1

Or we can call it with the XML variable x we defined previously:

scala> len(x)
res2: Int = 1

Or we can call it with a List[Int]:

scala> len(List(1,2,3))
res11: Int = 3

The ability to dynamically create XML in Scala is far more powerful than the ability to dynamically create Strings. Scala code can be embedded in any attribute or element body to dynamically render XML. For example, let's define a method that returns the current milliseconds as a String:

scala> def now = System.currentTimeMillis.toString
now: java.lang.String

We can call the now method from the attribute definition to add a time attribute:

scala> <b time={now}>Hello World</b>
res3: scala.xml.Elem = <b time="1230678511390">Hello World</b>

Attributes can be defined with Scala expressions of type String, NodeSeq, and Option[NodeSeq]. We did an example of String previously. Option[NodeSeq] in an attribute definition is very powerful. If the Option is None, the attribute will not be included in the resulting XML, but if the Option is Some, then the attribute will be included.

An example will help illustrate. Let's define a method that tests Long for being odd:

scala> def isOdd(in: Long) = in % 2L == 1L
isOdd: (Long)Boolean

Next we import Scala's XML package. XML literals are language-level, but the library that allows you to manipulate the literals must be imported.[15]

scala> import scala.xml._
import scala.xml._

We define the oddTime method, which will return Some[NodeSeq] if the time is odd and return None if it's even:

scala> def oddTime: Option[NodeSeq] = System.currentTimeMillis match {
       case t if isOdd(t) => Some(Text(t.toString))
       case _ => None
       }
oddTime: Option[scala.xml.NodeSeq]

And we can see that the time attribute is only included when the time is odd:

scala> <b time={oddTime}>Sometimes</b>
res6: scala.xml.Elem = <b time="1230679058437">Sometimes</b>
scala> <b time={oddTime}>Sometimes</b>
res7: scala.xml.Elem = <b time="1230679061015">Sometimes</b>
scala> <b time={oddTime}>Sometimes</b>
res8: scala.xml.Elem = <b>Sometimes</b>

You can also generate tags, text, cdata, and so on from Scala code:

scala> <b>The time is {new java.util.Date}</b>
res4: scala.xml.Elem = <b>The time is Tue Dec 30 15:09:09 PST 2008</b>

If your embedded Scala code returns a NodeSeq, the NodeSeq will be embedded directly. If your Scala expression returns anything else, that thing will be converted to a scala.xml.Text node by calling toString on the thing.

Using map, we can convert from a given type to XML that's embedded:

scala> <stuff>
       {(1 to 3).map(i => <v id={i.toString}>#{i}</v>)}
       </stuff>
res0: scala.xml.Elem =
<stuff>
       <v id="1">#1</v><v id="2">#2</v><v id="3">#3</v>
       </stuff>

One thing to be careful about is making sure your if expressions have an else. The type of an if expression without an else is Unit, which converts to an empty String:

scala> <b>{if (true) "dogs"}</b>
res14: scala.xml.Elem = <b></b>

That's not what we expected, but this is:

scala> <b>{if (true) "dogs" else ""}</b>
res15: scala.xml.Elem = <b>dogs</b>
scala> <b>{if (false) "dogs" else ""}</b>
res16: scala.xml.Elem = <b></b>

Scala correctly escapes characters from your Scala expressions:

scala> <b>{"Hello & Goodbye"}</b>
res3: scala.xml.Elem = <b>Hello &amp; Goodbye</b>
scala> <b attr={"Hello < Zoo"}/>
res6: scala.xml.Elem = <b attr="Hello &lt; Zoo"></b>

So you don't have to worry about escaping XML characters. This is generally the right thing, but if you're trying to embed a script in your XML, it might not be the right thing:

scala> val info = """
       var x = "";
       if (true && false) alert('Woof'),
       """
scala> <script>{info}</script>
res4: scala.xml.Elem =
<script>
       var x = &quot;&quot;;
       if (true &amp;&amp; false) alert('Woof'),
       </script>

You can use PCData to embed unescaped characters in your XML:

scala> <script>{PCData(info)}</script>
res5: scala.xml.Elem =
<script><![CDATA[
       var x = "";
       if (true && false) alert('Woof'),
       ]]></script>

Scala's built-in XML support makes creating XML a breeze. Because the XML data structures are immutable, they can be shared and cached without concern that another thread or method may change the XML. As a side note, when I do browser-side HTML manipulation, I always have to be concerned about inserting some nodes into other nodes because the insertion causes parent node references to be changed. This means that if I had inserted the nodes elsewhere, they'll be unceremoniously removed from the old place to be inserted into the new place. This means I have to copy the entire node structure each time I do an insert, which is an O(n) operation. With Scala's immutable XML data structures, I only have to change from the insertion point to the root node, an O(log n) operation.

Now that we've seen how to create XML in Scala, let's see how to parse and manipulate XML.

Parsing XML

Creating XML in Scala is done at the language level. The compiler needed to be changed to support XML literals. Parsing XML in Scala is all done at the library level. Let's explore XML parsing in combination with Scala's collection support. First, let's make sure the scala.xml package is imported:

scala> import scala.xml._
import scala.xml._

Next, let's grab some well-formed XML from the Web:

scala> val xml = XML.load("http://demo.liftweb.net/")
xml: scala.xml.Elem =
<html xmlns="http://www.w3.org/1999/xhtml"...

Next, let's find all the <a> tags in the document:

scala> xml \ "a"
res6: scala.xml.NodeSeq = <a shape="rect" ...

The \ (double backslash) operator finds all the tags with the given label in the document. The (single backslash) operator finds all the tags with the given label that are direct children of the current tag.

How many <a> tags on the page?

scala> (xml \ "a").length
res7: Int = 25

OK, that's pretty cool. And note that the results of the using and \ are Seq, so all the standard collection operations apply, including the for comprehension. Also, if the first character of the query String is an @, the query is performed against the attributes in the tag. Let's find all the <a> tags that refer to external resources:

scala> (xml \ "a").map(_  "@href").map(_.text).filter(_ startsWith "http:")
res11: Seq[String] = ArrayBuffer(http://liftweb.net, http://scala-lang.org, ...

How many is that?

scala> (xml \ "a").filter(n => (n  "@href").text.startsWith("http:")).length
res9: Int = 8

Perhaps the code is easier to read in a for comprehension:

scala> val refs = for {a <- xml \ "a"
       ext <- a  "@href" if ext.text startsWith "http:"}
       yield ext.text
refs: Seq[String] = ArrayBufferRO(http://liftweb.net, ...)

How many?

scala> refs.length
res10: Int = 8

In combination with Scala's collection methods, we can do more fun stuff with XML. We can traverse the XML and sum the contents of particular tags. First, let's create some XML:

scala> val x2 = <x>{(1 to 3).map(i => <i>{i}</i>)}</x>
x2: scala.xml.Elem = <x><i>1</i><i>2</i><i>3</i></x>

Let's find all the <i> tags:

scala> x2  "i"
res26: scala.xml.NodeSeq = <i>1</i><i>2</i><i>3</i>

Now, get the text from each tag:

scala> (x2  "i").map(_.text)
res27: Seq[String] = ArrayBufferRO(1, 2, 3)

and convert to an Int:

scala> (x2  "i").map(_.text.toInt)
res28: Seq[Int] = ArrayBufferRO(1, 2, 3)

Finally, we sum up the collection:

scala> (x2  "i").map(_.text.toInt).foldLeft(0)(_ + _)
res29: Int = 6

Scala's XML parsing library provides a simple yet powerful way to traverse XML. Combined with XML's collection methods, it's very easy and concise to convert XML data to object representations. Next, let's see how to transform XML.

Modifying XML

Transforming flat data structures such as List or Map is pretty easy: you just take each element from the collection and return a transformed value. Transforming XML is more complex. XML contains many different types of elements. XML data is nested. A transformation may return many nodes for each transformed node.

In order to deal with these complexities, Scala provides a set of classes to help with XML transformation. These helpers can be found in the scala.xml.transform package. These helpers provide O(log n) cost for transformations. That means only the nodes that are changed and the direct path from the changed nodes to the root node are changed. This is far more efficient than the O(n) cost of copying an entire XML data structure if modifying the structure will cause unexpected results.

In order to transform XML, we need to create a rewrite rule. In this case, if the XML element contains the attribute instruction with the value remove, the node will be removed from the transformed XML. First, let's import the packages:

import scala.xml._
import scala.xml.transform._

Next, let's define a rule that removes the elements that have the remove instruction:

val removeIt = new RewriteRule {
    override def transform(n: Node): NodeSeq = n match {
      case e: Elem if (e  "@instruction").text == "remove" => NodeSeq.Empty
      case n => n
    }
  }

The RewriteRule overrides the transform method and pattern matches against an element that contains the remove instruction. If the pattern is matched, an empty node is returned. Otherwise the method returns the unmodified node. Let's create some test data:

val xmlBooks =
        <books instruction="update">
          <book instruction="remove" name="book1" status=""/>
          <book instruction="add" name="book2" status=""/>
        </books>

Next, let's transform xmlBooks using a RuleTransformer:

scala> new RuleTransformer(removeIt).transform(xmlBooks)
res0: Seq[scala.xml.Node] =
<books instruction="update">
<book instruction="add" status="" name="book2"></book>
</books>

The rule transformer applied our removeIt rule to the input and transformed this to the output. Let's create a RewriteRule to process the add instruction:

val addIt = new RewriteRule {
    override def transform(n: Node): NodeSeq = n match {
      case e: Elem if (e  "@instruction").text == "add" =>
        new Elem(e.prefix, e.label,
          e.attributes.remove("instruction"),
          e.scope,
          transform(e.child) ++ <added>I added this</added> :_*)

      case n => n
    }
  }

This RewriteRule matches nodes with the add instruction attribute. The method creates a new element with the same attributes, but removes the instruction attribute. It also transforms the child nodes and appends a child node.

Let's run this transformation:

scala> new RuleTransformer(addIt).transform(xmlBooks)
res2: Seq[scala.xml.Node] =
<books instruction="update">
  <book instruction="remove" status="" name="book1"></book>
  <book status="" name="book2"><added>I added this</added></book>
</books>

Note that the book2 book has child nodes. We can run the transformation with both of the RewriteRules:

scala> new RuleTransformer(addIt, removeIt).transform(xmlBooks)
res3: Seq[scala.xml.Node] =
<books instruction="update">
  <book status="" name="book2"><added>I added this</added></book>
</books>

In this section, we've explored creating, parsing, and rewriting XML. Scala's awesome XML support makes it super-simple to write complex web-based applications. You can create Atom or RSS feeds in just a few lines that transform database records into the appropriate XML. You can parse incoming XML using the same constructs you use for other data sequences, which makes transformation from XML into data structures and objects easy and fast. Finally, Scala provides simple ways to transform XML.

In this chapter, we've explored Scala's immutable data types and the power and simplicity that they afford you. Next, let's see how we can use the immutable data structures in a highly concurrent situation. We're going to do concurrency without synchronization and see how immutability helps us out, big time.

Concurrency Without Synchronization

Writing multithreaded programs is a challenge. While simple data structures such as multithreaded queues or mailboxes are easy to write and relatively low in defects, most Java programs don't lend themselves to such simple abstractions. This is due in large part to passing around mutable data structures. Any data structure that can be changed without creating a new reference is a potential defect in a multithreaded application. The problem is exacerbated by the transitory nature of thread-related defects: it's hard to test for them.

There are many strategies for dealing with concurrency. One can synchronize everything, but this often leads to deadlocks because two threads are locking resources that depend on each other.

Another strategy is to copy everything that's passed to a thread. This strategy uses memory and CPU to work around the threading issue. Every time a resource is needed by a thread, the resource is copied for the thread's use. This means each Array, Hashtable, and so on that's requested by a given thread is copied. That way, the current thread does not have to synchronize the data.

Scala and immutable data structures offer another alternative. In this example, we're going to write a synchronize-free program (even the underlying data structures contain no synchronization) that has 2,000 threads each updating shared data every 100 milliseconds and a master thread that summarizes the shared data structure every second.

Create a file called Multics.scala and put the code from Listing 3-8 in it. We'll go through the code line by line after the listing.

Example 3.8. Multics.scala

import java.util.concurrent.atomic.{AtomicReference => AtomR, AtomicLong}
import java.util.Random
import scala.collection.immutable.TreeHashMap

object Multics {
  type MT = Map[String, Int]

  val info: AtomR[MT] = new AtomR(TreeHashMap.empty)
  val clashCnt = new AtomicLong

  def main(argv: Array[String]) {
     runThread {
      repeatEvery(1000) {
        println("Clash Count: "+clashCnt+" Total: "+
                info.get.foldLeft(0)(_ + _._2))
      } }
for (i <- 1 to 2000) runThread {
      var cnt = 0
      val ran = new Random
      val name = "K"+i

      doSet(info){old => old + (name -> 0)}
      repeatEvery(ran.nextInt(100)) {
        doSet(info){old => old + (name -> (old(name) + 1))}
        cnt = cnt + 1
        if (cnt != info.get()(name))
          throw new Exception("Thread: "+name+" failed")
      } }
  }

  def runThread(f: => Unit) =
  (new Thread(new Runnable {def run(): Unit = f})).start

  def doSet[T](atom: AtomR[T])(update: T => T) {
    val old = atom.get
    if (atom.compareAndSet(old, update(old))) ()
    else {
      clashCnt.incrementAndGet
      doSet(atom)(update)
    }
  }

  def repeatEvery(len: => Int)(body: => Unit): Unit = {
    try {
      while(true) {

        Thread.sleep(len)
        body
      }
    } catch {
      case e => e.printStackTrace; System.exit(1)
    }
  }
}

Let's go through the Multics code and see how it works.

import java.util.concurrent.atomic.{AtomicReference => AtomR, AtomicLong}

This imports Java's AtomicReference and renames it to AtomR and imports AtomicLong.

import java.util.Random
import scala.collection.immutable.TreeHashMap

object Multics {

This defines the object Multics.

type MT = Map[String, Int]

Then we define a type, MT, that's a Map[String, Int]. Defining the type saves us some typing.

val info: AtomR[MT] = new AtomR(TreeHashMap.empty)

info is the AtomicReference that holds the immutable Map. We can update the reference from any thread without synchronizing the AtomicReference. We use TreeHashMap as it has better characteristics for high write concurrency than Scala's default immutable Map.

val clashCnt = new AtomicLong

clashCnt is an AtomicLong used the keep track of the number of times the threads clash when trying to write to info. clashCnt can be incremented without synchronization.

def main(argv: Array[String]) {
     runThread {
      repeatEvery(1000) {
        println("Clash Count: "+clashCnt+" Total: "+
                info.get.foldLeft(0)(_ + _._2))
      } }

This code creates a new thread and starts it running the repeatEvery code. repeatEvery(1000) waits 1,000 milliseconds, then executes the body code. The body code prints the clashCnt and the sum of the values of all the info items. We can access the Map contained by info without synchronization because the Map is immutable, and we know it's not going to change out from under us. Next, we create 2,000 threads.

for (i <- 1 to 2000) runThread {
      var cnt = 0
      val ran = new Random
      val name = "K"+i

This simply initializes the variables for this thread.

doSet(info){old => old + (name -> 0)}

This initializes the count to zero for this thread. The doSet method takes a function that creates a new Map from the old Map.

repeatEvery(ran.nextInt(100)) {

This loops continuously, pausing a random amount between 0 and 100 milliseconds before each loop.

doSet(info){old => old + (name -> (old(name) + 1))}

This increments the count in info associated with this thread.

cnt = cnt + 1
        if (cnt != info.get()(name))
          throw new Exception("Thread: "+name+" failed")
      } }
  }

If the thread's local count doesn't match what's in info, there was a concurrency problem, so we throw an exception.

def runThread(f: => Unit) =
  (new Thread(new Runnable {def run(): Unit = f})).start

This creates a new thread setting the run method to the function, f, and starts the thread. When the thread calls run, f will be invoked. This is an example of passing a block of code as a parameter.

def doSet[T](atom: AtomR[T])(update: T => T) {
    val old = atom.get

This does an atomic, synchronization-free update of atom. It reads the old value, passes it to the update function, and then tries to do an atomic set. If the atomic set succeeds (the old value has not changed during the update), then the set was successful because the mutation was done on the most recent version of the data. If compareAndSet fails, it increments the clashCnt and tries again.

if (atom.compareAndSet(old, update(old))) ()
    else {
      clashCnt.incrementAndGet
      doSet(atom)(update)
    }
  }

The repeatEvery method creates a control structure. It takes two parameters, len and body. Both are call-by-name, which means that each time they're referenced in the body of repeatEvery, the code from the call site will be executed.

def repeatEvery(len: => Int)(body: => Unit): Unit = {
    try {
      while(true) {

This code repeats until an exception is thrown or the program terminates. It sleeps for len. Because len is call-by-name, the code that defines len will be executed each loop. Then the body is executed. If an exception is thrown, it will be caught and reported, and the program will terminate.

Thread.sleep(len)
        body
      }
    } catch {
      case e => e.printStackTrace; System.exit(1)
    }
  }
}

To run the code, save the file, type scalac Multics.scala to compile and scala Multics to run the program. You should see output like this:

>scala Multics
Clash Count: 50 Total: 13793
Clash Count: 283 Total: 41197
Clash Count: 351 Total: 78243
Clash Count: 389 Total: 115833
Clash Count: 434 Total: 153487
Clash Count: 483 Total: 191531
Clash Count: 524 Total: 229006
Clash Count: 561 Total: 266716
Clash Count: 1332 Total: 290839
Clash Count: 1387 Total: 327051
Clash Count: 1442 Total: 364706
Clash Count: 1486 Total: 402789

The Multics program demonstrates how to write synchronization-free code and the benefits of immutable data structures for multithreaded code.

Summary

In this chapter, we explored using Scala's immutable data structures to create powerful, flexible code that works well in multithreaded code. We saw how the methods of Scala's List class allow us to transform Lists and how the transformations highlight the business logic without the need for a lot of boilerplate. Scala's Tuples are syntactically pleasing and type-safe, and they make passing and returning things like name/value pairs super-simple and super-easy. Immutable Maps provide the same advanced ability to transform as Lists but in a convenient name/value form. The Option class is a great alternative to Java's overuse of null, and you can make very syntactically pleasing constructs with the for comprehension. We went on to see Scala's XML support, which is built on top of the immutability and manipulation constructs for Lists.

More important, we covered many of the basic constructs that you need to build Scala programs. We saw how functional, or transformation, code can be simultaneously more concise, more safe, and more performant than imperative code. We touched on passing functions and creating control structures. In the next chapter, we're going to dive deep into passing functions as parameters.



[13] Mesa for NextStep, which is still available for Mac OS X (http://www.plsys.co.uk/mesa), and Mesa 2 for OS/2 and the Integer multiuser spreadsheet engine for the JVM.

[14] You can see exactly how Scala turns patterns into code by typing scalac -print FileName.scala. This will cause the Scala compiler to emit desugared code that looks strangely like Java code.

[15] The Class class in Java is language-level, but we must explicitly import java.lang.reflect.* because Method is in that package rather than in java.lang.*.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.140.248.175