Topics in This Chapter A1
This chapter interrupts the treatment of the Scala language to give you some tools from the Scala libraries. You will learn how to carry out common file processing tasks, such as reading all lines or words from a file, and to work with regular expressions.
This interlude has useful information for projects that you can embark on with your current knowledge of Scala. Of course, if you prefer, skip the chapter until you need it and move on to more information about the Scala language.
Chapter highlights:
Source.fromFile(...).getLines.toArray yields all lines of a file.
Source.fromFile(...).mkString yields the file contents as a string.
To convert a string into a number, use the toInt or toDouble method.
Use the Java PrintWriter to write text files.
"regex".r is a Regex object.
Use """...""" if your regular expression contains backslashes or quotes.
If a regex pattern has groups, you can extract their contents using the syntax for regex(var1, ..., varn) <- string.
To read all lines from a file, call the getLines method on a scala.io.Source object:
import scala.io.Source
val filename = "/usr/share/dict/words"
var source = Source.fromFile(filename, "UTF-8")
// You can omit the encoding if you know that the file uses
// the default platform encoding
var lineIterator = source.getLines
The result is an iterator (see Chapter 13). You can use it to process the lines one at a time:
for l <- lineIterator do
  process(l)
Or you can put the lines into an array or array buffer by applying the toArray or toBuffer method to the iterator:
val lines = source.getLines.toArray
Sometimes, you just want to read an entire file into a string. That’s even simpler:
var contents = source.mkString
Caution
Call close when you are done using the Source object.
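One way to make sure that close is always called is the scala.util.Using utility, which has been part of the standard library since Scala 2.13. The following is a minimal sketch, not the only approach; the temporary file and the names samplePath and lines are made up for the illustration.

```scala
import java.nio.file.Files
import scala.io.Source
import scala.util.Using

// Hypothetical sample file, created only so that the sketch is self-contained
val samplePath =
  Files.writeString(Files.createTempFile("lines-demo", ".txt"), "alpha\nbeta\ngamma\n")

// Using.resource closes the Source when the block finishes, even if it throws
val lines: Array[String] =
  Using.resource(Source.fromFile(samplePath.toFile, "UTF-8")) { source =>
    source.getLines().toArray
  }
```

The lines are copied into an array inside the block because the iterator returned by getLines can no longer be used once the source is closed.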
When the Source class was created, the Java API for file processing was very limited. Java has caught up, and you may want to use the java.nio.file.Files class instead:
import java.nio.file.{Files,Path}
import java.nio.charset.StandardCharsets.UTF_8
contents = Files.readString(Path.of(filename), UTF_8)
To read all lines with the Files.lines method, convert the Java stream to Scala:
import scala.collection.mutable.Buffer
import scala.jdk.StreamConverters.*
val lineBuffer = Files.lines(Path.of(filename), UTF_8).toScala(Buffer)
lineIterator = Files.lines(Path.of(filename), UTF_8).toScala(Iterator)
To read individual characters from a file, you can use a Source object directly as an iterator since the Source class extends Iterator[Char]:
for c <- source do process(c)
If you want to be able to peek at a character without consuming it (like istream::peek in C++ or a PushbackReader in Java), call the buffered method on the source object. Then you can look at the next input character with the head method without consuming it.
source = Source.fromFile("myfile.txt", "UTF-8")
val iter = source.buffered
while iter.hasNext do
  if isNice(iter.head) then
    process(iter)
  else
    iter.next
source.close()
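To make the peeking idea concrete, here is a small sketch that reads a leading run of digits and leaves the first non-digit in the iterator. The helper readDigits and the sample input are hypothetical; Source.fromString is used so that no file is needed.

```scala
import scala.io.Source

// Collects consecutive digits; the first non-digit is only peeked at with head,
// so it stays in the iterator for the next consumer
def readDigits(iter: BufferedIterator[Char]): String =
  val sb = new StringBuilder
  while iter.hasNext && iter.head.isDigit do
    sb += iter.next()
  sb.toString

val chars = Source.fromString("123abc").buffered
val number = readDigits(chars) // "123"
val rest = chars.mkString      // "abc"
```

Without buffered, the only way to test a character would be to call next, and the character that ends the number would be lost.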
Here is a quick-and-dirty way of reading all whitespace-separated tokens in a source:
val tokens = source.mkString.split("\\s+")
To convert a string into a number, use the toInt or toDouble method. For example, if you have a file containing floating-point numbers, you can read them all into an array by
val numbers = for w <- tokens yield w.toDouble
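The toDouble method throws a NumberFormatException when a token is not a number. If the input may contain stray tokens, one option is toDoubleOption, which returns None instead. The sketch below uses a made-up input string in place of a file.

```scala
// Sample text standing in for file contents; "oops" is deliberately malformed
val text = "3.14 2.5\n10  oops 7"
val tokens = text.split("\\s+")

// toDoubleOption yields None for malformed tokens, and flatMap drops them
val numbers = tokens.flatMap(_.toDoubleOption)
// numbers contains 3.14, 2.5, 10.0, 7.0
```

Whether silently skipping bad tokens is appropriate depends on the application; for strict input, the plain toDouble call and its exception may be the better choice.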
Tip
Remember that you can always use the java.util.Scanner class to process a file that contains a mixture of text and numbers.
Finally, note that you can read numbers from scala.io.StdIn:
import scala.io.StdIn
print("How old are you? ")
val age = StdIn.readInt() // Or use readDouble or readLong
Caution
These methods assume that the next input line contains a single number, without leading or trailing whitespace. Otherwise, a NumberFormatException occurs.
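If you would rather not deal with the exception, you can read the whole line and parse it yourself. The following sketch shows one possible approach; the helper names parseAge and askAge are hypothetical.

```scala
import scala.io.StdIn

// Parses one line of input; None if it is not an integer.
// trim tolerates leading and trailing whitespace.
def parseAge(line: String): Option[Int] = line.trim.toIntOption

// Interactive use: reads from standard input when called.
// readLine returns null at end of input, which Option(...) turns into None.
def askAge(): Option[Int] =
  print("How old are you? ")
  Option(StdIn.readLine()).flatMap(parseAge)
```

For example, parseAge(" 42 ") is Some(42), and parseAge("forty") is None.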
The Source object has methods to read from sources other than files:
val source1 = Source.fromURL("https://horstmann.com/index.html", "UTF-8")
val source2 = Source.fromString("Hello, World!")
  // Reads from the given string; useful for debugging
val source3 = Source.stdin
  // Reads from standard input
Caution
When you read from a URL, you need to know the character set in advance, from an HTTP header or the first 1024 bytes of the contents. See https://www.w3.org/International/questions/qa-html-encoding-declarations for more information.
Scala has no provision for reading binary files. You’ll need to use the Java library. Here is how you can read a file into a byte array:
val bytes = Files.readAllBytes(Path.of(filename)) // An Array[Byte]
Scala has no built-in support for writing files. To write a text file, use a java.io.PrintWriter, for example:
import java.io.PrintWriter
val out = PrintWriter(filename)
for i <- 1 to 100 do out.println(i)
You can also write formatted output:
val quantity = 10
val price = 29.95
out.printf("%6d %10.2f%n", quantity, price)
Remember to close the writer:
out.close()
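As with Source, scala.util.Using can take care of closing the writer. Here is a minimal sketch under that assumption; the helper writeSquares and the temporary file are invented for the example, and the file is read back only to show what was written.

```scala
import java.io.PrintWriter
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters.*
import scala.util.Using

// Writes i and i * i for i = 1..n; the writer is closed (and flushed) by Using
def writeSquares(path: Path, n: Int): Unit =
  Using.resource(PrintWriter(path.toFile, "UTF-8")) { out =>
    for i <- 1 to n do out.println(s"$i ${i * i}")
  }

// Hypothetical output location
val outPath = Files.createTempFile("squares", ".txt")
val written =
  writeSquares(outPath, 3)
  Files.readAllLines(outPath).asScala.toList
// written is List("1 1", "2 4", "3 9")
```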
There are no “official” Scala classes for visiting all files in a directory, or for recursively traversing directories.
The simplest approach is to use the Files.list and Files.walk methods of the java.nio.file package. The list method only visits the children of a directory, and the walk method visits all descendants. These methods yield Java streams of Path objects. You can visit them as follows:
import java.nio.file.*
import scala.jdk.StreamConverters.*
val dirname = "/home"
val entries = Files.list(Paths.get(dirname)) // or Files.walk
try
  for p <- entries.toScala(Iterator) do
    process(p)
finally
  entries.close()
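For a runnable variation, the following sketch builds a small directory tree in a temporary location and collects the names of all regular files below it with Files.walk. The directory layout and the helper regularFiles are hypothetical, and Using.resource replaces the explicit try/finally.

```scala
import java.nio.file.{Files, Path}
import scala.jdk.StreamConverters.*
import scala.util.Using

// Hypothetical directory tree, created only for the demonstration
val root = Files.createTempDirectory("walk-demo")
val sub = Files.createDirectory(root.resolve("sub"))
val fileA = Files.createFile(root.resolve("a.txt"))
val fileB = Files.createFile(sub.resolve("b.txt"))

// Names of all regular files below dir; the Java stream is closed by Using
def regularFiles(dir: Path): Set[String] =
  Using.resource(Files.walk(dir)) { entries =>
    entries.toScala(List)
      .filter(p => Files.isRegularFile(p))
      .map(_.getFileName.toString)
      .toSet
  }

val found = regularFiles(root) // Set("a.txt", "b.txt")
```

The stream is converted to a List here so that all entries are collected before the stream is closed.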
In Java, serialization is used to transmit objects to other virtual machines or for short-term storage. (For long-term storage, serialization can be awkward—it is tedious to deal with different object versions as classes evolve over time.)
Here is how you declare a serializable class in Java and Scala.
Java:
public class Person implements java.io.Serializable { // This is Java
  private static final long serialVersionUID = 42L;
  private String name;
  ...
}
Scala:
@SerialVersionUID(42L) class Person(val name: String) extends Serializable
The Serializable trait is defined in the scala package and does not require an import.
Note
You can omit the @SerialVersionUID annotation if you are OK with the default ID.
Serialize and deserialize objects in the usual way:
val fred = Person("Fred")
val out = ObjectOutputStream(FileOutputStream("/tmp/test.ser"))
out.writeObject(fred)
out.close()
val in = ObjectInputStream(FileInputStream("/tmp/test.ser"))
val savedFred = in.readObject().asInstanceOf[Person]
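If you just want to try out serialization without touching the file system, you can write to a byte array instead. The sketch below is one way to do that; the helper roundTrip is hypothetical, and the Person class is repeated so that the example stands on its own.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream,
  ObjectInputStream, ObjectOutputStream}

@SerialVersionUID(42L) class Person(val name: String) extends Serializable

// Serializes obj into memory and immediately deserializes a copy
def roundTrip[T <: AnyRef](obj: T): T =
  val bytes = ByteArrayOutputStream()
  val out = ObjectOutputStream(bytes)
  out.writeObject(obj)
  out.close()
  val in = ObjectInputStream(ByteArrayInputStream(bytes.toByteArray))
  try in.readObject().asInstanceOf[T]
  finally in.close()

val fred = Person("Fred")
val savedFred = roundTrip(fred) // A distinct object with the same name
```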
The Scala collections are serializable, so you can have them as members of your serializable classes:
class Person extends Serializable :
  private val friends = ArrayBuffer[Person]() // OK: ArrayBuffer is serializable
  ...
Traditionally, programmers use shell scripts to carry out mundane processing tasks, such as moving files from one place to another, or combining a set of files. The shell language makes it easy to specify subsets of files and to pipe the output of one program into the input of another. However, as programming languages, most shell languages leave much to be desired.
Scala was designed to scale from humble scripting tasks to massive programs. The scala.sys.process package provides utilities to interact with shell programs. You can write your shell scripts in Scala, with all the power that the Scala language puts at your disposal.
Here is a simple example:
import scala.sys.process.*
"ls -al ..".!
As a result, the ls -al .. command is executed, showing all files in the parent directory. The result is printed to standard output.
The scala.sys.process package contains an implicit conversion from strings to ProcessBuilder objects. The ! method executes the ProcessBuilder object.
The result of the ! method is the exit code of the executed program: 0 if the program was successful, or a nonzero failure indicator otherwise.
If you use !! instead of !, the output is returned as a string:
val result = "ls -al /".!!
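The difference between the two methods can be sketched as follows. This assumes a Unix-like system where an echo program is on the path; the value names are arbitrary. Note that !! throws an exception if the program exits with a nonzero code.

```scala
import scala.sys.process.*

// ! runs the command, lets it print to standard output, and yields the exit code
val exitCode = "echo hello".!

// !! runs the command and captures its output, including the trailing newline
val output = "echo hello".!!
val greeting = output.trim // "hello"
```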
Note
The ! and !! operators were originally intended to be used as postfix operators without the method invocation syntax:
"ls -al /" !!
However, as you will see in Chapter 11, the postfix syntax is being deprecated since it can lead to parsing errors.
You can pipe the output of one program into the input of another, using the #| method:
("ls -al /" #| "grep u").!
Note
As you can see, the process library uses the commands of the underlying operating system. Here, I use bash commands because bash is available on Linux, Mac OS X, and Windows.
To redirect the output to a file, use the #> method:
("ls -al /" #> File("/tmp/filelist.txt")).!
To append to a file, use #>> instead:
("ls -al /etc" #>> File("/tmp/filelist.txt")).!
To redirect input from a file, use #<:
("grep u" #< File("/tmp/filelist.txt")).!
You can also redirect input from a URL:
("grep Scala" #< URL("http://horstmann.com/index.html")).!
You can combine processes with p #&& q (execute q if p was successful) and p #|| q (execute q if p was unsuccessful). But frankly, Scala is better at control flow than the shell, so why not implement the control flow in Scala?
Note
The process library uses the familiar shell operators | > >> < && ||, but it prefixes them with a # so that they all have the same precedence.
If you need to run a process in a different directory, or with different environment variables, construct a ProcessBuilder with the apply method of the Process object. Supply the command, the starting directory, and a sequence of (name, value) pairs for environment settings:
val p = Process(cmd, File(dirName), ("LC_ALL", myLocale))
Then execute it with the ! method:
("echo 42" #| p).!
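Here is a self-contained sketch of Process.apply with a working directory and an extra environment variable. It assumes a Unix-like system with sh available; the variable name GREETING and the choice of the temporary directory are made up for the example.

```scala
import java.io.File
import scala.sys.process.*

// Run the command in the system's temporary directory
val tmpDir = File(System.getProperty("java.io.tmpdir"))

// The command is given as a sequence so that the shell snippet stays one argument.
// GREETING is a hypothetical variable that exists only for this process.
val p = Process(Seq("sh", "-c", "echo $GREETING"), tmpDir, "GREETING" -> "hi")
val result = p.!!.trim // "hi"
```

Passing the command as a Seq[String] avoids surprises with quoting, since a plain command string is split at whitespace.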
When executing a process command that generates a large amount of output, you can read the output lazily:
val result = "ls -al /".lazyLines // Yields a LazyList[String]
See Chapter 13 for how to process a lazy list.
Note
If you want to use Scala for shell scripts in a UNIX/Linux/MacOS environment, start your script files like this:
#!/bin/sh
exec scala "$0" "$@"
!#
Scala commands
Note
You can also run Scala scripts from Java programs with the scripting integration of the javax.script package. To get a script engine, call
ScriptEngine engine =
  new ScriptEngineManager().getEngineByName("scala") // This is Java
You need the Scala compiler on the class path. If you use Coursier, you can get the class path as
coursier fetch -p org.scala-lang:scala3-compiler_3:3.2.0
When you process input, you often want to use regular expressions to analyze it. The scala.util.matching.Regex class makes this simple. To construct a Regex object, use the r method of the String class:
val numPattern = "[0-9]+".r
If the regular expression contains backslashes or quotation marks, then it is a good idea to use the “raw” string syntax, """...""". For example:
val wsnumwsPattern = """\s+[0-9]+\s+""".r
  // A bit easier to read than "\\s+[0-9]+\\s+".r
The matches method tests whether a regular expression matches a string:
if numPattern.matches(input) then
  val n = input.toInt
  ...
The entire input must match. To find out whether the string contains a match, turn the regular expression into unanchored mode:
if numPattern.unanchored.matches(input) then
  println("There is a number here somewhere")
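The difference between the two modes is easiest to see side by side. The following sketch uses made-up sample strings:

```scala
val numPattern = "[0-9]+".r

// Anchored (the default): the whole string must consist of digits
val whole = numPattern.matches("2024")        // true
val partial = numPattern.matches("year 2024") // false

// Unanchored: it is enough that a match occurs somewhere in the string
val somewhere = numPattern.unanchored.matches("year 2024") // true
```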
The findAllIn method returns an Iterator[String] through all matches. Since you are unlikely to have many matches, you can simply collect the results:
input = "99 bottles, 98 bottles"
numPattern.findAllIn(input).toArray // Yields Array(99, 98)
Note that you don’t need to call unanchored.
To get more information about the matches, call findAllMatchIn to get an Iterator[Match]. Each Match object describes the current match. Use the following methods for the match details:
start, end: The starting and ending index of the matching substring
matched: The matched substring
before, after: The substrings before or after the match
For example:
for m <- numPattern.findAllMatchIn(input) do
  println(s"${m.start} ${m.end}")
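As a sketch of what these methods return for the sample input, the following collects the match details into a list instead of printing them. The names details and firstAfter are arbitrary.

```scala
val numPattern = "[0-9]+".r
val input = "99 bottles, 98 bottles"

// One (matched text, start index, end index) triple per match.
// The end index is exclusive.
val details =
  for m <- numPattern.findAllMatchIn(input).toList
  yield (m.matched, m.start, m.end)
// List(("99", 0, 2), ("98", 12, 14))

// Text following the first match; after yields a CharSequence
val firstAfter = numPattern.findFirstMatchIn(input).map(_.after.toString)
// Some(" bottles, 98 bottles")
```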
To find the first match in a string, use findFirstIn or findFirstMatchIn. You get an Option[String] or Option[Match].
val firstMatch = wsnumwsPattern.findFirstIn("99 bottles, 98 bottles")
// Some(" 98 ")
You can replace the first match, all matches, or some matches. In the latter case, supply a function Match => Option[String]. If the function returns Some(str), the match is replaced with str.
numPattern.replaceFirstIn("99 bottles, 98 bottles", "XX")
// "XX bottles, 98 bottles"
numPattern.replaceAllIn("99 bottles, 98 bottles", "XX")
// "XX bottles, XX bottles"
numPattern.replaceSomeIn("99 bottles, 98 bottles",
  m => if m.matched.toInt % 2 == 0 then Some("XX") else None)
// "99 bottles, XX bottles"
Here is a more useful application of the replaceSomeIn method. We want to replace placeholders $0, $1, and so on, in a message string with values from an argument sequence. Make a pattern for the placeholder, extract the index that follows the $ sign, and then map the index to the sequence element.
val varPattern = """\$[0-9]+""".r
def format(message: String, vars: String*) =
  varPattern.replaceSomeIn(message, m => vars.lift(
    m.matched.tail.toInt))
format("At $1, there was $2 on $0.",
"planet 7", "12:30 pm", "a disturbance of the force")
// At 12:30 pm, there was a disturbance of the force on planet 7.
The lift method turns a Seq[String] into a function. The expression vars.lift(i) is Some(vars(i)) if i is a valid index or None if it is not.
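A consequence of using lift is that a placeholder without a corresponding argument is simply left in place. The sketch below repeats the format function so that it is self-contained; the sample messages are made up.

```scala
val varPattern = """\$[0-9]+""".r

def format(message: String, vars: String*) =
  varPattern.replaceSomeIn(message, m => vars.lift(m.matched.tail.toInt))

val args = Seq("planet 7", "12:30 pm")
val hit = args.lift(1)  // Some("12:30 pm")
val miss = args.lift(5) // None

// $9 has no matching argument, so replaceSomeIn leaves it unchanged
val partial = format("$0 and $9", "a") // "a and $9"
```

Keep in mind that replaceSomeIn treats $ and \ specially in replacement strings, so arguments containing those characters would need to be escaped, for example with Regex.quoteReplacement.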
Groups are useful to get subexpressions of regular expressions. Add parentheses around the subexpressions that you want to extract, for example:
val numitemPattern = "([0-9]+) ([a-z]+)".r
You can get the group contents from a Match object. If m is a Match object, then m.group(i) is the ith group. The start and end positions of these substrings in the original string are m.start(i) and m.end(i).
for m <- numitemPattern.findAllMatchIn("99 bottles, 98 bottles") do
  println(m.group(1)) // Prints 99 and 98
Caution
The Match class has methods for retrieving groups by name. However, this does not work with group names inside regular expressions, such as "(?<num>[0-9]+) (?<item>[a-z]+)".r. Instead, one needs to supply names to the r method: "([0-9]+) ([a-z]+)".r("num", "item").
There is another convenient way of extracting group matches. Use a regular expression variable as an “extractor” (see Chapter 14), like this:
val numitemPattern(num, item) = "99 bottles"
  // Sets num to "99", item to "bottles"
When you use a pattern as an extractor, it must match the string from which you extract the matches, and there must be a group for each variable.
If you are not sure whether there is a match, use
str match
  case numitemPattern(num, item) => ...
To extract groups from multiple matches, you can use a for statement like this:
for numitemPattern(num, item) <- numitemPattern.findAllIn("99 bottles, 98 bottles") do
  process(num, item)
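The two extractor forms can be tried out with the following sketch. The helper describe and the result names are hypothetical. The for loop here uses the case keyword in the generator, which Scala 3 provides to mark a pattern that may fail to match.

```scala
val numitemPattern = "([0-9]+) ([a-z]+)".r

// The pattern must match the entire string; otherwise the second case applies
def describe(str: String): String = str match
  case numitemPattern(num, item) => s"$num x $item"
  case _ => "no match"

val single = describe("99 bottles")      // "99 x bottles"
val tooLong = describe("99 bottles, 98") // "no match"

// Each string produced by findAllIn is a complete match, so the extractor succeeds
val pairs =
  for case numitemPattern(num, item) <-
    numitemPattern.findAllIn("99 bottles, 98 bottles").toList
  yield (num.toInt, item)
// List((99, "bottles"), (98, "bottles"))
```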
1. Write a Scala code snippet that reverses the lines in a file (making the last line the first one, and so on).
2. Write a Scala program that reads a file with tabs, replaces each tab with spaces so that tab stops are at n-column boundaries, and writes the result to the same file.
3. Write a Scala code snippet that reads a file and prints all words with more than 12 characters to the console. Extra credit if you can do this in a single line.
4. Write a Scala program that reads a text file containing only floating-point numbers. Print the sum, average, maximum, and minimum of the numbers in the file.
5. Write a Scala program that writes the powers of 2 and their reciprocals to a file, with the exponent ranging from 0 to 20. Line up the columns:
1 1
2 0.5
4 0.25
... ...
6. Make a regular expression searching for quoted strings "like this, maybe with \" or \\" in a source file. Write a Scala program that prints out all such strings.
7. Write a Scala program that reads a text file and prints all tokens in the file that are not floating-point numbers. Use a regular expression.
8. Write a Scala program that prints the src attributes of all img tags of a web page. Use regular expressions and groups.
9. Write a Scala program that counts how many files with .class extension are in a given directory and its subdirectories.
10. Expand the example in Section 9.7, “Serialization.” Construct a few Person objects, make some of them friends of others, and save an Array[Person] to a file. Read the array back in and verify that the friend relations are intact.