148. Working with Scanner

Scanner exposes an API for parsing text from strings, files, the console, and so on. Parsing here means splitting the given input into tokens and returning each token as the requested type (for example, an int, a float, or a double). By default, Scanner splits the input on whitespace (the default delimiter) and exposes the tokens via a suite of nextFoo() methods (for example, next(), nextInt(), nextDouble(), and so on), plus nextLine() for reading the rest of the current line.
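As a minimal sketch of the default behavior, the following snippet scans an in-memory string instead of a file. The class name WhitespaceTokensSketch and the helper readInts() are hypothetical names introduced only for this illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

final class WhitespaceTokensSketch {

  // Returns every whitespace-separated token that parses as an int
  // and silently skips the tokens that do not.
  static List<Integer> readInts(String input) {
    List<Integer> result = new ArrayList<>();
    try (Scanner scanner = new Scanner(input)) {
      while (scanner.hasNext()) {
        if (scanner.hasNextInt()) {
          result.add(scanner.nextInt());
        } else {
          scanner.next(); // discard a token that is not an int
        }
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // spaces, tabs, and line breaks all count as whitespace
    System.out.println(readInts("10 apples and 20\tpears\n30"));
    // prints [10, 20, 30]
  }
}
```

Each hasNextFoo() call only peeks at the next token, so it is safe to test the type first and decide afterward whether to convert the token or discard it.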

From the same category of problems, consider the Tokenizing files section as well.

For example, let's assume that we have a file (doubles.txt) that contains double numbers separated by spaces, as shown in the following illustration:

If we want to obtain this text as doubles, then we can read it and rely on a snippet of spaghetti code to tokenize and convert it into doubles. Alternatively, we can rely on Scanner and its nextDouble() method, as follows:

try (Scanner scanDoubles = new Scanner(
    Path.of("doubles.txt"), StandardCharsets.UTF_8)) {

  while (scanDoubles.hasNextDouble()) {
    double number = scanDoubles.nextDouble();
    System.out.println(number);
  }
}

The output of the preceding code is as follows:

23.4556
1.23
...
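One detail worth keeping in mind is that nextDouble() interprets numbers according to the scanner's locale, which defaults to the platform's default locale. On a machine whose locale uses a comma as the decimal separator, a value such as 23.4556 may not be recognized as a double. The following sketch pins the locale via useLocale(); the class and method names are hypothetical, and the input is an in-memory string rather than doubles.txt:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

final class LocaleAwareDoublesSketch {

  // Reads doubles that use a dot as the decimal separator,
  // regardless of the default locale of the machine.
  static List<Double> readDoubles(String input) {
    List<Double> numbers = new ArrayList<>();
    try (Scanner scanner = new Scanner(input).useLocale(Locale.US)) {
      while (scanner.hasNextDouble()) {
        numbers.add(scanner.nextDouble());
      }
    }
    return numbers;
  }

  public static void main(String[] args) {
    System.out.println(readDoubles("23.4556 1.23 4.55"));
    // prints [23.4556, 1.23, 4.55]
  }
}
```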

However, a file may contain mixed information of different types. For example, the file (people.txt) in the following illustration contains strings and integers that are separated by different delimiters (a comma and a semicolon):

Scanner exposes a method called useDelimiter(). This method takes a String or a Pattern argument that specifies, as a regular expression, the delimiter(s) that should be used:

try (Scanner scanPeople = new Scanner(Path.of("people.txt"),
    StandardCharsets.UTF_8).useDelimiter(";|,")) {

  while (scanPeople.hasNextLine()) {
    System.out.println("Name: " + scanPeople.next().trim());
    System.out.println("Surname: " + scanPeople.next());
    System.out.println("Age: " + scanPeople.nextInt());
    System.out.println("City: " + scanPeople.next());
  }
}

The output of using this method is as follows:

Name: Matt
Surname: Kyle
Age: 23
City: San Francisco
...
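Since the delimiter is a regular expression, it can also absorb the whitespace around the separators, which makes the trim() call unnecessary. The following is a sketch under stated assumptions: the sample data layout (each record ends with a semicolon and a line break) is hypothetical and is not taken from people.txt, and the class and method names are introduced only for this illustration. The loop relies on hasNext(), which checks for a remaining token rather than for a remaining line:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

final class DelimiterSketch {

  // Hypothetical sample data; the layout is an assumption
  static final String SAMPLE =
      "Matt,Kyle;23,San Francisco;\nDarel,Der;50,New York;\n";

  static List<String> readPeople(String input) {
    List<String> people = new ArrayList<>();
    // a comma or a semicolon, plus any whitespace around it
    try (Scanner scanner = new Scanner(input)
        .useDelimiter("\\s*[;,]\\s*")) {
      while (scanner.hasNext()) {
        String name = scanner.next();
        String surname = scanner.next();
        int age = scanner.nextInt();
        String city = scanner.next();
        people.add(name + " " + surname + ", " + age + ", " + city);
      }
    }
    return people;
  }

  public static void main(String[] args) {
    readPeople(SAMPLE).forEach(System.out::println);
    // prints:
    // Matt Kyle, 23, San Francisco
    // Darel Der, 50, New York
  }
}
```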

Starting with JDK 9, Scanner exposes a new method called tokens(). This method returns a stream of delimiter-separated tokens from Scanner. For example, we can use it to parse the people.txt file and print it on the console, as follows:

try (Scanner scanPeople = new Scanner(Path.of("people.txt"),
    StandardCharsets.UTF_8).useDelimiter(";|,")) {

  scanPeople.tokens().forEach(t -> System.out.println(t.trim()));
}

The output of using the preceding method is as follows:

Matt
Kyle
23
San Francisco
...

Alternatively, we can join the tokens by space:

try (Scanner scanPeople = new Scanner(Path.of("people.txt"),
    StandardCharsets.UTF_8).useDelimiter(";|,")) {

  String result = scanPeople.tokens()
    .map(t -> t.trim())
    .collect(Collectors.joining(" "));
}

The content of result is as follows:

Matt Kyle 23 San Francisco Darel Der 50 New York ...

In the Searching in big files section, there is an example of how to use the tokens() method to search for a certain piece of text in a file.
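Because tokens() returns a regular Stream<String>, the usual stream operations apply to it. As a rough sketch, the following snippet sums the purely numeric tokens of an in-memory sample. The sample data, the class name, and the method name are hypothetical. Note that the stream is consumed inside the try-with-resources block, before the scanner is closed:

```java
import java.util.Scanner;

final class TokensStreamSketch {

  // Hypothetical sample data; the layout is an assumption
  static final String SAMPLE =
      "Matt,Kyle;23,San Francisco;\nDarel,Der;50,New York;\n";

  // Sums all the tokens that consist only of digits.
  static int sumNumericTokens(String input) {
    try (Scanner scanner = new Scanner(input)
        .useDelimiter("\\s*[;,]\\s*")) {
      return scanner.tokens()
        .filter(t -> t.matches("\\d+"))
        .mapToInt(Integer::parseInt)
        .sum();
    }
  }

  public static void main(String[] args) {
    System.out.println(sumNumericTokens(SAMPLE)); // prints 73
  }
}
```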

Alongside tokens(), JDK 9 also added a method called findAll(). This is a very handy method for finding all the matches of a certain regular expression (provided as a String or Pattern) in the input. This method returns a Stream<MatchResult> and can be used like so:

try (Scanner sc = new Scanner(Path.of("people.txt"))) {

  Pattern pattern = Pattern.compile("4[0-9]");

  List<String> ages = sc.findAll(pattern)
    .map(MatchResult::group)
    .collect(Collectors.toList());

  System.out.println("Ages: " + ages);
}

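The preceding code selects all the matches that represent ages between 40 and 49 years old, that is, 40, 43, and 43. Keep in mind that findAll() searches the raw input and ignores the delimiter, so the pattern also matches such a pair of digits when it occurs inside a longer number.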
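Since findAll() matches against the input regardless of delimiters, a bare pattern such as 4[0-9] can also hit part of a longer number. One possible way to guard against this is to add word boundaries (\b) to the pattern. The following sketch compares the two patterns on hypothetical sample data; the class and method names are assumptions made for this illustration:

```java
import java.util.List;
import java.util.Scanner;
import java.util.regex.MatchResult;
import java.util.stream.Collectors;

final class FindAllSketch {

  // Hypothetical sample data; note the value 140 in the second record
  static final String SAMPLE =
      "Ana,Pop;43,Oslo;\nBob,Ray;140,Rome;\nCy,Lee;40,Bonn;\n";

  // Returns the text of every match of the given regular expression.
  static List<String> findMatches(String input, String regex) {
    try (Scanner scanner = new Scanner(input)) {
      return scanner.findAll(regex)
        .map(MatchResult::group)
        .collect(Collectors.toList());
    }
  }

  public static void main(String[] args) {
    // the bare pattern also matches the "40" inside "140"
    System.out.println(findMatches(SAMPLE, "4[0-9]"));
    // prints [43, 40, 40]

    // word boundaries restrict the matches to standalone numbers
    System.out.println(findMatches(SAMPLE, "\\b4[0-9]\\b"));
    // prints [43, 40]
  }
}
```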

Scanner is also a convenient approach if we wish to parse the input that's typed in the console:

Scanner scanConsole = new Scanner(System.in);

String name = scanConsole.nextLine();
String surname = scanConsole.nextLine();
int age = scanConsole.nextInt();
// an int cannot include "\n", so we need
// the next line just to consume the "\n"
scanConsole.nextLine();
String city = scanConsole.nextLine();

Note that, for numeric inputs (read via nextInt(), nextFloat(), and so on), we need to consume the line break as well (it is produced when we hit Enter). Scanner does not consume this line break when it parses a number, so it is still pending when the next read starts. If we don't consume it by adding a nextLine() call, then the following nextLine() returns an empty string and, from this point forward, the inputs become misaligned, which can lead to an exception of the InputMismatchException type or to a premature end.

The Scanner constructors that take a Charset object were introduced in JDK 10. The overloads that take the charset name as a String are older.
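The pending line break described above can be reproduced without a console by backing the Scanner with a string that simulates the typed input. This is only a sketch; the class name, the method name, and the sample input are hypothetical:

```java
import java.util.Scanner;

final class PendingLineBreakSketch {

  // Simulates a user typing four answers, each followed by Enter
  static final String INPUT = "Matt\nKyle\n23\nSan Francisco\n";

  // Reads name, surname, and age, and then returns the city.
  static String readCity(String input, boolean consumeLineBreak) {
    try (Scanner scanner = new Scanner(input)) {
      scanner.nextLine(); // name
      scanner.nextLine(); // surname
      scanner.nextInt();  // age; stops right before the "\n"
      if (consumeLineBreak) {
        scanner.nextLine(); // consume the pending "\n"
      }
      return scanner.nextLine();
    }
  }

  public static void main(String[] args) {
    // without the extra call, the city comes back empty
    System.out.println("[" + readCity(INPUT, false) + "]");
    // prints []

    System.out.println("[" + readCity(INPUT, true) + "]");
    // prints [San Francisco]
  }
}
```

An alternative that avoids the extra call is to read every answer via nextLine() and convert the numeric ones explicitly, for example, with Integer.parseInt().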

Let's take a look at the difference between Scanner and BufferedReader.
