Using regular expressions in Java Scanner API

A scanner is a utility class that parses input text and breaks it into tokens of various types, such as boolean, int, float, double, and long. It separates tokens using a regular expression-based delimiter; by default, the delimiter matches whitespace. Using the Scanner API, we can generate tokens of the numeric primitive types and boolean (there is no nextChar() method) in addition to string tokens.

The String, Pattern, and Matcher classes are able to parse the input and generate tokens of the String type only, but the Scanner class is very useful for checking and generating tokens of different types from the input source. The Scanner instance can be constructed using the File, InputStream, Path, Readable, ReadableByteChannel, and String arguments.
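As a minimal sketch of the constructor variety mentioned above, the following builds scanners over a String, a Readable, and an InputStream and tokenizes each with the default whitespace delimiter. The class name ScannerSources, the helper tokens(), and the sample text are hypothetical and only serve the illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerSources {
    // Hypothetical helper: collects every token the scanner produces
    // with its default (whitespace) delimiter, then closes the scanner.
    static List<String> tokens(Scanner scanner) {
        List<String> result = new ArrayList<>();
        try (Scanner s = scanner) {
            while (s.hasNext()) {
                result.add(s.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "alpha beta\tgamma"; // sample input, assumed for illustration

        // Scanner over a String
        System.out.println(tokens(new Scanner(text)));

        // Scanner over a Readable
        System.out.println(tokens(new Scanner(new StringReader(text))));

        // Scanner over an InputStream, with an explicit charset name
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(tokens(new Scanner(new ByteArrayInputStream(bytes), "UTF-8")));
    }
}
```

Each call should print the same three tokens, since the source type only changes where the characters come from, not how they are split.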

Pattern and Matcher will be covered in detail in Chapter 5, Introduction to Java Regular Expression APIs - Pattern and Matcher Classes.

Many Scanner methods support regular expressions. Let's list those methods and understand them better:

| Method signature | Purpose |
| --- | --- |
| Scanner useDelimiter(String pattern) | Sets this scanner's delimiter to the pattern constructed from the given regex string. |
| Scanner useDelimiter(Pattern pattern) | Same as the previous method, but takes an already compiled Pattern instead of a String. With the String version, the scanner compiles the string into a Pattern object even if we have already compiled the same regex elsewhere in the code. We will discuss the Pattern and Matcher classes in the next chapter. |
| Pattern delimiter() | Returns the pattern this scanner is currently using to match delimiters. |
| MatchResult match() | Returns the match result of the last scanning operation performed by this scanner. |
| boolean hasNext(String pattern) | Returns true if the next token matches the pattern constructed from the specified string. |
| boolean hasNext(Pattern pattern) | Same as the previous method, but takes a Pattern instead of a String. |
| String next(String pattern) | Returns the next token if it matches the pattern constructed from the specified string. |
| String next(Pattern pattern) | Same as the previous method, but takes a Pattern instead of a String. |
| String findInLine(String pattern) | Attempts to find the next occurrence of a pattern constructed from the specified string, ignoring delimiters. The search stops at the next line separator; null is returned if nothing is found before it. |
| String findInLine(Pattern pattern) | Same as the previous method, but takes a Pattern instead of a String. |
| Scanner skip(String pattern) | Skips the input that matches a pattern constructed from the specified string, ignoring delimiters. |
| Scanner skip(Pattern pattern) | Same as the previous method, but takes a Pattern instead of a String. |
| String findWithinHorizon(String pattern, int horizon) | Attempts to find the next occurrence of a pattern constructed from the specified string, ignoring delimiters. The search never goes more than horizon code points beyond the current position; a horizon of 0 means the search is unbounded. |
| String findWithinHorizon(Pattern pattern, int horizon) | Same as the previous method, but takes a Pattern instead of a String. |
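To make the regex-aware methods listed above more concrete, here is a small sketch that exercises several of them. The class name ScannerRegexMethods, its helper methods, and the sample inputs are all hypothetical; only the Scanner, Pattern, and MatchResult calls come from the JDK.

```java
import java.util.Scanner;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;

public class ScannerRegexMethods {
    // Compiled once and reused, so the scanner does not recompile them.
    static final Pattern COMMA = Pattern.compile("\\s*,\\s*");
    static final Pattern DIGITS = Pattern.compile("\\d+");

    // useDelimiter(Pattern), hasNext(Pattern), next(Pattern):
    // returns the first all-digit token, or null if there is none.
    static String firstNumber(String csv) {
        try (Scanner s = new Scanner(csv).useDelimiter(COMMA)) {
            while (s.hasNext()) {
                if (s.hasNext(DIGITS)) {
                    return s.next(DIGITS);
                }
                s.next(); // discard a token that is not a number
            }
            return null;
        }
    }

    // findInLine(String) and match(): ignores delimiters and exposes
    // the capture groups of the first key=value pair on the line.
    static String firstValue(String line) {
        try (Scanner s = new Scanner(line)) {
            if (s.findInLine("(\\w+)=(\\w+)") == null) {
                return null;
            }
            MatchResult result = s.match();
            return result.group(2);
        }
    }

    // skip(String): consumes a prefix at the current position; it throws
    // NoSuchElementException if the prefix is not there.
    static int idAfterPrefix(String input) {
        try (Scanner s = new Scanner(input)) {
            s.skip("id:");
            return s.nextInt();
        }
    }

    // findWithinHorizon(Pattern, int): bounded search; horizon 0 is unbounded.
    static String digitsWithin(String input, int horizon) {
        try (Scanner s = new Scanner(input)) {
            return s.findWithinHorizon(DIGITS, horizon);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("apple , 42,pear"));
        System.out.println(firstValue("animal name=Tiger id=123"));
        System.out.println(idAfterPrefix("id:42 rest"));
        System.out.println(digitsWithin("abc123", 3)); // digits lie beyond the horizon
        System.out.println(digitsWithin("abc123", 0)); // unbounded search
    }
}
```

Note that findInLine(), skip(), and findWithinHorizon() work on the raw input and ignore the delimiter, whereas hasNext() and next() always operate on whole tokens.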


In addition to the two regular expression-based hasNext() methods listed in the preceding table, the Scanner class provides several type-specific hasNext methods that return true if the next available token in the input can be interpreted as a value of that type. For example: hasNextInt(), hasNextDouble(), hasNextBoolean(), hasNextByte(), hasNextFloat(), hasNextLong(), hasNextShort(), hasNextBigInteger(), hasNextBigDecimal(), hasNext().

Similarly, there are several type-specific next methods that scan the input and return the next token as a value of that type. For example: nextInt(), nextDouble(), nextBoolean(), nextByte(), nextFloat(), nextLong(), nextShort(), nextBigInteger(), nextBigDecimal(), next().
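A typical way to combine these methods is to test the type of the next token before consuming it. The following sketch, with a hypothetical class TypedTokens and made-up sample input, adds up every token that can be read as an int and discards everything else.

```java
import java.util.Scanner;

public class TypedTokens {
    // Sums the tokens that hasNextInt() accepts; all other tokens
    // are consumed with next() and ignored.
    static int sumInts(String input) {
        int sum = 0;
        try (Scanner scanner = new Scanner(input)) {
            while (scanner.hasNext()) {
                if (scanner.hasNextInt()) {
                    sum += scanner.nextInt();
                } else {
                    scanner.next(); // not an int: skip this token
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // "apples", "true" and "3.5" are not int tokens and are skipped
        System.out.println(sumInts("10 apples 20 true 3.5 12"));
    }
}
```

Checking with hasNextInt() first avoids the InputMismatchException that nextInt() throws when the next token is not an int.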

For the complete reference of the Scanner class refer to https://docs.oracle.com/javase/8/docs/api/java/util/Scanner.html.

Suppose there is an input text delimited by two exclamation marks. The data is structured in the following sequence:

animal!!id!!weight 

The animal name is a string, id is an integer number, and weight is a double number.

With this structure, here is an example input:

Tiger!!123!!221.2!!Fox!!581!!52.50 

Given that there are two animals, here is how we can use the Scanner class to parse this input data in Java:

    final String input = "Tiger!!123!!221.2!!Fox!!581!!52.50";
    final int MAX_COUNT = 2; // number of animal records in the input
    String animal;
    int id;
    double weight;

    // Split the input on the delimiter "!!"
    Scanner scanner = new Scanner(input).useDelimiter("!!");

    for (int i = 0; i < MAX_COUNT; i++)
    {
        animal = scanner.next();        // string token
        id = scanner.nextInt();         // int token
        weight = scanner.nextDouble();  // double token

        System.out.printf("animal=[%s], id=[%d], weight=[%.2f]%n", animal, id, weight);
    }

    scanner.close();


This is what is happening in this code:

  • new Scanner(input) is the code to construct a scanner using the input string
  • scanner.useDelimiter("!!") sets the delimiter regular expression as "!!"
  • scanner.next() gets the next string token from the constructed scanner
  • scanner.nextInt() gets the next int token from the scanner
  • scanner.nextDouble() gets the next double token from the scanner
  • scanner.close() closes the scanner object; we cannot generate further tokens from the scanner after this method call

As you can guess, we will get the following output from the preceding code:

animal=[Tiger], id=[123], weight=[221.20] 
animal=[Fox], id=[581], weight=[52.50]
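One caveat worth keeping in mind: nextDouble() interprets numbers according to the scanner's locale, which defaults to the JVM's default locale, so on a machine whose locale uses a comma as the decimal separator a token such as 221.2 may be rejected with an InputMismatchException. The sketch below is one possible variant, not the only way to write it: it pins the locale with useLocale(), loops with hasNext() instead of a fixed record count, and closes the scanner with try-with-resources. The class name AnimalParser and the method parse() are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

public class AnimalParser {
    // Parses animal!!id!!weight records and returns one formatted line per record.
    static List<String> parse(String input) {
        List<String> rows = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)
                .useDelimiter("!!")
                .useLocale(Locale.US)) { // '.' is the decimal separator in this locale
            while (scanner.hasNext()) {
                String animal = scanner.next();
                int id = scanner.nextInt();
                double weight = scanner.nextDouble();
                rows.add(String.format(Locale.US,
                        "animal=[%s], id=[%d], weight=[%.2f]", animal, id, weight));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String row : parse("Tiger!!123!!221.2!!Fox!!581!!52.50")) {
            System.out.println(row);
        }
    }
}
```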

Let's parse more complex input data to understand the use of the Scanner class better. Here is the complete code listing:

package example.regex;

import java.util.Scanner;

public class ScannerApi
{
    public static void main (String[] args)
    {
        final String str = "London:Rome#Paris:1234:Munich///Moscow";

        Scanner scanner = new Scanner(str);

        // Tokens are separated by one or more punctuation characters
        scanner.useDelimiter("\\p{Punct}+");

        // A city name consists of one or more Unicode letters
        final String cityPattern = "\\p{L}+";

        while (scanner.hasNext()) {
            if (scanner.hasNext(cityPattern)) {
                System.out.println(scanner.next());
            }
            else {
                scanner.next(); // discard tokens that are not city names
            }
        }

        scanner.close();
    }
}

This is what is happening in this code:

  • new Scanner(str) is the code to construct a scanner using the input string
  • scanner.useDelimiter("\\p{Punct}+") sets the delimiter regular expression to one or more punctuation characters (the backslash is doubled because it appears inside a Java string literal)
  • We are using "\\p{L}+" as the acceptable city name pattern, which means one or more Unicode letters
  • scanner.hasNext(cityPattern) returns true if the next token from the scanner matches cityPattern
  • scanner.next() retrieves the next string token from the scanner
  • scanner.close() closes the scanner object; we cannot generate further tokens from the scanner after this method call

Upon compiling and running the preceding code, it will produce the following output:

London 
Rome
Paris
Munich
Moscow
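The token 1234 is missing from this output because it does not match the city pattern and is silently discarded by the else branch. As an illustrative variant, the following sketch uses precompiled Pattern objects and returns either the accepted or the rejected tokens, which makes the filtering visible. The class name CityFilter and the method tokens() are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

public class CityFilter {
    // Compiled once; the Pattern overloads reuse them on every call.
    private static final Pattern DELIMITER = Pattern.compile("\\p{Punct}+");
    private static final Pattern CITY = Pattern.compile("\\p{L}+");

    // Returns the tokens that match CITY when wantCities is true,
    // otherwise the tokens that do not match it.
    static List<String> tokens(String input, boolean wantCities) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(input).useDelimiter(DELIMITER)) {
            while (scanner.hasNext()) {
                boolean isCity = scanner.hasNext(CITY); // test before consuming
                String token = scanner.next();
                if (isCity == wantCities) {
                    result.add(token);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "London:Rome#Paris:1234:Munich///Moscow";
        System.out.println("cities:   " + tokens(str, true));
        System.out.println("rejected: " + tokens(str, false));
    }
}
```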