2.13. Parsing Formatted Strings

Problem

You need to parse a string containing control characters and the delimiters (, [, ), ], and ,.

Solution

Use variations of substring( ) from StringUtils. This next example parses a string that contains six numbers delimited by an asterisk, parentheses, brackets, and a pipe symbol (N0 * (N1,N2) [N3,N4] | N5):

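// Commons Lang 2.x package; Commons Lang 3 uses org.apache.commons.lang3
import java.io.PrintStream;
import org.apache.commons.lang.StringUtils;

String formatted = " 25 * (30,40) [50,60] | 30";

PrintStream out = System.out;
// trim( ) removes the spaces that surround N0 and N5
out.print("N0: " + StringUtils.trim( StringUtils.substringBeforeLast( formatted, "*" ) ) );
out.print(", N1: " + StringUtils.substringBetween( formatted, "(", "," ) );
out.print(", N2: " + StringUtils.substringBetween( formatted, ",", ")" ) );
out.print(", N3: " + StringUtils.substringBetween( formatted, "[", "," ) );
// substringBetween( ) starts at the first comma, so skip past "[" before looking for N4
out.print(", N4: " + StringUtils.substringBetween( StringUtils.substringAfter( formatted, "[" ), ",", "]" ) );
out.print(", N5: " + StringUtils.trim( StringUtils.substringAfterLast( formatted, "|" ) ) );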

This parses the formatted text and prints the following output:

N0: 25, N1: 30, N2: 40, N3: 50, N4: 60, N5: 30

Discussion

The following public static methods come in handy when trying to extract information from a formatted string:

StringUtils.substringBetween( )

Captures content between two strings

StringUtils.substringAfter( )

Captures content that occurs after the specified string

StringUtils.substringBefore( )

Captures content that occurs before a specified string

StringUtils.substringBeforeLast( )

Captures content before the last occurrence of a specified string

StringUtils.substringAfterLast( )

Captures content after the last occurrence of a specified string
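To make the difference between the first-occurrence and last-occurrence variants concrete, here is a minimal sketch that approximates the behavior described above using only java.lang.String. The class SubstringSketch and its method names are hypothetical stand-ins written for illustration; they are not the Commons Lang source, and edge cases may differ from the real StringUtils methods.

```java
// Hypothetical stand-ins that approximate the behavior of the StringUtils
// methods listed above. Illustration only; not the Commons Lang source.
class SubstringSketch {

    // Text before the FIRST occurrence of sep; the whole string if sep is absent.
    static String before(String s, String sep) {
        if (s == null || sep == null) return s;
        int i = s.indexOf(sep);
        return i < 0 ? s : s.substring(0, i);
    }

    // Text after the FIRST occurrence of sep; "" if sep is absent.
    static String after(String s, String sep) {
        if (s == null) return null;
        if (sep == null) return "";
        int i = s.indexOf(sep);
        return i < 0 ? "" : s.substring(i + sep.length());
    }

    // Text before the LAST occurrence of sep; the whole string if sep is absent.
    static String beforeLast(String s, String sep) {
        if (s == null || sep == null || sep.isEmpty()) return s;
        int i = s.lastIndexOf(sep);
        return i < 0 ? s : s.substring(0, i);
    }

    // Text after the LAST occurrence of sep; "" if sep is absent.
    static String afterLast(String s, String sep) {
        if (s == null) return null;
        if (sep == null || sep.isEmpty()) return "";
        int i = s.lastIndexOf(sep);
        return i < 0 ? "" : s.substring(i + sep.length());
    }

    // Text between the first open and the first close that follows it; null if either is absent.
    static String between(String s, String open, String close) {
        if (s == null || open == null || close == null) return null;
        int start = s.indexOf(open);
        if (start < 0) return null;
        int end = s.indexOf(close, start + open.length());
        return end < 0 ? null : s.substring(start + open.length(), end);
    }

    public static void main(String[] args) {
        String path = "a/b/c";
        System.out.println(before(path, "/"));      // a
        System.out.println(after(path, "/"));       // b/c
        System.out.println(beforeLast(path, "/"));  // a/b
        System.out.println(afterLast(path, "/"));   // c

        // The search for the opening delimiter always starts at the beginning
        String pairs = "(30,40) [50,60]";
        System.out.println(between(pairs, ",", "]"));              // 40) [50,60
        System.out.println(between(after(pairs, "["), ",", "]"));  // 60
    }
}
```

The last two lines show a detail worth remembering: when the opening delimiter occurs more than once, a between-style search matches its first occurrence, so it helps to narrow the input before extracting a later field.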

To illustrate the use of these methods, here is an example of a feed of sports scores. Each record in the feed has a defined format, which resembles this feed description:

(SOT)<sport>[<team1>,<team2>] (<score1>,<score2>)(ETX) 

Notes:
 (SOT) is ASCII character 2 "Start of Text",
 (ETX) is ASCII character 4 "End of Transmission". 

Example:
 (SOT)Baseball[BOS,SEA] (24,22)(ETX)
 (SOT)Basketball[CHI,NYC] (29,5)(ETX)

The following example parses this feed using StringUtils methods trim( ), substringBetween( ), and substringBefore( ). The boxScore variable holds a test string to parse, and, once parsed, this code prints out the game score:

import java.io.PrintStream;
import org.apache.commons.lang.StringUtils;

// Create a formatted string to parse - get this from a feed.
// The feed's SOT and ETX markers are ASCII 2 (STX) and ASCII 4 (EOT).
char SOT = '\u0002';
char ETX = '\u0004';
String boxScore = SOT + "Basketball[CHI,BOS](69,75)\n" + ETX;

// Get rid of the archaic control characters
boxScore = StringUtils.trim( boxScore );

// Parse the score into component parts
String sport = StringUtils.substringBefore( boxScore, "[" );
String team1 = StringUtils.substringBetween( boxScore, "[", "," );
String team2 = StringUtils.substringBetween( boxScore, ",", "]" );
String score1 = StringUtils.substringBetween( boxScore, "(", "," );
// The first comma separates the teams, so skip past "(" before reading score2
String score2 = StringUtils.substringBetween( StringUtils.substringAfter( boxScore, "(" ), ",", ")" );

PrintStream out = System.out;
out.println( "**** " + sport + " Score" );
out.println( "\t" + team1 + "\t" + score1 );
out.println( "\t" + team2 + "\t" + score2 );

This code parses a score, and prints the following output:

**** Basketball Score
 CHI 69
 BOS 75

In the previous example, StringUtils.trim( ) rids the text of the SOT and ETX control characters. StringUtils.substringBefore( ) then reads the sport name—“Basketball”—and substringBetween( ) is used to retrieve the teams and scores.

At first glance, the value of these substring( ) variations is not obvious. The previous example parsed this simple formatted string using a handful of static methods on StringUtils, but how difficult would it have been to implement this parsing without the aid of Commons Lang? The following example parses the same string using only methods available in J2SE 1.4:

// Find the sport name without using StringUtils
boxScore = boxScore.trim( );

int firstBracket = boxScore.indexOf( "[" );
String sport = boxScore.substring( 0, firstBracket );

int firstComma = boxScore.indexOf( "," );
String team1 = boxScore.substring( firstBracket + 1, firstComma );

int secondBracket = boxScore.indexOf( "]" );
String team2 = boxScore.substring( firstComma + 1, secondBracket );

int firstParen = boxScore.indexOf( "(" );
int secondComma = boxScore.indexOf( ",", firstParen );
String score1 = boxScore.substring( firstParen + 1, secondComma );

int secondParen = boxScore.indexOf( ")" );
String score2 = boxScore.substring( secondComma + 1, secondParen );

This parses the string in a similar number of lines, but the code is less straightforward and much more difficult to maintain. Instead of simply calling a substringBetween( ) method, the previous example calls String.indexOf( ) and performs arithmetic with an index while calling String.substring( ). Additionally, the substring( ) methods on StringUtils are null-safe; the J2SE 1.4 example would throw a NullPointerException if boxScore were null.
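The failure modes of the index-based approach can be sketched with a small plain-JDK example. The class IndexOfSketch and both of its methods are hypothetical names introduced here for illustration; the guarded version only imitates a null-in, null-out contract and is not taken from Commons Lang.

```java
// Hypothetical illustration of why index arithmetic needs guarding.
class IndexOfSketch {

    // Unguarded, in the style of the plain-JDK example above.
    static String sportUnchecked(String boxScore) {
        int firstBracket = boxScore.indexOf("[");    // NullPointerException if boxScore is null
        return boxScore.substring(0, firstBracket);  // throws if "[" is missing (index is -1)
    }

    // Guarded: null in, null out; input without "[" is returned unchanged.
    static String sportChecked(String boxScore) {
        if (boxScore == null) return null;
        int firstBracket = boxScore.indexOf("[");
        return firstBracket < 0 ? boxScore : boxScore.substring(0, firstBracket);
    }

    public static void main(String[] args) {
        System.out.println(sportChecked("Basketball[CHI,BOS](69,75)")); // Basketball
        System.out.println(sportChecked(null));                         // null

        try {
            sportUnchecked("Basketball"); // no "[" in the input
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("Missing delimiter: " + e.getClass().getSimpleName());
        }
    }
}
```

Every index-based extraction needs this kind of guard, and that repetition is what the StringUtils methods save you from writing.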

String.trim( ) and StringUtils.trim( ) behave the same way for non-null input: both strip leading and trailing whitespace and ASCII control characters (any character with a code of 32 or less). StringUtils.trim( ) is simply a wrapper for String.trim( ) that handles null input gracefully; if a null value is passed to StringUtils.trim( ), a null value is returned.
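As a small illustration of this behavior, the following sketch trims a box score wrapped in control characters using only the JDK. The helper trimOrNull( ) is a hypothetical name that imitates the null handling described above; it is not the Commons Lang implementation.

```java
// Hypothetical sketch of a null-safe trim, in the spirit of StringUtils.trim( ).
class TrimSketch {

    // Returns null for null input; otherwise delegates to String.trim( ).
    static String trimOrNull(String s) {
        return s == null ? null : s.trim();
    }

    public static void main(String[] args) {
        char sot = '\u0002';
        char etx = '\u0004';
        String raw = sot + "Basketball[CHI,BOS](69,75)\n" + etx;

        // Leading and trailing characters with codes of 32 or less are removed
        System.out.println(trimOrNull(raw));   // Basketball[CHI,BOS](69,75)
        System.out.println(trimOrNull(null));  // null
    }
}
```

Note that only the ends of the string are affected; a control character in the middle of a record would survive the call.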
