Construct a StringTokenizer around your string and call its methods hasMoreTokens( ) and nextToken( ). These implement the Iterator design pattern (see Section 7.5). In addition, StringTokenizer implements the Enumeration interface (also in Section 7.5), but if you use the methods thereof you will need to cast the results to String:
// StrTokDemo.java
StringTokenizer st = new StringTokenizer("Hello World of Java");

while (st.hasMoreTokens( ))
    System.out.println("Token: " + st.nextToken( ));
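The cast mentioned above is needed because nextElement( ) is declared to return Object. The following is a minimal sketch of that usage, not one of the book's programs; the class name EnumCastDemo and the helper tokensViaEnumeration are made up for illustration, and the use of generics assumes Java 5 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class, not part of the book's source code.
public class EnumCastDemo {

    /** Collects tokens through the Enumeration methods, casting each result. */
    public static List<String> tokensViaEnumeration(String input) {
        StringTokenizer st = new StringTokenizer(input);
        List<String> tokens = new ArrayList<>();
        while (st.hasMoreElements()) {
            // nextElement() returns Object, so a cast to String is required
            String token = (String) st.nextElement();
            tokens.add(token);
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokensViaEnumeration("Hello World of Java")) {
            System.out.println("Token: " + token);
        }
    }
}
```

If you only concatenate the result into another string, as in the printing examples here, the cast can be left out because string concatenation accepts any Object.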
The StringTokenizer normally breaks the String into tokens at what we would think of as “word boundaries” in European languages; by default it splits on whitespace (space, tab, newline, carriage return, and form feed). Sometimes you want to break at some other character. No problem. When you construct your StringTokenizer, in addition to passing in the string to be tokenized, pass in a second string that lists the “break characters.” For example:
// StrTokDemo2.java
StringTokenizer st = new StringTokenizer("Hello, World|of|Java", ", |");

while (st.hasMoreElements( ))
    System.out.println("Token: " + st.nextElement( ));
But wait, there’s more! What if you are reading lines like:
FirstName|Lastname|Company|PhoneNumber
and your dear old Aunt Begonia hasn’t been employed for the last 38 years? Her “Company” field will in all probability be blank.[12] If you look very closely at the previous code example, you’ll see that it has two delimiters together (the comma and the space), but if you run it there are no “extra” tokens. That is, the StringTokenizer normally treats a run of adjacent delimiters as a single separator and produces no empty tokens between them. For cases like the phone list, where you need to preserve null fields, there is good news and bad news. The good news is you can do it; you simply add a third argument of true when constructing the StringTokenizer, meaning that you wish to see the delimiters as tokens. The bad news is that you now get to see the delimiters as tokens, so you have to do the arithmetic yourself. Want to see it? Run this program:
// StrTokDemo3.java
StringTokenizer st = new StringTokenizer("Hello, World|of|Java", ", |", true);

while (st.hasMoreElements( ))
    System.out.println("Token: " + st.nextElement( ));
and you get this output:
C:javasrc>java StrTokDemo3
Token: Hello
Token: ,
Token:  
Token: World
Token: |
Token: of
Token: |
Token: Java
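To see why the phone-list case needs the delimiters returned, it can help to compare token counts for a line with an empty field. This is a small sketch, not one of the book's programs: the class name CountTokensDemo, the helper countFields, and the sample line are all invented for illustration. It relies only on the three-argument constructor and the countTokens( ) method of StringTokenizer.

```java
import java.util.StringTokenizer;

// Hypothetical demo class, not part of the book's source code.
public class CountTokensDemo {

    /** Counts the tokens a "|"-delimited line yields, with or without delimiters returned. */
    public static int countFields(String line, boolean returnDelims) {
        return new StringTokenizer(line, "|", returnDelims).countTokens();
    }

    public static void main(String[] args) {
        // Made-up sample record with an empty Company field
        String line = "Begonia|Smith||555-1234";

        // Default behavior: the empty field vanishes, leaving 3 tokens
        System.out.println("Without delimiters: " + countFields(line, false));

        // With returnDelims = true: 3 values plus 3 delimiters, 6 tokens
        System.out.println("With delimiters:    " + countFields(line, true));
    }
}
```

With the default behavior there is no way to tell which of the four fields was empty; with the delimiters returned, counting them tells you which field each value belongs to.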
This isn’t how you’d like StringTokenizer to behave, ideally, but it is serviceable enough most of the time. Example 3-1 reads the delimiter tokens, uses them only to keep track of the current field position, and returns the field values as an array of strings.
Example 3-1. StrTokDemo4.java (StringTokenizer)
import java.util.*;

/** Show using a StringTokenizer including getting the delimiters back */
public class StrTokDemo4 {
    public final static int MAXFIELDS = 5;
    public final static String DELIM = "|";

    /** Processes one String, returns it as an array of fields */
    public static String[] process(String line) {
        String[] results = new String[MAXFIELDS];

        // Unless you ask StringTokenizer to give you the delimiters,
        // it silently discards multiple null tokens.
        StringTokenizer st = new StringTokenizer(line, DELIM, true);

        int i = 0;
        // stuff each token into the slot for the current field
        while (st.hasMoreTokens( )) {
            String s = st.nextToken( );
            if (s.equals(DELIM)) {
                // Pre-increment so the check sees the index of the next
                // field; otherwise a sixth field would overrun the array.
                if (++i >= MAXFIELDS)
                    // This is messy: See StrTokDemo4b which uses
                    // a Vector to allow any number of fields.
                    throw new IllegalArgumentException("Input line " +
                        line + " has too many fields");
                continue;
            }
            results[i] = s;
        }
        return results;
    }

    public static void printResults(String input, String[] outputs) {
        System.out.println("Input: " + input);
        for (int i=0; i<outputs.length; i++)
            System.out.println("Output " + i + " was: " + outputs[i]);
    }

    public static void main(String[] a) {
        printResults("A|B|C|D", process("A|B|C|D"));
        printResults("A||C|D", process("A||C|D"));
        printResults("A|||D|E", process("A|||D|E"));
    }
}
When you run this, you will see that A is always in Field 1, B (if present) in Field 2, and so on. In other words, the null fields are being handled properly:
Input: A|B|C|D
Output 0 was: A
Output 1 was: B
Output 2 was: C
Output 3 was: D
Output 4 was: null

Input: A||C|D
Output 0 was: A
Output 1 was: null
Output 2 was: C
Output 3 was: D
Output 4 was: null

Input: A|||D|E
Output 0 was: A
Output 1 was: null
Output 2 was: null
Output 3 was: D
Output 4 was: E
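A comment in Example 3-1 points to a StrTokDemo4b that uses a Vector to allow any number of fields. That program is not shown here, so what follows is only a sketch of the same idea under stated assumptions: it uses a List rather than a Vector, the class name FieldSplitter is invented, and empty fields are recorded as null to match the output above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical class; a sketch of the idea, not the book's StrTokDemo4b.
public class FieldSplitter {
    public static final String DELIM = "|";

    /** Splits a line into fields, keeping empty fields as null entries. */
    public static List<String> process(String line) {
        List<String> results = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, DELIM, true);

        String current = null;  // value of the field being read; null if empty
        while (st.hasMoreTokens()) {
            String s = st.nextToken();
            if (s.equals(DELIM)) {
                results.add(current);  // a delimiter closes the current field
                current = null;
            } else {
                current = s;
            }
        }
        results.add(current);  // the last field has no trailing delimiter
        return results;
    }

    public static void main(String[] args) {
        System.out.println(process("A|B|C|D"));
        System.out.println(process("A||C|D"));
        System.out.println(process("A|||D|E"));
    }
}
```

Because the list grows as needed, there is no MAXFIELDS limit and no exception for long lines; the caller can check the size of the returned list if a fixed number of fields is expected.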