Taking Strings Apart with StringTokenizer

Problem

You need to take a string apart into words or tokens.

Solution

Construct a StringTokenizer around your string and call its methods hasMoreTokens( ) and nextToken( ). These implement the Iterator design pattern (see Section 7.5). In addition, StringTokenizer implements the Enumeration interface (also in Section 7.5), but its nextElement( ) method is declared to return Object, so if you use the Enumeration methods you will need to cast the results to String:

// StrTokDemo.java
StringTokenizer st = new StringTokenizer("Hello World of Java");

while (st.hasMoreTokens(  ))
    System.out.println("Token: " + st.nextToken(  ));
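With the one-argument constructor, StringTokenizer splits on its default delimiter set: space, tab, newline, carriage return, and form feed. The following is a minimal sketch of that behavior; the class name DefaultDelimsSketch and the helper tokens( ) are made up for illustration and are not part of the StrTokDemo source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

/** Hypothetical helper, not from StrTokDemo: collects tokens into a List. */
public class DefaultDelimsSketch {

    /** Splits input on the default delimiters (space, tab, newline, CR, form feed). */
    public static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // A tab, a double space, and a newline all act as breaks.
        System.out.println(tokens("Hello\tWorld  of\nJava")); // prints [Hello, World, of, Java]
    }
}
```

Collecting into a List rather than printing directly makes the result easy to inspect or hand on to other code.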

The StringTokenizer normally breaks the String into tokens at what we would think of as “word boundaries” in European languages, that is, at whitespace (space, tab, newline, carriage return, and form feed). Sometimes you want to break at some other character. No problem. When you construct your StringTokenizer, in addition to passing in the string to be tokenized, pass in a second string that lists the “break characters.” For example:

// StrTokDemo2.java
StringTokenizer st = new StringTokenizer("Hello, World|of|Java", ", |");

while (st.hasMoreElements(  ))
    System.out.println("Token: " + st.nextElement(  ));
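The loop above gets away without a cast only because string concatenation accepts any Object. As soon as you want to call a String method on a token obtained through the Enumeration methods, the cast becomes necessary. Here is a small sketch under that assumption; EnumerationCastSketch and longestToken( ) are hypothetical names chosen for this illustration.

```java
import java.util.StringTokenizer;

/** Hypothetical illustration of casting nextElement() results to String. */
public class EnumerationCastSketch {

    /** Returns the first longest token, or "" if there are no tokens. */
    public static String longestToken(String input, String delims) {
        StringTokenizer st = new StringTokenizer(input, delims);
        String longest = "";
        while (st.hasMoreElements()) {
            // nextElement() is declared to return Object, so cast before using length().
            String token = (String) st.nextElement();
            if (token.length() > longest.length()) {
                longest = token;
            }
        }
        return longest;
    }

    public static void main(String[] args) {
        System.out.println(longestToken("Hello, World|of|Java", ", |")); // prints Hello
    }
}
```

Calling nextToken( ) instead avoids the cast entirely, which is usually the simpler choice unless some other API requires an Enumeration.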

But wait, there’s more! What if you are reading lines like:

FirstName|Lastname|Company|PhoneNumber

and your dear old Aunt Begonia hasn’t been employed for the last 38 years? Her “Company” field will in all probability be blank.[12] If you look very closely at the StrTokDemo2 example above, you’ll see that it has two delimiters together (the comma and the space), but if you run it there are no “extra” tokens. That is, the StringTokenizer normally discards consecutive delimiters. For cases like the phone list, where you need to preserve null fields, there is good news and bad news. The good news is you can do it; you simply add a third argument of true when constructing the StringTokenizer, meaning that you wish to see the delimiters as tokens. The bad news is that you now get to see the delimiters as tokens, so you have to do the arithmetic yourself. Want to see it? Run this program:

// StrTokDemo3.java
StringTokenizer st = 
    new StringTokenizer("Hello, World|of|Java", ", |", true);

while (st.hasMoreElements(  ))
    System.out.println("Token: " + st.nextElement(  ));

and you get this output:

C:\javasrc>java StrTokDemo3
Token: Hello
Token: ,
Token:
Token: World
Token: |
Token: of
Token: |
Token: Java
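To see what “doing the arithmetic yourself” involves, it helps to look at the raw token stream for a line that contains an empty field. The sketch below is illustrative only; ReturnDelimsSketch and rawTokens( ) are names invented here. It shows that an empty field appears as two delimiter tokens in a row, with nothing between them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

/** Hypothetical illustration: the raw token stream when delimiters are returned. */
public class ReturnDelimsSketch {

    /** Returns every token, delimiters included, in the order the tokenizer yields them. */
    public static List<String> rawTokens(String line, String delims) {
        List<String> out = new ArrayList<>();
        // Third argument true: delimiters are returned as one-character tokens.
        StringTokenizer st = new StringTokenizer(line, delims, true);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(rawTokens("A||C", "|")); // prints [A, |, |, C]
    }
}
```

Counting the delimiter tokens, rather than the data tokens, is what lets the caller work out which field a value belongs to.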

This isn’t how you’d like StringTokenizer to behave, ideally, but it is serviceable enough most of the time. Example 3-1 handles consecutive delimiters by leaving the corresponding fields null, and returns the results as an array of strings.

Example 3-1. StrTokDemo4.java (StringTokenizer)

import java.util.*;

/** Show using a StringTokenizer including getting the delimiters back */
public class StrTokDemo4 {
    public final static int MAXFIELDS = 5;
    public final static String DELIM = "|";

    /** Processes one String, returns it as an array of fields */
    public static String[] process(String line) {
        String[] results = new String[MAXFIELDS];

        // Unless you ask StringTokenizer to give you the delimiters,
        // it silently discards consecutive delimiters (i.e., null fields).
        StringTokenizer st = new StringTokenizer(line, DELIM, true);

        int i = 0;
        // stuff each token into the current field; a delimiter advances the field index
        while (st.hasMoreTokens(  )) {
            String s = st.nextToken(  );
            if (s.equals(DELIM)) {
                if (++i >= MAXFIELDS)
                    // This is messy: See StrTokDemo4b which uses 
                    // a Vector to allow any number of fields.
                    throw new IllegalArgumentException("Input line " +
                        line + " has too many fields");
                continue;
            }
            results[i] = s;
        }
        return results;
    }

    public static void printResults(String input, String[] outputs) {
        System.out.println("Input: " + input);
        for (int i=0; i<outputs.length; i++)
            System.out.println("Output " + i + " was: " + outputs[i]);
    }

    public static void main(String[] a) {
        printResults("A|B|C|D", process("A|B|C|D"));
        printResults("A||C|D", process("A||C|D"));
        printResults("A|||D|E", process("A|||D|E"));
    }
}

When you run this, you will see that A is always in the first field (Output 0), B (if present) in the second (Output 1), and so on. In other words, the null fields are being handled properly:

Input: A|B|C|D
Output 0 was: A
Output 1 was: B
Output 2 was: C
Output 3 was: D
Output 4 was: null
Input: A||C|D
Output 0 was: A
Output 1 was: null
Output 2 was: C
Output 3 was: D
Output 4 was: null
Input: A|||D|E
Output 0 was: A
Output 1 was: null
Output 2 was: null
Output 3 was: D
Output 4 was: E
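The fixed MAXFIELDS limit can be avoided by collecting fields in a growable list. What follows is a hedged sketch of that idea using ArrayList; it is not the StrTokDemo4b program mentioned in the code comment (which uses a Vector), and the names FieldListSketch and process( ) are chosen here for illustration. Each delimiter closes the current field, and a field with no data between delimiters is stored as null, matching the output above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

/** Hypothetical variant of Example 3-1 that allows any number of fields. */
public class FieldListSketch {
    public static final String DELIM = "|";

    /** Splits one line into fields; empty fields are represented as null. */
    public static List<String> process(String line) {
        List<String> fields = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, DELIM, true);
        String current = null; // data seen since the last delimiter, if any
        while (st.hasMoreTokens()) {
            String s = st.nextToken();
            if (s.equals(DELIM)) {
                fields.add(current); // a delimiter closes the current field
                current = null;
            } else {
                current = s;
            }
        }
        fields.add(current); // the last field has no trailing delimiter
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(process("A|B|C|D"));  // prints [A, B, C, D]
        System.out.println(process("A||C|D"));   // prints [A, null, C, D]
        System.out.println(process("A|||D|E"));  // prints [A, null, null, D, E]
    }
}
```

Unlike the array version, the list holds only as many entries as the line has fields, so there is no padding with trailing nulls and no exception for long lines.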


[12] Unless, perhaps, you’re as slow at updating personal records as I am.
