Reading “Continued” Lines

Problem

You need to read lines that are continued with backslashes (\) or that are continued with leading spaces (such as email or news headers).

Solution

Use my IndentContLineReader or EscContLineReader classes.

Discussion

This functionality is likely to be reused, so it should be encapsulated in general-purpose classes. I offer the IndentContLineReader and EscContLineReader classes. EscContLineReader reads lines normally, but if a line ends with the escape character (by default, the backslash), then the escape character is deleted and the following line is joined to the preceding line. So if you have lines like this in the input:

Here is something I wanted to say:\
Try and Buy in every way.
Go Team!

and you read them using an EscContLineReader’s readLine( ) method, then you will get the following lines:

Here is something I wanted to say: Try and Buy in every way.
Go Team!

Note in particular that my reader does provide a space character between the abutted parts of the continued line. An IOException will be thrown if a file ends with the escape character.
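The joining rule just described can be sketched without the reader classes themselves. The following is a minimal illustration, not the EscContLineReader source: the class name EscJoinSketch and the method readLogicalLines are hypothetical names chosen for this example, and the sketch collects all logical lines into a list rather than extending a Reader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of backslash continuation; not the author's class. */
class EscJoinSketch {
    /** The escape character that marks a continued line. */
    static final char ESCAPE = '\\';

    /** Read every logical line, joining physical lines that end with ESCAPE. */
    static List<String> readLogicalLines(Reader in) throws IOException {
        BufferedReader br = new BufferedReader(in);
        List<String> result = new ArrayList<>();
        String line;
        while ((line = br.readLine()) != null) {
            StringBuilder sb = new StringBuilder();
            // Keep joining while the current physical line ends with the escape.
            while (line.length() > 0
                    && line.charAt(line.length() - 1) == ESCAPE) {
                sb.append(line, 0, line.length() - 1); // drop the escape
                sb.append(' ');                        // space between the parts
                line = br.readLine();
                if (line == null) {
                    // The input ended right after an escape character.
                    throw new IOException("EOF after escape character");
                }
            }
            sb.append(line);
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String input = "Here is something I wanted to say:\\\n"
            + "Try and Buy in every way.\n"
            + "Go Team!\n";
        for (String s : readLogicalLines(new StringReader(input))) {
            System.out.println(s);
        }
    }
}
```

Run on the three-line sample, this sketch prints the two joined lines shown above. Throwing an IOException at end of file mirrors the behavior described for the real reader; returning the partial line instead would be an equally defensible choice.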

IndentContLineReader reads lines, but if a line begins with a space or tab, that line is joined to the preceding line. This is designed for reading email or Usenet news (“message”) header lines. Here is an example input file:

From: ian Tuesday, January 1, 2000 8:45 AM EST
To: Book-reviewers List
Received: by darwinsys.com (OpenBSD 2.6)
    from localhost
    at Tuesday, January 1, 2000 8:45 AM EST
Subject: Hey, it's 2000 and MY computer is still up

When read using an IndentContLineReader, this text will come out with the continued lines joined together into longer single lines:

From: ian Tuesday, January 1, 2000 8:45 AM EST
To: Book-reviewers List
Received: by darwinsys.com (OpenBSD 2.6) from localhost at Tuesday, January 1, 2000 8:45 AM EST
Subject: Hey, it's 2000 and MY computer is still up

IndentContLineReader has a setContinuationMode(boolean) method, which lets you turn continuation mode off. This would normally be used to process the body of a message. Since the header and the body are separated by a blank (null) line in the text representation of messages, we can process the entire message correctly as follows:

IndentContLineReader is = new IndentContLineReader(
    new StringReader(sampleTxt));
String aLine;
// Print Mail/News Header
System.out.println("----- Message Header -----");
while ((aLine = is.readLine(  )) != null && aLine.length(  ) > 0) {
    System.out.println(is.getLineNumber(  ) + ": " + aLine);
}
// Make "is" behave like normal BufferedReader
is.setContinuationMode(false);
System.out.println(  );
// Print Message Body
System.out.println("----- Message Body -----");
while ((aLine = is.readLine(  )) != null) {
    System.out.println(is.getLineNumber(  ) + ": " + aLine);
}

Each of the three Reader classes is subclassed from LineNumberReader so that you can use getLineNumber( ). This is a very useful feature when reporting errors back to the user who prepared an input file; it can save them considerable hunting around in the file if you tell them the line number on which the error occurred. The two concrete Reader classes are actually subclassed from an abstract class, ContLineReader, which I’ll present first (Example 9-2). This class encapsulates the basic functionality for keeping track of lines that need to be joined together, and for enabling/disabling the continuation processing.
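To make the line-number point concrete, here is a small sketch that uses the standard LineNumberReader directly to report where a problem was found. The class name LineNumberReportSketch, the method firstBadLine, and the "every line must contain a colon" rule are all assumptions invented for this illustration.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;

/** Hypothetical sketch: report the line number of the first bad line. */
class LineNumberReportSketch {
    /**
     * Return a message naming the first line that lacks a ':',
     * or null if every line contains one.
     */
    static String firstBadLine(String text) throws IOException {
        try (LineNumberReader r =
                new LineNumberReader(new StringReader(text))) {
            String line;
            while ((line = r.readLine()) != null) {
                if (line.indexOf(':') < 0) {
                    // getLineNumber() counts the lines read so far, so right
                    // after readLine() it is the number of the line in hand.
                    return "line " + r.getLineNumber()
                        + ": missing ':' in \"" + line + "\"";
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        String headers = "From: ian\nTo: Book-reviewers List\nbogus header\n";
        System.out.println(firstBadLine(headers));
    }
}
```

The same call works on any subclass of LineNumberReader, which is why the continued-line readers inherit from it rather than from BufferedReader.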

Example 9-2. ContLineReader.java

import java.io.*;

/** Subclass of LineNumberReader to allow reading of continued lines
 * using the readLine() method. The other Reader methods (read(  ), etc.)
 * must not be used.  Must subclass to provide the actual implementation
 * of readLine(  ).
 */
public abstract class ContLineReader extends LineNumberReader {
    /** Line number of first line in current (possibly continued) line */
    protected int firstLineNumber = 0;
    /** True if handling continuations, false if not; false == "PRE" mode */
    protected boolean doContinue = true;

    /** Set the continuation mode */
    public void setContinuationMode(boolean b) {
        doContinue = b;
    }

    /** Get the continuation mode */
    public boolean isContinuation(  ) {
        return doContinue;
    }

    /** Read one (possibly continued) line, stripping out the \ that
     * marks the end of each line but the last in a sequence.
     */
    public abstract String readLine(  ) throws IOException;

    /** Read one real line. Provided as a convenience for the
     * subclasses, so they don't embarrass themselves trying to
     * call "super.readLine(  )" which isn't very practical...
     */
    public String readPhysicalLine(  ) throws IOException {
        return super.readLine(  );
    }

    // Can NOT override getLineNumber in this class to return the # 
    // of the beginning of the continued line, since the subclasses
    // all call super.getLineNumber...
    
    /** Construct a ContLineReader with the default input-buffer size. */
    public ContLineReader(Reader in)  {
        super(in);
    }

    /** Construct a ContLineReader using the given input-buffer size. */
    public ContLineReader(Reader in, int sz)  {
        super(in, sz);
    }

    // Methods that do NOT work - redirect straight to parent

    /** Read a single character, returned as an int. */
    public int read(  ) throws IOException {
        return super.read(  );
    }

    /** Read characters into a portion of an array. */
    public int read(char[] cbuf, int off, int len) throws IOException {
        return super.read(cbuf, off, len);
    }

    public boolean markSupported(  ) {
        return false;
    }
}

The ContLineReader class ends with code for handling the read( ) calls so that the class will work correctly. The IndentContLineReader class extends this to allow merging of lines based on indentation. Example 9-3 shows the code for the IndentContLineReader class.

Example 9-3. IndentContLineReader.java

import java.io.*;

/** Subclass of ContLineReader for lines continued by indentation of
 * following line (like RFC822 mail, Usenet News, etc.).
 */
public class IndentContLineReader extends ContLineReader {
    /** Line number of first line in current (possibly continued) line */
    public int getLineNumber(  ) {
        return firstLineNumber;
    }

    protected String prevLine;

    /** Read one (possibly continued) line, joining to it any following
     * lines that begin with whitespace.
     */
    public String readLine(  ) throws IOException {
        String s;

        // If we saved a previous line, start with it. Else,
        // read the first line of possible continuation. 
        // If non-null, put it into the StringBuffer and its line 
        // number in firstLineNumber.
        if (prevLine != null) {
            s = prevLine;
            prevLine = null;
        }
        else  {
            s = readPhysicalLine(  );
        }

        // save the line number of the first line.
        firstLineNumber = super.getLineNumber(  );

        // Now we have one line. If we are not in continuation
        // mode, or if a previous readPhysicalLine(  ) returned null,
        // we are finished, so return it.
        if (!doContinue || s == null)
            return s;

        // Otherwise, start building a stringbuffer
        StringBuffer sb = new StringBuffer(s);

        // Read as many continued lines as there are, if any.
        while (true) {
            String nextPart = readPhysicalLine(  );
            if (nextPart == null) {
                // Egad! EOF within continued line.
                // Return what we have so far.
                return sb.toString(  );
            }
            // If the next line begins with space, it's continuation
            if (nextPart.length(  ) > 0 &&
                Character.isWhitespace(nextPart.charAt(0))) {
                sb.append(nextPart);    // and add line.
            } else {
                // else we just read too far, so put in "pushback" holder
                prevLine = nextPart;
                break;
            }
        }

        return sb.toString(  );        // return what's left
    }

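    /** Construct an IndentContLineReader with the default buffer size. */
    public IndentContLineReader(Reader in) {
        super(in);
    }

    /** Construct an IndentContLineReader using the given buffer size. */
    public IndentContLineReader(Reader in, int sz) {
        super(in, sz);
    }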

    // Built-in test case
    protected static String sampleTxt = 
        "From: ian today now\n" +
        "Received: by foo.bar.com\n" +
        "    at 12:34:56 January 1, 2000\n" +
        "X-Silly-Headers: Too Many\n" +
        "This line should be line 5.\n" +
        "Test more indented line continues from line 6:\n" +
        "    space indented.\n" +
        "\ttab indented;\n" +
        "\n" +
        "This is line 10\n" + 
        "the start of a hypothetical mail/news message, \n" +
        "that is, it follows a null line.\n" +
        "    Let us see how it fares if indented.\n" +
        " also space-indented.\n" +
        "\n" +
        "How about text ending without a newline?";

    // A simple main program for testing the class.
    public static void main(String argv[]) throws IOException {
        IndentContLineReader is = new IndentContLineReader(
            new StringReader(sampleTxt));
        String aLine;
        // Print Mail/News Header
        System.out.println("----- Message Header -----");
        while ((aLine = is.readLine()) != null && aLine.length(  ) > 0) {
            System.out.println(is.getLineNumber(  ) + ": " + aLine);
        }
        // Make "is" behave like normal BufferedReader
        is.setContinuationMode(false);
        System.out.println(  );
        // Print Message Body
        System.out.println("----- Message Body -----");
        while ((aLine = is.readLine(  )) != null) {
            System.out.println(is.getLineNumber(  ) + ": " + aLine);
        }
        is.close(  );
    }
}
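The sample output earlier in this recipe shows the continued parts separated by single spaces, whereas readLine( ) in the listing above appends each continuation line exactly as read, indentation included. If single spaces are wanted, one possible approach is to trim each continuation line before appending it. The sketch below assumes that behavior is desired; the class name UnfoldSketch and the method unfold are hypothetical, and it works on a plain BufferedReader rather than on the ContLineReader hierarchy.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: unfold indented continuations with single spaces. */
class UnfoldSketch {
    /** Join each whitespace-indented line to the line before it. */
    static List<String> unfold(Reader in) throws IOException {
        BufferedReader br = new BufferedReader(in);
        List<String> result = new ArrayList<>();
        StringBuilder current = null; // logical line being assembled
        String line;
        while ((line = br.readLine()) != null) {
            boolean continuation = current != null
                && !line.isEmpty()
                && Character.isWhitespace(line.charAt(0));
            if (continuation) {
                // Replace the indentation with exactly one space.
                current.append(' ').append(line.trim());
            } else {
                // A new logical line starts; emit the finished one first.
                if (current != null) {
                    result.add(current.toString());
                }
                current = new StringBuilder(line);
            }
        }
        if (current != null) {
            result.add(current.toString()); // last line, if any
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String headers = "Received: by darwinsys.com (OpenBSD 2.6)\n"
            + "    from localhost\n"
            + "Subject: Hey\n";
        for (String s : unfold(new StringReader(headers))) {
            System.out.println(s);
        }
    }
}
```

Because this sketch assembles the whole list up front, it needs no "pushback" holder like prevLine; a streaming reader that returns one logical line per call still needs one, since it only discovers the end of a continued line by reading one line too far.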