P. Späth, J. Friesen, Learn Java for Android Development, https://doi.org/10.1007/978-1-4842-5943-6_14

14. Migrating to New I/O

Peter Späth¹ and Jeff Friesen²

(1) Leipzig, Sachsen, Germany

(2) Winnipeg, MB, Canada

Chapters 12 and 13 introduced you to Java’s classic I/O APIs. Chapter 12 presented classic I/O in terms of java.io’s File, RandomAccessFile, stream, and writer/reader types. Chapter 13 presented classic I/O in terms of java.net’s socket and URL types.

Modern operating systems offer powerful I/O features that Java’s classic I/O APIs don’t support. These features include the following:

Memory-mapped file I/O: The ability to map part of a process’s virtual memory (see http://en.wikipedia.org/wiki/Virtual_memory) to some portion of a file so that writes to or reads from that portion of the process’s memory space actually write or read the associated portion of the file. (A process is an executing application.)
Readiness selection: A step above nonblocking I/O that offloads to the operating system the work involved in checking for I/O stream readiness to perform write and read operations.
File locking: The ability for one process to prevent other processes from accessing a file or to limit the access in some way.

Java 1.4 introduced a more powerful I/O architecture that supports memory-mapped file I/O, readiness selection, file locking, and more. This architecture largely consists of buffers, channels, selectors, regular expressions, and charsets, and it is commonly known as new I/O (NIO).

Note

Regular expressions are included as part of NIO (see JSR 51 at http://jcp.org/en/jsr/detail?id=51) because NIO is all about performance, and regular expressions are useful for scanning text (read from an I/O source) in a highly performant manner.

In this chapter, we introduce you to NIO in terms of buffers, channels, selectors, regular expressions, and charsets. We also discuss the simple printf-style formatting facility proposed in JSR 51 but not implemented until Java 5.

Working with Buffers

NIO is based on buffers. A buffer is an object that stores a fixed amount of data to be sent to or received from an I/O service (a means for performing input/output). It sits between an application and a channel that writes the buffered data to the service or reads the data from the service and deposits it into the buffer.

Buffers possess four properties:

Capacity: The total number of data items that can be stored in the buffer. The capacity is specified when the buffer is created and cannot be changed later.
Limit: The zero-based index of the first data item that must not be read or written. Equivalently, the limit is the number of “live” data items in the buffer.
Position: The zero-based index of the next data item to be read or written.
Mark: A zero-based position that can be recalled. The mark is initially undefined.

These four properties are related as follows:

0 <= mark <= position <= limit <= capacity

Figure 14-1 reveals a newly created and byte-oriented buffer with a capacity of 7.

Figure 14-1. The logical layout of a byte-oriented buffer includes an undefined mark, a current position, a limit, and a capacity

Figure 14-1’s buffer can store a maximum of seven elements. The mark is initially undefined, the position is initially set to 0, and the limit is initially set to the capacity (7), which specifies the maximum number of bytes that can be stored in the buffer. You can only access positions 0 through 6. Position 7 lies beyond the buffer.
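The relationship between mark, position, limit, and capacity is enforced by the buffer itself. The following minimal sketch probes three cases: moving the position past the limit, calling reset() while the mark is undefined, and lowering the limit below the current position. The BufferInvariants class and its method names are hypothetical and exist only for illustration; the java.nio calls are standard.

```java
import java.nio.ByteBuffer;
import java.nio.InvalidMarkException;

// Hypothetical demo class; only the java.nio calls come from the standard API.
class BufferInvariants {
    // Returns true when moving the position past the limit is rejected.
    static boolean positionBeyondLimitRejected() {
        ByteBuffer buffer = ByteBuffer.allocate(7);
        buffer.limit(5);
        try {
            buffer.position(6); // 6 > limit (5) would violate position <= limit
            return false;
        } catch (IllegalArgumentException iae) {
            return true;
        }
    }

    // Returns true when reset() fails because no mark has been set yet.
    static boolean resetWithoutMarkRejected() {
        ByteBuffer buffer = ByteBuffer.allocate(7);
        try {
            buffer.reset();
            return false;
        } catch (InvalidMarkException ime) {
            return true;
        }
    }

    // Lowering the limit below the position moves the position to the new limit.
    static int positionAfterShrinkingLimit() {
        ByteBuffer buffer = ByteBuffer.allocate(7);
        buffer.position(6);
        buffer.limit(4);
        return buffer.position();
    }

    public static void main(String[] args) {
        System.out.println("Position beyond limit rejected: " + positionBeyondLimitRejected());
        System.out.println("reset() without mark rejected: " + resetWithoutMarkRejected());
        System.out.println("Position after shrinking limit: " + positionAfterShrinkingLimit());
    }
}
```

In each case the buffer either throws an exception or adjusts the affected property, so the invariant always holds.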

Buffer and Its Children

Buffers are implemented by classes that are derived from the abstract java.nio.Buffer class. The API documentation describes Buffer’s methods.

Many of Buffer’s methods return Buffer references so that you can chain instance method calls together. (See Chapter 3 for a discussion on instance method call chaining.) For example, instead of specifying the following three lines,

buf.mark();

buf.position(2);

buf.reset();

you can more conveniently specify the following line:

buf.mark().position(2).reset();

The documentation also shows that all buffers can be read but not all buffers can be written—for example, a buffer backed by a memory-mapped file that’s read-only. You must not write to a read-only buffer; otherwise, ReadOnlyBufferException is thrown. Call isReadOnly() when you’re unsure that a buffer is writable before attempting to write to that buffer.
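As a minimal sketch of this check, the following code guards a write with isReadOnly() and then shows what happens without the guard. The ReadOnlyCheck class and its putIfWritable() helper are hypothetical. The read-only buffer is obtained via ByteBuffer's asReadOnlyBuffer() method, which is used here only as a convenient way to get a read-only buffer without a memory-mapped file.

```java
import java.nio.ByteBuffer;
import java.nio.ReadOnlyBufferException;

// Hypothetical demo class for guarding writes to possibly read-only buffers.
class ReadOnlyCheck {
    // Writes value only when the buffer is writable; reports whether it did.
    static boolean putIfWritable(ByteBuffer buffer, byte value) {
        if (buffer.isReadOnly())
            return false;
        buffer.put(value);
        return true;
    }

    public static void main(String[] args) {
        ByteBuffer writable = ByteBuffer.allocate(4);
        ByteBuffer readOnly = writable.asReadOnlyBuffer();
        System.out.println("Wrote to writable buffer: " + putIfWritable(writable, (byte) 1));
        System.out.println("Wrote to read-only buffer: " + putIfWritable(readOnly, (byte) 2));
        try {
            readOnly.put((byte) 3); // unguarded write
        } catch (ReadOnlyBufferException robe) {
            System.out.println("Unguarded write was rejected");
        }
    }
}
```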

Caution

Buffers are not thread-safe. You must employ synchronization when you want to access a buffer from multiple threads.

The java.nio package includes several abstract classes that extend Buffer, one for each primitive type except for boolean: ByteBuffer, CharBuffer, DoubleBuffer, FloatBuffer, IntBuffer, LongBuffer, and ShortBuffer. Furthermore, this package includes MappedByteBuffer as an abstract ByteBuffer subclass.

Note

Operating systems perform byte-oriented I/O, and you use ByteBuffer to create byte-oriented buffers that store the bytes to write to a destination or that are read from a source. The other primitive-type buffer classes let you create multibyte view buffers (discussed later) so that you can conceptually perform I/O in terms of characters, double-precision floating-point values, 32-bit integers, and so on. However, the I/O operation is really being carried out as a flow of bytes.
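To illustrate the idea that a multibyte view sits on top of plain bytes, the following sketch stores one 32-bit integer through an IntBuffer view and then reads the four underlying bytes from the ByteBuffer. The ViewBufferSketch class and its bytesOf() method are hypothetical; the sketch assumes the buffer's default byte order (big-endian).

```java
import java.nio.ByteBuffer;
import java.nio.IntBuffer;
import java.util.Arrays;

// Hypothetical demo class showing that a view buffer shares the byte buffer's storage.
class ViewBufferSketch {
    // Stores one int through an IntBuffer view and returns the four backing bytes.
    static byte[] bytesOf(int value) {
        ByteBuffer bytes = ByteBuffer.allocate(4);
        IntBuffer ints = bytes.asIntBuffer(); // the view shares the same storage
        ints.put(value);                      // advances only the view's position
        byte[] result = new byte[4];
        bytes.get(result);                    // the byte buffer's position is still 0
        return result;
    }

    public static void main(String[] args) {
        // Big-endian order places the most significant byte first.
        System.out.println(Arrays.toString(bytesOf(0x01020304))); // prints [1, 2, 3, 4]
    }
}
```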

Listing 14-1 demonstrates the Buffer class in terms of ByteBuffer, capacity, limit, position, and remaining elements.

import java.nio.Buffer;
import java.nio.ByteBuffer;

public class BufferDemo {
    public static void main(String[] args) {
        // Allocate the 7-byte buffer shown in Figure 14-1.
        Buffer buffer = ByteBuffer.allocate(7);
        System.out.println("Capacity: " + buffer.capacity());
        System.out.println("Limit: " + buffer.limit());
        System.out.println("Position: " + buffer.position());
        System.out.println("Remaining: " + buffer.remaining());
        System.out.println("Changing buffer limit to 5");
        buffer.limit(5);
        System.out.println("Limit: " + buffer.limit());
        System.out.println("Position: " + buffer.position());
        System.out.println("Remaining: " + buffer.remaining());
        System.out.println("Changing buffer position to 3");
        buffer.position(3);
        System.out.println("Position: " + buffer.position());
        System.out.println("Remaining: " + buffer.remaining());
        System.out.println(buffer);
    }
}

Listing 14-1

Demonstrating a Byte-Oriented Buffer

Listing 14-1’s main() method first needs to obtain a buffer. It cannot instantiate the Buffer class because that class is abstract. Instead, it uses the ByteBuffer class and its allocate() class method to allocate the 7-byte buffer shown in Figure 14-1. main() then calls assorted Buffer methods to demonstrate capacity, limit, position, and remaining elements.

Compile Listing 14-1 as follows:

javac BufferDemo.java

Run this application as follows:

java BufferDemo

You should observe the following output:

Capacity: 7

Limit: 7

Position: 0

Remaining: 7

Changing buffer limit to 5

Limit: 5

Position: 0

Remaining: 5

Changing buffer position to 3

Position: 3

Remaining: 2

java.nio.HeapByteBuffer[pos=3 lim=5 cap=7]

The final output line reveals that the ByteBuffer instance assigned to buffer is actually an instance of the package-private java.nio.HeapByteBuffer class.

The documentation tells you more about the buffers’ capabilities.

Working with Channels

Channels partner with buffers to achieve high-performance I/O. A channel is an object that represents an open connection to a hardware device, a file, a network socket, an application component, or another entity that’s capable of performing write, read, and other I/O operations. Channels efficiently transfer data between byte buffers and I/O service sources or destinations.

Note

Channels are the gateways through which native I/O services are accessed. Channels use byte buffers as the endpoints for sending and receiving data.

There often exists a one-to-one correspondence between an operating system file handle or file descriptor and a channel. When you work with channels in a file context, the channel will often be connected to an open file descriptor. Despite channels being more abstract than file descriptors, they are still capable of modeling an operating system’s native I/O facilities.

Channel and Its Children

Java supports channels via its java.nio.channels and java.nio.channels.spi packages. Applications interact with the types located in the former package; developers who are defining new selector providers work with the latter package. (We will discuss selectors later in this chapter.)

All channels are instances of classes that ultimately implement the java.nio.channels.Channel interface. Channel declares the following methods:

void close(): Closes this channel. When this channel is already closed, invoking close() has no effect. When another thread has already invoked close(), a new close() invocation blocks until the first invocation finishes, after which close() returns without effect. This method throws IOException when an I/O error occurs. After the channel is closed, any further attempts to invoke I/O operations upon it result in java.nio.channels.ClosedChannelException being thrown.
boolean isOpen(): Returns this channel’s open status. This method returns true when the channel is open; otherwise, it returns false.

These methods indicate that only two operations are common to all channels: close the channel and determine whether the channel is open or closed. To support I/O, Channel is extended by the java.nio.channels.WritableByteChannel and java.nio.channels.ReadableByteChannel interfaces:

WritableByteChannel declares an abstract int write(ByteBuffer buffer) method that writes a sequence of bytes from buffer to the current channel. This method returns the number of bytes actually written. It throws java.nio.channels.NonWritableChannelException when the channel was not opened for writing, java.nio.channels.ClosedChannelException when the channel is closed, java.nio.channels.AsynchronousCloseException when another thread closes the channel during the write, java.nio.channels.ClosedByInterruptException when another thread interrupts the current thread while the write operation is in progress (thereby closing the channel and setting the current thread’s interrupt status), and java.io.IOException when some other I/O error occurs.
ReadableByteChannel declares an abstract int read(ByteBuffer buffer) method that reads bytes from the current channel into buffer. This method returns the number of bytes actually read (or -1 when there are no more bytes to read). It throws java.nio.channels.NonReadableChannelException when the channel was not opened for reading; ClosedChannelException when the channel is closed; AsynchronousCloseException when another thread closes the channel during the read; ClosedByInterruptException when another thread interrupts the current thread while the read operation is in progress, thereby closing the channel and setting the current thread’s interrupt status; and IOException when some other I/O error occurs.

Note

A channel whose class implements only WritableByteChannel or ReadableByteChannel is unidirectional. Attempting to read from a writable byte channel or write to a readable byte channel results in a thrown exception.

You can use the instanceof operator to determine if a channel instance implements either interface. Because it’s somewhat awkward to test for both interfaces, Java supplies the java.nio.channels.ByteChannel interface, which is an empty marker interface that subtypes WritableByteChannel and ReadableByteChannel. When you need to learn whether or not a channel is bidirectional, it’s more convenient to specify an expression such as channel instanceof ByteChannel.

Channel is also extended by the java.nio.channels.InterruptibleChannel interface. InterruptibleChannel describes a channel that can be asynchronously closed and interrupted. This interface overrides its Channel superinterface’s close() method header, presenting the following additional stipulation to Channel’s contract for this method: any thread currently blocked in an I/O operation upon this channel will receive AsynchronousCloseException (an IOException descendant).

A channel that implements this interface is asynchronously closeable: when a thread is blocked in an I/O operation on an interruptible channel, another thread may invoke the channel’s close() method. This causes the blocked thread to receive a thrown AsynchronousCloseException instance.

A channel that implements this interface is also interruptible: when a thread is blocked in an I/O operation on an interruptible channel, another thread may invoke the blocked thread’s interrupt() method. Doing this causes the channel to be closed, the blocked thread to receive a thrown ClosedByInterruptException instance, and the blocked thread to have its interrupt status set. (When a thread’s interrupt status is already set and it invokes a blocking I/O operation on a channel, the channel is closed and the thread will immediately receive a thrown ClosedByInterruptException instance; its interrupt status will remain set.)

NIO’s designers chose to shut down a channel when a blocked thread is interrupted because they couldn’t find a way to handle interrupted I/O operations reliably in the same manner across platforms. The only way to guarantee deterministic behavior was to shut down the channel.
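The parenthetical case described earlier (a thread whose interrupt status is already set invokes a blocking I/O operation) is easy to reproduce deterministically. The following sketch is an illustration under stated assumptions: it uses a java.nio.channels.Pipe as a stand-in for any interruptible channel, and the InterruptSketch class and its method are hypothetical. One byte is written to the pipe first so that the read could not hang even if no exception were thrown.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.Pipe;

// Hypothetical demo class; a Pipe stands in for any interruptible channel.
class InterruptSketch {
    // Invokes a blocking read while the current thread's interrupt status is set.
    static String readWithInterruptSet() {
        try {
            Pipe pipe = Pipe.open();
            Pipe.SinkChannel sink = pipe.sink();
            Pipe.SourceChannel source = pipe.source(); // blocking by default
            try {
                sink.write(ByteBuffer.wrap(new byte[] { 42 })); // data is available
                Thread.currentThread().interrupt();             // set the status first
                String outcome;
                try {
                    source.read(ByteBuffer.allocate(16));
                    outcome = "read completed";
                } catch (ClosedByInterruptException cbie) {
                    outcome = "ClosedByInterruptException";
                }
                boolean interrupted = Thread.interrupted(); // reads and clears the status
                return outcome + ", open=" + source.isOpen() + ", interrupted=" + interrupted;
            } finally {
                Thread.interrupted(); // never leave the caller's thread interrupted
                sink.close();
                source.close();       // closing an already closed channel has no effect
            }
        } catch (IOException ioe) {
            throw new UncheckedIOException(ioe);
        }
    }

    public static void main(String[] args) {
        System.out.println(readWithInterruptSet());
    }
}
```

Note that the channel is closed even though data was available to read, which shows that interruption, not blocking, is what triggers the shutdown.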

Tip

You can determine whether or not a channel supports asynchronous closing and interruption by using the instanceof operator in an expression such as channel instanceof InterruptibleChannel.
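The instanceof checks for direction and interruptibility can be gathered into one small helper. The following sketch is one way to do so; the ChannelTraits class and its describe() method are hypothetical, and the channels in main() are created from in-memory streams purely for demonstration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.channels.ByteChannel;
import java.nio.channels.Channel;
import java.nio.channels.Channels;
import java.nio.channels.InterruptibleChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;

// Hypothetical helper that classifies a channel by the interfaces it implements.
class ChannelTraits {
    static String describe(Channel channel) {
        StringBuilder sb = new StringBuilder();
        if (channel instanceof ByteChannel)
            sb.append("bidirectional");       // implements both read and write
        else if (channel instanceof ReadableByteChannel)
            sb.append("read-only");
        else if (channel instanceof WritableByteChannel)
            sb.append("write-only");
        else
            sb.append("no byte I/O");
        if (channel instanceof InterruptibleChannel)
            sb.append(", interruptible");
        return sb.toString();
    }

    public static void main(String[] args) {
        Channel in = Channels.newChannel(new ByteArrayInputStream(new byte[0]));
        Channel out = Channels.newChannel(new ByteArrayOutputStream());
        System.out.println("Input channel: " + describe(in));
        System.out.println("Output channel: " + describe(out));
    }
}
```

The ByteChannel test comes first because a bidirectional channel also passes the two unidirectional tests.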

You previously learned that you must call a class method on a Buffer subclass to obtain a buffer. Regarding channels, there are two ways to obtain a channel:

The java.nio.channels package provides a Channels utility class that offers two methods for obtaining channels from streams—for each of the following methods, the underlying stream is closed when the channel is closed, and the channel isn’t buffered:
- WritableByteChannel newChannel(OutputStream outputStream) returns a writable byte channel for the given outputStream.
- ReadableByteChannel newChannel(InputStream inputStream) returns a readable byte channel for the given inputStream.
Various classic I/O classes have been retrofitted to support channel creation. For example, RandomAccessFile declares a FileChannel getChannel() method for returning a file channel instance, and java.net.Socket declares a SocketChannel getChannel() method for returning a socket channel.
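As a sketch of the second approach, the following code obtains a FileChannel from a RandomAccessFile, writes some text through the channel, and reads it back. The FileChannelRoundTrip class, the roundTrip() method, and the temporary file name prefix are hypothetical choices made for this illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: obtains a channel from a retrofitted classic I/O class.
class FileChannelRoundTrip {
    // Writes text to a temporary file through a file channel and reads it back.
    static String roundTrip(String text) {
        try {
            File file = File.createTempFile("channel", ".tmp");
            try (RandomAccessFile raf = new RandomAccessFile(file, "rw");
                 FileChannel channel = raf.getChannel()) {
                ByteBuffer out = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
                while (out.hasRemaining())
                    channel.write(out);
                channel.position(0); // move the file position back to the start
                ByteBuffer in = ByteBuffer.allocate(out.capacity());
                while (in.hasRemaining() && channel.read(in) != -1)
                    ; // keep reading until the buffer is full or end of file
                return new String(in.array(), 0, in.position(), StandardCharsets.UTF_8);
            } finally {
                file.delete();
            }
        } catch (IOException ioe) {
            throw new UncheckedIOException(ioe);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("Hello, NIO"));
    }
}
```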

Listing 14-2 uses the Channels class to obtain channels for the standard input and output streams and then uses these channels to copy bytes from the input channel to the output channel.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;

public class ChannelDemo {
    public static void main(String[] args) {
        ReadableByteChannel src = Channels.newChannel(System.in);
        WritableByteChannel dest = Channels.newChannel(System.out);
        try {
            copy(src, dest); // or copyAlt(src, dest);
        } catch (IOException ioe) {
            System.err.println("I/O error: " + ioe.getMessage());
        } finally {
            try {
                src.close();
                dest.close();
            } catch (IOException ioe) {
                // Ignore failures while closing.
            }
        }
    }

    // Minimizes native I/O calls at the cost of extra data copying.
    private static void copy(ReadableByteChannel src, WritableByteChannel dest)
        throws IOException
    {
        ByteBuffer buffer = ByteBuffer.allocateDirect(2048);
        while (src.read(buffer) != -1) {
            buffer.flip();
            dest.write(buffer);
            buffer.compact();
        }
        // Drain whatever is left after end of input.
        buffer.flip();
        while (buffer.hasRemaining())
            dest.write(buffer);
    }

    // Eliminates data copying at the cost of more native I/O calls.
    private static void copyAlt(ReadableByteChannel src, WritableByteChannel dest)
        throws IOException
    {
        ByteBuffer buffer = ByteBuffer.allocateDirect(2048);
        while (src.read(buffer) != -1) {
            buffer.flip();
            while (buffer.hasRemaining())
                dest.write(buffer);
            buffer.clear();
        }
    }
}

Listing 14-2

Copying Bytes from an Input Channel to an Output Channel

Listing 14-2 presents two approaches to copying bytes from the standard input stream to the standard output stream. In the first approach, which is exemplified by the copy() method, the goal is to minimize native I/O calls (via the write() method calls), although more data may end up being copied as a result of the compact() method calls. In the second approach, as demonstrated by copyAlt(), the goal is to eliminate data copying, although more native I/O calls might occur.

Each of copy() and copyAlt() first allocates a direct byte buffer (recall that a direct byte buffer is the most efficient means for performing I/O on the virtual machine) and enters a while loop that continually reads bytes from the source channel until end of input (read() returns -1). Following the read, the buffer is flipped so that it can be drained. Here is where the methods diverge:

The copy() method while loop makes a single call to write(). Because write() might not completely drain the buffer, compact() is called to compact the buffer before the next read. Compaction ensures that unwritten buffer content isn’t overwritten during the next read operation. Following the while loop, copy() flips the buffer in preparation for draining any remaining content and then works with hasRemaining() and write() to drain the buffer completely.
The copyAlt() method while loop contains a nested while loop that works with hasRemaining() and write() to continue draining the buffer until the buffer is empty. This is followed by a clear() method call that empties the buffer so that it can be filled on the next read() call.

Note

It’s important to realize that a single write() method call may not output the entire content of a buffer. Similarly, a single read() call may not completely fill a buffer.
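To see why this matters, the following sketch repeats the copy() strategy against a destination that deliberately accepts only a few bytes per write() call. Everything here is an illustration: the PartialWriteSketch class, the limited() wrapper, and the 8-byte buffer size are hypothetical, and in-memory streams replace the standard input and output streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;

// Hypothetical demo class that simulates a destination performing partial writes.
class PartialWriteSketch {
    // Wraps target so that each write() call accepts at most max bytes.
    static WritableByteChannel limited(final WritableByteChannel target, final int max) {
        return new WritableByteChannel() {
            @Override
            public int write(ByteBuffer src) throws IOException {
                int n = Math.min(max, src.remaining());
                ByteBuffer slice = src.duplicate();   // shares content, own position/limit
                slice.limit(slice.position() + n);
                int written = target.write(slice);
                src.position(src.position() + written);
                return written;
            }
            @Override
            public boolean isOpen() { return target.isOpen(); }
            @Override
            public void close() throws IOException { target.close(); }
        };
    }

    // Same strategy as copy() in Listing 14-2, with a deliberately small buffer.
    static byte[] copyThroughLimitedChannel(byte[] input, int maxPerWrite) {
        if (maxPerWrite < 1)
            throw new IllegalArgumentException("maxPerWrite must be at least 1");
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ReadableByteChannel src = Channels.newChannel(new ByteArrayInputStream(input));
             WritableByteChannel dest = limited(Channels.newChannel(sink), maxPerWrite)) {
            ByteBuffer buffer = ByteBuffer.allocate(8);
            while (src.read(buffer) != -1) {
                buffer.flip();
                dest.write(buffer); // may leave bytes behind
                buffer.compact();   // keep the leftovers for the next pass
            }
            buffer.flip();
            while (buffer.hasRemaining())
                dest.write(buffer);
        } catch (IOException ioe) {
            throw new UncheckedIOException(ioe);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = "0123456789abcdefghijklmnopqrstuvwxyz".getBytes();
        System.out.println(new String(copyThroughLimitedChannel(data, 3)));
    }
}
```

If the compact() call were replaced by clear() without the inner draining loop, the bytes left behind by each partial write would be lost.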

Compile Listing 14-2 via the following command line:

javac ChannelDemo.java

Run ChannelDemo via the following command lines:

java ChannelDemo

java ChannelDemo <ChannelDemo.java >ChannelDemo.bak

The first command line copies standard input to standard output. The second command line copies the contents of ChannelDemo.java to ChannelDemo.bak. After testing the copy() method, replace copy(src, dest); with copyAlt(src, dest); and repeat.

Working with Selectors

I/O is either block-oriented (such as file I/O) or stream-oriented (such as network I/O). Streams are often slower than block devices (such as fixed disks) and read/write operations often cause the calling thread to block until input is available or output has been fully written. To compensate, modern operating systems let streams operate in nonblocking mode, which makes it possible for a thread to read or write data without blocking. The operation fully succeeds, or it indicates that the read/write isn’t possible at that time. Either way, the thread is able to perform other useful work instead of waiting.

Nonblocking mode doesn’t let an application determine if it can perform an operation without actually performing the operation. For example, when a nonblocking read operation succeeds, the application learns that the read operation is possible but also has read some data that must be managed. This duality prevents you from separating code that checks for stream readiness from the data-processing code without making your code significantly complicated.
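The following sketch illustrates this duality with a pipe whose source channel is placed in nonblocking mode. A read on the empty pipe returns 0 immediately, and the only way to find out that data has arrived is to read it. The NonblockingReadSketch class and its method are hypothetical, and a Pipe is used here only because it needs no network setup.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;

// Hypothetical demo class showing that a nonblocking read both tests and consumes.
class NonblockingReadSketch {
    // Returns { bytes read while the pipe was empty, bytes read after a write }.
    static int[] readBeforeAndAfterWrite() {
        try {
            Pipe pipe = Pipe.open();
            try (Pipe.SinkChannel sink = pipe.sink();
                 Pipe.SourceChannel source = pipe.source()) {
                source.configureBlocking(false);
                ByteBuffer buffer = ByteBuffer.allocate(16);
                int before = source.read(buffer); // nothing to read: returns 0 at once
                sink.write(ByteBuffer.wrap(new byte[] { 1, 2, 3, 4 }));
                int after = 0;
                while (after < 4)                 // poll until all four bytes arrive
                    after += source.read(buffer);
                return new int[] { before, after };
            }
        } catch (IOException ioe) {
            throw new UncheckedIOException(ioe);
        }
    }

    public static void main(String[] args) {
        int[] counts = readBeforeAndAfterWrite();
        System.out.println("Read before write: " + counts[0]);
        System.out.println("Read after write: " + counts[1]);
    }
}
```

The polling loop is exactly the kind of busy checking that readiness selection is designed to replace.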

Nonblocking mode serves as a foundation for performing readiness selection, which offloads to the operating system the work involved in checking for I/O stream readiness to perform write, read, and other operations. The operating system is instructed to observe a group of streams and return some indication of which streams are ready to perform a specific operation (such as read) or operations (such as accept and read). This capability lets a thread multiplex a potentially huge number of active streams by using the readiness information provided by the operating system. In this way, network servers can handle large numbers of network connections; they are vastly scalable.

Note

Modern operating systems make readiness selection available to applications by providing system calls such as the POSIX select() call.

Selectors let you achieve readiness selection in a Java context. In this section, we first introduce you to selector fundamentals and then provide a demonstration.

Selector Fundamentals

A selector is an object created from a subclass of the abstract java.nio.channels.Selector class. It maintains a set of channels, which it examines to determine which of them are ready for reading, writing, completing a connection sequence, accepting another connection, or some combination of these tasks. The actual work is delegated to the operating system via a POSIX select() or similar system call.

Note

The ability to check a channel without having to wait when something isn’t ready (such as bytes are not available for reading) and without also having to perform the operation while checking is the key to scalability. A single thread can manage a huge number of channels, which reduces code complexity and potential threading issues.

Selectors are used with selectable channels, which are objects whose classes ultimately inherit from the abstract SelectableChannel class, which describes a channel that can be multiplexed by a selector. Socket channels, server socket channels, datagram channels, and pipe source/sink channels are selectable channels because SocketChannel, ServerSocketChannel, DatagramChannel, Pipe.SinkChannel, and Pipe.SourceChannel are derived from SelectableChannel. In contrast, file channels are not selectable channels because FileChannel doesn’t include SelectableChannel in its ancestry.

One or more previously created selectable channels are registered with a selector. Each registration returns a key (described by a concrete instance of the abstract java.nio.channels.SelectionKey class) that’s a token signifying the relationship between one channel and the selector. This key keeps track of two sets of operations: interest set and ready set. The interest set identifies the operation categories that will be tested for readiness the next time one of the selector’s selection methods is invoked. The ready set identifies the operation categories for which the key’s channel has been found to be ready by the key’s selector. When a selection method is invoked, the selector’s associated keys are updated by checking all channels registered with that selector. The application can then obtain a set of keys whose channels were found ready, and iterate over these keys to service each channel that has become ready since a previous select method call.

Note

A selectable channel can be registered with more than one selector. It has no knowledge of the selectors to which it’s currently registered.

To work with selectors, you first need to create one. You can accomplish this task by invoking Selector’s Selector open() class method. This method returns a Selector instance on success or throws IOException on failure. The following code fragment demonstrates this task:

Selector selector = Selector.open();

You can create your selectable channels before or after creating the selector. However, you must ensure that each channel is in nonblocking mode before registering the channel with the selector. You register a selectable channel with a selector by invoking either of the following SelectableChannel registration methods:

SelectionKey register(Selector sel, int ops)
SelectionKey register(Selector sel, int ops, Object att)

Each method requires that you pass a previously created selector to sel and a bitwise ORed combination of the following SelectionKey int-based constants to ops, which signifies the interest set:

OP_ACCEPT: Operation-set bit for socket-accept operations
OP_CONNECT: Operation-set bit for socket-connect operations
OP_READ: Operation-set bit for read operations
OP_WRITE: Operation-set bit for write operations

The second method also lets you pass an arbitrary java.lang.Object/subclass instance (or null) to att. The nonnull object is known as an attachment, and it is a convenient way of recognizing a given channel or attaching additional information to the channel. It’s stored in the SelectionKey instance returned from this method.

Upon success, each method returns a SelectionKey instance that relates the selectable channel with the selector. Upon failure, an exception is thrown. For example, ClosedChannelException is thrown when the channel is closed and IllegalBlockingModeException is thrown when the channel hasn’t been set to nonblocking mode.
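The following sketch exercises both outcomes. It first attempts to register a channel that is still in blocking mode, and then registers the same channel in nonblocking mode along with an attachment. The RegistrationSketch class, its method, and the "pipe-source" attachment string are hypothetical; a pipe's source channel is used as the selectable channel because it requires no network setup.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.IllegalBlockingModeException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical demo class for channel registration and attachments.
class RegistrationSketch {
    static String register() {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try (Pipe.SinkChannel sink = pipe.sink();
                 Pipe.SourceChannel source = pipe.source()) {
                boolean rejected = false;
                try {
                    source.register(selector, SelectionKey.OP_READ); // still blocking
                } catch (IllegalBlockingModeException ibme) {
                    rejected = true;
                }
                source.configureBlocking(false);
                SelectionKey key =
                    source.register(selector, SelectionKey.OP_READ, "pipe-source");
                boolean readInterest = (key.interestOps() & SelectionKey.OP_READ) != 0;
                boolean writeValid = (source.validOps() & SelectionKey.OP_WRITE) != 0;
                return "rejectedWhileBlocking=" + rejected
                    + ", attachment=" + key.attachment()
                    + ", readInterest=" + readInterest
                    + ", writeValid=" + writeValid;
            }
        } catch (IOException ioe) {
            throw new UncheckedIOException(ioe);
        }
    }

    public static void main(String[] args) {
        System.out.println(register());
    }
}
```

The final flag is false because a pipe's source channel supports only read operations, which is why OP_WRITE must not appear in its interest set.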

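The following code fragment extends the previous code fragment by configuring a previously created socket channel to nonblocking mode and registering the channel with the selector, whose selection methods are to test the channel for read and write readiness. (The interest set may contain only operations that the channel supports, as reported by its validOps() method; otherwise, register() throws IllegalArgumentException. For example, OP_ACCEPT is valid only for a server socket channel.)

channel.configureBlocking(false);
SelectionKey key = channel.register(selector, SelectionKey.OP_READ |
                                              SelectionKey.OP_WRITE);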

At this point, the application typically enters an infinite loop where it accomplishes the following tasks:

1. Performs a selection operation
2. Obtains the selected keys followed by an iterator over the selected keys
3. Iterates over these keys and performs channel operations

A selection operation is performed by invoking one of Selector’s selection methods. For example, int select() performs a blocking selection operation. It doesn’t return until at least one channel is selected, this selector’s wakeup() method is invoked, or the current thread is interrupted, whichever comes first.

Note

Selector also declares an int select(long timeout) method that doesn’t return until at least one channel is selected, this selector’s wakeup() method is invoked, the current thread is interrupted, or the timeout value expires, whichever comes first. Additionally, Selector declares int selectNow(), which is a nonblocking version of select().

The select() method returns the number of channels that have become ready since the last time it was called. For example, if you call select() and it returns 1 because one channel has become ready, and if you call select() again and a second channel has become ready, select() will once again return 1. If you’ve not yet serviced the first ready channel, you now have two ready channels to service. However, only one channel became ready between these select() calls.

A set of the selected keys (the ready set) is now obtained by invoking Selector’s Set<SelectionKey> selectedKeys() method. Invoke the set’s iterator() method to obtain an iterator over these keys.

Finally, the application iterates over the keys. For each of the iterations, a SelectionKey instance is returned. Some combination of SelectionKey’s boolean isAcceptable(), boolean isConnectable(), boolean isReadable(), and boolean isWritable() methods are called to determine if the key indicates that a channel is ready to accept a connection, finished connecting, readable, or writable.

Note

The aforementioned methods offer a convenient alternative to specifying expressions such as (key.readyOps() & SelectionKey.OP_READ) != 0. (The parentheses are required because != has higher precedence than &.) SelectionKey’s int readyOps() method returns the key’s ready set. The returned set will only contain operation bits that are valid for this key’s channel. For example, it never returns an operation bit that indicates that a read-only channel is ready for writing. Note that every selectable channel also declares an int validOps() method, which returns a bitwise ORed set of operations that are valid for the channel.

Once the application determines that a channel is ready to perform a specific operation, it can call SelectionKey’s SelectableChannel channel() method to obtain the channel and then perform work on that channel.

Note

SelectionKey also declares a Selector selector() method that returns the selector for which the key was created.

When you’re finished processing a channel, you must remove its key from the selected-key set; the selector doesn’t perform this task. The next time the channel becomes ready, the selector will add the key to the selected-key set again.

The following code fragment continues from the previous code fragment and demonstrates the aforementioned tasks:

while (true) {
    int numReadyChannels = selector.select();
    if (numReadyChannels == 0)
        continue; // There are no ready channels to process.
    Set<SelectionKey> selectedKeys = selector.selectedKeys();
    Iterator<SelectionKey> keyIterator = selectedKeys.iterator();
    while (keyIterator.hasNext()) {
        SelectionKey key = keyIterator.next();
        if (key.isAcceptable()) {
            // A server socket channel is ready to accept a connection.
            ServerSocketChannel server = (ServerSocketChannel) key.channel();
            SocketChannel client = server.accept();
            if (client != null) { // accept() returns null when nothing is pending
                client.configureBlocking(false); // must be nonblocking
                // Register socket channel with selector for read operations.
                client.register(selector, SelectionKey.OP_READ);
            }
        } else if (key.isReadable()) {
            // A socket channel is ready for reading.
            SocketChannel client = (SocketChannel) key.channel();
            // Perform work on the socket channel.
        } else if (key.isWritable()) {
            // A socket channel is ready for writing.
            SocketChannel client = (SocketChannel) key.channel();
            // Perform work on the socket channel.
        }
        // Always remove the key, even when accept() returned null.
        keyIterator.remove();
    }
}

In addition to the server socket channel being registered with the selector, each incoming client socket channel is also registered with the selector. When a client socket channel becomes ready for read or write operations, key.isReadable() or key.isWritable() for the associated socket channel returns true and the socket channel can be read or written.
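The same select, iterate, and remove pattern can be tried without any networking. The following sketch registers a pipe's source channel for read readiness, writes a message to the sink, and lets the selector report when the message can be read. The SelectLoopSketch class and its method are hypothetical, and the bounded loop with a 500-millisecond timeout is an assumption made so that the sketch terminates instead of looping forever.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

// Hypothetical demo class: a self-contained selection loop over a pipe.
class SelectLoopSketch {
    static String selectAndRead(String message) {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try (Pipe.SinkChannel sink = pipe.sink();
                 Pipe.SourceChannel source = pipe.source()) {
                source.configureBlocking(false); // required before registration
                source.register(selector, SelectionKey.OP_READ);
                byte[] bytes = message.getBytes(StandardCharsets.UTF_8);
                sink.write(ByteBuffer.wrap(bytes));
                ByteBuffer buffer = ByteBuffer.allocate(bytes.length);
                // Bounded loop with a timeout so that the sketch cannot hang.
                for (int attempt = 0; attempt < 10 && buffer.hasRemaining(); attempt++) {
                    if (selector.select(500) == 0)
                        continue; // timed out with no ready channel
                    Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                    while (it.hasNext()) {
                        SelectionKey key = it.next();
                        if (key.isReadable())
                            ((ReadableByteChannel) key.channel()).read(buffer);
                        it.remove(); // the selector never removes selected keys itself
                    }
                }
                return new String(buffer.array(), 0, buffer.position(),
                                  StandardCharsets.UTF_8);
            }
        } catch (IOException ioe) {
            throw new UncheckedIOException(ioe);
        }
    }

    public static void main(String[] args) {
        System.out.println(selectAndRead("hello, selector"));
    }
}
```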

A key represents a relationship between a selectable channel and a selector. This relationship can be terminated by invoking SelectionKey’s void cancel() method. Upon return, the key will be invalid and will have been added to its selector’s canceled-key set. The key will be removed from all of the selector’s key sets during the next selection operation.

When you’re finished with a selector, call Selector’s void close() method. If a thread is currently blocked in one of this selector’s selection methods, it’s interrupted as if by invoking the selector’s wakeup() method. Any uncanceled keys still associated with this selector are invalidated, their channels are deregistered, and any other resources associated with this selector are released. If this selector is already closed, invoking close() has no effect.
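The key and selector life cycle described above can be observed directly. The following sketch cancels a key, runs a selection operation so that the canceled key is dropped, and then closes the selector twice. The KeyLifecycleSketch class and its method are hypothetical, and a pipe's source channel again serves as the selectable channel.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical demo class tracing a key from registration to cancellation.
class KeyLifecycleSketch {
    static String lifecycle() {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try (Pipe.SinkChannel sink = pipe.sink();
                 Pipe.SourceChannel source = pipe.source()) {
                source.configureBlocking(false);
                SelectionKey key = source.register(selector, SelectionKey.OP_READ);
                int registered = selector.keys().size();
                key.cancel();                  // the key becomes invalid immediately
                boolean validAfterCancel = key.isValid();
                selector.selectNow();          // the next selection drops canceled keys
                int afterSelect = selector.keys().size();
                selector.close();              // try-with-resources closes it again,
                                               // which has no effect
                return "keys=" + registered
                    + ", validAfterCancel=" + validAfterCancel
                    + ", keysAfterSelect=" + afterSelect
                    + ", selectorOpen=" + selector.isOpen();
            }
        } catch (IOException ioe) {
            throw new UncheckedIOException(ioe);
        }
    }

    public static void main(String[] args) {
        System.out.println(lifecycle());
    }
}
```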

Selector Demonstration

Selectors are commonly used in server applications. Listing 14-3 presents the source code to a server application that sends its local time to clients.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class SelectorServer {
    private final static int DEFAULT_PORT = 9999;

    private static ByteBuffer bb = ByteBuffer.allocateDirect(8);

    public static void main(String[] args) throws IOException {
        int port = DEFAULT_PORT;
        if (args.length > 0)
            port = Integer.parseInt(args[0]);
        System.out.println("Server starting ... listening on port " + port);
        ServerSocketChannel ssc = ServerSocketChannel.open();
        ServerSocket ss = ssc.socket();
        ss.bind(new InetSocketAddress(port));
        ssc.configureBlocking(false);
        Selector s = Selector.open();
        ssc.register(s, SelectionKey.OP_ACCEPT);
        while (true) {
            int n = s.select();
            if (n == 0)
                continue;
            Iterator<SelectionKey> it = s.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                if (key.isAcceptable()) {
                    SocketChannel sc = ((ServerSocketChannel) key.channel()).accept();
                    if (sc != null) { // accept() returns null when nothing is pending
                        System.out.println("Receiving connection");
                        bb.clear();
                        bb.putLong(System.currentTimeMillis());
                        bb.flip();
                        System.out.println("Writing current time");
                        while (bb.hasRemaining())
                            sc.write(bb);
                        sc.close();
                    }
                }
                // Always remove the key, even when accept() returned null.
                it.remove();
            }
        }
    }
}

Listing 14-3

Serving Time to Clients

Listing 14-3’s server application consists of a SelectorServer class. This class allocates a direct byte buffer when the class is initialized.

When the main() method is executed, it first checks for a command-line argument, which is assumed to represent a port number. If no argument is specified, a default port number is used; otherwise, main() tries to convert it to an integer representing the port by passing the argument to Integer.parseInt(). (Remember that this method throws java.lang.NumberFormatException when a noninteger argument is passed.)

After outputting a startup message that identifies the listening port, main() obtains a server socket channel followed by the underlying socket, which is bound to the specified port. The server socket channel is then configured for nonblocking mode in preparation for registering this channel with a selector.

A selector is now obtained and the server socket channel registers itself with the selector so that it can learn when the channel is ready to perform an accept operation. The returned key isn’t saved because it’s never canceled (and the selector is never closed).

main() now enters an infinite loop, first invoking the selector’s select() method. If the server socket channel isn’t ready (select() returns 0), the rest of the loop is skipped.

The selected keys (just one key) along with an iterator for iterating over them are now obtained and main() enters an inner loop to loop over these keys. Each key’s isAcceptable() method is invoked to find out if the server socket channel is ready to perform an accept operation. If this is the case, the channel is obtained and cast to ServerSocketChannel, and ServerSocketChannel’s accept() method is called to accept the new connection.

To guard against the unlikely possibility of the returned SocketChannel instance being null (accept() returns null when the server socket channel is in nonblocking mode and no connection is available to be accepted), main() tests for this scenario and continues the loop when null is detected.

A message about receiving a connection is output, and the byte buffer is cleared in preparation for storing the local time. After this long integer has been stored in the buffer, the buffer is flipped in preparation for draining. A message about writing the current time is output and the buffer is drained. The socket channel is then closed and the key is removed from the set of keys.

Compile Listing 14-3 as follows:

javac SelectorServer.java

Run this application as follows:

java SelectorServer

You should observe the following output and the server should continue to run:

Server starting ... listening on port 9999

We need a client to exercise this server. Listing 14-4 presents the source code to a sample client application.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;
import java.util.Date;

public class SelectorClient {
   private final static int DEFAULT_PORT = 9999;
   private static ByteBuffer bb = ByteBuffer.allocateDirect(8);

   public static void main(String[] args) {
      int port = DEFAULT_PORT;
      if (args.length > 0)
         port = Integer.parseInt(args[0]);
      try {
         SocketChannel sc = SocketChannel.open();
         InetSocketAddress addr = new InetSocketAddress("localhost", port);
         sc.connect(addr);
         long time = 0;
         // Read until the server closes the connection.
         while (sc.read(bb) != -1) {
            bb.flip();
            // Rebuild the long value from its bytes, most significant first.
            while (bb.hasRemaining()) {
               time <<= 8;
               time |= bb.get() & 255;
            }
            bb.clear();
         }
         System.out.println(new Date(time));
         sc.close();
      } catch (IOException ioe) {
         System.err.println("I/O error: " + ioe.getMessage());
      }
   }
}

Listing 14-4

Receiving Time from the Server

Listing 14-4 is much simpler than Listing 14-3 because selectors aren’t used. There’s no need for a selector in this simple application. You would typically use selectors in a client context when the client interacts with several servers.

There are a couple of interesting items in the source code:

bb.get() returns an 8-bit byte, which Java promotes to a 32-bit integer before the bitwise operations are applied. Sign extension is used for byte values greater than 127, which are regarded as negative numbers. Because the leading one bits would corrupt the result when they are bitwise ORed with time, they are removed by bitwise ANDing the integer with 255.
This value in time is passed to the java.util.Date(long time) constructor when a new Date object is constructed. In turn, the Date object is passed to System.out.println(), which invokes Date’s toString() method to obtain a human-readable date/time string.
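
The effect of the mask is easy to check in isolation. The following sketch (the ByteMaskSketch class and assemble() helper are hypothetical names) prints a byte with and without the mask, rebuilds a long value byte by byte in the same way as the client, and compares the result with ByteBuffer’s own getLong() method, which performs the same big-endian assembly.

```java
import java.nio.ByteBuffer;

class ByteMaskSketch {
    // Rebuilds a big-endian long one byte at a time, as the client does.
    static long assemble(ByteBuffer bb) {
        long time = 0;
        while (bb.hasRemaining()) {
            time <<= 8;
            time |= bb.get() & 255; // the mask removes sign-extension bits
        }
        return time;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xF0;
        System.out.println(b);       // prints -16 (sign-extended)
        System.out.println(b & 255); // prints 240 (masked)

        ByteBuffer bb = ByteBuffer.allocate(8);
        bb.putLong(1389660490000L);
        bb.flip();
        System.out.println(assemble(bb) == 1389660490000L); // prints true
        bb.rewind();
        System.out.println(bb.getLong() == 1389660490000L); // prints true
    }
}
```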

Compile Listing 14-4 as follows:

javac SelectorClient.java

In a second command window, run this application as follows:

java SelectorClient

You should observe output similar to the following:

Mon Jan 13 18:48:10 CST 2014

In the server command window, you should observe the following messages:

Receiving connection

Writing current time

Working with Regular Expressions

Text-processing applications often need to match text against patterns (character strings that concisely describe sets of strings that are considered to be matches). For example, an application might need to locate all occurrences of a specific word pattern in a text file so that it can replace those occurrences with another word. NIO includes regular expressions to help text-processing applications perform pattern matching with high performance.

Note

Although regular expressions appear in this NIO chapter, they aren’t tied to I/O. They are extremely powerful, and you can use them anywhere in your application, including outside of any input or output processing context.

Pattern, PatternSyntaxException, and Matcher

A regular expression (also known as a regex or regexp) is a string-based pattern that represents the set of strings that match this pattern. The pattern consists of literal characters and metacharacters, which are characters with special meanings instead of literal meanings.

The Regular Expressions API provides the java.util.regex.Pattern class to represent patterns via compiled regexes. Regexes are compiled for performance reasons; pattern matching via compiled regexes is much faster than if the regexes were not compiled. The API documentation tells you all about Pattern’s methods.

A method of particular importance is the static compile(String) method. You can use it to prepare a pattern string for regular expression operations. In case you have several such operations, for example, inside a loop, preparing the pattern once and then using it often is much faster compared to building the same pattern over and over again.

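Finally, the Matcher matcher(CharSequence input) method reveals that the Regular Expressions API also provides the Matcher class, whose matchers attempt to match compiled regexes against input text. The Matcher methods you will probably use most often are matches(), which tests whether the entire input matches the pattern, and find(), which you call repeatedly to iterate over all matches. (Pattern’s static matches(String, CharSequence) method is a convenience for a one-time whole-input test; it compiles the regex on every call.)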
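
As a brief illustration, the following sketch compiles a pattern once, stores it in a constant, and then contrasts a whole-input test with repeated searching. The MatchVersusFindSketch class and its helper methods are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchVersusFindSketch {
    // Compiled once and reused; Pattern instances are immutable.
    private static final Pattern OX = Pattern.compile("ox");

    // True only when the entire input matches the pattern.
    static boolean wholeInputMatches(String input) {
        return OX.matcher(input).matches();
    }

    // Start indexes of every occurrence of the pattern in the input.
    static List<Integer> findStarts(String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = OX.matcher(input);
        while (m.find())
            starts.add(m.start());
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(wholeInputMatches("ox"));  // prints true
        System.out.println(wholeInputMatches("box")); // prints false
        System.out.println(findStarts("box fox ox")); // prints [1, 5, 8]
    }
}
```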

Note

A matcher finds matches in a subset of its input called the region. By default, the region contains all of the matcher’s input. The region can be modified by calling Matcher’s Matcher region(int start, int end) method (set the limits of this matcher’s region) and queried by calling Matcher’s int regionStart() and int regionEnd() methods.
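
The following sketch shows a region in action; RegionSketch and firstMatchInRegion() are hypothetical names. Restricting the region to the second word of the input causes find() to skip the occurrence in the first word.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegionSketch {
    // Returns the start index of the first match inside [start, end), or -1.
    static int firstMatchInRegion(String regex, String input, int start, int end) {
        Matcher m = Pattern.compile(regex).matcher(input);
        m.region(start, end);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("ox").matcher("box fox");
        // By default, the region covers all of the input.
        System.out.println(m.regionStart() + ".." + m.regionEnd()); // prints 0..7
        System.out.println(firstMatchInRegion("ox", "box fox", 4, 7)); // prints 5
        System.out.println(firstMatchInRegion("ox", "box fox", 0, 2)); // prints -1
    }
}
```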

We’ve created a simple application that demonstrates Pattern, PatternSyntaxException, and Matcher. Listing 14-5 presents this application’s source code.

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegExDemo {
   public static void main(String[] args) {
      if (args.length != 2) {
         System.err.println("usage: java RegExDemo regex input");
         return;
      }
      try {
         System.out.println("regex = " + args[0]);
         System.out.println("input = " + args[1]);
         Pattern p = Pattern.compile(args[0]);
         Matcher m = p.matcher(args[1]);
         // Report every match; end() - 1 is the index of the last matched character.
         while (m.find())
            System.out.println("Located [" + m.group() + "] starting at "
                               + m.start() + " and ending at " + (m.end() - 1));
      } catch (PatternSyntaxException pse) {
         System.err.println("Bad regex: " + pse.getMessage());
         System.err.println("Description: " + pse.getDescription());
         System.err.println("Index: " + pse.getIndex());
         System.err.println("Incorrect pattern: " + pse.getPattern());
      }
   }
}

Listing 14-5

Playing with Regular Expressions

Compile Listing 14-5 as follows:

javac RegExDemo.java

Run this application as follows:

java RegExDemo ox ox

You’ll discover the following output:

regex = ox

input = ox

Located [ox] starting at 0 and ending at 1

find() searches for a match by comparing regex characters with the input characters in left-to-right order and returns true because o equals o and x equals x.

Continue by executing the following command:

java RegExDemo box ox

This time, you’ll discover the following output:

regex = box

input = ox

find() first compares regex character b with input character o. Because these characters are not equal and because there are not enough characters in the input to continue the search, find() returns false and the application doesn’t output a “Located” message.

However, if you execute java RegExDemo ox box, you’ll discover a match:

regex = ox

input = box

Located [ox] starting at 1 and ending at 2

The ox regex consists of literal characters. More sophisticated regexes combine literal characters with metacharacters (such as the period [.]) and other regex constructs.

Tip

To specify a metacharacter as a literal character, precede the metacharacter with a backslash character (as in \.) or place the metacharacter between \Q and \E (as in \Q.\E). In either case, make sure to double the backslash character when the escaped metacharacter appears in a string literal, such as "\\." or "\\Q.\\E".
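
The following sketch shows both escaping styles inside Java string literals, where each backslash must be doubled. It also uses Pattern’s static quote() method, which isn’t mentioned in this chapter; it wraps its argument between \Q and \E for you. EscapeSketch and its matches() helper are hypothetical names.

```java
import java.util.regex.Pattern;

class EscapeSketch {
    // True when the entire input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // An unescaped period matches any character.
        System.out.println(matches(".ox", "box"));          // prints true
        // An escaped period matches only a literal period.
        System.out.println(matches("\\.ox", "box"));        // prints false
        System.out.println(matches("\\.ox", ".ox"));        // prints true
        System.out.println(matches("\\Q.\\Eox", ".ox"));    // prints true
        // quote() builds the \Q...\E form from arbitrary text.
        System.out.println(matches(Pattern.quote(".") + "ox", ".ox")); // prints true
    }
}
```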

The period metacharacter matches all characters except for the line terminator. For example, each of java RegExDemo .ox box and java RegExDemo .ox fox reports a match because the period matches the b in box and the f in fox.

Note

Pattern recognizes the following line terminators: carriage return (\r), newline (line feed) (\n), carriage return immediately followed by newline (\r\n), next line (\u0085), line separator (\u2028), and paragraph separator (\u2029). The period metacharacter can also be made to match these line terminators by specifying the Pattern.DOTALL flag when calling Pattern.compile(String, int).
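
A minimal sketch of the DOTALL flag follows; DotAllSketch and dotMatchesNewline() are hypothetical names. The same regex is compiled twice, once with no flags and once with Pattern.DOTALL, and is matched against input that contains a newline.

```java
import java.util.regex.Pattern;

class DotAllSketch {
    // Compiles "a.b" with the given flags and matches it against "a\nb".
    static boolean dotMatchesNewline(int flags) {
        return Pattern.compile("a.b", flags).matcher("a\nb").matches();
    }

    public static void main(String[] args) {
        System.out.println(dotMatchesNewline(0));              // prints false
        System.out.println(dotMatchesNewline(Pattern.DOTALL)); // prints true
    }
}
```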

Character Classes

A character class is a set of characters appearing between [ and ]. There are six kinds of character classes:

A simple character class consists of literal characters placed side by side and matches only these characters. For example, [abc] consists of characters a, b, and c. Also, java RegExDemo t[aiou]ck tack reports a match because a is a member of [aiou]. It also reports a match when the input is tick, tock, or tuck because i, o, and u are members.
A negation character class consists of a circumflex metacharacter (^), followed by literal characters placed side by side, and it matches all characters except for those in the class. For example, [^abc] consists of all characters except for a, b, and c. Also, java RegExDemo "[^b]ox" box doesn’t report a match because b isn’t a member of [^b], whereas java RegExDemo "[^b]ox" fox reports a match because f is a member. (The double quotes surrounding [^b]ox are necessary on our Windows 7 platform because ^ is treated specially at the command line.)
A range character class consists of successive literal characters expressed as a starting literal character, followed by the hyphen metacharacter (-), followed by an ending literal character, and matches all characters in this range. For example, [a-z] consists of all characters from a through z. Also, java RegExDemo [h-l]ouse house reports a match because h is a member of the class, whereas java RegExDemo [h-l]ouse mouse doesn’t report a match because m lies outside of the range and is therefore not part of the class. You can combine multiple ranges within the same range character class by placing them side by side; for example, [A-Za-z] consists of all uppercase and lowercase Latin letters.
A union character class consists of multiple nested character classes and matches all characters that belong to the resulting union. For example, [abc[u-z]] consists of characters a, b, c, u, v, w, x, y, and z. Also, java RegExDemo [[0-9][A-F][a-f]] e reports a match because e is a hexadecimal character. (We could have alternatively expressed this character class as [0-9A-Fa-f] by combining multiple ranges.)
An intersection character class consists of multiple &&-separated nested character classes and matches all characters that are common to these nested character classes. For example, [a-c&&[c-f]] consists of character c, which is the only character common to [a-c] and [c-f]. Also, java RegExDemo "[aeiouy&&[y]]" y reports a match because y is common to classes [aeiouy] and [y].
A subtraction character class consists of multiple &&-separated nested character classes, where at least one nested character class is a negation character class, and it matches all characters except for those indicated by the negation character class/classes. For example, [a-z&&[^x-z]] consists of characters a through w. (The square brackets surrounding ^x-z are necessary; otherwise, ^ is ignored and the resulting class consists of only x, y, and z.) Also, java RegExDemo "[a-z&&[^aeiou]]" g reports a match because g is a consonant and only consonants belong to this class. (We’re ignoring y, which is sometimes regarded as a consonant and sometimes regarded as a vowel.)
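
The character classes above can be explored from code as well as from the command line. The following sketch (CharClassSketch and keep() are hypothetical names) filters an input string down to the characters that belong to a given class, which makes the membership of union, intersection, and subtraction classes easy to see.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharClassSketch {
    // Returns only those characters of input that belong to the character class.
    static String keep(String charClass, String input) {
        StringBuilder sb = new StringBuilder();
        Matcher m = Pattern.compile(charClass).matcher(input);
        while (m.find())
            sb.append(m.group());
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(keep("[^abc]", "abcd"));            // negation: d
        System.out.println(keep("[abc[u-z]]", "abcdxyz"));     // union: abcxyz
        System.out.println(keep("[a-c&&[c-f]]", "abcdef"));    // intersection: c
        System.out.println(keep("[a-z&&[^aeiou]]", "android")); // subtraction: ndrd
    }
}
```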

A predefined character class is a regex construct for a commonly specified character class. Table 14-1 identifies Pattern’s predefined character classes.

Table 14-1

Predefined Character Classes

Predefined Character Class	Description
\d	Matches any digit character. \d is equivalent to [0-9].
\D	Matches any nondigit character. \D is equivalent to [^\d].
\s	Matches any whitespace character. \s is equivalent to [ \t\n\x0B\f\r].
\S	Matches any nonwhitespace character. \S is equivalent to [^\s].
\w	Matches any word character. \w is equivalent to [a-zA-Z_0-9].
\W	Matches any nonword character. \W is equivalent to [^\w].

For example, the following command reports a match because \w matches the word character a in abc:

java RegExDemo \wbc abc
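
When predefined character classes appear in Java source code rather than on the command line, each backslash must be doubled inside the string literal. The following sketch illustrates this; PredefinedClassSketch and its matches() helper are hypothetical names, and String’s replaceAll() and split() methods are used only because they accept regexes.

```java
import java.util.regex.Pattern;

class PredefinedClassSketch {
    // True when the entire input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("\\wbc", "abc"));            // prints true
        System.out.println(matches("\\d{3}", "123"));           // prints true
        System.out.println(matches("\\d{3}", "12a"));           // prints false
        System.out.println("a1 b22".replaceAll("\\d", "#"));    // prints a# b##
        System.out.println("one two\tthree".split("\\s").length); // prints 3
    }
}
```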

Capturing Groups

A capturing group saves a match’s characters for later recall during pattern matching and is expressed as a character sequence surrounded by parentheses metacharacters ( and ). All characters within a capturing group are treated as a unit. For example, the (Android) capturing group combines A, n, d, r, o, i, and d into a unit. It matches the Android pattern against all occurrences of Android in the input. Each match replaces the previous match’s saved Android characters with the next match’s Android characters.

Capturing groups can appear inside other capturing groups. For example, capturing groups (A) and (B(C)) appear inside capturing group ((A)(B(C))), and capturing group (C) appears inside capturing group (B(C)). Each nested or nonnested capturing group receives its own number, numbering starts at 1, and capturing groups are numbered from left to right. For example, ((A)(B(C))) is assigned 1, (A) is assigned 2, (B(C)) is assigned 3, and (C) is assigned 4.

A capturing group saves its match for later recall via a back reference, which is a backslash character followed by a digit character denoting a capturing group number. The back reference causes the matcher to use the back reference’s capturing group number to recall the capturing group’s saved match and then use that match’s characters to attempt a further match. The following example uses a back reference to determine if the input consists of two consecutive Android patterns:

java RegExDemo "(Android) \1" "Android Android"

RegExDemo reports a match because the matcher detects Android, followed by a space, followed by Android in the input.
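
Group numbering and back references can also be inspected through Matcher’s groupCount() and group(int) methods. The following sketch is illustrative only; GroupSketch and groups() are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupSketch {
    // Returns the text captured by groups 1..n when the whole input matches,
    // or an empty list when it doesn't.
    static List<String> groups(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.matches())
            for (int i = 1; i <= m.groupCount(); i++)
                result.add(m.group(i));
        return result;
    }

    public static void main(String[] args) {
        // Groups are numbered by their opening parentheses, left to right.
        System.out.println(groups("((A)(B(C)))", "ABC")); // prints [ABC, A, BC, C]
        // \1 recalls whatever group 1 matched.
        System.out.println(groups("(Android) \\1", "Android Android")); // prints [Android]
        System.out.println(groups("(Android) \\1", "Android iOS"));     // prints []
    }
}
```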

Boundary Matchers and Zero-Length Matches

A boundary matcher is a regex construct for identifying the beginning of a line, a word boundary, the end of text, and other commonly occurring boundaries. See Table 14-2.

Table 14-2

Boundary Matchers

Boundary Matcher	Description
^	Matches the beginning of the line.
$	Matches the end of the line.
\b	Matches a word boundary.
\B	Matches a nonword boundary.
\A	Matches the beginning of text.
\G	Matches the end of the previous match.
\Z	Matches the end of text except for line terminator (when present).
\z	Matches the end of text.

Consider the following example:

java RegExDemo \b "I think"

This example reports several matches, as revealed in the following output:

regex = \b

input = I think

Located [] starting at 0 and ending at -1

Located [] starting at 1 and ending at 0

Located [] starting at 2 and ending at 1

Located [] starting at 7 and ending at 6

This output reveals several zero-length matches. When a zero-length match occurs, the starting and ending indexes are equal, although the output shows the ending index to be one less than the starting index because we specified end() - 1 in Listing 14-5 (so that, for a non-zero-length match, the reported end index identifies the match’s last character rather than the character following it).

Note

A zero-length match occurs in empty input text, at the beginning of input text, after the last character of input text, or between any two characters of that text. Zero-length matches are easy to identify because they always start and end at the same index position.
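
The following sketch collects the positions of zero-length matches directly, without the end() - 1 adjustment, so that the equal start and end indexes are visible. BoundarySketch and zeroLengthPositions() are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundarySketch {
    // Returns the index of every zero-length match (where start() == end()).
    static List<Integer> zeroLengthPositions(String regex, String input) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find())
            if (m.start() == m.end())
                positions.add(m.start());
        return positions;
    }

    public static void main(String[] args) {
        // Word boundaries surround "I" and "think".
        System.out.println(zeroLengthPositions("\\b", "I think")); // prints [0, 1, 2, 7]
        System.out.println(zeroLengthPositions("^", "I think"));   // prints [0]
        System.out.println(zeroLengthPositions("$", "I think"));   // prints [7]
    }
}
```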

Quantifiers

The final regex construct we present is the quantifier, a numeric value implicitly or explicitly bound to a pattern. Quantifiers are categorized as greedy, reluctant, or possessive:

A greedy quantifier (?, *, or +) attempts to find the longest match. Specify X? to find one or no occurrences of X; X* to find zero or more occurrences of X; X+ to find one or more occurrences of X; X{n} to find n occurrences of X; X{n,} to find at least n (and possibly more) occurrences of X; and X{n,m} to find at least n but no more than m occurrences of X.
A reluctant quantifier (??, *?, or +?) attempts to find the shortest match. Specify X?? to find one or no occurrences of X; X*? to find zero or more occurrences of X; X+? to find one or more occurrences of X; X{n}? to find n occurrences of X; X{n,}? to find at least n (and possibly more) occurrences of X; and X{n,m}? to find at least n but no more than m occurrences of X.
A possessive quantifier (?+, *+, or ++) is similar to a greedy quantifier except that a possessive quantifier only makes one attempt to find the longest match, whereas a greedy quantifier can make multiple attempts. Specify X?+ to find one or no occurrences of X; X*+ to find zero or more occurrences of X; X++ to find one or more occurrences of X; X{n}+ to find n occurrences of X; X{n,}+ to find at least n (and possibly more) occurrences of X; and X{n,m}+ to find at least n but no more than m occurrences of X.

For an example of a greedy quantifier, execute the following command:

java RegExDemo .*end "wend rend end"

You’ll discover the following output:

regex = .*end

input = wend rend end

Located [wend rend end] starting at 0 and ending at 12

The greedy quantifier (.*) matches the longest sequence of characters that terminates in end. It starts by consuming all of the input text and then is forced to back off until it discovers that the input text terminates with these characters.

For an example of a reluctant quantifier, execute the following command:

java RegExDemo .*?end "wend rend end"

You’ll discover the following output:

regex = .*?end

input = wend rend end

Located [wend] starting at 0 and ending at 3

Located [ rend] starting at 4 and ending at 8

Located [ end] starting at 9 and ending at 12

The reluctant quantifier (.*?) matches the shortest sequence of characters that terminates in end. It begins by consuming nothing and then slowly consumes characters until it finds a match. It then continues until it exhausts the input text.

For an example of a possessive quantifier, execute the following command:

java RegExDemo .*+end "wend rend end"

You’ll discover the following output:

regex = .*+end

input = wend rend end

The possessive quantifier (.*+) doesn’t detect a match because it consumes the entire input text, leaving nothing left over to match end at the end of the regex. Unlike a greedy quantifier, a possessive quantifier doesn’t back off.
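
The three quantifier behaviors can be compared side by side in code. The following sketch (QuantifierSketch and allMatches() are hypothetical names) applies the greedy, reluctant, and possessive forms of the same regex to the same input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierSketch {
    // Returns the text of every match that find() locates in the input.
    static List<String> allMatches(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find())
            result.add(m.group());
        return result;
    }

    public static void main(String[] args) {
        String input = "wend rend end";
        System.out.println(allMatches(".*end", input));  // greedy: [wend rend end]
        System.out.println(allMatches(".*?end", input)); // reluctant: [wend,  rend,  end]
        System.out.println(allMatches(".*+end", input)); // possessive: []
    }
}
```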

While working with quantifiers, you’ll probably encounter zero-length matches. For example, execute the following command:

java RegExDemo 1? 101101

You should observe the following output:

regex = 1?

input = 101101

Located [1] starting at 0 and ending at 0

Located [] starting at 1 and ending at 0

Located [1] starting at 2 and ending at 2

Located [1] starting at 3 and ending at 3

Located [] starting at 4 and ending at 3

Located [1] starting at 5 and ending at 5

Located [] starting at 6 and ending at 5

The result of this greedy quantifier is that 1 is detected at locations 0, 2, 3, and 5 in the input text and that nothing is detected (a zero-length match) at locations 1, 4, and 6.

This time, execute the following command:

java RegExDemo 1?? 101101

You should observe the following output:

regex = 1??

input = 101101

Located [] starting at 0 and ending at -1

Located [] starting at 1 and ending at 0

Located [] starting at 2 and ending at 1

Located [] starting at 3 and ending at 2

Located [] starting at 4 and ending at 3

Located [] starting at 5 and ending at 4

Located [] starting at 6 and ending at 5

This output might look surprising, but remember that a reluctant quantifier looks for the shortest match, which (in this case) is an empty, zero-length match at every position.

Finally, execute the following command:

java RegExDemo 1+? 101101

You should observe the following output:

regex = 1+?

input = 101101

Located [1] starting at 0 and ending at 0

Located [1] starting at 2 and ending at 2

Located [1] starting at 3 and ending at 3

Located [1] starting at 5 and ending at 5

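This reluctant quantifier only matches the locations where 1 is detected in the input text. Because + requires at least one occurrence, it doesn’t perform zero-length matches, and because it is reluctant, it reports the two adjacent 1s at locations 2 and 3 as separate matches.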

Practical Regular Expressions

Most of the previous regex examples haven’t been practical, except to help you grasp how to use the various regex constructs. In contrast, the following examples reveal a regex that matches phone numbers of the form (ddd) ddd-dddd or ddd-dddd. A single space appears between (ddd) and ddd; there’s no space on either side of the hyphen.

java RegExDemo "(\(\d{3}\))?\s*\d{3}-\d{4}" "(800) 555-1212"

regex = (\(\d{3}\))?\s*\d{3}-\d{4}

input = (800) 555-1212

Located [(800) 555-1212] starting at 0 and ending at 13

java RegExDemo "(\(\d{3}\))?\s*\d{3}-\d{4}" 555-1212

regex = (\(\d{3}\))?\s*\d{3}-\d{4}

input = 555-1212

Located [555-1212] starting at 0 and ending at 7
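
In application code, you would typically store such a regex in a compiled Pattern constant and validate whole strings with matches() instead of searching with find(). The following sketch makes that assumption; PhoneSketch and isPhoneNumber() are hypothetical names. Note the doubled backslashes in the string literal.

```java
import java.util.regex.Pattern;

class PhoneSketch {
    // Optional "(ddd)" area code, optional whitespace, then "ddd-dddd".
    private static final Pattern PHONE =
        Pattern.compile("(\\(\\d{3}\\))?\\s*\\d{3}-\\d{4}");

    // True when the entire input is a phone number in one of the two forms.
    static boolean isPhoneNumber(String input) {
        return PHONE.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isPhoneNumber("(800) 555-1212")); // prints true
        System.out.println(isPhoneNumber("555-1212"));       // prints true
        System.out.println(isPhoneNumber("555-121"));        // prints false
        System.out.println(isPhoneNumber("800 555-1212"));   // prints false
    }
}
```

Because the regex uses \s* between the two parts, it also accepts input with no space or with several spaces after the area code; replace \s* with a single literal space if exactly one space is required.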

Note

To learn more about regular expressions, check out “Lesson: Regular Expressions” (http://download.oracle.com/javase/tutorial/essential/regex/index.html) in The Java Tutorials.

Working with Charsets

In Chapter 12, we briefly introduced the concepts of character set and character encoding. We also referred to some of the types located in the java.nio.charset package. In this section, we expand on these topics and explore this package in more detail. We also briefly revisit the String class, discussing that part of String that’s relevant to the discussion.

A Brief Review of the Fundamentals

Java uses Unicode to represent characters. (Unicode began as a 16-bit character set standard whose goal is to map all of the world’s significant character sets into an all-encompassing mapping. It has since outgrown 16 bits: each character is assigned a numeric value known as a code point, and code points above 0xFFFF are represented in Java by a pair of 16-bit char values.) Although Unicode makes it much easier to work with characters from different languages, it doesn’t automate everything and you often need to work with charsets. Before we dig into this topic, you should understand the following terms:

Character : A meaningful symbol. For example, “$” and “E” are characters. These symbols predate the computer era.
Character set : A set of characters. For example, uppercase letters A through Z could be considered to form a character set. No numeric values are assigned to the characters in the set. There is no relationship to Unicode, ASCII, EBCDIC, or any other kind of character set standard.
Coded character set : A character set where each character is assigned a unique numeric value. Standards such as US-ASCII and ISO-8859-1 define mappings from characters to numeric values.
Character-encoding scheme : An encoding of a coded character set’s numeric values to sequences of bytes that represent these values. Some encodings are one-to-one. For example, in ASCII, character A is mapped to integer 65 and encoded as integer 65. For some other mappings, encodings are one-to-one or one-to-many. For example, UTF-8 encodes Unicode characters. Each character whose numeric value is less than 128 is encoded as a single byte to be compatible with ASCII. Other Unicode characters are encoded as 2- to 6-byte sequences under the original definition in RFC 2279 (see www.ietf.org/rfc/rfc2279.txt); the current definition, RFC 3629, limits a sequence to 4 bytes.
Charset : A coded character set combined with a character-encoding scheme. Charsets are described by the abstract java.nio.charset.Charset class.
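
The difference between a one-to-one and a one-to-many encoding becomes concrete when you count bytes. The following sketch assumes the java.nio.charset.StandardCharsets constants (available since Java 7 and not otherwise used in this chapter); EncodingLengthSketch and encodedLength() are hypothetical names. Unicode escapes are used for the non-ASCII characters so that the result doesn’t depend on the source file’s encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingLengthSketch {
    // Returns the number of bytes needed to encode s in the given charset.
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String a = "A";         // numeric value 65
        String c = "\u00e7";    // c with cedilla
        String euro = "\u20ac"; // euro sign
        System.out.println(encodedLength(a, StandardCharsets.US_ASCII));   // prints 1
        System.out.println(encodedLength(a, StandardCharsets.UTF_8));      // prints 1
        System.out.println(encodedLength(c, StandardCharsets.ISO_8859_1)); // prints 1
        System.out.println(encodedLength(c, StandardCharsets.UTF_8));      // prints 2
        System.out.println(encodedLength(euro, StandardCharsets.UTF_8));   // prints 3
    }
}
```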

Although Unicode is widely used and increasing in popularity, other character set standards are also used. Because operating systems perform I/O at the byte level, and because files store data as byte sequences, it’s necessary to translate between byte sequences and the characters that are encoded into these sequences. Charset and the other classes located in the java.nio.charset package address this translation task.

Working with Charsets

Beginning with JDK 1.4, virtual machines were required to support a standard collection of charsets and could support additional charsets. They also support the default charset, which doesn’t have to be one of the standard charsets and is obtained when the virtual machine starts running. Table 14-3 identifies and describes the standard charsets.

Table 14-3

Standard Charsets

Charset Name	Description
US-ASCII	The 7-bit ASCII that forms the American English character set. Also known as the basic Latin block in Unicode.
ISO-8859-1	The 8-bit character set used by most European languages. It’s a superset of ASCII and includes most non-English European characters.
UTF-8	An 8-bit byte-oriented character encoding for Unicode. Characters are encoded in 1 to 4 bytes (the original RFC 2279 definition allowed up to 6).
UTF-16BE	A 16-bit encoding using big-endian order for Unicode. Characters are encoded in 2 bytes with the high-order 8 bits written first.
UTF-16LE	A 16-bit encoding using little-endian order for Unicode. Characters are encoded in 2 bytes with the low-order 8 bits written first.
UTF-16	A 16-bit encoding whose endian order is determined by an optional byte-order mark.

Charset names are case insensitive and are maintained by the Internet Assigned Numbers Authority (IANA). The names in Table 14-3 are included in IANA’s official registry.

Note

UTF-8 is probably the most commonly used character encoding.

UTF-16BE and UTF-16LE encode each character as a 2-byte sequence in big-endian or little-endian order, respectively. A decoder for a UTF-16BE- or UTF-16LE-encoded byte sequence needs to know how the byte sequence was encoded. In contrast, UTF-16 relies on a byte-order mark (BOM) that appears at the beginning of the sequence. If this mark is absent, decoding proceeds according to UTF-16BE (Java’s native byte order). If this mark equals \uFEFF, the sequence is decoded according to UTF-16BE. If this mark equals \uFFFE, the sequence is decoded according to UTF-16LE.
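
The following sketch makes the byte-order mark visible. It assumes the java.nio.charset.StandardCharsets constants (available since Java 7); BomSketch and its hex() and decodeUtf16() helpers are hypothetical names. Encoding with UTF-16 prepends a big-endian mark, and decoding accepts either mark or none.

```java
import java.nio.charset.StandardCharsets;

class BomSketch {
    // Formats bytes as space-separated two-digit hexadecimal values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes)
            sb.append(String.format("%02x ", b & 255));
        return sb.toString().trim();
    }

    // Decodes bytes as UTF-16, honoring an optional byte-order mark.
    static String decodeUtf16(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        System.out.println(hex("a".getBytes(StandardCharsets.UTF_16)));   // prints fe ff 00 61
        System.out.println(hex("a".getBytes(StandardCharsets.UTF_16BE))); // prints 00 61
        System.out.println(hex("a".getBytes(StandardCharsets.UTF_16LE))); // prints 61 00

        byte[] littleEndian = { (byte) 0xff, (byte) 0xfe, 0x61, 0x00 };
        byte[] noMark = { 0x00, 0x61 };
        System.out.println(decodeUtf16(littleEndian)); // prints a
        System.out.println(decodeUtf16(noMark));       // prints a
    }
}
```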

Each charset name is associated with a Charset object, which you obtain by invoking one of this class’s factory methods. Listing 14-6 presents an application that shows you how to use this class to obtain the default and standard charsets, which are then used to encode characters into byte sequences.

import java.nio.ByteBuffer;
import java.nio.charset.Charset;

public class CharsetDemo {
   public static void main(String[] args) {
      String msg = "façade touché";
      String[] csNames = {
         "US-ASCII",
         "ISO-8859-1",
         "UTF-8",
         "UTF-16BE",
         "UTF-16LE",
         "UTF-16"
      };
      encode(msg, Charset.defaultCharset());
      for (String csName: csNames)
         encode(msg, Charset.forName(csName));
   }

   private static void encode(String msg, Charset cs) {
      System.out.println("Charset: " + cs.toString());
      System.out.println("Message: " + msg);
      ByteBuffer buffer = cs.encode(msg);
      System.out.println("Encoded: ");
      for (int i = 0; buffer.hasRemaining(); i++) {
         int _byte = buffer.get() & 255;
         char ch = (char) _byte;
         // Replace whitespace and control characters, which would disturb the output.
         if (Character.isWhitespace(ch) || Character.isISOControl(ch))
            ch = '\u0000';
         System.out.printf("%2d: %02x (%c)%n", i, _byte, ch);
      }
      System.out.println();
   }
}

Listing 14-6

Using Charsets to Encode Characters into Byte Sequences

Listing 14-6’s main() method first creates a message consisting of two French words and an array of names for the standard collection of charsets. Next, it invokes the encode() method to encode the message according to the default charset, which it obtains by calling Charset’s Charset defaultCharset() factory method. Continuing, main() invokes encode() for each of the standard charsets. Charset’s Charset forName(String charsetName) factory method is used to obtain the Charset instance that corresponds to charsetName.

Caution

forName() throws java.nio.charset.IllegalCharsetNameException when the specified charset name is illegal and throws java.nio.charset.UnsupportedCharsetException when the desired charset isn’t supported by the virtual machine.

The encode() method first identifies the charset and the message. It then invokes Charset’s ByteBuffer encode(String s) method to return a new ByteBuffer object containing the bytes that encode the characters from s.

encode() next iterates over the bytes in the byte buffer, converting each byte to a character. It uses java.lang.Character’s isWhitespace() and isISOControl() methods to determine if the character is whitespace or a control character (neither is regarded as printable) and converts such a character to Unicode 0 (the null character). (A carriage return or newline would disrupt the output, for example.)

Finally, the index of the byte, its hexadecimal value, and the corresponding character are printed to the standard output stream. We chose to use System.out.printf() for this task. You’ll learn about this method in the next section.

Compile Listing 14-6 as follows:

javac CharsetDemo.java

Run the application as follows:

java CharsetDemo

You should observe the following output (abbreviated):

Charset: windows-1252

Message: façade touché

Encoded: 0: 66 (f) 1: 61 (a) 2: e7 (ç) ...

Charset: US-ASCII

Encoded: 0: 66 (f) 1: 61 (a) 2: 3f (?) ...

Charset: ISO-8859-1

Encoded: 0: 66 (f) 1: 61 (a) 2: e7 (ç) ...

Charset: UTF-8

Encoded: 0: 66 (f) 1: 61 (a) 2: c3 (Ã) ...

Charset: UTF-16BE

Encoded: 0: 00 ( ) 1: 66 (f) 2: 00 ( ) ...

Charset: UTF-16LE

Encoded: 0: 66 (f) 1: 00 ( ) 2: 61 (a) ...

Charset: UTF-16

Encoded: 0: fe (þ) 1: ff (ÿ) 2: 00 ( )

In addition to providing encoding methods such as the aforementioned ByteBuffer encode(String s) method, Charset provides a complementary CharBuffer decode(ByteBuffer buffer) decoding method. The return type is CharBuffer because byte sequences are decoded into characters.

Note

ByteBuffer encode(String s) is a convenience method for specifying CharBuffer.wrap(s) and passing the result to the ByteBuffer encode(CharBuffer buffer) method.
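
A short round trip shows encode() and decode() working together, and also what happens when bytes are decoded with a charset other than the one that encoded them. This is an illustrative sketch; RoundTripSketch and roundTrip() are hypothetical names, and Unicode escapes stand in for the non-ASCII characters.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

class RoundTripSketch {
    // Encodes s with one charset and decodes the bytes with another.
    static String roundTrip(String s, String encodeWith, String decodeWith) {
        ByteBuffer bytes = Charset.forName(encodeWith).encode(s);
        CharBuffer chars = Charset.forName(decodeWith).decode(bytes);
        return chars.toString();
    }

    public static void main(String[] args) {
        String msg = "fa\u00e7ade"; // the first word of Listing 14-6's message
        // Same charset in both directions: the text survives.
        System.out.println(roundTrip(msg, "UTF-8", "UTF-8").equals(msg)); // prints true
        // Mismatched charsets: each UTF-8 byte becomes its own character.
        System.out.println(roundTrip(msg, "UTF-8", "ISO-8859-1").length()); // prints 7
    }
}
```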

Charsets and the String Class

The String class describes a string as a sequence of characters. It declares constructors that can be passed byte arrays. Because a byte array contains an encoded character sequence, a charset is required to decode it. The following is a partial list of String constructors that work with charsets:

String(byte[] data): Constructs a new String instance by decoding the specified array of bytes using the platform’s default charset
String(byte[] data, int offset, int byteCount): Constructs a new String instance by decoding the specified subsequence of the byte array using the platform’s default charset
String(byte[] data, String charsetName): Constructs a new String instance by decoding the specified array of bytes using the named charset

Furthermore, String declares methods that encode its sequence of characters into a byte array with help from the default charset or a named charset. Two of these methods are the following:

byte[] getBytes(): Returns a new byte array containing the characters of this string encoded using the platform’s default charset
byte[] getBytes(String charsetName): Returns a new byte array containing the characters of this string encoded using the named charset

Note that String(byte[] data, String charsetName) and byte[] getBytes(String charsetName) throw java.io.UnsupportedEncodingException when the charset isn’t supported.
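The following minimal sketch exercises the named-charset variants and the exception. The class name StringCharsetDemo, its two helpers, and the charset name x-no-such-charset (assumed to be unavailable) are hypothetical.

```java
import java.io.UnsupportedEncodingException;

class StringCharsetDemo {
    // Hypothetical helper: the number of bytes that s occupies in the named
    // charset, or -1 when the charset isn't supported.
    static int encodedLength(String s, String charsetName) {
        try {
            return s.getBytes(charsetName).length;
        } catch (UnsupportedEncodingException e) {
            return -1;
        }
    }

    // Hypothetical helper: decodes data with the named charset, or returns
    // null when the charset isn't supported.
    static String decode(byte[] data, String charsetName) {
        try {
            return new String(data, charsetName);
        } catch (UnsupportedEncodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String s = "touch\u00e9"; // six characters; the last is e with acute
        System.out.println(encodedLength(s, "ISO-8859-1"));        // 6
        System.out.println(encodedLength(s, "UTF-8"));             // 7
        System.out.println(encodedLength(s, "UTF-16BE"));          // 12
        System.out.println(encodedLength(s, "x-no-such-charset")); // -1

        byte[] latin1 = { 0x74, 0x6f, 0x75, 0x63, 0x68, (byte) 0xe9 };
        System.out.println(s.equals(decode(latin1, "ISO-8859-1"))); // true
    }
}
```

The same six characters occupy 6, 7, or 12 bytes depending on the charset, which is why a byte array is meaningless without knowing the charset that produced it.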

We’ve created a small application that demonstrates String and charsets. Listing 14-7 presents the source code.

import java.io.UnsupportedEncodingException;

public class CharsetDemo {

public static void main(String[] args) throws UnsupportedEncodingException {

byte[] encodedMsg = {

0x66, 0x61, (byte) 0xc3, (byte) 0xa7, 0x61, 0x64, 0x65, 0x20, 0x74,

0x6f, 0x75, 0x63, 0x68, (byte) 0xc3, (byte) 0xa9

};

String s = new String(encodedMsg, "UTF-8");

System.out.println(s);

System.out.println();

byte[] bytes = s.getBytes();

for (byte _byte: bytes)

System.out.print(Integer.toHexString(_byte & 255) + " ");

System.out.println();

}

}

Listing 14-7

Using Charsets with String

Listing 14-7’s main() method first creates a byte array containing a UTF-8 encoded message. It then converts this array to a String object via the UTF-8 charset. After outputting the resulting String object, it extracts this object’s bytes into a new byte array and proceeds to output these bytes in hexadecimal format. As demonstrated earlier in this chapter, we bitwise AND each byte value with 255 to remove the 0xFF sign extension bytes for negative integers when the 8-bit byte integer value is converted to a 32-bit integer value. These sign extension bytes would otherwise be output.
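The effect of the mask is easiest to see on a single byte. In the following minimal sketch, the class name ByteMaskDemo and the hex() helper are hypothetical; the byte value c3 comes from the listing’s array.

```java
class ByteMaskDemo {
    // Hypothetical helper: hexadecimal text for the unsigned value of b.
    static String hex(byte b) {
        return Integer.toHexString(b & 255);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xc3; // stored as -61 because byte is a signed type

        // Without the mask, widening to int copies the sign bit into the
        // upper 24 bits, and those bits are printed as well.
        System.out.println(Integer.toHexString(b)); // ffffffc3

        // With the mask, only the low 8 bits survive.
        System.out.println(hex(b)); // c3
    }
}
```

Note that Integer.toHexString() doesn’t pad its result, so a byte such as 0x0a is output as a rather than 0a.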

Compile Listing 14-7 (javac CharsetDemo.java), and run this application (java CharsetDemo). You should observe the following output:

façade touché

66 61 e7 61 64 65 20 74 6f 75 63 68 e9

You might be wondering why you observe e7 instead of c3 a7 (Latin small letter c with a cedilla [hook or tail]) and e9 instead of c3 a9 (Latin small letter e with an acute accent). The answer is that we invoked the no-argument getBytes() method to encode the string. This method uses the default charset, which is windows-1252 on our platform. This charset encodes ç as the single byte e7 and é as the single byte e9, whereas UTF-8 encodes them as the two-byte sequences c3 a7 and c3 a9. The result is a shorter encoded sequence. On a platform whose default charset is UTF-8, you would instead observe c3 a7 and c3 a9.
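When the output must not depend on the platform, you can name the charset explicitly. The following minimal sketch assumes Java 7 or later for java.nio.charset.StandardCharsets; the class name ExplicitCharsetDemo and the toHex() helper are hypothetical. ISO-8859-1 stands in for windows-1252 here because both charsets encode ç and é identically.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ExplicitCharsetDemo {
    // Hypothetical helper: space-separated hexadecimal dump of a byte array.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0)
                sb.append(' ');
            sb.append(Integer.toHexString(b & 255));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "fa\u00e7ade"; // \u00e7 is c with cedilla

        // The result of the no-argument method varies with the platform.
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println(toHex(s.getBytes()));

        // The results of these calls are the same on every platform.
        System.out.println(toHex(s.getBytes(StandardCharsets.UTF_8)));      // 66 61 c3 a7 61 64 65
        System.out.println(toHex(s.getBytes(StandardCharsets.ISO_8859_1))); // 66 61 e7 61 64 65
    }
}
```

The getBytes(Charset) overload also avoids the checked UnsupportedEncodingException because the charset object already exists.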

Working with Formatter and Scanner

The description for JSR 51 (http://jcp.org/en/jsr/detail?id=51) indicates that a simple printf-style formatting facility was proposed for inclusion in NIO. If you’re familiar with the C language, you’ve probably worked with the printf() family of functions that support formatted output. You’ve also probably worked with the complementary scanf() family that supports formatted input.

One feature that makes the printf() and scanf() functions useful is varargs, which lets you pass a variable number of arguments to these functions.

Working with Formatter

Java 5 introduced the java.util.Formatter class as an interpreter for printf()-style format strings. This class provides support for layout justification and alignment; common formats for numeric, string, and date/time data; and more. Commonly used Java types (such as byte and BigDecimal) are supported. Also, limited formatting customization for arbitrary user-defined types is provided through the associated java.util.Formattable interface and java.util.FormattableFlags class.

Formatter declares several constructors for creating Formatter objects. These constructors let you specify where you want formatted output to be sent. For example, Formatter() writes formatted output to an internal StringBuilder instance and Formatter(OutputStream os) writes formatted output to the specified output stream. You can access the destination by calling Formatter’s Appendable out() method.
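The following minimal sketch contrasts the two constructors just mentioned. The class name FormatterDestinationDemo and its helpers are hypothetical, and a java.io.ByteArrayOutputStream stands in for whatever output stream you would actually use.

```java
import java.io.ByteArrayOutputStream;
import java.util.Formatter;

class FormatterDestinationDemo {
    // Formatter() collects output in an internal StringBuilder, which
    // out() returns as an Appendable.
    static String viaDefaultDestination() {
        Formatter formatter = new Formatter();
        formatter.format("%s has %d items", "cart", 3);
        String result = formatter.out().toString(); // read before closing
        formatter.close();
        return result;
    }

    // Formatter(OutputStream) encodes output with the default charset and
    // writes the resulting bytes to the stream.
    static String viaStream() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Formatter formatter = new Formatter(bytes);
        formatter.format("%s has %d items", "cart", 3);
        formatter.close(); // also flushes and closes the stream
        return bytes.toString(); // decodes with the default charset
    }

    public static void main(String[] args) {
        System.out.println(viaDefaultDestination());
        System.out.println(viaStream());
    }
}
```

Both helpers return the same text; only the destination differs. out() is called before close() because calling it on a closed formatter throws java.util.FormatterClosedException.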

Note

The java.lang.Appendable interface describes an object to which char values and character sequences can be appended. Classes (such as StringBuilder) whose instances are to receive formatted output (via the Formatter class) implement Appendable. This interface declares methods such as Appendable append(char c)—append c’s character to this appendable. When an I/O error occurs, this method throws IOException.

After creating a Formatter object, you would call a format() method to format a varying number of values. For example, Formatter format(String format, Object... args) formats the args array according to the string of format specifiers passed to the format parameter, and it returns a reference to the invoking Formatter so that you can chain format() calls together (for convenience).
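Chaining looks like the following minimal sketch, in which the class name FormatChainDemo and the chained() helper are hypothetical.

```java
import java.util.Formatter;

class FormatChainDemo {
    static String chained() {
        Formatter formatter = new Formatter();
        // Each format() call returns the same Formatter, so the calls can be
        // strung together; the output accumulates in the destination.
        formatter.format("%s=", "x").format("%d", 42).format(" [%s]", "ok");
        String result = formatter.toString(); // contents of the destination
        formatter.close();
        return result;
    }

    public static void main(String[] args) {
        System.out.println(chained()); // x=42 [ok]
    }
}
```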

Each format specifier has one of the following syntaxes:

%[argument_index$][flags][width][.precision]conversion
%[argument_index$][flags][width]conversion
%[flags][width]conversion

The first syntax describes a format specifier for general, character, and numeric types. The second syntax describes a format specifier for types that are used to represent dates and times. The third syntax describes a format specifier that doesn’t correspond to arguments.

The optional argument_index is a decimal integer indicating the position of the argument in the argument list. The first argument is referenced by 1$, the second argument is referenced by 2$, and so on.
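Argument indexes let a format string reorder or reuse its arguments. The following minimal sketch uses String’s static format() method, which isn’t discussed above but accepts the same format strings as Formatter; the class name ArgumentIndexDemo and its helpers are hypothetical.

```java
class ArgumentIndexDemo {
    // 2$ selects the second argument and 1$ selects the first argument.
    static String lastFirst(String first, String last) {
        return String.format("%2$s, %1$s", first, last);
    }

    // The same argument can be referenced more than once.
    static String twice(String word) {
        return String.format("%1$s-%1$s", word);
    }

    public static void main(String[] args) {
        System.out.println(lastFirst("Jeff", "Friesen")); // Friesen, Jeff
        System.out.println(twice("bye"));                 // bye-bye
    }
}
```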

The optional flags are a set of characters that modify the output format. The set of valid flags depends on the conversion.

The optional width is a positive decimal integer indicating the minimum number of characters to be written to the output.

The optional precision is a nonnegative decimal integer usually used to restrict the number of characters. The specific behavior depends on the conversion.
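The interplay of flags, width, and precision can be sketched as follows. The class name WidthPrecisionDemo and the fmt() helper are hypothetical; Locale.ENGLISH is passed explicitly (an assumption made for this sketch) so that the decimal point and grouping separator don’t vary with the default locale.

```java
import java.util.Locale;

class WidthPrecisionDemo {
    // Hypothetical helper: formats with a fixed locale for repeatable output.
    static String fmt(String format, Object... args) {
        return String.format(Locale.ENGLISH, format, args);
    }

    public static void main(String[] args) {
        System.out.println("[" + fmt("%5d", 42) + "]");      // [   42]  width pads on the left
        System.out.println("[" + fmt("%5d", 1234567) + "]"); // [1234567] width never truncates
        System.out.println("[" + fmt("%-5d", 42) + "]");     // [42   ]  the - flag left-justifies
        System.out.println("[" + fmt("%,d", 1234567) + "]"); // [1,234,567] the , flag groups digits
        System.out.println("[" + fmt("%.3s", "abcdef") + "]"); // [abc] precision limits characters
        System.out.println("[" + fmt("%.2f", 3.14159) + "]");  // [3.14] precision sets decimals
    }
}
```

Width is only a minimum, so a value that needs more characters is written in full. Precision means different things for different conversions: a maximum character count for %s and a number of fractional digits for %f.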

The required conversion depends on the syntax. For the first syntax, it’s a character indicating how the argument should be formatted. The set of valid conversions for a given argument depends on the argument’s data type. For the second syntax, it’s a two-character sequence. The first character is t or T. The second character indicates the format to be used. For the third syntax, it’s a character indicating content to be inserted in the output.
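The second and third syntaxes can be sketched as follows. The class name SpecifierSyntaxDemo and its helpers are hypothetical, the date is arbitrary, and Locale.ENGLISH is an assumption made so that the month name is predictable.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

class SpecifierSyntaxDemo {
    // Second syntax: t followed by a character that selects a date/time field.
    // 1$ reuses the single Calendar argument for all three specifiers.
    static String isoDate(Calendar c) {
        return String.format(Locale.ENGLISH, "%1$tY-%1$tm-%1$td", c);
    }

    static String monthName(Calendar c) {
        return String.format(Locale.ENGLISH, "%tB", c);
    }

    // Third syntax: %% consumes no argument and inserts a literal percent sign.
    static String percent(int value) {
        return String.format(Locale.ENGLISH, "%d%%", value);
    }

    public static void main(String[] args) {
        Calendar c = new GregorianCalendar(2020, Calendar.MARCH, 5);
        System.out.println(isoDate(c));   // 2020-03-05
        System.out.println(monthName(c)); // March
        System.out.println(percent(75));  // 75%
    }
}
```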

Conversions are divided into six categories: general, character, numeric (integer or floating-point), date/time, percent, and line separator. The following list identifies a few example format specifiers and their conversions:

%d: Formats argument as a decimal integer
%x: Formats argument as a hexadecimal integer
%c: Formats argument as a character
%f: Formats argument as a decimal number
%s: Formats argument as a string
%n: Outputs a platform-specific line separator
%10.2f: Formats argument as a decimal number with 10 as the minimum number of characters to be written (leading spaces are written when the formatted number is shorter than the width) and 2 as the number of digits to be written after the decimal point
%05d: Formats argument as a decimal integer with 5 as the minimum number of characters to be written (leading 0s are written when the formatted number is shorter than the width)

When you’re finished with the formatter, you might want to invoke the void flush() method to ensure that any buffered output in the destination is written to the underlying stream. You would typically invoke flush() when the destination is a file.

When you no longer need the formatter, invoke its void close() method. In addition to closing the formatter, this method also closes the underlying output destination when this destination’s class implements the java.io.Closeable interface. If the formatter has already been closed, this method has no effect. Attempting to format after calling close() results in java.util.FormatterClosedException.
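The behavior of close() can be confirmed with a minimal sketch such as the following, in which the class name FormatterCloseDemo and the formatAfterCloseFails() helper are hypothetical.

```java
import java.util.Formatter;
import java.util.FormatterClosedException;

class FormatterCloseDemo {
    // Returns true when formatting on a closed formatter is rejected.
    static boolean formatAfterCloseFails() {
        Formatter formatter = new Formatter();
        formatter.format("%d", 1);
        formatter.close();
        formatter.close(); // closing a second time has no effect
        try {
            formatter.format("%d", 2);
            return false;
        } catch (FormatterClosedException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(formatAfterCloseFails()); // true
    }
}
```

FormatterClosedException is unchecked, so the compiler won’t warn you about a format() call that follows close().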

Listing 14-8 provides a simple demonstration of Formatter using the aforementioned format specifiers.

import java.util.Formatter;

public class FormatterDemo {

public static void main(String[] args) {

Formatter formatter = new Formatter();

formatter.format("%d", 123);