Converting Between Unicode Characters and Strings

Problem

You want to convert between Unicode characters and Strings.

Solution

A Java char is 16 bits wide, so it can hold any Unicode character in the Basic Multilingual Plane (U+0000 through U+FFFF); characters beyond that range are represented as a pair of chars. The charAt( ) method of String returns a char. The StringBuffer append( ) method has a form that accepts a char. Since char is an integer type, you can even do arithmetic on chars, though this is needed less often than in, say, C. Nor is it often recommended, since the Character class provides methods for the operations that such arithmetic was normally used for in languages such as C. Here is a program that uses arithmetic on chars to control a loop, and also appends the characters to a StringBuffer (see Section 3.4):

/**
 * Conversion between Unicode characters and Strings
 */
public class UnicodeChars {
    public static void main(String[] argv) {
        StringBuffer b = new StringBuffer();
        // char is an integer type, so arithmetic can drive the loop
        for (char c = 'a'; c < 'd'; c++) {
            b.append(c);
        }
        b.append('\u00a5');    // Japanese Yen symbol
        b.append('\u01FC');    // Roman AE with acute accent
        b.append('\u0391');    // GREEK Capital Alpha
        b.append('\u03A9');    // GREEK Capital Omega

        // Print each character individually, then the whole buffer
        for (int i = 0; i < b.length(); i++) {
            System.out.println("Character #" + i + " is " + b.charAt(i));
        }
        System.out.println("Accumulated characters are " + b);
    }
}
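
The conversions themselves, and the Character methods that replace most char arithmetic, can be sketched as follows. This is only an illustration: the class name CharConversions and its helper methods are hypothetical and not part of the recipe, and StringBuilder is used here as the unsynchronized counterpart of StringBuffer.

```java
/**
 * Sketch: converting between char and String, and using Character
 * methods where C code would use arithmetic on character values.
 * CharConversions and its helpers are hypothetical names.
 */
public class CharConversions {

    /** char to String: String.valueOf is the direct conversion. */
    public static String charToString(char c) {
        return String.valueOf(c);
    }

    /** String to char: charAt extracts a single 16-bit char. */
    public static char firstChar(String s) {
        return s.charAt(0);
    }

    /** Collect the chars from 'from' (inclusive) to 'to' (exclusive) using char arithmetic. */
    public static String range(char from, char to) {
        StringBuilder sb = new StringBuilder();
        for (char c = from; c < to; c++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(charToString('a'));        // prints "a"
        System.out.println(range('a', 'd'));          // prints "abc"

        // All chars of a String at once
        char[] all = "abc".toCharArray();
        System.out.println(all.length);               // prints 3

        // Character methods instead of arithmetic such as c - 'a' + 'A'
        System.out.println(Character.toUpperCase('a'));      // prints "A"
        System.out.println(Character.isLetter('\u0391'));    // prints "true" (Greek Alpha is a letter)
        System.out.println(Character.getNumericValue('7'));  // prints 7
    }
}
```

Unlike arithmetic that assumes ASCII layout, the Character methods also work for letters outside the ASCII range, such as the Greek characters used in the program above.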

When you run it, the expected results are printed for the ASCII characters. On my Unix system, the default fonts don’t include all the additional characters, so they are either omitted or mapped to irregular characters. We will see in Section 12.4 how to draw text in other fonts.

C:\javasrc\strings>java UnicodeChars
Character #0 is a
Character #1 is b
Character #2 is c
Character #3 is %
Character #4 is |
Character #5 is
Character #6 is )
Accumulated characters are abc%|)

My Windows system doesn’t have most of those characters either, but it at least prints the ones it knows are lacking as question marks (Windows system fonts are more homogeneous than those of the various Unix systems, so it is easier to know what won’t work). On the other hand, it tries to print the Yen sign as a Spanish capital Enye (N with a ~ over it). Amusingly, if I capture the console log under MS-Windows into a file and display it under Unix, the Yen symbol now appears:

Character #0 is a
Character #1 is b
Character #2 is c
Character #3 is ¥
Character #4 is ?
Character #5 is ?
Character #6 is ?
Accumulated characters are abc¥???
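
The question marks are consistent with how Java encodes characters that a target character set cannot represent: String.getBytes(Charset) substitutes the charset's replacement byte, which is '?' for the standard single-byte charsets. The following is a minimal sketch under an assumption: ISO-8859-1 stands in for the console's encoding, which the text does not actually state. The class name EncodingDemo and its methods are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/**
 * Sketch: what happens to characters that a charset cannot represent.
 * EncodingDemo is a hypothetical name; ISO-8859-1 is an assumed
 * stand-in for a console encoding.
 */
public class EncodingDemo {

    /** Encode s in the given charset and decode it back again. */
    public static String roundTrip(String s, Charset cs) {
        // getBytes(Charset) replaces unmappable characters with the
        // charset's default replacement bytes
        return new String(s.getBytes(cs), cs);
    }

    /** True if every character of s can be encoded in cs without substitution. */
    public static boolean representable(String s, Charset cs) {
        return cs.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        // The same seven characters the UnicodeChars program accumulates
        String s = "abc\u00a5\u01FC\u0391\u03A9";

        // US-ASCII has none of the four extra characters
        System.out.println(roundTrip(s, StandardCharsets.US_ASCII));        // prints "abc????"

        // ISO-8859-1 has the Yen sign but not the other three
        System.out.println(representable("\u00a5", StandardCharsets.ISO_8859_1));  // prints "true"
        System.out.println(representable(s, StandardCharsets.ISO_8859_1));         // prints "false"

        // UTF-8 can represent all of them
        System.out.println(representable(s, StandardCharsets.UTF_8));              // prints "true"
    }
}
```

Under this assumption, a round trip through ISO-8859-1 keeps the Yen sign and turns the other three characters into question marks, the same pattern as the captured log above.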
