A Java char is 16 bits wide, the same width as the original Unicode character set, so a char can hold any Unicode character in that 16-bit range (characters added to Unicode later, above U+FFFF, take two chars). The charAt( ) method of String returns the char at a given index. The StringBuffer append( ) method has a form that accepts a char. Since char is an integer type, you can even do arithmetic on chars, though this is needed less often than in, say, C. Nor is it often recommended, since the Character class provides the methods for which these operations were normally used in languages such as C. Here is a program that uses arithmetic on chars to control a loop, and also appends the characters into a StringBuffer (see Section 3.4):
/**
 * Conversion between Unicode characters and bytes
 */
public class UnicodeChars {
    public static void main(String[] argv) {
        StringBuffer b = new StringBuffer( );
        for (char c = 'a'; c<'d'; c++) {
            b.append(c);
        }
        b.append('\u00a5');    // Japanese Yen symbol
        b.append('\u01FC');    // Roman AE with acute accent
        b.append('\u0391');    // GREEK Capital Alpha
        b.append('\u03A9');    // GREEK Capital Omega
        for (int i=0; i<b.length( ); i++) {
            System.out.println("Character #" + i + " is " + b.charAt(i));
        }
        System.out.println("Accumulated characters are " + b);
    }
}
When you run it, the expected results are printed for the ASCII characters. On my Unix system, the default fonts don’t include all the additional characters, so they are either omitted or mapped to irregular characters. We will see in Section 12.4 how to draw text in other fonts.
C:\javasrc\strings>java UnicodeChars
Character #0 is a
Character #1 is b
Character #2 is c
Character #3 is %
Character #4 is |
Character #5 is
Character #6 is )
Accumulated characters are abc%|)
My Windows system doesn’t have most of those characters either, but it at least prints the ones it knows are lacking as question marks (Windows system fonts are more homogeneous than those of the various Unix systems, so it is easier to know what won’t work). On the other hand, it tries to print the Yen sign as a Spanish capital Enye (N with a ~ over it). Amusingly, if I capture the console log under MS-Windows into a file and display it under Unix, the Yen symbol now appears:
Character #0 is a
Character #1 is b
Character #2 is c
Character #3 is ¥
Character #4 is ?
Character #5 is ?
Character #6 is ?
Accumulated characters are abc¥???
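Since what the console shows depends on its fonts and encoding, one way to check what is really in the StringBuffer is to print each character's numeric value instead of its glyph. The following is a minimal sketch, not part of the book's source: the class name CodePointDump and the helper describe( ) are made up for illustration, and String.format( ) assumes Java 5 or later.

```java
/**
 * Sketch: show the numeric value of each char so the output
 * does not depend on which glyphs the console font provides.
 * (CodePointDump and describe are illustrative names.)
 */
public class CodePointDump {

    /** Formats one char in the conventional U+XXXX notation. */
    static String describe(char c) {
        return String.format("U+%04X", (int) c);
    }

    public static void main(String[] argv) {
        StringBuffer b = new StringBuffer();
        for (char c = 'a'; c < 'd'; c++) {
            b.append(c);
        }
        b.append('\u00a5');    // Yen sign
        b.append('\u01FC');    // Latin capital AE with acute
        b.append('\u0391');    // Greek capital Alpha
        b.append('\u03A9');    // Greek capital Omega
        for (int i = 0; i < b.length(); i++) {
            System.out.println("Character #" + i + " is " + describe(b.charAt(i)));
        }
    }
}
```

Because only ASCII digits and letters are printed, this output should look the same on Unix and Windows consoles, whatever fonts are installed.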
The Unicode program in this book’s online source displays any 256-character section of the Unicode character set. Charts listing every character in the Unicode character set, along with supporting documentation, can be downloaded from the Unicode Consortium at http://www.unicode.org.
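As a rough console-only illustration of walking through a section of the character set, and of using the Character class rather than arithmetic comparisons to classify each char, here is a sketch. It is not the Unicode program from the online source; the class name UnicodeSection and the methods classify( ) and section( ) are hypothetical, and what the glyph column shows still depends on your console's fonts.

```java
/**
 * Sketch: list a run of chars with their values and a rough
 * classification obtained from the Character class.
 * (UnicodeSection, classify and section are illustrative names.)
 */
public class UnicodeSection {

    /** Classifies a char using Character methods instead of range arithmetic. */
    static String classify(char c) {
        if (!Character.isDefined(c))   return "unassigned";
        if (Character.isUpperCase(c))  return "uppercase letter";
        if (Character.isLowerCase(c))  return "lowercase letter";
        if (Character.isLetter(c))     return "other letter";
        if (Character.isDigit(c))      return "digit";
        if (Character.isWhitespace(c)) return "whitespace";
        return "other";
    }

    /** Builds one line per char for count chars beginning at start. */
    static String section(int start, int count) {
        StringBuffer out = new StringBuffer();
        for (int i = 0; i < count; i++) {
            char c = (char) (start + i);
            out.append(String.format("U+%04X", (int) c))
               .append(' ').append(c).append(' ')
               .append(classify(c)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] argv) {
        // Sixteen chars from the Greek block; pass 256 to cover a full section.
        System.out.print(section(0x0390, 16));
    }
}
```

The point of classify( ) is that tests such as Character.isUpperCase( ) work for Greek capital Alpha as well as for 'A', which a C-style comparison against 'A' and 'Z' would not.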