Convert the text to or from internal Unicode by specifying a converter
when you construct an InputStreamReader or an OutputStreamWriter (which
you can then wrap in a BufferedReader or PrintWriter).
Classes InputStreamReader and OutputStreamWriter are the bridge from
byte-oriented Streams to character-based Readers and Writers. These
classes read or write bytes and translate them to or from characters
according to a specified character encoding. The Unicode character set
used inside Java (char and String types) uses 16-bit characters. But
most character sets, such as ASCII, Swedish, Spanish, Greek, Turkish,
and many others, use only a small subset of that. In fact, many European
language character sets fit nicely into 8-bit characters. Even the
larger character sets (script-based and pictographic languages)
don’t all use the same bit values for each particular character. The
encoding, then, is a mapping between Unicode characters and a particular
external storage format for characters drawn from a particular national
or linguistic character set.
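To make the idea of an encoding as a mapping concrete, here is a minimal sketch that encodes the same one-character String in three encodings and prints the resulting bytes. The class name EncodingDemo and the helper encodedLength are illustrative only, and the sketch assumes a JDK recent enough to provide java.nio.charset.StandardCharsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingDemo {

    /** Hypothetical helper: number of bytes the text occupies in the given encoding. */
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // U+00E5, LATIN SMALL LETTER A WITH RING ABOVE (common in Swedish)
        String text = "\u00e5";

        // One Unicode character, three different external byte sequences
        System.out.println(Arrays.toString(text.getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println(Arrays.toString(text.getBytes(StandardCharsets.UTF_8)));
        System.out.println(Arrays.toString(text.getBytes(StandardCharsets.UTF_16BE)));
    }
}
```

The character is the same in each case; only the external byte representation differs (one byte in ISO-8859-1, two in UTF-8 and in UTF-16BE).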
To simplify matters, the InputStreamReader and OutputStreamWriter
constructors are the only places where you can specify the name of an
encoding to be used in this translation. If you do not, the
platform’s (or user’s) default encoding will be used. PrintWriters,
BufferedReaders, and the like all use whatever encoding the
InputStreamReader or OutputStreamWriter class uses. Since these bridge
classes only accept Stream arguments in their constructors, the
implication is that if you want to specify a non-default converter to
read/write a file on disk, you must start by constructing not a
FileReader/FileWriter, but a FileInputStream/FileOutputStream!
// UseConverters.java (requires import java.io.*;)
// Read a file encoded in EUC_JP
BufferedReader fromKanji = new BufferedReader(
    new InputStreamReader(new FileInputStream("kanji.txt"), "EUC_JP"));
// Write a file encoded in Cp278
PrintWriter toSwedish = new PrintWriter(
    new OutputStreamWriter(new FileOutputStream("sverige.txt"), "Cp278"));
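The fragment above can be fleshed out into a small round trip. The following is only a sketch under stated assumptions: the class RoundTrip and its two helper methods are hypothetical names, the file is a temporary file rather than kanji.txt or sverige.txt, and ISO-8859-1 is used in place of Cp278 because it is one of the encodings every Java runtime is required to support. It writes a line through an OutputStreamWriter with a named encoding, then reads it back through an InputStreamReader, once with the matching encoding and once with a mismatched one.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;

public class RoundTrip {

    /** Write one line to the file, converting characters to bytes with the named encoding. */
    static void writeLine(File file, String line, String encoding) throws IOException {
        try (PrintWriter out = new PrintWriter(
                new OutputStreamWriter(new FileOutputStream(file), encoding))) {
            out.print(line);
            out.print('\n');
        }
    }

    /** Read the first line of the file, converting bytes to characters with the named encoding. */
    static String readLine(File file, String encoding) throws IOException {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), encoding))) {
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("sverige", ".txt");
        file.deleteOnExit();

        String text = "Sm\u00f6rg\u00e5sbord";
        writeLine(file, text, "ISO-8859-1");

        // Same encoding in both directions: the text should survive intact
        System.out.println(text.equals(readLine(file, "ISO-8859-1")));
        // Mismatched decoder: the accented characters should not survive
        System.out.println(text.equals(readLine(file, "UTF-8")));
    }
}
```

Reading with the same encoding that was used for writing should give back the original String, while decoding the same bytes as UTF-8 should not, since the single-byte accented characters are not valid UTF-8 sequences.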
Not that it would necessarily make sense to read a single file from
Kanji and output it in a Swedish encoding; for one thing, most fonts
would not have all the characters of both character sets, and at any
rate, the Swedish encoding certainly has far fewer characters in it
than the Kanji encoding. Besides, if that were all you wanted, you
could use a JDK tool with the ill-fitting name native2ascii (see its
documentation for details). A list of the supported encodings is also
in the JDK documentation, in the file
docs/guide/internat/encoding.doc.html. A more detailed description is
found in Appendix B of Java I/O.
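Besides the documentation, a running program can ask the runtime which encodings it supports. The sketch below assumes a JDK that includes the java.nio.charset package; the class name ListEncodings and the helper supports are illustrative names, not part of the recipe. The list printed will vary from one installation to another.

```java
import java.nio.charset.Charset;
import java.util.SortedMap;

public class ListEncodings {

    /** Hypothetical helper: true if this runtime can convert to and from the named encoding. */
    static boolean supports(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            // Thrown for null or syntactically illegal encoding names
            return false;
        }
    }

    public static void main(String[] args) {
        // The platform (or user) default, used when no encoding is specified
        System.out.println("Default encoding: " + Charset.defaultCharset());

        // Canonical names of every encoding this runtime knows about
        SortedMap<String, Charset> all = Charset.availableCharsets();
        for (String name : all.keySet()) {
            System.out.println(name);
        }

        // Check the two encodings used in the example before relying on them
        System.out.println("EUC_JP supported? " + supports("EUC_JP"));
        System.out.println("Cp278 supported?  " + supports("Cp278"));
    }
}
```

Checking with a helper like supports before constructing the InputStreamReader or OutputStreamWriter avoids an UnsupportedEncodingException on runtimes that ship without the larger set of encodings.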