Text characters can be represented in different ways. For example, the alphabet can be encoded using Morse code into a series of dots and dashes for transmission over a telegraph line.
In a similar way, text inside a computer is stored as bits (ones and zeros). .NET Core uses a standard called Unicode to encode text internally; specifically, a string is held in memory as a sequence of UTF-16 code units. Sometimes, you will need to move text outside .NET Core for use by systems that do not use Unicode or that use a different Unicode encoding.
The following table lists some alternative text encodings commonly used by computers:
| Encoding | Description |
| --- | --- |
| ASCII | This encodes a limited range of characters using the lower seven bits of a byte |
| UTF-8 | This represents each Unicode code point as a sequence of one to four bytes |
| UTF-16 | This represents each Unicode code point as a sequence of one or two 16-bit integers |
| ANSI/ISO encodings | This provides support for a variety of code pages that are used to support a specific language or group of languages |
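To make the byte counts in the table concrete, here is a minimal sketch that measures a few sample code points under UTF-8 and UTF-16. The class name ByteCountDemo, the helper CountBytes, and the sample characters are illustrative assumptions; only Encoding.GetByteCount, char.ConvertToUtf32, and the static Encoding properties come from the framework.

```csharp
using System;
using System.Text;

public static class ByteCountDemo
{
    // Hypothetical helper: returns the number of bytes needed to encode text.
    public static int CountBytes(Encoding encoding, string text)
    {
        return encoding.GetByteCount(text);
    }

    public static void Main()
    {
        // U+0041 (A), U+00A3 (pound sign), U+20AC (euro sign), and
        // U+1F600, a code point outside the Basic Multilingual Plane.
        string[] samples = { "A", "\u00A3", "\u20AC", "\U0001F600" };

        foreach (string s in samples)
        {
            Console.WriteLine(
                $"U+{char.ConvertToUtf32(s, 0):X4}: " +
                $"UTF-8 bytes = {CountBytes(Encoding.UTF8, s)}, " +
                $"UTF-16 bytes = {CountBytes(Encoding.Unicode, s)}");
        }
    }
}
```

The four samples need 1, 2, 3, and 4 bytes in UTF-8. In UTF-16, the first three need 2 bytes each, and the last needs 4 bytes because it is stored as two 16-bit integers (a surrogate pair).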
Add a new console application project named Ch10_Encoding.
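Import the System.Text namespace and add the following statements to the Main method. Because the code calls WriteLine, Write, and ReadKey without a Console prefix, it also needs a using static System.Console; directive at the top of the file. The code encodes a string using the chosen encoding, loops through each byte, and then decodes the bytes back into a string and outputs it: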
    WriteLine("Encodings");
    WriteLine("[1] ASCII");
    WriteLine("[2] UTF-7");
    WriteLine("[3] UTF-8");
    WriteLine("[4] UTF-16 (Unicode)");
    WriteLine("[5] UTF-32");
    WriteLine("[any other key] Default");

    // choose an encoding
    Write("Press a number to choose an encoding: ");
    ConsoleKey number = ReadKey(false).Key;
    WriteLine();
    WriteLine();

    Encoding encoder;
    switch (number)
    {
        case ConsoleKey.D1:
            encoder = Encoding.ASCII;
            break;
        case ConsoleKey.D2:
            encoder = Encoding.UTF7;
            break;
        case ConsoleKey.D3:
            encoder = Encoding.UTF8;
            break;
        case ConsoleKey.D4:
            encoder = Encoding.Unicode;
            break;
        case ConsoleKey.D5:
            encoder = Encoding.UTF32;
            break;
        default:
            encoder = Encoding.GetEncoding(0);
            break;
    }

    // define a string to encode
    string message = "A pint of milk is £1.99";

    // encode the string into a byte array
    byte[] encoded = encoder.GetBytes(message);

    // check how many bytes the encoding needed
    WriteLine($"{encoder.GetType().Name} uses {encoded.Length} bytes.");

    // enumerate each byte
    WriteLine($"Byte Hex Char");
    foreach (byte b in encoded)
    {
        WriteLine($"{b,4} {b.ToString("X"),4} {(char)b,5}");
    }

    // decode the byte array back into a string and display it
    string decoded = encoder.GetString(encoded);
    WriteLine(decoded);
Run the application and press 1 to choose ASCII. Notice that when outputting the bytes, the pound sign (£) cannot be represented in ASCII, so it uses a question mark (?) instead:
    Encodings
    [1] ASCII
    [2] UTF-7
    [3] UTF-8
    [4] UTF-16 (Unicode)
    [5] UTF-32
    [any other key] Default
    Press a number to choose an encoding: 1

    ASCIIEncoding uses 23 bytes.
    Byte Hex Char
      65   41     A
      32   20
     112   70     p
     105   69     i
     110   6E     n
     116   74     t
      32   20
     111   6F     o
     102   66     f
      32   20
     109   6D     m
     105   69     i
     108   6C     l
     107   6B     k
      32   20
     105   69     i
     115   73     s
      32   20
      63   3F     ?
      49   31     1
      46   2E     .
      57   39     9
      57   39     9
    A pint of milk is ?1.99
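The silent replacement with a question mark can hide data loss. As a minimal sketch, assuming you would rather detect unencodable characters than substitute them, you can request an ASCII encoding with an exception fallback. The class name StrictAsciiDemo and the helper IsAsciiSafe are hypothetical; Encoding.GetEncoding, EncoderFallback.ExceptionFallback, and EncoderFallbackException are part of System.Text.

```csharp
using System;
using System.Text;

public static class StrictAsciiDemo
{
    // Hypothetical helper: true if every character in text can be encoded as ASCII.
    public static bool IsAsciiSafe(string text)
    {
        // Request an ASCII encoding that throws instead of substituting '?'.
        Encoding strictAscii = Encoding.GetEncoding(
            "us-ascii",
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        try
        {
            strictAscii.GetBytes(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            // At least one character has no ASCII representation.
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(IsAsciiSafe("A pint of milk"));               // True
        Console.WriteLine(IsAsciiSafe("A pint of milk is \u00A31.99")); // False
    }
}
```

Whether to throw or substitute depends on the receiving system; the default Encoding.ASCII used in the walkthrough substitutes.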
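Rerun the application and press 3 to choose UTF-8. Notice that UTF-8 requires one extra byte (24 bytes instead of 23 bytes), but it can store the £. The pound sign is encoded as the two bytes C2 and A3; because the loop casts each byte to a char individually, those two bytes are displayed as Â and £: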
    UTF8Encoding uses 24 bytes.
    Byte Hex Char
      65   41     A
      32   20
     112   70     p
     105   69     i
     110   6E     n
     116   74     t
      32   20
     111   6F     o
     102   66     f
      32   20
     109   6D     m
     105   69     i
     108   6C     l
     107   6B     k
      32   20
     105   69     i
     115   73     s
      32   20
     194   C2     Â
     163   A3     £
      49   31     1
      46   2E     .
      57   39     9
      57   39     9
    A pint of milk is £1.99
Rerun the application and press 4 to choose Unicode (UTF-16). Notice that UTF-16 requires two bytes for every character in this string (46 bytes for 23 characters), but it can store the £:
UnicodeEncoding uses 46 bytes.
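To see what those UTF-16 bytes look like, the following minimal sketch prints the bytes for the pound sign on its own. The class name Utf16BytesDemo and the helper ToHex are hypothetical; Encoding.Unicode, Encoding.BigEndianUnicode, and BitConverter.ToString are framework members.

```csharp
using System;
using System.Text;

public static class Utf16BytesDemo
{
    // Hypothetical helper: the encoded bytes of text as a hyphen-separated hex string.
    public static string ToHex(Encoding encoding, string text)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    public static void Main()
    {
        string pound = "\u00A3"; // the pound sign, U+00A3

        // Encoding.Unicode is little-endian UTF-16: the low byte comes first.
        Console.WriteLine(ToHex(Encoding.Unicode, pound));          // A3-00

        // Encoding.BigEndianUnicode stores the high byte first.
        Console.WriteLine(ToHex(Encoding.BigEndianUnicode, pound)); // 00-A3

        // 23 characters, each a single 16-bit code unit, give 46 bytes.
        Console.WriteLine(
            Encoding.Unicode.GetByteCount("A pint of milk is \u00A31.99")); // 46
    }
}
```

The byte order matters when exchanging UTF-16 text with other systems, so both sides need to agree on it.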
When using stream helper classes, such as StreamReader and StreamWriter, you can specify the encoding you want to use. As you write to the helper, the strings will automatically be encoded, and as you read from the helper, the bytes will be automatically decoded. This is how you can specify the encoding:

    var reader = new StreamReader(stream, Encoding.UTF7);
    var writer = new StreamWriter(stream, Encoding.UTF7);
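As a minimal sketch of the stream helpers in use, the following writes a string to an in-memory stream and reads it back with the same encoding. The class name StreamEncodingDemo and the helper RoundTrip are hypothetical, and a MemoryStream stands in for a real file or network stream. The sketch uses UTF-8 and ASCII rather than UTF-7, because Encoding.UTF7 is marked obsolete in .NET 5 and later.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamEncodingDemo
{
    // Hypothetical helper: writes text to an in-memory stream using the
    // given encoding, then reads it back with the same encoding.
    public static string RoundTrip(string text, Encoding encoding)
    {
        using (var stream = new MemoryStream())
        {
            // leaveOpen: true keeps the MemoryStream usable after the writer is disposed.
            using (var writer = new StreamWriter(stream, encoding, 1024, leaveOpen: true))
            {
                writer.Write(text); // the string is encoded to bytes here
            }

            stream.Position = 0; // rewind before reading

            using (var reader = new StreamReader(stream, encoding))
            {
                return reader.ReadToEnd(); // the bytes are decoded back to a string
            }
        }
    }

    public static void Main()
    {
        string message = "A pint of milk is \u00A31.99";

        // UTF-8 preserves the pound sign, so the text survives unchanged.
        Console.WriteLine(RoundTrip(message, Encoding.UTF8) == message);  // True

        // ASCII cannot represent the pound sign, so it comes back as '?'.
        Console.WriteLine(RoundTrip(message, Encoding.ASCII) == message); // False
    }
}
```

Using the same encoding for the reader and the writer is what makes the round trip lossless; a mismatch can produce garbled text.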