System.Text

The System.Text namespace contains classes that convert text from one character encoding to another. The input is normally a .NET string, which is Unicode; the output depends on the encoding class that is selected. Listing B.59 shows an example of using the UTF8Encoding class (encoding.cs).

Listing B.59. Using UTF8 Encoding
// japanese, chinese, and english are string variables declared earlier in the sample.
byte[] encoded;
UTF8Encoding encoding = new UTF8Encoding();
Console.WriteLine("CodePage: {0} ", encoding.CodePage);
Console.WriteLine("EncodingName: {0} ", encoding.EncodingName);
Console.WriteLine("WindowsCodePage: {0} ", encoding.WindowsCodePage);
encoded = encoding.GetBytes(japanese);
Console.WriteLine("Encoded Japanese: {0}  <-> {1} ",
                  japanese.Length, encoded.Length);
encoded = encoding.GetBytes(chinese);
Console.WriteLine("Encoded Chinese: {0}  <-> {1} ",
                  chinese.Length, encoded.Length);
encoded = encoding.GetBytes(english);
Console.WriteLine("Encoded English: {0}  <-> {1} ",
                  english.Length, encoded.Length);

This sample contains three strings: one Japanese, one Chinese, and one English. Each is stored as a .NET string. The UTF-8 encoder takes these strings and outputs the sequence of bytes that makes up their UTF-8 representation. Listing B.60 shows the output of Listing B.59.

Listing B.60. UTF8 Encoding Output
CodePage: 65001
EncodingName: Unicode (UTF-8)
WindowsCodePage: 1200
Encoded Japanese: 7 <-> 21
Encoded Chinese: 4 <-> 12
Encoded English: 12 <-> 12

The 7 Japanese characters (14 bytes as a UTF-16 .NET string) became 21 UTF-8 bytes, and the 4 Chinese characters (8 bytes) became 12 UTF-8 bytes, while the 12 English characters (24 bytes) became just 12 UTF-8 bytes. Clearly, for Japanese and Chinese, you cannot assume that two bytes (16 bits) are enough to represent a character in UTF-8; each of these characters took three bytes. For English, the encoding cut the number of bytes required exactly in half, because each ASCII character occupies a single byte in UTF-8.
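The arithmetic above can be reproduced with a minimal sketch. The class name Utf8SizeDemo, its two helper methods, and the three sample strings are assumptions made for illustration; they are not the strings used in Listing B.59. The sketch compares the size of each string as UTF-16 (how .NET holds it in memory) with its size as UTF-8, and checks that decoding the bytes restores the original string.

```csharp
using System;
using System.Text;

public static class Utf8SizeDemo
{
    // Bytes the string occupies as UTF-16 (two bytes per char, no byte-order mark).
    public static int Utf16ByteCount(string s)
    {
        return Encoding.Unicode.GetByteCount(s);
    }

    // Bytes the same string needs once encoded as UTF-8.
    public static int Utf8ByteCount(string s)
    {
        return Encoding.UTF8.GetByteCount(s);
    }

    public static void Main()
    {
        // Illustrative strings chosen for this sketch (Japanese, Chinese, English),
        // written as escapes so the source file encoding does not matter.
        string[] samples =
        {
            "\u3053\u3093\u306B\u3061\u306F",
            "\u4F60\u597D",
            "Hello"
        };

        foreach (string s in samples)
        {
            byte[] utf8 = Encoding.UTF8.GetBytes(s);
            string roundTrip = Encoding.UTF8.GetString(utf8);
            Console.WriteLine(
                "{0} chars, {1} UTF-16 bytes, {2} UTF-8 bytes, round trip ok: {3}",
                s.Length, Utf16ByteCount(s), utf8.Length, roundTrip == s);
        }
    }
}
```

With these sample strings, the five Japanese characters need 15 UTF-8 bytes, the two Chinese characters need 6, and the five English characters need 5. This is the same three-to-one and one-to-one ratio per character seen in Listing B.60.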

System.Text.RegularExpressions

The Regex class (and the associated support classes) in the System.Text.RegularExpressions namespace makes traditional regular expression processing considerably easier to use and adds functionality to it, such as named groups and access to every capture of a repeated group. Listing B.61 shows how to use one regular expression to split apart the components of a file path (regex.cs).

Listing B.61. Regular Expressions and a File Path
using System;
using System.Text.RegularExpressions;

public class RegexMain
{
    public static void Main(String[] args)
    {
        Regex pathregex = new Regex(
            @"(?<drive>[^:]:\\)?((?<dir>[^\\]+)\\)*(?<file>((?<base>[^.]+)[.]?(?<ext>.*)))");
        string s = @"c:\a\b\c\d\e.cs";

        if ( args.Length > 0 )
        {
            s = args[0];
        }

        Match mc = pathregex.Match(s);
        if ( mc.Success )
        {
            Console.WriteLine("Success in parsing \"{0}\" !!", s);
            Console.WriteLine("Drive: " + mc.Groups["drive"].Value);
            CaptureCollection cc = mc.Groups["dir"].Captures;
            // Print number of captures in this group.
            Console.WriteLine("Directories: {0} ", cc.Count);
            // Loop through each capture in group.
            for (int i = 0; i < cc.Count; i++)
            {
                // Print capture and position.
                Console.WriteLine("{0}  starts at character {1} ",
                                  cc[i], cc[i].Index);
            }
            Console.WriteLine("File: " + mc.Groups["file"].Value);
            Console.WriteLine("Base: " + mc.Groups["base"].Value);
            Console.WriteLine("Extension: " + mc.Groups["ext"].Value);
        }
        else
        {
            Console.WriteLine(s + " is not a valid path address");
        }
    }
}

Listing B.62 shows the output from the code in Listing B.61.

Listing B.62. Output from File Path Parsing Using Regular Expressions
Success in parsing "c:\a\b\c\d\e.cs" !!
Drive: c:\
Directories: 4
a starts at character 3
b starts at character 5
c starts at character 7
d starts at character 9
File: e.cs
Base: e
Extension: cs
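
A pattern with this many groups is easier to read when it is laid out one group per line. The following is a minimal sketch of that idea, not the book's code: it uses RegexOptions.IgnorePatternWhitespace, which tells the engine to skip unescaped white space and # comments inside the pattern. The class name PathRegexDemo, the helper methods Directories and Extension, and the sample path are assumptions introduced for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PathRegexDemo
{
    // Same named groups as the path pattern, written one group per line.
    // White space and # comments in the pattern are ignored by the engine.
    private static readonly Regex PathRegex = new Regex(@"
        (?<drive>[^:]:\\)?       # optional drive letter, colon, and separator
        ((?<dir>[^\\]+)\\)*      # zero or more directory names, each ending in a separator
        (?<file>
            (?<base>[^.]+)       # file name up to the first dot
            [.]?
            (?<ext>.*)           # everything after that dot
        )",
        RegexOptions.IgnorePatternWhitespace);

    // Returns every capture of the repeated dir group, in order of appearance.
    public static string[] Directories(string path)
    {
        Match m = PathRegex.Match(path);
        if (!m.Success)
        {
            return new string[0];
        }
        CaptureCollection captures = m.Groups["dir"].Captures;
        string[] result = new string[captures.Count];
        for (int i = 0; i < captures.Count; i++)
        {
            result[i] = captures[i].Value;
        }
        return result;
    }

    // Returns the text captured by the ext group, or an empty string on failure.
    public static string Extension(string path)
    {
        Match m = PathRegex.Match(path);
        return m.Success ? m.Groups["ext"].Value : string.Empty;
    }

    public static void Main()
    {
        string path = @"c:\a\b\c\d\e.cs";   // sample path assumed for this sketch
        Console.WriteLine(string.Join(", ", Directories(path)));   // a, b, c, d
        Console.WriteLine(Extension(path));                        // cs
    }
}
```

Because the dir group sits inside a repeated group, Groups["dir"].Value alone would return only the last directory; the Captures collection is what exposes all of them.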
						
