Chapter 8. Strings and Regular Expressions

One of the most common data types used in programming is the string. In C#, a string is a sequence of zero or more characters, declared using the string keyword. Strings are an integral part of everyday data: names, addresses, company names, email addresses, web site URLs, flight numbers, and so forth are all represented as strings. To match patterns within strings, you use regular expressions, which are sequences of characters that describe a text pattern. In this chapter, you will:

  • Explore the System.String class

  • Learn how to represent special characters in string variables

  • Manipulate strings with various methods

  • Format strings

  • Use the StringBuilder class to create and manipulate strings

  • Use Regular Expressions to match string patterns

The System.String Class

The .NET Framework contains the System.String class for string manipulation. To create an instance of the String class and assign it a string, you can use the following statements:

String str1;
            str1 = "This is a string";

C# also provides an alias to the String class: string (lowercase "s"). The preceding statements can be rewritten as:

string str1; //---equivalent to String str1;---
            str1 = "This is a string";

You can declare a string and assign it a value in one statement, like this:

string str2 = "This is another string";

In .NET, a string is a reference type but behaves very much like a value type. Consider the following example of a typical reference type:

Button btn1 = new Button() { Text = "Button 1" };
Button btn2 = btn1;

btn1.Text += " and 2"; //---btn1.Text is now "Button 1 and 2"---
Console.WriteLine(btn1.Text); //---Button 1 and 2---
Console.WriteLine(btn2.Text); //---Button 1 and 2---

Here, you create an instance of a Button object (btn1) and then assign it to another variable (btn2). Both btn1 and btn2 are now pointing to the same object, and hence when you modify the Text property of btn1, the changes can be seen in btn2 (as is evident in the output of the WriteLine() statements).

Because strings are reference types, you would expect to see the same behavior as exhibited in the preceding block of code. For example:

string str1 = "String 1";
            string str2 = str1;

str1 and str2 should now be pointing to the same instance. Make some changes to str1 by appending some text to it:

str1 += " and some other stuff";

And then print out the value of these two strings:

Console.WriteLine(str1); //---String 1 and some other stuff---
            Console.WriteLine(str2); //---String 1---

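Are you surprised to see that the values of the two strings are different? The assignment (string str2 = str1) does copy the reference, so at that point str1 and str2 point to the same string object. Strings are immutable, however, so the += operator cannot change that object. Instead, it creates a new string containing the combined text and makes str1 refer to it, while str2 still refers to the original "String 1". This is why a string, although a reference type, appears to behave like a value type.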
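One way to see what the assignment and the += operator do to the references is to compare them with object.ReferenceEquals(). The following is a minimal sketch; the StringReferenceDemo class and its method names are made up for illustration and are not part of the chapter's code.

```csharp
using System;

public static class StringReferenceDemo
{
    // True if both variables refer to the same object right after assignment.
    public static bool SameObjectAfterAssignment()
    {
        string str1 = "String 1";
        string str2 = str1;   // copies the reference, not the characters
        return object.ReferenceEquals(str1, str2);
    }

    // True if both variables still refer to the same object after +=.
    public static bool SameObjectAfterAppend()
    {
        string str1 = "String 1";
        string str2 = str1;
        str1 += " and some other stuff";   // creates a new string; str2 is untouched
        return object.ReferenceEquals(str1, str2);
    }

    public static void Main()
    {
        Console.WriteLine(SameObjectAfterAssignment()); // True
        Console.WriteLine(SameObjectAfterAppend());     // False
    }
}
```

The first result shows that assignment shares one object; the second shows that appending produces a different object rather than changing the shared one.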

Note

A string cannot be a value type because its size is not fixed. All value types (int, double, and so on) have a fixed size.

A string is essentially a collection of Unicode characters. The following statements show how you enumerate a string as a collection of char and print out the individual characters to the console:

string str1 = "This is a string";
            foreach (char c in str1)
            {
                Console.WriteLine(c);
            }

Here's this code's output:

T
h
i
s

i
s

a

s
t
r
i
n
g

Escape Characters

Certain characters have special meaning in strings. For example, strings are always enclosed in double quotation marks, and if you want to use the actual double-quote character in the string, you need to tell the C# compiler by "escaping" the character's special meaning. For instance, say you need to represent the following in a string:

"I don't necessarily agree with everything I say." Marshall McLuhan

Because the sentence contains the double-quote characters, simply using a pair of double-quotes to contain it will cause an error:

//---error---
string quotation;
quotation = ""I don't necessarily agree with everything I say." Marshall McLuhan";

To represent the double-quote character in a string, you use the backslash (\) character to turn off its special meaning, like this:

string quotation =
   "\"I don't necessarily agree with everything I say.\" Marshall McLuhan";
Console.WriteLine(quotation);

The output is shown in Figure 8-1.

Figure 8-1

A backslash, then, is another special character. To represent the C:\Windows path, for example, you need to turn off the special meaning of \ by using another \, like this:

string path = "C:\\Windows";

What if you really need two backslash characters in your string, as in the following?

"\\servername\path"

In that case, you use the backslash character twice, once for each of the backslash characters you want to turn off, like this:

string UNC = "\\\\servername\\path";

In addition to using the \ character to turn off the special meaning of characters like the double-quote (") and backslash (\), there are other escape characters that you can use in strings.

One common escape character is \n. Here's an example:

string lines = "Line 1\nLine 2\nLine 3\nLine 4\nLine 5";
Console.WriteLine(lines);

The \n escape character creates a newline, as Figure 8-2 shows.

Figure 8-2

You can also use \t to insert tabs into your string, as the following example shows (see also Figure 8-3):

string columns1 = "Column 1\tColumn 2\tColumn 3\tColumn 4";
string columns2 = "1\t5\t25\t125";
Console.WriteLine(columns1);
Console.WriteLine(columns2);

Figure 8-3

You learn more about formatting options in the section "String Formatting" later in this chapter.

Besides the \n and \t escape characters, C# also supports the \r escape character. \r is the carriage return character. Consider the following example:

string str1 = "       One";
            string str2 = "Two";
            Console.Write(str1);
            Console.Write(str2);

The output is shown in Figure 8-4.

Figure 8-4

However, if you prefix a \r escape character to the beginning of str2, the effect will be different:

string str1 = "       One";
string str2 = "\rTwo";
Console.Write(str1);
Console.Write(str2);

The output is shown in Figure 8-5.

Figure 8-5

The \r escape character simply brings the cursor to the beginning of the line, and hence in the above statements the word "Two" is printed at the beginning of the line. The \r escape character is often used together with \n to form a new line (see Figure 8-6):

string str1 = "Line 1\r\n";
string str2 = "Line 2\r\n";
Console.Write(str1);
Console.Write(str2);

Note

By default, when you use \n to insert a new line, the cursor is automatically returned to the beginning of the line. However, some legacy applications still require you to insert both carriage return and newline characters in strings.

Figure 8-6

The following table summarizes the different escape sequences you have seen in this section:

Sequence    Purpose
\n          New line
\r          Carriage return
\r\n        Carriage return; New line
\"          Quotation marks
\\          Backslash character
\t          Tab

In C#, strings can also be @-quoted. Earlier, you saw that to include special characters (such as double-quote, backslash, and so on) in a string you need to use the backslash character to turn off its special meaning:

string path="C:\\Windows";

Alternatively, you can prefix the string with the @ character, like this:

string path=@"C:\Windows";

Using the @ character makes your string easier to read. Basically, the compiler treats strings that are prefixed with the @ character verbatim — that is, it just accepts all the characters in the string (inside the quotes). To better appreciate this, consider the following example where a string containing an XML snippet is split across multiple lines (with each line ending with a carriage return):

string XML = @"
                 <Books>
                    <title>C# 3.0 Programmers' Reference</title>
                 </Books>";
            Console.WriteLine(XML);

Figure 8-7 shows the output. The WriteLine() method prints out the line verbatim.

Figure 8-7

To illustrate the use of the @ character on a double-quoted string, the following:

string quotation =
    "\"I don't necessarily agree with everything I say.\" Marshall McLuhan";
Console.WriteLine(quotation);

can be rewritten as:

string quotation =
    @"""I don't necessarily agree with everything I say."" Marshall McLuhan";
Console.WriteLine(quotation);
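As a quick check that the escaped and @-quoted notations describe the same text, the following sketch compares the two forms of a UNC path and of a quoted sentence. The class and field names are illustrative only, and the sentence is shortened for brevity.

```csharp
using System;

public static class VerbatimDemo
{
    // Regular literals: every backslash and double quote must be escaped.
    public static readonly string EscapedUnc = "\\\\servername\\path";
    public static readonly string EscapedQuote = "\"I say.\" McLuhan";

    // Verbatim literals: backslashes are taken as-is; a double quote is written twice.
    public static readonly string VerbatimUnc = @"\\servername\path";
    public static readonly string VerbatimQuote = @"""I say."" McLuhan";

    public static void Main()
    {
        Console.WriteLine(EscapedUnc);                    // \\servername\path
        Console.WriteLine(EscapedUnc == VerbatimUnc);     // True
        Console.WriteLine(EscapedQuote == VerbatimQuote); // True
    }
}
```

Both notations produce identical strings at run time; the choice between them is purely about readability of the source code.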

String Manipulations

Often, once your values are stored in string variables, you need to perform a wide variety of operations on them, such as comparing the values of two strings, inserting and deleting strings from an existing string, concatenating multiple strings, and so on. The String class in the .NET Framework provides a host of methods for manipulating strings, some of the important ones of which are explained in the following sections.

You can find out about all of the String class methods at www.msdn.com.

Testing for Equality

Even though string is a reference type, the == and != operators compare the values of two strings, not their references.

Consider the following three string variables:

string str1 = "This is a string";
            string str2 = "This is a ";
            str2 += "string";
            string str3 = str2;

The following statements test the equality of the values contained in each variable:

Console.WriteLine(str1 == str2); //--True---
            Console.WriteLine(str1 == str3); //--True---
            Console.WriteLine(str2 != str3); //---False---

As you can see from the output of these statements, the values of all three variables are identical. However, to compare their references, you need to cast each variable to object and then check their equality using the == operator, as the following shows:

Console.WriteLine((object)str1 == (object)str2); //--False---
            Console.WriteLine((object)str2 == (object)str3); //--True---

However, if one of the variables is later assigned a different string, the two variables no longer refer to the same object, as the following shows:

string str3 = str2;
            Console.WriteLine((object)str2 == (object)str3); //--True---

            str2 = "This string has changed";
            Console.WriteLine((object)str2 == (object)str3); //--False---

Besides using the == operator to test for value equality, you can also use the Equals() method, which is available as an instance method as well as a static method:

Console.WriteLine(str1 == str2); //--True---
            Console.WriteLine(str1.Equals(str2)); //--True---
            Console.WriteLine(string.Equals(str1,str2)); //--True---
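If the comparison should ignore case, the static Equals() method also has an overload that takes a StringComparison value. A minimal sketch follows; the helper name SameIgnoringCase and the class name are hypothetical.

```csharp
using System;

public static class EqualityDemo
{
    // Case-insensitive value comparison; the static overload tolerates null arguments.
    public static bool SameIgnoringCase(string a, string b)
    {
        return string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        Console.WriteLine("Microsoft" == "microsoft");                 // False
        Console.WriteLine(SameIgnoringCase("Microsoft", "microsoft")); // True
        Console.WriteLine(SameIgnoringCase(null, "microsoft"));        // False
    }
}
```

Using the static form avoids a NullReferenceException when the first string might be null, which the instance method str1.Equals(str2) would throw.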

Comparing Strings

String comparison is a common operation often performed on strings. Consider the following two string variables:

string str1 = "Microsoft";
     string str2 = "microsoft";

You can use the String.Compare() static method to compare two strings:

Console.WriteLine(string.Compare(str1, str2));       // 1;str1 is greater than str2
Console.WriteLine(string.Compare(str2, str1));       // −1;str2 is less than str1
Console.WriteLine(string.Compare(str1, str2, true)); // 0;str1 equals str2

In the culture-sensitive sort order that Compare() uses by default, the lowercase character "m" comes before the capital "M," and hence str1 is considered greater than str2. The third statement compares the two strings without regard to case; the third argument (true) indicates that the comparison should ignore the casing of the strings involved.

The String.Compare() static method is overloaded. Besides the two overloads just shown (the two-argument form in the first two statements and the three-argument form in the third), there are additional overloads, as described in the following table.

Method

Description

Compare(String, String)

Compares two specified String objects.

Compare(String, String, Boolean)

Compares two specified String objects, ignoring or respecting their case.

Compare(String, String, StringComparison)

Compares two specified String objects. Also specifies whether the comparison uses the current or invariant culture, honors or ignores case, and uses word or ordinal sort rules.

Compare(String, String, Boolean, CultureInfo)

Compares two specified String objects, ignoring or respecting their case, and using culture-specific information for the comparison.

Compare(String, Int32, String, Int32, Int32)

Compares substrings of two specified String objects.

Compare(String, Int32, String, Int32, Int32, Boolean)

Compares substrings of two specified String objects, ignoring or respecting their case.

Compare(String, Int32, String, Int32, Int32, StringComparison)

Compares substrings of two specified String objects.

Compare(String, Int32, String, Int32, Int32, Boolean, CultureInfo)

Compares substrings of two specified String objects, ignoring or respecting their case, and using culture-specific information for the comparison.

Alternatively, you can use the CompareTo() instance method, like this:

Console.WriteLine(str1.CompareTo(str2)); // 1; str1 is greater than str2
     Console.WriteLine(str2.CompareTo(str1)); // −1; str2 is less than str1

Note that comparisons made by the CompareTo() instance method are always case sensitive.
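When you want a comparison based purely on character codes rather than on cultural sort rules (for example, for identifiers or keys), String.CompareOrdinal() is an alternative. The sketch below uses the same two words as the example above; only the sign of the result is meaningful, so the code normalizes it with Math.Sign(). The class and method names are illustrative.

```csharp
using System;

public static class OrdinalCompareDemo
{
    // Returns -1, 0, or 1 according to ordinal (character code) order.
    public static int OrdinalSign(string a, string b)
    {
        return Math.Sign(string.CompareOrdinal(a, b));
    }

    public static void Main()
    {
        // 'M' (code 77) is smaller than 'm' (code 109), so in ordinal order
        // "Microsoft" sorts before "microsoft".
        Console.WriteLine(OrdinalSign("Microsoft", "microsoft")); // -1
        Console.WriteLine(OrdinalSign("microsoft", "Microsoft")); // 1
        Console.WriteLine(OrdinalSign("Microsoft", "Microsoft")); // 0
    }
}
```

Note that the ordinal order of these two words is the reverse of the culture-sensitive order reported by Compare() earlier, so it matters which kind of comparison you choose.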

Creating and Concatenating Strings

The String class in the .NET Framework provides a number of methods that enable you to create or concatenate strings.

The most direct way of concatenating two strings is to use the "+" operator, like this:

string str1 = "Hello ";
            string str2 = "world!";
            string str3 = str1 + str2;
            Console.WriteLine(str3); //---Hello world!---

The String.Format() static method takes the input of multiple objects and creates a new string. Consider the following example:

string Name = "Wei-Meng Lee";
            int age = 18;
            string str1 = string.Format("My name is {0} and I am {1} years old",
                          Name, age);

            //---str1 is now "My name is Wei-Meng Lee and I am 18 years old"---
            Console.WriteLine(str1);

Notice that you supplied two variables of string and int type and the Format() method automatically combines them to return a new string.

The preceding example can be rewritten using the String.Concat() static method, like this:

string str1 = string.Concat("My name is ", Name, " and I am ", age ,
                                         " years old");
            //---str1 is now "My name is Wei-Meng Lee and I am 18 years old"---
            Console.WriteLine(str1);

The String.Join() static method is useful when you need to join a series of strings stored in a string array. The following example shows the strings in a string array joined using the Join() method:

string[] pts = { "1,2", "3,4", "5,6" };
            string str1 = string.Join("|", pts);
            Console.WriteLine(str1); //---1,2|3,4|5,6---

To insert a string into an existing string, use the instance method Insert(), as demonstrated in the following example:

string str1 = "This is a string";
            str1 = str1.Insert(10, "modified ");
            Console.WriteLine(str1); //---This is a modified string---

The CopyTo() instance method enables you to copy part of a string into a char array. Consider the following example:

string str1 = "This is a string";
            char[] ch = { '*', '*', '*', '*', '*', '*','*', '*' };
            str1.CopyTo(0, ch, 2, 4);
            Console.WriteLine(ch); //---**This**---

The first parameter of the CopyTo() method specifies the index of the string to start copying from. The second parameter specifies the char array. The third parameter specifies the index of the array to copy into, while the last parameter specifies the number of characters to copy.

If you need to pad a string with characters to achieve a certain length, use the PadLeft() and PadRight() instance methods, as the following statements show:

string str1 = "This is a string";
string str2;

str2 = str1.PadLeft(20, '*');
Console.WriteLine(str2); //---"****This is a string"---

str2 = str1.PadRight(20, '*');
Console.WriteLine(str2); //---"This is a string****"---

Trimming Strings

To trim whitespace from the beginning of a string, the end of a string, or both, you can use the TrimStart(), TrimEnd(), or Trim() instance methods, respectively. The following statements demonstrate the use of these methods:

string str1 = "   Computer   ";
            string str2;
            Console.WriteLine(str1); //---"   Computer   "---
            str2 = str1.Trim();
            Console.WriteLine(str2); //---"Computer"---

            str2 = str1.TrimStart();
            Console.WriteLine(str2); //---"Computer   "---

            str2 = str1.TrimEnd();
            Console.WriteLine(str2); //---"   Computer"---

Splitting Strings

One common operation with string manipulation is splitting a string into smaller strings. Consider the following example where a string contains a serialized series of points:

string str1 = "1,2|3,4|5,6|7,8|9,10";

Each point ("1,2", "3,4", and so on) is separated with the | character. You can use the Split() instance method to split the given string into an array of strings:

string[] strArray = str1.Split('|');

Once the string is split, the result is stored in the string array strArray and you can print out each of the smaller strings using a foreach statement:

foreach (string s in strArray)
                Console.WriteLine(s);

The output of the example statement would be:

1,2
3,4
5,6
7,8
9,10

You can further split the points into individual coordinates and then create a new Point object, like this:

string str1 = "1,2|3,4|5,6|7,8|9,10";
string[] strArray = str1.Split('|');

foreach (string s in strArray)
{
    string[] xy = s.Split(',');
    Point p = new Point(Convert.ToInt16(xy[0]), Convert.ToInt16(xy[1]));
    Console.WriteLine(p.ToString());
}

The output of the above statements would be:

{X=1,Y=2}
{X=3,Y=4}
{X=5,Y=6}
{X=7,Y=8}
{X=9,Y=10}
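Input of this kind is not always well formed. The following sketch shows one possible way to skip empty or malformed entries by combining Split() with StringSplitOptions.RemoveEmptyEntries and int.TryParse(). The PointParser class is hypothetical, and it stores each point as a KeyValuePair so that the example does not depend on the Point type.

```csharp
using System;
using System.Collections.Generic;

public static class PointParser
{
    // Returns the (x, y) pairs that could be parsed; malformed entries are skipped.
    public static List<KeyValuePair<int, int>> Parse(string serialized)
    {
        var points = new List<KeyValuePair<int, int>>();
        string[] pairs = serialized.Split(new[] { '|' },
                                          StringSplitOptions.RemoveEmptyEntries);
        foreach (string pair in pairs)
        {
            string[] xy = pair.Split(',');
            int x, y;
            if (xy.Length == 2 && int.TryParse(xy[0], out x) && int.TryParse(xy[1], out y))
            {
                points.Add(new KeyValuePair<int, int>(x, y));
            }
        }
        return points;
    }

    public static void Main()
    {
        // The empty entry and "oops" are ignored.
        foreach (var p in Parse("1,2|3,4||oops|5,6"))
        {
            Console.WriteLine("X={0}, Y={1}", p.Key, p.Value);
        }
    }
}
```

Unlike Convert.ToInt16(), which throws an exception on bad input, TryParse() reports failure through its return value, so one bad entry does not abort the whole loop.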

Searching and Replacing Strings

Occasionally, you need to search for a specific occurrence of a string within a string. For this purpose, you have several methods that you can use.

To look for the occurrence of a word and get its position, use the IndexOf() and LastIndexOf() instance methods. IndexOf() returns the position of the first occurrence of a specific word from a string, while LastIndexOf() returns the last occurrence of the word. Here's an example:

string str1 = "This is a long long long string...";
            Console.WriteLine(str1.IndexOf("long"));     //---10---
            Console.WriteLine(str1.LastIndexOf("long")); //---20---

To find all the occurrences of a word, you can write a simple loop using the IndexOf() method, like this:

int position = -1;
string str1 = "This is a long long long string...";
do
{
    //---search from just after the previous match; >= 0 also catches a match at index 0---
    position = str1.IndexOf("long", ++position);
    if (position >= 0)
        Console.WriteLine(position);
} while (position >= 0);

This prints out the following:

10
15
20
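The same search can be packaged as a reusable method that returns every position in a list. This is a sketch only; the OccurrenceFinder class and the AllIndexesOf name are hypothetical, and the search is ordinal (exact character match).

```csharp
using System;
using System.Collections.Generic;

public static class OccurrenceFinder
{
    // Returns the zero-based index of every occurrence of word in text.
    public static List<int> AllIndexesOf(string text, string word)
    {
        var positions = new List<int>();
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(word))
        {
            return positions; // nothing to search for
        }

        int position = text.IndexOf(word, StringComparison.Ordinal);
        while (position >= 0)
        {
            positions.Add(position);
            // Continue just after the start of the previous match.
            position = text.IndexOf(word, position + 1, StringComparison.Ordinal);
        }
        return positions;
    }

    public static void Main()
    {
        foreach (int i in AllIndexesOf("This is a long long long string...", "long"))
        {
            Console.WriteLine(i); // 10, 15, 20 on separate lines
        }
    }
}
```

Testing for position >= 0 rather than position > 0 matters here: a match at the very start of the text has index 0 and would otherwise be missed.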

To search for the first occurrence of any one of several characters, use the IndexOfAny() instance method. The following statements search the str1 string for any of the characters a, b, c, d, or e, specified in the char array:

char[] anyof = "abcde".ToCharArray();
            Console.WriteLine(str1.IndexOfAny(anyof)); //---8---

To obtain a substring from within a string, use the Substring() instance method, as the following example shows:

string str1 = "This is a long string...";
            string str2;
            Console.WriteLine(str1.Substring(10)); //---long string...---
            Console.WriteLine(str1.Substring(10, 4)); //---long---

To find out if a string begins with a specific string, use the StartsWith() instance method. Likewise, to find out if a string ends with a specific string, use the EndsWith() instance method. The following statements illustrate this:

Console.WriteLine(str1.StartsWith("This")); //---True---
            Console.WriteLine(str1.EndsWith("...")); //---True---

To remove a substring from a string beginning from a particular index, use the Remove() instance method:

str2 = str1.Remove(10);
Console.WriteLine(str2); //---"This is a " (the trailing space remains)---

This statement removes the string starting from index position 10. To remove a particular number of characters, you need to specify the number of characters to remove in the second parameter:

str2 = str1.Remove(10,5);  //---remove 5 characters from index 10---
            Console.WriteLine(str2); //---"This is a string..."---

To replace a substring with another, use the Replace() instance method:

str2 = str1.Replace("long", "short");
            Console.WriteLine(str2); //---"This is a short string..."---

To remove a substring from a string without specifying its exact length, use the Replace() method, like this:

str2 = str1.Replace("long ", string.Empty);
            Console.WriteLine(str2); //---"This is a string..."---

Changing Case

To change the casing of a string, use the ToUpper() or ToLower() instance methods. The following statements demonstrate their use:

string str1 = "This is a string";
            string str2;

            str2 = str1.ToUpper();
            Console.WriteLine(str2); //---"THIS IS A STRING"---

            str2 = str1.ToLower();
            Console.WriteLine(str2); //---"this is a string"---

String Formatting

You've seen the use of the Console.WriteLine() method to print the output to the console. For example, the following statement prints the value of num1 to the console:

int num1 = 5;
            Console.WriteLine(num1); //---5---

You can also print the values of multiple variables like this:

int num1 = 5;
            int num2 = 12345;
            Console.WriteLine(num1 + " and " + num2); //---5 and 12345---

If you have too many variables to print (say more than five), though, the code can get messy very quickly. A better way would be to use a format specifier, like this:

Console.WriteLine("{0} and {1}", num1, num2); //---5 and 12345---

A format specifier ({0}, {1}, and so forth) automatically converts all data types to string. Format specifiers are numbered sequentially from zero ({0}, {1}, {2}, and so on). Each format specifier is then replaced with the value of the variable to be printed. At run time, the WriteLine() method looks at the number in the format specifier, takes the argument with the same index in the argument list, and makes the substitution. In the preceding example, num1 and num2 are the arguments for the format specifiers.

What happens if you want to print out the value of a number enclosed with the {} characters? For example, say that you want to print the string {5} when the value of num1 is 5. You can do something like this:

num1 = 5;
            Console.WriteLine("{{{0}}}", num1); //---{5}---

Why are there two additional sets of {} characters for the format specifier? Inside a format string, {{ and }} are escape sequences for the literal { and } characters. If you have only one additional set of {} characters, the whole of {{0}} is therefore read as the literal text {0} and no substitution takes place, as the following shows:

num1 = 5;
            Console.WriteLine("{{0}}", num1); //---{0}---

With two additional sets of {} characters, the outer {{ and }} produce the literal braces, while the inner {0} remains a format specifier that is replaced by the value.

And as demonstrated earlier, the String class contains the Format() static method, which enables you to create a new string (as well as perform formatting on string data). The preceding statement could be rewritten using the following statements:

string formattedString = string.Format("{{{0}}}", num1);
            Console.WriteLine(formattedString); //---{5}---

To format numbers, you can use the format specifiers as shown here:

num1=5;
            Console.WriteLine("{0:N}", num1);      //---5.00---

            Console.WriteLine("{0:00000}", num1);  //---00005---
            //---OR---
            Console.WriteLine("{0:d5}", num1);     //---00005---

            Console.WriteLine("{0:d4}", num1);     //---0005---

            Console.WriteLine("{0,5:G}", num1);    //---    5 (4 spaces on left)---

For a detailed list of format specifiers you can use for formatting strings, please refer to the MSDN documentation under the topics "Standard Numeric Format Strings" and "Custom Numeric Format Strings."

You can also print out specific strings based on the value of a number. Consider the following example:

num1 = 0;
            Console.WriteLine("{0:yes;;no}", num1); //---no---
            num1 = 1;
            Console.WriteLine("{0:yes;;no}", num1); //---yes---
            num1 = 5;
            Console.WriteLine("{0:yes;;no}", num1); //---yes---

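In this case, the format string has three sections separated by semicolons: the first applies to positive values, the second to negative values, and the third to zero. Because the second section is left empty, the first section applies to all nonzero values. So if the value of the variable (num1) is nonzero, the first string (yes) is returned; if the value is 0, the string in the third section (no) is returned. Here is another example: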

num1 = 0;
            Console.WriteLine("{0:OK;;Cancel}", num1); //---Cancel---
            num1 = 1;
            Console.WriteLine("{0:OK;;Cancel}", num1); //---OK---
            num1 = 5;
            Console.WriteLine("{0:OK;;Cancel}", num1); //---OK---

For decimal number formatting, use the following format specifiers:

double val1 = 3.5;
            Console.WriteLine("{0:##.00}", val1);   //---3.50---
            Console.WriteLine("{0:##.000}", val1);  //---3.500---
            Console.WriteLine("{0:0##.000}", val1); //---003.500---

There are times when numbers are represented in strings. For example, the value 9876 may be represented in a string with a comma as the thousands separator. In this case, you cannot simply use the Parse() method of the int type, like this:

string str2 = "9,876";
            int num3 = int.Parse(str2); //---error---

To correctly parse the string, use the following statement:

int num3 = int.Parse(
                str2,
                System.Globalization.NumberStyles.AllowThousands);
            Console.WriteLine(num3);  //---9876---

Here is another example:

string str3 = "1,239,876";
            num3 = int.Parse(
                str3,
                System.Globalization.NumberStyles.AllowThousands);
            Console.WriteLine(num3);  //---1239876---

What about the reverse — formatting a number with the comma separator? Here is the solution:

num3 = 9876;
            Console.WriteLine("{0:#,0}", num3); //---9,876---

            num3 = 1239876;
            Console.WriteLine("{0:#,0}", num3); //---1,239,876---
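Both the parsing and the formatting examples above depend on the current culture: the thousands separator is a comma in en-US but a period in de-DE, for example. If the text format is fixed regardless of the user's settings, you can pass a culture explicitly. The following is a sketch that assumes the invariant culture is the one you want; the ThousandsDemo class and its method names are hypothetical.

```csharp
using System;
using System.Globalization;

public static class ThousandsDemo
{
    // Parses text such as "1,239,876" without throwing on bad input.
    public static bool TryParseGrouped(string text, out int value)
    {
        return int.TryParse(text, NumberStyles.AllowThousands,
                            CultureInfo.InvariantCulture, out value);
    }

    // Formats a number with comma separators, independent of the current culture.
    public static string FormatGrouped(int value)
    {
        return value.ToString("#,0", CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        int n;
        if (TryParseGrouped("1,239,876", out n))
        {
            Console.WriteLine(n);                          // 1239876
        }
        Console.WriteLine(TryParseGrouped("12x", out n));  // False
        Console.WriteLine(FormatGrouped(9876));            // 9,876
    }
}
```

TryParse() is used here instead of Parse() so that invalid text yields false rather than an exception.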

Last, to format a special number (such as a phone number), use the following format specifier:

long phoneNumber = 1234567890;
          Console.WriteLine("{0:###-###-####}", phoneNumber); //---123-456-7890---

The StringBuilder Class

Earlier in this chapter you saw how to easily concatenate two strings by using the + operator. That's fine if you are concatenating a small number of strings, but it is not recommended for large numbers of strings. The reason is that String objects in .NET are immutable, which means that once a string object is created, its contents cannot be changed. When you concatenate another string to an existing one, you actually create a new string object containing the result of the concatenation, and the old object is discarded. When you repeat this process many times, you incur a performance penalty as new temporary objects are created and old objects discarded.

Note

One important application of the StringBuilder class is its use in .NET interop with native C/C++ APIs that take string arguments and modify strings. One example of this is the Windows API function GetWindowText(). This function has a second argument that takes a TCHAR* parameter. To use this function from .NET code, you would need to pass a StringBuilder object as this argument.

Consider the following example, where you concatenate all the numbers from 0 to 9999:

int counter = 9999;
            string s = string.Empty;
            for (int i = 0; i <= counter; i++) {
                s += i.ToString();
            }
            Console.WriteLine(s);

At first glance, the code looks innocent enough. But let's use the Stopwatch object to time the operation. Modify the code as shown here:

int counter = 9999;
            System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
            sw.Start();

            string s = string.Empty;
            for (int i = 0; i <= counter; i++) {
                s += i.ToString();
            }

            sw.Stop();
            Console.WriteLine("Took {0} ms", sw.ElapsedMilliseconds);
            Console.WriteLine(s);

On average, it took about 374 ms on my computer to run this operation. Let's now use the StringBuilder class in .NET to perform the string concatenation, using its Append() method:

System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
            sw.Start();

            StringBuilder sb = new StringBuilder();
            for (int i = 0; i <= 9999; i++) {
                sb.Append (i.ToString());
            }

            sw.Stop();

            Console.WriteLine("Took {0} ms", sw.ElapsedMilliseconds);
            Console.WriteLine(sb.ToString());

On average, it took about 6 ms on my computer to perform this operation. As you can deduce, the improvement is drastic: about 98% ((374 - 6)/374). If you increase the upper limit of the loop (counter), you will find that the improvement is even more dramatic.

The StringBuilder class represents a mutable string of characters. Its behavior is like the String object except that its value can be modified once it has been created.

The StringBuilder class contains some other important methods, which are described in the following table.

Method

Description

Append

Appends the string representation of a specified object to the end of this instance.

AppendFormat

Appends a formatted string, which contains zero or more format specifiers, to this instance. Each format specification is replaced by the string representation of a corresponding object argument.

AppendLine

Appends the default line terminator, or a copy of a specified string and the default line terminator, to the end of this instance.

CopyTo

Copies the characters from a specified segment of this instance to a specified segment of a destination Char array.

Insert

Inserts the string representation of a specified object into this instance at a specified character position.

Remove

Removes the specified range of characters from this instance.

Replace

Replaces all occurrences of a specified character or string in this instance with another specified character or string.

ToString

Converts the value of a StringBuilder to a String.
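To illustrate several of the methods in the table, the following sketch builds and edits a short string in place. The StringBuilderDemo class is hypothetical, and the comments show the content of the builder after each call.

```csharp
using System;
using System.Text;

public static class StringBuilderDemo
{
    public static string Build()
    {
        var sb = new StringBuilder();
        sb.Append("Items: ");                      // Items: 
        sb.AppendFormat("{0} and {1}", 5, 12345);  // Items: 5 and 12345
        sb.Insert(0, "[");                         // [Items: 5 and 12345
        sb.Append("]");                            // [Items: 5 and 12345]
        sb.Replace("and", "&");                    // [Items: 5 & 12345]
        sb.Remove(1, 7);                           // [5 & 12345]  (drops "Items: ")
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Build()); // [5 & 12345]
    }
}
```

Each call modifies the same StringBuilder instance, so no intermediate string objects are created until ToString() is called at the end.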

Regular Expressions

When dealing with strings, you often need to perform checks on them to see if they match certain patterns. For example, if your application requires the user to enter an email address so that you can send them a confirmation email later on, it is important to at least verify that the user has entered a correctly formatted email address. To perform the checking, you can use the techniques that you have learnt earlier in this chapter by manually looking for specific patterns in the email address. However, this is a tedious and mundane task.

A better approach would be to use regular expressions, a language for describing and manipulating text. Using regular expressions, you can define a text pattern and match it against a string. In the .NET Framework, the System.Text.RegularExpressions namespace contains the Regex class for working with regular expressions.

Searching for a Match

To use the Regex class, first import the System.Text.RegularExpressions namespace:

using System.Text.RegularExpressions;

The following statements show how you can create an instance of the Regex class, specify the pattern to search for, and match it against a string:

string s = "This is a string";
            Regex r = new Regex("string");
            if (r.IsMatch(s))
            {
                Console.WriteLine("Matches.");
            }

In this example, the Regex constructor takes a string, which is the pattern you are searching for. In this case, you are searching for the word "string", and the pattern is matched against the s string variable. The IsMatch() method returns true if there is a match (that is, the string s contains the word "string").

To find the exact position of the text "string" in the variable, you can use the Match() method of the Regex class. It returns a Match object, whose Index property gives the position of the text that matches the search pattern:

string s = "This is a string";
            Regex r = new Regex("string");
            if (r.IsMatch(s))
            {
                Console.WriteLine("Matches.");
            }

            Match m = r.Match(s);
            if (m.Success)
            {
                Console.WriteLine("Match found at " + m.Index);
                //---Match found at 10---
            }

What if you have multiple matches in a string? In this case, you can use the Matches() method of the Regex class. This method returns a MatchCollection object, and you can loop through it to obtain the index position of each individual match:

string s = "This is a string and a long string indeed";
            Regex r = new Regex("string");

            MatchCollection mc = r.Matches(s);
            foreach (Match m1 in mc)
            {
                Console.WriteLine("Match found at " + m1.Index);
                //---Match found at 10---
                //---Match found at 28---
            }
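If you need the positions as data rather than as console output, you could collect them in a list. The following is a minimal sketch; the class name MatchPositions and the method FindAll() are hypothetical, and the sketch uses the static Regex.Matches() method so that no Regex instance has to be created first.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class MatchPositions
{
    // Hypothetical helper: returns the starting index of every
    // non-overlapping match of pattern in input.
    public static List<int> FindAll(string input, string pattern)
    {
        List<int> positions = new List<int>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            positions.Add(m.Index);
        }
        return positions;
    }

    static void Main()
    {
        List<int> found = FindAll("This is a string and a long string indeed", "string");
        Console.WriteLine(string.Join(", ", found)); //---10, 28---
    }
}
```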

More Complex Pattern Matching

You can specify more complex searches using regular expression operators. For example, to find out whether a string contains either the word "Mr" or the word "Mrs", you can use the | operator, like this:

string gender = "Mr Wei-Meng Lee";
            Regex r = new Regex("Mr|Mrs");
            if (r.IsMatch(gender))
            {
                Console.WriteLine("Matches.");
            }
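One subtlety is worth noting: the .NET regular expression engine tries the alternatives from left to right, and "Mr" is also the first two letters of "Mrs", so the pattern "Mr|Mrs" reports "Mr" even for a name that starts with "Mrs". The following sketch shows one possible way to tell the two apart, by listing the longer alternative first and requiring a word boundary (\b) after the title. The class name TitleMatcher, the method GetTitle(), and the sample name "Mrs Jones" are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class TitleMatcher
{
    // Hypothetical helper: returns the title at the start of the name,
    // or an empty string if the name does not start with Mr or Mrs.
    public static string GetTitle(string name)
    {
        Match m = Regex.Match(name, @"^(Mrs|Mr)\b");
        return m.Success ? m.Value : "";
    }

    static void Main()
    {
        //---the first alternative that fits wins---
        Console.WriteLine(Regex.Match("Mrs Jones", "Mr|Mrs").Value); //---Mr---

        Console.WriteLine(GetTitle("Mrs Jones"));       //---Mrs---
        Console.WriteLine(GetTitle("Mr Wei-Meng Lee")); //---Mr---
    }
}
```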

The following table describes regular expression operators commonly used in search patterns.

Operator   Description
.          Match any one character (except a newline, by default)
[ ]        Match any one character listed between the brackets
[^ ]       Match any one character not listed between the brackets
?          Match the preceding element zero or one time
*          Match the preceding element zero or more times
+          Match the preceding element one or more times
{n}        Match the preceding element exactly n times
{n,}       Match the preceding element at least n times
{n,N}      Match the preceding element at least n times, but not more than N times
^          Match at the beginning of the string (or of each line, with RegexOptions.Multiline)
$          Match at the end of the string (or of each line, with RegexOptions.Multiline)
\<         Match at the beginning of a word (some regex engines only; .NET does not support this anchor, so use \b)
\>         Match at the end of a word (some regex engines only; .NET does not support this anchor, so use \b)
\b         Match at the beginning or end of a word
\B         Match at a position that is not a word boundary (for example, in the middle of a word)
\d         Shorthand for digits (0–9)
\w         Shorthand for word characters (letters, digits, and the underscore)
\s         Shorthand for whitespace
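The following sketch tries out a few of these operators. The class name OperatorSketch, the Check() wrapper, and the sample strings are hypothetical and exist only to keep each test on one line; the patterns use verbatim strings (@"...") wherever a backslash appears, so that C# does not interpret the backslash itself.

```csharp
using System;
using System.Text.RegularExpressions;

static class OperatorSketch
{
    // Hypothetical thin wrapper around Regex.IsMatch().
    public static bool Check(string input, string pattern)
    {
        return Regex.IsMatch(input, pattern);
    }

    static void Main()
    {
        //---? makes the preceding element optional---
        Console.WriteLine(Check("color", "^colou?r$"));      //---True---
        Console.WriteLine(Check("colour", "^colou?r$"));     //---True---

        //---{n,N} bounds the repetition: two to four digits only---
        Console.WriteLine(Check("12345", @"^\d{2,4}$"));     //---False---

        //---[^ ] negates a character class: no digit may appear---
        Console.WriteLine(Check("abc", "^[^0-9]+$"));        //---True---

        //---\b marks a word boundary: "cat" must be a whole word---
        Console.WriteLine(Check("concatenate", @"\bcat\b")); //---False---
        Console.WriteLine(Check("the cat sat", @"\bcat\b")); //---True---
    }
}
```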

Another common task is verifying that a string contains a date. For example, to match a date in the format "yyyy/mm/dd", you would specify the search pattern as follows: "(19|20)\d\d[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])". This pattern matches dates ranging from 1900-01-01 to 2099-12-31. It checks only the shape of the date, so an impossible date such as 2007/02/31 also matches.

string date = "2007/03/10";
            Regex r = new Regex(
                @"(19|20)\d\d[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])");
            if (r.IsMatch(date))
            {
                Console.WriteLine("Matches.");
            }

You can use the following date separators with the pattern specified above:

string date = "2007/03/10";
string date = "2007-03-10";
string date = "2007 03 10";
string date = "2007.03.10";
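Because the pattern looks only at the digits and separators, it cannot tell whether the day actually exists in the given month. One possible way to close that gap, sketched below, is to let the regular expression check the shape and then hand the string to DateTime.TryParseExact(). The class name DateCheck and its two methods are hypothetical; the sketch also adds ^ and $ to the pattern, an assumption that the whole string should be a date rather than merely contain one.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

static class DateCheck
{
    // The date pattern, anchored so that the whole string must match.
    static readonly Regex Shape = new Regex(
        @"^(19|20)\d\d[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])$");

    // True when the string has the yyyy/mm/dd shape with any of the four separators.
    public static bool LooksLikeDate(string s)
    {
        return Shape.IsMatch(s);
    }

    // True only when the string is also a real calendar date.
    public static bool IsRealDate(string s)
    {
        if (!LooksLikeDate(s)) return false;

        // Normalize the separators so that one exact format string is enough.
        string normalized = Regex.Replace(s, "[- /.]", "-");
        DateTime parsed;
        return DateTime.TryParseExact(normalized, "yyyy-MM-dd",
            CultureInfo.InvariantCulture, DateTimeStyles.None, out parsed);
    }

    static void Main()
    {
        Console.WriteLine(LooksLikeDate("2007/02/31")); //---True---
        Console.WriteLine(IsRealDate("2007/02/31"));    //---False---
        Console.WriteLine(IsRealDate("2007.03.10"));    //---True---
    }
}
```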

Some commonly used search patterns are described in the following table.

Pattern                                         Description
[0-9]                                           Digits
[A-Fa-f0-9]                                     Hexadecimal digits
[A-Za-z0-9]                                     Alphanumeric characters
[A-Za-z]                                        Alphabetic characters
[a-z]                                           Lowercase letters
[A-Z]                                           Uppercase letters
[ \t]                                           Space and tab
[\x00-\x1F\x7F]                                 Control characters
[\x21-\x7E]                                     Visible characters
[\x20-\x7E]                                     Visible characters and spaces
[!"#$%&'()*+,\-./:;<=>?@[\\\]_`{|}~]            Punctuation characters
[ \t\r\n\v\f]                                   Whitespace characters
\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*    Email address
http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?   Internet URL
((\(\d{3}\) ?)|(\d{3}-))?\d{3}-\d{4}            U.S. phone number
\d{3}-\d{2}-\d{4}                               U.S. Social Security number
\d{5}(-\d{4})?                                  U.S. ZIP code
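When you use one of these patterns to validate user input, you will usually want the whole string to match, not just part of it. The following sketch wraps three of the patterns in ^ and $ for that purpose. The class name UsFormats, its method names, and the sample values are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class UsFormats
{
    // Each pattern is wrapped in ^...$ so that partial matches are rejected.
    public static bool IsZipCode(string s)
    {
        return Regex.IsMatch(s, @"^\d{5}(-\d{4})?$");
    }

    public static bool IsSocialSecurityNumber(string s)
    {
        return Regex.IsMatch(s, @"^\d{3}-\d{2}-\d{4}$");
    }

    public static bool IsPhoneNumber(string s)
    {
        return Regex.IsMatch(s, @"^((\(\d{3}\) ?)|(\d{3}-))?\d{3}-\d{4}$");
    }

    static void Main()
    {
        Console.WriteLine(IsZipCode("12345-6789"));              //---True---
        Console.WriteLine(IsZipCode("1234"));                    //---False---
        Console.WriteLine(IsPhoneNumber("(555) 555-0100"));      //---True---
        Console.WriteLine(IsSocialSecurityNumber("123-45-678")); //---False---
    }
}
```

Without the anchors, IsMatch() would also accept a longer string that merely contains a valid value somewhere inside it.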

To verify that an email address is correctly formatted, you can use the following statements with the specified regular expression:

string email = "someone@example.com"; //---hypothetical sample address---
            Regex r = new Regex(@"^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$");
            if (r.IsMatch(email))
                Console.WriteLine("Email address is correct.");
            else
                Console.WriteLine("Email address is incorrect.");

There are many different regular expressions that you can use to validate an email address. However, there is no perfect regular expression to validate all email addresses. For more information on validating email addresses using regular expressions, check out the following web sites: http://regular-expressions.info/email.html and http://fightingforalostcause.net/misc/2006/compare-email-regex.php.
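To illustrate why no single pattern is perfect, the following sketch runs the pattern ^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$ against a few sample addresses. The class name EmailCheck, the method IsWellFormed(), and the sample addresses are hypothetical. Because the pattern allows only two to four characters after the last dot, it rejects an address in a longer top-level domain such as .museum.

```csharp
using System;
using System.Text.RegularExpressions;

static class EmailCheck
{
    // The part after the last dot must be 2 to 4 characters long.
    static readonly Regex Pattern =
        new Regex(@"^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$");

    public static bool IsWellFormed(string email)
    {
        return Pattern.IsMatch(email);
    }

    static void Main()
    {
        Console.WriteLine(IsWellFormed("someone@example.com"));    //---True---
        Console.WriteLine(IsWellFormed("someone@example"));        //---False---

        //---rejected, although .museum is a real top-level domain---
        Console.WriteLine(IsWellFormed("someone@example.museum")); //---False---
    }
}
```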

Summary

String manipulations are common operations, so it's important that you have a good understanding of how they work and the various methods and classes that deal with them. This chapter provided a lot of information about how strings are represented in C# and about using regular expressions to perform matching on strings.
