Chapter 16. Strings and Characters

 

The chief defect of Henry King
Was chewing little bits of string.

 
 --Hilaire Belloc
 

The difference between the almost-right word and the right word is really a large matter—it’s the difference between the lightning bug and the lightning.

 
 --Mark Twain
<feature> <supertitle>Objectives</supertitle>

In this chapter you’ll learn:

<objective>

To create and manipulate immutable character-string objects of class string and mutable character-string objects of class StringBuilder.

</objective>
<objective>

To manipulate character objects of struct Char.

</objective>
<objective>

To use regular-expression classes Regex and Match.

</objective>
<objective>

To iterate through matches to a regular expression.

</objective>
<objective>

To use character classes to match any character from a set of characters.

</objective>
<objective>

To use quantifiers to match a pattern multiple times.

</objective>
<objective>

To search for patterns in text using regular expressions.

</objective>
<objective>

To validate data using regular expressions and LINQ.

</objective>
<objective>

To modify strings using regular expressions and class Regex.

</objective>
</feature>

Introduction

This chapter introduces the .NET Framework Class Library’s string- and character-processing capabilities and demonstrates how to use regular expressions to search for patterns in text. The techniques it presents can be employed in text editors, word processors, page-layout software, computerized typesetting systems and other kinds of text-processing software. Previous chapters presented some basic string-processing capabilities. Now we discuss in detail the text-processing capabilities of class string and type char from the System namespace and class StringBuilder from the System.Text namespace.

We begin with an overview of the fundamentals of characters and strings in which we discuss character constants and string literals. We then provide examples of class string’s many constructors and methods. The examples demonstrate how to determine the length of strings, copy strings, access individual characters in strings, search strings, obtain substrings from larger strings, compare strings, concatenate strings, replace characters in strings and convert strings to uppercase or lowercase letters.

Next, we introduce class StringBuilder, which is used to build strings dynamically. We demonstrate StringBuilder capabilities for determining and specifying the size of a StringBuilder, as well as appending, inserting, removing and replacing characters in a StringBuilder object. We then introduce the character-testing methods of struct Char that enable a program to determine whether a character is a digit, a letter, a lowercase letter, an uppercase letter, a punctuation mark or a symbol other than a punctuation mark. Such methods are useful for validating individual characters in user input. In addition, type Char provides methods for converting a character to uppercase or lowercase.

We provide an online section that discusses regular expressions. We present classes Regex and Match from the System.Text.RegularExpressions namespace as well as the symbols that are used to form regular expressions. We then demonstrate how to find patterns in a string, match entire strings to patterns, replace characters in a string that match a pattern and split strings at delimiters specified as a pattern in a regular expression.

Fundamentals of Characters and Strings

Characters are the fundamental building blocks of C# source code. Every program is composed of characters that, when grouped together meaningfully, create a sequence that the compiler interprets as instructions describing how to accomplish a task. In addition to normal characters, a program also can contain character constants. A character constant is a character that’s represented as an integer value, called a character code. For example, the integer value 122 corresponds to the character constant 'z', and the integer value 10 corresponds to the newline character '\n'. Character constants are established according to the Unicode character set, an international character set that contains many more symbols and letters than does the ASCII character set (listed in Appendix C). To learn more about Unicode, see Appendix F.
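Because a char value is stored as its numeric character code, C# converts between char and int with a cast. The following minimal sketch (not one of the chapter's figures; the class name CharacterCodes and method CodeOf are illustrative) displays the two codes mentioned above and converts a code back to a character.

```csharp
using System;

class CharacterCodes
{
   // returns the Unicode character code of c
   public static int CodeOf( char c )
   {
      return c; // a char converts implicitly to int
   }

   public static void Main( string[] args )
   {
      Console.WriteLine( "'z' has code " + CodeOf( 'z' ) );      // 122
      Console.WriteLine( "newline has code " + CodeOf( '\n' ) ); // 10

      char fromCode = ( char ) 122; // int to char requires an explicit cast
      Console.WriteLine( "code 122 is '" + fromCode + "'" );     // 'z'
   }
}
```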

A string is a series of characters treated as a unit. These characters can be uppercase letters, lowercase letters, digits and various special characters: +, -, *, /, $ and others. A string is an object of class string in the System namespace.[1] We write string literals, also called string constants, as sequences of characters in double quotation marks, as follows:

"John Q. Doe"
"9999 Main Street"
"Waltham, Massachusetts"
"(201) 555-1212"

A declaration can assign a string literal to a string reference. The declaration

string color = "blue";

initializes string reference color to refer to the string literal object "blue".

Performance Tip 16.1

If there are multiple occurrences of the same string literal object in an application, a single copy of it will be referenced from each location in the program that uses that string literal. It’s possible to share the object in this manner, because string literal objects are implicitly constant. Such sharing conserves memory.
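One way to observe this sharing is with method object.ReferenceEquals, which reports whether two references denote the same object. In the following sketch (class and method names are illustrative), two variables initialized with the same literal refer to a single object, whereas a string created at execution time with a constructor is a separate object whose contents are equal.

```csharp
using System;

class LiteralSharing
{
   // true if two variables initialized with the same literal share one object
   public static bool LiteralsShareObject()
   {
      string color1 = "blue";
      string color2 = "blue";
      return object.ReferenceEquals( color1, color2 );
   }

   // true if a constructed string is a separate object with equal contents
   public static bool ConstructedIsSeparateButEqual()
   {
      string color1 = "blue";
      string color3 = new string( new char[] { 'b', 'l', 'u', 'e' } );
      return !object.ReferenceEquals( color1, color3 ) && color1 == color3;
   }

   public static void Main( string[] args )
   {
      Console.WriteLine( LiteralsShareObject() );           // True
      Console.WriteLine( ConstructedIsSeparateButEqual() ); // True
   }
}
```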

On occasion, a string will contain multiple backslash characters (this often occurs in the name of a file). To avoid excessive backslash characters, you can precede a string literal with the @ character, which excludes escape sequences and interprets all the characters in the string literally. Backslashes within the double quotation marks following the @ character are not considered escape sequences, but rather regular backslash characters. Often this simplifies programming and makes the code easier to read. For example, consider the string "C:\MyFolder\MySubFolder\MyFile.txt" in the following assignment, where each backslash must be written as the escape sequence \\:

string file = "C:\\MyFolder\\MySubFolder\\MyFile.txt";

Using the verbatim string syntax, the assignment can be altered to

string file = @"C:\MyFolder\MySubFolder\MyFile.txt";

This approach also has the advantage of allowing string literals to span multiple lines by preserving all newlines, spaces and tabs.
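The following sketch (the class VerbatimStrings and its methods are illustrative) checks that the escaped and verbatim forms of the path denote the same string and builds a two-line verbatim literal. The second line of that literal starts at the left margin because any indentation there would become part of the string. The line break stored in the literal is whatever the source file contains (for example, \r\n on Windows), so the checks below look only for the '\n' character.

```csharp
using System;

class VerbatimStrings
{
   // an ordinary literal needs the escape sequence \\ for each backslash
   public static string EscapedPath()
   {
      return "C:\\MyFolder\\MySubFolder\\MyFile.txt";
   }

   // in a verbatim literal each backslash is an ordinary character
   public static string VerbatimPath()
   {
      return @"C:\MyFolder\MySubFolder\MyFile.txt";
   }

   // the line break inside the verbatim literal becomes part of the string
   public static string TwoLines()
   {
      return @"first line
second line";
   }

   public static void Main( string[] args )
   {
      Console.WriteLine( EscapedPath() == VerbatimPath() ); // True
      Console.WriteLine( VerbatimPath() );
      Console.WriteLine( TwoLines() );
   }
}
```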

string Constructors

Class string provides eight constructors for initializing strings in various ways. Figure 16.1 demonstrates three of the constructors.

Example 16.1. string constructors.

 1   // Fig. 16.1: StringConstructor.cs
 2   // Demonstrating string class constructors.
 3   using System;
 4
 5   class StringConstructor
 6   {
 7      public static void Main( string[] args )
 8      {
 9         // string initialization                             
10         char[] characterArray =                              
11            { 'b', 'i', 'r', 't', 'h', ' ', 'd', 'a', 'y'};   
12         string originalString = "Welcome to C# programming!";
13         string string1 = originalString;                     
14         string string2 = new string( characterArray );       
15         string string3 = new string( characterArray, 6, 3 ); 
16         string string4 = new string( 'C', 5 );               
17                            
18         Console.WriteLine( "string1 = " + "\"" + string1 + "\"\n" +
19            "string2 = " + "\"" + string2 + "\"\n" +
20            "string3 = " + "\"" + string3 + "\"\n" +
21            "string4 = " + "\"" + string4 + "\"\n" );
22      } // end Main
23   } // end class StringConstructor
string1 = "Welcome to C# programming!" 
string2 = "birth day" 
string3 = "day" 
string4 = "CCCCC"

Lines 10–11 allocate the char array characterArray, which contains nine characters. Lines 12–16 declare the strings originalString, string1, string2, string3 and string4. Line 12 assigns string literal "Welcome to C# programming!" to string reference originalString. Line 13 sets string1 to reference the same string literal.

Line 14 assigns to string2 a new string, using the string constructor with a character array argument. The new string contains a copy of the array’s characters.

Line 15 assigns to string3 a new string, using the string constructor that takes a char array and two int arguments. The second argument specifies the starting index position (the offset) from which characters in the array are to be copied. The third argument specifies the number of characters (the count) to be copied from the specified starting position in the array. The new string contains a copy of the specified characters in the array. If the specified offset or count indicates that the program should access an element outside the bounds of the character array, an ArgumentOutOfRangeException is thrown.

Line 16 assigns to string4 a new string, using the string constructor that takes as arguments a character and an int specifying the number of times to repeat that character in the string.
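The following sketch (class StringConstructorBounds and helper TryCreate are illustrative) wraps the offset-and-count constructor so that an out-of-range request yields null instead of an exception, and uses the character-and-count constructor to build a separator line. Returning null is only one possible design; in many programs it's better to validate the offset and count before calling the constructor or to let the exception propagate.

```csharp
using System;

class StringConstructorBounds
{
   // tries new string( chars, offset, count ); returns null if the range is invalid
   public static string TryCreate( char[] chars, int offset, int count )
   {
      try
      {
         return new string( chars, offset, count );
      }
      catch ( ArgumentOutOfRangeException )
      {
         return null; // offset or count reached outside the array
      }
   }

   public static void Main( string[] args )
   {
      char[] characterArray = { 'b', 'i', 'r', 't', 'h', ' ', 'd', 'a', 'y' };

      Console.WriteLine( TryCreate( characterArray, 6, 3 ) );         // day
      Console.WriteLine( TryCreate( characterArray, 6, 4 ) == null ); // True
      Console.WriteLine( new string( '-', 10 ) );                     // ----------
   }
}
```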

Software Engineering Observation 16.1

In most cases, it’s not necessary to make a copy of an existing string. All strings are immutable—their character contents cannot be changed after they’re created. Also, if there are one or more references to a string (or any object for that matter), the object cannot be reclaimed by the garbage collector.

string Indexer, Length Property and CopyTo Method

The application in Fig. 16.2 presents the string indexer, which facilitates the retrieval of any character in the string, and the string property Length, which returns the length of the string. The string method CopyTo copies a specified number of characters from a string into a char array.

Example 16.2. string indexer, Length property and CopyTo method.

 1   // Fig. 16.2: StringMethods.cs
 2   // Using the indexer, property Length and method CopyTo
 3   // of class string.
 4   using System;
 5
 6   class StringMethods
 7   {
 8      public static void Main( string[] args )
 9      {
10         string string1 = "hello there";
11         char[] characterArray = new char[ 5 ];
12
13         // output string1
14         Console.WriteLine( "string1: \"" + string1 + "\"" );
15
16         // test Length property                                     
17         Console.WriteLine( "Length of string1: " + string1.Length );
18
19         // loop through characters in string1 and display reversed
20         Console.Write( "The string reversed is: " );
21
22         for ( int i = string1.Length - 1; i >= 0; i-- )
23            Console.Write( string1[ i ] );
24
25         // copy characters from string1 into characterArray
26         string1.CopyTo( 0, characterArray, 0, characterArray.Length );
27         Console.Write( "\nThe character array is: " );
28
29         for ( int i = 0; i < characterArray.Length; i++ )
30            Console.Write( characterArray[ i ] ); 
31
32         Console.WriteLine( "\n" );
33      } // end Main
34   } // end class StringMethods
string1: "hello there" 
Length of string1: 11 
The string reversed is: ereht olleh 
The character array is: hello

This application determines the length of a string, displays its characters in reverse order and copies a series of characters from the string to a character array. Line 17 uses string property Length to determine the number of characters in string1. Like arrays, strings always know their own size.

Lines 22–23 write the characters of string1 in reverse order using the string indexer. The string indexer treats a string as an array of chars and returns each character at a specific position in the string. The indexer receives an integer argument as the position number and returns the character at that position. As with arrays, the first element of a string is considered to be at position 0.
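Since the indexer accepts any int, code that computes positions often checks them against Length first. This sketch (class SafeIndexing and method CharAtOrNull are hypothetical helpers) returns null for an invalid position, and also shows the exception the indexer itself throws.

```csharp
using System;

class SafeIndexing
{
   // returns the character at position index, or null when index is
   // outside the string (the indexer itself would throw instead)
   public static char? CharAtOrNull( string text, int index )
   {
      if ( index >= 0 && index < text.Length )
         return text[ index ];

      return null;
   }

   public static void Main( string[] args )
   {
      string string1 = "hello there";
      Console.WriteLine( CharAtOrNull( string1, 0 ) );          // h
      Console.WriteLine( CharAtOrNull( string1, 11 ) == null ); // True

      try
      {
         Console.WriteLine( string1[ 11 ] ); // valid positions are 0 through 10
      }
      catch ( IndexOutOfRangeException )
      {
         Console.WriteLine( "position 11 is outside \"hello there\"" );
      }
   }
}
```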

Common Programming Error 16.1

Attempting to access a character that’s outside a string’s bounds results in an IndexOutOfRangeException.

Line 26 uses string method CopyTo to copy the characters of string1 into a character array (characterArray). The first argument given to method CopyTo is the index from which the method begins copying characters in the string. The second argument is the character array into which the characters are copied. The third argument is the index specifying the starting location at which the method begins placing the copied characters into the character array. The last argument is the number of characters that the method will copy from the string. Lines 29–30 output the char array contents one character at a time.
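The source and destination indices of CopyTo need not be 0. In this sketch (class CopyToOffsets and helper CopyPart are illustrative), the first call copies the word "there" out of string1, and the second places "hello" starting at index 2 of a seven-element array that already holds asterisks.

```csharp
using System;

class CopyToOffsets
{
   // copies count characters, starting at sourceIndex, into a new array
   public static char[] CopyPart( string text, int sourceIndex, int count )
   {
      char[] destination = new char[ count ];
      text.CopyTo( sourceIndex, destination, 0, count );
      return destination;
   }

   public static void Main( string[] args )
   {
      string string1 = "hello there";
      Console.WriteLine( new string( CopyPart( string1, 6, 5 ) ) ); // there

      // the third argument places the copied characters later in the array
      char[] padded = { '*', '*', '*', '*', '*', '*', '*' };
      string1.CopyTo( 0, padded, 2, 5 );
      Console.WriteLine( new string( padded ) ); // **hello
   }
}
```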

Comparing strings

The next two examples demonstrate various methods for comparing strings. To understand how one string can be “greater than” or “less than” another, consider the process of alphabetizing a series of last names. The reader would, no doubt, place "Jones" before "Smith", because the first letter of "Jones" comes before the first letter of "Smith" in the alphabet. The alphabet is more than just a set of 26 letters—it’s an ordered list of characters in which each letter occurs in a specific position. For example, Z is more than just a letter of the alphabet; it’s specifically the twenty-sixth letter of the alphabet. Computers can order characters alphabetically because they’re represented internally as Unicode numeric codes.
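To see the numeric codes behind this ordering, you can cast characters to int and use static method string.CompareOrdinal, which compares strings by the codes of their characters. In this sketch the class CharacterOrder and helper OrdinalBefore are illustrative. Note that the culture-sensitive CompareTo method discussed below does not always order characters purely by their codes.

```csharp
using System;

class CharacterOrder
{
   // true if a comes before b when compared code by code
   public static bool OrdinalBefore( string a, string b )
   {
      return string.CompareOrdinal( a, b ) < 0;
   }

   public static void Main( string[] args )
   {
      Console.WriteLine( "'J' = " + ( int ) 'J' );           // 74
      Console.WriteLine( "'S' = " + ( int ) 'S' );           // 83
      Console.WriteLine( OrdinalBefore( "Jones", "Smith" ) ); // True
   }
}
```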

Comparing Strings with Equals, CompareTo and the Equality Operator (==)

Class string provides several ways to compare strings. The application in Fig. 16.3 demonstrates the use of method Equals, method CompareTo and the equality operator (==).

Example 16.3. string test to determine equality.

 1   // Fig. 16.3: StringCompare.cs
 2   // Comparing strings
 3   using System;
 4
 5   class StringCompare
 6   {
 7      public static void Main( string[] args )
 8      {
 9         string string1 = "hello";
10         string string2 = "good bye";
11         string string3 = "Happy Birthday";
12         string string4 = "happy birthday";
13
14         // output values of four strings
15         Console.WriteLine( "string1 = \"" + string1 + "\"" +
16            "\nstring2 = \"" + string2 + "\"" +
17            "\nstring3 = \"" + string3 + "\"" +
18            "\nstring4 = \"" + string4 + "\"\n" );
19
20         // test for equality using Equals method
21         if ( string1.Equals( "hello" ) )
22            Console.WriteLine( "string1 equals \"hello\"" );
23         else
24            Console.WriteLine( "string1 does not equal \"hello\"" );
25
26         // test for equality with ==
27         if ( string1 == "hello" )
28            Console.WriteLine( "string1 equals \"hello\"" );
29         else
30            Console.WriteLine( "string1 does not equal \"hello\"" );
31
32         // test for equality comparing case
33         if ( string.Equals( string3, string4 ) ) // static method
34            Console.WriteLine( "string3 equals string4" );
35         else
36            Console.WriteLine( "string3 does not equal string4" );
37
38         // test CompareTo
39         Console.WriteLine( "\nstring1.CompareTo( string2 ) is " +
40            string1.CompareTo( string2 ) + "\n" +
41            "string2.CompareTo( string1 ) is " +
42            string2.CompareTo( string1 ) + "\n" +
43            "string1.CompareTo( string1 ) is " +
44            string1.CompareTo( string1 ) + "\n" +
45            "string3.CompareTo( string4 ) is " +
46            string3.CompareTo( string4 ) + "\n" +
47            "string4.CompareTo( string3 ) is " +
48            string4.CompareTo( string3 ) + "\n\n" );
49      } // end Main
50   } // end class StringCompare

string1 = "hello" 
string2 = "good bye" 
string3 = "Happy Birthday" 
string4 = "happy birthday"

string1 equals "hello" 
string1 equals "hello" 
string3 does not equal string4

string1.CompareTo( string2 ) is 1 
string2.CompareTo( string1 ) is -1 
string1.CompareTo( string1 ) is 0 
string3.CompareTo( string4 ) is 1 
string4.CompareTo( string3 ) is -1

The condition in line 21 uses string method Equals to compare string1 and literal string "hello" to determine whether they’re equal. Method Equals (inherited from object and overridden in string) tests any two objects for equality (i.e., checks whether the objects have identical contents). The method returns true if the objects are equal and false otherwise. In this case, the condition evaluates to true, because string1 references string literal object "hello". For strings, Equals performs an ordinal comparison: it compares the numeric codes of corresponding characters, so the result does not depend on your system’s currently selected culture. Comparing "hello" with "HELLO" would therefore return false, because the lowercase letters have different character codes than the corresponding uppercase letters.
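When case should not matter, overloads of Equals accept a StringComparison value. The following sketch (the class CaseInsensitiveEquals and helper SameIgnoringCase are illustrative) compares "hello" with "HELLO" both ways.

```csharp
using System;

class CaseInsensitiveEquals
{
   // compares contents, ignoring differences in letter case
   public static bool SameIgnoringCase( string a, string b )
   {
      return string.Equals( a, b, StringComparison.OrdinalIgnoreCase );
   }

   public static void Main( string[] args )
   {
      Console.WriteLine( "hello".Equals( "HELLO" ) );            // False
      Console.WriteLine( SameIgnoringCase( "hello", "HELLO" ) ); // True
   }
}
```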

The condition in line 27 uses the overloaded equality operator (==) to compare string string1 with the literal string "hello" for equality. In C#, the equality operator also compares the contents of two strings. Thus, the condition in the if statement evaluates to true, because the values of string1 and "hello" are equal.

Line 33 tests whether string3 and string4 are equal to illustrate that comparisons are indeed case sensitive. Here, static method Equals is used to compare the values of two strings. "Happy Birthday" does not equal "happy birthday", so the condition of the if statement fails, and the message "string3 does not equal string4" is output (line 36).

Lines 40–48 use string method CompareTo to compare strings. Method CompareTo returns 0 if the strings are equal, a negative value if the string that invokes CompareTo is less than the string that’s passed as an argument and a positive value if the string that invokes CompareTo is greater than the string that’s passed as an argument.

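Notice that CompareTo considers string3 to be greater than string4, even though the only difference between these two strings is that string3 contains two uppercase letters in positions where string4 contains lowercase letters. CompareTo applies culture-sensitive rules, and under the rules of cultures such as English (United States), a lowercase letter sorts before its uppercase equivalent when the strings otherwise match.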
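If you want the ordering determined solely by character codes, independent of culture, static method string.CompareOrdinal is one option. In this sketch (class OrdinalVersusCulture and helper OrdinalSign are illustrative), the ordinal comparison places string3 first because 'H' (code 72) is less than 'h' (code 104). The CompareTo result is printed without a stated value because it depends on the current culture.

```csharp
using System;

class OrdinalVersusCulture
{
   // -1, 0 or 1 according to a code-by-code comparison
   public static int OrdinalSign( string a, string b )
   {
      return Math.Sign( string.CompareOrdinal( a, b ) );
   }

   public static void Main( string[] args )
   {
      string string3 = "Happy Birthday";
      string string4 = "happy birthday";

      // culture-sensitive: result depends on the current culture's rules
      Console.WriteLine( string3.CompareTo( string4 ) );

      // ordinal: 'H' (72) is less than 'h' (104), so string3 comes first
      Console.WriteLine( OrdinalSign( string3, string4 ) ); // -1
   }
}
```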

Determining Whether a String Begins or Ends with a Specified String

Figure 16.4 shows how to test whether a string instance begins or ends with a given string. Method StartsWith determines whether a string instance starts with the string text passed to it as an argument. Method EndsWith determines whether a string instance ends with the string text passed to it as an argument. Class StringStartEnd’s Main method defines an array of strings (called strings), which contains "started", "starting", "ended" and "ending". The remainder of method Main tests the elements of the array to determine whether they start or end with a particular set of characters.

Example 16.4. StartsWith and EndsWith methods.

 1   // Fig. 16.4: StringStartEnd.cs
 2   // Demonstrating StartsWith and EndsWith methods.
 3   using System;
 4
 5   class StringStartEnd
 6   {
 7      public static void Main( string[] args )
 8      {
 9         string[] strings = { "started", "starting", "ended", "ending" };
10
11         // test every string to see if it starts with "st"
12         for ( int i = 0; i < strings.Length; i++ )
13            if ( strings[ i ].StartsWith( "st" ) )
14               Console.WriteLine( "\"" + strings[ i ] + "\"" +
15                  " starts with \"st\"" );
16
17         Console.WriteLine();
18
19         // test every string to see if it ends with "ed"
20         for ( int i = 0; i < strings.Length; i++ )
21            if ( strings[ i ].EndsWith ( "ed" ) )
22               Console.WriteLine( "\"" + strings[ i ] + "\"" +
23                  " ends with \"ed\"" );
24
25         Console.WriteLine();
26      } // end Main
27   } // end class StringStartEnd

"started" starts with "st"
"starting" starts with "st"

"started" ends with "ed"
"ended" ends with "ed"

Line 13 uses method StartsWith, which takes a string argument. The condition in the if statement determines whether the string at index i of the array starts with the characters "st". If so, the method returns true, and strings[i] is output along with a message.

Line 21 uses method EndsWith to determine whether the string at index i of the array ends with the characters "ed". If so, the method returns true, and strings[i] is displayed along with a message.
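The one-argument StartsWith and EndsWith overloads used in Fig. 16.4 are case sensitive and use the current culture. Overloads that take a StringComparison value let you request an ordinal or case-insensitive test, as in this sketch (the class StartsWithComparison and helper StartsWithIgnoringCase are illustrative).

```csharp
using System;

class StartsWithComparison
{
   // true if text begins with prefix, ignoring letter case
   public static bool StartsWithIgnoringCase( string text, string prefix )
   {
      return text.StartsWith( prefix, StringComparison.OrdinalIgnoreCase );
   }

   public static void Main( string[] args )
   {
      string word = "Started";
      Console.WriteLine( word.StartsWith( "st", StringComparison.Ordinal ) ); // False
      Console.WriteLine( StartsWithIgnoringCase( word, "st" ) );             // True
      Console.WriteLine(
         word.EndsWith( "ED", StringComparison.OrdinalIgnoreCase ) );        // True
   }
}
```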

Locating Characters and Substrings in strings

In many applications, it’s necessary to search for a character or set of characters in a string. For example, a programmer creating a word processor would want to provide capabilities for searching through documents. The application in Fig. 16.5 demonstrates some of the many versions of string methods IndexOf, IndexOfAny, LastIndexOf and LastIndexOfAny, which search for a specified character or substring in a string. We perform all searches in this example on the string letters (initialized with "abcdefghijklmabcdefghijklm") located in method Main of class StringIndexMethods.

Example 16.5. Searching for characters and substrings in strings.

 1   // Fig. 16.5: StringIndexMethods.cs
 2   // Using string-searching methods.
 3   using System;
 4
 5   class StringIndexMethods
 6   {
 7      public static void Main( string[] args )
 8      {
 9         string letters = "abcdefghijklmabcdefghijklm";
10         char[] searchLetters = { 'c', 'a', '$' };
11 
12         // test IndexOf to locate a character in a string
13         Console.WriteLine( "First 'c' is located at index " +
14            letters.IndexOf( 'c' ) );
15         Console.WriteLine( "First 'a' starting at 1 is located at index " +
16            letters.IndexOf( 'a', 1 ) );
17         Console.WriteLine( "First '$' in the 5 positions starting at 3 " +
18            "is located at index " + letters.IndexOf('$' , 3 , 5) );
19
20         // test LastIndexOf to find a character in a string
21         Console.WriteLine( "\nLast 'c' is located at index " +
22            letters.LastIndexOf( 'c' ) );
23         Console.WriteLine( "Last 'a' up to position 25 is located at " +
24            "index " + letters.LastIndexOf( 'a', 25 ) );
25         Console.WriteLine( "Last '$' in the 5 positions starting at 15 " +
26            "is located at index " + letters.LastIndexOf( '$', 15, 5 ) );
27
28         // test IndexOf to locate a substring in a string
29         Console.WriteLine( "\nFirst \"def\" is located at index " +
30            letters.IndexOf( "def" ) );
31         Console.WriteLine( "First \"def\" starting at 7 is located at " +
32            "index " + letters.IndexOf( "def", 7 ) );
33         Console.WriteLine( "First \"hello\" in the 15 positions " +
34            "starting at 5 is located at index " +
35            letters.IndexOf( "hello", 5, 15 ) );
36
37         // test LastIndexOf to find a substring in a string
38         Console.WriteLine( "\nLast \"def\" is located at index " +
39            letters.LastIndexOf( "def" ) );
40         Console.WriteLine( "Last \"def\" up to position 25 is located " +
41            "at index " + letters.LastIndexOf( "def", 25 ) );
42         Console.WriteLine( "Last \"hello\" in the 15 positions " +
43            "ending at 20 is located at index " +
44            letters.LastIndexOf( "hello", 20, 15 ) );
45
46         // test IndexOfAny to find first occurrence of character in array
47         Console.WriteLine( "\nFirst 'c', 'a' or '$' is " +
48            "located at index " + letters.IndexOfAny( searchLetters ) );
49         Console.WriteLine("First 'c', 'a' or '$' starting at 7 is " +
50            "located at index " + letters.IndexOfAny( searchLetters, 7 ) );
51         Console.WriteLine( "First 'c', 'a' or '$' in the 5 positions " +
52            "starting at 7 is located at index " +
53            letters.IndexOfAny( searchLetters, 7, 5) );
54
55         // test LastIndexOfAny to find last occurrence of character
56         // in array
57         Console.WriteLine( "\nLast 'c', 'a' or '$' is " +
58            "located at index " + letters.LastIndexOfAny( searchLetters ) );
59         Console.WriteLine( "Last 'c', 'a' or '$' up to position 1 is " +
60            "located at index " +
61            letters.LastIndexOfAny( searchLetters, 1 ) );
62         Console.WriteLine( "Last 'c', 'a' or '$' in the 5 positions " +
63            "ending at 25 is located at index " +
64            letters.LastIndexOfAny( searchLetters, 25, 5 ) );
65      } // end Main
66   } // end class StringIndexMethods

First 'c' is located at index 2 
First 'a' starting at 1 is located at index 13 
First '$' in the 5 positions starting at 3 is located at index -1

Last 'c' is located at index 15 
Last 'a' up to position 25 is located at index 13 
Last '$' in the 5 positions starting at 15 is located at index -1

First "def" is located at index 3 
First "def" starting at 7 is located at index 16 
First "hello" in the 15 positions starting at 5 is located at index -1

Last "def" is located at index 16 
Last "def" up to position 25 is located at index 16 
Last "hello" in the 15 positions ending at 20 is located at index -1

First 'c', 'a' or '$' is located at index 0 
First 'c', 'a' or '$' starting at 7 is located at index 13 
First 'c', 'a' or '$' in the 5 positions starting at 7 is located at index -1

Last 'c', 'a' or '$' is located at index 15 
Last 'c', 'a' or '$' up to position 1 is located at index 0 
Last 'c', 'a' or '$' in the 5 positions ending at 25 is located at index -1

Lines 14, 16 and 18 use method IndexOf to locate the first occurrence of a character or substring in a string. If it finds a character, IndexOf returns the index of the specified character in the string; otherwise, IndexOf returns –1. The expression in line 16 uses a version of method IndexOf that takes two arguments—the character to search for and the starting index at which the search of the string should begin. The method does not examine any characters that occur prior to the starting index (in this case, 1). The expression in line 18 uses another version of method IndexOf that takes three arguments—the character to search for, the index at which to start searching and the number of characters to search.

Lines 22, 24 and 26 use method LastIndexOf to locate the last occurrence of a character in a string. Method LastIndexOf performs the search from the end of the string to the beginning of the string. If it finds the character, LastIndexOf returns the index of the specified character in the string; otherwise, LastIndexOf returns –1. There are three versions of method LastIndexOf. The expression in line 22 uses the version that takes as an argument the character for which to search. The expression in line 24 uses the version that takes two arguments—the character for which to search and the highest index from which to begin searching backward for the character. The expression in line 26 uses a third version of method LastIndexOf that takes three arguments—the character for which to search, the starting index from which to start searching backward and the number of characters (the portion of the string) to search.

Lines 29–44 use versions of IndexOf and LastIndexOf that take a string instead of a character as the first argument. These versions of the methods perform identically to those described above except that they search for sequences of characters (or substrings) that are specified by their string arguments.

Lines 47–64 use methods IndexOfAny and LastIndexOfAny, which take an array of characters as the first argument. These methods also perform identically to those described above, except that IndexOfAny returns the index of the first occurrence of any of the characters in the character-array argument, and LastIndexOfAny returns the index of the last such occurrence.
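Because IndexOf accepts a starting index, calling it repeatedly finds every occurrence of a substring. This sketch (the helper AllIndexesOf is hypothetical) restarts each search one position past the previous match, so overlapping occurrences are also reported; it passes StringComparison.Ordinal so each match is an exact character-by-character comparison.

```csharp
using System;
using System.Collections.Generic;

class FindAllOccurrences
{
   // returns the starting index of every occurrence of target in text
   public static List<int> AllIndexesOf( string text, string target )
   {
      if ( target.Length == 0 )
         throw new ArgumentException( "target must not be empty" );

      List<int> indexes = new List<int>();
      int index = text.IndexOf( target, StringComparison.Ordinal );

      while ( index != -1 )
      {
         indexes.Add( index );
         // resume searching one position past the start of the previous match
         index = text.IndexOf( target, index + 1, StringComparison.Ordinal );
      }

      return indexes;
   }

   public static void Main( string[] args )
   {
      string letters = "abcdefghijklmabcdefghijklm";
      Console.WriteLine( string.Join( ", ", AllIndexesOf( letters, "def" ) ) ); // 3, 16
   }
}
```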

Common Programming Error 16.2

In the overloaded methods LastIndexOf and LastIndexOfAny that take three parameters, the third argument (the count) must not exceed the second argument (the starting index) plus one, because the search examines positions startIndex - count + 1 through startIndex. This might seem counterintuitive, but remember that the search moves from the end of the string toward the start of the string.
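The following sketch (class LastIndexOfWindow and helper SearchWindow are illustrative) searches a few windows of letters with the three-argument LastIndexOf overload. A call with startIndex 0 and count 1 is legal, while startIndex 0 and count 2 would begin the window before index 0 and throws an ArgumentOutOfRangeException.

```csharp
using System;

class LastIndexOfWindow
{
   // LastIndexOf( value, startIndex, count ) examines positions
   // startIndex - count + 1 through startIndex, moving toward the front
   public static int SearchWindow( string text, char value, int startIndex, int count )
   {
      return text.LastIndexOf( value, startIndex, count );
   }

   public static void Main( string[] args )
   {
      string letters = "abcdefghijklmabcdefghijklm";
      Console.WriteLine( SearchWindow( letters, 'm', 25, 5 ) ); // 25 (positions 21-25)
      Console.WriteLine( SearchWindow( letters, 'c', 25, 5 ) ); // -1
      Console.WriteLine( SearchWindow( letters, 'a', 0, 1 ) );  // 0

      try
      {
         SearchWindow( letters, 'a', 0, 2 ); // window would start at position -1
      }
      catch ( ArgumentOutOfRangeException )
      {
         Console.WriteLine( "count exceeds startIndex + 1" );
      }
   }
}
```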

Extracting Substrings from strings

Class string provides two Substring methods, each of which creates and returns a new string by copying part of an existing string. The application in Fig. 16.6 demonstrates the use of both methods.

Example 16.6. Substrings generated from strings.

 1   // Fig. 16.6: SubString.cs
 2   // Demonstrating the string Substring method.
 3   using System;
 4 
 5   class SubString
 6   {
 7      public static void Main( string[] args )
 8      {
 9         string letters = "abcdefghijklmabcdefghijklm";
10
11         // invoke Substring method and pass it one parameter
12         Console.WriteLine( "Substring from index 20 to end is \"" +
13            letters.Substring( 20 ) + "\"" );
14
15         // invoke Substring method and pass it two parameters
16         Console.WriteLine( "Substring from index 0 of length 6 is \"" +
17            letters.Substring( 0, 6 ) + "\"" );
18      } // end method Main
19  } // end class SubString
Substring from index 20 to end is "hijklm" 
Substring from index 0 of length 6 is "abcdef"

The statement in line 13 uses the Substring method that takes one int argument. The argument specifies the starting index from which the method copies characters in the original string. The substring returned contains a copy of the characters from the starting index to the end of the string. If the index specified in the argument is negative or greater than the length of the string, the program throws an ArgumentOutOfRangeException.

The second version of method Substring (line 17) takes two int arguments. The first argument specifies the starting index from which the method copies characters from the original string. The second argument specifies the length of the substring to copy. The substring returned contains a copy of the specified characters from the original string. If the supplied length of the substring is too large (i.e., the substring tries to retrieve characters past the end of the original string), an ArgumentOutOfRangeException is thrown.
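Substring is often combined with the searching methods presented earlier. In this sketch, the hypothetical helper AfterLast uses LastIndexOf to find a separator and Substring to extract what follows it, recovering the file name and extension from the verbatim path used earlier in the chapter. For real file paths, class Path in System.IO provides dedicated methods such as GetFileName and GetExtension.

```csharp
using System;

class SubstringWithSearch
{
   // hypothetical helper: the text after the last occurrence of separator,
   // or the whole text if separator does not occur
   public static string AfterLast( string text, char separator )
   {
      int position = text.LastIndexOf( separator );
      return position == -1 ? text : text.Substring( position + 1 );
   }

   public static void Main( string[] args )
   {
      string file = @"C:\MyFolder\MySubFolder\MyFile.txt";
      Console.WriteLine( AfterLast( file, '\\' ) ); // MyFile.txt
      Console.WriteLine( AfterLast( file, '.' ) );  // txt
      Console.WriteLine( file.Substring( 0, 2 ) );  // C:
   }
}
```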

Concatenating strings

The + operator is not the only way to perform string concatenation. The static method Concat of class string (Fig. 16.7) concatenates two strings and returns a new string containing the combined characters from both original strings. Line 16 appends the characters from string2 to the end of a copy of string1, using method Concat. The statement in line 16 does not modify the original strings.

Example 16.7. Concat static method.

 1   // Fig. 16.7: SubConcatenation.cs
 2   // Demonstrating string class Concat method.
 3   using System;
 4
 5   class StringConcatenation
 6   {
 7      public static void Main( string[] args )
 8      {
 9         string string1 = "Happy ";
10         string string2 = "Birthday";
11
12         Console.WriteLine( "string1 = \"" + string1 + "\"\n" +
13            "string2 = \"" + string2 + "\"" );
14         Console.WriteLine(
15            "\nResult of string.Concat( string1, string2 ) = " +
16            string.Concat( string1, string2 ) );
17         Console.WriteLine( "string1 after concatenation = " + string1 );
18      } // end Main
19  } // end class StringConcatenation
string1 = "Happy " 
string2 = "Birthday"

Result of string.Concat( string1, string2 ) = Happy Birthday 
string1 after concatenation = Happy
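Method Concat also has overloads that take three or four strings and one that takes an array of strings. This sketch (class ConcatOverloads and its helpers Exclaim and JoinAll are illustrative) uses the three-string and array overloads; as in Fig. 16.7, the original strings are not modified.

```csharp
using System;

class ConcatOverloads
{
   // three-string overload of Concat
   public static string Exclaim( string first, string second )
   {
      return string.Concat( first, second, "!" );
   }

   // array overload concatenates any number of strings
   public static string JoinAll( string[] parts )
   {
      return string.Concat( parts );
   }

   public static void Main( string[] args )
   {
      string string1 = "Happy ";
      string string2 = "Birthday";

      Console.WriteLine( Exclaim( string1, string2 ) ); // Happy Birthday!
      Console.WriteLine(
         JoinAll( new string[] { "C", "#", " ", "strings" } ) ); // C# strings
      Console.WriteLine( "string1 is still \"" + string1 + "\"" );
   }
}
```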

Miscellaneous string Methods

Class string provides several methods that return modified copies of strings. The application in Fig. 16.8 demonstrates the use of these methods, which include string methods Replace, ToLower, ToUpper and Trim.

Example 16.8. string methods Replace, ToLower, ToUpper and Trim.

 1   // Fig. 16.8: StringMethods2.cs
 2   // Demonstrating string methods Replace, ToLower, ToUpper
 3   // and Trim.
 4   using System;
 5   
 6   class StringMethods2
 7   {
 8      public static void Main( string[] args )
 9      {
10         string string1 = "cheers!";
11         string string2 = "GOOD BYE ";
12         string string3 = "   spaces  ";
13
14         Console.WriteLine( "string1 = \"" + string1 + "\"\n" +
15            "string2 = \"" + string2 + "\"\n" +
16            "string3 = \"" + string3 + "\"" );
17
18         // call method Replace
19         Console.WriteLine(
20            "\nReplacing \"e\" with \"E\" in string1: \"" +
21            string1.Replace( 'e', 'E' ) + "\"" );
22
23         // call ToLower and ToUpper
24         Console.WriteLine( "\nstring1.ToUpper() = \"" +
25            string1.ToUpper() + "\"\nstring2.ToLower() = \"" +
26            string2.ToLower() + "\"" );
27
28         // call Trim method
29         Console.WriteLine( "\nstring3 after trim = \"" +
30            string3.Trim() + "\"" );
31
32         Console.WriteLine( "\nstring1 = \"" + string1 + "\"" );
33      } // end Main
34   } // end class StringMethods2
string1 = "cheers!" 
string2 = "GOOD BYE " 
string3 = "   spaces  "

Replacing "e" with "E" in string1: "chEErs!" 

string1.ToUpper() = "CHEERS!"
string2.ToLower() = "good bye "

string3 after trim = "spaces"

string1 = "cheers!"

Line 21 uses string method Replace to return a new string, replacing every occurrence in string1 of character 'e' with 'E'. Method Replace takes two arguments—a char for which to search and another char with which to replace all matching occurrences of the first argument. The original string remains unchanged. If there are no occurrences of the first argument in the string, the method returns the original string. An overloaded version of this method allows you to provide two strings as arguments.

The string method ToUpper generates a new string (line 25) that replaces any lowercase letters in string1 with their uppercase equivalents. The method returns a new string containing the converted string; the original string remains unchanged. If there are no characters to convert, the original string is returned. Line 26 uses string method ToLower to return a new string in which any uppercase letters in string2 are replaced by their lowercase equivalents. The original string is unchanged. As with ToUpper, if there are no characters to convert to lowercase, method ToLower returns the original string.

Line 30 uses string method Trim to remove all whitespace characters that appear at the beginning and end of a string. The method returns a new string that contains the original string's characters minus any leading and trailing whitespace; the original string is unchanged. This method is particularly useful for processing user input (e.g., text entered in a TextBox). Another version of method Trim takes a character array and returns a copy of the string that does not begin or end with any of the characters in the array argument.
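The two overloads mentioned above, Replace with two string arguments and Trim with a character array, can be exercised as follows. This is a minimal sketch. The class name TrimReplaceSketch, its helper methods and the sample strings are hypothetical. The sketch also uses the related methods TrimStart and TrimEnd, which trim only one end of the string.

```csharp
// A sketch of the string overload of Replace and the char[] overload of Trim;
// TrimReplaceSketch and its methods are hypothetical names.
using System;

public static class TrimReplaceSketch
{
   // replaces every occurrence of oldValue; the source string is not modified
   public static string ReplaceWord( string text, string oldValue, string newValue )
   {
      return text.Replace( oldValue, newValue );
   }

   // removes leading and trailing characters that appear in trimChars
   public static string StripDecoration( string text )
   {
      char[] trimChars = { '*', '-', ' ' };
      return text.Trim( trimChars );
   }

   public static void Main( string[] args )
   {
      string original = "good bye, good luck";
      Console.WriteLine( ReplaceWord( original, "good", "GOOD" ) ); // GOOD bye, GOOD luck
      Console.WriteLine( original ); // good bye, good luck (unchanged)

      Console.WriteLine( "\"" + StripDecoration( "** -- Title -- **" ) + "\"" ); // "Title"

      // TrimStart and TrimEnd remove whitespace from only one end
      string padded = "   spaces  ";
      Console.WriteLine( "\"" + padded.TrimStart() + "\"" ); // "spaces  "
      Console.WriteLine( "\"" + padded.TrimEnd() + "\"" );   // "   spaces"
   }
}
```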

Class StringBuilder

The string class provides many capabilities for processing strings. However, a string’s contents can never change. Operations that seem to concatenate strings in fact create new strings and assign references to them (e.g., the statement s += t creates a new string containing the characters of s followed by those of t, then assigns a reference to that new string to s).

The next several sections discuss the features of class StringBuilder (namespace System.Text), which is used to create and manipulate dynamic string information (i.e., mutable strings). Every StringBuilder can store a certain number of characters that’s specified by its capacity. Exceeding the capacity of a StringBuilder causes the capacity to expand to accommodate the additional characters. As we’ll see, members of class StringBuilder, such as methods Append and AppendFormat, can be used for concatenation like the operators + and += for class string. StringBuilder is particularly useful when a program modifies a string many times, because changing a StringBuilder in place is much more efficient than creating a new immutable string for every intermediate result.
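The following sketch builds the same text in two ways: with repeated += on a string and with StringBuilder.Append. Both methods return identical strings. The difference is that the += loop creates a new string object on every iteration, while the StringBuilder grows one internal buffer. The class and method names are hypothetical, and no particular timing is claimed. The sketch also assumes that preallocating the capacity with the int constructor is worthwhile when the final length is known.

```csharp
// A sketch comparing repeated += with StringBuilder.Append;
// BuildSketch and its methods are hypothetical names.
using System;
using System.Text;

public static class BuildSketch
{
   // each += creates a new string and copies all existing characters into it
   public static string BuildWithConcatenation( int count )
   {
      string result = "";
      for ( int i = 0; i < count; i++ )
         result += i % 10; // the int is converted to its string form
      return result;
   }

   // the StringBuilder appends into its buffer; capacity is reserved up front
   public static string BuildWithStringBuilder( int count )
   {
      StringBuilder buffer = new StringBuilder( count );
      for ( int i = 0; i < count; i++ )
         buffer.Append( i % 10 );
      return buffer.ToString();
   }

   public static void Main( string[] args )
   {
      string a = BuildWithConcatenation( 25 );
      string b = BuildWithStringBuilder( 25 );
      Console.WriteLine( a );      // 0123456789012345678901234
      Console.WriteLine( a == b ); // True
   }
}
```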

Performance Tip 16.2

Objects of class string are immutable (i.e., constant strings), whereas objects of class StringBuilder are mutable. C# can perform certain optimizations involving strings (such as the sharing of one string among multiple references), because it knows these objects will not change.

Class StringBuilder provides six overloaded constructors. Class StringBuilderConstructor (Fig. 16.9) demonstrates three of these overloaded constructors.

Example 16.9. StringBuilder class constructors.

 1   // Fig. 16.9: StringBuilderConstructor.cs
 2   // Demonstrating StringBuilder class constructors.
 3   using System;
 4   using System.Text;
 5   
 6   class StringBuilderConstructor
 7   {
 8      public static void Main( string[] args )
 9      {
10         StringBuilder buffer1 = new StringBuilder();         
11         StringBuilder buffer2 = new StringBuilder( 10 );     
12         StringBuilder buffer3 = new StringBuilder( "hello" );
13
14         Console.WriteLine( "buffer1 = \"" + buffer1 + "\"" );
15         Console.WriteLine( "buffer2 = \"" + buffer2 + "\"" );
16         Console.WriteLine( "buffer3 = \"" + buffer3 + "\"" );
17      } // end Main
18   } // end class StringBuilderConstructor

buffer1 = "" 
buffer2 = "" 
buffer3 = "hello"

Line 10 employs the no-parameter StringBuilder constructor to create a StringBuilder that contains no characters and has an implementation-specific default initial capacity. Line 11 uses the StringBuilder constructor that takes an int argument to create a StringBuilder that contains no characters and has the initial capacity specified in the int argument (i.e., 10). Line 12 uses the StringBuilder constructor that takes a string argument to create a StringBuilder containing the characters of the string argument. Lines 14–16 implicitly use StringBuilder method ToString to obtain string representations of the StringBuilders’ contents.
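Fig. 16.9 displays only the contents of each StringBuilder. The sketch below also reports Length and Capacity. It adds a fourth constructor, which takes both an initial string and a capacity. Because the default capacity is implementation specific, the sketch only displays it and makes no assumption about its value. CapacitySketch and Describe are hypothetical names.

```csharp
// A sketch that reports Length and Capacity after each StringBuilder constructor;
// CapacitySketch and Describe are hypothetical names.
using System;
using System.Text;

public static class CapacitySketch
{
   public static string Describe( StringBuilder buffer )
   {
      return "Length = " + buffer.Length + ", Capacity = " + buffer.Capacity;
   }

   public static void Main( string[] args )
   {
      StringBuilder buffer1 = new StringBuilder();              // default capacity
      StringBuilder buffer2 = new StringBuilder( 10 );          // capacity of 10
      StringBuilder buffer3 = new StringBuilder( "hello" );     // holds 5 characters
      StringBuilder buffer4 = new StringBuilder( "hello", 50 ); // contents plus capacity

      // the default capacity is implementation specific, so it's displayed, not assumed
      Console.WriteLine( "buffer1: " + Describe( buffer1 ) );
      Console.WriteLine( "buffer2: " + Describe( buffer2 ) ); // Length = 0, Capacity = 10
      Console.WriteLine( "buffer3: " + Describe( buffer3 ) );
      Console.WriteLine( "buffer4: " + Describe( buffer4 ) );
   }
}
```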

Length and Capacity Properties, EnsureCapacity Method and Indexer of Class StringBuilder

Class StringBuilder provides the Length and Capacity properties to return the number of characters currently in a StringBuilder and the number of characters that a StringBuilder can store without allocating more memory, respectively. These properties also can increase or decrease the length or the capacity of the StringBuilder. Method EnsureCapacity allows you to reduce the number of times that a StringBuilder’s capacity must be increased. The method ensures that the StringBuilder’s capacity is at least the specified value. The program in Fig. 16.10 demonstrates these methods and properties.

Example 16.10. StringBuilder size manipulation.

 1   // Fig. 16.10: StringBuilderFeatures.cs
 2   // Demonstrating some features of class StringBuilder.
 3   using System;
 4   using System.Text;
 5 
 6   class StringBuilderFeatures
 7   {
 8      public static void Main( string[] args )
 9      {
10         StringBuilder buffer =                        
11            new StringBuilder( "Hello, how are you?" );
12
13         // use Length and Capacity properties
14         Console.WriteLine( "buffer = " + buffer +
15            "\nLength = " + buffer.Length +
16            "\nCapacity = " + buffer.Capacity );
17
18         buffer.EnsureCapacity( 75 ); // ensure a capacity of at least 75
19         Console.WriteLine( "\nNew capacity = " +
20            buffer.Capacity );
21
22         // truncate StringBuilder by setting Length property
23         buffer.Length = 10;
24         Console.Write( "\nNew length = " +
25            buffer.Length + "\nbuffer = " );
26
27         // use StringBuilder indexer
28         for ( int i = 0; i < buffer.Length; i++ )
29            Console.Write( buffer[ i ] );
30
31         Console.WriteLine( "\n" );
32      } // end Main
33   } // end class StringBuilderFeatures
buffer = Hello, how are you?
Length = 19
Capacity = 19

New capacity = 75

New length = 10
buffer = Hello, how

The program contains one StringBuilder, called buffer. Lines 10–11 of the program use the StringBuilder constructor that takes a string argument to instantiate the StringBuilder and initialize its value to "Hello, how are you?". Lines 14–16 output the content, length and capacity of the StringBuilder.

Line 18 expands the capacity of the StringBuilder to a minimum of 75 characters. If new characters are added to a StringBuilder so that its length exceeds its capacity, the capacity grows to accommodate the additional characters in the same manner as if method EnsureCapacity had been called.

Line 23 uses property Length to set the length of the StringBuilder to 10. If the specified length is less than the current number of characters in the StringBuilder, the contents of the StringBuilder are truncated to the specified length. If the specified length is greater than the number of characters currently in the StringBuilder, null characters ('\0') are appended to the StringBuilder until the total number of characters in the StringBuilder is equal to the specified length. Lines 28–29 then use the StringBuilder indexer to access each character of the truncated buffer and display it.
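The sketch below demonstrates both effects of setting Length: truncation, and padding with '\0'. It also shows that the StringBuilder indexer, unlike the read-only string indexer, can assign characters. In addition, it uses the int returned by EnsureCapacity, which is the resulting capacity. LengthSketch, Resize and Capitalize are hypothetical names chosen for illustration.

```csharp
// A sketch of setting Length and writing through the indexer;
// LengthSketch and its methods are hypothetical names.
using System;
using System.Text;

public static class LengthSketch
{
   // shortens or lengthens the buffer; new positions are filled with '\0'
   public static StringBuilder Resize( string text, int newLength )
   {
      StringBuilder buffer = new StringBuilder( text );
      buffer.Length = newLength;
      return buffer;
   }

   // the StringBuilder indexer is read/write, so a character can be replaced in place
   public static string Capitalize( string text )
   {
      StringBuilder buffer = new StringBuilder( text );
      if ( buffer.Length > 0 )
         buffer[ 0 ] = Char.ToUpper( buffer[ 0 ] );
      return buffer.ToString();
   }

   public static void Main( string[] args )
   {
      Console.WriteLine( Resize( "Hello, how are you?", 10 ) ); // Hello, how

      StringBuilder longer = Resize( "abc", 5 );
      Console.WriteLine( longer.Length );       // 5
      Console.WriteLine( longer[ 4 ] == '\0' ); // True

      Console.WriteLine( Capitalize( "hello" ) ); // Hello

      // EnsureCapacity returns the capacity after the call
      Console.WriteLine( longer.EnsureCapacity( 75 ) >= 75 ); // True
   }
}
```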

Append and AppendFormat Methods of Class StringBuilder

Class StringBuilder provides 19 overloaded Append methods that allow various types of values to be added to the end of a StringBuilder. The Framework Class Library provides versions for each of the simple types and for character arrays, strings and objects. (Remember that method ToString produces a string representation of any object.) Each method takes an argument, converts it to a string and appends it to the StringBuilder. Figure 16.11 demonstrates the use of several Append methods.

Example 16.11. Append methods of StringBuilder.

 1   // Fig. 16.11: StringBuilderAppend.cs
 2   // Demonstrating StringBuilder Append methods.
 3   using System;
 4   using System.Text;
 5
 6   class StringBuilderAppend
 7   {
 8      public static void Main( string[] args )
 9      {
10         object objectValue = "hello";
11         string stringValue = "good bye";
12         char[] characterArray = { 'a', 'b', 'c', 'd', 'e', 'f' };
13         bool booleanValue = true;
14         char characterValue = 'Z'; 
15         int integerValue = 7;
16         long longValue = 1000000;
17         float floatValue = 2.5F; // F suffix indicates that 2.5 is a float
18         double doubleValue = 33.333;
19         StringBuilder buffer = new StringBuilder();
20
21         // use method Append to append values to buffer
22         buffer.Append( objectValue );                  
23         buffer.Append( "  " );                         
24         buffer.Append( stringValue );                  
25         buffer.Append( "  " );                         
26         buffer.Append( characterArray );               
27         buffer.Append( "  " );                         
28         buffer.Append( characterArray, 0, 3 );         
29         buffer.Append( "  " );                         
30         buffer.Append( booleanValue );                 
31         buffer.Append( "  " );                         
32         buffer.Append( characterValue );               
33         buffer.Append( "  " );                         
34         buffer.Append( integerValue );                 
35         buffer.Append( "  " );                         
36         buffer.Append( longValue );                    
37         buffer.Append( "  " );                         
38         buffer.Append( floatValue );                   
39         buffer.Append( "  " );                         
40         buffer.Append( doubleValue );                  
41
42         Console.WriteLine( "buffer = " + buffer.ToString() + "\n" );
43      } // end Main
44   } // end class StringBuilderAppend

buffer = hello  good bye  abcdef  abc  True  Z  7  1000000  2.5  33.333

Lines 22–40 use 10 different overloaded Append methods to attach the string representations of objects created in lines 10–18 to the end of the StringBuilder.
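Each Append overload returns a reference to the StringBuilder it was called on. Because of this, the separate statements in lines 22–40 could also be written as one chain of calls. The sketch below (AppendChainSketch and Describe are hypothetical names) chains Append and the related AppendLine method. It formats the double with the invariant culture as a design choice, because Append( double ) uses the current culture, which may print a decimal comma instead of a period.

```csharp
// A sketch of chaining Append calls; AppendChainSketch and Describe are hypothetical names.
using System;
using System.Globalization;
using System.Text;

public static class AppendChainSketch
{
   public static string Describe( string name, int quantity, double price )
   {
      StringBuilder buffer = new StringBuilder();

      // each Append returns buffer itself, so the calls form one chain;
      // the price is formatted with the invariant culture so '.' is the separator
      buffer.Append( name ).Append( ": " ).Append( quantity )
         .Append( " at " )
         .Append( price.ToString( "F2", CultureInfo.InvariantCulture ) );

      return buffer.ToString();
   }

   public static void Main( string[] args )
   {
      Console.WriteLine( Describe( "widget", 3, 2.5 ) ); // widget: 3 at 2.50

      // AppendLine appends its argument followed by Environment.NewLine
      StringBuilder lines = new StringBuilder();
      lines.AppendLine( "first" ).AppendLine( "second" );
      Console.Write( lines.ToString() );
   }
}
```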

Class StringBuilder also provides method AppendFormat, which formats its arguments according to a format string, then appends the resulting string to the StringBuilder. The example in Fig. 16.12 demonstrates the use of this method.

Example 16.12. StringBuilder’s AppendFormat method.

 1   // Fig. 16.12: StringBuilderAppendFormat.cs
 2   // Demonstrating method AppendFormat.
 3   using System;
 4   using System.Text;
 5   
 6   class StringBuilderAppendFormat
 7   {
 8      public static void Main( string[] args )
 9      {
10         StringBuilder buffer = new StringBuilder();
11
12         // formatted string                         
13         string string1 = "This {0} costs: {1:C}.\n";
14
15         // string1 argument array              
16         object[] objectArray = new object[ 2 ];
17
18         objectArray[ 0 ] = "car";  
19         objectArray[ 1 ] = 1234.56;
20
21         // append to buffer formatted string with argument
22         buffer.AppendFormat( string1, objectArray );      
23
24         // formatted string
25         string string2 = "Number:{0:d3}.\n" +
26            "Number right aligned with spaces:{0, 4}.\n" +
27            "Number left aligned with spaces:{0, -4}.";
28
29         // append to buffer formatted string with argument
30         buffer.AppendFormat( string2, 5 );
31
32         // display formatted strings
33         Console.WriteLine( buffer.ToString() );
34      } // end Main
35   } // end class StringBuilderAppendFormat
This car costs: $1,234.56.
Number:005.
Number right aligned with spaces:   5.
Number left aligned with spaces:5   .

Line 13 creates a string that contains formatting information. The information enclosed in braces specifies how to format a specific piece of data. Formats have the form {X[,Y][:FormatString]}, where X is the number of the argument to be formatted, counting from zero. Y is an optional positive or negative integer that specifies the minimum number of characters in the result. If the formatted value has fewer characters than the absolute value of Y, it’s padded with spaces to make up the difference; a positive Y aligns the value to the right, and a negative Y aligns it to the left. The optional FormatString applies a particular format to the argument, such as currency, decimal or scientific. In this case, “{0}” means that the first argument is inserted as is, and “{1:C}” specifies that the second argument is formatted as a currency value.

Line 22 shows a version of AppendFormat that takes two parameters—a string specifying the format and an array of objects to serve as the arguments to the format string. The argument referred to by “{0}” is in the object array at index 0.

Lines 25–27 define another string used for formatting. The first format, “{0:d3}”, specifies that the first argument will be formatted as a three-digit decimal, so any number with fewer than three digits receives leading zeros to make up the difference. The next format, “{0, 4}”, specifies that the formatted value should occupy four characters and be right aligned. The third format, “{0, -4}”, specifies a four-character field in which the value is left aligned.
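The format items described above follow the same rules in string.Format as in AppendFormat. The sketch below wraps each result in brackets so that the padding is visible. It also shows that the alignment value is a minimum width: a longer value is not truncated. To keep the output independent of the machine's culture, it passes CultureInfo.InvariantCulture as the format provider. FormatSketch and Show are hypothetical names.

```csharp
// A sketch of format items {index[,alignment][:formatString]};
// FormatSketch and Show are hypothetical names.
using System;
using System.Globalization;
using System.Text;

public static class FormatSketch
{
   // formats one value with the invariant culture
   public static string Show( string format, object value )
   {
      return string.Format( CultureInfo.InvariantCulture, format, value );
   }

   public static void Main( string[] args )
   {
      Console.WriteLine( Show( "[{0:d3}]", 5 ) );        // [005]
      Console.WriteLine( Show( "[{0,4}]", 5 ) );         // [   5]
      Console.WriteLine( Show( "[{0,-4}]", 5 ) );        // [5   ]
      Console.WriteLine( Show( "[{0,8:F2}]", 1234.5 ) ); // [ 1234.50]

      // alignment is a minimum width; longer results are not truncated
      Console.WriteLine( Show( "[{0,2}]", 12345 ) );     // [12345]

      // AppendFormat accepts the same format items
      StringBuilder buffer = new StringBuilder();
      buffer.AppendFormat( CultureInfo.InvariantCulture, "{0}|{1,3}|", "car", 7 );
      Console.WriteLine( buffer ); // car|  7|
   }
}
```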

Line 30 uses a version of AppendFormat that takes two parameters—a string containing a format and an object to which the format is applied. In this case, the object is the number 5. The output of Fig. 16.12 displays the result of applying these two versions of AppendFormat with their respective arguments.

Insert, Remove and Replace Methods of Class StringBuilder

Class StringBuilder provides 18 overloaded Insert methods to allow various types of data to be inserted at any position in a StringBuilder. The class provides versions for each of the simple types and for character arrays, strings and objects. Each method takes its second argument, converts it to a string and inserts the string into the StringBuilder in front of the character at the position specified by the first argument. The index specified by the first argument must be greater than or equal to 0 and less than or equal to the length of the StringBuilder (an index equal to the length inserts at the end); otherwise, the method throws an ArgumentOutOfRangeException.

Class StringBuilder also provides method Remove for deleting any portion of a StringBuilder. Method Remove takes two arguments: the index at which to begin deletion and the number of characters to delete. Neither argument may be negative, and the sum of the starting index and the number of characters to be deleted must not exceed the length of the StringBuilder; otherwise, the method throws an ArgumentOutOfRangeException. The Insert and Remove methods are demonstrated in Fig. 16.13.

Example 16.13. StringBuilder text insertion and removal.

 1   // Fig. 16.13: StringBuilderInsertRemove.cs
 2   // Demonstrating methods Insert and Remove of the
 3   // StringBuilder class.
 4   using System;
 5   using System.Text;
 6
 7   class StringBuilderInsertRemove
 8   {
 9      public static void Main( string[] args )
10      {
11         object objectValue = "hello";
12         string stringValue = "good bye";
13         char[] characterArray = { 'a', 'b', 'c', 'd', 'e', 'f' };
14         bool booleanValue = true;
15         char characterValue = 'K';
16         int integerValue = 7;
17         long longValue = 10000000;
18         float floatValue = 2.5F; // F suffix indicates that 2.5 is a float
19         double doubleValue = 33.333;
20         StringBuilder buffer = new StringBuilder();
21 
22         // insert values into buffer
23         buffer.Insert( 0, objectValue );   
24         buffer.Insert( 0, "  " );          
25         buffer.Insert( 0, stringValue );   
26         buffer.Insert( 0, "  " );          
27         buffer.Insert( 0, characterArray );
28         buffer.Insert( 0, "  " );          
29         buffer.Insert( 0, booleanValue );  
30         buffer.Insert( 0, "  " );          
31         buffer.Insert( 0, characterValue );
32         buffer.Insert( 0, "  " );          
33         buffer.Insert( 0, integerValue );  
34         buffer.Insert( 0, "  " );          
35         buffer.Insert( 0, longValue );     
36         buffer.Insert( 0, "  " );          
37         buffer.Insert( 0, floatValue );    
38         buffer.Insert( 0, "  " );          
39         buffer.Insert( 0, doubleValue );   
40         buffer.Insert( 0, "  " );          
41
42         Console.WriteLine( "buffer after Inserts: \n" + buffer + "\n" );
43
44         buffer.Remove( 10, 1 ); // delete 2 in 2.5      
45         buffer.Remove( 4, 4 );  // delete .333 in 33.333
46
47         Console.WriteLine( "buffer after Removes:\n" + buffer );
48      } // end Main
49   } // end class StringBuilderInsertRemove
buffer after Inserts:
  33.333  2.5  10000000  7  K  True  abcdef  good bye  hello

buffer after Removes:
  33  .5  10000000  7  K  True  abcdef  good bye  hello
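The index limits for Insert and Remove are easy to get wrong at the boundary. The sketch below checks them directly. Inserting at an index equal to the length appends the value, and a Remove whose start index plus count equals the length is allowed. One position further in either case throws an ArgumentOutOfRangeException. BoundsSketch, TryInsert and TryRemove are hypothetical names, and catching the exception only to return its name is purely a demonstration device.

```csharp
// A sketch of the index limits for StringBuilder Insert and Remove;
// BoundsSketch and its methods are hypothetical names.
using System;
using System.Text;

public static class BoundsSketch
{
   // returns the resulting text, or the exception name if the call is rejected
   public static string TryInsert( string text, int index, string value )
   {
      try
      {
         return new StringBuilder( text ).Insert( index, value ).ToString();
      }
      catch ( ArgumentOutOfRangeException )
      {
         return "ArgumentOutOfRangeException";
      }
   }

   public static string TryRemove( string text, int startIndex, int length )
   {
      try
      {
         return new StringBuilder( text ).Remove( startIndex, length ).ToString();
      }
      catch ( ArgumentOutOfRangeException )
      {
         return "ArgumentOutOfRangeException";
      }
   }

   public static void Main( string[] args )
   {
      Console.WriteLine( TryInsert( "abc", 3, "d" ) ); // abcd (index == Length appends)
      Console.WriteLine( TryInsert( "abc", 4, "d" ) ); // ArgumentOutOfRangeException
      Console.WriteLine( TryRemove( "abcd", 2, 2 ) );  // ab (2 + 2 == Length is allowed)
      Console.WriteLine( TryRemove( "abcd", 2, 3 ) );  // ArgumentOutOfRangeException
   }
}
```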

Another useful method included with StringBuilder is Replace. Replace searches for a specified string or character and substitutes another string or character in its place. Figure 16.14 demonstrates this method.

Example 16.14. StringBuilder text replacement.

 1   // Fig. 16.14: StringBuilderReplace.cs
 2   // Demonstrating method Replace.
 3   using System;
 4   using System.Text;
 5
 6   class StringBuilderReplace
 7   {
 8      public static void Main( string[] args ) 
 9      {
10         StringBuilder builder1 =
11            new StringBuilder( "Happy Birthday Jane" );
12         StringBuilder builder2 = 
13            new StringBuilder( "good bye greg" );
14
15         Console.WriteLine( "Before replacements:\n" +
16            builder1.ToString() + "\n" + builder2.ToString() );
17
18         builder1.Replace( "Jane", "Greg" );
19         builder2.Replace( 'g', 'G', 0, 5 );
20
21         Console.WriteLine( "\nAfter replacements:\n" +
22            builder1.ToString() + "\n" + builder2.ToString() );
23      } // end Main
24   } // end class StringBuilderReplace
Before replacements:
Happy Birthday Jane 
good bye greg

After replacements: 
Happy Birthday Greg 
Good bye greg

Line 18 uses method Replace to replace all instances of "Jane" with "Greg" in builder1. Another overload of this method takes two characters as parameters and replaces each occurrence of the first character with the second. Line 19 uses an overload of Replace that takes four parameters, of which the first two are characters and the second two are ints. The method replaces all instances of the first character with the second character, beginning at the index specified by the first int and continuing for a count specified by the second int. Thus, in this case, Replace examines only five characters, starting with the character at index 0. As the output illustrates, this version of Replace replaces g with G in the word "good", but not in "greg", because the g characters in "greg" are not in the range indicated by the int arguments (i.e., indexes 0 through 4).
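StringBuilder also has a ranged overload for strings, Replace( string, string, int, int ), which follows the same start-index and count rules as the character version in line 19. In the following sketch, the same text is processed with two different ranges, so a different occurrence of "good" is replaced each time. ReplaceRangeSketch and ReplaceInRange are hypothetical names.

```csharp
// A sketch of the ranged string overload of StringBuilder.Replace;
// ReplaceRangeSketch and ReplaceInRange are hypothetical names.
using System;
using System.Text;

public static class ReplaceRangeSketch
{
   // replaces oldValue with newValue only within startIndex..startIndex + count - 1
   public static string ReplaceInRange( string text, string oldValue, string newValue,
      int startIndex, int count )
   {
      return new StringBuilder( text )
         .Replace( oldValue, newValue, startIndex, count ).ToString();
   }

   public static void Main( string[] args )
   {
      string text = "good bye good luck";

      // indexes 0 to 4 contain only the first "good"
      Console.WriteLine( ReplaceInRange( text, "good", "GOOD", 0, 5 ) ); // GOOD bye good luck

      // indexes 9 to 17 contain only the second "good"
      Console.WriteLine( ReplaceInRange( text, "good", "GOOD", 9, 9 ) ); // good bye GOOD luck
   }
}
```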

Char Methods

C# provides a concept called a struct (short for “structure”) that’s similar to a class. Although structs and classes are comparable, structs represent value types. Like classes, structs can have methods and properties, and can use the access modifiers public and private. Also, struct members are accessed via the member access operator (.).

The simple types are actually aliases for struct types. For instance, an int is defined by struct System.Int32, a long by System.Int64 and so on. All struct types derive from class ValueType, which derives from object. Also, all struct types are implicitly sealed, so they do not support virtual or abstract methods, and their members cannot be declared protected or protected internal.
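The properties of structs described above can be observed directly. In the sketch below, the struct Point and the class StructSketch are hypothetical. Assigning a struct variable copies the entire value, so changing the copy leaves the original untouched. Reflection confirms that int and System.Int32 are the same type, that Int32 derives from ValueType and that it is sealed. Point also shows that a struct, although it cannot declare virtual methods, can still override a method it inherits from object.

```csharp
// A sketch of struct value semantics and simple-type aliases;
// Point and StructSketch are hypothetical names.
using System;

public struct Point
{
   public int X;
   public int Y;

   public Point( int x, int y )
   {
      X = x;
      Y = y;
   }

   // a struct can override methods inherited from object, such as ToString
   public override string ToString()
   {
      return "(" + X + ", " + Y + ")";
   }
}

public static class StructSketch
{
   // assignment copies the whole value, so changing the copy leaves the original alone
   public static string CopyDemo()
   {
      Point original = new Point( 1, 2 );
      Point copy = original;
      copy.X = 99;
      return original + " " + copy;
   }

   public static void Main( string[] args )
   {
      Console.WriteLine( CopyDemo() ); // (1, 2) (99, 2)

      // int is an alias for System.Int32, a sealed struct derived from ValueType
      Console.WriteLine( typeof( int ) == typeof( Int32 ) );              // True
      Console.WriteLine( typeof( int ).BaseType == typeof( ValueType ) ); // True
      Console.WriteLine( typeof( int ).IsSealed );                        // True
      Console.WriteLine( typeof( Point ).IsValueType );                   // True
   }
}
```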

In the struct Char,[2] which is the struct for characters, most methods are static, take at least one character argument and perform either a test or a manipulation on the character. We present several of these methods in the next example. Figure 16.15 demonstrates static methods that test characters to determine whether they’re of a specific character type and static methods that perform case conversions on characters.

Example 16.15. Char’s static character-testing and case-conversion methods.

 1   // Fig. 16.15: StaticCharMethods.cs
 2   // Demonstrates static character-testing and case-conversion methods
 3   // from Char struct
 4   using System;
 5
 6   class StaticCharMethods
 7   {
 8      static void Main( string[] args )
 9      {
10         Console.Write( "Enter a character: " );
11         char character = Convert.ToChar( Console.ReadLine() );
12
13         Console.WriteLine( "is digit: {0}", Char.IsDigit( character ) );
14         Console.WriteLine( "is letter: {0}", Char.IsLetter( character ) );
15         Console.WriteLine( "is letter or digit: {0}",
16            Char.IsLetterOrDigit( character )  );
17         Console.WriteLine( "is lower case: {0}",
18            Char.IsLower( character ) );
19         Console.WriteLine( "is upper case: {0}",
20            Char.IsUpper( character ) );
21         Console.WriteLine( "to upper case: {0}",
22            Char.ToUpper( character ) );
23         Console.WriteLine( "to lower case: {0}",
24            Char.ToLower( character ) );
25         Console.WriteLine( "is punctuation: {0}",
26            Char.IsPunctuation( character ) );
27         Console.WriteLine( "is symbol: {0}", Char.IsSymbol( character ) );
28      } // end Main
29   } // end class StaticCharMethods
Enter a character: A
is digit: False 
is letter: True 
is letter or digit: True 
is lower case: False 
is upper case: True 
to upper case: A 
to lower case: a 
is punctuation: False 
is symbol: False
Enter a character: 8
is digit: True 
is letter: False 
is letter or digit: True 
is lower case: False 
is upper case: False 
to upper case: 8 
to lower case: 8 
is punctuation: False 
is symbol: False
Enter a character: @
is digit: False 
is letter: False 
is letter or digit: False 
is lower case: False 
is upper case: False 
to upper case: @ 
to lower case: @ 
is punctuation: True 
is symbol: False
Enter a character: m
is digit: False 
is letter: True 
is letter or digit: True 
is lower case: True 
is upper case: False 
to upper case: M 
to lower case: m 
is punctuation: False 
is symbol: False
Enter a character: +
is digit: False 
is letter: False 
is letter or digit: False 
is lower case: False 
is upper case: False 
to upper case: + 
to lower case: + 
is punctuation: False 
is symbol: True

After the user enters a character, lines 13–27 analyze it. Line 13 uses Char method IsDigit to determine whether the char variable character is defined as a digit. If so, the method returns true; otherwise, it returns false (note again that bool values are output capitalized). Line 14 uses Char method IsLetter to determine whether character is a letter. Line 16 uses Char method IsLetterOrDigit to determine whether character is a letter or a digit.

Line 18 uses Char method IsLower to determine whether character is a lowercase letter. Line 20 uses Char method IsUpper to determine whether character is an uppercase letter. Line 22 uses Char method ToUpper to convert character to its uppercase equivalent. The method returns the converted character if the character has an uppercase equivalent; otherwise, the method returns its original argument. Line 24 uses Char method ToLower to convert character to its lowercase equivalent. The method returns the converted character if the character has a lowercase equivalent; otherwise, the method returns its original argument.

Line 26 uses Char method IsPunctuation to determine whether character is a punctuation mark, such as "!", ":" or ")". Line 27 uses Char method IsSymbol to determine whether character is a symbol, such as "+", "=" or "^".

Structure type Char also contains other methods not shown in this example. Many of the static methods are similar—for instance, IsWhiteSpace is used to determine whether a certain character is a whitespace character (e.g., newline, tab or space). The struct also contains several public instance methods; many of these, such as methods ToString and Equals, are methods that we have seen before in other classes. This group includes method CompareTo, which is used to compare two character values with one another.
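The sketch below tries three Char members from the discussion above that Fig. 16.15 doesn't use: IsWhiteSpace, CompareTo and the Unicode behavior of the testing methods. The Char tests are defined in terms of Unicode categories, so they also apply to characters outside ASCII, such as an accented letter or an Arabic-Indic digit. CharSketch and CountWhiteSpace are hypothetical names.

```csharp
// A sketch of Char methods not shown in Fig. 16.15;
// CharSketch and CountWhiteSpace are hypothetical names.
using System;

public static class CharSketch
{
   // counts the characters for which Char.IsWhiteSpace returns true
   public static int CountWhiteSpace( string text )
   {
      int count = 0;
      foreach ( char c in text )
         if ( Char.IsWhiteSpace( c ) )
            count++;
      return count;
   }

   public static void Main( string[] args )
   {
      Console.WriteLine( CountWhiteSpace( "a b\tc\nd" ) ); // 3 (space, tab, newline)

      // CompareTo returns a negative int, zero or a positive int
      Console.WriteLine( 'a'.CompareTo( 'b' ) < 0 );  // True
      Console.WriteLine( 'b'.CompareTo( 'b' ) == 0 ); // True

      // the tests are based on Unicode categories, not just ASCII
      Console.WriteLine( Char.IsLetter( '\u00E9' ) ); // True (é)
      Console.WriteLine( Char.IsDigit( '\u0663' ) );  // True (Arabic-Indic digit three)
      Console.WriteLine( Char.IsSymbol( '$' ) );      // True (currency symbol)
   }
}
```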

(Online) Introduction to Regular Expressions

This online section is available via the book’s Companion Website at

www.pearsonhighered.com/deitel

In this section, we introduce regular expressions, which are specially formatted strings used to find patterns in text. They can be used to ensure that data is in a particular format. For example, a U.S. zip code must consist of five digits, or five digits followed by a dash followed by four more digits. Compilers use regular expressions during lexical analysis to recognize tokens such as identifiers, keywords and numeric literals; text that matches none of the token patterns is reported as an error. We discuss classes Regex and Match from the System.Text.RegularExpressions namespace as well as the symbols used to form regular expressions. We then demonstrate how to find patterns in a string, match entire strings to patterns, replace characters in a string that match a pattern and split strings at delimiters specified as a pattern in a regular expression.
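The zip-code rule described above can be written as a single pattern. The following sketch uses Regex members from System.Text.RegularExpressions (instance IsMatch, plus the static Replace and Split methods). The pattern, the class name ZipSketch and the sample inputs are illustrative choices and are not the online section's own code. The sketch uses [0-9] rather than \d because in .NET \d matches any Unicode decimal digit, and it uses \z rather than $ so that a trailing newline is rejected.

```csharp
// A sketch of the zip-code rule as a regular expression;
// ZipSketch and IsZipCode are hypothetical names.
using System;
using System.Text.RegularExpressions;

public static class ZipSketch
{
   // ^ anchors at the start and \z at the very end of the input
   // ($ would also accept a trailing newline); [0-9] matches ASCII digits only
   private static readonly Regex ZipPattern =
      new Regex( @"^[0-9]{5}(-[0-9]{4})?\z" );

   public static bool IsZipCode( string input )
   {
      return ZipPattern.IsMatch( input );
   }

   public static void Main( string[] args )
   {
      Console.WriteLine( IsZipCode( "02115" ) );      // True
      Console.WriteLine( IsZipCode( "02115-1234" ) ); // True
      Console.WriteLine( IsZipCode( "2115" ) );       // False
      Console.WriteLine( IsZipCode( "02115-12" ) );   // False

      // static Regex.Replace and Regex.Split also take a pattern string
      Console.WriteLine( Regex.Replace( "a1b22c333", "[0-9]+", "#" ) ); // a#b#c#
      string[] parts = Regex.Split( "one, two;three", @"[,;]\s*" );
      Console.WriteLine( string.Join( "|", parts ) ); // one|two|three
   }
}
```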

Wrap-Up

In this chapter, you learned about the Framework Class Library’s string- and character-processing capabilities. We presented an overview of the fundamentals of characters and strings. You saw how to determine the length of strings, copy strings, access the individual characters in strings, search strings, obtain substrings from larger strings, compare strings, concatenate strings, replace characters in strings and convert strings to uppercase or lowercase letters.

We showed how to use class StringBuilder to build strings dynamically. You learned how to determine and specify the size of a StringBuilder object, and how to append, insert, remove and replace characters in a StringBuilder object. We then introduced the character-testing methods of type Char that enable a program to determine whether a character is a digit, a letter, a lowercase letter, an uppercase letter, a punctuation mark or a symbol other than a punctuation mark, and the methods for converting a character to uppercase or lowercase.

Finally, we discussed classes Regex, Match and MatchCollection from namespace System.Text.RegularExpressions and the symbols that are used to form regular expressions. You learned how to find patterns in a string and match entire strings to patterns with Regex methods Match and Matches, how to replace characters in a string with Regex method Replace and how to split strings at delimiters with Regex method Split. In the next chapter, you’ll learn how to read data from and write data to files.

Summary

Section 16.2 Fundamentals of Characters and Strings

  • Characters are the fundamental building blocks of C# program code. Every program is composed of a sequence of characters that’s interpreted by the compiler as a series of instructions used to accomplish a task.

  • A string is a series of characters treated as a single unit. A string may include letters, digits and the various special characters: +, -, *, /, $ and others.

Section 16.3 string Constructors

  • Class string provides eight constructors.

  • All strings are immutable—their character contents cannot be changed after they’re created.

Section 16.4 string Indexer, Length Property and CopyTo Method

  • Property Length determines the number of characters in a string.

  • The string indexer receives an integer argument as the position number and returns the character at that position. The first element of a string is considered to be at position 0.

  • Attempting to access a character that’s outside a string’s bounds results in an IndexOutOfRangeException.

  • Method CopyTo copies a specified number of characters from a string into a char array.

Section 16.5 Comparing strings

  • When the computer compares two strings, it uses word sorting rules that depend on the computer’s currently selected culture.

  • Method Equals and the overloaded equality operator (==) can each be used to compare the contents of two strings.

  • Method CompareTo returns 0 if the strings are equal, a negative number if the string that invokes CompareTo is less than the string passed as an argument and a positive number if the string that invokes CompareTo is greater than the string passed as an argument.

  • string methods StartsWith and EndsWith determine whether a string starts or ends with the characters specified as an argument, respectively.

Section 16.6 Locating Characters and Substrings in strings

  • string method IndexOf locates the first occurrence of a character or a substring in a string. Method LastIndexOf locates the last occurrence of a character or a substring in a string.

Section 16.7 Extracting Substrings from strings

  • Class string provides two Substring methods to enable a new string to be created by copying part of an existing string.

Section 16.8 Concatenating strings

  • The static method Concat of class string concatenates two strings and returns a new string containing the characters from both original strings.

Section 16.10 Class StringBuilder

  • Once a string is created, its contents can never change. Class StringBuilder (namespace System.Text) is available for creating and manipulating strings that can change.

Section 16.11 Length and Capacity Properties, EnsureCapacity Method and Indexer of Class StringBuilder

  • Class StringBuilder provides Length and Capacity properties to return, respectively, the number of characters currently in a StringBuilder and the number of characters that can be stored in a StringBuilder without allocating more memory. These properties also can be used to increase or decrease the length or the capacity of the StringBuilder.

  • Method EnsureCapacity allows you to guarantee that a StringBuilder has a minimum capacity.

Section 16.12 Append and AppendFormat Methods of Class StringBuilder

  • Class StringBuilder provides Append methods to allow various types of values to be added to the end of a StringBuilder.

  • Formats have the form {X[,Y][:FormatString]}. X is the number of the argument to be formatted, counting from zero. Y is an optional positive or negative integer that specifies the minimum number of characters in the result of formatting. If the formatted value has fewer characters than the absolute value of Y, it will be padded with spaces. A positive Y means the value will be right aligned; a negative one means it will be left aligned. The optional FormatString indicates other formatting to apply (currency, decimal or scientific, among others).

Section 16.13 Insert, Remove and Replace Methods of Class StringBuilder

  • Class StringBuilder provides 18 overloaded Insert methods that insert various types of values at any position in a StringBuilder. Versions are provided for each of the simple types and for character arrays, strings and objects.

  • Class StringBuilder also provides method Remove for deleting any portion of a StringBuilder.

  • StringBuilder method Replace replaces every occurrence of a specified string or character (optionally only within a specified range of the StringBuilder) with a replacement string or character.
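A minimal sketch of all three methods on one StringBuilder (the name strings are illustrative). It also uses the Replace overload that takes a start index and a count, which limits replacement to part of the buffer.

```csharp
using System;
using System.Text;

var builder = new StringBuilder("Jane Green");

// Insert places a value at the given index, shifting later characters right.
builder.Insert(5, "Q. ");            // "Jane Q. Green"

// Remove(startIndex, length) deletes a run of characters.
builder.Remove(5, 3);                // "Jane Green"

// Replace substitutes every occurrence of a string or character.
builder.Replace("Jane", "Greg");     // "Greg Green"

// Restrict the replacement to the first five characters only.
builder.Replace('G', 'g', 0, 5);     // "greg Green"

Console.WriteLine(builder);
```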

Section 16.14 Char Methods

  • C# provides a concept called a struct (short for structure) that’s similar to a class.

  • structs represent value types.

  • structs can have methods and properties and can use the access modifiers public and private.

  • struct members are accessed via the member-access operator (.).

  • The simple types are actually aliases for struct types.

  • All struct types derive from class ValueType, which in turn derives from object.

  • All struct types are implicitly sealed, so they cannot declare virtual or abstract methods (although they can override the virtual methods inherited from object, such as ToString), and their members cannot be declared protected or protected internal.

  • Char is a struct that represents characters.

  • Method Char.IsDigit determines whether a character is a defined Unicode digit.

  • Method Char.IsLetter determines whether a character is a letter.

  • Method Char.IsLetterOrDigit determines whether a character is a letter or a digit.

  • Method Char.IsLower determines whether a character is a lowercase letter.

  • Method Char.IsUpper determines whether a character is an uppercase letter.

  • Method Char.ToUpper converts a lowercase character to its uppercase equivalent.

  • Method Char.ToLower converts an uppercase character to its lowercase equivalent.

  • Method Char.IsPunctuation determines whether a character is a punctuation mark.

  • Method Char.IsSymbol determines whether a character is a symbol.

  • Method Char.IsWhiteSpace determines whether a character is a whitespace character.

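  • char method CompareTo compares two character values, returning a negative value, zero or a positive value if the calling char is less than, equal to or greater than its argument, respectively.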
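The following sketch, written with top-level statements and arbitrary sample characters, applies several of these methods. Following the convention described in the chapter's footnote, static methods are called through Char, while CompareTo is called on a char value.

```csharp
using System;

// Builds a one-line report using Char's static classification methods.
string Describe(char c) =>
    $"'{c}': digit={Char.IsDigit(c)}, letter={Char.IsLetter(c)}, " +
    $"letterOrDigit={Char.IsLetterOrDigit(c)}, upper={Char.IsUpper(c)}, " +
    $"lower={Char.IsLower(c)}, punctuation={Char.IsPunctuation(c)}, " +
    $"symbol={Char.IsSymbol(c)}, whitespace={Char.IsWhiteSpace(c)}";

foreach (char sample in new[] { 'A', '7', '!', '+', ' ' })
{
    Console.WriteLine(Describe(sample));
}

// ToUpper and ToLower return the converted character;
// characters with no case mapping come back unchanged.
Console.WriteLine(Char.ToUpper('q'));        // Q
Console.WriteLine(Char.ToLower('7'));        // 7

// CompareTo is an instance method: negative, zero or positive result.
Console.WriteLine('a'.CompareTo('b') < 0);   // True
```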

Terminology

Self-Review Exercises

16.1

State whether each of the following is true or false. If false, explain why.

  1. When strings are compared with ==, the result is true if the strings contain the same values.

  2. A string can be modified after it’s created.

  3. StringBuilder method EnsureCapacity sets the StringBuilder instance’s length to the argument’s value.

  4. Method Equals and the equality operator work the same for strings.

  5. Method Trim removes all whitespace at the beginning and the end of a string.

  6. It’s always better to use strings, rather than StringBuilders, because strings containing the same value will reference the same object.

  7. string method ToUpper creates a new string with the first letter capitalized.

16.2

Fill in the blanks in each of the following statements:

  1. To concatenate strings, use operator __________, StringBuilder method __________ or string method __________.

  2. StringBuilder method __________ first formats the specified string, then concatenates it to the end of the StringBuilder.

  3. If the arguments to a Substring method call are out of range, a(n) __________ exception is thrown.

  4. A C in a format string means to output the number as __________.

Answers to Self-Review Exercises

16.1

  1. True.

  2. False. strings are immutable; they cannot be modified after they’re created. StringBuilder objects can be modified after they’re created.

  3. False. EnsureCapacity simply ensures that the current capacity is at least the value specified in the method call.

  4. True.

  5. True.

  6. False. StringBuilder should be used if the string is to be modified.

  7. False. string method ToUpper creates a new string with all of its letters capitalized.

16.2

  1. +, Append, Concat.

  2. AppendFormat.

  3. ArgumentOutOfRangeException.

  4. currency.

Exercises

16.3

(Comparing strings) Write an application that uses string method CompareTo to compare two strings input by the user. Output whether the first string is less than, equal to or greater than the second.

16.4

(Random Sentences and Story Writer) Write an application that uses random-number generation to create sentences. Use four arrays of strings, called article, noun, verb and preposition. Create a sentence by selecting a word at random from each array in the following order: article, noun, verb, preposition, article, noun. As each word is picked, concatenate it to the previous words in the sentence. The words should be separated by spaces. When the sentence is output, it should start with a capital letter and end with a period. The program should generate 10 sentences and output them to a text box.

The arrays should be filled as follows: The article array should contain the articles "the", "a", "one", "some" and "any"; the noun array should contain the nouns "boy", "girl", "dog", "town" and "car"; the verb array should contain the past-tense verbs "drove", "jumped", "ran", "walked" and "skipped"; and the preposition array should contain the prepositions "to", "from", "over", "under" and "on".

After the preceding program is written, modify the program to produce a short story consisting of several of these sentences. (How about the possibility of a random term-paper writer!)

16.5

(Pig Latin) Write an application that encodes English-language phrases into pig Latin. Pig Latin is a form of coded language often used for amusement. Many variations exist in the methods used to form pig Latin phrases. For simplicity, use the following algorithm:

To translate each English word into a pig Latin word, place the first letter of the English word at the end of the word and add the letters “ay.” Thus, the word “jump” becomes “umpjay,” the word “the” becomes “hetay” and the word “computer” becomes “omputercay.” Blanks between words remain blanks. Assume the following: The English phrase consists of words separated by blanks, there are no punctuation marks and all words have two or more letters. Enable the user to input a sentence. Use techniques discussed in this chapter to divide the sentence into separate words. Method GetPigLatin should translate a single word into pig Latin. Keep a running display of all the converted sentences in a text box.

16.6

(All Possible Three-Letter Words from a Five-Letter Word) Write a program that reads a five-letter word from the user and produces all possible three-letter combinations that can be derived from the letters of the five-letter word. For example, the three-letter words produced from the word “bathe” include the commonly used words “ate,” “bat,” “bet,” “tab,” “hat,” “the” and “tea,” and the three-letter combinations “bth,” “eab,” etc.

16.7

(Capitalizing Words) Write a program that uses regular expressions to convert the first letter of every word to uppercase. Have it do this for an arbitrary string input by the user.

Making a Difference Exercises

16.8

(Project: Cooking with Healthier Ingredients) Obesity in the United States is increasing at an alarming rate. Check the map from the Centers for Disease Control and Prevention (CDC) at www.cdc.gov/nccdphp/dnpa/Obesity/trend/maps/index.htm, which shows obesity trends in the United States over the last 20 years. As obesity increases, so do occurrences of related problems (e.g., heart disease, high blood pressure, high cholesterol, type 2 diabetes). Write a program that helps users choose healthier ingredients when cooking, and helps those allergic to certain foods (e.g., nuts, gluten) find substitutes. The program should read a recipe from the user and suggest healthier replacements for some of the ingredients. For simplicity, your program should assume the recipe has no abbreviations for measures such as teaspoons, cups, and tablespoons, and uses numerical digits for quantities (e.g., 1 egg, 2 cups) rather than spelling them out (one egg, two cups). Some common substitutions are shown in Fig. 16.16. Your program should display a warning such as, “Always consult your physician before making significant changes to your diet.”

Fig. 16.16. Common ingredient substitutions.

Ingredient               Substitution
1 cup sour cream         1 cup yogurt
1 cup milk               1/2 cup evaporated milk and 1/2 cup water
1 teaspoon lemon juice   1/2 teaspoon vinegar
1 cup sugar              1/2 cup honey, 1 cup molasses or 1/4 cup agave nectar
1 cup butter             1 cup margarine or yogurt
1 cup flour              1 cup rye or rice flour
1 cup mayonnaise         1 cup cottage cheese or 1/8 cup mayonnaise and 7/8 cup yogurt
1 egg                    2 tablespoons cornstarch, arrowroot flour or potato starch or 2 egg whites or 1/2 of a large banana (mashed)
1 cup milk               1 cup soy milk
1/4 cup oil              1/4 cup applesauce
white bread              whole-grain bread
1 cup sour cream         1 cup yogurt

Your program should take into consideration that replacements are not always one-for-one. For example, if a cake recipe calls for three eggs, it might reasonably use six egg whites instead. Conversion data for measurements and substitutes can be obtained at websites such as:

  • chinesefood.about.com/od/recipeconversionfaqs/f/usmetricrecipes.htm

  • www.pioneerthinking.com/eggsub.html

  • www.gourmetsleuth.com/conversions.htm

Your program should consider the user’s health concerns, such as high cholesterol, high blood pressure, weight loss, gluten allergy, and so on. For high cholesterol, the program should suggest substitutes for eggs and dairy products; if the user wishes to lose weight, low-calorie substitutes for ingredients such as sugar should be suggested.

16.9

(Project: Spam Scanner) Spam (or junk e-mail) costs U.S. organizations billions of dollars a year in spam-prevention software, equipment, network resources, bandwidth, and lost productivity. Research online some of the most common spam e-mail messages and words, and check your own junk e-mail folder. Create a list of 30 words and phrases commonly found in spam messages. Write an application in which the user enters an e-mail message. Then, scan the message for each of the 30 keywords or phrases. For each occurrence of one of these within the message, add a point to the message’s “spam score.” Next, rate the likelihood that the message is spam, based on the number of points it received.

16.10

(Project: SMS Language) Short Message Service (SMS) is a communications service that allows sending text messages of 160 or fewer characters between mobile phones. With the proliferation of mobile phone use worldwide, SMS is being used in many developing nations for political purposes (e.g., voicing opinions and opposition), reporting news about natural disasters, and so on. For example, check out comunica.org/radio2.0/archives/87. Since the length of SMS messages is limited, SMS Language—abbreviations of common words and phrases in mobile text messages, e-mails, instant messages, etc.—is often used. For example, “in my opinion” is “IMO” in SMS Language. Research SMS Language online. Write a program in which the user can enter a message using SMS Language, then the program should translate it into English (or your own language). Also provide a mechanism to translate text written in English (or your own language) into SMS Language. One potential problem is that one SMS abbreviation could expand into a variety of phrases. For example, IMO (as used above) could also stand for “International Maritime Organization,” “in memory of,” etc.



[1] C# provides the string keyword as an alias for class String. In this book, we use the term string.

[2] Just as keyword string is an alias for class String, keyword char is an alias for struct Char. In this text, we use the term Char when calling a static method of struct Char and the term char elsewhere.
