Chapter 10. Strings and Regular Expressions

There was a time when people thought of computers exclusively as manipulating numeric values. Early computers were first used to calculate missile trajectories, and programming was taught in the math department of major universities.

Today, most programs are concerned more with strings of characters than with strings of numbers. Typically these strings are used for word processing, document manipulation, and creation of web pages.

C# provides built-in support for a fully functional string type. More importantly, C# treats strings as objects that encapsulate all the manipulation, sorting, and searching methods normally applied to strings of characters.

Complex string manipulation and pattern matching are aided by the use of regular expressions. C# combines the power and complexity of regular expression syntax, originally found only in string manipulation languages such as awk and Perl, with a fully object-oriented design.

In this chapter, you will learn to work with the C# string type and the .NET Framework System.String class that it aliases. You will see how to extract substrings, manipulate and concatenate strings, and build new strings with the StringBuilder class. In addition, you will learn how to use the Regex class to match strings based on complex regular expressions.

Strings

C# treats strings as first-class types that are flexible, powerful, and easy to use. Each string object is an immutable sequence of Unicode characters. In other words, methods that appear to change the string actually return a modified copy; the original string remains intact.
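A minimal sketch of what immutability means in practice. The class and method names here (ImmutabilityDemo, Shout) are made up for illustration; only ToUpper( ) comes from the String class.

```csharp
using System;

public static class ImmutabilityDemo
{
    // Returns an uppercase copy; the string passed in is never modified.
    public static string Shout(string text)
    {
        return text.ToUpper();
    }

    public static void Main()
    {
        string original = "hello";
        string shouted = Shout(original);

        Console.WriteLine("original: {0}", original); // original: hello
        Console.WriteLine("shouted:  {0}", shouted);  // shouted:  HELLO
    }
}
```

The variable original still holds "hello" after the call, because ToUpper( ) builds and returns a new string.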

When you declare a C# string using the string keyword, you are in fact declaring the object to be of the type System.String, one of the built-in types provided by the .NET Framework Class Library. A C# string type is a System.String type, and we will use the names interchangeably throughout the chapter.

The declaration of the System.String class is:

public sealed class String : 
   IComparable, ICloneable, IConvertible

This declaration reveals that the class is sealed, meaning that it is not possible to derive from the string class. The class also implements three system interfaces (IComparable, ICloneable, and IConvertible) that dictate functionality System.String shares with other classes in the .NET Framework.

As seen in Chapter 9, the IComparable interface is implemented by types whose values can be ordered. Strings, for example, can be alphabetized; any given string can be compared with another string to determine which should come first in an ordered list. IComparable classes implement the CompareTo method.

ICloneable objects can create new instances with the same value as the original instance. ICloneable classes implement the Clone( ) method. Because strings are immutable, cloning a string does not need to allocate anything: String's Clone( ) simply returns a reference to the same string, which has the same values (characters) as the original.

IConvertible classes provide methods, such as ToInt32( ), ToDouble( ), and ToDecimal( ), that facilitate conversion to other primitive types.
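The three interfaces can be exercised directly on a string. This is a sketch rather than anything from the original listing; the class and helper names are invented for illustration. Note that String implements IConvertible explicitly, so the conversion methods are called through an IConvertible reference.

```csharp
using System;

public static class StringInterfacesDemo
{
    // IComparable: CompareTo returns a negative number, zero, or a positive
    // number; Math.Sign reduces that to -1, 0, or 1.
    public static int Order(string left, string right)
    {
        return Math.Sign(left.CompareTo(right));
    }

    // ICloneable: report whether Clone() handed back the same instance.
    public static bool CloneIsSameInstance(string text)
    {
        return object.ReferenceEquals(text, text.Clone());
    }

    // IConvertible: reached through an interface reference because String
    // implements the interface explicitly.
    public static int ToInt(string digits)
    {
        IConvertible convertible = digits;
        return convertible.ToInt32(null); // null means "use the current culture"
    }

    public static void Main()
    {
        Console.WriteLine(Order("apple", "banana"));   // -1
        Console.WriteLine(CloneIsSameInstance("abc")); // True
        Console.WriteLine(ToInt("42"));                // 42
    }
}
```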

Creating Strings

The most common way to create a string is to assign a quoted string of characters, known as a string literal, to a user-defined variable of type string:

string newString = "This is a string literal";

Quoted strings can include escape characters, such as "\n" or "\t", which begin with a backslash character (\) and are used to indicate where line breaks or tabs are to appear. Because the backslash is itself used in some command-line syntaxes, such as URLs or directory paths, a backslash that is meant literally in a quoted string must be preceded by another backslash.

Strings can also be created using verbatim string literals, which start with the (@) symbol. This tells the String constructor that the string should be used verbatim, even if it spans multiple lines or includes escape characters. In a verbatim string literal, backslashes and the characters that follow them are simply considered additional characters of the string. Thus, the following two definitions are equivalent:

string literalOne = "\\\\MySystem\\MyDirectory\\ProgrammingC#.cs";
string verbatimLiteralOne = @"\\MySystem\MyDirectory\ProgrammingC#.cs";

In the first line, a nonverbatim string literal is used, and so each backslash character (\) must be escaped, which means it must be preceded by a second backslash character. In the second, a verbatim literal string is used, so the extra backslash is not needed. A second example illustrates multiline verbatim strings:

string literalTwo = "Line One\nLine Two";
string verbatimLiteralTwo = @"Line One
Line Two";

Again, these declarations are interchangeable. Which one you use is a matter of convenience and personal style.
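The equivalence of the two literal forms is easy to check at runtime. The sketch below uses an illustrative UNC-style path; the class and method names are hypothetical. It also shows one detail of verbatim literals: a double quote inside one is written as two double quotes.

```csharp
using System;

public static class LiteralDemo
{
    // Compares an escaped literal with the equivalent verbatim literal.
    public static bool PathsMatch()
    {
        string escaped = "\\\\MySystem\\MyDirectory\\ProgrammingC#.cs";
        string verbatim = @"\\MySystem\MyDirectory\ProgrammingC#.cs";
        return escaped == verbatim;
    }

    public static void Main()
    {
        Console.WriteLine(PathsMatch());          // True

        // Inside a verbatim literal, "" stands for a single quote character.
        Console.WriteLine(@"She said ""hello"""); // She said "hello"
    }
}
```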

The ToString Method

Another common way to create a string is to call the ToString( ) method on an object and assign the result to a string variable. All the built-in types override this method to simplify the task of converting a value (often a numeric value) to a string representation of that value. In the following example, the ToString( ) method of an integer type is called to store its value in a string:

int myInteger = 5;
string integerString = myInteger.ToString(  );

The call to myInteger.ToString( ) returns a String object which is then assigned to integerString.
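As a sketch of how this looks across several types, the following calls ToString( ) on a few built-in values and on a small user-defined class that overrides it. The Point class is hypothetical and exists only for this illustration.

```csharp
using System;
using System.Globalization;

// Hypothetical class used only to show a ToString() override.
public class Point
{
    private readonly int x;
    private readonly int y;

    public Point(int x, int y) { this.x = x; this.y = y; }

    public override string ToString()
    {
        return string.Format("({0}, {1})", x, y);
    }
}

public static class ToStringDemo
{
    public static void Main()
    {
        int myInteger = 5;
        Console.WriteLine(myInteger.ToString());                       // 5
        Console.WriteLine(true.ToString());                            // True

        // Passing a culture pins the decimal separator to a period.
        Console.WriteLine(3.5.ToString(CultureInfo.InvariantCulture)); // 3.5

        Console.WriteLine(new Point(1, 2).ToString());                 // (1, 2)
    }
}
```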

The .NET String class provides a wealth of overloaded constructors that support a variety of techniques for assigning string values to string types. Some of these constructors enable you to create a string by passing in a character array or character pointer. Passing in a character array as a parameter to the constructor of the String creates a CLR-compliant new instance of a string. Passing in a character pointer creates a noncompliant, “unsafe” instance.
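A brief sketch of the safe, array-based and repeated-character constructors follows. The wrapper names (FromChars, Repeat) are invented for illustration; the pointer-based constructors are left out because they require unsafe code.

```csharp
using System;

public static class ConstructorDemo
{
    // Builds a string from every character in the array.
    public static string FromChars(char[] letters)
    {
        return new string(letters);
    }

    // Builds a string consisting of one character repeated 'count' times.
    public static string Repeat(char c, int count)
    {
        return new string(c, count);
    }

    public static void Main()
    {
        char[] letters = { 'C', '#', '!', '!' };

        Console.WriteLine(FromChars(letters));        // C#!!
        Console.WriteLine(new string(letters, 0, 2)); // C#  (start index 0, length 2)
        Console.WriteLine(Repeat('-', 5));            // -----
    }
}
```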

Manipulating Strings

The string class provides a host of methods for comparing, searching, and manipulating strings, as shown in Table 10-1.

Table 10-1. Methods, fields, and properties of the string class

Member	Explanation
Empty	Public static field representing the empty string.
Compare( )	Overloaded public static method that compares two strings.
CompareOrdinal( )	Overloaded public static method that compares two strings without regard to locale or culture.
Concat( )	Overloaded public static method that creates a new string from one or more strings.
Copy( )	Public static method that creates a new string by copying another.
Equals( )	Overloaded public static method that determines if two strings have the same value.
Format( )	Overloaded public static method that formats a string using a format specification.
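Intern( )	Public static method that returns a reference to the interned instance of the specified string, adding it to the intern pool if necessary.
IsInterned( )	Public static method that returns a reference to the interned instance of the specified string, or null if the string has not been interned.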
Join( )	Overloaded public static method that concatenates a specified string between each element of a string array.
Chars	The string indexer (a property, accessed in C# with the [ ] operator).
Length	Property that returns the number of characters in the instance.
Clone( )	Returns a reference to the string.
CompareTo( )	Compares this string with another.
CopyTo( )	Copies the specified number of characters to an array of Unicode characters.
EndsWith( )	Indicates whether the specified string matches the end of this string.
Equals( )	Determines if two strings have the same value.
Insert( )	Returns a new string with the specified string inserted.
LastIndexOf( )	Reports the index of the last occurrence of a specified character or string within the string.
PadLeft( )	Right-aligns the characters in the string, padding to the left with spaces or a specified character.
PadRight( )	Left-aligns the characters in the string, padding to the right with spaces or a specified character.
Remove( )	Deletes the specified number of characters.
Split( )	Returns the substrings delimited by the specified characters in a string array.
StartsWith( )	Indicates if the string starts with the specified characters.
Substring( )	Retrieves a substring.
ToCharArray( )	Copies the characters from the string to a character array.
ToLower( )	Returns a copy of the string in lowercase.
ToUpper( )	Returns a copy of the string in uppercase.
Trim( )	Removes all occurrences of a set of specified characters from beginning and end of the string.
TrimEnd( )	Behaves like `Trim`, but only at the end.
TrimStart( )	Behaves like `Trim`, but only at the start.
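A few of the members in Table 10-1 can be tried in isolation. This is a small sketch with literal inputs chosen for illustration; the Tidy helper is hypothetical.

```csharp
using System;

public static class TableMethodsDemo
{
    // Trims surrounding whitespace, then returns an uppercase copy.
    public static string Tidy(string raw)
    {
        return raw.Trim().ToUpper();
    }

    public static void Main()
    {
        Console.WriteLine("[{0}]", Tidy("  c sharp  "));                     // [C SHARP]
        Console.WriteLine("[{0}]", "42".PadLeft(5));                         // [   42]
        Console.WriteLine("[{0}]", "42".PadRight(5, '.'));                   // [42...]
        Console.WriteLine(string.Join("-", new string[] { "a", "b", "c" })); // a-b-c
        Console.WriteLine("abcdef".Remove(2, 3));                            // abf
        Console.WriteLine("Programming".StartsWith("Pro"));                  // True
    }
}
```

Each call returns a new string (or a Boolean); none of them alters the string it is called on.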

Example 10-1 illustrates the use of some of these methods, including Compare( ), Concat( ) (and the overloaded + operator), Copy( ) (and assignment with the = operator), Insert( ), EndsWith( ), and IndexOf( ).

Example 10-1. Working with strings

namespace Programming_CSharp
{
   using System;

   public class StringTester
   {
      static void Main(  )
      {
         // create some strings to work with
         string s1 = "abcd";
         string s2 = "ABCD";
         string s3 = @"Liberty Associates, Inc. 
                provides custom .NET development, 
                on-site Training and Consulting";
    
         int result;  // hold the results of comparisons

         // compare two strings, case sensitive
         result = string.Compare(s1, s2);
         Console.WriteLine(
            "compare s1: {0}, s2: {1}, result: {2}\n", 
            s1, s2, result);            

         // overloaded compare, takes boolean "ignore case" 
         //(true = ignore case)
         result = string.Compare(s1,s2, true);
         Console.WriteLine("compare insensitive\n");
         Console.WriteLine("s4: {0}, s2: {1}, result: {2}\n", 
            s1, s2, result);            

         // concatenation method
         string s6 = string.Concat(s1,s2);
         Console.WriteLine(
            "s6 concatenated from s1 and s2: {0}", s6);

         // use the overloaded operator
         string s7 = s1 + s2;
         Console.WriteLine(
            "s7 concatenated from s1 + s2: {0}", s7);

         // the string copy method
         string s8 = string.Copy(s7);
         Console.WriteLine(
            "s8 copied from s7: {0}", s8);

         // assign with the = operator (copies the reference)
         string s9 = s8;
         Console.WriteLine("s9 = s8: {0}", s9);

         // three ways to compare. 
         Console.WriteLine(
            "\nDoes s9.Equals(s8)?: {0}", 
            s9.Equals(s8));
         Console.WriteLine(   
            "Does Equals(s9,s8)?: {0}", 
            string.Equals(s9,s8));
         Console.WriteLine(
            "Does s9==s8?: {0}", s9 == s8);

         // Two useful properties: the index and the length
         Console.WriteLine(
            "\nString s9 is {0} characters long. ", 
            s9.Length);
         Console.WriteLine(
            "The 5th character is {1}\n", 
            s9.Length, s9[4]);

         // test whether a string ends with a set of characters
         Console.WriteLine("s3:{0}\nEnds with Training?: {1}\n",
            s3, 
            s3.EndsWith("Training") );
         Console.WriteLine(
            "Ends with Consulting?: {0}",
            s3.EndsWith("Consulting"));

         // return the index of the substring
         Console.WriteLine(
            "\nThe first occurrence of Training ");
         Console.WriteLine ("in s3 is {0}\n", 
            s3.IndexOf("Training"));

         // insert the word excellent before "training"
         string s10 = s3.Insert(103,"excellent ");
         Console.WriteLine("s10: {0}\n",s10);

         // you can combine the two as follows:
         string s11 = s3.Insert(s3.IndexOf("Training"),
            "excellent ");
         Console.WriteLine("s11: {0}\n",s11);
      }
   }
}
Output:
compare s1: abcd, s2: ABCD, result: -1

compare insensitive

s4: abcd, s2: ABCD, result: 0

s6 concatenated from s1 and s2: abcdABCD
s7 concatenated from s1 + s2: abcdABCD
s8 copied from s7: abcdABCD
s9 = s8: abcdABCD

Does s9.Equals(s8)?: True
Does Equals(s9,s8)?: True
Does s9==s8?: True

String s9 is 8 characters long.
The 5th character is A

s3:Liberty Associates, Inc.
                provides custom .NET development,
                on-site Training and Consulting
Ends with Training?: False

Ends with Consulting?: True

The first occurrence of Training
in s3 is 103

s10: Liberty Associates, Inc.
                provides custom .NET development,
                on-site excellent Training and Consulting

s11: Liberty Associates, Inc.
                provides custom .NET development,
                on-site excellent Training and Consulting

Example 10-1 begins by declaring three strings:

string s1 = "abcd";
string s2 = "ABCD";
string s3 = @"Liberty Associates, Inc. 
      provides custom .NET development, 
      on-site Training and Consulting";

The first two are string literals, the third a verbatim string literal. We begin by comparing s1 to s2. The Compare method is a public static method of string, and it is overloaded. The first overloaded version takes two strings and compares them:

// compare two strings, case sensitive
result = string.Compare(s1, s2);
Console.WriteLine("compare s1: {0}, s2: {1}, result: {2}\n",
    s1, s2, result);

This is a case-sensitive comparison and returns different values, depending on the results of the comparison:

A negative integer if the first string is less than the second string
0 if the strings are equal
A positive integer if the first string is greater than the second string

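In this case, the output properly indicates that s1 is “less than” s2. Compare( ) performs a culture-sensitive comparison, and in the sort order used by most cultures a lowercase letter comes before its uppercase counterpart, even though the lowercase letter has the larger numeric value in Unicode (as in ASCII):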

compare s1: abcd, s2: ABCD, result: -1
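The difference between culture-aware and ordinal ordering can be seen by running both comparisons on the same pair. This is a sketch under a stated assumption: the culture-aware call is pinned to CultureInfo.InvariantCulture so that it does not depend on regional settings, which is a choice made here and not something Example 10-1 does. The helper names are hypothetical.

```csharp
using System;
using System.Globalization;

public static class CompareDemo
{
    // Culture-aware, case-sensitive comparison using the invariant culture.
    public static int CultureOrder(string a, string b)
    {
        return Math.Sign(string.Compare(a, b, false, CultureInfo.InvariantCulture));
    }

    // Ordinal comparison: compares the numeric values of the characters.
    public static int OrdinalOrder(string a, string b)
    {
        return Math.Sign(string.CompareOrdinal(a, b));
    }

    public static void Main()
    {
        // Culture-aware ordering, the kind of comparison Example 10-1 makes.
        Console.WriteLine(CultureOrder("abcd", "ABCD"));

        // 'a' is 97 and 'A' is 65, so by character value "abcd" comes later.
        Console.WriteLine(OrdinalOrder("abcd", "ABCD")); // 1
    }
}
```

On a typical installation the first call reports that "abcd" sorts before "ABCD", matching the output above, while the ordinal comparison reports the opposite.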

The second comparison uses an overloaded version of Compare which takes a third, Boolean parameter, whose value determines whether case should be ignored in the comparison. If the value of this “ignore case” parameter is true, the comparison is made without regard to case, as in the following:

result = string.Compare(s1,s2, true);
Console.WriteLine("compare insensitive\n");
Console.WriteLine("s4: {0}, s2: {1}, result: {2}\n", 
    s1, s2, result);

Tip

The result is written with two WriteLine statements to keep the lines short enough to print properly in this book.

This time the case is ignored and the result is 0, indicating that the two strings are identical (without regard to case):

compare insensitive

s4: abcd, s2: ABCD, result: 0

Example 10-1 then concatenates some strings. There are a couple of ways to accomplish this. You can use the Concat( ) method, which is a static public method of string:

string s6 = string.Concat(s1,s2);

or you can simply use the overloaded concatenation (+) operator:

string s7 = s1 + s2;

In both cases, the output reflects that the concatenation was successful:

s6 concatenated from s1 and s2: abcdABCD
s7 concatenated from s1 + s2: abcdABCD

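Similarly, Example 10-1 shows two ways to obtain another string variable with the same value. First, you can use the static Copy method, which creates a new string object:

string s8 = string.Copy(s7);

or, for convenience, you might instead use the assignment operator (=). Assignment does not copy the characters; it makes the new variable refer to the same string object, which is safe because strings are immutable:

string s9 = s8;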

Once again, the output reflects that each method has worked:

s8 copied from s7: abcdABCD
s9 = s8: abcdABCD
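Whether two variables share one string object or hold separate objects with equal values can be checked with object.ReferenceEquals. In this sketch the separate object is built with the char-array constructor instead of Copy( ), since newer framework versions mark Copy( ) as obsolete; that substitution, and the helper names, are assumptions of this illustration.

```csharp
using System;

public static class AssignmentDemo
{
    // Assignment copies the reference, so both variables name one object.
    public static bool AssignmentSharesInstance(string source)
    {
        string alias = source;
        return object.ReferenceEquals(alias, source);
    }

    // Building a string from the characters allocates a separate object
    // (for a non-empty source) that still compares equal by value.
    public static bool RebuiltCopyIsSeparate(string source)
    {
        string rebuilt = new string(source.ToCharArray());
        return !object.ReferenceEquals(rebuilt, source) && rebuilt == source;
    }

    public static void Main()
    {
        Console.WriteLine(AssignmentSharesInstance("abcdABCD")); // True
        Console.WriteLine(RebuiltCopyIsSeparate("abcdABCD"));    // True
    }
}
```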

The .NET String class provides three ways to test for the equality of two strings. First, you can use the overloaded Equals( ) method and ask s9 directly whether s8 is of equal value:

Console.WriteLine("\nDoes s9.Equals(s8)?: {0}", 
   s9.Equals(s8));

A second technique is to pass both strings to String’s static method Equals( ):

Console.WriteLine("Does Equals(s9,s8)?: {0}", 
      string.Equals(s9,s8));

A final method is to use the overloaded equality operator (==) of String:

Console.WriteLine("Does s9==s8?: {0}", s9 == s8);

In each of these cases, the returned result is a Boolean value, as shown in the output:

Does s9.Equals(s8)?: True
Does Equals(s9,s8)?: True
Does s9==s8?: True

The equality operator is the most natural when you have two string objects. However, some .NET languages, such as early versions of VB.NET, do not support operator overloading, so when you overload == for a class of your own, be sure to override the Equals instance method as well.
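The advice about pairing the operator with Equals can be sketched for a user-defined class. Label is a hypothetical type invented for this illustration; it compares by the text it wraps, and it overrides GetHashCode as well, as the compiler expects whenever Equals is overridden.

```csharp
using System;

// Hypothetical type used only for illustration.
public sealed class Label
{
    private readonly string text;

    public Label(string text) { this.text = text; }

    // Value equality usable from any .NET language.
    public override bool Equals(object obj)
    {
        Label other = obj as Label;
        // ReferenceEquals avoids calling the overloaded operators recursively.
        return !object.ReferenceEquals(other, null) && text == other.text;
    }

    public override int GetHashCode()
    {
        return text == null ? 0 : text.GetHashCode();
    }

    // The operators simply defer to Equals.
    public static bool operator ==(Label a, Label b)
    {
        if (object.ReferenceEquals(a, null)) return object.ReferenceEquals(b, null);
        return a.Equals(b);
    }

    public static bool operator !=(Label a, Label b)
    {
        return !(a == b);
    }
}

public static class LabelDemo
{
    public static void Main()
    {
        Label first = new Label("abcd");
        Label second = new Label("abcd");

        Console.WriteLine(first == second);            // True
        Console.WriteLine(first.Equals(second));       // True
        Console.WriteLine(first != new Label("ABCD")); // True
    }
}
```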

The next several lines in Example 10-1 use the index operator ([]) to find a particular character within a string and the Length property to return the length of the entire string:

Console.WriteLine("\nString s9 is {0} characters long. ",
    s9.Length);
Console.WriteLine("The 5th character is {1}\n", 
    s9.Length, s9[4]);

Here’s the output:

String s9 is 8 characters long.
The 5th character is A

The EndsWith( ) method asks a string whether a substring is found at the end of the string. Thus, you might ask s3 first if it ends with "Training" (which it does not) and then if it ends with "Consulting" (which it does):

// test whether a string ends with a set of characters
Console.WriteLine("s3:{0}\nEnds with Training?: {1}\n",
    s3, 
    s3.EndsWith("Training") );
Console.WriteLine("Ends with Consulting?: {0}",
    s3.EndsWith("Consulting"));

The output reflects that the first test fails and the second succeeds:

s3:Liberty Associates, Inc.
provides custom .NET development,
on-site Training and Consulting
Ends with Training?: False
Ends with Consulting?: True

The IndexOf( ) method locates a substring within our string, and the Insert( ) method inserts a new substring into a copy of the original string.

The following code locates the first occurrence of "Training" in s3:

Console.WriteLine("\nThe first occurrence of Training ");
Console.WriteLine ("in s3 is {0}\n", 
    s3.IndexOf("Training"));

The output indicates that the offset is 103:

The first occurrence of Training
in s3 is 103

You can then use that value to insert the word "excellent", followed by a space, into that string. Actually the insertion is into a copy of the string returned by the Insert( ) method and assigned to s10:

string s10 = s3.Insert(103,"excellent ");
Console.WriteLine("s10: {0}\n",s10);

Here’s the output:

s10: Liberty Associates, Inc.
provides custom .NET development,
on-site excellent Training and Consulting

Finally, you can combine these operations into a single statement that does not depend on a hard-coded offset:

string s11 = s3.Insert(s3.IndexOf("Training"),"excellent ");
Console.WriteLine("s11: {0}\n",s11);

with the identical output:

s11: Liberty Associates, Inc.
provides custom .NET development,
on-site excellent Training and Consulting
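One caveat when combining IndexOf( ) and Insert( ): IndexOf( ) returns -1 when the substring is not found, and Insert( ) throws an exception for a negative index. A minimal sketch of a guard follows; InsertBefore is a hypothetical helper, and the ordinal comparison is a choice made for this sketch.

```csharp
using System;

public static class InsertDemo
{
    // Inserts 'addition' immediately before the first occurrence of 'marker'.
    // Returns the text unchanged when the marker is not present.
    public static string InsertBefore(string text, string marker, string addition)
    {
        int index = text.IndexOf(marker, StringComparison.Ordinal);
        return index < 0 ? text : text.Insert(index, addition);
    }

    public static void Main()
    {
        string s = "on-site Training and Consulting";

        // on-site excellent Training and Consulting
        Console.WriteLine(InsertBefore(s, "Training", "excellent "));

        // on-site Training and Consulting (no match, so nothing is inserted)
        Console.WriteLine(InsertBefore(s, "Coaching", "excellent "));
    }
}
```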

Finding Substrings

The String type provides an overloaded Substring method for extracting substrings from within strings. Both versions take an index indicating where to begin the extraction, and one of the two versions takes a second argument giving the number of characters to extract. The Substring method is illustrated in Example 10-2.

Example 10-2. Using the Substring( ) method

namespace Programming_CSharp
{
   using System;
   using System.Text;
    
   public class StringTester
   {
      static void Main(  )
      {
         // create some strings to work with
         string s1 = "One Two Three Four"; 

         int ix;

          // get the index of the last space
         ix=s1.LastIndexOf(" ");
            
         // get the last word.
         string s2 = s1.Substring(ix+1); 
            
         // set s1 to the substring starting at 0
         // with length ix (everything before the last space),
         // thus s1 has "One Two Three"
         s1 = s1.Substring(0,ix);    
       
         // find the last space in s1 (after two)
         ix = s1.LastIndexOf(" ");

         // set s3 to the substring starting at 
         // ix, the space after "two" plus one more
         // thus s3 = "three"
         string s3 = s1.Substring(ix+1);

         // reset s1 to the substring starting at 0
         // and ending at ix, thus the string "one two"
         s1 = s1.Substring(0,ix);

         // reset ix to the space between 
         // "one" and "two"
         ix = s1.LastIndexOf(" ");

         // set s4 to the substring starting one
         // space after ix, thus the substring "two"
         string s4 = s1.Substring(ix+1);

         // reset s1 to the substring starting at 0
         // and ending at ix, thus "one"
         s1 = s1.Substring(0,ix);

         // set ix to the last space, but there is 
         // none so ix now = -1
         ix = s1.LastIndexOf(" ");

         // set s5 to the substring at one past
         // the last space. there was no last space
         // so this sets s5 to the substring starting
         // at zero
         string s5 = s1.Substring(ix+1);
            
         Console.WriteLine ("s2: {0}\ns3: {1}",s2,s3);
         Console.WriteLine ("s4: {0}\ns5: {1}\n",s4,s5);
         Console.WriteLine ("s1: {0}\n",s1);

      }
   }
}

Output:
s2: Four
s3: Three
s4: Two
s5: One

s1: One

Example 10-2 is not an elegant solution to the problem of extracting words from a string, but it is a good first approximation and it illustrates a useful technique. The example begins by creating a string, s1:

string s1 = "One Two Three Four";

Then ix is assigned the index of the last space in the string:

ix=s1.LastIndexOf(" ");

Then the substring that begins one character later is assigned to the new string, s2:

string s2 = s1.Substring(ix+1);

This extracts from ix+1 to the end of the line, assigning to s2 the value Four.

The next step is to remove the word Four from s1. You can do this by assigning to s1 the substring of s1 that begins at 0 and is ix characters long, which is everything before the last space:

s1 = s1.Substring(0,ix);

We reassign ix to the last (remaining) space, which points us to the beginning of the word Three, which we then extract into string s3. We continue like this until we’ve populated s4 and s5. Finally, we print the results:

s2: Four
s3: Three
s4: Two
s5: One

s1: One

Not elegant, but it worked and it illustrates the use of Substring. This is not unlike using pointer arithmetic in C++, but without using pointers and unsafe code.
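The repeated LastIndexOf( )/Substring( ) steps of Example 10-2 can be folded into a loop. This is a sketch of the same technique, not a replacement for the Split( ) approach; WordsFromEnd is a hypothetical helper, and it uses List<T> from System.Collections.Generic to collect the words.

```csharp
using System;
using System.Collections.Generic;

public static class LastWordsDemo
{
    // Peels words off the end of 'text', last word first, repeating the
    // LastIndexOf/Substring steps until no space remains.
    public static List<string> WordsFromEnd(string text)
    {
        List<string> words = new List<string>();
        int ix = text.LastIndexOf(' ');

        while (ix >= 0)
        {
            words.Add(text.Substring(ix + 1)); // from ix+1 to the end
            text = text.Substring(0, ix);      // start at 0, length ix
            ix = text.LastIndexOf(' ');
        }

        words.Add(text); // whatever remains is the first word
        return words;
    }

    public static void Main()
    {
        List<string> words = WordsFromEnd("One Two Three Four");

        // Four, Three, Two, One
        Console.WriteLine(string.Join(", ", words.ToArray()));
    }
}
```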

Splitting Strings

A more effective solution to the problem illustrated in Example 10-2 would be to use the Split( ) method of String, whose job is to parse a string into substrings. To use Split( ), you pass in an array of delimiters (characters which will indicate a split in the words) and the method returns an array of substrings. Example 10-3 illustrates:

Example 10-3. Using the Split( ) method

namespace Programming_CSharp
{
   using System;
   using System.Text;
    
   public class StringTester
   {
      static void Main(  )
      {
         // create some strings to work with
         string s1 = "One,Two,Three Liberty Associates, Inc."; 

         // constants for the space and comma characters
         const char Space = ' ';
         const char Comma = ',';
    
         // array of delimiters to split the sentence with
         char[] delimiters = new char[] 
            {
               Space,
               Comma
            };

         string output = "";
         int ctr = 1;

         // split the string and then iterate over the
         // resulting array of strings
         foreach (string subString in s1.Split(delimiters))
         {
            output += ctr++;
            output += ": ";
            output += subString;
            output += "\n";
         }
         Console.WriteLine(output);
      }
   }
}

Output:
1: One
2: Two
3: Three
4: Liberty
5: Associates
6:
7: Inc.

You start by creating a string to parse:

string s1 = "One,Two,Three Liberty Associates, Inc.";

The delimiters are set to the space and comma characters. You then call Split( ) on this string and pass the results to the foreach loop:

foreach (string subString in s1.Split(delimiters))

You start by initializing output to an empty string. You then build up the output string in four steps: you concatenate the value of ctr, then the colon, then the substring returned by Split( ), and then the newline. With each concatenation a new copy of the string is made, and all four steps are repeated for each substring found by Split( ). This repeated copying of strings is terribly inefficient.

The problem is that the string type is not designed for this kind of operation. What you want is to create a new string by appending a formatted string each time through the loop. The class you need is StringBuilder.

Manipulating Dynamic Strings

The StringBuilder class is used for creating and modifying strings. Semantically, it is the encapsulation of a constructor for a String. The important members of StringBuilder are summarized in Table 10-2.

Table 10-2. StringBuilder properties and methods

Member	Explanation
Capacity	Property that retrieves or assigns the number of characters the `StringBuilder` is capable of holding.
Chars	The indexer.
Length	Property that retrieves or assigns the length of the `StringBuilder`.
MaxCapacity	Property that retrieves the maximum capacity of the `StringBuilder`.
Append( )	Overloaded public method that appends a typed object to the end of the current `StringBuilder`.
AppendFormat( )	Overloaded public method that replaces format specifiers with the formatted value of an object.
EnsureCapacity( )	Ensures the current `StringBuilder` has a capacity at least as large as the specified value.
Insert( )	Overloaded public method that inserts an object at the specified position.
Remove( )	Removes the specified characters.
Replace( )	Overloaded public method that replaces all instances of specified characters with new characters.

Unlike String, StringBuilder is mutable; when you modify a StringBuilder you modify the actual string, not a copy. Example 10-4 replaces the String object in Example 10-3 with a StringBuilder object.

Example 10-4. Using a StringBuilder

namespace Programming_CSharp
{
   using System;
   using System.Text;
    
   public class StringTester
   {
      static void Main(  )
      {
         // create some strings to work with
         string s1 = "One,Two,Three Liberty Associates, Inc."; 

         // constants for the space and comma characters
         const char Space = ' ';
         const char Comma = ',';
    
         // array of delimiters to split the sentence with
         char[] delimiters = new char[] 
         {
               Space,
               Comma
         };

         // use a StringBuilder class to build the
         // output string
         StringBuilder output = new StringBuilder(  );
         int ctr = 1;

         // split the string and then iterate over the
         // resulting array of strings
         foreach (string subString in s1.Split(delimiters))
         {
            // AppendFormat appends a formatted string
            output.AppendFormat("{0}: {1}\n",ctr++,subString);            
         }
         Console.WriteLine(output);
      }
   }
}

Only the last part of the program is modified. Rather than using the concatenation operator to modify the string, you use the AppendFormat method of StringBuilder to append new, formatted strings as you create them. This is much easier and far more efficient. The output is identical:

1: One
2: Two
3: Three
4: Liberty
5: Associates
6:
7: Inc.
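Several of the other members in Table 10-2 work the same way, changing the builder in place. The following sketch chains a few of them; the class and method names are invented for illustration, and the comments show the builder's contents after each step.

```csharp
using System;
using System.Text;

public static class BuilderDemo
{
    public static string Build()
    {
        StringBuilder sb = new StringBuilder();

        sb.Append("Hello");                        // Hello
        sb.Append(' ').Append("World");            // Hello World
        sb.Insert(0, ">> ");                       // >> Hello World
        sb.Replace("World", "C#");                 // >> Hello C#
        sb.Remove(0, 3);                           // Hello C#

        // Length is read before the formatted text is appended.
        sb.AppendFormat(" ({0} chars)", sb.Length); // Hello C# (8 chars)

        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Build()); // Hello C# (8 chars)
    }
}
```

No intermediate string objects are created by these calls; a string is produced only when ToString( ) is called at the end.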

Delimiter Limitations

Because you passed in delimiters of both comma and space, the space after the comma between “Associates” and “Inc.” is returned as a word, numbered 6 above. That is not what you want. To eliminate this, you need to tell Split( ) to match a comma (as between One, Two, and Three), a space (as between Liberty and Associates), or a comma followed by a space. It is that last case that is tricky and requires that you use a regular expression.
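As a hedged sketch of where this leads, the regular-expression version can be written with Regex.Split from System.Text.RegularExpressions; the pattern below is one possible way to express "comma and space, or comma, or space" and is not taken from the text. For comparison, later versions of the framework (2.0 onward) also added a Split( ) overload that takes StringSplitOptions.RemoveEmptyEntries, which discards the empty entry without a regular expression. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Regular-expression version: a comma followed by a space is tried
    // first, then a lone comma, then a lone space.
    public static string[] SplitWithRegex(string text)
    {
        return Regex.Split(text, ", |,| ");
    }

    // String.Split version that drops empty entries from the result.
    public static string[] SplitSkippingEmpties(string text)
    {
        char[] delimiters = { ' ', ',' };
        return text.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        string s1 = "One,Two,Three Liberty Associates, Inc.";

        // Both lines print: One|Two|Three|Liberty|Associates|Inc.
        Console.WriteLine(string.Join("|", SplitWithRegex(s1)));
        Console.WriteLine(string.Join("|", SplitSkippingEmpties(s1)));
    }
}
```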
