There was a time when people thought of computers exclusively as manipulating numeric values. Early computers were first used to calculate missile trajectories, and programming was taught in the math department of major universities.
Today, most programs are concerned more with strings of characters than with strings of numbers. Typically these strings are used for word processing, document manipulation, and creation of web pages.
C# provides built-in support for a fully functional string type. More importantly, C# treats strings as objects that encapsulate all the manipulation, sorting, and searching methods normally applied to strings of characters.
Complex string manipulation and pattern matching are aided by the use of regular expressions. C# combines the power and complexity of regular expression syntax, originally found only in string manipulation languages such as awk and Perl, with a fully object-oriented design.
In this chapter, you will learn to work with the C# string type and the .NET Framework System.String class that it aliases. You will see how to extract substrings, manipulate and concatenate strings, and build new strings with the StringBuilder class. In addition, you will learn how to use the Regex class to match strings based on complex regular expressions.
C# treats strings as first-class types that are flexible, powerful, and easy to use. Each string object is an immutable sequence of Unicode characters. In other words, methods that appear to change the string actually return a modified copy; the original string remains intact.
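To make the immutability point concrete, here is a minimal sketch. The class and method names are illustrative only and do not come from the chapter. It shows that a method such as ToUpperInvariant( ) leaves the original string untouched and hands back a new one:

```csharp
using System;

public static class ImmutabilityDemo
{
    // Returns the original string and its upper-cased copy
    // so that a caller can compare the two.
    public static string[] UpperCaseCopy(string original)
    {
        // ToUpperInvariant does not modify 'original'; it returns a new string.
        string upper = original.ToUpperInvariant();
        return new string[] { original, upper };
    }

    public static void Main()
    {
        string[] pair = UpperCaseCopy("abcd");
        Console.WriteLine("original: {0}", pair[0]); // original: abcd
        Console.WriteLine("copy:     {0}", pair[1]); // copy:     ABCD
    }
}
```

If you want the changed value, you must assign the returned string to a variable; calling the method and discarding its result has no visible effect.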
When you declare a C# string using the string keyword, you are in fact declaring the object to be of the type System.String, one of the built-in types provided by the .NET Framework Class Library. A C# string type is a System.String type, and we will use the names interchangeably throughout the chapter.
The declaration of the System.String class is:
public sealed class String : IComparable, ICloneable, IConvertible
This declaration reveals that the class is sealed, meaning that it is not possible to derive from the string class. The class also implements three system interfaces (IComparable, ICloneable, and IConvertible) that dictate functionality that System.String shares with other classes in the .NET Framework.
As seen in Chapter 9, the IComparable interface is implemented by types whose values can be ordered. Strings, for example, can be alphabetized; any given string can be compared with another string to determine which should come first in an ordered list. IComparable classes implement the CompareTo( ) method.
ICloneable objects can create new instances with the same value as the original instance. In this case, it is possible to clone a string to produce a new string with the same values (characters) as the original. ICloneable classes implement the Clone( ) method.
IConvertible classes provide methods, such as ToInt32( ), ToDouble( ), and ToDecimal( ), that facilitate conversion to other primitive types.
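The three interfaces can be exercised directly. The following sketch uses hypothetical helper names that are not part of the chapter. String implements the IConvertible members explicitly, so they are reached through an interface reference. For String specifically, Clone( ) is documented to return a reference to the same instance rather than a separate copy, which is harmless because strings are immutable.

```csharp
using System;
using System.Globalization;

public static class StringInterfacesDemo
{
    // IComparable: negative, zero, or positive depending on sort order.
    public static int CompareViaInterface(string left, string right)
    {
        IComparable comparable = left;
        return comparable.CompareTo(right);
    }

    // IConvertible: the conversion members are explicit interface
    // implementations, so the call goes through the interface type.
    public static int ToInt32ViaInterface(string digits)
    {
        IConvertible convertible = digits;
        return convertible.ToInt32(CultureInfo.InvariantCulture);
    }

    // ICloneable: reports whether Clone( ) returned the very same object.
    public static bool CloneIsSameInstance(string value)
    {
        ICloneable cloneable = value;
        return object.ReferenceEquals(value, cloneable.Clone());
    }

    public static void Main()
    {
        Console.WriteLine(CompareViaInterface("abcd", "abcd")); // 0
        Console.WriteLine(ToInt32ViaInterface("42"));           // 42
        Console.WriteLine(CloneIsSameInstance("abcd"));         // True
    }
}
```

In everyday code you would more likely call the Convert class or int.Parse( ) than the IConvertible members themselves; the interface is shown here only to illustrate what the declaration promises.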
The most common way to create a string is to assign a quoted string of characters, known as a string literal, to a user-defined variable of type string:
string newString = "This is a string literal";
Quoted strings can include escape characters, such as "\n" or "\t", which begin with a backslash character (\) and are used to indicate where line breaks or tabs are to appear. Because the backslash is itself used in some command-line syntaxes, such as Windows directory paths, a literal backslash in a quoted string must be preceded by another backslash.
Strings can also be created using verbatim string literals, which start with the (@) symbol. This tells the compiler that the string should be used verbatim, even if it spans multiple lines or includes escape characters. In a verbatim string literal, backslashes and the characters that follow them are simply considered additional characters of the string. Thus, the following two definitions are equivalent:
string literalOne = "\\\\MySystem\\MyDirectory\\ProgrammingC#.cs";
string verbatimLiteralOne = @"\\MySystem\MyDirectory\ProgrammingC#.cs";
In the first line, a nonverbatim string literal is used, and so each backslash character (\) must be escaped, which means it must be preceded by a second backslash character. In the second, a verbatim literal string is used, so the extra backslash is not needed. A second example illustrates multiline verbatim strings:
string literalTwo = "Line One\nLine Two";
string verbatimLiteralTwo = @"Line One
Line Two";
Again, these declarations are interchangeable. Which one you use is a matter of convenience and personal style.
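You can check the equivalence yourself. The sketch below uses illustrative class and method names and compares each escaped literal with its verbatim counterpart. One assumption worth flagging: a line break inside a verbatim literal is whatever the source file contains, so the multiline comparison normalizes carriage-return/line-feed pairs before comparing.

```csharp
using System;

public static class LiteralDemo
{
    // Escaped and verbatim forms of the same path.
    public static bool PathsMatch()
    {
        string escaped = "\\\\MySystem\\MyDirectory\\ProgrammingC#.cs";
        string verbatim = @"\\MySystem\MyDirectory\ProgrammingC#.cs";
        return escaped == verbatim;
    }

    // Escaped and verbatim forms of the same two-line text.
    public static bool LinesMatch()
    {
        string escaped = "Line One\nLine Two";
        string verbatim = @"Line One
Line Two";
        // Normalize CRLF so the result does not depend on
        // the line endings the source file was saved with.
        return escaped == verbatim.Replace("\r\n", "\n");
    }

    public static void Main()
    {
        Console.WriteLine(PathsMatch()); // True
        Console.WriteLine(LinesMatch()); // True
    }
}
```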
Another common way to create a string is to call the ToString( ) method on an object and assign the result to a string variable. All the built-in types override this method to simplify the task of converting a value (often a numeric value) to a string representation of that value. In the following example, the ToString( ) method of an integer type is called to store its value in a string:
int myInteger = 5;
string integerString = myInteger.ToString( );
The call to myInteger.ToString( ) returns a String object, which is then assigned to integerString.
The .NET String class provides a wealth of overloaded constructors that support a variety of techniques for assigning string values to string types. Some of these constructors enable you to create a string by passing in a character array or character pointer. Passing in a character array as a parameter to the String constructor creates a CLR-compliant new instance of a string. Passing in a character pointer creates a noncompliant, “unsafe” instance.
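As a brief illustration of the safe constructors, the sketch below builds strings from a character array, from a slice of that array, and from a repeated character. The class and method names are made up for this example; the pointer-based constructors are deliberately left out because they require unsafe code.

```csharp
using System;

public static class ConstructorDemo
{
    // Builds a string from every element of the array.
    public static string FromArray(char[] letters)
    {
        return new string(letters);
    }

    // Builds a string from 'count' elements starting at 'start'.
    public static string FromSlice(char[] letters, int start, int count)
    {
        return new string(letters, start, count);
    }

    // Builds a string by repeating one character.
    public static string Repeat(char c, int count)
    {
        return new string(c, count);
    }

    public static void Main()
    {
        char[] letters = { 'a', 'b', 'c', 'd' };
        Console.WriteLine(FromArray(letters));       // abcd
        Console.WriteLine(FromSlice(letters, 1, 2)); // bc
        Console.WriteLine(Repeat('-', 5));           // -----
    }
}
```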
The string class provides a host of methods for comparing, searching, and manipulating strings, as shown in Table 10-1.
Table 10-1. Methods and fields for the string class
Example 10-1 illustrates the use of some of these methods, including Compare( ), Concat( ) (and the overloaded + operator), Copy( ) (and the = operator), Insert( ), EndsWith( ), and IndexOf( ).
Example 10-1. Working with strings
namespace Programming_CSharp
{
    using System;

    public class StringTester
    {
        static void Main( )
        {
            // create some strings to work with
            string s1 = "abcd";
            string s2 = "ABCD";
            string s3 = @"Liberty Associates, Inc.
                provides custom .NET development,
                on-site Training and Consulting";

            int result;  // hold the results of comparisons

            // compare two strings, case sensitive
            result = string.Compare(s1, s2);
            Console.WriteLine(
                "compare s1: {0}, s2: {1}, result: {2}\n",
                s1, s2, result);

            // overloaded compare, takes boolean "ignore case"
            // (true = ignore case)
            result = string.Compare(s1,s2, true);
            Console.WriteLine("compare insensitive\n");
            Console.WriteLine("s4: {0}, s2: {1}, result: {2}\n",
                s1, s2, result);

            // concatenation method
            string s6 = string.Concat(s1,s2);
            Console.WriteLine(
                "s6 concatenated from s1 and s2: {0}", s6);

            // use the overloaded operator
            string s7 = s1 + s2;
            Console.WriteLine(
                "s7 concatenated from s1 + s2: {0}", s7);

            // the string copy method
            string s8 = string.Copy(s7);
            Console.WriteLine(
                "s8 copied from s7: {0}", s8);

            // assignment copies the reference, not the characters
            string s9 = s8;
            Console.WriteLine("s9 = s8: {0}", s9);

            // three ways to compare.
            Console.WriteLine(
                "\nDoes s9.Equals(s8)?: {0}",
                s9.Equals(s8));
            Console.WriteLine(
                "Does Equals(s9,s8)?: {0}",
                string.Equals(s9,s8));
            Console.WriteLine(
                "Does s9==s8?: {0}", s9 == s8);

            // Two useful properties: the index and the length
            Console.WriteLine(
                "\nString s9 is {0} characters long. ",
                s9.Length);
            Console.WriteLine(
                "The 5th character is {1}\n",
                s9.Length, s9[4]);

            // test whether a string ends with a set of characters
            Console.WriteLine("s3:{0}\nEnds with Training?: {1}\n",
                s3,
                s3.EndsWith("Training") );
            Console.WriteLine(
                "Ends with Consulting?: {0}",
                s3.EndsWith("Consulting"));

            // return the index of the substring
            Console.WriteLine(
                "\nThe first occurrence of Training ");
            Console.WriteLine ("in s3 is {0}\n",
                s3.IndexOf("Training"));

            // insert the word excellent before "Training";
            // 103 is the offset reported by IndexOf above, and it
            // depends on the whitespace and line endings inside s3
            string s10 = s3.Insert(103,"excellent ");
            Console.WriteLine("s10: {0}\n",s10);

            // you can combine the two as follows:
            string s11 = s3.Insert(s3.IndexOf("Training"),
                "excellent ");
            Console.WriteLine("s11: {0}\n",s11);
        }
    }
}
Output
compare s1: abcd, s2: ABCD, result: -1
compare insensitive
s4: abcd, s2: ABCD, result: 0
s6 concatenated from s1 and s2: abcdABCD
s7 concatenated from s1 + s2: abcdABCD
s8 copied from s7: abcdABCD
s9 = s8: abcdABCD
Does s9.Equals(s8)?: True
Does Equals(s9,s8)?: True
Does s9==s8?: True
String s9 is 8 characters long.
The 5th character is A
s3:Liberty Associates, Inc.
provides custom .NET development,
on-site Training and Consulting
Ends with Training?: False
Ends with Consulting?: True
The first occurrence of Training
in s3 is 103
s10: Liberty Associates, Inc.
provides custom .NET development,
on-site excellent Training and Consulting
s11: Liberty Associates, Inc.
provides custom .NET development,
on-site excellent Training and Consulting
Example 10-1 begins by declaring three strings:
string s1 = "abcd";
string s2 = "ABCD";
string s3 = @"Liberty Associates, Inc.
    provides custom .NET development,
    on-site Training and Consulting";
The first two are string literals, the third a verbatim string literal. We begin by comparing s1 to s2. The Compare( ) method is a public static method of string, and it is overloaded. The first overloaded version takes two strings and compares them:
// compare two strings, case sensitive
result = string.Compare(s1, s2);
Console.WriteLine("compare s1: {0}, s2: {1}, result: {2}\n",
    s1, s2, result);
This is a case-sensitive comparison and returns different values, depending on the results of the comparison:
A negative integer if the first string is less than the second string
0 if the strings are equal
A positive integer if the first string is greater than the second string
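In this case, the output properly indicates that s1 is “less than” s2. This overload of Compare( ) performs a culture-sensitive comparison, in which a lowercase letter sorts before the corresponding uppercase letter (even though lowercase letters have the larger character codes in Unicode and ASCII):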
compare s1: abcd, s2: ABCD, result: -1
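The sign of the result depends on which comparison rules are in force. The following sketch is not part of Example 10-1 and uses hypothetical helper names. It relies on the Compare( ) overload that takes a StringComparison value to contrast ordinal rules (raw character codes) with case-insensitive ordinal rules, and reduces each result to -1, 0, or 1 with Math.Sign( ):

```csharp
using System;

public static class CompareDemo
{
    // Compares by raw character codes: 'a' (97) is greater than 'A' (65).
    public static int OrdinalSign(string a, string b)
    {
        return Math.Sign(string.Compare(a, b, StringComparison.Ordinal));
    }

    // Compares by character codes after ignoring case.
    public static int OrdinalIgnoreCaseSign(string a, string b)
    {
        return Math.Sign(
            string.Compare(a, b, StringComparison.OrdinalIgnoreCase));
    }

    public static void Main()
    {
        Console.WriteLine(OrdinalSign("abcd", "ABCD"));           // 1
        Console.WriteLine(OrdinalIgnoreCaseSign("abcd", "ABCD")); // 0

        // Culture-sensitive rules, as used by string.Compare(s1, s2).
        Console.WriteLine(Math.Sign(
            string.Compare("abcd", "ABCD", StringComparison.CurrentCulture)));
    }
}
```

The last line is left without an expected value because a culture-sensitive result depends on the culture and globalization settings of the machine that runs it. When you need a result that is the same everywhere, for example when comparing identifiers or file names, the ordinal options are usually the safer choice.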
The second comparison uses an overloaded version of Compare( ) that takes a third, Boolean parameter, whose value determines whether case should be ignored in the comparison. If the value of this “ignore case” parameter is true, the comparison is made without regard to case, as in the following:
result = string.Compare(s1,s2, true);
Console.WriteLine("compare insensitive\n");
Console.WriteLine("s4: {0}, s2: {1}, result: {2}\n",
    s1, s2, result);
The result is written with two WriteLine( ) statements to keep the lines short enough to print properly in this book.
This time the case is ignored and the result is 0, indicating that the two strings are identical (without regard to case):
compare insensitive
s4: abcd, s2: ABCD, result: 0
Example 10-1 then concatenates some strings. There are a couple of ways to accomplish this. You can use the Concat( ) method, which is a static public method of string:
string s6 = string.Concat(s1,s2);
or you can simply use the overloaded concatenation (+) operator:
string s7 = s1 + s2;
In both cases, the output reflects that the concatenation was successful:
s6 concatenated from s1 and s2: abcdABCD
s7 concatenated from s1 + s2: abcdABCD
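Similarly, you can obtain a second string variable with the same value in two ways. First, you can use the static Copy( ) method, which creates a new string object:
string s8 = string.Copy(s7);
or, for convenience, you might instead use the assignment operator (=). Assignment is not overloaded and does not copy the characters; it makes s9 refer to the same string object as s8, which is safe because strings are immutable:
string s9 = s8;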
Once again, the output reflects that each method has worked:
s8 copied from s7: abcdABCD
s9 = s8: abcdABCD
The .NET String class provides three ways to test for the equality of two strings. First, you can use the overloaded Equals( ) method and ask s9 directly whether s8 is of equal value:
Console.WriteLine("\nDoes s9.Equals(s8)?: {0}",
    s9.Equals(s8));
A second technique is to pass both strings to String’s static method Equals( ):
Console.WriteLine("Does Equals(s9,s8)?: {0}", string.Equals(s9,s8));
A final method is to use the overloaded equality operator (==) of String:
Console.WriteLine("Does s9==s8?: {0}", s9 == s8);
In each of these cases, the returned result is a Boolean value, as shown in the output:
Does s9.Equals(s8)?: True
Does Equals(s9,s8)?: True
Does s9==s8?: True
The equality operator is the most natural choice when you have two string objects. However, not every .NET language supports operator overloading (VB.NET, for example, originally did not), so when you overload == in your own classes, be sure to override the Equals( ) instance method as well.
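All three techniques compare the characters of the strings, not the identity of the objects. The following sketch, whose class and method names are illustrative, builds a second "abcd" at run time so that the two variables refer to different objects, and then shows that value equality still holds while reference equality does not. It also uses the static Equals( ) overload that takes a StringComparison value for a case-insensitive test.

```csharp
using System;

public static class EqualityDemo
{
    // Builds a new string object at run time from individual characters.
    public static string BuildAbcd()
    {
        return new string(new char[] { 'a', 'b', 'c', 'd' });
    }

    public static void Main()
    {
        string literal = "abcd";
        string built = BuildAbcd();

        // Value equality: all three report True.
        Console.WriteLine(literal.Equals(built));        // True
        Console.WriteLine(string.Equals(literal, built)); // True
        Console.WriteLine(literal == built);             // True

        // Reference equality: two distinct objects.
        Console.WriteLine(object.ReferenceEquals(literal, built)); // False

        // Case-insensitive value equality.
        Console.WriteLine(string.Equals(
            "abcd", "ABCD", StringComparison.OrdinalIgnoreCase)); // True
    }
}
```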
The next several lines in Example 10-1 use the index operator ([]) to find a particular character within a string and the Length property to return the length of the entire string:
Console.WriteLine("\nString s9 is {0} characters long. ",
    s9.Length);
Console.WriteLine("The 5th character is {1}\n",
    s9.Length, s9[4]);
Here’s the output:
String s9 is 8 characters long.
The 5th character is A
The EndsWith( ) method asks a string whether a substring is found at the end of the string. Thus, you might ask s3 first if it ends with "Training" (which it does not) and then if it ends with "Consulting" (which it does):
// test whether a string ends with a set of characters
Console.WriteLine("s3:{0}\nEnds with Training?: {1}\n",
    s3,
    s3.EndsWith("Training") );
Console.WriteLine("Ends with Consulting?: {0}",
    s3.EndsWith("Consulting"));
The output reflects that the first test fails and the second succeeds:
s3:Liberty Associates, Inc.
provides custom .NET development,
on-site Training and Consulting
Ends with Training?: False
Ends with Consulting?: True
The IndexOf( ) method locates a substring within our string, and the Insert( ) method inserts a new substring into a copy of the original string.
The following code locates the first occurrence of "Training" in s3:
Console.WriteLine("\nThe first occurrence of Training ");
Console.WriteLine ("in s3 is {0}\n",
    s3.IndexOf("Training"));
The output indicates that the offset is 103:
The first occurrence of Training
in s3 is 103
You can then use that value to insert the word "excellent", followed by a space, into that string. Actually the insertion is into a copy of the string returned by the Insert( ) method and assigned to s10:
string s10 = s3.Insert(103,"excellent ");
Console.WriteLine("s10: {0}\n",s10);
Here’s the output:
s10: Liberty Associates, Inc.
provides custom .NET development,
on-site excellent Training and Consulting
Finally, you can combine these operations into a single insertion statement that does not hard-code the offset:
string s11 = s3.Insert(s3.IndexOf("Training"),"excellent ");
Console.WriteLine("s11: {0}\n",s11);
with the identical output:
s11: Liberty Associates, Inc.
provides custom .NET development,
on-site excellent Training and Consulting
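One caveat with the combined form: IndexOf( ) returns -1 when the substring is not found, and passing a negative index to Insert( ) throws an ArgumentOutOfRangeException. A guarded version might look like the following sketch. The helper name InsertBefore is hypothetical, and the choice to return the text unchanged when the marker is missing is just one reasonable policy.

```csharp
using System;

public static class InsertDemo
{
    // Inserts 'addition' immediately before the first occurrence of
    // 'marker'. Returns 'text' unchanged if 'marker' does not occur.
    public static string InsertBefore(string text, string marker, string addition)
    {
        int ix = text.IndexOf(marker, StringComparison.Ordinal);
        if (ix < 0)
        {
            return text; // marker not found; nothing to insert
        }
        return text.Insert(ix, addition);
    }

    public static void Main()
    {
        string s = "on-site Training and Consulting";

        // on-site excellent Training and Consulting
        Console.WriteLine(InsertBefore(s, "Training", "excellent "));

        // on-site Training and Consulting
        Console.WriteLine(InsertBefore(s, "Coaching", "excellent "));
    }
}
```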
The String type provides an overloaded Substring( ) method for extracting substrings from within strings. Both versions take an index indicating where to begin the extraction, and one of the two versions takes a second argument giving the length of the substring to extract. The Substring( ) method is illustrated in Example 10-2.
Example 10-2. Using the Substring( ) method
namespace Programming_CSharp
{
    using System;
    using System.Text;

    public class StringTester
    {
        static void Main( )
        {
            // create some strings to work with
            string s1 = "One Two Three Four";

            int ix;

            // get the index of the last space
            ix=s1.LastIndexOf(" ");

            // get the last word.
            string s2 = s1.Substring(ix+1);

            // set s1 to the substring starting at 0
            // and ix characters long (ending just before the
            // last space), thus s1 has one two three
            s1 = s1.Substring(0,ix);

            // find the last space in s1 (after two)
            ix = s1.LastIndexOf(" ");

            // set s3 to the substring starting at
            // ix, the space after "two" plus one more
            // thus s3 = "three"
            string s3 = s1.Substring(ix+1);

            // reset s1 to the substring starting at 0
            // and ending at ix, thus the string "one two"
            s1 = s1.Substring(0,ix);

            // reset ix to the space between
            // "one" and "two"
            ix = s1.LastIndexOf(" ");

            // set s4 to the substring starting one
            // space after ix, thus the substring "two"
            string s4 = s1.Substring(ix+1);

            // reset s1 to the substring starting at 0
            // and ending at ix, thus "one"
            s1 = s1.Substring(0,ix);

            // set ix to the last space, but there is
            // none so ix now = -1
            ix = s1.LastIndexOf(" ");

            // set s5 to the substring at one past
            // the last space. there was no last space
            // so this sets s5 to the substring starting
            // at zero
            string s5 = s1.Substring(ix+1);

            Console.WriteLine ("s2: {0}\ns3: {1}",s2,s3);
            Console.WriteLine ("s4: {0}\ns5: {1}\n",s4,s5);
            Console.WriteLine ("s1: {0}\n",s1);
        }
    }
}
Output:
s2: Four
s3: Three
s4: Two
s5: One
s1: One
Example 10-2 is not an elegant solution to the problem of extracting words from a string, but it is a good first approximation and it illustrates a useful technique. The example begins by creating a string, s1:
string s1 = "One Two Three Four";
Then ix is assigned the index of the last space in the string:
ix=s1.LastIndexOf(" ");
Then the substring that begins one character later is assigned to the new string, s2:
string s2 = s1.Substring(ix+1);
This extracts from ix+1 to the end of the string, assigning to s2 the value Four.
The next step is to remove the word Four from s1. You can do this by assigning to s1 the substring of s1 that begins at 0 and is ix characters long, so that it ends just before the last space:
s1 = s1.Substring(0,ix);
We reassign ix to the last (remaining) space, which points us to the beginning of the word Three, which we then extract into string s3. We continue like this until we’ve populated s4 and s5. Finally, we print the results:
s2: Four
s3: Three
s4: Two
s5: One
s1: One
Not elegant, but it worked and it illustrates the use of Substring( ). This is not unlike using pointer arithmetic in C++, but without using pointers and unsafe code.
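One detail of Substring( ) is easy to get wrong: in the two-argument version, the second argument is a length, not an ending index. The two coincide in Example 10-2 only because every two-argument call starts at 0. The sketch below makes the distinction explicit; the Slice helper is hypothetical and simply converts a start/end pair into the start/length pair that Substring( ) expects.

```csharp
using System;

public static class SubstringDemo
{
    // Returns the characters from 'start' up to, but not including,
    // 'endExclusive'. Substring( ) itself takes a start and a length.
    public static string Slice(string text, int start, int endExclusive)
    {
        return text.Substring(start, endExclusive - start);
    }

    public static void Main()
    {
        string s1 = "One Two Three Four";

        Console.WriteLine(s1.Substring(14));   // Four  (from index 14 to the end)
        Console.WriteLine(s1.Substring(4, 3)); // Two   (3 characters from index 4)
        Console.WriteLine(Slice(s1, 4, 7));    // Two   (indexes 4 through 6)
    }
}
```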
A more effective solution to the problem illustrated in Example 10-2 would be to use the Split( ) method of String, whose job is to parse a string into substrings. To use Split( ), you pass in an array of delimiters (characters that indicate a split in the words), and the method returns an array of substrings. Example 10-3 illustrates:
Example 10-3. Using the Split( ) method
namespace Programming_CSharp
{
    using System;
    using System.Text;

    public class StringTester
    {
        static void Main( )
        {
            // create some strings to work with
            string s1 = "One,Two,Three Liberty Associates, Inc.";

            // constants for the space and comma characters
            const char Space = ' ';
            const char Comma = ',';

            // array of delimiters to split the sentence with
            char[] delimiters = new char[]
            {
                Space,
                Comma
            };

            string output = "";
            int ctr = 1;

            // split the string and then iterate over the
            // resulting array of strings
            foreach (string subString in s1.Split(delimiters))
            {
                output += ctr++;
                output += ": ";
                output += subString;
                output += "\n";
            }
            Console.WriteLine(output);
        }
    }
}
Output:
1: One
2: Two
3: Three
4: Liberty
5: Associates
6:
7: Inc.
You start by creating a string to parse:
string s1 = "One,Two,Three Liberty Associates, Inc.";
The delimiters are set to the space and comma characters. You then call Split( ) on this string, and pass the results to the foreach loop:
foreach (string subString in s1.Split(delimiters))
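The empty sixth entry in the output comes from the comma and the space that sit next to each other after "Associates": Split( ) reports the zero-length substring between two adjacent delimiters. If you do not want such entries, later versions of the .NET Framework provide an overload that takes a StringSplitOptions value. The sketch below uses it; the class and method names are illustrative and not part of Example 10-3.

```csharp
using System;

public static class SplitDemo
{
    // Splits on spaces and commas, dropping zero-length entries
    // produced by adjacent delimiters.
    public static string[] SplitWords(string text)
    {
        char[] delimiters = new char[] { ' ', ',' };
        return text.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        string s1 = "One,Two,Three Liberty Associates, Inc.";
        string[] words = SplitWords(s1);

        Console.WriteLine(words.Length); // 6
        foreach (string word in words)
        {
            Console.WriteLine(word);
        }
    }
}
```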
You start by initializing output to an empty string. You then build up the output string in four steps. First you concatenate the value of ctr. Next you add the colon, then the substring returned by Split( ), then the newline. With each concatenation a new copy of the string is made, and all four steps are repeated for each substring found by Split( ). This repeated copying of strings is terribly inefficient.
The problem is that the string type is not designed for this kind of operation. What you want is to create a new string by appending a formatted string each time through the loop. The class you need is StringBuilder.
The StringBuilder class is used for creating and modifying strings. Semantically, it is the encapsulation of a constructor for a String.
The important members of StringBuilder are summarized in Table 10-2.
Table 10-2. StringBuilder methods
Unlike String, StringBuilder is mutable; when you modify a StringBuilder you modify the actual string, not a copy. Example 10-4 replaces the String object in Example 10-3 with a StringBuilder object.
Example 10-4. Using a StringBuilder
namespace Programming_CSharp
{
    using System;
    using System.Text;

    public class StringTester
    {
        static void Main( )
        {
            // create some strings to work with
            string s1 = "One,Two,Three Liberty Associates, Inc.";

            // constants for the space and comma characters
            const char Space = ' ';
            const char Comma = ',';

            // array of delimiters to split the sentence with
            char[] delimiters = new char[]
            {
                Space,
                Comma
            };

            // use a StringBuilder class to build the
            // output string
            StringBuilder output = new StringBuilder( );
            int ctr = 1;

            // split the string and then iterate over the
            // resulting array of strings
            foreach (string subString in s1.Split(delimiters))
            {
                // AppendFormat appends a formatted string
                output.AppendFormat("{0}: {1}\n",ctr++,subString);
            }
            Console.WriteLine(output);
        }
    }
}
Only the last part of the program is modified. Rather than using the concatenation operator to modify the string, you use the AppendFormat( ) method of StringBuilder to append new, formatted strings as you create them. This is much easier and far more efficient. The output is identical:
1: One
2: Two
3: Three
4: Liberty
5: Associates
6:
7: Inc.
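AppendFormat( ) is only one of the StringBuilder members. As a rough sketch of a few others (Append( ), Insert( ), Replace( ), and ToString( )), the following program edits a single buffer in place and converts it to a string once at the end. The class and method names are invented for this illustration.

```csharp
using System;
using System.Text;

public static class StringBuilderDemo
{
    // Builds a string step by step in one mutable buffer.
    public static string Build()
    {
        StringBuilder sb = new StringBuilder();
        sb.Append("One");             // buffer: One
        sb.Append(',').Append("Two"); // buffer: One,Two (calls can be chained)
        sb.Insert(0, "Zero,");        // buffer: Zero,One,Two
        sb.Replace(',', ' ');         // buffer: Zero One Two
        return sb.ToString();         // one string is created here
    }

    public static void Main()
    {
        Console.WriteLine(Build()); // Zero One Two
    }
}
```

Each call changes the builder itself and returns the same builder, which is why the Append( ) calls can be chained.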