Chapter 3. Pulling Strings

In This Chapter

  • Pulling and twisting a string with C# — just don't string me along

  • Comparing strings

  • Other string operations, such as searching, trimming, splitting, and concatenating

  • Parsing strings read into the program

  • Formatting output strings manually or using the String.Format() method

For many applications, you can treat a string like one of the built-in value-type variable types such as int or char. Certain operations that are otherwise reserved for these intrinsic types are available to strings:

int i = 1;          // Declare and initialize an int.
string s = "abc";   // Declare and initialize a string.

In other respects, as shown in the following example, a string is treated like a user-defined class (I cover classes in Book II):

string s1 = new String('a', 4);  // Call a String constructor: "aaaa".
string s2 = "abcd";
int lengthOfString = s2.Length;

Which is it — a value type or a class? In fact, String is a class for which C# offers special treatment because strings are so widely used in programs. For example, the keyword string is synonymous with the class name String, as shown in this bit of code:

String s1 = "abcd"; // Assign a string literal to a String obj.
string s2 = s1;     // Assign a String obj to a string variable.

In this example, s1 is declared to be an object of class String (spelled with an uppercase S) whereas s2 is declared as a simple string (spelled with a lowercase s). However, the two assignments demonstrate that string and String are of the same (or compatible) types.

Note

In fact, this same property is true of the other intrinsic variable types, to a more limited extent. Even the lowly int type has a corresponding class Int32, double has the class Double, and so on. The distinction here is that string and String truly are the same thing.

In the rest of the chapter, I cover Strings and strings and all the tasks you can accomplish by using them.

The Union Is Indivisible, and So Are Strings

You need to know at least one thing that you didn't learn before the sixth grade: You can't change a string object itself after it has been created. Even though I may speak of modifying a string, C# doesn't have an operation that modifies the actual string object. Plenty of operations appear to modify the string that you're working with, but they always return the modified string as a new object, instead. One string becomes two.

For example, the operation "His name is " + "Randy" changes neither of the two strings, but it generates a third string, "His name is Randy". One side effect of this behavior is that you don't have to worry about someone modifying a string out from under you.

Consider this simplistic example program:

// ModifyString -- The methods provided by class String do
//    not modify the object itself. (s.ToUpper() doesn't
//    modify 's'; rather it returns a new string that has
//    been converted.)
using System;
namespace ModifyString
{
  class Program
  {
    public static void Main(string[] args)
    {
      // Create a student object.
      Student s1 = new Student();
      s1.Name = "Jenny";
      // Now make a new object with the same name.
      Student s2 = new Student();
      s2.Name = s1.Name;
      // "Changing" the name in the s1 object does not
      // change the object itself because ToUpper() returns
      // a new string without modifying the original.
      s2.Name = s1.Name.ToUpper();
      Console.WriteLine("s1 - " + s1.Name + ", s2 - " + s2.Name);
      // Wait for user to acknowledge the results.
      Console.WriteLine("Press Enter to terminate...");
      Console.Read();
    }
  }

  // Student -- You just need a class with a string in it.
  class Student
  {
    public String Name;
  }
}

I fully discuss classes in Book II, but for now, you can see that the Student class contains a data variable called Name, of type String. The Student objects s1 and s2 are set up so the student Name data in each points to the same string data. ToUpper() converts the string s1.Name to all uppercase characters. Normally, this would be a problem because both s1 and s2 point to the same object. However, ToUpper() does not change Name — it creates a new, independent uppercase string and stores it in the object s2. Now the two Students don't point to the same string data.

The following output of the program is simple:

s1 - Jenny, s2 - JENNY
Press Enter to terminate...

This property of strings is called immutability (meaning, unchangeability).

Note

The immutability of strings is also important for string constants. A string such as "this is a string" is a form of a string constant, just as 1 is an int constant. In the same way that I reuse my shirts to reduce the size of my wardrobe (and go easy on my bank account), a compiler may choose to combine all accesses to the single constant "this is a string". Reusing string constants can reduce the footprint of the resulting program (its size on disk or in memory) but would be impossible if a string could be modified.
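
Here's a minimal sketch of that sharing, assuming both literals sit in the same program. The class and method names are made up for illustration; object.ReferenceEquals() is the real .NET call that asks whether two variables refer to the same object rather than to equal characters.

```csharp
using System;

// Hypothetical demo class; not one of the chapter's sample programs.
public static class LiteralSharingDemo
{
    // True when two identical literals turn out to be one object in memory.
    public static bool LiteralsShareOneObject()
    {
        string first = "this is a string";
        string second = "this is a string";
        // ReferenceEquals compares object identity, not characters.
        return object.ReferenceEquals(first, second);
    }

    // True when a string built at run time has the same characters as a
    // literal but is a separate object.
    public static bool RuntimeStringIsSeparateObject()
    {
        string literal = "abcd";
        string built = new string(new char[] { 'a', 'b', 'c', 'd' });
        return literal == built && !object.ReferenceEquals(literal, built);
    }
}
```

Both methods should return true. The sharing is safe only because nobody can change the characters of the shared object.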

Performing Common Operations on a String

C# programmers perform more operations on strings than Beverly Hills plastic surgeons do on Hollywood hopefuls. Virtually every program uses the addition operator that's used on strings, as shown in this example:

string name = "Randy";
Console.WriteLine("His name is " + name); // + means concatenate.

The String class provides this special operator. However, the String class also provides other, more direct methods for manipulating strings. You can see the complete list by looking up "String class" in the Visual Studio Help index, and you'll meet many of the usual suspects in this chapter. Among the string-related tasks I cover here are the ones described in this list:

  • Comparing strings — for equality or for tasks like alphabetizing

  • Changing and converting strings in various ways: replacing part of a string, changing case, and converting between strings and other things

  • Accessing the individual characters in a string

  • Finding characters or substrings inside a string

  • Handling input from the command line

  • Managing formatted output

  • Working efficiently with strings using the StringBuilder

Note

In addition to the examples shown in the rest of this chapter, take a look at the StringCaseChanging and VariousStringTechniques examples on the Web site.

Comparing Strings

It's very common to need to compare two strings. For example, did the user input the expected value? Or maybe you have a list of strings and need to alphabetize them.

If all you need to know is whether two strings are equal (same length and same characters in the same order), you can use the == operator (or its inverse, !=, or not equal):

string a = "programming";
string b = "Programming";
if(a == b) ... // False here: == considers case, and 'p' differs from 'P'.
if(a != b) ... // True here, for the same reason.

But comparing two strings for anything but equality or inequality is another matter. It doesn't work to say

if(a < b) ...

So if you need to ask, Is string A greater than string B? or Is string A less than string B?, you need another approach.

Equality for all strings: The Compare() method

Numerous operations treat a string as a single object — for example, the Compare() method. Compare(), with the following properties, compares two strings as though they were numbers:

  • If the left-hand string is greater than the right string, Compare(left, right) returns a positive number (typically 1).

  • If the left-hand string is less than the right string, it returns a negative number (typically -1).

  • If the two strings are equal, it returns 0.

The algorithm works as follows when written in notational C# (that is, C# without all the details, also known as pseudocode):

compare(string s1, string s2)
{
  // Loop through each character of the strings until
  // a character in one string is greater than the
  // corresponding character in the other string.
  foreach character in the shorter string
    if (s1's character > s2's character when treated as a number)
      return 1
    if (s1's character < s2's character)
      return -1
  // Okay, every letter matches, but if the string s1 is longer,
  // then it's greater.
  if s1 has more characters left
    return 1
  // If s2 is longer, it's greater.
  if s2 has more characters left
    return -1
  // If every character matches and the two strings are the same
  // length, then they are "equal."
  return 0
}

Thus, "abcd" is greater than "abbd", and "abcde" is greater than "abcd". More often than not, you don't care whether one string is greater than the other, but only whether the two strings are equal.
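
The pseudocode can be turned into real C# along these lines. This is only a sketch: CompareSketch is a made-up name, and it compares raw character codes. The real String.Compare() applies culture-aware rules by default, so the two can disagree on mixed-case input. The closest built-in match for this sketch is String.CompareOrdinal().

```csharp
using System;

// Hypothetical helper that follows the pseudocode step by step.
public static class CompareSketch
{
    public static int Compare(string s1, string s2)
    {
        // Walk through the characters of the shorter string.
        int shorter = Math.Min(s1.Length, s2.Length);
        for (int i = 0; i < shorter; i++)
        {
            // A char compares as a number (its character code).
            if (s1[i] > s2[i]) return 1;
            if (s1[i] < s2[i]) return -1;
        }
        // Every shared position matches, so the longer string is greater.
        if (s1.Length > s2.Length) return 1;
        if (s1.Length < s2.Length) return -1;
        // Same characters and same length: the strings are equal.
        return 0;
    }
}
```

With this sketch, CompareSketch.Compare("abcd", "abbd") and CompareSketch.Compare("abcde", "abcd") both return 1, matching the examples above.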

Tip

You do want to know which string is bigger when performing a sort.

The Compare() operation returns 0 when two strings are identical. The following test program uses the equality feature of Compare() to perform a certain operation when the program encounters a particular string or strings.

BuildASentence prompts the user to enter lines of text. Each line is concatenated to the previous line to build a single sentence. This program exits if the user enters the word EXIT, exit, QUIT, or quit:

// BuildASentence -- The following program constructs sentences
//   by concatenating user input until the user enters one of the
//   termination strings. This program shows when you need to look for
//   string equality.
using System;
namespace BuildASentence
{
  public class Program
  {
    public static void Main(string[] args)
    {
      Console.WriteLine("Each line you enter will be "
                      + "added to a sentence until you "
                      + "enter EXIT or QUIT");
      // Ask the user for input; continue concatenating
      // the phrases input until the user enters exit or
      // quit (start with an empty sentence).
      string sentence = "";
      for (; ; )
      {
        // Get the next line.
        Console.WriteLine("Enter a string ");
        string line = Console.ReadLine();
        // Exit the loop if line is a terminator.
        string[] terms = { "EXIT", "exit", "QUIT", "quit" };
        // Compare the string entered to each of the
        // legal exit commands.
        bool quitting = false;
        foreach (string term in terms)
        {
          // Flag a match; the break out of the for loop happens below.
          if (String.Compare(line, term) == 0)
          {
            quitting = true;
          }
        }
        if (quitting == true)
        {
          break;
        }
        // Otherwise, add it to the sentence.
        sentence = String.Concat(sentence, line);
        // Let the user know how she's doing.
        Console.WriteLine("\nYou've entered: " + sentence);
      }
      Console.WriteLine("\nTotal sentence:\n" + sentence);
      // Wait for user to acknowledge the results.
      Console.WriteLine("Press Enter to terminate...");
      Console.Read();
    }
  }
}

After prompting the user for what the program expects, the program creates an empty initial sentence string called sentence. From there, the program enters an infinite loop.

Note

The controls while(true) and for(;;) loop forever, or at least long enough for some internal break or return to break you out. The two loops are equivalent, and in practice, you'll see them both. (Looping is covered in Chapter 5 of this minibook.)

BuildASentence prompts the user to enter a line of text, which the program reads using the ReadLine() method. Having read the line, the program checks to see whether it is a terminator using the boldfaced lines in the preceding example.

The termination section of the program defines an array of strings called terms and a bool variable quitting, initialized to false. Each member of the terms array is one of the strings you're looking for. Any of these strings causes the program to quit faster than a programmer forced to write COBOL.

Warning

The program must include both "EXIT" and "exit" because Compare() considers the two strings different by default. (The way the program is written, these are the only two ways to spell exit. Strings such as "Exit" and "eXit" aren't recognized as terminators.) You can also use other string operations to check for various spellings of exit. I show you this in the next section.

The termination section loops through each of the strings in the array of target strings. If Compare() reports a match to any of the terminator phrases, quitting is set to true. If quitting remains false after the termination section and line is not one of the terminator strings, it is concatenated to the end of the sentence using the String.Concat() method. The program outputs the immediate result just so the user can see what's going on.

Tip

Iterating through an array is a classic way to look for one of various possible values. (I'll show you another way in the next section, and an even cooler way in Book II.)

Here's a sample run of the BuildASentence program:

Each line you enter will be added to a
sentence until you enter EXIT or QUIT
Enter a string
Programming with

You've entered: Programming with
Enter a string
 C# is fun

You've entered: Programming with C# is fun
Enter a string
 (more or less)

You've entered: Programming with C# is fun (more or less)
Enter a string
EXIT

Total sentence:
Programming with C# is fun (more or less)
Press Enter to terminate...

I've flagged my input in bold to make the output easier to read.

Would you like your compares with or without case?

The Compare() method used in the previous example considers "EXIT" and "exit" to be different strings. However, the Compare() method has a second version that includes a third argument. This argument indicates whether the comparison should ignore the letter case. A true indicates "ignore."

The following version of the lengthy termination section in the BuildASentence example sets quitting to true whether the string passed is uppercase, lowercase, or a combination of the two:

  // Indicate true if passed either exit or quit,
  // irrespective of case.
  if ((String.Compare("exit", line, true) == 0) ||
      (String.Compare("quit", line, true) == 0))
  {
    quitting = true;
  }

This version is simpler than the previous looping version. This code doesn't need to worry about case, and it can use a single conditional expression because it now has only two options to consider instead of a longer list: any spelling variation of QUIT or EXIT.
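
If all you want is a yes-or-no answer about equality, here's an alternative sketch that swaps in a different technique from the one in this section: String.Equals() with StringComparison.OrdinalIgnoreCase. Both are real .NET members, but the TerminatorCheck class and its IsTerminator() method are names I made up for illustration.

```csharp
using System;

// Hypothetical helper; BuildASentence itself doesn't define it.
public static class TerminatorCheck
{
    // Returns true if line is any spelling of exit or quit.
    public static bool IsTerminator(string line)
    {
        // OrdinalIgnoreCase compares character codes while ignoring case.
        // The static Equals() also copes with a null line.
        return string.Equals(line, "exit", StringComparison.OrdinalIgnoreCase)
            || string.Equals(line, "quit", StringComparison.OrdinalIgnoreCase);
    }
}
```

The loop in BuildASentence could then shrink to a single test: if (TerminatorCheck.IsTerminator(line)) break;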

What If I Want to Switch Case?

You may be interested in whether all of the characters (or just one) in a string are uppercase or lowercase characters. And you may need to convert from one to the other.

Distinguishing between all-uppercase and all-lowercase strings

I almost hate to bring it up, but you can use the switch command (see Chapter 5 of this minibook) to look for a particular string. Normally, you use the switch command to compare a counting number to some set of possible values; however, switch does work on string objects, as well. This version of the termination section in BuildASentence uses the switch construct:

switch(line)
{
  case "EXIT":
  case "exit":
  case "QUIT":
  case "quit":
    return true;
}
return false;

This approach works because you're comparing only a limited number of strings. The for loop offers a much more flexible approach for searching for string values. Using the case-less Compare() in the previous section gives the program greater flexibility in understanding the user.

Converting a string to upper- or lowercase

Suppose you have a string in lowercase and need to convert it to uppercase. You can use the ToUpper() method:

string lowcase = "armadillo";
string upcase = lowcase.ToUpper();  // ARMADILLO.

Similarly, you can convert uppercase to lowercase with ToLower().

What if you want to convert just the first character in a string to uppercase? The following rather convoluted code will do it (but you can see a better way in the last section of this chapter):

string name = "chuck";
string properName =
   char.ToUpper(name[0]).ToString() + name.Substring(1, name.Length - 1);

The idea in this example is to extract the first char in name (that's name[0]), convert it to a one-character string with ToString(), and then tack on the remainder of name after removing the old lowercase first character with Substring().
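
Wrapped in a method, the same idea might look like the following sketch. NameFixer and CapitalizeFirst() are hypothetical names, and the guard for an empty string is my addition, because name[0] fails when there's no first character to grab.

```csharp
using System;

// Hypothetical helper built around the one-liner shown above.
public static class NameFixer
{
    public static string CapitalizeFirst(string name)
    {
        // Nothing to capitalize in a null or empty string.
        if (string.IsNullOrEmpty(name)) return name;
        // Uppercase the first char, then tack on everything after it.
        // Substring(1) means "from index 1 to the end."
        return char.ToUpper(name[0]).ToString() + name.Substring(1);
    }
}
```

Calling NameFixer.CapitalizeFirst("chuck") returns "Chuck", and the original string is left untouched.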

You can tell whether a string is uppercased or lowercased by using this scary-looking if statement:

if (string.Compare(line.ToUpper(CultureInfo.InvariantCulture),
                   line, false) == 0) ...  // True if line is all upper.

Here the Compare() method is comparing an uppercase version of line to line itself. There should be no difference if line is already uppercase. You can puzzle over the CultureInfo.InvariantCulture gizmo (it lives in the System.Globalization namespace) in Help, 'cause I'm not going to explain it here. For "is it all lowercase," compare line.ToLower(CultureInfo.InvariantCulture) to line instead; sticking a not (!) operator in front of the test tells you only that line isn't all uppercase. Alternatively, you can use a loop, as described in the next section.
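
Here's a somewhat less scary sketch of both tests. It assumes you're happy with ToUpperInvariant() and ToLowerInvariant(), which are shorthand for passing the invariant culture, and with String.CompareOrdinal() for the comparison. The CaseCheck class name is made up.

```csharp
using System;

// Hypothetical helper class for the two case tests.
public static class CaseCheck
{
    // True if uppercasing the string changes nothing.
    public static bool IsAllUpper(string line)
    {
        return string.CompareOrdinal(line.ToUpperInvariant(), line) == 0;
    }

    // True if lowercasing the string changes nothing.
    public static bool IsAllLower(string line)
    {
        return string.CompareOrdinal(line.ToLowerInvariant(), line) == 0;
    }
}
```

Be aware of one design consequence: a string with no letters at all, such as "123", passes both tests, because changing its case changes nothing.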

Note

The StringCaseChanging example on the Web site illustrates these and other techniques, including a brief explanation of cultures.

Looping through a String

You can access individual characters of a string in a foreach loop. The following code steps through the characters and writes each to the console — just another (roundabout) way to write out the string:

string favoriteFood = "cheeseburgers";
foreach(char c in favoriteFood)
{
  Console.Write(c);  // Could do things to the char here.
}
Console.WriteLine();

You can use that loop to solve the problem of deciding whether favoriteFood is all uppercase. (See the previous section for more about case.)

bool isUppercase = true;  // Start with the assumption that it's uppercase.
foreach(char c in favoriteFood)
{
  if(!char.IsUpper(c))
  {
    isUppercase = false;  // Disproves all uppercase, so get out.
    break;
  }
}

At the end of the loop, isUppercase will either be true or false.

As shown in the final example in the previous section on switching case, you can also access individual characters in a string by using an array index notation.

Note

Arrays start with zero, so if you want the first character, you ask for [0]. If you want the third, you ask for [2].

char thirdChar = favoriteFood[2];   // First 'e' in "cheeseburgers"

Searching Strings

What if you need to find a particular word, or a particular character, inside a string? Maybe you need its index so you can use Substring(), Replace(), Remove(), or some other method on it. In this section, you'll see how to find individual characters or substrings. (I'm still using the favoriteFood variable from the previous section.)

Can I find it?

The simplest thing is finding an individual character with IndexOf():

int indexOfLetterS = favoriteFood.IndexOf('s');  // 4.

Class String also has other methods for finding things, either individual characters or substrings:

  • IndexOfAny() takes an array of chars and searches the string for any of them, returning the index of the first one found.

    char[] charsToLookFor = { 'a', 'b', 'c' };
    int indexOfFirstFound = favoriteFood.IndexOfAny(charsToLookFor); // 0.

    That call is often written more briefly this way:

    int index = favoriteFood.IndexOfAny(new char[] { 'a', 'b', 'c' });

  • LastIndexOf() finds not the first occurrence of a character but the last.

  • LastIndexOfAny() works like IndexOfAny(), but starting at the end of the string.

  • Contains() returns true if a given substring can be found within the target string:

    if(favoriteFood.Contains("ee")) ...            // True

  • Substring() returns the part of the string that starts at a given index and runs for a given number of characters. It throws an ArgumentOutOfRangeException if that range falls outside the string:

    string sub = favoriteFood.Substring(6, favoriteFood.Length - 6);  // "burgers"

    (I go into Substring() in greater detail later in this chapter.)
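
To see the search methods side by side, here's a small sketch that runs each one against "cheeseburgers". The SearchDemo class and its method names are invented for illustration; the String methods they call are the real ones. I pass StringComparison.Ordinal when searching for a substring so the result doesn't depend on the machine's culture settings.

```csharp
using System;

// Hypothetical demo class; every search runs against the same string.
public static class SearchDemo
{
    private const string Food = "cheeseburgers";

    // Index of the first 's'.
    public static int FirstS() { return Food.IndexOf('s'); }

    // Index of the last 's'.
    public static int LastS() { return Food.LastIndexOf('s'); }

    // Index of the last 'a', 'b', or 'c' in the string.
    public static int LastOfAbc()
    {
        return Food.LastIndexOfAny(new char[] { 'a', 'b', 'c' });
    }

    // The search methods return -1 when nothing is found.
    public static int Missing() { return Food.IndexOf('z'); }

    // Index at which the substring "burger" starts.
    public static int WordStart()
    {
        return Food.IndexOf("burger", StringComparison.Ordinal);
    }

    // Everything from the start of "burger" to the end of the string.
    public static string FromWordStart()
    {
        return Food.Substring(WordStart());
    }
}
```

The results are 4, 12, 6, -1, 6, and "burgers", respectively. Checking for -1 before you hand an index to Substring() saves you from an exception.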

Is my string empty?

How can you tell if a target string is empty ("") or has the value null? (null means that no value has been assigned yet, not even to the empty string.) Use the IsNullOrEmpty() method, like this:

bool notThere = string.IsNullOrEmpty(favoriteFood);  // False

Notice how you call IsNullOrEmpty(): string.IsNullOrEmpty(s).
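
A short sketch of the check in use follows. EmptyCheck and its methods are hypothetical names. The second method uses String.IsNullOrWhiteSpace(), a related call that this chapter doesn't cover and that exists only in .NET 4.0 and later.

```csharp
using System;

// Hypothetical helper class for the empty-string checks.
public static class EmptyCheck
{
    // True if s is neither null nor "".
    public static bool HasText(string s)
    {
        return !string.IsNullOrEmpty(s);
    }

    // True if s contains at least one character that isn't white space.
    // Assumes .NET 4.0 or later for IsNullOrWhiteSpace().
    public static bool HasVisibleText(string s)
    {
        return !string.IsNullOrWhiteSpace(s);
    }
}
```

A string holding a single space counts as text for HasText() but not for HasVisibleText().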

You can set a string to the empty string in these two ways:

string name = "";
string name = string.Empty;

Getting Input from the Command Line

A common task in console applications is getting the information that the user types in when you prompt her for, say, an interest rate or a name. You need to read the information that comes in as a string. (Everything coming from the command line comes as a string.) Then you sometimes need to parse the input to extract a number from it. And sometimes you need to process lots of input numbers.

Trimming excess white space

First, consider that in some cases, you don't want to mess with any white space on either end of the string. The term white space refers to the characters that don't normally display on the screen, for example, space, newline (or \n), and tab (\t). You may sometimes also encounter the carriage return character, \r.

You can use the Trim() method to trim off the edges of the string, like this:

// Get rid of any extra spaces on either end of the string.
random = random.Trim();

Class String also provides TrimStart() and TrimEnd() methods for getting more specific, and you can pass an array of chars to be trimmed in place of white space. For example, you might trim a leading currency sign, such as '$'. Cleaning up a string can make it easier to parse. The trim methods return a new string.
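
As a sketch of trimming in two steps, the following made-up CleanAmount() method strips white space with Trim() and then strips a leading dollar sign with TrimStart(), the real .NET method for trimming the front of a string.

```csharp
using System;

// Hypothetical helper; the name CleanAmount is for illustration only.
public static class TrimDemo
{
    public static string CleanAmount(string raw)
    {
        // Trim() with no arguments removes white space from both ends.
        string noSpaces = raw.Trim();
        // TrimStart('$') removes any dollar signs from the front only.
        return noSpaces.TrimStart('$');
    }
}
```

For example, TrimDemo.CleanAmount("  $19.99 ") returns "19.99". Each call hands back a new string, so the original raw input is unchanged.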

Parsing numeric input

A program can read from the keyboard one character at a time, but you have to worry about newlines and so on. An easier approach reads a string and then parses the characters out of the string.

Parsing characters out of a string is another topic I don't like to mention, for fear that programmers will abuse this technique. In some cases, programmers are too quick to jump into the middle of a string and start pulling out what they find there. This is particularly true of C++ programmers because that's the only way they could deal with strings — until the addition of a string class.

The ReadLine() method used for reading from the console returns a string object. A program that expects numeric input must convert this string. C# provides just the conversion tool you need in the Convert class. This class provides a conversion method from string to each built-in variable type. Thus, this code segment reads a number from the keyboard and stores it in an int variable:

string s = Console.ReadLine();  // Keyboard input is string data
int n = Convert.ToInt32(s);     // but you know it's meant to be a number.

The other conversion methods are a bit more obvious: ToDouble(), ToSingle() (for float), and ToBoolean().

Note

ToInt32() refers to a 32-bit, signed integer (32 bits is the size of a normal int), so this is the conversion method for ints. ToInt64() handles the size of a long.

When a Convert method such as ToInt32() encounters characters it doesn't expect, it throws a FormatException instead of returning a number. Thus, you must know for sure what type of data you're processing and ensure that no extraneous characters are present.

Although I haven't fully discussed methods yet (see Book II), here's one anyway. The following method returns true if the string passed to it consists of only digits. You can call this method prior to converting into a type of integer, assuming that a sequence of nothing but digits is probably a legal number.

Warning

To be truly complete, you need to include the decimal point for floating-point variables and include a leading minus sign for negative numbers — but hey, you get the idea.

Here's the method:

// IsAllDigits -- Return true if all characters
//   in the string are digits.
public static bool IsAllDigits(string raw)
{
  // First get rid of any benign characters at either end;
  // if there's nothing left, you don't have a number.
  string s = raw.Trim();  // Ignore white space on either side.
  if (s.Length == 0) return false;
  // Loop through the string.
  for(int index = 0; index < s.Length; index++)
  {
    // A nondigit indicates that the string probably isn't a number.
    if (Char.IsDigit(s[index]) == false) return false;
  }
  // No nondigits found; it's probably okay.
  return true;
}

The method IsAllDigits() first removes any harmless white space at either end of the string. If nothing is left, the string was blank and could not be an integer. The method then loops through each character in the string. If any of these characters turns out to be a nondigit, the method returns false, indicating that the string is probably not a number. If this method returns true, the probability is high that the string can be converted into an integer successfully.

The following code sample inputs a number from the keyboard and prints it back out to the console. (I omitted the IsAllDigits() method from the listing to save space, but I've boldfaced where this program calls it.)

// IsAllDigits -- Demonstrate the IsAllDigits method.
using System;
namespace IsAllDigits
{
  class Program
  {
    public static void Main(string[] args)
    {
      // Input a string from the keyboard.
      Console.WriteLine("Enter an integer number");
      string s = Console.ReadLine();
      // First check to see if this could be a number.
      if (!IsAllDigits(s)) // Call the special method.
      {
        Console.WriteLine("Hey! That isn't a number");
      }
      else
      {
        // Convert the string into an integer.
        int n = Int32.Parse(s);
        // Now write out the number times 2.
        Console.WriteLine("2 * " + n + ", = " + (2 * n));
      }
      // Wait for user to acknowledge the results.
      Console.WriteLine("Press Enter to terminate...");
      Console.Read();
    }
    // IsAllDigits here.
  }
}

The program reads a line of input from the console keyboard. If IsAllDigits() returns false, the program alerts the user. If not, the program converts the string into a number using an alternative to Convert.ToInt32(aString) — the Int32.Parse(aString) call. Finally, the program outputs both the number and two times the number (the latter to prove that the program did, in fact, convert the string as advertised).

The output from a sample run of the program appears this way:

Enter an integer number
1A3
Hey! That isn't a number
Press Enter to terminate...

Note

You could let Convert try to convert garbage and handle the exception it throws. Convert.ToInt32() throws a FormatException when presented with 1A3, and an OverflowException when the digits describe a number too big for an int. Validating input data yourself keeps ordinary typing mistakes from turning into exceptions.

Tip

You could instead use Int32.TryParse(s, out n), which returns false if the parse fails or true if it succeeds. If it does work, the converted number is found in the second parameter, an int that I named n, which you must pass with the out keyword. This won't throw exceptions. See the next section for an example.
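
Here's a minimal sketch of the two approaches side by side. SafeParse, ParseOrDefault(), and ConvertThrowsOn() are hypothetical names; the calls inside them (Int32.TryParse() and Convert.ToInt32()) are the real thing.

```csharp
using System;

// Hypothetical helper class contrasting TryParse with Convert.
public static class SafeParse
{
    // Returns the parsed number, or fallback if s isn't a valid int.
    public static int ParseOrDefault(string s, int fallback)
    {
        int n;
        // TryParse reports failure through its return value.
        if (Int32.TryParse(s, out n)) return n;
        return fallback;
    }

    // Returns true if Convert.ToInt32() rejects s with a FormatException.
    public static bool ConvertThrowsOn(string s)
    {
        try
        {
            Convert.ToInt32(s);
            return false;
        }
        catch (FormatException)
        {
            return true;
        }
    }
}
```

With this sketch, ParseOrDefault("1A3", -1) quietly returns -1, whereas ConvertThrowsOn("1A3") returns true because Convert complained.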

Handling a series of numbers

Often, a program receives a series of numbers in a single line from the keyboard. Using the String.Split() method, you can easily break the string into a number of substrings, one for each number, and parse them separately.

The Split() method chops a single string into an array of smaller strings using some delimiter. For example, if you tell Split() to divide a string using a comma (,) as the delimiter, "1,2,3" becomes three strings, "1", "2", and "3". (The delimiter is whichever character you use to split collections.)
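
A quick sketch of Split() on its own may help before the full program. The SplitDemo class is made up. The second method assumes you'd rather have Split() throw away empty pieces for you, using the StringSplitOptions.RemoveEmptyEntries option, which the program in this section doesn't use.

```csharp
using System;

// Hypothetical demo class for the two ways of calling Split().
public static class SplitDemo
{
    // "1,2,3" becomes { "1", "2", "3" }.
    public static string[] SplitOnCommas(string input)
    {
        return input.Split(',');
    }

    // Split on commas or spaces and discard zero-length pieces,
    // such as the one between a comma and the space after it.
    public static string[] SplitAndDropEmpties(string input)
    {
        char[] dividers = { ',', ' ' };
        return input.Split(dividers, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

Given "1,2, a, 3 4", the second method returns five pieces: "1", "2", "a", "3", and "4".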

The following program uses the Split() method to input a sequence of numbers to be summed. (Again, I've omitted the IsAllDigits() method to save trees.)

// ParseSequenceWithSplit -- Input a series of numbers separated by commas,
//    parse them into integers and output the sum.
namespace ParseSequenceWithSplit
{
  using System;
  class Program
  {
    public static void Main(string[] args)
    {
      // Prompt the user to input a sequence of numbers.
      Console.WriteLine(
           "Input a series of numbers separated by commas:");
      // Read a line of text.
      string input = Console.ReadLine();
      Console.WriteLine();
      // Now convert the line into individual segments
      // based upon either commas or spaces.
      char[] dividers = {',', ' '};
      string[] segments = input.Split(dividers);
      // Convert each segment into a number.
      int sum = 0;
      foreach(string s in segments)
      {
        // Skip any empty segments.
        if (s.Length > 0)
        {
          // Skip strings that aren't numbers.
          if (IsAllDigits(s))
          {
            // Convert the string into a 32-bit int.
            int num = 0;
            if (Int32.TryParse(s, out num))
            {
              Console.WriteLine("Next number = {0}", num);
              // Add this number into the sum.
              sum += num;
            }
            // If parse fails, move on to next number.
          }
        }
      }
      // Output the sum.
      Console.WriteLine("Sum = {0}", sum);
      // Wait for user to acknowledge the results.
      Console.WriteLine("Press Enter to terminate...");
      Console.Read();
    }
    // IsAllDigits here.
  }
}

The ParseSequenceWithSplit program begins by reading a string from the keyboard. The program passes the dividers array of char to the Split() method to indicate that the comma and the space are the characters used to separate individual numbers. Either character will cause a split there.

The program iterates through each of the smaller substrings created by Split() using the foreach loop control. The program skips any zero-length substrings. (These result from two dividers in a row.) The program next uses the IsAllDigits() method to make sure that the string contains a number. (It won't if, for instance, you type ,.3 with an extra nondigit, nonseparator character.) Valid numbers are converted into integers and then added to an accumulator, sum. Invalid numbers are ignored. (I chose not to generate an error message to keep this short.)

Here's the output of a typical run:

Input a series of numbers separated by commas:
1,2, a, 3 4

Next number = 1
Next number = 2
Next number = 3
Next number = 4
Sum = 10
Press Enter to terminate...

The program splits the list, accepting commas, spaces, or both as separators. It successfully skips over the a to generate the result of 10. In a real-world program, however, you probably don't want to skip over incorrect input without comment. You almost always want to draw the user's attention to garbage in the input stream.
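
One way to draw attention to the garbage is to collect it while you sum. The sketch below is a variation, not the ParseSequenceWithSplit program itself: SumWithReport is a made-up name, and it relies on Int32.TryParse() alone rather than IsAllDigits(), so it also accepts input such as a leading minus sign.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical variation that remembers what it couldn't parse.
public static class SumWithReport
{
    // Sums the valid integers in input and adds every rejected
    // piece to the rejected list supplied by the caller.
    public static int Sum(string input, List<string> rejected)
    {
        char[] dividers = { ',', ' ' };
        int sum = 0;
        string[] segments =
            input.Split(dividers, StringSplitOptions.RemoveEmptyEntries);
        foreach (string s in segments)
        {
            int num;
            if (Int32.TryParse(s, out num))
            {
                sum += num;
            }
            else
            {
                // Keep the bad piece so the caller can complain about it.
                rejected.Add(s);
            }
        }
        return sum;
    }
}
```

For the sample input 1,2, a, 3 4, the method returns 10 and the rejected list holds the single entry "a", which the caller can then show to the user.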

Joining an array of strings into one string

Class String also has a Join() method. If you have an array of strings, you can use Join() to concatenate all of the strings. You can even tell it to put a certain character string between each item and the next in the array:

string[] brothers = { "Chuck", "Bob", "Steve", "Mike" };
string theBrothers = string.Join(":", brothers);

The result in theBrothers is "Chuck:Bob:Steve:Mike", with the names separated by colons. You can put any separator string between the names: ", ", "\t", "     ". The first item is a comma and a space. The second is a tab character. The third is a string of several spaces.
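
Here's a sketch that lets you try different separators on the same array. JoinDemo and JoinWith() are invented names; String.Join() is the real method.

```csharp
using System;

// Hypothetical demo class for String.Join().
public static class JoinDemo
{
    // Joins the four names, placing separator between each pair.
    public static string JoinWith(string separator)
    {
        string[] brothers = { "Chuck", "Bob", "Steve", "Mike" };
        return string.Join(separator, brothers);
    }
}
```

JoinDemo.JoinWith(", ") returns "Chuck, Bob, Steve, Mike". Notice that the separator goes only between items, never after the last one.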

Controlling Output Manually

Controlling the output from programs is an important aspect of string manipulation. Face it: The output from the program is what the user sees. No matter how elegant the internal logic of the program may be, the user probably won't be impressed if the output looks shabby.

The String class provides help in directly formatting string data for output. The following sections examine the Trim(), PadRight(), PadLeft(), Substring(), and Concat() methods.

Using the Trim() and Pad() methods

I show earlier how to use Trim() and its more specialized variants, TrimStart() and TrimEnd(). Here, I discuss another common method for formatting output. You can use the Pad methods, PadLeft() and PadRight(), which add characters to either end of a string to expand the string to some predetermined length. For example, you may add spaces to the left or right of a string to left- or right-justify it, or you can add "*" characters to the left of a currency number, and so on.
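
As a small sketch of both directions, the following made-up methods pad a name on the right with spaces and a currency amount on the left with asterisks. PadRight() and PadLeft() are the real String methods; the PadDemo class is only for illustration.

```csharp
using System;

// Hypothetical demo class for the two Pad methods.
public static class PadDemo
{
    // Adds spaces on the right until the string is width chars long.
    public static string LeftJustify(string s, int width)
    {
        return s.PadRight(width);
    }

    // Adds '*' characters on the left until the string is width chars long.
    public static string StarFill(string amount, int width)
    {
        return amount.PadLeft(width, '*');
    }
}
```

PadDemo.LeftJustify("Sam", 6) returns "Sam" followed by three spaces, and PadDemo.StarFill("42.50", 8) returns "***42.50". Padding never shortens a string: if it's already longer than the width, you get it back unchanged.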

The following small AlignOutput program uses both Trim() and PadRight() to trim up and justify a series of names:

// AlignOutput -- Left justify and align a set of strings
//    to improve the appearance of program output.
namespace AlignOutput
{
  using System;
  using System.Collections.Generic;
  class Program
  {
    public static void Main(string[] args)
    {
      List<string> names = new List<string> {"Christa  ",
                                             "  Sarah",
                                             "Jonathan",
                                             "Sam",
                                             " Schmekowitz "};
      // First output the names as they start out.
      Console.WriteLine("The following names are of "
                        + "different lengths");
      foreach(string s in names)
      {
        Console.WriteLine("This is the name '" + s + "' before");
      }
      Console.WriteLine();

      // This time, fix the strings so they are
      // left justified and all the same length.
      // First, copy the source list into a list that you can manipulate.
      List<string> stringsToAlign = new List<string>();
      // At the same time, remove any unnecessary spaces from either end
      // of the names.
      for (int i = 0; i < names.Count; i++)
      {
        string trimmedName = names[i].Trim();
        stringsToAlign.Add(trimmedName);
      }
      // Now find the length of the longest string so that
      // all other strings line up with that string.
      int maxLength = 0;
      foreach (string s in stringsToAlign)
      {
        if (s.Length > maxLength)
        {
          maxLength = s.Length;
        }
      }
      // Now justify all the strings to the length of the maximum string.
      for (int i = 0; i < stringsToAlign.Count; i++)
      {
        stringsToAlign[i] = stringsToAlign[i].PadRight(maxLength + 1);
      }
      // Finally output the resulting padded, justified strings.
      Console.WriteLine("The following are the same names "
                      + "normalized to the same length");
      foreach(string s in stringsToAlign)
      {
        Console.WriteLine("This is the name '" + s + "' afterwards");
      }
      // Wait for user to acknowledge.
      Console.WriteLine("\nPress Enter to terminate...");
      Console.Read();
    }
  }
}

AlignOutput defines a List<string> of names of uneven alignment and length. (You could just as easily write the program to read these names from the console or from a file.) The Main() method first displays the names as they are. Main() then aligns the names using the Trim() and PadRight() methods before redisplaying the resulting trimmed up strings:

The following names are of different lengths
This is the name 'Christa  ' before
This is the name '  Sarah' before
This is the name 'Jonathan' before
This is the name 'Sam' before
This is the name ' Schmekowitz ' before

The following are the same names normalized to the same length
This is the name 'Christa     ' afterwards
This is the name 'Sarah       ' afterwards
This is the name 'Jonathan    ' afterwards
This is the name 'Sam         ' afterwards
This is the name 'Schmekowitz ' afterwards

The alignment process begins by making a copy of the input names list.

The code first loops through the list, calling Trim() on each element to remove unneeded white space on either end. The method loops again through the list to find the longest member. The code loops one final time, calling PadRight() to expand each string to match the length of the longest member in the list. Note how the padded names form a neat column in the output.

PadRight(10) expands a string to be at least ten characters long. For example, PadRight(10) adds four spaces to the right of a six-character string.

Finally, the code displays the list of trimmed and padded strings for output. Voilà.
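To see the Pad methods by themselves, here's a small sketch. The class name PadDemo is invented for this example; PadRight() and PadLeft(), including the overload that takes a padding character, are the real String methods.

```csharp
using System;

public static class PadDemo
{
    public static void Main()
    {
        string name = "Sarah";  // Five characters long.

        // Pad on the right with spaces: left-justifies in a 10-char field.
        Console.WriteLine("'" + name.PadRight(10) + "'");         // 'Sarah     '

        // Pad on the left with spaces: right-justifies in a 10-char field.
        Console.WriteLine("'" + name.PadLeft(10) + "'");          // '     Sarah'

        // Pad on the left with a character of your choosing.
        Console.WriteLine("'" + "42.50".PadLeft(10, '*') + "'");  // '*****42.50'

        // A string already longer than the requested width is left alone.
        Console.WriteLine("'" + "Schmekowitz".PadRight(10) + "'"); // 'Schmekowitz'
    }
}
```

The last call shows why AlignOutput measures the longest name first: Padding never shortens a string, so the field has to be at least as wide as the widest entry.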

Using the Concat() method

You often face the problem of breaking up a string or inserting some substring into the middle of another. Replacing one character with another is most easily handled with the Replace() method, like this:

string s = "Danger NoSmoking";
s = s.Replace(' ', '!');

This example converts the string into "Danger!NoSmoking".

Replacing all appearances of one character (in this case, a space) with another (an exclamation mark) is especially useful when generating comma-separated strings for easier parsing. However, the more common and more difficult case involves breaking a single string into substrings, manipulating them separately, and then recombining them into a single, modified string.

The following RemoveWhiteSpace sample program uses the IndexOfAny(), Substring(), and Concat() methods to remove white space (spaces, tabs, and newlines, which is to say all instances of a set of special characters) from a string:

// RemoveWhiteSpace -- Remove any of a set of chars from a given string.
//    Use this method to remove whitespace from a sample string.
namespace RemoveWhiteSpace
{
  using System;
  public class Program
  {
    public static void Main(string[] args)
    {
      // Define the white space characters.
      char[] whiteSpace = {' ', '\n', '\t'};
      // Start with a string embedded with whitespace.
      string s = " this is a\nstring"; // Contains spaces & newline.
      Console.WriteLine("before:" + s);
      // Output the string with the whitespace missing.
      Console.Write("after:");
      // Start looking for the white space characters.
      for(;;)
      {
        // Find the offset of the character; exit the loop
        // if there are no more.
        int offset = s.IndexOfAny(whiteSpace);
        if (offset == -1)
        {
          break;
        }
        // Break the string into the part prior to the
        // character and the part after the character.
        string before = s.Substring(0, offset);
        string after  = s.Substring(offset + 1);
        // Now put the two substrings back together with the
        // character in the middle missing.
        s = String.Concat(before, after);
        // Loop back up to find next whitespace char in
        // this modified s.
      }
      Console.WriteLine(s);
      // Wait for user to acknowledge the results.
      Console.WriteLine("Press Enter to terminate...");
      Console.Read();
    }
  }
}

The key to this program is the for(;;) loop. This loop continually refines the input string, s, removing every one of a set of characters contained in the array whiteSpace.

The loop uses IndexOfAny() to find the first occurrence of any of the chars in the whiteSpace array. The loop doesn't exit until every instance of any of those chars has been removed. The IndexOfAny() method returns the index within the string of the first white space char that it can find. A return value of -1 indicates that no items in the array were found in the string.

The first pass through the loop removes the leading blank on the target string. IndexOfAny() finds the blank at index 0. The first Substring() call returns an empty string, and the second call returns the whole string after the blank. These are then concatenated with Concat(), producing a string with the leading blank squeezed out.

The second pass through the loop finds the space after "this" and squeezes that out the same way, concatenating the strings "this" and "is a string". After this pass, s has become "thisis a string".

The third pass finds the space after "is" and squeezes that out. The fourth pass finds the newline character ('\n') after "a" and removes it the same way. On the fifth pass, IndexOfAny() runs out of white space characters to find and returns -1 (not found). That ends the loop.

The RemoveWhiteSpace program prints out a string containing several forms of white space. The program then strips out white space characters. The output from this program appears as follows:

before: this is a
string
after:thisisastring
Press Enter to terminate...
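If you want to watch the loop work pass by pass, here's a sketch that runs the same logic and reports each removal. The class name TraceDemo, the method Squeeze(), and the passes counter are all invented for this illustration; the loop body mirrors the one in RemoveWhiteSpace.

```csharp
using System;

public static class TraceDemo
{
    // Removes every char in targets from s, one char per pass.
    // passes reports how many characters were removed.
    public static string Squeeze(string s, char[] targets, out int passes)
    {
        passes = 0;
        for (;;)
        {
            // Index within the string of the next target char, or -1 if none.
            int offset = s.IndexOfAny(targets);
            if (offset == -1)
            {
                break;
            }
            // Glue together the part before and the part after the char.
            s = String.Concat(s.Substring(0, offset), s.Substring(offset + 1));
            passes++;
            Console.WriteLine("Pass {0}: removed the char at index {1}",
                              passes, offset);
        }
        return s;
    }

    public static void Main()
    {
        char[] whiteSpace = { ' ', '\n', '\t' };
        int passes;
        string result = Squeeze(" this is a\nstring", whiteSpace, out passes);
        // The sample string holds three spaces and one newline.
        Console.WriteLine("'{0}' after {1} removals", result, passes);
    }
}
```

Every pass builds a brand-new string, which is one reason this approach isn't the most efficient one around.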

Let's Split() that concatenate program

The RemoveWhiteSpace program demonstrates the use of the Concat() and IndexOfAny() methods; however, it doesn't use the most efficient approach. As usual, a little examination reveals a more efficient approach using our old friend Split(). You can find the program containing this code, now pulled out into a method of its own, on the Web site under RemoveWhiteSpaceWithSplit. The method that does the work is shown here:

// RemoveWhiteSpace -- The RemoveSpecialChars method removes every
//    occurrence of the specified characters from the string.
// Note: The rest of the program is not shown here.
public static string RemoveSpecialChars(string input, char[] targets)
{
  // Split the input string up using the target
  // characters as the delimiters.
  string[] subStrings = input.Split(targets);

  // output will contain the eventual output information.
  string output = "";

  // Loop through the substrings originating from the split.
  foreach(string subString in subStrings)
  {
    output = String.Concat(output, subString);
  }
  return output;
}

This version uses the Split() method to break the input string into a set of substrings, using the characters to be removed as delimiters. The delimiter is not included in the substrings created, which has the effect of removing the character(s). The logic here is much simpler and less error-prone.

The foreach loop in the second half of the program puts the pieces back together again using Concat(). The output from the program is unchanged.

Pulling the code out into a method further simplifies it and makes it clearer.
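You can shorten the method a bit more. The following sketch isn't the book's version; it assumes you're happy to let the String.Concat() overload that accepts a string array glue all the pieces together in one call. The class name SplitDemo is made up for the example.

```csharp
using System;

public static class SplitDemo
{
    // Same idea as RemoveSpecialChars: split on the target characters,
    // then put the pieces back together without the delimiters.
    public static string RemoveSpecialChars(string input, char[] targets)
    {
        string[] subStrings = input.Split(targets);
        // Concat() accepts the whole array, so no loop is needed.
        return String.Concat(subStrings);
    }

    public static void Main()
    {
        char[] whiteSpace = { ' ', '\n', '\t' };
        string s = " this is a\nstring";
        Console.WriteLine(RemoveSpecialChars(s, whiteSpace));  // thisisastring
    }
}
```

Split() returns empty strings where two delimiters sit side by side or where one starts the string. Concatenating an empty string changes nothing, so they do no harm here.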

Formatting Your Strings Precisely

The String class also provides the Format() method for formatting output, especially the output of numbers. In its simplest form, Format() allows the insertion of string, numeric, or Boolean input in the middle of a format string. For example, consider this call:

string myString = String.Format("{0} times {1} equals {2}", 2, 5, 2*5);

The first argument to Format() is known as the format string — the quoted string you see. The {n} items in the middle of the format string indicate that the nth argument following the format string is to be inserted at that point. {0} refers to the first argument (in this case, the value 2), {1} refers to the next (that is, 5), and so on.

This returns a string, myString. The resulting string is

"2 times 5 equals 10"

Unless otherwise directed, Format() uses a default output format for each argument type. Format() enables you to affect the output format by including specifiers (modifiers or controls) in the placeholders. See Table 3-1 for a listing of some of these specifiers. For example, {0:E6} says, "Output the number in exponential form, using six digits for the fractional part."

Table 3-1. Format Specifiers Using String.Format()

| Control | Example | Result | Notes |
| --- | --- | --- | --- |
| C (currency) | {0:C} of 123.456 | $123.46 | The currency sign depends on the Region setting. |
| | {0:C} of -123.456 | ($123.46) | (Specify Region in Windows control panel.) |
| D (decimal) | {0:D5} of 123 | 00123 | Integers only. |
| E (exponential) | {0:E} of 123.45 | 1.234500E+002 | Also known as scientific notation. |
| F (fixed) | {0:F2} of 123.4567 | 123.46 | The number after the F indicates the number of digits after the decimal point. |
| N (number) | {0:N} of 123456.789 | 123,456.79 | Adds commas and rounds off to nearest 100th. |
| | {0:N1} of 123456.789 | 123,456.8 | Controls the number of digits after the decimal point. |
| | {0:N0} of 123456.789 | 123,457 | |
| X (hexadecimal) | {0:X} of 123 | 7B | 7B hex = 123 decimal (integers only). |
| {0:0...} | {0:000.00} of 12.3 | 012.30 | Forces a 0 if a digit is not already present. |
| {0:#...} | {0:###.##} of 12.3 | 12.3 | Shows a digit only where one is present; unused # positions print nothing, so the result isn't padded. |
| | {0:##0.0#} of 0 | 0.0 | Combining # and 0 forces at least one digit to appear on each side of the decimal point, even if the number is 0. |
| {0:# or 0%} | {0:#00.#%} of .1234 | 12.3% | The % displays the number as a percentage (multiplies by 100 and adds the % sign). |
| | {0:#00.#%} of .0234 | 02.3% | |

Tip

The Console.WriteLine() method uses the same placeholder system. The first placeholder, {0}, takes the first variable or value listed after the format string part of the statement, and so on. Given the exact same arguments as in the earlier Format() call, Console.WriteLine() would write the same string to the console. You also have access to the format specifiers. From now on, I use the formatted form of WriteLine() much of the time, rather than concatenate items to form the final output string with the + operator.
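To try a few of the specifiers for yourself, here's a minimal sketch. It assumes you want results that don't depend on the machine's Region setting, so it passes CultureInfo.InvariantCulture to Format() and skips the currency specifier, whose symbol varies by culture. The class name FormatDemo and the helper Show() are invented for the example.

```csharp
using System;
using System.Globalization;

public static class FormatDemo
{
    // Hypothetical helper: formats one value with one specifier, using the
    // invariant culture so that the output is the same on every machine.
    public static string Show(string specifier, object value)
    {
        return String.Format(CultureInfo.InvariantCulture,
                             "{0:" + specifier + "}", value);
    }

    public static void Main()
    {
        Console.WriteLine(Show("D5", 123));           // 00123
        Console.WriteLine(Show("E", 123.45));         // 1.234500E+002
        Console.WriteLine(Show("F2", 123.4567));      // 123.46
        Console.WriteLine(Show("N", 123456.789));     // 123,456.79
        Console.WriteLine(Show("N0", 123456.789));    // 123,457
        Console.WriteLine(Show("X", 123));            // 7B
        Console.WriteLine(Show("000.00", 12.3));      // 012.30
        Console.WriteLine(Show("#00.#%", 0.1234));    // 12.3%
    }
}
```

Without the culture argument, Format() uses the current culture, so the decimal point and the digit-grouping character can differ from one machine to the next.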

These format specifiers can seem a bit bewildering. (I didn't even mention the detailed currency and date controls.) Explore the topic "format specifiers" in the Help Index for more information. To help you wade through these options, the following OutputFormatControls program enables you to enter a floating-point number followed by a specifier sequence. The program then displays the number, using the specified Format() control:

// OutputFormatControls -- Allow the user to reformat input numbers
//     using a variety of format specifiers input at run time.
namespace OutputFormatControls
{
  using System;
  public class Program
  {
    public static void Main(string[] args)
    {
      // Keep looping -- inputting numbers until the user
      // enters a blank line rather than a number.
      for(;;)
      {
        // First input a number -- terminate when the user
        // inputs nothing but a blank line.
        Console.WriteLine("Enter a double number");
        string numberInput = Console.ReadLine();
        if (numberInput.Length == 0)
        {
          break;
        }
        double number = Double.Parse(numberInput);
        // Now input the specifier codes; split them
        // using spaces as dividers.
        Console.WriteLine("Enter the format specifiers"
                          + " separated by a blank "
                          + "(Example: C E F1 N0 0000000.00000)");
        char[] separator = {' '};
        string formatString = Console.ReadLine();
        string[] formats = formatString.Split(separator);
        // Loop through the list of format specifiers.
        foreach(string s in formats)
        {
          if (s.Length != 0)
          {
            // Create a complete format specifier
            // from the letters entered earlier.
            string formatCommand = "{0:" + s + "}";
            // Output the number entered using the
            // reconstructed format specifier.
            Console.Write(
              "The format specifier {0} results in ", formatCommand);
            try
            {
              Console.WriteLine(formatCommand, number);
            }
            catch(Exception)
            {
              Console.WriteLine("<illegal control>");
            }
            Console.WriteLine();
          }
        }
      }
      // Wait for user to acknowledge.
      Console.WriteLine("Press Enter to terminate...");
      Console.Read();
    }
  }
}

The OutputFormatControls program continues to read floating-point numbers into a variable numberInput until the user enters a blank line. (Because the input is a bit tricky, I include an example for the user to imitate as part of the message asking for input.) Notice that the program does not include tests to determine whether the input is a legal floating-point number. I just assume that the user is smart enough to know what a number looks like (a dangerous assumption!).
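If you'd rather not trust the user that far, one option is Double.TryParse(), which reports failure with a return value rather than an exception. This is a sketch, not part of OutputFormatControls: The class name ParseDemo and the helper TryReadNumber() are invented, and I assume invariant-culture parsing (a period as the decimal point) so that the example behaves the same everywhere.

```csharp
using System;
using System.Globalization;

public static class ParseDemo
{
    // Hypothetical helper: returns true and sets number if text is a legal
    // floating-point number; returns false otherwise.
    public static bool TryReadNumber(string text, out double number)
    {
        return Double.TryParse(text, NumberStyles.Float,
                               CultureInfo.InvariantCulture, out number);
    }

    public static void Main()
    {
        string[] samples = { "12345.6789", ".12345", "12x" };
        foreach (string sample in samples)
        {
            double number;
            if (TryReadNumber(sample, out number))
            {
                Console.WriteLine("'{0}' is a legal number", sample);
            }
            else
            {
                Console.WriteLine("'{0}' is not a legal number", sample);
            }
        }
    }
}
```

In a loop like the one in Main(), a failed parse would be a good moment to complain to the user and ask again.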

The program then reads a series of specifier strings separated by spaces. Each specifier is then combined with a "{0}" string (the number before the colon, which corresponds to the placeholder in the format string) into the variable formatCommand. For example, if you entered N4, the program would store the specifier "{0:N4}". The following statement writes the number number using the newly constructed formatCommand:

Console.WriteLine(formatCommand, number);

In the case of the lowly N4, the command would be rendered this way:

Console.WriteLine("{0:N4}", number);

Typical output from the program appears this way (my input is the line that follows each prompt):

Enter a double number
12345.6789
Enter the format specifiers separated by a blank (Example: C E F1 N0 0000000.00000)
C E F1 N0 0000000.00000
The format specifier {0:C} results in $12,345.68

The format specifier {0:E} results in 1.234568E+004

The format specifier {0:F1} results in 12345.7

The format specifier {0:N0} results in 12,346

The format specifier {0:0000000.00000} results in 0012345.67890

Enter a double number
.12345
Enter the format specifiers separated by a blank (Example: C E F1 N0 0000000.00000)
00.0%
The format specifier {0:00.0%} results in 12.3%

Enter a double number

Press Enter to terminate...

When applied to the number 12345.6789, the specifier N0 adds commas in the proper place (the N part) and lops off everything after the decimal point (the 0 portion) to render 12,346. (The last digit was rounded off, not truncated.)

Similarly, when applied to 0.12345, the control 00.0% outputs 12.3%. The percent sign multiplies the number by 100 and adds %. The 00.0 indicates that the output should include at least two digits to the left of the decimal point and only one digit after the decimal point. The number 0.01 is displayed as 01.0%, using the same 00.0% specifier.

Note

The mysterious try . . . catch catches any errors that spew forth in the event you enter an illegal format command such as D, which stands for decimal and works only with integers. (I cover exceptions in Chapter 9 of this minibook.)
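Here's a small sketch of that failure by itself. Applying D to a double makes Format() throw a FormatException, which the code catches. The class name IllegalFormatDemo and the helper FormatOrComplain() are made up for this example, and it catches FormatException specifically rather than the general Exception that the program uses.

```csharp
using System;

public static class IllegalFormatDemo
{
    // Hypothetical helper: formats number with the specifier, or returns a
    // complaint if the specifier isn't legal for a double.
    public static string FormatOrComplain(string specifier, double number)
    {
        try
        {
            return String.Format("{0:" + specifier + "}", number);
        }
        catch (FormatException)
        {
            return "<illegal control>";
        }
    }

    public static void Main()
    {
        // D is legal only for integer types, so this one fails.
        Console.WriteLine(FormatOrComplain("D", 12.5));   // <illegal control>
        // F1 is fine for a double.
        Console.WriteLine(FormatOrComplain("F1", 12.5));
    }
}
```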

StringBuilder: Manipulating Strings More Efficiently

Building longer strings out of a bunch of shorter strings can cost you an arm and its elbow. That's because a string, after it's created, can't be changed: It's immutable, as I say at the beginning of this chapter. This example doesn't tack "ly" onto s1:

string s1 = "rapid";
string s2 = s1 + "ly";              // s2 = rapidly.

It creates a new string composed of the combination. (s1 is unchanged.) Similarly, other operations that appear to modify a string, such as Substring() and Replace(), do the same.

The result is that each operation on a string produces yet another string. Suppose you need to concatenate 1000 strings into one huge one. You're going to create a new string for each concatenation:

string[] listOfNames = ...  // 1000 pet names
string s = string.Empty;
for(int i = 0; i < 1000; i++)
{
  s += listOfNames[i];
}

To avoid such costs when you're doing lots of modifications to strings, use the companion class StringBuilder. Be sure to add this line at the top of your file:

using System.Text;  // Tells the compiler where to find StringBuilder.

Note

Unlike String manipulations, the manipulations you do on a StringBuilder directly change the underlying string. Here's an example:

StringBuilder builder = new StringBuilder("012");
builder.Append("34");
builder.Append("56");
string result = builder.ToString();  // result = 0123456

Create a StringBuilder instance initialized with an existing string, as just shown. Or create an empty StringBuilder with no initial value:

StringBuilder builder = new StringBuilder();  // Defaults to 16 characters

You can also create the StringBuilder with the capacity you expect it to need, which reduces the overhead of increasing the builder's capacity frequently:

StringBuilder builder = new StringBuilder(256); // 256 characters.

Use the Append() method to add text to the end of the current contents. Use ToString() to retrieve the string inside the StringBuilder when you finish your modifications. Here's the StringBuilder version of the loop just shown, with retrieval of the final concatenated string on the last line:

StringBuilder sb = new StringBuilder(20000);  // Allocate a bunch.
for(int i = 0; i < 1000; i++)
{
  sb.Append(listOfNames[i]);    // Same list of names as earlier
}
string result = sb.ToString();  // Retrieve the results.

StringBuilder has a number of other useful string manipulation methods, including Insert(), Remove(), and Replace(). It lacks many of string's methods, though, such as Substring(), Split(), and IndexOf().
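A quick sketch shows these methods at work. The class name BuilderDemo and the method Edit() are invented for the example; the StringBuilder calls are the real ones. The ToString(startIndex, length) overload used in Main() is one way to get by without Substring().

```csharp
using System;
using System.Text;

public static class BuilderDemo
{
    // Hypothetical helper: edits one StringBuilder in place, step by step.
    public static string Edit()
    {
        StringBuilder sb = new StringBuilder("Hello World");
        sb.Insert(5, ",");           // Now "Hello, World"
        sb.Replace("World", "C#");   // Now "Hello, C#"
        sb.Remove(0, 7);             // Now "C#"
        sb.Append(" rocks");         // Now "C# rocks"
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Edit());   // C# rocks

        // No Substring() on StringBuilder, but ToString() can take a range.
        StringBuilder sb = new StringBuilder("C# rocks");
        Console.WriteLine(sb.ToString(0, 2));   // C#
    }
}
```

Each call changes the same StringBuilder object rather than creating a new string, which is the whole point.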

Suppose that you want to make uppercase just the first character of a string, as in the earlier section "Converting a string to upper- or lowercase." With StringBuilder, it's much cleaner looking than the code I gave earlier.

StringBuilder sb = new StringBuilder("jones");
sb[0] = char.ToUpper(sb[0]);
string fixedString = sb.ToString();

This puts the lowercase string "jones" into a StringBuilder, accesses the first char in the StringBuilder's underlying string directly with sb[0], uses the char.ToUpper() method to uppercase the character, and reassigns the uppercased character to sb[0]. Finally, it extracts the improved string "Jones" from the StringBuilder.

The BuildASentence example presented earlier in this chapter could benefit from using a StringBuilder. I use StringBuilder quite a bit.

Note

The StringCaseChanging and VariousStringTechniques examples on this book's Web site show StringBuilder in action.

Tip

Book II introduces a C# feature called extension methods. The example there adds several handy methods to the String class. Later in that minibook, we describe how to convert between strings, arrays of char, and arrays of byte. Those are operations you may need to do frequently (and are shown in the StringCaseChanging example on this book's Web site).
