Chapter 5. Strings

Introduction

Every Visual Basic developer quickly learns how to manipulate strings, but it’s often easy to overlook some of the more powerful techniques available, especially with all the new features in Visual Basic 2005. A good example is the powerful StringBuilder object, which provides an order-of-magnitude improvement for concatenating strings. Visual Basic 6 developers, in particular, will discover lots of exciting new string-processing features. For example, Visual Basic 2005’s Substring() method provides similar functionality not only to the Mid() function, but also to the Left() and Right() string functions. The regular expression library included with .NET also provides new and powerful ways to analyze and process string data.

5.1. Using a StringBuilder

Problem

You need to process many pieces of string data with more efficiency than is allowed using standard .NET Framework immutable strings.

Solution

The StringBuilder object provides extremely fast and efficient in-place processing of string and character data. The following code demonstrates several of its powerful methods and some of the techniques you can use to speed up your string processing:

	Dim workText As New System.Text.StringBuilder
	
	' ----- Build a basic text block.
	workText.Append("The important")
	workText.Append(vbNewLine)
	workText.Append("thing is not")
	workText.AppendLine()
	workText.AppendLine("to stop questioning.")

	workText.Append("--Albert Einstein")
	MsgBox(workText.ToString( ))

	' ----- Delete the trailing text.
	Dim endSize As Integer = "--Albert Einstein".Length
	workText.Remove(workText.Length - endSize, endSize)
	MsgBox(workText.ToString( ))

	' ----- Modify text in the middle.
	workText.Insert(4, "very ")
	MsgBox(workText.ToString( ))

	' ----- Perform a search and replace.
	workText.Replace("not", "never")
	MsgBox(workText.ToString( ))

	' ----- Truncate the existing text.
	workText.Length = 3
	MsgBox(workText.ToString( ))

Discussion

The first line of the previous code creates a new instance of the StringBuilder object. The next half dozen or so lines of code show various common uses of the StringBuilder's Append() and AppendLine() methods. Each call to Append() or AppendLine() concatenates another string or character piece into the StringBuilder’s buffer. Figure 5-1 shows the result of these first few append actions.

Figure 5-1. Piecing together strings with the StringBuilder

Avoid the temptation to concatenate these string pieces using the & operator as you prepare the various pieces for appending to the StringBuilder. Doing so detracts from the efficiency and speed advantages of the StringBuilder. For example, both of the following lines of code are legal and correct, but the line that uses the & operator does a lot more work behind the scenes:

	' ----- Don't do this!
	workText.Append("This " & "is " & "not advisable!")

	' ----- Please do this.
	workText.Append("This ").Append("is ").Append("faster!")

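The first statement (the one using the & operator) describes a concatenation that must finish before the StringBuilder ever sees the result. When the pieces are variables or expressions, this means making working copies of the immutable strings, and timing tests demonstrate that this can slow down your code measurably. (When every piece is a literal, as in this short example, the compiler can merge the pieces at compile time, so the example shows the pattern to avoid rather than its full cost.)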

Besides Append(), the StringBuilder object also provides methods that parallel other functions available for processing true strings. These include Remove(), Replace(), and Insert() methods, as demonstrated in the sample code presented earlier in this recipe. The Length property shown in the sample is also available as a standard property of strings. The remaining lines of code in the sample demonstrate the use of these methods by modifying parts of the original quote.

A StringBuilder’s contents are technically not a string. Rather, the StringBuilder maintains an internal buffer of characters that at any time can easily be converted to a string using the StringBuilder’s ToString() method. Think of a StringBuilder as a string in the making that’s not really a string until you want it to be.

Behind the scenes, the default StringBuilder’s buffer starts out with a working space, or capacity, of only 16 characters. The buffer automatically doubles in size whenever it needs more space, jumping to 32 characters, then 64, and so on. If you have a good idea how much space your string processing may require, you can initialize StringBuilder’s buffer to a given capacity during the declaration. For example, this declaration creates a StringBuilder instance with a preallocated buffer size of 1,000 characters:

	Dim workText As New System.Text.StringBuilder(1000)

The advantage of providing the starting capacity is a potential performance boost. In this case, the buffer’s workspace won’t need to be doubled until enough strings have been appended to overflow the 1,000-character limit.

You can access the StringBuilder’s capacity at runtime through its Capacity property. It’s enlightening to read this property to follow along as the StringBuilder doubles in size during execution. You can set the Capacity to a new value at any time, but if you set the Capacity to less than the StringBuilder’s current Length, an exception occurs. If your intent is to shorten, or truncate, the contents of the buffer, set the Length property instead, and leave Capacity alone. The easiest way to empty a StringBuilder of its contents is to set its Length property to zero.
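
To follow the buffer growth described above, a minimal sketch such as the following records each new Capacity value as characters are appended. The names CapacitySketch and CapacityHistory are illustrative assumptions, and the capacities reported depend on the version of the .NET Framework in use, so no particular values are promised here:

	Imports System
	Imports System.Collections.Generic
	Imports System.Text

	Module CapacitySketch
	   Public Function CapacityHistory(ByVal appendCount As Integer) _
	         As List(Of Integer)
	      ' ----- Record each distinct Capacity seen while appending.
	      Dim seen As New List(Of Integer)
	      Dim workText As New StringBuilder()
	      seen.Add(workText.Capacity)
	      For counter As Integer = 1 To appendCount
	         workText.Append("x"c)
	         If (workText.Capacity <> seen(seen.Count - 1)) Then
	            seen.Add(workText.Capacity)
	         End If
	      Next counter
	      Return seen
	   End Function

	   Sub Main()
	      For Each oneSize As Integer In CapacityHistory(200)
	         Console.WriteLine(oneSize)
	      Next oneSize

	      ' ----- Empty a buffer by setting Length, not Capacity.
	      Dim scratch As New StringBuilder("temporary text")
	      scratch.Length = 0
	      Console.WriteLine(scratch.Length)
	   End Sub
	End Module

The sketch writes to the console instead of calling MsgBox() so that it can run outside a Windows Forms project.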

See Also

Recipe 5.26 also discusses building up strings from smaller components.

5.2. Creating a String of N Identical Characters

Problem

You need to create a string comprised of a single character repeated many times. These strings are sometimes useful in the formatting of ASCII text for display or printed output.

Solution

Create a new string of repeated characters using the String class itself. One of its overloaded constructors accepts a character to repeat and a repetition count.

Discussion

Most of the time you declare string variables without calling a constructor, which leaves them set to Nothing. This is why you must assign a string value to a string variable after declaring it, but before using its contents. However, you can use overloaded versions of the string constructor to assign string data immediately upon creation. One version of the string constructor takes a character and a count and efficiently builds a string by repeating the character the given number of times. The following statement builds a string of 72 asterisks:

	Dim lotsOfAsterisks As New String("*"c, 72)

Visual Basic 2005 also provides a second way to create strings of duplicated characters. The StrDup() function, which is very similar to the original String() function found in Visual Basic 6, does the trick:

	lotsOfAsterisks = StrDup(72, "*")

Notice the difference in the order of the parameters between the string constructor syntax and the function call. Fortunately, Visual Studio’s IntelliSense means you don’t have to memorize the order of the parameters.
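
As a quick check that the two forms really are interchangeable, the following sketch builds the same string both ways and compares the results. The module and function names are illustrative assumptions:

	Imports System
	Imports Microsoft.VisualBasic

	Module RepeatCharSketch
	   Public Function BothFormsAgree(ByVal count As Integer) As Boolean
	      ' ----- Constructor: character first, then count.
	      Dim fromConstructor As New String("*"c, count)
	      ' ----- StrDup: count first, then the character to repeat.
	      Dim fromStrDup As String = StrDup(count, "*")
	      Return (fromConstructor = fromStrDup)
	   End Function

	   Sub Main()
	      Console.WriteLine(BothFormsAgree(72))
	   End Sub
	End Module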

See Also

Recipe 5.45 demonstrates another method of creating strings of a common character.

5.3. Creating a String by Repeating a String N Times

Problem

You want a string comprised of a sequence of characters repeated many times. For example, you want to create a fancy separator string comprised of alternating “+” and “~” characters, as shown in Figure 5-2.

Figure 5-2. A string formed by repeating two characters many times

Solution

Use a StringBuilder to append as many copies of the string as desired. Then convert the result to a true string using the StringBuilder’s ToString() method:

	Dim fancyString As New System.Text.StringBuilder
	For counter As Integer = 1 To 35
	   fancyString.Append("+~")
	Next counter
	MsgBox(fancyString.ToString())

Discussion

Strings in .NET are immutable, which means that once they’ve been created, they sit in one spot in memory and can never be modified. All functions that might appear to be changing a string’s contents are actually making new copies of the original string, modified en route. In most cases, immutability provides superior string handling and processing capabilities, but when it comes to concatenating strings, the speed and efficiency advantages are nullified.

The StringBuilder object solves the concatenation dilemma nicely. It allows dynamic, in-place modification of a buffer containing a sequence of string characters, without the need to constantly reallocate String objects. If the allocated buffer space runs out, the StringBuilder efficiently and automatically doubles the size of its character workspace, and it will do so as many times as are required to handle the strings and characters appended to it.
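
If you repeat strings often, the loop from the solution can be wrapped in a small helper. The following is a minimal sketch, assuming a hypothetical function named RepeatText(); because the final size is known in advance, it reserves the whole buffer up front so that no regrowth is needed:

	Imports System
	Imports System.Text

	Module RepeatTextSketch
	   Public Function RepeatText(ByVal piece As String, _
	         ByVal count As Integer) As String
	      ' ----- Nothing to repeat.
	      If (piece Is Nothing) OrElse (count <= 0) Then Return ""

	      ' ----- Reserve the full size up front to avoid regrowth.
	      Dim result As New StringBuilder(piece.Length * count)
	      For counter As Integer = 1 To count
	         result.Append(piece)
	      Next counter
	      Return result.ToString()
	   End Function

	   Sub Main()
	      Console.WriteLine(RepeatText("+~", 35))
	   End Sub
	End Module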

See Also

Recipe 5.27 shows how the StringBuilder alternative really is faster than standard string concatenation.

5.4. Obfuscating a String

Problem

You need to store a string in such a way that a user won’t recognize it, but you also want to make sure that the string stays the same length and that it contains only printable ASCII characters.

Solution

Sample code folder: Chapter 05\ObfuscateString

Process each printable character of the string by shifting its ASCII value to that of another character within the same set. The following two functions can be used to obfuscate strings in this way and then return them to their original states:

	Public Function Obfuscate(ByVal origText As String) As String
	   ' ----- Make a string unreadable, but retrievable.
	   Dim textBytes As Byte( ) = _
	      System.Text.Encoding.UTF8.GetBytes(origText)
	   For counter As Integer = 0 To textBytes.Length - 1
	      If (textBytes(counter) > 31) And _
	            (textBytes(counter) < 127) Then
	         textBytes(counter) += CByte(counter Mod 31 + 1)
	         If (textBytes(counter) > 126) Then _
	            textBytes(counter) -= CByte(95)
	      End If
	   Next counter
	   Return System.Text.Encoding.UTF8.GetChars(textBytes)
	End Function

	Public Function DeObfuscate(ByVal origText As String) _
	      As String
	   ' ----- Restore a previously obfuscated string.
	   Dim textBytes As Byte( ) = _
	      System.Text.Encoding.UTF8.GetBytes(origText)
	   For counter As Integer = 0 To textBytes.Length - 1
	      If (textBytes(counter) > 31) And _
	            (textBytes(counter) < 127) Then
	         textBytes(counter) -= CByte(counter Mod 31 + 1)
	         If (textBytes(counter) < 32) Then _
	            textBytes(counter) += CByte(95)
	      End If
	   Next counter
	   Return System.Text.Encoding.UTF8.GetChars(textBytes)
	End Function

Figure 5-3 shows a string before and after calling Obfuscate(), and after returning it to its original state by calling DeObfuscate().

Figure 5-3. Results of obfuscating a string to make it unreadable, then deobfuscating it

Discussion

The Obfuscate() function lets you modify strings to an unreadable state without resorting to full-blown cryptographic techniques. An example of where this might come in handy is for storing string data in the registry in such a manner that the original contents are not easily searched for and that the typical user won’t recognize the data.

When modifying individual bytes of a string, it’s often best to first convert the string to an array of bytes, as shown in these functions. You can freely modify the byte values in place, unlike the contents of the immutable string they came from, and generate a new string result by converting the entire byte array in one function call.

If you work with international character sets, consider using the Unicode versions of the encoding conversion functions instead of the UTF8 versions. The byte arrays will be twice as large, but you should be able to handle other sets of characters. You’ll also need to pay close attention to the numerical shift of the byte values, modifying the above code to keep the results within the desired range of characters.
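
One way to sidestep byte-level encoding questions entirely is to shift Char values instead of bytes. The following sketch is an assumption-laden variation, not the recipe's own code: the hypothetical ShiftPrintable() function applies the same position-based shift to printable ASCII characters and leaves every other character untouched. Because it counts character positions rather than byte positions, its output will not match Obfuscate() for strings containing non-ASCII characters:

	Imports System
	Imports Microsoft.VisualBasic

	Module ObfuscateCharsSketch
	   Public Function ShiftPrintable(ByVal origText As String, _
	         ByVal direction As Integer) As String
	      ' ----- direction is +1 to obfuscate, -1 to restore.
	      Dim textChars() As Char = origText.ToCharArray()
	      For counter As Integer = 0 To textChars.Length - 1
	         Dim code As Integer = AscW(textChars(counter))
	         If (code > 31) AndAlso (code < 127) Then
	            ' ----- Shift, then wrap back into the range 32 to 126.
	            code += direction * (counter Mod 31 + 1)
	            If (code > 126) Then code -= 95
	            If (code < 32) Then code += 95
	            textChars(counter) = ChrW(code)
	         End If
	      Next counter
	      Return New String(textChars)
	   End Function

	   Sub Main()
	      Dim original As String = "Caf" & ChrW(&HE9) & " menu, 12 items"
	      Dim hidden As String = ShiftPrintable(original, 1)
	      Console.WriteLine(hidden.Length = original.Length)
	      Console.WriteLine(ShiftPrintable(hidden, -1) = original)
	   End Sub
	End Module

Like the original functions, this hides text from casual inspection only; it is not a substitute for real encryption.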

See Also

Recipe 5.23 discusses additional modifications to strings that can be reversed.

5.5. Converting Binary Data to a Hexadecimal String

Problem

You need to convert a byte array to a hexadecimal string. This is handy for the display or documentation of binary data.

Solution

Use a bit converter to get the hexadecimal representation of each byte within a block of data. The following code generates the hexadecimal string from source data:

	Dim result As String = Replace(BitConverter.ToString( _
	   origBytes), "-", "")

Discussion

There are several approaches to solving this problem. A quick review of some of these approaches will demonstrate several different programming techniques available to you in Visual Basic 2005.

The code samples in this recipe assume a byte array named origBytes built using the following code, which creates a byte array of length 256 containing one each of the byte values 0 through 255:

	Dim origBytes(255) As Byte
	For counter As Short = 0 To 255
	   origBytes(counter) = CByte(counter)
	Next counter

The first approach is somewhat “brute force” in nature. Each byte of the array is converted to a two-character string using one of the many formatting options of the byte’s ToString() method. These short strings are concatenated to the result string one at a time:

	Dim result As String = ""
	For counter As Short = 0 To 255
	   result &= origBytes(counter).ToString("X2")
	Next counter

This is fine for small arrays of bytes, but the string concatenation quickly becomes problematic as the byte count increases. The next approach uses a StringBuilder to make the concatenation more efficient for large data sources:

	Dim workText As New System.Text.StringBuilder(600)
	For counter As Short = 0 To 255
	   workText.Append(origBytes(counter).ToString("X2"))
	Next counter
	Dim result As String = workText.ToString()

This solution runs faster, but it seems to lack the elegance and power we expect of Visual Basic. Fortunately, the .NET Framework is full of surprises, and of useful objects too. The BitConverter object provides a shared method that converts an entire array of bytes to a hexadecimal string in one call. The resulting string has dashes between each pair of hexadecimal characters. This can be nice in some circumstances, but in this case, we’re trying to create a compact hexadecimal string comprised of only two characters for each byte. The following two lines of code show how to call the BitConverter.ToString() method, and then squeeze out all the dashes using a single call to the Replace() function:

	Dim result As String
	result = BitConverter.ToString(origBytes) '00-3F-F7 etc.
	result = Replace(result, "-", "")     '003FF7 etc.

The solution presented first in this recipe is the result of combining these two function calls into a single line of code. Figure 5-4 shows the resulting hexadecimal string displaying all possible byte values.

Figure 5-4. The hexadecimal string equivalent of a byte array comprised of the values 0 to 255
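
For reuse, the one-line conversion can be wrapped in a function. The following is a minimal sketch; the name BytesToHex() is an assumption, and the three-byte sample array is chosen only to keep the output short:

	Imports System

	Module HexSketch
	   Public Function BytesToHex(ByVal data() As Byte) As String
	      ' ----- BitConverter returns "00-3F-F7"; remove the dashes.
	      Return BitConverter.ToString(data).Replace("-", "")
	   End Function

	   Sub Main()
	      Dim sample() As Byte = {0, 63, 247}
	      Console.WriteLine(BytesToHex(sample))   ' 003FF7
	   End Sub
	End Module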

See Also

Recipes 5.16 and 5.26 show other useful ways of modifying portions of strings.

5.6. Extracting Substrings from Larger Strings

Problem

You want to extract substrings located at the left end, the right end, or somewhere in the middle of a string.

Solution

Visual Basic 2005 strings now have a built-in method named Substring() that provides an alternative to the traditional Visual Basic functions Left(), Mid(), and Right(), although the language retains these features if you wish to use them. To emulate each of these functions, set the Substring() method’s parameters appropriately. The following code shows how to do this:

	Dim quote As String = "The important thing is not to " & _
	   "stop questioning. --Albert Einstein"
	
	' ----- Left(quote, 3) … "The"
	   MsgBox(quote.Substring(0, 3))
	
	' ----- Mid(quote, 5, 9) … "important"
	   MsgBox(quote.Substring(4, 9))

	' ----- Mid(quote, 58) … "Einstein"
	   MsgBox(quote.Substring(57))

	' ----- Right(quote, 8) … "Einstein"
	   MsgBox(quote.Substring(quote.Length - 8))

Discussion

Each line of code in the sample is prefaced by a comment line showing the equivalent syntax from VB 6. One of the big differences apparent in these examples is that the first character in the string is now at offset position 0 instead of 1, requiring a change in the offsets supplied to the Substring() method. The lengths of the substrings are still the same.
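
There is one more difference to keep in mind. Left() and Right() quietly return the entire string when asked for more characters than it contains, while Substring() throws an ArgumentOutOfRangeException. The following sketch shows one way to get the forgiving behavior from Substring(). The names LeftPart() and RightPart() are assumptions, and clamping negative counts to zero is a choice made for this sketch, not a match for the classic functions:

	Imports System

	Module SubstringSketch
	   Public Function LeftPart(ByVal source As String, _
	         ByVal count As Integer) As String
	      ' ----- Never ask for more characters than exist.
	      Dim size As Integer = Math.Min(Math.Max(count, 0), source.Length)
	      Return source.Substring(0, size)
	   End Function

	   Public Function RightPart(ByVal source As String, _
	         ByVal count As Integer) As String
	      Dim size As Integer = Math.Min(Math.Max(count, 0), source.Length)
	      Return source.Substring(source.Length - size)
	   End Function

	   Sub Main()
	      Console.WriteLine(LeftPart("Einstein", 3))     ' Ein
	      Console.WriteLine(RightPart("Einstein", 100))  ' Einstein
	   End Sub
	End Module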

5.7. Converting a String’s Case

Problem

You want to convert a string to all uppercase, all lowercase, or mixed case (with only the first letter of each word in uppercase).

Solution

Sample code folder: Chapter 05\MixedCase

The string methods ToUpper() and ToLower() make it easy to convert strings to upper- and lowercase. You can also use the standard Visual Basic UCase() and LCase() functions. To mix-case a string, use Visual Basic’s StrConv() function, or write a short special-purpose function when you need more control over the result.

Discussion

Changing strings to upper- or lowercase is standard Visual Basic fare:

	' ----- To upper case.
	newString = oldString.ToUpper()
	newString = UCase(oldString)
	
	' ----- To lower case.
	newString = oldString.ToLower()
	newString = LCase(oldString)

To convert the string to mixed or “proper” case, use one of the conversion methods included in the StrConv() function:

	newString = StrConv(oldString, VbStrConv.ProperCase)

This function converts the first letter of each word to uppercase, making every other letter lowercase. Its rules are pretty basic, and it doesn’t know about special cases. If you need to correctly capitalize names such as “MacArthur,” you have to write a custom routine. The following code provides the start of a routine using an algorithm that works much like the StrConv() function. It assumes that space characters separate each word:

	Public Function MixedCase(ByVal origText As String) As String
	   ' ----- Convert a string to "proper" case.
	   Dim counter As Integer
	   Dim textParts() As String = Split(origText, " ")
	
	   For counter = 0 To textParts.Length - 1
	      If (textParts(counter).Length > 0) Then _
	         textParts(counter) = _
	         UCase(Microsoft.VisualBasic.Left( _
	         textParts(counter), 1)) & _
	         LCase(Mid(textParts(counter), 2))
	   Next counter

	   Return Join(textParts, " ")
	End Function

The code splits up the original text into an array at space-character boundaries using the Split() function. It then processes each word separately and merges them back together with the Join() function.

Figure 5-5 shows the results of various conversions on a string, including a conversion using the custom MixedCase() function. Notice that “albert” is not capitalized in the mixed-case string. This is because the two leading dashes are considered to be part of this word, based on how the Split() function separated the words at space-character locations.

Figure 5-5. The original string before and after various case conversions
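
If leading punctuation such as the two dashes should not block capitalization, one alternative is to treat every run of letters as a word, whatever surrounds it. The following sketch takes that approach; the name MixedCaseByLetters() is an assumption, and the routine has its own blind spot, since an apostrophe also ends a run of letters ("don't" becomes "Don'T"):

	Imports System
	Imports System.Text

	Module MixedCaseSketch
	   Public Function MixedCaseByLetters(ByVal origText As String) As String
	      Dim result As New StringBuilder(origText.Length)
	      Dim prevIsLetter As Boolean = False

	      For Each oneChar As Char In origText
	         If Char.IsLetter(oneChar) Then
	            ' ----- Uppercase only the first letter of each run.
	            If prevIsLetter Then
	               result.Append(Char.ToLower(oneChar))
	            Else
	               result.Append(Char.ToUpper(oneChar))
	            End If
	            prevIsLetter = True
	         Else
	            result.Append(oneChar)
	            prevIsLetter = False
	         End If
	      Next oneChar
	      Return result.ToString()
	   End Function

	   Sub Main()
	      Console.WriteLine(MixedCaseByLetters( _
	         "stop QUESTIONING. --albert einstein"))
	   End Sub
	End Module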

See Also

Recipe 5.44 discusses the Split() function and the Split() method.

5.8. Comparing Strings with Case Sensitivity

Problem

You need to compare two strings, taking into account their case.

Solution

Use the shared Compare() method provided by the String object to compare two strings:

	Select Case String.Compare(content1, content2, False)
	   Case Is < 0
	      MsgBox("Content1 comes before Content2.")
	   Case Is > 0
	      MsgBox("Content1 comes after Content2.")
	   Case Is = 0
	      MsgBox("Content1 and Content2 are the same.")
	End Select

Setting the third parameter of the Compare() method to False instructs the method to perform a case-sensitive comparison.

Discussion

Consider the results shown in Figure 5-6, which indicate that “apples” is less than “Apples”. The ASCII values for the lowercase character “a” and the uppercase character “A” are 97 and 65, respectively, which normally puts the uppercase version first. But the String.Compare() method compares text using culture-defined sorting rules, and by default, English words beginning with lowercase letters are considered “less than” the same words beginning with uppercase letters.

Figure 5-6. Culture-defined rules apply to case-sensitive string comparisons

You can change the comparison rules in several ways to match what you want to accomplish. See the Visual Studio online help for the CompareOptions enumeration for more information on how to make these changes.
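
When you want the raw character-code ordering instead of the culture-defined one, String.Compare() also accepts a StringComparison value. The following sketch contrasts the two; the function name OrdinalOrder() is an assumption made for illustration:

	Imports System

	Module CompareSketch
	   Public Function OrdinalOrder(ByVal content1 As String, _
	         ByVal content2 As String) As Integer
	      ' ----- Compares character codes only; no culture rules.
	      Return String.Compare(content1, content2, _
	         StringComparison.Ordinal)
	   End Function

	   Sub Main()
	      ' ----- Culture-aware, case-sensitive (the recipe's approach).
	      Console.WriteLine(String.Compare("apples", "Apples", False))

	      ' ----- Ordinal: "a" (97) sorts after "A" (65), so the
	      '       result is positive.
	      Console.WriteLine(OrdinalOrder("apples", "Apples"))
	   End Sub
	End Module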

See Also

Recipe 5.9 discusses related comparisons.

5.9. Comparing Strings Without Case Sensitivity

Problem

You need to compare two strings without regard to their case.

Solution

Use the shared Compare() method provided by the String object to compare two strings:

	Select Case String.Compare(content1, content2, True)
	   Case Is < 0
	      MsgBox("Content1 comes before Content2.")
	   Case Is > 0
	      MsgBox("Content1 comes after Content2.")
	   Case Is = 0
	      MsgBox("Content1 and Content2 are the same.")
	End Select

Setting the third parameter of the Compare() method to True instructs the method to perform a case-insensitive comparison.

Discussion

This type of string comparison compares all alphabetic characters as though lowercase and uppercase characters were identical. Figure 5-7 shows that “apples” is equal to “Apples” when the strings are compared this way.

Figure 5-7. When case is ignored, lowercase and uppercase are treated identically

String comparisons are culturally defined by default, so be sure the sort order you get is really what you want. See the Visual Studio online help for the CompareOptions enumeration to find more information on how to make changes to the way strings are sorted.
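
If all you need is a yes-or-no answer about equality, and culture rules are not wanted, String.Equals() with StringComparison.OrdinalIgnoreCase is a compact alternative. The following is a minimal sketch; the name SameText() is an assumption:

	Imports System

	Module IgnoreCaseSketch
	   Public Function SameText(ByVal content1 As String, _
	         ByVal content2 As String) As Boolean
	      ' ----- Ordinal comparison that ignores case.
	      Return String.Equals(content1, content2, _
	         StringComparison.OrdinalIgnoreCase)
	   End Function

	   Sub Main()
	      Console.WriteLine(SameText("apples", "Apples"))
	      Console.WriteLine(SameText("apples", "apple"))
	   End Sub
	End Module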

See Also

Recipe 5.8 discusses related comparisons.

5.10. Converting Strings to and from Character Arrays

Problem

You need to work with individual characters in a string efficiently, changing them in place in memory if possible.

Solution

Sample code folder: Chapter 05\StringsAndCharArrays

Use CType() to convert the string to an array of characters, modify characters throughout the array, and then directly convert the character array back to a string:

	Dim quote As String = "The important thing is not to " & _
	   "stop questioning. --Albert Einstein"
	Dim charArray() As Char = CType(quote, Char())
	charArray(46) = "!"c
	Dim result As String = New String(charArray)
	MsgBox(result)

Discussion

In this example, the string is converted to a character array using the versatile CType() type-conversion function. In this form, it’s easy to make a change such as replacing the period at index 46 with an exclamation point. The array is then recombined into a string by passing it to the overloaded version of the String constructor that takes an array of characters to initialize the new string. Figure 5-8 shows the displayed string result, now showing an exclamation point instead of a period.

Figure 5-8. Converting a string to an array of characters enables easy modification of individual characters in that string

There is another way to access individual characters in a string, but it’s read-only, so you can’t use the technique to modify the string:

	MsgBox(someString.Chars(46))

All strings have a Chars() property that lets you access an indexed character from the string with minimal overhead. The index is zero-based, so Chars(46) returns the 47th character.
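
The same round trip works with the string's own ToCharArray() method, and a character array opens the door to whole-array operations as well as single-character changes. The following sketch reverses a string in place; the name ReverseText() is an assumption, and the routine reverses Char values one by one, so it is not suitable for text containing surrogate pairs or combining marks:

	Imports System

	Module CharArraySketch
	   Public Function ReverseText(ByVal origText As String) As String
	      ' ----- Copy the characters into a modifiable array.
	      Dim textChars() As Char = origText.ToCharArray()
	      ' ----- Reverse them in place, then rebuild a string.
	      Array.Reverse(textChars)
	      Return New String(textChars)
	   End Function

	   Sub Main()
	      Console.WriteLine(ReverseText("Einstein"))   ' nietsniE
	   End Sub
	End Module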

See Also

Recipe 5.12 also examines working with individual characters within a larger string.

5.11. Converting Strings to and from Byte Arrays

Problem

You need to convert a string to bytes, and back to a string from a byte array. This enables you to work with the exact binary data comprising the string.

Solution

Sample code folder: Chapter 05\StringsAndByteArrays

Use shared methods of the System.Text.Encoding object to convert to and from bytes. If you know the string data to be comprised entirely of ASCII characters, use UTF8 encoding to minimize the length of the byte array; each character then occupies exactly one byte. Unicode encoding, which results in two bytes per character instead of one, can be used to guarantee no loss of data when making these conversions.

Discussion

The following sample code shows both UTF8 and Unicode encoding methods:

	Dim quote As String = "The important thing is not to " & _
	   "stop questioning. --Albert Einstein"
	Dim bytes() As Byte
	Dim result As String

	' ----- Assumed to be all ASCII characters.
	bytes = System.Text.Encoding.UTF8.GetBytes(quote)
	bytes(46) = 33  ' ASCII exclamation point
	result = System.Text.Encoding.UTF8.GetString(bytes)
	MsgBox(result)

	' ----- Works with all character sets.
	bytes = System.Text.Encoding.Unicode.GetBytes(quote)
	bytes(92) = 63  ' ASCII question mark
	bytes(93) = 0
	result = System.Text.Encoding.Unicode.GetString(bytes)
	MsgBox(result)

When using UTF8 encoding on an all-ASCII string such as this one, the number of bytes in the array is the same as the number of characters in the string. The character at indexed position 46 in the string is a period. During the first conversion, this period is changed to an exclamation point, and the resulting string is displayed, a result identical to that previously shown in Figure 5-8.

A Unicode-encoded byte array contains twice as many bytes as the number of characters in the original string. This makes sense when you consider that Unicode characters are 16 bits each (or two bytes) in size. Take a close look at the byte array modifications in the second part of the example code. The byte at position 92 (twice as far into the array as the ASCII variation) is set to the desired ASCII value (63 in this case, for the question mark). But because each character now consumes two bytes in the array, you must set both bytes. Setting the byte at position 93 clears the other half of the two-byte set. Figure 5-9 shows the resulting string, now sporting a question mark at character index 46 (the 47th character).

Figure 5-9. Changing the Unicode character at byte locations 92 and 93 to a question mark
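
The one-byte-per-character relationship for UTF8 holds only while the text stays within ASCII. The following sketch, built around an assumed five-character sample containing one accented letter, compares the byte counts reported by the two encodings:

	Imports System
	Imports System.Text
	Imports Microsoft.VisualBasic

	Module EncodingSizeSketch
	   Sub Main()
	      ' ----- Five characters; the third (U+00EF) is outside ASCII.
	      Dim sample As String = "na" & ChrW(&HEF) & "ve"

	      Console.WriteLine(sample.Length)                          ' 5
	      Console.WriteLine(Encoding.UTF8.GetByteCount(sample))     ' 6
	      Console.WriteLine(Encoding.Unicode.GetByteCount(sample))  ' 10
	   End Sub
	End Module

Once a UTF8 character needs more than one byte, byte offsets stop matching character indexes, so in-place edits like the ones in this recipe need extra care.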

5.12. Tallying Characters

Problem

You want to tally, or count the occurrences of, each character value in a string.

Solution

Sample code folder: Chapter 05\TallyCharacters

Convert the string to a byte array, and then tally the 256 possible byte values into an array of integer counts.

Discussion

In the case presented, the string is assumed to be all ASCII, which means conversion using UTF8 encoding is appropriate, and the tally array only needs to be dimensioned to hold 256 counting bins:

	Dim quote As String = "The important thing is not to " & _
	   "stop questioning. --Albert Einstein"
	Dim counter As Integer
	Dim tally(255) As Integer

Convert the string to a byte array, and then loop through each byte of the array to increment the count for each byte value:

	Dim bytes() As Byte = _
	   System.Text.Encoding.UTF8.GetBytes(quote)
	For counter = 0 To bytes.Length - 1
	   tally(bytes(counter)) += 1
	Next counter

The rest of the example prepares the tally for display. For efficiency, the code presents only characters with nonzero counts:

	Dim result As New System.Text.StringBuilder(quote)
	For counter = 0 To 255
	   If (tally(counter) > 0) Then
	      result.AppendLine( )
	      result.Append(Chr(counter))
	      result.Append(Space(3))
	      result.Append(tally(counter).ToString( ))
	   End If
	Next counter
	MsgBox(result.ToString( ))

Figure 5-10 shows the results.

Figure 5-10. A quick tally of the characters in a string

If you want to tally Unicode characters, you need to either dimension a much larger tally array or use a lookup system that constantly adds and counts characters as it finds them.
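
The lookup approach can be sketched with a generic Dictionary keyed by character. The names TallySketch and TallyChars() are assumptions, and the routine counts individual Char values, so a character stored as a surrogate pair would be tallied as two entries:

	Imports System
	Imports System.Collections.Generic

	Module TallySketch
	   Public Function TallyChars(ByVal source As String) _
	         As Dictionary(Of Char, Integer)
	      Dim tally As New Dictionary(Of Char, Integer)
	      For Each oneChar As Char In source
	         Dim soFar As Integer = 0
	         ' ----- soFar stays at zero for characters not yet seen.
	         tally.TryGetValue(oneChar, soFar)
	         tally(oneChar) = soFar + 1
	      Next oneChar
	      Return tally
	   End Function

	   Sub Main()
	      For Each pair As KeyValuePair(Of Char, Integer) _
	            In TallyChars("questioning")
	         Console.WriteLine(pair.Key.ToString() & "   " & _
	            pair.Value.ToString())
	      Next pair
	   End Sub
	End Module

Only characters that actually appear take up space, so there is no need to size an array for every possible character value.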

See Also

Recipe 5.11 provides additional details on encoded conversions.

5.13. Counting Words

Problem

You want to count the words in a string.

Solution

Sample code folder: Chapter 05\CountWords

Use the Split() function to split the string at each space character. The length of the resulting array is a good approximation of the number of words in the string.

Discussion

There always seems to be more than one way to get things done in Visual Basic 2005, and counting words is no exception. The following code shows one quick-and-dirty technique that requires very little coding to get the job done:

	Dim quote As String = "The important thing is not to " & _
	   "stop questioning. --Albert Einstein"
	Dim wordCount As Integer = Split(quote, Space(1)).Length
	MsgBox(quote & vbNewLine & "Number of words: " & _
	   wordCount.ToString)

Figure 5-11 shows the resulting number of words in the string.

Figure 5-11. Splitting a string to count its words

Inaccuracies can creep in if there are multiple spaces between some words in the string, if extra spaces appear at either or both ends of the string, or if other whitespace characters (such as tabs) are involved. A little preparation of the string can help eliminate some of these problems, but at the expense of added complexity. For example, the following lines of code get rid of runs of two or more space characters, replacing them with single spaces. Adding this code just before the Split() function can provide a more accurate word count:

	Do While (quote.IndexOf(Space(2)) >= 0)
	   quote = quote.Replace(Space(2), Space(1))
	Loop

Similarly, you can use the Replace() method to replace all tabs with spaces (probably best done just before converting all multiple spaces to single spaces). As you can probably sense, efforts to guarantee a more accurate count cause the code to grow quickly. The best course is to decide what degree of word-counting accuracy is required, how much value to place on speed of operation, and so on before deciding how much cleanup code to add.
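
Much of that cleanup can be avoided by letting the string's own Split() method discard empty entries. The following is a minimal sketch under stated assumptions: the name CountWords() is hypothetical, and the separator list (space, tab, carriage return, line feed) is one reasonable choice, not a complete definition of whitespace:

	Imports System
	Imports Microsoft.VisualBasic

	Module CountWordsSketch
	   Public Function CountWords(ByVal source As String) As Integer
	      Dim separators() As Char = {" "c, ControlChars.Tab, _
	         ControlChars.Cr, ControlChars.Lf}
	      ' ----- Runs of separators produce empty entries; drop them.
	      Return source.Split(separators, _
	         StringSplitOptions.RemoveEmptyEntries).Length
	   End Function

	   Sub Main()
	      Console.WriteLine(CountWords("  The important   thing" & _
	         vbTab & "is "))
	   End Sub
	End Module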

Another solution to this problem involves regular expressions, which are covered in Recipes 5.37 through 5.42.

See Also

Recipe 5.42 shows how to solve this same problem using a different solution.

5.14. Removing Extra Whitespace

Problem

You want to remove all extra whitespace characters from a string, leaving a single space character between each word.

Solution

Sample code folder: Chapter 05\RemoveWhitespace

There are several possible ways to remove extra whitespace from a string. One approach, presented here, is to test each character of the string to see if it is whitespace and to build up the resulting string using a StringBuilder:

	Dim source As String = _
	   Space(17) & "This string had " & Chr(12) & _
	   StrDup(5, Chr(9)) & "extra whitespace. " & Space(27)
	Dim thisIsWhiteSpace As Boolean
	Dim prevIsWhiteSpace As Boolean
	Dim result As New System.Text.StringBuilder(source.Length)
	Dim counter As Integer

	For counter = 0 To source.Length - 1
	   prevIsWhiteSpace = thisIsWhiteSpace
	   thisIsWhiteSpace = _
	      Char.IsWhiteSpace(source.Chars(counter))
	   If (thisIsWhiteSpace = False) Then
	      If (prevIsWhiteSpace = True) AndAlso _
	         (result.Length > 0) Then result.Append(Space(1))
	      result.Append(source.Chars(counter))
	   End If
	Next counter
	MsgBox("<" & result.ToString( ) & ">")

Discussion

The previous code first builds a test string comprised of words separated by extra spaces, tabs, and other whitespace characters. After processing to replace runs of whitespace characters with single spaces, the resulting string is displayed for inspection, as shown in Figure 5-12.

Figure 5-12. The test string after zapping extra whitespace characters

Another straightforward approach to removing extra whitespace is to use a series of Replace() functions, first to replace tabs and other whitespace characters with spaces, and finally to replace multiple spaces with single ones. This will work fine, but the disadvantage is that many temporary strings are built in memory as the immutable strings are processed. The code presented here moves each character in memory only once, or not at all if the character is an extra whitespace.

Another good approach is to use regular expressions to grab an array of the words and then piece them back together with single spaces using a StringBuilder.
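As a rough illustration of that regular expression approach, the following sketch collects each run of non-whitespace characters and rejoins the pieces with single spaces. The CollapseWhitespace() name and the \S+ pattern are assumptions made for this example, not code taken from the sample folder:

```vb
Imports System.Text
Imports System.Text.RegularExpressions

Module WhitespaceRegexDemo
    ' Hypothetical helper: keeps the words and drops the whitespace runs.
    Public Function CollapseWhitespace(ByVal source As String) As String
        Dim result As New StringBuilder(source.Length)

        ' ----- "\S+" matches each run of non-whitespace characters.
        For Each word As Match In Regex.Matches(source, "\S+")
            ' ----- Separate words with exactly one space.
            If (result.Length > 0) Then result.Append(" "c)
            result.Append(word.Value)
        Next word

        Return result.ToString()
    End Function
End Module
```

Because only the words are copied, leading and trailing whitespace disappears as a side effect, just as it does in the character-by-character loop.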

See Also

Recipe 5.42 shows how to use regular expressions to attack the multiwhitespace problem.

5.15. Using the Correct End-of-Line Characters

Problem

You are developing an application that will run on several platforms, so you want to use end-of-line characters that are compatible with all platforms.

Solution

Sample code folder: Chapter 05\EndOfLine

Use the property Environment.NewLine, which returns the end-of-line characters for the current platform. For example, the following code adds a self-describing line of text to a StringBuilder and ends the line with the newline characters for the current platform:

	Dim result As New System.Text.StringBuilder
	result.Append("Environment.NewLine").Append( _
	   Environment.NewLine)
	MsgBox(result.ToString())

Discussion

The following code, which simply extends the previous short snippet, terminates lines in 10 different ways, all with the same result in the Windows environment:

	Dim result As New System.Text.StringBuilder

	result.Append("vbNewLine").Append(vbNewLine)
	result.Append("vbCrLf").Append(vbCrLf)
	result.Append("vbCr").Append(vbCr)
	result.Append("vbLf").Append(vbLf)
	result.Append("Chr(13)").Append(Chr(13))
	result.Append("Chr(10)").Append(Chr(10))
	result.Append("Chr(13) & Chr(10)").Append(Chr(13) & Chr(10))
	result.Append("Environment.NewLine").Append( _
	   Environment.NewLine)
	result.Append("ControlChars.CrLf").Append(ControlChars.CrLf)
	result.Append("ControlChars.NewLine").Append( _
	   ControlChars.NewLine)

	MsgBox(result.ToString( ))

Figure 5-13 shows each of these self-describing lines as displayed by the message box in the last line.

Figure 5-13. No less than 10 ways to terminate a line

Different platforms, such as Linux and Mac OS, expect different combinations of carriage-return and line-feed characters to terminate lines in documents or in displayed text. Visual Basic 2005 defines several constants you can use that explicitly combine these characters in a variety of ways. These named constants are easily identified by their “vb” prefix.

The somewhat generic vbNewLine constant provides a platform-dependent end of line, but only if an application is recompiled on each platform. Feel free to substitute any of the others if you find them more suitable.

The Environment.NewLine property is not a constant. Instead, this property returns the correct sequence of characters for the operating system on which the application is currently running. This is your best choice when you want to compile a .NET application on one platform but run it on another.

Tip

The StreamWriter object has a property named NewLine, which can be altered to change its default end-of-line definition. This lets you change the set of characters inserted into the stream at the end of each call to the StreamWriter’s WriteLine() method. This can be handy, for example, if you wish to automate double spacing of lines.
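A minimal sketch of that double-spacing idea follows. The WriteDoubleSpaced() helper is hypothetical, and writing to a MemoryStream rather than a file is an assumption made only to keep the example self-contained:

```vb
Imports System.IO
Imports System.Text

Module DoubleSpaceWriterDemo
    ' Hypothetical helper: writes each line through a StreamWriter
    ' whose NewLine property has been doubled.
    Public Function WriteDoubleSpaced(ByVal lines() As String) As String
        Using buffer As New MemoryStream()
            Using writer As New StreamWriter(buffer)
                ' ----- Every WriteLine() now ends with two line breaks.
                writer.NewLine = Environment.NewLine & Environment.NewLine
                For Each oneLine As String In lines
                    writer.WriteLine(oneLine)
                Next oneLine
                writer.Flush()

                ' ----- This StreamWriter constructor writes UTF-8 text.
                Return Encoding.UTF8.GetString(buffer.ToArray())
            End Using
        End Using
    End Function
End Module
```

Only the writer's own line terminator changes; any line breaks already embedded in the text passed to WriteLine() are written as they are.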

See Also

Recipe 5.19 makes use of line endings in its adjustment of a string.

5.16. Replacing Substrings

Problem

You need to find and replace all occurrences of a substring in a larger string.

Solution

Use the String object’s Replace() method.

Discussion

The following example replaces all occurrences of lowercase “ing” with uppercase “ING” in a sample string:

	Dim quote As String = "The important thing is not to " & _
	   "stop questioning. --Albert Einstein"
	Dim result As String = quote.Replace("ing", "ING")
	MsgBox(result)

Figure 5-14 shows the results, where two occurrences were found and replaced.

Figure 5-14. Replacing multiple substrings

In this example, the substrings are replaced with a new string of the same length, but the replacement string can be of differing length. In fact, a useful technique is to make a replacement with a zero-length string, effectively deleting all occurrences of a given substring. For example, the following code, applied to the original string, results in the shortened string displayed in Figure 5-15:

	result = quote.Replace("not to stop ", "")

Figure 5-15. Zapping substrings by replacing them with an empty string

See Also

Recipe 5.21 shows how to remove characters from the start and end of a string.

5.17. Inserting a Character or String

Problem

You want to insert a character or string into another string at a given location.

Solution

Use the String object’s Insert() method.

Discussion

The String object’s Insert() method accepts the position at which to insert and the string to be inserted. Visual Basic also lets you pass a single character, which it converts to a one-character string automatically. For example, the following Insert() method adds a comma just after the word “thing” in the sample string:

	Dim quote As String = "The important thing is not to " & _
	   "stop questioning. --Albert Einstein"
	Dim result As String = quote.Insert(19, ","c)
	MsgBox(result)

Figure 5-16 shows the result of inserting the comma character.

Figure 5-16. Sample string with a character inserted

In this case the character is inserted after the 19th character of the string, or just after the “g” in “thing.” You can insert a character in the first position of a string by using position 0, and at the end of a string by using the string’s Length value.
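The two boundary positions can be sketched as follows. The Bracket() helper is a hypothetical name used only for this illustration:

```vb
Module InsertEndsDemo
    ' Hypothetical helper: wraps a string in square brackets.
    Public Function Bracket(ByVal source As String) As String
        ' ----- Position 0 places the text before the first character.
        Dim result As String = source.Insert(0, "[")

        ' ----- The Length value places the text after the last character.
        Return result.Insert(result.Length, "]")
    End Function
End Module
```

Calling Bracket("thing") returns "[thing]". Any position greater than the string's length causes Insert() to throw an ArgumentOutOfRangeException.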

The following code inserts the word “definitely " into the sample string. The inserted text includes a space at the end to keep the words spaced correctly in the result:

	Dim quote As String = "The important thing is not to " & _
	   "stop questioning. --Albert Einstein"
	quote = quote.Insert(23, "definitely ")
	MsgBox(quote)

Position 23 in the original string falls just after the space that follows “is” in “is not.” Figure 5-17 shows the result of this word insertion.

Figure 5-17. Sample string with the word “definitely” (followed by a space) inserted

See Also

Recipe 5.18 also discusses text insertions.

5.18. Inserting a Line

Problem

You want to insert a complete line of text in a string that contains multiple lines separated by newlines. The desired insertion point is after the nth line.

Solution

Sample code folder: Chapter 05\InsertLine

Split the string into a string array using the newlines as the split point, append the line to be inserted to the nth string, and use Join() to glue the string back together again.

Discussion

Use the string function Split(), which is not to be confused with the String.Split() method, to split the string into a string array. The Split() method splits the string at individual-character split points, but the Split() function lets you split the string using a multicharacter string for the defined split point. The vbNewLine constant is actually a two-character string, so you must use the Split() function to avoid splitting on the carriage-return character only, leaving the line-feed character at the front end of each array string.
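The difference is easy to see in a small sketch. The two wrapper functions below are hypothetical names; the method version passes a single carriage-return character explicitly so that its behavior does not depend on any implicit conversion:

```vb
Imports Microsoft.VisualBasic

Module SplitDemo
    ' ----- Split() function: the whole two-character delimiter is used.
    Public Function SplitByFunction(ByVal source As String) As String()
        Return Split(source, vbNewLine)
    End Function

    ' ----- Split() method: splits on the carriage-return character
    '       only, so each later element keeps a leading line feed.
    Public Function SplitByMethod(ByVal source As String) As String()
        Return source.Split(ControlChars.Cr)
    End Function
End Module
```

For the input "one" & vbNewLine & "two", the function returns "one" and "two", while the method returns "one" and a second element that begins with a line-feed character.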

Rather than redimensioning the string array to shuffle the lines and create a slot in which to insert the new one, it’s easier to just concatenate the new string, accompanied by a newline constant, to the appropriate string in the array. This is a simpler and more efficient procedure that involves less shuffling of string data in memory, and the results after doing a Join() are identical.

This insert functionality works well as a standalone function, which is presented in the following lines of code:

	Public Function InsertLine(ByVal source As String, _
	      ByVal lineNum As Integer, _
	      ByVal lineToInsert As String) As String
	   ' ----- Insert a line in the middle of a set of lines.
	   Dim lineSet( ) As String
	   Dim atLine As Integer

	   ' ----- Break the content into multiple lines.
	   lineSet = Split(source, vbNewLine)
	   
	   ' ----- Determine the new location, being careful not
	   '       to fall off the edge of the line set.
	   atLine = lineNum
	   If (atLine < 0) Then atLine = 0
	   If (atLine >= lineSet.Length) Then
	      ' ----- Append to the end of everything.
	      lineSet(lineSet.Length - 1) &= vbNewLine & lineToInsert
	   Else
	      ' ----- Insert before the specified line.
	      lineSet(atLine) = _
	         lineToInsert & vbNewLine & lineSet(atLine)
	   End If
	
	   ' ----- Reconnect and return the parts.
	   Return Join(lineSet, vbNewLine)
	End Function

The string is first split at line boundaries into a string array. The lineNum parameter is the number of the line after which the lineToInsert string is inserted. You can pass zero to this parameter to insert the new line before the first one, and any value greater than or equal to the number of lines appends the new line at the end. After concatenating the new string with the appropriate string in the array, along with a vbNewLine to separate it from the original line, the array is glued back together with the Join() function, using a vbNewLine between each line to restore its original structure. This new string is then returned as the result of the InsertLine() function.

The following lines of code demonstrate the function’s use:

	Dim result As New System.Text.StringBuilder
	result.AppendLine("This string")
	result.AppendLine("contains")
	result.AppendLine("several")
	result.AppendLine("lines")
	result.Append("of text.")
	
	' ----- Show the original content.
	Dim resultAsString As String = result.ToString( )
	MsgBox(resultAsString)
	
	' ----- Show the modified content.
	resultAsString = InsertLine(resultAsString, 3, "(inserted)")
	MsgBox(resultAsString)

A StringBuilder is used to build the original string containing several lines of text separated by vbNewLines. The first message box (displayed in Figure 5-18) shows the string before the extra line is inserted. The second message box (displayed in Figure 5-19) shows the new string inserted after the third line.

Figure 5-18. The original string containing five lines of text

Tip

The Split() method will accept either a character or a string to define the split points in a string, but only the first character of the string is used. The Split() function, however, uses the entire string parameter, of any length, to split the string. Both the Split() method and the Split() function are very handy, but make sure you understand the difference in the way they work.

Figure 5-19. The same string after “(inserted)” is inserted after the third line

See Also

Recipe 5.17 also discusses text insertions. The difference between the Split() method and the Split() function is further discussed in Recipe 5.44.

5.19. Double-Spacing a String

Problem

You want to double-space a string composed of multiple lines of text separated by newlines.

Solution

Use the String object’s Replace() method to replace all vbNewLines with two vbNewLines.

Discussion

The Replace() method provides an easy solution to this problem. Simply replace each occurrence of a vbNewLine separating the lines of text with a double vbNewLine:

	content = content.Replace(vbNewLine, vbNewLine & vbNewLine)

Figures 5-20 and 5-21 show a multiline example string before and after this replacement.

Figure 5-20. A string comprised of five lines of single-spaced text

Figure 5-21. The same string, double spaced

See Also

Recipe 5.16 shows how to replace specific substrings within a larger string.

5.20. Formatting Numbers into Strings

Problem

You want to format a number into a string suitable for displaying or printing, something that provides formatting control beyond the defaults.

Solution

Sample code folder: Chapter 05\FormatNumbers

Apply the String object’s Format() method, and use its custom formatting codes to get the output you desire.

Discussion

There are several ways and places in Visual Basic 2005 to apply formatting to numerical data. One of the best (and possibly the easiest to remember) is the Format() method, available as a shared method of the String object. A few simple examples will show you how to use this method:

	Dim intValue As Integer = 1234567
	Dim floatValue As Double = Math.PI
	Dim result As New System.Text.StringBuilder
	
	result.AppendLine(String.Format("{0} … {1}", _
	   intValue, floatValue))
	result.AppendLine(String.Format("{0:N} … {1:E}", _
	   intValue, floatValue))
	result.AppendLine(intValue.ToString("N5") & " … " & _
	   floatValue.ToString("G5"))
	
	MsgBox(result.ToString())

This example formats an Integer and a Double in several different ways. Other numerical values, such as Long, Short, Single, Decimal, and so on, can be formatted in the same ways. Figure 5-22 shows the result of applying the above formatting.

Figure 5-22. A sampling of the many ways numbers can be formatted into strings

The Format() method’s first argument is a formatting string that indicates how to use the remaining arguments. It can include zero or more zero-based position specifiers in curly braces. For instance, the text {1} says to insert the second data argument at that position. Consider this line of code:

	result = String.Format( _
	   "There are about {0} days in {1} years.", _
	   365.25 * 3, 3, 17)

The first indexed specifier, {0}, inserts the first data argument, the calculated result of 365.25 * 3. The second indexed formatting specifier, {1}, inserts the integer value 3 at that spot in the resulting string. The argument list also includes a third data element, 17, but because {2} does not appear in the format string, that argument is ignored.

You can use as many indexed formatting specifiers as you want in a single string, but you should always provide a matching argument in the method call following the string; if an index refers to an argument that was not supplied, Format() throws a FormatException. Index numbering always starts at zero. You can use the same argument more than once, you can use them in any order, and you can even skip some arguments. The important thing to remember is to match carefully the index number in the braces with the argument’s position, starting with zero.
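The reuse and reordering of arguments can be sketched like this. The Describe() function and its format string are made up for the illustration:

```vb
Module FormatIndexDemo
    ' Hypothetical helper: {1} appears before {0}, and {0} is used twice.
    Public Function Describe(ByVal first As String, _
          ByVal second As String) As String
        Return String.Format("{1} before {0}, then {0} again", _
           first, second)
    End Function
End Module
```

Calling Describe("A", "B") returns "B before A, then A again".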

When the index appears in the braces by itself, a default format is used. However, there are many formatting options available to customize the formatting. In the previous sample code, the {0:N} formatted the number to contain commas between every third digit, and {1:E} formatted the number using scientific notation. The Visual Studio online help documentation for the Format() method lists the many formatting options in detail.

You might have noticed that the last formatting line in the example is quite different from the previous ones. If you want to format a number into a string format without directly inserting it into a bigger string, you can use the many formatting options of the ToString() method, a method available to every .NET object (although specially overloaded for the numeric data types). In our example, the first number was formatted using “N5”, which inserts commas and formats the digits to five places after the decimal point. The second number was formatted using “G5”, causing “general” formatting of the number to five significant digits.
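The following sketch isolates the ToString() technique. The FormatInvariant() helper is hypothetical, and passing CultureInfo.InvariantCulture is an assumption added here so that the separators are predictable; the example above relies on the current culture instead:

```vb
Imports System.Globalization

Module NumberToStringDemo
    ' Hypothetical helper: formats a number with a standard format code,
    ' using the invariant culture's comma and period separators.
    Public Function FormatInvariant(ByVal value As Double, _
          ByVal formatCode As String) As String
        Return value.ToString(formatCode, CultureInfo.InvariantCulture)
    End Function
End Module
```

With this helper, FormatInvariant(1234567, "N5") returns "1,234,567.00000", and FormatInvariant(Math.PI, "G5") returns "3.1416".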

There are other formatting options for creating hexadecimal strings, formatting dates and times, formatting culture-specific data such as currency values, and so on. Several of these formatting options are used throughout this book. See the Visual Studio online documentation for specific predefined and custom format strings.

See Also

See the “String.Format” and “NumberFormatInfo Class” topics listed in the Visual Studio online help index. There are many links to related information, so plan to explore the help content for a while.

5.21. Trimming Sets of Characters from a String

Problem

You need to delete extraneous characters from each end of a string.

Solution

Use the String object’s Trim() method, passing to it a list of all characters to be deleted.

Discussion

The following example deletes four letters from the head and tail ends of a string. The letters chosen are just for demonstrating how the Trim() method works; a real-world example of where this might be handy would be to remove line numbers, colons, or other characters from the beginnings or ends of strings. As shown in Figure 5-23, the following code causes the entire first word (“The”) and the last character (“n”) to be removed, or trimmed, from the string:

	Dim quote As String = "The important thing is not to " & _
	   "stop questioning. --Albert Einstein"
	Dim trimChars() As Char = {"T"c, "h"c, "e"c, "n"c}
	Dim result As String = quote.Trim(trimChars)
	MsgBox(result)

Figure 5-23. Trimming specific characters from the head and tail ends of a string

You do not need to supply the characters in any particular order; all supplied characters will be trimmed. Trimming continues until the first and last characters of the string are something other than those supplied to the Trim() method. If you supply no arguments to Trim(), all whitespace characters are trimmed instead.

If you want to trim certain characters from either the start or end of the string, but not both, use the TrimStart() and TrimEnd() methods, respectively. They accept the same character-array argument as the Trim() method.
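A short sketch of the one-sided variants follows, reusing the same four trim characters. The module and function names are hypothetical:

```vb
Module TrimEndsDemo
    Private ReadOnly trimChars() As Char = {"T"c, "h"c, "e"c, "n"c}

    ' ----- Trim only the head of the string.
    Public Function TrimHead(ByVal source As String) As String
        Return source.TrimStart(trimChars)
    End Function

    ' ----- Trim only the tail of the string.
    Public Function TrimTail(ByVal source As String) As String
        Return source.TrimEnd(trimChars)
    End Function
End Module
```

For the input "Then go on", TrimHead() returns " go on" (the space stops the trimming), and TrimTail() returns "Then go o".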

See Also

Recipes 5.14 and 5.16 discuss related techniques.

5.22. Identifying and Validating Types of Data in a String

Problem

You want to check a string variable to see whether it has been assigned a value, or if it can be converted to a number, date, or time. This check can prevent an exception, and it can free your code from having to use an exception as part of its testing logic.

Solution

Sample code folder: Chapter 05\StringTypes

Visual Basic 2005 has three string functions that help solve this problem: IsNothing(), IsNumeric(), and IsDate(). Use these to test a string’s contents before attempting conversions.

Discussion

The following code demonstrates the use of these three functions with data set to Nothing:

	Dim theData As String = Nothing
	Dim result As New System.Text.StringBuilder

	' ----- Format nothing.
	result.AppendLine(String.Format( _
	   "IsNumeric({0}) … {1}", theData, IsNumeric(theData)))
	result.AppendLine(String.Format( _
	   "IsDate({0}) … {1}", theData, IsDate(theData)))
	result.AppendLine(String.Format( _
	   "IsNothing({0}) … {1}", theData, IsNothing(theData)))
	result.AppendLine()

String variables are initially undefined, holding the value Nothing until you assign something else. We specifically assigned theData the value Nothing in the above code, but if we had left it unassigned Visual Studio would have questioned our motives and marked the first use of theData with a warning, as shown in Figure 5-24. As you can see, the unassigned string variable has squiggly lines under it, indicating a problem; hovering the mouse pointer over it causes the displayed explanation to pop up. This is a nonfatal warning, and the program will still run.

Figure 5-24. Visual Studio warns you if you attempt to use a string that has no data assigned to it

As shown in the first three lines of output displayed in Figure 5-25 (below), in this case the IsNumeric() and IsDate() functions verify that the string does not represent a valid number or date, but it does pass the IsNothing() test, as expected.

Next, the string is assigned a value that represents a valid number:

	' ----- Format a number in a string.
	theData = "-12.345"
	result.AppendLine(String.Format( _
	   "IsNumeric({0}) … {1}", theData, IsNumeric(theData)))
	result.AppendLine(String.Format( _
	   "IsDate({0}) … {1}", theData, IsDate(theData)))
	result.AppendLine(String.Format( _
	   "IsNothing({0}) … {1}", theData, IsNothing(theData)))
	result.AppendLine()

When the three tests are repeated, they match expectations. As shown in the middle three lines of output in Figure 5-25, the IsNumeric() test now returns True, and the IsDate() and IsNothing() tests return False.

Finally, the string is assigned a valid date, and the three tests are repeated for the last time:

	' ----- Format a date in a string.
	theData = "July 17, 2007"
	result.AppendLine(String.Format( _
	   "IsNumeric({0}) … {1}", theData, IsNumeric(theData)))
	result.AppendLine(String.Format( _
	   "IsDate({0}) … {1}", theData, IsDate(theData)))
	result.Append(String.Format( _
	   "IsNothing({0}) … {1}", theData, IsNothing(theData)))

	MsgBox(result.ToString())

In this last case the IsDate() function returns True, and the other two tests return False, as shown in the last three lines of output in Figure 5-25.
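One way these tests might be used to guard a conversion is sketched below. ToDoubleOrDefault() is a hypothetical helper, not part of the sample code:

```vb
Imports Microsoft.VisualBasic

Module SafeConvertDemo
    ' Hypothetical helper: converts only when IsNumeric() approves,
    ' so no exception is needed to detect bad input.
    Public Function ToDoubleOrDefault(ByVal text As String, _
          ByVal fallback As Double) As Double
        If IsNumeric(text) Then Return CDbl(text)
        Return fallback
    End Function
End Module
```

A string such as "42" is converted, while "abc" or a string set to Nothing produces the fallback value instead.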

Figure 5-25. Results of testing a string’s contents

See Also

Recipes 5.24 and 5.25 show how to examine content for correct processing.

5.23. Converting Strings Between Encoding Systems

Problem

You need to convert string data to and from byte arrays using an encoding method matched to your data, environment, or culture.

Solution

Sample code folder: Chapter 05\Encoding

Use System.Text.Encoding shared functions to convert between strings and byte arrays, using UTF7, UTF8, Unicode, or UTF32 encoding, as appropriate.

Discussion

The following code starts with a sample string and then converts it to four byte arrays, one for each type of encoding. The length of each byte array will vary as a function of the encoding (to be explained in more detail later), so the Length property of each array is formatted into a StringBuilder for display at the end of the code. The four byte arrays are then converted back to Strings, using the same encoding in each case, and a quick check is made to verify that the resulting strings match the original:

	Dim quote As String = "The important thing is not to " & _
	   "stop questioning. --Albert Einstein"
	Dim result As New System.Text.StringBuilder

	' ----- Convert a string to various formats.
	Dim bytesUTF7 As Byte( ) = _
	   System.Text.Encoding.UTF7.GetBytes(quote)
	Dim bytesUTF8 As Byte( ) = _
	   System.Text.Encoding.UTF8.GetBytes(quote)
	Dim bytesUnicode As Byte( ) = _
	   System.Text.Encoding.Unicode.GetBytes(quote)
	Dim bytesUTF32 As Byte( ) = _
	   System.Text.Encoding.UTF32.GetBytes(quote)

	' ----- Show the converted results.
	result.Append("bytesUTF7.Length = ")
	result.AppendLine(bytesUTF7.Length.ToString( ))
	result.Append("bytesUTF8.Length = ")
	result.AppendLine(bytesUTF8.Length.ToString( ))
	result.Append("bytesUnicode.Length = ")
	result.AppendLine(bytesUnicode.Length.ToString( ))
	result.Append("bytesUTF32.Length = ")
	result.AppendLine(bytesUTF32.Length.ToString( ))

	' ----- Convert everything back to standard strings.
	Dim fromUTF7 As String = _
	   System.Text.Encoding.UTF7.GetString(bytesUTF7)
	Dim fromUTF8 As String = _
	   System.Text.Encoding.UTF8.GetString(bytesUTF8)
	Dim fromUnicode As String = _
	   System.Text.Encoding.Unicode.GetString(bytesUnicode)
	Dim fromUTF32 As String = _
	   System.Text.Encoding.UTF32.GetString(bytesUTF32)
	
	' ----- Check for conversion issues.
	If (fromUTF7 <> quote) Then _
	   Throw New Exception("UTF7 Conversion Error")
	If (fromUTF8 <> quote) Then _
	   Throw New Exception("UTF8 Conversion Error")
	If (fromUnicode <> quote) Then _
	   Throw New Exception("Unicode Conversion Error")
	If (fromUTF32 <> quote) Then _
	   Throw New Exception("UTF32 Conversion Error")
	
	MsgBox(result.ToString( ))

All strings in .NET are internally stored as two-byte Unicode characters. However, if each character of the string always falls within a known range of characters, the string can be converted to a one-byte-per-character byte array.

UTF7 encoding produces bytes that use only the lower seven bits, leaving the highest-order bit as zero in all cases. ASCII letters, digits, spaces, and the most common punctuation marks, which have values in the range 0 to 127 and cover the normal range of English-language displayable and printable characters, are each stored as a single byte. Characters outside that set are still supported, but they are written as longer escape sequences built from 7-bit characters.

UTF8 also stores each character with a value from 0 to 127 as a single byte. It differs from UTF7 in using all eight bits of each byte: any character with a value of 128 or above, including the accented letters and symbols once found in the various extended ASCII character sets, is stored as a sequence of two to four bytes, each with its high-order bit set.

Today’s computer systems now invariably use the international standard Unicode character set. The encoding that .NET calls Unicode (also known as UTF-16) stores most characters in two bytes. Standard ASCII characters still fall within the same 0 to 127 range in Unicode, so the second byte of each Unicode character in this range is set to zero. Other languages and cultures have character sets with Unicode integer values greater than 255, and Visual Basic strings handle them just fine.

UTF32 is not widely used, because it requires four bytes per character. However, even the two-byte Unicode encoding occasionally requires two sequential two-byte values (a surrogate pair) to represent a single character. UTF32 covers all possible characters in a simple four-bytes-per-character way, allowing internal processing simplifications. String data is rarely stored on external media as UTF32; it is typically converted to the four-byte form only for processing in memory.

For text that is mostly ASCII, UTF8 is a good choice: it needs only one byte for each ASCII character, yet it can still represent every Unicode character by using multibyte sequences. If squeezing bytes down to a minimum is not a mandate, Unicode is the safest bet.
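To see how the byte counts diverge once a string leaves the ASCII range, consider the following sketch. The ByteCounts() helper is hypothetical; it uses the GetByteCount() method of each encoding so that no byte arrays need to be built:

```vb
Imports System.Text

Module EncodingSizeDemo
    ' Hypothetical helper: returns the encoded size of a string, in
    ' bytes, for UTF8, Unicode, and UTF32 (in that order).
    Public Function ByteCounts(ByVal source As String) As Integer()
        Return New Integer() { _
           Encoding.UTF8.GetByteCount(source), _
           Encoding.Unicode.GetByteCount(source), _
           Encoding.UTF32.GetByteCount(source)}
    End Function
End Module
```

For the single ASCII character "A" the counts are 1, 2, and 4. For ChrW(&HE9), an accented "e", they are 2, 2, and 4, and for ChrW(&H20AC), the euro sign, they are 3, 2, and 4.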

See Also

Recipe 5.11 shows how to store standard string data as byte values.

5.24. Determining a Character’s Type

Problem

You want to determine if a character is a letter, a digit, whitespace, or any of several other types before processing it further. This can avoid unexpected exceptions, or prevent having to use an exception on purpose to help determine the type of a character.

Solution

Sample code folder: Chapter 05\CharType

Use one of the many type-testing shared methods of the Char object.

Discussion

The Char object includes several methods that let you determine if a character is part of a larger general category of characters, such as the set of digits. The following code shows many of these in operation while it creates a handy listing of the types of all characters in the ASCII range 0 to 127:

	Dim result As New System.Text.StringBuilder
	Dim counter As Integer
	Dim testChar As Char
	Dim testHex As String
	Dim soFar As Integer

	' ----- Scan through the first half of the ASCII chart.
	For counter = 0 To 127
	   ' ----- What character will we test this time?
	   testChar = Chr(counter)
	   testHex = "x" & Hex(counter)

	   If Char.IsLetter(testChar) Then _
	      result.AppendLine(testHex & " IsLetter")
	   If Char.IsControl(testChar) Then _
	      result.AppendLine(testHex & " IsControl")

	   If Char.IsDigit(testChar) Then _
	      result.AppendLine(testHex & " IsDigit")
	   If Char.IsLetterOrDigit(testChar) Then _
	      result.AppendLine(testHex & " IsLetterOrDigit")
	   If Char.IsLower(testChar) Then _
	      result.AppendLine(testHex & " IsLower")
	   If Char.IsNumber(testChar) Then _
	      result.AppendLine(testHex & " IsNumber")
	   If Char.IsPunctuation(testChar) Then _
	      result.AppendLine(testHex & " IsPunctuation")
	   If Char.IsSeparator(testChar) Then _
	      result.AppendLine(testHex & " IsSeparator")
	   If Char.IsSymbol(testChar) Then _
	      result.AppendLine(testHex & " IsSymbol")
	   If Char.IsUpper(testChar) Then _
	      result.AppendLine(testHex & " IsUpper")
	   If Char.IsWhiteSpace(testChar) Then _
	      result.AppendLine(testHex & " IsWhiteSpace")
	   
	   ' ----- Display results in blocks of 16 characters.
	   soFar += 1
	   If ((soFar Mod 16) = 0) Then
	      MsgBox(result.ToString( ))
	      result.Length = 0
	   End If
	Next counter

The message box displays the results for 16 characters at a time. Figure 5-26 shows the output displayed for the first set of characters, and Figure 5-27 shows the results for characters with hexadecimal values in the range of some of the ASCII digits and letters.

Figure 5-26. Characters with ASCII values 0 to 15 are mostly control characters

Figure 5-27. Characters in the range hexadecimal 30 to hexadecimal 3F are mostly digits, letters, and numbers

Note that many characters fall into several categories. For example, the “0” (zero) character with hexadecimal value 30 passes the test for IsDigit, IsLetterOrDigit, and IsNumber.
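The overlap between categories can be checked for a single character with a sketch like the following. The Categories() helper is hypothetical and deliberately runs only a handful of the Char tests:

```vb
Imports System.Collections.Generic

Module CharCategoryDemo
    ' Hypothetical helper: names the tests (from a small subset)
    ' that a character passes.
    Public Function Categories(ByVal testChar As Char) As String
        Dim passed As New List(Of String)

        If Char.IsLetter(testChar) Then passed.Add("IsLetter")
        If Char.IsDigit(testChar) Then passed.Add("IsDigit")
        If Char.IsLetterOrDigit(testChar) Then _
           passed.Add("IsLetterOrDigit")
        If Char.IsNumber(testChar) Then passed.Add("IsNumber")
        If Char.IsUpper(testChar) Then passed.Add("IsUpper")
        If Char.IsWhiteSpace(testChar) Then passed.Add("IsWhiteSpace")

        Return String.Join(", ", passed.ToArray())
    End Function
End Module
```

Categories("0"c) returns "IsDigit, IsLetterOrDigit, IsNumber", and Categories("A"c) returns "IsLetter, IsLetterOrDigit, IsUpper".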

See Also

Recipe 5.22 includes examples of verifying logical data within strings, instead of the individual characters.

5.25. Parsing Strings

Problem

You want to convert string data to several types of numeric or date/time variables in a consistent way.

Solution

Sample code folder: Chapter 05\ParseString

Use the Parse() method provided by all types of variables in Visual Basic 2005.

Discussion

The Parse() method is the counterpart to each object’s ToString() method. That is, the string created by calling an object’s ToString() method will always be in a format suitable for converting back to the same type of object using its Parse() method. A few examples can help clarify this:

	Dim doubleParse As Double = Double.Parse("3.1416")
	Dim ushortParse As UShort = UShort.Parse("65533")
	Dim dateParse As Date = Date.Parse("December 25, 2007")

	MsgBox(String.Format( _
	   "doubleParse: {0}{3}ushortParse: {1}{3}dateParse: {2}", _
	   doubleParse, ushortParse, dateParse, vbNewLine))

As shown in Figure 5-28, the data items are stored in the variables as expected when they are parsed.

Figure 5-28. Converting string data to numeric and date/time formats

In many cases, you might want to first check the string to make sure it can be parsed to the desired type of variable before making any attempt to do so. For example, use the IsDate() function to test a string to make sure it can be converted successfully before calling a Date variable’s Parse() method to parse the date from the string. If the string is not convertible to the indicated data type, an exception will occur.
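That test-then-parse pattern might look like the following sketch. ParseDateOrDefault() is a hypothetical helper; both IsDate() and Date.Parse() interpret the text according to the current culture:

```vb
Imports Microsoft.VisualBasic

Module SafeParseDemo
    ' Hypothetical helper: parses the text only after IsDate()
    ' confirms that it looks like a date.
    Public Function ParseDateOrDefault(ByVal text As String, _
          ByVal fallback As Date) As Date
        If IsDate(text) Then Return Date.Parse(text)
        Return fallback
    End Function
End Module
```

Text that cannot be read as a date, such as "not a date", yields the fallback value rather than an exception.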

See Also

Recipe 5.22 discusses additional content-verification methods.

5.26. Concatenating Strings

Problem

You want to concatenate strings quickly and efficiently.

Solution

Sample code folder: Chapter 05\Concatenate

Use the &= concatenation shortcut, or, even better, use a StringBuilder.

Discussion

Visual Basic 2005 offers a few tricks for working with strings more efficiently. The following code presents several helpful techniques, from least to most efficient.

This approach simply concatenates two words and assigns the resulting string to a string variable:

	Dim quote As String
	quote = "The " & "important "

This is how additional string data was always concatenated to the end of a string in VB 6 and earlier versions of the BASIC language:

	quote = quote & "thing "

Because .NET strings are immutable, this code copies the current contents of quote to a new location in memory, then copies the short string "thing " to its tail end, and finally assigns the address of the resulting string to the quote variable, marking the previous contents of quote for garbage collection. By the time you’ve repeated this type of command a few times to concatenate more strings to the tail end of quote, a lot of bytes have gotten shuffled in memory.
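The effect of immutability can be observed directly with a second variable that refers to the original string. This is only an illustrative sketch; the module and function names are made up:

```vb
Module ImmutableDemo
    ' Returns True when concatenation leaves the original string
    ' untouched and produces a separate string object.
    Public Function AppendLeavesOriginal() As Boolean
        Dim quote As String = "The important "
        Dim original As String = quote   ' Same object as quote, for now.

        ' ----- Builds a brand-new string and points quote at it.
        quote = quote & "thing "

        Return (original = "The important ") AndAlso _
           (quote = "The important thing ") AndAlso _
           (original IsNot quote)
    End Function
End Module
```

The function returns True: the variable named original still sees the unmodified text, because the concatenation never altered the first string in place.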

This newer technique, available in Visual Basic 2005, provides an improved syntax, although timing tests seem to indicate that a lot of string data is still being shuffled in memory:

	quote &= "is not to stop questioning. "
	quote &= "--Albert Einstein"

The StringBuilder is by far the better way to proceed when concatenating many strings end to end, and you’ll find a lot of examples of its use in this book. As shown here, you can run the Append() method on the results of another Append(), which may or may not make it easier to read the code:

	Dim result As New _
	   System.Text.StringBuilder("The important thing ")
	result.Append("is questioning. ")
	result.Append("--").Append("Albert ").Append("Einstein")

As explained in Recipe 5.1, the StringBuilder maintains an internal buffer of characters, not a true string, and the buffer grows by doubling in size whenever room runs out during an Append() operation. String data is concatenated in place in memory, which keeps the total clock cycles for concatenation way down compared to standard string techniques.
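When all the pieces are known in advance, you can sidestep the buffer doubling by telling the StringBuilder how much room it needs up front. The sketch below is one way to do that; JoinPieces() is a hypothetical helper, not part of the recipe's sample code, and it assumes the array contains no Nothing entries:

```vbnet
Imports System.Text

Public Module ConcatSketch
    ' Joins the pieces end to end using a StringBuilder whose
    ' capacity is set once, so the buffer never has to grow.
    Public Function JoinPieces(ByVal pieces() As String) As String
        ' ----- First pass: add up the space required.
        Dim total As Integer = 0
        For Each piece As String In pieces
            total += piece.Length
        Next piece

        ' ----- Second pass: append into the presized buffer.
        Dim result As New StringBuilder(total)
        For Each piece As String In pieces
            result.Append(piece)
        Next piece
        Return result.ToString()
    End Function
End Module
```

Whether the extra pass pays for itself depends on how many pieces there are; for a handful of short strings the difference is unlikely to be measurable.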

Just to round things out, these last few lines show some of the additional commands available when working with a StringBuilder:

	result.Insert(23, "note to stop ")
	result.Replace("note", "not")
	result.Insert(0, quote & vbNewLine)
	
	MsgBox(result.ToString())

These lines complete the building of the string data displayed by the message box shown in Figure 5-29. The two strings demonstrate that identical results are obtained even after we’ve manipulated the StringBuilder’s contents.

Figure 5-29. The string built up using a StringBuilder

See Also

Recipe 5.1 and Recipe 5.27 discuss the StringBuilder class in more detail.

5.27. Speeding Up String Manipulation

Problem

You want to see a timing-test-based example that shows just how much faster a StringBuilder can be than standard string concatenation.

Solution

Sample code folder: Chapter 05\StringTime

Create a short routine to concatenate the string values of the numbers 1 to 15,000 (the loopCount value used in the sample code), first using direct concatenation to a string variable and then using a StringBuilder. Use Date variables to calculate elapsed time for each loop in milliseconds, and display the results of each for comparison.

Discussion

Here’s the code for doing the timing test. The two contestants are ready for the race. content is a conventional immutable string, and result is the highly acclaimed StringBuilder challenger:

	Dim content As String = ""
	Dim result As New System.Text.StringBuilder

The supporting cast of characters is ready to rally to the cause. Here, counter is a loop counter, dateTime1 through dateTime3 are Date variables to hold instants in time, and loopCount provides the number of laps for the race:

	Dim counter As Integer
	Dim dateTime1 As Date
	Dim dateTime2 As Date
	Dim dateTime3 As Date
	Dim loopCount As Integer = 15000

The flag is waved to start the race, and the starting time is noted very accurately:

	Me.Cursor = Cursors.WaitCursor
	dateTime1 = Now

The first contestant runs all the loops, concatenating the string representations of the numbers for each lap into one big string named content. The time of completion is carefully noted:

	For counter = 1 To loopCount
	   content &= counter.ToString()
	Next counter
	dateTime2 = Now

The StringBuilder now runs the same laps, appending the same strings in its internal buffer. The time at completion is accurately noted:

	For counter = 1 To loopCount
	   result.Append(counter.ToString())
	Next counter
	dateTime3 = Now

The flag drops, signaling the crossing of the finish line for both contestants:

	Me.Cursor = Cursors.Default

In a moment, the results of the race appear:

	content = String.Format( _
	   "First loop took {0:G4} ms, the second took {1:G4} ms.", _
	   dateTime2.Subtract(dateTime1).TotalMilliseconds, _
	   dateTime3.Subtract(dateTime2).TotalMilliseconds)
	MsgBox(content)

The results are shown in the message box displayed in Figure 5-30. Due to differences between systems, your results may vary.

Figure 5-30. The StringBuilder is the clear winner of this race

To be fair, this race was highly contrived to help point out the difference in operational speed between string concatenation and StringBuilder appending. If you create a loop in which the same strings are used each time, the timing is much more equal. This is because Visual Basic handles immutable strings very intelligently, reusing existing strings whenever possible and hence speeding up repetitive operations involving the same data. The test shown here creates a unique string for each concatenation by converting the loop index number to a string, forcing a lot of extra string creation and storage in memory during the loops.

When running this test yourself, you might need to adjust the value of loopCount for your system. If the race seems to take too long, stop the program manually and adjust loopCount to a value a few thousand lower; if the race is too fast, resulting in an apparent elapsed time of 0 ms for the StringBuilder (the system clock behind Now advances only in coarse steps), bump up loopCount by a few thousand, and try again.
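If you need finer timing than Date values offer, the System.Diagnostics.Stopwatch class (new in .NET Framework 2.0) is designed for measuring elapsed time. The following sketch repeats the same two loops with a Stopwatch; the module and function names are invented for this illustration:

```vbnet
Imports System.Diagnostics
Imports System.Text

Public Module TimingSketch
    ' Milliseconds needed to concatenate 1..loopCount with &=.
    Public Function TimeConcat(ByVal loopCount As Integer) As Long
        Dim content As String = ""
        Dim watch As Stopwatch = Stopwatch.StartNew()
        For counter As Integer = 1 To loopCount
            content &= counter.ToString()
        Next counter
        watch.Stop()
        Return watch.ElapsedMilliseconds
    End Function

    ' Milliseconds needed to append 1..loopCount to a StringBuilder.
    Public Function TimeBuilder(ByVal loopCount As Integer) As Long
        Dim result As New StringBuilder
        Dim watch As Stopwatch = Stopwatch.StartNew()
        For counter As Integer = 1 To loopCount
            result.Append(counter.ToString())
        Next counter
        watch.Stop()
        Return watch.ElapsedMilliseconds
    End Function
End Module
```

Calling TimeConcat(15000) and TimeBuilder(15000) should show the same general gap as the Date-based test, though the exact numbers will differ from system to system.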

See Also

Recipe 5.1 and Recipe 5.26 provide additional discussion of strings and StringBuilder instances.

5.28. Counting Occurrences of a Substring

Problem

You need to count occurrences of a specific word or substring in a string.

Solution

Sample code folder: Chapter 05\CountSubstring

There are three standard approaches to this problem:

  • Use the regular expression object (System.Text.RegularExpressions.Regex) to provide a count of the number of matches on the string.

  • Use the Split() function to split the string using the specific substring as a split point, then use the length of the resulting string array to determine the count.

  • Loop through the string using the IndexOf() method to find all occurrences of the substring.

Discussion

This recipe’s sample code presents all three techniques. You can decide, based on your specific programming task, which will work best for you. Here’s the setup:

	Imports System.Text.RegularExpressions

	' …Later, in a method…
	
	Dim quote As String = "The important thing is not to " & _
	   "stop questioning. --Albert Einstein"
	Dim count1 As Integer
	Dim count2 As Integer
	Dim count3 As Integer

With the first technique, the Regex.Matches() method returns a collection of matches on the searched-for string, and the collection’s Count property provides the number we want:

	count1 = Regex.Matches(quote, "(in)+").Count

The second technique splits the string using the searched-for string as the split point. The result of the split is a string array, and its Length is one greater than the number of split points where each substring occurred:

	count2 = Split(quote, "in").Length - 1

The third technique involves a little more coding, but no string data is shuffled in memory during the search, resulting in an efficient way to locate and count each occurrence of the searched-for string. The IndexOf() method searches for the next occurrence of a string within another, optionally starting the search at an indexed location within the string:

	Dim content As String = "in"
	Dim position As Integer = -content.Length
	Do
	   position = quote.IndexOf(content, position + content.Length)
	   If (position < 0) Then Exit Do
	   count3 += 1
	Loop

This lets the search proceed from occurrence to occurrence until IndexOf() runs out of matches and returns an index of -1. count3 keeps count of the number of times the IndexOf() search is successful, providing a count of the occurrences.

The last line of the example code formats and displays the three counts, as shown in Figure 5-31:

	MsgBox(String.Format( _
	   "{0}{3}{1}{3}{2}", count1, count2, count3, vbNewLine))

Figure 5-31. The substring “in” occurs four times in the sample string
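The IndexOf() loop is easy to wrap in a reusable function. The sketch below is one possible packaging, not the recipe's own code: CountOccurrences() and its allowOverlap parameter are hypothetical, and the choice of an ordinal (case-sensitive, culture-independent) comparison is an assumption you may want to change:

```vbnet
Public Module SubstringCounter
    ' Counts occurrences of target within source. When allowOverlap
    ' is True, "aa" is found twice in "aaa"; when False, once.
    Public Function CountOccurrences(ByVal source As String, _
          ByVal target As String, _
          ByVal allowOverlap As Boolean) As Integer
        Dim count As Integer = 0
        If String.IsNullOrEmpty(source) Then Return 0
        If String.IsNullOrEmpty(target) Then Return 0

        ' ----- How far to advance after each match.
        Dim stepSize As Integer = target.Length
        If allowOverlap Then stepSize = 1

        Dim position As Integer = _
            source.IndexOf(target, StringComparison.Ordinal)
        Do While position >= 0
            count += 1
            position += stepSize
            ' ----- Stop when no room is left for another match.
            If position > source.Length - target.Length Then Exit Do
            position = source.IndexOf(target, position, _
                StringComparison.Ordinal)
        Loop
        Return count
    End Function
End Module
```

With the recipe's sample quote, CountOccurrences(quote, "in", False) agrees with the three counts shown above.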

5.29. Padding a String for Exact Length and Alignment

Problem

You want to pad a string with spaces (or some other character) either on the head end, the tail end, or both ends, such that the resulting string is n characters in total length.

Solution

Sample code folder: Chapter 05\PadString

Use the String.PadLeft() and String.PadRight() methods to pad the head and tail ends of the string, respectively, and use a calculated combination of these two methods to pad the string on both ends.

Discussion

The PadLeft() and PadRight() methods take a count value that defines the target length of the string after sufficient spaces are concatenated to it. An optional second parameter provides a character to use for the padding if you want something other than spaces to be used. In the first block of code the default space characters are used for the padding:

	Dim content1 As String
	Dim content2 As String
	Dim content3 As String
	Dim content4 As String
	content1 = "Not padded"
	content2 = "PadLeft".PadLeft(50)
	content3 = "PadRight".PadRight(50)
	content4 = "PadCenter"
	content4 = content4.PadLeft((50 + _
	   content4.Length) \ 2).PadRight(50)
	MsgBox(String.Format("{0}{4}{1}{4}{2}{4}{3}", _
	   content1, content2, content3, content4, vbNewLine))

The “PadCenter” calculation adds half of the required padding characters to the head end of the string, then pads out the right end to the target length. The PadLeft() method is applied to the string first, and the PadRight() method is applied to the result, all in a single line. Figure 5-32 shows the strings with the padding causing the text to align to the left, right, and middle, depending on where the padding was applied.

Figure 5-32. Padding strings with spaces at the head, the tail, or both ends

Padding with spaces is often what you want to do in a real-world application, but for display purposes it isn’t very helpful. In Figure 5-32, for instance, you can’t tell that “PadRight” has been padded out to 50 characters with trailing spaces. Therefore, let’s recode this example, padding the strings with periods instead:

	content1 = "Not padded"
	content2 = "PadLeft".PadLeft(50, "."c)
	content3 = "PadRight".PadRight(50, "."c)
	content4 = "PadCenter"
	content4 = content4.PadLeft((50 + content4.Length) \ 2, _
	   "."c).PadRight(50, "."c)
	MsgBox(String.Format("{0}{4}{1}{4}{2}{4}{3}", _
	   content1, content2, content3, content4, vbNewLine))

In this case, the same padding takes place, but with a period for the padding character. Figure 5-33 shows the result, which is more meaningful than Figure 5-32.

Figure 5-33. The same padding as before, but using periods for padding instead of spaces
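If you center strings often, the two-step calculation can be moved into a helper. The .NET String class has no built-in centering method, so the PadCenter() function below is a hypothetical addition, sketched under the assumption that a string already at or beyond the target width should be returned unchanged:

```vbnet
Public Module PaddingSketch
    ' Centers source within totalWidth characters, padding both
    ' ends with padChar. Any odd leftover character goes on the right.
    Public Function PadCenter(ByVal source As String, _
          ByVal totalWidth As Integer, _
          ByVal padChar As Char) As String
        If source.Length >= totalWidth Then Return source

        ' ----- Integer division (\) keeps the width a whole number.
        Dim leftWidth As Integer = (totalWidth + source.Length) \ 2
        Return source.PadLeft(leftWidth, padChar).PadRight( _
            totalWidth, padChar)
    End Function
End Module
```

For example, PadCenter("abc", 7, "."c) pads the string to "..abc..".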

5.30. Converting Tabs to Spaces

Problem

You need to convert a string’s tab characters to spaces while preserving the string’s spacing.

Solution

Sample code folder: Chapter 05\TabsToSpaces

Create a function to convert tabs to spaces in the defined way:

	Public Function TabsToSpaces(ByVal source As String, _
	      ByVal tabSize As Integer) As String
	   ' ----- Replace tabs with space characters.
	   Dim result As New System.Text.StringBuilder
	   Dim counter As Integer

	   For counter = 0 To source.Length - 1
	      If (source.Chars(counter) = vbTab) Then
	         Do
	            result.Append(Space(1))
	         Loop Until ((result.Length Mod tabSize) = 0)
	      Else
	         result.Append(source.Chars(counter))
	      End If
	   Next counter
	   Return result.ToString( )
	End Function

Discussion

The trick to replacing the tabs is to insert just the right number of spaces to preserve the original alignment of the text. Tab characters generally shift the next character to a position that is an exact multiple of the tab spacing. In Visual Studio, this spacing constant is often 4, but in many text editors, and even in the Windows Forms TextBox control, the standard tab spacing is 8. The sample function accepts an argument to set the tab-spacing constant to any value.

The function uses a StringBuilder to rebuild the original string, replacing tabs with enough spaces to maintain the alignment. The Chars property of the string makes it easy to access and process each individual character from the string, and the Mod operator simplifies the math checks required to determine the number of spaces to insert.
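Because TabsToSpaces() measures the column from the total length of the output, it lines up correctly only for single-line text. If your strings can contain line breaks, a variation along the following lines tracks the column separately and resets it at each new line. This is a sketch rather than part of the recipe's sample code; the function name is invented, and tabSize is assumed to be greater than zero:

```vbnet
Imports System.Text

Public Module TabSketch
    Public Function TabsToSpacesMultiline(ByVal source As String, _
          ByVal tabSize As Integer) As String
        Dim result As New StringBuilder(source.Length)
        Dim column As Integer = 0

        For Each ch As Char In source
            If ch = ControlChars.Tab Then
                ' ----- Pad to the next multiple of tabSize.
                Dim padCount As Integer = tabSize - (column Mod tabSize)
                result.Append(" "c, padCount)
                column += padCount
            ElseIf ch = ControlChars.Cr OrElse ch = ControlChars.Lf Then
                ' ----- A line break restarts the column count.
                result.Append(ch)
                column = 0
            Else
                result.Append(ch)
                column += 1
            End If
        Next ch
        Return result.ToString()
    End Function
End Module
```

For single-line input, this version should produce the same output as the original function.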

This code shows the TabsToSpaces() function in use:

	Dim tabs As String = _
	   "This~is~~a~tabbed~~~string".Replace("~"c, vbTab)
	Dim spaces As String = TabsToSpaces(tabs, 8)
	Dim periods As String = spaces.Replace(" "c, "."c)

The first line builds a string comprised of words separated by multiple tab characters. The tilde (~) characters provide a visual way to see where the tabs will go, and the Replace() method replaces each tilde with a tab.

The second statement calls the new function and places the returned string in spaces. This string contains no tab characters, but it does contain many spaces between the words.

The periods string provides a visual way to see the spaces more clearly. The Replace() method in this case replaces each space with a period.

Figure 5-34 shows these three strings displayed on a form containing three TextBox controls. Setting the Font property to Courier New, a fixed-width font, more clearly shows the alignment of the characters in the strings. The tab-spacing constant in these text boxes is 8, which is the value passed to TabsToSpaces(), correctly replacing the tabs and maintaining the original alignment.

Figure 5-34. The same string with tabs, spaces instead of tabs, and periods instead of spaces

See Also

Recipe 5.16 also discusses replacing substrings.

5.31. Reversing a String

Problem

You want to reverse, or mirror image, the order of the characters in a string.

Solution

Use the StrReverse() function.

Discussion

The StrReverse() function makes reversing a string simple:

	Dim quote As String = "The important thing is not to " & _
	   "stop questioning. --Albert Einstein"
	Dim reversed As String = StrReverse(quote)
	MsgBox(reversed)

Figure 5-35 shows the reversed string as displayed in the message box.

Figure 5-35. The sample string reversed

Another way to reverse a string is to process the characters yourself. This sample code scans through the string in reverse order and appends each found character to a new StringBuilder instance:

	Dim quote As String = "The important thing is not to " & _
	   "stop questioning. --Albert Einstein"
	
	Dim counter As Integer
	Dim result As New System.Text.StringBuilder(quote.Length)

	For counter = quote.Length - 1 To 0 Step -1
	   result.Append(quote.Chars(counter))
	Next counter
	
	Dim reversed As String = result.ToString()
	MsgBox(reversed)

The overloaded constructor for the StringBuilder accepts an optional parameter defining the capacity the StringBuilder should use for its internal character buffer. Since we know the reversed string will be the same length as the original, the capacity can be set to exactly the amount needed. This prevents the StringBuilder from having to double its capacity when it runs low on space while appending characters (see Recipe 5.1). Using the Chars property of the string to grab characters and setting the initial capacity of the StringBuilder in this way ensures that the character bytes are transferred in memory just once in a tight, efficient loop.
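A third option, sketched below, is to let the framework do the swapping: copy the string into a character array, reverse the array in place with Array.Reverse(), and build a new string from it. The module and function names are made up for this example. Like the loop above, it reverses individual Char values, so any character stored as a surrogate pair would end up with its two halves swapped:

```vbnet
Public Module ReverseSketch
    ' Reverses the order of the Char values in source.
    Public Function ReverseChars(ByVal source As String) As String
        Dim chars() As Char = source.ToCharArray()
        Array.Reverse(chars)
        Return New String(chars)
    End Function
End Module
```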

5.32. Shuffling a String

Problem

You want to shuffle the order of the characters in a string quickly but thoroughly.

Solution

Sample code folder: Chapter 05\StringShuffle

The best technique is to loop through each character location once, swapping the character at that location with a character at a random location anywhere in the string.

Discussion

The basic algorithm for shuffling a string, as presented here, is also good for shuffling arrays or any other ordered data. It makes a single pass over the data, so its running time grows in step with the length of the string, and the quality of the results depends on the random number generator used.

A walk through the code explains the process clearly. These lines declare the variables required and initialize the random number generator to a unique sequence, using the system clock for the random number generator’s seed:

	Dim counter As Integer
	Dim position As Integer
	Dim holdChar As Char
	Dim jumbleMethod As New Random
	Dim quote As String = "The important thing is not to " & _
	   "stop questioning. --Albert Einstein"

To manipulate the individual characters of the string, it’s best to convert the string to a character array:

	Dim chars() As Char = CType(quote, Char())

This allows for swapping the characters in memory without having to make multiple copies of immutable strings. You can directly access a string’s individual characters using the string’s Chars property, but this property is read-only. In this case, we need to store new characters into the string’s locations during each swap.

The following loop is the core of the shuffling algorithm:

	For counter = 0 To chars.Length - 1
	   position = jumbleMethod.Next Mod chars.Length
	   holdChar = chars(counter)
	   chars(counter) = chars(position)
	   chars(position) = holdChar
	Next counter

Each character is sequentially processed by swapping it with another character located randomly at any position in the string. This means that a character might even get swapped with itself occasionally, but that does not reduce the randomness of the results. This loop guarantees that each character gets swapped at least once, but statistically speaking each character gets swapped twice, on average.

The last two lines convert the character array back to a string and then display the result in a message box, as shown in Figure 5-36:

	Dim result As String = New String(chars)
	MsgBox(result)

Figure 5-36. The shuffled string

The sample string will be shuffled into a different random order nearly every time the sample code is run.
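The recipe's loop picks each swap partner from the whole string, which is known to make some orderings slightly more likely than others. A common refinement, the Fisher-Yates shuffle, picks the partner only from the positions not yet finalized, which removes that bias. The following sketch swaps in that technique; ShuffleString() is a hypothetical helper, and it accepts the Random instance as a parameter so that callers can supply a seeded generator for repeatable results:

```vbnet
Public Module ShuffleSketch
    ' Fisher-Yates shuffle of the characters in source.
    Public Function ShuffleString(ByVal source As String, _
          ByVal generator As Random) As String
        Dim chars() As Char = source.ToCharArray()

        For counter As Integer = chars.Length - 1 To 1 Step -1
            ' ----- Pick a partner from positions 0 through counter.
            Dim position As Integer = generator.Next(counter + 1)
            Dim holdChar As Char = chars(counter)
            chars(counter) = chars(position)
            chars(position) = holdChar
        Next counter
        Return New String(chars)
    End Function
End Module
```

Call it as ShuffleString(quote, New Random) for a clock-seeded shuffle like the one in the recipe.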

See Also

Recipes 6.27 and 8.5 show additional uses of random numbers.

5.33. Using a Simple String Encryption

Problem

You want to encrypt a string using a key. The encrypted result should be a displayable and printable string of standard ASCII characters.

Solution

Sample code folder: Chapter 05\EncryptString

The following short class defines a SimpleCrypt object containing shared functions for encrypting and decrypting a string. In addition to the string to be encrypted or decrypted, an integer is passed to each function to serve as the key:

	Public Class SimpleCrypt
	   Public Shared Function Encrypt(ByVal source As String, _
	         ByVal theKey As Integer) As String
	      ' ----- Encrypt a string.
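	      Dim counter As Integer
	      Dim jumbleMethod As New Random(theKey)
	      Dim sourceBytes() As Byte = _
	         System.Text.Encoding.UTF8.GetBytes(source)
	      ' ----- Size the key by byte count, since non-ASCII
	      '       characters encode to more than one UTF-8 byte.
	      Dim keySet(sourceBytes.Length - 1) As Byte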

	      jumbleMethod.NextBytes(keySet)
	      For counter = 0 To sourceBytes.Length - 1
	         sourceBytes(counter) = _
	            sourceBytes(counter) Xor keySet(counter)
	      Next counter

	      Return Convert.ToBase64String(sourceBytes)
	   End Function

	   Public Shared Function Decrypt(ByVal source As String, _
	         ByVal theKey As Integer) As String
	      ' ----- Decrypt a previously encrypted string.
	      Dim counter As Integer
	      Dim jumbleMethod As New Random(theKey)
	      Dim sourceBytes() As Byte = _
	         Convert.FromBase64String(source)
	      Dim keySet(sourceBytes.Length - 1) As Byte

	      jumbleMethod.NextBytes(keySet)
	      For counter = 0 To sourceBytes.Length - 1
	         sourceBytes(counter) = _
	            sourceBytes(counter) Xor keySet(counter)
	      Next counter

	      Return System.Text.Encoding.UTF8.GetString(sourceBytes)
	   End Function
	End Class

Discussion

The following code calls the shared functions of the SimpleCrypt class to encrypt a sample string using a key integer value of 123456789, and then decrypts the results using the same key:

	Dim quote As String = "The important thing is not to " & _
	   "stop questioning. --Albert Einstein"
	
	Dim myKey As Integer = 123456789
	Dim encrypted As String = SimpleCrypt.Encrypt(quote, myKey)
	Dim decrypted As String = _
	   SimpleCrypt.Decrypt(encrypted, myKey)
	MsgBox(quote & vbNewLine & encrypted & vbNewLine & decrypted)

The encryption function first converts the string to a byte array using UTF8 encoding. Each byte is then Xor’d with a predictable sequence of pseudorandom bytes seeded using the given key integer. Since the resulting bytes likely include values in the range of control and nonprintable characters, the byte array is then converted to a slightly longer Base64 string composed of displayable characters.

The decryption function reverses the order of these same steps. First, the Base64 string is converted to a byte array, and the same set of pseudorandom bytes is Xor’d with these bytes to recover the bytes of the original string. Figure 5-37 shows the original string, the encrypted version of this string using a key value of 123456789, and the string that results by decrypting this Base64 string using the same key. As expected, the original string is restored.

Figure 5-37. Encrypting and decrypting a string using a key integer

The Random object can return an array of pseudorandom bytes with any desired length. This lets the code generate the required number of bytes used in the Xor process with only one call to the Random object.

The supplied key is any integer value from 0 to the maximum value for signed integers, which is 2,147,483,647. You can use a negative integer, but the Random class will automatically take its absolute value as the seed.

With over two billion unique seeds, the average user won’t be able to break this simple encryption easily. For quick, simple, relatively secure encryption for typical users, this class can serve you well. However, in cryptographic circles this level of encryption is considered dangerously poor, so be sure to check out Chapter 16 if you need to use something more serious and well tested by the cryptographic community.

See Also

See Chapter 16 for more encryption topics.

5.34. Converting a String to Morse Code

Problem

You want to convert a text string to Morse code characters.

Solution

Sample code folder: Chapter 05\MorseCode

Use the IndexOf() string method to look up and cross-reference characters to string array entries representing each Morse code character.

Discussion

The following code converts the string “Hello world!” to a string that displays the Morse code “dahs” and “dits” for each character:

	Dim source As String = "Hello world!"
	Dim characters As String = _
	   "~ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.,:?'-/"""
	Dim morse( ) As String = { _
	"?", ".-", "-...", "-.-.", "-..", ".", "..-.", "--.", "....", _
	"..", ".---", "-.-", ".-..", "--", "-.", "---", ".--.", _
	"--.-", ".-.", "...", "-", "..-", "...-", ".--", "-..-", _
	"-.--", "--..", "-----", ".----", "..---", "...--", _
	"....-", ".....", "-....", "--...", "---..", "----.", _
	".-.-.-", "--..--", "---...", "..--..", ".----.", _
	"-....-", "-..-.", ".-..-."}

	Dim result As New System.Text.StringBuilder
	Dim counter As Integer
	Dim position As Integer
	
	For counter = 0 To source.Length - 1
	   position = characters.IndexOf(Char.ToUpper( _
	      source.Chars(counter)))
	   If (position < 0) Then position = 0
	   result.Append(source.Substring(counter, 1))
	   result.Append(Space(5))
	   result.AppendLine(morse(position))
	Next counter

	MsgBox(result.ToString( ))

For most people this code is not all that useful, but there are some interesting details to be learned from this example. For instance, the second statement assigns the standard set of characters covered by Morse code to a string named characters. Notice that at the tail end of this string there are three quote characters in a row. The last one terminates the string, as expected, and the pair just before the last one demonstrates how to enter a double-quote character into a string. By doubling up the quote character, you tell the Visual Basic compiler to enter one double-quote character and not to terminate the string.

At the head of the characters string is a tilde (~) character. This is not a Morse code character, but it provides a way to catch all characters in the string to be converted that aren’t found in the set of Morse code characters. For example, in the test string “Hello world!” there’s an exclamation point, which is not defined in the table of International Morse code characters. When the IndexOf() method attempts to find this exclamation point in characters, a value of -1 is returned. This value is changed to zero, which indexes to the question-mark entry in the morse() string array. Figure 5-38 shows how the sample string ends up with a question mark instead of the unavailable exclamation point.

Figure 5-38. The Morse code equivalent of the standard “Hello World!” string

5.35. Adding Strings to an Application’s Resources

Problem

You need to store and edit strings in an application’s resources. This makes it easy to internationalize the application by changing the strings for each culture.

Solution

To edit the resource strings in the Visual Studio environment, open the project’s properties page, and select the Resources tab on the left. Edit the table of string entries, changing the Name, Value, and Comment fields as required.

In the application, refer to each string through the My.Resources object.

Discussion

In Visual Studio, it’s very easy to maintain a table of strings in the application’s resources. Figure 5-39 shows the project’s properties page with the Resources tab selected along the left side.

Figure 5-39. Editing resource strings in Visual Studio

The example shows two resource strings, one named Caption and the other named Text. As the following code shows, in the application these two strings are referenced by name through the My.Resources object. This code then displays a message box using the two strings from the resources, as shown in Figure 5-40:

	Dim stringText As String = My.Resources.Text
	Dim stringCaption As String = My.Resources.Caption
	MsgBox(stringText, , stringCaption)

Figure 5-40. The results of editing the message box’s Caption and Text properties

Other types of resources can be added, such as images, sounds, and other files. Each of these resources is accessed in the application through the My.Resources object.

See Also

See Chapter 10 for an example of storing and using media files in your application’s resources.

5.36. Converting Any Data to a String

Problem

You have an instance of data and want to convert it to its default string representation.

Solution

Sample code folder: Chapter 05\UseToString

Use the ToString() method, which is included in all .NET objects, to return a general string for an object instance. To get you started, the following code demonstrates the default ToString() method on several types of variables:

	Dim someInt As Integer = 123
	Dim someDouble As Double = Math.PI
	Dim someString As String = "Testing"
	Dim someDate As Date = #7/4/1776 9:10:11 AM#
	Dim someDecimal As Decimal = 1D / 3D
	Dim result As New System.Text.StringBuilder

	result.Append("someInt.ToString ")
	result.AppendLine(someInt.ToString())

	result.Append("someDouble.ToString ")
	result.AppendLine(someDouble.ToString())

	result.Append("someString.ToString ")
	result.AppendLine(someString.ToString())

	result.Append("someDate.ToString ")
	result.AppendLine(someDate.ToString())

	result.Append("someDecimal.ToString ")
	result.Append(someDecimal.ToString())

	MsgBox(result.ToString())

Discussion

Figure 5-41 shows the results displayed by the sample code. Default formatting is used for all these ToString() methods.

The ToString() method is often overloaded to support a variety of formatting options, depending on the type of variable. This lets you convert doubles, for instance, to scientific or other formats. Check the Visual Studio online help resources for the ToString() method for each type of variable to discover the formatting options available.
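As a quick taste of those overloads, the sketch below passes a few standard and custom format strings to ToString(). The module name, function name, and sample values are chosen only for illustration, and the invariant culture is specified so that separators don't vary with regional settings:

```vbnet
Imports System.Globalization

Public Module FormatSketch
    ' Returns a few formatted samples as an array of strings.
    Public Function FormatSamples() As String()
        Dim inv As CultureInfo = CultureInfo.InvariantCulture
        Dim someDouble As Double = 1234.5678
        Dim someDate As Date = #7/4/1776 9:10:11 AM#

        Return New String() { _
            someDouble.ToString("F2", inv), _
            someDouble.ToString("E2", inv), _
            someDouble.ToString("N0", inv), _
            someDate.ToString("yyyy-MM-dd HH:mm:ss", inv)}
    End Function
End Module
```

Here "F2" requests fixed-point output with two decimals, "E2" scientific notation, and "N0" a whole number with thousands separators, while the date uses a custom pattern.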

All objects sport a ToString() method because all objects inherit it from System.Object. An example used repeatedly throughout this chapter is the StringBuilder class, which returns its internal character buffer converted to a string through its ToString() method.

Figure 5-41. Results of converting several variable types by using the ToString( ) method on each

As you create your own classes, consider adding both a ToString() method and a corresponding Parse() method if the object’s state can be represented as a string.
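To make that suggestion concrete, here is a minimal sketch of a class with a matching ToString() and Parse() pair. GridPoint is a hypothetical type invented for this example, and the "(x,y)" text format is simply one reasonable choice:

```vbnet
Imports System.Globalization

' Hypothetical example type; not part of the recipe's sample code.
Public Class GridPoint
    Public ReadOnly X As Integer
    Public ReadOnly Y As Integer

    Public Sub New(ByVal xValue As Integer, ByVal yValue As Integer)
        X = xValue
        Y = yValue
    End Sub

    ' ----- Represent the point as "(x,y)".
    Public Overrides Function ToString() As String
        Return String.Format(CultureInfo.InvariantCulture, _
            "({0},{1})", X, Y)
    End Function

    ' ----- Rebuild a point from text produced by ToString().
    Public Shared Function Parse(ByVal text As String) As GridPoint
        Dim trimmed As String = text.Trim()
        If Not (trimmed.StartsWith("(") AndAlso _
              trimmed.EndsWith(")")) Then
            Throw New FormatException("Expected the form (x,y).")
        End If

        Dim parts() As String = _
            trimmed.Substring(1, trimmed.Length - 2).Split(","c)
        If parts.Length <> 2 Then
            Throw New FormatException("Expected the form (x,y).")
        End If

        Return New GridPoint( _
            Integer.Parse(parts(0), CultureInfo.InvariantCulture), _
            Integer.Parse(parts(1), CultureInfo.InvariantCulture))
    End Function
End Class
```

The useful property to aim for is that GridPoint.Parse(point.ToString()) returns a point equal to the one you started with.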

5.37. Using Regular Expressions to Extract All Numbers

Problem

You want to extract all numbers from a string that has extra whitespace, text, and other nonnumeric characters interspersed throughout.

Solution

Sample code folder: Chapter 05\RegexExtractNum

Use a regular expression (Regex) object to identify and parse out a list of all numbers in the string.

Discussion

This is a very tricky problem if the exact format of the string is not known. Identifying exactly which sets of characters are parts of numbers with accuracy in all cases can be difficult. Negative signs, scientific notation, and other complications can arise. Fortunately, the regular expression object greatly simplifies the task. The following code demonstrates how it works:

	Imports System.Text.RegularExpressions

	' …Later, in a method…

	Dim source As String = _
	   "This 321.0 string -0.020 contains " & _
	   "3.0E-17 several 1 2. 34 numbers"
	Dim result As String
	Dim parser As New _
	   Regex("[-+]?([0-9]*\.)?[0-9]+([eE][-+]?[0-9]+)?")

	Dim sourceMatches As MatchCollection = _
	   parser.Matches(source)
	Dim counter As Integer

	result = "Count: " & _
	   sourceMatches.Count.ToString() & vbNewLine
	For counter = 0 To sourceMatches.Count - 1
	   result &= vbNewLine
	   result &= sourceMatches(counter).Value.ToString()
	   result &= Space(5)
	   result &= CDbl(sourceMatches(counter).Value).ToString()
	Next counter
	MsgBox(result)

The string to be parsed is source, which contains a variety of integer and floating-point numbers, both positive and negative, with words and other nonnumeric characters mixed in. A Regex object named parser is instantiated using a specially crafted regular expression designed to locate all conventionally defined numbers. The Matches() method of the Regex object is applied to the string, and a MatchCollection is returned. This collection’s Count property provides a tally of how many numbers were found in the string. Each Match in the collection has a Value property that returns the matched text as a string.

Figure 5-42 shows the results of parsing the sample string, listing the numbers found using the regular expression. Each match’s Value property holds the text exactly as copied from the original string. That’s the first number on lines 2–7 in the message box. The second number shows the string converted to a Double and then back to a string. The reason for this extra step is to verify that the match string does convert to a numeric value.
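If you want the numbers themselves rather than a display string, the same pattern can feed a list of Double values. The sketch below is one way to arrange that; ExtractNumbers() is a hypothetical helper, and it assumes the numbers use a period as the decimal point, which is why it parses with the invariant culture instead of relying on CDbl() and the current regional settings:

```vbnet
Imports System.Collections.Generic
Imports System.Globalization
Imports System.Text.RegularExpressions

Public Module NumberExtractor
    ' Returns every number found in source, in order of appearance.
    Public Function ExtractNumbers(ByVal source As String) _
          As List(Of Double)
        Dim numbers As New List(Of Double)
        Dim parser As New Regex( _
            "[-+]?([0-9]*\.)?[0-9]+([eE][-+]?[0-9]+)?")

        For Each found As Match In parser.Matches(source)
            Dim value As Double
            ' ----- Skip any match that doesn't convert cleanly.
            If Double.TryParse(found.Value, NumberStyles.Float, _
                  CultureInfo.InvariantCulture, value) Then
                numbers.Add(value)
            End If
        Next found
        Return numbers
    End Function
End Module
```

Applied to the sample string above, this should return six values, beginning with 321 and -0.02.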

Figure 5-42. Parsing the sample string reveals all the numbers it contains

Tip

The regular expression presented in this example is one of many that can be found on multiple Internet web sites. The Internet provides a great resource for locating regular expressions for any specific purposes.

See Also

Recipe 5.38 also discusses regular expression processing. The following web sites are just some of the many places on the Internet that provide regular expression samples:

http://www.regular-expressions.info/examples.html
http://sitescooper.org/tao_regexps.html
http://en.wikipedia.org/wiki/Regular_expression

5.38. Getting a Count of Regular Expression Matches

Problem

You want a quick count of the number of matches a regular expression finds in a string.

Solution

Sample code folder: Chapter 05\RegexCountMatch

Use the Count property of the MatchCollection returned by the Regex object’s Matches() method.

Discussion

The following example code shows how to use regular expressions to count words in a string, as defined by the pattern \w+:

	Imports System.Text.RegularExpressions
	
	' …Later, in a method…

	Dim quote As String = "The important thing is not to " & _
	   "stop questioning. --Albert Einstein"
	Dim parser As New Regex("\w+")
	Dim totalMatches As Integer = parser.Matches(quote).Count
	MsgBox(quote & vbNewLine & "Number words: " & _
	   totalMatches.ToString)

This example keeps only the count of the matches, not the collection of matches itself. Figure 5-43 shows the results as displayed by the message box.

Figure 5-43. Using the Regex object to count words in a string

This technique can be useful for many other types of regular expression searches, too. For example, the regular expression shown in Recipe 5.37 can be used to quickly count the numbers of all types in a string of any size.
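As a minimal sketch of that idea, the following code counts the numbers in a string using the number pattern from Recipe 5.37, written here with its decimal point escaped. The sample text and the variable names are illustrative assumptions and are not taken from the sample code folder:

	Imports System.Text.RegularExpressions

	' …Later, in a method…

	' ----- Illustrative sample text; any string will do.
	Dim source As String = "Order 12 units at -3.5 each, " & _
	   "tolerance 1e-3"
	Dim numberPattern As String = _
	   "[-+]?([0-9]*\.)?[0-9]+([eE][-+]?[0-9]+)?"

	' ----- The shared Matches() method avoids creating a
	'       Regex instance when the pattern is used only once.
	Dim totalNumbers As Integer = _
	   Regex.Matches(source, numberPattern).Count
	MsgBox("Numbers found: " & totalNumbers.ToString())

The shared Regex.Matches() method is convenient for a one-off count. If the same pattern is applied many times, creating a single Regex instance, as in the main example, is probably the better choice.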

See Also

Recipes 5.13 and 5.37 discuss regular expression processing in additional detail.

5.39. Getting the Nth Regular Expression Match

Problem

You want to get the nth match of a regular expression search within a string.

Solution

Sample code folder: Chapter 05\RegexMatchN

Use the Regex object to return a MatchCollection based on the regular expression. The nth match is accessed by indexing item n–1 in the collection.

Discussion

The following code finds all numbers in a sample string, returning all matches as a MatchCollection. The code then accesses the third match, which is item number 2 in the zero-based collection:

	Imports System.Text.RegularExpressions

	' …Later, in a method…

	Dim source As String = "This 7. string -0.02 " & _
	   "contains 003.141600 several 0.9 numbers"
	Dim parser As New Regex( _
	   "[-+]?([0-9]*\.)?[0-9]+([eE][-+]?[0-9]+)?")
	Dim sourceMatches As MatchCollection = _
	   parser.Matches(source)
	Dim result As Double = CDbl(sourceMatches(2).Value)
	MsgBox(source & vbNewLine & "The 3rd number: " & _
	   result.ToString())

Figure 5-44 shows the third number found in the string.

Figure 5-44. Using a regular expression to find the nth match in a string
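Indexing a MatchCollection beyond its last item raises an exception, so it can be worth checking the Count property first. The following sketch wraps that check in a helper function. The name GetNthMatch and its one-based n parameter are hypothetical choices made for this illustration:

	Imports System.Text.RegularExpressions

	' …Later, in a class or module…

	' ----- Hypothetical helper: returns the text of the nth
	'       (one-based) match, or Nothing if the string has
	'       fewer than n matches.
	Public Function GetNthMatch(ByVal source As String, _
	      ByVal pattern As String, ByVal n As Integer) As String
	   Dim found As MatchCollection = _
	      Regex.Matches(source, pattern)
	   If (n < 1) OrElse (n > found.Count) Then Return Nothing
	   Return found(n - 1).Value
	End Function

A caller can then test the return value against Nothing before converting it with CDbl().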

See Also

Recipe 5.37 discusses the specific regular expression pattern used in this recipe.

5.40. Compiling Regular Expressions for Speed

Problem

You want to compile a regular expression to maximize runtime speed.

Solution

Sample code folder: Chapter 05\RegexDLL

There are two steps to this solution, best described by working through an example. The first step is to run the code to create the compiled DLL file, and the second is to use the new compiled regular expression in one or more applications.

Discussion

First, run the following code one time only to compile and create a DLL file containing two regular expressions, one using a pattern designed to find all numbers in a string and one that finds words:

	Imports System.Text.RegularExpressions
	
	' …Later, in a method…
	
	Dim numPattern As String = _
	   "[-+]?([0-9]*\.)?[0-9]+([eE][-+]?[0-9]+)?"
	Dim wordPattern As String = "\w+"
	Dim whichNamespace As String = "NumbersRegex"
	Dim isPublic As Boolean = True

	Dim compNumbers As New RegexCompilationInfo(numPattern, _
	   RegexOptions.Compiled, "RgxNumbers", _
	   whichNamespace, isPublic)
	Dim compWords As New RegexCompilationInfo(wordPattern, _
	   RegexOptions.Compiled, "RgxWords", whichNamespace, _
	   isPublic)
	Dim compAll( ) As RegexCompilationInfo = _
	   {compNumbers, compWords}

	Dim whichAssembly As New _
	   System.Reflection.AssemblyName("RgxNumbersWords")
	Regex.CompileToAssembly(compAll, whichAssembly)

This code creates a new file named RgxNumbersWords.dll that contains the compiled regular expressions. The file is created in the same folder in which the executable program is located.

To use the new DLL in an application, you need to add a reference to it. Right-click on References in the Solution Explorer, choose Add Reference, click the Browse tab, find the DLL file in the folder where the application’s EXE file is located, and select it to add the reference. Figure 5-45 shows the new reference in the Solution Explorer.

Figure 5-45. The DLL file named RgxNumbersWords added to the References list in the Solution Explorer

You also need to import the namespace defined in this DLL into your application. Either add an Imports command at the top of your source code or, in the Project Properties window, select the References tab, and place a checkmark next to the name of the namespace, as shown in Figure 5-46.

Figure 5-46. Importing a namespace via the Project Properties window

Once the new DLL is referenced and its object’s namespace has been imported, you can use the compiled regular expression in an application. The following code uses the new RgxNumbers regular expression to count the numbers in a string:

	Imports System.Text.RegularExpressions
	
	' …Later, in a method…
	Dim source As String = _
	   "Making a Pi (3.1415926) is easy as One 1 Two 2 Three 3"
	Dim parser As New RgxNumbers
	Dim totalMatches As Integer = parser.Matches(source).Count

	MsgBox(source & vbNewLine & "Number count: " & _
	   totalMatches.ToString())

Figure 5-47 shows the result of running this code to determine how many numbers are in the sample string.

Figure 5-47. Quickly counting numbers in a string using the compiled regular expression
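When a separate DLL is more than a project needs, a lighter-weight option to consider is passing RegexOptions.Compiled to the Regex constructor. The following sketch assumes the same number pattern and sample string used above. The expression is compiled in memory each time the program runs, so the compilation cost is paid at startup rather than once at build time:

	Imports System.Text.RegularExpressions

	' …Later, in a method…

	Dim source As String = _
	   "Making a Pi (3.1415926) is easy as One 1 Two 2 Three 3"

	' ----- Compile the pattern in memory; no DLL is created.
	Dim parser As New Regex( _
	   "[-+]?([0-9]*\.)?[0-9]+([eE][-+]?[0-9]+)?", _
	   RegexOptions.Compiled)
	Dim totalMatches As Integer = parser.Matches(source).Count

	MsgBox(source & vbNewLine & "Number count: " & _
	   totalMatches.ToString())

This approach tends to pay off only when the same Regex instance is reused many times. For a single search, the extra compilation step may cost more than it saves.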

See Also

Recipe 5.37 also discusses regular expression processing.

5.41. Using Regular Expressions to Validate Data

Problem

You need to validate string data entered by a user to ensure it meets defined criteria.

Solution

Sample code folder: Chapter 05\RegexValidate

Use a regular expression to check the string to make sure it matches the type of data expected.

Discussion

The Internet is a good place to find a wide range of regular expressions to validate strings using specific rules, and this recipe won’t attempt to list them all. Instead, the following code, which validates a String as an email address, demonstrates a specific example to show you the general technique involved:

	Imports System.Text.RegularExpressions

	' …Later, in a method…

	Dim testString As String
	Dim emailPattern As String = _
	   "^([0-9a-zA-Z]+[-._+&])*[0-9a-zA-Z]+@" & _
	   "([-0-9a-zA-Z]+[.])+[a-zA-Z]{2,6}$"

	testString = "[email protected]"
	MsgBox(testString & Space(3) & _
	   Regex.IsMatch(testString, emailPattern))

	testString = "john@[email protected]"
	MsgBox(testString & Space(3) & _
	   Regex.IsMatch(testString, emailPattern))

This regular expression checks a string to see if it is a valid email address. As shown in Figures 5-48 and 5-49, the first string passes the test, but the second has a problem. In general, the IsMatch() method returns True if the string matches the criteria defined in the regular expression and False if it fails the test.

Figure 5-48. A string that passes the regular expression test for valid email addresses
Figure 5-49. A string that fails the regular expression test designed to validate it as a legal email address
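The same IsMatch() technique applies to other kinds of input. As a sketch, the following function checks whether a string looks like a U.S. ZIP code, either five digits or five digits followed by a hyphen and four more. The function name IsValidZipCode and the pattern are assumptions chosen for this illustration, and real-world validation rules may need to be stricter:

	Imports System.Text.RegularExpressions

	' …Later, in a class or module…

	' ----- Hypothetical helper: True for "12345" or
	'       "12345-6789", False for anything else.
	Public Function IsValidZipCode( _
	      ByVal candidate As String) As Boolean
	   If (candidate Is Nothing) Then Return False
	   Return Regex.IsMatch(candidate, _
	      "^[0-9]{5}(-[0-9]{4})?$")
	End Function

The ^ and $ anchors matter here. Without them, IsMatch() returns True whenever the pattern appears anywhere inside a longer string.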

See Also

Recipe 5.22 also discusses data validation.

5.42. Using Regular Expressions to Count Characters, Words, or Lines

Problem

You want to count the characters, words, and lines in a string.

Solution

Sample code folder: Chapter 05\RegexCountParts

Use separate regular expressions to count words, characters, and lines in a string of any length.

Discussion

The following code demonstrates three very short regular expressions that provide simple counts of characters, words, and lines in a string of any length:

	Imports System.Text.RegularExpressions
	
	' …Later, in a method…

	Dim quote As String = _
	   "The important thing" & vbNewLine & _
	   "is not to stop questioning." & vbNewLine & _
	   "--Albert Einstein" & vbNewLine
	Dim numBytes As Integer = quote.Length * 2
	Dim numChars As Integer = Regex.Matches(quote, ".").Count
	Dim numWords As Integer = Regex.Matches(quote, "\w+").Count
	Dim numLines As Integer = Regex.Matches(quote, ".+\n*").Count
	MsgBox(String.Format( _
	   "{0}{5}bytes: {1}{5}Chars: {2}{5}Words: {3}{5}Lines: {4}", _
	   quote, numBytes, numChars, numWords, numLines, vbNewLine))

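The number of bytes in the string is also displayed, as shown in Figure 5-50. That value needs no regular expression: it is simply the string’s Length property multiplied by two, because .NET strings store each character as a two-byte UTF-16 code unit. Note that the “.” pattern does not match the linefeed character by default, so the Chars count comes out lower than the Length property by one for each line break in the string.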

Figure 5-50. Using simple regular expressions to count characters, words, or lines in a string
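By default, the “.” pattern matches every character except the linefeed, which affects the character count whenever the string contains line breaks. The following sketch compares the default behavior with RegexOptions.Singleline, which lets “.” match linefeeds as well. The two-line sample string is an assumption made for this illustration:

	Imports System.Text.RegularExpressions

	' …Later, in a method…

	Dim sample As String = "Line one" & vbNewLine & "Line two"

	' ----- Default: "." skips the linefeed in each line break.
	Dim defaultCount As Integer = _
	   Regex.Matches(sample, ".").Count

	' ----- Singleline: "." matches every character.
	Dim singlelineCount As Integer = Regex.Matches( _
	   sample, ".", RegexOptions.Singleline).Count

	MsgBox(String.Format( _
	   "Length: {0}{3}Default: {1}{3}Singleline: {2}", _
	   sample.Length, defaultCount, singlelineCount, vbNewLine))

With the Singleline option the regular expression count should agree with the Length property, while the default count falls short by one for each line break.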

See Also

Recipe 5.38 also discusses the results of regular expression processing.

5.43. Converting a String to and from Base64

Problem

You want to convert a string to or from Base64 format for predictable transfer across a network.

Solution

Sample code folder: Chapter 05\Base64

To convert a string to Base64, first use System.Text.Encoding methods to convert the string to a byte array and then use the Convert.ToBase64String() method to convert the byte array to a Base64 string.

To convert a Base64 string back to the original string, use Convert.FromBase64String() to convert the string to a byte array, and then use the appropriate System.Text.Encoding method to convert the byte array to a string.

Discussion

The following code demonstrates these steps as it converts a sample string to Base64 and back again:

	Dim quote As String = "The important thing is not to " & _
	   "stop questioning. --Albert Einstein"
	Dim quoteBytes As Byte() = _
	   System.Text.Encoding.UTF8.GetBytes(quote)
	Dim quote64 As String = Convert.ToBase64String(quoteBytes)
	Dim byteSet As Byte() = Convert.FromBase64String(quote64)
	Dim result As String = _
	   System.Text.Encoding.UTF8.GetString(byteSet)
	MsgBox(quote & vbNewLine & quote64 & vbNewLine & result)

UTF8 encoding is used here because it stores each of the sample string’s characters, all of which fall within the standard ASCII range, in a single byte. UTF8 can also represent any other Unicode character without data loss, using two or more bytes for characters outside the ASCII range. If you prefer Unicode (UTF-16) encoding, change both occurrences of “UTF8” to “Unicode” in the code sample. For ASCII text such as this sample, the byte array and the Base64 string will each be about twice as large when using Unicode. Whichever encoding you choose, use the same one for both directions of the conversion.

Figure 5-51 shows the results of the above conversions as displayed by the message box.

Figure 5-51. A sample string converted to Base64 and back again
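To see how the choice of encoding affects the size of the data, the following sketch encodes a short string containing one non-ASCII character with both encodings, round-trips each byte array through Base64, and reports the byte counts. The sample text and variable names are assumptions made for this illustration:

	' ----- "Café au lait", built with ChrW() so the source
	'       file itself stays plain ASCII.
	Dim sample As String = "Caf" & ChrW(233) & " au lait"
	Dim utf8Bytes As Byte() = _
	   System.Text.Encoding.UTF8.GetBytes(sample)
	Dim utf16Bytes As Byte() = _
	   System.Text.Encoding.Unicode.GetBytes(sample)

	' ----- Round-trip each byte array through Base64.
	Dim fromUtf8 As String = _
	   System.Text.Encoding.UTF8.GetString( _
	   Convert.FromBase64String( _
	   Convert.ToBase64String(utf8Bytes)))
	Dim fromUtf16 As String = _
	   System.Text.Encoding.Unicode.GetString( _
	   Convert.FromBase64String( _
	   Convert.ToBase64String(utf16Bytes)))

	MsgBox(String.Format( _
	   "UTF8 bytes: {0}{4}Unicode bytes: {1}{4}" & _
	   "UTF8 intact: {2}{4}Unicode intact: {3}", _
	   utf8Bytes.Length, utf16Bytes.Length, _
	   (fromUtf8 = sample), (fromUtf16 = sample), vbNewLine))

The decoding step must use the same encoding that produced the bytes. Mixing the two encodings yields garbled text.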

See Also

Recipe 5.33 also shows how to convert string data into an alternative format that uses only printable characters.

5.44. Splitting a String

Problem

You want to split a string using a multicharacter string rather than a single character as the split point, but the String object’s Split() method only splits using one or more individual characters.

Solution

Sample code folder: Chapter 05\SplitString

You can use the Visual Basic Split() function instead of the String.Split() method, or you can pass an array of strings to String.Split().

Discussion

The following code shows the differences between using the Split() function and the String.Split() method:

	Dim quote As String = "The important thing is not to " & _
	   "stop questioning. --Albert Einstein"
	Dim strArray1() As String = Split(quote, "ing")
	Dim strArray2() As String = quote.Split(CChar("ing"))
	Dim result As New System.Text.StringBuilder
	Dim counter As Integer

	For counter = 0 To strArray1.Length - 1
	   result.AppendLine(strArray1(counter))
	Next counter
	result.AppendLine(StrDup(30, "-"))

	For counter = 0 To strArray2.Length - 1
	   result.AppendLine(strArray2(counter))
	Next counter
	MsgBox(result.ToString())

String array strArray1 is created by applying the Split() function to the sample string, splitting the string at all occurrences of “ing”. strArray2 uses the String.Split() method to do the same thing. However, even though the string “ing” is passed to the String.Split() method to define the split points, only the first character of this string, the character “i,” is used to make the splits. The results of these two splits are quite different, as shown in the output displayed in the message box in Figure 5-52.

Figure 5-52. Results of passing the Split( ) function and the Split( ) method a multicharacter string as the split point

To confuse the issue even further, it is possible to use the String.Split() method to split a string at whole substring boundaries, but only by passing an array of strings to the method to define the split points (not just a simple string) and passing a required parameter defining split options. The following two lines of code demonstrate this technique, returning the desired results. The first line uses the Visual Basic function, and the second line uses the string array technique just described:

	Dim strArray1() As String = Split(quote, "ing")
	Dim strArray2() As String = _
	   quote.Split(New String() {"ing"}, StringSplitOptions.None)

Both Split() options are very powerful and useful, but you do need to use the correct one, passing appropriate parameters.
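The split options parameter also controls what happens when two delimiters appear back to back. The following sketch uses StringSplitOptions.RemoveEmptyEntries to drop the empty strings that StringSplitOptions.None would otherwise return. The sample record and its "<>" delimiter are assumptions made for this illustration:

	' ----- Illustrative record with a doubled delimiter.
	Dim record As String = "alpha<>beta<><>gamma"

	' ----- None keeps the empty entry between the doubled
	'       delimiters; RemoveEmptyEntries discards it.
	Dim allParts() As String = record.Split( _
	   New String() {"<>"}, StringSplitOptions.None)
	Dim realParts() As String = record.Split( _
	   New String() {"<>"}, _
	   StringSplitOptions.RemoveEmptyEntries)

	MsgBox("With None: " & allParts.Length.ToString() & _
	   vbNewLine & "With RemoveEmptyEntries: " & _
	   realParts.Length.ToString() & vbNewLine & _
	   String.Join(vbNewLine, realParts))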

See Also

Recipe 5.28 also discusses string parsing using Split().

5.45. Creating a String of Space Characters

Problem

You want to create a string of n space characters.

Solution

Use the Space(n) function, which returns a string of n space characters.

Discussion

The following sample code presents three different ways to create a string of n spaces. In most cases the Space() function works quite well to create the spaces, but it’s informative to compare the three techniques:

	Dim lotsOfSpaces1 As String = New String(" "c, 500)
	Dim lotsOfSpaces2 As String = StrDup(500, " "c)
	Dim lotsOfSpaces3 As String = Space(500)
	Dim result As String = String.Format( _
	   "Length of lotsOfSpaces1: {0}{3}" & _
	   "Length of lotsOfSpaces2: {1}{3}" & _
	   "Length of lotsOfSpaces3: {2}{3}", _
	   lotsOfSpaces1.Length, _
	   lotsOfSpaces2.Length, _
	   lotsOfSpaces3.Length, vbNewLine)
	MsgBox(result)

The String constructor has several overloads for initializing a new string. As shown in the first statement above, you can create a new string consisting of n repetitions of any character (in this case, a space character).

The StrDup() function is similar in operation in that it also returns a string consisting of n occurrences of a given character. Both the String constructor and the StrDup() function are useful when the repeated character is something other than a space.

Finally, the Space() function returns a string consisting of n space characters, without the option to use any other character.

The rest of the code displays the lengths of the three strings of spaces to help verify that they were created as indicated, as shown in Figure 5-53.

Figure 5-53. Three identical long strings of spaces created in three different ways
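One common use for Space() is padding text to a fixed column width. The following sketch pads a label to 12 characters and compares the result with the String object’s PadRight() method. The label text and the column width are assumptions made for this illustration:

	Dim label As String = "Total"
	Dim columnWidth As Integer = 12

	' ----- Space() requires a non-negative count, so this
	'       assumes the label is no longer than the column.
	Dim padded1 As String = _
	   label & Space(columnWidth - label.Length)

	' ----- PadRight() simply returns the original string
	'       when it is already at least columnWidth long.
	Dim padded2 As String = label.PadRight(columnWidth)

	MsgBox("[" & padded1 & "]" & vbNewLine & _
	   "[" & padded2 & "]")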

See Also

Recipe 5.2 discusses similar functionality.
