Every Visual Basic developer quickly learns how to manipulate
strings, but it’s often easy to overlook some of the more powerful
techniques available, especially with all the new features in Visual
Basic 2005. A good example is the powerful
StringBuilder
object, which provides an
order-of-magnitude improvement for concatenating strings. Visual Basic 6
developers, in particular, will discover lots of exciting new
string-processing features. For example, Visual Basic 2005’s Substring()
method provides similar
functionality not only to the Mid()
function, but also to the Left()
and
Right()
string functions. The regular
expression library included with .NET also provides new and powerful
ways to analyze and process string data.
You need to process many pieces of string data with more efficiency than is allowed using standard .NET Framework immutable strings.
The StringBuilder
object
provides extremely fast and efficient in-place processing of string
and character data. The following code demonstrates several of its
powerful methods and some of the techniques you can use to speed up
your string processing:
    Dim workText As New System.Text.StringBuilder

    ' ----- Build a basic text block.
    workText.Append("The important")
    workText.Append(vbNewLine)
    workText.Append("thing is not")
    workText.AppendLine()
    workText.AppendLine("to stop questioning.")
    workText.Append("--Albert Einstein")
    MsgBox(workText.ToString())

    ' ----- Delete the trailing text.
    Dim endSize As Integer = "--Albert Einstein".Length
    workText.Remove(workText.Length - endSize, endSize)
    MsgBox(workText.ToString())

    ' ----- Modify text in the middle.
    workText.Insert(4, "very ")
    MsgBox(workText.ToString())

    ' ----- Perform a search and replace.
    workText.Replace("not", "never")
    MsgBox(workText.ToString())

    ' ----- Truncate the existing text.
    workText.Length = 3
    MsgBox(workText.ToString())
The first line of the previous code creates a new instance of the StringBuilder object. The next half dozen or so lines of code show various common uses of the StringBuilder’s Append() and AppendLine() methods. Each call to Append() or AppendLine() concatenates another string or character piece into the StringBuilder’s buffer. Figure 5-1 shows the result of these first few append actions.
Avoid the temptation to concatenate these string pieces using the & operator as you prepare the various pieces for appending to the StringBuilder. Doing so detracts from the efficiency and speed advantages of the StringBuilder. For example, both of the following lines of code are legal and correct, but the line that uses the & operator does a lot more work behind the scenes:
    ' ----- Don't do this!
    workText.Append("This " & "is " & "not advisable!")

    ' ----- Please do this.
    workText.Append("This ").Append("is ").Append("faster!")
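The first statement (the one using the & operator) builds intermediate working copies of the immutable strings to do the concatenations. With string literals like these the compiler can combine the pieces ahead of time, but when the pieces are variables, each & creates a new temporary string at runtime, and timing tests demonstrate that this can slow down your code measurably.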
Besides Append(), the StringBuilder object also provides methods that parallel other functions available for processing true strings. These include Remove(), Replace(), and Insert() methods, as demonstrated in the sample code presented earlier in this recipe. The Length property shown in the sample is also available as a standard property of strings. The remaining lines of code in the sample demonstrate the use of these methods by modifying parts of the original quote.
A StringBuilder’s contents are technically not a string. Rather, the StringBuilder maintains an internal buffer of characters that at any time can easily be converted to a string using the StringBuilder’s ToString() method. Think of a StringBuilder as a string in the making that’s not really a string until you want it to be.
Behind the scenes, the default StringBuilder’s buffer starts out with a working space, or capacity, of only 16 characters. The buffer automatically doubles in size whenever it needs more space, jumping to 32 characters, then 64, and so on. If you have a good idea how much space your string processing may require, you can initialize the StringBuilder’s buffer to a given capacity during the declaration. For example, this declaration creates a StringBuilder instance with a preallocated buffer size of 1,000 characters:
Dim workText As New System.Text.StringBuilder(1000)
The advantage of providing the starting capacity is a potential performance boost. In this case, the buffer’s workspace won’t need to be doubled until enough strings have been appended to overflow the 1,000-character limit.
You can access the StringBuilder’s capacity at runtime through its Capacity property. It’s enlightening to read this property to follow along as the StringBuilder doubles in size during execution. You can set the Capacity to a new value at any time, but if you set the Capacity to less than the StringBuilder’s current Length, an exception occurs. If your intent is to shorten, or truncate, the contents of the buffer, set the Length property instead, and leave Capacity alone. The easiest way to empty a StringBuilder of its contents is to set its Length property to zero.
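The Capacity and Length behavior described above can be watched directly. The following is only a sketch: the module name CapacityDemo and the helper DescribeBuffer() are hypothetical names introduced for illustration, and the exact Capacity values reported may vary between versions of the .NET Framework.

```vbnet
Imports System.Text

Module CapacityDemo
    ' Hypothetical helper: reports the current Length and Capacity.
    Public Function DescribeBuffer(ByVal buffer As StringBuilder) As String
        Return "Length=" & buffer.Length.ToString() & _
            ", Capacity=" & buffer.Capacity.ToString()
    End Function

    Public Sub Main()
        ' Preallocate room for 1,000 characters.
        Dim workText As New StringBuilder(1000)
        workText.Append("The important thing is not to stop questioning.")
        Console.WriteLine(DescribeBuffer(workText))

        ' Truncate through Length, leaving Capacity alone.
        workText.Length = 3
        Console.WriteLine(workText.ToString())

        ' Empty the buffer by setting Length to zero.
        workText.Length = 0
        Console.WriteLine(DescribeBuffer(workText))

        ' Setting Capacity below the current Length raises an exception.
        workText.Append("abcdef")
        Try
            workText.Capacity = 2
        Catch ex As ArgumentOutOfRangeException
            Console.WriteLine("Capacity cannot be smaller than Length.")
        End Try
    End Sub
End Module
```

Calling DescribeBuffer() after each Append() in your own code is a simple way to see when the buffer grows.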
Recipe 5.26 also discusses building up strings from smaller components.
You need to create a string comprised of a single character repeated many times. These strings are sometimes useful in the formatting of ASCII text for display or printed output.
Create a new string of repeated characters using the String
class itself. One of its overloaded
constructors accepts a character to repeat and a repetition
count.
Most of the time you declare string variables without calling a constructor, which leaves the variables set to Nothing. This is why you must assign a string value to a string variable after declaring it, but before using its contents. However, you can use overloaded versions of the string constructor to assign string data immediately upon creation. One version of the string constructor takes a character and a count and efficiently builds a string by repeating the character the given number of times. The following statement builds a string of 72 asterisks:
Dim lotsOfAsterisks As New String("*"c, 72)
Visual Basic 2005 also provides a second way to create strings
of duplicated characters. The StrDup()
function, which is very similar to
the original String()
function
found in Visual Basic 6, does the trick:
lotsOfAsterisks = StrDup(72, "*")
Notice the difference in the order of the parameters between the string constructor syntax and the function call. Fortunately, Visual Studio’s IntelliSense means you don’t have to memorize the order of the parameters.
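As a small illustration of both forms in use, the sketch below frames a title with asterisks. The module name RepeatDemo and the function Banner() are hypothetical, made up for this example; the top rule uses the String constructor and the bottom rule uses StrDup().

```vbnet
Module RepeatDemo
    ' Hypothetical helper: frames a title with rows of asterisks.
    Public Function Banner(ByVal title As String) As String
        Dim width As Integer = title.Length + 4

        ' Constructor form: character first, then count.
        Dim topRule As New String("*"c, width)

        ' Function form: count first, then the text to repeat.
        Dim bottomRule As String = StrDup(width, "*")

        Return topRule & vbNewLine & _
            "* " & title & " *" & vbNewLine & _
            bottomRule
    End Function
End Module
```

For example, Banner("Hi") produces three lines: a row of six asterisks, the text "* Hi *", and another row of six asterisks.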
Recipe 5.45 demonstrates another method of creating strings of a common character.
You want a string comprised of a sequence of characters repeated many times. For example, you want to create a fancy separator string comprised of alternating “+” and “~” characters, as shown in Figure 5-2.
Use a StringBuilder to append as many copies of the string as desired. Then convert the result to a true string using the StringBuilder’s ToString() method:
    Dim fancyString As New System.Text.StringBuilder
    For counter As Integer = 1 To 35
        fancyString.Append("+~")
    Next counter
    MsgBox(fancyString.ToString())
Strings in .NET are immutable, which means that once they’ve been created, they sit in one spot in memory and can never be modified. All functions that might appear to be changing a string’s contents are actually making new copies of the original string, modified en route. In most cases, immutability provides superior string handling and processing capabilities, but when it comes to concatenating strings, the speed and efficiency advantages are nullified.
The StringBuilder object solves the concatenation dilemma nicely. It allows dynamic, in-place modification of a buffer containing a sequence of string characters, without the need to constantly reallocate String objects. If the allocated buffer space runs out, the StringBuilder efficiently and automatically doubles the size of its character workspace, and it will do so as many times as are required to handle the strings and characters appended to it.
Recipe 5.27
shows how the StringBuilder
alternative really is faster than standard string
concatenation.
You need to store a string in such a way that a user won’t recognize it, but you also want to make sure that the string stays the same length and that it contains only printable ASCII characters.
Sample code folder: Chapter 05\ObfuscateString
Process each printable character of the string by shifting its ASCII value to that of another character within the same set. The following two functions can be used to obfuscate strings in this way and then return them to their original states:
    Public Function Obfuscate(ByVal origText As String) As String
        ' ----- Make a string unreadable, but retrievable.
        Dim textBytes As Byte() = _
            System.Text.Encoding.UTF8.GetBytes(origText)
        For counter As Integer = 0 To textBytes.Length - 1
            If (textBytes(counter) > 31) And _
                    (textBytes(counter) < 127) Then
                textBytes(counter) += CByte(counter Mod 31 + 1)
                If (textBytes(counter) > 126) Then _
                    textBytes(counter) -= CByte(95)
            End If
        Next counter
        Return System.Text.Encoding.UTF8.GetChars(textBytes)
    End Function

    Public Function DeObfuscate(ByVal origText As String) _
            As String
        ' ----- Restore a previously obfuscated string.
        Dim textBytes As Byte() = _
            System.Text.Encoding.UTF8.GetBytes(origText)
        For counter As Integer = 0 To textBytes.Length - 1
            If (textBytes(counter) > 31) And _
                    (textBytes(counter) < 127) Then
                textBytes(counter) -= CByte(counter Mod 31 + 1)
                If (textBytes(counter) < 32) Then _
                    textBytes(counter) += CByte(95)
            End If
        Next counter
        Return System.Text.Encoding.UTF8.GetChars(textBytes)
    End Function
Figure 5-3 shows a string before and after calling Obfuscate(), and after returning it to its original state by calling DeObfuscate().
The Obfuscate()
function lets
you modify strings to an unreadable state without resorting to
full-blown cryptographic techniques. An example of where this might
come in handy is for storing string data in the registry in such a
manner that the original contents are not easily searched for and that
the typical user won’t recognize the data.
When modifying individual bytes of a string, it’s often best to first convert the string to an array of bytes, as shown in these functions. You can freely modify the byte values in place, unlike the contents of the immutable string they came from, and generate a new string result by converting the entire byte array in one function call.
If you work with international character sets, consider using the Unicode versions of the encoding conversion functions instead of the UTF8 versions. The byte arrays will be twice as large, but you should be able to handle other sets of characters. You’ll also need to pay close attention to the numerical shift of the byte values, modifying the above code to keep the results within the desired range of characters.
Recipe 5.23 discusses additional modifications to strings that can be reversed.
You need to convert a byte array to a hexadecimal string. This is handy for the display or documentation of binary data.
Use a bit converter to get the hexadecimal representation of each byte within a block of data. The following code generates the hexadecimal string from source data:
    Dim result As String = Replace(BitConverter.ToString( _
        origBytes), "-", "")
There are several approaches to solving this problem. A quick review of some of these approaches will demonstrate several different programming techniques available to you in Visual Basic 2005.
The code samples in this recipe assume a byte array named
origBytes
built using the following
code, which creates a byte array of length 256 containing one each of
the byte values 0 through 255:
    Dim origBytes(255) As Byte
    For counter As Short = 0 To 255
        origBytes(counter) = CByte(counter)
    Next counter
The first approach is somewhat “brute force” in nature. Each
byte of the array is converted to a two-character string using one of
the many formatting options of the byte’s
ToString()
method. These short strings are
concatenated to the result string one at a time:
    Dim result As String = ""
    For counter As Short = 0 To 255
        result &= origBytes(counter).ToString("X2")
    Next counter
This is fine for small arrays of bytes, but the string
concatenation quickly becomes problematic as the byte count increases.
The next approach uses a StringBuilder
to make the concatenation more
efficient for large data sources:
    Dim workText As New System.Text.StringBuilder(600)
    For counter As Integer = 0 To 255
        workText.Append(origBytes(counter).ToString("X2"))
    Next counter
    Dim result As String = workText.ToString()
This solution runs faster, but it seems to lack the elegance and
power we expect of Visual Basic. Fortunately, the .NET Framework is
full of surprises, and of useful objects too. The BitConverter
object provides a shared method
that converts an entire array of bytes to a hexadecimal string in one
call. The resulting string has dashes between each pair of hexadecimal
characters. This can be nice in some circumstances, but in this case,
we’re trying to create a compact hexadecimal string comprised of only
two characters for each byte. The following two lines of code show how
to call the BitConverter.ToString()
method, and then squeeze out all the dashes using a single call to the
Replace()
function:
    Dim result As String
    result = BitConverter.ToString(origBytes)   '00-3F-F7 etc.
    result = Replace(result, "-", "")           '003FF7 etc.
The solution presented first in this recipe is the result of combining these two function calls into a single line of code. Figure 5-4 shows the resulting hexadecimal string displaying all possible byte values.
Recipes 5.16 and 5.26 show other useful ways of modifying portions of strings.
You want to extract substrings located at the left end, the right end, or somewhere in the middle of a string.
Visual Basic 2005 strings now have a built-in method named Substring() that provides an alternative to the traditional Visual Basic functions Left(), Mid(), and Right(), although the language retains these features if you wish to use them. To emulate each of these functions, set the Substring() method’s parameters appropriately. The following code shows how to do this:
    Dim quote As String = "The important thing is not to " & _
        "stop questioning. --Albert Einstein"

    ' ----- Left(quote, 3) … "The"
    MsgBox(quote.Substring(0, 3))

    ' ----- Mid(quote, 5, 9) … "important"
    MsgBox(quote.Substring(4, 9))

    ' ----- Mid(quote, 58) … "Einstein"
    MsgBox(quote.Substring(57))

    ' ----- Right(quote, 8) … "Einstein"
    MsgBox(quote.Substring(quote.Length - 8))
Each line of code in the sample is prefaced by a comment line
showing the equivalent syntax from VB 6. One of the big differences
apparent in these examples is that the first character in the string
is now at offset position 0 instead of 1, requiring a change in the
offsets supplied to the Substring() method. The lengths of the substrings are still the same.
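One difference worth keeping in mind when moving from Left() and Right() to Substring() is that Substring() throws an ArgumentOutOfRangeException when the requested range runs past the end of the string, whereas Left() and Right() simply return the whole string. The following sketch wraps Substring() to get the more forgiving behavior. The module name SubstringHelpers and the functions SafeLeft() and SafeRight() are hypothetical names, and the code assumes the text argument is not Nothing.

```vbnet
Module SubstringHelpers
    ' Hypothetical helper: like Left(), returns the whole string
    ' when count is larger than the string itself.
    Public Function SafeLeft(ByVal text As String, _
            ByVal count As Integer) As String
        If (count >= text.Length) Then Return text
        Return text.Substring(0, count)
    End Function

    ' Hypothetical helper: like Right(), returns the whole string
    ' when count is larger than the string itself.
    Public Function SafeRight(ByVal text As String, _
            ByVal count As Integer) As String
        If (count >= text.Length) Then Return text
        Return text.Substring(text.Length - count)
    End Function
End Module
```

With these helpers, SafeLeft("The", 10) returns "The" instead of raising an exception, and SafeRight("Einstein", 3) returns "ein".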
You want to convert a string to all uppercase, all lowercase, or mixed case (with only the first letter of each word in uppercase).
Sample code folder: Chapter 05\MixedCase
The string methods ToUpper() and ToLower() make it easy to convert strings to upper- and lowercase, and a short special-purpose function can perform the mixed conversion. You can also use the standard Visual Basic UCase() and LCase() functions. To mix-case a string, use Visual Basic’s StrConv() function.
Changing strings to upper- or lowercase is standard Visual Basic fare:
    ' ----- To upper case.
    newString = oldString.ToUpper()
    newString = UCase(oldString)

    ' ----- To lower case.
    newString = oldString.ToLower()
    newString = LCase(oldString)
To convert the string to mixed or “proper” case, use one of the
conversion methods included in the StrConv()
function:
newString = StrConv(oldString, VbStrConv.ProperCase)
This function converts the first letter of each word to
uppercase, making every other letter lowercase. Its rules are pretty
basic, and it doesn’t know about special cases. If you need to
correctly capitalize names such as “MacArthur,” you have to write a
custom routine. The following code provides the start of a routine
using an algorithm that works much like the StrConv()
function. It assumes that space
characters separate each word:
    Public Function MixedCase(ByVal origText As String) As String
        ' ----- Convert a string to "proper" case.
        Dim counter As Integer
        Dim textParts() As String = Split(origText, " ")

        For counter = 0 To textParts.Length - 1
            If (textParts(counter).Length > 0) Then _
                textParts(counter) = _
                UCase(Microsoft.VisualBasic.Left( _
                textParts(counter), 1)) & _
                LCase(Mid(textParts(counter), 2))
        Next counter

        Return Join(textParts, " ")
    End Function
The code splits up the original text into an array at
space-character boundaries using the Split()
function. It then processes each
word separately and merges them back together with the Join()
method.
Figure 5-5 shows
the results of various conversions on a string, including a conversion
using the custom MixedCase()
function. Notice that “albert” is not capitalized in the mixed-case
string. This is because the two leading dashes are considered to be
part of this word, based on how the Split()
function separated the words at
space-character locations.
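If leading punctuation such as the two dashes should not prevent capitalization, one possible variation is to scan character by character and capitalize the first letter found in each word. This is only a sketch of that idea, not the routine from the sample code folder; the module name MixedCaseDemo and the function MixedCaseSkipPunctuation() are hypothetical.

```vbnet
Module MixedCaseDemo
    ' Hypothetical variation: capitalizes the first letter of each
    ' word even when the word starts with punctuation.
    Public Function MixedCaseSkipPunctuation( _
            ByVal origText As String) As String
        Dim result As New System.Text.StringBuilder(origText.Length)
        Dim startOfWord As Boolean = True

        For Each ch As Char In origText
            If Char.IsWhiteSpace(ch) Then
                ' Whitespace ends the current word.
                startOfWord = True
                result.Append(ch)
            ElseIf Char.IsLetter(ch) Then
                If startOfWord Then
                    result.Append(Char.ToUpper(ch))
                    startOfWord = False
                Else
                    result.Append(Char.ToLower(ch))
                End If
            Else
                ' Punctuation passes through unchanged. A leading
                ' digit counts as the start of the word, so "3rd"
                ' does not become "3Rd".
                If Char.IsDigit(ch) Then startOfWord = False
                result.Append(ch)
            End If
        Next ch

        Return result.ToString()
    End Function
End Module
```

With this version, the text "--albert EINSTEIN" becomes "--Albert Einstein". It still knows nothing about special cases such as “MacArthur.”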
Recipe 5.44
discusses the Split()
function and
the Split()
method.
You need to compare two strings, taking into account their case.
Use the shared Compare()
method provided by the String
object to compare two strings:
    Select Case String.Compare(content1, content2, False)
        Case Is < 0
            MsgBox("Content1 comes before Content2.")
        Case Is > 0
            MsgBox("Content1 comes after Content2.")
        Case Is = 0
            MsgBox("Content1 and Content2 are the same.")
    End Select
Setting the third parameter of the Compare()
method to False
instructs the method to perform a
case-sensitive comparison.
Consider the results shown in Figure 5-6, which indicate
that “apples” is less than “Apples”. The ASCII values for the
lowercase character “a” and the uppercase character “A” are 97 and 65,
respectively, which normally puts the uppercase version first. But the
String.Compare()
method compares
text using culture-defined sorting rules, and by default, English
words beginning with lowercase letters are considered “less than” the
same words beginning with uppercase letters.
You can change the comparison rules in several ways to match what you want to accomplish. See the Visual Studio online help for the CompareOptions enumeration for more information on how to make these changes.
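One way to step outside the culture-defined rules is the String.Compare() overload that accepts a StringComparison value. The sketch below is illustrative only; the module name CompareDemo and the function DescribeOrder() are hypothetical. With StringComparison.Ordinal, strings are compared by raw character values, so the uppercase “A” (65) sorts ahead of the lowercase “a” (97).

```vbnet
Module CompareDemo
    ' Hypothetical helper: describes how first sorts relative to
    ' second under the requested comparison rule.
    Public Function DescribeOrder(ByVal first As String, _
            ByVal second As String, _
            ByVal rule As StringComparison) As String
        Dim outcome As Integer = String.Compare(first, second, rule)
        If (outcome < 0) Then Return "before"
        If (outcome > 0) Then Return "after"
        Return "same"
    End Function
End Module
```

DescribeOrder("apples", "Apples", StringComparison.Ordinal) returns "after", while passing StringComparison.OrdinalIgnoreCase returns "same".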
Recipe 5.9 discusses related comparisons.
You need to compare two strings without regard to their case.
Use the shared Compare()
method provided by the String
object to compare two strings:
    Select Case String.Compare(content1, content2, True)
        Case Is < 0
            MsgBox("Content1 comes before Content2.")
        Case Is > 0
            MsgBox("Content1 comes after Content2.")
        Case Is = 0
            MsgBox("Content1 and Content2 are the same.")
    End Select
Setting the third parameter of the Compare()
method to True instructs the
method to perform a case-insensitive comparison.
This type of string comparison compares all alphabetic characters as though lowercase and uppercase characters were identical. Figure 5-7 shows that “apples” is equal to “Apples” when the strings are compared this way.
String comparisons are culturally defined by default, so be sure the sort order you get is really what you want. See the Visual Studio online help for the CompareOptions enumeration to find more information on how to make changes to the way strings are sorted.
Recipe 5.8 discusses related comparisons.
You need to work with individual characters in a string efficiently, changing them in place in memory if possible.
Sample code folder: Chapter 05\StringsAndCharArrays
Use CType()
to convert the string to an array of
characters, modify characters throughout the array, and then directly
convert the character array back to a string:
    Dim quote As String = "The important thing is not to " & _
        "stop questioning. --Albert Einstein"
    Dim charArray() As Char = CType(quote, Char())
    charArray(46) = "!"c
    Dim result As String = New String(charArray)
    MsgBox(result)
In this example, the string is converted to a character array
using the versatile CType()
type-conversion function. In this form, it’s easy to make a change
such as replacing the period at index 46 with an exclamation point.
The array is then recombined into a string by passing it to the
overloaded version of the String
constructor that takes an array of characters to initialize the new
string. Figure 5-8 shows
the displayed string result, now showing an exclamation point instead
of a period.
There is another way to access individual characters in a string, but it’s read-only, so you can’t use the technique to modify the string:
    MsgBox(someString.Chars(46))
All strings have a Chars()
property that lets you access an indexed character from the string
with minimal overhead. The index is zero-based, so Chars(46)
returns the 47th character.
Recipe 5.12 also examines working with individual characters within a larger string.
You need to convert a string to bytes, and back to a string from a byte array. This enables you to work with the exact binary data comprising the string.
Sample code folder: Chapter 05\StringsAndByteArrays
Use shared methods of the System.Text.Encoding object to convert to and from bytes.
If you know the string data to be comprised entirely of ASCII characters, use UTF8 encoding to minimize the length of the byte array, because each ASCII character becomes exactly one byte. Unicode encoding, which results in two bytes per character instead of one, can be used when the text may contain characters outside the ASCII range and you want each character to sit at a predictable two-byte offset in the array.
The following sample code shows both UTF8 and Unicode encoding methods:
    Dim quote As String = "The important thing is not to " & _
        "stop questioning. --Albert Einstein"
    Dim bytes() As Byte
    Dim result As String

    ' ----- Assumed to be all ASCII characters.
    bytes = System.Text.Encoding.UTF8.GetBytes(quote)
    bytes(46) = 33 ' ASCII exclamation point
    result = System.Text.Encoding.UTF8.GetString(bytes)
    MsgBox(result)

    ' ----- Works with all character sets.
    bytes = System.Text.Encoding.Unicode.GetBytes(quote)
    bytes(92) = 63 ' ASCII question mark
    bytes(93) = 0
    result = System.Text.Encoding.Unicode.GetString(bytes)
    MsgBox(result)
When using UTF8 encoding on all-ASCII text, the number of bytes in the array is the same as the number of characters in the string. The character at indexed position 46 in the string is a period. During the first conversion, this period is changed to an exclamation point, and the resulting string is displayed, a result identical to that previously shown in Figure 5-8.
A Unicode-encoded byte array contains twice as many bytes as the number of characters in the original string. This makes sense when you consider that Unicode characters are 16 bits each (or two bytes) in size. Take a close look at the byte array modifications in the second part of the example code. The byte at position 92 (twice as far into the array as the ASCII variation) is set to the desired ASCII value (63 in this case, for the question mark). But because each character now consumes two bytes in the array, you must set both bytes. Setting the byte at position 93 clears the other half of the two-byte set. Figure 5-9 shows the resulting string, now sporting a question mark at character index 46 (the 47th character).
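To see how the two encodings differ in size, the following sketch counts the bytes each one needs for a given string. The module name EncodingSizes and the functions Utf8Length() and Utf16Length() are hypothetical names chosen for this illustration; GetByteCount() is a standard method of the encoding objects.

```vbnet
Imports System.Text

Module EncodingSizes
    ' Hypothetical helper: bytes needed for the text in UTF8.
    Public Function Utf8Length(ByVal text As String) As Integer
        Return Encoding.UTF8.GetByteCount(text)
    End Function

    ' Hypothetical helper: bytes needed for the text in the
    ' two-bytes-per-character Unicode encoding.
    Public Function Utf16Length(ByVal text As String) As Integer
        Return Encoding.Unicode.GetByteCount(text)
    End Function
End Module
```

For the four-character string "caf" & ChrW(233) (the word “café”), Utf8Length() returns 5 because the accented letter needs two bytes, while Utf16Length() returns 8. This is why the byte index matches the character index in UTF8 only when the text is entirely ASCII.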
You want to tally, or count the occurrences of, each character value in a string.
Sample code folder: Chapter 05\TallyCharacters
Convert the string to a byte array, and then tally the 256 possible byte values into an array of integer counts.
In the case presented, the string is assumed to be all ASCII, which means conversion using UTF8 encoding is appropriate, and the tally array only needs to be dimensioned to hold 256 counting bins:
    Dim quote As String = "The important thing is not to " & _
        "stop questioning. --Albert Einstein"
    Dim counter As Integer
    Dim tally(255) As Integer
Convert the string to a byte array, and then loop through each byte of the array to increment the count for each byte value:
    Dim bytes() As Byte = _
        System.Text.Encoding.UTF8.GetBytes(quote)
    For counter = 0 To bytes.Length - 1
        tally(bytes(counter)) += 1
    Next counter
The rest of the example prepares the tally for display. For efficiency, the code presents only characters with nonzero counts:
    Dim result As New System.Text.StringBuilder(quote)
    For counter = 0 To 255
        If (tally(counter) > 0) Then
            result.AppendLine()
            result.Append(Chr(counter))
            result.Append(Space(3))
            result.Append(tally(counter).ToString())
        End If
    Next counter
    MsgBox(result.ToString())
Figure 5-10 shows the results.
If you want to tally Unicode characters, you need to either dimension a much larger tally array or use a lookup system that constantly adds and counts characters as it finds them.
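A lookup system of the kind just mentioned could be sketched with a generic Dictionary keyed by character. This is one possible approach, not code from the sample folder; the module name TallyDemo and the function TallyCharacters() are hypothetical. Note that it counts individual Char values, so a character stored as a surrogate pair would be tallied as two entries.

```vbnet
Imports System.Collections.Generic

Module TallyDemo
    ' Hypothetical helper: counts each distinct character,
    ' adding new entries as characters are first encountered.
    Public Function TallyCharacters(ByVal text As String) _
            As Dictionary(Of Char, Integer)
        Dim tally As New Dictionary(Of Char, Integer)

        For Each ch As Char In text
            ' TryGetValue leaves count at 0 for unseen characters.
            Dim count As Integer = 0
            tally.TryGetValue(ch, count)
            tally(ch) = count + 1
        Next ch

        Return tally
    End Function
End Module
```

For example, TallyCharacters("banana") returns a dictionary with three entries, in which the count stored for "a"c is 3.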
Recipe 5.11 provides additional details on encoded conversions.
Sample code folder: Chapter 05\CountWords
Use the Split()
function to split the string at each
space character. The length of the resulting array is a good
approximation of the number of words in the string.
There always seems to be more than one way to get things done in Visual Basic 2005, and counting words is no exception. The following code shows one quick-and-dirty technique that requires very little coding to get the job done:
    Dim quote As String = "The important thing is not to " & _
        "stop questioning. --Albert Einstein"
    Dim wordCount As Integer = Split(quote, Space(1)).Length
    MsgBox(quote & vbNewLine & "Number of words: " & _
        wordCount.ToString)
Figure 5-11 shows the resulting number of words in the string.
Inaccuracies can creep in if there are multiple spaces between
some words in the string, if extra spaces appear at either or both
ends of the string, or if other whitespace characters (such as tabs)
are involved. A little preparation of the string can help eliminate
some of these problems, but at the expense of added complexity. For
example, the following lines of code get rid of runs of two or more
space characters, replacing them with single spaces. Adding this code
just before the Split()
function
can provide a more accurate word count:
    Do While (quote.IndexOf(Space(2)) >= 0)
        quote = quote.Replace(Space(2), Space(1))
    Loop
Similarly, you can use the Replace()
method to replace all tabs with
spaces (probably best done just before converting all multiple spaces
to single spaces). As you can probably sense, efforts to guarantee a
more accurate count cause the code to grow quickly. The best course is
to decide what degree of word-counting accuracy is required, how much
value to place on speed of operation, and so on before deciding how
much cleanup code to add.
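As a middle ground between the one-line count and a full cleanup pass, the string’s own Split() method can discard the empty entries that runs of separators produce. The following is a minimal sketch under that assumption; the module name WordCountDemo and the function CountWords() are hypothetical, and the choice of separator characters (space, tab, carriage return, line feed) is an assumption you may need to adjust.

```vbnet
Module WordCountDemo
    ' Hypothetical helper: counts words separated by any mix of
    ' spaces, tabs, and line-ending characters.
    Public Function CountWords(ByVal text As String) As Integer
        Dim separators() As Char = {" "c, ControlChars.Tab, _
            ControlChars.Cr, ControlChars.Lf}

        ' RemoveEmptyEntries drops the empty strings that appear
        ' between adjacent separators and at either end.
        Return text.Split(separators, _
            StringSplitOptions.RemoveEmptyEntries).Length
    End Function
End Module
```

With this approach, leading, trailing, and repeated separators no longer inflate the count: CountWords("  The   important" & vbTab & "thing ") returns 3.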
Another solution to this problem involves regular expressions, which are covered in Recipes 5.37 through 5.42.
Recipe 5.42 shows how to solve this same problem using a different approach.
You want to remove all extra whitespace characters from a string, leaving a single space character between each word.
Sample code folder: Chapter 05\RemoveWhitespace
There are several possible ways to remove extra whitespace from a string. One approach, presented here, is to test each character of the string to see if it is whitespace and to build up the resulting string using a StringBuilder:
    Dim source As String = _
        Space(17) & "This string had " & Chr(12) & _
        StrDup(5, Chr(9)) & "extra whitespace. " & Space(27)
    Dim thisIsWhiteSpace As Boolean
    Dim prevIsWhiteSpace As Boolean
    Dim result As New System.Text.StringBuilder(source.Length)
    Dim counter As Integer

    For counter = 0 To source.Length - 1
        prevIsWhiteSpace = thisIsWhiteSpace
        thisIsWhiteSpace = _
            Char.IsWhiteSpace(source.Chars(counter))
        If (thisIsWhiteSpace = False) Then
            If (prevIsWhiteSpace = True) AndAlso _
                (result.Length > 0) Then result.Append(Space(1))
            result.Append(source.Chars(counter))
        End If
    Next counter
    MsgBox("<" & result.ToString() & ">")
The previous code first builds a test string comprised of words separated by extra spaces, tabs, and other whitespace characters. After processing to replace runs of whitespace characters with single spaces, the resulting string is displayed for inspection, as shown in Figure 5-12.
Another straightforward approach to removing extra whitespace is
to use a series of Replace()
functions, first to replace tabs
and other whitespace characters with spaces, and finally to replace
multiple spaces with single ones. This will work fine, but the
disadvantage is that many temporary strings are built in memory as the
immutable strings are processed. The code presented here moves each
character in memory only once, or not at all if the character is an
extra whitespace.
Another good approach is to use regular expressions to grab an array of the words and then piece them back together with single spaces using a StringBuilder.
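A more compact regular-expression variation, sketched here as an assumption rather than as the method used in Recipe 5.42, replaces every run of whitespace with a single space in one call. The module name WhitespaceDemo and the function CollapseWhitespace() are hypothetical.

```vbnet
Imports System.Text.RegularExpressions

Module WhitespaceDemo
    ' Hypothetical helper: trims the ends, then collapses each
    ' run of whitespace characters into a single space.
    Public Function CollapseWhitespace(ByVal source As String) _
            As String
        Return Regex.Replace(source.Trim(), "\s+", " ")
    End Function
End Module
```

The pattern \s+ matches one or more whitespace characters, including tabs and form feeds, so CollapseWhitespace("  This" & vbTab & vbTab & "had   extra  ") returns "This had extra".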
Recipe 5.42 shows how to use regular expressions to attack the multiwhitespace problem.
You are developing an application that will run on several platforms, so you want to use end-of-line characters that are compatible with all platforms.
Sample code folder: Chapter 05\EndOfLine
Use the property Environment.NewLine, which returns the end-of-line characters for the current platform. For example, the following code adds a self-describing line of text to a StringBuilder and ends the line with the newline characters for the current platform:
    Dim result As New System.Text.StringBuilder
    result.Append("Environment.NewLine").Append( _
        Environment.NewLine)
    MsgBox(result.ToString())
The following code, which simply extends the previous short snippet, terminates lines in 10 different ways, all with the same result in the Windows environment:
    Dim result As New System.Text.StringBuilder
    result.Append("vbNewLine").Append(vbNewLine)
    result.Append("vbCrLf").Append(vbCrLf)
    result.Append("vbCr").Append(vbCr)
    result.Append("vbLf").Append(vbLf)
    result.Append("Chr(13)").Append(Chr(13))
    result.Append("Chr(10)").Append(Chr(10))
    result.Append("Chr(13) & Chr(10)").Append(Chr(13) & Chr(10))
    result.Append("Environment.NewLine").Append( _
        Environment.NewLine)
    result.Append("ControlChars.CrLf").Append(ControlChars.CrLf)
    result.Append("ControlChars.NewLine").Append( _
        ControlChars.NewLine)
    MsgBox(result.ToString())
Figure 5-13 shows each of these self-describing lines as displayed by the message box in the last line.
Different platforms, such as Linux and Mac OS, expect different combinations of carriage-return and line-feed characters to terminate lines in documents or in displayed text. Visual Basic 2005 defines several constants you can use that explicitly combine these characters in a variety of ways. These named constants are easily identified by their “vb” prefix.
The somewhat generic vbNewLine
constant provides a
platform-dependent end of line, but only if an application is
recompiled on each platform. Feel free to substitute any of the others
if you find them more suitable.
The Environment.NewLine property is not a constant. Instead, this property polls the current operating system and returns the correct sequence of characters. This is your best choice when you want to compile a .NET application on one platform but run it on another.
The StreamWriter object has a property named NewLine, which can be altered to change its default end-of-line definition. This lets you change the set of characters inserted into the stream at the end of each call to the StreamWriter’s WriteLine() method. This can be handy, for example, if you wish to automate double spacing of lines.
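The double-spacing idea might look something like the following sketch, which writes to an in-memory stream so the result can be inspected as a string. The module name DoubleSpaceDemo and the function DoubleSpace() are hypothetical; in a real application the StreamWriter would more likely be attached to a file.

```vbnet
Imports System.IO
Imports System.Text

Module DoubleSpaceDemo
    ' Hypothetical helper: writes each line through a StreamWriter
    ' whose NewLine property has been doubled, and returns the
    ' text that was written.
    Public Function DoubleSpace(ByVal lines() As String) As String
        Using buffer As New MemoryStream()
            Using writer As New StreamWriter(buffer)
                ' Every WriteLine() now ends with two line endings.
                writer.NewLine = Environment.NewLine & _
                    Environment.NewLine

                For Each line As String In lines
                    writer.WriteLine(line)
                Next line

                ' Push buffered text into the stream before reading.
                writer.Flush()
                Return Encoding.UTF8.GetString(buffer.ToArray())
            End Using
        End Using
    End Function
End Module
```

Because only the NewLine property changes, the code that calls WriteLine() needs no knowledge of the extra spacing.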
Recipe 5.19 makes use of line endings in its adjustment of a string.
You need to find and replace all occurrences of a substring in a larger string.
The following example replaces all occurrences of lowercase “ing” with uppercase “ING” in a sample string:
    Dim quote As String = "The important thing is not to " & _
        "stop questioning. --Albert Einstein"
    Dim result As String = quote.Replace("ing", "ING")
    MsgBox(result)
Figure 5-14 shows the results, where two occurrences were found and replaced.
In this example, the substrings are replaced with a new string of the same length, but the replacement string can be of differing length. In fact, a useful technique is to make a replacement with a zero-length string, effectively deleting all occurrences of a given substring. For example, the following code, applied to the original string, results in the shortened string displayed in Figure 5-15:
result = quote.Replace("not to stop ", "")
Recipe 5.21 shows how to remove characters from the start and end of a string.
You want to insert a character or string into another string at a given location.
The String object’s Insert() method inserts a string at a given location. Because Visual Basic automatically widens a Char to a String, you can pass either a character or a string as the text to insert. For example, the following Insert() method adds a comma just after the word “thing” in the sample string:
Dim quote As String = "The important thing is not to " & _
   "stop questioning. --Albert Einstein"
Dim result As String = quote.Insert(19, ","c)
MsgBox(result)
Figure 5-16 shows the result of inserting the comma character.
In this case the character is inserted after the 19th character
of the string, or just after the “g” in “thing.” You can
insert a character in the first position of a string by using position
0, and at the end of a string by using the string’s Length
value.
The following code inserts the word “definitely ” into the sample string. The inserted text includes a space at the end to keep the words spaced correctly in the result:
Dim quote As String = "The important thing is not to " & _
   "stop questioning. --Albert Einstein"
quote = quote.Insert(23, "definitely ")
MsgBox(quote)
Position 23 in the original string falls just before the “n” in “not,” immediately after the space that follows “is.” Figure 5-17 shows the result of this word insertion.
Recipe 5.18 also discusses text insertions.
You want to insert a complete line of text in a string that contains multiple lines separated by newlines. The desired insertion point is after the nth line.
Sample code folder: Chapter 05\InsertLine
Split the string into a string array using the newlines as the split point, attach the line to be inserted to the appropriate string in the array, and use Join() to glue the string back together again.
Use the string function Split(), which is not to be confused with the String.Split() method, to split the string into a string array. The Split() method splits the string at individual-character split points, but the Split() function lets you split the string using a multicharacter string for the defined split point. The vbNewLine constant is actually a two-character string, so you must use the Split() function to avoid splitting on the carriage-return character only, leaving the line-feed character at the front end of each array string.
Rather than redimensioning the string array to shuffle the lines
and create a slot in which to insert the new one, it’s easier to just
concatenate the new string, accompanied by a newline constant, to the
appropriate string in the array. This is a simpler and more efficient
procedure that involves less shuffling of string data in memory, and
the results after doing a Join()
are identical.
This insert functionality works well as a standalone function, which is presented in the following lines of code:
Public Function InsertLine(ByVal source As String, _
      ByVal lineNum As Integer, _
      ByVal lineToInsert As String) As String
   ' ----- Insert a line in the middle of a set of lines.
   Dim lineSet() As String
   Dim atLine As Integer

   ' ----- Break the content into multiple lines.
   lineSet = Split(source, vbNewLine)

   ' ----- Determine the new location, being careful not
   '       to fall off the edge of the line set.
   atLine = lineNum
   If (atLine < 0) Then atLine = 0
   If (atLine >= lineSet.Length) Then
      ' ----- Append to the end of everything.
      lineSet(lineSet.Length - 1) &= vbNewLine & lineToInsert
   Else
      ' ----- Insert before the specified line.
      lineSet(atLine) = _
         lineToInsert & vbNewLine & lineSet(atLine)
   End If

   ' ----- Reconnect and return the parts.
   Return Join(lineSet, vbNewLine)
End Function
The string is first split at line boundaries into a string array. The lineNum parameter is the number of the line after which the lineToInsert string is inserted. You can pass zero to this parameter to insert the new line before the first one. After attaching the new string to the appropriate string in the array, along with a vbNewLine to separate it from the original line, the array is glued back together with the Join() function, using a vbNewLine between each line to restore its original structure. This new string is then returned as the result of the InsertLine() function.
The following lines of code demonstrate the function’s use:
Dim result As New System.Text.StringBuilder
result.AppendLine("This string")
result.AppendLine("contains")
result.AppendLine("several")
result.AppendLine("lines")
result.Append("of text.")

' ----- Show the original content.
Dim resultAsString As String = result.ToString()
MsgBox(resultAsString)

' ----- Show the modified content.
resultAsString = InsertLine(resultAsString, 3, "(inserted)")
MsgBox(resultAsString)
A StringBuilder is used to build the original string containing several lines of text separated by vbNewLines. The first message box (displayed in Figure 5-18) shows the string before the extra line is inserted. The second message box (displayed in Figure 5-19) shows the new string inserted after the third line.
The Split()
method will
accept either a character or a string to define the split points in
a string, but only the first character of the string is used. The
Split()
function, however, uses
the entire string parameter, of any length, to split the string.
Both the Split()
method and the
Split()
function are very handy,
but make sure you understand the difference in the way they
work.
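To make the difference concrete, here is a minimal sketch that splits the same two-character line ending both ways. It assumes a console module; the module and function names are hypothetical, and vbCrLf is used explicitly so the separator is unambiguous.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module SplitDemo
    ' Splits on the full two-character vbCrLf sequence using the
    ' Visual Basic Split() function.
    Public Function SplitWithFunction(ByVal source As String) As String()
        Return Split(source, vbCrLf)
    End Function

    ' Splits on the carriage-return character only, using the
    ' String.Split() method; each later piece keeps a leading line feed.
    Public Function SplitWithMethod(ByVal source As String) As String()
        Return source.Split(ControlChars.Cr)
    End Function

    Sub Main()
        Dim sample As String = "one" & vbCrLf & "two" & vbCrLf & "three"
        Dim viaFunction() As String = SplitWithFunction(sample)
        Dim viaMethod() As String = SplitWithMethod(sample)

        ' The function yields "two"; the method yields a line feed plus "two".
        Console.WriteLine(viaFunction(1).Length)
        Console.WriteLine(viaMethod(1).Length)
    End Sub
End Module
```

Comparing the lengths of the second element is a quick way to spot a stray line-feed character left behind by a single-character split.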
Recipe 5.17 also
discusses text insertions. The difference between the Split()
method and the Split()
function is further discussed in
Recipe 5.44.
You want to double-space a string comprised of multiple lines of text separated by newlines.
Use the String object’s Replace() method to replace all vbNewLines with two vbNewLines.
The Replace()
method provides
an easy solution to this problem. Simply replace each occurrence of a
vbNewLine
separating the lines of
text with a double vbNewLine
:
content = content.Replace(vbNewLine, vbNewLine & vbNewLine)
Figures 5-20 and 5-21 show a multiline example string before and after this replacement.
Recipe 5.16 shows how to replace specific substrings within a larger string.
You want to format a number into a string suitable for displaying or printing, something that provides formatting control beyond the defaults.
Sample code folder: Chapter 05\FormatNumbers
Apply the String
object’s
Format()
method, and use its custom
formatting codes to get the output you desire.
There are several ways and places in Visual Basic 2005 to apply
formatting to numerical data. One of the best (and possibly the
easiest to remember) is the Format()
method, available as a shared
method of the String
object. A few
simple examples will show you how to use this method:
Dim intValue As Integer = 1234567
Dim floatValue As Double = Math.PI
Dim result As New System.Text.StringBuilder
result.AppendLine(String.Format("{0} … {1}", _
   intValue, floatValue))
result.AppendLine(String.Format("{0:N} … {1:E}", _
   intValue, floatValue))
result.AppendLine(intValue.ToString("N5") & " … " & _
   floatValue.ToString("G5"))
MsgBox(result.ToString())
This example formats an Integer
and a Double
in several different ways. Other
numerical values, such as Long, Short,
Single, Decimal
, and so on, can be formatted in the same
ways. Figure 5-22 shows
the result of applying the above formatting.
The Format() method’s first argument is a formatting string that indicates how to use the remaining arguments. It can include zero or more zero-based position specifiers in curly braces. For instance, the text {1} says to insert the second data argument at that position. Consider this line of code:
result = String.Format( _
   "There are about {0} days in {1} years.", _
   365.25 * 3, 3, 17)
The first indexed specifier, {0}, inserts the first data argument, the calculated result of 365.25 * 3. The second indexed formatting specifier, {1}, inserts the integer value 3 at that spot in the resulting string. The argument list also includes a third data element, 17, but because {2} does not appear in the format string, that argument is ignored.
You can use as many indexed formatting specifiers as you want in a single string, but you should always provide a matching argument in the method call following the string; referencing an index that has no corresponding argument causes a FormatException at runtime. Index numbering always starts at zero. You can use the same argument more than once, you can use them in any order, and you can even skip some arguments. The important thing to remember is to match carefully the index number in the braces with the argument’s position, starting with zero.
When the index appears in the braces by itself, a default format is used. However, there are many formatting options available to customize the formatting. In the previous sample code, the {0:N} formatted the number to contain commas between every third digit, and {1:E} formatted the number using scientific notation. The Visual Studio online help documentation for the Format() method lists the many formatting options in detail.
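The following sketch tries a few of the standard numeric format specifiers side by side. It is an illustration only: the module and helper names are hypothetical, and it passes CultureInfo.InvariantCulture (an addition not used in the recipe) so that separators do not change with the machine's regional settings.

```vbnet
Imports System
Imports System.Globalization

Module FormatDemo
    ' Formats one value with one format specifier, always using the
    ' invariant culture so the output does not vary by machine.
    Public Function FormatInvariant(ByVal specifier As String, _
            ByVal value As Object) As String
        Return String.Format(CultureInfo.InvariantCulture, _
            "{0:" & specifier & "}", value)
    End Function

    Sub Main()
        ' N: digit grouping with two decimal places by default.
        Console.WriteLine(FormatInvariant("N", 1234567))
        ' E: scientific notation.
        Console.WriteLine(FormatInvariant("E", Math.PI))
        ' X: hexadecimal (integer types only).
        Console.WriteLine(FormatInvariant("X", 255))
        ' G5: general format, five significant digits.
        Console.WriteLine(FormatInvariant("G5", Math.PI))
    End Sub
End Module
```

Omitting the culture argument, as the recipe does, formats with the current user's culture instead, which is usually what you want for text shown on screen.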
You might have noticed that the last formatting line in the
example is quite different from the previous ones. If you want to
format a number into a string format without directly inserting it
into a bigger string, you can use the many formatting options of the
ToString()
method, a method
available to every .NET object (although specially overloaded for the
numeric data types). In our example, the first number was formatted
using “N5”, which inserts commas and formats the digits to five places
after the decimal point. The second number was formatted using “G5”,
causing “general” formatting of the number to five significant
digits.
There are other formatting options for creating hexadecimal strings, formatting dates and times, formatting culture-specific data such as currency values, and so on. Several of these formatting options are used throughout this book. See the Visual Studio online documentation for specific predefined and custom format strings.
See the “String.Format” and “NumberFormatInfo Class” topics listed in the Visual Studio online help index. There are many links to related information, so plan to explore the help content for a while.
You need to delete extraneous characters from each end of a string.
Use the String
object’s
Trim()
method, passing to it a list of all
characters to be deleted.
The following example deletes four letters from the head and
tail ends of a string. The letters chosen are just for demonstrating
how the Trim()
method works; a
real-world example of where this might be handy would be to remove
line numbers, colons, or other characters from the beginnings or ends
of strings. As shown in Figure
5-23, the following code causes the entire first word (“The”)
and the last character (“n”) to be removed, or trimmed, from the
string:
Dim quote As String = "The important thing is not to " & _
   "stop questioning. --Albert Einstein"
Dim trimChars() As Char = {"T"c, "h"c, "e"c, "n"c}
Dim result As String = quote.Trim(trimChars)
MsgBox(result)
You do not need to supply the characters in any particular order; all supplied characters will be trimmed. Trimming continues until the first and last characters of the string are something other than those supplied to the Trim() method. If you supply no arguments to Trim(), all whitespace characters are trimmed instead.
If you want to trim certain characters from either the start or
end of the string, but not both, use the TrimStart()
and TrimEnd()
methods, respectively. They accept
the same character-array argument as the Trim()
method.
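As a small illustration of one-sided trimming, the sketch below strips a line-number prefix from the start of a string while leaving digits at the end alone. The helper name and the sample text are hypothetical.

```vbnet
Imports System

Module TrimDemo
    ' Hypothetical helper: strips leading digits, periods, and spaces,
    ' such as a line-number prefix, while leaving the tail untouched.
    Public Function StripLinePrefix(ByVal source As String) As String
        Dim prefixChars() As Char = "0123456789. ".ToCharArray()
        Return source.TrimStart(prefixChars)
    End Function

    Sub Main()
        ' The trailing "10" survives because only the start is trimmed.
        Console.WriteLine(StripLinePrefix("120. Dim x As Integer = 10"))
    End Sub
End Module
```

Calling Trim() with the same character array would also remove the trailing digits, which is why TrimStart() is the safer choice here.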
Recipes 5.14 and 5.16 discuss related techniques.
You want to check a string variable to see whether it has been assigned a value, or if it can be converted to a number, date, or time. This check can prevent an exception, and it can free your code from having to use an exception as part of its testing logic.
Sample code folder: Chapter 05\StringTypes
Visual Basic 2005 has three string functions that help solve this problem: IsNothing(), IsNumeric(), and IsDate(). Use these to test a string’s contents before attempting conversions.
The following code demonstrates the use of these three functions
with data set to Nothing
:
Dim theData As String = Nothing
Dim result As New System.Text.StringBuilder

' ----- Format nothing.
result.AppendLine(String.Format( _
   "IsNumeric({0}) … {1}", theData, IsNumeric(theData)))
result.AppendLine(String.Format( _
   "IsDate({0}) … {1}", theData, IsDate(theData)))
result.AppendLine(String.Format( _
   "IsNothing({0}) … {1}", theData, IsNothing(theData)))
result.AppendLine()
A string variable that has never been assigned a value holds Nothing. We specifically assigned theData the value Nothing in the above code, but if we had left off the assignment, Visual Studio would have questioned our motives and marked the first use of theData with a warning, as shown in Figure 5-24. As you can see, the unassigned string variable has squiggly lines under it, indicating a problem; hovering the mouse pointer over it causes the displayed explanation to pop up. This is a nonfatal warning, and the program will still run.
As shown in the first three lines of output displayed in Figure 5-25 (below), in this
case the IsNumeric()
and IsDate()
functions verify that the string
does not represent a valid number or date, but it does pass the
IsNothing()
test, as
expected.
Next, the string is assigned a value that represents a valid number:
' ----- Format a number in a string.
theData = "-12.345"
result.AppendLine(String.Format( _
   "IsNumeric({0}) … {1}", theData, IsNumeric(theData)))
result.AppendLine(String.Format( _
   "IsDate({0}) … {1}", theData, IsDate(theData)))
result.AppendLine(String.Format( _
   "IsNothing({0}) … {1}", theData, IsNothing(theData)))
result.AppendLine()
When the three tests are repeated, they match expectations. As shown in the middle three lines of output in Figure 5-25, the IsNumeric() test now returns True, and the IsDate() and IsNothing() tests return False.
Finally, the string is assigned a valid date, and the three tests are repeated for the last time:
' ----- Format a date in a string.
theData = "July 17, 2007"
result.AppendLine(String.Format( _
   "IsNumeric({0}) … {1}", theData, IsNumeric(theData)))
result.AppendLine(String.Format( _
   "IsDate({0}) … {1}", theData, IsDate(theData)))
result.Append(String.Format( _
   "IsNothing({0}) … {1}", theData, IsNothing(theData)))

MsgBox(result.ToString())
In this last case the IsDate()
function returns True
, and the other two tests return
False
, as shown in the last three
lines of output in Figure
5-25.
Recipes 5.24 and 5.25 show how to examine content for correct processing.
You need to convert string data to and from byte arrays using an encoding method matched to your data, environment, or culture.
Sample code folder: Chapter 05\Encoding
Use System.Text.Encoding
shared functions to convert between strings and byte arrays, using
either UTF7, UTF8, Unicode, or UTF32 encoding, as appropriate.
The following code starts with a sample string and then converts
it to four byte arrays, one for each type of encoding. The length of
each byte array will vary as a function of the encoding (to be
explained in more detail later), so the Length
property of each array is formatted
into a StringBuilder
for display at
the end of the code. The four byte arrays are then converted back to
Strings
, using the same encoding in
each case, and a quick check is made to verify that the resulting
strings match the original:
Dim quote As String = "The important thing is not to " & _
   "stop questioning. --Albert Einstein"
Dim result As New System.Text.StringBuilder

' ----- Convert a string to various formats.
Dim bytesUTF7 As Byte() = _
   System.Text.Encoding.UTF7.GetBytes(quote)
Dim bytesUTF8 As Byte() = _
   System.Text.Encoding.UTF8.GetBytes(quote)
Dim bytesUnicode As Byte() = _
   System.Text.Encoding.Unicode.GetBytes(quote)
Dim bytesUTF32 As Byte() = _
   System.Text.Encoding.UTF32.GetBytes(quote)

' ----- Show the converted results.
result.Append("bytesUTF7.Length = ")
result.AppendLine(bytesUTF7.Length.ToString())
result.Append("bytesUTF8.Length = ")
result.AppendLine(bytesUTF8.Length.ToString())
result.Append("bytesUnicode.Length = ")
result.AppendLine(bytesUnicode.Length.ToString())
result.Append("bytesUTF32.Length = ")
result.AppendLine(bytesUTF32.Length.ToString())

' ----- Convert everything back to standard strings.
Dim fromUTF7 As String = _
   System.Text.Encoding.UTF7.GetString(bytesUTF7)
Dim fromUTF8 As String = _
   System.Text.Encoding.UTF8.GetString(bytesUTF8)
Dim fromUnicode As String = _
   System.Text.Encoding.Unicode.GetString(bytesUnicode)
Dim fromUTF32 As String = _
   System.Text.Encoding.UTF32.GetString(bytesUTF32)

' ----- Check for conversion issues.
If (fromUTF7 <> quote) Then _
   Throw New Exception("UTF7 Conversion Error")
If (fromUTF8 <> quote) Then _
   Throw New Exception("UTF8 Conversion Error")
If (fromUnicode <> quote) Then _
   Throw New Exception("Unicode Conversion Error")
If (fromUTF32 <> quote) Then _
   Throw New Exception("UTF32 Conversion Error")

MsgBox(result.ToString())
All strings in .NET are stored internally as sequences of two-byte UTF-16 code units. When you convert a string to a byte array, the encoding you choose determines how many bytes each character occupies.
UTF7 encoding uses only the lower seven bits of each byte, leaving the highest-order bit as zero in all cases. Most ASCII characters, with values in the range 0 to 127 (which covers the normal range of English-language displayable and printable characters), are written as a single byte each. Characters outside that range are written as multibyte sequences that still use only seven-bit values.
UTF8 also stores each ASCII character in the range 0 to 127 as a single byte, but it uses all eight bits of each byte to encode everything else. Characters with values of 128 and above are written as sequences of two to four bytes.
The encoding that .NET calls Unicode is UTF-16, which uses two bytes for most characters. Standard ASCII characters still fall within the same 0 to 127 range in Unicode, so the second byte of each character in this range is set to zero. Other languages and cultures have character sets with Unicode values greater than 255, and Visual Basic strings handle them just fine.
UTF32 is not widely used, because it requires four bytes per character. However, even two-byte Unicode occasionally requires a pair of two-byte units (a surrogate pair) to represent a single character. UTF32 covers all possible characters in a simple four-bytes-per-character way, allowing internal processing simplifications. Text is far more commonly stored in UTF8 or two-byte Unicode form; four-byte UTF32 is typically used, if at all, only while the data is in memory.
For text that is mostly ASCII, UTF8 is a good choice: it requires the same number of bytes as UTF7 for ASCII characters while still handling every Unicode character. If squeezing bytes down to a minimum is not a mandate, the Unicode encoding, which matches the internal format of .NET strings, is a safe bet.
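A quick way to check how many bytes each encoding actually produces is to ask for the byte counts directly. The sketch below does so for an ASCII letter and for an accented letter; the module and function names are hypothetical, and ChrW() is used to build the accented character so the example does not depend on the source file's own encoding.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module EncodingSizes
    ' Returns the byte counts for one string under UTF-8, UTF-16
    ' (Encoding.Unicode), and UTF-32, in that order.
    Public Function ByteCounts(ByVal source As String) As Integer()
        Dim counts(2) As Integer
        counts(0) = Encoding.UTF8.GetByteCount(source)
        counts(1) = Encoding.Unicode.GetByteCount(source)
        counts(2) = Encoding.UTF32.GetByteCount(source)
        Return counts
    End Function

    ' Writes the three counts for one labeled sample to the console.
    Private Sub Report(ByVal label As String, ByVal source As String)
        Dim counts() As Integer = ByteCounts(source)
        Console.WriteLine(label & ": UTF8=" & counts(0).ToString() & _
            ", Unicode=" & counts(1).ToString() & _
            ", UTF32=" & counts(2).ToString())
    End Sub

    Sub Main()
        Dim plain As String = "A"
        Dim accented As String = ChrW(&HE9)   ' e with an acute accent

        Report("ASCII letter", plain)
        Report("Accented letter", accented)
    End Sub
End Module
```

GetByteCount() reports only the encoded characters; it does not include any byte-order mark that a file writer might add.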
Recipe 5.11 shows how to store standard string data as byte values.
You want to determine if a character is a letter, a digit, whitespace, or any of several other types before processing it further. This can avoid unexpected exceptions, or prevent having to use an exception on purpose to help determine the type of a character.
Sample code folder: Chapter 05\CharType
Use one of the many type-testing shared methods of the Char
object.
The Char
object includes
several methods that let you determine if a character is part of a
larger general category of characters, such as the set of digits. The
following code shows many of these in operation while it creates a
handy listing of the types of all characters in the ASCII range 0 to
127:
Dim result As New System.Text.StringBuilder
Dim counter As Integer
Dim testChar As Char
Dim testHex As String
Dim soFar As Integer

' ----- Scan through the first half of the ASCII chart.
For counter = 0 To 127
   ' ----- What character will we test this time?
   testChar = Chr(counter)
   testHex = "x" & Hex(counter)

   If Char.IsLetter(testChar) Then _
      result.AppendLine(testHex & " IsLetter")
   If Char.IsControl(testChar) Then _
      result.AppendLine(testHex & " IsControl")
   If Char.IsDigit(testChar) Then _
      result.AppendLine(testHex & " IsDigit")
   If Char.IsLetterOrDigit(testChar) Then _
      result.AppendLine(testHex & " IsLetterOrDigit")
   If Char.IsLower(testChar) Then _
      result.AppendLine(testHex & " IsLower")
   If Char.IsNumber(testChar) Then _
      result.AppendLine(testHex & " IsNumber")
   If Char.IsPunctuation(testChar) Then _
      result.AppendLine(testHex & " IsPunctuation")
   If Char.IsSeparator(testChar) Then _
      result.AppendLine(testHex & " IsSeparator")
   If Char.IsSymbol(testChar) Then _
      result.AppendLine(testHex & " IsSymbol")
   If Char.IsUpper(testChar) Then _
      result.AppendLine(testHex & " IsUpper")
   If Char.IsWhiteSpace(testChar) Then _
      result.AppendLine(testHex & " IsWhiteSpace")

   ' ----- Display results in blocks of 16 characters.
   soFar += 1
   If ((soFar Mod 16) = 0) Then
      MsgBox(result.ToString())
      result.Length = 0
   End If
Next counter
The message box displays the results for 16 characters at a time. Figure 5-26 shows the output displayed for the first set of characters, and Figure 5-27 shows the results for characters with hexadecimal values in the range of some of the ASCII digits and letters.
Note that many characters fall into several categories. For example, the “0” (zero) character with hexadecimal value 30 passes the test for IsDigit, IsLetterOrDigit, and IsNumber.
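In everyday code these tests are typically applied one character at a time while scanning a string. The following sketch, with a hypothetical helper name and sample input, keeps only the characters that pass Char.IsDigit():

```vbnet
Imports System
Imports System.Text

Module CharFilterDemo
    ' Hypothetical helper: keeps only the characters that pass
    ' Char.IsDigit(), discarding everything else.
    Public Function KeepDigits(ByVal source As String) As String
        Dim result As New StringBuilder()
        For Each oneChar As Char In source
            If Char.IsDigit(oneChar) Then result.Append(oneChar)
        Next oneChar
        Return result.ToString()
    End Function

    Sub Main()
        ' Punctuation and spaces are dropped; the ten digits remain.
        Console.WriteLine(KeepDigits("(555) 123-4567"))
    End Sub
End Module
```

Swapping in another test, such as Char.IsLetter() or Char.IsWhiteSpace(), changes the filter without changing the loop.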
Recipe 5.22 includes examples of verifying logical data within strings, instead of the individual characters.
You want to convert string data to several types of numeric or date/time variables in a consistent way.
Sample code folder: Chapter 05\ParseString
Use the shared Parse() method provided by the numeric and date/time data types in Visual Basic 2005.
The Parse() method is the counterpart to each object’s ToString() method. That is, the string created by calling an object’s ToString() method will always be in a format suitable for converting back to the same type of object using its Parse() method. A few examples can help clarify this:
Dim doubleParse As Double = Double.Parse("3.1416")
Dim ushortParse As UShort = UShort.Parse("65533")
Dim dateParse As Date = Date.Parse("December 25, 2007")
MsgBox(String.Format( _
   "doubleParse: {0}{3}ushortParse: {1}{3}dateParse: {2}", _
   doubleParse, ushortParse, dateParse, vbNewLine))
As shown in Figure 5-28, the data items are stored in the variables as expected when they are parsed.
In many cases, you might want to first check the string to make
sure it can be parsed to the desired type of variable before making
any attempt to do so. For example, use the IsDate()
function to test a string to make
sure it can be converted successfully before calling a Date
variable’s Parse()
method to parse the date from the
string. If the string is not convertible to the indicated data type,
an exception will occur.
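Another way to avoid that exception, sketched here as an alternative rather than as the recipe's own method, is the TryParse() method that the .NET 2.0 numeric and date types provide. It combines the test and the conversion in one call and returns False instead of throwing. The helper name is hypothetical, and the invariant culture is an assumption made so the decimal point is read the same way on every machine.

```vbnet
Imports System
Imports System.Globalization

Module TryParseDemo
    ' Returns True and fills in "parsed" when the text is a valid
    ' invariant-culture floating-point number; otherwise returns False.
    Public Function TryReadDouble(ByVal text As String, _
            ByRef parsed As Double) As Boolean
        Return Double.TryParse(text, NumberStyles.Float, _
            CultureInfo.InvariantCulture, parsed)
    End Function

    Sub Main()
        Dim value As Double
        If TryReadDouble("3.1416", value) Then
            Console.WriteLine("Parsed: " & _
                value.ToString(CultureInfo.InvariantCulture))
        End If
        If Not TryReadDouble("not a number", value) Then
            Console.WriteLine("Could not parse the second string.")
        End If
    End Sub
End Module
```

Date.TryParse() and Integer.TryParse() follow the same pattern for dates and whole numbers.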
Recipe 5.22 discusses additional content-verification methods.
You want to concatenate strings quickly and efficiently.
Sample code folder: Chapter 05\Concatenate
Use the &= concatenation shortcut, or, even better, use a
StringBuilder
.
Visual Basic 2005 offers a few tricks for working with strings more efficiently. The following code presents several helpful techniques, from least to most efficient.
This approach simply concatenates two words and assigns the resulting string to a string variable:
Dim quote As String
quote = "The " & "important "
This is how additional string data was always concatenated to the end of a string in VB 6 and earlier versions of the BASIC language:
quote = quote & "thing "
Because .NET strings are immutable, this code copies the current contents of quote to a new location in memory, then copies the short string "thing " to its tail end, and finally assigns the address of the resulting string to the quote variable, marking the previous contents of quote for garbage collection. By the time you’ve repeated this type of command a few times to concatenate more strings to the tail end of quote, a lot of bytes have gotten shuffled in memory.
This newer technique, available in Visual Basic 2005, provides an improved syntax, although timing tests seem to indicate that a lot of string data is still being shuffled in memory:
quote &= "is not to stop questioning. "
quote &= "--Albert Einstein"
The StringBuilder
is by far
the better way to proceed when concatenating many strings end to end,
and you’ll find a lot of examples of its use in this book. As shown
here, you can run the Append()
method on the results of another Append()
, which may or may not make it
easier to read the code:
Dim result As New _
   System.Text.StringBuilder("The important thing ")
result.Append("is questioning. ")
result.Append("--").Append("Albert ").Append("Einstein")
As explained in Recipe
5.1, the StringBuilder
maintains an internal buffer of characters, not a true string, and the
buffer grows by doubling in size whenever room runs out during an
Append()
operation. String data is
concatenated in place in memory, which keeps the total clock cycles
for concatenation way down compared to standard string
techniques.
Just to round things out, these last few lines show some of the
additional commands available when working with a StringBuilder
:
result.Insert(23, "note to stop ")
result.Replace("note", "not")
result.Insert(0, quote & vbNewLine)
MsgBox(result.ToString())
These lines complete the building of the string data displayed
by the message box shown in Figure 5-29. The two strings
demonstrate that identical results are obtained even after we’ve
manipulated the StringBuilder
’s
contents.
Recipe 5.1 and
Recipe 5.27 discuss
the StringBuilder
class in more
detail.
You want to see a timing-test-based example that shows just how
much faster a StringBuilder
can be
than standard string concatenation.
Sample code folder: Chapter 05\StringTime
Create a short routine to concatenate the string values of the numbers 1 to 15,000, first using direct concatenation to a string variable and then using a StringBuilder. Use Date variables to calculate elapsed time for each loop in milliseconds, and display the results of each for comparison.
Here’s the code for doing the timing test. The two contestants
are ready for the race. content is a conventional immutable string,
and result
is the highly acclaimed
StringBuilder
challenger:
Dim content As String = ""
Dim result As New System.Text.StringBuilder
The supporting cast of characters is ready to rally to the
cause. Here, counter
is a loop
counter, dateTime1
through dateTime3
are Date
variables to hold instants in time, and
loopCount
provides the number of
laps for the race:
Dim counter As Integer
Dim dateTime1 As Date
Dim dateTime2 As Date
Dim dateTime3 As Date
Dim loopCount As Integer = 15000
The flag is waved to start the race, and the starting time is noted very accurately:
Me.Cursor = Cursors.WaitCursor
dateTime1 = Now
The first contestant runs all the loops, concatenating the
string representations of the numbers for each lap into one big string
named content
. The time of
completion is carefully noted:
For counter = 1 To loopCount
   content &= counter.ToString()
Next counter
dateTime2 = Now
The StringBuilder
now runs
the same laps, appending the same strings in its internal buffer. The
time at completion is accurately noted:
For counter = 1 To loopCount
   result.Append(counter.ToString())
Next counter
dateTime3 = Now
The flag drops, signaling the crossing of the finish line for both contestants:
Me.Cursor = Cursors.Default
In a moment, the results of the race appear:
content = String.Format( _
   "First loop took {0:G4} ms, the second took {1:G4} ms.", _
   dateTime2.Subtract(dateTime1).TotalMilliseconds, _
   dateTime3.Subtract(dateTime2).TotalMilliseconds)
MsgBox(content)
The results are shown in the message box displayed in Figure 5-30. Due to differences between systems, your results may vary.
To be fair, this race was highly contrived to help point out the
difference in operational speed between string concatenation and
StringBuilder
appending. If you
create a loop in which the same strings are used each time, the timing
is much more equal. This is because Visual Basic handles immutable
strings very intelligently, reusing existing strings whenever possible
and hence speeding up repetitive operations involving the same data.
The test shown here creates a unique string for each concatenation by
converting the loop index number to a string, forcing a lot of extra
string creation and storage in memory during the loops.
When running this test yourself, you might need to adjust the value of loopCount for your system. If the race seems to take too long, stop the program manually and adjust loopCount to a value a few thousand lower; if the race is too fast, resulting in an apparent elapsed time of 0 ms for the StringBuilder, bump up loopCount by a few thousand, and try again.
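If the Date-based timings come back as 0 ms, one alternative is the System.Diagnostics.Stopwatch class introduced in .NET 2.0, which typically offers finer resolution than Now. The following is a sketch under that assumption, written as a console module with hypothetical function names rather than as the recipe's form-based code.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Text

Module StopwatchTiming
    ' Times loopCount appends using &= concatenation; returns milliseconds.
    Public Function TimeConcatenation(ByVal loopCount As Integer) As Long
        Dim content As String = ""
        Dim watch As Stopwatch = Stopwatch.StartNew()
        For counter As Integer = 1 To loopCount
            content &= counter.ToString()
        Next counter
        watch.Stop()
        Return watch.ElapsedMilliseconds
    End Function

    ' Times the same appends using a StringBuilder; returns milliseconds.
    Public Function TimeStringBuilder(ByVal loopCount As Integer) As Long
        Dim result As New StringBuilder()
        Dim watch As Stopwatch = Stopwatch.StartNew()
        For counter As Integer = 1 To loopCount
            result.Append(counter.ToString())
        Next counter
        watch.Stop()
        Return watch.ElapsedMilliseconds
    End Function

    Sub Main()
        Console.WriteLine("Concatenation: " & _
            TimeConcatenation(15000).ToString() & " ms")
        Console.WriteLine("StringBuilder: " & _
            TimeStringBuilder(15000).ToString() & " ms")
    End Sub
End Module
```

As with the original test, the absolute numbers depend on the machine; only the relative difference between the two loops is meaningful.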
Recipe 5.1 and
Recipe 5.26 provide
additional discussion of strings and StringBuilder
instances.
You need to count occurrences of a specific word or substring in a string.
Sample code folder: Chapter 05\CountSubstring
There are three standard approaches to this problem:
Use the regular expression object (System.Text.RegularExpressions.Regex) to provide a count of the number of matches on the string.
Use the Split()
function
to split the string using the specific substring as a split point,
then use the length of the resulting string array to determine the
count.
Loop through the string using the
IndexOf()
method to find all occurrences
of the substring.
This recipe’s sample code presents all three techniques. You can decide, based on your specific programming task, which will work best for you. Here’s the setup:
Imports System.Text.RegularExpressions

' …Later, in a method…
Dim quote As String = "The important thing is not to " & _
   "stop questioning. --Albert Einstein"
Dim count1 As Integer
Dim count2 As Integer
Dim count3 As Integer
With the first technique, the
Regex.Matches()
method returns a collection
of matches on the searched-for string, and the collection’s Count
property provides the number we
want:
count1 = Regex.Matches(quote, "(in)+").Count
The second technique splits the string using the searched-for
string as the split point. The result of the split is a string array,
and its Length
is one greater than
the number of split points where each substring occurred:
count2 = Split(quote, "in").Length - 1
The third technique involves a little more coding, but no string
data is shuffled in memory during the search, resulting in an
efficient way to locate and count each occurrence of the searched-for
string. The IndexOf()
method
searches for the next occurrence of a string within another,
optionally starting the search at an indexed location within the
string:
Dim content As String = "in"
Dim position As Integer = -content.Length
Do
   position = quote.IndexOf(content, position + content.Length)
   If (position < 0) Then Exit Do
   count3 += 1
Loop
This lets the search proceed from occurrence to occurrence until IndexOf() runs out of matches and returns an index of -1. The count3 variable keeps count of the number of times the IndexOf() search is successful, providing a count of the occurrences.
The last line of the example code formats and displays the three counts, as shown in Figure 5-31:
MsgBox(String.Format( _
   "{0}{3}{1}{3}{2}", count1, count2, count3, vbNewLine))
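The IndexOf() technique can be wrapped in a reusable function. The sketch below is one possible packaging, not the recipe's own code: the function name is hypothetical, it counts non-overlapping occurrences only, and it adds StringComparison.Ordinal (an assumption) so the search is case-sensitive and independent of the current culture.

```vbnet
Imports System

Module CountDemo
    ' Counts non-overlapping occurrences of "target" within "source"
    ' using an ordinal (culture-independent, case-sensitive) comparison.
    Public Function CountOccurrences(ByVal source As String, _
            ByVal target As String) As Integer
        Dim count As Integer = 0
        If String.IsNullOrEmpty(source) OrElse _
           String.IsNullOrEmpty(target) Then Return 0

        Dim position As Integer = _
            source.IndexOf(target, StringComparison.Ordinal)
        Do While position >= 0
            count += 1
            ' Resume just past the match that was found.
            position = source.IndexOf(target, _
                position + target.Length, StringComparison.Ordinal)
        Loop
        Return count
    End Function

    Sub Main()
        Dim quote As String = "The important thing is not to " & _
            "stop questioning. --Albert Einstein"
        Console.WriteLine(CountOccurrences(quote, "in"))
    End Sub
End Module
```

The guard against an empty search string matters: without it, a zero-length target would match at every position and the loop would never advance.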
You want to pad a string with spaces (or some other character) either on the head end, the tail end, or both ends, such that the resulting string is n characters in total length.
Sample code folder: Chapter 05\PadString
Use the String.PadLeft() and String.PadRight() methods to pad the head and tail ends of the string, respectively, and use a calculated combination of these two methods to pad the string on both ends.
The PadLeft()
and PadRight()
methods take a count value that
defines the target length of the string after sufficient spaces are
concatenated to it. An optional second parameter provides a character
to use for the padding if you want something other than spaces to be
used. In the first block of code the default space characters are used
for the padding:
Dim content1 As String
Dim content2 As String
Dim content3 As String
Dim content4 As String

content1 = "Not padded"
content2 = "PadLeft".PadLeft(50)
content3 = "PadRight".PadRight(50)
content4 = "PadCenter"
content4 = content4.PadLeft((50 + _
   content4.Length) \ 2).PadRight(50)

MsgBox(String.Format("{0}{4}{1}{4}{2}{4}{3}", _
   content1, content2, content3, content4, vbNewLine))
The PadCenter()
calculation adds half of the
required padding characters to the head end of the string, then pads
out the right end to the target length. The PadLeft()
method is applied to the string
first, and the PadRight()
method is
applied to the result, all in a single line. Figure 5-32 shows the strings
with the padding causing the text to align to the left, right, and
middle, depending on where the padding was applied.
Padding with spaces is often what you want to do in a real-world application, but for display purposes it isn’t very helpful. In Figure 5-32, for instance, you can’t tell that “PadRight” has been padded with trailing spaces to a total length of 50. Therefore, let’s recode this example, padding the strings with periods instead:
content1 = "Not padded"
content2 = "PadLeft".PadLeft(50, "."c)
content3 = "PadRight".PadRight(50, "."c)
content4 = "PadCenter"
content4 = content4.PadLeft((50 + content4.Length) \ 2, _
   "."c).PadRight(50, "."c)

MsgBox(String.Format("{0}{4}{1}{4}{2}{4}{3}", _
   content1, content2, content3, content4, vbNewLine))
In this case, the same padding takes place, but with a period for the padding character. Figure 5-33 shows the result, which is more meaningful than Figure 5-32.
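If you center strings often, the calculation can be wrapped in a small function. This is only a sketch: PadCenter is a hypothetical helper name (the .NET String class has no such method), and it assumes that strings already at or beyond the target width should be returned unchanged.

```vb
Module PadDemo
   ' Hypothetical helper: centers source within totalWidth
   ' characters, padding both ends with padChar.
   Public Function PadCenter(ByVal source As String, _
         ByVal totalWidth As Integer, _
         ByVal padChar As Char) As String
      ' ----- Strings already at or past the width are unchanged.
      If (source.Length >= totalWidth) Then Return source

      ' ----- Pad the head to the halfway length, then pad the
      '       tail out to the full width.
      Dim leftWidth As Integer = (totalWidth + source.Length) \ 2
      Dim result As String = source.PadLeft(leftWidth, padChar)
      Return result.PadRight(totalWidth, padChar)
   End Function

   Sub Main()
      Console.WriteLine("[" & PadCenter("PadCenter", 50, "."c) & "]")
   End Sub
End Module
```

When the padding cannot be divided evenly, integer division puts the extra character on the tail end.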
You need to convert a string’s tab characters to spaces while preserving the string’s spacing.
Sample code folder: Chapter 05\TabsToSpaces
Create a function to convert tabs to spaces in the defined way:
Public Function TabsToSpaces(ByVal source As String, _
      ByVal tabSize As Integer) As String
   ' ----- Replace tabs with space characters.
   Dim result As New System.Text.StringBuilder
   Dim counter As Integer

   For counter = 0 To source.Length - 1
      If (source.Chars(counter) = vbTab) Then
         ' ----- Add spaces up to the next tab stop.
         Do
            result.Append(Space(1))
         Loop Until ((result.Length Mod tabSize) = 0)
      Else
         result.Append(source.Chars(counter))
      End If
   Next counter
   Return result.ToString()
End Function
The trick to replacing the tabs is to insert just the right
number of spaces to preserve the original alignment of the text. Tab
characters generally shift the next character to a position that is an
exact multiple of the tab spacing. In Visual Studio, this spacing
constant is often 4, but in many text editors, and even in the Windows
Forms TextBox
control, the standard
tab spacing is 8. The sample function accepts an argument to set the
tab-spacing constant to any value.
The function uses a StringBuilder to rebuild the original string, replacing tabs with enough spaces to maintain the alignment. The Chars property of the string makes it easy to access and process each individual character from the string, and the Mod operator simplifies the math required to determine the number of spaces to insert. Because the column position is taken from the length of the rebuilt text, the function assumes a single line of text with no embedded line breaks.
This code shows the TabsToSpaces()
function in use:
Dim tabs As String = _
   "This~is~~a~tabbed~~~string".Replace("~"c, vbTab)
Dim spaces As String = TabsToSpaces(tabs, 8)
Dim periods As String = spaces.Replace(" "c, "."c)
The first line builds a string comprised of words separated by
multiple tab characters. The tilde (~) characters provide a visual way
to see where the tabs will go, and the Replace()
method replaces each tilde with a
tab.
The second statement calls the new function and places the returned string in spaces. This string contains no tab characters, but it does contain many spaces between the words.
The periods
string provides a
visual way to see the spaces more clearly. The Replace()
method in this case replaces each
space with a period.
Figure 5-34 shows
these three strings displayed on a form containing three TextBox
controls. Setting the Font
property to Courier New, a fixed-width
font, more clearly shows the alignment of the characters in the
strings. The tab-spacing constant in these text boxes is 8, which is
the value passed to TabsToSpaces()
,
correctly replacing the tabs and maintaining the original
alignment.
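For text that spans several lines, tab stops are normally measured from the start of each line. The following sketch shows one way to handle that; TabsToSpacesMultiline is a hypothetical variant, not part of the recipe's sample code, and it assumes tabSize is greater than zero.

```vb
Module TabDemo
   ' Hypothetical variant of TabsToSpaces() that tracks the
   ' column separately so each new line restarts at column 0.
   Public Function TabsToSpacesMultiline(ByVal source As String, _
         ByVal tabSize As Integer) As String
      Dim result As New System.Text.StringBuilder
      Dim column As Integer = 0

      For Each oneChar As Char In source
         If (oneChar = ControlChars.Tab) Then
            ' ----- Pad out to the next tab stop.
            Do
               result.Append(" "c)
               column += 1
            Loop Until ((column Mod tabSize) = 0)
         ElseIf (oneChar = ControlChars.Cr) OrElse _
               (oneChar = ControlChars.Lf) Then
            ' ----- A line break resets the column.
            result.Append(oneChar)
            column = 0
         Else
            result.Append(oneChar)
            column += 1
         End If
      Next oneChar
      Return result.ToString()
   End Function

   Sub Main()
      Dim tabbed As String = "a" & vbTab & "b" & vbCrLf & _
         "ccc" & vbTab & "d"
      Console.WriteLine(TabsToSpacesMultiline(tabbed, 8))
   End Sub
End Module
```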
Recipe 5.16 also discusses replacing substrings.
You want to reverse, or mirror image, the order of the characters in a string.
The StrReverse()
function
makes reversing a string simple:
Dim quote As String = "The important thing is not to " & _
   "stop questioning. --Albert Einstein"
Dim reversed As String = StrReverse(quote)
MsgBox(reversed)
Figure 5-35 shows the reversed string as displayed in the message box.
Another way to reverse a string is to process the characters
yourself. This sample code scans through the string in reverse order
and appends each found character to a new StringBuilder
instance:
Dim quote As String = "The important thing is not to " & _
   "stop questioning. --Albert Einstein"
Dim counter As Integer
Dim result As New System.Text.StringBuilder(quote.Length)

For counter = quote.Length - 1 To 0 Step -1
   result.Append(quote.Chars(counter))
Next counter
Dim reversed As String = result.ToString()
MsgBox(reversed)
The overloaded constructor for the StringBuilder
accepts an optional parameter
defining the capacity the StringBuilder
should use for its internal
character buffer. Since we know the reversed string will be the same
length as the original, the capacity can be set to exactly the amount
needed. This prevents the StringBuilder
from having to double its
capacity when it runs low on space while appending characters (see
Recipe 5.1). Using the
Chars
property of the string to
grab characters and setting the initial capacity of the StringBuilder
in this way ensures that the
character bytes are transferred in memory just once in a tight,
efficient loop.
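A third possibility, shown here only as a sketch for comparison, is to copy the characters into an array and let Array.Reverse() do the work in place. Like the loop above, it reverses individual Char values, so text containing surrogate pairs or combining marks would need extra care.

```vb
Module ReverseDemo
   Sub Main()
      Dim quote As String = "The important thing is not to " & _
         "stop questioning. --Albert Einstein"

      ' ----- Copy the characters, reverse them in place, and
      '       build a new string from the reversed array.
      Dim chars() As Char = quote.ToCharArray()
      Array.Reverse(chars)
      Dim reversed As String = New String(chars)
      Console.WriteLine(reversed)
   End Sub
End Module
```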
You want to shuffle the order of the characters in a string quickly but thoroughly.
Sample code folder: Chapter 05\StringShuffle
The best technique is to loop through each character location once, swapping the character at that location with a character at a random location anywhere in the string.
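The basic algorithm for shuffling a string, as presented here, is also good for shuffling arrays or any other ordered data. It makes a single pass through the data, so its running time is proportional to the length of the string. The quality of the results depends on the random number generator used, and because each swap partner is chosen from the whole string, some orderings come up slightly more often than others; for casual shuffling the difference is negligible.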
A walk through the code explains the process clearly. These lines declare the variables required and initialize the random number generator to a unique sequence, using the system clock for the random number generator’s seed:
Dim counter As Integer
Dim position As Integer
Dim holdChar As Char
Dim jumbleMethod As New Random
Dim quote As String = "The important thing is not to " & _
   "stop questioning. --Albert Einstein"
To manipulate the individual characters of the string, it’s best to convert the string to a character array:
Dim chars() As Char = CType(quote, Char())
This allows for swapping the characters in memory without having to make multiple copies of immutable
strings. You can directly access a string’s individual characters
using the string’s Chars
property,
but this property is read-only. In this case, we need to store new
characters into the string’s locations during each swap.
The following loop is the core of the shuffling algorithm:
For counter = 0 To chars.Length - 1
   position = jumbleMethod.Next Mod chars.Length
   holdChar = chars(counter)
   chars(counter) = chars(position)
   chars(position) = holdChar
Next counter
Each character is sequentially processed by swapping it with another character located randomly at any position in the string. This means that a character might even get swapped with itself occasionally, but that does not reduce the randomness of the results. This loop guarantees that each character gets swapped at least once, but statistically speaking each character gets swapped twice, on average.
The last two lines convert the character array back to a string and then display the result in a message box, as shown in Figure 5-36:
Dim result As String = New String(chars)
MsgBox(result)
The sample string will be shuffled into a different random order nearly every time the sample code is run.
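A closely related variant, commonly known as the Fisher-Yates shuffle, limits each swap to the positions that have not yet been finalized, which makes every ordering equally likely to the extent that the generator itself is uniform. The sketch below is not the recipe's code; ShuffleString is a hypothetical helper name.

```vb
Module ShuffleDemo
   ' Hypothetical helper: Fisher-Yates shuffle of a string.
   Public Function ShuffleString(ByVal source As String, _
         ByVal jumbleMethod As Random) As String
      Dim chars() As Char = source.ToCharArray()
      Dim counter As Integer
      Dim position As Integer
      Dim holdChar As Char

      ' ----- Walk from the end; swap each slot with a randomly
      '       chosen slot at or before it.
      For counter = chars.Length - 1 To 1 Step -1
         position = jumbleMethod.Next(counter + 1)
         holdChar = chars(counter)
         chars(counter) = chars(position)
         chars(position) = holdChar
      Next counter
      Return New String(chars)
   End Function

   Sub Main()
      Dim quote As String = "The important thing is not to " & _
         "stop questioning. --Albert Einstein"
      Console.WriteLine(ShuffleString(quote, New Random))
   End Sub
End Module
```

Random.Next(maxValue) returns a value from 0 up to but not including maxValue, which is why counter + 1 is passed: a character may still be swapped with itself.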
Recipes 6.27 and 8.5 show additional uses of random numbers.
You want to encrypt a string using a key. The encrypted result should be a displayable and printable string of standard ASCII characters.
Sample code folder: Chapter 05\EncryptString
The following short class defines a SimpleCrypt
object containing shared
functions for encrypting and decrypting a string. In addition to the
string to be encrypted or decrypted, an integer is passed to each
function to serve as the key:
Public Class SimpleCrypt
   Public Shared Function Encrypt(ByVal source As String, _
         ByVal theKey As Integer) As String
      ' ----- Encrypt a string.
      Dim counter As Integer
      Dim jumbleMethod As New Random(theKey)
      Dim sourceBytes() As Byte = _
         System.Text.Encoding.UTF8.GetBytes(source)
      ' ----- Size the key set by bytes, not characters; UTF8
      '       may use more than one byte per character.
      Dim keySet(sourceBytes.Length - 1) As Byte

      jumbleMethod.NextBytes(keySet)
      For counter = 0 To sourceBytes.Length - 1
         sourceBytes(counter) = _
            sourceBytes(counter) Xor keySet(counter)
      Next counter
      Return Convert.ToBase64String(sourceBytes)
   End Function

   Public Shared Function Decrypt(ByVal source As String, _
         ByVal theKey As Integer) As String
      ' ----- Decrypt a previously encrypted string.
      Dim counter As Integer
      Dim jumbleMethod As New Random(theKey)
      Dim sourceBytes() As Byte = _
         Convert.FromBase64String(source)
      Dim keySet(sourceBytes.Length - 1) As Byte

      jumbleMethod.NextBytes(keySet)
      For counter = 0 To sourceBytes.Length - 1
         sourceBytes(counter) = _
            sourceBytes(counter) Xor keySet(counter)
      Next counter
      Return System.Text.Encoding.UTF8.GetString(sourceBytes)
   End Function
End Class
The following code calls the shared functions of the SimpleCrypt
class to encrypt a sample string
using a key integer value of 123456789, and then decrypts the results
using the same key:
Dim quote As String = "The important thing is not to " & _
   "stop questioning. --Albert Einstein"
Dim myKey As Integer = 123456789
Dim encrypted As String = SimpleCrypt.Encrypt(quote, myKey)
Dim decrypted As String = _
   SimpleCrypt.Decrypt(encrypted, myKey)
MsgBox(quote & vbNewLine & encrypted & vbNewLine & decrypted)
The encryption function first converts the string to a byte array using UTF8 encoding. Each byte is then Xor'd with a predictable sequence of pseudorandom bytes seeded using the given key integer. Since the resulting bytes likely include values in the range of control and nonprintable characters, the byte array is then converted to a slightly longer Base64 string comprised of displayable characters.
The decryption function reverses the order of these same steps. First, the Base64 string is converted to a byte array, and the same set of pseudorandom bytes is Xor’d with these bytes to recover the bytes of the original string. Figure 5-37 shows the original string, the encrypted version of this string using a key value of 123456789, and the string that results by decrypting this Base64 string using the same key. As expected, the original string is restored.
The Random
object can return
an array of pseudorandom bytes with any desired length. This lets the
code generate the required number of bytes used in the Xor
process with only one call to the
Random
object.
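The whole scheme rests on two facts: the same seed produces the same byte sequence, and applying Xor twice with the same key byte restores the original value. The following sketch (SeedDemo is a hypothetical module name) demonstrates both. It assumes both generators run on the same version of the .NET runtime; the sequence produced for a given seed is not guaranteed to be identical across major versions.

```vb
Module SeedDemo
   Sub Main()
      Dim first(7) As Byte
      Dim second(7) As Byte
      Dim generator1 As New Random(123456789)
      Dim generator2 As New Random(123456789)

      ' ----- Same seed, same byte sequence.
      generator1.NextBytes(first)
      generator2.NextBytes(second)
      Dim same As Boolean = True
      For counter As Integer = 0 To first.Length - 1
         If (first(counter) <> second(counter)) Then same = False
      Next counter
      Console.WriteLine("Sequences match: " & same.ToString())

      ' ----- Xor with the same key byte twice restores the value.
      Dim original As Byte = 65
      Dim scrambled As Byte = original Xor first(0)
      Dim restored As Byte = scrambled Xor first(0)
      Console.WriteLine("Restored: " & (restored = original).ToString())
   End Sub
End Module
```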
The supplied key is any integer value from 0 to the maximum
value for signed integers, which is 2,147,483,647. You can use a
negative integer, but the Random
class will automatically take its absolute value as the seed.
With over two billion unique seeds, the average user won’t be able to break this simple encryption easily. For quick, simple, relatively secure encryption for typical users, this class can serve you well. However, in cryptographic circles this level of encryption is considered dangerously poor, so be sure to check out Chapter 16 if you need to use something more serious and well tested by the cryptographic community.
See Chapter 16 for more encryption topics.
Sample code folder: Chapter 05\MorseCode
Use the IndexOf() string method to look up and cross-reference characters to string array entries representing each Morse code character.
The following code converts the string “Hello world!” to a string that displays the Morse code “dahs” and “dits” for each character:
Dim source As String = "Hello world!"
Dim characters As String = _
   "~ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.,:?'-/"""
Dim morse() As String = { _
   "?", ".-", "-...", "-.-.", "-..", ".", "..-.", "--.", "....", _
   "..", ".---", "-.-", ".-..", "--", "-.", "---", ".--.", _
   "--.-", ".-.", "...", "-", "..-", "...-", ".--", "-..-", _
   "-.--", "--..", "-----", ".----", "..---", "...--", _
   "....-", ".....", "-....", "--...", "---..", "----.", _
   ".-.-.-", "--..--", "---...", "..--..", ".----.", _
   "-....-", "-..-.", ".-..-."}
Dim result As New System.Text.StringBuilder
Dim counter As Integer
Dim position As Integer

For counter = 0 To source.Length - 1
   position = characters.IndexOf(Char.ToUpper( _
      source.Chars(counter)))
   If (position < 0) Then position = 0
   result.Append(source.Substring(counter, 1))
   result.Append(Space(5))
   result.AppendLine(morse(position))
Next counter
MsgBox(result.ToString())
For most people this code is not all that useful, but there are some interesting details to be learned from this example. For instance, the second line assigns the standard set of characters covered by Morse code to a string named characters. Notice that at the tail end of this string there are three quote characters in a row. The last one terminates the string, as expected, and the pair just before the last one demonstrates how to enter a single double-quote character into a string. By doubling up the quote character, you tell the Visual Basic compiler to enter one double-quote character and not to terminate the string.
At the head of the characters string is a tilde (~) character. This is not a Morse code character, but it provides a way to catch all characters in the string to be converted that aren’t found in the set of Morse code characters. For example, in the test string “Hello world!” there’s an exclamation point, which is not defined in the table of International Morse code characters. When the IndexOf() method attempts to find this exclamation point in characters, a value of -1 is returned. This value is changed to zero, which indexes to the question-mark entry in the morse string array. Figure 5-38 shows how the sample string ends up with a question mark instead of the unavailable exclamation point.
You need to store and edit strings in an application’s resources. This makes it easy to internationalize the application by changing the strings for each culture.
To edit the resource strings in the Visual Studio environment, open the project’s properties page, and select the Resources tab on the left. Edit the table of string entries, changing the Name, Value, and Comment fields as required.
In the application, refer to each string through the My.Resources
object.
In Visual Studio, it’s very easy to maintain a table of strings in the application’s resources. Figure 5-39 shows the project’s properties page with the Resources tab selected along the left side.
The example shows two resource strings, one named Caption and the other named Text. As the following code shows, in the application these two strings are referenced by name through the My.Resources object. This code then displays a message box using the two strings from the resources, as shown in Figure 5-40:
Dim stringText As String = My.Resources.Text
Dim stringCaption As String = My.Resources.Caption
MsgBox(stringText, , stringCaption)
Other types of resources can be added, such as images, sounds,
and other files. Each of these resources is accessed in the
application through the My.Resources
object.
See Chapter 10 for an example of storing and using media files in your application’s resources.
Sample code folder: Chapter 05\UseToString
Use the ToString()
method, which is included in all
.NET objects, to return a general string for an object instance. To
get you started, the following code demonstrates the default ToString()
method on several types of
variables:
Dim someInt As Integer = 123
Dim someDouble As Double = Math.PI
Dim someString As String = "Testing"
Dim someDate As Date = #7/4/1776 9:10:11 AM#
Dim someDecimal As Decimal = 1D / 3D

Dim result As New System.Text.StringBuilder
result.Append("someInt.ToString ")
result.AppendLine(someInt.ToString())
result.Append("someDouble.ToString ")
result.AppendLine(someDouble.ToString())
result.Append("someString.ToString ")
result.AppendLine(someString.ToString())
result.Append("someDate.ToString ")
result.AppendLine(someDate.ToString())
result.Append("someDecimal.ToString ")
result.Append(someDecimal.ToString())

MsgBox(result.ToString())
Figure 5-41 shows
the results displayed by the sample code. Default formatting is used
for all these ToString()
methods.
The ToString()
method is
often overloaded to support a variety of formatting options, depending
on the type of variable. This lets you convert doubles, for instance,
to scientific or other formats. Check the Visual Studio online help
resources for the ToString()
method
for each type of variable to discover the formatting options
available.
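As a brief illustration of those overloads, the sketch below passes standard numeric format strings and a custom date format string to ToString(). It specifies the invariant culture so that the output does not vary with the regional settings of the machine; that choice, and the module name FormatDemo, are assumptions made for this example.

```vb
Imports System.Globalization

Module FormatDemo
   Sub Main()
      Dim someDouble As Double = Math.PI
      Dim someDate As Date = #7/4/1776 9:10:11 AM#
      Dim invariant As CultureInfo = CultureInfo.InvariantCulture

      ' ----- Standard numeric format strings: scientific,
      '       fixed-point, and number with group separators.
      Console.WriteLine(someDouble.ToString("E3", invariant))
      Console.WriteLine(someDouble.ToString("F2", invariant))
      Console.WriteLine(someDouble.ToString("N4", invariant))

      ' ----- Custom date and time format string.
      Console.WriteLine( _
         someDate.ToString("yyyy-MM-dd HH:mm:ss", invariant))
   End Sub
End Module
```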
All objects sport a ToString()
method because all objects
inherit it from System.Object
. An
example used repeatedly throughout this chapter is the StringBuilder
class, which returns its
internal character buffer converted to a string through its ToString()
method.
As you create your own classes, consider adding both a ToString()
method and a corresponding
Parse()
method if the object’s
state can be represented as a string.
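The following sketch shows what such a pairing might look like. The MapPoint class and its “X,Y” text form are hypothetical, chosen only to keep the example short.

```vb
Public Class MapPoint
   Public X As Integer
   Public Y As Integer

   Public Sub New(ByVal xValue As Integer, ByVal yValue As Integer)
      X = xValue
      Y = yValue
   End Sub

   ' ----- Represent the state as "X,Y".
   Public Overrides Function ToString() As String
      Return X.ToString() & "," & Y.ToString()
   End Function

   ' ----- Rebuild an instance from the "X,Y" form.
   Public Shared Function Parse(ByVal source As String) As MapPoint
      Dim parts() As String = source.Split(","c)
      If (parts.Length <> 2) Then
         Throw New FormatException("Expected the form X,Y.")
      End If
      Return New MapPoint(Integer.Parse(parts(0)), _
         Integer.Parse(parts(1)))
   End Function
End Class
```

With this in place, MapPoint.Parse(somePoint.ToString()) returns a new instance with the same X and Y values as somePoint.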
You want to extract all numbers from a string that has extra whitespace, text, and other nonnumeric characters interspersed throughout.
Sample code folder: Chapter 05\RegexExtractNum
Use a regular expression (Regex) object to identify and parse out a list of all numbers in the string.
This is a very tricky problem if the exact format of the string is not known. Identifying exactly which sets of characters are parts of numbers with accuracy in all cases can be difficult. Negative signs, scientific notation, and other complications can arise. Fortunately, the regular expression object greatly simplifies the task. The following code demonstrates how it works:
Imports System.Text.RegularExpressions

' …Later, in a method…
Dim source As String = _
   "This 321.0 string -0.020 contains " & _
   "3.0E-17 several 1 2. 34 numbers"
Dim result As String
Dim parser As New _
   Regex("[-+]?([0-9]*\.)?[0-9]+([eE][-+]?[0-9]+)?")
Dim sourceMatches As MatchCollection = _
   parser.Matches(source)
Dim counter As Integer

result = "Count: " & _
   sourceMatches.Count.ToString() & vbNewLine
For counter = 0 To sourceMatches.Count - 1
   result &= vbNewLine
   result &= sourceMatches(counter).Value.ToString()
   result &= Space(5)
   result &= CDbl(sourceMatches(counter).Value).ToString()
Next counter
MsgBox(result)
The string to be parsed is source, which contains a variety of integer and floating-point numbers, both positive and negative, with words and other nonnumeric characters mixed in. A Regex object named parser is instantiated using a specially crafted regular expression designed to locate all conventionally defined numbers. The Matches() method of the Regex object is applied to the string, and a MatchCollection is returned. This collection’s Count property provides a tally of how many numbers were found in the string. Each Match in the collection has a Value property that returns the matched text as a string.
Figure 5-42 shows the results of parsing the sample string, listing the numbers found using the regular expression. Each match’s Value displays the text exactly as copied from the original string. That’s the first number on lines 2–7 in the message box. The second number shows the string converted to a Double and then back to a string. The reason for this extra step is to verify that the match string does convert to a numeric value.
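CDbl() throws an exception when the text is not numeric, and it interprets the text using the current culture’s settings. If you would rather test each candidate without risking an exception, Double.TryParse() is one alternative. The sketch below is not part of the recipe; the candidate strings are made up, and parsing with the invariant culture is an assumption you may want to change.

```vb
Imports System.Globalization

Module TryParseDemo
   Sub Main()
      ' ----- Hypothetical candidates, some valid, some not.
      Dim candidates() As String = {"321.0", "3.0E-17", "2.", "abc"}
      Dim value As Double

      For Each candidate As String In candidates
         If (Double.TryParse(candidate, NumberStyles.Float, _
               CultureInfo.InvariantCulture, value)) Then
            Console.WriteLine(candidate & " -> " & _
               value.ToString(CultureInfo.InvariantCulture))
         Else
            Console.WriteLine(candidate & " is not a number")
         End If
      Next candidate
   End Sub
End Module
```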
Recipe 5.38 also discusses regular expression processing. The following web sites are just some of the many places on the Internet that provide regular expression samples:
http://www.regular-expressions.info/examples.html
http://sitescooper.org/tao_regexps.html
http://en.wikipedia.org/wiki/Regular_expression
Sample code folder: Chapter 05\RegexCountMatch
Use the Count property of the collection returned by the Matches() method of the Regex object.
The following example code shows how to use regular expressions to count words in a string, as defined by the pattern \w+:
Imports System.Text.RegularExpressions

' …Later, in a method…
Dim quote As String = "The important thing is not to " & _
   "stop questioning. --Albert Einstein"
Dim parser As New Regex("\w+")
Dim totalMatches As Integer = parser.Matches(quote).Count
MsgBox(quote & vbNewLine & "Number words: " & _
   totalMatches.ToString)
This example returns a count of the number of matches, not a collection of matches. Figure 5-43 shows the results as displayed by the message box.
This technique can be useful for many other types of regular expression searches, too. For example, the regular expression shown in Recipe 5.37 can be used to quickly determine the number of numbers of all types in a string of any size.
Recipes 5.13 and 5.37 discuss regular expression processing in additional detail.
You want to get the nth match of a regular expression search within a string.
Sample code folder: Chapter 05\RegexMatchN
Use the Regex object to return a MatchCollection based on the regular expression. The nth match is accessed by indexing item n-1 in the collection.
The following code finds all numbers in a sample string, returning all matches as a MatchCollection. In this example, the code accesses the third match in the zero-based collection as item number 2:
Imports System.Text.RegularExpressions

' …Later, in a method…
Dim source As String = "This 7. string -0.02 " & _
   "contains 003.141600 several 0.9 numbers"
Dim parser As New Regex( _
   "[-+]?([0-9]*\.)?[0-9]+([eE][-+]?[0-9]+)?")
Dim sourceMatches As MatchCollection = _
   parser.Matches(source)
Dim result As Double = CDbl(sourceMatches(2).Value)
MsgBox(source & vbNewLine & "The 3rd number: " & _
   result.ToString())
Figure 5-44 shows the third number found in the string.
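Indexing past the end of the collection throws an exception, so code that asks for an arbitrary nth match should check the count first. The following sketch wraps that check in a function; GetNthMatch is a hypothetical helper, and returning Nothing for a missing match is a design choice made for this example.

```vb
Imports System.Text.RegularExpressions

Module NthMatchDemo
   ' Hypothetical helper: returns the nth (1-based) match of
   ' pattern in source, or Nothing when there are fewer matches.
   Public Function GetNthMatch(ByVal source As String, _
         ByVal pattern As String, ByVal n As Integer) As String
      If (n < 1) Then Return Nothing
      Dim found As MatchCollection = Regex.Matches(source, pattern)
      If (n > found.Count) Then Return Nothing
      Return found(n - 1).Value
   End Function

   Sub Main()
      Dim third As String = GetNthMatch( _
         "This 7. string -0.02 contains 003.141600", _
         "[-+]?([0-9]*\.)?[0-9]+([eE][-+]?[0-9]+)?", 3)
      If (third Is Nothing) Then
         Console.WriteLine("No third match")
      Else
         Console.WriteLine(third)
      End If
   End Sub
End Module
```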
Recipe 5.37 discusses the specific regular expression pattern used in this recipe.
You want to compile a regular expression to maximize runtime speed.
Sample code folder: Chapter 05\RegexDLL
There are two steps to this solution, best described by working through an example. The first step is to run the code to create the compiled DLL file, and the second is to use the new compiled regular expression in one or more applications.
First, run the following code one time only to compile and create a DLL file containing a regular expression, in this case using a pattern designed to find all numbers in a string:
Imports System.Text.RegularExpressions

' …Later, in a method…
Dim numPattern As String = _
   "[-+]?([0-9]*\.)?[0-9]+([eE][-+]?[0-9]+)?"
Dim wordPattern As String = "\w+"
Dim whichNamespace As String = "NumbersRegex"
Dim isPublic As Boolean = True
Dim compNumbers As New RegexCompilationInfo(numPattern, _
   RegexOptions.Compiled, "RgxNumbers", _
   whichNamespace, isPublic)
Dim compWords As New RegexCompilationInfo(wordPattern, _
   RegexOptions.Compiled, "RgxWords", whichNamespace, _
   isPublic)
Dim compAll() As RegexCompilationInfo = _
   {compNumbers, compWords}
Dim whichAssembly As New _
   System.Reflection.AssemblyName("RgxNumbersWords")
Regex.CompileToAssembly(compAll, whichAssembly)
This code creates a new file named RgxNumbersWords.dll that contains the compiled regular expression. The file is created in the same folder in which the executable program is located.
To use the new DLL in an application, you need to add a reference to it. Right-click on References in the Solution Explorer, click the Browse tab, find the DLL file in the folder where the application’s EXE file is located, and select it to add the reference. Figure 5-45 shows the new reference in the Solution Explorer.
You also need to import the namespace defined in this DLL into
your application. Either add an Imports
command at the top of your source
code or, in the Project Properties window, select the References tab,
and place a checkmark next to the name of the namespace, as shown in
Figure 5-46.
Once the new DLL is referenced and its object’s namespace has
been imported, you can use the compiled regular expression in an
application. The following code uses the new RgxNumbers
regular expression to count the
numbers in a string:
Imports System.Text.RegularExpressions
Imports NumbersRegex

' …Later, in a method…
Dim source As String = _
   "Making a Pi (3.1415926) is easy as One 1 Two 2 Three 3"
Dim parser As New RgxNumbers
Dim totalMatches As Integer = parser.Matches(source).Count
MsgBox(source & vbNewLine & "Number count: " & _
   totalMatches.ToString())
Figure 5-47 shows the result of running this code to determine how many numbers are in the sample string.
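If a separate DLL is more than you need, a lighter-weight option is to pass RegexOptions.Compiled when constructing the Regex, which compiles the pattern in memory at runtime. The sketch below illustrates that alternative and is not a replacement for the recipe’s approach; CompiledDemo and numberParser are hypothetical names. Keeping the object in a shared field means the compilation cost is paid only once.

```vb
Imports System.Text.RegularExpressions

Module CompiledDemo
   ' ----- Compiled once, when the module is first used.
   Private ReadOnly numberParser As New Regex( _
      "[-+]?([0-9]*\.)?[0-9]+([eE][-+]?[0-9]+)?", _
      RegexOptions.Compiled)

   Sub Main()
      Dim source As String = _
         "Making a Pi (3.1415926) is easy as One 1 Two 2 Three 3"
      Console.WriteLine(numberParser.Matches(source).Count)
   End Sub
End Module
```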
Recipe 5.37 also discusses regular expression processing.
You need to validate string data entered by a user to ensure it meets defined criteria.
Sample code folder: Chapter 05\RegexValidate
Use a regular expression to check the string to make sure it matches the type of data expected.
The Internet is a good place to find a wide range of regular
expressions to validate strings using specific rules, and this recipe
won’t attempt to list them all. Instead, the following code, which
validates a String
as an email
address, demonstrates a specific example to show you the general
technique involved:
Imports System.Text.RegularExpressions

' …Later, in a method…
Dim testString As String
Dim emailPattern As String = _
   "^([0-9a-zA-Z]+[-._+&])*[0-9a-zA-Z]+@" & _
   "([-0-9a-zA-Z]+[.])+[a-zA-Z]{2,6}$"

testString = "[email protected]"
MsgBox(testString & Space(3) & _
   Regex.IsMatch(testString, emailPattern))

testString = "john@[email protected]"
MsgBox(testString & Space(3) & _
   Regex.IsMatch(testString, emailPattern))
This regular expression checks a string to see if it is a valid
email address. As shown in Figures 5-48 and 5-49, the first string passes
the test, but the second has a problem. In general, the IsMatch()
method returns True
if the string matches the criteria
defined in the regular expression and False
if it fails the test.
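To try the pattern against addresses of your own, a loop such as the following can help. This is only a sketch: the three sample addresses are hypothetical and were chosen so that one is well formed, one contains two @ signs, and one has no @ sign at all.

```vb
Imports System.Text.RegularExpressions

Module ValidateDemo
   Sub Main()
      Dim emailPattern As String = _
         "^([0-9a-zA-Z]+[-._+&])*[0-9a-zA-Z]+@" & _
         "([-0-9a-zA-Z]+[.])+[a-zA-Z]{2,6}$"

      ' ----- Hypothetical test addresses, not from the recipe.
      Dim samples() As String = {"someone@example.com", _
         "some@one@example.com", "no-at-sign.example.com"}

      For Each sample As String In samples
         Console.WriteLine(sample & Space(3) & _
            Regex.IsMatch(sample, emailPattern).ToString())
      Next sample
   End Sub
End Module
```

Only the first address satisfies the pattern. Keep in mind that a pattern like this checks form only; it cannot tell you whether the mailbox actually exists.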
Recipe 5.22 also discusses data validation.
You want to count the characters, words, and lines in a string.
Sample code folder: Chapter 05\RegexCountParts
Use separate regular expressions to count words, characters, and lines in a string of any length.
The following code demonstrates three very short regular expressions that provide simple counts of characters, words, and lines in a string of any length:
Imports System.Text.RegularExpressions

' …Later, in a method…
Dim quote As String = _
   "The important thing" & vbNewLine & _
   "is not to stop questioning." & vbNewLine & _
   "--Albert Einstein" & vbNewLine
Dim numBytes As Integer = quote.Length * 2
Dim numChars As Integer = Regex.Matches(quote, ".").Count
Dim numWords As Integer = Regex.Matches(quote, "\w+").Count
Dim numLines As Integer = Regex.Matches(quote, ".+\n*").Count

MsgBox(String.Format( _
   "{0}{5}bytes: {1}{5}Chars: {2}{5}Words: {3}{5}Lines: {4}", _
   quote, numBytes, numChars, numWords, numLines, vbNewLine))
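The number of bytes in the string is also displayed, as shown in Figure 5-50, but that value is calculated directly from the string’s Length property (two bytes for each character in memory) without having to resort to a regular expression. Note that the “.” pattern does not match the linefeed character, so the character count excludes the linefeed at the end of each line.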
Recipe 5.38 also discusses the results of regular expression processing.
You want to convert a string to or from Base64 format for predictable transfer across a network.
Sample code folder: Chapter 05\Base64
To convert a string to Base64, first use System.Text.Encoding methods to convert the string to a byte array and then use the Convert.ToBase64String() method to convert the byte array to a Base64 string.
To convert a Base64 string back to the original string, use Convert.FromBase64String() to convert the string to a byte array, and then use the appropriate System.Text.Encoding method to convert the byte array to a string.
The following code demonstrates these steps as it converts a sample string to Base64 and back again:
Dim quote As String = "The important thing is not to " & _
   "stop questioning. --Albert Einstein"
Dim quoteBytes As Byte() = _
   System.Text.Encoding.UTF8.GetBytes(quote)
Dim quote64 As String = Convert.ToBase64String(quoteBytes)
Dim byteSet As Byte() = Convert.FromBase64String(quote64)
Dim result As String = _
   System.Text.Encoding.UTF8.GetString(byteSet)
MsgBox(quote & vbNewLine & quote64 & vbNewLine & result)
UTF8 encoding is compact for this sample because its characters all fall within the range of standard ASCII characters, each of which takes a single byte. UTF8 can represent any Unicode character, so no data is lost with other character sets either; the essential rule is to use the same encoding in both directions. If the receiving side expects UTF-16 instead, change both occurrences of “UTF8” to “Unicode” in the code sample. For ASCII text, the byte array and the Base64 string will each be about twice as large when using Unicode.
Figure 5-51 shows the results of the above conversions as displayed by the message box.
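The two conversions pair naturally as helper functions. The sketch below assumes UTF8 in both directions; ToBase64 and FromBase64 are hypothetical names, not Framework methods.

```vb
Module Base64Demo
   ' Hypothetical helpers wrapping the two conversion steps.
   Public Function ToBase64(ByVal source As String) As String
      Dim sourceBytes() As Byte = _
         System.Text.Encoding.UTF8.GetBytes(source)
      Return Convert.ToBase64String(sourceBytes)
   End Function

   Public Function FromBase64(ByVal source64 As String) As String
      ' ----- Throws FormatException if source64 is not valid Base64.
      Dim sourceBytes() As Byte = Convert.FromBase64String(source64)
      Return System.Text.Encoding.UTF8.GetString(sourceBytes)
   End Function

   Sub Main()
      Dim encoded As String = ToBase64("Hello")
      Console.WriteLine(encoded)               ' SGVsbG8=
      Console.WriteLine(FromBase64(encoded))   ' Hello
   End Sub
End Module
```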
Recipe 5.33 also shows how to convert string data into an alternative format that uses only printable characters.
You want to split a string using a multicharacter string rather
than a single character as the split point, but the String
object’s
Split()
method only splits using one or more
individual characters.
Sample code folder: Chapter 05\SplitString
You can use the Visual Basic Split()
function instead of the String.Split()
method, or you can pass an
array of strings to String.Split()
.
The following code shows the differences between using the
Split()
function and the String.Split()
method:
Dim quote As String = "The important thing is not to " & _
   "stop questioning. --Albert Einstein"
Dim strArray1() As String = Split(quote, "ing")
Dim strArray2() As String = quote.Split(CChar("ing"))
Dim result As New System.Text.StringBuilder
Dim counter As Integer

For counter = 0 To strArray1.Length - 1
   result.AppendLine(strArray1(counter))
Next counter
result.AppendLine(StrDup(30, "-"))
For counter = 0 To strArray2.Length - 1
   result.AppendLine(strArray2(counter))
Next counter
MsgBox(result.ToString())
String array strArray1
is
created by applying the Split()
function to the sample string, splitting the string at all occurrences
of “ing”. strArray2
uses the
String.Split()
method to do the
same thing. However, even though the string “ing” is passed to the
String.Split()
method to define the
split points, only the first character of this string, the character
“i,” is used to make the splits. The results of these two splits are
quite different, as shown in the output displayed in the message box
in Figure 5-52.
To confuse the issue further, the String.Split() method can split a string at whole substring boundaries, but only if you pass it an array of strings (not just a simple string) to define the split points, along with a required parameter that defines the split options. The following two lines of code both return the desired results. The first line uses the Visual Basic function, and the second line uses the string array technique just described:
Dim strArray1() As String = Split(quote, "ing")
Dim strArray2() As String = _
   quote.Split(New String() {"ing"}, StringSplitOptions.None)
Both Split() options are very powerful and useful, but you do need to use the correct one, passing appropriate parameters.
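As a minimal sketch of the split-options parameter mentioned above, the following console module compares the two substring techniques and shows what StringSplitOptions.RemoveEmptyEntries changes. The module name SplitSketch and the helper SplitOnText() are hypothetical names introduced only for illustration, and Console.WriteLine() stands in for the message boxes used elsewhere in this chapter:

```vb
Imports System
Imports Microsoft.VisualBasic

Module SplitSketch
    ' Hypothetical helper (not part of the recipe): splits on a whole
    ' substring. When dropEmpty is True, empty entries produced by
    ' adjacent delimiters are discarded.
    Function SplitOnText(ByVal source As String, ByVal delimiter As String, _
            ByVal dropEmpty As Boolean) As String()
        Dim splitOptions As StringSplitOptions = StringSplitOptions.None
        If dropEmpty Then splitOptions = StringSplitOptions.RemoveEmptyEntries
        Return source.Split(New String() {delimiter}, splitOptions)
    End Function

    Sub Main()
        Dim quote As String = "The important thing is not to " & _
            "stop questioning. --Albert Einstein"

        ' The Visual Basic function and the string-array overload
        ' both split at each whole "ing".
        Dim viaFunction() As String = Split(quote, "ing")
        Dim viaMethod() As String = SplitOnText(quote, "ing", False)
        Console.WriteLine(viaFunction.Length = viaMethod.Length)

        ' Adjacent delimiters yield an empty entry unless it is removed.
        Console.WriteLine(SplitOnText("a--b----c", "--", False).Length)
        Console.WriteLine(SplitOnText("a--b----c", "--", True).Length)
    End Sub
End Module
```

Choosing StringSplitOptions.None keeps the string array technique consistent with the Visual Basic Split() function, which also returns empty entries when two delimiters are adjacent.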
Recipe 5.28 also discusses string parsing using Split().
You want to create a string of n space characters.
The following sample code presents three different ways to create a string of n spaces. In most cases the Space() function works quite well to create the spaces, but it’s informative to compare the three techniques:
Dim lotsOfSpaces1 As String = New String(" "c, 500)
Dim lotsOfSpaces2 As String = StrDup(500, " "c)
Dim lotsOfSpaces3 As String = Space(500)
Dim result As String = String.Format( _
   "Length of lotsOfSpaces1: {0}{3}" & _
   "Length of lotsOfSpaces2: {1}{3}" & _
   "Length of lotsOfSpaces3: {2}{3}", _
   lotsOfSpaces1.Length, _
   lotsOfSpaces2.Length, _
   lotsOfSpaces3.Length, vbNewLine)
MsgBox(result)
The String constructor is overloaded, so you can initialize a new string in several ways. As shown in the first statement above, you can create a new string composed of n repetitions of any character (in this case, a space character).
The StrDup() function is similar in operation in that it also returns a string composed of n occurrences of a given character. Both the String constructor and the StrDup() function are useful when the repeated character is something other than a space.
Finally, the Space() function returns a string composed of n space characters, without the option to use any other character.
The rest of the code displays the lengths of the three strings of spaces to help verify that they were created as indicated, as shown in Figure 5-53.
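To illustrate the point about repeating characters other than a space, here is a short sketch that builds a row of dashes and a row of asterisks, then confirms that all three techniques yield identical strings of spaces. The module name RepeatSketch and the helper MakeRule() are hypothetical names used only for this illustration, and the output goes to the console rather than to a message box:

```vb
Imports System
Imports Microsoft.VisualBasic

Module RepeatSketch
    ' Hypothetical helper (not part of the recipe): returns a horizontal
    ' rule made of the given character, using the String constructor.
    Function MakeRule(ByVal ruleChar As Char, ByVal width As Integer) As String
        Return New String(ruleChar, width)
    End Function

    Sub Main()
        ' The String constructor and StrDup() accept any character.
        Dim dashes As String = MakeRule("-"c, 10)
        Dim stars As String = StrDup(10, "*"c)
        Console.WriteLine(dashes)
        Console.WriteLine(stars)

        ' All three techniques produce the same string of spaces.
        Console.WriteLine(Space(5) = New String(" "c, 5))
        Console.WriteLine(Space(5) = StrDup(5, " "c))
    End Sub
End Module
```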
Recipe 5.2 discusses similar functionality.