The most common type of data for variables is text. The most common types in .NET for working with text are shown in the following table:
Namespace | Type
System | Char
System | String
System.Text | StringBuilder
System.Text.RegularExpressions | Regex
Add a new console application project named Ch04_ManipulatingText.
In Visual Studio 2017, set the solution's startup project to be the current selection.
Sometimes, you need to find out the length of a piece of text stored in a string variable. Modify the code to look like this:
    using static System.Console;

    namespace Ch04_ManipulatingText
    {
        class Program
        {
            static void Main(string[] args)
            {
                string city = "London";
                WriteLine($"{city} is {city.Length} characters long.");
            }
        }
    }
The string class uses an array of char internally to store the text. It also has an indexer, which means that we can use the array syntax to read its characters.
Add the following statement, and then run the console application:
WriteLine($"First char is {city[0]} and third is {city[2]}.");
Sometimes, you need to split some text wherever there is a character, such as a comma.
Add more lines of code to define a single string with comma-separated city names. You can use the Split method and specify a character that you want to treat as the separator. An array of strings is then created that you can enumerate using a foreach statement:
    string cities = "Paris,Berlin,Madrid,New York";
    string[] citiesArray = cities.Split(',');
    foreach (string item in citiesArray)
    {
        WriteLine(item);
    }
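Real input is often less tidy than the example above. The following is a minimal sketch, assuming the input may contain stray spaces and empty entries; the class name SplitSketch and the helper SplitAndTrim are hypothetical names chosen for illustration.

```csharp
using System;
using static System.Console;

public static class SplitSketch
{
    // Hypothetical helper: splits on commas, drops empty entries,
    // and trims whitespace around each remaining item.
    public static string[] SplitAndTrim(string text)
    {
        string[] parts = text.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
        for (int i = 0; i < parts.Length; i++)
        {
            parts[i] = parts[i].Trim();
        }
        return parts;
    }

    public static void Main()
    {
        // The doubled comma and the extra spaces are deliberate.
        foreach (string city in SplitAndTrim("Paris, Berlin,,Madrid , New York"))
        {
            WriteLine($"[{city}]");
        }
    }
}
```

The square brackets in the output make it easy to confirm that no leading or trailing spaces remain.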
Sometimes, you need to get part of some text. For example, if you had a person's full name stored in a string with a space character between the first and last name, then you could find the position of the space and extract the first name and last name as two parts, like this:
    string fullname = "Alan Jones";
    int indexOfTheSpace = fullname.IndexOf(' ');
    string firstname = fullname.Substring(0, indexOfTheSpace);
    string lastname = fullname.Substring(indexOfTheSpace + 1);
    WriteLine($"{lastname}, {firstname}");
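The code above assumes the name always contains a space. IndexOf returns -1 when the character is not found, and passing a negative length to Substring throws an exception. The following sketch guards against that case; the names NameSketch and LastNameFirst are hypothetical, and returning the input unchanged is just one possible fallback.

```csharp
using static System.Console;

public static class NameSketch
{
    // Hypothetical helper: returns "Last, First" when a space is present;
    // otherwise returns the input unchanged.
    public static string LastNameFirst(string fullName)
    {
        int indexOfTheSpace = fullName.IndexOf(' ');
        if (indexOfTheSpace < 0)
        {
            // No space found, so there is nothing to split.
            return fullName;
        }
        string first = fullName.Substring(0, indexOfTheSpace);
        string last = fullName.Substring(indexOfTheSpace + 1);
        return $"{last}, {first}";
    }

    public static void Main()
    {
        WriteLine(LastNameFirst("Alan Jones")); // Jones, Alan
        WriteLine(LastNameFirst("Cher"));       // Cher
    }
}
```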
Sometimes, you need to check whether a piece of text starts or ends with some characters or contains some characters:
    string company = "Microsoft";
    bool startsWithM = company.StartsWith("M");
    bool containsN = company.Contains("N");
    WriteLine($"Starts with M: {startsWithM}, contains an N: {containsN}");
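These checks are case-sensitive by default. When case should be ignored, one option is to pass a StringComparison value. The sketch below uses IndexOf with StringComparison.OrdinalIgnoreCase as a stand-in for a case-insensitive Contains; the names CompareSketch and ContainsIgnoreCase are hypothetical.

```csharp
using System;
using static System.Console;

public static class CompareSketch
{
    // Hypothetical helper: case-insensitive "contains" built on IndexOf.
    public static bool ContainsIgnoreCase(string text, string value)
    {
        return text.IndexOf(value, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    public static void Main()
    {
        string company = "Microsoft";
        WriteLine(company.Contains("SOFT"));                                    // False
        WriteLine(ContainsIgnoreCase(company, "SOFT"));                         // True
        WriteLine(company.StartsWith("m", StringComparison.OrdinalIgnoreCase)); // True
    }
}
```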
Here are some other string members:
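Member | Description
Trim, TrimStart, and TrimEnd | These trim whitespace from the beginning and/or end of the string.
ToUpper and ToLower | These convert the string into uppercase or lowercase.
Insert and Remove | These insert or remove some text in the string.
Replace | This replaces some text.
string.Concat | This concatenates two string values. The + operator calls this method when used between string values.
string.Join | This concatenates one or more string values with a separator between each one.
string.IsNullOrEmpty | This checks whether a string is null or empty ("").
string.Empty | This can be used instead of allocating memory each time you use a literal empty string ("").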
Note that some of the preceding members are static. That means the member can only be called on the type, not on a variable instance.
For example, if I want to take an array of strings and combine them back together into a single string with a separator, I can use the Join method like this:
    string recombined = string.Join(" => ", citiesArray);
    WriteLine(recombined);
If you run the console application and view the output, it should look like this:
    London is 6 characters long.
    First char is L and third is n.
    Paris
    Berlin
    Madrid
    New York
    Jones, Alan
    Starts with M: True, contains an N: False
    Paris => Berlin => Madrid => New York
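Several other string members, such as Trim, ToUpper, Insert, Remove, Replace, and string.IsNullOrEmpty, can be tried in the same way. The following is a small sketch rather than part of the chapter's project; the names MembersSketch and Normalize are hypothetical.

```csharp
using static System.Console;

public static class MembersSketch
{
    // Hypothetical helper: returns string.Empty for null or empty input,
    // otherwise the trimmed, uppercased text.
    public static string Normalize(string raw)
    {
        if (string.IsNullOrEmpty(raw))
        {
            return string.Empty;
        }
        return raw.Trim().ToUpper();
    }

    public static void Main()
    {
        string trimmed = "  Hello World  ".Trim();
        WriteLine(trimmed.Insert(5, ","));            // Hello, World
        WriteLine(trimmed.Remove(5));                 // Hello
        WriteLine(trimmed.Replace("World", "There")); // Hello There
        WriteLine(Normalize("  London  "));           // LONDON
    }
}
```

Each of these instance methods returns a new string; the original value in trimmed is never modified.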
You can concatenate two strings to make a new string using the String.Concat method or simply using the + operator. However, this can be a bad practice because strings are immutable, so .NET must create a completely new string in memory each time. This might not be noticeable if you are only adding two string variables, but if you concatenate inside a loop, it can have a significant negative impact on performance and memory use.
In Chapter 5, Debugging, Monitoring, and Testing, you will learn how to concatenate string variables efficiently using the StringBuilder type.
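As a brief preview, the following sketch builds a string inside a loop with StringBuilder instead of the + operator. It is only an illustration, not the approach that Chapter 5 takes; the names ConcatSketch and JoinNumbers are hypothetical.

```csharp
using System.Text;
using static System.Console;

public static class ConcatSketch
{
    // Hypothetical helper: builds "1, 2, ..., count" by appending to a
    // single StringBuilder rather than creating a new string per iteration.
    public static string JoinNumbers(int count)
    {
        var builder = new StringBuilder();
        for (int i = 1; i <= count; i++)
        {
            if (i > 1)
            {
                builder.Append(", ");
            }
            builder.Append(i);
        }
        return builder.ToString();
    }

    public static void Main()
    {
        WriteLine(JoinNumbers(5)); // 1, 2, 3, 4, 5
    }
}
```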
Regular expressions are useful for validating input from the user. They are very powerful and can get very complicated. Almost all programming languages have support for regular expressions and use a common set of special characters to define them.
Add a new console application project named Ch04_RegularExpressions.
At the top of the file, import the following namespaces:
    using System.Text.RegularExpressions;
    using static System.Console;
In the Main method, add the following statements:
    Write("Enter your age: ");
    string input = ReadLine();
    Regex ageChecker = new Regex(@"\d");
    if (ageChecker.IsMatch(input))
    {
        WriteLine("Thank you!");
    }
    else
    {
        WriteLine($"This is not a valid age: {input}");
    }
Good Practice
The @ character in front of a string literal switches off the ability to use escape characters in the string. Escape characters are prefixed with a backslash (\). For example, \t means a tab and \n means a new line. When writing regular expressions, we need to disable this feature. To paraphrase the television show, The West Wing, "Let backslash be backslash."
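The difference is easiest to see by comparing a regular literal with a verbatim one. In the following sketch, the class name VerbatimSketch and its two constants are hypothetical names used only for illustration.

```csharp
using static System.Console;

public static class VerbatimSketch
{
    // In a regular literal, the backslash must itself be escaped.
    public const string Regular = "\\d+";

    // In a verbatim literal, the backslash is kept as written.
    public const string Verbatim = @"\d+";

    public static void Main()
    {
        WriteLine(Regular == Verbatim); // True
        WriteLine("a\tb".Length);       // 3: a, tab, b
        WriteLine(@"a\tb".Length);      // 4: a, backslash, t, b
    }
}
```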
Run the console application and view the output.
If you enter a whole number for the age, you will see Thank you!
    Enter your age: 34
    Thank you!
If you enter carrots, you will see the error message:
    Enter your age: carrots
    This is not a valid age: carrots
However, if you enter bob30smith, you will see Thank you!
    Enter your age: bob30smith
    Thank you!
The regular expression we used is \d, which means one digit. However, it does not limit what is entered before and after the digit. This regular expression could be described in English as, "Enter at least one digit character."
Change the regular expression to ^\d$, like this:
    Regex ageChecker = new Regex(@"^\d$");
Rerun the application. Now, it rejects anything except a single digit.
We want to allow one or more digits. To do this, we add a + (plus) after the digit expression. Change the regular expression to look like this:
    Regex ageChecker = new Regex(@"^\d+$");
Run the application and see how the regular expression now only allows non-negative whole numbers of any length.
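To compare the three versions of the expression side by side, the following sketch runs each one against the inputs used above. The class name AgePatternSketch and the helper IsValidAge are hypothetical; the patterns are the ones from the steps above.

```csharp
using System.Text.RegularExpressions;
using static System.Console;

public static class AgePatternSketch
{
    // Hypothetical helper that wraps the final version of the pattern.
    public static bool IsValidAge(string input)
    {
        return Regex.IsMatch(input, @"^\d+$");
    }

    public static void Main()
    {
        string[] patterns = { @"\d", @"^\d$", @"^\d+$" };
        string[] inputs = { "34", "7", "carrots", "bob30smith" };

        // Print one line per pattern and input combination.
        foreach (string pattern in patterns)
        {
            foreach (string input in inputs)
            {
                WriteLine($"{pattern,-6} {input,-10} {Regex.IsMatch(input, pattern)}");
            }
        }
    }
}
```

Only the last pattern accepts 34 and 7 while rejecting both carrots and bob30smith.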
Here are some common symbol combinations that you can use in regular expressions:
Symbol | Meaning | Symbol | Meaning
^ | Start of input | $ | End of input
\d | A single digit | \D | A single NON-digit
\s | Whitespace | \S | NON-whitespace
[A-Za-z0-9] | Range(s) of characters | [aeiou] | Set of characters
+ | One or more | ? | One or none
. | A single character | |
{3} | Exactly three | {3,5} | Three to five
{3,} | Three or more | {0,3} | Up to three
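Here are some example regular expressions:
Expression | Meaning
\d | A single digit somewhere in the input.
a | The a character somewhere in the input.
Bob | The word Bob somewhere in the input.
^Bob | The word Bob at the start of the input.
Bob$ | The word Bob at the end of the input.
^\d{2}$ | Exactly two digits.
^[0-9]{2}$ | Exactly two digits.
^[A-Z]{4,}$ | At least four uppercase letters only.
^[A-Za-z]{4,}$ | At least four upper or lowercase letters only.
^[A-Z]{2}\d{3}$ | Two uppercase letters and three digits only.
^d.g$ | The letter d, then any single character, and then the letter g, so it matches dig, dog, and so on.
^d\.g$ | The letter d, then a dot (.), and then the letter g, so it matches d.g only.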
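A few of these descriptions can be checked directly in code. The following sketch covers "two uppercase letters and three digits only" and "at least four upper or lowercase letters only"; the class name PatternSketch and the helper names are hypothetical, and the patterns are one possible way to express those descriptions.

```csharp
using System.Text.RegularExpressions;
using static System.Console;

public static class PatternSketch
{
    // Two uppercase letters followed by three digits, and nothing else.
    public static bool IsTwoLettersThreeDigits(string input) =>
        Regex.IsMatch(input, @"^[A-Z]{2}\d{3}$");

    // At least four letters, upper or lowercase, and nothing else.
    public static bool IsFourOrMoreLetters(string input) =>
        Regex.IsMatch(input, @"^[A-Za-z]{4,}$");

    public static void Main()
    {
        WriteLine(IsTwoLettersThreeDigits("AB123"));  // True
        WriteLine(IsTwoLettersThreeDigits("ab123"));  // False
        WriteLine(IsTwoLettersThreeDigits("AB1234")); // False
        WriteLine(IsFourOrMoreLetters("London"));     // True
        WriteLine(IsFourOrMoreLetters("Bob"));        // False
    }
}
```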