Storing and manipulating text

The most common type of data for variables is text. The most common types in .NET for working with text are shown in the following table:

| Namespace | Type |
|---|---|
| System | Char |
| System | String |
| System.Text | StringBuilder |
| System.Text.RegularExpressions | Regex |

Getting the length of a string

Add a new console application project named Ch04_ManipulatingText.

In Visual Studio 2017, set the solution's startup project to be the current selection.

Sometimes, you need to find out the length of a piece of text stored in a string variable. Modify the code to look like this:

    using static System.Console; 
 
    namespace Ch04_ManipulatingText 
    { 
      class Program 
      { 
        static void Main(string[] args) 
        { 
          string city = "London"; 
          WriteLine($"{city} is {city.Length} characters long."); 
        } 
      } 
    } 

Note

At any point during these exercises, you can see the output of your code by running the console application. In Visual Studio 2017, press Ctrl + F5. In Visual Studio Code, open the Integrated Terminal and enter the command dotnet run.

Getting the characters of a string

The string class uses an array of char internally to store the text. It also has an indexer, which means that we can use array syntax to read its characters. Indexes start at zero, so city[0] is the first character and city[2] is the third.

Add the following statement, and then run the console application:

    WriteLine($"First char is {city[0]} and third is {city[2]}."); 

Splitting a string

Sometimes, you need to split some text wherever there is a character, such as a comma.

Add more lines of code to define a single string with comma-separated city names. You can use the Split method and specify a character that you want to treat as the separator. An array of strings is then created that you can enumerate using a foreach statement:

    string cities = "Paris,Berlin,Madrid,New York"; 
    string[] citiesArray = cities.Split(','); 
    foreach (string item in citiesArray) 
    { 
      WriteLine(item); 
    } 
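Real input is often messier than the example above and may contain stray spaces or empty entries. The following is a minimal sketch of one way to handle that. The SplitCities helper and the sample input are assumptions made for illustration and are not part of the chapter's project. It uses the Split overload that accepts StringSplitOptions.RemoveEmptyEntries and then trims each item:

```csharp
using System;
using static System.Console;

static class SplitExample
{
  // Hypothetical helper: splits on commas, drops empty entries,
  // and trims whitespace from each remaining item.
  public static string[] SplitCities(string cities)
  {
    string[] parts = cities.Split(
      new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);

    for (int i = 0; i < parts.Length; i++)
    {
      parts[i] = parts[i].Trim();
    }
    return parts;
  }

  static void Main()
  {
    // The input has an empty entry and some stray spaces.
    foreach (string city in SplitCities("Paris, Berlin,,Madrid ,New York"))
    {
      WriteLine(city);
    }
  }
}
```

With this input, the loop writes the same four city names as the original example, each on its own line.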

Getting part of a string

Sometimes, you need to get part of some text. For example, if you had a person's full name stored in a string with a space character between the first and last name, then you could find the position of the space and extract the first name and last name as two parts, like this:

    string fullname = "Alan Jones"; 
    int indexOfTheSpace = fullname.IndexOf(' '); 
    string firstname = fullname.Substring(0, indexOfTheSpace); 
    string lastname = fullname.Substring(indexOfTheSpace + 1); 
    WriteLine($"{lastname}, {firstname}"); 

Note

If the format of the initial full name was different, for example, Lastname, Firstname, then the code would be slightly different. As an optional exercise, try writing some statements that would change the input Jones, Alan into Alan Jones.
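One possible answer to the optional exercise is sketched here. The ToFirstLast helper name is an assumption made for illustration, and the sketch assumes the input always contains a comma:

```csharp
using static System.Console;

static class NameFormatter
{
  // Hypothetical helper: converts "Lastname, Firstname" into "Firstname Lastname".
  // Assumes the input contains a comma; Substring would throw if it did not.
  public static string ToFirstLast(string lastCommaFirst)
  {
    int indexOfTheComma = lastCommaFirst.IndexOf(',');
    string lastname = lastCommaFirst.Substring(0, indexOfTheComma);
    // Skip the comma, then trim the space that follows it.
    string firstname = lastCommaFirst.Substring(indexOfTheComma + 1).Trim();
    return $"{firstname} {lastname}";
  }

  static void Main()
  {
    WriteLine(ToFirstLast("Jones, Alan")); // Alan Jones
  }
}
```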

Checking a string for content

Sometimes, you need to check whether a piece of text starts or ends with some characters or contains some characters:

    string company = "Microsoft"; 
    bool startsWithM = company.StartsWith("M"); 
    bool containsN = company.Contains("N"); 
    WriteLine($"Starts with M: {startsWithM}, contains an N: {containsN}"); 
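These checks are case-sensitive by default. The following sketch shows one way to ignore case using the StringComparison enumeration. The ContainsIgnoreCase helper is an assumption made for illustration, not a member of the string type:

```csharp
using System;
using static System.Console;

static class ContentChecks
{
  // Hypothetical helper: a case-insensitive alternative to Contains,
  // built on the IndexOf overload that accepts a StringComparison.
  public static bool ContainsIgnoreCase(string text, string value)
  {
    return text.IndexOf(value, StringComparison.OrdinalIgnoreCase) >= 0;
  }

  static void Main()
  {
    string company = "Microsoft";
    WriteLine(company.Contains("S"));            // False: there is no uppercase S
    WriteLine(ContainsIgnoreCase(company, "S")); // True: matches the lowercase s
    WriteLine(company.StartsWith("m", StringComparison.OrdinalIgnoreCase)); // True
    WriteLine(company.EndsWith("soft"));         // True
  }
}
```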

Other string members

Here are some other string members:

| Member | Description |
|---|---|
| Trim, TrimStart, and TrimEnd | These trim whitespace from the beginning and/or end of the string. |
| ToUpper and ToLower | These convert the string into uppercase or lowercase. |
| Insert and Remove | These insert or remove some text in the string. |
| Replace | This replaces some text. |
| String.Concat | This concatenates two string variables. The + operator calls this method when used between string variables. |
| String.Join | This concatenates one or more string variables with a separator in between each one. |
| String.IsNullOrEmpty | This checks whether a string is null or empty (""). |
| String.Empty | This read-only field represents the empty string and can be used instead of a literal empty pair of double quotes (""). |

Note that some of the preceding members are static. That means the member can only be accessed through the type, not through a variable instance.

For example, if I want to take an array of strings and combine them back together into a single string with a separator, I can use the Join method like this:

    string recombined = string.Join(" => ", citiesArray); 
    WriteLine(recombined); 

If you run the console application and view the output, it should look like this:

London is 6 characters long.
First char is L and third is n.
Paris
Berlin
Madrid
New York
Jones, Alan
Starts with M: True, contains an N: False
Paris => Berlin => Madrid => New York
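The instance members listed in the table earlier can be tried in the same way. This is a minimal sketch using a made-up sample value. Because strings are immutable, each call returns a new string and leaves the original unchanged:

```csharp
using static System.Console;

static class StringMembersDemo
{
  static void Main()
  {
    string padded = "  Hello World  ";   // sample value chosen for illustration
    string trimmed = padded.Trim();

    WriteLine(trimmed);                              // Hello World
    WriteLine(trimmed.ToUpper());                    // HELLO WORLD
    WriteLine(trimmed.Insert(5, ","));               // Hello, World
    WriteLine(trimmed.Remove(5));                    // Hello
    WriteLine(trimmed.Replace("World", "Reader"));   // Hello Reader
    WriteLine(string.IsNullOrEmpty(string.Empty));   // True
    WriteLine(padded.Length);                        // 15: padded itself is unchanged
  }
}
```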

Building strings efficiently

You can concatenate two strings to make a new string using the String.Concat method or simply using the + operator. Because strings are immutable, each concatenation makes .NET create a completely new string in memory. This is unlikely to be noticeable if you are only adding two string variables, but if you concatenate inside a loop, it can have a significant negative impact on performance and memory use.

Note

In Chapter 5, Debugging, Monitoring, and Testing, you will learn how to concatenate string variables efficiently using the StringBuilder type.
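As a brief preview, the following sketch shows the general shape of building a string in a loop with StringBuilder instead of the + operator. The CountTo helper is an assumption made for illustration:

```csharp
using System.Text;
using static System.Console;

static class BuilderPreview
{
  // Hypothetical helper: builds "1, 2, 3, ..." up to max.
  // Append adds to the builder's internal buffer, so the loop does not
  // create a new string on every iteration.
  public static string CountTo(int max)
  {
    StringBuilder builder = new StringBuilder();
    for (int i = 1; i <= max; i++)
    {
      if (i > 1)
      {
        builder.Append(", ");
      }
      builder.Append(i);
    }
    return builder.ToString();
  }

  static void Main()
  {
    WriteLine(CountTo(5)); // 1, 2, 3, 4, 5
  }
}
```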

Pattern matching with regular expressions

Regular expressions are useful for validating input from the user. They are very powerful and can get very complicated. Almost all programming languages have support for regular expressions and use a common set of special characters to define them.

Add a new console application project named Ch04_RegularExpressions.

At the top of the file, import the following namespaces:

    using System.Text.RegularExpressions; 
    using static System.Console; 

In the Main method, add the following statements:

    Write("Enter your age: "); 
    string input = ReadLine(); 
    Regex ageChecker = new Regex(@"\d"); 
    if(ageChecker.IsMatch(input)) 
    { 
      WriteLine("Thank you!"); 
    } 
    else 
    { 
      WriteLine($"This is not a valid age: {input}"); 
    } 

Tip

Good Practice

The @ character in front of a string switches off the ability to use escape characters in a string. Escape characters are prefixed with a backslash (\). For example, \t means a tab and \n means a new line. When writing regular expressions, we need to disable this feature. To paraphrase the television show, The West Wing, "Let backslash be backslash."
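The difference can be seen in a small sketch that compares regular and verbatim string literals. The sample values are chosen only for illustration:

```csharp
using static System.Console;

static class VerbatimDemo
{
  static void Main()
  {
    // In a regular literal, \t is a single tab character.
    WriteLine("a\tb".Length);    // 3
    // In a verbatim literal, the backslash and the t are two separate characters.
    WriteLine(@"a\tb".Length);   // 4
    // Without @, each backslash in a pattern would have to be doubled.
    WriteLine("\\d" == @"\d");   // True
  }
}
```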

Run the console application and view the output.

If you enter a whole number for the age, you will see Thank you!

Enter your age: 34
Thank you!

If you enter carrots, you will see the error message:

Enter your age: carrots
This is not a valid age: carrots

However, if you enter bob30smith, you will see Thank you!

Enter your age: bob30smith
Thank you!

The regular expression we used is \d, which means one digit. However, it does not limit what is entered before and after the digit. This regular expression could be described in English as, "Enter at least one digit character."

Change the regular expression to ^\d$, like this:

    Regex ageChecker = new Regex(@"^\d$");

Rerun the application. Now, it rejects anything except a single digit.

We want to allow one or more digits. To do this, we add a + (plus) after the digit expression. Change the regular expression to look like this:

    Regex ageChecker = new Regex(@"^\d+$");

Run the application and see how the regular expression now only allows input made up entirely of digits, that is, non-negative whole numbers of any length.
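The three inputs tried above can be checked side by side. This is a minimal sketch that wraps the final pattern in a helper. The IsValidAge name is an assumption made for illustration:

```csharp
using System.Text.RegularExpressions;
using static System.Console;

static class AgeValidator
{
  // Hypothetical helper: true only when the input is one or more digits
  // with nothing before or after them.
  public static bool IsValidAge(string input)
  {
    return Regex.IsMatch(input, @"^\d+$");
  }

  static void Main()
  {
    WriteLine(IsValidAge("34"));         // True
    WriteLine(IsValidAge("carrots"));    // False
    WriteLine(IsValidAge("bob30smith")); // False: text surrounds the digits
  }
}
```

The static Regex.IsMatch method is used here instead of creating a Regex instance. Either approach applies the same pattern.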

The syntax of a regular expression

Here are some common symbol combinations that you can use in regular expressions:

| Symbol | Meaning | Symbol | Meaning |
|---|---|---|---|
| ^ | Start of input | $ | End of input |
| \d | A single digit | \D | A single NON-digit |
| \s | Whitespace | \S | NON-whitespace |
| [A-Za-z0-9] | Range(s) of characters | [AEIOU] | Set of characters |
| + | One or more | ? | One or none |
| . | Any single character | {3} | Exactly three |
| {3,5} | Three to five | {3,} | Three or more |
| {0,3} | Up to three | | |
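A few of these symbols are combined in the following sketch. The sample inputs and patterns are chosen only for illustration:

```csharp
using System.Text.RegularExpressions;
using static System.Console;

static class SymbolDemo
{
  static void Main()
  {
    WriteLine(Regex.IsMatch("1234", @"^\d{3,5}$"));      // True: four digits
    WriteLine(Regex.IsMatch("12", @"^\d{3,5}$"));        // False: too few digits
    WriteLine(Regex.IsMatch("color", @"^colou?r$"));     // True: the u is optional
    WriteLine(Regex.IsMatch("colour", @"^colou?r$"));    // True
    WriteLine(Regex.IsMatch("Apple", @"^[AEIOU]"));      // True: starts with an uppercase vowel
    WriteLine(Regex.IsMatch("New York", @"^\D+\s\D+$")); // True: non-digits around a space
  }
}
```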

Examples of regular expressions

Here are some example regular expressions:

| Expression | Meaning |
|---|---|
| \d | A single digit somewhere in the input. |
| a | The a character somewhere in the input. |
| Bob | The word Bob somewhere in the input. |
| ^Bob | The word Bob at the start of the input. |
| Bob$ | The word Bob at the end of the input. |
| ^\d{2}$ | Exactly two digits. |
| ^[0-9]{2}$ | Exactly two digits. |
| ^[A-Z]{4,}$ | At least four uppercase letters only. |
| ^[A-Za-z]{4,}$ | At least four upper or lowercase letters only. |
| ^[A-Z]{2}\d{3}$ | Two uppercase letters and three digits only. |
| ^d.g$ | The letter d, then any character, and then the letter g, so it would match both dig and dog, or any other single character between the d and the g. |
| ^d\.g$ | The letter d, then a dot (.), and then the letter g, so it would match d.g only. |
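The difference between an unescaped and an escaped dot, and the effect of the anchors, can be checked with a short sketch. The sample inputs are chosen only for illustration:

```csharp
using System.Text.RegularExpressions;
using static System.Console;

static class ExamplesDemo
{
  static void Main()
  {
    // An unescaped dot matches any single character.
    WriteLine(Regex.IsMatch("dog", @"^d.g$"));    // True
    // An escaped dot matches only a literal dot.
    WriteLine(Regex.IsMatch("dog", @"^d\.g$"));   // False
    WriteLine(Regex.IsMatch("d.g", @"^d\.g$"));   // True

    // Anchors control where in the input the match must occur.
    WriteLine(Regex.IsMatch("Hello Bob", @"Bob$")); // True
    WriteLine(Regex.IsMatch("Hello Bob", @"^Bob")); // False

    WriteLine(Regex.IsMatch("AB123", @"^[A-Z]{2}\d{3}$")); // True
    WriteLine(Regex.IsMatch("ab123", @"^[A-Z]{2}\d{3}$")); // False: lowercase letters
  }
}
```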

Tip

Good Practice

Use regular expressions to validate input from the user. The same regular expressions can be reused in other languages such as JavaScript.
