CHAPTER 3

String Manipulation

As I mentioned in Chapter 1, a JavaScript application is written entirely as a sequence of Unicode characters. This is by no means exclusive to the JavaScript language. HTML and CSS are written the same way, to name just two. Even HTTP, the protocol underlying the World Wide Web, is a text-based communications protocol.

Data is often transmitted in the form of text, as it’s highly interoperable, because virtually all computers can work with Unicode. One attribute that makes JSON highly interoperable is that it’s composed of, and transmitted simply as, Unicode. For this reason, this book will work extensively with the creation, formatting, and general manipulation of strings designed for both inbound and outbound traffic.

String Concatenation

The string literal makes creating strings an absolute cinch. As you may recall from Chapter 1, a string value represents a finite sequence of zero or more Unicode characters. The definition contains the word finite because JavaScript strings are immutable. In other words, a string’s value is a constant. While strings themselves are immutable, entirely new strings can be created simply by joining two strings together end-to-end, using the addition operator, as shown in Listing 3-1.

Listing 3-1. Joining Strings

1 var str = "Hello" + " World";
2 console.log( str );  //Hello World

Listing 3-1 demonstrates the union of the two string literals, "Hello" and " World", via the addition operator (line 1). The result of the union is Hello World. This joining of strings, known as string concatenation, is the language’s simplest means of string manipulation. Concatenation is what allows our application to build strings on the fly.
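The claim that concatenation produces a new string, rather than altering an existing one, can be checked with a short sketch. The variable names here are illustrative and do not come from the chapter's listings.

```javascript
// Illustrative variable names; not taken from the chapter's listings.
var greeting = "Hello";
var original = greeting;          // both variables hold the same value
greeting = greeting + " World";   // builds a new string and reassigns the variable

console.log( greeting );  // Hello World
console.log( original );  // Hello
```

The value held by original is untouched, which suggests that the addition operator built a second string instead of modifying the first.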

While concatenation is solely limited to strings, we can use the addition operator to coerce primitive values into their string representations. This offers our application the ability to capture its state as a singular string value, which can later be transmitted across the Internet. Consider the demonstration in Listing 3-2.

Listing 3-2. Formatting Data

1 var userName = "Ben";
2 var clickedButton = false;
3 var stringRepresentation = "username="+userName +"&clickedButton=" + clickedButton;
4 console.log( stringRepresentation);  // "username=Ben&clickedButton=false"

Listing 3-2 uses the addition operator to convert the existing state of an application into a string value. The result is a string containing the Unicode characters that read as "username=Ben&clickedButton=false".

The way in which our data is represented is referred to as a data format. It is the purpose of the data format to provide a structure that conveys the meaning of all concatenated values. Relying on a lesser-known data format makes it difficult for the recipient to extract or analyze the individual values. This book will cover a variety of data formats similar to the preceding one, as well as JSON.
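As a minimal sketch, the format used in Listing 3-2 could be produced for any set of values by a small helper. The function name formatState and its parameter are hypothetical, and the sketch assumes that no key or value contains the & or = characters.

```javascript
// formatState is a hypothetical helper, not part of JavaScript itself.
// Assumption: keys and values contain neither "&" nor "=".
function formatState( state ) {
  var pairs = "";
  for ( var key in state ) {
    if ( !state.hasOwnProperty( key ) ) continue;  // skip inherited members
    if ( pairs !== "" ) pairs = pairs + "&";       // separate consecutive pairs
    pairs = pairs + key + "=" + state[ key ];      // values are coerced to strings
  }
  return pairs;
}

var formatted = formatState( { username: "Ben", clickedButton: false } );
console.log( formatted );  // username=Ben&clickedButton=false
```

Because the structure is fixed (pairs joined by &, each key joined to its value by =), a recipient that knows the format can locate every value.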

The String Object

The String object is a specialized object whose collective behaviors facilitate the manipulation of a string value. We will learn more of its behaviors in the upcoming sections.

Creating String Objects

Like all objects, a String object is created using the keyword new followed by the constructor function of the object type. As revealed by the syntax of the String constructor, String( string );, each instance is provided with a string value at the time of its instantiation. Listing 3-3 demonstrates the provision of the string literal "test".

Listing 3-3. Instantiating a String object

1 var strObject = new String( "test" );
2 console.log( strObject ) ;   //String { 0="t", 1="e", 2="s", 3="t" }

To keep things succinct, the string object in Listing 3-3 is provided with a string literal. However, it could have just as easily been supplied an identifier that evaluates to a string value. Upon the instance’s creation, the string object is returned and assigned to the strObject variable (line 1). Because it is assigned to a variable, we can continue to reference the instance and its many behaviors.

As revealed by the subsequent line (line 2), logging out the reference to our instance shows that the provided string is no longer retained in its original form. Instead, each character of the provided string has been separated and cataloged within our collection. Exploding the string into the individual characters of which it was composed becomes the foundation from which all manipulation occurs.

The Interface of the String Object

As outlined in Table 3-1, the interface of the String object offers a wide range of utility. Furthermore, it is inherited by each instance to allow for the manipulation and formatting of the string value with which it is provided.

Table 3-1. String object’s Interface

Member        Type      Description
length        Property  Returns the length of the string
toString      Method    Returns a string representation of the collection
charAt        Method    Returns the character at the specified index
indexOf       Method    Returns the position of the first occurrence of a substring
lastIndexOf   Method    Returns the position of the last occurrence of a substring
match         Method    Matches a string with a pattern and returns all matches as an array
replace       Method    Replaces text in a string
slice         Method    Returns a section of a string, as indicated by a range
substr        Method    Returns a substring, as indicated by a start index, through a specified length
split         Method    Splits a string into substrings, using the specified separator, and returns them as an array
toUpperCase   Method    Converts all characters in the string to uppercase
toLowerCase   Method    Converts all characters in the string to lowercase

Note  A substring can be a singular character or a sequence of characters.

length

The length member is the only behavior that is not a method. The sole purpose of the length property is to obtain an accurate count of how many characters are retained within the collection. Both forms of access notation, dot and bracket, can be used to access the length member, as well as the other members outlined in Table 3-1. Listing 3-4 makes use of dot notation.

Listing 3-4. Obtaining a String’s Length

1 var str = "test";
2 var strObject = new String( str );
3 console.log( strObject ) ;   //String { 0="t", 1="e", 2="s", 3="t" }
4 console.log( strObject.length );     // 4

Listing 3-4 begins by assigning the string literal "test" to the str variable (line 1). Next, we instantiate a string object and provide our str variable as the argument. The instance is then assigned as the reference to strObject (line 2). Utilizing our reference, we print its contents to the console (line 3). Last, utilizing the dot notation, we access the length property and print the resulting value to the console (line 4).

As you can see in Listing 3-4, accessing length returns the number of characters used to devise the original string. Knowing the total character count will be a great benefit when manipulating an ordered sequence of characters.

toString

The toString method, whose signature is that of toString();, is used to return the string representation of the value possessed by our collection. It is worth noting that the toString method does not return a string object, but rather the primitive-type string.
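The difference between the string object and the primitive returned by toString can be observed with the typeof operator. The following is a short sketch; the variable names are chosen for illustration only.

```javascript
var strObject = new String( "test" );
var primitive = strObject.toString();

console.log( typeof strObject );      // object
console.log( typeof primitive );      // string
console.log( primitive === "test" );  // true
console.log( strObject === "test" );  // false, an object is never strictly equal to a primitive
```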

charAt

The charAt method, whose signature is that of charAt( index );, is used to return the cataloged character whose key matches the specified index. As the string object represents an ordered collection of characters, the first character’s index is always 0. Obtaining a character is as simple as providing an index to the method, as seen in Listing 3-5.

Listing 3-5. Obtaining Unicode Characters

var str = "Hello World";
var strObject = new String( str );
console.log( strObject.charAt( 0 ));  // H
console.log( strObject.charAt( 1 ));  // e
console.log( strObject.charAt( 2 ));  // l

Note  As an ordered collection, the value of length - 1 will always be the index of the last character in the collection.

By pairing the charAt method with the length property, we can automate our efforts by way of a for loop, as seen in Listing 3-6.

Listing 3-6. Iterating Through a String’s Characters

1 var str = "Hello World";
2 var strObject = new String( str );
3 var length = strObject.length;
4 for(var i=0; i<length; i++)  console.log( strObject.charAt(i) );

Listing 3-6 uses a for loop to print each sequential character (line 4). The loop begins with an initial variable, i, which is assigned the value of 0. In order to ensure that all characters are evaluated, the condition for the loop determines whether the current value of i is less than the total number of characters in the collection. As long as this condition evaluates to true, our statement is executed, and the post-operation increments i by a value of 1.
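The same pairing of charAt and length can walk a string backward, starting from the last index (length - 1) and ending at 0. The helper below, reverseString, is a hypothetical name used only for this sketch.

```javascript
// reverseString is a hypothetical helper used only for illustration.
function reverseString( str ) {
  var strObject = new String( str );
  var reversed = "";
  // length - 1 is the index of the last character; walk back to index 0
  for ( var i = strObject.length - 1; i >= 0; i-- ) {
    reversed = reversed + strObject.charAt( i );
  }
  return reversed;
}

console.log( reverseString( "Hello World" ) );  // dlroW olleH
```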

indexOf

While the charAt method aims to return a character at the specified index, the indexOf method provides the inverse behavior. Instead of supplying an index to obtain its corresponding character, the indexOf method enables you to obtain the index at which the first occurrence of a specified substring begins. Its signature, indexOf(subString[, startIndex]);, reveals that the method accepts up to two arguments. The first represents the subString, whose index we seek, while the second parameter, startIndex, represents an offset from which the search should begin. Because the startIndex is optional, we will only focus on the required parameter. (See Listing 3-7.)

Listing 3-7. Obtaining the First Location for a Substring

1 var str = "Hello World";
2 var strObject = new String( str );
3 console.log( strObject.indexOf( "H" )); // 0

Listing 3-7 relies on indexOf to obtain the location for the first determined substring, "H", within our string value (line 3). As you may have expected, the result returned and output to the console is 0. It’s worth stressing that indexOf only returns the index of the first determined substring. Therefore, if the substring used happens to occur more than once in the collection, only the location of the first occurrence will be returned, as shown in Listing 3-8.

Listing 3-8. The Index of the First Matched Character ‘l’ is Returned

1 var str = "Hello World";
2 var strObject = new String( str );
3 console.log( strObject.indexOf( "l" )); // 2

If a sought substring does not exist within the collection, the resulting index will be -1. Because the indexes of our ordered collection are never negative, a result of -1 offers our application the ability to determine whether or not an operation should take place via a control statement, as seen in Listing 3-9.

Listing 3-9. If the Index of -1 is Returned, the Substring is Not Present

1 var str = "Hello World";
2 var strObject = new String(str);
3 var index = strObject.indexOf(";");
4 if(index>-1) { /* perform operation */ }
5 else console.log("substring does not occur");

As shown in Listing 3-9, we can incorporate the value returned by indexOf to control the flow of our application. Listing 3-9 uses a conditional operation to determine whether the index returned is greater than -1. This signifies to our application that our collection possesses the substring being sought after, resulting in some unknown operation being performed. However, if the condition is not met, the application prints to the console "substring does not occur".

It’s worth stressing that indexOf accepts multiple characters. The preceding listings have only supplied a singular character. In addition to working with individual characters, indexOf can determine the starting index for a sequence of characters. This will be very beneficial when attempting to obtain the location of a substring that has multiple occurrences. Consider an example in which we are required to find a particular occurrence in a phrase that relies on repetition. (See Listing 3-10.)

Listing 3-10. The Index of the First Matched Substring is Returned

1 var str = "side beside besides the ocean";
2 var strObject = new String(str);
3 var index = strObject.indexOf("side");
4 if(index>-1) console.log(index);  // 0
5 else console.log("substring does not occur");
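Although the listings above supply only the required parameter, the optional startIndex is what makes it possible to reach occurrences beyond the first. The following is a minimal sketch under that assumption; allIndexesOf is a hypothetical helper, not a built-in method.

```javascript
// allIndexesOf is a hypothetical helper; it is not a built-in method.
function allIndexesOf( str, subString ) {
  var strObject = new String( str );
  var indexes = [];
  if ( subString === "" ) return indexes;  // an empty substring would never stop matching
  var index = strObject.indexOf( subString );
  while ( index > -1 ) {
    indexes.push( index );
    // resume the search one position past the previous match
    index = strObject.indexOf( subString, index + 1 );
  }
  return indexes;
}

console.log( allIndexesOf( "side beside besides the ocean", "side" ) );  // [0, 7, 14]
```

Each pass hands the previous result, plus 1, back to indexOf as the startIndex, so the search continues until -1 signals that no further occurrence exists.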

lastIndexOf

While the indexOf method returns the index of the first found occurrence, lastIndexOf returns the index of the last found occurrence of a substring. Similarly, if the string does not possess the provided substring, -1 is returned as the result.

The method’s signature, lastIndexOf(subString[, startIndex]);, is equal to that of indexOf. It expects at most two arguments; however, this book only employs the first. Listing 3-11 demonstrates how we can obtain the starting index for the last occurrence of "side" in our previous string.

Listing 3-11. Locating the Index of the Last Matched Substring

1 var str = "side beside besides the ocean";
2 var strObject = new String(str);
3 var index = strObject.lastIndexOf("side");
4 if(index>-1) console.log(index);  //14
5 else console.log("substring does not occur");
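Comparing the results of indexOf and lastIndexOf offers a simple way to tell whether a substring occurs exactly once. This is a sketch only; occursOnce is a hypothetical helper written for this example.

```javascript
// occursOnce is a hypothetical helper written for this example.
function occursOnce( str, subString ) {
  var strObject = new String( str );
  var first = strObject.indexOf( subString );
  var last = strObject.lastIndexOf( subString );
  // present at least once, and the first and last matches are the same match
  return first > -1 && first === last;
}

console.log( occursOnce( "side beside besides the ocean", "side" ) );   // false
console.log( occursOnce( "side beside besides the ocean", "ocean" ) );  // true
console.log( occursOnce( "side beside besides the ocean", ";" ) );      // false
```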

match

The match method, whose signature is match( pattern );, is used to locate character patterns within a string. An invocation of match accepts a string value or a regular expression and returns an array containing the matched substrings of said search, or null if nothing matches. Listing 3-12 demonstrates the provision of both kinds of argument to the method.

Listing 3-12. Obtaining Matched Substrings

1 var str = "username=Ben&clickedButton=false";
2 var strObject = new String(str);
3 var stringMatches = strObject.match("username");
4 console.log(stringMatches);   // ["username"]
5 var patternMatches = strObject.match( /[^&]+/g );
6 console.log(patternMatches);  // ["username=Ben", "clickedButton=false"]

Listing 3-12 begins by assigning a formatted string to the str variable (line 1). From there, we provide it as the value to initialize our instance (line 2).

Next, the string "username" is provided as the pattern to locate within our string (line 3). This results in the return of an array containing the found match. The array returned reveals that it has, in fact, located a match (line 4). Alternatively, we employ a regular expression pattern to locate any and all series of characters that do not possess the & token (line 5). The array returned reveals that it has, in fact, located two matches (line 6).
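Two details of match are worth a short sketch: a regular expression without the g flag stops at the first match, and a search that finds nothing returns null rather than an empty array. The variable names below are illustrative.

```javascript
var strObject = new String( "username=Ben&clickedButton=false" );

var firstOnly = strObject.match( /[^&]+/ );    // no g flag: stops at the first match
var everyPair = strObject.match( /[^&]+/g );   // g flag: collects every match
var noMatch   = strObject.match( /;/ );        // nothing found

console.log( firstOnly[ 0 ] );    // username=Ben
console.log( everyPair.length );  // 2
console.log( noMatch );           // null

// Guard against null before reading from the result
var count = noMatch === null ? 0 : noMatch.length;
console.log( count );             // 0
```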

replace

The replace method, whose signature is replace(pattern, replaceText);, can be used to exchange a matching substring with another. Whether or not a match is found, the method returns a string value. When the pattern is a string, only the first match is replaced. Listing 3-13 utilizes the replace method to substitute the first found occurrence of the substring "Hello" with "Goodbye".

Listing 3-13. Replacing Matched Substrings

1 var str = "Hello World";
2 var strObject = new String( str );
3 var result = strObject.replace( "Hello", "Goodbye" );
4 console.log( result );    //Goodbye World
5 console.log( strObject ); //String { 0="H", 1="e", 2="l", 3="l", 4="o", 5=" ", ...//truncated }

Listing 3-13 employs the replace method in order to substitute the substring "Goodbye" for the first occurrence of the substring "Hello". You may note that I assign the resulting string to a variable labeled result (line 3). Because strings are immutable, meaning they cannot be altered, the behavior produces an entirely new string. It does not attempt to alter the variable it was initially supplied. Furthermore, as illustrated on line 5, use of the behaviors possessed by our string object will not alter the initial characters cataloged by the collection.

Note  All strings returned by the methods of a string object are the creation of a new string.
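When the pattern supplied to replace is a string, only the first match is substituted. To substitute every match, one option is a regular expression carrying the g flag, as sketched here with an illustrative input.

```javascript
var strObject = new String( "Hello Hello World" );

var firstReplaced = strObject.replace( "Hello", "Goodbye" );   // string pattern: first match only
var allReplaced   = strObject.replace( /Hello/g, "Goodbye" );  // g flag: every match

console.log( firstReplaced );         // Goodbye Hello World
console.log( allReplaced );           // Goodbye Goodbye World
console.log( strObject.toString() );  // Hello Hello World
```

In both cases the original collection is left as it was, and the result is a new string.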

slice

The slice method is used to return a substring of the collection determined by a range of indexes. The method, as revealed by its signature, slice(start[, end]);, requires a starting index and an optional ending index. All characters located at the starting index and up to, but not including, the ending index will be returned to the caller of the method. If the end index is not specified, the substring includes every character from the starting index through the end of the string. Listing 3-14 demonstrates how we can extract the word Hello from our string literal by utilizing the slice method.

Listing 3-14. Extracting Substrings with slice

1 var str = 'Hello World';
2 var strObject = new String(str);
3 var index = strObject.indexOf('o'); //4
4 var result = strObject.slice(0, index);
5 console.log(result); //Hell
6 console.log(strObject.slice(0, index + 1)); //Hello

Listing 3-14 demonstrates the extraction of the word Hello from our string with the use of the slice method. Because we know that Hello begins at index 0, we simply have to determine which index is used to signify the boundary of our substring. It is important to note that slice returns the sequence of characters from the start index up to, but not including, the ending index. This is why line 5 outputs Hell rather than Hello.

Because the ending index is exclusive, the index we supply must always be one position beyond the last character we seek to obtain. The solution is to add 1 to the determined index (line 6).
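The exclusive ending index can also work in our favor. If we search for the delimiter that follows a word, rather than the word's last letter, no adjustment is needed. The following is a sketch of that idea; firstWord is a hypothetical helper, and it assumes words are separated by a single space.

```javascript
// firstWord is a hypothetical helper, not a built-in method.
// Assumption: words are separated by a single space character.
function firstWord( str ) {
  var strObject = new String( str );
  var spaceIndex = strObject.indexOf( " " );
  // No space means the whole string is a single word
  if ( spaceIndex === -1 ) return strObject.toString();
  // The end index is exclusive, so the space itself is left out
  return strObject.slice( 0, spaceIndex );
}

console.log( firstWord( "Hello World" ) );               // Hello
console.log( firstWord( "Hello" ) );                     // Hello
console.log( new String( "Hello World" ).slice( 6 ) );   // World (end index omitted)
```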

substr

The substr method is used to return a substring within a specific range. It is similar to the slice method in that both obtain a substring within a given boundary. As depicted by the signature substr(start[, length]);, the substr method can accept two parameters; however, only the first is required.

The required parameter, start, signifies where the substring to extract begins. This value can be followed by an optional number of characters to include in the returned substring. The key difference between substr and slice is that the length does not indicate an index. Instead, it indicates the total number of characters (including the character at the specified start) to return in the substring. Listing 3-15 demonstrates how we can extract the word World from the string, utilizing the substr method.

Listing 3-15. Extracting Substrings with substr

1 var str = 'Hello World';
2 var strObject = new String(str);
3 var startIndex = strObject.indexOf('W'); //6
4 var length = (new String('World')).length; //5
5 var result = strObject.substr(startIndex, length );
6 console.log(result); //World

Listing 3-15 begins by obtaining the starting index for our substring, 'World' (line 3). Once we have obtained its index, we can supply it to our substr method as the starting index. Additionally, we can provide an optional number of characters, which will determine how many subsequent characters beyond the starting point to be returned.

In this case, I have opted to supply the length of characters possessed by the substring 'World'. This is achieved by creating a second string object, supplying it with the string 'World', and obtaining its character count by way of the length attribute (line 4). This value is then supplied as the argument that identifies the total length of characters to include in the substring (line 5).

Note  If the optional parameter length is omitted, all characters, from the start index to the end of the string, will be returned.
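The relationship between substr and slice can be stated compactly: for a non-negative start, supplying a length to substr selects the same characters as supplying start + length to slice as the ending index. A short sketch, with illustrative variable names:

```javascript
var strObject = new String( "Hello World" );
var start = 6;
var length = 5;

var viaSubstr = strObject.substr( start, length );         // start index plus a character count
var viaSlice  = strObject.slice( start, start + length );  // start index plus an exclusive end index

console.log( viaSubstr );                  // World
console.log( viaSlice );                   // World
console.log( strObject.substr( start ) );  // World
```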

split

The split method is used to split a string into substrings and return them as the values of an array. As revealed by the method’s signature split(separator[, limit]);, the method expects to receive at most two arguments. The first argument, labeled separator, is required, while the latter argument, limit, remains optional. This book will only make use of the separator parameter. The separator argument specifies the delimiter that marks the boundaries between substrings within the provided string. Listing 3-16 contains one such string, whereby substrings are delimited by way of an ordinary comma.

Listing 3-16. Separating a Comma-Delimited String

1 var strObject = new String('ben,mike,ivan,kyle');
2 console.log( strObject.split(',') );  // ['ben','mike','ivan','kyle']

Listing 3-16 instantiates a string object and supplies it with a comma-delimited list of names (line 1). Next, we invoke the split method and supply it with the substring used to separate each name. In this particular case, that substring is a comma, resulting in the return of an ordered collection of all names (line 2).
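The split method can be applied twice to take apart the formatted string from Listing 3-2: once on & to separate the pairs, and once on = to separate each key from its value. The sketch below assumes every pair contains exactly one = character; parseState is a hypothetical helper.

```javascript
// parseState is a hypothetical helper; it assumes every pair contains one "=".
function parseState( formatted ) {
  var state = {};
  var pairs = new String( formatted ).split( "&" );
  for ( var i = 0; i < pairs.length; i++ ) {
    var parts = new String( pairs[ i ] ).split( "=" );  // [ key, value ]
    state[ parts[ 0 ] ] = parts[ 1 ];
  }
  return state;
}

var parsed = parseState( "username=Ben&clickedButton=false" );
console.log( parsed.username );       // Ben
console.log( parsed.clickedButton );  // false (the string "false", not the boolean)
```

Note that every extracted value is a string. Restoring the original types would require an additional conversion step that this sketch leaves out.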

toUpperCase

The toUpperCase method is used to convert all characters within a string to uppercase. The method does not accept any parameters, and it will be applied to an entire string, as seen in Listing 3-17.

Listing 3-17. Capitalizing All Alphabetic Characters

1 var strObject = new String( 'Hello World' );
2 console.log( strObject.toUpperCase() );  // HELLO WORLD
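Although toUpperCase applies to an entire string, it can be combined with charAt and slice to affect only part of one. As a sketch, the hypothetical helper capitalize uppercases just the first character.

```javascript
// capitalize is a hypothetical helper, not part of the String interface.
function capitalize( str ) {
  var strObject = new String( str );
  var first = new String( strObject.charAt( 0 ) );
  // Uppercase the first character, then append the rest unchanged
  return first.toUpperCase() + strObject.slice( 1 );
}

console.log( capitalize( "hello world" ) );  // Hello world
```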

toLowerCase

Conversely, the toLowerCase method is used to convert all alphabetic characters within a string to lowercase, as seen in Listing 3-18.

Listing 3-18. Applying Lowercase to All Alphabetic Characters

1 var strObject = new String( 'Hello World' );
2 console.log( strObject.toLowerCase() );  // hello world

Aside from the obvious use for the toUpperCase and toLowerCase methods, there is yet another reason they will be used throughout this book. When working with text, inconsistent capitalization is to be expected. However, this makes it difficult to compare two strings within a language that is case-sensitive. Listing 3-19 shows comparisons that always evaluate to false, due to the inconsistent use of letter casing.

Listing 3-19. Comparisons Are Case-Sensitive

1 console.log('Hello World' === 'hello world' ); //false
2 console.log('Hello world' === 'hello world' ); //false
3 console.log('HELLO WORLD' === 'Hello World' ); //false

While the characters used in both strings may appear equal to us, they are definitely not viewed as the same by a computer. This is because computers view uppercase and lowercase letters as different Unicode values. Therefore, to ensure that casing is not an issue during the comparison of strings, we will often apply toUpperCase or toLowerCase to both strings before comparing them.
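A case-insensitive comparison can be sketched by converting both operands to the same case before applying the equality operator. The helper name equalsIgnoreCase is hypothetical.

```javascript
// equalsIgnoreCase is a hypothetical helper written for this example.
function equalsIgnoreCase( a, b ) {
  // Lowercase both sides so that only the letters, not their casing, are compared
  return new String( a ).toLowerCase() === new String( b ).toLowerCase();
}

console.log( equalsIgnoreCase( "Hello World", "hello world" ) );  // true
console.log( equalsIgnoreCase( "HELLO WORLD", "Hello World" ) );  // true
console.log( equalsIgnoreCase( "Hello World", "Hello Ward" ) );   // false
```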

The Implicit String Object

The preceding listings make explicit use of the string object, in order to tap into its many behaviors. While a string object adds great value, it comes at the cost of its syntactical overhead. Consider Listing 3-4, which required the instantiation of a string object simply to obtain the length of characters used to devise a string. To ease this burden for developers, the JavaScript language does, in fact, offer us the best of both worlds.

As mentioned in Chapter 1, primitive values are not objects and, therefore, cannot possibly possess key/value pairs. One would therefore expect any attempt to access a property of a string, or of any primitive type for that matter, to result in an error. However, JavaScript seeks to reduce the syntactical overhead by allowing the behaviors of the string object to be accessed through a primitive string via access notation. Doing so prompts the engine to instantiate a string object on our behalf, using the target string as its argument. Once the instance is created, the accessed behavior is fulfilled by the instance itself. Listing 3-20 demonstrates how the interface of the string object can be accessed indirectly through a string value.

Listing 3-20. Implicit Use of the String object

1 var strLiteral = 'Hello World';
2 console.log( strLiteral.toLowerCase() );  // hello world
3 console.log( strLiteral.length );  // 11
4 console.log( strLiteral.substr(0 , 5 ));  // Hello

Listing 3-20 begins by assigning the string literal 'Hello World' to the variable strLiteral (line 1). From there, each subsequent line of code relies on dot notation to reference a behavior of the string object. Because the engine recognizes that a string does not possess any attributes, behind the scenes, it instantiates a string object, supplies it with the value of strLiteral, and returns the resulting value. The result is precisely the same as if we instantiated the string object ourselves, only without the syntactical overhead. For this reason, you should never have to instantiate a string object directly.
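That the string object created on our behalf is only temporary can be observed with typeof and instanceof. In this sketch, the primitive remains a primitive both before and after one of its behaviors is accessed.

```javascript
var strLiteral = 'Hello World';
var upper = strLiteral.toUpperCase();  // a temporary String object services this call

console.log( typeof strLiteral );                           // string
console.log( typeof upper );                                // string
console.log( strLiteral instanceof String );                // false
console.log( new String( strLiteral ) instanceof String );  // true
```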

Summary

This chapter has introduced you to the behaviors of the String object, which will be employed extensively in the upcoming chapters. Each behavior covered offers our applications the necessary ability to work extensively with strings.

When it comes to string manipulation, you will find that there is no right way or wrong way to get something done. It’s as the old adage goes, “There is more than one way to skin a cat.”

Key Points from This Chapter

  • There is a corresponding object for each primitive type.
  • A data format refers to the way data is assembled.
  • The addition operator is used to capture application state within a string.
  • The string primitive has pseudo members that can be accessed with access notation.
  • The behaviors of the string object can be used indirectly.
  • The HTTP protocol transmits text.
  • The comparison between strings does not ignore case.
  • Manipulating a string does not alter the original.