© Russ Ferguson and Keith Cirkel 2017
Russ Ferguson and Keith Cirkel, JavaScript Recipes, 10.1007/978-1-4302-6107-0_20

20. Working with Regular Expressions

Russ Ferguson (1) and Keith Cirkel (2)
(1) Ocean, New Jersey, USA
(2) London, UK

What Is a Regular Expression?

Problem

You want to know what a regular expression is and how to use one in JavaScript.

Solution

Regular expressions are patterns used to match combinations of characters in strings. In JavaScript, they are also objects.

The Code

//regular expression literals
var companyBio = 'Twitter is an online social networking service that enables users to send and read short 140-character messages called "tweets".';
var simplePattern = /(twitter)/gi;
var simplePatternConstructor = new RegExp(simplePattern);
console.log(simplePatternConstructor.exec(companyBio));  //returns ["Twitter", "Twitter", index: 0, input: "Twitter is an online social networking service tha...ead short 140-character messages called "tweets"."]
Listing 20-1.
Creating a Simple Regular Expression

How It Works

Regular expressions are objects that describe a pattern of characters in text. They are often used with search functions. You can create a pattern that will be used on a single line of text or an entire document.
Regular expressions can be created in one of two ways. The simple way is a literal: the pattern is written between a pair of forward slashes (/). The other way is to call the RegExp constructor. The constructor is preferred if you think that the pattern may change or come from an outside source.
In both of these cases, you are looking for a direct match.
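The two ways of creating an expression can be compared side by side. This is a minimal sketch; the variable names and the `userTerm` value are hypothetical and stand in for a pattern that arrives from outside the script.

```javascript
// A literal is fixed when the script is parsed.
const literalPattern = /twitter/i;

// The constructor builds an equivalent pattern from a string at run time.
// userTerm is a hypothetical value, e.g. text typed into a search box.
const userTerm = 'twitter';
const builtPattern = new RegExp(userTerm, 'i');

const bio = 'Twitter is an online social networking service.';
console.log(literalPattern.test(bio)); // true
console.log(builtPattern.test(bio));   // true
console.log(builtPattern.source, builtPattern.flags); // "twitter" "i"

// Inside a string the backslash itself must be escaped:
// '\\d+' passed to the constructor is the same pattern as the literal /\d+/.
const digitsFromString = new RegExp('\\d+');
console.log(digitsFromString.exec('140-character')[0]); // "140"
```

Note that text taken from a user may contain characters that are special in a pattern, so it usually needs escaping before it is handed to the constructor.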

How Do Regular Expression Flags Work?

Problem

You want to know how to take advantage of a flag when using regular expressions.

Solution

Flags change how the engine runs a regular expression, for example by making the search global or case insensitive.

The Code

//regular expression with flags
var words = `Moff mon darth solo jabba yavin darth. Skywalker endor k-3po mon fett binks.`;
        words += ` Moff mon darth solo jabba yavin darth. skywalker endor k-3po mon fett binks.`;
var multiLineExpression = /(Skywalker)/gi; //a match group that is global and ignores case
var multiLineResult = multiLineExpression.exec(words);
console.log(multiLineResult); //first call returns ["Skywalker", "Skywalker", index: 39, ...]
Listing 20-2.
Using Flags as Part of the Regular Expression

How It Works

Flags or modifiers customize a regular expression to give it extra functions while it is performing a search. If you need to search multiple lines or find a word regardless of its case, a flag will help you perform that type of search.
Global (g): This tells the engine not to stop after the first match has been found.
Multiline (m): This makes the caret (^) and the dollar sign ($) match the beginning and end of each line in a multiline string, rather than only the beginning and end of the whole string.
Case insensitive (i): This flag makes the search ignore the case of the string being searched.
Ignore whitespace (x): This flag, which ignores whitespace inside the pattern, exists in some other regular expression engines but is not supported in JavaScript.
Unicode (u): The pattern is treated as a sequence of Unicode code points rather than UTF-16 code units.
Sticky (y): The match is attempted only at the position stored in the expression's lastIndex property, instead of anywhere in the rest of the string.
In this example, one regular expression is run against a string built from two sentences. The expression creates a match group to group a set of characters. The flags being used (gi) make this a global search that ignores case. Because exec() returns one match per call, this call returns the first match ("Skywalker"); calling exec() again on the same expression returns the second ("skywalker"), for two matches in total.
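The effect of the flags is easier to see when every match is collected. The sketch below reuses the sample text from Listing 20-2; the variable names are hypothetical, and the sticky and Unicode checks are small illustrations rather than part of the original listing.

```javascript
// Same sample text as Listing 20-2.
const sampleText =
  'Moff mon darth solo jabba yavin darth. Skywalker endor k-3po mon fett binks.' +
  ' Moff mon darth solo jabba yavin darth. skywalker endor k-3po mon fett binks.';

// g + i: each exec() call resumes from lastIndex, so a loop collects every match.
const skywalkerPattern = /(Skywalker)/gi;
const found = [];
let match;
while ((match = skywalkerPattern.exec(sampleText)) !== null) {
  found.push({ text: match[0], index: match.index });
}
console.log(found.map((m) => m.text)); // ["Skywalker", "skywalker"]
console.log(found[0].index);           // 39

// y: the match must start exactly at lastIndex; the engine does not scan ahead.
const stickyPattern = /darth/y;
stickyPattern.lastIndex = 9;
const stickyAtNine = stickyPattern.test(sampleText); // true, "darth" starts at index 9
stickyPattern.lastIndex = 0;
const stickyAtZero = stickyPattern.test(sampleText); // false, index 0 holds "Moff"
console.log(stickyAtNine, stickyAtZero);

// u: a character outside the Basic Multilingual Plane (here U+1F600)
// counts as one character instead of two UTF-16 code units.
const astral = '\u{1F600}';
console.log(/^.$/.test(astral));  // false
console.log(/^.$/u.test(astral)); // true
```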

How Do You Match Literal and Special Characters?

Problem

In your regular expression, you want to find a certain phrase even if it has special characters.

Solution

The dot (.) matches any single character, and the backslash (\) escapes special characters so that they are matched literally.

The Code

var words = "Moff mon darth solo jabba yavin darth. Skywalker endor k-3po mon fett binks.";
       words += " Moff mon darth solo jabba yavin darth. skywalker endor k-3po mon fett binks.";
var multiLineExpression = /(darth\.)/gi; //a match group that includes the literal period
var multiLineResult = multiLineExpression.exec(words);
console.log(multiLineResult); //returns ["darth.", "darth.", index: 32, input: "Moff mon darth solo jabba yavin darth. Skywalker e...avin darth. skywalker endor k-3po mon fett binks."]
Listing 20-3.
Searching for Characters Even if There Are Special Characters in the Search

How It Works

The dot (.) acts as a wildcard. It can be used to match any single character. If the search needs to include special characters, those characters need to be escaped. Using the backslash (\), you can escape the special character that is part of your search.
In this example, the search group is similar to the example in Listing 20-2. The important difference is the backslash that escapes the dot. The escaped dot matches only a literal period, so the result is "darth." from the end of the first sentence rather than the earlier "darth" that is followed by a space.
When looking for other special characters, you can search for digits (\d), whitespace (\s), and word characters (\w), which are letters, digits, and the underscore. You can also search for non-digits (\D), non-whitespace (\S), and non-word characters (\W).
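A short sketch of the escapes and shorthand classes described above. The sample strings and variable names are hypothetical and were chosen only to make each class visible.

```javascript
const sample = 'k-3po cost $4.50 on 2017-01-05.';

// \d matches digits; with + and g it returns every run of digits.
const digitRuns = sample.match(/\d+/g);
console.log(digitRuns); // ["3", "4", "50", "2017", "01", "05"]

// $ and . are special, so both are escaped to match them literally.
const hasPrice = /\$4\.50/.test(sample);
console.log(hasPrice); // true

// An unescaped dot is a wildcard and matches any single character.
console.log(/4.50/.test('4x50'));  // true
console.log(/4\.50/.test('4x50')); // false

// \s matches whitespace such as spaces and tabs.
const pieces = 'a b\tc'.split(/\s/);
console.log(pieces); // ["a", "b", "c"]

// \W matches anything that is not a letter, digit, or underscore.
const cleaned = 'R2-D2'.replace(/\W/g, '');
console.log(cleaned); // "R2D2"
```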

How Do You Use Conditions in a Search?

Problem

You want to add conditions to your searches.

Solution

The pipe (|), which resembles the logical OR operator (||) used in if statements, adds an either/or condition to a regular expression.

The Code

var words = "Moff mon darth solo jabba yavin darth. Skywalker endor k-3po mon fett binks.";
var multiLineExpression = /darth|solo/g;  //search for darth or solo
var multiLineResult = multiLineExpression.exec(words);
console.log(multiLineResult); //returns ["darth", index: 9, input: "Moff mon darth solo jabba yavin darth. Skywalker endor k-3po mon fett binks."]
var groupOfWords = 'cats, bats, dogs, logs, cogs';
var groupRegX = /[cb]ats|[dl]ogs/; //search cats, bats, dogs, logs
console.log(groupRegX.exec(groupOfWords)); //returns ["cats", index: 0, input: "cats, bats, dogs, logs, cogs"]
Listing 20-4.
Using the Logical Operator or Pipe (|) to Make a Choice of One or the Other

How It Works

There may be a situation where you want to search for one thing or another. To add this condition to your regular expression, you can add a pipe (|) to your search.
The first example searches for either of two different words. The second example combines the pipe with character sets: it matches a word that starts with 'c' or 'b' followed by "ats" (cats or bats), or a word that starts with 'd' or 'l' followed by "ogs" (dogs or logs).
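One detail worth illustrating is how far the pipe reaches: it splits the entire pattern unless parentheses limit it. The sketch below reuses the word list from Listing 20-4; the variable names and the extra test strings are hypothetical.

```javascript
const animals = 'cats, bats, dogs, logs, cogs';

// With the g flag, match() returns every alternative that is found.
const allAnimals = animals.match(/[cb]ats|[dl]ogs/g);
console.log(allAnimals); // ["cats", "bats", "dogs", "logs"]

// Without parentheses the pipe splits the whole pattern:
// this reads as "^cats" OR "dogs$".
const loose = /^cats|dogs$/;
console.log(loose.test('cats and more')); // true

// Parentheses limit the alternatives, so the anchors apply to both words.
const strict = /^(cats|dogs)$/;
console.log(strict.test('cats and more')); // false
console.log(strict.test('dogs'));          // true
```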

How Do You Search for Characters in a Certain Range?

Problem

You want to find certain characters inside a range of characters.

Solution

Using bracket notation ([ ]) with a dash ( - ), you can create a range of values to use in the search.

The Code

var textWithNumbers = 'USS Enterprise 1701-D';
var searchNumbers  = /[0-9]/;
var searchNumbersGreedy  = /[0-9]+/;
console.log(searchNumbers.exec(textWithNumbers));  //returns ["1", index: 15, input: "USS Enterprise 1701-D"]
console.log(searchNumbersGreedy.exec(textWithNumbers)); //returns ["1701", index: 15, input: "USS Enterprise 1701-D"]
Listing 20-5.
Searching for Certain Characters in a Range

How It Works

Square bracket notation is used to create ranges for your search. For example, [A-Z] matches any capital letter and [0-9] matches any digit from 0 to 9. When looking for characters this way, keep in mind that a range must run in sequence, from a lower character to a higher one.
A shorthand for a common character range is \w, which matches upper- and lowercase letters, the digits 0 to 9, and the underscore. If you're looking for digits, you can use \d, whereas \D (notice the capital letter) matches any non-digit character. The second example creates a greedy search: the plus sign (+) matches one or more digits in a row, so the match continues through 1701 instead of stopping at the first digit.
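The following sketch applies a few ranges to the string from Listing 20-5. The variable names and the `file_name` sample are hypothetical additions used to contrast an explicit range with the \w shorthand.

```javascript
const registry = 'USS Enterprise 1701-D';

// [A-Z] matches one capital letter; + extends the match over a run of them.
const capitals = registry.match(/[A-Z]+/g);
console.log(capitals); // ["USS", "E", "D"]

// A caret inside the brackets negates the set: anything that is not a letter or digit.
const separators = registry.match(/[^A-Za-z0-9]/g);
console.log(separators); // [" ", " ", "-"]

// Several ranges can share one set, here the hexadecimal digits.
const isHex = /^[0-9a-fA-F]+$/.test('1701D');
console.log(isHex); // true

// \w is close to [A-Za-z0-9] but also accepts the underscore.
const withShorthand = 'file_name'.match(/\w+/)[0];
const withRange = 'file_name'.match(/[A-Za-z0-9]+/)[0];
console.log(withShorthand, withRange); // "file_name" "file"
```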

How Do You Use Anchors?

Problem

You want to know how to use anchors in a regular expression.

Solution

Anchors specify a position where a match happens in a string.

The Code

var ipsumString = 'Moff mon darth solo jabba yavin darth. Skywalker endor k-3po mon fett binks.';
    ipsumString += '\nMoff mon darth solo jabba yavin darth. Skywalker endor k-3po mon fett binks.'; //the \n starts a second line
var startAnchor = /^M/;
var multiLineAnchor = /^M/m;
var endOfLineAnchor = /binks\.$/gm;
var firstInstanceAnchor = /ar/;
var startOrEndAnchor = /^darth|binks\.$/;
console.log(startAnchor.exec(ipsumString)); //returns M
console.log(multiLineAnchor.exec(ipsumString)); //returns M
console.log(endOfLineAnchor.exec(ipsumString)); //returns binks.
console.log(firstInstanceAnchor.exec(ipsumString)); //returns ar
console.log(startOrEndAnchor.exec(ipsumString)); //returns binks.
Listing 20-6.
Anchors Will Specify Exactly Where a Match Happens

How It Works

Anchors let you specify exactly where a match should happen. They match a position rather than a character, which keeps the engine from searching through the entire string and brings you directly to the location where the match occurs. It is recommended to use anchors whenever you can.
A caret (^) is used in most engines to make sure the current position in the string is the beginning position.
The startAnchor variable will only find a match if the M is at the start of the string. In multiline mode, it will find an M at the start of any line.
The dollar sign ($) matches at the end of the string, or at the end of each line in multiline mode. Using endOfLineAnchor, you can search for the last word in a line. This example adds the global and multiline modifiers, so repeated calls to exec() find two matches, one at the end of each line.
Using ar will find the first instance of "ar" in the first line. If the g modifier is added, repeated calls return every instance of "ar" in every line.
The startOrEndAnchor lets you check if "darth" is at the beginning of the string or if "binks." is at the end. In this instance, the result will return binks.
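The difference that the multiline flag makes to anchors can be sketched with a short three-line string. The string and the variable names here are hypothetical and are not part of Listing 20-6.

```javascript
// Three lines separated by newline characters.
const lines = 'Moff mon darth\nMon fett binks.\nMoff solo binks.';

// Without m, ^ matches only at the very start of the string.
const startOfString = lines.match(/^M/g);
console.log(startOfString.length); // 1

// With m, ^ matches at the start of every line.
const startOfLines = lines.match(/^M/gm);
console.log(startOfLines.length); // 3

// With m, $ matches at the end of every line; two lines end in "binks."
const endOfLines = lines.match(/binks\.$/gm);
console.log(endOfLines); // ["binks.", "binks."]

// Without m, $ matches only at the end of the whole string.
const endOfString = lines.match(/binks\.$/g);
console.log(endOfString); // ["binks."]

// Using both anchors requires the whole string to match the pattern.
console.log(/^solo$/.test('solo'));     // true
console.log(/^solo$/.test('han solo')); // false
```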

How Do You Use Matching Quantifiers?

Problem

You want to know what quantifiers in a regular expression are.

Solution

Quantifiers tell the engine how many instances of a character must exist in order for a match to be found.

The Code

var ipsumString = 'Moff mon darth solo jabba yavin darth. Skywalker endor k-3po mon fett binks.';
    ipsumString += '\nMoff mon darth solo jabba yavin darth. Skywalker endor k-3po mon fett binks.'; //the \n starts a second line, which the dot (.) will not cross
var greedyQuantifier = /b+/;
var docileQuantifier = /.*k-/;
var lazyQuantifier = /jabba*?/;
var helpfulQuantifier = /.*?yavin/;
console.log(greedyQuantifier.exec(ipsumString)); //returns bb from the first instance of jabba
console.log(docileQuantifier.exec(ipsumString)); //returns Moff mon darth solo jabba yavin darth. Skywalker endor k-
console.log(lazyQuantifier.exec(ipsumString)); //returns jabb
console.log(helpfulQuantifier.exec(ipsumString)); //returns Moff mon darth solo jabba yavin
Listing 20-7.
Quantifiers Can Be Greedy, Docile, Lazy, and Helpful

How It Works

Quantifiers can be broken down into different categories; here are a few of them:
  • Greedy: This tells the engine to match as many instances of the pattern as possible. The plus sign (+) is a greedy quantifier that requires one or more instances. Here, b+ returns bb from the first instance of jabba.
  • Docile: The dot (.) matches any character except a newline (\n), and the asterisk (*) repeats it zero or more times, so .* starts out greedy: it begins at the far left and takes the entire line. It then gives characters back from the right of the line, one at a time, until the rest of the pattern can match. If the pattern were only .*k, just the s and the period would be given up (the s. in binks.). Because the pattern also requires a dash (-), the engine keeps giving up characters until it reaches k-. The search returns 'Moff mon darth solo jabba yavin darth. Skywalker endor k-' and leaves '3po mon fett binks.'.
  • Lazy: Sometimes called reluctant, this tries to match as few characters as needed. Adding a question mark (?) after a quantifier makes it lazy. In jabba*?, the quantifier applies only to the last 'a'; a*? means zero or more 'a' characters but as few as possible, so the engine takes none and returns jabb.
  • Helpful: This is also a lazy type of search. The .*? matches between zero and unlimited characters, but as few as possible. The search starts at the left and expands one character at a time until it reaches the first yavin.
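Greedy and lazy quantifiers are easiest to compare on the same input. This is a small sketch with hypothetical sample strings and variable names; it is not taken from Listing 20-7.

```javascript
const markup = '<b>darth</b> and <b>solo</b>';

// Greedy: .* runs to the end of the line, then backs up to the LAST </b>.
const greedyMatch = markup.match(/<b>.*<\/b>/)[0];
console.log(greedyMatch); // "<b>darth</b> and <b>solo</b>"

// Lazy: .*? grows one character at a time and stops at the FIRST </b>.
const lazyMatch = markup.match(/<b>.*?<\/b>/)[0];
console.log(lazyMatch); // "<b>darth</b>"

// + takes as many b characters as it can; +? takes the minimum of one.
const manyB = 'jabbba'.match(/b+/)[0];
const oneB = 'jabbba'.match(/b+?/)[0];
console.log(manyB, oneB); // "bbb" "b"

// * allows zero repetitions, so it can succeed with an empty match:
// at index 0 the character is "j", and a* matches zero "a" characters there.
const emptyMatch = 'jabba'.match(/a*/)[0];
console.log(emptyMatch.length); // 0
```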

What Are Capture Groups?

Problem

You wonder how to group information for further processing.

Solution

You can group information for processing using parentheses ().

The Code

var ipsumString = 'Moff mon darth solo jabba yavin darth. Skywalker endor k-3po mon fett binks.';
    ipsumString += 'Moff mon darth solo jabba yavin darth. Skywalker endor k-3po mon fett binks. file_record_transcript.pdf';
var groupOfFilesNoExtention = /(.*)\.pdf/;
var groupOfFilesWithExtention = /(.*\.pdf)/;
console.log(groupOfFilesNoExtention.exec(ipsumString)); //group 1 holds the text before .pdf, ending in file_record_transcript
console.log(groupOfFilesWithExtention.exec(ipsumString)); //group 1 holds the same text plus the extension, ending in file_record_transcript.pdf
Listing 20-8.
Creating a Capture Group Using Parentheses

How It Works

It is possible to group information for further processing using parentheses. You can create a subpattern and capture it as a group.
The first example creates a capture group. Inside the group, the dot (.) matches any character except a newline, and the asterisk (*) makes it greedy, matching between zero and an unlimited number of characters. Outside the parentheses, the pattern narrows the search by matching a literal period (\.) and the letters "pdf" exactly. The full match therefore ends with .pdf, while the captured group ends with the filename and excludes the extension. Because .* also accepts spaces, the group includes any text that comes before the filename on the same line; a narrower pattern such as (\S+)\.pdf captures the filename alone.
The other example does the same thing; however, the extension is inside the group, so the captured text includes the file extension.
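To see what each group actually holds, the result array can be read by index: element 0 is the full match and each later element is one capture group. This sketch uses a shortened, hypothetical version of the sample text and swaps in \S (non-whitespace) as one possible way to limit the group to the filename.

```javascript
const text = 'Skywalker endor k-3po mon fett binks. file_record_transcript.pdf';

// (.*) is greedy and accepts spaces, so group 1 holds everything before ".pdf".
const wide = /(.*)\.pdf/.exec(text);
console.log(wide[1]); // "Skywalker endor k-3po mon fett binks. file_record_transcript"

// (\S+) stops at whitespace, so group 1 holds only the filename.
// A second group captures the extension separately.
const narrow = /(\S+)\.(pdf)/.exec(text);
console.log(narrow[0]); // "file_record_transcript.pdf"
console.log(narrow[1]); // "file_record_transcript"
console.log(narrow[2]); // "pdf"

// Captured groups can be reused in a replacement with $1, $2, and so on.
const renamed = text.replace(/(\S+)\.pdf/, '$1.txt');
console.log(renamed.endsWith('file_record_transcript.txt')); // true
```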

What Are Lookaheads?

Problem

You want to know how to use a lookahead in a regular expression.

Solution

Lookaheads match an expression only when it is, or is not, followed by a given pattern.

The Code

var textWithNumbers = '1701-D, 1701';
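var noLetterLookaHead = /1701(?!-D)/; //negative lookahead: matches 1701 only when it is not followed by -D
var withLetterLookaHead = /1701(?=-D)/;  //positive lookahead: matches 1701 only when it is followed by -D
console.log(noLetterLookaHead.exec(textWithNumbers)); //returns ["1701", index: 8, input: "1701-D, 1701"]
console.log(withLetterLookaHead.exec(textWithNumbers)); //returns ["1701", index: 0, input: "1701-D, 1701"]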
Listing 20-9.
Using a Lookahead to Find the Characters 1701

How It Works

Lookaheads will find a match only if the string is, or is not, followed by a given pattern. The first example uses a negative lookahead: it matches the number 1701 only when it is not followed by the characters -D, so it ignores the first instance and goes straight to the second. The other example uses a positive lookahead and only matches the instance that has -D after it.
In both instances, the match itself is the number 1701; the lookahead checks the text that follows without including it in the result.
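Because a lookahead checks the following text without consuming it, it is handy for picking values by their suffix. The sketch below uses a hypothetical price list and hypothetical variable names; the \b word boundary is an extra anchor, not covered above, added so that the negative case cannot succeed by backing up inside a number.

```javascript
const registry = '1701-D, 1701';

// The lookahead text is checked but never becomes part of the match.
const negative = /1701(?!-D)/.exec(registry);
console.log(negative[0], negative.index); // "1701" 8
const positive = /1701(?=-D)/.exec(registry);
console.log(positive[0], positive.index); // "1701" 0

const prices = '10 USD, 20 EUR, 30 USD';

// Positive lookahead: numbers that are followed by " USD".
const usd = prices.match(/\d+(?= USD)/g);
console.log(usd); // ["10", "30"]

// Negative lookahead: whole numbers that are NOT followed by " USD".
// Without \b the engine could match just the "1" of "10", since "1" is followed by "0".
const notUsd = prices.match(/\b\d+\b(?! USD)/g);
console.log(notUsd); // ["20"]
```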