Repeated occurrences

So far, we have seen how to match fixed characters or numeric patterns. Often, you also need to handle repetition within a pattern. For example, if I want to match four a's, I can write /aaaa/, but what if I want to specify a pattern that can match any number of a's?

Regular expressions provide a variety of repetition quantifiers, which let us specify how many times a particular pattern can occur. We can specify fixed counts (the character must appear exactly n times) and variable counts (the character must appear at least n times and at most m times). The following list shows the various repetition quantifiers:

  • ?: Either 0 or 1 occurrence (marks the occurrence as optional)
  • *: 0 or more occurrences
  • +: 1 or more occurrences
  • {n}: Exactly n occurrences
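  • {n,m}: Between n and m occurrences (inclusive)
  • {n,}: At least n occurrences
  • {0,n}: 0 to n occurrences (JavaScript does not support a {,n} shorthand; without the u flag, {,n} is matched as literal text)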
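To make the counted quantifiers concrete, here is a minimal sketch. It assumes the ^ and $ anchors (not introduced above), which force the whole string to match so that the effect of each quantifier is easy to see. The variable names are only illustrative. The upper-bound-only case is written as {0,2}, since JavaScript does not treat {,2} as a quantifier.

```javascript
// ^ and $ are assumptions here: they anchor the pattern to the whole string.
var exactlyThree = /^a{3}$/;  // exactly 3 occurrences of a
var twoToFour = /^a{2,4}$/;   // between 2 and 4 occurrences of a
var atLeastTwo = /^a{2,}$/;   // 2 or more occurrences of a
var upToTwo = /^a{0,2}$/;     // 0, 1, or 2 occurrences of a

console.log(exactlyThree.test("aaa"));   // true
console.log(exactlyThree.test("aaaa"));  // false
console.log(twoToFour.test("aaaaa"));    // false
console.log(atLeastTwo.test("aaaaaaa")); // true
console.log(upToTwo.test(""));           // true
console.log(upToTwo.test("aaa"));        // false
```

Without the anchors, test() would succeed as soon as any part of the string satisfied the pattern, so /a{3}/ would also match "aaaa".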

In the following example, we create a pattern where the character u is optional (has 0 or 1 occurrence):

var str = /behaviou?r/;
console.log(str.test("behaviour"));
// true
console.log(str.test("behavior"));
// true

It helps to read the /behaviou?r/ expression as behavio, followed by 0 or 1 occurrences of the character u, followed by r. The repetition quantifier follows the character that we want to repeat. Let's try another example:

console.log(/'\d+'/.test("'123'")); // true

You should read and interpret the '\d+' expression as follows: ' is a literal character match, \d matches any digit in [0-9], the + quantifier allows one or more occurrences of that digit, and the closing ' is a literal character match.

You can also group character expressions using (). Observe the following example:

var heartyLaugh = /Ha+(Ha+)+/i;
console.log(heartyLaugh.test("HaHaHaHaHaHaHaaaaaaaaaaa"));
//true

Let's break the preceding expression into smaller chunks to understand what is going on in here:

  • H: literal character match
  • a+: 1 or more occurrences of character a
  • (: start of the expression group
  • H: literal character match
  • a+: 1 or more occurrences of character a
  • ): end of expression group
  • +: 1 or more occurrences of expression group (Ha+)

Now it is easier to see how the grouping works. When we have to interpret an expression, it often helps to read it out piece by piece, as in the preceding breakdown.
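The effect of grouping on a quantifier can be sketched with a smaller pattern. The following example assumes the ^ and $ anchors so that the whole string has to match. The variable names are illustrative only.

```javascript
// Without a group, + applies only to the character right before it (b).
var withoutGroup = /^ab+$/;
// With a group, + applies to the whole sequence ab.
var withGroup = /^(ab)+$/;

console.log(withoutGroup.test("abbb")); // true
console.log(withoutGroup.test("abab")); // false
console.log(withGroup.test("abab"));    // true
console.log(withGroup.test("abbb"));    // false
```

The same rule explains the hearty laugh pattern: the trailing + repeats the whole (Ha+) group, not just the last a.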

Often, you want to match a sequence of letters or numbers on its own and not just as a substring. This is a fairly common use case when you are matching words that must not be part of other words. We can specify word boundaries by using the \b pattern. A word boundary, \b, matches the position where one side is a word character (letter, digit, or underscore) and the other side is not. Consider the following examples.

The following is a simple literal match. This match would also succeed if cat were only part of a longer word:

console.log(/cat/.test('a black cat')); //true

However, in the following example, we define a word boundary by placing \b before the word cat. This means that we want to match only if cat starts a word rather than appearing in the middle of one. The boundary is established before cat, and hence a match is found in the text a black cat:

console.log(/\bcat/.test('a black cat')); //true

When we use the same boundary with the word tomcat, we get a failed match because there is no word boundary before cat in the word tomcat:

console.log(/\bcat/.test('tomcat')); //false

There is a word boundary after the string cat in the word tomcat, hence the following is a successful match:

console.log(/cat\b/.test('tomcat')); //true

In the following example, we define the word boundary before and after the word cat to indicate that we want cat to be a standalone word with boundaries before and after:

console.log(/\bcat\b/.test('a black cat')); //true

Based on the same logic, the following match fails because there are no boundaries before and after cat in the word concatenate:

console.log(/\bcat\b/.test("concatenate")); //false
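As a slightly larger sketch of whole-word matching, the following example collects every standalone cat from a sentence. It assumes the g (global) flag and the String match() method, neither of which is introduced above. The sample text and variable names are made up for illustration.

```javascript
var text = "cat, tomcat, concatenate, cat_nap and a black cat";

// \b on both sides: cat must not touch another word character.
// The g flag (an assumption here) makes match() return all matches.
var wholeWord = /\bcat\b/g;
var matches = text.match(wholeWord);

console.log(matches.length); // 2
```

Only the first and last cat qualify. In cat_nap, the underscore counts as a word character, so there is no boundary after cat.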

The exec() method is useful for getting information about a match. It returns an array with the matched text and some extra properties, or null if there is no match. The returned array has an index property that tells us where the successful match begins in the string:

var match = /\d+/.exec("There are 100 ways to do this");
console.log(match);
// ["100"]
console.log(match.index);
// 10
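One way to use the index property is to split a string around the match. The following is a minimal sketch, assuming the same sentence as above. The variable names are illustrative, and slice() is the standard String method.

```javascript
var sentence = "There are 100 ways to do this";
var found = /\d+/.exec(sentence);

// exec() returns null when there is no match, so check before using it.
if (found !== null) {
  // found[0] is the matched text, found.index is where it starts.
  var before = sentence.slice(0, found.index);
  var after = sentence.slice(found.index + found[0].length);
  console.log(before);   // "There are "
  console.log(found[0]); // "100"
  console.log(after);    // " ways to do this"
}

console.log(/\d+/.exec("no digits here")); // null
```

Checking for null first matters because reading index from a failed match would throw a TypeError.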

Alternatives – OR

Alternatives can be expressed using the | (pipe) character. For example, /a|b/ matches either the a or b character, and /(ab)+|(cd)+/ matches one or more occurrences of either ab or cd.
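The two patterns above can be tried out as follows. This sketch assumes the ^ and $ anchors plus an extra outer group in the second pattern, so that the whole string must consist of one repeated alternative. The last two lines illustrate that | applies to everything on either side of it unless a group limits its scope. The grey and gravel inputs are made-up examples.

```javascript
console.log(/a|b/.test("xbx")); // true
console.log(/a|b/.test("xyz")); // false

// The outer group keeps ^ and $ applied to both alternatives.
var repeated = /^((ab)+|(cd)+)$/;
console.log(repeated.test("ababab")); // true
console.log(repeated.test("cdcd"));   // true
console.log(repeated.test("abcd"));   // false

// Grouping limits the reach of |.
console.log(/^gr(a|e)y$/.test("grey"));  // true
console.log(/^gra|ey$/.test("gravel"));  // true: read as "starts with gra" OR "ends with ey"
```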
