Chapter 6. Strings and Regular Expressions

Topics in This Chapter

6.1 Converting between Strings and Code Point Sequences

6.2 Substrings

6.3 Other String Methods

6.4 Tagged Template Literals

6.5 Raw Template Literals

6.6 Regular Expressions

6.7 Regular Expression Literals

6.8 Flags

6.9 Regular Expressions and Unicode

6.10 The Methods of the RegExp Class

6.11 Groups

6.12 String Methods with Regular Expressions

6.13 More about Regex Replace

6.14 Exotic Features

Exercises

In this chapter, you will learn about the methods that the standard library provides for string processing. We will then turn to regular expressions, which let you find strings that match patterns. After an introduction to the syntax of regular expressions and their JavaScript-specific idiosyncrasies, you will see how to use the API for finding and replacing matches.

6.1 Converting between Strings and Code Point Sequences

A string is a sequence of Unicode code points. Each code point is an integer between zero and 0x10FFFF. The fromCodePoint function of the String class assembles a string from code point arguments:
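Here is a short sketch of `fromCodePoint` at work. The particular code points are arbitrary choices for illustration. The example also uses `codePointAt`, a standard `String` method that the text above does not cover, to read the code points back. It also shows that `length` counts UTF-16 code units rather than code points.

```javascript
// Code points for 'H', 'i', and U+1F310 (GLOBE WITH MERIDIANS),
// which lies outside the Basic Multilingual Plane
const str = String.fromCodePoint(0x48, 0x69, 0x1F310);

// length counts UTF-16 code units: the globe takes two of them
const unitCount = str.length; // 4

// Spreading a string iterates by code point; codePointAt(0) reads
// the full code point at the start of each one-character string
const codePoints = [...str].map(c => c.codePointAt(0));
// [0x48, 0x69, 0x1F310]

// Passing the code points back in reassembles the same string
const roundTrip = String.fromCodePoint(...codePoints);
console.log(roundTrip === str); // true
```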

Name	Description
Functions
`fromCodePoint(codePoints...)`	Yields a string consisting of the given code points
Methods
`startsWith(s)`, `endsWith(s)`, `includes(s)`	`true` if a string starts or ends with `s`, or has `s` as a substring
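`indexOf(s, start)`, `lastIndexOf(s, start)`	The index of the first occurrence of `s` at or after index `start` (default `0`), or of the last occurrence at or before index `start` (default: the end of the string); `-1` if there is no such occurrence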
`slice(start, end)`	The substring of code units with index between `start` inclusive and `end` exclusive. Negative index values are counted from the end of the string. `end` defaults to the length of the string. Prefer this method over `substring`.
`repeat(n)`	This string, repeated `n` times
`trimStart()`, `trimEnd()`, `trim()`	This string with leading, trailing, or leading and trailing white space removed
`padStart(minLength, padString)`, `padEnd(minLength, padString)`	This string, padded at the start or end until its length reaches `minLength`. The default `padString` is `' '`.
`toLowerCase()`, `toUpperCase()`	This string with all letters converted to lower or upper case
`split(separator, maxParts)`	An array of the parts obtained by splitting this string at each occurrence of the separator (which can be a regular expression). If `maxParts` is omitted, all parts are returned; otherwise, at most `maxParts` parts are returned and the rest are discarded.
`search(target)`	The index of the first match of `target`, or `-1` if there is none. A string `target` is converted to a regular expression.
`replace(target, replacement)`	This string, with the first match of `target` replaced. If `target` is a global regular expression, all matches are replaced. See Section 6.13 about replacement patterns and functions.
`match(regex)`	An array of all matches if `regex` is global, `null` if there is no match, and the match result otherwise. The match result is an array holding the entire match followed by all group matches, with properties `index` (the index of the match) and `groups` (an object mapping group names to matches).
`matchAll(regex)`	An iterable of the match results. If `regex` is a regular expression, it must have the global flag; otherwise, a `TypeError` is thrown.
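The following sketch exercises several methods from the table on made-up sample strings. The values in the comments are what each call should yield. Note in particular that `lastIndexOf` searches backward. Note also that `split` with `maxParts` drops the remaining parts rather than joining them into a final part.

```javascript
const s = 'JavaScript, Java, and Script';

// Prefix, suffix, and substring tests
const prefix = s.startsWith('Java');  // true
const suffix = s.endsWith('Script');  // true
const contains = s.includes('and');   // true

// indexOf searches forward from start (default 0);
// lastIndexOf searches backward from start (default: end of string)
const first = s.indexOf('Java');          // 0
const second = s.indexOf('Java', 1);      // 12
const lastJava = s.lastIndexOf('Java');   // 12
const before = s.lastIndexOf('Java', 11); // 0
const missing = s.indexOf('Python');      // -1

// slice: end is exclusive, negative indexes count from the end
const head = s.slice(0, 4); // 'Java'
const tail = s.slice(-6);   // 'Script'

// repeat, trim, and padding
const dashes = '-'.repeat(5);        // '-----'
const trimmed = '  hi  '.trim();     // 'hi'
const padded = '7'.padStart(3, '0'); // '007'

// split with maxParts keeps the first parts and discards the rest
const parts = 'a,b,c,d'.split(',', 2); // ['a', 'b']

// replace with a string target replaces only the first occurrence
const once = 'a-b-c'.replace('-', '+'); // 'a+b-c'

// Methods that take regular expressions
const pos = 'abc123'.search(/\d/);                               // 3
const nums = 'a1b22c333'.match(/\d+/g);                          // ['1', '22', '333']
const starts = [...'a1b22'.matchAll(/\d+/g)].map(m => m.index); // [1, 3]
```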

Expression	Description	Example
Characters
A character other than `. * + ? { | ( ) [ \ ^ $`	Matches only the given character	`J`
`.`	Matches any character except a line terminator (`\n`, `\r`, `\u2028`, `\u2029`), or any character if the `dotAll` flag is set
`\u{`hhhh`}`, `\u{`hhhhh`}`	The Unicode code point with the given hex value (requires the `unicode` flag)	`\u{1F310}`
`\u`hhhh, `\x`hh	The UTF-16 code unit with the given hex value	`\xA0`
`\f`, `\n`, `\r`, `\t`, `\v`	Form feed (`\x0C`), newline (`\x0A`), carriage return (`\x0D`), tab (`\x09`), vertical tab (`\x0B`)
`\c`L, where L is in `[A-Za-z]`	The control character corresponding to the character L	`\cH` is Ctrl-H or backspace (`\x08`)
`\`c, where c is not in `[0-9BDPSWbcdfknprstv]`	The character c	`\\`
Character Classes
`[`C₁C₂. . .`]`, where Cᵢ are characters, ranges c`-`d, or character classes	Any of the characters represented by C₁, C₂, . . .	`[0-9+-]`
`[^`. . .`]`	Complement of a character class	`[^\d\s]`
`\p{`BooleanProperty`}`, `\p{`Property`=`Value`}`, `\P{`. . .`}`	A Unicode property (see Section 6.9); its complement (requires the `unicode` flag)	`\p{L}` are Unicode letters
`\d`, `\D`	A digit `[0-9]`; the complement	`\d+` is a sequence of digits
`\w`, `\W`	A word character `[a-zA-Z0-9_]`; the complement
`\s`, `\S`	A space from `[ \t\n\r\v\f\xA0]` or 18 additional Unicode space characters, nearly the same as `\p{White_Space}`; the complement	`\s*,\s*` is a comma surrounded by optional white space
Sequences and Alternatives
XY	Any string from `X`, followed by any string from `Y`	`[1-9][0-9]*` is a positive number without leading zero
X`|`Y	Any string from `X` or `Y`	`http|ftp`
Grouping
`(`X`)`	Captures the match of X into a group—see Section 6.11	`'([^']*)'` captures the quoted text
`\`n	Matches the nth group	`(['"]).*\1` matches `'Fred'` or `"Fred"` but not `"Fred'`
`(?<`name`>`X`)`	Captures the match of X with the given name	`'(?<qty>[0-9]+)'` captures the match with name `qty`
`\k<`name`>`	The group with the given name	`\k<qty>` matches the group with name `qty`
`(?:`X`)`	Use parentheses without capturing X	In `(?:http|ftp)://(.*)`, the match after `://` is `\1`
Other `(?`. . .`)`	See Section 6.14
Quantifiers
X`?`	Optional X	`\+?` is an optional + sign
X`*`, X`+`	0 or more X, 1 or more X	`[1-9][0-9]+` is an integer ≥ 10
X`{`n`}`, X`{`n`,}`, X`{`m`,`n`}`	n times X, at least n times X, between m and n times X	`[0-9]{4,6}` are four to six digits
X`*?` or X`+?`	Reluctant quantifier, attempting the shortest match before trying longer matches	`.(<.+?>).` captures the shortest sequence enclosed in angle brackets
Boundary Matches
`^ $`	Beginning, end of input (or beginning, end of line if the `multiline` flag is set)	`^JavaScript$` matches the input or line `JavaScript`
`\b`, `\B`	Word boundary, nonword boundary	`Java\B` matches `JavaScript` but not `Java code`
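To see several rows of the syntax table in action, here is a sketch that tries them with `test`, `exec`, `match`, and `split`. Those methods themselves are covered in Sections 6.10 and 6.12. The sample inputs are illustrative only. Note that a `/` inside a regular expression literal must be written as `\/`.

```javascript
// Backreference \1: the closing quote must match the opening one
const quoted = /(['"]).*\1/;
const sameQuotes = quoted.test("'Fred'");   // true
const mixedQuotes = quoted.test('"Fred\''); // false

// Named group, read back through the groups property of the match
const order = /(?<qty>[0-9]+) items/.exec('Ordered 12 items');
const qty = order.groups.qty; // '12'

// Noncapturing group: the text after :// is group 1
const urlMatch = /(?:http|ftp):\/\/(.*)/.exec('ftp://example.com/file');
const rest = urlMatch[1]; // 'example.com/file'

// Nonword boundary \B
const inWord = /Java\B/.test('JavaScript');    // true
const atBoundary = /Java\B/.test('Java code'); // false

// Bounded repetition, anchored so that the whole input must match
const fiveDigits = /^[0-9]{4,6}$/.test('12345'); // true
const threeDigits = /^[0-9]{4,6}$/.test('123');  // false

// Greedy versus reluctant quantifier
const html = '<b>bold</b>';
const greedy = html.match(/<.+>/)[0];     // '<b>bold</b>'
const reluctant = html.match(/<.+?>/)[0]; // '<b>'

// A comma surrounded by optional white space
const fields = 'a , b,c'.split(/\s*,\s*/); // ['a', 'b', 'c']
```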

Single Letter	Property Name	Description
`i`	`ignoreCase`	Case-insensitive match
`m`	`multiline`	`^`, `$` match start, end of line
`s`	`dotAll`	`.` matches newline
`u`	`unicode`	Match Unicode characters, not code units—see Section 6.9
`g`	`global`	Find all matches—see Section 6.10
`y`	`sticky`	Match must start at `regex.lastIndex`—see Section 6.10
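The following sketch shows the effect of each flag on inputs chosen for illustration. At the end, it reads the `flags` property, which returns all flags of a regular expression as a string. That property is standard but is not listed in the table above.

```javascript
// i: case-insensitive
const caseless = /java/i.test('JavaScript'); // true

// m: ^ and $ also match at line boundaries
const text = 'first line\nsecond line';
const withoutM = /^second/.test(text); // false
const withM = /^second/m.test(text);   // true

// s: . also matches a newline
const withoutS = /first.line/.test('first\nline'); // false
const withS = /first.line/s.test('first\nline');   // true

// u: . matches a whole code point instead of a single UTF-16 code unit
const withoutU = /^.$/.test('\u{1F310}'); // false (two code units)
const withU = /^.$/u.test('\u{1F310}');   // true

// g: match returns every match
const allDigits = 'a1b2c3'.match(/\d/g); // ['1', '2', '3']

// y: the match must begin exactly at lastIndex
const sticky = /\d+/y;
sticky.lastIndex = 1;
const atOne = sticky.exec('a12b')[0]; // '12'
const afterMatch = sticky.lastIndex;  // 3
sticky.lastIndex = 0;
const atZero = sticky.exec('a12b');   // null, since 'a' is not a digit

// Each flag is also exposed as a property
const re = /x/gi;
const props = [re.global, re.ignoreCase, re.flags]; // [true, true, 'gi']
```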

Name	Description
`L`	Letter
`Lu`	Uppercase letter
`Ll`	Lowercase letter
`Nd`	Decimal number
`P`	Punctuation
`S`	Symbol
`White_Space`	White space, nearly the same as `\s`
`Emoji`	Emoji characters, modifiers, or components
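The following sketch matches a few of these properties. All of them require the `u` flag. The `Script=Greek` example uses the Property`=`Value form from the syntax table. `Script` is a standard Unicode property, but it does not appear in the table above. Non-ASCII characters are written as escapes so that the encoding and normalization of the source file cannot change the result.

```javascript
// \p{L}: letters, including accented ones (É and é written as escapes)
const letters = '\u00C9t\u00E9 2024!'.match(/\p{L}/gu); // ['É', 't', 'é']

// \p{Lu}: uppercase letters only
const upper = 'JavaScript'.match(/\p{Lu}/gu); // ['J', 'S']

// \p{Nd}: decimal digits from any script, such as ARABIC-INDIC DIGIT THREE
const digits = '\u0663 and 4'.match(/\p{Nd}/gu); // ['٣', '4']

// \p{P} and \p{S}: punctuation and symbols
const punct = 'Hi, there!'.match(/\p{P}/gu);          // [',', '!']
const symbols = '1 + 2 = 3 \u20AC'.match(/\p{S}/gu); // ['+', '=', '€']

// Property=Value form, here with the Script property
const greek = '\u03B1\u03B2c'.match(/\p{Script=Greek}/gu); // ['α', 'β']

// \P{...} is the complement: remove everything that is not a letter
const onlyLetters = 'a1-b2'.replace(/\P{L}/gu, ''); // 'ab'

// \p{Emoji} also covers the ASCII digits, which can begin keycap emoji
const emoji = 'Go \u{1F310} 2'.match(/\p{Emoji}/gu); // ['🌐', '2']
```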

6.10 The Methods of the `RegExp` Class