Text is represented in Ruby by objects of the String
class. Strings are mutable objects, and the String
class defines a powerful set of
operators and methods for extracting substrings, inserting and deleting
text, searching, replacing, and so on. Ruby provides a number of ways to
express string literals in your programs, and some of them support a
powerful string interpolation syntax by which the values of arbitrary
Ruby expressions can be substituted into string literals. The sections
that follow explain string and character literals and string operators.
The full string API is covered in Strings.
Textual patterns are represented in Ruby as Regexp
objects,
and Ruby defines a syntax for including regular expressions literally in
your programs. The code /[a-z]\d+/,
for example, represents a single lowercase letter followed by one or
more digits. Regular expressions are a commonly used feature of Ruby,
but regexps are not a fundamental datatype in the way that numbers,
strings, and arrays are. See Regular Expressions for
documentation of regular expression syntax and the Regexp
API.
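As a small illustration of the regular expression literal above, the following sketch matches it against a few sample strings. The sample strings and variable names are hypothetical and chosen only for this example.

```ruby
# Only the pattern comes from the text; the sample strings are made up.
pattern = /[a-z]\d+/

lower_index = ("x42" =~ pattern)            # index of the match, or nil
upper_index = ("X42" =~ pattern)            # uppercase letter: no match
first_match = "id: a7, b19"[pattern]        # String#[] returns the first match
all_matches = "id: a7, b19".scan(pattern)   # every non-overlapping match

puts lower_index.inspect   # 0
puts upper_index.inspect   # nil
puts first_match           # a7
puts all_matches.inspect   # ["a7", "b19"]
```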
Ruby provides quite a few ways to embed strings literally into your programs.
The simplest string literals are enclosed in single quotes (the apostrophe character). The text within the quote marks is the value of the string:
'This is a simple Ruby string literal'
If you need to place an apostrophe within a single-quoted string literal, precede it with a backslash so that the Ruby interpreter does not think that it terminates the string:
'Won\'t you read O\'Reilly\'s book?'
The backslash also works to escape another backslash, so that the second backslash is not itself interpreted as an escape character. Here are some situations in which you need to use a double backslash:
'This string literal ends with a single backslash: \\'
'This is a backslash-quote: \\\''
'Two backslashes: \\\\'
In single-quoted strings, a backslash is not special if the character that follows it is anything other than a quote or a backslash. Most of the time, therefore, backslashes need not be doubled (although they can be) in string literals. For example, the following two string literals are equal:
'a\b' == 'a\\b'
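The escaping rules for single-quoted literals can be checked by looking at the strings they produce. This is a minimal sketch; the variable names are hypothetical.

```ruby
# In single quotes, a backslash is special only before a quote or a backslash.
apostrophe = 'O\'Reilly'                  # the backslash is not part of the string
one_slash  = 'a\b'                        # a, backslash, b
doubled    = 'a\\b'                       # the doubled backslash yields one backslash
trailing   = 'ends with a backslash: \\'  # a final backslash must be doubled

puts apostrophe             # O'Reilly
puts one_slash.length       # 3
puts one_slash == doubled   # true
puts trailing               # ends with a backslash: \
```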
Single-quoted strings may extend over multiple lines, and the resulting string literal includes the newline characters. It is not possible to escape the newlines with a backslash:
'This is a long string literal \
that includes a backslash and a newline'
If you want to break a long single-quoted string literal across multiple lines without embedding newlines in it, simply break it into multiple adjacent string literals; the Ruby interpreter will concatenate them during the parsing process. Remember, though, that you must escape the newlines (see Chapter 2) between the literals so that Ruby does not interpret the newline as a statement terminator:
message =
'These three literals are '\
'concatenated into one by the interpreter. '\
'The resulting string contains no newlines.'
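To see the difference between a literal that spans lines and adjacent literals joined by the parser, the following sketch compares the two. The variable multi is a hypothetical name introduced for the comparison.

```ruby
# Adjacent literals are joined by the parser. The trailing backslashes
# escape the newlines so that each line continues the same expression.
message = 'These three literals are ' \
          'concatenated into one by the interpreter. ' \
          'The resulting string contains no newlines.'

# A single-quoted literal that spans two lines keeps its newline.
multi = 'line one
line two'

puts message.include?("\n")   # false
puts multi.include?("\n")     # true
```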
String literals delimited by double quotation marks are much more flexible than
single-quoted literals. Double-quoted literals support quite a few
backslash escape sequences, such as \n for newline, \t for tab, and \" for a
quotation mark that does not terminate the string:
"\t\"This quote begins with a tab and ends with a newline\"\n"
"\\" # A single backslash
In Ruby 1.9, the \u escape
embeds arbitrary Unicode characters, specified by their codepoint, into
a double-quoted string. This escape sequence is complex enough that
we’ll describe it in its own section (see Unicode escapes). Many of the other
backslash escape sequences are obscure and are used for encoding
binary data into strings. The complete list of escape sequences is
shown in Table 3-1.
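The practical difference between the two quoting styles shows up in the length of the resulting strings. The sketch below covers only the common escapes named above; the variable names are hypothetical.

```ruby
# Double quotes interpret \t, \n, \" and \\ as single characters.
tabbed    = "\t\"quoted\"\n"   # tab, quote, six letters, quote, newline
backslash = "\\"               # one backslash

# Single quotes leave \t as two characters: a backslash and the letter t.
single_tab = '\t'
double_tab = "\t"

puts tabbed.length       # 10
puts backslash.length    # 1
puts single_tab.length   # 2
puts double_tab.length   # 1
```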
More powerfully, double-quoted string literals may also
include arbitrary Ruby expressions. When the string is
created, the expression is evaluated, converted to a string, and
inserted into the string in place of the expression text itself.
This substitution of an expression with its value is known in Ruby
as “string interpolation.” Expressions within double-quoted strings
begin with the #
character and are enclosed within curly braces:
"360 degrees=#{2*Math::PI} radians" # "360 degrees=6.28318530717959 radians"
When the expression to be interpolated into the string literal is simply a reference to a global, instance, or class variable, then the curly braces may be omitted:
$salutation = 'hello' # Define a global variable
"#$salutation world" # Use it in a double-quoted string
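The following sketch puts the two interpolation forms side by side and contrasts them with a single-quoted literal, which performs no interpolation. The variables radius and @name are hypothetical and exist only for this example.

```ruby
radius = 2                                  # hypothetical local variable
area_text = "area=#{3 * radius * radius}"   # any expression may appear in #{...}

$salutation = 'hello'                       # global variable
@name = 'world'                             # instance variable of the top-level object
short_form = "#$salutation #@name"          # braces omitted for $ and @ variables

literal = '#{radius}'                       # single quotes: the text is kept as written

puts area_text        # area=12
puts short_form       # hello world
puts literal.length   # 9
```

Local variables such as radius always need the curly braces; only global, instance, and class variables can use the short form.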
Use a backslash to escape the # character if you do not want it to be
treated specially. Note that this only needs to be done if the
character after # is {, $, or @:
"My phone #: 555-1234" # No escape needed
"Use \#{ to interpolate expressions" # Escape #{ with backslash
Double-quoted string literals may span multiple lines, and line terminators become part of the string literal, unless escaped with a backslash:
"This string literal
has two lines \
but is written on three"
You may prefer to explicitly encode the line terminators in
your strings—in order to enforce network CRLF (Carriage Return Line
Feed) line terminators, as used in the HTTP protocol, for example.
To do this, write all your string literals on a single line and
explicitly include the line endings with the \r and \n escape sequences. Remember that
adjacent string literals are automatically concatenated, but if they
are written on separate lines, the newline between
them must be escaped:
"This string has three lines.\r\n" \
"It is written as three adjacent literals\r\n" \
"separated by escaped newlines\r\n"
Table 3-1. Backslash escapes in double-quoted strings