Chapter 3 explained Ruby’s string literal syntax, as well as the String operators for concatenation (+), appends (<<), repetition (*), and indexing ([]). In this section we expand on that coverage by demonstrating the named methods of the String class. The subsections that follow this API overview cover specific areas in more detail.
We begin with methods that provide named alternatives to some of the operators documented in Chapter 3:
    s = "hello"
    s.concat(" world")    # Synonym for <<. Mutating append to s. Returns new s.
    s.insert(5, " there") # Same as s[5,0] = " there". Alters s. Returns new s.
    s.slice(0,5)          # Same as s[0,5]. Returns a substring.
    s.slice!(5,6)         # Deletion. Same as s[5,6]="". Returns deleted substring.
    s.eql?("hello world") # True. Same as ==.
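The difference between the mutating and nonmutating forms matters most when two variables refer to the same string. The following is a minimal sketch of that behavior; the variable names are hypothetical and chosen only for illustration:

```ruby
# Illustrative only: variable names are hypothetical.
s = "hello".dup           # dup gives us a string we are free to mutate
alias_of_s = s            # A second reference to the same String object

s.concat(" world")        # Mutates the object that both variables refer to
alias_of_s                # => "hello world"

t = s + "!"               # + builds a new String; s is untouched
s                         # => "hello world"
t                         # => "hello world!"

piece   = s.slice(0, 5)   # Nonmutating: returns "hello"; s keeps all 11 characters
removed = s.slice!(5, 6)  # Mutating: returns " world" and shortens s to "hello"
```

Because concat and slice! alter the receiver, every reference to that object sees the change, whereas + and slice leave the original alone.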
There are several methods for querying the length of a string:
    s.length         # => 11: counts characters in 1.9, bytes in 1.8
    s.size           # => 11: size is a synonym
    s.bytesize       # => 11: length in bytes; Ruby 1.9 only
    s.empty?         # => false
    "".empty?        # => true
String methods for searching a string and for replacing content include the following. We’ll revisit some of these when we consider regular expressions later in this section:
    s = "hello"

    # Finding the position of a substring or pattern match
    s.index('l')         # => 2: index of first l in string
    s.index(?l)          # => 2: works with character codes as well
    s.index(/l+/)        # => 2: works with regular expressions, too
    s.index('l',3)       # => 3: index of first l in string at or after position 3
    s.index('Ruby')      # => nil: search string not found
    s.rindex('l')        # => 3: index of rightmost l in string
    s.rindex('l',2)      # => 2: index of rightmost l in string at or before 2

    # Checking for prefixes and suffixes: Ruby 1.9 and later
    s.start_with? "hell" # => true. Note start_with not starts_with
    s.end_with? "bells"  # => false

    # Testing for presence of substring
    s.include?("ll")     # => true: "hello" includes "ll"
    s.include?(?H)       # => false: "hello" does not include character H

    # Pattern matching with regular expressions
    s =~ /[aeiou]{2}/    # => nil: no double vowels in "hello"
    s.match(/[aeiou]/) {|m| m.to_s} # => "e": return first vowel

    # Splitting a string into substrings based on a delimiter string or pattern
    "this is it".split     # => ["this", "is", "it"]: split on spaces by default
    "hello".split('l')     # => ["he", "", "o"]
    "1, 2,3".split(/,\s*/) # => ["1","2","3"]: comma and optional space delimiter

    # Split a string into two parts plus a delimiter. Ruby 1.9 only.
    # These methods always return arrays of 3 strings:
    "banana".partition("an")  # => ["b", "an", "ana"]
    "banana".rpartition("an") # => ["ban", "an", "a"]: start from right
    "a123b".partition(/\d+/)  # => ["a", "123", "b"]: works with Regexps, too

    # Search and replace the first (sub, sub!) or all (gsub, gsub!)
    # occurrences of the specified string or pattern.
    # More about sub and gsub when we cover regular expressions later.
    s.sub("l", "L")            # => "heLlo": Just replace first occurrence
    s.gsub("l", "L")           # => "heLLo": Replace all occurrences
    s.sub!(/(.)(.)/, '\2\1')   # => "ehllo": Match and swap first 2 letters
    s.sub!(/(.)(.)/, "\\2\\1") # => "hello": Double backslashes for double quotes

    # sub and gsub can also compute a replacement string with a block
    # Match the first letter of each word and capitalize it
    "hello world".gsub(/\b./) {|match| match.upcase } # => "Hello World"

    # In Ruby 1.9, you can specify a hash to map matches to replacements
    s.gsub(/[aeiou]/,"a"=>0, "e"=>1, "i"=>2) # => "h1ll"
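To see the block and hash forms of gsub side by side, here is a small sketch. The helper names titleize and substitute and the SUBSTITUTIONS table are hypothetical, introduced only for this illustration, and the hash form assumes Ruby 1.9 or later:

```ruby
# Hypothetical helper: capitalize the first letter of every word.
def titleize(text)
  text.gsub(/\b[a-z]/) { |first| first.upcase }
end

# Hypothetical replacement table for the hash form of gsub (Ruby 1.9 and later).
SUBSTITUTIONS = { "a" => "4", "e" => "3", "o" => "0" }

# Hypothetical helper: replace each matched vowel with its table entry.
def substitute(text)
  text.gsub(/[aeo]/, SUBSTITUTIONS)
end

titleize("hello world")     # => "Hello World"
substitute("hello world")   # => "h3ll0 w0rld"

s = "hello".dup
unchanged = s.sub("l", "L") # => "heLlo"; s itself is still "hello"
no_match  = s.sub!("z", "Z") # => nil: the mutating forms return nil when nothing matched
```

The nil return from sub! and gsub! makes them convenient in conditions, but it also means they should not be chained the way sub and gsub can be.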
The block-based gsub example above uses the upcase method to convert a string to uppercase. The String class defines a number of methods for working with case (but it does not define methods for testing the case or category of a character):
    # Case modification methods
    s = "world"   # These methods work with ASCII characters only
    s.upcase      # => "WORLD"
    s.upcase!     # => "WORLD"; alter s in place
    s.downcase    # => "world"
    s.capitalize  # => "World": first letter upper, rest lower
    s.capitalize! # => "World": alter s in place
    s.swapcase    # => "wORLD": alter case of each letter

    # Case insensitive comparison. (ASCII text only)
    # casecmp works like <=> and returns -1 for less, 0 for equal, +1 for greater
    "world".casecmp("WORLD") # => 0
    "a".casecmp("B")         # => -1 (<=> returns 1 in this case)
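Since String has no built-in predicates for testing the case of a character, one common workaround is a regular expression. The sketch below assumes ASCII letters only; upper?, lower?, and same_ignoring_case? are hypothetical helpers, not part of the String API:

```ruby
# Hypothetical helpers; ASCII letters only, by assumption.
def upper?(ch)
  !!(ch =~ /\A[A-Z]\z/)   # true only for a single uppercase ASCII letter
end

def lower?(ch)
  !!(ch =~ /\A[a-z]\z/)   # true only for a single lowercase ASCII letter
end

# Hypothetical helper: case-insensitive equality built on casecmp.
def same_ignoring_case?(a, b)
  a.casecmp(b) == 0
end

upper?("W")                            # => true
lower?("W")                            # => false
upper?("7")                            # => false: digits are neither upper nor lower
same_ignoring_case?("World", "wORLD")  # => true
```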
String defines a number of useful methods for adding and removing whitespace. Most exist in mutating (end with !) and nonmutating versions:
    s = "hello\n"        # A string with a line terminator
    s.chomp!             # => "hello": remove one line terminator from end
    s.chomp              # => "hello": no line terminator so no change
    s.chomp!             # => nil: return of nil indicates no change made
    s.chomp("o")         # => "hell": remove "o" from end
    $/ = ";"             # Set global record separator $/ to semicolon
    "hello;".chomp       # => "hello": now chomp removes semicolons from end

    # chop removes trailing character or line terminator (\n, \r, or \r\n)
    s = "hello\n"
    s.chop!              # => "hello": line terminator removed. s modified.
    s.chop               # => "hell": last character removed. s not modified.
    "".chop              # => "": no characters to remove
    "".chop!             # => nil: nothing changed

    # Strip all whitespace (including \t, \r, \n) from left, right, or both
    # strip!, lstrip! and rstrip! modify the string in place.
    s = "\t hello \n"    # Whitespace at beginning and end
    s.strip              # => "hello"
    s.lstrip             # => "hello \n"
    s.rstrip             # => "\t hello"

    # Left-justify, right-justify, or center a string in a field n-characters wide.
    # There are no mutator versions of these methods. See also printf method.
    s = "x"
    s.ljust(3)           # => "x  "
    s.rjust(3)           # => "  x"
    s.center(3)          # => " x "
    s.center(5, '-')     # => "--x--": padding other than spaces is allowed
    s.center(7, '-=')    # => "-=-x-=-": multicharacter padding allowed
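These methods combine naturally when cleaning up input and lining it up in columns. The following sketch uses made-up input data; the variable names are hypothetical:

```ruby
# Hypothetical raw input with assorted leading and trailing whitespace.
raw = ["  apple \n", "\tkiwi\r\n", "banana"]

names = raw.map { |line| line.strip }    # => ["apple", "kiwi", "banana"]
width = names.map { |n| n.length }.max   # => 6: the widest name

# Pad each name to the same width with dots, then close the column.
rows = names.map { |n| n.ljust(width, ".") + "|" }
# => ["apple.|", "kiwi..|", "banana|"]

# chomp removes only a line terminator; chop removes any final character.
"banana".chomp   # => "banana"
"banana".chop    # => "banan"
```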
Strings may be enumerated byte-by-byte or line-by-line with the each_byte and each_line iterators. In Ruby 1.8, the each method is a synonym for each_line, and the String class includes Enumerable. Avoid using each and its related iterators because Ruby 1.9 removes the each method and no longer makes strings Enumerable. Ruby 1.9 (and the jcode library in Ruby 1.8) adds an each_char iterator and enables character-by-character enumeration of strings:
    s = "A\nB"                       # Three ASCII characters on two lines
    s.each_byte {|b| print b, " " }  # Prints "65 10 66 "
    s.each_line {|l| print l.chomp}  # Prints "AB"

    # Sequentially iterate characters as 1-character strings
    # Works in Ruby 1.9, or in 1.8 with the jcode library:
    s.each_char { |c| print c, " " } # Prints "A \n B "

    # Enumerate each character as a 1-character string
    # This does not work for multibyte strings in 1.8
    # It works (inefficiently) for multibyte strings in 1.9:
    0.upto(s.length-1) {|n| print s[n,1], " "}

    # In Ruby 1.9, bytes, lines, and chars are aliases
    s.bytes.to_a                     # => [65,10,66]: alias for each_byte
    s.lines.to_a                     # => ["A\n","B"]: alias for each_line
    s.chars.to_a                     # => ["A", "\n", "B"] alias for each_char
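The distinction between bytes and characters only becomes visible with multibyte text. The following sketch assumes Ruby 1.9 or later and a UTF-8 string built with a \u escape; the variable names are hypothetical:

```ruby
word = "caf\u00e9"    # The \u escape yields a UTF-8 string; the final letter is 2 bytes

word.length           # => 4: characters (Ruby 1.9 and later)
word.bytesize         # => 5: bytes

bytes = []
word.each_byte { |b| bytes << b }   # bytes is now [99, 97, 102, 195, 169]

chars = []
word.each_char { |c| chars << c }   # chars holds four 1-character strings

lines = []
"one\ntwo\n".each_line { |l| lines << l.chomp }   # lines is now ["one", "two"]
```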
String defines a number of methods for parsing numbers from strings, and for converting strings to symbols:
    "10".to_i            # => 10: convert string to integer
    "10".to_i(2)         # => 2: argument is radix: between base-2 and base-36
    "10x".to_i           # => 10: nonnumeric suffix is ignored. Same for oct, hex
    " 10".to_i           # => 10: leading whitespace ignored
    "ten".to_i           # => 0: does not raise exception on bad input
    "10".oct             # => 8: parse string as base-8 integer
    "10".hex             # => 16: parse string as hexadecimal integer
    "0xff".hex           # => 255: hex numbers may begin with 0x prefix
    " 1.1 dozen".to_f    # => 1.1: parse leading floating-point number
    "6.02e23".to_f       # => 6.02e+23: exponential notation supported

    "one".to_sym         # => :one -- string to symbol conversion
    "two".intern         # => :two -- intern is a synonym for to_sym
Finally, here are some miscellaneous String methods:
    # Increment a string:
    "a".succ                      # => "b": the successor of "a". Also, succ!
    "aaz".next                    # => "aba": next is a synonym. Also, next!
    "a".upto("e") {|c| print c }  # Prints "abcde". upto iterator based on succ.

    # Reverse a string:
    "hello".reverse               # => "olleh". Also reverse!

    # Debugging
    "hello\n".dump                # => "\"hello\\n\"": Escape special characters
    "hello\n".inspect             # Works much like dump

    # Translation from one set of characters to another
    "hello".tr("aeiou", "AEIOU")  # => "hEllO": capitalize vowels. Also tr!
    "hello".tr("aeiou", " ")      # => "h ll ": convert vowels to spaces
    "bead".tr_s("aeiou", " ")     # => "b d": convert and remove duplicates

    # Checksums
    "hello".sum                   # => 532: weak 16-bit checksum
    "hello".sum(8)                # => 20: 8 bit checksum instead of 16 bit
    "hello".crypt("ab")           # => "abl0JrMf6tlhw": one way cryptographic checksum
                                  # Pass two alphanumeric characters as "salt"
                                  # The result may be platform-dependent

    # Counting letters, deleting letters, and removing duplicates
    "hello".count('aeiou')        # => 2: count lowercase vowels
    "hello".delete('aeiou')       # => "hll": delete lowercase vowels. Also delete!
    "hello".squeeze('a-z')        # => "helo": remove runs of letters. Also squeeze!

    # When there is more than one argument, take the intersection.
    # Arguments that begin with ^ are negated.
    "hello".count('a-z', '^aeiou')   # => 3: count lowercase consonants
    "hello".delete('a-z', '^aeiou')  # => "eo": delete lowercase consonants
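The character-set arguments accepted by tr, count, and squeeze support ranges, which allows compact text transformations. As a rough sketch, the hypothetical rot13 helper below rotates ASCII letters by 13 places using tr:

```ruby
# Hypothetical helper: rotate ASCII letters by 13 places using tr with ranges.
def rot13(text)
  text.tr("a-zA-Z", "n-za-mN-ZA-M")
end

rot13("Hello")                        # => "Uryyb"
rot13(rot13("Hello"))                 # => "Hello": applying it twice restores the input

"hello world".count("lo")             # => 5: three l's and two o's
"too    many   spaces".squeeze(" ")   # => "too many spaces": only runs of spaces collapse
"bookkeeper".squeeze                  # => "bokeper": with no argument, all runs collapse
```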
As you know, Ruby’s double-quoted string literals allow arbitrary Ruby expressions to be interpolated into strings. For example:
    n, animal = 2, "mice"
    "#{n+1} blind #{animal}"  # => '3 blind mice'
This string-literal interpolation syntax was documented in Chapter 3. Ruby also supports another technique for interpolating values into strings: the String class defines a format operator %, and the Kernel module defines global printf and sprintf methods. These methods and the % operator are very much like the printf function popularized by the C programming language. One advantage of printf-style formatting over regular string literal interpolation is that it allows precise control over field widths, which makes it useful for ASCII report generation. Another advantage is that it allows you to specify the number of significant digits to display in floating-point numbers, which is useful in scientific (and sometimes financial) applications. Finally, printf-style formatting decouples the values to be formatted from the string into which they are interpolated. This can be helpful for internationalization and localization of applications.
Examples using the % operator follow. See Kernel.sprintf for complete documentation of the formatting directives used by these methods:
    # Alternatives to the interpolation above
    printf('%d blind %s', n+1, animal)  # Prints '3 blind mice', returns nil
    sprintf('%d blind %s', n+1, animal) # => '3 blind mice'
    '%d blind %s' % [n+1, animal]  # Use array on right if more than one argument

    # Formatting numbers
    '%d' % 10         # => '10': %d for decimal integers
    '%x' % 10         # => 'a': hexadecimal integers
    '%X' % 10         # => 'A': uppercase hexadecimal integers
    '%o' % 10         # => '12': octal integers
    '%f' % 1234.567   # => '1234.567000': full-length floating-point numbers
    '%e' % 1234.567   # => '1.234567e+03': force exponential notation
    '%E' % 1234.567   # => '1.234567E+03': exponential with uppercase E
    '%g' % 1234.567   # => '1234.57': six significant digits
    '%g' % 1.23456E12 # => '1.23456e+12': Use %f or %e depending on magnitude

    # Field width
    '%5s' % '<<<'     # '  <<<': right-justify in field five characters wide
    '%-5s' % '>>>'    # '>>>  ': left-justify in field five characters wide
    '%5d' % 123       # '  123': field is five characters wide
    '%05d' % 123      # '00123': pad with zeros in field five characters wide

    # Precision
    '%.2f' % 123.456  # '123.46': two digits after decimal place
    '%.2e' % 123.456  # '1.23e+02': two digits after decimal = three significant digits
    '%.6e' % 123.456  # '1.234560e+02': note added zero
    '%.4g' % 123.456  # '123.5': four significant digits

    # Field and precision combined
    '%6.4g' % 123.456 # ' 123.5': four significant digits in field six chars wide
    '%3s' % 'ruby'    # 'ruby': string argument exceeds field width
    '%3.3s' % 'ruby'  # 'rub': precision forces truncation of string

    # Multiple arguments to be formatted
    args = ['Syntax Error', 'test.rb', 20]  # An array of arguments
    "%s: in '%s' line %d" % args    # => "Syntax Error: in 'test.rb' line 20"
    # Same args, interpolated in different order! Good for internationalization.
    "%2$s:%3$d: %1$s" % args        # => "test.rb:20: Syntax Error"
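To make the report-generation and internationalization advantages concrete, here is a minimal sketch. The data, the TEMPLATES table, the :xx locale key, and the greeting helper are all hypothetical and exist only for this illustration:

```ruby
# Hypothetical report data: name, quantity, unit price.
rows = [["Widget", 4, 3.5], ["Gadget", 12, 10.25]]

# Fixed field widths keep the columns aligned regardless of the values.
report = rows.map do |name, qty, price|
  "%-8s %3d %8.2f" % [name, qty, price]
end
# report[0] => "Widget     4     3.50"
# report[1] => "Gadget    12    10.25"

# Hypothetical message templates: same arguments, different word order.
TEMPLATES = {
  :en => '%1$s sent you %2$d messages',
  :xx => '%2$d messages from %1$s'      # :xx stands in for some other locale
}

# Hypothetical helper: the caller never needs to know the word order.
def greeting(locale, sender, count)
  TEMPLATES[locale] % [sender, count]
end

greeting(:en, "Ann", 3)   # => "Ann sent you 3 messages"
greeting(:xx, "Ann", 3)   # => "3 messages from Ann"
```

Because the argument order is fixed by the numbered directives rather than by the calling code, a translator can reorder the words without any change to the program.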
Ruby’s strings can hold binary data as well as textual data. A pair of methods, Array.pack and String.unpack, can be helpful if you are working with binary file formats or binary network protocols. Use Array.pack to encode the elements of an array into a binary string. And use String.unpack to decode a binary string, extracting values from it and returning those values in an array. Both the encoding and decoding operations are under the control of a format string where letters specify the datatype and encoding and numbers specify the number of repetitions. The creation of these format strings is fairly arcane, and you can find a complete list of letter codes in the documentation for Array.pack and String.unpack. Here are some simple examples:
    a = [1,2,3,4,5,6,7,8,9,10]  # An array of 10 integers
    b = a.pack('i10')           # Pack 10 4-byte integers (i) into binary string b
    c = b.unpack('i*')          # Decode all (*) the 4-byte integers from b
    c == a                      # => true

    m = 'hello world'           # A message to encode
    data = [m.size, m]          # Length first, then the bytes
    template = 'Sa*'            # Unsigned short, any number of ASCII chars
    b = data.pack(template)     # => "\v\000hello world"
    b.unpack(template)          # => [11, "hello world"]
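The S directive uses the platform's native byte order, so for data that crosses machines a fixed byte order is usually preferable. The following sketch frames a message with a 16-bit big-endian ("network order") length prefix using the n directive. The encode_message and decode_message helpers are hypothetical, and the framing scheme is an assumption made for illustration, not a standard protocol:

```ruby
# Hypothetical helper: prefix the payload with its byte length as a
# 16-bit unsigned big-endian integer (n), followed by the raw bytes (a*).
def encode_message(text)
  [text.bytesize, text].pack("na*")
end

# Hypothetical helper: read the length prefix, then keep exactly that
# many bytes of payload, ignoring anything that follows.
def decode_message(bytes)
  length, payload = bytes.unpack("na*")
  payload[0, length]
end

frame = encode_message("hello world")
frame.bytesize           # => 13: 2 bytes of length plus 11 bytes of payload
decode_message(frame)    # => "hello world"
```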
The String methods encoding, encode, encode!, and force_encoding and the Encoding class were described in String Encodings and Multibyte Characters. You may want to reread that section now if you will be writing programs using Unicode or other multibyte character encodings.
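As a brief reminder of how these methods differ, the following sketch (assuming Ruby 1.9 or later; the variable names are hypothetical) contrasts encode, which transcodes the bytes, with force_encoding, which only relabels them:

```ruby
word = "caf\u00e9"              # The \u escape produces a UTF-8 string
word.encoding                   # => #<Encoding:UTF-8>

# encode transcodes: the bytes change, the characters stay the same.
latin1 = word.encode("ISO-8859-1")
latin1.bytes.to_a               # => [99, 97, 102, 233]: one byte per character
latin1.encode("UTF-8") == word  # => true: the conversion round-trips

# force_encoding relabels: the bytes stay the same, their interpretation changes.
raw = word.dup.force_encoding(Encoding::BINARY)
raw.length                      # => 5: each byte now counts as one character
word.length                     # => 4: the original is untouched because we used dup
```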