Strings

Chapter 3 explained Ruby’s string literal syntax, as well as the String operators for concatenation (+), appending (<<), repetition (*), and indexing ([]). In this section we expand on that coverage by demonstrating the named methods of the String class. The subsections that follow this API overview cover specific areas in more detail.

We begin with methods that provide named alternatives to some of the operators documented in Chapter 3:

s = "hello"
s.concat(" world")    # Synonym for <<. Mutating append to s. Returns new s.
s.insert(5, " there") # Same as s[5,0] = " there". Alters s. Returns new s.
s.slice(0,5)          # Same as s[0,5]. Returns a substring.
s.slice!(5,6)         # Deletion. Same as s[5,6]="". Returns deleted substring.
s.eql?("hello world") # True. Same as ==.

There are several methods for querying the length of a string:

s.length         # => 11: counts characters in 1.9, bytes in 1.8
s.size           # => 11: size is a synonym
s.bytesize       # => 11: length in bytes; Ruby 1.9 only
s.empty?         # => false
"".empty?        # => true

String methods for searching a string and for replacing content include the following. We’ll revisit some of these when we consider regular expressions later in this section:

s = "hello"
# Finding the position of a substring or pattern match
s.index('l')         # => 2: index of first l in string
s.index(?l)          # => 2: works with character codes as well
s.index(/l+/)        # => 2: works with regular expressions, too
s.index('l',3)       # => 3: index of first l in string at or after position 3
s.index('Ruby')      # => nil: search string not found
s.rindex('l')        # => 3: index of rightmost l in string
s.rindex('l',2)      # => 2: index of rightmost l in string at or before 2

# Checking for prefixes and suffixes: Ruby 1.9 and later
s.start_with? "hell" # => true. Note start_with not starts_with
s.end_with? "bells"  # => false

# Testing for presence of substring
s.include?("ll")     # => true: "hello" includes "ll"
s.include?(?H)       # => false: "hello" does not include character H

# Pattern matching with regular expressions
s =~ /[aeiou]{2}/    # => nil: no double vowels in "hello"
s.match(/[aeiou]/) {|m| m.to_s} # => "e": return first vowel

# Splitting a string into substrings based on a delimiter string or pattern
"this is it".split     # => ["this", "is", "it"]: split on spaces by default
"hello".split('l')     # => ["he", "", "o"]
"1, 2,3".split(/,\s*/) # => ["1","2","3"]: comma and optional space delimiter

# Split a string into two parts plus a delimiter. Ruby 1.9 only.
# These methods always return arrays of 3 strings:
"banana".partition("an")  # => ["b", "an", "ana"]
"banana".rpartition("an") # => ["ban", "an", "a"]: start from right
"a123b".partition(/\d+/)  # => ["a", "123", "b"]: works with Regexps, too

# Search and replace the first (sub, sub!) or all (gsub, gsub!)
# occurrences of the specified string or pattern.
# More about sub and gsub when we cover regular expressions later.
s.sub("l", "L")            # => "heLlo": Just replace first occurrence
s.gsub("l", "L")           # => "heLLo": Replace all occurrences
s.sub!(/(.)(.)/, '\2\1')   # => "ehllo": Match and swap first 2 letters
s.sub!(/(.)(.)/, "\\2\\1") # => "hello": Double backslashes for double quotes
# sub and gsub can also compute a replacement string with a block
# Match the first letter of each word and capitalize it
"hello world".gsub(/\b./) {|match| match.upcase } # => "Hello World"
# In Ruby 1.9, you can specify a hash to map matches to replacements
s.gsub(/[aeiou]/,"a"=>0, "e"=>1, "i"=>2)   # => "h1ll"
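
The block and hash forms of gsub are convenient for small text transformations. The following is a minimal sketch, assuming Ruby 1.9 or later; the names titlecase, escape_html, and HTML_ESCAPES are hypothetical and chosen only for illustration:

```ruby
# Hypothetical helper: capitalize the first letter of every word.
# \b matches a word boundary, so \b[a-z] matches only word-initial letters.
def titlecase(text)
  text.gsub(/\b[a-z]/) { |letter| letter.upcase }
end

# Hypothetical lookup table mapping each special character to its replacement.
HTML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }

# Hypothetical helper: replace each matched character by a hash lookup.
def escape_html(text)
  text.gsub(/[&<>]/, HTML_ESCAPES)
end

titlecase("hello world")   # => "Hello World"
escape_html("a < b & c")   # => "a &lt; b &amp; c"
```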

The gsub example with a block uses the upcase method to convert a string to uppercase. The String class defines a number of methods for working with case (but it does not define methods for testing the case or category of a character):

# Case modification methods
s = "world"   # These methods work with ASCII characters only
s.upcase      # => "WORLD"
s.upcase!     # => "WORLD"; alter s in place
s.downcase    # => "world"
s.capitalize  # => "World": first letter upper, rest lower
s.capitalize! # => "World": alter s in place
s.swapcase    # => "wORLD": alter case of each letter

# Case insensitive comparison. (ASCII text only)
# casecmp works like <=> and returns -1 for less, 0 for equal, +1 for greater
"world".casecmp("WORLD")  # => 0 
"a".casecmp("B")          # => -1 (<=> returns 1 in this case)

String defines a number of useful methods for adding and removing whitespace. Most exist in mutating (end with !) and nonmutating versions:

s = "hello\r\n"      # A string with a line terminator
s.chomp!             # => "hello": remove one line terminator from end
s.chomp              # => "hello": no line terminator so no change
s.chomp!             # => nil: return of nil indicates no change made
s.chomp("o")         # => "hell": remove "o" from end
$/ = ";"             # Set global record separator $/ to semicolon
"hello;".chomp       # => "hello": now chomp removes a semicolon at the end

# chop removes trailing character or line terminator (\n, \r, or \r\n)
s = "hello\n"
s.chop!              # => "hello": line terminator removed. s modified.
s.chop               # => "hell": last character removed. s not modified.
"".chop              # => "": no characters to remove
"".chop!             # => nil: nothing changed

# Strip all whitespace (including \t, \r, \n) from left, right, or both
# strip!, lstrip! and rstrip! modify the string in place.
s = "\t hello \n"   # Whitespace at beginning and end
s.strip             # => "hello"
s.lstrip            # => "hello \n"
s.rstrip            # => "\t hello"

# Left-justify, right-justify, or center a string in a field n-characters wide.
# There are no mutator versions of these methods. See also printf method.
s = "x"
s.ljust(3)          # => "x  "
s.rjust(3)          # => "  x"
s.center(3)         # => " x "
s.center(5, '-')    # => "--x--": padding other than spaces is allowed
s.center(7, '-=')   # => "-=-x-=-": multicharacter padding allowed
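
Because the mutating methods return nil when they make no change, as chomp! and chop! do in the examples above, chaining a call onto their result can fail. The following sketch illustrates the pitfall and a safer idiom; the helper names shout_fragile and shout are hypothetical:

```ruby
# Hypothetical helper: fragile, because strip! returns nil when there is
# no whitespace to remove, and nil does not respond to upcase.
def shout_fragile(text)
  text.dup.strip!.upcase
end

# Safer: the nonmutating strip always returns a string.
def shout(text)
  text.strip.upcase
end

shout("  hi  ")          # => "HI"
shout("hi")              # => "HI"
shout_fragile("  hi  ")  # => "HI"
# shout_fragile("hi") raises NoMethodError, because strip! returned nil
```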

Strings may be enumerated byte-by-byte or line-by-line with the each_byte and each_line iterators. In Ruby 1.8, the each method is a synonym for each_line, and the String class includes Enumerable. Avoid using each and its related iterators because Ruby 1.9 removes the each method and no longer makes strings Enumerable. Ruby 1.9 (and the jcode library in Ruby 1.8) adds an each_char iterator and enables character-by-character enumeration of strings:

s = "A\nB"                       # Three ASCII characters on two lines
s.each_byte {|b| print b, " " }  # Prints "65 10 66 "
s.each_line {|l| print l.chomp}  # Prints "AB"

# Sequentially iterate characters as 1-character strings
# Works in Ruby 1.9, or in 1.8 with the jcode library:
s.each_char { |c| print c, " " } # Prints "A \n B "

# Enumerate each character as a 1-character string
# This does not work for multibyte strings in 1.8
# It works (inefficiently) for multibyte strings in 1.9:
0.upto(s.length-1) {|n| print s[n,1], " "}

# In Ruby 1.9, bytes, lines, and chars are aliases
s.bytes.to_a                     # => [65,10,66]: alias for each_byte
s.lines.to_a                     # => ["A\n","B"]: alias for each_line
s.chars.to_a                     # => ["A", "\n", "B"]: alias for each_char

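The difference between enumerating bytes and enumerating characters only becomes visible with multibyte text. Here is a small sketch, assuming Ruby 1.9 or later, where the \u escape produces a UTF-8 string. The helper name char_and_byte_counts and the sample word are illustrative choices, not part of the String API:

```ruby
# Hypothetical helper: count a string's characters and its bytes.
# Without a block, each_char and each_byte return enumerators.
def char_and_byte_counts(text)
  [text.each_char.count, text.each_byte.count]
end

word = "caf\u00e9"                   # "café", encoded as UTF-8
char_and_byte_counts(word)           # => [4, 5]
word.each_char.to_a.last.bytes.to_a  # => [195, 169]: the é occupies two bytes
```
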
String defines a number of methods for parsing numbers from strings, and for converting strings to symbols:

"10".to_i          # => 10: convert string to integer
"10".to_i(2)       # => 2: argument is radix: between base-2 and base-36
"10x".to_i         # => 10: nonnumeric suffix is ignored. Same for oct, hex
" 10".to_i         # => 10: leading whitespace ignored
"ten".to_i         # => 0: does not raise exception on bad input
"10".oct           # => 8: parse string as base-8 integer
"10".hex           # => 16: parse string as hexadecimal integer
"0xff".hex         # => 255: hex numbers may begin with 0x prefix
" 1.1 dozen".to_f  # => 1.1: parse leading floating-point number
"6.02e23".to_f     # => 6.02e+23: exponential notation supported

"one".to_sym       # => :one -- string to symbol conversion
"two".intern       # => :two -- intern is a synonym for to_sym

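Because to_i silently returns 0 or a partial result for malformed input, it cannot distinguish "0" from "ten". When that matters, the Kernel method Integer is stricter and raises ArgumentError instead. The following sketch assumes Ruby 1.9 or later for the two-argument form of Integer; the helper name parse_int is hypothetical:

```ruby
# Hypothetical helper: parse a decimal integer strictly, returning nil
# rather than 0 when the text is not a valid number.
def parse_int(text)
  Integer(text, 10)
rescue ArgumentError
  nil
end

parse_int("10")    # => 10
parse_int("10x")   # => nil, whereas "10x".to_i returns 10
parse_int("ten")   # => nil, whereas "ten".to_i returns 0
```
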
Finally, here are some miscellaneous String methods:

# Increment a string:
"a".succ                      # => "b": the successor of "a". Also, succ!
"aaz".next                    # => "aba": next is a synonym. Also, next!
"a".upto("e") {|c| print c }  # Prints "abcde". upto iterator based on succ.

# Reverse a string:
"hello".reverse     # => "olleh". Also reverse!

# Debugging
"hello\n".dump      # => "\"hello\\n\"": Escape special characters
"hello\n".inspect   # Works much like dump

# Translation from one set of characters to another
"hello".tr("aeiou", "AEIOU")  # => "hEllO": capitalize vowels. Also tr!
"hello".tr("aeiou", " ")      # => "h ll ": convert vowels to spaces
"bead".tr_s("aeiou", " ")     # => "b d": convert and remove duplicates

# Checksums
"hello".sum          # => 532: weak 16-bit checksum
"hello".sum(8)       # => 20: 8 bit checksum instead of 16 bit
"hello".crypt("ab")  # => "abl0JrMf6tlhw": one way cryptographic checksum
                     # Pass two alphanumeric characters as "salt"
                     # The result may be platform-dependent

# Counting letters, deleting letters, and removing duplicates
"hello".count('aeiou')  # => 2: count lowercase vowels
"hello".delete('aeiou') # => "hll": delete lowercase vowels. Also delete!
"hello".squeeze('a-z')  # => "helo": remove runs of letters. Also squeeze!
# When there is more than one argument, take the intersection.
# Arguments that begin with ^ are negated.
"hello".count('a-z', '^aeiou')   # => 3: count lowercase consonants
"hello".delete('a-z', '^aeiou')  # => "eo": delete lowercase consonants

Formatting Text

As you know, Ruby’s double-quoted string literals allow arbitrary Ruby expressions to be interpolated into strings. For example:

n, animal = 2, "mice"
"#{n+1} blind #{animal}"  # => '3 blind mice'

This string-literal interpolation syntax was documented in Chapter 3. Ruby also supports another technique for interpolating values into strings: the String class defines a format operator %, and the Kernel module defines global printf and sprintf methods. These methods and the % operator are very much like the printf function popularized by the C programming language. One advantage of printf-style formatting over regular string literal interpolation is that it allows precise control over field widths, which makes it useful for ASCII report generation. Another advantage is that it allows you to specify the number of significant digits to display in floating-point numbers, which is useful in scientific (and sometimes financial) applications. Finally, printf-style formatting decouples the values to be formatted from the string into which they are interpolated. This can be helpful for internationalization and localization of applications.

Examples using the % operator follow. See Kernel.sprintf for complete documentation of the formatting directives used by these methods:

# Alternatives to the interpolation above
printf('%d blind %s', n+1, animal)  # Prints '3 blind mice', returns nil
sprintf('%d blind %s', n+1, animal) # => '3 blind mice'
'%d blind %s' % [n+1, animal]  # Use array on right if more than one argument

# Formatting numbers
'%d' % 10         # => '10': %d for decimal integers
'%x' % 10         # => 'a': hexadecimal integers
'%X' % 10         # => 'A': uppercase hexadecimal integers
'%o' % 10         # => '12': octal integers
'%f' % 1234.567   # => '1234.567000': full-length floating-point numbers
'%e' % 1234.567   # => '1.234567e+03': force exponential notation
'%E' % 1234.567   # => '1.234567E+03': exponential with uppercase E
'%g' % 1234.567   # => '1234.57': six significant digits
'%g' % 1.23456E12 # => '1.23456e+12': Use %f or %e depending on magnitude

# Field width
'%5s' % '<<<'     # '  <<<': right-justify in field five characters wide
'%-5s' % '>>>'    # '>>>  ': left-justify in field five characters wide
'%5d' % 123       # '  123': field is five characters wide
'%05d' % 123      # '00123': pad with zeros in field five characters wide

# Precision
'%.2f' % 123.456  # '123.46': two digits after decimal place
'%.2e' % 123.456  # '1.23e+02': two digits after decimal = three significant digits
'%.6e' % 123.456  # '1.234560e+02': note added zero 
'%.4g' % 123.456  # '123.5': four significant digits

# Field and precision combined
'%6.4g' % 123.456 # ' 123.5': four significant digits in field six chars wide
'%3s' % 'ruby'    # 'ruby': string argument exceeds field width
'%3.3s' % 'ruby'  # 'rub': precision forces truncation of string

# Multiple arguments to be formatted
args = ['Syntax Error', 'test.rb', 20]  # An array of arguments
"%s: in '%s' line %d" % args    # => "Syntax Error: in 'test.rb' line 20" 
# Same args, interpolated in different order!  Good for internationalization.
"%2$s:%3$d: %1$s" % args        # => "test.rb:20: Syntax Error"

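As a sketch of the report-generation use mentioned earlier, field widths and precision can be combined to line up columns. The helper name report_row and the column widths are assumptions made for this illustration:

```ruby
# Hypothetical helper: format one row of a fixed-width report.
# %-10s left-justifies the name in a 10-character field; %8.2f
# right-justifies the amount in 8 characters with 2 decimal places.
def report_row(name, amount)
  '%-10s%8.2f' % [name, amount]
end

report_row('apples', 3.5)      # => 'apples        3.50'
report_row('kiwis', 1234.567)  # => 'kiwis      1234.57'
```

Each row is 18 characters wide regardless of the values, so the amounts align on the decimal point.
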
Packing and Unpacking Binary Strings

Ruby’s strings can hold binary data as well as textual data. A pair of methods, Array.pack and String.unpack, can be helpful if you are working with binary file formats or binary network protocols. Use Array.pack to encode the elements of an array into a binary string. And use String.unpack to decode a binary string, extracting values from it and returning those values in an array. Both the encoding and decoding operations are under the control of a format string where letters specify the datatype and encoding and numbers specify the number of repetitions. The creation of these format strings is fairly arcane, and you can find a complete list of letter codes in the documentation for Array.pack and String.unpack. Here are some simple examples:

a = [1,2,3,4,5,6,7,8,9,10]  # An array of 10 integers
b = a.pack('i10')           # Pack 10 4-byte integers (i) into binary string b
c = b.unpack('i*')          # Decode all (*) the 4-byte integers from b
c == a                      # => true

m = 'hello world'           # A message to encode
data = [m.size, m]          # Length first, then the bytes
template = 'Sa*'            # Unsigned short, any number of ASCII chars
b = data.pack(template)     # => "\v\000hello world"
b.unpack(template)          # => [11, "hello world"]
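
The S directive uses the platform's native byte order, so the packed bytes can differ between machines. For data that travels over a network, the n directive (a 16-bit unsigned integer in big-endian network byte order) gives the same result everywhere. A minimal sketch of a length-prefixed message follows; the helper names encode_message and decode_message are hypothetical:

```ruby
# Hypothetical helper: prefix the text with its byte length.
# 'n' packs a 16-bit unsigned big-endian integer; 'a*' packs the bytes.
def encode_message(text)
  [text.bytesize, text].pack('na*')
end

# Hypothetical helper: read the length prefix, then take that many bytes.
def decode_message(data)
  length, rest = data.unpack('na*')
  rest[0, length]
end

packet = encode_message('hello world')
packet.bytes.to_a[0, 2]   # => [0, 11]: big-endian length prefix
decode_message(packet)    # => "hello world"
```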

Strings and Encodings

The String methods encoding, encode, encode!, and force_encoding and the Encoding class were described in String Encodings and Multibyte Characters. You may want to reread that section now if you will be writing programs using Unicode or other multibyte character encodings.
