String Handling

Before leaving the subject of strings, you’ll take a quick look at a few common string operations.

Concatenation

You can concatenate strings using << or +, or, when both are string literals, just by placing a space between them. Here are three examples of string concatenation; in each case, s is assigned the string “Hello world”:

hello_world_concat.rb

s = "Hello " << "world"
s = "Hello " + "world"
s = "Hello "  "world"
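The first two forms differ in one respect: << appends to the receiver in place, whereas + returns a new string and leaves its operands unchanged. The short sketch below checks this with object_id. The variable names are only illustrative.

```ruby
greeting = "Hello "
id_before = greeting.object_id

# << appends to the existing string object and returns that same object
greeting << "world"
appended_in_place = (greeting.object_id == id_before)

# + leaves greeting unchanged and returns a brand-new string
excited = greeting + "!"
new_object = (excited.object_id != greeting.object_id)

puts(greeting)           # Hello world
puts(excited)            # Hello world!
puts(appended_in_place)  # true
puts(new_object)         # true
```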

Note that when you use the << method, you can append Fixnum integers (in the range 0 to 255), in which case those integers are converted to the character with that character code. Character codes 65 to 90 are converted to the uppercase characters A to Z, 97 to 122 are converted to the lowercase a to z, and other codes are converted to punctuation, special characters, and nonprinting characters. However, if you want to print the number itself, you must convert it to a string using the to_s method. The to_s method is obligatory when concatenating Fixnums using the + method or a space; attempting to concatenate a number without using to_s is an error. The following program prints out characters and numeric codes for values between 0 and 126, which include the standard Western alphanumeric and punctuation characters:

char_codes.rb

i = 0
begin
    s = "[" << i << ":" << i.to_s << "]"
    puts(s)
    i += 1
end until i == 127     # stop after printing code 126
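To see why to_s is required with +, you can trap the exception Ruby raises when you try to add an integer to a string. The exact wording of the error message varies between Ruby versions, so this sketch only records the exception class. The "Price: " strings are arbitrary sample data.

```ruby
# << with an integer appends the character with that code
with_char = "Price: " << 36        # char 36 is '$'

# + refuses an integer operand and raises TypeError
begin
  "Price: " + 36
  plus_error = nil
rescue TypeError => e
  plus_error = e.class
end

# Converting the number to a string first works with +
with_digits = "Price: " + 36.to_s

puts(with_char)    # Price: $
puts(plus_error)   # TypeError
puts(with_digits)  # Price: 36
```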

For examples of concatenating using <<, +, or a space, see string_concat.rb:

string_concat.rb

s1 = "This " << "is" << " a string " << 36 # char 36 is '$'
s2 = "This "  + "is" + " a string "  + 36.to_s
s3 = "This "  "is"  " a string "  + 36.to_s

puts("(s1):" << s1)
puts("(s2):" << s2)
puts("(s3):" << s3)

The previous program produces this output:

(s1):This is a string $
(s2):This is a string 36
(s3):This is a string 36

What About Commas?

You may sometimes see Ruby code that uses commas to separate strings and other data types. In some circumstances, these commas appear to have the effect of concatenating strings. For example, the following code might, at first sight, seem to create and display a string from three substrings plus an integer:

s4 = "This " , "is" , " not a string!", 10
print("print (s4):" , s4, "\n")

In fact, a list separated by commas creates an array—an ordered list of the original strings. The string_concat.rb program contains examples that prove this to be the case:

x = "This " , "is" , " not a string!", 36
print("print (x):" , x, "\n")
puts("puts(x):", x)
puts("puts x.class is: " << (x.class).to_s )

When run with Ruby 1.8, the previous code displays the following:

print (x):This is not a string!36
puts(x):
This
is
 not a string!
36
puts x.class is: Array

The first print statement here looks as though it is displaying a single string. This is because Ruby 1.8 prints each successive item in the array, x, on the same line as the preceding item. (Ruby 1.9 and later instead display the array in its bracketed inspect form, ["This ", "is", " not a string!", 36], which makes its true nature more obvious.) When you use puts instead of print, you can see that each item is printed on a separate line. This is because puts prints each item in turn and appends a newline after it. The fact that you are dealing with an array rather than a string is confirmed when you ask Ruby to print the class of the x object: it displays Array. You’ll learn about arrays in more depth in the next chapter.
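If what you actually want is a single string built from the items in such a list, one option is Array#join, which converts each element with to_s and concatenates the results. A minimal sketch, rebuilding the x array from the example above:

```ruby
x = "This ", "is", " not a string!", 36   # commas build an Array
joined = x.join                            # converts each item with to_s, then concatenates
separated = x.join("|")                    # join also accepts a separator string

puts(x.class)       # Array
puts(joined)        # This is not a string!36
puts(joined.class)  # String
puts(separated)     # This |is| not a string!|36
```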

String Assignment

The Ruby String class provides a number of useful string-handling methods. Most of these methods create new string objects. So, for example, in the following code, the s on the left side of the assignment on the second line is not the same object as the s on the right side:

s = "hello world"
s = s + "!"

A few string methods actually alter the string itself without creating a new object. These methods generally end with an exclamation mark (for example, the capitalize! method changes the original string, whereas the capitalize method does not). In addition, the string itself is also modified—and no new string is created—when you assign a character at an index of the string. For example, s[1] = 'A' would place the character A at index 1 (the second character) of the string s.

If in doubt, you can check an object’s identity using the object_id method. I’ve provided a few examples of operations that do and do not create new strings in the string_assign.rb program. Run this, and check the object_id of s after each string operation is performed.

string_assign.rb

s = "hello world"
print( "1) s='#{s}' and s.object_id=#{s.object_id}\n" )
s = s + "!"            # this creates a new string object
print( "2) s='#{s}' and s.object_id=#{s.object_id}\n" )
s = s.capitalize       # this creates a new string object
print( "3) s='#{s}' and s.object_id=#{s.object_id}\n" )
s.capitalize!          # but this modifies the original string object
print( "4) s='#{s}' and s.object_id=#{s.object_id}\n" )
s[1] = 'A'             # this also modifies the original string object
print( "5) s='#{s}' and s.object_id=#{s.object_id}\n" )

This produces output similar to that shown next. The actual object ID values will probably differ, but the important thing to notice is the pattern: when consecutive values remain the same, s still refers to the same string object; when they change, a new string object has been created and assigned to s:

1) s='hello world' and s.object_id=29573230
2) s='hello world!' and s.object_id=29573190
3) s='Hello world!' and s.object_id=29573160
4) s='Hello world!' and s.object_id=29573160
5) s='HAllo world!' and s.object_id=29573160
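Rather than comparing object IDs by eye, you can ask Ruby directly whether two references point to the same object with the equal? method (unlike ==, which compares contents). This sketch repeats some of the string_assign.rb operations and records the answers; the variable names are illustrative.

```ruby
s = "hello world"
original = s

s = s + "!"                  # + builds a new string
after_plus = s.equal?(original)

before_bang = s
s.capitalize!                # capitalize! modifies s in place
after_bang = s.equal?(before_bang)

s[1] = 'A'                   # index assignment also modifies s in place
after_index = s.equal?(before_bang)

puts(after_plus)    # false
puts(after_bang)    # true
puts(after_index)   # true
puts(s)             # HAllo world!
```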

Indexing into a String

In one of the previous examples, I treated a string as an array of characters and specified a character index with an integer inside square brackets: s[1]. Like arrays, Ruby strings are indexed from 0, so the first character is at index 0. For instance, to replace the character e with u in the string s (assuming it contains “Hello world”), you would assign a new character to index 1:

s[1] = 'u'
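Index assignment changes the string in place, and it is not limited to a single character: if you give a start index and a length inside the square brackets, you can replace a whole run of characters, even with a string of a different length. A minimal sketch:

```ruby
s = "Hello world"
id_before = s.object_id

s[1] = 'u'          # replace the single character at index 1
puts(s)             # Hullo world

s[0, 5] = "Howdy"   # replace 5 characters starting at index 0
puts(s)             # Howdy world

s[6, 5] = "Ruby"    # the replacement may be shorter than the text it replaces
puts(s)             # Howdy Ruby

puts(s.object_id == id_before)  # true: every change was made in place
```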

If you index into a string in order to find a character at a specific location, the behavior differs according to which version of Ruby you are using. Ruby 1.8 returns a numeric ASCII code of the character, whereas Ruby 1.9 returns the character itself.

s = "Hello world"
puts( s[1] )    #=> Ruby 1.8 displays 101; Ruby 1.9 displays 'e'

To obtain the character itself rather than its numeric code in Ruby 1.8, you can use a double index that returns a one-character string starting at index 1:

s = "Hello world"
puts( s[1,1] ) # prints out 'e'

If, on the other hand, you need the numeric value of the character returned by Ruby 1.9, you can use the ord method like this:

puts( s[1].ord)

The ord method does not exist in Ruby 1.8, so the previous code causes an “undefined method” error. To ensure compatibility between Ruby 1.8 and 1.9, you should use the double-index technique, with the first index indicating the starting position and the second index indicating the number of characters. For example, this returns one character at position 1: s[1,1]. You can see some more examples in the char_in_string.rb program:

char_in_string.rb

s = "Hello world"
puts( s[1] )
achar=s[1]
puts( achar )
puts( s[1,1] )
puts( achar.ord )

When you run this code, Ruby 1.9 displays this:

e
e
e
101

whereas Ruby 1.8 displays this:

101
101
e
undefined method `ord' for 101:Fixnum (NoMethodError)

You can also use double indexes to return more than one character. If you want to return three characters starting at position 1, you would enter this:

puts( s[1,3] )     # prints 'ell'

This tells Ruby to start at position 1 and return the next three characters. Alternatively, you could use the two-dot range notation:

puts( s[1..3] )     # also prints 'ell'

Note

Ranges are discussed in more detail later in this chapter.
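A quick way to compare these forms is to collect their results side by side. The sketch below also includes a three-dot range, which excludes its end value; since ranges are covered later in this chapter, treat that line as a preview.

```ruby
s = "Hello world"

by_length = s[1, 3]     # start at index 1, take 3 characters
two_dot   = s[1..3]     # indexes 1 to 3, inclusive
three_dot = s[1...4]    # indexes 1 up to, but not including, 4

puts(by_length)   # ell
puts(two_dot)     # ell
puts(three_dot)   # ell
```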

Strings can also be indexed using negative values, in which case −1 is the index of the last character, and, once again, you can specify the number of characters to be returned:

string_index.rb

puts( s[-1,1] )     # prints 'd'
puts( s[-5,5] )     # prints 'world'

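Be careful when a range starts with a negative index. Ruby resolves a negative index by counting back from the end of the string, so in “Hello world” the index -5 refers to the same position as index 6 (the w). The range -5..5 therefore runs backward from index 6 to index 5 and returns an empty string. To select the last five characters, use negative values for both the start and end indexes: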

string_methods.rb

puts( s[-5..5] )    # this prints an empty string!
puts( s[-5..-1] )   # prints 'world'
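The arithmetic behind these results can be checked directly: a negative index n refers to position s.length + n. This sketch assumes the same 11-character string, "Hello world":

```ruby
s = "Hello world"              # 11 characters, indexes 0 to 10

start = s.length - 5           # -5 counts back from the end: 11 - 5 = 6
puts(start)                    # 6
puts(s[start])                 # w

puts(s[-5..-1] == s[6..10])    # true: both select "world"
puts(s[-5..5].inspect)         # "" because the range runs from index 6 back to 5
```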

Finally, you may want to experiment with a few of the standard methods available for manipulating strings. These include methods to change the case of a string, reverse it, insert substrings, remove repeating characters, and so on. I’ve provided a few examples in string_methods.rb. The method names are generally descriptive of their functions. However, bear in mind that methods such as reverse (with no ! at the end) return a new string but do not modify the original string, whereas reverse! (with the !) modifies the original string. You saw similar behavior with the capitalize and capitalize! methods used earlier.

The insert method takes two arguments, an index and a string, and it inserts the string argument at the given index of the string, s. The squeeze method returns a string in which any run of repeated adjacent characters, such as the double l in “Hello”, is reduced to a single character. The split method splits a string into an array. I’ll have more to say on split when I discuss regular expressions in Chapter 6. The following examples assume that s starts out as the string “Hello world”, and the value returned by each method is shown in the #=> comments. Bear in mind that insert modifies s itself, so the squeeze and split examples operate on the modified string. In the program supplied in this book’s code archive, you may also experiment using much longer strings:

s.length            #=> 11
s.reverse           #=> dlrow olleH
s.upcase            #=> HELLO WORLD
s.capitalize        #=> Hello world
s.swapcase          #=> hELLO WORLD
s.downcase          #=> hello world
s.insert(7,"NOT ")  #=> Hello wNOT orld
s.squeeze           #=> Helo wNOT orld
s.split             #=> ["Hello", "wNOT", "orld"]
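The difference between reverse and reverse! is easy to confirm: the first leaves the receiver alone, while the second changes it. A minimal sketch:

```ruby
s = "Hello world"

backwards = s.reverse     # returns a new, reversed string
puts(s)                   # Hello world (unchanged)
puts(backwards)           # dlrow olleH

s.reverse!                # reverses s itself
puts(s)                   # dlrow olleH
```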

Removing Newline Characters: chop and chomp

A couple of handy string-processing methods deserve special mention. The chop and chomp methods can be used to remove characters from the end of a string. The chop method returns a string with the last character removed, or with both the carriage return and newline characters removed ("\r\n") if these are found together at the end of the string. The chomp method returns a string with the terminating carriage return or newline character removed (or both the carriage return and the newline character if both are found).

These methods are useful when you need to remove line feeds entered by the user or read from a file. For instance, when you use gets to read in a line of text, this returns the line including the terminating record separator, which, by default, is the newline character.

record_separator.rb

You can remove the newline character using either chop or chomp. In most cases, chomp is preferable because it won’t remove the final character unless it is the record separator (usually a newline), whereas chop will remove the last character no matter what it is. Here are some examples:

chop_chomp.rb

# NOTE: s1 includes a carriage return and linefeed
s1 = "Hello world\r\n"
s2 = "Hello world"
s1.chop           # returns "Hello world"
s1.chomp          # returns "Hello world"
s2.chop           # returns "Hello worl" - note the missing 'd'!
s2.chomp          # returns "Hello world"

The chomp method also lets you specify a character or string to use as the separator:

s2.chomp('rld')   # returns "Hello wo"
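To see chomp at work on input that ends with a record separator, you can feed gets from a StringIO object instead of the keyboard. This is a sketch for illustration only: StringIO, from Ruby's standard library, stands in for the user or a file, and the sample lines are made up.

```ruby
require 'stringio'

input = StringIO.new("first line\nsecond line\n")

raw = input.gets        # gets keeps the trailing record separator ("\n")
clean = raw.chomp       # chomp removes it

puts(raw.inspect)       # "first line\n"
puts(clean.inspect)     # "first line"

# chop removes the last character (or a final "\r\n" pair) whatever it is
puts("first line".chop)      # first lin
puts("first line\r\n".chop)  # first line
```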

Format Strings

Ruby provides the printf method to print “format strings” containing specifiers starting with a percent sign (%). The format string may be followed by one or more data items separated by commas; the list of data items should match the number and type of the format specifiers. The actual data items replace the matching specifiers in the string, and they are formatted accordingly. These are some common formatting specifiers:

%d - decimal number
%f - floating-point number
%o - octal number
%p - inspect object
%s - string
%x - hexadecimal number
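Because printf writes straight to standard output, it can be handy to experiment with Kernel#format (also available as sprintf), which accepts the same format string and arguments but returns the formatted string instead of printing it. The sketch below applies each specifier in the list above; the sample values are arbitrary.

```ruby
results = {
  "%d" => format("%d", 255),        # decimal integer
  "%f" => format("%f", 2.5),        # floating point, 6 decimal places by default
  "%o" => format("%o", 255),        # octal
  "%p" => format("%p", "ruby"),     # result of calling inspect
  "%s" => format("%s", :ruby),      # result of calling to_s
  "%x" => format("%x", 255)         # hexadecimal (lowercase)
}

results.each { |spec, text| puts("#{spec} -> #{text}") }
```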

You can control floating-point precision by placing a period followed by a number between the % and the f of the %f specifier. Without a precision, %f displays the value to six decimal places (the default). For example, this displays the value that way, followed by a newline ("\n"):

string_printf.rb

printf( "%f\n", 10.12945 )        #=> 10.129450

And the following would display the floating-point value to two decimal places ("%0.02f"). The leading zeros here are purely a matter of stylistic preference: "%0.02f", "%0.2f", and "%.2f" all produce the same result.

printf( "%0.02f\n", 10.12945 )     #=> 10.13
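The leading 0 in a specifier such as %0.2f is actually a flag that pads the output with zeros, so it has no visible effect unless the specifier also gives a minimum field width. This sketch uses format, which returns the string that printf would print, to show the difference once a width of 8 is added:

```ruby
value = 10.12945

no_flag   = format("%.2f", value)    # precision only
zero_flag = format("%0.2f", value)   # 0 flag but no width: same result
padded    = format("%8.2f", value)   # width 8, padded with spaces
zero_pad  = format("%08.2f", value)  # width 8, padded with zeros

puts(no_flag)         # 10.13
puts(zero_flag)       # 10.13
puts(padded.inspect)  # "   10.13"
puts(zero_pad)        # 00010.13
```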

Here are a couple more examples:

printf("d=%d f=%f o=%o x=%x s=%s\n", 10, 10, 10, 10, 10)

That would output d=10 f=10.000000 o=12 x=a s=10.

printf("0.04f=%0.04f : 0.02f=%0.02f\n", 10.12945, 10.12945)

That would output 0.04f=10.1295 : 0.02f=10.13.
