3.4. Ruby's Essential Data Types

Ruby offers a wide range of data types that you can use in your programs and Rails applications. Being a dynamic language, when it comes to types, Ruby is slightly different from what you may have seen in compiled languages such as C, C++, C#, Visual Basic, or Java. Fortunately, working with Ruby data types tends to be much easier.

3.4.1. Everything Is an Object

.NET developers are familiar with the Common Type System (CTS), in which there are two broad categories of types: Value types and Reference types. Among the Value types there are the built-in value types such as System.Int32, System.Double, or System.Boolean.

Other languages like C, C++ (outside of the .NET Framework), and Java may use different names and a different terminology, but they all essentially distinguish between "primitive" types and actual full-fledged classes. In these languages, there are objects, like instances of the class Array, and then there are primitive types that you can't inherit from, call a method on, retrieve or set a property for, and so on.

Forget all that. In Ruby that distinction doesn't exist. Here are a few examples of perfectly valid Ruby code:

3.zero?     # Equivalent to 3 == 0
-5.abs      # 5
12.to_f     # 12.0
15.div(3)   # 5
9.9.round   # 10
true.to_s   # "true"
nil.nil?    # true

This example just called methods on simple numbers, true, and even nil (Ruby's version of null). How is that possible? In Ruby every value is an object!

Ruby methods can end with a question mark, as shown in the example with zero? and nil?. Such methods should always return a Boolean value. The general idea behind this nomenclature is that it increases the readability of the code.

To verify this further you can use the method class available for any object in Ruby. As you can imagine this method tells you the class of a given instance:

3.class         # Fixnum
-273.15.class   # Float
true.class      # TrueClass
false.class     # FalseClass
nil.class       # NilClass
"hello".class   # String

All those values, except for "hello," may appear "primitive" to you, but they really aren't. As you can see 3 is an instance of the Fixnum class. −273.15 is an object of the Float class. Even "special" values (they are keywords) such as true, false, and nil represent (the sole) instances of the classes TrueClass, FalseClass, and NilClass, respectively.
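As a minimal sketch of the same idea, the following walks up the inheritance chain of a value's class. The helper name class_chain is hypothetical and is not part of Ruby. Note that on Ruby 2.4 and later, Fixnum and Bignum are unified, so 3.class reports Integer rather than Fixnum.

```ruby
# Hypothetical helper: returns the class of a value followed by
# each of its superclasses, in order.
def class_chain(value)
  chain = []
  klass = value.class
  while klass
    chain << klass
    klass = klass.superclass   # nil once the top of the hierarchy is reached
  end
  chain
end

class_chain(3)      # starts with Integer (Fixnum on older versions), includes Object
class_chain(nil)    # starts with NilClass, includes Object
class_chain(true)   # starts with TrueClass, includes Object
```

Whatever value you pass in, Object appears in the chain, which is what "every value is an object" means in practice.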

3.4.2. Identifiers and Variables

Identifiers are case-sensitive names for entities like variables, methods, and classes. There's nothing weird about identifiers in Ruby, but there are conventions and rules to follow when defining them. Conventions make your code nicer and more understandable to other Rubyists, but by the same token, they won't break your program if ignored. Rules, on the other hand, affect the semantics of your program and are definitely a deal-breaker and can't be ignored.

3.4.2.1. Convention: Adopt snake_case

As a Microsoft developer you're probably used to adopting camelCase (or CamelCase) for multiword variables and methods. In Ruby the convention is to use the so-called snake_case instead. Conventionally, the underscore sign is used to separate words in variables and methods. For example, use file_name, not fileName. Though this may appear bizarre at first, you'll soon realize how it makes your code easier to read and introduces fewer accidental mistakes (for example, due to the fact that filename and fileName are distinct). This is not a rule; Ruby works fine with CamelCase variables and methods, but nothing screams "newbie" as much as employing this type of naming style in Ruby. Code is written for both humans and machines; the former will certainly appreciate any effort to keep your code as idiomatic as possible.

Abelson and Sussman remarked in their classic textbook Structure and Interpretation of Computer Programs (The MIT Press 1996, also known as the SICP) that "Programs must be written for people to read, and only incidentally for machines to execute." Truer words have rarely been written.

3.4.2.2. Convention: Don't Switch Types of a Variable

Ruby is a dynamically typed language and as such, you don't need to declare a variable's type. Aside from all the disputes about the merits of static versus dynamic typing, most people agree that dynamically typed languages tend to be more immediate, less verbose, and, ultimately, easier to program in.

This is how you declare and assign a value to the following three variables in C#:

C#

int age = 100;
string name = "Antonio";
MyObject obj = new MyObject();

In Ruby this simply becomes:

age = 100
name = "Antonio"
obj = MyObject.new

Notice how the left-side type declarations (and the semicolon) have disappeared. In Ruby, you use variables without having to declare their type. A local variable comes into existence when you first assign to it; referencing one that has never been assigned raises a NameError, whereas uninitialized instance and global variables evaluate to nil.

Ruby provides you with several methods to verify the type of a given variable. The most common ones are class, is_a?, and kind_of?.
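Here is a short sketch of these three methods at work. The helper type_report is a hypothetical name introduced only for this illustration; is_a? and kind_of? are synonyms, and both also return true for any ancestor class.

```ruby
# Hypothetical helper: collects the answers a value gives to the type checks.
def type_report(value)
  {
    :class   => value.class,
    :integer => value.is_a?(Integer),
    :string  => value.kind_of?(String)
  }
end

type_report(100)         # :integer is true, :string is false
type_report("Antonio")   # :class is String, :string is true, :integer is false
```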

The fact that variables don't have a fixed type implies that you could first assign an integer value to a variable, then assign it a string literal, and, finally, assign it the instance of a given object:

# Don't do this
my_var = 100
my_var = "Antonio"
my_var = MyObject.new

Though this is possible, it's highly discouraged. If you start assigning different types of values to the same variable throughout your program, you'll easily introduce bugs and greatly impair the readability of your code.

Ruby's dynamic nature has several (positive) consequences. Aside from making the language more immediate, concise, and readable, its dynamism implies that many features and design patterns that are required in other languages are already incorporated in Ruby and its Standard Library. For example, think of .NET Generics; in Ruby, there's no reason for them to exist.

3.4.2.3. Rule: Define Scope Using Sigils

Ruby makes use of sigils, which are symbols attached to an identifier to indicate its scope or data type. In Ruby's specific case, they don't offer any data type information, but they do fundamentally define the scope of the variables.

Local variables don't have any sigils attached to their names, and as such, they start with a letter or an underscore. Identifiers starting with an @ character are instance variables. Those starting with @@ are class variables. Finally, global variables start with a dollar sign ($).

The following summarizes this:

price     # local variable
@price    # instance variable
@@price   # class variable
$price    # global variable

The next chapter explains Ruby's object model, so the differences between the scopes of these variables will become evident.
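Until then, the following sketch may help you see the four kinds of variables side by side. The Product class and the $currency global are hypothetical names chosen for this illustration, and the class syntax itself is covered in the next chapter.

```ruby
$currency = "USD"            # global variable, visible everywhere

class Product                # hypothetical class, for illustration only
  @@count = 0                # class variable, shared by all instances

  def initialize(price)
    @price = price           # instance variable, one per object
    @@count += 1
  end

  def label
    prefix = "Price"         # local variable, visible only inside this method
    "#{prefix}: #{@price} #{$currency}"
  end

  def self.count
    @@count
  end
end

Product.new(10).label   # "Price: 10 USD"
Product.new(25).label   # "Price: 25 USD"
Product.count           # 2
```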

3.4.2.4. Convention: Append ! and ? to Certain Methods

Ruby methods can end with a special character, like an equal sign, an exclamation mark (sometimes referred to as bang), or a question mark (as mentioned before). The final equal sign is used for setter methods, as you see in the next chapter. The other two, ! and ?, are appended to method names to increase the readability of code and to distinguish methods that behave differently despite having an otherwise identical name.

Many methods from the Core and Standard libraries adopt this convention and so should you. Whenever you define a new method, you need to decide if using one of them is appropriate, based on the behavior of the method. The rule of thumb is that methods that "answer a question" and return a true or a false value should be defined with a question mark at the end. For example:

"    ".empty?            # Evaluates to false, because spaces are characters
"".empty?                # It's the empty string, so the returned value is true

Methods that don't answer any questions, but rather perform certain actions on copies of the receiver object should not have any special ending signs. The method upcase of the class String is a perfect example:

name = "Matz"            # "Matz"
new_name = name.upcase   # new_name's value is "MATZ"
name                     # name's value is still "Matz"

Finally, methods that directly alter the object that they were called on should be defined with an exclamation mark as the last character. Methods such as these that alter the receiver carry an ! as a warning for the developer, because they can modify or destroy the receiver object, as shown in the following snippet:

name = "Matz"            # "Matz"
new_name = name.upcase!  # new_name's value is "MATZ"
name                     # name's value is "MATZ" as well

There are a few exceptions. For example, the element assignment of a string doesn't have an exclamation mark, yet it still alters the string. Whenever you're in doubt, you can consult the documentation (for example, through the ri tool) and verify Ruby's behavior from irb.

When working with these kinds of methods, you should be cautious and also keep in mind that they typically return nil when the method doesn't make any changes to the receiver object. Here is a practical example:

my_string = "1234"                  # "1234"
new_string = my_string.downcase!    # new_string's value is nil
my_string                           # my_string's value is still "1234"

The string "1234" can't be transformed into lowercase, because it's composed entirely of digits which are, in their nature, neither upper- nor lowercase. Hence, the downcase! method does not alter the receiver (my_string), and therefore, returns nil as a result. my_string will contain the original value and new_string will contain nil. This behavior is expected.

NOTE

Be careful when assigning the return value of a method that ends with an exclamation mark. In this example, you may have mistakenly assumed that downcase! would still assign the original string to a new_string like downcase would. But this is clearly not the case.
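One way to sidestep the problem is sketched below. Both helper names are hypothetical; the point is simply to either use the non-bang method when you want a value back, or to ignore the return value of the bang method and use the receiver itself. The calls to dup are there only so that the example mutates a copy rather than a literal.

```ruby
# Hypothetical helper: the plain method always returns a String.
def lowercase_copy(text)
  text.downcase
end

# Hypothetical helper: ignore what downcase! returns (it may be nil)
# and hand back the receiver, which holds the right value either way.
def lowercase_in_place(text)
  text.downcase!
  text
end

lowercase_copy("1234")            # "1234"
lowercase_in_place("RUBY".dup)    # "ruby"
lowercase_in_place("1234".dup)    # "1234", even though downcase! returned nil
```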

Using this set of conventions, you could theoretically have three versions of the same method name (for example, read, read!, and read?). This doesn't happen very often, but it isn't uncommon to have classes that offer some methods in both a "plain" version that returns a properly modified copy of the receiver, and one that ends with an exclamation mark, which actually modifies the original object.
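As a rough sketch of how you might apply these conventions in your own code, consider the hypothetical Note class below (class definitions are covered in the next chapter). The names blank?, trimmed, and trim! are assumptions made for this example, not methods that Ruby provides.

```ruby
# Hypothetical class used only to illustrate the naming conventions.
class Note
  attr_reader :text

  def initialize(text)
    @text = text
  end

  # Predicate: answers a question and returns true or false.
  def blank?
    @text.strip.empty?
  end

  # Plain version: returns a modified copy and leaves the receiver alone.
  def trimmed
    Note.new(@text.strip)
  end

  # Bang version: modifies the receiver itself and returns it.
  def trim!
    @text = @text.strip
    self
  end
end

note = Note.new("  remember the milk  ")
note.trimmed.text   # "remember the milk"
note.text           # still "  remember the milk  "
note.trim!
note.text           # "remember the milk"
```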

3.4.2.5. Rule and Convention: Naming Constants

Identifiers that start with an uppercase letter are constants. Once you set the value of a constant, you're not really supposed to change it in your application. However, Ruby doesn't strictly enforce this, and instead of raising an exception, it issues a warning:

>> ALMOST_PI = 22/7.0
=> 3.14285714285714
>> ALMOST_PI = 355/113.0
(irb):2: warning: already initialized constant ALMOST_PI
=> 3.14159292035398

It's idiomatic to use all uppercase letters for the name of constants. For multiword constants, use the uppercase SNAKE_CASE naming convention. This is not a requirement though, given that Ruby will recognize a constant based solely on the case of its first letter. Yet it's still a good convention to adhere to. One exception is the case of class and module names (which are constants too); you would typically use MyName rather than MY_NAME.

Modules are formally introduced in the next chapter. For the time being, simply think of them as namespaces.

3.4.3. Working with Numbers

It's already established that numbers are objects. Ruby provides several classes for working with numbers. Some of these are part of the Core Library, while others are included in the Standard Library. The first are "built-in" or readily available, whereas files containing the classes that belong to the Standard Library need to be loaded explicitly into the program. Figure 3-6 shows the inheritance hierarchy for Ruby's numeric classes. Similar to .NET, there is an Object class from which each class inherits.

There are quite a few classes, but arithmetic operations in Ruby couldn't be easier. The following list analyzes each of the classes that you'll be using in your programs:

  • Fixnum: Integer numbers that can be represented on a machine word, minus 1 bit. This typically means 31 bits. In Ruby though, you never have to worry about overflows and rarely have to think in terms of bits. If the number becomes too big to be represented as a Fixnum, it is automatically converted to Bignum.

  • Bignum: Arbitrarily large integer numbers outside of the range of Fixnum. When a Bignum object becomes small enough to fit in a Fixnum, it automatically gets converted.

  • Float: Real numbers in double-precision floating-point representation.

  • BigDecimal: Real numbers with arbitrary precision. They can be seen as the equivalent of Bignums for the floating-point world.

  • Complex: Complex numbers.

  • Rational: Rational numbers (fractions).

Figure 3-6. Inheritance hierarchy of Ruby's numeric classes

BigDecimal, Complex, and Rational are far less common and are part of Ruby's Standard Library. As such, if you wanted to use rational numbers, for example, you would need to require the proper file from the Standard Library. The following irb session should clarify this requirement:

>> r = Rational(2,6)
NoMethodError: undefined method 'Rational' for main:Object
        from (irb):1
>> require 'rational'
=> true
>> r = Rational(2,6)
=> Rational(1, 3)
>> r.to_f
=> 0.333333333333333

As you can see, this example first tries to create a Rational object (corresponding to the 1/3 reduced fraction), but it can't because Ruby doesn't have Rational in its Core Library. When you require rational, irb tells you that it was successfully loaded and from then onward, you'll be able to use Rational.

require loads a Ruby file only once. If a file is already loaded, require returns false.

You assign a rational number to the variable r and then call its instance method to_f to obtain its floating-point representation. On a side note, rational numbers created in this way are reduced to their lowest terms.

At this stage you should really just worry about understanding how integer and floating-point numbers work in Ruby. And the easiest way to do that is with a few examples.

3.4.3.1. Fixnums and Bignums

Conversions between fixnums and bignums are usually automatic, and you can practically ignore the difference between the two in most cases. As previously mentioned, fixnums are stored in a native word, which is 4 bytes on most architectures. Bignums, on the other hand, can be arbitrarily large. If you wish to verify this and would like to see how much space an integer occupies, you can use the method size, which is available for both classes:

x = 2302
y = 150
x.class        # Fixnum
x.size         # 4 (bytes)
power = x**y
power.class    # Bignum
power.size     # 212 (again, bytes)

When dealing with large numbers, for legibility purposes, Ruby allows you to separate the digits with an underscore. For example, you can use award = 1_000_000 instead of award = 1000000, if you wish (yes, some may even opt for award = 10**6 in this particular case).

Fixnum and Bignum numbers support all the basic arithmetic operations that you would expect as follows:

a = 4
b = 5
sum = a+b
subtraction = a-b
product = a*b
power = a**b
division = a/b
modulus = a%b

Be Aware of the Differences

Ruby behaves very differently from C, C++, Java, C#, and Visual Basic when it comes to divisions where at least one operand is negative. In such instances, Ruby rounds the quotient to minus infinity, whereas C# and the other compiled languages that have already been mentioned round to zero. What this means is that, for example, 13/(-4) would return −3 in these languages, whereas it's −4 in Ruby.

The same care should be applied when dealing with modulo operators. A different quotient obviously affects the remainder. Moreover, in C# or VB the sign of the remainder (determined through the modulo operator) is provided by the sign of the first operand. In Ruby, in such circumstances, it's the second operand that determines the sign of the result. For instance, 13%(-4) is 1 in C#, but −3 in Ruby.

You should keep this consideration in mind whenever you use the modulo method, its equivalent % operator (syntax sugar), and the divmod method, which returns both the quotient and the modulus as elements of an array. The method remainder returns results consistent with C, C#, and all the other languages mentioned. So 13.remainder(-4) returns 1, as you would expect.

For further examples regarding this issue, please consult the documentation for divmod (with ri Numeric#divmod).
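The differences described in this sidebar can be summarized in a few lines. The helper truncating_div is a hypothetical name; it is a sketch of one way to emulate the round-toward-zero division of C# using remainder, not something Ruby ships with.

```ruby
13 / -4              # -4 (quotient rounded toward minus infinity)
13 % -4              # -3 (sign follows the second operand)
13.divmod(-4)        # [-4, -3]
13.remainder(-4)     # 1  (sign follows the receiver)

# Hypothetical helper that emulates C#-style integer division,
# which truncates the quotient toward zero.
def truncating_div(a, b)
  (a - a.remainder(b)) / b
end

truncating_div(13, -4)   # -3
truncating_div(-13, 4)   # -3
```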


Shortcut assignments are available as well as shown here:

a = 4
a += 2   # Equivalent to a = a+2
a -= 2   # Equivalent to a = a-2
a *= 3   # Equivalent to a = a*3
a /= 6   # Equivalent to a = a/6
a %= 2   # Equivalent to a = a%2

Unlike C++ and C#, Ruby doesn't provide a ++ (increment) or -- (decrement) operator. But just as you would in Visual Basic, you can use a shortcut assignment (for example, i += 1 or i -= 1). In Ruby, the need for these two operators is even less prominent, given that programs tend to rely on higher-level control structures (such as iterators) when looping, as opposed to the typical loop of C-like languages.
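To give you a feel for what this looks like, here is a small sketch that loops without a hand-maintained counter. The helper names sum_up_to and countdown are hypothetical, and the block syntax between curly brackets is explained properly later on.

```ruby
# Hypothetical helper: sums the integers from 1 to limit.
# upto hands each number to the block, so no i += 1 is needed.
def sum_up_to(limit)
  total = 0
  1.upto(limit) { |n| total += n }
  total
end

# Hypothetical helper: collects the numbers from a starting value down to 1.
def countdown(from)
  values = []
  from.downto(1) { |n| values << n }
  values
end

sum_up_to(4)    # 10
countdown(3)    # [3, 2, 1]
```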

Ruby allows you to convert an integer number into a string through the to_s method. Not surprisingly, this is the Ruby equivalent of the Object.ToString method provided by the .NET Framework. As such, it's not exclusive to numbers; you'll find that it's available for any object and you can customize it within your own classes. When dealing with integers, it can do more than a plain conversion to string, as shown here:

42.to_s       # "42"
42.to_s(2)    # "101010"
42.to_s(8)    # "52"
42.to_s(16)   # "2a"

Notice how when passing an integer argument (the radix), you specify that the string should represent the receiver (42) in binary, octal, or hexadecimal notation. The base that you select is not limited to 2, 8, and 16, and can be any positive number between 2 and 36 ("z" would be the last admissible digit in that case).
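The conversion also works in the opposite direction, because String#to_i accepts a radix as well. The following sketch wraps both calls in two hypothetical helpers, to_base and from_base, purely to show the round trip.

```ruby
# Hypothetical helper: integer to its string representation in the given base.
def to_base(number, radix)
  number.to_s(radix)
end

# Hypothetical helper: string of digits in the given base back to an integer.
def from_base(digits, radix)
  digits.to_i(radix)
end

to_base(255, 16)          # "ff"
to_base(35, 36)           # "z"
from_base("2a", 16)       # 42
from_base("101010", 2)    # 42
```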

Integer literals can also be written in binary, octal, or hexadecimal by prefixing them with a 0b, 0, or 0x, respectively:

0b110   # Binary, equivalent to 6
0177    # Octal, equivalent to 127
0xfff   # Hexadecimal, equivalent to 4095

Finally, you can use the chr method to obtain a string containing the equivalent ASCII character that is represented by a given integer as shown here:

59.chr     # ";"
65.chr     # "A"
93.chr     # "]"
200.chr    # "\310"
3000.chr   # RangeError: 3000 out of char range

3.4.3.2. Floats

Floats are numbers with a decimal point as in the following examples:

22/7.0     # 3.14285714285714
49.99
-273.15

2.1e3      # 2100.0
0.1010101
0.3333333

Unlike integers, floats don't have a chr method, and their to_s method doesn't accept any parameters:

65.0.chr                   # NoMethodError: undefined method 'chr' for 65.0:Float
345_002_132.1932.to_s      # "345002132.1932"
345_002_132.1932.to_s(2)   # ArgumentError: wrong number of arguments (1 for 0)

All the basic operations that you saw in the "Fixnums and Bignums" section are available for floats too, but it's worth noting that whenever one of the operands is a float, the result will be a float because Ruby performs an automatic conversion. For example:

1/3    # Evaluates to 0. Integer division between two fixnums.
1/3.0  # 0.333333333333333
1.0/3  # 0.333333333333333

You should use the to_f method on one of the operands whenever you want to divide two integer numbers to obtain a float.
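A typical case is computing an average from two integer values. The helper name average below is hypothetical; the only technique it relies on is calling to_f on one operand before dividing.

```ruby
# Hypothetical helper: converts the first operand to a float so that
# the division is not truncated to an integer.
def average(total, count)
  total.to_f / count
end

7 / 2            # 3   (integer division)
average(7, 2)    # 3.5
average(1, 4)    # 0.25
```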

The Float class implements all the useful methods that you'd expect, and that you can find in any other modern programming language. You can use the methods to_i, to_int, or truncate to convert a float into an integer by eliminating the decimal part of the number. Similarly, you're able to use the methods floor, ceil, and round as shown here:

1.8.to_i       # 1
2.1.ceil       # 3
2.9.floor      # 2
1.8.round      # 2
1.3.round      # 1
1.5.round      # 2

Aside from regular "finite" floats, two special numbers exist: NaN (short for Not a Number) and Infinity. A float is NaN when the result of an operation is not defined, and it is Infinity (or -Infinity) when its magnitude is too large to be represented. The following example should help demonstrate this:

1.0/0    # Infinity
-1.0/0   # -Infinity
0.0/0    # NaN

You can use the nan?, finite?, and infinite? methods to verify whether or not a float is a regular number.
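Here is a brief sketch of these three methods. The helper classify_float is a hypothetical name and expects a Float argument. Keep in mind that infinite? does not return a plain Boolean: it returns 1 or -1 for positive or negative infinity, and nil otherwise.

```ruby
# Hypothetical helper: labels a float as :nan, :infinite, or :finite.
def classify_float(value)
  if value.nan?
    :nan
  elsif value.infinite?   # 1 or -1 count as true, nil counts as false
    :infinite
  else
    :finite
  end
end

classify_float(0.0 / 0)     # :nan
classify_float(-1.0 / 0)    # :infinite
classify_float(49.99)       # :finite
(1.0 / 0).infinite?         # 1
(-1.0 / 0).infinite?        # -1
49.99.finite?               # true
```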

Floats are not immune to accuracy errors, but as a developer, you should already be aware of this:

x = 0.4 - 0.3   # 0.1
y = 0.1         # 0.1
x == y          # false

This is due to their different representation. Using the "format % value" shorthand, you can verify this as follows:

>> "%.20f" % (0.4 - 0.3)
=> "0.10000000000000003000"
>> "%.20f" % (0.1)
=> "0.10000000000000001000"

In Ruby, the machine epsilon is available through Float::EPSILON. The two colons act as a scope resolution operator, and allow you to access the EPSILON constant within the Float class. Using this, you can compare floats within a tolerance and work around the accuracy issue:

x = 0.4 - 0.3
y = 0.1
epsilon = Float::EPSILON   # 2.22044604925031e-016
(x - y).abs <= epsilon     # true
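The comparison above uses an absolute tolerance, which works well for values close to 1 but becomes too strict for large numbers. One common variation, sketched below under that assumption, scales the tolerance by the magnitude of the operands. The helper name nearly_equal? and its default tolerance are choices made for this example, not a built-in Ruby facility.

```ruby
# Hypothetical helper: compares two floats using a tolerance that grows
# with the size of the operands (and never drops below the tolerance itself).
def nearly_equal?(a, b, tolerance = Float::EPSILON)
  scale = [a.abs, b.abs, 1.0].max
  (a - b).abs <= tolerance * scale
end

nearly_equal?(0.4 - 0.3, 0.1)   # true
nearly_equal?(0.1, 0.2)         # false
```

The right tolerance depends on how much rounding error your calculations accumulate, so treat the default here as a starting point only.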

You are probably very aware of this, but it's worth repeating: do not use floats for financial and monetary calculations! Whenever accuracy is a must, use the BigDecimal class (similarly to how you'd use decimal in .NET):

require 'bigdecimal'

x = 0.1
bx = BigDecimal(x.to_s)              # #<BigDecimal:28d6128,'0.1E0',4(8)>
by = BigDecimal((0.4 - 0.3).to_s)    # #<BigDecimal:28d1894,'0.1E0',4(8)>
bx == by                             # true

You can also automatically convert from floats to decimals in this way:

require 'bigdecimal'
require 'bigdecimal/util'

x = 0.1.to_d
y = (0.4 - 0.3).to_d
x == y   # true
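To see why this matters for money, the following sketch adds up two prices both ways. The helper names float_total and decimal_total are hypothetical, and the prices are passed as strings so that no precision is lost before BigDecimal gets to see them. On recent Ruby versions bigdecimal ships as a gem, so this assumes the gem is available.

```ruby
require 'bigdecimal'

# Hypothetical helper: sums the prices using floats.
def float_total(prices)
  prices.inject(0.0) { |sum, price| sum + price.to_f }
end

# Hypothetical helper: sums the same prices using BigDecimal.
def decimal_total(prices)
  prices.inject(BigDecimal("0")) { |sum, price| sum + BigDecimal(price) }
end

prices = ["0.10", "0.20"]
float_total(prices) == 0.3                    # false
decimal_total(prices) == BigDecimal("0.30")   # true
```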

3.4.4. Booleans

true is a keyword that evaluates to the only instance (known as a singleton instance) of the TrueClass; similarly, false is a keyword that evaluates to the only instance of the FalseClass. Ruby provides you with the usual operators for Boolean expressions:

true and false   # false
true and true    # true
true && false    # false
true && true     # true

The and and && operators are very similar but it's recommended that you use the latter, because it has higher precedence and is the one that's recommended for Rails' internal code. Note that the && operator applies a short-circuit evaluation; therefore, if the first operand evaluates to false, the second won't be calculated at all, no matter what it is:

false && 3/0   # Evaluates to false with no exceptions raised

Similarly, you have the or and || operators:

true or false    # true
false or false   # false
true || false    # true
false || false   # false

For the same reasons just mentioned, use || whenever you can. This performs the same short-circuited evaluation, so if the first operand is true, there is no reason to calculate the second one:

true || 3/0   # Evaluates to true with no exceptions raised
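The precedence difference is easiest to notice in an assignment, because and and or bind more loosely than the = operator. The four helper names below are hypothetical; each one simply returns whatever ended up in the local variable result.

```ruby
def assign_with_double_ampersand
  result = true && false    # parsed as result = (true && false)
  result
end

def assign_with_and
  result = true and false   # parsed as (result = true) and false
  result
end

def assign_with_double_pipe
  result = false || true    # parsed as result = (false || true)
  result
end

def assign_with_or
  result = false or true    # parsed as (result = false) or true
  result
end

assign_with_double_ampersand   # false
assign_with_and                # true
assign_with_double_pipe        # true
assign_with_or                 # false
```

This is one practical reason to prefer && and || in expressions whose value you intend to use.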

Ruby also has the & and | operators (single character as opposed to double), but these are, respectively, the bitwise AND and bitwise OR.

To no one's surprise, not and ! are also available:

not true # false
!false   # true

Most classes provide operators for comparing instances.

These classes use the Comparable mixin as explained in the next chapter.

<, <=, ==, >, >=, and between? are the most common ones:

3 < 5            # true
4 <= 4           # true
5 == 3           # false
9 > 3            # true
9 >= 10          # false
3.between?(3,8)  # true

Only the between? method should be new to you and it can really come in handy at times.

Boolean expressions are often used in control structures, like if or while statements, so it's important to point out that any expression can be evaluated as a Boolean by Ruby.

When Ruby requires a Boolean value, everything but nil and false evaluates to true. This implies that 0, an empty string, or even NaN will all evaluate to true when a Boolean value is required. The following will print "Zero!" and "Empty String!":

if 0
  puts "Zero!"
end

if ""
  puts "Empty String!"
end

For the time being, ignore the specific syntax of the if statement in Ruby. It is formally introduced in the next chapter along with other control structures.
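If you want to check how Ruby treats a particular value, a tiny helper like the one sketched here can be handy in irb. The name truthy? is hypothetical; it relies only on the conditional operator, which applies the same rule as if.

```ruby
# Hypothetical helper: returns true or false depending on how Ruby
# would treat the value in a Boolean context.
def truthy?(value)
  value ? true : false
end

truthy?(0)         # true
truthy?("")        # true
truthy?(0.0 / 0)   # true (NaN)
truthy?(nil)       # false
truthy?(false)     # false
```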

The short-circuit nature of && and || can (sometimes) be used even when you don't need a Boolean value. In fact, && returns the second operand as long as the first is neither false nor nil (otherwise it returns the first), whereas the || operator returns the first operand if it is neither false nor nil; otherwise the second operand is returned:

10 && 20            # 20
"hello" && "Ruby"   # "Ruby"
false && Time.now   # false

10 || 20            # 10
"hello" || "Ruby"   # "hello"
false || Time.now   # Thu Jun 19 23:12:47 -0400 2008
false || nil        # nil

There is one particular idiom that is very common in Ruby programs:

x ||= y

If x evaluates to false or nil, y gets assigned to x. If x has already been assigned a value other than false or nil, no assignment is performed. This is very efficient because if the expression on the right side of ||= is expensive (like retrieving data from a database) it gets executed only once in the life cycle of the variable on the left (assuming that this expensive calculation doesn't return false or nil). It's an easy way of achieving memoization.
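The following sketch shows the idiom caching the result of a slow calculation in an instance variable. The Report class, its method names, and the computations counter are all hypothetical; the counter exists only to show that the expensive method runs once.

```ruby
# Hypothetical class, for illustration only.
class Report
  attr_reader :computations

  def initialize
    @computations = 0
  end

  # The right side of ||= runs only while @total is nil or false.
  def total
    @total ||= expensive_total
  end

  private

  # Stand-in for a slow operation such as a database query.
  def expensive_total
    @computations += 1
    (1..1_000).inject(0) { |sum, n| sum + n }
  end
end

report = Report.new
report.total          # 500500 (calculated)
report.total          # 500500 (returned from @total)
report.computations   # 1
```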

3.4.5. Strings

Strings are the data type that you'll be dealing with the most. String objects can be created by explicitly calling String.new. As you've seen so far, string literals can also be defined with double quotes or with single quotes, and there is an important distinction between the two. Double-quoted literals interpret escape sequences and allow substitutions, whereas single-quoted literals do not.

Let's see this in action:

name = "Antonio"
"Name:\t#{name}"   # Name:        Antonio
'Name:\t#{name}'   # Name:\t#{name}

During the evaluation of the second line, which is a double-quoted literal, the sequence \t was interpreted as a tab. The variable name was also substituted with its actual value, "Antonio". When the third line was evaluated, however, neither the interpretation of \t nor the substitution of name occurred.

This substitution process is called String Interpolation. Ruby recognizes #{expression} patterns inside double-quoted strings and substitutes the value of the expression in the string. This is usually much cleaner than concatenating values. For example, in the following snippet, the string "My name is Antonio Cangiano." gets printed twice. Both approaches work, but the string interpolation one is definitely the way to go:

name, lastname = "Antonio", "Cangiano"   # Parallel assignment

# Don't do this
puts "My name is " + name + " " + lastname + "."

# Do this instead
puts "My name is #{name} #{lastname}."

It is worth noting that the substituted expressions are not limited to simple variables but can be arbitrarily complex as follows:

x = 3
y = 4
"Hypotenuse: #{Math.sqrt(x**2 + y**2)}" # "Hypotenuse: 5.0"

When dealing with long strings that span several lines it's convenient to use the %q and %Q literal constructors as shown here:

%q{
This is a long string that spans multiple lines.
Any line within the matching brackets will be part of the string.
}

%Q{
This is a long string that spans multiple lines.
Any line within the matching brackets will be part of the string.
I can include expressions like this #{expression} and escape sequences as well.

}

Using %q is equivalent to using single quotes, and %Q is the same as creating the string literal and surrounding it with double quotes. The curly brackets are arbitrary, given that other matching special characters could be used as well (for example, %Q( ), %Q[ ], or %Q! !).

Aside from concatenating strings through the addition operator or through string interpolation, you can also "multiply" them:

"abc" * 5 # "abcabcabcabcabc"

Ruby is very flexible when it comes to working with strings. You can treat them as arrays and access substrings directly. For example, assume that you have the following string:

string = "Ruby on Rails for Microsoft Developers"

If you want to retrieve the first character, you can use string[0]. Unfortunately, passing a single Fixnum argument to the [] method returns the numeric representation of the ASCII character, which is not what you want (in most cases, at least). To obtain the actual character you would have to either chain the chr method onto string[0] or slice the string by passing two Fixnum arguments to []:

string[0]       # 82
string[0].chr   # "R"
string[0,1]     # "R"

This behavior has been fixed in the upcoming version of Ruby, which will most likely be released in its stable form as Ruby 1.9.1. By the time you read these words, the release should already be out but this is not something that should concern you. It will take a long time for the community, and all the useful gems and plugins, to move to this new version. For the relatively near future, Ruby 1.8 is the version in use for Rails programming.

Passing two Fixnum arguments (call them a and b) tells Ruby that you want a substring, which starts at the a position and continues on for a number b of characters. Negative indexes can be used as well, with −1 representing the last character of a string, −2 the one before it, and so on. If you want to obtain the last 10 characters from the preceding string, you could use the following:

string[-10,10]   # "Developers"
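A few more slices of the same string are sketched below. The helpers first_chars and last_chars are hypothetical names that wrap the two-argument form; a range of indexes is another option that [] accepts.

```ruby
# Hypothetical helper: the first count characters of a string.
def first_chars(text, count)
  text[0, count]
end

# Hypothetical helper: the last count characters of a string.
# Returns nil if count is larger than the length of the string.
def last_chars(text, count)
  text[-count, count]
end

title = "Ruby on Rails for Microsoft Developers"
first_chars(title, 4)   # "Ruby"
last_chars(title, 10)   # "Developers"
title[8, 5]             # "Rails"
title[8..12]            # "Rails" (a range of indexes also works)
```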

Running ri "String#[]" provides you with a lot more information and available options.

Use the length method (or its alias size) to determine the length of a string:

string.length   # 38

Another common method is chomp (or its chomp! version), which removes the separator, specified as an argument, from the end of the receiver. When the method is called without arguments, Ruby removes a trailing line ending (\n, \r, or \r\n) from the end of the string, if one exists:

input = gets   # "example\n"
input.chomp!   # "example"

gets is Ruby's equivalent of .NET's Console.ReadLine.

You can also substitute strings using the sub method, which replaces only the first occurrence, or the gsub method, which replaces all of them:

"hello".gsub('h', 'Oth')   # "Othello"

The String class offers a wide range of methods to manipulate strings. It is far easier to work with strings in Ruby than it is in C#. It's strongly advised that you run ri String to familiarize yourself with the many methods available. There is no reason to "reinvent the wheel" and with Ruby, for any given micro-task, there is often a built-in method ready to help you out.

An important distinction exists between Ruby's strings and C# strings. In C# or VB.NET, strings are immutable. In Ruby, as in C and C++, they are mutable. It's important to keep this in mind. The following example should help clarify this further:

first = "Ruby"
second = first

first.reverse!

puts first       # prints ybuR
puts second      # prints ybuR

Notice how reverse! is able to modify the first string, and how this affects the second variable that references it as well.
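When you need an independent copy instead of a second reference, you can call dup before modifying the string. The helper reversed_copy below is a hypothetical name used to sketch that approach.

```ruby
# Hypothetical helper: reverses a copy so that the caller's string is untouched.
def reversed_copy(text)
  copy = text.dup    # a new String object with the same contents
  copy.reverse!      # only the copy is modified
  copy
end

first = "Ruby".dup
second = reversed_copy(first)

first     # "Ruby"
second    # "ybuR"
```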

Whenever you need to render an instance unmodifiable, you can use the freeze method and verify if an instance is currently frozen through the frozen? method.

string = "Ruby"
string.frozen?    # false
string.freeze     # Freezes the object referenced by the variable string
string.frozen?    # true
string[0] = "N"   # '[]=': can't modify frozen string (TypeError)

3.4.6. Symbols

Not only are Ruby's strings mutable, but two different String instances with the same identical value are stored in memory as two distinct objects. This means that strings aren't the best choice when you need to use and reuse some form of identifiers that won't change over time.

Ruby finds an answer to this point in symbols. A symbol is an instance of the Symbol class. Symbol objects are used by the Ruby interpreter to represent names and certain strings. What's more important, only one instance exists for a given symbol.

Using the object_id method, you can determine if two instances are the same actual object in memory. Compare the different behavior between strings and symbols shown here:

# Strings
"Rails".object_id   # 21103620
"Rails".object_id   # 21103600
"Rails".object_id   # 21103580

# Symbols
:rails.object_id    # 99298
:rails.object_id    # 99298
:rails.object_id    # 99298

Symbols can be defined as :name or :"name" literals, or created by the to_sym method available in a few classes (including String, of course). Some people like to think of symbols as lightweight strings, but it's much better to distinguish them in terms of when their usage is appropriate. When the focus is on the name itself, and that name is used over and over, symbols are favored over strings.
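
A brief sketch of these conversions follows. It uses String.new to make sure that two separate String objects are created, and equal? to compare object identity:

```ruby
"rails".to_sym                                    # :rails
:rails.to_s                                       # "rails"
:"ruby on rails".to_s                             # "ruby on rails"

# equal? is true only when both sides are the very same object
"rails".to_sym.equal?(:rails)                     # true
String.new("rails").equal?(String.new("rails"))   # false
String.new("rails") == String.new("rails")        # true (same content)
```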

Symbols are ideal for keys of associative arrays (or Hashes, in Ruby speak) and are widely used when programming in Ruby on Rails. This is an example, taken straight from the Rails documentation:

link_to "Profile", :controller => "profiles", :action => "show", :id => @profile

As you can see, the symbols :controller, :action, and :id are the keys of the hash passed to the link_to method (a Rails helper).
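
To see how such a call works without Rails, consider the following sketch. The describe_link method is a hypothetical stand-in written for this example; it is not part of Rails and does not behave like the real link_to. It only shows how trailing key-value pairs reach a method as a single hash:

```ruby
# Hypothetical helper for illustration only; it is not part of Rails.
def describe_link(text, options = {})
  "#{text} -> #{options[:controller]}/#{options[:action]}"
end

# The trailing key-value pairs are collected into a single hash argument
describe_link "Profile", :controller => "profiles", :action => "show"
# "Profile -> profiles/show"

# The same call with explicit parentheses and curly brackets
describe_link("Profile", { :controller => "profiles", :action => "show" })
# "Profile -> profiles/show"
```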

3.4.7. Regular Expressions

Ruby has built-in support for regular expressions, which are instances of the class Regexp. Regular expressions are essentially patterns used to match and extract information from text. They are a very powerful and useful tool, even though they can easily get complex and hard to understand.

In Rails, they are often used to validate user input. For example, you may want to verify that a phone number is in the right format. A single pattern can express such a check concisely, even if it looks cryptic to the untrained eye.

Instances of Regexp can be created by delimiting the pattern between forward slashes /pattern/, through the %r{pattern} literal, or by passing the pattern to the Regexp constructor (for example, Regexp.new(pattern)).
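
The three forms produce equivalent objects, as this sketch suggests. The pattern is a deliberately simplified, hypothetical phone format (three digits, a dash, four digits) chosen only for the example:

```ruby
# A simplified, hypothetical phone format such as 555-1234
slashes     = /\d{3}-\d{4}/
percent_r   = %r{\d{3}-\d{4}}
constructor = Regexp.new('\d{3}-\d{4}')   # single quotes keep the backslashes intact

slashes.class             # Regexp
slashes == percent_r      # true
slashes == constructor    # true
```

The %r{} form is handy when the pattern itself contains forward slashes, because they don't need to be escaped.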

Ruby has two built-in operators to test whether or not a given regular expression matches a string: =~ and its opposite !~. The first is the most common one and returns the index of the first matching character if there is a match, or nil if there's not:

'A long string with some text' =~ /long/  # 2
/Microsoft|Rails/ =~ "Ruby on Rails"      # 8
/\s+/ =~ "The answer is 42"               # 3
/\d+/ =~ "a word"                         # nil

As you can see, the order of the operands is not important: the string can appear on either side of the operator.

In case you are new to the world of regular expressions, the first one indicates that the word "long" has to appear in the string (and it does, starting at the third character, which has index 2); the second that the string needs to contain either "Microsoft" or "Rails" for a match to exist; the third that the string needs to contain at least one whitespace character (\s); and the last one that the string needs to contain one or more digits (\d), which it doesn't, so nil is returned.
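
The following sketch rounds out the picture with the !~ operator and with the match method, which returns a MatchData object that gives access to the matched text. The valid_phone? method is a hypothetical helper, and the phone format it accepts is a simplified assumption:

```ruby
"Ruby on Rails" !~ /Microsoft/     # true  (there is no match)
"Ruby on Rails" !~ /Rails/         # false (there is a match)

# match returns a MatchData object (or nil); parentheses capture substrings
result = /(\d+)/.match("The answer is 42")
result[0]                          # "42" (the whole match)
result[1]                          # "42" (the first captured group)

# Hypothetical validation helper: \A and \z anchor the pattern to the
# start and end of the string, so the whole input must fit the format.
def valid_phone?(text)
  !(text =~ /\A\d{3}-\d{4}\z/).nil?
end

valid_phone?("555-1234")           # true
valid_phone?("call 555-1234 now")  # false
```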

3.4.8. Ranges

Instances of the class Range represent a set of values defined by their start and their end. As usual, they can be defined through Range.new or through literals as shown here:

Range.new('a', 'z') # "a".."z"
'c'..'k'  # "c".."k"
-10...10  # -10...10

The difference between literals defined with two dots and literals defined with three dots is that ranges defined with three dots don't include the last element. So -10...10 includes the consecutive integers from -10 to 9.
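
A quick way to convince yourself of the difference is to ask each kind of range whether it includes its end value. The variable names in this sketch are arbitrary:

```ruby
inclusive = 1..5
exclusive = 1...5

inclusive.include?(5)    # true
exclusive.include?(5)    # false
exclusive.include?(4)    # true

inclusive.exclude_end?   # false
exclusive.exclude_end?   # true

(-10...10).max           # 9
```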

Ranges are not limited to single characters and integers. If a class, whether user-defined or built-in, satisfies two specific requirements, ranges can be constructed using its instances. The two requirements are: 1) The class instances can be compared using the special <=> operator, which unequivocally allows for a comparison of the elements, and 2) The class needs to implement a method called succ, which returns the next instance in a sequence.

Don't worry about the details of these requirements; they essentially mean that to create a range of an arbitrary object type, the objects need to be comparable (for example, 4 < 8) and there must be a way to obtain the next element (for example, 6 following 5, 7 following 6, and so on). The second requirement is not a strict one; in fact, you can create a range of floats, which can be compared even though a float has no natural successor. However, a range loses a lot of its usefulness if it can't be iterated over.
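
As a rough illustration of the two requirements, here is a sketch built around a hypothetical Version class that is not part of Ruby. It defines <=> and succ, which is enough for its instances to be used as the boundaries of a range that can be iterated over:

```ruby
# Hypothetical class, defined only to illustrate the two requirements.
class Version
  include Comparable            # derives <, >, == and friends from <=>
  attr_reader :number

  def initialize(number)
    @number = number
  end

  def <=>(other)                # requirement 1: instances can be compared
    number <=> other.number
  end

  def succ                      # requirement 2: each instance knows its successor
    Version.new(number + 1)
  end

  def to_s
    "v#{number}"
  end
end

releases = Version.new(1)..Version.new(4)
releases.to_a.join(", ")        # "v1, v2, v3, v4"

# Floats can be compared, but they have no succ method
(1.0..2.0).include?(1.5)        # true
# (1.0..2.0).to_a               # raises TypeError (can't iterate from Float)
```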

You can convert ranges to arrays, using the to_a method:

(1..10).to_a   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Or through the so-called splat operator, which "expands" the range into the array:

[*1..10]       # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Iterating over ranges is fairly easy, because they are essentially collections of consecutive objects. The following snippet loops through a range and prints the whole English alphabet, with one character per line:

for c in 'a'..'z'
  puts c
end

On a side note, idiomatic Ruby code uses two-space indentation.

The for loop is perfectly fine and will look familiar to C# and VB programmers. However, the true "Ruby way" is to set aside the syntactic sugar provided by the for loop and opt for the each iterator instead, as shown here:

('a'..'z').each {|x| puts x }

To understand that one-liner, a few concepts need to be introduced. The next chapter tells you all you need to know about blocks and iterators, including how to create your own iterators for your classes and for built-in ones.

It may seem quite surprising, but Ruby allows you to reopen classes, including core ones, and define additional methods of your own or overwrite existing ones. This is a powerful feature of the language, though you do need to be careful not to abuse it. The next chapter covers this topic as well.
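
As a small taste of what reopening a class looks like, the sketch below adds a method to the built-in String class. The name shout is hypothetical and was chosen only for this example:

```ruby
# Reopen the built-in String class and add a method to it.
# shout is a hypothetical name chosen for this example.
class String
  def shout
    upcase + "!"
  end
end

"hello".shout    # "HELLO!"
"Rails".shout    # "RAILS!"
```

Every string in the program gains the new method, which is exactly why this feature should be used sparingly.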

3.4.9. Arrays

Arrays in Ruby are instances of the Array class. They are collections of objects, whose integer index starts at 0, like in most other languages. These objects can be of any type and don't have to be homogeneous. A single array could contain numbers, strings, ranges, Booleans, and nil values. An array of arrays is perfectly fine as well. Of course, it usually makes sense to collect elements of the same type, but this is not enforced by the interpreter. With no set type and size, arrays don't require any special declarations.

[] and Array.new can both be used to create arrays, for example:

a = []                 # []
b = Array.new          # []
c = Array.new(5, 0)    # [0, 0, 0, 0, 0]
d = [2, 3, 7, 9, 18]   # [2, 3, 7, 9, 18]

As you can see, when you pass two parameters to the Array.new method, you are specifying the number of elements (in this case 5) "to initialize" and their value (in this case 0). Again, this doesn't mean that the array can contain only the specified number of elements. It's just an available initialization option that can be convenient in a few circumstances.
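
The following sketch, with arbitrary variable names, shows an initialized array growing beyond its initial size, as well as a single array holding elements of several different types:

```ruby
scores = Array.new(3, 0)              # [0, 0, 0]
scores << 7                           # [0, 0, 0, 7]
scores.length                         # 4

# Elements do not need to share a type
mixed = [42, "Rails", 3.14, 1..3, true, nil, [1, 2]]
mixed.length                          # 7
mixed[3].class                        # Range
mixed[6][0]                           # 1
```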

In the "Strings" section of this chapter, you've already seen how you can access individual characters by referring to their index, as well as how to select a substring by specifying an offset and the number of characters required. The same rules apply to arrays, which shouldn't come as a shock if you consider that strings are essentially sequences of bytes. For example:

array = ['Matz', 'David', 'Antonio']
array[0]                # "Matz"
array[-1] == array[2]   # true
array[2] = 'Tony'       # "Tony"
array                   # ["Matz", "David", "Tony"]
array[1, 2]             # ["David", "Tony"]
array[0, 2]             # ["Matz", "David"]

puts vs. p

If you are trying out these lines of code for yourself through irb, you will see the same properly formatted results that I've indicated in the inline comments. If, on the other hand, you are trying them out from a Ruby file, prefixing the expressions with puts, you may have noticed the following:

array = ['Matz', 'David', 'Antonio']
puts array   # Prints Matz, David and Antonio, one per line

That's not very nice, especially when dealing with large and complex arrays. To display arrays the way they appear in the inline comments, and the way irb displays them, you should use the method p rather than puts.

This method is very handy not only with arrays, but whenever you need to analyze objects. It prints the return value of the object's inspect method.

Even with simple strings, the difference between puts and p is evident: puts "hello " prints hello followed by a trailing space that you can't see, whereas p "hello " prints "hello ", quotes included, so the actual stored value is obvious. If you are troubleshooting a problem, the second is definitely preferable.
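
The relationship between p and inspect can be seen in this short sketch (the array content is arbitrary):

```ruby
list = ["Matz", 42, "hello "]

puts list        # prints Matz, 42 and hello on three separate lines
p list           # prints ["Matz", 42, "hello "]

list.inspect     # returns the same text that p prints, as a String
```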


If you decide to assign a value to the 50th element in an array (for example, array[49] = 'Something'), and there are currently only three elements, the assignment will be successful and the values from the 4th element to the 49th one will be nil.
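
You can verify this behavior with a few lines of code, reusing the array from the earlier example:

```ruby
array = ['Matz', 'David', 'Antonio']
array[49] = 'Something'

array.length           # 50
array[3]               # nil
array[48]              # nil
array[49]              # "Something"
array.compact.length   # 4 (compact returns a copy without the nil values)
```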

Similarly, you can add elements:

a = [3, 10, 20, 44]
p a << 5                 # [3, 10, 20, 44, 5]
p a.push(2)              # [3, 10, 20, 44, 5, 2]

And remove them:

a = [1, 2, 3, 4]
a.pop                    # 4
a                        # [1, 2, 3]
a.delete_at(1)           # 2
a                        # [1, 3]
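
The pop method works on the end of the array. A few related methods, sketched below with an arbitrary variable name, work on the front of the array or remove elements by value:

```ruby
queue = [1, 2, 3, 4]

queue.shift         # 1 (removes and returns the first element)
queue               # [2, 3, 4]
queue.unshift(0)    # [0, 2, 3, 4] (adds an element at the front)
queue.delete(3)     # 3 (removes every element equal to 3)
queue               # [0, 2, 4]
```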

The following snippet shows a few of the many methods available for Array objects:

numbers = [9, 5, 3, 4, 2, 8, 1, 6, 7]

numbers.first            # 9
numbers.last             # 7
numbers.length           # 9

numbers.max              # 9
numbers.min              # 1

numbers.join             # "953428167"
numbers.join(", ")       # "9, 5, 3, 4, 2, 8, 1, 6, 7"

numbers.sort             # [1, 2, 3, 4, 5, 6, 7, 8, 9]
numbers                  # [9, 5, 3, 4, 2, 8, 1, 6, 7]

numbers[0] = nil         # nil
numbers[3] = nil         # nil
numbers                  # [nil, 5, 3, nil, 2, 8, 1, 6, 7]
numbers.compact!         # [5, 3, 2, 8, 1, 6, 7]

numbers << 9 << 4 << 10  # [5, 3, 2, 8, 1, 6, 7, 9, 4, 10]
numbers.sort!            # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

numbers += [20]          # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20]
numbers - [1,2,3,4]      # [5, 6, 7, 8, 9, 10, 20]
numbers.reverse!         # [20, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
numbers                  # [20, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]

The lines in the snippet that contain only the numbers variable are there to show you once again which methods modify the original array and which ones leave it unaltered. As usual, ri Array is your friend.

In the next chapter, when iterators and blocks are introduced, you will see some of the most useful methods of Array objects. Meanwhile, just as you can with ranges, you are able to loop through arrays too, for example:

# Prints 2, 4, 6, 8 one per line
for i in [1,2,3,4]
  puts i*2
end

3.4.10. Hashes

Hashes are associative arrays. They are essentially arrays that allow any type of key to be used, as opposed to consecutive integers starting from zero. Another important distinction is that in Ruby 1.8 Hashes, unlike Arrays, are not ordered (starting with Ruby 1.9, a hash preserves the order in which its keys were inserted).

Hashes are somewhat similar to Hashtables or Dictionary<TKey, TValue> in .NET. Like Hashtables, which return null if the key is not found, Ruby's Hashes return nil by default, unless a different value has been specified as a parameter of the Hash.new method.
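
The effect of passing a default value to Hash.new can be sketched as follows. The variable names and the keys are arbitrary:

```ruby
plain = Hash.new
plain[:missing]          # nil

counts = Hash.new(0)     # 0 becomes the default for missing keys
counts[:missing]         # 0
counts.key?(:missing)    # false (reading a default does not add the key)

counts[:ruby] += 1       # 1
counts[:ruby] += 1       # 2
counts.length            # 1
```

A default of 0 is a common choice when a hash is used to count occurrences, because the first increment works without any special handling.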

You can use two curly brackets to create an object of the Hash class or (as usual) Hash.new:

h1 = {}                        # {}
h2 = Hash.new                  # {}
h1 == h2                       # true
h1.object_id == h2.object_id   # false

Once a variable has been assigned a reference to an empty hash, you can add entries (key-value pairs):

h1["name"] = "George"          # "George"
h1["age"] = 71                 # 71
h1                             # {"name"=>"George", "age"=>71}
h1["name"]                     # "George"
h1["age"]                      # 71

Ruby offers a much friendlier syntax for initializing hashes, and if you consider (as mentioned before) that symbols are a better choice for hash keys, you get the following:

h = {:name => "George", :age => 71}
h[:name]                       # "George"
h[:age]                        # 71

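If you have a hash with strings as keys and would like to switch to symbols, you can use the symbolize_keys! method. Note that this method is not part of core Ruby; it is added to the Hash class by Rails (through Active Support), so it's available in Rails applications.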
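
Outside of Rails, the same conversion takes only a few lines. The symbolize method below is a hypothetical helper written for this example and, unlike symbolize_keys!, it returns a new hash instead of modifying the original one:

```ruby
# Hypothetical helper for code that runs outside of Rails.
def symbolize(hash)
  result = {}
  hash.each { |key, value| result[key.to_sym] = value }
  result
end

person = symbolize("name" => "George", "age" => 71)
person[:name]      # "George"
person[:age]       # 71
person["name"]     # nil (the new hash has no string keys)
```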

Looping through the keys and values of a hash can be easily accomplished through a for loop or more idiomatically through an each iterator, for example:

h = {:name => "George", :age => 71}

for key, value in h
  puts "#{key} = #{value}"
end

# More idiomatic
h.each {|key, value| puts "#{key} = #{value}" }

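This snippet prints the following output, once for each loop (the order shown is the one produced by Ruby 1.8, which doesn't guarantee the order of hash entries; Ruby 1.9 and later print name before age):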

age = 71
name = George
age = 71
name = George

Hashes are widely used by Rails, so it's important that you understand their basic workings.

Again, ri Hash gives you more details about the class and the available methods. Most of the fun methods, just as was the case for Array, require knowledge of blocks and iterators. Further examples are therefore provided in the next chapter.
