The Ruby language is made up of expressions. Each expression returns a value. Even elements that are just statements in other languages, such as if and for, are expressions in Ruby that return values.
Ruby is a pure object-oriented language. Every variable or constant in Ruby is an object, and there are no basic non-object types. Every variable or literal responds to the basic method call syntax.
A simple Ruby expression is one of the following:
A literal. Ruby has literal syntax for arrays, hashes, numbers, ranges, regular expressions, strings, and symbols.
The name of an existing variable or constant.
A method call, which combines the name of an existing variable or constant with a method name. The basic form of a method call is <receiver>.<method>(<arguments>). Variants on this form will be discussed later.
One of several special expressions invoked by the use of a keyword such as if, case, or while.
Complex expressions can be built using Ruby operators. Variable assignment using = is considered to be a type of operator. Most expressions can also have arbitrarily complex expressions within them; for example, the arguments of a method call are all themselves expressions.
A Ruby expression ends with a line break unless the Ruby interpreter has a reason to believe the expression is intended to continue. The expression continues if there is an open delimiter such as a quotation mark, parenthesis, bracket, or brace. The expression also continues if the last character in the line is a comma or operator. A complex expression such as an if statement is usually expected to cross over multiple lines. An expression can be forced to continue to the next line by ending a line with a backslash character.
Multiple expressions can be placed on the same line by separating the expressions with a semicolon.
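As a minimal sketch of the points above, the following shows an if expression whose value is assigned, a line continued by a trailing operator, and two expressions separated by a semicolon. The method names are hypothetical and exist only for illustration.

```ruby
# Hypothetical helper: the if expression's value is assigned to a variable.
def describe_temperature(degrees)
  description = if degrees > 80
    "hot"
  else
    "mild"
  end
  description
end

# A trailing operator tells Ruby that the expression continues on the next line.
def continued_sum
  1 + 2 +
    3
end

# A semicolon separates two expressions on the same line.
def semicolon_demo
  a = 1; b = a + 1
  b
end

puts describe_temperature(90)  # hot
puts continued_sum             # 6
puts semicolon_demo            # 2
```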
Ruby has several different mechanisms to create literal objects. In addition to expected literals for number and string, Ruby literals can also create arrays, Booleans, hashes, ranges, regular expressions, and symbols.
A literal array is created by enclosing the elements in brackets. Inside the brackets, the elements are separated by commas. Each element can be an arbitrarily complex Ruby expression. The result is an instance of the class Array. For example, look at the array defined here:
x = [1, "hello", fred]
It contains three elements: a number, a string, and a variable.
If the elements of the array are all strings, Ruby provides a shortcut syntax, using the notation %w followed by an arbitrary delimiter. Elements in the array are separated by a space, and the strings are interpreted as literals. A backslash character can be used to insert a space inside an element, rather than treat the space as a delimiter.
>> %w{zot jenny max butch peabody}
=> ["zot", "jenny", "max", "butch", "peabody"]
>> %w(zot jenny max butch peabody arthur\ dekker)
=> ["zot", "jenny", "max", "butch", "peabody", "arthur dekker"]
A second form, with a capital %W, allows for string interpolation rules to be obeyed inside the array; it's the equivalent of a double-quoted string. See the Strings section for full details.
>> %W(a b #{1 + 1} c)
=> ["a", "b", "2", "c"]
Adding elements to the end of an array is managed with the push method or the << operator:
[1, 2, 3].push(4)
[1, 2, 3] << 4
Removing the last element from the array is managed with the method pop. The methods shift and unshift provide similar functionality for the beginning of the array: shift removes the first element, and unshift adds a new one.
An arbitrary element in the array can be accessed using index lookup. The first element of the array is at index 0. A non-negative integer indicates the index from the start of the array; the lookup returns nil if the index is at or beyond the size of the array. The index -1 is the last element of the array; other negative integers are counted from the end of the array.
The expression inside the brackets can be a Ruby range, in which case the sub-array corresponding to the indexes in the range is returned. Less commonly, the index expression can be two integers separated by a comma, [index, length], which returns a sub-array starting at the index for the given length.
>> [1, 2, 3][0]
=> 1
>> [1, 2, 3][-1]
=> 3
>> [1, 2, 3][0..1]
=> [1, 2]
>> [1, 2, 3][0, 1]
=> [1]
Arrays provide a number of different methods that allow enumeration over the contents of the array. Many of these are provided by the Enumerable module. The methods include each, which iterates over the contents of the array; map, which applies a block to each element of the array and returns the resulting list; and select, which applies a block to each element of the array and returns those elements for which the result is true.
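The three enumeration methods just described can be sketched as follows. The scores array and the variable names are made up for illustration.

```ruby
scores = [3, 8, 5, 10]

# each visits every element in order.
total = 0
scores.each { |score| total += score }

# map returns a new array holding the block's result for each element.
doubled = scores.map { |score| score * 2 }

# select keeps only the elements for which the block returns a true value.
high = scores.select { |score| score > 4 }

p total    # 26
p doubled  # [6, 16, 10, 20]
p high     # [8, 5, 10]
```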
Ruby defines the special variables true and false, which correspond to the expected Boolean values. They are the sole instances of the classes TrueClass and FalseClass. Ruby also defines the special value nil, which is the sole instance of the class NilClass.
The value nil also evaluates to false when evaluated in a Boolean expression. Unlike many other scripting languages, no other values in Ruby are treated as false; in particular, 0 and the empty string are true. All values other than false and nil are considered to be logically true.
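A short sketch makes the truth rules concrete. The truthy? helper is hypothetical, written only to report how Ruby treats a value in a condition.

```ruby
# Hypothetical helper that reports how Ruby treats a value in a condition.
def truthy?(value)
  value ? true : false
end

p truthy?(false)  # false
p truthy?(nil)    # false
p truthy?(0)      # true: zero is not false in Ruby
p truthy?("")     # true: neither is the empty string
p truthy?([])     # true: nor is an empty array
```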
A hash literal contains a series of key/value pairs inside braces. The key and value are separated by => and each pair is separated by a comma. The key and value can be arbitrary Ruby expressions.
>> {:a => 1, "b" => "fred"}
=> {:a=>1, "b"=>"fred"}
In Ruby 1.8, the => sequence can be replaced by a comma; this should only be done if you don't want anybody to read your code, and the comma form was removed in Ruby 1.9. The created object is an instance of the class Hash. Ideally, a hash key is an immutable object, such as a symbol or number.
Hash elements are accessed and set through bracket index lookup: hash[key]. The methods keys and values return a list of the appropriate elements, and the method has_key? returns true if the key is in the hash.
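The lookup methods above might be used like this; the ages hash and its contents are invented for the example.

```ruby
ages = { :fred => 42, "barney" => 39 }

ages[:wilma] = 40        # set a value by key
p ages[:fred]            # 42
p ages[:missing]         # nil, because the key is absent
p ages.keys              # [:fred, "barney", :wilma]
p ages.values            # [42, 39, 40]
p ages.has_key?(:wilma)  # true
p ages.has_key?(:dino)   # false
```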
Ruby's number literals are straightforward. An integer is any sequence of digits. A sign character can be the first character. Integer literals can be written in other bases by starting the number with a leading 0 for octal, 0x for hexadecimal, or 0b for binary. Decimal numbers can also be indicated with 0d. Underscore characters are ignored in integer literals and therefore are often used as group separators. Integer literals create instances of the class Fixnum, unless the literal is outside the range of Fixnum, in which case the literal is of type Bignum. (Since Ruby 2.4, both are unified into the single class Integer.)
>> 100
=> 100
>> -100
=> -100
>> 0100
=> 64
>> 0x100
=> 256
>> 0b100
=> 4
>> 987_123
=> 987123
Floating point literals are a sequence of digits containing a decimal point. There must be at least one digit on either side of the decimal point, or else you get a syntax error. A floating point literal is converted to an instance of the class Float.
>> 1.3
=> 1.3
>> 1.3e2
=> 130.0
A Ruby range literal can be indicated in one of two ways. The range consists of two expressions separated by either two or three dots. The two-dot version creates a range that includes the value in the second expression, while the three-dot version excludes the final value.
Ranges are normally used as compact storage for a long sequence of consecutive values, which can be iterated over and converted to arrays. The following example shows the difference between the two- and three-dot versions by showing the difference in the array that is created from the range.
>> x = 1..10
=> 1..10
>> x.to_a
=> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>> y = 1...10
=> 1...10
>> y.to_a
=> [1, 2, 3, 4, 5, 6, 7, 8, 9]
The expressions that make up the two ends of the range can be arbitrarily complex. The range ends are most commonly integers, but any class that implements the method succ can act as a range boundary.
Dates and strings are also often used as range boundaries.
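As an illustration of non-integer boundaries, here is a small sketch using strings and dates. The specific dates are arbitrary, and Date comes from Ruby's standard date library.

```ruby
require "date"

# Strings implement succ, so they can bound a range.
letters = ("a".."e").to_a
p letters                                    # ["a", "b", "c", "d", "e"]

# Dates work the same way (the dates here are arbitrary).
first_week = (Date.new(2024, 1, 1)..Date.new(2024, 1, 7))
p first_week.to_a.size                       # 7
p first_week.include?(Date.new(2024, 1, 3))  # true

# The three-dot form leaves out the final value.
p((1...5).to_a)                              # [1, 2, 3, 4]
```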
A regular expression literal can be created by placing a regular expression pattern inside a pair of forward slashes, as follows:
/ab*d/
Ruby also offers an arbitrary delimiter marker for regular expressions, %r, which is often used if the expression pattern itself contains a lot of slashes.
%r{ab*d}
Regular expression literals are converted to instances of the class Regexp. Regular expressions are used to perform complex pattern matches against strings. In Ruby, these matches are performed using either the operator =~ or the method Regexp.match. Most characters in the regular expression match against the same character in the test string; however, several special character forms augment the basic behavior.
After the ending delimiter of the literal, one or more characters can be used to indicate optional regular expression behavior. The three most common options are as follows:
i The pattern match is case insensitive.
m The pattern match is assumed to encompass multiple lines. Practically, this means that the dot special character matches newline characters.
x Literal white space in the pattern is ignored, allowing you to use spacing to make the expression easier to read. Spaces in the pattern can be included by using a special pattern such as \s.
Within a regular expression literal, the following 14 characters have special meaning:
( ) [ ] { } . * ? | + $ ^ \
To actually include one of those characters literally in the pattern, it must be escaped using a backslash, as in \* or \\. Each of the delimiter pairs indicates something different within a regular expression.
Parentheses have their normal function of grouping elements to indicate the scope of operators. For example, /face*/ matches the string faceeee, while /(face)*/ matches the string facefaceface. In addition, parentheses cause a portion of the matched string to be saved for use either within the regular expression pattern or after the match is complete. The groups can be referred to using special variables of the form \1 and \2 during the pattern, and $1 and $2 after the match, as in the following example:
>> /(.*)c(.*)/ =~ "abcde"
=> 0
>> $1
=> "ab"
>> $2
=> "de"
Variables are numbered based on the position of the opening parenthesis; in the pattern /((.*)c)(.*)/, the variable $1 includes c.
Brackets are used to mark a set of characters that can match the string at that point, so /[abcd]/ matches a, b, c, or d. That pattern could also be written /[a-d]/; the hyphen indicates an inclusive sequence of characters.
Multiple sequences can be in one set, and so /[a-zA-Z]/ matches any upper- or lowercase letter. If the set starts with a ^, then the pattern matches only characters that are not in the set, so /[^a-zA-Z]/ matches non-alphabetic characters. Also, within brackets, all special characters, except the right bracket and hyphen, can appear without being escaped.
Braces are used to indicate the number of times a sub-pattern must exist in the match. The basic form is /[a-z]{3}/, which would indicate that exactly three characters must be in the matching string. A range can be indicated with {3,5}, and a minimum value can be indicated with the form {3,}. Braces are also used as part of the #{expr} interpolation, which is the same in regular expressions as it is in double-quoted strings.
There are three shortcuts for indicating commonly used ranges for a match. The character * means "zero or more," and so the regex /a[a-z]*/ matches any string that contains an a followed by zero or more lowercase letters. The character + means "one or more" and the character ? means "zero or one."
By default, the * and + characters are "greedy," meaning that they match as much of the string as they can. This can be an issue if there is more than one potential stopping place for the sub-pattern. You can change the default behavior by putting a ? after the * or +. In the first example following, the [a-z]* pattern matches past the first b and stops at the last possible point, before the second b. In the second, non-greedy example, the pattern stops at the first point, before the first b.
>> /a([a-z]*)b/ =~ "aabceb"
=> 0
>> $1
=> "abce"
>> /a([a-z]*?)b/ =~ "aabceb"
=> 0
>> $1
=> "a"
There are several shortcuts to denote common positions and character sequences. The special character ^ matches the beginning of a line, while $ matches the end of a line. To anchor a pattern to the string as a whole, use \A for the beginning of the string and \z for the end. The variant \Z matches the end of the string except for a trailing newline character. The sequence \b matches a word boundary; \d matches any digit and is the same as [0-9].
The sequence \s matches the white space characters space, tab, newline, carriage return, and form feed; it is the same as [ \t\n\r\f]. The sequence \w matches "word" characters, meaning [a-zA-Z0-9_]. All four of these sequences are negated by replacing the lowercase character with an uppercase one, and so \D matches any non-digit, or [^0-9].
Finally, the pipe character indicates a logical or, and so /a|b/ matches a or b.
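To tie several of these pieces together, here is a brief sketch that uses the i and x options, numbered groups, a back-reference, and a non-greedy match. The date_pattern name and the sample strings are assumptions made for the example.

```ruby
# The i option changes case handling; =~ returns the position of the match.
p(/ruby/i =~ "I like RUBY")         # 7

# The x option ignores literal white space and allows comments in the pattern.
date_pattern = /
  (\d{4})   # year
  -(\d{2})  # month
  -(\d{2})  # day
/x

match = date_pattern.match("Released 2009-03-15")
p match[1]                          # "2009"
p match[2]                          # "03"
p match[3]                          # "15"

# \1 refers back to the first group inside the pattern itself.
p(/(\w)\1/ =~ "hello")              # 2

# The non-greedy form stops at the first possible point.
p "aabceb"[/a[a-z]*?b/]             # "aab"
```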
There are several different ways to write literal strings in Ruby. The simplest is to use single quotation marks. Within a single-quoted string literal, only two escape sequences are recognized. A \' is used to insert a literal single quote, and a \\ is used to insert a literal backslash.
>> 'hello'
=> "hello"
>> 'isn\'t'
=> "isn't"
>> 'nip\\tuck'
=> "nip\\tuck"
>> 'nip\tuck'
=> "nip\\tuck"
>> 'nip\\tuck'.size
=> 8
In a single-quoted string, backslashes that are not used as an escape are treated as literals, but converted to the escape format in the irb output, so the third and fourth examples evaluate to the same result. Although the irb output shows an escaped backslash, the resulting string has only one backslash, as the size method shows in the final example.
With a double-quoted string literal, the full complement of escape characters, such as \n for newline and \t for tab, are substituted (backslashes that are not part of an escape sequence are ignored). In addition, the sequence #{expr} is replaced in the string by the value of the expression; the value is converted to a string if needed. For example:
>> "ab#{1 + 1}c"
=> "ab2c"
There are generic delimiter forms for both string forms, %q for single-quote and %Q for double-quote. The double-quote form can also be written as just %, as in the following example:
>> %q(ab#{1 + 1}c)
=> "ab#{1 + 1}c"
>> %Q(ab#{1 + 1}c)
=> "ab2c"
>> %(ab#{1 + 1}c)
=> "ab2c"
Ruby also supports Perl-style here docs. A here doc starts with << and an identifier, and continues over multiple lines until the identifier is reached.
<<DOC
all this is in the here doc string
DOC
If there is a hyphen before the identifier, then the closing identifier can be indented:
<<-DOC
  this can be indented
  DOC
Otherwise, it must start at the beginning of a line. By default, the string is interpreted according to double-quote rules; however, if the identifier is encased in quotation marks, then the here doc is interpreted according to the style of quotation marks used.
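The effect of quoting the identifier can be sketched as follows. The variable names and the text inside each here doc are invented for illustration.

```ruby
name = "Ruby"

# Default (double-quote rules): #{...} is interpolated.
interpolated = <<DOC
Hello, #{name}
DOC

# Single-quoted identifier: the text is taken literally.
literal = <<'DOC'
Hello, #{name}
DOC

# Hyphen form: the closing identifier may be indented.
indented = <<-DOC
  Hello again
  DOC

puts interpolated  # Hello, Ruby
puts literal       # Hello, #{name}
puts indented      # "  Hello again" (leading spaces are kept)
```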
A symbol in Ruby is an immutable string, similar to an interned string in other languages. In addition, all instances of a symbol with the same value are guaranteed to point to the same internal object. Symbols are used by Ruby as the internal representation of method and variable names. Because they are immutable, they are commonly used as hash keys.
A symbol is formed by a colon, followed by a name or string literal. The string literal is interpreted according to the normal string rules. It does not matter how the symbol is constructed in determining its value; all three of the following are the same symbol:
:person2person
:'person2person'
:"person#{1+1}person"
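One important note about symbols: in Ruby 1.8, the Symbol class does not implement the <=> operator, meaning that a list of symbols cannot be sorted using the sort method alone. Since Ruby 1.9, Symbol does implement <=>, and symbols sort according to their string values.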
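The guarantee that equal symbols are the same object can be checked directly. This is a small sketch; the variable names are illustrative only.

```ruby
# Every way of writing the same symbol yields the same object.
plain  = :person2person
quoted = :"person#{1 + 1}person"
p plain.equal?(quoted)                  # true

# Two equal strings, by contrast, are distinct objects here.
first  = String.new("person2person")
second = String.new("person2person")
p first == second                       # true: same contents
p first.equal?(second)                  # false: different objects

# Symbols convert to and from strings explicitly.
p plain.to_s                            # "person2person"
p "person2person".to_sym.equal?(plain)  # true
```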
Ruby variable names are typical of identifiers and consist of lowercase letters, uppercase letters, digits, and the underscore character, with the following restrictions:
A local variable name must begin with either a lowercase letter or (much more rarely) an underscore. By convention, local variables use underscores to separate words (as in this_is_a_name), rather than interCaps.
An instance variable for an object starts with an @ sign, as in @thingy. By convention, instance variables are lowercase and use underscores to separate words.
Class variables for objects start with @@, and are otherwise identical to instance variables.
Constant values must start with a capital letter. By convention, constants that have normal object values are in all capitals with underscores to separate words. The capitalized initial letter is a marker to Ruby that the value is constant.
Class and module names are a special case of constant values. Class and module names must also begin with a capital letter. By convention, class and module names are mixed case.
Ruby has global variables, which begin with $. Normally, global variables are rarely used within a Ruby program; however, there are several standard global variables, of which perhaps the most commonly used are $1, $2, and so on, which contain regular expression matches.
Ruby method names come in two forms. The main form is similar to a local variable, starting with a lowercase letter or underscore, and conventionally having underscores separating words. Unlike a local variable, a method name may end with a question mark (?) or exclamation point (!).
By convention, a method ending with a question mark, such as nil?, returns a Boolean value. An exclamation point is usually used to indicate a variant of a method that changes the receiving object in place, rather than returning a changed copy, for example sort and sort!. An exclamation point is sometimes used more generally to indicate any method that makes a destructive change on the receiving object.
Method names can also end with an equals sign (=). This is interpreted to mean that the method is a setter method and is called as the left side of an assignment statement, rather than through a normal method call. So, a method defined as def full_name= would be called in a line of code as follows:
obj.full_name = "Scott McCloud"
It is possible for a method and a local variable with the same name to co-exist in the same scope. Under normal circumstances, Ruby does a fine job of resolving ambiguity. The most common confusing case is something similar to the previous example, but without an explicit receiver, as follows:
full_name = "Perry Mason"
Ruby interprets this as an assignment to a new local variable called full_name. However, you might want it to call the setter method full_name= for the current value of self. In this one case, you must include the self value explicitly to invoke the setter, as follows:
self.full_name = "Perry Mason"
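The difference between the two forms can be seen in a minimal sketch. The Person class and its methods are hypothetical; only the full_name= setter name comes from the text above.

```ruby
# Hypothetical class used only to illustrate setter resolution.
class Person
  attr_reader :full_name

  # Setter method, invoked by assignment syntax on an explicit receiver.
  def full_name=(value)
    @full_name = value.strip
  end

  def rename_without_self(value)
    full_name = value       # creates a local variable; the setter is NOT called
    full_name
  end

  def rename_with_self(value)
    self.full_name = value  # calls the full_name= setter
  end
end

person = Person.new
person.full_name = " Scott McCloud "
p person.full_name   # "Scott McCloud"

person.rename_without_self("Perry Mason")
p person.full_name   # still "Scott McCloud"

person.rename_with_self("Perry Mason")
p person.full_name   # "Perry Mason"
```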
Method names can also be the symbols of one of the operators that is listed in the following section as capable of being overridden. The method declaration is just the symbol for the operator, as follows:
def +(other)
Therefore, the following line
a + b
calls the + method for object a if it exists (if the method doesn't exist, you get an error). Many of the operators are defined at the Object level, and are valid for all Ruby objects.
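As a sketch of operator methods in practice, the following hypothetical Point class defines binary +, unary minus (using the method name -@), and ==. The class is invented for this example and is not part of Ruby.

```ruby
# Hypothetical two-dimensional point used only for illustration.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  # Called for point + other.
  def +(other)
    Point.new(x + other.x, y + other.y)
  end

  # Called for -point; unary minus uses the method name -@.
  def -@
    Point.new(-x, -y)
  end

  # Called for point == other.
  def ==(other)
    other.is_a?(Point) && x == other.x && y == other.y
  end
end

sum = Point.new(1, 2) + Point.new(3, 4)
p [sum.x, sum.y]                           # [4, 6]
p(-Point.new(1, 2) == Point.new(-1, -2))   # true
```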
Ruby operators include the typical set, as well as a few Ruby-specific ones. Following is the list, from highest to lowest priority. However, if you are depending on the details of the priority list in your code, you're probably writing hard-to-read code. Throw in a couple of parentheses.
Table A.1 shows the Ruby operators and their meanings in commonly used classes. All elements in the same table row have the same priority. Unless otherwise indicated, the operators can be overridden as Ruby methods using the same symbol.
Table A.1. Ruby Operators and Their Meanings
Operator | Definition |
---|---|
:: | Module and class scope resolution. This operator cannot be overridden. |
[] | Array or hash element lookup; overridden by Dir to indicate file globbing, as in Dir["file.*"]. |
[]= | Array or hash element assignment. |
** | Raising to a power. For example, 2 ** 3 = 8. |
~ | Bitwise complement for numbers. Pattern negation for regular expressions and strings. |
! | Logical negation. |
+ (unary) | Unary plus. When overriding this method, use the method name +@, as in def +@. |
- (unary) | Unary minus. When overriding this method, use the method name -@, as in def -@. |
* | Multiplication. Overridden for arrays and strings to indicate repetition; for example, [a, b] * 3 = [a, b, a, b, a, b] and "fred" * 2 = "fredfred". |
/ | Division. If both operands are integers, then so is the result. |
% | Modulus. Overridden by strings to provide sprintf formatting. |
+ | Binary addition. For arrays and strings indicates concatenation. |
- | Binary subtraction. For arrays, implements set difference. |
<< | Bitwise left shift. Left shift is overridden by Array to implement push, by String to implement append, and by IO objects to indicate writing to the output. |
>> | Bitwise right shift. |
& | Logical and for Booleans; bitwise and for integers. For arrays, overridden to mean set intersection. |
^ | Exclusive logical or for Booleans; exclusive bitwise or for integers. |
\| | Inclusive logical or for Booleans; inclusive bitwise or for integers. For arrays indicates set union. |
> | Greater than. For most objects, all the comparison methods are defined in terms of <=> by mixing in the Comparable module. |
>= | Greater than or equal to. For most objects, all the comparison methods are defined in terms of <=> by mixing in the Comparable module. |
< | Less than. For most objects, all the comparison methods are defined in terms of <=> by mixing in the Comparable module. |
<= | Less than or equal to. For most objects, all the comparison methods are defined in terms of <=> by mixing in the Comparable module. |
<=> | Comparison operator. Returns -1, 0, or 1 depending on relationship between operands. |
== | Equal. |
!= | Not equal. Implemented in terms of ==. May not be overridden as a method in Ruby 1.8. |
=== | Equal for purposes of case statement. |
=~ | Pattern match. |
!~ | Not pattern match. Implemented in terms of =~; may not be overridden in Ruby 1.8. |
&& | Logical and. |
\|\| | Logical or. |
.. | Inclusive range operator. Cannot be overridden. |
... | Exclusive range operator. Cannot be overridden. |
? : | Ternary operator. May not be overridden. |
= | Assignment. Variant forms are /=, *=, %=, +=, -=, \|=, &=, \|\|=, &&=, **=, >>=, <<=. None of these may be overridden as such, but the variant forms are defined in terms of their other operators. |
defined? | Returns a true value if the symbol is defined, and nil otherwise. |
not | Logical not. |
and, or | Logical operators. May not be overridden. |
The basic form of a Ruby method call is as follows:
receiver.method
This indicates that the method is called on the receiver. See the "Objects and Classes" section for a discussion of how the class structure is searched to find the method.
There are a number of optional elements in the method call. The receiver can be omitted, in which case self is assumed for the current context. As a matter of convention, self is only explicitly used as a receiver where it is necessary to avoid ambiguity.
Arguments can be passed to the method; they are placed in a comma-delimited list.
receiver.method(arg1, arg2)
The parentheses may be omitted if doing so does not introduce ambiguity. By convention, empty pairs of parentheses are always omitted.
There are a few special argument forms; the calling forms are discussed here. The "Defining Methods" section discusses their meaning for a method that responds to a call with special arguments. After the explicit arguments, a series of key/value pairs can be added:
receiver.method(arg1, arg2, key1 => val1, key2 => val2)
The key/value pairs are merged into a single hash object before being passed to the receiver.
An array can also be added with the * syntax.
x = [3, 4]
receiver.method(arg1, arg2, *x)
In this case, the array is unrolled and the elements of the array are passed to the receiver as individual arguments — the receiver in this case would get four arguments. Technically, the array rollup can only appear after the key/value pairs; however, it's almost unheard of for a method to have both.
The final optional argument to the method is a block, which can be defined in three different ways. First, the block can be an instance of the class Proc, in which case the argument must be preceded with an ampersand:
receiver.method(arg1, arg2, &proc)
The second and third ways define the block outside the argument list using either of the following syntax forms:
receiver.method(arg1, arg2) {|block_arg| ...}
receiver.method(arg1, arg2) do |block_arg|
  ...
end
The technical difference between the braces and the do/end syntax is that the braces have higher priority. However, that is unlikely to be an issue in typical Ruby code. By convention, the do/end form is used for multiline blocks, and the braces are used for single-line blocks.
You will occasionally see the convention that braces are used in any case where you intend to chain the return value of the method call with the block, regardless of how many lines the block takes.
See the "Defining Methods" section for information on how these argument types are defined and used within objects.
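The calling forms described in this section can be sketched together. The collect method is hypothetical; it simply hands back the extra arguments and the block's result so that each call can be inspected.

```ruby
# Hypothetical method: returns the extra arguments and the block's value.
def collect(first, second, *rest)
  block_value = block_given? ? yield(first, second) : nil
  [rest, block_value]
end

# Trailing key/value pairs arrive as a single hash argument.
rest, _ = collect(1, 2, :color => "red", :size => 3)
p rest.size         # 1
p rest[0][:color]   # "red"

# The * prefix unrolls an array into individual arguments.
extras = [3, 4]
rest, _ = collect(1, 2, *extras)
p rest              # [3, 4]

# A Proc can be passed as the block by prefixing it with &.
adder = Proc.new { |a, b| a + b }
_, value = collect(1, 2, &adder)
p value             # 3

# Braces for a single-line block, do/end for a multiline block.
_, value = collect(1, 2) { |a, b| a * b }
p value             # 2
_, value = collect(3, 4) do |a, b|
  a * b
end
p value             # 12
```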
Much of Ruby's control flow is managed by special expressions based on keywords. This section will discuss them.
An if expression allows for conditional evaluation. The most basic form of the expression is as follows:
if <boolean expression>
  <body...>
end
If the Boolean expression evaluates to true, then the body expressions are evaluated.
An if expression can be written on a single line, in which case the keyword then must separate the Boolean expression and the body.
if <boolean expression> then <body...> end
The if expression takes an optional else clause, which is evaluated if the Boolean expression evaluates to false.
if <boolean expression>
  <body...>
else
  <else body...>
end
If there are multiple else clauses, then the keyword for separating them is elsif. The body corresponding to the first Boolean expression to return true is executed. If none of the Boolean expressions return true, then the else clause is evaluated if one has been included.
if <boolean expression>
  <body...>
elsif <another boolean>
  <another body>
elsif <yet again>
  <yet again body>
else
  <else body>
end
The if expression as a whole returns the value of the final expression of the evaluated block. This means that an if expression is commonly used as a more-readable replacement for the ternary operator:
result = if <boolean> then <true exp> else <false exp> end
An if expression can also be used after another expression, in which case the expression is only evaluated when the if expression is true. In the following line of code, the expression is completely skipped if the Boolean is false.
<expression> if <boolean>
This version is commonly used as a guard clause at the top of a method, as follows:
return if foo.nil?
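The forms of if covered so far can be combined in one short sketch. The shipping_cost method and its thresholds are invented for the example.

```ruby
# Hypothetical method; the weights and prices are arbitrary.
def shipping_cost(weight)
  return nil if weight.nil?    # guard clause using the modifier form

  # The value of the if expression becomes the method's return value.
  if weight < 1
    5
  elsif weight < 10
    12
  else
    30
  end
end

# Single-line form, used in place of the ternary operator.
label = if shipping_cost(0.5) == 5 then "cheap" else "costly" end

p shipping_cost(nil)  # nil
p shipping_cost(0.5)  # 5
p shipping_cost(5)    # 12
p shipping_cost(20)   # 30
p label               # "cheap"
```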
Ruby provides an unless expression as a shortcut for if not. The unless expression is the exact opposite of an if; the body is only evaluated if the Boolean expression is false.
unless <boolean expression>
  <body>
end
In normal usage, the unless expression is preferred to a simple if not. The unless expression allows for an optional else statement, but that usage is not recommended. There is no unless equivalent to an elsif clause.
The unless expression also has a modifier form, which evaluates its attached expression only when the Boolean is false.
<expression> unless <boolean>
Ruby has a very flexible case statement. The basic form is as follows:
case <value expression>
when <expression1>
  <expression1 body>
when <expression2>
  <expression2 body>
end
The value expression is evaluated and compared to each when expression. The first when expression to be case-equal to the value expression has its body executed. No other clauses are executed. (Note the indentation style: the when clauses are at the same indent level as the case line.) The when clause and the body can be on the same line, but they must be separated by the keyword then.
The equality test for a case expression is unusual. The special operator === is used to evaluate whether two clauses are equivalent for the purposes of a case statement. For most objects, the === operator is equivalent to the == operator, but there are three standard classes that override === in an interesting way.
The Range class overrides === to match any number in the range, allowing you to write the following:
case bowling_score
when 0..100 then "Not very good"
when 101..200 then "Good"
when 201..299 then "Very Good"
when 300 then "Perfect"
end
The Regexp class overrides === to perform a string match, allowing you to write the following:
case name
when /\A[A-M]/ then "In first half of alphabet"
when /\A[N-Z]/ then "In last half of alphabet"
end
Somewhat less interestingly, Module and Class override the === operator to indicate that the class of the value is either equal to or a subclass of the class in the when clause.
As with the if and unless expressions, the value of the case expression is the value of the last expression evaluated in the chosen clause.
Ruby allows you to place an else clause at the end of the case expression, which is evaluated if none of the other clauses match.
case name
when "fred" then "Hi Fred"
when "barney" then "Hi Barney"
else "Hi"
end
You can also place multiple matching expressions in a single when clause, separated by a comma. The associated expressions are evaluated if any of the expressions in the list match the initial value:
when "fred", "barney" then "Hi Guy"
Finally, you can also have a case statement without an initial value in the case clause. In this situation, each when clause contains one or more complete Boolean expressions; if any of them are true, then the associated expressions are evaluated.
case
when obj.nil?, obj.size > 3 then "do something"
when obj.size == 5 then "do something else"
else "shrug your shoulders"
end
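The different kinds of === matching can be combined in a single case expression. The classify method below is a hypothetical sketch; the clauses and return strings are made up to show class, range, regular expression, and multiple-value matching side by side.

```ruby
# Hypothetical classifier showing the different kinds of === matching.
def classify(value)
  case value
  when Integer then "integer"            # class match
  when 0.0..1.0 then "fraction"          # range match
  when /\A[A-M]/i then "first half"      # regular expression match
  when "zed", "zee" then "the letter z"  # either string matches
  else "something else"
  end
end

p classify(7)         # "integer"
p classify(0.5)       # "fraction"
p classify("Barney")  # "first half"
p classify("zed")     # "the letter z"
p classify(2.5)       # "something else"
```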
Ruby has a basic for loop expression.
for <loop variable> in <enumerable exp>
  <body expressions>
end
The loop variable is any valid local variable name. The expression must evaluate to an object that responds to the method each, which includes an array or any Ruby Enumerable. The body of the loop is evaluated once for each element in the enumerable expression, with the loop variable being set to each element in turn. This is almost exactly equivalent to the each method with a block (except for one minor detail: local variables created inside the body of the for expression are available outside it, while local variables created in an each block are not). In normal Ruby practice, calling the each method is preferred.
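The scoping difference between for and each can be demonstrated with a small sketch. Both method names are hypothetical.

```ruby
# Variables created in a for body remain visible after the loop ends.
def for_loop_scope
  for n in [1, 2, 3]
    last_seen = n * 10
  end
  [n, last_seen]
end

# Variables created in an each block are local to the block.
def each_block_scope
  [1, 2, 3].each do |m|
    inside_block = m * 10
  end
  defined?(inside_block)
end

p for_loop_scope    # [3, 30]
p each_block_scope  # nil
```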
If the elements of the enumerable expression are themselves enumerable, they can all be assigned separately in the declaration of the for expression:
for x, y in [[1, 2], [3, 4]]
  p "(#{x}, #{y})"
end
The entire for expression can be placed on a single line, in which case the list is separated from the body by the keyword do.
for x in [1, 2, 3] do p x ** 2 end
Ruby's while expression is extremely simple.
while <boolean expression>
  <body>
end
The Boolean expression is evaluated first; if it is true, then the body is evaluated. The body continues to be evaluated until the Boolean expression is false at the end of a loop (or until the loop is exited through a loop control keyword).
The entire expression can be placed on a single line, in which case the Boolean expression and the body must be separated by the keyword do.
while obj.incomplete? do obj.task end
The while expression also has a modifier version that comes at the end of an expression. The preceding expression is evaluated repeatedly until the Boolean expression is false.
obj.task while obj.incomplete?
If the preceding expression is a block denoted by a begin/end pair, then the block is always evaluated at least once, regardless of the value of the Boolean expression:
begin
  obj.task1
  obj.task2
end while obj.incomplete?
The return value for a normally exited while expression is nil.
Ruby offers the until
expression, which is the exact opposite of the while
expression, looping over the body of the loop as long as the Boolean expression is false:
until <boolean-expression>
  <body>
end
The single line and modifier versions of the until
expression have the same syntax as the while
versions.
Any Ruby loop can be controlled from within the loop with one of the following four keywords: break, next, redo
, and retry
. These keywords work within for
loops, while
loops, and until
loops, as well as Enumerable each
loops and their variants.
The keyword break ends the loop at the point it is evaluated. The break keyword may take an optional argument. The value of that argument is returned as the value of the loop (for an each loop, as the value of the each method call), allowing you to distinguish an exit that was the result of a break from a normal exit.
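As a small sketch of using the value of break to tell the two kinds of exit apart, consider the following. The helper name first_multiple is hypothetical.

```ruby
# first_multiple is a hypothetical helper used only for this illustration.
# It returns the first multiple of `of` between 1 and `limit`, or nil.
def first_multiple(of, limit)
  n = 1
  while n <= limit
    break n if n % of == 0   # the argument to break becomes the value of the while expression
    n += 1
  end
  # No explicit return: the method's value is the value of the while expression.
end

p first_multiple(7, 20)  # 7   (exited through break)
p first_multiple(7, 5)   # nil (exited normally)
```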
The keyword next causes the next iteration of the loop to start immediately. The keyword redo causes the current iteration of the loop to start again from the top of the loop. The keyword retry starts the entire loop over, returning the Boolean or list expression to its initial value. (This use of retry applies to Ruby 1.8; from Ruby 1.9 onward, retry is only valid inside a rescue clause.)
There are a couple of nuances to Ruby assignment that you should know. The most basic form of assignment has a variable name on the left and an expression on the right.
score = 27
From that point, the name takes the value of the right-hand expression. Technically, it takes a reference to the object that is the result of the right-hand expression.
You can make multiple assignments in the same line:
score, location = 27, "Soldier Field"
The right side can also be an array with the same meaning — in fact, the previous form is converted internally into the following form:
score, location = [27, "Soldier Field"]
This can also include variable swapping:
x, y = y, x
If the two sides are unbalanced, extra names on the left side are set to nil, and extra values on the right side are ignored. The last name on the left can have an asterisk preceding it, in which case it behaves like it would in a method argument list and takes any and all extra values on the right side as an array. The last value on the right side can also be prefixed with an asterisk, in which case it behaves like a method call and unrolls its values out of the array to be assigned one by one.
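The unbalanced cases can be sketched in a few lines. The method name and variable names here are hypothetical and chosen only for this example.

```ruby
# unbalanced_assignment_demo is a hypothetical helper for illustration.
def unbalanced_assignment_demo
  a, b, c = 1, 2             # extra name on the left: c is nil
  d, e = 1, 2, 3             # extra value on the right: 3 is ignored
  first, *rest = 1, 2, 3     # asterisk on the left collects the remainder
  x, y, z = 0, *[8, 9]       # asterisk on the right unrolls the array
  [[a, b, c], [d, e], [first, rest], [x, y, z]]
end

p unbalanced_assignment_demo
# [[1, 2, nil], [1, 2], [1, [2, 3]], [0, 8, 9]]
```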
If the left-hand value is an object attribute, then Ruby looks for an appropriate setter method for that object. In the following example, Ruby calls the method score=
on the instance obj
with the argument 27
.
obj.score = 27
Similarly, a bracket reference on the left side triggers a call to the appropriate []=
method. The arguments to that method are, in order, any value that appears inside the brackets and then the value on the right side of the assignment.
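To see which methods an assignment actually triggers, you can define both kinds of setter and record the calls. This is only a sketch; the class Scoreboard and its calls log are hypothetical names invented for the example.

```ruby
# Scoreboard is a hypothetical class used only for this illustration.
class Scoreboard
  attr_reader :calls

  def initialize
    @points = {}
    @calls = []
  end

  # Invoked by: board.score = value
  def score=(value)
    @calls << [:score=, value]
    @score = value
  end

  # Invoked by: board[team] = value
  def []=(team, value)
    @calls << [:[]=, team, value]
    @points[team] = value
  end
end

board = Scoreboard.new
board.score = 27
board["Bears"] = 27
p board.calls == [[:score=, 27], [:[]=, "Bears", 27]]  # true
```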
A file can be opened by calling the method File.new
.
File.new(filename, mode)
The method returns an instance of class File
. The mode is a short string that indicates what operations can be performed on the file. If the mode is r
, then the file is read-only. This is the default mode value if none is specified. If the mode is w
, then the file is opened for writing. A non-existent file is created and an existing file is emptied. Somewhat less common is a
, which opens an existing file for writing at the end; a non-existent file is still created.
To write string data to a file, you use the method write
, or the shortcut operator <<
. Note that you have to explicitly include the newline characters. If the expression being written is not a string, then it is converted before being written.
f.write("log file\n")
f << "the next thing\n"
Files implement the method each
, which enumerates over the file line-by-line. The use of each
allows all the methods of Enumerable
to be used for files.
To read the data, you can use the method readline, which returns a whole line, the method readchar, which reads a single character, or the method read(int), which reads an arbitrary number of bytes.
When you are done with the file, close it with the method close
. The "open, then do something, then close" structure is so common that Ruby provides a shortcut.
File.open(filename, mode) do |f|
  ## f is the File object, do things with it
end
The open
method takes a block. The requested file is opened before the block is executed and closed when the block is complete.
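Putting the file operations together, a round trip might look like the following sketch. It writes to a temporary directory so that nothing is left behind; the method name write_and_read_log and the file name example.log are hypothetical.

```ruby
require "tmpdir"

# write_and_read_log is a hypothetical helper used only for this illustration.
def write_and_read_log
  Dir.mktmpdir do |dir|
    path = File.join(dir, "example.log")

    # "w" creates the file (or empties an existing one).
    File.open(path, "w") do |f|
      f.write("log file\n")
      f << "the next thing\n"
    end

    # "a" appends to the end of the file.
    File.open(path, "a") { |f| f << "appended\n" }

    # Files are Enumerable over their lines, so map is available.
    File.open(path, "r") { |f| f.map { |line| line.chomp } }
  end
end

p write_and_read_log  # ["log file", "the next thing", "appended"]
```

Each File.open call closes the file when its block finishes, so no explicit close is needed.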
Raising an exception in Ruby is done by calling the method raise, which is defined in the module Kernel, and is thus available anywhere in a Ruby program. The common usage of the method takes one of the following as its first argument:
The class Exception
or one of its subclasses
An instance of the class Exception
or one of its subclasses
An optional second argument is a string message for the exception; an optional third argument is a stack trace.
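Both forms of the first argument can be tried out with a short sketch. The helper message_from is a hypothetical name; it simply runs a block and returns the message of any ArgumentError raised inside it.

```ruby
# message_from is a hypothetical helper used only for this illustration.
def message_from
  yield
rescue ArgumentError => e
  e.message
end

# First argument is an exception class, second is the message.
p message_from { raise ArgumentError, "bad input" }      # "bad input"

# First argument is an exception instance that already carries its message.
p message_from { raise ArgumentError.new("bad input") }  # "bad input"
```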
Handling an exception can take place inside any method without explicitly entering an exception-aware block. The keywords begin and end also denote the borders of a block that can handle exceptions. To actually handle an exception, use the keyword rescue to start a block that will be invoked when an exception is raised:
def this_method_could_break
  <normal method body>
rescue
  <exceptional code>
end
Note the indentation — the rescue
clause is outdented to the same level as the def
or begin
statement.
The rescue
keyword takes an optional list of Exception
classes that are handled by the rescue
clause. If no classes are specified, then StandardError
is the default class, which catches nearly all errors in typical usage. Multiple rescue
clauses may be specified, each with its own response block of code. The rescue
keyword and the associated code may be on the same line, in which case they must be separated with the keyword then.
If specific exception lists are specified, then the list of exceptions can be ended with the phrase => varname
, in which case the variable name is assigned the value of the exception. Even if a variable is not specified, the current exception is always available in the global variable $!
.
After all the rescue
clauses, there are two optional clauses that may be added. The keyword else
is used to introduce code that is invoked if no exceptions are raised in the main body of the code. The keyword ensure
marks code that is always executed at the end of the method or block, regardless of whether or not an exception was raised.
def method
  <body>
rescue
  <exception>
else
  <no exception>
ensure
  <always>
end
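A concrete version of this skeleton might look like the following sketch. The method safe_divide and its log argument are hypothetical; the log just records which clauses ran.

```ruby
# safe_divide is a hypothetical helper used only for this illustration.
def safe_divide(a, b, log = [])
  result = a / b
rescue ZeroDivisionError => e
  log << "rescued: #{e.class}"
  nil                        # value of the method when the exception is handled
else
  log << "no exception"
  result                     # value of the method when nothing was raised
ensure
  log << "always"            # runs either way; its value is discarded
end

ok_log = []
p safe_divide(6, 3, ok_log)  # 2
p ok_log                     # ["no exception", "always"]

bad_log = []
p safe_divide(1, 0, bad_log) # nil
p bad_log                    # ["rescued: ZeroDivisionError", "always"]
```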
Every value in Ruby is an object, including classes and methods. In this section, you will see how methods, classes, and modules are defined and how they relate to one another.
Methods are defined using the keyword def
. The most basic form is as follows:
def <methodname>
  <body>
end
The limitations on the method name are described in the previous section on variable names. A method defined in this way inside a class or module definition creates an instance method for that class or module. A method defined outside a class or module definition is effectively global to the Ruby program. Technically, it's a method of the class Object.
The return value of a method is the value of the last expression evaluated. You can exit the method at any time by using the keyword return
with an optional value. If no value is specified, then the return value is nil
. Most Ruby programmers do not explicitly use return
in places where it would be redundant. In the following example, the method explicitly returns 0 if the argument is nil
; otherwise, it implicitly returns two times the argument.
def example(argument)
  return 0 if argument.nil?
  argument * 2
end
The first variant to the structure involves placing a constant or expression before the method name, separated by a dot. This form is most often used to create class methods:
class User
  def self.total_count
    <method body>
  end
end
With the preceding definition, you can then call the method User.total_count
. Generically, this structure binds the method to the object preceding the method name and with that object alone — only that one object can invoke the method definition. In the previous case, within the class definition, self
is set to the class User
, and so the method total_count
is uniquely associated with the class (the declaration def User.total_count
would have the same effect). However, you can also use the form to define a method that is specific to a single instance of a class.
ted = User.new
robin = User.new
def ted.go_to_work
  <body>
end
After this definition, the method call ted.go_to_work
is successful, while the method call robin.go_to_work
is not.
Method arguments are normally defined in a comma-delimited list:
def method(arg1, arg2, arg3)
Any argument can have an optional default value, which can be a constant or expression. The default value is used if the calling argument list doesn't contain all the arguments. Normally, arguments with default values are placed after all the arguments that don't have default values. The following method can be called with one, two, or three arguments.
def method(arg1, arg2 = 7, arg3 = 2)
  p "#{arg1} #{arg2} #{arg3}"
end
method(1)       ==> "1 7 2"
method(1, 2)    ==> "1 2 2"
method(1, 2, 3) ==> "1 2 3"
The default value expression can use any argument name defined earlier in the list, meaning that, in the previous example, the default for arg3
could be defined as, say, arg2 * 3
.
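As a quick sketch of a default that depends on an earlier argument, here is a variant of the previous method. The name scaled is hypothetical, and it returns the arguments as an array rather than printing them.

```ruby
# scaled is a hypothetical method used only for this illustration.
def scaled(arg1, arg2 = 7, arg3 = arg2 * 3)
  [arg1, arg2, arg3]
end

p scaled(1)        # [1, 7, 21]
p scaled(1, 2)     # [1, 2, 6]
p scaled(1, 2, 3)  # [1, 2, 3]
```

The default for arg3 is evaluated at call time, after arg2 has received either its passed value or its own default.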
The final argument can optionally be an array argument, denoted by putting an asterisk before the argument name. This argument absorbs any remaining values from the method call into an array. An array argument cannot be given an explicit default value; if there are no remaining values, it is set to an empty array:
def method(arg1, *arg2)
method(1, 2, 3, 4) ## arg1 = 1, arg2 = [2, 3, 4]
method(1)          ## arg1 = 1, arg2 = []
If you expect callers of the method to use the key/value feature to roll up arguments into a hash, it's customary to signal that by giving the last argument the default value of an empty hash:
def method(arg1, arg2 = {})
method(1, :a => 3, :b => 4) ## arg1 = 1, arg2 = {:a => 3, :b => 4}
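Both argument styles can be exercised with a runnable sketch. The method names collect and configure are hypothetical.

```ruby
# collect and configure are hypothetical methods used only for illustration.
def collect(first, *rest)
  [first, rest]              # rest is always an array, possibly empty
end

def configure(name, options = {})
  [name, options]            # trailing key/value pairs arrive as one hash
end

p collect(1)                 # [1, []]
p collect(1, 2, 3, 4)        # [1, [2, 3, 4]]

p configure("db") == ["db", {}]                                      # true
p configure("db", :a => 3, :b => 4) == ["db", {:a => 3, :b => 4}]    # true
```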
The final optional argument to a method is a block, which is the subject of the next section.
A Ruby block is a sequence of executable code that can be defined in one place and executed later on. As mentioned earlier, a block can be defined after a method call using one of two possible syntaxes. In both cases, arguments to the block are placed between pipe characters. In Ruby 1.9 and later, block argument lists can have default values and array arguments; earlier versions did not allow default values.
receiver.method(arg1, arg2) {|block_arg| ...}
receiver.method(arg1, arg2) do |block_arg|
end
Within a block, you can place any arbitrary Ruby code. The block retains its context when it's called, and so any local variables that are visible from where the block is defined are still available when the block is called. This includes the values of self
and super
. Take the following example:
def outer_method
  alpha = 1
  beta = 2
  [:a, :b, :c].each {|e| p alpha * beta }
end
The values alpha
and beta
are defined outside the block, but can still be used inside the block even though the block is eventually executed inside the Array#each
method.
Like other Ruby constructs, the value of the block when executed is the value of the last expression inside the block.
The block argument does not need to be specified in the argument list of the method being called. Instead, the block is just invoked using the keyword yield
, which causes the block to be executed at that point. Any arguments that come after the yield are passed directly to the block.
def block_thingy
  block_value = yield(1, 2)
  p block_value
end
block_thingy {|a, b| a + b} ==> 3
In this example, the existence of the block is only important in the yield
statement, which passes the values 1
and 2
to the block, which adds them.
To determine whether a block has been passed to a method, call block_given?
at any point; it returns true
if the current method is called with a block argument.
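A common use of block_given? is to make the block optional. The following is a minimal sketch; the method name greet is hypothetical.

```ruby
# greet is a hypothetical method used only for this illustration.
def greet(name)
  if block_given?
    yield(name)              # let the caller decide what to do with the name
  else
    "Hello, #{name}"         # fallback behavior when no block is passed
  end
end

p greet("Ted")                   # "Hello, Ted"
p greet("Ted") { |n| n.upcase }  # "TED"
```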
If the final argument of a method is preceded by an ampersand, then the block passed in the method call is bound to that argument. The block is converted to an instance of the class Proc
and can be invoked using the method Proc#call
. The following example is functionally equivalent to the previous one.
def proc_thingy(&a_proc)
  proc_value = a_proc.call(1, 2)
  p proc_value
end
proc_thingy {|a, b| a + b}
The conversion works both ways; a Proc
value can be passed inside an argument list with an ampersand preceding it and is treated like an ordinary block by the receiving method. In this example, the block_thingy
method is called with an explicit Proc
object, which it treats exactly as though a block has been declared:
def proc_thingy(&a_proc)
  block_thingy(&a_proc)
end
Internally, the method to_proc
is called on the value after the ampersand, which leads to some interesting possibilities. For example, Rails ActiveSupport extends the class Symbol to override to_proc
, such that the following two declarations are equivalent:
[1, 2, 3].map {|i| i.to_s }
[1, 2, 3].map(&:to_s)
In the first example, the normal syntax is used to declare a block. In the second example, the ampersand triggers a to_proc call on the symbol :to_s, which converts the symbol into a one-argument block where the argument's method, named to_s, is called, which is identical to the first version. When used judiciously, the symbol-to-proc trick makes for some nicely readable code. This extension has been added to the core for Ruby 1.9.
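Because the ampersand calls to_proc on whatever follows it, any object that defines to_proc can stand in for a block. The class Doubler below is a hypothetical example of that idea, shown next to the built-in symbol version.

```ruby
# Doubler is a hypothetical class used only for this illustration.
class Doubler
  # Called implicitly when an instance is passed with a leading ampersand.
  def to_proc
    Proc.new { |n| n * 2 }
  end
end

p [1, 2, 3].map(&Doubler.new)  # [2, 4, 6]
p [1, 2, 3].map(&:to_s)        # ["1", "2", "3"]
```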
In Ruby, classes and modules are related concepts. A module has all the abilities of a class except for the ability to create instances. Instead, modules can be included inside classes to provide additional functionality.
A module is defined using the following syntax:
module <ModuleName>
  <all kinds of goodness>
end
The module name is a constant, and must begin with a capital letter.
All expressions inside the module are executed when the module is loaded. Specifically, the module can include classes, other modules, instance methods defined with def
, and module methods defined with def self.
Constants defined within the module, including nested modules and classes, are accessible through the ::
scope resolution operator, as in Module::InnerModule::Class
. Module-level methods are available through normal method syntax — Module.method
. Instance methods in the module are only accessible from an instance of a class that includes the module.
The class definition syntax starts similar to the module syntax:
class <ClassName>
  <class things>
end
Again, the code inside the class definition is actually executed. Constants and module-level methods are defined as in modules, and instance methods are accessible to any instance of the class. In particular, many lines of code inside a class definition that look like declarations, such as attribute listings or scope descriptions, are actually method calls to either the class Class
or Module
.
The biggest difference between a class and a module is that a class can create instances of itself using the method new
. Under normal circumstances, you would not override the new
method. If you want your class to perform initialization when a new
instance is created, override the instance method initialize
.
class Animal
  def initialize(name)
    @name = name
  end
end
scooby = Animal.new("Scooby-Doo")
At the end of this code snippet, scooby
is an instance of Animal
, and its name
attribute is set to Scooby-Doo
. At this point, the attribute is still private.
By default, the class is placed inside the current module at the location where the class is defined. However, you can explicitly place the class inside a specific module by including the module as part of the class name with the scope resolution operator:
class OuterModule::InnerModule::NewClass
To explicitly place the class at the top-level module, prefix the class name with just ::
.
An alternate syntax allows access to the singleton classes referred to earlier for binding methods to a specific instance. The form discussed previously,
def obj.method
  <method body>
end
is equivalent to the following:
class << obj
  def method
    <method body>
  end
end
Within the class << obj
block, any code that is legal within a class can be placed. Methods defined within the block are bound to the specific object mentioned in the class declaration. This mechanism is frequently used with a class as the object:
class Animal
  class << self
    def this_is_a_class_method
      @count = (@count || 0) + 1
    end
  end
end
The method defined in the inner class block is accessible as a class method Animal.this_is_a_class_method
. The reason why this mechanism is sometimes used to define class methods is that, by using this method, each subclass of Animal
gets its own copy of the instance variables, and can maintain separate values.
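The separate per-class values can be seen in a short sketch. The method names record_sighting and count and the subclass Dog are hypothetical and used only for this example.

```ruby
# record_sighting, count, and Dog are hypothetical names for illustration.
class Animal
  class << self
    def record_sighting
      # @count here is an instance variable of whichever class object
      # receives the call, so each class keeps its own tally.
      @count = (@count || 0) + 1
    end

    def count
      @count || 0
    end
  end
end

class Dog < Animal
end

Animal.record_sighting
Dog.record_sighting
Dog.record_sighting

p Animal.count  # 1
p Dog.count     # 2
```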
A class can have a special relationship with a class known as its superclass. The class inherits behavior from the superclass, meaning that instances of the subclass can access any methods defined in the superclass. To define this relationship, use the following form:
class Subclass < Superclass
The superclass is usually the constant name of the class in question, although it technically can be any expression that returns a class object. If no superclass is specified, the class is assumed to have Object
as its superclass. Object
's superclass is nil in Ruby 1.8; from Ruby 1.9 onward it is BasicObject, whose superclass is nil.
The special variable self
always refers to the current object whose code is being executed. Within an instance method, self
is the receiver of that method call. Within the parts of a class definition that are outside instance methods, self
is the class object itself.
The special method super
is available inside any method definition and causes the same method name to be called in the superclass. If super
is called with no arguments, then the original arguments to the subclass method are automatically passed to the superclass method. If explicit arguments are used for the super
call, then those arguments are passed to the superclass method, allowing for the case where the superclass method may have a different argument signature than the subclass method.
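The two ways of calling super can be compared side by side. This is a sketch only; the classes Base, ImplicitArgs, and ExplicitArgs are hypothetical.

```ruby
# Base, ImplicitArgs, and ExplicitArgs are hypothetical classes for illustration.
class Base
  def describe(name, punctuation = ".")
    "Base sees #{name}#{punctuation}"
  end
end

class ImplicitArgs < Base
  def describe(name)
    super                      # bare super forwards the original argument
  end
end

class ExplicitArgs < Base
  def describe(name)
    super(name.upcase, "!")    # explicit arguments replace the originals
  end
end

p ImplicitArgs.new.describe("ted")  # "Base sees ted."
p ExplicitArgs.new.describe("ted")  # "Base sees TED!"
```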
Previously, you saw that instance methods declared within a module can only be called by an instance; however, modules cannot create instances of their own. In order for instance methods in a module to be accessible, the module needs to be mixed into a class.
The most common way to mix in a module is with the include method. Suppose the following module is defined in any_module.rb:
module AnyModule
  def self.total_modulate
    p "calling the class method"
  end
  def modulate
    p "modulating"
  end
end
Then you can write the following in a different file:
require 'any_module'
class AnyClass
  include AnyModule
end
x = AnyClass.new
x.modulate
AnyModule.total_modulate ### NOT AnyClass.total_modulate
By including the module, instances of the class can respond to instance methods in the module. The instance of AnyClass
can call the modulate
method, even though it is defined in AnyModule
.
The require
method takes the filename where the module is defined, minus the .rb
extension. You only need to call require
if the module's file has not already been loaded — if the module was loaded at startup, then require
may not be needed.
Rails, for example, provides special functionality to look for unknown modules, such that modules on the Rails load path can be found and loaded without needing require.
However, while using include
adds instance methods to the including class, it does not add class methods, and so the total_modulate
method is still only accessible through the AnyModule
module. In order to add a module's methods as class methods, use the extend
method. If a module is extended into a class, then instance methods of the module become class methods of the class. Class methods of the module still do not become class methods of the class:
class AnyClass
  extend AnyModule
end
AnyClass.modulate
AnyModule.total_modulate ### NOT AnyClass.total_modulate
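A self-contained way to compare include and extend is to ask each class what it responds to. In this sketch, the names Modulator, Included, and Extended are hypothetical, and the methods return strings instead of printing.

```ruby
# Modulator, Included, and Extended are hypothetical names for illustration.
module Modulator
  def self.total_modulate
    "module-level"
  end

  def modulate
    "modulating"
  end
end

class Included
  include Modulator            # modulate becomes an instance method
end

class Extended
  extend Modulator             # modulate becomes a class method
end

p Included.new.modulate                     # "modulating"
p Extended.modulate                         # "modulating"
p Included.respond_to?(:modulate)           # false
p Included.respond_to?(:total_modulate)     # false
p Extended.respond_to?(:total_modulate)     # false
p Modulator.total_modulate                  # "module-level"
```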
An instance variable can be declared for the current class at any time by prefixing the variable name with an @
. This is usually done in the initialize
method, but it can be done inside any method:
def doing_things @name = "fred" end
The new instance variable is available wherever the object is used. Using an instance variable never raises an exception in Ruby; if the variable has not been explicitly created, its value is nil
.
Ruby instance variables are never accessible outside the class they are a part of. In order for other classes to see the value, they must call a method that returns or changes the value. The standard getters and setters in Ruby look like this:
def name
  @name
end
def name=(val)
  @name = val
end
It's tedious to have to write all that for each instance variable, and so Ruby gives you a shortcut:
attr_accessor :name, :date, :score
The attr_accessor
method takes one or more symbols as arguments, and creates standard getters and setters for each symbol for the instance variable of the same name. If you only want a getter, or only a setter, you can use the variants attr_reader
and attr_writer
.
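The three variants differ only in which methods they generate, as the following sketch shows. The class Game and its attributes are hypothetical.

```ruby
# Game is a hypothetical class used only for this illustration.
class Game
  attr_accessor :score         # generates score and score=
  attr_reader :location        # generates location only

  def initialize(location)
    @location = location
  end
end

game = Game.new("Soldier Field")
p game.score                       # nil (instance variable not yet set)
game.score = 27
p game.score                       # 27
p game.location                    # "Soldier Field"
p game.respond_to?(:location=)     # false
```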
Ruby objects have access control modifiers that can prevent outside objects from calling methods in the class. The default access is public
, which means that any other object can call the method. The next level of strictness is protected
, which means that the method can only be called inside the body of the class or one of its subclasses.
However, the protected method can be called on any instance of the class that happens to be available in the method body. A private
method can only be called by implicit lookup where the method has no receiver, meaning that the private method can only be called on the self
object in the current context.
The difference between protected
and private
may seem odd. Take a look at the following example of a comparison operator (assume that outside
is a different instance of the same class):
def <=>(outside)
  key <=> outside.key
end
The comparison checks how the local value of the key
attribute compares to the value of the outside
object. The local value, key
, is accessible no matter what the access of the key
method is. If key is declared to be public
, then outside.key
is legal because all public access is legal. If key is protected
, then outside.key
is still legal, because it is being called inside the class, even though it is not the instance that is currently self
.
However, if key is private
, then outside.key
is not legal, because a private call can only be made with an implicit self
. The initial call to just key
is still legal, because that call is an implicit self
.
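The difference can be demonstrated with one protected and one private method in the same class. This is a sketch under assumed names: Account, balance, pin, and same_pin? are all hypothetical.

```ruby
# Account and its methods are hypothetical names used only for illustration.
class Account
  def initialize(balance, pin)
    @balance = balance
    @pin = pin
  end

  def <=>(other)
    balance <=> other.balance    # legal: balance is protected and other is an Account
  end

  def same_pin?(other)
    pin == other.pin             # other.pin fails: private with an explicit receiver
  end

  protected

  def balance
    @balance
  end

  private

  def pin
    @pin
  end
end

a = Account.new(10, 1234)
b = Account.new(20, 1234)

p(a <=> b)                       # -1

begin
  a.same_pin?(b)
rescue NoMethodError => e
  p e.class                      # NoMethodError
end
```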
There are two ways to define access control. The most common way is to just include the method call public, protected, or private, with no arguments, anywhere in the class. From that point, all methods have the newly declared access level, until the next no-argument access control method is called. If the access control method is called with arguments, then those arguments are the symbols of methods to be given the access. In the following example, protected_method is, well, protected because of its location, and thing is private because of the explicit call in the last line.
class Example
  def initialize
  end
  def thing
  end
  protected
  def protected_method
  end
  private :thing
end
The following is a complete list of the steps Ruby takes to find the definition of a method.
For an instance method, the receiving object is either the object explicitly designated as the recipient of the method, or implicitly, the current value of self
. All method matches are on the name only, not the number or type of arguments. The search path is as follows for an instance method:
1. The singleton class of the receiving object, if it exists.
2. The class of the receiving object, looking for an instance method.
3. Any module included in the class of the receiving object, looking for the instance method. If more than one module is included, they are searched in reverse order of inclusion, so the most recently included module is checked first.
4. The superclass of the receiving object's class, looking for an instance method.
5. Steps 3 and 4 are repeated for modules included in the superclass, and then the superclass of the superclass. This continues until the lookup reaches the class Object.
6. If the class Object doesn't have the method, the module Kernel is checked.
7. If the method doesn't exist there, then the special method method_missing is called, starting at the receiving object, and continuing along the same lookup mechanism until an implementation is found. (Object is guaranteed to have one.) Inside method_missing, the program gets a last chance to do something, based on the method name and arguments rather than throwing an exception.
8. If nothing is found, an exception is thrown.
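The search order above can be observed directly. The following sketch uses hypothetical names throughout (Walkable, Creature, Person, and the try_ prefix handled in method_missing).

```ruby
# Walkable, Creature, and Person are hypothetical names for illustration.
module Walkable
  def move
    "walking"
  end
end

class Creature
  def move
    "creeping"
  end

  # Last chance: handle any method whose name starts with "try_".
  def method_missing(name, *args)
    name.to_s.start_with?("try_") ? "no #{name} here" : super
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.start_with?("try_") || super
  end
end

class Person < Creature
  include Walkable
end

ted = Person.new
robin = Person.new
def ted.move
  "commuting"                    # lives in ted's singleton class
end

p ted.move                       # "commuting" (singleton class first)
p robin.move                     # "walking"   (included module before superclass)
p Person.ancestors.first(3)      # [Person, Walkable, Creature]
p robin.try_flying               # "no try_flying here" (method_missing)
```

Calling a name that method_missing does not handle falls through to super and raises NoMethodError.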
The lookup path for a class method is slightly different. In this case, the recipient class is the class being sent the message, as in Employee.total_count.
1. The class is searched for a class method of the same name.
2. All superclasses are searched for a class method. Notice that included modules are not searched in this path. An extended module technically adds its methods directly into the namespace of the extending class, and thus would be found just by walking up the regular class hierarchy.
3. Eventually, the superclasses reach Object. If Object does not define the class method, then the next step is to search for instance methods of the class Class. (Remember, classes are objects, too.)
4. The path from there is instance methods of Class, instance methods of Module, instance methods of Object, and instance methods of Kernel, in that order.
5. If nothing is found, then a class version of method_missing is searched for, starting at the original recipient class.
Module methods are similar to class methods, except that modules don't have superclasses, and so the search path is simply module method of the module, instance methods of Module
, instance methods of Object
, and instance methods of Kernel
, then method_missing
.
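One way to check where a class-level call ends up is to ask the resulting Method object for its owner. This is a sketch; the subclass Manager and the body of total_count are hypothetical.

```ruby
# Manager and the return value of total_count are hypothetical.
class Employee
  def self.total_count
    3
  end
end

class Manager < Employee
end

# Found by walking up the superclasses of Manager.
p Manager.total_count                                               # 3
p Manager.method(:total_count).owner == Employee.singleton_class    # true

# Not a class method anywhere in the hierarchy, so the search falls
# through to instance methods of Class and then Module.
p Employee.method(:new).owner    # Class
p Employee.method(:name).owner   # Module
```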