The Ruby interpreter parses a program as a sequence of tokens. Tokens include comments, literals, punctuation, identifiers, and keywords. This section introduces these types of tokens and also includes important information about the characters that comprise the tokens and the whitespace that separates the tokens.
Comments in Ruby begin with a # character and continue to the end of the line. The Ruby interpreter ignores the # character and any text that follows it (but does not ignore the newline character, which is meaningful whitespace and may serve as a statement terminator). If a # character appears within a string or regular expression literal (see Chapter 3), then it is simply part of the string or regular expression and does not introduce a comment:
# This entire line is a comment
x = "#This is a string"               # And this is a comment
y = /#This is a regular expression/   # Here's another comment
Multiline comments are usually written simply by beginning each line with a separate # character:
#
# This class represents a Complex number
# Despite its name, it is not complex at all.
#
Note that Ruby has no equivalent of the C-style /*...*/ comment. There is no way to embed a comment in the middle of a line of code.
Ruby supports another style of multiline comment known as an embedded document. These start on a line that begins =begin and continue until (and include) a line that begins =end. Any text that appears after =begin or =end is part of the comment and is also ignored, but that extra text must be separated from the =begin and =end by at least one space.
Embedded documents are a convenient way to comment out long blocks of code without prefixing each line with a # character:
=begin Someone needs to fix the broken code below!
Any code here is commented out
=end
Note that embedded documents only work if the = signs are the first characters of each line:
# =begin This used to begin a comment. Now it is itself commented out!
The code that goes here is no longer commented out
# =end
As their name implies, embedded documents can be used to include long blocks of documentation within a program, or to embed source code of another language (such as HTML or SQL) within a Ruby program. Embedded documents are usually intended to be used by some kind of postprocessing tool that is run over the Ruby source code, and it is typical to follow =begin with an identifier that indicates which tool the comment is intended for.
Ruby programs can include embedded API documentation as specially formatted comments that precede method, class, and module definitions. You can browse this documentation using the ri tool described earlier in Viewing Ruby Documentation with ri. The rdoc tool extracts documentation comments from Ruby source and formats them as HTML or prepares them for display by ri. Documentation of the rdoc tool is beyond the scope of this book; see the file lib/rdoc/README in the Ruby source code for details.
Documentation comments must come immediately before the module, class, or method whose API they document. They are usually written as multiline comments where each line begins with #, but they can also be written as embedded documents that start =begin rdoc. (The rdoc tool will not process these comments if you leave out the “rdoc”.)
The following example comment demonstrates the most important formatting elements of the markup grammar used in Ruby’s documentation comments; a detailed description of the grammar is available in the README file mentioned previously:
#
# Rdoc comments use a simple markup grammar like those used in wikis.
#
# Separate paragraphs with a blank line.
#
# = Headings
#
# Headings begin with an equals sign
#
# == Sub-Headings
# The line above produces a subheading.
# === Sub-Sub-Heading
# And so on.
#
# = Examples
#
#   Indented lines are displayed verbatim in code font.
#     Be careful not to indent your headings and lists, though.
#
# = Lists and Fonts
#
# List items begin with * or -. Indicate fonts with punctuation or HTML:
# * _italic_ or <i>multi-word italic</i>
# * *bold* or <b>multi-word bold</b>
# * +code+ or <tt>multi-word code</tt>
#
# 1. Numbered lists begin with numbers.
# 99. Any number will do; they don't have to be sequential.
# 1. There is no way to do nested lists.
#
# The terms of a description list are bracketed:
# [item 1] This is a description of item 1
# [item 2] This is a description of item 2
#
Literals are values that appear directly in Ruby source code. They include numbers, strings of text, and regular expressions. (Other literals, such as array and hash values, are not individual tokens but are more complex expressions.) Ruby number and string literal syntax is actually quite complicated, and is covered in detail in Chapter 3. For now, an example suffices to illustrate what Ruby literals look like:
1         # An integer literal
1.0       # A floating-point literal
'one'     # A string literal
"two"     # Another string literal
/three/   # A regular expression literal
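As a quick check of what these literals evaluate to, the following minimal sketch asks each value whether it belongs to the built-in class one would expect. The variable names are chosen only for illustration.

```ruby
# Each literal below produces an object of a built-in class.
integer = 1
float   = 1.0
single  = 'one'
double  = "two"
pattern = /three/

puts integer.is_a?(Integer)   # true
puts float.is_a?(Float)       # true
puts single.is_a?(String)     # true
puts double.is_a?(String)     # true
puts pattern.is_a?(Regexp)    # true
```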
Ruby uses punctuation characters for a number of purposes. Most Ruby operators are written using punctuation characters, such as + for addition, * for multiplication, and || for the Boolean OR operation. See Operators for a complete list of Ruby operators. Punctuation characters also serve to delimit string, regular expression, array, and hash literals, and to group and separate expressions, method arguments, and array indexes. We’ll see miscellaneous other uses of punctuation scattered throughout Ruby syntax.
An identifier is simply a name. Ruby uses identifiers to name variables, methods, classes, and so forth. Ruby identifiers consist of letters, numbers, and underscore characters, but they may not begin with a number. Identifiers may not include whitespace or nonprinting characters, and they may not include punctuation characters except as described here.
Identifiers that begin with a capital letter A–Z are constants, and the Ruby interpreter will issue a warning (but not an error) if you alter the value of such an identifier. Class and module names must begin with initial capital letters. The following are identifiers:
i
x2
old_value
_internal   # Identifiers may begin with underscores
PI          # Constant
By convention, multiword identifiers that are not constants are written with underscores like_this, whereas multiword constants are written LikeThis or LIKE_THIS.
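The difference between a constant and an ordinary variable can be seen in a short sketch. The names MAX_RETRIES and max_retries are hypothetical and exist only to illustrate the naming conventions; the exact wording of the warning depends on the Ruby version.

```ruby
MAX_RETRIES = 3     # a constant: the name begins with a capital letter
max_retries = 3     # a local variable: the name begins with a lowercase letter

MAX_RETRIES = 5     # Ruby warns that the constant is already initialized, then continues
max_retries = 5     # reassigning a variable produces no warning

puts MAX_RETRIES    # 5
puts max_retries    # 5
```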
Ruby is a case-sensitive language. Lowercase letters and uppercase letters are distinct. The keyword end, for example, is completely different from the keyword END.
Ruby’s rules for forming identifiers are defined in terms of ASCII characters that are not allowed. In general, all characters outside of the ASCII character set are valid in identifiers, including characters that appear to be punctuation. In a UTF-8 encoded file, for example, the following Ruby code is valid:
def ×(x,y)   # The name of this method is the Unicode multiplication sign
  x*y        # The body of this method multiplies its arguments
end
Similarly, a Japanese programmer writing a program encoded in SJIS or EUC can include Kanji characters in her identifiers. See Specifying Program Encoding for more about writing Ruby programs using encodings other than ASCII.
The special rules about forming identifiers are based on ASCII characters and are not enforced for characters outside of that set. An identifier may not begin with an ASCII digit, for example, but it may begin with a digit from a non-Latin alphabet. Similarly, an identifier must begin with an ASCII capital letter in order to be considered a constant. The identifier Å, for example, is not a constant.
Two identifiers are the same only if they are represented by the same sequence of bytes. Some character sets, such as Unicode, have more than one codepoint that represents the same character. No Unicode normalization is performed in Ruby, and two distinct codepoints are treated as distinct characters, even if they have the same meaning or are represented by the same font glyph.
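The lack of normalization can be demonstrated with two spellings of the letter é. This is a minimal sketch that assumes Ruby 1.9 or later (for the \u escape); the Menu class is hypothetical, and the names are built from strings so that the source file itself stays pure ASCII.

```ruby
# Two spellings of the same visible character:
composed   = "\u00E9"     # U+00E9, a single precomposed codepoint
decomposed = "e\u0301"    # U+0065 followed by the combining accent U+0301

puts composed == decomposed          # false: the byte sequences differ
puts composed.bytes.to_a.inspect     # [195, 169]
puts decomposed.bytes.to_a.inspect   # [101, 204, 129]

# Used as method names, the two spellings name two different methods.
class Menu; end   # hypothetical class, for illustration only
Menu.send(:define_method, composed)   { :composed }
Menu.send(:define_method, decomposed) { :decomposed }

menu = Menu.new
puts menu.send(composed)     # composed
puts menu.send(decomposed)   # decomposed
```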
Punctuation characters may appear at the start and end of Ruby identifiers. They have the following meanings:
$ | Global variables are prefixed with a dollar sign. Following Perl’s example, Ruby defines a number of global variables that include other punctuation characters, such as $_ and $-K. See Chapter 10 for a list of these special globals. |
@ | Instance variables are prefixed with a single at sign, and class variables are prefixed with two at signs. Instance variables and class variables are explained in Chapter 7. |
? | As a helpful convention, methods that return Boolean values often have names that end with a question mark. |
! | Method names may end with an exclamation point to indicate that they should be used cautiously. This naming convention is often used to distinguish mutator methods that alter the object on which they are invoked from variants that return a modified copy of the original object. |
= | Methods whose names end with an equals sign can be invoked by placing the method name, without the equals sign, on the left side of an assignment operator. (You can read more about this in Assigning to Attributes and Array Elements and Accessors and Attributes.) |
Here are some example identifiers that contain leading or trailing punctuation characters:
$files      # A global variable
@data       # An instance variable
@@counter   # A class variable
empty?      # A Boolean-valued method or predicate
sort!       # An in-place alternative to the regular sort method
timeout=    # A method invoked by assignment
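To see these conventions working together, consider the following sketch. The Connection class and all of its methods are hypothetical, invented only to show each kind of identifier in use; sort and sort! are the standard Array methods.

```ruby
# Hypothetical class used only to illustrate the naming conventions.
class Connection
  @@count = 0               # class variable shared by all instances

  def initialize
    @timeout = 10           # instance variables
    @open = false
    @@count += 1
  end

  def self.count            # reads the class variable
    @@count
  end

  def open?                 # predicate: returns a Boolean value
    @open
  end

  def open!                 # "bang" method: alters the receiver
    @open = true
    self
  end

  def timeout               # getter
    @timeout
  end

  def timeout=(seconds)     # setter, invoked by assignment
    @timeout = seconds
  end
end

conn = Connection.new
conn.timeout = 30           # calls the method timeout= with the argument 30
conn.open!
puts conn.open?             # true
puts conn.timeout           # 30
puts Connection.count       # 1

words  = ["pear", "apple"]
sorted = words.sort         # returns a sorted copy
puts words.inspect          # ["pear", "apple"]
words.sort!                 # sorts the receiver in place
puts words.inspect          # ["apple", "pear"]
```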
A number of Ruby’s operators are implemented as methods, so that classes can redefine them for their own purposes. It is therefore possible to use certain operators as method names as well. In this context, the punctuation character or characters of the operator are treated as identifiers rather than operators. See Operators for more about Ruby’s operators.
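As a brief illustration of an operator used as a method name, the sketch below defines + and == for a small class. The Point class is hypothetical and is not part of Ruby's standard library.

```ruby
# Hypothetical two-dimensional point class, for illustration only.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  # Here + is the method name, so "a + b" invokes a.+(b).
  def +(other)
    Point.new(@x + other.x, @y + other.y)
  end

  # == is also an ordinary method that a class may redefine.
  def ==(other)
    other.is_a?(Point) && @x == other.x && @y == other.y
  end
end

sum = Point.new(1, 2) + Point.new(3, 4)
puts sum == Point.new(4, 6)                  # true
puts Point.new(1, 2).+(Point.new(1, 1)).x    # 2
```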
The following keywords have special meaning in Ruby and are treated specially by the Ruby parser:
__LINE__      case      ensure   not      then
__ENCODING__  class     false    or       true
__FILE__      def       for      redo     undef
BEGIN         defined?  if       rescue   unless
END           do        in       retry    until
alias         else      module   return   when
and           elsif     next     self     while
begin         end       nil      super    yield
break
In addition to those keywords, there are three keyword-like tokens that are treated specially by the Ruby parser when they appear at the beginning of a line:
=begin =end __END__
As we’ve seen, =begin and =end at the beginning of a line delimit multiline comments. And the token __END__ marks the end of the program (and the beginning of a data section) if it appears on a line by itself with no leading or trailing whitespace.
In most languages, these words would be called “reserved words” and they would never be allowed as identifiers. The Ruby parser is flexible and does not complain if you prefix these keywords with @, @@, or $ and use them as instance, class, or global variable names. Also, you can use these keywords as method names, with the caveat that the method must always be explicitly invoked through an object. Note, however, that using these keywords in identifiers will result in confusing code. The best practice is to treat these keywords as reserved.
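The following sketch shows what the parser tolerates, not what is recommended. The Interval class is hypothetical; it uses the keywords begin and end both as instance variable names and as method names, and every call to those methods goes through an explicit receiver.

```ruby
# Hypothetical class, for illustration only. Legal, but confusing to read.
class Interval
  def initialize(first, last)
    @begin = first          # keywords prefixed with @ are ordinary variable names
    @end = last
  end

  def begin                 # a keyword used as a method name
    @begin
  end

  def end
    @end
  end

  def length
    self.end - self.begin   # the explicit receiver (self) is required here
  end
end

span = Interval.new(2, 7)
puts span.begin    # 2
puts span.end      # 7
puts span.length   # 5
```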
Many important features of the Ruby language are actually implemented as methods of the Kernel, Module, Class, and Object classes. It is good practice, therefore, to treat the following identifiers as reserved words as well:
# These are methods that appear to be statements or keywords
at_exit        catch      private      require
attr           include    proc         throw
attr_accessor  lambda     protected
attr_reader    load       public
attr_writer    loop       raise

# These are commonly used global functions
Array          chomp!     gsub!        select
Float          chop       iterator?    sleep
Integer        chop!      load         split
String         eval       open         sprintf
URI            exec       p            srand
abort          exit       print        sub
autoload       exit!      printf       sub!
autoload?      fail       putc         syscall
binding        fork       puts         system
block_given?   format     rand         test
callcc         getc       readline     trap
caller         gets       readlines    warn
chomp          gsub       scan

# These are commonly used object methods
allocate       freeze        kind_of?     superclass
clone          frozen?       method       taint
display        hash          methods      tainted?
dup            id            new          to_a
enum_for       inherited     nil?         to_enum
eql?           inspect       object_id    to_s
equal?         instance_of?  respond_to?  untaint
extend         is_a?         send
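One way to confirm that a few of these names really are methods rather than keywords is to ask the defining module. This is a minimal sketch; defines? is a hypothetical helper written for this example, and it checks both public and private methods because the visibility of some of these methods differs between Ruby versions.

```ruby
# Hypothetical helper: true if mod defines name as an instance method
# of any visibility.
def defines?(mod, name)
  mod.method_defined?(name) || mod.private_method_defined?(name)
end

puts defines?(Kernel, :raise)           # true
puts defines?(Kernel, :require)         # true
puts defines?(Kernel, :loop)            # true
puts defines?(Module, :attr_accessor)   # true
puts defines?(Module, :private)         # true
```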
Spaces, tabs, and newlines are not tokens themselves but are used to separate tokens that would otherwise merge into a single token. Aside from this basic token-separating function, most whitespace is ignored by the Ruby interpreter and is simply used to format programs so that they are easy to read and understand. Not all whitespace is ignored, however. Some is required, and some whitespace is actually forbidden. Ruby’s grammar is expressive but complex, and there are a few cases in which inserting or removing whitespace can change the meaning of a program. Although these cases do not often arise, it is important to know about them.
The most common form of whitespace dependency has to do with newlines as statement terminators. In languages like C and Java, every statement must be terminated with a semicolon. You can use semicolons to terminate statements in Ruby, too, but this is only required if you put more than one statement on the same line. Convention dictates that semicolons be omitted elsewhere.
Without explicit semicolons, the Ruby interpreter must figure out on its own where statements end. If the Ruby code on a line is a syntactically complete statement, Ruby uses the newline as the statement terminator. If the statement is not complete, then Ruby continues parsing the statement on the next line. (In Ruby 1.9, there is one exception, which is described later in this section.)
This is no problem if all your statements fit on a single line. When they don’t, however, you must take care that you break the line in such a way that the Ruby interpreter cannot interpret the first line as a statement of its own. This is where the whitespace dependency lies: your program may behave differently depending on where you insert a newline. For example, the following code adds x and y and assigns the sum to total:
total = x +   # Incomplete expression, parsing continues
  y
But this code assigns x to total, and then evaluates y, doing nothing with it:
total = x   # This is a complete expression
  + y       # A useless but complete expression
As another example, consider the return and break statements. These statements may optionally be followed by an expression that provides a return value. A newline between the keyword and the expression will terminate the statement before the expression.
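The effect of a misplaced newline is easy to observe. In this sketch, sum_broken and sum_fixed are hypothetical methods that differ only in where the line breaks; the last few lines repeat the earlier assignment example so that its result can be printed.

```ruby
def sum_broken(a, b)
  return        # the newline ends the statement here, so nil is returned
    a + b       # never evaluated
end

def sum_fixed(a, b)
  return a + b  # the expression is on the same line as the keyword
end

puts sum_broken(1, 2).inspect   # nil
puts sum_fixed(1, 2)            # 3

x, y = 1, 2
total = x       # a complete statement: total is now 1
+ y             # a separate statement whose value is discarded
puts total      # 1
```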
You can safely insert a newline without fear of prematurely terminating your statement after an operator or after a period or comma in a method invocation, array literal, or hash literal.
You can also escape a line break with a backslash, which prevents Ruby from automatically terminating the statement:
total = first_long_variable_name + second_long_variable_name \
  + third_long_variable_name   # Note no statement terminator above
In Ruby 1.9, the statement terminator rules change slightly. If the first nonspace character on a line is a period, then the line is considered a continuation line, and the newline before it is not a statement terminator. Lines that start with periods are useful for the long method chains sometimes used with “fluent APIs,” in which each method invocation returns an object on which additional invocations can be made. For example:
animals = Array.new
  .push("dog")   # Does not work in Ruby 1.8
  .push("cow")
  .push("cat")
  .sort
Ruby’s grammar allows the parentheses around method invocations to be omitted in certain circumstances. This allows Ruby methods to be used as if they were statements, which is an important part of Ruby’s elegance. Unfortunately, however, it opens up a pernicious whitespace dependency. Consider the following two lines, which differ only by a single space:
f(3+2)+1
f (3+2)+1
The first line passes the value 5 to the function f and then adds 1 to the result. Since the second line has a space after the function name, Ruby assumes that the parentheses around the method call have been omitted. The parentheses that appear after the space are used to group a subexpression, but the entire expression (3+2)+1 is used as the method argument. If warnings are enabled (with -w), Ruby issues a warning whenever it sees ambiguous code like this.
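Running the two forms side by side makes the difference concrete. This sketch assumes Ruby 1.9 or later and a hypothetical function f that multiplies its argument by 10, so that the two parses produce visibly different results.

```ruby
def f(x)    # hypothetical function, for illustration only
  x * 10
end

a = f(3+2)+1     # parsed as f(5) + 1
b = f (3+2)+1    # parsed as f((3+2)+1), that is, f(6)

puts a   # 51
puts b   # 60
```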
The solution to this whitespace dependency is straightforward:
Never put a space between a method name and the opening parenthesis.
If the first argument to a method begins with an open parenthesis, always use parentheses in the method invocation. For example, write f((3+2)+1).
Always run the Ruby interpreter with the -w option so it will warn you if you forget either of the rules above!