An IO object is a stream: a readable source of bytes or characters, or a writable sink for bytes or characters. The File class is a subclass of IO. IO objects also represent the “standard input” and “standard output” streams used to read from and write to the console. The stringio library that ships with Ruby lets us create a stream wrapper around a string object. Finally, the socket objects used in networking (described later in this chapter) are also IO objects.
Before we can perform input or output, we must have an IO object to read from or write to.
The IO class defines factory methods new, open, popen, and pipe, but these are low-level methods with operating system dependencies, and they are not documented here. The subsections that follow describe more common ways to obtain IO objects. (Networking also includes examples that create IO objects that communicate across the network.)
One of the most common kinds of IO is the reading and writing of files. The File class defines some utility methods (described below) that read the entire contents of a file with one call. Often, however, you will instead open a file to obtain a File object and then use IO methods to read from or write to the file.
Use File.open (or File.new) to open a file. The first argument is the name of the file. This is usually specified as a string, but in Ruby 1.9, you can use any object with a to_path method. Filenames are interpreted relative to the current working directory unless they are specified with an absolute path. Use forward slash characters to separate directories; Ruby automatically converts them into backslashes on Windows. The second argument to File.open is a short string that specifies how the file should be opened:

    f = File.open("data.txt", "r")    # Open file data.txt for reading
    out = File.open("out.txt", "w")   # Open file out.txt for writing

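The second argument to File.open is a string that specifies the “file mode.” It must begin with one of these values: "r" opens the file for reading only and is the default; "r+" opens it for reading and writing, starting at the beginning of the file, and fails if the file does not exist; "w" opens it for writing only, creating a new file or truncating an existing one; "w+" is like "w" but also allows reading; "a" opens it for writing only, appending to the end of the file and creating it if necessary; and "a+" is like "a" but also allows reading. Add "b" to the mode string to prevent automatic line terminator conversion on Windows platforms. For text files, you may add the name of a character encoding to the mode string. For binary files, you should add ":binary" to the string. This is explained in Streams and Encodings.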
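A minimal sketch of the basic modes, assuming a writable temporary directory (created with Dir.mktmpdir from the tmpdir library) and a hypothetical file named notes.txt. It writes with "w", appends with "a", overwrites in place with "r+", and reads back with "r":

```ruby
require "tmpdir"

# Exercise the "w", "a", "r+", and "r" modes on a throwaway file.
contents = Dir.mktmpdir do |dir|
  path = File.join(dir, "notes.txt")             # hypothetical file name

  File.open(path, "w") { |f| f.puts "first" }    # "w": create or truncate, then write
  File.open(path, "a") { |f| f.puts "second" }   # "a": write at the end of the file
  File.open(path, "r+") { |f| f.write "FIRST" }  # "r+": start at byte 0, overwrite in place
  File.open(path, "r") { |f| f.read }            # "r": read only (the default mode)
end

puts contents   # FIRST and second, on separate lines
```

Opening a missing file with "r" or "r+" raises Errno::ENOENT, whereas "w" and "a" create it.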
File.open (but not File.new) may be followed by a block. If a block is provided, then File.open doesn’t return the File object but instead passes it to the block, and automatically closes it when the block exits. The return value of the block becomes the return value of File.open:

    File.open("log.txt", "a") do |log|      # Open for appending
      log.puts("INFO: Logging a message")   # Output to the file
    end                                     # Automatically closed

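To see both halves of that behavior, the sketch below opens a hypothetical log.txt in a temporary directory. It keeps a reference to the File object passed to the block only so it can check afterward that the stream was closed; the block's last expression (the byte count returned by write) becomes the value of File.open:

```ruby
require "tmpdir"

written = nil    # value returned by File.open
log_file = nil   # the File object the block received

Dir.mktmpdir do |dir|
  path = File.join(dir, "log.txt")          # hypothetical file name
  written = File.open(path, "a") do |log|   # "a" creates the file if needed
    log_file = log                          # kept only for the check below
    log.write("INFO: two\n")                # returns 10, the number of bytes written
  end
end

p written            # 10
p log_file.closed?   # true: File.open closed it when the block exited
```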
The Kernel method open works like File.open but is more flexible. If the filename begins with |, it is treated as an operating system command, and the returned stream is used for reading from and writing to that command process. This is platform-dependent, of course:

    # How long has the server been up?
    uptime = open("|uptime") {|f| f.gets }

If the open-uri library has been loaded, then open can also be used to read from http and ftp URLs as if they were files:

    require "open-uri"                          # Required library
    f = open("http://www.davidflanagan.com/")   # Webpage as a file
    webpage = f.read                            # Read it as one big string
    f.close                                     # Don't forget to close!

In Ruby 1.9, if the argument to open has a method named to_open, then that method is called and should return an opened IO object.
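A sketch of the to_open protocol, using a hypothetical Greeting class whose to_open method returns a StringIO. It assumes a Ruby version (1.9 or later) in which Kernel's open passes any remaining arguments to to_open and, when given a block, yields the returned stream and closes it afterward:

```ruby
require "stringio"

# Hypothetical class: any object with a to_open method can be handed to open.
class Greeting
  def initialize(name)
    @name = name
  end

  # open calls this and uses the stream it returns.
  def to_open(*_args)
    StringIO.new("Hello, #{@name}!\n")
  end
end

line = open(Greeting.new("Ruby")) { |f| f.gets }   # the stream is closed after the block
puts line   # Hello, Ruby!
```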
Another way to obtain an IO object is to use the stringio library to read from or write to a string:

    require "stringio"
    input = StringIO.open("now is the time")   # Read from this string
    buffer = ""
    output = StringIO.open(buffer, "w")        # Write into buffer

The StringIO class is not a subclass of IO, but it defines many of the same methods as IO does, and duck typing usually allows us to use a StringIO object in place of an IO object.
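Duck typing in practice: the hypothetical count_nonblank_lines helper below relies only on each_line, so it should accept a File, STDIN, or a StringIO alike. The sketch also checks the class relationships just described:

```ruby
require "stringio"

# Hypothetical helper: works with any stream-like object that has each_line.
def count_nonblank_lines(stream)
  stream.each_line.count { |line| !line.strip.empty? }
end

input  = StringIO.new("now is\n\nthe time\n")   # read from a string
output = StringIO.new                          # write into a new, empty string

output.puts count_nonblank_lines(input)
p output.string                     # "2\n"
p File.ancestors.include?(IO)       # true: File is a subclass of IO
p StringIO.ancestors.include?(IO)   # false: StringIO only imitates IO
```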
Ruby predefines a number of streams that can be used without being created or opened. The global constants STDIN, STDOUT, and STDERR are the standard input stream, the standard output stream, and the standard error stream, respectively. By default, these streams are connected to the user’s console or a terminal window of some sort. Depending on how your Ruby script is invoked, they may instead use a file, or even another process, as a source of input or a destination for output. Any Ruby program can read from standard input and write to standard output (for normal program output) or standard error (for error messages that should be seen even if the standard output is redirected to a file). The global variables $stdin, $stdout, and $stderr are initially set to the same values as the stream constants. Global functions like print and puts write to $stdout by default. If a script alters the value of this global variable, it will change the behavior of those methods. The true “standard output” will still be available through STDOUT, however.
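Because print and puts write to whatever $stdout refers to, a script can temporarily point $stdout at a StringIO to capture output. This sketch uses a hypothetical capture_output helper; its ensure clause restores the previous stream even if the block raises, and STDOUT itself is never reassigned:

```ruby
require "stringio"

# Hypothetical helper: run the block with $stdout redirected into a string.
def capture_output
  original = $stdout
  $stdout = StringIO.new
  yield
  $stdout.string        # the method's return value
ensure
  $stdout = original    # always restore the previous stream
end

captured = capture_output do
  puts "hello"          # goes into the StringIO, not the console
  print "no newline"
end

STDOUT.puts captured.inspect   # "hello\nno newline"
```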
Another predefined stream is ARGF, or $<. This stream has special behavior intended to make it simple to write scripts that read the files specified on the command line or from standard input. If there are command-line arguments to the Ruby script (in the ARGV or $* array), then the ARGF stream acts as if those files had been concatenated together and the single resulting file opened for reading. In order for this to work properly, a Ruby script that accepts command-line options other than filenames must first process those options and remove them from the ARGV array. If the ARGV array is empty, then ARGF is the same as STDIN. (See Input Functions for further details about the ARGF stream.)
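A sketch of ARGF's concatenation behavior. Rather than passing real command-line arguments, it replaces the contents of ARGV by hand with two temporary files (hypothetical names a.txt and b.txt), which should be enough for ARGF to treat them as one stream, provided ARGF has not been read earlier in the script:

```ruby
require "tmpdir"

text = Dir.mktmpdir do |dir|
  first  = File.join(dir, "a.txt")    # hypothetical file names
  second = File.join(dir, "b.txt")
  File.write(first, "alpha\n")
  File.write(second, "beta\n")

  ARGV.replace([first, second])       # as if given on the command line
  ARGF.read                           # both files, read as one stream
end

p text   # "alpha\nbeta\n"
```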
Finally, the DATA stream is designed for reading text that appears after the end of your Ruby script. This works only if your script includes the token __END__ on a line by itself. That token marks the end of the program text. Any lines after the token may be read with the DATA stream.
One of the most significant changes in Ruby 1.9 is support for multibyte character encodings. We saw in Text that there were many changes to the String class. There are similar changes to the IO class.
In Ruby 1.9, every stream can have two encodings associated with it. These are known as the external and internal encodings, and are returned by the external_encoding and internal_encoding methods of an IO object. The external encoding is the encoding of the text as stored in the file. If you do not explicitly specify an external encoding for a stream, the default external encoding (see Source, External, and Internal Encodings) of the process is used. You can specify the default external encoding with the -E option (see Encoding Options). If you don’t specify the default external encoding, an appropriate default encoding is derived from your locale.
The internal encoding of a stream is the desired encoding for text that is read from the stream. If you do not explicitly specify an internal encoding, the default internal encoding (Source, External, and Internal Encodings) will be used. If you did not explicitly specify a default internal encoding with the -E or -U option (Encoding Options), then the default internal encoding is unset. If a stream has an internal encoding, then all strings read from it are automatically transcoded, if necessary, to that encoding. If a stream does not have an internal encoding, then no transcoding is done: strings read from the stream are simply tagged with the external encoding (as by the String.force_encoding method).
Specify the encoding of any IO object (including pipes and network sockets) with the set_encoding method. With two arguments, it specifies an external encoding and an internal encoding. You can also specify two encodings with a single string argument, which consists of two encoding names separated by a colon. Normally, however, a single argument specifies just an external encoding. The arguments can be strings or Encoding objects. The external encoding is always specified first, followed, optionally, by an internal encoding. For example:

    f.set_encoding("iso-8859-1", "utf-8")   # Latin-1, transcoded to UTF-8
    f.set_encoding("iso-8859-1:utf-8")      # Same as above
    f.set_encoding(Encoding::UTF_8)         # UTF-8 text

set_encoding works for any kind of IO object. For files, however, it is often easiest to specify the encoding when you open the file. You can do this by appending the encoding names to the file mode string. For example:

    input = File.open("data.txt", "r:utf-8")              # Read UTF-8 text
    out = File.open("log", "a:utf-8")                     # Write UTF-8 text
    latin = File.open("data.txt", "r:iso-8859-1:utf-8")   # Latin-1 transcoded to UTF-8

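To make the difference between tagging and transcoding concrete, this sketch writes the Latin-1 bytes for "café" to a temporary file (hypothetical name latin1.txt) and reads them back twice: once with only an external encoding, and once with an external and an internal encoding. It assumes no default internal encoding has been set (for example with -U) and a UTF-8 source file:

```ruby
require "tmpdir"

latin1 = "caf\xE9".b   # "café" as ISO-8859-1 bytes: 0xE9 is "é" in Latin-1

tagged, transcoded = Dir.mktmpdir { |dir|
  path = File.join(dir, "latin1.txt")                        # hypothetical file name
  File.binwrite(path, latin1)

  a = File.open(path, "r:iso-8859-1") { |f| f.read }         # external encoding only
  b = File.open(path, "r:iso-8859-1:utf-8") { |f| f.read }   # external:internal
  [a, b]
}

p tagged.encoding.name       # "ISO-8859-1": bytes tagged, not converted
p tagged.bytesize            # 4
p transcoded.encoding.name   # "UTF-8": transcoded while reading
p transcoded.bytesize        # 5: "é" takes two bytes in UTF-8
```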
Note that it is not usually necessary to specify two encodings for a stream that is to be used for output. In that case, the internal encoding is specified by the String objects that are written to the stream.
The default external encoding is normally derived from the user’s locale settings and is often a multibyte encoding. To read binary data from a file, therefore, you must explicitly specify that you want unencoded bytes, or you’ll get characters in the default external encoding. To do this, open a file with mode "r:binary", or pass Encoding::BINARY to set_encoding after opening the file:

    File.open("data", "r:binary")   # Open a file for reading binary data

On Windows, you should open binary files with mode "rb:binary" or call binmode on the stream. This disables the automatic newline conversion performed by Windows, and is only necessary on that platform.
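A quick check of both binary options, assuming a temporary file (hypothetical name data.bin) holding a few arbitrary bytes. The "rb:binary" mode string and a set_encoding call after opening should both yield strings in the binary encoding (Encoding::BINARY, an alias for ASCII-8BIT), so every byte reads back unchanged:

```ruby
require "tmpdir"

raw = "\xFF\x00\xE9\n".b   # arbitrary bytes, not valid UTF-8

via_mode, via_set = Dir.mktmpdir { |dir|
  path = File.join(dir, "data.bin")                 # hypothetical file name
  File.binwrite(path, raw)

  a = File.open(path, "rb:binary") { |f| f.read }   # encoding named in the mode string
  b = File.open(path) do |f|
    f.set_encoding(Encoding::BINARY)                # or set it after opening
    f.read
  end
  [a, b]
}

p via_mode.encoding == Encoding::BINARY   # true
p via_mode.bytes                          # [255, 0, 233, 10]
p via_set == via_mode                     # true
```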
Not every stream-reading method honors the encoding of a stream. Some lower-level reading methods take an argument that specifies the number of bytes to read. By their nature, these methods return unencoded strings of bytes rather than strings of text. The methods that do not specify a length to read do honor the encoding.
The IO class defines a number of methods for reading from streams. They work only if the stream is readable, of course. You can read from STDIN, ARGF, and DATA, but not from STDOUT or STDERR. Files and StringIO objects are opened for reading by default, unless you explicitly open them for writing only.
IO defines a number of ways to read lines from a stream:

    lines = ARGF.readlines           # Read all input, return an array of lines
    line = DATA.readline             # Read one line from stream
    while line = DATA.gets           # Read until gets returns nil, at EOF
      print line
    end
    DATA.each {|line| print line }   # Iterate lines from stream until EOF
    DATA.each_line                   # An alias for each
    DATA.lines                       # An enumerator for each_line: Ruby 1.9

Here are some important notes on these line-reading methods.
First, the readline and gets methods differ only in their handling of EOF (end-of-file: the condition that occurs when there is no more to read from a stream). gets returns nil if it is invoked on a stream at EOF; readline instead raises an EOFError. If you do not know how many lines to expect, use gets. If you expect another line (and it is an error if it is not there), then use readline. You can check whether a stream is already at EOF with the eof? method.
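The EOF behavior can be seen with a one-line StringIO standing in for a file, since StringIO implements the same gets, readline, and eof? methods:

```ruby
require "stringio"

stream = StringIO.new("only line\n")

first  = stream.gets   # "only line\n"
at_end = stream.eof?   # true: nothing is left to read
second = stream.gets   # nil: gets reports EOF by returning nil

outcome =
  begin
    stream.readline    # readline reports EOF by raising
    :no_error
  rescue EOFError
    :eof_error
  end

p [first, at_end, second, outcome]   # ["only line\n", true, nil, :eof_error]
```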
Second, gets and readline implicitly set the global variable $_ to the line of text they return. A number of global methods, such as print, use $_ if they are not explicitly passed an argument. Therefore, the while loop in the code above could be written more succinctly as:

    print while DATA.gets

Relying on $_ is useful for short scripts, but in longer programs, it is better style to explicitly use variables to store the lines of input you’ve read.
Third, these methods are typically used for text (instead of binary) streams, and a “line” is defined as a sequence of bytes up to and including the default line terminator (newline on most platforms). The lines returned by these methods include the line terminator (although the last line in a file may not have one). Use String.chomp! to strip it off.
The special global variable $/ holds the line terminator. You can set $/ to alter the default behavior of all the line-reading methods, or you can simply pass an alternate separator to any of the methods (including the each iterator). You might do this when reading comma-separated fields from a file, for example, or when reading a binary file that has some kind of “record separator” character. There are two special cases for the line terminator. If you specify nil, then the line-reading methods keep reading until EOF and return the entire contents of the stream as a single line. If you specify the empty string "" as the line terminator, then the line-reading methods read a paragraph at a time, looking for a blank line as the separator.
In Ruby 1.9, gets and readline accept an optional integer as the first argument, or as the second argument after a separator string. If specified, this integer is the maximum number of bytes to read from the stream. This limit argument exists to prevent accidental reads of unexpectedly long lines, and these methods are exceptions to the rule stated earlier: they return encoded character strings even though their limit argument is measured in bytes.
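The separator special cases and the limit argument can be tried out on small inputs. This sketch uses StringIO for the simple cases and a temporary file (hypothetical name paras.txt) for paragraph mode:

```ruby
require "stringio"
require "tmpdir"

# A custom separator: each "line" ends at, and includes, the separator.
fields = StringIO.new("red,green,blue").each_line(",").to_a

# nil as the separator: the whole stream is returned as a single "line".
whole = StringIO.new("a\nb\n").gets(nil)

# A limit: at most 5 bytes, even though no separator has been seen yet.
head = StringIO.new("hello world\n").gets(5)

# A separator and a limit together: stop at whichever comes first.
short = StringIO.new("ab,cdefgh").gets(",", 4)

# "" as the separator: paragraph mode, shown here with a real File.
paragraphs = Dir.mktmpdir { |dir|
  path = File.join(dir, "paras.txt")                   # hypothetical file name
  File.write(path, "para one\nstill one\n\n\npara two\n")
  File.open(path) { |f| f.each_line("").to_a }
}

p fields            # ["red,", "green,", "blue"]
p whole             # "a\nb\n"
p head              # "hello"
p short             # "ab,"
p paragraphs.size   # 2
```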
Finally, the line-reading methods gets, readline, and the each iterator (and its each_line alias) keep track of the number of lines they’ve read. You can query the line number of the most recently read line with the lineno method, and you can set that line number with the lineno= accessor.
Note that lineno does not actually count the number of newlines in a file. It counts the number of times line-reading methods have been called, and may return different results if you use different line separator characters:

    DATA.lineno = 0   # Start from line 0, even though data is at end of file
    DATA.readline     # Read one line of data
    DATA.lineno       # => 1
    $.                # => 1: magic global variable, implicitly set

IO defines three class methods for reading files without your having to open an IO stream yourself. IO.read reads an entire file (or a portion of a file) and returns it as a single string. IO.readlines reads an entire named file into an array of lines. And IO.foreach iterates over the lines of a named file. In Ruby 1.9, you can pass a hash to these methods to specify the mode string and/or encoding of the file being read:

    data = IO.read("data")                      # Read and return the entire file
    data = IO.read("data", mode:"rb")           # Open with mode string "rb"
    data = IO.read("data", encoding:"binary")   # Read unencoded bytes
    data = IO.read("data", 4, 2)                # Read 4 bytes starting at byte 2
    data = IO.read("data", nil, 6)              # Read from byte 6 to end-of-file

    # Read lines into an array
    words = IO.readlines("/usr/share/dict/words")

    # Read lines one at a time and initialize a hash (keys without the newline)
    words = {}
    IO.foreach("/usr/share/dict/words") {|w| words[w.chomp] = true }

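Running these calls needs a real file, so this sketch creates one in a temporary directory (hypothetical name data) instead of reading /usr/share/dict/words:

```ruby
require "tmpdir"

results = Dir.mktmpdir { |dir|
  path = File.join(dir, "data")                # hypothetical file name
  File.write(path, "0123456789\nline two\n")   # the first line is 11 bytes long

  words = {}
  IO.foreach(path) { |line| words[line.chomp] = true }   # one line at a time

  summary = {
    whole: IO.read(path),            # the entire file as one string
    slice: IO.read(path, 4, 2),      # 4 bytes starting at byte offset 2
    tail:  IO.read(path, nil, 11),   # from byte 11 to end-of-file
    lines: IO.readlines(path),       # an array of lines, terminators included
    words: words
  }
  summary
}

p results[:slice]   # "2345"
p results[:tail]    # "line two\n"
p results[:lines]   # ["0123456789\n", "line two\n"]
```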
In Ruby 1.9 you can use IO.copy_stream to read a file (or a portion of it) and write its content to a stream:

    IO.copy_stream("/usr/share/dict/words", STDOUT)            # Print the dictionary
    IO.copy_stream("/usr/share/dict/words", STDOUT, 10, 100)   # Print bytes 100-109

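IO.copy_stream accepts an object with a write method as its destination, so a StringIO can stand in for STDOUT. The sketch below (hypothetical file name words) copies 6 bytes starting at byte offset 6; as in the second call above, the length comes before the offset:

```ruby
require "stringio"
require "tmpdir"

copied, count = Dir.mktmpdir { |dir|
  path = File.join(dir, "words")            # hypothetical stand-in for a dictionary
  File.write(path, "apple\nbanana\ncherry\n")

  out = StringIO.new                        # any object with write works as the destination
  n = IO.copy_stream(path, out, 6, 6)       # copy_length = 6, src_offset = 6
  [out.string, n]
}

p copied   # "banana"
p count    # 6: copy_stream returns the number of bytes copied
```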
Although these class methods are defined by the IO class, they operate on named files, and it is also common to see them invoked as class methods of File: File.read, File.readlines, File.foreach, and File.copy_stream.
The IO class also defines an instance method named read, which is similar to the class method with the same name; with no arguments it reads text until the end of the stream and returns it as an encoded string:

    # An alternative to text = File.read("data.txt")
    f = File.open("data.txt")   # Open a file
    text = f.read               # Read its contents as text
    f.close                     # Close the file

The read instance method can also be used with arguments to read a specified number of bytes from the stream. That use is described in the next section.
The IO class also defines methods for reading a stream one or more bytes or characters at a time, but these methods have changed substantially between Ruby 1.8 and Ruby 1.9 because Ruby’s definition of a character has changed.
In Ruby 1.8, bytes and characters are the same thing, and the getc and readchar methods read a single byte and return it as a Fixnum. Like gets, getc returns nil at EOF. And like readline, readchar raises EOFError if it is called at EOF.
In Ruby 1.9, getc and readchar have been modified to return a string of length 1 instead of a Fixnum. When reading from a stream with a multibyte encoding, these methods read as many bytes as necessary to read a complete character. If you want to read a stream a byte at a time in Ruby 1.9, use the new methods getbyte and readbyte. getbyte is like getc and gets: it returns nil at EOF. And readbyte is like readchar and readline: it raises EOFError.
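The character/byte distinction shows up as soon as a stream contains a multibyte character. This sketch uses a UTF-8 StringIO as a stand-in for a file (StringIO follows the Ruby 1.9 behavior of getc and getbyte) and assumes the source file is UTF-8, the default since Ruby 2.0:

```ruby
require "stringio"

chars = []
s = StringIO.new("né")     # "é" is two bytes in UTF-8: 0xC3 0xA9
while (c = s.getc)         # one-character strings; nil at EOF
  chars << c
end

bytes = []
b = StringIO.new("né")
while (byte = b.getbyte)   # Integers from 0 to 255; nil at EOF
  bytes << byte
end

eof_error = begin
  b.readbyte               # like readchar and readline, raises at EOF
  false
rescue EOFError
  true
end

p chars.length   # 2
p bytes          # [110, 195, 169]
p eof_error      # true
```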
Programs (like parsers) that read a stream one character at a time sometimes need to push a single character back into the stream’s buffer, so that it will be returned by the next read call. They can do this with ungetc. This method expects a Fixnum in Ruby 1.8 and a single character string in Ruby 1.9. The character pushed back will be returned by the next call to getc or readchar:

    f = File.open("data", "r:binary")   # Open data file for binary reads
    c = f.getc                          # Read the first byte (a 1-character string in Ruby 1.9)
    f.ungetc(c)                         # Push that byte back
    c = f.readchar                      # Read it back again

You can also iterate and enumerate the characters and bytes of a stream:

    f.each_byte {|b| print b, " " }   # Iterate through remaining bytes
    f.bytes                           # An enumerator for each_byte: Ruby 1.9
    f.each_char {|c| print c }        # Iterate characters: Ruby 1.9
    f.chars                           # An enumerator for each_char: Ruby 1.9

If you want to read more than one byte at a time, you have a choice of five methods, each with slightly different behavior:
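readbytes(n)
Read exactly n bytes and return them as a string. Block, if necessary, until n bytes arrive. Raise EOFError if EOF occurs before n bytes are available. (In Ruby 1.8 this method is defined by the readbytes library, loaded with require "readbytes"; that library is not part of Ruby 1.9, where read(n) is the closest equivalent.)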
readpartial(n, buffer=nil)
Read between 1 and n bytes and return them as a new binary string, or, if a String object is passed as the second argument, store them in that string (overwriting whatever text it contains). If one or more bytes are available for reading, this method returns them (up to a maximum of n) immediately. It blocks only if no bytes are available. This method raises EOFError if called when the stream is at EOF.
read(n=nil, buffer=nil)
Read n bytes (or fewer, if EOF is reached), blocking if necessary until the bytes are ready. The bytes are returned as a binary string. If the second argument is an existing String object, then the bytes are stored in that object (replacing any existing content) and the string is returned. If the stream is at EOF and n is specified, it returns nil. If called at EOF and n is omitted or is nil, then it returns the empty string "".
If n is nil or is omitted, then this method reads the rest of the stream and returns it as an encoded character string rather than an unencoded byte string.
read_nonblock(n, buffer=nil)
Read the bytes (up to a maximum of n) that are currently available for reading, and return them as a string, using the buffer string if it is specified.
This method does not block. If there is no data ready to be read on the stream (this might occur with a networking socket or with STDIN, for example), this method raises a SystemCallError. If called at EOF, this method raises EOFError.
This method is new in Ruby 1.9. (Ruby 1.9 also defines other nonblocking IO methods, but they are low-level and are not covered here.)
sysread(n)
This method works like readpartial, returning up to n bytes and raising EOFError at EOF, but it operates at a lower level without buffering. Do not mix calls to sysread with any other line- or byte-reading methods; they are incompatible.
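The differences among these methods are easiest to see on a pipe, where data arrives only when something writes it. This sketch uses IO.pipe (one of the low-level factory methods mentioned earlier) and assumes a Unix-like platform; the exception that read_nonblock raises on an empty pipe is expected to be a subclass of SystemCallError, so that is what it rescues:

```ruby
reader, writer = IO.pipe

writer.write("abc")
writer.flush                        # make sure the bytes reach the pipe
partial = reader.readpartial(100)   # returns "abc" now instead of waiting for 100 bytes

nonblock = begin
  reader.read_nonblock(100)         # the pipe is empty at this point
rescue SystemCallError => e
  e.class                           # an Errno::EAGAIN subclass on Unix-like systems
end

writer.write("defgh")
writer.close                        # closing the write end lets the reader reach EOF
buffer = String.new
reader.read(3, buffer)              # fills the existing buffer with "def"
rest   = reader.read                # no length: everything that is left, "gh"
at_eof = reader.read(1)             # nil: a length was given and the stream is at EOF
empty  = reader.read                # "": no length given and the stream is at EOF
reader.close

p [partial, buffer, rest, at_eof, empty]   # ["abc", "def", "gh", nil, ""]
```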
Here is some example code you might use when reading a binary file:

    f = File.open("data.bin", "rb:binary")   # No newline conversion, no encoding
    magic = f.read(4)                        # First four bytes identify filetype
    exit unless magic == "INTS"              # Magic number spells "INTS" (ASCII)
    bytes = f.read                           # Read the rest of the file
                                             # Encoding is binary, so this is a byte string
    data = bytes.unpack("i*")                # Convert bytes to an array of integers
    f.close

The IO methods for writing to a stream mirror those for reading. The STDOUT and STDERR streams are writable, as are files opened in any mode other than "r" or "rb".
IO defines a single putc method for writing single bytes or characters to a stream. This method accepts either a byte value or a single-character string as its argument, and therefore has not changed between Ruby 1.8 and 1.9:

    o = STDOUT     # Single-character output
    o.putc(65)     # Write single byte 65 (capital A)
    o.putc("B")    # Write single byte 66 (capital B)
    o.putc("CD")   # Write just the first byte of the string

The IO class defines a number of other methods for writing arbitrary strings. These methods differ from each other in the number of arguments they accept and whether or not line terminators are added. Recall that in Ruby 1.9, textual output is transcoded to the external encoding of the stream, if one was specified:

    o = STDOUT                    # String output
    x, y, s, t = 1, 2, "s", "t"   # Sample values for the calls below
    fmt, args = "%d-%d\n", [x, y]
    o << x                        # Output x.to_s
    o << x << y                   # May be chained: output x.to_s + y.to_s
    o.print                       # Output $_ + $\
    o.print s                     # Output s.to_s + $\
    o.print s,t                   # Output s.to_s + t.to_s + $\
    o.printf fmt,*args            # Outputs fmt%args
    o.puts                        # Output newline
    o.puts x                      # Output x.to_s.chomp plus newline
    o.puts x,y                    # Output x.to_s.chomp, newline, y.to_s.chomp, newline
    o.puts [x,y]                  # Same as above
    o.write s                     # Output s.to_s, returns number of bytes written
    o.syswrite s                  # Low-level version of write

Output streams are appendable, like strings and arrays are, and you can write values to them with the << operator. puts is one of the most common output methods. It converts each of its arguments to a string and writes each one to the stream. If the string does not already end with a newline character, it adds one. If any of the arguments to puts is an array, the array is recursively expanded, and each element is printed on its own line as if it were passed directly as an argument to puts. The print method converts its arguments to strings and outputs them to the stream. If the global field separator $, has been changed from its default value of nil, then that value is output between each of the arguments to print. If the output record separator $\ has been changed from its default value of nil, then that value is output after all arguments are printed.
The printf method expects a format string as its first argument, and interpolates the values of any additional arguments into that format string using the String % operator. It then outputs the interpolated string with no newline or record separator.
write simply outputs its single argument as << does, and returns the number of bytes written. Finally, syswrite is a low-level, unbuffered, nontranscoding version of write. If you use syswrite, you must use that method exclusively, and not mix it with any other writing methods.
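To compare the writing methods side by side, this sketch sends them all to one StringIO and inspects the result. It assumes $, and $\ still have their default nil values, and starts from an unfrozen UTF-8 string so the non-ASCII character appends cleanly:

```ruby
require "stringio"

out = StringIO.new(+"")           # +"" gives an unfrozen UTF-8 string

out << "a" << 1 << "\n"           # << calls to_s and can be chained
out.puts ["b", ["c"]], "d\n"      # nested arrays are flattened; "d\n" gets no extra newline
out.print "e", "f"                # no separator and no newline
out.printf("%05.1f\n", 3.14159)   # String#% formatting: "003.1"
count = out.write("é")            # returns bytes written, not characters

p out.string == "a1\nb\nc\nd\nef003.1\né"   # true
p count                                     # 2
```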
Some streams, such as those that represent network sockets or user input at the console, are sequential streams: once you have read from or written to them, you cannot go back. Other streams, such as those that read from or write to files or strings, allow random access with the methods described here. If you attempt to use these methods on a stream that does not allow random access, they will raise a SystemCallError (such as Errno::ESPIPE):

    f = File.open("test.txt")
    f.pos                       # => 0: return the current position in bytes
    f.pos = 10                  # skip to position 10
    f.tell                      # => 10: a synonym for pos
    f.rewind                    # go back to position 0, reset lineno to 0, also
    f.seek(10, IO::SEEK_SET)    # Skip to absolute position 10
    f.seek(10, IO::SEEK_CUR)    # Skip 10 bytes from current position
    f.seek(-10, IO::SEEK_END)   # Skip to 10 bytes from end
    f.seek(0, IO::SEEK_END)     # Skip to very end of file
    f.eof?                      # => true: we're at the end

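A sketch of random access on a small temporary file (hypothetical name test.txt holding 20 bytes), followed by an attempt to query the position of a pipe, which is a sequential stream. It assumes a Unix-like platform, where the pipe is expected to raise a SystemCallError subclass such as Errno::ESPIPE:

```ruby
require "tmpdir"

positions = Dir.mktmpdir { |dir|
  path = File.join(dir, "test.txt")          # hypothetical file name
  File.write(path, "0123456789ABCDEFGHIJ")   # 20 bytes

  File.open(path) do |f|
    f.pos = 10                 # jump to absolute position 10
    a = f.read(2)              # "AB"
    f.seek(-3, IO::SEEK_END)   # 3 bytes before the end
    b = f.read                 # "HIJ"
    f.rewind                   # back to position 0
    c = f.read(1)              # "0"
    [a, b, c, f.pos]           # f.pos is now 1
  end
}

reader, writer = IO.pipe       # a sequential stream
seek_error = begin
  reader.pos                   # asking a pipe for its position fails
  nil
rescue SystemCallError => e
  e.class                      # for example, Errno::ESPIPE
end
reader.close
writer.close

p positions   # ["AB", "HIJ", "0", 1]
```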
If you use sysread or syswrite in your program, then use sysseek instead of seek for random access. sysseek is like seek except that it returns the new file position after each call:

    pos = f.sysseek(0, IO::SEEK_CUR)   # Get current position
    f.sysseek(0, IO::SEEK_SET)         # Rewind stream
    f.sysseek(pos, IO::SEEK_SET)       # Return to original position

When you are done reading from or writing to a stream, you must close it with the close method. This flushes any buffered output and also frees up operating system resources. A number of stream-opening methods allow you to associate a block with them. They pass the open stream to the block, and automatically close the stream when the block exits. Managing streams in this way ensures that they are properly closed even when exceptions are raised:

    File.open("test.txt") do |f|
      # Use stream f here
      # Value of this block becomes return value of the open method
    end   # f is automatically closed for us here

The alternative to using a block is to use an ensure clause of your own:

    begin
      f = File.open("test.txt")
      # use stream f here
    ensure
      f.close if f
    end

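Both styles close the stream when an exception escapes. In this sketch a hypothetical ArgumentError stands in for any failure partway through processing, and the File objects are kept in variables only so their closed? state can be checked afterward:

```ruby
require "tmpdir"

from_block = nil
from_ensure = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, "test.txt")   # hypothetical file name
  File.write(path, "data")

  begin
    File.open(path) do |f|            # block form
      from_block = f
      raise ArgumentError, "simulated failure"
    end
  rescue ArgumentError
    # File.open has already closed the stream by the time we get here
  end

  begin
    f = File.open(path)               # explicit ensure form
    from_ensure = f
    raise ArgumentError, "simulated failure"
  rescue ArgumentError
    # handle the error
  ensure
    f.close if f
  end
end

p from_block.closed?    # true
p from_ensure.closed?   # true
```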
Network sockets are implemented using IO objects that have separate read and write streams internally. You can use close_read and close_write to close these internal streams individually. Although files can be opened for reading and writing at the same time, you cannot use close_read and close_write on those IO objects.
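A sketch of half-closing a socket, assuming a Unix-like platform where UNIXSocket.pair (from the socket library) returns two connected stream sockets. After close_write, the peer's read sees EOF, yet the half-closed socket can still read the reply:

```ruby
require "socket"

left, right = UNIXSocket.pair   # two connected sockets

left.write("ping")
left.close_write                # half-close: right will see EOF after "ping"
message = right.read            # reads until EOF, so this returns "ping"

right.write("pong")
right.close
reply = left.read               # left's read side is still open
left.close

p [message, reply]   # ["ping", "pong"]
```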
Ruby’s output methods (except syswrite) buffer output for efficiency. The output buffer is flushed at reasonable times, such as when a newline is output or when data is read from a corresponding input stream. There are times, however, when you may need to explicitly flush the output buffer to force output to be sent right away:

    out = STDOUT
    out.print 'wait>'   # Display a prompt
    out.flush           # Manually flush output buffer to OS
    sleep(1)            # Prompt appears before we go to sleep
    out.sync = true     # Automatically flush buffer after every write
    out.sync = false    # Don't automatically flush
    out.sync            # Return current sync mode
    out.fsync           # Flush output buffer and ask OS to flush its buffers
                        # Returns nil if unsupported on current platform

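Whether unflushed output is visible yet depends on buffering, so this sketch checks only what flush and sync should guarantee: after either one, another reader of the same file (here a File.read call on a hypothetical out.txt) sees the data that was written:

```ruby
require "tmpdir"

after_flush = nil
after_sync = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, "out.txt")   # hypothetical file name
  File.open(path, "w") do |out|
    out.print "wait>"
    out.flush                        # hand Ruby's buffer to the operating system
    after_flush = File.read(path)

    out.sync = true                  # from now on, every write is flushed
    out.print " ok"
    after_sync = File.read(path)
  end
end

p after_flush   # "wait>"
p after_sync    # "wait> ok"
```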
IO defines several predicates for testing the state of a stream:

    f.eof?      # true if stream is at EOF
    f.closed?   # true if stream has been closed
    f.tty?      # true if stream is interactive

The only one of these methods that needs explanation is tty?. This method, and its alias isatty (with no question mark), returns true if the stream is connected to an interactive device such as a terminal window or a keyboard with (presumably) a human at it. They return false if the stream is a noninteractive one, such as a file, pipe, or socket. A program can use tty? to avoid prompting a user for input if STDIN has actually been redirected and is coming from a file, for example.
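The prompting pattern can be sketched with a hypothetical ask helper that takes its input and output streams as parameters, so it can be exercised with StringIO (whose tty? always returns false) as well as with the real $stdin and $stdout:

```ruby
require "stringio"

# Hypothetical helper: prompt only when input appears to come from a person.
def ask(question, input: $stdin, output: $stdout)
  output.print(question) if input.tty?
  line = input.gets
  line && line.chomp
end

answers = StringIO.new("yes\n")   # simulates redirected, noninteractive input
prompts = StringIO.new
reply = ask("Continue? ", input: answers, output: prompts)

p reply            # "yes"
p prompts.string   # "": no prompt, because StringIO#tty? is false
```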