Input/Output

An IO object is a stream: a readable source of bytes or characters or a writable sink for bytes or characters. The File class is a subclass of IO. IO objects also represent the “standard input” and “standard output” streams used to read from and write to the console. The stringio module in the standard library allows us to create a stream wrapper around a string object. Finally, the socket objects used in networking (described later in this chapter) are also IO objects.

Opening Streams

Before we can perform input or output, we must have an IO object to read from or write to. The IO class defines factory methods new, open, popen, and pipe, but these are low-level methods with operating system dependencies, and they are not documented here. The subsections that follow describe more common ways to obtain IO objects. (The Networking section also includes examples that create IO objects that communicate across the network.)

Opening files

One of the most common kinds of IO is the reading and writing of files. The File class defines some utility methods (described below) that read the entire contents of a file with one call. Often, however, you will instead open a file to obtain a File object and then use IO methods to read from or write to the file.

Use File.open (or File.new) to open a file. The first argument is the name of the file. This is usually specified as a string, but in Ruby 1.9, you can use any object with a to_path method. Filenames are interpreted relative to the current working directory unless they are specified with an absolute path. Use forward slash characters to separate directories—Ruby automatically converts them into backslashes on Windows. The second argument to File.open is a short string that specifies how the file should be opened:

f = File.open("data.txt", "r")   # Open file data.txt for reading
out = File.open("out.txt", "w")  # Open file out.txt for writing

The second argument to File.open is a string that specifies the “file mode.” It must begin with one of the values in the following table. Add "b" to the mode string to prevent automatic line terminator conversion on Windows platforms. For text files, you may add the name of a character encoding to the mode string. For binary files, you should add ":binary" to the string. This is explained in Streams and Encodings.

Mode    Description
"r"     Open for reading. The default mode.
"r+"    Open for reading and writing. Start at beginning of file. Fail if file does not exist.
"w"     Open for writing. Create a new file or truncate an existing one.
"w+"    Like "w", but allows reading of the file as well.
"a"     Open for writing, but append to the end of the file if it already exists.
"a+"    Like "a", but allows reads also.

File.open (but not File.new) may be followed by a block. If a block is provided, then File.open doesn’t return the File object but instead passes it to the block, and automatically closes it when the block exits. The return value of the block becomes the return value of File.open:

File.open("log.txt", "a") do |log|      # Open for appending
  log.puts("INFO: Logging a message")   # Output to the file
end                                     # Automatically closed
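
The modes and the block form can be tried together in a small sketch. The helper name write_append_read and the scratch filename are assumptions made for illustration; the file lives in a temporary directory so nothing permanent is created:

```ruby
require "tmpdir"

# Hypothetical helper: writes, appends, then reads back a scratch file.
# Returns the array of lines read.
def write_append_read(path)
  File.open(path, "w") { |f| f.puts "first" }    # "w": create or truncate
  File.open(path, "a") { |f| f.puts "second" }   # "a": append to the end
  # The value of the block becomes the return value of File.open.
  File.open(path, "r") { |f| f.readlines }
end

Dir.mktmpdir do |dir|
  p write_append_read(File.join(dir, "log.txt"))  # => ["first\n", "second\n"]
end
```

Each File.open call here closes its file when the block exits, so no explicit close is needed.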

Kernel.open

The Kernel method open works like File.open but is more flexible. If the filename begins with |, it is treated as an operating system command, and the returned stream is used for reading from and writing to that command process. This is platform-dependent, of course:

# How long has the server been up?
uptime = open("|uptime") {|f| f.gets }

If the open-uri library has been loaded, then open can also be used to read from http and ftp URLs as if they were files:

require "open-uri"                         # Required library
f = open("http://www.davidflanagan.com/")  # Webpage as a file
webpage = f.read                           # Read it as one big string
f.close                                    # Don't forget to close!

In Ruby 1.9, if the argument to open has a method named to_open, then that method is called and should return an opened IO object.
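
As a rough sketch of the to_open hook, the class below (Greeting, a name invented for this example) returns a StringIO from to_open, and open passes that object to the block:

```ruby
require "stringio"

# Hypothetical class: any object that defines to_open can be passed to open.
class Greeting
  # Extra arguments given to open (mode and so on) are forwarded here.
  def to_open(*_args)
    StringIO.new("hello from to_open\n")
  end
end

# Hypothetical helper: opens the object and reads everything it provides.
def read_greeting
  open(Greeting.new) { |io| io.read }
end

puts read_greeting   # prints "hello from to_open"
```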

StringIO

Another way to obtain an IO object is to use the stringio library to read from or write to a string:

require "stringio"
input = StringIO.open("now is the time")  # Read from this string
buffer = ""
output = StringIO.open(buffer, "w")       # Write into buffer

The StringIO class is not a subclass of IO, but it defines many of the same methods as IO does, and duck typing usually allows us to use a StringIO object in place of an IO object.
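
A minimal sketch of that duck typing: the method count_words (a name chosen for this example) only calls each_line, so it accepts a File, a StringIO, or any similar object:

```ruby
require "stringio"

# Hypothetical helper: counts whitespace-separated words in any object
# that responds to each_line.
def count_words(stream)
  stream.each_line.sum { |line| line.split.size }
end

input = StringIO.new("now is the time\nfor all good men\n")
puts count_words(input)              # => 8

buffer = String.new                  # a mutable, empty string
output = StringIO.new(buffer, "w")   # writes go into buffer
output.puts "written to a string"
p buffer                             # => "written to a string\n"
```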

Predefined streams

Ruby predefines a number of streams that can be used without being created or opened. The global constants STDIN, STDOUT, and STDERR are the standard input stream, the standard output stream, and the standard error stream, respectively. By default, these streams are connected to the user’s console or a terminal window of some sort. Depending on how your Ruby script is invoked, they may instead use a file, or even another process, as a source of input or a destination for output. Any Ruby program can read from standard input and write to standard output (for normal program output) or standard error (for error messages that should be seen even if the standard output is redirected to a file). The global variables $stdin, $stdout, and $stderr are initially set to the same values as the stream constants. Global functions like print and puts write to $stdout by default. If a script alters the value of this global variable, it will change the behavior of those methods. The true “standard output” will still be available through STDOUT, however.
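
The difference between $stdout and STDOUT can be sketched with a small capture helper. The name capture_stdout is an assumption for this example, not a standard library method:

```ruby
require "stringio"

# Hypothetical helper: runs the block with $stdout pointed at a StringIO
# and returns whatever the block printed.
def capture_stdout
  original = $stdout
  $stdout = StringIO.new
  yield
  $stdout.string
ensure
  $stdout = original   # always restore the previous stream
end

captured = capture_stdout { puts "hello" }   # puts writes to $stdout
STDOUT.puts "captured: #{captured.inspect}"  # the constant is unaffected
```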

Another predefined stream is ARGF, or $<. This stream has special behavior intended to make it simple to write scripts that read the files specified on the command line or from standard input. If there are command-line arguments to the Ruby script (in the ARGV or $* array), then the ARGF stream acts as if those files had been concatenated together and the single resulting file opened for reading. In order for this to work properly, a Ruby script that accepts command-line options other than filenames must first process those options and remove them from the ARGV array. If the ARGV array is empty, then ARGF is the same as STDIN. (See Input Functions for further details about the ARGF stream.)
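
One way this might look in a script is sketched below. The helper extract_options! and the "-v" flag are invented for the example, and ARGV is filled in by hand here only to simulate a command line:

```ruby
require "tmpdir"

# Hypothetical helper: removes arguments that look like options (leading "-")
# from argv and returns them, leaving only filenames behind.
def extract_options!(argv)
  options, files = argv.partition { |arg| arg.start_with?("-") }
  argv.replace(files)
  options
end

Dir.mktmpdir do |dir|
  a = File.join(dir, "a.txt")
  b = File.join(dir, "b.txt")
  File.write(a, "one\n")
  File.write(b, "two\n")

  ARGV.replace(["-v", a, b])          # simulate: script.rb -v a.txt b.txt
  options = extract_options!(ARGV)    # ARGV now holds only the filenames
  puts "options: #{options.inspect}"
  print ARGF.read                     # the two files, read as one stream
end
```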

Finally, the DATA stream is designed for reading text that appears after the end of your Ruby script. This works only if your script includes the token __END__ on a line by itself. That token marks the end of the program text. Any lines after the token may be read with the DATA stream.

Streams and Encodings

One of the most significant changes in Ruby 1.9 is support for multibyte character encodings. We saw in Text that there were many changes to the String class. There are similar changes to the IO class.

In Ruby 1.9, every stream can have two encodings associated with it. These are known as the external and internal encodings, and are returned by the external_encoding and internal_encoding methods of an IO object. The external encoding is the encoding of the text as stored in the file. If you do not explicitly specify an external encoding for a stream, the default external encoding (see Source, External, and Internal Encodings) of the process is used. You can specify the default external encoding with the -E option (see Encoding Options). If you don’t specify the default external encoding, an appropriate default encoding is derived from your locale.

The internal encoding of a stream is the desired encoding for text that is read from the stream. If you do not explicitly specify an internal encoding, the default internal encoding (see Source, External, and Internal Encodings) will be used. If you did not explicitly specify a default internal encoding with the -E or -U options (see Encoding Options), then the default internal encoding is unset. If a stream has an internal encoding, then all strings read from it are automatically transcoded, if necessary, to that encoding. If a stream does not have an internal encoding, then no transcoding is done: strings read from the stream are simply tagged with the external encoding (as by the String.force_encoding method).

Specify the encoding of any IO object (including pipes and network sockets) with the set_encoding method. With two arguments, it specifies an external encoding and an internal encoding. You can also specify two encodings with a single string argument, which consists of two encoding names separated by a colon. Normally, however, a single argument specifies just an external encoding. The arguments can be strings or Encoding objects. The external encoding is always specified first, followed, optionally, by an internal encoding. For example:

f.set_encoding("iso-8859-1", "utf-8") # Latin-1, transcoded to UTF-8
f.set_encoding("iso-8859-1:utf-8")    # Same as above
f.set_encoding(Encoding::UTF_8)       # UTF-8 text

set_encoding works for any kind of IO object. For files, however, it is often easiest to specify encoding when you open the file. You can do this by appending the encoding names to the file mode string. For example:

input = File.open("data.txt", "r:utf-8")            # Read UTF-8 text
out = File.open("log", "a:utf-8")                   # Write UTF-8 text
input = File.open("data.txt", "r:iso-8859-1:utf-8") # Latin-1 transcoded to UTF-8

Note that it is not usually necessary to specify two encodings for a stream that is to be used for output. In that case, the internal encoding is specified by the String objects that are written to the stream.
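
The effect of a two-encoding mode string can be checked with a small sketch. The helper name and the scratch file are assumptions; the file is written as raw Latin-1 bytes and read back as UTF-8:

```ruby
require "tmpdir"

# Hypothetical helper: opens a Latin-1 file and asks Ruby to transcode it
# to UTF-8. Returns the two stream encodings and the text read.
def read_latin1_as_utf8(path)
  File.open(path, "r:iso-8859-1:utf-8") do |f|
    [f.external_encoding, f.internal_encoding, f.read]
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "latin1.txt")
  # The bytes of "café\n" in Latin-1: the é is the single byte 0xE9.
  File.binwrite(path, [0x63, 0x61, 0x66, 0xE9, 0x0A].pack("C*"))

  external, internal, text = read_latin1_as_utf8(path)
  p external        # => #<Encoding:ISO-8859-1>
  p internal        # => #<Encoding:UTF-8>
  p text.bytesize   # => 6, because é takes two bytes in UTF-8
end
```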

The default external encoding is, by default, derived from the user’s locale settings and is often a multibyte encoding. In order to read binary data from a file, therefore, you must explicitly specify that you want unencoded bytes, or you’ll get characters in the default external encoding. To do this, open a file with mode "r:binary", or pass Encoding::BINARY to set_encoding after opening the file:

File.open("data", "r:binary")  # Open a file for reading binary data

On Windows, you should open binary files with mode "rb:binary" or call binmode on the stream. This disables the automatic newline conversion performed by Windows, and is only necessary on that platform.

Not every stream-reading method honors the encoding of a stream. Some lower-level reading methods take an argument that specifies the number of bytes to read. By their nature, these methods return unencoded strings of bytes rather than strings of text. The methods that do not specify a length to read do honor the encoding.
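
A short sketch of that distinction, using a hypothetical helper and a scratch file: read with a byte count returns a binary string, while read with no arguments returns text tagged with the stream's encoding:

```ruby
require "tmpdir"

# Hypothetical helper: returns the encodings of a counted read and of an
# uncounted read on the same UTF-8 stream.
def encodings_of_reads(path)
  File.open(path, "r:utf-8") do |f|
    counted = f.read(3)   # a length is given: raw bytes
    rest = f.read         # no length: encoded text
    [counted.encoding, rest.encoding]
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "text.txt")
  File.write(path, "hello world\n")
  p encodings_of_reads(path)   # => [#<Encoding:ASCII-8BIT>, #<Encoding:UTF-8>]
end
```

Recent Ruby versions print the binary encoding as BINARY (ASCII-8BIT); it is the same Encoding::BINARY object either way.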

Reading from a Stream

The IO class defines a number of methods for reading from streams. They work only if the stream is readable, of course. You can read from STDIN, ARGF, and DATA, but not from STDOUT or STDERR. Files and StringIO objects are opened for reading by default, unless you explicitly open them for writing only.

Reading lines

IO defines a number of ways to read lines from a stream:

lines = ARGF.readlines         # Read all input, return an array of lines
line = DATA.readline           # Read one line from stream
while l = DATA.gets do print l end  # Read until gets returns nil, at EOF
DATA.each {|line| print line } # Iterate lines from stream until EOF
DATA.each_line                 # An alias for each
DATA.lines                     # An enumerator for each_line: Ruby 1.9

Here are some important notes on these line-reading methods. First, the readline and the gets method differ only in their handling of EOF (end-of-file: the condition that occurs when there is no more to read from a stream). gets returns nil if it is invoked on a stream at EOF. readline instead raises an EOFError. If you do not know how many lines to expect, use gets. If you expect another line (and it is an error if it is not there), then use readline. You can check whether a stream is already at EOF with the eof? method.
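
The EOF behavior of the two methods can be sketched with a StringIO standing in for a file. The helper name read_all_with_gets is an assumption for this example:

```ruby
require "stringio"

# Hypothetical helper: collects lines with gets until it returns nil.
def read_all_with_gets(stream)
  lines = []
  while (line = stream.gets)
    lines << line
  end
  lines
end

stream = StringIO.new("one\ntwo\n")
p read_all_with_gets(stream)   # => ["one\n", "two\n"]
p stream.eof?                  # => true
p stream.gets                  # => nil: gets signals EOF quietly

begin
  stream.readline              # readline treats EOF as an error
rescue EOFError => e
  puts "readline raised #{e.class}"
end
```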

Second, gets and readline implicitly set the global variable $_ to the line of text they return. A number of global methods, such as print, use $_ if they are not explicitly passed an argument. Therefore, the while loop in the code above could be written more succinctly as:

print while DATA.gets

Relying on $_ is useful for short scripts, but in longer programs, it is better style to explicitly use variables to store the lines of input you’ve read.

Third, these methods are typically used for text (instead of binary) streams, and a “line” is defined as a sequence of bytes up to and including the default line terminator (newline on most platforms). The lines returned by these methods include the line terminator (although the last line in a file may not have one). Use String.chomp! to strip it off. The special global variable $/ holds the line terminator. You can set $/ to alter the default behavior of all the line-reading methods, or you can simply pass an alternate separator to any of the methods (including the each iterator). You might do this when reading comma-separated fields from a file, for example, or when reading a binary file that has some kind of “record separator” character. There are two special cases for the line terminator. If you specify nil, then the line-reading methods keep reading until EOF and return the entire contents of the stream as a single line. If you specify the empty string "" as the line terminator, then the line-reading methods read a paragraph at a time, looking for a blank line as the separator.

In Ruby 1.9, gets and readline accept an optional integer as the first argument or as the second after a separator string. If specified, this integer specifies the maximum number of bytes to read from the stream. This limit argument exists to prevent accidental reads of unexpectedly long lines, and these methods are exceptions to the previously cited rule; they return encoded character strings despite the fact that they have a limit argument measured in bytes.
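
The separator and limit arguments can be tried out on in-memory streams. This is only a sketch; split_stream is a name invented for the example:

```ruby
require "stringio"

# Hypothetical helper: splits text into "lines" using the given separator.
def split_stream(text, separator)
  StringIO.new(text).each_line(separator).to_a
end

p split_stream("a,b,c", ",")               # => ["a,", "b,", "c"]
p split_stream("line 1\nline 2\n", nil)    # => ["line 1\nline 2\n"]

# Paragraph mode: the empty string separates on blank lines.
paragraphs = split_stream("p1 a\np1 b\n\np2 a\n", "")
p paragraphs.map(&:strip)                  # => ["p1 a\np1 b", "p2 a"]

# A limit caps the number of bytes returned by a single call.
p StringIO.new("hello world\n").gets(5)    # => "hello"
```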

Finally, the line-reading methods gets, readline, and the each iterator (and its each_line alias) keep track of the number of lines they’ve read. You can query the line number of the most recently read line with the lineno method, and you can set that line number with the lineno= accessor. Note that lineno does not actually count the number of newlines in a file. It counts the number of times line-reading methods have been called, and may return different results if you use different line separator characters:

DATA.lineno = 0     # Start from line 0, even though data is at end of file
DATA.readline       # Read one line of data
DATA.lineno         # => 1
$.                  # => 1: magic global variable, implicitly set
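
To see how the separator affects the count, here is a minimal sketch using a StringIO and a helper name chosen for this example:

```ruby
require "stringio"

# Hypothetical helper: reads the whole text with the given separator and
# reports the final value of lineno.
def lineno_after_reading(text, separator)
  stream = StringIO.new(text)
  nil while stream.gets(separator)   # read until gets returns nil
  stream.lineno
end

text = "a,b\nc,d\n"
p lineno_after_reading(text, "\n")   # => 2
p lineno_after_reading(text, ",")    # => 3: "a,", "b\nc,", and "d\n"
```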

Reading entire files

IO defines three class methods for reading files without ever opening an IO stream. IO.read reads an entire file (or a portion of a file) and returns it as a single string. IO.readlines reads an entire named file into an array of lines. And IO.foreach iterates over the lines of a named file. In Ruby 1.9, you can pass a hash to these methods to specify the mode string and/or encoding of the file being read:

data = IO.read("data")                    # Read and return the entire file
data = IO.read("data", mode:"rb")         # Open with mode string "rb"
data = IO.read("data", encoding:"binary") # Read unencoded bytes
data = IO.read("data", 4, 2)              # Read 4 bytes starting at byte 2
data = IO.read("data", nil, 6)            # Read from byte 6 to end-of-file

# Read lines into an array
words = IO.readlines("/usr/share/dict/words")

# Read lines one at a time and initialize a hash
words = {}
IO.foreach("/usr/share/dict/words") {|w| words[w] = true}

In Ruby 1.9 you can use IO.copy_stream to read a file (or a portion) and write its content to a stream:

IO.copy_stream("/usr/share/dict/words", STDOUT) # Print the dictionary
IO.copy_stream("/usr/share/dict/words", STDOUT, 10, 100) # Print bytes 100-109

Although these class methods are defined by the IO class, they operate on named files, and it is also common to see them invoked as class methods of File: File.read, File.readlines, File.foreach, and File.copy_stream.
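
The examples above depend on files that may not exist on your system. The following sketch exercises the same class methods against a scratch file; the helper name and filenames are assumptions:

```ruby
require "tmpdir"

# Hypothetical helper: applies the whole-file reading methods to path and
# copies a 3-byte slice starting at byte 5 into copy_path.
def read_examples(path, copy_path)
  line_count = 0
  File.foreach(path) { |_line| line_count += 1 }
  copied = IO.copy_stream(path, copy_path, 3, 5)   # length 3, offset 5
  {
    all: File.read(path),
    slice: File.read(path, 4, 2),                  # length 4, offset 2
    lines: File.readlines(path),
    line_count: line_count,
    copied_bytes: copied,
    copy: File.read(copy_path)
  }
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "data")
  File.write(path, "0123456789\nabc\n")
  result = read_examples(path, File.join(dir, "copy"))
  p result[:slice]   # => "2345"
  p result[:lines]   # => ["0123456789\n", "abc\n"]
  p result[:copy]    # => "567"
end
```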

The IO class also defines an instance method named read, which is similar to the class method with the same name; with no arguments it reads text until the end of the stream and returns it as an encoded string:

# An alternative to text = File.read("data.txt")
f = File.open("data.txt")  # Open a file
text = f.read              # Read its contents as text
f.close                    # Close the file

The read instance method can also be used with arguments to read a specified number of bytes from the stream. That use is described in the next section.

Reading bytes and characters

The IO class also defines methods for reading a stream one or more bytes or characters at a time, but these methods have changed substantially between Ruby 1.8 and Ruby 1.9 because Ruby’s definition of a character has changed.

In Ruby 1.8, bytes and characters are the same thing, and the getc and readchar methods read a single byte and return it as a Fixnum. Like gets, getc returns nil at EOF. And like readline, readchar raises EOFError if it is called at EOF.

In Ruby 1.9, getc and readchar have been modified to return a string of length 1 instead of a Fixnum. When reading from a stream with a multibyte encoding, these methods read as many bytes as necessary to read a complete character. If you want to read a string a byte at a time in Ruby 1.9, use the new methods getbyte and readbyte. getbyte is like getc and gets: it returns nil at EOF. And readbyte is like readchar and readline: it raises EOFError.

Programs (like parsers) that read a stream one character at a time sometimes need to push a single character back into the stream’s buffer, so that it will be returned by the next read call. They can do this with ungetc. This method expects a Fixnum in Ruby 1.8 and a single character string in Ruby 1.9. The character pushed back will be returned by the next call to getc or readchar:

f = File.open("data", "r:binary") # Open data file for binary reads 
c = f.getc                        # Read the first byte (a 1-character string in 1.9)
f.ungetc(c)                       # Push that byte back
c = f.readchar                    # Read it back again
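
The Ruby 1.9 behavior described above can be sketched on a multibyte string. The helper peek_char is a name invented for this example; it reads one character and pushes it back:

```ruby
require "stringio"

# Hypothetical helper: returns the next character without consuming it.
def peek_char(stream)
  c = stream.getc
  stream.ungetc(c) unless c.nil?
  c
end

s = StringIO.new("éa".dup)   # é occupies two bytes in UTF-8
p s.getbyte                  # => 195: the first byte of é
s.rewind
p peek_char(s)               # => "é": a complete character
p s.getc                     # => "é": still there after the peek
p s.getc                     # => "a"
p s.getc                     # => nil at EOF
```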

You can also iterate and enumerate the characters and bytes of a stream:

f.each_byte {|b| ... }      # Iterate through remaining bytes
f.bytes                     # An enumerator for each_byte: Ruby 1.9
f.each_char {|c| ... }      # Iterate characters: Ruby 1.9
f.chars                     # An enumerator for each_char: Ruby 1.9  

If you want to read more than one byte at a time, you have a choice of five methods, each with slightly different behavior:

readbytes(n)

Read exactly n bytes and return them as a string. Block, if necessary, until n bytes arrive. Raise EOFError if EOF occurs before n bytes are available.

readpartial(n, buffer=nil)

Read between 1 and n bytes and return them as a new binary string, or, if a String object is passed as the second argument, store them in that string (overwriting whatever text it contains). If one or more bytes are available for reading, this method returns them (up to a maximum of n) immediately. It blocks only if no bytes are available. This method raises EOFError if called when the stream is at EOF.

read(n=nil, buffer=nil)

Read n bytes (or fewer, if EOF is reached), blocking if necessary, until the bytes are ready. The bytes are returned as a binary string. If the second argument is an existing String object, then the bytes are stored in that object (replacing any existing content) and the string is returned. If the stream is at EOF and n is specified, it returns nil. If called at EOF and n is omitted or is nil, then it returns the empty string "".

If n is nil or is omitted, then this method reads the rest of the stream and returns it as an encoded character string rather than an unencoded byte string.

read_nonblock(n, buffer=nil)

Read the bytes (up to a maximum of n) that are currently available for reading, and return them as a string, using the buffer string if it is specified. This method does not block. If there is no data ready to be read on the stream (this might occur with a networking socket or with STDIN, for example) this method raises a SystemCallError. If called at EOF, this method raises EOFError.

This method is new in Ruby 1.9. (Ruby 1.9 also defines other nonblocking IO methods, but they are low-level and are not covered here.)

sysread(n)

This method works like readbytes but operates at a lower level without buffering. Do not mix calls to sysread with any other line- or byte-reading methods; they are incompatible.

Here is some example code you might use when reading a binary file:

f = File.open("data.bin", "rb:binary")  # No newline conversion, no encoding
magic = f.readbytes(4)       # First four bytes identify filetype
exit unless magic == "INTS"  # Magic number spells "INTS" (ASCII)
bytes = f.read               # Read the rest of the file
                             # Encoding is binary, so this is a byte string
data = bytes.unpack("i*")    # Convert bytes to an array of integers
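
The example above assumes that a suitable data.bin already exists. As a self-contained sketch, the code below writes such a file and reads it back. The helper names and the INTS_MAGIC constant are assumptions, and read(4) is used for the header rather than readbytes:

```ruby
require "tmpdir"

INTS_MAGIC = "INTS"   # hypothetical four-byte file signature

# Hypothetical helper: writes the signature followed by native-format integers.
def write_ints(path, ints)
  File.open(path, "wb") do |f|
    f.write(INTS_MAGIC)
    f.write(ints.pack("i*"))
  end
end

# Hypothetical helper: checks the signature and decodes the integers.
def read_ints(path)
  File.open(path, "rb:binary") do |f|
    magic = f.read(4)                 # nil if the file is empty
    raise ArgumentError, "not an INTS file" unless magic == INTS_MAGIC
    f.read.unpack("i*")               # the rest of the file, as integers
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "data.bin")
  write_ints(path, [1, -2, 300_000])
  p read_ints(path)   # => [1, -2, 300000]
end
```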

Writing to a Stream

The IO methods for writing to a stream mirror those for reading. The STDOUT and STDERR streams are writable, as are files opened in any mode other than "r" or "rb".

IO defines a single putc method for writing single bytes or characters to a stream. This method accepts either a byte value or a single-character string as its argument, and therefore has not changed between Ruby 1.8 and 1.9:

o = STDOUT
# Single-character output
o.putc(65)         # Write single byte 65 (capital A)
o.putc("B")        # Write single byte 66 (capital B)
o.putc("CD")       # Write just the first byte of the string

The IO class defines a number of other methods for writing arbitrary strings. These methods differ from each other in the number of arguments they accept and whether or not line terminators are added. Recall that in Ruby 1.9, textual output is transcoded to the external encoding of the stream, if one was specified:

o = STDOUT
# String output
o << x             # Output x.to_s 
o << x << y        # May be chained: output x.to_s + y.to_s
o.print            # Output $_ + $\
o.print s          # Output s.to_s + $\
o.print s,t        # Output s.to_s + t.to_s + $\
o.printf fmt,*args # Outputs fmt%[args]
o.puts             # Output newline
o.puts x           # Output x.to_s.chomp plus newline
o.puts x,y         # Output x.to_s.chomp, newline, y.to_s.chomp, newline
o.puts [x,y]       # Same as above
o.write s          # Output s.to_s, returns the number of bytes written
o.syswrite s       # Low-level version of write

Output streams are appendable, like strings and arrays are, and you can write values to them with the << operator. puts is one of the most common output methods. It converts each of its arguments to a string, and writes each one to the stream. If the string does not already end with a newline character, it adds one. If any of the arguments to puts is an array, the array is recursively expanded, and each element is printed on its own line as if it were passed directly as an argument to puts. The print method converts its arguments to strings, and outputs them to the stream. If the global field separator $, has been changed from its default value of nil, then that value is output between each of the arguments to print. If the output record separator $\ has been changed from its default value of nil, then that value is output after all arguments are printed.

The printf method expects a format string as its first argument, and interpolates the values of any additional arguments into that format string using the String % operator. It then outputs the interpolated string with no newline or record separator.

write simply outputs its single argument as << does, and returns the number of bytes written. Finally, syswrite is a low-level, unbuffered, nontranscoding version of write. If you use syswrite, you must use that method exclusively, and not mix it with any other writing methods.
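
The differences between these methods are easiest to see by capturing their output. The sketch below writes to a StringIO instead of STDOUT and assumes the separators $, and $\ still have their default value of nil; the helper names are invented for the example:

```ruby
require "stringio"

# Hypothetical helper: exercises the output methods and returns the result.
def sample_output
  out = StringIO.new
  out << "a" << 1           # chained; each value is converted with to_s
  out.print "b", "c"        # no separator, no newline
  out.puts "d"              # newline added
  out.puts "e\n"            # already ends with a newline: none added
  out.puts ["f", ["g"]]     # arrays are expanded recursively
  out.printf("%03d|", 7)    # formatted, no newline
  out.string
end

# Hypothetical helper: returns the value that write reports.
def bytes_written(text)
  StringIO.new.write(text)
end

p sample_output             # => "a1bcd\ne\nf\ng\n007|"
p bytes_written("héllo")    # => 6: bytes, not characters
```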

Random Access Methods

Some streams, such as those that represent network sockets, or user input at the console, are sequential streams: once you have read or written from them, you cannot go back. Other streams, such as those that read from or write to files or strings, allow random access with the methods described here. If you attempt to use these methods on a stream that does not allow random access, they will raise a SystemCallError:

f = File.open("test.txt")
f.pos        # => 0: return the current position in bytes
f.pos = 10   # skip to position 10
f.tell       # => 10: a synonym for pos
f.rewind     # go back to position 0, reset lineno to 0, also
f.seek(10, IO::SEEK_SET)  # Skip to absolute position 10
f.seek(10, IO::SEEK_CUR)  # Skip 10 bytes from current position
f.seek(-10, IO::SEEK_END) # Skip to 10 bytes from end
f.seek(0, IO::SEEK_END)   # Skip to very end of file
f.eof?                    # => true: we're at the end
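
The positioning methods above can be checked against a file with known contents. In this sketch the helper names are assumptions, and a pipe stands in for a sequential stream:

```ruby
require "tmpdir"

# Hypothetical helper: moves around a 20-byte file and records what it sees.
def random_access_demo(path)
  File.open(path) do |f|
    f.pos = 10
    middle = f.read(3)          # bytes 10 through 12
    after_read = f.tell         # reading advanced the position
    f.seek(-5, IO::SEEK_END)
    tail = f.read               # the last five bytes
    at_eof = f.eof?
    f.rewind
    [middle, after_read, tail, at_eof, f.pos]
  end
end

# Hypothetical helper: true if the stream supports random access.
def seekable?(io)
  io.pos
  true
rescue SystemCallError
  false
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "test.txt")
  File.write(path, "0123456789ABCDEFGHIJ")
  p random_access_demo(path)    # => ["ABC", 13, "FGHIJ", true, 0]
end

reader, writer = IO.pipe
p seekable?(reader)             # => false: a pipe is sequential
reader.close
writer.close
```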

If you use sysread or syswrite in your program, then use sysseek instead of seek for random access. sysseek is like seek except that it returns the new file position after each call:

pos = f.sysseek(0, IO::SEEK_CUR)  # Get current position
f.sysseek(0, IO::SEEK_SET)        # Rewind stream
f.sysseek(pos, IO::SEEK_SET)      # Return to original position

Closing, Flushing, and Testing Streams

When you are done reading from or writing to a stream, you must close it with the close method. This flushes any buffered input or output, and also frees up operating system resources. A number of stream-opening methods allow you to associate a block with them. They pass the open stream to the block, and automatically close the stream when the block exits. Managing streams in this way ensures that they are properly closed even when exceptions are raised:

File.open("test.txt") do |f|
  # Use stream f here
  # Value of this block becomes return value of the open method
end # f is automatically closed for us here

The alternative to using a block is to use an ensure clause of your own:

begin
  f = File.open("test.txt")
  # use stream f here
ensure
  f.close if f
end
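
A small sketch can confirm that the block form closes the stream even when the block raises. The helper name and the simulated failure are assumptions made for the example:

```ruby
require "tmpdir"

# Hypothetical helper: raises inside a File.open block, then reports
# whether the stream was closed anyway.
def closed_after_error?(path)
  handle = nil
  begin
    File.open(path) do |f|
      handle = f                 # keep a reference for inspection
      raise "simulated failure"
    end
  rescue RuntimeError
    # The exception propagated out of File.open; the file is already closed.
  end
  handle.closed?
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "test.txt")
  File.write(path, "data\n")
  p closed_after_error?(path)    # => true
end
```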

Network sockets are implemented using IO objects that have separate read and write streams internally. You can use close_read and close_write to close these internal streams individually. Although files can be opened for reading and writing at the same time, you cannot use close_read and close_write on those IO objects.

Ruby’s output methods (except syswrite) buffer output for efficiency. The output buffer is flushed at reasonable times, such as when a newline is output or when data is read from a corresponding input stream. There are times, however, when you may need to explicitly flush the output buffer to force output to be sent right away:

out.print 'wait>' # Display a prompt
out.flush         # Manually flush output buffer to OS
sleep(1)          # Prompt appears before we go to sleep

out.sync = true   # Automatically flush buffer after every write
out.sync = false  # Don't automatically flush
out.sync          # Return current sync mode
out.fsync         # Flush output buffer and ask OS to flush its buffers
                  # Returns nil if unsupported on current platform
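
One way to observe flushing is to check the size of a file while it is still open for writing. This is a sketch with an invented helper name; it looks at the file only after a flush and after a write made in sync mode:

```ruby
require "tmpdir"

# Hypothetical helper: returns the on-disk size after an explicit flush
# and again after a write made with sync mode enabled.
def sizes_after_flush_and_sync(path)
  File.open(path, "w") do |out|
    out.print "wait>"
    out.flush                    # push the 5 buffered bytes to the OS
    after_flush = File.size(path)
    out.sync = true              # from now on, every write is flushed
    out.print "!"
    [after_flush, File.size(path)]
  end
end

Dir.mktmpdir do |dir|
  p sizes_after_flush_and_sync(File.join(dir, "out.txt"))   # => [5, 6]
end
```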

IO defines several predicates for testing the state of a stream:

f.eof?       # true if stream is at EOF
f.closed?    # true if stream has been closed
f.tty?       # true if stream is interactive

The only one of these methods that needs explanation is tty?. This method, and its alias isatty (with no question mark), returns true if the stream is connected to an interactive device such as a terminal window or a keyboard with (presumably) a human at it. They return false if the stream is a noninteractive one, such as a file, pipe, or socket. A program can use tty? to avoid prompting a user for input if STDIN has actually been redirected and is coming from a file, for example.
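
A prompt that respects redirection might be sketched as follows. The method ask and its keyword parameters are assumptions for this example; StringIO objects stand in for redirected input and output:

```ruby
require "stringio"

# Hypothetical helper: prompts only when input is interactive, then
# returns one line of input without its terminator (nil at EOF).
def ask(question, input: $stdin, output: $stdout)
  if input.tty?
    output.print question
    output.flush               # make sure the prompt is visible
  end
  answer = input.gets
  answer && answer.chomp
end

redirected = StringIO.new("42\n")   # not a terminal
shown = StringIO.new
p ask("Number? ", input: redirected, output: shown)   # => "42"
p shown.string                                        # => "": no prompt written
```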
