Overview
By the end of this chapter, you will be able to import external data and code to improve the functionality of an application; use Ruby gems in programs; interact with file systems and file modes in Ruby; read and write files to and from disk using Ruby; import and export external CSV data into Ruby applications and use service objects to package code for reuse within applications.
In the previous chapter, we learned about code reusability and how to clean up our code base by extracting common functionality and logic into modules that can be included as needed throughout our project, preventing unnecessary code duplication.
This is an important concept to grasp as it forms the base for Ruby's excellent package management system known as RubyGems, which we will dive into further in this chapter.
Most applications consist of inputs and outputs. Facebook will have data in the form of photos and status updates (as input), and users, in turn, will see other users' photos and status updates (as output). Additionally, a banking application will load data from a database (as input) and present it to the user in the form of charts and tables (as output). The input data sources will vary per application, but the concept of inputs and outputs is essentially the same.
Data is fed into the application, some sort of processing is performed, and there is an output action, be it saving to a database, exporting data to another format, or simply printing out a processed version of the input.
A common scenario in the workplace is the need to process data in the form of comma-separated values (CSV) that may have been imported from another system. Some sort of processing is then performed on the values and, finally, a result is presented to the user in a way that helps them to understand the data.
In this chapter, we will look at handling this exact type of scenario. We will look at importing and exporting CSV data, processing it, and then outputting a result using an external library that's going to format the data into a nice readable table for us.
We will also take a closer look at RubyGems, how we can interact with the package manager, and how to utilize external gems in our own code base. We'll then wrap everything up by implementing everything we've learned about as a service object in our code.
Similar to the concept of including modules, Ruby has another way of including external code into your project, which is known as a gem. Essentially, a Ruby gem is a package of code that can be included in your project, much like a module, with a few key differences such as the ability to version a particular package and the ability to load other dependent gems at the same time.
Generally speaking, a gem is more of a collection of modules and classes than a single module or class. Gems can be tiny and can solve a single problem, such as formatting screen output, or they can be an entire application framework. The Ruby on Rails framework, for example, is a gem itself.
Most modern languages have an equivalent way of loading external code packages into an application. These are commonly referred to as package managers.
For Node.js, you would use npm or yarn; for Python, you would use pip; for C#, you would use NuGet; and for Ruby, we use RubyGems.
So, why would we want to include other external code and libraries in our own code base? Well, quite simply, to save us time and effort. Consider the following examples.
You're building a new application and you want to allow for user registration so that customers can sign in to your website. Creating a robust user authentication and registration system is no small task. You would probably need to answer the following questions before you begin:
You could write these yourself, but it would take a lot of time and effort and you'll more than likely make mistakes that could compromise the security of your application. Thankfully, with RubyGems, we can simply include the devise gem (https://packt.live/318fy8k) into our project and have a fully featured authentication and user registration system that solves our problems in a matter of minutes.
But why stop at user authentication? Let's say our application needs to upload files to a remote location; well, we can just add the carrierwave gem (https://packt.live/33nzOV2).
What if you need to paginate the results of your web page? In that case, you just add the will_paginate gem (https://packt.live/2VuQuYa).
You can begin to see how we can create a very functional application in no time at all by leveraging this external code in the form of gems. This allows us to focus on what our core application functionality needs to do, rather than the standard functionality that we all expect from any application, such as being able to sign up and log in.
Now let's take a look at how we can interact with Ruby gems. The following are the gem functions we are going to study next:
Let's take a look at each of these gem commands, one by one.
gem search is used to search for available gems by name. Run the following command on the Terminal:
$ gem search terminal-table
The output would be as follows:
Note
Search for locally installed gems by passing the --local flag to the gem search command, or search only for remote gems by passing the --remote flag.
As you might expect, gem install will install a gem. It does so when you simply pass in the name of a gem to the command. By default, it will install the latest version of the gem:
$ gem install terminal-table
To install a specific version of a gem with the --version flag, use the following command:
$ gem install terminal-table --version
The output would be as follows:
As you can see, there are two gems installed: terminal-table and unicode-display-width. The unicode-display-width gem is a dependency, which simply means that the creators of the terminal-table gem are using (much like including) the unicode-display-width gem in their own code and have listed it as a dependent gem in their own gem's definition.
RubyGems is intelligent enough to figure out all of these dependency chains and install them as required. As you can imagine, gems depending on other gems can go many levels deep, which would be a pain to manage yourself. Thankfully, we don't need to think about that with RubyGems.
As we saw previously, gems can have dependencies on other gems. You can view a gem dependency with the gem dependency command as follows:
$ gem dependency terminal-table
The output would be as follows:
You'll notice that there are several gems listed here; however, all but one are assigned to the development group. This is essentially saying that those gems are only required in the development environment for the terminal-table gem.
These will not be installed when installing the terminal-table gem, as only a gem's non-development dependencies are installed by default.
In the following command, notice the numbers with the arrows next to the gem names in some of the gem commands:
These are known as semantic versioning constraints. They essentially tell RubyGems what range of versions of that gem is required to be installed. RubyGems is able to read these numbers and select a compatible version of that gem to download and install.
You can translate them to mean "I need a version that is no older than version 3.0" (>= 3.0), or "I need at least version 1.10, but nothing from version 2.0 onward" (~> 1.10).
Note
More information on how this works can be found at https://packt.live/2IF9wWq.
The ~> symbol in the semantic versioning constraints is called a twiddle-wakka.
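To get a feel for how these constraints behave, we can ask RubyGems itself. The following is a minimal sketch using the Gem::Requirement and Gem::Version classes that ship with Ruby; the satisfies? helper and the version numbers are made up purely for illustration:

```ruby
# Returns true when the given version string satisfies the constraint.
# satisfies? is a hypothetical helper, not part of RubyGems.
def satisfies?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

puts satisfies?("~> 1.10", "1.17.3") # true:  at least 1.10 and below 2.0
puts satisfies?("~> 1.10", "2.0.0")  # false: 2.0 is outside the allowed range
puts satisfies?(">= 3.0", "3.8.0")   # true:  any version from 3.0 upward
puts satisfies?(">= 3.0", "2.99")    # false: older than 3.0
```

The twiddle-wakka is useful because it accepts newer releases while ruling out the next major version, which may contain breaking changes.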
gem list lists locally installed gems. It is extremely helpful when we are trying to understand what gems and their versions are installed:
You'll notice that there are a lot of gems listed even though you may have only installed one. The gems with default: included in the brackets are part of the Ruby core library. They come installed with Ruby and cannot be removed.
You can see from the preceding output that the two gems at the bottom are the ones that we installed previously and do not have the default: label attached to them.
Note
Depending on the version of Ruby you are using, this gem listing may appear differently. Newer versions of Ruby may include additional default gems.
To use a gem, you simply require the gem in your code, which is similar to how you would include a module. Consider the following example:
user = { name: 'John Smith', age: '35', address: { home: '1 kings cross road' }}
puts JSON.pretty_generate(user)
# NameError (uninitialized constant JSON)
require 'json'
puts JSON.pretty_generate(user)
{
  "name": "John Smith",
  "age": "35",
  "address": {
    "home": "1 kings cross road"
  }
}
In the preceding example, we create a simple hash containing some user information and attempt to convert it to JSON and display it in a formatted way using the JSON.pretty_generate method.
We can see that the first attempt raises a NameError because the JSON gem has not been required and is, therefore, not available. This is very much like trying to use a method from a module before you've included it.
We then require the JSON gem (which is a default Ruby gem) and call the JSON.pretty_generate method again; we can see it now works and formats our hash into a pretty JSON format.
It really is that simple to load other libraries into our code and extend the functionality of our application.
So, we've learned what Ruby gems are now and how they can be used to extend the functionality of our application by simply "requiring" them in our code. Let's try it out for ourselves now.
In the following exercise, we'll learn how to use a Ruby gem to format and present a basic data structure in our Terminal windows in a readable format. Creating a neatly presented table of information in a Terminal window is a tricky task; it's not something we would want to repeat for every project, so let's make it easier for ourselves and use one that has already been built.
In this exercise, we will be installing a Ruby gem, terminal-table, to generate a table of individuals and their locations and then print it.
The following steps will help you to complete this exercise:
gem install terminal-table
require "terminal-table"
headings = ["Name", "City"]
users = [
  ["James", "Sydney"],
  ["Chris", "New York"]
]
table = Terminal::Table.new rows: users, headings: headings
puts table
ruby exercise_1.rb
You should see a table printout of our users with a heading row, as follows:
Thus, we have successfully represented data in a tabular form, using the terminal-table gem.
The ability to open, read, and write from the filesystem is an important part of any language. Thankfully, Ruby has quite an extensive and user-friendly file I/O interface.
The IO class is responsible for all input and output operations in Ruby. The File class is a subclass of the IO class:
File.superclass
=> IO
When we interact with the filesystem, we are generally always working with the File class, although it is helpful to understand where it sits in the class hierarchy.
Let's take a look at some common file operations:
We can create new files by instantiating a File object and passing the name of the file and the file mode to the initializer:
file = File.new("new.txt", "w")
=> #<File:new.txt>
file.close
When we create or open files using the File.new method, we also need to call close afterward to tell Ruby to release the handle it has opened for the file. When using the File.open method with a block, close is automatically called for us. We shall discuss this in more detail later on in the chapter.
You might be wondering what that w parameter that appears after the filename is. This tells Ruby what mode we want to use. In this example, we've set the mode to w, which means we want to write to a new file. By default, unless we supply this parameter, the file mode will be r, which is short for read. The read file mode is, as you may have guessed, only for reading files. Attempting to create a new file using this mode will give you an error like the following:
We will cover file modes more extensively later.
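As a small sketch of this behavior, the following tries to open a file that does not exist yet in read mode and then in write mode. The try_open helper is hypothetical, and the example assumes that the failure surfaces as Errno::ENOENT ("No such file or directory"), which is what Ruby raises on common platforms:

```ruby
require "tmpdir"

# Try to open a file in the given mode and report the outcome instead of crashing.
# try_open is a hypothetical helper used only for this illustration.
def try_open(path, mode)
  file = File.new(path, mode)
  file.close
  :ok
rescue Errno::ENOENT
  :missing
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "new.txt") # a file that does not exist yet
  puts try_open(path, "r")         # missing: read mode cannot create a file
  puts try_open(path, "w")         # ok: write mode creates the file
  puts try_open(path, "r")         # ok: the file exists now, so it can be read
end
```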
Reading the contents of files is quite a simple process with Ruby. There are a few different methods for reading and processing files:
We'll use a test file named company.txt for this section that contains the following content:
ACME Company
555 Mystery Lane
2010
Let's take a look at each of these file reading methods in turn.
The File.read method will read the whole file into memory at once and handle the file just like a large string:
File.read("company.txt")
=> "ACME Company\n555 Mystery Lane\n2010"
We see that the entire file is loaded into memory and returned as a single string.
The newline characters are represented as \n in the string.
Loading a large file into memory with File.read can be inefficient. For larger files, a line-by-line approach is usually a better fit, which leads us to the next section on File.open and File.foreach.
The File.open method on its own simply returns an instance of the File class indicating we have an open file handle on the company.txt file:
File.open("company.txt")
=> #<File:company.txt>
Calling each on the returned File object allows us to iterate over the contents of the file one line at a time and process them. Note that Ruby only closes the file automatically when a block is passed to File.open itself; in the following form, the handle stays open until it is closed explicitly or garbage collected:
File.open('company.txt').each do |row|
  puts row
end
The output would be the following:
ACME Company
555 Mystery Lane
2010
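The difference between opening a file with and without a block can be sketched as follows. The two helper methods are hypothetical and exist only so that we can inspect the File object afterward; a temporary file stands in for company.txt:

```ruby
require "tempfile"

# Without a block, File.open hands back an open File; closing it is our job.
def open_without_block(path)
  handle = File.open(path)
  handle.each { |row| puts row }
  handle
end

# With a block, Ruby closes the file as soon as the block returns.
def open_with_block(path)
  kept = nil
  File.open(path) do |file|
    kept = file
    file.each { |row| puts row }
  end
  kept
end

sample = Tempfile.new("company")
sample.write("ACME Company\n555 Mystery Lane\n2010\n")
sample.close

manual = open_without_block(sample.path)
puts manual.closed?                         # false
manual.close                                # release the handle ourselves

puts open_with_block(sample.path).closed?   # true
sample.unlink
```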
Much like the File.open().each method, we can use the slightly more succinct File.foreach method, which does essentially the same thing without the need to specifically call .each:
File.foreach('company.txt') do |row|
  puts row
end
The output would be the following:
ACME Company
555 Mystery Lane
2010
You could easily be forgiven for being confused about these methods that seemingly do the exact same thing. From an end user's point of view, that's true; however, from a programming perspective, they are quite different.
The File.read method will load the entire contents of a file into memory for us to process at once. This may be suitable for smaller files, but for anything larger, this can have a serious impact on your system and application performance.
The File.open method with a block and the File.foreach method, however, process the contents of a file one line at a time. This allows Ruby to manage memory more efficiently, and they are generally a safer option for when your files are of varying sizes.
We will cover more on this topic of loading and processing data one row at a time in the Handling CSV Data section.
Note
Loading external data can have a detrimental effect on application performance if it is not done correctly. Understanding how data is being allocated and cleaned up by Ruby is a key factor in ensuring consistent performance.
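To illustrate the line-by-line approach, here is a minimal sketch that counts matching lines with File.foreach, so only one line needs to be held in memory at a time. The count_matching_lines helper and the generated log contents are invented for this example:

```ruby
require "tempfile"

# Count the lines that match a pattern, reading the file one line at a time.
# count_matching_lines is a hypothetical helper.
def count_matching_lines(path, pattern)
  File.foreach(path).count { |line| line.match?(pattern) }
end

log = Tempfile.new("sample")
1_000.times do |i|
  log.puts(i.even? ? "INFO request #{i}" : "ERROR request #{i}")
end
log.close

puts count_matching_lines(log.path, /ERROR/) # 500
log.unlink
```

Calling File.foreach without a block returns an Enumerator, which is why count can be chained directly onto it.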
There are several ways to write to files in Ruby, each with a slightly different use case:
Let's take a look at each of them.
File.new will return an instance of a file with an open file handle, which means we are able to write to it:
file = File.new("new.txt", "w")
file.puts "Hey, nice file"
file.close
Calling puts on the file object here writes the string to the file, although you'll still need to call close on the object before you can access the contents of the file from outside the application.
Note
You can call file.puts, file.write, and file << "my string" to write to the File object. The main difference is that puts appends a newline to the string, whereas write and << write it exactly as given.
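The three write calls can be compared side by side with a short sketch. The write_three_ways helper is hypothetical, and a temporary directory is used so that nothing is left behind on disk:

```ruby
require "tmpdir"

# Write to the same file with puts, write, and <<, then return what ended up on disk.
# write_three_ways is a hypothetical helper.
def write_three_ways(path)
  file = File.new(path, "w")
  file.puts "one"         # puts adds a trailing newline
  file.write "two"        # write adds nothing extra
  file << "three" << "!"  # << returns the file, so calls can be chained
  file.close
  File.read(path)
end

Dir.mktmpdir do |dir|
  p write_three_ways(File.join(dir, "new.txt")) # "one\ntwothree!"
end
```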
The File.open method allows us to pass a block that receives the instantiated file object as a parameter. This is a cleaner approach that allows you to create, write, and close a file all in one block of code:
File.open("new.txt", "w") do |file|
  file.write("Hey, nice file")
end
The section after the open statement is a Ruby block and it allows us to encapsulate our write logic into a section of code that, when completed, will automatically close the file so we don't need to call file.close manually. It looks much cleaner.
File.write("new.txt", "Hey, nice file")
The File.write method is more of a shorthand syntax. The write method is actually defined on the parent IO class rather than on the File class itself. It is a quick way to open a file, write a string to it, and close it with the smallest amount of code. The write method returns the number of bytes written rather than a file handle.
You will see both of these methods used in examples online. It mostly comes down to personal preference, but File.open is the more useful method due to the fact that it supports the ability to pass in a block and immediately iterate over a collection, writing out the results to a file before automatically closing the file when the block exits.
There are times, however, when you may wish to open a file and pass the file reference to another method for processing before closing the file. In this case, you may prefer to use File.new over File.open.
Both methods can return an instance of File (File.open does so only when called without a block), which can then be passed around the rest of the code base. However, using File.new for this specific use case and File.open only with blocks can help to make your code easier to understand, as other engineers will know that whenever they see a File.new method, there needs to be a corresponding .close method call for the instantiated object.
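A sketch of this pattern might look like the following. The write_report method and the report data are hypothetical; the point is that the file is created with File.new, handed to another method, and closed in an ensure clause so that the handle is released even if an error occurs:

```ruby
require "tmpdir"

# Hypothetical helper: writes rows to any already-open, writable IO object.
def write_report(io, rows)
  rows.each { |row| io.puts(row.join(": ")) }
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "report.txt")
  file = File.new(path, "w")
  begin
    write_report(file, [["users", 3], ["cities", 2]])
  ensure
    file.close # every File.new needs a matching close
  end
  puts File.read(path)
end
```

Because write_report only relies on puts, it can be handed any IO-like object, not just a File.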
We've seen the usage of file modes in the previous examples. To put it simply, they tell Ruby how much access we want to enable for the file we're going to interact with.
There are several file modes that we can choose from depending on what we need to do with the file. The most common are the read mode, r, and the write mode, w.
The following table is from the official Ruby documentation from the IO class and explains the meaning behind each of the modes:
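As a rough sketch of how a few of the most common modes behave in practice, the following writes to the same file using w, a, and r+ and records the contents after each step. The mode_snapshots helper and the file name are made up for illustration:

```ruby
require "tmpdir"

# Apply several file modes in turn and collect the file contents after each step.
# mode_snapshots is a hypothetical helper.
def mode_snapshots(path)
  snapshots = []
  File.open(path, "w") { |f| f.puts "first" }   # w: create or truncate, write only
  File.open(path, "a") { |f| f.puts "second" }  # a: write only, always at the end
  snapshots << File.read(path)
  File.open(path, "r+") { |f| f.write "FIRST" } # r+: read and write, from the start
  snapshots << File.read(path)
  File.open(path, "w") { |f| f.puts "third" }   # w again: old contents are discarded
  snapshots << File.read(path)
  snapshots
end

Dir.mktmpdir do |dir|
  mode_snapshots(File.join(dir, "modes.txt")).each { |contents| p contents }
  # "first\nsecond\n"
  # "FIRST\nsecond\n"
  # "third\n"
end
```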
Aside from reading and writing data, Ruby comes with a bunch of very helpful file helpers right out of the box. These can help you to solve a number of common file-related tasks, such as checking for the existence of files and permissions and deleting files. Here are a few examples of useful file helpers in Ruby.
File.exist?
This checks for the existence of a file. It returns true or false. Use this before creating a new file or opening an existing one to ensure the file exists and to avoid throwing errors.
Dir.exist?
This checks for the existence of a directory. It returns true or false. It is very much like File.exist?, except that it works on directories. Use this before creating a new directory to ensure that it does not already exist and to avoid throwing errors.
File.delete
This deletes a file when given a file path.
File.size?
This returns the size of a file in bytes, or nil if the file does not exist or is empty. You may wish to use this to report on the size of datasets after importing them or to verify the size of a file before processing it.
File.truncate
This truncates the contents of a file to a given length in bytes; passing a length of 0 clears the file. You can use this to reset a file's content back to empty before writing to it. It is helpful if you wish to reuse a particular file.
File.zero?
This returns true if the file is empty. Use this when you want to verify whether a file has any content or not.
File.birthtime
This returns the birth time (that is, the time of creation) of a file. Use this when you want to know how old a file is.
Note
You can read more about all the available file options in the official Ruby documentation for the File class here: https://packt.live/35nehxE.
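Several of these helpers can be tried out together in a short sketch. The file_facts helper is hypothetical and simply bundles three of the checks into a hash so that we can print them at each stage of a file's life:

```ruby
require "tmpdir"

# Hypothetical helper that gathers a few facts about a path.
def file_facts(path)
  { exists: File.exist?(path), size: File.size?(path), empty: File.zero?(path) }
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "data.txt")

  p file_facts(path)        # not created yet: exists is false and size is nil
  File.write(path, "hello")
  p file_facts(path)        # exists, size is 5, not empty
  File.truncate(path, 0)    # cut the file down to zero bytes
  p file_facts(path)        # still exists, but empty, so size? returns nil
  File.delete(path)
  p file_facts(path)        # gone again
end
```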
CSV is a very common format for representing tabular data. It is an easily parsable data format to work with and it can be opened in all common spreadsheet applications, such as Microsoft Office and Google Docs, with no need for conversion.
We can represent columns and rows with CSV, much like a relational database or a spreadsheet, which makes it a very handy tool for processing exported database records, generating data to be imported into a database, or creating spreadsheets.
Ruby comes with a full library for handling CSV data out of the box. The Ruby CSV library is actually a gem and is part of the Ruby default gem set. This means that to use the CSV library in your code, you simply need to "require" it.
We can see the csv gem with the following gem list command:
$ gem list | grep csv
csv (default: 1.0.0)
Ruby has even published this gem publicly on GitHub (https://packt.live/35qKCUf), just like any other gem.
All modern versions of Ruby (1.9 and later) will have a default CSV gem. The Ruby CSV gem from Ruby 1.9.3 and later is actually based on a popular CSV parsing gem called FasterCSV. This was an optional replacement for the core CSV gem before Ruby 1.9.3; however, as it was so popular, the Ruby team simply made FasterCSV the default CSV gem.
There are other CSV gems out there if you have more specific needs. SmarterCSV is another well-known replacement for the default CSV gem that offers parallel import processing for the better handling of larger files.
Note
You can refer to the following for more information on SmarterCSV: https://packt.live/2B5xSVk.
Similar to the File class we covered previously, the CSV gem has a number of different ways in which we can interact with CSV data and files. Let's take a closer look.
The simplest way to load CSV data is with the CSV.read method. This is an "all-at-once" method that will load the entire CSV document at once into memory and return an array of arrays representing the data.
Let's imagine that we have a users.csv file containing the following CSV data:
Mike Smith,35,Sydney
James Taylor,42,New York
Susan Jones,29,San Francisco
We can read the data with the following:
CSV.read("users.csv")
This returns the following:
=> [["Mike Smith", "35", "Sydney"], ["James Taylor", "42", "New York"], ["Susan Jones", "29", "San Francisco"]]
Here, we can see that the data has been loaded into an array. We can also see that each row in the CSV data has been represented as an array inside the main array, so we have a nested array or an array of arrays.
We can iterate over this array to get access to the individual rows:
require 'csv'
users = CSV.read("users.csv")
users.each do |user|
  puts "name: #{user[0]}"
  puts "age: #{user[1]}"
  puts "city: #{user[2]}"
end
This returns the following:
name: Mike Smith
age: 35
city: Sydney
name: James Taylor
age: 42
city: New York
name: Susan Jones
age: 29
city: San Francisco
This is great. We're able to load data from an external file and interact with it in Ruby with only a few lines of code.
But what kind of problems do you think will arise if we have a wide dataset with many columns or fields? Well, accessing the data with array index positions can get confusing when there are many columns.
The numbers in the square brackets after the user variable are index positions. As we can see from the preceding array-of-arrays example, each element of the array is another array containing the user information. For each user array, we see that index position 0 is the name, index position 1 is the age, and index position 2 is the city.
As you can imagine, if you have a dataset with many columns or fields, this can get confusing, as you need to keep track of exactly which column is at which index. For example, is user[18] the address or is it user[12]? Not only is this a bit confusing, but it makes our code hard to read. Wouldn't it be nicer if we could access the data based on the actual field name?
Thankfully, Ruby makes it easy to refer to row data by the field name, which makes our code cleaner and easier to read.
If, instead, our previous dataset included a header row, it would look as follows:
name,age,city
Mike Smith,35,Sydney
James Taylor,42,New York
Susan Jones,29,San Francisco
We can simply pass in the headers parameter to the read method:
headers: true
We can now use the heading name to access the row data:
require 'csv'
users = CSV.read("users_with_headers.csv", headers: true)
users.each do |user|
  puts "name: #{user["name"]}"
  puts "age: #{user["age"]}"
  puts "city: #{user["city"]}"
end
This returns the following output:
name: Mike Smith
age: 35
city: Sydney
name: James Taylor
age: 42
city: New York
name: Susan Jones
age: 29
city: San Francisco
Much better. How does that work, though? How do you access an array position with a string?
The simple answer is that you don't. Ruby is performing some magic behind the scenes here when the headers parameter is supplied, and, instead of returning an array of arrays as we saw previously, it is returning an instance of the CSV::Table class:
require 'csv'
CSV.read("users_with_headers.csv", headers: true)
=> #<CSV::Table mode:col_or_row row_count:4>
The core Ruby documentation describes the CSV::Table class (https://packt.live/2OKCIPT) as follows:
"A CSV::Table is a two-dimensional data structure for representing CSV documents. Tables allow you to work with the data by row or column, manipulate the data, and even convert the results back to CSV, if needed."
This simply means that it's a more flexible representation of the dataset than simply an array of arrays. It allows you to interact with the data in different dimensions, be it by row or by column.
Note
While both usability and reliability are increased, there is a performance cost of converting the dataset into hashes and the CSV::Table and CSV::Row types.
For example, we can retrieve the first row with the by_row method, as shown in the following code block:
users = CSV.read("users_with_headers.csv", headers: true)
users.by_row[0]
=> #<CSV::Row "name":"Mike Smith" "age":"35" "city":"Sydney">
This returns an instance of the CSV::Row class, which has its own set of helper methods.
We can just as easily return the first column of all rows using the by_col method.
To retrieve the values of the first column and return them as an array, we can do the following:
users = CSV.read("users_with_headers.csv", headers: true)
users.by_col[0]
=> ["Mike Smith", "James Taylor", "Susan Jones"]
Only use the CSV.read method when you're dealing with small datasets. Loading large CSV files into memory can cause performance issues and result in excessive resource consumption. We'll discuss the usage of CSV.foreach for handling larger datasets in a moment.
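To explore a few of the helper methods on CSV::Table and CSV::Row without needing a file on disk, we can parse an inline string instead. This is a minimal sketch: the USERS_CSV constant is made up for the example, and CSV.parse with headers: true is used here because it returns the same CSV::Table type as CSV.read:

```ruby
require "csv"

# Inline sample data standing in for users_with_headers.csv.
USERS_CSV = <<~ROWS
  name,age,city
  Mike Smith,35,Sydney
  James Taylor,42,New York
ROWS

table = CSV.parse(USERS_CSV, headers: true) # a CSV::Table

first = table[0]      # an integer index returns a CSV::Row
p first["city"]       # "Sydney"
p first.to_h.keys     # ["name", "age", "city"]
p table.headers       # ["name", "age", "city"]
p table["city"]       # a header name returns the column: ["Sydney", "New York"]
puts table.to_csv     # the table as CSV text again, header row included
```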
In most cases, we are only interested in a part of the data and not the entire dataset. In this exercise, we will be obtaining only the city column from exercise_2.csv, which contains other columns too, such as name and age. To do so, perform the following steps:
name,age,city
Mike Smith,35,Sydney
James Taylor,42,New York
Susan Jones,29,San Francisco
require 'csv'
users = CSV.read("exercise_2.csv", headers: true)
cities = users.by_col["city"]
puts cities
ruby exercise_2.rb
You should see a response as follows:
Previously, we have been using the CSV.read method to load all CSV data at once into memory before working with it. However, if you're working with a large dataset, this may cause resource consumption issues.
Processing the data row by row will allow Ruby to manage the memory usage more efficiently on your machine and is generally considered a more idiomatic approach. This can be done by using the CSV.foreach method.
Consider the following example:
require 'csv'
CSV.foreach('users.csv') do |user|
  puts "name: #{user[0]}"
  puts "age: #{user[1]}"
  puts "city: #{user[2]}"
end
In the preceding example, we call CSV.foreach with the filename, just like we did for CSV.read, which opens the file. However, instead of assigning the result to a variable, we supply a block that we can iterate over in order to print the results, much like what we did earlier by using each to loop over the array.
Once again, we can supply the headers: true parameter to invoke the named key functionality we saw earlier, which provides us with a cleaner interface to retrieve the row data:
require 'csv'
CSV.foreach("users_with_headers.csv", headers: true) do |user|
  puts "name: #{user['name']}"
  puts "age: #{user['age']}"
  puts "city: #{user['city']}"
end
Running both of the preceding foreach examples results in the same response:
name: Mike Smith
age: 35
city: Sydney
name: James Taylor
age: 42
city: New York
name: Susan Jones
age: 29
city: San Francisco
Both CSV.read and CSV.foreach open the files in read-only mode unless specifically set otherwise by the user. As we're only reading the data, the default read-only mode is all we need.
We saw previously with CSV.read that, by default, it will return an array of arrays, unless you specify the headers: true parameter, in which case it will return an instance of CSV::Table, which contains a number of CSV::Row instances to represent the row data.
You could be assuming that the CSV.foreach method would work in the same way, but it is actually slightly different again.
When called without a block, the CSV.foreach method returns an instance of Enumerator, which is an internal Ruby class used for iterating over collections. With headers: true, this enumerator yields instances of CSV::Row, just like the rows inside the CSV::Table returned by CSV.read; without it, each row is a plain array.
Unlike the CSV.read method, however, passing in the headers: true parameter does not change the response type:
irb(main):001:0> require "csv"
=> true
irb(main):002:0> CSV.read("users.csv").class
=> Array
irb(main):003:0> CSV.read("users_with_headers.csv", headers: true).class
=> CSV::Table
irb(main):004:0> CSV.foreach("users.csv").class
=> Enumerator
irb(main):005:0> CSV.foreach("users_with_headers.csv", headers: true).class
=> Enumerator
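Although the Enumerator class stays the same, the rows it yields do differ. The following sketch checks this with a small temporary file; the first_row helper and the sample data are assumptions made for the example:

```ruby
require "csv"
require "tempfile"

# Hypothetical helper: return the first row that CSV.foreach yields for a file.
def first_row(path, **options)
  CSV.foreach(path, **options).first
end

file = Tempfile.new(["users", ".csv"])
file.write("name,age,city\nMike Smith,35,Sydney\n")
file.close

plain = first_row(file.path)
p plain.class    # Array
p plain          # ["name", "age", "city"], as the header line is just another row

named = first_row(file.path, headers: true)
p named.class    # CSV::Row
p named["city"]  # "Sydney"

file.unlink
```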
For the most part, all these slightly different response objects work in a similar way; you can loop over them with an each block, perform basic enumerable functions, and treat them much like you would treat a regular array.
It is important, however, to remember that these objects are not simply arrays and, therefore, you may encounter slightly different methods or operations for interacting with the row data depending on the class.
For example, the CSV::Table class provides the by_col method, whereas the Enumerator and basic Array classes do not:
irb(main):001:0> CSV.read("users.csv", headers: true).respond_to? :by_col
=> true
irb(main):002:0> CSV.read("users.csv").respond_to? :by_col
=> false
irb(main):003:0> CSV.foreach("users.csv").respond_to? :by_col
=> false
The preceding irb log uses the respond_to? method, which simply checks whether an object will "respond to" a particular method name. In other words, it checks whether that method exists on an object or class.
You can imagine that something as simple as changing your CSV import dataset to now include a header row may result in confusion as the response object can change. Simply being aware of these variations can be helpful and save you from having to debug the issue when something unexpected happens.
Writing to a CSV file follows a similar pattern to the CSV.foreach method we learned about previously, except that, here, we use the CSV.open method and supply a block.
Inside the block, we have access to the file, which we can write to by simply calling puts on the opened csv variable. In the same way as before, we do not need to manually call close as the block will automatically do that for us when it exits:
require 'csv'
CSV.open("new_users.csv", "w") do |csv|
  csv.puts ["Sarah Meyer", "25", "Cologne"]
  csv.puts ["Matt Hall", "35", "Sydney"]
end
There is an alternative syntax that you may see other people using that works in the same way and makes use of the append operator instead of the puts method. We do this by using the object << value syntax. Really, puts is just an alias for <<, so you can expect it to work in the same way:
require 'csv'
CSV.open("new_users.csv", "w") do |csv|
  csv << ["Sarah Meyer", "25", "Cologne"]
  csv << ["Matt Hall", "35", "Sydney"]
end
Both of these examples will generate a new CSV file named new_users.csv with the following content:
Sarah Meyer,25,Cologne
Matt Hall,35,Sydney
The second parameter of the CSV.open command is the file mode parameter we learned about in the File I/O section. The same rules apply here. In our case, we have passed in w, the write file mode parameter, which will create a new file each time, overwriting any previous files with the same name.
We could just as easily open the file with the a mode and append the file data to the end of the existing file if we were building a larger file over time.
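The difference between the two modes can be sketched as follows. The file name appended_users.csv is an assumption for this example, and the file is placed in the system temporary directory.

```ruby
require "csv"
require "tmpdir"

path = File.join(Dir.tmpdir, "appended_users.csv")

# "w" starts from an empty file each time.
CSV.open(path, "w") { |csv| csv << ["Sarah Meyer", "25", "Cologne"] }

# "a" keeps the existing rows and adds new ones to the end.
CSV.open(path, "a") { |csv| csv << ["Matt Hall", "35", "Sydney"] }

puts File.read(path)
```

After both calls, the file contains two rows: the first written in w mode and the second appended in a mode.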
Most CSV methods have a similar parameter structure to the file I/O methods. For the most part, they will work in the same way when it comes to closing files.
In the previous example, our data was static, meaning there was a line of code for each row inserted into the CSV file. In the real world, however, you're not likely to be doing this (as this would be incredibly time-consuming). It's more likely that you would want to export a collection of records from a database or another data source as CSV data.
This may be hundreds or thousands of rows, but the code should still only be a few lines long.
Let's imagine that we have a table of data that contains the names of cities, the country name, and the number of employees in that city. We want to export that data as CSV.
That data is then handed to our code and we need to iterate over it to generate the data. Here's what it might look like:
require "csv"
cities = [
{ name: "San Francisco", country: "United States", employees: 15 },
{ name: "Sydney", country: "Australia", employees: 11 },
{ name: "London", country: "England", employees: 18 }
]
CSV.open("employee_count.csv", "w") do |csv|
cities.each do |city|
csv.puts [city[:name], city[:country], city[:employees]]
end
end
In the preceding example, we define an array of cities, where each city is a hash. We define this array as an example, but, as you can imagine, in a real-world application, this may be a very long list returned from a database table.
The output from the preceding code will be as follows:
San Francisco,United States,15
Sydney,Australia,11
London,England,18
The preceding CSV code is only five lines long and will loop over the entire collection of cities, writing out the data as CSV rows.
We want to highlight here that five lines of code can process thousands of lines of data; we can achieve a very useful result very simply.
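If the exported file also needs a header row, one possible approach is to pass the write_headers and headers options to CSV.open. The sketch below assumes a shortened cities list and a file name of employee_count_with_headers.csv, both chosen only for this example.

```ruby
require "csv"
require "tmpdir"

cities = [
  { name: "San Francisco", country: "United States", employees: 15 },
  { name: "Sydney", country: "Australia", employees: 11 }
]

path = File.join(Dir.tmpdir, "employee_count_with_headers.csv")

# write_headers: true emits the headers array as the first line of the file.
CSV.open(path, "w", write_headers: true, headers: %w[name country employees]) do |csv|
  cities.each { |city| csv << city.values_at(:name, :country, :employees) }
end

puts File.read(path)
```

The loop itself is unchanged; only the options passed to CSV.open differ from the earlier example.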
Note
There are a number of other pieces of CSV functionality that are available with the standard core Ruby CSV library; you can refer to https://packt.live/2pc66DI for more details.
Exporting data in CSV format is a great way to retrieve data from your application and share it with other systems or people. In this exercise, we will export an array of users as CSV much like you would export a table from a database in order to process that data with a spreadsheet or another system:
require "csv"
users = [
{ name: "John Smith", age: 36, city: "Sydney" },
{ name: "Susan Alan", age: 31, city: "San Francisco" },
{ name: "Daniel Jones", age: 43, city: "New York" }
]
CSV.open("new_users.csv", "w") do |csv|
users.each do |user|
csv.puts [user[:name], user[:age], user[:city]]
end
end
ruby exercise_3.rb
You should see the following content:
John Smith,36,Sydney
Susan Alan,31,San Francisco
Daniel Jones,43,New York
Thus, we have successfully used Ruby's CSV library to write data to a file.
In previous sections, we learned how to install and include Ruby gems to extend the functionality of our application. We also learned how to interact with files and CSV data. We now know how to read, process, and output data to the screen or the filesystem.
This is useful functionality, but opening files and writing CSV data is common functionality that is likely to be shared between classes, and it can seem unrelated to the existing classes in our code base.
What do we mean by unrelated? Well, let's assume we have a User class and a Company class in our application. We want the ability to load user and company data and print it to the Terminal for both of these classes. So, where do we write the code for this functionality? In the User class? In the Company class? Or, in both classes?
Possible solutions to the common usage code issue are using modules, service classes, and class inheritance.
The answer to the preceding questions is none of the above; neither the User class nor the Company class is the right place for this code. What we need instead is what's known as a service object.
A service object is a class that is created to perform a specific action or set of actions.
A service object is often created in order to DRY (Don't Repeat Yourself) up your code. It contains code that may be used in multiple places in our application; by exposing that code from a single central location, we keep our domain model code clean and relevant.
Service objects play an important role in fulfilling the single responsibility principle.
The single responsibility principle is a well-established software design principle whereby every class or function is responsible for a single part of the application's functionality. This responsibility should be entirely encapsulated within the defined class or module.
Simply put, this is a guideline that means a class or module should only contain logic that relates to that specific class or module.
It is important for the following reasons:
Note
The single responsibility principle is the "S" portion of the SOLID design principles. For more information, you can refer to https://packt.live/2MAFQen.
With this in mind, we can begin to see why we would want to use service objects in our application.
By now, you're probably thinking, "Service objects sound very similar to the modules that we learned about earlier."
Sure, there are similarities between service objects and modules, but they are fundamentally different concepts at their core.
Both service objects and modules will help keep your code clean and focused, both will DRY up your code base, both will promote code reusability, and both will improve the testability of your application.
So, why do we need both?
A module is used for sharing stateless functions between your code. Stateless functions typically operate on inputs from method parameters and do not assume any state or variables from the enclosing parent class. A stateless function is essentially like a class method; it does not have access to an instantiated object's variables, but merely to whatever state is passed to the function when it's called. Modules are used as "mixins," which extend the functionality of our classes by "mixing in" the functions defined in the module to the class.
Modules are often thought of in a more "functional programming" aspect. They are stateless and declarative. We don't initialize or instantiate modules; this means that we tend to use them more like helpers.
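A minimal sketch of a module used as a mixin is shown here. The NameFormatter module and its format_name method are hypothetical names for this example, and the User and Company classes are empty stand-ins rather than the classes from a real application.

```ruby
# A module holding a stateless helper: the result depends only on the arguments.
module NameFormatter
  def format_name(first_name, last_name)
    "#{last_name.upcase}, #{first_name}"
  end
end

class User
  include NameFormatter # "mixes in" format_name as an instance method
end

class Company
  include NameFormatter
end

puts User.new.format_name("Sarah", "Meyer")
puts Company.new.format_name("Matt", "Hall")
```

Note that the module is never instantiated; both classes simply gain the same helper.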
Service objects are simply classes. Classes can be instantiated. Classes can have instance variables and be passed around to other methods. A service object is just an instantiated class with a specific purpose that generally follows a format for calling the code contained in the class.
A simplified example of a service object is as follows:
require "csv"
class CSVPrinter
def initialize(filepath)
@filepath = filepath
end
def perform
CSV.foreach(@filepath) do |row|
puts row.to_s
end
end
end
In the preceding example, we can see that a service object is just a Plain Old Ruby Object (PORO). Unlike a module, however, here we have an initialize method, which lets us create an instance of the class with the filepath passed in and store that filepath in an instance variable.
We then have an instance method called perform, which will perform the work of the service object; in this case, it's just simply printing out the rows of the CSV file.
Calling the perform method is a somewhat common design pattern that is used to standardize the interface for service objects. The words execute, run, and go are also common. As mentioned in The Single Responsibility Principle section earlier, if we have just one method for executing the logic in the service object, then it helps us to stick to those guidelines. The name of the class is used as the identifier, not the method call.
It's an optional rule that aims to keep our classes simple and clearly defined with a single responsibility; however, it's still just a suggested pattern. You may decide to have a CSVService class, for example, with methods named print and print_table and so on, rather than many individual services with a single perform method.
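For comparison, a grouped service of that kind might look like the following sketch. The CSVService class is hypothetical, and print_table here uses plain string padding rather than any table-formatting gem.

```ruby
require "csv"

# Hypothetical CSVService grouping related actions in one class.
class CSVService
  def initialize(filepath)
    @filepath = filepath
  end

  # Prints each row as comma-joined values.
  def print
    rows.each { |row| puts row.join(",") }
  end

  # Prints each row with every column padded to a fixed width.
  def print_table(width = 15)
    rows.each { |row| puts row.map { |cell| cell.to_s.ljust(width) }.join("| ") }
  end

  private

  def rows
    CSV.read(@filepath)
  end
end
```

It could be called as CSVService.new("users.csv").print or CSVService.new("users.csv").print_table, assuming a users.csv file exists. The trade-off is fewer classes in exchange for a less uniform interface.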
An example of how you might use this service object would be as follows: CSVPrinter.new("users.csv").perform.
This is the one-liner way of running the CSVPrinter service object. We can see that an instance of the CSVPrinter class is instantiated and we immediately call the perform method on it.
We could, however, store the instantiated class and call perform on it at a later date if need be:
csvprinter = CSVPrinter.new("users.csv")
csvprinter.perform
Let's take a look at an example where a service object is used.
To highlight where you might want to use a service object, let's examine some slightly more involved code that could do with some refactoring:
require "mailgun-ruby"
class User
attr_accessor :name, :email, :address
def initialize(name, email, address)
@name = name
@email = email
@address = address
end
def create
if save_to_database(self)
send_invite_email
end
end
private
def send_invite_email
mailgun = Mailgun::Client.new ENV['MAILGUN_API_KEY']
params = {
from: "[email protected]",
to: email,
subject: "Welcome #{name}!",
text: "Thanks for signing up ...."
}
mailgun.send_message 'mail.myapp.com', params
end
end
This is a simplified example that has a single method using the mailgun gem to send an invite email after a user is created. In a real-world application, you tend to find many instances of these small, tightly coupled methods that creep into the application over time. Before you know it, your classes are hundreds of lines long and become difficult to manage.
So, what's actually wrong with doing it this way?
Well, to start with, the send_invite_email method depends on the user instance's email and name attributes to populate the recipient and subject fields, so it's dependent on the instantiated User object's attributes to function.
When you decide to change the email subject or message or add some additional fields or logic, you'll be editing the User class directly, which should really only hold logic regarding the management of User objects, attributes, and resources.
We've also required the mailgun Ruby gem into the User class as it's a requirement for the send_invite_email method to instantiate the mailgun client in order to send the email.
But do we really need to require an email library in our User class? What about when we decide that we want to send Slack messages, upload avatars to S3, or trigger an SMS to the user? Are we going to require those libraries here too?
You can see how this small class can quickly get out of control.
This code also violates the single responsibility principle by defining email-related methods inside the User class. Even though it is a user-related email, it does not need to live here. You can imagine that the User class of any application is at the very heart of the application and is likely to have a lot of added functionality attached to it in this same way.
Let's refactor this code to use a service object. First, let's extract the send_invite_email method into a service object named UserInviter:
require "mailgun-ruby"
class UserInviter
def initialize(user)
@user = user
end
def perform
mailgun = Mailgun::Client.new ENV['MAILGUN_API_KEY']
mailgun.send_message 'mail.myapp.com', params
end
private
def params
{
from: "[email protected]",
to: @user.email,
subject: "Welcome #{@user.name}!",
text: "Thanks for signing up ...."
}
end
end
Now let's refactor our User class:
class User
attr_accessor :name, :email, :address
def initialize(name, email, address)
@name = name
@email = email
@address = address
end
def create
if save_to_database(self)
UserInviter.new(self).perform
end
end
end
Our User class now only defines what it needs to and, in the future, when we decide to change how our invite emails work or what content they contain, we will just be editing the UserInviter class. We've also moved out the require "mailgun-ruby" statement so it now lives in the UserInviter class, which feels a lot more appropriate as this is a class that specifically deals with sending emails.
It's considered good practice to keep your perform methods short and concise. In our preceding example, we moved the params section into its own private method, which will make testing easier and break down our code into more logical chunks.
Ruby is well known for being a highly readable language; it's one of the reasons why developers love the language so much. There is another common design pattern for service objects, which, in true Ruby fashion, sets out to make our service objects just a little bit nicer to work with.
For instance, instead of writing the code in the following way:
UserInviter.new(self).perform
Wouldn't it be cleaner if we could write it as follows? Let's take a closer look:
UserInviter.perform(self)
Now that feels more Ruby-like. Let's refactor our service object class one more time:
service_object_class_method.rb
require "mailgun-ruby"

class UserInviter
  def initialize(user)
    @user = user
  end

  def perform
    mailgun = Mailgun::Client.new ENV['MAILGUN_API_KEY']
    mailgun.send_message 'mail.myapp.com', params
  end

  def self.perform(*args)
    new(*args).perform
  end

  private

  def params
    {
      from: "[email protected]",
      to: @user.email,
      subject: "Welcome #{@user.name}!",
      text: "Thanks for signing up ...."
    }
  end
end
This looks the same, right? Well, it is, except for the self.perform class method:
def self.perform(*args)
new(*args).perform
end
This little trick defines a class method named perform, which, when called, will simply create a new instance of the UserInviter class, pass the arguments through to the initializer, and then call the instance method version of perform.
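The same pattern can be tried without the mailgun gem. The following sketch uses a hypothetical Greeter service, introduced only for this example, to show that the class-level call and the instance-level call give the same result.

```ruby
# Hypothetical Greeter service, used only to show the class-level perform shortcut.
class Greeter
  def initialize(name, greeting = "Hello")
    @name = name
    @greeting = greeting
  end

  def perform
    "#{@greeting}, #{@name}!"
  end

  # Forwards every positional argument to initialize, then runs the instance method.
  def self.perform(*args)
    new(*args).perform
  end
end

puts Greeter.new("Sarah").perform       # instance style
puts Greeter.perform("Sarah")           # class-method shortcut, same result
puts Greeter.perform("Matt", "Welcome") # extra arguments are passed through
```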
We've seen now how service objects can help clean up your code and encapsulate logic into their own classes. Let's try it out now for ourselves by creating a simple service object that lists users and their position in the list.
In this exercise, we will create a service object that lists our users. To do this, we will instantiate a new instance of the service class with our user list, which establishes our state, and then we'll print our users to the screen using the standard perform syntax:
class UserLister
def initialize(users)
@users = users
end
def perform
@users.each_with_index do |user, idx|
puts "User #{idx}: #{user}"
end
end
end
require "./exercise_4a"
users = ['John', 'Susy', 'Sarah', 'James']
UserLister.new(users).perform
ruby exercise_4b.rb
The output should be as follows:
User 0: John
User 1: Susy
User 2: Sarah
User 3: James
Knowing when to use a module and when to use a service object is more obvious in a real-world application when there are more data objects, functionality, and state objects to think about. Generally, a module is more of a collection of one-time helper functions, such as formatters or converters, whereas service objects can be instantiated with an initial state and can be used to manipulate and mutate data over a series of operations. Instantiated service objects can also be passed into methods as values, whereas modules are "included" in the class with no state.
For example, you may pass an instantiated service object that returns a mutated listing of users into another service object that interacts with the outputted users.
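One way this could look is sketched here. Both UserSorter and SortedUserLister are hypothetical classes for this example; the first holds the user list as its state, and the second receives the first as a value.

```ruby
# Hypothetical service that holds a list of users and returns a sorted copy.
class UserSorter
  def initialize(users)
    @users = users
  end

  def perform
    @users.sort
  end
end

# Hypothetical service that receives another service object as its input.
class SortedUserLister
  def initialize(sorter)
    @sorter = sorter
  end

  def perform
    @sorter.perform.each_with_index.map { |user, idx| "User #{idx}: #{user}" }
  end
end

sorter = UserSorter.new(["John", "Susy", "Sarah", "James"])
puts SortedUserLister.new(sorter).perform
```

A module could not be handed over in this way with its own state attached, which is the practical difference the paragraph above describes.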
In this activity, we're going to put into practice everything that we've learned in this chapter. We're going to expand our voting program, initiated in Chapter 4, Methods, to allow the importing of external voting data from a CSV file, and we're also going to improve our user experience by using the terminal-table gem to print our vote results to the Terminal in a nice, readable, formatted table.
We will build this new code into service objects so that we don't pollute our models with this extended functionality. Then, finally, we'll write a few new tests to wrap everything up and make sure our code is doing what we expect it to. To do this, perform the following steps:
Dir["./services/*rb"].each { |f| require f }
# require all files in models and controllers directory
require './model'
require './controller'
require './service'
category,votee,count
VoteCategoryA,Chris Jones,23
VoteCategoryA,Susie Bennet,29
VoteCategoryB,Allan Green,33
VoteCategoryB,Tony Bennet,23
Here is the expected output:
The Leaderboard output would look like:
Note
The solution to the activity can be found on page 476
In this chapter, we've covered how to import and export raw CSV data in our application, how to extend the functionality of our applications by including external libraries with Ruby gems, and how to interact with the filesystem using Ruby.
These are powerful tools that can turn our applications from toys into real services with just a few simple lines of code. With these new tools, we can import data from databases, spreadsheets, or any number of other sources and process them programmatically with Ruby in any way we want; the options are endless.
We've also learned some best practices regarding how to structure code that doesn't necessarily fit within our domain models by refactoring them into service objects, keeping our models lean and clean. Your coworkers will thank you for this one, trust me.
An application is only as good as its input and output. We've now learned a few methods by which we can increase how much we input by importing external data, and we've seen how to improve what we output by using external libraries to provide a more user-friendly representation of data.
We can safely say now that we're getting the hang of this. In the next chapter, we will dive a little deeper and go beyond the basics, extending our knowledge of Ruby by looking a little closer at some more advanced functionality.